rustc_session: be more precise about -Z plt=yes on x86-64? #141720
Open
Labels: A-codegen, A-linkage, C-optimization, I-slow, O-x86_64, T-compiler
in #109982 rustc switched to `-Z plt=yes` on non-x86-64 platforms for a bunch of good reasons, and stuck with `-Z plt=no` by default on x86-64 for also good reasons! unfortunately, defaulting to `-Z plt=no` is a slight pessimization in programs heavily dependent on calls into statically linked libraries.

PLT calls on x86 end up compiled to `e8 <addr>` calls, which at link time can be rewritten to direct calls to the callee, and presumably deletion of the GOT entry. when we skip the PLT on x86-64, it seems that linkers are unwilling to do a link-time optimization of `ff 15 <GOT addr>` into `90 e8 <fn addr>` when the callee is local to the object, so an indirect call to the object-local persists*.

i expect `-Z plt=no` to be better than `-Z plt=yes` on x86-64 for all cases where the called functions are dynamically linked. i also expect `-Z plt=no` to be worse than `-Z plt=yes` on x86-64 for all cases where the called functions are statically linked and <4 GiB from their call sites. it'd be nice if we could skip `non_lazy_bind` if we know the called function is to be statically linked. if your compiled artifact is >4 GiB .. i've heard of such things but have no idea what's best :)

"if we know the called function is to be statically linked" is the more annoying problem, though, because `rustc-link-lib` tells rustc only what libraries get what kind of linkage. especially on Unix-y platforms we don't know which of those libraries will provide a given symbol. the extern block can have a `#[link(kind="static")]` attribute which i've used in this minimized example of the problem i'm talking about, which almost seems like enough information to choose when to do this optimization at codegen-time. unfortunately, if the source file says `#[link(name="util", kind="static")] extern "C" { pub fn foo(); }`, and then you compile that source like `rustc -l dylib=util ...`, the command-line parameter simply overrides the link attribute and you end up with a dynamic link to `foo` with the (in context) reasonable `ff 15 [GOT_entry]` call.

because of the `#[link]`/`-l KIND=NAME` interaction i'm really not sure what to do here. i was going to initially suggest plumbing `#[link(kind="static")]` through to inform if `nonlazybind` is appropriate, but i had expected that conflicting link directives would at least produce an error. silently ending up with the command line argument is pretty unfortunate. does it seem reasonable to plumb the `#[link]` attribute as a hint, advise `#[link(kind="static")]` for statically linked functions, and make conflicting `#[link]` and `-l` arguments produce an error?

memorysafety/rav1d#1417 is a more substantive case which motivates this issue, where hot code is a collection of assembly routines that are statically linked. i've written a longer analysis about the case in that issue, but it's just more supporting information around the observation above.
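to make the `#[link]` vs `-l` question concrete, here is a rough sketch of the two behaviors being compared: today's silent override, and the proposed "conflict is an error" rule. this is a toy model, not rustc code. `LinkKind`, `resolve_current`, `resolve_proposed`, and `wants_nonlazybind` are all hypothetical names made up for illustration, and the real decision would live somewhere in rustc's codegen attribute handling.

```rust
/// Simplified, hypothetical model of the link kinds relevant here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LinkKind {
    Static,
    Dylib,
}

/// Behavior described in this issue: a kind given via `-l KIND=NAME`
/// silently wins over the kind written in `#[link(kind = "...")]`.
fn resolve_current(attr: Option<LinkKind>, cli: Option<LinkKind>) -> Option<LinkKind> {
    cli.or(attr)
}

/// Proposed behavior (assumption): disagreeing kinds are reported as an
/// error instead of being resolved silently.
fn resolve_proposed(
    attr: Option<LinkKind>,
    cli: Option<LinkKind>,
) -> Result<Option<LinkKind>, String> {
    match (attr, cli) {
        (Some(a), Some(c)) if a != c => Err(format!(
            "conflicting link kinds: #[link] says {:?}, -l says {:?}",
            a, c
        )),
        (a, c) => Ok(c.or(a)),
    }
}

/// Hypothetical codegen hint: keep `nonlazybind` (GOT-indirect calls)
/// unless the callee is known to be statically linked.
fn wants_nonlazybind(kind: Option<LinkKind>) -> bool {
    kind != Some(LinkKind::Static)
}

fn main() {
    let attr = Some(LinkKind::Static); // #[link(name = "util", kind = "static")]
    let cli = Some(LinkKind::Dylib); // rustc -l dylib=util ...

    println!("current:  {:?}", resolve_current(attr, cli));
    println!("proposed: {:?}", resolve_proposed(attr, cli));
    println!(
        "nonlazybind with attr only: {}",
        wants_nonlazybind(resolve_current(attr, None))
    );
}
```

the point of the `Result` in the proposed version is that the hint from `#[link(kind="static")]` is only trustworthy for codegen if nothing on the command line can quietly contradict it.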
worse, for code that is hot around an indirect call to a constant target, branch prediction quite effectively hides the cost of this indirect call. if the hot code is more like a large region of warm code, the branch prediction can end up evicted and these indirect calls to a constant local function become quite costly.
worse (pt2), LLVM reasonably tries to improve the indirect call situation by hoisting the loads for repeated calls to the same target, which can cause register pressure and additional spills, and generally make this kind of unfortunate situation even worse.
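for anyone who wants to check which form their binary ended up with, the two encodings mentioned above can be told apart from their leading bytes. the sketch below is a tiny hand-rolled classifier, not a real disassembler: `Call` and `decode_call` are hypothetical names, and it only recognizes `e8 rel32` (direct near call) and `ff 15 disp32` (indirect call through a RIP-relative slot, which is the GOT entry in `-Z plt=no` code). both displacements are signed 32-bit values relative to the end of the instruction.

```rust
/// The two x86-64 call encodings discussed in this issue.
#[derive(Debug, PartialEq, Eq)]
enum Call {
    /// `e8 rel32`: direct near call (5 bytes).
    Direct { target: u64 },
    /// `ff 15 disp32`: indirect call through a RIP-relative memory slot (6 bytes).
    IndirectRipRel { slot: u64 },
}

/// Classify the instruction starting at `bytes[0]`, located at address `addr`.
/// Returns `None` for anything that is not one of the two forms above.
fn decode_call(addr: u64, bytes: &[u8]) -> Option<Call> {
    match bytes {
        [0xe8, d0, d1, d2, d3, ..] => {
            let rel = i32::from_le_bytes([*d0, *d1, *d2, *d3]);
            // Target is relative to the address of the next instruction.
            let next = addr.wrapping_add(5);
            Some(Call::Direct {
                target: next.wrapping_add(rel as i64 as u64),
            })
        }
        [0xff, 0x15, d0, d1, d2, d3, ..] => {
            let disp = i32::from_le_bytes([*d0, *d1, *d2, *d3]);
            // The slot holding the callee address is also next-instruction relative.
            let next = addr.wrapping_add(6);
            Some(Call::IndirectRipRel {
                slot: next.wrapping_add(disp as i64 as u64),
            })
        }
        _ => None,
    }
}

fn main() {
    // Made-up addresses and displacements, purely for illustration.
    let indirect = [0xff, 0x15, 0xfa, 0x0f, 0x00, 0x00];
    println!("{:?}", decode_call(0x1000, &indirect));

    // `90 e8 rel32` is a one-byte nop followed by a direct call: the same
    // six bytes of space as `ff 15 disp32`, so it can replace it in place.
    let relaxed = [0x90, 0xe8, 0x10, 0x00, 0x00, 0x00];
    println!("{:?}", decode_call(0x1001, &relaxed[1..]));
}
```

running something like this over the call sites of interest (or just eyeballing `objdump -d` output for `ff 15`) shows whether the calls into the static library stayed GOT-indirect.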