Access to thread locals isn't inlined across crates #25088
Closed
Labels
- A-codegen (Area: Code generation)
- A-thread-locals (Area: Thread local storage (TLS))
- C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
- I-slow (Issue: Problems and improvements with respect to performance of generated code.)
- T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
Proposed solution
Right now, access to thread locals defined by `thread_local!` isn't inlined across crates, causing performance problems that wouldn't otherwise be seen within one crate. This can probably be solved with a few new minor language features:

- The `#[inline]` annotation could be processed on `static` variables. If the variable does not have any internal mutability, then the definition can be inlined into other LLVM modules and tagged with `available_externally`. That means that the contents are available for optimization, but if you're taking the address it's available elsewhere.
- […] `#[inline]`.

Those two pieces, I believe, should provide enough inlining opportunities to ensure that accesses are as fast when done from external crates as they are when done within a single crate.
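For illustration, here is a minimal sketch of the access pattern in question. The names `upstream`, `FLAG`, `flag`, and `set_flag` are hypothetical, and the `upstream` module only stands in for a separate crate so that the example fits in one file. It shows the shape of the code, not the codegen difference itself, which only appears when the accessor really lives in another crate.

```rust
// Stand-in for an upstream crate. In the scenario described above this
// module would be a separate crate (all names here are hypothetical).
mod upstream {
    use std::cell::Cell;

    thread_local! {
        // Per-thread flag, similar in spirit to a "panicking" flag.
        static FLAG: Cell<bool> = Cell::new(false);
    }

    // Public accessor that a downstream crate would call. `#[inline]`
    // makes this body eligible for cross-crate inlining, but the
    // thread-local access inside it is what the issue is concerned with.
    #[inline]
    pub fn flag() -> bool {
        FLAG.with(|f| f.get())
    }

    #[inline]
    pub fn set_flag(value: bool) {
        FLAG.with(|f| f.set(value));
    }
}

fn main() {
    // Each thread starts with its own copy of the initial value.
    assert!(!upstream::flag());
    upstream::set_flag(true);
    assert!(upstream::flag());

    // A freshly spawned thread does not observe the main thread's write.
    let other = std::thread::spawn(|| upstream::flag()).join().unwrap();
    assert!(!other);

    println!("main: {}, other: {}", upstream::flag(), other);
}
```

To observe the problem described here, one would presumably move `upstream` into its own crate, build both crates with optimizations, and inspect the IR or assembly of the caller.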
Original description
This hurts performance for the locks in `std::sync`, as they call `std::rt::unwind::panicking()` (which just reads a thread-local). For uncontended locks the cost is quite significant.

There are two problems:

1. `std::rt::unwind::panicking()` isn't marked inline. This is trivial to solve.
2. `thread_local!` goes through function pointers, which LLVM fails to see through. These are the `__getit` functions in `libstd/thread/local.rs`.

Consider these two files:

`call_foo` gets the following IR with everything compiled with full optimization. Note the call through a function pointer: