-
-
Notifications
You must be signed in to change notification settings - Fork 14.8k
Codegen weirdness for sum of count_ones over an array #101060
Copy link
Copy link
Open
Labels
A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.A-autovectorizationArea: Autovectorization, which can impact perf or code sizeArea: Autovectorization, which can impact perf or code sizeA-codegenArea: Code generationArea: Code generationC-bugCategory: This is a bug.Category: This is a bug.I-slowIssue: Problems and improvements with respect to performance of generated code.Issue: Problems and improvements with respect to performance of generated code.O-x86_64Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64)Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64)P-highHigh priorityHigh priorityT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.Relevant to the compiler team, which will review and decide on the PR/issue.WG-llvmWorking group: LLVM backend code generationWorking group: LLVM backend code generationregression-from-stable-to-stablePerformance or correctness regression from one stable version to another.Performance or correctness regression from one stable version to another.
Metadata
Metadata
Assignees
Labels
A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.A-autovectorizationArea: Autovectorization, which can impact perf or code sizeArea: Autovectorization, which can impact perf or code sizeA-codegenArea: Code generationArea: Code generationC-bugCategory: This is a bug.Category: This is a bug.I-slowIssue: Problems and improvements with respect to performance of generated code.Issue: Problems and improvements with respect to performance of generated code.O-x86_64Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64)Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64)P-highHigh priorityHigh priorityT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.Relevant to the compiler team, which will review and decide on the PR/issue.WG-llvmWorking group: LLVM backend code generationWorking group: LLVM backend code generationregression-from-stable-to-stablePerformance or correctness regression from one stable version to another.Performance or correctness regression from one stable version to another.
Type
Fields
Give feedbackNo fields configured for issues without a type.
(Issue loosely owned by @wesleywiser and @pnkfelix monitoring llvm/llvm-project#57476 )
Original Description below
Before 1.62.0, this code correctly compiled to two popcounts and an addition on a modern x86-64 target.
Since 1.62.0 (up to latest nightly), the codegen is... baffling at best.
The assembly for the original function is now a terribly misguided autovectorization. And, just to make sure (even though it's pretty obvious), I did run a benchmark - the autovectorized function is ~8x slower on my Zen 2 system.
Calling that function from a different function brings back normal assembly.
-Cno-vectorize-slpdoes nothing. I don't know exactly what-Cno-vectorize-loopsdoes, but it's not good.If you change the length of the array to 4, both functions get autovectorized.
-Cno-vectorize-slpfixes the second function now. Adding-Cno-vectorize-loopscauses the passthrough function to generate the worst assembly.Changing
into_itertoiterfixes length 2, but doesn't fix length 4.I could go on, but in short it's a whole mess.
I found a workaround that consistently works for all lengths:
iterand-Cno-vectorize-slp.@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged +A-array +A-codegen +A-iterators +A-LLVM +A-simd +I-slow +O-x86_64 +perf-regression