Add basic HVX reduce kernels. #9506
Merged
copybara-service[bot] merged 1 commit into master on Feb 24, 2026
Kernels that should be reasonably good:
- min, max, min_max for all types
- sum, sum_squared for int8 and uint8 for k1 > 1
Kernels that are not good and need work:
- sum, sum_squared for int8 and uint8 for k1 = 1. These are currently implemented naively, with conversions and wide arithmetic (instead of widening arithmetic).
- In general, k1 = 1 is not good: we unroll the accumulator by 2x/4x so that we can load whole vectors, which makes the accumulators really large (e.g. 128). This means we're very likely to hit the tail-case code.
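To illustrate why a large accumulator makes the tail path more likely, here is a minimal sketch in C. It assumes the accumulator covers a fixed tile of output elements (128 in the example above) and that anything not filling a whole tile goes to tail code. The names `tile_split` and `split_by_tile` are hypothetical and not taken from the kernels.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper type: how many elements the full-tile loop handles,
// and how many are left over for the tail-case code.
typedef struct {
  size_t main_elems;
  size_t tail_elems;
} tile_split;

// Splits n elements into whole tiles plus a remainder.
static tile_split split_by_tile(size_t n, size_t tile) {
  assert(tile > 0);
  tile_split s;
  s.main_elems = n / tile * tile;   // handled by the unrolled main loop
  s.tail_elems = n - s.main_elems;  // handled by tail-case code
  return s;
}
```

Under these assumptions, a 100-element extent never reaches the main loop with a 128-element tile, while a 32-element tile would leave only 4 elements for the tail.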
Example inner loop (k1 > 1 uint8 sum; sum_squared is almost identical):
```
.LBB30_116:                     // %while.body14.i
                                // Parent Loop BB30_110 Depth=1
                                // Parent Loop BB30_112 Depth=2
                                // Parent Loop BB30_114 Depth=3
                                // => This Inner Loop Header: Depth=4
    {
        v27 = vmemu(r5++#1)
    }
    {
        v28 = vmemu(r6++#1)
    }
    {
        v10.w += vrmpy(v27.ub,r9.b)
        v23 = vmemu(r0++#1)
    }
    {
        v9.w += vrmpy(v28.ub,r9.b)
        v15 = vmemu(r7++#1)
    }
    {
        v5.w += vrmpy(v23.ub,r9.b)
    }
    {
        v6.w += vrmpy(v15.ub,r9.b)
        r3 = add(r3,#-128)
    }
    {
        p3 = cmp.gtu(r3,#127)
        if (p3.new) jump:t .LBB30_116
    }
```
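The accumulate step in this loop can be modeled in scalar C. This is a minimal sketch under assumptions not stated in the PR: 128-byte HVX vectors (suggested by the `#-128` step) and `r9` holding four bytes equal to 1, so that each multiply-accumulate reduces to a plain byte sum. The helper names `vrmpy_acc_ub_b` and `sum_u8` are hypothetical. The sketch models a single accumulator, whereas the loop keeps four, each fed from its own pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { kVectorBytes = 128, kLanes = kVectorBytes / 4 };

// Scalar model of `acc.w += vrmpy(v.ub, r.b)`: each 32-bit lane accumulates
// the dot product of 4 adjacent unsigned bytes of v with the 4 signed bytes
// of r. This is a widening operation: 8-bit inputs, 32-bit accumulation.
static void vrmpy_acc_ub_b(int32_t acc[kLanes], const uint8_t v[kVectorBytes],
                           const int8_t r[4]) {
  for (int i = 0; i < kLanes; ++i) {
    for (int j = 0; j < 4; ++j) {
      acc[i] += (int32_t)v[4 * i + j] * (int32_t)r[j];
    }
  }
}

// Sums n uint8 values, one whole vector at a time. n must be a multiple of
// kVectorBytes; a real kernel would need tail handling for the remainder.
static int32_t sum_u8(const uint8_t* x, size_t n) {
  assert(n % kVectorBytes == 0);
  const int8_t ones[4] = {1, 1, 1, 1};  // assumed contents of r9
  int32_t acc[kLanes] = {0};
  for (; n >= kVectorBytes; n -= kVectorBytes, x += kVectorBytes) {
    vrmpy_acc_ub_b(acc, x, ones);
  }
  // Final horizontal reduction across lanes, done once after the loop.
  int32_t total = 0;
  for (int i = 0; i < kLanes; ++i) total += acc[i];
  return total;
}
```

Because the multiply-accumulate widens from 8-bit inputs to 32-bit lanes in one step, no separate conversion of the input to a wider type is needed before adding.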
PiperOrigin-RevId: 874380564