-
Notifications
You must be signed in to change notification settings - Fork 267
Ck tile multi reduce improvements #3607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Ck tile multi reduce improvements #3607
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request optimizes the multi-reduction operation by reorganizing the reduction loop structure and improving memory access patterns. The key insight is swapping the loop order to read data once while performing multiple reduction operations, rather than reading data multiple times.
Changes:
- Swapped reduction loop from
{n_iter { num_op }}to{num_op {n_iter}}to minimize data reads - Improved vectorization logic to handle both contiguous and non-contiguous reduction dimensions
- Added validation to ensure all reduction operations share the same identity value (required for the new implementation)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| include/ck_tile/ops/reduce/kernel/multi_reduce2d_kernel.hpp | Restructured reduction loop to process all operations per iteration, added identity value validation, and improved vectorization handling |
| test/ck_tile/reduce/test_multi_reduce2d_threadwise.cpp | Updated test configuration from Add-Max-Add to Add-SumSquare to align with new identity value requirement |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Proposed changes
Some performance improvement have been implemented:
Features change:
isSupportedArgumentand unit test have been updated to reflect the new behavior. See the discussion for an alternative design.Checklist
Please put an
xinto the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.clang-formaton all changed filesDiscussion
isSupportedArgumentmethod allowing multiple different operations (like addition or bitwise OR). Another way to enforce this would be to remove the possibility to have multiple reduce operations, but keeping multiple different elementwise and accumulator operations, as in old CK.