Add (scalarized) safe gather and scatter ops #171
Open
valadaptive wants to merge 10 commits into linebender:main
Depends on #170. The PR stack is getting a bit large.
When looking into gradient rendering in Vello, I noticed that we often perform many reads from the gradient color LUT. Each of those reads requires a bounds check.
The indices into this LUT come from vector types, so theoretically, we should be able to elide all the bounds checks like this:
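A minimal sketch of that approach, assuming a plain `[u32; 4]` stands in for the index vector. `Indices`, `clamp_lanes`, and `lookup_clamped` are hypothetical names for illustration, not code from this PR:

```rust
/// Hypothetical stand-in for a 4-lane SIMD vector of indices.
type Indices = [u32; 4];

/// Lane-wise `min` against the largest valid index. With a real SIMD
/// type this would be a single vector `min`, which the optimizer
/// usually cannot see through.
fn clamp_lanes(idx: Indices, max: u32) -> Indices {
    idx.map(|i| i.min(max))
}

/// Clamp all indices at once, then read each lane from the LUT.
fn lookup_clamped(lut: &[f32], idx: Indices) -> [f32; 4] {
    assert!(!lut.is_empty(), "LUT must not be empty");
    // Saturate so the cast cannot wrap for very large slices.
    let max = (lut.len() - 1).min(u32::MAX as usize) as u32;
    let clamped = clamp_lanes(idx, max);
    // Every lane is now in-bounds, but each `lut[..]` below is still
    // written as a checked access.
    clamped.map(|i| lut[i as usize])
}
```

The result is correct either way; the only question is whether the checks on the final indexing step are removed.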
Unfortunately, the compiler still does not recognize that the indices are guaranteed to be in-bounds. To elide the bounds checks, we need to perform the `min` operation on every index individually, after converting it to a `usize`.
We can avoid this by introducing a "gather" operation here which ensures all the indices are valid (panicking if the source slice is empty and clamping the indices), then doing a bunch of unchecked accesses in a loop. There's an equivalent "scatter" operation which does the same thing but for writes.
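As a rough illustration of the validate-then-access-unchecked idea, here is a scalarized sketch over plain arrays of `u32` indices. The signatures are assumptions made for this example and may not match what the PR actually adds:

```rust
/// Hypothetical scalarized gather: panics on an empty source, clamps
/// every index to the last element, then reads without bounds checks.
fn gather<T: Copy, const N: usize>(src: &[T], indices: [u32; N]) -> [T; N] {
    assert!(!src.is_empty(), "gather: source slice must not be empty");
    let max = src.len() - 1;
    indices.map(|i| {
        let i = (i as usize).min(max);
        // SAFETY: `src` is non-empty and `i <= src.len() - 1`.
        unsafe { *src.get_unchecked(i) }
    })
}

/// Hypothetical scalarized scatter: same validation, but for writes.
/// If two indices collide after clamping, the later lane wins.
fn scatter<T: Copy, const N: usize>(dst: &mut [T], indices: [u32; N], values: [T; N]) {
    assert!(!dst.is_empty(), "scatter: destination slice must not be empty");
    let max = dst.len() - 1;
    for (&i, &v) in indices.iter().zip(values.iter()) {
        let i = (i as usize).min(max);
        // SAFETY: `dst` is non-empty and `i <= dst.len() - 1`.
        unsafe { *dst.get_unchecked_mut(i) = v };
    }
}
```

In this sketch an out-of-range index is clamped to the last element rather than causing a panic, so only the empty-slice case needs an up-front check.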
These operations work on arbitrary slice types and `gather` returns an array. The only vector type involved is the one that holds all the indices.
This PR intentionally does not introduce any operations that gather into/scatter from vector types. That also means that no hardware gather/scatter instructions are used right now, and the performance benefit comes solely from avoiding bounds checks. I'm not sure if we should be reserving the names `gather` and `scatter` for operations that work on vectors.