Add (scalarized) safe gather and scatter ops #171

Open
valadaptive wants to merge 10 commits into linebender:main from valadaptive:gather-scatter


@valadaptive
Contributor

Depends on #170. The PR stack is getting a bit large.

When looking into gradient rendering in Vello, I noticed that we often perform many reads from the gradient color LUT. Each of those reads requires a bounds check.

The indices into this LUT come from vector types, so theoretically, we should be able to elide all the bounds checks like this:

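```rust
let mut indices: u32x16<S> = [...];
assert!(!self.lut.is_empty());
// Compare in `usize`: `len()` returns a `usize`, not a `u32`.
assert!(self.lut.len() <= u32::MAX as usize);
// Vector-wide clamp to the last valid LUT index.
indices = indices.min((self.lut.len() - 1) as u32);
```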

Unfortunately, the compiler still does not recognize that the indices are guaranteed to be in-bounds. To elide the bounds checks, we need to perform the min operation on every index individually, after converting it to a usize.
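For comparison, the per-index workaround described above might look roughly like the sketch below. It is only an illustration: `lookup_clamped` is a hypothetical name, the LUT is assumed to hold `f32` values, and a plain `[u32; N]` array stands in for the index vector so the snippet does not depend on any SIMD crate.

```rust
/// Hypothetical helper (not part of this PR): looks up `N` LUT entries,
/// clamping each index individually after converting it to `usize`.
fn lookup_clamped<const N: usize>(lut: &[f32], indices: [u32; N]) -> [f32; N] {
    assert!(!lut.is_empty());
    let max = lut.len() - 1;
    indices.map(|i| {
        // The clamp happens on a `usize`, immediately before the indexing
        // expression, which is the shape the optimizer tends to see through.
        let idx = (i as usize).min(max);
        lut[idx]
    })
}
```

This still uses ordinary checked indexing, so whether the bounds check is actually removed is up to the optimizer.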

We can avoid that per-index boilerplate by introducing a "gather" operation that validates all the indices up front (panicking if the source slice is empty, and clamping every index), then does a bunch of unchecked accesses in a loop. There's an equivalent "scatter" operation which does the same thing but for writes.

These operations work on arbitrary slice types and gather returns an array. The only vector type involved is the one that holds all the indices.
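A minimal sketch of what such scalarized operations could look like is shown below. The names `gather_clamped` and `scatter_clamped`, the `T: Copy` bound, and the use of a plain `[u32; N]` array in place of the index vector are all assumptions made for illustration; the signatures actually added by this PR may differ.

```rust
/// Hypothetical scalarized gather: clamps every index into range, then
/// reads each element without a bounds check. Panics if `src` is empty.
fn gather_clamped<T: Copy, const N: usize>(src: &[T], indices: [u32; N]) -> [T; N] {
    assert!(!src.is_empty(), "gather: source slice is empty");
    let max = src.len() - 1;
    indices.map(|i| {
        let idx = (i as usize).min(max);
        // SAFETY: `src` is non-empty and `idx <= src.len() - 1`.
        unsafe { *src.get_unchecked(idx) }
    })
}

/// Hypothetical scalarized scatter: clamps every index into range, then
/// writes each value without a bounds check. Panics if `dst` is empty.
fn scatter_clamped<T: Copy, const N: usize>(dst: &mut [T], indices: [u32; N], values: [T; N]) {
    assert!(!dst.is_empty(), "scatter: destination slice is empty");
    let max = dst.len() - 1;
    for (i, v) in indices.iter().copied().zip(values) {
        let idx = (i as usize).min(max);
        // SAFETY: `dst` is non-empty and `idx <= dst.len() - 1`.
        unsafe { *dst.get_unchecked_mut(idx) = v };
    }
}
```

In this sketch the `unsafe` is confined to the two helpers, so callers keep a safe interface. Writes in `scatter_clamped` happen in index order, so when two indices clamp to the same slot the later value wins.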

This PR intentionally does not introduce any operations that gather into/scatter from vector types. That also means that no hardware gather/scatter instructions are used right now, and the performance benefit comes solely from avoiding bounds checks. I'm not sure if we should be reserving the names gather and scatter for operations that work on vectors.

3 participants