msaroufim commented on Feb 7, 2026

Summary

  • Fixes a vulnerability where submissions could cache results based on Python object identity (id(tensor))

Changes

  1. Clone data before each timing iteration (outside the timed region) - gives each iteration fresh object identities without affecting the measured kernel time
  2. Use a local seed variable instead of mutating test.args["seed"] - avoids shared mutable state

The benchmark harness was vulnerable to submissions that cache results
based on Python object identity (e.g., id(tensor)). Since the same
data objects were reused across all timing iterations, a submission
could cache on the first call and return the cached results on
subsequent calls, showing artificial speedups of 12-36%.
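For illustration, a submission exploiting the reused objects could look roughly like the sketch below; custom_kernel and the matmul are hypothetical stand-ins, not code from any actual submission:

```python
# Illustrative only: memoizing on id() of the input tensor.
# Because the old harness passed the same tensor objects on every
# timing iteration, the expensive work runs only once.
_cache = {}

def custom_kernel(x):                # hypothetical submission entry point
    key = id(x)
    if key not in _cache:
        _cache[key] = x @ x.T        # stand-in for the real kernel work
    return _cache[key]               # later iterations return instantly
```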

Changes:
- Clone data before each timing iteration (outside the timed region)
  so each iteration sees fresh object identities without affecting the
  measured kernel time
- Use a local seed variable instead of mutating test.args["seed"] to
  avoid shared mutable state between benchmark runs (both changes are
  sketched below)
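A minimal sketch of the hardened timing loop under these two changes, assuming PyTorch CUDA events; data, custom_kernel, and num_iterations are placeholders, not the harness's actual identifiers:

```python
import torch

seed = 1234                          # local seed; test.args["seed"] is never mutated
times = []
for _ in range(num_iterations):
    # Clone outside the timed region: fresh object identities each
    # iteration, issued before synchronize so the GPU copies can overlap
    # with the previous iteration's tail work.
    fresh = [t.clone() for t in data]
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    out = custom_kernel(*fresh)      # only the kernel call is timed
    end.record()
    torch.cuda.synchronize()
    times.append(start.elapsed_time(end))
```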
Additional hardening on top of the object-identity caching fix:

- Shuffle the data order on each timing iteration to prevent call-count
  caching (a submission could track its invocation count and predict
  which data item appears at each position)
- Move the clone before torch.cuda.synchronize() so the clone's GPU
  copies can overlap with the previous iteration's tail work
- Fix a pre-existing recheck bug where only the last item's correctness
  was checked (the if not good check sat outside the for loop)
- Use shuffle_order indices to correctly pair shuffled outputs with
  their reference data during recheck (sketched below)
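A rough sketch of how the shuffling and the recheck pairing fit together; data, refs, and custom_kernel again stand in for the harness's real names:

```python
import random
import torch

shuffle_order = list(range(len(data)))
random.shuffle(shuffle_order)        # new presentation order each timing pass

outputs = [custom_kernel(*data[idx]) for idx in shuffle_order]

good = True
for pos, idx in enumerate(shuffle_order):
    # Check every item inside the loop (the old code effectively checked
    # only the last one) and pair each shuffled output with the reference
    # it was actually computed from.
    if not torch.allclose(outputs[pos], refs[idx]):
        good = False
        break
```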