[Misc] Enable dataclasses.dataclass with more than 31 ndarrays on Metal#392
Merged
hughperkins merged 16 commits intomainfrom Mar 4, 2026
Merged
[Misc] Enable dataclasses.dataclass with more than 31 ndarrays on Metal#392hughperkins merged 16 commits intomainfrom
hughperkins merged 16 commits intomainfrom
Conversation
Use @test_utils.test(arch=qd.metal) instead of @pytest.mark.parametrize, remove manual qd.init() calls and __main__ block per project test guidelines. Made-with: Cursor
When ndarrays are accessed via physical GPU addresses (to bypass Metal's 31 kernel parameter limit), the actual data buffers are not bound to the command encoder. Metal therefore cannot track read/write dependencies between consecutive kernel dispatches for these buffers. Fix by calling [encoder useResource:usage:] for each physically-addressed buffer before dispatch. Also add gpuAddress availability guard for macOS 13.0+. Made-with: Cursor
Ensures pending Metal command buffers are flushed before exposing ndarray GPU memory via DLPack, preventing stale reads. Made-with: Cursor
Made-with: Cursor
The sync() call in to_dlpack() is only needed on Metal where GPU commands must be explicitly flushed before DLPack access. Cache the Arch.metal enum at module level for fast lookup. Made-with: Cursor
duburcqa
reviewed
Feb 28, 2026
|
|
||
| uint64_t MetalDevice::get_memory_physical_pointer(DeviceAllocation handle) { | ||
| const MetalMemory &memory = get_memory(handle.alloc_id); | ||
| if (__builtin_available(macOS 13.0, iOS 16.0, *)) { |
Collaborator
Author
There was a problem hiding this comment.
Not sure. By the way, my preferred aproach would be to remove this, and stop supporting macos < 13. Thoughts?
Contributor
There was a problem hiding this comment.
Yes please go ahead. It is not supported by Genesis either.
duburcqa
approved these changes
Feb 28, 2026
When physical storage buffer is enabled (Metal), load_buffer and store_buffer were skipping the u1→u8/i8 conversion, causing SPIRV-Cross to emit 2-byte (short) accesses for booleans instead of 1-byte. This produced wrong strides and corrupted data for boolean arrays. Reorder the conditionals so u1 handling takes priority over the u64 physical pointer path, matching the behavior of the non-physical path. Made-with: Cursor
The Metal shader compiler on Apple Silicon miscompiles stores through physical GPU pointers when the byte offset involves a runtime stride multiply and the stored value is loop-invariant (only the first row of a 2D+ array gets written). Change the SPIRV codegen to emit OpConvertUToPtr on the base address once and OpPtrAccessChain for element indexing, instead of per-element OpConvertUToPtr from ulong arithmetic. SPIRV-Cross translates this to base[index] rather than per-element reinterpret_cast, which the Metal compiler handles correctly. The struct-wrapper fallback is preserved for code paths without decomposed pointer components (e.g. atomics). Includes regression tests covering const stores, row-dependent stores, various 2D/3D shapes, and multi-argument kernels. Made-with: Cursor
Made-with: Cursor
Covers atomic_add (i32, f32), atomic_sub, atomic_min, atomic_max, and a histogram pattern exercising concurrent writes from multiple threads to the same ndarray through physical pointers. Made-with: Cursor
Two-part fix for a miscompilation where SPIRV-Cross inlines physical storage buffer OpLoad results as pointer dereference expressions at each use site, causing re-reads from modified memory. 1. spirv_ir_builder.cpp: Emit MemoryAccessVolatileMask on all OpLoad instructions from physical storage buffer pointers. 2. SPIRV-Cross (spirv_glsl.cpp): Respect MemoryAccessVolatileMask on OpLoad by disabling expression forwarding, forcing a local temporary. This fixes Genesis broadphase insertion sort corruption where the sort key was re-read after array elements were shifted in place. Made-with: Cursor
duburcqa
approved these changes
Mar 2, 2026
erizmr
approved these changes
Mar 2, 2026
Update the SPIRV-Cross submodule to use the Genesis-Embodied-AI fork, which includes a fix to respect MemoryAccessVolatileMask on OpLoad (preventing expression forwarding for volatile physical pointer loads) and regression tests for it.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue: #
Brief Summary
copilot:summary
Walkthrough
copilot:walkthrough