HipIGBADetector: use UniformConstant for __chip_module_has_no_IGBAs#1149
HipIGBADetector: use UniformConstant for __chip_module_has_no_IGBAs#1149
Conversation
Initialize in pass, immutable at runtime (runtime only reads it). CrossWorkgroup is for mutable device memory.
| // UniformConstant is correct: we initialize it in this pass, and at | ||
| // runtime it is immutable (the runtime only reads it). CrossWorkgroup | ||
| // is for mutable device memory. | ||
| SPIRV_UNIFORMCONSTANT_AS); |
There was a problem hiding this comment.
"the runtime only reads it" statement is ambiguous: Does it mean the host driver reads it? And/or the kernel? The statement is true for the former but not for the latter.
Switching storage class from CrossWorkgroup to UniformConstant is concerning here because the space for the latter can be scarce (e.g. 64kb on RTX 3080 through OpenCL). This can have increased change for launch failures for users already consuming most of the constant memory.
There was a problem hiding this comment.
I see the change here is motivated by an issue mentioned here:
CrossWorkgroup variables with initializers may be optimized or lowered incorrectly in the in-tree SPIR-V backend, producing invalid forward references.
Please, open a bug report on llvm-project and add link to the comment.
There was a problem hiding this comment.
This is a single variable placed in const memory is const mem space really a concern here? Reading from const memory should be faster and the space required is negligible, no? @linehill
There was a problem hiding this comment.
This is a single variable placed in const memory is const mem space really a concern here? Reading from const memory should be faster ...
It won't be faster in a way it matters. The variable is not read by kernels so there is no benefit placing it in constant memory. It's only read by SPIR-V parser from the SPIR-V bitcode once per module.
... and the space required is negligible, no? @linehill
I'd agree if it is placed in CrossWorkgroup but not if placed in constant memory - 64kb in the RTX3080 is not much even if the variable is fraction of it. You are not taking account an user might use up all the available scarce constant memory space which works fine until this change here. That said, it might not matter if the kernel compiler is smart enough to eliminate the variable out since it is not read by any kernel (or, at least, is not supposed to).
Okay yeah that makes sense. I'll double check if there are issues with crossworkgroup + llvm 22 + native |
Cherry-pick of 0bc8d63204afb3ec02a6d9d3f985913ef06c5995 onto main.
Original commit: HipIGBADetector: use UniformConstant for __chip_module_has_no_IGBAs