Conversation

Branch force-pushed from 2b73168 to e404b77.
Due to a known issue where the WARP preview workflow stalls the machine indefinitely (#772), this PR disables the WARP preview builds from running automatically when a PR modifies them. PRs like #935 and #769 that update or modify the WARP preview workflows have caused the machine to stall, preventing it from running its usual scheduled jobs or other PRs.
> Please update your branch so you don't accidentally stall one of the CI machines (see #970 for details)
Keenuts left a comment:
I need to spend more time on the ballot and generation implementation. Initial comments to begin with.
    Err->release();
    return 0;
  }
  uint32_t SubgroupSize = PSO->threadExecutionWidth();
This looks incorrect for Metal: when you create a compute pipeline state, it calculates the maximum number of threads available on the device. This value never changes, but may differ between pipeline objects (https://developer.apple.com/documentation/metal/mtlcomputepipelinestate/maxtotalthreadsperthreadgroup).
That caveat is not documented on the threadExecutionWidth property directly, but on the associated maxTotalThreadsPerThreadgroup, and it explains why the value lives on the pipeline state rather than on the device (register pressure might impact it). Hence we can deduce that threadExecutionWidth is also not guaranteed to be stable across pipelines.
This should be documented on the getMinMaxSubgroupSize() function in Device.h: note that, at least for Metal, the value is only a hint.
                VkBufferUsageFlags Usage,
                VkMemoryPropertyFlags MemoryFlags,
                size_t Size, void *Data = nullptr) {
    const VkDeviceSize AllocationSize = std::max<size_t>(Size, 1);
Why are we calling createBuffer with a size == 0?
Shouldn't this be an error instead (like calling malloc(0) is, right)?
    if (Data && Size > 0) {
      void *Dst = nullptr;
-     if (vkMapMemory(IS.Device, Memory, 0, VK_WHOLE_SIZE, 0, &Dst))
+     if (vkMapMemory(IS.Device, Memory, 0, AllocationSize, 0, &Dst))
Why not stick with VK_WHOLE_SIZE?
  }

  ResourceBundle Bundle{getDescriptorType(R.Kind), R.size(), R.BufferPtr};
  const size_t LogicalSize = R.size();
Seems like if the buffer size is zero, we should fail earlier or ignore it:
- size() should be computed when we set the buffer data (Data: or FillSize:)
Hence the size should be at least large enough to hold getElementSize, and this change could be reverted.
  if (MaxNestingLevel < 2) {
    std::cerr << "Error: MaxNestingLevel must be >= 2" << std::endl;
    return 1;
  }

  std::vector<uint32_t> Counts;
  std::stringstream CountsStream(CountsPerLevel);
  std::string CountItem;
  while (std::getline(CountsStream, CountItem, ',')) {
    Counts.push_back(std::stoi(CountItem));
  }

  // Index by nesting level. Levels 0 and 1 are intentionally unused.
  std::vector<uint32_t> TestsCountPerLevel(MaxNestingLevel + 1, 0);
  if (Counts.size() == 1) {
    for (uint32_t Level = 2; Level <= MaxNestingLevel; ++Level) {
      TestsCountPerLevel[Level] = Counts[0];
    }
  } else {
    // Expect one count per level from 2 to MaxNestingLevel
    const size_t Expected = MaxNestingLevel - 1;
    if (Counts.size() != Expected) {
      std::cerr << "Error: Expected " << Expected << " counts (for levels 2.."
                << MaxNestingLevel << "), got " << Counts.size() << std::endl;
      return 1;
    }
    for (size_t CountIndex = 0; CountIndex < Counts.size(); ++CountIndex) {
      TestsCountPerLevel[static_cast<uint32_t>(CountIndex) + 2] =
          Counts[CountIndex];
    }
  }
Suggested change:

  std::vector<uint32_t> Counts;
  {
    std::stringstream CountsStream(CountsPerLevel);
    std::string CountItem;
    while (std::getline(CountsStream, CountItem, ',')) {
      Counts.push_back(std::stoi(CountItem));
    }
  }

  if (MaxNestingLevel < 2) {
    std::cerr << "Error: MaxNestingLevel must be >= 2" << std::endl;
    return 1;
  }

  if (Counts.size() != 1 && Counts.size() != MaxNestingLevel - 1) {
    std::cerr << "Error: 1 or max_nesting_level - 1 counts must be given.";
    return 1;
  }

  assert(MaxNestingLevel >= 2);
  assert(Counts.size() == 1 || Counts.size() == MaxNestingLevel - 1);

  std::vector<uint32_t> TestsCountPerLevel;
  if (Counts.size() == 1)
    TestsCountPerLevel.assign(MaxNestingLevel - 1, Counts[0]);
  else
    TestsCountPerLevel.assign(Counts.begin(), Counts.end());

  // Levels 0 and 1 are intentionally unused, so prepend two zero entries.
  TestsCountPerLevel.insert(TestsCountPerLevel.begin(), 2, 0);
      /*ThreadgroupSizeX=*/7,
      /*ThreadgroupSizeY=*/13);
  typedef std::bitset<128> bitset_inv_t;

  inline Ballot waveSizeToMask(uint32_t waveSize, uint32_t /*waveCount*/) {

  }

  protected:
    uint32_t m_bits;
  reconvergence::ReconvergenceTestGenerator TestGenerator(OutputDir,
                                                          TestExpectations);

  // Wave size must be less than or equal to 128 with current ballot
Your bitset-to-ballot implementation works on a uint64_t, meaning the max wave size is 64; otherwise we might have issues.
Part of llvm/llvm-project#136930
The plan is to trigger the pipeline and check all the failing tests to add XFAIL statements. The failing tests should be investigated, added to FailedTests.yaml, and given appropriate TODOs. Until this work is done, I plan to have the test pipeline pass regardless of the reconvergence test results.