Improved IB host-no-atomic mode #753

Open
chhwang wants to merge 12 commits into main from chhwang/fix-ib-no-atomic

@chhwang (Contributor) commented Feb 24, 2026

Fixes a potential memory inconsistency in IB host-no-atomic mode and reduces latency overhead by introducing GDRCopy.

  • IB no-atomic: an 8-byte RDMA write-with-imm carries the full 64-bit token to the remote signal GPU buffer. The remote host reads the token from that buffer before updating the inbound token, which enforces strict data-flag ordering.
  • GDRCopy: the recv thread reads the token via BAR1 (CUDA) or uncached GPU memory (ROCm), so cudaMemcpyAsync and a CUDA stream are no longer needed.
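
The ordering described in the two bullets can be sketched as a host-only simulation. This is a rough illustration under stated assumptions, not the code in this PR: `RemoteSide`, `senderPost`, `recvPoll`, and `consumerReady` are hypothetical names, plain `std::atomic` variables stand in for the signal GPU buffer and the write-with-imm completion, and tokens are assumed to be nonzero and monotonically increasing. No InfiniBand or GDRCopy calls are made.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All names below are hypothetical stand-ins, not identifiers from the PR.
struct RemoteSide {
  std::vector<std::uint8_t> data;              // stands in for the destination data buffer
  std::atomic<std::uint64_t> signalSlot{0};    // stands in for the 8-byte signal GPU buffer
  std::atomic<std::uint64_t> completions{0};   // stands in for write-with-imm completions
  std::atomic<std::uint64_t> inboundToken{0};  // token that the consumer polls
  std::uint64_t handled = 0;                   // completions already processed by the recv thread
};

// Sender side: write the data first, then the 8-byte token payload,
// then make the completion visible (release orders the earlier writes).
void senderPost(RemoteSide& r, std::uint64_t token, std::uint8_t fill) {
  assert(token != 0);  // assumption: 0 means "nothing signaled yet"
  std::fill(r.data.begin(), r.data.end(), fill);
  r.signalSlot.store(token, std::memory_order_relaxed);
  r.completions.fetch_add(1, std::memory_order_release);
}

// Recv thread: on a new completion, read the token from the signal slot
// (in the PR this read goes through BAR1 or uncached GPU memory) and only
// then publish it as the inbound token. Returns true if a token was published.
bool recvPoll(RemoteSide& r) {
  if (r.completions.load(std::memory_order_acquire) == r.handled) return false;
  ++r.handled;
  const std::uint64_t token = r.signalSlot.load(std::memory_order_relaxed);
  r.inboundToken.store(token, std::memory_order_release);
  return true;
}

// Consumer: the data may be read only after the inbound token has advanced.
bool consumerReady(const RemoteSide& r, std::uint64_t expected) {
  return r.inboundToken.load(std::memory_order_acquire) >= expected;
}
```

In this sketch, calling `senderPost(r, 0x100000001ULL, 0xAB)` alone leaves `consumerReady(r, 0x100000001ULL)` false; it becomes true only after `recvPoll(r)` has read the token back from the signal slot. The example token is deliberately wider than 32 bits, since InfiniBand immediate data is a 32-bit field and cannot hold a full 64-bit token on its own.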

@chhwang requested a review from a team on February 25, 2026, 04:42