[main] thd support and GDN packed-seq alignment #4296
DAISY-gh wants to merge 4 commits into NVIDIA:main from
Conversation

This PR has been automatically converted to draft because all PRs must start as drafts. When you are ready for review, click Ready for Review to begin the review process. See the contribution guide for more details.
Pass padding_mask from GPT postprocess into MTP layers and roll it per MTP step so MoE global aux-loss token accounting stays mask-consistent across decoder and MTP paths.
Signed-off-by: yuzhongw <yuzhongw@nvidia.com> Co-authored-by: kunlunl <kunlunl@nvidia.com>
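The per-step mask rolling described in the commit above could look roughly like the sketch below. This is an illustrative assumption, not the actual Megatron-LM code: the helper name `roll_padding_mask` and the convention that `True` marks padded positions are made up for the example.

```python
import torch

def roll_padding_mask(padding_mask: torch.Tensor, step: int) -> torch.Tensor:
    """Shift the padding mask left by `step` positions along the sequence
    dimension, marking the rolled-in tail as padded, so that MoE aux-loss
    token accounting stays aligned with the MTP-shifted labels.

    Convention (assumed for this sketch): True = padded/invalid token.
    """
    rolled = torch.roll(padding_mask, shifts=-step, dims=-1)
    if step > 0:
        # Positions wrapped around from the front of the sequence are invalid.
        rolled[..., -step:] = True
    return rolled

# Example: batch of one sequence of length 4, last token is padding.
mask = torch.tensor([[False, False, False, True]])
print(roll_padding_mask(mask, 1))  # tensor([[False, False,  True,  True]])
```

After one MTP step, the position that previously held the last real token now predicts a label that falls into padding, so it is masked out as well.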
Restore partition_dim attributes and assert formatting in gated_delta_net.py to match the original PR2644 behavior exactly.
Force-pushed: e993b03 to ec7ee79
Hi @DAISY-gh, thanks for your work. We already have PR #2645 for GDN THD on main. Would you mind if I close this PR and cherry-pick your fix commits into #2645? We will also add some UTs to cover your changes and help move the review forward. Thanks!
OK. Is there an ETA for main to merge your PR together with my fix (mainly the MTP changes, plus propagating the padding mask through the model)? Thanks, and I'll close this PR.
Yuzhong will cherry-pick the fix and merge #2645 later. |
We do not have an exact ETA because it is somewhat out of my control, but I hope it can be merged within 1 to 2 weeks.