[main] thd support and GDN packed-seq alignment #4296

Closed
DAISY-gh wants to merge 4 commits into NVIDIA:main from DAISY-gh:thd-support-main

@DAISY-gh
Contributor

Summary

Test plan

  • Run targeted unit tests for GDN/packed-sequence and MTP paths
  • Run relevant CI checks in NVIDIA/Megatron-LM
  • Validate no regression for non-THD path

@DAISY-gh DAISY-gh requested review from a team as code owners April 14, 2026 12:56
@copy-pr-bot

copy-pr-bot bot commented Apr 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft April 14, 2026 12:57
@github-actions
Contributor

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

DAISY-gh and others added 4 commits April 14, 2026 20:58
Pass padding_mask from GPT postprocess into MTP layers and roll it per MTP step so MoE global aux-loss token accounting stays mask-consistent across decoder and MTP paths.
Signed-off-by: yuzhongw <yuzhongw@nvidia.com>
Co-authored-by: kunlunl <kunlunl@nvidia.com>
Restore partition_dim attributes and assert formatting in gated_delta_net.py to match the original PR2644 behavior exactly.
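
The first commit above describes rolling the padding mask once per MTP step so that token accounting stays consistent with the shifted inputs. A minimal sketch of that idea follows, in plain Python rather than the PR's actual tensor code. The function names are hypothetical, the convention that `True` marks a padded position is an assumption, and the sketch ignores packed-sequence (THD) boundaries, which a real implementation would have to respect.

```python
from typing import List


def roll_left(values: List[bool], fill: bool) -> List[bool]:
    """Shift a 1-D sequence left by one slot and fill the vacated last slot."""
    if not values:
        return []
    return values[1:] + [fill]


def padding_masks_per_mtp_step(
    padding_mask: List[bool], num_mtp_steps: int
) -> List[List[bool]]:
    """Return one mask per MTP step, each rolled one more position to the left.

    Assumption: True means "this position is padding". The slot that opens up
    at the end of the sequence has no real token behind it, so it is filled
    with True (treated as padding).
    """
    masks = []
    current = list(padding_mask)
    for _ in range(num_mtp_steps):
        current = roll_left(current, fill=True)
        masks.append(current)
    return masks


def count_valid_tokens(mask: List[bool]) -> int:
    """Count non-padded positions, e.g. for normalizing an auxiliary loss."""
    return sum(1 for is_pad in mask if not is_pad)


if __name__ == "__main__":
    decoder_mask = [False, False, False, True]  # three real tokens, one pad
    for step, mask in enumerate(padding_masks_per_mtp_step(decoder_mask, 2), 1):
        print(step, mask, count_valid_tokens(mask))
```

Under these assumptions each MTP step sees one fewer valid token than the step before it, which is why a mask that is not rolled would overcount tokens in the MTP path.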
@DAISY-gh DAISY-gh marked this pull request as ready for review April 14, 2026 13:01
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team April 14, 2026 13:01
@yuzhongw-nvidia
Contributor

yuzhongw-nvidia commented Apr 15, 2026

Hi @DAISY-gh, thanks for your work. We already have a PR, #2645, for GDN THD on main. Would you mind if I close this PR and cherry-pick your fix commits into #2645? Additionally, we will add some UTs to cover your changes and help move the review forward. Thanks!

@DAISY-gh
Contributor Author

> Hi @DAISY-gh, thanks for your work. We already have a PR, #2645, for GDN THD on main. Would you mind if I close this PR and cherry-pick your fix commits into #2645? Additionally, we will add some UTs to cover your changes and help move the review forward. Thanks!

OK, is there an ETA for merging your PR and my fix into main (mainly the MTP changes, plus propagating the padding mask to the model)? Thanks, and I'll close this one.

@DAISY-gh
Contributor Author

Yuzhong will cherry-pick the fix and merge #2645 later.

@DAISY-gh DAISY-gh closed this Apr 15, 2026
@yuzhongw-nvidia
Contributor

> OK, is there an ETA for merging your PR and my fix into main (mainly the MTP changes, plus propagating the padding mask to the model)? Thanks, and I'll close this one.

We do not have an exact ETA because it is somewhat out of my control, but I hope it can be merged within 1 to 2 weeks.

3 participants