[main] thd support and GDN packed-seq alignment #4296
DAISY-gh wants to merge 4 commits into NVIDIA:main from
Conversation

This PR has been automatically converted to draft because all PRs must start as drafts. When you are ready for review, click Ready for Review to begin the review process. See the contribution guide for more details.
Pass padding_mask from GPT postprocess into MTP layers and roll it per MTP step so MoE global aux-loss token accounting stays mask-consistent across decoder and MTP paths.
Signed-off-by: yuzhongw <yuzhongw@nvidia.com> Co-authored-by: kunlunl <kunlunl@nvidia.com>
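The per-step mask rolling described in the commit above could look roughly like the sketch below. This is an illustrative assumption, not the actual Megatron-LM code: the helper name `roll_padding_mask` and the convention that `True` marks padded positions are made up for the example.

```python
import torch

def roll_padding_mask(padding_mask: torch.Tensor, step: int) -> torch.Tensor:
    """Shift the padding mask left by `step` positions along the sequence
    dimension, marking the rolled-in tail as padded, so that MoE aux-loss
    token accounting stays aligned with the MTP-shifted labels.

    Convention (assumed for this sketch): True = padded/invalid token.
    """
    rolled = torch.roll(padding_mask, shifts=-step, dims=-1)
    if step > 0:
        # Positions wrapped around from the front of the sequence are invalid.
        rolled[..., -step:] = True
    return rolled

# Example: batch of one sequence of length 4, last token is padding.
mask = torch.tensor([[False, False, False, True]])
print(roll_padding_mask(mask, 1))  # tensor([[False, False,  True,  True]])
```

After one MTP step, the position that previously held the last real token now predicts a label that falls into padding, so it is masked out as well.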
Restore partition_dim attributes and assert formatting in gated_delta_net.py to match the original PR2644 behavior exactly.
Force-pushed: e993b03 to ec7ee79
Hi @DAISY-gh, thanks for your work. We already have PR #2645 for GDN THD on main. Would you mind if I close this PR and cherry-pick your fix commits into #2645? We will also add some UTs to cover your changes and help move the review forward. Thanks!
OK. Is there an ETA for main to merge your PR together with my fix (mainly the MTP changes, plus propagating the padding mask through the model)? Thanks, and I'll close this PR.
Yuzhong will cherry-pick the fix and merge #2645 later. |
We do not have an exact ETA because it is somewhat out of my control, but I hope it can be merged within 1 to 2 weeks.