[megatron] enable bucketed weight sync for non-colocated nccl weight sync in megatron#1324
Merged
SumanthRH merged 6 commits into NovaSky-AI:main on Mar 14, 2026
Conversation
Enable packed broadcast for non-colocated Megatron weight sync
Bucketing was previously only enabled for the CUDA IPC strategy (colocated mode). This extends it to the broadcast strategy (non-colocated mode), packing all tensors in each bucket into a single contiguous buffer and broadcasting it in one NCCL operation — matching how CUDA IPC already works. This reduces both per-tensor NCCL overhead and HTTP round-trips, which matters most for MoE models with many small expert parameters.
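The pack-and-unpack flow described above can be sketched as follows. This is a minimal illustration using NumPy arrays in place of CUDA tensors; the helper names (pack_bucket, unpack_bucket) are hypothetical, and the actual PR performs the transfer with a single NCCL broadcast per bucket rather than the in-process roundtrip shown here.

```python
import numpy as np

def pack_bucket(tensors):
    """Flatten each tensor in a bucket and concatenate into one
    contiguous buffer. Returns the buffer plus the per-tensor shapes
    and element counts the receiver needs to unpack (the role of the
    `sizes` field added to BroadcastWeightUpdateRequest)."""
    shapes = [t.shape for t in tensors]
    sizes = [t.size for t in tensors]
    buffer = np.concatenate([t.ravel() for t in tensors])
    return buffer, shapes, sizes

def unpack_bucket(buffer, shapes, sizes):
    """Split a received buffer back into individual tensors by
    walking the recorded sizes and restoring each original shape."""
    tensors, offset = [], 0
    for shape, size in zip(shapes, sizes):
        tensors.append(buffer[offset:offset + size].reshape(shape))
        offset += size
    return tensors

# Roundtrip: sender packs, one broadcast would ship `buf`, receiver unpacks.
originals = [np.arange(6, dtype=np.float32).reshape(2, 3),
             np.ones(4, dtype=np.float32)]
buf, shapes, sizes = pack_bucket(originals)
restored = unpack_bucket(buf, shapes, sizes)
assert all(np.array_equal(a, b) for a, b in zip(originals, restored))
```

The key property is that only one collective operation is issued per bucket, so per-tensor launch overhead is amortized across all the (often small) expert parameters in the bucket.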
Changes
- broadcast_strategy.py: Add a sizes field to BroadcastWeightUpdateRequest; the sender packs tensors into one buffer and broadcasts once per bucket; the receiver unpacks using sizes
- megatron_worker.py: Always enable bucketing, removing the CudaIpcTransferStrategy-only guard
- test_megatron_worker.py: Add a non_colocated_moe test entry for Moonlight-16B-A3B-Instruct

Results
Partial weight sync on 4xL40S (4 layers only)
Before: (timing screenshot)

After: (timing screenshot)
Full weight sync on 8xH100 (4 inf, 4 train) with Moonlight-16B-A3B
Before: (timing screenshot)

After: (timing screenshot)
Test