feat: add npu patch for qwen3-vl-8b grpo & ppo#1750

Open
cjy0x wants to merge 1 commit into THUDM:main from cjy0x:ascend_patch

Conversation


@cjy0x cjy0x commented Mar 23, 2026

PR Description

Overview

This PR includes a set of patches to adapt the training stack (megatron-bridge, megatron, mindspeed, sglang, slime) for NPU (Ascend) compatibility, along with several bug fixes.


megatron-bridge.patch

  • param_mapping.py: Added compatibility handling for the extra prefix that mindspeed's patch on the TE module introduces into parameter names.
  • transformer_block.py: Fixed a recomputation (activation checkpointing) bug in the Qwen3-VL transformer block.
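As an illustration of the prefix handling, the compatibility shim might look like the following sketch (the prefix string and helper name are assumptions, not the actual patch code):

```python
# Hypothetical sketch of the param_mapping.py fix: strip the extra module
# prefix a patch layer may prepend so downstream name mapping still matches.
MINDSPEED_PREFIX = "module."  # assumed prefix; the real one may differ

def normalize_param_name(name: str, prefix: str = MINDSPEED_PREFIX) -> str:
    """Remove the extra prefix if present, otherwise return the name as-is."""
    return name[len(prefix):] if name.startswith(prefix) else name
```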

megatron.patch

  • @jit_fuser removal: Removed all @jit_fuser decorators, as torch_npu does not support the corresponding fused operators, which causes errors on certain versions.
  • CUDA → NPU translation: Applied manual cuda → npu replacements where torch_npu's automatic translation failed to take effect.
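The effect of the @jit_fuser removal can be sketched as a passthrough decorator (names here are assumptions; the patch simply deletes the decorators rather than redefining them):

```python
def jit_fuser(func):
    """Sketch: return the function unchanged, mirroring the effect of
    removing @jit_fuser on NPU, where the fused kernels are unavailable.
    On CUDA, megatron would normally wrap func with a JIT fuser here."""
    return func

@jit_fuser
def bias_gelu_approx(bias, y):
    # example fused-candidate function; runs as plain Python after the patch
    return (bias + y) * 0.5
```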

mindspeed.patch

  • fused_rope.py: Added argument alignment shim to handle the version gap between mindspeed and upstream megatron.
  • megatron_adaptor.py: Added an args format conversion layer, as slime passes args in a format incompatible with mindspeed's expected structure.
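The args format conversion layer could, for instance, be a thin adapter like this sketch (the function name and field handling are assumptions; the real conversion in megatron_adaptor.py may be more involved):

```python
from argparse import Namespace

def to_mindspeed_args(args):
    """Convert dict-style args (as slime may pass them) into the
    attribute-style Namespace that mindspeed's code paths expect."""
    if isinstance(args, dict):
        return Namespace(**args)
    return args  # already in the expected attribute-style format
```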

sglang.patch


slime.patch

  • NPU detection: Added an is_npu() utility in common.py to gate NPU-specific logic throughout the codebase.
  • Ray resource allocation: Replaced num_gpus=N in ray.remote / .options() calls with resources={"NPU": N} (or "GPU" on non-NPU), as ray.remote(num_gpus=...) is not supported on NPU. Affects actor_group.py, placement_group.py, and rollout.py.
  • Ray GPU ID API: Replaced unsupported ray.get_gpu_ids() with ray.get_runtime_context().get_accelerator_ids()["NPU"], with additional int() casting since the returned values are strings. Affects placement_group.py.
  • Manual device/backend replacements: Applied manual corrections for cases torch_npu cannot auto-translate: nccl → hccl (actor.py, update_weight_from_distributed.py), cuda → npu (memory_utils.py), and CUDA_VISIBLE_DEVICES → ASCEND_RT_VISIBLE_DEVICES (sglang_engine.py).
  • MindSpeed plugin integration: Added the necessary initialization code to load the mindspeed plugin when using megatron as the training backend. Affects train.py and actor.py.
  • model_provider argument injection: Injected mindspeed-specific arguments into the megatron_bridge model provider, as they are otherwise not recognized during model loading. Also injects recomputation-related config here, since setting it at the script level has no effect. This will be simplified once mindspeed handles injection natively. Affects model_provider.py.
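A minimal version of the is_npu() gate might check for the torch_npu package (this is a sketch; the actual implementation in common.py may differ):

```python
import importlib.util

def is_npu() -> bool:
    """True when the torch_npu package is importable, i.e. an Ascend build."""
    return importlib.util.find_spec("torch_npu") is not None
```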
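The Ray resource-allocation change above can be sketched as a small helper (the helper name is an assumption; the patch inlines the equivalent logic at each call site):

```python
def accelerator_options(n: int, on_npu: bool) -> dict:
    """Build keyword arguments for ray.remote / .options().
    ray.remote(num_gpus=...) is not supported on NPU, so a custom
    resource key ("NPU") is used there instead."""
    if on_npu:
        return {"resources": {"NPU": n}}
    return {"num_gpus": n}
```

A call site would then read, e.g., `actor_cls.options(**accelerator_options(1, is_npu()))`.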
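The int() casting mentioned for the accelerator-ID replacement might look like this sketch (the wrapper name is hypothetical):

```python
def npu_device_ids(raw_ids) -> list:
    """ray.get_runtime_context().get_accelerator_ids()["NPU"] returns
    device IDs as strings; cast them to int so they can be used as
    device indices."""
    return [int(i) for i in raw_ids]
```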

grpo raw_reward curve:
[image]

@cjy0x cjy0x changed the title add npu patch for qwen3-vl-8b grpo & ppo based on v0.2.2 add npu patch for qwen3-vl-8b grpo & ppo based on tag v0.2.2 Mar 23, 2026
@cjy0x cjy0x changed the title add npu patch for qwen3-vl-8b grpo & ppo based on tag v0.2.2 feat: add npu patch for qwen3-vl-8b grpo & ppo based on tag v0.2.2 Mar 23, 2026
@cjy0x cjy0x force-pushed the ascend_patch branch 3 times, most recently from 23f0202 to 995ed8b Compare March 26, 2026 08:59
@cjy0x cjy0x changed the title feat: add npu patch for qwen3-vl-8b grpo & ppo based on tag v0.2.2 feat: add npu patch for qwen3-vl-8b grpo & ppo Mar 26, 2026
Co-authored-by: shiyuan680 <917935075@qq.com>
Co-authored-by: PengchengShi00 <spc117369@gmail.com>
Signed-off-by: cjy0x <isjunyi.chen@gmail.com>