feat: add npu patch for qwen3-vl-8b grpo & ppo#1750

Open
cjy0x wants to merge 1 commit into THUDM:main from cjy0x:ascend_patch

Conversation


@cjy0x cjy0x commented Mar 23, 2026

PR Description

Overview

This PR includes a set of patches to adapt the training stack (megatron-bridge, megatron, mindspeed, sglang, slime) for NPU (Ascend) compatibility, along with several bug fixes.


megatron-bridge.patch

  • param_mapping.py: Added compatibility handling for the extra prefix that mindspeed's patch on the TE module introduces into parameter names.
  • transformer_block.py: Fixed a recomputation (activation checkpointing) bug in the Qwen3-VL transformer block.
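As an illustration of the prefix handling, the compatibility shim might look like the following sketch (the prefix string and helper name are assumptions, not the actual patch code):

```python
# Hypothetical sketch of the param_mapping.py fix: strip the extra module
# prefix a patch layer may prepend so downstream name mapping still matches.
MINDSPEED_PREFIX = "module."  # assumed prefix; the real one may differ

def normalize_param_name(name: str, prefix: str = MINDSPEED_PREFIX) -> str:
    """Remove the extra prefix if present, otherwise return the name as-is."""
    return name[len(prefix):] if name.startswith(prefix) else name
```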

megatron.patch

  • @jit_fuser removal: Removed all @jit_fuser decorators, as torch_npu does not support the corresponding fused operators, which causes errors on certain versions.
  • CUDA → NPU translation: Applied manual cuda → npu replacements where torch_npu's automatic translation failed to take effect.
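The effect of the @jit_fuser removal can be sketched as a passthrough decorator (names here are assumptions; the patch simply deletes the decorators rather than redefining them):

```python
def jit_fuser(func):
    """Sketch: return the function unchanged, mirroring the effect of
    removing @jit_fuser on NPU, where the fused kernels are unavailable.
    On CUDA, megatron would normally wrap func with a JIT fuser here."""
    return func

@jit_fuser
def bias_gelu_approx(bias, y):
    # example fused-candidate function; runs as plain Python after the patch
    return (bias + y) * 0.5
```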

mindspeed.patch

  • fused_rope.py: Added argument alignment shim to handle the version gap between mindspeed and upstream megatron.
  • megatron_adaptor.py: Added an args format conversion layer, as slime passes args in a format incompatible with mindspeed's expected structure.
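The args format conversion layer could, for instance, be a thin adapter like this sketch (the function name and field handling are assumptions; the real conversion in megatron_adaptor.py may be more involved):

```python
from argparse import Namespace

def to_mindspeed_args(args):
    """Convert dict-style args (as slime may pass them) into the
    attribute-style Namespace that mindspeed's code paths expect."""
    if isinstance(args, dict):
        return Namespace(**args)
    return args  # already in the expected attribute-style format
```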

sglang.patch


slime.patch

  • NPU detection: Added an is_npu() utility in common.py to gate NPU-specific logic throughout the codebase.
  • Ray resource allocation: Replaced num_gpus=N in ray.remote / .options() calls with resources={"NPU": N} (or "GPU" on non-NPU), as ray.remote(num_gpus=...) is not supported on NPU. Affects actor_group.py, placement_group.py, and rollout.py.
  • Ray GPU ID API: Replaced unsupported ray.get_gpu_ids() with ray.get_runtime_context().get_accelerator_ids()["NPU"], with additional int() casting since the returned values are strings. Affects placement_group.py.
  • Manual device/backend replacements: Applied manual corrections for cases torch_npu cannot auto-translate: nccl → hccl (actor.py, update_weight_from_distributed.py), cuda → npu (memory_utils.py), and CUDA_VISIBLE_DEVICES → ASCEND_RT_VISIBLE_DEVICES (sglang_engine.py).
  • MindSpeed plugin integration: Added the necessary initialization code to load the mindspeed plugin when using megatron as the training backend. Affects train.py and actor.py.
  • model_provider argument injection: Injected mindspeed-specific arguments into the megatron_bridge model provider, as they are otherwise not recognized during model loading. Also injects recomputation-related config here, since setting it at the script level has no effect. This will be simplified once mindspeed handles injection natively. Affects model_provider.py.
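A minimal version of the is_npu() gate might check for the torch_npu package (this is a sketch; the actual implementation in common.py may differ):

```python
import importlib.util

def is_npu() -> bool:
    """True when the torch_npu package is importable, i.e. an Ascend build."""
    return importlib.util.find_spec("torch_npu") is not None
```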
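The Ray resource-allocation change above can be sketched as a small helper (the helper name is an assumption; the patch inlines the equivalent logic at each call site):

```python
def accelerator_options(n: int, on_npu: bool) -> dict:
    """Build keyword arguments for ray.remote / .options().
    ray.remote(num_gpus=...) is not supported on NPU, so a custom
    resource key ("NPU") is used there instead."""
    if on_npu:
        return {"resources": {"NPU": n}}
    return {"num_gpus": n}
```

A call site would then read, e.g., `actor_cls.options(**accelerator_options(1, is_npu()))`.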
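The int() casting mentioned for the accelerator-ID replacement might look like this sketch (the wrapper name is hypothetical):

```python
def npu_device_ids(raw_ids) -> list:
    """ray.get_runtime_context().get_accelerator_ids()["NPU"] returns
    device IDs as strings; cast them to int so they can be used as
    device indices."""
    return [int(i) for i in raw_ids]
```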

grpo raw_reward curve:
[image]

@cjy0x cjy0x changed the title add npu patch for qwen3-vl-8b grpo & ppo based on v0.2.2 add npu patch for qwen3-vl-8b grpo & ppo based on tag v0.2.2 Mar 23, 2026
@cjy0x cjy0x changed the title add npu patch for qwen3-vl-8b grpo & ppo based on tag v0.2.2 feat: add npu patch for qwen3-vl-8b grpo & ppo based on tag v0.2.2 Mar 23, 2026
@cjy0x cjy0x force-pushed the ascend_patch branch 3 times, most recently from 23f0202 to 995ed8b Compare March 26, 2026 08:59
@cjy0x cjy0x changed the title feat: add npu patch for qwen3-vl-8b grpo & ppo based on tag v0.2.2 feat: add npu patch for qwen3-vl-8b grpo & ppo Mar 26, 2026
Co-authored-by: shiyuan680 <917935075@qq.com>
Co-authored-by: PengchengShi00 <spc117369@gmail.com>
Signed-off-by: cjy0x <isjunyi.chen@gmail.com>