Your Question
Hi,
We are trying to train GLM-5 (`DeepseekV32ForCausalLM`) using `--megatron-to-hf-mode bridge` to load the HF safetensors directly. However, it crashes with: `ValueError: Model architecture 'DeepseekV32ForCausalLM' is not yet supported.`
It seems `slime_plugins/mbridge/deepseek_v32.py` registers the bridge via its own `@register_model` decorator, while `model_provider.py` looks up bridges only in Megatron's internal `MegatronModelBridge.REGISTRY`, so the two registries never connect.
We temporarily bypassed this by monkey-patching it in `model_provider.py`:

```python
from slime_plugins.mbridge.deepseek_v32 import DeepseekV32Bridge

MegatronModelBridge.register_bridge(
    source="DeepseekV32ForCausalLM", target=GPTModel
)(DeepseekV32Bridge)
```
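For clarity, here is a minimal, self-contained sketch of the disconnect we believe is happening (all names here are hypothetical stand-ins for illustration, not the actual slime or Megatron code):

```python
# Toy model of the suspected issue: two registries that never talk to each other.
PLUGIN_REGISTRY = {}  # stands in for the registry behind slime_plugins' @register_model
BRIDGE_REGISTRY = {}  # stands in for MegatronModelBridge.REGISTRY

def register_model(arch):
    """Decorator that records a bridge class in the plugin-side registry only."""
    def wrap(cls):
        PLUGIN_REGISTRY[arch] = cls
        return cls
    return wrap

@register_model("DeepseekV32ForCausalLM")
class DeepseekV32Bridge:
    pass

def resolve_bridge(arch):
    """Mimics model_provider.py: it consults only the Megatron-side registry."""
    if arch not in BRIDGE_REGISTRY:
        raise ValueError(f"Model architecture '{arch}' is not yet supported.")
    return BRIDGE_REGISTRY[arch]

# The plugin registered the bridge, but lookup still fails...
try:
    resolve_bridge("DeepseekV32ForCausalLM")
except ValueError as e:
    print(e)

# ...until the entry is copied across, which is what our monkey-patch does.
BRIDGE_REGISTRY["DeepseekV32ForCausalLM"] = PLUGIN_REGISTRY["DeepseekV32ForCausalLM"]
assert resolve_bridge("DeepseekV32ForCausalLM") is DeepseekV32Bridge
```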
Is online bridge mode supposed to be supported for GLM-5, with the only thing missing being this registry wrapper?
Or is the official workflow to use mbridge to convert the HF weights to a Megatron distributed (dist) checkpoint offline, then pass that via `--load`, bypassing AutoBridge altogether?
What I've Tried
mbridge
Environment (if relevant)
- slime version:
- Python version:
- PyTorch version:
- CUDA/ROCm version:
- GPU type and count:
- OS:
Additional Context
No response
Pre-submission Checklist