
[Question] Must GLM-5 be converted to dist format offline? (AutoBridge registry mismatch) #1787

@ygxw0909


Your Question

Hi,

We are trying to train GLM-5 (architecture DeepseekV32ForCausalLM) with --megatron-to-hf-mode bridge so that the HF safetensors are loaded directly. However, it crashes with: ValueError: Model architecture 'DeepseekV32ForCausalLM' is not yet supported.
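To confirm which architecture string the loader will see, one option is to read the `architectures` field that Hugging Face checkpoints conventionally store in config.json. This is a minimal sketch using only the standard library; the helper name `read_hf_architectures` is hypothetical and is not part of slime, mbridge, or Megatron.

```python
import json
import sys
from pathlib import Path


def read_hf_architectures(checkpoint_dir):
    """Return the architecture names declared in a checkpoint's config.json.

    Hypothetical helper for debugging; returns an empty list when the
    "architectures" key is absent or null.
    """
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return list(config.get("architectures") or [])


if __name__ == "__main__":
    # Usage: python check_arch.py /path/to/hf_checkpoint
    if len(sys.argv) > 1:
        print(read_hf_architectures(sys.argv[1]))
```

If the printed name is not one the bridge registry knows about, the "not yet supported" error is expected regardless of how the weights themselves are stored.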

It seems that slime_plugins/mbridge/deepseek_v32.py registers the bridge through its own @register_model decorator, while model_provider.py relies strictly on Megatron's internal MegatronModelBridge.REGISTRY. The two registries are not connected, so the bridge is never found.

We temporarily worked around this by monkey-patching model_provider.py:

from slime_plugins.mbridge.deepseek_v32 import DeepseekV32Bridge

MegatronModelBridge.register_bridge(
    source="DeepseekV32ForCausalLM", target=GPTModel
)(DeepseekV32Bridge)
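The mismatch and the workaround can be reproduced in miniature. The sketch below is a toy model only: `make_registry`, `ToyBridge`, and `provider_supports` are hypothetical names, and nothing here calls the real slime, mbridge, or Megatron APIs. It shows that a class registered in one registry stays invisible to a lookup in another until it is registered there as well, using the same `register(...)(cls)` call form as the patch.

```python
def make_registry():
    """Create an independent registry with its own decorator and lookup."""
    table = {}

    def register(source):
        # Decorator factory: register(name)(cls) stores cls under name.
        def decorator(cls):
            table[source] = cls
            return cls
        return decorator

    def lookup(source):
        if source not in table:
            raise ValueError(f"Model architecture '{source}' is not yet supported.")
        return table[source]

    return register, lookup


# Two unrelated registries, standing in for the plugin's and the provider's.
plugin_register, plugin_lookup = make_registry()
provider_register, provider_lookup = make_registry()


@plugin_register("DeepseekV32ForCausalLM")
class ToyBridge:
    """Placeholder for a bridge class; it has no behavior of its own."""


def provider_supports(source):
    """Report whether the provider-side registry can resolve the name."""
    try:
        provider_lookup(source)
        return True
    except ValueError:
        return False


# Registered only on the plugin side, so the provider cannot see it.
supported_before = provider_supports("DeepseekV32ForCausalLM")

# Workaround: register the same class in the provider's registry too.
provider_register("DeepseekV32ForCausalLM")(ToyBridge)
supported_after = provider_supports("DeepseekV32ForCausalLM")
```

In this toy, `supported_before` is False and `supported_after` is True. Whether the real registries behave the same way is exactly what the question below asks.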

Is online bridge mode meant to be supported for GLM-5, with the only gap being a missing registry wrapper?
Or is the official workflow to use mbridge to convert the HF weights to a Megatron distributed (dist) checkpoint offline, and then pass that checkpoint via --load so that AutoBridge is bypassed altogether?

What I've Tried

mbridge

