Refactor megatron to mcore_bridge #134
tastelikefeet wants to merge 17 commits into modelscope:main
Conversation
Code Review
This pull request refactors the Megatron-Core integration by offloading model configuration, creation, and weight loading logic to the mcore_bridge dependency, which allows for the removal of significant internal boilerplate code. However, the review highlights several critical issues introduced during the refactoring: the removal of the _BASE_LAYER_SUFFIXES constant and the self.hf_config attribute will lead to runtime errors since they are still referenced in the codebase. Furthermore, the send_weights method contains a NameError due to the use of an undefined args variable, which should be replaced with values from the strategy configuration.
```diff
  args = get_args()
- org_vocab_size = getattr(self.hf_config, 'vocab_size', args.padded_vocab_size)
+ org_vocab_size = getattr(self.hf_config, 'vocab_size', self.strategy.config.padded_vocab_size)
  _padded_vocab_size = args.padded_vocab_size
```
The args variable is used here but is no longer defined in this scope after the refactoring. This will cause a NameError. The padded_vocab_size should be retrieved from self.strategy.config.
```diff
- _padded_vocab_size = args.padded_vocab_size
+ _padded_vocab_size = self.strategy.config.padded_vocab_size
```
/gemini review
Code Review
This pull request refactors the Megatron-Core integration by delegating model configuration and creation to a new MegatronStrategy class and the mcore_bridge library, while removing the internal args.py and the TorchSampler module. The review identifies a critical AttributeError in multi_lora.py where get_target_modules is called on the wrong class. Additionally, the review highlights that the refactored _add_base_layer_suffix is now too generic and may incorrectly modify non-LoRA layer names, and that MultiLoraMegatron initializes parameter configurations without an optimizer, which could prevent the setup of training-specific features like gradient reduction overlap.
```python
def _add_base_layer_suffix(params):
    _BASE_LAYER_SUFFIXES = ['weight', 'bias']
    for name, param in params:
        for suffix in _BASE_LAYER_SUFFIXES:
            if name.endswith(suffix):
                attr = suffix.rsplit('.', 1)[-1]  # 'weight' or 'bias'
                name = f'{name[:-len(attr)]}base_layer.{attr}'
                break
        yield name, param
```
The implementation of _add_base_layer_suffix has become overly generic. It now inserts a base_layer. segment before the final weight or bias attribute of any parameter name, whereas the previous implementation used a specific list of suffixes for layers typically targeted by LoRA. This change may incorrectly rewrite parameter names of non-LoRA layers (e.g., model.norm.weight), which could cause weight loading failures in downstream systems like vLLM when LoRA is enabled. It is recommended to revert to a more specific list of target suffixes so that only LoRA-adapted layers are modified.
PR type
PR information
Write the detailed information belonging to this PR.
Experiment results
Paste your experiment results here (if needed).