Refactor megatron to mcore_bridge #134
tastelikefeet wants to merge 17 commits into modelscope:main
Conversation
Code Review
This pull request refactors the Megatron-Core integration by offloading model configuration, creation, and weight loading logic to the mcore_bridge dependency, which allows for the removal of significant internal boilerplate code. However, the review highlights several critical issues introduced during the refactoring: the removal of the _BASE_LAYER_SUFFIXES constant and the self.hf_config attribute will lead to runtime errors since they are still referenced in the codebase. Furthermore, the send_weights method contains a NameError due to the use of an undefined args variable, which should be replaced with values from the strategy configuration.
```diff
  args = get_args()
- org_vocab_size = getattr(self.hf_config, 'vocab_size', args.padded_vocab_size)
+ org_vocab_size = getattr(self.hf_config, 'vocab_size', self.strategy.config.padded_vocab_size)
  _padded_vocab_size = args.padded_vocab_size
```
The args variable is used here but is no longer defined in this scope after the refactoring. This will cause a NameError. The padded_vocab_size should be retrieved from self.strategy.config.
```diff
- _padded_vocab_size = args.padded_vocab_size
+ _padded_vocab_size = self.strategy.config.padded_vocab_size
```
/gemini review
Code Review
This pull request refactors the Megatron-Core integration by delegating model configuration and creation to a new MegatronStrategy class and the mcore_bridge library, while removing the internal args.py and the TorchSampler module. The review identifies a critical AttributeError in multi_lora.py where get_target_modules is called on the wrong class. Additionally, the review highlights that the refactored _add_base_layer_suffix is now too generic and may incorrectly modify non-LoRA layer names, and that MultiLoraMegatron initializes parameter configurations without an optimizer, which could prevent the setup of training-specific features like gradient reduction overlap.
```python
def _add_base_layer_suffix(params):
    _BASE_LAYER_SUFFIXES = ['weight', 'bias']
    for name, param in params:
        for suffix in _BASE_LAYER_SUFFIXES:
            if name.endswith(suffix):
                attr = suffix.rsplit('.', 1)[-1]  # 'weight' or 'bias'
                name = f'{name[:-len(attr)]}base_layer.{attr}'
                break
        yield name, param
```
The implementation of _add_base_layer_suffix has become overly generic. It now inserts a base_layer. segment before the final weight or bias attribute of any parameter name, whereas the previous implementation used a specific list of suffixes for layers typically targeted by LoRA. This change may incorrectly rewrite parameter names of non-LoRA layers (e.g., model.norm.weight), which could cause weight loading failures in downstream systems like vLLM when LoRA is enabled. It is recommended to revert to a more specific list of target suffixes so that only LoRA-adapted layers are modified.
PR type
PR information
Write the detailed information belonging to this PR.
Experiment results
Paste your experiment results here (if needed).