Mdp#1849
Signed-off-by: Anna Shors <ashors@nvidia.com>
terrykong left a comment
i know this is a draft and things are in flux, but leaving some feedback from a first pass. this definitely needs more passes.
cc @ashors1 @ananthsub @yaoyu-33 for design of mcore inf in the policy worker
my opinion is we should move the inference code into a mixin so that the regular training policy methods are clearly separated from the generation ones, because there are now many of them. after separating, the MegatronInferenceMixin can be added as one of the parent classes (multiple inheritance). right now the megatron policy worker class has ballooned significantly and is very intimidating.
one piece of general feedback: i would love to see more of this boilerplate pushed into megatron inference itself. the amount of code change needed seems like a lot, and we have to set state that i would imagine the mcore inference APIs could handle (like the local/none thing)
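roughly what i have in mind (a sketch only; all class names here are illustrative, not existing classes):

```python
class MegatronInferenceMixin:
    """Generation-side state and methods, kept out of the training path."""

    def init_inference_engine(self) -> None: ...
    def generate(self, prompts: list[str]) -> list[str]: ...
    def pause_inference_engine(self) -> None: ...


class PolicyWorkerBase:
    """Stand-in for the existing training-side worker methods."""

    def train(self, batch: dict) -> None: ...


class MegatronPolicyWorker(MegatronInferenceMixin, PolicyWorkerBase):
    """Multiple inheritance keeps the two concerns clearly separated."""
```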
self.dynamic_inference_engine = None
self.inference_client = None
self.inference_context = None
self.inference_wrapped_model = None
self._inference_engine_initialized = False
self._inference_engine_paused = True  # Start paused since we begin with training
self._inference_loop = None  # Event loop for inference operations
self._inference_thread = None  # Thread running the event loop
high level q: why does mcore inference require so much bookkeeping by the application?
Hmm, yeah, we definitely have to do a better job of abstracting all of this out. There is a task tracking that. Mcore inference with the coordinator is what requires all of this; plain mcore inference is a little better. (We need to do a lot more here, especially because we are using Ray on top of everything and running colocated, which means we need to suspend/resume the engine, etc.) I agree we should make this simpler. Will definitely push for this soon.
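as a first step, a single state object could replace the loose attributes (a sketch only; none of these names exist yet):

```python
from dataclasses import dataclass


@dataclass
class McoreInferenceState:
    # Single home for the bookkeeping currently spread across the worker.
    engine: object | None = None
    client: object | None = None
    context: object | None = None
    wrapped_model: object | None = None
    initialized: bool = False
    paused: bool = True  # colocated setup starts in training mode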
model_cfg = cfg_from_pretrained.model
cfg_from_pretrained.logger = LoggerConfig()
# Ensure make_vocab_size_divisible_by has a reasonable default (128 is standard)
@ananthsub @yaoyu-33 can you comment on this?
i feel this is potentially an unsafe thing to default to, especially since we don't currently have a way to chop off the padded vocab during HF export.
this is good to have in general, though, because i know vocab parallel will have issues without it; it's probably just not a good thing to enable globally yet.
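for concreteness, the export-side fix would be something like this (a sketch only; assumes the padded vocab sits on dim 0 of the embedding / output-layer weights):

```python
import torch


def trim_padded_vocab(weight: torch.Tensor, true_vocab_size: int) -> torch.Tensor:
    # Drop the rows added by make_vocab_size_divisible_by padding before HF export.
    assert weight.size(0) >= true_vocab_size
    return weight[:true_vocab_size].contiguous()
```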
# Setting moe_router_dtype to higher precision (e.g. fp64) can improve numerical stability,
# especially when using many experts.
model_cfg.moe_router_dtype = self.cfg["megatron_cfg"]["moe_router_dtype"]
model_cfg.moe_token_dispatcher_type = "alltoall"
why is this hard coded? can this be plumbed in through the config?
RL/nemo_rl/models/policy/__init__.py, Line 193 in dacac7e
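e.g. something like this, mirroring how moe_router_dtype is read (the megatron_cfg key here is a suggestion, it doesn't exist yet):

```python
# Read the dispatcher type from config instead of hard coding it;
# "moe_token_dispatcher_type" as a megatron_cfg key is hypothetical.
model_cfg.moe_token_dispatcher_type = self.cfg["megatron_cfg"].get(
    "moe_token_dispatcher_type", "alltoall"
)
```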
unified_memory_level = mcore_generation_config["unified_memory_level"]
model_config = self.model.config
# Enable CUDA graphs for inference
model_config.cuda_graph_impl = "local"
another place where it's set to "local" (vs "none"). is this potentially error prone, since there are many places where this needs to be set?
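e.g. a single helper that both call sites go through would prevent the flags drifting (a sketch; the helper name is made up):

```python
def set_cuda_graphs_for_inference(model_config, enabled: bool) -> None:
    # One place that flips the impl, so "local"/"none" can't get out of sync.
    model_config.cuda_graph_impl = "local" if enabled else "none"
```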
Removed this.
self._inference_engine_paused = True
print(f"[Rank {self.rank}] paused inference engine")

async def pause_engine(self):
should this be protected? it doesn't appear to be something the user needs to be aware of
Suggested change:
- async def pause_engine(self):
+ async def _pause_engine(self):
self._inference_engine_paused = False

def pause_inference_engine(self):
shall we use the sleep/wake nomenclature to match the other inference engines, at least in the nemo-rl API? i think it's okay that mcore inf calls it pause/resume, but that could be a little confusing if we ever do partial rollouts or in-flight weight updates, where an actual pause may be needed.
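concretely, something like this at the nemo-rl boundary (a sketch; the underscore methods are placeholders standing in for whatever mcore exposes):

```python
class McoreGenerationAdapter:
    # nemo-rl API speaks sleep/wake; mcore keeps pause/resume internally.
    def sleep(self) -> None:
        self._pause_engine()   # mcore-side pause (placeholder name)

    def wake(self) -> None:
        self._resume_engine()  # mcore-side resume (placeholder name)

    def _pause_engine(self) -> None: ...
    def _resume_engine(self) -> None: ...
```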
Force-pushed from 0c7d97e to e970643
What does this PR do?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information