[wip] loss reduction by justinvyu · Pull Request #1296 · NovaSky-AI/SkyRL

justinvyu · 2026-03-09T18:56:09Z

Port over the changes from #925 to the new directory structure.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

justinvyu · 2026-03-09T19:06:33Z

skyrl/train/trainer.py

+        # Step 1: Z-score normalization (if enabled)
+        if self.cfg.trainer.algorithm.advantage_batch_normalize:
+            num_actions = response_mask.sum()
+            mean = advantages.mean()
+            std = ((advantages - mean).pow(2) * response_mask).sum()
+            rstd = (std / num_actions).clamp(min=1e-8).rsqrt()
+            advantages = (advantages - mean) * rstd


Note: this differs slightly from the previous behavior. We now compute the z-score per minibatch, whereas before it used the epoch-level mean/std.
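A minimal sketch of a masked, per-minibatch z-score in plain Python. Nested lists stand in for tensors, and `masked_zscore` is a hypothetical helper, not SkyRL code. One assumption differs from the diff: here the mean is taken over response tokens only, whereas the diff computes `advantages.mean()` over every position (padding included) and masks only the variance term. The last lines contrast per-minibatch statistics with a single epoch-level set of statistics.

```python
import math
from typing import List

Matrix = List[List[float]]
Mask = List[List[int]]


def masked_zscore(advantages: Matrix, mask: Mask, eps: float = 1e-8) -> Matrix:
    """Z-score advantages using only positions where mask == 1.

    Padding positions (mask == 0) are excluded from the statistics and
    are set to 0.0 in the output.
    """
    pairs = [(a, m) for adv_row, m_row in zip(advantages, mask) for a, m in zip(adv_row, m_row)]
    num_actions = sum(m for _, m in pairs)
    if num_actions == 0:
        return [[0.0 for _ in row] for row in advantages]
    # Masked mean over response tokens only (assumption, see lead-in).
    mean = sum(a * m for a, m in pairs) / num_actions
    # Masked population variance, clamped like the diff's clamp(min=1e-8).
    var = sum((a - mean) ** 2 * m for a, m in pairs) / num_actions
    rstd = 1.0 / math.sqrt(max(var, eps))
    return [
        [(a - mean) * rstd if m else 0.0 for a, m in zip(adv_row, m_row)]
        for adv_row, m_row in zip(advantages, mask)
    ]


# Per-minibatch: each minibatch is normalized with its own mean/std.
minibatches = [([[1.0, 2.0]], [[1, 1]]), ([[10.0, 20.0]], [[1, 1]])]
per_minibatch = [masked_zscore(adv, m) for adv, m in minibatches]

# Epoch-level: one mean/std shared by every sample in the epoch.
epoch_level = masked_zscore([[1.0, 2.0], [10.0, 20.0]], [[1, 1], [1, 1]])
```

With per-minibatch statistics, each minibatch is rescaled to zero mean and unit variance on its own, so the sample with advantage 2.0 becomes positive; with epoch-level statistics it stays below the shared mean and remains negative.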

justinvyu · 2026-03-09T19:06:49Z

skyrl/train/trainer.py

+        # Option 1b: token-mean within each microbatch, then mean across microbatches
+        elif self.cfg.trainer.algorithm.loss_reduction == "token_mean_baseline":
+            micro_batch_size = self.cfg.trainer.micro_train_batch_size_per_gpu
+            num_micro_batches = len(data) // micro_batch_size
+            for i in range(num_micro_batches):
+                start_idx = i * micro_batch_size
+                end_idx = (i + 1) * micro_batch_size
+                microbatch_advantages = advantages[start_idx:end_idx]
+                microbatch_loss_mask = loss_mask[start_idx:end_idx]
+                # Compute token-mean within each microbatch
+                microbatch_advantages = microbatch_advantages / microbatch_loss_mask.sum().clamp(min=1)
+                # Average across microbatches
+                microbatch_advantages /= num_micro_batches
+                data["advantages"][start_idx:end_idx] = microbatch_advantages


Baseline implementation for comparison: take the token mean within each microbatch, then average those per-microbatch means across the minibatch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
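The reduction the `token_mean_baseline` branch aims for can be sketched in plain Python, assuming the per-token loss is later reduced with a masked sum (the "sum-based reduce_loss" mentioned in the commits). `token_mean_baseline_scale` and `masked_sum` are hypothetical helpers. Unlike the diff, the sketch rejects a minibatch whose length is not a multiple of `micro_batch_size`; with the diff's floor division, any trailing samples would not be rescaled by the loop.

```python
from typing import List

Matrix = List[List[float]]
Mask = List[List[int]]


def token_mean_baseline_scale(advantages: Matrix, loss_mask: Mask, micro_batch_size: int) -> Matrix:
    """Pre-scale advantages so that a masked sum over the whole minibatch equals
    the mean over microbatches of each microbatch's token mean."""
    if micro_batch_size <= 0 or len(advantages) % micro_batch_size != 0:
        # Floor division would silently leave trailing samples unscaled.
        raise ValueError("len(advantages) must be a positive multiple of micro_batch_size")
    num_micro_batches = len(advantages) // micro_batch_size
    scaled: Matrix = []
    for i in range(num_micro_batches):
        start, end = i * micro_batch_size, (i + 1) * micro_batch_size
        # Response tokens in this microbatch, clamped to at least 1 like the diff.
        num_tokens = max(sum(sum(row) for row in loss_mask[start:end]), 1)
        for row in advantages[start:end]:
            # Token mean within the microbatch, then mean across microbatches.
            scaled.append([a / num_tokens / num_micro_batches for a in row])
    return scaled


def masked_sum(values: Matrix, mask: Mask) -> float:
    """Sum-based reduction over unmasked tokens."""
    return sum(v * m for v_row, m_row in zip(values, mask) for v, m in zip(v_row, m_row))


adv = [[1.0, 1.0], [1.0, 0.0], [2.0, 2.0], [2.0, 2.0]]
mask = [[1, 1], [1, 0], [1, 1], [1, 1]]
# Microbatch token means are 1.0 (3 tokens) and 2.0 (4 tokens); their mean is 1.5.
baseline = masked_sum(token_mean_baseline_scale(adv, mask, micro_batch_size=2), mask)
# A global token mean weights every token equally instead: 11 / 7.
global_token_mean = masked_sum(adv, mask) / sum(sum(row) for row in mask)
```

The two reductions differ whenever microbatches hold different numbers of response tokens: the baseline gives every microbatch equal weight, while a global token mean gives every token equal weight.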

justinvyu and others added 3 commits March 9, 2026 11:51

Move loss reduction normalization to trainer-level advantage scaling,…

589c150

… scale loss by dp_size for FSDP/Megatron parity Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add token_mean_baseline loss reduction for mean-of-microbatch-means c…

333f31a

…omparison Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix assertion

aaaba4c

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

justinvyu commented Mar 9, 2026

Update tests for sum-based reduce_loss and dp_size scaling changes

a121360

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
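The commits fold the token-mean normalization into the advantages, make `reduce_loss` a sum, and scale the loss by `dp_size`. A minimal sketch of why the `dp_size` factor matters, assuming the training backend averages gradients across data-parallel ranks (PyTorch DDP and FSDP average by default). Because gradients are linear in the loss, the sketch simulates the averaging all-reduce on scalar per-rank losses instead of gradients. `rank_losses` and `effective_loss` are hypothetical helpers, not SkyRL APIs.

```python
from typing import List


def rank_losses(per_rank_token_losses: List[List[float]]) -> List[float]:
    """Sum-based reduction on each rank, with every per-token loss pre-divided
    by the *global* token count (the token mean folded into the advantages)."""
    global_tokens = sum(len(tokens) for tokens in per_rank_token_losses)
    return [sum(x / global_tokens for x in tokens) for tokens in per_rank_token_losses]


def effective_loss(per_rank_token_losses: List[List[float]], scale_by_dp_size: bool) -> float:
    """Loss implied by an all-reduce that *averages* across data-parallel ranks."""
    dp_size = len(per_rank_token_losses)
    losses = rank_losses(per_rank_token_losses)
    if scale_by_dp_size:
        # Multiply by dp_size to cancel the backend's 1/dp_size averaging.
        losses = [loss * dp_size for loss in losses]
    # Simulated averaging all-reduce.
    return sum(losses) / dp_size


tokens = [[1.0, 2.0, 3.0], [4.0]]  # rank 0 holds 3 tokens, rank 1 holds 1
print(effective_loss(tokens, scale_by_dp_size=True))   # 2.5, the global token mean 10 / 4
print(effective_loss(tokens, scale_by_dp_size=False))  # 1.25, too small by a factor of dp_size
```

Without the `dp_size` factor, the averaged result is the global token mean divided by the number of data-parallel ranks, so the effective learning rate would shrink as `dp_size` grows.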

justinvyu wants to merge 4 commits into NovaSky-AI:main from justinvyu:token_mean_loss_reduction

1 participant
