[wip] loss reduction #1296

Draft
justinvyu wants to merge 4 commits into NovaSky-AI:main from justinvyu:token_mean_loss_reduction
Conversation

@justinvyu
Contributor

port over changes from #925 to new directory structure

justinvyu and others added 3 commits March 9, 2026 11:51
… scale loss by dp_size for FSDP/Megatron parity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…omparison

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Comment on lines +1049 to +1055
# Step 1: Z-score normalization (if enabled)
if self.cfg.trainer.algorithm.advantage_batch_normalize:
    num_actions = response_mask.sum()
    mean = advantages.mean()
    std = ((advantages - mean).pow(2) * response_mask).sum()
    rstd = (std / num_actions).clamp(min=1e-8).rsqrt()
    advantages = (advantages - mean) * rstd
Contributor Author

note: this differs from the previous behavior. We now compute the z-score per minibatch, whereas before it used the epoch-level mean/std.
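
To make the difference concrete, here is a rough sketch in plain Python, with flat lists standing in for tensors. The helper `zscore` and the sample values are hypothetical and not part of this PR. It mirrors the steps in the hunk above (unmasked mean, masked variance) and assumes at least one unmasked token.

```python
import math


def zscore(advantages, response_mask):
    """Mirror the hunk's steps on flat lists: unmasked mean, masked variance."""
    num_actions = sum(response_mask)
    mean = sum(advantages) / len(advantages)
    sq_dev = sum((a - mean) ** 2 * m for a, m in zip(advantages, response_mask))
    # Equivalent of (std / num_actions).clamp(min=1e-8).rsqrt()
    rstd = 1.0 / math.sqrt(max(sq_dev / num_actions, 1e-8))
    return [(a - mean) * rstd for a in advantages]


# Two hypothetical minibatches whose advantages live on different scales.
mb1_adv, mb1_mask = [1.0, 3.0], [1, 1]
mb2_adv, mb2_mask = [10.0, 30.0], [1, 1]

# Per-minibatch: each minibatch is normalized with its own statistics.
per_minibatch = zscore(mb1_adv, mb1_mask) + zscore(mb2_adv, mb2_mask)

# Epoch-level: one mean/std shared across all minibatches.
epoch_level = zscore(mb1_adv + mb2_adv, mb1_mask + mb2_mask)
```

With per-minibatch statistics, both minibatches end up on the same scale. With epoch-level statistics, the larger-magnitude minibatch dominates the shared mean and variance, so the two normalizations generally disagree.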

Comment on lines +1062 to +1075
# Option 1b: token-mean within each microbatch, then mean across microbatches
elif self.cfg.trainer.algorithm.loss_reduction == "token_mean_baseline":
    micro_batch_size = self.cfg.trainer.micro_train_batch_size_per_gpu
    num_micro_batches = len(data) // micro_batch_size
    for i in range(num_micro_batches):
        start_idx = i * micro_batch_size
        end_idx = (i + 1) * micro_batch_size
        microbatch_advantages = advantages[start_idx:end_idx]
        microbatch_loss_mask = loss_mask[start_idx:end_idx]
        # Compute token-mean within each microbatch
        microbatch_advantages = microbatch_advantages / microbatch_loss_mask.sum().clamp(min=1)
        # Average across microbatches
        microbatch_advantages /= num_micro_batches
        data["advantages"][start_idx:end_idx] = microbatch_advantages
Contributor Author

baseline implementation: token mean within each microbatch, then mean across the microbatches of the minibatch
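
A small worked sketch of why pre-scaling the advantages gives this reduction, again in plain Python with nested lists standing in for tensors. The function `token_mean_baseline_scale` and the sample data are hypothetical. It also assumes the downstream loss is a masked sum over tokens, which this excerpt does not show.

```python
def token_mean_baseline_scale(advantages, loss_mask, micro_batch_size):
    """Scale per-token values so a masked sum reproduces the baseline reduction."""
    num_micro_batches = len(advantages) // micro_batch_size
    scaled = [row[:] for row in advantages]
    for i in range(num_micro_batches):
        start, end = i * micro_batch_size, (i + 1) * micro_batch_size
        # Token count of this microbatch, floored at 1 like clamp(min=1).
        num_tokens = max(sum(sum(row) for row in loss_mask[start:end]), 1)
        for j in range(start, end):
            scaled[j] = [a / num_tokens / num_micro_batches for a in advantages[j]]
    return scaled


# Four sequences of two tokens each, split into two microbatches of two sequences.
advantages = [[2.0, 4.0], [6.0, 8.0], [1.0, 0.0], [3.0, 0.0]]
loss_mask = [[1, 1], [1, 1], [1, 0], [1, 0]]

scaled = token_mean_baseline_scale(advantages, loss_mask, micro_batch_size=2)

# Masked sum over every token of the scaled values.
masked_sum = sum(a * m for row, mrow in zip(scaled, loss_mask) for a, m in zip(row, mrow))

# Reference: token mean per microbatch (20/4 and 4/2), then mean across microbatches.
reference = ((2.0 + 4.0 + 6.0 + 8.0) / 4 + (1.0 + 3.0) / 2) / 2
```

Here `masked_sum` and `reference` agree, so under the masked-sum assumption the scaling matches "token mean per microbatch, then mean across microbatches". Note that microbatches with fewer unmasked tokens give each token more weight.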

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>