[bug][train] Fix max_seq_len calculation #1303

Open
tamoghnokandar wants to merge 3 commits into NovaSky-AI:main from tamoghnokandar:fix_max_seq_len

Conversation

Contributor

tamoghnokandar commented Mar 10, 2026

Fixes #1154

Summary

This PR removes the implicit max_seq_len heuristic calculation and requires users to set it explicitly when using trainer.algorithm.loss_reduction=seq_mean_token_sum_norm.
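For context, a minimal sketch of what `seq_mean_token_sum_norm` does (Dr. GRPO-style normalization, simplified here in plain Python; names and shapes are illustrative, not the project's actual implementation): each sequence's token-loss sum is divided by a fixed length budget `max_seq_len` rather than by the sequence's own length, then averaged over the batch, which is why the budget must be chosen deliberately.

```python
def seq_mean_token_sum_norm(token_losses, max_seq_len):
    # token_losses: one list of per-token losses per sequence.
    # Every sequence's token-loss sum is divided by the *same* fixed
    # budget, not by its own length (the Dr. GRPO-style normalizer).
    per_seq = [sum(seq) / max_seq_len for seq in token_losses]
    return sum(per_seq) / len(per_seq)

# A short and a long sequence share the normalizer 4:
print(seq_mean_token_sum_norm([[1.0, 1.0], [1.0, 1.0, 1.0, 1.0]], max_seq_len=4))  # 0.75
```

Because the normalizer is a constant rather than a per-sequence length, picking it implicitly from generator settings can silently change the loss scale, which motivates requiring it explicitly.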

Changes

  • Removed the automatic trainer.algorithm.max_seq_len default from SkyRLTrainConfig.__post_init__
  • Added config validation that requires trainer.algorithm.max_seq_len to be explicitly set when loss_reduction == "seq_mean_token_sum_norm"
  • Updated config comments and docs to reflect the new behavior and explain that max_seq_len must be chosen based on the user’s intended sequence-length normalization budget
  • Updated tests to cover:
    • max_seq_len remaining None by default
    • explicit max_seq_len values being preserved
    • validate_cfg() failing when seq_mean_token_sum_norm is used without max_seq_len
    • validate_cfg() continuing to allow token_mean and sequence_mean without max_seq_len
    • validate_cfg() passing when seq_mean_token_sum_norm is used with an explicit max_seq_len
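The validation behavior above can be sketched as follows (a simplified stand-in using a plain dataclass instead of the real config objects; the actual check lives in the project's `validate_cfg`):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlgorithmConfig:
    # Simplified stand-in for trainer.algorithm; max_seq_len no longer
    # receives an auto-computed default in __post_init__.
    loss_reduction: str = "token_mean"
    max_seq_len: Optional[int] = None

def validate_cfg(algo: AlgorithmConfig) -> None:
    # seq_mean_token_sum_norm needs an explicit normalization budget;
    # other reduction modes are unaffected.
    if algo.loss_reduction == "seq_mean_token_sum_norm" and algo.max_seq_len is None:
        raise ValueError(
            "`trainer.algorithm.max_seq_len` must be set explicitly when "
            "`loss_reduction='seq_mean_token_sum_norm'`."
        )

validate_cfg(AlgorithmConfig())  # token_mean without max_seq_len: fine
validate_cfg(AlgorithmConfig(loss_reduction="seq_mean_token_sum_norm", max_seq_len=1536))  # fine
```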

Testing

  • Updated tests/train/test_config.py for the new behavior


gemini-code-assist[bot]: this comment was marked as resolved.

tamoghnokandar and others added 2 commits March 9, 2026 18:04
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Contributor

devin-ai-integration bot left a comment


Devin Review found 3 new potential issues.

View 5 additional findings in Devin Review.


Comment on lines +279 to +285
if cfg.trainer.algorithm.loss_reduction == "seq_mean_token_sum_norm":
    if cfg.trainer.algorithm.max_seq_len is None:
        raise ValueError(
            "`trainer.algorithm.max_seq_len` must be set explicitly when "
            "`trainer.algorithm.loss_reduction='seq_mean_token_sum_norm'`. "
            "Choose the total sequence-length normalization constant for your setup; "
            "this often matches the model context window / vLLM `max_model_len` when appropriate."
Contributor

🔴 Breaking change: Dr. GRPO example script fails because auto-calculated max_seq_len fallback was removed

The PR removes the max_seq_len auto-calculation from SkyRLTrainConfig.__post_init__ (skyrl/train/config/config.py:713-722 on LEFT) and adds a hard validation check requiring it to be set explicitly when loss_reduction='seq_mean_token_sum_norm'. However, the official Dr. GRPO example script at examples/train/algorithms/drgrpo/run_drgrpo_gsm8k.sh:15,23 uses LOSS_REDUCTION="seq_mean_token_sum_norm" but never passes trainer.algorithm.max_seq_len. This script previously worked because __post_init__ auto-computed max_seq_len = max_input_length + max_generate_length. Now it will crash with a ValueError at validation time.

Same issue in skyrl-agent example

skyrl-agent/examples/run_skyrl/run_skyrl_swe.sh:67 also sets trainer.algorithm.loss_reduction="seq_mean_token_sum_norm" without setting max_seq_len, so it will also fail.

Prompt for agents
Two example scripts need to be updated to explicitly pass trainer.algorithm.max_seq_len now that the auto-calculation fallback has been removed:

1. examples/train/algorithms/drgrpo/run_drgrpo_gsm8k.sh: Add a line like trainer.algorithm.max_seq_len=1536 (512 + 1024, matching max_prompt_length + max_generate_length from the script) to the uv run command.

2. skyrl-agent/examples/run_skyrl/run_skyrl_swe.sh: Add a line like trainer.algorithm.max_seq_len=40768 (8000 + 32768, matching max_prompt_length + max_generate_length from the script) to the uv run command.

Both scripts use loss_reduction=seq_mean_token_sum_norm and will now fail the new validation check at skyrl/train/utils/utils.py:279-285 without this fix.

Comment on lines +374 to +375
"""Used for ``seq_mean_token_sum_norm`` loss reduction.
Must be set explicitly for that reduction mode; otherwise can remain ``None``."""
Contributor

🟡 Stale comment in reduce_loss claims max_seq_len has a default fallback that no longer exists

At skyrl/backends/skyrl_train/utils/ppo_utils.py:999-1000, the comment says "NOTE: max_seq_len can be set explicitly via algorithm.max_seq_len, otherwise defaults to cfg.generator.max_input_length + cfg.generator.sampling_params.max_generate_length". This auto-calculation default was removed by this PR (deleted from skyrl/train/config/config.py:713-722), and max_seq_len must now always be set explicitly when using seq_mean_token_sum_norm. The stale comment will mislead developers into thinking a fallback still exists.

Prompt for agents
Update the stale comment in skyrl/backends/skyrl_train/utils/ppo_utils.py at lines 999-1000. The comment currently reads:
  # NOTE: max_seq_len can be set explicitly via algorithm.max_seq_len, otherwise defaults to
  # cfg.generator.max_input_length + cfg.generator.sampling_params.max_generate_length

It should be updated to something like:
  # NOTE: max_seq_len must be set explicitly via algorithm.max_seq_len when using seq_mean_token_sum_norm loss reduction.

This aligns with the new docstring at skyrl/train/config/config.py:374-375 and the validation at skyrl/train/utils/utils.py:279-285.




Development

Successfully merging this pull request may close these issues.

[bug][train] max_seq_len auto-calculation is incorrect for multi-turn in seq_mean_token_sum_norm

1 participant