fix(main): store label for GSP direct-MSE + unwrap TD3 tuple + bump GSP cadence #15
Merged
Three coordinated changes supporting the GSP direct-MSE training path from
GSP-RL PR #24.
1. `store_gsp_transition` call sites — pass `label` as the 2nd arg
Previously non-attention variants stored the previous prediction
(`old_heading_gsp[i]`) in the action field so the DDPG critic could
evaluate Q(state, prediction). The DDPG actor-critic training path is
gone, so the "action" slot is now used for the supervised target.
The attention variant already passed `label` here; now all branches do.
Three call sites updated: the independent-learning training store, the
shared-model training store, and the global-knowledge branch.
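A minimal sketch of the call-site change. The buffer class, argument names, and surrounding variables are hypothetical stand-ins; the real structures in Main.py may differ.

```python
# Hypothetical stand-in for the GSP replay buffer; names are illustrative only.
class GSPReplayBuffer:
    def __init__(self):
        self.transitions = []

    def store_gsp_transition(self, state, action, reward, next_state):
        # Under direct-MSE training the "action" slot carries the supervised target.
        self.transitions.append((state, action, reward, next_state))

buf = GSPReplayBuffer()
state, label, reward, next_state = [0.1, 0.2], 0.75, 0.0, [0.3, 0.4]

# Before (non-attention variants): the previous prediction went in the 2nd slot,
# e.g. buf.store_gsp_transition(state, old_heading_gsp[i], reward, next_state).
# After: pass the label so the MSE step reads it as the regression target.
buf.store_gsp_transition(state, label, reward, next_state)
```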
2. TD3 tuple-loss crash fix
`learn_TD3` returns `(0, 0)` on non-actor-update steps (where the critic
updated but the actor did not, because `UPDATE_ACTOR_ITER > 1`). Main.py
was storing this tuple in the per-step `loss` timeseries passed to
`hdf5_writer.write_episode`, which then crashed at episode end with:
`ValueError: setting an array element with a sequence. The detected
shape was (4500,) + inhomogeneous part.`
This crashed `td3_gsp_s123` in the live diagnostic batch. The tuple is
now unwrapped to a scalar at both learn call sites (independent + shared).
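A sketch of the unwrap, assuming `learn_TD3` returns `(0, 0)` on steps where only the critic updates and a scalar loss otherwise; the real return convention and signature may differ.

```python
# Toy stand-in for learn_TD3; the real method trains networks, this just
# reproduces the return-type asymmetry described in the PR.
def learn_TD3(step, update_actor_iter=2):
    critic_loss = 0.123  # placeholder scalar loss
    if step % update_actor_iter != 0:
        return (0, 0)  # critic updated, actor skipped
    return critic_loss

loss_timeseries = []
for step in range(4):
    result = learn_TD3(step)
    # Unwrap the tuple so the timeseries stays homogeneous (scalars only),
    # which keeps the later array conversion from raising ValueError.
    loss = result[0] if isinstance(result, tuple) else result
    loss_timeseries.append(loss)
```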
3. `GSP_LEARNING_FREQUENCY`: 500 → 4
Under direct-MSE GSP training the loss is supervised and cheap — there's
no need for the 500-step cadence that was a hedge for the actor-critic
path's sparse feedback. 4 matches `LEARN_EVERY`, so the GSP predictor
updates in lockstep with the primary network and gets ~125× more
gradient steps per episode.
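The cadence arithmetic, using the constant names from this PR (the values below come from the description; nothing else is taken from the codebase):

```python
LEARN_EVERY = 4
GSP_LEARNING_FREQUENCY = 4   # was 500

# Moving from a 500-step cadence to a 4-step cadence multiplies the number
# of GSP gradient steps per episode by 500 / 4 = 125.
speedup = 500 // GSP_LEARNING_FREQUENCY
```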
New contract test: tests/test_main_gsp_contract.py statically asserts that
every store_gsp_transition call in Main.py passes the label as its 2nd arg.
Catches regressions in the call signature.
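A hedged sketch of how such a static contract check can work; the actual implementation in tests/test_main_gsp_contract.py may differ, and the snippet below stands in for Main.py's source.

```python
import ast

# Illustrative stand-in for the contents of Main.py.
main_py_snippet = """
buffer.store_gsp_transition(state, label, reward, next_state)
agent.memory.store_gsp_transition(obs, label, r, next_obs)
"""

tree = ast.parse(main_py_snippet)
calls = [
    node for node in ast.walk(tree)
    if isinstance(node, ast.Call)
    and isinstance(node.func, ast.Attribute)
    and node.func.attr == "store_gsp_transition"
]
for call in calls:
    second_arg = call.args[1]
    # The contract: the 2nd positional argument must be the variable `label`.
    assert isinstance(second_arg, ast.Name) and second_arg.id == "label"
```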
Full RL-CT suite: 112/112 pass (excluding pre-existing test_nan_guards.py
import error unrelated to this PR).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Companion PR (must merge together)
GSP-RL #24 — `learn_gsp_mse` method + `learn_gsp` dispatch rewrite. The two PRs are runtime-coupled. Merging only one leaves the GSP training path reading the wrong field of the replay buffer.
🤖 Generated with Claude Code