
fix(main): store label for GSP direct-MSE + unwrap TD3 tuple + bump GSP cadence #15

Merged: jdbloom merged 1 commit into master from fix/gsp-direct-mse-wiring on Apr 13, 2026

@jdbloom jdbloom commented Apr 13, 2026

Summary

Three coordinated changes supporting the GSP direct-MSE training path from GSP-RL PR #24.

1. `store_gsp_transition` call sites — pass `label` as 2nd arg. Previously non-attention variants stored the previous prediction in the action field so the DDPG critic could evaluate Q(state, prediction). With the DDPG actor-critic training path gone (see GSP-RL #24), the action slot now carries the supervised target. Three call sites updated.

2. TD3 tuple-loss crash fix. `learn_TD3` returns `(0, 0)` on non-actor-update steps; Main.py was storing that tuple in the per-step `loss` timeseries, crashing `hdf5_writer.write_episode` with `ValueError: setting an array element with a sequence` when TD3 jobs reached episode end. Crashed `td3_gsp_s123` in the live diagnostic batch. Unwrapped the tuple at both learn call sites.

3. `GSP_LEARNING_FREQUENCY`: 500 → 4. Direct-MSE updates are cheap; the 500-step cadence was a hedge for the actor-critic path's sparse feedback. 4 matches `LEARN_EVERY` so the GSP predictor gets ~125× more gradient steps per episode.
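
For change 2, a minimal sketch of the unwrap, assuming element 0 of the returned tuple is the value worth logging (this PR does not say which element is kept). `to_scalar_loss` is a hypothetical helper for illustration, not a function in Main.py.

```python
def to_scalar_loss(loss):
    """Collapse a learn-step return value to one float (hypothetical helper)."""
    if isinstance(loss, (tuple, list)):
        # Assumption: element 0 is the loss to log; the PR does not specify.
        return float(loss[0])
    return float(loss)


# A per-step series where one step returned the (0, 0) placeholder tuple.
raw_losses = [0.31, (0, 0), 0.27]
clean_losses = [to_scalar_loss(x) for x in raw_losses]
```

Every entry is then a plain float, so the series converts to a rectangular array rather than a ragged mix of scalars and tuples.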

Test plan

  • New contract test `tests/test_main_gsp_contract.py` — static grep assertion that every `store_gsp_transition` call passes `label` as 2nd arg. RED before the fix (3 violations), GREEN after.
  • Full RL-CT suite: 112/112 pass (excluding pre-existing `test_nan_guards.py` import error unrelated to this PR)
  • Main.py syntax check
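
A static contract check of this kind could look roughly like the sketch below. It walks the syntax tree with `ast` rather than using grep, which is a substitution for illustration and not the approach in `tests/test_main_gsp_contract.py`. The function name `find_label_violations` and the argument list in `SAMPLE` are hypothetical, and the check assumes `label` is passed as a bare name.

```python
import ast


def find_label_violations(source, func_name="store_gsp_transition", expected="label"):
    """Return line numbers of calls whose 2nd positional arg is not `expected`."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        callee = node.func
        # Handle both obj.store_gsp_transition(...) and store_gsp_transition(...)
        name = callee.attr if isinstance(callee, ast.Attribute) else getattr(callee, "id", None)
        if name != func_name:
            continue
        second = node.args[1] if len(node.args) > 1 else None
        if not (isinstance(second, ast.Name) and second.id == expected):
            violations.append(node.lineno)
    return sorted(violations)


# Hypothetical call sites; the real argument list is not shown in this PR.
SAMPLE = """
agent.store_gsp_transition(state, label, reward, next_state, done)
agent.store_gsp_transition(state, old_heading_gsp[i], reward, next_state, done)
"""

violations = find_label_violations(SAMPLE)
```

Here the second call is flagged because its 2nd argument is a subscript expression, not the name `label`. A test would assert that the list is empty for the real Main.py source.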

Companion PR (must merge together)

GSP-RL #24 — `learn_gsp_mse` method + `learn_gsp` dispatch rewrite. The two PRs are runtime-coupled. Merging only one leaves the GSP training path reading the wrong field of the replay buffer.

🤖 Generated with Claude Code

…dence

Three coordinated changes supporting the GSP direct-MSE training path from
GSP-RL PR #24.

1. store_gsp_transition call sites — pass LABEL as the 2nd arg

   Previously non-attention variants stored the previous prediction
   (`old_heading_gsp[i]`) in the action field so the DDPG critic could
   evaluate Q(state, prediction). The DDPG actor-critic training path is
   gone, so the "action" slot is now used for the supervised target.
   The attention variant already passed `label` here; now all branches do.

   Three call sites updated: independent-learning training store, shared-
   model training store, and the global-knowledge branch.

2. TD3 tuple-loss crash fix

   `learn_TD3` returns (0, 0) on non-actor-update steps (where the critic
   updated but the actor did not, because of UPDATE_ACTOR_ITER>1). Main.py
   was storing this tuple in the `loss` timeseries passed to hdf5_writer,
   which then crashed with:

     ValueError: setting an array element with a sequence. The detected
     shape was (4500,) + inhomogeneous part.

   This crashed `td3_gsp_s123` in the diagnostic batch. Unwrap the tuple
   to a scalar at both learn call sites (independent + shared).

3. GSP_LEARNING_FREQUENCY: 500 → 4

   Under direct-MSE GSP training the loss is supervised and cheap — there's
   no need for the 500-step cadence that was a hedge for the actor-critic
   path's sparse feedback. 4 matches LEARN_EVERY so the GSP predictor
   updates in lockstep with the primary network and gets ~125× more
   gradient steps per episode.
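
   The ~125× figure is the ratio of the two cadences. A quick check, assuming
   updates are gated by `step % frequency == 0` and a hypothetical 2000-step
   episode (neither is stated in this commit):

```python
def count_updates(steps_per_episode, frequency):
    """Count steps on which an update fires under a modulo gate (assumed form)."""
    return sum(1 for step in range(1, steps_per_episode + 1) if step % frequency == 0)


STEPS = 2000  # hypothetical episode length, chosen to divide evenly by 500
old_updates = count_updates(STEPS, 500)
new_updates = count_updates(STEPS, 4)
ratio = new_updates / old_updates
```

   For episode lengths that are not a multiple of 500 the ratio is only
   approximately 125, which is consistent with the "~" in the text above.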

New contract test: tests/test_main_gsp_contract.py statically asserts that
every store_gsp_transition call in Main.py passes the label as its 2nd arg.
Catches regressions in the call signature.

Full RL-CT suite: 112/112 pass (excluding pre-existing test_nan_guards.py
import error unrelated to this PR).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jdbloom jdbloom merged commit ceaa2f6 into master Apr 13, 2026
3 checks passed
@jdbloom jdbloom deleted the fix/gsp-direct-mse-wiring branch April 13, 2026 13:54