[Record Submission] SP8192 + QK5 + Legal TTT — val_bpb 1.0842 | 15.99MB#1476

Open
aryan-cs wants to merge 2 commits into openai:main from aryan-cs:submission/record-1.0842

Conversation


@aryan-cs aryan-cs commented Apr 8, 2026

Record: SP8192 + QK-Gain 5 + Legal Score-First TTT

val_bpb: 1.0842 | 15.99 MB

This submission validates a strong configuration using the SP8192 tokenizer, QK_GAIN_INIT=5, and legal score-first TTT.

The PR contains the tracked submission package:

  • records/track_10min_16mb/2026-04-08_PR1413_SP8192_QK5_LegalTTT_1.0842

Results

| Metric | Value |
| --- | --- |
| Pre-quantization (post-EMA) val_bpb | 1.09054898 |
| Exact (legal_ttt_exact) val_bpb | 1.08418021 |
| Artifact size | 15,968,912 bytes |
| Total package size | 15,987,393 bytes |

Main Changes

This submission uses the following configuration:

  • SP8192 tokenizer
  • QK_GAIN_INIT=5
  • legal score-first TTT
  • compact quantized artifact under the 16 MB submission limit
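The configuration above can be summarized in a hypothetical hyperparameter sketch; only QK_GAIN_INIT appears verbatim in this PR, and the other names are illustrative stand-ins, not taken from train_gpt.py:

```python
# Hypothetical sketch of the headline knobs; names other than
# QK_GAIN_INIT are illustrative, not the actual train_gpt.py symbols.
TOKENIZER = "SP8192"               # SentencePiece-style, 8192-entry vocab
QK_GAIN_INIT = 5                   # initial query-key gain
TTT_MODE = "legal_score_first"     # score each chunk before adapting on it
ARTIFACT_BYTES_CAP = 16_000_000    # track_10min_16mb limit (decimal MB,
                                   # inferred from the "99.8% of 16MB" note below)
ARTIFACT_BYTES = 15_968_912        # final_model.int6.ptz

print(ARTIFACT_BYTES < ARTIFACT_BYTES_CAP)  # True
```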

The resulting run achieves a strong exact score while staying within the tracked submission format.


Architecture

| Component | Setting |
| --- | --- |
| Tokenizer | SP8192 |
| QK gain | 5 |
| Evaluation | Legal score-first TTT |
| Submission track | track_10min_16mb |
| Artifact | final_model.int6.ptz |

Submission Package

Included files:

  • README.md
  • submission.json
  • train_gpt.py
  • train_and_exact_log.txt
  • final_model.int6.ptz

Notes

This PR submits the validated run above as a clean tracked record package.

taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 8, 2026
…#1476/openai#1477 confirm SP8192+TTT is new comp meta — our SP8192 build is ready, deploy next; LEGAL_TTT brittleness pattern confirmed n=2
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 8, 2026
…C FAIL

Pattern across 5 stacked tests with LEGAL_TTT — ALL FAIL:
- bare champion (gated + LEGAL_TTT only)             = 1.3711 ★
- + 3-way (normuon + asym_skip)                       = 1.40695 (+0.036)
- + L02_anti-curriculum (BEC_REVERSE)                 = 1.4112 (+0.041)
- + L05_norm_pct_dropout                              = 1.41515 (+0.044)
- + L09_NGRAM_BACKOFF (cross-layer logit residual)    = 1.4567 (+0.086) ★ WORST

CONFIRMED: LEGAL_TTT champion is a UNIQUE brittle local minimum. Even
cross-layer additions (n-gram bias residual that touches LOGITS not WEIGHTS)
catastrophically destroy it. The hypothesis that "cross-layer = orthogonal"
is FALSE. LEGAL_TTT eval-time SGD seems to learn to overfit to the EXACT
forward pass shape of the bare champion; ANY logit perturbation upstream
breaks the per-batch context/target convergence.

Implication: Path to bear-fruit (1.3666) requires either:
1. SP8192 + bare LEGAL_TTT (comp meta from PR openai#1476) — needs vocab swap
2. A new mechanism that REPLACES LEGAL_TTT entirely
3. Hyperparameter tuning of LEGAL_TTT itself (longer steps, different LR)

Pod I's STACK_BACKOFF_GATED (no LEGAL_TTT) is the only remaining test
that can validate cross-layer compositionality WITHOUT the brittle ingredient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MatoTeziTanka commented Apr 11, 2026

Community Review — [Record Submission] SP8192 + QK5 + Legal TTT — val_bpb 1.0842 | 15.99MB

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

SyntaxError: f-string: expecting '}' (line 260)

A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep: reusing the f-string's own quote character inside a replacement field (legal on Python 3.12+ via PEP 701, a SyntaxError on 3.10), and backslashes inside replacement fields, which 3.10 also rejects.

Recommendation: Could you run python3 -c "import py_compile; py_compile.compile('train_gpt.py')" on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time before any of the scored-eval logic runs.

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — SyntaxError: f-string: expecting '}' (line 260). Classification via classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.

@aryan-cs (Author)

Thanks for the catch. Fixed in the latest push to the PR branch.

What changed in train_gpt.py:

  • Repacked the compressed wrapper so the inner source is Python 3.10-compatible.
  • Fixed the Python 3.10-incompatible f-string at the reported line (f" {cat}: {", ".join(...)}").
  • Fixed a second same-class f-string issue in the train_shards log line.
  • Guarded the flash_attn import and added a fallback path using torch.nn.functional.scaled_dot_product_attention so CPU/import preflight does not hard-fail when flash_attn_interface is absent.
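For illustration, the class of fix in the second bullet looks like this (the variable names `cat` and `parts` are hypothetical stand-ins for the real code):

```python
# Before Python 3.12 (PEP 701), a replacement field may not reuse the
# f-string's own quote character, so this shape is a SyntaxError on 3.10:
#     f" {cat}: {", ".join(parts)}"
# The 3.10-compatible rewrite switches the inner quote style:
cat = "fixes"          # hypothetical example values
parts = ["a", "b"]
line = f" {cat}: {', '.join(parts)}"
print(line)  # " fixes: a, b"
```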

Verification:

  • Python 3.10 compile of the decompressed payload: OK
  • Import smoke with flash_attn_interface available: OK
  • Forced import smoke with both flash_attn_interface and flash_attn blocked: OK
  • Submission folder size after the fix: 15,987,149 bytes
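The flash_attn guard described in the fix list can be sketched roughly as below; the `flash_attn_func` name and signature are assumed from the flash_attn_interface package, not copied from the PR:

```python
import torch
import torch.nn.functional as F

try:
    # Fast path: FlashAttention kernels (GPU-only); absent on CPU preflight.
    from flash_attn_interface import flash_attn_func  # assumed entry point
    _HAVE_FLASH = True
except ImportError:
    _HAVE_FLASH = False

def attend(q, k, v):
    """Causal attention that degrades gracefully when flash_attn is missing."""
    if _HAVE_FLASH and q.is_cuda:
        return flash_attn_func(q, k, v, causal=True)
    # Portable fallback: PyTorch's built-in scaled dot-product attention.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# CPU smoke: (batch, heads, seq, head_dim)
q = k = v = torch.randn(1, 8, 16, 64)
print(attend(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```

With this pattern, the import-time preflight succeeds on machines without the FlashAttention wheels, and the kernel is still picked up automatically on CUDA hosts that have it.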

GitHub now shows PR #1476 head at commit 76dc599.

Please re-run the compliance audit when convenient.

@MatoTeziTanka

Re-audited at head SHA 76dc599. Decompressed the lzma payload on CT2038 + ran CPU gauntlet.

Gauntlet result (CT2038, Python 3.10, torch 2.10.0+cpu):

Import: PASS (5.5s — lzma decompression)
Hyperparameters: dim=512, layers=11, heads=8, vocab=8192
Model: PASS (35,943,512 params)
Forward pass: PASS (loss=9.0025)
Code size: 16,874 bytes (compressed wrapper)
Artifact: FAIL — expected on CPU (GPTQ requires GPU)
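As a rough cross-check of the reported hyperparameters against the measured parameter count, a back-of-envelope estimate under textbook GPT assumptions (fused QKV + output projection, 4x MLP, norms and biases ignored) overshoots, which suggests the actual architecture is slimmer than vanilla:

```python
# Back-of-envelope parameter count for dim=512, layers=11, vocab=8192,
# assuming a standard GPT block and counting the embedding matrix once.
dim, layers, vocab = 512, 11, 8192
attn = 4 * dim * dim            # q, k, v, and output projections
mlp = 2 * dim * (4 * dim)       # up- and down-projection
block = attn + mlp              # norms and biases ignored
total = vocab * dim + layers * block
print(f"{total:,}")  # 38,797,312 -- vs. the measured 35,943,512, so the
# real model likely uses a narrower MLP, tied weights, or gated variants
```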

The original import error (PEP 701 f-string) is fixed — the inner payload decompresses and imports cleanly under Python 3.10. The flash_attn guard also works.

Compliance audit on the decompressed inner source (476 lines):

The TTT implementation at lines 345-389 (eval_val_sliding_ttt) follows the legal score-first-per-chunk pattern:

  • Score phase (lines 360-366): Each chunk's windows scored under torch.no_grad() before any optimizer step
  • is_last_chunk guard (lines 367-368): is_last_chunk = ci == num_chunks - 1 → adaptation runs only if not is_last_chunk and h.ttt_epochs > 0
  • Train phase (lines 369-384): SGD with cosine LR on unfrozen params, clipped gradients

This is the canonical legal TTT shape from PR #1413 (dexhunter): score chunk under no_grad, adapt on same chunk, next chunk sees updated weights, last chunk scored but never adapted. No n-gram cache, no SLOT, no pre-quant TTT.
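The score-first-per-chunk loop described above reduces to the following structural sketch, where `score` and `adapt` are hypothetical stand-ins for the no_grad scoring pass and the SGD adaptation step in the real eval_val_sliding_ttt:

```python
def eval_sliding_ttt(chunks, score, adapt, ttt_epochs=1):
    """Legal score-first TTT loop (structural sketch).

    Every chunk is scored under weights shaped only by adaptation on
    *previous* chunks; the last chunk is scored but never adapted on.
    """
    totals = []
    num_chunks = len(chunks)
    for ci, chunk in enumerate(chunks):
        totals.append(score(chunk))          # score phase (no_grad in the real code)
        is_last_chunk = ci == num_chunks - 1
        if not is_last_chunk and ttt_epochs > 0:
            for _ in range(ttt_epochs):
                adapt(chunk)                 # train phase; next chunk sees updated weights
    return sum(totals) / num_chunks
```

The ordering is the whole legality argument: because scoring happens before any optimizer step on the same chunk, no chunk's score ever benefits from adaptation on its own data.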

Record-track checks from submission.json:

  • val_bpb: 1.08418021
  • Artifact: 15,968,912 bytes (99.8% of 16MB cap) — under budget
  • Seeds: 1 seed only — the submission.json has no seeds or seed_results field. Record claims require 3-seed validation per the competition rules.

Verdict: LOOKS CLEAN on compliance. TTT is legal score-first per #1413.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: Compliance is clean. One note: this is a single-seed submission — submission.json has no multi-seed results. If record-track claims require 3-seed validation, the author would need to add those runs.

Thanks for the thorough fix documentation @aryan-cs.


Re-audit by @MatoTeziTanka. CPU gauntlet on CT2038: IMPORT_OK, MODEL_OK, FORWARD_OK. Inner lzma payload decompressed (476 lines) and compliance-audited: legal score-first TTT (lines 360-368), no n-gram, no SLOT. submission.json verified: 15.97MB artifact, 1 seed.
