[Record Submission] SP8192 + QK5 + Legal TTT — val_bpb 1.0842 | 15.99MB#1476

Open
aryan-cs wants to merge 2 commits into openai:main from aryan-cs:submission/record-1.0842

Conversation


@aryan-cs aryan-cs commented Apr 8, 2026

Record: SP8192 + QK-Gain 5 + Legal Score-First TTT

val_bpb: 1.0842 | 15.99 MB

This submission validates a strong configuration using the SP8192 tokenizer, QK_GAIN_INIT=5, and legal score-first TTT.

The PR contains the tracked submission package:

  • records/track_10min_16mb/2026-04-08_PR1413_SP8192_QK5_LegalTTT_1.0842

Results

| Metric | Value |
| --- | --- |
| Pre-quantization (post-EMA) val_bpb | 1.09054898 |
| Exact (legal_ttt_exact) val_bpb | 1.08418021 |
| Artifact size | 15,968,912 bytes |
| Total package size | 15,987,393 bytes |

Main Changes

This submission uses the following configuration:

  • SP8192 tokenizer
  • QK_GAIN_INIT=5
  • legal score-first TTT
  • compact quantized artifact under the 16 MB submission limit
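The configuration above can be summarized in a hypothetical hyperparameter sketch; only QK_GAIN_INIT appears verbatim in this PR, and the other names are illustrative stand-ins, not taken from train_gpt.py:

```python
# Hypothetical sketch of the headline knobs; names other than
# QK_GAIN_INIT are illustrative, not the actual train_gpt.py symbols.
TOKENIZER = "SP8192"               # SentencePiece-style, 8192-entry vocab
QK_GAIN_INIT = 5                   # initial query-key gain
TTT_MODE = "legal_score_first"     # score each chunk before adapting on it
ARTIFACT_BYTES_CAP = 16_000_000    # track_10min_16mb limit (decimal MB,
                                   # inferred from the "99.8% of 16MB" note below)
ARTIFACT_BYTES = 15_968_912        # final_model.int6.ptz

print(ARTIFACT_BYTES < ARTIFACT_BYTES_CAP)  # True
```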

The resulting run achieves a strong exact score while staying within the tracked submission format.


Architecture

| Component | Setting |
| --- | --- |
| Tokenizer | SP8192 |
| QK gain | 5 |
| Evaluation | Legal score-first TTT |
| Submission track | track_10min_16mb |
| Artifact | final_model.int6.ptz |

Submission Package

Included files:

  • README.md
  • submission.json
  • train_gpt.py
  • train_and_exact_log.txt
  • final_model.int6.ptz

Notes

This PR submits the validated run above as a clean tracked record package.

taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 8, 2026
…#1476/openai#1477 confirm SP8192+TTT is new comp meta — our SP8192 build is ready, deploy next; LEGAL_TTT brittleness pattern confirmed n=2
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 8, 2026
…C FAIL

Pattern across 5 stacked tests with LEGAL_TTT — ALL FAIL:
- bare champion (gated + LEGAL_TTT only)             = 1.3711 ★
- + 3-way (normuon + asym_skip)                       = 1.40695 (+0.036)
- + L02_anti-curriculum (BEC_REVERSE)                 = 1.4112 (+0.041)
- + L05_norm_pct_dropout                              = 1.41515 (+0.044)
- + L09_NGRAM_BACKOFF (cross-layer logit residual)    = 1.4567 (+0.086) ★ WORST

CONFIRMED: LEGAL_TTT champion is a UNIQUE brittle local minimum. Even
cross-layer additions (n-gram bias residual that touches LOGITS not WEIGHTS)
catastrophically destroy it. The hypothesis that "cross-layer = orthogonal"
is FALSE. LEGAL_TTT eval-time SGD seems to learn to overfit to the EXACT
forward pass shape of the bare champion; ANY logit perturbation upstream
breaks the per-batch context/target convergence.

Implication: Path to bear-fruit (1.3666) requires either:
1. SP8192 + bare LEGAL_TTT (comp meta from PR openai#1476) — needs vocab swap
2. A new mechanism that REPLACES LEGAL_TTT entirely
3. Hyperparameter tuning of LEGAL_TTT itself (longer steps, different LR)

Pod I's STACK_BACKOFF_GATED (no LEGAL_TTT) is the only remaining test
that can validate cross-layer compositionality WITHOUT the brittle ingredient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MatoTeziTanka commented Apr 11, 2026

Community Review — [Record Submission] SP8192 + QK5 + Legal TTT — val_bpb 1.0842 | 15.99MB

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

SyntaxError: f-string: expecting '}' (line 260)

A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep: reusing the f-string's own quote character inside a replacement field (legal on Python 3.12+ via PEP 701, a SyntaxError on 3.10), and backslashes inside replacement fields, which 3.10 also rejects.

Recommendation: Could you run python3 -c "import py_compile; py_compile.compile('train_gpt.py')" on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time before any of the scored-eval logic runs.

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — SyntaxError: f-string: expecting '}' (line 260). Classification via classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.

@aryan-cs (Author)

Thanks for the catch. Fixed in the latest push to the PR branch.

What changed in train_gpt.py:

  • Repacked the compressed wrapper so the inner source is Python 3.10-compatible.
  • Fixed the Python 3.10-incompatible f-string at the reported line (f" {cat}: {", ".join(...)}").
  • Fixed a second same-class f-string issue in the train_shards log line.
  • Guarded the flash_attn import and added a fallback path using torch.nn.functional.scaled_dot_product_attention so CPU/import preflight does not hard-fail when flash_attn_interface is absent.
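For illustration, the class of fix in the second bullet looks like this (the variable names `cat` and `parts` are hypothetical stand-ins for the real code):

```python
# Before Python 3.12 (PEP 701), a replacement field may not reuse the
# f-string's own quote character, so this shape is a SyntaxError on 3.10:
#     f" {cat}: {", ".join(parts)}"
# The 3.10-compatible rewrite switches the inner quote style:
cat = "fixes"          # hypothetical example values
parts = ["a", "b"]
line = f" {cat}: {', '.join(parts)}"
print(line)  # " fixes: a, b"
```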

Verification:

  • Python 3.10 compile of the decompressed payload: OK
  • Import smoke with flash_attn_interface available: OK
  • Forced import smoke with both flash_attn_interface and flash_attn blocked: OK
  • Submission folder size after the fix: 15,987,149 bytes
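The flash_attn guard described in the fix list can be sketched roughly as below; the `flash_attn_func` name and signature are assumed from the flash_attn_interface package, not copied from the PR:

```python
import torch
import torch.nn.functional as F

try:
    # Fast path: FlashAttention kernels (GPU-only); absent on CPU preflight.
    from flash_attn_interface import flash_attn_func  # assumed entry point
    _HAVE_FLASH = True
except ImportError:
    _HAVE_FLASH = False

def attend(q, k, v):
    """Causal attention that degrades gracefully when flash_attn is missing."""
    if _HAVE_FLASH and q.is_cuda:
        return flash_attn_func(q, k, v, causal=True)
    # Portable fallback: PyTorch's built-in scaled dot-product attention.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# CPU smoke: (batch, heads, seq, head_dim)
q = k = v = torch.randn(1, 8, 16, 64)
print(attend(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```

With this pattern, the import-time preflight succeeds on machines without the FlashAttention wheels, and the kernel is still picked up automatically on CUDA hosts that have it.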

GitHub now shows PR #1476 head at commit 76dc599.

Please re-run the compliance audit when convenient.

@MatoTeziTanka

Re-audited at head SHA 76dc599. Decompressed the lzma payload on CT2038 + ran CPU gauntlet.

Gauntlet result (CT2038, Python 3.10, torch 2.10.0+cpu):

Import: PASS (5.5s — lzma decompression)
Hyperparameters: dim=512, layers=11, heads=8, vocab=8192
Model: PASS (35,943,512 params)
Forward pass: PASS (loss=9.0025)
Code size: 16,874 bytes (compressed wrapper)
Artifact: FAIL — expected on CPU (GPTQ requires GPU)
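As a rough cross-check of the reported hyperparameters against the measured parameter count, a back-of-envelope estimate under textbook GPT assumptions (fused QKV + output projection, 4x MLP, norms and biases ignored) overshoots, which suggests the actual architecture is slimmer than vanilla:

```python
# Back-of-envelope parameter count for dim=512, layers=11, vocab=8192,
# assuming a standard GPT block and counting the embedding matrix once.
dim, layers, vocab = 512, 11, 8192
attn = 4 * dim * dim            # q, k, v, and output projections
mlp = 2 * dim * (4 * dim)       # up- and down-projection
block = attn + mlp              # norms and biases ignored
total = vocab * dim + layers * block
print(f"{total:,}")  # 38,797,312 -- vs. the measured 35,943,512, so the
# real model likely uses a narrower MLP, tied weights, or gated variants
```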

The original import error (PEP 701 f-string) is fixed — the inner payload decompresses and imports cleanly under Python 3.10. The flash_attn guard also works.

Compliance audit on the decompressed inner source (476 lines):

The TTT implementation at lines 345-389 (eval_val_sliding_ttt) follows the legal score-first-per-chunk pattern:

  • Score phase (lines 360-366): Each chunk's windows scored under torch.no_grad() before any optimizer step
  • is_last_chunk guard (lines 367-368): is_last_chunk = ci == num_chunks - 1 → adaptation runs only if not is_last_chunk and h.ttt_epochs > 0
  • Train phase (lines 369-384): SGD with cosine LR on unfrozen params, clipped gradients

This is the canonical legal TTT shape from PR #1413 (dexhunter): score chunk under no_grad, adapt on same chunk, next chunk sees updated weights, last chunk scored but never adapted. No n-gram cache, no SLOT, no pre-quant TTT.
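The score-first-per-chunk loop described above reduces to the following structural sketch, where `score` and `adapt` are hypothetical stand-ins for the no_grad scoring pass and the SGD adaptation step in the real eval_val_sliding_ttt:

```python
def eval_sliding_ttt(chunks, score, adapt, ttt_epochs=1):
    """Legal score-first TTT loop (structural sketch).

    Every chunk is scored under weights shaped only by adaptation on
    *previous* chunks; the last chunk is scored but never adapted on.
    """
    totals = []
    num_chunks = len(chunks)
    for ci, chunk in enumerate(chunks):
        totals.append(score(chunk))          # score phase (no_grad in the real code)
        is_last_chunk = ci == num_chunks - 1
        if not is_last_chunk and ttt_epochs > 0:
            for _ in range(ttt_epochs):
                adapt(chunk)                 # train phase; next chunk sees updated weights
    return sum(totals) / num_chunks
```

The ordering is the whole legality argument: because scoring happens before any optimizer step on the same chunk, no chunk's score ever benefits from adaptation on its own data.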

Record-track checks from submission.json:

  • val_bpb: 1.08418021
  • Artifact: 15,968,912 bytes (99.8% of 16MB cap) — under budget
  • Seeds: 1 seed only — the submission.json has no seeds or seed_results field. Record claims require 3-seed validation per the competition rules.

Verdict: LOOKS CLEAN on compliance. TTT is legal score-first per #1413.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: Compliance is clean. One note: this is a single-seed submission — submission.json has no multi-seed results. If record-track claims require 3-seed validation, the author would need to add those runs.

Thanks for the thorough fix documentation @aryan-cs.


Re-audit by @MatoTeziTanka. CPU gauntlet on CT2038: IMPORT_OK, MODEL_OK, FORWARD_OK. Inner lzma payload decompressed (476 lines) and compliance-audited: legal score-first TTT (lines 360-368), no n-gram, no SLOT. submission.json verified: 15.97MB artifact, 1 seed.
