Commit bc8614d
docs+revert: batched prefill gated on FP32 KV only; FP16 V needs more work
The attempt to auto-enable batched prefill for the FP16 V cache via direct
FP32 VB reads (avoiding the FP16 round-trip inside batched attention)
degraded downstream decode. In-batch attention was bit-identical to the
per-token path with its FP16 round-trip, but the resulting KV cache values
at later layers diverged slightly, and that drift propagated into decode,
producing repetition-loop garbage ("HelhelHelhel...").
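A minimal sketch of the general numeric effect behind this kind of drift, not the project's attention kernel: a value read directly at full precision differs from the same value after an FP16 round-trip, so a weighted sum over values differs too. The helper names are hypothetical, Python doubles stand in for FP32, and the sample values are made up for illustration.

```python
import struct

def fp16_round_trip(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def weighted_sum(weights, values):
    """Attention-style weighted sum over a row of V entries."""
    return sum(w * v for w, v in zip(weights, values))

# Illustrative values only (not taken from any real KV cache).
values_full = [0.1, -0.2003, 0.30007, 1.0001]
values_fp16 = [fp16_round_trip(v) for v in values_full]
weights = [0.25, 0.25, 0.25, 0.25]

out_direct = weighted_sum(weights, values_full)      # direct full-precision reads
out_round_trip = weighted_sum(weights, values_fp16)  # reads after FP16 storage
drift = abs(out_direct - out_round_trip)
print(f"direct={out_direct:.8f} round_trip={out_round_trip:.8f} drift={drift:.2e}")
```

The per-element error is small, but any path that consumes one variant while another path consumes the other no longer sees the same inputs.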
Reverted to the FP32-KV-only gate. Safe default: batched prefill
activates automatically when `-k fp32` is set and falls back to per-token
prefill for the default FP16 V KV type.
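The gate described above can be sketched roughly as follows. This is an illustration only: `select_prefill_path` and the string labels are hypothetical names, and the real check lives in the project's own code.

```python
def select_prefill_path(kv_type: str) -> str:
    """Pick the prefill path from the KV cache type (hypothetical helper).

    Assumption: kv_type mirrors the value passed to `-k`, and anything
    other than "fp32" takes the safe per-token path.
    """
    if kv_type.lower() == "fp32":
        return "batched"
    return "per-token"

for kv in ("fp32", "fp16"):
    print(kv, "->", select_prefill_path(kv))
```

Defaulting every non-FP32 type to per-token keeps the fallback conservative: a new KV type has to be opted in explicitly.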
README v3.2 update documents the user-facing status:
- Long-prompt prefill on `-k fp32`: 2.4× end-to-end, ~4× prefill
- Default FP16 V: unchanged (per-token)
- Bringing batched prefill to the default FP16 V is the next major engineering item
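As a rough consistency check on the two speedup figures, assuming both come from the same run and that only the prefill phase got faster, Amdahl's law gives the share of baseline time that prefill must have occupied. The function name is hypothetical and the result is an inference, not a measured number.

```python
def implied_prefill_fraction(end_to_end_speedup: float, prefill_speedup: float) -> float:
    """Solve 1 / ((1 - p) + p / s) = S for p (Amdahl's law).

    S is the end-to-end speedup, s the prefill-only speedup, and p the
    fraction of baseline runtime spent in prefill.
    """
    return (1.0 - 1.0 / end_to_end_speedup) / (1.0 - 1.0 / prefill_speedup)

p = implied_prefill_fraction(2.4, 4.0)
print(f"implied prefill share of baseline runtime: {p:.1%}")
```

Under those assumptions the figures are mutually consistent if prefill took a bit over three quarters of the baseline run, which fits the long-prompt framing.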
11/11 STRICT+COHERENT+Metal-ON tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent 672fea2 commit bc8614d
3 files changed: +12 -10 lines changed
Hunks (original lines → new lines; diff content not captured): 165-170 → 165-172, 304-322 → 304-324, 3360-3368 → 3360-3366