
consensus: persist AppQC and blocks in avail #2896

Open

wen-coding wants to merge 7 commits into main from wen/persist_appqc_and_blocks

wen-coding (Contributor) commented Feb 16, 2026

Summary

Adds crash-safe persistence for availability state (AppQC and signed lane proposals), with asynchronous block persistence to keep fsync off the critical path.

  • Extract consensus/persist/ sub-package: Moves the generic A/B file persistence logic (persist.go, persist_test.go) into its own package via git mv (preserving history), exporting Persister, NewPersister, WriteAndSync, SuffixA/SuffixB.
  • Add block-file persistence (persist/blocks.go): Each signed lane proposal is stored as an individual <lane_hex>_<blocknum>.pb file in a blocks/ subdirectory; includes load-all, delete-before, and header-mismatch validation. On load, blocks are sorted and truncated at the first gap (with a warning log), so higher layers receive clean contiguous slices.
  • Wire persistence into availability state (avail/state.go): NewState now accepts stateDir, initialises both the A/B persister (for AppQC) and BlockPersister, loads persisted data on restart, and passes it to newInner for queue restoration. Persistence loading is factored into a dedicated loadPersistedState helper.
  • Restore state on restart (avail/inner.go): On load, advances commitQCs, appVotes, and per-lane block queues past already-persisted indices. Block restoration simply iterates the pre-sorted contiguous slices from the persistence layer.
  • Async block persistence: PushBlock and ProduceBlock add blocks to the in-memory queue immediately and send a persist job to a background goroutine via a buffered channel. The background writer fsyncs each block to disk and advances a per-lane blockPersisted cursor. RecvBatch gates on this cursor so votes are only signed for blocks that have been durably written to disk. This moves fsync off the critical path and out of the inner lock.
  • Thread PersistentStateDir from consensus.Config through consensus/state.go into avail.NewState.
  • Expand persistence design doc (consensus/inner.go): Documents what is persisted, why, recovery semantics, write behavior, and rebroadcasting strategy.
  • Isolate disk-recovery logic in persist layer: All block sorting, contiguous-prefix extraction, and gap truncation now live in persist/blocks.go, keeping newInner focused on queue initialization.

Ref: sei-protocol/sei-v3#512

github-actions bot commented Feb 16, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
|---|---|---|---|---|
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Feb 19, 2026, 12:26 AM |

codecov bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 70.16129% with 74 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.32%. Comparing base (243cc2a) to head (05beddb).
⚠️ Report is 12 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sei-tendermint/internal/autobahn/avail/state.go | 56.70% | 33 Missing and 9 partials ⚠️ |
| ...mint/internal/autobahn/consensus/persist/blocks.go | 75.72% | 13 Missing and 12 partials ⚠️ |
| ...ei-tendermint/internal/autobahn/consensus/state.go | 37.50% | 4 Missing and 1 partial ⚠️ |
| ...endermint/internal/autobahn/avail/subscriptions.go | 60.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #2896      +/-   ##
==========================================
+ Coverage   57.22%   57.32%   +0.10%     
==========================================
  Files        2093     2090       -3     
  Lines      171771   172432     +661     
==========================================
+ Hits        98294    98850     +556     
- Misses      64701    64748      +47     
- Partials     8776     8834      +58     
| Flag | Coverage Δ |
|---|---|
| sei-chain | 52.81% <70.16%> (+0.13%) ⬆️ |
| sei-cosmos | 48.18% <ø> (-0.01%) ⬇️ |
| sei-db | 68.42% <ø> (-0.31%) ⬇️ |

Flags with carried-forward coverage are not shown.

| Files with missing lines | Coverage Δ |
|---|---|
| sei-tendermint/internal/autobahn/avail/inner.go | 95.08% <100.00%> (+3.65%) ⬆️ |
| ...ei-tendermint/internal/autobahn/consensus/inner.go | 63.21% <ø> (ø) |
| ...int/internal/autobahn/consensus/persist/persist.go | 76.14% <100.00%> (ø) |
| ...endermint/internal/autobahn/avail/subscriptions.go | 82.97% <60.00%> (-3.39%) ⬇️ |
| ...ei-tendermint/internal/autobahn/consensus/state.go | 82.03% <37.50%> (-0.90%) ⬇️ |
| ...mint/internal/autobahn/consensus/persist/blocks.go | 75.72% <75.72%> (ø) |
| sei-tendermint/internal/autobahn/avail/state.go | 67.46% <56.70%> (-5.01%) ⬇️ |

... and 89 files with indirect coverage changes


Comment on lines +286 to +288
for lane, q := range inner.blocks {
	m[lane] = q.first
}

Check warning (Code scanning / CodeQL): Iteration over map

Iteration over map may be a possible source of non-determinism.

Extract generic A/B file persistence into a reusable consensus/persist/
sub-package and add block-file persistence for crash-safe availability
state recovery.

Changes:
- Move persist.go and persist_test.go into consensus/persist/ (git mv to
  preserve history), exporting Persister, NewPersister, WriteAndSync,
  SuffixA, SuffixB.
- Add persist/blocks.go: per-block file persistence using
  <lane_hex>_<blocknum>.pb files in a blocks/ subdirectory, with load,
  delete-before, and header-mismatch validation.
- Wire avail.NewState to accept stateDir, create A/B persister for
  AppQC and BlockPersister for signed lane proposals, and restore both
  on restart (contiguous block runs, queue alignment).
- Update avail/state.go to persist AppQC on prune and delete obsolete
  block files after each AppQC advance.
- Thread PersistentStateDir from consensus.Config through to
  avail.NewState.
- Expand consensus/inner.go doc comment with full persistence design
  (what, why, recovery, write behavior, rebroadcasting).
- Move TestRunOutputsPersistErrorPropagates to consensus/inner_test.go
  for proper package alignment.
- Add comprehensive tests for blocks persistence (empty dir, multi-lane,
  corrupt/mismatched skip, DeleteBefore, filename roundtrip).

Ref: sei-protocol/sei-v3#512
Co-authored-by: Cursor <cursoragent@cursor.com>
wen-coding force-pushed the wen/persist_appqc_and_blocks branch from ebf93df to f4a9c1e on February 17, 2026 04:50
wen-coding changed the title from "Port sei-v3 PR #512: persist AppQC and blocks to disk" to "consensus: persist AppQC and blocks to disk" Feb 18, 2026
wen-coding and others added 2 commits February 17, 2026 17:50
Move persisted data loading (AppQC deserialization and block loading)
into a dedicated function for readability.

Co-authored-by: Cursor <cursoragent@cursor.com>
Move block sorting, contiguous-prefix extraction, and gap truncation
from avail/inner.go into persist/blocks.go so all disk-recovery logic
lives in one place. This isolates storage concerns in the persistence
layer, simplifying newInner and preparing for a future storage backend
swap.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment on lines 61 to 78
for lane, bs := range loaded.blocks {
	q, ok := i.blocks[lane]
	if !ok || len(bs) == 0 {
		continue
	}
	first := bs[0].Number
	q.first = first
	q.next = first
	for _, b := range bs {
		q.q[q.next] = b.Proposal
		q.next++
	}
	// Advance the votes queue to match so headers() returns ErrPruned
	// for already-committed blocks instead of blocking forever.
	vq := i.votes[lane]
	vq.first = first
	vq.next = first
}

Check warning (Code scanning / CodeQL): Iteration over map

Iteration over map may be a possible source of non-determinism.

Comment on lines +166 to +181
for lane, bs := range raw {
	sorted := slices.Sorted(maps.Keys(bs))
	var contiguous []LoadedBlock
	for i, n := range sorted {
		if i > 0 && n != sorted[i-1]+1 {
			log.Warn().
				Str("lane", lane.String()).
				Uint64("gapAt", uint64(sorted[i-1]+1)).
				Int("skipped", len(sorted)-i).
				Msg("truncating loaded blocks at gap; remaining will be re-fetched")
			break
		}
		contiguous = append(contiguous, LoadedBlock{Number: n, Proposal: bs[n]})
	}
	result[lane] = contiguous
}

Check warning (Code scanning / CodeQL): Iteration over map

Iteration over map may be a possible source of non-determinism.

wen-coding and others added 2 commits February 17, 2026 21:15
Co-authored-by: Cursor <cursoragent@cursor.com>
PushBlock and ProduceBlock now add blocks to the in-memory queue
immediately and send a persist job to a background goroutine via a
buffered channel. The background writer fsyncs each block to disk
and advances a per-lane blockPersisted cursor under the inner lock.

RecvBatch gates on this cursor so votes are only signed for blocks
that have been durably written to disk. When persistence is disabled
(testing), the cursor is nil and RecvBatch falls back to bq.next.

Co-authored-by: Cursor <cursoragent@cursor.com>
wen-coding changed the title from "consensus: persist AppQC and blocks to disk" to "consensus: persist AppQC and blocks, async block fsync" Feb 18, 2026
wen-coding changed the title from "consensus: persist AppQC and blocks, async block fsync" to "consensus: persist AppQC and blocks in avail" Feb 18, 2026
wen-coding and others added 2 commits February 18, 2026 13:52
newInner no longer takes a separate persistEnabled bool; loaded != nil
already implies persistence is enabled. Tests with loaded data now
correctly reflect this.

Co-authored-by: Cursor <cursoragent@cursor.com>
blockPersisted is reconstructed from disk on restart, not persisted
itself. Move its creation to just above the block restoration loop
(past the loaded==nil early return) so the code reads top-down.

Co-authored-by: Cursor <cursoragent@cursor.com>

Projects: None yet

1 participant