TraceDB: Snapshot-backed state for the trace baker #3360
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #3360      +/-   ##
==========================================
- Coverage   59.25%   59.22%   -0.03%
==========================================
  Files        2110     2110
  Lines      174181   174058     -123
==========================================
- Hits       103210   103094     -116
+ Misses      62044    62032      -12
- Partials     8927     8932       +5

Flags with carried forward coverage won't be shown.
Stacks on the trace baker PR. Captures an O(1) memiavl snapshot of the SC tree at EndBlock and serves trace re-execution from in-RAM state instead of SS-pebble.

- memiavl: refcount *Snapshot. Tree.Copy() Acquires; Snapshot.Close unmaps only on the final release. Without this, a held copy was a use-after-munmap waiting to happen: the background snapshot rewrite calls Tree.ReplaceWith → snapshot.Close mid-flight, segfaulting any held copy. The internal rewrite goroutine also drops its clone's ref so the refcount can reach zero.
- Committer interface gains Copy(). memiavl delegates to *DB.Copy; composite returns nil when flatkv is engaged, so the snapshot path silently falls back.
- storev2 rootmulti adds SnapshotSCStore + CacheMultiStoreFromCommitter.
- EVM keeper: TraceSnapshotStore (bounded by-height map) and EndBlock capture keyed by snapshot.Version() (= H-1 at EndBlock(H)).
- App: SnapshotAwareRPCContextProvider builds the sdk.Context directly from the snapshot CMS, skipping the throwaway CacheMultiStoreWithVersion that CreateQueryContext would otherwise create.

Configurable via [evm]:
- trace_bake_use_snapshot (default false)
- trace_bake_snapshot_window (default 64)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Point to the existing memiavl MemNode gauges and trace-baker counters that operators should watch when enabling the snapshot path on high-throughput nodes. No new metrics, just signposts to ones that already exist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit aacf715.
Resolve conflict in sei-db/state_db/sc/types/types.go by keeping both the Copy() addition from this branch and the Importer doc comment from main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
homeDir string,
stateStore types.StateStore,
isPanicOrSyntheticTxFunc func(ctx context.Context, hash common.Hash) (bool, error), // used in *ExcludeTraceFail endpoints
traceCtxProviders ...TraceContextProvider,
Only the first element of traceCtxProviders is ever read. A non-variadic *TraceContextProvider parameter (or a small options struct) avoids the "what if someone passes two" ambiguity and keeps the signature self-documenting.
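The options-struct alternative might look like the sketch below. The real TraceContextProvider signature is not shown in this thread, so here it is a placeholder func type, and TraceBackendOptions, newTraceBackend, and stateSource are hypothetical names. Falling back on a provider error is also an assumption of the sketch.

```go
package main

import "fmt"

// TraceContextProvider is a placeholder for the real provider type; here it
// maps a block height to a label for the state source a trace would use.
type TraceContextProvider func(height int64) (string, error)

// TraceBackendOptions groups optional dependencies so the constructor
// signature stays self-documenting.
type TraceBackendOptions struct {
	// TraceCtxProvider is optional; nil means "use the default state store".
	TraceCtxProvider TraceContextProvider
}

type traceBackend struct {
	homeDir  string
	provider TraceContextProvider
}

func newTraceBackend(homeDir string, opts TraceBackendOptions) *traceBackend {
	return &traceBackend{homeDir: homeDir, provider: opts.TraceCtxProvider}
}

// stateSource reports where a trace at height h would read state from.
// Assumption: both "no provider" and a provider error fall back to SS-pebble.
func (b *traceBackend) stateSource(h int64) string {
	if b.provider == nil {
		return "ss-pebble"
	}
	src, err := b.provider(h)
	if err != nil {
		return "ss-pebble"
	}
	return src
}

func main() {
	plain := newTraceBackend("/tmp/home", TraceBackendOptions{})
	snap := newTraceBackend("/tmp/home", TraceBackendOptions{
		TraceCtxProvider: func(h int64) (string, error) { return fmt.Sprintf("snapshot@%d", h), nil },
	})
	fmt.Println(plain.stateSource(10)) // ss-pebble
	fmt.Println(snap.stateSource(10))  // snapshot@10
}
```

A nil field gives the same default as an empty variadic list, but there is no second element for a caller to pass and have silently ignored.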
}
// Close releases all retained snapshots.
func (s *TraceSnapshotStore) Close() {
TraceSnapshotStore.Close() returns nothing, but inside it ignores per-snapshot release errors:
x/evm/keeper/trace_snapshot.go:97-112
_ = releaser.ReleaseSnapshotRefs()
Same as the WARN above: refcount mismatches are real bugs. Either return an error or log at WARN level on close to preserve operator visibility.
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyEndBlocker)
// Bake height-1: at EndBlock(N) the indexer's safe latest is N-1, so
// N-1 is the most recent block guaranteed to be queryable.
// Bake height-1: at EndBlock(N) the indexer's safe latest is N-1. When
EndBlock snapshot semantics are subtle: the comment is dense and easy to misread
The off-by-one here is correct but non-obvious: storev2/rootmulti.flush() doesn't run until Commit(), so at EndBlock(N) the SC tree state is state_after_commit_of_(N-1) and snap.Version() == N-1. The baker then traces H=N-1, whose initializeBlock calls ctxProvider(H-1) = ctxProvider(N-2), which finds snap[N-2], stored (Put) at the previous EndBlock(N-1). A one-line note in the comment, e.g. "lined up because rs.flush is called from Commit, not from EndBlock", would save the next reader half an hour.
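The height bookkeeping in this comment can be checked with a few lines of arithmetic. This only models the reviewer's description; snapshotVersionAtEndBlock and heightsLeasedForTrace are hypothetical helper names.

```go
package main

import "fmt"

// snapshotVersionAtEndBlock models the reviewer's point: rootmulti flush runs
// from Commit, not EndBlock, so at EndBlock(n) the SC tree still reflects the
// last committed height, n-1.
func snapshotVersionAtEndBlock(n int64) int64 { return n - 1 }

// heightsLeasedForTrace returns the heights initializeBlock requests when
// tracing block h: h-1 (prevBlockHeight) and h (blockNumber, for WithNextMs).
func heightsLeasedForTrace(h int64) [2]int64 { return [2]int64{h - 1, h} }

func main() {
	const n = 100
	traced := int64(n - 1) // "bake height-1" at EndBlock(N)
	fmt.Println(heightsLeasedForTrace(traced)) // [98 99]
	// The leased heights are exactly the versions captured at EndBlock(N-1) and EndBlock(N).
	fmt.Println(snapshotVersionAtEndBlock(n-1), snapshotVersionAtEndBlock(n)) // 98 99
}
```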
Also, initializeBlock calls the provider twice: once for prevBlockHeight (H-1) and once for blockNumber (H), for WithNextMs. A single trace therefore leases both snap[H-1] and snap[H]. As long as TraceBakeSnapshotWindow >= 2 this is fine, but if an operator misconfigures window=1, the second lease will miss and silently fall through to SS-pebble for oracle_mem/WithNextMs. Consider clamping the window to >= 2 (or whatever the documented minimum is) at config-load time.
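A config-load clamp, plus a toy bounded store showing why window=1 cannot serve both leases, might look like this. Everything here is hypothetical (minTraceBakeSnapshotWindow, clampSnapshotWindow, boundedStore, leaseHits), and the simulation assumes snap[N-1] is already captured when the trace of N-1 runs. Whatever the ordering, a window of 1 holds a single height, so at most one of the two leases can hit.

```go
package main

import "fmt"

// minTraceBakeSnapshotWindow is a hypothetical floor: initializeBlock leases
// both snap[H-1] and snap[H], so at least two heights must be retained.
const minTraceBakeSnapshotWindow = 2

// clampSnapshotWindow is a hypothetical config-load helper that raises a
// too-small window to the minimum (a real one might also log a warning).
func clampSnapshotWindow(w int) int {
	if w < minTraceBakeSnapshotWindow {
		return minTraceBakeSnapshotWindow
	}
	return w
}

// boundedStore is a toy stand-in for TraceSnapshotStore that keeps only the
// most recent `window` consecutive heights.
type boundedStore struct {
	window   int
	byHeight map[int64]bool
}

func (s *boundedStore) put(h int64) {
	s.byHeight[h] = true
	delete(s.byHeight, h-int64(s.window)) // evict the height that fell out of the window
}

func (s *boundedStore) has(h int64) bool { return s.byHeight[h] }

// leaseHits simulates captures at EndBlock(1..n) and counts how many of the
// two heights leased when tracing h = n-1 (h-1 and h) are still retained.
func leaseHits(window int, n int64) int {
	s := &boundedStore{window: window, byHeight: map[int64]bool{}}
	for b := int64(1); b <= n; b++ {
		s.put(b - 1) // EndBlock(b) captures snapshot version b-1
	}
	h := n - 1
	hits := 0
	for _, want := range []int64{h - 1, h} {
		if s.has(want) {
			hits++
		}
	}
	return hits
}

func main() {
	fmt.Println(leaseHits(1, 100), leaseHits(clampSnapshotWindow(1), 100)) // 1 2
}
```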
defer close(ch)
// Release per-tree snapshot refs; don't call cloned.Close() which
// would also tear down the live db's writer pool and stream handler.
defer func() { _ = cloned.MultiTree.Close() }()
Discarding the error with _ = will silently hide the new "Snapshot over-close" path. Since an over-close is a real bug (it means refcounts are unbalanced), at least log it:
defer func() {
if err := cloned.MultiTree.Close(); err != nil {
logger.Error("failed to release cloned snapshot refs after rewrite", "err", err)
}
}()

Describe your changes and provide context
StateReleaseFunc, avoiding GC finalizers.
trace_bake_use_snapshot; falls back when the backend cannot provide a snapshot.

Testing performed to validate your change
go test ./sei-db/state_db/sc/memiavl -run 'TreeCopy|Snapshot' -count=1

Note
Medium Risk
Introduces optional snapshot-backed state for debug_trace* baking and changes memiavl snapshot lifecycle/refcounting, which can affect memory usage and correctness of trace/state replay if mishandled (though it is opt-in with fallbacks).

Overview
Adds an opt-in path (evm.trace_bake_use_snapshot, evm.trace_bake_snapshot_window) for the trace baker to replay blocks against in-memory memiavl snapshots instead of SS-pebble, including wiring snapshot capture at EndBlock and closing retained snapshots on app shutdown.

Plumbs a new TraceContextProvider through the EVM RPC servers and tracing backend so debug_trace* can build contexts from leased snapshots and return a release function via geth's StateReleaseFunc.

Extends storev2/rootmulti and SeiDB committers to support O(1) snapshot Copy() and safe release of snapshot refs, and adds refcounting to memiavl snapshot mmap lifecycles, with regression/unit tests covering rewrite/reload and snapshot-store eviction behavior.

Reviewed by Cursor Bugbot for commit 0e3215d.