
feat(metadata): Persistent WAL journal with recovery and compaction #2916

Open

krishvishal wants to merge 15 commits into apache:master from krishvishal:reboot-state

Conversation

@krishvishal
Contributor

Which issue does this PR close?

Closes #2915

Summary

  • Add FileStorage: file-backed Storage impl with positional reads, appends, truncate, fsync.
  • Add MetadataJournal: append-only WAL indexed by a ring buffer (op % SLOT_COUNT). Crash recovery scans forward and truncates partial tail entries. Compaction atomically rewrites the WAL keeping only entries above the snapshot watermark.
  • Add recover(): startup recovery pipeline that loads the latest snapshot, opens the WAL, and replays entries past the snapshot sequence number through the state machine.
  • Add checkpoint() on IggyMetadata: persists a snapshot then advances the journal watermark and compacts.
  • Update Journal / Storage traits: io::Result return types, set_snapshot_op, remaining_capacity, compact default methods.
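Two mechanics from the summary can be sketched in a few lines: the ring-buffer slot index (`op % SLOT_COUNT`) and replaying only entries past the snapshot sequence number. This is an illustrative sketch only; the names and the `SLOT_COUNT` value are hypothetical, not the PR's actual API.

```rust
// Hypothetical slot count for the ring buffer index.
const SLOT_COUNT: u64 = 8;

/// Map a monotonically increasing op number onto a fixed ring of slots,
/// as the MetadataJournal index does with `op % SLOT_COUNT`.
fn slot_index(op: u64) -> usize {
    (op % SLOT_COUNT) as usize
}

/// During recovery, skip WAL entries already covered by the snapshot and
/// replay only those past its sequence number.
fn ops_to_replay(snapshot_op: u64, wal_ops: &[u64]) -> Vec<u64> {
    wal_ops.iter().copied().filter(|&op| op > snapshot_op).collect()
}

fn main() {
    // Op 8 wraps back onto slot 0; op 13 lands on slot 5.
    assert_eq!(slot_index(8), 0);
    assert_eq!(slot_index(13), 5);
    // Snapshot covers through op 5, so only ops 6 and 7 are replayed.
    assert_eq!(ops_to_replay(5, &[4, 5, 6, 7]), vec![6, 7]);
}
```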

@codecov

codecov bot commented Mar 11, 2026

Codecov Report

❌ Patch coverage is 61.03226% with 302 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.79%. Comparing base (f0e8578) to head (01615c1).
⚠️ Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/metadata/src/impls/metadata.rs | 13.55% | 100 Missing and 2 partials ⚠️ |
| core/journal/src/metadata_journal.rs | 84.03% | 40 Missing and 21 partials ⚠️ |
| core/metadata/src/impls/recovery.rs | 70.40% | 31 Missing and 6 partials ⚠️ |
| core/partitions/src/journal.rs | 0.00% | 24 Missing ⚠️ |
| core/simulator/src/deps.rs | 0.00% | 22 Missing ⚠️ |
| core/journal/src/file_storage.rs | 70.58% | 13 Missing and 7 partials ⚠️ |
| core/metadata/src/stm/snapshot.rs | 0.00% | 14 Missing ⚠️ |
| core/journal/src/lib.rs | 0.00% | 10 Missing ⚠️ |
| core/simulator/src/replica.rs | 0.00% | 6 Missing ⚠️ |
| core/partitions/src/iggy_partition.rs | 0.00% | 5 Missing ⚠️ |
| ... and 1 more | | |
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2916      +/-   ##
============================================
- Coverage     71.81%   71.79%   -0.03%     
  Complexity      930      930              
============================================
  Files          1116     1121       +5     
  Lines         92616    93699    +1083     
  Branches      70139    71231    +1092     
============================================
+ Hits          66512    67270     +758     
- Misses        23543    23820     +277     
- Partials       2561     2609      +48     
| Flag | Coverage Δ | |
|---|---|---|
| csharp | 67.43% <ø> | (-0.21%) ⬇️ |
| go | 36.38% <ø> | (ø) |
| java | 62.08% <ø> | (ø) |
| node | 91.37% <ø> | (-0.15%) ⬇️ |
| python | 81.43% <ø> | (ø) |
| rust | 72.48% <61.03%> | (-0.03%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| core/partitions/src/log.rs | 0.00% <0.00%> | (ø) |
| core/partitions/src/iggy_partition.rs | 0.00% <0.00%> | (ø) |
| core/simulator/src/replica.rs | 0.00% <0.00%> | (ø) |
| core/journal/src/lib.rs | 0.00% <0.00%> | (ø) |
| core/metadata/src/stm/snapshot.rs | 84.13% <0.00%> | (-6.08%) ⬇️ |
| core/journal/src/file_storage.rs | 70.58% <70.58%> | (ø) |
| core/simulator/src/deps.rs | 0.00% <0.00%> | (ø) |
| core/partitions/src/journal.rs | 0.00% <0.00%> | (ø) |
| core/metadata/src/impls/recovery.rs | 70.40% <70.40%> | (ø) |
| core/journal/src/metadata_journal.rs | 84.03% <84.03%> | (ø) |
| ... and 1 more | | |

... and 14 files with indirect coverage changes


@krishvishal
Contributor Author

The FillSnapshot trait bound has become a bit viral. I'm thinking of ways to reduce that.

@atharvalade
Contributor

> The FillSnapshot trait bound has become a bit viral. I'm thinking of ways to reduce that.

hmm, maybe you can try a super-trait alias like `trait MetadataStm: StateMachine<...> + FillSnapshot<MetadataSnapshot> {}` with a blanket impl. It won't reduce the actual propagation, but it cuts the noise from 4 lines to 1 at each use site. Longer term, pulling checkpoint into a separate component that owns the FillSnapshot concern behind a callback would remove the bound from IggyMetadata's M parameter entirely.
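The super-trait alias pattern above can be sketched like this. The trait bodies here are stand-ins for the real `StateMachine` / `FillSnapshot` bounds, just to show the alias plus blanket impl:

```rust
// Stand-in traits; the real ones carry more associated types and methods.
trait StateMachine {
    fn apply(&mut self, op: u64);
}
trait FillSnapshot {
    fn fill(&self) -> Vec<u8>;
}

// The alias: one bound at each use site instead of repeating both.
trait MetadataStm: StateMachine + FillSnapshot {}

// Blanket impl: anything satisfying both bounds is automatically a MetadataStm.
impl<T: StateMachine + FillSnapshot> MetadataStm for T {}

struct Stm(u64);
impl StateMachine for Stm {
    fn apply(&mut self, op: u64) {
        self.0 = op;
    }
}
impl FillSnapshot for Stm {
    fn fill(&self) -> Vec<u8> {
        self.0.to_le_bytes().to_vec()
    }
}

// Use sites shrink to a single `M: MetadataStm` bound.
fn checkpoint<M: MetadataStm>(m: &M) -> Vec<u8> {
    m.fill()
}

fn main() {
    let mut s = Stm(0);
    s.apply(7);
    assert_eq!(checkpoint(&s), 7u64.to_le_bytes().to_vec());
}
```

As noted, this only hides the propagation; every generic context still needs the `MetadataStm` bound.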

Contributor

@atharvalade atharvalade left a comment

lgtm

Comment on lines +39 to +58

/// Advance the snapshot watermark so entries at or below `op` may be
/// evicted from the journal's in-memory index. The default is a no-op
/// for journals that do not require this watermark.
fn set_snapshot_op(&self, _op: u64) {}

/// Number of entries that can be appended before the journal would need
/// to evict un-snapshotted slots. Returns `None` for journals that don't persist to disk.
fn remaining_capacity(&self) -> Option<usize> {
    None
}

/// Remove snapshotted entries from the WAL to reclaim disk space.
/// The default is a no-op for journals that do not persist to disk.
///
/// # Errors
/// Returns an I/O error if compaction fails.
fn compact(&self) -> impl Future<Output = io::Result<()>> {
    async { Ok(()) }
}
Contributor

I think I'd prefer the journal to have some sort of drain method that allows extracting a range of items, similarly to how Vec::drain(begin..end) works. That way we don't hack special-case APIs onto the interface just to cover an edge case; instead we create a general-purpose API that can be used to shrink the journal, and we handle the watermark outside of the Journal.

And yeah, a Stream-based iterator would be perfect, but since AsyncIterator is still unstable and will probably replace the Stream trait, we can return a non-async drain iterator and do the disk read for the entire range in one go.

Contributor Author

@krishvishal krishvishal Mar 14, 2026

I agree that the drain API is much cleaner. One thing to consider: drain would read and deserialize all removed entries to return them to the caller, but the main consumer today (checkpoint) doesn't need the returned entries; it just wants them removed from the WAL. How should we handle the wasted deserialization cost?

Contributor Author

Also, the ops in the journal aren't necessarily in a contiguous range. There can be gaps from pipelined prepares arriving out of order, or slots overwritten when a new op lands on the same index. So drain(begin..end) is semantically not suitable here.

Contributor

The drain API would take op as the input, e.g. you would provide an op range rather than an index range, so the wrapping of the journal doesn't matter.

The cost of deserialization isn't that big, in fact it may be nonexistent, since we store Message<PrepareHeader> as entries in our IggyMetadata journal, so it's just reading opaque bytes from disk and constructing the Message<PrepareHeader> from them (the only cost there is the validation).
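The op-range drain being discussed could look roughly like the following. This is a hypothetical in-memory sketch (the signature and `Entry` type are illustrative, not the PR's actual trait); the point is that draining by op range tolerates gaps in the sequence, so out-of-order prepares and overwritten slots are not a problem:

```rust
use std::ops::RangeInclusive;

// Illustrative entry type; the real journal stores Message<PrepareHeader>.
struct Entry {
    op: u64,
}

struct Journal {
    entries: Vec<Entry>,
}

impl Journal {
    /// Remove and return every entry whose op falls in `ops`.
    /// Gaps in the op sequence are fine: only matching ops are drained.
    fn drain_ops(&mut self, ops: RangeInclusive<u64>) -> Vec<Entry> {
        let (drained, kept): (Vec<_>, Vec<_>) = std::mem::take(&mut self.entries)
            .into_iter()
            .partition(|e| ops.contains(&e.op));
        self.entries = kept;
        drained
    }
}

fn main() {
    // Ops 3, 7, 5 — note the gap (no op 4 or 6) and the out-of-order layout.
    let mut j = Journal {
        entries: vec![Entry { op: 3 }, Entry { op: 7 }, Entry { op: 5 }],
    };
    let drained = j.drain_ops(1..=5);
    // Ops 3 and 5 are drained; op 7 stays in the journal.
    assert_eq!(drained.iter().map(|e| e.op).collect::<Vec<_>>(), vec![3, 5]);
    assert_eq!(j.entries.len(), 1);
    assert_eq!(j.entries[0].op, 7);
}
```

A disk-backed version would do the positional reads for the whole range in one pass before truncating, matching the non-async, read-once approach suggested above.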

Contributor Author

Done.

pub snapshot: Option<S>,
/// State machine - lives on all shards
pub mux_stm: M,
/// Root data directory, used by checkpoint to persist snapshots.
Contributor

Maybe it's a good idea to store some sort of snapshot_coordinator struct there that would hide those details away?
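A coordinator along those lines might look like this sketch. Everything here is hypothetical (names, fields, and the stubbed-out persistence step); it only shows how the data directory and watermark details could be hidden behind one struct:

```rust
use std::path::PathBuf;

// Hypothetical coordinator owning the checkpoint concerns: where snapshots
// live on disk and how far the journal watermark has advanced.
struct SnapshotCoordinator {
    data_dir: PathBuf,
    watermark: u64,
}

impl SnapshotCoordinator {
    fn new(data_dir: PathBuf) -> Self {
        Self { data_dir, watermark: 0 }
    }

    /// Persist a snapshot covering everything up to `last_op`, then advance
    /// the watermark so the journal may compact entries at or below it.
    fn checkpoint(&mut self, last_op: u64) -> std::io::Result<u64> {
        // Illustrative snapshot path; real persistence would write here.
        let snapshot_path = self.data_dir.join(format!("snapshot-{last_op}"));
        let _ = snapshot_path; // (snapshot write elided in this sketch)
        self.watermark = last_op;
        Ok(self.watermark)
    }
}

fn main() {
    let mut c = SnapshotCoordinator::new(PathBuf::from("/tmp/iggy-data"));
    assert_eq!(c.checkpoint(42).unwrap(), 42);
    assert_eq!(c.watermark, 42);
}
```

With this shape, IggyMetadata would hold a `SnapshotCoordinator` instead of a bare data-dir path plus watermark bookkeeping.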

Contributor Author

Done.

/// # Errors
/// Returns `SnapshotError` if snapshotting, persistence, or compaction fails.
#[allow(clippy::future_not_send)]
pub async fn checkpoint(&self, data_dir: &Path, last_op: u64) -> Result<(), SnapshotError>
Contributor

If we go with the comment on line R143, then this would be part of the coordinator I mentioned there.

Contributor Author

Done.

@krishvishal
Contributor Author

CI failed due to flaky tests. It is being addressed here: #2963



Development

Successfully merging this pull request may close these issues.

State loading from disk during replica bootup

3 participants