
Reset LSPS5 persistence_in_flight counter on persist errors #4597

Open
tnull wants to merge 1 commit into lightningdevkit:main from tnull:2026-05-lsps5-persist-counter-leak

Conversation

@tnull
Contributor

@tnull tnull commented May 5, 2026

`LSPS5ServiceHandler::persist` incremented `persistence_in_flight` at the top as a single-runner gate, but only decremented it on the success path: each interior `?` on a `kv_store` future propagated the error out of the function while leaving the counter at >= 1. After one transient I/O failure (disk full, brief unavailability of a remote `KVStore`, EPERM, etc.) every subsequent `persist()` call hit the `fetch_add > 0` short-circuit and silently returned `Ok(false)`.

The in-memory `needs_persist` flags then continued to grow without ever reaching disk, so webhook state, removals, and notification cooldowns were lost on the next process restart — including the spec-mandated webhook retention/pruning state — without any error surfaced to the operator. The counter is monotonic, so recovery required a process restart.

Adopt the LSPS1 / LSPS2 pattern: split the body into an inner `do_persist` and an outer `persist` that unconditionally clears the counter via `store(0)` after the call returns, regardless of outcome. A failed write now still propagates `Err`, but the next `persist()` attempt actually retries the write instead of no-op'ing.
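The before/after shape can be sketched with a minimal, synchronous stand-in (the field and method names mirror the diff; the async `KVStore` plumbing and the re-run loop are elided, and the `fail_write` flag is purely illustrative):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Minimal stand-in for the handler; the real code persists via an async
// KVStore, simulated here with a `fail_write` flag.
struct Handler {
    persistence_in_flight: AtomicUsize,
    fail_write: bool,
}

impl Handler {
    fn do_persist(&self) -> Result<(), ()> {
        if self.fail_write { Err(()) } else { Ok(()) }
    }

    // Buggy shape: the early `?`-style return on a write error leaves the
    // counter at 1, so every later call short-circuits with Ok(false).
    fn persist_buggy(&self) -> Result<bool, ()> {
        if self.persistence_in_flight.fetch_add(1, Ordering::AcqRel) > 0 {
            return Ok(false);
        }
        self.do_persist()?; // on Err, the counter is never decremented
        self.persistence_in_flight.store(0, Ordering::Release);
        Ok(true)
    }

    // Fixed shape: clear the counter after do_persist regardless of outcome,
    // so a failed write propagates Err but the next call retries.
    fn persist_fixed(&self) -> Result<bool, ()> {
        if self.persistence_in_flight.fetch_add(1, Ordering::AcqRel) > 0 {
            return Ok(false);
        }
        let res = self.do_persist();
        self.persistence_in_flight.store(0, Ordering::Release);
        res.map(|()| true)
    }
}
```

With the buggy shape, one failed write leaves the gate permanently closed; with the fixed shape, the same failure is reported and the next call goes through.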

Co-Authored-By: HAL 9000

@ldk-reviews-bot

ldk-reviews-bot commented May 5, 2026

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-claude-review-bot
Collaborator

ldk-claude-review-bot commented May 5, 2026

I've reviewed the full diff, comparing the new LSPS5 persist/do_persist structure against the LSPS1 and LSPS2 reference implementations. Let me trace all paths through the counter logic.

LSPS5 persist() paths:

  1. Early return (fetch_add > 0): counter correctly left for the running thread to manage.
  2. do_persist() returns Err: explicit store(0) before returning — correct.
  3. do_persist() returns Ok, fetch_sub(1) returns 1: counter is now 0, break — correct.
  4. do_persist() returns Ok, fetch_sub(1) returns > 1: store(1), loop again — correct.

All paths correctly reset the counter. The fix is sound.

No issues found.

The code correctly ensures persistence_in_flight is reset to 0 on all exit paths. The structural placement of the loop differs from LSPS1/LSPS2 (loop in persist with store(0) only on error vs. loop in do_persist with unconditional store(0) + debug_assert in persist), but the semantics are equivalent — no counter value can leak. The test properly validates the regression scenario via a FailableKVStore that selectively fails LSPS5 namespace writes.
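For reference, the loop shape those four paths describe can be sketched as follows (a synchronous stand-in, not the actual async code; `do_persist` here just consumes a fail-once flag, and the memory orderings are illustrative):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Simulated inner persist: fails on the first call if `fail_first` is set.
fn do_persist(fail_first: &mut bool) -> Result<(), ()> {
    if *fail_first {
        *fail_first = false;
        return Err(());
    }
    Ok(())
}

fn persist(counter: &AtomicUsize, fail_first: &mut bool) -> Result<bool, ()> {
    // Path 1: another runner is active; it will observe our increment.
    if counter.fetch_add(1, Ordering::AcqRel) > 0 {
        return Ok(false);
    }
    loop {
        if let Err(e) = do_persist(fail_first) {
            // Path 2: reset so the next persist() call retries the write.
            counter.store(0, Ordering::Release);
            return Err(e);
        }
        if counter.fetch_sub(1, Ordering::AcqRel) == 1 {
            // Path 3: no new requests arrived while writing; we're done.
            return Ok(true);
        }
        // Path 4: requests came in mid-write; fold them into one more pass.
        counter.store(1, Ordering::Release);
    }
}
```

On every exit path the counter ends at 0 (or is owned by the still-running thread), which is the invariant the review checks.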

@ldk-reviews-bot ldk-reviews-bot requested a review from joostjager May 5, 2026 20:25

```rust
let res = self.do_persist().await;
debug_assert!(res.is_err() || self.persistence_in_flight.load(Ordering::Acquire) == 0);
self.persistence_in_flight.store(0, Ordering::Release);
```
Collaborator

No, this races with a second writer that is started at the same time. We should move the loop out of the inner method and into this method to control the flag entirely in this method.
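A deterministic replay of that interleaving (illustrative only, not the actual async execution): writer A has just finished `do_persist()` successfully, and writer B enters `persist()` before A's unconditional `store(0)` runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Returns the counter value left behind after A's store(0) wipes out B's
// increment. A value of 0 means B's persist request was silently dropped.
fn replay_lost_increment() -> usize {
    // A entered persist() and incremented the gate.
    let counter = AtomicUsize::new(1);
    // ...A's do_persist() completes here, before A clears the flag...
    // B's gate check: sees 1 > 0, returns Ok(false), trusting the running
    // thread (A) to pick up the work B just flagged.
    assert!(counter.fetch_add(1, Ordering::AcqRel) > 0);
    // A then unconditionally clears the flag and returns without looping:
    // B's increment, and the dirty state it signalled, is discarded.
    counter.store(0, Ordering::Release);
    counter.load(Ordering::Acquire)
}
```

Keeping the re-run loop in the outer `persist` (as LSPS1/LSPS2 do) closes this window, because the runner re-checks the counter via `fetch_sub` before it ever clears the gate.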

Contributor Author

Added a fixup.

@codecov

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 46.15385% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.22%. Comparing base (1a26867) to head (a23feec).
⚠️ Report is 22 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lightning-liquidity/src/lsps5/service.rs | 46.15% | 24 Missing and 4 partials ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #4597      +/-   ##
==========================================
- Coverage   86.84%   86.22%   -0.62%
==========================================
  Files         161      156       -5
  Lines      109260   108645     -615
  Branches   109260   108645     -615
==========================================
- Hits        94882    93677    -1205
- Misses      11797    12362     +565
- Partials     2581     2606      +25
```
| Flag | Coverage Δ |
| --- | --- |
| fuzzing-fake-hashes | ? |
| fuzzing-real-hashes | ? |
| tests | 86.22% <46.15%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@TheBlueMatt TheBlueMatt removed the request for review from joostjager May 6, 2026 01:05
@tnull tnull requested a review from TheBlueMatt May 6, 2026 09:27
Collaborator

@TheBlueMatt TheBlueMatt left a comment


please squash

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@tnull tnull force-pushed the 2026-05-lsps5-persist-counter-leak branch from a23feec to b3544de May 8, 2026 09:15
@tnull
Contributor Author

tnull commented May 8, 2026

> please squash

Squashed without further changes.
