@tnull tnull commented Jan 26, 2026

Fixes #4346.

Previously, we refactored the GossipVerifier to not require holding a
circular reference. As part of this, we moved to a model where the
UtxoFutures are now polled by the background processor which checks
for completion through get_and_clear_pending_msg_events.

However, as part of this refactor we introduced a race condition: as we
only held Weak references in PendingChecksContext and the
UtxoFuture was directly dropped by the GossipVerifier after calling
resolve, the actual data was dropped with the future and gone when the
background processor attempted to retrieve and apply it via
check_resolved_futures.

Here, we fix this issue by simply holding on to the state Arcs in a
separate pending_states Vec that is only pruned in
check_resolved_futures, ensuring any completed results are collected
first.
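
To illustrate the race and the fix described above, here is a minimal, self-contained sketch. It is not the actual LDK code: `FutureState`, `UtxoFutureSketch`, `PendingChecksSketch`, `register`, and the `keep_strong` flag are hypothetical stand-ins, and only the names `pending_states`, `resolve`, and `check_resolved_futures` are borrowed from the description. It assumes the result lives in an `Arc`-shared state that the checker reaches through a `Weak` reference.

```rust
use std::sync::{Arc, Mutex, Weak};

// Hypothetical stand-in for the shared state behind a `UtxoFuture`.
struct FutureState {
    result: Option<u64>,
}

// Hypothetical stand-in for `UtxoFuture`: it owns a strong reference to its state.
struct UtxoFutureSketch {
    state: Arc<Mutex<FutureState>>,
}

impl UtxoFutureSketch {
    fn new() -> Self {
        Self { state: Arc::new(Mutex::new(FutureState { result: None })) }
    }

    // Stores the lookup result in the shared state.
    fn resolve(&self, value: u64) {
        self.state.lock().unwrap().result = Some(value);
    }
}

// Hypothetical stand-in for the pending-checks bookkeeping.
#[derive(Default)]
struct PendingChecksSketch {
    weak_refs: Vec<Weak<Mutex<FutureState>>>,
    // The fix: strong references that keep resolved results alive until collected.
    pending_states: Vec<Arc<Mutex<FutureState>>>,
}

impl PendingChecksSketch {
    // `keep_strong` toggles between the old (Weak-only) and the fixed behavior.
    fn register(&mut self, fut: &UtxoFutureSketch, keep_strong: bool) {
        self.weak_refs.push(Arc::downgrade(&fut.state));
        if keep_strong {
            self.pending_states.push(Arc::clone(&fut.state));
        }
    }

    // Collects completed results first, then prunes state that nobody else holds.
    fn check_resolved_futures(&mut self) -> Vec<u64> {
        let mut completed = Vec::new();
        for weak in &self.weak_refs {
            if let Some(state) = weak.upgrade() {
                let taken = state.lock().unwrap().result.take();
                completed.extend(taken);
            }
        }
        // Drop states whose future is gone, then the Weak entries that now dangle.
        self.pending_states.retain(|s| Arc::strong_count(s) > 1);
        self.weak_refs.retain(|w| w.strong_count() > 0);
        completed
    }
}

fn main() {
    // Weak-only: the result is dropped together with the future and is lost.
    let mut checks = PendingChecksSketch::default();
    let fut = UtxoFutureSketch::new();
    checks.register(&fut, false);
    fut.resolve(7);
    drop(fut);
    assert!(checks.check_resolved_futures().is_empty());

    // With `pending_states`: the result survives until it is collected.
    let mut checks = PendingChecksSketch::default();
    let fut = UtxoFutureSketch::new();
    checks.register(&fut, true);
    fut.resolve(7);
    drop(fut);
    assert_eq!(checks.check_resolved_futures(), vec![7]);
    assert!(checks.pending_states.is_empty());
}
```

In this sketch the pruning step runs only after the collection loop, which mirrors the ordering the description relies on: completed results are gathered before any strong reference is released.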

ldk-reviews-bot commented Jan 26, 2026

🎉 This PR is now ready for review!
Please choose at least one reviewer by assigning them on the right bar.
If no reviewers are assigned within 10 minutes, I'll automatically assign one.
Once the first reviewer has submitted a review, a second will be assigned if required.

@tnull tnull marked this pull request as draft January 26, 2026 16:16
@tnull tnull force-pushed the 2026-01-fix-gossip-verification branch 2 times, most recently from 7c871f2 to 5c757b3 Compare January 26, 2026 16:18
.read_only()
.channels()
.get(&valid_announcement.contents.short_channel_id)
.unwrap()
Collaborator

same here. the rustfmt behavior of treating things like as_ref().unwrap() as "equally important" as ".get(X)" or ".features" makes this so hard to follow.
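
As an illustration of the readability concern above (not the change that was actually made in this PR), the sketch below contrasts a long chain laid out one call per line with a version that binds intermediate values to locals. `GraphSketch` and `ChannelInfoSketch` are hypothetical stand-ins, not LDK types.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the graph types touched in the tests.
struct ChannelInfoSketch {
    features: Option<Vec<u8>>,
}

struct GraphSketch {
    channels: HashMap<u64, ChannelInfoSketch>,
}

// One call per line: the lookup and the incidental unwraps carry equal visual weight.
fn feature_len_chained(graph: &GraphSketch, scid: u64) -> usize {
    graph
        .channels
        .get(&scid)
        .unwrap()
        .features
        .as_ref()
        .unwrap()
        .len()
}

// Binding intermediates to locals keeps each meaningful step on a single line.
fn feature_len_with_locals(graph: &GraphSketch, scid: u64) -> usize {
    let channel = graph.channels.get(&scid).unwrap();
    let features = channel.features.as_ref().unwrap();
    features.len()
}

fn main() {
    let mut channels = HashMap::new();
    channels.insert(42, ChannelInfoSketch { features: Some(vec![0x01, 0x02]) });
    let graph = GraphSketch { channels };
    assert_eq!(feature_len_chained(&graph, 42), 2);
    assert_eq!(feature_len_with_locals(&graph, 42), 2);
}
```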

Contributor Author

Fixed.

Collaborator

Still the same on the latest version?

@tnull tnull self-assigned this Jan 28, 2026
@tnull tnull moved this to Goal: Merge in Weekly Goals Jan 28, 2026
@tnull tnull force-pushed the 2026-01-fix-gossip-verification branch from 5c757b3 to e3bf6a5 Compare January 28, 2026 13:53
@tnull tnull requested a review from TheBlueMatt January 28, 2026 13:54
@tnull tnull marked this pull request as ready for review January 28, 2026 13:54
@tnull tnull force-pushed the 2026-01-fix-gossip-verification branch 2 times, most recently from f672e27 to 7c08572 Compare January 28, 2026 13:57
@tnull tnull left a comment

Now switched to a much less invasive approach where we simply keep track of the pending futures' state Arcs in a separate Vec until the next call to check_resolved_futures, which will collect completed states and prune the data structures.

@tnull tnull added this to the 0.3 milestone Jan 28, 2026

tnull commented Jan 28, 2026

Whoops, seems there are still some more tests to fix, drafting again 🤭

@tnull tnull marked this pull request as draft January 28, 2026 14:16
@tnull tnull force-pushed the 2026-01-fix-gossip-verification branch from 7c08572 to 585fd56 Compare January 28, 2026 14:26
tnull added 3 commits January 28, 2026 15:27
Signed-off-by: Elias Rohrer <dev@tnull.de>
Previously, we refactored the `GossipVerifier` to not require holding a
circular reference. As part of this, we moved to a model where the
`UtxoFuture`s are now polled by the background processor which checks
for completion through `get_and_clear_pending_msg_events`.

However, as part of this refactor we introduced a race condition: as we
only held `Weak` references in `PendingChecksContext` and the
`UtxoFuture` was directly dropped by the `GossipVerifier` after calling
`resolve`, the actual data was dropped with the future and gone when the
background processor attempted to retrieve and apply it via
`check_resolved_futures`.

Here, we fix this issue by simply holding on to the `state` `Arc`s in a
separate `pending_states` `Vec` that is only pruned in
`check_resolved_futures`, ensuring any completed results are collected
first.

Signed-off-by: Elias Rohrer <dev@tnull.de>
@tnull tnull force-pushed the 2026-01-fix-gossip-verification branch from 585fd56 to 0da4883 Compare January 28, 2026 14:27

tnull commented Jan 28, 2026

> Whoops, seems there are still some more tests to fix, drafting again 🤭

... and they're passing again.

@tnull tnull marked this pull request as ready for review January 28, 2026 14:28

codecov bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 92.06349% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.11%. Comparing base (9e91b2e) to head (0da4883).
⚠️ Report is 10 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lightning/src/routing/utxo.rs | 92.06% | 23 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4348      +/-   ##
==========================================
+ Coverage   86.08%   86.11%   +0.02%     
==========================================
  Files         156      156              
  Lines      102428   102625     +197     
  Branches   102428   102625     +197     
==========================================
+ Hits        88179    88377     +198     
- Misses      11754    11755       +1     
+ Partials     2495     2493       -2     

| Flag | Coverage Δ |
|---|---|
| tests | 86.11% <92.06%> (+0.02%) ⬆️ |

@TheBlueMatt TheBlueMatt left a comment

Bleh, don't love the extra vec but you're right that it's much simpler and probably worth it. Feel free to squash (maybe with a few more rustfmt'isms fixed) and let's land this.

.read_only()
.nodes()
.get(&node_id_1)
.unwrap()
Collaborator

These are also pretty borderline (there's a few more further down).

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

Labels

None yet

Projects

Status: Goal: Merge

Development

Successfully merging this pull request may close these issues.

gossip doesnt resolve async

3 participants