Use BestBlock for chain state serialization (and somewhat parallelize init)
#4266
base: main
Conversation
👋 Thanks for assigning @joostjager as a reviewer!
Force-pushed 9fc766e to 4efd4f8 (Compare)
Oops, I forgot about this PR lol. Rebased; it should be ready for review!
On restart, LDK expects the chain to be replayed starting from where it was when objects were last serialized. This is fine in the normal case, but if there was a reorg and the node we were syncing from was either resynced or replaced, the last block we were synced to may no longer be available. As a result, it becomes impossible to figure out where the fork point is, and thus to replay the chain. Luckily, changing the block source during a reorg isn't exactly common, but we shouldn't end up with a bricked node.

To address this, `lightning-block-sync` allows the user to pass in a `Cache` which can be used to cache recent blocks and thus allow for reorg handling in this case. However, serialization for, and a reasonable default implementation of, a `Cache` was never built.

Instead, here, we start taking a different approach. To avoid developers having to persist yet another object, we move `BestBlock` to storing some number of recent block hashes. This allows us to find the fork point with just the serialized state. In conjunction with 403dc1a (which allows us to disconnect blocks without having the stored header), this should allow us to replay chain state after a reorg even if we no longer have access to the top few blocks of the old chain tip.

While we only really need to store `ANTI_REORG_DELAY` blocks (as we generally assume that any deeper reorg won't happen and thus we don't guarantee we handle it correctly), it's nice to store a few more to be able to handle more than a six-block reorg. While other parts of the codebase may not be entirely robust against such a reorg if the confirmed transactions change out from under us, it's entirely possible (and, indeed, common) for reorgs to contain nearly identical transactions.
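As a rough sketch of the shape this takes (local stand-in types only, with `[u8; 32]` in place of `BlockHash`; the real field layout is whatever the PR's diff defines):

```rust
/// Assumed value matching LDK's six-block reorg assumption; illustrative only.
const ANTI_REORG_DELAY: u32 = 6;

/// Sketch of a `BestBlock`-like struct extended with recent block hashes.
pub struct BestBlock {
	/// The hash of the current chain tip.
	pub block_hash: [u8; 32],
	/// The height of the current chain tip.
	pub height: u32,
	/// Hashes of the blocks preceding the tip, most recent first. `None`
	/// entries cover the case where we haven't yet seen enough blocks to fill
	/// every slot.
	pub previous_blocks: [Option<[u8; 32]>; ANTI_REORG_DELAY as usize * 2],
}
```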
Force-pushed 4efd4f8 to f54a087 (Compare)
joostjager left a comment
No major findings, although this part of the code is completely new to me, so I might miss things.
The PR could have been split in two: one with the reorg fix, and another with the perf optimizations. They are closely related, though.
A final comment: is `BestBlock` still a descriptive enough name now?
```rust
///
/// These ensure we can find the fork point of a reorg if our block source no longer has the
/// previous best tip after a restart.
pub previous_blocks: [Option<BlockHash>; ANTI_REORG_DELAY as usize * 2],
```
This const really can't be changed, because it would break serialization. Does that need to be noted here?
`ANTI_REORG_DELAY * 2` shows up in various locations; maybe make it a const too.
Indeed. I noted this by simply dropping the constant reference in the serialization logic so that any constant change will be a compilation failure.
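As a sketch of that approach (stand-in types, not LDK's actual `Writeable` code): the length is written as a literal `12` rather than `ANTI_REORG_DELAY as usize * 2`, so a change to the constant fails to compile here instead of silently changing the serialized format.

```rust
use std::io::Write;

// Hypothetical helper, not LDK's serialization code. The `12` is deliberately
// hard-coded: if ANTI_REORG_DELAY * 2 changes, this signature no longer
// matches and the build breaks, surfacing the compatibility question.
fn write_previous_blocks<W: Write>(
	blocks: &[Option<[u8; 32]>; 12], w: &mut W,
) -> std::io::Result<()> {
	for hash_opt in blocks.iter() {
		match hash_opt {
			Some(hash) => w.write_all(hash)?,
			// An all-zero hash marks an absent entry (see the discussion below).
			None => w.write_all(&[0u8; 32])?,
		}
	}
	Ok(())
}
```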
```rust
for hash_opt in self {
	match hash_opt {
		Some(hash) => hash.write(w)?,
		None => ([0u8; 32]).write(w)?,
```
Is there a reason to not use the standard Option serialization and go with the magic all-zeroes instead?
Less read branching felt worth it, and given something had to be implemented anyway, it seemed simpler.
I'd go for the standard solution here. Interested to hear a second reviewer's opinion.
The "standard solution" here just being to read/write the `Option<[u8; 32]>` directly in the loop rather than matching? Not sure it's worth more branching to save 5 LoC. In general, stream-based deserialization logic is super branch-intensive and generally kinda slow because of it.
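For illustration, a minimal sketch of the two read-side variants being debated, using plain `std::io` stand-ins rather than LDK's `Readable` machinery (the one-byte tag in the second variant is assumed for illustration and is not LDK's actual `Option` encoding):

```rust
use std::io::Read;

// Sentinel approach: always read 32 bytes and treat all-zeroes as "no hash".
fn read_hash_sentinel<R: Read>(r: &mut R) -> std::io::Result<Option<[u8; 32]>> {
	let mut hash = [0u8; 32];
	r.read_exact(&mut hash)?;
	Ok(if hash == [0u8; 32] { None } else { Some(hash) })
}

// Tagged approach: read a presence byte first, then the hash only if present,
// which adds a branch (and a variable-length read) per entry.
fn read_hash_tagged<R: Read>(r: &mut R) -> std::io::Result<Option<[u8; 32]>> {
	let mut tag = [0u8; 1];
	r.read_exact(&mut tag)?;
	if tag[0] == 0 {
		return Ok(None);
	}
	let mut hash = [0u8; 32];
	r.read_exact(&mut hash)?;
	Ok(Some(hash))
}
```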
```rust
	found_header = Some(*header);
	break;
}
let height = prev_best_block.height.checked_sub(height_diff).ok_or(
```
I think this can be validated just as well once outside the loop?
I mean, sure, but it seemed like cleaner code to do it in the loop, where we need to calculate the height for each block anyway?
```rust
#[cfg(not(c_bindings))]
use lightning::util::async_poll::MaybeSend;
use lightning::util::logger::Logger;
#[cfg(not(c_bindings))]
```
commit msg typo
```diff
@@ -0,0 +1 @@
+../../lightning/src/util/async_poll.rs
\ No newline at end of file
```
Does this symlinking work on Windows? I can't believe that it is necessary to use this hack. I think it can be called a hack?
AFAIU yes it does. Git used to default to not allowing it, but I think that has since changed. We already have other symlinks in the repo for our lockorder stuff and maybe crypto.
```rust
	.connect_blocks(common_ancestor, most_connected_blocks, &mut chain_poller)
	.await
	.map_err(|(e, _)| e)?;
while !most_connected_blocks.is_empty() {
```
Isn't the parallel fetching useful in ChainNotifier.connect_blocks too?
Shouldn't be all that useful/important. At runtime we should only get a block every ~ten minutes, so having two come in at once such that we need to parallel-fetch should be pretty rare. Technically, on startup we may not do a full sync and may just sync to a common tip rather than the current tip, making the first sync slower, but (a) no one does that today AFAIK and (b) connecting blocks marginally more slowly at runtime/immediately after startup isn't really a problem, and may even be a good thing by reserving CPU cycles for other tasks.
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.
The deserialization of `ChannelMonitor`, `ChannelManager`, and `OutputSweeper` is implemented for a `(BlockHash, ...)` pair rather than on the object itself. This ensures developers are pushed to think about initial chain sync after deserialization and provides the latest chain sync state conveniently at deserialization time. In the previous commit we started storing additional recent block hashes in `BestBlock` for use during initial sync, to ensure we can handle reorgs while offline if the chain source loses the reorged-out blocks. Here, we move the deserialization routines to be on a `(BestBlock, ...)` pair instead of `(BlockHash, ...)`, providing access to those recent block hashes at deserialization time.
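A toy, self-contained illustration of the pattern (local stand-in types and encoding, not the actual LDK `ReadableArgs` implementations): the chain-sync state is returned alongside the object, so the caller is handed its replay starting point in the same call.

```rust
use std::io::Read;

// Local stand-ins for illustration only.
struct BestBlock {
	block_hash: [u8; 32],
	height: u32,
}
struct Monitor {
	bytes: Vec<u8>,
}

// Reading the chain-sync state together with the object keeps the caller
// thinking about "what do I replay from?" at the moment they get the object.
fn read_monitor<R: Read>(r: &mut R) -> std::io::Result<(BestBlock, Monitor)> {
	let mut block_hash = [0u8; 32];
	r.read_exact(&mut block_hash)?;
	let mut height = [0u8; 4];
	r.read_exact(&mut height)?;
	let mut bytes = Vec::new();
	r.read_to_end(&mut bytes)?;
	Ok((BestBlock { block_hash, height: u32::from_be_bytes(height) }, Monitor { bytes }))
}
```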
In 403dc1a we converted the `Listen` disconnect semantics to only pass the fork point, rather than each block being disconnected. We did not, however, update the semantics of `lightning-block-sync`'s `Cache` to match, to reduce patch size. Here we go ahead and do so, dropping `ChainDifference::disconnected_blocks` as well, as it is no longer needed.
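As a rough illustration of the semantic change (illustrative trait names, not LDK's actual `Listen` or `Cache` definitions):

```rust
// Old style (illustrative): one callback per disconnected block, which
// requires having every disconnected header available.
trait OldChainListener {
	fn block_disconnected(&mut self, header_hash: [u8; 32], height: u32);
}

// New style (illustrative): a single callback naming the fork point, i.e. the
// highest block that is still in the best chain.
trait ChainListener {
	fn blocks_disconnected(&mut self, fork_point_hash: [u8; 32], fork_point_height: u32);
}
```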
On restart, LDK expects the chain to be replayed starting from where it was when objects were last serialized. This is fine in the normal case, but if there was a reorg and the node we were syncing from was either resynced or replaced, the last block we were synced to may no longer be available. As a result, it becomes impossible to figure out where the fork point is, and thus to replay the chain. Luckily, changing the block source during a reorg isn't exactly common, but we shouldn't end up with a bricked node.

To address this, `lightning-block-sync` allows the user to pass in a `Cache` which can be used to cache recent blocks and thus allow for reorg handling in this case. However, serialization for, and a reasonable default implementation of, a `Cache` was never built.

Instead, here, we start taking a different approach. To avoid developers having to persist yet another object, we move `BestBlock` to storing some number of recent block hashes. This allows us to find the fork point with just the serialized state.

In a previous commit, we moved deserialization of various structs to return the `BestBlock` rather than a `BlockHash`. Here we move to actually using it, taking a `BestBlock` in place of a `BlockHash` in `init::synchronize_listeners` and walking the `previous_blocks` list to find the fork point rather than relying on the `Cache`.
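The fork-point walk itself can be sketched as follows; `hash_at_height` is a hypothetical stand-in for a query against the current block source, and this is not the actual `lightning-block-sync` code:

```rust
struct BestBlock {
	block_hash: [u8; 32],
	height: u32,
	/// Hashes of the blocks preceding the tip, most recent first.
	previous_blocks: Vec<[u8; 32]>,
}

/// Walk back from the previously-serialized tip: the first stored hash that
/// the current chain still contains at the expected height is the fork point
/// we can replay (and, if needed, disconnect) from.
fn find_fork_point(
	prev_best: &BestBlock, hash_at_height: impl Fn(u32) -> Option<[u8; 32]>,
) -> Option<([u8; 32], u32)> {
	let candidates =
		std::iter::once(&prev_best.block_hash).chain(prev_best.previous_blocks.iter());
	for (height_diff, stored_hash) in candidates.enumerate() {
		let height = prev_best.height.checked_sub(height_diff as u32)?;
		if hash_at_height(height) == Some(*stored_hash) {
			return Some((*stored_hash, height));
		}
	}
	None
}
```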
In the previous commit, we moved to relying on `BestBlock::previous_blocks` to find the fork point in `lightning-block-sync`'s `init::synchronize_listeners`. Here we now drop the `Cache` parameter, as we no longer rely on it. Because we now have no reason to want a persistent `Cache`, we remove the trait from the public interface. However, to keep disconnections reliable, we return the `UnboundedCache` we built up during initial sync from `init::synchronize_listeners`, which we expect developers to pass to `SpvClient::new`.
In the previous commit we moved to hard-coding `UnboundedCache` in the `lightning-block-sync` interface. This is great, except that it's an unbounded cache that can use arbitrary amounts of memory (though never really all that much, as it's just headers that come in while we're running). Here we simply limit its size and, while we're at it, give it a more generic `HeaderCache` name.
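A minimal sketch of the bounding idea (a local stand-in keyed by height purely so evicting the oldest entry is trivial; the 1024-entry bound and the name are assumptions, not LDK's actual `HeaderCache`):

```rust
use std::collections::BTreeMap;

const MAX_CACHED_HEADERS: usize = 1024; // assumed bound, for illustration

struct BoundedHeaderCache<H> {
	// Keyed by height so evicting the oldest entry is just pop_first().
	headers: BTreeMap<u32, H>,
}

impl<H> BoundedHeaderCache<H> {
	fn new() -> Self {
		Self { headers: BTreeMap::new() }
	}
	fn insert(&mut self, height: u32, header: H) {
		self.headers.insert(height, header);
		// Drop the lowest (oldest) headers once we exceed the bound.
		while self.headers.len() > MAX_CACHED_HEADERS {
			self.headers.pop_first();
		}
	}
	fn get(&self, height: u32) -> Option<&H> {
		self.headers.get(&height)
	}
}
```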
In the next commit we'll fetch blocks during initial connection in parallel, which requires a multi-future poller. Here we add a symlink to the existing `lightning` `async_poll.rs` file, making it available in `lightning-block-sync`.
In `init::synchronize_listeners` we may end up spending a decent chunk of our time just fetching block data. Here we parallelize that step across up to 36 blocks at a time. On my node with bitcoind on localhost, the impact of this is somewhat muted by block deserialization being the bulk of the work; however, a networked bitcoind would likely change that. Even still, fetching a batch of 36 blocks in parallel happens on my node in ~615ms vs ~815ms in serial.
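A sketch of the batching pattern using the `futures` crate; `fetch_block` is a hypothetical helper standing in for a round-trip to the block source, and the batch size of 36 comes from the commit message above:

```rust
use futures::future::join_all;

const MAX_PARALLEL_FETCHES: usize = 36;

// Placeholder for a real block-source query.
async fn fetch_block(hash: [u8; 32]) -> Result<Vec<u8>, ()> {
	let _ = hash;
	Ok(Vec::new())
}

// Fetch blocks in batches of up to MAX_PARALLEL_FETCHES rather than serially,
// so the per-request latency to the block source overlaps.
async fn fetch_blocks(hashes: &[[u8; 32]]) -> Result<Vec<Vec<u8>>, ()> {
	let mut blocks = Vec::with_capacity(hashes.len());
	for batch in hashes.chunks(MAX_PARALLEL_FETCHES) {
		for block in join_all(batch.iter().map(|hash| fetch_block(*hash))).await {
			blocks.push(block?);
		}
	}
	Ok(blocks)
}
```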
In `lightning-block-sync`'s `init::synchronize_listeners`, we may have many listeners we want to do a chain diff on. When doing so, we should make sure we utilize our header cache rather than querying our chain source for every header we need for each listener. Here we do so, inserting into the cache as we do chain diffs. On my node with a bitcoind on localhost, this brings the calculate-differences step of `init::synchronize_listeners` from ~500ms to under 150ms.
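The change amounts to the usual cache-first lookup; a sketch with a hypothetical `fetch_header` closure standing in for a round-trip to the chain source:

```rust
use std::collections::HashMap;

// Check the shared cache before hitting the chain source, and record whatever
// we do fetch so later listeners' diffs find it locally.
fn header_for_hash<H: Clone>(
	cache: &mut HashMap<[u8; 32], H>, hash: [u8; 32],
	fetch_header: impl FnOnce([u8; 32]) -> Result<H, ()>,
) -> Result<H, ()> {
	if let Some(header) = cache.get(&hash) {
		return Ok(header.clone());
	}
	let header = fetch_header(hash)?;
	cache.insert(hash, header.clone());
	Ok(header)
}
```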
When `synchronize_listeners` runs, it returns a cache of the headers it needed while doing chain difference-finding. This allows us to ensure that when we start running normally we have all the recent headers in case we need them for a reorg. Sadly, in some cases it was returning a mostly-empty cache: because it was only being filled during block difference reconciliation, it would only get a block around each listener's fork point. Worse, because we were calling `disconnect_blocks` with the cache, the cache would assume we were reorging against the main chain and drop blocks we actually want. Instead, we avoid dropping blocks on `disconnect_blocks` calls and ensure we always add connected blocks to the cache.
Force-pushed f54a087 to 6bd7c74 (Compare)
While we're here, we also parallelize a few parts of the initial sync and utilize the cache to speed it up. As such, this is based on #4147, which is based on #4175.