Fix shutdown race: abort background tasks before closing durability #4581
Open
clockwork-labs-bot wants to merge 1 commit into master from
The view_cleanup_task runs with_auto_commit() on a loop, which calls request_durability(). If db.shutdown() closes the durability channel before the task is aborted (in Host::drop), a request_durability() call panics with 'durability actor vanished'. On Windows, this can crash the server process. Fix: abort all background tasks (view_cleanup, disk_metrics, tx_metrics) before calling db.shutdown(), so they cannot race with durability channel closure. Fixes flaky test_all_templates failures on Windows CI.
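The race described above can be reproduced in miniature with standard-library threads and channels. This is a sketch under stated assumptions, not SpacetimeDB code: std::thread stands in for a tokio task, a std::sync::mpsc channel stands in for the durability channel (dropping the receiver models db.shutdown()), and spawn_cleanup_task, old_order_panics, and new_order_panics are hypothetical names. The old order forces the problematic interleaving by letting the task tick again after the channel has closed; the new order stops and joins the task first.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for view_cleanup_task: every tick "commits" and
/// requests durability, panicking if the durability receiver is gone.
fn spawn_cleanup_task(durability: Sender<u64>, stop: Arc<AtomicBool>) -> JoinHandle<()> {
    thread::spawn(move || {
        let mut offset = 0;
        while !stop.load(Ordering::Acquire) {
            offset += 1;
            // Sending fails once the receiver (the "durability actor") is dropped.
            durability.send(offset).expect("durability actor vanished");
            thread::sleep(Duration::from_millis(1));
        }
    })
}

/// Old order with the interleaving forced: the channel closes (db.shutdown)
/// while the task is still running, so its next tick panics.
/// Returns true if the task panicked.
pub fn old_order_panics() -> bool {
    let (tx, rx) = mpsc::channel();
    let stop = Arc::new(AtomicBool::new(false));
    let task = spawn_cleanup_task(tx, Arc::clone(&stop));
    drop(rx); // db.shutdown(): the durability receiver is gone
    // The abort (Host::drop) has not happened yet, so the task keeps ticking
    // until a send fails; join() then reports the panic.
    task.join().is_err()
}

/// New order: stop and wait for the task first, then close the channel.
/// Returns true if the task panicked.
pub fn new_order_panics() -> bool {
    let (tx, rx) = mpsc::channel();
    let stop = Arc::new(AtomicBool::new(false));
    let task = spawn_cleanup_task(tx, Arc::clone(&stop));
    stop.store(true, Ordering::Release); // abort background tasks
    let panicked = task.join().is_err(); // wait until the loop has exited
    drop(rx); // db.shutdown(): nothing is left to request durability
    panicked
}
```

In this model old_order_panics() returns true and new_order_panics() returns false. Joining the task guarantees it has finished before the channel closes; tokio's JoinHandle::abort() only requests cancellation, which takes effect the next time the task yields, so the sketch is stricter than a bare abort.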
kim (Contributor) requested changes on Mar 7, 2026:
This is nonsense -- the panic doesn't crash the server, it only panics a tokio task. Like most of these bot analyses, the premise is just completely wrong.
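The reviewer's point can be illustrated with a small sketch. It uses std::thread as a stand-in for a tokio task (an assumption about the analogy, not SpacetimeDB code), and run_isolated is a hypothetical helper. A panic in the spawned thread is reported through join() as an Err and does not unwind into the caller, much as awaiting a tokio JoinHandle for a panicked task yields a JoinError instead of panicking the awaiting task. This holds under the default panic = "unwind" strategy; a build with panic = "abort" in its Cargo profile would terminate the whole process on any panic.

```rust
use std::thread;

/// Runs `work` on its own thread (standing in for a spawned tokio task) and
/// returns true if that thread panicked. The panic stays on the spawned
/// thread; the caller keeps running and only sees an Err from join().
pub fn run_isolated<F>(work: F) -> bool
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(work).join().is_err()
}
```

Calling run_isolated(|| panic!("durability actor vanished")) returns true, and the calling thread continues normally afterwards.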
That said, if we prefer the stack trace to go away for noisiness reasons, the right way to do that is to drop the host before shutting down the database.
It will not make the test failure go away.
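A minimal sketch of the reviewer's alternative, dropping the host before shutting down the database, using the same toy model: std threads stand in for tokio tasks, a std::sync::mpsc receiver stands in for the durability actor, and Host, Host::start, and drop_host_then_shutdown are hypothetical names. Because Host's Drop implementation stops and joins its task, dropping it first means no task is left to request durability once the channel closes.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical Host: owns a background task and stops it in Drop,
/// as the PR says Host::drop does for the real background tasks.
pub struct Host {
    stop: Arc<AtomicBool>,
    task: Option<JoinHandle<()>>,
    task_panicked: Arc<AtomicBool>,
}

impl Host {
    pub fn start(durability: Sender<u64>, task_panicked: Arc<AtomicBool>) -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&stop);
        let task = thread::spawn(move || {
            let mut offset = 0;
            while !flag.load(Ordering::Acquire) {
                offset += 1;
                durability.send(offset).expect("durability actor vanished");
                thread::sleep(Duration::from_millis(1));
            }
        });
        Host { stop, task: Some(task), task_panicked }
    }
}

impl Drop for Host {
    fn drop(&mut self) {
        self.stop.store(true, Ordering::Release);
        if let Some(task) = self.task.take() {
            // Record whether the task died from a panic instead of exiting its loop.
            if task.join().is_err() {
                self.task_panicked.store(true, Ordering::Release);
            }
        }
    }
}

/// Reviewer's suggested order: drop the host, then shut down the database.
/// Returns true if the background task panicked.
pub fn drop_host_then_shutdown() -> bool {
    let (tx, rx) = mpsc::channel::<u64>();
    let task_panicked = Arc::new(AtomicBool::new(false));
    let host = Host::start(tx, Arc::clone(&task_panicked));
    thread::sleep(Duration::from_millis(5)); // let the task tick a few times
    drop(host); // Drop stops and joins the task
    drop(rx); // db.shutdown(): close the durability channel last
    task_panicked.load(Ordering::Acquire)
}
```

In this model drop_host_then_shutdown() returns false: the task exits its loop cleanly and never sends on a closed channel.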
Summary
Fixes a race condition where the view_cleanup_task can panic with "durability actor vanished" during database shutdown, crashing the server on Windows.
Root Cause
The shutdown sequence in HostController::exit_module was:
1. module.exit().await
2. db.shutdown().await (closes the durability channel)
3. Host::drop (aborts background tasks: view cleanup, metrics)
The view_cleanup_task runs with_auto_commit() on a loop, which calls request_durability(). If the task fires between steps 2 and 3, request_durability() panics because the durability channel is already closed.
Fix
Abort all background tasks before calling db.shutdown(), so they cannot race with the closing of the durability channel:
1. module.exit().await
2. Abort background tasks (view cleanup, disk metrics, tx metrics)
3. db.shutdown().await (closes the durability channel)
The tasks are still aborted in Host::drop, but that is now a no-op because they have already been aborted.
Testing
This fixes flaky test_all_templates failures on Windows CI, such as: https://github.com/clockworklabs/SpacetimeDB/actions/runs/22745918903/job/65969841716?pr=4376
The failure pattern: the server panics at durability.rs:96 ("durability actor vanished"), the server process dies, and all subsequent template tests fail with "connection refused".