Store Durable Object alarms per-namespace on disk#6104
Open
threepointone wants to merge 1 commit intomainfrom
Open
Store Durable Object alarms per-namespace on disk#6104threepointone wants to merge 1 commit intomainfrom
threepointone wants to merge 1 commit intomainfrom
Conversation
Previously, all alarms were stored in a single global in-memory SQLite database, meaning they were lost on process restart. This made it impossible to test alarm resiliency scenarios. Move alarm scheduler ownership from Server into ActorNamespace, so each DO namespace gets its own AlarmScheduler backed by metadata.sqlite in the namespace's storage directory. On-disk namespaces get persistent alarms; in-memory namespaces get in-memory alarms. No new configuration is needed -- alarm storage follows the existing durableObjectStorage setting. Ref: #605 (comment) Co-authored-by: Cursor <cursoragent@cursor.com>
|
All contributors have signed the CLA ✍️ ✅ |
Author
|
I have read the CLA Document and I hereby sign the CLA |
Author
|
recheck |
Author
|
the workers-sdk test failure seems to be just about fixing the file count on the workers-sdk side |
Contributor
|
/bigbonk roast this PR. Hi Sunil. testing bonk |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
(thanks opus)
Alarms in workerd are currently stored in a single global in-memory SQLite database. This means all scheduled alarms are lost on process restart, which prevents testing alarm resiliency scenarios and doesn't match production behavior.
This PR moves alarm scheduler ownership from
ServerintoActorNamespace, so each DO namespace gets its ownAlarmSchedulerbacked bymetadata.sqlitein the namespace's storage directory. On-disk namespaces get persistent alarms; in-memory namespaces get in-memory alarms.Follows up on the design discussion in #605 (comment).
Design decisions
Per-namespace, not global. Kenton's original feedback on #605 was that a single global alarm database is problematic: it decouples alarm storage from namespace data, it makes splitting/combining configs lossy, and it creates the confusing possibility of on-disk DOs with in-memory alarms (or vice versa). Each namespace now owns its scheduler and stores alarms alongside its actor
.sqlitefiles.No new configuration. Alarm storage mode is inferred from the existing
durableObjectStoragesetting. If the namespace useslocalDisk, alarms go on disk. If it'sinMemoryornone, alarms stay in memory. This was an explicit goal from the #605 discussion -- no new config knobs needed.metadata.sqliteas the filename. Named generically (rather thanalarms.sqlite) so it can hold other per-namespace metadata in the future, as Kenton suggested.AlarmSchedulerclass unchanged. The class already accepted a VFS + path in its constructor, so no API changes were needed. The only change is where schedulers are created and who owns them.Changes
server.h: Removed globalAlarmSchedulermember,startAlarmScheduler()declaration, and thealarm-scheduler.hinclude (moved to.c++).server.c++:ActorNamespacenow owns akj::Maybe<kj::Own<AlarmScheduler>>. Created inlink()using the namespace's own VFS (on-disk) or an in-memory VFS (fallback). The namespace self-registers with its scheduler at link time. Removed the global scheduler,LinkedIoChannels::alarmScheduler, and all related wiring.server-test.c++: Added a new "Durable Object alarm persistence (on disk)" test that sets an alarm, tears down the server, restarts with the same storage directory, and verifies the alarm survived. Updated two existing tests whose file-count assertions now includemetadata.sqlite.Notes for reviewers
alarm-scheduler.handalarm-scheduler.c++are completely untouched.kj::systemPreciseCalendarClock()call in theActorNamespaceconstructor is a pre-existing pattern -- the oldstartAlarmScheduler()called it the same way. Threading a calendar clock through theServerconstructor would be cleaner but significantly more churn for no immediate benefit.ActorNamespacematters:ownAlarmScheduleris declared beforeactorsso it outlives all actors that reference it (same constraint as the existingactorStoragefield).thiscapture in theregisterNamespacelambda is safe because the scheduler is owned by the namespace and destroyed first. This matches the original pattern which captured&actorNs.