Store Durable Object alarms per-namespace on disk #6104

Open

threepointone wants to merge 1 commit into main from alarm-storage

Conversation

@threepointone

(thanks opus)

Alarms in workerd are currently stored in a single global in-memory SQLite database. This means all scheduled alarms are lost on process restart, which prevents testing alarm resiliency scenarios and doesn't match production behavior.

This PR moves alarm scheduler ownership from Server into ActorNamespace, so each DO namespace gets its own AlarmScheduler backed by metadata.sqlite in the namespace's storage directory. On-disk namespaces get persistent alarms; in-memory namespaces get in-memory alarms.

Follows up on the design discussion in #605 (comment).

Design decisions

Per-namespace, not global. Kenton's original feedback on #605 was that a single global alarm database is problematic: it decouples alarm storage from namespace data, it makes splitting/combining configs lossy, and it creates the confusing possibility of on-disk DOs with in-memory alarms (or vice versa). Each namespace now owns its scheduler and stores alarms alongside its actor .sqlite files.

No new configuration. Alarm storage mode is inferred from the existing durableObjectStorage setting. If the namespace uses localDisk, alarms go on disk. If it's inMemory or none, alarms stay in memory. This was an explicit goal from the #605 discussion -- no new config knobs needed.
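To make the inference concrete, here is a hedged sketch of the relevant config fragment. Field names follow my recollection of workerd's `workerd.capnp` schema (`durableObjectStorage` is a union with `none`, `inMemory`, and `localDisk` variants); verify against the actual schema, and the namespace/class names are placeholders.

```capnp
worker = (
  durableObjectNamespaces = [(className = "MyActor", uniqueKey = "...")],

  # With localDisk storage, alarms now land in metadata.sqlite in the same
  # directory as the actors' .sqlite files -- no separate alarm setting.
  durableObjectStorage = (localDisk = "do-disk"),

  # With inMemory (or none), alarms stay in memory, matching the actor data:
  # durableObjectStorage = (inMemory = void),
)
```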

metadata.sqlite as the filename. Named generically (rather than alarms.sqlite) so it can hold other per-namespace metadata in the future, as Kenton suggested.

AlarmScheduler class unchanged. The class already accepted a VFS + path in its constructor, so no API changes were needed. The only change is where schedulers are created and who owns them.

Changes

  • server.h: Removed global AlarmScheduler member, startAlarmScheduler() declaration, and the alarm-scheduler.h include (moved to .c++).
  • server.c++: ActorNamespace now owns a kj::Maybe<kj::Own<AlarmScheduler>>. Created in link() using the namespace's own VFS (on-disk) or an in-memory VFS (fallback). The namespace self-registers with its scheduler at link time. Removed the global scheduler, LinkedIoChannels::alarmScheduler, and all related wiring.
  • server-test.c++: Added a new "Durable Object alarm persistence (on disk)" test that sets an alarm, tears down the server, restarts with the same storage directory, and verifies the alarm survived. Updated two existing tests whose file-count assertions now include metadata.sqlite.

Notes for reviewers

  • alarm-scheduler.h and alarm-scheduler.c++ are completely untouched.
  • The kj::systemPreciseCalendarClock() call in the ActorNamespace constructor is a pre-existing pattern -- the old startAlarmScheduler() called it the same way. Threading a calendar clock through the Server constructor would be cleaner but significantly more churn for no immediate benefit.
  • Declaration order in ActorNamespace matters: ownAlarmScheduler is declared before actors so it outlives all actors that reference it (same constraint as the existing actorStorage field).
  • The this capture in the registerNamespace lambda is safe because the scheduler is owned by the namespace and destroyed first. This matches the original pattern which captured &actorNs.

Previously, all alarms were stored in a single global in-memory SQLite
database, meaning they were lost on process restart. This made it
impossible to test alarm resiliency scenarios.

Move alarm scheduler ownership from Server into ActorNamespace, so each
DO namespace gets its own AlarmScheduler backed by metadata.sqlite in
the namespace's storage directory. On-disk namespaces get persistent
alarms; in-memory namespaces get in-memory alarms. No new configuration
is needed -- alarm storage follows the existing durableObjectStorage
setting.

Ref: #605 (comment)
Co-authored-by: Cursor <cursoragent@cursor.com>
@threepointone threepointone requested review from a team as code owners February 18, 2026 14:22
@github-actions

github-actions bot commented Feb 18, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@threepointone
Author

I have read the CLA Document and I hereby sign the CLA

@threepointone
Author

recheck

github-actions bot added a commit that referenced this pull request Feb 18, 2026
@kentonv kentonv self-requested a review February 18, 2026 14:40
@threepointone
Author

The workers-sdk test failure seems to be just about fixing the file count on the workers-sdk side. I don't understand the Windows failure, though.

@southpolesteve
Contributor

/bigbonk roast this PR. Hi Sunil. testing bonk
