Commit c95e141

fix(webapp): fail loud on mollifier drainer misconfiguration
The bootstrap in mollifierDrainerWorker.server.ts wrapped getMollifierDrainer() in a try/catch that logged-and-continued on any error, which absorbed the two designed-to-crash throws in initializeMollifierDrainer():

- "MollifierDrainer initialised without a buffer" (missing buffer client)
- "TRIGGER_MOLLIFIER_DRAIN_SHUTDOWN_TIMEOUT_MS must be at least ... below GRACEFUL_SHUTDOWN_TIMEOUT" (shutdown-timeout reconciliation)

Both are deploy-time mistakes: silently disabling the drainer means the gate keeps writing to the buffer, the drainer never reads, and entries TTL out in 10min. Bounded in phase 1 (monitoring-only) but customer-visible data loss in phase 2/3 where the drainer replays into engine.trigger. Better to fail loud now than retrofit the contract later.

Introduce MollifierConfigurationError for the two deterministic throws. The bootstrap's catch now rethrows that class (process crashes at module top-level → orchestrator health check fails → deploy rolls back) while still logging-and-continuing on transient errors (Redis blip during init shouldn't take the whole webapp down). instanceof + name fallback covers the Remix dev hot-reload realm edge case.

1 parent f8c4077 commit c95e141

2 files changed

Lines changed: 51 additions & 3 deletions

apps/webapp/app/v3/mollifier/mollifierDrainer.server.ts

Lines changed: 23 additions & 2 deletions
@@ -6,6 +6,25 @@ import { singleton } from "~/utils/singleton";
 import { getMollifierBuffer } from "./mollifierBuffer.server";
 import type { BufferedTriggerPayload } from "./bufferedTriggerPayload.server";

+// Distinct error class for the deterministic "fail loud at boot" throws
+// below. The bootstrap in `mollifierDrainerWorker.server.ts` catches
+// transient/init errors and logs them so an unrelated Redis blip doesn't
+// crash the webapp, but it RETHROWS this class — a misconfigured
+// shutdown timeout or missing buffer is a deploy-time mistake that
+// should fail health checks and roll back, not silently disable a
+// half-rolled-out feature.
+//
+// The `name` getter is set explicitly so cross-realm `instanceof` checks
+// (e.g. when Remix dev hot-reloads the module and the consumer keeps a
+// reference to the old class) can fall back to `error.name === ...` and
+// still recognise the marker.
+export class MollifierConfigurationError extends Error {
+  constructor(message: string) {
+    super(message);
+    this.name = "MollifierConfigurationError";
+  }
+}
+
 function initializeMollifierDrainer(): MollifierDrainer<BufferedTriggerPayload> {
   const buffer = getMollifierBuffer();
   if (!buffer) {
@@ -15,7 +34,9 @@ function initializeMollifierDrainer(): MollifierDrainer<BufferedTriggerPayload>
     // the buffer can't initialise (e.g. TRIGGER_MOLLIFIER_REDIS_HOST resolves
     // to nothing). Crashing surfaces the misconfig immediately rather
     // than silently leaving entries un-drained.
-    throw new Error("MollifierDrainer initialised without a buffer — env vars inconsistent");
+    throw new MollifierConfigurationError(
+      "MollifierDrainer initialised without a buffer — env vars inconsistent",
+    );
   }

   // Validate BEFORE start() so a misconfigured shutdown timeout fails
@@ -37,7 +58,7 @@ function initializeMollifierDrainer(): MollifierDrainer<BufferedTriggerPayload>
     env.TRIGGER_MOLLIFIER_DRAIN_SHUTDOWN_TIMEOUT_MS >=
     env.GRACEFUL_SHUTDOWN_TIMEOUT - shutdownMarginMs
   ) {
-    throw new Error(
+    throw new MollifierConfigurationError(
       `TRIGGER_MOLLIFIER_DRAIN_SHUTDOWN_TIMEOUT_MS (${env.TRIGGER_MOLLIFIER_DRAIN_SHUTDOWN_TIMEOUT_MS}) must be at least ${shutdownMarginMs}ms below GRACEFUL_SHUTDOWN_TIMEOUT (${env.GRACEFUL_SHUTDOWN_TIMEOUT}); otherwise the primary's hard exit shadows the drainer's deadline.`,
     );
   }

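The shutdown-timeout check in the hunk above can be exercised in isolation. The following is a minimal sketch, not the webapp's code: `assertDrainTimeoutFits` is a hypothetical helper (the real comparison is inline in `initializeMollifierDrainer()` and reads from `env`), the error class is a local stand-in for the one this commit adds, and the millisecond values are made up for illustration.

```typescript
// Local stand-in for the error class introduced by this commit.
class MollifierConfigurationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "MollifierConfigurationError";
  }
}

// Hypothetical helper. Mirrors the comparison in the diff: the drainer's
// shutdown deadline must sit at least `marginMs` below the process-wide
// graceful shutdown timeout, otherwise the hard exit fires first.
function assertDrainTimeoutFits(
  drainTimeoutMs: number,
  gracefulTimeoutMs: number,
  marginMs: number,
): void {
  if (drainTimeoutMs >= gracefulTimeoutMs - marginMs) {
    throw new MollifierConfigurationError(
      `drain timeout (${drainTimeoutMs}) must be at least ${marginMs}ms below graceful timeout (${gracefulTimeoutMs})`,
    );
  }
}

// Illustrative values only: 20000 < 30000 - 1000, so this passes.
assertDrainTimeoutFits(20000, 30000, 1000);

// 29500 >= 30000 - 1000, so this throws the configuration error.
try {
  assertDrainTimeoutFits(29500, 30000, 1000);
} catch (error) {
  console.log((error as Error).name);
}
```

Note that the boundary is inclusive: a drain timeout exactly equal to the graceful timeout minus the margin is rejected, matching the `>=` in the diff.
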
apps/webapp/app/v3/mollifierDrainerWorker.server.ts

Lines changed: 28 additions & 1 deletion
@@ -1,7 +1,10 @@
 import { env } from "~/env.server";
 import { logger } from "~/services/logger.server";
 import { signalsEmitter } from "~/services/signals.server";
-import { getMollifierDrainer } from "./mollifier/mollifierDrainer.server";
+import {
+  getMollifierDrainer,
+  MollifierConfigurationError,
+} from "./mollifier/mollifierDrainer.server";

 declare global {
   // eslint-disable-next-line no-var
@@ -79,6 +82,30 @@ export function initMollifierDrainerWorker(): void {
       drainer.start();
     }
   } catch (error) {
+    // Deterministic misconfig (shutdown-timeout vs GRACEFUL_SHUTDOWN_TIMEOUT,
+    // missing buffer client) is a deploy-time mistake the operator must
+    // see immediately — rethrow so the process crashes, health checks
+    // fail, and the orchestrator rolls the deploy back. Phase 1 is
+    // monitoring-only and the silent-fallback was tempting, but Phase 2/3
+    // make the drainer the source of truth for diverted triggers, where a
+    // silently-disabled drainer means data loss. Better to fail loud now
+    // than retrofit later.
+    //
+    // We accept both `instanceof` and `error.name === ...` so Remix dev
+    // hot-reload (where the consumer can hold a stale class reference)
+    // still recognises the marker.
+    if (
+      error instanceof MollifierConfigurationError ||
+      (error instanceof Error && error.name === "MollifierConfigurationError")
+    ) {
+      logger.error("Mollifier drainer misconfiguration — failing loud", {
+        error: error.message,
+      });
+      throw error;
+    }
+    // Anything else (transient Redis blip, unexpected runtime error) is
+    // logged but kept non-fatal — the rest of the webapp shouldn't go
+    // down because the buffer's Redis cluster is briefly unreachable.
     logger.error("Failed to initialise mollifier drainer", { error });
   }
 }

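To see why the catch block checks `error.name` as well as `instanceof`, the stale-class situation can be simulated without Remix. This is a rough sketch under stated assumptions: `makeConfigErrorClass`, `isConfigurationError`, and `bootstrap` are hypothetical names, and calling the factory twice only approximates a hot reload by producing two classes with the same name but different identities. In both the sketch and the commit, `name` is a plain property assigned in the constructor.

```typescript
// Hypothetical factory: each call returns a distinct class identity, which
// approximates what a re-evaluated module exports after a dev hot reload.
function makeConfigErrorClass() {
  return class MollifierConfigurationError extends Error {
    constructor(message: string) {
      super(message);
      this.name = "MollifierConfigurationError";
    }
  };
}

const ClassBeforeReload = makeConfigErrorClass(); // stale reference held by the consumer
const ClassAfterReload = makeConfigErrorClass(); // class the reloaded module throws

// Same two-part test as the catch block: identity first, then name fallback.
function isConfigurationError(error: unknown): boolean {
  return (
    error instanceof ClassBeforeReload ||
    (error instanceof Error && error.name === "MollifierConfigurationError")
  );
}

// Hypothetical bootstrap wrapper: rethrow misconfiguration, swallow the rest.
function bootstrap(init: () => void): void {
  try {
    init();
  } catch (error) {
    if (isConfigurationError(error)) {
      throw error; // fail loud so the deploy can roll back
    }
    console.error("Failed to initialise (non-fatal)", error);
  }
}

const thrownAfterReload = new ClassAfterReload("missing buffer");
console.log(thrownAfterReload instanceof ClassBeforeReload); // false: identities differ
console.log(isConfigurationError(thrownAfterReload)); // true: name fallback matches
console.log(isConfigurationError(new Error("ECONNREFUSED"))); // false: treated as transient
```

The trade-off of the name fallback is that any `Error` whose `name` happens to be `"MollifierConfigurationError"` is treated as fatal, which seems acceptable here because the name is specific to this module.
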