Commit ead1e5a

feat(webapp): reload LLM pricing registry on Redis pub/sub (#3534)
## Summary

Adds a Redis pub/sub reload path to the webapp's in-memory LLM pricing registry. When enabled on a process, the registry reloads from the database whenever a publish lands on the configured channel, instead of waiting for the existing 5-minute interval. Lets pricing/model changes propagate to cost enrichment within seconds.

Subscription is **off by default** and opt-in per process. Only OTel-ingesting services need real-time freshness; dashboard and worker services run fine on the periodic interval and shouldn't pile onto each publish with a full-table reload.

## Design

When `LLM_PRICING_RELOAD_PUBSUB_ENABLED=true`, subscribes via `createRedisClient` against `COMMON_WORKER_REDIS_*` and listens on `LLM_PRICING_RELOAD_CHANNEL` (default `llm-registry:reload`). The 5-minute periodic reload stays as a backstop, and a SIGTERM/SIGINT handler closes the subscription cleanly.

The publisher side lives outside this PR: any process running in the same Redis namespace can trigger a reload by `PUBLISH llm-registry:reload <anything>`.

Includes a `.server-changes/` note for the changelog.

### Debounced reload

Bursts of publishes are coalesced. The first publish schedules a reload at T+`LLM_PRICING_RELOAD_DEBOUNCE_MS` (default 1s); subsequent publishes during that window are no-ops because the trailing reload picks up everything when it queries the DB. Bounds reload rate to at most 1 per debounce window regardless of publisher chattiness, so a runaway upstream publisher can't fan out into a flood of full-table-scan reloads.

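The coalescing rule described under "Debounced reload" can be sketched in isolation. This is a minimal illustration, not the code in the commit: `createCoalescedTrigger`, the `Timers` interface, and `realTimers` are hypothetical names, and the injectable timers exist only so the behavior can be exercised without waiting on a real clock.

```typescript
// Hypothetical helper for illustration; the commit inlines equivalent logic.
type TimerHandle = unknown;

interface Timers {
  setTimeout(fn: () => void, ms: number): TimerHandle;
  clearTimeout(handle: TimerHandle): void;
}

// Real timers by default; a test can pass a fake to control time.
const realTimers: Timers = {
  setTimeout: (fn, ms) => setTimeout(fn, ms),
  clearTimeout: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

function createCoalescedTrigger(
  run: () => void,
  delayMs: number,
  timers: Timers = realTimers
) {
  let scheduled = false;
  let handle: TimerHandle;

  return {
    // Call once per publish. Only the first call in a window schedules work;
    // later calls are no-ops until the scheduled run has started.
    trigger(): void {
      if (scheduled) return;
      scheduled = true;
      handle = timers.setTimeout(() => {
        scheduled = false; // reopen the window before running
        run();
      }, delayMs);
    },
    // Call on shutdown so a scheduled run never fires.
    cancel(): void {
      if (!scheduled) return;
      timers.clearTimeout(handle);
      scheduled = false;
    },
  };
}
```

Because the window opens on the first publish and is not extended by later ones, a steady stream of publishes still produces a reload every `delayMs`, rather than postponing it indefinitely as a classic trailing debounce would.
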
## Test plan

- [ ] With `LLM_PRICING_RELOAD_PUBSUB_ENABLED=false` (default): `redis-cli PUBSUB NUMSUB llm-registry:reload` returns `0` while the webapp is up
- [ ] With it set to `true`: returns `>= 1`
- [ ] `redis-cli PUBLISH llm-registry:reload test` returns `1` (one subscriber received) on a subscribed process
- [ ] Mutate an `LlmModel` row externally, publish on the channel, observe the registry's match() picks up the change without waiting for the 5-min tick
- [ ] Publish 100x in rapid succession; confirm only one reload fires within the debounce window

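
The commit message notes that the publisher side lives outside this PR. As a rough sketch of what a publisher could look like, assuming an ioredis-style client whose `publish(channel, message)` resolves to the number of subscribers that received the message: `RedisPublisher`, `notifyPricingChanged`, and `DEFAULT_RELOAD_CHANNEL` are hypothetical names, not part of the commit.

```typescript
// Minimal surface this sketch needs from a Redis client (assumed ioredis-style).
interface RedisPublisher {
  publish(channel: string, message: string): Promise<number>;
}

// Matches the default of LLM_PRICING_RELOAD_CHANNEL in the diff.
const DEFAULT_RELOAD_CHANNEL = "llm-registry:reload";

// Hypothetical helper: call after writing pricing/model rows to the database.
async function notifyPricingChanged(
  redis: RedisPublisher,
  channel: string = DEFAULT_RELOAD_CHANNEL
): Promise<number> {
  // The subscriber only checks the channel name and ignores the payload,
  // so any string works; a timestamp is used here for easier debugging.
  const receivers = await redis.publish(channel, String(Date.now()));
  // 0 means no process was subscribed; those processes will still pick up
  // the change on the periodic reload interval.
  return receivers;
}
```

The write to the database should be committed before publishing, since the subscriber reloads by querying the database and carries no data in the message itself.
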
1 parent f7a2bc7 commit ead1e5a

3 files changed

Lines changed: 80 additions & 7 deletions

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+---
+area: webapp
+type: improvement
+---
+
+The LLM pricing registry now reloads from the database whenever a publish lands on `LLM_PRICING_RELOAD_CHANNEL` on the worker Redis, instead of waiting for the next 5-minute interval. LLM model and pricing changes reflect in cost enrichment within seconds.

apps/webapp/app/env.server.ts

Lines changed: 8 additions & 0 deletions
@@ -1427,6 +1427,14 @@ const EnvironmentSchema = z
     // LLM cost tracking
     LLM_COST_TRACKING_ENABLED: BoolEnv.default(true),
     LLM_PRICING_RELOAD_INTERVAL_MS: z.coerce.number().int().default(5 * 60 * 1000), // 5 minutes
+    LLM_PRICING_RELOAD_CHANNEL: z.string().default("llm-registry:reload"),
+    LLM_PRICING_RELOAD_DEBOUNCE_MS: z.coerce.number().int().default(1000),
+    // Whether to subscribe this process to the LLM_PRICING_RELOAD_CHANNEL.
+    // Default off — only OTel-ingesting services need real-time pricing
+    // freshness; dashboard/worker processes are fine on the existing
+    // 5-minute periodic reload. In multi-service deployments, set this to
+    // true on the span-ingesting services.
+    LLM_PRICING_RELOAD_PUBSUB_ENABLED: BoolEnv.default(false),
     LLM_PRICING_SEED_ON_STARTUP: BoolEnv.default(false),
     LLM_PRICING_READY_TIMEOUT_MS: z.coerce.number().int().default(500),
     LLM_METRICS_BATCH_SIZE: z.coerce.number().int().default(5000),
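
To make the defaults in this hunk concrete, here is a dependency-free sketch of how the reload-related variables resolve when unset. It imitates the schema rather than using zod, and `readReloadConfig` and `ReloadConfig` are hypothetical names. One assumption to flag: the sketch treats only the literal string `"true"` as enabling pub/sub, whereas the real `BoolEnv` helper is not shown in the diff and may accept other spellings.

```typescript
interface ReloadConfig {
  intervalMs: number;
  channel: string;
  debounceMs: number;
  pubsubEnabled: boolean;
}

// Hypothetical stand-in for the zod schema entries above.
function readReloadConfig(env: Record<string, string | undefined>): ReloadConfig {
  // Mirrors z.coerce.number().int().default(fallback): coerce with Number(),
  // reject non-integers (including NaN), and fall back only when the variable is unset.
  const int = (name: string, fallback: number): number => {
    const raw = env[name];
    if (raw === undefined) return fallback;
    const value = Number(raw);
    if (!Number.isInteger(value)) {
      throw new Error(`${name} must be an integer, got "${raw}"`);
    }
    return value;
  };

  return {
    intervalMs: int("LLM_PRICING_RELOAD_INTERVAL_MS", 5 * 60 * 1000),
    channel: env.LLM_PRICING_RELOAD_CHANNEL ?? "llm-registry:reload",
    debounceMs: int("LLM_PRICING_RELOAD_DEBOUNCE_MS", 1000),
    // Assumption: only "true" enables the subscription in this sketch.
    pubsubEnabled: env.LLM_PRICING_RELOAD_PUBSUB_ENABLED === "true",
  };
}
```

With an empty environment this yields a 5-minute interval, the `llm-registry:reload` channel, a 1-second debounce, and pub/sub disabled, which matches the "off by default" behavior the commit describes.
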

apps/webapp/app/v3/llmPricingRegistry.server.ts

Lines changed: 66 additions & 7 deletions
@@ -1,7 +1,9 @@
 import { ModelPricingRegistry, seedLlmPricing } from "@internal/llm-model-catalog";
 import { prisma, $replica } from "~/db.server";
 import { env } from "~/env.server";
+import { logger } from "~/services/logger.server";
 import { signalsEmitter } from "~/services/signals.server";
+import { createRedisClient } from "~/redis.server";
 import { singleton } from "~/utils/singleton";
 import { setLlmPricingRegistry } from "./utils/enrichCreatableEvents.server";
 
@@ -27,20 +29,77 @@ export const llmPricingRegistry = singleton("llmPricingRegistry", () => {
       console.error("Failed to initialize LLM pricing registry", err);
     });
 
-  // Periodic reload
+  // Periodic reload (backstop for the pub/sub path below)
   const reloadInterval = env.LLM_PRICING_RELOAD_INTERVAL_MS;
   const interval = setInterval(() => {
     registry.reload().catch((err) => {
       console.error("Failed to reload LLM pricing registry", err);
     });
   }, reloadInterval);
 
-  signalsEmitter.on("SIGTERM", () => {
-    clearInterval(interval);
-  });
-  signalsEmitter.on("SIGINT", () => {
-    clearInterval(interval);
-  });
+  // Pub/sub reload is opt-in per process (default off). Without it, the
+  // registry stays accurate via the existing 5-minute interval. Enable on
+  // the OTel-ingesting services where pricing freshness directly affects
+  // span cost enrichment; dashboard and worker services don't need it and
+  // shouldn't pile onto each publish with a full-table reload.
+  if (env.LLM_PRICING_RELOAD_PUBSUB_ENABLED) {
+    const subscriber = createRedisClient("llm-pricing:subscriber", {
+      keyPrefix: "llm-pricing:subscriber:",
+      host: env.COMMON_WORKER_REDIS_HOST,
+      port: env.COMMON_WORKER_REDIS_PORT,
+      username: env.COMMON_WORKER_REDIS_USERNAME,
+      password: env.COMMON_WORKER_REDIS_PASSWORD,
+      tlsDisabled: env.COMMON_WORKER_REDIS_TLS_DISABLED === "true",
+      clusterMode: env.COMMON_WORKER_REDIS_CLUSTER_MODE_ENABLED === "1",
+    });
+
+    subscriber.subscribe(env.LLM_PRICING_RELOAD_CHANNEL).catch((err) => {
+      logger.warn("Failed to subscribe to LLM pricing reload channel", {
+        channel: env.LLM_PRICING_RELOAD_CHANNEL,
+        error: err instanceof Error ? err.message : String(err),
+      });
+    });
+
+    // Coalesce reload calls so a burst of publishes only triggers one
+    // reload. The first publish schedules a reload at
+    // T+LLM_PRICING_RELOAD_DEBOUNCE_MS; subsequent publishes during that
+    // window are no-ops because the trailing reload picks up everything
+    // when it queries the DB. Bounds reload rate to at most 1 per debounce
+    // window regardless of publisher chattiness.
+    const debounceMs = env.LLM_PRICING_RELOAD_DEBOUNCE_MS;
+    let pendingReloadTimer: NodeJS.Timeout | null = null;
+
+    function scheduleReload() {
+      if (pendingReloadTimer) return;
+      pendingReloadTimer = setTimeout(() => {
+        pendingReloadTimer = null;
+        registry.reload().catch((err) => {
+          logger.warn("Failed to reload LLM pricing registry from pub/sub", {
+            error: err instanceof Error ? err.message : String(err),
+          });
+        });
+      }, debounceMs);
+    }
+
+    subscriber.on("message", (channel) => {
+      if (channel !== env.LLM_PRICING_RELOAD_CHANNEL) return;
+      scheduleReload();
+    });
+
+    signalsEmitter.on("SIGTERM", () => {
+      clearInterval(interval);
+      if (pendingReloadTimer) clearTimeout(pendingReloadTimer);
+      void subscriber.quit().catch(() => {});
+    });
+    signalsEmitter.on("SIGINT", () => {
+      clearInterval(interval);
+      if (pendingReloadTimer) clearTimeout(pendingReloadTimer);
+      void subscriber.quit().catch(() => {});
+    });
+  } else {
+    signalsEmitter.on("SIGTERM", () => clearInterval(interval));
+    signalsEmitter.on("SIGINT", () => clearInterval(interval));
+  }
 
   return registry;
 });
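
The SIGTERM and SIGINT handlers in this hunk perform the same teardown: stop the periodic interval, cancel any pending debounced reload, and close the subscriber. One possible way to express that once is sketched below. `onShutdown` and `SignalSource` are hypothetical names, and the sketch assumes only that the signals emitter exposes an `on(event, listener)` method, as the calls in the diff suggest. The run-once guard is an addition of the sketch, not behavior taken from the commit.

```typescript
// Minimal surface assumed of the signals emitter used in the diff.
interface SignalSource {
  on(event: string, listener: () => void): unknown;
}

// Hypothetical helper: registers one cleanup for both shutdown signals.
function onShutdown(signals: SignalSource, cleanup: () => void): void {
  let cleanedUp = false;
  const handler = () => {
    if (cleanedUp) return; // a second signal does not repeat the teardown
    cleanedUp = true;
    cleanup();
  };
  signals.on("SIGTERM", handler);
  signals.on("SIGINT", handler);
}

// Usage, with the names from the diff standing in for real values:
//
//   onShutdown(signalsEmitter, () => {
//     clearInterval(interval);
//     if (pendingReloadTimer) clearTimeout(pendingReloadTimer);
//     void subscriber.quit().catch(() => {});
//   });
```

The commit's inline handlers are already safe to run twice, since clearing a timer is idempotent and the rejection from a repeated `quit()` is swallowed, so the guard here is a tidiness choice rather than a fix.
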
