Commit 64d223d

docs(ai-chat): add how-it-works page
A conceptual deep dive sitting between Overview and Backend. Covers the chat session as a pair of S2 channels and a long-lived task, the run's lifecycle states (Cold start, Streaming, Idle, Suspended, Resuming, Continuation, Closed), a step-by-step trace of one turn, and the three persistence layers that survive idle gaps, deploys, refreshes, and crashes: the engine checkpoint (CRIU today, full Firecracker VM snapshots on the new microVM compute), the chat-level S3 snapshot, and the browser's lastEventId cursor. Closes with warm-vs-resumed-vs-continuation timings, a hooks pointer table, a good-fit subsection, and a single-turn Mermaid sequence diagram.
1 parent 9e11b2b commit 64d223d

2 files changed

Lines changed: 227 additions & 0 deletions

File tree

docs/ai-chat/how-it-works.mdx

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
1+
---
2+
title: "How it works"
3+
sidebarTitle: "How it works"
4+
description: "End-to-end mechanics of a chat.agent turn: the two durable channels per session, the long-lived task that reads and writes them, and how a chat survives refreshes, deploys, and idle gaps."
5+
---
6+
7+
This page explains how `chat.agent` is put together, what each piece does on a single turn, and how a chat survives across turns. It is not an API tour — for that, see [Backend](/ai-chat/backend), [Frontend](/ai-chat/frontend), and the [Reference](/ai-chat/reference). For the byte-level wire format, see [Client Protocol](/ai-chat/client-protocol).
8+
9+
<Note>
10+
**What you don't have to think about**: SSE reconnects, WebSocket backpressure, container cold starts, whether a worker is currently running, or how to re-deliver chunks the client missed during a reload. The platform handles those. **What you do have to think about**: idempotency in your `run()` function, and how much state you keep in memory between turns versus persist in your own database.
11+
</Note>
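
The idempotency point in the note can be illustrated with a minimal sketch. It assumes a hypothetical `MessageStore` class standing in for your own database; nothing here is part of the `chat.agent` API. Because a turn may be retried after a crash, writes keyed by message ID leave the same end state no matter how many times they run.

```typescript
// Hypothetical stand-in for your own database table, keyed by message ID.
interface StoredMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
}

class MessageStore {
  private rows = new Map<string, StoredMessage>();

  // Upsert: writing the same message twice leaves exactly one row.
  upsert(message: StoredMessage): void {
    this.rows.set(message.id, message);
  }

  get(id: string): StoredMessage | undefined {
    return this.rows.get(id);
  }

  count(): number {
    return this.rows.size;
  }
}
```

An append-only insert would duplicate the row on a retried turn; the upsert does not, which is the property to aim for in whatever persistence you call from `run()` or the hooks.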
12+
13+
## The primary noun: a chat session is a pair of streams and a task
14+
15+
A **chat session** is the unit `chat.agent` owns. It is three things bound together:
16+
17+
- An **inbox** channel called `.in` — every user message lands here as a record.
18+
- An **outbox** channel called `.out` — every assistant chunk leaves through here.
19+
- A long-lived **agent task** that reads from `.in` and writes to `.out`.
20+
21+
Both channels are S2 ([s2.dev](https://s2.dev)) durable append-only streams, keyed by the session. Think of them as a pair of per-session topics on a tiny Kafka: records have monotonically increasing sequence numbers, readers resume from a cursor, writers append to the tail. We chose S2 because reads are resumable from an offset — so a browser reload can replay the response stream without re-running the LLM, and a crashed run can rejoin mid-conversation by reading from where it left off.
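
The stream semantics described above (sequence numbers, tail appends, cursor-based reads) can be modeled in a few lines. This is an illustrative in-memory sketch, not the S2 client API; `AppendOnlyStream`, `append`, and `readAfter` are hypothetical names.

```typescript
// Illustrative in-memory model of a durable append-only stream.
// This is NOT the S2 client API; all names here are hypothetical.
interface StreamRecord<T> {
  seq: number; // monotonically increasing, assigned on append
  body: T;
}

class AppendOnlyStream<T> {
  private records: StreamRecord<T>[] = [];
  private nextSeq = 0;

  // Writers append to the tail; the stream assigns the sequence number.
  append(body: T): number {
    const seq = this.nextSeq++;
    this.records.push({ seq, body });
    return seq;
  }

  // Readers resume from a cursor: return every record after `afterSeq`.
  // Passing -1 reads from the beginning.
  readAfter(afterSeq: number): StreamRecord<T>[] {
    return this.records.filter((r) => r.seq > afterSeq);
  }
}

const demoOut = new AppendOnlyStream<string>();
demoOut.append("Hel"); // seq 0
demoOut.append("lo");  // seq 1
demoOut.append("!");   // seq 2

// A reader that already processed seq 0 resumes without seeing it again.
console.log(demoOut.readAfter(0).map((r) => r.body)); // [ 'lo', '!' ]
```

The reader's cursor is the only state it needs to keep, which is why a reloaded browser or a restarted run can rejoin without the writer doing anything special.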
22+
23+
A chat ID identifies the session for the lifetime of the conversation. The same session can be served by **many runs**: one run handles a turn (or several), goes idle, eventually exits, and the next user message triggers a fresh continuation run on the same session. Sessions are the durable identity; runs are the ephemeral compute.
24+
25+
## The lifecycle states
26+
27+
A run moves through a small state machine over its lifetime. Each state is named below, with the trigger that moves it to the next.
28+
29+
### Cold start
30+
31+
There is no run yet for this session. The frontend's first `sendMessage` posts to the session's `.in` channel; the server sees no live `currentRunId` and triggers a fresh `chat.agent` run with `continuation: false`. Moves to **Streaming** as soon as the task wakes and begins consuming `.in`.
32+
33+
### Streaming
34+
35+
The agent task is running. It reads the new message off `.in`, fires `onTurnStart`, runs your `run()` function, and pipes `streamText()` chunks onto `.out`. The browser is SSE-subscribed to `.out` and renders chunks as they land. When `streamText()` ends, the task writes a `trigger:turn-complete` control record (an S2 record with an empty body and a special header) and immediately trims `.out` back to the *previous* turn's completion marker — keeping the outbox bounded to roughly one turn of chunks at steady state. Moves to **Idle** after `onTurnComplete` runs and the post-turn snapshot is written.
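
The trim rule is easier to see on a small example. The sketch below models `.out` as a plain array and assumes hypothetical names (`OutRecord`, `trimToPreviousMarker`); it is not how the platform issues the S2 trim, only a picture of which records survive.

```typescript
// Hypothetical model of the outbox; not the S2 API.
interface OutRecord {
  seq: number;
  kind: "chunk" | "turn-complete";
}

// After a new turn-complete marker is written, drop every record that
// precedes the PREVIOUS marker. The previous marker itself is kept.
function trimToPreviousMarker(records: OutRecord[]): OutRecord[] {
  const markers = records.filter((r) => r.kind === "turn-complete");
  const previous = markers[markers.length - 2];
  if (previous === undefined) {
    return records; // first turn: there is no previous marker to trim to
  }
  return records.filter((r) => r.seq >= previous.seq);
}
```

With three completed turns in the stream, only the second turn's marker, the third turn's chunks, and the third marker remain, which matches the "roughly one turn of chunks" bound.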
36+
37+
### Idle (awaiting next message)
38+
39+
The turn is over. The task is alive but not doing work — it is parked in a waitpoint on `.in`, waiting for the next user message. If one arrives, it goes back to **Streaming** for the next turn. If `idleTimeoutInSeconds` (defaulting to a few minutes) passes with no new message, it moves to **Suspended**.
40+
41+
### Suspended
42+
43+
The task fires `onChatSuspend`, then the engine **checkpoints** the run's whole process state and frees the compute. The session is still live (the row exists, the `.out` stream is still readable, the chat ID still works), but no machine is dedicated to it. This is the same Checkpoint-Resume System that powers every Trigger.dev task — covered in detail at [How it works → Checkpoint-Resume](/how-it-works#the-checkpoint-resume-system). Moves to **Resuming** when the next message lands in `.in`.
44+
45+
### Resuming
46+
47+
The engine restores the suspended run from its checkpoint. The same JS process picks up exactly where it parked — `chat.local` values, the accumulator, in-flight promises, in-memory caches all preserved as they were. `onChatResume` fires immediately after the restore, then the task transitions to **Streaming**. No boot work, no snapshot read, no SDK reinitialization. This is the cheap path.
48+
49+
### Continuation (after exit)
50+
51+
If the run has fully exited (because it hit `maxTurns`, your code called `chat.endRun()` or `chat.requestUpgrade()`, or it was cancelled or crashed), the next user message can't resume it, because there is nothing to resume. Instead, the server triggers a brand-new run with `continuation: true`. The new run does a cold boot but reads the prior conversation's S3 snapshot and replays any `.out` chunks after the snapshot cursor, so the new run starts with the full message history already accumulated. Then it enters **Streaming** with `turn === 0` of the new run but `messageCount > 0`.
52+
53+
### Closed
54+
55+
`POST /api/v1/sessions/:id/close` flips `closedAt` on the session row. Future appends are rejected. Reads still work for transcript viewing. The session is terminal.
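
The states above can be written down as a transition table. This is a reading aid under stated assumptions, not the engine's implementation: the event names are invented for illustration, and `exited` is a hypothetical label for the gap between a run exiting and the continuation run being triggered.

```typescript
type RunState =
  | "cold-start" | "streaming" | "idle" | "suspended"
  | "resuming" | "exited" | "continuation" | "closed";

// Event names are hypothetical labels for the triggers described above.
type RunEvent =
  | "task-started" | "turn-finished" | "message-arrived" | "idle-timeout"
  | "checkpoint-restored" | "run-exited" | "history-restored" | "session-closed";

const transitions: { [S in RunState]: { [E in RunEvent]?: RunState } } = {
  "cold-start": { "task-started": "streaming" },
  streaming: { "turn-finished": "idle", "run-exited": "exited" },
  idle: {
    "message-arrived": "streaming",
    "idle-timeout": "suspended",
    "run-exited": "exited",
  },
  suspended: { "message-arrived": "resuming" },
  resuming: { "checkpoint-restored": "streaming" },
  exited: { "message-arrived": "continuation" },
  continuation: { "history-restored": "streaming" },
  closed: {}, // terminal
};

function nextState(state: RunState, event: RunEvent): RunState {
  // Closing the session is terminal regardless of the current state.
  if (event === "session-closed") return "closed";
  const target = transitions[state][event];
  if (target === undefined) {
    throw new Error(`no transition from ${state} on ${event}`);
  }
  return target;
}
```

Note that `suspended` and `exited` both react to the same event, a new message, but take different paths: one restores a checkpoint, the other boots a fresh run.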
56+
57+
## One turn, end to end
58+
59+
Here is a typical cold turn — user opens the page, types "What's the weather?", reads the response — traced through every component.
60+
61+
<Steps>
62+
<Step title="Browser: useChat calls transport.sendMessages">
63+
The Vercel AI SDK's `useChat` hook serializes the user's message into the slim wire format: `{ chatId, trigger: "submit-message", message, metadata }`. Only the new message goes on the wire, not the full history.
64+
</Step>
65+
<Step title="Browser: transport posts to /append">
66+
The transport calls `POST /realtime/v1/sessions/:chatId/in/append`, authenticated with the session's public access token. The body is one S2 record.
67+
</Step>
68+
<Step title="Server: route ensures a run exists">
69+
The append route resolves the session, then calls `ensureRunForSession()`. The session's `currentRunId` is null (cold start), so it triggers a new `chat.agent` run on the project's dev/prod environment and atomically claims the slot via an optimistic version counter.
70+
</Step>
71+
<Step title="Server: route appends the record to S2 .in">
72+
The route writes the message to `s2://sessions/:chatId/in` as a single record. S2 assigns a sequence number. Any waitpoints registered on this channel fire, which would wake an existing run — but there is no run waiting yet, so this is a no-op for now.
73+
</Step>
74+
<Step title="Browser: transport opens an SSE subscription to .out">
75+
In parallel with the send, the transport opens `GET /realtime/v1/sessions/:chatId/out` (server-sent events). It passes its `lastEventId` if it has one cached; on a brand-new chat it does not. Any chunks the agent writes from now on will be delivered to this stream.
76+
</Step>
77+
<Step title="Task: agent run boots">
78+
The newly-triggered run starts. `onBoot` fires once per worker process. Because this is a fresh chat, no snapshot is read.
79+
</Step>
80+
<Step title="Task: enters the turn loop, reads the message from .in">
81+
The agent reads the pending record off `.in` via a waitpoint. `onChatStart` fires (once per chat lifetime). `onTurnStart` fires (every turn).
82+
</Step>
83+
<Step title="Task: runs your run() function, streams chunks to .out">
84+
Your code calls `streamText({ model, messages })`. Each `UIMessageChunk` it produces is appended to `s2://sessions/:chatId/out` as a record. The browser sees them arrive on the SSE stream and the AI SDK renders them.
85+
</Step>
86+
<Step title="Task: writes the turn-complete control record">
87+
When `streamText()` finishes, the agent writes a record with header `trigger:turn-complete` and an empty body. The browser transport sees this header and closes the per-turn readable stream.
88+
</Step>
89+
<Step title="Task: trims .out back to the previous turn-complete">
90+
Immediately after writing the new turn-complete marker, the agent issues an S2 trim command targeting the *previous* turn-complete's sequence number. This bounds the stream's storage to roughly one turn of chunks plus the latest control record.
91+
</Step>
92+
<Step title="Task: fires onTurnComplete, writes snapshot to S3">
93+
`onTurnComplete` runs (your hook for persistence). Then the agent writes a `ChatSnapshotV1` blob, `{ version: 1, messages, lastOutEventId, lastOutTimestamp }`, to S3 at `sessions/:chatId/snapshot.json`. This write is awaited, not fire-and-forget, so the next run is guaranteed to find it.
94+
</Step>
95+
<Step title="Task: goes idle, then suspends">
96+
The agent re-enters the waitpoint on `.in`. After `idleTimeoutInSeconds` of nothing arriving, `onChatSuspend` fires and the engine snapshots the run. Compute is freed.
97+
</Step>
98+
</Steps>
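
The first two steps can be sketched as a function that prepares the append request without sending it. The path and the payload fields come from the steps above; everything else is an assumption made for illustration: the `buildAppendRequest` helper, the simplified message shape, bearer-token authentication, and sending the payload as plain JSON. The [Client Protocol](/ai-chat/client-protocol) page is the authoritative description of the record envelope.

```typescript
// Slim wire payload from the first step. The message shape is simplified;
// the real UIMessage type from the AI SDK is richer.
interface AppendPayload {
  chatId: string;
  trigger: "submit-message";
  message: { id: string; role: "user"; text: string };
  metadata?: { [key: string]: unknown };
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

// Hypothetical helper: builds the request but does not send it.
function buildAppendRequest(
  baseUrl: string,
  accessToken: string,
  payload: AppendPayload
): PreparedRequest {
  const chatId = encodeURIComponent(payload.chatId);
  return {
    url: `${baseUrl}/realtime/v1/sessions/${chatId}/in/append`,
    method: "POST",
    headers: {
      // Assumption: the public access token is sent as a bearer token.
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
    // Only the new message goes on the wire, not the full history.
    body: JSON.stringify(payload),
  };
}
```

The point to notice is what is absent: no message history and no run ID. The server resolves the session from the chat ID and decides on its own whether a run needs to be triggered.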
99+
100+
## Three layers of persistence
101+
102+
chat.agent survives idle gaps, deploys, refreshes, and crashes because three separate persistence mechanisms work at three different layers of the stack. They're orthogonal — each protects against a different failure mode, and conflating them is a common source of bugs.
103+
104+
### Layer 1: the engine checkpoint (compute)
105+
106+
When a run enters the Suspended state, the engine **checkpoints** the running process — its memory, CPU registers, and open file descriptors — and frees the compute. Today this is done via [CRIU](https://criu.org/) (Checkpoint/Restore in Userspace), the same mechanism that powers every Trigger.dev task's suspend/resume. On the new microVM compute runtime (currently in [private beta](/compute-private-beta)), it becomes a full Firecracker VM snapshot: every byte of memory plus filesystem state plus every kernel object inside the VM.
107+
108+
When the next message arrives, the engine **restores** the checkpoint. The same JS process picks up at the exact instruction it parked on. From your code's perspective, the line right after the `messagesInput.wait()` waitpoint just continues executing. Anything in process memory survives: `chat.local`, the message accumulator, in-flight Promises, in-memory caches, open DB connections. The runId is unchanged.
109+
110+
This is what lets you write `run()` as a single long-lived function with stateful closures, even though the underlying compute actually goes through checkpoint/restore cycles between turns. `onChatSuspend` fires immediately before the checkpoint; `onChatResume` fires immediately after the restore.
111+
112+
### Layer 2: the chat snapshot (S3)
113+
114+
After every turn the agent writes a `ChatSnapshotV1` blob to S3 — full accumulated `UIMessage[]` plus the current `lastOutEventId` cursor. This is chat-specific and lives one layer above the engine. It has nothing to do with CRIU or Firecracker.
115+
116+
The chat snapshot bridges run *boundaries*. If a run exits for any reason (it hit `maxTurns`, called `chat.endRun()` or `chat.requestUpgrade()`, was cancelled, crashed, or got bumped to a new version after a deploy), the engine checkpoint is gone with it. When the next user message arrives, the server triggers a fresh run with `continuation: true`. That new run reads the S3 snapshot, replays any post-snapshot chunks from `.out`, merges by message ID, and starts its first turn with the full conversation history already in memory.
117+
118+
The chat snapshot carries only message history — not process memory. `chat.local`, in-memory caches, open connections all need to be reinitialized on a continuation. This is why `onBoot` (every fresh worker) is the right place to initialize `chat.local`, not `onChatStart` (only the very first turn of the chat). See [Persistence and replay](/ai-chat/patterns/persistence-and-replay) for the full snapshot model.
119+
120+
If your task registers a `hydrateMessages` hook, the chat snapshot is skipped entirely — your hook is the single source of truth for history.
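
The "merge by message ID" step on a continuation can be sketched as follows. This is a simplified model, not the SDK's code: the message shape is reduced to three fields, the field types on `ChatSnapshotV1` are assumptions, and the reassembly of `.out` chunks into messages is taken as already done.

```typescript
// Simplified message shape; the real UIMessage type is richer.
interface HistoryMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
}

// Field names follow the snapshot described above; field types are assumed.
interface ChatSnapshotV1 {
  version: 1;
  messages: HistoryMessage[];
  lastOutEventId: string;
  lastOutTimestamp: number;
}

// Hypothetical helper: combine the snapshot with messages rebuilt from the
// `.out` records that were written after `snapshot.lastOutEventId`.
function restoreHistory(
  snapshot: ChatSnapshotV1,
  replayed: HistoryMessage[]
): HistoryMessage[] {
  const byId = new Map<string, HistoryMessage>();
  for (const m of snapshot.messages) byId.set(m.id, m);
  // A replayed message with a known ID replaces the snapshot copy in place;
  // a new ID is appended. Map preserves first-insertion order.
  for (const m of replayed) byId.set(m.id, m);
  return Array.from(byId.values());
}
```

Keying on the message ID is what makes the replay safe to overlap with the snapshot: a message present in both sources appears once in the result.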
121+
122+
### Layer 3: the `lastEventId` cursor (browser)
123+
124+
The transport stores `lastEventId` — the S2 sequence number of the most recent chunk it processed — in its session state. On page reload, it reopens the SSE stream with `Last-Event-ID: <cursor>` as a header. S2 resumes from that cursor; chunks the browser already saw are not redelivered. If the agent was mid-turn when the browser reloaded, the rest of the turn streams in. If the turn had already completed, the stream closes immediately via an `X-Session-Settled` header so the client doesn't long-poll for nothing.
125+
126+
Unlike the other two layers, this one is client-side. The server doesn't even need to know the browser refreshed — the agent run keeps running (or stays suspended) regardless.
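
A minimal sketch of the client-side cursor, assuming a hypothetical `OutCursor` class rather than the transport's real internals. The only fixed detail is the `Last-Event-ID` request header, which is the standard SSE mechanism named above.

```typescript
// Hypothetical client-side cursor tracker; not the transport's real code.
class OutCursor {
  private lastEventId: string | undefined;

  // Record the ID of each chunk as it is processed.
  observe(eventId: string): void {
    this.lastEventId = eventId;
  }

  // Headers for (re)opening the SSE stream. On a brand-new chat there is
  // no cursor yet, so no resume header is sent.
  resumeHeaders(): { [name: string]: string } {
    const headers: { [name: string]: string } = {};
    if (this.lastEventId !== undefined) {
      headers["Last-Event-ID"] = this.lastEventId;
    }
    return headers;
  }
}
```

For the cursor to survive a page reload it has to live somewhere that outlasts the page, which is why the transport keeps it in its session state.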
127+
128+
### Which layer covers which failure mode
129+
130+
| What happened | Recovery layer | Same run? | In-memory state preserved? |
| --- | --- | --- | --- |
| Idle gap mid-conversation (suspend → resume) | Engine checkpoint | Yes | Yes |
| Run exited cleanly (`endRun`, `requestUpgrade`, `maxTurns`) | Chat snapshot | No (fresh continuation run) | No |
| Run crashed mid-turn (OOM, exception) | Chat snapshot + `.out` tail replay | (retried as a new attempt) | No |
| Browser tab reloaded mid-stream | `lastEventId` cursor on `.out` | (run unaffected) | (n/a) |
| Deploy rolled out a new version mid-chat | Chat snapshot, via `requestUpgrade` flow | No | No |
137+
138+
No single layer covers every case. The engine checkpoint alone can't survive a run exit (there's nothing to restore). The chat snapshot alone can't survive a tab refresh mid-turn (chunks already streamed would be lost). The `lastEventId` cursor alone can't bridge run boundaries (the new run wouldn't know the history). Together they cover every failure mode in the table above.
139+
140+
## Warm vs cold: same chat, three different timings
141+
142+
Take the same conversation — "What's the weather?" then "What about tomorrow?" — and look at how each second turn lands.
143+
144+
**Warm second turn (within a few seconds).** The first turn finished, the agent is parked on the `.in` waitpoint, status is **Idle**. The new message hits `/append`, the waitpoint fires, the agent wakes inside the same run with all memory intact, runs `onTurnStart` for turn 2, streams the response. No checkpoint involved — the process never went to sleep. Latency to first chunk: dominated by the LLM, not the platform.
145+
146+
**Resumed second turn (a few minutes later).** The first turn finished and the agent suspended — the engine checkpoint is stored, compute is freed. The new message hits `/append`. The engine restores the checkpoint, fires `onChatResume`, and the task picks up exactly where it parked — all in-memory state preserved (`chat.local`, the accumulator, the lot). Latency to first chunk: the engine's restore overhead, then the LLM.
147+
148+
**Continuation second turn (an hour later, or after a deploy).** The first turn finished and the run eventually exited. The new message hits `/append`, the server triggers a fresh run with `continuation: true`. The new run boots cold, `onBoot` fires, the agent reads the S3 chat snapshot, replays the `.out` tail, then enters the turn loop with the full conversation already accumulated. The previous run's in-memory state is gone — anything in `chat.local` has to be re-initialized in `onBoot`. Latency to first chunk: cold start plus snapshot read, then the LLM.
149+
150+
All three look identical to the browser. Only the agent task knows which path it took, via `payload.continuation` and `ctx.attempt.number`.
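
As a rough sketch of how a task might branch on those two fields: the helper below is hypothetical, the parameter shapes are trimmed to the two fields named above, and it assumes attempt numbers start at 1. These fields do not separate a warm turn from a resumed one; both happen inside the same run, and the resumed case is signalled by `onChatResume`.

```typescript
type StartPath = "fresh" | "continuation" | "retry";

// Hypothetical helper. Field names mirror `payload.continuation` and
// `ctx.attempt.number`; the surrounding shapes are simplified assumptions.
function classifyStart(
  payload: { continuation: boolean },
  ctx: { attempt: { number: number } }
): StartPath {
  // Assumption: the first attempt is numbered 1, so anything higher is a retry.
  if (ctx.attempt.number > 1) return "retry";
  return payload.continuation ? "continuation" : "fresh";
}
```

A task could use the result to decide, for example, whether to log a cold-boot metric or to skip first-message setup.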
151+
152+
## Lifecycle hooks: where you plug in
153+
154+
| Hook | When it fires | Typical use |
| --- | --- | --- |
| `onBoot` | Once per worker process, before any chat work | Initialize `chat.local` resources |
| `onPreload` | Once per chat lifetime, if the chat was preloaded before the first message | Warm caches, fetch the user's profile |
| `onChatStart` | Once per chat lifetime, on the first turn of a fresh chat (not on continuation) | First-message persistence, system-prompt setup |
| `onValidateMessages` | Every turn, before merging the incoming message | Reject or transform user input |
| `hydrateMessages` | Every turn, instead of snapshot+replay | Use your DB as the source of truth |
| `onTurnStart` | Every turn, before `run()` | Compact history, persist the user message |
| `onBeforeTurnComplete` | Every turn, after streaming, before the turn-complete record | Emit a final custom chunk |
| `onTurnComplete` | Every turn, after the turn-complete record is written | Persist the assistant message and `lastEventId` |
| `onChatSuspend` / `onChatResume` | At the idle → suspend / suspend → wake transitions | Release/reacquire expensive resources |
165+
166+
See [Lifecycle hooks](/ai-chat/lifecycle-hooks) for the full signatures and firing order.
167+
168+
## When chat.agent is the right primitive
169+
170+
**Good fit**:
171+
- Multi-turn conversational agents where the user is expected to come back later.
172+
- Long-running agent loops with tool calls, where a single turn can take a minute or more.
173+
- Cases where you want page reloads to resume the in-flight response without re-running the model.
174+
- Cases where you can't predict idle gaps — humans go to lunch.
175+
176+
**Not a good fit**:
177+
- Single-shot completions where you don't need durability or resume. Call your model directly.
178+
- Workflows where you control both ends and want a custom protocol. Use [`chat.task` and primitives](/ai-chat/backend#raw-task-with-primitives) directly without the `chat.agent` wrapper.
179+
- High-fanout broadcasting (one source, many subscribers). Use Trigger.dev realtime streams against a regular task instead.
180+
181+
## Putting it together
182+
183+
```mermaid
sequenceDiagram
    participant Browser
    participant API as Trigger.dev API
    participant S2_in as S2 .in
    participant S2_out as S2 .out
    participant Agent as chat.agent task
    participant S3 as S3 snapshot

    Note over Agent: Cold start
    Browser->>API: POST /sessions/:id/in/append
    API->>S2_in: append(message)
    API->>Agent: trigger run (continuation: false)
    Browser->>API: GET /sessions/:id/out (SSE)
    API->>S2_out: read stream
    Agent->>S2_in: read message (waitpoint)
    Agent->>S2_out: append chunk(s)
    S2_out-->>Browser: SSE chunks
    Agent->>S2_out: append turn-complete (control)
    Agent->>S2_out: trim < previous turn-complete
    Agent->>S3: write snapshot
    Note over Agent: Idle on waitpoint

    Note over Agent: ...time passes...
    Note over Agent: Suspended

    Browser->>API: POST /sessions/:id/in/append
    API->>S2_in: append(message)
    API->>Agent: restore from suspend
    Agent->>S2_in: read message
    Agent->>S2_out: append chunk(s)
    S2_out-->>Browser: SSE chunks
    Agent->>S2_out: append turn-complete
    Agent->>S3: write snapshot
    Note over Agent: Idle again
```
219+
220+
## Where to go next
221+
222+
- [Quick start](/ai-chat/quick-start) — get a chat running in a few minutes.
223+
- [Backend](/ai-chat/backend) — the `chat.agent()` API in detail.
224+
- [Lifecycle hooks](/ai-chat/lifecycle-hooks) — every hook, what fires when.
225+
- [Persistence and replay](/ai-chat/patterns/persistence-and-replay) — deeper on the snapshot model.
226+
- [Client protocol](/ai-chat/client-protocol) — wire format if you're writing a custom transport.

docs/docs.json

Lines changed: 1 addition & 0 deletions
@@ -106,6 +106,7 @@
Original line | New line | Change
106 | 106 |   "ai-chat/overview",
107 | 107 |   "ai-chat/changelog",
108 | 108 |   "ai-chat/quick-start",
    | 109 | + "ai-chat/how-it-works",
109 | 110 |   "ai-chat/backend",
110 | 111 |   "ai-chat/lifecycle-hooks",
111 | 112 |   "ai-chat/frontend",
