| 1 | +--- |
| 2 | +title: "How it works" |
| 3 | +sidebarTitle: "How it works" |
| 4 | +description: "End-to-end mechanics of a chat.agent turn: the two durable channels per session, the long-lived task that reads and writes them, and how a chat survives refreshes, deploys, and idle gaps." |
| 5 | +--- |
| 6 | + |
| 7 | +This page explains how `chat.agent` is put together, what each piece does on a single turn, and how a chat survives across turns. It is not an API tour — for that, see [Backend](/ai-chat/backend), [Frontend](/ai-chat/frontend), and the [Reference](/ai-chat/reference). For the byte-level wire format, see [Client Protocol](/ai-chat/client-protocol). |
| 8 | + |
| 9 | +<Note> |
| 10 | +**What you don't have to think about**: SSE reconnects, WebSocket backpressure, container cold starts, whether a worker is currently running, or how to re-deliver chunks the client missed during a reload. The platform handles those. **What you do have to think about**: idempotency in your `run()` function, and how much state you keep in memory between turns versus persist in your own database. |
| 11 | +</Note> |
| 12 | + |
| 13 | +## The primary noun: a chat session is a pair of streams and a task |
| 14 | + |
| 15 | +A **chat session** is the unit `chat.agent` owns. It is three things bound together: |
| 16 | + |
| 17 | +- An **inbox** channel called `.in` — every user message lands here as a record. |
| 18 | +- An **outbox** channel called `.out` — every assistant chunk leaves through here. |
| 19 | +- A long-lived **agent task** that reads from `.in` and writes to `.out`. |
| 20 | + |
| 21 | +Both channels are S2 ([s2.dev](https://s2.dev)) durable append-only streams, keyed by the session. Think of them as a pair of per-session topics on a tiny Kafka: records have monotonically increasing sequence numbers, readers resume from a cursor, writers append to the tail. We chose S2 because reads are resumable from an offset — so a browser reload can replay the response stream without re-running the LLM, and a crashed run can rejoin mid-conversation by reading from where it left off. |
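The resumable-read property can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the S2 client API: `AppendOnlyStream`, `append`, and `readAfter` are hypothetical names invented for illustration.

```typescript
// Hypothetical in-memory model of one durable channel (not the S2 API).
interface StreamRecord<T> {
  seq: number; // monotonically increasing, assigned on append
  body: T;
}

class AppendOnlyStream<T> {
  private readonly records: StreamRecord<T>[] = [];
  private nextSeq = 0;

  // Writers only ever append to the tail.
  append(body: T): number {
    const seq = this.nextSeq;
    this.nextSeq += 1;
    this.records.push({ seq, body });
    return seq;
  }

  // Readers resume strictly after the last sequence number they processed.
  readAfter(cursor: number | null): StreamRecord<T>[] {
    return this.records.filter((r) => cursor === null || r.seq > cursor);
  }
}

const outStream = new AppendOnlyStream<string>();
outStream.append("Hel");
outStream.append("lo");

// The browser renders both chunks and remembers the last sequence number.
let browserCursor: number | null = null;
for (const record of outStream.readAfter(null)) {
  browserCursor = record.seq;
}

// One more chunk is written while the page reloads.
outStream.append(", world");

// After the reload, only the chunk appended past the cursor is delivered.
const missedChunks = outStream.readAfter(browserCursor).map((r) => r.body);
```

Because the reader owns its cursor, the writer never has to track which clients saw which chunk; a reload is just another read from an offset.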
| 22 | + |
| 23 | +A chat ID identifies the session for the lifetime of the conversation. The same session can be served by **many runs**: one run handles a turn (or several), goes idle, eventually exits, and the next user message triggers a fresh continuation run on the same session. Sessions are the durable identity; runs are the ephemeral compute. |
| 24 | + |
| 25 | +## The lifecycle states |
| 26 | + |
| 27 | +A run moves through a small state machine over its lifetime. Each state is named below, with the trigger that moves it to the next. |
| 28 | + |
| 29 | +### Cold start |
| 30 | + |
| 31 | +There is no run yet for this session. The frontend's first `sendMessage` posts to the session's `.in` channel; the server sees no live `currentRunId` and triggers a fresh `chat.agent` run with `continuation: false`. Moves to **Streaming** as soon as the task wakes and begins consuming `.in`. |
| 32 | + |
| 33 | +### Streaming |
| 34 | + |
| 35 | +The agent task is running. It reads the new message off `.in`, fires `onTurnStart`, runs your `run()` function, and pipes `streamText()` chunks onto `.out`. The browser is SSE-subscribed to `.out` and renders chunks as they land. When `streamText()` ends, the task writes a `trigger:turn-complete` control record (an S2 record with an empty body and a special header) and immediately trims `.out` back to the *previous* turn's completion marker — keeping the outbox bounded to roughly one turn of chunks at steady state. Moves to **Idle** after `onTurnComplete` runs and the post-turn snapshot is written. |
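The marker-then-trim behavior can be sketched with a toy outbox. This is an illustrative model only: the `Outbox` class and its record shapes are hypothetical, and it assumes the trim keeps the previous marker itself and drops everything before it.

```typescript
// Hypothetical model of the outbox; names and shapes are illustrative only.
type OutRecord =
  | { seq: number; kind: "chunk"; text: string }
  | { seq: number; kind: "turn-complete" };

class Outbox {
  records: OutRecord[] = [];
  private nextSeq = 0;
  private lastMarkerSeq: number | null = null;

  appendChunk(text: string): void {
    this.records.push({ seq: this.nextSeq++, kind: "chunk", text });
  }

  // Write the new marker first, then trim everything before the previous one.
  completeTurn(): void {
    const previousMarker = this.lastMarkerSeq;
    const markerSeq = this.nextSeq++;
    this.records.push({ seq: markerSeq, kind: "turn-complete" });
    if (previousMarker !== null) {
      // Assumption: the trim point is the earliest sequence number retained.
      this.records = this.records.filter((r) => r.seq >= previousMarker);
    }
    this.lastMarkerSeq = markerSeq;
  }
}

const outbox = new Outbox();

// Turn 1: two chunks (seq 0, 1) and a marker (seq 2). Nothing to trim yet.
outbox.appendChunk("It is");
outbox.appendChunk(" sunny.");
outbox.completeTurn();
const seqsAfterTurn1 = outbox.records.map((r) => r.seq);

// Turn 2: two chunks (seq 3, 4) and a marker (seq 5). Turn 1's chunks are trimmed.
outbox.appendChunk("Tomorrow:");
outbox.appendChunk(" rain.");
outbox.completeTurn();
const seqsAfterTurn2 = outbox.records.map((r) => r.seq);
```

Trimming to the previous marker rather than the new one leaves the just-finished turn readable, so a browser that reloads right after the turn ends can still replay it.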
| 36 | + |
| 37 | +### Idle (awaiting next message) |
| 38 | + |
| 39 | +The turn is over. The task is alive but not doing work — it is parked in a waitpoint on `.in`, waiting for the next user message. If one arrives, it goes back to **Streaming** for the next turn. If `idleTimeoutInSeconds` (defaulting to a few minutes) passes with no new message, it moves to **Suspended**. |
| 40 | + |
| 41 | +### Suspended |
| 42 | + |
| 43 | +The task fires `onChatSuspend`, then the engine **checkpoints** the run's whole process state and frees the compute. The session is still live (the row exists, the `.out` stream is still readable, the chat ID still works), but no machine is dedicated to it. This is the same Checkpoint-Resume System that powers every Trigger.dev task — covered in detail at [How it works → Checkpoint-Resume](/how-it-works#the-checkpoint-resume-system). Moves to **Resuming** when the next message lands in `.in`. |
| 44 | + |
| 45 | +### Resuming |
| 46 | + |
| 47 | +The engine restores the suspended run from its checkpoint. The same JS process picks up exactly where it parked — `chat.local` values, the accumulator, in-flight promises, in-memory caches all preserved as they were. `onChatResume` fires immediately after the restore, then the task transitions to **Streaming**. No boot work, no snapshot read, no SDK reinitialization. This is the cheap path. |
| 48 | + |
| 49 | +### Continuation (after exit) |
| 50 | + |
| 51 | +If the run has fully exited (because it hit `maxTurns`, your code called `chat.endRun()` or `chat.requestUpgrade()`, or it was cancelled or crashed), the next user message can't resume it, because there is nothing to resume. Instead, the server triggers a brand-new run with `continuation: true`. The new run does a cold boot, reads the prior conversation's S3 snapshot, and replays any `.out` chunks after the snapshot cursor, so it starts with the full message history already accumulated. It then enters **Streaming** with `turn === 0` for the new run but `messageCount > 0`. |
| 52 | + |
| 53 | +### Closed |
| 54 | + |
| 55 | +`POST /api/v1/sessions/:id/close` flips `closedAt` on the session row. Future appends are rejected. Reads still work for transcript viewing. The session is terminal. |
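The run-level states above can be written down as a transition table. The sketch below is a reading of this page, not code from the SDK: the state and event names are hypothetical, and it assumes a run can exit from either **Streaming** or **Idle**. **Closed** is left out because it is a property of the session, not of a run.

```typescript
// Hypothetical state and event names; the real engine does not expose these.
type RunState =
  | "cold-start"
  | "streaming"
  | "idle"
  | "suspended"
  | "resuming"
  | "exited"
  | "continuation";

type RunEvent =
  | "task-woke"
  | "turn-complete"
  | "message-arrived"
  | "idle-timeout"
  | "restored"
  | "run-exited"
  | "snapshot-loaded";

const transitions: Record<RunState, Partial<Record<RunEvent, RunState>>> = {
  "cold-start": { "task-woke": "streaming" },
  streaming: { "turn-complete": "idle", "run-exited": "exited" },
  idle: {
    "message-arrived": "streaming",
    "idle-timeout": "suspended",
    "run-exited": "exited",
  },
  suspended: { "message-arrived": "resuming" },
  resuming: { restored: "streaming" },
  // After an exit, the next message starts a fresh continuation run.
  exited: { "message-arrived": "continuation" },
  continuation: { "snapshot-loaded": "streaming" },
};

function step(state: RunState, event: RunEvent): RunState {
  const next = transitions[state][event];
  if (next === undefined) {
    throw new Error(`No transition from ${state} on ${event}`);
  }
  return next;
}

// Walk one chat: first turn, idle gap, suspend, resume, second turn.
const eventsSeen: RunEvent[] = [
  "task-woke",
  "turn-complete",
  "idle-timeout",
  "message-arrived",
  "restored",
];
let currentState: RunState = "cold-start";
const visitedStates: RunState[] = [currentState];
for (const ev of eventsSeen) {
  currentState = step(currentState, ev);
  visitedStates.push(currentState);
}
```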
| 56 | + |
| 57 | +## One turn, end to end |
| 58 | + |
| 59 | +Here is a typical cold turn — user opens the page, types "What's the weather?", reads the response — traced through every component. |
| 60 | + |
| 61 | +<Steps> |
| 62 | + <Step title="Browser: useChat calls transport.sendMessages"> |
| 63 | + The Vercel AI SDK's `useChat` hook serializes the user's message into the slim wire format: `{ chatId, trigger: "submit-message", message, metadata }`. Only the new message goes on the wire, not the full history. |
| 64 | + </Step> |
| 65 | + <Step title="Browser: transport posts to /append"> |
| 66 | + The transport calls `POST /realtime/v1/sessions/:chatId/in/append`, authenticated with the session's public access token. The body is one S2 record. |
| 67 | + </Step> |
| 68 | + <Step title="Server: route ensures a run exists"> |
| 69 | + The append route resolves the session, then calls `ensureRunForSession()`. The session's `currentRunId` is null (cold start), so it triggers a new `chat.agent` run on the project's dev/prod environment and atomically claims the slot via an optimistic version counter. |
| 70 | + </Step> |
| 71 | + <Step title="Server: route appends the record to S2 .in"> |
| 72 | + The route writes the message to `s2://sessions/:chatId/in` as a single record. S2 assigns a sequence number. Any waitpoints registered on this channel fire, which would wake an existing run — but there is no run waiting yet, so this is a no-op for now. |
| 73 | + </Step> |
| 74 | + <Step title="Browser: transport opens an SSE subscription to .out"> |
| 75 | + In parallel with the send, the transport opens `GET /realtime/v1/sessions/:chatId/out` (server-sent events). It passes its `lastEventId` if it has one cached; on a brand-new chat it does not. Any chunks the agent writes from now on will be delivered to this stream. |
| 76 | + </Step> |
| 77 | + <Step title="Task: agent run boots"> |
| 78 | + The newly-triggered run starts. `onBoot` fires once per worker process. Because this is a fresh chat, no snapshot is read. |
| 79 | + </Step> |
| 80 | + <Step title="Task: enters the turn loop, reads the message from .in"> |
| 81 | + The agent reads the pending record off `.in` via a waitpoint. `onChatStart` fires (once per chat lifetime). `onTurnStart` fires (every turn). |
| 82 | + </Step> |
| 83 | + <Step title="Task: runs your run() function, streams chunks to .out"> |
| 84 | + Your code calls `streamText({ model, messages })`. Each `UIMessageChunk` it produces is appended to `s2://sessions/:chatId/out` as a record. The browser sees them arrive on the SSE stream and the AI SDK renders them. |
| 85 | + </Step> |
| 86 | + <Step title="Task: writes the turn-complete control record"> |
| 87 | + When `streamText()` finishes, the agent writes a record with header `trigger:turn-complete` and an empty body. The browser transport sees this header and closes the per-turn readable stream. |
| 88 | + </Step> |
| 89 | + <Step title="Task: trims .out back to the previous turn-complete"> |
| 90 | + Immediately after writing the new turn-complete marker, the agent issues an S2 trim command targeting the *previous* turn-complete's sequence number. This bounds the stream's storage to roughly one turn of chunks plus the latest control record. |
| 91 | + </Step> |
| 92 | + <Step title="Task: fires onTurnComplete, writes snapshot to S3"> |
| 93 | + `onTurnComplete` runs (your hook for persistence). Then the agent writes `ChatSnapshotV1` — `{ version: 1, messages, lastOutEventId, lastOutTimestamp }` — to S3 at `sessions/:chatId/snapshot.json`. This write is awaited, not fire-and-forget, so the next run is guaranteed to find it. |
| 94 | + </Step> |
| 95 | + <Step title="Task: goes idle, then suspends"> |
| 96 | + The agent re-enters the waitpoint on `.in`. After `idleTimeoutInSeconds` with no new message, `onChatSuspend` fires and the engine checkpoints the run. Compute is freed. |
| 97 | + </Step> |
| 98 | +</Steps> |
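The "optimistic version counter" in the run-ensuring step is a compare-and-set. A minimal sketch of the idea follows, assuming a hypothetical `SessionRow` shape and `claimRunSlot` helper; the actual `ensureRunForSession()` implementation is not shown on this page and may differ.

```typescript
// Hypothetical session row; only the two fields relevant to the claim.
interface SessionRow {
  currentRunId: string | null;
  version: number;
}

// Compare-and-set: succeed only if nobody changed the row since it was read.
// In a database this would typically be a conditional UPDATE on the version.
function claimRunSlot(
  row: SessionRow,
  expectedVersion: number,
  runId: string
): boolean {
  if (row.version !== expectedVersion || row.currentRunId !== null) {
    return false;
  }
  row.currentRunId = runId;
  row.version += 1;
  return true;
}

const sessionRow: SessionRow = { currentRunId: null, version: 0 };

// Two appends race on a cold session; both read version 0.
const versionSeenByA = sessionRow.version;
const versionSeenByB = sessionRow.version;

const aWon = claimRunSlot(sessionRow, versionSeenByA, "run_a");
const bWon = claimRunSlot(sessionRow, versionSeenByB, "run_b");

// The loser falls back to whichever run holds the slot.
const runServingB = bWon ? "run_b" : sessionRow.currentRunId;
```

The point of the version check is that two near-simultaneous first messages end up on one run instead of starting two.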
| 99 | + |
| 100 | +## Three layers of persistence |
| 101 | + |
| 102 | +chat.agent survives idle gaps, deploys, refreshes, and crashes because three separate persistence mechanisms work at three different layers of the stack. They're orthogonal — each protects against a different failure mode, and conflating them is a common source of bugs. |
| 103 | + |
| 104 | +### Layer 1: the engine checkpoint (compute) |
| 105 | + |
| 106 | +When a run enters the Suspended state, the engine **checkpoints** the running process — its memory, CPU registers, and open file descriptors — and frees the compute. Today this is done via [CRIU](https://criu.org/) (Checkpoint/Restore in Userspace), the same mechanism that powers every Trigger.dev task's suspend/resume. On the new microVM compute runtime (currently in [private beta](/compute-private-beta)), it becomes a full Firecracker VM snapshot: every byte of memory plus filesystem state plus every kernel object inside the VM. |
| 107 | + |
| 108 | +When the next message arrives, the engine **restores** the checkpoint. The same JS process picks up at the exact instruction it parked on. From your code's perspective, the line right after the `messagesInput.wait()` waitpoint just continues executing. Anything in process memory survives: `chat.local`, the message accumulator, in-flight Promises, in-memory caches, open DB connections. The runId is unchanged. |
| 109 | + |
| 110 | +This is what lets you write `run()` as a single long-lived function with stateful closures, even though the underlying compute actually goes through checkpoint/restore cycles between turns. `onChatSuspend` fires immediately before the checkpoint; `onChatResume` fires immediately after the restore. |
| 111 | + |
| 112 | +### Layer 2: the chat snapshot (S3) |
| 113 | + |
| 114 | +After every turn the agent writes a `ChatSnapshotV1` blob to S3 — full accumulated `UIMessage[]` plus the current `lastOutEventId` cursor. This is chat-specific and lives one layer above the engine. It has nothing to do with CRIU or Firecracker. |
| 115 | + |
| 116 | +The chat snapshot bridges run *boundaries*. If a run exits for any reason (it hit `maxTurns`, called `chat.endRun()` or `chat.requestUpgrade()`, was cancelled, crashed, or got bumped to a new version after a deploy), the engine checkpoint is gone with it. When the next user message arrives, the server triggers a fresh run with `continuation: true`. That new run reads the S3 snapshot, replays any post-snapshot chunks from `.out`, merges by message ID, and starts its first turn with the full conversation history already in memory. |
| 117 | + |
| 118 | +The chat snapshot carries only message history — not process memory. `chat.local`, in-memory caches, open connections all need to be reinitialized on a continuation. This is why `onBoot` (every fresh worker) is the right place to initialize `chat.local`, not `onChatStart` (only the very first turn of the chat). See [Persistence and replay](/ai-chat/patterns/persistence-and-replay) for the full snapshot model. |
| 119 | + |
| 120 | +If your task registers a `hydrateMessages` hook, the chat snapshot is skipped entirely — your hook is the single source of truth for history. |
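When no `hydrateMessages` hook is registered, the snapshot-plus-replay merge might look roughly like the sketch below. The types are simplified stand-ins, not the SDK's: `lastOutEventId` is modeled as a number, and each replayed record is assumed to carry a whole message rather than a chunk.

```typescript
// Simplified, hypothetical shapes for illustration only.
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
}

interface SnapshotSketch {
  version: 1;
  messages: ChatMessage[];
  lastOutEventId: number;
}

interface OutEvent {
  eventId: number;
  message: ChatMessage;
}

// Start from the snapshot, then apply records newer than its cursor.
function rebuildHistory(
  snapshot: SnapshotSketch,
  outTail: OutEvent[]
): ChatMessage[] {
  const byId = new Map<string, ChatMessage>();
  for (const m of snapshot.messages) {
    byId.set(m.id, m);
  }
  for (const e of outTail) {
    if (e.eventId <= snapshot.lastOutEventId) continue; // already in the snapshot
    byId.set(e.message.id, e.message); // same ID: the later record wins
  }
  // Map keeps first-insertion order, so conversation order is preserved.
  return Array.from(byId.values());
}

const priorSnapshot: SnapshotSketch = {
  version: 1,
  messages: [
    { id: "m1", role: "user", text: "What's the weather?" },
    { id: "m2", role: "assistant", text: "It is" },
  ],
  lastOutEventId: 5,
};

const tailAfterCrash: OutEvent[] = [
  { eventId: 5, message: { id: "m2", role: "assistant", text: "It is" } },
  { eventId: 6, message: { id: "m2", role: "assistant", text: "It is sunny." } },
  { eventId: 7, message: { id: "m3", role: "user", text: "What about tomorrow?" } },
];

const rebuilt = rebuildHistory(priorSnapshot, tailAfterCrash);
```

Merging by ID is what makes the replay safe to repeat: applying the same tail twice yields the same history.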
| 121 | + |
| 122 | +### Layer 3: the `lastEventId` cursor (browser) |
| 123 | + |
| 124 | +The transport stores `lastEventId` — the S2 sequence number of the most recent chunk it processed — in its session state. On page reload, it reopens the SSE stream with `Last-Event-ID: <cursor>` as a header. S2 resumes from that cursor; chunks the browser already saw are not redelivered. If the agent was mid-turn when the browser reloaded, the rest of the turn streams in. If the turn had already completed, the stream closes immediately via an `X-Session-Settled` header so the client doesn't long-poll for nothing. |
| 125 | + |
| 126 | +Unlike the other two layers, this one is client-side. The server doesn't even need to know the browser refreshed — the agent run keeps running (or stays suspended) regardless. |
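For a custom transport, the reconnect comes down to sending the stored cursor as a header. The helper below is a hypothetical sketch: `buildOutSubscription` is not an SDK function, the base URL is a placeholder, and authentication is omitted.

```typescript
// Hypothetical helper; builds the request a transport might send on reload.
interface OutSubscription {
  url: string;
  headers: Record<string, string>;
}

function buildOutSubscription(
  baseUrl: string,
  chatId: string,
  lastEventId: string | null
): OutSubscription {
  const headers: Record<string, string> = { Accept: "text/event-stream" };
  // Only send the cursor when one is cached; a brand-new chat has none.
  if (lastEventId !== null) {
    headers["Last-Event-ID"] = lastEventId;
  }
  return {
    url: `${baseUrl}/realtime/v1/sessions/${encodeURIComponent(chatId)}/out`,
    headers,
  };
}

// Placeholder host, for illustration only.
const freshSub = buildOutSubscription("https://api.example.com", "chat_123", null);
const reloadSub = buildOutSubscription("https://api.example.com", "chat_123", "41");
```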
| 127 | + |
| 128 | +### Which layer covers which failure mode |
| 129 | + |
| 130 | +| What happened | Recovery layer | Same run? | In-memory state preserved? | |
| 131 | +| --- | --- | --- | --- | |
| 132 | +| Idle gap mid-conversation (suspend → resume) | Engine checkpoint | Yes | Yes | |
| 133 | +| Run exited cleanly (`endRun`, `requestUpgrade`, `maxTurns`) | Chat snapshot | No (fresh continuation run) | No | |
| 134 | +| Run crashed mid-turn (OOM, exception) | Chat snapshot + `.out` tail replay | (retried as a new attempt) | No | |
| 135 | +| Browser tab reloaded mid-stream | `lastEventId` cursor on `.out` | (run unaffected) | (n/a) | |
| 136 | +| Deploy rolled out a new version mid-chat | Chat snapshot, via `requestUpgrade` flow | No | No | |
| 137 | + |
| 138 | +No single layer covers every case. The engine checkpoint alone can't survive a run exit (there's nothing to restore). The chat snapshot alone can't survive a tab refresh mid-turn (chunks already streamed would be lost). The `lastEventId` cursor alone can't bridge run boundaries (the new run wouldn't know the history). Together they cover each failure mode in the table above. |
| 139 | + |
| 140 | +## Warm vs cold: same chat, three different timings |
| 141 | + |
| 142 | +Take the same conversation — "What's the weather?" then "What about tomorrow?" — and look at how each second turn lands. |
| 143 | + |
| 144 | +**Warm second turn (within a few seconds).** The first turn finished, the agent is parked on the `.in` waitpoint, status is **Idle**. The new message hits `/append`, the waitpoint fires, the agent wakes inside the same run with all memory intact, runs `onTurnStart` for turn 2, streams the response. No checkpoint involved — the process never went to sleep. Latency to first chunk: dominated by the LLM, not the platform. |
| 145 | + |
| 146 | +**Resumed second turn (a few minutes later).** The first turn finished and the agent suspended — the engine checkpoint is stored, compute is freed. The new message hits `/append`. The engine restores the checkpoint, fires `onChatResume`, and the task picks up exactly where it parked — all in-memory state preserved (`chat.local`, the accumulator, the lot). Latency to first chunk: the engine's restore overhead, then the LLM. |
| 147 | + |
| 148 | +**Continuation second turn (an hour later, or after a deploy).** The first turn finished and the run eventually exited. The new message hits `/append`, the server triggers a fresh run with `continuation: true`. The new run boots cold, `onBoot` fires, the agent reads the S3 chat snapshot, replays the `.out` tail, then enters the turn loop with the full conversation already accumulated. The previous run's in-memory state is gone — anything in `chat.local` has to be re-initialized in `onBoot`. Latency to first chunk: cold start plus snapshot read, then the LLM. |
| 149 | + |
| 150 | +All three look identical to the browser. Only the agent task knows which path it took, via `payload.continuation` and `ctx.attempt.number`. |
| 151 | + |
| 152 | +## Lifecycle hooks: where you plug in |
| 153 | + |
| 154 | +| Hook | When it fires | Typical use | |
| 155 | +| --- | --- | --- | |
| 156 | +| `onBoot` | Once per worker process, before any chat work | Initialize `chat.local` resources | |
| 157 | +| `onPreload` | Once per chat lifetime, if the chat was preloaded before the first message | Warm caches, fetch the user's profile | |
| 158 | +| `onChatStart` | Once per chat lifetime, on the first turn of a fresh chat (not on continuation) | First-message persistence, system-prompt setup | |
| 159 | +| `onValidateMessages` | Every turn, before merging the incoming message | Reject or transform user input | |
| 160 | +| `hydrateMessages` | Every turn, instead of snapshot+replay | Use your DB as the source of truth | |
| 161 | +| `onTurnStart` | Every turn, before `run()` | Compact history, persist the user message | |
| 162 | +| `onBeforeTurnComplete` | Every turn, after streaming, before the turn-complete record | Emit a final custom chunk | |
| 163 | +| `onTurnComplete` | Every turn, after the turn-complete record is written | Persist the assistant message and `lastEventId` | |
| 164 | +| `onChatSuspend` / `onChatResume` | At the idle → suspend / suspend → wake transitions | Release/reacquire expensive resources | |
| 165 | + |
| 166 | +See [Lifecycle hooks](/ai-chat/lifecycle-hooks) for the full signatures and firing order. |
| 167 | + |
| 168 | +## When chat.agent is the right primitive |
| 169 | + |
| 170 | +**Good fit**: |
| 171 | +- Multi-turn conversational agents where the user is expected to come back later. |
| 172 | +- Long-running agent loops with tool calls, where a single turn can take a minute or more. |
| 173 | +- Cases where you want page reloads to resume the in-flight response without re-running the model. |
| 174 | +- Cases where you can't predict idle gaps — humans go to lunch. |
| 175 | + |
| 176 | +**Not a good fit**: |
| 177 | +- Single-shot completions where you don't need durability or resume. Call your model directly. |
| 178 | +- Workflows where you control both ends and want a custom protocol. Use [`chat.task` and primitives](/ai-chat/backend#raw-task-with-primitives) directly without the `chat.agent` wrapper. |
| 179 | +- High-fanout broadcasting (one source, many subscribers). Use Trigger.dev realtime streams against a regular task instead. |
| 180 | + |
| 181 | +## Putting it together |
| 182 | + |
| 183 | +```mermaid |
| 184 | +sequenceDiagram |
| 185 | + participant Browser |
| 186 | + participant API as Trigger.dev API |
| 187 | + participant S2_in as S2 .in |
| 188 | + participant S2_out as S2 .out |
| 189 | + participant Agent as chat.agent task |
| 190 | + participant S3 as S3 snapshot |
| 191 | + |
| 192 | + Note over Agent: Cold start |
| 193 | + Browser->>API: POST /sessions/:id/in/append |
| 194 | + API->>S2_in: append(message) |
| 195 | + API->>Agent: trigger run (continuation: false) |
| 196 | + Browser->>API: GET /sessions/:id/out (SSE) |
| 197 | + API->>S2_out: read stream |
| 198 | + Agent->>S2_in: read message (waitpoint) |
| 199 | + Agent->>S2_out: append chunk(s) |
| 200 | + S2_out-->>Browser: SSE chunks |
| 201 | + Agent->>S2_out: append turn-complete (control) |
| 202 | + Agent->>S2_out: trim < previous turn-complete |
| 203 | + Agent->>S3: write snapshot |
| 204 | + Note over Agent: Idle on waitpoint |
| 205 | + |
| 206 | + Note over Agent: ...time passes... |
| 207 | + Note over Agent: Suspended |
| 208 | + |
| 209 | + Browser->>API: POST /sessions/:id/in/append |
| 210 | + API->>S2_in: append(message) |
| 211 | + API->>Agent: restore from suspend |
| 212 | + Agent->>S2_in: read message |
| 213 | + Agent->>S2_out: append chunk(s) |
| 214 | + S2_out-->>Browser: SSE chunks |
| 215 | + Agent->>S2_out: append turn-complete |
| 216 | + Agent->>S3: write snapshot |
| 217 | + Note over Agent: Idle again |
| 218 | +``` |
| 219 | + |
| 220 | +## Where to go next |
| 221 | + |
| 222 | +- [Quick start](/ai-chat/quick-start) — get a chat running in a few minutes. |
| 223 | +- [Backend](/ai-chat/backend) — the `chat.agent()` API in detail. |
| 224 | +- [Lifecycle hooks](/ai-chat/lifecycle-hooks) — every hook, what fires when. |
| 225 | +- [Persistence and replay](/ai-chat/patterns/persistence-and-replay) — deeper on the snapshot model. |
| 226 | +- [Client protocol](/ai-chat/client-protocol) — wire format if you're writing a custom transport. |