
fix(cli): fail attempt on uncaught exception instead of hanging to maxDuration (TRI-9117) #3529

Merged
matt-aitken merged 5 commits into main from tri-9117-uncaught-exception-fail-attempt
May 6, 2026

Conversation

@matt-aitken
Member

When a Node EventEmitter (e.g. node-redis) emits an "error" event with no
listener attached, Node escalates it to process.on("uncaughtException") in
the task worker. The worker reported the error via the UNCAUGHT_EXCEPTION
IPC event but did not exit, and the supervisor-side handler in
taskRunProcess only logged the message at debug level — leaving the run()
promise orphaned until maxDuration fired and producing empty attempts
(durationMs=0, costInCents=0).

The supervisor now rejects the in-flight attempt with an
UncaughtExceptionError and gracefully terminates the worker (preserving
the OTEL flush window) on UNCAUGHT_EXCEPTION. The attempt fails fast with
TASK_EXECUTION_FAILED, surfacing the original error name, message, and
stack trace, and falls under the normal retry policy. This mirrors the
existing indexing-side behavior in indexWorkerManifest. Apply the same
handling to unhandled promise rejections, which Node already routes
through uncaughtException by default.
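
The escalation described above can be reproduced in isolation. The following is a minimal sketch, assuming a plain Node `EventEmitter` as a hypothetical stand-in for a client such as node-redis (it is not the real client, and `simulateConnectionError` is an illustrative name). It shows that `emit("error", ...)` throws when no listener is attached; when that throw happens inside an asynchronous callback with nothing to catch it, Node reports it through `uncaughtException`.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for a client such as node-redis; not the real client.
const client = new EventEmitter();

// Emits "error" the way a dropped connection might, and reports what happened.
function simulateConnectionError(emitter: EventEmitter): string {
  try {
    emitter.emit("error", new Error("read ECONNRESET"));
    return "handled by listener";
  } catch (err) {
    // With no "error" listener attached, emit() throws the error itself.
    // Inside a socket or timer callback nothing would catch this throw,
    // so it would surface as an uncaughtException on the process.
    return `thrown: ${(err as Error).message}`;
  }
}

// No listener attached yet: the emit throws.
const withoutListener = simulateConnectionError(client);

// With a listener attached, the same emit is delivered to the listener.
const seen: string[] = [];
client.on("error", (err: Error) => {
  seen.push(err.message);
});
const withListener = simulateConnectionError(client);

console.log(withoutListener, "|", withListener, "|", seen);
```

Attaching an "error" listener to long-lived clients is therefore the task-side way to avoid the escalation in the first place.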

@changeset-bot

changeset-bot Bot commented May 6, 2026

🦋 Changeset detected

Latest commit: a5837fc

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 29 packages
Name Type
trigger.dev Patch
@trigger.dev/core Patch
d3-chat Patch
references-d3-openai-agents Patch
references-nextjs-realtime Patch
references-realtime-hooks-test Patch
references-realtime-streams Patch
references-telemetry Patch
@trigger.dev/build Patch
@trigger.dev/python Patch
@trigger.dev/redis-worker Patch
@trigger.dev/schema-to-json Patch
@trigger.dev/sdk Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/llm-model-catalog Patch
@internal/redis Patch
@internal/replication Patch
@internal/run-engine Patch
@internal/schedule-engine Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/tsql Patch
@internal/zod-worker Patch
@internal/sdk-compat-tests Patch
@trigger.dev/react-hooks Patch
@trigger.dev/rsc Patch
@trigger.dev/database Patch
@trigger.dev/otlp-importer Patch


@coderabbitai
Contributor

coderabbitai Bot commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3ea76d18-54a6-4b96-a366-6e40f31dbe55

📥 Commits

Reviewing files that changed from the base of the PR and between 3f70ec1 and a5837fc.

📒 Files selected for processing (3)
  • docs/troubleshooting.mdx
  • packages/core/src/v3/errors.ts
  • packages/core/src/v3/links.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/troubleshooting.mdx
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
🧰 Additional context used
📓 Path-based instructions (10)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
packages/core/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (packages/core/CLAUDE.md)

Never import the root package (@trigger.dev/core). Always use subpath imports such as @trigger.dev/core/v3, @trigger.dev/core/v3/utils, @trigger.dev/core/logger, or @trigger.dev/core/schemas

Files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
packages/**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

Use pnpm run build to verify changes in public packages (packages/*)

Files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
{packages,integrations}/**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

When modifying any public package (packages/* or integrations/*), add a changeset using pnpm run changeset:add

Files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
{package.json,**/*.{ts,tsx,js}}

📄 CodeRabbit inference engine (CLAUDE.md)

Pin Zod to version 3.25.76 exactly across the entire monorepo - never use a different version or version range

Files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js}: Import from @trigger.dev/core using subpaths only, never the root export
Always import tasks from @trigger.dev/sdk, never from @trigger.dev/sdk/v3 or deprecated client.defineJob
Add crumbs to code using `// @crumbs` comments or `// #region @crumbs` blocks for debug tracing during development

Files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
**/*.{ts,tsx,js,jsx,json,md,css,scss}

📄 CodeRabbit inference engine (AGENTS.md)

Code formatting is enforced using Prettier. Run pnpm run format before committing

Files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
🧠 Learnings (2)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • packages/core/src/v3/links.ts
  • packages/core/src/v3/errors.ts
🔇 Additional comments (5)
packages/core/src/v3/links.ts (1)

15-19: LGTM — consistent with existing troubleshooting link pattern.

The new entry follows the established structure and the URL anchor is being introduced in the same PR's docs commit.

packages/core/src/v3/errors.ts (4)

354-419: LGTM — TASK_RUN_UNCAUGHT_EXCEPTION correctly placed in the retryable group.

Adding it alongside TASK_RUN_CRASHED / TASK_EXECUTION_FAILED is consistent with the PR intent. The assertExhaustive default ensures future code additions don't silently skip the new value.


422-449: LGTM — shouldLookupRetrySettings correctly opts into task-configured retry.

Treating TASK_RUN_UNCAUGHT_EXCEPTION the same as TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE / TASK_PROCESS_SIGTERM is the right call: the task author's retry config should govern re-attempts after an uncaught exception.


549-558: LGTM — UncaughtExceptionError design is sound.

The class correctly wraps the original error without losing its name/message/stack, and exposes origin as a string union (per guidelines). The deliberate choice not to extend InternalError keeps parseError from converting it silently; the supervisor path uses taskRunProcess.ts's parseExecuteError for the proper INTERNAL_ERROR / TASK_RUN_UNCAUGHT_EXCEPTION conversion, and the indexing path already has its dedicated branch in serializeIndexingError.
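
As a rough illustration of the design described in this comment, here is a minimal sketch of an error wrapper that keeps the original name, message, and stack and exposes `origin` as a string union. The class name `UncaughtExceptionSketch`, the `SerializedError` type, and the message format are assumptions made for this example; the real `UncaughtExceptionError` in packages/core/src/v3/errors.ts may differ.

```typescript
// Sketch only: names and message format are assumptions, not the real class.
type UncaughtOrigin = "uncaughtException" | "unhandledRejection";

// Assumed shape of the error carried over IPC from the worker.
type SerializedError = {
  name: string;
  message: string;
  stack?: string;
};

class UncaughtExceptionSketch extends Error {
  readonly originalError: SerializedError;
  readonly origin: UncaughtOrigin;

  constructor(originalError: SerializedError, origin: UncaughtOrigin) {
    // Prefix the message with the origin so the two cases can be told apart.
    super(`${origin}: ${originalError.message}`);
    this.name = "UncaughtExceptionSketch";
    // The original error is stored untouched, so nothing is lost.
    this.originalError = originalError;
    this.origin = origin;
  }
}

const wrapped = new UncaughtExceptionSketch(
  {
    name: "Error",
    message: "read ECONNRESET",
    stack: "Error: read ECONNRESET\n    at Socket.onError",
  },
  "unhandledRejection"
);

console.log(wrapped.message, wrapped.originalError.name);
```

Using a string union for `origin` rather than an enum follows the repository guideline quoted earlier in this review.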


727-738: LGTM — link-only entry is the right design.

Omitting message preserves the original error text (e.g. "read ECONNRESET") in the dashboard, which is far more actionable than a generic override. The comment clearly explains the intent. The generic INTERNAL_ERROR arm in taskRunErrorEnhancer (line 878) will spread getPrettyTaskRunError(error.code) and surface the link correctly.


Walkthrough

Adds handling for uncaught exceptions/unhandled rejections emitted by task worker processes. The supervisor now rejects in-flight attempts when an UNCAUGHT_EXCEPTION occurs (via a new private helper), terminates the worker gracefully, and surfaces an UncaughtExceptionError as a built-in error (including name, message, and stack trace). The internal error code TASK_RUN_UNCAUGHT_EXCEPTION was added to schemas and mapped into retry decision and run-status paths. Tests, troubleshooting docs, and a troubleshooting link were added/updated to cover the new behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The description covers the problem, solution, and testing strategy, but the provided PR description lacks the full template structure including the Testing and Changelog sections from the template. | Complete the PR description by adding explicit Testing and Changelog sections following the repository template, even if briefly. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: handling uncaught exceptions by failing attempts promptly instead of hanging until maxDuration, with the ticket reference. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/cli-v3/src/executions/taskRunProcess.test.ts (1)

122-153: 💤 Low value

LGTM!

Both tests cover the meaningful behavior — surfacing TASK_EXECUTION_FAILED with original error details, and preserving origin in the message. Using the real UncaughtExceptionError class (no mocks) aligns with the repo's testing guidelines.

Optional nit: the second test could also assert result.type === "INTERNAL_ERROR" and that stackTrace is undefined when no stack is provided, which would lock in the current behavior for the no-stack path. Not blocking.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/cli-v3/src/executions/taskRunProcess.test.ts` around lines 122 -
153, Add the optional assertions to the "preserves origin=unhandledRejection in
the surfaced message" test: after calling TaskRunProcess.parseExecuteError with
the UncaughtExceptionError, also assert that result.type === "INTERNAL_ERROR"
and that result.stackTrace is undefined when no stack was provided; update the
test block that constructs the UncaughtExceptionError and checks
result.code/message to include these two additional expectations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e7ea152b-a10d-4891-beaf-4617f6fe6b2e

📥 Commits

Reviewing files that changed from the base of the PR and between 6e8b039 and 65a5aec.

📒 Files selected for processing (3)
  • .changeset/uncaught-exception-fail-attempt.md
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: units / e2e-webapp / 🧪 E2E Tests: Webapp
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (11)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use vitest for all tests in the Trigger.dev repository

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
packages/**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

Use pnpm run build to verify changes in public packages (packages/*)

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
**/*.test.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.test.{ts,tsx,js}: Use vitest exclusively for testing and never mock anything - use testcontainers instead
Place test files next to source files using the pattern MyService.ts -> MyService.test.ts

**/*.test.{ts,tsx,js}: Use vitest for unit testing and run tests with pnpm run test
Test files should live beside the files under test with descriptive describe and it blocks
Tests should avoid mocks or stubs and use helpers from @internal/testcontainers when Redis or Postgres are needed

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use testcontainers with redisTest, postgresTest, or containerTest from @internal/testcontainers for testing with Redis/PostgreSQL dependencies

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
{packages,integrations}/**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

When modifying any public package (packages/* or integrations/*), add a changeset using pnpm run changeset:add

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
{package.json,**/*.{ts,tsx,js}}

📄 CodeRabbit inference engine (CLAUDE.md)

Pin Zod to version 3.25.76 exactly across the entire monorepo - never use a different version or version range

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js}: Import from @trigger.dev/core using subpaths only, never the root export
Always import tasks from @trigger.dev/sdk, never from @trigger.dev/sdk/v3 or deprecated client.defineJob
Add crumbs to code using `// @crumbs` comments or `// #region @crumbs` blocks for debug tracing during development

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
**/*.{ts,tsx,js,jsx,json,md,css,scss}

📄 CodeRabbit inference engine (AGENTS.md)

Code formatting is enforced using Prettier. Run pnpm run format before committing

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
🧠 Learnings (2)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
🔇 Additional comments (5)
packages/cli-v3/src/executions/taskRunProcess.ts (4)

36-36: LGTM!

Import is correctly added alongside the other error types from the same module.


355-370: LGTM!

Helper correctly iterates pending attempts only, flips status before invoking the rejecter, and is idempotent if called twice. Mirrors the pending-attempt handling in #handleExit while keeping the call sites focused.
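
The behavior this comment describes (pending attempts only, status flipped before the rejecter runs, safe to call twice) can be sketched as follows. This is an illustration under assumptions, not the PR's code: `AttemptState`, its status values, and `rejectPendingAttemptsSketch` are hypothetical names.

```typescript
// Sketch only: the types and names below are assumptions for illustration.
type AttemptStatus = "PENDING" | "REJECTED" | "RESOLVED";

type AttemptState = {
  status: AttemptStatus;
  rejecter: (err: Error) => void;
};

// Rejects every attempt that is still pending and returns how many were rejected.
function rejectPendingAttemptsSketch(attempts: AttemptState[], error: Error): number {
  let rejected = 0;
  for (const attempt of attempts) {
    if (attempt.status !== "PENDING") {
      continue; // already settled, leave it alone
    }
    // Flip the status first so a second call skips this attempt.
    attempt.status = "REJECTED";
    attempt.rejecter(error);
    rejected += 1;
  }
  return rejected;
}

const received: string[] = [];
const attempts: AttemptState[] = [
  { status: "PENDING", rejecter: (e) => { received.push(e.message); } },
  { status: "RESOLVED", rejecter: (e) => { received.push(e.message); } },
];

const firstCall = rejectPendingAttemptsSketch(attempts, new Error("read ECONNRESET"));
const secondCall = rejectPendingAttemptsSketch(attempts, new Error("read ECONNRESET"));

console.log(firstCall, secondCall, received);
```

Because the status changes before the rejecter is invoked, a repeated UNCAUGHT_EXCEPTION event or a later exit handler would find nothing left to reject.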


592-599: LGTM!

Mapping to TASK_EXECUTION_FAILED with the original name/message/stack is consistent with the indexing-side behavior referenced in the PR. Including origin in the surfaced message is helpful for distinguishing uncaughtException vs unhandledRejection in logs.


207-221: IPC payload schema verified—constructor contract is properly satisfied.

The UNCAUGHT_EXCEPTION message schema in ExecutorToWorkerMessageCatalog (packages/core/src/v3/schemas/messages.ts:134-142) exposes message.error and message.origin with exactly the shape expected by UncaughtExceptionError's constructor (packages/core/src/v3/errors.ts:547-551). The handler correctly passes these fields as-is, so the parsed-error path will not produce undefined fields at runtime.

.changeset/uncaught-exception-fail-attempt.md (1)

1-9: LGTM!

Patch-level changeset is appropriate, the problem/fix narrative matches the PR description, and the customer guidance about attaching client.on("error", ...) listeners is a useful hint.

…xDuration (TRI-9117)

@matt-aitken matt-aitken force-pushed the tri-9117-uncaught-exception-fail-attempt branch from 65a5aec to 501513f Compare May 6, 2026 13:45
devin-ai-integration[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

…m failure

Following on from the prior commit that wired UNCAUGHT_EXCEPTION to fail
the attempt: the parseExecuteError branch returned an INTERNAL_ERROR with
code TASK_EXECUTION_FAILED, which made the run show as "System failure" in
the dashboard. The exception was raised by user code (or a dependency the
user controls — e.g. an EventEmitter "error" event with no listener), so
it should surface as a regular task failure ("Failed" status), not as a
platform fault.

Widen parseExecuteError's return to TaskRunError and have the
UncaughtExceptionError branch return a BUILT_IN_ERROR carrying the
original error name, message, and stack. This routes through the same
finalization path as a thrown user error: status=FAILED, normal retry
policy, catchError / handleError hooks fire as expected.

Both call sites (managed/execution.ts, dev-run-controller.ts) already pass
the result into TaskRunFailedExecutionResult.error, which accepts the full
TaskRunError union — no caller-side changes needed.
…ERNAL_ERROR code

Introduce TASK_RUN_UNCAUGHT_EXCEPTION as a dedicated TaskRunInternalError
code so the engine handles retry through its existing crash-style pathway
(lockedRetryConfig lookup), and the dashboard renders these failures as
"Failed" rather than "System failure".

The previous BUILT_IN_ERROR approach showed the right status but didn't
respect the user's retry policy: BUILT_IN_ERROR with retry: undefined
short-circuits to fail_run because shouldLookupRetrySettings(BUILT_IN_ERROR)
returns false. Inline retry calculation in cli-v3 was rejected as
duplicating logic already owned by the engine.

This change mirrors how TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE,
TASK_PROCESS_SIGTERM, and TASK_PROCESS_SIGSEGV already work — same
lookup-and-retry pathway, just with a different surface status (Failed
vs Crashed) and the original error's message + stackTrace carried on the
INTERNAL_ERROR payload. No global behaviour changes; the new code is
opt-in via parseExecuteError's UncaughtExceptionError branch.

Touchpoints:
- packages/core/src/v3/schemas/common.ts: enum entry
- packages/core/src/v3/errors.ts: shouldRetryError, shouldLookupRetrySettings
- internal-packages/run-engine/src/engine/errors.ts: runStatusFromError
- packages/cli-v3/src/executions/taskRunProcess.ts: parseExecuteError + revert TaskRunError widening
- tests + changeset + server-changes entry
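
The retry routing described in this commit message can be sketched roughly as below. This is a simplified model under assumptions: the real `shouldLookupRetrySettings` in packages/core/src/v3/errors.ts handles more codes, and `decideRetrySketch`, `RetryOutcome`, and the error shapes are hypothetical names introduced here. Only the listed error codes, `BUILT_IN_ERROR`, and `fail_run` come from the text above.

```typescript
// Sketch only: a reduced model of the routing, not the engine's implementation.
type InternalErrorCode =
  | "TASK_RUN_UNCAUGHT_EXCEPTION"
  | "TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE"
  | "TASK_PROCESS_SIGTERM"
  | "TASK_PROCESS_SIGSEGV";

type SketchTaskRunError =
  | { type: "INTERNAL_ERROR"; code: InternalErrorCode; message?: string; stackTrace?: string }
  | { type: "BUILT_IN_ERROR"; name: string; message: string; stackTrace: string };

function shouldLookupRetrySettingsSketch(error: SketchTaskRunError): boolean {
  if (error.type !== "INTERNAL_ERROR") {
    return false; // e.g. BUILT_IN_ERROR never triggers a lookup
  }
  switch (error.code) {
    case "TASK_RUN_UNCAUGHT_EXCEPTION":
    case "TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE":
    case "TASK_PROCESS_SIGTERM":
    case "TASK_PROCESS_SIGSEGV":
      return true;
    default:
      return false;
  }
}

// Hypothetical outcome labels; only "fail_run" is named in the commit message.
type RetryOutcome = "use_provided_retry" | "lookup_locked_retry_config" | "fail_run";

function decideRetrySketch(error: SketchTaskRunError, providedRetry?: { delayMs: number }): RetryOutcome {
  if (providedRetry !== undefined) {
    return "use_provided_retry";
  }
  return shouldLookupRetrySettingsSketch(error) ? "lookup_locked_retry_config" : "fail_run";
}

const uncaughtOutcome = decideRetrySketch({
  type: "INTERNAL_ERROR",
  code: "TASK_RUN_UNCAUGHT_EXCEPTION",
  message: "read ECONNRESET",
});
const builtInOutcome = decideRetrySketch({
  type: "BUILT_IN_ERROR",
  name: "Error",
  message: "read ECONNRESET",
  stackTrace: "",
});

console.log(uncaughtOutcome, builtInOutcome);
```

In this model the dedicated code reaches the retry lookup, while a `BUILT_IN_ERROR` without retry information fails the run, which is the gap the commit message describes.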
@matt-aitken matt-aitken force-pushed the tri-9117-uncaught-exception-fail-attempt branch from 501513f to 3f70ec1 Compare May 6, 2026 14:38
Contributor

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
packages/cli-v3/src/executions/taskRunProcess.ts (1)

207-221: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Open past-review issue: stale execution state after #rejectPendingAttempts rejection.

When #rejectPendingAttempts rejects the in-flight promise, execute() throws at await promise and the two cleanup lines that follow are never reached:

// execute() lines 328-331 (NOT reached on rejection):
this._currentExecution = undefined;
this._isPreparedForNextAttempt = true;

This leaves isExecuting() returning true and isPreparedForNextAttempt stuck at false until the instance is discarded.
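
A minimal sketch of the first remedy this comment proposes (clear the execution state immediately before rejecting) is shown below. `TaskRunProcessSketch` and its members are hypothetical simplifications, not the real `TaskRunProcess`; the sketch only models the two state fields and a single pending rejecter.

```typescript
// Sketch only: a reduced model of the state handling, under assumed names.
class TaskRunProcessSketch {
  private currentExecution: string | undefined;
  private preparedForNextAttempt = true;
  private rejectPending: ((err: Error) => void) | undefined;

  isExecuting(): boolean {
    return this.currentExecution !== undefined;
  }

  get isPreparedForNextAttempt(): boolean {
    return this.preparedForNextAttempt;
  }

  async execute(runId: string): Promise<string> {
    this.currentExecution = runId;
    this.preparedForNextAttempt = false;
    const result = await new Promise<string>((_resolve, reject) => {
      this.rejectPending = reject;
    });
    // Only reached when the promise resolves, never on rejection.
    this.currentExecution = undefined;
    this.preparedForNextAttempt = true;
    return result;
  }

  // Clears state first, then rejects, so the flags are never left stale.
  onUncaughtException(err: Error): void {
    this.currentExecution = undefined;
    this.preparedForNextAttempt = true;
    this.rejectPending?.(err);
    this.rejectPending = undefined;
  }
}

const proc = new TaskRunProcessSketch();
// Swallow the rejection here; a real caller would convert it into a failed attempt.
proc.execute("run_123").catch(() => undefined);

const executingBefore = proc.isExecuting();
proc.onUncaughtException(new Error("read ECONNRESET"));
const executingAfter = proc.isExecuting();

console.log(executingBefore, executingAfter, proc.isPreparedForNextAttempt);
```

Wrapping the cleanup in a `finally` block inside `execute()` would be another way to get the same effect; that alternative is not one of the options named in the comment.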

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/cli-v3/src/executions/taskRunProcess.ts` around lines 207 - 221, The
UNCAUGHT_EXCEPTION handler currently calls this.#rejectPendingAttempts which
causes execute() to throw before it can clear state, leaving isExecuting() true
and isPreparedForNextAttempt false; to fix, ensure the instance state is cleared
before rejecting or ensure rejection happens asynchronously so execute()'s
cleanup runs: update the UNCAUGHT_EXCEPTION handler (and/or
`#rejectPendingAttempts` call site) to set this._currentExecution = undefined and
this._isPreparedForNextAttempt = true immediately before calling
this.#rejectPendingAttempts(new UncaughtExceptionError(...)) (or call
this.#rejectPendingAttempts inside a setImmediate/nextTick so execute()'s await
can complete and then perform rejection), and keep the await
this.#gracefullyTerminate(...) after that; reference these symbols:
UNCAUGHT_EXCEPTION handler, `#rejectPendingAttempts`, execute(), isExecuting(),
isPreparedForNextAttempt, and `#gracefullyTerminate`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 250b407d-2d7a-46de-82f9-c2fa27b2342b

📥 Commits

Reviewing files that changed from the base of the PR and between 501513f and 3f70ec1.

📒 Files selected for processing (7)
  • .changeset/uncaught-exception-fail-attempt.md
  • .server-changes/uncaught-exception-status-mapping.md
  • internal-packages/run-engine/src/engine/errors.ts
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
  • packages/core/src/v3/errors.ts
  • packages/core/src/v3/schemas/common.ts
✅ Files skipped from review due to trivial changes (1)
  • .server-changes/uncaught-exception-status-mapping.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
🧰 Additional context used
📓 Path-based instructions (14)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • packages/core/src/v3/schemas/common.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
  • internal-packages/run-engine/src/engine/errors.ts
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/core/src/v3/errors.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • packages/core/src/v3/schemas/common.ts
  • packages/core/src/v3/errors.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • packages/core/src/v3/schemas/common.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
  • internal-packages/run-engine/src/engine/errors.ts
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/core/src/v3/errors.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • packages/core/src/v3/schemas/common.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
  • internal-packages/run-engine/src/engine/errors.ts
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/core/src/v3/errors.ts
packages/core/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (packages/core/CLAUDE.md)

Never import the root package (@trigger.dev/core). Always use subpath imports such as @trigger.dev/core/v3, @trigger.dev/core/v3/utils, @trigger.dev/core/logger, or @trigger.dev/core/schemas

Files:

  • packages/core/src/v3/schemas/common.ts
  • packages/core/src/v3/errors.ts
packages/**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

Use pnpm run build to verify changes in public packages (packages/*)

Files:

  • packages/core/src/v3/schemas/common.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/core/src/v3/errors.ts
{packages,integrations}/**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

When modifying any public package (packages/* or integrations/*), add a changeset using pnpm run changeset:add

Files:

  • packages/core/src/v3/schemas/common.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/core/src/v3/errors.ts
{package.json,**/*.{ts,tsx,js}}

📄 CodeRabbit inference engine (CLAUDE.md)

Pin Zod to version 3.25.76 exactly across the entire monorepo - never use a different version or version range

Files:

  • packages/core/src/v3/schemas/common.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
  • internal-packages/run-engine/src/engine/errors.ts
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/core/src/v3/errors.ts
**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js}: Import from @trigger.dev/core using subpaths only, never the root export
Always import tasks from @trigger.dev/sdk, never from @trigger.dev/sdk/v3 or deprecated client.defineJob
Add crumbs to code using `// @crumbs` comments or `// #region @crumbs` blocks for debug tracing during development

Files:

  • packages/core/src/v3/schemas/common.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
  • internal-packages/run-engine/src/engine/errors.ts
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/core/src/v3/errors.ts
**/*.{ts,tsx,js,jsx,json,md,css,scss}

📄 CodeRabbit inference engine (AGENTS.md)

Code formatting is enforced using Prettier. Run pnpm run format before committing

Files:

  • packages/core/src/v3/schemas/common.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
  • internal-packages/run-engine/src/engine/errors.ts
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/core/src/v3/errors.ts
{apps,internal-packages}/**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

Use pnpm run typecheck to verify changes in apps and internal packages (apps/*, internal-packages/*) instead of build, which proves almost nothing about correctness

Files:

  • internal-packages/run-engine/src/engine/errors.ts
**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use vitest for all tests in the Trigger.dev repository

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
**/*.test.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.test.{ts,tsx,js}: Use vitest exclusively for testing and never mock anything - use testcontainers instead
Place test files next to source files using the pattern MyService.ts -> MyService.test.ts

**/*.test.{ts,tsx,js}: Use vitest for unit testing and run tests with pnpm run test
Test files should live beside the files under test with descriptive describe and it blocks
Tests should avoid mocks or stubs and use helpers from @internal/testcontainers when Redis or Postgres are needed

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use testcontainers with redisTest, postgresTest, or containerTest from @internal/testcontainers for testing with Redis/PostgreSQL dependencies

Files:

  • packages/cli-v3/src/executions/taskRunProcess.test.ts
🧠 Learnings (2)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • packages/core/src/v3/schemas/common.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
  • internal-packages/run-engine/src/engine/errors.ts
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/core/src/v3/errors.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • packages/core/src/v3/schemas/common.ts
  • packages/cli-v3/src/executions/taskRunProcess.ts
  • internal-packages/run-engine/src/engine/errors.ts
  • packages/cli-v3/src/executions/taskRunProcess.test.ts
  • packages/core/src/v3/errors.ts
🔇 Additional comments (7)
packages/core/src/v3/schemas/common.ts (1)

177-177: LGTM — schema enum is consistent with all exhaustive consumers.

The new code is already handled in shouldRetryError, shouldLookupRetrySettings, and runStatusFromError per the relevant context snippets, so no exhaustive-check gaps are introduced.

internal-packages/run-engine/src/engine/errors.ts (1)

18-23: LGTM — correct status mapping, exhaustive guard intact.

TASK_RUN_UNCAUGHT_EXCEPTION belongs with the other user-code errors that resolve to COMPLETED_WITH_ERRORS, and the assertExhaustive(error.code) at line 69 ensures any future schema additions that miss this switch will fail to compile.
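
For readers unfamiliar with the exhaustive-guard pattern mentioned here, a minimal sketch is below. Only the `TASK_RUN_UNCAUGHT_EXCEPTION` to `COMPLETED_WITH_ERRORS` mapping comes from this review; the other codes, the `SYSTEM_FAILURE` status, and the function name `runStatusFromCode` are placeholders, and this local `assertExhaustive` is an assumed shape rather than the repository's helper.

```typescript
// Hypothetical subset of error codes, as a string union rather than an enum.
type ErrorCodeSketch =
  | "TASK_RUN_UNCAUGHT_EXCEPTION"
  | "HYPOTHETICAL_USER_ERROR"
  | "HYPOTHETICAL_SYSTEM_ERROR";

type RunStatusSketch = "COMPLETED_WITH_ERRORS" | "SYSTEM_FAILURE";

// Accepting `never` means a call with an unhandled union member fails to compile.
function assertExhaustive(value: never): never {
  throw new Error(`Unhandled error code: ${String(value)}`);
}

function runStatusFromCode(code: ErrorCodeSketch): RunStatusSketch {
  switch (code) {
    case "TASK_RUN_UNCAUGHT_EXCEPTION":
    case "HYPOTHETICAL_USER_ERROR":
      return "COMPLETED_WITH_ERRORS";
    case "HYPOTHETICAL_SYSTEM_ERROR":
      return "SYSTEM_FAILURE";
    default:
      // If a new member is added to ErrorCodeSketch without a case above,
      // `code` is no longer `never` here and the type checker rejects this line.
      return assertExhaustive(code);
  }
}
```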

packages/core/src/v3/errors.ts (1)

395-405: LGTM — TASK_RUN_UNCAUGHT_EXCEPTION correctly retries and consults run retry config.

Placing it in the return true branch of shouldRetryError and shouldLookupRetrySettings is the right call: uncaught exceptions from user code (or their dependencies) should honour the task's configured retry policy, mirroring the treatment of TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE and TASK_PROCESS_SIGTERM.

packages/cli-v3/src/executions/taskRunProcess.ts (2)

355-370: LGTM — #rejectPendingAttempts correctly guards against double-rejection.

Setting the status to "REJECTED" before calling rejecter ensures that the parallel #handleExit loop (which also checks "PENDING") cannot double-reject the same attempt when the process eventually exits.
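
The ordering described above can be illustrated with a small sketch. The `AttemptSketch` type and the free function `rejectPendingAttempts` are assumptions for illustration; the real method is a private class member and its attempt bookkeeping may differ.

```typescript
// Hypothetical attempt record: a status plus the promise's reject callback.
type AttemptSketch = {
  status: "PENDING" | "REJECTED" | "RESOLVED";
  rejecter: (error: Error) => void;
};

// Rejects every attempt that is still pending and returns how many were rejected.
function rejectPendingAttempts(attempts: AttemptSketch[], error: Error): number {
  let rejected = 0;
  for (const attempt of attempts) {
    if (attempt.status !== "PENDING") {
      continue; // already settled by another path, e.g. an exit handler
    }
    // Flip the status before invoking the rejecter, so any other code path
    // that checks for "PENDING" afterwards skips this attempt.
    attempt.status = "REJECTED";
    attempt.rejecter(error);
    rejected += 1;
  }
  return rejected;
}
```

Calling the function a second time over the same attempts is a no-op, which is the property the exit handler relies on when the process terminates after the exception has already been reported.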


592-605: LGTM — parseExecuteError correctly surfaces TASK_RUN_UNCAUGHT_EXCEPTION with original error context.

The originalError.message and originalError.stack are passed directly, which is consistent with other branches in parseExecuteError. Both are optional on TaskRunInternalError, so the case where originalError.stack is absent is handled cleanly.
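
A rough sketch of that mapping, under assumptions: the field names `message` and `stackTrace`, the `origin` values, and the `Sketch`-suffixed names are illustrative guesses and are not taken from the actual `TaskRunInternalError` schema or `parseExecuteError` implementation.

```typescript
// Hypothetical shape of the internal error; both detail fields are optional.
type InternalErrorSketch = {
  type: "INTERNAL_ERROR";
  code: "TASK_RUN_UNCAUGHT_EXCEPTION";
  message?: string;
  stackTrace?: string;
};

type OriginalErrorSketch = { name: string; message: string; stack?: string };

class UncaughtExceptionErrorSketch extends Error {
  constructor(
    public readonly originalError: OriginalErrorSketch,
    public readonly origin: "uncaughtException" | "unhandledRejection"
  ) {
    super(`Uncaught exception: ${originalError.message}`);
    this.name = "UncaughtExceptionError";
  }
}

// Both origins map to the same code; the original message and stack pass through.
function toInternalError(error: UncaughtExceptionErrorSketch): InternalErrorSketch {
  return {
    type: "INTERNAL_ERROR",
    code: "TASK_RUN_UNCAUGHT_EXCEPTION",
    message: error.originalError.message,
    stackTrace: error.originalError.stack, // may be undefined, which the optional field allows
  };
}
```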

packages/cli-v3/src/executions/taskRunProcess.test.ts (1)

122-154: LGTM — tests accurately reflect the production behavior for both exception origins.

The two cases exercise the key requirements: stack forwarding (case 1) and same code for unhandledRejection (case 2). Both drive the pure static method without mocking, which aligns with the project's testing guidelines.

.changeset/uncaught-exception-fail-attempt.md (1)

1-6: LGTM — changeset covers both affected public packages with a clear user-facing description.

Adds a prettyInternalErrors entry pointing the dashboard at a troubleshooting
doc anchor for uncaught exceptions. Link-only — no `message` override, so the
customer's original error (e.g. "read ECONNRESET") is preserved as the main
text. The link gives them somewhere to read about attaching .on("error")
listeners and the unhandledRejection pathway.

The docs anchor (#uncaught-exceptions) doesn't exist yet — needs a docs PR
to add the troubleshooting section.

Adds an "Uncaught exceptions" section that the dashboard's pretty-link
button now points at (#uncaught-exceptions). Covers what the error means,
the common EventEmitter-without-listener cause (with a node-redis example),
the .on("error", ...) fix, and the unhandledRejection path.
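
The EventEmitter-without-listener cause can be reproduced with Node's built-in `EventEmitter` alone. The sketch below uses a plain emitter as a stand-in for a client such as node-redis; the helper `emitError` is hypothetical and exists only to make the two outcomes observable.

```typescript
import { EventEmitter } from "node:events";

// Emits an "error" event and reports whether the emitter threw.
// Node throws the error from emit() when no "error" listener is attached;
// inside an I/O callback that throw surfaces as an uncaughtException.
function emitError(emitter: EventEmitter, error: Error): "handled" | "thrown" {
  try {
    emitter.emit("error", error);
    return "handled";
  } catch (_err) {
    return "thrown";
  }
}

// Stand-in for a client created without an error listener.
const bareClient = new EventEmitter();

// Stand-in for a client with the recommended listener attached.
const guardedClient = new EventEmitter();
guardedClient.on("error", (err: Error) => {
  console.error("client error:", err.message);
});
```

Here `emitError(bareClient, new Error("read ECONNRESET"))` reports a throw, while the same call on `guardedClient` is routed to the listener, which is the reason for attaching `.on("error", ...)` to long-lived clients inside tasks.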
@matt-aitken matt-aitken marked this pull request as ready for review May 6, 2026 15:25
@matt-aitken matt-aitken merged commit 62e0066 into main May 6, 2026
42 of 44 checks passed
@matt-aitken matt-aitken deleted the tri-9117-uncaught-exception-fail-attempt branch May 6, 2026 18:35