
fix: dispose native FFI resources before process.exit() in job shutdown #1042

Merged
toubatbrian merged 2 commits into livekit:main from Raysharr:fix/dispose-native-resources-before-exit
Feb 12, 2026

@Raysharr (Contributor) commented Feb 11, 2026

Summary

Calls dispose() from @livekit/rtc-node before process.exit(0) in the job process shutdown sequence to properly clean up native FFI resources.

Problem

When a job process shuts down (e.g., after a SIP trunk disconnect), the current shutdown sequence is:

session.close() → room.disconnect() → shutdown callbacks → process.exit(0)

room.disconnect() disconnects from the LiveKit room but does not clean up the Rust FfiServer resources: specifically the tokio async/audio runtimes, FfiRoom instances, and native handles in the DashMap. These resources leak on every job shutdown.

Fix

Add await dispose() after all job cleanup completes but before process.exit(0):

session.close() → room.disconnect() → callbacks → dispose() → process.exit(0)

dispose() calls livekitDispose() which:

  1. Closes all FfiRoom instances (drops track handles, awaits task JoinHandles)
  2. Clears the DashMap of native handles
  3. Shuts down both tokio runtimes (async + audio)
  4. Invalidates the FfiServer config

The call is wrapped in try/catch so a failed cleanup never blocks process exit.
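The ordering above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the code from job_proc_lazy_main.ts: `ShutdownDeps`, `shutdownJob`, and `onDisposeError` are hypothetical names, each step is modeled as an async function passed in by the caller, and shutdown callbacks are assumed to run one at a time. In the real change, `disposeNative` would be `dispose` from @livekit/rtc-node and `exit` would be `process.exit`. Error handling for the earlier steps is left out to keep the focus on the dispose step.

```typescript
// Hypothetical dependency shape for the job shutdown path. In the real
// job_proc_lazy_main.ts, disposeNative would be `dispose` from
// '@livekit/rtc-node' and exit would be process.exit.
interface ShutdownDeps {
  closeSession: () => Promise<void>;
  disconnectRoom: () => Promise<void>;
  shutdownCallbacks: Array<() => Promise<void>>;
  disposeNative: () => Promise<void>;
  exit: (code: number) => void;
  onDisposeError?: (err: unknown) => void;
}

async function shutdownJob(deps: ShutdownDeps): Promise<void> {
  // 1. JS-side cleanup: session, room, then shutdown callbacks
  //    (run sequentially here; an assumption, not taken from the PR).
  await deps.closeSession();
  await deps.disconnectRoom();
  for (const cb of deps.shutdownCallbacks) {
    await cb();
  }

  // 2. Release native FFI resources only after all JS cleanup is done.
  //    A thrown or rejected dispose is caught so it cannot skip the exit.
  try {
    await deps.disposeNative();
  } catch (err) {
    deps.onDisposeError?.(err);
  }

  // 3. Exit the job process.
  deps.exit(0);
}
```

Injecting the steps keeps the ordering testable without a live room or native addon. Note that the try/catch only covers a dispose() that throws or rejects; a call that never settles would still hold up the exit.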

Testing

Tested against a production voice agent (SIP trunk + Silero VAD + Google STT + ElevenLabs TTS + LiveKit turn detector). Confirmed via logs that dispose() completes successfully on every SIP disconnect.

Note: The libc++abi: mutex lock failed crash that sometimes appears on process.exit(0) is a separate native-layer issue — it persists regardless of JS-level cleanup (including dispose(), ONNX session release, and drain delays). It occurs during C++ destructor ordering and is cosmetic: all job work, IPC messaging, and resource cleanup complete before it fires. See node-sdks#564 for the related native crash.

Changes

  • agents/src/ipc/job_proc_lazy_main.ts — Import dispose from @livekit/rtc-node, call it before process.exit(0) in the shutdown sequence

Risk

Minimal. dispose() is idempotent and designed for exactly this purpose. The try/catch ensures it never blocks exit even if it fails.
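The risk assessment relies on dispose() being idempotent. If a caller cannot rely on that guarantee, for example because shutdown can be reached from more than one path, one option is to memoize the cleanup so every call shares a single promise. A minimal sketch, assuming dispose() returns a Promise (it is awaited in this PR); `once` is a hypothetical helper, not part of @livekit/rtc-node:

```typescript
// Hypothetical helper (not part of @livekit/rtc-node): memoizes an async
// cleanup so repeated or concurrent calls share a single invocation.
function once(fn: () => Promise<void>): () => Promise<void> {
  let pending: Promise<void> | undefined;
  return () => {
    // The first caller starts the cleanup; later callers get the same
    // promise, including its rejection if the cleanup failed.
    if (pending === undefined) {
      pending = fn();
    }
    return pending;
  };
}
```

With `const disposeOnce = once(dispose);`, concurrent callers await the same in-flight cleanup. A rejection is cached as well, so a failed dispose is not retried, which matches the try-once-then-exit behavior described above.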

Call `dispose()` from `@livekit/rtc-node` before `process.exit(0)` in
the job process shutdown sequence. Without this, the process terminates
while Rust FFI resources (tokio runtimes, libwebrtc threads) are still
running, which can cause:

  libc++abi: terminating due to uncaught exception of type
  std::system_error: mutex lock failed: Invalid argument

The crash is a race condition — most reliably triggered when:
- Audio is actively flowing through native pipeline (STT/VAD)
- SIP trunk disconnect causes rapid shutdown
- Multiple native threads are mid-execution during process.exit()

The fix adds `await dispose()` after all job cleanup completes (session
close, room disconnect, shutdown callbacks) but before process.exit(0).
dispose() is wrapped in try/catch so a failed cleanup never blocks exit.

Shutdown sequence (before):
  session.close() → room.disconnect() → callbacks → process.exit(0)

Shutdown sequence (after):
  session.close() → room.disconnect() → callbacks → dispose() → exit(0)

Related: livekit/node-sdks#564

@CLAassistant commented Feb 11, 2026

CLA assistant check
All committers have signed the CLA.

@changeset-bot (bot) commented Feb 11, 2026

🦋 Changeset detected

Latest commit: a18f008

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 19 packages
Name | Type
--- | ---
@livekit/agents | Patch
@livekit/agents-plugin-anam | Patch
@livekit/agents-plugin-baseten | Patch
@livekit/agents-plugin-bey | Patch
@livekit/agents-plugin-cartesia | Patch
@livekit/agents-plugin-deepgram | Patch
@livekit/agents-plugin-elevenlabs | Patch
@livekit/agents-plugin-google | Patch
@livekit/agents-plugin-hedra | Patch
@livekit/agents-plugin-inworld | Patch
@livekit/agents-plugin-lemonslice | Patch
@livekit/agents-plugin-livekit | Patch
@livekit/agents-plugin-neuphonic | Patch
@livekit/agents-plugin-openai | Patch
@livekit/agents-plugin-resemble | Patch
@livekit/agents-plugin-rime | Patch
@livekit/agents-plugin-silero | Patch
@livekit/agents-plugins-test | Patch
@livekit/agents-plugin-xai | Patch

@devin-ai-integration (bot) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

@toubatbrian (Contributor) left a comment

Nice catch on this bug! Have you tested the agent after this change and confirmed that the error is gone?

@Raysharr (Contributor, Author) left a comment

Thanks! I tested this against our production voice agent (SIP trunk, Silero VAD, Google STT, ElevenLabs TTS, LiveKit turn detector) and ran a deeper investigation.

dispose() works correctly — after patching, the log confirms native resources disposed on every shutdown. This properly cleans up FFI rooms, native handles, and tokio runtimes that were previously leaking on each job exit.

However, during testing I discovered that the libc++abi: mutex lock failed crash is a separate, deeper issue. It persists even after:

  • dispose() completes ✅
  • ONNX sessions released (VAD + turn detector) ✅
  • 3s drain delay before exit ✅

The crash fires at process.exit(0) itself — during C++ destructor ordering in the native addon layer. All job work, IPC, and cleanup complete successfully before it. It's cosmetic but noisy.

I'd suggest we:

  1. Merge this PR as-is: dispose() is the correct JS-side cleanup and should have been here regardless. It fixes the resource leak.
  2. Track the mutex crash separately — it needs a fix in the native Rust/C++ teardown (likely related to node-sdks#564).

I've also updated the PR description to reflect these findings.

@toubatbrian merged commit b1002e7 into livekit:main on Feb 12, 2026 (4 checks passed)
@github-actions (bot) mentioned this pull request Feb 10, 2026
@sgzrov commented May 1, 2026

Filed #1375: same libc++abi mutex error class, but the JS-side dispose-ordering workaround in this PR doesn't address it (the abort fires even after native resources are disposed). Cross-linking for searchability.

4 participants