feat(mistralai): add streaming STT support for Voxtral Realtime #4773

zkewal · 2026-02-09T20:30:03Z

Summary

Adds realtime streaming transcription to livekit-plugins-mistralai using Mistral's voxtral-mini-transcribe-realtime-2602 model via the first-party SDK's RealtimeTranscription API. The existing batch transcription path is preserved unchanged.

Why This Matters

The realtime model streams partial transcription deltas mid-utterance instead of waiting for the full audio, reducing perceived latency. In our private evals it also significantly reduces the batch model's tendency to hallucinate on silence. See #4754 for context.

Changes

`stt.py` (+470/-12)

Add SpeechStream(stt.RecognizeStream) — fully async streaming via RealtimeTranscription.transcribe_stream() with 50ms PCM audio chunks
Debounced finalization after VAD FlushSentinel (finalize_delay_ms, default 100ms), plus idle-based finalization (650ms) for the STTNode path where FlushSentinel is not used
Map Mistral events to LiveKit lifecycle: TextDelta → INTERIM_TRANSCRIPT, SegmentDelta/Done → FINAL_TRANSCRIPT, with START/END_OF_SPEECH
Unknown event recovery and duplicate final detection
Model-based routing: batch models use recognize(), realtime models use stream(). Calling recognize() with a realtime model raises a clear error
_build_capabilities() with graceful fallback for offline_recognize across livekit-agents versions
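
For reviewers, the event mapping and duplicate-final detection listed above can be pictured with a minimal, self-contained sketch. It is not the code in `stt.py`: `TextDelta`, `SegmentDone`, `EventMapper`, and the local `SpeechEventType` enum are hypothetical stand-ins for the Mistral SDK events and LiveKit types, and the rule for what counts as a duplicate is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SpeechEventType(Enum):
    # Local stand-in for the LiveKit lifecycle event types named in the PR.
    START_OF_SPEECH = auto()
    INTERIM_TRANSCRIPT = auto()
    FINAL_TRANSCRIPT = auto()
    END_OF_SPEECH = auto()


@dataclass
class TextDelta:
    # Hypothetical stand-in for a partial-transcript event.
    text: str


@dataclass
class SegmentDone:
    # Hypothetical stand-in for a segment/done event carrying final text.
    text: str


class EventMapper:
    """Maps provider events to (event_type, text) pairs."""

    def __init__(self) -> None:
        self._speaking = False
        self._partial = ""
        self._last_final = None

    def handle(self, event) -> list[tuple[SpeechEventType, str]]:
        out = []
        if isinstance(event, TextDelta):
            if not self._speaking:
                self._speaking = True
                out.append((SpeechEventType.START_OF_SPEECH, ""))
            self._partial += event.text
            # New speech arrived, so the next final is not a duplicate.
            self._last_final = None
            out.append((SpeechEventType.INTERIM_TRANSCRIPT, self._partial))
        elif isinstance(event, SegmentDone):
            text = event.text.strip()
            # Assumption: a final identical to the previous one, with no
            # deltas in between, is treated as a duplicate and dropped.
            if text and text != self._last_final:
                if not self._speaking:
                    self._speaking = True
                    out.append((SpeechEventType.START_OF_SPEECH, ""))
                out.append((SpeechEventType.FINAL_TRANSCRIPT, text))
                self._last_final = text
            if self._speaking:
                self._speaking = False
                out.append((SpeechEventType.END_OF_SPEECH, ""))
            self._partial = ""
        # Unknown event types fall through and produce no output.
        return out
```

Feeding two deltas followed by two identical done events yields one START_OF_SPEECH, two interims, one final, and one END_OF_SPEECH; the repeated final is ignored.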

`models.py`

Add voxtral-mini-transcribe-realtime-2602 to STTModels
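
The model-based routing described under `stt.py` depends on telling realtime model names apart from batch ones. A rough sketch of that check follows; `REALTIME_MODELS`, `entry_point_for`, and `ensure_batch_model` are hypothetical names, and the actual error type and message in the PR may differ.

```python
# Hypothetical registry; the PR adds this model name to STTModels.
REALTIME_MODELS = frozenset({"voxtral-mini-transcribe-realtime-2602"})


def entry_point_for(model: str) -> str:
    """Return which STT method a model is expected to be used with."""
    return "stream" if model in REALTIME_MODELS else "recognize"


def ensure_batch_model(model: str) -> None:
    """Guard for the batch path: reject realtime models with a clear error."""
    if model in REALTIME_MODELS:
        # Assumption: ValueError is used here only for illustration.
        raise ValueError(
            f"{model!r} is a realtime model; use stream() instead of recognize()"
        )
```

A guard like `ensure_batch_model` would sit at the top of `recognize()`, so a misconfigured model fails immediately instead of after a network round trip.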

`pyproject.toml`

mistralai>=1.9.11 → mistralai[realtime]>=1.12.0 (adds WebSocket support)
livekit-agents>=1.4.1 → >=1.3.5 (no 1.4-specific APIs used; avoids OpenTelemetry version conflict with mistralai==1.12.0)

`__init__.py`

Export SpeechStream

Usage

from livekit.plugins.mistralai import STT

# Streaming (new)
stt = STT(model="voxtral-mini-transcribe-realtime-2602")
stream = stt.stream()

# Batch (unchanged)
stt = STT(model="voxtral-mini-latest")
event = await stt.recognize(buffer)

Testing & Validation

Verified batch backward compatibility (existing models unaffected)
Validated VAD flush finalization and idle-timeout finalization paths
Confirmed no dependency conflicts with livekit-agents>=1.3.5
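
The two finalization paths validated above can be illustrated with a small asyncio sketch. It is an approximation under stated assumptions, not the PR's implementation: `DebouncedFinalizer` and its method names are hypothetical, and the interaction between the flush timer and the idle timer (each new signal simply re-arms a single timer) is a guess at one reasonable design.

```python
import asyncio
from typing import Callable, Optional


class DebouncedFinalizer:
    """Calls `on_finalize` once after a quiet period (hypothetical helper)."""

    def __init__(
        self,
        on_finalize: Callable[[], None],
        flush_delay: float = 0.1,   # mirrors finalize_delay_ms=100
        idle_delay: float = 0.65,   # mirrors the 650ms idle timeout
    ) -> None:
        self._on_finalize = on_finalize
        self._flush_delay = flush_delay
        self._idle_delay = idle_delay
        self._task: Optional[asyncio.Task] = None

    def _arm(self, delay: float) -> None:
        # Cancel any pending timer so only the latest signal counts.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.get_running_loop().create_task(self._fire(delay))

    async def _fire(self, delay: float) -> None:
        await asyncio.sleep(delay)
        self._on_finalize()

    def on_flush(self) -> None:
        # A VAD flush was seen: finalize after the short debounce delay.
        self._arm(self._flush_delay)

    def on_delta(self) -> None:
        # New text arrived with no flush: finalize only after going idle.
        self._arm(self._idle_delay)


async def demo() -> list:
    fired = []
    finalizer = DebouncedFinalizer(
        lambda: fired.append("final"), flush_delay=0.01, idle_delay=0.05
    )
    finalizer.on_delta()
    finalizer.on_delta()   # re-arms the idle timer
    finalizer.on_flush()   # switches to the shorter flush delay
    finalizer.on_flush()   # repeated flushes still finalize only once
    await asyncio.sleep(0.2)
    return fired
```

Running `asyncio.run(demo())` shows that several deltas and flushes in quick succession collapse into a single finalize call.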

Closes #4754

Add realtime streaming transcription using Mistral's voxtral-mini-transcribe-realtime-2602 model via the first-party SDK's RealtimeTranscription API.

- Add SpeechStream (stt.RecognizeStream) with async audio generator and event-driven transcription processing
- Support interim transcripts (TextDelta) and final transcripts (SegmentDelta/Done) with START/END_OF_SPEECH lifecycle events
- Debounced finalization after VAD FlushSentinel with configurable finalize_delay_ms, plus idle-based finalization for STTNode path
- Model-based routing: batch models use recognize(), realtime models use stream()
- Add mistralai[realtime] dependency for WebSocket support

Closes livekit#4754

zkewal · 2026-02-09T20:53:26Z

@chenghao-mou there is a dependency conflict over the opentelemetry-api package between livekit/agents and the mistralai first-party SDK, which provides the WebSocket support needed for the streaming model. livekit/agents requires opentelemetry-api >=1.39.0 (a bump introduced in the 1.3.6 release), while the mistralai Python SDK depends on version 1.38.0.

How do you suggest I handle this? In my local testing, I downgraded livekit/agents to 1.3.5 and was able to test the current PR.

More details here: https://github.com/livekit/agents/actions/runs/21839471789/job/63019576507?pr=4773#step:5:13

I submitted an issue to Mistral AI asking them to raise the upper limit: mistralai/client-python#341
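
To reproduce the conflict locally, it can help to see which versions are actually installed in the environment. The snippet below is a small diagnostic sketch that uses only the standard library; `installed_versions` is a hypothetical helper, and the package names are the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    names = ["opentelemetry-api", "mistralai", "livekit-agents"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```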

zkewal added 3 commits February 9, 2026 21:27

Update pyproject.toml

c031866

Update stt.py

1e606fc

zkewal changed the title ~~Feat/mistralai streaming stt~~ feat(mistralai): add streaming STT support for Voxtral Realtime Feb 9, 2026

Update stt.py

11e1d58
