
fix(voice): cancel realtime generation when speech is interrupted#5703

Open
longcw wants to merge 1 commit into main from longc/realtime-cancel-on-interrupt

Contributor

@longcw longcw commented May 11, 2026

Summary

session.interrupt() was a silent no-op when called before the realtime server had emitted response.created. The local SpeechHandle was marked interrupted, but _realtime_reply_task had already passed its auth-wait block, so it still called rt_session.generate_reply() and the server produced a response that the agent then discarded. Reported in #5642 (comment).

Two changes fix it:

  • Activity-side race. _realtime_reply_task now races the generate_reply future against speech_handle._interrupt_fut via wait_if_not_interrupted. On interrupt, the future is cancelled and the task returns.
  • Plugin-side cancel propagation. Each realtime plugin's generate_reply future now wires an add_done_callback that, on cancellation, cleans up tracking state and signals the server (or drops the queued send) where the API supports it:
    • OpenAI (and xAI by inheritance): emits response.cancel.
    • Google: calls interrupt() to send activity_start to Gemini.
    • Phonic: cancels the queued send task (Phonic has no programmatic server-side cancel).
    • Ultravox: sends a deferred barge-in (urgency=immediate, defer_response=True).
    • AWS Nova (experimental): cancels the queued send task (Nova handles in-progress interruption automatically).
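
The activity-side race can be illustrated with plain asyncio. This is a minimal sketch, not the code in this PR: `reply_task` and the two demo functions are hypothetical names, and `interrupt_fut` only stands in for `speech_handle._interrupt_fut`. It shows the general pattern of waiting on whichever future finishes first and cancelling the pending generation if the interrupt wins.

```python
import asyncio


async def reply_task(generate_reply_fut, interrupt_fut):
    """Hypothetical stand-in for the race described above (not LiveKit code)."""
    # Wait until either the generation resolves or the interrupt fires.
    await asyncio.wait(
        {generate_reply_fut, interrupt_fut},
        return_when=asyncio.FIRST_COMPLETED,
    )
    if interrupt_fut.done():
        # Interrupt won the race: cancel the pending generation and stop.
        if not generate_reply_fut.done():
            generate_reply_fut.cancel()
        return "interrupted"
    return "completed"


async def demo_interrupted():
    loop = asyncio.get_running_loop()
    generate_reply_fut = loop.create_future()
    interrupt_fut = loop.create_future()
    # Simulate session.interrupt() firing before the server responds.
    loop.call_soon(interrupt_fut.set_result, None)
    outcome = await reply_task(generate_reply_fut, interrupt_fut)
    return outcome, generate_reply_fut.cancelled()


async def demo_completed():
    loop = asyncio.get_running_loop()
    generate_reply_fut = loop.create_future()
    interrupt_fut = loop.create_future()
    # Simulate the server resolving the generation with no interrupt.
    loop.call_soon(generate_reply_fut.set_result, "generation")
    outcome = await reply_task(generate_reply_fut, interrupt_fut)
    return outcome, generate_reply_fut.cancelled()
```

Cancelling the future, rather than only returning early, is what lets a plugin-side done callback observe the interrupt.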

Added test_generate_reply_cancellation in tests/test_realtime/test_realtime.py (parameterized over OpenAI/Azure) which cancels a generate_reply future immediately and verifies a subsequent generate_reply does not hit conversation_already_has_active_response.
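
The behavior that test checks can be approximated offline with fakes. The sketch below is an assumption-laden illustration, not the test in this PR: `FakeServer`, `FakeSession`, and `cancel_then_retry` are hypothetical, and the server model is reduced to a single "active response" flag. It shows a done callback that forwards a cancellation as `response.cancel`, so that a second `generate_reply` is not rejected.

```python
import asyncio


class FakeServer:
    """Hypothetical realtime server that allows one active response at a time."""

    def __init__(self):
        self.active_response = False
        self.errors = []

    def receive(self, event_type):
        if event_type == "response.create":
            if self.active_response:
                self.errors.append("conversation_already_has_active_response")
            else:
                self.active_response = True
        elif event_type == "response.cancel":
            self.active_response = False


class FakeSession:
    """Hypothetical plugin session; not the LiveKit implementation."""

    def __init__(self, server):
        self._server = server

    def generate_reply(self):
        fut = asyncio.get_running_loop().create_future()
        self._server.receive("response.create")

        def _on_done(f):
            # On external cancellation, ask the server to drop the response.
            if f.cancelled():
                self._server.receive("response.cancel")

        fut.add_done_callback(_on_done)
        return fut


async def cancel_then_retry():
    server = FakeServer()
    session = FakeSession(server)

    first = session.generate_reply()
    first.cancel()
    await asyncio.sleep(0)  # yield so the done callback runs

    second = session.generate_reply()
    second.set_result("ok")  # stands in for the server completing the reply
    await second
    return server.errors
```

Without the `response.cancel` in `_on_done`, the second call in this model would record `conversation_already_has_active_response`.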

Fixes #5642

When session.interrupt() runs before rt_session has emitted response.created,
the in-flight generate_reply was forging ahead and the server would still
produce a response that the agent then ignored. _realtime_reply_task now races
the generate_reply future against the speech interrupt and cancels it on
interrupt, and each realtime plugin's generate_reply future now signals a
server-side cancel (or drops the queued send) from its done callback.
@chenghao-mou chenghao-mou requested a review from a team May 11, 2026 05:46
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.


if speech_handle.interrupted:
    # cancel the pending generation; the plugin emits response.cancel
    if not generate_reply_fut.done():
        generate_reply_fut.cancel()
Member


If the future is already done, should we call self._rt_session.interrupt() instead?

self._pending_generation_fut = None
if f.cancelled() and is_current:
    # external cancel: signal interrupt to Gemini via activity_start
    self.interrupt()
Member


This is guarded by _manual_activity_detection in start_user_activity. Should we send it here directly?


Development

Successfully merging this pull request may close these issues.

await session.interrupt() with gpt-realtime model not reliable

2 participants