feat!: major performance & accuracy improvements in speech-to-text module #1132
IgorSwat wants to merge 8 commits into
Conversation
…ware-mansion/react-native-executorch into @is/speech-to-text-ultimate
| WHISPER_SMALL_EN, | ||
| TranscriptionResult, | ||
| SpeechToTextProps, | ||
| WHISPER_SMALL_EN_COREML, |
Why is this added after TranscriptionResult and SpeechToTextProps? ;p
| "react": "19.2.5", | ||
| "react-native": "0.83.4", | ||
| "react-native-audio-api": "0.12.0", | ||
| "react-native-audio-api": "0.11.5", |
hey, why is that? We virtually never want to downgrade packages in demo apps.
audio-api 0.12.0 causes build failures on iOS, and I think it's the same issue @benITo47 had when testing the 1.2.0 binaries some time ago.
Could you please describe when you get these failures? I don't get any on the iOS simulator.
Yeah, I'm even on 26.4.
| namespace rnexecutorch::models::speech_to_text { | ||
| /** | ||
| * Basically a different representation of token, |
| * Basically a different representation of token, | |
| * Different representation of token, |
| for (size_t i = 1; i < sequenceIds.size(); ++i) { | ||
| std::span<uint64_t> single(sequenceIds.data() + i, 1); | ||
| logitsTensor = this->decode(single, encoderFeatures, startPos); | ||
| ++startPos; |
| for (size_t i = 1; i < sequenceIds.size(); ++i) { | |
| std::span<uint64_t> single(sequenceIds.data() + i, 1); | |
| logitsTensor = this->decode(single, encoderFeatures, startPos); | |
| ++startPos; | |
| for (size_t i = 1; i < sequenceIds.size(); ++i, ++startPos) { | |
| std::span<uint64_t> single(sequenceIds.data() + i, 1); | |
| logitsTensor = this->decode(single, encoderFeatures, startPos); |
| return {.committed = move_to_vector(committed), | ||
| .nonCommitted = move_to_vector(nonCommitted)}; | ||
| // Return the results |
| // Return the results |
| // Because of step 1, we know that if the last EOS exist in eos_, | ||
| // then it must be the last entry. | ||
| if (eos_.empty() || eos_.back().position != lastEosIndex) { | ||
| // Register last EOS entry |
| // Register last EOS entry |
| std::vector<Segment> transcriptions = asr_->transcribe(input, options); | ||
| // Flatten segments into a single word sequence. | ||
| // This is basically our 'nonCommitted' part for now. |
| // This is basically our 'nonCommitted' part for now. | |
| // This is our 'nonCommitted' part for now. |
| return std::vector<Word>(std::make_move_iterator(container.begin()), | ||
| std::make_move_iterator(container.end())); | ||
| OnlineASR::OnlineASR(const ASR *asr) : asr_(asr) { | ||
| // Reserve an expected amount of memory for audio buffer. |
| // Reserve an expected amount of memory for audio buffer. |
| // Last-tick committed delta + whatever never made it past the commit | ||
| // threshold. | ||
| std::vector<Word> residual = std::move(result.committed); |
| std::vector<Word> residual = std::move(result.committed); | |
| std::vector<Word> residual{std::move(result.committed)}; |
| @@ -1325,14 +1338,17 @@ | |||
| STYLE_TRANSFER_UDNIE, | |||
OK, so from 0.9 we will effectively drop support for the original models at our URLs (neither XNNPACK nor Core ML), right?
I don't get it - the original models are XNNPACK ones, so they will still be available.
WHISPER_TINY_EN_QUANTIZED is quantized XNNPACK, and WHISPER_TINY_EN is, I guess, full-precision XNNPACK; since there is no WHISPER_TINY_EN_QUANTIZED, we dropped something - what exactly?
Well, I just think the quantized models are pointless - they weigh only a little less than the standard float32 models, they do not bring any significant inference speed-up compared to baseline, and no one really downloads them on HF. I believe their existence just introduces unnecessary noise to the module.
I see, I'm OK with removing some of those. Now the only question is what we should remove: quantized or non-quantized. If they are just a bit smaller and just a bit faster, they are still better than the original ones, aren't they?
Well, the float32 baseline models are well tested and surely at least as accurate as the quantized ones (and probably more accurate). If the performance difference is minimal (or frankly nonexistent), then I don't like the idea of risking accuracy drops for some types of inputs.
Sure thing, that explanation is absolutely fine for me, I mostly asked because I wanted to be on the same page :))
Also, if this PR adds a breaking change, please describe it directly below.
| std::span<uint64_t> firstToken(sequenceIds.data(), 1); | ||
| executorch::aten::Tensor logitsTensor = | ||
| this->decode(firstToken, encoderFeatures, startPos); | ||
| ++startPos; |
Please abstract it into a for loop, something like this:
executorch::aten::Tensor logitsTensor = nullptr;
for (size_t i = 0; i < sequenceIds.size(); ++i, ++startPos) {
...
}
| audioBuffer_.reserve(static_cast<size_t>(2 * params::kStreamChunkThreshold * | ||
| constants::kSamplingRate)); | ||
| bool OnlineASR::isReady() const { | ||
| std::scoped_lock<std::mutex> lock(streamingMutex); |
std::scoped_lock generally doesn't need to be explicitly templated with the mutex type - CTAD deduces it - so you can drop the template argument. Please apply the same to the rest of the places.
| for (auto &segment : transcriptions) { | ||
| words.insert(words.end(), std::make_move_iterator(segment.words.begin()), | ||
| std::make_move_iterator(segment.words.end())); | ||
| std::move(segment.words.begin(), segment.words.end(), |
std::ranges::move with back_inserter
| for (size_t i = 0; i < memory_.eos.size(); i++) { | ||
| const auto &eos = memory_.eos[i]; | ||
| if (eos.position >= words.size() || !utils::isEos(words[eos.position]) || | ||
| (eos.position > 0 && | ||
| eos.preceeding != words[eos.position - 1].content)) { | ||
| memory_.eos.erase(memory_.eos.begin() + i, memory_.eos.end()); |
A for loop with iterators is probably more appropriate here.
| // in a 'good' spot - where it will remove a significant audio chunk, yet | ||
| // won't affect most recent, unfinished speech samples. | ||
| size_t bufferSize = audioBuffer_.size(); | ||
| if (bufferSize > static_cast<size_t>(params::kStreamSafeBufferDuration * |
Use std::cmp_greater instead.
| std::vector<Word> OnlineASR::commitAndClean(std::vector<Word> &transcript) { | ||
| const size_t bufferSize = audioBuffer_.size(); | ||
| const float midBufferThreshold = params::kStreamMaxDuration / 2.0F; |
| // recorded any speech. In this case we can safely cut the maximum amount of | ||
| // audio data. | ||
| if (memory_.eos.empty()) { | ||
| size_t cut = |
| } | ||
| return 0; | ||
| constexpr inline bool isEos(const Word &word) { |
Will this ever be used in a compile-time context to justify the constexpr here? I don't think so.

Description
This PR introduces several changes to the speech-to-text module based on Whisper models:
Introduces a breaking change?
Type of change
Tested on
Testing instructions
Run demo app to test the live streaming mode.
Screenshots
Related issues
#1124
Checklist
Additional notes
I am still trying to figure out a way to export Whisper efficiently to the Vulkan backend, after some initial failures, so that Android devices are covered as well.