• PSA: Speeding up offline STT using modern Sayboard Vosk models

    From Maria Sophia@[email protected] to comp.mobile.android on Tue Apr 21 16:00:00 2026
    From Newsgroup: comp.mobile.android

    My keyboard's offline speech-to-text (STT) got screwed up recently.
    Specifically, the voice IME that the keyboard's mic button triggers
    somehow got reset to Sayboard.

    On Android, since you can use any keyboard with any speech engine, the keyboard's mic button does not perform speech recognition itself.

    On Android, the keyboard is one Input Method Editor (IME) while the
    keyboard's mic button calls the system's Voice Input Method IME.

    A keyboard IME might be HeliBoard, Samsung Keyboard, OpenBoard,
    FlorisBoard, Simple Keyboard, AnySoftKeyboard, etc., while a voice
    input IME might be Sayboard, Sherpa-ONNX, Transcribro,
    Whisper+VoiceInput, Samsung Voice Input, Google Voice Typing, etc.

    HeliBoard is generally considered the best offline keyboard shell.
    Under Settings > General management > "HeliBoard settings > Toolbar",
    the "Select toolbar keys > Voice input" mic key defaults to being on.

    Under Settings > General management > "Keyboard list and default", the
    "Physical keyboard" entry is set to not connected. (The physical keyboard
    is what started this problem: Genymobile's scrcpy -k setting screwed
    things up, and I recently had to reset my entire well-honed keyboard setup.)

    Note: On Windows, "scrcpy -k" simulates a physical keyboard on Android,
    which reduces the step of bringing up a keyboard when mirroring Android,
    but it's not worth the trouble it caused (which I only learned belatedly).

    But I forgot how I had accomplished it in the past. Sigh.
    So here's a log, of sorts, that may help others.

    "Keyboard list and default" now shows:
    Default keyboard = HeliBoard (version 3.9)
    Samsung Keyboard
    TalkBack braille keyboard = off
    OpenBoard = on
    Keepass2Android = off
    Automate = off
    Button Mapper = off
    Key Mapper Basic Input Method = off
    Simple Keyboard = on
    HeliBoard = on
    FlorisBoard = on
    Whisper+voice Input = on
    Sayboard = on
    Transcribro = off
    Magikeyboard (KeePassDX) = off
    Keyboard button on navigation bar = on
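
    The same list can also be read from a PC over adb, which is handy for
    comparing the setup before and after a reset. A minimal sketch, assuming
    adb is on the PATH and USB debugging is enabled. "ime list -s" and the
    "default_input_method" secure setting are standard Android shell
    facilities; the helper names (adb_shell, parse_ime_ids,
    print_enabled_imes) are made up for illustration. Note that this shows
    the default keyboard, not which voice IME the mic key hands off to.

```python
import subprocess


def adb_shell(*args: str) -> str:
    """Run `adb shell <args>` and return its stdout (needs a connected device)."""
    result = subprocess.run(
        ["adb", "shell", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_ime_ids(output: str) -> list[tuple[str, str]]:
    """Split IME ids of the form 'package/service' into (package, service) pairs."""
    pairs = []
    for line in output.splitlines():
        line = line.strip()
        if "/" not in line:
            continue  # skip blank or unexpected lines
        package, service = line.split("/", 1)
        pairs.append((package, service))
    return pairs


def print_enabled_imes() -> None:
    """Print every enabled IME, marking the current default keyboard with '*'."""
    enabled = parse_ime_ids(adb_shell("ime", "list", "-s"))
    default = adb_shell("settings", "get", "secure", "default_input_method").strip()
    for package, service in enabled:
        marker = "*" if default.startswith(package + "/") else " "
        print(marker, package, service)
```

    Calling print_enabled_imes() with the phone attached prints one line per
    enabled IME.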

    But when I press the HeliBoard "mic" button, up pops Sayboard as the STT.
    That's because Sayboard (somehow) became the voice IME that the mic
    button triggers.

    When I open Sayboard as an app, it shows:
    English (United States)
    /storage/emulated/0/Android/data/com.elishaazaria.sayboard/files/Models/
    en-US/vosk-model-small-en-us-0.15

    That Vosk model is, IMHO, slow to start and rather inaccurate at times.
    So I went today to https://alphacephei.com/vosk/models

    I downloaded the newer, faster, larger vosk-model-en-us-0.22-lgraph model.
    https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip = 40MB
    https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip = 128MB

    Supposedly the larger "lgraph" type decodes much faster on mid-range
    phones than the standard small models do (mine is a Galaxy A32).
    Name: stt_english_vosk-model-en-us-0.22-lgraph.zip
    Size: 130557655 bytes (124 MiB)
    SHA256: D9838B4AAA82A75C4A17F5ACA300EACA129AAAB2A7CBF951BAFBB500EB9C4334
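
    For anyone who wants to compare their own download against the digest
    above, here is a minimal sketch using only Python's standard library. The
    function names (sha256_of, matches) are illustrative, and the digest
    listed above is simply the one computed for this download, not an
    official checksum from the Vosk site.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Return the uppercase hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches(path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().upper()
```

    Usage would be something like
    matches("stt_english_vosk-model-en-us-0.22-lgraph.zip", "D9838B4A...")
    with the full digest pasted in.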

    I had to look up how to install Vosk models manually.
    <https://github.com/ElishaAz/Sayboard/wiki/Install-a-Vosk-Model-Manually>
    Since Sayboard downloaded the small model, I can reuse its directory.
    In Sayboard, I pressed the "Import model" (folder icon).

    Since the file had a long name, and there were similarly named files in
    my external sd card folders, I had to choose which file manager to select
    it with.

    Good for long file names
    File Manager (blue folder icon)
    Ghost Commander
    Total Commander
    OK for long file names (truncated but scroll)
    File Manager (yellow folder icon)
    Amaze
    Material Files
    Bad for long file names (truncated)
    FX File Chooser (X-plore access was denied)
    My Files
    Confusing I/F
    OI File Manager
    X-Plore

    I also had to first delete the existing model because Sayboard kept saying
    that the model already exists (it neither replaces the old model nor adds
    the new one alongside it).

    Now the model is in:
    /storage/emulated/0/Android/data/com.elishaazaria.sayboard/files/Models/
    en-US/vosk-model-en-us-0.22-lgraph
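
    Before importing, it can help to peek inside the zip to see which folder
    name it will unpack to and where that should land. A rough sketch,
    assuming the archive holds a single top-level folder named after the
    model (true of the zips I looked at, but not guaranteed) and that
    Sayboard keeps models under Models/<locale>/<model> as in the path
    above. The function names are made up for illustration.

```python
import zipfile
from pathlib import PurePosixPath

# Directory taken from the path Sayboard displayed; treat it as an assumption
# for other devices or app versions.
MODELS_ROOT = PurePosixPath(
    "/storage/emulated/0/Android/data/com.elishaazaria.sayboard/files/Models"
)


def top_level_entries(zip_source) -> set[str]:
    """Return the distinct first path components found in a zip archive."""
    with zipfile.ZipFile(zip_source) as zf:
        return {
            PurePosixPath(name).parts[0]
            for name in zf.namelist()
            if name.strip("/")
        }


def expected_model_dir(locale: str, model_name: str) -> PurePosixPath:
    """Build the directory where a model of this name would be expected."""
    return MODELS_ROOT / locale / model_name
```

    If top_level_entries() returns exactly one name, passing it to
    expected_model_dir("en-US", name) gives the folder to look for (or to
    delete first, if Sayboard says the model already exists).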

    But I think it's still too slow and inaccurate, so I'll try different
    models, such as the Sherpa-ONNX models perhaps...
    <https://github.com/k2-fsa/sherpa-onnx/blob/master/android/README.md>
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Maria Sophia@[email protected] to comp.mobile.android on Tue Apr 21 20:25:56 2026
    From Newsgroup: comp.mobile.android

    Maria Sophia wrote:
    But I think it's still too slow and inaccurate, so I'll try different
    models, such as the Sherpa-ONNX models perhaps...
    <https://github.com/k2-fsa/sherpa-onnx/blob/master/android/README.md>

    Drat.

    There is no official, polished, Play-Store-ready Sherpa-ONNX IME yet.

    But there is a working, actively maintained Sherpa-ONNX Voice Input IME
    that we can install manually at <https://github.com/k2-fsa/sherpa-onnx>.

    Supported NPUs
    1. Rockchip NPU (RKNN)
    2. Qualcomm NPU (QNN)
    3. Ascend NPU
    4. Axera NPU
    Mine is a MediaTek Dimensity 720, which isn't on that list, so none of
    those NPU backends apply.

    But it will run on CPU (ARM64)
    Paraformer vs Zipformer?
    A. Paraformer is a non-streaming model
    B. Zipformer is a streaming model

    Which one for a Samsung Galaxy A32-5G?
    a. Speed? Paraformer
    b. Accuracy? Paraformer
    c. Startup time? Paraformer
    d. CPU efficiency? Paraformer
    e. Streaming? Zipformer

    Streaming is what a voice keyboard needs, so I'm going with Zipformer
    models, which means I need to find:
    a. Model Type: Streaming Zipformer
    b. Version: Small or Tiny
    c. Format: Int8 (Quantized) <=== this is de rigueur

    <https://github.com/k2-fsa/sherpa-onnx/releases>
    Latest <https://github.com/k2-fsa/sherpa-onnx/releases/tag/v1.12.39>
    Drat. It's not there even though the list of assets is huge.

    APKs for streaming speech recognition https://k2-fsa.github.io/sherpa/onnx/android/apk.html

    Since I'm on a Samsung A32-5G (which uses an ARM64 processor), the
    candidates are:
    a. sherpa-onnx-x.y.z-arm64-v8a-asr-en-nemo_ctc_80ms.apk
    b. sherpa-onnx-x.y.z-arm64-v8a-asr-en-nemo_ctc_480ms.apk
    c. sherpa-onnx-x.y.z-arm64-v8a-asr-en-nemo_ctc_1040ms.apk
    d. sherpa-onnx-x.y.z-arm64-v8a-asr-en-small_zipformer.apk

    I'll pick (b), the 480 ms version (roughly 1/2-second chunks).
    <https://huggingface.co/csukuangfj2/sherpa-onnx-apk/resolve/main/asr/1.12.36/sherpa-onnx-1.12.36-arm64-v8a-asr-en-nemo_ctc_480ms.apk>
    Name: sherpa-onnx-1.12.36-arm64-v8a-asr-en-nemo_ctc_480ms.apk
    Size: 439248156 bytes (418 MiB)
    SHA256: 67C81FC9067C9BC5B4C0B205C0B808149C0290C20A1291AF6E8784147BE230BE

    Copy that APK from the PC to Android over USB.
    Install using Muntashirakon App Manager. It said "Installing ASR".
    The app icon is named "ASR Next-gen Kaldi".
    The package is "com.k2fsa.sherpa.onnx" "Version 1.12.36 (20260408)".

    During the installation process, a popup asks:
    Allow ASR to record audio? = While app is running
    Then the app popped up with a "Start" button, which starts their test
    sequence.
    I said "Hello I love you won't you tell me your name".
    It wrote "0: Uh i love you donon't you tell me your name".
    Testing it again, I said the one line and it wrote:
    "0: Have i love yont you tell your name"
    Third test:
    "0: hallo i love you donn't you tell name"
    So, um, it's not all that good.
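
    To put a number on "not all that good", the usual yardstick is word error
    rate (WER): the word-level edit distance between what was said and what
    was written, divided by the number of words said. A minimal sketch,
    assuming simple lowercasing and whitespace splitting with no other text
    normalization; the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

    With the "0:" prefix stripped, the three runs above come out to 2, 4 and
    4 word errors against the 10-word reference, i.e. a WER of 20%, 40% and
    40%.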

    Settings > General management > Keyboard list and default.
    Drat. I should see "Sherpa-ONNX" but it's not there.
    Apparently, given it's just a demo app, it doesn't "announce" itself to
    Android as a keyboard or a voice provider.

    Sigh. While there is code for an IME in the Sherpa-ONNX GitHub site, they
    don't always provide a pre-built APK for it in every release.

    So we need to move on to a different but "real" voice IME instead. Sigh. Whisper+ is next.