30 Sept 2025
Short utterances and the hidden cost of handshakes
For 5–15 second notes, network setup time can outweigh everything else. On‑device avoids the detours.
You say “Thanks.” The network says “Hold on.”
Cloud flows typically include a DNS lookup, a TCP handshake, a TLS handshake, and at least one remote hop before any audio is recognized. Each of these steps costs at least one network round trip, so for very short phrases the setup time can dominate. On‑device dictation avoids the detours entirely: your audio stays local, text appears without waiting on a round trip, and there’s nothing to upload.
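To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration, not a measurement: the `Network` class, the function names, the 32 kbps audio bitrate, and the round-trip and bandwidth figures are all hypothetical. It assumes an uncached DNS lookup, a fresh TCP connection, and a full TLS 1.3 handshake (one round trip each), and it ignores server-side recognition time.

```python
from dataclasses import dataclass


@dataclass
class Network:
    """Hypothetical network profile; the values used below are illustrative."""
    name: str
    rtt_ms: float       # assumed round-trip time to the remote service
    uplink_kbps: float  # assumed upload bandwidth in kilobits per second


def setup_ms(net: Network, tls_round_trips: int = 1) -> float:
    """Connection setup before any audio is sent.

    Assumes one round trip for an uncached DNS lookup, one for the TCP
    handshake, and `tls_round_trips` for TLS (1 for a full TLS 1.3
    handshake, 2 for TLS 1.2). Using the same RTT for DNS is a simplification.
    """
    return net.rtt_ms * (1 + 1 + tls_round_trips)


def upload_ms(net: Network, audio_seconds: float, audio_kbps: float = 32.0) -> float:
    """Time to push the compressed audio upstream (bitrate is an assumption)."""
    return audio_seconds * audio_kbps * 1000.0 / net.uplink_kbps


def setup_share(net: Network, audio_seconds: float) -> float:
    """Fraction of the network overhead spent on setup alone.

    Total overhead = setup + upload + one more round trip for the
    request/response carrying the transcript.
    """
    setup = setup_ms(net)
    total = setup + upload_ms(net, audio_seconds) + net.rtt_ms
    return setup / total


if __name__ == "__main__":
    networks = [
        Network("home Wi-Fi", rtt_ms=30, uplink_kbps=10_000),
        Network("congested mobile", rtt_ms=300, uplink_kbps=500),
    ]
    for net in networks:
        for seconds in (1, 10, 60):
            share = setup_share(net, seconds)
            print(f"{net.name:>17} | {seconds:>2}s of audio | "
                  f"setup {setup_ms(net):.0f} ms | {share:.0%} of network overhead")
```

Under these assumptions the setup cost is fixed per connection while the upload cost grows with the length of the audio, so the shorter the utterance, the larger the share taken by handshakes. On‑device, all three terms are zero.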
See the effect in the interactive demo (choose Short and try different networks): /blog/latency-demo