Why offline dictation feels faster

Short phrases flatter the cloud. Long sessions expose it.

This interactive compares the two paths. Plug in your own network conditions and see where the time goes.

Cloud: streaming + LLM rewrite

Total: 4.4 s

  • Handshakes: 240 ms
  • Service overhead: 350 ms
  • Streaming (upload+ASR): ≈3.1 s (implied by the total)
  • LLM rewrite (proxy): 700 ms

On‑device: finalize + LLM rewrite

Total: 3.3 s

  • Finalize last ~30 s: 2.5 s
  • Handshake (rewrite): 120 ms
  • LLM rewrite (BYOK): 700 ms

Assumptions (realistic, simplified)

  • Streaming can overlap upload with ASR; file upload cannot.
  • Handshake per hop ≈ 2×RTT (DNS+TLS+warmups). Cloud path includes ASR hop + proxy hop; BYOK uses a single hop.
  • Cloud ASR speed is adjustable in the interactive (default 200× real‑time). The on‑device path shows only the finalize step for the last ~30 s of audio (≈2.5 s).
  • Same rewrite speed for both paths (≈1200 tok/s); proxy adds only hop latency.
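
The assumptions above can be sketched as a small latency model. This is a rough illustration, not the code behind the interactive: the function names are hypothetical, the 60 ms RTT is chosen only because it reproduces the 120 ms single-hop handshake, the 840-token rewrite is chosen only because it reproduces 700 ms at 1200 tok/s, and modeling "overlap" as max() is an assumption.

```python
def handshake_ms(rtt_ms: float, hops: int) -> float:
    """Connection setup: about 2 x RTT per hop (DNS + TLS + warmups)."""
    return 2.0 * rtt_ms * hops


def asr_ms(audio_s: float, speed_x: float = 200.0) -> float:
    """Cloud ASR time for audio_s seconds of audio at speed_x times real time."""
    return 1000.0 * audio_s / speed_x


def transfer_ms(upload_ms: float, asr: float, streaming: bool) -> float:
    """Upload plus ASR. Assumption: streaming overlaps the two fully (max),
    while a file upload must finish before ASR starts (sum)."""
    return max(upload_ms, asr) if streaming else upload_ms + asr


def rewrite_ms(tokens: int, tok_per_s: float = 1200.0) -> float:
    """LLM rewrite time; the same speed is assumed for proxy and BYOK."""
    return 1000.0 * tokens / tok_per_s


def cloud_total_ms(rtt_ms: float, transfer: float, rewrite: float,
                   overhead_ms: float = 350.0) -> float:
    """Cloud path: two hops (ASR + proxy), service overhead, transfer, rewrite."""
    return handshake_ms(rtt_ms, hops=2) + overhead_ms + transfer + rewrite


def on_device_total_ms(rtt_ms: float, rewrite: float,
                       finalize_ms: float = 2500.0) -> float:
    """On-device path: finalize the last ~30 s, then one BYOK hop for the rewrite."""
    return finalize_ms + handshake_ms(rtt_ms, hops=1) + rewrite


if __name__ == "__main__":
    rtt = 60.0                   # hypothetical RTT in ms
    rewrite = rewrite_ms(840)    # hypothetical token count, gives 700 ms
    print(handshake_ms(rtt, hops=2))          # 240.0, the cloud handshake bar
    print(on_device_total_ms(rtt, rewrite))   # 3320.0, the ≈3.3 s total
    # One minute of audio at 200x real time, with a hypothetical 2 s upload:
    print(transfer_ms(2000.0, asr_ms(60.0), streaming=True))
    print(transfer_ms(2000.0, asr_ms(60.0), streaming=False))
```

Keeping each bar as its own function makes it easy to swap in a measured RTT or a different ASR speed and see which term moves the total.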

Short phrases (5–15 s): handshakes dominate cloud flows. On‑device transcription needs none; only the rewrite makes a single hop.

Long sessions: upload size and proxy hops compound cloud latency. On‑device transcribes live as you speak and only finalizes the last ~30 s when you stop (≈2–3 s on an M1).

With BYOK (bring your own key), rewrites go directly from your Mac to the provider in one hop. Audio stays local.