Performance · Sep 30, 2025 · 1 min read · Freshly reviewed

Long sessions: uploads vs 30‑second windowing

Why streaming on‑device and finalizing only the last ~30 seconds keeps long dictations responsive.

Half an hour of clean audio is not a “quick upload.”

TL;DR

Long sessions punish cloud workflows because upload time scales with audio length.
On-device streaming stays responsive by working in fixed windows.
Finalization is bounded: when you stop, only the last window needs finishing.

Uploading long, high‑quality audio takes time, and that time grows with the length of the recording, especially on variable Wi‑Fi. Many cloud tools avoid heavy compression to protect accuracy, which makes each upload larger.
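To put rough numbers on that scaling, here is a back-of-the-envelope sketch. The audio format (16-bit, 16 kHz mono PCM, about 256 kbps) and the 5 Mbps uplink are assumptions chosen for illustration, not measurements of any particular tool, and the estimate ignores handshakes, retries, and server-side processing.

```python
def upload_seconds(audio_seconds: float, bitrate_kbps: float, uplink_mbps: float) -> float:
    """Estimate transfer time only: recording size in bits divided by uplink speed."""
    size_bits = audio_seconds * bitrate_kbps * 1_000
    return size_bits / (uplink_mbps * 1_000_000)


# Assumption: uncompressed 16-bit, 16 kHz mono PCM -> 16,000 * 16 bits/s = 256 kbps
PCM_16K_MONO_KBPS = 16_000 * 16 / 1_000

# Assumption: a steady 5 Mbps uplink (real Wi-Fi will vary)
for minutes in (1, 10, 30):
    t = upload_seconds(minutes * 60, PCM_16K_MONO_KBPS, uplink_mbps=5.0)
    print(f"{minutes:>2} min of audio -> ~{t:.1f} s to upload")
```

Under these assumptions the transfer time is strictly proportional to the recording length: thirty times the audio means thirty times the wait.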

Voice Type stays on‑device, streams continuously, and when you stop, finishes only the last ~30s window (≈2–3s on an M1). That’s why long sessions feel snappy in practice.
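The bounded-finalization idea can be sketched with a fixed-size rolling buffer. This is a minimal illustration, not Voice Type's actual implementation: `RollingWindow` and `pending_after` are hypothetical names, the audio is silence, and the demo uses a low sample rate only to keep it fast. It models the buffer bound alone, on the assumption that earlier audio was already transcribed while streaming.

```python
from collections import deque


class RollingWindow:
    """Holds at most `window_seconds` of the most recent audio samples."""

    def __init__(self, sample_rate: int, window_seconds: int = 30):
        self.sample_rate = sample_rate
        # A deque with maxlen silently drops the oldest samples once it is full.
        self.samples = deque(maxlen=sample_rate * window_seconds)

    def push(self, chunk) -> None:
        self.samples.extend(chunk)

    def pending_seconds(self) -> float:
        """Seconds of audio still held in the window when recording stops."""
        return len(self.samples) / self.sample_rate


def pending_after(session_seconds: int, sample_rate: int = 1_000) -> float:
    """Stream silence in 1 s chunks and return how much audio is left to finalize."""
    window = RollingWindow(sample_rate)
    one_second = [0] * sample_rate
    for _ in range(session_seconds):
        window.push(one_second)
    return window.pending_seconds()


for minutes in (1, 10, 30):
    print(f"{minutes:>2} min session -> {pending_after(minutes * 60):.0f} s left to finalize")
```

However long the session runs, the amount of audio waiting at the end never exceeds the window length, so the work left when you stop stays roughly constant.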

Explore the difference (choose Medium or Long in the demo): /blog/latency-demo

Performance

Short utterances and the hidden cost of handshakes

For 5–15 second notes, network setup time can outweigh everything else. On‑device avoids the detours.

Product

Best dictation app for Mac in 2026: what actually matters

A practical buyer's guide to Mac dictation in 2026: which tools fit quick notes, full hands-free control, private local workflows, and file transcription.


Dictate into any Mac text field without waiting on uploads.

Related articles