30 Sept 2025
Long sessions: uploads vs 30‑second windowing
Why streaming on‑device and finalizing only the last ~30 seconds keeps long dictations responsive.
Half an hour of clean audio is not a “quick upload.”
Uploading long, high‑quality audio takes time, especially on variable Wi‑Fi. Many cloud tools avoid heavy compression to protect accuracy, so files stay large and the wait grows with the length of the recording.
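To put rough numbers on that, here is a back-of-the-envelope sketch. The figures are assumptions chosen for illustration, not measurements from any particular tool: uncompressed 16‑bit mono PCM at 16 kHz, and a sustained 5 Mbps uplink with no protocol overhead.

```python
def pcm_size_bytes(duration_s, sample_rate_hz=16_000, bytes_per_sample=2, channels=1):
    """Size of uncompressed PCM audio in bytes (assumed format: 16-bit mono, 16 kHz)."""
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)


def upload_seconds(size_bytes, uplink_mbps):
    """Transfer time at a sustained uplink rate, ignoring protocol overhead and retries."""
    return size_bytes * 8 / (uplink_mbps * 1_000_000)


half_hour = pcm_size_bytes(30 * 60)      # 30 minutes of audio
print(half_hour / 1_000_000)             # 57.6 (MB)
print(upload_seconds(half_hour, 5))      # 92.16 (seconds at an assumed 5 Mbps)
```

Under these assumptions, half an hour of audio is about 58 MB and takes roughly a minute and a half to upload before any transcription starts. A slower or flakier connection stretches that further, and the cost scales linearly with recording length.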
Voice Type stays on‑device and streams continuously. When you stop, it only has to finish the last ~30 s window (≈2–3 s on an M1), so the wait at the end stays roughly constant no matter how long the session ran. That’s why long sessions feel snappy in practice.
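The windowing idea can be sketched with a bounded buffer. This is a minimal illustration, not Voice Type's actual implementation: the `TailWindow` class, its method names, and the 16 kHz default are hypothetical. The point it shows is that the amount of audio left to finalize is capped by the window size, however long the session runs.

```python
from collections import deque


class TailWindow:
    """Hypothetical helper: keeps only the most recent `window_s` seconds of samples."""

    def __init__(self, sample_rate_hz=16_000, window_s=30):
        self.capacity = sample_rate_hz * window_s
        # A deque with maxlen silently drops the oldest samples once it is full.
        self._buf = deque(maxlen=self.capacity)
        self.total_samples = 0

    def push(self, chunk):
        """Append a chunk of samples as it arrives from the microphone."""
        self._buf.extend(chunk)
        self.total_samples += len(chunk)

    def pending(self):
        """Samples still to be finalized when the user stops: at most one window."""
        return list(self._buf)


# Simulate a long session with a tiny sample rate so the numbers stay readable.
window = TailWindow(sample_rate_hz=10, window_s=30)   # capacity: 300 samples
for start in range(0, 1000, 10):
    window.push(list(range(start, start + 10)))

print(window.total_samples)    # 1000 samples streamed in total
print(len(window.pending()))   # 300 samples left to finalize
```

In this sketch the session streamed 1000 samples, but only the final 300 remain to be processed at stop time. Doubling the session length would double `total_samples` and leave `len(window.pending())` unchanged.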
Explore the difference (choose Medium or Long in the demo): /blog/latency-demo
Related
Normalized loudness and gentle filtering help the recognizer hear what you meant, not the room.
What actually makes dictation feel fast and accurate? Real-world trade‑offs between file uploads and on‑device streaming, plus an interactive demo.