Interactive demo

Where dictation latency actually comes from

This page is a model, not a fake benchmark chart. It breaks the delay into understandable parts so you can see why cloud dictation often feels fine for a one-liner and lousy once it becomes a habit.

The useful question is not whether cloud speech recognition is “fast.” It is where the time goes after you stop speaking. A local workflow can remove upload delay and server round-trips. A cloud workflow cannot.

Demo settings: LLM rewrite on (BYOK cleanup) · 60 ms RTT · 5 Mbps up

Cloud dictation (upload audio → cloud STT): ~14 s
  • Handshake: 120 ms
  • Audio upload: 7.7 s
  • Transcribe: 6.0 s
  • Overhead: 300 ms

Voice Type (local STT → LLM rewrite): ~2.8 s
  • Final transcript: 1.2 s
  • LLM handshake: 120 ms
  • LLM rewrite: 1.5 s

Audio stays on your Mac. Only text goes to your LLM.

Result: Voice Type is ~11 s faster (about 80% less wait).

Calculation details

  • Cloud audio: 128 kbps; cloud STT at ~50× real-time
  • On-device: ~1.2 s to finalize the last chunk on an M1 Pro
  • LLM rewrite: Cerebras at ~1000 tokens/sec
  • Speech rate: ~225 wpm (a fast talker)
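The displayed bars follow directly from these bullets. A quick back-of-envelope check in Python (the ~5-minute dictation length is an inference that makes the numbers consistent; it is not stated on the page, and the ~1.33 tokens-per-word ratio is a common rule of thumb, not a figure from the demo):

```python
# Back-of-envelope check of the demo's headline numbers.
# Assumed scenario: ~5 minutes of continuous dictation.

speech_seconds = 300     # ~5 min of speech (inferred)
audio_kbps = 128         # cloud audio bitrate
upload_mbps = 5          # upload bandwidth from the demo settings
stt_speedup = 50         # cloud STT at ~50x real-time
wpm = 225                # fast talker
llm_tps = 1000           # Cerebras-class token throughput

# Cloud path: handshake + upload + transcription + service overhead
upload_s = speech_seconds * audio_kbps / (upload_mbps * 1000)
transcribe_s = speech_seconds / stt_speedup
cloud_total = 0.120 + upload_s + transcribe_s + 0.300

# Local path: on-device finalize + LLM handshake + rewrite
tokens = (wpm / 60) * speech_seconds * (4 / 3)  # ~1.33 tokens/word
local_total = 1.2 + 0.120 + tokens / llm_tps

print(f"cloud: {cloud_total:.1f}s, local: {local_total:.1f}s")
```

With these inputs the cloud path works out to ≈14.1 s and the local path to ≈2.8 s, matching the bars above.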

How to read the demo

  • The cloud path includes network handshake, upload time, recognition time, and service overhead.
  • The local path keeps speech recognition on-device and only adds optional extra time if you ask for an external text rewrite.
  • The point is not to produce one magic number. It is to show which pieces of the path disappear when audio does not have to leave the machine.

What the model assumes

The interactive uses simple, explicit assumptions: network round-trip time, upload bandwidth, a rough cloud transcription speed, and a local finalization step for the last chunk of speech. That makes it honest. Real providers differ, but the structure of the delay stays the same.

This mirrors how streaming speech systems are typically documented: audio is chunked, sent over a live stream, processed incrementally, and finalized after speech ends. Larger network delays and slower uploads still have to be paid for somewhere.
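That delay structure can be sketched as one small parameterized function. Everything here is illustrative: the function name, the defaults, and the "handshake ≈ 2×RTT" estimate are assumptions chosen to mirror the demo's settings, not a real provider's behavior:

```python
def cloud_wait(speech_s, rtt_ms, upload_mbps,
               audio_kbps=128, stt_speedup=50, overhead_s=0.3):
    """Cloud-path wait after speech ends, using the demo's structure."""
    handshake = 2 * rtt_ms / 1000                    # rough: two round-trips
    upload = speech_s * audio_kbps / (upload_mbps * 1000)
    transcribe = speech_s / stt_speedup
    return handshake + upload + transcribe + overhead_s

# Same 30-second note on fast fiber vs. bad hotel Wi-Fi:
print(cloud_wait(30, rtt_ms=20, upload_mbps=100))    # fiber: about 1 s
print(cloud_wait(30, rtt_ms=300, upload_mbps=0.5))   # hotel: about 9 s
```

Varying only the network terms shows the point of the model: transcription speed is identical in both calls, yet the wait grows roughly ninefold once upload and round-trip costs dominate.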

Why this matters in actual use

  • For a one-line note on fast fiber, cloud latency can feel acceptable.
  • For repeated short dictation bursts during normal work, every extra second compounds into friction.
  • On bad hotel Wi-Fi, on trains, or fully offline, the difference stops being about speed and becomes about whether the workflow works at all.

Voice Type keeps the audio path local. Optional rewrites can still use your own LLM, but the speech recognition step does not have to wait on the network.

Try the free 7-day trial