Technology

Voice Type delivers fast, accurate macOS dictation entirely on‑device. Our pipeline focuses on the two factors that determine real‑world dictation quality: speed and accuracy.

On‑device pipeline

  • Robust VAD: RNNoise‑based voice activity detection continuously finds speech and ignores background noise.
  • Segmentation: audio is captured continuously in the background and split at sentence and phrase boundaries, with padding around each segment and merging of closely spaced segments.
  • Audio normalization: loudness is normalized to a −14 LUFS target (K‑weighted), then a 50 Hz high‑pass filter (2nd‑order Butterworth) reduces low‑frequency rumble.
  • Resampling: 48 kHz → 16 kHz via band‑limited conversion, matching the 16 kHz input the model expects.
  • Batching: transcribes in ~30‑second windows, preserving context across windows.
  • Acceleration: Core ML + Metal optimizations keep inference snappy and power‑efficient.
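The VAD and segmentation bullets above can be sketched as follows. This is a minimal illustration, not Voice Type's implementation: it assumes RNNoise has already produced one voice probability per 10 ms frame (RNNoise's frame size at 48 kHz), and the hysteresis thresholds, minimum speech length, 200 ms padding, 300 ms merge gap, and the names `Segment`, `detect_segments`, and `pad_and_merge` are all hypothetical. Sentence‑level boundary detection is not modeled.

```python
# Hypothetical sketch: thresholds and durations are illustrative assumptions.
from dataclasses import dataclass

FRAME_MS = 10  # RNNoise processes 10 ms frames (480 samples at 48 kHz)

@dataclass
class Segment:
    start_ms: int
    end_ms: int  # exclusive

def detect_segments(probs, on=0.6, off=0.35, min_speech_ms=100):
    """Turn per-frame voice probabilities into raw speech segments.

    Hysteresis: a segment opens when p >= on and closes only when p < off,
    so a brief dip in the middle of a word does not split it.
    """
    segments, start = [], None
    for i, p in enumerate(probs):
        if start is None and p >= on:
            start = i
        elif start is not None and p < off:
            segments.append(Segment(start * FRAME_MS, i * FRAME_MS))
            start = None
    if start is not None:  # speech still active at the end of the buffer
        segments.append(Segment(start * FRAME_MS, len(probs) * FRAME_MS))
    # Drop blips too short to be speech.
    return [s for s in segments if s.end_ms - s.start_ms >= min_speech_ms]

def pad_and_merge(segments, total_ms, pad_ms=200, max_gap_ms=300):
    """Pad each segment on both sides, then merge neighbours separated by short gaps.

    Assumes segments are sorted by start time and non-overlapping.
    """
    merged = []
    for s in segments:
        start = max(0, s.start_ms - pad_ms)
        end = min(total_ms, s.end_ms + pad_ms)
        if merged and start - merged[-1].end_ms <= max_gap_ms:
            merged[-1].end_ms = max(merged[-1].end_ms, end)
        else:
            merged.append(Segment(start, end))
    return merged
```

Padding keeps word onsets and trailing consonants that the VAD may clip, and merging keeps a short pause from splitting one phrase into two recognition units.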

Why it feels faster

  • No uploads: audio never leaves your Mac—no network delay, no large file transfers, no degraded results from compressed audio.
  • 30‑second batching: when you stop dictation, only the current ~30‑second block remains to be transcribed; on an M1 Mac this finalization typically takes about 2–3 seconds.
  • Energy‑aware: batched on‑device inference avoids the continuous streaming and repeated cloud round‑trips of server‑based dictation.
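The 30‑second batching behavior can be sketched as a small packer that hands off each window as soon as the next segment would overflow it, so stopping leaves only one partially filled window. This is a minimal sketch, assuming speech segments arrive as `(start_ms, end_ms)` pairs; `WindowBatcher` and its methods are hypothetical names, not Voice Type's code, and how transcript context is carried from one window to the next is not shown.

```python
# Hypothetical sketch of ~30-second windowing; not Voice Type's implementation.
class WindowBatcher:
    """Packs speech segments into windows of at most window_ms of audio.

    Times are in milliseconds. A window is handed off as soon as the next
    segment would not fit, so only the last, partially filled window is
    pending when dictation stops.
    """

    def __init__(self, window_ms=30_000):
        self.window_ms = window_ms
        self.current = []    # (start_ms, end_ms) pairs in the open window
        self.current_ms = 0  # audio already queued in the open window

    def add(self, start_ms, end_ms):
        """Queue one speech segment; return any windows that became full."""
        ready = []
        # A segment longer than one window is split into window-sized pieces.
        while end_ms - start_ms > self.window_ms:
            ready += self._push(start_ms, start_ms + self.window_ms)
            start_ms += self.window_ms
        ready += self._push(start_ms, end_ms)
        return ready

    def stop(self):
        """Return the open window (possibly empty) for final transcription."""
        last, self.current, self.current_ms = self.current, [], 0
        return last

    def _push(self, start_ms, end_ms):
        ready = []
        if self.current and self.current_ms + (end_ms - start_ms) > self.window_ms:
            ready.append(self.current)  # close the window before it overflows
            self.current, self.current_ms = [], 0
        self.current.append((start_ms, end_ms))
        self.current_ms += end_ms - start_ms
        return ready
```

Full windows can be transcribed while you are still speaking; the design choice is that `stop()` never returns more than one window's worth of audio, which is what bounds the wait after you stop.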

Why it stays accurate

  • Input conditioning: LUFS normalization and filtering bring the input closer to the kind of audio Whisper was trained on (a mix of read speech and podcast‑like sources).
  • Noise resilience: RNNoise helps suppress ambient noise without over‑smoothing speech.
  • No forced “prompt fixes”: we avoid heavy LLM prompting to coerce spellings, which can reduce overall transcript fidelity. We improve the signal before recognition instead.
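The conditioning chain from the pipeline (−14 LUFS normalization, 50 Hz high‑pass, 48 kHz → 16 kHz resampling) can be illustrated in plain Python. This is a hedged sketch, not the app's code: the K‑weighting coefficients are the ITU‑R BS.1770 values for 48 kHz, but loudness here is measured without BS.1770's gating; the high‑pass uses the RBJ‑cookbook bilinear design with Q = 1/√2; and the 127‑tap Blackman‑windowed sinc with a 7 kHz cutoff is an assumed stand‑in for whatever band‑limited resampler the app actually uses. All function names are hypothetical.

```python
# Hypothetical sketch of the conditioning chain; not Voice Type's implementation.
import math

def biquad(x, b, a):
    """Filter list x with one biquad section (Direct Form I); a[0] must be 1."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

# ITU-R BS.1770 K-weighting at fs = 48 kHz: stage 1 (high shelf), stage 2 (RLB high-pass).
# These coefficients are only valid at 48 kHz, so loudness is measured before resampling.
K_SHELF = ([1.53512485958697, -2.69169618940638, 1.19839281085285],
           [1.0, -1.69065929318241, 0.73248077421585])
K_RLB = ([1.0, -2.0, 1.0],
         [1.0, -1.99004745483398, 0.99007225036621])

def loudness_lufs(x):
    """Ungated mono loudness in LUFS (simplified: BS.1770 integrated loudness also gates)."""
    k = biquad(biquad(x, *K_SHELF), *K_RLB)
    mean_square = sum(v * v for v in k) / len(k)
    return -0.691 + 10 * math.log10(mean_square)

def normalize(x, target_lufs=-14.0):
    """Apply one static gain so the measured loudness equals target_lufs."""
    gain = 10 ** ((target_lufs - loudness_lufs(x)) / 20)
    return [v * gain for v in x]

def butter_highpass(fc, fs):
    """2nd-order Butterworth high-pass via the bilinear transform (RBJ cookbook, Q = 1/sqrt(2))."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * (1 / math.sqrt(2)))  # sin(w0) / (2Q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a

def decimate_by_3(x, taps=127, cutoff_hz=7000.0, fs=48000.0):
    """Band-limited 48 kHz -> 16 kHz: windowed-sinc low-pass, then keep every 3rd sample."""
    fc = cutoff_hz / fs  # cutoff in cycles/sample, below the new 8 kHz Nyquist
    mid = taps // 2
    h = []
    for n in range(taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        blackman = (0.42 - 0.5 * math.cos(2 * math.pi * n / (taps - 1))
                    + 0.08 * math.cos(4 * math.pi * n / (taps - 1)))
        h.append(ideal * blackman)
    total = sum(h)
    h = [v / total for v in h]  # unity gain at DC
    out = []
    for i in range(0, len(x), 3):  # only compute the samples we keep
        acc = 0.0
        for n, hn in enumerate(h):
            j = i + mid - n  # filter centred on sample i, so no added delay
            if 0 <= j < len(x):
                acc += hn * x[j]
        out.append(acc)
    return out

def condition(x48):
    """Normalize to -14 LUFS, high-pass at 50 Hz, then resample 48 kHz -> 16 kHz."""
    y = normalize(x48, -14.0)
    y = biquad(y, *butter_highpass(50.0, 48000.0))
    return decimate_by_3(y)
```

Low‑passing before discarding samples is what makes the conversion band‑limited: content above the new 8 kHz Nyquist would otherwise fold back into the speech band. A production resampler would typically use a polyphase library rather than this direct loop.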

Privacy and reviews

  • App Store sandboxed: distributed via the Mac App Store; Apple sandboxing applies.
  • Authentic reviews: ratings and reviews come straight from the App Store, unfiltered and including critical ones.
  • Network calls: none for dictation itself. The only network traffic is receipt checks and, if you enable them, optional “bring‑your‑own‑key” rewrite calls.

Optional instant rewrite

If you enable bring‑your‑own‑key (BYOK) rewriting, Voice Type can hand the finished transcript to a fast provider, using your own API key, for rewriting or formatting. With a high token‑throughput provider, the end‑to‑end dictation → rewrite flow is very quick, and your audio still never leaves your Mac.