Technology
Voice Type delivers fast, accurate macOS dictation entirely on‑device. Our pipeline focuses on the two things that determine real‑world dictation quality: speed and accuracy.
On‑device pipeline
- Robust VAD: RNNoise‑based voice activity detection continuously finds speech and ignores background noise.
- Segmentation: smart sentence/phrase boundaries with padding and merging; continuous background capture.
- Audio normalization: target −14 LUFS with K‑weighting; then high‑pass at 50 Hz (2nd‑order Butterworth) to reduce low‑frequency rumble.
- Resampling: 48 kHz → 16 kHz via band‑limited conversion to match model expectations.
- Batching: transcribes in ~30‑second windows, preserving context across windows.
- Acceleration: Core ML + Metal optimizations keep inference snappy and power‑efficient.
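The filtering and resampling steps above can be sketched in a few lines of Python. This is a minimal illustration, not Voice Type's actual implementation. The function names are hypothetical. The high‑pass uses the standard bilinear‑transform biquad form with Q = 1/√2, which gives a 2nd‑order Butterworth response. The 48 kHz → 16 kHz step is shown as a Hamming‑windowed sinc low‑pass followed by keeping every third sample, and the 7 kHz cutoff and 63 taps are assumptions chosen for the example.

```python
import math

def butterworth_highpass_coeffs(cutoff_hz, sample_rate_hz):
    """Biquad coefficients for a 2nd-order Butterworth high-pass (Q = 1/sqrt(2))."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    q = 1.0 / math.sqrt(2.0)
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha  # all coefficients are normalized by a0
    b = [(1.0 + cos_w0) / 2.0 / a0, -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def biquad_filter(samples, b, a):
    """Run a direct-form I biquad over a list of float samples."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=63):
    """Hamming-windowed sinc low-pass taps, scaled to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        sinc = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [tap / total for tap in taps]

def decimate_by_3(samples, taps):
    """Low-pass filter, then keep every third sample (48 kHz -> 16 kHz).

    Only the output samples that are kept are computed; samples beyond
    the ends of the input are treated as zero.
    """
    half = len(taps) // 2
    out = []
    for center in range(0, len(samples), 3):
        acc = 0.0
        for k, tap in enumerate(taps):
            i = center + k - half
            if 0 <= i < len(samples):
                acc += tap * samples[i]
        out.append(acc)
    return out
```

With these helpers, one second of 48 kHz audio (48,000 samples) passed through `biquad_filter` and then `decimate_by_3` comes out as 16,000 samples. The low‑pass ahead of the decimation is what makes the conversion band‑limited: content above the new 8 kHz Nyquist limit is attenuated before samples are dropped, so it does not alias into the speech band.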
Why it feels faster
- No uploads: audio never leaves your Mac, so there is no network delay, no large file transfer, and no degraded result from compressed audio.
- 30‑second batching: when you stop dictating, only the current ~30‑second block is left to transcribe; on an M1 this finalization typically completes in about 2–3 seconds.
- Energy‑aware: batching and on‑device inference avoid long, continuous cloud round‑trips.
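To make the batching point concrete, here is a small sketch of the arithmetic. It assumes fixed 30‑second windows at 16 kHz and that earlier windows were already transcribed in the background while you were speaking. The function names are hypothetical, and the real pipeline segments at phrase boundaries and carries context across windows, neither of which is modeled here.

```python
SAMPLE_RATE_HZ = 16_000  # rate after resampling, per the pipeline description
WINDOW_SECONDS = 30      # approximate window length, per the text

def split_into_windows(total_samples, window_samples=SAMPLE_RATE_HZ * WINDOW_SECONDS):
    """Return (start, end) sample ranges that cover the recording in order."""
    return [
        (start, min(start + window_samples, total_samples))
        for start in range(0, total_samples, window_samples)
    ]

def pending_seconds_at_stop(total_samples):
    """Length in seconds of the last window, the one still open at stop."""
    windows = split_into_windows(total_samples)
    if not windows:
        return 0.0
    start, end = windows[-1]
    return (end - start) / SAMPLE_RATE_HZ
```

For a 70‑second dictation this yields three windows, and only the last 10 seconds remain to be transcribed when you stop. Under these assumptions the wait after stopping is bounded by one window, regardless of how long you have been dictating.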
Why it stays accurate
- Input conditioning: LUFS normalization and filtering bring the input closer to the kind of audio common in Whisper training data (a mix of read speech and podcast‑like sources).
- Noise resilience: RNNoise helps suppress ambient noise without over‑smoothing speech.
- No forced “prompt fixes”: we avoid heavy LLM prompting to coerce spellings, which can reduce overall transcript fidelity. We improve the signal before recognition instead.
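As a worked example of the loudness step, the gain needed to reach the −14 LUFS target follows directly from the measured loudness. The sketch below assumes the measurement comes from a separate K‑weighted loudness meter, which is not shown. The function name is hypothetical, and this is an illustration of the arithmetic, not Voice Type's code.

```python
TARGET_LUFS = -14.0  # normalization target stated in the pipeline description

def normalization_gain(measured_lufs, target_lufs=TARGET_LUFS):
    """Return (gain in dB, linear scale factor) that moves audio to the target."""
    gain_db = target_lufs - measured_lufs
    scale = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude
    return gain_db, scale

def apply_gain(samples, scale):
    """Multiply every sample by the linear scale factor."""
    return [s * scale for s in samples]
```

A quiet recording measured at −23 LUFS needs +9 dB, a linear scale factor of about 2.82, while a recording already at −14 LUFS is left unchanged with a factor of 1.0.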
Privacy and reviews
- App Store sandboxed: distributed via the Mac App Store; Apple sandboxing applies.
- Authentic reviews: ratings/reviews are from the App Store—unfiltered, including critical ones.
- Network calls: none during dictation, except receipt checks and optional “bring‑your‑own‑key” rewrite calls if you enable them.
Optional instant rewrite
If you enable bring‑your‑own‑key (BYOK) rewriting, Voice Type can hand off the transcript to a fast provider for rewriting or formatting. With a high token‑throughput provider, the end‑to‑end dictation → rewrite flow is exceptionally quick, and your audio stays local.