On-device pipeline
- Robust voice activity detection: RNNoise-based VAD continuously tracks speech and suppresses background noise.
- Segmentation tuned for conversation: Phrase boundaries are padded and nearby segments merged, so context carries over between utterances.
- Audio conditioning: Loudness is normalised to −14 LUFS (K-weighted), then a second-order Butterworth high-pass filter at 50 Hz removes low-frequency rumble.
- Band-limited resampling: 48 kHz audio is converted to 16 kHz, the input rate that Whisper-derived models expect.
- Context-preserving batching: Dictation is processed in blocks of roughly 30 seconds so longer thoughts stay coherent.
- Core ML + Metal acceleration: Optimised operators keep inference responsive while remaining energy-efficient.
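To make the conditioning and resampling steps concrete, here is a minimal Python sketch. It is not Voice Type's actual implementation. It assumes the standard bilinear-transform biquad form of a second-order Butterworth high-pass (Q = 1/√2) and a Hamming-windowed sinc low-pass ahead of 3:1 decimation. The function names, the 95-tap filter length, and the 7.1 kHz anti-alias cutoff are illustrative assumptions; loudness normalisation and RNNoise are left out.

```python
import math


def butterworth_highpass_coeffs(cutoff_hz, sample_rate_hz):
    """Second-order Butterworth high-pass as a biquad (Q = 1/sqrt(2))."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * (1.0 / math.sqrt(2.0)))
    a0 = 1.0 + alpha
    # Coefficients are normalised so that a[0] == 1.
    b = [(1.0 + cos_w0) / 2.0 / a0, -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(samples, b, a):
    """Run a direct-form I biquad over a sequence of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=95):
    """Hamming-windowed sinc low-pass, scaled to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [tap / total for tap in taps]


def decimate_by_3(samples, taps):
    """Low-pass filter, then keep every third sample (48 kHz -> 16 kHz)."""
    mid = (len(taps) - 1) // 2
    n = len(samples)
    out = []
    for centre in range(0, n, 3):  # only compute the samples we keep
        acc = 0.0
        for k, tap in enumerate(taps):
            i = centre + mid - k
            if 0 <= i < n:  # treat audio outside the buffer as silence
                acc += tap * samples[i]
        out.append(acc)
    return out
```

Under these assumptions, a 48 kHz buffer `audio_48k` would be conditioned with `b, a = butterworth_highpass_coeffs(50.0, 48_000)` and then reduced with `decimate_by_3(biquad(audio_48k, b, a), lowpass_taps(7_100.0, 48_000))`. Filtering before dropping samples is what makes the resampling band-limited: content above 8 kHz is attenuated rather than folded back into the speech band.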
Why it feels faster
- No uploads: Audio never leaves your Mac, eliminating network latency and compression artefacts.
- Quick finalisation: When you release the hotkey, the current batch finalises within 2–3 seconds on an M-series Mac.
- Energy-aware scheduling: On-device inference avoids long-running cloud calls, keeping laptops cooler and quieter.
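A rough picture of block batching and quick finalisation, as a sketch rather than the app's real code: audio is collected into fixed-size blocks, full blocks are handed on as they fill, and releasing the hotkey flushes whatever partial block remains. The class name `BlockBatcher` and its methods are hypothetical; the 16 kHz rate and 30 second block length are taken from the pipeline description.

```python
class BlockBatcher:
    """Collects audio samples into fixed-length blocks (hypothetical helper)."""

    def __init__(self, sample_rate_hz=16_000, block_seconds=30.0):
        self.block_size = int(sample_rate_hz * block_seconds)
        self._pending = []

    def push(self, samples):
        """Add new samples and return any blocks that are now complete."""
        self._pending.extend(samples)
        blocks = []
        while len(self._pending) >= self.block_size:
            blocks.append(self._pending[:self.block_size])
            self._pending = self._pending[self.block_size:]
        return blocks

    def finalise(self):
        """Hotkey released: return the partial block, or None if empty."""
        tail, self._pending = self._pending, []
        return tail or None
```

In this sketch only the final, partial block is still waiting when the hotkey is released, which fits finalisation taking a few seconds rather than scaling with the length of the whole dictation.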
Why it stays accurate
- Signal-first improvements: We improve audio quality before recognition instead of patching transcripts afterwards with heavy prompt engineering.
- Noise resilience: RNNoise reduces ambient noise without smearing consonants and sibilants.
- Domain vocabulary: Custom word lists feed directly into the recogniser, keeping technical jargon intact.
Privacy and reviews
- Mac App Store distribution: Voice Type passes Apple's App Review and runs with the App Sandbox enabled.
- Unfiltered reviews: App Store reviews are shown as they are, including critical feedback.
- Minimal network calls: The app contacts only Apple, for receipt checks, and your chosen rewrite provider if you enable that option.
Optional instant rewrite
Enable bring-your-own-key rewriting to hand transcripts to a fast LLM for formatting, summarising, or drafting. Because dictation itself stays local, you keep full control over where transcripts are sent and how long they are retained.
