Voice Type and Voice Ink are both Mac dictation apps built on Whisper-based models running locally, so they share the same foundational speech recognition technology. They differ in how each app conditions audio before recognition and in the dictation workflow built around it.
Short answer
- Pick Voice Type if you want hold-to-dictate hotkeys, RNNoise audio conditioning, and a streamlined dictation-first workflow.
- Pick Voice Ink if you prefer its specific UI approach or workflow style.
At a glance
Finalization speed
Voice Type finalizes in under 2 seconds regardless of dictation length. The streaming architecture processes chunks as you speak, so only the last segment needs finalizing.
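The chunked approach described above can be sketched roughly as follows. This is a minimal illustration, not Voice Type's actual implementation: the `StreamingDictation` class, the `recognize` callback, and `chunk_size` are all hypothetical names. It only shows why the work left at finalization is bounded by one chunk rather than by the length of the dictation.

```python
from typing import Callable, List


class StreamingDictation:
    """Transcribes full chunks as audio arrives; only the tail waits for finalize()."""

    def __init__(self, recognize: Callable[[List[float]], str], chunk_size: int):
        self.recognize = recognize      # hypothetical stand-in for the speech model
        self.chunk_size = chunk_size    # samples per chunk (assumed fixed here)
        self.pending: List[float] = []  # audio received but not yet transcribed
        self.parts: List[str] = []      # text for chunks already transcribed

    def feed(self, samples: List[float]) -> None:
        """Buffer incoming audio and transcribe every complete chunk immediately."""
        self.pending.extend(samples)
        while len(self.pending) >= self.chunk_size:
            chunk = self.pending[:self.chunk_size]
            self.pending = self.pending[self.chunk_size:]
            self.parts.append(self.recognize(chunk))

    def finalize(self) -> str:
        """Transcribe only the leftover tail, then return the joined text."""
        if self.pending:
            self.parts.append(self.recognize(self.pending))
            self.pending = []
        return " ".join(self.parts)
```

However long the dictation runs, `finalize()` makes at most one call to `recognize`, on fewer than `chunk_size` samples.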
Beam search accuracy
Voice Type uses beam search decoding rather than greedy decoding, exploring multiple transcriptions simultaneously for higher accuracy on complex phrases.
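To make the difference concrete, here is a toy sketch of beam search over a hand-written probability table. The `toy_model` function and its numbers are invented for illustration and have nothing to do with a real Whisper decoder. Setting `beam_width=1` is equivalent to greedy decoding.

```python
import math
from typing import Callable, Dict, Tuple

Prefix = Tuple[str, ...]
Scorer = Callable[[Prefix], Dict[str, float]]


def beam_search(next_probs: Scorer, steps: int, beam_width: int) -> Tuple[Prefix, float]:
    """Return the best (tokens, log-probability) pair found after `steps` tokens."""
    beams = [((), 0.0)]  # start with one empty hypothesis
    for _ in range(steps):
        candidates = []
        for prefix, logp in beams:
            for token, p in next_probs(prefix).items():
                candidates.append((prefix + (token,), logp + math.log(p)))
        # Keep only the `beam_width` most probable hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


def toy_model(prefix: Prefix) -> Dict[str, float]:
    """Invented next-token probabilities for a two-word phrase."""
    if prefix == ():
        return {"wreck": 0.6, "recognize": 0.4}
    if prefix == ("wreck",):
        return {"a": 0.55, "the": 0.45}
    if prefix == ("recognize",):
        return {"speech": 0.9, "peach": 0.1}
    return {"<end>": 1.0}


greedy_tokens, _ = beam_search(toy_model, steps=2, beam_width=1)
beam_tokens, _ = beam_search(toy_model, steps=2, beam_width=2)
```

In this toy table, greedy decoding commits to "wreck" because it is the most likely first word, and ends with a sequence of probability 0.33. A beam of width 2 keeps "recognize" alive and finds "recognize speech" at 0.36.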
Punctuation handling
Voice Type has near-parity with Dragon Dictate for spoken punctuation: say 'period', 'comma', or 'new paragraph' naturally. Professional users expect this feature.
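As a rough idea of what spoken punctuation involves, the sketch below rewrites a few spoken commands in a transcript. The command table and the `apply_spoken_punctuation` function are hypothetical, and this is not how Voice Type or Dragon Dictate actually do it. A real system would also need to tell a command from the literal word (as in "a period of time") and handle capitalization, which this sketch does not attempt.

```python
import re

# Hypothetical command table; a real product would support many more commands.
SPOKEN_COMMANDS = {
    "new paragraph": "\n\n",
    "question mark": "?",
    "period": ".",
    "comma": ",",
}

# Longest phrases first so multi-word commands are tried before shorter ones.
_COMMAND_PATTERN = re.compile(
    r"\s*\b("
    + "|".join(re.escape(k) for k in sorted(SPOKEN_COMMANDS, key=len, reverse=True))
    + r")\b\s*",
    re.IGNORECASE,
)


def apply_spoken_punctuation(text: str) -> str:
    """Replace spoken commands with symbols (naive: every occurrence is a command)."""

    def replace(match: "re.Match[str]") -> str:
        symbol = SPOKEN_COMMANDS[match.group(1).lower()]
        # Punctuation attaches to the previous word and is followed by a space.
        return symbol if symbol.isspace() else symbol + " "

    result = _COMMAND_PATTERN.sub(replace, text)
    result = re.sub(r" +\n", "\n", result)  # drop spaces left before paragraph breaks
    return result.strip()
```

For example, `apply_spoken_punctuation("is this working question mark")` gives `"is this working?"`.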
Custom vocabulary
Voice Type supports prompt conditioning for custom words following Whisper best practices. Technical terms and product names transcribe correctly.
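Prompt conditioning generally means handing the model a short piece of text containing the words you expect, so that it prefers those spellings. The sketch below builds such a prompt from a term list. The `build_vocabulary_prompt` helper, the "Glossary:" wording, and the `max_chars` budget are assumptions for illustration (Whisper's real limit is counted in tokens, not characters), and whether Voice Type builds its prompt this way is not stated in the source.

```python
from typing import Iterable


def build_vocabulary_prompt(terms: Iterable[str], max_chars: int = 200) -> str:
    """Join unique terms into a short glossary-style prompt within a size budget."""
    prefix = "Glossary: "
    seen = set()
    kept = []
    length = len(prefix)
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        added = len(term) + (2 if kept else 0)  # account for the ", " separator
        if length + added > max_chars:
            break  # hypothetical budget reached; later terms are dropped
        seen.add(key)
        kept.append(term)
        length += added
    return prefix + ", ".join(kept)


prompt = build_vocabulary_prompt(["Kubernetes", "RNNoise", "Core ML"])
# With the open-source openai-whisper Python package, a prompt like this can be
# passed as: model.transcribe("audio.wav", initial_prompt=prompt)
```

Putting the most important terms first matters in this sketch, since anything past the budget is dropped.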
Audio preprocessing
Voice Type applies RNNoise noise suppression, LUFS normalization, and silence trimming before recognition. Cleaner input produces more accurate output.
Audio preprocessing: the key difference
Voice Type applies multiple preprocessing steps before audio reaches the Whisper model:
- RNNoise: A recurrent neural network trained specifically for speech noise suppression, developed by Xiph.org (the team behind the Opus codec). It removes keyboard clicks, air conditioning, and ambient room noise.
- LUFS normalization: Audio is normalized to a target loudness measured in LUFS (Loudness Units relative to Full Scale), keeping input levels consistent regardless of microphone distance or voice volume.
- Silence trimming: Dead air is removed before processing, which reduces unnecessary computation and leaves the model less empty audio to misinterpret.
The result: cleaner input produces more accurate output, especially when dictating in cafes, open offices, or with background conversations. This preprocessing pipeline is the core differentiator in Voice Type's approach.
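Two of the steps above, silence trimming and level normalization, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Voice Type's pipeline: it uses plain RMS level in dBFS as a stand-in for LUFS (true LUFS measurement adds K-weighting and gating per ITU-R BS.1770), it omits RNNoise entirely, and the function names, frame size, threshold, and target level are hypothetical.

```python
import math
from typing import List


def rms_dbfs(samples: List[float]) -> float:
    """RMS level in dB relative to full scale (samples assumed in [-1.0, 1.0])."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def trim_silence(samples: List[float], frame: int = 160,
                 threshold_db: float = -50.0) -> List[float]:
    """Drop leading and trailing frames whose level is at or below the threshold."""
    frames = [samples[i:i + frame] for i in range(0, len(samples), frame)]
    voiced = [i for i, f in enumerate(frames) if rms_dbfs(f) > threshold_db]
    if not voiced:
        return []
    start, end = voiced[0] * frame, (voiced[-1] + 1) * frame
    return samples[start:end]


def normalize_level(samples: List[float], target_db: float = -23.0) -> List[float]:
    """Apply one gain so the RMS level hits target_db (crude stand-in for LUFS)."""
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # nothing to scale
    gain = 10 ** ((target_db - current) / 20)
    # Hard clipping keeps samples in range; a real pipeline would use a limiter.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


# Quiet 440 Hz tone at 16 kHz, padded with silence on both sides.
tone = [0.01 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
audio = [0.0] * 320 + tone + [0.0] * 320
cleaned = normalize_level(trim_silence(audio))
```

Here the padding is removed and the quiet tone is raised to the target level before it would reach the recognizer.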
On-device processing
Both apps run entirely on your Mac using Apple's Core ML and Metal GPU acceleration. No audio leaves your computer for transcription. This means:
- Consistent performance regardless of internet connection
- Works offline on planes, in cafes with poor WiFi, or during outages
- Complete privacy - your voice recordings stay on your machine
- Predictable latency with no server round-trips
Who should choose what
Choose Voice Type if…
- You need sub-2-second finalization regardless of length.
- You want Dragon-level punctuation support built in.
- You dictate technical terms and need custom vocabulary.
- You work in noisy environments and need audio preprocessing.
Choose Voice Ink if…
- You prefer Voice Ink's specific UI or features.
- You already own Voice Ink and it meets your needs.
- You want a different workflow style.
Technology references
- OpenAI Whisper (GitHub) - Open-source speech recognition model
- Robust Speech Recognition via Large-Scale Weak Supervision - Whisper paper (OpenAI, 2022)
- RNNoise (GitHub) - Neural network noise suppression by Xiph.org
- RNNoise: Learning Noise Suppression - Technical demo by Jean-Marc Valin
- Apple Core ML - On-device machine learning framework
- Apple Metal - GPU acceleration for ML inference
