Both Voice Type and SuperWhisper run OpenAI Whisper locally on your Mac using Apple's Core ML. The difference is in what happens before and after recognition: audio conditioning, hotkey behavior, and optional AI features.
Short answer
- Pick Voice Type if you want hold-to-dictate hotkeys, audio conditioning for noisy environments, and a single one-time price.
- Pick SuperWhisper if you prefer tiered pricing, AI rewriting features, or need the Pro tier's unlimited transcription.
At a glance
Finalization speed
Voice Type finalizes in under 2 seconds regardless of how long you dictate. The streaming architecture processes audio in chunks, so only the last segment needs finalizing when you stop.
Recognition accuracy
Voice Type uses beam search decoding for higher accuracy on complex phrases. Combined with RNNoise preprocessing and proper audio conditioning, this helps technical terms transcribe correctly.
Punctuation handling
Voice Type has near-parity with Dragon Dictate for spoken punctuation, a key feature professional dictation users expect. Say 'period', 'comma', or 'new paragraph' naturally.
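At its core, spoken-punctuation support means mapping command words in the transcript to symbols. A minimal sketch of the idea (the command set and rules here are illustrative, not Voice Type's actual engine):

```python
import re

# Illustrative command set; not Voice Type's actual mapping.
COMMANDS = {
    "period": ".",
    "comma": ",",
    "question mark": "?",
    "new paragraph": "\n\n",
}

def apply_spoken_punctuation(text: str) -> str:
    # Handle longer commands first so multi-word commands take precedence.
    for spoken in sorted(COMMANDS, key=len, reverse=True):
        symbol = COMMANDS[spoken]
        sep = "" if symbol.endswith("\n") else " "
        # Consume surrounding spaces so the symbol attaches to the previous word.
        text = re.sub(rf"\s*\b{re.escape(spoken)}\b\s*", symbol + sep, text)
    # Tidy any stray space left before punctuation, then trim the ends.
    return re.sub(r"\s+([.,?])", r"\1", text).strip()
```

For example, `apply_spoken_punctuation("hello comma world period")` yields `"hello, world."`. A production implementation also has to decide when a command word is meant literally ("the Jurassic period"), which is where the hard part lives.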
Custom vocabulary
Voice Type supports custom word priming, following Whisper best practices for prompt conditioning. Add product names, technical terms, and jargon so they transcribe correctly.
Audio preprocessing
Voice Type applies LUFS normalization, RNNoise noise suppression, and silence trimming before recognition. SuperWhisper relies on the raw Whisper model without preprocessing.
Pricing
Voice Type: $19.99 one-time. SuperWhisper: tiered at $9.99 (Basic), $19.99 (Standard), and $29.99 (Pro), each with a different feature set.
Audio preprocessing
Voice Type conditions audio before it reaches the Whisper model. This includes loudness normalization via LUFS metering, background noise reduction using RNNoise (a recurrent neural network trained specifically for speech denoising), and silence trimming. The goal: cleaner input produces more accurate output, especially in non-ideal recording conditions.
SuperWhisper passes audio directly to the Whisper model. This works well in quiet environments but may produce more errors with background noise or inconsistent microphone levels.
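The conditioning stages can be sketched with numpy. RNNoise itself is a native library, so denoising appears only as a placeholder comment here, and the target level, frame size, and threshold are illustrative values, not Voice Type's actual settings:

```python
import numpy as np

def normalize_loudness(audio: np.ndarray, target_rms: float = 0.1) -> np.ndarray:
    """Scale audio toward a target RMS level (a rough stand-in for LUFS metering)."""
    rms = np.sqrt(np.mean(audio ** 2))
    if rms < 1e-8:  # effectively silent: nothing to scale
        return audio
    return audio * (target_rms / rms)

def trim_silence(audio: np.ndarray, frame: int = 512, threshold: float = 0.01) -> np.ndarray:
    """Drop leading and trailing frames whose RMS falls below the threshold."""
    frames = [audio[i:i + frame] for i in range(0, len(audio), frame)]
    active = [np.sqrt(np.mean(f ** 2)) >= threshold for f in frames]
    if not any(active):
        return audio[:0]
    first = active.index(True)
    last = len(active) - 1 - active[::-1].index(True)
    return np.concatenate(frames[first:last + 1])

def condition(audio: np.ndarray) -> np.ndarray:
    # 1. loudness normalization, 2. denoising (RNNoise would run here), 3. silence trim
    return trim_silence(normalize_loudness(audio))
```

Feeding in a tone padded with silence, `condition` returns just the tone, rescaled toward the target level; that cleaned signal is what the recognition model would then see.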
Speed architecture
Voice Type finalizes in under 2 seconds no matter how long you've been dictating. The architecture streams audio in ~30-second windows, processing each chunk as you speak. When you release the hotkey, only the final segment needs processing—the rest is already done.
This streaming approach means consistent latency whether you dictate for 10 seconds or 10 minutes. Cloud-based tools often have variable latency that scales with audio length.
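The streaming idea can be illustrated with a toy scheduler: full windows are transcribed while you are still talking, so stopping only leaves the tail. This is a sketch of the concept, not Voice Type's actual engine:

```python
class StreamingTranscriber:
    """Toy model of chunked streaming: full ~30 s windows are processed as
    they fill, so only the final partial window remains at stop time."""

    def __init__(self, window_seconds: float = 30.0):
        self.window = window_seconds
        self.buffered = 0.0        # seconds of audio not yet transcribed
        self.finalized_chunks = 0  # windows already transcribed mid-dictation

    def feed(self, seconds: float) -> None:
        self.buffered += seconds
        while self.buffered >= self.window:
            self._transcribe_window()  # runs while the user keeps talking
            self.buffered -= self.window
            self.finalized_chunks += 1

    def _transcribe_window(self) -> None:
        pass  # stand-in for running Whisper on one window of audio

    def stop(self) -> float:
        """Audio (in seconds) still needing work at hotkey release."""
        tail = self.buffered
        self.buffered = 0.0
        return tail
```

After ten minutes of dictation, twenty windows are already finalized and at most one partial window is left to process, which is why latency at release stays flat regardless of session length.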
Beam search and accuracy
Voice Type uses beam search decoding rather than greedy decoding. Beam search explores multiple possible transcriptions simultaneously and selects the most likely sequence, improving accuracy on ambiguous or technical phrases.
Combined with prompt conditioning for custom vocabulary (following OpenAI's Whisper documentation), this helps technical terms, product names, and domain-specific jargon transcribe correctly.
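The advantage of beam search over greedy decoding can be shown with a toy word-level model. The bigram probabilities below are invented for the example, and real Whisper decoding operates over subword tokens, but the trade-off is the same:

```python
import math

# Invented toy bigram model: probability of the next word given the previous
# one (remaining probability mass sits on words omitted for brevity).
MODEL = {
    "<s>":       {"wreck": 0.6, "recognize": 0.4},
    "wreck":     {"a": 0.3, "speech": 0.05},
    "recognize": {"speech": 0.9, "a": 0.05},
}

def greedy_decode(model, steps=2):
    """Commit to the locally best word at every step."""
    seq, logp = [], 0.0
    for _ in range(steps):
        prev = seq[-1] if seq else "<s>"
        tok, p = max(model[prev].items(), key=lambda kv: kv[1])
        seq.append(tok)
        logp += math.log(p)
    return seq, logp

def beam_decode(model, steps=2, width=2):
    """Keep the `width` best partial sequences and extend each of them."""
    beams = [([], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, lp in beams:
            prev = seq[-1] if seq else "<s>"
            for tok, p in model[prev].items():
                candidates.append((seq + [tok], lp + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
    return beams[0]
```

Greedy commits to "wreck" (0.6) and is stuck with weak continuations; the beam also keeps "recognize" (0.4) and recovers the higher-probability sequence "recognize speech" overall. Ambiguous technical phrases fail in exactly this way under greedy decoding.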
Model options
SuperWhisper lets you choose from multiple Whisper model sizes (tiny, base, small, medium, large). Smaller models are faster but less accurate; larger models are slower but handle complex vocabulary better.
Voice Type ships with an optimized model tuned for the hold-to-dictate workflow where consistent sub-2-second finalization matters. The beam search configuration and audio preprocessing compensate for model size trade-offs.
Who should choose what
Choose Voice Type if…
- You need sub-2-second finalization regardless of dictation length.
- You want Dragon-level punctuation support ('period', 'new paragraph').
- You dictate technical terms and need custom vocabulary priming.
- You work in noisy environments and need audio preprocessing.
Choose SuperWhisper if…
- You want AI-powered text rewriting or formatting.
- You need to choose from multiple Whisper model sizes.
- You prefer tiered pricing based on features you need.
- You want toggle-mode dictation instead of hold-to-talk.
Technology references
- OpenAI Whisper (GitHub) - Open-source speech recognition model used by both apps
- Robust Speech Recognition via Large-Scale Weak Supervision - Original Whisper research paper (OpenAI, 2022)
- RNNoise (GitHub) - Neural network noise suppression by Xiph.org
- RNNoise: Learning Noise Suppression - Technical demo by Jean-Marc Valin (Mozilla/Xiph)
- Apple Core ML - On-device machine learning framework
- Apple Metal - GPU acceleration for ML inference
- Hacker News: Whisper large-v2 discussion - Community discussion on Whisper accuracy
