Core ML and Metal: why Apple Silicon changed on-device dictation

Running Whisper-style models locally without melting your laptop.

28 Nov 2025

Three years ago, running a Whisper-class model on a laptop meant choosing between accuracy and usability. The CPU couldn't keep up. The GPU was power-hungry. Cloud was the only practical option. Apple Silicon changed this.

What Core ML does

Core ML is Apple's framework for running ML models on-device. It handles the boring parts: memory management, operator fusion, and scheduling across the CPU, GPU, and Neural Engine. You give it a model; it figures out how to run it efficiently.

For Whisper-style models, Core ML can split work across the Neural Engine (for matrix operations) and the GPU (for attention layers). The result: fast inference without thermal throttling.

What Metal adds

Metal is Apple's low-level GPU API. Core ML uses it under the hood, but we also use Metal directly for custom audio-processing kernels. The pre-processing pipeline (noise suppression, normalization, resampling) runs on Metal. This keeps the CPU free for UI work and avoids memory copies between CPU and GPU.

Real numbers

On an M1 Mac, finalizing 30 seconds of audio takes about 2-3 seconds. That's the model inference time after you release the hotkey. On an M3, it's closer to 1.5-2 seconds.

Intel Macs still work but run slower. The model falls back to the CPU, which means longer finalization and more power draw.

Why this matters for battery

Cloud dictation keeps the radio on: uploading audio, waiting for a response, downloading text. Each step draws power. On-device inference is a burst: work hard for two seconds, then idle. The Neural Engine is remarkably efficient for this pattern. We've had users dictate for hours on battery.

Trade-offs

Model size vs. accuracy.
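A rough sketch of how that choice might be automated in Swift. This is an illustration, not Voice Type's actual logic: the tier names and the 8 GB threshold are assumptions, and the sizes just mirror the range we ship.

```swift
import Foundation

// Hypothetical model tiers; the sizes mirror the 27 MB-550 MB range we ship.
enum ModelTier: String {
    case small = "small (27 MB)"
    case large = "large (550 MB)"
}

// Pick a tier from the machine's physical memory.
// The 8 GB cutoff is an assumption: favor the large model only above it.
func recommendedTier(memoryBytes: UInt64) -> ModelTier {
    let eightGB: UInt64 = 8 * 1024 * 1024 * 1024
    return memoryBytes > eightGB ? .large : .small
}

// Query the current machine via Foundation.
let tier = recommendedTier(memoryBytes: ProcessInfo.processInfo.physicalMemory)
print("Recommended model: \(tier.rawValue)")
```

In practice you would also want to account for what else is resident in unified memory, but a simple threshold is a sensible default.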
Smaller models run faster but miss more words. We offer multiple model sizes (27 MB to 550 MB) so you can pick the trade-off that fits your hardware. Larger models make sense on M2/M3 machines with plenty of unified memory. Smaller models work better on 8 GB machines or older Intel Macs.

On-device used to mean compromise. Now it means fast, private, and battery-friendly. The hardware caught up.