Core ML and Metal: why Apple Silicon changed on-device dictation

Running Whisper-style models locally without melting your laptop.

28 Nov 2025

Three years ago, running a Whisper-class model on a laptop meant choosing between accuracy and usability. The CPU couldn't keep up. The GPU was power-hungry. Cloud was the only practical option. Apple Silicon changed this.

What Core ML does

Core ML is Apple's framework for running ML models on-device. It handles the boring parts: memory management, operator fusion, and scheduling across the CPU, GPU, and Neural Engine. You give it a model; it figures out how to run it efficiently.

For Whisper-style models, Core ML can split work across the Neural Engine (for matrix operations) and the GPU (for attention layers). The result: fast inference without thermal throttling.

What Metal adds

Metal is Apple's low-level GPU API. Core ML uses it under the hood, but we also use Metal directly for custom audio-processing kernels. The pre-processing pipeline (noise suppression, normalization, resampling) runs on Metal. This keeps the CPU free for UI work and avoids memory copies between CPU and GPU.

Real numbers

On an M1 Mac, finalizing 30 seconds of audio takes about 2-3 seconds. That's the model inference time after you release the hotkey. On an M3, it's closer to 1.5-2 seconds.

Intel Macs still work but run slower. The model falls back to the CPU, which means longer finalization and more power draw.

Why this matters for battery

Cloud dictation keeps the radio on: uploading audio, waiting for a response, downloading text. Each step draws power. On-device inference is a burst: work hard for two seconds, then idle. The Neural Engine is remarkably efficient for this pattern. We've had users dictate for hours on battery.

Trade-offs

Model size vs. accuracy.
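A rough sketch of how that choice might be automated in Swift. This is an illustration, not Voice Type's actual logic: the tier names and the 8 GB threshold are assumptions, and the sizes just mirror the range we ship.

```swift
import Foundation

// Hypothetical model tiers; the sizes mirror the 27 MB-550 MB range we ship.
enum ModelTier: String {
    case small = "small (27 MB)"
    case large = "large (550 MB)"
}

// Pick a tier from the machine's physical memory.
// The 8 GB cutoff is an assumption: favor the large model only above it.
func recommendedTier(memoryBytes: UInt64) -> ModelTier {
    let eightGB: UInt64 = 8 * 1024 * 1024 * 1024
    return memoryBytes > eightGB ? .large : .small
}

// Query the current machine via Foundation.
let tier = recommendedTier(memoryBytes: ProcessInfo.processInfo.physicalMemory)
print("Recommended model: \(tier.rawValue)")
```

In practice you would also want to account for what else is resident in unified memory, but a simple threshold is a sensible default.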
Smaller models run faster but miss more words. We offer multiple model sizes (27 MB to 550 MB) so you can pick the trade-off that fits your hardware. Larger models make sense on M2/M3 machines with plenty of unified memory. Smaller models work better on 8 GB machines or older Intel Macs.

On-device used to mean compromise. Now it means fast, private, and battery-friendly. The hardware caught up.