48 kHz in, 16 kHz out: why resampling matters

Matching your mic's output to what Whisper expects.

28 Nov 2025

Your Mac's microphone captures audio at 48 kHz. Whisper-style models expect 16 kHz. The conversion matters more than you'd think.

The naive approach breaks things

Simple decimation (just dropping samples) creates aliasing artifacts. High frequencies fold back into the audible range as noise. The model hears phantom sounds that weren't in your voice.
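
A quick way to see the fold-back (a NumPy sketch for illustration, not code from the pipeline): decimate a 13 kHz tone by 3 and its energy reappears at 3 kHz.

```python
import numpy as np

fs_in, fs_out = 48_000, 16_000
t = np.arange(fs_in) / fs_in          # one second of audio at 48 kHz

# A 13 kHz tone: representable at 48 kHz, but above the 8 kHz
# Nyquist limit of the 16 kHz target rate.
tone = np.sin(2 * np.pi * 13_000 * t)

# Naive decimation: keep every 3rd sample, no filtering.
naive = tone[::3]

# The dominant frequency after decimation is no longer 13 kHz.
spectrum = np.abs(np.fft.rfft(naive))
freqs = np.fft.rfftfreq(naive.size, d=1 / fs_out)
print(freqs[spectrum.argmax()])       # ~3000.0 Hz: 13 kHz folded to 16k - 13k
```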

Cheap resampling libraries optimize for speed, not quality. Fine for ringtones. Bad for speech recognition where subtle differences between consonants matter.

Band-limited resampling

We use band-limited resampling: apply a low-pass filter at the target rate's Nyquist frequency (8 kHz, half of 16 kHz), then decimate. This removes the frequencies that would alias before they can cause problems.
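
The post doesn't name the implementation the pipeline uses; here is a minimal SciPy sketch of the idea, where the 7.6 kHz cutoff and 241-tap filter length are illustrative choices, not the product's actual parameters.

```python
import numpy as np
from scipy.signal import firwin, lfilter

fs_in, fs_out = 48_000, 16_000
factor = fs_in // fs_out              # 3

def resample_48k_to_16k(x: np.ndarray) -> np.ndarray:
    # Low-pass just below the 8 kHz output Nyquist, then drop samples.
    taps = firwin(numtaps=241, cutoff=7_600, fs=fs_in)
    return lfilter(taps, 1.0, x)[::factor]
```

SciPy's `resample_poly(x, up=1, down=3)` collapses the same filter-then-decimate idea into a single polyphase call, with the anti-aliasing filter built in.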

The filter matters. Too aggressive and you lose the high-frequency content that distinguishes "s" from "f" from "th". Too gentle and aliasing sneaks through.
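
To make that trade-off concrete, one way to design the filter is to pin down the stopband rejection and transition width explicitly (the 80 dB and 800 Hz figures below are assumptions for illustration, not the pipeline's settings):

```python
from scipy.signal import firwin, kaiserord

fs = 48_000
# Assumed design targets: 80 dB of stopband rejection, with an 800 Hz
# transition band ending exactly at the 8 kHz output Nyquist.
stop_db, transition_hz = 80.0, 800
numtaps, beta = kaiserord(stop_db, transition_hz / (fs / 2))
taps = firwin(numtaps, cutoff=8_000 - transition_hz / 2,
              window=("kaiser", beta), fs=fs)
print(numtaps)  # roughly 300: more rejection or a narrower band costs more taps
```

Placing the cutoff half a transition band below 8 kHz puts the stopband edge right at the new Nyquist, so nothing aliases, while the passband still reaches about 7.2 kHz, where the sibilant energy that separates those consonants lives.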

Why not just record at 16 kHz?

macOS audio APIs default to 48 kHz. Fighting the system adds latency and edge cases. Better to accept 48 kHz and resample correctly.

Plus, we process at 48 kHz for the earlier pipeline stages (VAD, noise suppression). Higher sample rate means more information to work with when detecting speech boundaries.

The full chain

  1. Capture at 48 kHz (Mac default)
  2. RNNoise VAD + noise suppression (48 kHz)
  3. Segmentation into phrases (48 kHz)
  4. LUFS normalization + high-pass filter (48 kHz)
  5. Band-limited resample to 16 kHz
  6. Feed to Whisper-style model
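
As a rough sketch of stages 4 and 5 only (the RNNoise and Whisper stages need their own bindings; the 80 Hz high-pass cutoff and the peak normalization standing in for true LUFS matching are both assumptions):

```python
import numpy as np
from scipy.signal import butter, sosfilt, resample_poly

FS_CAPTURE, FS_MODEL = 48_000, 16_000

def prepare_for_model(phrase: np.ndarray) -> np.ndarray:
    """Stages 4-5: normalize, high-pass, band-limited resample.

    `phrase` is one segmented speech phrase at 48 kHz that has
    already been through RNNoise denoising and segmentation.
    """
    # High-pass to drop rumble below the voice band (80 Hz cutoff assumed).
    sos = butter(4, 80, btype="highpass", fs=FS_CAPTURE, output="sos")
    phrase = sosfilt(sos, phrase)

    # Peak normalization as a simple stand-in for LUFS loudness matching.
    phrase = phrase / (np.max(np.abs(phrase)) + 1e-9)

    # Polyphase resampling: anti-aliasing FIR and 3:1 decimation in one step.
    return resample_poly(phrase, up=1, down=FS_CAPTURE // FS_MODEL)
```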

Each step operates at the sample rate that makes sense for it. The model gets clean 16 kHz audio that matches its training distribution.

Small details compound. A 1% improvement at each of the six stages multiplies out to roughly 6% overall (1.01⁶ ≈ 1.06), enough for noticeably better transcripts.
