30 Sept 2025
Cleaner input, cleaner transcripts: audio conditioning for accuracy
Normalized loudness and gentle filtering help the recognizer hear what you meant, not the room.
If the input is messy, the output will be too.
Voice Type normalizes loudness to a consistent target and applies a light high-pass filter to reduce low-frequency rumble. Combined with noise-aware voice activity detection, this gives the model input closer to the audio it was trained on, which leads to fewer garbled words and better punctuation.
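To make the two conditioning steps concrete, here is a minimal sketch in plain Python. It is not Voice Type's implementation. The function names, the -20 dBFS target, the 80 Hz cutoff, the 16 kHz sample rate, the gain cap, and the filter-then-normalize order are all assumptions chosen for illustration. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def high_pass(samples, sample_rate=16000, cutoff_hz=80.0):
    """First-order RC high-pass filter (assumed cutoff: 80 Hz).

    Attenuates slow-moving content such as rumble or a DC offset
    while passing speech frequencies nearly unchanged.
    """
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Each output follows the change in the input, decayed by alpha.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def normalize_rms(samples, target_dbfs=-20.0, max_gain_db=30.0):
    """Scale samples so their RMS level matches target_dbfs (assumed target).

    The gain is capped (assumed cap: 30 dB) so near-silent input is not
    boosted into loud noise, and the result is clipped to [-1.0, 1.0].
    """
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # pure silence: nothing to scale
    target_rms = 10.0 ** (target_dbfs / 20.0)
    gain = min(target_rms / rms, 10.0 ** (max_gain_db / 20.0))
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def condition(samples, sample_rate=16000):
    """Filter first, then normalize, so rumble does not skew the level estimate."""
    return normalize_rms(high_pass(samples, sample_rate))
```

Filtering before measuring loudness is a design choice in this sketch: low-frequency energy would otherwise inflate the RMS estimate and leave the speech quieter than intended. A production pipeline would more likely work on short frames and use a perceptual loudness measure rather than whole-buffer RMS.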
We avoid heavy “prompt fixes” that can make transcripts read more confidently while being less faithful to what was said. Instead, we improve the signal before recognition.