Speech-to-text (ASR) converts audio into text automatically. Used by accessibility teams, content creators, and developers building voice interfaces. Salary: $70-120k junior, $120-180k mid, $180-270k senior. Learn in 3-4 weeks. Adjacent to NLP, audio processing, and machine learning.
Speech-to-text (ASR, automatic speech recognition) converts audio recordings into text automatically. Modern ASR models (Whisper, Google Cloud, AWS Transcribe) achieve >95% accuracy on clean audio and can handle multiple languages, accents, and dialects. Applications range from accessibility (captions for deaf users), content creation (podcast transcripts, video subtitles), to voice interfaces (Alexa, Siri). ASR combines audio signal processing, acoustic modeling, and language models to predict what words were spoken.
| Region | Junior | Mid | Senior |
|---|---|---|---|
| USA | $70k | $145k | $240k |
| UK | $50k | $100k | $160k |
| EU | $55k | $110k | $175k |
| CANADA | $65k | $130k | $220k |
Take a 10-min Career Match — we'll suggest the right tracks.
Find my best-fit skills →Skill-based matching across 2,536 careers. Free, ~10 minutes.
Take Career Match — free →