Frequently Asked Questions
Everything you need to know about TranscribeAnything.
What audio and video formats are supported?
Audio: MP3, WAV, M4A, FLAC, AAC, OGG, WMA, and Opus. Video: MP4, MOV, and WebM. When you upload a video, we automatically extract its audio track.
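If you are preparing a batch of files for the CLI or the web app, a quick extension check can catch unsupported files before you start. The sketch below is hypothetical: `classify_media` is not part of TranscribeAnything, and it only inspects the file extension, not the codec actually stored inside the container.

```python
from pathlib import Path
from typing import Optional

# Extensions from the supported-format list above, lowercased for matching.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg", ".wma", ".opus"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm"}


def classify_media(filename: str) -> Optional[str]:
    """Return "audio", "video", or None if the extension is not on the supported list.

    Hypothetical helper: it checks the extension only, not the file contents.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"  # the audio track is extracted before transcription
    return None


for name in ["podcast.mp3", "Interview.MP4", "notes.pdf"]:
    print(name, classify_media(name))
```

Lowercasing the suffix means "Interview.MP4" and "interview.mp4" are treated the same.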
How many languages are supported?
Over 96 languages, from Afrikaans to Zulu. 16 languages have excellent accuracy (under 8% word error rate), including English, Spanish, French, German, Japanese, Korean, and Chinese.
Can I transcribe YouTube videos?
Yes. Paste any YouTube URL and we'll download the audio and transcribe it. The same works for most other video and podcast platforms, with over 1,800 sites supported.
What is speaker diarization?
Speaker diarization identifies who said what in a recording. When enabled, your transcript labels each segment with a speaker ID (Speaker 0, Speaker 1, etc.), making it easy to follow multi-person conversations.
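A diarized transcript often splits one speaker's turn into several short segments. A minimal sketch of merging consecutive segments from the same speaker into readable turns follows. The segment shape (dicts with "speaker", "start", "end", and "text" keys) and the helper name `merge_speaker_turns` are assumptions for illustration, not TranscribeAnything's documented JSON schema.

```python
from typing import Dict, List


def merge_speaker_turns(segments: List[Dict]) -> List[Dict]:
    """Merge consecutive segments from the same speaker into a single turn.

    Assumes each segment is a dict with hypothetical keys
    "speaker", "start", "end", and "text".
    """
    turns: List[Dict] = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker kept talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"].strip()
        else:
            # First segment, or the speaker changed: start a new turn.
            turns.append({
                "speaker": seg["speaker"],
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"].strip(),
            })
    return turns


segments = [
    {"speaker": 0, "start": 0.0, "end": 2.1, "text": "Hi, thanks for joining."},
    {"speaker": 0, "start": 2.1, "end": 3.5, "text": "Shall we start?"},
    {"speaker": 1, "start": 3.6, "end": 4.8, "text": "Sure."},
]
turns = merge_speaker_turns(segments)
for turn in turns:
    print(f"Speaker {turn['speaker']}: {turn['text']}")
# Speaker 0: Hi, thanks for joining. Shall we start?
# Speaker 1: Sure.
```

Only adjacent segments are merged, so a speaker who talks again later in the recording starts a new turn.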
Is my audio data private?
When using the CLI or MCP server locally, your audio never leaves your device — all processing happens on your machine. For the web app, uploaded audio is processed on our servers and deleted within 24 hours.
What AI model is used?
We use OpenAI's Whisper models via faster-whisper, a CTranslate2-optimized implementation. The default 'turbo' model balances speed and accuracy, while 'large-v3' provides the highest accuracy across all languages.
Do I need an account?
No. Basic web transcription works without any signup. The CLI and MCP server are always free and unlimited with no account required.
What export formats are available?
Plain text (TXT), SubRip subtitles (SRT), WebVTT subtitles (VTT), and structured JSON with full segment and word-level data.
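SRT and WebVTT are both plain-text lists of timed cues. They differ mainly in that WebVTT starts with a "WEBVTT" header line and uses a period before the milliseconds, where SRT numbers each cue and uses a comma. A minimal sketch of producing both from timed segments, assuming hypothetical {"start", "end", "text"} dicts with times in seconds (not necessarily the exact JSON export schema):

```python
def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Format seconds as HH:MM:SS<marker>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments) -> str:
    """Numbered cues separated by blank lines."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """A WEBVTT header, a blank line, then unnumbered cues."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


segments = [
    {"start": 0.0, "end": 2.5, "text": "Hello there."},
    {"start": 2.5, "end": 5.25, "text": "Welcome to the show."},
]
print(to_srt(segments))
print(to_vtt(segments))
```

Rounding to whole milliseconds once, before splitting into hours, minutes, and seconds, avoids floating-point drift in the individual fields.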
How accurate is the transcription?
For English and other Tier 1 languages, expect a word error rate under 7%, which is comparable to professional human transcription. Actual accuracy depends on the language, audio quality, and background noise.
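Word error rate is the standard metric here: WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the transcript into a reference transcript, and N is the number of words in the reference. A WER under 7% therefore means roughly fewer than 7 word errors per 100 reference words. A minimal sketch using word-level edit distance follows; real benchmarks usually normalize case and punctuation first, which this version does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution ("the" -> "a") over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, WER can exceed 100% when a transcript contains many extra words.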
Can I use this with AI assistants like Claude?
Yes. TranscribeAnything includes an MCP (Model Context Protocol) server that Claude and other AI agents can use to transcribe audio as part of larger workflows. Run it with: npx transcribeanything-mcp
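MCP clients such as Claude Desktop register servers in a JSON config file under an "mcpServers" key, where each entry names the command that launches the server. A minimal sketch of adding such an entry, assuming the server is started with the same npx transcribeanything-mcp command shown above. The entry name "transcribeanything" and the helper `add_mcp_server` are illustrative, and the config file's location depends on your client and operating system.

```python
import copy
import json


def add_mcp_server(config: dict, name: str = "transcribeanything") -> dict:
    """Return a copy of an MCP client config with the server entry added.

    The entry name is an arbitrary, hypothetical label; the command and
    argument mirror `npx transcribeanything-mcp`.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    servers = updated.setdefault("mcpServers", {})
    servers[name] = {"command": "npx", "args": ["transcribeanything-mcp"]}
    return updated


existing = {"mcpServers": {"other-tool": {"command": "other-tool-server", "args": []}}}
updated = add_mcp_server(existing)
print(json.dumps(updated, indent=2))
```

In practice you would read the client's config file with json.load, pass the result through a helper like this, and write it back with json.dump, keeping any servers that were already registered.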
Is there a file size or duration limit?
The free web tier accepts files up to 30 minutes long, and Pro accounts support files up to 4 hours. The CLI has no limits: it processes files of any length locally.
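To check ahead of time whether a recording fits a tier's duration limit, compare its length against the limits above. A minimal sketch for WAV files using only Python's standard library follows; the tier keys and helper names are hypothetical, and other formats (MP3, MP4, and so on) would need a different tool to read their duration.

```python
import io
import wave
from typing import Optional

# Duration limits from this FAQ, in seconds; None means no limit (local CLI).
# The tier keys are hypothetical labels.
TIER_LIMITS = {
    "free_web": 30 * 60,
    "pro": 4 * 60 * 60,
    "cli": None,
}


def wav_duration_seconds(source) -> float:
    """Duration of a WAV file: number of frames divided by frames per second."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_tier(duration_seconds: float, tier: str) -> bool:
    """True if a recording of this length is within the tier's limit."""
    limit: Optional[int] = TIER_LIMITS[tier]
    return limit is None or duration_seconds <= limit


# Example: a 2-second silent WAV written to memory (16-bit mono at 8 kHz).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)

duration = wav_duration_seconds(buffer)
print(duration, fits_tier(duration, "free_web"))
print(fits_tier(45 * 60, "free_web"), fits_tier(45 * 60, "pro"))
```

wave.open also accepts a file path, so wav_duration_seconds("meeting.wav") works the same way on a file on disk.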
What about real-time / live transcription?
Real-time transcription from a microphone is on our roadmap for a future release. Currently, TranscribeAnything processes pre-recorded audio and video files.