Question 1

What audio and video formats are supported?

Accepted Answer

MP3, WAV, M4A, FLAC, AAC, OGG, WMA, OPUS for audio. MP4, MOV, WebM for video. When you upload a video, we automatically extract the audio track.
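
As a rough illustration, the format list above can be turned into a pre-upload check. This is only a sketch: it looks at the file extension, which is an assumption made here for simplicity (the service may well inspect the file contents instead), and the names classify_media, AUDIO_EXTENSIONS and VIDEO_EXTENSIONS are hypothetical.

```python
from pathlib import Path

# Extensions taken from the answer above; the lookup itself is illustrative only.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg", ".wma", ".opus"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm"}


def classify_media(filename: str) -> str:
    """Return "audio", "video", or "unsupported" based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"
```

For example, classify_media("interview.MOV") returns "video", which would mean the audio track is extracted before transcription.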

Question 2

How many languages are supported?

Accepted Answer

More than 96 languages are supported, from Afrikaans to Zulu. Sixteen of them have excellent accuracy (under 8% word error rate), including English, Spanish, French, German, Japanese, Korean, and Chinese.

Question 3

Can I transcribe YouTube videos?

Accepted Answer

Yes. Paste any YouTube URL and we'll download the audio and transcribe it. The same works for most other video and podcast platforms: over 1,800 sites are supported.

Question 4

What is speaker diarization?

Accepted Answer

Speaker diarization identifies who said what in a recording. When enabled, your transcript labels each segment with a speaker ID (Speaker 0, Speaker 1, etc.), making it easy to follow multi-person conversations.
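
To make the labelling concrete, here is a minimal sketch of how diarized segments could be merged into speaker turns. The segment shape (dicts with "speaker", "start", "end" and "text" keys) is an assumption for illustration and may not match the actual output; group_by_speaker and format_turns are hypothetical helpers.

```python
def group_by_speaker(segments):
    """Merge consecutive segments from the same speaker into one turn."""
    turns = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            # New speaker: start a new turn (copy so the input is not mutated).
            turns.append(dict(seg))
    return turns


def format_turns(turns):
    """Render turns as 'Speaker N: text' lines."""
    return "\n".join(f"Speaker {t['speaker']}: {t['text']}" for t in turns)
```

Merging adjacent segments this way keeps a long monologue as one block instead of many short labelled fragments.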

Question 5

Is my audio data private?

Accepted Answer

When using the CLI or MCP server locally, your audio never leaves your device — all processing happens on your machine. For the web app, uploaded audio is processed on our servers and deleted within 24 hours.

Question 6

What AI model is used?

Accepted Answer

We use OpenAI's Whisper model (via faster-whisper, a CTranslate2-optimized implementation). The default 'turbo' model balances speed and accuracy. 'large-v3' provides maximum accuracy for all languages.
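
For readers who want to try the underlying library directly, the following is a hedged sketch of calling faster-whisper from Python. It is not how TranscribeAnything itself is implemented; the device and compute_type values are assumptions chosen for a CPU-only machine, the "turbo" model name requires a recent faster-whisper release, and pick_model and transcribe_file are hypothetical helpers.

```python
def pick_model(prefer_accuracy: bool = False) -> str:
    """Choose between the two model names mentioned in the answer above."""
    return "large-v3" if prefer_accuracy else "turbo"


def transcribe_file(path: str, prefer_accuracy: bool = False):
    """Transcribe one file and return (detected_language, segments)."""
    # Imported lazily so this module loads even without the package installed
    # (pip install faster-whisper).
    from faster_whisper import WhisperModel

    # Assumed settings for a machine without a GPU.
    model = WhisperModel(pick_model(prefer_accuracy), device="cpu", compute_type="int8")
    segments, info = model.transcribe(path, beam_size=5)
    # `segments` is a generator; consuming it is what runs the transcription.
    return info.language, [(s.start, s.end, s.text) for s in segments]
```

Calling transcribe_file("meeting.mp3", prefer_accuracy=True) would load 'large-v3' instead of the default 'turbo'; the first run downloads the model weights.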

Question 7

Do I need an account?

Accepted Answer

No. Basic web transcription works without any signup. The CLI and MCP server are always free and unlimited with no account required.

Question 8

What export formats are available?

Accepted Answer

Plain text (TXT), SubRip subtitles (SRT), WebVTT subtitles (VTT), and structured JSON with full segment and word-level data.
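
To show what the SRT export amounts to, here is a small sketch that converts timed segments into SubRip text. The input shape (dicts with "start" and "end" in seconds plus "text") is an assumption for illustration, not the documented JSON schema, and srt_timestamp and to_srt are hypothetical names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

WebVTT differs mainly in its "WEBVTT" header line and in using a period rather than a comma before the milliseconds.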

Question 9

How accurate is the transcription?

Accepted Answer

For English and other Tier 1 languages, expect under 7% word error rate — comparable to professional human transcription. Accuracy varies by language, audio quality, and background noise.
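
Word error rate is the number of word substitutions, deletions and insertions needed to turn the transcript into a reference text, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance table; lowercasing and whitespace splitting are simplifying assumptions, and published figures usually rely on more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A transcript that drops one word from a four-word reference scores 0.25, that is, a 25% word error rate.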

Question 10

Can I use this with AI assistants like Claude?

Accepted Answer

Yes. TranscribeAnything includes an MCP (Model Context Protocol) server that Claude and other AI agents can use to transcribe audio as part of larger workflows. Run it with: npx transcribeanything-mcp

Question 11

Is there a file size or duration limit?

Accepted Answer

The free web tier supports files up to 30 minutes long, and Pro accounts support files up to 4 hours long. The CLI has no limits: files of any length are processed locally.

Question 12

Is real-time (live) transcription supported?

Accepted Answer

Real-time transcription from a microphone is on our roadmap for a future release. Currently, TranscribeAnything processes pre-recorded audio and video files.

Frequently Asked Questions