CaptionFit — Transcribe Audio & Render Captioned Videos

How it works

Three steps. That's the whole thing.

Transcribe audio. Sync captions. Render video. No timeline-scrubbing, no manually nudging timestamps, no weird XML.

— STEP 01 · TRANSCRIBE

Drop in audio or video

MP3, WAV, M4A, or MP4, up to 2 hours per file on paid plans (10 minutes on Free). Pick a language (or auto-detect) and a caption length, then hit Transcribe.

rooftop_demo.mp3

3.4 MB · 0:24

verse_takes_03.wav

28.1 MB · 4:12

aligning…

+ drop more files

— STEP 02 · SYNC

Paste script or lyrics (optional)

Got the words already? Paste them — one line per caption — to fix spelling and word grouping. We snap them to the audio.

LYRICS.TXT

When the night is bright

we run a little wild

city lights below ↳ snap to 0:08

I'll meet you on the rooftop

underneath the neon

for one more song

— STEP 03 · RENDER

Render video or grab the SRT

Pick a font, size, color, and aspect ratio (9:16 or 16:9). Hit Render Video, or just download the SRT.

1
00:00:01,200 → 00:00:03,800
When the night is bright

2
00:00:04,100 → 00:00:06,500
we run a little wild

3
00:00:08,400 → 00:00:11,100
city lights below
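
For anyone curious what sits inside an SRT file, here is a minimal Python sketch that builds the three cues shown above. It is an illustration, not CaptionFit's own code: the function names `format_timestamp` and `to_srt` are made up for this example. One detail worth knowing is that real SRT files separate the two timestamps with `-->`, where the preview above shows a stylized arrow.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each cue is: a 1-based index, a timing line, then the caption text.
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


cues = [
    (1.2, 3.8, "When the night is bright"),
    (4.1, 6.5, "we run a little wild"),
    (8.4, 11.1, "city lights below"),
]
srt_text = to_srt(cues)
print(srt_text)
```

Working in whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the timestamp fields.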

What you get

Built for the last-mile stuff that always eats your night.

Fast

Faster than real-time

A 4-minute song aligns in about 20 seconds. We run on dedicated GPUs so your queue stays empty.

CaptionFit

12s

Service B

1m 42s

Service C

3m 08s

By hand

~40m

Lyric-aware

It uses what you give it

Paste lyrics or a script and CaptionFit aligns to those exact words. No more "ahh-vuh-tahn-deal" mishears.

Audio-only

we run a little while

city light's below

underneath the neo

+ Lyrics

we run a little wild

city lights below

underneath the neon

Render-ready

Burn-in, your way

Pick a font, dial in size and color, choose 9:16 for Reels or 16:9 for YouTube. Hit Render Video.

Noto Sans ▾ A− A+ ↑ ↓

9x16

16x9 · Cover

Render queue: 12s · Render Video →

Suno → video

Turn a Suno song into a captioned video.

Paste a Suno link and CaptionFit fetches the track, reads its lyrics, and captions it automatically — perfectly timed to the vocals, free, with no transcription. Then style it and render a lyric video for Reels, Shorts, or YouTube.

Paste a Suno link

One paste. Drop in a suno.com/song/… link — no export, no copying lyrics.

Auto-captioned, free. Captioning a Suno track from its link costs no credits.

Perfect words, any language. Uses Suno's own lyrics, aligned to the beat — every word spelled right.

Link → captioned

URL · suno.com/song/…

0:02 · Caption line one, on the beat

0:05 · Caption line two, perfectly timed

0:08 · …and the rest, automatically

Translate

Translate captions into 100+ languages.

Caption once, then translate your subtitles into any language with a single click. Timings stay perfectly in sync, and every caption style, font, and word-by-word karaoke highlight carries over — so you can reach a global audience without re-editing a thing.

100+ languages

Spanish · French · German · Portuguese · Arabic · Hindi · Japanese · Korean · Chinese · + 90 more

One click. Pick a language and translate every caption in seconds.

Perfectly in sync. Translated captions keep the exact original timing.

Keeps your design. Works with every style, font, and karaoke highlight.

Same line, every language

EN · We run a little wild

ES · Corremos un poco salvajes

FR · On court un peu sauvages

DE · Wir laufen ein wenig wild

JA · 少しワイルドに駆ける

In the editor

Preview, tweak, render. All in one tab.

Recent projects

rooftop_demo · just now

verse_takes_03 · 2h ago

podcast_ep_42 · yesterday

livestream_clip · Mon

lecture_intro · Apr 28

New transcription

rooftop_demo.mp3 · 6 segments

00:24 · english · lyric-aligned

00:01,200 · When the night is bright · 0.99

00:04,100 · we run a little wild · 0.97

00:08,400 · city lights below · 0.99

00:12,300 · I'll meet you on the rooftop · 0.95

00:16,000 · underneath the neon · 0.98

00:20,100 · for one more song · 0.99

Preview & edit

Tweak captions while the video plays.

Burn-in preview updates live. Edit a line, nudge a timestamp, split or merge — render when it feels right.

16x9 · 1080p

00:00 / 00:24

Noto Sans · 48 · White

Position · Bottom

Render Video

J back 1s · K play/pause · L ahead 1s · ⌘↵ split here

From the inbox

People who used to dread Sunday captioning.

I had a 3-minute song and the lyrics in a Notes file. CaptionFit gave me an SRT in 18 seconds and I uploaded the video before my coffee was cold.

Mara Reyes

Independent songwriter

The lyric-paste feature is the unlock. Other tools mishear half my band's vocals — pasting the words means it just works.

Devon Wu

Music video editor

Replaced an internal Python script we'd been duct-taping for a year. The keyboard shortcuts in the editor are chef's kiss.

Priya Kothari

Podcast producer, Loopfield

Pricing

Start free. Scale when you're ready.

No credit card required. All plans include transcription, MP4 rendering, the captions editor, multilingual fonts, and AI tools.

Free

$0

20 tokens / month

~10 min transcribed & rendered video
4 AI cover images
Unlimited Script Fix & lyrics align

For trying it out — short clips and quick experiments.

Get started free

Things people ask before signing up.

How accurate is the alignment?

When you paste lyrics or a script, alignment is typically within 80–150ms of the spoken word — good enough that you'll rarely need to nudge anything. Audio-only transcription depends on the recording, but you can always paste a correction and re-align.

Which formats can I upload?

MP3, WAV, M4A, FLAC, AAC, OGG, plus video formats (MP4, MOV, WebM, MKV). Up to 2 hours per file on paid plans, 10 minutes on Free.

What languages are supported?

Over 99 languages — including English, Spanish, French, German, Mandarin, Hindi, Arabic, Japanese, and many more. Leave the language field on Auto-detect and CaptionFit will identify it for you.

Can I translate my captions into another language?

Yes. After captioning, click Translate and pick from 100+ languages. CaptionFit translates every caption line while keeping the original timing perfectly in sync — so you can publish subtitles in multiple languages, export translated SRTs, or render a captioned video in any language. The original is kept, so you can produce several languages from one project.
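
One way to picture why translated captions stay in sync: only the text of each caption changes, while its start and end times are copied over untouched. The Python sketch below illustrates that idea under stated assumptions. The `translate_cues` helper and the lookup-table "translator" are hypothetical stand-ins, not CaptionFit's translation pipeline; the Spanish line is the one shown earlier on this page.

```python
def translate_cues(cues, translate):
    """Return new cues with translated text and the original timings.

    cues: list of (start_seconds, end_seconds, text) tuples.
    translate: any callable that maps a source string to a target string.
    """
    return [(start, end, translate(text)) for start, end, text in cues]


# Stand-in translator for the example: a tiny lookup table.
# A real system would call a translation service here.
EN_TO_ES = {"we run a little wild": "Corremos un poco salvajes"}

original = [(4.1, 6.5, "we run a little wild")]
spanish = translate_cues(original, lambda text: EN_TO_ES.get(text, text))

# The original list is left as it was, so several languages
# can be produced from the same project.
print(spanish)
```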

Can I caption a Suno song from its link?

Yes. Choose the "Suno link" option, paste your suno.com/song/… URL, and CaptionFit downloads the track, reads its lyrics, and captions it automatically — timed to the audio, with no transcription. Captioning a Suno track from its link is free; then you can style the captions and render a 9:16 or 16:9 lyric video.

Do I need an account to get started?

No account needed to start. Drop a file and CaptionFit gets to work — your project is saved in your browser. Sign in any time to access your projects across devices.

Can I style the captions?

Yes — choose from 30+ fonts, set your own color, adjust size and vertical position. Everything is live-previewed before you commit to a render.

What is Karaoke mode?

Karaoke mode highlights each word as it's spoken, keeping viewers locked in. Choose Color style (the active word changes color) or Stroke style (the active word gets an outline). Works on both 9:16 and 16:9 renders.
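
Word-by-word highlighting comes down to one question per video frame: which word is being spoken right now? Below is a minimal Python sketch of that lookup. The per-word timings are invented for illustration (the page only shows line-level timestamps), and `active_word_index` is a hypothetical helper, not part of CaptionFit.

```python
from bisect import bisect_right

# Illustrative word timings (start, end, word) for the line
# "we run a little wild". These numbers are made up for the example.
words = [
    (4.10, 4.40, "we"),
    (4.40, 4.90, "run"),
    (4.90, 5.10, "a"),
    (5.10, 5.80, "little"),
    (5.80, 6.50, "wild"),
]
starts = [start for start, _, _ in words]
ends = [end for _, end, _ in words]


def active_word_index(starts, ends, t):
    """Return the index of the word spoken at time t, or None if no word is active."""
    # Find the last word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < ends[i]:
        return i
    return None


idx = active_word_index(starts, ends, 4.5)
print(words[idx][2] if idx is not None else "(no active word)")
```

A renderer could call this once per frame and apply the Color or Stroke treatment to the returned word. Binary search keeps the lookup fast even for long transcripts.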

What aspect ratios can I render?

9:16 for Reels, TikTok, and Shorts, and 16:9 for YouTube and the web. Toggle Cover to fit a horizontal source into a vertical canvas (or vice versa).
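
The geometry behind a cover-style fit can be sketched in a few lines of Python. This assumes Cover behaves like the common "scale to fill, then center-crop" approach (similar to CSS `object-fit: cover`); the page does not spell out the exact behavior, so treat `cover_fit` as an illustrative helper, not a description of the renderer.

```python
def cover_fit(src_w, src_h, canvas_w, canvas_h):
    """Scale the source so it fills the canvas, then center-crop the overflow.

    Returns (scaled_w, scaled_h, crop_x, crop_y), where crop_x and crop_y
    are the pixel offsets of the canvas window inside the scaled source.
    """
    # Use the larger of the two ratios so both canvas dimensions are covered.
    scale = max(canvas_w / src_w, canvas_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    crop_x = (scaled_w - canvas_w) // 2
    crop_y = (scaled_h - canvas_h) // 2
    return scaled_w, scaled_h, crop_x, crop_y


# A 16:9 source (1920x1080) placed on a 9:16 canvas (1080x1920).
result = cover_fit(1920, 1080, 1080, 1920)
print(result)
```

With these numbers the source is scaled up until its height matches the canvas, and the left and right edges are cropped evenly.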

Can I download captions as something other than a video?

Yes — download a clean SRT any time, even before rendering. SRT files work directly in YouTube Studio, Adobe Premiere, DaVinci Resolve, and most other editing tools.

Does CaptionFit train on my audio?

No — CaptionFit does not train any models on your audio or transcripts. Your files are stored only while your project exists; delete a project (or your account) and its files are removed from our servers. To produce captions we send your audio to specialist AI providers (ElevenLabs and Anthropic) acting as our processing providers, solely to return your results.

What about long files or batch jobs?

On paid plans you can drop a folder of files at once and we'll align them in parallel. Long files (lectures, audiobooks) are chunked automatically — you still get one clean SRT at the end.
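
To show how chunked results can end up as one SRT, here is a minimal Python sketch. It assumes each chunk is captioned with timestamps relative to the start of that chunk, and that the chunk's offset into the full file is known. Both are assumptions for illustration, and `merge_chunks` is a hypothetical helper, not CaptionFit's implementation.

```python
def merge_chunks(chunks):
    """Merge per-chunk cues into one timeline.

    chunks: list of (chunk_offset_seconds, cues), where cues is a list of
    (start_seconds, end_seconds, text) tuples relative to the chunk start.
    """
    merged = []
    # Process chunks in timeline order, even if they finished out of order.
    for offset, cues in sorted(chunks, key=lambda chunk: chunk[0]):
        for start, end, text in cues:
            # Shift chunk-relative times onto the full-file timeline.
            merged.append((start + offset, end + offset, text))
    return merged


# Two chunks of a long recording, returned out of order by parallel workers.
chunks = [
    (600.0, [(1.0, 2.5, "second chunk line")]),
    (0.0, [(1.2, 3.8, "first chunk line")]),
]
timeline = merge_chunks(chunks)
print(timeline)
```

Sorting by offset first means parallel workers can finish in any order and the final caption list still reads top to bottom.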

Is CaptionFit the same as "Caption Fit"?

Yes — CaptionFit and "Caption Fit" are the same product. The brand name is written as one word (CaptionFit), but many people search for it as two words. Either way, you're in the right place.

Ready when you are

Drop a track. Get a captioned video.

No card required, no setup call, no "book a demo." Free tier covers most one-off projects.

+ New transcription · See features

Captioned video in minutes, not Saturdays.