Transcription & AI

Autorec transcribes recordings locally using whisper.cpp and optionally generates AI summaries via any OpenAI-compatible API.

How Transcription Works

  1. After a recording finishes, autorec extracts the audio track
  2. The audio is processed by the selected Whisper model entirely on your machine
  3. Two output files are created alongside the video:
    • .txt — plain text transcript
    • .srt — subtitle file with timestamps

No audio or video data leaves your computer during transcription.

Whisper Models

Models are downloaded on first use and stored in ~/.local/share/autorec/models/ (Linux) or %LOCALAPPDATA%\autorec\models\ (Windows).

ModelSizeSpeedAccuracyBest For
tiny~75 MBFastestBasicQuick notes, low-power machines
base~142 MBFastGoodDefault — recommended for most users
small~466 MBModerateBetterWhen accuracy matters more than speed
medium~1.5 GBSlowHighNon-English languages, difficult audio
large~3 GBSlowestBestMaximum accuracy, powerful hardware

Downloading Models

  1. Open Settings from the tray menu
  2. Go to the Transcription section
  3. Select a model size
  4. Click Download — the model downloads once and is reused for all future transcriptions

AI Summaries

AI summaries use a cloud API to generate a title and summary from the transcript text. Only the text is sent — no audio or video.

Setup

  1. Open Settings > AI Summaries
  2. Enter your API endpoint (e.g., https://api.openai.com/v1)
  3. Enter your API key
  4. Choose a model (e.g., gpt-4o-mini)
  5. Enable auto-summarize

Compatible Services

Any service with an OpenAI-compatible chat completions endpoint works:

  • OpenAIhttps://api.openai.com/v1
  • OpenRouterhttps://openrouter.ai/api/v1
  • Local models (Ollama, LM Studio, etc.) — use your local endpoint

What Gets Generated

For each transcribed recording, autorec generates:

  • Title — a short, descriptive title for the meeting
  • Summary — a concise summary of key points discussed

Both appear in the video library and the video detail view, making it easy to find the meeting you need without rewatching.