

yt2doc is meant to work fully locally, without invoking any external API. The OpenAI SDK dependency is required solely to interact with Ollama.
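This works because Ollama serves an OpenAI-compatible API on localhost, so the OpenAI SDK never has to reach an external service. A minimal sketch of that local endpoint is below; the model name llama3.1 is only an example, not necessarily what yt2doc uses:

```sh
# Sketch: call the local Ollama server through its OpenAI-compatible endpoint.
# Nothing leaves the machine; the OpenAI SDK points at this same URL.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "Segment this transcript into topics: ..."}]
      }'
```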

There are many existing projects that transcribe YouTube videos with Whisper and its variants, but most of them aim to generate subtitles, and I had not found one that prioritises readability. Whisper does not generate line breaks in its transcription, so transcribing a 20-minute video without any post-processing gives you a huge wall of text, with no line breaks or topic segmentation. This project aims to transcribe videos with that post-processing.

By default, yt2doc uses faster-whisper as its transcription backend. You can run yt2doc with different faster-whisper configs (model size, device, compute type, etc.), as in the sketch below:
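In this sketch, the flag that takes the video URL is an assumption (shown as --video); the --whisper-* options are the faster-whisper settings discussed in this section, and the values are standard faster-whisper choices:

```sh
# Sketch: transcribe a video with a larger Whisper model on a GPU.
# --video is an assumed flag name for the video URL;
# --whisper-model, --whisper-device and --whisper-compute-type
# select the faster-whisper model, device and compute type.
yt2doc --video "https://www.youtube.com/watch?v=<video-id>" \
  --whisper-model large-v3 \
  --whisper-device cuda \
  --whisper-compute-type float16
```

On a CPU-only machine, --whisper-device cpu with --whisper-compute-type int8 is the usual faster-whisper combination.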

For the meaning and available choices of --whisper-model, --whisper-device and --whisper-compute-type, please refer to this comment in the faster-whisper repository.
