

submitted by
Style Pass
2024-10-05 13:00:03

I was watching the latest news episode from Whisky.com (where fine spirits meet ™) the other day on YouTube, and noticed that the transcription was really off.

I'm not sure which transcriber YouTube uses to generate its closed captions, but it makes a number of mistakes. Some are clearly related to the whisky domain, while others are general transcription errors.

With OpenAI's Whisper, the results are significantly better, but domain-related errors are still common: the transcriber is missing important context.
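One way to hand Whisper some of that missing context is the `initial_prompt` argument of the open-source `openai-whisper` package's `transcribe` method. The sketch below is only an illustration of that idea, not the approach taken in this post: the term list, the prompt wording, and the helper names `build_prompt` and `transcribe_with_context` are all assumptions made for the example.

```python
# Hypothetical domain vocabulary; not taken from the original post.
WHISKY_TERMS = ["single malt", "cask strength", "distillery", "Tobermory"]


def build_prompt(terms):
    """Join domain terms into a short prompt string for the transcriber."""
    return "Whisky news. Terms that may appear: " + ", ".join(terms) + "."


def transcribe_with_context(audio_path, terms, model_name="base"):
    """Transcribe audio with Whisper, seeding the decoder with domain terms."""
    # Imported lazily so build_prompt works without the package installed.
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, initial_prompt=build_prompt(terms))
    return result["text"]
```

Calling `transcribe_with_context("episode.mp3", WHISKY_TERMS)` (the file name is a placeholder) would return the transcript text. The prompt only biases decoding towards the listed spellings; it does not guarantee them.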

Expected: "two single malts" (single malt is a common whisky term)
YouTube transcriber: "two single molds"
OpenAI Whisper: "two single moulds"

YouTube transcriber: "the distel group and they uh uh had buah haban deanston and toomore in their group"
OpenAI Whisper: "the Distel Group and they had Bunnehaben, Diensten and Tobermory in their group"
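Near-miss proper nouns like the ones in Whisper's output above can sometimes be repaired after the fact by fuzzy-matching each word against a list of known domain terms. The following is a minimal sketch using Python's standard `difflib`; the vocabulary (my assumed correct spellings of the distilleries), the `correct_names` helper, and the 0.7 cutoff are all illustrative choices, not something the post prescribes.

```python
import difflib
import re

# Assumed correct spellings of the distilleries named in the clip.
DISTILLERIES = ["Bunnahabhain", "Deanston", "Tobermory"]


def correct_names(text, vocabulary, cutoff=0.7):
    """Replace words that closely resemble a vocabulary entry with that entry."""

    def fix(match):
        word = match.group(0)
        # Best vocabulary candidate whose similarity ratio is >= cutoff, if any.
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        return close[0] if close else word

    # Visit every alphabetic word; punctuation and spacing are left untouched.
    return re.sub(r"[A-Za-z]+", fix, text)


whisper_output = (
    "the Distel Group and they had Bunnehaben, Diensten "
    "and Tobermory in their group"
)
print(correct_names(whisper_output, DISTILLERIES))
```

With this vocabulary, "Bunnehaben" and "Diensten" are close enough to be mapped to "Bunnahabhain" and "Deanston", while "Distel" stays as it is because nothing similar is in the list. Word-level matching like this would not fix "moulds" versus "malts", which are too dissimilar as strings, and a lower cutoff risks rewriting ordinary words.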

So while YouTube could likely improve its captions by using a model similar to Whisper (perhaps using a smaller, weaker model is a conscious decision on their end, given their scale), there is still plenty of room for improvement on top of Whisper's output as well.
