CrisperWhisper is an advanced variant of OpenAI's Whisper, designed for fast, precise, and verbatim speech recognition with accurate (crisp) word-leve

Search code, repositories, users, issues, pull requests...

submited by

Style Pass

2024-11-22 15:30:05

CrisperWhisper is an advanced variant of OpenAI's Whisper, designed for fast, precise, and verbatim speech recognition with accurate (crisp) word-level timestamps. Unlike the original Whisper, which tends to omit disfluencies and follows more of a intended transcription style, CrisperWhisper aims to transcribe every spoken word exactly as it is, including fillers, pauses, stutters and false starts.

CrisperWhisper significantly outperforms Whisper Large v3, especially on datasets that have a more verbatim transcription style in the ground truth, such as AMI and TED-LIUM.

CrisperWhisper demonstrates superior performance segmentation performance. This performance gap is especially pronounced around disfluencies and pauses. The following table uses the metrics as defined in the paper. For this table we used a collar of 50ms. Heads for each Model were selected using the method described in the How section and the result attaining the highest F1 Score was choosen for each model using varying number of heads.

Additional Installations: Follow OpenAI's instructions to install additional dependencies like ffmpeg and rust: Whisper Setup.