VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

submitted by
Style Pass
2024-03-29 16:00:11

VoiceCraft is a token-infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data, including audiobooks, internet videos, and podcasts.

All speakers are unseen during training. Utterances are from our RealEdit evaluation set, which comprises audiobooks, YouTube videos, and Spotify podcasts.

This website is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
