Content creation and creative work is changing forever with the advent of generative ML models like GPT3 & Bloom (text generation), DALLE & St

Introducing Peregrine: A Truly Realistic Text to Speech Model with Emotion and Laughter

submited by
Style Pass
2022-09-22 23:00:17

Content creation and creative work is changing forever with the advent of generative ML models like GPT3 & Bloom (text generation), DALLE & Stable Diffusion (image generation), and RunwayML (video generation).

Today we are introducing our first model, Peregrine, an ultra-realistic Text to Speech model for the missing modality in that new set of models which is Generative Audio.

We believe in a future where all content creation will be generated by AI but guided by humans, and the most creative work will depend on the human ability to articulate their desired creation to the model.

However, despite these achievements, current TTS systems usually demand high quality studio-recorded annotated audio from different speakers with different styles and emotions in order to fulfill the needs for commercial applications.

Furthermore, the addition of a new speaker to the model usually requires at least 30 minutes of clean studio recorded data with phonetic annotations.

Leave a Comment