Generate Speech of Your Favourite Narrator Locally With F5-TTS-MLX

Submitted by
Style Pass
2025-01-16 19:30:14

Recently, a post about generating audiobooks started trending on Hacker News, and some people in the comments wished they could clone their voice and narrate text without sending it off their machine. It's never been easier!

For this example, we only need a Mac, uv (a modern Python package manager), ffmpeg for audio processing, and optionally ChatGPT for transcribing your voice (you can also transcribe it manually or use mlx-whisper, for example). We will be using F5-TTS-MLX, an open-source implementation of the F5-TTS speech synthesis model in MLX, Apple's array framework for Apple silicon.
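As a rough sketch of the ffmpeg step, the snippet below builds a command that converts a voice recording into a short mono WAV clip. The target format (mono, 24 kHz, 16-bit, trimmed to about 10 seconds) is an assumption about what F5-TTS-MLX expects for reference audio, so check it against the README of the version you install. The helper names and file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str, max_seconds: int = 10) -> list[str]:
    """Build an ffmpeg command that converts src to a mono 24 kHz 16-bit WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", src,               # input recording in any format ffmpeg can read
        "-ac", "1",              # downmix to a single (mono) channel
        "-ar", "24000",          # resample to 24 kHz (assumed target rate)
        "-sample_fmt", "s16",    # 16-bit signed PCM samples
        "-t", str(max_seconds),  # keep only the first max_seconds seconds
        dst,
    ]


def prepare_reference(src: str, dst: str, max_seconds: int = 10) -> Path:
    """Run ffmpeg and return the path of the converted reference clip."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(src, dst, max_seconds), check=True)
    return Path(dst)


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own recording.
    print(" ".join(build_ffmpeg_command("my_voice.m4a", "reference.wav")))
```

Keeping the command builder separate from the function that runs it makes it easy to inspect the exact ffmpeg invocation before touching any files.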

The original implementation has 🇬🇧🇺🇸🇫🇮🇫🇷🇮🇳🇮🇹🇯🇵🇨🇳🇷🇺🇪🇸 language support. The MLX implementation also includes a duration predictor specifically for English, which makes it easier to create natural-sounding audio, but it's not available for other languages. Personally, I'm interested in German, and someone has fine-tuned F5 for it, so additionally training a duration predictor could help translate audio into German, similar to what Lex Fridman did for the interview with Javier Milei, but open source.
