
Generate Speech of Your Favourite Narrator Locally With F5-TTS-MLX

Submitted by Style Pass, 2025-01-16 19:30:14

Recently, a post about generating audiobooks trended on Hacker News, and some commenters wished they could clone their own voice and narrate text without sending it off their machine. It’s never been easier!

For this example, we only need a Mac with Apple Silicon, uv (a modern Python package manager), ffmpeg for audio processing and, optionally, ChatGPT for transcribing your voice sample (you can also transcribe it manually or use a local model such as mlx-whisper). We will be using F5-TTS-MLX, an open-source implementation of the F5-TTS speech synthesis model in MLX, Apple’s array framework for Apple Silicon.
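The workflow boils down to two commands: turn a short recording of your voice into a reference clip, then ask F5-TTS-MLX to speak new text in that voice. The minimal Python sketch below only builds those command lines, so you can inspect them before anything runs. The ffmpeg options are standard; the target format (mono, 24 kHz, 16-bit, about 10 seconds) and the `f5_tts_mlx.generate` flags (`--text`, `--ref-audio`, `--ref-text`, `--output`) are assumptions based on our reading of the project README, so confirm them with `--help` first. The helper names and file names are hypothetical.

```python
import shlex
import subprocess

# Assumption: module name and flags as we understand them from the f5-tts-mlx
# README; verify with `uv run --with f5-tts-mlx python -m f5_tts_mlx.generate --help`.
F5_MODULE = "f5_tts_mlx.generate"


def prepare_reference_cmd(src: str, dst: str, seconds: int = 10) -> list[str]:
    """Hypothetical helper: ffmpeg command producing a short mono 24 kHz 16-bit WAV."""
    return [
        "ffmpeg", "-y",          # overwrite dst if it already exists
        "-i", src,               # any recording ffmpeg can read (m4a, mp3, wav, ...)
        "-ac", "1",              # downmix to a single (mono) channel
        "-ar", "24000",          # resample to 24 kHz (assumed model sample rate)
        "-sample_fmt", "s16",    # 16-bit signed PCM samples
        "-t", str(seconds),      # keep only the first `seconds` seconds
        dst,
    ]


def generate_cmd(text: str, ref_audio: str, ref_text: str, output: str) -> list[str]:
    """Hypothetical helper: run F5-TTS-MLX in an ephemeral uv environment."""
    return [
        "uv", "run", "--with", "f5-tts-mlx",  # fetch the package on demand
        "python", "-m", F5_MODULE,
        "--text", text,                        # what the cloned voice should say
        "--ref-audio", ref_audio,              # reference clip from ffmpeg
        "--ref-text", ref_text,                # transcript of the reference clip
        "--output", output,                    # assumed flag for the output WAV
    ]


def run(cmd: list[str]) -> None:
    """Print a command in shell syntax, then execute it, failing loudly on errors."""
    print("$", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `run(prepare_reference_cmd("my_voice.m4a", "ref.wav"))` followed by `run(generate_cmd("Hello from my laptop.", "ref.wav", transcript, "out.wav"))`, where `transcript` is the text of `ref.wav` obtained from ChatGPT, mlx-whisper, or by hand. Using `uv run --with` avoids setting up a permanent virtual environment; uv caches the package after the first run.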

The original implementation has 🇬🇧🇺🇸🇫🇮🇫🇷🇮🇳🇮🇹🇯🇵🇨🇳🇷🇺🇪🇸 language support. The MLX implementation also includes a duration predictor, which makes it easier to produce natural-sounding audio, but it is trained for English only and is not available for other languages. Personally, I’m interested in German, and someone has already fine-tuned F5 for it, so additionally training a German duration predictor could make it possible to translate audio into German, similar to what Lex Fridman did for his interview with Javier Milei, but open source.
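Without a duration predictor, something still has to decide how long the generated audio should be. A crude stand-in, shown here as a hypothetical helper `estimate_duration`, assumes the speaker keeps the same speaking rate as in the reference clip and scales the clip length by the amount of text. This is only a sketch of the idea, not how F5-TTS-MLX computes durations; a trained predictor (like the English one, or a future German one) replaces exactly this kind of guess with a learned estimate.

```python
def estimate_duration(
    ref_seconds: float, ref_text: str, gen_text: str, speed: float = 1.0
) -> float:
    """Hypothetical heuristic: total seconds of reference + generated audio.

    Assumes a constant speaking rate measured on the reference clip:
        total = ref_seconds + (ref_seconds / len(ref_text)) * len(gen_text) / speed
    """
    if not ref_text:
        raise ValueError("ref_text must not be empty")
    if speed <= 0:
        raise ValueError("speed must be positive")
    seconds_per_char = ref_seconds / len(ref_text)  # rate observed in the reference
    return ref_seconds + seconds_per_char * len(gen_text) / speed
```

With a 10-second reference clip of 100 characters, 50 characters of new text come out at about 15 seconds in total (10 s reference plus 5 s generated), or 12.5 seconds at `speed=2.0`. The weakness is obvious: pauses, punctuation, and long German compounds do not all take the same time per character, which is why a language-specific duration predictor is worth training.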