Speech Synthesis on Linux

2021-09-25

The main options are espeak, Gnuspeech, spd-say, MBROLA, PicoTTS, and the Festvox project, which festival and flite are part of. I found espeak the simplest system to use, but festival produced the best results when paired with the right voices. This post outlines the various ways festival can be used and the steps required to achieve good results.
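As a rough sketch of the simplest way to drive festival from a script, the snippet below builds a batch-mode command that selects a voice and speaks a sentence. It assumes festival is on the PATH, that the voice_cmu_us_slt_arctic_hts voice is already installed, and that festival evaluates command-line arguments starting with a parenthesis as Scheme commands. The helper names festival_command and say are made up for illustration.

```python
import shutil
import subprocess


def festival_command(text, voice="voice_cmu_us_slt_arctic_hts"):
    """Build a festival batch command that selects a voice and speaks text."""
    # Escape backslashes and double quotes so the text is a valid Scheme string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    # Assumption: arguments starting with "(" are evaluated as Scheme commands.
    return ["festival", "-b", f"({voice})", f'(SayText "{escaped}")']


def say(text, voice="voice_cmu_us_slt_arctic_hts"):
    """Speak text aloud if festival is installed; return True on success."""
    if shutil.which("festival") is None:
        return False  # festival is not installed, nothing to run
    return subprocess.run(festival_command(text, voice)).returncode == 0


if __name__ == "__main__":
    say("Hello from festival.")
```

Passing a different voice name as the second argument to say switches voices, provided that voice is installed.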

The voice_cmu_us_slt_arctic_hts voice (used above) is easy to install and isn’t bad, but the Nitech HTS voices are better. HTS stands for HMM-based Speech Synthesis System (HMM being short for hidden Markov model), and Nitech is the Nagoya Institute of Technology.

The voices were not available on the original site at the time of writing, so I uploaded them to a GitHub repository.

Note: The Nitech voices are not compatible with festival versions newer than 2.1, which dates from 2010. The default festival version on Ubuntu 20.04 is 2.5.
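A quick way to check whether an installed festival falls inside that range is to parse its reported version. The sketch below assumes that festival --version prints a line containing a dotted version number such as 2.5.0. The helper names and the comparison on major and minor numbers only are illustrative choices.

```python
import re
import shutil
import subprocess

# Newest festival version the Nitech voices work with, per the note above.
MAX_NITECH_VERSION = (2, 1)


def parse_version(output):
    """Return the first dotted version number in the text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def supports_nitech(version):
    """Compare only major and minor, so 2.1.x still counts as compatible."""
    return version[:2] <= MAX_NITECH_VERSION


if __name__ == "__main__":
    if shutil.which("festival") is None:
        print("festival is not installed")
    else:
        result = subprocess.run(
            ["festival", "--version"], capture_output=True, text=True
        )
        version = parse_version(result.stdout + result.stderr)
        if version is None:
            print("could not determine the festival version")
        else:
            print(version, "Nitech-compatible:", supports_nitech(version))
```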

Flite is a small, fast and more portable (albeit less customizable) speech synthesis engine for festival voices. Install it with sudo apt install flite. Download more voices:
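As a sketch of how a downloaded voice might then be used, the snippet below builds a flite command line with the -voice, -t and -o options. The voice path voices/cmu_us_aew.flitevox and the helper name flite_command are placeholders for illustration; substitute whichever .flitevox file was actually downloaded.

```python
import shutil
import subprocess


def flite_command(text, voice_path=None, wav_path=None):
    """Build a flite command; the voice file and WAV output are optional."""
    cmd = ["flite"]
    if voice_path is not None:
        cmd += ["-voice", voice_path]  # path to a downloaded .flitevox file
    cmd += ["-t", text]  # text to synthesize
    if wav_path is not None:
        cmd += ["-o", wav_path]  # write a WAV file instead of playing audio
    return cmd


if __name__ == "__main__":
    # Hypothetical local path; adjust to where the voice was saved.
    cmd = flite_command("Hello from flite.", "voices/cmu_us_aew.flitevox")
    if shutil.which("flite") is not None:
        subprocess.run(cmd)
    else:
        print("flite is not installed; would run:", " ".join(cmd))
```

Running flite -lv lists the voices compiled into the binary, which is a useful baseline before trying downloaded ones.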

