LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search, ICASSP 2021, by Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, S

NeuralSpeech/LightSpeech at master · microsoft/NeuralSpeech · GitHub

Submitted by Style Pass, 2023-04-02 00:30:02

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search, ICASSP 2021, by Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen and Tie-Yan Liu, is a method to find lightweight and efficient TTS models with neural architecture search.

Note: pytorch_lightning 0.6.0 may have a security issue (see here and here). You can ignore it or try to resolve it by following this patch.

--reset overwrites the hyperparameters stored in the config file under the checkpoint folder (if that file exists) with those from the config file provided through --config.

To measure the inference time of the model, add --hparams "profile_infer=True" to the inference command. This measures the model's inference time (excluding the vocoder) along with the duration of the generated audio waves. The log output reports the model inference time as model_time and the duration of the generated audio as gen_wav_time. After inference finishes, you get the total model inference time and the total duration of the generated audio. For example, the following log output

means the total inference time of the model is 3.95 seconds, and the total duration of all the generated audio waves is 626.75 seconds. To calculate the real-time factor (RTF), divide the model inference time by the duration of the generated audio: RTF = model_time/gen_wav_time. In this example, RTF = 3.95/626.75 = 0.0063.
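The RTF arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper name real_time_factor is hypothetical and not part of the LightSpeech code, and the two totals are assumed to have been read from the log by hand.

```python
def real_time_factor(model_time: float, gen_wav_time: float) -> float:
    """Return RTF = model_time / gen_wav_time (both in seconds).

    Hypothetical helper for illustration; not part of the LightSpeech repo.
    """
    if gen_wav_time <= 0:
        raise ValueError("gen_wav_time must be positive")
    return model_time / gen_wav_time


# Totals taken from the example in the text above.
rtf = real_time_factor(model_time=3.95, gen_wav_time=626.75)
print(f"RTF = {rtf:.4f}")  # prints "RTF = 0.0063"
```

An RTF below 1 means the model produces audio faster than real time. Here, one second of audio takes about 0.0063 seconds of model inference, vocoder excluded.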
