
Inference latency #288

Open
Ananya21162 opened this issue Oct 10, 2024 · 7 comments

Comments

@Ananya21162

Ananya21162 commented Oct 10, 2024

I was trying out the model on a 439-character input and saw 5–6 s average latency on the LibriTTS dataset. Is there a way to reduce the latency? (The decoder takes the most time.)
Also, after fine-tuning the model on a few samples from a new speaker, the latency increased by a further 600–700 ms; is this expected?
Is the latency expected to increase if the dataset is larger (English only)?
Similarly, if we add more languages, will the model's inference latency increase?
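To pin down where the time goes, each component (text encoder, predictor, decoder) can be timed separately. Below is a minimal sketch of a timing helper; the function name and structure are illustrative, not from the StyleTTS2 codebase. When running on GPU, `torch.cuda.synchronize()` must be called around the timed region or asynchronous kernel launches will skew the numbers.

```python
import time

def avg_latency(fn, *args, n_runs=10, warmup=2):
    """Average wall-clock latency of a callable over several runs.

    Note: on GPU, call torch.cuda.synchronize() before and after the
    timed region, otherwise asynchronous kernels skew the measurement.
    """
    for _ in range(warmup):          # discard cold-start runs (JIT, caches)
        fn(*args)
    start = time.perf_counter()
    for _ in range(n_runs):
        fn(*args)
    return (time.perf_counter() - start) / n_runs

# Illustrative usage: time two dummy stages and compare their shares.
decoder_time = avg_latency(lambda: sum(i * i for i in range(10_000)))
encoder_time = avg_latency(lambda: sum(range(1_000)))
```

Timing the decoder call in isolation this way would confirm (or rule out) that it accounts for most of the 5–6 s.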

@Respaired

Respaired commented Oct 12, 2024

HifiGAN is simply a larger, heavier decoder.

You need to either find another checkpoint pretrained with the iSTFT decoder or train a new model yourself from scratch. You can also fine-tune on top of the LJ checkpoint; that's not recommended, but one of my friends managed to get reasonable results that way.

As for your other questions: no, the dataset has no impact on latency. Only your model's parameters matter, and mainly the size of the decoder.
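One way to check whether the decoder dominates the parameter budget is to count parameters per submodule. A hedged sketch, assuming the model is held as a dict of `torch.nn.Module`s (as StyleTTS2's `build_model` returns); the helper name is made up:

```python
import torch.nn as nn

def params_per_module(modules: dict) -> dict:
    """Count parameters (trainable + frozen) for each named submodule."""
    return {name: sum(p.numel() for p in m.parameters())
            for name, m in modules.items()}

# Illustrative usage with stand-in modules (not the real StyleTTS2 nets):
nets = {"decoder": nn.Linear(512, 512), "text_encoder": nn.Embedding(100, 64)}
counts = params_per_module(nets)
# counts == {'decoder': 262656, 'text_encoder': 6400}
```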

@Ananya21162
Author

Thanks for your reply. We have two models: one trained on LibriTTS-R (360 + 100 hrs) and the other fine-tuned from it on 20-minute audio samples for multiple speakers. We set max_len to 100 for the first and 400 for the second. The two models differ in average latency by nearly 1.5 s.
Is this parameter the cause? What would be the ideal value?

@Respaired

You're welcome.
As I said, your choice of max_len and the dataset shouldn't matter.
Only the decoder has a large impact.

@Ananya21162
Author

Ananya21162 commented Oct 21, 2024

Understood. But in our experiment we checked the decoder size for both models mentioned above, and it was the same for both: 217 MB. Yet the two models still differ in latency by 1.5 seconds. Do you know of any other possible cause?
In fact, we compared all the model components, and they are identical for both:

bert size: 201359360 / bit | 25.17 / MB
bert_encoder size: 12599296 / bit | 1.57 / MB
predictor size: 518227584 / bit | 64.78 / MB
decoder size: 1737263744 / bit | 217.16 / MB
text_encoder size: 179404800 / bit | 22.43 / MB
predictor_encoder size: 444186016 / bit | 55.52 / MB
style_encoder size: 444186016 / bit | 55.52 / MB
diffusion size: 1620926464 / bit | 202.62 / MB
text_aligner size: 251790464 / bit | 31.47 / MB
pitch_extractor size: 168037024 / bit | 21.00 / MB
mpd size: 1315384640 / bit | 164.42 / MB
msd size: 8988864 / bit | 1.12 / MB
wd size: 37556288 / bit | 4.69 / MB
Total Model size: 6939910560 / bit | 867.49 / MB
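As a sanity check, the figures above are internally consistent with 32-bit floats: bits = parameter count × 32, and MB = bits / 8 / 10⁶. A small sketch that reproduces one line of the listing (the helper name is hypothetical):

```python
def report_size(name: str, num_params: int, bits_per_param: int = 32) -> str:
    """Format a component's size the way the listing above does."""
    size_bits = num_params * bits_per_param
    size_mb = size_bits / 8 / 1e6          # bits -> bytes -> megabytes
    return f"{name} size: {size_bits} / bit | {size_mb:.2f} / MB"

# The decoder's 1737263744 bits correspond to ~54.3M fp32 parameters:
print(report_size("decoder", 54289492))
# -> decoder size: 1737263744 / bit | 217.16 / MB
```

Since identical sizes imply identical architectures, the 1.5 s gap must come from something other than parameter count (e.g. input length, diffusion steps, or runtime settings).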

@Ananya21162
Author

Also, one model is trained from scratch and the other is fine-tuned. Could that make a difference? The number of parameters and the model size are the same :/

@Respaired

Unless you change the decoder, or use very short samples with LFInference, there shouldn't be much latency overhead.

@UmerrAhsan

It's unusual that fine-tuning StyleTTS2 increases the checkpoint file size even though the number of parameters in the model stays the same. Has anyone identified the reason behind this size increase?
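One common cause (an assumption here, not confirmed in this thread) is that fine-tuning checkpoints also store optimizer state, EMA copies, or discriminator weights alongside the model weights. A sketch for inspecting what a checkpoint file actually contains; the helper name is made up:

```python
import torch

def checkpoint_breakdown(path):
    """Bytes of tensor data under each top-level key of a checkpoint."""
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    sizes = {}
    for key, val in ckpt.items():
        # A top-level entry is usually a state_dict (dict of tensors)
        # or a single tensor; ignore scalars like epoch counters.
        tensors = val.values() if isinstance(val, dict) else [val]
        sizes[key] = sum(t.numel() * t.element_size()
                         for t in tensors if torch.is_tensor(t))
    return sizes
```

Comparing the two checkpoints' breakdowns should reveal whether the extra megabytes sit under keys like `optimizer` rather than the model itself.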
