Speed difference for longer input text #296
Comments
Latency generally increases with the length of the input sentence, so a slowdown on short sentences is not typical and might indicate an issue. I've worked with StyleTTS2 and reduced its latency by a factor of 2.5 to 3. If you can share your model file, I can investigate further to pinpoint the issue.

One possible reason for the unnatural output is that StyleTTS2 is trained on audiobook datasets, where the style is tailored toward narration. This makes it perform well on longer sentences but struggle with shorter text, leading to degraded quality. Additionally, the model is trained with a high maximum sequence length, which could also explain the inconsistency on shorter inputs.
Thank you so much for your response.
Hi @Ananya21162, without seeing the code I can't say much, but I would suggest a per-component timing breakdown (an inner ablation study). Print the time taken by each component during inference, such as the text encoder, BERT, alignment, prosody predictor, decoder, diffusion, and any other relevant components. That will show which specific component is causing the issue. Then let me know, and we can debug it further.
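For anyone wanting to try the per-component timing suggested above, here is a minimal sketch using only the Python standard library. The `StageTimer` class and the `fake_text_encoder` / `fake_decoder` functions are hypothetical stand-ins, not part of StyleTTS2; in a real inference script you would wrap the actual calls to each component instead. If the model runs on a GPU, kernels execute asynchronously, so you would likely pass something like `torch.cuda.synchronize` as the `sync` argument to get meaningful numbers (that part is an assumption about your setup).

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self, sync=None):
        # Optional callable run before and after each stage, e.g. a GPU
        # synchronize function, so asynchronous work is fully counted.
        self.sync = sync
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        if self.sync is not None:
            self.sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # Slowest stage first: (name, total seconds, number of calls).
        rows = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, total, self.counts[name]) for name, total in rows]


# Stand-ins for real model components; replace with the actual calls.
def fake_text_encoder(text):
    time.sleep(0.001)
    return text


def fake_decoder(hidden):
    time.sleep(0.003)
    return hidden


timer = StageTimer()
for sentence in ["Hi.", "A much longer sentence used for comparison."]:
    with timer.stage("text_encoder"):
        hidden = fake_text_encoder(sentence)
    with timer.stage("decoder"):
        fake_decoder(hidden)

for name, total, calls in timer.report():
    print(f"{name}: {total * 1000:.2f} ms over {calls} call(s)")
```

Running this once with a short sentence and once with a long one, and comparing the per-stage totals, should make it clearer which component accounts for the unexpected slowdown.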
We are noticing very slow speed for short sentences. For longer sentences, the model starts normally and then gradually speeds up to a noticeably high speed, which often sounds unnatural.
What could be causing this? Can anyone please help?