Replies: 4 comments 2 replies
-
I don't quite understand your question. The emotions are unlabeled during training (i.e., we have no emotion labels in the dataset). Emotion is instead estimated from the associated text, under the assumption that speakers in the dataset read happy texts in a happy tone. The model learns this association in an unsupervised way, so when we provide a happy text, the resulting style is happy, and we can then use that happy style to synthesize any text, regardless of whether the text itself is happy.
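A minimal sketch of the idea described above: a style vector is estimated from a *reference* text (whose sentiment sets the emotion), and that same vector then conditions synthesis of any target text. The real API lives in the repo's inference notebooks; the function names and the 128-dim style vector below are illustrative stand-ins, with the style encoder replaced by a deterministic dummy.

```python
import zlib

import numpy as np


def estimate_style(reference_text: str) -> np.ndarray:
    """Stand-in for the model's style encoder.

    Hypothetical: derives a deterministic pseudo-style vector from the
    reference text, mimicking 'happy reference -> happy style'.
    """
    seed = zlib.crc32(reference_text.encode("utf-8"))
    rng = np.random.default_rng(seed)
    return rng.standard_normal(128)


def synthesize(text: str, style: np.ndarray) -> dict:
    """Stand-in for the decoder: the same style conditions any text."""
    return {"text": text, "style": style}


# Estimate a style from a happy reference, then apply it to neutral text.
happy_style = estimate_style("What a wonderful day!")
out = synthesize("The meeting is at 3 pm.", happy_style)
```

The key point is the decoupling: the emotion comes from the reference passed to the style encoder, not from the sentiment of the text being synthesized.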
-
Sorry, my question was sloppy. What I wanted to know was how to use style transfer, since there is no related usage in the .ipynb files in the Colab folder. I also have a question about PL-BERT. The .t7 file of your pretrained English PL-BERT model is about 30 MB. My dataset has about 4 million entries, and my token_maps file is only about 600 KB. Do you know why there is such a large size difference?
-
In Inference_LibriTTS.ipynb, ref_bert_dur does not appear to be used anywhere after it is calculated. Is this intentional?
-
Would like to see this resolved.
-
Questions about emotional expression.
This project seems to analyze the input text to infer an emotion automatically and reflect it in the voice.
But I want to specify the emotion directly and have it expressed in the synthesized voice.
(For example, when synthesizing speech, I would pass in 'happiness' to get a happy-sounding voice.)
Is there any way I can set the emotion myself?
(I believe this is what the style transfer item on the demo page does, so I would really appreciate it if you could explain that as well.)
Thank you.