Replies: 4 comments 2 replies
-
I don't quite understand your question. The emotions are unlabeled during training (i.e., we have no emotion labels in the dataset). Emotion is instead estimated from the associated text, under the assumption that speakers in the dataset read happy texts in a happy tone. The model learns this association in an unsupervised way, so when we provide a happy text, the resulting style is happy, and we can then use that happy style to synthesize any text, regardless of whether the text itself is happy.
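A minimal sketch of the idea described above: a style vector is estimated from a *reference* text (whose sentiment sets the emotion), and that same vector then conditions synthesis of any target text. The real API lives in the repo's inference notebooks; the function names and the 128-dim style vector below are illustrative stand-ins, with the style encoder replaced by a deterministic dummy.

```python
import zlib

import numpy as np


def estimate_style(reference_text: str) -> np.ndarray:
    """Stand-in for the model's style encoder.

    Hypothetical: derives a deterministic pseudo-style vector from the
    reference text, mimicking 'happy reference -> happy style'.
    """
    seed = zlib.crc32(reference_text.encode("utf-8"))
    rng = np.random.default_rng(seed)
    return rng.standard_normal(128)


def synthesize(text: str, style: np.ndarray) -> dict:
    """Stand-in for the decoder: the same style conditions any text."""
    return {"text": text, "style": style}


# Estimate a style from a happy reference, then apply it to neutral text.
happy_style = estimate_style("What a wonderful day!")
out = synthesize("The meeting is at 3 pm.", happy_style)
```

The key point is the decoupling: the emotion comes from the reference passed to the style encoder, not from the sentiment of the text being synthesized.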
-
Sorry, my question was sloppy. What I wanted to know was how to use style transfer, since there is no related usage in the .ipynb files in the Colab folder. I also have a question about PL-BERT. The .t7 file of your pretrained English PL-BERT model is about 30 MB. My dataset has about 4 million entries, and my token_maps file is only about 600 KB. Do you know why there is such a large size difference?
-
In Inference_LibriTTS.ipynb, ref_bert_dur does not appear to be used anywhere after it is calculated. Is this intentional?
-
Would like to see this resolved.
-
Questions about emotional expression.
This project seems to analyze the input text to infer an emotion automatically and reflect it in the voice.
But I want to specify the emotion directly and have it expressed in the synthesized voice.
(For example, when synthesizing speech, I would pass in 'happiness' to get a happy-sounding voice.)
Is there any way I can set the emotion myself?
(I believe this is what the style transfer item on the demo page does, so I would really appreciate it if you could explain that as well.)
Thank you.