How is pronunciation decided? #206

Open
iv2985 opened this issue Oct 27, 2024 · 2 comments


iv2985 commented Oct 27, 2024

Homographs like "wind" have the same spelling but different meanings and pronunciations depending on context, for example "wind power" vs "wind a clock". How is the pronunciation decided in such cases?

It pronounces the "wind" in "wind power" the wrong way, the way it would be pronounced in "wind a clock". Strangely, it gets it right with the default voices, but wrong with a new English voice that I trained.


highfillgoods commented Dec 18, 2024

More of a band-aid: use EN-BR and it sounds better. EN-US noticeably says "wine'd power". As far as I can tell, MeloTTS has g2p-en doing the pronunciation. For me it is located in my conda environment: /home/user/anaconda3/envs/melotts/lib/python3.10/site-packages/g2p_en.

To test the current pronunciation, go to a terminal or your code runner, type python, press Enter, and run each of these lines:

from g2p_en import G2p

g2p = G2p()

word = "wind"
phonemes = g2p(word)
print(f"Phonemes for '{word}': {phonemes}")
Phonemes for 'wind': ['W', 'AY1', 'N', 'D']

After you find what you need, edit the homographs file that ships inside the g2p_en package directory:

nano homographs.en

Add something like this to the list:

WIND|W IH1 N D|W AY1 N D|N

That should get you started on the first half at least.
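
For anyone unsure what the four fields in that line mean, here is a minimal sketch of how I understand such an entry to be read. It assumes the layout is headword, first pronunciation, second pronunciation, and a part-of-speech prefix that selects the first pronunciation. The helper names parse_homograph_line and choose_pronunciation are made up for illustration and are not part of g2p_en.

```python
# Illustrative only: these helpers are hypothetical, not g2p_en's API.
# Assumed entry layout: HEADWORD|pron1|pron2|pos1
# pron1 is used when the word's POS tag starts with pos1, otherwise pron2.

def parse_homograph_line(line):
    """Split one 'WORD|pron1|pron2|pos1' entry into its four parts."""
    headword, pron1, pron2, pos1 = line.strip().split("|")
    return headword.lower(), pron1.split(), pron2.split(), pos1


def choose_pronunciation(entry, pos_tag):
    """Pick pron1 if the POS tag starts with pos1, else pron2."""
    _, pron1, pron2, pos1 = entry
    return pron1 if pos_tag.startswith(pos1) else pron2


entry = parse_homograph_line("WIND|W IH1 N D|W AY1 N D|N")

noun_pron = choose_pronunciation(entry, "NN")  # Penn Treebank noun tag
verb_pron = choose_pronunciation(entry, "VB")  # Penn Treebank verb tag

print(noun_pron)  # ['W', 'IH1', 'N', 'D']
print(verb_pron)  # ['W', 'AY1', 'N', 'D']
```

With the entry above, any tag beginning with N (NN, NNS, NNP) would get the "wind power" pronunciation and everything else would get the "wind a clock" one, so the result is only as good as the POS tag the word receives.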

@nwhitehead

One issue is that the code currently calls g2p() separately for each word. The G2p package can look up the word in the dictionary, or guess a pronunciation if the word is not in the dictionary. But calling it one word at a time doesn't give G2p the context it needs to figure out the part of speech and disambiguate. W IH1 N D versus W AY1 N D can be distinguished based on noun/verb, so changing MeloTTS to call g2p() on the full text would fix this specific problem.
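
To illustrate why the per-word call loses information, here is a toy sketch. It does not use g2p_en or a real tagger: toy_pos_tag is a hypothetical stand-in with two hard-coded context rules, and HOMOGRAPHS holds a single hand-written entry. The only point is that a tagger handed one token at a time has no neighbours to look at.

```python
# Toy illustration, not MeloTTS or g2p_en code. All names are hypothetical.

# word -> (pron1, pron2, pos1): pron1 is used when the tag starts with pos1.
HOMOGRAPHS = {
    "wind": (["W", "IH1", "N", "D"], ["W", "AY1", "N", "D"], "N"),
}


def toy_pos_tag(tokens):
    """Stand-in tagger: 'wind' is a noun after 'the' or before 'power',
    otherwise a verb. Every other token gets the placeholder tag 'X'."""
    tags = []
    for i, tok in enumerate(tokens):
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "wind" and (prev_tok == "the" or next_tok == "power"):
            tags.append("NN")
        elif tok == "wind":
            tags.append("VB")
        else:
            tags.append("X")
    return tags


def pronounce(tokens):
    """Return a pronunciation for each homograph, None for other tokens."""
    result = []
    for tok, tag in zip(tokens, toy_pos_tag(tokens)):
        if tok in HOMOGRAPHS:
            pron1, pron2, pos1 = HOMOGRAPHS[tok]
            result.append(pron1 if tag.startswith(pos1) else pron2)
        else:
            result.append(None)
    return result


tokens = "wind power".split()

# One call per word: the tagger never sees the neighbouring token.
per_word = [pronounce([tok])[0] for tok in tokens]

# One call for the whole text: the tagger sees 'power' after 'wind'.
full_text = pronounce(tokens)

print(per_word[0])   # ['W', 'AY1', 'N', 'D']
print(full_text[0])  # ['W', 'IH1', 'N', 'D']
```

A real tagger is statistical rather than rule-based, but the same limitation applies: with a single-token input there is nothing to condition on.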

There are other cases in English where the part of speech (noun/verb/etc.) is not enough to distinguish pronunciations. A full solution would be to train another deep learning model to go from text to phonemes. A recent model I found is SoundChoice, which has Apache-2.0 licensed weights available. Swapping in a model like this would be a more full-featured fix. But playing around with SoundChoice shows it isn't perfect either: it gets "You wind a bobbin but the wind blows." wrong.
