translator.create_glossary() forces to remove regional variant #109

CJRzzZ · 2024-06-14T18:57:37Z

I've encountered a problem with the translator.create_glossary() function, where it sets the source language of a glossary object to "EN" despite the argument specifying "EN-US". This behavior seems to stem from the code in "translator.py" at line 302, which attempts to strip regional variants and retain only the base language code.
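For illustration, the normalization described above amounts to keeping only the part of the code before the hyphen. The helper name `strip_regional_variant` is hypothetical. It is not copied from translator.py and only sketches the observed effect.

```python
def strip_regional_variant(lang_code: str) -> str:
    """Return only the base language code, e.g. "EN-US" -> "EN".

    Hypothetical helper illustrating the effect described in the issue;
    not the deepl library's actual implementation.
    """
    # Split on the first hyphen, keep the base part, and upper-case it.
    return lang_code.split("-", 1)[0].upper()


print(strip_regional_variant("EN-US"))  # EN
print(strip_regional_variant("ja"))     # JA
```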

This leads to an issue because "EN" is deprecated in the DeepL API, which then throws a deepl.exceptions.DeepLException stating "target_lang="EN" is deprecated, please use "EN-GB" or "EN-US" instead." Furthermore, if the glossary is set with "EN" and translator.translate_text() is called with "EN-US" as the source language, a ValueError is raised, stating "source_lang and target_lang must match glossary". This inconsistency makes it impossible to use a matching value for the source language.
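A minimal sketch of how such a mismatch can arise, assuming the client normalizes the target language before comparing it with the glossary but compares the source language verbatim. That comparison rule is an assumption, and `GlossaryLangs` and `check_glossary_match` are hypothetical names, not part of the deepl package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryLangs:
    """Hypothetical stand-in for the language pair stored on a glossary."""
    source_lang: str
    target_lang: str


def base(lang: str) -> str:
    # Keep only the base language code, e.g. "EN-US" -> "EN".
    return lang.split("-", 1)[0].upper()


def check_glossary_match(source_lang: str, target_lang: str, glossary: GlossaryLangs) -> None:
    # Assumption: target is reduced to its base code, source is compared as given.
    if (source_lang.upper() != glossary.source_lang.upper()
            or base(target_lang) != glossary.target_lang.upper()):
        raise ValueError("source_lang and target_lang must match glossary")


# The glossary was requested with "EN-US" but is stored with "EN".
g = GlossaryLangs(source_lang=base("EN-US"), target_lang=base("JA"))

try:
    check_glossary_match("EN-US", "JA", g)
except ValueError as e:
    print(e)  # source_lang and target_lang must match glossary

check_glossary_match("EN", "JA", g)  # no error: "EN" matches the stored source
```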

Could you please look into this? Thank you for your attention to this matter.

JanEbbing · 2024-06-17T08:03:47Z

Sorry, can you clearly describe (maybe with sample code) what you are doing and what error you get?

Glossaries don't have a regional variant attached to them, so "EN" is correct as the source or target language of a glossary.
It should then be possible to use glossaries for all variants of their associated language.
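As a rough sketch of that idea: if a glossary stores only base codes, a target variant can be reduced to its base before comparing, so one glossary covers every variant of its target language. The function names are hypothetical, and the sketch deliberately leaves the source unnormalized, since regional variants are only meant for target languages.

```python
def base(lang: str) -> str:
    # Reduce a language code to its base language, e.g. "EN-GB" -> "EN".
    return lang.split("-", 1)[0].upper()


def glossary_applies(glossary_source: str, glossary_target: str,
                     source_lang: str, target_lang: str) -> bool:
    """Hypothetical check: the source must equal the glossary's source code,
    while any regional variant of the glossary's target language is accepted."""
    return (source_lang.upper() == glossary_source.upper()
            and base(target_lang) == glossary_target.upper())


# A DE -> EN glossary covers both English target variants.
print(glossary_applies("DE", "EN", "DE", "EN-GB"))  # True
print(glossary_applies("DE", "EN", "DE", "EN-US"))  # True
print(glossary_applies("DE", "EN", "DE", "FR"))     # False
```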
Regarding "EN-US" as the source language: this sounds like the issue. The source language would have to be "EN", because regional variants are only supported for target languages. The error you get seems to be wrong, though; I can follow up on this.

You can read more on this differentiation in the documentation here

CJRzzZ · 2024-06-18T04:36:16Z

Sure, here is the sample code:

```python
g = translator.create_glossary("GITCG_en_to_jp", "EN-US", "JA", dict_en_to_jp)
result = translator.translate_text(clean_text, source_lang=source_lang, target_lang=target_lang, glossary=g).text
```
In the first line, I create the glossary with "EN-US" as the source language, and create_glossary automatically converts it to "EN". That causes a problem in the second line: when I use "EN-US" as source_lang, I get the "source_lang and target_lang must match glossary" error; when I use "EN" as source_lang, I get the "target_lang="EN" is deprecated, please use "EN-GB" or "EN-US" instead" error. These are the errors I ran into, and I hope this makes it clear.

JanEbbing · 2024-06-18T07:30:17Z

Yes, as I said, we differentiate between source and target languages:

"EN" is a valid source language
"EN-US" is an invalid source language
"EN" is an invalid target language
"EN-US" is a valid target language
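The four rules above can be expressed as a small client-side check. This is only a sketch: the function names and the `TARGET_NEEDS_VARIANT` table are hypothetical, the table lists only English because that is the language discussed here, and the check covers the regional-variant rule alone, not whether a language is supported at all.

```python
# Hypothetical table: base languages whose *target* code must carry a variant.
# Only English is listed, since it is the case discussed in this thread.
TARGET_NEEDS_VARIANT = {"EN": {"EN-GB", "EN-US"}}


def is_valid_source(lang: str) -> bool:
    # Source languages never carry a regional variant.
    return "-" not in lang


def is_valid_target(lang: str) -> bool:
    code = lang.upper()
    base = code.split("-", 1)[0]
    variants = TARGET_NEEDS_VARIANT.get(base)
    if variants is None:
        # Simplification: other languages are accepted only as bare base codes.
        return "-" not in code
    # Languages in the table must use one of their listed variants.
    return code in variants


for code in ("EN", "EN-US"):
    print(code, "source:", is_valid_source(code), "target:", is_valid_target(code))
# EN source: True target: False
# EN-US source: False target: True
```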

So in your code, the following should work:

```python
source_lang = "EN"
target_lang = "JA"
g = translator.create_glossary("GITCG_en_to_jp", source_lang, target_lang, dict_en_to_jp)
result = translator.translate_text(clean_text, source_lang=source_lang, target_lang=target_lang, glossary=g).text
```
