I have a text to translate from Italian to English:

text_to_translate = "| \\_VOEMI | Data emissione operazione | Deve essere maggiore o uguale alla data di emissione della polizza e minore o uguale alla data di sistema. |"

If I translate the text with target "EN-GB", I get this result:

| Issuance date | Must be greater than or equal to the policy issue date and less than or equal to the system date. |

The problem is that the "| \\_VOEMI" part gets lost.

However, if I specify "EN-US" as the target language, I get this correct result:

| | \_VOEMI | Issuance date transaction | Must be greater than or equal to the policy issue date and less than or equal to the system date. |
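To reproduce the difference between the two English variants, a small check like the one below may help. It is a sketch that assumes the `deepl` Python client, whose `Translator.translate_text` takes `source_lang`/`target_lang` keyword arguments and returns a result with a `.text` attribute; the helper name `check_token_survival` is hypothetical. It translates the same text into each target and reports whether the `\_VOEMI` token is still present in the output.

```python
from typing import Dict, Iterable


def check_token_survival(
    translator,
    text: str,
    token: str,
    targets: Iterable[str] = ("EN-GB", "EN-US"),
    source_lang: str = "IT",
) -> Dict[str, bool]:
    """Translate `text` into each target and report whether `token` survives.

    `translator` is assumed to behave like deepl.Translator: translate_text()
    accepts source_lang/target_lang keywords and returns an object with .text.
    """
    report = {}
    for target in targets:
        result = translator.translate_text(
            text, source_lang=source_lang, target_lang=target
        )
        report[target] = token in result.text
    return report


# Usage with the real client (needs an auth key and network access):
#   import deepl
#   translator = deepl.Translator(auth_key)
#   print(check_token_survival(translator, text_to_translate, "\\_VOEMI"))
```

Because the translation comes from an ML model, a single run only shows what happened for that request; it does not guarantee the same result next time.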
I'm not 100% sure what your use case is, but you will get the highest possible translation quality by parsing structured data like this before feeding it into the API. For example, in your case:

    import deepl

    auth_key = "..."  # your DeepL auth key (elided in the original)
    target_lang = "EN-GB"  # the target discussed in this issue
    text_to_translate = "| \\_VOEMI | Data emissione operazione | Deve essere maggiore o uguale alla data di emissione della polizza e minore o uguale alla data di sistema. |"
    special_tokens = ["\\_"]
    delimiter = "|"
    translator = deepl.Translator(auth_key)

    translated_texts = []
    for text in text_to_translate.split(delimiter):
        # Keep empty cells and cells containing a special token unchanged
        if not text.strip() or any(tok in text for tok in special_tokens):
            translated_texts.append(text)
            continue
        # You might want to trim the whitespace here as well with text.strip(), and maybe
        # fill the missing whitespace back in when appending to translated_texts, as this looks like a table
        translated_texts.append(translator.translate_text(text, target_lang=target_lang).text)

    output = delimiter.join(translated_texts)

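The whitespace suggestion in the comment above could look roughly like this. It is a sketch, not DeepL's recommended implementation: the function name `translate_row` is hypothetical, and it takes a plain `translate` callable (for example a lambda wrapping `translator.translate_text(...).text`) so the cell handling can be checked without calling the API. Each cell is stripped before translation and its original leading and trailing padding is restored afterwards, so the table layout survives.

```python
from typing import Callable, Sequence


def translate_row(
    row: str,
    translate: Callable[[str], str],
    special_tokens: Sequence[str] = ("\\_",),
    delimiter: str = "|",
) -> str:
    """Translate each cell of a delimited row, keeping special cells and padding intact."""
    out = []
    for cell in row.split(delimiter):
        core = cell.strip()
        # Leave empty cells and cells that contain a special token untouched.
        if not core or any(tok in core for tok in special_tokens):
            out.append(cell)
            continue
        # Remember the original padding so it can be put back after translation.
        lead = cell[: len(cell) - len(cell.lstrip())]
        trail = cell[len(cell.rstrip()):]
        out.append(lead + translate(core) + trail)
    return delimiter.join(out)


# Usage with the real client (assumed call shape):
#   translate = lambda s: translator.translate_text(s, source_lang="IT", target_lang="EN-GB").text
#   output = translate_row(text_to_translate, translate)
```

One trade-off to keep in mind: translating cell by cell means each request sees only its own cell, so the model has no context from neighbouring columns.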
Otherwise, due to the nature of ML models, we cannot guarantee that the output is stable or that it preserves these kinds of tokens. You can also look at ignore tags as another option.
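The ignore-tags option can be sketched as follows, assuming the `deepl` Python client's `tag_handling="xml"` and `ignore_tags` parameters of `translate_text`. The tag name `keep` and the helpers `wrap_tokens`, `unwrap_tokens` and `translate_keeping_tokens` are hypothetical. The idea is to XML-escape the text, wrap every word that contains a special token in an ignored tag so the API leaves it alone, and strip the tags again afterwards.

```python
import re
from typing import Sequence
from xml.sax.saxutils import escape, unescape

KEEP_TAG = "keep"  # hypothetical tag name; any name listed in ignore_tags would do


def wrap_tokens(text: str, tokens: Sequence[str], tag: str = KEEP_TAG) -> str:
    """XML-escape `text` and wrap each whitespace-separated word containing a token."""
    escaped_tokens = [escape(tok) for tok in tokens]
    parts = re.split(r"(\s+)", escape(text))  # keep the whitespace separators
    return "".join(
        f"<{tag}>{part}</{tag}>" if any(tok in part for tok in escaped_tokens) else part
        for part in parts
    )


def unwrap_tokens(xml_text: str, tag: str = KEEP_TAG) -> str:
    """Remove the wrapper tags and undo the XML escaping."""
    return unescape(re.sub(rf"</?{re.escape(tag)}>", "", xml_text))


def translate_keeping_tokens(translator, text, tokens, target_lang, source_lang="IT"):
    """Translate with the wrapped words marked as ignored (assumes deepl.Translator)."""
    result = translator.translate_text(
        wrap_tokens(text, tokens),
        source_lang=source_lang,
        target_lang=target_lang,
        tag_handling="xml",
        ignore_tags=[KEEP_TAG],
    )
    return unwrap_tokens(result.text)
```

Compared with splitting on the delimiter, this sends the whole row in one request, so the model keeps the surrounding context; the price is that the text has to be valid XML once escaped.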
Jan, thanks for your prompt response. I will implement your suggestions. Still, the different behaviour between "EN-GB" and "EN-US" is interesting.
I also have a glossary that I want to use.