More consistent line breaks in generated po files #5251
Comments
There has been a custom TextWrapper class since #4511. Its code seems wrong for emojis (which is what you observe), and in general the algorithm differs from the gettext one. Maybe it's time to reimplement the gettext algorithm in Python instead of tweaking TextWrapper. The tricky part is that it is 630 lines of code: https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-tools/src/write-po.c;h=1c03b2932f5a15828a5db62ea9c82de5db0d5fdb;hb=HEAD#l652
@nijel I could try my hand at this, if it's just a matter of porting to Python. How likely is it that a port would be merged? I'm hesitant to spend time on this unless it's likely to be used.
We definitely want to be consistent with gettext here, so it's likely to be merged unless there is some major issue with the code.
I looked into this a bit. It seems to be more involved than merely porting a few functions, right? So the existing code overrides Python's ... The C code you linked has 1700 lines, so I assume not all of it would need to be ported, since you mentioned 630 LOC? I could still try to help out here, but I think I would need some support in working out what exactly needs to be replaced and which parts of the C code are actually needed.
I might have miscalculated the number of lines in the gettext code; anyway, the point was that it's a lot of code. The wrapper is used in the PO serializer only. It uses TextWrapper because that was the existing solution; support for CJK was added later. But having a completely custom wrapper would be okay as well.
Sorry, but I'm probably out. I tried to wrap my head around what needs to be done and came to the conclusion that the codebase is too large for me to dive into without a deeper understanding, and I wouldn't be sure enough that any changes I make are sufficiently free of side effects.
The only thing needed is a class that wraps text for the PO file. It has an existing interface and test cases (which could be extended to cover the issues you've discovered), so the risk of unwanted side effects is relatively low once the tests pass. PS: I've added these tests as xfail in #5256. Those tests are executed with both the gettext and Python implementations, so they provide a great way to catch differences in behavior between the two.
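For anyone who wants to reproduce such a comparison by hand, here is a minimal sketch. It is not the test code from #5256; the helper names `wrap_with_msgcat` and `differing_lines` are made up for illustration. It assumes gettext's `msgcat` is on the PATH (it returns `None` otherwise) and relies only on msgcat's `--width` option and on `-` meaning standard input.

```python
import shutil
import subprocess
from itertools import zip_longest
from typing import List, Optional, Tuple


def wrap_with_msgcat(po_text: str, width: int) -> Optional[str]:
    """Re-wrap PO text with gettext's msgcat; return None if msgcat is not installed."""
    msgcat = shutil.which("msgcat")
    if msgcat is None:
        return None
    result = subprocess.run(
        [msgcat, f"--width={width}", "-"],  # "-" makes msgcat read from stdin
        input=po_text,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


def differing_lines(expected: str, actual: str) -> List[Tuple[int, str, str]]:
    """Return (line number, expected line, actual line) for every mismatching line."""
    pairs = zip_longest(expected.splitlines(), actual.splitlines(), fillvalue="")
    return [(n, e, a) for n, (e, a) in enumerate(pairs, start=1) if e != a]
```

Feeding the same entry through the Python serializer and through `wrap_with_msgcat`, then passing both results to `differing_lines`, should show exactly where the line breaks diverge.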
This covers more use-cases than simple builtin cjklen which focused on CJK letters only. Issue translate#5251
#5282 addresses part of this issue (the wrapper not handling all Unicode wide characters).
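To illustrate why wide characters matter for wrapping, here is a rough sketch of a display-width helper built on the standard library's `unicodedata` module. This is not necessarily what #5282 does; `char_width` and `display_width` are hypothetical names. The assumption is that characters whose East Asian Width is "W" or "F" take two columns and combining marks take none.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate number of terminal columns occupied by a single character."""
    if unicodedata.combining(ch):
        return 0  # combining marks attach to the previous character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters (CJK, most emojis)
    return 1


def display_width(text: str) -> int:
    """Approximate display width of a string in columns."""
    return sum(char_width(ch) for ch in text)
```

A wrapper that measures lines with `len()` would count "日本語" as 3 columns, while this helper counts 6, so lines containing such characters would be broken at different places. Emoji sequences built with zero-width joiners or variation selectors are not handled by this sketch.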
It seems that po file line breaks have some weird inconsistencies in them that make them hard to compare (e.g. using Beyond Compare) to the output of common translation tools like poedit.
I described my observations on the Weblate issue tracker here: WeblateOrg/weblate#11615 before being pointed to this repo.
Could anyone shed some light on why it is behaving like this and maybe point to the place in the code where line breaks are determined?
It would be nice to create a format that is in line with what common po editors create to make handling and comparing file changes more natural.