Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zh-TW: Day 1 Translations #1322

Merged
merged 23 commits into from
Oct 22, 2023
Merged

zh-TW: Day 1 Translations #1322

merged 23 commits into from
Oct 22, 2023

Conversation

henrif75
Copy link
Collaborator

@henrif75 henrif75 commented Oct 6, 2023

Professional translations for missing day 1 entries #684

The diff is large mostly because of the normalization.

Professional translations for missing day1 entries google#684

The diff is large mostly because of the normalization.
@victorhsieh
Copy link
Collaborator

Wow, the diff is 21K lines. Any hints to help me narrow down to smaller ranges?

vgnogueira pushed a commit to vgnogueira/comprehensive-rust that referenced this pull request Oct 7, 2023
They caches have been refreshed long ago. I'm no longer sure which google#1322 I was referring to in the comment — it might have been a weird typo for google#552.
@mgeisler
Copy link
Collaborator

mgeisler commented Oct 7, 2023

The diff is large mostly because of the normalization.

What kind of normalization are you doing?

My guess is that you've used mdbook-i18n-normalize? You don't have to do this since there hasn't been any non-compatible changes made to mdbook-i18n-helpers since version 0.2.0. Infact, mdbook-i18n-normalize is a bit clumsy: even with google/mdbook-i18n-helpers#93, there will be differences in how it wraps the lines compared to msgmerge (and also msgcat, which uses the same wrapping as msgmerge). This is part of google/mdbook-i18n-helpers#32.

All in all, this means that you should try to submit PRs with files that have been touched lastby one of the Gettext tools: by msgcat or msgmerge.

However, I tried running the file thought msgcat -o po/zh-TW.po po/zh-TW.po just now, and I still get a lot of differences:

% git diff HEAD^ | diffstat
 zh-TW.po |13296 ++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------
 1 file changed, 6704 insertions(+), 6592 deletions(-)

So the changes are not primarily because of changes to the line numbers.

Looking through gh pr diff 1322 | bat -l patch, I see changes to files that are outside of Day 1:

gh pr diff 1322 | rg '\+#:' | cut -d : -f 2 | sort -u
 src/android/aidl/changing.md
 src/android/aidl/deploy.md
...
 src/bare-metal/alloc.md
 src/bare-metal/android.md
...
 src/exercises/android/morning.md
 src/exercises/bare-metal/afternoon.md
...
 src/exercises/day-2/afternoon.md
 src/exercises/day-2/book-library.md
...

So I think the zh-TV.po file needs more fine-grained splitting.

I recently found that I can msggrep out all files below the android/, bare-metal/, conconcurrency/, and async/ directories with this lovely incantation (in Zsh):

msggrep --location=src/{exercises/,}{async,bare-metal}{.md,"/*","/*/*","/*/*/*"} po/zh-TW.po

The rest (msggrep --invert-match) is Days 1 to 3. Getting the content of those days can be done similarly, but there are many more directories to match. Infact, now that I think of it, it might be easier to list the files explicitly to msggrep.

@henrif75
Copy link
Collaborator Author

henrif75 commented Oct 9, 2023

The diff is large mostly because of the normalization.

What kind of normalization are you doing?

Yes, I used the mdbook-i18n-normalize at the end and most of the diff comes from the new lines and line numbers.

However, I tried running the file thought msgcat -o po/zh-TW.po po/zh-TW.po just now, and I still get a lot of differences:

% git diff HEAD^ | diffstat
 zh-TW.po |13296 ++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------
 1 file changed, 6704 insertions(+), 6592 deletions(-)

So the changes are not primarily because of changes to the line numbers.

When merging the translations, I did against an updated messages.pot. The diffs are coming mostly because of organic changes on the English text.
I'll resubmit the PRs anyway because of the mdbook-i18n-normalize, but to address the big diff we could:

  • Refresh the PO file first or
  • Just merge the translations against the current PO file without refreshing it.
    Any suggestions are welcome.

@mgeisler
Copy link
Collaborator

mgeisler commented Oct 9, 2023

  • Just merge the translations against the current PO file without refreshing it.

That is probably the right approach: separating the use of msgmerge (and mdbook-xgettext) from the translations.

With that, the steps become

  1. Regenerate messages.pot once. This sets POT-Creation-Date to now.
  2. Run msgmerge to merge messages.pot into the existing translation.
  3. Split the PO file into smaller chunks with msggrep.
  4. Send the chunks to the in-house translators.
  5. When translations come back, use msgcat to join them with the PO file. Be careful that the translations don't just end up in the back of the PO file: the msgid has to exist in the PO file for msgcat to add the msgstr at the right spot. If the msgid changed, then perhaps msgmerge is the right tool, I'm not 100% sure here.
  6. Open a PR with the changes.

This should ensure that the POT-Creation-Date header is set correct (to the date when the translation was submitted to the in-house translators, not to today's day or to any time in-between). It should also ensure that PRs only contain small amounts of changes.

If I understand correctly, the main change here is to only extract English strings rarely: once before sending the PO files to the translators.

@henrif75 henrif75 marked this pull request as draft October 9, 2023 20:35
@henrif75
Copy link
Collaborator Author

henrif75 commented Oct 9, 2023

This should ensure that the POT-Creation-Date header is set correct (to the date when the translation was submitted to the in-house translators, not to today's day or to any time in-between). It should also ensure that PRs only contain small amounts of changes.

I had done that when I sent the file to the translators, with the difference that I sent a single file instead of the chunks. I think I can adapt your procedure by:

  1. Update manually the POT-Creation-Date to the date when I refreshed the PO file with messages.pot.
  2. Split the translated file into those chunks and merge them with the current zh-TW.po file, without refreshing again with messages.pot

Does it sound correct?

@henrif75
Copy link
Collaborator Author

henrif75 commented Oct 9, 2023

I think now it's much more manageable?

@henrif75 henrif75 marked this pull request as ready for review October 9, 2023 22:00
@mgeisler
Copy link
Collaborator

I think now it's much more manageable?

Yes, it looks much better now! The diff actually loads in GitHub and from a quick skim, it looks like it's only adding translations! 🎉

Copy link
Collaborator

@victorhsieh victorhsieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Publishing my progress so far. I'd suggest smaller patches next time, so that reviewers can process separately.

But in any case, thanks for the translation!

po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
henrif75 and others added 5 commits October 11, 2023 12:44
Co-authored-by: Victor Hsieh <[email protected]>
Co-authored-by: Victor Hsieh <[email protected]>
Co-authored-by: Victor Hsieh <[email protected]>
Co-authored-by: Victor Hsieh <[email protected]>
Co-authored-by: Victor Hsieh <[email protected]>
@henrif75
Copy link
Collaborator Author

Hi @victorhsieh ,
The check is failing probably because you're using a @gmail.com email that didn't accept the CLA, would like to check it out?
Thanks!

Copy link
Collaborator

@victorhsieh victorhsieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/me making some progress slowly...

po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
Copy link
Collaborator

@victorhsieh victorhsieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits. Thanks for the translation!

po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
po/zh-TW.po Outdated Show resolved Hide resolved
@henrif75 henrif75 merged commit 696f8e1 into google:main Oct 22, 2023
30 of 31 checks passed
@henrif75 henrif75 deleted the zh-TW-day1 branch October 22, 2023 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants