Create dprint plugin to format the PO files. #2529

Open
HidenoriKobayashi opened this issue Dec 23, 2024 · 4 comments

@HidenoriKobayashi
Collaborator

Create a stable format plugin for PO files using dprint.

HidenoriKobayashi self-assigned this on Dec 23, 2024
@HidenoriKobayashi
Collaborator Author

HidenoriKobayashi commented Dec 23, 2024

The general direction for this task is:

  • create a dprint Wasm plugin to format PO files
  • use the polib crate to parse the PO structure
  • use textwrap if polib does not handle languages that are written without spaces

So far, I have:

  • created a plugin that uses polib and found that it doesn't handle languages without spaces well: the text essentially becomes one line.
  • integrated textwrap, only to find dprint complaining that the formatted result is not stable.

Now I'm developing some unit tests for the plugin, since the test harness checks the stability of the result by formatting the result one more time. I also found that polib does not try to format long comment/source lines.
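
The stability check mentioned above can be sketched as a small helper. This is only an illustration: `is_stable` and the `format` parameter are hypothetical names standing in for the plugin's formatting function, not part of dprint's API.

```rust
/// Returns true when formatting the output a second time changes nothing.
/// `format` is a stand-in (assumption) for the plugin's formatting function.
fn is_stable<F: Fn(&str) -> String>(format: F, input: &str) -> bool {
    let once = format(input);
    let twice = format(&once);
    once == twice
}
```

A formatter that only trims its input passes this check, while one that appends a space on every run does not.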

@HidenoriKobayashi
Collaborator Author

I think I know why the result is not stable with textwrap: it removes the space when it breaks a line at that space. For example,

msgid ""
"[![Build workflow](https://img.shields.io/github/actions/workflow/status/"
"google/comprehensive-rust/build.yml?style=flat-square)](https://github.com/"
"google/comprehensive-rust/actions/workflows/build.yml?query=branch%3Amain) [!"
"[GitHub contributors](https://img.shields.io/github/contributors/google/"
"comprehensive-rust?style=flat-square)](https://github.com/google/"
"comprehensive-rust/graphs/contributors)"

would be turned into

msgid ""
"[![Build workflow](https://img.shields.io/github/actions/workflow/status/google/"
"comprehensive-rust/build.yml?style=flat-square)](https://github.com/google/"
"comprehensive-rust/actions/workflows/build.yml?query=branch%3Amain) [![GitHub"
"contributors](https://img.shields.io/github/contributors/google/comprehensive-"
"rust?style=flat-square)](https://github.com/google/comprehensive-rust/graphs/"
"contributors)"

There's no space after "GitHub" in the third line. So if this result is fed into the formatter again, it produces a different result because it sees "GitHubcontributors".

@mgeisler I think you were right when you suspected that this might be related to mgeisler/textwrap#503
Although it's slightly different in our case, because we just want to have one space?
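
The effect can be reproduced without textwrap. The sketch below is an assumption-laden stand-in, not textwrap's algorithm: `wrap_dropping_space` is a hypothetical greedy wrapper that drops the space it breaks at, and `join_po_pieces` mimics how the quoted pieces of a multi-line PO string are concatenated when the file is read back.

```rust
/// Hypothetical greedy wrapper that drops the space at which a line is
/// broken. Width is counted in chars for simplicity.
fn wrap_dropping_space(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split(' ') {
        let needed = current.chars().count() + 1 + word.chars().count();
        if !current.is_empty() && needed > width {
            // Break here: the separating space is not kept on either line.
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

/// The quoted pieces of a multi-line PO string are concatenated verbatim.
fn join_po_pieces(lines: &[String]) -> String {
    lines.concat()
}
```

Wrapping `[![GitHub contributors](x)` at a width of 10 gives the two pieces `[![GitHub` and `contributors](x)`. Joining them yields `[![GitHubcontributors](x)`, which is no longer the original text, so a second formatting pass starts from different input.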

@mgeisler
Collaborator

mgeisler commented Jan 6, 2025

> There's no space after "GitHub" in the third line. So if this result is fed into the formatter again, it produces a different result because it sees "GitHubcontributors".
>
> @mgeisler I think you were right when you suspected that this might be related to mgeisler/textwrap#503 Although it's slightly different in our case, because we just want to have one space?

Ah, I see what you mean! Thanks for the clear example to reproduce this.

I would suggest not using Textwrap at first, since preserving the trailing whitespace will require some changes to it :( Specifically, I recently realized that it's not possible to customize the library to keep trailing whitespace right now: mgeisler/textwrap#572 (comment). You're very welcome to dive in and fix this, of course 😄.

In case it's useful, the crate includes a very simple wrapping example. It also doesn't preserve trailing whitespace, but it should be easy to fix by replacing the '\n' with a ' '. The example assumes that multiple spaces can be replaced with a single space, which should be true for Markdown.
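
As a rough illustration of that idea (this is not the example shipped with the crate, and `wrap_keeping_space` is a hypothetical name), a greedy wrapper can keep one trailing space on every line except the last. It assumes, as stated above, that runs of whitespace may be collapsed to a single space, and it counts the trailing space toward the line width.

```rust
/// Greedy wrapping that keeps a single trailing space on every line except
/// the last, so concatenating the lines gives back the words joined by
/// single spaces. Width is counted in chars, trailing space included.
fn wrap_keeping_space(text: &str, width: usize) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    let mut lines = Vec::new();
    let mut current = String::new();
    for (i, word) in words.iter().enumerate() {
        let mut piece = String::from(*word);
        if i + 1 < words.len() {
            // Every word but the last carries its separating space.
            piece.push(' ');
        }
        let needed = current.chars().count() + piece.chars().count();
        if !current.is_empty() && needed > width {
            lines.push(std::mem::take(&mut current));
        }
        current.push_str(&piece);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

Because the output depends only on the sequence of words, and concatenating the lines preserves that sequence, wrapping the concatenated result again produces the same lines. This sketch only breaks at whitespace, so it does nothing useful for languages written without spaces.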

@HidenoriKobayashi
Collaborator Author

Thanks for the suggestions!

I experimented with a few approaches, but I think we still want to use Textwrap for wrapping because it handles languages without spaces pretty well.

Could we make a simple amendment to the result of Textwrap? Something that checks whether the last word of each line was followed by a space in the original string and appends one if so? For the simple case above it seems to work, but there may be cases that this naive algorithm cannot handle.
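
One way this amendment could look is sketched below. `restore_break_spaces` is a hypothetical helper, and it assumes the wrapper returns the lines in order and leaves the text inside each line untouched (apart from dropping spaces at the break points).

```rust
/// Walks through `original` alongside the wrapped `lines` and appends one
/// space to every line that was followed by a space in the original text.
/// Returns None if a line cannot be found at the expected position, for
/// example because the wrapper changed the text inside the line.
fn restore_break_spaces(original: &str, lines: &[&str]) -> Option<Vec<String>> {
    let mut rest = original;
    let mut out = Vec::with_capacity(lines.len());
    for line in lines {
        // The wrapped line must be the next thing in the remaining text.
        rest = rest.strip_prefix(*line)?;
        let mut fixed = String::from(*line);
        let trimmed = rest.trim_start_matches(' ');
        if trimmed.len() < rest.len() {
            // The break consumed at least one space: put a single one back.
            fixed.push(' ');
        }
        rest = trimmed;
        out.push(fixed);
    }
    Some(out)
}
```

A break that did not fall on a space, such as one after a `/` or between two characters of a language written without spaces, gets nothing appended. The `None` case is where the naive approach gives up, and a run of several spaces at a break is reduced to one.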
