Forked library; and some thoughts about whether it's worth it to keep all modules at same Unicode version #279

Open
ctrlcctrlv opened this issue May 29, 2021 · 3 comments

Comments

@ctrlcctrlv

I'm working on a font editor, MFEK. I also contribute to Unicode when I can. One of my fonts requires characters in Unicode 14.0.

For those reasons, I had to fork the project. I only need blocks, categories, and names, so I called my version QD-UNIC—“quick and dirty UNIC”. https://github.com/MFEK/qd-unic.rlib

I think that, perhaps, this project was too ambitious, in the sense that all the modules must match each other in Unicode version. That's what's caused a single PR, #226, to stall development of everything because of issues with unic-ucd-segment.

Obviously some of these modules are very easy to keep updated, and unic-gen works phenomenally well. Those implementing things like text segmentation and BIDI are going to be more difficult, and certainly subject to the needs of the community…which more often match mine than not. In short, users who only care about getting character names shouldn't suffer because no one has yet contributed a fix to a text segmentation problem.
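To make the point above concrete, here is a minimal sketch of what per-module Unicode versions could look like. It is an assumption for illustration only, not how UNIC or QD-UNIC is actually structured: the `names` and `segment` modules, the `covers` helper, and the version numbers are all hypothetical.

```rust
/// A Unicode version such as 14.0.0.
/// Derived ordering compares major, then minor, then micro.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct UnicodeVersion {
    pub major: u16,
    pub minor: u16,
    pub micro: u16,
}

/// Hypothetical stand-in for a module that is easy to regenerate
/// (for example, character names). The version is illustrative.
pub mod names {
    use super::UnicodeVersion;

    pub const UNICODE_VERSION: UnicodeVersion =
        UnicodeVersion { major: 14, minor: 0, micro: 0 };
}

/// Hypothetical stand-in for a module that needs manual work per release
/// (for example, text segmentation). The version is illustrative.
pub mod segment {
    use super::UnicodeVersion;

    pub const UNICODE_VERSION: UnicodeVersion =
        UnicodeVersion { major: 11, minor: 0, micro: 0 };
}

/// Returns true if a module built against `provided` covers the data
/// introduced up to and including `required`.
pub fn covers(provided: UnicodeVersion, required: UnicodeVersion) -> bool {
    provided >= required
}
```

With this layout, a caller that only needs names would check `covers(names::UNICODE_VERSION, needed)` and would not be affected by `segment` lagging behind.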

Anyway, I doubt y'all will agree, which is why I forked, but I thought I'd let you know why I forked.

@eyeplum
Member

eyeplum commented May 29, 2021

I'm in a similar position to you: I have a Unicode tool in production and need to keep the app's data up-to-date (currently Unicode 13.0). So far I've been updating rust-unic only in my own fork, https://github.com/eyeplum/rust-unic. Since my app also has a feature to perform grapheme segmentation, I also attempted a fix for unic-ucd-segment in Unicode 11.0 (which I presume works, since all tests are passing in my fork at the moment).

I would love to eventually merge my fork back so that we could keep rust-unic up-to-date (iirc updates after 11.0 are pretty straightforward).

I will try to find some time to break the changes on my fork into small PRs and see.


As for decoupling the modules so that they can have different Unicode versions, it does sound pretty tempting to me as well to have something like a separate UCD module which is always kept up-to-date (probably trivially), since that's my main use case too. Maybe breaking rust-unic into separate repos and using versioning would make it possible?

E.g. in the unified project, rust-unic depends on rust-unic-ucd (where rust-unic-ucd is a separate project), the Unicode version is kept the same between rust-unic and rust-unic-ucd:

rust-unic (Unicode 13.0)
`-- rust-unic-ucd (Unicode 13.0)

In its own project, rust-unic-ucd can have any Unicode version it supports (as git tags or branches):

rust-unic-ucd
- branch: master => Unicode 13.0 (the latest released version of the Unicode Standard)
- branch: next => Unicode 14.0 (the next release of the Unicode Standard)
- tag: unicode-12.1 => Unicode 12.1
- tag: unicode-12.0 => Unicode 12.0
- tag: unicode-11.0 => Unicode 11.0
- ...
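As a rough illustration of the layout above, the branch and tag names could be treated as a lookup table from git ref to Unicode version. This is only a sketch: `rust-unic-ucd` does not exist as a separate repository here, and `GitRef`, `LAYOUT`, and `ref_for` are hypothetical names introduced for the example.

```rust
/// A git ref in the hypothetical `rust-unic-ucd` repository.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GitRef {
    Branch(&'static str),
    Tag(&'static str),
}

/// A Unicode version as (major, minor).
pub type Version = (u8, u8);

/// The ref layout proposed in the comment above, copied as data.
pub const LAYOUT: &[(GitRef, Version)] = &[
    (GitRef::Branch("master"), (13, 0)),
    (GitRef::Branch("next"), (14, 0)),
    (GitRef::Tag("unicode-12.1"), (12, 1)),
    (GitRef::Tag("unicode-12.0"), (12, 0)),
    (GitRef::Tag("unicode-11.0"), (11, 0)),
];

/// Finds the ref that a downstream project would pin to in order to
/// get data for `version`, or `None` if no ref in the table provides it.
pub fn ref_for(version: Version) -> Option<GitRef> {
    LAYOUT
        .iter()
        .find(|entry| entry.1 == version)
        .map(|entry| entry.0)
}
```

In the unified project, rust-unic would pin to whichever ref matches its own Unicode version, while a user who only needs UCD data could pin to `next` directly.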

I'm actually quite excited playing around with this idea in my head, but I haven't thought through all the ramifications.

@zbraniecki
Member

Hi all! You may want to consider helping us with the ICU4x project! One of the power features we're working on is a robust data provider which works with Unicode properties and should address your needs. We're currently focused on supplying the needs of regular expression and segmentation APIs, but we'd be open to collaborating on other targets!

@CAD97
Collaborator

CAD97 commented Jun 4, 2021

Yeah, for the time being, ICU4x is the way to go. At least until @behnam is active again, this project is effectively on hiatus.

If you ping me, I think I can still merge PRs, but I wouldn't personally suggest using unic as a unicode table provider at the moment.
