Forked library, and some thoughts on whether it's worth keeping all modules at the same Unicode version #279
I'm in a similar position: I have a Unicode tool in production and need to keep the app's data up to date (currently Unicode 13.0). So far I've been updating rust-unic only in my own fork, https://github.com/eyeplum/rust-unic. Since my app also performs grapheme segmentation, I also attempted a fix for …
I would love to eventually merge my fork back so that we can keep rust-unic up to date (IIRC, updates after 11.0 are pretty straightforward). I will try to find some time to break the changes in my fork into small PRs and see how it goes.
As for decoupling the modules so that each can have a different Unicode version, it does sound tempting to me as well to have something like a separate UCD module that is always kept up to date (probably trivially), since that's my main use case too. Maybe breaking rust-unic into separate repos with their own versioning would make it possible? E.g., in the unified project:
In its own project:
I'm actually quite excited playing around with this idea in my head, but I haven't thought through all the ramifications.
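To make the decoupling idea a little more concrete, here is a minimal std-only Rust sketch, assuming each module exposes its own Unicode version constant. The `UnicodeVersion` struct, the `ucd_name`/`ucd_segment` modules, the `module_versions` function, and the version numbers are all hypothetical illustrations, not unic's actual API. The point is only that an umbrella crate can report per-module versions instead of enforcing one shared version.

```rust
use std::fmt;

/// Hypothetical Unicode version triple (major.minor.micro).
/// Derived ordering compares major, then minor, then micro.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct UnicodeVersion {
    pub major: u16,
    pub minor: u16,
    pub micro: u16,
}

impl fmt::Display for UnicodeVersion {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}.{}", self.major, self.minor, self.micro)
    }
}

// Hypothetical data-only module: cheap to regenerate, so it can track the latest UCD.
pub mod ucd_name {
    use super::UnicodeVersion;
    pub const UNICODE_VERSION: UnicodeVersion = UnicodeVersion { major: 13, minor: 0, micro: 0 };
}

// Hypothetical algorithmic module: may lag behind until someone updates it.
pub mod ucd_segment {
    use super::UnicodeVersion;
    pub const UNICODE_VERSION: UnicodeVersion = UnicodeVersion { major: 10, minor: 0, micro: 0 };
}

/// The umbrella crate lists each module's version rather than asserting a single shared one.
pub fn module_versions() -> Vec<(&'static str, UnicodeVersion)> {
    vec![
        ("ucd-name", ucd_name::UNICODE_VERSION),
        ("ucd-segment", ucd_segment::UNICODE_VERSION),
    ]
}

fn main() {
    for (module, version) in module_versions() {
        println!("{module}: Unicode {version}");
    }
}
```

In a multi-repo setup, each constant would presumably correspond to that crate's own release, with the umbrella crate depending on whichever release of each module it was tested against.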
Hi all! You may want to consider helping us with the ICU4X project! One of the powerful features we're working on is a robust data provider that works with Unicode properties and should address your needs. We're currently focused on the needs of the regular expression and segmentation APIs, but we'd be open to collaborating on other targets!
Yeah, for the time being, ICU4X is the way to go. At least until @behnam is active again, this project is effectively on hiatus. If you ping me, I think I can still merge PRs, but I wouldn't personally suggest using unic as a Unicode table provider at the moment.
I'm working on a font editor, MFEK. I also contribute to Unicode when I can. One of my fonts requires characters in Unicode 14.0.
For those reasons, I had to fork the project. I only need blocks, categories, and names, so I called my version QD-UNIC (“quick and dirty UNIC”): https://github.com/MFEK/qd-unic.rlib
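Block data shows why a blocks/categories/names subset is easy to maintain on its own: the UCD ships it as a plain text file, Blocks.txt, with lines such as `0000..007F; Basic Latin`. Below is a minimal std-only Rust sketch, assuming that line format, that parses such lines into sorted ranges and looks up a character's block with a binary search. `BlockRange`, `parse_blocks`, and `block_of` are hypothetical names; this is not QD-UNIC's or unic's actual code.

```rust
/// One block range parsed from a Blocks.txt-style line.
#[derive(Debug, Clone, PartialEq)]
pub struct BlockRange {
    pub start: u32,
    pub end: u32, // inclusive
    pub name: String,
}

/// Parses lines of the form "0000..007F; Basic Latin", skipping blank lines and '#' comments.
pub fn parse_blocks(data: &str) -> Result<Vec<BlockRange>, String> {
    let mut ranges = Vec::new();
    for line in data.lines() {
        // Drop everything after '#', then surrounding whitespace.
        let line = line.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let (range, name) = line
            .split_once(';')
            .ok_or_else(|| format!("missing ';' in {line:?}"))?;
        let (start, end) = range
            .trim()
            .split_once("..")
            .ok_or_else(|| format!("missing '..' in {line:?}"))?;
        let parse_hex = |s: &str| u32::from_str_radix(s.trim(), 16).map_err(|e| e.to_string());
        ranges.push(BlockRange {
            start: parse_hex(start)?,
            end: parse_hex(end)?,
            name: name.trim().to_string(),
        });
    }
    // The binary search in `block_of` requires ranges sorted by start code point.
    ranges.sort_by_key(|r| r.start);
    Ok(ranges)
}

/// Returns the name of the block containing `ch`, if any.
pub fn block_of(ranges: &[BlockRange], ch: char) -> Option<&str> {
    let cp = ch as u32;
    // Number of ranges starting at or before `cp`; the last of those is the only candidate.
    let idx = ranges.partition_point(|r| r.start <= cp);
    let candidate = ranges.get(idx.checked_sub(1)?)?;
    if cp <= candidate.end {
        Some(candidate.name.as_str())
    } else {
        None
    }
}

fn main() {
    let data = "# Blocks.txt-style excerpt\n\
                0000..007F; Basic Latin\n\
                0080..00FF; Latin-1 Supplement\n\
                0370..03FF; Greek and Coptic\n";
    let ranges = parse_blocks(data).expect("valid block data");
    println!("{:?}", block_of(&ranges, 'A')); // Some("Basic Latin")
    println!("{:?}", block_of(&ranges, 'λ')); // Some("Greek and Coptic")
    println!("{:?}", block_of(&ranges, 'Ā')); // None: Latin Extended-A is not in this excerpt
}
```

Updating this kind of table to a new Unicode version is mostly a matter of feeding in the newer data file, which is part of why such modules are cheap to keep current.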
I think that, perhaps, this project was too ambitious, in the sense that all the modules must match each other in Unicode version. That's what caused a single PR, #226, to stall development of everything because of issues with `unic-ucd-segment`. Obviously, some of these modules are very easy to keep updated, and `unic-gen` works phenomenally well. Those implementing things like text segmentation and BIDI are going to be more difficult, and are certainly subject to the needs of the community… which more often match mine than not. In short, users who only care about getting character names shouldn't suffer because no one has yet contributed a fix to a text segmentation problem.
Anyway, I doubt y'all will agree, which is why I forked, but I thought I'd let you know why.
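The coupling described here can be sketched as two release policies. Below is a minimal std-only Rust sketch with hypothetical version numbers and hypothetical function names (`lockstep_version`, `satisfies`). A lockstep check refuses to produce a version unless every module agrees. A per-module check lets a consumer require only the modules it actually uses. One lagging segmentation module blocks the first but not a names-only user of the second.

```rust
/// Hypothetical (major, minor) Unicode version; tuples compare lexicographically.
type UnicodeVersion = (u16, u16);

/// Lockstep policy: every module must be at the same Unicode version.
fn lockstep_version(modules: &[(&str, UnicodeVersion)]) -> Result<UnicodeVersion, String> {
    let (_, first) = *modules.first().ok_or("no modules")?;
    for &(name, version) in modules {
        if version != first {
            return Err(format!(
                "{name} is at Unicode {}.{}, expected {}.{}",
                version.0, version.1, first.0, first.1
            ));
        }
    }
    Ok(first)
}

/// Per-module policy: a consumer checks only the modules it depends on,
/// each against its own minimum Unicode version.
fn satisfies(modules: &[(&str, UnicodeVersion)], needed: &[(&str, UnicodeVersion)]) -> bool {
    needed.iter().all(|&(want_name, min)| {
        modules
            .iter()
            .any(|&(name, version)| name == want_name && version >= min)
    })
}

fn main() {
    // Hypothetical state: data-only modules regenerated for 14.0, segmentation still at 10.0.
    let modules = [
        ("unic-ucd-name", (14, 0)),
        ("unic-ucd-block", (14, 0)),
        ("unic-ucd-segment", (10, 0)),
    ];
    // Lockstep: the whole release is blocked by the segmentation module.
    println!("{:?}", lockstep_version(&modules));
    // Per-module: a names-only user is unaffected.
    println!("{}", satisfies(&modules, &[("unic-ucd-name", (14, 0))]));
}
```

Here the lockstep check returns an error naming `unic-ucd-segment`, while the names-only requirement is satisfied.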