Forked library; and some thoughts about whether it's worth it to keep all modules at same Unicode version #279

ctrlcctrlv · 2021-05-29T01:43:07Z

I'm working on a font editor, MFEK. I also contribute to Unicode when I can. One of my fonts requires characters in Unicode 14.0.

For those reasons, I had to fork the project. I only need blocks, categories, and names, so I called my version QD-UNIC—“quick and dirty UNIC”. https://github.com/MFEK/qd-unic.rlib

I think that, perhaps, this project was too ambitious, in the sense that all the modules must match each other in Unicode version. That's what's caused a single PR, #226, to stall development of everything because of issues with unic-ucd-segment.

Obviously some of these modules are very easy to keep updated, and unic-gen works phenomenally well. The modules implementing things like text segmentation and BIDI are going to be more difficult, and certainly subject to the needs of the community…which more often match mine than not. In short, users who only care about getting character names shouldn't suffer because no one has yet contributed a fix to a text segmentation problem.

Anyway, I doubt y'all will agree, which is why I forked, but I thought I'd let you know why I forked.

eyeplum · 2021-05-29T22:34:53Z

I'm in a similar position to you: I have a Unicode tool in production and need to keep the app's data up-to-date (currently Unicode 13.0). So far I've been updating rust-unic only in my own fork: https://github.com/eyeplum/rust-unic. Since my app also has a feature to perform grapheme segmentation, I also attempted a fix for unic-ucd-segment in Unicode 11.0 (which I presume works, since all tests are currently passing in my fork).

I would love to eventually merge my fork back so that we could keep rust-unic up-to-date (iirc updates after 11.0 are pretty straightforward).

I will try to find some time to break the changes on my fork into small PRs and see.

As for decoupling the modules so that they can have different Unicode versions, it sounds pretty tempting to me as well to have something like a separate UCD module which is always kept up-to-date (probably trivially), since that's my main use case too. Maybe breaking rust-unic into separate repos and using versioning would make it possible?

E.g. in the unified project, rust-unic depends on rust-unic-ucd (where rust-unic-ucd is a separate project), and the Unicode version is kept the same between rust-unic and rust-unic-ucd:

rust-unic (Unicode 13.0)
`-- rust-unic-ucd (Unicode 13.0)

In its own project, rust-unic-ucd can have any Unicode version it supports (as git tags or branches):

rust-unic-ucd
- branch: master => Unicode 13.0 (the latest released version of the Unicode Standard)
- branch: next => Unicode 14.0 (the next release of the Unicode Standard)
- tag: unicode-12.1 => Unicode 12.1
- tag: unicode-12.0 => Unicode 12.0
- tag: unicode-11.0 => Unicode 11.0
- ...

I'm actually quite excited playing around with this idea in my head, but I haven't thought through all the ramifications.
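
To make the decoupling idea above a bit more concrete, here is a rough sketch of how per-component Unicode versions could be expressed in code. Everything in it is an assumption for illustration: the `ucd` and `segment` modules, their version numbers, and the `common_version` helper are hypothetical stand-ins, not the actual rust-unic API.

```rust
/// A Unicode version triple, e.g. 13.0.0.
/// Deriving `Ord` compares fields in declaration order: major, minor, micro.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct UnicodeVersion {
    pub major: u16,
    pub minor: u16,
    pub micro: u16,
}

/// Hypothetical stand-in for a data-only component (names, blocks,
/// categories) that is easy to regenerate for each Unicode release.
pub mod ucd {
    use super::UnicodeVersion;

    // Assumed version, chosen only for the example.
    pub const UNICODE_VERSION: UnicodeVersion = UnicodeVersion {
        major: 14,
        minor: 0,
        micro: 0,
    };
}

/// Hypothetical stand-in for an algorithmic component (segmentation)
/// that lags behind because its rules need manual work.
pub mod segment {
    use super::UnicodeVersion;

    // Assumed version, chosen only for the example.
    pub const UNICODE_VERSION: UnicodeVersion = UnicodeVersion {
        major: 11,
        minor: 0,
        micro: 0,
    };
}

/// Returns the oldest version among the given components, i.e. the version
/// an umbrella crate could advertise for all of them together.
/// Returns `None` for an empty slice.
pub fn common_version(components: &[UnicodeVersion]) -> Option<UnicodeVersion> {
    components.iter().copied().min()
}
```

With a layout like this, a user who only needs character names would read `ucd::UNICODE_VERSION` directly and would not be held back by `segment`, while an umbrella crate could call `common_version(&[ucd::UNICODE_VERSION, segment::UNICODE_VERSION])` to report the version it supports as a whole.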

zbraniecki · 2021-05-29T23:15:10Z

Hi all! You may want to consider helping us with the ICU4x project! One of the power features we're working on is a robust data provider which works with Unicode properties and should address your needs. We're currently focused on supplying the needs of regular expression and segmentation APIs, but would be open to collaborating on other targets!

CAD97 · 2021-06-04T21:03:53Z

Yeah, for the time being, ICU4x is the way to go. At least until @behnam is active again, this project is effectively on hiatus.

If you ping me, I think I can still merge PRs, but I wouldn't personally suggest using unic as a unicode table provider at the moment.
