Upgrade to Unicode 11.0 #259

Open
8 tasks
eyeplum opened this issue Mar 6, 2019 · 1 comment
Labels
A: source-data (Source Data), C: emoji (Unicode Emoji), C: idna (IDNA: Internationalized Domain Names in Applications), C: segmentation (Unicode Text Segmentation), C: ucd (Unicode Character Database), feature (New features requested or planned)


eyeplum commented Mar 6, 2019

Description

Update external data and all modules to Unicode 11.0. For changes in Unicode 11.0, see: https://www.unicode.org/versions/Unicode11.0.0/

Tasks

These tasks are roughly planned out according to the changes in Unicode 11.0 and their potential impact on rust-unic's implementation.

Segmentation

  • Implement new grapheme cluster breaking rules
    • Implement the new Extended_Pictographic property
    • Remove GB10 implementation
    • Implement the new GB11 rule
  • Implement new word breaking rules
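
For reference, GB11 in UAX #29 for Unicode 11.0 reads `\p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}`. Below is a minimal sketch of that single rule, not the rust-unic implementation. The `Gcb` enum and `gb11_no_break` function are hypothetical names, and folding Extended_Pictographic into the same enum as the Grapheme_Cluster_Break values is a simplification made for the example (in the data files it is a separate property).

```rust
/// Hypothetical, simplified categories: only what GB11 needs.
/// Extended_Pictographic is really a separate property, merged here for brevity.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Gcb {
    Extend,
    Zwj,
    ExtendedPictographic,
    Other,
}

/// Returns true when GB11 forbids a break between `cats[i - 1]` and `cats[i]`.
/// Only GB11 is checked; every other rule is out of scope for this sketch.
fn gb11_no_break(cats: &[Gcb], i: usize) -> bool {
    if i == 0 || i >= cats.len() {
        return false;
    }
    // The right side must be Extended_Pictographic and the left side ZWJ.
    if cats[i] != Gcb::ExtendedPictographic || cats[i - 1] != Gcb::Zwj {
        return false;
    }
    // Walk left from the ZWJ over any run of Extend,
    // then require an Extended_Pictographic character.
    let mut j = i - 1;
    while j > 0 {
        j -= 1;
        match cats[j] {
            Gcb::Extend => continue,
            Gcb::ExtendedPictographic => return true,
            _ => return false,
        }
    }
    false
}
```

Called on the categories of an emoji ZWJ sequence, the function reports "no break" only at the position right after the ZWJ. A real implementation would more likely track this as state in the forward cursor than rescan backwards at every boundary.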

Emoji

  • Add an .rsv table for the new Extended_Pictographic property (the remaining work is covered by the UCD tasks)
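
As a rough illustration of the input side of this task: the Unicode emoji data file lists properties in lines of the form `code point or range ; property # comment`. The sketch below parses one such line. `parse_line` is a hypothetical helper, not part of rust-unic's generator, and the .rsv output format is deliberately not reproduced here.

```rust
/// Parse one data line of the form `XXXX..YYYY ; Property # comment`
/// (or a single code point `XXXX ; Property`).
/// Returns (first, last, property name), or None for blank, comment-only,
/// or malformed lines.
fn parse_line(line: &str) -> Option<(u32, u32, &str)> {
    // Drop the trailing comment, if any.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    // Fields are separated by semicolons; surrounding whitespace is padding.
    let mut fields = data.split(';').map(str::trim);
    let range = fields.next()?;
    let property = fields.next()?;
    // A single code point is treated as a one-element range.
    let (first, last) = match range.split_once("..") {
        Some((lo, hi)) => (lo, hi),
        None => (range, range),
    };
    let first = u32::from_str_radix(first, 16).ok()?;
    let last = u32::from_str_radix(last, 16).ok()?;
    Some((first, last, property))
}
```

Keeping only the entries whose property name equals `Extended_Pictographic` would yield the ranges that the new table needs.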

UCD

  • [Optional] Implement the new Equivalent_Unified_Ideograph property

IDNA

  • Update IDNA conformance test to the new format

Related Issues, Pull Requests, Forks

eyeplum added the labels C: segmentation, C: ucd, C: emoji, A: source-data, C: idna on Mar 6, 2019
eyeplum changed the title from "Update to Unicode 11.0" to "Upgrade to Unicode 11.0" on Mar 6, 2019

behnam commented Apr 8, 2019

unicode-rs/unicode-segmentation#43 is tracking segmentation algorithm updates. I think we can keep the code in sync with unicode-segmentation, regardless of where we do the impl first.


2 participants