Add more accented characters decomposition #3838

Open

neube3 commented Mar 19, 2023

Hello!

Hopefully I've done everything right despite knowing next to nothing about C or Java.

This PR mostly addresses Polish (which uses characters like "ż/Ż", "ą/Ą", "ę/Ę"), but I have included every character in the Windows Character Map that has either a "dot above" or an "ogonek" (the terminology comes from Character Map, but "ogonek" literally means "little tail" and is the word Polish speakers actually use for the mark on ą and ę).

In the class, I inserted the "dot above" entry above the umlaut one, since I was trying to preserve Unicode code point order. Further down, in the private static declarations and the mapping itself, I just appended the new entries at the end, so the grouping should hopefully make sense.
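For readers without the diff open, the shape of such an addition can be sketched roughly as follows. This is a simplified illustration, not the actual scrcpy source: the class name `DecompositionSketch`, the constant names, the `decompose` helper, and the entry layout (combining mark first, then base letter) are all assumptions. Only the two combining marks, U+0307 (dot above) and U+0328 (ogonek), are standard Unicode facts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a decomposition table; not the scrcpy source.
final class DecompositionSketch {
    // Standard Unicode combining marks.
    static final String DOT_ABOVE = "\u0307"; // COMBINING DOT ABOVE
    static final String OGONEK = "\u0328";    // COMBINING OGONEK

    private static final Map<Character, String> TABLE = new HashMap<>();

    static {
        // Assumed entry layout: combining mark first, then the base letter.
        TABLE.put('\u017B', DOT_ABOVE + 'Z'); // Z with dot above, upper case
        TABLE.put('\u017C', DOT_ABOVE + 'z'); // z with dot above, lower case
        TABLE.put('\u0104', OGONEK + 'A');    // A with ogonek
        TABLE.put('\u0105', OGONEK + 'a');    // a with ogonek
        TABLE.put('\u0118', OGONEK + 'E');    // E with ogonek
        TABLE.put('\u0119', OGONEK + 'e');    // e with ogonek
    }

    // Returns the decomposed form, or null when the table has no entry.
    static String decompose(char c) {
        return TABLE.get(c);
    }

    public static void main(String[] args) {
        System.out.println(decompose('\u017B') != null); // an entry exists
        System.out.println(decompose('\u0141') == null); // no entry for L with stroke
    }
}
```

Whether the injected mark is actually honored depends on the receiving side, which this sketch does not model.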

There is a nonsensical sentence, ZAŻÓŁĆ GĘŚLĄ JAŹŃ, used to test whether a keyboard layout, a program, etc. can handle all Polish diacritics. Right now, it ends up as ZAÓĆ GŚL JAŹŃ. After this PR is merged, it should be almost correct: ZAŻÓĆ GĘŚLĄ JAŹŃ, missing only the Ł character. Sadly, Ł itself cannot be decomposed, because Unicode defines no decomposition of it into a base letter plus a combining mark, so "ł/Ł" seems to be out of reach unless one gets added (which it probably won't). I know the combining "short stroke overlay" and "long stroke overlay" characters exist, but even if they worked (I was not able to make them work properly), the result would still be a different character (L + - != Ł).
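The decomposition claims above can be checked independently of scrcpy with the JDK's `java.text.Normalizer`. The sketch below is only a verification aid; the class name `NfdCheck` and the helper `nfdCodePoints` are made up for illustration. It lists the code points of the canonical decomposition (NFD) of Ż, Ą, Ę and Ł.

```java
import java.text.Normalizer;

// Illustrative helper, unrelated to the scrcpy code base.
final class NfdCheck {
    // Returns the NFD form of s as space-separated "U+XXXX" code points.
    static String nfdCodePoints(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        nfd.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Z with dot above, A with ogonek, E with ogonek, L with stroke
        String[] letters = {"\u017B", "\u0104", "\u0118", "\u0141"};
        for (String letter : letters) {
            String input = String.format("U+%04X", letter.codePointAt(0));
            System.out.println(input + " -> " + nfdCodePoints(letter));
        }
    }
}
```

Ż, Ą and Ę each split into a base letter followed by U+0307 or U+0328, while Ł comes back unchanged as U+0141, which is why it cannot be handled by this approach.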

Since this only adds entries and does not change or remove anything, it should merge cleanly.

rom1v (Collaborator) commented Mar 19, 2023

Thank you for your PR.

> There is a nonsensical sentence ZAŻÓŁĆ GĘŚLĄ JAŹŃ used to test if a keyboard layout, a program, etc. can display all Polish diacritics. Right now, it ends up as ZAÓĆ GŚL JAŹŃ. After this PR is merged, it should be almost correct - ZAŻÓĆ GĘŚLĄ JAŹŃ, missing only the Ł character.

In practice, I still get exactly ZAÓĆ GŚL JAŹŃ with your PR applied. Do you observe different behavior on your device?

neube3 (Author) commented Mar 19, 2023

As I said, I can't compile it myself and don't know Java or C, so it was a hopeful attempt.

Sadly, this means it probably won't work.

Deeper digging turned up https://source.android.com/docs/core/interaction/input/key-character-map-files#behaviors. If I am reading this correctly (if belatedly), the five dead keys you have implemented are the only ones exposed via the Android API?

So not only is it impossible to fully support Polish characters (Ł is missing), but three more (Ż, Ą, Ę) are also impossible because Android itself doesn't handle them well in its APIs?

I admit I'm in way over my head here. For example, I don't understand how this is possible, since multiple keyboards do have those characters; I guess they are not decomposing or using dead keys, and/or IMEs are somehow working around that. Since it doesn't work, I guess it was a good try, but that's the extent of it.

Sorry for the fuss.
