[FB4] ICU63.1 suppresses conversion errors #8108

Open
dmitry-lipetsk opened this issue May 7, 2024 · 7 comments

@dmitry-lipetsk
Contributor

FB4 returns an empty string when it cannot translate a Unicode symbol into an ICU code page.

In the same case FB3 returns an error.

The Unicode symbol is U+115F (HANGUL CHOSEONG FILLER); the input string is 'ᅟ'.

Connection charset is NONE.

-- FB3 and FB4 both succeed.
select cast(_utf8 'ᅟ' as varchar(1) character set utf8) from rdb$database

-- FB3 returns an error; FB4 succeeds and returns an empty string (an error is expected).
select cast(_utf8 'ᅟ' as varchar(1) character set tis620) from rdb$database

-- FB3 and FB4 both return an error (this is correct).
select cast(_utf8 'ᅟ' as varchar(1) character set win1251) from rdb$database
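Because U+115F renders as nothing, the literal in the statements above is easy to lose when copying them. The following is a minimal sketch for checking which bytes a client should actually send for that code point. `encodeUtf8` and `toHex` are hypothetical helpers written for this illustration; they are not Firebird or ICU functions, and the encoder does not validate surrogates or values above U+10FFFF.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8 (no validation; illustration only).
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Render a byte string as upper-case hexadecimal digits.
std::string toHex(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string hex;
    for (unsigned char b : bytes) {
        hex += digits[b >> 4];
        hex += digits[b & 0x0F];
    }
    return hex;
}
```

Calling `toHex(encodeUtf8(0x115F))` yields `E1859F`, so a statement that really contains the symbol carries the three bytes E1 85 9F between the quotes.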

The problem is in the new implementation of the callback function UCNV_FROM_U_CALLBACK_STOP in ICU v63.1:

https://github.com/unicode-org/icu/blob/5df4d7dfd8d77dd16aa3a0b398d50a22f4c85daa/icu4c/source/common/ucnv_err.cpp#L68-L113


In ICU v52 (FB3) the body of this function is empty:

https://github.com/unicode-org/icu/blob/574e7d9d55760680ea14dbfc4908429a58c5d544/icu4c/source/common/ucnv_err.c#L53-L66
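The difference between the two linked versions can be modelled without ICU. The sketch below is a simplified stand-in, not ICU code: `Status`, `Reason`, `stopCallbackOld`, `stopCallbackNew` and `reportUnmappable` are hypothetical names, and the list of ignorable code points is an abbreviated subset chosen for illustration. It assumes, based on the behaviour reported in this issue, that the newer callback clears the pending error when the unassigned code point is default-ignorable, while the older one leaves the error untouched.

```cpp
// Stand-ins for ICU's error code and callback reason (illustrative only).
enum class Status { Ok, InvalidCharFound };
enum class Reason { Unassigned, Illegal };

// Abbreviated subset of default-ignorable code points; the real list is longer.
constexpr bool isDefaultIgnorable(char32_t c)
{
    return c == 0x00AD || c == 0x115F || c == 0x1160 ||
           (c >= 0x200B && c <= 0x200F) ||
           (c >= 0xFE00 && c <= 0xFE0F) ||
           c == 0xFEFF;
}

// Models the older STOP callback: empty body, the error set by the caller survives.
void stopCallbackOld(char32_t, Reason, Status&)
{
}

// Models the newer STOP callback: an unassigned default-ignorable
// code point has its error cleared, so the caller sees success.
void stopCallbackNew(char32_t cp, Reason reason, Status& err)
{
    if (reason == Reason::Unassigned && isDefaultIgnorable(cp))
        err = Status::Ok;
}

// Simulates a converter that hit a code point with no mapping in the
// target charset: it sets the error first, then consults the callback.
Status reportUnmappable(char32_t cp, void (*callback)(char32_t, Reason, Status&))
{
    Status err = Status::InvalidCharFound;
    callback(cp, Reason::Unassigned, err);
    return err;
}
```

With this model, `reportUnmappable(0x115F, stopCallbackOld)` keeps the error, while `reportUnmappable(0x115F, stopCallbackNew)` reports success and the symbol is silently dropped, which matches the empty string seen in FB4. A code point outside the ignorable set, such as U+0416, still fails with either callback.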

@asfernandes
Member

They are called ignorable code points.
So why must they not be ignored?

@dmitry-lipetsk
Contributor Author

I think we must have the same behaviour for built-in and external charsets.

@dmitry-lipetsk
Contributor Author

If you agree that this behaviour is inconsistent, I can create a PR.

@asfernandes
Member

First, note that this is not related to FB versions as such. The relation exists only on Windows, because we deploy a fixed ICU version there. On Linux, we don't deploy the ICU library.

So if you're talking about consistent behavior only using ICU (and not modifying it), then a PR may be ok.

@dmitry-lipetsk
Contributor Author

I propose using a "stable" implementation of the callback functions UCNV_TO_U_CALLBACK_STOP and UCNV_FROM_U_CALLBACK_STOP instead of the standard "mutable" implementations.

It does not require a modification of ICU.

> then a PR may be ok.

Ok, I will do it.
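The proposal above amounts to the server installing its own callback rather than relying on the one shipped with whichever ICU version is loaded. In real ICU the hook for this is `ucnv_setFromUCallBack`; the sketch below deliberately avoids ICU and uses a toy converter to show the idea. `ToyConverter`, `libraryStop` and `stableStop` are hypothetical names, and the toy maps only ASCII; it is an assumption-laden model, not the code from the eventual patch.

```cpp
#include <string>

enum class Status { Ok, InvalidCharFound };

// Callback consulted when a code point has no mapping in the target charset.
using FromUCallback = void (*)(char32_t codePoint, Status& err);

// "Mutable": models a library default whose behaviour depends on the
// library version; here it clears the error for U+115F.
void libraryStop(char32_t cp, Status& err)
{
    if (cp == 0x115F)
        err = Status::Ok;
}

// "Stable": owned by the application, never clears the error,
// so an unmappable symbol always stops the conversion.
void stableStop(char32_t, Status&)
{
}

// Toy converter whose target charset contains ASCII only.
struct ToyConverter
{
    FromUCallback onUnmappable = libraryStop;

    Status convert(const std::u32string& in, std::string& out) const
    {
        out.clear();
        for (char32_t cp : in) {
            if (cp < 0x80) {
                out += static_cast<char>(cp);
                continue;
            }
            Status err = Status::InvalidCharFound;
            onUnmappable(cp, err);
            if (err != Status::Ok)
                return err;  // translation error reported to the caller
            // Error cleared by the callback: the symbol is silently dropped.
        }
        return Status::Ok;
    }
};
```

With the default `libraryStop`, converting a string that holds only U+115F succeeds and produces an empty result. After `conv.onUnmappable = stableStop;` the same input returns `Status::InvalidCharFound`, regardless of what the library's own callback does.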

dmitry-lipetsk added a commit to dmitry-lipetsk/firebird that referenced this issue May 10, 2024
The server will use a "stable" implementation of the UCNV_FROM_U_CALLBACK_STOP function to provide the same behaviour for built-in charsets and ICU charsets when the source Unicode string contains "ignorable" symbols.

It will always produce a translation error.

This patch restores the behaviour of FB2.1-FB3 for "ignorable" symbols.
@dmitry-lipetsk
Contributor Author

Done. If this patch is ok, I can port it to FB5 and FB6 (master tree).

I decided not to touch UCNV_TO_U_CALLBACK_STOP.

asfernandes added a commit that referenced this issue May 15, 2024
* Fix for issue #8108

The server will use a "stable" implementation of the UCNV_FROM_U_CALLBACK_STOP function to provide the same behaviour for built-in charsets and ICU charsets when the source Unicode string contains "ignorable" symbols.

It will always produce a translation error.

This patch restores the behaviour of FB2.1-FB3 for "ignorable" symbols.

* Misc.

---------

Co-authored-by: Adriano dos Santos Fernandes <[email protected]>
@asfernandes
Member

> If this patch is ok, I can port it to FB5 and FB6 (master tree).

Please do.

dmitry-lipetsk added a commit to dmitry-lipetsk/firebird that referenced this issue May 15, 2024
The server will use a "stable" implementation of the UCNV_FROM_U_CALLBACK_STOP function to provide the same behaviour for built-in charsets and ICU charsets when the source Unicode string contains "ignorable" symbols.

It will always produce a translation error.

This patch restores the behaviour of FB2.1-FB3 for "ignorable" symbols.
dmitry-lipetsk added a commit to dmitry-lipetsk/firebird that referenced this issue May 15, 2024
The server will use a "stable" implementation of the UCNV_FROM_U_CALLBACK_STOP function to provide the same behaviour for built-in charsets and ICU charsets when the source Unicode string contains "ignorable" symbols.

It will always produce a translation error.

This patch restores the behaviour of FB2.1-FB3 for "ignorable" symbols.

---
ATTENTION: These changes were not tested in the master tree (only in FB4).
asfernandes pushed a commit that referenced this issue May 15, 2024
The server will use a "stable" implementation of the UCNV_FROM_U_CALLBACK_STOP function to provide the same behaviour for built-in charsets and ICU charsets when the source Unicode string contains "ignorable" symbols.

It will always produce a translation error.

This patch restores the behaviour of FB2.1-FB3 for "ignorable" symbols.
asfernandes pushed a commit that referenced this issue May 15, 2024
The server will use a "stable" implementation of the UCNV_FROM_U_CALLBACK_STOP function to provide the same behaviour for built-in charsets and ICU charsets when the source Unicode string contains "ignorable" symbols.

It will always produce a translation error.

This patch restores the behaviour of FB2.1-FB3 for "ignorable" symbols.

---
ATTENTION: These changes were not tested in the master tree (only in FB4).