PS-9237 feature: Include support for utf8mb4_0900_ai_ci in MySQL 5.7 #5364

Draft
wants to merge 8 commits into
base: release-8.0.37-29
from
15 changes: 15 additions & 0 deletions sql/sys_vars.cc
@@ -1869,6 +1869,21 @@ static bool check_charset(sys_var *, THD *thd, set_var *var) {
my_error(ER_UNKNOWN_CHARACTER_SET, MYF(0), err.ptr());
return true;
}
// if 'default_collation_for_utf8mb4' is set to something other than
Contributor

What about the similar code in check_collation_not_null()?
In theory, one can do SET @@global.collation_connection = utf8mb4.
Perhaps we should handle this case as well?
Or at least add a comment explaining why we don't think it is necessary or don't want to do it (indeed, it might be non-trivial to distinguish the case where one uses only the charset name, utf8mb4, from the case where one specifies the collation explicitly, utf8mb4_0900_ai_ci).

Contributor

After pondering this a bit more, I think it is a good idea to have test coverage for the cases where the user can supply either a charset name or a collation name, such as SET NAMES utf8mb4; vs SET NAMES utf8mb4 COLLATE utf8mb4_0900_ai_ci, and SET @@global.collation_connection = utf8mb4 vs SET @@global.collation_connection = utf8mb4_0900_ai_ci.

Do you agree?

Collaborator Author

@dlenev I extended the coverage with the following statements:

SET character_set_client = utf8mb4;
SET NAMES DEFAULT;
SET NAMES utf8mb4;
SET NAMES utf8mb4 COLLATE utf8mb4_general_ci;
SET CHARACTER SET DEFAULT;
SET CHARACTER SET utf8mb4;
SET collation_connection = utf8mb4_general_ci;

Note that 'SET NAMES utf8mb4 COLLATE utf8mb4_0900_ai_ci' will generate
character_set_client=255 in the binary log. I believe this is the correct behavior, as the user specifies the collation explicitly.

Likewise, 'SET collation_connection = utf8mb4_0900_ai_ci' will generate
character_set_client=255 in the binary log.

As for 'SET collation_connection = utf8mb4' (when we try to assign a character set name to a collation variable), there is no problem here, as this statement is considered
syntactically incorrect. In other words, there is no need to change check_collation_not_null().
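
To make the distinction discussed in this thread concrete, here is a minimal, self-contained sketch of how a bare charset name and an explicit collation name might be resolved differently. It is not the server code: `Collation`, `resolve_by_charset_name` and `resolve_by_collation_name` are hypothetical names invented for illustration, and the sketch only knows about utf8mb4. The collation IDs 45 (utf8mb4_general_ci) and 255 (utf8mb4_0900_ai_ci) are the standard MySQL IDs.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-in for a collation entry; not the server's CHARSET_INFO.
struct Collation {
  unsigned id;
  std::string_view name;
};

constexpr Collation kGeneralCi{45, "utf8mb4_general_ci"};
constexpr Collation k0900AiCi{255, "utf8mb4_0900_ai_ci"};

// A bare charset name (as in SET NAMES utf8mb4) only implies a collation,
// so the session's default collation for utf8mb4 is used.
std::optional<Collation> resolve_by_charset_name(
    std::string_view charset_name, const Collation &default_for_utf8mb4) {
  if (charset_name != "utf8mb4") return std::nullopt;  // sketch knows utf8mb4 only
  return default_for_utf8mb4;
}

// An explicit collation name is taken as-is; no override is applied.
// A bare charset name is not a collation name, so it is rejected here.
std::optional<Collation> resolve_by_collation_name(
    std::string_view collation_name) {
  if (collation_name == kGeneralCi.name) return kGeneralCi;
  if (collation_name == k0900AiCi.name) return k0900AiCi;
  return std::nullopt;
}
```

Under these assumptions, `resolve_by_charset_name("utf8mb4", kGeneralCi)` yields ID 45, `resolve_by_collation_name("utf8mb4_0900_ai_ci")` yields ID 255, and `resolve_by_collation_name("utf8mb4")` yields nothing, which mirrors the three cases described above.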

// default 'utf8mb4' collation ('utf8mb4_0900_ai_ci') and if the value
// returned by 'get_charset_by_csname()' is also default 'utf8mb4'
// collation ('utf8mb4_0900_ai_ci'), meaning that were requesting for
// 'utf8mb4', we need to fix the returned value depending on the value of
// 'default_collation_for_utf8mb4' (currently, only 'utf8mb4_general_ci'
// is possible)
const auto *primary_utf8mb4_collation =
Contributor

The same comment as for previous patch:

Perhaps it is better to simply do:

if (var->save_result.ptr == &my_charset_utf8mb4_0900_ai_ci)
  var->save_result.ptr = thd->variables.default_collation_for_utf8mb4;

instead?

What do you think?

Collaborator Author

Reworked the same way.
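
For readers following the thread, the simplified single-comparison form suggested in the review can be sketched in isolation roughly as follows. This is an illustration under stated assumptions, not the patch itself: `CharsetInfo`, `SessionVars`, `primary_utf8mb4`, `general_utf8mb4` and `apply_utf8mb4_default` are hypothetical stand-ins for the server's `CHARSET_INFO`, `thd->variables` and `my_charset_utf8mb4_0900_ai_ci`.

```cpp
#include <cassert>

// Hypothetical stand-in for the server's CHARSET_INFO.
struct CharsetInfo {
  unsigned number;
  const char *name;
};

// Hypothetical singletons; the server compares collation objects by address.
static const CharsetInfo primary_utf8mb4{255, "utf8mb4_0900_ai_ci"};
static const CharsetInfo general_utf8mb4{45, "utf8mb4_general_ci"};

// Hypothetical stand-in for the relevant part of the session variables.
struct SessionVars {
  const CharsetInfo *default_collation_for_utf8mb4;
};

// If a charset lookup returned the primary utf8mb4 collation, substitute the
// session's configured default; any other result passes through unchanged.
const CharsetInfo *apply_utf8mb4_default(const CharsetInfo *resolved,
                                         const SessionVars &vars) {
  assert(resolved != nullptr);
  assert(vars.default_collation_for_utf8mb4 != nullptr);
  if (resolved == &primary_utf8mb4) return vars.default_collation_for_utf8mb4;
  return resolved;
}
```

One property of this form: when the session default is itself the primary collation, the substitution returns the same pointer, so no separate "is the default different from the primary?" check is needed.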

get_charset_by_csname("utf8mb4", MY_CS_PRIMARY, MYF(0));
if (thd->variables.default_collation_for_utf8mb4 !=
primary_utf8mb4_collation) {
if (var->save_result.ptr == primary_utf8mb4_collation) {

⚠️ cppcoreguidelines-pro-type-union-access ⚠️
do not access members of unions; use (boost::)variant instead

var->save_result.ptr = thd->variables.default_collation_for_utf8mb4;

⚠️ cppcoreguidelines-pro-type-union-access ⚠️
do not access members of unions; use (boost::)variant instead

}
}
warn_on_deprecated_charset(
thd, static_cast<const CHARSET_INFO *>(var->save_result.ptr),
err.ptr());