PS-9237 feature: Include support for utf8mb4_0900_ai_ci in MySQL 5.7 #5364

Draft
wants to merge 8 commits into
base: release-8.0.37-29
18 changes: 18 additions & 0 deletions sql/sql_connect.cc
@@ -670,6 +670,24 @@ void reset_mqh(THD *thd, LEX_USER *lu, bool get_them = false) {

bool thd_init_client_charset(THD *thd, uint cs_number) {
  CHARSET_INFO *cs;

  // if the 8.0 client sets 'MYSQL_SET_CHARSET_NAME' option to 'utf8mb4' or
  // leaves it empty, basically meaning the same, this function will be called
  // with 'cs_number' equal to 255 (meaning 'utf8mb4_0900_ai_ci')

  // at the same time, if 'default_collation_for_utf8mb4' is set to something
  // other than default 'utf8mb4' collation ('utf8mb4_0900_ai_ci', number 255),
  // we need to fix 'cs_number' here by setting it to the corresponding number
  // of 'default_collation_for_utf8mb4' (currently only 'utf8mb4_general_ci',
  // number 45, is supported)
  const auto *primary_utf8mb4_collation =
Contributor

As far as I can see, in other places where we check whether default_collation_for_utf8mb4 needs to kick in (e.g. see sql_lex.cc), we use my_charset_utf8mb4_0900_ai_ci directly instead of getting access to it through get_charset_by_csname().

IMO it makes sense to be consistent and do the same here, especially since this code seems to be called on every connect, so saving even a few CPU cycles would be nice.

What do you think?

Contributor

Also I think that instead of two ifs you can simply do:

if (cs_number == my_charset_utf8mb4_0900_ai_ci.number)
  cs_number = thd->variables.default_collation_for_utf8mb4->number;

What do you think?
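
To make the reasoning behind the single-check suggestion concrete, here is a minimal standalone sketch. It assumes a hypothetical `Collation` struct as a stand-in for the server's `CHARSET_INFO`, and the names `kUtf8mb4_0900_ai_ci`, `kUtf8mb4_general_ci`, `remap_nested`, `remap_single`, and `variants_agree` are invented for illustration; only the collation numbers 255 and 45 come from the patch comment. The outer check looks redundant because, when the session default is already `utf8mb4_0900_ai_ci`, the assignment writes 255 over 255.

```cpp
// Hypothetical stand-in for CHARSET_INFO; only the collation number matters here.
struct Collation {
  unsigned number;
  const char *name;
};

// Collation numbers as quoted in the patch comment (255 and 45).
constexpr Collation kUtf8mb4_0900_ai_ci{255, "utf8mb4_0900_ai_ci"};
constexpr Collation kUtf8mb4_general_ci{45, "utf8mb4_general_ci"};

// Shape of the original patch: two nested checks.
constexpr unsigned remap_nested(unsigned cs_number,
                                const Collation &session_default) {
  if (session_default.number != kUtf8mb4_0900_ai_ci.number) {
    if (cs_number == kUtf8mb4_0900_ai_ci.number) {
      cs_number = session_default.number;
    }
  }
  return cs_number;
}

// Shape of the suggestion: one check. If the session default is already
// utf8mb4_0900_ai_ci, the assignment replaces 255 with 255, a no-op.
constexpr unsigned remap_single(unsigned cs_number,
                                const Collation &session_default) {
  if (cs_number == kUtf8mb4_0900_ai_ci.number) {
    cs_number = session_default.number;
  }
  return cs_number;
}

// True when both variants return the same collation number.
constexpr bool variants_agree(unsigned cs_number,
                              const Collation &session_default) {
  return remap_nested(cs_number, session_default) ==
         remap_single(cs_number, session_default);
}
```

In this model the two variants agree for every combination of requested collation and session default, so the single check only removes a comparison; whether that holds in the server depends on how `default_collation_for_utf8mb4` is validated there.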

Collaborator Author

Yes, @dlenev, I had exactly the same doubts and went back and forth on this. On the one hand, I did not want to hardcode primary_utf8mb4_collation to be my_charset_utf8mb4_0900_ai_ci (who knows, the default collation may change again in the next version). On the other hand, I totally agree that establishing a connection is on the critical path and we should not add any unnecessary cycles there.
Anyway, if this caught your attention as well, it is probably more serious than I thought.
Let's wait for the final feedback from the customer, and I will add the changes you suggested to the final patch.

Collaborator Author

@dlenev I reworked the critical-path code with simplified versions

  if (thd->variables.default_collation_for_utf8mb4 !=
      &my_charset_utf8mb4_0900_ai_ci) {
    if (client_cs == &my_charset_utf8mb4_0900_ai_ci) {
      client_cs = thd->variables.default_collation_for_utf8mb4;
    }
  }

that do not involve a charset-by-name lookup.
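
As a rough illustration of why the pointer-based version needs no lookup, the sketch below models each collation as one global object and compares addresses only. `CharsetStub`, the `stub_*` globals, and `resolve_client_charset` are hypothetical names, not the server's types or functions; the numbers 255 and 45 are taken from the patch comment, and the `latin1_swedish_ci` entry is included only as an unrelated collation for contrast.

```cpp
// Hypothetical stand-in for CHARSET_INFO.
struct CharsetStub {
  unsigned number;
  const char *name;
};

// One global object per collation, so identity can be checked by address.
const CharsetStub stub_utf8mb4_0900_ai_ci{255, "utf8mb4_0900_ai_ci"};
const CharsetStub stub_utf8mb4_general_ci{45, "utf8mb4_general_ci"};
const CharsetStub stub_latin1_swedish_ci{8, "latin1_swedish_ci"};

// Mirrors the shape of the reworked snippet: pointer comparisons only,
// no name-based search through a charset registry.
const CharsetStub *resolve_client_charset(const CharsetStub *client_cs,
                                          const CharsetStub *session_default) {
  if (session_default != &stub_utf8mb4_0900_ai_ci) {
    if (client_cs == &stub_utf8mb4_0900_ai_ci) {
      client_cs = session_default;
    }
  }
  return client_cs;
}
```

With these stubs, a client asking for `utf8mb4_0900_ai_ci` while the session default is `utf8mb4_general_ci` is handed the `utf8mb4_general_ci` object, and any other requested collation passes through unchanged.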

      get_charset_by_csname("utf8mb4", MY_CS_PRIMARY, MYF(0));
  if (thd->variables.default_collation_for_utf8mb4 !=
      primary_utf8mb4_collation) {
    if (cs_number == primary_utf8mb4_collation->number) {
      cs_number = thd->variables.default_collation_for_utf8mb4->number;
    }
  }
  /*
    Use server character set and collation if
    - opt_character_set_client_handshake is not set