-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: #6146 - utf8/utf8mb3 alias causes redundant ALTER TABLE changes #6370
base: 4.3.x
Are you sure you want to change the base?
Conversation
Just pushed a change to fix the psalm issues |
It seems there are CI jobs failing. Please take a look at this guide for more on how to handle those. |
ಠ_ಠ |
Codecov says that some of the new code is untested, most notably the parts using You can learn more about such tests at https://www.doctrine-project.org/projects/doctrine-dbal/en/stable/reference/testing.html |
I think I found the reason: Let me retry this. |
In a recent PR, we got reports from CodeCov about several lines not being covered. After further inspection, I found an error message in the coverage file upload saying we hit a rate limit. This will make it more likely we spot such issues in the future. The PR: doctrine#6370
From the Codecov UI:
… we currently send 69 uploads, so there is no way we are going to be below the upload limit. I will have to look into this separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, my feedback is not a blocker.
public function normalizeCollation(string $collation): string | ||
{ | ||
if ($this->useUtf8mb3 && str_starts_with($collation, 'utf8_')) { | ||
return 'utf8mb3' . substr($collation, 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd make things more obvious here:
return 'utf8mb3' . substr($collation, 4); | |
return 'utf8mb3' . substr($collation, strlen('utf8')); |
} | ||
|
||
if (! $this->useUtf8mb3 && str_starts_with($collation, 'utf8mb3_')) { | ||
return 'utf8' . substr($collation, 7); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return 'utf8' . substr($collation, 7); | |
return 'utf8' . substr($collation, strlen('utf8mb3')); |
Ah there is one issue though: since this is a fix, you should target 3.8.x |
I can create another PR for this branch. |
Please don't, we usually do merges up. Instead, let me retarget your PR, and please rebase your commits on 3.8.x and force push. |
In a recent PR, we got reports from CodeCov about several lines not being covered. After further inspection, I found an error message in the coverage file upload saying we hit a rate limit. This will make it more likely we spot such issues in the future. The PR: doctrine#6370
The PR uses I'm going to be offline for the next week or so, and will look at this on my return. |
Enjoy your time off 👍 |
@fisharebest do you need help with the rebase? |
I have rewritten my patch to use the However, I have been unable to back-port this PR to the 3.8.x branch. (I can, but it doesn't fix the bug...). The handling of charset/collation is different. I'd probably need to copy lots of other changes from the 4.0.x branch to make it work. Options are:
|
I'd be surprised if there were lots of changes you could backport that are not breaking changes, so I'd say we should consider applying this to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All seems ok
As far as I understand this, utf8 would stay utf8 once utf8mb3 isn't supported anymore. I didn't find information about when this alias handling was introduced on a short search though. |
Now that MySQL 9.0 is released, we can see that the logic still works, and the tests pass. |
Can you please rebase onto 4.1? That should trigger a CI pipeline that includes MySQL 9. |
All the tests are passing. Looks like a missing token in the CI tools. |
You can ignore the Codecov failure, that issue is tracked here: codecov/codecov-action#1548 |
Yes sorry, we lost track of this PR, apparently. I'll do a review of the PR. |
/** | ||
* INFORMATION_SCHEMA.CHARACTER_SETS will contain either 'utf8' or 'utf8mb3' but not both. | ||
*/ | ||
public function informationSchemaUsesUtf8mb3(Connection $connection): bool | ||
{ | ||
$sql = "SELECT character_set_name FROM INFORMATION_SCHEMA.CHARACTER_SETS WHERE character_set_name = 'utf8mb3'"; | ||
|
||
return $connection->fetchOne($sql) === 'utf8mb3'; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having my issues with this method.
First of all, it is highly unusual that our platform classes interact with the connection, in this case send a query to the server. Our platform classes are meant to collect and normalize differences between database systems and versions. Since you don't ever override this method, there's no reason why this method should exist on a platform class.
Moreover, you wrote that…
In MySQL (and MariaDB),
utf8
andutf8mb3
are aliases. Depending on the database server version, the information schema may contain one or the other. Similarly, column definitions may contain one or the other.
If this behavior really depends on the database version and taking into account that we know the database version: Why do we need to perform feature detection by running a query on the server?
public function normalizeCharset(string $charset): string | ||
{ | ||
if ($this->useUtf8mb3 && $charset === 'utf8') { | ||
return 'utf8mb3'; | ||
} | ||
|
||
if (! $this->useUtf8mb3 && $charset === 'utf8mb3') { | ||
return 'utf8'; | ||
} | ||
|
||
return $charset; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that CharsetMetadataProvider
is the right abstraction for this logic. The purpose of CharsetMetadataProvider
is to query a piece of configuration from the server in a cachable manner. A normalization method is something entirely different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I think about it: This normalization logic belongs into the platform classes. And if you put it there, you don't need that detection logic of yours and you don't that useUtf8mb3
flag. Give it a try. 🙂
$utf8mb3Provider = new ConnectionCharsetMetadataProvider($connection, true); | ||
|
||
self::assertSame('utf8mb3', $utf8mb3Provider->normalizeCharset('utf8')); | ||
self::assertSame('utf8mb3', $utf8mb3Provider->normalizeCharset('utf8mb3')); | ||
self::assertSame('foobar', $utf8mb3Provider->normalizeCharset('foobar')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a separate test.
$utf8mb3Provider = new ConnectionCollationMetadataProvider($connection, true); | ||
|
||
self::assertSame('utf8mb3_unicode_ci', $utf8mb3Provider->normalizeCollation('utf8_unicode_ci')); | ||
self::assertSame('utf8mb3_unicode_ci', $utf8mb3Provider->normalizeCollation('utf8mb3_unicode_ci')); | ||
self::assertSame('foobar_unicode_ci', $utf8mb3Provider->normalizeCollation('foobar_unicode_ci')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a separate test.
@fisharebest Will you have time to pick up the work on this PR? From what I gathered, we don't need the detection logic that you've built at all. The charsets
This means, that if we normalize both schemas to let's say the long version Note that MySQL 5.7 support is deprecated by now, so I would really like to keep logic that works around behavioral changes of MySQL 8.0 as simple as possible. |
Not quite. The change in canonical/alias happened between 8.0.29 and 8.0.30. For MariaDB it is a little more complicated. The same change was made in 10.6.1, but a run-time variable (
Yes, but there is still the small issue of DBAL looking up default collations for charsets. See the second part of my comment at #6146 (comment) We'd need to normalise the result of these queries too. |
Oh right. MySQL's versioning policy during 8.0.x times was wild.
WTF. Thank you very much for your thorough research. So, we could set that variable before we run our queries and reset it afterwards, right?
Fine by me. But again, if we do all this normalization, we don't need run any detection logic on the server because we enable DBAL to work with MySQL/MariaDB responsing with old and new character set names (and corresponsing collation names), correct? |
Summary
In MySQL (and MariaDB),
utf8
andutf8mb3
are aliases. Depending on the database server version, the information schema may contain one or the other. Similarly, column definitions may contain one or the other.When performing schema-comparisons, a mismatch between
utf8
andutf8mb3
causes anALTER TABLE
statement to be generated - even when columns are identical.The code already contains a function to normalize the charset/collation options:
dbal/src/Platforms/MySQL/Comparator.php
Line 83 in 61d79c6
My solution is to extend this function so that it also replaces
utf8
withutf8mb3
(or vice-versa), to match the current database connection.To detect the prefered character set for the connection, I've added:
AbstractMySQLPlatform::informationSchemaUsesUtf8mb3()
MariaDBPlatform::informationSchemaUsesUtf8mb3()
The obvious place for the new normalization logic is the
CharsetMetadataProvider
andCollationMetadataProvider
classes. These are interfaces, so this is technically a BC break, however, they are marked@internal
.I've added tests which should cover all the new/modified code. I've run it against MySQL. I don't have MariaDB to hand.