Wrong encoding #138
I'm having this issue as well.

Same problem here. How can I fix it?

You can use `utf8_decode($countrie["name_en"])`.
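As a rough sketch of why `utf8_decode()` appears to fix this (assuming the stored values are UTF-8 strings that were run through UTF-8 encoding a second time, which is what later comments in this thread suggest):

```php
<?php

// "Österreich" correctly encoded as UTF-8:
$correct = "\u{00D6}sterreich";

// Simulate double encoding: utf8_encode() treats the existing UTF-8
// bytes as Latin-1 characters and encodes them to UTF-8 again.
$doubleEncoded = utf8_encode($correct);

// utf8_decode() reverses exactly one round of that, restoring the string.
var_dump(utf8_decode($doubleEncoded) === $correct); // bool(true)
```

Note that `utf8_encode()`/`utf8_decode()` only convert between Latin-1 and UTF-8, so this "fix" silently corrupts any character outside Latin-1; it also only helps for values that really are double encoded.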
Could it be that the encoding in the backend is wrong? Or double encoded?

Same issue here.

Yes, this works, but it doesn't make sense to me. It seems that something is double encoded; I haven't checked the sources yet, though.
I ran a check on the countries JSON file. It seems that the double-encoded values are the translated ones (e.g. the ones in the "name_XX" fields).

Yes, exactly. We'll have to monitor when this gets fixed in the package; then we'll have to remove our
I used the solution suggested here for Laravel Collections and it worked:
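The linked solution isn't quoted in this thread, but as an illustration of the general idea (the variable names and field convention below are my own, not from the original answer), the translated `name_XX` fields of a collection of countries could be cleaned up like this:

```php
<?php

// Hypothetical sketch: decode the double-encoded translated name
// fields on a Laravel collection of country arrays.
$fixed = collect($countries)->map(function (array $country) {
    foreach ($country as $key => $value) {
        // Only the translated "name_XX" fields appear to be double encoded.
        if (is_string($value) && str_starts_with($key, 'name_')) {
            $country[$key] = utf8_decode($value);
        }
    }

    return $country;
});
```

This carries the same caveat as the raw `utf8_decode()` call above: it assumes every `name_XX` value is double encoded, which later comments show is not always true.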
The reason it's not done is that it's not easy to decode/re-encode them all correctly. Something I always have to say: the data we have here was not produced by me; it's a collection from many other sources, and people just choose what they want/can use, so I have zero control over it, unfortunately. So if someone can come up with a robust solution for correctly encoding everything to UTF-8, I'm more than pleased to merge a PR. Cheers!
This is working for me:

```php
protected function decode(?string $name): ?string
{
    if (blank($name) || mb_detect_encoding($name) !== 'UTF-8') {
        return $name;
    }

    return utf8_decode($name);
}
```

But I'm unsure whether we should do this in the package. I can't check that ALL encodings are good, and probably not every single one will be fixed. Also, it will take a lot more time to generate all the files, which is already very slow. Any thoughts?
It didn't really fix them all; I still had a lot of wrongly encoded strings. So I found the forceutf8 package, which solved it (not fully either, some strings are still wrong, but it's way better):

```php
protected function decode(?string $name): ?string
{
    if (blank($name)) {
        return $name;
    }

    if (mb_detect_encoding($name) !== 'UTF-8') {
        return Encoding::toUTF8($name);
    }

    return Encoding::fixUTF8($name);
}
```
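For reference, the `Encoding` class used above comes from the neitanod/forceutf8 package. A minimal standalone sketch, assuming the package has been installed via Composer:

```php
<?php

require 'vendor/autoload.php';

use ForceUTF8\Encoding;

// "Ã–sterreich" is "Österreich" after being UTF-8-encoded twice.
// fixUTF8() attempts to detect and undo that double encoding;
// toUTF8() converts Latin-1 (or mixed) input to valid UTF-8.
$mojibake = "Ã–sterreich";

echo Encoding::fixUTF8($mojibake), PHP_EOL;
```

The package works heuristically, which matches the comment above: it repairs most strings but offers no guarantee for every input.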
@antonioribeiro, where are you getting the countries data from, and how are you putting it together? Not sure if it helps right now, but initially I thought I had a conversion issue, so I opened this thread on Stack Overflow:
@klodoma, here is the list of sources I'm using: https://github.com/antonioribeiro/countries#copyright. Sanitizing the encoding is not impossible, but it's a lot of data to sanitize, and the sources may require different strategies.
The issue is that part of the Unicode is in codepoint notation (e.g. `\u00d6sterreich`). So, should we go with

```php
utf8_decode(json_decode('"' . "\u00d6sterreich" . '"'))
// => b"Österreich"
```

No fun... I would suggest rebuilding all the JSON files with consistent encoding; otherwise I am going to go back to using mledoze/countries, which was in better shape in that regard.
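To spell out why that round-trip is needed at all: `json_decode()` already resolves `\uXXXX` escapes into proper UTF-8 on its own, so the extra `utf8_decode()` is only there to undo the double encoding in the surrounding data. A quick check:

```php
<?php

// json_decode() turns codepoint notation into proper UTF-8 by itself;
// "Österreich" is 10 characters but 11 bytes in UTF-8 (Ö takes 2 bytes).
var_dump(json_decode('"\u00d6sterreich"')); // string(11) "Österreich"
```

So files that use codepoint notation are actually fine as-is; the problem is only the entries whose literal bytes were encoded twice, which is why a single global fix is hard.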
Indeed, the encoding is inconsistent, and I do believe @lupinitylabs's suggestion makes sense. Is there any solution brewing for this, @antonioribeiro? Regardless, I'd suggest adding technical information to your README to help developers handle those inconsistencies properly.
Any update here, @antonioribeiro?
How can I fix the country names so that these strange characters don't appear?
In code:
In database:
I changed the encoding of the table and columns to UTF-16LE: