Added polish diacritics in non_ascii_equivalents.py #386

finem4n · 2024-10-31T19:23:48Z

As the title says, I've added some Polish diacritics and also extended the list of filter tags.

phw

Thanks. This is somewhat similar to #387

Extending the tag list that way makes sense to me. The new tags seem to be in line with how the plugin was originally conceived. It would probably be even better to have some configuration for this, but in the absence of that I think the extensions as presented here are useful.

What also applies here is my comment on #387 about using Picard's picard.util.textencoding.unaccent function (please see my detailed comment there). This would allow us to get rid of the explicit mapping of most accented characters. As I see it, the first mapping section can then be removed completely, except for the two letters "Ł" and "ł" (which could then be placed under "Misc letters").

In the future this will also avoid the need to add further accented characters; there are likely a few that we still miss.
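
A rough sketch of the suggestion above, not the plugin's actual code. To keep it runnable without Picard, the standard-library unicodedata module stands in for picard.util.textencoding.unaccent (an assumption about equivalent behaviour), and EXPLICIT, strip_accents and to_ascii_sketch are hypothetical names. It shows why "Ł" and "ł" still need an explicit entry: Unicode defines no decomposition for them, so accent stripping alone leaves them untouched.

```python
import unicodedata

# Hypothetical explicit table: letters that have no Unicode decomposition.
EXPLICIT = {"Ł": "L", "ł": "l"}


def strip_accents(text):
    """Stand-in for unaccent: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def to_ascii_sketch(text):
    """Apply the explicit table first, then strip the remaining accents."""
    replaced = "".join(EXPLICIT.get(c, c) for c in text)
    return strip_accents(replaced)


print(to_ascii_sketch("Zażółć gęślą jaźń"))  # Zazolc gesla jazn
print(strip_accents("Łł"))  # unchanged: Łł
```

With this split, new accented letters that decompose need no table entry at all; only letters like "Ł" have to be listed by hand.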

finem4n · 2024-11-07T23:26:52Z

Hi. I've implemented picard.util.textencoding.unaccent. I did some testing, and at first glance it handled more than letters, e.g. ≠ and the L-shaped ones, but the results were disappointing:
≠ changed to =
the L-shaped ones came back as 「 instead of |-
So I left them in CHAR_TABLE as they were before.
As per your suggestion in #387, I renamed the function ascii to to_ascii. I couldn't find a better name.
I also bumped the version and added my name to the authors section, if you don't mind.
If I have more free time, I'd be willing to step in and add scripting functionality.
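
A minimal sketch of why keeping these symbols in the table matters, under stated assumptions: unicodedata again stands in for unaccent, the two-entry CHAR_TABLE below is a hypothetical excerpt, and the "!=" replacement for ≠ is invented for illustration (the thread does not quote the plugin's real value). Because ≠ decomposes into "=" plus a combining overlay, accent stripping silently turns it into "=", while 「 has no decomposition and passes through unchanged. Applying the table before the stripping step avoids both results.

```python
import unicodedata

# Hypothetical excerpt of an explicit table. "\u300c" is the corner bracket
# 「 and "\u2260" is ≠; the "!=" replacement is an assumption for this example.
CHAR_TABLE = {"\u300c": "|-", "\u2260": "!="}


def strip_accents(text):
    """Stand-in for unaccent: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def convert(text):
    """Table lookups run first, so listed symbols never reach strip_accents."""
    replaced = "".join(CHAR_TABLE.get(c, c) for c in text)
    return strip_accents(replaced)


print(strip_accents("a \u2260 b"))   # a = b  (the overlay is dropped)
print(strip_accents("\u300c"))       # 「      (no decomposition, unchanged)
print(convert("\u300ca \u2260 b"))   # |-a != b
```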

Sophist-UK · 2024-11-08T09:43:44Z

In #387 Echelon666 has said:

You have to finish it yourself.

@finem4n Konrad, would you be willing to include the extra characters from the other PR, taking into account @phw Philipp's comments?

finem4n · 2024-11-08T11:37:44Z

@Sophist-UK Yeah, sure.
According to the wiki, one of the transliterations of þ (thorn) is th, not p, so I've changed that. I also went with ASCII representations of ♥ → ・ instead of minus.

phw

Thanks for the update, this looks good to me.

plugins/non_ascii_equivalents/non_ascii_equivalents.py

Echelon666 · 2024-11-13T21:02:18Z

What about this:

"č": "c",
"š": "s",
"ș": "s",

unaccent performs this?

zas · 2024-11-13T21:11:20Z

> What about this:
>
> "č": "c", "š": "s", "ș": "s",
>
> unaccent performs this?

Yes.

>>> unaccent("čšș")
'css'
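
For similar questions about other characters, a small check like the following can help. It is only a sketch: unicodedata stands in for Picard's unaccent, and needs_explicit_mapping is a hypothetical helper, not part of Picard or the plugin. It reports whether accent stripping alone already yields ASCII. Note that it says nothing about whether the ASCII result is the desired one, as the ≠ case discussed earlier shows.

```python
import unicodedata


def strip_accents(text):
    """Stand-in for unaccent: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def needs_explicit_mapping(char):
    """True if stripping accents does not reduce the character to ASCII."""
    return not strip_accents(char).isascii()


for ch in "čšșŁłþ":
    print(ch, needs_explicit_mapping(ch))
```

Under these assumptions the first three characters report False and the last three report True, which matches the decision to keep "Ł", "ł" and þ in the table.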

phw

Thanks a lot

Added polish diacritics in non_ascii_equivalents.py

49dc251

phw requested changes Nov 7, 2024

phw mentioned this pull request Nov 7, 2024

Strengthening an existing plugin “Non-ASCII Equivalents” #387

Closed

implemented picard.util.textencoding.unaccent

0e548e0

Added 'other' characters from metabrainz#387

d65e0f6

Removed trailing whitespace

fcf56e8

phw approved these changes Nov 13, 2024

phw requested a review from zas November 13, 2024 15:08

phw added the enhancement label Nov 13, 2024

zas reviewed Nov 13, 2024

plugins/non_ascii_equivalents/non_ascii_equivalents.py

changed conversion of å to aa and Å to AA

8776866

phw approved these changes Nov 13, 2024

phw merged commit 277aa4d into metabrainz:2.0 Nov 13, 2024
7 checks passed
