Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strengthening an existing plugin “Non-ASCII Equivalents” #387

Closed
wants to merge 3 commits into from

Conversation

Echelon666
Copy link

I added 41 new characters to the existing “Non-ASCII Equivalents” plugin:

18 Polish diacritics
23 "My others" special characters that were junk in the ID3 tag of my MP3 collection.

thread on forum:

https://community.metabrainz.org/t/strengthening-an-existing-plugin-non-ascii-equivalents/722822

Echelon666 and others added 3 commits November 6, 2024 22:46
I added 41 new characters to the existing “Non-ASCII Equivalents” plugin:

18 Polish diacritics
23 "My others" special characters that were junk in the ID3 tag of my MP3 collection.

Co-Authored-By: Anderson Mesquita <[email protected]>
@Echelon666
Copy link
Author

I don't know how to fix the Codacy warning. I don't want to break the code.

Copy link
Member

@phw phw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your changes. In general the additions make sense, but I have two concerns here:

  1. I think this plugin would be more useful as a separate plugin if it would provide a scripting function instead of just doing the replacement in 3 arbitrary tags. See my comment below.
    Otherwise the extended replacements should become part of the existing plugin. I don't see why we need an essentially identical plugin in the plugin list, this just causes confusion.

  2. The simple accented characters that just get replaced with their unaccented base character should be removed from the static replacements. Instead Picard's built-in unaccent function should be run on the string. This will take care of all the accented cases here and all those that are probably missing.

return char


def ascii(word):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ascii is a built-in Python function, and the custom function is hiding that. Hence the warning. I'd recommend renaming this function to something else (replace_ascii_equivalents?)

"・": "-",
"☆": "-",
"★": "-",
"/": ",",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the four chars below don't seem to belong there. Those are all valid ASCII characters and I don't see why those should be unwanted in tags.

For Windows filenames they are not allowed, but those characters get replaced anyway by Picard in filenames and there are existing settings for those.

"album",
"artist",
"title",
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this limitation to those three tags is one of the weaknesses of the current plugin. This seems very arbitrary and limiting.

What I think would be much more useful is if the plugin would register a scripting function, see https://picard-docs.musicbrainz.org/en/appendices/plugins_api.html#tagger-script-functions . If there was a scripting function $ascii(...) or such (maybe someone has a better name) then users would be flexible to replace characters in whatever tag they want:

$set(composer,$ascii(%composer%))

"→": "-",
"・": "-",
"☆": "-",
"★": "-",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better replace with ASCII star *

"í": "i", "ì": "i", "ï": "i", "î": "i",
"ó": "o", "ò": "o", "ö": "o", "ô": "o",
"ú": "u", "ù": "u", "ü": "u", "û": "u",
"ý": "y", "ỳ": "y", "ÿ": "y", "ŷ": "y",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the above, some of the ones in "Misc Letters" and most of the new "Polish letters" you added can be handled by Picard's built-in function picard.util.textencoding.unaccent.

from picard.util.textencoding import unaccent

print(unaccent("ÄŽ"))  # this will print "AZ"
```

"Ç": "C",
"ç": "c",
"Ñ": "N",
"ñ": "n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above four will be handled by unaccent

"Ł": "L",
"ł": "l",
"Ń": "N",
"ń": "n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for "Ł" and "ł" all the above will be handled by unaccent



def ascii(word):
return "".join(sanitize(char) for char in word)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest getting rid of the diacritic characters in the replacement tables as noted above and use Picard's unaccent function instead as I described above. Then do this here:

    word = "".join(sanitize(char) for char in word)
    return unaccent(word)

@phw
Copy link
Member

phw commented Nov 7, 2024

@Echelon666 @finem4n So we currently have two similar PRs, #386 and #387. I think you guys should see what changes we make.

If Echelon666 is fine reworking this plugin to provide a scripting function I think this warrants this as a separate plugin. The existing non_ascii_equivalents plugin can still be extended with additional tags as in #386 and reworked to use unaccent function.

@Sophist-UK
Copy link
Contributor

Sophist-UK commented Nov 7, 2024

I am entirely unclear why we are creating an entirely new "enhanced" plugin here rather than simply enhancing the existing one.

This is like me saying I want to tweak something in Picard, and I do so by publishing Picard Enhanced which does exactly what Picard does but with my additional tweak.

This is a route to madness - an exponentially expanding ecosystem of almost identical functionality, but each slightly different from the other. If I wanted to enhance this plugin further, am I supposed to publish a "Non-ASCII Equivalents Enhanced Extended" version - and what happens when (sooner) we run out of synonyms for "Enhanced" OR (later) it literally takes more characters for the name than there are atoms in the Universe?

Obviously you need to be wary of introducing backwards incompatibilities that would significantly change the outcome for someone upgrading in ways they wouldn't want. I do NOT think that is the case here, but even if it is was backwardly incompatible, I would expect to see minor incompatibilities handled with an option not by creating an entirely different but extremely similar competing plugin.

P.S. It also makes reviewing the code more difficult because you don't get a DIFF showing what has changed from the original.

P.P.S. I did suggest to @Echelon666 in the forums BEFORE he created this PR that this should be an update to the existing plugin, but that advice was apparently ignored. See also #384 where I took the time to provide an example of his proposed plugin with a better coding style, but @Echelon666 also ignored advice and review comments. TBH I am somewhat sadly forming the impression that it is complete waste of time reviewing PRs from him, because review comments and advice are never accepted and in the end the PR simply languishes open but unmergeable.

@phw said:

If Echelon666 is fine reworking this plugin to provide a scripting function I think this warrants this as a separate plugin.

I disagree. Adding a scripting function is a non-breaking change.

@phw
Copy link
Member

phw commented Nov 7, 2024

I disagree. Adding a scripting function is a non-breaking change.

My problem with the existing plugin is that it does all this transformation on existing kind of arbitrary existing tags. Adding a function is non-breaking, but removing the automatic conversion of tags would be. So when installing the plugin one would need to accept this automatic conversion.

On the other hand we could also add such a function to Picard proper.

Overall #386 is the better PR currently, though.

@Sophist-UK
Copy link
Contributor

Sophist-UK commented Nov 7, 2024

No - whilst adding a hardcoded expanded list of tags to apply this to would indeed be breaking, instead providing a UI options page where the default set of tags is the existing set would be non-breaking. IMO we really, really need to avoid proliferation of highly similar plugins and instead have one plugin with options.

@phw
Copy link
Member

phw commented Nov 7, 2024

IMO we really, really need to avoid proliferation of highly similar plugins and instead have one plugin with options.

Yes, I agree with that. The non_ascii_equivalents as it exists currently is for that reason something I definitely would not want to port to the new plugin system as it is.

@Sophist-UK
Copy link
Contributor

@Echelon666 When you make a commit, please provide an explanatory commit comments so that we can immediately see what your commit is intended to do.

@Echelon666
Copy link
Author

OK. Thanks.

@Echelon666
Copy link
Author

I removed Trailing whitespace and repeated Ó.

I can't help you more because I don't know Python and Github very well.

You have to finish it yourself.

finem4n added a commit to finem4n/picard-plugins that referenced this pull request Nov 8, 2024
@phw
Copy link
Member

phw commented Nov 13, 2024

@Echelon666 I think the earlier PR #386, which now has also been updated, better addresses improving the plugin. With these changes most of what you wanted to have improved, specifically with handling Polish characters, will be solved in the existing plugin.

@Echelon666
Copy link
Author

@phw

But what do you suggest? That I remove my (this) PR?

Let it be. ;) Just give me a sign.

I will have my plugin for private use anyway, because in addition to Polish characters it has to remove funny characters.

@Sophist-UK
Copy link
Contributor

Sophist-UK commented Nov 13, 2024

Philipp is suggesting that you close this PR because:
a. Your ideas were incorporated into another PR which is going to get merged; and
b. Because you said "I can't help you more because I don't know Python and Github very well. You have to finish it yourself." and no one is prepared to finish it for you because of your attitude.

I would imagine that he would prefer that you close it yourself unmerged, rather than for him to do it unilaterally.

And don't forget that I predicted this a few days ago in a comment elsewhere.

@Echelon666
Copy link
Author

@Sophist-UK can you shut up?

@Sophist-UK
Copy link
Contributor

Sophist-UK commented Nov 13, 2024

Let it be. ;) Just give me a sign.

You quite literally asked for someone to tell you what to do. So I did - admittedly pretty bluntly.

And then you blame ME and tell me to shut up?

Given our shared history of me being blunt with you on several occasions now, did you really think that telling me to "shut up" would work rather than just prompt a further blunt response?

FFS, grow up, act more maturely and take responsibility for the consequences of your own words and attitude.

@Echelon666
Copy link
Author

Go away, you loser.

phw pushed a commit that referenced this pull request Nov 13, 2024
@phw
Copy link
Member

phw commented Nov 14, 2024

Yes, I'll close this. There have been two very identical PRs, and the other one provided the significant changes to the existing plugin. Coding changes can take time, and oftentimes one needs to go one step back to move forward. Also a plugin like this always can be changed and extended for personal use easily if wanted.

@Echelon666 It's one thing to argue, but insulting others here is not acceptable. @Sophist-UK, who has been around for a long time and knows Picard and the plugins well, has been trying to help you despite all difficulties and disagreements.

I'll close this PR now, but I expect all future discussions here and on the forums to be civilized.

@phw phw closed this Nov 14, 2024
@metabrainz metabrainz locked as too heated and limited conversation to collaborators Nov 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants