Add quality checker #149

Open
koppor opened this issue Sep 3, 2023 · 9 comments · May be fixed by #170

Comments

@koppor
Member

koppor commented Sep 3, 2023

I had to fix the lists because "wrong" lists had been checked in. See #148.

We should have a checker. It should check the following:

ERROR: Wrong escape

"Zeszyty Naukowe Wy\","Problemy Mat."
"Journal of Evolutionary Biochemistry and Physiology\","J. Evol. Biochem. Physiol."

ERROR: Wrong beginning letters

"Zeszyty Naukowe Wy\","Problemy Mat."

(This is #107)
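The two ERROR checks above could be sketched roughly as follows. This is only an illustration, assuming each list is a two-column CSV (full name, abbreviation); the names `first_letter` and `check_rows` are hypothetical and not part of the repository.

```python
import csv
import io


def first_letter(text):
    """Return the first alphabetic character of text, lowercased, or '' if none."""
    for char in text:
        if char.isalpha():
            return char.lower()
    return ""


def check_rows(csv_text):
    """Return (line_number, message) pairs for the two ERROR checks."""
    findings = []
    reader = csv.reader(io.StringIO(csv_text))
    for row in reader:
        if len(row) < 2:
            continue
        full, abbrev = row[0], row[1]
        # A field ending in a backslash usually means a broken escape before the quote.
        if full.endswith("\\") or abbrev.endswith("\\"):
            findings.append((reader.line_num, "wrong escape"))
        # Assumption: a valid abbreviation starts with the same letter as the full name.
        if first_letter(full) != first_letter(abbrev):
            findings.append((reader.line_num, "wrong beginning letters"))
    return findings
```

The first-letter rule is deliberately naive, so it will also flag legitimate entries whose abbreviation is derived from a different form of the name.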

ERROR: List contains non-UTF8 characters

This is #125.
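One possible way to detect non-UTF-8 content is to read the file as bytes and try to decode it line by line. A minimal sketch, where `find_non_utf8_lines` is a hypothetical helper name:

```python
def find_non_utf8_lines(raw):
    """Return the 1-based numbers of lines in raw (bytes) that are not valid UTF-8."""
    bad_lines = []
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append(number)
    return bad_lines
```

Reading with `open(path, "rb").read()` and passing the bytes in avoids the decoder aborting on the first bad byte, so every offending line can be reported.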

WARN: Double entries

"Advances in Applied Mathematics","Adv. Appl. Math."
"Advances in Applied Mathematics","Adv. in Appl. Math."

(This refs #77)

WARN: Same full form appearing twice

"Advances in Applied Mathematics","Adv. Appl. Math."
"Advances in Applied Mathematics","Adv. in Appl. Math."

(This refs #77)

WARN: Same abbreviation appearing twice

"Advances in Data Analysis and Classification. ADAC","Adv. Data Anal. Classif."
"Advances in Data Analysis and Classification. ADAC. Theory, Methods, and Applications in Data Science","Adv. Data Anal. Classif."

(This refs #77)
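The duplicate warnings above could be collected in one pass by grouping rows in both directions. A sketch under the assumption of a two-column CSV; `find_duplicates` is a hypothetical name:

```python
import csv
import io
from collections import defaultdict


def find_duplicates(csv_text):
    """Return (same_full, same_abbrev) dicts mapping a key to its conflicting values."""
    by_full = defaultdict(set)
    by_abbrev = defaultdict(set)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue
        full, abbrev = row[0], row[1]
        by_full[full].add(abbrev)
        by_abbrev[abbrev].add(full)
    # Keep only keys that map to more than one distinct value.
    same_full = {k: sorted(v) for k, v in by_full.items() if len(v) > 1}
    same_abbrev = {k: sorted(v) for k, v in by_abbrev.items() if len(v) > 1}
    return same_full, same_abbrev
```

Because sets are used, rows that are exact copies of each other are not reported here; counting identical rows would need a separate counter.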

WARN: abbreviation is the same as the full text

"Quantum","Quantum"

WARN: Management is abbreviated with outdated "Manage." instead of "Manag."

This is #78

@northword
Contributor

WARN: abbreviation is the same as the full text

When a journal name is only one word, its abbreviation is the same as the full name.
E.g., full name: Fuel; its abbreviation is Fuel.

@philcaz

philcaz commented Oct 16, 2024

Hi, I would like to tackle this issue with my group : )

@koppor moved this from "Free to take" to "Assigned" in Good First Issues, Oct 16, 2024
@koppor
Member Author

koppor commented Oct 16, 2024

@northword I think the expected result is a Python tool residing in https://github.com/JabRef/abbrv.jabref.org/tree/main/scripts. It should print the issues it finds and exit with a failure code if there are any. You can choose another programming language if you want.

Example output of lychee, which serves a different purpose but also prints check results:

Image

(Source: https://github.com/JabRef/jabref/actions/runs/11361716475)
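The "print issues and exit with a failure code" behaviour might look roughly like the skeleton below. It is a sketch only: `check_file` and `main` are hypothetical names, and the single column-count check stands in for the real checks.

```python
import csv


def check_file(path):
    """Return a list of (level, line_number, message) tuples for one CSV file."""
    findings = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        for row in reader:
            # Placeholder check: every non-empty row needs a full name and an abbreviation.
            if row and len(row) < 2:
                findings.append(("ERROR", reader.line_num, "expected at least two columns"))
    return findings


def main(paths):
    """Print all findings and return 1 if any were found, else 0."""
    failed = False
    for path in paths:
        for level, line, message in check_file(path):
            print(f"{level}: {path}:{line}: {message}")
            failed = True
    return 1 if failed else 0

# In a script, the return value would be passed on as the process exit code,
# for example: raise SystemExit(main(sys.argv[1:])) after "import sys".
```

Returning the code from `main` instead of exiting inside it keeps the function easy to call from tests. Whether warnings alone should fail the run is left open here.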

@philcaz

philcaz commented Oct 19, 2024

Hey, when implementing the check logic for 'WARN: abbreviation is the same as the full text,' should we only give a warning if the journal's name has more than one word and the abbreviation is the same as its full name? If the journal name is just one word, as @northword mentioned, should we simply pass it?

@koppor
Member Author

koppor commented Oct 19, 2024

> Hey, when implementing the check logic for 'WARN: abbreviation is the same as the full text,' should we only give a warning if the journal's name has more than one word and the abbreviation is the same as its full name? If the journal name is just one word, as @northword mentioned, should we simply pass it?

Yes.
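The rule agreed on here could be expressed as a small predicate. A sketch, with `should_warn_same_as_full` as a hypothetical name and "Applied Optics" in the usage note as a made-up test value:

```python
def should_warn_same_as_full(full, abbrev):
    """Warn only when a multi-word full name is used unchanged as its abbreviation."""
    is_multi_word = len(full.split()) > 1
    return is_multi_word and full == abbrev
```

Under this rule, single-word entries such as "Fuel","Fuel" or "Quantum","Quantum" pass silently, while a pair like "Applied Optics","Applied Optics" would produce a warning.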

@philcaz

philcaz commented Oct 19, 2024

My current function that checks the starting letters of abbreviations flags the entries below as invalid, because the starting letters of the abbreviations do not match the full names.

Full: 'Polish Academy of Sciences', Abbrev: 'Acta Phys. Polon. A'
Full: 'Jagellonian University', Abbrev: 'Acta Phys. Polon. B'
Full: 'Universităţii din Timișoara', Abbrev: 'An. Univ. Timișoara Ser. Mat.-Inform.'
Full: 'Universităţii "Ovidius" Constanţa', Abbrev: 'An. Ştiinţ. Univ. Ovidius Constanţa Ser. Mat.'

However, these abbreviations seem to be legitimate for the corresponding full names, even though the connection is not obvious. Do you have any ideas on how I should refine the criteria for invalidity?

@koppor
Member Author

koppor commented Oct 19, 2024

Maybe a hard coded list of exceptions? 😅

@philcaz

philcaz commented Oct 19, 2024

Not sure how many there are to be hardcoded : ( I might try using some similarity threshold to check them. That way abbreviations that are legitimate but are too different from the original full names would fail the check. Does that work?
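One way to try the similarity-threshold idea is with `difflib.SequenceMatcher` from the Python standard library. This is a sketch, not the approach taken in the linked pull request; `similarity`, `is_suspicious`, and the default threshold of 0.5 are assumptions that would need tuning against the real lists.

```python
from difflib import SequenceMatcher


def similarity(full, abbrev):
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical (ignoring case)."""
    return SequenceMatcher(None, full.lower(), abbrev.lower()).ratio()


def is_suspicious(full, abbrev, threshold=0.5):
    """Flag pairs whose similarity falls below the (arbitrary) threshold."""
    return similarity(full, abbrev) < threshold
```

Printing the ratio next to each flagged pair would make it easier to see where a sensible cut-off lies.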

@koppor
Member Author

koppor commented Oct 19, 2024

> Not sure how many there are to be hardcoded : ( I might try using some similarity threshold to check them. That way abbreviations that are legitimate but are too different from the original full names would fail the check. Does that work?

I haven't tried.

Maybe test cases need to be generated.

Maybe warnings can be output, and the user can then create an exception file, similar to .lycheeignore for the link checker lychee.

One might also output a number stating the distance.

For manual lists, this is helpful.

For downloaded lists, reports could be made.

I think there are bugs in the lists.
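The exception-file idea could be sketched like this. The file format (one full journal name per line, `#` for comments) and the names `load_ignore_list` and `filter_warnings` are assumptions made for illustration; nothing here is an existing convention of the repository or of lychee.

```python
def load_ignore_list(text):
    """Parse ignore-file content: one full journal name per line, '#' starts a comment."""
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.add(line)
    return entries


def filter_warnings(warnings, ignored):
    """Drop warnings whose full name is listed in the ignore set.

    warnings is a list of (full_name, abbreviation, distance) tuples.
    """
    return [warning for warning in warnings if warning[0] not in ignored]
```

Keeping the distance in each warning tuple means it can be printed next to the entry, as suggested above.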

@philcaz linked a pull request on Oct 20, 2024 that will close this issue
Projects
Status: In Progress

4 participants