Use periods consistently #79

junoslukan · 2021-02-23T15:46:39Z

As it stands, some abbreviations include periods, while others don't. Compare for example "Quality Assurance in Health Care" (Qual. Assur. Health Care) and "Quality Assurance in the Fish Industry" (Dev Food Sci).

Is there any reason for this inconsistency? I think it would be best to either always use periods or never use them, and possibly to leave that choice up to the user.
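To get a feeling for how mixed a given list is, something like the following could count entries with and without periods. This is only a rough sketch: the function name `count_period_usage` is made up, and it assumes a semicolon-separated layout with the full name in the first column and the abbreviation in the second.

```python
import csv
import io


def count_period_usage(csv_text, delimiter=";"):
    """Count abbreviations that contain at least one period vs. none.

    Assumes column 0 is the full journal name and column 1 the abbreviation.
    """
    counts = {"with_period": 0, "without_period": 0}
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        if len(row) < 2:
            # Skip empty or malformed lines
            continue
        key = "with_period" if "." in row[1] else "without_period"
        counts[key] += 1
    return counts


# The two entries quoted in this issue
sample = (
    "Quality Assurance in Health Care;Qual. Assur. Health Care\n"
    "Quality Assurance in the Fish Industry;Dev Food Sci\n"
)
print(count_period_usage(sample))  # {'with_period': 1, 'without_period': 1}
```

A list where both counters are non-zero is at least a candidate for a closer look, although unabbreviated single-word titles would also land in the "without_period" bucket.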

koppor · 2021-02-25T14:43:46Z

This somehow relates to #54.

I think we did not properly document how the journal abbreviation lists are created, combined, and so on.

We have some initial documentation at https://docs.jabref.org/advanced/journalabbreviations. And we are automatically importing the journal lists from here via https://github.com/JabRef/jabref/blob/master/.github/workflows/refresh-journal-lists.yml.

I currently have no time to dive into this topic further. Maybe you can? I know this is hard stuff and will take much time.

junoslukan · 2021-02-27T17:41:44Z

Hi koppor, thanks for looking into this.

I am not sure how the lists are created. It says in the workflow you linked to:

      # remove all lists without dot in them
      # we use abbrevatiation lists containing dots in them only (to be consistent)

But on the other hand there is a Python script in the merge you linked to that uses those lists.

Anyway, one idea for adding the dots would be to add them to every word except the non-abbreviated ones. For that, you could look for exact matches between the words in the abbreviated and non-abbreviated columns and assume that words which are identical in both are non-abbreviated and require no dot.

I am not sure if that makes sense for all cases, as it does not work for the example I listed ("Quality Assurance in the Fish Industry" -> "Dev Food Sci"), but to be honest, I am not sure whether that entry is simply an error anyway.

I also don't know if you would want to go ahead with modifying the lists at all or rather keep them intact and separate them into "with dots" and "dotless" only.

Let me know what needs to be done and I can judge better if I would be able to do that.

koppor · 2021-09-17T17:36:11Z

We have no documentation on how the lists are created. One has to check the log of each file at https://github.com/JabRef/abbrv.jabref.org/tree/main/journals. I hope someone refines the README.md to state the source of each list.

One has to note that:

- The lists in JabRef's distribution are created at https://github.com/JabRef/jabref/blob/3d7521c29145a24f473efe9a743d8d730ceef588/.github/workflows/refresh-journal-lists.yml#L48. It takes the dotless lists only.
- I have no clue who uses the Python scripts. The list types (dot / dotless) are hardcoded in the Python scripts, for example in abbrv.jabref.org/combine_journal_lists_dots.py, line 19 in 08255f6:

      'journals/journal_abbreviations_acs.csv',

- I think the script https://github.com/JabRef/abbrv.jabref.org/blob/main/combine_journal_lists.py is only there for completeness.

Krzmbrzl · 2022-02-22T07:26:13Z

> Anyway, one idea for adding the dots would be to add them to every word except the non-abbreviated ones. For that, you could look for exact matches between the words in the abbreviated and non-abbreviated columns and assume that words which are identical in both are non-abbreviated and require no dot.

I second this approach. It seems like it would be easy to automate, and it also appears to cover most cases (correctly).
Are there objections to adding a script to this repo that applies this rule to all CSV files and that is also run as a PR check to ensure that proper punctuation continues to be used?
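A check-only variant of the rule could look roughly like this. It is a minimal sketch under my own assumptions: the names `expected_abbreviation` and `find_violations` are made up, and the set of punctuation characters stripped from full-name words is a guess.

```python
# Punctuation removed from full-name words before comparing (assumed set)
_STRIP = ",:;()[]{}\"'"


def expected_abbreviation(full_name, abbreviation):
    """Apply the rule: a word that appears verbatim in the full name gets no
    period, every other word gets exactly one trailing period."""
    full_words = {w.strip(_STRIP).lower() for w in full_name.split()}
    result = []
    for word in abbreviation.split():
        bare = word.rstrip(".")
        if bare.lower() in full_words:
            result.append(bare)
        else:
            result.append(bare + ".")
    return " ".join(result)


def find_violations(rows):
    """Return the (full name, abbreviation) pairs that do not follow the rule."""
    return [
        (full, abbr)
        for full, abbr in rows
        if expected_abbreviation(full, abbr) != abbr
    ]


rows = [
    ("Quality Assurance in Health Care", "Qual. Assur. Health Care"),
    ("Quality Assurance in Health Care", "Qual Assur Health Care"),
]
for full, abbr in find_violations(rows):
    print("inconsistent:", full, "->", abbr)
```

A PR check would then only need to fail (exit non-zero) whenever `find_violations` returns a non-empty list, without rewriting any file.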

Siedlerchr · 2022-02-22T08:07:38Z

@Krzmbrzl I think this sounds like a valid idea! You are welcome to provide a script!

Krzmbrzl · 2022-02-22T17:57:37Z

So I gave this a try and came up with

#!/usr/bin/env python3


import argparse
import os
import csv

# A list of CSV filenames that shall be excluded from being processed
blacklist = [
        "journal_abbreviations_annee-philologique.csv"
]

def main():
    parser = argparse.ArgumentParser(description="This script will make the use of periods in journal name abbreviations consistent (make sure they are used)")
    parser.add_argument("--journal-dir", help="The path to the directory containing the journal CSVs", metavar="PATH", default="journals")

    args = parser.parse_args()

    for currentFileName in os.listdir(args.journal_dir):
        if currentFileName in blacklist or not currentFileName.endswith(".csv"):
            continue

        changedEntries = 0
        changedRows = []

        with open(os.path.join(args.journal_dir, currentFileName), "r", newline="") as currentFile:
            # Assume files are small enough to easily fit in memory
            reader = csv.reader(currentFile, delimiter=";")

            for row in reader:
                if len(row) == 0:
                    # Skip empty lines
                    continue

                # columns are separated by semicolon

                assert len(row) >= 2 and len(row) <= 4, "Invalid column count in CSV file"

                fullName = row[0]
                abbreviation = row[1]
                # shortestUniqueAbbreviation = row[2]
                # frequency = row[3]

                specialChars = [",", ":", ";", "(", ")", "[", "]", "{", "}", "\"", "'"]

                fullWords = [x.strip().lower() for x in fullName.split(" ")]
                abbrWords = [x.strip() for x in abbreviation.split(" ")]

                # Replace special chars in full word list
                for currentChar in specialChars:
                    for i in range(len(fullWords)):
                        fullWords[i] = fullWords[i].replace(currentChar, "")

                # Remove empty entries
                fullWords = list(filter(None, fullWords))
                abbrWords = list(filter(None, abbrWords))

                changed = False

                for i in range(len(abbrWords)):
                    if any(char in specialChars for char in abbrWords[i]):
                        # Word contains a special character -> rather leave it alone
                        continue
                    if "-" in abbrWords[i]:
                        # Dashes in words are suspicious as well -> let's rather not touch these
                        continue

                    if abbrWords[i].endswith("."):
                        # Is already using a period
                        if abbrWords[i][ : -1].lower() in fullWords:
                            # The word was used as an abbreviation, but it appears that it wasn't really abbreviated -> remove period
                            abbrWords[i] = abbrWords[i][ : -1]
                            changed = True
                    else: 
                        if abbrWords[i].lower() in fullWords:
                            # Assume that every word that appears in the full journal name as-is, is not
                            # abbreviated and therefore also should not get a period attached to it
                            continue
                        else:
                            # Since the current word is not part of the full journal name, we assume that it
                            # was abbreviated and thus, we add a period to it
                            abbrWords[i] += "."
                            changed = True

                if changed:
                    changedEntries += 1
                    row[1] = " ".join(abbrWords)
                    # print(" ".join(fullWords), "->", " ".join(abbrWords), "(", abbreviation, ")")

                changedRows.append(row)

        if changedEntries > 0:
            # Write out new content
            with open(os.path.join(args.journal_dir, currentFileName), "w", newline="") as currentFile:
                writer = csv.writer(currentFile, delimiter=";", lineterminator="\n")
                writer.writerows(changedRows)

        print("======== Changed %d entries for %s" % (changedEntries, currentFile.name))




if __name__ == "__main__":
    main()

However, as it turns out, there appear to be too many exceptions that can't be handled properly using this simple approach. From what I have seen so far, I would even go as far as to say that an automated approach is probably not really feasible and any changes have to be performed manually by someone who knows what they are doing 🤷

Siedlerchr · 2022-02-22T18:06:24Z

So we need to have separate lists with dots and without dots? Maybe your script can generate a basis...

Krzmbrzl · 2022-02-22T18:33:07Z

Nah, I think there just is no good way of automating this. There are several journal abbreviations that add e.g. a Sect. A to the abbreviation, even though the full name does not contain any kind of reference to a section A. Therefore, it is not possible to deduce whether the "A" is an abbreviation for something or whether it is meant literally.
The same issue occurs where certain journal abbreviations use city names that don't appear in the full name.

However, baking all these cases into the code as special cases would probably result in approximately the same amount of work as fixing this by hand.

Finally, there also seem to exist groups of journals that use no punctuation at all. They just combine the starting letters of the full name parts into a word (fictional example: Science Magazine -> SM).
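What a script could still do is flag such entries for manual review instead of trying to fix them. The following is only a sketch of that idea: `unexplained_words` and `looks_like_acronym` are hypothetical helpers, and the prefix test is a crude heuristic that will both miss cases and raise false alarms.

```python
def unexplained_words(full_name, abbreviation):
    """Return abbreviation words that are not a prefix of any full-name word.

    Such words (added sections, city names, ...) cannot be classified as
    abbreviated or literal from the full name alone.
    """
    full_words = [w.strip(",:;()[]{}\"'").lower() for w in full_name.split()]
    leftovers = []
    for word in abbreviation.split():
        stem = word.rstrip(".").lower()
        if stem and not any(fw.startswith(stem) for fw in full_words):
            leftovers.append(word)
    return leftovers


def looks_like_acronym(full_name, abbreviation):
    """True if the abbreviation is one word built from the initials of the
    full name's words (e.g. the fictional Science Magazine -> SM)."""
    initials = "".join(w[0] for w in full_name.split())
    abbr = abbreviation.strip()
    return " " not in abbr and abbr.strip(".").upper() == initials.upper()


# Entry from this issue: none of the abbreviation words match the full name
print(unexplained_words("Quality Assurance in the Fish Industry", "Dev Food Sci"))
# Fictional acronym example from above
print(looks_like_acronym("Science Magazine", "SM"))
```

Entries for which `unexplained_words` returns something, or for which `looks_like_acronym` is true, would be the ones to leave untouched and hand to a human.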

northword · 2024-10-01T09:37:14Z

I think we should use ISO 4 journal abbreviations with periods, because removing periods is much more reliable than adding them, as suggested by another similar program, Zotero ¹.

JabRef could perhaps have a built-in feature for removing periods, or this repository could store only abbreviations with periods and output a copy of the abbreviations without periods (in a combine script).
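Deriving the dotless copy could be as small as the sketch below. The function names are hypothetical, the semicolon-separated two-column layout is an assumption, and this is not how JabRef or the existing combine scripts actually do it.

```python
import csv
import io


def strip_periods(abbreviation):
    """Replace every period with a space and collapse repeated whitespace."""
    return " ".join(abbreviation.replace(".", " ").split())


def dotless_copy(csv_text, delimiter=";"):
    """Return the CSV text with periods removed from the abbreviation column.

    Assumes column 0 is the full name and column 1 the dotted abbreviation.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        if len(row) >= 2:
            row[1] = strip_periods(row[1])
        writer.writerow(row)
    return out.getvalue()


dotted = "Quality Assurance in Health Care;Qual. Assur. Health Care\n"
print(dotless_copy(dotted), end="")
# Quality Assurance in Health Care;Qual Assur Health Care
```

Replacing the period with a space (rather than deleting it) keeps words apart when the dotted form has no space after a period. Whether that is the desired dotless spelling for every list is a separate question.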

¹ https://www.zotero.org/support/adding_items_to_zotero#journal_abbreviations

koppor · 2024-10-01T12:38:52Z

Our documentation at https://docs.jabref.org/advanced/journalabbreviations has journal lists with dots. We have documentation on our lists at https://github.com/JabRef/abbrv.jabref.org/tree/main/journals#readme.

"Entrez" and "Index Medicus" provide dotless abbreviations only. How should these be handled?

I think JabRef/jabref#10557 needs to be fixed. Then the combined journal list becomes obsolete, and we can do lists per area (e.g., medicine, computer science, ...).

koppor mentioned this issue Sep 18, 2021: Create journal_abbreviations_jcr.csv #88 (closed)

Siedlerchr mentioned this issue Jan 2, 2022: JabRef does not open after csv file was edited by hand JabRef/jabref#8376 (closed)

anntzer mentioned this issue Jan 6, 2022: journals.json is inconsistent as to whether abbreviations are dotted texworld/betterbib#241 (closed)

Siedlerchr mentioned this issue Feb 21, 2022: Journal abbreviations are missing periods JabRef/jabref#8516 (closed)
