Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check for existing crossrefs when importing entries into a library #11922

Closed
wants to merge 15 commits into from

Conversation

tom-pjobrien
Copy link

@tom-pjobrien tom-pjobrien commented Oct 12, 2024

  • Added a check to ensure citation keys generate correctly when importing an already-crossreferenced entry into a library

Before:
image

After:
image

This PR fixes #11136

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code currently does not meet JabRef's code guidelines.
We use Checkstyle to identify issues.
The tool reviewdog already placed comments on GitHub to indicate the places. See the tab "Files" in you PR.
Please carefully follow the setup guide for the codestyle.
Afterwards, please run checkstyle locally and fix the issues.

You can check review dog's comments at the tab "Files changed" of your pull request.

Copy link
Collaborator

@InAnYan InAnYan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some small comments on your PR, but probably you also need a review from main maintainers, as I haven't fully understood the original issue

@tom-pjobrien tom-pjobrien marked this pull request as draft October 12, 2024 08:34
@tom-pjobrien
Copy link
Author

Thanks for the comments @InAnYan, I'll take a look through and update the key generation logic.

@tom-pjobrien tom-pjobrien marked this pull request as ready for review October 15, 2024 03:10
@tom-pjobrien tom-pjobrien requested a review from InAnYan October 15, 2024 03:11
Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange change...

Did you test it?

Comment on lines 203 to 204
finalizedEntries.addAll(entries);
generateKeys(finalizedEntries);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the list? What happens on a second import of other entries? Will the citation keys of the previous entries be replaced?

@tom-pjobrien
Copy link
Author

Strange change...

Did you test it?

I tested this change manually with a few different inputs and variations (e.g importing crossreferenced entries at the same time in different orders, importing them in different imports etc.). It worked with all key patterns I tried as well as for general use.

I used a list since in practice entries are imported and have keys generated one at a time, so adding them to a list as they are imported and then generating keys on all of them at the end is the best way of ensuring crossref relationships are respected in terms of citation keys. Currently if a second import occurs the list will be maintained, but if old keys should not be regenerated that's an easy fix as well.

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add two tests to org.jabref.gui.externalfiles.ImportHandlerTest

One where the List of entries has the cross-refernced entry first, the other where it comes second.

Ensure that the citationkeygenerator is active. (This is a harder one, because the test currently does not have the citation key generator enabled).

This is crucial to automatically check that the poster's issue is resolved:

image


I would suggest to remove the variable finalizedEntries and directly pass entries to generateKeys as I don't see any additional value of this variable.

@tom-pjobrien
Copy link
Author

I can absolutely work on writing some tests.

I would suggest to remove the variable finalizedEntries and directly pass entries to generateKeys as I don't see any additional value of this variable.

I believe that passing entries to generateKeys is what was causing the original issue. importCleanedEntries imports entries one at a time. For example if I used your above reproducer, the first time importCleanedEntries is called it will only generate a key for the first entry, and then the second entry the next time it is called etc.:
image
Essentially entries will always be a size-1 list of a single entry. If a child is imported before a parent this will then lead to its citekey being incorrectly generated. Using a list which collects all entries will 're-generate' the child entry's key with respect to the parent entry. After an import is completed the list is cleared so it does not effect future imports. If there is a better way to implement this please to let me know.

@@ -76,6 +76,9 @@ public class ImportHandler {
private final DialogService dialogService;
private final TaskExecutor taskExecutor;

// list is filled as each entry is passed to importCleanedEntries
private final List<BibEntry> finalizedEntries = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename list to addedEntries. finalized is a strange word.

Also remove final keyword, because the list itself is modified; thus final is misleading.

@koppor
Copy link
Member

koppor commented Oct 17, 2024

I think, this change is required because of our update at #10741 to org.jabref.gui.externalfiles.ImportHandler#importEntriesWithDuplicateCheck -- now, entries get added one by one and not as complete list. - This PR reassembles the complete list. @Siedlerchr I don't see a better solution for now. Do you?

@tom-pjobrien
Copy link
Author

I have made the requested changes and added tests, so everything is now working correctly. However, I have noticed something unusual. The previous case which worked correctly before my changes (parent first, then child) no longer works correctly with a single call of generateKeys, as it seems like some other commit has caused issues with crossref inheritance. Calling generateKeys a first time across all entries before they are added to the database (in importEntriesWithDuplicateCheck) resolves this, so it's a small issue, but I just wanted to point it out.

Comment on lines 420 to 426
LOGGER.debug("not first entry, not pref flag, break will be used");
importEntryWithDuplicateCheck(database, entry);
LOGGER.debug("not first entry, not pref flag, break will be used");
importEntryWithDuplicateCheck(database, entry);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indent.

@koppor
Copy link
Member

koppor commented Oct 19, 2024

The previous case which worked correctly before my changes (parent first, then child) no longer works correctly with a single call of generateKeys, as it seems like some other commit has caused issues with crossref inheritance. Calling generateKeys a first time across all entries before they are added to the database (in importEntriesWithDuplicateCheck) resolves this, so it's a small issue, but I just wanted to point it out.

I do not understand. Is there a test case for it? - You moved generateKeys(addedEntries); down - I haven't checked that method if it also updates the crossrefs.

@tom-pjobrien
Copy link
Author

There is a test case for it and it passes. I had to make some small adjustments to my implementation, but my tests pass and everything is working as expected.

@koppor
Copy link
Member

koppor commented Oct 20, 2024

There is a test case for it and it passes. I had to make some small adjustments to my implementation, but my tests pass and everything is working as expected.

Any hint which concrete test case it is?

I need more time to understand the "small issue" and what needs to be done.


I understand: With your PR: Something is broken which is not broken before.

@koppor
Copy link
Member

koppor commented Oct 20, 2024

To review:

  1. Try if crossreferenced entry citekey changed when pasing into a new library #11136 (comment) works now
  2. Swap the entries of crossreferenced entry citekey changed when pasing into a new library #11136 (comment) and try again (in a new library)
  3. Switch to another library try pasting other entries - does that still work?

Maybe adjust the test text of #11136 (comment) - it seems to be incomplete.

  • Starting with an empty library
  • Configuration of citation key generator

Tests are existing at src/test/java/org/jabref/gui/externalfiles/ImportHandlerTest.java.

But the "smal lissue" described at #11922 (comment) is unclear to me.

@tom-pjobrien
Copy link
Author

Any hint which concrete test case it is?

The test case is importCrossRefParentFirstTest.

But the "smal lissue" described at #11922 (comment) is unclear to me.

It was originally the case that when a parent was imported before a child it used to work before I made any changes, but this stopped working. I made an adjustment to my code so it works again, by initially calling generateKeys across all entries in importEntriesWithDuplicateCheck to establish crossref keys. Each entry is then inserted one at a time in importCleanedEntries where its key is re-generated with duplicate checks + automatic fields in mind. So there is no issue at all anymore, I just wanted to explain my changes. Apologies for the confusion.

Try if #11136 (comment) works now
Swap the entries of #11136 (comment) and try again (in a new library)
Switch to another library try pasting other entries - does that still work?

I have done all of the above and everything is working as expected. I also tried a number of key generation patterns ([year][auth], [title][year] etc.) as well as using my own test data. The parent's year is used correctly for the child's key no matter the order of import/state of the libraries.

@koppor
Copy link
Member

koppor commented Oct 21, 2024

I made an adjustment to my code so it works again, by initially calling generateKeys across all entries in importEntriesWithDuplicateCheck to establish crossref keys.

Strange change. The entries pasted should have working referencing keys. If it only works after an update, something is wrong probably somewhere else.

@Siedlerchr
Copy link
Member

Please be aware that you should only generate keys if the preferences are set but I think that generateCitaitonKeys already has a check

@koppor koppor added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Oct 23, 2024
@koppor
Copy link
Member

koppor commented Oct 27, 2024

Please be aware that you should only generate keys if the preferences are set but I think that generateCitaitonKeys already has a check

Already handled:

    private void generateKeys(List<BibEntry> entries) {
        if (!preferences.getImporterPreferences().isGenerateNewKeyOnImport()) {
            return;
        }

Our method name is not that good.

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not work at all.

image

The Crossref value is NOT updated on key generation. This needs to be fixed.

org.jabref.logic.citationkeypattern.CitationKeyGenerator#generateAndSetKey needs to be modifiered to return a List<FieldChange>. (List is empty on no change)

org.jabref.logic.citationkeypattern.CitationKeyGenerator#generateKey needs to be adpated to go through this.database.getEntries() and check crossref of each entry. If it is the old key: replace with new one and break.

You can try out with following entries - there is no year used for citaton key generation on paste for the second article.

@Article{parent,
  author   = {first-author},
  year     = {2024},
}

@Article{,
  author   = {third-best},
  title    = {title},
  crossref = {parent},
}

Thus,

@koppor koppor added status: changes required Pull requests that are not yet complete and removed status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers labels Oct 27, 2024
@koppor
Copy link
Member

koppor commented Oct 30, 2024

I think, citaton keys are properly updated using the org.jabref.model.database.KeyChangeListener.

@koppor
Copy link
Member

koppor commented Dec 9, 2024

Closing this issue due to inactivity 💤

Please ping us if you intend to resume work on this one.

@calixtus calixtus closed this Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: changes required Pull requests that are not yet complete
Projects
None yet
Development

Successfully merging this pull request may close these issues.

crossreferenced entry citekey changed when pasing into a new library
6 participants