Regular expression in search field and group delivers strange results #12163

Closed
2 tasks done
ytzemih opened this issue Nov 7, 2024 · 13 comments

Comments

@ytzemih

ytzemih commented Nov 7, 2024

JabRef version

Latest development branch build (please note build date below)
I tried the build from 2024-11-05 23:12

Operating system

GNU / Linux

Details on version and operating system

Linux Mint/Debian 12

Checked with the latest development build (copy version output from About dialog)

  • I made a backup of my libraries before testing the latest development version.
  • I have tested the latest development version and the problem persists

Steps to reproduce the behaviour

  1. Use, for example, a complex search expression "(title = a|b|c) or (keywords = d|e|f)", which works just fine for me under JR 5.15, either in the search field or as an expression in some search group.
  2. Make sure "regular expression" checkbox/feature is turned on.
  3. You get an empty result or at least a result that does not include entries expected to be included.
  4. (optional) When changing the search expression a few times (extend it, correct, etc.), the entry list slowly degrades: the filter seems to fail to update properly and empty entries appear. Scrolling makes everything worse.
    There are perhaps several issues at play. But my focus was on getting a search done. The migration back to the old (nice) search seems not to be a faithful migration. I discovered these issues after migrating the search expressions from 5.16 back to the old search. Thanks for checking this.

I have switched back to JR 5.15 and the back-migrated search expressions work pretty well again.

Appendix

...

Log File
Paste an excerpt of your log file here
@ytzemih ytzemih changed the title Search expression in search field and search groups deliver strange results Regular expression in search field and group delivers strange results Nov 7, 2024
@koppor koppor added the search label Nov 8, 2024
@ryan-carpenter

ryan-carpenter commented Nov 8, 2024

The regex option has no effect on "field = value" expressions. To use regex with field names, the expression must have the form "field =~ value", which will apply the regular expression regardless of the ".*" regex option.
To put it another way, using field = myterm explicitly disables regex while field =~ myterm explicitly enables it, on this term only without affecting the rest of the search. Note that the "abc" case-sensitive option follows the same principle.

The idea makes sense, because it allows regex and non-regex terms to coexist in the same search. However, in practice this is totally unintuitive and not worth the trade-off. My suggestion for the maintainers is to keep "field =~ value" explicit (always apply regex syntax for this term) and make "field = value" apply standard or regex syntax, depending on the regex button/checkmark. In other words, = and =~ should be treated as equivalent when the regex option is enabled.

Personally, I keep regex enabled all the time, so adding escape characters as needed has become second nature.

This is how the search currently works in the development version.

| Terms | Regex | Term 1 | Term 2 |
|---|---|---|---|
| title =~ pa*ediatric AND 1.0 | Off | Matches "paediatric", "pediatric" | Matches "1.0" |
| title =~ pa*ediatric AND 1.0 | On | Matches "paediatric", "pediatric" | Matches "1.0", "1+0", "1/0", "1q0", ... |
| title = pa*ediatric AND 1.0 | Off | No match. Regex is disabled | Matches "1.0" |
| title = pa*ediatric AND 1.0 | On | No match. Regex is disabled for this term | Matches "1.0", "1+0", "1/0", "1q0", ... |
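
The table can be read as a small decision rule. The Python sketch below is only an illustration, not JabRef's actual implementation: the function name `term_matches`, the case-insensitive "contains" matching, and the use of Python's `re` module in place of Java regex are all assumptions.

```python
import re

def term_matches(text, value, operator=None, regex_option=False):
    """Return True if `text` satisfies a single search term.

    operator: "=" (plain text), "=~" (regex), or None for a bare term
    without a field name, which follows the global regex option.
    """
    if operator == "=~":
        use_regex = True           # always regex, whatever the option says
    elif operator == "=":
        use_regex = False          # always plain text, whatever the option says
    else:
        use_regex = regex_option   # bare term: follows the regex toggle
    # Escaping the value turns the regex search into a literal "contains".
    pattern = value if use_regex else re.escape(value)
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# Term 1 of the table: "title =~ pa*ediatric" vs. "title = pa*ediatric"
print(term_matches("Pediatric care", "pa*ediatric", "=~"))
print(term_matches("Paediatric care", "pa*ediatric", "=", regex_option=True))
# Term 2 of the table: the bare term "1.0" with the option on and off
print(term_matches("version 1+0", "1.0", regex_option=True))
print(term_matches("version 1+0", "1.0", regex_option=False))
```

Under these assumptions, only the bare term changes its behaviour when the regex option is toggled, which is the pattern the table shows.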

@koppor
Member

koppor commented Nov 8, 2024

@ytzemih Thank you for reporting. Since this is a free-time project that depends heavily on volunteers, could you spend some time crafting a "minimal working example"? Perhaps you can take a bib file from https://github.com/JabRef/jabref/tree/main/src/test/resources/testbib and check whether the issue reproduces there.

Note that the whole endeavour was a Google Summer of Code project (https://github.com/JabRef/jabref/wiki/GSoC-2024-%E2%80%90-Lucene-Search-Backend-Integration). The contributor continued working on the project to make the search syntax similar to v5.15 again, but we extended it (as explained at #12163 (comment)) to allow mixing regex and non-regex terms.

@koppor koppor added this to the 6.0-alpha milestone Nov 8, 2024
@ytzemih
Author

ytzemih commented Nov 8, 2024

@ryan-carpenter and @koppor: Thanks for getting back to me.

First of all: I keep RegEx on all the time, too.

My problem is rather simple: I couldn't find what @ryan-carpenter explains on https://docs.jabref.org/finding-sorting-and-cleaning-entries/search#regular-expressions. Yesterday, I carefully studied this website to get things right. But, now I understand why my search expressions don't work under JR 6.

Regarding the MWE and the testbibs: I tested my expressions now with JR6 using =~ instead of =, and they all seem to work. Using your new search expression specification, I can't reproduce the issue anymore.
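
For illustration, here is a rough Python sketch of why the two spellings behave differently. It is not JabRef code: the sample entry and the helper names are made up, and matching is assumed to be a case-insensitive "contains".

```python
import re

# Hypothetical entry, used only for this illustration.
entry = {"title": "Notes on b-trees", "keywords": "data structures"}

def contains_literal(text, value):
    # "field = value": the value is taken as plain text, so "|" is just a character.
    return value.lower() in text.lower()

def contains_regex(text, pattern):
    # "field =~ value": the value is a regular expression, so "|" means "or".
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# (title = a|b|c) or (keywords = d|e|f)
with_equals = (contains_literal(entry["title"], "a|b|c")
               or contains_literal(entry["keywords"], "d|e|f"))

# (title =~ a|b|c) or (keywords =~ d|e|f)
with_tilde = (contains_regex(entry["title"], "a|b|c")
              or contains_regex(entry["keywords"], "d|e|f"))

print(with_equals, with_tilde)
```

With `=`, the entry would only match if a field contained the literal text `a|b|c` or `d|e|f`, which fits the empty result I saw.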

Your rationale is sound, perhaps a little too ambitious for the seasoned JR user, but fine if you communicate it in the help. Without clear help, the mixed syntax might be confusing for some users like me.

> In other words, = and =~ should be treated as equivalent when the regex option is enabled.

This sounds like the removal of the RegEx button may avoid confusion once and for all?

On a side note, I know about JR being a free-time project, hence, I'm contributing bug reports (because I've not enough time to actually code). I've been using JR for almost 20 years now. Great tool, and cool that you allow GSoC contributors to bring innovations into the game, with the hassle that new developers might create at times ... I've been young myself and not all of my source code contributions have been "perpetual pearls" ... ;)

Anyway, thanks for all your work.

(I've bumped into some Java exceptions, but will file them separately.)

@koppor
Member

koppor commented Nov 8, 2024

@LoayGhreeb

> My suggestion for the maintainers is to keep "field =~ value" explicit (always apply regex syntax for this term) and make "field = value" apply standard or regex syntax, depending on the regex button/checkmark. In other words, = and =~ should be treated as equivalent when the regex option is enabled.

WDYT?

Currently, I think the suggestion is good, because users might want to write a simple query such as "abc and title=def" and then say: OK, please make that case-sensitive and use regex, for both search sub-terms.

@koppor
Member

koppor commented Nov 8, 2024

> > In other words, = and =~ should be treated as equivalent when the regex option is enabled.
>
> This sounds like the removal of the RegEx button may avoid confusion once and for all?

No, the other way round: Just have the button set the interpretation of all sub terms, and not a mixed interpretation.

Example:

  • abc. AND title = def.
  • Entry with author = abc and title = defg.

Currently: RegEx on:

Does NOT match the entry, because title does not end with a dot.

Proposal: RegEx on:

Matches the entry, because title contains a letter after def (matched by the dot).
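
A minimal Python sketch of the difference, looking only at the `title = def.` sub-term. This is an illustration under assumptions, not the actual implementation: the function name and the `equals_follows_option` flag are hypothetical, and matching is assumed to be a case-insensitive "contains".

```python
import re

def title_term_matches(title, value, regex_option, equals_follows_option):
    """Evaluate a "title = value" term against a title.

    equals_follows_option=False models the current behaviour ("=" is always
    plain text); True models the proposal ("=" follows the regex button).
    """
    use_regex = regex_option and equals_follows_option
    pattern = value if use_regex else re.escape(value)
    return re.search(pattern, title, flags=re.IGNORECASE) is not None

# Entry title "defg", sub-term "title = def.", RegEx button on.
current = title_term_matches("defg", "def.", regex_option=True,
                             equals_follows_option=False)
proposed = title_term_matches("defg", "def.", regex_option=True,
                              equals_follows_option=True)
print(current, proposed)
```

In the current model the dot is a literal character that "defg" does not contain; in the proposed model the dot matches the "g".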

> On a side note, I know about JR being a free-time project, hence, I'm contributing bug reports (because I've not enough time to actually code).

👍 - Maybe we can meet at "JabCon" and code for three days next year? 😅 The sightseeing at JabCon is rewarding! 😅

> I've been using JR for almost 20 years now.

Nice! Longer than me then 😅.

> (I've bumped into some Java exceptions, but will file them separately.)

Yes, please. We try to curate them.

@ytzemih
Author

ytzemih commented Nov 8, 2024

@koppor Ah, I see what you're getting at. I'd support the proposal, because the current behavior seems quite intricate and unintuitive.

Didn't know about JabCon, where will it take place? Not sure whether I can join but cool to see that there is such a thing.

@koppor
Member

koppor commented Nov 8, 2024

> Didn't know about JabCon, where will it take place? Not sure whether I can join but cool to see that there is such a thing.

We seldom write about it. We have a "small" homepage (https://jabcon.jabref.org/), but the blog posts tell more (https://blog.jabref.org/tags/jabcon/) - I just added a link to them from the JabCon page.

@ryan-carpenter

> Just have the button set the interpretation of all sub terms, and not a mixed interpretation.

Precisely.

The button is important because:

  • it "advertises" the opportunity to discover regex, and easy access to "advanced" features like this is part of the magic. JabRef's own syntax is also powerful yet simple, whereas other apps might provide a query builder that has to be opened in a separate window to accomplish what JabRef does in the search bar.
  • it provides a means of toggling regex on all terms at once, reducing the number of characters to edit when switching from one syntax to another.

The ability to apply regex searching with =~ is nice, and I think you should keep this, because:

  • it allows use of mixed regex and non-regex, for those who want it, because they can leave regex off and instead apply =~ selectively.
  • it makes regex controllable entirely from the keyboard. Tip: use anyfield =~ value for regex everywhere.
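
As a rough illustration of the `anyfield =~ value` tip, here is a Python sketch. It is not JabRef's implementation: the function name, the sample entry, and the case-insensitive matching are assumptions.

```python
import re

def anyfield_regex(entry, pattern):
    """Return True if the regex matches somewhere in at least one field."""
    return any(
        re.search(pattern, str(value), flags=re.IGNORECASE) is not None
        for value in entry.values()
    )

# Hypothetical entry, used only for this illustration.
entry = {"author": "Smith", "title": "Paediatric care", "year": "2024"}

print(anyfield_regex(entry, "pa*ediatric"))  # regex tried against every field
print(anyfield_regex(entry, "^20[0-9]{2}$"))
```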

@ryan-carpenter

> My problem is rather simple: I couldn't find what @ryan-carpenter explains on https://docs.jabref.org/finding-sorting-and-cleaning-entries/search#regular-expressions. Yesterday, I carefully studied this website to get things right. But, now I understand why my search expressions don't work under JR 6.

Same for me. I knew something had changed but could not recall where I had seen the news. Checked GitHub. Checked Discourse. Asked on Gitter. Finally realised I had seen it in the regex button tooltip!

@koppor
Member

koppor commented Nov 8, 2024

> My problem is rather simple: I couldn't find what @ryan-carpenter explains on https://docs.jabref.org/finding-sorting-and-cleaning-entries/search#regular-expressions.

Fixed 😅 - https://docs.jabref.org/finding-sorting-and-cleaning-entries/search#modifiers-for-fields

@ytzemih
Author

ytzemih commented Nov 8, 2024

Thanks! Today, I learned more about JR.

@ryan-carpenter

@ytzemih, Lucene syntax was reverted to JabRef syntax, so I think this issue can be closed.

@ytzemih
Author

ytzemih commented Dec 13, 2024

@ryan-carpenter, thanks, I saw that. Yes, can be closed from my PoV.
