Inconsistent + buggy behaviour of textContainsPrefix / Undocumented cases #4073

mrckzgl · 2023-10-20T13:43:02Z

This issue results from the discussion in #3942.

First, the documentation for textContainsPrefix is incomplete (more precisely: contradictory) for the case where the search string contains multiple words / tokens (please have a look at the OP of the discussion for details).
I came up with the following plausible behaviour, inferred from the single-token case:

"For each token in the query string, the text string (read: the value of the field being searched) must contain at least one token of which the query token is a prefix."

According to @mad, the in-memory implementation org.janusgraph.core.attribute.Text#CONTAINS_PREFIX works as I inferred, but SolrIndex and LuceneIndex behave differently. Also according to @mad, this could be considered a bug. For LuceneIndex, we identified the actual behaviour (with a multi-token query string it works just like the regular tokenized textContains) and a possible fix to make it work as described above.
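
To make the difference concrete, here is a minimal Python sketch of the proposed rule next to a rough model of the behaviour reported for LuceneIndex. This is not JanusGraph code: the tokenizer (lowercase, split on non-word characters), the function names, and the assumption that multi-token textContains requires every query token to appear as a whole token are all illustrative assumptions.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (an assumed, simplified tokenizer)."""
    return re.findall(r"\w+", text.lower())


def proposed_contains_prefix(text: str, query: str) -> bool:
    """Proposed rule: every query token is a prefix of at least one text token."""
    text_tokens = tokenize(text)
    return all(
        any(t.startswith(q) for t in text_tokens)
        for q in tokenize(query)
    )


def reported_lucene_like(text: str, query: str) -> bool:
    """Hypothetical model of the behaviour reported for LuceneIndex."""
    text_tokens = tokenize(text)
    query_tokens = tokenize(query)
    if len(query_tokens) <= 1:
        # Single token: plain prefix matching, same as the proposed rule.
        return proposed_contains_prefix(text, query)
    # Multiple tokens: assumed to fall back to whole-token matching,
    # i.e. to behave like a regular tokenized textContains.
    return all(q in text_tokens for q in query_tokens)
```

Under these assumptions, the text "The quick brown fox" with the query "qui bro" matches under the proposed rule but not under the Lucene-like model, while the single-token query "qui" matches under both.
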

If I were in charge of this, I would first propose agreeing on the desired behaviour of textContainsPrefix for the multi-token case, and I would suggest the behaviour above. I would then implement it consistently across all index backends and, very importantly, add a description of the behaviour to the documentation. But I am not in charge (spoiler: I probably don't have the time for a PR). So what do you, as contributors / maintainers, think about how to resolve the issue?

Version: at least 0.6.3
Storage Backend: possibly all?
Mixed Index Backend: at least SolrIndex, LuceneIndex
Link to discussed bug: Unexpected / undocumented behaviour of textContainsPrefix when query string contains multiple tokens #3942
Expected Behavior: textContainsPrefix should behave consistently and reasonably if the query string contains multiple tokens. Further, that behaviour should be documented.
Current Behavior: textContainsPrefix behaves differently across backends.

mrckzgl added the kind/bug/possible label Oct 20, 2023
