Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Implementation of ExistsQuery #2160

Merged
merged 1 commit into from
Sep 7, 2023
Merged

Conversation

imotov
Copy link
Contributor

@imotov imotov commented Aug 28, 2023

Implements ExistsQuery that returns all document where a field has some value.

Fixes #2159

@imotov imotov requested review from fulmicoton and PSeitz August 28, 2023 23:39
@fulmicoton
Copy link
Collaborator

fulmicoton commented Aug 29, 2023

@imotov The approach is ok. I would

  • ditch the doc ids buffering. It makes your code more complex, and I suspect it makes it slower than the trivial implementation. @PSeitz may have a different view on this.
  • possibly change the spec of what to do if the column is not found
  • add unit tests

The other half of the ticket is to integrate this into the ExistsQuery of quickwit:

  • disabling exists query indexing for fast fields
  • plug that implementation instead.

One tricky thing will be to test what happens when facing a fast field or when facing a dynamic field.

src/query/exist_query.rs Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Aug 29, 2023

Codecov Report

Patch coverage: 96.46% and project coverage change: +0.14% 🎉

Comparison is base (ee6a7c2) 94.28% compared to head (82a1022) 94.42%.
Report is 2 commits behind head on main.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the GitHub App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2160      +/-   ##
==========================================
+ Coverage   94.28%   94.42%   +0.14%     
==========================================
  Files         320      322       +2     
  Lines       62469    63146     +677     
==========================================
+ Hits        58896    59627     +731     
+ Misses       3573     3519      -54     
Files Changed Coverage Δ
src/query/mod.rs 100.00% <ø> (ø)
src/query/exist_query.rs 96.12% <96.12%> (ø)
src/fastfield/readers.rs 90.65% <98.03%> (+2.50%) ⬆️

... and 12 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@imotov imotov force-pushed the issue/2159-exists-query branch from 3830aa5 to 5a618c1 Compare August 31, 2023 01:16
@imotov
Copy link
Contributor Author

imotov commented Aug 31, 2023

possibly change the spec of what to do if the column is not found

I have implemented it to be consistent with other queries that I have seen. So, if field doesn't exists or not fast - it returns an error. I can see how returning nothing if the field doesn't exist might make sense, but it might also be pretty confusing. Returning nothing if field is not fast can be really confusing to users.

}

impl ExistsQuery {
#![allow(dead_code)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#![allow(dead_code)]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we allow dead code? The compiler is informing you that there is no way to call this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah, I keep forgetting that pub use thing.

@imotov imotov mentioned this pull request Aug 31, 2023
@imotov imotov marked this pull request as ready for review August 31, 2023 04:18
@PSeitz
Copy link
Contributor

PSeitz commented Aug 31, 2023

I have implemented it to be consistent with other queries that I have seen. So, if field doesn't exists or not fast - it returns an error. I can see how returning nothing if the field doesn't exist might make sense, but it might also be pretty confusing. Returning nothing if field is not fast can be really confusing to users.

Non-Existence should return an empty result on JSON fields, since they may be non-existing only on some segments.

@fulmicoton
Copy link
Collaborator

@PSeitz @imotov there is a bit of a semantic problem in our discussion. When we say the field is absent, we are talking of Field in the sense of tantivy schema. Not (Field, Path).

If it is not available, meaning resolve path has return None, I think it makes sense to return an error.
In quickwit, we will need to have some logic to be a little bit more lenient, in order to handle changes of schema.
That logic is already pretty solid, and exists in the conversion from QueryAst to tantivy query.

@imotov
Copy link
Contributor Author

imotov commented Aug 31, 2023

@PSeitz, @fulmicoton so, the way it is implemented right now if you have a json field called foo and you try to run presence query on foo.bar, it will return an error if foo doesn't exists and it will return an empty result set if foo exists and is defined as json field but none of the json documents have bar inside of them. I updated the test in e4e5225 to demonstrate these scenarios.

Adds an implementation of ExistsQuery that takes advantage of fast fields.

Fixes #2159

Co-authored-by: Paul Masurel <[email protected]>
@imotov imotov force-pushed the issue/2159-exists-query branch from 82a1022 to 738d146 Compare September 1, 2023 18:11
@imotov
Copy link
Contributor Author

imotov commented Sep 1, 2023

Can somebody merge this in? I am not authorized to merge pull requests in this repository.

Not ready to be merged yet. Found some issues while trying to integrate it into quickwit.

@imotov
Copy link
Contributor Author

imotov commented Sep 5, 2023

Turned out that the issue was in quickwit and not here. OK to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ExistsQuery
4 participants