forked from quickwit-oss/tantivy
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Expose phrase-prefix queries via the built-in query parser (quickwit-oss#2044)

* Expose phrase-prefix queries via the built-in query parser

  This proposes the less-than-imaginative syntax `field:"phrase ter"*` to perform a phrase-prefix query against `field` using `phrase` and `ter` as the terms. The aim is to make this type of query more discoverable and to simplify manual testing.

  I did consider exposing the `max_expansions` parameter similar to how slop is handled, but I think this is rather something that should be configured via the query parser (similar to `set_field_boost` and `set_field_fuzzy`), as choosing it requires rather intimate knowledge of the backing index.

* Prevent construction of zero- or one-term phrase-prefix queries via the query parser.

* Add example using phrase-prefix search via surface API to improve feature discoverability.
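To make the proposed surface syntax and the two-term minimum concrete, here is a minimal, standard-library-only sketch of recognising `field:"phrase ter"*`. It is not the parser code from this commit: `PhrasePrefix`, `ParseError`, and `parse_phrase_prefix` are hypothetical names, terms are split on whitespace only, and returning an error for fewer than two terms is an assumption about how such input might be rejected.

```rust
/// Hypothetical result of recognising `field:"phrase ter"*` (not a tantivy type).
#[derive(Debug, PartialEq)]
pub struct PhrasePrefix {
    /// Field qualifier, if one was written before the opening quote.
    pub field: Option<String>,
    /// Phrase terms; the last one is treated as a prefix.
    pub terms: Vec<String>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// The input does not have the shape `"..."*` or `field:"..."*`.
    NotPhrasePrefix,
    /// Fewer than two terms were given; carries the number found.
    TooFewTerms(usize),
}

pub fn parse_phrase_prefix(input: &str) -> Result<PhrasePrefix, ParseError> {
    let input = input.trim();
    // Split off an optional `field:` qualifier that precedes the opening quote.
    let (field, rest) = match input.find('"') {
        Some(0) => (None, input),
        Some(pos) if pos > 1 && input[..pos].ends_with(':') => {
            (Some(input[..pos - 1].to_string()), &input[pos..])
        }
        _ => return Err(ParseError::NotPhrasePrefix),
    };
    // The remainder must be a quoted phrase directly followed by `*`.
    let inner = rest
        .strip_prefix('"')
        .and_then(|r| r.strip_suffix("\"*"))
        .filter(|s| !s.contains('"'))
        .ok_or(ParseError::NotPhrasePrefix)?;
    // Simplification: split on whitespace instead of running a tokenizer.
    let terms: Vec<String> = inner.split_whitespace().map(str::to_string).collect();
    // A phrase-prefix query needs at least one full term plus the prefix term.
    if terms.len() < 2 {
        return Err(ParseError::TooFewTerms(terms.len()));
    }
    Ok(PhrasePrefix { field, terms })
}
```

Under these assumptions, `parse_phrase_prefix("body:\"phrase ter\"*")` yields the field `body` with the terms `phrase` and `ter`, while `"ter"*` and `""*` are rejected because a single term leaves no phrase to anchor the prefix.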
1 parent 7ee78bd, commit b325d56
Showing 6 changed files with 232 additions and 46 deletions.
@@ -0,0 +1,79 @@
use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::*;
use tantivy::{doc, Index, ReloadPolicy, Result};
use tempfile::TempDir;

fn main() -> Result<()> {
    // The index lives in a temporary directory that is removed on drop.
    let index_path = TempDir::new()?;

    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("title", TEXT | STORED);
    schema_builder.add_text_field("body", TEXT);
    let schema = schema_builder.build();

    let title = schema.get_field("title").unwrap();
    let body = schema.get_field("body").unwrap();

    let index = Index::create_in_dir(&index_path, schema)?;

    let mut index_writer = index.writer(50_000_000)?;

    index_writer.add_document(doc!(
        title => "The Old Man and the Sea",
        body => "He was an old man who fished alone in a skiff in the Gulf Stream and he had gone \
                 eighty-four days now without taking a fish.",
    ))?;

    index_writer.add_document(doc!(
        title => "Of Mice and Men",
        body => "A few miles south of Soledad, the Salinas River drops in close to the hillside \
                 bank and runs deep and green. The water is warm too, for it has slipped twinkling \
                 over the yellow sands in the sunlight before reaching the narrow pool. On one \
                 side of the river the golden foothill slopes curve up to the strong and rocky \
                 Gabilan Mountains, but on the valley side the water is lined with trees—willows \
                 fresh and green with every spring, carrying in their lower leaf junctures the \
                 debris of the winter’s flooding; and sycamores with mottled, white, recumbent \
                 limbs and branches that arch over the pool"
    ))?;

    // Multivalued fields just need to be repeated.
    index_writer.add_document(doc!(
        title => "Frankenstein",
        title => "The Modern Prometheus",
        body => "You will rejoice to hear that no disaster has accompanied the commencement of an \
                 enterprise which you have regarded with such evil forebodings. I arrived here \
                 yesterday, and my first task is to assure my dear sister of my welfare and \
                 increasing confidence in the success of my undertaking."
    ))?;

    // Documents only become searchable after a commit.
    index_writer.commit()?;

    let reader = index
        .reader_builder()
        .reload_policy(ReloadPolicy::OnCommit)
        .try_into()?;

    let searcher = reader.searcher();

    // Queries without an explicit field are run against both `title` and `body`.
    let query_parser = QueryParser::for_index(&index, vec![title, body]);
    // This will match documents containing the phrase "in the"
    // followed by some word starting with "su",
    // i.e. it will match "in the sunlight" and "in the success",
    // but not "in the Gulf Stream".
    let query = query_parser.parse_query("\"in the su\"*")?;

    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
    // Collect the first stored title of every hit.
    let mut titles = top_docs
        .into_iter()
        .map(|(_score, doc_address)| {
            let doc = searcher.doc(doc_address)?;
            let title = doc.get_first(title).unwrap().as_text().unwrap().to_owned();
            Ok(title)
        })
        .collect::<Result<Vec<_>>>()?;
    // Sort so the assertion does not depend on score order.
    titles.sort_unstable();
    assert_eq!(titles, ["Frankenstein", "Of Mice and Men"]);

    Ok(())
}
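The comment in the example describes the matching rule informally: the leading terms must appear as a consecutive phrase, and the token that follows must start with the last term. The sketch below spells that rule out in plain Rust without tantivy. It is an illustration of the semantics only, not how tantivy evaluates the query: `tokenize` and `phrase_prefix_matches` are hypothetical helpers, the tokenizer is a crude lowercase-and-split stand-in, and the query terms are assumed to be lowercase already.

```rust
/// Lowercase the text and split it on non-alphanumeric characters.
/// A simplification standing in for a real tokenizer.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Returns true if `text` contains all but the last of `terms` as consecutive
/// tokens, immediately followed by a token that starts with the last term.
/// Fewer than two terms never match, mirroring the two-term minimum.
pub fn phrase_prefix_matches(text: &str, terms: &[&str]) -> bool {
    if terms.len() < 2 {
        return false;
    }
    let (prefix, phrase) = terms.split_last().expect("checked: at least two terms");
    let tokens = tokenize(text);
    // Slide a window of the query's length over the document tokens.
    tokens.windows(terms.len()).any(|window| {
        let (last, head) = window.split_last().expect("windows are non-empty");
        // Leading tokens must equal the phrase terms exactly...
        let phrase_ok = head
            .iter()
            .zip(phrase.iter())
            .all(|(token, term)| token.as_str() == *term);
        // ...and the final token only needs to start with the prefix term.
        phrase_ok && last.starts_with(*prefix)
    })
}
```

Applied to the bodies in the example, `phrase_prefix_matches(body, &["in", "the", "su"])` holds for the passages containing "in the sunlight" and "in the success" but not for the one containing "in the Gulf Stream", which agrees with the titles asserted at the end of `main`.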