Fulltext search #563

Open
JohannesLichtenberger opened this issue Dec 7, 2022 · 4 comments
Comments

@JohannesLichtenberger
Member

We need a way to do full-text search on text nodes. It is probably possible to integrate Lucene for this.

@Rathan-Naik

I can pitch in here.

@JohannesLichtenberger
Member Author

We have to check whether we can implement a custom store (I think Lucene calls it a Directory) and the fields. Our main data structure is a keyed trie that maps 64-bit nodeKeys <=> nodes, and it would be great if we could store the full-text index in our persistent structure in the same way. I haven't looked into Lucene yet, though.
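
To make the idea of a versioned store for index data more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `VersionedBlobStore` and its methods are hypothetical names, it is neither Lucene nor SirixDB code, and a plain `TreeMap` copied on every commit stands in for the keyed trie (a real trie would share unchanged pages between revisions instead of copying them).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy versioned key/value store (hypothetical): 64-bit keys map to opaque byte blobs,
// and every commit freezes the current state as a new, read-only revision.
final class VersionedBlobStore {
    private final List<Map<Long, byte[]>> revisions = new ArrayList<>();
    private final TreeMap<Long, byte[]> working = new TreeMap<>();

    // Stages a write; it becomes visible to readers only after commit().
    void put(long key, byte[] value) {
        working.put(key, value.clone());
    }

    // Stages a delete of the given key.
    void delete(long key) {
        working.remove(key);
    }

    // Freezes the staged state as a new revision and returns its number (starting at 0).
    int commit() {
        revisions.add(Collections.unmodifiableMap(new TreeMap<>(working)));
        return revisions.size() - 1;
    }

    // Reads a key as of a committed revision; returns null if the key is absent there.
    byte[] get(int revision, long key) {
        byte[] value = revisions.get(revision).get(key);
        return value == null ? null : value.clone();
    }

    public static void main(String[] args) {
        VersionedBlobStore store = new VersionedBlobStore();
        store.put(1L, "first".getBytes());
        int r0 = store.commit();
        store.put(1L, "second".getBytes());
        int r1 = store.commit();
        // The old revision still returns the old blob after the newer commit.
        System.out.println(new String(store.get(r0, 1L)) + " / " + new String(store.get(r1, 1L)));
    }
}
```

A custom Directory would presumably have to map Lucene's file-oriented reads and writes onto blobs like these; whether that mapping is practical is exactly the open question here.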

@adamretter

We make use of Lucene in eXist-db for the Full Text index. There are definitely advantages and disadvantages to using Lucene.

On the one hand, Lucene is very mature and flexible whilst offering decent performance. If you want to implement something like the W3C XQuery Full Text extensions, it has almost everything you need baked in. You can also let users choose or code their own Analyzers for pretty much any language or purpose, which is neat.

On the other hand, if you need transactional consistency, as far as I am aware there is no good way to involve Lucene in the transactions against your own indexes. I enquired some time ago, so perhaps things have changed more recently, but previously there was no way to control Lucene transactions directly, so you could not do a 2PC approach.

@JohannesLichtenberger
Member Author

Hi Adam, isn't the single writer supposed to implement the two-phase commit interface (https://lucene.apache.org/core/7_4_0/core/org/apache/lucene/index/TwoPhaseCommit.html)?
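
For illustration, a coordinator that drives two participants through a two-phase commit could look roughly like the sketch below. The `Participant` interface is a hypothetical stand-in that mirrors only the method names of Lucene's `TwoPhaseCommit` (`prepareCommit`, `commit`, `rollback`); it is not Lucene code, and the class names and the `RecordingParticipant` test double are assumptions made for this example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class TwoPhaseCommitSketch {

    // Hypothetical stand-in: mirrors only the method names of Lucene's TwoPhaseCommit.
    interface Participant {
        void prepareCommit() throws IOException; // phase 1: persist changes without publishing them
        void commit() throws IOException;        // phase 2: publish the prepared changes
        void rollback();                         // discard everything that is not committed
    }

    // Returns true if every participant committed, false if the transaction was rolled back.
    static boolean commitAll(List<Participant> participants) {
        try {
            for (Participant p : participants) {
                p.prepareCommit();
            }
        } catch (IOException e) {
            // One participant could not prepare: undo all of them, including the failed one.
            for (Participant p : participants) {
                p.rollback();
            }
            return false;
        }
        for (Participant p : participants) {
            try {
                p.commit();
            } catch (IOException e) {
                // A failure after a successful prepare needs recovery logic that this sketch omits.
                throw new IllegalStateException("commit failed after prepare", e);
            }
        }
        return true;
    }

    // Test double that records each call and can be told to fail in phase 1.
    static final class RecordingParticipant implements Participant {
        private final String name;
        private final boolean failOnPrepare;
        private final List<String> log;

        RecordingParticipant(String name, boolean failOnPrepare, List<String> log) {
            this.name = name;
            this.failOnPrepare = failOnPrepare;
            this.log = log;
        }

        @Override
        public void prepareCommit() throws IOException {
            log.add(name + ".prepare");
            if (failOnPrepare) {
                throw new IOException(name + " cannot prepare");
            }
        }

        @Override
        public void commit() {
            log.add(name + ".commit");
        }

        @Override
        public void rollback() {
            log.add(name + ".rollback");
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean committed = commitAll(List.of(
                new RecordingParticipant("trie", false, log),
                new RecordingParticipant("fulltext", true, log)));
        System.out.println(committed + " " + log);
    }
}
```

In this picture, the main trie transaction and the full-text index writer would each be one participant, so a failed prepare on either side rolls back both.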

I had a quick look, and I think we'd need to implement a custom Directory. However, I'm not sure whether we can store the Documents in another subtree (in a trie), as we do with the other indexes. That way the index would be versioned automatically, which is what we need after all. AFAICS, the documents are written in DocumentsWriter, which is sadly not an interface, and its instances are created directly in IndexWriter. So I'm not sure whether it's even possible to change the index structure in which Lucene stores the documents, apart from supplying the Directory to store to and read from.


3 participants