Ids for indexing docs in Solr #15

Open
innovationchef opened this issue Mar 7, 2018 · 3 comments

Comments

@innovationchef
Member

innovationchef commented Mar 7, 2018

I noticed that the current code creates a hash key for the ID and queries the Solr core to check whether that ID is already indexed. If it is, we skip the document; otherwise we index it. Shouldn't we drop the lookup? Every time we index a new document, we first search the whole index to see whether it already exists, and only then post it. Once the DB gets very large, this may become expensive. We could instead skip the lookup and index directly: if the ID is already indexed, the document will be overwritten anyway.

Any thoughts on adding modifiers to the JSON doc that we post to Solr? See https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-Example.1 We could use "set".
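To make the "set" idea concrete, here is a minimal Python sketch of how a plain document could be turned into an atomic-update document in the shape shown in the linked Solr guide. The helper name `to_atomic_update`, the `id` field name, and the sample document are assumptions for illustration only, not what the project's code currently does.

```python
import json


def to_atomic_update(doc, id_field="id"):
    """Return a copy of `doc` where every non-ID field is wrapped in a
    {"set": value} modifier, as used by Solr atomic updates.

    `id_field` is an assumption; use whatever uniqueKey the schema defines.
    """
    if id_field not in doc:
        raise KeyError("document has no %r field" % id_field)
    update = {id_field: doc[id_field]}
    for key, value in doc.items():
        if key != id_field:
            # "set" replaces the stored value of this field.
            update[key] = {"set": value}
    return update


# Hypothetical document, only to show the resulting payload shape.
sample = {"id": "abc123", "name": "Example", "url": "http://example.org"}
payload = json.dumps([to_atomic_update(sample)])
```

The resulting `payload` is a JSON array that could be sent to the core's update handler. Whether atomic updates fit the current schema is worth checking against the guide first.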

@justinccdev
Member

Maybe. But I believe this ID lookup is on an index which is very efficient - I expect that the bottlenecks are the crawl itself and maybe the number of independent requests made to Solr during indexing.

@innovationchef
Member Author

This ID lookup is indeed efficient, but to perform it we send requests one by one for every row in the DB. This makes the whole indexing slow, as I pointed out in #14.
Both issues can be solved together if we post a batch of documents in a single indexing request. I am not sure yet how to choose the number of documents to post per request. I am preparing a solution for this and will send a PR for discussion.
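Roughly, the batching idea could look like the Python sketch below. It is only an illustration under assumptions: the helper names (`batched`, `post_batch`, `index_all`), the default batch size of 500, and the update URL are all hypothetical, and the right batch size would need to be measured. Documents are posted without a prior lookup, relying on Solr overwriting any document that has the same ID.

```python
import json
import urllib.request
from itertools import islice


def batched(docs, batch_size):
    """Yield successive lists of at most `batch_size` documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def post_batch(solr_update_url, batch, timeout=30):
    """POST one batch to Solr's update handler as a JSON array of docs."""
    req = urllib.request.Request(
        solr_update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def index_all(docs, solr_update_url, batch_size=500, post=post_batch):
    """Index all docs in batches; returns the number of docs sent.

    `post` is injectable so the batching can be tried without a live Solr.
    """
    sent = 0
    for batch in batched(docs, batch_size):
        post(solr_update_url, batch)
        sent += len(batch)
    return sent
```

A call might look like `index_all(rows, "http://localhost:8983/solr/mycore/update?commit=true")`, where the host and core name are placeholders. This replaces one request per row with one request per batch.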

@justinccdev
Member

Sounds good, thanks @innovationchef
