Ids for indexing docs in Solr #15

innovationchef · 2018-03-07T09:50:06Z

I observed that the current code creates a hash key for the ID and checks the Solr core to see whether it is already indexed. If it is, we skip it; otherwise we index the new ID. Shouldn't we drop the search step? Every time we index a new document, we first search the whole database to see whether it already exists, and only then post it. Once we have a very large DB, this may become expensive. We could instead skip the lookup and index directly: if the ID is already indexed, the document will be overwritten anyway.
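
To make the "index directly" idea concrete, here is a minimal sketch. It assumes the ID is a hash of some stable source key; the function name, the choice of SHA-1, and the example URL are hypothetical and not taken from the project's code.

```python
import hashlib


def make_doc_id(source_key: str) -> str:
    """Return a stable hex digest for use as the document ID.

    Hypothetical helper: the project's real hash input and algorithm may differ.
    """
    return hashlib.sha1(source_key.encode("utf-8")).hexdigest()


# The same source key always maps to the same ID, so posting this document
# again would target the same Solr document instead of creating a new one.
doc = {
    "id": make_doc_id("https://example.org/dataset/42"),  # illustrative key
    "title": "Example dataset",
}
```

With a deterministic ID like this, the pre-check is only needed if we want to avoid overwriting; it is not needed to avoid duplicates, provided `id` is the core's uniqueKey and the default overwrite behaviour is left on.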

Any thoughts on adding modifiers to the JSON doc that we post to Solr? See https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-Example.1 We could use "set".
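
For illustration, a sketch of what a "set" modifier payload could look like, following the atomic update syntax described on the linked page. The helper name and the sample field values are hypothetical.

```python
import json


def to_atomic_update(doc: dict, id_field: str = "id") -> dict:
    """Wrap every field except the ID in a {"set": value} modifier."""
    update = {id_field: doc[id_field]}
    for name, value in doc.items():
        if name != id_field:
            update[name] = {"set": value}
    return update


# JSON body for the update handler: a list containing one atomic update.
payload = json.dumps([to_atomic_update({"id": "abc123", "title": "New title"})])
```

Here `payload` is a JSON array whose single document keeps `id` as a plain value and sends `title` as `{"set": "New title"}`, so only that field is replaced on the existing document.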

justinccdev · 2018-03-08T17:39:19Z

Maybe. But I believe this ID lookup is on an index, which is very efficient - I expect that the bottlenecks are the crawl itself and maybe the number of independent requests made to Solr during indexing.

innovationchef · 2018-03-10T11:29:00Z

This ID lookup is indeed efficient, but to perform it we send requests one by one for every row in the DB. This makes the whole indexing slow, as I pointed out in #14.
Both of these issues can be solved together if we post a batch of documents in a single request. I am not sure how to choose the number of documents to post per request. I am preparing a solution for this and will send a PR for discussion.
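
A rough sketch of the batching idea, not the actual PR. The core name `mycore`, the URL, the batch size of 500, and all function names are assumptions made for illustration; the right batch size would have to be found by measurement.

```python
import json
import urllib.request
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical values: adjust to the real Solr host and core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"
BATCH_SIZE = 500  # arbitrary starting point, not a recommendation


def chunked(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_request(batch: List[dict], url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Build one POST request carrying the whole batch as a JSON array."""
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_all(docs: Iterable[dict], size: int = BATCH_SIZE, url: str = SOLR_UPDATE_URL) -> None:
    """Send one request per batch instead of one request per document."""
    for batch in chunked(docs, size):
        with urllib.request.urlopen(build_request(batch, url)) as resp:
            resp.read()  # drain the response; error handling omitted in this sketch
```

Calling `index_all(rows)` on 10,000 rows with the default size would issue 20 requests instead of 10,000. A commit (explicit, or via the core's autoCommit settings) is still needed before the new documents become searchable.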

justinccdev · 2018-03-13T18:35:00Z

Sounds good, thanks @innovationchef
