I noticed that the current code creates a hash key for each ID and queries the Solr core to check whether it is already indexed. If it is, we skip it; otherwise we index the new ID. Should we drop the lookup? Every time we index a new document, we first search the whole index to see whether it already exists and only then post it. With a very large DB, this may become expensive. We could skip the lookup and index directly: if the ID is already indexed, the document will be overwritten anyway.
Maybe. But I believe this ID lookup runs against an index, which is very efficient. I expect the bottlenecks are the crawl itself and maybe the number of independent requests made to Solr during indexing.
The ID lookup is indeed efficient, but to perform it we send requests one by one for every row in the DB. This makes the whole indexing slow, as I pointed out in #14.
Both issues can be solved together if we post a batch of documents in a single indexing request. I am not sure yet how to choose the number of documents to post per request. I am preparing a solution and will send a PR for discussion.
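To make the batching idea concrete, here is a rough sketch, not the project's actual code. It assumes each DB row is a dict with a `url` field and that the ID is a SHA-1 of that URL; both are placeholders, as are the helper names and the default batch size of 500. It posts a JSON array of documents to the core's `/update` handler and does no lookup beforehand, relying on Solr overwriting documents that share the same unique key.

```python
import hashlib
import json
from itertools import islice
from urllib import request


def make_id(url):
    """Hypothetical ID scheme: SHA-1 hex digest of the row's URL."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def post_batch(update_url, docs):
    """POST a JSON array of documents to a Solr /update handler."""
    req = request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status


def index_rows(rows, update_url, batch_size=500, post=post_batch):
    """Index rows in batches with no per-row existence check.

    Returns the number of documents sent. `post` is injectable so the
    batching logic can be exercised without a running Solr instance.
    """
    sent = 0
    for chunk in batched(rows, batch_size):
        # Copy each row and attach the computed ID (field names are assumed).
        docs = [dict(row, id=make_id(row["url"])) for row in chunk]
        post(update_url, docs)
        sent += len(docs)
    return sent
```

Usage would look something like `index_rows(rows, "http://localhost:8983/solr/<core>/update?commit=true")`, where the host and core name are placeholders. The batch size is a tuning knob and would need measuring against our data.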
Any thoughts on adding modifiers to the JSON doc that is posted to Solr? See https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-Example.1 We can use "set".
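For illustration, a small sketch of what a "set" payload could look like. The helper name and the `id` and `title` fields are made up for the example. As far as I know, atomic updates also need the affected fields to be stored (or have docValues) in the schema, so that would have to be checked for our core.

```python
import json


def to_atomic_update(doc, id_field="id"):
    """Wrap every non-ID field in a {"set": value} modifier.

    The ID field is left as a plain value so Solr can locate the
    document; all other fields are replaced with the given values.
    """
    update = {id_field: doc[id_field]}
    for key, value in doc.items():
        if key != id_field:
            update[key] = {"set": value}
    return update


# Example payload for the /update handler (field names are hypothetical).
payload = json.dumps([to_atomic_update({"id": "abc123", "title": "New title"})])
```

With "set", only the listed fields change and the rest of the stored document is kept, whereas posting a plain document replaces the whole thing.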