Slow indexing in Solr #14

Open
innovationchef opened this issue Mar 7, 2018 · 1 comment
innovationchef commented Mar 7, 2018

I measured the time elapsed in indexing the documents for 20 JSON files from http://beta.synbiomine.org/synbiomine/sitemap.xml, and I realized that the current way of indexing is very slow. It took around 15 seconds to index 20 docs when we commit them one by one (we read a row from SQL and post it inside a for loop). If we instead collect the rows, convert them to JSON once, and post the list of 20 in a single request, it takes only about 0.7 seconds. A possible explanation is that posting a single request to the server and waiting for the response takes about 0.7 seconds, so making 20 separate requests costs 20 × 0.7 ≈ 14 seconds.
@justinccdev Have you noticed this before?

Test code -
indexers.txt
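
To illustrate the difference, here is a minimal sketch of the two approaches using the pysolr client; the core URL and document contents are placeholders standing in for the real SQL rows, not the actual test code in indexers.txt.

```python
import time
import pysolr

# Hypothetical Solr core URL; substitute the real one.
solr = pysolr.Solr('http://localhost:8983/solr/mycore', timeout=10)

# Stand-in for the rows read from SQL.
docs = [{'id': str(i), 'text': 'document %d' % i} for i in range(20)]

# Slow path: one HTTP round trip (and commit) per document.
start = time.time()
for doc in docs:
    solr.add([doc], commit=True)
print('one-by-one: %.1fs' % (time.time() - start))

# Fast path: a single request carrying all 20 documents, one commit.
start = time.time()
solr.add(docs, commit=True)
print('batched: %.1fs' % (time.time() - start))
```

With per-document commits, each iteration pays the full request/response latency, which matches the ~0.7 s × 20 ≈ 14 s observation above; batching amortizes that cost over a single round trip.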

justinccdev (Member) commented

This is quite possible; I made little attempt to optimize what has been largely a proof of concept until now.

If there's an easy optimization (bearing in mind this stuff might be replaced by scrapy/frontera anyway), then that would be good to see. The issue with posting a bunch of JSON in a single DB row (if I understand you right) is that manipulating those entries individually may then become more complex.
