Slow indexing in Solr #14

Open
innovationchef opened this issue Mar 7, 2018 · 1 comment
innovationchef commented Mar 7, 2018

I measured the time elapsed in indexing the documents for 20 JSON files from http://beta.synbiomine.org/synbiomine/sitemap.xml, and I realized that the current way of indexing is very slow. It took around 15 seconds to index 20 docs when we commit them one by one (we read a row from SQL and post it inside a for loop). If we instead collect the rows, convert them to JSON once, and post the list of 20 in a single request, it takes only about 0.7 seconds. A possible explanation is that posting a single request to the server and waiting for the response takes about 0.7 seconds, so making 20 separate requests costs 20 × 0.7 ≈ 14 seconds.
@justinccdev Have you noticed this before?

Test code -
indexers.txt
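
To illustrate the difference, here is a minimal sketch of the two approaches using the pysolr client; the core URL and document contents are placeholders standing in for the real SQL rows, not the actual test code in indexers.txt.

```python
import time
import pysolr

# Hypothetical Solr core URL; substitute the real one.
solr = pysolr.Solr('http://localhost:8983/solr/mycore', timeout=10)

# Stand-in for the rows read from SQL.
docs = [{'id': str(i), 'text': 'document %d' % i} for i in range(20)]

# Slow path: one HTTP round trip (and commit) per document.
start = time.time()
for doc in docs:
    solr.add([doc], commit=True)
print('one-by-one: %.1fs' % (time.time() - start))

# Fast path: a single request carrying all 20 documents, one commit.
start = time.time()
solr.add(docs, commit=True)
print('batched: %.1fs' % (time.time() - start))
```

With per-document commits, each iteration pays the full request/response latency, which matches the ~0.7 s × 20 ≈ 14 s observation above; batching amortizes that cost over a single round trip.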

justinccdev (Member) commented

This is quite possible; I made little attempt to optimize what has been largely a proof of concept until now.

If there's an easy optimization (bearing in mind this stuff might be replaced by scrapy/frontera anyway), then that would be good to see. The issue with posting a bunch of JSON in a single DB row (if I understand you right) is that manipulating those entries individually may then become more complex.
