Solr throttling #10558

Open: ErykKul wants to merge 14 commits into base: develop

Conversation

ErykKul
Collaborator

@ErykKul ErykKul commented May 13, 2024

What this PR does / why we need it:
An experiment to address Solr performance problems.

Closes (possibly): #10469

@coveralls

coveralls commented May 13, 2024

Coverage Status

coverage: 20.576% (-0.02%) from 20.593%
when pulling b688961 on ErykKul:10469_solr_performance
into 3867cfe on IQSS:develop.

@qqmyers
Member

qqmyers commented May 13, 2024

@ErykKul - FYI - there's an earlier PR in this area: #10241 by @jeromeroucou (who may have further thoughts). A couple of questions from the code and notes there: is there anything in that PR that could be merged with what you're doing? Any thoughts about the ConcurrentUpdate client for indexing? I was potentially seeing some issues after running for days at QDR with the Http2 client. It was hard to tell, since the machines were not isolated test ones, so there could have been Solr restarts I didn't know about (if so, perhaps we want to be able to refresh the client?). In any case, it would be good to verify, in that PR or here, that running for longer periods doesn't cause a new issue, and perhaps to see whether we can recover from a Solr restart.

@ErykKul
Collaborator Author

ErykKul commented May 14, 2024

@qqmyers I am just trying to help @landreev find out how to improve Solr performance. I think the changes that made the DB queries faster might have contributed to the problems: when you make one thing faster, another starts getting more load. I also think that putting semaphores on CPU-intensive Solr operations, while letting the cheap queries go through fast, is a good idea. Still, I was looking for a way to throttle the number of requests we send concurrently to Solr, and connection pools are great for that. I did not see how to regulate the connection pool size on the current Solr client, whereas the HTTP2 client has a clear way of doing that.

I finally found the parameter for setting the connection pool size in the old client. It is set to 10000 by default, which might be a bit much. The HTTP2 client defaults to 64, but because those connections are HTTP2, the same connection can serve multiple concurrent requests, so it is not yet clear to me how many concurrent requests there can be in total (at least that is how I understand it now). The old client is simpler: 10000 concurrent requests. I think we should stick with the old client for now, at least until we know exactly what the problems are. We can now also use the connection pool size to tune performance, if that is ever needed.

@ErykKul
Collaborator Author

ErykKul commented May 14, 2024

@qqmyers @landreev
I have added the semaphore and made the number of simultaneously running heavy operations configurable with dataverse.solr.concurrency.max-heavy-operations (defaults to 1). The maximum number of open connections to Solr is configurable too (dataverse.solr.concurrency.max-solr-connections, defaults to 10000). I made all background-like calls to Solr go through the "heavy operations" path. I think this sums up (more or less) the discussion on Slack. I will investigate why the build fails and fix that first. After that, it can go to review/experimenting.
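
For readers following along, the semaphore idea described in this comment can be sketched roughly as below. This is a minimal illustration, not the code from this PR: the class `HeavySolrOperationGate` and its methods are hypothetical, and reading the setting as a JVM system property is an assumption made only to keep the example self-contained. Only the setting name and its default of 1 come from the comment.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not the class used in this PR.
class HeavySolrOperationGate {
    // Setting name taken from the comment above. Reading it from a JVM system
    // property is an assumption made to keep this sketch self-contained.
    static final String MAX_HEAVY_OPS_KEY = "dataverse.solr.concurrency.max-heavy-operations";

    private final Semaphore permits;

    HeavySolrOperationGate(int maxHeavyOperations) {
        // A fair semaphore hands out permits in arrival order.
        this.permits = new Semaphore(maxHeavyOperations, true);
    }

    static HeavySolrOperationGate fromSystemProperties() {
        // Falls back to 1, the default mentioned in the comment.
        return new HeavySolrOperationGate(Integer.getInteger(MAX_HEAVY_OPS_KEY, 1));
    }

    // Heavy (background-like) calls wait for a permit before running.
    // A real implementation might prefer acquire() and handle InterruptedException.
    <T> T runHeavy(Supplier<T> operation) {
        permits.acquireUninterruptibly();
        try {
            return operation.get();
        } finally {
            permits.release();
        }
    }

    // Cheap queries bypass the semaphore entirely.
    <T> T runLight(Supplier<T> operation) {
        return operation.get();
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

With the default of 1, at most one heavy operation runs at a time while cheap queries are never blocked; raising the setting trades Solr load for indexing throughput.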

@ErykKul
Collaborator Author

ErykKul commented May 14, 2024

The build looks better now, and I think it should be fine (sometimes Jenkins fails for no apparent reason, so I am not sure whether it will pass, or why it would fail if it does not). I forgot to mention that I added two new metrics:

  • solr_heavy_operation_permit_wait_time_seconds_mean shows how long it takes to receive a permit to index a dataset.
  • heavy_solr_operation_time_seconds shows how long it takes to perform a heavy Solr operation.

I am not sure whether something else needs to be done to make the metrics work, or whether they should work as they are without anything special.
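
As a rough illustration of what these two metrics measure, the sketch below times the wait for a permit separately from the operation itself. It is an assumption-laden stand-in: the class `TimedHeavyOperation` is hypothetical, and plain `System.nanoTime()` accumulators are used in place of whatever metrics library the PR actually relies on.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical class for illustration; the PR's actual metric wiring may differ.
class TimedHeavyOperation {
    private final Semaphore permits;
    private final AtomicLong totalWaitNanos = new AtomicLong();
    private final AtomicLong totalRunNanos = new AtomicLong();
    private final AtomicLong count = new AtomicLong();

    TimedHeavyOperation(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T run(Supplier<T> operation) {
        long requested = System.nanoTime();
        permits.acquireUninterruptibly();
        long started = System.nanoTime();
        // Time spent waiting for a permit (the "permit wait time" idea).
        totalWaitNanos.addAndGet(started - requested);
        try {
            return operation.get();
        } finally {
            // Time spent inside the heavy operation (the "operation time" idea).
            totalRunNanos.addAndGet(System.nanoTime() - started);
            count.incrementAndGet();
            permits.release();
        }
    }

    double meanWaitSeconds() {
        long n = count.get();
        return n == 0 ? 0.0 : totalWaitNanos.get() / 1e9 / n;
    }

    double meanRunSeconds() {
        long n = count.get();
        return n == 0 ? 0.0 : totalRunNanos.get() / 1e9 / n;
    }

    long operationCount() {
        return count.get();
    }
}
```

A long mean wait with a short mean run time would suggest the permit limit is the bottleneck, while the opposite would point at Solr itself.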

@ErykKul changed the title from "10469 solr performance" to "Solr throttling" on May 14, 2024