Slow test execution and metric calculation #1034
Comments
I looked at the internals of ClassificationPreset and DataDriftPreset. There is a lot of data copying, which is far from ideal. pandas is also used throughout, even where faster alternatives could be used.
I wonder whether metric calculation could at least be split across several processes.
I believe the best solution would be parallel execution, but we need to explore whether it is feasible. Optimizing individual sections is of little use because, in my case, we are calculating ~2000 different tests and metrics. I don't see a problem with HTML generation, since HTML is only produced when it is needed.
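To make the parallel-execution idea concrete, here is a minimal sketch using only the Python standard library. It is not Evidently's API: `run_metrics`, `METRICS`, and the three metric functions are hypothetical stand-ins, and the sketch assumes each metric is an independent, picklable function of the same input data.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from statistics import mean, pstdev
from typing import Callable, Dict, List, Optional, Type

Column = List[float]
MetricFn = Callable[[Column], float]


# Hypothetical stand-ins for independent metric calculations.
# They live at module level so that worker processes can unpickle them.
def column_mean(values: Column) -> float:
    return mean(values)


def column_std(values: Column) -> float:
    return pstdev(values)


def column_range(values: Column) -> float:
    return max(values) - min(values)


METRICS: Dict[str, MetricFn] = {
    "mean": column_mean,
    "std": column_std,
    "range": column_range,
}


def run_metrics(
    values: Column,
    metrics: Dict[str, MetricFn],
    executor_cls: Type[Executor] = ProcessPoolExecutor,
    max_workers: Optional[int] = None,
) -> Dict[str, float]:
    """Submit every metric as its own task and collect the results by name.

    With the default ProcessPoolExecutor each task runs in a separate
    process, so CPU-bound metrics can use more than one core.
    ThreadPoolExecutor can be passed instead for a quick local check.
    """
    with executor_cls(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, values) for name, fn in metrics.items()}
        return {name: future.result() for name, future in futures.items()}
```

When the default process pool is used, the call should sit under an `if __name__ == "__main__":` guard so that worker start-up does not re-run it. Note that the input data is pickled and sent to each task, which is itself a copy, so with ~2000 metrics it would probably make sense to batch several metrics per task rather than submit them one by one.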
Using polars with lazy evaluation instead of pandas could also be a good option if we are talking about optimizing the calculations.
@c0t0ber
I'm less sure about this, but the data points in the HTML may also be duplicated across several metrics/tests.
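If that duplication does exist, one possible remedy is to store each distinct data series once and let every metric/test refer to it by key. The sketch below only illustrates the idea and makes no claim about how Evidently actually embeds data in its HTML; `dedupe_payloads` and the widget ids are hypothetical, and the series are assumed to be JSON-serializable lists.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Series = List[float]


def dedupe_payloads(
    widgets: Dict[str, Series],
) -> Tuple[Dict[str, Series], Dict[str, str]]:
    """Return (store, refs): each distinct series once, plus widget -> key."""
    store: Dict[str, Series] = {}
    refs: Dict[str, str] = {}
    for widget_id, series in widgets.items():
        # Identical series serialize to the same JSON text, hence the same key.
        digest = hashlib.sha256(json.dumps(series).encode("utf-8")).hexdigest()
        key = digest[:16]
        store.setdefault(key, series)  # keep only the first copy
        refs[widget_id] = key
    return store, refs
```

A report could then serialize `store` once and emit only the short keys from `refs` next to each metric or test.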
Idk if this is correct and/or possible, but the following would be cool
When using real data with 100k rows and a large number of columns, metrics, and tests (around 1000), the calculation can take up to 20 minutes. Computer resources are also not fully utilized: even on a powerful processor, usage does not exceed 20% of a single core. Consequently, with many tests and metrics and a high RPS of new data, Evidently may not be able to process the data in time.