Question about usage of cores and db-benchmark performance #3368
Comments
For operations listed in https://dataframes.juliadata.org/stable/lib/functions/#Multithreading-support DataFrames.jl uses as many cores as you start your Julia process with.
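As a minimal sketch of what "start your Julia process with" means in practice: the thread count is fixed when Julia is launched, either through the `JULIA_NUM_THREADS` environment variable or the `--threads` command-line flag. The value `4` below is only an illustrative assumption, and the check assumes a reasonably recent `julia` binary is on your `PATH`.

```shell
# Illustrative assumption: request 4 threads; pick a value that matches your machine.
export JULIA_NUM_THREADS=4

# Equivalent alternative on recent Julia versions: julia --threads=4
# Only run the check if a julia binary is actually available.
if command -v julia >/dev/null 2>&1; then
    # Print the number of threads the Julia process was started with.
    julia -e 'println(Threads.nthreads())'
fi
```

If the printed number is 1, Julia was started single-threaded and the multithreaded DataFrames.jl operations listed at the link above have only that one thread to work with.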
Yes. However, so far this has not been considered a top priority. Having said that:
Also note that in the benchmarks you reference, DuckDB, not Polars, is generally the fastest solution, and we treat it as the reference benchmark.
Thank you for the detailed response, @bkamins. My question was based on a discussion with a colleague about the db-benchmark. I will look into the multi-threading support and get back to you. I'm not sure I have the bandwidth or capability to help with the source code to improve the db-benchmark results, but it's a very popular benchmark that does influence which libraries people use, so it would be great to see the Julia performance improve. Thank you again!
Help with the code is always welcome. However, as I have commented, even sharing real-life examples that are slow in practice would help. The point is that this benchmark is run on a large multi-core server, while people typically run their code on laptops or smaller servers, which have different performance characteristics (and this is the target we want to optimize for in the first place).
Hello, I have a general question about whether DataFrames.jl uses all of the physical cores available on the machine when executing code (the way Polars does: https://www.pola.rs/). I'd greatly appreciate it if someone could share any resources or tips for improving the performance of DataFrames.jl.
It would also be very helpful to get some feedback on whether there is any way to improve the performance of DataFrames.jl in the recently updated db-benchmark:
https://duckdblabs.github.io/db-benchmark/
thank you