Compression rate to minimize reading time? #278
Hi @frederickluser, that's an interesting question!
Both compression algorithms take more time to compress when the compression setting is higher, but for decompression time there is almost no difference. So if you want to write once and read often, your best option is to use the highest compression setting possible. With equal decompression time, the smaller number of bytes that need to be read from disk will shorten your reading times :-) If you had an infinitely fast disk, reading time would be limited only by decompression speed, and the actual level selected would probably not matter much. Hope that helps :-) (PS: the benchmark figure in the README also shows that with the fast, but limited, disk speed used there, more compression leads to higher reading speeds.)
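The trade-off described here (higher levels cost compression time but barely affect decompression time or shrink the bytes read from disk) can be sketched with `zlib` from the Python standard library. Note this is only an illustration of the general principle: fst itself uses LZ4 and ZSTD, so absolute sizes and timings will differ.

```python
import time
import zlib

# Repetitive sample data, standing in for a large columnar file.
data = b"observation,value\n" * 500_000

for level in (1, 6, 9):
    t0 = time.perf_counter()
    compressed = zlib.compress(data, level)
    t_compress = time.perf_counter() - t0

    t0 = time.perf_counter()
    restored = zlib.decompress(compressed)
    t_decompress = time.perf_counter() - t0

    assert restored == data
    print(f"level={level}: size={len(compressed):>9} bytes, "
          f"compress={t_compress:.4f}s, decompress={t_decompress:.4f}s")
```

Running this typically shows compression time growing with the level while decompression time stays roughly flat, which is why a smaller on-disk file wins for write-once, read-often workloads.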
Hey Marcus, Great, thanks a lot for the super informative answer! That is very helpful. All the best, Frederic
Thank you so much for all your great work. I wondered which compression factor would minimize reading time for large files with, e.g., 100 million observations, if I'm not concerned about writing time. Do you have any intuition or previous benchmarks from, let's say, extreme cases (e.g., compress = 0, 50, 100)?
EDIT: I guess the optimal compression rate also depends on one's hardware. In my case at least, I work on a fairly powerful machine: 36 virtual processors, 2.3 GHz, 440 GB ...
Any comment is highly appreciated. All the best,
Frederic