Compression rate to minimize reading time? #278

frederickluser · 2023-07-25T14:00:31Z

Thank you so much for all your great work. I wondered which compression factor would minimize reading time for large files with, e.g., 100 million observations, if I'm not concerned about writing time. Do you have any intuition or previous benchmarks from, let's say, extreme cases (e.g., compress = 0, 50, 100)?

EDIT: I guess optimal compression rates also depend on one's hardware. In my case, at least, I work on a quite powerful machine: 36 virtual processors, 2.3GHz, 440 GB ...

Any comment highly appreciated. All the best,
Frederic

MarcusKlik · 2023-12-01T08:43:49Z

Hi @frederickluser, that's an interesting question!

fst uses LZ4 (fastest) and ZSTD (slower, but stronger compression) for compression and decompression. In general, your fst file will be smallest at the highest compression settings.

Both compression algorithms take more time to compress when the compression setting is higher, but decompression time is almost the same across settings.

So if you want to write once and read often, your best option is to use the highest compression settings possible. With equal decompression time, the smaller number of bytes that need to be read from disk will shorten your reading times :-)

If you had an infinitely fast disk, reading time would be limited only by decompression speed, and the selected compression level would probably not matter much.
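A back-of-the-envelope way to see this: treat reading time as the time to pull the compressed bytes off disk plus the time to decompress them back to the raw size. The sketch below is a simplified model, not fst's actual implementation. It assumes that disk I/O and decompression do not overlap, that decompression throughput (in uncompressed bytes per second) is the same for every compression level, and the function name `estimated_read_seconds`, the sizes, and the speeds are all made-up illustration values.

```python
import math


def estimated_read_seconds(raw_bytes: float, ratio: float,
                           disk_bytes_per_s: float,
                           decompress_bytes_per_s: float) -> float:
    """Hypothetical read-time model, not fst internals.

    raw_bytes: uncompressed size of the data
    ratio: compression ratio (raw size / compressed size), >= 1
    disk_bytes_per_s: sustained disk read throughput
    decompress_bytes_per_s: decompression throughput in uncompressed
        bytes per second, assumed equal for every compression level
    Assumes I/O and decompression run one after the other.
    """
    compressed_bytes = raw_bytes / ratio
    io_time = compressed_bytes / disk_bytes_per_s    # shrinks as ratio grows
    cpu_time = raw_bytes / decompress_bytes_per_s    # independent of ratio
    return io_time + cpu_time


GB = 1e9
raw = 10 * GB  # illustrative data size

for ratio in (1.5, 3.0):  # illustrative "low" vs "high" compression
    slow = estimated_read_seconds(raw, ratio, 0.5 * GB, 2 * GB)
    fast = estimated_read_seconds(raw, ratio, math.inf, 2 * GB)
    print(f"ratio {ratio}: {slow:.2f} s on a 0.5 GB/s disk, "
          f"{fast:.2f} s on an infinitely fast disk")
# ratio 1.5: 18.33 s on a 0.5 GB/s disk, 5.00 s on an infinitely fast disk
# ratio 3.0: 11.67 s on a 0.5 GB/s disk, 5.00 s on an infinitely fast disk
```

With a finite disk, the higher ratio wins because fewer bytes are read; with an infinitely fast disk the I/O term drops to zero and only the (level-independent) decompression term remains.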

Hope that helps :-)

(PS: the benchmark figure in the README also shows that, with the fast but finite disk speed used there, more compression leads to higher reading speeds.)

frederickluser · 2023-12-04T16:51:56Z

Hey Marcus

Great, thanks a lot for the super informative answer! That is very helpful.

All the best, Frederic

MarcusKlik self-assigned this Dec 1, 2023

MarcusKlik added the benchmarks label Dec 1, 2023

MarcusKlik added this to the fst v0.9.10 milestone Dec 1, 2023
