support different ways of ingesting data for tpc-h and tpc-ds #20

lucvlaming · 2020-04-16T13:45:42Z

this would be very useful, e.g. to ingest zstd-compressed files and so speed up the ingest for small datasets (up to e.g. 100G)

sdressler · 2020-04-16T13:46:47Z

Can you please elaborate a bit?

lucvlaming · 2020-04-16T13:48:09Z

the majority of the ingest time is now taken up by actually generating the data. if you have enough space (e.g. big-bertha), then storing the generated input makes for a much quicker turnaround when you have to try a set of benchmarks.

sdressler · 2020-04-16T13:49:59Z

We could add a data-source flag or similar to these benchmarks. If the user has data files ready, the benchmark would ingest from those; otherwise it would fall back to the generator.
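
A rough sketch of how that fallback could look, just to make the idea concrete. Everything here is an assumption for illustration: the `--data-source` flag name, the helper names `find_table_files` and `choose_ingest_mode`, and the convention that stored files are named after the table (e.g. `lineitem.tbl.zst`). Decompression and the actual ingest are left out.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def find_table_files(data_source: Optional[str], table: str) -> List[Path]:
    """Return stored input files for `table`, or an empty list if none exist."""
    if data_source is None:
        return []
    root = Path(data_source)
    if not root.is_dir():
        return []
    # Hypothetical naming convention: <table>.<anything>, which covers
    # e.g. lineitem.tbl as well as lineitem.tbl.zst.
    return sorted(p for p in root.glob(f"{table}.*") if p.is_file())


def choose_ingest_mode(data_source: Optional[str], table: str) -> str:
    """Pick 'files' when stored input is available, else fall back to 'generator'."""
    return "files" if find_table_files(data_source, table) else "generator"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="TPC-H / TPC-DS ingest (sketch)")
    # Hypothetical flag; when omitted, data is always generated.
    parser.add_argument(
        "--data-source",
        default=None,
        help="directory with pre-generated data files; generator is used if absent",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    for table in ("lineitem", "orders"):
        print(table, "->", choose_ingest_mode(args.data_source, table))
```

Deciding per table rather than per run means a partially populated directory still works: tables with stored files are read from disk, the rest are generated.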

lucvlaming · 2020-04-16T13:50:27Z

that would be very cool :)

sdressler added the enhancement New feature or request label Apr 16, 2020
