Speed up the read/preprocess in ABFE workflow #359

xiki-tempula · 2024-05-17T14:10:27Z

In the current ABFE workflow setup, file reading and preprocessing run on a single thread, which wastes time when there are many files to read.
I think we could speed this up by running the read and preprocess steps in parallel across multiple processes.
I'm thinking of adding joblib as a new dependency for that. I'd appreciate advice on whether the community is happy with that.
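A minimal sketch of the idea, assuming joblib's `Parallel`/`delayed` interface. The functions `read_window` and `preprocess_window` and the file names are hypothetical stand-ins, not the workflow's actual parsers or preprocessors:

```python
from joblib import Parallel, delayed


def read_window(path):
    """Hypothetical stand-in for a per-file parser; returns fake samples."""
    # A real workflow would parse the simulation output at `path` here.
    return [float(len(path)) + i for i in range(5)]


def preprocess_window(samples):
    """Hypothetical stand-in for per-window preprocessing (keeps every 2nd sample)."""
    return samples[::2]


def read_and_preprocess(path):
    """Process one lambda window end to end so each worker handles a whole file."""
    return preprocess_window(read_window(path))


# Hypothetical file names, one per lambda window.
paths = [f"lambda_{i:02d}/prod.xvg" for i in range(8)]

# Each window is independent, so the calls can be dispatched to worker
# processes; joblib returns the results in the same order as the inputs.
results = Parallel(n_jobs=2)(delayed(read_and_preprocess)(p) for p in paths)
```

Because every lambda window is read and preprocessed independently, the work splits cleanly by file, and the ordering guarantee keeps the windows aligned with their lambda values.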


orbeckst · 2024-05-23T16:56:02Z

joblib is not a big dependency; however, we can also think about making it an optional dep.
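One way an optional dependency could look, as a sketch only: import joblib if it is present and fall back to a serial loop otherwise. The helper name `parallel_map` and its `n_jobs` parameter are assumptions for illustration, not part of any existing API:

```python
try:
    from joblib import Parallel, delayed
    HAVE_JOBLIB = True
except ImportError:  # joblib not installed: stay serial
    HAVE_JOBLIB = False


def parallel_map(func, items, n_jobs=1):
    """Apply func to each item, in parallel only when joblib is available."""
    items = list(items)
    if HAVE_JOBLIB and n_jobs != 1:
        return Parallel(n_jobs=n_jobs)(delayed(func)(item) for item in items)
    # Serial fallback: same results, same order, no extra dependency.
    return [func(item) for item in items]


def square(x):
    return x * x


print(parallel_map(square, range(4), n_jobs=2))  # prints [0, 1, 4, 9]
```

With this pattern the default `n_jobs=1` keeps the existing single-process behaviour, so users opt in to parallelism explicitly.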

Perhaps a good starting point for discussion is to see how much speed-up it can bring and if it's something people will likely always want to use. Do you have some benchmark comparisons for typical data sets and how it scales?

xiki-tempula · 2024-05-29T09:27:38Z

It is quite a big speed-up. Assume that we have 64 lambda windows, each with 6251 time points.
On a 64-core instance, the read goes from
7.98 s ± 114 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
to
625 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The preprocess goes from
1min 10s ± 598 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
to
3.12 s ± 86.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
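To put the timings above in terms of scaling, the speed-up and parallel efficiency can be worked out directly. This is a small sketch; it assumes the parallel runs used all 64 cores, which the benchmark does not state explicitly:

```python
N_CORES = 64  # assumption: all cores of the 64-core instance were used

# (serial seconds, parallel seconds), taken from the mean timings quoted above.
timings = {
    "read": (7.98, 0.625),
    "preprocess": (70.0, 3.12),  # 1 min 10 s = 70 s
}


def speedup(serial, parallel):
    """Ratio of serial to parallel wall time."""
    return serial / parallel


for stage, (serial, parallel) in timings.items():
    s = speedup(serial, parallel)
    efficiency = s / N_CORES  # fraction of ideal linear scaling
    print(f"{stage}: {s:.1f}x speed-up, {efficiency:.0%} parallel efficiency")

# prints:
# read: 12.8x speed-up, 20% parallel efficiency
# preprocess: 22.4x speed-up, 35% parallel efficiency
```

Under that assumption, both stages scale well below linearly, which is typical when process start-up and data transfer between workers take a noticeable share of a short run.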

xiki-tempula · 2024-05-29T19:43:54Z

Apparently joblib is a dependency of scikit-learn, which is already one of our dependencies, so we would always have joblib in our environment.

orbeckst · 2024-05-29T20:08:19Z

Just make it explicit anyway.

xiki-tempula added the parsers, preprocessors, and workflows labels May 17, 2024

xiki-tempula linked a pull request May 29, 2024 that will close this issue

Parallel read and preprocess the data #371

Open

orbeckst assigned xiki-tempula Sep 19, 2024
