Parallel read and preprocess the data #371

Open
wants to merge 13 commits into master from 359-speed-up-the-readpreprocess-in-abfe-workflow

Conversation

@xiki-tempula (Collaborator) commented May 29, 2024

Use joblib to parallelise the read and preprocess steps of the ABFE workflow.
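For context, a minimal sketch of the general joblib pattern this relies on (illustrative only; the file list, parser call, and worker count are placeholders, not the PR's actual code):

from joblib import Parallel, delayed
from alchemlyb.parsing.gmx import extract_u_nk

# Placeholder file list; in the workflow each lambda window has its own xvg file.
files = ["lambda_00.xvg", "lambda_01.xvg"]

# Parse each file in its own worker; n_jobs=-1 means "use all available CPUs".
u_nk_list = Parallel(n_jobs=-1)(delayed(extract_u_nk)(f, T=310) for f in files)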

@xiki-tempula linked an issue May 29, 2024 that may be closed by this pull request

codecov bot commented May 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.84%. Comparing base (584588c) to head (54bb316).
Report is 28 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #371   +/-   ##
=======================================
  Coverage   98.83%   98.84%           
=======================================
  Files          28       28           
  Lines        1895     1899    +4     
  Branches      407      408    +1     
=======================================
+ Hits         1873     1877    +4     
  Misses          2        2           
  Partials       20       20           


@xiki-tempula marked this pull request as ready for review May 29, 2024 20:34
@xiki-tempula force-pushed the 359-speed-up-the-readpreprocess-in-abfe-workflow branch from 0f9ddae to d5fe5dd on June 3, 2024 09:03
@orbeckst (Member) left a comment

Overall this looks like a powerful new feature for the workflow. Do you have some simple performance benchmarks?

My primary concerns are (see inline comments):

  • Making sure that joblib is explicitly installed.
  • Is the new default n_jobs=-1 safe?
  • Add docs.

@@ -13,14 +13,17 @@ The rules for this file:
* release numbers follow "Semantic Versioning" https://semver.org

------------------------------------------------------------------------------
- ??/??/2024 orbeckst
+ ??/??/2024 orbeckst, xiki-tempula
@orbeckst (Member):

merge & resolve conflicts and set up 2.5.0

@@ -11,3 +11,4 @@ dependencies:
- pyarrow
- matplotlib
- loguru
- joblib
@orbeckst (Member):

Also update any other places where installation is needed (pyproject.toml, CI devtools/conda-envs/test_env.yaml, RTD, and later the conda-forge feedstock).

import pytest
from alchemtest.amber import load_bace_example
from alchemtest.gmx import load_ABFE
from joblib import parallel_config
@orbeckst (Member):

Maybe just import joblib so that it's clearer below what comes from joblib? That way, any unqualified functions and classes are from alchemlyb and everything else is external.

suffix="xvg",
T=310,
)
with parallel_config(backend="threading"):
@orbeckst (Member):

I find

Suggested change:
- with parallel_config(backend="threading"):
+ with joblib.parallel_config(backend="threading"):

clearer when quickly reading the code.
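For readers unfamiliar with it: joblib.parallel_config is a context manager that sets the default backend (and, optionally, the worker count) for any Parallel call made inside the block, which is presumably why the test forces the threading backend without touching the workflow code. A small self-contained illustration (the square function and worker count are made up for this example):

import joblib

def square(x):
    return x * x

# Everything inside the block uses the threading backend with at most two workers.
with joblib.parallel_config(backend="threading", n_jobs=2):
    results = joblib.Parallel()(joblib.delayed(square)(i) for i in range(4))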

@@ -125,6 +126,8 @@ def read(self, read_u_nk=True, read_dHdl=True):
Whether to read the u_nk.
read_dHdl : bool
Whether to read the dHdl.
n_jobs : int
Number of parallel workers to use for reading the data.
@orbeckst (Member):

explain what the default -1 does
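One possible wording, following joblib's convention that -1 means all available CPU cores (a sketch only, not the PR's final text):

n_jobs : int
    Number of parallel workers to use for reading the data.
    The default of -1 follows the joblib convention and uses all
    available CPU cores; set n_jobs=1 to read the files serially.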

@@ -115,7 +116,7 @@ def __init__(
else:
raise NotImplementedError(f"{software} parser not found.")

- def read(self, read_u_nk=True, read_dHdl=True):
+ def read(self, read_u_nk=True, read_dHdl=True, n_jobs=-1):
@orbeckst (Member):

Is default -1 really always the best choice? Did you try it on a machine with, say, 16 cores and 16 hyperthreaded cores (or really anything with hyperthreads)?

I am willing to make -1 the default if it does not throw surprises for users. Otherwise the conservative 1 would be better, and users can then explicitly enable parallelism.
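Whichever default is chosen, users can stay in control by passing n_jobs explicitly; a hedged usage sketch (the directory, prefix, and worker counts are placeholders modeled on the test snippet above; n_jobs on read() is the parameter added in this PR):

from alchemlyb.workflows import ABFE

# Placeholder input location; suffix and T mirror the test snippet above.
workflow = ABFE(software="GROMACS", dir="abfe_data", prefix="dhdl", suffix="xvg", T=310)

workflow.read(n_jobs=1)   # conservative: read files serially
workflow.read(n_jobs=4)   # explicit: four parallel workers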

@orbeckst (Member):

add versionchanged to docs

@orbeckst (Member):

Add a paragraph about parallelization: how to enable it, what it does (for each file), any potential problems...
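Such a paragraph could also show the single knob that enables it end-to-end; an illustrative sketch (paths and worker count are placeholders; n_jobs is the parameter this PR adds to read(), preprocess(), and run()):

from alchemlyb.workflows import ABFE

# Placeholder setup mirroring the test snippet; n_jobs controls the number of joblib workers.
workflow = ABFE(software="GROMACS", dir="abfe_data", prefix="dhdl", suffix="xvg", T=310)
workflow.run(n_jobs=4)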

@@ -201,6 +219,7 @@ def run(
overlap="O_MBAR.pdf",
breakdown=True,
forwrev=None,
n_jobs=-1,
@orbeckst (Member):

See above:

  • Is -1 safe as new default?
  • add docs (explanation)
  • add versionchanged

@@ -307,7 +329,7 @@ def update_units(self, units=None):
logger.info(f"Set unit to {units}.")
self.units = units or None

def preprocess(self, skiptime=0, uncorr="dE", threshold=50):
def preprocess(self, skiptime=0, uncorr="dE", threshold=50, n_jobs=-1):
@orbeckst (Member):

See above:

  • Is -1 safe as new default?
  • add docs (explanation)
  • add versionchanged

@orbeckst self-assigned this Oct 2, 2024
Development

Successfully merging this pull request may close these issues:

  • Speed up the read/preprocess in ABFE workflow
2 participants