Hi,
I came across your repository while searching for ways to train multiple NN models simultaneously on a single GPU. My model is pretty small (just a 1-layer MLP), and each model uses only about 260 MB of VRAM. However, when I use joblib to train multiple models at the same time, they do start at the same time (according to the log), but the total training time is still the same as when I train the models sequentially. Do you happen to have any tips, quick insights, or things to look at for this? I know this is not directly an issue with your package, but I would really appreciate any help.
My code is like this:
from joblib import Parallel, delayed, parallel_backend

with parallel_backend('loky', n_jobs=-1):
    parallel = Parallel(n_jobs=-1)
    parallel(
        delayed(process_latent_pair)(mi_estimator, iid, tid, cfg, exp_name, args, DEVICE)  # process_latent_pair trains 1 NN model
        for iid in range(13)
        for tid in range(13)
    )
My environment:
python 3.9.19
torch==2.4.1
joblib==1.4.2
Hi @xuyxu, I don't use torchensemble; I'm using joblib directly. I haven't had any luck debugging this, and since you probably have a lot of experience with this kind of setup, I'd like to ask for advice on what might be causing the lack of speedup.
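One thing that might help narrow this down is comparing how long a single job takes when it runs alone with how long each job takes when several run together. Below is a minimal sketch of that measurement using only the standard library. `timed_job` is a hypothetical stand-in for `process_latent_pair` (here it just sleeps), and `mean_duration` and `slowdown` are illustrative names, not part of joblib or torch. With real GPU work, the timer would probably also need a `torch.cuda.synchronize()` call before each reading, since CUDA kernels are launched asynchronously.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_job(job_id, duration=0.2):
    """Hypothetical stand-in for one training run; returns its own elapsed time."""
    start = time.perf_counter()
    time.sleep(duration)  # placeholder for the real training work
    return job_id, time.perf_counter() - start


def mean_duration(n_jobs, n_workers):
    """Run n_jobs stand-in jobs on n_workers threads; return the mean per-job time."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(timed_job, range(n_jobs)))
    return sum(elapsed for _, elapsed in results) / len(results)


def slowdown(n_jobs=4, n_workers=4):
    """Mean per-job time when run together, divided by the time of one job alone."""
    alone = mean_duration(1, 1)
    together = mean_duration(n_jobs, n_workers)
    return together / alone


if __name__ == "__main__":
    print(f"per-job slowdown with 4 workers: {slowdown():.2f}x")
```

If the slowdown measured this way is close to the number of workers, that would suggest the jobs are taking turns on some shared resource rather than truly overlapping. A value near 1 with no drop in total wall time would point more toward startup or data-transfer overhead in each worker. Both readings are guesses to check against the real workload, not conclusions.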