Context: Running jobs in a multi-node, multi-GPU environment using SlurmExecutor.
Request: Functionality to execute custom shell commands on each GPU inside the NeMo container before the main fn_or_script execution starts.
Why?:
When experimenting with or testing new features, we set env vars (different from the env vars added to the sbatch script) and run basic shell commands on each GPU to get some info about the container.
I understand we can do this by launching an interactive container session, but doing so for multiple GPUs at the same time is not feasible.
Currently, I've updated the scripts in NeMo-Run to add the following to the srun command:
bash -c "<custom shell command(s)> && python -m nemo_run.core.runners.fdl_runner -n <exp_name> -p /nemo_run/configs/exp_1_packager /nemo_run/configs/<exp_name>_fn_or_script"
as opposed to the default, which is:
python -m nemo_run.core.runners.fdl_runner -n <exp_name> -p /nemo_run/configs/exp_1_packager /nemo_run/configs/<exp_name>_fn_or_script
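To make the workaround concrete, here is a minimal sketch of the wrapping in plain Python. It is not part of NeMo-Run's API: the helper name `wrap_with_pre_commands`, the experiment name `my_exp`, and the example pre-commands are all hypothetical, and where such a hook would live in SlurmExecutor is left open. It only shows how the default runner command could be chained after user-supplied shell commands inside a single `bash -c` call.

```python
import shlex


def wrap_with_pre_commands(main_cmd, pre_commands):
    """Hypothetical helper: chain shell commands before the main command.

    main_cmd is an argv-style list (the default runner invocation).
    pre_commands is a list of raw shell snippets supplied by the user.
    Returns an argv list suitable for appending to an srun command.
    """
    if not pre_commands:
        # Nothing to prepend: keep the default command untouched.
        return list(main_cmd)
    # Quote the main command's arguments, then join everything with &&
    # so the main command only runs if every pre-command succeeds.
    chained = " && ".join([*pre_commands, shlex.join(main_cmd)])
    return ["bash", "-c", chained]


# Example values below are assumptions for illustration only.
default_cmd = [
    "python", "-m", "nemo_run.core.runners.fdl_runner",
    "-n", "my_exp",
    "-p", "/nemo_run/configs/exp_1_packager",
    "/nemo_run/configs/my_exp_fn_or_script",
]
wrapped = wrap_with_pre_commands(default_cmd, ["export MY_FLAG=1", "nvidia-smi -L"])
print(shlex.join(wrapped))
```

Because everything runs in one shell, variables exported by the pre-commands are visible to the Python process that follows. Using `&&` means a failing pre-command stops the job before the main script starts; `;` would be the alternative if the pre-commands are purely informational.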