
Multi-GPU training is reducing speed compared to single GPU #217

Open
tanvir-utexas opened this issue Jun 12, 2022 · 0 comments
Comments

@tanvir-utexas

For training with both the baseline and soft-teacher configs, I always get much slower training with more GPUs. For training with 1% labels, single-GPU training shows an estimated training time of about 2 days, while 8 GPUs shows about 5 days. I don't understand the underlying reason. I am using a node with 8 A5000 GPUs. Can anyone tell me how long it should take? What can I do to get a speedup from multi-GPU training? I am badly stuck on this. Any help will be greatly appreciated.
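For reference, here is a minimal sketch (not the SoftTeacher training code; the toy model, batch size, and iteration count are placeholders) of how a PyTorch DistributedDataParallel run can be timed per iteration. Comparing the per-iteration time on 1 GPU vs. 8 GPUs is one way to check whether the slowdown comes from DDP gradient-sync overhead itself rather than from the config's batch or iteration settings. It assumes the script is launched with `torchrun --nproc_per_node=<N> this_file.py`.

```python
# Minimal DDP timing sketch (hypothetical, not part of this repo).
import os
import time

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE/LOCAL_RANK for us.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model standing in for the detector; the point is only to time
    # forward/backward plus the gradient all-reduce added by DDP.
    model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)).cuda()
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    x = torch.randn(32, 1024, device="cuda")  # per-GPU batch (placeholder size)

    torch.cuda.synchronize()
    start = time.time()
    iters = 200
    for _ in range(iters):
        opt.zero_grad()
        loss = model(x).sum()
        loss.backward()  # gradients are all-reduced across GPUs here
        opt.step()
    torch.cuda.synchronize()

    if dist.get_rank() == 0:
        print(f"avg iteration time: {(time.time() - start) / iters * 1000:.1f} ms")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

If the average iteration time on 8 GPUs is close to the single-GPU number, the longer ETA in the real run would likely come from the training config (e.g., how many iterations the schedule runs and how much data each iteration now processes) rather than from multi-GPU communication overhead.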
