why multi-gpu training slower than single gpu #5250
Comments
Hello @wangdada-love, thank you for the interesting question. Let me answer the second one first. DALI is used in the MLPerf competition in the benchmarks posted by NVIDIA. Since MLPerf is all about performance, if native TF were faster, we'd be using that one ;) Additionally, we have a multitude of success stories (please refer here) that emphasise how DALI helps with data augmentation. Regarding your first question, it is hard to tell what's happening without some additional details. Should you like to diagnose it, I'd suggest two things. First, please look at the output of
And then use Nsight Systems to open the captured profile and see what happened.
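The exact capture command was cut off above, but a typical Nsight Systems capture of a short steady-state training window might look like this (the script name, output name, and delay/duration values are illustrative placeholders, not part of the original reply):

```shell
# Capture CUDA, NVTX, and OS-runtime activity for a slice of training.
# Skip the first 60 s of warm-up, then record 30 s of steady state.
nsys profile \
    --trace=cuda,nvtx,osrt \
    --delay=60 --duration=30 \
    --output=deeplab_4gpu \
    python train_deeplab.py   # hypothetical training script
```

Opening the resulting `deeplab_4gpu.nsys-rep` in the Nsight Systems GUI shows whether the GPUs sit idle waiting on the input pipeline (gaps between kernels) or are busy with compute, which is the first thing to check when multi-GPU training is slower than single-GPU.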
Describe the question.
I have rewritten my data augmentation methods using the DALI module and applied them to train a TensorFlow-based DeepLabV3 model. However, I have observed that training is faster on a single GPU, and the speed decreases significantly when training on 4 GPUs. Both my data augmentation methods and the construction of the DALIDataset follow the official documentation: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/tensorflow/tensorflow-dataset-multigpu.html
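For reference, the linked tutorial builds one DALI pipeline per GPU, shards the reader across GPUs, and hands each pipeline to `tf.distribute` as a per-replica dataset placed on its own device. A minimal sketch along those lines, assuming DALI and its TF plugin are installed (the file path, batch size, and pipeline body are placeholders, not the actual DeepLabV3 setup):

```python
import nvidia.dali.fn as fn
import nvidia.dali.plugin.tf as dali_tf
import tensorflow as tf
from nvidia.dali import pipeline_def

BATCH = 8
NUM_GPUS = 4

@pipeline_def(batch_size=BATCH, num_threads=4)
def pipe(shard_id, num_shards):
    # Each shard reads a disjoint 1/num_shards slice of the dataset;
    # without sharding, every GPU would read and augment all the data.
    jpegs, labels = fn.readers.file(
        file_root="/data/train",            # placeholder path
        shard_id=shard_id, num_shards=num_shards,
        random_shuffle=True, name="Reader")
    images = fn.decoders.image(jpegs, device="mixed")
    return images, labels

strategy = tf.distribute.MirroredStrategy()

def dataset_fn(input_context):
    device_id = input_context.input_pipeline_id
    with tf.device(f"/gpu:{device_id}"):
        return dali_tf.DALIDataset(
            pipeline=pipe(device_id=device_id,
                          shard_id=device_id, num_shards=NUM_GPUS),
            batch_size=BATCH,
            output_shapes=((BATCH, None, None, 3), (BATCH, 1)),
            output_dtypes=(tf.uint8, tf.int32),
            device_id=device_id)

# Keep each dataset on its own GPU; do not fetch batches back to the host.
input_options = tf.distribute.InputOptions(
    experimental_place_dataset_on_device=True,
    experimental_fetch_to_device=False,
    experimental_replication_mode=tf.distribute.InputReplicationMode.PER_REPLICA)

train_data = strategy.distribute_datasets_from_function(
    dataset_fn, input_options)
```

If any of these pieces is missing, particularly the reader sharding arguments or the `InputOptions` placement settings, each GPU ends up processing the full dataset or copying batches through the host, either of which can make 4-GPU training slower than a single GPU.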
My current concerns are as follows: