About the speed of multi-gpu training #9
Comments
Hi, can I ask how much GPU memory is needed to train this model? I need to check whether my GPU has enough memory to try it.
@sunnyHelen ~18G.
Ok. Thanks a lot~
@LiewFeng
Hi, @Cc-Hy. Sorry for the late reply. The command is the same as the one provided in GETTING_STARTED.md. I didn't modify the batch size.
Experiments are conducted on the KITTI train split.
@LiewFeng So I think your 2-GPU training time is normal. But if your GPUs really are running at very low utilization, you may want to check your CPU status. I once hit a situation where the CPU was the bottleneck and the GPUs could not be fully utilized.
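One way to check whether a CPU-bound data pipeline is the problem is to time how long each training iteration waits for the next batch. The helper below is a minimal sketch of that idea; the name `loader_wait_fraction` and the simulated step time are illustrative assumptions, not part of this repository's code.

```python
import time

def loader_wait_fraction(loader, step_time_s=0.0, n_batches=50):
    """Rough fraction of wall time spent blocked on the data loader.

    `step_time_s` stands in for the per-batch GPU compute time; if the
    returned fraction is close to 1, the CPU-side pipeline is the bottleneck.
    """
    it = iter(loader)
    wait = 0.0
    t0 = time.perf_counter()
    for _ in range(n_batches):
        t = time.perf_counter()
        try:
            batch = next(it)           # time spent waiting on data loading
        except StopIteration:
            break
        wait += time.perf_counter() - t
        time.sleep(step_time_s)        # stand-in for the forward/backward pass
    total = time.perf_counter() - t0
    return wait / total if total > 0 else 0.0
```

With a real PyTorch `DataLoader`, a high fraction here usually means raising `num_workers` (and enabling `pin_memory`) is worth trying before anything else.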
Hi, @Cc-Hy. I figured it out. The cause is the PyTorch version. When I ran the experiment with 1 GPU, the PyTorch version was 1.10. When I tried to run with 2 GPUs, it got stuck. I then switched to PyTorch 1.8 and it worked, but 2x slower. I am using an A100, which is about 2x faster than a 3090. I still get stuck with 2 GPUs. It seems to be solved in OpenPCDet. Sadly, that fix doesn't work for me.
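For what it's worth, multi-GPU DDP hangs like this are often NCCL-related. A hedged sketch of common diagnostics and workarounds follows; whether they apply to this exact hang is an assumption, and the launch command is illustrative rather than taken from this repo.

```shell
# Assumption: the 2-GPU hang happens during NCCL initialization or a collective.
export NCCL_DEBUG=INFO        # print NCCL setup logs to locate where it stalls
export NCCL_P2P_DISABLE=1     # disable peer-to-peer transport, a common hang source
# Then relaunch training, e.g. (illustrative command):
# python -m torch.distributed.launch --nproc_per_node=2 train.py --launcher pytorch
```

If `NCCL_DEBUG=INFO` shows the ranks stalling at different points, that usually confirms a communication problem rather than a data-loading one.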
The problem of getting stuck is fixed here, and it works for me.
Hi, @Cc-Hy. When I train the model on the KITTI train split, 2 GPUs take more time than 1 GPU, which is really strange. Did you encounter this phenomenon?