DDP share information before processes are properly initialized. #20414

AndrasSalamon · 2024-11-12T23:43:34Z

Hello!

TL;DR: How can I share information between processes before Lightning calls init_process_group?

My training script goes like this:

load config
create train folder (with date & time)
create data module
create model
create logger & callbacks (uses train folder)
train
test (uses train folder)

This works fine with 1 GPU, but with multiple GPUs (DDP strategy) the train folder causes problems. The model checkpoint, the logger, and even the test code all use this train folder, but with DDP every process creates its own folder, and I need one common folder. I tried the broadcasting and barrier methods I found here, but those only work once the process group exists, and I need to share this information with the other processes before the Trainer class is initialized.

Is there a way to share information before the Trainer init, or am I going about this the wrong way?

(Currently stuck on version 1.6.5, but I am willing to upgrade to solve this problem.)
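
One possible workaround, as a minimal sketch rather than a confirmed Lightning recipe: create the folder name once and pass it to the other processes through an environment variable. This assumes the DDP workers are started as subprocesses that inherit the launching process's environment, which should be checked for the installed version and will not hold for multi-node jobs or external launchers that start every rank independently. The variable name `TRAIN_DIR`, the helper `get_train_dir`, and the `runs` root are hypothetical names chosen for illustration.

```python
import os
from datetime import datetime
from pathlib import Path

ENV_KEY = "TRAIN_DIR"  # hypothetical variable name, not part of Lightning


def get_train_dir(root="runs", env=os.environ):
    """Return one shared train folder for all processes.

    The first process to call this picks a timestamped path and stores it
    in the environment. Processes that inherit that environment find the
    variable already set and reuse the same path.
    """
    if ENV_KEY not in env:
        stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        env[ENV_KEY] = str(Path(root) / stamp)
    path = Path(env[ENV_KEY])
    # exist_ok=True makes the call safe when several processes reach it
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling `get_train_dir()` right after loading the config, before the data module, model, logger, callbacks, and Trainer are built, would then give every process the same path, provided the inheritance assumption holds. Setting `TRAIN_DIR` by hand in the launch command is a fallback for setups where it does not.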
