Parallelize trainer base evaluation in DDP setting #10

Open
jxmorris12 opened this issue Nov 13, 2023 · 0 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@jxmorris12 (Owner)

When training models, the bulk of evaluation currently runs on the main worker only. If evaluation were parallelized, training with 8 GPUs should yield roughly an 8x speedup on eval, which would make a real difference with large evaluation sets.

The main culprit is this method: https://github.com/jxmorris12/vec2text/blob/master/vec2text/trainers/base.py#L363C5-L365C27 and the subsequent call to _get_decoded_sequences in the Base trainer class. We explicitly enumerate over an eval dataloader containing the first n samples, and (I think) that full enumeration is repeated on every worker. Instead, each worker should handle only its own shard of the eval set, as in the sketch below.
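A minimal sketch of one way to do this, assuming a standard `torch.distributed` setup; `eval_dataset`, `generate_and_decode`, `n`, and `batch_size` are hypothetical stand-ins for the trainer's actual internals, not the real vec2text API:

```python
import torch.distributed as dist
from torch.utils.data import DataLoader, Subset

def sharded_eval(eval_dataset, generate_and_decode, n, batch_size):
    """Decode the first n eval samples, splitting the work across DDP ranks."""
    rank = dist.get_rank() if dist.is_initialized() else 0
    world_size = dist.get_world_size() if dist.is_initialized() else 1

    # Each rank takes a disjoint, strided slice of the first n samples
    # instead of every rank redundantly decoding all n of them.
    shard = list(range(n))[rank::world_size]
    loader = DataLoader(Subset(eval_dataset, shard), batch_size=batch_size)

    local_decoded = []
    for batch in loader:
        local_decoded.extend(generate_and_decode(batch))

    if not dist.is_initialized():
        return local_decoded

    # Collect every rank's decoded sequences; afterwards each rank holds
    # the full result (grouped by rank, not in original dataset order).
    gathered = [None] * world_size
    dist.all_gather_object(gathered, local_decoded)
    return [seq for rank_shard in gathered for seq in rank_shard]
```

With 8 workers, each rank then decodes only n/8 samples, which is where the ~8x comes from; the `all_gather_object` at the end means the metric computation on the main worker can stay unchanged.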

jxmorris12 added the enhancement and help wanted labels on Nov 13, 2023