Parallelize trainer base evaluation in DDP setting #10

Open
jxmorris12 opened this issue Nov 13, 2023 · 0 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@jxmorris12 (Owner)

When training models, the bulk of evaluation currently runs on the main worker only. If evaluation were parallelized, training with 8 GPUs should yield roughly an 8x speedup on eval, which would make a real difference with large evaluation sets.

The main culprit is this method: https://github.com/jxmorris12/vec2text/blob/master/vec2text/trainers/base.py#L363C5-L365C27 and the subsequent call to _get_decoded_sequences in the Base trainer class. We explicitly enumerate over an eval dataloader containing the first n samples, and (I think) that full enumeration is repeated on every worker. Instead, each worker should handle only its own shard of the eval set, as in the sketch below.
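A minimal sketch of one way to do this, assuming a standard `torch.distributed` setup; `eval_dataset`, `generate_and_decode`, `n`, and `batch_size` are hypothetical stand-ins for the trainer's actual internals, not the real vec2text API:

```python
import torch.distributed as dist
from torch.utils.data import DataLoader, Subset

def sharded_eval(eval_dataset, generate_and_decode, n, batch_size):
    """Decode the first n eval samples, splitting the work across DDP ranks."""
    rank = dist.get_rank() if dist.is_initialized() else 0
    world_size = dist.get_world_size() if dist.is_initialized() else 1

    # Each rank takes a disjoint, strided slice of the first n samples
    # instead of every rank redundantly decoding all n of them.
    shard = list(range(n))[rank::world_size]
    loader = DataLoader(Subset(eval_dataset, shard), batch_size=batch_size)

    local_decoded = []
    for batch in loader:
        local_decoded.extend(generate_and_decode(batch))

    if not dist.is_initialized():
        return local_decoded

    # Collect every rank's decoded sequences; afterwards each rank holds
    # the full result (grouped by rank, not in original dataset order).
    gathered = [None] * world_size
    dist.all_gather_object(gathered, local_decoded)
    return [seq for rank_shard in gathered for seq in rank_shard]
```

With 8 workers, each rank then decodes only n/8 samples, which is where the ~8x comes from; the `all_gather_object` at the end means the metric computation on the main worker can stay unchanged.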

jxmorris12 added the enhancement and help wanted labels on Nov 13, 2023