-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some adjustment for supporting Deepspeed-Ulysses #2877
base: main
Are you sure you want to change the base?
Conversation
Thanks for the PR @zeyugao. We will review this PR asap but just to let you know, we have quite a lot of backlog since @pacman100 left. This might take a bit of time before we review this ! Hope you understand ! If anyone is interested in merging asap, please add a thumbs up ! |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
315bc81
to
13dad66
Compare
13dad66
to
2e33e77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding support for Deepspeed SP, @zeyugao
I don't think all of parallel_state.py needs to be copied when all you want to use is just a few functions. It's a bad practice to include huge chunks of unused code in a public repo. So perhaps you could massive reduce it?
And the key missing part from this PR is a test that adds support for this feature.
@@ -0,0 +1,790 @@ | |||
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. | |
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. | |
# Adapted from https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/core/parallel_state.py |
probably a good idea to mention the source - I didn't check but it looks like this is where you took it from.
BTW, an alternative solution has been just posted here: huggingface/transformers#32305 so perhaps wait before the maintainers decided which version is the most fitting before you invest more time into improving this PR, @zeyugao (I'm not a maintainer) |
Thank you for commenting and mentioning that alternative PR. I acknowledge that Here are some differences between the two PRs for reference (In my understanding):
|
That's a wonderfully written followup, Elsa @samadejacobs, could you please have a look at Elsa's comments above and see if anything still needs to be added to your PR - and if so hopefully you can share the credits with Elsa. Once thing I for sure agree with Elsa is this is Accelerate and not HF transformers, so we need to support things like the correct world size. I'm pretty sure the final result should be an amalgamation of these 2 PRs. Since these are 2 different repos - possibly merging Sam's PR first as it's more upstream to Accelerate, and then having Elsa adapt her PR to add the missing parts? |
One more comment: The accelerate and transformers should be both modified to provide a more out of box usage of sequence parallel: huggingface/transformers#31525 and this one |
@zeyugao, thanks for your contribution. Few things to note/recommend:
|
@zeyugao, so I suppose huggingface/transformers#31525 will have to be revisited as well once huggingface/transformers#32305 has been merged. |
Hi, I have a couple of questions. First of all, what is the status of this? I will glad to give a hand if it will be useful. Secondly, should we adjust |
@pavelgein The ulysses integration for transformers is tracked in huggingface/transformers#32305 just mentioned before. If you use transformers to train the model, there is no need to adjust I think that we can try to find more inconsistencies after huggingface/transformers#32305 is merged |
cc @XuehaiPan |
@@ -0,0 +1,790 @@ | |||
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need this file. Just import the parallel_state
from megatron.core
.
from accelerate.utils.imports import is_megatron_lm_available
if is_megatron_lm_available():
from accelerate.utils.megatron_lm import mpu
@@ -1634,11 +1635,16 @@ def _prepare_deepspeed(self, *args): | |||
gradient_accumulation_steps=self.gradient_accumulation_steps, | |||
) | |||
|
|||
if mpu.model_parallel_is_initialized(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functions in mpu
are available after you call mpu.initialize_model_parallel(...)
. But I do not see related code in this PR. Is this PR still in progress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is called in transformers
side
Moreover, this pr has become a subset of huggingface/transformers#35301 and #3299 . Many improvements have been made there. Maybe this pr is deprecated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this pr is deprecated?
Thanks for the context.
What does this PR do?
In this pr, I made some necessary modification to
accelerate
to achieve sequence parallel with Deepspeed-Ulysses incorporated withtransformers
.For more concrete
train_batch_size
.Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@muellerzr @BenjaminBossan @SunMarc