error: NotImplementedError: Could not run 'aten::_amp_foreach_non_finite_check_and_unscale_' with arguments from the 'CPU' backend. #3
Comments
@daixiangzi that looks like an unrelated error, specific to running amp on the cpu backend
When I use a single machine with multiple GPUs, I can run your code, but when I use multiple machines with multiple GPUs, the error appears.
are you using fsdp?
no, I use ddp
ah, hard for me to debug. i only have a single machine
@daixiangzi could you try disabling mixed precision just for the moe forward?

with torch.cuda.amp.autocast(enabled = False):
    # ... your moe forward
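Expanding on that a little: a minimal sketch, assuming `moe` is the soft-moe module and `x` the incoming activations (both placeholder names). The explicit `.float()` is there because upstream autocast may have produced fp16 tensors.

import torch

def moe_forward_fp32(moe, x):
    # run only the moe block in full precision; the rest of the model stays under autocast
    with torch.cuda.amp.autocast(enabled = False):
        return moe(x.float())   # cast back to fp32 in case autocast left x in fp16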
finally seeing the limits of pytorch. in jax, a properly working moe is just a few lines of code
ok |
I tried it just now, but it did not work
define moe layer:
forward:
NotImplementedError: Could not run 'aten::_amp_foreach_non_finite_check_and_unscale_' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process.
in fact I trained the dp version on a single machine with multiple gpus, but the result is still bad. I have a little doubt about soft-moe-pytorch/soft_moe_pytorch/soft_moe.py Line 315 in 4dc07b0
I see the original repo: https://github.com/google-research/vmoe/blob/dfb9ee01ce6dfc5a8228b406e768ee325dd18fcd/vmoe/nn/vit_moe.py#L109C27-L109C45
your code:
the original repo code seems to be:
@daixiangzi hmm, have you tried it their way? conventionally we always work with pre-normalized values, so it would be weird if they dispatched based on the unnormalized input into the block
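To make the distinction concrete, here is an illustrative comparison of the two orderings being discussed; none of the names below are copied from either repo, and the toy sizes are arbitrary.

import torch
from torch import nn

b, n, d, e, s = 2, 16, 64, 4, 4        # toy sizes: batch, tokens, dim, experts, slots per expert
x = torch.randn(b, n, d)               # input tokens entering the moe block
slot_embeds = torch.randn(e, s, d)     # learned slot parameters (illustrative)
norm = nn.LayerNorm(d)

# variant A: pre-normalize, then compute dispatch logits from the normalized tokens
logits_a = torch.einsum('b n d, e s d -> b n e s', norm(x), slot_embeds)

# variant B: compute dispatch logits from the raw, unnormalized input to the block
logits_b = torch.einsum('b n d, e s d -> b n e s', x, slot_embeds)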
no, I will try it. but I always have some doubts about the effectiveness of soft-moe, mainly because I have been tuning it for a long time
In terms of visual tasks, there are currently many open-source versions of MoE, but in fact there is not yet a sufficiently thorough experiment demonstrating the effectiveness of this approach
@daixiangzi yea it happens, thanks for sharing your results. just to make sure we don't miss anything, would you like to also do a quick run where the rmsnorm is substituted with layernorm? (set https://github.com/lucidrains/soft-moe-pytorch/blob/main/soft_moe_pytorch/soft_moe.py#L287 to use LayerNorm instead)
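For context on what that swap amounts to, a sketch contrasting a hand-rolled RMSNorm with a plain LayerNorm; this is illustrative and not necessarily the exact implementation in soft_moe.py.

import torch
from torch import nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # scales by the feature vector's root-mean-square: no mean subtraction, no bias
    def __init__(self, dim):
        super().__init__()
        self.scale = dim ** 0.5
        self.gamma = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return F.normalize(x, dim = -1) * self.scale * self.gamma

dim = 512                      # placeholder model dimension
norm_a = RMSNorm(dim)          # what the repo reportedly uses by default
norm_b = nn.LayerNorm(dim)     # the suggested quick experiment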
Actually, before this, I tried my own version where the norm part used LN, but the effect was still not as good as the baseline |
@daixiangzi ah ok, good to know. i'll take your experience as a datapoint
my own softmoe version:
in the softmoe paper:
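For readers following along, here is a compact sketch of the dispatch/combine computation as I read it from the Soft MoE paper; the function and variable names are mine, not taken from either implementation.

import torch

def soft_moe_forward(x, phi, experts):
    # x: (b, n, d) tokens, phi: (d, e, s) slot parameters, experts: list of e modules (b, s, d) -> (b, s, d)
    logits = torch.einsum('b n d, d e s -> b n e s', x, phi)

    dispatch = logits.softmax(dim = 1)              # per slot, softmax over the n input tokens
    combine = logits.flatten(2).softmax(dim = -1)   # per token, softmax over all (expert, slot) pairs

    slots = torch.einsum('b n d, b n e s -> b e s d', x, dispatch)
    outs = torch.stack([expert(slots[:, i]) for i, expert in enumerate(experts)], dim = 1)  # (b, e, s, d)

    return torch.einsum('b n k, b k d -> b n d', combine, outs.flatten(1, 2))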
ohh, i did it correctly the first time then
ok |
Hi, I met the same problem as you. Have you solved this problem? |
when I use:
from torch.cuda.amp import GradScaler

amp = GradScaler(init_scale = 512, growth_interval = 100)

for img, label in train_iter:
    with torch.cuda.amp.autocast(True):
        pred = backbone(img)
        loss = criterion(pred, label)   # loss computation, elided in the original snippet

    amp.scale(loss).backward()
    amp.unscale_(opt)                   # <- the NotImplementedError is raised here
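One way to narrow this down (a debugging sketch, not a fix): right before `amp.unscale_(opt)`, check whether any gradient lives on a non-CUDA device, since unscale_ dispatches to aten::_amp_foreach_non_finite_check_and_unscale_, which the error says is not available for the CPU backend in this build.

# debugging sketch: list any parameters whose gradients are not on a cuda device
for name, param in backbone.named_parameters():
    if param.grad is not None and param.grad.device.type != 'cuda':
        print(f'{name}: grad on {param.grad.device}')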