
Input tensor at index 2 has invalid shape [2, 2, 12, 1024, 64], but expected [2, 3, 12, 1024, 64] #264

Open
ZouRuia opened this issue Jan 13, 2023 · 0 comments

ZouRuia commented Jan 13, 2023

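I get this error when training on three GPUs. I searched around and found a report of someone training on four GPUs who hit "RuntimeError: Input tensor at index 3 has invalid shape [2, 2, 16, 128, 64] but expected [2, 4, 16, 128, 64]". So I switched back to training on four GPUs, and strangely it ran fine, but I don't know why.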
args:
Namespace(batch_size=8, device='5,6,1,4', epochs=5, fp16=False, fp16_opt_level='O1', gradient_accumulation=1, log_step=1, lr=0.00015, max_grad_norm=1.0, model_config='config/model_config_small.json', num_pieces=100, output_dir='model/', pretrained_model='', raw=False, raw_data_path='data/data/doupo/train.json', segment=False, stride=768, tokenized_data_path='data/tokenized/', tokenizer_path='cache/vocab_small.txt', warmup_steps=2000)
config:
{
"attn_pdrop": 0.1,
"embd_pdrop": 0.1,
"finetuning_task": null,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"n_ctx": 1024,
"n_embd": 768,
"n_head": 12,
"n_layer": 10,
"n_positions": 1024,
"num_labels": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_past": true,
"pruned_heads": {},
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torchscript": false,
"use_bfloat16": false,
"vocab_size": 13317
}

using device: cuda
calculating total steps
100%|████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 92.82it/s]
total steps = 3914
Let's use 4 GPUs!
starting training
epoch 1
time: 2023-01-13 11:48:51.538218
/u01/zourui/anaconda3/envs/GPT/lib/python3.8/site-packages/torch/nn/parallel/functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
/u01/zourui/anaconda3/envs/GPT/lib/python3.8/site-packages/transformers/optimization.py:166: UserWarning: This overload of add_ is deprecated:
    add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
    add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:1005.)
  exp_avg.mul_(beta1).add_(1.0 - beta1, grad)
now time: 11:49. Step 1 of piece 0 of epoch 1, loss 9.667740821838379
now time: 11:49. Step 2 of piece 0 of epoch 1, loss 9.682665824890137
now time: 11:49. Step 3 of piece 0 of epoch 1, loss 9.685418128967285
now time: 11:49. Step 4 of piece 0 of epoch 1, loss 9.6702299118042
now time: 11:49. Step 5 of piece 0 of epoch 1, loss 9.668827056884766
now time: 11:49. Step 6 of piece 0 of epoch 1, loss 9.66973876953125
now time: 11:49. Step 7 of piece 0 of epoch 1, loss 9.65914535522461
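
A possible explanation, not confirmed anywhere in this thread: with "output_past": true the model also returns cached key/value tensors, and the shapes in the error ([2, batch, n_head, n_ctx, head_dim]) suggest their batch dimension is dim 1 rather than dim 0. When multi-GPU training goes through torch.nn.DataParallel, replica outputs are gathered by concatenating along dim 0, which requires every other dimension to match. If the batch of 8 is split across three GPUs in chunk-style pieces, the last replica gets a smaller batch, which would produce exactly the mismatch in the title. The sketch below only reproduces that arithmetic in plain Python; `scatter_sizes` and `gather_is_consistent` are hypothetical helper names, and the ceil-sized chunking is an assumption about how the batch is split.

```python
import math


def scatter_sizes(batch_size, num_gpus):
    """Per-replica batch sizes for a chunk-style split along dim 0.

    Hypothetical helper. Assumption: each chunk holds ceil(batch_size / num_gpus)
    samples and the last chunk takes whatever remains.
    """
    chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def gather_is_consistent(sizes):
    """True if every replica saw the same batch size.

    Tensors that carry the batch on dim 1 can only be concatenated
    along dim 0 when this holds.
    """
    return len(set(sizes)) == 1


if __name__ == "__main__":
    # batch_size=8 comes from the args above; 3 and 4 are the GPU counts tried.
    for gpus in (3, 4):
        sizes = scatter_sizes(8, gpus)
        print(gpus, "GPUs ->", sizes, "consistent:", gather_is_consistent(sizes))
```

Under these assumptions, three GPUs give per-replica batches of 3, 3 and 2 (the replica at index 2 is the odd one out, as in the title), while four GPUs give 2 each. If this is indeed the cause, choosing a batch size divisible by the GPU count, dropping the last incomplete batch, or setting "output_past" to false during training would be things to try.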
