I ran into a problem after downloading everything from Hugging Face and running finetune.sh. I have no idea how to fix it.
This is only part of the error output; there is a lot more that I didn't paste, because it's the same mismatch repeated for every layer.
```
Traceback (most recent call last):
  File "/root/autodl-tmp/Video-LLaVA/videollava/train/train_mem.py", line 13, in <module>
    train()
  File "/root/autodl-tmp/Video-LLaVA/videollava/train/train.py", line 1003, in train
    model.get_model().initialize_vision_modules(
  File "/root/autodl-tmp/Video-LLaVA/videollava/model/llava_arch.py", line 66, in initialize_vision_modules
    image_tower = build_image_tower(model_args)
  File "/root/autodl-tmp/Video-LLaVA/videollava/model/multimodal_encoder/builder.py", line 11, in build_image_tower
    return CLIPVisionTower(image_tower, args=image_tower_cfg, **kwargs)
  File "/root/autodl-tmp/Video-LLaVA/videollava/model/multimodal_encoder/clip_encoder.py", line 18, in __init__
    self.load_model()
  File "/root/autodl-tmp/Video-LLaVA/videollava/model/multimodal_encoder/clip_encoder.py", line 24, in load_model
    self.vision_tower = CLIPVisionModel.from_pretrained(self.vision_tower_name)
  File "/root/miniconda3/envs/videollava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/miniconda3/envs/videollava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3310, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for CLIPVisionModel:
    size mismatch for vision_model.embeddings.class_embedding: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.embeddings.patch_embedding.weight: copying a param with shape torch.Size([1024, 3, 14, 14]) from checkpoint, the shape in current model is torch.Size([768, 3, 32, 32]).
    size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([257, 1024]) from checkpoint, the shape in current model is torch.Size([50, 768]).
    size mismatch for vision_model.pre_layrnorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.pre_layrnorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for vision_model.encoder.layers.0.self_attn.k_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for vision_model.encoder.layers.0.self_attn.v_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for vision_model.encoder.layers.0.self_attn.q_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for vision_model.encoder.layers.0.self_attn.out_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.0.layer_norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.0.layer_norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.0.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    size mismatch for vision_model.encoder.layers.0.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
    size mismatch for vision_model.encoder.layers.0.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
    size mismatch for vision_model.encoder.layers.0.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.0.layer_norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.0.layer_norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.1.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for vision_model.encoder.layers.1.self_attn.k_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.1.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for vision_model.encoder.layers.1.self_attn.v_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.1.self_attn.q_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for vision_model.encoder.layers.1.self_attn.q_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.1.self_attn.out_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for vision_model.encoder.layers.1.self_attn.out_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.1.layer_norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.1.layer_norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for vision_model.encoder.layers.1.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
    ……
    size mismatch for vision_model.encoder.layers.11.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
```
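From the shapes in the message, this looks like a config/weights mismatch rather than a training bug: the checkpoint on disk holds ViT-L/14-sized weights (hidden size 1024, patch size 14), while the `CLIPVisionModel` being instantiated is configured as ViT-B/32 (hidden size 768, patch size 32), so every parameter fails to copy. Below is a minimal sketch to check which config the image-tower path actually resolves to; the path is a placeholder for whatever `--image_tower` in finetune.sh points at, not a value taken from the log.

```python
# Hypothetical check, not part of the Video-LLaVA code base: print the vision
# config that transformers resolves for the image-tower path.
from transformers import CLIPVisionConfig

# Placeholder path; substitute the --image_tower value from finetune.sh.
cfg = CLIPVisionConfig.from_pretrained("path/to/image_tower")

# A ViT-L/14 config reports hidden_size=1024, patch_size=14;
# a ViT-B/32 config reports hidden_size=768, patch_size=32.
print(cfg.hidden_size, cfg.patch_size)
```

If this prints 768/32 while the weights you downloaded are the large model (or vice versa), then the config.json sitting next to the weights, or the path itself, doesn't match the checkpoint that was actually downloaded.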