AttributeError when running demo code: 'str' object has no attribute 'size' #10

Open
GopiSumanth opened this issue Jan 22, 2025 · 0 comments

GopiSumanth commented Jan 22, 2025

I'm trying to run the video-chat demo code from the Sa2VA-1B model repository. The call to model.predict_forward fails with the AttributeError shown in the traceback below.
Could you please help me resolve this issue?
Environment details:

Model: ByteDance/Sa2VA-1B
Source: Hugging Face Hub (https://huggingface.co/ByteDance/Sa2VA-1B)

Code:

import torch  # needed for torch.bfloat16 below
from transformers import AutoTokenizer, AutoModel
from PIL import Image
import numpy as np
import os

# load the model and tokenizer
path = "ByteDance/Sa2VA-1B"
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)


# for video chat
video_folder = "video"
images_paths = os.listdir(video_folder)
images_paths = [os.path.join(video_folder, image_name) for image_name in images_paths]
if len(images_paths) > 5:  # uniformly sample 5 frames
    step = (len(images_paths) - 1) // (5 - 1)
    images_paths = [images_paths[0]] + images_paths[1:-1][::step][1:] + [images_paths[-1]]
text_prompts = "<image>Please describe the video."
input_dict = {
    'video': images_paths,
    'text': text_prompts,
    'past_text': '',
    'mask_prompts': None,
    'tokenizer': tokenizer,
}
return_dict = model.predict_forward(**input_dict)
answer = return_dict["prediction"]

print("video response: ",answer)

Error:

Traceback (most recent call last):
  File "_codepath_", line 51, in <module>
    return_dict = model.predict_forward(**input_dict)
  File "_codepath_/.cache/huggingface/modules/transformers_modules/ByteDance/Sa2VA-1B/1059a41774e0541f4d2a333ae8e9b97b64901f89/modeling_sa2va_chat.py", line 601, in predict_forward
    ori_image_size = video[0].size
AttributeError: 'str' object has no attribute 'size'
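
A possible workaround, offered as a sketch rather than a confirmed fix: the traceback shows predict_forward reading video[0].size, which suggests that this revision of modeling_sa2va_chat.py expects each entry of 'video' to be a PIL image rather than a path string. Under that assumption, the frames can be opened before they are passed in. The helper name load_frames is hypothetical and not part of Sa2VA or transformers.

```python
from PIL import Image


def load_frames(frame_paths):
    """Open each path in frame_paths and return a list of RGB PIL images.

    Assumption: predict_forward wants PIL images in 'video', as implied by
    the `video[0].size` access in the traceback.
    """
    frames = []
    for frame_path in frame_paths:
        # convert() loads the pixel data and returns a new image, so the
        # result stays usable after the file handle is closed.
        with Image.open(frame_path) as img:
            frames.append(img.convert("RGB"))
    return frames
```

In the script above this would replace 'video': images_paths with 'video': load_frames(images_paths). Since os.listdir returns entries in arbitrary order, sorting the paths first (for example with sorted(images_paths)) may also be worth doing so the frames stay in temporal order.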