
Single Character #17

p0mad opened this issue Jul 16, 2023 · 9 comments
@p0mad

p0mad commented Jul 16, 2023

Hi,
Is it possible to generate a single character from the pose for more than 5 seconds?

I have a pose video (OpenPose + hands + face) and I was wondering if it is possible to generate an output video with a length of 5 seconds that has a consistent character/avatar dancing, etc., following the controlled (pose) input?

Thanks
Best regards

@YBYBZhang
Owner

Hi p0mad, thanks for your attention! On my machine (11GB 2080 Ti), it is feasible to produce a consistent video conditioned on human pose with about 100 frames (i.e., 4~5 seconds at 24 fps), as shown in https://github.com/YBYBZhang/ControlVideo#long-video-generation.

@p0mad
Author

p0mad commented Jul 17, 2023

@YBYBZhang
That's great. But have you initialized the pose with some input (a video or an image)?

I have a video of OpenPose + hands + face and I want to generate a human-like animation (no matter what, just a consistent character/avatar).
Sample Video

human pose with about 100 frames (i.e., 4~5 seconds in 24 fps), which is shown in #long-video-generation.

The Hulk's size grows and the face/hair change during the generated video!
Do you have any idea how to get a fixed-size, consistent character?

Thanks
Best regards

@YBYBZhang
Owner

@p0mad
The synthesized Hulk video is initialized with the poses below.
Currently, our ControlVideo ensures video consistency with fully cross-frame attention only (see the sketch below for the basic idea).
In the future, adding temporal attention by fine-tuning on sufficient videos may improve size and character consistency!
https://github.com/YBYBZhang/ControlVideo/assets/40799060/21b53efe-2167-4f74-afc2-3bec021acf20
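
For intuition, "fully cross-frame attention" roughly means that every frame's queries attend over the keys/values of all frames jointly, which ties appearance across frames. Below is a minimal single-head sketch of that idea (an illustration only, not ControlVideo's actual code; the tensor shapes are assumptions):

```python
import torch

def fully_cross_frame_attention(q, k, v):
    # q, k, v: (frames, tokens, dim), one head, for illustration only.
    f, t, d = k.shape
    # Share keys/values across ALL frames so each frame attends to every frame,
    # which couples their appearance and keeps the character consistent.
    k_all = k.reshape(1, f * t, d).expand(f, -1, -1)
    v_all = v.reshape(1, f * t, d).expand(f, -1, -1)
    attn = torch.softmax(q @ k_all.transpose(1, 2) / d ** 0.5, dim=-1)
    return attn @ v_all  # (frames, tokens, dim)
```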

@p0mad
Author

p0mad commented Jul 17, 2023

@YBYBZhang
Thanks for the detailed information.
Would you please also give me some insight/guidance on the hands + face of the pose?
Is there any model that I can use? (I see that ControlNet has Full-OpenPose), but as I tested in the HF space, it doesn't seem to take it into account!
Is there any reason for the bad output?

Also, would you please provide some prompts that output a consistent character for the provided pose (like a boy playing something, with a black background, in animation style), so that the character can dance with correctly generated face and hands?

This was my best bet on generation with the pose!
final_result (2)

Thanks
Best regards

@YBYBZhang
Owner

Full-OpenPose ControlNet is trained on Stable Diffusion v1.5, and thus inherits its limitation of producing low-quality hands and faces.
I have tried to produce a video using ControlVideo (ControlNet v1.1, full-openpose) with a simple prompt, "A man, animation style." As shown below, the synthesized video looks more consistent than that from vanilla ControlNet. I hope this helps.

A.man.animation.style.mp4

@p0mad
Author

p0mad commented Jul 18, 2023

@YBYBZhang
Thank you so much for your time.
Would you please guide me through the steps of generating this video?

You installed ControlNet 1.1, downloaded the OpenPose-full weights, selected openpose-full, put "A man, animation style." in the prompt box, input the pose video (or did you use batch?), and then generated without any other input?
What about the seed and steps?

Are there any other ways to improve hand and face accuracy?
Like using OpenPifPaf as mentioned in the ControlNet paper (which is on SD 2.1)?
Or an SD 2.1/SD-XL version of OpenPose-full?

Also, would you please let me know your GPU, memory, and CPU?

Thanks
Best regards

@YBYBZhang
Owner

With a 2080 Ti 11GB GPU, I use the following script to produce the above video:

python inference.py \
    --prompt "A man, animation style." \
    --condition "openpose" \
    --video_path "data/pose1.mp4" \
    --output_path "outputs/" \
    --video_length 55 \
    --smoother_steps 19 20 \
    --width 512 \
    --height 512 \
    --frame_rate 2 \
    --version v11 \
    --is_long_video

where pose1.mp4 is center-cropped from your pose video.
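
In case it helps, here is one possible way to center-crop and resize a pose video to 512x512 before passing it to inference.py (a sketch using OpenCV, not code from this repo; the input/output paths are placeholders):

```python
import cv2

# Center-crop each frame to a square, resize to 512x512, and keep the source fps.
# "sample_pose.mp4" and "data/pose1.mp4" are placeholder paths.
cap = cv2.VideoCapture("sample_pose.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
writer = cv2.VideoWriter("data/pose1.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (512, 512))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    side = min(h, w)
    y0, x0 = (h - side) // 2, (w - side) // 2
    crop = frame[y0:y0 + side, x0:x0 + side]
    writer.write(cv2.resize(crop, (512, 512)))
cap.release()
writer.release()
```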

I haven't explored using higher versions of SD or ControlNet to enhance hands and faces, but I believe they could achieve this goal.

@p0mad
Author

p0mad commented Jul 18, 2023

@YBYBZhang, Thank you so much

I have five questions:

--condition "openpose"

  1. As it is ControlNet 1.1, do you think the output with "openpose_full" could lead to better results?
     To activate hand and face, as mentioned in the HF CN 1.1 docs, we need to set hand_and_face=True, but I couldn't find such an option in your repo.


--video_length 55 \

  2. I was wondering why you have chosen 55 as the video length?

The original pose:
Sample Video
image
shows a length of 1.8 s,

while the output video: yours(output)

image

shows 2.7 s, which is confusing to me!

Have you converted the input (pose) to 30 fps and center-cropped it?
Can you please send the cropped version of my pose?

Are we able to generate output at 24 or 30 fps instead of 20? (Is that what the --smoother_steps option does?)

but I believe that they could achieve this goal

  3. Can I use SD 2.1 with ControlNet 1.1 and the same OpenPose weights?
     Or does it need to be trained on SD 2.1? (As you mentioned, the default is SD 1.5 with CN 1.1.)


  4. Would it also be possible to input a random image (a desired character) as an initial character to SD + CN?
     Example image:
     image

  5. Is it possible to set the number of steps for each frame in the UNet?

Thanks Again
Best regard

@YBYBZhang
Owner

  1. "openpose" and "openpose_full" shares the same type of ControlNet. The given video is poses with hand and face landmarks, so I directly input it into ControlVideo.
  2. The video is length of about 110, so I choose 55 for efficient generation. The input and output fps are different, and you could set output fps in this line. You could directly crop the video in this website.
  3. Open-sourced ControlNet is trained based on SDv1.5. If you want to use SD v2 based ControlNet, you must retrain it.
  4. With both "shuffle ControlNet" and "openpose ControlNet", this goal might be achieved.
  5. Maybe possible, but there is no corresponding implementation as far as I know.
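
For points 1 and 2, here is a rough sketch of how body + hand + face poses could be extracted yourself (using controlnet_aux's OpenposeDetector with hand_and_face=True) and written out at a chosen fps. This is an illustration under those assumptions, not ControlVideo's actual pipeline; the paths and fps value are placeholders:

```python
import imageio
import numpy as np
from PIL import Image
from controlnet_aux import OpenposeDetector

# Extract body + hand + face keypoints from each source frame,
# then write the resulting pose video at the desired output fps.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
reader = imageio.get_reader("source_video.mp4")        # placeholder input path
writer = imageio.get_writer("pose_full.mp4", fps=24)   # pick the output fps here
for frame in reader:
    pose = detector(Image.fromarray(frame), hand_and_face=True)
    writer.append_data(np.array(pose))
writer.close()
```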
