Issue while running the interactive demo + Video Editing Documentation #67

Open
sarthakg2002 opened this issue May 16, 2024 · 13 comments

@sarthakg2002 commented May 16, 2024

[Screenshot: Screenshot 2024-05-16 091226 — error from the interactive demo]
What am I doing wrong? I followed the installation guide.

I also want to achieve the insertion feature from the demo, where an image was added to the dance video. Could you point me to that part of the code? From scripting_demo_add_del_objects.py it's not clear where the video editing happens (it only deals with images, not video frames). Is there somewhere I could find code to generate similar results?

sarthakg2002 changed the title from "Issue while running the interactive demo" to "Issue while running the interactive demo + Video Editing Documentation" on May 16, 2024
@hkchengrex (Owner)

I would have to check the error message later. Is it possible that your workspace is corrupted (i.e., created but with no image present)? Try removing the entire workspace and starting again.

For the video editing demo, you can use the layered mode in the interactive demo. If you are running Cutie as a script, you would have to implement the layering yourself, but it should be fairly straightforward: the mask separates the foreground from the background, and the layers are rendered in the order background -> insertion layer -> foreground.
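
For reference, a minimal hand-written sketch of that compositing order (not code from this repo), assuming RGB images in [0, 1] of shape H x W x 3, a soft foreground mask obj_mask of shape H x W x 1, and a layer alpha channel of the same shape:

import torch

def composite_background_layer_foreground(image: torch.Tensor, obj_mask: torch.Tensor,
                                          layer_rgb: torch.Tensor, layer_alpha: torch.Tensor) -> torch.Tensor:
    # the original background is visible only where there is neither foreground nor layer
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    # composite in the order background -> insertion layer -> foreground
    out = (image * background_alpha +
           layer_rgb * (1 - obj_mask) * layer_alpha +
           image * obj_mask)
    return out.clip(0, 1)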

@sarthakg2002 (Author) commented May 16, 2024

Got it, thanks!

I was looking at the code and couldn't find where the mask of the image is computed; everywhere the mask is just loaded. Does this project assume the mask is provided?

If so, which model was used to get the masks for your dataset? I was thinking of using SAM for this purpose. Will that work?

@sarthakg2002 (Author)

I was following this notebook: https://colab.research.google.com/drive/1yo43XTbjxuWA7XgCUO9qxAi7wBI6HzvP?usp=sharing&authuser=1, but it doesn't do layering, so I took the initial setup from there and got the rest from main_controller.py. However, to use the overlay_layer_torch() function I needed the ResourceManager class. When I pass the config variable (cfg), it gives an error that there is no key "images" in cfg. How can I initialize it for "images" and the other keys (I'm guessing the "video" and "max_overall_size" keys will also give errors)?

@hkchengrex (Owner)

Hi, the first masks are always given in the VOS setting. You can indeed use SAM to create those masks.
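
If it helps, a rough sketch of getting a first-frame mask with the segment_anything package would look something like this (the checkpoint path, image path, and click point are placeholders):

import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# load a SAM checkpoint (model type and path are placeholders)
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)

# first frame of the video as an RGB uint8 array
frame = cv2.cvtColor(cv2.imread("first_frame.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(frame)

# a single positive click on the object of interest (placeholder coordinates)
masks, scores, _ = predictor.predict(point_coords=np.array([[500, 300]]),
                                     point_labels=np.array([1]),
                                     multimask_output=True)
first_mask = masks[np.argmax(scores)].astype(np.uint8)  # H x W, values in {0, 1}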

For editing, I think it's easier to copy the masking logic and create your own function.

Sorry that I'm quite busy these days and cannot provide an example for now.

@sarthakg2002 (Author) commented May 24, 2024

Hey, can you please let me know how to initialize the ResourceManager class? I'm really having a hard time figuring that part out.

@hkchengrex (Owner)

> However, to use the overlay_layer_torch() function I needed the ResourceManager class

from typing import List

import numpy as np
import torch


def overlay_layer_torch(image: torch.Tensor, prob: torch.Tensor, layer: torch.Tensor,
                        target_objects: List[int]):
    # insert a layer between foreground and background
    # The CPU version is less accurate because we are using the hard mask
    # The GPU version has softer edges as it uses soft probabilities
    image = image.permute(1, 2, 0)
    if len(target_objects) == 0:
        obj_mask = torch.zeros_like(prob[0]).unsqueeze(2)
    else:
        # TODO: figure out why we need to convert this to numpy array
        obj_mask = prob[np.array(target_objects, dtype=np.int32)].sum(0).unsqueeze(2)
    layer_alpha = layer[:, :, 3].unsqueeze(2)
    layer_rgb = layer[:, :, :3]
    # background_alpha = torch.maximum(obj_mask, layer_alpha)
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    im_overlay = (image * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha +
                  image * obj_mask).clip(0, 1)
    im_overlay = (im_overlay * 255).byte().cpu().numpy()
    return im_overlay

I don't think it is needed.
In any case, the logic is quite straightforward with just 10 lines of code. I don't think you would need to go through the internal logic in the controller (which is designed for the GUI).

@sarthakg2002 (Author)

How do I get the variables prob and target_objects? And is layer just the image to be inserted between the foreground and background, converted to a torch tensor?

@hkchengrex (Owner)

prob is our prediction before argmax. target_objects is a list of the object ids that should be used in masking. And yes, layer is just the inserted image converted to a torch tensor.
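
In scripting terms, following the pattern of the scripting demos and the Colab notebook above, something like this (file names are placeholders and the import paths are illustrative, so double-check them against the repo):

import numpy as np
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor
from cutie.inference.inference_core import InferenceCore
from cutie.utils.get_default_model import get_default_model

cutie = get_default_model()
processor = InferenceCore(cutie, cfg=cutie.cfg)

# placeholder inputs: first video frame and its first-frame mask (e.g. from SAM)
frame_torch = to_tensor(Image.open("frame_0001.jpg")).cuda().float()         # 3 x H x W in [0, 1]
mask_torch = torch.from_numpy(np.array(Image.open("mask_0001.png"))).cuda()  # H x W of object ids

target_objects = torch.unique(mask_torch)
target_objects = target_objects[target_objects != 0].tolist()

# soft probabilities before argmax; on later frames, call processor.step(frame_torch)
# without a mask and Cutie propagates it
prob = processor.step(frame_torch, mask_torch, objects=target_objects)
# prob and target_objects can then go straight into overlay_layer_torch above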

@sarthakg2002 (Author)

But how do I get those values? For example, if I have two torch images, img and overlay, which I got using imread and then image_to_torch, how do I get the values for those variables?

@sarthakg2002 (Author) commented Jun 4, 2024

To be more specific, I'm trying to add an object (an image) into a video at a specific coordinate tracked using pose estimation. Here is my current code for handling a single frame:

import torch
import numpy as np


def image_to_torch(frame: np.ndarray):
    device = 'cuda'
    frame = frame.transpose((2, 0, 1))
    frame = torch.from_numpy(frame).float().to(device, non_blocking=True) / 255
    return frame


def overlay_image_alpha(img, img_overlay, x, y, alpha_mask):
    y1, y2 = max(0, y), min(img.shape[0], y + img_overlay.shape[0])
    x1, x2 = max(0, x), min(img.shape[1], x + img_overlay.shape[1])

    y1o, y2o = max(0, -y), min(img_overlay.shape[0], img.shape[0] - y)
    x1o, x2o = max(0, -x), min(img_overlay.shape[1], img.shape[1] - x)

    if y1 >= y2 or x1 >= x2 or y1o >= y2o or x1o >= x2o:
        return img

    overlay_slice = img_overlay[y1o:y2o, x1o:x2o, :]
    mask_slice = alpha_mask[y1o:y2o, x1o:x2o]

    img_slice = img[y1:y2, x1:x2, :]

    alpha = mask_slice[..., None] / 255.0
    img[y1:y2, x1:x2, :] = (1.0 - alpha) * img_slice + alpha * overlay_slice[..., :3]

    return img


def overlay_image(img, img_overlay, x, y, alpha_mask):
    white_background = np.ones_like(img) * 255
    img_with_overlay = overlay_image_alpha(white_background, img_overlay, x, y, alpha_mask)
    img_with_overlay = image_to_torch(img_with_overlay).permute(1, 2, 0)
    # obj_mask = torch.zeros_like(torch.tensor(1, dtype=torch.int8)).unsqueeze(2)
    layer_alpha = img_with_overlay[:, :, 3].unsqueeze(2)
    layer_rgb = img_with_overlay[:, :, :3]
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    img = image_to_torch(img)
    img_final = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)
    img_final = (img_final * 255).byte().cpu().numpy()
    return img_final

Not sure how to get obj_mask.

@hkchengrex (Owner)

Where are you using Cutie? The mask comes from there.

@sarthakg2002 (Author)

Here is the updated code. I had to change the layer_alpha line to use index 2 instead of 3, but I'm getting an error that the tensor sizes should match:

def overlay_image(img, img_overlay, x, y, alpha_mask):
    white_background = np.ones_like(img) * 255
    img_with_overlay = overlay_image_alpha(white_background, img_overlay, x, y, alpha_mask)
    img_with_overlay = image_to_torch(img_with_overlay).permute(1, 2, 0)

    cutie = get_default_model()
    processor = InferenceCore(cutie, cfg=cutie.cfg)
    pil_image = img[:, :, ::-1]
    pil_image = Image.fromarray(pil_image)
    palette = [(0, 0, 0), (255, 255, 255)]
    indexed_image = pil_image.convert('P', palette=palette)
    mask = indexed_image.point(lambda p: 0 if p == 0 else 1)
    objects = np.unique(np.array(mask))
    objects = objects[objects != 0].tolist()
    mask = torch.from_numpy(np.array(mask)).cuda()
    image = to_tensor(pil_image).cuda().float()
    prob = processor.step(image, mask, objects=objects)

    obj_mask = prob[np.array(objects, dtype=np.int32)].sum(0).unsqueeze(2)
    layer_alpha = img_with_overlay[:, :, 2].unsqueeze(2)
    layer_rgb = img_with_overlay[:, :, :3]
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    img = image_to_torch(img).permute(2, 0, 1)
    img_overlay = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)

    img_overlay = (img_overlay * 255).byte().cpu().numpy()
    return img_overlay

Error:

img_overlay = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)
                    ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (1280) must match the size of tensor b (720) at non-singleton dimension 2

@hkchengrex (Owner)

It would not work if you change the index from 3 to 2. You need a transparent (RGBA) PNG image as the layer image, so that channel 3 is the alpha channel.
Also, your layer image might not have the same dimensions as the input; you would need to resize/pad it.
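
Something along these lines should work for preparing the layer (a hand-written sketch; the file name and frame size are placeholders):

import cv2
import torch

# load the layer as BGRA; IMREAD_UNCHANGED keeps the alpha channel of the PNG
layer_bgra = cv2.imread("layer.png", cv2.IMREAD_UNCHANGED)
assert layer_bgra.shape[2] == 4, "the layer image must be a transparent (RGBA) PNG"

# resize to the video frame size (cv2.resize takes (width, height))
frame_height, frame_width = 720, 1280  # placeholder: use your video's dimensions
layer_bgra = cv2.resize(layer_bgra, (frame_width, frame_height))

# BGR -> RGB (keeping alpha), then H x W x 4 float tensor in [0, 1]
layer_rgba = cv2.cvtColor(layer_bgra, cv2.COLOR_BGRA2RGBA)
layer = torch.from_numpy(layer_rgba).float().cuda() / 255  # layer[:, :, 3] is the alpha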
