Issue while running the interactive demo + Video Editing Documentation #67

Open
sarthakg2002 opened this issue May 16, 2024 · 13 comments

@sarthakg2002 commented May 16, 2024

[Screenshot: Screenshot 2024-05-16 091226 — error from the interactive demo]
What am I doing wrong? I followed the installation guide.

I also want to achieve the insertion feature from the demo, where an image was added to the dance video. Could you point me to that part of the code? From scripting_demo_add_del_objects.py it's not clear where the video editing happens (it only deals with images, not video frames). Is there somewhere I could find code to generate similar results?

sarthakg2002 changed the title from "Issue while running the interactive demo" to "Issue while running the interactive demo + Video Editing Documentation" on May 16, 2024
@hkchengrex (Owner)

I would have to check the error message later. Is it possible that your workspace is corrupted (i.e., created but with no image present)? Try removing the entire workspace and starting again.

For the video editing demo, you can use the layered mode in the interactive demo. If you are running Cutie as a script, you would have to implement the layering yourself, but it should be fairly straightforward: the mask separates the foreground from the background, and the layers are rendered in the order background -> insertion layer -> foreground.
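
For reference, a minimal hand-written sketch of that compositing order (not code from this repo), assuming RGB images in [0, 1] of shape H x W x 3, a soft foreground mask obj_mask of shape H x W x 1, and a layer alpha channel of the same shape:

import torch

def composite_background_layer_foreground(image: torch.Tensor, obj_mask: torch.Tensor,
                                          layer_rgb: torch.Tensor, layer_alpha: torch.Tensor) -> torch.Tensor:
    # the original background is visible only where there is neither foreground nor layer
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    # composite in the order background -> insertion layer -> foreground
    out = (image * background_alpha +
           layer_rgb * (1 - obj_mask) * layer_alpha +
           image * obj_mask)
    return out.clip(0, 1)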

@sarthakg2002 (Author) commented May 16, 2024

Got it, thanks!

I was looking at the code and couldn't find where the mask of the image is computed; everywhere the mask is just loaded. Does this project assume the mask is provided?

If so, which model was used to get the masks for your dataset? I was thinking of using SAM for this purpose. Will that work?

@sarthakg2002 (Author)

I was following this notebook: https://colab.research.google.com/drive/1yo43XTbjxuWA7XgCUO9qxAi7wBI6HzvP?usp=sharing&authuser=1, but it doesn't do layering, so I took the initial setup from there and got the rest from main_controller.py. However, to use the overlay_layer_torch() function I needed the ResourceManager class. When I pass the config variable (cfg), it gives an error that there is no key "images" in cfg. How can I initialize it for "images" and the other keys (I'm guessing the "video" and "max_overall_size" keys will also give errors)?

@hkchengrex (Owner)

Hi, the first masks are always given in the VOS setting. You can indeed use SAM to create those masks.
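
If it helps, a rough sketch of getting a first-frame mask with the segment_anything package would look something like this (the checkpoint path, image path, and click point are placeholders):

import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# load a SAM checkpoint (model type and path are placeholders)
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)

# first frame of the video as an RGB uint8 array
frame = cv2.cvtColor(cv2.imread("first_frame.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(frame)

# a single positive click on the object of interest (placeholder coordinates)
masks, scores, _ = predictor.predict(point_coords=np.array([[500, 300]]),
                                     point_labels=np.array([1]),
                                     multimask_output=True)
first_mask = masks[np.argmax(scores)].astype(np.uint8)  # H x W, values in {0, 1}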

For editing, I think it's easier to copy the masking logic and create your own function.

Sorry that I'm quite busy these days and cannot provide an example for now.

@sarthakg2002 (Author) commented May 24, 2024

Hey, can you please let me know how to initialize the ResourceManager class? I'm really having a hard time figuring that part out.

@hkchengrex (Owner)

> However, to use the overlay_layer_torch() function I needed the ResourceManager class

from typing import List

import numpy as np
import torch


def overlay_layer_torch(image: torch.Tensor, prob: torch.Tensor, layer: torch.Tensor,
                        target_objects: List[int]):
    # insert a layer between foreground and background
    # The CPU version is less accurate because we are using the hard mask
    # The GPU version has softer edges as it uses soft probabilities
    image = image.permute(1, 2, 0)
    if len(target_objects) == 0:
        obj_mask = torch.zeros_like(prob[0]).unsqueeze(2)
    else:
        # TODO: figure out why we need to convert this to numpy array
        obj_mask = prob[np.array(target_objects, dtype=np.int32)].sum(0).unsqueeze(2)
    layer_alpha = layer[:, :, 3].unsqueeze(2)
    layer_rgb = layer[:, :, :3]
    # background_alpha = torch.maximum(obj_mask, layer_alpha)
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    im_overlay = (image * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha +
                  image * obj_mask).clip(0, 1)
    im_overlay = (im_overlay * 255).byte().cpu().numpy()
    return im_overlay

I don't think it is needed.
In any case, the logic is quite straightforward with just 10 lines of code. I don't think you would need to go through the internal logic in the controller (which is designed for the GUI).

@sarthakg2002 (Author)

How do I get the variables prob and target_objects? And is layer just the image to be inserted between the foreground and background, converted to a torch tensor?

@hkchengrex (Owner)

prob is our prediction before argmax. target_objects is a list of the object ids that should be used in masking. And yes, layer is just the inserted image converted to a torch tensor.
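
In scripting terms, following the pattern of the scripting demos and the Colab notebook above, something like this (file names are placeholders and the import paths are illustrative, so double-check them against the repo):

import numpy as np
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor
from cutie.inference.inference_core import InferenceCore
from cutie.utils.get_default_model import get_default_model

cutie = get_default_model()
processor = InferenceCore(cutie, cfg=cutie.cfg)

# placeholder inputs: first video frame and its first-frame mask (e.g. from SAM)
frame_torch = to_tensor(Image.open("frame_0001.jpg")).cuda().float()         # 3 x H x W in [0, 1]
mask_torch = torch.from_numpy(np.array(Image.open("mask_0001.png"))).cuda()  # H x W of object ids

target_objects = torch.unique(mask_torch)
target_objects = target_objects[target_objects != 0].tolist()

# soft probabilities before argmax; on later frames, call processor.step(frame_torch)
# without a mask and Cutie propagates it
prob = processor.step(frame_torch, mask_torch, objects=target_objects)
# prob and target_objects can then go straight into overlay_layer_torch above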

@sarthakg2002 (Author)

But how do I get those values? For example, if I have two torch images, img and overlay, which I got using imread and then image_to_torch, how do I get the values for those variables?

@sarthakg2002 (Author) commented Jun 4, 2024

To be more specific, I'm trying to add an object (an image) into a video at a specific coordinate tracked using pose estimation. Here is my current code for handling a single frame:

import torch
import numpy as np


def image_to_torch(frame: np.ndarray):
    device = 'cuda'
    frame = frame.transpose((2, 0, 1))
    frame = torch.from_numpy(frame).float().to(device, non_blocking=True) / 255
    return frame


def overlay_image_alpha(img, img_overlay, x, y, alpha_mask):
    y1, y2 = max(0, y), min(img.shape[0], y + img_overlay.shape[0])
    x1, x2 = max(0, x), min(img.shape[1], x + img_overlay.shape[1])

    y1o, y2o = max(0, -y), min(img_overlay.shape[0], img.shape[0] - y)
    x1o, x2o = max(0, -x), min(img_overlay.shape[1], img.shape[1] - x)

    if y1 >= y2 or x1 >= x2 or y1o >= y2o or x1o >= x2o:
        return img

    overlay_slice = img_overlay[y1o:y2o, x1o:x2o, :]
    mask_slice = alpha_mask[y1o:y2o, x1o:x2o]

    img_slice = img[y1:y2, x1:x2, :]

    alpha = mask_slice[..., None] / 255.0
    img[y1:y2, x1:x2, :] = (1.0 - alpha) * img_slice + alpha * overlay_slice[..., :3]

    return img


def overlay_image(img, img_overlay, x, y, alpha_mask):
    white_background = np.ones_like(img) * 255
    img_with_overlay = overlay_image_alpha(white_background, img_overlay, x, y, alpha_mask)
    img_with_overlay = image_to_torch(img_with_overlay).permute(1, 2, 0)
    # obj_mask = torch.zeros_like(torch.tensor(1, dtype=torch.int8)).unsqueeze(2)
    layer_alpha = img_with_overlay[:, :, 3].unsqueeze(2)
    layer_rgb = img_with_overlay[:, :, :3]
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    img = image_to_torch(img)
    img_final = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)
    img_final = (img_final * 255).byte().cpu().numpy()
    return img_final

Not sure how to get obj_mask.

@hkchengrex (Owner)

Where are you using Cutie? The mask comes from there.

@sarthakg2002 (Author)

Here is the updated code. I had to change the layer_alpha line to use index 2 instead of 3, but I'm getting an error that the tensor sizes should match:

def overlay_image(img, img_overlay, x, y, alpha_mask):
    white_background = np.ones_like(img) * 255
    img_with_overlay = overlay_image_alpha(white_background, img_overlay, x, y, alpha_mask)
    img_with_overlay = image_to_torch(img_with_overlay).permute(1, 2, 0)

    cutie = get_default_model()
    processor = InferenceCore(cutie, cfg=cutie.cfg)
    pil_image = img[:, :, ::-1]
    pil_image = Image.fromarray(pil_image)
    palette = [(0, 0, 0), (255, 255, 255)]
    indexed_image = pil_image.convert('P', palette=palette)
    mask = indexed_image.point(lambda p: 0 if p == 0 else 1)
    objects = np.unique(np.array(mask))
    objects = objects[objects != 0].tolist()
    mask = torch.from_numpy(np.array(mask)).cuda()
    image = to_tensor(pil_image).cuda().float()
    prob = processor.step(image, mask, objects=objects)

    obj_mask = prob[np.array(objects, dtype=np.int32)].sum(0).unsqueeze(2)
    layer_alpha = img_with_overlay[:, :, 2].unsqueeze(2)
    layer_rgb = img_with_overlay[:, :, :3]
    background_alpha = (1 - obj_mask) * (1 - layer_alpha)
    img = image_to_torch(img).permute(2, 0, 1)
    img_overlay = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)

    img_overlay = (img_overlay * 255).byte().cpu().numpy()
    return img_overlay

Error:

img_overlay = (img * background_alpha + layer_rgb * (1 - obj_mask) * layer_alpha + img * obj_mask).clip(0, 1)
                    ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (1280) must match the size of tensor b (720) at non-singleton dimension 2

@hkchengrex (Owner)

It would not work if you change the index from 3 to 2. You need a transparent (RGBA) PNG image as the layer image, so that channel 3 is the alpha channel.
Also, your layer image might not have the same dimensions as the input; you would need to resize/pad it.
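
Something along these lines should work for preparing the layer (a hand-written sketch; the file name and frame size are placeholders):

import cv2
import torch

# load the layer as BGRA; IMREAD_UNCHANGED keeps the alpha channel of the PNG
layer_bgra = cv2.imread("layer.png", cv2.IMREAD_UNCHANGED)
assert layer_bgra.shape[2] == 4, "the layer image must be a transparent (RGBA) PNG"

# resize to the video frame size (cv2.resize takes (width, height))
frame_height, frame_width = 720, 1280  # placeholder: use your video's dimensions
layer_bgra = cv2.resize(layer_bgra, (frame_width, frame_height))

# BGR -> RGB (keeping alpha), then H x W x 4 float tensor in [0, 1]
layer_rgba = cv2.cvtColor(layer_bgra, cv2.COLOR_BGRA2RGBA)
layer = torch.from_numpy(layer_rgba).float().cuda() / 255  # layer[:, :, 3] is the alpha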
