
[NMS] - add segmentation models support #847

Merged · 24 commits · Feb 7, 2024

Conversation

AdonaiVera
Contributor

Description

This PR introduces a Non-Maximum Suppression (NMS) algorithm for segmentation, extending our object detection capabilities to segmentation tasks. We have renamed the traditional non_max_suppression function to box_non_max_suppression to make clear that it applies to bounding boxes. Furthermore, we've added a conditional mechanism to the with_nms function that uses segmentation masks for NMS whenever masks are present in the predictions. This leverages the spatial context provided by segmentation masks to improve the suppression process.
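A minimal sketch of the dispatch described above (the function names follow the PR, but the bodies here are illustrative stubs, not the actual supervision implementations):

```python
import numpy as np

def box_non_max_suppression(predictions: np.ndarray,
                            iou_threshold: float = 0.5) -> np.ndarray:
    # Stand-in for the renamed box-based NMS; returns a boolean keep mask.
    return np.ones(len(predictions), dtype=bool)

def mask_non_max_suppression(predictions: np.ndarray, masks: np.ndarray,
                             iou_threshold: float = 0.5) -> np.ndarray:
    # Stand-in for the new mask-based NMS introduced by this PR.
    return np.ones(len(predictions), dtype=bool)

def with_nms(predictions: np.ndarray, masks=None,
             iou_threshold: float = 0.5) -> np.ndarray:
    # The conditional described above: use mask NMS when segmentation
    # masks are present, otherwise fall back to box NMS.
    if masks is not None:
        return mask_non_max_suppression(predictions, masks, iou_threshold)
    return box_non_max_suppression(predictions, iou_threshold)
```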

This enhancement is part of a task segmented into two parts for more focused development and review. This PR addresses the first part, as discussed in the related issue here. Splitting the task ensures thorough implementation and testing of each component.

In addition, this PR includes comprehensive unit tests for the new NMS functionality and a demo showing the algorithm's application and effectiveness in real-world scenarios.

Type of change

  • New feature (non-breaking change which adds functionality)

How has this change been tested

I created a demo to showcase this functionality here

Docs

  • Docs updated? What were the changes:

Review threads (all resolved):

  • docs/detection/utils.md
  • supervision/__init__.py
  • supervision/detection/core.py (outdated)
  • supervision/detection/core.py (outdated)
@SkalskiP
Collaborator

SkalskiP commented Feb 6, 2024

Hi @AdonaiVera 👋🏻 ! I left a few comments on the code, but for the moment, these are secondary issues. The main problem I see is speed. Using your Colab example, I measured that it takes 13 seconds to calculate the mask_iou_batch for your image. We need to make it 1-2 orders of magnitude faster.

In addition, RAM consumption jumps from 4 GB to 12-13 GB just from computing the mask IoU.

A quick query to ChatGPT suggested the code below, but I didn't check whether the result is the same or whether the code is faster. I leave it here simply as an idea.

import numpy as np

def mask_iou_batch(masks_true: np.ndarray, masks_detection: np.ndarray) -> np.ndarray:
    """
    Compute Intersection over Union (IoU) of two sets of masks -
        `masks_true` and `masks_detection`.

    Args:
        masks_true (np.ndarray): 3D `np.ndarray` representing ground-truth masks.
        masks_detection (np.ndarray): 3D `np.ndarray` representing detection masks.

    Returns:
        np.ndarray: Pairwise IoU of masks from `masks_true` and `masks_detection`.
    """
    # Masks are assumed to be boolean already, so no cast is needed

    # Intersection via broadcasting: (N, 1, H, W) AND (M, H, W) -> (N, M)
    intersection_area = np.logical_and(
        masks_true[:, None], masks_detection
    ).sum(axis=(2, 3))
    masks_true_area = masks_true.sum(axis=(1, 2))[:, None]
    masks_detection_area = masks_detection.sum(axis=(1, 2))

    # Union area via the inclusion-exclusion principle
    union_area = masks_true_area + masks_detection_area - intersection_area

    # Avoid division by zero when computing IoU
    iou = np.divide(
        intersection_area,
        union_area,
        out=np.zeros_like(intersection_area, dtype=float),
        where=union_area != 0,
    )

    return iou
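As a quick sanity check of the inclusion-exclusion formula above, a tiny standalone example (not part of the PR):

```python
import numpy as np

# Two 4x4 boolean masks: one covers the left half, the other the top half.
a = np.zeros((1, 4, 4), dtype=bool)
a[0, :, :2] = True  # area 8
b = np.zeros((1, 4, 4), dtype=bool)
b[0, :2, :] = True  # area 8

intersection = np.logical_and(a[:, None], b).sum(axis=(2, 3))  # 4 overlapping pixels
union = a.sum(axis=(1, 2))[:, None] + b.sum(axis=(1, 2)) - intersection  # 8 + 8 - 4
iou = intersection / union
print(iou)  # [[0.33333333]]
```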

The second idea I have is to resize the masks. Maybe we don't need to compute IoU at full resolution. I'm curious: if we rescaled the input to mask_iou_batch so the masks have a maximum dimension of 640, would the IoU result differ much?

Update

I couldn't help myself and decided to see how the mask IoU computation speed would change if we rescaled and calculated IoU on smaller masks. For your image, the execution time drops from 13 seconds to 0.8 seconds; rescaling the masks alone takes 0.01 seconds.

Here is my vectorized rescaling code. I also included it in our test Colab.

def resize_masks(masks: np.ndarray, max_dimension: int = 640) -> np.ndarray:
    """
    Resize all masks in the array so their maximum dimension is max_dimension,
    maintaining aspect ratio.

    Args:
        masks (np.ndarray): 3D array of binary masks with shape (N, H, W).
        max_dimension (int): The maximum dimension for the resized masks.

    Returns:
        np.ndarray: Array of resized masks with shape (N, new_H, new_W).
    """
    height = masks.shape[1]
    width = masks.shape[2]
    scale = min(max_dimension / height, max_dimension / width)

    new_height = int(scale * height)
    new_width = int(scale * width)

    # Grid of source coordinates for the resized shape
    x = np.linspace(0, width - 1, new_width).astype(int)
    y = np.linspace(0, height - 1, new_height).astype(int)
    xv, yv = np.meshgrid(x, y)

    # Nearest-neighbour sampling applied to all masks at once; fancy
    # indexing already yields shape (N, new_height, new_width)
    return masks[:, yv, xv]
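A standalone usage sketch of the same nearest-neighbour trick (the mask shapes here are made up for illustration):

```python
import numpy as np

# Three binary masks at 960x1280, rescaled so the longest side is 640
masks = np.random.rand(3, 960, 1280) > 0.5
max_dimension = 640
scale = min(max_dimension / masks.shape[1], max_dimension / masks.shape[2])  # 0.5
new_h, new_w = int(scale * masks.shape[1]), int(scale * masks.shape[2])

y = np.linspace(0, masks.shape[1] - 1, new_h).astype(int)
x = np.linspace(0, masks.shape[2] - 1, new_w).astype(int)
xv, yv = np.meshgrid(x, y)
resized = masks[:, yv, xv]  # nearest-neighbour sampling for all masks at once
print(resized.shape)  # (3, 480, 640)
```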

Update 2

I also checked how the mask IoU values change when we scale the masks. The graph below shows the absolute difference between mask IoU calculated on full-size masks and on scaled masks. The maximum difference is 0.00588. In my opinion, mask scaling when calculating IoU is a must-have.

[graph: absolute difference between full-resolution and rescaled mask IoU]

@AdonaiVera
Contributor Author

Hi @SkalskiP 👋

I made the changes to the algorithm based on your feedback. I executed the test on my local machine and found that, without any of the improvements, the execution time for the mask_iou_batch function was 11 seconds.
mask_iou_batch execution time: 11.263449907302856 seconds

Then, I implemented your suggestion of using the inclusion-exclusion principle to improve the mask_iou_batch function. After this change, the execution time reduced from 11 seconds to almost 4 seconds.
mask_iou_batch execution time: 3.967336893081665 seconds

After that, I applied your resize function to the input images before applying the mask_iou_batch function, and this further decreased the execution time to almost 0.5 seconds.
mask_iou_batch execution time: 0.5139651298522949 seconds

Thank you for your suggestions; these changes have significantly improved the algorithm's speed. I also incorporated the other small changes you suggested and pushed them.

I think resizing the masks worked really well: from the graph you provided, the absolute difference between mask IoU calculated on full-size masks and on scaled masks is negligible, and the processing time improved a lot.

I executed all the unit tests and the algorithm is working fine 👍

P.S.: I left the timing print statements in to make further testing easier if needed. Thank you for your valuable feedback. 🚀

Thank you 🥷

@SkalskiP
Collaborator

SkalskiP commented Feb 7, 2024

Hi @AdonaiVera 👋🏻 ! I made a few small changes in the second part of mask_non_max_suppression, converting those for loops into a more vectorized solution.

# Greedy suppression: detections are pre-sorted by confidence, so each kept
# detection suppresses later, lower-confidence detections of the same class
# whose mask IoU with it exceeds the threshold.
keep = np.ones(rows, dtype=bool)
for i in range(rows):
    if keep[i]:
        condition = (ious[i] > iou_threshold) & (categories[i] == categories)
        keep[i + 1 :] = np.where(condition[i + 1 :], False, keep[i + 1 :])

# Undo the confidence sort so the keep mask matches the input order
return keep[sort_index.argsort()]
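For context, here is a self-contained sketch of how a suppression loop like the one above could sit inside a full mask_non_max_suppression. The argument layout (boolean masks of shape (N, H, W), per-detection confidences and categories) is an assumption for illustration, not the exact merged code:

```python
import numpy as np

def mask_non_max_suppression(confidences: np.ndarray, categories: np.ndarray,
                             masks: np.ndarray,
                             iou_threshold: float = 0.5) -> np.ndarray:
    """Greedy NMS on boolean masks of shape (N, H, W); returns a keep mask."""
    rows = len(confidences)
    sort_index = np.flip(confidences.argsort())  # highest confidence first
    masks = masks[sort_index]
    categories = categories[sort_index]

    # Pairwise mask IoU via the inclusion-exclusion principle
    intersection = np.logical_and(masks[:, None], masks).sum(axis=(2, 3))
    areas = masks.sum(axis=(1, 2))
    union = areas[:, None] + areas - intersection
    ious = np.divide(intersection, union,
                     out=np.zeros_like(intersection, dtype=float),
                     where=union != 0)

    keep = np.ones(rows, dtype=bool)
    for i in range(rows):
        if keep[i]:
            condition = (ious[i] > iou_threshold) & (categories[i] == categories)
            keep[i + 1:] = np.where(condition[i + 1:], False, keep[i + 1:])

    # Undo the confidence sort so the keep mask matches the input order
    return keep[sort_index.argsort()]
```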

I also added some additional unit tests.

In general, we could make it even faster, but taking it from 13 seconds to 0.4 seconds is still a good start. I'm merging! Thanks a lot for your help! 🔥

@SkalskiP SkalskiP merged commit a670faf into roboflow:develop Feb 7, 2024
8 checks passed
@AdonaiVera
Contributor Author

Amazing, thank you @SkalskiP 🚀
Now, I'm going to move on to the second part of the issue

- load the image
- slice the image into NxN tiles
- surround the smaller slices (the ones close to the edges) with a letterbox so that all tiles are NxN
- loop over the slices
   - run inference
- update box coordinates to match the image coordinate system, not the slice coordinate system
- pad masks to match the image coordinate system, not the slice coordinate system
- merge detections

I'll update a PR soon 💪
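The coordinate bookkeeping in the checklist above can be sketched roughly as follows (the 640-pixel tile size and the (x1, y1, x2, y2) box format are assumptions for illustration):

```python
import numpy as np

def tile_origins(image_h: int, image_w: int, tile: int = 640):
    """Top-left (x, y) origin of every NxN tile covering the image."""
    ys = range(0, image_h, tile)
    xs = range(0, image_w, tile)
    return [(x, y) for y in ys for x in xs]

def shift_boxes(boxes: np.ndarray, origin: tuple) -> np.ndarray:
    """Move (x1, y1, x2, y2) boxes from tile coordinates to image coordinates."""
    x0, y0 = origin
    return boxes + np.array([x0, y0, x0, y0])

origins = tile_origins(1080, 1920, tile=640)
print(len(origins))  # 6 tiles: 3 columns x 2 rows
boxes_in_tile = np.array([[10, 20, 110, 120]])
print(shift_boxes(boxes_in_tile, origins[4]))  # tile at (640, 640) -> [[650 660 750 760]]
```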

@SkalskiP
Collaborator

SkalskiP commented Feb 7, 2024

@AdonaiVera, thanks a lot! 🙏🏻 You know where to find me!

@AdonaiVera AdonaiVera deleted the segmentation_nms branch February 9, 2024 04:20