[InferenceSlicer] - add segmentation models support #678
Comments
Hi @SkalskiP 👋 I've been working on integrating advanced mask handling capabilities into the `InferenceSlicer` class, and I've hit a snag while trying to merge detection objects with variable-sized masks in `Detections.merge`. I'm considering a few approaches to address this, such as resizing, padding, or storing masks individually, and I'd appreciate your thoughts on the best path forward. Should we prioritize memory efficiency or ease of implementation? Thank you for your guidance. 👋
Hi @AdonaiVera 👋🏻 We expect the exact dimensions of masks (width and height) to be the same as the source image. This is our approach for now; we are, of course, aware of potential memory and speed optimizations. If your masks are variable-sized, you should pad them so they are all equally sized.
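A minimal sketch of the padding suggested above, assuming plain NumPy boolean masks (the helper name `pad_masks_to_common_size` is illustrative, not part of supervision's API):

```python
import numpy as np

def pad_masks_to_common_size(masks: list[np.ndarray]) -> np.ndarray:
    # Find the largest height and width across all masks.
    max_h = max(m.shape[0] for m in masks)
    max_w = max(m.shape[1] for m in masks)
    # Pad each mask at the bottom/right with False so shapes match,
    # then stack into a single (N, max_h, max_w) array.
    return np.stack([
        np.pad(m, ((0, max_h - m.shape[0]), (0, max_w - m.shape[1])))
        for m in masks
    ])
```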
Yes @SkalskiP, it's true that masks match the dimensions of the slice's image. However, the `InferenceSlicer` creates slices of variable sizes, particularly at image boundaries, depending on `slice_wh` and the actual dimensions of the image, leading to masks that don't all share the same dimensions. This variability creates a challenge when we try to merge these masks in `Detections.merge`, because `np.vstack` requires uniform dimensions:

```python
def stack_or_none(name: str):
    if all(d.__getattribute__(name) is None for d in detections_list):
        return None
    if any(d.__getattribute__(name) is None for d in detections_list):
        raise ValueError("All or none of the '{}' fields must be None".format(name))
    return (
        np.vstack([d.__getattribute__(name) for d in detections_list])
        if name == "mask"
        else np.hstack([d.__getattribute__(name) for d in detections_list])
    )

mask = stack_or_none("mask")
```

I am considering two potential solutions. The first approach is to scale masks to match the largest dimensions and then resize them as needed (as you suggested, but we would need to add two resizing steps). The second approach is to store the masks in a different format that can accommodate variable sizes. I would appreciate your thoughts on these approaches or any other suggestions you might have. I want to make sure that I fully understand the problem before creating the PR. Thank you 🚀
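For reference, the first approach could look roughly like this, assuming OpenCV is available (a sketch only; `resize_masks` is a hypothetical helper, not existing supervision code):

```python
import cv2
import numpy as np

def resize_masks(masks: list[np.ndarray], size_wh: tuple[int, int]) -> np.ndarray:
    # Resize every boolean mask to a common (width, height) using
    # nearest-neighbor interpolation, which keeps the mask binary.
    w, h = size_wh
    resized = [
        cv2.resize(m.astype(np.uint8), (w, h), interpolation=cv2.INTER_NEAREST)
        for m in masks
    ]
    return np.stack(resized).astype(bool)
```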
@AdonaiVera oooooh! Now I understand what you mean. I did not foresee this complexity when dissecting this task. I'm sorry; it is quite obvious in hindsight. 🤦🏻‍♂️ First of all, I'm happy to limit the scope of this task to just the segmentation NMS. That would still be a big win for supervision, and we could work on the segmentation slicing logic separately. Let me know what you think. Even if you opt to continue, we should split the work into two PRs, one for the segmentation NMS and one for the slicing logic. Please open it if you can. 🙏🏻 As for the potential solution, here is what I think: some time ago, one of the users opened this issue. There was no follow-up, so we closed it, but I think a particular group of models expects images in specific shapes, for example,
@AdonaiVera let me know what you decided 🙏
Hi @SkalskiP 👋 I like the idea of splitting the task into two different PRs; it will help us organize the work better. The first PR will focus on the segmentation NMS, and the second one will focus on the segmentation slicing. I will work on the NMS feature first, and once I finish it, I will open the PR. Regarding your solution, I like it a lot, as it would solve the issue of different slice sizes. However, the user then cannot pass the slice size as an input; it has to be inferred from the size of the image to guarantee the same size for every slice. I can test this idea and see how it works. 💪
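One way to guarantee equal slice sizes is to clamp the last slice's start offset to the image edge so it overlaps its neighbor instead of shrinking. A sketch of that idea, assuming the slice dimension does not exceed the image dimension (`equal_size_offsets` is a hypothetical helper):

```python
def equal_size_offsets(image_dim: int, slice_dim: int, overlap: float = 0.2) -> list[int]:
    # Start offsets along one axis such that every slice spans exactly
    # slice_dim pixels: the final offset is clamped to the image edge,
    # so boundary slices overlap their neighbors rather than shrink.
    stride = max(1, int(slice_dim * (1 - overlap)))
    offsets = list(range(0, max(image_dim - slice_dim, 0) + 1, stride))
    if offsets[-1] + slice_dim < image_dim:
        offsets.append(image_dim - slice_dim)
    return offsets
```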
@AdonaiVera I'm waiting for the segmentation NMS PR! 🙏🏻
Hi @SkalskiP 👋
First, I integrated a conditional to pad the images that were smaller than the slice size (corners). Then, I created the function `_apply_padding_to_slice` to apply padding to a slice using the letterbox resizing method. However, aligning mask data to the full image is more complex than for bounding boxes, because masks cover every pixel. Here's what I'm thinking: another option is to use sparse mask storage. The idea is that while processing each slice, we store the mask in a "sparse" format, meaning we only keep track of the parts of the mask that aren't empty. This saves a lot of memory. Later, when we merge everything, we convert each sparse mask into a regular one that covers the entire image. I would like to hear your thoughts. 🥷 UPDATE:
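A rough sketch of that sparse idea, assuming each slice keeps only its local mask plus the slice's offset in the full image, and masks are scattered onto a full-size canvas only at merge time (all names here are illustrative, not actual supervision code):

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class SliceMask:
    mask: np.ndarray             # (h, w) boolean mask, local to the slice
    offset_xy: tuple[int, int]   # top-left corner of the slice in the full image

def to_full_image(slice_mask: SliceMask, image_wh: tuple[int, int]) -> np.ndarray:
    # Scatter the slice-local mask into a full-image canvas. Only the
    # small per-slice masks are stored; full-size masks are materialized
    # just once, during merging.
    image_w, image_h = image_wh
    canvas = np.zeros((image_h, image_w), dtype=bool)
    x, y = slice_mask.offset_xy
    h, w = slice_mask.mask.shape
    canvas[y : y + h, x : x + w] = slice_mask.mask
    return canvas
```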
released via |
Description
Currently, `sv.InferenceSlicer` supports only object detection models. Adding support for instance segmentation would require the following changes:

- `sv.InferenceSlicer` uses Non-Max Suppression (NMS) to sift out duplicate detections at the tile intersections. At the moment, Supervision only has a box-based NMS. A segmentation-based NMS would be almost identical; the only change would be to replace `box_iou_batch` with a new `mask_iou_batch`.
- `sv.InferenceSlicer` would then have to check whether the detections have masks and, if so, use the new NMS.

API
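A minimal sketch of what `mask_iou_batch` could look like, assuming boolean masks of identical spatial size and a pairwise output mirroring `box_iou_batch` (this implementation is my assumption, not the code that eventually shipped):

```python
import numpy as np

def mask_iou_batch(masks_true: np.ndarray, masks_detection: np.ndarray) -> np.ndarray:
    # Pairwise IoU between two batches of boolean masks.
    # masks_true: (N, H, W), masks_detection: (M, H, W) -> (N, M)
    a = masks_true.reshape(masks_true.shape[0], -1).astype(np.float32)
    b = masks_detection.reshape(masks_detection.shape[0], -1).astype(np.float32)
    intersection = a @ b.T
    union = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :] - intersection
    return intersection / np.maximum(union, 1e-9)
```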
Usage example
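The original example did not survive here; what follows is a hedged sketch of typical `InferenceSlicer` usage with a segmentation model, based on supervision's documented callback pattern (the model choice and file names are assumptions):

```python
import cv2
import numpy as np
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")  # assumption: any instance segmentation model

def callback(image_slice: np.ndarray) -> sv.Detections:
    # Run the model on a single slice and convert to sv.Detections.
    result = model(image_slice)[0]
    return sv.Detections.from_ultralytics(result)

slicer = sv.InferenceSlicer(callback=callback)

image = cv2.imread("source.png")  # hypothetical input image
detections = slicer(image)
# detections.mask would carry the merged full-image masks
```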
Additional