Is MOTIP a class-agnostic model for tracking-by-detection task? #36

max-unfinity · 2024-12-16T20:37:38Z

Hi, thank you for the amazing work!

Is MOTIP a class-agnostic model for the tracking-by-detection task?
For example, if I have a Deformable DETR checkpoint trained on a marine dataset, can I use your pre-trained MOTIP models without any re-training or fine-tuning? Specifically, is the SeqDecoder module class-agnostic?

HELLORPG · 2024-12-22T04:44:06Z

I'm sorry for the delay in my responses. I'll be in the hospital for a while (maybe a month or more), so my replies might be slower than usual. Thanks for your understanding and patience.

Thanks for your interest in our work.
Currently, MOTIP is not a class-agnostic model, because the SeqDecoder and DETR are trained jointly. The weights (checkpoints) of these two parts therefore need to be used together. Specifically, the SeqDecoder consumes the object features from DETR (each query's output embedding), so the two must stay consistent.
However, this pipeline (ID prediction) can easily be extended to a class-agnostic method. One feasible approach is to decouple the detector (DETR) from the SeqDecoder. The detector would provide only the bounding boxes (currently, we also use its output embeddings), and the SeqDecoder would independently extract the corresponding object features from the raw images based on those boxes. The two parts would then be connected solely through the detector's output boxes, with no need for consistent training and usage.
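As a rough illustration of that decoupling, the sketch below shows an association-side feature extractor that sees only the raw image and the detector's boxes. Everything in it is an assumption for illustration: the nested-list grayscale image, the `crop` and `extract_object_features` helpers, and the hypothetical `toy_embed` function, which stands in for a trained network. None of it is code from this repo.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]        # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels, x2/y2 exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the box region out of the image, clamping the box to the image bounds."""
    height, width = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


def toy_embed(patch: Image) -> List[float]:
    """Hypothetical embedding: [mean intensity, pixel count].

    A real decoupled extractor would be a trained network applied to the crop.
    """
    pixels = [p for row in patch for p in row]
    if not pixels:
        return [0.0, 0.0]
    return [sum(pixels) / len(pixels), float(len(pixels))]


def extract_object_features(
    image: Image,
    boxes: Sequence[Box],
    embed: Callable[[Image], List[float]] = toy_embed,
) -> List[List[float]]:
    """Compute one feature per box using only the image and the boxes.

    No detector internals (such as query embeddings) are needed, so any
    detector that outputs boxes could feed this function.
    """
    return [embed(crop(image, box)) for box in boxes]


# Usage: boxes could come from any detector (DETR, YOLO, ...).
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [2.0, 2.0, 3.0, 3.0],
    [2.0, 2.0, 3.0, 3.0],
]
detector_boxes = [(0, 0, 2, 2), (2, 2, 4, 4)]
features = extract_object_features(image, detector_boxes)
```

The point of the interface is that `extract_object_features` never touches the detector, so swapping the detector does not invalidate the association module's weights.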
🤗 Currently, our design aims to minimize the engineering details and complex designs that need to be considered during exploration, so it may not be the most suitable for practical applications. However, I believe we DO demonstrate that the proposed method has significant potential. I am more than willing to help transition this work to a wide range of application scenarios or assist in future research.

MattLiutt · 2025-01-03T06:27:40Z

Thanks for the excellent repo! So do you mean that we can separate the detector and SeqDecoder during inference? For instance, if I have a general-purpose detector (DETR or YOLO series), can I then use it like SORT and similar tracking-by-detection methods? Or do we need to train the detector and SeqDecoder jointly and then run inference? Please correct me if I've misunderstood anything. Thanks a lot!

HELLORPG · 2025-01-07T04:34:44Z

> So do you mean that we can separate the detector and SeqDecoder during inference? For instance, if I have a general-purpose detector (DETR or YOLO series), can I then use it like SORT and similar tracking-by-detection methods?

The codebase in this repo does not support this feature. What I meant in my reply above is that our idea (ID prediction for target association) can be extended to an association-only model (similar to ReID methods) rather than a joint detection-and-association model (as we did).

If you want an association-only tracking-by-detection model, you will need to write your own code based on ours. Once feature extraction in the SeqDecoder is decoupled from the detector, the SeqDecoder can be combined with any trained detector to form the tracking-by-detection method you mentioned (not tied to any specific detector). You can refer to work like MASA/PuTR for inspiration, where a decoupled feature extractor is trained.

> Or do we need to train the detector and SeqDecoder jointly and then run inference?

For this repo, YES, because the detector (Deformable DETR) also serves as the feature extractor for the SeqDecoder. The detector and SeqDecoder are therefore coupled: they are trained together and run together at inference.

I hope this clarifies your concerns. Please let me know if you need additional details.

MattLiutt · 2025-01-07T08:28:02Z

Thanks for the prompt response! Appreciated!

> For this repo, YES, because the detector (Deformable DETR) also serves as the feature extractor for the SeqDecoder. The detector and SeqDecoder are therefore coupled: they are trained together and run together at inference.

Just one last question: this repo uses Deformable DETR as both the detector and the feature extractor for the SeqDecoder. Is it feasible to replace it with another transformer?

Thanks so much!

HELLORPG · 2025-01-09T09:58:51Z

> Is it feasible to replace it with another transformer?

Yes, of course. In addition to the default MOTIP-Deformable-DETR, this repo also provides MOTIP-DAB-Deformable-DETR, as reported in the DanceTrack results.

Specifically, you can refer to the following code to use your own transformer detector (self.detr) and the corresponding criterion function (self.detr_criterion):

MOTIP/models/motip.py

Lines 91 to 100 in 1dda4c4

    
    if self.detr_framework == "Deformable-DETR":
        # DETR model and criterion:
        self.detr, self.detr_criterion, _ = build_deformable_detr(detr_args)
    elif self.detr_framework == "DAB-Deformable-DETR":
        detr_args.num_patterns = 0
        detr_args.random_refpoints_xy = False
        self.detr, self.detr_criterion, _ = build_dab_deformable_detr(detr_args)
        # TODO: We will upload the DAB-DETR code soon.
    else:
        raise RuntimeError(f"Unknown DETR framework: {self.detr_framework}.")

Additionally, the return values of the DETR detector need to be modified so that it also returns the target features (output embeddings):

MOTIP/models/deformable_detr/deformable_detr.py

Lines 191 to 194 in 1dda4c4

    
    # Output the outputs of last decoder layer.
    # We need these outputs to generate the embeddings for objects.
    out["outputs"] = hs[-1]
    return out
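When adapting a different detector, a quick sanity check on its output dictionary can catch a missing embedding early. The sketch below is only an illustration: `attach_last_layer_embeddings` and `check_detr_output` are hypothetical helpers, and nested lists stand in for tensors. The `"outputs"` key comes from the snippet above, and `"pred_logits"` and `"pred_boxes"` are the usual DETR output keys.

```python
from typing import Any, Dict, Sequence

REQUIRED_KEYS = ("pred_logits", "pred_boxes", "outputs")


def attach_last_layer_embeddings(out: Dict[str, Any], hs: Sequence[Any]) -> Dict[str, Any]:
    """Return a copy of `out` with the last decoder layer's output under "outputs".

    `hs` is assumed to be indexed by decoder layer, as in the snippet above.
    """
    out = dict(out)
    out["outputs"] = hs[-1]
    return out


def check_detr_output(out: Dict[str, Any]) -> None:
    """Verify required keys exist and agree on (batch size, number of queries)."""
    missing = [key for key in REQUIRED_KEYS if key not in out]
    if missing:
        raise KeyError(f"Detector output is missing keys: {missing}")
    # len() gives the size of the leading dimension for nested lists.
    shapes = {key: (len(out[key]), len(out[key][0])) for key in REQUIRED_KEYS}
    if len(set(shapes.values())) != 1:
        raise ValueError(f"Inconsistent (batch, queries) sizes: {shapes}")


# Usage with toy data: batch of 1, 2 queries, 3-dim embeddings, 2 decoder layers.
hs = [
    [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]],  # first decoder layer
    [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]],  # last decoder layer
]
raw_out = {
    "pred_logits": [[[0.9], [0.1]]],
    "pred_boxes": [[[0.5, 0.5, 0.2, 0.2], [0.3, 0.3, 0.1, 0.1]]],
}
out = attach_last_layer_embeddings(raw_out, hs)
check_detr_output(out)
```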

HELLORPG pinned this issue Dec 22, 2024

HELLORPG added the discussion Thoughtful discussion and insight label Dec 22, 2024
