
If the target disappears for a long time and then reappears, can the algorithm recover it? #20

LuletterSoul opened this issue Jul 17, 2024 · 3 comments
Labels
discussion Thoughtful discussion and insight

Comments

@LuletterSoul

Hello, and thank you to MCG for open-sourcing this amazing MOT work! As mentioned in the title: when a target disappears from the camera for a long time and then reappears in a sequence, I found that it is treated as a newborn and given a new ID, which does not align with real-world usage scenarios.

Could the historical IDs of disappeared targets be recorded, so that they can be re-associated when they reappear?

@HELLORPG
Collaborator

Thanks for your thumbs up.
This is a pretty interesting question, and I will try to explain my thoughts as briefly as I can.

The phenomenon you observed occurs because the "long time" you mentioned exceeds what the model can handle (determined by MAX_TEMPORAL_LENGTH). This is due to the limitations of the relative position embedding we use: the video clips we see during training are of limited length, so we cannot support an unbounded sequence length (in other words, a very long occlusion). From a utilitarian perspective, because such long-term occlusions are relatively rare on MOT benchmarks, we do not handle this case by default.

But I have two ideas that may be able to handle the situation you mentioned:

  1. A plug-and-play ReID module can be used to handle these long-disappearing targets. This is very common in practical applications, because long-term occlusions are difficult to construct in training data. Many hand-crafted algorithms, including ByteTrack/OC-SORT, can only handle occlusions of a few dozen frames. (A rough sketch of this idea appears at the end of this comment.)
  2. Add a tricky implementation to our model. Our (mostly) inference code is here. In our implementation, trajectories are stored as a time-ordered queue (trajectory_history in our code), initialized here and updated here. Over time, we therefore discard the objects in the oldest video frame. If an ID no longer exists in any frame of trajectory_history, then even if the same target appears later, it will be regarded as a newborn object (in your words, the target disappears from the camera for a long time and then reappears). So an intuitive idea is: over time, retain at least one item for each ID (see the sketch just after this list).
    Specifically, before the queue is updated, you can check whether the frame about to be discarded contains an ID that does not exist in any of the remaining frames. If so, you could manually carry it forward one frame (copy it from trajectory_history[0] to trajectory_history[1], then call trajectory_history.append(current_tracks)). In this way, the ID will not be lost when trajectory_history[0] is popped, while retaining the possibility of recovering it in future frames.
    However, I'm not sure about the effectiveness of this trick, as it goes beyond what the model sees during training and may introduce a gap. In addition, if you use it, I think you still need to limit the maximum disappearance time of an ID; otherwise, it may cause some unexpected problems.
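
To make the trick in idea 2 concrete, here is a minimal, untested sketch of the rescue step. It assumes trajectory_history is a collections.deque with maxlen=MAX_TEMPORAL_LENGTH holding one list of tracks per frame (matching the names above); the Track object and its miss_count field are simplified assumptions for illustration, not MOTIP's actual data structures.

```python
from collections import deque

MAX_MISS_FRAMES = 100  # assumed cap on how long an ID may keep being rescued

def update_history(trajectory_history: deque, current_tracks: list) -> None:
    """Append the new frame, rescuing IDs that the pop would otherwise erase.

    Before the oldest frame is dropped, any ID that exists ONLY in that
    frame is copied forward one frame, so it survives the pop and can
    still be matched if the target reappears later.
    """
    if len(trajectory_history) == trajectory_history.maxlen:
        oldest = trajectory_history[0]
        surviving_ids = {t.id for frame in list(trajectory_history)[1:] for t in frame}
        for track in oldest:
            # Rescue only tracks about to vanish from the whole window,
            # and only if they have not been missing for too long.
            if track.id not in surviving_ids and track.miss_count < MAX_MISS_FRAMES:
                track.miss_count += 1
                trajectory_history[1].append(track)
    trajectory_history.append(current_tracks)  # oldest frame is popped automatically
```

As cautioned above, a rescued track carries stale temporal context, so the miss_count cap (or an equivalent limit on total disappearance time) matters to avoid unexpected behavior.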

My explanation may be a bit convoluted. If anything is unclear, feel free to reply and we can discuss further.
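
As a supplement to idea 1 above, here is a rough, untested sketch of how a plug-and-play ReID module could re-associate a newborn track with a long-lost ID. Everything here is hypothetical: lost_gallery, extract_embedding (any off-the-shelf ReID network), and the similarity threshold are illustrative choices, not part of MOTIP.

```python
import numpy as np

REID_THRESHOLD = 0.7  # hypothetical cosine-similarity threshold

def try_recover_id(new_track_crop, lost_gallery: dict, extract_embedding):
    """Match a newborn track's crop against embeddings of long-lost tracks.

    lost_gallery maps an old track ID to the last ReID embedding seen for it.
    Returns the recovered ID, or None if the target is genuinely new.
    """
    query = extract_embedding(new_track_crop)
    query = query / np.linalg.norm(query)
    best_id, best_sim = None, REID_THRESHOLD
    for track_id, emb in lost_gallery.items():
        sim = float(query @ (emb / np.linalg.norm(emb)))
        if sim > best_sim:
            best_id, best_sim = track_id, sim
    if best_id is not None:
        del lost_gallery[best_id]  # the ID is active again, remove it from the gallery
    return best_id
```

In practice the gallery would also need an eviction policy (e.g., dropping embeddings older than some horizon) so that stale identities do not accumulate.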

@LuletterSoul
Author

@HELLORPG Thank you for your expert opinion. I think the first method is more commonly used and effective in industry, but it has to rely on a strong ReID model, which in turn relies heavily on large-scale data. In contrast, I think the idea of MOTIP is very good because the model can be optimized end-to-end and is friendly to actual deployment.

I've tried using MOTIP in some other data scenarios (such as MOT from a UAV perspective), and I found that it can suffer from ID switches. I have an idea: is it possible to pre-train the seq_decoder on a ReID dataset to further enhance its ID prediction capability?

@HELLORPG
Collaborator

HELLORPG commented Dec 22, 2024

I'm sorry for the delay in my responses. I'll be in the hospital for a while (maybe a month or more), so my replies might be slower than usual. Thanks for your understanding and patience.

I have an idea: is it possible to pre-train the seq_decoder on a ReID dataset to further enhance its ID prediction capability?

It should be a promising idea. Just as we can train the model with single static images (e.g., joint training on CrowdHuman), ReID datasets could also be used for training. However, I think there will be a lot of engineering details and tricks to consider.
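
One way such pre-training could be set up, sketched on the data-construction side only (the reid_index layout and the idea of feeding pseudo-trajectories to the seq_decoder are assumptions, not MOTIP's actual pipeline): sample several crops of the same identity from a ReID dataset and treat them as one target observed over several frames, so the decoder learns to predict a consistent ID token across appearance changes.

```python
import random

def build_pseudo_trajectories(reid_index: dict, num_ids: int, seq_len: int):
    """Build a fake multi-object 'video' from a ReID dataset.

    reid_index maps person_id -> list of image paths (an assumed layout).
    Each sampled identity contributes one crop per pseudo-frame; the
    decoder is then supervised to emit the same ID for every crop of
    the same person, mimicking joint training on static images.
    """
    chosen = random.sample(list(reid_index), num_ids)
    frames = []
    for _ in range(seq_len):
        frame = [(new_id, random.choice(reid_index[person]))  # random view of this person
                 for new_id, person in enumerate(chosen)]
        frames.append(frame)
    return frames  # crops would then go through the backbone and the seq_decoder
```

The engineering details alluded to above (camera changes between crops, missing temporal smoothness, identity dropout, etc.) are exactly where such a scheme would need tuning.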
