Performance Reproduction #17

Open
mattiasegu opened this issue Apr 15, 2024 · 11 comments
Labels
help wanted Extra attention is needed


@mattiasegu

Hi, thank you very much for providing this well-structured codebase!

I tried training MeMOTR (with DAB-DETR) on DanceTrack and ran into performance issues. Using the provided config file and pretrained checkpoint, I only obtain:

HOTA    DetA    AssA
62.481  74.141  52.901

The association accuracy (AssA) in particular lags more than 2 points behind the number reported in the paper. Was anyone able to reproduce the original performance? Is there anything I'm missing? @HELLORPG, have you tried training this model with the current codebase and config file? Thanks in advance for your help!

@HELLORPG
Collaborator

May I ask how many GPUs you have used for training? 8 GPUs?

@mattiasegu
Author

mattiasegu commented Apr 15, 2024

Yes, 8 NVIDIA RTX 4090 GPUs. I use the --use-checkpoint flag and the default learning rate provided in the config file. I'm now trying Deformable DETR to see whether the issue only occurs with DAB-DETR.

@HELLORPG
Collaborator

I did re-run our code before open-sourcing it. However, I did not evaluate it on the val set, only directly on the test set, where it achieved the expected results.
Could you please submit your result to the Codalab server so we can see its performance on the test set?
As I discussed here:

# We change some parameters compared with our paper, looking forward more stable training convergence.

I tried my best to make the work more consistent. In my experience and that of others, convergence on DanceTrack can easily become unstable. However, with our codebase, it should be possible to keep the HOTA swing on DanceTrack below 1.0 (in my reproductions, the swing was only ~0.5 HOTA).

@HELLORPG
Collaborator

HELLORPG commented Apr 15, 2024

Some other evidence of inconsistent results on DanceTrack is as follows:

  • In OC-SORT, they also see ~0.5 HOTA instability. The situation is likely to be more serious for an end-to-end (E2E) model than for a heuristic algorithm.
  • An issue in the MOTRv2 repository also reported this instability.

I found that the results on SportsMOT are more stable.

BTW, I suggest re-running the DAB-D-DETR version to see what happens. If it depends on luck, I don't believe we will have such bad luck twice in a row.

@HELLORPG
Collaborator

One more thing: did you use --use-checkpoint during training? I once tried to run this experiment on a 24 GB RTX 3090, but CUDA memory ran out when processing 5 frames.

@HELLORPG
Collaborator

One more thing: did you use --use-checkpoint during training? I once tried to run this experiment on a 24 GB RTX 3090, but CUDA memory ran out when processing 5 frames.

Forget about it. I missed it in your reply. My fault.

@mattiasegu
Author

Thank you very much for your detailed replies and for your efforts! I will keep the issue updated as soon as I:

  • re-run the DAB-DETR training
  • run the DeformableDETR training
  • validate the models on the DanceTrack test set
  • train on other datasets (e.g. SportsMOT)

@HELLORPG
Collaborator

My pleasure. Keep in touch~

@HELLORPG
Collaborator

If anyone else has tried to reproduce the experiments on DanceTrack, you can post your results here to give us more evidence.

@HELLORPG added the "help wanted" (Extra attention is needed) label on Apr 15, 2024
@mattiasegu
Author

mattiasegu commented Apr 18, 2024

Update: with MeMOTR (Deformable DETR) I can get performance reasonably close to that reported in the paper on DanceTrack val:

         HOTA  DetA  AssA
Paper    61.0  71.2  52.5
My Run   60.8  72.1  51.5

As you mentioned, there is considerable variance from one epoch to the next, but this result is more satisfying than the one obtained with DAB-DETR :)

Nonetheless, the remaining gap seems to lie in the association accuracy.
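
To make the gap concrete, here is a minimal sketch that computes the per-metric differences from the DanceTrack val numbers in this comment. It also compares each reported HOTA with sqrt(DetA * AssA). That relation holds exactly only at a single localization threshold, so for the final averaged HOTA it is just an approximation. The variable and function names are illustrative and not part of the MeMOTR codebase.

```python
import math

# DanceTrack val numbers copied from the table in this comment.
paper = {"HOTA": 61.0, "DetA": 71.2, "AssA": 52.5}
my_run = {"HOTA": 60.8, "DetA": 72.1, "AssA": 51.5}

# Per-metric difference (my run minus paper), rounded to one decimal.
delta = {k: round(my_run[k] - paper[k], 1) for k in paper}


def approx_hota(det_a, ass_a):
    """Geometric mean of DetA and AssA; only approximates the averaged HOTA."""
    return math.sqrt(det_a * ass_a)


for name, scores in (("paper", paper), ("my run", my_run)):
    est = approx_hota(scores["DetA"], scores["AssA"])
    print(f"{name}: reported HOTA {scores['HOTA']:.1f}, sqrt(DetA*AssA) {est:.2f}")
print("delta (my run - paper):", delta)
```

In these numbers DetA is slightly above the paper's value while AssA is 1.0 below it, which is consistent with the gap lying on the association side.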

@HELLORPG
Collaborator

HELLORPG commented Apr 20, 2024

Thank you very much for your result. I believe it will help those who reproduce the experiments later to get more information beyond our paper.

In the past few days, I have used the code from this repository to run reproduction experiments. Because of limited GPU resources (many experiments for other work), I only reproduced MeMOTR with DAB-D-DETR.

Here is my result:

      HOTA  DetA  AssA
val   63.7  74.5  54.6
test  67.7  80.1  57.4

and my log.txt is here.

In my experience, this is an acceptable result on DanceTrack. However, to be honest, this instability is really frustrating. According to my previous exploration, the training strategy needs careful adjustment to alleviate this issue. But I don't have enough GPUs to repeat a large number of experiments (to verify the stability of training, you need at least 3~4 runs per setting). If you or anyone else has any ideas or results about it, please feel free to discuss them with me. I'm also trying to alleviate this problem in the extended version.
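
A minimal sketch of how the spread across repeated runs could be summarized, assuming the final val HOTA of each run is collected by hand. The two values used as input are the val HOTA numbers posted in this thread for the DAB-DETR-based model (62.481 and 63.7). They come from different machines, and two runs are fewer than the 3~4 suggested above, so treat them only as placeholder input. The function name `summarize_runs` is hypothetical.

```python
import statistics


def summarize_runs(hota_scores):
    """Return mean, sample standard deviation, and max-min swing of HOTA scores."""
    if len(hota_scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return {
        "mean": statistics.mean(hota_scores),
        "stdev": statistics.stdev(hota_scores),
        "swing": max(hota_scores) - min(hota_scores),
    }


# Val HOTA values reported in this thread (placeholder input, see note above).
val_hota = [62.481, 63.7]
summary = summarize_runs(val_hota)
print({k: round(v, 3) for k, v in summary.items()})
```

With more runs per setting, the same summary can be compared against the < 1.0 HOTA swing mentioned earlier in the thread.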
