Bad inference result on sample after overfitting on same sample #48

Open
jonasdieker opened this issue Jun 23, 2023 · 15 comments

@jonasdieker

jonasdieker commented Jun 23, 2023

Hi @zhulf0804,

I wanted to ensure the model can memorise a single training example. To do this, I set the __len__() method of the Dataset to return 1. During training I printed the data_dict to confirm that the same sample was used in every iteration. Since the dataset length was 1, each epoch consisted of a single training step.
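
For reference, this is roughly what the setup looks like. In my run I simply edited __len__ of the repo's dataset class; the self-contained wrapper below is an equivalent illustration (the class and argument names are mine, not from the repo):

from torch.utils.data import Dataset

class SingleSampleOverfitDataset(Dataset):
    """Serve the same underlying sample on every iteration (illustrative only)."""

    def __init__(self, base_dataset, index=0):
        self.base_dataset = base_dataset   # e.g. the repo's KITTI dataset
        self.index = index                 # the one sample to memorise, e.g. 000000.bin

    def __len__(self):
        return 1                           # one step per epoch

    def __getitem__(self, _):
        return self.base_dataset[self.index]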

I visualised the training curves in TensorBoard and, as expected, all three losses eventually decreased to 0. Then I wanted to visualise the model's predictions, for which I used the test.py script. However, when running it on the same sample used for training (000000.bin), the model produces zero predictions.

If I set score_thr in pointpillar.py to 0, I get a lot of predictions, but they all obviously have very low confidence.

Any idea where I am going wrong?

@zhulf0804
Owner

Hi @jonasdieker, that's strange. Could you post the visualized predictions when setting score_thr to 0?
By the way, did you load the pretrained weights successfully?

@jonasdieker
Author

jonasdieker commented Jun 23, 2023

Hi, thank you for your very fast reply!

Sorry, maybe I should have made it clearer that I wanted to train from scratch on a single KITTI sample to see if I could get decent predictions by overfitting. Therefore, no pretrained weights were loaded; instead, I loaded the model weights saved from my overfit training run, produced as described above.

The reason: I tried to do the same for NuScenes to test whether the model can memorise the new data when overfitting. In that case the model also predicts nothing; however, I am not able to get the loss to zero even after playing with the parameters, so there is likely more parameter tuning I still need to do ...

Here is the visualisation you asked for. (Note: I am using a different visualisation function because yours did not work for me over SSH.)

White is pedestrian, green is cyclist and blue is car.

[image: visualized predictions with score_thr set to 0]

Here are the confidences:

[0.0112691  0.01061759 0.01054672 0.01012148 0.01011159 0.00997026
 0.00983873 0.00945836 0.00936741 0.00894571 0.00888245 0.00886574
 0.00883586 0.00870235 0.00864896 0.00861476 0.00859446 0.00854981
 0.00853697 0.00851393 0.00847296 0.00834575 0.00832187 0.00829636
 0.00829282 0.00826259 0.00825665 0.00825058 0.00824824 0.00824112
 0.00823086 0.00821262 0.00817523 0.00817244 0.00815322 0.00815221
 0.00809674 0.00809228 0.00809175 0.00807787 0.00805884 0.00801394
 0.00799607 0.00798928 0.00394109 0.00385207 0.00380854 0.00376242
 0.00368402 0.00364244]

And the class counts:

[44, 4, 2]

Hope this is somewhat helpful for you!

@jonasdieker
Author

jonasdieker commented Jun 23, 2023

One more comment worth making: in the KITTI dataloader I actually commented out the data_augment function.

I did this to consistently get the same data for overfitting; I only use point_range_filter, even for split="train".
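
With augmentation disabled, the only preprocessing left is a deterministic range crop. The repo's point_range_filter may differ in its exact signature and details, but it does something along these lines (the limits below are the commonly used KITTI PointPillars range and are an assumption here, not taken from the repo):

import numpy as np

# Hedged sketch of a point-range crop; not the repo's actual implementation.
# limits = (x_min, y_min, z_min, x_max, y_max, z_max)
def crop_points_to_range(points, limits=(0.0, -39.68, -3.0, 69.12, 39.68, 1.0)):
    x_min, y_min, z_min, x_max, y_max, z_max = limits
    mask = (
        (points[:, 0] >= x_min) & (points[:, 0] <= x_max)
        & (points[:, 1] >= y_min) & (points[:, 1] <= y_max)
        & (points[:, 2] >= z_min) & (points[:, 2] <= z_max)
    )
    return points[mask]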

@zhulf0804
Owner

Hello @jonasdieker, did you also visualize the G.T. result and the prediction obtained with the weights provided by this repo on 000000.bin? Are they reasonable?

@jonasdieker
Author

jonasdieker commented Jun 23, 2023

Yes, I did, and they were fine. That is why I am confused by my experiment's outcome!

Edit: I will send a visualisation of that when I have access to the machine again!

@zhulf0804
Owner

Ok. One more thing: could you verify again that the single training example is 000000.bin?

@jonasdieker
Author

jonasdieker commented Jun 26, 2023

So I tried it again and verified that I was overfitting on the same sample I was testing on. I tried it with 000000.bin and then also with 000001.bin individually; both times the loss was practically zero, but the test.py script returned no bounding boxes at all with the default settings defined here:

# val and test
self.nms_pre = 100
self.nms_thr = 0.01
self.score_thr = 0.1
self.max_num = 50

Could you try to repeat this experiment? It should only take a few minutes.

Edit:

When setting the train_dataloader to split="val", still with the dataset length set to 1, I can perform training and validation on the same 000001.bin sample only. The weird thing is that in TensorBoard I get the following plots:

[image: TensorBoard loss curves for train vs. val on the same sample]

So now I am even more confused, but it confirms that val/test performs really badly in this specific scenario. In particular, the class loss actually diverges, which explains why the confidence is so low and all boxes are filtered out by the get_predicted_bboxes_single method with the default params listed above.
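
That filtering is easy to see with the confidences posted above: roughly speaking, the post-processing keeps only boxes whose score exceeds score_thr (the snippet below is an illustration of that step, not the exact code of get_predicted_bboxes_single):

import numpy as np

# With max confidence ~0.011 and score_thr = 0.1, every candidate is rejected,
# which is why test.py returns empty predictions.
scores = np.array([0.0113, 0.0106, 0.0105, 0.0101])  # a few of the values posted above
score_thr = 0.1
keep = scores > score_thr
print(int(keep.sum()))  # 0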

@jonasdieker
Author

@zhulf0804 Ok, I think this is kind of interesting:

The only difference between train and val in train.py is that model.eval() is called (which of course you should be calling). But if I comment out that line, I get the following plots:

[image: TensorBoard loss curves with model.eval() commented out]

Doing the same in test.py I get:

[image: test.py visualization with model.eval() commented out]

which is perfect! So overfitting works exactly as expected with this change. However, I do not understand how this impacts the performance, since switching from train mode to eval mode only does the following:

[image: what switching from train mode to eval mode changes]
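
In code terms, eval() just recursively flips each submodule's training flag, and as far as I can tell the only layers in this model whose forward pass depends on that flag are the BatchNorm layers (I don't see any dropout). A minimal, repo-independent check:

import torch.nn as nn

# model.eval() recursively sets module.training = False. For a Conv+BN+ReLU
# stack like the one used here, only BatchNorm changes behaviour: in eval mode
# it normalizes with its running mean/var instead of the current batch statistics.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model.eval()
print(all(not m.training for m in model.modules()))  # True: every submodule is in eval mode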

I think I need to give this some more thought. Let me know if you have an explanation!

@zhulf0804
Owner

Hello @jonasdieker,
Both the validation cls loss and the visualized predictions (using test.py) become good just by removing model.eval(), i.e. the following line?

pointpillars.eval()

@jonasdieker
Author

Hello @zhulf0804, yes that is exactly right!

@zhulf0804
Owner

Ok, I'm also confused by this result. I'll test it when I have access to the machine.
Besides, I'm looking forward to your explanation of this question.
Best.

@mdane-x

mdane-x commented Oct 24, 2023

Do you have any updates on this? @jonasdieker, did you find out what the issue was? I am getting the same problem: when overfitting on one (or a few) samples the loss goes to 0, but then I get 0 predictions using test.py. Even worse, when I run test.py multiple times with NO changes, I get different results (sometimes a few bboxes, most of the time zero: [] [] []).

@jonasdieker
Author

Hi @mdane-x, as far as I remember, overfitting on one (or a few) sample(s) didn't work with model.eval() in place, so I ended up commenting it out. I believe the issue was due to the normalisation (most likely the BatchNorm running statistics not matching the batch statistics of the single sample). If you have a good explanation of what is going on, please add it here!
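
A self-contained sketch of what I suspect is happening (plain PyTorch, nothing from this repo): if a BatchNorm layer only sees one repeated batch for a few steps, its running estimates still lag behind the batch statistics, so the eval-mode output differs from the train-mode output the weights were fitted to:

import torch
import torch.nn as nn

# Standalone illustration of the suspected cause (assumption, not repo code):
# after a short overfit run, BatchNorm's running mean/var have not converged
# to the statistics of the single training batch, so eval() normalizes the
# activations differently from train().
torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4) * 5 + 3        # one fixed batch, reused every "epoch"

bn.train()
for _ in range(10):                  # a few training-mode forward passes
    bn(x)                            # each call nudges the running stats

out_train = bn(x)                    # normalized with the batch's own statistics
bn.eval()
out_eval = bn(x)                     # normalized with the lagging running estimates
print((out_train - out_eval).abs().max().item())  # clearly non-zero

In the real model this mismatch would compound over many Conv+BN blocks, which would explain the collapsed confidences, but that part is my interpretation rather than something I verified.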

@mdane-x

mdane-x commented Oct 31, 2023

Hi @jonasdieker, thanks for the answer. I haven't managed to make it work, even after removing the eval() line. I am getting empty predictions with any model trained on a few samples.

@jonasdieker
Author

@mdane-x, hmmm that is very strange. I am not sure how to help you. In my experience it helps to visualise as much as you can. What does your validation loss look like? Is it also going to zero?
