
Model not learning #36

Open
naguileraleal opened this issue Jul 27, 2023 · 4 comments

naguileraleal commented Jul 27, 2023

Hello!
First of all, thank you for this awesome project!

I'm trying to use QA to annotate fibrosis on my images and I'm not having good results. This is what my images look like:
[Images: example ROI 016-22_1_4000_20000, its mask (016-22_1_4000_20000_mask), and the mask overlay (016-22_1_4000_20000_mask_overlay)]

Looking at the project's TensorBoard, I see that the validation loss diverges for some epochs and then, after a few more epochs, drops back to normal values. I also see a lot of epochs with very large loss values. I tried training the model with several patch sizes (256, 512, 1024) and the loss behaves the same way.
These losses were obtained after training with 306 training ROIs and 181 validation ROIs, with a patch size of 256x256. Each ROI is 512x512 pixels. The negative class predominates over the positive class.
[Plots: training loss (train-loss) and validation loss (test-loss) curves from TensorBoard]

I'm trying to find out why the model is not learning. To that end, I made some modifications to the train_model.py script, such as disabling the noise/blur/scaling data augmentations and the test-time augmentations, and logging each layer's gradients to TensorBoard.

After disabling the noise/blur/scaling data augmentations the loss values were smaller, but the behaviour was the same as before.

This is a histogram of the gradients of the model's last layer.
[Image: TensorBoard histogram of the last layer's gradients]
I don't have a QA project that produces good segmentations, so I don't know what these gradients should look like. If someone has this information, it would be valuable to me.
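The gradient logging itself was along these lines (a simplified sketch, not the exact modification to train_model.py; the UNet `model`, a SummaryWriter `writer`, and the current `epoch` are assumed to already exist in the training loop):

```python
from torch.utils.tensorboard import SummaryWriter

# A writer is assumed to already exist in the training loop; created here only
# so the sketch is self-contained.
writer = SummaryWriter()

def log_gradient_histograms(model, writer, epoch):
    """Log a histogram of each parameter's gradient; call after loss.backward()."""
    for name, param in model.named_parameters():
        if param.grad is not None:
            writer.add_histogram(f"gradients/{name}", param.grad, epoch)
```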

The loss function is CrossEntropyLoss:

criterion = torch.nn.CrossEntropyLoss(ignore_index=-1, reduce=False, weight=class_weight)

The last layer is a Conv2d
self.last = nn.Conv2d(prev_channels, n_classes, kernel_size=1)
and there is no activation function after it
return self.last(x)

so the output's values are unbounded. If the output has values at (or very near) 0, as it can, the loss values should be very large, like the ones I'm seeing. Why not include a SoftMax activation on the last layer's output? That's my next debugging move.
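(A toy check with made-up tensors, not QA code: torch.nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, i.e. it behaves like LogSoftmax followed by NLLLoss.)

```python
# Toy check (made-up tensors, not QA code): CrossEntropyLoss on raw logits
# equals NLLLoss on log-softmaxed logits.
import torch

logits = torch.tensor([[-5.0, 5.0]])   # one pixel, two classes, unbounded scores
target = torch.tensor([0])             # true class is 0

ce = torch.nn.CrossEntropyLoss()(logits, target)
manual = torch.nn.NLLLoss()(torch.log_softmax(logits, dim=1), target)
print(ce.item(), manual.item())        # both ~10.0: the true class got a very low score
```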

Pixels that belong to the "Unknown" class are assigned the -1 tag in the ground truth

img_mask[(mask[:, :, 0] == 0) & (mask[:, :, 1] == 0)] = -1 # unknown class
Later on, this mask is passed to the CrossEntropyLoss function. What is the result of applying this function to negative target values? Does it ignore these pixels? I would like these pixels not to be considered during training. Is that the effect of passing a negative value to this loss?
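(Per the PyTorch docs, ignore_index=-1 in the criterion above does exactly that: targets equal to -1 are excluded from the loss and contribute no gradient. A toy check with made-up tensors, not QA code:)

```python
# Toy check: pixels whose target equals ignore_index (-1) are skipped by
# CrossEntropyLoss and get a per-pixel loss of 0 with reduction='none'.
import torch

logits = torch.randn(1, 2, 2, 2)                 # (N, C=2 classes, H=2, W=2)
target = torch.tensor([[[0, 1], [-1, -1]]])      # bottom row marked "Unknown"
criterion = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='none')
print(criterion(logits, target))                 # per-pixel loss; 0 at the -1 positions
```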

Any help is much appreciated.

jacksonjacobs1 (Collaborator) commented

Hi naguileraleal,
Thank you for the question! First of all, have you made any progress with this issue since you raised it?

I'll need a little more information about your training and validation sets. Can you tell me the labeling distribution (positive present vs. all negative) for each set?

naguileraleal (Author) commented

Hello!
Sorry for the delay. I've spent a long time trying to debug the training, without success.

About the proportion of positive vs. negative pixels: the value of the pclassweight parameter used for training the UNet is 0.939710278448174, which I know is really imbalanced.
The ratio of positive to total pixels for the train and validation sets is 0.0543 and 0.0535, respectively. I do not have any pixels in the 'Unknown' category.
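(For reference, such a ratio can be computed roughly like this; the mask file layout below is hypothetical:)

```python
# Rough sketch (hypothetical file layout, not QA code) of computing the
# positive/total pixel ratio over a set of binary masks.
import glob
import numpy as np
from PIL import Image

masks = [np.array(Image.open(p)) > 0 for p in glob.glob("train_masks/*_mask.png")]
positive = sum(int(m.sum()) for m in masks)
total = sum(m.size for m in masks)
print(positive / total)   # ~0.0543 for the training set described above
```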

jacksonjacobs1 (Collaborator) commented

Thanks for the information. I would recommend taking some measures to make your classes more balanced. One strategy for doing this would be to select and annotate only patches that have fibrosis regions present. Does that make sense?
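A rough sketch of that strategy (the file layout and threshold below are hypothetical, not part of QuickAnnotator): keep only patches whose mask contains at least some positive pixels.

```python
# Rough sketch (hypothetical file layout, not part of QuickAnnotator):
# keep only patches whose mask contains some fibrosis (positive) pixels.
import glob
import numpy as np
from PIL import Image

MIN_POSITIVE_FRACTION = 0.01   # assumed threshold; tune for your data

mask_paths = glob.glob("patches/*_mask.png")
kept = [p for p in mask_paths
        if (np.array(Image.open(p)) > 0).mean() >= MIN_POSITIVE_FRACTION]
print(f"kept {len(kept)} of {len(mask_paths)} patches")
```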


naguileraleal commented Oct 27, 2023

Removing all the all-background patches solved the issue. There were a lot of them (~60%).
This should be a consideration when using QuickAnnotator for annotating anomalies.

Thanks for your help!
