
Model not learning #36

Open
naguileraleal opened this issue Jul 27, 2023 · 4 comments

naguileraleal commented Jul 27, 2023

Hello!
First of all, thank you for this awesome project!

I'm trying to use QA to annotate fibrosis on my images and I'm not having good results. This is what my images look like:
[Images: example ROI 016-22_1_4000_20000, its mask (016-22_1_4000_20000_mask), and the mask overlay (016-22_1_4000_20000_mask_overlay)]

Looking at the project's TensorBoard, I see that the validation loss diverges for some epochs and then, after a few more epochs, drops back to normal values. I also see a lot of epochs with very large loss values. I tried training the model with several patch sizes (256, 512, 1024) and the loss behaves the same way.
These losses were obtained after training with 306 training ROIs and 181 validation ROIs, with a patch size of 256x256. Each ROI is 512x512 pixels. The negative class predominates over the positive class.
[Plots: training loss (train-loss) and validation loss (test-loss) curves from TensorBoard]

I'm trying to find out why the model is not learning. To that end, I made some modifications to the train_model.py script, such as disabling the noise/blur/scaling data augmentations and the test-time augmentations, and logging each layer's gradients to TensorBoard.

After disabling the noise/blur/scaling data augmentations the loss values were smaller, but the behaviour was the same as before.

This is a histogram of the gradients of the model's last layer.
[Image: TensorBoard histogram of the last layer's gradients]
I don't have a QA project that produces good segmentations, so I don't know what these gradients should look like. If someone has this information, it would be valuable to me.
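The gradient logging itself was along these lines (a simplified sketch, not the exact modification to train_model.py; the UNet `model`, a SummaryWriter `writer`, and the current `epoch` are assumed to already exist in the training loop):

```python
from torch.utils.tensorboard import SummaryWriter

# A writer is assumed to already exist in the training loop; created here only
# so the sketch is self-contained.
writer = SummaryWriter()

def log_gradient_histograms(model, writer, epoch):
    """Log a histogram of each parameter's gradient; call after loss.backward()."""
    for name, param in model.named_parameters():
        if param.grad is not None:
            writer.add_histogram(f"gradients/{name}", param.grad, epoch)
```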

The loss function is CrossEntropyLoss:

criterion = torch.nn.CrossEntropyLoss(ignore_index=-1, reduce=False, weight=class_weight)

The last layer is a Conv2d
self.last = nn.Conv2d(prev_channels, n_classes, kernel_size=1)
and there is no activation function after it
return self.last(x)

so the output's values are unbounded. If the output has values at (or very near) 0, as it can, the loss values should be very large, like the ones I'm seeing. Why not include a SoftMax activation on the last layer's output? That's my next debugging move.
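(A toy check with made-up tensors, not QA code: torch.nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, i.e. it behaves like LogSoftmax followed by NLLLoss.)

```python
# Toy check (made-up tensors, not QA code): CrossEntropyLoss on raw logits
# equals NLLLoss on log-softmaxed logits.
import torch

logits = torch.tensor([[-5.0, 5.0]])   # one pixel, two classes, unbounded scores
target = torch.tensor([0])             # true class is 0

ce = torch.nn.CrossEntropyLoss()(logits, target)
manual = torch.nn.NLLLoss()(torch.log_softmax(logits, dim=1), target)
print(ce.item(), manual.item())        # both ~10.0: the true class got a very low score
```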

Pixels that belong to the "Unknown" class are assigned the -1 tag in the ground truth

img_mask[(mask[:, :, 0] == 0) & (mask[:, :, 1] == 0)] = -1 # unknown class
Later on, this mask is passed to the CrossEntropyLoss function. What is the result of applying this function to negative target values? Does it ignore these pixels? I would like these pixels not to be considered during training. Is that the effect of passing a negative value to this loss?
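(Per the PyTorch docs, ignore_index=-1 in the criterion above does exactly that: targets equal to -1 are excluded from the loss and contribute no gradient. A toy check with made-up tensors, not QA code:)

```python
# Toy check: pixels whose target equals ignore_index (-1) are skipped by
# CrossEntropyLoss and get a per-pixel loss of 0 with reduction='none'.
import torch

logits = torch.randn(1, 2, 2, 2)                 # (N, C=2 classes, H=2, W=2)
target = torch.tensor([[[0, 1], [-1, -1]]])      # bottom row marked "Unknown"
criterion = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='none')
print(criterion(logits, target))                 # per-pixel loss; 0 at the -1 positions
```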

Any help is much appreciated.

jacksonjacobs1 (Collaborator) commented

Hi naguileraleal,
Thank you for the question! First of all, have you made any progress with this issue since you raised it?

I'll need a little more information about your training and validation sets. Can you tell me the labeling distribution (positive present vs. all negative) for each set?

naguileraleal (Author) commented

Hello!
Sorry for the delay. I've spent a long time trying to debug the training, without success.

About the proportion of positive vs. negative pixels: the value of the pclassweight parameter used for training the UNet is 0.939710278448174, which I know is really imbalanced.
The ratio of positive to total pixels for the train and validation sets is 0.0543 and 0.0535, respectively. I do not have any pixels in the 'Unknown' category.
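(For reference, such a ratio can be computed roughly like this; the mask file layout below is hypothetical:)

```python
# Rough sketch (hypothetical file layout, not QA code) of computing the
# positive/total pixel ratio over a set of binary masks.
import glob
import numpy as np
from PIL import Image

masks = [np.array(Image.open(p)) > 0 for p in glob.glob("train_masks/*_mask.png")]
positive = sum(int(m.sum()) for m in masks)
total = sum(m.size for m in masks)
print(positive / total)   # ~0.0543 for the training set described above
```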

jacksonjacobs1 (Collaborator) commented

Thanks for the information. I would recommend taking some measures to make your classes more balanced. One strategy for doing this would be to select and annotate only patches that have fibrosis regions present. Does that make sense?
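A rough sketch of that strategy (the file layout and threshold below are hypothetical, not part of QuickAnnotator): keep only patches whose mask contains at least some positive pixels.

```python
# Rough sketch (hypothetical file layout, not part of QuickAnnotator):
# keep only patches whose mask contains some fibrosis (positive) pixels.
import glob
import numpy as np
from PIL import Image

MIN_POSITIVE_FRACTION = 0.01   # assumed threshold; tune for your data

mask_paths = glob.glob("patches/*_mask.png")
kept = [p for p in mask_paths
        if (np.array(Image.open(p)) > 0).mean() >= MIN_POSITIVE_FRACTION]
print(f"kept {len(kept)} of {len(mask_paths)} patches")
```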


naguileraleal commented Oct 27, 2023

Removing all the all-background patches solved the issue. There were a lot of them (~60%).
This should be a consideration when using QuickAnnotator for annotating anomalies.

Thanks for your help!
