Really performs worse when the number of classes is high #73

Open
jaytimbadia opened this issue Feb 10, 2022 · 7 comments
Comments

@jaytimbadia

This algorithm performs much worse when the number of classes is high, say around 100, which is usually the case in practice.
In that regime we need more images per class, and as we add labels the requirement approaches the images-per-class needed by ordinary supervised models.

Really wasted my time on this.

@carlini
Collaborator

carlini commented Feb 10, 2022

It's interesting that you find it isn't working. We were able to reach roughly state-of-the-art accuracy at the time on semi-supervised CIFAR-100 (100 classes) and ImageNet (1000 classes). What are you trying?

@jaytimbadia
Author

Hi,

I guess it reaches SOTA on semi-supervised benchmarks, not on conventional setups with more labelled data per class.
Please let me know if the above is correct.
CIFAR-100 is at 96% in the fully supervised setting, and I am not sure what accuracy this model achieves on CIFAR-100 and ImageNet-1000.

I am working on a classification task with ~2000 classes. With 400 images per class I get a supervised accuracy of 48.85%.
If that is this poor, how can the pseudo-labelling task perform well? Because of this, the unlabelled training is not performing well either.
This suggests we need a considerably better (more complex) model to train the subsequent semi-supervised stage, and then repeat.

I am not sure how to tackle this. I don't have much labelled data (400 per class) and have been stuck for two months now.

@carlini
Collaborator

carlini commented Feb 10, 2022

Sorry I don't really follow what you're saying here.

You have 2000 classes, and with 400 images per class you get 48.8% accuracy. Is this with FixMatch? "If this is worse" than what? And how does pseudo labeling enter into here? And why do you need a more complex model?

How much unlabeled data do you have if you have 400 labeled images per class?
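For readers unfamiliar with how pseudo-labelling enters FixMatch: the unlabelled loss only uses examples whose weakly-augmented prediction exceeds a confidence threshold. Below is a minimal NumPy sketch of that step, not the repo's actual implementation; the threshold of 0.95 is the paper's default, and the logits are illustrative stand-ins for real model outputs. It also hints at why many classes can hurt: with 2000-way logits, softmax mass spreads out, fewer examples pass the threshold, and the unlabelled loss sees less data.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fixmatch_unlabeled_loss(weak_logits, strong_logits, tau=0.95):
    """Cross-entropy on strongly-augmented views against hard
    pseudo-labels from weakly-augmented views, masked by confidence."""
    probs = softmax(weak_logits)
    pseudo = probs.argmax(axis=1)        # hard pseudo-labels
    mask = probs.max(axis=1) >= tau      # keep only confident predictions
    strong_probs = softmax(strong_logits)
    nll = -np.log(strong_probs[np.arange(len(pseudo)), pseudo] + 1e-12)
    loss = (nll * mask).sum() / max(mask.sum(), 1)
    return loss, mask.mean()             # loss and fraction of examples kept

# With random 2000-way logits almost nothing clears the 0.95 threshold,
# so the unlabelled loss is computed on very few (here, zero) examples.
weak = np.random.randn(8, 2000)
strong = weak + 0.1 * np.random.randn(8, 2000)
loss, frac = fixmatch_unlabeled_loss(weak, strong)
print(frac)  # fraction of unlabelled examples that contributed
```

This is only a sketch of the mechanism under discussion; the actual repo computes this inside the training graph with augmentation pipelines attached.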

@jaytimbadia
Author

OK, let me put it simply.

I have 2000 classes with 400 labelled images per class, plus ~600 unlabelled images per class (some scraped from the web, others augmented from the labelled set).
Will FixMatch work on this?
I have tried it and it doesn't give good results.
Let me know if you have any other suggestions.

I got 48.8% running the fixmatch.py file.
Do I need to make any model changes? Also, why won't it work with more classes? Is it because there are more patterns and the model is not complex enough? Should I change the architecture?

@carlini
Collaborator

carlini commented Feb 10, 2022

What accuracy does fully supervised training on these images give, ignoring the unlabeled data?

@jaytimbadia
Author

I had not done that, but I have now. It gives 84.27% with a 15-85 split, pure supervised on the labelled data.

I used transfer learning with a DenseNet.

@carlini
Collaborator

carlini commented Feb 25, 2022

Supposing you don't use transfer learning, what accuracy do you get?

The thing I'm trying to understand is this: supervised learning will strictly outperform semi-supervised learning given the same number of labeled examples. If your task is just hard, it's entirely reasonable that FixMatch might just actually not reach very high accuracy.
