This repo contains the implementation of our paper: "XCon: Learning with Experts for Fine-grained Category Discovery". (arXiv)
We address the problem of generalized category discovery (GCD) in this paper, i.e. clustering unlabeled images by leveraging the information from a set of seen classes, where the unlabeled images may contain both seen and unseen classes. The seen classes can be viewed as an implicit criterion for what constitutes a class, which makes this setting different from unsupervised clustering, where the clustering criteria may be ambiguous. We mainly focus on discovering categories within a fine-grained dataset, since this is one of the most direct applications of category discovery, i.e. helping experts discover novel concepts within an unlabeled dataset using the implicit criterion set forth by the seen classes. State-of-the-art methods for generalized category discovery leverage contrastive learning to learn representations, but the large inter-class similarity and intra-class variance pose a challenge for these methods, because the negative examples may contain irrelevant cues for recognizing a category, so the algorithms may converge to a local minimum. We present a novel method called Expert-Contrastive Learning (XCon) that helps the model mine useful information from the images by first partitioning the dataset into sub-datasets using k-means clustering and then performing contrastive learning within each sub-dataset to learn fine-grained discriminative features. Experiments on fine-grained datasets show clearly improved performance over the previous best methods, indicating the effectiveness of our method.
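The two-stage idea above can be sketched in a few lines of PyTorch plus scikit-learn. This is only a conceptual illustration, not the repo's actual implementation; all names (`partition_dataset`, `within_subset_infonce`, the tensors passed in, and the choice of `k`) are placeholders.

```python
# Minimal sketch of the XCon idea: partition the data with k-means on frozen
# features, then contrast only within each partition ("expert" sub-dataset).
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def partition_dataset(features: torch.Tensor, k: int) -> torch.Tensor:
    """Assign every image to one of k expert sub-datasets via k-means."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    return torch.as_tensor(kmeans.fit_predict(features.cpu().numpy()))


def within_subset_infonce(z1, z2, subset_ids, temperature=0.07):
    """InfoNCE where negatives come only from the same k-means partition,
    so they share coarse cues and push the model toward fine-grained details."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    loss, n_terms = 0.0, 0
    for s in subset_ids.unique():
        idx = (subset_ids == s).nonzero(as_tuple=True)[0]
        if idx.numel() < 2:
            continue
        logits = z1[idx] @ z2[idx].T / temperature          # similarities inside the subset
        targets = torch.arange(idx.numel(), device=z1.device)  # matching views are positives
        loss = loss + F.cross_entropy(logits, targets)
        n_terms += 1
    return loss / max(n_terms, 1)
```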
- Python 3.8
- PyTorch 1.10.0
- torchvision 0.11.1
pip install -r requirements.txt
In our experiments, we use generic image classification datasets including CIFAR-10/100 and ImageNet.
We also use fine-grained image classification datasets including CUB-200, Stanford-Cars, FGVC-Aircraft and Oxford-Pet.
Our model is initialized with parameters pretrained by DINO on ImageNet. The DINO checkpoint of ViT-B/16 is available here.
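As a convenience, the same DINO ViT-B/16 backbone can also be pulled from torch.hub as shown below; the training scripts may instead expect a locally downloaded checkpoint whose path is set in config.py, so treat this only as an optional alternative for obtaining the weights.

```python
import torch

# Load the DINO ViT-B/16 backbone pretrained on ImageNet (without labels) via torch.hub.
backbone = torch.hub.load('facebookresearch/dino:main', 'dino_vitb16')
backbone.eval()  # frozen feature extractor; fine-tuning is handled by the training scripts
```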
Set the paths to the datasets and the directory for saving outputs in config.py.
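The variable names in config.py may differ from release to release; the sketch below only illustrates the kind of paths you are expected to fill in, and every name in it is hypothetical.

```python
# config.py (illustrative only -- check the actual file for the real variable names)
cifar_10_root = '/path/to/datasets/cifar10'
cifar_100_root = '/path/to/datasets/cifar100'
imagenet_root = '/path/to/datasets/imagenet'
cub_root = '/path/to/datasets/CUB_200_2011'
cars_root = '/path/to/datasets/stanford_cars'
aircraft_root = '/path/to/datasets/fgvc-aircraft-2013b'
pets_root = '/path/to/datasets/oxford-iiit-pet'

dino_pretrain_path = '/path/to/dino_vitbase16_pretrain.pth'
exp_root = '/path/to/save/outputs'   # checkpoints, extracted features, logs
```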
- Get the k-means labels used to partition the dataset (a conceptual sketch of these two steps is given after the commands below).
bash bash_scripts/get_kmeans_subset.sh
- Get the lengths of the k expert sub-datasets.
bash bash_scripts/get_subset_len.sh
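Conceptually, these two steps cluster frozen DINO features with k-means, store the resulting sub-dataset assignments, and count how many images land in each expert sub-dataset. The snippet below is a rough, simplified sketch of that computation; the file names and the value of `k` are placeholders, not the scripts' actual arguments.

```python
# Rough sketch of the two preparation steps (paths and k are placeholders).
import numpy as np
from sklearn.cluster import KMeans

features = np.load('features.npy')   # (N, D) frozen DINO features, extracted beforehand
k = 8                                # number of expert sub-datasets

subset_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
np.save('kmeans_subset_labels.npy', subset_labels)

# Length of each expert sub-dataset, i.e. how many images each "expert" sees.
subset_lens = np.bincount(subset_labels, minlength=k)
print({i: int(n) for i, n in enumerate(subset_lens)})
```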
Fine-tune the model, with evaluation performed by semi-supervised k-means.
bash bash_scripts/representation_learning.sh
To run semi-supervised k-means alone, first run
bash bash_scripts/extract_features.sh
and then run
bash bash_scripts/ssk_means.sh
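Semi-supervised k-means, following the GCD protocol, differs from plain k-means in that labeled images stay anchored to their ground-truth class at every iteration and only the unlabeled images are reassigned. The function below is a simplified sketch of that procedure under those assumptions, not the repo's exact implementation; `n_classes` counts both seen and unseen classes.

```python
import numpy as np

def semi_supervised_kmeans(feats_l, labels_l, feats_u, n_classes, n_iters=100):
    """Labeled features keep their class assignment; only unlabeled ones move."""
    d = feats_l.shape[1]
    centroids = np.zeros((n_classes, d))
    # Initialize seen-class centroids from the labeled data, the rest from random unlabeled points.
    seen = np.unique(labels_l)
    for c in seen:
        centroids[c] = feats_l[labels_l == c].mean(axis=0)
    unseen = [c for c in range(n_classes) if c not in seen]
    if unseen:
        centroids[unseen] = feats_u[np.random.choice(len(feats_u), len(unseen), replace=False)]

    for _ in range(n_iters):
        # Assign unlabeled points to the nearest centroid.
        dists = ((feats_u[:, None, :] - centroids[None]) ** 2).sum(-1)
        assign_u = dists.argmin(axis=1)
        # Recompute centroids from labeled (fixed) + unlabeled (current) assignments.
        for c in range(n_classes):
            pts = np.concatenate([feats_l[labels_l == c], feats_u[assign_u == c]], axis=0)
            if len(pts) > 0:
                centroids[c] = pts.mean(axis=0)
    return assign_u
```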
To estimate the number of classes in the unlabeled dataset, first run
bash bash_scripts/extract_features.sh
and then run
bash bash_scripts/estimate_k.sh
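The estimation follows the idea from the GCD setup of searching over candidate values of k and scoring each candidate by how well the resulting clustering agrees with the known labels on the labeled split. The sketch below uses plain k-means and Hungarian matching from SciPy as an approximation of what the script does; the function names and the candidate range are assumptions, not the actual interface.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment


def labeled_cluster_acc(y_true, y_pred):
    """Clustering accuracy via Hungarian matching between clusters and classes."""
    n = max(y_pred.max(), y_true.max()) + 1
    cost = np.zeros((n, n), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    row, col = linear_sum_assignment(cost.max() - cost)
    return cost[row, col].sum() / len(y_true)


def estimate_k(features, labeled_mask, labels, candidates):
    """Pick the k whose clustering best matches the known labels on the labeled split."""
    scores = {}
    for k in candidates:
        preds = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
        scores[k] = labeled_cluster_acc(labels[labeled_mask], preds[labeled_mask])
    return max(scores, key=scores.get)
```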
Results of our method (clustering accuracy in %, reported on all classes, seen "Old" classes, and unseen "New" classes) are shown below. You can download our model checkpoints from the links in the table.
Datasets | All | Old | New | Models |
---|---|---|---|---|
CIFAR10 | 96.0 | 97.3 | 95.4 | ckpt |
CIFAR100 | 74.2 | 81.2 | 60.3 | ckpt |
ImageNet-100 | 77.6 | 93.5 | 69.7 | ckpt |
CUB-200 | 52.1 | 54.3 | 51.0 | ckpt |
Stanford-Cars | 40.5 | 58.8 | 31.7 | ckpt |
FGVC-Aircraft | 47.7 | 44.4 | 49.4 | ckpt |
Oxford-Pet | 86.7 | 91.5 | 84.1 | ckpt |
If you find this repo useful for your research, please consider citing our paper:
@inproceedings{fei2022xcon,
  title     = {XCon: Learning with Experts for Fine-grained Category Discovery},
  author    = {Yixin Fei and Zhongkai Zhao and Siwei Yang and Bingchen Zhao},
  booktitle = {British Machine Vision Conference (BMVC)},
  year      = {2022}
}