
Self-Supervised Learning without contrastive pairs

Tobias Höppe, Agnieszka Miszkurka, Dennis Wilkman

This repo reproduces the results of the paper Understanding Self-Supervised Learning Dynamics without Contrastive Pairs (https://arxiv.org/pdf/2102.06810.pdf). It is a final project for the Advanced Deep Learning course at KTH Royal Institute of Technology in Stockholm.

We implemented an all-in-one Siamese network which can work as BYOL, SimSiam, DirectPred, or DirectCopy.

Environment

The project is implemented with TensorFlow 2. Prepare a virtual environment with Python >= 3.6, then install the dependencies with:

pip install -r requirements.txt

Project overview

The project structure is as follows:

.
├── data_processing
├── experiments
│   ├── notebooks
│   │   ├── results_eigenspace
│   │   └── saved_model
│   ├── scripts
│   │   ├── results_eigenspace
│   │   └── saved_model
│   └── visualisation
└── models

Data processing

Contains augmentations and methods for processing CIFAR-10 and STL-10.

Experiments

Contains notebooks and scripts for running experiments along with visualisation utilities.

All parameter settings can be found in config.py.

Models

Contains models for self-supervised pre-training (SiameseNetwork) and finetuning (ClassificationNetwork), and their building blocks.

How to run

To run the training pipeline (pretraining + finetuning), run from the main directory:

python train.py --model MODEL_NAME --name SAVE_DIR_NAME

where MODEL_NAME can be one of: byol, simsiam, directpred, directcopy. You can also specify the number of pretraining epochs with the --epochs_pretraining flag (default: 101) and the number of finetuning epochs with the --epochs_finetuning flag (default: 50).

Additionally, the following flags can be used to run different experiments (an example invocation follows the list):

  • --symmetry to impose symmetry regularisation on the predictor (Wp)
  • --eigenspace to track the eigenspace evolution. Results are saved in results/SAVE_DIR_NAME/eigenspace_results
  • --one_layer_predictor to make the predictor consist of only one layer (only applicable to BYOL and SimSiam)
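
For example, to pretrain DirectPred with eigenspace tracking (the run name directpred_run is just a placeholder):

python train.py --model directpred --name directpred_run --eigenspace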

The pretrained encoder will be saved in the results/SAVE_DIR_NAME directory as a .h5 file. The finetuned classifier will be saved in results/SAVE_DIR_NAME/classifier as a Keras model.

Pretrained models are already available in those folders.
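
To inspect a saved classifier outside the provided scripts, a minimal sketch using standard Keras loading (paths as above; SAVE_DIR_NAME is a placeholder):

```python
import tensorflow as tf

# The finetuned classifier is saved as a Keras model and can be
# reloaded directly.
classifier = tf.keras.models.load_model("results/SAVE_DIR_NAME/classifier")
classifier.summary()
```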

To check the final accuracy on the test set run

python test.py --name SAVE_DIR_NAME

If eigenspace results are present in SAVE_DIR_NAME, they will be visualised. If you only want the visualisation (without running the classifier), add the --only vis flag.

Alternatively, you can use a Jupyter notebook; see experiments/notebooks/direct_pred.ipynb for an example.

Network architecture

[figure: Siamese network architecture]

The Siamese network consists of two networks with the same architecture: a ResNet-18 encoder, which produces hidden features, and a projector head, a two-layer MLP that maps the feature space into a lower-dimensional hidden space. The online network also has an additional predictor head, again a two-layer MLP. The target network has a stop-gradient (StopGrad) operation instead of a predictor head, so during back-propagation only the weights of the online network are updated. The loss between the outputs of the online and target networks is the cosine-similarity loss. Note that the final loss for one image is the symmetric loss L = L1 + L2, since each augmentation is fed to both networks.
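
A minimal TensorFlow 2 sketch of this symmetrised loss (function and variable names are ours, not the repo's):

```python
import tensorflow as tf

def cosine_similarity_loss(p, z):
    """Negative cosine similarity; z comes from the target branch,
    so gradients must not flow through it (StopGrad)."""
    p = tf.math.l2_normalize(p, axis=1)
    z = tf.math.l2_normalize(tf.stop_gradient(z), axis=1)
    return -tf.reduce_mean(tf.reduce_sum(p * z, axis=1))

def symmetric_loss(online, target, predictor, x1, x2):
    """Each augmentation (x1, x2) is fed to both branches; the total
    loss is the sum of the two cross terms: L = L1 + L2."""
    p1, p2 = predictor(online(x1)), predictor(online(x2))
    z1, z2 = target(x1), target(x2)
    return cosine_similarity_loss(p1, z2) + cosine_similarity_loss(p2, z1)
```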

Experiments

Configuration

Below are all available configurations which can be found in config.py.

| Network \ Settings | original | Symmetry regularisation | One layer predictor (original: two layers) |
|---|---|---|---|
| BYOL | get_byol / get_eigenspace_experiment | get_eigenspace_experiment_with_symmetry | get_byol_baseline |
| SimSiam | get_simsiam | get_simsiam_symmetric | get_simsiam_baseline |

| Network \ Settings | original | SimSiam | 3 layer predictor |
|---|---|---|---|
| DirectPred | get_direct_pred | get_simsiam_pred | get_deeper_projection |
| DirectCopy | get_direct_copy | — | — |
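
For orientation, here is a rough sketch of the DirectPred predictor update as we read it from the paper: maintain a moving average F of the projector-output correlation matrix and set the predictor weights Wp directly from its eigendecomposition instead of learning them by gradient descent. Hyperparameter names and values (rho, eps) and the exact normalisation are our reading of the paper, not necessarily this repo's implementation:

```python
import tensorflow as tf

def direct_pred_update(F, z, rho=0.3, eps=0.1):
    """One DirectPred step (sketch): update the EMA of the correlation
    matrix F with the current batch of projector outputs z, then set
    Wp from F's eigendecomposition. rho and eps are illustrative."""
    corr = tf.matmul(z, z, transpose_a=True) / tf.cast(tf.shape(z)[0], z.dtype)
    F = rho * F + (1.0 - rho) * corr
    # F is symmetric PSD, so eigh applies; eigenvalues come back ascending.
    eigvals, eigvecs = tf.linalg.eigh(F)
    p = tf.sqrt(tf.maximum(eigvals, 0.0) / tf.reduce_max(eigvals)) + eps
    Wp = eigvecs @ tf.linalg.diag(p) @ tf.transpose(eigvecs)
    return F, Wp
```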

SimSiam with symmetric predictor

A stable (non-collapsing) version of SimSiam with a symmetric predictor (using a different learning rate and weight decay for the predictor than for the rest of the network) can be found on the simsiam_predictor branch.
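
One way to implement such per-group hyperparameters in TensorFlow 2 is to apply separate optimizers to disjoint variable lists; a sketch under that assumption (the branch may do this differently, and the learning rates below are illustrative):

```python
import tensorflow as tf

# Separate optimizers so the predictor and the rest of the network get
# their own learning rate (weight decay can be split the same way).
pred_opt = tf.keras.optimizers.SGD(learning_rate=1e-3)
body_opt = tf.keras.optimizers.SGD(learning_rate=5e-2)

def apply_gradients(grads_and_vars, predictor_vars):
    """Route each (gradient, variable) pair to the matching optimizer."""
    pred_ids = {id(v) for v in predictor_vars}
    pred, body = [], []
    for g, v in grads_and_vars:
        (pred if id(v) in pred_ids else body).append((g, v))
    pred_opt.apply_gradients(pred)
    body_opt.apply_gradients(body)
```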

Results

For detailed results, see our project report. All our experiments were run on CIFAR-10 due to computational constraints. Self-supervised pretraining takes around 4 hours 30 minutes on a GCP V100 GPU.

Accuracy on CIFAR-10

| Model | Config | Accuracy |
|---|---|---|
| BYOL | get_byol | 85.7% |
| SimSiam | get_simsiam | 79.4% |

[figure] Figure 1: Results for DirectPred and DirectCopy with and without EMA. The SGD baseline is BYOL with a one-layer predictor.

Eigenspace alignment

First, we pre-train BYOL and SimSiam and keep track of the predictor head's symmetry and eigenspace alignment. In Figure 2 we can see that the assumption of a symmetric predictor holds: even without symmetry regularisation, Wp approaches symmetry during training. We can also see that, for all non-zero eigenvalues of Wp, the eigenspaces of F and Wp align as training progresses.
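
As a rough illustration of the quantities tracked (our own formulation, not necessarily the repo's): symmetry can be measured as a normalised distance between Wp and its transpose, and eigenspace alignment as the cosine between each eigenvector of Wp and its image under F:

```python
import tensorflow as tf

def symmetry_metric(Wp):
    """0 when Wp is exactly symmetric."""
    return tf.norm(Wp - tf.transpose(Wp)) / tf.norm(Wp)

def eigenspace_alignment(Wp, F):
    """Cosine between each eigenvector u_j of (the symmetrised) Wp and
    F @ u_j; a value of 1 means u_j is also an eigenvector of F."""
    _, U = tf.linalg.eigh(0.5 * (Wp + tf.transpose(Wp)))
    FU = F @ U
    return tf.abs(tf.reduce_sum(U * FU, axis=0)) / (tf.norm(FU, axis=0) + 1e-12)
```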

[figure] Figure 2: Pre-training BYOL for 100 epochs on CIFAR-10. Top row: BYOL without symmetry regularisation on Wp. Bottom row: BYOL with symmetry regularisation on Wp. The eigenvalues of F are plotted on a log scale, since they vary widely.