Unsupervised Semantic Correspondence Using Stable Diffusion

This repository contains the implementation of our method for estimating semantic correspondences with Stable Diffusion in an unsupervised manner. The code for extracting attention maps is adapted from Prompt-to-Prompt, and the dataloaders for the SPair-71k and PF-Willow datasets are adapted from CATs++. Our method surpasses weakly supervised methods and closes the gap to strongly supervised methods.

Getting Started

Here are instructions on how to run the repository:

Install dependencies: This project uses a conda environment for managing dependencies. You can create the environment and install all dependencies with the following command:
```
conda env create -f environment.yml
```

Run the evaluation script:
```
conda activate LDM_correspondences
python3 -m eval.eval
```

To list the available options, run:
```
python3 -m eval.eval --help
```

Visualizing Attention Maps

The project includes an interactive local website for visualizing attention maps associated with identified correspondences. Follow the steps below to launch the visualization:

Activate the conda environment and run the evaluation script with visualization:
```
conda activate LDM_correspondences
python3 -m eval.eval --visualize
```
Launch the interactive website by running the visualization script:
```
python3 -m clickable_lines.app
```

This will display correspondences. Click on each to visualize the corresponding attention maps.

Method Overview

We optimize a randomly initialized text embedding so that its attention map activates in a chosen source region. The same embedding can then be applied to any target image, where the predicted correspondence is simply the argmax of the resulting attention map.
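The idea above can be illustrated with a toy sketch. This is not the repository's code and does not use Stable Diffusion: a softmax over per-pixel dot products stands in for the cross-attention map, random arrays stand in for image features, and the function names (`attention_map`, `optimize_embedding`, `find_correspondence`, `toy_example`) are hypothetical. It only shows the two steps: fit an embedding so that attention peaks at a source pixel, then take the argmax of the attention map on a target.

```python
import numpy as np


def attention_map(features, embedding):
    """Toy attention: softmax over pixels of the feature/embedding dot product.

    features:  (H, W, C) per-pixel features (an assumed stand-in for U-Net features).
    embedding: (C,) token embedding.
    Returns an (H, W) map that sums to 1.
    """
    h, w, c = features.shape
    logits = features.reshape(-1, c) @ embedding
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return probs.reshape(h, w)


def optimize_embedding(features, source_yx, steps=100, lr=0.1, seed=0):
    """Fit an embedding whose attention map peaks at source_yx.

    Minimizes the cross-entropy between the attention map and a one-hot
    map at the source pixel, using plain gradient descent.
    """
    rng = np.random.default_rng(seed)
    h, w, c = features.shape
    flat = features.reshape(-1, c)
    embedding = rng.normal(scale=0.01, size=c)  # random initialization
    target = np.zeros(h * w)
    target[source_yx[0] * w + source_yx[1]] = 1.0
    for _ in range(steps):
        probs = attention_map(features, embedding).reshape(-1)
        grad = flat.T @ (probs - target)  # gradient of softmax cross-entropy
        embedding -= lr * grad
    return embedding


def find_correspondence(features, embedding):
    """Return the (row, col) of the attention-map argmax."""
    attn = attention_map(features, embedding)
    y, x = np.unravel_index(np.argmax(attn), attn.shape)
    return int(y), int(x)


def toy_example():
    """Target features are the source features shifted by (2, 3) plus noise."""
    rng = np.random.default_rng(42)
    source = rng.normal(size=(8, 8, 64))
    shifted = np.roll(source, shift=(2, 3), axis=(0, 1))
    target = shifted + 0.05 * rng.normal(size=source.shape)
    source_yx = (1, 2)
    expected_yx = (source_yx[0] + 2, source_yx[1] + 3)
    embedding = optimize_embedding(source, source_yx)
    predicted_yx = find_correspondence(target, embedding)
    return predicted_yx, expected_yx


if __name__ == "__main__":
    predicted, expected = toy_example()
    print("predicted:", predicted, "expected:", expected)
```

In this toy setup the ground-truth match is known by construction, so the prediction can be compared directly against the shifted source location. The actual method operates on Stable Diffusion's attention maps, which this sketch only approximates conceptually.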

We are motivated by the fact that the attention maps for specific words act as pseudo-segmentations of the regions those words describe. By inputting an image instead of random noise, we can use Stable Diffusion for inference tasks.

We find that even when our method predicts incorrect correspondences, the regions it predicts still seem reasonable. Notably, in the bottom-right example, although all points are meant to correspond to the wine bottle, points occluded by the wine glass instead map to the wine glass.

Our method outperforms weakly supervised methods and, in the case of PF-Willow, is on par with strongly supervised methods.

We also find that when we look for correspondences between different classes, our method still estimates plausible correspondences.

Citing

If you find this code useful for your research, please consider citing the following paper:

```
@article{hedlin2023unsupervised,
  title={Unsupervised Semantic Correspondence Using Stable Diffusion},
  author={Eric Hedlin and Gopal Sharma and Shweta Mahajan and Hossam Isack and Abhishek Kar and Andrea Tagliasacchi and Kwang Moo Yi},
  year={2023},
  eprint={2305.15581},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
