GitHub - lixirui142/VidToMe: Official Pytorch Implementation for "VidToMe: Video Token Merging for Zero-Shot Video Editing" (CVPR 2024)

VidToMe: Video Token Merging for Zero-Shot Video Editing (CVPR 2024)
Official PyTorch Implementation

Xirui Li, Chao Ma, Xiaokang Yang, and Ming-Hsuan Yang

Project Page | Paper | Summary Video


VidToMe merges similar self-attention tokens across frames, improving temporal consistency while reducing memory consumption.

Abstract

Diffusion models have made significant advances in generating high-quality images, but their application to video generation has remained challenging due to the complexity of temporal motion. Zero-shot video editing offers a solution by utilizing pre-trained image diffusion models to translate source videos into new ones. Nevertheless, existing methods struggle to maintain strict temporal consistency and efficient memory consumption. In this work, we propose a novel approach to enhance temporal consistency in generated videos by merging self-attention tokens across frames. By aligning and compressing temporally redundant tokens across frames, our method improves temporal coherence and reduces memory consumption in self-attention computations. The merging strategy matches and aligns tokens according to the temporal correspondence between frames, facilitating natural temporal consistency in generated video frames. To manage the complexity of video processing, we divide videos into chunks and develop intra-chunk local token merging and inter-chunk global token merging, ensuring both short-term video continuity and long-term content consistency. Our video editing approach seamlessly extends the advancements in image editing to video editing, rendering favorable results in temporal consistency over state-of-the-art methods.
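
The core idea of merging similar tokens across frames can be illustrated with a small sketch. This is a generic similarity-based merge written in plain Python for readability, not the code in this repository: the function names `merge_tokens` and `unmerge_tokens`, the use of cosine similarity, and the plain averaging are all assumptions made for illustration. A practical implementation would operate on batched tensors inside the self-attention layers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_tokens(src, dst, r):
    """Merge the r source tokens that best match a destination token.

    src: tokens of one frame, dst: tokens of a reference frame
    (hypothetical roles chosen for this sketch).
    Returns (kept_src, merged_dst, assignment), where assignment maps
    each merged source index to the destination index it joined.
    """
    # Find the most similar destination token for every source token.
    best = []
    for i, s in enumerate(src):
        sims = [cosine(s, d) for d in dst]
        j = max(range(len(dst)), key=sims.__getitem__)
        best.append((sims[j], i, j))

    # Keep only the r strongest matches; the rest stay unmerged.
    strongest = sorted(best, reverse=True)[:r]
    assignment = {i: j for _, i, j in strongest}

    # Average each destination token with the source tokens assigned to it.
    sums = [list(d) for d in dst]
    counts = [1] * len(dst)
    for i, j in assignment.items():
        sums[j] = [a + b for a, b in zip(sums[j], src[i])]
        counts[j] += 1
    merged_dst = [[v / c for v in row] for row, c in zip(sums, counts)]

    kept_src = [s for i, s in enumerate(src) if i not in assignment]
    return kept_src, merged_dst, assignment


def unmerge_tokens(kept_src, merged_dst, assignment, n_src):
    """Restore n_src source tokens; merged ones copy their destination token."""
    kept = iter(kept_src)
    return [
        list(merged_dst[assignment[i]]) if i in assignment else list(next(kept))
        for i in range(n_src)
    ]


# Two reference-frame tokens and three tokens from another frame.
dst = [[2.0, 0.0], [0.0, 2.0]]
src = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
kept_src, merged_dst, assignment = merge_tokens(src, dst, r=2)
restored = unmerge_tokens(kept_src, merged_dst, assignment, n_src=len(src))
```

In this toy example attention would run on three tokens (two merged, one kept) instead of five, and the restored tokens that were merged share one value across frames. That shared value is the intuition behind the consistency and memory claims above.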

Updates

[02/2024] Code is released.
[02/2024] Accepted to CVPR 2024!
[12/2023] Paper and website released.

TODO

Release evaluation dataset and more examples.
Release evaluation code.

Setup

Clone the repository.

git clone git@github.com:lixirui142/VidToMe.git
cd VidToMe

Create a new conda environment and install PyTorch by following the instructions on the official PyTorch site. Then install the required packages with pip.

conda create -n vidtome python=3.9
conda activate vidtome
# Install torch, torchvision (https://pytorch.org/get-started/locally/)
pip install -r requirements.txt

We recommend installing xformers for fast and memory-efficient attention.
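
A quick way to confirm the environment is ready is to check that the packages can be found before launching a run. The snippet below is a minimal sketch using only the Python standard library; the helper names `check_packages` and `report` are made up for this example and are not part of the repository.

```python
import importlib.util


def check_packages(names):
    """Return {name: True if the top-level package can be found}.

    Uses find_spec, so nothing is actually imported.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report(status):
    """Format the result of check_packages as one line per package."""
    return "\n".join(
        f"{name}: {'found' if ok else 'missing'}" for name, ok in status.items()
    )


# xformers is optional but recommended above.
print(report(check_packages(["torch", "torchvision", "xformers"])))
```

A "missing" line for xformers does not necessarily block a run, since it is only recommended here; a missing torch or torchvision means the PyTorch installation step still needs to be done.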

Run

python run_vidtome.py --config configs/tea-pour.yaml

See 'configs' for more config examples. Default config values are specified, with explanations, in 'default.yaml'.
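
To try several of the example configs in one go, the documented command can be wrapped in a small loop. This is a sketch under stated assumptions: it assumes the example configs are `.yaml` files directly inside 'configs', that 'default.yaml' sits in the same folder and should be skipped because it holds defaults rather than an example, and that the script is started from the repository root. The helpers `build_commands` and `run_all` are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(config_dir="configs", skip=("default.yaml",)):
    """Build one `run_vidtome.py --config <file>` command per YAML file.

    Files named in `skip` are left out (assumption: default.yaml only
    holds default values and is not meant to be run on its own).
    """
    configs = sorted(
        p for p in Path(config_dir).glob("*.yaml") if p.name not in skip
    )
    return [
        [sys.executable, "run_vidtome.py", "--config", p.as_posix()]
        for p in configs
    ]


def run_all(config_dir="configs"):
    """Run every config in turn, stopping at the first failing run."""
    for cmd in build_commands(config_dir):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root would then execute the runs one after another; `build_commands()` alone can be used to preview the commands without starting anything.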

Citation

If you find this work useful for your research, please consider citing our paper:

@inproceedings{li2024vidtome,
    title={VidToMe: Video Token Merging for Zero-Shot Video Editing},
    author={Li, Xirui and Ma, Chao and Yang, Xiaokang and Yang, Ming-Hsuan},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2024}
}

Acknowledgments

The code is mainly based on ToMeSD, PnP, and Diffusers.

