EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing

Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang

(Note that the videos on GitHub are heavily compressed. The full videos are available on the project webpage.)

📚 TL;DR: EVA is built on a T2I model (Stable Diffusion 1.5) and is designed for accurate multi-attribute editing in single- and multi-object scenarios without any training.

📣 EVA Intro Video

eva_intro.mp4

Abstract

Current diffusion-based video editing primarily focuses on local editing (object/background editing) or global style editing by utilizing various dense correspondences. However, these methods often fail to accurately edit the foreground and background simultaneously while preserving the original layout. We find that the crux of the issue stems from the imprecise distribution of attention weights across designated regions, including inaccurate text-to-attribute control and attention leakage. To tackle this issue, we introduce EVA, a zero-shot and multi-attribute video editing framework tailored for human-centric videos with complex motions. We incorporate a Spatial-Temporal Layout-Guided Attention mechanism that leverages the intrinsic positive and negative correspondences of cross-frame diffusion features. To avoid attention leakage, we utilize these correspondences to boost the attention scores of tokens within the same attribute across all video frames while limiting interactions between tokens of different attributes in the self-attention layer. For precise text-to-attribute manipulation, we use discrete text embeddings focused on specific layout areas within the cross-attention layer. Benefiting from the precise attention weight distribution, EVA can be easily generalized to multi-object editing scenarios and achieves accurate identity mapping. Extensive experiments demonstrate that EVA achieves state-of-the-art results in real-world scenarios.
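The self-attention idea in the abstract (boost scores between tokens of the same attribute across frames, suppress scores between tokens of different attributes) can be illustrated with a small NumPy sketch. This is not the released implementation: the additive-bias formulation, the function name `layout_guided_self_attention`, and the `boost` and `penalty` parameters are assumptions made for illustration, and the per-token region labels are assumed to come from some layout mask.

```python
import numpy as np

def layout_guided_self_attention(q, k, v, region_ids, boost=1.0, penalty=-1e4):
    """Illustrative sketch only (hypothetical names, not the EVA codebase).

    q, k, v:     (N, d) arrays of tokens flattened over all frames.
    region_ids:  (N,) integer attribute/region label for each token.
    boost:       bias added to same-region (positive) pairs.
    penalty:     bias added to cross-region (negative) pairs.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # (N, N) raw attention scores
    same = region_ids[:, None] == region_ids[None, :]   # True for positive pairs
    scores = scores + np.where(same, boost, penalty)    # layout-guided bias
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy example: 2 frames x 3 tokens, two attribute regions (0 and 1).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
region_ids = np.array([0, 0, 1, 1, 0, 1])
out, w = layout_guided_self_attention(tokens, tokens, tokens, region_ids)
```

With a large negative `penalty`, each token effectively attends only to tokens that share its region label, in any frame, which is one simple way to picture how attention leakage between attributes could be limited.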

🔥 Project page

For more editing results, please see the project webpage.

Acknowledgements

This codebase builds on diffusers. We also acknowledge the following open-source projects:

FateZero (https://github.com/ChenyangQiQi/FateZero).
ControlVideo (https://github.com/thu-ml/controlvideo).

📌 Citation

If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:

@misc{yang2024eva,
      title={EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing}, 
      author={Xiangpeng Yang and Linchao Zhu and Hehe Fan and Yi Yang},
      year={2024},
      eprint={2403.16111},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
