
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance (ACM MM2024)

This repository is the official implementation of MAG-Edit.

Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou

Project Website arXiv


[Teaser figure: (a) Blended latent diffusion, (b) DiffEdit, (c) Prompt2Prompt, (d) Plug-and-play, (e) P2P+Blend, (f) PnP+Blend]

🔖 Abstract

TL;DR: MAG-Edit is the first training-free method specifically designed for localized image editing in complex scenarios.

Recent diffusion-based image editing approaches have exhibited impressive editing capabilities in images with simple compositions. However, localized editing in complex scenarios has not been well studied in the literature, despite its growing real-world demands. Existing mask-based inpainting methods fall short of retaining the underlying structure within the edit region. Meanwhile, mask-free attention-based methods often exhibit editing leakage and misalignment in more complex compositions. In this work, we develop MAG-Edit, a training-free, inference-stage optimization method, which enables localized image editing in complex scenarios. In particular, MAG-Edit optimizes the noise latent feature in diffusion models by maximizing two mask-based cross-attention constraints of the edit token, which in turn gradually enhances the local alignment with the desired prompt. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method in achieving both text alignment and structure preservation for localized editing within complex scenarios.
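The abstract describes maximizing two mask-based cross-attention constraints of the edit token, and the changelog names them the token ratio and the spatial ratio. The NumPy sketch below shows how such ratios could be computed, under our own assumed definitions (not taken from this repository's code): the token ratio is the edit token's share of all tokens' attention inside the mask, and the spatial ratio is the fraction of the edit token's attention that falls inside the mask. It also assumes `attn` holds one map per prompt word token (start and padding tokens excluded) and that the mask has already been resized to the attention resolution. The function names are hypothetical.

```python
import numpy as np


def token_ratio(attn: np.ndarray, mask: np.ndarray, k: int, eps: float = 1e-8) -> float:
    """Assumed definition: edit token k's attention inside the mask divided by
    the attention of all tokens inside the mask. attn has shape (T, H, W)."""
    m = mask.astype(attn.dtype)
    edit_in_mask = (attn[k] * m).sum()
    all_in_mask = (attn * m[None]).sum()  # broadcast the mask over all T tokens
    return float(edit_in_mask / (all_in_mask + eps))


def spatial_ratio(attn: np.ndarray, mask: np.ndarray, k: int, eps: float = 1e-8) -> float:
    """Assumed definition: fraction of edit token k's attention that lands
    inside the mask (the remainder is leakage into the background)."""
    m = mask.astype(attn.dtype)
    return float((attn[k] * m).sum() / (attn[k].sum() + eps))


if __name__ == "__main__":
    # Toy 2x2 attention maps for two tokens; token 1 plays the edit token.
    attn = np.array([[[1.0, 1.0], [1.0, 1.0]],
                     [[3.0, 1.0], [2.0, 0.0]]])
    mask = np.array([[1, 0], [0, 0]])  # edit region = top-left pixel
    print(token_ratio(attn, mask, k=1))    # ~0.75, i.e. 3 / (1 + 3)
    print(spatial_ratio(attn, mask, k=1))  # ~0.5, i.e. 3 / (3 + 1 + 2 + 0)
```

In an inference-stage optimization, a loss such as `1 - ratio` would be differentiated with respect to the noise latent (for example with PyTorch autograd) and the latent nudged by a gradient step; that part is not shown here.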

📝 Changelog

  • 2024.05.24 Release Token Ratio Code!
  • 2023.12.19 Release Project Page and Paper!

💡 TODO:

  • Release Spatial Ratio Code
  • Release Token Ratio Code (done)
  • Release MAG-Edit paper and project page (done)

🎮 MAG-Edit Implementation

Setup Environment

Our method has been tested with CUDA 12.0 on a single A100 or V100 GPU. Preparation mainly consists of configuring the environment and downloading the pre-trained model.

conda create -n mag python=3.8
conda activate mag

pip install -r requirements.txt
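Running MAG-Edit needs a single GPU with at least 32 GB of VRAM. After installing the requirements, a quick sanity check such as the hypothetical helper below can confirm that PyTorch sees a suitable card. It assumes PyTorch is installed via requirements.txt; note that `torch.version.cuda` reports the CUDA version PyTorch was built against, not the driver version.

```python
def meets_vram_requirement(total_bytes: int, required_gb: float = 32.0) -> bool:
    """Compare in decimal GB (1e9 bytes). A card sold as 32 GB usually reports
    slightly less than 32 GiB to PyTorch, so a GiB-based check could wrongly
    reject it."""
    return total_bytes / 1e9 >= required_gb


def report_gpu(device_index: int = 0, required_gb: float = 32.0) -> bool:
    """Print the visible GPU's name and memory plus PyTorch's CUDA build version."""
    import torch  # imported lazily so meets_vram_requirement works without torch

    if not torch.cuda.is_available():
        print("No CUDA device is visible to PyTorch.")
        return False
    props = torch.cuda.get_device_properties(device_index)
    ok = meets_vram_requirement(props.total_memory, required_gb)
    print(f"{props.name}: {props.total_memory / 1e9:.1f} GB, "
          f"CUDA {torch.version.cuda}, meets {required_gb:.0f} GB: {ok}")
    return ok
```

Call `report_gpu()` inside the activated `mag` environment before launching an edit.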

We use Stable Diffusion v1-4 as the backbone. Please download it from Hugging Face and update the model path on line 26 of code_tr/network.py.
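One way to fetch the backbone is `huggingface_hub.snapshot_download` with the repository id `CompVis/stable-diffusion-v1-4`. The sketch below pairs that with a hypothetical check that the downloaded folder has the usual diffusers layout before you point line 26 of code_tr/network.py at it. That the code expects a diffusers-format folder is our assumption; check network.py to confirm.

```python
from pathlib import Path
from typing import List

# Entries a diffusers-format Stable Diffusion v1-4 folder normally contains.
REQUIRED_ENTRIES = ("model_index.json", "unet", "vae", "text_encoder", "tokenizer", "scheduler")


def missing_entries(model_dir: str) -> List[str]:
    """Return the expected entries that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_ENTRIES if not (root / name).exists()]


def download_sd_v1_4(local_dir: str) -> str:
    """Download the checkpoint with huggingface_hub and return its local path."""
    from huggingface_hub import snapshot_download  # lazy: only needed to download

    return snapshot_download(repo_id="CompVis/stable-diffusion-v1-4", local_dir=local_dir)
```

Once `missing_entries(path)` returns an empty list, `path` is the value to put on line 26.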

Run MAG-Edit (Token Ratio)

Running MAG-Edit requires a single GPU with at least 32 GB of VRAM. The example below, taken from code_tr/edit.sh, is run from the code_tr directory:

CUDA_VISIBLE_DEVICES=0 python edit.py --source_prompt="there is a set of sofas on the red carpet in the living room" \
                --target_prompt="there is a set of sofas on the yellow carpet in the living room" \
                --target_word="yellow" \
                --img_path="examples/1/1.jpg" \
                --mask_path="examples/1/mask.png" \
                --result_dir="result" \
                --max_iteration=15 \
                --scale=2.5

The result is saved at code_tr/result.
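For batch runs it can help to build the same command from Python, which passes each prompt as a single argument and avoids shell-quoting pitfalls. The wrapper below is a hypothetical sketch: the flag names and default values are copied from the command above, `run_edit` assumes it is called from the repository root, and the whole-word check on `target_word` is our assumption about how edit.py locates the edit token, not documented behavior.

```python
import os
import shlex
import subprocess
import sys
from typing import List


def build_edit_command(source_prompt: str, target_prompt: str, target_word: str,
                       img_path: str, mask_path: str, result_dir: str = "result",
                       max_iteration: int = 15, scale: float = 2.5) -> List[str]:
    """Build the edit.py argument list mirroring code_tr/edit.sh."""
    # Assumption: the edit token must occur as a whole word in the target prompt.
    if target_word not in target_prompt.split():
        raise ValueError(f"{target_word!r} does not appear in the target prompt")
    return [
        sys.executable, "edit.py",
        f"--source_prompt={source_prompt}",
        f"--target_prompt={target_prompt}",
        f"--target_word={target_word}",
        f"--img_path={img_path}",
        f"--mask_path={mask_path}",
        f"--result_dir={result_dir}",
        f"--max_iteration={max_iteration}",
        f"--scale={scale}",
    ]


def run_edit(cmd: List[str], gpu: str = "0") -> None:
    """Run edit.py inside code_tr/ on one GPU, echoing the shell-quoted command."""
    print(shlex.join(cmd))
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    subprocess.run(cmd, cwd="code_tr", env=env, check=True)
```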

Various Editing Types

Other Applications


Qualitative Comparison

Comparison with training-free methods

[Image grid] Columns: Simplified Prompt | Source Image | Ours | Blended LD | DiffEdit | P2P | PnP
Rows (simplified prompt): Green pillow; Denim pants; White bird; Slices of steak

Comparison with training and finetuning methods

[Image grid] Columns: Simplified Prompt | Source Image | Ours | InstructPix2Pix | MagicBrush | SINE
Rows (simplified prompt): Yellow car; Plaid sofa; Tropical fish; Strawberry

Comparison with Inversion methods

[Image grid] Columns: Simplified Prompt | Source Image | Ours | StyleDiffusion | ProxNPI | DirectInversion
Rows (simplified prompt): Jeep; Floral sofa; Yellow shirt

🚩 Citation

@inproceedings{mao2024mag,
  title={Mag-edit: Localized image editing in complex scenarios via mask-based attention-adjusted guidance},
  author={Mao, Qi and Chen, Lan and Gu, Yuchao and Fang, Zhen and Shou, Mike Zheng},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={6842--6850},
  year={2024}
}

💞 Acknowledgements

This repository borrows heavily from prompt-to-prompt and layout-guidance. Thanks to the authors for sharing their code and models.