Sub-path Linear Approximation Model

Official repository for our ECCV 2024 oral paper: Accelerating Image Generation with Sub-path Linear Approximation Model

Project Page: https://subpath-linear-approx-model.github.io/

News

  • [2024/08/12] 🎉 SPLAM was selected for an oral presentation at ECCV 2024.
  • [2024/07/01] 🎉 SPLAM has been accepted to ECCV 2024!
  • [2024/05/07] 🔥 We provide the pre-trained model on 🤗 Hugging Face.
  • [2024/04/23] 🔥 We released the paper on arXiv.

Usage

Environment Setting

Install the diffusers library from source:

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .

Install required packages:

pip install -r requirements.txt

Initialize an 🤗Accelerate environment with:

accelerate config
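
Before launching anything, it can help to confirm that the packages are visible to the Python interpreter you will train with. The following is a minimal sketch using only the standard library; the package list is an assumption (the contents of requirements.txt are not shown here), and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Assumed package list for illustration; adjust it to match requirements.txt.
REQUIRED = ("diffusers", "accelerate", "torch")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A package reported as missing usually means the install ran in a different environment from the one the interpreter belongs to.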

Example of Launching a Training Run

The following command uses the Conceptual Captions 12M (CC12M) dataset and is for illustrative purposes only. For best results, consider a large, high-quality text-image dataset such as LAION. You may also need to tune the hyperparameters for the dataset you use.

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path/to/saved/model"

accelerate launch train_splam_distill_sd_wds.py \
    --pretrained_teacher_model=$MODEL_NAME \
    --output_dir=$OUTPUT_DIR \
    --mixed_precision=fp16 \
    --resolution=512 \
    --learning_rate=8e-6 --loss_type="huber" --ema_decay=0.95 --adam_weight_decay=0.0 \
    --max_train_steps=1000 \
    --max_train_samples=4000000 \
    --dataloader_num_workers=8 \
    --train_shards_path_or_url="pipe:curl -L -s https://huggingface.co/datasets/laion/conceptual-captions-12m-webdataset/resolve/main/data/{00000..01099}.tar?download=true" \
    --validation_steps=200 \
    --checkpointing_steps=200 --checkpoints_total_limit=10 \
    --train_batch_size=12 \
    --gradient_checkpointing --enable_xformers_memory_efficient_attention \
    --gradient_accumulation_steps=1 \
    --use_8bit_adam \
    --resume_from_checkpoint=latest \
    --report_to=wandb \
    --seed=453645634 \
    --push_to_hub
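
To get a feel for the scale of the command above, the sketch below expands the `{00000..01099}` range in the shard URL and estimates how many samples a run consumes. It is a rough illustration, not part of the training script: it assumes `--train_batch_size` is a per-process batch size (as in the diffusers example scripts) and that the number of processes comes from your `accelerate config`. The helpers `expand_numeric_range` and `samples_seen` are hypothetical names.

```python
import re

SHARD_PATTERN = (
    "https://huggingface.co/datasets/laion/conceptual-captions-12m-webdataset"
    "/resolve/main/data/{00000..01099}.tar?download=true"
)


def expand_numeric_range(pattern):
    """Expand a single {start..end} numeric range, preserving zero padding."""
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    width = len(start)
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [
        f"{prefix}{str(i).zfill(width)}{suffix}"
        for i in range(int(start), int(end) + 1)
    ]


def samples_seen(train_batch_size, gradient_accumulation_steps,
                 max_train_steps, num_processes=1):
    """Samples consumed, assuming a per-process batch size (an assumption)."""
    return (train_batch_size * gradient_accumulation_steps
            * num_processes * max_train_steps)


shards = expand_numeric_range(SHARD_PATTERN)
print(len(shards))                # 1100 shard URLs
print(samples_seen(12, 1, 1000))  # 12000 samples on a single process
```

Under these assumptions, a single-process run with the flags shown sees 12,000 samples, far fewer than the `--max_train_samples=4000000` cap, so `--max_train_steps` is what ends the run.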

Inference

SPLAM is implemented to be compatible with the LCMScheduler interface, so it can be used in the same way as an LCM pipeline. Keep guidance_scale fixed at 1:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("alimama-creative/slam-sd1.5")

# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float16)

prompt = "a painting of a majestic kingdom with towering castles, lush gardens, ice and snow world"

num_inference_steps = 2

# guidance_scale stays at 1; .images is a list of PIL images.
images = pipe(
    prompt=prompt,
    num_inference_steps=num_inference_steps,
    guidance_scale=1,
    lcm_origin_steps=50,
    output_type="pil",
).images
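
The interplay between lcm_origin_steps and num_inference_steps can be illustrated with a small sketch. It assumes timesteps are chosen the way LCM-style schedulers commonly do it: build a coarse grid of lcm_origin_steps points over the training timesteps, then take an evenly spaced subset. This is an assumption for illustration, not taken from the SPLAM code; `lcm_style_timesteps` is a hypothetical name, and 1000 training timesteps is assumed for Stable Diffusion v1.5.

```python
def lcm_style_timesteps(num_inference_steps, origin_steps=50,
                        num_train_timesteps=1000):
    """Pick sampling timesteps from a coarse grid (illustrative assumption)."""
    if not 0 < num_inference_steps <= origin_steps:
        raise ValueError("num_inference_steps must be in 1..origin_steps")

    # Coarse grid over the training timesteps: 19, 39, ..., 999 by default.
    stride = num_train_timesteps // origin_steps
    grid = [i * stride - 1 for i in range(1, origin_steps + 1)]

    # Walk the grid from the noisiest timestep down, evenly spaced.
    descending = grid[::-1]
    indices = [(i * origin_steps) // num_inference_steps
               for i in range(num_inference_steps)]
    return [descending[j] for j in indices]


print(lcm_style_timesteps(2))  # [999, 499]
print(lcm_style_timesteps(4))  # [999, 759, 499, 259]
```

Under this assumption, two-step inference denoises once from the noisiest timestep and once from the middle of the schedule.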

BibTeX

@misc{xu2024acceleratingimagegenerationsubpath,
      title={Accelerating Image Generation with Sub-path Linear Approximation Model}, 
      author={Chen Xu and Tianhui Song and Weixin Feng and Xubin Li and Tiezheng Ge and Bo Zheng and Limin Wang},
      year={2024},
      eprint={2404.13903},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
}