support flux example #1073
@@ -0,0 +1,101 @@ | ||
# Run FLUX with nexfort backend (Beta Release) | ||
|
||
1. [Environment Setup](#environment-setup) | ||
- [Set Up OneDiff](#set-up-onediff) | ||
- [Set Up NexFort Backend](#set-up-nexfort-backend) | ||
- [Set Up Diffusers Library](#set-up-diffusers) | ||
- [Set Up FLUX](#set-up-flux) | ||
2. [Execution Instructions](#run) | ||
- [Run Without Compilation (Baseline)](#run-without-compilation-baseline) | ||
- [Run With Compilation](#run-with-compilation) | ||
3. [Performance Comparison](#performance-comparison) | ||
4. [Dynamic Shape for FLUX](#dynamic-shape-for-flux) | ||
|
||
## Environment setup | ||
### Set up onediff | ||
https://github.com/siliconflow/onediff?tab=readme-ov-file#installation | ||
|
||
### Set up nexfort backend | ||
https://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/backends/nexfort | ||
|
||
### Set up diffusers | ||
|
||
``` | ||
pip3 install --upgrade diffusers[torch] | ||
``` | ||
### Set up FLUX | ||
Model version for diffusers: https://huggingface.co/black-forest-labs/FLUX.1-schnell | ||
|
||
HF pipeline: https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/flux.md | ||
|
||
## Run | ||
|
||
### Run without compilation (Baseline) | ||
```shell | ||
python3 benchmarks/text_to_image.py \ | ||
--model black-forest-labs/FLUX.1-schnell \ | ||
--height 1024 --width 1024 \ | ||
--scheduler none \ | ||
--steps 4 \ | ||
--output-image ./flux-schnell.png \ | ||
--prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \ | ||
--compiler none \ | ||
--dtype bfloat16 \ | ||
--seed 1 \ | ||
--print-output | ||
``` | ||
|
||
### Run with compilation | ||
|
||
```shell | ||
python3 benchmarks/text_to_image.py \ | ||
--model black-forest-labs/FLUX.1-schnell \ | ||
--height 1024 --width 1024 \ | ||
--scheduler none \ | ||
--steps 4 \ | ||
--output-image ./flux-schnell-compile.png \ | ||
--prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \ | ||
--compiler nexfort \ | ||
--compiler-config '{"mode": "benchmark:cudagraphs:max-autotune:low-precision:cache-all", "memory_format": "channels_last", "options": {"cuda.fuse_timestep_embedding": false, "inductor.force_triton_sdpa": true}}' \ | ||
--dtype bfloat16 \ | ||
--seed 1 \ | ||
--print-output | ||
``` | ||
|
||
## Performance comparison | ||
|
||
Tested on an NVIDIA A800-SXM4-80GB with an image size of 1024*1024 and 4 inference steps: | ||
| Metric | A800-SXM4-80GB 1024*1024 | | ||
| ------------------------------------ | ------------------------ | | ||
| Data update date (yyyy-mm-dd) | 2024-08-07 | | ||
| PyTorch iteration speed | 2.18 it/s | | ||
| OneDiff iteration speed | 2.80 it/s (+28.4%) | | ||
| PyTorch E2E time | 2.06 s | | ||
| OneDiff E2E time | 1.53 s (-25.7%) | | ||
| PyTorch Max Mem Used | 35.79 GiB | | ||
| OneDiff Max Mem Used | 40.44 GiB | | ||
| PyTorch Warmup with Run time | 2.81 s | | ||
| OneDiff Warmup with Compilation time<sup>1</sup> | 253.01 s | | ||
| OneDiff Warmup with Cache time | 73.63 s | | ||
|
||
<sup>1</sup> OneDiff Warmup with Compilation time was measured on an Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60GHz. This figure is for reference only; compilation time varies considerably across CPUs. | ||
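
The percentages in the table can be reproduced from the raw figures. The sketch below is only a worked check: the helper name `relative_change` is made up for illustration, and the memory overhead is derived here rather than taken from the table.

```python
# Figures copied from the table above (A800-SXM4-80GB, 1024*1024, 4 steps).
pytorch_its, onediff_its = 2.18, 2.80    # iteration speed in it/s
pytorch_e2e, onediff_e2e = 2.06, 1.53    # end-to-end time in seconds
pytorch_mem, onediff_mem = 35.79, 40.44  # peak memory in GiB


def relative_change(baseline: float, value: float) -> float:
    """Return the change of `value` relative to `baseline`, in percent."""
    return (value / baseline - 1.0) * 100.0


speedup = relative_change(pytorch_its, onediff_its)
e2e_change = relative_change(pytorch_e2e, onediff_e2e)
# Not listed in the table; computed here from the two memory rows.
mem_change = relative_change(pytorch_mem, onediff_mem)

print(f"iteration speed: {speedup:+.1f}%")
print(f"E2E time:        {e2e_change:+.1f}%")
print(f"peak memory:     {mem_change:+.1f}%")
```

The first two values match the +28.4% and -25.7% shown in the table. The third shows that the compiled run uses roughly 13% more peak memory.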
|
||
|
||
## Dynamic shape for FLUX | ||
|
||
Run: | ||
|
||
```shell | ||
python3 benchmarks/text_to_image.py \ | ||
--model black-forest-labs/FLUX.1-schnell \ | ||
--height 1024 --width 1024 \ | ||
--scheduler none \ | ||
--steps 4 \ | ||
--output-image ./flux-schnell-compile.png \ | ||
--prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle," \ | ||
--compiler nexfort \ | ||
--compiler-config '{"mode": "benchmark:cudagraphs:max-autotune:low-precision:cache-all", "memory_format": "channels_last", "options": {"cuda.fuse_timestep_embedding": false, "inductor.force_triton_sdpa": true}, "dynamic": true}' \ | ||
--run_multiple_resolutions 1 \ | ||
--dtype bfloat16 \ | ||
--seed 1 | ||
``` |
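
The `--compiler-config` value has to be well-formed JSON, with each key joined to its value by a colon. A small sketch of one way to generate the string instead of typing it by hand follows. It assumes the benchmark script decodes the flag with a standard JSON parser, and `is_valid_json` is a hypothetical helper, not part of OneDiff.

```python
import json
import shlex

# The dynamic-shape settings written as a Python dict, so that
# json.dumps produces a syntactically valid string.
compiler_config = {
    "mode": "benchmark:cudagraphs:max-autotune:low-precision:cache-all",
    "memory_format": "channels_last",
    "options": {
        "cuda.fuse_timestep_embedding": False,
        "inductor.force_triton_sdpa": True,
    },
    "dynamic": True,
}


def is_valid_json(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


config_json = json.dumps(compiler_config)

# A comma between a key and its value is rejected by the parser.
broken = '{"options": {}, "dynamic", true}'

print(is_valid_json(config_json))
print(is_valid_json(broken))

# shlex.quote wraps the JSON in single quotes for a POSIX shell.
print("--compiler-config " + shlex.quote(config_json))
```

The last line prints an argument that can be pasted into the command, which avoids quoting and punctuation slips in the hand-written form.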
@@ -0,0 +1,101 @@ | ||||||
import argparse | ||||||
import time | ||||||
|
||||||
import cv2 | ||||||
import numpy as np | ||||||
import torch | ||||||
|
||||||
from diffusers import FluxPipeline | ||||||
from PIL import Image | ||||||
|
||||||
parser = argparse.ArgumentParser() | ||||||
parser.add_argument("--base", type=str, default="black-forest-labs/FLUX.1-schnell") | ||||||
parser.add_argument( | ||||||
"--prompt", | ||||||
type=str, | ||||||
default="chinese painting style women", | ||||||
) | ||||||
parser.add_argument("--height", type=int, default=512) | ||||||
parser.add_argument("--width", type=int, default=512) | ||||||
parser.add_argument("--n_steps", type=int, default=4) | ||||||
parser.add_argument("--saved_image", type=str, required=False, default="flux-out.png") | ||||||
parser.add_argument("--seed", type=int, default=1) | ||||||
parser.add_argument("--warmup", type=int, default=1) | ||||||
parser.add_argument("--run", type=int, default=3) | ||||||
parser.add_argument( | ||||||
"--compile", type=(lambda x: str(x).lower() in ["true", "1", "yes"]), default=True | ||||||
) | ||||||
parser.add_argument("--run-multiple-resolutions", action="store_true") | ||||||
args = parser.parse_args() | ||||||
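
The two boolean options above behave differently on the command line: `--compile` expects a value that is compared against a fixed list of spellings, while `--run-multiple-resolutions` is a plain switch. A minimal standalone sketch of that behavior follows; the named function `str2bool` stands in for the script's lambda and is not part of the PR.

```python
import argparse


def str2bool(value: str) -> bool:
    """Same rule as the lambda in the script: only these spellings are True."""
    return str(value).lower() in ["true", "1", "yes"]


parser = argparse.ArgumentParser()
parser.add_argument("--compile", type=str2bool, default=True)
parser.add_argument("--run-multiple-resolutions", action="store_true")

# No flags: compile stays at its default, the switch is off.
print(parser.parse_args([]))

# Any value outside the accepted list, such as "false" or "off", parses as False.
print(parser.parse_args(["--compile", "false"]))
print(parser.parse_args(["--compile", "off", "--run-multiple-resolutions"]))
```

Note that a misspelled value such as `--compile ture` silently disables compilation under this rule rather than raising an error.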
|
||||||
|
||||||
# load the FLUX pipeline | ||||||
pipe = FluxPipeline.from_pretrained(args.base, torch_dtype=torch.bfloat16) | ||||||
# pipe = FluxPipeline.from_pretrained(args.base, torch_dtype=torch.bfloat16, local_files_only=True, revision="93424e3a1530639fefdf08d2a7a954312e5cb254") | ||||||
pipe.to("cuda") | ||||||
|
||||||
if args.compile: | ||||||
from onediffx import compile_pipe | ||||||
|
||||||
pipe = compile_pipe( | ||||||
pipe, | ||||||
backend="nexfort", | ||||||
options={ | ||||||
"options": { | ||||||
"cuda.fuse_timestep_embedding": False, | ||||||
"inductor.force_triton_sdpa": True, | ||||||
} | ||||||
}, | ||||||
) | ||||||
|
||||||
|
||||||
# generate image | ||||||
generator = torch.manual_seed(args.seed) | ||||||
|
||||||
print("Warmup") | ||||||
for i in range(args.warmup): | ||||||
image = pipe( | ||||||
args.prompt, | ||||||
height=args.height, | ||||||
width=args.width, | ||||||
output_type="pil", | ||||||
num_inference_steps=args.n_steps, # use a larger number if you are using [dev] | ||||||
generator=torch.Generator("cpu").manual_seed(args.seed), | ||||||
).images[0] | ||||||
|
||||||
|
||||||
print("Run") | ||||||
for i in range(args.run): | ||||||
begin = time.time() | ||||||
image = pipe( | ||||||
args.prompt, | ||||||
height=args.height, | ||||||
width=args.width, | ||||||
output_type="pil", | ||||||
num_inference_steps=args.n_steps, # use a larger number if you are using [dev] | ||||||
generator=torch.Generator("cpu").manual_seed(args.seed), | ||||||
).images[0] | ||||||
end = time.time() | ||||||
print(f"Inference time: {end - begin:.3f}s") | ||||||
|
||||||
image.save(f"{i=}th_{args.saved_image}.png") | ||||||
Fix incorrect string interpolation in the saved filename (the `{i=}` in the f-string):

    - image.save(f"{i=}th_{args.saved_image}.png")
    + image.save(f"{i}th_{args.saved_image}.png")

Reply: Please remove this equals sign; it is awkward to have in a filename.

Reply: @jackalcooper, OK, removing the equals sign is the right choice. It makes the filename cleaner. Thanks for confirming!
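
To see what the thread above is about: in an f-string, the `=` specifier prints the expression text as well as its value. The sketch below uses the script's default `--saved_image` value. The `Path`-based variant at the end is only a suggestion of mine, not something proposed in the PR.

```python
from pathlib import Path

i = 2
saved_image = "flux-out.png"  # default of --saved_image in the script

with_equals = f"{i=}th_{saved_image}.png"
without_equals = f"{i}th_{saved_image}.png"

print(with_equals)     # i=2th_flux-out.png.png
print(without_equals)  # 2th_flux-out.png.png

# Hypothetical alternative: split off the extension so that it is not
# appended twice when the argument already ends in ".png".
path = Path(saved_image)
clean = f"{i}th_{path.stem}{path.suffix or '.png'}"
print(clean)           # 2th_flux-out.png
```

Both original forms also end in `.png.png` when the default argument is used, because the default already carries the extension.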
||||||
|
||||||
|
||||||
if args.run_multiple_resolutions: | ||||||
print("Test run with multiple resolutions...") | ||||||
sizes = [1024, 512, 768, 256] | ||||||
for h in sizes: | ||||||
for w in sizes: | ||||||
print(f"Running at resolution: {h}x{w}") | ||||||
start_time = time.time() | ||||||
image = pipe( | ||||||
args.prompt, | ||||||
height=h, | ||||||
width=w, | ||||||
output_type="pil", | ||||||
num_inference_steps=args.n_steps, # use a larger number if you are using [dev] | ||||||
generator=torch.Generator("cpu").manual_seed(args.seed), | ||||||
).images[0] | ||||||
end_time = time.time() | ||||||
print(f"Inference time: {end_time - start_time:.2f} seconds") | ||||||
image.save(f"{i=}th_{args.saved_image}_{h}x{w}.png") | ||||||
Fix incorrect string interpolation in the saved filename (the `{i=}` in the f-string):

    - image.save(f"{i=}th_{args.saved_image}_{h}x{w}.png")
    + image.save(f"{i}th_{args.saved_image}_{h}x{w}.png")

Reply: Please remove this equals sign; it is awkward to have in a filename.

Reply: @jackalcooper, as shown in the suggestion, I have removed the equals sign from the filename. Please check whether it meets your requirements!
|
Remove unused imports.

The imports `cv2`, `numpy`, and `PIL.Image` are not used in the script and should be removed to clean up the code.

Also applies to: 5-5, 9-9
Tools
Ruff