
Optimization issues causing slow speed #32

Open
Loko415 opened this issue Mar 18, 2024 · 3 comments

Labels
enhancement New feature or request

Comments


Loko415 commented Mar 18, 2024

Why are you using bf16? From my research it causes slower speed and is usually used for training instead. Also, why are you offloading to the CPU? And why are you freeing CUDA memory each time? That just makes the model reload each time. I may be completely wrong, I don't know much about AI, but I find it strange. Can you explain this to me? How much VRAM does this model use, 12 GB? Maybe it's using too much VRAM. I only have a 3060 with 12 GB, so maybe it's spilling into system RAM, and that's why it's slower than SDXL.

IndigoDosSantos (Owner) commented

Why bfloat16 (bf16)? I'm using bfloat16 because of its smaller memory requirement. Although bfloat16 offers less precision than float32 or half-precision float (fp16), it keeps the wide dynamic range needed to represent very large or very small numbers. In diffusion models like Stable Diffusion and Stable Cascade, this precision trade-off is generally acceptable.
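For reference, a minimal sketch of what loading in bfloat16 looks like with the diffusers Stable Cascade pipelines (the model IDs here are the public Stability ones and may not be exactly what this repo loads):

```python
import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

# Loading both stages in bfloat16 roughly halves the weight memory
# compared to float32 while keeping float32's dynamic range.
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
)
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.bfloat16
)
```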

Why CPU offloading? Offloading models to the CPU helps preserve valuable VRAM, especially for users lacking top-tier GPUs such as the 4090.
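As a sketch, diffusers exposes this as a one-liner per pipeline (it requires the accelerate package to be installed):

```python
# Weights stay in system RAM; each sub-model is moved to the GPU only
# while it is actually running, then moved back off afterwards.
prior.enable_model_cpu_offload()
decoder.enable_model_cpu_offload()
```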

Why clear CUDA memory? Clearing CUDA memory after each process (prior and decoder) helps reduce VRAM consumption. Since Stable Cascade operates in a sequential manner, it's unnecessary to keep both the prior and decoder models in VRAM at the same time. This strategy lessens memory demands, particularly on systems with limited VRAM capacity.
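A rough sketch of that pattern, continuing from the loading example above (variable names are illustrative, not necessarily the ones used in this repo):

```python
import gc
import torch

prompt = "a photo of a cat"

# Stage 1: run the prior, then drop it before running the decoder.
prior_output = prior(prompt=prompt, num_inference_steps=20)
del prior
gc.collect()
torch.cuda.empty_cache()  # hand the freed blocks back to the driver

# Stage 2: the decoder now has (almost) the full VRAM budget to itself.
image = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
).images[0]
```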

VRAM Usage: My configuration necessitates approximately 11GB of VRAM with models offloaded.
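If you want to check what it needs on your own card, torch can report the peak allocation for a run:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run a full prior + decoder generation here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM allocated: {peak_gb:.1f} GB")
```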

Comparison to SDXL: Indeed, Stable Cascade doesn't match SDXL in speed. Due to its more compact size, SDXL can typically remain fully loaded in VRAM, avoiding the loading delays that come with offloading.

Btw, I found a few lines in the loading process that aren't needed and even slow model loading down further. I deleted them, and this is the (admittedly small, but we take what we get, don't we? 😅) impact:

(Image: diagram_issue-32 – loading-time comparison before/after removing those lines)

IndigoDosSantos added a commit that referenced this issue Mar 18, 2024
Found a few lines in the loading process that are not needed, even slow the model loading process further down. See [issue #32](#32) for reference.

Loko415 commented Mar 19, 2024

Very nice ☺️


Loko415 commented Mar 19, 2024

There should be presets like high VRAM / low VRAM etc., in case someone's GPU can take it 👍
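A hypothetical sketch of what such presets could look like (names and the 16 GB threshold are made up for illustration, not part of this repo):

```python
import torch

# Hypothetical VRAM presets -- names and thresholds are illustrative only.
PRESETS = {
    "low_vram":  {"cpu_offload": True,  "clear_cache_between_stages": True},
    "high_vram": {"cpu_offload": False, "clear_cache_between_stages": False},
}

def pick_preset() -> dict:
    """Choose a preset based on total VRAM of the first CUDA device."""
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return PRESETS["high_vram"] if total_gb >= 16 else PRESETS["low_vram"]
```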

IndigoDosSantos added the enhancement (New feature or request) label Mar 19, 2024