Add docker support #1239
base: master
Conversation
thanks for picking this up!
don't modify … do we need default …
@vladmandic I have now removed … Regarding …
just an idea - having data inside container is really against the concept of containers. for example:
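For illustration, one way to keep the data on the host via a bind mount so the container itself stays disposable (a minimal sketch; service name and paths are hypothetical):

```yaml
services:
  webui:                      # hypothetical service name
    build: .
    volumes:
      - ./data:/webui/data    # mutable state lives on the host, not in the container
```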
Good idea, I have made it mandatory now
looks good to me, but please tell me you've actually tested it? :)
I built it again from scratch and noticed an error ^^
Can you guys talk about the security benefits/pros and cons of using this?
talk about benefits of using docker in general? not really, that's really outside the scope of this pr, this is to provide a simple-to-use template.
@vladmandic I think it's ready to be merged
I merged …
I would also suggest making the first argument to the entrypoint …
```yaml
environment:
  DATA_DIR: "./data"
volumes:
  - ./data:/webui/data
```
Solves the reinstall problem noted here.
Suggested change:

```diff
    - ./data:/webui/data
+   - ./venv/lib:/webui/venv/lib
+   - ./repositories:/webui/repositories
```
Even with the above, I still see the following on re-create, but I'm not sure where they are coming from:

```
Downloading (…)olve/main/vocab.json: 100% 961k/961k [00:00<00:00, 34.7MB/s]
Downloading (…)olve/main/merges.txt: 100% 525k/525k [00:00<00:00, 43.1MB/s]
Downloading (…)cial_tokens_map.json: 100% 389/389 [00:00<00:00, 2.82MB/s]
Downloading (…)okenizer_config.json: 100% 905/905 [00:00<00:00, 3.26MB/s]
Downloading (…)lve/main/config.json: 100% 4.52k/4.52k [00:00<00:00, 13.2MB/s]
```
It would be cleaner to declare those folders as `VOLUME`s in the Dockerfile. Then you could even leave out the bind mounts in the compose file, so they will be created as anonymous volumes during launch.
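Something along these lines in the Dockerfile (a sketch; the paths are taken from the compose suggestion above):

```dockerfile
# Paths declared as volumes get anonymous volumes at launch
# whenever no bind mount is supplied for them
VOLUME ["/webui/data", "/webui/venv/lib", "/webui/repositories"]
```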
`/webui/venv/lib` contains files that were installed when building the image, so I think it would require additional changes to mount it.
Co-authored-by: Jakub Sztandera <[email protected]>
Very unfortunate that the …
@Kubuxu thanks for the suggestions, I've fixed the env vars.
Could you clarify what you meant by this? Essentially running …
Correction, not … But in essence, having the default run command in `CMD`. So for example:

```dockerfile
ENTRYPOINT ["/bin/bash", "-c", "${INSTALLDIR}/entrypoint.sh \"$0\" \"$@\""] # same as today
CMD ["webui"]
```

Then the entrypoint script, for example (this snippet is the classic pattern from the official postgres image):

```bash
#!/usr/bin/env bash
set -e

if [ "$1" = 'postgres' ]; then
    chown -R postgres "$PGDATA"

    if [ -z "$(ls -A "$PGDATA")" ]; then
        gosu postgres initdb
    fi

    shift
    exec gosu postgres "$@"
fi

exec "$@"
```

This will allow the user to both pass params to launch.py like this: …
Co-authored-by: Jakub Sztandera <[email protected]>
@Kubuxu I think what you want to do here is already possible using the …
Yeah, this is another way of doing this. We can go down the …
Dockerfile (Outdated)

```diff
@@ -0,0 +1,51 @@
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
```
I propose allowing different CUDA versions to be used, with the following instead (adapted from llama.cpp):

```dockerfile
ARG UBUNTU_VERSION=22.04
# This needs to generally match the container host's environment.
ARG CUDA_VERSION=11.8.0
# Target the CUDA runtime image
ARG BASE_CUDA_CONTAINER=nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_CONTAINER}
```
I have implemented this now, although the user needs to be careful to specify compatible CUDA and Ubuntu versions.
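For example, building against a different CUDA/Ubuntu pair would look like this (hypothetical image tag):

```bash
docker build \
  --build-arg CUDA_VERSION=12.1.0 \
  --build-arg UBUNTU_VERSION=22.04 \
  -t sdnext:cuda12 .
```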
```bash
fi

# Ensure that potentially bind-mounted directories are owned by the user that runs the service
chown -R $RUN_UID:$RUN_UID $DATA_DIR
```
I propose to add these lines as well. Otherwise image generation works only if the option "Always save all generated images" is enabled.

```bash
# Create directory for temporary files and assign it to the user that runs the service
mkdir /tmp/gradio
chown -R $RUN_UID:$RUN_UID /tmp/gradio
```
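An alternative, if I'm not mistaken about gradio honouring the `GRADIO_TEMP_DIR` environment variable, would be to point the temp path at a directory the service user already owns, e.g. in the compose file:

```yaml
environment:
  GRADIO_TEMP_DIR: /webui/data/tmp   # hypothetical location under the existing data mount
```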
```dockerfile
# Install automatic1111 dependencies (installer.py)
RUN . $INSTALLDIR/venv/bin/activate && \
    python installer.py && \
```
I don't think this does anything; installer.py looks like it doesn't run any code if one tries to run it directly.

I think `python launch.py --test` is what you would want, except it installs the CPU-only version of Torch, because `docker compose build` doesn't support runtimes if I understood it correctly. There is a `--use-cuda` flag, but it doesn't actually force the use of CUDA. Perhaps installer.py could be modified so that it does?
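In other words, the build step would become something like this (a sketch; it assumes `--use-cuda` is changed to actually force CUDA as suggested):

```dockerfile
# Install torch and the remaining dependencies at image build time
RUN . $INSTALLDIR/venv/bin/activate && \
    python launch.py --test --use-cuda
```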
there are plenty of flags that can be used as-is, there is even `--skip-torch`. but installing packages AND skipping torch is not viable since plenty of packages down the list require torch, so they would pull it in.
I think the goal here is to install torch and the other dependencies while building the docker image, so skipping it wouldn't be of much use :) Bypassing the nvidia-smi check when `--use-cuda` is present makes this work, for me at least.

The arg description says "force use nVidia CUDA backend", so IMO it would be OK to skip the check and crash if it doesn't work, but it's of course up to you to decide how you want your app to behave.
Don't assume the docker image is for Nvidia only, but it's OK to require one of the `--use-xxx` params to be provided.
I had issues building this from within Ubuntu (20.04). I'm going to document my experience so that you can see the troubles I had along the way, to hopefully help me fix them, but ultimately fix it for others who might use it once this has been accepted as a merge. Please don't take this as negative criticism at all, cos I really do appreciate all the hard work you guys are putting into this! I hope my experiences can help to get this accepted. I just wish I knew more to help move things along.

I kept getting the error: … which I fixed by changing the … You can also confirm it using the command … After I got past that error by removing the name variable, this was the error I got: …

For some reason … And I changed it to: … Although I changed it to 20.04 and 12.1.0 (which I confirmed by going to https://hub.docker.com/r/nvidia/cuda/tags?page=1&name=12.1.0-cudnn8-runtime-ubuntu), I'm pretty sure changing it to … would work fine, since that does exist too: https://hub.docker.com/r/nvidia/cuda/tags?page=1&name=11.8.0-cudnn8-runtime-ubuntu

The main issue seems to be with … The next issue is the tzdata prompt; it would be good to set a default during installation with an … But when you type in 8 and hit enter, nothing happens. I had to stop the instance in Portainer and recreate it, but with the … But once those were in and it finished setting up, it just stopped running. Trying to re-run it, it obviously continues where it left off, because all the packages are installed and tzdata is already set up, and then stops straight away. Trying to diagnose what the last message was, docker says there are no logs it can access for it. Re-running it in the terminal again to make sure I didn't miss anything, I get: …

Oh! So it must be the GPU permission, but still: …

Slightly more information: tried it with the …

Hmmm... tried the nvidia test using the same base CUDA image I used for the installation of …

So everything is set up fine for docker, but the image still isn't working. Not sure where I am going wrong, but I feel like I'm close! Note that I pulled this from the master branch on … Edit: I realised after submitting that I hadn't tried the proper image for the nvidia test of …
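For reference, the usual way to avoid the interactive tzdata prompt in Debian/Ubuntu images is to preseed it in the Dockerfile (a generic sketch, not something this PR currently does):

```dockerfile
# Make apt non-interactive and give tzdata a default timezone
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Etc/UTC
RUN apt-get update && apt-get install -y --no-install-recommends tzdata
```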
Didn't want to give up, so tried one more time - scrapped and purged everything, reset the repo back to how it was, and did … This time I got a lot further! It installed correctly, ran through everything. But this time in the terminal it looked like it had stopped doing anything after … Started up another terminal, attached to the running image, and saw a different message saying … Then afterwards I got: … followed by a long traceback log, but it looked like it was still going and did …

And that's when it exited. Re-running … So, still not working, but at least it was a derp moment on my part for putting in a lower ubuntu version and a higher cuda version. There does appear to be an issue getting some of the needed dependencies, such as the extensions (although not technically required to get it working), and loading up the …
I think `docker-compose` has been deprecated in favour of `docker compose`. IIRC that ought to solve the top-level `name` tag error.
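For clarity, the difference is only in the invocation (v2 ships as a docker CLI plugin, v1 was a standalone binary):

```bash
docker compose up -d     # v2 plugin, current
docker-compose up -d     # v1 standalone binary, deprecated
```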
@JohanAR Sure, you're not wrong that `docker compose` is the preferred method and `docker-compose` is deprecated and is a stub for legacy reasons to `docker compose` in the latest versions of docker. However, the NVIDIA CUDA Toolkit is only supported on Docker 20.10.x (ref: nvidia install guide), which meant I had to downgrade to 20.10 a long while back to get anything CUDA working without some hacky workaround. And the …

Which means most people should be running on docker 20.10.x if they want to have the CUDA toolkit on Linux properly, or even in the cloud for that matter. And I believe those on Windows will likely experience similar issues, since that recommends going through the WSL2 route. There are workarounds to this obviously on the latest version of docker, which as far as I understand crashes on the latest-latest (which means you have to always be running a slightly older version of 23.x.x or 24.x.x), but that would mean this repo would need to support said workarounds, or other people will post issue after issue that it isn't working for them. I'm going through the process of upgrading back to the latest version - cos I would love to be proved wrong - and will report back my findings, but I suspect I will end up having to figure out a bunch of workarounds to get it to work properly.
I used it with Docker 23 as well as now 24, with Ubuntu 22.04 and now 23.04, using the apt package source … It worked flawlessly out of the box and I did not experience problems. IMHO there is no reason to keep using an old Docker version.
Running into the same issue as @hazrpg, it fails when "Initializing middleware". I'm not sure what the Python code is doing, but it seems to be missing some configuration attributes, maybe? Configuration: …
Also, is it possible to pass a flag to avoid the prompt "Download the default model? (y/N)"? The reason I'm asking is that it's quite uncommon to have to attach to the running container to answer setup parameters. It works, but it's not usual with Docker builds.
Firstly, thanks to everyone for the great work put into vladmandic/automatic! I'm recording my experiences trying to use the Dockerfile with vast.ai in case it is useful for others. My apologies if the approach I took was not best practice or just plain wrong - I'm fairly new to docker, so please take the following as the experiences of a naive end-user trying to get this to work on a GPU cloud provider. My use case is that I have a MacBook Pro, but I would like to build and use a docker image of vladmandic/automatic that can be used on a GPU cloud provider like vast.ai or runpod.io.

My config: …

Steps: …

Results: …
I'm happy to continue testing on vast.ai if someone can provide a linux image or instructions for how to successfully build a linux image from this repo on a MBP.
I did eventually try the upstream docker apt packages, instead of the Canonical/Debian ones. Looks like although the Nvidia toolkit says it doesn't support newer versions, the lovely docker peeps must have gotten around that and made sure it still works. So I stand corrected, thank you for pointing it out. However, I'm still stuck at the middleware stage sadly, even with the newer docker and using …
Why did the PR stall? Was there a technical difficulty?
moving status to draft until comments are incorporated and a maintainer is found.
What is the status of this PR? Using SD.Next with a docker install would be a huge win IMHO.
there are plenty of users using sdnext inside a docker container, but having an official dockerfile is tricky, as everyone has their own idea of what the docker config should be like, and it also varies by platform.
On that note, for anyone looking for a "one-click" docker deploy -- I have contributed to and am using grokuku/stable-diffusion on a linux host with an nvidia gpu. It "just works" and stays up to date with the master branch automatically. Read the readme ofc, but an example run command: …
Thanks for the link, will try soon.
I currently have a WIP branch right here for an NVIDIA CUDA-based docker image. It also installs onediff and tensorflow, and I'm working on adding TensorRT support. It boots and works perfectly, but I'm still tweaking it. Any contributions are welcome from anyone who wants to add AMD/Intel/whatever support; I only have NVIDIA hardware, so I can't test on other platforms. Broadly, the …
I have recently created an SD.Next docker image that supports both CUDA (from 11.8 to 12.5) and ROCm (from 5.5 to 6.1). I have tested the image multiple times and it works great.
Description
Add support for running the UI using docker.
Since the previous PRs (#403, #844) stalled, I merged their approaches and fixed remaining issues.
Notes
To improve security, the process is run as a non-root user inside the container. Since bind-mounts are owned by `root` inside the container, the `entrypoint.sh` script changes ownership to the non-root user to make them writable. A minimal sketch of that mechanism (assuming the `RUN_UID` and `DATA_DIR` names used in the review comments above; the actual script in this PR may differ):
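```bash
#!/usr/bin/env bash
set -e

# Bind mounts appear root-owned inside the container, so hand them over
# to the unprivileged service user before starting the app
chown -R "$RUN_UID:$RUN_UID" "$DATA_DIR"

# Drop privileges and run the requested command
exec gosu "$RUN_UID" "$@"
```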
Environment and Testing