
Add Docker files #117

Open
wants to merge 4 commits into main
Conversation

@bytebone bytebone commented Apr 26, 2024

This PR adds the files required to quickly set up the training UI and Tensorboard with Docker. It is based on the Docker template for Python apps, which is where the comments come from (they can be removed if desired).

To run it, you need the fully cloned git repo. Just like in the InvokeAI main repo, all Dockerfiles and bind mounts (except the existing sample_data folder) live in a docker subfolder. I've kept the approach similar to the main repo's Dockerfiles.

GPU integration is done the same way as in InvokeAI. The training UI and Tensorboard are exposed on neighbouring ports for convenience.

If you're ready to accept the PR, I can also add usage instructions to the README / docs if desired.

@RyanJDick
Collaborator

Thanks for this PR. Sorry that I've been slow in getting around to it. I do think it is worthwhile to get this in. I'll test it out and do a review in the next day or two.

@RyanJDick
Collaborator

I tested this out today. There are a few things that need to be addressed before we can merge.

File Permissions

I ran into some file permission errors when the scripts try to write to the output directory (I'm running on an Ubuntu host). To fix this, I think we need to do two things:

  1. Let's commit empty data/ and output/ directories to the repo. When these are created automatically by docker, they are owned by root rather than the current user. I'm thinking something like this:
.
├── compose.yaml
├── data
│   └── DOCKER_DATA_MOUNT_DIR.txt
├── Dockerfile
└── output
    └── DOCKER_OUTPUT_MOUNT_DIR.txt
  2. We need to use the host UID/GID in the container so that the user in the container can write to output/ with the correct permissions. There are a few ways to accomplish this. One option is to add `user: ${MY_UID-10001}:${MY_GID-10001}` to the compose.yaml and then run it with `MY_UID="$(id -u)" MY_GID="$(id -g)" docker compose up`. We also need to make sure this works on Windows - instructions might have to be slightly different on Windows.
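A minimal sketch of what that second point could look like in compose.yaml (the service name `train-ui`, the build context, and the container-side paths are placeholders, not taken from this PR):

```yaml
# Hypothetical compose.yaml excerpt; service name and paths are placeholders.
services:
  train-ui:
    build: .
    # Falls back to 10001:10001 when MY_UID / MY_GID are not set on the host.
    user: "${MY_UID-10001}:${MY_GID-10001}"
    volumes:
      - ./output:/app/output
```

With this in place, running `MY_UID="$(id -u)" MY_GID="$(id -g)" docker compose up` makes the container user match the host user, so files written to output/ get the right ownership.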

Hugging Face Cache

The hugging face cache should ideally be mounted into the container so that huggingface models can be loaded automatically without re-downloading.
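One way the mount could look, assuming the default host cache location of `~/.cache/huggingface` and a container user whose home is `/home/user` (both assumptions to verify against the actual image):

```yaml
# Hypothetical compose.yaml excerpt; the container-side home path is a placeholder.
services:
  train-ui:
    volumes:
      # Reuse the host's Hugging Face cache so models are not re-downloaded.
      - ${HOME}/.cache/huggingface:/home/user/.cache/huggingface
```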

Documentation

The docs for this feature should include:

  • Prerequisites (docker, nvidia-container-toolkit)
  • Command to launch
  • Explanation of bind mount directory structure
  • Explanation of why there are two containers, and which ports are exposed
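To make the "command to launch" item concrete, a sketch of what the documented commands might look like, assuming the compose file sits in the docker subfolder described in this PR:

```shell
# Prerequisites: docker and, for GPU support, nvidia-container-toolkit.
cd docker
# Pass the host UID/GID so files written to output/ are owned by the invoking user.
MY_UID="$(id -u)" MY_GID="$(id -g)" docker compose up
```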

@bytebone
Author

bytebone commented May 2, 2024

Thanks for the detailed feedback. I'll implement it as soon as possible; it might take me a few days due to other plans.

The points on file permissions make sense, though I'm considering having the user hardcode their UID/GID in the compose file, to reduce the typing required at startup and prevent issues when they forget to set the variables. Alternatively, we could use a startup script that sets the variables on its own, but that feels like overengineering to me.
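If hardcoding wins out, one low-friction variant (an assumption, not part of this PR) is a `.env` file next to compose.yaml, which docker compose reads automatically, so each user edits one small file instead of compose.yaml itself:

```
# .env -- each user sets these once, e.g. to the output of `id -u` and `id -g`.
MY_UID=1000
MY_GID=1000
```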

Can you recall off the top of your head where the HF cache lives? I thought it was placed in the user's home directory, which is already mounted to the ./data folder.

Since I'm building this on a Windows machine, everything works out of the box there. But I am dependent on your testing on Linux, including GPU passthrough.

I'll deal with the docs once we've got everything else locked in.
