This is original for Llama2 inference and langchain study. But now the container will be used as a Jupyterlab platform for general LLM study and testing
You can download models from https://huggingface.co/models before start. or inference models with Ollama
Old Description This is a simple tutorial depends on Llama2 and LangChain for begginers like me.
There are too many models/components/libraries to inference a LLM if you tried text-generation-webui before.
I have too many questions about how it works and why I should use this?
For example:
hf? GGML? GGUF? GPTQ? Transformers, CTransformers, exllama, llama.cpp, gptq.....
Why there is no [INT] or <<SYS>> in text-generation-webui?
Can we use "User: " and "AI: " to replace Llama2 "[INST]"? If yes, what's the difference
What is the instruction_template, chat-instruct_command?
oobabooga/text-generation-webui#3644
...... .......There are too many mystories in LLM, especially when the are wrapped in text-generation-ui or LangChain.
As a result, I tried to start from scratch, step by step.
This project is merely a simple record for studying and practicing, and still work in progress.
Note: Some descriptions and comments are still written in Chinese
-
Windows WSL2 + Ubuntu
OR Ubuntu -
NVidia GPU w/ driver installed
Ubuntu Docker
Please refers to: Setup & Installation
docker setup
cd docker && docker compose up --build
(This may take half an hour to download and build)
before open jupyterlab browser (127.0.0.1:8888).
Note:
- if you want pip install anything in console, make sure to source
. /app/venv/bin/activate
first - if you want mount volume,
docker [compsose] run -v /host_path:/mount_path llama2_inference_for_beginner
Optional: Load/Save
You can save/load image like this:
docker save -o docker-llama2_inference_for_beginner-1.tar docker-llama2_inference_for_beginner-1
AND
docker load -i docker-llama2_inference_for_beginner-1.tar
If you are WSL, refers to WSLSetup.md
If you are Linux, install NV driver and CUDA:
https://www.nvidia.com/download/index.aspx
https://developer.nvidia.com/cuda-downloads
auto install docker
sudo ./docker/install_docker.sh
Remember to re-login
check docker cuda
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
OR
---------- Deprecated, for inference only -------------
https://docs.docker.com/engine/install/ubuntu/
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu "\
$(. /etc/os-release && echo $VERSION_CODENAME)" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
sudo systemctl restart docker
Note : Prevent dcoker from subnet ip conflict with 172.1x.xxx.xxx
https://serverfault.com/questions/916941/configuring-docker-to-not-use-the-172-17-0-0-range
Note1: Sometimes the docker will re-start fail, you can try to do it again later. Note2: https://stackoverflow.com/questions/43988006/docker-create-two-bridges-that-corrupts-my-internet-access
the default empty bip means it will just grab an allocation from the pool, like any other network/container will.
$ sudo vi /etc/docker/daemon.json
{
"default-address-pools" : [
{
"base" : "172.118.0.0/16",
"size" : 24
}
],
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}
sudo systemctl restart docker