Llama2_Inference_For_Beginner

Description

This project was originally created for studying Llama2 inference and LangChain. The container now serves as a JupyterLab platform for general LLM study and testing.

Before you start, you can download models from https://huggingface.co/models, or run inference on models with Ollama.

Old Description

This is a simple tutorial based on Llama2 and LangChain, for beginners like me.

If you have tried text-generation-webui before, you know how many models, components, and libraries are involved in running inference on an LLM.

I had many questions about how it all works and why I should use each piece.

For example:

hf? GGML? GGUF? GPTQ? Transformers, CTransformers, exllama, llama.cpp, gptq.....
Why is there no [INST] or <<SYS>> in text-generation-webui?
Can we use "User: " and "AI: " to replace Llama2 "[INST]"? If yes, what's the difference?
What are instruction_template and chat-instruct_command?
oobabooga/text-generation-webui#3644
...... .......
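
The [INST] and <<SYS>> markers asked about above belong to the prompt format that the Llama 2 chat models were fine-tuned on. As a rough illustration, the sketch below builds a single-turn prompt in that format next to a plain "User: / AI:" prompt for comparison. The function names are hypothetical, and the sketch assumes the tokenizer adds the leading <s> token on its own.

```python
def build_llama2_prompt(user_message: str, system_prompt: str = "") -> str:
    """Wrap one user turn in the Llama 2 chat markers (single turn only)."""
    if system_prompt:
        # The system prompt sits inside the first [INST] block, between <<SYS>> tags.
        user_message = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message}"
    # Assumption: the BOS token <s> is added by the tokenizer, so it is omitted here.
    return f"[INST] {user_message} [/INST]"


def build_plain_prompt(user_message: str) -> str:
    """A generic 'User: / AI:' prompt, shown only for comparison."""
    return f"User: {user_message}\nAI:"


if __name__ == "__main__":
    print(build_llama2_prompt("What is GGUF?", "You are a helpful assistant."))
    print("---")
    print(build_plain_prompt("What is GGUF?"))
```

Both strings can be sent to the same model. A chat-tuned Llama 2 model saw the first layout during fine-tuning, so comparing the two outputs is one way to explore the questions above.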

There are too many mysteries in LLMs, especially when they are wrapped in text-generation-webui or LangChain.

As a result, I tried to start from scratch, step by step.

This project is merely a simple record of my study and practice, and it is still a work in progress.

Note: Some descriptions and comments are still written in Chinese

Prerequisites:

Windows WSL2 + Ubuntu
OR Ubuntu
NVIDIA GPU with driver installed

Quick Start

Ubuntu Docker

Please refer to: Setup & Installation

Docker setup:
cd docker && docker compose up --build
The download and build may take half an hour. When it finishes, open JupyterLab in a browser at 127.0.0.1:8888.

Note:

If you want to pip install anything in the console, make sure to run source /app/venv/bin/activate first.
If you want to mount a volume: docker [compose] run -v /host_path:/mount_path llama2_inference_for_beginner
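
To confirm that a notebook or console is actually using the virtual environment before installing packages, a small check like the following may help. It is a minimal sketch that relies only on the standard library. The helper name is hypothetical, and the /app/venv location is taken from the note above.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a venv:", in_virtualenv())
```

If the interpreter path does not sit under /app/venv, packages installed with pip may end up in a different environment than the one JupyterLab uses.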

Optional: Load/Save
You can save and load the image like this:
docker save -o docker-llama2_inference_for_beginner-1.tar docker-llama2_inference_for_beginner-1
AND
docker load -i docker-llama2_inference_for_beginner-1.tar

Setup & Installation

Setup cuda

If you are on WSL, refer to WSLSetup.md

If you are on Linux, install the NVIDIA driver and CUDA:
https://www.nvidia.com/download/index.aspx
https://developer.nvidia.com/cuda-downloads

Setup Docker

Install Docker automatically:
sudo ./docker/install_docker.sh

Remember to log out and log back in afterwards.

Check that Docker can access CUDA:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
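
The same kind of check can be run from Python, for example in a JupyterLab cell inside the container. The sketch below is an illustration, not part of the project: list_gpus is a hypothetical helper that calls nvidia-smi and returns an empty list when the tool is missing or fails.

```python
import shutil
import subprocess


def list_gpus() -> list[str]:
    """Return one line per GPU reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # the driver tools are not on PATH in this environment
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is handled below instead of raising
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    print(gpus if gpus else "No GPU visible to this environment")
```

An empty result inside the container, while the docker run check above succeeds, would suggest that the container was started without GPU access.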

OR

---------- Deprecated, for reference only -------------
https://docs.docker.com/engine/install/ubuntu/

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu "\
  $(. /etc/os-release && echo $VERSION_CODENAME)" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
sudo systemctl restart docker

(Optional) Docker IP conflict

Note: Prevent Docker's subnets from conflicting with IP addresses in the 172.1x.xxx.xxx range.

https://serverfault.com/questions/916941/configuring-docker-to-not-use-the-172-17-0-0-range

Note 1: Docker sometimes fails to restart. If that happens, try again later.
Note 2: https://stackoverflow.com/questions/43988006/docker-create-two-bridges-that-corrupts-my-internet-access

   the default empty bip means it will just grab an allocation from the pool, like any other network/container will.

$ sudo vi /etc/docker/daemon.json
{
  "default-address-pools" : [
    {
      "base" : "172.118.0.0/16",
      "size" : 24
    }
  ],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

sudo systemctl restart docker
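
Before restarting Docker with a new pool, it can help to see which subnets the setting produces and whether they overlap a network you already use. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the embedded JSON repeats just the pool section of the example above.

```python
import ipaddress
import json

# Same structure as the daemon.json example; only the pool section is used.
DAEMON_JSON = """
{
  "default-address-pools": [
    {"base": "172.118.0.0/16", "size": 24}
  ]
}
"""


def summarize_pools(config_text: str) -> list[dict]:
    """Describe each address pool: how many subnets it yields and their range."""
    config = json.loads(config_text)
    summaries = []
    for pool in config.get("default-address-pools", []):
        base = ipaddress.ip_network(pool["base"])
        # "size" is the prefix length of each subnet carved from the base.
        subnets = list(base.subnets(new_prefix=pool["size"]))
        summaries.append({
            "base": str(base),
            "subnet_count": len(subnets),
            "first": str(subnets[0]),
            "last": str(subnets[-1]),
            "is_private": base.is_private,
        })
    return summaries


def conflicts_with(config_text: str, other_network: str) -> bool:
    """True if any pool overlaps other_network (for example, a LAN or VPN range)."""
    other = ipaddress.ip_network(other_network)
    config = json.loads(config_text)
    return any(
        ipaddress.ip_network(pool["base"]).overlaps(other)
        for pool in config.get("default-address-pools", [])
    )


if __name__ == "__main__":
    for summary in summarize_pools(DAEMON_JSON):
        print(summary)
    print("overlaps 172.17.0.0/16:", conflicts_with(DAEMON_JSON, "172.17.0.0/16"))
```

The is_private field is included because a pool outside the private address ranges could shadow real internet addresses, so it is worth checking for whichever base you choose.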
