I am able to successfully run inference for the following example; however, the GPU is not being utilized. I have confirmed that there is no GPU utilization while it runs.

from guidance import models, system, user, assistant, gen

# model_path points to a local GGUF/GGML model file
llama2 = models.LlamaCppChat(
    model_path,
    n_gpu_layers=40,
    n_batch=512,
    n_ctx=2048,
    echo=False,
)

with system():
    lm = llama2 + "You are a cat expert."

with user():
    lm += "What are the smallest cats?"

with assistant():
    lm += gen("answer", stop=".")

I have tested GPU inference using langchain, and it is able to utilize the GPU, so the problem is not with the GPU itself.
Replies: 1 comment
Seems like llama-cpp-python had to be reinstalled with cuBLAS enabled. I had done this for my langchain conda env but forgot to do it for the guidance env. Would be nice if this was documented somewhere :)

The command I used to re-install:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
If installing for the first time, use:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
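
One way to double-check that the rebuild actually picked up cuBLAS (a minimal sketch; the model path and layer count below are placeholders, not from the original post) is to load a model directly through llama-cpp-python with verbose output and look for "BLAS = 1" in the startup log:

from llama_cpp import Llama

# Placeholder path; point this at your own GGUF/GGML model file.
llm = Llama(
    model_path="./models/llama-2-7b-chat.gguf",
    n_gpu_layers=40,  # any value > 0 requests GPU offload
    verbose=True,     # prints llama.cpp's system info at load time
)
# With a cuBLAS build, the log should report "BLAS = 1" and show layers
# being offloaded to the GPU; with a CPU-only build it reports "BLAS = 0"
# and inference stays on the CPU.

Running nvidia-smi while inference is in flight should then show the Python process with non-zero GPU memory usage.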