I am able to successfully run inference for the following example; however, the GPU is not being utilized. I have confirmed that there is no GPU utilization while it runs.

from guidance import models, system, user, assistant, gen

# model_path points to a local GGUF/GGML model file
llama2 = models.LlamaCppChat(
    model_path,
    n_gpu_layers=40,
    n_batch=512,
    n_ctx=2048,
    echo=False,
)

with system():
    lm = llama2 + "You are a cat expert."

with user():
    lm += "What are the smallest cats?"

with assistant():
    lm += gen("answer", stop=".")

I have tested GPU inference using langchain, and it is able to utilize the GPU, so the problem is not with the GPU itself.
Replies: 1 comment
Seems like llama-cpp-python had to be reinstalled with cuBLAS enabled. I had done this for my langchain conda env but forgot to do it for the guidance env. Would be nice if this was documented somewhere :)

The command I used to re-install:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
If installing for the first time, use:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
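
One way to double-check that the rebuild actually picked up cuBLAS (a minimal sketch; the model path and layer count below are placeholders, not from the original post) is to load a model directly through llama-cpp-python with verbose output and look for "BLAS = 1" in the startup log:

from llama_cpp import Llama

# Placeholder path; point this at your own GGUF/GGML model file.
llm = Llama(
    model_path="./models/llama-2-7b-chat.gguf",
    n_gpu_layers=40,  # any value > 0 requests GPU offload
    verbose=True,     # prints llama.cpp's system info at load time
)
# With a cuBLAS build, the log should report "BLAS = 1" and show layers
# being offloaded to the GPU; with a CPU-only build it reports "BLAS = 0"
# and inference stays on the CPU.

Running nvidia-smi while inference is in flight should then show the Python process with non-zero GPU memory usage.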