LMQL fails to use llama-cpp-python to load models #297
Comments
I realize the error with verbosity is a llama-cpp-python bug that has an open issue. I've updated my example to show the LMQL-specific issue I am encountering.
Hi there, I just tried to reproduce this on my workstation, and it all seems to work. Can you make sure to re-install llama.cpp with the correct build flags to enable GPU support? My test code:
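A minimal sketch of such a test (hypothetical GGUF path, an assumed tokenizer, and the cuBLAS build flag llama-cpp-python documented at the time; as far as I know, extra keyword arguments on lmql.model are forwarded to the llama_cpp.Llama constructor):

```python
# Re-install llama-cpp-python with GPU support first, e.g. (cuBLAS flag, assumption):
#   FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

import lmql

@lmql.query
def say_test():
    '''lmql
    argmax
        "Say 'this is a test':[RESPONSE]"
    from
        lmql.model(
            "local:llama.cpp:/content/models/llama-2-7b.Q4_K_M.gguf",  # hypothetical path
            tokenizer="huggyllama/llama-7b",  # HF tokenizer matching the weights (assumption)
            n_gpu_layers=40,                  # offload layers to the GPU; forwarded to llama_cpp.Llama (assumption)
            verbose=True,
        )
    where
        len(TOKENS(RESPONSE)) < 25            # token constraint on the generated variable
    '''

print(say_test())
```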
Note that I put a token constraint on the generated variable. After re-installing llama.cpp, do you still see the same issue?
I run into multiple issues when trying to use LMQL in Colab.
When running a query with the verbose flag unset or set to False, I get this error:
Code:
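A minimal sketch of a query of this shape (hypothetical GGUF path and an assumed tokenizer; verbose left at its default):

```python
import lmql

@lmql.query
def hello():
    '''lmql
    argmax
        "Say 'this is a test':[RESPONSE]"
    from
        lmql.model(
            "local:llama.cpp:/content/llama-2-7b.Q4_K_M.gguf",  # hypothetical path to the GGUF weights
            tokenizer="huggyllama/llama-7b",                     # assumed matching HF tokenizer
            # verbose not set here, i.e. it defaults to False
        )
    where
        len(TOKENS(RESPONSE)) < 25
    '''

print(hello())
```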
Output:
Restarting the runtime (or deleting the instance and spinning up a new one) and setting verbose to True at least gets past that portion of the code, but it never progresses beyond that:
Code:
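A sketch of the same query with verbosity enabled (again with a hypothetical path and an assumed tokenizer):

```python
# Same query as above; verbose=True is presumably forwarded to llama_cpp.Llama.
lmql.model(
    "local:llama.cpp:/content/llama-2-7b.Q4_K_M.gguf",  # hypothetical path
    tokenizer="huggyllama/llama-7b",
    verbose=True,
)
```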
Output:
GPU RAM never increases; system RAM usage sometimes increases, but not consistently. It appears the model never actually loads and so never performs any inference.
I've tried following examples from both the documentation and from other sites with no luck. A minimal example of the error can be found in the following Colab notebook:
https://colab.research.google.com/drive/1aND43pi3v11fW_2kTYDHLaoxXV69HPWq?usp=sharing