KoboldCPP crashes after Arch system update when loading GGUF model: ggml_cuda_host_malloc ... invalid argument #1158
Comments
Did you select the number of layers yourself, or was it automatically picked?
I chose the number of layers through trial and error. 19 layers was the maximum I could fit on the GPU with 8k context without running out of VRAM.
Try fewer layers.
I have tried running it again with 10 layers, and the result is still the same. The only difference is that it says
Similar error on EndeavourOS with the 6.11.4-arch2-1 kernel (it existed in the previous kernel version as well).
Try using the default settings; don't change anything. Just launch koboldcpp, select your model, select CUDA, and disable MMAP. Does it load and work correctly then?
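For reference, the suggestion above can also be tried from the command line. The sketch below is a hypothetical invocation; the flag names are taken from recent koboldcpp builds and may differ on older versions, so check `--help` on your build:

```shell
# Hypothetical invocation mirroring the suggested settings:
# CUDA backend selected, mmap disabled, everything else left at defaults.
# Flag names assume a recent koboldcpp release; verify with --help.
python koboldcpp.py --model /path/to/model.gguf --usecublas --nommap
```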
Describe the Issue
After updating my computer, when running KoboldCPP, the program either crashes or refuses to generate any text. Most of the time, when loading a model, the terminal shows an error:
ggml_cuda_host_malloc: failed to allocate 6558.12 MiB of pinned memory: invalid argument
before trying to load the model into memory. Occasionally it will boot up successfully, but prompt processing is much slower than before the system update, and it aborts before actually generating anything. Eventually it simply crashes with
Killed
printed to the console before exiting. I've tried updating to the latest version of KoboldCPP, and both the
cuda1210
and cuda1150
builds produce the same result.
Additional Information:
OS: Arch Linux, kernel version 6.11.3-arch1-1 (previous working version: 6.10)
CPU: AMD Ryzen 5 5600 (12) @ 4.468GHz
GPU: NVIDIA GeForce RTX 3060
Model used: Beyonder 4x7b-v2 q5_k_m
GPU layers: 19
CPU threads: 6
Context size: 8192 with ContextShift on
Crashes whether FlashAttention is off or on
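The behavior described above (a pinned-memory warning followed by a load that sometimes succeeds but runs slower) matches a common allocate-with-fallback pattern: when page-locked host memory cannot be obtained, the loader falls back to ordinary pageable memory, and host-to-GPU transfers become slower. Below is a minimal sketch of that pattern; the names `fake_pinned_alloc` and `host_malloc_with_fallback` are invented for illustration, with a stub standing in for the failing CUDA allocator:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pinned-memory allocator such as
 * cudaHostAlloc. In this sketch it always fails, to mimic the
 * "invalid argument" error seen in the log. */
static int fake_pinned_alloc(void **ptr, size_t size) {
    (void)size;
    *ptr = NULL;
    return 1; /* non-zero = error, like a cudaError_t */
}

/* Allocate host memory, preferring pinned (page-locked) memory but
 * falling back to ordinary malloc when pinning fails. The fallback
 * keeps the program running at the cost of slower GPU transfers. */
void *host_malloc_with_fallback(size_t size) {
    void *ptr = NULL;
    if (fake_pinned_alloc(&ptr, size) != 0) {
        fprintf(stderr,
                "warning: failed to allocate %zu bytes of pinned memory, "
                "falling back to pageable memory\n", size);
        ptr = malloc(size); /* pageable: host-to-GPU copies are slower */
    }
    return ptr;
}
```

In this sketch every call takes the fallback path, printing the warning and returning a pageable buffer, which is analogous to why the model can still load after the `ggml_cuda_host_malloc` warning, just more slowly.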
Log: