CuBLAS error in execution and compilation with warnings #150
Comments
Warnings in compilation:
(python_3.8) raul@lesquina:~/Desktop/Proyecto/go-llama.cpp$ BUILD_TYPE=cublas make libbinding.a
cd llama.cpp && patch -p1 < ../patches/1902-cuda.patch |
same problem |
What is your Go version, 1.18 or 1.21? (Check with go version.) |
Having the same problem. Using Go 1.21 on Ubuntu Linux with the latest version of go-llama.cpp.
Then
|
same issue. I'm able to compile (some warnings only):
then able to load the model on GPU:
resulting in:
but then segfaults on inference:
|
this should be fixed already. can you try master? which version are you running on? |
same result unfortunately (just cloned again fresh):
able to load the model (confirmed also by nvidia-smi):
with the same segfault on inference:
I believe (not sure, far from competent in C, used LOG() function to chase this) the problem may originate on this line (both llama_sample_token_binding / llama_sample_token seem to produce the crash, compiled both ways): Line 450 in b8a1245
my env (also tried cuda 11.2 with nvidia driver 470.199.02):
Hope that's useful somehow, cheers! [edit] |
It works fine with OpenBLAS and with no acceleration.
Separately, llama.cpp on its own compiles and works fine with CuBLAS.
Execution error:
(python_3.8) raul@lesquina:~/Desktop/Proyecto/go-llama.cpp$ CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/home/raul/Desktop/Proyecto/models/llama-2-13b-chat.ggmlv3.q2_K.bin" -t 14
github.com/go-skynet/go-llama.cpp
binding.cpp: In function ‘void llama_binding_free_model(void*)’:
binding.cpp:613:5: warning: possible problem detected in invocation of ‘operator delete’ [-Wdelete-incomplete]
613 | delete ctx->model;
| ^~~~~~~~~~~~~~~~~
binding.cpp:613:17: warning: invalid use of incomplete type ‘struct llama_model’
613 | delete ctx->model;
| ~~~~~^~~~~
In file included from ./llama.cpp/examples/common.h:5,
from binding.cpp:1:
./llama.cpp/llama.h:66:12: note: forward declaration of ‘struct llama_model’
66 | struct llama_model;
| ^~~~~~~~~~~
binding.cpp:613:5: note: neither the destructor nor the class-specific ‘operator delete’ will be called, even if they are declared when the class is defined
613 | delete ctx->model;
| ^~~~~~~~~~~~~~~~~
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
llama.cpp: loading model from /home/raul/Desktop/Proyecto/models/llama-2-13b-chat.ggmlv3.q2_K.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 128
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_head_kv = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: freq_base = 0.0
llama_model_load_internal: freq_scale = 5.60519e-44
llama_model_load_internal: ftype = 10 (mostly Q2_K)
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.11 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 5587.01 MB (+ 100.00 MB per state)
llama_model_load_internal: offloading 0 repeating layers to GPU
llama_model_load_internal: offloaded 0/43 layers to GPU
llama_model_load_internal: total VRAM used: 0 MB
llama_new_context_with_model: kv self size = 100.00 MB
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x100 pc=0x7f84c7ab39fa]
runtime stack:
runtime.throw({0x56fc86?, 0x0?})
/usr/lib/go-1.20/src/runtime/panic.go:1047 +0x5d fp=0x7fff116a39d8 sp=0x7fff116a39a8 pc=0x4523fd
runtime.sigpanic()
/usr/lib/go-1.20/src/runtime/signal_unix.go:821 +0x3e9 fp=0x7fff116a3a38 sp=0x7fff116a39d8 pc=0x466ac9
goroutine 1 [syscall]:
runtime.cgocall(0x4b5320, 0xc00004ebf0)
/usr/lib/go-1.20/src/runtime/cgocall.go:157 +0x5c fp=0xc00004ebc8 sp=0xc00004eb90 pc=0x4239bc
github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0x118dce0, 0x80, 0x0, 0x1, 0x0, 0x1, 0x1, 0x0, 0x0, 0x0, ...)
_cgo_gotypes.go:238 +0x4d fp=0xc00004ebf0 sp=0xc00004ebc8 pc=0x4b20ed
github.com/go-skynet/go-llama%2ecpp.New({0x7fff116c2fed, 0x43}, {0xc00004ee68, 0x4, 0x1?})
/home/raul/Desktop/Proyecto/go-llama.cpp/llama.go:26 +0x257 fp=0xc00004ecf8 sp=0xc00004ebf0 pc=0x4b2637
main.main()
/home/raul/Desktop/Proyecto/go-llama.cpp/examples/main.go:35 +0x38f fp=0xc00004ef80 sp=0xc00004ecf8 pc=0x4b460f
runtime.main()
/usr/lib/go-1.20/src/runtime/proc.go:250 +0x207 fp=0xc00004efe0 sp=0xc00004ef80 pc=0x454ce7
runtime.goexit()
/usr/lib/go-1.20/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00004efe8 sp=0xc00004efe0 pc=0x47fac1
goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go-1.20/src/runtime/proc.go:381 +0xd6 fp=0xc000040fb0 sp=0xc000040f90 pc=0x455116
runtime.goparkunlock(...)
/usr/lib/go-1.20/src/runtime/proc.go:387
runtime.forcegchelper()
/usr/lib/go-1.20/src/runtime/proc.go:305 +0xb0 fp=0xc000040fe0 sp=0xc000040fb0 pc=0x454f50
runtime.goexit()
/usr/lib/go-1.20/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000040fe8 sp=0xc000040fe0 pc=0x47fac1
created by runtime.init.6
/usr/lib/go-1.20/src/runtime/proc.go:293 +0x25
goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go-1.20/src/runtime/proc.go:381 +0xd6 fp=0xc000041780 sp=0xc000041760 pc=0x455116
runtime.goparkunlock(...)
/usr/lib/go-1.20/src/runtime/proc.go:387
runtime.bgsweep(0x0?)
/usr/lib/go-1.20/src/runtime/mgcsweep.go:278 +0x8e fp=0xc0000417c8 sp=0xc000041780 pc=0x441e8e
runtime.gcenable.func1()
/usr/lib/go-1.20/src/runtime/mgc.go:178 +0x26 fp=0xc0000417e0 sp=0xc0000417c8 pc=0x437366
runtime.goexit()
/usr/lib/go-1.20/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000417e8 sp=0xc0000417e0 pc=0x47fac1
created by runtime.gcenable
/usr/lib/go-1.20/src/runtime/mgc.go:178 +0x6b
goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc000068000?, 0x5884e0?, 0x1?, 0x0?, 0x0?)
/usr/lib/go-1.20/src/runtime/proc.go:381 +0xd6 fp=0xc000041f70 sp=0xc000041f50 pc=0x455116
runtime.goparkunlock(...)
/usr/lib/go-1.20/src/runtime/proc.go:387
runtime.(*scavengerState).park(0x6a2940)
/usr/lib/go-1.20/src/runtime/mgcscavenge.go:400 +0x53 fp=0xc000041fa0 sp=0xc000041f70 pc=0x43fdd3
runtime.bgscavenge(0x0?)
/usr/lib/go-1.20/src/runtime/mgcscavenge.go:628 +0x45 fp=0xc000041fc8 sp=0xc000041fa0 pc=0x4403a5
runtime.gcenable.func2()
/usr/lib/go-1.20/src/runtime/mgc.go:179 +0x26 fp=0xc000041fe0 sp=0xc000041fc8 pc=0x437306
runtime.goexit()
/usr/lib/go-1.20/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000041fe8 sp=0xc000041fe0 pc=0x47fac1
created by runtime.gcenable
/usr/lib/go-1.20/src/runtime/mgc.go:179 +0xaa
goroutine 18 [finalizer wait]:
runtime.gopark(0x1a0?, 0x6a2d80?, 0xa0?, 0x61?, 0xc000040770?)
/usr/lib/go-1.20/src/runtime/proc.go:381 +0xd6 fp=0xc000040628 sp=0xc000040608 pc=0x455116
runtime.runfinq()
/usr/lib/go-1.20/src/runtime/mfinal.go:193 +0x107 fp=0xc0000407e0 sp=0xc000040628 pc=0x4363a7
runtime.goexit()
/usr/lib/go-1.20/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000407e8 sp=0xc0000407e0 pc=0x47fac1
created by runtime.createfing
/usr/lib/go-1.20/src/runtime/mfinal.go:163 +0x45
exit status 2
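A note on the -Wdelete-incomplete warning in the build output above: binding.cpp calls delete on ctx->model while llama_model is only forward-declared in llama.h, so the compiler cannot see its destructor. The usual way around this is to hand the pointer back to a release function that lives where the type is complete. The sketch below is not code from go-llama.cpp or llama.cpp; model, model_create, model_free, live_models, caller_release and demo are all hypothetical names used only to show the pattern. llama.h from around this time appears to export llama_free_model for this purpose, but check that against your own checkout. Whether this warning has anything to do with the segfault is not established here.

```cpp
#include <cassert>

// --- What a caller sees (the "header"): only a forward declaration. ---
struct model;                 // hypothetical opaque type, incomplete at this point
model *model_create();        // hypothetical factory
void model_free(model *m);    // hypothetical release function
int live_models();            // number of objects not yet destroyed

// Caller-side cleanup. Writing `delete m;` here would trigger
// -Wdelete-incomplete, because the destructor is not visible.
void caller_release(model *m) {
    model_free(m);            // defer to code that sees the full definition
}

// --- What the library sees (the "implementation"): the complete type. ---
static int g_live = 0;
struct model {
    model() { ++g_live; }
    ~model() { --g_live; }    // non-trivial destructor that must run
};
model *model_create() { return new model(); }
void model_free(model *m) { delete m; }   // complete type: destructor runs
int live_models() { return g_live; }

// Small self-check: create one object, release it, report what is left.
int demo() {
    model *m = model_create();
    assert(live_models() == 1);
    caller_release(m);
    return live_models();
}
```

Calling demo() from a main function creates one object, releases it through caller_release, and returns the count of objects still alive, which shows whether the destructor actually ran.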