Releases · ggerganov/llama.cpp
b3145
b3143
move BLAS to a separate backend (#6210)

* move BLAS to a separate backend
* rename GGML_USE_OPENBLAS to GGML_USE_BLAS
* alloc : reuse same buffer when the same buffer type is used multiple times
* set number of threads automatically for openblas and blis
* sched : print assignments when GGML_SCHED_DEBUG env variable is set
* sched : allow ops with weights on an incompatible buffer type

  This will cause the weight to be copied to a backend that supports the op,
  which is very costly. The weight should have been stored in a buffer of a
  backend that can run the op, but llama.cpp cannot do this automatically at
  the moment.

Co-authored-by: Georgi Gerganov <[email protected]>
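The rename mainly matters for code gated on the old define. A minimal sketch of what that looks like downstream, assuming the new backend ships a `ggml-blas.h` header and a `ggml_backend_blas_init()` entry point (both are assumptions inferred from the commit title, not verified against this release):

```cpp
// Hedged sketch, not code from the release: build-time gating moves from
// GGML_USE_OPENBLAS to GGML_USE_BLAS, and BLAS becomes an explicit backend
// instead of a code path inside the CPU backend. The header name and the
// ggml_backend_blas_init() entry point are assumptions.
#ifdef GGML_USE_BLAS              // was: GGML_USE_OPENBLAS
#include "ggml-backend.h"
#include "ggml-blas.h"

static ggml_backend_t init_blas_backend(void) {
    // create the separate BLAS backend; the commit notes that the number of
    // threads is picked automatically for openblas and blis
    return ggml_backend_blas_init();
}
#endif
```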
b3140
tests : add non-cont unary tests (#7857)

* tests : add non-cont unary tests
* ggml : update unary asserts and "supports_op"

ggml-ci
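For context on what "non-cont" means here, a minimal sketch (assumed ggml API usage) of the kind of input these tests exercise: a strided view of a tensor fed to a unary op.

```cpp
// Minimal sketch (assumed ggml API usage): a strided 2D view shares storage
// with its parent tensor, so its rows are not densely packed in memory,
// i.e. the input to the unary op is non-contiguous.
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /* mem_size   */ 16 * 1024 * 1024,
        /* mem_buffer */ NULL,
        /* no_alloc   */ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 4);
    // keep all 4 rows but only the first 4 of 8 columns; the view reuses the
    // parent's row stride, so it is not contiguous
    struct ggml_tensor * v = ggml_view_2d(ctx, a, 4, 4, a->nb[1], 0);
    struct ggml_tensor * r = ggml_sqr(ctx, v);  // unary op on the non-contiguous view

    (void) r;
    ggml_free(ctx);
    return 0;
}
```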
b3139
ggml : improve ggml_is_contiguous logic (#7856)

* ggml : improve ggml_is_contiguous logic
* ggml : support more contiguous cases

ggml-ci
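For reference, the condition a contiguity check of this kind enforces, sketched below as an assumption about the intent rather than the actual implementation: the strides must describe a densely packed layout.

```cpp
// Hedged sketch of the contiguity condition, not the actual ggml code: a tensor
// is contiguous when its first stride equals the type size and every higher
// stride equals the previous stride times the previous extent (with ne[0]
// divided by the quantization block size for quantized types).
#include "ggml.h"

static bool is_contiguous_sketch(const struct ggml_tensor * t) {
    if (t->nb[0] != ggml_type_size(t->type)) {
        return false;
    }
    size_t expected = t->nb[0] * (t->ne[0] / ggml_blck_size(t->type));
    for (int i = 1; i < GGML_MAX_DIMS; ++i) {
        if (t->nb[i] != expected) {
            return false;
        }
        expected *= t->ne[i];
    }
    return true;
}
```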
b3138
server : restore numeric prompts (#7883)
b3135
vulkan: select only one device for single gpu with multiple drivers (#7582)
b3134
Update Vulkan RoPE implementation (#7818)

* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
* Minor fixes
* Fix segfault when running out of VRAM

Co-authored-by: slaren <[email protected]>
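The allocation change is an error-handling fix in the backend's buffer allocation path: report failure by returning a null buffer rather than letting an exception propagate. A minimal sketch of that pattern, with purely illustrative names (this is not the Vulkan backend code):

```cpp
// Illustrative-only sketch of the error-handling pattern described above; the
// function names are hypothetical, not the real Vulkan backend API.
#include <cstddef>
#include <new>

// stand-in for the underlying device allocation, which may throw on failure
static void * device_alloc_or_throw(size_t size) {
    return ::operator new(size);   // throws std::bad_alloc on failure
}

// the pattern from the commit: catch the failure and return nullptr so the
// caller sees a soft allocation failure instead of an unhandled exception
static void * alloc_buffer_sketch(size_t size) {
    try {
        return device_alloc_or_throw(size);
    } catch (const std::bad_alloc &) {
        return nullptr;            // e.g. out of VRAM
    }
}
```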
b3131
llama-bench: more compact markdown tables (#7879)
b3130
tests : check the Python version (#7872) ggml-ci
b3091
ggml : refactor rope norm/neox (#7634)

* ggml : unify rope norm/neox (CPU)
* ggml : fix compile warning
* ggml : remove GLM rope mode
* metal : better rope implementation
* cuda : better rope implementation
* naming : n_orig_ctx -> n_ctx_orig
* dev : add reminders to update backends
* vulkan : fix ggml_rope_ext() usage
* cuda : fix array size + indents

ggml-ci
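As background for the "norm"/"neox" distinction being unified here: the two RoPE modes apply the same rotations but pair vector elements differently. A hedged sketch of the two orderings (not the actual ggml kernels):

```cpp
// Hedged sketch of the two RoPE orderings, not the ggml kernels: "norm" rotates
// adjacent pairs (x[2i], x[2i+1]); "neox" pairs x[i] with x[i + n_dims/2]. The
// angle theta_i = pos * freq_base^(-2i/n_dims) is the standard RoPE schedule.
#include <cmath>
#include <cstdint>
#include <vector>

static void rope_sketch(std::vector<float> & x, int n_dims, int64_t pos,
                        float freq_base, bool neox) {
    for (int i = 0; i < n_dims / 2; ++i) {
        const float theta = pos * std::pow(freq_base, -2.0f * i / n_dims);
        const float c = std::cos(theta);
        const float s = std::sin(theta);

        const int i0 = neox ? i              : 2 * i;      // first element of the pair
        const int i1 = neox ? i + n_dims / 2 : 2 * i + 1;  // second element of the pair

        const float x0 = x[i0];
        const float x1 = x[i1];
        x[i0] = x0 * c - x1 * s;
        x[i1] = x0 * s + x1 * c;
    }
}
```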