v1.4.0 #838
ggerganov announced in Announcements
- Is it possible to get the command tool binary for Windows in this version?
Overview
This is a new major release adding integer quantization and partial GPU (NVIDIA) support.
Integer quantization
This allows the ggml Whisper models to be converted from the default 16-bit floating point weights to 4-, 5- or 8-bit integer weights. The resulting quantized models are smaller in disk size and memory usage, and can be processed faster on some architectures. The transcription quality is degraded to some extent, but this has not been quantified yet.

Supported quantization modes: Q4_0, Q4_1, Q4_2, Q5_0, Q5_1, Q8_0

Q5 quantized models: https://whisper.ggerganov.com

Here is a quantitative evaluation of the different quantization modes applied to the LLaMA and RWKV large language models. These results can give an impression of the expected quality, size and performance improvements for quantized Whisper models:
LLaMA quantization (measured on M1 Pro)
ref: https://github.com/ggerganov/llama.cpp#quantization
RWKV quantization
(table of results for Q4_0, Q4_1, Q4_2, Q5_0, Q5_1, Q8_0, FP16 and FP32)
ref: ggerganov/ggml#89 (comment)
This feature is possible thanks to the many contributions in the llama.cpp project: https://github.com/users/ggerganov/projects/2
GPU support via cuBLAS
Using cuBLAS results mainly in improved Encoder inference speed. I haven't done proper timings, but one can expect at least 2-3 times faster Encoder evaluation with modern NVIDIA GPU cards compared to CPU-only processing. Feel free to post your Encoder benchmarks in issue #89.
This is another feature made possible by the llama.cpp project. Special recognition to @slaren for putting almost all of this work together.
This release remains in "beta" stage as I haven't verified that everything works as expected.
What's Changed
New Contributors
Full Changelog: v1.3.0...v1.4.0
This discussion was created from the release v1.4.0.