-
Oh well, answering my own question: OpenVINO was definitely worth pursuing! I used a Python 3.10 venv, as outlined in the documentation, to convert the model. I'll try to modify the convert script to handle my French distilled model, and then I should be all set. Is it still worth building ggml with BLAS and Intel MKL, or is everything happening on the GPU?
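For anyone following along, the conversion flow I used looks roughly like this. This is a sketch based on the OpenVINO section of the whisper.cpp docs; the exact file names and flags may differ in your checkout, and `base.en` here is just a placeholder model:

```shell
# Create a Python 3.10 venv for the OpenVINO conversion tooling
python3.10 -m venv openvino_conv_env
source openvino_conv_env/bin/activate
pip install -r models/requirements-openvino.txt

# Convert the Whisper encoder to OpenVINO IR
# (replace base.en with the model you actually use)
python models/convert-whisper-to-openvino.py --model base.en

# Rebuild whisper.cpp with OpenVINO support enabled
cmake -B build -DWHISPER_OPENVINO=1
cmake --build build -j --config Release
```

For a distilled model like the French one, the idea would be to point the convert script at the local Hugging Face weights instead of a stock model name, which is the modification mentioned above.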
-
Hi all, I'm looking to build a fast STT setup for use in Home Assistant. I'm coming from faster-whisper with a small model running directly on the N100 machine that hosts Home Assistant. Each command was taking 6-7 seconds with really hit-or-miss results.
I recently learned about Vulkan support in whisper.cpp and decided to migrate the STT component to my home server, running a Xeon D-1521 and a discrete GPU. I am now able to run a large-v3 model in about 8-9 seconds with infinitely better accuracy, which I'll trade a couple seconds for any day. Vulkan is really a game changer, as it is about 10x faster compared to the CPU backend. It would be awesome if I could bring that down under the 5 second mark, but I'm struggling as everything I tried so far has had no effect at all.
Here's everything I tried:
- Switching to the bofenghuang/whisper-large-v3-french-distil-dec4 model (which I understand is about the same as using the newer turbo model)

The one thing I haven't tried yet is OpenVINO, which I believe can also run on the Arc GPU. However, I haven't been able to yet, as I'm currently stuck with Python and OpenVINO versions that are seemingly too recent for whisper.cpp.
Should I pursue OpenVINO given my current hardware, or have I hit a hard limit? Anything else that is worth trying (besides downgrading to a smaller model and sacrificing accuracy)?
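For reference, the Vulkan setup described above can be reproduced with something like the following. This assumes a recent whisper.cpp tree where the backend flags live in ggml's CMake (older checkouts used differently named options), and the model path is just an example:

```shell
# Build whisper.cpp with the Vulkan backend enabled
cmake -B build -DGGML_VULKAN=1
cmake --build build -j --config Release

# Quick benchmark against a large-v3 ggml model to compare backends
./build/bin/whisper-bench -m models/ggml-large-v3.bin
```

Running the same benchmark once with and once without `-DGGML_VULKAN=1` is an easy way to quantify the speedup on a given machine.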