[REQUEST] Support for a Qwen based vision model #672
Comments
Qwen2-VL is supported (images at least, not video just yet) on the dev branch. NVLM-D looks interesting, and I might consider it next, once Qwen2-VL support is complete.
It's Christmas every day here
@turboderp I tried to test loading Qwen2-VL (downloaded from https://huggingface.co/turboderp/Qwen2-VL-72B-Instruct-exl2/tree/6.0bpw ). I updated ExllamaV2 to the latest dev branch in the TabbyAPI venv, but when I ask about an attached image, the model says it does not see it. I tried both the Text Completion (http://127.0.0.1:5000/ API URL) and Chat Completion (http://127.0.0.1:5000/v1 API URL) options in SillyTavern. I know that at least Chat Completion should work with images because it works with OpenedAI-Vision, so maybe I have something misconfigured? Are there steps to try to test the vision capabilities? Also, are Text and Chat completions both supposed to work, or only one of them?
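If it helps to narrow things down, here is a minimal sketch for testing the backend directly, bypassing SillyTavern. It assumes TabbyAPI is listening on 127.0.0.1:5000 and that the build in use accepts the OpenAI-style image_url content part; the model name, API key, and image path are placeholders. If this returns an actual description of the image, the backend handles vision and the problem is in the front-end configuration. As far as I know, the plain Text Completion endpoint has no standard way to attach an image, so Chat Completion is the route to test.

```python
# Minimal vision check against TabbyAPI's OpenAI-compatible endpoint.
# Assumes the server accepts the OpenAI "image_url" content part with a
# base64 data URI; model name, key, and image path are placeholders.
import base64
import requests

with open("test_image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "Qwen2-VL-72B-Instruct-exl2",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    "max_tokens": 300,
}

r = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    headers={"Authorization": "Bearer <api-key>"},
    json=payload,
    timeout=300,
)
print(r.json()["choices"][0]["message"]["content"])
```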
I have the same issue: Qwen2-VL 72B does not see any images; it works just like a text-only model. I am using ST as the frontend too, and I manually installed the latest exllamav2 into the TabbyAPI environment. Was anyone able to make the model work with images using TabbyAPI, and if yes, with which frontend?
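One thing worth double-checking when installing manually is that the dev build actually replaced the pinned release inside the venv. A quick sanity check from the TabbyAPI environment (assuming the installed exllamav2 exposes __version__, which recent releases do):

```python
# Run inside the TabbyAPI venv: confirm which exllamav2 build is active.
# A dev-branch install should report a newer version than the release
# originally pinned by TabbyAPI.
import exllamav2
print(exllamav2.__version__)
```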
Are you installing it from the dev branch? https://github.com/turboderp/exllamav2/blob/master/examples/multimodal_pixtral.py If this example works, then the issue is that either your front end doesn't support it, or the backend running the OpenAI-spec server doesn't support it yet.
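For reference, the linked example roughly follows the flow below. This is a paraphrased sketch from memory, so treat the argument names as assumptions and defer to examples/multimodal_pixtral.py in the repo; the prompt format shown is Pixtral's, and Qwen2-VL uses its own chat template.

```python
# Rough sketch of the dev-branch multimodal flow, paraphrasing
# examples/multimodal_pixtral.py; the file in the repo is authoritative.
from PIL import Image
from exllamav2 import (
    ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache,
    ExLlamaV2Tokenizer, ExLlamaV2VisionTower,
)
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/path/to/model-exl2")     # placeholder model dir

# Load the vision tower alongside the language model
vision = ExLlamaV2VisionTower(config)
vision.load(progress=True)

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache, progress=True)
tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Embed the image and give it a text alias that can be placed
# directly into the prompt string
image = Image.open("test_image.png")                # placeholder image
emb = vision.get_image_embeddings(
    model=model, tokenizer=tokenizer, image=image, text_alias="{{IMAGE_1}}",
)

# Pixtral-style prompt; Qwen2-VL needs its own chat template instead
prompt = "[INST]{{IMAGE_1}}\nDescribe the image.[/INST]"

output = generator.generate(
    prompt=prompt,
    max_new_tokens=300,
    add_bos=True,
    encode_special_tokens=True,
    stop_conditions=[tokenizer.eos_token_id],
    embeddings=[emb],
)
print(output)
```

If that script produces a sensible description, the exllamav2 side is fine and the missing piece is in whatever serves the OpenAI-style API on top of it.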
Have you tried the Qwen2-VL 2B model? https://huggingface.co/turboderp/Qwen2-VL-2B-Instruct-exl2
Problem
Hello,
I'm very pleased to see exllama getting vision capabilities for the first time with Pixtral!
You hinted at supporting new models in the release notes. What models are you hoping to support?
Solution
If I may suggest a few ideas, Qwen-based vision models are the SOTA as of writing. Support for Qwen2-VL and/or NVLM-D could be a huge step forward.
Alternatives
No response
Explanation
Support for either of these beasts
https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct
https://huggingface.co/nvidia/NVLM-D-72B
Examples
No response
Additional context
Forgot to mention that the Qwen VL model family offers multiple sizes (2B, 7B, 72B), which could be convenient for the GPU-poor community.