
[REQUEST] Support for a Qwen based vision model #672

Open
3 tasks done
TyraVex opened this issue Nov 12, 2024 · 6 comments

Comments

@TyraVex

TyraVex commented Nov 12, 2024

Problem

Hello,

I'm very pleased to see exllama getting vision capabilities for the first time with Pixtral!

You hinted at supporting new models in the release notes. Which models are you hoping to support?

Solution

If I may suggest a few ideas, Qwen-based vision models are the SOTA as of writing. Support for Qwen2-VL and/or NVLM-D would be a huge step forward.

Alternatives

No response

Explanation

Support for either of these beasts
https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct
https://huggingface.co/nvidia/NVLM-D-72B

Examples

No response

Additional context

Forgot to mention that the Qwen2-VL model family comes in multiple sizes (2B, 7B, 72B), which could be convenient for the GPU-poor community.

Acknowledgements

  • I have looked for similar requests before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will make my requests politely.
@turboderp
Member

Qwen2-VL is supported (images at least, not video just yet) on the dev branch. NVLM-D looks interesting, and I might consider it next, once Qwen2-VL support is complete.

@TyraVex
Author

TyraVex commented Nov 18, 2024

It's Christmas every day here!
Thank you so much, this is so useful.
I have plenty of projects that will rely on this feature :)

@Lissanro

@turboderp I tried to test loading Qwen2-VL (downloaded from https://huggingface.co/turboderp/Qwen2-VL-72B-Instruct-exl2/tree/6.0bpw ). I updated ExllamaV2 to the latest dev branch in the TabbyAPI venv, but when I ask about an attached image, the model says it does not see it.

I tried both Text Completion (http://127.0.0.1:5000/ API URL) and Chat Completion (http://127.0.0.1:5000/v1 API URL) options in SillyTavern. I know that at least Chat Completion should work with images because it works with OpenedAI-Vision, so maybe I have something misconfigured?

Are there steps to test the vision capabilities? Also, are Text and Chat Completion both supposed to work, or only one of them?
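For reference, frontends that send images over an OpenAI-compatible Chat Completion endpoint encode them as `image_url` content parts. Below is a minimal sketch of the request body a frontend would need to produce, assuming TabbyAPI's `/v1/chat/completions` accepts the standard OpenAI vision format; the model name and endpoint are placeholders for your local setup:

```python
# Sketch of an OpenAI-spec chat completion body with an embedded image.
# Assumes the server accepts "image_url" content parts (standard OpenAI
# vision format); model name and URL below are placeholders.
import base64


def build_vision_payload(image_bytes: bytes, prompt: str,
                         model: str = "Qwen2-VL-72B-Instruct-exl2") -> dict:
    """Build a chat completion request with the image inlined as a data URI."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }


payload = build_vision_payload(b"<raw png bytes here>", "What is in this image?")
# POST this with e.g. requests.post("http://127.0.0.1:5000/v1/chat/completions",
#                                    json=payload, headers={...})
```

If the frontend sends the image only as plain text (or drops it entirely), the model will behave exactly as described: it answers as a text-only model.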

@Alexey-Akishin

I have the same issue: Qwen2-VL 72B does not see any images and works just like a text-only model. I am also using ST as the frontend, and I manually installed the latest exllamav2 into the TabbyAPI environment.

Was anyone able to make the model work with images using TabbyAPI, and if so, with which frontend?

@remichu-ai

Are you installing it from the dev branch?
If you have already installed it from the dev branch, refer to the sample code here to confirm that exllama's vision support is working:

https://github.com/turboderp/exllamav2/blob/master/examples/multimodal_pixtral.py

If this works, then the issue is that either your frontend doesn't support images yet, or your backend running the OpenAI-spec server doesn't.

@meigami0

meigami0 commented Dec 4, 2024

> Are you installing it from the dev branch? If you have already installed it from the dev branch, refer to the sample code here to confirm that exllama's vision support is working:
>
> https://github.com/turboderp/exllamav2/blob/master/examples/multimodal_pixtral.py
>
> If this works, then the issue is that either your frontend doesn't support images yet, or your backend running the OpenAI-spec server doesn't.

Have you tried the Qwen2-VL 2B model? https://huggingface.co/turboderp/Qwen2-VL-2B-Instruct-exl2
I tried this sample code: the quantized Qwen2-VL 7B model works, but for Qwen2-VL 2B, both the original model and the quantized model only return special tokens or repeating content.
