Replies: 3 comments 1 reply
-
Phi-3 Vision just came out today.
-
It's not really clear what you're asking. GPT-4o is a multimodal model that understands text, images, and audio in the SAME weights, so those modalities are conceptually connected, and one model can carry an end-to-end conversation. That's really nice, because you get things like understanding of emotion, background sounds, and images in the environment, along with appropriately matched output carrying similar emotion. But you can get a similar effect, with a little less integration, by chaining a speech-to-text model, an image-to-text model, an LLM, and a text-to-speech model into a pipeline. That's what Open WebUI does today, and a server like LocalAI can provide all of the necessary models for it.
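If you want to see what that pipeline looks like by hand, here's a minimal sketch using the `openai` Python client pointed at an OpenAI-compatible server such as LocalAI. The base URL, the model names (`whisper-1`, `llava`, `tts-1`), the voice, and the file paths are all assumptions; substitute whatever your server actually serves.

```python
# Minimal speech -> vision LLM -> speech pipeline against an
# OpenAI-compatible server (e.g. LocalAI). Model names, base URL,
# and file paths are assumptions; adjust them to your setup.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# 1. Speech-to-text: transcribe the user's spoken question.
with open("question.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

# 2. Attach an image so a vision-capable model can ground its answer.
with open("scene.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# 3. LLM: answer the transcribed question, with the image as context.
chat = client.chat.completions.create(
    model="llava",  # any vision-capable model your server hosts
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": transcript.text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
answer = chat.choices[0].message.content

# 4. Text-to-speech: read the answer back to the user.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
speech.write_to_file("answer.mp3")
```

Open WebUI's voice mode does essentially this orchestration for you; the sketch is just the same loop written out, which is why you lose the cross-modal nuance (emotion, background sounds) that a single end-to-end model like GPT-4o keeps.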
-
Actually, I recently heard of a model that takes both audio and text and answers in real time with audio or text. It's going to be open source and was announced about six months ago. It's by Kyutai and is called Moshi, but their PR is tremendously bad, so apparently it's still not out. They have a demo available here: https://moshi.chat/?queue_id=talktomoshi
-
It seems like what we are trying to achieve with voice and vision was outpaced by OpenAI again.
Is anyone aware of an open-source alternative that we can plug into Open WebUI?
It got me seriously thinking, and confused about whether to subscribe to their API or hit the $2,000 donate button on this repo at this point. It's a difficult choice. Or maybe I'm too far behind everyone else.
I searched the web for open-source alternatives to GPT-4o, but couldn't find any there either.
So I'm asking here, hoping to plug one into Ollama or this great web UI.
It would be good to see both vision and voice integrated in this repo the way OpenAI did with GPT-4o, but I'm not aware of a model that capable yet.
Lastly, I'd like to get in touch with a developer who is very familiar with how Open WebUI works. I don't mind working with a knowledgeable software engineer who can implement my custom feature requests, with a monthly salary as work for hire. I figure that if I had access to someone who could plug these features into this great project, news like this wouldn't have been much of a challenge.
I was trying to push an AI product, but then OpenAI released this, which completely outdated everything I had worked on and hurt its marketability.
Looking forward to a collaborative chat with interested developers about a solution integrated into this UX.
Thank you very much and God bless.