Replies: 3 comments 1 reply
-
Phi-3 Vision just came out today.
-
It's not really clear what you're asking. GPT-4o is a multimodal model that understands text, images, and audio in the SAME weights, so those modalities are conceptually connected, and one model can carry an end-to-end conversation. That's really nice, because you get things like understanding of emotion, background sounds, and images in the environment, along with appropriately matched output carrying similar emotion. But you can get a similar effect, with a little less integration, by chaining a speech-to-text model, an image-to-text model, an LLM, and a text-to-speech model into a pipeline. That's what Open WebUI does today, and a server like LocalAI can provide all of the necessary models for it.
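If you want to see what that pipeline looks like by hand, here's a minimal sketch using the `openai` Python client pointed at an OpenAI-compatible server such as LocalAI. The base URL, the model names (`whisper-1`, `llava`, `tts-1`), the voice, and the file paths are all assumptions; substitute whatever your server actually serves.

```python
# Minimal speech -> vision LLM -> speech pipeline against an
# OpenAI-compatible server (e.g. LocalAI). Model names, base URL,
# and file paths are assumptions; adjust them to your setup.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# 1. Speech-to-text: transcribe the user's spoken question.
with open("question.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

# 2. Attach an image so a vision-capable model can ground its answer.
with open("scene.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# 3. LLM: answer the transcribed question, with the image as context.
chat = client.chat.completions.create(
    model="llava",  # any vision-capable model your server hosts
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": transcript.text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
answer = chat.choices[0].message.content

# 4. Text-to-speech: read the answer back to the user.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
speech.write_to_file("answer.mp3")
```

Open WebUI's voice mode does essentially this orchestration for you; the sketch is just the same loop written out, which is why you lose the cross-modal nuance (emotion, background sounds) that a single end-to-end model like GPT-4o keeps.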
-
Actually, I recently heard of a model that takes both audio and text and answers in real time with audio or text. It's going to be open source and was announced about six months ago. It's by Kyutai and is called Moshi, but their PR is tremendously bad, so apparently it's still not out. They have a demo available here: https://moshi.chat/?queue_id=talktomoshi
-
It seems like what we are trying to achieve with voice and vision was outpaced by OpenAI again.
Is anyone aware of an open-source alternative that we can plug into Open WebUI?
It got me seriously thinking, and confused about whether to subscribe to their API or hit the $2,000 donate button on this repo at this point. It's a difficult choice. Or maybe I'm too far behind everyone else.
I searched the web for open-source alternatives to GPT-4o, but couldn't find any there either.
So I'm asking here, hoping to plug one into Ollama or this great web UI.
It would be good to see both vision and voice integrated in this repo the way OpenAI did with GPT-4o, but I'm not aware of a model that capable yet.
Lastly, I'd like to get in touch with a developer who is very familiar with how Open WebUI works. I don't mind working with a knowledgeable software engineer who can implement my custom feature requests, with a monthly salary as work for hire. I figure that if I had access to someone who could plug these features into this great project, news like this wouldn't have been much of a challenge.
I was trying to push an AI product, but then OpenAI released this, which completely outdated everything I had worked on and hurt its marketability.
Looking forward to a collaborative chat with interested developers about a solution integrated into this UX.
Thank you very much and God bless.