speak to text , and Text to Text for language trans "extensions" #203

Open
gedw99 opened this issue Oct 29, 2024 · 3 comments

Comments

@gedw99

gedw99 commented Oct 29, 2024

Whisper can be wrapped with Go easily, and then the system can do speech-to-text.

Working demo here:

https://github.com/gedw99/galene-stt that is NOT integrated with broadcast-box yet.

The Makefile works everywhere, and the "dep-test" target will run an audio-to-text transcription...


Text-to-text translation might also be useful as another "extension".

Just raising to see if there is support for integration or not.

@gedw99 gedw99 changed the title speak to text , and Text to Text for language trans speak to text , and Text to Text for language trans "extensions" Oct 29, 2024
@ChaseCares
Collaborator

Hello! This is an interesting idea. I can't speak to whether adding extensions would be feasible, or whether that is something Sean would like to add. However, I do have reservations about adding speech recognition. I use and rely on STT and have first-hand experience of how inaccurate the transcription can be. I don't believe it would be accurate enough to be useful; in my experience, the transcriptions often require heavy editing to match what was spoken.

With that said, if it were going to be implemented, I think it would be a good idea to add warnings that the text may be inaccurate or misleading. This is important because a user who relies exclusively on the text to understand what is going on would have no way to verify its accuracy.

I am optimistic about the technology; it has gotten significantly better recently. I would be very interested to hear other people's opinions about it.

Thanks for the suggestion!

@gedw99
Author

gedw99 commented Dec 6, 2024

Agree that accuracy warnings are required. I have found it to work really well; the proof is in the trying.

Once you use local STT it’s highly compelling because the latency is gone, so it feels so much more natural.

There is a PR that adds Pocketbase Auth to Broadcast Box, and I could add STT server side via Benthos plugins. That's one way to do it without clogging up the main binary. It's a WASM plugin then.

This also means it can be added client side as a WASM plugin too. Operating systems have their own STT, but I believe it's not available to WebRTC in browsers?

This architectural approach is one that I have found useful because it's much easier to extend a system without getting package dependency bloat. Benthos is very light if you use the version with no extra stuff and then use WASM or stdio to call the plugins.

@Sean-Der
@neilschark

Any thoughts on this idea?

@gedw99
Author

gedw99 commented Jan 7, 2025

It’s pretty accurate at STT. I tested it for 10 minutes.

I integrated it into another project that uses a service worker to run the WASM.
