
Spam Attack #2105

Open
DariusAlexander opened this issue Apr 29, 2024 · 2 comments

@DariusAlexander

Noticed there are prediction outputs that include spam:

start,end,text
0,8640," 6 greens of fresh snow peas, 5 thick slabs of blue cheese and maybe a snack for her brothered"
8640,9000," Bob."
9000,16000," For more information visit www.beadaholique.com to purchase beading supplies and to get design ideas!"
16000,23000," www.beadaholique.com to purchase beading supplies and to get design ideas!"
23000,30000," www.beadaholique.com to purchase beading supplies and to get design ideas!"

The source audio file is 30s long, zero-padded at the end with about 20s of (absolute) silence.
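One workaround that may help, since the repeated segments start in the silent tail, is to trim the trailing silence before transcription. A minimal sketch using ffmpeg's silenceremove filter (the -60dB threshold is a guess to tune, not a value from this repo):

ffmpeg -i myfile.wav -af "areverse,silenceremove=start_periods=1:start_threshold=-60dB,areverse" trimmed.wav
./main -ocsv -f trimmed.wav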

I followed the Quick Start guide:
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
bash ./models/download-ggml-model.sh base.en
make
./main -ocsv -f myfile.wav

I've just started looking at this project, so I don't know the problem deeply, but it seems the model downloaded by ./models/download-ggml-model.sh (from https://huggingface.co/ggerganov/whisper.cpp) might be the issue.
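For anyone else hitting this, main also exposes decoder-fallback thresholds (-et / --entropy-thold and -lpt / --logprob-thold) that can suppress some repetitive output. The values below are experimental guesses to try, not recommended defaults:

./main -ocsv -et 2.0 -lpt -0.5 -f myfile.wav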

@bobqianic
Collaborator

See openai/whisper#1783.

@bobqianic
Collaborator

bobqianic commented Apr 29, 2024

For the past two months, I have been contemplating how to use limited resources (4 * V100, 4 * P100, 2 * 2080ti, and 200 A100 card-hours gifted to me by someone else) to partially solve these issues. Whisper sometimes hallucinates severely; see this paper: https://arxiv.org/pdf/2402.08021. The reason is that Whisper is trained on a weakly labeled dataset with considerable noise, making it prone to learning irrelevant information. My current idea is to distill Whisper Large v2, use it to label datasets, clean those datasets with an LLM and other neural networks, and finally train a new Whisper based on the Mixture of Experts (MoE) architecture. However, I'm not entirely sure this approach will be successful.

Whisper's vocabulary is also still too small, currently only about 60K tokens, which hurts model performance. The context is too small as well, currently only 448 tokens, and needs to be expanded.
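Until the training side improves, a crude stopgap for reports like this one is to flag repeated segment texts in the CSV output, since these hallucinations tend to repeat verbatim. A minimal sketch, assuming -ocsv wrote its output to myfile.wav.csv next to the input:

cut -d, -f3- myfile.wav.csv | sort | uniq -cd

This prints each duplicated text with its count; segments like the repeated www.beadaholique.com line above would show up immediately.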
