Spam Attack #2105
See openai/whisper#1783:

> For the past two months I have been thinking about how to use the limited resources I have: 4× V100, 4× P100, 2× 2080 Ti, and 200 A100 card-hours gifted to me, to partially solve these issues. Whisper sometimes hallucinates severely; see this paper: https://arxiv.org/pdf/2402.08021. The root cause is that Whisper is trained on a weakly labeled, fairly noisy dataset, which makes it prone to learning irrelevant information. My current idea is to distill Whisper large-v2, use it to label datasets, clean those datasets with an LLM and other neural networks, and finally train a new Whisper on a Mixture of Experts (MoE) architecture. However, I'm not entirely sure this approach will succeed. Whisper's vocabulary is also still too small (only about 60K tokens), which limits model performance, and the context window (currently only 448 tokens) needs to be expanded.
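For reference, both limits the quote mentions are hard-coded in this repo's model hyperparameters, so they can be checked directly in the source. A minimal sketch, assuming the symbol names `n_text_ctx` and `n_vocab` used by the code at the time of writing:

```bash
# The decoder context (448 tokens) and vocabulary size (~51.9K entries for
# multilingual checkpoints, slightly less for *.en) appear in the hparams;
# grep the tree to confirm the values the quote cites.
grep -rn "n_text_ctx" --include="*.cpp" --include="*.h" .
grep -rn "n_vocab"    --include="*.cpp" --include="*.h" .
```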
Noticed that the prediction outputs include spam:

The source audio file is 30 s long, zero-padded at the end with about 20 s of (absolute) silence.
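For anyone trying to reproduce this, here is a sketch of how such a file could be built with ffmpeg (file names are hypothetical; `apad` appends digital silence and `-t 30` caps the output at 30 s):

```bash
# Pad a ~10 s clip with trailing digital silence out to 30 s total,
# mimicking the reported input.
ffmpeg -i speech.wav -af apad -t 30 myfile.wav
```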
I followed the Quick Start guide:

```bash
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
bash ./models/download-ggml-model.sh base.en   # fetches models/ggml-base.en.bin
make
./main -ocsv -f myfile.wav                     # -ocsv writes the result to myfile.wav.csv
```
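One quick way to test whether the trailing silence is what triggers the spam is to trim it before transcribing. A hedged sketch using ffmpeg's reverse/trim/reverse trick (the -50 dB threshold is a guess and may need tuning for other recordings):

```bash
# Reverse the audio, strip the now-leading silence, reverse back,
# then transcribe the trimmed file and compare the outputs.
ffmpeg -i myfile.wav \
  -af "areverse,silenceremove=start_periods=1:start_threshold=-50dB,areverse" \
  trimmed.wav
./main -ocsv -f trimmed.wav
```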
I've just started looking at this project, so I don't know the problem deeply, but it seems the model downloaded by ./models/download-ggml-model.sh (from https://huggingface.co/ggerganov/whisper.cpp) might be the issue.
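If the model file is suspected, one sanity check is to re-download it straight from Hugging Face and compare checksums with the copy the script fetched. A sketch, assuming the script saved the file as models/ggml-base.en.bin (the repo's usual layout):

```bash
# Fetch the same model directly from Hugging Face and diff the hashes;
# a mismatch would point at a corrupted or stale local download.
curl -L -o /tmp/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
shasum -a 256 models/ggml-base.en.bin /tmp/ggml-base.en.bin
```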