Notes on repetitions #38
Comments
TODO: add an R function that detects repetitions and reports the location in the audio/transcription where they start and after which the model does not recover, so that it can be used to relaunch the transcription with other settings or a better model.
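A minimal sketch of what such a detector could look like, written in Python only to illustrate the logic (an R port would follow the same steps). It assumes the transcription is available as a list of segments with hypothetical `start`, `end` and `text` fields; the function name `find_repetition` and the threshold `min_run` are illustrative and not part of audio.whisper. It reports the first run of consecutive segments whose normalized text is identical, and whether that run lasts until the end of the transcription (i.e. the model did not recover).

```python
import re


def normalize(text):
    # Lowercase, drop punctuation and collapse whitespace so that
    # "Thank you." and "thank you!" compare as equal.
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def find_repetition(segments, min_run=3):
    """Return the first run of >= min_run consecutive segments with the
    same normalized text, or None if no such run exists.

    `segments` is assumed to be a list of dicts with keys
    'start', 'end' (seconds) and 'text' (hypothetical layout).
    """
    n = len(segments)
    run_start = 0
    for i in range(1, n + 1):
        same = i < n and normalize(segments[i]["text"]) == normalize(
            segments[run_start]["text"]
        )
        if same:
            continue
        run_len = i - run_start
        # Ignore runs of empty segments; they are silence, not repetition.
        if run_len >= min_run and normalize(segments[run_start]["text"]):
            return {
                "index": run_start,                      # first repeated segment
                "start": segments[run_start]["start"],   # time to relaunch from
                "length": run_len,                       # number of repeats
                "until_end": i == n,                     # True: model never recovered
            }
        run_start = i
    return None
```

The returned `start` time could then serve as the point from which to relaunch the transcription with other settings. Exact matching after normalization is a deliberately simple criterion; near-duplicate loops would need a fuzzier comparison.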
I've been running into this issue a lot with large-v3. It makes the model basically unusable for my purposes. Sounds like v2 may be better?
Yes, use large-v2 or medium, and remove silences. The best model for silence removal is Silero; webrtc is a lot faster but less accurate. Next, plug the detected non-silence periods into the predict function: either use the argument sections (which will create a new audio file based on these voiced sections) or the arguments offset/duration (which will also look a bit around the cutoff timepoints). Both are available since audio.whisper 0.4. Beyond that, I hope ggerganov/whisper.cpp#1768 will also bring improvements once it is incorporated in whisper.cpp and in audio.whisper.
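As a rough illustration of the second step, the sketch below turns voiced intervals from a voice activity detector into offset/duration pairs. It is written in Python purely to show the arithmetic. The function name `voiced_to_windows`, the padding and gap defaults, and the choice of milliseconds as the unit are assumptions; check the audio.whisper documentation for the units and format that the sections and offset/duration arguments actually expect.

```python
def voiced_to_windows(voiced_s, pad_ms=200, max_gap_ms=500):
    """Convert voiced intervals (start, end) in seconds into a list of
    {'offset', 'duration'} windows in milliseconds (assumed unit).

    Intervals separated by at most max_gap_ms are merged, and each merged
    window is padded by pad_ms on both sides. Keeping pad_ms at or below
    half of max_gap_ms guarantees that padded windows do not overlap.
    """
    merged = []
    for start, end in sorted(voiced_s):
        s, e = round(start * 1000), round(end * 1000)
        if merged and s - merged[-1][1] <= max_gap_ms:
            # Short pause: extend the previous window instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])

    windows = []
    for s, e in merged:
        offset = max(0, s - pad_ms)  # never start before the beginning of the file
        windows.append({"offset": offset, "duration": e + pad_ms - offset})
    return windows
```

Merging short pauses keeps the number of windows small, which matters because every separate window is transcribed without the context of the previous one.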
large-v2 seems to be doing better (even without removing the silences). Interestingly, it is also running a lot faster than v3, presumably because it is not wasting as much time hallucinating. Trying audio.vadsilero now...
Moved discussion over to #62
Strategies to reduce repetitions / hallucinations
Related to timestamps: see ggerganov/whisper.cpp#1724