Notes on repetitions #38

jwijffels · 2024-01-29T13:28:29Z

Strategies to reduce repetitions / hallucinations

Use Voice Activity Detection (e.g. https://github.com/bnosac/audio.vadwebrtc or https://github.com/bnosac/audio.vadsilero) to remove silences
From "Whisper large v3 model repeats a lot" (ggerganov/whisper.cpp#1507, comment):

Use 5 beams
Increase the entropy threshold from the default 2.4 to, for example, 2.8. A higher threshold will reject repetitive text and fall back to sampling with a higher temperature
Reduce the maximum context size (--max-context). By default it is 224. Setting it to 64 or 32 can reduce the repetitions significantly. Setting it to 0 will most likely eliminate all repetitions, but the transcription quality can suffer because the model loses the context from the previous transcript
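As a rough illustration of the three settings above, here is a minimal Python sketch that only assembles a whisper.cpp command line and does not run it. The helper name `build_whisper_cpp_args`, the binary path `./main`, and the file names are assumptions made for the example; the binary name in particular may differ per build. The flags used are `--beam-size`, `--entropy-thold` and `--max-context`.

```python
from typing import List


def build_whisper_cpp_args(
    model_path: str,
    audio_path: str,
    beam_size: int = 5,          # "Use 5 beams"
    entropy_thold: float = 2.8,  # raised from the default 2.4
    max_context: int = 64,       # reduced from the default 224
    binary: str = "./main",      # assumption: location/name of the whisper.cpp CLI
) -> List[str]:
    """Assemble (but do not execute) a whisper.cpp command line."""
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    return [
        binary,
        "-m", model_path,
        "-f", audio_path,
        "--beam-size", str(beam_size),
        "--entropy-thold", str(entropy_thold),
        "--max-context", str(max_context),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_whisper_cpp_args("ggml-large-v3.bin", "audio.wav")))
```

Keeping the settings as function arguments makes it easy to relaunch the same file with a smaller `max_context` if repetitions persist.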

Related to timestamps: see ggerganov/whisper.cpp#1724

jwijffels · 2024-01-29T14:50:32Z

TODO: add R function to detect repetitions, the location in the audio/transcription where this occurs and after which the model does not recover, such that it can be used to relaunch the transcription with other settings or a better model.
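One possible shape for such a detector, sketched in Python purely to show the logic (this is not the planned R function, and it could be ported to R). It assumes the transcript is available as a list of `(start_seconds, end_seconds, text)` segments; the function name `find_repetition_start` and the `min_repeats` threshold are hypothetical. It reports the first run of consecutive segments with the same normalized text and whether that run lasts until the end of the transcript, which is a crude proxy for "the model does not recover".

```python
import re
from typing import List, Optional, Tuple

# Assumed segment layout: (start_seconds, end_seconds, text)
Segment = Tuple[float, float, str]


def _normalize(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(cleaned.split())


def find_repetition_start(
    segments: List[Segment], min_repeats: int = 3
) -> Optional[dict]:
    """Return info on the first run of >= min_repeats identical segments."""
    run_start = 0
    for i in range(1, len(segments) + 1):
        same = i < len(segments) and (
            _normalize(segments[i][2]) == _normalize(segments[run_start][2])
        )
        if same:
            continue  # the current run keeps growing
        run_length = i - run_start
        if run_length >= min_repeats and _normalize(segments[run_start][2]):
            return {
                "index": run_start,               # first repeated segment
                "start": segments[run_start][0],  # where to relaunch from
                "repeats": run_length,
                "text": segments[run_start][2],
                "until_end": i == len(segments),  # no recovery observed
            }
        run_start = i  # start a new run at the segment that differed
    return None
```

The returned `start` could then be used as the point from which to relaunch the transcription with other settings or a better model. Exact matching after normalization is a deliberate simplification; near-duplicates or repeated n-grams inside a single segment would need a fuzzier comparison.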

jmgirard · 2024-03-25T16:16:32Z

I've been running into this issue a lot with large-v3. Makes it basically unusable for my purposes. Sounds like v2 may be better?

jwijffels · 2024-03-25T16:21:39Z

Yes, use large-v2 or medium and remove silences. The best model for silence removal is Silero; webrtc is a lot faster but less accurate.

Next, plug the detected non-silence periods into the predict function. Either use the argument sections (which will create a new audio file based on these voiced sections) or the arguments offset/duration (which will also look a bit around the cutoff timepoints). Both are available since audio.whisper 0.4.
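To make the data preparation concrete, here is a minimal Python sketch of turning voiced intervals from a VAD into start/duration pairs. It is only an illustration of the bookkeeping, not audio.whisper code: the function name `voiced_to_sections_ms`, the padding and merge-gap defaults, and the choice of milliseconds with `start`/`duration` keys are all assumptions, so check the predict documentation for the exact format it expects.

```python
from typing import Dict, List, Tuple


def voiced_to_sections_ms(
    voiced: List[Tuple[float, float]],  # (start_seconds, end_seconds) from a VAD
    pad_s: float = 0.2,        # assumption: keep a little audio around each cut
    merge_gap_s: float = 0.5,  # assumption: join sections separated by short gaps
) -> List[Dict[str, int]]:
    """Pad voiced intervals, merge near-adjacent ones, return start/duration in ms."""
    merged: List[List[float]] = []
    for start, end in sorted(voiced):
        start = max(0.0, start - pad_s)
        end = end + pad_s
        if merged and start - merged[-1][1] <= merge_gap_s:
            # Close enough to the previous section: extend it instead.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [
        {"start": round(s * 1000), "duration": round((e - s) * 1000)}
        for s, e in merged
    ]
```

Merging sections that are only a short pause apart avoids cutting the audio into many tiny pieces, which would otherwise strip the model of useful context within a sentence.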

Next to that, I hope ggerganov/whisper.cpp#1768 will also make improvements once incorporated in whisper.cpp and in audio.whisper

jmgirard · 2024-03-25T22:28:43Z

large-v2 seems to be doing better (even without removing the silences). Interestingly, it is also running a lot faster than v3, presumably because it is not wasting as much time hallucinating. Trying audio.vadsilero now... Moved discussion over to #62
