no hallucinations on the first generation. #421

violetgoing · 2024-12-12T16:14:16Z

Download URL for sample audio

Please upload a download URL for a sample audio file so I can test with some settings for a better result. You can use https://easyupload.io/ or any other service to share.
I noticed that the first generation has no hallucinations, but they appear in later generations. Could this be related to how VRAM is loaded?

jhj0517 · 2024-12-13T06:39:24Z

Hi, thanks for raising the issue.
I don't know why you got good results on the first try, but I was able to reproduce the hallucination using only the default parameters.
The parameters probably differed between the first generation and the later ones.

The hallucination goes away with:

VAD turned on
The large-v2 model

Can you try again with VAD turned on and large-v2? This is the result I got:

https://gist.github.com/jhj0517/7d44c4dbafa0f838208fc5c06516504e
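For anyone who wants to reproduce these two settings outside the web UI, here is a minimal sketch. It assumes the faster-whisper package is the backend in use (the thread itself does not show the UI's internal calls), and the helper names `build_transcribe_options` and `transcribe` are hypothetical. The `min_silence_duration_ms` value of 500 is only an illustrative setting, not the one used in the test above.

```python
from typing import Any, Dict, List, Tuple


def build_transcribe_options(language: str = "ru", use_vad: bool = True) -> Dict[str, Any]:
    """Collect keyword arguments for faster-whisper's WhisperModel.transcribe.

    Hypothetical helper: it only assembles a dict, so it can be checked
    without loading a model.
    """
    options: Dict[str, Any] = {"language": language, "vad_filter": use_vad}
    if use_vad:
        # Illustrative VAD setting; tune it for your audio.
        options["vad_parameters"] = {"min_silence_duration_ms": 500}
    return options


def transcribe(audio_path: str, model_size: str = "large-v2") -> List[Tuple[float, float, str]]:
    """Transcribe a file with VAD enabled and return (start, end, text) tuples."""
    # Imported lazily so the options helper works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="auto", compute_type="default")
    segments, _info = model.transcribe(audio_path, **build_transcribe_options())
    # `segments` is a generator; iterating it runs the actual transcription.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Calling `transcribe("sample.mp3")` (a placeholder path) downloads the large-v2 weights on first use, so the first run takes noticeably longer than later ones.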

violetgoing · 2024-12-13T08:35:42Z

I enabled large-v2, language Russian, VAD, BGM separation, and diarization, and got one word repeated across the entire file, although diarization works.

I also converted the m4a file to mp3.

jhj0517 · 2024-12-13T10:44:40Z

> i have enabled large-v2, lang russian, vad, bgm separation, diarization and one word on the entire file

That shouldn't happen, but it is probably caused by hallucination in the BGM separation model.

Try turning on VAD only to clean the audio.
You can turn on diarization if you want, because it has nothing to do with cleaning the audio itself.

I tested with the default VAD parameters and observed no word repetition:

https://gist.github.com/jhj0517/7d44c4dbafa0f838208fc5c06516504e

violetgoing · 2024-12-13T17:46:22Z

@jhj0517 It worked, but there is a new problem: text from the same speaker is split across several lines when it could fit on one. Is this a feature of large-v2?

jhj0517 · 2024-12-14T05:33:48Z

> stretched on several lines when it can be placed in one

That's related to how VAD works in the transcription pipeline, which definitely needs a better implementation.

According to #396 (comment), whisperX has a better implementation for it. I'll work on VAD when I have time.

If someone makes a PR for it, it would be greatly appreciated.
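Until the VAD handling is improved, one possible workaround is to post-process the output and join adjacent lines from the same speaker. The sketch below is not how the pipeline or whisperX does it; the `Segment` type, the `merge_same_speaker` function, and the 1-second `max_gap` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical diarized segment: speaker label, times in seconds, text."""
    speaker: str
    start: float
    end: float
    text: str


def merge_same_speaker(segments: List[Segment], max_gap: float = 1.0) -> List[Segment]:
    """Join consecutive segments when the speaker matches and the pause is short."""
    merged: List[Segment] = []
    for seg in segments:
        prev = merged[-1] if merged else None
        same_speaker = prev is not None and prev.speaker == seg.speaker
        if same_speaker and seg.start - prev.end <= max_gap:
            # Extend the previous segment instead of starting a new line.
            merged[-1] = Segment(
                prev.speaker, prev.start, seg.end, f"{prev.text} {seg.text.strip()}"
            )
        else:
            merged.append(Segment(seg.speaker, seg.start, seg.end, seg.text.strip()))
    return merged


if __name__ == "__main__":
    demo = [
        Segment("SPEAKER_00", 0.0, 1.5, "Hello"),
        Segment("SPEAKER_00", 1.7, 3.0, "there."),
        Segment("SPEAKER_01", 3.2, 4.0, "Hi."),
    ]
    for line in merge_same_speaker(demo):
        print(f"{line.speaker} [{line.start:.1f}-{line.end:.1f}] {line.text}")
```

A larger `max_gap` produces longer lines but can hide real pauses, so the threshold is worth adjusting per recording.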

violetgoing added the hallucination (hallucination of the models) label Dec 12, 2024

violetgoing assigned jhj0517 Dec 12, 2024

jhj0517 added this to the vad milestone Dec 14, 2024

jhj0517 added the enhancement (New feature or request) label Dec 14, 2024

jhj0517 mentioned this issue Dec 17, 2024

Install faster-whisper directly from repository #428

Merged
