no hallucinations on the first generation. #421
Comments
Hi, Thanks for raising the issue. The hallucination is removed with
Can you try again with VAD turned on and https://gist.github.com/jhj0517/7d44c4dbafa0f838208fc5c06516504e
That shouldn't happen, but it is probably caused by hallucination in the BGM separation model. Try turning on VAD only, to clean the audio. I tested with the default VAD parameters and observed no word repetition.
@jhj0517 It worked, but there is a new problem: text from the same speaker is split across several lines when it could fit on one. Is this a feature of large-v2?
That's related to how VAD works in the transcription pipeline, which definitely needs a better implementation. According to #396 (comment), whisperX has a better implementation of it. I'll work on VAD when I have time, or if someone makes a PR for it, that would be greatly appreciated.
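Until the VAD handling improves, one possible workaround for the split lines is to merge neighbouring segments after transcription. The following is only a minimal sketch, not this project's or whisperX's implementation: the `Segment` class, `merge_close_segments`, and the `max_gap` / `max_chars` thresholds are all hypothetical names and values chosen for illustration. It also has no speaker information, so it assumes that a short pause means the same speaker is still talking.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical container for one transcribed line (times in seconds)."""
    start: float
    end: float
    text: str


def merge_close_segments(segments, max_gap=0.5, max_chars=80):
    """Join consecutive segments when the pause between them is short.

    max_gap and max_chars are illustrative defaults, not tuned values.
    """
    merged = []
    for seg in segments:
        if merged:
            prev = merged[-1]
            gap = seg.start - prev.end
            joined = f"{prev.text.strip()} {seg.text.strip()}"
            # Merge only if the pause is short and the line stays readable.
            if gap <= max_gap and len(joined) <= max_chars:
                merged[-1] = Segment(prev.start, seg.end, joined)
                continue
        merged.append(Segment(seg.start, seg.end, seg.text.strip()))
    return merged


segments = [
    Segment(0.0, 1.2, "Hello there,"),
    Segment(1.3, 2.4, "how are you?"),
    Segment(5.0, 6.0, "Fine, thanks."),
]
for seg in merge_close_segments(segments):
    print(f"{seg.start:.1f} --> {seg.end:.1f}  {seg.text}")
```

In this example the first two segments are joined because only 0.1 s separates them, while the third stays on its own line after a longer pause.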
Download URL for sample audio
I noticed that the first generation has no hallucinations, but they appear in later generations. Could this be related to VRAM loading?