
no hallucinations on the first generation. #421

Open
violetgoing opened this issue Dec 12, 2024 · 5 comments
Labels: enhancement (New feature or request), hallucination (hallucination of the models)

Comments

@violetgoing

Download URL for sample audio

violetgoing added the hallucination (hallucination of the models) label Dec 12, 2024
@jhj0517
Owner

jhj0517 commented Dec 13, 2024

Hi, thanks for raising the issue.
I don't know why you got good results on the first try, but I was able to reproduce the hallucination using only the default parameters.
The parameters probably differed between the first generation and the later ones.

The hallucination goes away with:

  1. VAD turned on
  2. The large-v2 model

Can you try again with VAD turned on and large-v2? This is the result I got:

https://gist.github.com/jhj0517/7d44c4dbafa0f838208fc5c06516504e
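
For anyone reproducing this outside the web UI, the two settings above can be sketched roughly as follows. This assumes the faster-whisper library as the backend, which is not confirmed in this thread. The `min_silence_duration_ms` value is the illustrative one from the faster-whisper README, not this project's default, and `build_transcribe_options` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def build_transcribe_options(language: str = "ru", use_vad: bool = True) -> Dict[str, Any]:
    """Collect keyword arguments for faster-whisper's WhisperModel.transcribe."""
    options: Dict[str, Any] = {"language": language, "vad_filter": use_vad}
    if use_vad:
        # Illustrative value (assumption), not the web UI's default VAD settings.
        options["vad_parameters"] = {"min_silence_duration_ms": 500}
    return options


def transcribe(audio_path: str, model_size: str = "large-v2") -> List[Tuple[float, float, str]]:
    # Imported lazily so the option builder above works without the library installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(audio_path, **build_transcribe_options())
    # `segments` is a generator; iterating it runs the actual transcription.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Calling `transcribe("sample.mp3")` would download the model on first use, so it is left out of any quick check here.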

@violetgoing
Author

I enabled large-v2, language Russian, VAD, BGM separation, and diarization, and got a single word for the entire file, although diarization works.
image
Also, I converted the m4a to mp3.

@jhj0517
Owner

jhj0517 commented Dec 13, 2024

I enabled large-v2, language Russian, VAD, BGM separation, and diarization, and got a single word for the entire file

That shouldn't happen, but it is probably caused by hallucination from the BGM separation model.

Try using only VAD to clean the audio, with BGM separation turned off.
You can keep diarization on if you want, because it has nothing to do with cleaning the audio itself.

I tested with the following default VAD parameters:

image

I observed no word repetition.
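
A quick way to check a transcript for this kind of word repetition is to measure how much of the text a single word takes up. This is only a rough sketch: the function names and the `threshold` and `min_words` values are arbitrary assumptions, not something this project provides.

```python
from collections import Counter
from typing import List


def repetition_ratio(text: str) -> float:
    """Fraction of all words taken up by the single most frequent word."""
    words: List[str] = text.lower().split()
    if not words:
        return 0.0
    most_common_count = Counter(words).most_common(1)[0][1]
    return most_common_count / len(words)


def looks_hallucinated(text: str, threshold: float = 0.5, min_words: int = 10) -> bool:
    """Flag transcripts that are long enough and dominated by one repeated word."""
    return len(text.split()) >= min_words and repetition_ratio(text) >= threshold
```

A transcript that is one word repeated over the whole file would be flagged, while ordinary speech usually stays well below the threshold.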

@violetgoing
Author

violetgoing commented Dec 13, 2024

@jhj0517 It worked, but there is a new problem: the text from the same speaker is stretched over several lines when it could be placed on one. Is this a feature of large-v2?
image

jhj0517 added this to the vad milestone Dec 14, 2024
jhj0517 added the enhancement (New feature or request) label Dec 14, 2024
@jhj0517
Owner

jhj0517 commented Dec 14, 2024

stretched over several lines when it could be placed on one

That's related to how VAD works in the transcription pipeline, which definitely needs a better implementation.

According to #396 (comment), whisperX has a better implementation of it. I'll work on VAD when I have time.

Alternatively, if someone makes a PR for it, that would be greatly appreciated.
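
As a stopgap until the VAD handling is reworked, consecutive segments from the same speaker could be merged after transcription. The sketch below is not whisperX's approach or this project's code. The `Segment` type, the `merge_same_speaker` name, and the `max_gap` value of 0.5 seconds are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    speaker: str
    text: str


def merge_same_speaker(segments: List[Segment], max_gap: float = 0.5) -> List[Segment]:
    """Join consecutive segments when the speaker matches and the pause is short."""
    merged: List[Segment] = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if prev is not None and prev.speaker == seg.speaker and seg.start - prev.end <= max_gap:
            # Extend the previous segment instead of starting a new line.
            prev.end = seg.end
            prev.text = f"{prev.text} {seg.text.strip()}"
        else:
            # Copy the segment so the input list is left unmodified.
            merged.append(Segment(seg.start, seg.end, seg.speaker, seg.text.strip()))
    return merged
```

Segments are assumed to be sorted by start time. A larger `max_gap` gives longer lines at the cost of hiding real pauses.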
