Thanks for showing that, in order to get an exact transcription, one needs to run a voice separation model (UVR) and a silence-cutting model (VAD) first, and only afterwards transcribe with Whisper.
Found a list of recent state-of-the-art voice separation models here:
https://github.com/ZFTurbo/Music-Source-Separation-Training?tab=readme-ov-file#vocal-models
Experimenting with this one now:
https://github.com/KimberleyJensen/Mel-Band-Roformer-Vocal-Model
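The stage order described above (separate vocals, then cut silence, then transcribe) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the project's implementation: `transcribe_pipeline` and `cut_silence` are hypothetical names, the stages are passed in as plain callables, and no real UVR, Mel-Band-Roformer, VAD, or Whisper API is called. `cut_silence` is a toy energy-threshold stand-in for a VAD model, and unlike a real VAD it discards timestamps.

```python
from typing import Callable, List, Sequence

# Hypothetical audio representation: mono samples in [-1.0, 1.0].
Audio = List[float]


def cut_silence(samples: Sequence[float], frame_len: int = 4, threshold: float = 0.01) -> Audio:
    """Toy stand-in for a VAD (NOT a real VAD model): keep only the frames
    whose mean squared amplitude exceeds `threshold`."""
    kept: Audio = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold:
            kept.extend(frame)
    return kept


def transcribe_pipeline(
    mix: Sequence[float],
    separate_vocals: Callable[[Sequence[float]], Audio],
    vad: Callable[[Sequence[float]], Audio],
    transcribe: Callable[[Audio], str],
) -> str:
    """Run the three stages in the order the issue describes (hypothetical helper)."""
    vocals = separate_vocals(mix)  # 1. isolate the voice (e.g. a UVR / Roformer model)
    speech = vad(vocals)           # 2. drop silent, non-speech parts
    return transcribe(speech)      # 3. only now hand the audio to Whisper


if __name__ == "__main__":
    mix = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
    text = transcribe_pipeline(
        mix,
        separate_vocals=lambda a: list(a),            # placeholder: identity, not a real separator
        vad=cut_silence,
        transcribe=lambda a: f"<{len(a)} samples>",   # placeholder for Whisper
    )
    print(text)  # <4 samples>
```

Passing the stages in as callables keeps the ordering explicit while letting any separator, VAD, or Whisper backend be swapped in without changing the pipeline itself.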
Thanks for noticing, I'll definitely take a look at it when I have time!
jhj0517