WhisperX: Word-level timestamps, diarization (new), batch inference within file (new) #684
Replies: 13 comments 17 replies
-
@m-bain, this looks quite interesting. I'd love to be able to test it on Japanese when available.
-
What about this one?
-
If you're low on GPU RAM, running transcribe() from Python seems to work where running the CLI app for whisper (or via whisperx) won't. Also, if whisperx's align() function runs you out of GPU RAM, you can use a smaller WAV2VEC2 model. I've demonstrated both below. I'm no expert in Python, and no doubt I've done some things improperly or needlessly, but it works for me, all on my very budget GeForce GTX 950. Maybe it will help someone understand what's happening. Thanks Max!
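A sketch of the two tricks described above, using the early whisperx Python API as I understand it (function signatures may have changed between releases, and `pick_align_model` plus the Hugging Face model names are my own illustrative choices, not part of whisperx):

```python
def pick_align_model(language, low_memory=False):
    """Pick a wav2vec 2.0 alignment checkpoint. The *-base-* models need
    far less GPU RAM than the *-large-* ones. These Hugging Face model
    names are illustrative assumptions, not an official whisperx list."""
    if language == "en":
        return ("facebook/wav2vec2-base-960h" if low_memory
                else "facebook/wav2vec2-large-960h-lv60-self")
    return None  # let whisperx fall back to its per-language default


def transcribe_and_align(audio_file, device="cuda"):
    # Imports kept local so the helper above works without the
    # heavyweight dependencies installed.
    import whisper
    import whisperx

    # Trick 1: call transcribe() from Python instead of the CLI,
    # which seems to use less GPU RAM.
    model = whisper.load_model("small.en", device)
    result = model.transcribe(audio_file)

    # Trick 2: align with a smaller wav2vec 2.0 model to save GPU RAM.
    model_a, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device,
        model_name=pick_align_model(result["language"], low_memory=True))
    return whisperx.align(result["segments"], model_a, metadata,
                          audio_file, device)


# transcribe_and_align("audio.wav")  # needs a GPU and an audio file to run
```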
-
Cool project, thanks! I had a similar idea; I'm glad someone implemented it.
-
Update: whisperx now provides this.
-
Could you possibly whip up a way to limit the number of characters used in a segment? For example, Netflix recommends 42 characters, or one line of subtitles, before it changes to a new timestamp, if that makes any sense :P. Maybe a setting that limits the characters to a specific amount, or a way to increase the frequency of timestamp tokens? Thanks!
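Until such a flag exists, the 42-character limit can be applied in post-processing. A minimal sketch (the function name and the exact wrapping policy are my own, not a whisperx feature):

```python
import textwrap


def split_subtitle(text, max_chars=42, max_lines=2):
    """Wrap one subtitle's text to Netflix-style limits: at most
    max_chars characters per line and max_lines lines per cue.
    Overflow spills into additional cues (lists of lines)."""
    lines = textwrap.wrap(text, width=max_chars)
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]
```

For example, `split_subtitle("hello world this is a test", max_chars=11)` returns `[["hello world", "this is a"], ["test"]]`: two full lines in the first cue, the overflow in a second. Splitting the timestamps proportionally across the resulting cues is left to the caller.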
-
Hi @m-bain, just saw the paper drop (https://arxiv.org/abs/2303.00747). Good work and big congratulations!!!
-
Hi @m-bain, how do I run whisperx with Hebrew word-level timestamps? Thanks
-
Looks great. I just wanted to test it on Google Colab (colab.research.google.com). I used these install commands: When I run it using !whisperx "file" --model..., I get this error: What can be the reason? Thanks.
-
I'm getting this error when using WhisperX in Google Colab as of today.
-
This seems to be a great project. Thank you for sharing. Now that I have the *.word.srt file, how do I convert its contents to an SRT file with 42 characters per line and two lines at a time?
-
Now getting "Repository unavailable due to DMCA takedown." WhisperX has been taken down for now.
-
For anyone who may find this useful, I used ChatGPT to create the Python file I needed.
```python
def read_srt_file(file_path): ...
def write_srt_file(file_path, srt_data): ...
def split_lines(subtitle_text): ...
def process_srt_data(srt_data): ...

if __name__ == "__main__":
    ...
```
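One possible implementation of the same idea: parse a word-level SRT and regroup it into 42-character, two-line cues. It works on strings rather than file paths so it is easy to test; all function names and the greedy merging policy are my own, not from the original script:

```python
import re


def parse_srt(text):
    """Parse simple SRT blocks into (start_sec, end_sec, text) tuples."""
    entries = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        m = re.match(r"(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)",
                     lines[1])
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
        entries.append((h1 * 3600 + m1 * 60 + s1 + ms1 / 1000,
                        h2 * 3600 + m2 * 60 + s2 + ms2 / 1000,
                        " ".join(lines[2:])))
    return entries


def merge_words(entries, max_chars=42, max_lines=2):
    """Greedily merge word-level entries into cues of at most
    max_lines lines of max_chars characters each."""
    cues, lines, line, start = [], [], "", None
    for s, e, word in entries:
        if start is None:
            start = s
        candidate = word if not line else line + " " + word
        if len(candidate) <= max_chars:
            line = candidate
        else:
            lines.append(line)
            line = word  # the overflowing word starts the next line
            if len(lines) == max_lines:  # cue is full: flush it
                cues.append((start, prev_end, "\n".join(lines)))
                lines, start = [], s
        prev_end = e
    if line:
        lines.append(line)
    if lines:
        cues.append((start, prev_end, "\n".join(lines)))
    return cues


def fmt_time(t):
    ms = round(t * 1000)
    return "%02d:%02d:%02d,%03d" % (ms // 3600000, ms // 60000 % 60,
                                    ms // 1000 % 60, ms % 1000)


def to_srt(cues):
    """Render merged cues back into SRT text."""
    return "\n\n".join("%d\n%s --> %s\n%s" % (i, fmt_time(s), fmt_time(e), txt)
                       for i, (s, e, txt) in enumerate(cues, 1)) + "\n"
```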
-
Hi,
I've released whisperX, which refines the timestamps from whisper transcriptions using forced alignment with a phoneme-based ASR model (e.g. wav2vec 2.0). This provides word-level timestamps, as well as improved segment timestamps.
I hacked this up fairly quickly, so feedback is welcome, and it's worth playing around with the hyperparameters (particularly how much to extend the original whisper segment -- sometimes these can be super inaccurate).
Example:
Using whisper out of the box (medium.en), many transcriptions are out of sync: sample_whisper_og.mov
Now, using WhisperX (medium.en) with forced alignment to wav2vec 2.0: sample_whisperx.mov
And it supports other languages:
sample_de_01_vis.mov