[Whisper] Add word timestamps and confidence scores #201
Conversation
Super cool, thanks for adding that!
Addresses #146
After measuring the time taken by the operations that add word-level timestamps/scores, I've found that most of it is consumed by the extra model forward pass. There also appears to be overhead in the first run of DTW, likely due to Numba JIT compilation. Below are the measured times from tests with the large model.
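For context, the DTW step aligns text tokens to audio frames over a cost matrix derived from the cross-attention weights. Here is a minimal pure-NumPy sketch of the standard accumulated-cost recurrence (a simplified stand-in, not the PR's actual implementation, which follows openai-whisper's Numba-jitted version):

```python
import numpy as np

def dtw_cost(x: np.ndarray) -> float:
    """Accumulated DTW cost over an (N, M) pairwise-distance matrix.

    Allowed moves: down, right, and diagonal (the standard recurrence).
    """
    N, M = x.shape
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            D[i, j] = x[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[N, M])
```

Decorating a loop like this with `numba.njit` compiles it lazily on the first invocation, which is consistent with the first-run DTW overhead observed above.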
Hi @bofenghuang this looks really nice to me and I think we can merge it!
One thing I'm wondering is if we can test the alignment code and/or the word timestamp code at all? It is a bit involved so it would be good to have a test or two to cover it.
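As a rough illustration of what such a test could look like: a tolerance-based comparison of word dicts, where the `word`/`start`/`end` keys follow openai-whisper's output format (the helper name and reference values here are made up for the sketch):

```python
def assert_words_close(result, reference, tol=0.2):
    """Check two lists of word dicts agree on text, and on timing within `tol` seconds."""
    assert len(result) == len(reference)
    for r, ref in zip(result, reference):
        assert r["word"] == ref["word"]
        assert abs(r["start"] - ref["start"]) <= tol
        assert abs(r["end"] - ref["end"]) <= tol

# Made-up example values:
reference = [{"word": " Hello", "start": 0.00, "end": 0.42}]
result = [{"word": " Hello", "start": 0.03, "end": 0.40}]
assert_words_close(result, reference)  # passes: every timing is within 0.2 s
```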
(force-pushed from d466638 to bfbcb5d)
Hi @awni, thanks for the review! I've just done a rebase and added a test for word-level timestamps & confidence, comparing the results with those from openai-whisper.
Below are the new measured times from tests run on my Mac M1 Pro:
Thank you for adding this!! I updated the README to reflect the addition.
* Add word timestamps and confidence scores
* Create a separate forward_with_cross_qk function
* Move multiple ops from np to mlx, clean comments
* Save alignment_heads
* Cast qk to fp32
* Add test for word-level timestamps and confidence scores
* format + readme
* nit

Co-authored-by: Awni Hannun <[email protected]>
Hi @awni 👋
I've tried to add several new features to the Whisper implementation in this PR, following the implementation of the original repository:

- word-level timestamps and confidence scores in `transcribe()` (see openai/whisper#869)

This is still a draft version that may require some optimizations:

- `median_filter` and `dtw`: I used the `median_filter` from scipy directly, since I didn't find an `unfold` function in mlx. As for `dtw`, I kept the original Numba version.
- `qk` attention scores in the model forward pass.

Below are the benchmark times from tests run on my M1 Pro.