Encapsulate generation options in a config struct #193
Conversation
# TODO: Whisper generation_config.json doesn't have task-specific
# tokens and those are instead added on the fly before generation.
# We need to add support for model-specific configuration like
# language and task and update configuration based on that
data =
  case data do
    %{"forced_decoder_ids" => [[1, nil], [2, 50359]]} ->
      put_in(data["forced_decoder_ids"], [[1, 50259], [2, 50359], [3, 50363]])

    data ->
      data
  end
In other words, this basically maintains the current behaviour before we add language/task options. These options are Whisper-specific, but we need a way to pass them to the serving.
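For illustration, here is a sketch of what the loaded config should be equivalent to after this rewrite. The forced_token_ids field name and tuple shape are assumptions based on the option used elsewhere in this thread, and the token id meanings come from the comments below:

{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-small"})

# 50259 = English, 50359 = transcribe, 50363 = no timestamps
generation_config.forced_token_ids
#=> [{1, 50259}, {2, 50359}, {3, 50363}]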
I'm a complete beginner in AI, so I'm not sure if this helps or if it's even good practice, but I ended up with this approach:
defp serving do
  {:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-small"})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-small"})
  {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-small"})

  Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer,
    max_new_tokens: 100,
    defn_options: [compiler: EXLA],
    forced_token_ids: [
      {1, 50267}, # %{"languages" => [{50259, en}, {50262, es}, {50267, pt}]}
      {2, 50359}, # %{"modes" => [{50358, translate}, {50359, transcribe}]}
      {3, 50363}  # %{"opts" => [{50363, notimestamps}]}
    ]
  )
end
And here's my first attempt:
defp serving do
  {:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-small"})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-small"})
  {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-small"})

  spec = Bumblebee.configure(whisper.spec, forced_token_ids: [{1, 50267}, {2, 50359}, {3, 50363}])
  whisper = %{whisper | spec: spec}

  Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer,
    max_new_tokens: 100,
    defn_options: [compiler: EXLA]
  )
end
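In both cases I invoke the serving the same way; here audio is just a placeholder for an Nx tensor of mono samples at the featurizer's sampling rate:

serving = serving()

# `audio` stands in for an Nx tensor of mono audio samples
# resampled to 16 kHz (the Whisper featurizer's sampling rate)
Nx.Serving.run(serving, audio)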
While testing with Livebook, both worked fine for me.
Maybe something like this could work for this issue:
defp serving do
  {:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-small"})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-small"})
  {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-small"})

  Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer,
    max_new_tokens: 100,
    defn_options: [compiler: EXLA],
    # translate these options inside `Bumblebee.Audio.speech_to_text`
    lang: :pt,
    mode: :translate,
    timestamps: true
  )
end
What do you think? There would need to be some standard, since the Bumblebee.Audio module is model-agnostic, but I believe that at least language is an acceptable option. The module could also have an internal mechanism to translate these generic options into model-specific ones.
@alissonfpmorais thanks for the feedback! After the changes you could do:
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-small"})

generation_config =
  Bumblebee.configure(generation_config, forced_token_ids: [{1, 50267}, {2, 50359}, {3, 50363}])
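and then pass the config to the serving. A minimal sketch, assuming the serving takes the generation config as an extra argument after this change (the exact call shape may differ):

Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer, generation_config,
  defn_options: [compiler: EXLA]
)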
I think eventually the end-user API could be for the serving to accept additional options like :lang in this case, and then the generation behaviour (model module) could handle those options and alter the generation config. However, we may need to reconfigure the tokenizer as well. Also, perhaps we can figure out making some of the options (like language) configurable at runtime.
Basically there are a couple considerations that are not strictly coupled with this change, so I put them in a separate issue :)
Thanks for the quick reply!
Oh I see, I'm gonna update my code after the changes.
Also enabling options being configurable at runtime would be awesome!
Thanks for the amazing job on the lib :))
CI fails because we try to run too many slow tests and run out of resources (likely disk space). Everything passes locally, so I'm merging :)
Closes #87.
Now whenever we generate text we will also load the generation config (sourced from either generation_config.json or config.json).

This also adds a generation strategy option, which is a map like %{type: :contrastive_search, top_k: 4, penalty_alpha: 0.6}. Since there are very few strategy-specific options, I think the nesting is fine and we can require all of them. This way we avoid conflicting option names and avoid listing options that are not relevant for the configured strategy.
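For example, a sketch assuming the new option is called :strategy:

{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-small"})

# the strategy map nests all strategy-specific options together
generation_config =
  Bumblebee.configure(generation_config,
    strategy: %{type: :contrastive_search, top_k: 4, penalty_alpha: 0.6}
  )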