Encapsulate generation options in a config struct #193
Conversation
# TODO: Whisper generation_config.json doesn't have task-specific
# tokens and those are instead added on the fly before generation.
# We need to add support for model-specific configuration like
# language and task and update configuration based on that
data =
  case data do
    %{"forced_decoder_ids" => [[1, nil], [2, 50359]]} ->
      put_in(data["forced_decoder_ids"], [[1, 50259], [2, 50359], [3, 50363]])

    data ->
      data
  end
In other words, this basically maintains the current behaviour before we add language/task options. These options are Whisper-specific, but we need a way to pass them to the serving.
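For illustration, here is a sketch of what the loaded config should be equivalent to after this rewrite. The forced_token_ids field name and tuple shape are assumptions based on the option used elsewhere in this thread, and the token id meanings come from the comments below:

{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-small"})

# 50259 = English, 50359 = transcribe, 50363 = no timestamps
generation_config.forced_token_ids
#=> [{1, 50259}, {2, 50359}, {3, 50363}]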
I'm a complete beginner in AI, so I'm not sure if this helps or if it's even good practice, but I ended up with this approach:
defp serving do
  {:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-small"})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-small"})
  {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-small"})

  Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer,
    max_new_tokens: 100,
    defn_options: [compiler: EXLA],
    forced_token_ids: [
      {1, 50267}, # %{"languages" => [{50259, en}, {50262, es}, {50267, pt}]}
      {2, 50359}, # %{"modes" => [{50358, translate}, {50359, transcribe}]}
      {3, 50363}  # %{"opts" => [{50363, notimestamps}]}
    ]
  )
end
And here's my first attempt:
defp serving do
  {:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-small"})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-small"})
  {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-small"})

  spec = Bumblebee.configure(whisper.spec, forced_token_ids: [{1, 50267}, {2, 50359}, {3, 50363}])
  whisper = %{whisper | spec: spec}

  Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer,
    max_new_tokens: 100,
    defn_options: [compiler: EXLA]
  )
end
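In both cases I invoke the serving the same way; here audio is just a placeholder for an Nx tensor of mono samples at the featurizer's sampling rate:

serving = serving()

# `audio` stands in for an Nx tensor of mono audio samples
# resampled to 16 kHz (the Whisper featurizer's sampling rate)
Nx.Serving.run(serving, audio)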
While testing with Livebook, both worked fine for me.
Maybe something like this could work for this issue:
defp serving do
  {:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-small"})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-small"})
  {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-small"})

  Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer,
    max_new_tokens: 100,
    defn_options: [compiler: EXLA],
    # translate these options inside `Bumblebee.Audio.speech_to_text`
    lang: :pt,
    mode: :translate,
    timestamps: true
  )
end
What do you think? There would need to be some standard, since the Bumblebee.Audio module is model-agnostic, but I believe that at least language is an acceptable option. The module could also have an internal mechanism to translate these generic options into model-specific ones.
@alissonfpmorais thanks for the feedback! After the changes you could do:
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-small"})

generation_config =
  Bumblebee.configure(generation_config, forced_token_ids: [{1, 50267}, {2, 50359}, {3, 50363}])
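and then pass the config to the serving. A minimal sketch, assuming the serving takes the generation config as an extra argument after this change (the exact call shape may differ):

Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer, generation_config,
  defn_options: [compiler: EXLA]
)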
I think eventually the end-user API could be for the serving to accept additional options like :lang in this case, and then the generation behaviour (model module) could handle those options and alter the generation config. However, we may need to reconfigure the tokenizer as well. Also, perhaps we can figure out making some of the options (like language) configurable at runtime.
Basically there are a couple considerations that are not strictly coupled with this change, so I put them in a separate issue :)
Thanks for the quick reply!
Oh I see, I'm gonna update my code after the changes.
Also enabling options being configurable at runtime would be awesome!
Thanks for the amazing job on the lib :))
CI fails because we try to run too many slow tests and run out of resources (likely disk space). Everything passes locally, so I'm merging :)
Closes #87.
Now whenever we generate text we will also load the generation config (sourced from either generation_config.json or config.json).

This also adds a generation strategy option, which is a map like %{type: :contrastive_search, top_k: 4, penalty_alpha: 0.6}. Since there are very few strategy-specific options, I think the nesting is fine and we can require all of them. This way we avoid conflicting option names and avoid listing options that are not relevant for the configured strategy.
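For example, a sketch assuming the new option is called :strategy:

{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-small"})

# the strategy map nests all strategy-specific options together
generation_config =
  Bumblebee.configure(generation_config,
    strategy: %{type: :contrastive_search, top_k: 4, penalty_alpha: 0.6}
  )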