Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added GODEL support with T5 model type #376

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Emulator000
Copy link

@Emulator000 Emulator000 commented May 11, 2023

This PR adds GODEL support as mentioned in #324 issue.

Notes
I just tested with a local download copy of the GODEL base model and it works except if I change this line:

let generated_response = &generated_sequence[input_length - removed_padding.0..];

Unfortunately it panics with:

thread 'main' panicked at 'range start index 6 out of range for slice of length 5', /home/dario/projects/rust-bert/src/pipelines/conversation.rs:940:43

If instead for testing purposes I put something like:

let generated_response = generated_sequence.as_slice();

It works greatly!

Could someone understand why we have this different behavior between GPT2 and T5 models? Is like we don't have the input sequence in the generated sequence so we don't have to remove it with padding.

@Emulator000
Copy link
Author

Emulator000 commented May 11, 2023

Also, we should upload the Rust ot model into Huggingface's space in order to work!

- Skip truncation of prompt for encoder-decoder models
- Add right padding logic for encoder-decoder models
@guillaume-be
Copy link
Owner

Thank you @Emulator000 for identifying the issue with T5 models and submitting this PR. I have opened a small PR to address the issue without commenting out the input prompt filtering that is needed for decoder models. The suggested changes should work for both encoder-decoders (such as T5) and decoders (such as DialoGPT) - please have a look and merge if this is fine - this should update this PR at the same time.

It would be good to have the rust-version of the GODEL models pushed to the Hugging Face model hub before this gets merged since they are registered in the library. Can you please submit a PR to add the rust weights to the upstream repositories?

When this is completed it would also be great to have a test with the smallest GODEL model, please consider including one as part of the T5 test suite.

Thanks!

@Emulator000
Copy link
Author

please have a look and merge if this is fine - this should update this PR at the same time.

Definitely good and it works, I just tested it!

It would be good to have the rust-version of the GODEL models pushed to the Hugging Face model hub before this gets merged since they are registered in the library. Can you please submit a PR to add the rust weights to the upstream repositories?

I've already converted it from Microsoft's original weights but how can I push into their upstream repository? Where can I open the PR?

When this is completed it would also be great to have a test with the smallest GODEL model, please consider including one as part of the T5 test suite.

Sure, I'll add tests for this right now 😄

@Emulator000
Copy link
Author

@Emulator000
Copy link
Author

@guillaume-be tests added!

The only weird thing that I see here is the output from GODEL (you can see in the tests), for some strange reason, every generated texts starts with a blank space and I don't understand why, is something related to the padding?

@Emulator000
Copy link
Author

@guillaume-be should we have to ping Microsoft someway in order to get PRs merged?

@guillaume-be
Copy link
Owner

@Emulator000 yes it seems notifications sometimes don't get through in the model hub. What has worked well in the past is leaving an issue on the model repo (in this case https://github.com/microsoft/GODEL/issues) and hope to be able to reach the Hugging Face repository organization owners.

@Emulator000
Copy link
Author

Just opened an issue here 💪🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants