Everyone is welcome to contribute, and we value everybody's contribution. Code contributions are not the only way to help the community. Answering questions, helping others, and improving the documentation are also immensely valuable.
It also helps us if you spread the word! Reference the library in blog posts about the awesome projects it made possible, shout out on Twitter every time it has helped you, or simply ⭐️ the repository to say thank you.
However you choose to contribute, please be mindful and respect our code of conduct.
This guide was heavily inspired by the awesome scikit-learn guide to contributing.
There are several ways you can contribute to 🤗 Transformers:
- Fix outstanding issues with the existing code.
- Submit issues related to bugs or desired new features.
- Implement new models.
- Contribute to the examples or to the documentation.
If you don't know where to start, there is a special Good First Issue listing. It will give you a list of open issues that are beginner-friendly and help you start contributing to open-source. Just comment in the issue that you'd like to work on it.
For something slightly more challenging, you can also take a look at the Good Second Issue list. In general though, if you feel like you know what you're doing, go for it and we'll help you get there! 🚀
All contributions are equally valuable to the community. 🥰
If you notice an issue with the existing code and have a fix in mind, feel free to start contributing and open a Pull Request!
Do your best to follow these guidelines when submitting a bug-related issue or a feature request. It will make it easier for us to come back to you quickly and with good feedback.
The 🤗 Transformers library is robust and reliable thanks to users who report the problems they encounter.
Before you report an issue, we would really appreciate it if you could make sure the bug was not already reported (use the search bar on GitHub under Issues). Your issue should also be related to bugs in the library itself, and not your code. If you're unsure whether the bug is in your code or the library, please ask on the forum first. This helps us respond more quickly to issues in the library itself, as opposed to general questions.
Once you've confirmed the bug hasn't already been reported, please include the following information in your issue so we can quickly resolve it:
- Your OS type and version, and the Python, PyTorch, and TensorFlow versions when applicable.
- A short, self-contained code snippet that allows us to reproduce the bug in less than 30s (see the sketch below for an idea of what this can look like).
- The full traceback if an exception is raised.
- Any additional information, such as screenshots, that you think may help.
To get the OS and software versions automatically, run the following command:
transformers-cli env
You can also run the same command from the root of the repository:
python src/transformers/commands/transformers_cli.py env
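For example, a minimal reproduction snippet might look like the one below (here it simply runs a forward pass; yours would end at the line that fails). It builds a tiny, randomly initialised model from a config, so it needs no downloads and finishes in a few seconds:

```python
import torch
from transformers import BertConfig, BertModel

# Tiny, randomly initialised model so the snippet needs no download and runs fast.
config = BertConfig(hidden_size=32, num_hidden_layers=2, num_attention_heads=2, intermediate_size=64)
model = BertModel(config)

input_ids = torch.randint(0, config.vocab_size, (1, 8))
outputs = model(input_ids=input_ids)
print(outputs.last_hidden_state.shape)  # expected: torch.Size([1, 8, 32])
```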
If there is a new feature you'd like to see in 🤗 Transformers, please open an issue and describe:
- What is the motivation behind this feature? Is it related to a problem or frustration with the library? Is it a feature related to something you need for a project? Is it something you worked on and think it could benefit the community? Whatever it is, we'd love to hear about it!
- Describe your requested feature in as much detail as possible. The more you can tell us about it, the better we'll be able to help you.
- Provide a code snippet that demonstrates the feature's usage (a sketch of what this could look like follows this list).
- If the feature is related to a paper, please include a link.
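For instance, a purely hypothetical feature-request snippet might look like this. The proposed `num_trainable_parameters` method does not exist in the library; it is only there to show the level of detail that helps us understand what you are asking for:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

# What I do today to count trainable parameters:
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)

# What I'd like to be able to write instead (proposed API, not implemented):
# n_params = model.num_trainable_parameters()
```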
If your issue is well written, we're already 80% of the way there by the time you create it.
We have added templates to help you get started with your issue.
New models are constantly released, and if you want to implement a new model, please provide the following information:
- A short description of the model and link to the paper.
- Link to the implementation if it is open-sourced.
- Link to the model weights if they are available.
If you are willing to contribute the model yourself, let us know so we can help you add it to 🤗 Transformers!
We have added a detailed guide and templates to help you get started with adding a new model, and we also have a more technical guide for how to add a model to 🤗 Transformers.
We're always looking for improvements that make the documentation clearer and more accurate. Please let us know how the documentation can be improved, such as typos and any content that is missing, unclear, or inaccurate. We'll be happy to make the changes or help you make a contribution if you're interested!
For more details about how to generate, build, and write the documentation, take a look at the documentation README.
Before writing any code, we strongly advise you to search through the existing PRs or issues to make sure nobody is already working on the same thing. If you are unsure, it is always a good idea to open an issue to get some feedback.
You will need basic `git` proficiency to contribute to 🤗 Transformers. While `git` is not the easiest tool to use, it has the greatest manual. Type `git --help` in a shell and enjoy! If you prefer books, Pro Git is a very good reference.
You'll need Python 3.7 or above to contribute to 🤗 Transformers. Follow the steps below to start contributing:
- Fork the repository by clicking on the Fork button on the repository's page. This creates a copy of the code under your GitHub user account.
- Clone your fork to your local disk, and add the base repository as a remote:
$ git clone git@github.com:<your GitHub handle>/transformers.git
$ cd transformers
$ git remote add upstream https://github.com/huggingface/transformers.git
- Create a new branch to hold your development changes:
$ git checkout -b a-descriptive-name-for-my-changes
🚨 Do not work on the `main` branch!
- Set up a development environment by running the following command in a virtual environment:
$ pip install -e ".[dev]"
If 🤗 Transformers was already installed in the virtual environment, remove it with `pip uninstall transformers` before reinstalling it in editable mode with the `-e` flag.
Depending on your OS, you may need to install some external libraries as well if the `pip` installation fails.
For macOS, you will likely need MeCab, which can be installed from Homebrew:
brew install mecab
- Develop the features on your branch.
As you work on your code, you should make sure the test suite passes. Run the tests impacted by your changes like this:
$ pytest tests/<TEST_TO_RUN>.py
For more information about tests, check out the Testing guide.
🤗 Transformers relies on `black` and `isort` to format its source code consistently. After you make changes, apply automatic style corrections and code verifications that can't be automated in one go with:
$ make fixup
This target is also optimized to only work with files modified by the PR you're working on.
If you prefer to run the checks one after the other, the following command applies the style corrections:
$ make style
🤗 Transformers also uses `flake8` and a few custom scripts to check for coding mistakes. Quality controls are run by the CI, but you can run the same checks with:
$ make quality
Finally, we have a lot of scripts to make sure we didn't forget to update some files when adding a new model. You can run these scripts with:
$ make repo-consistency
To learn more about those checks and how to fix any issues with them, check out the Checks on a Pull Request guide.
If you're modifying documents under the `docs/source` directory, make sure the documentation can still be built. This check will also run in the CI when you open a pull request. To run a local check, make sure you install the documentation builder:
$ pip install ".[docs]"
Run the following command from the root of the repository:
$ doc-builder build transformers docs/source/en --build_dir ~/tmp/test-build
This will build the documentation in the `~/tmp/test-build` folder where you can inspect the generated Markdown files with your favorite editor. You can also preview the docs on GitHub when you open a pull request.
Once you're happy with your changes, add changed files with `git add` and record your changes locally with `git commit`:
$ git add modified_file.py
$ git commit
Please remember to write good commit messages to clearly communicate the changes you made!
To keep your copy of the code up to date with the original repository, rebase your branch on `upstream/main` before you open a pull request or if requested by a maintainer:
$ git fetch upstream
$ git rebase upstream/main
Push your changes to your branch:
$ git push -u origin a-descriptive-name-for-my-changes
If you've already opened a pull request, you'll need to force push with the `--force` flag. Otherwise, if the pull request hasn't been opened yet, you can just push your changes normally.
- Now you can go to your fork of the repository on GitHub and click on Pull request to open a pull request. Make sure you tick off all the boxes in our checklist below. When you're ready, you can send your changes to the project maintainers for review.
- It's OK if maintainers request changes; it happens to our core contributors too! So everyone can see the changes in the pull request, work in your local branch and push the changes to your fork. They will automatically appear in the pull request.
☐ The pull request title should summarize your contribution.
☐ If your pull request addresses an issue, please mention the issue number in the pull
request description to make sure they are linked (and people viewing the issue know you
are working on it).
☐ To indicate a work in progress, please prefix the title with `[WIP]`. These are useful to avoid duplicated work, and to differentiate it from PRs ready to be merged.
☐ Make sure existing tests pass.
☐ If adding a new feature, also add tests for it.
  - If you are adding a new model, make sure you use `ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)` to trigger the common tests (see the sketch after this checklist).
  - If you are adding new `@slow` tests, make sure they pass using `RUN_SLOW=1 python -m pytest tests/models/my_new_model/test_my_new_model.py`.
  - If you are adding a new tokenizer, write tests and make sure `RUN_SLOW=1 python -m pytest tests/models/{your_model_name}/test_tokenization_{your_model_name}.py` passes. CircleCI does not run the slow tests, but GitHub Actions does every night!
☐ All public methods must have informative docstrings (see `modeling_bert.py` for an example).
☐ Due to the rapidly growing repository, don't add any images, videos and other non-text files that'll significantly weigh down the repository. Instead, use a Hub repository such as `hf-internal-testing` to host these files and reference them by URL. We recommend placing documentation-related images in the following repository: `huggingface/documentation-images`. You can open a PR on this dataset repository and ask a Hugging Face member to merge it.
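If it helps, here is a rough, simplified sketch (not taken from the repository) of the `all_model_classes` pattern mentioned in the checklist above; the Bert classes and the toy test are only stand-ins for your new model's classes and the real common tests:

```python
import unittest

from transformers import BertConfig, BertForMaskedLM, BertModel


class MyNewModelCommonTestsSketch(unittest.TestCase):
    # The classes listed here are the ones the shared tests iterate over.
    # Bert classes stand in for the classes of the model you are adding.
    all_model_classes = (BertModel, BertForMaskedLM)

    def test_all_classes_share_the_same_config(self):
        # Toy check standing in for the real common tests.
        for model_class in self.all_model_classes:
            self.assertIs(model_class.config_class, BertConfig)
```

In the actual test files, the test class also inherits from `ModelTesterMixin` (defined in `tests/test_modeling_common.py`), which is what triggers the full set of common tests for every class in `all_model_classes`.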
For more information about the checks run on a pull request, take a look at our Checks on a Pull Request guide.
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in the tests folder and examples tests in the examples folder.
We like `pytest` and `pytest-xdist` because it's faster. From the root of the repository, specify a path to a subfolder or a test file to run the test.
$ python -m pytest -n auto --dist=loadfile -s -v ./tests/models/my_new_model
Similarly, for the `examples` directory, specify a path to a subfolder or test file to run the test. For example, the following command tests the text classification subfolder in the PyTorch `examples` directory:
$ pip install -r examples/xxx/requirements.txt # only needed the first time
$ python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/text-classification
In fact, this is how our `make test` and `make test-examples` commands are implemented (not including the `pip install`)!
You can also specify a smaller set of tests in order to test only the feature you're working on.
By default, slow tests are skipped but you can set the `RUN_SLOW` environment variable to `yes` to run them. This will download many gigabytes of models so make sure you have enough disk space, a good internet connection or a lot of patience!
Remember to specify a path to a subfolder or a test file to run the test. Otherwise, you'll run all the tests in the `tests` or `examples` folder, which will take a very long time!
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/models/my_new_model
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/text-classification
Like the slow tests, custom tokenizer tests are skipped but you can set the `RUN_CUSTOM_TOKENIZERS` environment variable to `yes` to run them.
🤗 Transformers uses `pytest` as a test runner only. It doesn't use any `pytest`-specific features in the test suite itself.
This means `unittest` is fully supported. Here's how to run tests with `unittest`:
$ python -m unittest discover -s tests -t . -v
$ python -m unittest discover -s examples -t examples -v
For documentation strings, 🤗 Transformers follows the Google Python Style Guide. Check our documentation writing guide for more information.
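As a minimal sketch of that style (the helper itself is hypothetical; real docstrings such as those in `modeling_bert.py` are more detailed), a Google-style docstring looks like this:

```python
import torch


def count_padding_tokens(input_ids: torch.Tensor, pad_token_id: int) -> int:
    """Count how many padding tokens appear in a batch of input ids.

    Args:
        input_ids (`torch.Tensor` of shape `(batch_size, sequence_length)`):
            Indices of input sequence tokens in the vocabulary.
        pad_token_id (`int`):
            The id used for padding.

    Returns:
        `int`: The total number of padding tokens in the batch.
    """
    return int((input_ids == pad_token_id).sum())
```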
On Windows (unless you're working in Windows Subsystem for Linux or WSL), you need to configure git to transform Windows `CRLF` line endings to Linux `LF` line endings:
git config core.autocrlf input
One way to run the `make` command on Windows is with MSYS2:
- Download MSYS2, and we assume it's installed in `C:\msys64`.
- Open the command line `C:\msys64\msys2.exe` (it should be available from the Start menu).
- Run in the shell: `pacman -Syu`, and install `make` with `pacman -S make`.
- Add `C:\msys64\usr\bin` to your PATH environment variable.
You can now use `make` from any terminal (PowerShell, cmd.exe, etc.)! 🎉
When updating the main branch of a forked repository, please follow these steps to avoid pinging the upstream repository, which adds reference notes to each upstream PR and sends unnecessary notifications to the developers involved in these PRs.
- When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked main.
- If a PR is absolutely necessary, use the following steps after checking out your branch:
$ git checkout -b your-branch-for-syncing
$ git pull --squash --no-commit upstream main
$ git commit -m '<your message without GitHub references>'
$ git push --set-upstream origin your-branch-for-syncing