Add support for function_to_apply in classification #211
For classification models, users should have some way of specifying the function applied to the logits, rather than `softmax` always being used, since the logits do not always correspond to mutually exclusive classes.

Take, for example, a text classification model whose logits represent whether a piece of text conveys negative, neutral, or positive sentiment. In this case, the `softmax` activation function is apt, as it ensures that the probabilities in the final output vector sum to 1. However, if the logits denote whether the text is toxic, obscene, insulting, etc. (as seen in Detoxify models), the `sigmoid` activation function is more appropriate, since each label is scored independently and the values in the final output vector do not have to sum to 1.

HF pipelines used `softmax` as the mandatory activation function over multi-class model logits until huggingface/transformers#8328, which added support for a user-specified `function_to_apply` parameter.

I am proposing support for a `function_to_apply` parameter here as well. The options are the same as those implemented in HF: `softmax`, `sigmoid`, and `none` (`none` applies no activation function and returns the raw logits). `softmax` is the default, since it preserves backward compatibility and seems to be the most common case.
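
As a rough illustration of the behaviour described above, here is a minimal, standard-library-only sketch of how the three options could be dispatched. The helper name `apply_function` and its signature are hypothetical, chosen for this example only; this is not the code in this PR, nor the HF pipeline API.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalize logits into probabilities that sum to 1."""
    # Subtract the max logit first so exp() cannot overflow.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits: Sequence[float]) -> List[float]:
    """Score each logit independently in (0, 1); values need not sum to 1."""
    out = []
    for x in logits:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            # Equivalent form that avoids overflow for large negative x.
            e = math.exp(x)
            out.append(e / (1.0 + e))
    return out


def apply_function(logits: Sequence[float], function_to_apply: str = "softmax") -> List[float]:
    """Hypothetical helper: apply the requested function to raw logits.

    "softmax" is the default here to mirror the backward-compatible
    default proposed in the description.
    """
    if function_to_apply == "softmax":
        return softmax(logits)
    if function_to_apply == "sigmoid":
        return sigmoid(logits)
    if function_to_apply == "none":
        # Return the raw logits untouched.
        return list(logits)
    raise ValueError(
        f"Unknown function_to_apply: {function_to_apply!r}; "
        "expected 'softmax', 'sigmoid', or 'none'."
    )
```

With mutually exclusive classes (sentiment), `apply_function(logits)` yields a distribution over the classes. With independent labels (toxic, obscene, insulting), `apply_function(logits, "sigmoid")` scores each label on its own, so several labels can be close to 1 at once.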