Return raw outputs in TextClassificationPipeline #8328
Conversation
Will that flag be configurable at the model level for the hosted Inference API, and if so, how?

Configuring it through the model configuration, as specified in the original issue, would be the easiest way to control it through the API. I haven't thought that far ahead, however. The other option would be to disable the …
Here's a proposal @julien-c (see latest commit). The current approach:

config = GPT2Config.from_pretrained(
    "microsoft/DialogRPT-updown",
    task_specific_params={"text-classification": {"return_raw_outputs": True}},
)
model = GPT2ForSequenceClassification.from_pretrained("microsoft/DialogRPT-updown", config=config)

This will be used as the default value for the pipeline, and allows users to specify pipeline-specific arguments directly in their model configuration. The proposal was only made for the …
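For context, a hedged sketch of how the pipeline side could read that default from the configuration (task_specific_params is a standard config attribute; the exact lookup below is illustrative, not the PR's actual implementation):

# Roughly, inside the pipeline's __init__:
task_params = getattr(model.config, "task_specific_params", None) or {}
defaults = task_params.get("text-classification", {})
return_raw_outputs = defaults.get("return_raw_outputs", False)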
Pinging @Narsil and @mfuntowicz for their thoughts on this (we have ways to get configurable params on the API, I just wanted to make sure we could use them for this).
LGTM. A nit, and more of a question about argument management.
src/transformers/pipelines.py (outdated)

@@ -516,7 +516,7 @@ def __init__(
         if framework is None:
             framework = get_framework(model)

-        self.task = task
+        self.task = task or self.task
This is not defined for Pipeline and most others, so we should probably change that to getattr(self, 'task', "").
src/transformers/pipelines.py (outdated)

        self.return_all_scores = return_all_scores
        self.return_raw_outputs = return_raw_outputs if return_raw_outputs is not None else False
Looks good, but we should probably be more consistent in how we manage pipeline arguments vs. call arguments...
I've also run into this issue recently when uploading some multi-label text classification models to the Hugging Face hub, where it seems like the default activation for multiple classes is a softmax. Having it return raw outputs is definitely useful; however, it still wouldn't show the scores I would expect in the Inference API, which in our case means using a sigmoid over the results to allow more than one positive label. Would it be useful to also have an argument like …
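To illustrate the behaviour being asked for, here is a minimal sketch (with made-up logits) of applying a sigmoid so that several labels can be positive at once:

import numpy as np

def sigmoid(logits):
    return 1 / (1 + np.exp(-logits))

# Made-up logits for one example over three labels; each label is scored
# independently, so more than one can clear a 0.5 threshold.
logits = np.array([2.1, -0.3, 1.4])
scores = sigmoid(logits)    # roughly [0.89, 0.43, 0.80]
predicted = scores > 0.5    # [True, False, True]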
Hi @laurahanu, thanks for your proposal. I'll try and see how to best integrate that into this PR.
Force-pushed from 669e668 to 9a7ae2a.
@julien-c @Narsil @mfuntowicz @laurahanu please review and tell me if this implementation fits your needs.
src/transformers/pipelines.py (outdated)

        return_all_scores = return_all_scores if return_all_scores is not None else self.return_all_scores
        function_to_apply = function_to_apply if function_to_apply is not None else self.function_to_apply
Per @mfuntowicz's request, the arguments can now be handled via the __call__ method of the pipeline, as well as the __init__ and the model configuration.
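Roughly how those three entry points could be exercised (the model name and argument routing below are illustrative placeholders following the PR discussion, not a definitive API):

from transformers import pipeline

# 1. Per-call argument
classifier = pipeline("sentiment-analysis", model="distilbert-base-cased")
classifier("I really enjoyed this.", function_to_apply="sigmoid")

# 2. Pipeline-level argument, set once at construction
classifier = pipeline("sentiment-analysis", model="distilbert-base-cased", function_to_apply="sigmoid")

# 3. Model configuration default (e.g. via task_specific_params),
#    used when neither of the above is given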
Minor NIT: I find the following slightly more readable. Feel free to ignore if you don't agree.

if return_all_scores is None:
    return_all_scores = self.return_all_scores
if function_to_apply is None:
    function_to_apply = self.function_to_apply
src/transformers/pipelines.py (outdated)

-        scores = np.exp(outputs) / np.exp(outputs).sum(-1, keepdims=True)
-        if self.return_all_scores:
+        if function_to_apply == "default":
+            if self.model.config.num_labels == 1:
+                scores = sigmoid(outputs)
+            else:
+                scores = softmax(outputs)
+        elif function_to_apply == "sigmoid":
+            scores = sigmoid(outputs)
+        elif function_to_apply == "softmax":
+            scores = softmax(outputs)
+        elif function_to_apply.lower() == "none":
+            scores = outputs
+        else:
+            raise ValueError(f"Unrecognized `function_to_apply` argument: {function_to_apply}")
+        if return_all_scores:
Handles several cases:
- Default case: sigmoid when num_labels == 1, softmax otherwise
- Sigmoid for all (@laurahanu)
- Softmax for all
- No function applied
Yes, this suits my case, thank you!
src/transformers/pipelines.py (outdated)

    task = "text2text-generation"
All pipelines now have a default task, for consistency.
-    small_models = [
-        "sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english"
-    ]  # Default model - Models tested without the @slow decorator
+    small_models = ["distilbert-base-cased"]  # Default model - Models tested without the @slow decorator
This change was necessary as the previous model was not outputting valid results (0.0 on everything).
I think we're really missing a test on the actual SHAPE and values of the output of the pipelines when called with different arguments. The rest is more of a proposal for nits, if you agree with my remarks.
        # Compare all outputs (call, init, model argument)
        for example_result_0, example_result_1 in zip(pipeline_output_0, pipeline_output_1):
            # Iterate through the results
            print(np.allclose(example_result_0["score"], example_result_1["score"], atol=1e-6))
Shouldn't this be an assert instead of a print?
Indeed, good catch!
            return _string_output, _string_list_output

        string_output = classifier(string_input)
        string_list_output = classifier(string_list_input)
I think we're missing from these tests an actual test of what string_output and string_list_output look like. Ideally self.assertEqual(string_output, {'label': XX, 'score': XX}). Realistically, because assertEqual won't work on floats:

self.assertTrue(isinstance(string_output, dict))
self.assertEqual(set(string_output.keys()), {'label', 'score'})
self.assertEqual(string_output['score'].shape, (12, 1064))
# checking only one value is probably good enough, but at least we will know if the output changes
self.assertAlmostEqual(string_output['score'][0, 0], -12.125)
src/transformers/pipelines.py (outdated)

            scores = sigmoid(outputs)
        elif function_to_apply == "softmax":
            scores = softmax(outputs)
        elif function_to_apply.lower() == "none":
I personally would leave out .lower() and be more strict.
src/transformers/pipelines.py (outdated)

         else:
-            scores = np.exp(outputs) / np.exp(outputs).sum(-1, keepdims=True)
-        if self.return_all_scores:
+            raise ValueError(f"Unrecognized `function_to_apply` argument: {function_to_apply}")
Could we add the valid choices to the error message? I don't like adding them manually; we could do it programmatically, but that would complicate the code overall, so I'm not sure about this.
Indeed, good idea.
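For illustration, a hedged sketch of an error message that lists the valid choices; the helper name and the list of values are assumptions taken from the branches quoted above:

VALID_FUNCTIONS = ["default", "sigmoid", "softmax", "none"]

def check_function_to_apply(function_to_apply):
    # Reject anything outside the values handled above, and say what is allowed.
    if function_to_apply not in VALID_FUNCTIONS:
        raise ValueError(
            f"Unrecognized `function_to_apply` argument: {function_to_apply}. "
            f"Valid values are: {VALID_FUNCTIONS}"
        )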
Force-pushed from 3bae404 to c92d200.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hello, I was just wondering if there are any updates on allowing the user to choose which function to run on the outputs? It seems like in the current documentation, the pipeline would still only run a sigmoid over the result if there is one label.
Force-pushed from c92d200 to 56b33cc.
Sorry for taking a long time to merge this; I wonder if we can't re-use the recently introduced …

Yes, this flag is there for this reason specifically :-)

Thanks for reopening this and having another look at it! Do you have a timeline in mind for when this would be merged/available to use with the model hub?
Force-pushed from 56b33cc to 9e07b95.
Hey @laurahanu, the PR should now be ready for merging. I'm pinging two team members for review; this should be merged in the coming days and deployed on the API shortly after.

Great, thank you @LysandreJik!
Thanks for working on this!
I approve this PR because I think it can go as-is; however, I think a few changes should be made (could be in this PR or later).
- Remove the "default" option (those are usually confusing; None should be the default instead)
- Change the tests to a more linear approach that does include actual scores (important to make sure we don't break the scores later)
        elif self.model.config.problem_type == "single_label_classification":
            self.function_to_apply = "softmax"


def sigmoid(_outputs):
NIT: functions that can live at the file level should in general be at the file level (prevents accidental state leaks through variable names).
    shifted_exp = np.exp(_outputs - maxes)
    return shifted_exp / shifted_exp.sum(axis=-1, keepdims=True)

        if function_to_apply == "default":
I think we should stay away from "default" in that case. IMO this should be taken care of when inferring function_to_apply; that will make the logic a bit simpler to understand.
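A hedged sketch of that suggestion, as a hypothetical helper that resolves the function once at construction time so __call__ never sees a "default" sentinel:

def infer_function_to_apply(config, function_to_apply=None):
    # An explicit argument always wins; otherwise pick based on the label count.
    if function_to_apply is not None:
        return function_to_apply
    return "sigmoid" if config.num_labels == 1 else "softmax"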
                scores = sigmoid(outputs)
            else:
                scores = softmax(outputs)
        elif function_to_apply == "sigmoid":
NIT: we could use an enum here. It would be a bit more consistent with other pipelines (here function_to_apply is equivalent to aggregation_strategy for token classification, for instance). It does make the code a bit more verbose. We would still need to support raw strings as arguments, since they're much simpler to use than importing the enum from somewhere. (I'm not sure it's worth it, feel free to ignore.)
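For reference, a minimal sketch of what such an enum could look like (not necessarily the exact definition used in the PR):

from enum import Enum

class ClassificationFunction(str, Enum):
    SIGMOID = "sigmoid"
    SOFTMAX = "softmax"
    NONE = "none"

# Because the members are also plain strings, callers can keep passing raw
# strings and the pipeline can normalize them:
assert ClassificationFunction("sigmoid") is ClassificationFunction.SIGMOID
assert ClassificationFunction["SOFTMAX"].value == "softmax"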
            # Iterate through the results
            self.assertTrue(
                np.allclose(example_result_0["score"], example_result_1["score"], atol=1e-6)
            )
I think those tests are super hard to read and to maintain. Because they contain a lot of logic, determining whether the test is correct or not is hard. Also, the expected values are not hardcoded in the test, meaning drift in output values would not be caught by it. I am a proponent of tests that are much simpler in logic, but much more verbose.
classifier = pipeline(task="sentiment-analysis", model=model, tokenizer=tokenizer)
self.assertEqual(nested_simplify(classifier("I really disagree with what you've said.")), [{'score': 0.1, 'label': "LABEL_0"}])
# other function to apply
self.assertEqual(nested_simplify(classifier("I really disagree with what you've said.", function_to_apply="sigmoid")), [{'score': 0.2, 'label': "LABEL_0"}])
# function_to_apply as pipeline arg
classifier2 = pipeline(task="sentiment-analysis", model=model, tokenizer=tokenizer, function_to_apply="sigmoid")
self.assertEqual(nested_simplify(classifier2("I really disagree with what you've said.")), [{'score': 0.2, 'label': "LABEL_0"}])
# from config
# check default strategy depends on num_labels
# ...
I would be happy to make a separate PR to suggest this while staying compliant with the currently written tests. @LysandreJik, do you agree?
Force-pushed from f905253 to d6ea1c1.
            function_to_apply = ClassificationFunction.SOFTMAX

        if isinstance(function_to_apply, str):
            function_to_apply = ClassificationFunction[function_to_apply.upper()]
Should we try/except with a clean error message, or is the current exception good enough?
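For reference, a hedged sketch of the try/except being discussed, written as a hypothetical helper and assuming the ClassificationFunction enum quoted above:

def parse_function_to_apply(function_to_apply):
    # Normalize raw strings to the enum, and list the valid names on failure.
    if isinstance(function_to_apply, str):
        try:
            return ClassificationFunction[function_to_apply.upper()]
        except KeyError:
            valid = [member.name.lower() for member in ClassificationFunction]
            raise ValueError(
                f"Unrecognized `function_to_apply` argument: {function_to_apply}. "
                f"Valid values are: {valid}"
            ) from None
    return function_to_apply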
I'm thinking the exception is good enough, but happy to update if you feel strongly
No, I think it's fine. I didn't check what the actual message was; in the previous iteration it was explicit, so I was wondering if it was OK here. I trust you there.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@LysandreJik is this ready to be merged?
Force-pushed from d6ea1c1 to 40e163f.
Currently the TextClassificationPipeline does a softmax over the output values when num_labels > 1, and a sigmoid over the output when num_labels == 1. As seen in #8259, this may be problematic when systems depend on a different output range.

Adding a flag return_raw_outputs that will not apply the sigmoid or softmax in that case.

Closes #8259
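For reference, a hedged usage sketch of the behaviour targeted here; the argument surface follows the later discussion in this thread (where the flag evolved into function_to_apply) rather than this original description:

from transformers import pipeline

classifier = pipeline("sentiment-analysis")

# Default behaviour: sigmoid when num_labels == 1, softmax when num_labels > 1
classifier("This movie was great!")

# Raw model outputs (logits), with no activation applied
classifier("This movie was great!", function_to_apply="none")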