Return raw outputs in TextClassificationPipeline #8328

Merged: 5 commits into master on Aug 4, 2021

Conversation

LysandreJik (Member)

Currently the TextClassificationPipeline does a softmax over the output values when num_labels > 1, and does a sigmoid over the output when num_labels == 1.

As seen in #8259, this may be problematic when systems depend on a different output range.

This PR adds a return_raw_outputs flag that, when set, skips the sigmoid or softmax and returns the raw outputs.

closes #8259
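
A rough usage sketch of the proposed flag (where exactly the flag is accepted is still evolving in this PR, so the call below is illustrative rather than final):

from transformers import pipeline

# Current behaviour: a sigmoid is applied because DialogRPT has num_labels == 1.
classifier = pipeline("sentiment-analysis", model="microsoft/DialogRPT-updown")

# With the proposed flag: the logits are returned untouched.
raw_classifier = pipeline("sentiment-analysis", model="microsoft/DialogRPT-updown", return_raw_outputs=True)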

@julien-c (Member) commented Nov 5, 2020

Will that flag be configurable on a model-level for the hosted Inference API, and if yes, how?

@LysandreJik (Member, Author)

Controlling it through the configuration, as specified in the original issue, would be the easiest way to expose it through the API. I haven't thought that far ahead yet, however.

The other option would be to disable the sigmoid over the output and let the user handle that instead, but this would break existing models on the inference API such as the DialogRPT model.

@LysandreJik (Member, Author) commented Nov 5, 2020

Here's a proposal @julien-c (see latest commit).

The current TextClassificationPipeline has its task set to "" because it is not initialized from pipeline(task).
This proposal sets the default task of TextClassificationPipeline to "text-classification". Since the pipeline fetches task_specific_params according to the given task, it can then pick up the following configuration option (which can of course already be set in the configuration on S3):

config = GPT2Config.from_pretrained("microsoft/DialogRPT-updown", task_specific_params={"text-classification": {"return_raw_outputs": True}})
model = GPT2ForSequenceClassification.from_pretrained("microsoft/DialogRPT-updown", config=config)

This will be used as the default value for the pipeline, and allow users to specify pipeline-specific arguments directly in their model configuration.

The proposal was only implemented for the TextClassificationPipeline, but it would need to be extended to all other pipelines to stay consistent. Let me know if this is an approach you would be interested in.
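
A quick sketch of how the pipeline would then pick this default up (hedged; it assumes the config and model objects created just above):

from transformers import GPT2Tokenizer, TextClassificationPipeline

tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialogRPT-updown")

# The pipeline's task now defaults to "text-classification", so it reads
# task_specific_params["text-classification"] from the config and uses
# return_raw_outputs=True as its default.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)
scores = classifier("How are you?")  # raw logits, no sigmoid applied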

@julien-c (Member) commented Nov 5, 2020

Pinging @Narsil and @mfuntowicz for their thoughts on this (we have ways to get configurable params on the API; I just wanted to make sure we could use them for this).

@Narsil (Contributor) left a comment

LGTM.

A nit, and more of a question about argument management.

@@ -516,7 +516,7 @@ def __init__(
         if framework is None:
             framework = get_framework(model)

-        self.task = task
+        self.task = task or self.task
Contributor:

This is not defined for Pipeline and most others, so we probably should change that to

getattr(self, 'task', "")

self.return_all_scores = return_all_scores
self.return_raw_outputs = return_raw_outputs if return_raw_outputs is not None else False
Contributor:

Looks good, but we should probably be more consistent in how we manage pipeline arguments vs. call arguments.

@laurahanu (Contributor)

I've also run into this issue recently when uploading some multi-label text classification models to the Hugging Face Hub, where it seems like the default activation for multiple classes is a softmax.

Having it return raw outputs is definitely useful; however, it still wouldn't show the scores I would expect in the Inference API, which in our case means applying a sigmoid over the results so that more than one label can be positive.

Would it also be useful to have an argument like output_nonlinearity for the desired activation function, e.g. sigmoid or softmax?
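
For reference, a minimal NumPy sketch of the difference between the two activations (independent of the pipeline code):

import numpy as np

def sigmoid(logits):
    # Independent per-label probabilities: several labels can be above 0.5 at once.
    return 1 / (1 + np.exp(-logits))

def softmax(logits):
    # Mutually exclusive probabilities: the scores sum to 1.
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.5, -3.0])
print(sigmoid(logits))  # two labels above 0.5 -> suits multi-label classification
print(softmax(logits))  # sums to 1 -> suits mutually exclusive classes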

@LysandreJik (Member, Author)

Hi @laurahanu, thanks for your proposal. I'll try and see how to best integrate that in this PR.

@LysandreJik LysandreJik force-pushed the text-classification-pipeline-raw-output branch from 669e668 to 9a7ae2a Compare November 30, 2020 22:40
@LysandreJik (Member, Author) left a comment

@julien-c @Narsil @mfuntowicz @laurahanu please review and tell me if this implementation fits your needs.

Comment on lines 1005 to 1006
return_all_scores = return_all_scores if return_all_scores is not None else self.return_all_scores
function_to_apply = function_to_apply if function_to_apply is not None else self.function_to_apply
Member Author:

Per @mfuntowicz's request, the arguments can now be handled via the __call__ method of the pipeline, as well as the __init__ and the model configuration.
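
A sketch of the three entry points (the precedence shown here, call over __init__ over config, is my reading of the change; model and tokenizer are assumed to be a text-classification checkpoint):

from transformers import pipeline

# __init__ / pipeline-level default
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer, function_to_apply="sigmoid")

classifier("I really liked it.")                            # uses the pipeline-level default
classifier("I really liked it.", function_to_apply="none")  # per-call override
# A default can also come from the model config, e.g. via task_specific_params as shown earlier.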

Contributor:

Minor NIT: I find the following slightly more readable. Feel free to ignore if you don't agree.

if return_all_scores is None:
    return_all_scores = self.return_all_scores

if function_to_apply is None:
    function_to_apply = self.function_to_apply

Comment on lines 1014 to 1028
if function_to_apply == "default":
if self.model.config.num_labels == 1:
scores = sigmoid(outputs)
else:
scores = softmax(outputs)
elif function_to_apply == "sigmoid":
scores = sigmoid(outputs)
elif function_to_apply == "softmax":
scores = softmax(outputs)
elif function_to_apply.lower() == "none":
scores = outputs
else:
scores = np.exp(outputs) / np.exp(outputs).sum(-1, keepdims=True)
if self.return_all_scores:
raise ValueError(f"Unrecognized `function_to_apply` argument: {function_to_apply}")

if return_all_scores:
Member Author:

Handles several cases (see the short sketch below):

  • Default case: sigmoid for a single label and softmax for multiple labels
  • Sigmoid for all (@laurahanu)
  • Softmax for all
  • No function applied
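
Illustrative calls covering each case (classifier and text are placeholders; the function_to_apply values follow the list above):

classifier(text)                               # default: sigmoid if num_labels == 1, else softmax
classifier(text, function_to_apply="sigmoid")  # sigmoid for every label
classifier(text, function_to_apply="softmax")  # softmax over all labels
classifier(text, function_to_apply="none")     # raw logits, nothing applied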

Contributor:

Yes, this suits my case, thank you!

Comment on lines 2357 to 2358
task = "text2text-generation"

Member Author:

All pipelines now have a default task, for consistency.

-    small_models = [
-        "sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english"
-    ]  # Default model - Models tested without the @slow decorator
+    small_models = ["distilbert-base-cased"]  # Default model - Models tested without the @slow decorator
Member Author:

This change was necessary as the previous model was not outputting valid results (0.0 on everything).

@Narsil (Contributor) left a comment

I think we're really missing a test on the actual SHAPE and values of the pipelines' outputs when called with different arguments.

The rest is more of a proposal of nits, if you agree with my remarks.

# Compare all outputs (call, init, model argument)
for example_result_0, example_result_1 in zip(pipeline_output_0, pipeline_output_1):
    # Iterate through the results
    print(np.allclose(example_result_0["score"], example_result_1["score"], atol=1e-6))
Contributor:

Shouldn't this be an assert instead of a print?

Member Author:

Indeed, good catch!

return _string_output, _string_list_output

string_output = classifier(string_input)
string_list_output = classifier(string_list_input)
Contributor:

I think these tests are missing an actual check of what string_output and string_list_output look like.

Ideally self.assertEqual(string_output, {'label': XX, 'score': XX}); realistically, because assertEqual won't work on floats, something like:

self.assertTrue(isinstance(string_output, dict))
self.assertEqual(set(string_output.keys()), {'label', 'score'})
self.assertEqual(string_output['score'].shape, [12, 1064])
# checking only one value is probably good enough, but at least we will know if the output changes
self.assertAlmostEqual(string_output['score'][0, 0], -12.125)

    scores = sigmoid(outputs)
elif function_to_apply == "softmax":
    scores = softmax(outputs)
elif function_to_apply.lower() == "none":
Contributor:

I personally would leave out .lower() and be more strict.

else:
-   scores = np.exp(outputs) / np.exp(outputs).sum(-1, keepdims=True)
-   if self.return_all_scores:
    raise ValueError(f"Unrecognized `function_to_apply` argument: {function_to_apply}")
Contributor:

Could we add the valid choices to the error message? I don't like adding them manually; doing it programmatically is possible, but that would complicate the code overall, so I'm not sure about this.
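
A minimal sketch of the programmatic option, assuming the valid names end up collected in an enum (as the later ClassificationFunction iteration does):

valid_choices = ", ".join(f.name.lower() for f in ClassificationFunction)
raise ValueError(
    f"Unrecognized `function_to_apply` argument: {function_to_apply}. Valid choices are: {valid_choices}."
)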

Member Author:

Indeed, good idea.

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Apr 24, 2021
@laurahanu (Contributor)

Hello, I was just wondering whether there are any updates on allowing the user to choose which function to run on the outputs? It seems like, per the current documentation, the pipeline would still only run a sigmoid over the result if there is one label:

If multiple classification labels are available (model.config.num_labels >= 2), the pipeline will run a softmax over the results. If there is a single label, the pipeline will run a sigmoid over the result.

@LysandreJik (Member, Author)

Sorry for taking a long time to merge this; I wonder if we can't re-use the recently introduced problem_type instead, wdyt @sgugger @abhi1thakur?

@sgugger (Collaborator) commented Jun 7, 2021

Yes, this flag is there for this reason specifically :-)
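
For reference, a short sketch of how a model author could declare the problem type in their config so a pipeline can infer the activation (the model id below is a placeholder):

from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained("your-org/your-multi-label-model", problem_type="multi_label_classification")
model = AutoModelForSequenceClassification.from_pretrained("your-org/your-multi-label-model", config=config)
# With problem_type set, the pipeline can pick sigmoid instead of the softmax default.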

@laurahanu (Contributor)

Thanks for reopening this and having another look at it! Do you have a timeline in mind for when this would be merged/available to use with the model hub?

@LysandreJik LysandreJik force-pushed the text-classification-pipeline-raw-output branch from 56b33cc to 9e07b95 Compare June 30, 2021 16:18
@LysandreJik LysandreJik requested review from Narsil and sgugger July 1, 2021 07:14
@LysandreJik (Member, Author)

Hey @laurahanu, the PR should now be ready for merging. I'm pinging two team members for review; this should be merged in the coming days and deployed on the API shortly after.

@laurahanu (Contributor)

Great, thank you @LysandreJik!

@sgugger (Collaborator) left a comment

Thanks for working on this!

@Narsil (Contributor) left a comment

I approve this PR because I think it can go in as-is; however, I think a few changes should be made (either in this PR or later):

  • Remove the "default" option (those are usually confusing; None should be the default instead)
  • Change the tests to a more linear approach that includes actual scores (important to make sure we don't break the scores later)

elif self.model.config.problem_type == "single_label_classification":
    self.function_to_apply = "softmax"

def sigmoid(_outputs):
Contributor:

NIT: functions that can live at the file level should, in general, be defined there (it prevents accidental state leaks through variable names).

    shifted_exp = np.exp(_outputs - maxes)
    return shifted_exp / shifted_exp.sum(axis=-1, keepdims=True)

if function_to_apply == "default":
Contributor:

I think we should stay away from "default" in that case.

IMO this should be taken care of when inferring function_to_apply; that will make the logic a bit simpler to understand.

        scores = sigmoid(outputs)
    else:
        scores = softmax(outputs)
elif function_to_apply == "sigmoid":
Contributor:

NIT: we could use an enum here. It would be a bit more consistent with other pipelines (here function_to_apply is the equivalent of aggregation_strategy for token classification, for instance).

It does make the code a bit more verbose. We would still need to support raw strings as arguments, as they are much simpler to use than importing the enum from somewhere.

(I'm not sure it's worth it, feel free to ignore.)
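
A sketch of what the enum could look like (the diff further below does use a ClassificationFunction enum; the members listed here mirror the string options discussed above and are otherwise an assumption):

from enum import Enum

class ClassificationFunction(Enum):
    SIGMOID = "sigmoid"
    SOFTMAX = "softmax"
    NONE = "none"

# Raw strings stay supported because they can be mapped onto the enum by name:
function_to_apply = ClassificationFunction["sigmoid".upper()]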

# Iterate through the results
self.assertTrue(
    np.allclose(example_result_0["score"], example_result_1["score"], atol=1e-6)
)
Contributor:

I think those tests are super hard to read and to maintain. Because they contain a lot of logic, determining whether a test is correct or not is kind of hard.
Also, the expected values are not hardcoded in the tests, meaning drift in the output values would not be caught by them.

I am a proponent of tests that are much simpler in logic, even if much more verbose.

classifier = pipeline(task="sentiment-analysis", model=model, tokenizer=tokenizer)

self.assertEqual(nested_simplify(classifier("I really disagree with what you've said.")), [{'score': 0.1, 'label': "LABEL_0"}])

# other function to apply
self.assertEqual(nested_simplify(classifier("I really disagree with what you've said.", function_to_apply="sigmoid")), [{'score': 0.2, 'label': "LABEL_0"}])

# function_to_apply as a pipeline arg
classifier2 = pipeline(task="sentiment-analysis", model=model, tokenizer=tokenizer, function_to_apply="sigmoid")
self.assertEqual(nested_simplify(classifier2("I really disagree with what you've said.")), [{'score': 0.2, 'label': "LABEL_0"}])

# from config
# check that the default strategy depends on num_labels
# ...

I would be happy to make a separate PR to suggest this and stay compliant with the currently written tests.

@LysandreJik Do you agree?

@LysandreJik LysandreJik force-pushed the text-classification-pipeline-raw-output branch from f905253 to d6ea1c1 Compare July 5, 2021 13:23
    function_to_apply = ClassificationFunction.SOFTMAX

if isinstance(function_to_apply, str):
    function_to_apply = ClassificationFunction[function_to_apply.upper()]
Contributor:

Should we try/except with a clean error message, or is the current exception good enough?
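
A sketch of the try/except alternative being floated here (illustrative only):

try:
    function_to_apply = ClassificationFunction[function_to_apply.upper()]
except KeyError:
    valid = ", ".join(f.name.lower() for f in ClassificationFunction)
    raise ValueError(f"Unrecognized `function_to_apply`: {function_to_apply}. Valid choices are: {valid}.")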

Member Author:

I'm thinking the exception is good enough, but happy to update if you feel strongly

Contributor:

No, no, I think it's fine. I didn't check what the actual message was; in the previous iteration it was explicit, so I was wondering whether it was OK here. I trust you on this.

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@laurahanu (Contributor)

@LysandreJik is this ready to be merged?

@LysandreJik LysandreJik force-pushed the text-classification-pipeline-raw-output branch from d6ea1c1 to 40e163f Compare August 4, 2021 12:38
@LysandreJik LysandreJik merged commit 3f44a66 into master Aug 4, 2021
@LysandreJik LysandreJik deleted the text-classification-pipeline-raw-output branch August 4, 2021 12:42
Successfully merging this pull request may close these issues:

Disable default sigmoid function for single label classification Inference API