
MXPredReshape bug: need to reshape softmax_label #12176

Closed
wants to merge 1 commit

Conversation

chsin
Contributor

@chsin chsin commented Aug 15, 2018

Description

The fix in #11493 allows reshape only for networks that do not use SoftmaxOutput, so none of the pretrained image classification networks, e.g. Inception, can be reshaped. The current test did not catch this because it only uses Gluon blocks. SoftmaxOutput is the awkward case because it requires the label "softmax_label" to be reshaped as well, even though "softmax_label" is only an input for training and should not even be looked at during prediction, which has been mentioned in other contexts before. I don't know the best way to deal with SoftmaxOutput's label-reshape requirement because I can't find a way to get the label names. I am sending this pull request with an edit that allows ResNet and Inception to be reshaped, along with a test that covers this edge case, but I hope someone can comment on a more robust solution.
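
For reference, a minimal sketch (hypothetical symbol file name) of how the label shows up among a pretrained model's arguments when queried through the C API; listing the arguments shows that "softmax_label" is present, but nothing marks it as a label rather than a real input:

  // Sketch using the MXNet C API; error handling omitted and the file name is
  // only an example. A SoftmaxOutput head adds "softmax_label" to the
  // argument list even though prediction never feeds it.
  #include <mxnet/c_api.h>
  #include <cstdio>

  int main() {
    SymbolHandle sym = nullptr;
    MXSymbolCreateFromFile("Inception-BN-symbol.json", &sym);

    mx_uint num_args = 0;
    const char** arg_names = nullptr;
    MXSymbolListArguments(sym, &num_args, &arg_names);

    // Prints "data", the weight/bias arrays, and finally "softmax_label".
    for (mx_uint i = 0; i < num_args; ++i) {
      std::printf("%s\n", arg_names[i]);
    }
    MXSymbolFree(sym);
    return 0;
  }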

I found this issue when I made the following edits to image-classification-predict.cc:223. Without the fix, the snippet below prints -1; I traced the failure to c_predict_api.cc:304, where newShape.Size() != arr.shape().Size() when arg_names[i] is softmax_label:

  // Create a predictor with the original batch size of 2. json_data,
  // param_data, input_keys, input_shape_indptr, etc. are set up earlier in
  // image-classification-predict.cc.
  PredictorHandle old_hnd = nullptr;
  const mx_uint old_input_shape_data[4] = { 2,
                                            static_cast<mx_uint>(channels),
                                            static_cast<mx_uint>(height),
                                            static_cast<mx_uint>(width) };
  MXPredCreate(static_cast<const char*>(json_data.GetBuffer()),
               static_cast<const char*>(param_data.GetBuffer()),
               static_cast<int>(param_data.GetLength()),
               dev_type,
               dev_id,
               num_input_nodes,
               input_keys,
               input_shape_indptr,
               old_input_shape_data,
               &old_hnd);
  assert(old_hnd);

  // Reshape the predictor to the new input shape (input_shape_data holds the
  // original batch-size-1 shape set up earlier in the file). This is the call
  // that fails for models with SoftmaxOutput.
  int e = MXPredReshape(num_input_nodes,
                        input_keys,
                        input_shape_indptr,
                        input_shape_data,
                        old_hnd,
                        &pred_hnd);
  printf("%d\n", e);
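
A small follow-up, assuming the same variables remain in scope: the nonzero return can be turned into a readable message with MXGetLastError(), and the old handle should be freed once the reshape succeeds.

  // Sketch only: report the MXPredReshape failure and clean up on success.
  if (e != 0) {
    printf("MXPredReshape failed: %s\n", MXGetLastError());
  } else {
    MXPredFree(old_hnd);  // keep only the reshaped predictor, pred_hnd
  }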

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

@chsin chsin requested a review from anirudh2290 as a code owner August 15, 2018 01:52
@chinakook
Contributor

In the inference stage, there is no need to use labels or SoftmaxOutput.

@chinakook
Contributor

You can write a script to deploy the symbol by replacing SoftmaxOutput with SoftmaxActivation.
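
A rough sketch of that kind of deploy script, written against the MXNet C API (the file names and the "fc1_output" layer name are hypothetical and model specific). This version only strips the SoftmaxOutput head, which is enough to drop the softmax_label input; a plain softmax or SoftmaxActivation can then be applied to the logits if probabilities are needed:

  #include <mxnet/c_api.h>
  #include <cstring>
  #include <cstdio>

  int main() {
    SymbolHandle sym = nullptr, internals = nullptr, deploy = nullptr;
    MXSymbolCreateFromFile("Inception-BN-symbol.json", &sym);

    // get_internals() exposes every intermediate output of the graph.
    MXSymbolGetInternals(sym, &internals);

    mx_uint num_outputs = 0;
    const char** output_names = nullptr;
    MXSymbolListOutputs(internals, &num_outputs, &output_names);

    // Pick the layer that feeds SoftmaxOutput; the name is model specific.
    for (mx_uint i = 0; i < num_outputs; ++i) {
      if (std::strcmp(output_names[i], "fc1_output") == 0) {
        MXSymbolGetOutput(internals, i, &deploy);
        break;
      }
    }
    if (deploy != nullptr) {
      // The saved symbol no longer lists softmax_label as an argument.
      MXSymbolSaveToFile(deploy, "Inception-BN-deploy-symbol.json");
      std::printf("wrote deploy symbol without SoftmaxOutput\n");
    }
    MXSymbolFree(sym);
    return 0;
  }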

@chsin
Contributor Author

chsin commented Aug 15, 2018

@chinakook, to predict using the C API you use MXPredCreate to load the pretrained symbol and weights, and this works fine for all the MXNet pretrained models that use SoftmaxOutput. The problem is that MXPredReshape then fails even though the model loaded properly. This breaks reshape for all the pretrained MXNet models (almost all of them use SoftmaxOutput) even though MXPredCreate is fine, which is not an acceptable situation. I don't quite understand your comment, but you seem to be suggesting that this inconsistency is acceptable coding practice and it is the user who needs to clean up this mess by writing a script?

@lupesko
Contributor

lupesko commented Aug 28, 2018

Thanks for identifying the issue and filing this PR, @chsin.
Looping in additional folks to review: @apeforest @anirudh2290 @eric-haibin-lin

@stu1130
Contributor

stu1130 commented Sep 18, 2018

Bumping again: @apeforest @anirudh2290 @eric-haibin-lin

@vandanavk
Contributor

@apeforest @anirudh2290 @eric-haibin-lin ping for review

@vandanavk
Contributor

@mxnet-label-bot [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Sep 28, 2018
@@ -304,7 +304,8 @@ int MXPredReshape(mx_uint num_input_nodes,
   for (size_t i=0; i < arg_names.size(); ++i) {
     TShape newShape = arg_shapes[i];
     NDArray &arr = p->arg_arrays[i];
-    if (new_shape.count(arg_names[i]) != 0) {
+    if (new_shape.count(arg_names[i]) != 0 ||
+        strcmp("softmax_label", arg_names[i].c_str()) == 0) {
Contributor

@apeforest apeforest Sep 28, 2018

Still trying to understand the problem, but this hack is not acceptable purely from a coding-practice standpoint.

@Roshrini
Member

@chsin Can you please address review comments? Thanks!

@ankkhedia
Contributor

@chsin ping! Could you please address the comments? Thanks!

@kalyc
Contributor

kalyc commented Nov 12, 2018

@chsin Thanks for your contribution! We would like to take the process of merging this PR forward - could you please address comments above?

@stu1130
Contributor

stu1130 commented Nov 20, 2018

@mxnet-label-bot update [pr-awaiting-response]
@chsin let us know if you need any help!

@marcoabreu marcoabreu added pr-awaiting-response PR is reviewed and waiting for contributor to respond and removed pr-awaiting-review PR is waiting for code review labels Nov 20, 2018
@vandanavk
Contributor

adding @Vikas89 for review

@roywei
Member

roywei commented Dec 11, 2018

@chsin Thanks for the contribution. It looks like our CI hung; could you trigger it by pushing an empty commit?

@sandeep-krishnamurthy
Contributor

@chsin - Can you please rebase and retrigger CI?

@Roshrini
Member

Roshrini commented Jan 8, 2019

@anirudh2290 Can you review this PR?

@stu1130
Contributor

stu1130 commented Jan 15, 2019

@anirudh2290 ping for review. @chsin could you rebase and retrigger CI? Thanks!

@sandeep-krishnamurthy
Contributor

@chsin - Thanks for your contributions. As @apeforest mentioned, this change does not look generalizable. What happens if the label is not named 'softmax_label'? In the Gluon case, for example, it may be prefixed with the name_scope.

@chsin chsin closed this Jan 31, 2019