This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-542] Fix mkldnn performance regression + improve test logging #11262

Merged: 46 commits, Jun 18, 2018

azai91
Contributor

@azai91 azai91 commented Jun 13, 2018

Description

  • Fix the MKLDNN performance regression.
  • CreateMKLDNNMem now accepts the input array as a parameter so it can check whether WriteInPlace is valid.
  • Improve test logging.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Do not create a temporary copy of memory during activation unless the input and output shapes mismatch.

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@azai91 azai91 force-pushed the fix/mkldnn-performance branch from 8f89758 to becedcc Compare June 14, 2018 05:29
@azai91 azai91 changed the title [MXNET-542] Fix mkldnn performance regression [MXNET-542] Fix mkldnn performance regression + unit test for CreateMKLDNNMem Jun 14, 2018
@azai91 azai91 force-pushed the fix/mkldnn-performance branch 2 times, most recently from ee981f8 to 3f7e954 Compare June 14, 2018 19:50
@azai91 azai91 changed the title [MXNET-542] Fix mkldnn performance regression + unit test for CreateMKLDNNMem [MXNET-542] Fix mkldnn performance regression + improve test logging Jun 14, 2018
@azai91 azai91 force-pushed the fix/mkldnn-performance branch from 42dd59a to 4ec8698 Compare June 14, 2018 22:57
@azai91
Contributor Author

azai91 commented Jun 15, 2018

This diff currently fails because of a test that was added after my initial merge (the merge that caused the regression). The test compares the output of hybridized and non-hybridized networks and checks that the results are roughly equal. Before that merge the code was performant, but the hybridized and non-hybridized results differed, which is what makes this new test fail.

@azai91 azai91 force-pushed the fix/mkldnn-performance branch 4 times, most recently from 1cceb3c to 1fa9e35 Compare June 15, 2018 05:32
@azai91
Contributor Author

azai91 commented Jun 15, 2018

Benchmarks as of now:
Current master branch

ubuntu@ip-172-31-11-93:~/incubator-mxnet-original$ python example/image-classification/benchmark_score.py
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
INFO:root:network: resnet-50
INFO:root:device: cpu(0)
/home/ubuntu/incubator-mxnet-original/python/mxnet/module/base_module.py:66: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[05:01:20] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 9437184 bytes with malloc directly
INFO:root:batch size  1, image/sec: 15.918593
INFO:root:batch size  2, image/sec: 27.955709
INFO:root:batch size  4, image/sec: 35.017213
INFO:root:batch size  8, image/sec: 34.267697
INFO:root:batch size 16, image/sec: 37.663340
INFO:root:batch size 32, image/sec: 41.455577

This PR

ubuntu@ip-172-31-11-93:~/incubator-mxnet-reg$ python example/image-classification/benchmark_score.py
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Assertion failure at kmp_runtime.cpp(3767): __kmp_gtid_get_specific() == gtid.
Assertion failure at kmp_runtime.cpp(2582): __kmp_init_serial.
Assertion failure at kmp_runtime.cpp(3767): __kmp_gtid_get_specific() == gtid.
Assertion failure at kmp_runtime.cpp(3767): __kmp_gtid_get_specific() == gtid.
INFO:root:network: resnet-50
INFO:root:device: cpu(0)
/home/ubuntu/incubator-mxnet-reg/python/mxnet/module/base_module.py:66: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[05:36:17] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 147456 bytes with malloc directly
[05:36:17] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 589824 bytes with malloc directly
[05:36:17] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 2359296 bytes with malloc directly
[05:36:17] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 9437184 bytes with malloc directly
INFO:root:batch size  1, image/sec: 16.331885
INFO:root:batch size  2, image/sec: 22.146912
INFO:root:batch size  4, image/sec: 29.284689
INFO:root:batch size  8, image/sec: 47.227163
INFO:root:batch size 16, image/sec: 50.323157
INFO:root:batch size 32, image/sec: 51.729604
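
To make the two runs above easier to compare, the per-batch-size ratio can be computed directly from the logged numbers. This is only a small sketch: the array and function names are made up for illustration, and the values are copied from the two logs above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Throughput (images/sec) copied from the two benchmark logs above.
// Entries are for batch sizes 1, 2, 4, 8, 16, 32, in that order.
constexpr std::array<double, 6> kMaster = {15.918593, 27.955709, 35.017213,
                                           34.267697, 37.663340, 41.455577};
constexpr std::array<double, 6> kThisPr = {16.331885, 22.146912, 29.284689,
                                           47.227163, 50.323157, 51.729604};

// Ratio of this PR's throughput to master's for the i-th batch size.
// A value above 1 means this PR is faster for that batch size.
inline double Speedup(std::size_t i) {
  assert(i < kMaster.size());
  return kThisPr[i] / kMaster[i];
}
```

On these numbers the ratio is above 1 for batch sizes 1, 8, 16 and 32, and below 1 for batch sizes 2 and 4.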

@azai91
Contributor Author

azai91 commented Jun 15, 2018

Benchmark from commit 9514a1e:

ubuntu@ip-172-31-11-93:~/incubator-mxnet-original$ python example/image-classification/benchmark_score.py
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
INFO:root:network: resnet-50
INFO:root:device: cpu(0)
/home/ubuntu/incubator-mxnet-original/python/mxnet/module/base_module.py:66: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[05:50:11] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 147456 bytes with malloc directly
[05:50:11] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 589824 bytes with malloc directly
[05:50:11] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 2359296 bytes with malloc directly
[05:50:11] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 9437184 bytes with malloc directly
INFO:root:batch size  1, image/sec: 22.502343
INFO:root:batch size  2, image/sec: 31.937269
INFO:root:batch size  4, image/sec: 41.011916
INFO:root:batch size  8, image/sec: 47.140644
INFO:root:batch size 16, image/sec: 50.467947
INFO:root:batch size 32, image/sec: 51.507261
ubuntu@ip-172-31-11-93:~/incubator-mxnet-original$ git log
commit 9514a1e39f8356f8fee6202cd86c8f20fbf301b6
Author: kpmurali <[email protected]>
Date:   Tue May 29 17:36:35 2018 -0700

    Fixing the xml markup (#11068)

@azai91
Contributor Author

azai91 commented Jun 15, 2018

@pengzhao-intel please review

@pengzhao-intel
Contributor

Thanks @azai91. Could you elaborate on the root cause of the regression and how you fixed it?

@azai91
Contributor Author

azai91 commented Jun 15, 2018

@pengzhao-intel the root cause was that the new diff (https://github.com/apache/incubator-mxnet/pull/11026/files#diff-21f97d54b22ca65a086fe9e13c217453R169) always created new tmp memory for WriteTo operations. I added a check in CreateMKLDNNMem so that new memory is only created when the formats are incompatible.
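
To illustrate the idea, here is a simplified sketch of that decision. It uses stand-in types and names (Req, OutOp, Buffer, ChooseOutputOp are all hypothetical) and is not the code from this PR or the MXNet/MKL-DNN API; in-place requests are left out because they are handled separately.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; these are NOT the MXNet or MKL-DNN types.
enum class Req { kWriteTo, kAddTo };
enum class OutOp { Noop, CopyBack, AddBack };

struct Buffer {
  std::string format;  // stand-in for a memory layout descriptor
};

// Decide what has to happen after the operator has produced its result.
// A temporary buffer plus a copy is only needed when the output buffer
// cannot hold the operator's result format directly.
inline OutOp ChooseOutputOp(const Buffer &out, const std::string &op_format,
                            Req req) {
  if (req == Req::kAddTo) return OutOp::AddBack;    // accumulate via a temp buffer
  if (out.format == op_format) return OutOp::Noop;  // write straight into the output
  return OutOp::CopyBack;                           // formats differ: temp buffer, then copy
}
```

The regression corresponds to always taking the last branch for WriteTo, even when the formats already match.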

@pengzhao-intel
Contributor

Sounds good. @TaoLv please help take a look for the code changes.

@azai91
Contributor Author

azai91 commented Jun 15, 2018

I have a separate PR coming where I write unit tests for CreateMKLDNNMem/CommitOutput.

@@ -327,7 +327,8 @@ typedef std::pair<OutDataOp, mkldnn::memory *> mkldnn_output_t;
  * If these two functions are used, we have to call CommitOutput to write
  * the output back to the output NDArray.
  */
-mkldnn_output_t CreateMKLDNNMem(const NDArray &arr,
+mkldnn_output_t CreateMKLDNNMem(const NDArray &out_arr,
Contributor

Can you create a separate function for this, so we don't need to modify all the mkldnn operators?

Contributor Author

I thought about that, but technically shouldn't all operators be eligible to use WriteInPlace?

Member

+1 for overriding this function.

Contributor

@zheng-da zheng-da Jun 16, 2018

@azai91 not really. WriteInplace only works in a few operators (sum, activation, batchnorm). It definitely doesn't work for operators such as convolution and pooling, because their inputs and outputs don't have the same shape.

If you prefer to use one function, please update the comment to explain why CreateMKLDNNMem needs in_arr. Also, please rename the arr parameter of CreateMKLDNNWeightGrad to out_arr, so that CreateMKLDNNMem and CreateMKLDNNWeightGrad stay consistent.

 return mkldnn_output_t(OutDataOp::CopyBack, tmp);
-} else {
+} else if (req == kWriteInplace) {
+if (CanWriteTo(out_arr, in_arrs[0], desc)) {
Contributor

you pass all inputs to this function but only use the first one?

const mkldnn::memory::primitive_desc &desc) {
bool add_same = in_arr.GetMKLDNNData()->get_data_handle() ==
out_arr.GetMKLDNNData()->get_data_handle();
bool pdesc_same = out_arr.GetMKLDNNData()->get_primitive_desc() == desc;
Contributor

maybe you should compare the primitive descriptor of both input and output NDArrays

Contributor Author

We can only compare with the first input.

Contributor

but you are not comparing with the first input. desc can be different from in_arr's descriptor, right?
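
As a rough illustration of the check being discussed in this thread, the sketch below allows an in-place write only when the input and output share the same data handle and both of their descriptors match the descriptor the operator expects. Desc, Array and CanWriteInPlace are hypothetical stand-ins, not the MXNet or MKL-DNN types, and the real CanWriteTo in this PR may differ in detail.

```cpp
#include <cassert>

// Hypothetical stand-in for a primitive descriptor (layout + shape + dtype).
struct Desc {
  int layout_id;
  bool operator==(const Desc &other) const { return layout_id == other.layout_id; }
};

// Hypothetical stand-in for an array backed by some memory.
struct Array {
  const void *handle;  // address of the underlying data
  Desc desc;           // how that data is laid out
};

// In-place is only safe if the output really is the input's memory and
// neither side needs a reorder to match what the operator will produce.
inline bool CanWriteInPlace(const Array &out, const Array &in, const Desc &op_desc) {
  const bool same_address = in.handle == out.handle;
  const bool descs_match = in.desc == op_desc && out.desc == op_desc;
  return same_address && descs_match;
}
```

Checking only the output descriptor against op_desc would miss the case raised above, where the input's own descriptor differs from the one the operator uses.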

@@ -92,7 +92,7 @@ void MKLDNNSumForward(const nnvm::NodeAttrs& attrs, const OpContext &ctx,
 } else {
 // req == kWriteInplace but cannot be handled by mkldnn and
 // req == kAddTo will run into this branch
-auto mem = CreateMKLDNNMem(out_data, pdesc.dst_primitive_desc(), req);
+auto mem = CreateMKLDNNMem(out_data, inputs, pdesc.dst_primitive_desc(), req);
Contributor

you remove the code above that handles WriteInplace differently.

@@ -722,6 +731,27 @@ void TestBinaryOp(const OpAttrs &attrs, VerifyFunc verify_fn) {
}
}

TEST(MKLDNN_NDArray, CopyFrom) {
Contributor

Any modifications to this test? Can you move it back to its original place so the difference shows more clearly?

@azai91 azai91 force-pushed the fix/mkldnn-performance branch from 6aa538f to 7f9dac4 Compare June 15, 2018 17:57
@azai91
Contributor Author

azai91 commented Jun 15, 2018

@pengzhao-intel for in-place writes, is there any reason why we only compare the pdesc and data_handle of the first input and not all of them? Does the mkldnn write-in-place API only work if the tensor to be written over is the first one?

https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_sum.cc#L76

@azai91 azai91 force-pushed the fix/mkldnn-performance branch 3 times, most recently from 027c620 to 873a2c6 Compare June 15, 2018 19:08
@azai91
Contributor Author

azai91 commented Jun 15, 2018

I ran a test and this seems to be the case: write in place (at least for sum) requires that the tensor to be written over is at index 0. @pengzhao-intel please verify.

@azai91 azai91 force-pushed the fix/mkldnn-performance branch 4 times, most recently from bb31c0b to 83ce4f0 Compare June 15, 2018 23:53
@azai91 azai91 force-pushed the fix/mkldnn-performance branch 2 times, most recently from 27218eb to f4b7db8 Compare June 17, 2018 17:49
@azai91 azai91 force-pushed the fix/mkldnn-performance branch from f4b7db8 to 76b50bc Compare June 17, 2018 17:57
-mkldnn::memory *mem = const_cast<NDArray &>(arr).CreateMKLDNNData(desc);
-if (mem == nullptr) {
+} else if (req == kWriteInplace && in_arr != nullptr && CanWriteTo(out_arr, *in_arr, desc)) {
+mkldnn::memory *mem = const_cast<NDArray &>(out_arr).CreateMKLDNNData(desc);
Contributor

If the output data is a view and the required format is an MKLDNN format, CreateMKLDNNData can return null.

Contributor

everything else looks good

Contributor Author

Currently we have this check before calling CreateMKLDNNData, but I will add a CHECK so that developers know to reorder2default beforehand.

https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_act.cc#L164

Contributor

This is different. CreateMKLDNNData is called on output arrays, whereas the check you added in activation is on input arrays.

Contributor Author

Aren't the input and output arrays the same for kWriteInplace?

Contributor

i see. you are right.
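
The guard discussed in this thread can be sketched as follows. This is a hedged illustration with hypothetical stand-in types (Memory, Array, CreateData, CreateDataChecked): the rule that a view refuses a layout change is only modeled on the comment above, and the real code would use a CHECK rather than an exception.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a chunk of memory with a layout.
struct Memory {
  int layout_id;
};

// Hypothetical stand-in for an array that may be a view of another array.
struct Array {
  bool is_view;
  Memory storage;

  // Returns nullptr when the array is a view and a different layout is
  // requested, mirroring the situation described in the review comment.
  Memory *CreateData(int layout_id) {
    assert(layout_id >= 0);
    if (is_view && layout_id != storage.layout_id) return nullptr;
    storage.layout_id = layout_id;
    return &storage;
  }
};

// Fail loudly instead of handing a null pointer to the caller.
inline Memory *CreateDataChecked(Array &arr, int layout_id) {
  Memory *mem = arr.CreateData(layout_id);
  if (mem == nullptr) {
    throw std::runtime_error("output is a view; reorder it to the default layout first");
  }
  return mem;
}
```

Failing at the call site makes the precondition visible to whoever writes the next operator, instead of surfacing later as a null dereference.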

@azai91 azai91 force-pushed the fix/mkldnn-performance branch from 2b3bd3a to d34a50b Compare June 17, 2018 20:30
@piiswrong piiswrong merged commit 92fde19 into apache:master Jun 18, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
…pache#11262)

* do not create tmp memory during act
* fix order of alloc memory
* fix conditional
* fix order
* do not pass nullptr to commit
* fix comment
* do not create tmp mem unless shapes diff
* fix params
* always return in CreateMKLDNNMem
* add boilerplate for CreateMKLDNNMem test
* refactor copyfrom
* use copyfrom helper in tests
* add logs
* missing semi
* improve print msg
* target out_mem
* test copy from
* reuse verify copy
* add inplace test / use sum for test
* use assert in sum verify
* lint
* remove unused var
* fix test messsage
* out_mem can be null
* Revert "refactor copyfrom"
  This reverts commit 4ab131e.
* add back missing var
* writeinplace explicitly returns same memory
* refactor
* only writeinplace if add and pdesc are eq
* fix comparison
* add second CreateMKLDNNMemory
* CreateMKLDNNMem accepts input
* refactor WriteTo criteria into separate method
* fix lint
* copyfrom test back
* update mldnnsum test to have diff inputs for write in place
* test in place sum with diff arrs
* revert CreateMKLDNNMem extra param change
* pass input arr param for act_forward
* remove extra header
* fix indent
* add check for writeto
* canwriteto uses ref instead of ptr
* update comments for CreateMKLDNNData
* compare input and output desc with op pdesc
* check CreateMKLDNNData does not return null
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
…pache#11262)
