[MXNET-542] Fix mkldnn performance regression + improve test logging #11262

azai91 · 2018-06-13T18:06:54Z

Description

Fix MKLDNN performance issue
CreateMKLDNNMem accepts the input array as a parameter to check whether WriteInPlace is valid
improve test logging

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Do not create temp copy of memory during activation unless input/output shapes mismatch.

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

azai91 · 2018-06-15T00:03:19Z

This diff fails because of a test that was added after my initial merge, the one that caused the regression. The test compares the outputs of the hybridized and non-hybridized networks and checks that they are roughly equal. Before that merge the code was performant, but the hybridized and non-hybridized results differed, which is what makes this new test fail.

azai91 · 2018-06-15T05:38:48Z

Benchmarks as of now:
Current master branch

ubuntu@ip-172-31-11-93:~/incubator-mxnet-original$ python example/image-classification/benchmark_score.py
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
INFO:root:network: resnet-50
INFO:root:device: cpu(0)
/home/ubuntu/incubator-mxnet-original/python/mxnet/module/base_module.py:66: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[05:01:20] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 9437184 bytes with malloc directly
INFO:root:batch size  1, image/sec: 15.918593
INFO:root:batch size  2, image/sec: 27.955709
INFO:root:batch size  4, image/sec: 35.017213
INFO:root:batch size  8, image/sec: 34.267697
INFO:root:batch size 16, image/sec: 37.663340
INFO:root:batch size 32, image/sec: 41.455577

This PR

ubuntu@ip-172-31-11-93:~/incubator-mxnet-reg$ python example/image-classification/benchmark_score.py
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Assertion failure at kmp_runtime.cpp(3767): __kmp_gtid_get_specific() == gtid.
Assertion failure at kmp_runtime.cpp(2582): __kmp_init_serial.
Assertion failure at kmp_runtime.cpp(3767): __kmp_gtid_get_specific() == gtid.
Assertion failure at kmp_runtime.cpp(3767): __kmp_gtid_get_specific() == gtid.
INFO:root:network: resnet-50
INFO:root:device: cpu(0)
/home/ubuntu/incubator-mxnet-reg/python/mxnet/module/base_module.py:66: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[05:36:17] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 147456 bytes with malloc directly
[05:36:17] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 589824 bytes with malloc directly
[05:36:17] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 2359296 bytes with malloc directly
[05:36:17] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 9437184 bytes with malloc directly
INFO:root:batch size  1, image/sec: 16.331885
INFO:root:batch size  2, image/sec: 22.146912
INFO:root:batch size  4, image/sec: 29.284689
INFO:root:batch size  8, image/sec: 47.227163
INFO:root:batch size 16, image/sec: 50.323157
INFO:root:batch size 32, image/sec: 51.729604

azai91 · 2018-06-15T05:51:24Z

Benchmark from diff 9514a1e

ubuntu@ip-172-31-11-93:~/incubator-mxnet-original$ python example/image-classification/benchmark_score.py
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
INFO:root:network: resnet-50
INFO:root:device: cpu(0)
/home/ubuntu/incubator-mxnet-original/python/mxnet/module/base_module.py:66: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[05:50:11] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 147456 bytes with malloc directly
[05:50:11] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 589824 bytes with malloc directly
[05:50:11] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 2359296 bytes with malloc directly
[05:50:11] ../src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 9437184 bytes with malloc directly
INFO:root:batch size  1, image/sec: 22.502343
INFO:root:batch size  2, image/sec: 31.937269
INFO:root:batch size  4, image/sec: 41.011916
INFO:root:batch size  8, image/sec: 47.140644
INFO:root:batch size 16, image/sec: 50.467947
INFO:root:batch size 32, image/sec: 51.507261
ubuntu@ip-172-31-11-93:~/incubator-mxnet-original$ git log
commit 9514a1e39f8356f8fee6202cd86c8f20fbf301b6
Author: kpmurali <[email protected]>
Date:   Tue May 29 17:36:35 2018 -0700

    Fixing the xml markup (#11068)
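
For a quick read of the three runs above, the relative change at each batch size can be computed from the logged image/sec values. The following is a minimal sketch in C++: the numbers are copied from the logs above, while `RelativeChangePercent` and the array names are hypothetical and only serve this illustration.

```cpp
#include <array>
#include <cassert>

// Throughput in image/sec copied from the benchmark logs above,
// for batch sizes 1, 2, 4, 8, 16, 32 (in that order).
constexpr std::array<double, 6> kMaster = {
    15.918593, 27.955709, 35.017213, 34.267697, 37.663340, 41.455577};
constexpr std::array<double, 6> kThisPr = {
    16.331885, 22.146912, 29.284689, 47.227163, 50.323157, 51.729604};
constexpr std::array<double, 6> kPreRegression = {
    22.502343, 31.937269, 41.011916, 47.140644, 50.467947, 51.507261};

// Hypothetical helper: percentage change of `candidate` relative to
// `baseline`. Positive means the candidate is faster.
double RelativeChangePercent(double baseline, double candidate) {
  assert(baseline > 0.0);
  return (candidate - baseline) / baseline * 100.0;
}
```

With these inputs, batch size 32 comes out at roughly +25% for this PR relative to current master and within about 0.5% of the 9514a1e run, while batch size 2 is about 21% lower than master. These are single runs, so small differences should be read with caution.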

azai91 · 2018-06-15T05:52:51Z

@pengzhao-intel please review

pengzhao-intel · 2018-06-15T05:59:31Z

Thanks @azai91. Could you elaborate on the root cause of the regression and how you fixed it?

azai91 · 2018-06-15T06:23:36Z

@pengzhao-intel the root cause of the issue was that we were always creating new tmp memory for WriteTo operations in the new diff (https://github.com/apache/incubator-mxnet/pull/11026/files#diff-21f97d54b22ca65a086fe9e13c217453R169). I added a check in CreateMKLDNNMem to not create new memory unless the formats are incompatible.
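
The idea described above can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not the code from this PR: `FakeArray`, `ChooseOutput`, the `Req` enum, and the integer `format` tag are hypothetical stand-ins for `NDArray`, `CreateMKLDNNMem`, the request type, and the MKLDNN primitive descriptor. The `Noop` and `AddBack` values are assumed here alongside the `CopyBack` value that appears in the diff.

```cpp
#include <cassert>

// Simplified stand-ins. The real code works with NDArray and
// mkldnn::memory::primitive_desc, which are not reproduced here.
enum class Req { kWriteTo, kWriteInplace, kAddTo };
enum class OutDataOp { Noop, CopyBack, AddBack };

struct FakeArray {
  int format;  // hypothetical tag standing in for a memory descriptor
};

// Decide whether an operator can write straight into the output buffer
// (Noop) or needs a temporary buffer that is merged back afterwards.
OutDataOp ChooseOutput(const FakeArray &out, int required_format, Req req) {
  if (req == Req::kAddTo) {
    // Accumulation: compute into a temporary, then add it to the output.
    return OutDataOp::AddBack;
  }
  if (out.format == required_format) {
    // Formats are compatible: no temporary buffer and no extra copy.
    return OutDataOp::Noop;
  }
  // Formats differ: compute into a temporary, then copy it back.
  return OutDataOp::CopyBack;
}
```

In this sketch, the regression corresponds to taking the `CopyBack` path for every `kWriteTo` request, and the fix corresponds to the format check that allows the `Noop` path.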

pengzhao-intel · 2018-06-15T06:50:49Z

Sounds good. @TaoLv please help take a look at the code changes.

azai91 · 2018-06-15T16:53:54Z

I have a separate PR coming where I write unit tests for CreateMKLDNNMem/CommitOutput.

zheng-da · 2018-06-15T17:30:25Z

src/operator/nn/mkldnn/mkldnn_base-inl.h

@@ -327,7 +327,8 @@ typedef std::pair<OutDataOp, mkldnn::memory *> mkldnn_output_t;
 * If these two functions are used, we have to call CommitOutput to write
 * the output back to the output NDArray.
 */
-mkldnn_output_t CreateMKLDNNMem(const NDArray &arr,
+mkldnn_output_t CreateMKLDNNMem(const NDArray &out_arr,


can you create a separate function for this? so we don't need to modify all mkldnn operators.

I thought about that, but technically shouldn't all operators be eligible to use WriteInPlace?

+1 for overriding this function.

@azai91 not really. WriteInplace only works in a few operators (sum, activation, batchnorm). It doesn't work on some operators such as convolution and pooling for sure because the inputs and outputs of these operators don't have the same shape.

If you prefer to use one function, please update the comment and explain why CreateMKLDNNMem needs in_arr. Meanwhile, please rename arr of CreateMKLDNNWeightGrad to out_arr, so CreateMKLDNNMem and CreateMKLDNNWeightGrad are still consistent.
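
The shape argument above can be illustrated with a small sketch. The helpers below are hypothetical and not part of MXNet; the convolution formula assumes stride 1 and no padding, chosen only to show that such an operator changes the shape while an elementwise operator does not.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Shape = std::vector<std::int64_t>;

// Output shape of a stride-1, unpadded 2D convolution on an NCHW input.
// Hypothetical helper for illustration only.
Shape ConvOutputShape(const Shape &in, std::int64_t num_filter,
                      std::int64_t kernel) {
  assert(in.size() == 4 && in[2] >= kernel && in[3] >= kernel);
  return {in[0], num_filter, in[2] - kernel + 1, in[3] - kernel + 1};
}

// Elementwise operators such as activation keep the input shape.
Shape ElementwiseOutputShape(const Shape &in) { return in; }

// Sharing one buffer between input and output only makes sense when
// both have the same shape.
bool ShapesAllowInplace(const Shape &in, const Shape &out) {
  return in == out;
}
```

For a 1x3x224x224 input, the elementwise case keeps the shape and passes the check, whereas a 3x3 convolution with 64 filters yields 1x64x222x222 and fails it.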

zheng-da · 2018-06-15T17:35:07Z

src/operator/nn/mkldnn/mkldnn_base.cc

-      return mkldnn_output_t(OutDataOp::CopyBack, tmp);
-    } else {
+  } else if (req == kWriteInplace) {
+    if (CanWriteTo(out_arr, in_arrs[0], desc)) {


you pass all inputs to this function but only use the first one?

zheng-da · 2018-06-15T17:35:40Z

src/operator/nn/mkldnn/mkldnn_base.cc

+                const mkldnn::memory::primitive_desc &desc) {
+  bool add_same = in_arr.GetMKLDNNData()->get_data_handle() ==
+      out_arr.GetMKLDNNData()->get_data_handle();
+  bool pdesc_same = out_arr.GetMKLDNNData()->get_primitive_desc() == desc;


maybe you should compare the primitive descriptor of both input and output NDArrays

can only compare with first input

but you are not comparing with the first input. desc can be different from in_arr's descriptor, right?
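
The point raised in this thread, that both the input and the output descriptor should be compared with the operator's descriptor, can be sketched as follows. This is an assumption-laden illustration rather than the PR's `CanWriteTo`: `FakeMem` and `CanWriteInplace` are hypothetical names, and an integer tag stands in for `mkldnn::memory::primitive_desc`.

```cpp
#include <cassert>

// Hypothetical stand-in for an array's MKLDNN memory: a raw data pointer
// plus an integer tag in place of a primitive descriptor.
struct FakeMem {
  const void *handle;
  int desc;
};

// In-place writing is treated as valid only if input and output share
// the same buffer and both already use the layout the operator expects.
bool CanWriteInplace(const FakeMem &out, const FakeMem &in, int op_desc) {
  const bool addr_same = in.handle == out.handle;
  const bool desc_same = in.desc == op_desc && out.desc == op_desc;
  return addr_same && desc_same;
}
```

Comparing only the output descriptor with `op_desc` would accept an input that shares the buffer but is described with a different layout, which is the case the review comment points out.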

zheng-da · 2018-06-15T17:36:57Z

src/operator/nn/mkldnn/mkldnn_sum.cc

@@ -92,7 +92,7 @@ void MKLDNNSumForward(const nnvm::NodeAttrs& attrs, const OpContext &ctx,
  } else {
    // req == kWriteInplace but cannot be handled by mkldnn and
    // req == kAddTo will run into this branch
-    auto mem = CreateMKLDNNMem(out_data, pdesc.dst_primitive_desc(), req);
+    auto mem = CreateMKLDNNMem(out_data, inputs, pdesc.dst_primitive_desc(), req);


you remove the code above that handles WriteInplace differently.

zheng-da · 2018-06-15T17:38:11Z

tests/cpp/operator/mkldnn.cc

@@ -722,6 +731,27 @@ void TestBinaryOp(const OpAttrs &attrs, VerifyFunc verify_fn) {
  }
 }

+TEST(MKLDNN_NDArray, CopyFrom) {


any modification for this test? Can you move it back to its original place to show the difference more clearly.

azai91 · 2018-06-15T18:50:08Z

@pengzhao-intel for in-place writes, is there any reason why we only compare the pdesc and data_handle of the first input and not all of them? Does the write-in-place mkldnn API only work if the vector to be written over is the first one?

https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_sum.cc#L76

azai91 · 2018-06-15T19:13:40Z

Ran a test and it seems this is the case - write in place (at least for sum) requires that the tensor to be written over is at index 0. @pengzhao-intel please verify.

zheng-da · 2018-06-17T18:23:25Z

src/operator/nn/mkldnn/mkldnn_base.cc

-    mkldnn::memory *mem = const_cast<NDArray &>(arr).CreateMKLDNNData(desc);
-    if (mem == nullptr) {
+  } else if (req == kWriteInplace && in_arr != nullptr && CanWriteTo(out_arr, *in_arr, desc)) {
+      mkldnn::memory *mem = const_cast<NDArray &>(out_arr).CreateMKLDNNData(desc);


if output data is a view and the required format is MKLDNN format, CreateMKLDNNData can return null.

everything else looks good

currently we have this check before calling CreateMKLDNNData, but I will add a CHECK to make sure the developer knows to reorder2default beforehand.

https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_act.cc#L164

this is different. CreateMKLDNNData is called on output arrays. The check you added in activation is on input arrays.

the input and output arrays are the same for kWriteInPlace?

i see. you are right.

…pache#11262)
* do not create tmp memory during act
* fix order of alloc memory
* fix conditional
* fix order
* do not pass nullptr to commit
* fix comment
* do not create tmp mem unless shapes diff
* fix params
* always return in CreateMKLDNNMem
* add boilerplate for CreateMKLDNNMem test
* refactor copyfrom
* use copyfrom helper in tests
* add logs
* missing semi
* improve print msg
* target out_mem
* test copy from
* reuse verify copy
* add inplace test / use sum for test
* use assert in sum verify
* lint
* remove unused var
* fix test messsage
* out_mem can be null
* Revert "refactor copyfrom" This reverts commit 4ab131e.
* add back missing var
* writeinplace explicitly returns same memory
* refactor
* only writeinplace if add and pdesc are eq
* fix comparison
* add second CreateMKLDNNMemory
* CreateMKLDNNMem accepts input
* refactor WriteTo criteria into separate method
* fix lint
* copyfrom test back
* update mldnnsum test to have diff inputs for write in place
* test in place sum with diff arrs
* revert CreateMKLDNNMem extra param change
* pass input arr param for act_forward
* remove extra header
* fix indent
* add check for writeto
* canwriteto uses ref instead of ptr
* update comments for CreateMKLDNNData
* compare input and output desc with op pdesc
* check CreateMKLDNNData does not return null

azai91 mentioned this pull request Jun 13, 2018

[MXNET-497] fix bugs in MKLDNN operators to handle the kAddTo request #11129

Merged

6 tasks

azai91 force-pushed the fix/mkldnn-performance branch from 8f89758 to becedcc Compare June 14, 2018 05:29

azai91 changed the title ~~[MXNET-542] Fix mkldnn performance regression~~ [MXNET-542] Fix mkldnn performance regression + unit test for CreateMKLDNNMem Jun 14, 2018

azai91 force-pushed the fix/mkldnn-performance branch 2 times, most recently from ee981f8 to 3f7e954 Compare June 14, 2018 19:50

azai91 changed the title ~~[MXNET-542] Fix mkldnn performance regression + unit test for CreateMKLDNNMem~~ [MXNET-542] Fix mkldnn performance regression + improve test logging Jun 14, 2018

azai91 force-pushed the fix/mkldnn-performance branch from 42dd59a to 4ec8698 Compare June 14, 2018 22:57

azai91 force-pushed the fix/mkldnn-performance branch 4 times, most recently from 1cceb3c to 1fa9e35 Compare June 15, 2018 05:32

zheng-da reviewed Jun 15, 2018

azai91 force-pushed the fix/mkldnn-performance branch from 6aa538f to 7f9dac4 Compare June 15, 2018 17:57

azai91 force-pushed the fix/mkldnn-performance branch 3 times, most recently from 027c620 to 873a2c6 Compare June 15, 2018 19:08

azai91 force-pushed the fix/mkldnn-performance branch 4 times, most recently from bb31c0b to 83ce4f0 Compare June 15, 2018 23:53

azai91 added 18 commits June 17, 2018 10:44

writeinplace explicitly returns same memory

125d40b

refactor

b862175

only writeinplace if add and pdesc are eq

bc0fb15

fix comparison

d397146

add second CreateMKLDNNMemory

686c18b

CreateMKLDNNMem accepts input

64a96c8

refactor WriteTo criteria into separate method

d42ec55

fix lint

63b6f93

copyfrom test back

0e8357c

update mldnnsum test to have diff inputs for write in place

e860b6d

test in place sum with diff arrs

37890a3

revert CreateMKLDNNMem extra param change

5f55c08

pass input arr param for act_forward

4ed3213

remove extra header

4586b04

fix indent

c2a8966

add check for writeto

1bef31d

canwriteto uses ref instead of ptr

efed55b

update comments for CreateMKLDNNData

b57e8c9

azai91 force-pushed the fix/mkldnn-performance branch 2 times, most recently from 27218eb to f4b7db8 Compare June 17, 2018 17:49

compare input and output desc with op pdesc

76b50bc

azai91 force-pushed the fix/mkldnn-performance branch from f4b7db8 to 76b50bc Compare June 17, 2018 17:57

zheng-da reviewed Jun 17, 2018

zheng-da approved these changes Jun 17, 2018

check CreateMKLDNNData does not return null

d34a50b

azai91 force-pushed the fix/mkldnn-performance branch from 2b3bd3a to d34a50b Compare June 17, 2018 20:30

piiswrong merged commit 92fde19 into apache:master Jun 18, 2018

azai91 mentioned this pull request Jun 21, 2018

[MXNET-551] Test CreateMKLDNNMem/CommitOutput #11308

Merged

8 tasks
