Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix issue of test_gru_bidirectional #11219 and add robust code #11333

Merged
merged 6 commits into from
Jun 22, 2018

Conversation

lihaofd
Copy link
Contributor

@lihaofd lihaofd commented Jun 19, 2018

Description

In this PR, it fixed issue of test_gru_bidirectional #11219 and add robust code. @pengzhao-intel, @TaoLv

Feature changes

New features

  • Fixed issue of test_gru_bidirectional Flaky test: test_gru_bidirectional #11219. The reason is when creating temporary tensor for reshape, the code utilize the unallocated memory because of the incorrect start and end point. Because the data in unallocated memory didn't be used in linalg_gemm, sometimes the code can pass and sometimes it will crash. We added 1000 times checking for test_gru_bidirectional in tests/python/unittests/test_operator.py. And will keep 1 time checking after testing is passed
  • Add robust code like param initialization. It can also make GRU/LSTM Convergence Curve (example/rnn/bucketing/cudnn_rnn_bucketing.py) be stable.
  • Replace memcpy/memset by using omp for better performance.

Test input size is from DS2 default parameters(seq_length = 300, batch_size = 20, input_size = 800, hidden_size = 800).

Layer=1 bidirectional = False

API Inference time(fwd, samples/sec) Training time(fwd + bwd, samples/sec)
sym.RNN(mem) 357.7 168.06
sym.RNN(omp) 383.14 176.2
speedup 107% 104.8%

Layer=5 bidirectional = False

API Inference time(fwd, samples/sec) Training time(fwd + bwd, samples/sec)
sym.RNN(mem) 86.66 37.24
sym.RNN(omp) 88.1 40
speedup 101.6% 107.4%

Unit-test changes

  • Add testcase test_loop_gru_bidirectional to test 1000 times test_gru_bidirectional in tests/python/unittests/test_operator.py. And will remove 1000 times checking after CI is passed.

Checklist

  • Passed code style checking (make lint).
  • All changes have test coverage.
  • Code is well-documented.

@lihaofd
Copy link
Contributor Author

lihaofd commented Jun 19, 2018

@marcoabreu
When testing 1000 times test_gru_bidirectional in one nosetests, jenkins testing failed with "Sending interrupt signal to process". While testing 100 times test_gru_bidirectional can pass.
Is it caused by timeout? If this is the case, how many times testing could be suggested in this testing?
Thanks!

"test_operator.test_loop_gru_bidirectional ... Sending interrupt signal to process

After 10s process did not stop
or
C:\jenkins_slave\workspace\ut-python-cpu@2\pkg_vc14_cpu\python\mxnet\rnn\rnn_cell.py:675: UserWarning: NTC layout detected. Consider using TNC for FusedRNNCell for faster speed

warnings.warn("NTC layout detected. Consider using "

Sending interrupt signal to process

After 10s process did not stop

@marcoabreu
Copy link
Contributor

Hello,
yes, please don't use our CI for these kind of things but do it locally. Please see https://cwiki.apache.org/confluence/display/MXNET/Reproducing+test+results#Reproducingtestresults-Repeatingtestexecution for details.

Correct, this timeout was our CI terminating the job because it was running too long.

@TaoLv
Copy link
Member

TaoLv commented Jun 20, 2018

@szha could you help to review this fix?

@lihaofd lihaofd changed the title [WIP] fix issue of test_gru_bidirectional #11219 and add robust code Fix issue of test_gru_bidirectional #11219 and add robust code Jun 20, 2018
@szha szha requested a review from piiswrong June 20, 2018 03:31
@szha szha merged commit 5550c0a into apache:master Jun 22, 2018
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
…pache#11333)

* fix param init bug and remove memcpy/memset

* fix bug for bidirection size

* add 100 times loop for test_gru_bidirectional robust checking

* add test_loop_gru_bidirectional

* record number of passed

* remove 1000 times case
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants