test failures on smartos while running in parallel #2567

Closed
jbergstroem opened this issue Aug 26, 2015 · 13 comments
Labels
test Issues and PRs related to the tests.

Comments

@jbergstroem
Member

We consistently get failed tests when running the test suite in parallel on SmartOS, which confuses me. This spawnSync test, for instance, now takes about 1.8 seconds to exit sleep 1:

not ok 49 - test-child-process-spawnsync.js
# 
# assert.js:89
# throw new assert.AssertionError({
# ^
# AssertionError: timer should take as long as sleep
# at null._onTimeout (/home/iojs/build/workspace/node-test-commit-other/nodes/smartos14-64/test/parallel/test-child-process-spawnsync.js:15:10)
# at Timer.listOnTimeout (timers.js:89:15)
# sleep started
# sleep exited [ 1, 848820350 ]

The above test run produced 12 failures, 8 of which I can reproduce consistently. All of them seem timing-related.
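
For readers checking the timing: the pair in the "sleep exited" line looks like a process.hrtime() result, that is [seconds, nanoseconds]. That reading is an assumption based on the shape of the output, and hrtimeToMs below is a hypothetical helper, not part of the test. A small sketch of the conversion:

```javascript
'use strict';

// Convert a process.hrtime()-style [seconds, nanoseconds] pair to milliseconds.
// Assumption: the logged pair really is an hrtime difference.
function hrtimeToMs(pair) {
  const [seconds, nanoseconds] = pair;
  return seconds * 1e3 + nanoseconds / 1e6;
}

// Value copied from the "sleep exited" line of the failing run.
const elapsedMs = hrtimeToMs([1, 848820350]);
console.log(elapsedMs.toFixed(1) + ' ms');
```

Under that reading, the child that should sleep for one second took roughly 1.85 seconds to be reported as exited, which is enough to trip an assertion that expects the timer and the sleep to finish together.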

@misterdjules, any ideas?
/CC @nodejs/platform-solaris

@jbergstroem added the test label Aug 26, 2015
@Trott
Member

Trott commented Aug 27, 2015

I don't dispute your general point, but regarding that spawnSync test specifically: it no longer does the timer/sleep check that way. That changed yesterday in fffa4c2, precisely because the test was flaky. See #2470 and #2535.

That said, yeah, we've definitely got a problem with flaky tests. I just hope that spawnSync one isn't one of them anymore. That's all. (And I see that your test run is from before the change, so I'm still optimistic that at least that test is now OK.)

@jbergstroem
Member Author

@Trott you're right. Here's a run against the latest master on smartos-32-1:

[root@48fecd68-c8d1-e3aa-ed24-d1a209167cad ~/iojs-smartos14-32]# tools/test.py -J parallel
=== release test-cluster-master-kill ===                                       
Path: parallel/test-cluster-master-kill
assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: worker was alive after master died
    at process.<anonymous> (/root/iojs-smartos14-32/test/parallel/test-cluster-master-kill.js:74:12)
    at process.g (events.js:260:16)
    at emitOne (events.js:82:20)
    at process.emit (events.js:169:7)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-cluster-master-kill.js
=== release test-debug-signal-cluster ===                                      
Path: parallel/test-debug-signal-cluster
assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: test timed out.
    at testTimedOut [as _onTimeout] (/root/iojs-smartos14-32/test/parallel/test-debug-signal-cluster.js:54:3)
    at Timer.unrefdHandle (timers.js:307:14)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-debug-signal-cluster.js
=== release test-http-byteswritten ===                                    
Path: parallel/test-http-byteswritten
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE :::15046
    at Object.exports._errnoException (util.js:837:11)
    at exports._exceptionWithHostPort (util.js:860:20)
    at Server._listen2 (net.js:1231:14)
    at listen (net.js:1267:10)
    at Server.listen (net.js:1357:5)
    at Object.<anonymous> (/root/iojs-smartos14-32/test/parallel/test-http-byteswritten.js:40:12)
    at Module._compile (module.js:430:26)
    at Object.Module._extensions..js (module.js:448:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-http-byteswritten.js
=== release test-http-default-encoding ===                                     
Path: parallel/test-http-default-encoding
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE :::15046
    at Object.exports._errnoException (util.js:837:11)
    at exports._exceptionWithHostPort (util.js:860:20)
    at Server._listen2 (net.js:1231:14)
    at listen (net.js:1267:10)
    at Server.listen (net.js:1357:5)
    at Object.<anonymous> (/root/iojs-smartos14-32/test/parallel/test-http-default-encoding.js:21:8)
    at Module._compile (module.js:430:26)
    at Object.Module._extensions..js (module.js:448:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-http-default-encoding.js
=== release test-http-keepalive-client ===                                 
Path: parallel/test-http-keepalive-client
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE :::15046
    at Object.exports._errnoException (util.js:837:11)
    at exports._exceptionWithHostPort (util.js:860:20)
    at Server._listen2 (net.js:1231:14)
    at listen (net.js:1267:10)
    at Server.listen (net.js:1357:5)
    at Object.<anonymous> (/root/iojs-smartos14-32/test/parallel/test-http-keepalive-client.js:19:8)
    at Module._compile (module.js:430:26)
    at Object.Module._extensions..js (module.js:448:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-http-keepalive-client.js
=== release test-http-res-write-after-end ===                                  
Path: parallel/test-http-res-write-after-end
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE :::15046
    at Object.exports._errnoException (util.js:837:11)
    at exports._exceptionWithHostPort (util.js:860:20)
    at Server._listen2 (net.js:1231:14)
    at listen (net.js:1267:10)
    at Server.listen (net.js:1357:5)
    at Object.<anonymous> (/root/iojs-smartos14-32/test/parallel/test-http-res-write-after-end.js:20:8)
    at Module._compile (module.js:430:26)
    at Object.Module._extensions..js (module.js:448:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-http-res-write-after-end.js
=== release test-http-url.parse-search ===                                    
Path: parallel/test-http-url.parse-search
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE :::15046
    at Object.exports._errnoException (util.js:837:11)
    at exports._exceptionWithHostPort (util.js:860:20)
    at Server._listen2 (net.js:1231:14)
    at listen (net.js:1267:10)
    at Server.listen (net.js:1357:5)
    at Object.<anonymous> (/root/iojs-smartos14-32/test/parallel/test-http-url.parse-search.js:22:8)
    at Module._compile (module.js:430:26)
    at Object.Module._extensions..js (module.js:448:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-http-url.parse-search.js
=== release test-listen-fd-cluster ===                                         
Path: parallel/test-listen-fd-cluster
Cluster listen fd test runner
about to listen in parent
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE :::15046
    at Object.exports._errnoException (util.js:837:11)
    at exports._exceptionWithHostPort (util.js:860:20)
    at Server._listen2 (net.js:1231:14)
    at listen (net.js:1267:10)
    at Server.listen (net.js:1357:5)
    at test (/root/iojs-smartos14-32/test/parallel/test-listen-fd-cluster.js:70:6)
    at Object.<anonymous> (/root/iojs-smartos14-32/test/parallel/test-listen-fd-cluster.js:40:1)
    at Module._compile (module.js:430:26)
    at Object.Module._extensions..js (module.js:448:10)
    at Module.load (module.js:355:32)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-listen-fd-cluster.js
=== release test-tls-client-resume ===                                         
Path: parallel/test-tls-client-resume
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE :::15046
    at Object.exports._errnoException (util.js:837:11)
    at exports._exceptionWithHostPort (util.js:860:20)
    at Server._listen2 (net.js:1231:14)
    at listen (net.js:1267:10)
    at Server.listen (net.js:1357:5)
    at Object.<anonymous> (/root/iojs-smartos14-32/test/parallel/test-tls-client-resume.js:31:8)
    at Module._compile (module.js:430:26)
    at Object.Module._extensions..js (module.js:448:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-tls-client-resume.js
=== release test-tls-pause ===                                          
Path: parallel/test-tls-pause
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE :::15046
    at Object.exports._errnoException (util.js:837:11)
    at exports._exceptionWithHostPort (util.js:860:20)
    at Server._listen2 (net.js:1231:14)
    at listen (net.js:1267:10)
    at Server.listen (net.js:1357:5)
    at Object.<anonymous> (/root/iojs-smartos14-32/test/parallel/test-tls-pause.js:30:8)
    at Module._compile (module.js:430:26)
    at Object.Module._extensions..js (module.js:448:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-tls-pause.js
[01:16|% 100|+ 808|-  10]: Done   

@Trott
Member

Trott commented Aug 27, 2015

Perhaps the 8 EADDRINUSE errors mean that a test that runs early on fails without cleaning up after itself, leaving common.PORT blocked on localhost? If so, fixing that one test might also fix the 8 others.
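
A minimal sketch of the failure mode suggested here, not taken from the Node.js test suite: while one server still holds a port, a second listen on the same port fails with EADDRINUSE, and the port becomes usable again once the holder closes. The helper names listenOn, closeServer and demo are hypothetical, and port 0 (let the OS pick a free port) stands in for common.PORT so the sketch cannot collide with anything else on the machine.

```javascript
'use strict';
const net = require('net');

// Try to listen on `port`; resolve with the server or reject with the listen error.
function listenOn(port) {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', reject);
    server.listen(port, '127.0.0.1', () => resolve(server));
  });
}

// Resolve once the server has stopped listening.
function closeServer(server) {
  return new Promise((resolve) => server.close(resolve));
}

async function demo() {
  // Port 0 asks the OS for any free port.
  const first = await listenOn(0);
  const port = first.address().port;

  let code = null;
  try {
    // Same port while `first` still holds it: expected to fail.
    const second = await listenOn(port);
    await closeServer(second);
  } catch (err) {
    code = err.code;
  }

  // After the holder cleans up, the same port can be bound again.
  await closeServer(first);
  const third = await listenOn(port);
  await closeServer(third);
  return code;
}

const result = demo();
result.then((code) => console.log('second listen failed with', code));
```

If a leftover process from an earlier test plays the role of `first` and never closes, every later test that listens on the shared port would fail the same way, which would match the identical stack traces above.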

@jbergstroem
Member Author

test/fixtures/clustered-server/app.js seems to be running in parallel, which probably shouldn't happen.

Edit: yes it should.

@jbergstroem
Member Author

Just an update: here are the three tests I consistently see issues with:

=== release test-cluster-master-kill ===                                       
Path: parallel/test-cluster-master-kill
assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: worker was alive after master died
    at process.<anonymous> (/root/iojs-smartos14-32/test/parallel/test-cluster-master-kill.js:74:12)
    at process.g (events.js:260:16)
    at emitOne (events.js:82:20)
    at process.emit (events.js:169:7)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-cluster-master-kill.js
=== release test-cluster-master-error ===                           
Path: parallel/test-cluster-master-error
assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: The workers did not die after an error in the master
    at process.<anonymous> (/root/iojs-smartos14-32/test/parallel/test-cluster-master-error.js:116:12)
    at process.g (events.js:260:16)
    at emitOne (events.js:82:20)
    at process.emit (events.js:169:7)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-cluster-master-error.js
=== release test-debug-signal-cluster ===                                      
Path: parallel/test-debug-signal-cluster
assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: test timed out.
    at testTimedOut [as _onTimeout] (/root/iojs-smartos14-32/test/parallel/test-debug-signal-cluster.js:54:3)
    at Timer.unrefdHandle (timers.js:307:14)
Command: out/Release/node /root/iojs-smartos14-32/test/parallel/test-debug-signal-cluster.js

@whitlockjc
Contributor

I just ran tools/test.py -J parallel on the latest master and I do not see the failures mentioned above. I realize that a number of failing tests seem to be SmartOS-specific, but I figured I'd let you know.

[notyou@smartos-14-32 ~/node]# tools/test.py -J parallel
[03:51|% 100|+ 872|-   0]: Done

@whitlockjc
Contributor

Also, what are the chances for a smartos label?

@jbergstroem
Member Author

@whitlockjc which SmartOS/packages version are you running?

@whitlockjc
Contributor

I used image uuid 47830136-24ac-11e5-a61c-4f7c17f605f4 which has SmartOS 14.4.1. The only extra package I installed was gcc4.9.

@Trott
Member

Trott commented Nov 20, 2015

Is this still an issue? Seems like this hasn't been a problem on the new CI.

@jbergstroem
Member Author

@Trott we don't run tests in parallel on CI just yet.

@Trott
Member

Trott commented Nov 20, 2015

Ah, yes, doing make test on SmartOS triggers this. Got it. Thanks.

Trott added a commit to Trott/io.js that referenced this issue Jan 1, 2016
Instead of waiting 200 milliseconds for workers to exit, check that they
exited in process.on('exit',...). This checks functionality without
coupling it to performance/benchmarks. The test became flaky in CI when
parallelized.

Ref: nodejs#2567
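
The idea in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the referenced commit: the child spawned here is a hypothetical stand-in for a cluster worker, and isAlive is a hypothetical helper that probes a pid with signal 0.

```javascript
'use strict';
const { spawn } = require('child_process');

// Signal 0 only checks whether the pid can be signalled; nothing is delivered.
function isAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // ESRCH means no such process; anything else (e.g. EPERM) means it exists.
    return err.code !== 'ESRCH';
  }
}

// Hypothetical stand-in for a worker: a child that exits right away.
const child = spawn(process.execPath, ['-e', 'process.exit(0)']);
let childExited = false;
child.on('exit', () => { childExited = true; });

// No fixed 200 ms wait. The check runs when this process is about to exit,
// which happens only after the child handle has closed and the loop is empty.
process.on('exit', () => {
  if (!childExited || isAlive(child.pid)) {
    console.error('child was still alive when the parent exited');
    process.exitCode = 1;
  }
});
```

Because nothing depends on how quickly the child dies, a slow or heavily loaded machine only makes the check run later instead of making it fail.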
@Trott
Member

Trott commented Jan 11, 2016

Parallelized tests are now running green on SmartOS. Latest run of @jbergstroem's parallelizing PR in CI: https://ci.nodejs.org/job/node-test-commit/1656/

Closing. Re-open if I've totally Missed The Point or something.

@Trott closed this as completed Jan 11, 2016
3 participants