-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate flaky sequential/test-performance #23291
Comments
@nodejs/testing @nodejs/performance |
(The performance team is for changes related to |
https://ci.nodejs.org/job/node-test-commit-smartos/21373/nodes=smartos17-64/console 00:52:03 not ok 2418 sequential/test-performance
00:52:03 ---
00:52:03 duration_ms: 2.224
00:52:03 severity: fail
00:52:03 exitcode: 1
00:52:03 stack: |-
00:52:03 { name: 'node',
00:52:03 entryType: 'node',
00:52:03 startTime: 0,
00:52:03 duration: { around: 1893.748501777649 },
00:52:03 nodeStart: { around: 0 },
00:52:03 v8Start: { around: 0 },
00:52:03 bootstrapComplete: { around: 1887.9314036369324 },
00:52:03 environment: { around: 0 },
00:52:03 loopStart: -1,
00:52:03 loopExit: -1 }
00:52:03 assert.js:351
00:52:03 throw err;
00:52:03 ^
00:52:03
00:52:03 AssertionError [ERR_ASSERTION]: bootstrapComplete: 1796.7190556526184 >= 1111
00:52:03 at checkNodeTiming (/home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos17-64/test/sequential/test-performance.js:92:7)
00:52:03 at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos17-64/test/sequential/test-performance.js:104:1)
00:52:03 at Module._compile (internal/modules/cjs/loader.js:722:30)
00:52:03 at Object.Module._extensions..js (internal/modules/cjs/loader.js:733:10)
00:52:03 at Module.load (internal/modules/cjs/loader.js:620:32)
00:52:03 at tryModuleLoad (internal/modules/cjs/loader.js:560:12)
00:52:03 at Function.Module._load (internal/modules/cjs/loader.js:552:3)
00:52:03 at Function.Module.runMain (internal/modules/cjs/loader.js:775:12)
00:52:03 at startup (internal/bootstrap/node.js:308:19)
00:52:03 at bootstrapNodeJSCore (internal/bootstrap/node.js:878:3)
00:52:03 ... |
I want to run a stress test on smartos 17 but it doesn't look like it's in a choice in our stress test job on CI. Can someone add it? @nodejs/build |
Happened again in #25656 Failed on a Raspberry PI 1 I think:
|
@Trott Not sure if still required, since this was from a few months back, but I've just added it (seemed like a reasonable thing to have there): https://ci.nodejs.org/view/Stress/job/node-stress-single-test/jobConfigHistory/showDiffFiles?timestamp1=2018-12-07_11-55-34×tamp2=2019-01-23_14-21-08 |
For some reason this has affected all three runs of #25656 - does this just always happen on master right now? |
Failing on my other rebased PR too. |
Looks like it's barely-but-relatively-consistently failing due to arbitrary-ish timing constraints. The test lacks a comment explaining its purpose which is unfortunate, given it's bland name open to multiple interpretations. I guess it's testing perf_hooks generally? What is this pinging folks who have authored lines in the test: @BridgeAR @jasnell @TimothyGu @apapirovski @addaleax |
Happened for the first time on node-daily-master today so something seems to have happened in the last 24 hours to cause this to be perma-red (or perma-ish-red) on Pi 1, it would seem. @nodejs/testing https://ci.nodejs.org/job/node-test-binary-arm/5673/RUN_SUBSET=3,label=pi1-docker/console test-requireio_continuationlabs-debian9-armv6l_pi1p-1 00:44:32 not ok 342 sequential/test-performance
00:44:32 ---
00:44:32 duration_ms: 3.785
00:44:32 severity: fail
00:44:32 exitcode: 1
00:44:32 stack: |-
00:44:32 { name: 'node',
00:44:32 entryType: 'node',
00:44:32 startTime: 0,
00:44:32 duration: { around: 2250.0830249786377 },
00:44:32 nodeStart: { around: 0 },
00:44:32 v8Start: { around: 0 },
00:44:32 bootstrapComplete: { around: 2159.4409494400024 },
00:44:32 environment: { around: 0 },
00:44:32 loopStart: -1,
00:44:32 loopExit: -1 }
00:44:32 { name: 'node',
00:44:32 entryType: 'node',
00:44:32 startTime: 0,
00:44:32 duration: { around: 3522.481032371521 },
00:44:32 nodeStart: { around: 0 },
00:44:32 v8Start: { around: 0 },
00:44:32 bootstrapComplete: { around: 2159.4409494400024 },
00:44:32 environment: { around: 0 },
00:44:32 loopStart: { around: 2159.4409494400024 },
00:44:32 loopExit: -1 }
00:44:32 { name: 'node',
00:44:32 entryType: 'node',
00:44:32 startTime: 0,
00:44:32 duration: { around: 3554.94070148468 },
00:44:32 nodeStart: { around: 0 },
00:44:32 v8Start: { around: 0 },
00:44:32 bootstrapComplete: { around: 2159.4409494400024 },
00:44:32 environment: { around: 0 },
00:44:32 loopStart: { around: 2159.4409494400024 },
00:44:32 loopExit: { around: 3555.3136978149414 } }
00:44:32 assert.js:351
00:44:32 throw err;
00:44:32 ^
00:44:32
00:44:32 AssertionError [ERR_ASSERTION]: bootstrapComplete: 1075.962013244629 >= 1073
00:44:32 at checkNodeTiming (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:92:7)
00:44:32 at Timeout.setTimeout [as _onTimeout] (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:119:5)
00:44:32 at listOnTimeout (timers.js:324:15)
00:44:32 at processTimers (timers.js:268:5)
00:44:32 ... Failed by just 2ms or so? |
This gets weird. https://ci.nodejs.org/job/node-test-binary-arm/5610/ was green. Hooray? But looking at the fanned jobs for Pi1, nowhere was sequential/test-performance run. And the PR (#25612) doesn't remove it or anything. The test is still in master and that PR is against master. ?????!!!!!! @nodejs/build |
I am not entirely sure about what the test is actually doing but I guess it was failing because: node/test/sequential/test-performance.js Line 17 in d40ed71
Is too far away from the bootstrap complete milestone? Then there could be two remedies:
|
FWIW, the test seems to be passing again, at least for now, on Pi 1. See https://ci.nodejs.org/job/node-test-binary-arm/5677/RUN_SUBSET=3,label=pi1-docker/console for example. Not sure for how long, but I guess it's encouraging that it's not a big block fo red for the moment... |
my take after reviewing the test case is:
|
this is what I did:
these are small numbers in my fast linux box, and able to accommodate in the very conservative number of 1111 millis in the test. However, in the failing So:
I guess the test is doing its intented work, and we should only be changing the heuristics, not the test logic itself. /cc @joyeecheung |
@gireeshpunathil In theory d3806f9 should’ve decreased the boot time because it delays the console extension until those are actually needed when the inspector console is accessed (and no longer done unconditionally during bootstrap), maybe a benchmark run. of misc/startup would help identifying that. |
@joyeecheung - yes,reading the commit message I also got the same impression - this commit as a change to boost the boot time. unfortunately I don't know how to run a benchmark test! |
I suppose it's possible that the increase might be a peculiarity/limitation of It's not clear to me that this test is actually trying to flag bootstrap performance issues as much as it is trying to confirm that Speaking of |
I guess I also don't feel like using Pi 1 as our performance canary is a great approach. I'm inclined to treat this as a "bare functionality of This is supported IMO by the fact that the test was added with the initial implementation of |
@Trott I don't know about it passing as its failed twice in attempts for me today. |
PR-URL: nodejs#25695 Fixes: nodejs#23291 Reviewed-By: Tiancheng "Timothy" Gu <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Jeremiah Senkpiel <[email protected]>
Fixed by d1d357d |
ok, now that #25695 is landed and this one closed it makes little difference to raise my opinion; but raising it anyways. these points:
would have been considerations at the time when the test was introduced IMO. Now that it showed us
we have many instances where unrelated tests threw light on serious issues in the code; issues associated in #25007 are latest examples - none of those were meant to catch race conditions; yet those tests were used to reproduce issues, troubleshoot, fix and finally validate the correctness of the fix. my summary is that the test threw something to us - by design or by chance; amending the code block that suppresses the issue without looking at the root cause of the failure may be like silently ignoring an error. |
I guess ideally we'd do both. I don't want to leave it broken, causing people to either not be able to land valid changes or else get into the habit of ignoring failing test suites. But as you are suggesting, I also want to understand why the test started failing so a determination can be made if it is a sign of a larger problem. I think the issue this brought up is probably more appropriate for a benchmark than a test. (And maybe there already is a benchmark in |
Perhaps the thing to do is to open a new issue about the slower startup. Because I guess we are talking about two different things anyway. One issue is that a test is failing when it should not be failing. That's this issue. It's about fixing the test. But the second issue, which this one uncovered, is why is startup slower than it used to be? These questions are related, but they can (and probably should?) be considered separately? Like, it's possible (as we've done here) to fix the first issue without successfully addressing the second. |
PR-URL: #25695 Fixes: #23291 Reviewed-By: Tiancheng "Timothy" Gu <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Jeremiah Senkpiel <[email protected]>
PR-URL: #25695 Fixes: #23291 Reviewed-By: Tiancheng "Timothy" Gu <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Jeremiah Senkpiel <[email protected]>
PR-URL: #25695 Fixes: #23291 Reviewed-By: Tiancheng "Timothy" Gu <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Jeremiah Senkpiel <[email protected]>
PR-URL: #25695 Fixes: #23291 Reviewed-By: Tiancheng "Timothy" Gu <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Jeremiah Senkpiel <[email protected]>
https://ci.nodejs.org/job/node-test-binary-arm/3710/RUN_SUBSET=4,label=pi1-docker/console
Looks like a case of testing elapsed time (as a proxy for performance?). Is this a case of the test being unreliable or is it a case of the test flagging something we care about? Test doesn't have a comment explaining itself but it was added in 67269fd / #14680 and there's context there.
/ping @jasnell
The text was updated successfully, but these errors were encountered: