Investigate flaky test-worker-prof #26401
The assertion failure indicates that either the parent or the child did not produce any trace output! (That is something beyond what we thought could cause flakiness in this test, i.e., the process timings, the number of ticks, etc.) |
PR-URL: nodejs#26557 Refs: nodejs#26401 Reviewed-By: Gireesh Punathil <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Ruben Bridgewater <[email protected]>
Now it's flaking on Windows:
assert.js:340
throw err;
^
AssertionError [ERR_ASSERTION]: 3 >= 15
at Object.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-worker-prof.js:39:3)
at Module._compile (internal/modules/cjs/loader.js:813:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:824:10)
at Module.load (internal/modules/cjs/loader.js:680:32)
at tryModuleLoad (internal/modules/cjs/loader.js:612:12)
at Function.Module._load (internal/modules/cjs/loader.js:604:3)
at Function.Module.runMain (internal/modules/cjs/loader.js:876:12)
at internal/main/run_main_module.js:21:11 |
Also on Linux without intl:
|
PR-URL: nodejs#26600 Refs: nodejs#26401 Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Richard Lau <[email protected]>
This was on test-digitalocean-ubuntu1604_sharedlibs_container-x64-10:
00:09:19 not ok 2195 parallel/test-worker-prof # TODO : Fix flaky test
00:09:19 ---
00:09:19 duration_ms: 0.528
00:09:19 severity: flaky
00:09:19 exitcode: 1
00:09:19 stack: |-
00:09:19 assert.js:340
00:09:19 throw err;
00:09:19 ^
00:09:19
00:09:19 AssertionError [ERR_ASSERTION]: 14 >= 15
00:09:19 at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux-containered/test/parallel/test-worker-prof.js:39:3)
00:09:19 at Module._compile (internal/modules/cjs/loader.js:813:30)
00:09:19 at Object.Module._extensions..js (internal/modules/cjs/loader.js:824:10)
00:09:19 at Module.load (internal/modules/cjs/loader.js:680:32)
00:09:19 at tryModuleLoad (internal/modules/cjs/loader.js:612:12)
00:09:19 at Function.Module._load (internal/modules/cjs/loader.js:604:3)
00:09:19 at Function.Module.runMain (internal/modules/cjs/loader.js:876:12)
00:09:19 at internal/main/run_main_module.js:21:11
00:09:19 ... |
And for comparison, here's what the failure looks like with the changes in https://github.com/nodejs/node/pull/26608/files/840d31383703fdbc1fe7deb1074c306ee2415aea, which is the current change proposed in #26608:
test-digitalocean-ubuntu1604_sharedlibs_container-x64-10
03:27:44 not ok 2198 parallel/test-worker-prof # TODO : Fix flaky test
03:27:44 ---
03:27:44 duration_ms: 0.519
03:27:44 severity: flaky
03:27:44 exitcode: 1
03:27:44 stack: |-
03:27:44 assert.js:85
03:27:44 throw new AssertionError(obj);
03:27:44 ^
03:27:44
03:27:44 AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
03:27:44
03:27:44 null !== 0
03:27:44
03:27:44 at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux-containered/test/parallel/test-worker-prof.js:54:10)
03:27:44 at Module._compile (internal/modules/cjs/loader.js:813:30)
03:27:44 at Object.Module._extensions..js (internal/modules/cjs/loader.js:824:10)
03:27:44 at Module.load (internal/modules/cjs/loader.js:680:32)
03:27:44 at tryModuleLoad (internal/modules/cjs/loader.js:612:12)
03:27:44 at Function.Module._load (internal/modules/cjs/loader.js:604:3)
03:27:44 at Function.Module.runMain (internal/modules/cjs/loader.js:876:12)
03:27:44 at internal/main/run_main_module.js:21:11
03:27:44 ... |
thanks @Trott . I have no idea what |
Maybe this? Lines 683 to 686 in 169b7f1
If I'm understanding correctly, maybe it means that the process received a signal? |
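To illustrate the `null !== 0` reading above, here is a minimal sketch (not taken from the test itself) of how `child_process.spawnSync()` reports a child that was ended by a signal. The self-killing child is a hypothetical stand-in for whatever external actor sent the signal on the CI machine, and the behavior shown assumes a POSIX platform such as Linux.

```javascript
'use strict';
const { spawnSync } = require('child_process');

// Hypothetical child: it sends SIGKILL to itself, standing in for an
// external killer (for example, something on the host ending the process).
function runSelfKillingChild() {
  return spawnSync(process.execPath, [
    '-e',
    'process.kill(process.pid, "SIGKILL"); setInterval(() => {}, 1000);',
  ]);
}

const result = runSelfKillingChild();

// A child ended by a signal has no exit code, so `status` is null and
// `signal` names the signal instead (POSIX behavior).
console.log('status:', result.status);
console.log('signal:', result.signal);
```

An assertion such as `assert.strictEqual(result.status, 0)` would then fail with `null !== 0`, which matches the shape of the failure in the log above.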
Looking at |
yes, looks like it is! but who could have sent it! probably an |
^^^^ @nodejs/build |
> free
              total        used        free      shared  buff/cache   available
Mem:       32946876     2048256     4308300       17532    26590320    30328076
Swap:             0           0           0 |
dmesg (error happened around 10:27:44 UTC)
Might hint at a
|
Thanks @refack! The 32 GB total seems ample, but the memory actually available to the process when it ran depends on how many processes were running in parallel, so unfortunately that data does not give us any clues. However, the system log shows it all! I am having a hard time matching the timezones: Outside of this, we seem to be terminating processes frequently because of container memory constraints. What do you think of it? I mean, shouldn't we fix that? |
and, |
|
Stress test shows 21 failures in 1000 runs: https://ci.nodejs.org/job/node-stress-single-test/177/nodes=win2012r2-vs2019/console |
I haven't seen this in a while and am wondering if something fixed it. Stress test: https://ci.nodejs.org/job/node-stress-single-test/209/ |
Hmm, didn't build, let's try again with more platforms: https://ci.nodejs.org/job/node-stress-single-test/210/ |
One failure in 1000 runs.
|
https://ci.nodejs.org/job/node-test-binary-arm-12+/9366/RUN_SUBSET=0,label=pi2-docker/console
00:55:03 not ok 787 sequential/test-worker-prof # TODO : Fix flaky test
00:55:03 ---
00:55:03 duration_ms: 32.289
00:55:06 severity: flaky
00:55:06 exitcode: 1
00:55:06 stack: |-
00:55:06 node:assert:122
00:55:06 throw new AssertionError(obj);
00:55:06 ^
00:55:06
00:55:07 AssertionError [ERR_ASSERTION]: child exited with signal: {
00:55:07 error: Error: spawnSync /home/iojs/build/workspace/node-test-binary-arm/out/Release/node ETIMEDOUT
00:55:07 at Object.spawnSync (node:internal/child_process:1086:20)
00:55:07 at spawnSync (node:child_process:667:24)
00:55:07 at Object.<anonymous> (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-worker-prof.js:52:23)
00:55:07 at Module._compile (node:internal/modules/cjs/loader:1094:14)
00:55:07 at Object.Module._extensions..js (node:internal/modules/cjs/loader:1123:10)
00:55:07 at Module.load (node:internal/modules/cjs/loader:974:32)
00:55:07 at Function.Module._load (node:internal/modules/cjs/loader:815:14)
00:55:07 at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:76:12)
00:55:07 at node:internal/main/run_main_module:17:47 {
00:55:07 errno: -110,
00:55:07 code: 'ETIMEDOUT',
00:55:07 syscall: 'spawnSync /home/iojs/build/workspace/node-test-binary-arm/out/Release/node',
00:55:07 path: '/home/iojs/build/workspace/node-test-binary-arm/out/Release/node',
00:55:07 spawnargs: [
00:55:07 '--prof',
00:55:07 '/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-worker-prof.js',
00:55:07 'child'
00:55:07 ]
00:55:07 },
00:55:07 status: null,
00:55:07 signal: 'SIGTERM',
00:55:07 output: [ null, '', '' ],
00:55:07 pid: 20403,
00:55:07 stdout: '',
00:55:07 stderr: ''
00:55:07 }
00:55:07 at Object.<anonymous> (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-worker-prof.js:58:10)
00:55:07 at Module._compile (node:internal/modules/cjs/loader:1094:14)
00:55:07 at Object.Module._extensions..js (node:internal/modules/cjs/loader:1123:10)
00:55:07 at Module.load (node:internal/modules/cjs/loader:974:32)
00:55:07 at Function.Module._load (node:internal/modules/cjs/loader:815:14)
00:55:07 at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:76:12)
00:55:07 at node:internal/main/run_main_module:17:47 {
00:55:07 generatedMessage: false,
00:55:07 code: 'ERR_ASSERTION',
00:55:07 actual: 'SIGTERM',
00:55:07 expected: null,
00:55:07 operator: 'strictEqual'
00:55:07 }
00:55:07 ... |
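The log above shows the combination `ETIMEDOUT` plus `signal: 'SIGTERM'`, which is what `spawnSync()` reports when its `timeout` option expires and the default kill signal is sent. A minimal sketch of that behavior follows; the never-exiting child and the 500 ms timeout are illustrative assumptions, not values from the real test, and the result shape assumes a POSIX platform.

```javascript
'use strict';
const { spawnSync } = require('child_process');

// Hypothetical child that never exits on its own, so the timeout always fires.
function runWithTimeout(timeoutMs) {
  return spawnSync(
    process.execPath,
    ['-e', 'setInterval(() => {}, 1000);'],
    { timeout: timeoutMs } // killSignal defaults to 'SIGTERM'
  );
}

const result = runWithTimeout(500);

// On timeout, spawnSync() attaches an error with code 'ETIMEDOUT' and the
// child is reported as ended by the kill signal rather than by an exit code.
console.log('error code:', result.error && result.error.code);
console.log('signal:', result.signal);
console.log('status:', result.status);
```

On a slow machine such as the Raspberry Pi in this run, a child that is merely slow is indistinguishable from a hung one in this output, which is why the timeout shows up as a flaky failure.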
Fixes: #26401 Co-authored-by: Gireesh Punathil <[email protected]> PR-URL: #37372 Reviewed-By: Antoine du Hamel <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Gireesh Punathil <[email protected]>
Still flaky: nodejs/reliability#640 |
I'm starting to wonder whether we actually need 15 ticks in the test. Maybe it's already good enough to run some code and check for the code-creation events... |
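As a rough sketch of the two checks being compared here, the helper below counts log lines by their leading event name, so a test could either require a minimum number of `tick` lines or only require that some `code-creation` lines exist. The helper name `countEvents` and the sample log text are hypothetical; real `--prof` logs carry many more fields per line.

```javascript
'use strict';

// Count log lines whose first comma-separated field equals `eventName`.
function countEvents(logText, eventName) {
  return logText
    .split('\n')
    .filter((line) => line.startsWith(`${eventName},`))
    .length;
}

// Hypothetical, heavily simplified log text for illustration only.
const sampleLog = [
  'code-creation,placeholder-a',
  'tick,placeholder-b',
  'code-creation,placeholder-c',
  'tick,placeholder-d',
  'tick,placeholder-e',
].join('\n');

const ticks = countEvents(sampleLog, 'tick');
const codeCreations = countEvents(sampleLog, 'code-creation');

// Strict check (timing dependent): a minimum number of ticks.
console.log('enough ticks:', ticks >= 15);
// Looser check (less timing dependent): some code-creation events exist.
console.log('has code-creation events:', codeCreations > 0);
```

The tick count depends on how long the sampled process actually ran on the CPU, which is why failures like `3 >= 15` and `14 >= 15` appear on loaded machines, whereas the presence of code-creation events does not depend on run time in the same way.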
Use a JS workload instead of repeating FS operations and use a timer to make it less flaky on machines with little resources. PR-URL: #49274 Refs: #26401 Refs: nodejs/reliability#640 Reviewed-By: Benjamin Gruenbaum <[email protected]>
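The idea in that commit message can be sketched as follows. This is not the code from #49274, only an assumed illustration of a pure-JS workload that runs for a fixed wall-clock duration instead of a fixed number of file system operations; the function name `spinFor` and the 50 ms duration are hypothetical.

```javascript
'use strict';

// Keep the CPU busy with plain JS until `durationMs` of wall-clock time has
// passed. Bounding by time rather than by iteration count means a slow
// machine does less work per run instead of finishing too early or too late.
function spinFor(durationMs) {
  const deadline = Date.now() + durationMs;
  let iterations = 0;
  let sink = 0;
  while (Date.now() < deadline) {
    // Arbitrary arithmetic; the result only exists to keep the loop busy.
    sink += Math.sqrt(iterations % 1000);
    iterations++;
  }
  return { iterations, sink };
}

const { iterations } = spinFor(50);
console.log('iterations completed:', iterations);
```

Because the loop spends its time executing JS rather than waiting on I/O, a sampling profiler has a better chance of observing it while it is on the CPU, which is the property the tick assertion relies on.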
I believe this issue can be closed now #37673 (comment) |
https://ci.nodejs.org/job/node-test-binary-arm/6472/RUN_SUBSET=0,label=pi1-docker/console
test-requireio_ceejbot-debian9-armv6l_pi1p-1