test: fix worker send error #20973

Closed

gireeshpunathil
Member

In test-child-process-fork-closed-channel-segfault.js, a race condition
is observed between the server getting closed and the worker sending
a message. Accommodate the potential errors.

Earlier, the same race was observed between the client and the server
and was addressed by ignoring the relevant errors in an error
handler. The same mechanism is reused for the worker too.

Refs: #3635 (comment)
Fixes: #20836

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

gireeshpunathil added the "test" label (Issues and PRs related to the tests) on May 26, 2018
@Trott
Member

Trott commented May 27, 2018

Trott added the "author ready" label (PRs that have at least one approval, no pending requests for changes, and a CI started) on May 27, 2018
@joyeecheung
Member

From the CI:

not ok 51 parallel/test-child-process-fork-closed-channel-segfault
  ---
  duration_ms: 0.318
  severity: fail
  exitcode: 1
  stack: |-
    c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-closed-channel-segfault.js:80
                throw err;
                ^
    
    Error: write EMFILE
        at ChildProcess.target._send (internal/child_process.js:741:20)
        at ChildProcess.target.send (internal/child_process.js:625:19)
        at Worker.send (internal/cluster/worker.js:40:28)
        at Socket.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-closed-channel-segfault.js:46:16)
        at Object.onceWrapper (events.js:273:13)
        at Socket.emit (events.js:182:13)
        at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1147:10)
  ...
not ok 52 parallel/test-child-process-fork-net
  ---
  duration_ms: 0.320
  severity: fail
  exitcode: 1
  stack: |-
    PARENT: server listening
    CHILD: server listening
    CLIENT: connected
    PARENT: got connection
    CLIENT: connected
    CHILD: got connection
    CLIENT: closed
    CHILD: got connection
    CHILD: got connection
    CLIENT: closed
    CLIENT: connected
    CLIENT: connected
    CLIENT: closed
    CLIENT: closed
    PARENT: server closed
    testSocket, listening
    CHILD: got socket
    CLIENT: got data
    CLIENT: closed
    events.js:167
          throw er; // Unhandled 'error' event
          ^
    
    Error: write EPIPE
        at ChildProcess.target._send (internal/child_process.js:741:20)
        at ChildProcess.target.send (internal/child_process.js:625:19)
        at SocketListSend._request (internal/socket_list.js:20:16)
        at SocketListSend.close (internal/socket_list.js:40:10)
        at Server.close (net.js:1636:24)
        at Socket.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-net.js:178:16)
        at Socket.emit (events.js:182:13)
        at TCP._handle.close [as _onclose] (net.js:596:12)
    Emitted 'error' event at:
        at process.nextTick (internal/child_process.js:745:39)
        at process._tickCallback (internal/process/next_tick.js:61:11)
  ...

@gireeshpunathil
Member Author

The very test this PR was raised for failed!

The good thing is that it showed precisely that the fix did not work: the best CI run I have ever seen!

Looking at the failure and the associated code, I see what is happening:

if (typeof callback === 'function') {
  process.nextTick(callback, ex);
} else {
  process.nextTick(() => this.emit('error', ex));
}

If no callback is supplied to send, the error from the child's child_process.send is emitted on its receiver, and eventually on the worker object.

If a callback is supplied, the error is never emitted; it is passed to the callback instead.

So I should apply the error filter in the callback, not in the error handler.
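
To make the two delivery paths concrete, here is a minimal sketch. `FakeTarget` is a hypothetical stand-in for a child process whose channel write always fails; it is not Node.js internals and only reuses the branching quoted above. The EPIPE code is borrowed from the CI log purely for illustration.

```javascript
const EventEmitter = require('events');

// Hypothetical stand-in for a ChildProcess whose IPC write always fails.
class FakeTarget extends EventEmitter {
  send(message, callback) {
    const ex = new Error('write EPIPE');
    ex.code = 'EPIPE';
    // Same branching as the quoted snippet: callback wins over 'error'.
    if (typeof callback === 'function') {
      process.nextTick(callback, ex);
    } else {
      process.nextTick(() => this.emit('error', ex));
    }
    return false;
  }
}

const seen = [];
const target = new FakeTarget();
target.on('error', (err) => seen.push(`error event: ${err.code}`));

// No callback: the failure surfaces as an 'error' event.
target.send('first');
// With a callback: the failure reaches the callback only.
target.send('second', (err) => seen.push(`callback: ${err.code}`));

// nextTick callbacks have all run by the time setImmediate fires.
setImmediate(() => console.log(seen));
```

The 'error' listener fires once, for the first send only. This is why a filter installed on the error listener never sees a failure from a send that was given a callback.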

@gireeshpunathil
Member Author

Pushed a commit to that effect and squashed it with the last one, but noticed that the head commit has become associated with this PR! How is that possible? What did I mess up?

@gireeshpunathil
Member Author

OK, forget the last comment: I had squashed one extra commit into mine. Fixed now.

@gireeshpunathil
Member Author

gist of CI:
free-bsd:

06:30:06 not ok 1744 parallel/test-trace-events-fs-sync
06:30:06   ---
06:30:06   duration_ms: 0.874
06:30:06   severity: fail
06:30:06   exitcode: 1
06:30:06   stack: |-
06:30:06     assert.js:80
06:30:06       throw new AssertionError(obj);
06:30:06       ^
06:30:06     
06:30:06     AssertionError [ERR_ASSERTION]: fs.sync.copyfile: 

windows:

ok 51 parallel/test-child-process-fork-closed-channel-segfault
  ---
  duration_ms: 0.426
  ...
not ok 52 parallel/test-child-process-fork-net
  ---
  duration_ms: 0.293
  severity: fail
  exitcode: 1
  stack: |-
    PARENT: server listening
...
    events.js:167
          throw er; // Unhandled 'error' event
          ^
    
    Error: write EPIPE
        at ChildProcess.target._send (internal/child_process.js:741:20)
        at ChildProcess.target.send (internal/child_process.js:625:19)
        at SocketListSend._request (internal/socket_list.js:20:16)
        at SocketListSend.close (internal/socket_list.js:40:10)
        at Server.close (net.js:1636:24)
        at Socket.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-net.js:178:16)

The first one is an unrelated flake. The other failure (parallel/test-child-process-fork-net) shows the same symptom as this test and requires the same patch. I suggested that @mmarchini address it based on the result of this PR.

In short, this PR is stable, and CI failures are unrelated.

@Trott
Member

Trott commented May 28, 2018

@nodejs/testing @nodejs/child_process

err.code !== 'EMFILE'
) {
throw err;
}
Member

If I am not mistaken the test actually does not work in case one of these errors pops up. So instead of ignoring the error, I would like to skip the test with a message. That way there is at least some kind of notification that the test actually did not work as expected.

Member Author

@BridgeAR: the sole purpose of the test is to make sure the channel field of a closed worker process is properly nullified (ref: #2847).

This is asserted in the callback invoked when the worker closes, after the first send causes the worker to exit.

The race comes after that: the second request can fail for a variety of reasons, depending on the state of the worker and the server, and its failure or success is immaterial to the test.

Hope this clarifies.
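
As a rough illustration of treating the second send's outcome as immaterial, the sketch below tolerates a small set of race-related error codes in the send callback and rethrows anything else. The names `IGNORED_CODES` and `onSendDone` are hypothetical, and the code list is an assumption: EMFILE appears in the reviewed diff and EPIPE in the CI log, but the actual test may filter a different set.

```javascript
// Assumed list of codes that only signal the race, not a real failure.
const IGNORED_CODES = new Set(['EMFILE', 'EPIPE']);

// Hypothetical callback for worker.send(message, handle, callback).
// Node.js calls it with null on success or an Error on failure.
function onSendDone(err) {
  if (err && !IGNORED_CODES.has(err.code)) {
    // Anything outside the expected race errors is still a test failure.
    throw err;
  }
}
```

Passing such a function as the last argument of the send call would keep unexpected errors fatal while letting the known race outcomes through.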

Member

If the error does not have any influence on the actual thing we want to test: great.

Contributor

cjihrig left a comment

LGTM, but I think these should be written as assertions, rather than if statements with throws.

@gireeshpunathil
Member Author

Thanks @cjihrig. Given that:

  • the introduced code replicates an existing block in the same test,
  • the same pattern is required in a companion test, parallel/test-child-process-fork-net,
  • this has been failing with high frequency in CI recently,

I think it is worthwhile to go as is (fixing the issue at hand) and look at improvements later.

Contributor

maclover7 left a comment

LGTM -- I'm personally in favor of getting this landed sooner rather than later since this fixes Windows CI breakage

@gireeshpunathil
Member Author

gireeshpunathil commented May 29, 2018

@cjihrig - I know you approved it, but the "should" in your suggestion makes me flaky on landing this. Can you please acknowledge or comment on my response?

On a separate note: can anyone tell me the discrete steps for landing my own PRs? I know the general landing procedure, but I am looking for how to get the merged status (the purple button). Last time I attempted it, I created a mess!

@bzoz
Contributor

bzoz commented May 29, 2018

Try this: https://github.com/nodejs/node-core-utils/blob/master/docs/git-node.md. It makes everything super easy.

Contributor

mmarchini left a comment

LGTM

BTW, I ran a node-stress-single-test yesterday, and there were no failures out of 9999 runs :)

https://ci.nodejs.org/job/node-stress-single-test/1873/

@joyeecheung
Member

@bzoz The tool does not solve the merge status issue (but I guess if you reset your PR branch to the HEAD that you are about to push to master and force push to update the PR branch, then GitHub would recognize the PR has been merged?)

gireeshpunathil added a commit that referenced this pull request May 29, 2018
In test-child-process-fork-closed-channel-segfault.js, a race condition
is observed between the server getting closed and the worker sending
a message. Accommodate the potential errors.

Earlier, the same race was observed between the client and the server
and was addressed by ignoring the relevant errors in an error
handler. The same mechanism is reused for the worker too.

The only difference is that the filter is applied at the callback
instead of at the worker's error listener.

Refs: #3635 (comment)
Fixes: #20836
PR-URL: #20973

Reviewed-By: Rich Trott
Reviewed-By: Colin Ihrig
Reviewed-By: Jon Moss
Reviewed-By: Matheus Marchini
Reviewed-By: Joyee Cheung
Reviewed-By: Bartosz Sosnowski
@gireeshpunathil
Member Author

Landed in 397eceb

mmarchini pushed a commit to mmarchini/node that referenced this pull request May 29, 2018
Patch inspired by 397eceb to fix
flakiness in test-child-process-fork-net.

Ref: nodejs#20973
mmarchini pushed a commit to mmarchini/node that referenced this pull request May 29, 2018
`flaky-test-child-process-fork-net` has been failing constantly for the
past few days, and none of the solutions suggested so far have worked.
Marking it as flaky until the issue is fixed.

Ref: nodejs#21012
Ref: nodejs#20973
Ref: nodejs#20973
mmarchini pushed a commit that referenced this pull request May 30, 2018
`flaky-test-child-process-fork-net` has been failing constantly for the
past few days, and none of the solutions suggested so far have worked.
Marking it as flaky until the issue is fixed.

Ref: #21012
Ref: #20973
Ref: #20973

PR-URL: #21018
Refs: #21012
Refs: #20973
Refs: #20973
Reviewed-By: Ruben Bridgewater
Reviewed-By: Jon Moss
Reviewed-By: Anatoli Papirovski
Reviewed-By: Michael Dawson
Reviewed-By: James M Snell
Reviewed-By: Trivikram Kamat
addaleax pushed a commit that referenced this pull request May 31, 2018
addaleax pushed a commit that referenced this pull request May 31, 2018
MylesBorins mentioned this pull request Jun 6, 2018

Successfully merging this pull request may close these issues.

Investigate test-child-process-fork-closed-channel-segfault
8 participants