-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TLS: improve compliance with shutdown standard, remove hacks #36111
Conversation
Review requested:
|
I'm somewhat hoping someone else will take this over and address the FIXME there, as this PR fix should address some of the CI flakiness around tls/https connections. That FIXME is related to what's causing the occasional failure in the newly added test (e.g. https://ci.nodejs.org/job/node-test-binary-windows-js-suites/6766/RUN_SUBSET=1,nodes=win10-COMPILED_BY-vs2019/testReport/junit/(root)/test/parallel_test_https_agent_jssocket_close/), though possibly not descriptive of the specific fix desired. What's happening there is a conflict between the ownership semantics of the read and write side of the JS TLSWrap implementation: on the write side, the TLSWrap object is taking full ownership (thus shutdown/destroy on the TLS also cause shutdown/destroy on the underlying socket) but on the read side, it's taking shared ownership (EOF on the read doesn't have a way to tell the underlying socket to destroy the read capabilities). There was recently a hack merged to call for posterity:
|
// Pull through final chunk, if anything is buffered. | ||
// the ondata function will handle it properly, and this | ||
// is a no-op if no final chunk remains. | ||
socket.read(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the comment says, this should be a no-op. The underlying socket is not supposed to deliver a read event after a close event, but TLS would try to do so (while racing to destroy it first). Looking back at the commit history, it was added as a hack to work around this issue, but didn't really fix it, as evidenced by flaky CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is buffered data (for example if there is buffered data and the socket is destroyed) shouldn't it be read? We do something similar in ws
to ensure that all received data is actually used and delivered to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no data buffered here, that wouldn't make any sense as the http protocol doesn't allow for more http data to be included after the http message is ended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you know that the http message ended? Isn't this only a listener for the 'close'
event of the socket? That 'close'
event can be emitted in the middle of a HTTP request/response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change does not affect ws
luckily. I've only mentioned it because the same pattern used here (to read buffered data after 'close'
) is also used by ws
as per #36111 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you have an example for why you added those lines to ws
? I see only the commit where you added it:
websockets/ws@6046a28. Note that the documentation for 'error' also indicates that this call to .read
should not trigger any callbacks to occur ("no further events other than 'close' should be emitted"), so by adding that comment and code there, it seems like you may be causing ws
to violate the stream.Readable
interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also seems like you had previously changed it to explicitly have the opposite behavior in websockets/ws@5d8ab0e, but was changed back later during a general code refactoring? From the comment description, it appears to me to say that code was intending to be be part of the 'error' handler (before that handler called destroy), and should be triggering the 'readable' callback, not the 'read' callback.
But why would the underlying socket be doing an active 'read', thus causing it to encounter an 'error', while data was already pending in the buffer? That seems like it would be disregarding of flow control, so wouldn't actually happen in a correctly implemented stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK calling readable.read()
is the only way to pull out buffered data from a readable stream after it is destroyed and in my opinion there must be a way to do that. The fact that doing so causes an additional 'data'
event is a side effect that is actually very useful.
Did you have an example for why you added those lines to
ws
?
See #36111 (comment) and #36111 (comment). The user can call ws.terminate()
and destroy the socket. When that happens there might be buffered data in the socket. That data might contain many vaid WebSocket frames that we do not want to drop.
websockets/ws@5d8ab0e is still in place, see https://github.com/websockets/ws/blob/8.0.0/lib/websocket.js#L946. Any data after the close frame must be discarded per specification.
Anyway this is not about ws
. ws
is not affected by this change. This is about discarding data that was previously not discarded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't fully followed this discussion but #39639 might be of relevance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test fail?
Yes, the new test may fail, as I mentioned above (it's waiting for an 'end' event, but it doesn't "own" the socket shutdown, so it's racing again the 'reset' event generated by TLSWrap when it closed the socket). I'm hoping someone more core to this code (I work more on the Julia and libuv side) can take over and address the further deficiencies. |
I should perhaps clarify that this RST packet is the expected behavior of the HTTP server that's being used as the test driver here, so perhaps it's not entirely clear if this new test is fully correct, or whether it needs to also explicitly handle the 'error' event. It'd could be correct for raw TLS usage for some application, but not HTTP (which would usually choose to destroy the underlying socket, even if TLSWrap wasn't already doing that). Alternatively (or additionally), HTTP and/or TLSWrap could fully destroy the underlying socket when receiving the 'stream_close' message. That would be more fully in compliance with my reading of the standard, but also initially made some other tests unhappy (they tried to also destroy the stream, and were confused when it already was destroyed) |
I think James is currently the one with best understanding of the TLS parts. |
For me, this transforms the active close into a simultaneous close - thus eliminating the |
I've been wondering what you meant by 'active shutdown'. It seems to be a combination of the terms 'active close' (which for http is what happens at the server always, and simply means "the party that called 'close' first". The alternative being "never close".) and of 'graceful shutdown' (a TCP feature which isn't used by TLS). Either way, that's not actually relevant here, as that state change is not under the control of the client code. The |
Normally, most TCP-based protocols signal in some way that the connection is to be closed - some form of a bye message, then both ends proceed to simultaneously call |
Having recently looked at the code, I believe nodejs usually does send the shutdown message first (it's named 'end'), but note that the TLS specification was originally intended to diverge here from your knowledge of TCP in order to avoid a truncation attack. The shutdown message is also not required (it's not present in the diagram you sent). Please see https://tools.ietf.org/html/rfc5246#section-7.2.1 for the TLS specification however.
If you look at the diagram you sent, a simultaneous close is one type of active close. It is a subset, not an alternative. |
Simultaneous close is indeed a special case of an active close where the two |
From what I saw, it seems that the libuv API supposes that it is up to the user to do this - call |
This is neither necessary nor sufficient to prevent an RST packet. I realize that TCP documentation is not always clear on this point, as 'closing the stream' and 'closing the handle' are related, but not identical activities.
Your understanding of TLS is not entirely correct here, as TLS does not fully respect the underlying TCP expectations in this. In particular, "then reading until an EOF" is not required by the standard, and thus must be handled correctly by TLSWrap. The uv_shutdown call should happen from here Lines 399 to 423 in b3b0c43
which feels accurate, since I spent some of my debugging time with a breakpoint on uv_shutdown to watch the program flow with read->write->shutdown->close, so I'm pretty confident in my observation that it already happens. |
Who is throwing that Try this on Windows: 'use strict';
const http = require('http');
const data = 'hello';
const server = http.createServer(function (req, res) {
res.setHeader('content-length', data.length);
res.end(data);
server.close();
});
server.listen(0, function () {
const opts = { port: this.address().port, rejectUnauthorized: false };
http.get(opts).on('response', function (res) {
res.on('data', function (chunk) {
this.pause();
setTimeout(() => { this.resume(); }, 1);
});
});
}); You will get at least 50% |
I think you're confusing two unrelated bugs. This PR is about fixing TLSWrap. Your example code there is for http, and does not use this TLS code, and thus cannot be related. There is a bug in libuv that could possibly could cause that code to emit an RST on Windows, though I haven't been able to trigger it with that example. I do not expect that code to emit RST ever, and from running that for several minutes on both Windows and Linux, I did not see an error generated, even when heavily loaded. I have a patch up for the RST bug in libuv. Unfortunately, that patch also greatly exacerbates an existing problem in node, that has long been a source of CI problems, and for which this patch is intended to drive conversation towards a solution. James, I'd be happy to take a call to explain and discuss fixes, if that would be helpful! |
I am unable to reproduce the plain HTTP What is this Normally, there are only two ways to produce a
It is exactly because resetting hasn't been implemented in Node, that makes me think that only the first case is possible - and this is exactly what I see in the network dump - So the reason is in the TLS code - I agree - but ignoring that last data doesn't mean not reading it from the socket? You don't have any choice on this - either you call Besides, from reading that RFC, I am coming to the conclusion that TLS accepts only one form of closing - simultaneous closing by exchanging |
Ok, in TLS 1.0 proper exchange of |
So now that I agree that this is indeed a TLS-specific problem, there is one point we still do not agree:
|
I've been buried in a few other items and haven't really had the chance to go through this in detail... but I do plan to. Unfortunately it likely won't be until early next week most likely. |
Thanks, that's great! It's holding up ~3 libuv PRs that exacerbate flaky tests in node, so your time is appreciated, though I know it's a demanding change to review. @mmomtchev I agree that's a reasonable alternative design, but rewriting the http 1.x server spec and all implementations is a bit outside the scope of this PR. As to "they made it optional in TLS 1.1", that's instructions for the initiator (the http server), not the http client, so that isn't relevant to this PR. I initially made that same mistake in reading it, so it's understandable. |
I am positive that the server is the problem - the Does your solution simply solve the I traced the JS, the Node C++ code and the libuv C code Trace |
Yes, in fact I think that |
Yes, that would likely be a valid alternative design, but since shutdown is not in the http spec, redesigning it to add your requirements is not in scope here. The node's http server implementation appears to meet the spec given (which permits this close call ordering), while the client does not (which is seen to abort the process or hang), so the fix is required in the node client code. The RST packet on the wire does not concern me, it's only the delivery to the application that is wrong. |
Yes, I am positive, @addaleax , this bears your name This will require some major reshuffling of the current code? |
ebea6f5
to
a18156c
Compare
Landed in f3fbeaf. |
RFC 5246 section-7.2.1 requires that the implementation must immediately stop reading from the stream, as it is no longer TLS-encrypted. The underlying stream is permitted to still pump events (and errors) to other users, but those are now unencrypted, so we should not process them here. But therefore, we do not want to stop the underlying stream, as there could be another user of it, but we do need to remove ourselves as a listener. Per TLS v1.2, we should have also destroy the TLS state entirely here (including the writing side), but this was revised in TLS v1.3 to permit the stream to continue to flush output. There appears to be some inconsistencies in the way nodejs handles ownership of the underlying stream, with `TLS.close()` on the write side also calling shutdown on the underlying stream (thus assuming other users of the underlying stream are not permitted), while receiving EOF on the read side leaves the underlying channel open. These inconsistencies are left for a later person to resolve, if the extra functionality is needed (as described in #35904). The current goal here is to the fix the occasional CI exceptions depending on the timing of these kernel messages through the TCP stack. PR-URL: #36111 Fixes: #35946 Refs: libuv/libuv#3036 Refs: #35904 Co-authored-by: Momtchil Momtchev <[email protected]> Reviewed-By: Robert Nagy <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Luigi Pinca <[email protected]>
RFC 5246 section-7.2.1 requires that the implementation must immediately stop reading from the stream, as it is no longer TLS-encrypted. The underlying stream is permitted to still pump events (and errors) to other users, but those are now unencrypted, so we should not process them here. But therefore, we do not want to stop the underlying stream, as there could be another user of it, but we do need to remove ourselves as a listener. Per TLS v1.2, we should have also destroy the TLS state entirely here (including the writing side), but this was revised in TLS v1.3 to permit the stream to continue to flush output. There appears to be some inconsistencies in the way nodejs handles ownership of the underlying stream, with `TLS.close()` on the write side also calling shutdown on the underlying stream (thus assuming other users of the underlying stream are not permitted), while receiving EOF on the read side leaves the underlying channel open. These inconsistencies are left for a later person to resolve, if the extra functionality is needed (as described in #35904). The current goal here is to the fix the occasional CI exceptions depending on the timing of these kernel messages through the TCP stack. PR-URL: #36111 Fixes: #35946 Refs: libuv/libuv#3036 Refs: #35904 Co-authored-by: Momtchil Momtchev <[email protected]> Reviewed-By: Robert Nagy <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Luigi Pinca <[email protected]>
RFC 5246 section-7.2.1 requires that the implementation must immediately stop reading from the stream, as it is no longer TLS-encrypted. The underlying stream is permitted to still pump events (and errors) to other users, but those are now unencrypted, so we should not process them here. But therefore, we do not want to stop the underlying stream, as there could be another user of it, but we do need to remove ourselves as a listener. Per TLS v1.2, we should have also destroy the TLS state entirely here (including the writing side), but this was revised in TLS v1.3 to permit the stream to continue to flush output. There appears to be some inconsistencies in the way nodejs handles ownership of the underlying stream, with `TLS.close()` on the write side also calling shutdown on the underlying stream (thus assuming other users of the underlying stream are not permitted), while receiving EOF on the read side leaves the underlying channel open. These inconsistencies are left for a later person to resolve, if the extra functionality is needed (as described in #35904). The current goal here is to the fix the occasional CI exceptions depending on the timing of these kernel messages through the TCP stack. PR-URL: #36111 Fixes: #35946 Refs: libuv/libuv#3036 Refs: #35904 Co-authored-by: Momtchil Momtchev <[email protected]> Reviewed-By: Robert Nagy <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Luigi Pinca <[email protected]>
RFC 5246 section-7.2.1 requires that the implementation must immediately stop reading from the stream, as it is no longer TLS-encrypted. The underlying stream is permitted to still pump events (and errors) to other users, but those are now unencrypted, so we should not process them here. But therefore, we do not want to stop the underlying stream, as there could be another user of it, but we do need to remove ourselves as a listener. Per TLS v1.2, we should have also destroy the TLS state entirely here (including the writing side), but this was revised in TLS v1.3 to permit the stream to continue to flush output. There appears to be some inconsistencies in the way nodejs handles ownership of the underlying stream, with `TLS.close()` on the write side also calling shutdown on the underlying stream (thus assuming other users of the underlying stream are not permitted), while receiving EOF on the read side leaves the underlying channel open. These inconsistencies are left for a later person to resolve, if the extra functionality is needed (as described in #35904). The current goal here is to the fix the occasional CI exceptions depending on the timing of these kernel messages through the TCP stack. PR-URL: #36111 Fixes: #35946 Refs: libuv/libuv#3036 Refs: #35904 Co-authored-by: Momtchil Momtchev <[email protected]> Reviewed-By: Robert Nagy <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Luigi Pinca <[email protected]>
RFC 5246 section-7.2.1 requires that the implementation must immediately stop reading from the stream, as it is no longer TLS-encrypted. The underlying stream is permitted to still pump events (and errors) to other users, but those are now unencrypted, so we should not process them here. But therefore, we do not want to stop the underlying stream, as there could be another user of it, but we do need to remove ourselves as a listener. Per TLS v1.2, we should have also destroy the TLS state entirely here (including the writing side), but this was revised in TLS v1.3 to permit the stream to continue to flush output. There appears to be some inconsistencies in the way nodejs handles ownership of the underlying stream, with `TLS.close()` on the write side also calling shutdown on the underlying stream (thus assuming other users of the underlying stream are not permitted), while receiving EOF on the read side leaves the underlying channel open. These inconsistencies are left for a later person to resolve, if the extra functionality is needed (as described in #35904). The current goal here is to the fix the occasional CI exceptions depending on the timing of these kernel messages through the TCP stack. PR-URL: #36111 Fixes: #35946 Refs: libuv/libuv#3036 Refs: #35904 Co-authored-by: Momtchil Momtchev <[email protected]> Reviewed-By: Robert Nagy <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Luigi Pinca <[email protected]>
RFC 5246 section-7.2.1 requires that the implementation must immediately stop reading from the stream, as it is no longer TLS-encrypted. The underlying stream is permitted to still pump events (and errors) to other users, but those are now unencrypted, so we should not process them here. But therefore, we do not want to stop the underlying stream, as there could be another user of it, but we do need to remove ourselves as a listener. Per TLS v1.2, we should have also destroy the TLS state entirely here (including the writing side), but this was revised in TLS v1.3 to permit the stream to continue to flush output. There appears to be some inconsistencies in the way nodejs handles ownership of the underlying stream, with `TLS.close()` on the write side also calling shutdown on the underlying stream (thus assuming other users of the underlying stream are not permitted), while receiving EOF on the read side leaves the underlying channel open. These inconsistencies are left for a later person to resolve, if the extra functionality is needed (as described in nodejs#35904). The current goal here is to the fix the occasional CI exceptions depending on the timing of these kernel messages through the TCP stack. PR-URL: nodejs#36111 Fixes: nodejs#35946 Refs: libuv/libuv#3036 Refs: nodejs#35904 Co-authored-by: Momtchil Momtchev <[email protected]> Reviewed-By: Robert Nagy <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Luigi Pinca <[email protected]>
RFC 5246 section-7.2.1 requires that the implementation must immediately stop reading from the stream, as it is no longer TLS-encrypted. The underlying stream is permitted to still pump events (and errors) to other users, but those are now unencrypted, so we should not process them here. But therefore, we do not want to stop the underlying stream, as there could be another user of it, but we do need to remove ourselves as a listener. Per TLS v1.2, we should have also destroy the TLS state entirely here (including the writing side), but this was revised in TLS v1.3 to permit the stream to continue to flush output. There appears to be some inconsistencies in the way nodejs handles ownership of the underlying stream, with `TLS.close()` on the write side also calling shutdown on the underlying stream (thus assuming other users of the underlying stream are not permitted), while receiving EOF on the read side leaves the underlying channel open. These inconsistencies are left for a later person to resolve, if the extra functionality is needed (as described in #35904). The current goal here is to the fix the occasional CI exceptions depending on the timing of these kernel messages through the TCP stack. PR-URL: #36111 Fixes: #35946 Refs: libuv/libuv#3036 Refs: #35904 Co-authored-by: Momtchil Momtchev <[email protected]> Reviewed-By: Robert Nagy <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Luigi Pinca <[email protected]>
This reverts commit 46f36e3. PR-URL: libuv#3006 Refs: libuv#2967 Refs: libuv#2409 Refs: libuv#2943 Refs: libuv#2968 Refs: nodejs/node#36111 Reviewed-By: Santiago Gimeno <[email protected]>
RFC 5246 section-7.2.1 requires that the implementation must immediately
stop using the stream, as it is no longer TLS-encrypted. The stream is
permitted to still pump events (and errors) to other users, but those
are now unencrypted, so we should not process them here. But therefore,
we do not want to stop the underlying stream, as there could be another
user of it, but we do remove ourselves as a listener.
The section also states that the application must destroy the streamimmediately (discarding any pending writes, and sending a close_notify
response back), but we leave that to the upper layer of the application
here, as it should be sufficient to permit standards compliant usage
just to be ignoring read events.
EDIT: In August 2018, TLS v1.3 changed that requirement, per https://tools.ietf.org/html/rfc8446#section-6.1
Does not address new feature #35904
Closes: #35946
Co-authored-by: Momtchil Momtchev [email protected] @mmomtchev
CI with libuv/libuv#3036: https://ci.nodejs.org/view/libuv/job/libuv-in-node/171/ (the libuv PR shouldn't be needed to pass CI, but instead should make it even harder to pass tests)