Segfault in node::Environment::KickNextTick
#2928
We're encountering a segfault during load testing of an HTTP-based application, also using express and sockjs. In particular, the load test establishes a large number of long-lived WebSocket connections, and the application crashes while we ramp up connections.
This is with the official 64-bit Linux binaries on Ubuntu 15.04 (EC2 ami-df95b5a8). The backtraces, and the fact that the crashes don't seem to occur on io.js 3.2.0, make me suspect #2355 is related.
Backtrace from io.js 3.3.0:
Backtrace on node.js 4.1.0:
Comments
cc @indutny |
Probably not the smallest test case, but I've managed to reproduce it on a single EC2 c3.large instance with the following gist: https://gist.github.com/stephank/1c18a3ce3e5c1a40e282 This requires cranking up the file descriptor limit, something I've not yet tried on the Mac I use for development. (At least with the default hard limit on the Mac, I can't seem to reproduce it right away.) The issue does not appear to be limited to connection setup: the test creates 10,000 connections across 5 workers, and usually crashes after those connections are already established. The test uses cluster, but that should not be relevant; it is the master that crashes, and the master is a simple HTTP server with an upgrade listener that echoes everything on the socket. |
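For reference, a condensed sketch of the shape of that test (the gist above has the exact details; the port and handshake headers here are illustrative, but the 5 workers times 2,000 connections mirror the 10,000 connections described):

const cluster = require('cluster');
const http = require('http');
const net = require('net');

if (cluster.isMaster) {
  // Master: a plain HTTP server whose 'upgrade' listener echoes
  // everything it receives on the raw socket.
  const server = http.createServer((req, res) => res.end('ok'));
  server.on('upgrade', (req, socket) => {
    socket.write('HTTP/1.1 101 Switching Protocols\r\n' +
                 'Connection: Upgrade\r\nUpgrade: echo\r\n\r\n');
    socket.pipe(socket); // echo
  });
  server.listen(8000, () => {
    for (let i = 0; i < 5; i++)
      cluster.fork();
  });
} else {
  // Workers: open many long-lived upgraded connections against the master.
  // Needs a raised file descriptor limit, as noted above.
  for (let i = 0; i < 2000; i++) {
    const conn = net.connect(8000, '127.0.0.1', () => {
      conn.write('GET / HTTP/1.1\r\nHost: localhost\r\n' +
                 'Connection: Upgrade\r\nUpgrade: echo\r\n\r\n');
    });
    conn.on('error', () => {});
  }
}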
I updated the test to be a bit more reliable on the particular setup I was testing. It now spawns a lot more workers, and each prints a dot when it finishes connecting. For me, the crash happens before all the dots appear, every time I've tried. Using this, I did a git bisect between 3.2.0 and 3.3.0, and landed on 1bc4468 |
I reproduced the crash on a debug build of current master (0a329d2), and the backtrace reveals that
I tried to set a breakpoint on
But it never fires. (I also flipped the condition, and it does fire.) |
Narrowed down the issue to freeParser. Pretty much, as long as instances get reused, things are fine. But once you go beyond the FreeList maximum size, instances are destroyed, and that's when the crash happens. I added a simple check to |
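To illustrate the free-list semantics at play here, a minimal stand-in (not node's internal FreeList; the capacity of 1000 is illustrative): free() pools an instance for reuse and returns true, or returns false once the pool is at capacity, at which point the caller destroys the instance; that destruction path is where the crash shows up.

class FreeList {
  constructor(max, ctor) {
    this.max = max;
    this.ctor = ctor;
    this.list = [];
  }
  alloc() {
    // Reuse a pooled instance if available, otherwise construct a new one.
    return this.list.length > 0 ? this.list.pop() : this.ctor();
  }
  free(item) {
    if (this.list.length < this.max) {
      this.list.push(item); // pooled: will be reused later
      return true;
    }
    return false; // over capacity: caller destroys the instance
  }
}

const parsers = new FreeList(1000, () => ({ close() {} }));
const parser = parsers.alloc();
if (parsers.free(parser) === false)
  parser.close(); // mirrors the check in freeParser()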
@stephank thanks for the info! I'll take a look at it today. |
@stephank I'm going to give the following patch a try locally in a VM:
diff --git a/lib/_http_common.js b/lib/_http_common.js
index 7570329..69c8ace 100644
--- a/lib/_http_common.js
+++ b/lib/_http_common.js
@@ -171,6 +171,9 @@ function freeParser(parser, req, socket) {
parser.socket.parser = null;
parser.socket = null;
parser.incoming = null;
+ if (parser.handle)
+ parser.unconsume(parser.handle._externalStream);
+ parser.handle = null;
if (parsers.free(parser) === false)
parser.close();
parser = null;
diff --git a/lib/_http_server.js b/lib/_http_server.js
index 7acc108..8583fb3 100644
--- a/lib/_http_server.js
+++ b/lib/_http_server.js
@@ -329,8 +329,10 @@ function connectionListener(socket) {
socket.on = socketOnWrap;
var external = socket._handle._externalStream;
- if (external)
+ if (external) {
parser.consume(external);
+ parser.handle = socket._handle;
+ }
external = null;
parser[kOnExecute] = onParserExecute;
@@ -368,7 +370,6 @@ function connectionListener(socket) {
socket.removeListener('data', socketOnData);
socket.removeListener('end', socketOnEnd);
socket.removeListener('close', serverSocketCloseListener);
- parser.unconsume(socket._handle._externalStream);
parser.finish();
freeParser(parser, req, null);
parser = null;
@@ -528,8 +529,10 @@ function socketOnWrap(ev, fn) {
return res;
}
- if (this._handle && (ev === 'data' || ev === 'readable'))
+ if (this._handle && (ev === 'data' || ev === 'readable')) {
this.parser.unconsume(this._handle._externalStream);
+ this.parser.handle = null;
+ }
return res;
}
I think it may be that the |
Oh gosh, it is much simpler than this :) |
The proper fix is:
diff --git a/src/node_http_parser.cc b/src/node_http_parser.cc
index 21be19e..8264336 100644
--- a/src/node_http_parser.cc
+++ b/src/node_http_parser.cc
@@ -152,7 +152,8 @@ class Parser : public BaseObject {
Parser(Environment* env, Local<Object> wrap, enum http_parser_type type)
: BaseObject(env, wrap),
current_buffer_len_(0),
- current_buffer_data_(nullptr) {
+ current_buffer_data_(nullptr),
+ refcount_(1) {
Wrap(object(), this);
Init(type);
}
@@ -366,7 +367,9 @@ class Parser : public BaseObject {
static void Close(const FunctionCallbackInfo<Value>& args) {
Parser* parser = Unwrap<Parser>(args.Holder());
- delete parser;
+
+ if (--parser->refcount_ == 0)
+ delete parser;
}
@@ -504,6 +507,22 @@ class Parser : public BaseObject {
}
protected:
+ class ScopedRetainParser {
+ public:
+ ScopedRetainParser(Parser* p) : p_(p) {
+ p_->refcount_++;
+ }
+
+ ~ScopedRetainParser() {
+ if (0 == --p_->refcount_)
+ delete p_;
+ p_ = nullptr;
+ }
+
+ private:
+ Parser* p_;
+ };
+
static const size_t kAllocBufferSize = 64 * 1024;
static void OnAllocImpl(size_t suggested_size, uv_buf_t* buf, void* ctx) {
@@ -540,6 +559,8 @@ class Parser : public BaseObject {
if (nread == 0)
return;
+ ScopedRetainParser retain(parser);
+
parser->current_buffer_.Clear();
Local<Value> ret = parser->Execute(buf->base, nread);
@@ -668,7 +689,10 @@ class Parser : public BaseObject {
char* current_buffer_data_;
StreamResource::Callback<StreamResource::AllocCb> prev_alloc_cb_;
StreamResource::Callback<StreamResource::ReadCb> prev_read_cb_;
+ int refcount_;
static const struct http_parser_settings settings;
+
+ friend class ScopedRetainParser;
};
@stephank may I ask you to give it a try? |
Relevant PR: #2956 |
Oops, I made a mistake in the patch. Edited it inline. |
Can confirm this fixes the test for me! |
Hooray! Will work on getting this landed ASAP |
`freeParser` deallocates `Parser` instances early if they do not fit into the free list. This does not play well with the recent socket consumption change, because it will try to deallocate the parser while executing on its stack. Regression was introduced in: 1bc4468
Fix: #2928
PR-URL: #2956
Reviewed-By: Trevor Norris <[email protected]>
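A toy JavaScript model of the lifetime bug this commit message describes (purely illustrative; the real use-after-free is in C++, where Execute() keeps running on freed memory rather than throwing, so running this sketch ends in a TypeError at the marked line):

class ToyParser {
  constructor() { this.buffer = Buffer.alloc(16); }
  close() { this.buffer = null; } // stands in for `delete parser` in C++
  execute(cb) {
    cb(this);                  // a callback may "free" the parser re-entrantly
    return this.buffer.length; // TypeError: state is gone mid-execute
  }
}
new ToyParser().execute((p) => p.close());

The ScopedRetainParser guard in the patch above avoids exactly this by deferring the delete until Execute() has unwound.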
Fixed by 229a03f, thanks for reporting! |
Fix flaky test-http-regr-gh-2928 that has been failing on Raspberry Pi devices in CI.
Fixes: #4830
PR-URL: #5154
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Rod Vagg <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Hard code the value of the host parameter to `common.localhostIPv4` in `server.listen()` and `net.connect()`. This:
1. ensures that the client `socket._handle` is not reinitialized during connection due to the family autodetection algorithm, preventing `parser.consume()` from being called with an invalid `socket._handle` parameter;
2. works around an issue on the FreeBSD 12 machine where the stress test is run, in which some sockets get stuck after connection.
PR-URL: #49574
Backport-PR-URL: #52384
Closes: #49565
Fixes: #49564
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: James M Snell <[email protected]>
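A minimal sketch of the change that commit describes ('127.0.0.1' stands in here for common.localhostIPv4, which comes from the test harness):

const http = require('http');
const net = require('net');

const server = http.createServer().listen(0, '127.0.0.1', () => {
  // Pinning the host on both ends keeps family autodetection from
  // reinitializing socket._handle mid-connect.
  const socket = net.connect(server.address().port, '127.0.0.1', () => {
    socket.destroy();
    server.close();
  });
});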