Segfault in node::Environment::KickNextTick #2928

Closed
stephank opened this issue Sep 17, 2015 · 14 comments

@stephank

We're encountering a segfault during load testing of an HTTP-based application that also uses express and sockjs. In particular, the load test establishes a large number of long-lived WebSocket connections, and the application crashes while we ramp up connections.

This is with official 64-bit Linux binaries on Ubuntu 15.04 (EC2 ami-df95b5a8). The backtraces and the fact that the crashes don't seem to occur on io.js 3.2.0 make me suspect #2355 is related.

Backtrace from io.js 3.3.0:

Program received signal SIGSEGV, Segmentation fault.
0x00000000008f0a03 in v8::Isolate::SuppressMicrotaskExecutionScope::SuppressMicrotaskExecutionScope(v8::Isolate*) ()
(gdb) bt
#0  0x00000000008f0a03 in v8::Isolate::SuppressMicrotaskExecutionScope::SuppressMicrotaskExecutionScope(v8::Isolate*) ()
#1  0x0000000000b84113 in v8::internal::Isolate::RunMicrotasks() ()
#2  0x0000000000da5ad8 in node::Environment::KickNextTick() ()
#3  0x0000000000dd860f in node::Parser::OnReadImpl(long, uv_buf_t const*, uv_handle_type, void*) ()
#4  0x0000000000deda53 in node::StreamWrap::OnRead(uv_stream_s*, long, uv_buf_t const*) ()
#5  0x0000000000f66d8f in uv__read (stream=stream@entry=0x3b2c510) at ../deps/uv/src/unix/stream.c:1178
#6  0x0000000000f67400 in uv__stream_io (loop=<optimized out>, w=0x3b2c598, events=1) at ../deps/uv/src/unix/stream.c:1241
#7  0x0000000000f6cc45 in uv__io_poll (loop=loop@entry=0x180d000 <default_loop_struct>, timeout=0) at ../deps/uv/src/unix/linux-core.c:345
#8  0x0000000000f5dfd6 in uv_run (loop=0x180d000 <default_loop_struct>, mode=UV_RUN_ONCE) at ../deps/uv/src/unix/core.c:341
#9  0x0000000000dbdaa8 in node::Start(int, char**) ()
#10 0x00007fc5d03c8a40 in __libc_start_main (main=0x71e340 <main>, argc=5, argv=0x7fff26b15868, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7fff26b15858) at libc-start.c:289
#11 0x000000000071e53d in _start ()
Warning: the current language does not match this frame.

Backtrace from node.js 4.1.0:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000dd722e in node::Environment::KickNextTick() ()
(gdb) bt
#0  0x0000000000dd722e in node::Environment::KickNextTick() ()
#1  0x0000000000e09c0f in node::Parser::OnReadImpl(long, uv_buf_t const*, uv_handle_type, void*) ()
#2  0x0000000000e1f0f3 in node::StreamWrap::OnRead(uv_stream_s*, long, uv_buf_t const*) ()
#3  0x0000000000f9930f in uv__read (stream=stream@entry=0x2855d70) at ../deps/uv/src/unix/stream.c:1178
#4  0x0000000000f99980 in uv__stream_io (loop=<optimized out>, w=0x2855df8, events=1) at ../deps/uv/src/unix/stream.c:1241
#5  0x0000000000f9f1c5 in uv__io_poll (loop=loop@entry=0x1859ec0 <default_loop_struct>, timeout=0) at ../deps/uv/src/unix/linux-core.c:345
#6  0x0000000000f90566 in uv_run (loop=0x1859ec0 <default_loop_struct>, mode=UV_RUN_ONCE) at ../deps/uv/src/unix/core.c:341
#7  0x0000000000def3f0 in node::Start(int, char**) ()
#8  0x00007febbf92aa40 in __libc_start_main (main=0x719210 <main>, argc=5, argv=0x7ffcef1aca88, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7ffcef1aca78) at libc-start.c:289
#9  0x000000000071940d in _start ()
Warning: the current language does not match this frame.
@brendanashworth added the http and c++ labels on Sep 17, 2015
@brendanashworth
Contributor

cc @indutny

@stephank
Author

Probably not the smallest test case, but I've managed to reproduce it on a single EC2 c3.large instance with the following gist: https://gist.github.com/stephank/1c18a3ce3e5c1a40e282

This requires cranking up the file descriptor limit, something I've not yet tried on the Mac I use for development. (At least with the default hard limit on the Mac, I can't seem to reproduce it right away.)

It does not appear to be limited to just connection setup: the test creates 10,000 connections across 5 workers, and usually crashes after those connections are already established.

The test uses cluster, but that should not be relevant: it is the master that crashes, and the master is just a simple HTTP server with an upgrade listener that echoes everything back on the socket.
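
For reference, the master side is essentially just this (a minimal sketch of what the gist does, written from the description above rather than copied from it; the handshake response and port are placeholders, and the cluster/worker ramp-up side is omitted):

// Sketch of the crashing master: a plain HTTP server whose 'upgrade'
// handler completes a crude handshake and then echoes the raw socket.
var http = require('http');

var server = http.createServer(function(req, res) {
  res.end('ok');
});

server.on('upgrade', function(req, socket) {
  socket.write('HTTP/1.1 101 Switching Protocols\r\n' +
               'Connection: Upgrade\r\n' +
               'Upgrade: websocket\r\n\r\n');
  socket.pipe(socket); // echo everything back to the client
});

server.listen(8080);

The workers then just open thousands of long-lived upgraded connections against this server and hold them.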

@stephank
Author

I updated the test to be a bit more reliable on the particular setup I was testing. It now spawns a lot more workers, and each prints a dot when it finishes connecting. For me, the crash happens before that every time I've tried.

Using this, I did a git bisect between 3.2.0 and 3.3.0, and landed on 1bc4468.

@stephank
Author

I reproduced the crash on a debug build of current master (0a329d2), and the backtrace reveals that parser->env() == NULL:

Program received signal SIGSEGV, Segmentation fault.
0x000000000157fa70 in node::Environment::TickInfo::in_tick (this=0x1f8) at ../src/env-inl.h:107
107   return in_tick_;
(gdb) bt
#0  0x000000000157fa70 in node::Environment::TickInfo::in_tick (this=0x1f8) at ../src/env-inl.h:107
#1  0x00000000015809ea in node::Environment::KickNextTick (this=0x0) at ../src/env.cc:63
#2  0x00000000015c5747 in node::Parser::OnReadImpl (nread=81, buf=0x7fffffffafd0, pending=UV_UNKNOWN_HANDLE, ctx=0x21584f0) at ../src/node_http_parser.cc:565
#3  0x000000000158d821 in node::StreamResource::OnRead (this=0x21583a0, nread=81, buf=0x7fffffffafd0, pending=UV_UNKNOWN_HANDLE) at ../src/stream_base.h:160
#4  0x00000000015dbb63 in node::StreamWrap::OnReadCommon (handle=0x21583f0, nread=81, buf=0x7fffffffafd0, pending=UV_UNKNOWN_HANDLE) at ../src/stream_wrap.cc:250
#5  0x00000000015dbc0e in node::StreamWrap::OnRead (handle=0x21583f0, nread=81, buf=0x7fffffffafd0) at ../src/stream_wrap.cc:265
#6  0x000000000164a2ca in uv__read (stream=0x21583f0) at ../deps/uv/src/unix/stream.c:1178
#7  0x000000000164a582 in uv__stream_io (loop=0x1ec0100 <default_loop_struct>, w=0x2158478, events=1) at ../deps/uv/src/unix/stream.c:1241
#8  0x000000000165048c in uv__io_poll (loop=0x1ec0100 <default_loop_struct>, timeout=93903) at ../deps/uv/src/unix/linux-core.c:345
#9  0x000000000163c955 in uv_run (loop=0x1ec0100 <default_loop_struct>, mode=UV_RUN_ONCE) at ../deps/uv/src/unix/core.c:341
#10 0x000000000159aa96 in node::StartNodeInstance (arg=0x7fffffffe480) at ../src/node.cc:3926
#11 0x000000000159add1 in node::Start (argc=2, argv=0x1ec1bc0) at ../src/node.cc:3999
#12 0x00000000015c64a2 in main (argc=2, argv=0x7fffffffe5e8) at ../src/node_main.cc:44

I tried to set a breakpoint on Parser::New as:

break ../src/node_http_parser.cc:363 if env == 0

But it never fires. (I also flipped the condition, and that does fire, so the breakpoint itself works.)

The parser pointer and object seem otherwise valid, but perhaps there is some corruption going on?

@stephank
Author

Narrowed down the issue to onParserExecute and FreeList.

In short: as long as parser instances get reused from the FreeList, things are fine. But once you go beyond the FreeList maximum size, instances are destroyed instead of pooled, and that's when the crash happens.

I added a simple check to freelist.js for a double free, and at least that doesn't appear to be the case. So destruction isn't being handled properly somewhere.
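
The double-free check was along these lines (a sketch only: the structure mirrors the lib/freelist.js of that era, and the indexOf guard and error message here are illustrative rather than the exact change):

// FreeList with a guard against pooling the same object twice. If an
// object were freed twice, alloc() could hand it out to two owners.
function FreeList(name, max, constructor) {
  this.name = name;
  this.constructor = constructor;
  this.max = max;
  this.list = [];
}

FreeList.prototype.alloc = function() {
  return this.list.length ? this.list.shift()
                          : this.constructor.apply(this, arguments);
};

FreeList.prototype.free = function(obj) {
  if (this.list.indexOf(obj) !== -1)
    throw new Error('double free of pooled ' + this.name + ' instance');
  if (this.list.length < this.max) {
    this.list.push(obj);
    return true;  // kept in the pool for reuse
  }
  return false;   // pool is full; the caller destroys the instance
};

A guard like this not firing is what suggests a double free isn't the problem here.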

@indutny
Member

indutny commented Sep 18, 2015

@stephank thanks for the info! I'll take a look at it today.

@indutny
Member

indutny commented Sep 18, 2015

@stephank I'm going to give the following patch a try locally in a VM:

diff --git a/lib/_http_common.js b/lib/_http_common.js
index 7570329..69c8ace 100644
--- a/lib/_http_common.js
+++ b/lib/_http_common.js
@@ -171,6 +171,9 @@ function freeParser(parser, req, socket) {
       parser.socket.parser = null;
     parser.socket = null;
     parser.incoming = null;
+    if (parser.handle)
+      parser.unconsume(parser.handle._externalStream);
+    parser.handle = null;
     if (parsers.free(parser) === false)
       parser.close();
     parser = null;
diff --git a/lib/_http_server.js b/lib/_http_server.js
index 7acc108..8583fb3 100644
--- a/lib/_http_server.js
+++ b/lib/_http_server.js
@@ -329,8 +329,10 @@ function connectionListener(socket) {
   socket.on = socketOnWrap;

   var external = socket._handle._externalStream;
-  if (external)
+  if (external) {
     parser.consume(external);
+    parser.handle = socket._handle;
+  }
   external = null;
   parser[kOnExecute] = onParserExecute;

@@ -368,7 +370,6 @@ function connectionListener(socket) {
       socket.removeListener('data', socketOnData);
       socket.removeListener('end', socketOnEnd);
       socket.removeListener('close', serverSocketCloseListener);
-      parser.unconsume(socket._handle._externalStream);
       parser.finish();
       freeParser(parser, req, null);
       parser = null;
@@ -528,8 +529,10 @@ function socketOnWrap(ev, fn) {
     return res;
   }

-  if (this._handle && (ev === 'data' || ev === 'readable'))
+  if (this._handle && (ev === 'data' || ev === 'readable')) {
     this.parser.unconsume(this._handle._externalStream);
+    this.parser.handle = null;
+  }

   return res;
 }

I think it may be that the http parser is deallocated before the net.Socket is.

@indutny
Member

indutny commented Sep 18, 2015

Oh gosh, it is much simpler than this :)

@indutny
Member

indutny commented Sep 18, 2015

The proper fix is:

diff --git a/src/node_http_parser.cc b/src/node_http_parser.cc
index 21be19e..8264336 100644
--- a/src/node_http_parser.cc
+++ b/src/node_http_parser.cc
@@ -152,7 +152,8 @@ class Parser : public BaseObject {
   Parser(Environment* env, Local<Object> wrap, enum http_parser_type type)
       : BaseObject(env, wrap),
         current_buffer_len_(0),
-        current_buffer_data_(nullptr) {
+        current_buffer_data_(nullptr),
+        refcount_(1) {
     Wrap(object(), this);
     Init(type);
   }
@@ -366,7 +367,9 @@ class Parser : public BaseObject {

   static void Close(const FunctionCallbackInfo<Value>& args) {
     Parser* parser = Unwrap<Parser>(args.Holder());
-    delete parser;
+
+    if (--parser->refcount_ == 0)
+      delete parser;
   }


@@ -504,6 +507,22 @@ class Parser : public BaseObject {
   }

  protected:
+  class ScopedRetainParser {
+   public:
+    ScopedRetainParser(Parser* p) : p_(p) {
+      p_->refcount_++;
+    }
+
+    ~ScopedRetainParser() {
+      if (0 == --p_->refcount_)
+        delete p_;
+      p_ = nullptr;
+    }
+
+   private:
+    Parser* p_;
+  };
+
   static const size_t kAllocBufferSize = 64 * 1024;

   static void OnAllocImpl(size_t suggested_size, uv_buf_t* buf, void* ctx) {
@@ -540,6 +559,8 @@ class Parser : public BaseObject {
     if (nread == 0)
       return;

+    ScopedRetainParser retain(parser);
+
     parser->current_buffer_.Clear();
     Local<Value> ret = parser->Execute(buf->base, nread);

@@ -668,7 +689,10 @@ class Parser : public BaseObject {
   char* current_buffer_data_;
   StreamResource::Callback<StreamResource::AllocCb> prev_alloc_cb_;
   StreamResource::Callback<StreamResource::ReadCb> prev_read_cb_;
+  int refcount_;
   static const struct http_parser_settings settings;
+
+  friend class ScopedRetainParser;
 };

@stephank may I ask you to give it a try?

@indutny
Member

indutny commented Sep 18, 2015

Relevant PR: #2956

@indutny
Member

indutny commented Sep 18, 2015

Oops, made a mistake in the patch. Edited it inline.

@stephank
Author

Can confirm this fixes the test for me!

@indutny
Member

indutny commented Sep 18, 2015

Hooray! Will work on getting this landed ASAP.

indutny added a commit to indutny/io.js that referenced this issue Sep 19, 2015
`freeParser` deallocates `Parser` instances early if they do not fit
into the free list. This does not play well with recent socket
consumption change, because it will try to deallocate the parser while
executing on its stack.

Regression was introduced in: 1bc4468

Fix: nodejs#2928
indutny added a commit that referenced this issue Sep 19, 2015
`freeParser` deallocates `Parser` instances early if they do not fit
into the free list. This does not play well with recent socket
consumption change, because it will try to deallocate the parser while
executing on its stack.

Regression was introduced in: 1bc4468

Fix: #2928
PR-URL: #2956
Reviewed-by: Trevor Norris <[email protected]>
@indutny
Member

indutny commented Sep 19, 2015

Fixed by 229a03f, thanks for reporting!

@indutny indutny closed this as completed Sep 19, 2015
indutny added a commit that referenced this issue Sep 20, 2015
`freeParser` deallocates `Parser` instances early if they do not fit
into the free list. This does not play well with recent socket
consumption change, because it will try to deallocate the parser while
executing on its stack.

Regression was introduced in: 1bc4468

Fix: #2928
PR-URL: #2956
Reviewed-by: Trevor Norris <[email protected]>
Trott added a commit to Trott/io.js that referenced this issue Feb 9, 2016
Fix flaky test-http-regr-nodejsgh-2928 that has been failing on Raspberry Pi
devices in CI.

Fixes: nodejs#4830
Trott added a commit to Trott/io.js that referenced this issue Feb 10, 2016
Fix flaky test-http-regr-nodejsgh-2928 that has been failing on Raspberry Pi
devices in CI.

Fixes: nodejs#4830
PR-URL: nodejs#5154
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Rod Vagg <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
rvagg pushed a commit that referenced this issue Feb 15, 2016
Fix flaky test-http-regr-gh-2928 that has been failing on Raspberry Pi
devices in CI.

Fixes: #4830
PR-URL: #5154
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Rod Vagg <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
richardlau pushed a commit that referenced this issue Apr 17, 2024
Hard code the value of the host parameter to `common.localhostIPv4` in
`server.listen()` and `net.connect()`. This

1. ensures that the client `socket._handle` is not reinitialized during
   connection due to the family autodetection algorithm, preventing
   `parser.consume()` from being called with an invalid `socket._handle`
   parameter.
2. works around an issue in the FreeBSD 12 machine where the stress test
   is run where some sockets get stuck after connection.

PR-URL: #49574
Backport-PR-URL: #52384
Closes: #49565
Fixes: #49564
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: James M Snell <[email protected]>
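
Translated into test terms, the change described in that commit amounts to something like this (a sketch only; `common.localhostIPv4` comes from Node's test harness, the rest of test-http-regr-gh-2928's stress logic is omitted, and the callback bodies here are placeholders):

// Sketch: pin both server and client to an explicit IPv4 address so the
// family autodetection path never replaces socket._handle mid-connect,
// and parser.consume() therefore always sees a valid handle.
'use strict';
const common = require('../common');
const http = require('http');
const net = require('net');

const server = http.createServer();

server.listen(0, common.localhostIPv4, () => {
  const socket = net.connect(server.address().port, common.localhostIPv4);
  socket.on('connect', () => {
    socket.destroy();
    server.close();
  });
});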