
fix: increase highWaterMark value to fix file descriptor issue on v16+ #389

Merged
RafaelGSS merged 5 commits into main from fix/file-descriptor-destroy on Dec 27, 2021

Conversation

@RafaelGSS (Contributor) commented Dec 20, 2021

It aims to solve: clinicjs/node-clinic#293 (comment).

However, I'm not entirely sure it's the best approach. It isn't documented anywhere in the Node.js streams documentation, but it addresses the behavior introduced in v15.9.0.

I'm doing some tests to see if it affects the result data.

Important: see #389 (comment)
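
For context, the change is along these lines (a minimal sketch, assuming the logger writes its trace data through fs.createWriteStream; the exact stream construction and option value in injects/logger.js may differ):

const fs = require('fs')

// Hypothetical sketch: raise the writable buffer threshold so that large
// bursts of async_hooks trace data don't make write() return false.
// The default highWaterMark for writable streams is 16 KiB (16384 bytes).
const out = fs.createWriteStream('/tmp/bubbleprof-trace.bin', {
  highWaterMark: 2 * 1024 * 1024 // illustrative value only
})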

@RafaelGSS RafaelGSS requested a review from mcollina December 20, 2021 18:41
@mcollina (Contributor) left a comment

Can you please add a unit test?

injects/logger.js (review thread, resolved)
@RafaelGSS (Contributor, Author)

@mcollina after further tests, the behavior described above happens when using fastify. If I create a raw HTTP server:

const http = require('http')

http.createServer(function (req, res) {
  res.end('Hello, World!\n')
}).listen(3000)

it works fine. So, it needs further investigation. Firstly, would you mind elaborating on why you think this is a bug inside bubbleprof rather than in Node core? It works when using v15.8.0 or earlier.
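
For contrast, the failing case looked roughly like this (a hypothetical minimal fastify server, assuming the fastify v3 API current at the time; it is not part of the original comment):

const fastify = require('fastify')()

// The same "Hello, World!" route, but served through fastify instead of a raw
// http server; under bubbleprof this is the setup that triggered the issue.
fastify.get('/', async () => 'Hello, World!\n')

fastify.listen(3000, (err) => {
  if (err) throw err
})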

@mcollina (Contributor)

My read of this is that something changed in node core and Bubbleprof needs to be updated.

If it is a bug in Node core, we'd need to work around it anyway.

@mcollina (Contributor) left a comment

LGTM

After a long time spent on this, I don't think we can actually fix it before our deadline. There is likely a Node core bug (we need to get back to it).

injects/logger.js (review thread, resolved)
@RafaelGSS (Contributor, Author)

I was debugging inside Node core and found that https://github.com/nodejs/node/blob/ce4d3adf5029278090bdfca8e2b8a2fc384b9cc6/lib/internal/streams/writable.js#L733 returns false when using the "bad version", because state.length > 0.

It might be a bug in the stream backpressure handling, or a warning might be missing to tell users what to do in that situation: for instance, emit a warning when the drain event fires and there is no listener for it.
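
As an illustration (not the actual bubbleprof code), the false return value is easy to reproduce with any writable whose buffered data exceeds highWaterMark:

const { Writable } = require('stream')

// A slow writable with a tiny highWaterMark to force backpressure quickly.
const slow = new Writable({
  highWaterMark: 8, // bytes
  write (chunk, encoding, callback) {
    setTimeout(callback, 100) // pretend the destination is slow
  }
})

// The chunk is larger than highWaterMark, so the buffered length exceeds the
// threshold and write() returns false; 'drain' fires once the buffer empties.
console.log(slow.write(Buffer.alloc(64))) // false
slow.on('drain', () => console.log('drained, safe to write again'))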

Why does it appear on v15.9.0+?

nodejs/node#37300 enables top-level await by default in Node.js, which considerably increases the amount of data tracked by async_hooks. See below for a comparison of the stack traces stored in the file.

v15.8.0 or prior

structuredStackTrace length 90
structuredStackTrace length 46
structuredStackTrace length 14
structuredStackTrace length 18
structuredStackTrace length 12
structuredStackTrace length 8
structuredStackTrace length 8
structuredStackTrace length 19
structuredStackTrace length 19
structuredStackTrace length 11
structuredStackTrace length 13
structuredStackTrace length 5
structuredStackTrace length 5
structuredStackTrace length 9
structuredStackTrace length 5
structuredStackTrace length 6
structuredStackTrace length 6
structuredStackTrace length 4
structuredStackTrace length 8
structuredStackTrace length 10

v15.9.0+

structuredStackTrace length 71
structuredStackTrace length 64
structuredStackTrace length 57
structuredStackTrace length 57
structuredStackTrace length 64
structuredStackTrace length 71
structuredStackTrace length 71
structuredStackTrace length 90
structuredStackTrace length 85
structuredStackTrace length 85
structuredStackTrace length 85
structuredStackTrace length 57
structuredStackTrace length 46
structuredStackTrace length 50
structuredStackTrace length 22
structuredStackTrace length 50
structuredStackTrace length 14
structuredStackTrace length 18
structuredStackTrace length 12
structuredStackTrace length 8
structuredStackTrace length 8
structuredStackTrace length 19
structuredStackTrace length 11
structuredStackTrace length 19
structuredStackTrace length 11
structuredStackTrace length 13
structuredStackTrace length 5
structuredStackTrace length 5
structuredStackTrace length 9
structuredStackTrace length 5
structuredStackTrace length 6
structuredStackTrace length 6
structuredStackTrace length 4
structuredStackTrace length 18
structuredStackTrace length 24
structuredStackTrace length 8
structuredStackTrace length 10
structuredStackTrace length 10
structuredStackTrace length 24
structuredStackTrace length 18
structuredStackTrace length 15
structuredStackTrace length 10
structuredStackTrace length 24
structuredStackTrace length 18
structuredStackTrace length 21
structuredStackTrace length 10
structuredStackTrace length 24
structuredStackTrace length 18
structuredStackTrace length 21
structuredStackTrace length 10
structuredStackTrace length 18
structuredStackTrace length 10

We now write way more data than in the old versions because of top-level await support. Hence, backpressure kicks in on those streams, which means the drain event is emitted after stream.write returns false.

If a call to stream.write(chunk) returns false, the 'drain' event will be emitted when it is appropriate to resume writing data to the stream.

In that case, we can either handle the drain event or increase the highWaterMark option.

From the docs:

Data is buffered in Writable streams when the writable.write(chunk) method is called repeatedly. While the total size of the internal write buffer is below the threshold set by highWaterMark, calls to writable.write() will return true. Once the size of the internal buffer reaches or exceeds the highWaterMark, false will be returned.
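
For reference, handling backpressure instead of raising highWaterMark would look roughly like this (a sketch, not the code in this PR):

const { once } = require('events')

// Write a chunk and, if the writable reports backpressure, wait for 'drain'
// before writing anything else. `out` is any writable stream, e.g. the log file.
async function writeWithBackpressure (out, chunk) {
  if (!out.write(chunk)) {
    await once(out, 'drain')
  }
}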


Note: it would be great to create an easily reproducible example to reference in a Node.js issue for this.

@RafaelGSS RafaelGSS requested a review from mcollina December 26, 2021 16:42
@mcollina (Contributor) left a comment

lgtm

@RafaelGSS RafaelGSS changed the title fix: use out.destroy to fix file descriptor issue on v16+ fix: increase highWaterMark value to fix file descriptor issue on v16+ Dec 27, 2021
@RafaelGSS RafaelGSS merged commit d59e18d into main Dec 27, 2021
@RafaelGSS RafaelGSS deleted the fix/file-descriptor-destroy branch December 27, 2021 12:59
@simoneb commented Jan 3, 2022

@RafaelGSS why do you think this is an issue in Node core?

@RafaelGSS (Contributor, Author)

@RafaelGSS why do you think this is an issue in Node core?

We have solved it in bubbleprof by just increasing the highWaterMark value. However, if a new Node.js update is released and we start capturing more events at the same time, this bug could show up again.

As mentioned above, this bug might exist in old versions of Node.js as well; the difference is that in the old versions we were not dispatching as many events at the same time.

When the out stream calls .end(), the queue is expected to be empty, but in this case it was not for some unknown reason. The next step is to create a simple reproducible example and then solve it in core.
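
One way to observe this from the outside (a diagnostic sketch, not part of the fix) is to check writableLength right before ending the stream:

// `out` is the file write stream. If data is still buffered when we try to
// finish, writableLength is greater than zero and 'finish' only fires after
// (and if) the buffer drains.
console.log('pending bytes before end():', out.writableLength)
out.end(() => {
  console.log('stream finished, all buffered data flushed')
})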

Is that clear for you?

@simoneb commented Jan 3, 2022

Yes, the explanation is clear. Yet, a stream raising the end event before all data has been read would be such an obvious bug that I would find it really odd that it wasn't caught in Node's own tests.

@simoneb commented Jan 3, 2022

I wonder if this isn't more likely to be a bug in the implementation of our decoder transform stream (which extends AbstractDecoder, which in turn contains the whole of the implementation). For some reason it may not be respecting the invariants that Node expects. I haven't looked into it though.

@RafaelGSS (Contributor, Author) commented Jan 3, 2022

I wonder if this isn't more likely to be a bug in the implementation of our decoder transform stream (which extends AbstractDecoder, which in turn contains the whole of the implementation). For some reason, it may not be respecting the invariants that Node expects. I haven't looked into it though.

Firstly, the issue is not in the decoder, because the behavior happens in the encoder. We have removed the decoder and added a breakpoint inside it, and it still doesn't work (the breakpoint is never hit).

We have also tested without the encoder (just passing data through) and the bug is still there. One thing we have discovered is that if we reduce the amount of data written at the same time, it works. For instance, I changed it to:

  _transform (message, encoding, callback) {
-    const messageLength = this._messageType.encodingLength(message)
-
-    const framedMessage = Buffer.alloc(messageLength + FRAME_PREFIX_SIZE)
-    framedMessage.writeUInt32BE(messageLength, 0)
-    this._messageType.encode(message, framedMessage, FRAME_PREFIX_SIZE)
-
-    callback(null, framedMessage)
+    callback(null, 'example message')
  }

and it works fine, but if I change it to

-  callback(null, "example message")
+  callback(null, JSON.stringify(message))

it breaks again.
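
To make the comparison concrete, the simplified encoder used while testing was essentially this (a hypothetical reconstruction, not the real injects code):

const { Transform } = require('stream')

// Hypothetical stand-in for the encoder: it skips the protocol framing and
// just forwards a payload, so the chunk size can be controlled directly.
class PassthroughEncoder extends Transform {
  constructor (options) {
    super({ ...options, writableObjectMode: true })
  }

  _transform (message, encoding, callback) {
    // Small constant payload: the fs stream keeps up and the bug disappears.
    // callback(null, 'example message')

    // Full serialized message: chunks are much larger, write() starts
    // returning false and the bug shows up again.
    callback(null, JSON.stringify(message))
  }
}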

Note: one important thing to mention is that the fs stream is not stuck and unable to write anything after a lot of data has been written (I added a setInterval doing an fs.write and it works perfectly). The main issue is that when we write a lot of data at the same time, the write call returns false, which means the drain event will be emitted and we should deal with it. From the Node.js docs:

If a call to stream.write(chunk) returns false, the 'drain' event will be emitted when it is appropriate to resume writing data to the stream.

Also, look at this example from the Node.js docs:

// Write the data to the supplied writable stream one million times.
// Be attentive to back-pressure.
function writeOneMillionTimes(writer, data, encoding, callback) {
  let i = 1000000;
  write();
  function write() {
    let ok = true;
    do {
      i--;
      if (i === 0) {
        // Last time!
        writer.write(data, encoding, callback);
      } else {
        // See if we should continue, or wait.
        // Don't pass the callback, because we're not done yet.
        ok = writer.write(data, encoding);
      }
    } while (i > 0 && ok);
    if (i > 0) {
      // Had to stop early!
      // Write some more once it drains.
      writer.once('drain', write);
    }
  }
}

So, at first glance the current behavior looks expected. However, I don't think staying silent when it happens is acceptable; I feel that Node.js should emit a warning when a drain event is emitted and the user doesn't have a listener for it. Also, the current behavior might not be expected in Node.js (the fs stream has a length > 0 when .end() is called and it never gets freed).
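
For illustration, such a warning could be approximated in user land like this (a hypothetical sketch; Node.js core does not do this today):

// Hypothetical helper: warn when a write triggers backpressure and nobody is
// listening for 'drain', since the buffered data may never be flushed before
// .end() is called.
function checkedWrite (stream, chunk) {
  const ok = stream.write(chunk)
  if (!ok && stream.listenerCount('drain') === 0) {
    process.emitWarning(
      'write() returned false but no "drain" listener is registered',
      'BackpressureWarning'
    )
  }
  return ok
}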

@RafaelGSS (Contributor, Author)

Those statements come from my debugging phase in bubbleprof. But for sure, if I could create an easily reproducible example and share it with Node.js, we could get more insights into this.
