
fix: increase highWaterMark value to fix file descriptor issue on v16+ #389

Merged
RafaelGSS merged 5 commits into main from fix/file-descriptor-destroy on Dec 27, 2021

Conversation

@RafaelGSS (Contributor) commented Dec 20, 2021

It aims to solve: clinicjs/node-clinic#293 (comment).

However, I'm not entirely sure it's the best approach. It isn't documented anywhere in the Node.js streams documentation, but it addresses the behavior introduced in v15.9.0.

I'm doing some tests to see if it affects the result data.

Important: see #389 (comment)
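
For context, the change is along these lines (a minimal sketch, assuming the logger writes its trace data through fs.createWriteStream; the exact stream construction and option value in injects/logger.js may differ):

const fs = require('fs')

// Hypothetical sketch: raise the writable buffer threshold so that large
// bursts of async_hooks trace data don't make write() return false.
// The default highWaterMark for writable streams is 16 KiB (16384 bytes).
const out = fs.createWriteStream('/tmp/bubbleprof-trace.bin', {
  highWaterMark: 2 * 1024 * 1024 // illustrative value only
})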

@RafaelGSS RafaelGSS requested a review from mcollina December 20, 2021 18:41
@mcollina (Contributor) left a comment

Can you please add a unit test?

injects/logger.js (review thread, resolved)
@RafaelGSS (Contributor, Author)

@mcollina after further tests, the behavior described above happens when using fastify. If I create a raw HTTP server:

const http = require('http')

http.createServer(function (req, res) {
  res.end('Hello, World!\n')
}).listen(3000)

it works fine. So, it needs further investigation. Firstly, would you mind elaborating on why you think this is a bug inside bubbleprof rather than in Node core? It works when using v15.8.0 or earlier.
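
For contrast, the failing case looked roughly like this (a hypothetical minimal fastify server, assuming the fastify v3 API current at the time; it is not part of the original comment):

const fastify = require('fastify')()

// The same "Hello, World!" route, but served through fastify instead of a raw
// http server; under bubbleprof this is the setup that triggered the issue.
fastify.get('/', async () => 'Hello, World!\n')

fastify.listen(3000, (err) => {
  if (err) throw err
})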

@mcollina (Contributor)

My read of this is that something changed in node core and Bubbleprof needs to be updated.

If it is a bug in Node core, we'd need to work around it anyway.

@mcollina (Contributor) left a comment

LGTM

After a long time spent on this, I don't think we can actually fix it before our deadline. There is likely a Node core bug (we need to get back to it).

injects/logger.js (review thread, resolved)
@RafaelGSS (Contributor, Author)

I was debugging inside Node core and found that https://github.com/nodejs/node/blob/ce4d3adf5029278090bdfca8e2b8a2fc384b9cc6/lib/internal/streams/writable.js#L733 returns false when using the "bad version", because state.length > 0.

It might be a bug in the stream backpressure handling, or a warning might be missing to tell users what to do in that situation: for instance, emit a warning when the drain event fires and there is no listener for it.
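
As an illustration (not the actual bubbleprof code), the false return value is easy to reproduce with any writable whose buffered data exceeds highWaterMark:

const { Writable } = require('stream')

// A slow writable with a tiny highWaterMark to force backpressure quickly.
const slow = new Writable({
  highWaterMark: 8, // bytes
  write (chunk, encoding, callback) {
    setTimeout(callback, 100) // pretend the destination is slow
  }
})

// The chunk is larger than highWaterMark, so the buffered length exceeds the
// threshold and write() returns false; 'drain' fires once the buffer empties.
console.log(slow.write(Buffer.alloc(64))) // false
slow.on('drain', () => console.log('drained, safe to write again'))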

Why does it appear on v15.9.0+?

nodejs/node#37300 enables top-level await by default in Node.js, which considerably increases the amount of data tracked by async_hooks. See below for a comparison of the stack traces stored in the file.

v15.8.0 or prior

structuredStackTrace length 90
structuredStackTrace length 46
structuredStackTrace length 14
structuredStackTrace length 18
structuredStackTrace length 12
structuredStackTrace length 8
structuredStackTrace length 8
structuredStackTrace length 19
structuredStackTrace length 19
structuredStackTrace length 11
structuredStackTrace length 13
structuredStackTrace length 5
structuredStackTrace length 5
structuredStackTrace length 9
structuredStackTrace length 5
structuredStackTrace length 6
structuredStackTrace length 6
structuredStackTrace length 4
structuredStackTrace length 8
structuredStackTrace length 10

v15.9.0+

structuredStackTrace length 71
structuredStackTrace length 64
structuredStackTrace length 57
structuredStackTrace length 57
structuredStackTrace length 64
structuredStackTrace length 71
structuredStackTrace length 71
structuredStackTrace length 90
structuredStackTrace length 85
structuredStackTrace length 85
structuredStackTrace length 85
structuredStackTrace length 57
structuredStackTrace length 46
structuredStackTrace length 50
structuredStackTrace length 22
structuredStackTrace length 50
structuredStackTrace length 14
structuredStackTrace length 18
structuredStackTrace length 12
structuredStackTrace length 8
structuredStackTrace length 8
structuredStackTrace length 19
structuredStackTrace length 11
structuredStackTrace length 19
structuredStackTrace length 11
structuredStackTrace length 13
structuredStackTrace length 5
structuredStackTrace length 5
structuredStackTrace length 9
structuredStackTrace length 5
structuredStackTrace length 6
structuredStackTrace length 6
structuredStackTrace length 4
structuredStackTrace length 18
structuredStackTrace length 24
structuredStackTrace length 8
structuredStackTrace length 10
structuredStackTrace length 10
structuredStackTrace length 24
structuredStackTrace length 18
structuredStackTrace length 15
structuredStackTrace length 10
structuredStackTrace length 24
structuredStackTrace length 18
structuredStackTrace length 21
structuredStackTrace length 10
structuredStackTrace length 24
structuredStackTrace length 18
structuredStackTrace length 21
structuredStackTrace length 10
structuredStackTrace length 18
structuredStackTrace length 10

We now write way more data than in the old versions because of top-level await support. Hence, backpressure kicks in on those streams, which means the drain event is emitted after stream.write returns false.

If a call to stream.write(chunk) returns false, the 'drain' event will be emitted when it is appropriate to resume writing data to the stream.

In that case, we can either handle the drain event or increase the highWaterMark option.

From the docs:

Data is buffered in Writable streams when the writable.write(chunk) method is called repeatedly. While the total size of the internal write buffer is below the threshold set by highWaterMark, calls to writable.write() will return true. Once the size of the internal buffer reaches or exceeds the highWaterMark, false will be returned.
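
For reference, handling backpressure instead of raising highWaterMark would look roughly like this (a sketch, not the code in this PR):

const { once } = require('events')

// Write a chunk and, if the writable reports backpressure, wait for 'drain'
// before writing anything else. `out` is any writable stream, e.g. the log file.
async function writeWithBackpressure (out, chunk) {
  if (!out.write(chunk)) {
    await once(out, 'drain')
  }
}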


Note: it would be great to create an easily reproducible example to reference in a Node.js issue for this.

@RafaelGSS RafaelGSS requested a review from mcollina December 26, 2021 16:42
@mcollina (Contributor) left a comment

lgtm

@RafaelGSS RafaelGSS changed the title fix: use out.destroy to fix file descriptor issue on v16+ fix: increase highWaterMark value to fix file descriptor issue on v16+ Dec 27, 2021
@RafaelGSS RafaelGSS merged commit d59e18d into main Dec 27, 2021
@RafaelGSS RafaelGSS deleted the fix/file-descriptor-destroy branch December 27, 2021 12:59
@simoneb commented Jan 3, 2022

@RafaelGSS why do you think this is an issue in Node core?

@RafaelGSS (Contributor, Author)

@RafaelGSS why do you think this is an issue in Node core?

We have solved it in bubbleprof by just increasing the highWaterMark value. However, if a new Node.js update is released and we start capturing more events at the same time, this bug could show up again.

As mentioned above, this bug might exist in old versions of Node.js as well; the difference is that in the old versions we were not dispatching as many events at the same time.

When the out stream calls .end(), the queue is expected to be empty, but in this case it was not for some unknown reason. The next step is to create a simple reproducible example and then solve it in core.
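
One way to observe this from the outside (a diagnostic sketch, not part of the fix) is to check writableLength right before ending the stream:

// `out` is the file write stream. If data is still buffered when we try to
// finish, writableLength is greater than zero and 'finish' only fires after
// (and if) the buffer drains.
console.log('pending bytes before end():', out.writableLength)
out.end(() => {
  console.log('stream finished, all buffered data flushed')
})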

Is that clear for you?

@simoneb commented Jan 3, 2022

Yes, the explanation is clear. Yet, a stream raising the end event before all data has been read would be such an obvious bug that I would find it really odd that it wasn't caught in Node's own tests.

@simoneb commented Jan 3, 2022

I wonder if this isn't more likely to be a bug in the implementation of our decoder transform stream (which extends AbstractDecoder, which in turn contains the whole of the implementation). For some reason it may not be respecting the invariants that Node expects. I haven't looked into it though.

@RafaelGSS (Contributor, Author) commented Jan 3, 2022

I wonder if this isn't more likely to be a bug in the implementation of our decoder transform stream (which extends AbstractDecoder, which in turn contains the whole of the implementation). For some reason, it may not be respecting the invariants that Node expects. I haven't looked into it though.

Firstly, the issue is not in the decoder, because the behavior happens in the encoder. We have removed the decoder and added a breakpoint inside it, and it still doesn't work (the breakpoint is never hit).

We have also tested without the encoder (just passing data through) and the bug is still there. One thing we have discovered is that if we reduce the amount of data written at the same time, it works. For instance, I changed it to:

  _transform (message, encoding, callback) {
-    const messageLength = this._messageType.encodingLength(message)
-
-    const framedMessage = Buffer.alloc(messageLength + FRAME_PREFIX_SIZE)
-    framedMessage.writeUInt32BE(messageLength, 0)
-    this._messageType.encode(message, framedMessage, FRAME_PREFIX_SIZE)
-
-    callback(null, framedMessage)
+    callback(null, 'example message')
  }

and it works fine, but if I change it to

-  callback(null, "example message")
+  callback(null, JSON.stringify(message))

it breaks again.
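
To make the comparison concrete, the simplified encoder used while testing was essentially this (a hypothetical reconstruction, not the real injects code):

const { Transform } = require('stream')

// Hypothetical stand-in for the encoder: it skips the protocol framing and
// just forwards a payload, so the chunk size can be controlled directly.
class PassthroughEncoder extends Transform {
  constructor (options) {
    super({ ...options, writableObjectMode: true })
  }

  _transform (message, encoding, callback) {
    // Small constant payload: the fs stream keeps up and the bug disappears.
    // callback(null, 'example message')

    // Full serialized message: chunks are much larger, write() starts
    // returning false and the bug shows up again.
    callback(null, JSON.stringify(message))
  }
}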

Note: one important thing to mention is that the fs stream is not stuck and unable to write anything after a lot of data has been written (I added a setInterval doing an fs.write and it works perfectly). The main issue is that when we write a lot of data at the same time, the write call returns false, which means the drain event will be emitted and we should deal with it. From the Node.js docs:

If a call to stream.write(chunk) returns false, the 'drain' event will be emitted when it is appropriate to resume writing data to the stream.

Also, look at this example from the Node.js docs:

// Write the data to the supplied writable stream one million times.
// Be attentive to back-pressure.
function writeOneMillionTimes(writer, data, encoding, callback) {
  let i = 1000000;
  write();
  function write() {
    let ok = true;
    do {
      i--;
      if (i === 0) {
        // Last time!
        writer.write(data, encoding, callback);
      } else {
        // See if we should continue, or wait.
        // Don't pass the callback, because we're not done yet.
        ok = writer.write(data, encoding);
      }
    } while (i > 0 && ok);
    if (i > 0) {
      // Had to stop early!
      // Write some more once it drains.
      writer.once('drain', write);
    }
  }
}

So, at first glance the current behavior looks expected. However, I don't think staying silent when it happens is acceptable; I feel that Node.js should emit a warning when a drain event is emitted and the user doesn't have a listener for it. Also, the current behavior might not be expected in Node.js (the fs stream has a length > 0 when .end() is called and it never gets freed).
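
For illustration, such a warning could be approximated in user land like this (a hypothetical sketch; Node.js core does not do this today):

// Hypothetical helper: warn when a write triggers backpressure and nobody is
// listening for 'drain', since the buffered data may never be flushed before
// .end() is called.
function checkedWrite (stream, chunk) {
  const ok = stream.write(chunk)
  if (!ok && stream.listenerCount('drain') === 0) {
    process.emitWarning(
      'write() returned false but no "drain" listener is registered',
      'BackpressureWarning'
    )
  }
  return ok
}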

@RafaelGSS (Contributor, Author)

Those statements come from my debugging phase in bubbleprof. But for sure, if I could create an easily reproducible example and share it with Node.js, we could get more insights into this.
