Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stream: mutable highwatermark #33346

Conversation

0807Jpatel
Copy link

added methods to update highWaterMark and objectMode values of Readable and Writable streams.
Refs: #32731
Can also be used for following: #30107

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the stream Issues and PRs related to the stream subsystem. label May 10, 2020
@mscdex
Copy link
Contributor

mscdex commented May 10, 2020

As far as the original issue goes, perhaps it would be enough to instead pass in highWaterMark values in an options object passed to http.createServer()?

@0807Jpatel
Copy link
Author

yep, I was intending to do that. But just seemed like a hack than a solution. And it was suggested that I just make them mutable.

@mscdex
Copy link
Contributor

mscdex commented May 11, 2020

I see it as the opposite. Being able to dynamically change the watermark after stream creation doesn't make much sense and I've never seen a real use case for it. Specifying the highWaterMark up front like we can with http/https/tls's request() makes more sense and would be more consistent if we added them to the server counterparts.

@0807Jpatel
Copy link
Author

I am not sure why the checks failed. I don't see any error message.

Copy link
Contributor

@mscdex mscdex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making it official for the reasons previously stated. additionally because i think we should strive to not "pollute" streams objects as much as possible, even if the chance of a property clashing is unlikely.

@addaleax addaleax added the semver-minor PRs that contain new features and should be released in the next minor version. label May 11, 2020
Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soo … I agree that it seems odd to be able to change the objectMode flag on a stream after creation – conceptually, whether a stream is in object mode or not is part of its type and should always be known by the time its constructor is called.

However, for the highWaterMark, I see no reason not to make it adjustable, and just because @mscdex has “never seen a real use case for it” doesn’t mean that it isn’t a sensible thing to add. The HWM is mostly relevant for the producer/consumer of a stream, which is often different from the creator of the stream and thus not necessarily in control over constructor arguments.

I don’t think property clashing is a realistic argument here.

@nodejs/streams @nodejs/tsc

doc/api/stream.md Outdated Show resolved Hide resolved
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -1 in changing objectMode. I'm ok in adding a way to change the highWaterMark.

However, instead of adding a new method, we should just add a set accessor to

node/lib/_stream_readable.js

Lines 1087 to 1092 in 8a6fab0

readableHighWaterMark: {
enumerable: false,
get: function() {
return this._readableState.highWaterMark;
}
},
.

lib/_stream_readable.js Outdated Show resolved Hide resolved
@0807Jpatel
Copy link
Author

@mcollina why are you against objectMode? because I can see use case of wanting to use objectMode over the network.

@mcollina
Copy link
Member

Can you please articulate why you'd want to change objectMode? Changing it means that downstream or upstream should be aware of the change. Morever, it means that it will possible to change this for streams that are binary-only by default.

It also changes how encoding works etc.

@0807Jpatel
Copy link
Author

0807Jpatel commented May 11, 2020

wouldn't the client receive chunks of object when requesting (http.request) if objectMode is not set to true?
This could happen when Socket decides to write only half of the "object" in a transfer and other half in next transfer.

@0807Jpatel
Copy link
Author

0807Jpatel commented May 11, 2020

var http = require('http');
const { Readable, Writable } = require('stream')

let i = 0;

const myReadable = new Readable({
  objectMode: true,
  read(size) {
  if(i < 3){
    this.push(Buffer.alloc(65536));
  } else {
    this.push(null);
  }
  i++;
  }
});

const myWritable = new Writable({
  objectMode: true,
  write(chunk, encoding, callback) {
    console.log("write length: ", chunk.length);
    callback();
  }
});


var server = http.createServer(function (req, res) {
  myReadable.pipe(res);
});

server.listen(8000, '127.0.0.1');

http.get('http://localhost:8000',  res => {
  res.pipe(myWritable);
});

Output:

write length:  65426
write length:  110
write length:  32761
write length:  32775
write length:  32752
write length:  32784

The idea behind making objectMode mutable was to "fix" this issue. Where each objects are broken down into chunks even tho both myReadable and myWritable set objectMode to true.
@ronag @mcollina

@ronag
Copy link
Member

ronag commented May 11, 2020

@0807Jpatel Sorry, I'm confused here. What is it we're trying to solve? The changing of objectMode doesn't really make sense to me.

@jasnell
Copy link
Member

jasnell commented May 11, 2020

In your example @0807Jpatel, the code is not really broken. In objectMode, the Writable will accept any JavaScript value and it's up to the implementation to decide if it's the right kind of value it expects. In this case, you're passing in Buffer instances that are objects, so the downstream Writable accepts them with no problem. These aren't partial writes or broken down into anything, it's dealing with the Buffer objects given to it on each write.

I agree with the others, making objectMode mutable does not make sense. Making highWaterMark mutable is fine.

@0807Jpatel
Copy link
Author

@jasnell @ronag, Sorry. My fault it seems like I understood the objectMode wrong.
Do we not support objectMode for outgoing messages? When I try to push anything other than Buffer or String, from myReadable it throws error The "chunk" argument must be of type string or an instance of Buffer or Uint8Array Its coming from _http_outgoing.js

@jasnell
Copy link
Member

jasnell commented May 11, 2020

Correct, most streams operate in byte-mode (objectMode = false) which means they treat strings and Buffer objects differently (as byte streams rather than objects). This is the default mode for all Writable and Readable objects. It is not possible to pipe from a Readable with objectMode = true to a Writable with objectMode = false.

@jasnell
Copy link
Member

jasnell commented May 11, 2020

@mcollina and @ronag : It might actually make sense for both pipe and pipeline to actually verify that a Readable that is being piped to a Writable are actually operating in the same mode rather than waiting for the write attempt.

@0807Jpatel
Copy link
Author

@jasnell could you please explain why we don't support objectMode for outgoing message, because to me it makes sense that user would want to send objects through SSE.

@mcollina
Copy link
Member

It might actually make sense for both pipe and pipeline to actually verify that a Readable that is being piped to a Writable are actually operating in the same mode rather than waiting for the write attempt.

Not true. A trick I use all the time it to emit strings or buffer in object mode, and then pipe that into a Writable in binary.

@addaleax
Copy link
Member

Yeah, we do have tests that check that piping between object-mode/non-object-mode streams works so long as there’s only Buffers/strings involved :)

@benjamingr
Copy link
Member

benjamingr commented May 11, 2020

@0807Jpatel I think you might be mixing up a few things here:

could you please explain why we don't support objectMode for outgoing message, because to me it makes sense that user would want to send objects through SSE.

  • You have an outgoing HTTP message.
  • At the end of the day bytes get written on a socket via a system call.
  • Streams let you decide how often to write bytes to the buffer and watermarks let you avoid frequent pulls and avoid unneeded I/O. (So you buffer until you reach the high watermark and stop).
  • Streams are by default sequences of bytes. That's what platforms typically understand.

In order to send an object over HTTP - you must first specify how Node.js should translate it into bytes. This is done by serializing it into a data exchange format like protocol buffers or JSON.

Now your use case isn't unreasonable. Some frameworks (asp.net WebAPI comes to mind) perform "content negotiation" - based on the incoming message and serialize the stream in an output format they think the client understands (still in bytes).

Streams solve a very real case for Node and most Node programs build on them (file streams, network streams etc). They serve a fundamental and crucial role in the platform.


Now - what about object mode? If I understand correctly it was a happy accident (@Raynos asked for them?). Object mode is certainly useful but it doesn't serve the same "building block" IMO and doesn't belong in the same layer.

An object mode "stream" for HTTP would need to perform said "content negotiation" with the client and switching in the middle of a request would be confusing. Mostly since the stream would need to know how to serialize its entities in order to do so correctly (and not all objects are serializable and certainly not in every representation).


As for this PR, I agree with @addaleax's comment #33346 (review) and I echo her use case of streams whose creation users don't control. There are plenty of examples, here is one related to a use case I was discussing with Matteo:

  • I have a child process I created with fork that takes an array of stdio in order to offload work to another process (in the real case logging).
  • I pass 'pipe' to stdio to pipe the data from the parent to the child.
  • I get a stream back on subprocess.stdin whose watermarks I can't control.

I could pass a stream in instead of a string (that's how I worked around it), but I can definitely see the use case here and the solution I ended up owith was very unergonomic.

@@ -568,6 +568,24 @@ added: v12.3.0

Getter for the property `objectMode` of a given `Writable` stream.

##### `writable.updateWritableHighwaterMark(highwaterMark)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can actually see a case for this part and updating the watermarks. I even needed to use the lowWatermark at some point for a specific use case at my old job and was sad to find out we don't have one anymore :]

I think this sort of change would need to come with a concrete use case of why I'd update the watermarks.

doc/api/stream.md Outdated Show resolved Hide resolved
@Raynos
Copy link
Contributor

Raynos commented May 12, 2020

Now - what about object mode? If I understand correctly it was a happy accident (@Raynos asked for them?). Object mode is certainly useful but it doesn't serve the same "building block" IMO and doesn't belong in the same layer.

The original use case for objectMode is a userland package like https://www.npmjs.com/package/JSONStream

This is a Writable stream of bytes ( or utf8 strings ) and a Readable stream of javascript objects

I don't believe node core has a use case for objectMode: true anywhere but there are many packages in npm that use require('stream').Readable and require('stream').Writable interfaces without writing buffers or strings or anything else that can meaningfully be "buffered" or "concatenated"

I believe objectMode: true is a hack and a bad idea in hindsight but we must carry it forward for backwards compatibility.

I can't imagine any use case for changing it after the creation of a stream.

If you want to know another fun story about why objectMode exists look at the sin that is https://github.com/Raynos/events-stream ; At some point in the past I thought that stream.Readable was a great abstraction for piping DOM events around in pure browser js code.

@Raynos
Copy link
Contributor

Raynos commented May 12, 2020

For changing highWaterMark after creation I am 👍 if it's not too challenging.

@benjamingr
Copy link
Member

@ronag I think the major issue is that none of are are completely sure about this :D

I have an example use case here at the bottom #33346 (comment)

To be fair at my last job (building stuff for p2p CDNs) this sort of "games" with watermarks were extremely common and I ran into a lot of issues with watermarks I worked around - so my point of view is probably very biased towards one use case where really accurate control of the amount of buffering is important. For example - I ran into the lack of low watermarks since Node.js 0.9 and maintained only to be the second person to ask for it if someone ever did - but no one did while I was working on the p2p stuff so I never asked for it :]

@ronag
Copy link
Member

ronag commented May 15, 2020

I have an example use case here at the bottom #33346 (comment)

You can already raise the Readable watermark by doing a manual read(newHigherWatermark).

Which makes me wonder why we need https://github.com/nodejs/node/pull/33346/files#diff-ba6a0df0f5212f5cba5ca5179e209a17R1096 (which is my primary concern).

Likewise, I'm uncertain why exactly we need the corresponding condition in Writable.

this sort of "games" with watermarks were extremely common and I ran into a lot of issues with watermarks I worked around

Note that sometimes these kind of things are a chain of streams and you are just receiving the head ot the chain. Thinking that changing the watermark will do what you expected is going to be unfortunate and confusing.

e.g.

function myApi(...) {
   return pipeline(readFromFile(...), someTransform())
}

const src = myApi()
src.highWaterMark = 1e9 // this will only change the last in the pipeline chain... if it doesn't kill it with the current PR :/

@benjamingr
Copy link
Member

You can already raise the Readable watermark by doing a manual read(newHigherWatermark).

Technically that would work but that's not the same thing as just setting the high watermark. In my use case I don't actually want to read anything yet and I don't want to pull the data - I just want to set up the amount of buffering.

Asking users to .read(newHwm) data just to set the highWatermark even though in practice you might not even read that much data (from a consumer PoV) didn't really address my use case (neither in the example above nor in the "real" cdn streaming use case),

One can argue these use cases are so unique they don't justify complicating our code or API - I will concede to that if you feel strongly it doesn't justify the added complexity.

Thinking that changing the watermark will do what you expected is going to be unfortunate and confusing.

Yes, that's actually the point David raised (of double buffering and watermarks) in the deno streams discussion I linked to above.

Copy link
Member

@benjamingr benjamingr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissing my review in the meantime so this doesn't land by accident until I am convinced again this is a good idea and discussions have exhausted.

To be clear - the actual code and changes LGTM and I am still +1 on this changes, but there are ongoing discussions and I want to make sure this doesn't get merged (with the 2 LGTMs) while they are happening.

@ronag
Copy link
Member

ronag commented May 15, 2020

I think configuration of watermarks should be done during creation to make it consistent and work without edge cases. So with @benjamingr fork example #33346 (review). I think the watermark is an option you pass to fork (does not exists at the moment) which would ensure that the users expectation is fulfilled regardless of double buffering etc...

@benjamingr
Copy link
Member

I think configuration of watermarks should be done during creation to make it consistent and work without edge cases

The problem I see is even if configuration of watermarks happens during creation that doesn't ensure no double buffering occurs or make any guarantees mutable watermarks don't have.


Maybe @0807Jpatel can share what use case they had that these changes address? I want more voices and use cases to be heard other than mine here :]

@ronag
Copy link
Member

ronag commented May 15, 2020

the actual code and changes LGTM and I am still +1 on this changes

I'm -0 with the errorOrDestroy conditions. Otherwise I'm unconvinced but neutral.

@0807Jpatel
Copy link
Author

I agree on @ronag that previous solution made more sense, but it would definitely lead to some bad practice. AFAICT there should not be any reason to change highWaterMark while reading/writing.

Our use case is actually very simple. we need to reduce the highwatermark value of response.

var server = http.createServer(function (request, response) {
  response.writableHighWaterMark = 5 * 1024;
  myReadable.pipe(response);
});

@ronag
Copy link
Member

ronag commented May 15, 2020

AFAICT there should not be any reason to change highWaterMark while reading/writing.

Neither do I. But write can call things indirectly which can cause the user to unexpectedly be in the write scope. It's kind of an unlikely edge case... but it's another rule the user has to keep track of and I'd prefer fewer of those.

@0807Jpatel
Copy link
Author

The whole idea behind this is to change highWaterMark before you start doing read/write that also includes pipe.
https://github.com/nodejs/node/pull/33346/files#diff-b88f51f89100f68ef3f96cfa3a07929eR22
This would be the ideal use case, but I def understand what you are saying.
Do you know any arguments against previous implementation? (where we defer update till next read/write) besides the complexity.

@ronag
Copy link
Member

ronag commented May 15, 2020

Do you know any arguments against previous implementation? (where we defer update till next read/write) besides the complexity.

#33346 (comment)

Thinking that changing the watermark will do what you expected is going to be unfortunate and confusing.

@ronag
Copy link
Member

ronag commented May 15, 2020

Our use case is actually very simple. we need to reduce the highwatermark value of response.

Why do you want to reduce it? Reduce memory usage?

Can we think of some other way to solve this? e.g. a watermark option to http.createServer?

Another consideration would be to add it as an option to pipe? Would reduce API surface and potential misuse.

@0807Jpatel
Copy link
Author

@ronag we have gone full circle here. This thread started with that exact suggestion. When I asked for the feature request, it was to add highwatermark to http.createServer. But we have seen enough use case to make highwatermark mutable. #33346 (review)

IMO we should make highwatermark mutable, but lets see what others have to say about this.

@benjamingr
Copy link
Member

@0807Jpatel hey, just wanted to apologize for the (probably) frustrating process of being asked to do multiple contradicting things :]

I hope it's clear that this is genuinely because we are honestly not sure what the right thing here (as a project) is. I think there was a misunderstanding (in our user base and in core) regarding what highWaterMark does. I also want to mention that in terms of technical levels of the PRs, willingness to collaborate and actual code you are doing well.

Here are some things we may want to do to move this forward:

  • Have a "watermarks in streams" session in the summit next month where we get everyone interested in one room (you invited obviously), show the use cases and decide what solution makes the most sense (mutable highWatermark) vs. setting it in places like http.createServer.
  • Have the same session but not as a part of the OpenJS summit (just a bunch of people).
  • Escalate the issue to the technical steering committee and ask for guidance about the process.
  • Get buy in from all collaborators who have made interesting contributions to the discussion here (Brian, Anna, Matteo, Robert etc) in the issue to move forward - which looks kind of hard at this point.

I'll happily help facilitate either of these 4 options (or an alternative I haven't considered yet). You are also welcome to send me an email (all our emails are in the README). Intuitively I recommend the first one (so more people have a chance to attend).

@benjamingr
Copy link
Member

@0807Jpatel any particular preference to either of the options?

@0807Jpatel
Copy link
Author

@benjamingr I think making highwaterMark is what I would prefer. Someone has done a patch work for http.request to take in highwatermark as options. The issue with that route is we will eventually see a use case that can only be solve by making highwater mark mutable, and all these patch works will still need to be part of the code.
I agree with you guys that more opinions are def needed.

@benjamingr
Copy link
Member

benjamingr commented May 23, 2020

@nodejs/tsc can you please provide guidance about how to proceed here?

This isn't "stalled" and there are no fundamental disagreements, we need some way to:

  • Get interested parties to discuss setting the watermark (ronag, jpatel, mcollina, mscdex, myself, etc).
  • Figure out where's the right place to add this capability if we want to support it (to https, to the stream itself or elsewhere).
  • Follow up on that.

Right now we have a first-time-contributor who has opened several PRs attempting to resolve this and we have given them confusing/conflicting comments in several places. I'm a bit frustrated that this has been their experience here. (It's a tricky change that's fair)

@ronag
Copy link
Member

ronag commented May 23, 2020

I'm more than happy to further discuss this. Though I think what OP is actually trying to solve here has gotten a bit lost (see refs).

I believe there are a few different things here:

More backpressure on server incoming message (#30107).

Which lead to (#30107):

  • Setting the highwatermark on a http server response (i.e. IncomingMessage)
    • Is it per instance or globally?

Which lead to this PR:

  • Changing the highWatermark on an already constructed stream
    • Unclear if it's a good idea. Are we actually just trying to solve the above?

@benjamingr FYI the OP's referenced original issue #32731 has been resolved/closed #32731 (comment)

@0807Jpatel
Copy link
Author

@ronag sorry that close was premature, transform did not solve the issue we are facing. We are just trying to reduce highWaterMark of response in createServer. There are two ways of solving this

  1. add ability to change highWaterMark.
  2. update createServer to take in highWaterMark as part of option. We see something similar to this in Feature Request: Option to configure highWaterMark of HTTP response #30107. The reason I brought this up was to show we can do something similar to this. But if in future there is a use case to make highwaterMark mutable, all the "patch" work we do here will still need to be supported.

So I think making highwaterMark mutable is better, as we have seen couple of use cases already and no significant draw back. I think most of us in this thread agree that it is okay to add this, we are just trying to figure out whats the best way to implement this.

@benjamingr

This comment has been minimized.

@ronag
Copy link
Member

ronag commented May 23, 2020

transform did not solve the issue we are facing

What is the issue you are facing and why did transform not resolve it? From what I understood you just wanted to create more backpressure?

add ability to change highWaterMark.

I'm not blocking this but I'm skeptical of its usefulness and worried about complexity and user confusion.

Since the highwatermark is kind of a hint... writing code that actually depends on the observable behavior of the highwatermark is something I would discourage (unless in objectMode).

update createServer to take in highWaterMark as part of option

I'm not sure I followed the reason for not going down this track?

@mcollina mcollina dismissed their stale review May 24, 2020 16:09

changed my mind.

@mcollina
Copy link
Member

update createServer to take in highWaterMark as part of option.

I would actually recommend this (independently of this patch).


Thinking about this a bit more, It would be really hard to support this for http.OutgoingMessage, if that is the end goal.

@BridgeAR BridgeAR force-pushed the master branch 2 times, most recently from 8ae28ff to 2935f72 Compare May 31, 2020 12:18
@benjamingr benjamingr dismissed their stale review July 5, 2020 16:11

not up to sync on this and don't want to block by accident

@bl-ue
Copy link
Contributor

bl-ue commented Jun 5, 2021

I'm guessing that this PR, being quite old, should be closed.

@mcollina mcollina closed this Jun 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-minor PRs that contain new features and should be released in the next minor version. stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants