fs.readFile is slower in v10 #25741
I agree that #17054 seems likely as a source of the issue, and I agree with your reasoning re: stream usage. |
@zbjornson I think your benchmark is artificial in that you're measuring the latency of a 16mb `fs.readFile` call. For the general case, I think #17054 was a brilliant contribution.
In fact, your moving to `fs.fstat()` plus `fs.read()` […]. Perhaps […] |
Did a quick and dirty check: I reverted #17054, and now the example benchmark looks like: […]
Here is the diff, BTW: […]
|
I think […]
|
Does this affect […]? |
@jdalton no, […]
@jorangreef The benchmark reflects a real-world use case (asynchronously read a file as fast as possible in a scenario where this type of concurrency isn't an issue and/or it's an issue that can be mitigated by increasing the threadpool size). I'd love to know where […]
That argument is equally valid in support of not partitioning reads. Note in my OP that it's 16 LOC for (I think) a safe implementation of a non-partitioning read, while concatenating a stream is only 5 LOC (below). For that reason, it would be nicer for Node.js to provide a one-shot method and let users create a partitioning method.

```js
function readPartitioned(filename, cb) {
  const rs = fs.createReadStream(filename);
  const buffs = [];
  rs.on("data", d => buffs.push(d));
  rs.on("error", e => cb(e));
  rs.on("end", () => cb(null, Buffer.concat(buffs)));
}
```
I did sort of a poor job of explaining this in my OP, but partitioning still causes degradation, and in some senses causes worse degradation due to the increased overhead.
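For contrast, here is a minimal sketch of the one-shot (non-partitioning) work-around described in the OP, using `fs.open`/`fs.fstat`/`fs.read`; the function name and error handling are my assumptions, not the OP's exact code:

```js
const fs = require("fs");

// One-shot read: stat for the size, allocate once, then a single fs.read().
// The whole read occupies one threadpool task, so it is never interleaved
// with other requests the way the partitioned fs.readFile is.
function readOneShot(filename, cb) {
  fs.open(filename, "r", (err, fd) => {
    if (err) return cb(err);
    fs.fstat(fd, (err, stats) => {
      if (err) return fs.close(fd, () => cb(err));
      const buf = Buffer.allocUnsafe(stats.size);
      fs.read(fd, buf, 0, stats.size, 0, (err, bytesRead) => {
        fs.close(fd, closeErr => {
          if (err || closeErr) return cb(err || closeErr);
          cb(null, bytesRead < buf.length ? buf.slice(0, bytesRead) : buf);
        });
      });
    });
  });
}
```
|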
Your argument is kind of like saying TCP should not have congestion control. A single TCP connection should be given maximum bandwidth, to download a file "as fast as possible". The problem with this is that it ignores TCP as a system of connections, and allows a single connection to starve all others. I don't think that leaving congestion control out of TCP would be a good default; most users of TCP won't realize their need for it, and the Internet might collapse. Sure, congestion control has an effect on single-connection download latency, but it's what most users of TCP want (whether they know it or not). If you really want "fast as possible" downloads, you are always free to use something custom such as UDT, and then I think metrics such as LOC are not the biggest concern.
|
(TCP congestion control is adaptive to network conditions and is designed to allow large segments when there is low congestion, to maximize throughput. Always partitioning fs reads is like always running with a small segment size. You also generally can't quickly add more bandwidth to a network to mitigate congestion, whereas you can increase the threadpool size, the number of cluster workers and/or the number of servers running a web app.) If increasing the partition size works, then that's a perfect solution. But my point is that the primary use cases for […] |
I disagree. Avoiding partitioning optimizes for the use case of "only one read at a time", and in fact you could potentially replace that with a single `fs.readFileSync()` call (sketch below). The problem we are facing atm is due to the high cost of moving control back and forth between JS and C++. Increasing the default chunk size and making it customizable will help significantly.
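A minimal illustration of that first point (the file path is hypothetical):

```js
const fs = require("fs");

// If reads never overlap anyway, a synchronous one-shot read skips the
// JS/C++ round-trips entirely, at the cost of blocking the event loop.
const data = fs.readFileSync("/var/data/big.bin");
console.log(data.length);
```
|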
Offered for discussion: here's an alternative solution that uses `aio_read()`. That has nearly the same performance as a one-shot read, and unblocks the event loop almost entirely, while (I think) working around the issues that prevented libuv from using that method in the first place. If folks think that's a viable solution, I'd be happy to work on a PR for it. |
@nodejs/libuv |
aio_read() uses threads too (or rather, libaio does) so no silver bullet there. :) |
Aye, but it offloads thread scheduling to the kernel/OS. Think it's a no-go? |
The parameters might be a little different from libuv but it's still a threadpool under the hood. I don't think it's going to make a material difference. |
I agree. However the aio implementation […] |
I'll run and post some benchmarks with varying chunk sizes and concurrency later today! That'll work off of a branch that allows specifying the chunk size, so if it becomes a parameter to readFile, that commit could become a PR. |
Here are curves showing various chunk sizes for two file sizes, for several numbers of concurrent reads of different files. This used this branch, which adds a chunk-size option. Changing the default from 8 KiB to 512 KiB gets to >50% of the max in all cases I tried. From this comment it sounds like the intended default was 64 KiB? — Raising the default plus having the option to set it to […]
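If a chunk-size knob like the one in that branch lands as a `readFile` parameter, usage could look something like this; the option name `chunkSize` is my assumption, not a real Node.js API:

```js
const fs = require("fs");

// Hypothetical option: a larger chunk means fewer threadpool round-trips
// per file, at the cost of longer uninterruptible reads per turn.
fs.readFile("/var/data/big.bin", { chunkSize: 512 * 1024 }, (err, data) => {
  if (err) throw err;
  console.log(data.length);
});
```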
There are a few differences that might inspire a better solution in libuv/Node.js than always partitioning.
Also note that by increasing the duration of reads, partitioning can increase the memory requirement vs. letting earlier reads complete and their resulting buffers possibly/likely be released sooner. (Above, the blue req's buffer is reserved for an extra 200 ms.)
Not just JS/C++ transitions, but also more syscalls + context switches and possibly more issued I/O ops. Below are the strace summaries for the various methods. It's a shame to waste cycles, especially when there's no contention for execution resources.
strace summaries […]
I'd love to hear counterarguments, but it seems like partitioning has replaced one theoretical DoS vector (via blocking […] |
Why not? Because you think streams cover all use-cases?
The DoS vector addressed by partitioning is not theoretical. Please don't muddy the water. I would love to see working exploits of your non-theoretical DoS vectors (plural).
It was never the goal of partitioning to safeguard users from infinitely accepting new work. That's a strawman.
You might want to look into the multi-threadpool work being done by @davisjam, the author of the partitioning patch you are so opposed to. |
@zbjornson your analysis is 100% correct if you consider an application whose scope is to only read files. A typical Node.js application uses the thread pool for all sorts of operations, including some critical V8 functions, such as the parallel garbage collector. The previous implementation could saturate the thread pool because the time spent reading was unbounded. |
(V8 operations are currently on a separate threadpool.) |
@addaleax is that documented/written down somewhere? I'm asking because I thought, and somebody at the last collaborator summit said, that it was scheduled on the same pool. The concept still holds true: the threadpool is shared between a lot of different operations, and big file reads could starve it. |
Streams are specifically designed to partition work so that I/O can be interleaved between concurrent requests. I don't understand why […] |
@sam-github I’m not happy about blocking a thread for a time that is a function of the input size. I would be happy to interleave at 1MB or more, even. |
@mcollina I don't feel really strongly about this, but I do think some APIs are not appropriate for some uses. The […] |
I agree. I think we always said “use readFile because it’s safe and it does not block the event loop”, and without the interleaving that sentence is not correct. I would rather keep our messaging on this as it is, and add some interleaving. |
@jorangreef I'm sorry if anything I said came off as rude or an attack. My only goal here is to point out the current issues and try to propose better solutions.
Not all, but most of the ones that come to mind based on the discussion in #17054. @davisjam even said […]
"Theoretical" in the sense that @bnoordhuis described it as theoretical in #17054 (comment) and #17054 (comment) -- no users have opened issues for it and I've not seen it described elsewhere as a vulnerability. @davisjam's explanation of it as a DoS vector required using A concrete example @davisjam provides is
I didn't say any of the three new issues are DoS vectors; I said they are issues that may provide other DoS vectors. I hesitate to call anything that the user has control over a "DoS vector," otherwise you can say the Array constructor is a DoS vector also. Nonetheless, in my previous post I think I explained fairly thoroughly how these new issues can lead to resource exhaustion, which could be considered vectors.
Partitioning made this an issue. Previously Node.js would queue up work instead of accepting work infinitely.
I'm a fan of it and hope it or a variant of it lands, as I think it's a better solution than partitioning. @bnoordhuis has investigated a separate I/O threadpool several times per #11855 and I'd love to get his input on that vs. partitioning. The extent of my own testing was limited to the aio repo I posted earlier, which uses a separate threadpool just for I/O, and it seemed promising.
@mcollina I think that was a typo, but it doesn't block the event loop, it blocks one of the worker threads. The API docs right now just discourage use of the *Sync methods, so the messaging I think remains clear. At the risk of another lengthy post that might be misconstrued as argumentative, I can explain why I think this blocking is preferable to partitioning if you want, but it's rooted in the issues I described in the previous post.
(The reasoning why that's not the case right now is described in #14001 (comment). I'd actually consider the threadpool usage "limited" -- just those listed in https://nodejs.org/api/cli.html#cli_uv_threadpool_size_size -- while acknowledging that many HTTP server apps use zlib and crypto.)
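As a side note, the threadpool-size mitigation mentioned earlier uses the real `UV_THREADPOOL_SIZE` environment variable; this sketch assumes it is set before anything touches the pool (the shell form `UV_THREADPOOL_SIZE=16 node app.js` is the reliable way):

```js
// Must run before the first threadpool user (fs, dns.lookup, zlib, crypto)
// initializes the pool, or it has no effect.
process.env.UV_THREADPOOL_SIZE = "16";

const fs = require("fs");
fs.readFile(__filename, (err, data) => {
  if (err) throw err;
  console.log(`read ${data.length} bytes`);
});
```
|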
The thing is, even without partitioning, […]. If "as fast as possible" is your goal, and it looks like it is, then you should be using: […]
You sacrifice all of these when you use […]. For your use case, you should be using […] |
I see I'm a bit late to this discussion. @zbjornson, you're right that #17054 may have a non-trivial performance impact for certain workloads. My general position is with @jorangreef: if you want to read files as fast as possible, you should write a C++ add-on or use the Node.js syscall wrappers. Indeed you seem to have discovered this :-). A higher-level API like […]. I agree with @mcollina that […]
If you feel this is a critical problem, then PRs welcome! |
This increases the maximum buffer size per read to 256kb when using `fs.readFile`. This is important to improve the read performance for bigger files. Refs: nodejs#25741
This increases the maximum buffer size per read to 512kb when using `fs.readFile`. This is important to improve the read performance for bigger files. PR-URL: #27063 Refs: #25741 Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Jamie Davis <[email protected]>
I know this was not fully resolved but does this need to remain open at this point? |
Issue #25741 discusses a number of performance considerations for fs.readFile, which was changed in Node.js 10.x to split discrete chunk reads over multiple event loop turns. While the fs.readFile() operation is certainly slower than it was pre 10.x, it's unlikely to be faster. Document the performance consideration and link back to the issue for more in depth analysis. Signed-off-by: James M Snell <[email protected]> Fixes: #25741 PR-URL: #36880 Reviewed-By: Antoine du Hamel <[email protected]>
I'm seeing a 7.6-13.5x drop in read throughput between 8.x and 10.x in both the readfile benchmark and our real-world benchmarks that heavily exercise `fs.readFile`. Based on my troubleshooting, I think it's from #17054.

The readfile benchmark (Ubuntu 16): […]

From what I can extract from the comments in #17054, either no degradation or a 3.6-4.8x degradation was expected for the len=16M cases.

As for why I think it's because of #17054, the benchmark below compares `fs.readFile` against an approximation of how `fs.readFile` used to work (one-shot read), measuring time to read the same 16 MB file 50 times.

`fs.readFile()` (ms): […]

We've switched to `fs.fstat()` and then `fs.read()` as a work-around (above; note how many lines of code it is), but I wouldn't be surprised if this has negatively impacted other apps/tools.

As for the original justification: I'm not convinced that was an appropriate fix, or a problem that needed fixing. Web servers (where DoS attacks are relevant) should generally be using `fs.createReadStream.pipe(response)` for serving files (sketch below; ignoring other use cases for now). Other sorts of apps like build tools, compilers and test frameworks (where DoS attacks are irrelevant) are the ones that more often need to read an entire file as fast as possible. In some senses, the fix made the DoS situation worse by increasing overhead (reducing the number of useful CPU cycles) and by infinitely partitioning reads (each new request delays all existing requests -- the requests are waiting for their own ticks plus some number of ticks from all of the other requests). We built our web app to load-shed while maintaining quality of service for as many clients as possible, so interleaving is not what we want. -- Happy to discuss that more on- or off-line.

Is anyone else able to verify that this degradation exists and/or was expected?
cc: @davisjam
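A minimal sketch of the streaming pattern the OP recommends for web servers; the path and port are illustrative:

```js
const fs = require("fs");
const http = require("http");

// Streams read in chunks, so one large file can't monopolize a threadpool
// thread; concurrent requests' reads are naturally interleaved.
http.createServer((req, res) => {
  const stream = fs.createReadStream("/var/data/big.bin");
  stream.on("error", () => {
    res.statusCode = 500;
    res.end();
  });
  stream.pipe(res);
}).listen(8080);
```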