-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
buffers: speed up swap16/32, add swap64 #7157
Conversation
method: ['swap16', 'swap32', 'htons', 'htonl'], | ||
len: [4, 64, 512, 768, 1024, 1536, 2056, 4096, 8192], | ||
n: [1e6] | ||
method: ['swap16', 'swap32', 'swap64'/*, 'htons', 'htonl', 'htonll'*/], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is commenting these out intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, see comment a few lines down. The commented-out ones are only for picking the crossover point between the JS and C++ implementations, and make the benchmarks super slow.
/cc @nodejs/buffer |
((x & 0x00000000FF000000ull) << 8) | \ | ||
((x & 0x0000000000FF0000ull) << 24) | \ | ||
((x & 0x000000000000FF00ull) << 40) | \ | ||
((x & 0x00000000000000FFull) << 56) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you wrap the x
es in the macros in parentheses, like (x)
? I don’t think it makes a difference right now, it’s just a good practice to do that.
``` | ||
|
||
### buf.swap64() | ||
<!-- YAML | ||
added: v6.x.x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using added: REPLACEME
will lead the person doing the release that contains this feature to fill in the correct version.
81556e8
to
a802e79
Compare
Made all changes suggested so far, thanks. (@trevnorris I also added |
}); | ||
|
||
// The htons and htonl methods below are used to benchmark the | ||
// performance difference between doing the byteswap in pure | ||
// javascript regardless of Buffer size as opposed to dropping | ||
// down to the native layer for larger Buffer sizes. | ||
// down to the native layer for larger Buffer sizes. Commented | ||
// out by default because they are slow for big buffers. If |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the TODO would indicate. The JS methods are copy/pastes of what's in lib/buffer.js and are only for evaluating where to set the crossover point between JS and native implementations. LMK if you had something else in mind...
LGTM with a couple nits. Would like to make sure @trevnorris, @addaleax and @bnoordhuis are happy with it also |
a802e79
to
155b3da
Compare
(((x) & 0xFF00) << 8) | \ | ||
(((x) >> 8) & 0xFF00) | \ | ||
(((x) >> 24) & 0xFF) | ||
#define BSWAP_INTRINSIC_8(x) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: check alignment of \
at the end.
ad6f206
to
adb07aa
Compare
(Safe impls and tests for unaligned buffers added. Pending new benchmarks for aligned v. unaligned, and benchmarking |
swap(this, i, i + 1); | ||
return this; | ||
} | ||
return swap16n.apply(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed this on the original swap pr. no need to use .apply()
. just pass this
as an argument and access it via args[0]
in c++. rest of the code does this.
Mind applying this patch to the PR? diff --git a/benchmark/buffers/buffer-swap.js b/benchmark/buffers/buffer-swap.js
index 0be334d..dfc2158 100644
--- a/benchmark/buffers/buffer-swap.js
+++ b/benchmark/buffers/buffer-swap.js
@@ -1,8 +1,10 @@
'use strict';
const common = require('../common.js');
+const v8 = require('v8');
const bench = common.createBenchmark(main, {
+ aligned: ['true'],
method: ['swap16', 'swap32', 'swap64'/*, 'htons', 'htonl', 'htonll'*/],
len: [8, 64, 128, 256, 512, 768, 1024, 1536, 2056, 4096, 8192],
n: [5e7]
@@ -54,24 +56,35 @@ Buffer.prototype.htonll = function htonl() {
return this;
};
-function createBuffer(len) {
+function createBuffer(len, aligned) {
+ len += aligned ? 0 : 1;
const buf = Buffer.allocUnsafe(len);
for (var i = 1; i <= len; i++)
buf[i - 1] = i;
- return buf;
+ return aligned ? buf : buf.slice(1);
}
-function bufferSwap(n, buf, method) {
- for (var i = 1; i <= n; i++)
- buf[method]();
+function genMethod(method) {
+ const fnString =
+ 'return function ' + method + '(n, buf) {' +
+ ' for (var i = 0; i <= n; i++)' +
+ ' buf.' + method + '();' +
+ '}';
+ return (new Function(fnString))();
}
function main(conf) {
const method = conf.method;
const len = conf.len | 0;
const n = conf.n | 0;
- const buf = createBuffer(len);
+ const aligned = conf.aligned | 'true';
+ const buf = createBuffer(len, aligned === 'true');
+ const bufferSwap = genMethod(method);
+
+ v8.setFlagsFromString('--allow_natives_syntax');
+ eval('%OptimizeFunctionOnNextCall(bufferSwap)');
+
bench.start();
- bufferSwap(n, buf, method);
+ bufferSwap(n, buf);
bench.end(n);
} First thing is the optimization of the actual function call. Here's the before and after:
Second thing is adding
|
adb07aa
to
f5e2215
Compare
Done, thanks. Somehow I'm no longer seeing the performance gain for aligned buffers... 6.2.0 and this are comparable. 😕 Need to dig to see what's going on. |
f5e2215
to
3ca2712
Compare
I get the following lint errors:
Looking at the performance part now. |
3ca2712
to
c4cf316
Compare
|
@zbjornson |
* Speed up buffer.swap16 and swap32 by using builtins. Up to ~6x gain. Drop transition point between JS and C++ implementations accordingly. Amount of performance improvement not only depends on buffer size but also memory alignment. * Fix tests: C++ impl tests were testing 0-filled buffers so were always passing. * Add similar buffer.swap64 method. * Make buffer-swap benchmark mirror JS impl. doc/api/buffer.markdown has an entry of "added: REPLACEME" that should be changed to the correct release number before tagged. Because node is currently using a very old version of cpplint.py it doesn't know that std::swap() has moved from <algorithm> to <utility> in c++11. So until cpplint.py is updated simply NOLINT the line. Technically it should be NOLINT(build/include_what_you_use), but that puts the line over 80 characters causing another lint error. PR-URL: #7157 Reviewed-By: Trevor Norris <[email protected]> Reviewed-By: James M Snell <[email protected]>
Thanks much for the hard work. Landed in a1059af with minor adjustments and lint fixes. |
This is |
Yep. Semver minor
|
Does not cleanly apply on v6. I think this depends on #7082? (or at least I think the conflict comes from there) |
The conflict comes from there, although it’s only because of neighbouring lines being changed. There should be no logical dependency here; if it helps, I can do a quick backport PR. |
@addaleax If you can post what the patch should look like on v6 I can probably do the resolution. |
Or a PR may be easier, idk. Either or. |
* Speed up buffer.swap16 and swap32 by using builtins. Up to ~6x gain. Drop transition point between JS and C++ implementations accordingly. Amount of performance improvement not only depends on buffer size but also memory alignment. * Fix tests: C++ impl tests were testing 0-filled buffers so were always passing. * Add similar buffer.swap64 method. * Make buffer-swap benchmark mirror JS impl. doc/api/buffer.markdown has an entry of "added: REPLACEME" that should be changed to the correct release number before tagged. Because node is currently using a very old version of cpplint.py it doesn't know that std::swap() has moved from <algorithm> to <utility> in c++11. So until cpplint.py is updated simply NOLINT the line. Technically it should be NOLINT(build/include_what_you_use), but that puts the line over 80 characters causing another lint error. PR-URL: nodejs#7157 Reviewed-By: Trevor Norris <[email protected]> Reviewed-By: James M Snell <[email protected]>
* Speed up buffer.swap16 and swap32 by using builtins. Up to ~6x gain. Drop transition point between JS and C++ implementations accordingly. Amount of performance improvement not only depends on buffer size but also memory alignment. * Fix tests: C++ impl tests were testing 0-filled buffers so were always passing. * Add similar buffer.swap64 method. * Make buffer-swap benchmark mirror JS impl. doc/api/buffer.markdown has an entry of "added: REPLACEME" that should be changed to the correct release number before tagged. Because node is currently using a very old version of cpplint.py it doesn't know that std::swap() has moved from <algorithm> to <utility> in c++11. So until cpplint.py is updated simply NOLINT the line. Technically it should be NOLINT(build/include_what_you_use), but that puts the line over 80 characters causing another lint error. PR-URL: #7157 Reviewed-By: Trevor Norris <[email protected]> Reviewed-By: James M Snell <[email protected]> Backport-URL: #7546
Notable changes: * buffer: Added `buffer.swap64()` to compliment `swap16()` & `swap32()`. (Zach Bjornson) #7157 * build: New `configure` options have been added for building Node.js as a shared library. (Stefan Budeanu) #6994 - The options are: `--shared`, `--without-v8-platform` & `--without-bundled-v8`. * crypto: Root certificates have been updated. (Ben Noordhuis) #7363 * debugger: The server address is now configurable via `--debug=<address>:<port>`. (Ben Noordhuis) #3316 * npm: Upgraded npm to v3.10.3 (Kat Marchán) #7515 & (Rebecca Turner) #7410 * readline: Added the `prompt` option to the readline constructor. (Evan Lucas) #7125 * repl / vm: `sigint`/`ctrl+c` will now break out of infinite loops without stopping the Node.js instance. (Anna Henningsen) #6635 * src: - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao) #3098 - Refactored `require('constants')`, constants are now available directly from their respective modules. (James M Snell) #6534 * stream: Improved `readable.read()` performance by up to 70%. (Brian White) #7077 * timers: `setImmediate()` is now up to 150% faster in some situations. (Andras) #6436 * util: Added a `breakLength` option to `util.inspect()` to control how objects are formatted across lines. (cjihrig) #7499 * v8-inspector: Experimental support has been added for debugging Node.js over the inspector protocol. (Ali Ijaz Sheikh) #6792 - *Note: This feature is experimental, and it could be altered or removed.* - You can try this feature by running Node.js with the `--inspect` flag. Refs: #7441 PR-URL: #7550
Notable changes: * buffer: Added `buffer.swap64()` to compliment `swap16()` & `swap32()`. (Zach Bjornson) #7157 * build: New `configure` options have been added for building Node.js as a shared library. (Stefan Budeanu) #6994 - The options are: `--shared`, `--without-v8-platform` & `--without-bundled-v8`. * crypto: Root certificates have been updated. (Ben Noordhuis) #7363 * debugger: The server address is now configurable via `--debug=<address>:<port>`. (Ben Noordhuis) #3316 * npm: Upgraded npm to v3.10.3 (Kat Marchán) #7515 & (Rebecca Turner) #7410 * readline: Added the `prompt` option to the readline constructor. (Evan Lucas) #7125 * repl / vm: `sigint`/`ctrl+c` will now break out of infinite loops without stopping the Node.js instance. (Anna Henningsen) #6635 * src: - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao) #3098 - Refactored `require('constants')`, constants are now available directly from their respective modules. (James M Snell) #6534 * stream: Improved `readable.read()` performance by up to 70%. (Brian White) #7077 * timers: `setImmediate()` is now up to 150% faster in some situations. (Andras) #6436 * util: Added a `breakLength` option to `util.inspect()` to control how objects are formatted across lines. (cjihrig) #7499 * v8-inspector: Experimental support has been added for debugging Node.js over the inspector protocol. (Ali Ijaz Sheikh) #6792 - *Note: This feature is experimental, and it could be altered or removed.* - You can try this feature by running Node.js with the `--inspect` flag. Refs: #7441 PR-URL: #7550
Notable changes: * buffer: Added `buffer.swap64()` to compliment `swap16()` & `swap32()`. (Zach Bjornson) #7157 * build: New `configure` options have been added for building Node.js as a shared library. (Stefan Budeanu) #6994 - The options are: `--shared`, `--without-v8-platform` & `--without-bundled-v8`. * crypto: Root certificates have been updated. (Ben Noordhuis) #7363 * debugger: The server address is now configurable via `--debug=<address>:<port>`. (Ben Noordhuis) #3316 * npm: Upgraded npm to v3.10.3 (Kat Marchán) #7515 & (Rebecca Turner) #7410 * readline: Added the `prompt` option to the readline constructor. (Evan Lucas) #7125 * repl / vm: `sigint`/`ctrl+c` will now break out of infinite loops without stopping the Node.js instance. (Anna Henningsen) #6635 * src: - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao) #3098 - Refactored `require('constants')`, constants are now available directly from their respective modules. (James M Snell) #6534 * stream: Improved `readable.read()` performance by up to 70%. (Brian White) #7077 * timers: `setImmediate()` is now up to 150% faster in some situations. (Andras) #6436 * util: Added a `breakLength` option to `util.inspect()` to control how objects are formatted across lines. (cjihrig) #7499 * v8-inspector: Experimental support has been added for debugging Node.js over the inspector protocol. (Ali Ijaz Sheikh) #6792 - *Note: This feature is experimental, and it could be altered or removed.* - You can try this feature by running Node.js with the `--inspect` flag. Refs: #7441 PR-URL: #7550
### Notable changes * **buffer**: Added `buffer.swap64()` to compliment `swap16()` & `swap32()`. (Zach Bjornson) [#7157](nodejs/node#7157) * **build**: New `configure` options have been added for building Node.js as a shared library. (Stefan Budeanu) [#6994](nodejs/node#6994) - The options are: `--shared`, `--without-v8-platform` & `--without-bundled-v8`. * **crypto**: Root certificates have been updated. (Ben Noordhuis) [#7363](nodejs/node#7363) * **debugger**: The server address is now configurable via `--debug=<address>:<port>`. (Ben Noordhuis) [#3316](nodejs/node#3316) * **npm**: Upgraded npm to v3.10.3 (Kat Marchán) [#7515](nodejs/node#7515) & (Rebecca Turner) [#7410](nodejs/node#7410) * **readline**: Added the `prompt` option to the readline constructor. (Evan Lucas) [#7125](nodejs/node#7125) * **repl / vm**: `sigint`/`ctrl+c` will now break out of infinite loops without stopping the Node.js instance. (Anna Henningsen) [#6635](nodejs/node#6635) * **src**: - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao) [#3098](nodejs/node#3098) - Refactored `require('constants')`, constants are now available directly from their respective modules. (James M Snell) [#6534](nodejs/node#6534) * **stream**: Improved `readable.read()` performance by up to 70%. (Brian White) [#7077](nodejs/node#7077) * **timers**: `setImmediate()` is now up to 150% faster in some situations. (Andras) [#6436](nodejs/node#6436) * **util**: Added a `breakLength` option to `util.inspect()` to control how objects are formatted across lines. (cjihrig) [#7499](nodejs/node#7499) * **v8-inspector**: Experimental support has been added for debugging Node.js over the inspector protocol. (Ali Ijaz Sheikh) [#6792](nodejs/node#6792) - **Note: This feature is _experimental_, and it could be altered or removed.** - You can try this feature by running Node.js with the `--inspect` flag.
Said builtins are not supported by older versions of apple-gcc, breaking the build on OS X 10.8. Fixes: nodejs#7618 Refs: nodejs#4290 Refs: nodejs#7157
Checklist
Affected core subsystem(s)
Buffer (swap16 and swap32 methods)
Description of change
Float64Array
s, which I do routinely.)Tested with gcc/g++ 4.8.3, clang 3.4 and MSVC 2015 (i.e. something like #4284 shouldn't happen here). Also tested the macro definitions for the
#else
and__linux__
cases on Ubuntu. I don't know if there's a non-gcc, non-clang linux compiler we need to support; if not, then the__linux__
case could be removed.If/when this goes in, I plan to try using the new macros to speed up the buffer.read/write* methods.
Benchmark Results
16 bit types
32 bit types
64 bit types