Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deps: add simdjson to node.js #47991

Closed
wants to merge 2 commits into from
Closed

deps: add simdjson to node.js #47991

wants to merge 2 commits into from

Conversation

anonrig
Copy link
Member

@anonrig anonrig commented May 13, 2023

This would remove all our json parsing code inside src folder and help us provide faster solutions. I'm happy to add an example implementation to this pull-request if requested.

cc @lemire

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/actions
  • @nodejs/gyp
  • @nodejs/tsc

@nodejs-github-bot nodejs-github-bot added build Issues and PRs related to build files or the CI. dependencies Pull requests that update a dependency file. meta Issues and PRs related to the general management of the project. needs-ci PRs that need a full CI run. tools Issues and PRs related to the tools directory. labels May 13, 2023
marco-ippolito

This comment was marked as resolved.

@marco-ippolito
Copy link
Member

LGTM on the tooling side

Copy link
Member

@RafaelGSS RafaelGSS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include a specific path to use simdjson and provide benchmarks to hold your statement?

json parsing code inside src folder and help us provide faster solutions.

@mscdex
Copy link
Contributor

mscdex commented May 13, 2023

I'm confused by this as well. What is wrong with using V8's built-in JSON parse function?

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label May 14, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label May 14, 2023
@nodejs-github-bot
Copy link
Collaborator

@mcollina
Copy link
Member

I have a feeling that this will not run on all our supported platforms (kicked off CI to verify).

Ultimately, simdjson is not significantly faster than JSON.parse() if we need the full objects in JS-land (creating the objects is the bottleneck, not parsing them).

@nodejs-github-bot
Copy link
Collaborator

@anonrig
Copy link
Member Author

anonrig commented May 14, 2023

I have a feeling that this will not run on all our supported platforms (kicked off CI to verify).

@mcollina It seems only AIX is failing. I opened an upstream issue and working on a fix for simdjson. (cc @lemire)

Ultimately, simdjson is not significantly faster than JSON.parse() if we need the full objects in JS-land (creating the objects is the bottleneck, not parsing them).

I don't have sufficient data to back my claim, but this might not be a valid argument, I'm not sure. There are other places on top of JSON.parse that can be beneficial for simdjson such as InternalModuleReadJSON which is used to speed up module loading. It's worth discussing if those places are sufficient enough to add a new dependency to Node.js. I'm interested in hearing your opinion.

I'm confused by this as well. What is wrong with using V8's built-in JSON parse function?

@mscdex Nothing is wrong with the built-in JSON parse. I'm experimenting and asking for community feedback before investing a lot of time in this.


Long story short:

A couple of months ago, in one of @joyeecheung's pull requests, I don't remember which one at the moment but I've communicated the idea of using simdjson to speed up certain functions. I'm interested in researching & experimenting with where we can leverage simdjson and improve performance while reducing our json specific implementations.

@anonrig
Copy link
Member Author

anonrig commented May 14, 2023

Updated simdutf to v3.1.8 fixing AIX build error.

@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label May 14, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label May 14, 2023
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@mcollina
Copy link
Member

I don't have sufficient data to back my claim, but this might not be a valid argument, I'm not sure. There are other places on top of JSON.parse that can be beneficial for simdjson such as InternalModuleReadJSON which is used to speed up module loading. It's worth discussing if those places are sufficient enough to add a new dependency to Node.js. I'm interested in hearing your opinion.

Ultimately it depends on what's the speedup in that operation. I'd restrain from adding a dependency without a merasurable improvement.

@GeoffreyBooth
Copy link
Member

There are other places on top of JSON.parse that can be beneficial for simdjson such as InternalModuleReadJSON which is used to speed up module loading.

We read a lot of package.json files during module loading, especially for typical real-world apps with a hundred or more dependencies. If the library speeds that up, it should produce a measurable if not noticeable performance boost.

@anonrig
Copy link
Member Author

anonrig commented May 23, 2023

@nodejs/loaders In a total of 5 tests are failing at the moment. Appreciate any help.

@anonrig anonrig force-pushed the add-simdjson branch 2 times, most recently from 1138d7a to c62068c Compare May 23, 2023 02:33
@anonrig anonrig added the loaders Issues and PRs related to ES module loaders label May 23, 2023
@anonrig anonrig force-pushed the add-simdjson branch 2 times, most recently from 1be0fe4 to 05ed646 Compare May 23, 2023 20:44
Copy link
Member

@JakobJingleheimer JakobJingleheimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay (this got buried in my inbox). It looks like (the good news is) all the failures here are related.

The error occurs after several checks against various packageJSON fields. I'm only familiar with the ESM side of modules, but I would bet your removal of packageJsonCache.set(jsonPath, filtered); is the cause (the reads against packageJSON fields within Module._resolveFilename (wherefrom the error is thrown) are probably trying to read from packageJsonCache, which no-longer gets set.

lib/internal/modules/esm/package_config.js Outdated Show resolved Hide resolved
* @param {string} jsonPath
*/
function read(jsonPath) {
if (cache.has(jsonPath)) {
return cache.get(jsonPath);
}

const { 0: string, 1: containsKeys } = internalModuleReadJSON(
const {
0: includes_keys,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is javascript, where camelCase is the standard

Suggested change
0: includes_keys,
0: includesKeys,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, maybe we should have an eslint rule about this in a different PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised there isn't one @aduh95 I do you know if this has been discussed previously?

lib/internal/modules/package_json_reader.js Outdated Show resolved Hide resolved
lib/internal/modules/package_json_reader.js Outdated Show resolved Hide resolved
@anonrig anonrig force-pushed the add-simdjson branch 4 times, most recently from be620d3 to d1899e9 Compare June 15, 2023 00:17
@anonrig
Copy link
Member Author

anonrig commented Jun 15, 2023

I've reduced the failed tests to the following:

  • test/es-module/test-esm-exports.mjs
  • test/es-module/test-esm-exports-deprecations.mjs
  • test/pummel/test-policy-integrity-parent-commonjs.js

@RafaelGSS Any idea why test-policy-integrity-parent-commonjs.js is failing?

Copy link
Member

@JakobJingleheimer JakobJingleheimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't know anything about the remaining tests—they pre-date me and basically haven't been touched in years.

test/sequential/test-module-loading.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/package_config.js Show resolved Hide resolved
Comment on lines -356 to -385
const packageJsonCache = new SafeMap();

function readPackage(requestPath) {
const jsonPath = path.resolve(requestPath, 'package.json');

const existing = packageJsonCache.get(jsonPath);
if (existing !== undefined) return existing;

const result = packageJsonReader.read(jsonPath);
const json = result.containsKeys === false ? '{}' : result.string;
if (json === undefined) {
packageJsonCache.set(jsonPath, false);
return false;
}

try {
const filtered = filterOwnProperties(JSONParse(json), [
'name',
'main',
'exports',
'imports',
'type',
]);
packageJsonCache.set(jsonPath, filtered);
return filtered;
} catch (e) {
e.path = jsonPath;
e.message = 'Error parsing ' + jsonPath + ': ' + e.message;
throw e;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oow, nice. It looks like this code was actually redundant to package_config.js / getPackageConfig()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! We were having 2 different caches and duplicated code for both in ESM and CJS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm actually was that a bug or a feature? 🤔

@guybedford do you know is CJS and ESM are supposed to have different package.json caches?

Comment on lines 17 to 27
* Returns undefined for all failure cases.
* @param {string} jsonPath
*/
function read(jsonPath) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to include a @returns with the type and explanation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the JSDOC to the following. Let me know, if it is not sufficient:

/**
 * Returns undefined for all failure cases.
 * @param {string} jsonPath
 * @returns {{
 *   name?: string,
 *   main?: string,
 *   exports?: string | Record<string, unknown>,
 *   imports?: string | Record<string, unknown>,
 *   type: 'commonjs' | 'module' | 'none' | unknown,
 * } | undefined}
 */

Copy link
Member

@JakobJingleheimer JakobJingleheimer Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would move the Returns undefined for all failure cases. after the type for @returns (and merely say `undefined` on failure)

Also, I think the type for exports' & imports' Record value is known (I think it's a resolved URL['href']?), but maybe unknown is fine for now.

Copy link
Contributor

@cjihrig cjihrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this has been open for longer than seven days and has a sign off, I'm going to request changes to prevent this from accidentally landing until benchmarks are provided.

@anonrig
Copy link
Member Author

anonrig commented Jun 15, 2023

Since this has been open for longer than seven days and has a sign off, I'm going to request changes to prevent this from accidentally landing until benchmarks are provided.

Since there is a push after that review, it will never get merged without a new approved review.

@cjihrig
Copy link
Contributor

cjihrig commented Jun 15, 2023

Since there is a push after that review, it will never get merged without a new approved review.

To be honest, I don't think there should be an approval on this PR at all yet, so it's entirely possible that another premature approval could happen.

@anonrig anonrig force-pushed the add-simdjson branch 2 times, most recently from a80c028 to 75f9437 Compare June 15, 2023 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Issues and PRs related to build files or the CI. dependencies Pull requests that update a dependency file. loaders Issues and PRs related to ES module loaders meta Issues and PRs related to the general management of the project. needs-ci PRs that need a full CI run. tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants