Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nodejs: disable JS test suite on Darwin #325844

Merged
merged 1 commit into from
Jul 10, 2024
Merged

Conversation

emilazy
Copy link
Member

@emilazy emilazy commented Jul 9, 2024

Description of changes

This was recently enabled in #313982. It seems to be much too flaky on Darwin currently, especially x86; see:

Disable these tests on macOS for now as the broken Node.js package is holding up a lot of builds.

Fixes: b26563a

@aduh95 Unfortunately, I lack the expertise to figure out how to get these tests working more reliably on macOS, but if you have any idea what’s going on I’d be happy to work with you to try and get them functioning properly once this is no longer blocking a bunch of builds. There’s a community builder box if you don’t have any Mac hardware available, but I understand if you don’t want to spend time trying to fix this on an OS you don’t use, too.

@vcunat I’ve targeted master on your prior suggestion, but let me know if you think this is too heavy; it would be unfortunate to have a broken Node.js for another entire staging cycle since a lot of things pull it in, but I don’t want to overload the builders. It could potentially be conditionalized further to x86_64-darwin for now as we lucked out with only nodejs_18 failing on aarch64-darwin. I’m also wondering if it might make sense to roll #325620, #325616, and #325621 into this, as they’re high‐severity (only on Windows actually) Node.js security updates that would cause the same number of builds on Darwin anyway (but it’d cause Linux rebuilds too, which this shouldn’t).

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

This was recently enabled in NixOS#313982. It seems to be much too flaky
on Darwin currently, especially x86; see:

* <https://hydra.nixos.org/build/264860513/nixlog/8>
* <https://hydra.nixos.org/build/264956149/nixlog/3>
* <https://hydra.nixos.org/build/265094929/nixlog/3>
* <https://hydra.nixos.org/build/264901296/nixlog/3>

Disable these tests on macOS for now as the broken Node.js package
is holding up a lot of builds.

Fixes: b26563a
@ofborg ofborg bot added the 6.topic: darwin Running or building packages on Darwin label Jul 9, 2024
@emilazy
Copy link
Member Author

emilazy commented Jul 9, 2024

I built nodejs-slim successfully on x86_64-darwin with these changes.

I’ll leave it up to @vcunat what to do here in terms of target branch, etc.

@ofborg ofborg bot requested review from aduh95 and cillianderoiste July 9, 2024 14:40
@ofborg ofborg bot added 10.rebuild-darwin: 501+ 10.rebuild-darwin: 501-1000 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jul 9, 2024
@vcunat
Copy link
Member

vcunat commented Jul 9, 2024

I'm convinced that master is fine for this. (for posterity)

@paparodeo
Copy link
Contributor

This was recently enabled in #313982. It seems to be much too flaky on Darwin currently, especially x86; see:

* https://hydra.nixos.org/build/264860513/nixlog/8

* https://hydra.nixos.org/build/264956149/nixlog/3

* https://hydra.nixos.org/build/265094929/nixlog/3

* https://hydra.nixos.org/build/264901296/nixlog/3

looking at the logs it seems to me that after out/Release/node gets built and the tests start to run there is another clang++ process that gets run and builds out/Release/node in the middle of the tests after which some tests fail to find out/Release/node which i assume happens due to the build process unlinking it before re-creating it?

seems odd.

@emilazy
Copy link
Member Author

emilazy commented Jul 9, 2024

Wow, I didn’t pick up on that at all. That’s cursed. There are other weird flaky test failures too, although if the out/Release/node thing was mitigated it’d probably be easier to fix them whack‐a‐mole style.

Seems best to unblock Darwin builds for now and try to address this more properly during a later staging cycle?

@paparodeo
Copy link
Contributor

Seems best to unblock Darwin builds for now and try to address this more properly during a later staging cycle?

the change seems fine to me. curious why it is only breaking darwin. i'd be a little suspicious of the other platforms if that new checkPhase target is also rebuilding node and if it is rebuilding it, hopefully it is identical to the first time?

anyway -- i don't have any node expertise and will not be digging further into this. your change LGTM.

@emilazy
Copy link
Member Author

emilazy commented Jul 9, 2024

Apparently it’s flaky on other platforms too, but going by Hydra it seems like it’s worse on macOS for whatever reason. Just wanted to scope the patch to the observed breakage (and avoid generating even more builds on master).

@vcunat vcunat merged commit 0ca2f1e into NixOS:master Jul 10, 2024
41 of 42 checks passed
dminca added a commit to dminca/nix-config that referenced this pull request Jul 12, 2024
* we're back on stable
  * darwin-rebuild switch --flake .
    8.23s user 3.94s system 11% cpu 1:41.45 total
  * nix run . -- switch --flake .
    16.10s user 9.41s system 1% cpu 30:58.09 total

Resolves:
Related: NixOS/nixpkgs#325844
Signed-off-by: Daniel-Andrei Minca <[email protected]>
@tie
Copy link
Member

tie commented Jul 15, 2024

@vcunat @emilazy, wasn’t this already fixed in #262124? The merge into staging was really weird.

] ++ lib.optionals (!stdenv.buildPlatform.isDarwin || lib.versionAtLeast version "20") [
# There are some test failures on macOS before v20 that are not worth the
# time to debug for a version that would be eventually removed in less
# than a year (Node.js 18 will be EOL at 2025-04-30). Note that these
# failures are specific to Nix sandbox on macOS and should not affect
# actual functionality.
] ++ lib.optionals (!stdenv.isDarwin) [
# TODO: JS test suite is too flaky on Darwin; revisit at a later date.
"test-ci-js"
]);

@vcunat
Copy link
Member

vcunat commented Jul 15, 2024

I agree that the 176a56c merge resulted into nonsensical state.

@emilazy
Copy link
Member Author

emilazy commented Jul 16, 2024

Uh, hmm, guess we need to clean that up? 20 and 22 were broken on Hydra on x86_64-darwin too, so I don’t think that the version condition seems right.

@vcunat
Copy link
Member

vcunat commented Jul 16, 2024

Yes, the default one certainly was breaking.

@tie
Copy link
Member

tie commented Jul 16, 2024

20 and 22 were broken on Hydra on x86_64-darwin too, so I don’t think that the version condition seems right.

IIRC I’ve also disabled some tests on Darwin independent of the version and added a few fixes for other failing tests. If these failures are specific to x86_64-darwin, I think it makes more sense to add a check for that instead of disabling the for all Darwin platforms.

@emilazy
Copy link
Member Author

emilazy commented Jul 16, 2024

I don’t really have confidence in running this test suite at all right now if #325844 (comment) is true. I’d rather figure out what’s causing that race condition than play whack‐a‐mole with its consequences. It wasn’t a case of individual tests failing, but the tests suddenly breaking nondeterministically at random points. Unfortunately I don’t have the Node.js build system expertise to really dig in to the cause. The comments in #313982 suggest that the tests might have been causing issues on Linux, too.

If the race condition is truly x86_64-darwin‐only then I’m fine with disabling these tests only on that platform, though.

@tie
Copy link
Member

tie commented Jul 16, 2024

I don’t really have confidence in running this test suite at all right now if #325844 (comment) is true. […] The comments in #313982 suggest that the tests might have been causing issues on Linux, too.

Hm, maybe enableParallelChecking = false could help? Looks like we run multiple test suits in parallel and for some reason one of them rebuilds node binary (and running test-ci-js just exposed this behavior).

I think the underlying tools/test.py script that runs some of these tests can also be used with an external node binary (via --shell flag), so perhaps we can move integration tests to passthru in a separate derivation.

@emilazy
Copy link
Member Author

emilazy commented Jul 16, 2024

A separate derivation would be nice, considering how slow the build is already. If you want to PR a less blunt fix I’m happy to review and test it, but I’m not sure I have the cycles to actively make the changes myself right now. Obviously avoiding the rebuild at all would be preferable to making the checks slower, but I couldn’t really figure out what was going on.

The 22 test suite will also separate need fixes for Python 3.12, by the way. There was a warning about calling fork in a multi‐threaded program triggering on a test helper script that didn’t seem to spawn threads at all, probably because macOS system libraries like to spawn threads in the background and 3.12 added that warning.

@tie
Copy link
Member

tie commented Jul 16, 2024

considering how slow the build is already

Uh, yes, that’s a bug (see #321847) 😅

With #220204 (comment), it takes ~10m to build instead of >30m. I’ll open a separate PR soon that switches Node.js build to ninja based on the patch attached to the comment.

@emilazy
Copy link
Member Author

emilazy commented Jul 16, 2024

That would probably have helped my patience for running the whole build again to see how the tests are breaking last longer 🙃

Anyway, I’m sorry for stepping on your toes here; I just wanted to make sure the x86_64-darwin channel wasn’t missing a huge number of packages in the short term. I do approve of running test suites where we can, I just exceeded the time I wanted to allocate to these specific failures since they weren’t being run at all until very recently anyway. Thanks for your work on this; I’d definitely be happy to be cc’d on any improvements you make for review and testing locally.

@tie tie mentioned this pull request Jul 21, 2024
13 tasks
@emilazy emilazy deleted the push-wllkmpsxkxnp branch August 26, 2024 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: darwin Running or building packages on Darwin 6.topic: nodejs 10.rebuild-darwin: 501-1000 10.rebuild-darwin: 501+ 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants