-
-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] build-support/gradle: init #235058
[RFC] build-support/gradle: init #235058
Conversation
This is at an RFC stage, this isn't ready for merge yet.
Pinging maintainers of affected packages: @abbradar @averelld @charles-dyfis-net @chkno @delroth @expipiplus1 @fgaz @gavin @gebner @jiegec @liff @linsui @lorenzleutgeb @Ma27 @mguentner @nagy @raskin @rnhmjoj @roblabla @TheKostins @thoughtpolice @vaci @wentasah @wirew0rm |
As a package maintainer (jadx) I fail to understand how this is better than the existing state of things -- it seems like it's more work on package update, more cruft added to nixpkgs in the form of many checked-in autogenerated files, and this PR's description doesn't really seem to motivate all these changes in a way that I can grasp. Maybe I'm just missing what the benefits are here, but the drawbacks are however very obvious in comparison. In general the tendency across nixpkgs seems to be (rightfully, imo) moving away from checked-in autogenerated files in the repo. See e.g. buildNpmPackage replacing the mess of giant node2nix lockfiles. In this light I'm not sure whether it's the right direction? Additionally, this PR is IMHO unreviewable as is. At the very least changes to each of the modified packages should be individual commits. It took minutes for my web browser to churn through the GitHub "changes" interface to even be able to see what changes were applied to the derivation I maintain. Please don't take these comments as a showstopper on this PR -- my mostly uninformed opinion on the topic isn't worth much :-) . But I suspect that if your documentation and the changes can't convince me, I won't be the only person in that situation. |
You're right in that not all packages need lockfiles, to my knowledge only some Gradle packages actually require them (basically, they are important for reproducibility, but there is a chance the build will be reproducible without a lockfile). I plan to add an optional flag to autogenerate the lockfiles the same way it currently works - but there's no way to prove it will stay reproducible without inspecting the package source. The Gradle derivations in nixpkgs work fine largely by accident, in general they may break if they contain a single dynamic dependency which gets updated. Maintainer overhead won't be that big as I will include an update script. The file pollution is unfortunate, but I don't see another option for reproducibility. Perhaps a Gradle plugin could be written to only list dynamic dependencies in the lockfile. You are mistaken about buildNodePackage. As you can see, it requires lockfiles as well. The difference is JS tooling uses lockfiles much more often than Java tooling. If the upstream Gradle package commits the lockfiles, you don't have to include them in nixpkgs. Also, Github has a language filter, which is convenient to use for this commit. But it's true it would've been easier to look at individual per-package commits, I'm just not at the stage where I think I won't be making frequent enough changes to be able to comfortably deal with rebasing every package after every change. |
Same here, cie-middleware-linux is already reproducible (I tested it) and fetching the dependencies can be done without all this cruft. Maybe it can be made optional? I agree that gradle packages should be a first class citizen of Nixpkgs, and I think this PR is more than welcome, but adding this amount of generated data to Nixpkgs is crazy. IIRC there was in fact an effort to move most of the generated files away from the main repository. Besides, it's almost impossible to review the changes from github. |
Input as maintainer of mindustry: This PR breaks mindustry. Knowingly, because it sets I'm thrilled that folks are working on stuff like this. Thank you! Please continue. But it's hard for me to give feedback on the details of this proposal because, as a non-gradle-expert, I just see a bunch of random changes I don't understand, some obviously-wrong changes like changing As delroth said, please split this into separate commits. In addition to making review feasible, this also has the benefit of making it clear that both the old way and this new way continue to work -- that this isn't some giant cut-over to a new thing that has to be done treewide in one commit (which would be bad). |
As the maintainer of JabRef: Obviousely this doesn't work for JabRef. It breaks the reproduciblity. The gradle lockfile doesn't work for SNAPSHOT as noted in the doc:
And you removed the aarch64 support. I'm not sure why you do this. I prefer my approach. I download deps for different arches into the same fixed-output derivation, so that I don't need to update the hashes for other arches I don't have. |
As the maintainer of Ghidra, I think having an official, less hacky way to handle Gradle packages is fantastic. The current mess of scripts and hacks that every gradle package cobbles together is very unfortunate, as it makes packaging new gradle-based software a huge hassle. Finding the right voodoo invocation to package Ghidra was a rather terrible experience. With that said, I also have to agree that I don't quite understand what the lockfiles add to the table here. I'm admittedly not an expert on gradle things though. Can you explain how lockfiles improve the reproducibility of the packages, vs just having an outputHash on the dep derivation? |
I don't understand what the scripts in Also, I don't understand why there are so many lockfiles for dependencies. I thought that one lockfile per Gradle project would suffice? |
I'm already patching SNAPSHOT versions, so that's fine
will do!
I don't think it changes reproducibility as much as it removes the chance it will suddenly break in the future. Gradle, like other package managers, supports non-exact versions, and if a new version releases, Gradle may then download the newer version and break the dependency hash. As I said above, perhaps a Gradle plugin could be written to only lock dynamic dependencies. In fact, Gradle may download a different dependency with the same name and version number even when using lockfiles, which is very unfortunate, but I'll try to find a way to fix it.
Gradle 6 uses an old lockfile format which has many more lockfiles. With Gradle 7 and above, it's indeed one lockfile per project (or rather two, one of them for the buildscript) but some repos are split into multiple projects and require multiple sets of 2 lockfiles. |
Do you mean this line? |
latest.integration resolves to latest release or snapshot, I don't think it's very likely for the build to break when using a newer release vs older snapshot. However, I forgot about versions like
Which is why I think the logical next step is a custom lockfile format, unless this plugin has better lockfiles. Right now a maven repo gets constructed from the gradle cache, which doesn't always match the actual maven repo structure as gradle keeps some state internally. By hijacking dependency resolution, it's possible to detect snapshots and dynamic versions, write their resolved version to a custom-format lockfile in Additionally, as I mentioned above, it isn't guaranteed Gradle will fetch dependencies from the same repo if two repos have the dependency. Different repos actually use different .pom formats (i.e. they host more or less metadata, they have different xmls, etc), which may cause the hash to change. In conclusion from this discussion and my personal experience:
When I come up with a solution for this, I will update this PR. I think it's possible to do it in just the Gradle DSL, but maybe a Gradle plugin will be required. Looking at Gradle's binary metadata cache is another option, but it's quite fragile as it would have to be updated for each Gradle version. |
It was already decided against committing cargo lockfiles for each package requiring those because of concerns re size of the download tarball. Also, does gradle really not provide content hashes for its locked dependencies? :/ |
Can you link me to the discussion? From what I see it's demonstrably false (see this for example) |
See #217084 (comment).
|
(tl;dr at the end) Regarding your first point - the thing is Gradle builds are inherently unreproducible without lockfiles (and even with lockfiles a whole bunch of caveats applies, which is why I'm thinking on changing the approach to something more suitable for Nix). Rust already encourages bundling lockfiles for applications, and it does happen in practice, while with Gradle it's rarely done because it isn't enabled by default and requires extra effort to maintain - even Gradle themselves position it as more or less simply something to help the devs maintain a stable environment, hence unreproducible snapshots (even the hash verification documentation mentions that multiple hashes for a dependency is possible). Even this PR is potentially impure even assuming the lockfiles work perfectly, because it doesn't sort the list of downloaded jar/pom files, so if multiple jars/poms with the same name but different contents get downloaded, their content in the output depends on the order of files on the filesystem. Though this is fixed with a simple I understand the concern for tarball size, although there are much fewer Gradle packages in nixpkgs so the impact is way smaller, and the lockfiles are much less verbose than in Rust. Compare: [[package]]
name = "aes"
version = "0.7.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9e8b47f52ea9bae42228d07ec09eb676433d7c4ed1ebdf0f1d1c29ed446f1ab8"
dependencies = [
"cfg-if",
"cipher",
"cpufeatures",
"opaque-debug",
] and
What I also see is there's a discussion about switching to a centralized set of Rust crates, which has the added benefit of storing (in Nix store) and fetching each dependency separately. I think this can be done in case of Gradle as well - but because of the Java culture we definitely can't just bundle a single version of each library here, we would have to store each version of each library separately, and "garbage collect" them from nixpkgs when they are no longer needed by any in-tree packages. This is basically the node2nix approach, so there's precedent already. This could offer a decent compromise between tree size and Nix compatibility, and also get rid of the FOD. As a bonus, we can finally properly use Nix-built OpenJFX. The problem is many Gradle packages rely on impurity and manually fetch stuff at build time (e.g. Ghidra, which has a separate task for fetching certain non-Java dependencies, or signald which changes the Java dependencies depending on env vars set or on the architecture). This means that unlike nodejs where fetching dependencies can be done just by looking at package.json, here we have to run the Gradle build script to do it, and we may have to do it for different architectures. That is, we can't do all this without FODs, even if those FODs are only ran by maintainers and not actually built by Hydra. Or if we want to get rid of FODs completely the alternative is to make the maintainers run Gradle without sandboxing to get the list of dependencies. When Gradle is running, we can figure out the URLs of I think in general it's great because "the Gradle way" of fetching dependencies from god knows where isn't good for Nix. I've only packaged two Gradle packages (not in nixpkgs), they had native code dependencies downloaded at build-time, I simply replaced the custom dependency fetchers they used with a hardcoded path to Nix-built packages, and imo this should really be done in nixpkgs as well. In conclusion, I think getting rid of FODs is definitely possible, but that by itself will not change the size overhead. Changing the dependencies to be centralized like in node2nix can mostly alleviate the size issue and bring massive build speed benefits and potentially save on binary cache space, but the caveats (besides the same caveats node2nix has) are that getting Gradle packages' dependencies is turing-complete so we can't possibly account for all cases, and those special cases will have to be accounted for by maintainers - sometimes with more code, always with more reproducibility than there already is. |
This approach isn't flexible enough and doesn't work around many of Gradle's pitfalls. It's probably better than what there is right now, but it also adds maintainer overhead (which is justifiable if it solves the problems with Gradle) while not solving every problem that Gradle brings (i.e. workarounds still have to be applied in like 20% cases, which makes it not justifiable imo). Right now I'm thinking of MITMing Gradle to work around the rest of the problems (e.g. the okio -> okio-jvm issue, or for supporting multiple repos). For those reasons, I'm closing this PR. When I figure out how to MITM Gradle, I'll open a new one. |
This is at an RFC stage, this isn't ready for merge yet.
Description of changes
This is the long overdue Nix builder for Gradle. It works in a way most Gradle packages already work - that is, fetches the dependencies in one derivation, and builds in another, but also adds lockfiles for more reproducibility (there were a couple packages affected by it). While lockfiles add a maintenance overhead because they are almost never used by Java devs, I did my best to ensure they would be fairly easy to update. The current workflow (can be automated) is like this:
package.deps.update
, get the new hashpackage.deps.update
's output lockfiles to the package's directorypackage.deps.update
andpackage.deps
provide the same output if you copy the lockfiles from the former to the latter. But to verify that that is indeed the case it's recommended, though not mandatory to change the hash to a fake hash again just to check that the hash forpackage.deps
matchespackage.deps.update
.Caveats:
This can't be a build hook, because it requires a separate derivation (hooks can't create derivations). This is why it has to be a Nix function. Since this is my first work on this part of nixpkgs, any suggestions on how best to organize it are welcome. The current PR is just a draft and is heavily subject to change, though I'd argue it's already better than what was there before. I'm also not a Java dev nor a Gradle expert.
If a native Gradle package is used, hashes and lockfiles become platform-dependent. This was already the case, but now you have to update not only the hashes, but also lockfiles. But again, the above workflow should be easy to automate.
There are still occasional issues where Gradle stores packages in a way not compatible with maven repository format. There are also some non-reproducible outputs at times even when fully using lockfiles. It's possible to fix this at least to some extent by creating a more complex tool than a perl oneliner.
Things done
sandbox = true
set innix.conf
? (See Nix manual)nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)