# language package management tooling missing hashes #65275
Approach 2 (ab)uses fixed-output derivations. Why this might be a bad idea is described here: NixOS/nix#2270
Glossary: LPM = language package manager.
Here is a brain dump of what I have learned so far.

**0. Fixed-output derivations**

The fixed-output derivation is the level-zero support. It doesn't take much effort to create and maintain. The language package manager (LPM) commands can be used directly, like in the developer documentation. The biggest downside is that the hash is not automatically invalidated when one of the input files changes. This can create surprising situations where the lockfile is updated but the old program is still running (because nix is still reading from the old hash). The hash has to be invalidated manually by changing it to something else, running nix-build, and waiting for nix to tell you the right hash; then the build is re-run from scratch. Another downside is that not all the tools have a stable on-disk output: two developers not sharing a binary cache might get different output hashes. I've seen that happen with the cargo tools for a while, for example. (A minimal sketch of this approach follows at the end of this comment.)

**1. Fake registry**

A lockfile is generated that downloads all the dependencies using nix fetchers. Then the aggregate is used to start a fake registry process that the tools can talk to. This solves the outdated-hash problem, and since the APIs are usually publicly documented, they are also pretty stable. The only implementation of that idea that I know of is https://github.com/nmattia/napalm. The biggest downside is that we need to build the API for all the languages.

**2. Pre-download the dependencies**

This is similar to (1), but instead of providing an API, the files are placed on disk where the LPM expects to find them. There is often an offline mode that we can re-use. In my experience the on-disk locations are not necessarily documented or stable between releases. To implement it properly, the nix developer is often forced to look at the LPM source code to find the location and duplicate the logic in nix.

**3. Build each dependency in its own derivation**

This goes even further than (2) in the LPM integration. The LPM usually controls traversing the dependency tree and running each individual build; we hijack that and replace it with individual derivations that are built separately. So each dependency is built in isolation in its own little sandbox, then everything is strung together and presented to the application at the end. Even more of the LPM's heuristics are encoded into Nix. Examples of this can be found for ruby and python. The main advantage of this approach is that it minimizes the rebuilds between two releases: the rebuilds are more incremental. And in theory the dependencies can also be shared between two or more programs. The main downside is that a lot of nix code is running to build a single package. The Hydra evaluator is now running on a 64GB node because evaluating nixpkgs takes a lot of memory. This is a great approach for a company monorepo, or when maintaining a package-set snapshot like Stackage or the python modules. For single packages in nixpkgs, I now believe that this is going one step too far.
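As a concrete illustration of approach 0, here is a minimal sketch of a fixed-output derivation that lets npm fetch dependencies over the network. The package name, phases, and placeholder hash are all illustrative, not taken from this discussion:

```nix
# Minimal sketch of approach 0: a fixed-output derivation is allowed
# network access, as long as its output matches outputHash.
{ stdenv, nodejs, src }:

stdenv.mkDerivation {
  name = "example-node-deps";
  inherit src;
  nativeBuildInputs = [ nodejs ];

  buildPhase = ''
    export HOME=$TMPDIR          # npm wants a writable home directory
    npm install --ignore-scripts # fetches dependencies from the network
  '';

  installPhase = ''
    cp -r node_modules $out
  '';

  # The downside described above: this hash is NOT invalidated when the
  # lockfile changes; it must be updated by hand (placeholder shown here,
  # nix-build will report the correct hash on mismatch).
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = "0000000000000000000000000000000000000000000000000000000000000000";
}
```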
Derivations don't need the exact same dependency set for sharing to be useful though, as long as there are some common dependencies. I quickly checked duplication across
👍 on adding data to what is essentially a belief right now. I think you have to show that the derivation outputs are the same.

That being said, the sharing also happens between multiple versions of nixpkgs. Having more granular derivations also allows us to minimize rebuilds on package updates, and to minimize downloads for the user. To really know, we would need a big differential equation that balances build times, evaluation times and download times.

Actually, I was missing the last step:

**4. Use Nix as a project build system**

In this scenario, there is no LPM: Nix has entirely replaced the LPM tooling. Nix is building each and every object of a project in its own derivation and composing them all together. This is the ultimate incremental rebuild, and the ultimate memory and Nix-evaluation hog. An example of such an implementation: https://github.com/nmattia/snack/ (a toy sketch also follows this comment). At that point you very much wish that Nix had an Intensional Store to minimize rebuilds.
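To make approach 4 concrete, here is a toy sketch (not from the original thread) that gives every compilation unit its own derivation and composes them in a final link step. The file names are invented for illustration:

```nix
# Toy sketch of approach 4: one derivation per compilation unit,
# composed by a final link step. ./main.c and ./util.c are illustrative.
{ stdenv }:

let
  # Compile a single source file in its own sandbox.
  compile = src: stdenv.mkDerivation {
    name = "${baseNameOf src}.o";
    buildCommand = ''
      gcc -c ${src} -o $out
    '';
  };

  objects = map compile [ ./main.c ./util.c ];
in
stdenv.mkDerivation {
  name = "toy-program";
  # Changing one source file only rebuilds its object derivation
  # and this link step; everything else is cached.
  buildCommand = ''
    gcc ${toString objects} -o $out
  '';
}
```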
The issue then becomes that you have to put generated nix files in git, and these can be very large. Do you have any ideas on how this could be solved? I had some ideas about using git lfs or some content-addressable storage, like ipfs.
@offlinehacker This could potentially be addressed by nix flakes (& splitting up nixpkgs into subsystems).
Also, there's one other option that's a variation of the fake registry option described above: a **recording/caching HTTP proxy**. You redirect all requests of the LPM through a local HTTP proxy. This proxy records all requests and transforms them in a way that can later be used for replay during the installation process. The problem is that the package manager not only loads tar archives and git repos, but also makes API requests to something like npm, so you need response transformations that are specific to each package manager; the whole service could be generalized with plugins, though. During the installation process you start the proxy again, with the generated configuration from the first step as an input. The benefit is that you no longer require a fake registry for every package manager, but have a more generalized solution. (A sketch of how a build might use such a proxy follows below.)
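For illustration only, a derivation might point the LPM at such a replaying proxy through the standard proxy environment variables. The `replay-proxy` command and its flags are invented for this sketch; no such tool exists in nixpkgs:

```nix
# Hypothetical sketch: replaying recorded responses through a local
# proxy during the build. `replay-proxy` is an invented name.
{ stdenv, nodejs, src, recordedResponses }:

stdenv.mkDerivation {
  name = "example-via-proxy";
  inherit src;
  nativeBuildInputs = [ nodejs ];

  buildPhase = ''
    # Start the (hypothetical) replaying proxy with the recorded data.
    replay-proxy --listen 127.0.0.1:8080 --data ${recordedResponses} &

    # Most LPMs and HTTP libraries honor these variables.
    export http_proxy=http://127.0.0.1:8080
    export https_proxy=$http_proxy

    npm install --ignore-scripts
  '';

  installPhase = ''
    cp -r node_modules $out
  '';
}
```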
@adisbladis In any case, even if you split the repo, you still pollute other repos with files that are basically large text blobs, but yeah, I agree that this would still help. The problem is we can't package some things because the generated files are too large; for example, take a look here: #49082
The best solution that I know of is to extend nix's capabilities to allow recursive nix calls. Recursive Nix is when nix is called from inside a derivation. It would look a little bit like this:

```nix
stdenv.mkDerivation {
  pname = "xxx";
  version = "1.2.3";
  src = fetchFromGitHub { ... };

  buildPhase = ''
    # Evaluate and build the project's own nix expression at build time.
    nix-build -I nixpkgs=${pkgs.path} ./default.nix
    # or ${./inner.nix} if upstream doesn't have a default.nix
    # or we don't want to use it
  '';

  installPhase = ''
    ln -s $(readlink ./result) $out
  '';
}
```

(Obviously we would extract this pattern into a new helper function.)

The nice thing here is that import-from-derivation can be allowed in the inner build; it's not going to affect the outer evaluation. So overall it would make hydra builds a bit slower, because nixpkgs has to be re-evaluated again on each build. For users, the nixpkgs evaluation becomes faster, because the complicated IFD happens only at build time. If we start using nix files from upstream, it might make refactoring of nixpkgs a bit harder. Package dependencies are also harder to follow, since they are not passed to the outer default.nix.
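Such a helper might be factored out roughly like this; `buildRecursiveNix` is an invented name for illustration, not an existing nixpkgs function:

```nix
# Hypothetical helper extracting the recursive-nix pattern above.
{ stdenv, pkgs }:

{ pname, version, src, innerNix ? "./default.nix" }:

stdenv.mkDerivation {
  inherit pname version src;

  buildPhase = ''
    # Build the project's own nix expression against a pinned nixpkgs.
    nix-build -I nixpkgs=${pkgs.path} ${innerNix}
  '';

  installPhase = ''
    ln -s $(readlink ./result) $out
  '';
}
```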
Recursive Nix was also discussed in this RFC: NixOS/rfcs#40. It goes even further, because it also pre-generates derivations.
When there's something I can try, and maybe an example of how to use it, I would love to start experimenting with it to get rid of all the yarn.nix files, where possible. However, I do not see how this could solve the issue with, for example, ruby tooling, where the hashes are not included in the lockfile.
Originally posted by @Mic92 in #78810 (comment)

I think this discussion should not be held in the Mastodon PR, because it is not specific to mastodon or even yarn2nix. The bundler tooling and some Go tooling work the same way and have the same issues. I think this is just moving the problem from expression size / evaluation speed to hidden impurities (see NixOS/nix#2270). This is a general issue, and I think it might even be good to create some kind of working group of people who are interested in finding a community consensus and solving this problem for all language package managers in the long term. I would certainly be interested in it.
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/future-of-npm-packages-in-nixpkgs/14285/3
I'm sorry for the very general name of this issue; if anyone can come up with a better title for this problem, please make a suggestion :-)
I am looking for the best solution for the following problem that I run into quite a lot:
The application I am trying to package has dependencies from its language's packaging system that are not tracked in nixpkgs (npm, rubygems, maven, rust crates, ...). There is tooling to adapt the dependency definitions from the language's package management system to nix (yarn2nix, bundix, ...), but since the original dependency definitions don't contain usable hashes for all dependencies, or are missing the hashes for git dependencies, these dependency definitions need to be combined with information from the internet. There are multiple solutions that I see being used:

1. Generate a nix expression that pins every dependency (with tools like yarn2nix or bundix) and commit the generated file to nixpkgs (see the sketch after this list).
2. Wrap the language package manager's fetch step in a single fixed-output derivation whose hash covers all downloaded dependencies.
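Generated files for solution 1 typically have a shape like the following. This is illustrative only, not the exact output format of yarn2nix or bundix, and the hash is a placeholder:

```nix
# Illustrative shape of a generated dependency file (solution 1).
{ fetchurl }:

[
  (fetchurl {
    url = "https://registry.npmjs.org/left-pad/-/left-pad-1.3.0.tgz";
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000";
  })
  # ...one entry per locked dependency, often thousands of lines,
  # which is why these files bloat nixpkgs.
]
```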
Which one is actually more favorable? The first one results in difficult-to-maintain packages and spams nixpkgs with large files; the second one is not strictly pure, and it kind of works around nix by using a fixed-output derivation, which I think I read edolstra discouraging.
There was some discussion on IRC, but there was no conclusion, so I raise the issue here, because I think there should be a consensus on how to handle this problem in nixpkgs.
Maybe there is an even better third approach that I don't know of yet.