Restrict fixed-output derivations #2270
Comments
The main problem with this, of course, is that all fetchers will need to be built into Nix or provided as plugins. |
An alternative approach is what we're doing in nix-fetchers: Push more of this into eval time stuff (see e.g. fetch-pypi-hash |
It's not clear to me why this is impure, at least by my definition of the word. How do you define purity? If the lock file (lots of languages have a similar notion) is specified narrowly/precisely enough, it produces the same outputs every time. And if not, the hash won't validate. The issue arises when you cache the result of these things and don't notice that your input wasn't sufficiently well specified. This brings me back to #520 where I talk about a semi-formal notion of a "lock file", where fetchers (potentially nondeterministic) are expected to produce a fully locked down "lock file" that is supposed to be feedable back into the process in order to reproduce it (and can and should be tested regularly for consistency). I'm pretty strongly against forcing all fetchers to be built into Nix from now on. One of its biggest selling points for me is the ease of writing new fetchers and how clean that model is, and I don't honestly think that this is violating any of that. I'd be very sad to see today's model of FO derivations go away, despite its issues. |
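To make the "lock file" idea above a bit more concrete, here is a minimal sketch in Python. It assumes a hypothetical lock-file shape (dependency name mapped to a SHA-256 digest) and a hypothetical `fetch` callable standing in for any fetcher; neither is an existing Nix interface. One function pins whatever a possibly nondeterministic fetcher returned, the other re-runs the fetcher against the lock and reports entries that no longer match.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical lock-file shape: dependency name -> expected SHA-256 hex digest.
LockFile = Dict[str, str]


def make_lock(fetch: Callable[[str], bytes], names: List[str]) -> LockFile:
    """Run the (possibly nondeterministic) fetcher once and pin what it returned."""
    return {name: hashlib.sha256(fetch(name)).hexdigest() for name in names}


def check_lock(fetch: Callable[[str], bytes], lock: LockFile) -> List[str]:
    """Re-run the fetcher against an existing lock and list entries that drifted."""
    drifted = []
    for name, expected in sorted(lock.items()):
        actual = hashlib.sha256(fetch(name)).hexdigest()
        if actual != expected:
            drifted.append(name)
    return drifted
```

Running `check_lock` on a schedule is one way to do the "tested regularly for consistency" part: an empty result means the lock still reproduces, a non-empty one names the inputs that were not specified narrowly enough or that changed upstream.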
Re: What is the problem to solve? Unlimited network access does raise security questions; maybe the additional fetchers should have to precommit to only talking to a fixed list of domain names and ports (maybe with a subnet blacklist to prevent resolving to local IPs, and with a clear ban on listening on external interfaces)? Maybe we could define the notion of a fetcher so that they can be written as today, but the administrator has to trust them via a mechanism similar to substituters? |
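A rough sketch of what such a precommitment check could look like, in Python. The `(host, port)` allowlist format and the function names are assumptions made for illustration, not an existing Nix mechanism; the address classification uses the standard `ipaddress` module.

```python
import ipaddress
from typing import FrozenSet, Iterable, Tuple

# Hypothetical policy format: the (host, port) pairs a fetcher precommitted to.
Endpoint = Tuple[str, int]


def is_public_address(addr: str) -> bool:
    """True unless the address is loopback, private, link-local or otherwise special."""
    ip = ipaddress.ip_address(addr)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def connection_allowed(
    host: str,
    port: int,
    resolved: Iterable[str],
    allowlist: FrozenSet[Endpoint],
) -> bool:
    """Allow a connection only to a declared endpoint that resolves to public IPs."""
    if (host, port) not in allowlist:
        return False
    addresses = list(resolved)
    # Reject empty resolutions and any name that resolves to a local address.
    return bool(addresses) and all(is_public_address(a) for a in addresses)
```

Checking the resolved addresses separately from the host name matters here: an allowlisted domain that resolves to `10.0.0.5` or `127.0.0.1` would otherwise let a fetcher reach services on the build machine's own network.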
@copumpkin A function is impure if it depends on something other than its inputs. Fetchers depend on the network, so they are impure. The only mitigating aspect of fixed-output derivations is that the impurity is controlled in the sense that the output is verified to have a certain hash. This is a real problem for Nix's reproducibility: for example, fetchurl calls frequently fail because a file disappeared. But at least with fetchurl, there is a generic method to mirror all fetchurl calls in a derivation. |
@volth Implicitly we're already doing that, since cache.nixos.org acts as a backup of FODs. However there are a few issues:
|
OK, so the main point is probably ensuring sane granularity for caching (and that's a very good point). At the same time, having all the Maybe network support for FOD should be optional (default off) with just Nixpkgs-level policy on nonduplication (which would be explicit)? Maybe with store deduplication eventually complaining if there is duplication between FOD outputs. (Also, does it mean that we want to eventually deduplicate Linux kernel source tarball contents across versions?) GC policies for FOD could be different than for normal builds. Download progress indicator for binary caches would be nice anyway, LibreOffice build output is large, so I expect this to be a transient problem. Path rewriting substituter for FOD is also likely to happen at some point because the hard part is just absent in this use case. |
As an initial thought: what about fetching things that don't use HTTP(S)? E.g. if we needed a fetcher for FTP, a custom protocol, etc.? Maybe some form of plugin approach could help? |
I am the maintainer of https://github.com/NixOS/nixpkgs/blob/3ff636fb2e756ac57d7f0007dc2c6c2425401997/pkgs/development/compilers/ldc/default.nix and the only reason I need to use a fixed output derivation is because I want to run the unittests and the socket implementation is tested via the loopback address (127.0.0.1). I fail to see how the loopback address can introduce any impurities. If you are going to deprecate fixed output derivations I would like to know how I should change this derivation in the long term. |
You shouldn't need a fixed-output derivation for that. Regular derivations run in a network namespace where they have their own loopback interface:
|
That's right but the problem is that h_addr_list is empty if sandboxing is enabled in the test below.
|
I don't like the idea of Nix trying to solve every package manager in the world. There is already enough magic with lang2nix tools, and I don't see what is wrong with the approach that we have for Rust and Go, where we are able to produce a deterministic package dependency set. It is not nice and shiny, but it reduces magic and gets the job done. I would much rather have a solution that is known to work than one that tries to emulate the "right behaviour" for the sake of addressing issues of package managers. |
The problem is that fixed-output derivations allow you to have irreproducible builds ( |
Ok, if I understand correctly, the problem is that fixed-output derivations can be arbitrarily complex, as there are no boundaries on where information can be retrieved from the internet. At the same time you lose the dependency graph and it becomes time-consuming to rebuild fixed-output derivations. Let's say that the package managers that Nix tries to wrap can be arbitrarily complex and can change at any point in time, so it's hard to have a Nix-native implementation, especially one that works right. What I see as an alternative is to limit what fixed-output derivations can do:
The other way is of course trying to wrap other package managers, but this brings in a lot of complexity, as you need lang2nix, whether this be part of build or as a pre-evaluation code generator. |
I was curious how much Cache busting is imho one of the more serious practical issues with the current model of fixed-output derivations. |
So for |
@Mic92 That's not exactly the case, the 1.5 number is only correct within a single nixpkgs evaluation, so 1.5 is only the immediate up-front savings. |
The justification I heard for tracking each dependency individually is the "dependency alignment" in Fedora or similar Linux distributions. The idea is that each library should appear in only a single version in whole distro, preferably as a .so file, to reduce maintenance efforts. Depends on the kind of maintenance, obviously. In Nixpkgs, maintenance IMO is mostly about tracking upstream closely, so the benefits aren't there. Second argument for tracking individual dependencies would be keeping track of bugs (and security bugs). Answering the question "Is my system affected through a transitive dependency that some package pulled in?" gets tricky without it. The data transfer savings due to and caching granularity mentioned before seem most relevant argument to me. If working with @offlinehacker The day might come when the "Open Buildgraph Specification" (just made it up) releases a format that build tools can use to consume/provide each others outputs/inputs w/ some propagated metadata, and the build problem will be thus solved 📦 |
Coming here after losing 2 days from being tricked by a fixed-output derivation (NixOS/nixpkgs#66598). I had overridden Thus I would argue that even
wasn't true for me -- it tricked me badly. Here a few questions:
|
* Would it be possible to have a flag or functionality to re-fetch all fixed-output derivations, and check whether they produce the same hashes?
This seems possible now with `nix-store -qR` to get the dependency list, filtering with `nix-store -q -b outputHash` to get fixed-output derivations, and `nix-store -r --check` to rebuild them.
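The three commands above could be chained roughly like this. This is a Python sketch, not a tested tool: it assumes that `nix-store -q -b outputHash` exits non-zero for derivations without an `outputHash` binding, and the `run` parameter is a hypothetical hook so the logic can be exercised without a Nix installation.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_nix(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Default runner: execute the command and capture its output as text."""
    return subprocess.run(list(cmd), capture_output=True, text=True)


def fixed_output_drvs(root_drv: str, run: Runner = run_nix) -> List[str]:
    """List the fixed-output derivations in the closure of a .drv file."""
    closure = run(["nix-store", "-qR", root_drv]).stdout.split()
    fods = []
    for path in closure:
        if not path.endswith(".drv"):
            continue  # plain source paths are not derivations
        # Assumption: the query fails when there is no outputHash binding.
        if run(["nix-store", "-q", "-b", "outputHash", path]).returncode == 0:
            fods.append(path)
    return fods


def recheck(drvs: Sequence[str], run: Runner = run_nix) -> List[str]:
    """Rebuild each derivation with --check and return the ones that failed."""
    return [d for d in drvs if run(["nix-store", "-r", "--check", d]).returncode != 0]
```

A typical call would be `recheck(fixed_output_drvs(path_to_drv))`, where `path_to_drv` is the `.drv` instantiated from the expression you care about.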
|
This probably won't work if the goal is to redownload FODs using a new `curl`, because the `.drv` files in the Nix store have the path to the old curl.
I would hope that `nix-store -qR` on the `.drv` file instantiated from the new version refers to the new `curl`, though.
So the `.drv` files need to be regenerated too, and the Nix expressions of all FODs are needed (and here we are back at the task of https://discourse.nixos.org/t/using-nixos-in-an-isolated-environment/3369/15)
Well, unlike there we do have a definition of «all» that is based on an easily evaluatable derivation
|
@edolstra Would this sort of situation be improved if recursive nix were possible? E.g. maybe we could have an HTTP proxy on 127.0.0.1 to which the package managers' HTTP requests get redirected, and which serves the requests from store contents after |
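For illustration only, the proxy part of that idea might look something like the following Python sketch. The in-memory `contents` mapping stands in for "store contents" and is purely hypothetical, and only plain-HTTP proxying is handled; HTTPS would need `CONNECT` support, which is left out.

```python
import http.server
import threading
import urllib.error
import urllib.request
from typing import Dict


def make_proxy(contents: Dict[str, bytes]) -> http.server.ThreadingHTTPServer:
    """Start a loopback HTTP proxy that only serves pre-fetched URLs."""

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # For proxied plain-HTTP requests, self.path is the absolute URL.
            body = contents.get(self.path)
            if body is None:
                self.send_error(404, "not pre-fetched")
                return
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    # Port 0 lets the OS pick a free port on the loopback interface.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_via_proxy(server: http.server.ThreadingHTTPServer, url: str) -> bytes:
    """Fetch a URL through the proxy, the way a package manager would."""
    host, port = server.server_address[:2]
    proxy = urllib.request.ProxyHandler({"http": f"http://{host}:{port}"})
    with urllib.request.build_opener(proxy).open(url, timeout=5) as resp:
        return resp.read()
```

Anything that was not fetched ahead of time gets a 404, so a package manager pointed at this proxy cannot silently pull in an unpinned dependency.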
As someone who periodically forgets to bump the sha on the prefetched dependencies of |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/fixed-output-derivations-to-become-part-of-flake-inputs/8263/1 |
I marked this as stale due to inactivity. |
NixOS has to make weekly snapshots of Cargo, NPM and Maven, and restrict/redirect network access to its own mirrors. |
Maybe only restrict support to un-pinnable upstreams where clear-retention-policy mirror exists; otherwise you are asking to make it impossible at the Nix level to package things distributed in a way not mainstream enough to have a NixOS foundation-level mirror. |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/why-dont-nix-hashes-use-base-16/11325/28 |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/status-of-lang2nix-approaches/14477/1 |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/any-way-to-use-custom-fetcher-for-nix-flake-inputs/18080/1 |
I read the opening entry of this thread and a few of the posts at the top. Searched for Dumb question. That is part of my
[[package]]
name = "addr2line"
version = "0.17.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b9ecd88a8c8378ca913a680cd98f0f13ac67383d35993f86c90a70e3f137816b"
dependencies = [
 "gimli 0.26.2",
]
Why would a custom fixed-output derivation not just download from Would that not be abuse? |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/buildnodemodules-the-dumbest-node-to-nix-packaging-tool-yet/35733/1 |
People have started (ab)using fixed-output derivations to introduce large impurities into Nix build processes. For example, `fetchcargo` in Nixpkgs takes a `Cargo.lock` file as an input and produces an output containing all the dependencies specified in the `Cargo.lock` file. This is impure, but it works because `fetchcargo` is a fixed-output derivation. Such impurities are bad for reproducibility because the dependencies on external files are completely implicit: there is no way to tell from the derivation graph that the derivation depends on a bunch of crates fetched from the Internet.

You could argue that `fetchurl` has the same problem, but `fetchurl` has simple semantics (fetching a file from a URL) and is more-or-less visible in the derivation graph. This allows tools like `maintainers/scripts/copy-tarballs.pl` to mirror `fetchurl` files to ensure reproducibility.

Proposed solution: Add a new sandboxing mode where fixed-output derivations are not allowed to access the network (just like regular derivations). In this mode, only builtin derivations like `builtin:fetchurl` would be allowed to fetch files from the network. This mode should become the default at some point.

We would also need `builtin:fetchGit` to replace `fetchGit` in Nixpkgs, etc.

@cleverca22 pointed out that fixed-output derivations allow shenanigans like opening a reverse interactive shell into the build server, so that's another reason for removing network access.
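The proposed mode boils down to a small decision rule, sketched here in Python. The `Derivation` class and the `restrict_fixed_output` flag are hypothetical stand-ins for illustration, not Nix's actual data structures or setting names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Derivation:
    """Hypothetical, minimal view of a derivation for this sketch."""

    builder: str
    output_hash: Optional[str] = None  # set only for fixed-output derivations

    @property
    def is_fixed_output(self) -> bool:
        return self.output_hash is not None

    @property
    def is_builtin(self) -> bool:
        return self.builder.startswith("builtin:")


def network_allowed(drv: Derivation, restrict_fixed_output: bool) -> bool:
    """Decide whether a derivation's build may access the network."""
    if drv.is_builtin:
        # Builtin fetchers such as builtin:fetchurl keep network access.
        return True
    if drv.is_fixed_output:
        # Today's behaviour when the flag is off; sandboxed when it is on.
        return not restrict_fixed_output
    # Regular derivations never get network access in the sandbox.
    return False
```

With the flag off this matches the current model; turning it on leaves only builtin fetchers with network access, which is what would force something like `fetchcargo` to be expressed in terms of them.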