Restrict fixed-output derivations #2270

Open
edolstra opened this issue Jul 4, 2018 · 57 comments
Labels
feature Feature request or proposal

Comments

@edolstra
Member

edolstra commented Jul 4, 2018

People have started (ab)using fixed-output derivations to introduce large impurities into Nix build processes. For example, fetchcargo in Nixpkgs takes a Cargo.lock file as an input and produces an output containing all the dependencies specified in the Cargo.lock file. This is impure, but it works because fetchcargo is a fixed-output derivation. Such impurities are bad for reproducibility because the dependencies on external files are completely implicit: there is no way to tell from the derivation graph that the derivation depends on a bunch of crates fetched from the Internet.

You could argue that fetchurl has the same problem, but fetchurl has simple semantics (fetching a file from a URL) and is more-or-less visible in the derivation graph. This allows tools like maintainers/scripts/copy-tarballs.pl to mirror fetchurl files to ensure reproducibility.

Proposed solution: Add a new sandboxing mode where fixed-output derivations are not allowed to access the network (just like regular derivations). In this mode, only builtin derivations like builtin:fetchurl would be allowed to fetch files from the network. This mode should become the default at some point.

We would also need builtin:fetchGit to replace fetchGit in Nixpkgs, etc.
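To make the proposed sandboxing mode concrete, here is a minimal Python sketch of the decision it implies. The `Derivation` class and `network_allowed` function are hypothetical illustrations, not Nix's implementation. The only real conventions assumed are that fixed-output derivations carry an `outputHash` attribute and that builtin builders have names like `builtin:fetchurl`.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Derivation:
    """Hypothetical, simplified view of the derivation attributes that matter here."""
    name: str
    builder: str
    env: Dict[str, str] = field(default_factory=dict)

    @property
    def is_fixed_output(self) -> bool:
        # A fixed-output derivation declares its expected output hash up front.
        return "outputHash" in self.env


def network_allowed(drv: Derivation, restrict_fixed_output: bool) -> bool:
    """Should the sandbox give this derivation network access?"""
    if not drv.is_fixed_output:
        # Regular derivations never get network access.
        return False
    if drv.builder.startswith("builtin:"):
        # Fetchers implemented inside Nix itself, e.g. builtin:fetchurl.
        return True
    # Current behaviour: any fixed-output derivation may use the network.
    # Proposed mode: only the builtin fetchers above may.
    return not restrict_fixed_output
```

Under the proposed mode a fetchcargo-style derivation (an arbitrary builder with an `outputHash`) loses network access, while `builtin:fetchurl` keeps it.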

@cleverca22 pointed out that fixed-output derivations allow shenanigans like opening a reverse interactive shell into the build server, so that's another reason for removing network access.

@edolstra edolstra added the feature Feature request or proposal label Jul 4, 2018
@edolstra edolstra self-assigned this Jul 4, 2018
@edolstra
Member Author

edolstra commented Jul 4, 2018

The main problem with this, of course, is that all fetchers will need to be built into Nix or provided as plugins.

@shlevy
Member

shlevy commented Jul 7, 2018

An alternative approach is what we're doing in nix-fetchers: push more of this into evaluation time (see e.g. fetch-pypi-hash).

@copumpkin
Member

For example, fetchcargo in Nixpkgs takes a Cargo.lock file as an input and produces an output containing all the dependencies specified in the Cargo.lock file. This is impure, but it works because fetchcargo is a fixed-output derivation.

It's not clear to me why this is impure, at least by my definition of the word. How do you define purity? If the lock file (lots of languages have a similar notion) is specified narrowly/precisely enough, it produces the same outputs every time. And if not, the hash won't validate.

The issue arises when you cache the result of these things and don't notice that your input wasn't sufficiently well specified. This brings me back to #520 where I talk about a semi-formal notion of a "lock file", where fetchers (potentially nondeterministic) are expected to produce a fully locked down "lock file" that is supposed to be feedable back into the process in order to reproduce it (and can and should be tested regularly for consistency).

I'm pretty strongly against forcing all fetchers to be built into Nix from now on. One of its biggest selling points for me is the ease of writing new fetchers and how clean that model is, and I don't honestly think that this is violating any of that. I'd be very sad to see today's model of FO derivations go away, despite its issues.

@7c6f434c
Member

7c6f434c commented Jul 7, 2018

Re: fetchcargo: if the upstream server let you post a Cargo.lock and returned a tarball of the relevant dependencies, the functionality wouldn't be much worse than fetchurl, and probably better than a submodule-aware call to fetchgit. They don't provide that, so fetchcargo does what fetchgit does; is that so bad?

What is the problem to solve? Unlimited network access does raise security questions; maybe the additional fetchers should have to precommit to only talking to a fixed list of domain names and ports (maybe with subnet blacklist to prevent resolving to local IPs, and with a clear ban on listening on external interfaces)? Maybe we could define the notion of fetcher so that they can be written as today, then administrator has to trust them via a mechanism similar to substituters?
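The allowlist idea can be sketched in a few lines of Python. This is a hypothetical policy check, not an existing Nix feature: `allowed` is an assumed per-fetcher set of (host, port) pairs, and the resolver is injectable so the check can also reject allowed names that resolve to loopback or private addresses.

```python
import ipaddress
import socket
from typing import Callable, Set, Tuple


def is_public_address(ip: str) -> bool:
    # Loopback, RFC 1918 private, link-local and other reserved ranges are not "global".
    return ipaddress.ip_address(ip).is_global


def connection_permitted(
    host: str,
    port: int,
    allowed: Set[Tuple[str, int]],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Permit a connection only to a pre-declared (host, port) that resolves to a public IP."""
    if (host, port) not in allowed:
        return False
    return is_public_address(resolve(host))
```

Checking the resolved address, not just the name, matters because an allowed name could otherwise be pointed at a service on the build machine's local network.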

@edolstra
Member Author

edolstra commented Jul 8, 2018

@copumpkin A function is impure if it depends on something other than its inputs. Fetchers depend on the network, so they are impure. The only mitigating aspect of fixed-output derivations is that the impurity is controlled in the sense that the output is verified to have a certain hash.

This is a real problem for Nix's reproducibility: for example, fetchurl calls frequently fail because a file disappeared. But at least with fetchurl, there is a generic method to mirror all fetchurl calls in a derivation.
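The "controlled" part of the impurity can be illustrated with a simplified Python check. It assumes flat hashing (SHA-256 over a single file's bytes, compared as hex); real Nix also supports recursive hashing over a NAR serialization and usually writes hashes in base-32 or SRI form, which this sketch ignores.

```python
import hashlib
from pathlib import Path


def fixed_output_matches(path: Path, expected_sha256_hex: str) -> bool:
    """Return True if the file's SHA-256 equals the hash declared before the build ran."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256_hex.lower()
```

Note what the check cannot tell you: where the bytes came from or how they were produced. That is exactly the implicitness described above.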

@edolstra
Member Author

edolstra commented Jul 9, 2018

@volth Implicitly we're already doing that, since cache.nixos.org acts as a backup of FODs. However there are a few issues:

  • The binary cache could be garbage-collected in the future, so it's probably best not to rely on it.

  • With functions like fetchcargo the granularity is not ideal: each fetchcargo output will contain dozens or hundreds of crates, so you get a lot of duplication that would be avoided if the crates were mirrored individually.

  • Store path hashes depend on the store prefix, so cache.nixos.org can only act as a mirror for people using /nix/store.

  • You don't get the download progress indicator of nix build.
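A toy calculation shows why the granularity point matters. The crate names and sizes below are made up; each lock file maps (crate, version) to a size in kilobytes, and the sketch compares storing one bundle per lock file (the fetchcargo model) with storing each crate once.

```python
from typing import Dict, Iterable, Tuple

Lock = Dict[Tuple[str, str], int]  # (crate, version) -> size in KB


def bundled_size(locks: Iterable[Lock]) -> int:
    # One fixed-output bundle per lock file: crates shared between lock files are stored repeatedly.
    return sum(sum(lock.values()) for lock in locks)


def per_crate_size(locks: Iterable[Lock]) -> int:
    # Individually mirrored crates: each (crate, version) is stored once.
    unique: Lock = {}
    for lock in locks:
        unique.update(lock)
    return sum(unique.values())


app_a = {("serde", "1.0.100"): 80, ("libc", "0.2.60"): 600}
app_b = {("serde", "1.0.100"): 80, ("rand", "0.7.0"): 90}
print(bundled_size([app_a, app_b]), per_crate_size([app_a, app_b]))  # 850 770
```

With only two packages sharing one crate the gap is small; it grows with every additional package that pins the same crates.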

@7c6f434c
Member

OK, so the main point is probably ensuring sane granularity for caching (and that's a very good point).

At the same time, having all the fetch[VCS] functions be Nix plugins would be annoying from the point of view of dependency structure, if nothing else.

Maybe network support for FODs should be optional (default off), with just a Nixpkgs-level policy on non-duplication (which would be explicit)? Maybe with store deduplication eventually complaining if there is duplication between FOD outputs.

(Also, does it mean that we want to eventually deduplicate Linux kernel source tarball contents across versions?)

GC policies for FOD could be different than for normal builds.

A download progress indicator for binary caches would be nice anyway; LibreOffice build output is large, so I expect this to be a transient problem. A path-rewriting substituter for FODs is also likely to happen at some point, because the hard part is simply absent in this use case.

@andrew-d
Contributor

andrew-d commented Feb 2, 2019

As an initial thought: what about fetching things that don't use HTTP(S)? E.g. if we needed a fetcher for FTP, a custom protocol, etc.? Maybe some form of plugin approach could help?

@ThomasMader

I am the maintainer of https://github.com/NixOS/nixpkgs/blob/3ff636fb2e756ac57d7f0007dc2c6c2425401997/pkgs/development/compilers/ldc/default.nix and the only reason I need to use a fixed output derivation is because I want to run the unittests and the socket implementation is tested via the loopback address (127.0.0.1).
See https://github.com/dlang/phobos/blob/v2.084.1/std/socket.d#L779 for example.

I fail to see how the loopback address can introduce any impurities.
I guess it's hard to implement an exception in the sandbox for the loopback address.

If you are going to deprecate fixed output derivations I would like to know how I should change this derivation in the long term.
For sure I could just not run the test but my target was always to run all available tests and it proved to be useful.

@edolstra
Member Author

You shouldn't need a fixed-output derivation for that. Regular derivations run in a network namespace where they have their own loopback interface:

```console
$ nix-build -E 'with import <nixpkgs> {}; runCommand "foo" { buildInputs = [ iproute ]; } "ip -4 a"'
these derivations will be built:
  /nix/store/pb341wyxv056mh847fkpj7zybs1j623d-foo.drv
building '/nix/store/pb341wyxv056mh847fkpj7zybs1j623d-foo.drv'...
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
```
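A quick way to confirm that such a private loopback interface is usable is a TCP round trip over 127.0.0.1. This Python sketch is an illustration rather than something from the thread; it binds to the literal address, so it does not depend on name lookups such as `gethostbyaddr`, which is where the sandboxed behaviour differs in the next comment.

```python
import socket


def loopback_roundtrip(payload: bytes) -> bytes:
    """Send payload over a TCP connection to 127.0.0.1 and return what the server side received."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the kernel choose a free port
        server.listen(1)
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)
                received = b""
                while len(received) < len(payload):
                    chunk = conn.recv(len(payload) - len(received))
                    if not chunk:  # peer closed early
                        break
                    received += chunk
                return received


print(loopback_roundtrip(b"ping"))  # b'ping'
```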

@ThomasMader

That's right, but the problem is that h_addr_list is empty in the test below when sandboxing is enabled.
Without sandboxing, printf outputs 127.

```nix
let
  nixpkgsRev = "a44784e81181c971a41c588d93a6cf4bbd1a394c";
  nixpkgs = builtins.fetchTarball "https://github.com/NixOS/nixpkgs/archive/${nixpkgsRev}.tar.gz";
  pkgs = import nixpkgs {};

  file = pkgs.writeText "test.cpp" ''
    #include <sys/socket.h>
    #include <netdb.h>
    #include <arpa/inet.h>
    #include <stdio.h>
    int main()
    {
        struct in_addr ipv4addr;

        /* Reverse-resolve 127.0.0.1 and print the first byte of the first address. */
        inet_pton(AF_INET, "127.0.0.1", &ipv4addr);
        struct hostent *he = gethostbyaddr(&ipv4addr, sizeof ipv4addr, AF_INET);

        /* Report a failed lookup or an empty address list instead of crashing. */
        if (he == NULL || he->h_addr_list[0] == NULL) {
            printf("reverse lookup of 127.0.0.1 returned no addresses\n");
            return 1;
        }
        printf("first addr: %i\n", (int)(unsigned char)he->h_addr_list[0][0]);
        return 0;
    }
  '';
in
  pkgs.runCommand "compile" {} ''
    ${pkgs.iproute}/bin/ip -4 a
    mkdir $out
    cd $out
    ${pkgs.clang}/bin/clang ${file} -o test
    ./test
  ''
```

@offlinehacker
Contributor

offlinehacker commented Jun 15, 2019

I don't like the idea of Nix trying to solve every package manager in the world. There is already enough magic in the lang2nix tools, and I don't see what is wrong with the approach we have for Rust and Go, where we are able to produce a deterministic package dependency set. It is not nice and shiny, but it reduces magic and gets the job done.

I would much rather have a solution that is known to work than one that tries to emulate the "right behaviour" for the sake of addressing package managers' issues.

@edolstra
Member Author

The problem is that fixed-output derivations allow you to have irreproducible builds (requireFile would be the canonical example).

@offlinehacker
Contributor

offlinehacker commented Jun 16, 2019

OK, if I understand correctly, the problem is that fixed-output derivations can be arbitrarily complex, as there are no boundaries on where information can be retrieved from on the internet. At the same time you lose the dependency graph, and it becomes time-consuming to rebuild fixed-output derivations.

Package managers that Nix tries to wrap can be arbitrarily complex and can change at any point in time, so it's hard to have a Nix-native implementation, especially one that works correctly. As an alternative, I see these ways to limit what fixed-output derivations can do:

  • limitations regarding internet access (blocking where impure derivations can connect), thus removing possible sources of impurities and having better security.
  • incremental builds (make the state of previous builds available during new builds so fetching becomes faster)
  • the ability to split build results into smaller parts, similar to what multiple outputs do, but dynamic. This would allow putting different dependencies into different outputs and thus improve caching, as you would only have to download some outputs.

The other way is of course trying to wrap other package managers, but this brings in a lot of complexity, as you need lang2nix, whether this be part of build or as a pre-evaluation code generator.

@adisbladis
Member

adisbladis commented Jul 5, 2019

I was curious how much the inefficiency of buildGoModule and friends in fetching sources actually affects us, so I went ahead and ran a small experiment:
https://gist.github.com/adisbladis/5a6805d329326e828bc599fb18cbc058.

Cache busting is imho one of the more serious practical issues with the current model of fixed-output derivations.

@Mic92
Member

Mic92 commented Jul 26, 2019

So for buildGoModule we would save half the downloads (median 1.5) if we specified each dependency explicitly. I am not quite sold on having to generate a deps.nix, since it increases evaluation time and takes more effort than a single checksum.

@adisbladis
Member

adisbladis commented Aug 4, 2019

@Mic92 That's not exactly the case: the 1.5 number is only correct within a single nixpkgs evaluation, so 1.5 is only the immediate up-front saving.
Any change to the dependency graph using the FOD packaging model will cause the entire cache for that derivation to be busted.
The only way to measure the true impact would be to measure a package over time.
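The cache-busting effect can be put in numbers with a small sketch. The dependency sets and sizes are hypothetical; the model assumes a bundled fixed-output derivation is re-downloaded in full whenever its dependency set changes (a new output hash means a new store path), while per-dependency fetches only download what changed.

```python
from typing import Dict, Tuple

Deps = Dict[Tuple[str, str], int]  # (name, version) -> size in KB


def redownload_size(cached: Deps, wanted: Deps, per_dependency: bool) -> int:
    """KB to download for `wanted` when the cache already holds `cached`."""
    if per_dependency:
        # Only (name, version) pairs missing from the cache need fetching.
        return sum(size for dep, size in wanted.items() if dep not in cached)
    # Bundled: any difference yields a new output, so the whole bundle is fetched again.
    return 0 if cached.keys() == wanted.keys() else sum(wanted.values())


v1 = {("serde", "1.0.100"): 80, ("libc", "0.2.60"): 600, ("rand", "0.7.0"): 90}
v2 = {("serde", "1.0.101"): 81, ("libc", "0.2.60"): 600, ("rand", "0.7.0"): 90}
print(redownload_size(v1, v2, per_dependency=True))   # 81
print(redownload_size(v1, v2, per_dependency=False))  # 771
```

Bumping one small crate costs 81 KB with per-dependency fetches but the entire 771 KB bundle otherwise, and the gap repeats on every update, which is why measuring over time matters.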

@jirkadanek
Member

jirkadanek commented Aug 10, 2019

The justification I heard for tracking each dependency individually is the "dependency alignment" in Fedora or similar Linux distributions. The idea is that each library should appear in only a single version in whole distro, preferably as a .so file, to reduce maintenance efforts. Depends on the kind of maintenance, obviously. In Nixpkgs, maintenance IMO is mostly about tracking upstream closely, so the benefits aren't there.

Second argument for tracking individual dependencies would be keeping track of bugs (and security bugs). Answering the question "Is my system affected through a transitive dependency that some package pulled in?" gets tricky without it.

The data-transfer savings and the caching granularity mentioned before seem the most relevant arguments to me.

If working with deps.nix files can be made sufficiently painless and performant for all the build tools involved, I guess I am up for some of the more "advanced" solutions in NixOS/nixpkgs#65275 (comment)

@offlinehacker The day might come when the "Open Buildgraph Specification" (just made it up) releases a format that build tools can use to consume/provide each others outputs/inputs w/ some propagated metadata, and the build problem will be thus solved 📦

@nh2
Contributor

nh2 commented Aug 15, 2019

Coming here after losing 2 days from being tricked by a fixed-output derivation (NixOS/nixpkgs#66598).

I had overridden curl in a way that was incorrect when built via fetchurl, but didn't notice it until many days (months?) later because the thing that was fetched was fixed-output and so the curl was never built.

Thus I would argue that even

You could argue that fetchurl has the same problem, but fetchurl has simple semantics

wasn't true for me -- it tricked me badly.

Here are a few questions:

  • Would it be possible to have a flag in which nix builds all dependencies, even of fixed-output derivations?
    • For example, even if fetchpatch-mything.patch is fixed-output, if nix can see that its .drv depends on curl.drv, force it to build that curl.drv?
  • Would it be possible to have a flag or functionality to re-fetch all fixed-output derivations, and check whether they produce the same hashes?
  • Why does fetchurl/default.nix not simply use builtins.fetchurl?
    • For example in pkgsMusl we override curl, but never see whether it actually works when invoked via fetchurl of it because usually all that fixed-output stuff is already in the nix store. It seems unnecessary that we even invoke programs from the nixpkgs package set / overlay to fetch stuff from the Internet when there's a nix primop specifically designed to fetch stuff from the Internet. Am I missing something? If the code path pkgsMusl.mypackage -> fetchpatch -> fetchurl -> pkgsMusl.curl didn't even exist (because builtins.fetchurl was used), we couldn't get it wrong.
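The second question (re-fetching everything and comparing hashes) can be approximated outside Nix. This is a hypothetical helper, not an existing Nix command: it takes (url, sha256-hex) pairs, for example collected from fetchurl calls, plus an injectable `fetch` function, and reports the URLs whose content is gone or has changed.

```python
import hashlib
from typing import Callable, Iterable, List, Tuple


def recheck_fixed_outputs(
    entries: Iterable[Tuple[str, str]],
    fetch: Callable[[str], bytes],
) -> List[str]:
    """Re-fetch each URL and return those that fail or no longer match the recorded SHA-256."""
    broken = []
    for url, expected_hex in entries:
        try:
            data = fetch(url)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            broken.append(url)
            continue
        if hashlib.sha256(data).hexdigest() != expected_hex:
            broken.append(url)
    return broken
```

In practice `fetch` could be `lambda url: urllib.request.urlopen(url).read()`; keeping it injectable makes the check testable and lets it run through a mirror instead.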

@7c6f434c
Member

7c6f434c commented Aug 15, 2019 via email

@7c6f434c
Member

7c6f434c commented Aug 15, 2019 via email

@masaeedu
Contributor

@edolstra Would this sort of situation be improved if recursive nix were possible? E.g. maybe we could have an HTTP proxy on 127.0.0.1 to which the package managers' HTTP requests get redirected, and which serves the requests from store contents after fetchUrl-ing the data? That way you have granular insight into what stuff the package managers are fetching, without having to lift out all the logic that causes them to make those requests out into nix.
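The core of such a proxy is a cache that serves repeated requests from content-addressed storage and remembers what was fetched. The Python class below is a hypothetical sketch of only that bookkeeping (no HTTP handling); `upstream` stands in for the real network fetch, and the recorded (url, sha256) pairs are what would let each download become its own fine-grained fixed-output fetch.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


class RecordingFetchCache:
    """Hypothetical: serve URLs from a content-addressed store and record every fetch."""

    def __init__(self, upstream: Callable[[str], bytes]):
        self.upstream = upstream
        self.blobs: Dict[str, bytes] = {}  # sha256 hex -> content
        self.index: Dict[str, str] = {}    # url -> sha256 hex

    def get(self, url: str) -> bytes:
        digest = self.index.get(url)
        if digest is None:
            # First request for this URL: go upstream once, then store by content hash.
            data = self.upstream(url)
            digest = hashlib.sha256(data).hexdigest()
            self.blobs[digest] = data
            self.index[url] = digest
        return self.blobs[digest]

    def lock_entries(self) -> List[Tuple[str, str]]:
        # The (url, sha256) pairs a per-URL fixed-output fetch would need.
        return sorted(self.index.items())
```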

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/parsing-go-sum-and-cargo-lock-files-to-spare-the-need-for-fixed-output-derivations/7367/1

@robinp

robinp commented Jul 18, 2020

As someone who periodically forgets to bump the sha on the prefetched dependencies of buildBazelPackage and spends an hour or so, I join those who would appreciate having the input hashes affect the derivation's hash, in addition to the fixed hash (the "did at least run once" semantic).
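One way to read that request: derive the cache identity of the prefetch step from its inputs as well as from the declared output hash, so that editing a lock file without bumping the hash forces a rebuild (which then fails loudly on the stale hash) instead of silently reusing the old output. The function below is a hypothetical sketch of such an identity; it is not how Nix computes store paths.

```python
import hashlib


def fod_identity(name: str, output_sha256: str, lockfile: bytes) -> str:
    """Hypothetical identity mixing the declared output hash with a hash of the inputs."""
    h = hashlib.sha256()
    for part in (name.encode(), output_sha256.encode(), hashlib.sha256(lockfile).digest()):
        # Length-prefix each part so different splits of the same bytes cannot collide.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()
```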

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/fixed-output-derivations-to-become-part-of-flake-inputs/8263/1

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/creating-vendor-directories-directly-in-srcs-of-go-and-rust-packages-so-fixed-output-derivations-wont-be-needed/7367/15

@stale

stale bot commented Feb 12, 2021

I marked this as stale due to inactivity.

@stale stale bot added the stale label Feb 12, 2021
@kvtb
Contributor

kvtb commented Feb 27, 2021

NixOS has to make weekly snapshots of Cargo, NPM and Maven, and restrict/redirect network access to own mirrors.
Then resolving-dependencies-to-FOD will be stable and deterministic.

@stale stale bot removed the stale label Feb 27, 2021
@7c6f434c
Member

Maybe only restrict support to un-pinnable upstreams where clear-retention-policy mirror exists; otherwise you are asking to make it impossible at the Nix level to package things distributed in a way not mainstream enough to have a NixOS foundation-level mirror.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/why-dont-nix-hashes-use-base-16/11325/28

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/status-of-lang2nix-approaches/14477/1

@Ma27 Ma27 mentioned this issue Sep 17, 2021
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/any-way-to-use-custom-fetcher-for-nix-flake-inputs/18080/1

@dzmitry-lahoda

I read the opening post of this thread and a few of the posts after it, and searched for the keywords "checksum" and "vendor".

Dumb question.

This is part of my Cargo.lock:

```toml
[[package]]
name = "addr2line"
version = "0.17.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b9ecd88a8c8378ca913a680cd98f0f13ac67383d35993f86c90a70e3f137816b"
dependencies = [
 "gimli 0.26.2",
]
```

Why can't a custom fixed-output derivation just download each crate from its source and check it against its checksum? Isn't that as good as what Nix already does? Could such a fetcher be made trusted? I guess that somewhere deep in Rust there is also a trusted public-key setting for the registry.

Would that not be abuse?
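The idea in this question (and in the Discourse thread above about parsing go.sum and Cargo.lock files) can be sketched in Python: read each `[[package]]` entry and turn it into a separate download that is verified against the lock file's checksum, which is the SHA-256 of the .crate archive. The parser only handles simple `key = "value"` lines like those shown above rather than full TOML, and the crates.io download URL pattern is an assumption.

```python
import hashlib
import re
from typing import Dict, Iterator, List, Tuple

# Matches simple `key = "value"` lines; arrays such as `dependencies = [...]` are skipped.
STRING_FIELD = re.compile(r'^(\w+) = "(.*)"$')
CRATES_IO = "registry+https://github.com/rust-lang/crates.io-index"


def parse_cargo_lock(text: str) -> List[Dict[str, str]]:
    """Minimal reader for the [[package]] tables of a Cargo.lock (not a general TOML parser)."""
    packages: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[[package]]":
            packages.append({})
        elif packages:
            m = STRING_FIELD.match(line)
            if m:
                packages[-1][m.group(1)] = m.group(2)
    return packages


def crate_downloads(packages: List[Dict[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (url, sha256) per crates.io package; path and git dependencies have no checksum."""
    for pkg in packages:
        if pkg.get("source") == CRATES_IO and "checksum" in pkg:
            # Assumed crates.io download endpoint.
            url = f"https://crates.io/api/v1/crates/{pkg['name']}/{pkg['version']}/download"
            yield url, pkg["checksum"]


def crate_matches(data: bytes, checksum: str) -> bool:
    # The lock file's checksum is the SHA-256 of the downloaded .crate archive.
    return hashlib.sha256(data).hexdigest() == checksum
```

Each yielded pair could then become its own hash-checked download, which gives per-crate granularity without network access in the build itself.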

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/buildnodemodules-the-dumbest-node-to-nix-packaging-tool-yet/35733/1

@edolstra edolstra removed their assignment Apr 26, 2024