
Merge deps and prebuilt_dependencies #75

Closed
Fuuzetsu opened this issue Dec 31, 2017 · 8 comments
Labels
P2 major: an upcoming release type: feature request

Comments

@Fuuzetsu
Collaborator

We have a problem in that to build anything non-trivial, we need prebuilt_dependencies: this is a list of packages that we expect to be available to GHC. It is up to the user to make sure a toolchain where they are present is used.

Sadly this causes a big inconvenience: the user has to list everything in prebuilt_dependencies and then manually aggregate all of these lists, across every package the toolchain will build, back into the toolchain definition. This is extremely tedious and goes out of date very quickly. We would like to list packages in a single place only: the dependencies of the package.
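For context, the status quo is a single toolchain-wide Nix expression along these lines (a sketch; the package names are placeholders):

```nix
# Status quo (sketch): one global GHC that bundles every package
# any target in the workspace might ever need.
haskellPackages.ghcWithPackages (p: [
  p.aeson
  p.conduit
  # …every other package, aggregated by hand from all the
  # prebuilt_dependencies lists in the workspace
])
```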

There is a way to achieve this: stop using prebuilt_dependencies and instead pass honest bazel targets to deps. The question is where to get these targets. For the purposes of this ticket, I will assume nix is still the answer. Note however that there is no dependency on rules_nixpkgs from rules_haskell: in theory the user will be able to use the same mechanism with a different backend. It's up to the user.

Instead of generating a toolchain using ghcWithPackages (p: [ every-package-we-need-ever ]), generate package sets. A good candidate for these are LTS snapshots (note that the user is not limited to these: they can cabal2nix/stackage2nix anything they want and use it just the same). The user defines:

http_archive(
  name = "lts",
  repository = …,
  strip_prefix = "lts-collection/lts-10.0",
)

Inside lts-10.0 there's a BUILD file with one bazel target per package in the snapshot. These targets are provided by lts-10.0/WORKSPACE and may look a little like

nixpkgs_package(
  name = "aeson",
  attribute = "haskellPackages.aeson",
  build_file_content = """
filegroup(
  name = "haskell-outputs",
  …
  srcs = ..., # package db, package conf, libraries, everything needed.
)
"""
)

Then inside the BUILD file we just have

something_todo(
  name = "aeson",
  target = "@aeson//:haskell-outputs",
)

Finally the user can now say

haskell_library(
  name = "my_lib",
  …,
  deps = [
    "@lts//:aeson"
  ],
)

Where rules_haskell comes in is making sure it can deal with these dependencies properly: notably that it can find, use and include all necessary files. For the most part this will involve passing -package-db and -package flags pointing into the outputs of those packages. Possibly something_todo above can just output a Provider that directly gives us all this information regardless of backend: we can provide a default implementation for nix packages somewhere.
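Such a provider could be sketched in Starlark along these lines (hypothetical names throughout: HaskellPackageInfo and prebuilt_haskell_package stand in for whatever something_todo ends up being):

```python
# Starlark sketch (hypothetical API): a backend-agnostic provider for
# prebuilt packages, plus a rule that forwards backend outputs into it.
HaskellPackageInfo = provider(
    fields = {
        "package": "GHC package name, e.g. \"aeson\"",
        "package_db": "File pointing into the package db directory",
    },
)

def _prebuilt_haskell_package_impl(ctx):
    return [HaskellPackageInfo(
        package = ctx.attr.package,
        package_db = ctx.file.target,
    )]

# Stand-in for the `something_todo` rule used above.
prebuilt_haskell_package = rule(
    implementation = _prebuilt_haskell_package_impl,
    attrs = {
        "package": attr.string(mandatory = True),
        "target": attr.label(allow_single_file = True),
    },
)
```

haskell_library would then consume HaskellPackageInfo from its deps and translate it into -package-db/-package flags, without caring whether the provider came from nix or some other backend.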

We now no longer specify Haskell packages in a toolchain: we unify these on bazel side. This allows us to only specify dependencies where they are needed and removes the tedium.

The reason this works is that bazel targets are lazy: nix (or other backend) will not build the targets until we try to use them. We can therefore safely generate full sets of LTS packages and just pick and choose without inconveniencing users who just need a couple of packages.

What we need is a tool to generate WORKSPACE/BUILD files for LTS (should be simple by re-using stackage2nix). We should also consider whether we want to host pregenerated package sets for the nix backend in a repository somewhere. I would say yes: requiring that the user runs stackage2nix on a full LTS is too much. We can run it for every LTS and update periodically, plus provide instructions on how to DIY if users want updates faster; automation of this is possible. Having said that, there's a new nightly every day, so maybe it's not a good idea after all…

We also need a way on the rules_haskell side to unify these packages. We need to implement something_todo for the nix backend, which provides the needed files.

In general I don't think there's that much work, though I can imagine quite a few hurdles along the way. To start we can of course just generate one LTS and use that as a POC.

@Fuuzetsu
Collaborator Author

I forgot to add that LTS package sets are not exhaustive: the tool should be able to work on stackage2nix output, because that tool can also deal with the user's extra-deps &c.

Possibly we'll need to improve stackage2nix: it's currently very heavy to build (builds nearly all of stack itself…) and is quite slow (I know for sure I put sub-optimal code in there once…). Improvements here will translate directly to improvements for us.

@mboes
Member

mboes commented Jan 2, 2018

There is another, low-budget, way of solving this:

  • instead of relying on a single global ghcWithPackages Nix instantiation, consider (unlike Cabal) that prebuilt dependencies are not so much a property of a library/binary, but rather of its environment/toolchain.

Concretely, instead of

haskell_library(
  name = "foo",
  srcs = glob(["*.hs"]),
  prebuilt_deps = ["base", "bytestring", "conduit", ...],
)

We'd have,

haskell_pkgset_nixpkgs(
  name = "pkgset",
  pkgs = ["base", "bytestring", "conduit", ...],
)

haskell_library(
  name = "foo",
  srcs = glob(["*.hs"]),
  pkgset = ":pkgset",
)

A haskell_pkgset_nixpkgs rule would create a new compiler using ghcWithPackages with the given set of packages. No global list needed anymore. The point of introducing the notion of pkgset is that in the future pkgsets could conceivably be created any way the user likes. That is, we could have haskell_pkgset_bazel that creates a pkgset entirely from within Bazel without relying on Nix, or even haskell_pkgset_stack that reuses the Stack snapshot cache.

The downside of this approach is that it can add a few seconds to the total compile time, because realizing a Nix derivation expressed as an application of ghcWithPackages is often slow. Doing it once instead of per-package saves a little bit of time. Won't slow down incremental builds, however.

Note: we'll probably need to name pkgsets anyway if users start creating fine-grained libraries, because we'll probably want one pkgset per Cabal package, not one per BUILD file scattered across the source tree of a single package. That is: avoid stating more metadata than Cabal.

@mboes
Member

mboes commented Jan 2, 2018

Here's a refinement on the previous comment: we could use the toolchain mechanism to register "pkgset builders". Nixpkgs/ghcWithPackages could be one such builder. That way, we could specify pkgsets inline as we do now:

haskell_library(
  name = "foo",
  srcs = glob(["*.hs"]),
  pkgs = ["base", "bytestring", "conduit", ...],
)

The pkgset builder to use would be implicit. It would depend on what pkgset builders were registered in the current workspace (there should only really be one).
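The registration might look like the following (a sketch; the rule names and the toolchain_type label are made up for illustration — nothing here is an existing rules_haskell API):

```python
# //toolchains/BUILD (sketch): declare the nixpkgs-backed builder.
toolchain(
    name = "nixpkgs_pkgset_builder",
    toolchain = ":nixpkgs_pkgset_builder_impl",
    toolchain_type = "@rules_haskell//haskell:pkgset_builder",
)

# WORKSPACE (sketch): register it. haskell_library would then resolve
# its `pkgs` attribute against whichever builder is registered.
register_toolchains("//toolchains:nixpkgs_pkgset_builder")
```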

@johnynek

johnynek commented Jan 2, 2018

In rules_scala we face something similar, except on the JVM people want to use already compiled code from maven servers. I wrote a tool to generate the bazel targets to import these directly: https://github.com/johnynek/bazel-deps

I can imagine something similar like cabal2bazel, which could convert a cabal build into a bazel build. If we had this, and the ability to depend on in-WORKSPACE packages (which I guess you have now), that would be an ideal solution to the problem, no?

Note, bazel has build caching and external caching support, so even if it gets slow, a group of people can share a build cache; as long as the rules are hermetic, it should be safe.

@Fuuzetsu
Collaborator Author

Fuuzetsu commented Jan 2, 2018

Right, if we had a tool to do the conversion then we could do just that. The ideas here are mostly about how to avoid having to write such a tool straight away, by re-using existing tooling (like cabal2nix/stackage2nix). Part of the motivation is that such a tool would have to support every feature needed by most of the ecosystem: certainly the common dependency chain. This can't really happen overnight, so we're looking for an intermediate solution which leans on existing expressions to provide the packages.

I'm sure I'll get corrected if I'm wrong.

@mboes
Member

mboes commented Jan 2, 2018

Such a tool sounds to me like the sensible approach long term. Related ticket: #17.

@mboes mboes added the P2 major: an upcoming release label Jan 29, 2018
@mboes mboes changed the title Allow use of nix-backed packages in dep removing need for prebuilt_dependencies Merge deps and prebuilt_dependencies May 10, 2018
judah added a commit that referenced this issue Jul 15, 2018
Progress on #75.  Useful for Hazel.
judah added a commit that referenced this issue Jul 16, 2018
This rule forwards a prebuilt dependency, so that it may be
specified directly in `deps` rather than `prebuilt_dependencies`.

This gives progress on #75.  Similarly, it simplifies the use of Hazel,
which can now generate repositories for the prebuilt dependencies
just like for regular packages.
judah added a commit that referenced this issue Jul 16, 2018
judah added a commit that referenced this issue Jul 16, 2018
mboes pushed a commit that referenced this issue Jul 18, 2018
@thufschmitt
Contributor

We discussed this issue a bit with @mboes yesterday, and after a bit more thinking, here's a (probably quite naive) possible interface for this:

We reuse the haskell_import rule that @judah wrote, but make it depend on
a toolchain defining how to get these packages:

  • The simplest case is a dummy toolchain builtin_haskell_provider which
    assumes that ghc already knows how to find the package and just adds the
    needed cli flags (so the current behavior)

  • The other (or at least another) toolchain nix_haskell_provider gets these
    packages using nix and outputs an extended version of
    HaskellPrebuiltPackageInfo so that it can be used like any
    bazel-built package

The nix_haskell_provider toolchain has an interface similar to
nixpkgs_package, but instead of building a package, it returns a (nix) set of
haskell packages. A simple invocation of it would be

nix_haskell_provider(
  repository = "@nixpkgs",
  path = "haskell.packages.ghc843",
)

Once this toolchain is registered, calling haskell_import(name = "text")
calls nix-build on @nixpkgs.haskell.packages.ghc843 and returns a
HaskellPrebuiltPackageInfo with the information needed to build and link
against this package. (The idea is roughly to reimplement on the bazel side
what's done in
https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/haskell-modules/generic-builder.nix#L284-L320.)
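Putting this together, usage in a BUILD file might look like the following (a sketch based on the interface described above; the behavior of haskell_import resolving the name via the registered toolchain is the proposal, not an existing API):

```python
# BUILD sketch: `text` is resolved by the registered
# nix_haskell_provider toolchain; nothing is listed in a global
# toolchain definition anymore.
haskell_import(name = "text")

haskell_library(
    name = "foo",
    srcs = glob(["*.hs"]),
    deps = [":text"],
)
```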

HaskellPrebuiltPackageInfo is defined as:

HaskellPrebuiltPackageInfo = provider(
    doc = "Information about a prebuilt GHC package.",
    fields = {
        "package": "Package name",
        "package_root": "Directory containing the package (or None if the package is bundled with ghc)",
    },
)

When building, we add -L{package_root}/lib and -I{package_root}/include
to the ghc command line for each prebuilt transitive dependency, as well as
-package-db={package_root}/package.conf.d -package={package} for each direct
prebuilt dependency.
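The flag construction just described can be sketched as a small function (plain Python for illustration; the real logic would live in Starlark, and the dict fields mirror the HaskellPrebuiltPackageInfo provider above):

```python
def ghc_flags(direct_deps, transitive_deps):
    """Build the GHC command-line flags for prebuilt packages.

    Each dep is a dict with the HaskellPrebuiltPackageInfo fields:
    "package" (name) and "package_root" (directory, or None when the
    package is bundled with ghc).
    """
    flags = []
    # Every transitive prebuilt dependency contributes its library
    # and include directories.
    for dep in transitive_deps:
        root = dep["package_root"]
        if root is not None:
            flags.append("-L{}/lib".format(root))
            flags.append("-I{}/include".format(root))
    # Only direct prebuilt dependencies are exposed to the compiled
    # module, via -package-db and -package.
    for dep in direct_deps:
        root = dep["package_root"]
        if root is not None:
            flags.append("-package-db={}/package.conf.d".format(root))
        flags.append("-package={}".format(dep["package"]))
    return flags
```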

@mboes
Member

mboes commented Dec 2, 2018

This was fixed in #25. prebuilt_libraries is now deprecated.

4 participants