forked from bazelbuild/proposals
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add "Remote Output Service: place bazel-out/ on a FUSE file system"
As requested by @lberki in bazelbuild/bazel#12823, I'm hereby sending out a proposal for adding support for a client-side FUSE daemon to Bazel. Suggestions are welcome!
- Loading branch information
1 parent
80a4f2a
commit 36c7774
Showing
1 changed file
with
226 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,226 @@ | ||
--- | ||
created: 2021-02-09 | ||
last updated: 2021-02-09 | ||
status: To be reviewed | ||
reviewers: | ||
- philwo | ||
title: "Remote Output Service: place bazel-out/ on a FUSE file system" | ||
authors: | ||
- EdSchouten | ||
--- | ||
|
||
# Abstract | ||
|
||
This document describes an extension to Bazel, allowing it to host its | ||
bazel-out/ directory on a FUSE file system. The goal of this change is | ||
to reduce the amount of network that Bazel generates when remote | ||
execution is used. The end result is similar to what's offered by | ||
["Remote Builds without the Bytes"](https://docs.google.com/document/d/11m5AkWjigMgo9wplqB8zTdDcHoMLEFOSH0MdBNCBYOE/), | ||
with the difference that outputs remain accessible. | ||
|
||
# Background | ||
|
||
Bazel can use the [Remote Execution protocol](https://github.com/bazelbuild/remote-apis) | ||
to offload the execution of build actions to a remote build cluster. | ||
When executing actions remotely, Bazel performs the following three | ||
tasks successively: | ||
|
||
1. Uploading inputs: Bazel uploads individual files and Directory, | ||
Command and Action messages into the Content Addressable Storage | ||
(CAS). | ||
2. Execution: Bazel requests that the build cluster executes the Action | ||
that was uploaded into the CAS. The build cluster returns an | ||
ActionResult. | ||
3. Downloading outputs: Bazel downloads all files referenced by the | ||
ActionResult. In case of directory outputs, Bazel also downloads all | ||
files referenced by Tree objects referenced by the ActionResult. | ||
|
||
In the common case, the first two tasks consume little bandwidth. By | ||
using an RPC method named FindMissingBlobs(), Bazel is capable of | ||
scanning which objects already exist in the CAS, thereby allowing it to | ||
skip unnecessary uploads of objects. Assuming the retention rate of the | ||
remote build cluster is adequate, you will see that almost all network | ||
bandwidth generated by Bazel is caused by the third task. In 2018, Jakob | ||
Buchgraber [measured](https://docs.google.com/document/d/11m5AkWjigMgo9wplqB8zTdDcHoMLEFOSH0MdBNCBYOE/) | ||
that when building Bazel itself using remote execution, at least 95.4% | ||
of network bandwidth was caused by the downloading of outputs. | ||
|
||
The reason Bazel downloads output files is as follows: | ||
|
||
- To allow the user to inspect and use the results of a build, e.g. to | ||
run the software that was built. | ||
- Bazel supports mixed local and remote execution. Targets may either | ||
be annotated explicitly to specify where they are run (using the | ||
"no-remote" tag), or features such as | ||
[the dynamic spawn scheduler](https://blog.bazel.build/2019/02/01/dynamic-spawn-scheduler.html) | ||
may be used to let Bazel automatically decide where actions are run. | ||
If a locally executing action depends on files that were built | ||
remotely, Bazel needs to download those files to satisfy the | ||
execution requirements of the locally executing action. | ||
- To serve as Bazel's bookkeeping. To a certain extent, Bazel is | ||
capable of building projects whose dependency graph does not fit in | ||
memory. Bazel keeps a finitely sized cache of file metadata (size, | ||
hash, etc.) in memory. Once exhausted, Bazel may discard entries. | ||
During a later stage of a build, Bazel may need to recompute these | ||
entries by inspecting files on disk. | ||
- To guarantee forward progress of the build, even if objects were to | ||
disappear from the remote CAS. By having the files present locally, | ||
Bazel can reupload them if they were to disappear. | ||
|
||
Based on the conclusions of his measurement results, Jakob added a | ||
series of command line flags (`--remote_download_minimal`, | ||
`--remote_download_toplevel`, etc.) that permit users to skip the | ||
downloading step when possible. When `--remote_download_minimal` is | ||
enabled, outputs will only be downloaded if successive actions depend on | ||
them, or if `bazel run` is invoked and the output is an execution | ||
dependency. During every invocation, Bazel constructs an additional | ||
ActionInputMap that stores metadata of all files that are only present | ||
remotely, thereby allowing input roots to be constructed without having | ||
files present locally. Various remote CAS implementations have been | ||
improved to guarantee the retention of recently accessed objects, | ||
thereby ensuring forward progress. | ||
|
||
Though `--remote_download_minimal` has helped many users of Bazel's | ||
remote execution to scale, some inherent downsides remain: | ||
|
||
- Bazel's memory usage has increased significantly. The ActionInputMap | ||
that gets created during build may easily consume multiple gigabytes | ||
of space for a sufficiently large project. | ||
- Incremental builds have become significantly slower. Because the | ||
ActionInputMap is discarded after every build, Bazel effectively | ||
assumes that the outputs have disappeared, thereby causing it to do a | ||
full rebuild. | ||
- Users now need to make a conscious choice whether they want to | ||
download output files or not. This makes it harder to do ad hoc | ||
exploration. | ||
|
||
# Proposal | ||
|
||
In a nutshell, the proposal is to optionally let Bazel create a | ||
bazel-out/ directory that contains the same layout as generated by a | ||
plain build (without `--remote_download_minimal`), but somehow delay | ||
downloads of remote files until their contents are actually read (either | ||
by successive no-remote actions, or when the user accesses bazel-out/ | ||
manually). This prevents the need for the additional ActionInputMap and | ||
once again grants the user the ability to explore build outputs on | ||
demand. | ||
|
||
One challenge with this approach is that it relies on special operating | ||
system features to create such lazy-loading files. It can't be built on | ||
top of the plain POSIX API. Unfortunately, no standards seem to exist in | ||
this space: | ||
|
||
- Linux provides [FUSE](https://en.wikipedia.org/wiki/Filesystem_in_Userspace), | ||
which can be used to create a mountpoint for which all operations are | ||
forwarded through a character device to a userspace process. | ||
- For macOS there is [macFUSE](https://osxfuse.github.io) (also known as | ||
OSXFUSE). It implements a slightly older version of the FUSE protocol, | ||
with minor macOS specific protocol extensions in place. In terms of | ||
robustness and performance, it fares worse than Linux's FUSE | ||
implementation. Though still hosted on GitHub, this project is no | ||
longer Open Source. The last Open Source version no longer runs on | ||
modern versions of macOS. | ||
- Windows provides an API called | ||
[Projected File System (ProjFS)](https://docs.microsoft.com/en-us/windows/win32/projfs/projected-file-system) | ||
that makes it possible to instantiate files underneath a | ||
virtualization root whose contents are backed by a | ||
[promise](https://en.wikipedia.org/wiki/Futures_and_promises). | ||
- On operating systems that don't offer the features above, but do | ||
support networked file systems (e.g., NFS, SMB), it may be desirable | ||
to let the system mount a fictive volume managed by a userspace process. | ||
|
||
Some of these APIs also require administrative privileges to work | ||
properly. Though FUSE can on most Linux systems be used without any | ||
special privileges through the use of the `fusermount` utility (setuid), | ||
it does require elevated privileges to get it to work inside a Docker | ||
container. | ||
|
||
Because of these limitations, it makes sense to let a daemon other than | ||
Bazel manage these directories. In order to manage the lifecycle of | ||
these bazel-out/ directories and to instruct the creation of | ||
lazy-loading files, Bazel may communicate with this daemon over gRPC. | ||
This approach has a couple of advantages: | ||
|
||
- It's a proven strategy. Google already has such a system internally | ||
for Blaze. Their daemon is called objfsd and uses FUSE. | ||
- By having a well-defined and succint protocol, it is possible for | ||
people to easily design their own daemons that use custom protocols or | ||
have custom storage policies (e.g., snapshotting and preserving the | ||
results of builds). These daemons may have a release cadence that is | ||
independent from Bazel. | ||
- A single daemon may be run on a system, caching build results for | ||
multiple users and multiple projects. The lifetime of the daemon may | ||
be managed separately from the Bazel server, which may be useful for | ||
shared CI setups. | ||
|
||
The question then becomes what the schema needs to look like that Bazel | ||
and the daemon will use. Google already has a schema for this | ||
internally. Unfortunately, this protocol is not a perfect match: | ||
|
||
- It doesn't support REv2 specific concepts such as instance names, | ||
user-configurable hashing functions, etc. | ||
- It makes no use of REv2 Protobuf messages, even though there are many | ||
that seem useful in literal form (Digest, OutputFile, OutputDirectory, | ||
OutputSymlink). | ||
- The semantics around preserving files seems different. Google's | ||
internal protocol has support for explicitly marking files that need | ||
to be retained, while REv2 provides no such mechanism. It is assumed | ||
that clients call FindMissingBlobs() to instruct a build cluster to | ||
keep files present, at least for the remainder of a build. | ||
|
||
Because of that, [Bazel PR #12823](https://github.com/bazelbuild/bazel/pull/12823) | ||
provides an experimental implementation that uses a custom gRPC protocol | ||
that looks as follows: | ||
|
||
```proto | ||
service RemoteOutputService { | ||
// Remove a bazel-out/ directory. | ||
rpc Clean(CleanRequest) returns (google.protobuf.Empty); | ||
// Indicate that a build is starting or has completed. May create a | ||
// bazel-out/ if none exists for the current workspace. | ||
rpc StartBuild(StartBuildRequest) returns (StartBuildResponse); | ||
rpc FinalizeBuild(FinalizeBuildRequest) returns (google.protobuf.Empty); | ||
// Create one or more lazy-loading files or directories inside a | ||
// bazel-out/ directory, backed by objects in the CAS. | ||
rpc BatchCreate(BatchCreateRequest) returns (google.protobuf.Empty); | ||
// Reobtain the metadata of files stored in the bazel-out/ directory. | ||
rpc BatchStat(BatchStatRequest) returns (BatchStatResponse); | ||
} | ||
``` | ||
|
||
This protocol is strongly based on the API of | ||
[Bazel's OutputService class](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/vfs/OutputService.java). | ||
The Protobuf file in the PR contains more complete documentation. An | ||
example server-side implementation | ||
[has been published as part of the Buildbarn project](https://github.com/buildbarn/bb-clientd). | ||
|
||
# Alternatives considered | ||
|
||
In the months before this proposal was written, various experiments were | ||
performed: | ||
|
||
- An attempt was made to add an integrated FUSE file system to Bazel | ||
itself. This project was eventually abandoned due to the limitations | ||
mentioned previously (i.e., the inability to run Bazel without | ||
elevated privileges). | ||
- Letting Bazel manage the bazel-out/ directory itself, emitting | ||
symbolic links to a FUSE file system that provides a flat view of the | ||
CAS ([PR #11703](https://github.com/bazelbuild/bazel/pull/11703) and | ||
[PR #11622](https://github.com/bazelbuild/bazel/pull/11662)). Though | ||
this worked fine from Bazel's point of view, the additional symlink | ||
indirection confused many build actions. It completely broke dynamic | ||
linkage. | ||
- Letting Bazel write its contents into a tmpfs-like FUSE daemon, where | ||
files may be hardlinked from a directory that provides a flat view of | ||
the CAS. This worked, but performance was very bad due to a high | ||
amount of context switching. The gRPC-based solution solves this by | ||
supporting batched requests. | ||
|
||
# Backward-compatibility | ||
|
||
The goal is to implement this feature in such a way that local | ||
execution, plain remote execution, remote execution with | ||
`--remote_download_minimal`, remote execution with a local disk cache, | ||
etc. all remain functional. |