Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote: async upload #13655

Closed
wants to merge 10 commits into from
Closed

Conversation

coeuvre
Copy link
Member

@coeuvre coeuvre commented Jul 8, 2021

When remote cache is enabled, the outputs of locally executed actions are uploaded to the remote cache. However, the uploads take place as the part of a local action spawn which prevents dependent actions from starting .

This change introduces flag --experimental_remote_cache_async flag, which when enabled, makes the uploads happen in the background. Remote actions that need the output as input are still wait for the upload to finish, including the already started upload with the other inputs.

Uploads that are still active after the build are waited to finish with message like:

INFO: Build completed successfully, 5001 total actions
Waiting for remote cache: 4911 uploads; 16s

This change also de-duplicates calls to FindMissingBlobs as well as file uploads to the remote cache. This is effectively the same as #13166 but with different implementation.

Fixes #13632.

@google-cla google-cla bot added the cla: yes label Jul 8, 2021
@brentleyjones
Copy link
Contributor

Thanks for this @coeuvre!

@coeuvre coeuvre force-pushed the remote-async-upload branch from 5b9cfdc to 8e8922c Compare July 27, 2021 04:09
@coeuvre coeuvre force-pushed the remote-async-upload branch from 5fd6c3b to e9eb347 Compare August 11, 2021 03:01
@coeuvre
Copy link
Member Author

coeuvre commented Aug 11, 2021

Rebased and fixed the tests. New tests are needed to cover the new code and this PR is ready to go!

@brentleyjones
Copy link
Contributor

How does this interact with BES, and in particular --bes_upload_mode?

@coeuvre
Copy link
Member Author

coeuvre commented Aug 12, 2021

BES uses ByteStreamUploader to upload local files while RE uses RemoteCache which uses ByteStreamUploader under the hood.

ByteStreamUploader can deduplicate file uploads. This PR updates RemoteCache to deduplicate file uploads as well. Technically, they share the same upload instance. But there are some overheads here:

upload 1 (RE) ----
                 |---- RemoteCache ---- upload a ----
upload 2 (RE) ----                                  |
                                                    |---- ByteStreamUploader ---- upload ---- Server
                                                    |
upload b (BES) --------------------------------------

I will make followup PRs to update BES to use RemoteCache directly and remove the deduplication logic in ByteStreamUploader (as well as improvements for #11473).

RE doesn't read --bes_upload_mode and behaves like wait_for_upload_complete. Adding support for that is not complex but I'm not sure whether it's needed.

@coeuvre coeuvre marked this pull request as ready for review August 12, 2021 07:56
@coeuvre coeuvre requested a review from philwo August 12, 2021 07:56
@brentleyjones
Copy link
Contributor

I will make followup PRs to update BES to use RemoteCache directly and remove the deduplication logic in ByteStreamUploader (as well as improvements for #11473).

I really look forward to fixes for #11473, in particular to having it respect no-remote/no-cache tags. Currently I have to set --experimental_build_event_upload_strategy=local when I really want it to be --experimental_build_event_upload_strategy=remote.

Copy link
Member

@philwo philwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, could you split this into smaller parts during import? 😊

bazel-io pushed a commit that referenced this pull request Sep 1, 2021
Add `shutdown` and `awaitTermination` to `AsyncTaskCache` which can be used to wait for in progress tasks finished.

Part of #13655.

PiperOrigin-RevId: 394169528
bazel-io pushed a commit that referenced this pull request Sep 1, 2021
Update RemoteCacheClient#uploadActionResult to return a Future.

Part of #13655.

PiperOrigin-RevId: 394173767
bazel-io pushed a commit that referenced this pull request Sep 1, 2021
Refactor UploadManifest to an upper level and update ExecutionServer to use it directly.

Part of #13655.

PiperOrigin-RevId: 394178331
bazel-io pushed a commit that referenced this pull request Sep 3, 2021
Remove RemoteCache#upload. Use UploadManifest#upload directly in RemoteExecutionService.

Part of #13655.

PiperOrigin-RevId: 394606309
bazel-io pushed a commit that referenced this pull request Sep 6, 2021
Deduplicate calls to findMissingBlobs and file uploads. This is a different implementation to #13166.

Part of #13655.

Fixes #12113.

Closes #13166.

PiperOrigin-RevId: 395015753
bazel-io pushed a commit that referenced this pull request Sep 6, 2021
Add RxUtils#mergeBulkTransfer which is similar to waitForBulkTransfer but is used with Rx.

Add UploadManifest#uploadAsync and update UploadManifest#upload to use it.

Part of #13655.

PiperOrigin-RevId: 395018504
bazel-io pushed a commit that referenced this pull request Sep 6, 2021
Add --experimental_remote_cache_async flag, which when enabled, makes the uploads happen in the background.

Part of #13655.

PiperOrigin-RevId: 395020967
bazel-io pushed a commit that referenced this pull request Sep 6, 2021
Update UI to display background uploads that are still active after the build with message like:

```
INFO: Build completed successfully, 5001 total actions
Waiting for remote cache: 4911 uploads; 16s
```

Part of #13655.

PiperOrigin-RevId: 395047830
@bazel-io bazel-io closed this in 581c81a Sep 7, 2021
@coeuvre
Copy link
Member Author

coeuvre commented Sep 7, 2021

Imported.

@brentleyjones You can give it a try by setting --experimental_remote_cache_async. Feedback is welcomed :)

@brentleyjones
Copy link
Contributor

brentleyjones commented Sep 7, 2021

@coeuvre This seems to work as advertised!

Something that confused me though: at the end of a build, I saw the number of items waiting to be uploaded increase as other ones finished? For example, I saw this:

Waiting for remote cache: 87 uploads, 11s

then after some finished, and the count went down, I saw it increase

Waiting for remote cache: 111 uploads, 42s

that doesn't make sense to me?


Also, a small UX feedback: in the profile.json, upload outputs now shows for a very small bit, and doesn't actually represent the upload anymore. I think it would be nice if the uploads appeared in their own swimlanes, though maybe the profile isn't the right place for that sort of thing, since it can end before uploads are finished now as well (same with BES uploads, which aren't shown at all). If not, maybe it should have a different name to avoid confusion with the existing blocking upload behavior.

@brentleyjones
Copy link
Contributor

brentleyjones commented Sep 7, 2021

Ohh, another thought: this behaves like --bes_upload_mode=wait_for_upload_complete. I believe there is value in a version that behaves like nowait_for_upload_to_complete and/or fully_async, to prevent the end of the build from being blocked by the uploads (for the same reasons one might adjust the --bes_upload_mode).

Maybe a new flag can replace --bes_upload_mode and both types of uploads can be tied to the same semantics? Or, at a minimum, --experimental_remote_cache_async can change to --experimental_remote_cache_upload_mode?

@coeuvre
Copy link
Member Author

coeuvre commented Sep 8, 2021

Thanks for the feedback!

then after some finished, and the count went down, I saw it increase

The steps of an async upload task are:

  1. Call findMissingDigests to find which files to upload.
  2. For each missing file/blob:
    1. Start uploading the content and increase the counter.
    2. Decrease the counter after finished.
  3. After all files/blobs are uploaded:
    1. Start uploading action result and increase the counter.
    2. Decrease the counter after finished.

After the build finished, there could be multiple upload tasks running in the background and some of them may still waiting for the result of findMissingDigests. Once those tasks get the results, the counter will be increased depends on how many missing files/blobs.

That being said, I understand that this may create confusion to users. One way to improve is before calling findMissingDigests, we can increase the counter for all the outputs. Decrease counter for files that are already existed in the remote cache immediately after findMissingDigests returned. And decrease counter for files that are uploaded after upload finished.

Also, a small UX feedback: in the profile.json, upload outputs now shows for a very small bit, and doesn't actually represent the upload anymore. I think it would be nice if the uploads appeared in their own swimlanes, though maybe the profile isn't the right place for that sort of thing, since it can end before uploads are finished now as well (same with BES uploads, which aren't shown at all). If not, maybe it should have a different name to avoid confusion with the existing blocking upload behavior.

For profile.json, we should probably only keep upload outputs for blocking upload. It would nice if we can show uploads (and other background tasks) in their own swimlanes (the format actually supports it with Async Events) but requires some fundamental changes to the profile code. I can look into it later.

Ohh, another thought: this behaves like --bes_upload_mode=wait_for_upload_complete. I believe there is value in a version that behaves like nowait_for_upload_to_complete and/or fully_async, to prevent the end of the build from being blocked by the uploads (for the same reasons one might adjust the --bes_upload_mode).

Noted.

bazel-io pushed a commit that referenced this pull request Sep 22, 2021
…se after build finished.

At the end of a build, the number of files waiting to be uploaded could increase as other ones finished. This PR fixes that.

Also, changes to only emit profile block `upload outputs` for blocking uploads.

Fixes #13655 (comment).

Closes #13954.

PiperOrigin-RevId: 398161750
@brentleyjones
Copy link
Contributor

@coeuvre Do we need a new issue to track the request for background (post-build) uploads?

@coeuvre
Copy link
Member Author

coeuvre commented Jan 25, 2022

Yes, please help create a new issue to better track that (and other features you think are missing)!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache uploads shouldn't block dependent local actions
3 participants