Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support zstd decompression for external dependencies #11968

Closed
wants to merge 29 commits into from

Conversation

glukasiknuro
Copy link
Contributor

@glukasiknuro glukasiknuro commented Aug 18, 2020

Bug: Add support for .tar.zst to http_archive

Handling of jni headers and loading system library is very similar to one already in bazel for native os library and possibly can be unified at some point.

Checked repo with big external deps, and noticed following improvements when checking bazel tracing profile:

size (tar.gz) |  extracting tar.gz  | extracting tar.zst
--------------------------------------------------------
    773M      |       13 s          |       3.2 s
    421M      |       13 s          |         5 s
    346M      |      9.2 s          |       3.3 s
    261M      |      7.5 s          |         3 s

One thing to check is whether bazel distributes binaries compiled with -c opt - personally tested with version compiled with this option.

@googlebot
Copy link

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@gjasny
Copy link
Contributor

gjasny commented Aug 18, 2020

@googlebot I consent

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@glukasiknuro glukasiknuro force-pushed the zstd branch 3 times, most recently from fe720d8 to 24b67dc Compare August 19, 2020 19:03
glukasiknuro and others added 13 commits August 19, 2020 13:01
Looks adding zstd slightly increased current size and getting an error:

   Size of install_base is 281984 kB, expected it to be less than 281600 kB
 - a number of C2036 fixes - void * arithmetic is not supported by MSVC
 - gcc compiler options not supported by MSVC
Possibly can be fixed upstream:

```
INFO: From Compiling third_party/zstd-jni/src/main/native/jni_zdict.c:
third_party/zstd-jni/src/main/native/jni_zdict.c: In function 'Java_com_github_luben_zstd_Zstd_trainFromBuffer':
third_party/zstd-jni/src/main/native/jni_zdict.c:55:12: warning: 'size' may be used uninitialized in this function [-Wmaybe-uninitialized]
 E1: return size;
            ^
third_party/zstd-jni/src/main/native/jni_zdict.c: In function 'Java_com_github_luben_zstd_Zstd_trainFromBufferDirect':
third_party/zstd-jni/src/main/native/jni_zdict.c:89:12: warning: 'size' may be used uninitialized in this function [-Wmaybe-uninitialized]
 E1: return size;
            ^
```
@glukasiknuro glukasiknuro marked this pull request as ready for review August 19, 2020 20:51
@glukasiknuro
Copy link
Contributor Author

@meisterT could you review this PR or suggest how to help with proceeding with this change?

@meisterT
Copy link
Member

Sorry for the delay, I was OOO.

What is the cost in terms of Bazel binary size with this change?

cc @meteorcloudy for the potential additional dependency

@glukasiknuro
Copy link
Contributor Author

PTAL

Fixed windows support - System.loadLibrary under windows does not add "lib" in from of shared library name.
Loading from Runfiles is needed in tests.

Also upgraded zstd-jni to 1.4.5-11 which contains fixes for pointer arithmetic which allows the patch to be smaller (luben/zstd-jni#140). Would love to remove the patch for Native.java at all, but need to figure out how to do it so that zstd-jni will work as well.

Propagated changes to #12274, and tested under windows and in our internal codebase (@meteorcloudy could you double check it works for you as well?)

@philwo
Copy link
Member

philwo commented Nov 9, 2020

@glukasiknuro I'm importing this now. I will have to split it up into a couple of commits due to the changes to //third_party, don't be surprised. :)

bazel-io pushed a commit that referenced this pull request Nov 9, 2020
Partial import of #11968.

PiperOrigin-RevId: 341407243
philwo added a commit that referenced this pull request Nov 9, 2020
Extracted from #11968, as I have to merge this separately from the rest
of that PR.
philwo added a commit that referenced this pull request Nov 9, 2020
Extracted from #11968, as I have to merge this separately from the rest
of that PR.
bazel-io pushed a commit that referenced this pull request Nov 9, 2020
Extracted from #11968, as I have to merge this separately from the rest
of that PR.

Partial commit for third_party/*, see #12437.

Signed-off-by: Philipp Wollermann <[email protected]>
@philwo
Copy link
Member

philwo commented Nov 10, 2020

@glukasiknuro Two out of three parts are merged, I prepared a code review internally for the remaining changes - unfortunately I noticed that internally we appear to have a version of Apache Commons Compress without zstd support, so this doesn't compile when I build it in Google's monorepo. 😞 I'll have to figure out how to either upgrade our version of that library or change things so that our internal version of Bazel builds without the TarZstdFunction part (essentially disabling this feature just for our internal version).

@philwo
Copy link
Member

philwo commented Nov 10, 2020

@glukasiknuro Would it be an option to use https://github.com/airlift/aircompressor as an alternative to the JNI version?

@glukasiknuro
Copy link
Contributor Author

glukasiknuro commented Nov 10, 2020

@glukasiknuro Would it be an option to use https://github.com/airlift/aircompressor as an alternative to the JNI version?

Using aircompressor would be nice b/c it's in pure java.

Although looks like aircompressor does not support streaming (i.e. whole file needs to be decompressed at once) and it does not support bigendian machines because of UNSAFE usage (this maybe smaller issue). But let me know if you think those are non-blockers, i may try to use aircompressor.

BTW tangential effort is supporting zstd to compress blobs in remote cache (bazelbuild/remote-apis#168), I believe having zstd in some form would be needed for this effort as well. Looks like aircompressor implements only default zstd compression level, but this can be probably OK (@mostynb ?)

@mostynb
Copy link
Contributor

mostynb commented Nov 10, 2020

BTW tangential effort is supporting zstd to compress blobs in remote cache (bazelbuild/remote-apis#168), I believe having zstd in some form would be needed for this effort as well. Looks like aircompressor implements only default zstd compression level, but this can be probably OK (@mostynb ?)

The REAPI compressed-blob bytestream proposal leaves the compression level up to the implementations, I don't think using a library that only supports the default compression level would cause any interoperability issues. But you probably want to at least experiment with the default level and the "fastest" level before committing to a particular library.

@philwo
Copy link
Member

philwo commented Nov 12, 2020

it does not support bigendian machines because of UNSAFE usage

That's unfortunately a blocker, as we don't want to break our s390x support. :/

I'm looking into merging the current version for now.

@philwo
Copy link
Member

philwo commented Nov 12, 2020

I'm sorry - I'm rolling back the partially merged PR before cutting Bazel 4.0 LTS, as we can't merge the remaining part in time and are worried that the newly added patched JNI dependency is too risky and will make it hard for distributions like Debian to accept.

Let's revisit this when we find a suitable pure Java version of Zstd.

bazel-io pushed a commit that referenced this pull request Nov 12, 2020
*** Reason for rollback ***

We're rolling back this feature before cutting Bazel 4.0 LTS, as we can't merge the remaining part in time and are worried that the newly added patched JNI dependency will make it hard for distributions like Debian to accept.

Let's revisit this when we find a suitable pure Java version of Zstd.

*** Original change description ***

Add @com_github_luben_zstd_jni repo to WORKSPACE.

Partial import of #11968.

PiperOrigin-RevId: 342024383
@aiuto aiuto added the team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. label Mar 31, 2021
@philwo philwo removed the team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website label Nov 29, 2021
@aignas
Copy link

aignas commented Jan 21, 2022

Hello all, what would be required to have this feature in bazel given that zstd compression/decompression feature was landed for the grpc remote cache functionality?

@meteorcloudy
Copy link
Member

@glukasiknuro Can you try to rebase with Bazel@HEAD and see if you can already use the existing zstd-jni library?

benjaminp added a commit to benjaminp/bazel that referenced this pull request Mar 11, 2022
This is mostly a rebase of bazelbuild#11968 with a few tweaks of my own.

Fixes bazelbuild#10342.

Co-authored-by: Grzegorz Lukasik <[email protected]>

RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.
benjaminp added a commit to benjaminp/bazel that referenced this pull request Mar 15, 2022
This is mostly a rebase of bazelbuild#11968 with a few tweaks of my own.

Fixes bazelbuild#10342.

Co-authored-by: Grzegorz Lukasik <[email protected]>

RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.
bazel-io pushed a commit that referenced this pull request Mar 21, 2022
This is mostly a rebase of #11968 with a few tweaks of my own.

Fixes #10342.

Co-authored-by: Grzegorz Lukasik <[email protected]>

RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.

Closes #15026.

PiperOrigin-RevId: 436146779
brentleyjones pushed a commit to brentleyjones/bazel that referenced this pull request Mar 21, 2022
This is mostly a rebase of bazelbuild#11968 with a few tweaks of my own.

Fixes bazelbuild#10342.

Co-authored-by: Grzegorz Lukasik <[email protected]>

RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.

Closes bazelbuild#15026.

PiperOrigin-RevId: 436146779
(cherry picked from commit 00d74ff)
Wyverald pushed a commit that referenced this pull request Mar 21, 2022
This is mostly a rebase of #11968 with a few tweaks of my own.

Fixes #10342.

Co-authored-by: Grzegorz Lukasik <[email protected]>

RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.

Closes #15026.

PiperOrigin-RevId: 436146779
(cherry picked from commit 00d74ff)

Co-authored-by: Benjamin Peterson <[email protected]>
@meteorcloudy
Copy link
Member

Closing due to #15026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants