-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BLAKE3 hasher to vfs #18784
Add BLAKE3 hasher to vfs #18784
Conversation
@coeuvre -- Could use your advice on how to fix:
Thanks! |
For 1, yes the test is to verify Bazel can be built without network. You need to add blake3 to @meteorcloudy what's your opinion on 2? |
@@ -82,8 +73,6 @@ private DigestHashFunction( | |||
checkArgument(!names.isEmpty()); | |||
this.name = names.get(0); | |||
this.names = names; | |||
this.messageDigestPrototype = getMessageDigestInstance(); | |||
this.messageDigestPrototypeSupportsClone = supportsClone(messageDigestPrototype); | |||
} | |||
|
|||
public static DigestHashFunction register( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot change how the registration works here because we rely on this internally. You could, however, glue BLAKE3 into this with following steps:
- Write a class that implements
java.security.MessageDigest
forBLAKE3
. This is the class that interact with JNI and call into c code. - Register the
MessageDigest
from step 1 by implementing ajava.security.Provider
and add it withjava.security.Security.addProvider(...)
. So that when you callMessageDigest.getInstance("BLAKE3")
, you can get back the instance from the provider. - Adapt
DigestMessage
forBLAKE3
intoHashFunction
by calling MessageDigestHashFunction. - Create a
DigestHashFunction
by callingDigestHashFunction.register(...)
using theHashFunction
from step 3.
I know this is verbose and not fun but we have to write it in this way.
For the Lines 276 to 281 in 75b0221
|
For "Clang on :ubuntu: 18.04 LTS (OpenJDK 11, gcc 7.5.0)", we do plan to phase out ubuntu 1804 soon. But we need to schedule the work and ideally we do that for Bazel 7 not in the middle of a LTS release since we promises backwards compatibility. So if there is other ways to workaround it, that will be preferred. |
a904ba6
to
8e26aa5
Compare
PTAL @coeuvre -- I've made the changes you suggested and made the implementation a MessageDigest as well as reverting the changes to Fingerprint etc. Thank you for the detailed pointers there. Re failing tests, I have merged bazelbuild/bazel-central-registry#698 to fix the bzlmod tests but now they are failing in a new way with:
Any ideas on how to fix this? |
public final class Blake3Provider extends Provider { | ||
public Blake3Provider() { | ||
super("BLAKE3Provider", "1.0", "A BLAKE3 digest provider"); | ||
put("MessageDigest.BLAKE3", "com.google.devtools.build.lib.vfs.Blake3MessageDigest"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Use Blake3MessageDigest.class.getName()
to make refactor easier in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -67,6 +68,7 @@ public int getDigestMaximumLength() { | |||
|
|||
public static final DigestHashFunction SHA1 = register(Hashing.sha1(), "SHA-1", "SHA1"); | |||
public static final DigestHashFunction SHA256 = register(Hashing.sha256(), "SHA-256", "SHA256"); | |||
public static final DigestHashFunction BLAKE3 = register(new Blake3HashFunction(), "BLAKE3"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just use new MessageDigestHashFunction("BLAKE3")
(doc) instead of implementing HashFunction
by yourself in Blake3HashFunction
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, we don't want put BLAKE3 code in the common code path. Neither the registration here nor the provider below.
I suggest we have a utility class BazelHashFunctions
which installs the provider and register the hash function, e.g.:
class BazelHashFunctions {
static {
addProvider(...);
}
public static final DigestHashFunction BLAKE3 = DigestHashFunction.register(...);
}
In a following PR when we want to create a file system based on BLAKE3, we could do that in BazelFileSystemModule
with code like:
new UnixFileSystem(BazelHashFunctions.BLAKE3, ...);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to read between the lines here and figure out where you want stuff / how you want it to be injected -- let me know what I got wrong.
I moved blake3 stuff into a bazel
subdir of vfs
. The provider is statically registered when BazelHashFunctions
is loaded by the classloader in src/main/java/com/google/devtools/build/lib/bazel/Bazel.java
. Re the MessageDigestHashFunction -- addressed above, but that's not public afaict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we want the provider to be statically registered by the classloader when it is loading class BazelHashFunctions
. But we don't want to have it implementing BlazeModule
and put it in src/main/java/com/google/devtools/build/lib/bazel/Bazel.java
.
What I proposed above is we could add the code
new UnixFileSystem(BazelHashFunctions.BLAKE3, ...);
to somewhere like here so that when we want to use BLAKE3, it registers it.
We don't need to do it in this CL as we haven't focused on integrating BLAKE3 into Bazel's VFS. Let's limit the scope of this CL to only adapt BLAKE3 as a DigestHashFunction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give a little more context on what else is required to add the digest function to vfs? I'm a little unclear on that part and it will help me for the next cl. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I think I overcomplicated that. The only necessary part is we want to register BLAKE3
for Bazel only. We can just force loading BazelHashFunctions
by adding following code block in BazelFilesystemModule
class BazelFilesystemmodule {
static {
BazelHashFunctions.ensureRegistered();
}
}
where BazelHashFunctions.ensureRegistered
is just an empty static function whose sole purpose is to let JVM load class BazelHashFunctions
and run the static block there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is just what I needed. Done.
import java.security.DigestException; | ||
import java.security.MessageDigest; | ||
|
||
public final class Blake3MessageDigest extends MessageDigest implements Hasher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use MessageDigestHashFunction
as suggested below, you can remove implements Hasher
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MessageDigestHashFunction is not public, and copying it in is prohibitive because it brings like 5 other classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed that. Can we then create a separate class to implement Hasher
? I found it is confusing to combine these two especially when we have update
, engineUpdate
and share the underlying buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/main/java/com/google/devtools/build/lib/vfs/Blake3MessageDigest.java
Outdated
Show resolved
Hide resolved
07fbd47
to
c92ed72
Compare
} | ||
|
||
public void engineUpdate(byte[] data, int offset, int length) { | ||
while (length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One optimization here is we could avoid copying if the size of data
is larger than ONESHOT_THRESHOLD
. But this is totally optional and we could leave that for a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was trying to keep this simple for now -- enough complexity already :) Agree on deferring this for a followup (where we measure perf).
} | ||
|
||
public void engineUpdate(byte b) { | ||
engineUpdate(new byte[] {b}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: avoid creating a new object, put the byte into buffer directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
@CanIgnoreReturnValue | ||
public Hasher putBytes(ByteBuffer b) { | ||
buffer = b; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to copy/use the argument b
now instead of saving it for use later. I believe the contract of the interface is the caller is free to use the passed buffer after the function is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right you are.
339d81d
to
3d6b67e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this PR is pretty good now! Just a few more comments and we are ready to go!
src/main/native/BUILD
Outdated
cc_binary( | ||
name = "libunix_jni.so", | ||
srcs = [ | ||
"blake3_jni.cc", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this line since blake3_jni
is already in the deps
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
java_test( | ||
name = "BazelTests", | ||
size = "small", | ||
tags = ["no_windows"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see BALKE3 code can be compiled on Windows. Are there any blockers for running the tests on windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
"c/blake3_avx512_x86-64_unix.S", | ||
# Disable to appease bazel-ci which uses ubuntu-18 (EOL) and GCC 7 | ||
# lacking the headers to compile AVX512. | ||
# "c/blake3_avx512_x86-64_unix.S", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@meteorcloudy FYI, this is an unfortunate consequence of the old versions running in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related: #18784 (comment)
src/main/native/blake3_jni.cc
Outdated
return env->ReleaseByteArrayElements(array, addr, 0); | ||
} | ||
|
||
extern "C" JNIEXPORT long JNICALL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns long
while the corresponding Java method returns int
. It should also the JNIEnv *
and jobject
parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
@Override | ||
protected void finalize() throws Throwable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That usage is fine as it comes after the loadJni
call. The failing test bootstraps Bazel and may not honor BUILD files, maybe the new JNI sources need to be listed somewhere else?
81ded64
to
890327f
Compare
@coeuvre I believe this is ready for another look / attempt at merging. Thanks |
Thanks, I am importing now! |
Everything went well. Currently waiting for internal review. |
@coeuvre sweet, thanks! Btw did you see my comment on the gerrit repo? https://bazel-review.googlesource.com/c/bazel/+/225070 Hoping we can make that last minute change before it goes in. Thanks again! |
Sorry, I didn't see that because there wasn't any notification :(. Np, I can make the change now. |
Thanks @coeuvre! |
@coeuvre thank you again for all your help shepherding this through the process |
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in bazelbuild#18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc)
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in bazelbuild#18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc)
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in #18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc)
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in bazelbuild#18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc)
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in #18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc)
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in #18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc) Co-authored-by: Googler <[email protected]>
The changes in this PR have been included in Bazel 6.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=last_rc. |
Add BLAKE3 hasher to vfs
This PR adds the Blake3Hasher and Blake3HashFunction classes to vfs and
makes them available under the flag --digest_function=BLAKE3.
I spent a little time trying to optimize performance more here, and am happy to
chat about different approaches (ByteBuffer vs byte[] vs unsafe vs mmap). In the
end this CL implements something pretty conservative, but there is room to get
more performance out of this with a different approach.
This is a partial commit for #18658.