Replace protobuf-java(lite) with pure Kotlin implementations on JVM #148

Merged: garyp merged 5 commits into master from rm-protobuf-java on Apr 29, 2021

@garyp (Collaborator) commented Apr 25, 2021

This avoids compatibility issues when applications have transitive dependencies on different versions of protobuf-java(lite) via pbandk and some other library (e.g. Firebase). Since we no longer depend on protobuf-java(lite), we now also bundle
the well-known types proto files ourselves. Applications using the Protobuf Gradle Plugin expect these proto files to be available in pbandk (or one of its dependencies) in order to run protoc-gen-kotlin.

Additional changes included in this PR:

  • Allow reading multiple messages from an InputStream

    Since protobuf messages are not self-delimiting, by default decodeFromStream() will try to read until the end of the stream and try to decode all bytes it reads as part of the message. Applications will often prefix a message with its length when writing multiple messages to a single output stream. When consuming such a stream, the application can read the length first and then pass it to decodeFromStream() to make sure only that many bytes are read from the stream. Also modify the encodeToStream() method to return the number of bytes that were written to the stream.

  • Allow running JVM conformance tests with different I/O implementations

    By setting the PBANDK_CONFORMANCE_JVM_IO environment variable to either BYTE_BUFFER or STREAM, the conformance tests will encode/decode using ByteBuffer or InputStream/OutputStream, respectively, instead of the default ByteArray path on the JVM. This is handy as a quick test that those I/O paths work correctly. It's also handy as a rough benchmark since the conformance test involves a fair amount of protocol buffer encoding/decoding.
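
The length-prefix pattern from the first bullet can be sketched with plain JDK streams. This is only an illustration under stated assumptions: `writeFrame` and `readFrames` are hypothetical helpers, the 4-byte big-endian prefix is an arbitrary choice (an application might prefer a varint prefix), and the payloads stand in for encoded messages. In a pbandk-based reader, the point where `size` becomes known is where it would be handed to `decodeFromStream()`.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInputStream
import java.io.DataOutputStream
import java.io.EOFException
import java.io.InputStream
import java.io.OutputStream

// Hypothetical helper: writes a 4-byte big-endian length followed by the payload.
// In a real application the payload would be the bytes of one encoded message.
fun writeFrame(out: OutputStream, payload: ByteArray) {
    val data = DataOutputStream(out)
    data.writeInt(payload.size)
    data.write(payload)
    data.flush()
}

// Hypothetical helper: reads length-prefixed frames until the stream ends.
fun readFrames(input: InputStream): List<ByteArray> {
    val data = DataInputStream(input)
    val frames = mutableListOf<ByteArray>()
    while (true) {
        // A clean end of stream surfaces as EOFException when reading the next length.
        val size = try { data.readInt() } catch (e: EOFException) { break }
        // Assumption: this is where `size` would be passed to decodeFromStream();
        // the sketch just reads exactly that many raw bytes instead.
        val payload = ByteArray(size)
        data.readFully(payload)
        frames += payload
    }
    return frames
}

fun main() {
    val buffer = ByteArrayOutputStream()
    writeFrame(buffer, "first".toByteArray())
    writeFrame(buffer, "second".toByteArray())
    val frames = readFrames(ByteArrayInputStream(buffer.toByteArray()))
    println(frames.map { String(it) })
}
```

Because each read is bounded by the prefix, the second message is never consumed as trailing bytes of the first.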

I did some very rough benchmarks of this change by running the conformance tests using both the old protobuf-java and the new pure-Kotlin implementations. From my benchmarks, the pure-Kotlin implementation is roughly 70% slower than the protobuf-java implementation when encoding/decoding using ByteArrays or ByteBuffers (about 2.4 s versus 1.4 s per conformance run). The pure-Kotlin implementation is comparable in speed (it might even be slightly faster) when encoding/decoding using InputStream/OutputStream. This isn't completely surprising: the protobuf-java library has specialized implementations that use sun.misc.Unsafe for faster access to the byte array when sun.misc.Unsafe is available, whereas our pure-Kotlin implementation uses only the official ByteArray APIs.
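
To make "only the official ByteArray APIs" concrete, here is a minimal sketch of protobuf's base-128 varint encoding written with ordinary, bounds-checked array indexing. It is not pbandk's actual code; `encodeVarint` and `decodeVarint` are hypothetical names used only for illustration.

```kotlin
// Encodes a 64-bit value as a protobuf base-128 varint using plain array writes.
fun encodeVarint(value: Long): ByteArray {
    val out = ByteArray(10) // a 64-bit varint needs at most 10 bytes
    var v = value
    var i = 0
    // While more than 7 significant bits remain, emit 7 bits with the continuation bit set.
    while ((v and 0x7FL.inv()) != 0L) {
        out[i++] = ((v and 0x7FL) or 0x80L).toByte()
        v = v ushr 7
    }
    out[i++] = v.toByte()
    return out.copyOf(i)
}

// Decodes a varint starting at `offset`; returns the value and the number of bytes consumed.
fun decodeVarint(bytes: ByteArray, offset: Int = 0): Pair<Long, Int> {
    var result = 0L
    var shift = 0
    var pos = offset
    while (shift < 64) {
        val b = bytes[pos++].toInt()
        result = result or ((b and 0x7F).toLong() shl shift)
        if ((b and 0x80) == 0) return result to (pos - offset)
        shift += 7
    }
    throw IllegalArgumentException("malformed varint")
}

fun main() {
    val encoded = encodeVarint(300L)
    println(encoded.joinToString(" ") { String.format("%02x", it.toInt() and 0xFF) })
    println(decodeVarint(encoded))
}
```

Every `out[i]` and `bytes[pos]` access here goes through the JVM's normal array bounds checks, which is the kind of overhead an Unsafe-based implementation can avoid.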

This benchmark was run under the OpenJDK JVM on macOS. Results might or might not be different on Android (since the ART runtime is very different from a typical desktop JVM, and the protobuf-javalite library is also implemented differently from protobuf-java), but I don't have an easy way to run the benchmarks on an Android device. The conformance test runner communicates with pbandk using protocol buffers sent over stdin/stdout. I modified the pbandk conformance test code to allow choosing whether to perform that stdin/stdout communication using ByteArray, ByteBuffer, or InputStream/OutputStream.
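
The selection logic could look roughly like the following. This is a sketch, not the conformance test's real code: the `JvmIo` enum, `parseJvmIo`, and `roundTrip` are hypothetical, and the three branches only move bytes through each I/O type rather than encoding or decoding messages. The fallback to `BYTE_ARRAY` when the variable is unset follows the default described in this PR.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

// Hypothetical enum mirroring the three values used in the benchmark commands.
enum class JvmIo { BYTE_ARRAY, BYTE_BUFFER, STREAM }

// Hypothetical helper: an unset or blank variable falls back to the ByteArray path.
// An unrecognized value makes valueOf() throw IllegalArgumentException.
fun parseJvmIo(raw: String?): JvmIo =
    if (raw.isNullOrBlank()) JvmIo.BYTE_ARRAY else JvmIo.valueOf(raw.trim())

// Stand-in for encode/decode: pushes the payload through the selected I/O type.
fun roundTrip(mode: JvmIo, payload: ByteArray): ByteArray = when (mode) {
    JvmIo.BYTE_ARRAY -> payload.copyOf()
    JvmIo.BYTE_BUFFER -> {
        val buf = ByteBuffer.wrap(payload)
        ByteArray(buf.remaining()).also { buf.get(it) }
    }
    JvmIo.STREAM -> {
        val out = ByteArrayOutputStream()
        out.write(payload)
        ByteArrayInputStream(out.toByteArray()).readBytes()
    }
}

fun main() {
    val mode = parseJvmIo(System.getenv("PBANDK_CONFORMANCE_JVM_IO"))
    val echoed = roundTrip(mode, byteArrayOf(1, 2, 3))
    println("$mode -> ${echoed.toList()}")
}
```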

These are the results with the previous protobuf-java pbandk implementation:

» hyperfine -w 3 -m 100 -L jvm_io BYTE_ARRAY,BYTE_BUFFER,STREAM 'env PBANDK_CONFORMANCE_JVM_IO={jvm_io} ./conformance/test-conformance.sh jvm'
Benchmark #1: env PBANDK_CONFORMANCE_JVM_IO=BYTE_ARRAY ./conformance/test-conformance.sh jvm
  Time (mean ± σ):      1.405 s ±  0.022 s    [User: 240.9 ms, System: 42.4 ms]
  Range (min … max):    1.361 s …  1.522 s    100 runs

Benchmark #2: env PBANDK_CONFORMANCE_JVM_IO=BYTE_BUFFER ./conformance/test-conformance.sh jvm
  Time (mean ± σ):      1.408 s ±  0.026 s    [User: 240.8 ms, System: 42.5 ms]
  Range (min … max):    1.351 s …  1.541 s    100 runs

Benchmark #3: env PBANDK_CONFORMANCE_JVM_IO=STREAM ./conformance/test-conformance.sh jvm
  Time (mean ± σ):      1.423 s ±  0.066 s    [User: 246.0 ms, System: 45.0 ms]
  Range (min … max):    1.332 s …  1.602 s    100 runs

Summary
  'env PBANDK_CONFORMANCE_JVM_IO=BYTE_ARRAY ./conformance/test-conformance.sh jvm' ran
    1.00 ± 0.02 times faster than 'env PBANDK_CONFORMANCE_JVM_IO=BYTE_BUFFER ./conformance/test-conformance.sh jvm'
    1.01 ± 0.05 times faster than 'env PBANDK_CONFORMANCE_JVM_IO=STREAM ./conformance/test-conformance.sh jvm'

and these are the results with the new pure-Kotlin pbandk implementation:

» hyperfine -w 3 -m 100 -L jvm_io BYTE_ARRAY,BYTE_BUFFER,STREAM 'env PBANDK_CONFORMANCE_JVM_IO={jvm_io} ./conformance/test-conformance.sh jvm'
Benchmark #1: env PBANDK_CONFORMANCE_JVM_IO=BYTE_ARRAY ./conformance/test-conformance.sh jvm
  Time (mean ± σ):      2.412 s ±  0.040 s    [User: 2.604 s, System: 0.332 s]
  Range (min … max):    2.345 s …  2.554 s    100 runs

Benchmark #2: env PBANDK_CONFORMANCE_JVM_IO=BYTE_BUFFER ./conformance/test-conformance.sh jvm
  Time (mean ± σ):      2.456 s ±  0.032 s    [User: 2.751 s, System: 0.387 s]
  Range (min … max):    2.365 s …  2.539 s    100 runs

Benchmark #3: env PBANDK_CONFORMANCE_JVM_IO=STREAM ./conformance/test-conformance.sh jvm
  Time (mean ± σ):      1.316 s ±  0.018 s    [User: 243.4 ms, System: 44.6 ms]
  Range (min … max):    1.286 s …  1.392 s    100 runs

Summary
  'env PBANDK_CONFORMANCE_JVM_IO=STREAM ./conformance/test-conformance.sh jvm' ran
    1.83 ± 0.04 times faster than 'env PBANDK_CONFORMANCE_JVM_IO=BYTE_ARRAY ./conformance/test-conformance.sh jvm'
    1.87 ± 0.04 times faster than 'env PBANDK_CONFORMANCE_JVM_IO=BYTE_BUFFER ./conformance/test-conformance.sh jvm'
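
As a quick sanity check on the comparison, the new/old ratio per I/O mode can be computed from the mean times reported in the two hyperfine runs above. The snippet is only a convenience calculation; the numbers are copied from the output and the names are illustrative.

```kotlin
import java.util.Locale

// Mean wall-clock times in seconds, copied from the two hyperfine runs.
val oldMeans = mapOf("BYTE_ARRAY" to 1.405, "BYTE_BUFFER" to 1.408, "STREAM" to 1.423)
val newMeans = mapOf("BYTE_ARRAY" to 2.412, "BYTE_BUFFER" to 2.456, "STREAM" to 1.316)

// Ratio above 1.0 means the pure-Kotlin run took longer than the protobuf-java run.
fun ratio(mode: String): Double = newMeans.getValue(mode) / oldMeans.getValue(mode)

fun main() {
    for (mode in oldMeans.keys) {
        println(String.format(Locale.ROOT, "%-11s new/old = %.2f", mode, ratio(mode)))
    }
}
```

The ratios come out at about 1.72 for BYTE_ARRAY, 1.74 for BYTE_BUFFER, and 0.92 for STREAM, which matches the "roughly 70% slower" and "slightly faster" readings in the description.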

@garyp garyp requested a review from JeroenMols April 26, 2021 00:14
@seanadkinson (Contributor) left a comment

LGTM 👍 lmk if I should review the files with the license at the top, since those are meaty.

@JeroenMols (Contributor) left a comment

Looks great! I must admit that my knowledge of the project is still too limited to understand all the changes in depth, but I tried to leave some meaningful comments.

Thanks also for the clear performance data! Honestly, I'm not too worried about that, because:

  • protobuf is most likely used in conjunction with some kind of network request, so we should ensure "serialization/deserialization time" <<< "network request time" instead of focusing on absolute times
  • time per serialization/deserialization is still low: if I understand the data correctly, the entire conformance suite runs in under 2 sec? How many tests are there in the suite? Assuming it's 1000, we are looking at ~2 ms per test, which is well below the frame time at 60 fps (about 16.7 ms).
  • there is a way to optimize this further (using streams), which we can explain in the README, OR we can ensure that the default examples use the fast method (e.g. provide a Retrofit converter based on streams)

@garyp garyp force-pushed the rm-protobuf-java branch from e871552 to f8d6e24 Compare April 26, 2021 17:24
garyp added 5 commits April 28, 2021 17:17
Since protobuf messages are not self-delimiting, by default
`decodeFromStream()` will try to read until the end of the stream and
try to decode all bytes it reads as part of the message.  Applications
will often prefix a message with its length when writing multiple
messages to a single output stream. When consuming such a stream, the
application can read the length first and then pass it to
`decodeFromStream()` to make sure only that many bytes are read from the
stream.

The conformance test communicates with the conformance test runner using
protocol buffer messages over stdin/stdout. By default it uses
`encodeToByteArray()` and `decodeFromByteArray()` to encode/decode the
messages on all platforms.

By setting the `PBANDK_CONFORMANCE_JVM_IO` environment variable to
either `BYTE_BUFFER` or `STREAM`, the conformance tests will
instead encode/decode using `ByteBuffer` or `InputStream`/`OutputStream`
on the JVM. This is handy as a quick test that those I/O paths work
correctly. It's also handy as a rough benchmark since the conformance
test involves a fair amount of protocol buffer encoding/decoding.

This avoids compatibility issues when applications have transitive
dependencies on different versions of protobuf-java(lite) via pbandk and
some other library (e.g. Firebase).

Also modify the `encodeToStream()` method to return the number of bytes
that were written to the stream. This can be useful information for the
caller.

Since we no longer depend on protobuf-java(lite), we now have to bundle
these proto files ourselves. Applications using the Protobuf Gradle
Plugin expect these proto files to be available in pbandk (or one of its
dependencies) in order to run `protoc-gen-kotlin`.

…d proto files

Now that we're bundling the well-known type proto files, we no longer
need to read them from the copy of protobuf installed on the build
system.

This update also pulled in a newer version of `descriptor.proto` with an
added field.
@garyp garyp force-pushed the rm-protobuf-java branch from f8d6e24 to 8bfd99b Compare April 29, 2021 00:47
@garyp garyp marked this pull request as ready for review April 29, 2021 00:47
@garyp garyp merged commit af13914 into master Apr 29, 2021
@garyp garyp deleted the rm-protobuf-java branch April 29, 2021 15:47
Successfully merging this pull request may close these issues.

PBandK 0.10.0.beta3 incompatible with Firebase Performance monitoring 19.0.10 or lower