feat: support UNRECOGNIZED types + decode BYTES columns lazily #2219

Merged
olavloite merged 10 commits into main from lazy-decode-bytes
Feb 2, 2023

olavloite
Collaborator

@olavloite olavloite commented Jan 5, 2023

Unrecognized Types

Adds support for types that do not have a known type code in the client library (yet). This can happen if a new type is added to Cloud Spanner before this client is updated, or if someone is using an old version of the client library with a version of Cloud Spanner that contains a type that was not available at the time that the specific version of the client library was built.

Unrecognized types are assigned a type with type code UNRECOGNIZED. The values of these types (and all other types) can be retrieved as follows:

  1. Calling ResultSet#getValue(..) will return the column value as a Value instance.
  2. Calling Value#getAsString() will return a string representation of the given value. The string value will be a valid and complete representation of the underlying value. This method is guaranteed to work for all known and unknown types. Array types will return a comma-separated string enclosed in square brackets (e.g. [true, NULL, false] for a boolean array).
  3. Calling Value#getAsStringList() will return a list of strings. The list of strings will contain:
    1. In case the value is an array: One string for each element in the array. Each element is a complete and valid string representation of the underlying array element value.
    2. In case the value is a non-array type: A singleton list containing only the string representation of the specific value.
    3. When getAsStringList() returns a list with one element, a user can always check Value#getType()#getCode() == Code.ARRAY to determine whether the value is an array with one element or a non-array type.
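
The string rules above can be modeled with a small stand-alone sketch. It does not use the client library: StringFormSketch and both of its methods are hypothetical stand-ins for Value#getAsString() and Value#getAsStringList(), and rendering a null element as NULL is an assumption taken from the boolean array example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of the string rules described above; not the client library's code. */
final class StringFormSketch {

  /** Arrays become "[a, b, c]"; nulls become "NULL" (assumption based on the example). */
  static String asString(Object value) {
    if (value == null) {
      return "NULL";
    }
    if (value instanceof List) {
      return "[" + String.join(", ", asStringList(value)) + "]";
    }
    return String.valueOf(value);
  }

  /** One string per array element, or a singleton list for a non-array value. */
  static List<String> asStringList(Object value) {
    if (value instanceof List) {
      List<String> result = new ArrayList<>();
      for (Object element : (List<?>) value) {
        result.add(asString(element));
      }
      return result;
    }
    return Collections.singletonList(asString(value));
  }

  public static void main(String[] args) {
    System.out.println(asString(Arrays.asList(true, null, false)));
    // Both calls below return a one-element list, which is why the type code
    // has to be checked to tell an array from a non-array value.
    System.out.println(asStringList(true));
    System.out.println(asStringList(Arrays.asList(true)));
  }
}
```

In this model a one-element array and a scalar produce the same list, which is the ambiguity that the Code.ARRAY check resolves.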

Decoding BYTES

BYTES columns are encoded as Base64 strings. Decoding them is relatively CPU-heavy, especially for large values, and it is not always necessary if the user only needs the Base64 string. Also, the internally used Guava decoder is less efficient than the JDK implementation that is available from Java 8 onwards. This change therefore delays the decoding of BYTES columns until it is actually necessary, and then uses the JDK implementation instead of the Guava version. The JDK implementation in OpenJDK 17 uses approximately one third of the CPU cycles of the Guava version.

This feature is combined with unrecognized types so it can make use of the new getAsString() methods. This prevents the introduction of what is (theoretically) a breaking change. Currently, Value#getString(..) throws an exception if you call it on a BYTES value. This remains the case after this change; users should instead call Value#getAsString() to get the underlying Base64 value of a BYTES column.
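
As a rough illustration of the lazy decoding idea, here is a minimal sketch that keeps the Base64 text as received and only decodes it on first use, using the JDK's java.util.Base64. LazyBytes and its methods are hypothetical names for this example and are not the classes changed in this PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical holder: stores the Base64 text and decodes it only when asked. */
final class LazyBytes {
  private final String base64; // value as received, never decoded eagerly
  private byte[] decoded;      // cached result of the first decode

  LazyBytes(String base64) {
    this.base64 = base64;
  }

  /** Returns the Base64 text without any decoding work. */
  String asBase64() {
    return base64;
  }

  /** Decodes with the JDK decoder on first call, then reuses the cached bytes. */
  synchronized byte[] toByteArray() {
    if (decoded == null) {
      decoded = Base64.getDecoder().decode(base64);
    }
    return decoded.clone(); // defensive copy so callers cannot modify the cache
  }

  synchronized boolean isDecoded() {
    return decoded != null;
  }

  public static void main(String[] args) {
    String encoded =
        Base64.getEncoder().encodeToString("hello".getBytes(StandardCharsets.UTF_8));
    LazyBytes value = new LazyBytes(encoded);
    System.out.println(value.asBase64());  // no decoding has happened yet
    System.out.println(value.isDecoded());
    System.out.println(new String(value.toByteArray(), StandardCharsets.UTF_8));
    System.out.println(value.isDecoded());
  }
}
```

A caller that only forwards the Base64 string never pays for the decode in this sketch, which is the saving described above.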

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: spanner Issues related to the googleapis/java-spanner API. labels Jan 5, 2023
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: m Pull request size is medium. labels Jan 11, 2023
@olavloite olavloite marked this pull request as ready for review January 13, 2023 12:58
@olavloite olavloite requested a review from a team as a code owner January 13, 2023 12:58
@olavloite olavloite changed the title perf: decode BYTES columns lazily feat: support UNRECOGNIZED types + decode BYTES columns lazily Jan 13, 2023
@gcf-owl-bot gcf-owl-bot bot requested a review from a team as a code owner January 13, 2023 13:28

@aseering aseering left a comment

Just did a quick scan and left a comment or two.

b.append("ARRAY<");
} else {
// This is very unlikely to happen. It would mean that we have introduced a type that
// is not an ARRAY, but does have an array element type.

Can the new PROTO type do this? @charvisingla

The PROTO type has its own field, proto_type_fqn, for specifying the proto type name. I agree that it's very unlikely that the array_element_type field would be used for a non-array type.

@@ -251,7 +251,6 @@ public Date getDate(String columnName) {

@Override
public Value getValue(int columnIndex) {
checkNonNull(columnIndex, columnIndex);
Contributor

Is this a breaking change? We are modifying the contract here, and I wonder whether any customer is relying on the old behavior. I would imagine not, but I am double-checking.

Collaborator Author

Hmm... That's a good point. I don't think anyone would be relying on it, but you're right that it is a change of the contract. In this case we will always be returning a non-null value, even if the database value is NULL (we're returning a Value instance whose internal value is null and whose type is set to the type of the column), so it is also a safe value to return. I think there are two (three) options here:

  1. Accept the small change in contract.
  2. Add an extra method named something like 'getValueOrNull'
  3. Do a major version bump (but I don't feel like this feature is worth that)

@thiagotnunes @rajatbhatta @ansh0l Any specific thoughts on this?

Contributor

1 makes sense to me as well. As discussed offline, the documentation for getValue also indicates that this method can be used with columns having a null value.

@@ -204,13 +210,23 @@ private Type(
Code code,
@Nullable Type arrayElementType,
@Nullable ImmutableList<StructField> structFields) {
this.proto = null;
Contributor

Could we always populate the corresponding proto type?

Collaborator Author

That's a little difficult. This constructor is used to create the fixed type constants for all known types in the client library. They do not have a proto definition available at construction time. See for example here:

private static final Type TYPE_BOOL = new Type(Code.BOOL, null, null);

@olavloite olavloite requested a review from rajatbhatta January 23, 2023 19:27
Contributor

@rajatbhatta rajatbhatta left a comment

LGTM, apart from a few nits or questions. Thank you @olavloite for working on this!

@olavloite
Collaborator Author

CI failure for Windows is a setup failure for that CI run and not related to this change.

@olavloite olavloite merged commit fc721c4 into main Feb 2, 2023
@olavloite olavloite deleted the lazy-decode-bytes branch February 2, 2023 10:10
gcf-merge-on-green bot pushed a commit that referenced this pull request Feb 8, 2023
🤖 I have created a release *beep* *boop*
---


## [6.36.0](https://togithub.com/googleapis/java-spanner/compare/v6.35.2...v6.36.0) (2023-02-08)


### Features

* Support UNRECOGNIZED types + decode BYTES columns lazily ([#2219](https://togithub.com/googleapis/java-spanner/issues/2219)) ([fc721c4](https://togithub.com/googleapis/java-spanner/commit/fc721c4d30de6ed9e5bc4fbbe0e1e7b79a5c7490))


### Bug Fixes

* **java:** Skip fixing poms for special modules ([#1744](https://togithub.com/googleapis/java-spanner/issues/1744)) ([#2244](https://togithub.com/googleapis/java-spanner/issues/2244)) ([e7f4b40](https://togithub.com/googleapis/java-spanner/commit/e7f4b4016f8c4c7e4fac0b822f5af2cffd181134))


### Dependencies

* Update dependency com.google.cloud:google-cloud-monitoring to v3.11.0 ([#2262](https://togithub.com/googleapis/java-spanner/issues/2262)) ([d566613](https://togithub.com/googleapis/java-spanner/commit/d566613442217bdfc69caea7242464fba2647519))
* Update dependency com.google.cloud:google-cloud-shared-dependencies to v3.2.0 ([#2264](https://togithub.com/googleapis/java-spanner/issues/2264)) ([b5fdbc0](https://togithub.com/googleapis/java-spanner/commit/b5fdbc0accdaaf1f63c62c1837d72bb378dc8f43))
* Update dependency com.google.cloud:google-cloud-trace to v2.10.0 ([#2263](https://togithub.com/googleapis/java-spanner/issues/2263)) ([96f0c81](https://togithub.com/googleapis/java-spanner/commit/96f0c8181aeb8ca75647a783d8b163f371ad937e))

---
This PR was generated with [Release Please](https://togithub.com/googleapis/release-please). See [documentation](https://togithub.com/googleapis/release-please#release-please).
.setName("c")
.setType(
Type.newBuilder()
.setCodeValue(Integer.MAX_VALUE)
Collaborator Author

This line sets a type code value that does not exist, so the client library will treat the column as having an unrecognized type.
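
The fallback this test relies on can be sketched roughly as follows. SketchCode, fromNumber and the numbers assigned to each constant are hypothetical and chosen for illustration only; this is not the library's Type.Code enum or its actual mapping logic.

```java
/** Hypothetical type-code enum; names and numbers are illustrative only. */
enum SketchCode {
  BOOL(1),
  INT64(2),
  UNRECOGNIZED(-1);

  private final int number;

  SketchCode(int number) {
    this.number = number;
  }

  /** Maps a wire number to a known code, falling back to UNRECOGNIZED. */
  static SketchCode fromNumber(int number) {
    for (SketchCode code : values()) {
      if (code != UNRECOGNIZED && code.number == number) {
        return code;
      }
    }
    return UNRECOGNIZED;
  }
}

final class UnrecognizedCodeDemo {
  public static void main(String[] args) {
    // A number no constant uses, like Integer.MAX_VALUE in the test above,
    // falls through to UNRECOGNIZED instead of causing an error.
    System.out.println(SketchCode.fromNumber(1));
    System.out.println(SketchCode.fromNumber(Integer.MAX_VALUE));
  }
}
```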

Labels
api: spanner Issues related to the googleapis/java-spanner API. size: xl Pull request size is extra large.