-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support UNRECOGNIZED types + decode BYTES columns lazily #2219
Conversation
BYTES columns are encoded as Base64 strings. Decoding these are relatively CPU-heavy, especially for large values. Decoding them is not always necessary if the user only needs the Base64 string. Also, the internally used Guava decoder is less efficient than JDK implementations that are available from Java 8 and onwards. This change therefore delays the decoding of BYTES columns until it is actually necessary, and then uses the JDK implementation instead of the Guava version. The JDK implementation in OpenJDK 17 uses approx 1/3 of the CPU cycles of the Guava version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just did a quick scan and left a comment or two.
b.append("ARRAY<"); | ||
} else { | ||
// This is very unlikely to happen. It would mean that we have introduced a type that | ||
// is not an ARRAY, but does have an array element type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the new PROTO type do this? @charvisingla
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proto type has its own field called proto_type_fqn for specifying the proto type name. I agree that its very unlikely to use array_element_type field for a non array type.
@@ -251,7 +251,6 @@ public Date getDate(String columnName) { | |||
|
|||
@Override | |||
public Value getValue(int columnIndex) { | |||
checkNonNull(columnIndex, columnIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a breaking change? We are modifying the contract here, I wonder if any customer is relying on this? I would imagine not, but double checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... That's a good point. I don't think anyone would be relying on it, but you're right that it is a change of the contract. In this case we will always be returning a non-null value, even if the database value is NULL
(we're returning a Value
instance whose internal value is null and whose type is set to the type of the column), so it also a safe value to return. I think there are two (three) options here:
- Accept the small change in contract.
- Add an extra method named something like 'getValueOrNull`
- Do a major version bump (but I don't feel like this feature is worth that)
@thiagotnunes @rajatbhatta @ansh0l Any specific thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 makes sense to me as well. As discussed offline, the documentation for getValue also indicates that this method can be used with columns having a null
value.
@@ -204,13 +210,23 @@ private Type( | |||
Code code, | |||
@Nullable Type arrayElementType, | |||
@Nullable ImmutableList<StructField> structFields) { | |||
this.proto = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we always populate the corresponding proto type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a little difficult. This constructor is used to create the fixed type constants for all known types in the client library. They do not have a proto definition available at construction time. See for example here:
java-spanner/google-cloud-spanner/src/main/java/com/google/cloud/spanner/Type.java
Line 49 in f637f1e
private static final Type TYPE_BOOL = new Type(Code.BOOL, null, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, apart from a few nits or questions. Thank you @olavloite for working on this!
google-cloud-spanner/src/main/java/com/google/cloud/spanner/Type.java
Outdated
Show resolved
Hide resolved
@@ -251,7 +251,6 @@ public Date getDate(String columnName) { | |||
|
|||
@Override | |||
public Value getValue(int columnIndex) { | |||
checkNonNull(columnIndex, columnIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 makes sense to me as well. As discussed offline, the documentation for getValue also indicates that this method can be used with columns having a null
value.
google-cloud-spanner/src/main/java/com/google/cloud/spanner/AbstractResultSet.java
Outdated
Show resolved
Hide resolved
google-cloud-spanner/src/main/java/com/google/cloud/spanner/AbstractResultSet.java
Show resolved
Hide resolved
google-cloud-spanner/src/main/java/com/google/cloud/spanner/AbstractResultSet.java
Outdated
Show resolved
Hide resolved
google-cloud-spanner/src/main/java/com/google/cloud/spanner/Value.java
Outdated
Show resolved
Hide resolved
...loud-spanner/src/test/java/com/google/cloud/spanner/connection/RandomResultSetGenerator.java
Show resolved
Hide resolved
...loud-spanner/src/test/java/com/google/cloud/spanner/connection/RandomResultSetGenerator.java
Show resolved
Hide resolved
CI failure for Windows is a setup failure for that CI run and not related to this change. |
🤖 I have created a release *beep* *boop* --- ## [6.36.0](https://togithub.com/googleapis/java-spanner/compare/v6.35.2...v6.36.0) (2023-02-08) ### Features * Support UNRECOGNIZED types + decode BYTES columns lazily ([#2219](https://togithub.com/googleapis/java-spanner/issues/2219)) ([fc721c4](https://togithub.com/googleapis/java-spanner/commit/fc721c4d30de6ed9e5bc4fbbe0e1e7b79a5c7490)) ### Bug Fixes * **java:** Skip fixing poms for special modules ([#1744](https://togithub.com/googleapis/java-spanner/issues/1744)) ([#2244](https://togithub.com/googleapis/java-spanner/issues/2244)) ([e7f4b40](https://togithub.com/googleapis/java-spanner/commit/e7f4b4016f8c4c7e4fac0b822f5af2cffd181134)) ### Dependencies * Update dependency com.google.cloud:google-cloud-monitoring to v3.11.0 ([#2262](https://togithub.com/googleapis/java-spanner/issues/2262)) ([d566613](https://togithub.com/googleapis/java-spanner/commit/d566613442217bdfc69caea7242464fba2647519)) * Update dependency com.google.cloud:google-cloud-shared-dependencies to v3.2.0 ([#2264](https://togithub.com/googleapis/java-spanner/issues/2264)) ([b5fdbc0](https://togithub.com/googleapis/java-spanner/commit/b5fdbc0accdaaf1f63c62c1837d72bb378dc8f43)) * Update dependency com.google.cloud:google-cloud-trace to v2.10.0 ([#2263](https://togithub.com/googleapis/java-spanner/issues/2263)) ([96f0c81](https://togithub.com/googleapis/java-spanner/commit/96f0c8181aeb8ca75647a783d8b163f371ad937e)) --- This PR was generated with [Release Please](https://togithub.com/googleapis/release-please). See [documentation](https://togithub.com/googleapis/release-please#release-please).
.setName("c") | ||
.setType( | ||
Type.newBuilder() | ||
.setCodeValue(Integer.MAX_VALUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line sets a type code value that does not exist, and will be seen as an unrecognized type in the client library
Unrecognized Types
Adds support for types that do not have a known type code in the client library (yet). This can happen if a new type is added to Cloud Spanner before this client is updated, or if someone is using an old version of the client library with a version of Cloud Spanner that contains a type that was not available at the time that the specific version of the client library was built.
Unrecognized types are assigned a type with type code
UNRECOGNIZED
. The values of these types (and all other types) can be retrieved as follows:ResultSet#getValue(..)
will return the column value as aValue
instance.Value#getAsString()
will return a string representation of the given value. The string value will be a valid and complete representation of the underlying value. This method is guaranteed to work for all known and unknown types. Array types will return a comma-separated string enclosed in square brackets (e.g.[true, NULL, false]
for a boolean array).Value#getAsStringList()
will return a list of strings. The list of strings will contain:Value#getType()#getCode() == Code.ARRAY
to check whether a value is an array with one element, or whether it is a non-array type to determine what the reason is thatgetAsStringList()
returns a list with one element.Decoding BYTES
BYTES columns are encoded as Base64 strings. Decoding these are relatively CPU-heavy, especially for large values. Decoding them is not always necessary if the user only needs the Base64 string. Also, the internally used Guava decoder is less efficient than JDK implementations that are available from Java 8 and onwards. This change therefore delays the decoding of BYTES columns until it is actually necessary, and then uses the JDK implementation instead of the Guava version. The JDK implementation in OpenJDK 17 uses approx 1/3 of the CPU cycles of the Guava version.
This feature is combined with unrecognized types so it can make use of the new
getAsString()
methods. This prevents the introduction of what is (theoretically) a breaking change. Currently,Value#getString(..)
throws an exception if you try to call this on aBYTES
value. This remains the case after this change, and instead users should callValue#getAsString()
to get the underlying Base64 value of aBYTES
column.