-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: retry pdml transaction on EOS internal error #360
fix: retry pdml transaction on EOS internal error #360
Conversation
It is possible to have the stream closed with an EOS internal error. This should be retried by the client. On this PR we add this retry logic.
@@ -55,23 +56,6 @@ | |||
this.rpc = rpc; | |||
} | |||
|
|||
private ByteString initTransaction() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method was just moved down (after the public methods).
@@ -127,20 +111,24 @@ long executeStreamingPartitionedUpdate(final Statement statement, final Duration | |||
} | |||
} | |||
break; | |||
} catch (UnavailableException e) { | |||
} catch (UnavailableException | InternalException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this catch block we retry on EOS exception.
private boolean shouldResumeOrRestartTransaction(Exception e) { | ||
return e instanceof UnavailableException | ||
|| (e instanceof InternalException | ||
&& e.getMessage().contains("Received unexpected EOS on DATA frame from server")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hacky, but unfortunately we do not get a specific exception for this error, so we have to proxy through the exception message. This follows the approach taken in the bigquery client: https://github.com/googleapis/java-bigquerystorage/pull/263/files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same error is also retried for streaming queries, and this check could probably use the existing check in SpannerExceptionFactory
:
Line 233 in b264100
private static boolean isRetryable(ErrorCode code, @Nullable Throwable cause) { |
Codecov Report
|
@thiagotnunes I'm going to hold off on reviewing this until I better understand why this error happens. Hope it's ok to wait a couple of days until the Spanner team has had a chance to look at the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this change @thiagotnunes. I would also like @olavloite to take a look before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks generally good to me, but could you have a look and see if it's possible to reuse the existing isRetryable
method for the internal exception?
private boolean shouldResumeOrRestartTransaction(Exception e) { | ||
return e instanceof UnavailableException | ||
|| (e instanceof InternalException | ||
&& e.getMessage().contains("Received unexpected EOS on DATA frame from server")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same error is also retried for streaming queries, and this check could probably use the existing check in SpannerExceptionFactory
:
Line 233 in b264100
private static boolean isRetryable(ErrorCode code, @Nullable Throwable cause) { |
Thanks for the feedback @olavloite, I did not know about that functionality! I will make a change in this PR to use that. |
…nto retry-pdml-on-eos
When the exception is an EOS, we should retry the exception.
Re-uses the retry logic applied by the spanner exception factory for retrying EOS internal exceptions in pdml transactions.
@@ -257,35 +256,8 @@ private static boolean hasCauseMatching( | |||
} | |||
|
|||
private static class Matchers { | |||
static final Predicate<Throwable> isRetryableInternalError = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extracted these predicates into their own classes that are now exposed to the PartitionedDmlTransaction
. This was the cleanest / lowest impact way I could find to de-duplicate the EOS retryable logic.
@olavloite @skuruppu do you mind giving the PR another look? I did the following changes:
|
public boolean apply(Throwable cause) { | ||
if (isInternalError(cause)) { | ||
if (cause.getMessage().contains(HTTP2_ERROR_MESSAGE)) { | ||
// See b/25451313. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below, if we could remove the references to internal issues that would be great. I don't know why they were there before. Folks working on this repo are unlikely to have access to those so no point exposing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, will tackle this in a following PR.
@thiagotnunes Thanks for extracting the predicates for the retryable internal errors. |
🤖 I have created a release \*beep\* \*boop\* --- ## [1.60.0](https://www.github.com/googleapis/java-spanner/compare/v1.59.0...v1.60.0) (2020-08-18) ### Features * adds clirr check on pre-commit hook ([#388](https://www.github.com/googleapis/java-spanner/issues/388)) ([bd5c93f](https://www.github.com/googleapis/java-spanner/commit/bd5c93f045e06372b2235f3d350bade93bff2c24)) * include SQL statement in error message ([#355](https://www.github.com/googleapis/java-spanner/issues/355)) ([cc5ac48](https://www.github.com/googleapis/java-spanner/commit/cc5ac48232b6e4550b98d213c5877d6ec37b293f)) ### Bug Fixes * enables emulator tests ([#380](https://www.github.com/googleapis/java-spanner/issues/380)) ([f61c6d0](https://www.github.com/googleapis/java-spanner/commit/f61c6d0d332f15826499996a292acc7cbab267a7)) * remove custom timeout and retry settings ([#365](https://www.github.com/googleapis/java-spanner/issues/365)) ([f6afd21](https://www.github.com/googleapis/java-spanner/commit/f6afd213430d3f06d9a72c64a5c37172840fed0e)) * remove unused kokoro files ([#367](https://www.github.com/googleapis/java-spanner/issues/367)) ([6125c7d](https://www.github.com/googleapis/java-spanner/commit/6125c7d221c77f4c42497b72107627ee09312813)) * retry pdml transaction on EOS internal error ([#360](https://www.github.com/googleapis/java-spanner/issues/360)) ([a53d736](https://www.github.com/googleapis/java-spanner/commit/a53d7369bb2a8640ab42e409632b352decbdbf5e)) * sets the project for the integration tests ([#386](https://www.github.com/googleapis/java-spanner/issues/386)) ([c8fa458](https://www.github.com/googleapis/java-spanner/commit/c8fa458f5369a09c780ee38ecc09bd2562e8f987)) ### Dependencies * stop auto updates of commons-lang3 ([#362](https://www.github.com/googleapis/java-spanner/issues/362)) ([8f07ed6](https://www.github.com/googleapis/java-spanner/commit/8f07ed6b44f9c70f56b9ee2e4505c40385337ca7)) * update dependency com.google.cloud:google-cloud-shared-dependencies to v0.8.6 ([#374](https://www.github.com/googleapis/java-spanner/issues/374)) ([6f47b8a](https://www.github.com/googleapis/java-spanner/commit/6f47b8a759643f772230df0c2e153338d44f70ce)) * update dependency org.openjdk.jmh:jmh-core to v1.24 ([#375](https://www.github.com/googleapis/java-spanner/issues/375)) ([94f568c](https://www.github.com/googleapis/java-spanner/commit/94f568cf731ba22cac7f0d898d7776a3cc2c178f)) * update dependency org.openjdk.jmh:jmh-core to v1.25 ([#382](https://www.github.com/googleapis/java-spanner/issues/382)) ([ec7888e](https://www.github.com/googleapis/java-spanner/commit/ec7888e1d62cf800bf6ad166d242e89443ddc7aa)) * update dependency org.openjdk.jmh:jmh-generator-annprocess to v1.25 ([#376](https://www.github.com/googleapis/java-spanner/issues/376)) ([8ffdc48](https://www.github.com/googleapis/java-spanner/commit/8ffdc481e15901f78eac592bd8d4bef33ac3378a)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please).
🤖 I have created a release \*beep\* \*boop\* --- ## [1.60.0](https://www.github.com/googleapis/java-spanner/compare/v1.59.0...v1.60.0) (2020-08-18) ### Features * adds clirr check on pre-commit hook ([googleapis#388](https://www.github.com/googleapis/java-spanner/issues/388)) ([bd5c93f](https://www.github.com/googleapis/java-spanner/commit/bd5c93f045e06372b2235f3d350bade93bff2c24)) * include SQL statement in error message ([googleapis#355](https://www.github.com/googleapis/java-spanner/issues/355)) ([cc5ac48](https://www.github.com/googleapis/java-spanner/commit/cc5ac48232b6e4550b98d213c5877d6ec37b293f)) ### Bug Fixes * enables emulator tests ([googleapis#380](https://www.github.com/googleapis/java-spanner/issues/380)) ([f61c6d0](https://www.github.com/googleapis/java-spanner/commit/f61c6d0d332f15826499996a292acc7cbab267a7)) * remove custom timeout and retry settings ([googleapis#365](https://www.github.com/googleapis/java-spanner/issues/365)) ([f6afd21](https://www.github.com/googleapis/java-spanner/commit/f6afd213430d3f06d9a72c64a5c37172840fed0e)) * remove unused kokoro files ([googleapis#367](https://www.github.com/googleapis/java-spanner/issues/367)) ([6125c7d](https://www.github.com/googleapis/java-spanner/commit/6125c7d221c77f4c42497b72107627ee09312813)) * retry pdml transaction on EOS internal error ([googleapis#360](https://www.github.com/googleapis/java-spanner/issues/360)) ([a53d736](https://www.github.com/googleapis/java-spanner/commit/a53d7369bb2a8640ab42e409632b352decbdbf5e)) * sets the project for the integration tests ([googleapis#386](https://www.github.com/googleapis/java-spanner/issues/386)) ([c8fa458](https://www.github.com/googleapis/java-spanner/commit/c8fa458f5369a09c780ee38ecc09bd2562e8f987)) ### Dependencies * stop auto updates of commons-lang3 ([googleapis#362](https://www.github.com/googleapis/java-spanner/issues/362)) ([8f07ed6](https://www.github.com/googleapis/java-spanner/commit/8f07ed6b44f9c70f56b9ee2e4505c40385337ca7)) * update dependency com.google.cloud:google-cloud-shared-dependencies to v0.8.6 ([googleapis#374](https://www.github.com/googleapis/java-spanner/issues/374)) ([6f47b8a](https://www.github.com/googleapis/java-spanner/commit/6f47b8a759643f772230df0c2e153338d44f70ce)) * update dependency org.openjdk.jmh:jmh-core to v1.24 ([googleapis#375](https://www.github.com/googleapis/java-spanner/issues/375)) ([94f568c](https://www.github.com/googleapis/java-spanner/commit/94f568cf731ba22cac7f0d898d7776a3cc2c178f)) * update dependency org.openjdk.jmh:jmh-core to v1.25 ([googleapis#382](https://www.github.com/googleapis/java-spanner/issues/382)) ([ec7888e](https://www.github.com/googleapis/java-spanner/commit/ec7888e1d62cf800bf6ad166d242e89443ddc7aa)) * update dependency org.openjdk.jmh:jmh-generator-annprocess to v1.25 ([googleapis#376](https://www.github.com/googleapis/java-spanner/issues/376)) ([8ffdc48](https://www.github.com/googleapis/java-spanner/commit/8ffdc481e15901f78eac592bd8d4bef33ac3378a)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please).
This is an auto-generated regeneration of the .pb.go files by cloud.google.com/go/internal/gapicgen. Once this PR is submitted, genmgr will update the corresponding CL at gocloud to depend on the newer version of go-genproto, and assign reviewers. Whilst this or any regen PR is open in go-genproto, gapicgen will not create any more regeneration PRs or CLs. If all regen PRs are closed, gapicgen will create a new set of regeneration PRs and CLs once per night. If you have been assigned to review this CL, please: - Ensure that CI is passing. If it's failing, it requires your manual attention. - Approve and submit this PR if you believe it's ready to ship. That will prompt genmgr to assign reviewers to the gocloud CL. Corresponding gocloud CL: https://code-review.googlesource.com/c/gocloud/+/56190
This PR was generated using Autosynth. 🌈 Synth log will be available here: https://source.cloud.google.com/results/invocations/0a285685-944f-4f44-84fb-761c5c361020/targets - [ ] To automatically regenerate this PR, check this box. Source-Link: googleapis/synthtool@b416a7b
The Spanner server can return an internal error closing the connection stream. When this happens an
InternalException
is thrown with a specificEOS
message. We are seeing this specially in long running PDML tasks.This case is now retried by the client, either by resuming the existing transaction (when there is a resume token) or by starting a new transaction.