Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: retry pdml transaction on EOS internal error #360

Merged
merged 8 commits into from
Jul 26, 2020
Merged

fix: retry pdml transaction on EOS internal error #360

merged 8 commits into from
Jul 26, 2020

Conversation

thiagotnunes
Copy link
Contributor

@thiagotnunes thiagotnunes commented Jul 20, 2020

The Spanner server can return an internal error closing the connection stream. When this happens an InternalException is thrown with a specific EOS message. We are seeing this specially in long running PDML tasks.

This case is now retried by the client, either by resuming the existing transaction (when there is a resume token) or by starting a new transaction.

@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jul 20, 2020
It is possible to have the stream closed with an EOS internal error.
This should be retried by the client. On this PR we add this retry logic.
@@ -55,23 +56,6 @@
this.rpc = rpc;
}

private ByteString initTransaction() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method was just moved down (after the public methods).

@@ -127,20 +111,24 @@ long executeStreamingPartitionedUpdate(final Statement statement, final Duration
}
}
break;
} catch (UnavailableException e) {
} catch (UnavailableException | InternalException e) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this catch block we retry on EOS exception.

private boolean shouldResumeOrRestartTransaction(Exception e) {
return e instanceof UnavailableException
|| (e instanceof InternalException
&& e.getMessage().contains("Received unexpected EOS on DATA frame from server"));
Copy link
Contributor Author

@thiagotnunes thiagotnunes Jul 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hacky, but unfortunately we do not get a specific exception for this error, so we have to proxy through the exception message. This follows the approach taken in the bigquery client: https://github.com/googleapis/java-bigquerystorage/pull/263/files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same error is also retried for streaming queries, and this check could probably use the existing check in SpannerExceptionFactory:

private static boolean isRetryable(ErrorCode code, @Nullable Throwable cause) {
. It's still just as hacky, but it would keep it confined to one method.

@thiagotnunes thiagotnunes changed the title Retry PDML Transaction on EOS Internal Error fix: retry pdml transaction on EOS internal error Jul 21, 2020
@codecov
Copy link

codecov bot commented Jul 21, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@87f4f13). Click here to learn what that means.
The diff coverage is n/a.

@skuruppu
Copy link
Contributor

@thiagotnunes I'm going to hold off on reviewing this until I better understand why this error happens. Hope it's ok to wait a couple of days until the Spanner team has had a chance to look at the issue.

@thiagotnunes thiagotnunes added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 21, 2020
Copy link
Contributor

@skuruppu skuruppu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this change @thiagotnunes. I would also like @olavloite to take a look before merging.

@thiagotnunes thiagotnunes removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 21, 2020
Copy link
Collaborator

@olavloite olavloite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks generally good to me, but could you have a look and see if it's possible to reuse the existing isRetryable method for the internal exception?

private boolean shouldResumeOrRestartTransaction(Exception e) {
return e instanceof UnavailableException
|| (e instanceof InternalException
&& e.getMessage().contains("Received unexpected EOS on DATA frame from server"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same error is also retried for streaming queries, and this check could probably use the existing check in SpannerExceptionFactory:

private static boolean isRetryable(ErrorCode code, @Nullable Throwable cause) {
. It's still just as hacky, but it would keep it confined to one method.

@thiagotnunes
Copy link
Contributor Author

Thanks for the feedback @olavloite, I did not know about that functionality! I will make a change in this PR to use that.

When the exception is an EOS, we should retry the exception.
Re-uses the retry logic applied by the spanner exception factory for
retrying EOS internal exceptions in pdml transactions.
@@ -257,35 +256,8 @@ private static boolean hasCauseMatching(
}

private static class Matchers {
static final Predicate<Throwable> isRetryableInternalError =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted these predicates into their own classes that are now exposed to the PartitionedDmlTransaction. This was the cleanest / lowest impact way I could find to de-duplicate the EOS retryable logic.

@thiagotnunes
Copy link
Contributor Author

thiagotnunes commented Jul 24, 2020

@olavloite @skuruppu do you mind giving the PR another look? I did the following changes:

  • Extracted predicates from the SpannerExceptionFactory in order to use them in the PartitionedDmlTransaction, thus removing the EOS logic duplication.
  • Changed the PartitionedDmlTransaction class to use the extracted predicate.
  • Added tests for the extracted classes.

public boolean apply(Throwable cause) {
if (isInternalError(cause)) {
if (cause.getMessage().contains(HTTP2_ERROR_MESSAGE)) {
// See b/25451313.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below, if we could remove the references to internal issues that would be great. I don't know why they were there before. Folks working on this repo are unlikely to have access to those so no point exposing them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, will tackle this in a following PR.

@olavloite
Copy link
Collaborator

@thiagotnunes Thanks for extracting the predicates for the retryable internal errors.
I have no further comments, this looks good to me.

gcf-merge-on-green bot pushed a commit that referenced this pull request Aug 20, 2020
🤖 I have created a release \*beep\* \*boop\* 
---
## [1.60.0](https://www.github.com/googleapis/java-spanner/compare/v1.59.0...v1.60.0) (2020-08-18)


### Features

* adds clirr check on pre-commit hook ([#388](https://www.github.com/googleapis/java-spanner/issues/388)) ([bd5c93f](https://www.github.com/googleapis/java-spanner/commit/bd5c93f045e06372b2235f3d350bade93bff2c24))
* include SQL statement in error message ([#355](https://www.github.com/googleapis/java-spanner/issues/355)) ([cc5ac48](https://www.github.com/googleapis/java-spanner/commit/cc5ac48232b6e4550b98d213c5877d6ec37b293f))


### Bug Fixes

* enables emulator tests ([#380](https://www.github.com/googleapis/java-spanner/issues/380)) ([f61c6d0](https://www.github.com/googleapis/java-spanner/commit/f61c6d0d332f15826499996a292acc7cbab267a7))
* remove custom timeout and retry settings ([#365](https://www.github.com/googleapis/java-spanner/issues/365)) ([f6afd21](https://www.github.com/googleapis/java-spanner/commit/f6afd213430d3f06d9a72c64a5c37172840fed0e))
* remove unused kokoro files ([#367](https://www.github.com/googleapis/java-spanner/issues/367)) ([6125c7d](https://www.github.com/googleapis/java-spanner/commit/6125c7d221c77f4c42497b72107627ee09312813))
* retry pdml transaction on EOS internal error ([#360](https://www.github.com/googleapis/java-spanner/issues/360)) ([a53d736](https://www.github.com/googleapis/java-spanner/commit/a53d7369bb2a8640ab42e409632b352decbdbf5e))
* sets the project for the integration tests ([#386](https://www.github.com/googleapis/java-spanner/issues/386)) ([c8fa458](https://www.github.com/googleapis/java-spanner/commit/c8fa458f5369a09c780ee38ecc09bd2562e8f987))


### Dependencies

* stop auto updates of commons-lang3 ([#362](https://www.github.com/googleapis/java-spanner/issues/362)) ([8f07ed6](https://www.github.com/googleapis/java-spanner/commit/8f07ed6b44f9c70f56b9ee2e4505c40385337ca7))
* update dependency com.google.cloud:google-cloud-shared-dependencies to v0.8.6 ([#374](https://www.github.com/googleapis/java-spanner/issues/374)) ([6f47b8a](https://www.github.com/googleapis/java-spanner/commit/6f47b8a759643f772230df0c2e153338d44f70ce))
* update dependency org.openjdk.jmh:jmh-core to v1.24 ([#375](https://www.github.com/googleapis/java-spanner/issues/375)) ([94f568c](https://www.github.com/googleapis/java-spanner/commit/94f568cf731ba22cac7f0d898d7776a3cc2c178f))
* update dependency org.openjdk.jmh:jmh-core to v1.25 ([#382](https://www.github.com/googleapis/java-spanner/issues/382)) ([ec7888e](https://www.github.com/googleapis/java-spanner/commit/ec7888e1d62cf800bf6ad166d242e89443ddc7aa))
* update dependency org.openjdk.jmh:jmh-generator-annprocess to v1.25 ([#376](https://www.github.com/googleapis/java-spanner/issues/376)) ([8ffdc48](https://www.github.com/googleapis/java-spanner/commit/8ffdc481e15901f78eac592bd8d4bef33ac3378a))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please).
thiagotnunes pushed a commit to thiagotnunes/java-spanner that referenced this pull request Jun 5, 2021
🤖 I have created a release \*beep\* \*boop\* 
---
## [1.60.0](https://www.github.com/googleapis/java-spanner/compare/v1.59.0...v1.60.0) (2020-08-18)


### Features

* adds clirr check on pre-commit hook ([googleapis#388](https://www.github.com/googleapis/java-spanner/issues/388)) ([bd5c93f](https://www.github.com/googleapis/java-spanner/commit/bd5c93f045e06372b2235f3d350bade93bff2c24))
* include SQL statement in error message ([googleapis#355](https://www.github.com/googleapis/java-spanner/issues/355)) ([cc5ac48](https://www.github.com/googleapis/java-spanner/commit/cc5ac48232b6e4550b98d213c5877d6ec37b293f))


### Bug Fixes

* enables emulator tests ([googleapis#380](https://www.github.com/googleapis/java-spanner/issues/380)) ([f61c6d0](https://www.github.com/googleapis/java-spanner/commit/f61c6d0d332f15826499996a292acc7cbab267a7))
* remove custom timeout and retry settings ([googleapis#365](https://www.github.com/googleapis/java-spanner/issues/365)) ([f6afd21](https://www.github.com/googleapis/java-spanner/commit/f6afd213430d3f06d9a72c64a5c37172840fed0e))
* remove unused kokoro files ([googleapis#367](https://www.github.com/googleapis/java-spanner/issues/367)) ([6125c7d](https://www.github.com/googleapis/java-spanner/commit/6125c7d221c77f4c42497b72107627ee09312813))
* retry pdml transaction on EOS internal error ([googleapis#360](https://www.github.com/googleapis/java-spanner/issues/360)) ([a53d736](https://www.github.com/googleapis/java-spanner/commit/a53d7369bb2a8640ab42e409632b352decbdbf5e))
* sets the project for the integration tests ([googleapis#386](https://www.github.com/googleapis/java-spanner/issues/386)) ([c8fa458](https://www.github.com/googleapis/java-spanner/commit/c8fa458f5369a09c780ee38ecc09bd2562e8f987))


### Dependencies

* stop auto updates of commons-lang3 ([googleapis#362](https://www.github.com/googleapis/java-spanner/issues/362)) ([8f07ed6](https://www.github.com/googleapis/java-spanner/commit/8f07ed6b44f9c70f56b9ee2e4505c40385337ca7))
* update dependency com.google.cloud:google-cloud-shared-dependencies to v0.8.6 ([googleapis#374](https://www.github.com/googleapis/java-spanner/issues/374)) ([6f47b8a](https://www.github.com/googleapis/java-spanner/commit/6f47b8a759643f772230df0c2e153338d44f70ce))
* update dependency org.openjdk.jmh:jmh-core to v1.24 ([googleapis#375](https://www.github.com/googleapis/java-spanner/issues/375)) ([94f568c](https://www.github.com/googleapis/java-spanner/commit/94f568cf731ba22cac7f0d898d7776a3cc2c178f))
* update dependency org.openjdk.jmh:jmh-core to v1.25 ([googleapis#382](https://www.github.com/googleapis/java-spanner/issues/382)) ([ec7888e](https://www.github.com/googleapis/java-spanner/commit/ec7888e1d62cf800bf6ad166d242e89443ddc7aa))
* update dependency org.openjdk.jmh:jmh-generator-annprocess to v1.25 ([googleapis#376](https://www.github.com/googleapis/java-spanner/issues/376)) ([8ffdc48](https://www.github.com/googleapis/java-spanner/commit/8ffdc481e15901f78eac592bd8d4bef33ac3378a))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please).
ansh0l pushed a commit to ansh0l/java-spanner that referenced this pull request Nov 10, 2022
This is an auto-generated regeneration of the .pb.go files by
cloud.google.com/go/internal/gapicgen. Once this PR is submitted, genmgr will
update the corresponding CL at gocloud to depend on the newer version of
go-genproto, and assign reviewers. Whilst this or any regen PR is open in
go-genproto, gapicgen will not create any more regeneration PRs or CLs. If all
regen PRs are closed, gapicgen will create a new set of regeneration PRs and
CLs once per night.

If you have been assigned to review this CL, please:

- Ensure that CI is passing. If it's failing, it requires your manual attention.
- Approve and submit this PR if you believe it's ready to ship. That will prompt
  genmgr to assign reviewers to the gocloud CL.

Corresponding gocloud CL: https://code-review.googlesource.com/c/gocloud/+/56190
rajatbhatta pushed a commit to rajatbhatta/java-spanner that referenced this pull request Nov 17, 2022
This PR was generated using Autosynth. 🌈

Synth log will be available here:
https://source.cloud.google.com/results/invocations/0a285685-944f-4f44-84fb-761c5c361020/targets

- [ ] To automatically regenerate this PR, check this box.

Source-Link: googleapis/synthtool@b416a7b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants