-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: respect PDML timeout when using streaming RPC #338
Conversation
@olavloite will do so on Monday and will let you know. Digging in a little on the fix, it seems we just needed to fix the keepalive for the rpc calls, so am I right to assume that the thinking is that the Thanks for looking into this! |
I got another exception when running the code. I think there is a bug on using the Exception in thread "main" com.google.cloud.spanner.SpannerException: UNKNOWN: Invalid timeout: < 0 s
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:211)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:57)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:151)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:94)
at com.google.cloud.spanner.PartitionedDMLTransaction.executeStreamingPartitionedUpdate(PartitionedDMLTransaction.java:156)
at com.google.cloud.spanner.SessionImpl.executePartitionedUpdate(SessionImpl.java:118)
at com.google.cloud.spanner.SessionPool$PooledSession.executePartitionedUpdate(SessionPool.java:1349)
at com.google.cloud.spanner.SessionPool$PooledSessionFuture.executePartitionedUpdate(SessionPool.java:1236)
at com.google.cloud.spanner.DatabaseClientImpl$3.apply(DatabaseClientImpl.java:227)
at com.google.cloud.spanner.DatabaseClientImpl$3.apply(DatabaseClientImpl.java:224)
at com.google.cloud.spanner.DatabaseClientImpl.runWithSessionRetry(DatabaseClientImpl.java:241)
at com.google.cloud.spanner.DatabaseClientImpl.executePartitionedUpdate(DatabaseClientImpl.java:222)
at com.google.cloud.spanner.PdmlIssue.main(PdmlIssue.java:14)
Caused by: java.lang.IllegalArgumentException: Invalid timeout: < 0 s
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:142)
at com.google.api.gax.grpc.GrpcCallContext.withStreamWaitTimeout(GrpcCallContext.java:177)
at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.executeStreamingPartitionedDml(GapicSpannerRpc.java:1108)
at com.google.cloud.spanner.PartitionedDMLTransaction.executeStreamingPartitionedUpdate(PartitionedDMLTransaction.java:113)
... 8 more This is my executor: final SpannerOptions options = SpannerOptions.newBuilder().build();
final Spanner spanner = options.getService();
final DatabaseId id = DatabaseId.of(PROJECT, INSTANCE, DATABASE);
final DatabaseClient client = spanner.getDatabaseClient(id);
client.executePartitionedUpdate(Statement.of(
"UPDATE Table SET bool = true WHERE true"
)); |
In any case, the failure occurs after the specified timeout (2 hours by default). |
@thiagotnunes Thanks for the test results. I've updated the retry-loop to take a potential zero or negative deadline into account, and I've added some additional unit tests for this class. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest tests that I've done have terminated with a com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED
, but the weird thing is that the process did not take 2 hours to fail, but around 68 mins, so it does seem that the 2 hours window is still not being respected.
I might have misspoke when I said that the previous one failed in 2 hours, I thought it did, but I had no timer in runner code. I added it now. These are the logs from the execution:
Started running at Tue Jul 14 12:41:21 AEST 2020
Exception in thread "main" com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED
Finished running at Tue Jul 14 13:49:33 AEST 2020 (took 4092380ms)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:211)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:57)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:52)
at com.google.cloud.spanner.SpannerExceptionFactory.fromApiException(SpannerExceptionFactory.java:229)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:144)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:94)
at com.google.cloud.spanner.PartitionedDMLTransaction.executeStreamingPartitionedUpdate(PartitionedDMLTransaction.java:162)
at com.google.cloud.spanner.SessionImpl.executePartitionedUpdate(SessionImpl.java:118)
at com.google.cloud.spanner.SessionPool$PooledSession.executePartitionedUpdate(SessionPool.java:1349)
at com.google.cloud.spanner.SessionPool$PooledSessionFuture.executePartitionedUpdate(SessionPool.java:1236)
at com.google.cloud.spanner.DatabaseClientImpl$3.apply(DatabaseClientImpl.java:227)
at com.google.cloud.spanner.DatabaseClientImpl$3.apply(DatabaseClientImpl.java:224)
at com.google.cloud.spanner.DatabaseClientImpl.runWithSessionRetry(DatabaseClientImpl.java:241)
at com.google.cloud.spanner.DatabaseClientImpl.executePartitionedUpdate(DatabaseClientImpl.java:222)
at com.google.cloud.spanner.PmlIssue.main(PmlIssue.java:19)
Process finished with exit code 1
@@ -157,6 +163,10 @@ long executeStreamingPartitionedUpdate(final Statement statement, Duration timeo | |||
} | |||
} | |||
|
|||
Stopwatch createStopwatchStarted() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would it be hard to inject the Stopwatch
instead? Not a big deal, just wondering if we could use composition here. Disregard if it would be too hard to inject.
|
||
@SuppressWarnings("unchecked") | ||
@RunWith(JUnit4.class) | ||
public class PartitionedDmlTransactionTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tests!
I know what the problem is: multiple failures are occurring so whenever we are subtracting the elapsed time, we are subtracting the accumulated time of the An example of the problem I am seeing:
Btw, @olavloite let me know if you this feedback loop is too big (me testing a day after your change). I could implement these minor fixes if you'd like so, otherwise I am happy on testing / reporting back the results to you. |
@thiagotnunes Thanks for the elaborate testing and report. Hopefully the timeout problem should be fixed now. And yes, please do feel free to add any additional fixes that might be needed to get this 100%. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olavloite thanks for the fix. It worked as expected (timing out after 2 hours).
🤖 I have created a release \*beep\* \*boop\* --- ## [1.59.0](https://www.github.com/googleapis/java-spanner/compare/v1.58.0...v1.59.0) (2020-07-16) ### Features * add support for NUMERIC data type ([#193](https://www.github.com/googleapis/java-spanner/issues/193)) ([b38a91d](https://www.github.com/googleapis/java-spanner/commit/b38a91d8daac264b9dea327d6b31430d9599bd78)) * spanner NUMERIC type ([#349](https://www.github.com/googleapis/java-spanner/issues/349)) ([78c3192](https://www.github.com/googleapis/java-spanner/commit/78c3192266c474fc43277a8bf3f15caa968a0100)) ### Bug Fixes * check if emulator is running if env var is set ([#340](https://www.github.com/googleapis/java-spanner/issues/340)) ([597f501](https://www.github.com/googleapis/java-spanner/commit/597f501803e6d58717a6e3770e6fd3f34454e9a5)) * fix potential unnecessary transaction retry ([#337](https://www.github.com/googleapis/java-spanner/issues/337)) ([1a4f4fd](https://www.github.com/googleapis/java-spanner/commit/1a4f4fd675a1580c87ad1d53c650a20bd2ff4811)), closes [#327](https://www.github.com/googleapis/java-spanner/issues/327) * respect PDML timeout when using streaming RPC ([#338](https://www.github.com/googleapis/java-spanner/issues/338)) ([d67f108](https://www.github.com/googleapis/java-spanner/commit/d67f108e86925c1296e695db8e78fa82e11fa4fa)) * runs sample tests in java 8 and java 11 ([#345](https://www.github.com/googleapis/java-spanner/issues/345)) ([b547e31](https://www.github.com/googleapis/java-spanner/commit/b547e31d095be3cf1646e0e9c07bfc467ecc3c22)) * set gRPC keep-alive to 120 seconds ([#339](https://www.github.com/googleapis/java-spanner/issues/339)) ([26be103](https://www.github.com/googleapis/java-spanner/commit/26be103da1117c4940550fad1672c66e6edfbdb3)) ### Dependencies * update dependency com.google.cloud:google-cloud-shared-dependencies to v0.8.3 ([#334](https://www.github.com/googleapis/java-spanner/issues/334)) ([45acd89](https://www.github.com/googleapis/java-spanner/commit/45acd8960c961d48e91a7b1546efa64d9e9ae576)) * update shared config to 0.9.2 ([#328](https://www.github.com/googleapis/java-spanner/issues/328)) ([75df62c](https://www.github.com/googleapis/java-spanner/commit/75df62c0176137fda1d0a9076b83be06f11228ce)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please).
This is an auto-generated regeneration of the .pb.go files by cloud.google.com/go/internal/gapicgen. Once this PR is submitted, genmgr will update the corresponding CL at gocloud to depend on the newer version of go-genproto, and assign reviewers. Whilst this or any regen PR is open in go-genproto, gapicgen will not create any more regeneration PRs or CLs. If all regen PRs are closed, gapicgen will create a new set of regeneration PRs and CLs once per night. If you have been assigned to review this CL, please: - Ensure that CI is passing. If it's failing, it requires your manual attention. - Approve and submit this PR if you believe it's ready to ship. That will prompt genmgr to assign reviewers to the gocloud CL. Corresponding gocloud CL: https://code-review.googlesource.com/c/gocloud/+/54272
@thiagotnunes Would you mind trying the same test with this PR? I don't have a dataset that is big enough that it will run for more than one hour.
Fixes #336