feat: add toString to futures returned by operations #3140
Merged
Sometimes an operation can get stuck indefinitely. The underlying reasons can vary significantly:

- the underlying attempt RPC can get stuck due to a bug in grpc (e.g. grpc/grpc-java#11026)
- the operation can get stuck in layers above gax: googleapis/java-bigtable#1939
- or it can get stuck in gax itself (I don't have a pointer handy)

Guava futures provide some observability for ListenableFutures, but in creating the custom ApiFutures in gax, we lose that functionality. This PR adds toString() to a few classes so that callers can inspect the internal state of the operation. For example, with these changes, the toString() of the future returned from bigtableDataClient.mutateRows() changes from

> TransformFuture@652ce654[status=PENDING, info=[inputFuture=[com.google.api.core.ApiFutureToListenableFuture@522ba524], function=[com.google.api.core.ApiFutures$ApiFunctionToGuavaFunction@29c5ee1d]]]

to

> ListenableFutureToApiFuture{delegate=TransformFuture@7ac9af2a[status=PENDING, info=[inputFuture=[ApiFutureToListenableFuture{apiFuture=CallbackChainRetryingFuture{super=com.google.api.gax.retrying.CallbackChainRetryingFuture@7bb004b8[status=PENDING], latestCompletedAttemptResult=null, attemptResult=null, attemptSettings=TimedAttemptSettings{globalSettings=RetrySettings{totalTimeout=PT10M, initialRetryDelay=PT0.01S, retryDelayMultiplier=2.0, maxRetryDelay=PT1M, maxAttempts=0, jittered=true, initialRpcTimeout=PT1M, rpcTimeoutMultiplier=1.0, maxRpcTimeout=PT1M}, retryDelay=PT0S, rpcTimeout=PT1M, randomizedRetryDelay=PT0S, attemptCount=0, overallAttemptCount=0, firstAttemptStartTimeNanos=635709620001791}}}], function=[com.google.api.core.ApiFutures$ApiFunctionToGuavaFunction@652ce654]]]}

This allows us to reason about what's stuck. I'm working on another PR that will add a close(timeout) to the Batcher, which will use this functionality to identify why batcher.close() timed out.
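To illustrate the general idea (not the code in this PR), here is a minimal sketch of a future wrapper whose toString() exposes the state of the future it delegates to. `DescribedFuture` and its `label` field are hypothetical names, and the JDK's `CompletableFuture` stands in for the gax and Guava futures discussed above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper for illustration only; not a gax or api-common class.
final class DescribedFuture<V> implements Future<V> {
  private final String label;
  private final Future<V> delegate;

  DescribedFuture(String label, Future<V> delegate) {
    this.label = label;
    this.delegate = delegate;
  }

  @Override
  public boolean cancel(boolean mayInterruptIfRunning) {
    return delegate.cancel(mayInterruptIfRunning);
  }

  @Override
  public boolean isCancelled() {
    return delegate.isCancelled();
  }

  @Override
  public boolean isDone() {
    return delegate.isDone();
  }

  @Override
  public V get() throws InterruptedException, ExecutionException {
    return delegate.get();
  }

  @Override
  public V get(long timeout, TimeUnit unit)
      throws InterruptedException, ExecutionException, TimeoutException {
    return delegate.get(timeout, unit);
  }

  // Without this override, only the wrapper's class name and hash would be printed,
  // hiding whatever the delegate knows about its own state.
  @Override
  public String toString() {
    String status =
        !delegate.isDone() ? "PENDING" : delegate.isCancelled() ? "CANCELLED" : "DONE";
    return "DescribedFuture{label=" + label + ", status=" + status
        + ", delegate=" + delegate + "}";
  }
}

final class DescribedFutureDemo {
  public static void main(String[] args) {
    CompletableFuture<String> rpc = new CompletableFuture<>();
    Future<String> wrapped = new DescribedFuture<>("mutateRows", rpc);
    System.out.println(wrapped); // reports status=PENDING while the delegate is incomplete
    rpc.complete("ok");
    System.out.println(wrapped); // reports status=DONE once the delegate completes
  }
}
```

Because each layer prints its delegate, a chain of wrappers yields a nested description similar in spirit to the output quoted above.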
igorbernstein2
added a commit
to igorbernstein2/sdk-platform-java
that referenced
this pull request
Aug 29, 2024
…le. Currently it is impossible to debug because we don't expose any internal state to analyze. This PR adds 2 additional methods that should help in diagnosing issues:

1. close(timeout) will try to close the batcher, but if any of the underlying batch operations fail, the exception message will contain a wealth of information describing the underlying state of operations as provided by googleapis#3140
2. cancelOutstanding: this allows for remediation when close(timeout) throws an exception.

The intended use case is the dataflow connector's FinishBundle: try { batcher.close(Duration.ofMinutes(1)); } catch(BatchingException e) { batcher.cancelOutstanding(); batcher.close(Duration.ofMinutes(1)); }
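The close-then-cancel pattern described above can be sketched with JDK types only. This is a rough model under stated assumptions, not the gax Batcher: `MiniBatcher`, `track`, and the use of `java.util.concurrent.TimeoutException` to signal the timeout are all hypothetical choices made for the example.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a batcher; not the gax Batcher implementation.
final class MiniBatcher {
  private final List<CompletableFuture<Void>> outstanding = new ArrayList<>();

  // Registers an in-flight batch operation.
  void track(CompletableFuture<Void> batch) {
    outstanding.add(batch);
  }

  // Waits up to `timeout` for all batches. On timeout, the exception message
  // lists every unfinished batch using that batch's toString().
  void close(Duration timeout) throws TimeoutException, InterruptedException {
    CompletableFuture<Void> all =
        CompletableFuture.allOf(outstanding.toArray(new CompletableFuture[0]));
    try {
      all.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      StringBuilder msg =
          new StringBuilder("Timed out trying to close batcher after " + timeout
              + ". Outstanding batches:");
      for (CompletableFuture<Void> batch : outstanding) {
        if (!batch.isDone()) {
          msg.append(' ').append(batch);
        }
      }
      throw new TimeoutException(msg.toString());
    } catch (ExecutionException | CancellationException e) {
      // Every batch has finished (failed or cancelled), so there is nothing left to wait for.
    }
  }

  // Cancels every batch that has not completed yet.
  void cancelOutstanding() {
    for (CompletableFuture<Void> batch : outstanding) {
      batch.cancel(false);
    }
  }
}

final class MiniBatcherDemo {
  public static void main(String[] args) throws Exception {
    MiniBatcher batcher = new MiniBatcher();
    batcher.track(new CompletableFuture<>()); // a batch that never completes
    try {
      batcher.close(Duration.ofMillis(100));
    } catch (TimeoutException e) {
      System.err.println(e.getMessage()); // describes the stuck batch
      batcher.cancelOutstanding();
      batcher.close(Duration.ofMillis(100)); // returns promptly after cancellation
    }
  }
}
```

The usefulness of the timeout message depends entirely on how descriptive each outstanding future's toString() is, which is why the two changes are paired.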
blakeli0
reviewed
Aug 29, 2024
api-common-java/src/test/java/com/google/api/core/ApiFutureToListenableFutureTest.java
blakeli0
reviewed
Aug 29, 2024
api-common-java/src/test/java/com/google/api/core/ApiFutureToListenableFutureTest.java
Quality Gate passed for 'gapic-generator-java-root'
Quality Gate failed for 'java_showcase_integration_tests'
blakeli0
approved these changes
Aug 30, 2024
blakeli0
added a commit
that referenced
this pull request
Sep 9, 2024
There have been reports of batcher.close() hanging every once in a while. Currently it is impossible to debug because we don't expose any internal state to analyze. This PR adds 2 additional methods that should help in diagnosing issues:

1. close(timeout) will try to close the batcher, but if any of the underlying batch operations fail, the exception message will contain a wealth of information describing the underlying state of operations as provided by #3140
2. cancelOutstanding: this allows for remediation when close(timeout) throws an exception.

The intended use case is the dataflow connector's FinishBundle:

```java
try {
  batcher.close(Duration.ofMinutes(1));
} catch(TimeoutException e) {
  // log details why the batch failed to close with the help of #3140
  logger.error(e);
  batcher.cancelOutstanding();
  batcher.close(Duration.ofMinutes(1));
}
```

Example exception message:

> Exception in thread "main" com.google.api.gax.batching.BatchingException: Timed out trying to close batcher after PT1S. Batch request prototype: com.google.cloud.bigtable.data.v2.models.BulkMutation@2bac9ba. Outstanding batches: Batch{operation=CallbackChainRetryingFuture{super=null, latestCompletedAttemptResult=ImmediateFailedFuture@6a9d5dff[status=FAILURE, cause=[com.google.cloud.bigtable.data.v2.models.MutateRowsException: Some mutations failed to apply]], attemptResult=null, attemptSettings=TimedAttemptSettings{globalSettings=RetrySettings{totalTimeout=PT10M, initialRetryDelay=PT0.01S, retryDelayMultiplier=2.0, maxRetryDelay=PT1M, maxAttempts=0, jittered=true, initialRpcTimeout=PT1M, rpcTimeoutMultiplier=1.0, maxRpcTimeout=PT1M}, retryDelay=PT1.28S, rpcTimeout=PT1M, randomizedRetryDelay=PT0.877S, attemptCount=8, overallAttemptCount=8, firstAttemptStartTimeNanos=646922035424541}}, elements=com.google.cloud.bigtable.data.v2.models.RowMutationEntry@7a344b65}

Co-authored-by: Blake Li <[email protected]>
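As an aid to reading the example message, the numbers in it are consistent with plain exponential backoff: with initialRetryDelay=PT0.01S and retryDelayMultiplier=2.0, eight attempts give 0.01 s × 2^7 = 1.28 s, matching retryDelay=PT1.28S, and with jittered=true a randomizedRetryDelay of PT0.877S below that value is plausible. The sketch below only reproduces that arithmetic. `BackoffEstimate.delayAfter` is a hypothetical helper, and the formula min(initial × multiplier^(attempts − 1), max) is an assumption about the schedule, not a copy of the gax retry algorithm.

```java
import java.time.Duration;

// Hypothetical helper for sanity-checking a retryDelay value seen in a toString() dump.
final class BackoffEstimate {
  // Assumed schedule: delay = min(initial * multiplier^(attempts - 1), max),
  // with no delay before any attempt has completed.
  static Duration delayAfter(int attempts, Duration initial, double multiplier, Duration max) {
    if (attempts <= 0) {
      return Duration.ZERO;
    }
    double millis = initial.toMillis() * Math.pow(multiplier, attempts - 1);
    return Duration.ofMillis((long) Math.min(millis, max.toMillis()));
  }

  public static void main(String[] args) {
    // Values taken from the RetrySettings in the example message.
    Duration delay = delayAfter(8, Duration.ofMillis(10), 2.0, Duration.ofMinutes(1));
    System.out.println(delay); // prints PT1.28S
  }
}
```

If a dumped retryDelay is far from what this estimate gives for the reported attemptCount, that may be a hint that the retry loop is not where the operation is stuck.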
ldetmer
added a commit
that referenced
this pull request
Sep 9, 2024
🤖 I have created a release *beep* *boop*

---

<details><summary>2.45.0</summary>

## [2.45.0](v2.44.0...v2.45.0) (2024-09-09)

### Features

* add Batcher#close(timeout) and Batcher#cancelOutstanding ([#3141](#3141)) ([b5a92e4](b5a92e4))
* add full RetrySettings sample code to Settings classes ([#3056](#3056)) ([8fe3a2d](8fe3a2d))
* add toString to futures returned by operations ([#3140](#3140)) ([afecb8c](afecb8c))
* bake gapic-generator-java into the hermetic build docker image ([#3067](#3067)) ([a372e82](a372e82))

### Bug Fixes

* **gax:** prevent truncation/overflow when converting time values ([#3095](#3095)) ([699074e](699074e))

### Dependencies

* add opentelemetry exporter-metrics and shared-resoucemapping to shared dependencies ([#3078](#3078)) ([fc8d80d](fc8d80d))
* update dependency certifi to v2024.8.30 ([#3150](#3150)) ([c18b705](c18b705))
* update dependency com.google.api-client:google-api-client-bom to v2.7.0 ([#3151](#3151)) ([5f43e43](5f43e43))
* update dependency com.google.errorprone:error_prone_annotations to v2.31.0 ([#3153](#3153)) ([3071509](3071509))
* update dependency com.google.errorprone:error_prone_annotations to v2.31.0 ([#3154](#3154)) ([335ee63](335ee63))
* update dependency com.google.guava:guava to v33.3.0-jre ([#3119](#3119)) ([41174b0](41174b0))
* update dependency dev.cel:cel to v0.7.1 ([#3155](#3155)) ([b1ddd16](b1ddd16))
* update dependency filelock to v3.16.0 ([#3175](#3175)) ([6681113](6681113))
* update dependency idna to v3.8 ([#3156](#3156)) ([82f5326](82f5326))
* update dependency io.netty:netty-tcnative-boringssl-static to v2.0.66.final ([#3148](#3148)) ([a7efaa8](a7efaa8))
* update dependency net.bytebuddy:byte-buddy to v1.15.1 ([#3115](#3115)) ([0e06c5f](0e06c5f))
* update dependency org.apache.commons:commons-lang3 to v3.17.0 ([#3157](#3157)) ([8d3b9fd](8d3b9fd))
* update dependency org.checkerframework:checker-qual to v3.47.0 ([#3166](#3166)) ([365674d](365674d))
* update dependency org.yaml:snakeyaml to v2.3 ([#3158](#3158)) ([e67ea9a](e67ea9a))
* update dependency platformdirs to v4.3.2 ([#3176](#3176)) ([4f2f9e0](4f2f9e0))
* update dependency virtualenv to v20.26.4 ([#3177](#3177)) ([080e078](080e078))
* update google api dependencies ([#3118](#3118)) ([67342ea](67342ea))
* update google auth library dependencies to v1.25.0 ([#3168](#3168)) ([715884a](715884a))
* update google http client dependencies to v1.45.0 ([#3159](#3159)) ([a3fe612](a3fe612))
* update googleapis/java-cloud-bom digest to 6626f91 ([#3147](#3147)) ([658e40e](658e40e))
* update junit5 monorepo to v5.11.0 ([#3111](#3111)) ([6bf84c8](6bf84c8))
* update netty dependencies to v4.1.113.final ([#3165](#3165)) ([9b5957d](9b5957d))
* update opentelemetry-java monorepo to v1.42.0 ([#3172](#3172)) ([413c44e](413c44e))

### Documentation

* Update DEVELOPMENT.md ([#3126](#3126)) ([92bdf4e](92bdf4e))

</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
Co-authored-by: ldetmer <[email protected]>
ldetmer
pushed a commit
that referenced
this pull request
Sep 17, 2024
ldetmer
pushed a commit
that referenced
this pull request
Sep 17, 2024
ldetmer
added a commit
that referenced
this pull request
Sep 17, 2024