-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add options to control number of Storage API connections when using multiplexing #31721
Conversation
…ool; rename counter to be more accurate
R: @GaoleMeng CC: @reuvenlax |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #31721 +/- ##
============================================
- Coverage 71.18% 71.18% -0.01%
Complexity 3019 3019
============================================
Files 1058 1058
Lines 134254 134261 +7
Branches 3257 3257
============================================
+ Hits 95571 95572 +1
- Misses 35536 35542 +6
Partials 3147 3147
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
...google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
Show resolved
Hide resolved
...google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
Show resolved
Hide resolved
...google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
Outdated
Show resolved
Hide resolved
"Enables multiplexing mode, where multiple tables can share the same connection. Only available when writing with STORAGE_API_AT_LEAST_ONCE" | ||
+ " mode. This is recommended if your write operation is creating 20+ connections. When using multiplexing, consider tuning " | ||
+ "the number of connections created by the connection pool with minConnectionPoolConnections and maxConnectionPoolConnections. " | ||
+ "For more information, see https://cloud.google.com/bigquery/docs/write-api-best-practices#connection_pool_management") | ||
@Default.Boolean(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's time to enable this by default (though we should probably update our BOM to point to a more-recent client lib first).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to trying this, I'd probably vote to to do it separately after the release cut though to give it time to bake and so we can check on benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enabling this as a default may have negative effects on pipelines writing to >20 destinations (since maxConnectionsPerRegion is 20 by default).
Whereas currently the sink would create 1 connection per destination, enabling multiplexing would limit it to 20 connections (unless the user explicitly sets a higher maximum)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks. I think this is probably safe to merge so it makes the release cut, but it would be great to get thoughts from @GaoleMeng and @agrawal-siddharth here as well before we actually approve an RC
...google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
Outdated
Show resolved
Hide resolved
...google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
Outdated
Show resolved
Hide resolved
...google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
Show resolved
Hide resolved
...google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
Outdated
Show resolved
Hide resolved
Thanks for taking a look @agrawal-siddharth! Merging now |
* add options to set min and max connections to connection management pool; rename counter to be more accurate * add multiplexing description * add to CHANGES.md * whitespace * doc * clarify documentation and address comments * adjust description * add details
…ultiplexing (apache#31721) * add options to set min and max connections to connection management pool; rename counter to be more accurate * add multiplexing description * add to CHANGES.md * clarify documentation and address comments * adjust description * add details
…ultiplexing (apache#31721) * add options to set min and max connections to connection management pool; rename counter to be more accurate * add multiplexing description * add to CHANGES.md * clarify documentation and address comments * adjust description * add details
Controls min and max number of connections to connection pool. For more details on this BigQuery Storage Write API feature, see https://cloud.google.com/bigquery/docs/write-api-best-practices#connection_pool_management