Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add options to control number of Storage API connections when using multiplexing #31721

Merged
merged 10 commits into from
Jul 12, 2024

Conversation

ahmedabu98
Copy link
Contributor

Controls min and max number of connections to connection pool. For more details on this BigQuery Storage Write API feature, see https://cloud.google.com/bigquery/docs/write-api-best-practices#connection_pool_management

@ahmedabu98
Copy link
Contributor Author

R: @GaoleMeng
R: @agrawal-siddharth

CC: @reuvenlax
CC: @damccorm

Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Copy link

codecov bot commented Jun 29, 2024

Codecov Report

Attention: Patch coverage is 36.36364% with 7 lines in your changes missing coverage. Please review.

Project coverage is 71.18%. Comparing base (c3756c0) to head (3801412).
Report is 1 commits behind head on master.

Files Patch % Lines
...beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java 0.00% 7 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #31721      +/-   ##
============================================
- Coverage     71.18%   71.18%   -0.01%     
  Complexity     3019     3019              
============================================
  Files          1058     1058              
  Lines        134254   134261       +7     
  Branches       3257     3257              
============================================
+ Hits          95571    95572       +1     
- Misses        35536    35542       +6     
  Partials       3147     3147              
Flag Coverage Δ
java 69.59% <36.36%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@liferoad liferoad added this to the 2.58.0 Release milestone Jul 9, 2024
"Enables multiplexing mode, where multiple tables can share the same connection. Only available when writing with STORAGE_API_AT_LEAST_ONCE"
+ " mode. This is recommended if your write operation is creating 20+ connections. When using multiplexing, consider tuning "
+ "the number of connections created by the connection pool with minConnectionPoolConnections and maxConnectionPoolConnections. "
+ "For more information, see https://cloud.google.com/bigquery/docs/write-api-best-practices#connection_pool_management")
@Default.Boolean(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's time to enable this by default (though we should probably update our BOM to point to a more-recent client lib first).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to trying this, I'd probably vote to to do it separately after the release cut though to give it time to bake and so we can check on benchmarks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enabling this as a default may have negative effects on pipelines writing to >20 destinations (since maxConnectionsPerRegion is 20 by default).

Whereas currently the sink would create 1 connection per destination, enabling multiplexing would limit it to 20 connections (unless the user explicitly sets a higher maximum)

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. I think this is probably safe to merge so it makes the release cut, but it would be great to get thoughts from @GaoleMeng and @agrawal-siddharth here as well before we actually approve an RC

@ahmedabu98
Copy link
Contributor Author

Thanks for taking a look @agrawal-siddharth! Merging now

@ahmedabu98 ahmedabu98 merged commit 4d429dd into apache:master Jul 12, 2024
19 checks passed
jrmccluskey pushed a commit that referenced this pull request Jul 15, 2024
* add options to set min and max connections to connection management pool; rename counter to be more accurate

* add multiplexing description

* add to CHANGES.md

* whitespace

* doc

* clarify documentation and address comments

* adjust description

* add details
acrites pushed a commit to acrites/beam that referenced this pull request Jul 17, 2024
…ultiplexing (apache#31721)

* add options to set min and max connections to connection management pool; rename counter to be more accurate

* add multiplexing description

* add to CHANGES.md

* clarify documentation and address comments

* adjust description

* add details
reeba212 pushed a commit to reeba212/beam that referenced this pull request Dec 4, 2024
…ultiplexing (apache#31721)

* add options to set min and max connections to connection management pool; rename counter to be more accurate

* add multiplexing description

* add to CHANGES.md

* clarify documentation and address comments

* adjust description

* add details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants