Improve performance of BigQueryIO connector when withPropagateSuccessfulStorageApiWrites(true) is used #31840

Merged 6 commits on Jul 17, 2024

Conversation

slilichenko
Contributor

Minor updates to the way TableRows are re-constructed from the proto messages used for calls to the Storage Write API.

Profiling showed an improvement of over 50% in CPU utilization in the code path that performs this reconstruction.

Further optimization is possible when the STORAGE_API_AT_LEAST_ONCE method is used. A follow-up PR will be submitted for that.
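
For readers unfamiliar with the option, here is a minimal sketch of a write that turns it on and reads back the rows BigQuery accepted. It is not taken from this PR's diff. The class name, the helper method, and the `rows` and `tableSpec` parameters are hypothetical placeholders, and the sketch assumes the destination table already exists.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

public class PropagateSuccessfulWritesSketch {

  // Hypothetical helper: `rows` and `tableSpec` ("project:dataset.table")
  // are placeholders supplied by the caller, not names from this PR.
  static PCollection<TableRow> writeAndGetSuccessfulRows(
      PCollection<TableRow> rows, String tableSpec) {
    WriteResult result =
        rows.apply(
            "WriteToBigQuery",
            BigQueryIO.writeTableRows()
                .to(tableSpec)
                // Use the Storage Write API (exactly-once mode).
                .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
                // Assumption: the table already exists, so no schema is passed.
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                // Ask the connector to emit successfully written rows downstream.
                .withPropagateSuccessfulStorageApiWrites(true));

    // Rows that were written successfully, rebuilt as TableRows by the connector.
    return result.getSuccessfulStorageApiInserts();
  }
}
```

The collection returned by `getSuccessfulStorageApiInserts()` is where the proto-to-TableRow reconstruction described above takes place, so pipelines that consume it are the ones that should see the CPU improvement.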

slilichenko and others added 2 commits July 10, 2024 20:31
…Write API proto's to TableRows when withPropagateSuccessfulStorageApiWrites(true) is used.
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @damondouglas for label java.
R: @shunping for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Contributor

@ahmedabu98 ahmedabu98 left a comment

LGTM

@slilichenko
Contributor Author

@ahmedabu98 - I don't have permissions to merge the PR. Could you, or someone else with committer privileges, please merge it and mention which release it will be available in?

@ahmedabu98
Contributor

Release 2.58.0 was cut a while ago. This will make it into version 2.59.0.

Sorry for not asking earlier, but could you add an entry in CHANGES.md mentioning this improvement?

@slilichenko
Contributor Author

@ahmedabu98 - added a line to the 2.59.0 section. When is the SNAPSHOT build going to be available and what's the tentative release date?

@ahmedabu98 ahmedabu98 merged commit f3e6c66 into apache:master Jul 17, 2024
19 checks passed
@ahmedabu98
Contributor

We build snapshots daily, so probably tomorrow. If all goes well, 2.59.0 should be released in mid-to-late September.

reeba212 pushed a commit to reeba212/beam that referenced this pull request Dec 4, 2024
…fulStorageApiWrites(true) is used (apache#31840)

* Performance improvements related to conversion of BigQuery's Storage Write API proto's to TableRows when withPropagateSuccessfulStorageApiWrites(true) is used.

* Fix spotless findings.

* Update CHANGES.md

* Update CHANGES.md - moved the entry to 2.59.0 section.