Support dynamic destinations with Python Storage API #30045

Merged: 8 commits merged into apache:master on Jan 24, 2024

ahmedabu98 (Contributor)

The Python SDK's write to BigQuery via the Storage Write API uses the cross-language framework. This PR adds support for dynamic destinations to this write mode.

The destination for each row is determined on the Python side by calling the user-supplied callable. Each row is then sent over to Java together with its destination. A magic string is also sent to Java to signal that dynamic destinations are in use, so that the Java side handles the rows correctly.
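A minimal pure-Python sketch of the Python-side step described above, assuming rows are plain dicts and the callable returns a table spec string. `to_java_payload` and `DYNAMIC_DESTINATIONS_MARKER` are hypothetical names, and the marker value is a placeholder because the description does not give the actual magic string. The `destination`/`record` field names mirror the `beam.Row(destination=..., record=...)` wrapping in the review excerpt on this PR; real Beam transforms are not used here.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple, Union

# Hypothetical placeholder for the "magic string" sent to Java; the PR
# description does not spell out its value.
DYNAMIC_DESTINATIONS_MARKER = "<dynamic-destinations-marker>"

Row = Dict[str, Any]
TableSpec = Union[str, Callable[[Row], str]]


def to_java_payload(rows: Iterable[Row], table: TableSpec) -> Tuple[str, List[Any]]:
    """Return (table parameter for Java, elements to send to Java).

    Hypothetical helper: with a callable `table`, each row is wrapped with the
    destination the callable computes for it, and the marker stands in for a
    table name. With a plain string, rows are passed through unchanged.
    """
    if callable(table):
        # Destination is resolved here, on the Python side, once per row.
        payload = [{"destination": table(row), "record": row} for row in rows]
        return DYNAMIC_DESTINATIONS_MARKER, payload
    # Static destination: Java receives the real table name and bare rows.
    return table, list(rows)


# Usage: route each event to a per-region table (table names are made up).
rows = [{"user": "a", "region": "eu"}, {"user": "b", "region": "us"}]
table_param, elements = to_java_payload(
    rows, lambda row: f"my-project:my_dataset.events_{row['region']}")
```

Resolving the destination in Python keeps the user's callable in the SDK that can execute it; the Java side then only needs the precomputed destination string attached to each element.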

Contributor

Assigning reviewers. If you would like to opt out of this review, comment "assign to next reviewer".

R: @AnandInguva for label python.
R: @damondouglas for label java.
R: @damondouglas for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

ahmedabu98 (Contributor, Author)

R: @chamikaramj
R: @johnjcasey

Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

damondouglas self-requested a review on January 19, 2024 17:24

Review comment on sdks/python/apache_beam/io/gcp/bigquery.py (outdated, resolved):
input_beam_rows = (
    input_rows
    | "Wrap in Beam Row" >>
    beam.Map(lambda row: beam.Row(destination=row[0], record=row[1])))
Contributor

Is it possible for row[0] or row[1] to be None?

Contributor Author

Only if the dynamic destination callable returns None, which would be unusual on the user's side.
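If one wanted to fail fast on that case before rows reach Java, a check could sit right where each row is paired with its destination. The helper below is a hypothetical sketch; nothing in this thread indicates the PR adds such a check. It assumes rows are plain dicts and treats the destination as opaque, since only `None` is rejected.

```python
from typing import Any, Callable, Dict, Tuple

Row = Dict[str, Any]


def tag_with_destination(row: Row, table_fn: Callable[[Row], Any]) -> Tuple[Any, Row]:
    """Pair a row with its destination, rejecting a None destination.

    Hypothetical helper: raises on the Python side instead of sending a
    (None, row) pair across to the Java transform.
    """
    destination = table_fn(row)
    if destination is None:
        raise ValueError(
            f"Dynamic destination callable returned None for row: {row!r}")
    return destination, row
```

The error message names the offending row, which makes a misbehaving callable easier to trace than a failure surfacing later in the Java pipeline.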

Review comment on sdks/python/apache_beam/io/gcp/bigquery.py (outdated, resolved)
github-actions bot added the build label on Jan 21, 2024
ahmedabu98 (Contributor, Author) commented on Jan 22, 2024

johnjcasey (Contributor) left a comment:

This LGTM

chamikaramj (Contributor)

LGTM.

ahmedabu98 added this to the 2.54.0 Release milestone on Jan 24, 2024
ahmedabu98 merged commit 2721414 into apache:master on Jan 24, 2024
61 of 63 checks passed
4 participants