[Bug]: BigQuery Storage Write API does not write with no complaint #28168
Comments
.add-labels P1 |
Can you share more details about your job? Is it streaming? What is the source? Since this is a Dataflow job, you could also open a cloud ticket: https://cloud.google.com/dataflow/docs/support/getting-support#file-bugs-or-feature-requests |
Hey @onurdialpad, can you provide a reproducible snippet? |
@ahmedabu98 sure, here is a snippet. I tested the snippet as well and it did the same thing: no write, no log, no error.
@liferoad Data comes from Pub/Sub in the real environment, and the Dataflow job is running in streaming mode. It works if I use |
@onurdialpad I tried running the snippet you provided on 2.49.0 and it worked on both the local and Dataflow runners. Can you share any relevant logs you're seeing? +1 to @liferoad's suggestion of opening a Dataflow ticket; it will help the internal engineers debug your pipeline better |
@ahmedabu98 thanks for trying that. Can you elaborate on what you meant by "it worked"? Did you see the job write records to BQ? Regarding opening a Dataflow ticket, sure, I will do it. Just a note: when I run the snippet locally with DirectRunner, it does not write anything to BQ and logs nothing. To clarify: it "works", but not as intended. It is supposed to write to BQ but does not; it just runs without doing anything. |
Hey @onurdialpad, I'm still digging into it but I've narrowed it down to runner V2 (both Java and Python jobs exhibit this behavior). I suspect a recent internal change is tripping up this behavior. I'll continue investigating but for now, you may be able to mitigate this by running with the legacy runner. Python Dataflow jobs default to runner v2 but you can disable it as long as you're using a Beam version that is before |
Ahh sorry, never mind: this xlang Storage Write connector was implemented in 2.47.0, so that mitigation won't work |
Hey @ahmedabu98 thanks for the effort! It's interesting that the snippet I shared here doesn't produce any output on the BQ side even though it uses a batch source |
Hey @onurdialpad, we've confirmed it is a bug in Dataflow's Runner V2 that gets hit by Storage Write API with autosharding. One workaround is to use I'm going to open a PR to also allow setting a fixed number of shards as another workaround, which may be available for Beam |
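For readers following along, here is a rough sketch of the difference between the autosharded configuration that hits the bug and a fixed-shard configuration like the one proposed above. It is not code from this thread. It assumes the Python SDK's `beam.io.WriteToBigQuery` accepts `with_auto_sharding`, `triggering_frequency`, and (only in releases that include the fixed-shard change) `num_storage_api_streams`. The helper names `storage_write_kwargs` and `build_write`, and the table and schema strings in the usage note, are hypothetical.

```python
from typing import Any, Dict, Optional


def storage_write_kwargs(
    table: str,
    schema: str,
    fixed_streams: Optional[int] = None,
    triggering_frequency_secs: int = 5,
) -> Dict[str, Any]:
    """Build keyword arguments for a streaming Storage Write API sink.

    Hypothetical helper: with fixed_streams=None it requests autosharding
    (the configuration reported as affected on Runner V2); with a positive
    integer it requests a fixed number of write streams instead.
    """
    if fixed_streams is not None and fixed_streams <= 0:
        raise ValueError("fixed_streams must be a positive integer")

    kwargs: Dict[str, Any] = {
        "table": table,
        "schema": schema,
        # Seconds between flushes; used when the input is unbounded.
        "triggering_frequency": triggering_frequency_secs,
    }
    if fixed_streams is None:
        kwargs["with_auto_sharding"] = True
    else:
        kwargs["with_auto_sharding"] = False
        # Assumption: this parameter only exists in SDK releases that
        # include the fixed-shard change; older releases will reject it.
        kwargs["num_storage_api_streams"] = fixed_streams
    return kwargs


def build_write(kwargs: Dict[str, Any]):
    """Create the sink transform; apache_beam is imported lazily here."""
    import apache_beam as beam

    return beam.io.WriteToBigQuery(
        method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
        **kwargs,
    )
```

In a pipeline this would be applied as something like `rows | build_write(storage_write_kwargs("my-project:my_dataset.my_table", "id:INTEGER", fixed_streams=4))`. Whether a fixed stream count actually avoids the problem depends on the SDK and runner versions in use, so treat it as something to verify rather than a guaranteed fix.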
This is tagged as blocking 2.51.0 which is in progress now. This does seem like a major lack of functionality. I see followups and comments on and about #28592. Is there a cherrypick open or is it not yet resolved? |
Hey @kennknowles, this is resolved and a CP is ready in #28631 |
What happened?
I wanted to test the Storage Write API with SDK 2.49.0 and tried to write some simple data on Dataflow, but the "writing" step does not do anything, and nothing is logged there either.
Here is my code snippet.
Here the step does not produce output
Issue Priority
Priority: 1
Issue Components