Using WriteToBigQuery FILE_LOADS in a streaming pipeline does not delete temp tables #20544

Closed
damccorm opened this issue Jun 4, 2022 · 5 comments

Comments

@damccorm
Contributor

damccorm commented Jun 4, 2022

With the FILE_LOADS method in WriteToBigQuery, the pipeline initially appears to work: it sends load jobs, which then (at least sometimes) succeed, and the data goes into the correct tables.

But the temporary tables that were created never get deleted. Often the data was never even copied from the temp tables to the destination.
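
For anyone trying to reproduce this, here is a minimal sketch of the kind of sink configuration being described. It is not the reporter's actual pipeline: the table spec, schema, and the 60-second triggering frequency are hypothetical placeholders, and `rows` is assumed to be an unbounded PCollection of dicts (for example, parsed Pub/Sub messages).

```python
# Hypothetical values for illustration; none of these come from the report.
TABLE_SPEC = "my-project:my_dataset.events"
TABLE_SCHEMA = "id:INTEGER,payload:STRING"
TRIGGERING_FREQUENCY_SECS = 60


def write_with_file_loads(rows):
    """Attach a FILE_LOADS BigQuery sink to a PCollection of dict rows."""
    # Imported lazily so this module can be loaded without Beam installed.
    import apache_beam as beam

    return rows | "WriteToBQ" >> beam.io.WriteToBigQuery(
        table=TABLE_SPEC,
        schema=TABLE_SCHEMA,
        # Use batch load jobs instead of streaming inserts.
        method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
        # In a streaming pipeline, a load job is triggered at this interval.
        triggering_frequency=TRIGGERING_FREQUENCY_SECS,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    )
```

With a setup along these lines, the symptom reported here would show up as leftover temporary tables in the destination dataset after each triggered load.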

In the code (https://github.com/apache/beam/blob/aca9099acca969dc217ab183782e5270347cd354/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L846), it appears that after starting the load jobs, Beam should wait for them to finish, then copy the data from the temp tables and delete them. However, when used with a streaming pipeline, it does not seem to complete these steps.

In case it's not clear, this is for the Python SDK.
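
To make the expected sequence concrete (load, wait, copy, delete), here is a small sketch in plain Python. It is only an illustration of the order of steps described above, not Beam's implementation: `FakeBigQuery`, its methods, and `load_via_temp_tables` are hypothetical names backed by an in-memory dict rather than a real BigQuery client.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBigQuery:
    """Hypothetical in-memory stand-in for a BigQuery client (not a real API)."""
    tables: dict = field(default_factory=dict)
    pending_jobs: dict = field(default_factory=dict)

    def start_load_job(self, job_id, temp_table, rows):
        # Record the job; the table only exists once the job has finished.
        self.pending_jobs[job_id] = (temp_table, list(rows))

    def wait_for_job(self, job_id):
        temp_table, rows = self.pending_jobs.pop(job_id)
        self.tables[temp_table] = rows

    def copy_table(self, source, destination):
        self.tables.setdefault(destination, []).extend(self.tables[source])

    def delete_table(self, name):
        del self.tables[name]


def load_via_temp_tables(client, destination, batches):
    """Run the four steps in order: load, wait, copy, delete."""
    jobs = []
    # 1. Start one load job per batch, each into its own temp table.
    for i, rows in enumerate(batches):
        temp_table = f"{destination}_temp_{i}"
        job_id = f"load_{i}"
        client.start_load_job(job_id, temp_table, rows)
        jobs.append((job_id, temp_table))
    # 2. Wait for every load job to finish.
    for job_id, _ in jobs:
        client.wait_for_job(job_id)
    # 3. Copy each temp table into the destination.
    for _, temp_table in jobs:
        client.copy_table(temp_table, destination)
    # 4. Delete the temp tables.
    for _, temp_table in jobs:
        client.delete_table(temp_table)
    return client.tables.get(destination, [])
```

In these terms, the behavior reported in this issue looks like steps 3 and 4 not running in streaming mode, so the `_temp_` tables are left behind and the destination may never receive the rows.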

 

For reference: https://stackoverflow.com/questions/64526500/using-writetobigquery-file-loads-in-a-streaming-pipeline-just-creates-a-lot-of-t/64543619#64543619

Imported from Jira BEAM-11134. Original Jira may contain additional context.
Reported by: lkavenagh.

@Abacn
Contributor

Abacn commented Oct 31, 2022

@ahmedabu98 With the fix of FILE_LOADS in, is this still an issue?

@Abacn
Contributor

Abacn commented Oct 31, 2022

.remove-labels 'awaiting triage'

@ahmedabu98
Contributor

Yes, this should be fixed by #23012. Thanks for catching that.

@Abacn
Contributor

Abacn commented Oct 31, 2022

@ahmedabu98 Thanks. Then this issue is obsolete.

@Abacn
Contributor

Abacn commented Oct 31, 2022

.close-issue not_planned

@github-actions github-actions bot closed this as not planned Oct 31, 2022