Using the `FILE_LOADS` method in `WriteToBigQuery`, it initially appears to work: load jobs are sent and (at least sometimes) succeed, and the data goes into the correct tables. But the temporary tables that were created never get deleted, and often the data is never even copied from the temp tables to the destination.
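For context, a minimal sketch of the kind of streaming pipeline that hits this is below; the project, topic, table, schema, and triggering frequency are placeholders rather than the exact values from the original report:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True  # unbounded source -> streaming mode

with beam.Pipeline(options=options) as p:
    (
        p
        # Pub/Sub makes the pipeline unbounded; topic name is a placeholder.
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/my-topic")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "Write" >> beam.io.WriteToBigQuery(
            table="my-project:my_dataset.my_table",
            schema="id:INTEGER,name:STRING",
            method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
            # With FILE_LOADS on an unbounded PCollection, triggering_frequency
            # (in seconds) controls how often load jobs are issued.
            triggering_frequency=300,
            custom_gcs_temp_location="gs://my-bucket/bq_temp",
        )
    )
```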
In the code (https://github.com/apache/beam/blob/aca9099acca969dc217ab183782e5270347cd354/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L846), it appears that after the load jobs, Beam should wait for them to finish, then copy the data from the temp tables and delete them; however, when used with a streaming pipeline, it seems these steps never complete.

In case it's not clear, this is for the Python SDK.
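To see the leftover tables, listing the target dataset with the BigQuery client is enough; note that the `beam_load` name prefix below is an assumption about how the temp tables are named, so adjust it to match whatever actually shows up in the dataset:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# List every table in the target dataset and flag likely Beam temp tables.
# "beam_load" is an assumed prefix; check the actual names in your dataset.
for table in client.list_tables("my_dataset"):
    if table.table_id.startswith("beam_load"):
        print("leftover temp table:", table.table_id)
        # Uncomment to clean up manually:
        # client.delete_table(table.reference)
```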
For reference: https://stackoverflow.com/questions/64526500/using-writetobigquery-file-loads-in-a-streaming-pipeline-just-creates-a-lot-of-t/64543619#64543619
Imported from Jira BEAM-11134. Original Jira may contain additional context.
Reported by: lkavenagh.