Using the `FILE_LOADS` method in `WriteToBigQuery`, it initially appears to work: load jobs are sent and (at least sometimes) succeed, and the data goes into the correct tables. But the temporary tables that were created never get deleted, and often the data is never even copied from the temp tables to the destination.
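For context, a minimal sketch of the kind of streaming pipeline that hits this is below; the project, topic, table, schema, and triggering frequency are placeholders rather than the exact values from the original report:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True  # unbounded source -> streaming mode

with beam.Pipeline(options=options) as p:
    (
        p
        # Pub/Sub makes the pipeline unbounded; topic name is a placeholder.
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/my-topic")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "Write" >> beam.io.WriteToBigQuery(
            table="my-project:my_dataset.my_table",
            schema="id:INTEGER,name:STRING",
            method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
            # With FILE_LOADS on an unbounded PCollection, triggering_frequency
            # (in seconds) controls how often load jobs are issued.
            triggering_frequency=300,
            custom_gcs_temp_location="gs://my-bucket/bq_temp",
        )
    )
```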
In the code (https://github.com/apache/beam/blob/aca9099acca969dc217ab183782e5270347cd354/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L846), it appears that after the load jobs, Beam should wait for them to finish, then copy the data from the temp tables and delete them; however, when used with a streaming pipeline, it seems these steps never complete.

In case it's not clear, this is for the Python SDK.
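To see the leftover tables, listing the target dataset with the BigQuery client is enough; note that the `beam_load` name prefix below is an assumption about how the temp tables are named, so adjust it to match whatever actually shows up in the dataset:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# List every table in the target dataset and flag likely Beam temp tables.
# "beam_load" is an assumed prefix; check the actual names in your dataset.
for table in client.list_tables("my_dataset"):
    if table.table_id.startswith("beam_load"):
        print("leftover temp table:", table.table_id)
        # Uncomment to clean up manually:
        # client.delete_table(table.reference)
```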
For reference: https://stackoverflow.com/questions/64526500/using-writetobigquery-file-loads-in-a-streaming-pipeline-just-creates-a-lot-of-t/64543619#64543619
Imported from Jira BEAM-11134. Original Jira may contain additional context.
Reported by: lkavenagh.