This repository has been archived by the owner on Nov 11, 2022. It is now read-only.
Problem
Dataflow SDK for Java 1.7.0 introduced a performance regression in BigQueryIO.Write when using BigQuery's streaming inserts. Users may see the following stack trace in their logs:
java.lang.IllegalArgumentException: timeout value is negative
	at java.lang.Thread.sleep(Native Method)
	at com.google.cloud.dataflow.sdk.util.BigQueryTableInserter.insertAll(BigQueryTableInserter.java:287)
	at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.flushRows(BigQueryIO.java:2446)
	at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.finishBundle(BigQueryIO.java:2404)
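For context, a negative argument to Thread.sleep of this kind typically comes from an arithmetic overflow in a retry-backoff calculation. The sketch below is a hypothetical illustration of that failure mode; the class and values are invented for illustration and are not the SDK's actual code:

```java
// Hypothetical sketch: an exponential backoff computed in 32-bit int
// arithmetic overflows into a negative value after enough retries,
// which would make Thread.sleep throw IllegalArgumentException.
public class BackoffOverflowSketch {
    public static void main(String[] args) {
        int baseMillis = 1000;
        int attempt = 22;

        // 1000 << 22 = 4,194,304,000, which exceeds Integer.MAX_VALUE
        // and wraps to a negative int.
        int brokenBackoff = baseMillis << attempt;
        System.out.println("broken backoff: " + brokenBackoff); // negative after int overflow
        // Thread.sleep(brokenBackoff) would throw IllegalArgumentException here.

        // One safe variant: widen to long before shifting, then clamp at zero.
        long safeBackoff = Math.max(0L, ((long) baseMillis) << attempt);
        System.out.println("safe backoff: " + safeBackoff);
    }
}
```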
Solution
The fix for this issue was merged into the GitHub master branch in #448. It will be included in the upcoming 1.8.0 release of the Dataflow SDK for Java, which is expected to be available by October 4th.
Impact
Streaming
When run on the Cloud Dataflow service in streaming mode, this issue will result in a slightly higher error rate. However, thanks to Dataflow's and BigQuery's retry policies, there will be no lost or duplicated data.
Most streaming jobs should see little impact from this regression. However, jobs near the BigQuery quota of 100K inserts/sec may cross that threshold because of additional retries, which could result in the job falling further and further behind. These users are advised to temporarily remain on the 1.6.1 version of the SDK or update existing 1.7.0 jobs back to the 1.6.1 SDK.
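For affected users, staying on 1.6.1 is a one-line build change. A minimal Maven example, using the published artifact coordinates for the Dataflow SDK for Java:

```xml
<!-- Pin the Dataflow SDK for Java to 1.6.1 to avoid the 1.7.0 regression -->
<dependency>
  <groupId>com.google.cloud.dataflow</groupId>
  <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
  <version>1.6.1</version>
</dependency>
```

Once 1.8.0 is released, updating the version element picks up the fix.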
Batch
Normal batch usage of BigQueryIO.Write is not affected by this issue.
It is possible to encounter this issue in batch when using BigQueryIO.Write with per-window sharding, though the BigQueryIO.Write documentation already warns against this unsupported use. Batch pipelines that use this unsupported code may fail due to the increased error rate.