Update releases 2.33.0 and 2.34.0 to include broken BQ copy jobs in batch as a known issue #27563
R: @liferoad
@@ -79,6 +79,7 @@ notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527
* Spark 2.x users will need to update Spark's Jackson runtime dependencies (`spark.jackson.version`) to at least version 2.9.2, due to Beam updating its dependencies.
* See a full list of open [issues that affect](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20affectedVersion%20%3D%202.33.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC) this version.
* Go SDK jobs may produce "Failed to deduce Step from MonitoringInfo" messages following successful job execution. The messages are benign and don't indicate job failure. These are due to not yet handling PCollection metrics.
* Large BigQueryIO writes that use file loads method will fail in batch mode. Specifically, writes that are large enough to use copy jobs.
We should mention what kind of errors users will see and what action they need to take. If you have a GitHub issue, you could add more details there.
We don't have an issue for this, but I'll add an error description.
PTAL
How about this?
Large BigQueryIO writes with the `FILE_LOADS` method might fail in batch mode when large writes use copy jobs. The resulting error message is `IllegalArgumentException: Attempting to access unknown side input`. Please upgrade to a newer version (> 2.34.0) or use another write method (`STORAGE_WRITE_API`).
And does this only affect the Java SDK?
changed it, PTAL
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
Thanks! Please fix the whitespace then will merge
@Abacn fixed whitespace
…atch as a known issue (apache#27563) * update release docs * add error description and workaround * update * add java * fix whitespace
In the code released for versions 2.33.0 and 2.34.0, the wrong side input is passed into WriteRename (https://github.com/apache/beam/blob/d916c1f55e57a61b54135d0922ad8660735bd287/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L430-L442C47), resulting in the following error:
Example error

org.apache.beam.sdk.util.UserCodeException: java.lang.IllegalArgumentException: Attempting to access unknown side input SimplePCollectionView{tag=Tag:1261#815f08b6005eb55f>, viewFn=org.apache.beam.sdk.values.PCollectionViews$SingletonViewFn2@f6a7c0f, coder=StringUtf8Coder, windowMappingFn=GlobalWindowMappingFn{}, pCollection=null}.
    at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteRename$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForWindowObservingParDo(FnApiDoFnRunner.java:780)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:266)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:218)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.outputTo(FnApiDoFnRunner.java:1757)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.access$2500(FnApiDoFnRunner.java:144)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$WindowObservingProcessBundleContext.outputWithTimestamp(FnApiDoFnRunner.java:2160)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$ProcessBundleContextBase.output(FnApiDoFnRunner.java:2442)
    at org.apache.beam.sdk.io.gcp.bigquery.ReifyAsIterable$1.processElement(ReifyAsIterable.java:49)
    at org.apache.beam.sdk.io.gcp.bigquery.ReifyAsIterable$1$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForWindowObservingParDo(FnApiDoFnRunner.java:780)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:266)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:218)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.outputTo(FnApiDoFnRunner.java:1757)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.access$2500(FnApiDoFnRunner.java:144)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$WindowObservingProcessBundleContext.outputWithTimestamp(FnApiDoFnRunner.java:2160)
    at org.apache.beam.sdk.transforms.DoFnOutputReceivers$WindowedContextOutputReceiver.outputWithTimestamp(DoFnOutputReceivers.java:87)
    at org.apache.beam.sdk.io.Read$BoundedSourceAsSDFWrapperFn.processElement(Read.java:311)
    at org.apache.beam.sdk.io.Read$BoundedSourceAsSDFWrapperFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForWindowObservingSizedElementAndRestriction(FnApiDoFnRunner.java:1065)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.access$1000(FnApiDoFnRunner.java:144)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$4.accept(FnApiDoFnRunner.java:645)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$4.accept(FnApiDoFnRunner.java:640)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:266)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:218)
    at org.apache.beam.fn.harness.BeamFnDataReadRunner.forwardElementToConsumer(BeamFnDataReadRunner.java:221)
    at org.apache.beam.sdk.fn.data.DecodingFnDataReceiver.accept(DecodingFnDataReceiver.java:43)
    at org.apache.beam.sdk.fn.data.DecodingFnDataReceiver.accept(DecodingFnDataReceiver.java:25)
    at org.apache.beam.fn.harness.data.QueueingBeamFnDataClient$ConsumerAndData.accept(QueueingBeamFnDataClient.java:316)
    at org.apache.beam.fn.harness.data.QueueingBeamFnDataClient.drainAndBlock(QueueingBeamFnDataClient.java:219)
    at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:353)
    at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:140)
    at org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:110)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Attempting to access unknown side input SimplePCollectionView{tag=Tag:1261#815f08b6005eb55f>, viewFn=org.apache.beam.sdk.values.PCollectionViews$SingletonViewFn2@f6a7c0f, coder=StringUtf8Coder, windowMappingFn=GlobalWindowMappingFn{}, pCollection=null}.
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:216)
    at org.apache.beam.fn.harness.state.FnApiStateAccessor.get(FnApiStateAccessor.java:152)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$WindowObservingProcessBundleContext.sideInput(FnApiDoFnRunner.java:2103)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteRename.startWriteRename(WriteRename.java:197)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteRename.processElement(WriteRename.java:132)

We don't have any batch copy job tests in BigQueryIOWriteTest that would have otherwise caught this (all of our tests for copy jobs use a streaming pipeline).
Tests for this batch case added in: #27434
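
For readers unfamiliar with this failure mode, here is a rough, dependency-free sketch of why a mismatched side input surfaces as `IllegalArgumentException: Attempting to access unknown side input`. It is a toy model only and does not use or reproduce Beam's classes: `View`, `StepContext`, `register`, and the view names are all hypothetical stand-ins. The idea it illustrates is that a step can only read the views it was wired with, so reading a different view fails at lookup time, once an element actually reaches the step.

```java
import java.util.HashMap;
import java.util.Map;

public class SideInputMismatchSketch {

    /** Hypothetical stand-in for a side input view, identified only by a tag. */
    static final class View {
        final String tag;

        View(String tag) {
            this.tag = tag;
        }
    }

    /** Hypothetical stand-in for the per-step lookup of wired side inputs. */
    static final class StepContext {
        private final Map<View, String> wired = new HashMap<>();

        /** Wires a view to this step, loosely analogous to declaring a side input. */
        void register(View view, String value) {
            wired.put(view, value);
        }

        /** Reads a view; fails if the view was never wired to this step. */
        String sideInput(View view) {
            if (!wired.containsKey(view)) {
                throw new IllegalArgumentException(
                        "Attempting to access unknown side input " + view.tag);
            }
            return wired.get(view);
        }
    }

    public static void main(String[] args) {
        // Two distinct views (names are made up for illustration).
        View wiredView = new View("wiredView");
        View otherView = new View("otherView");

        // The step is wired with one view...
        StepContext step = new StepContext();
        step.register(wiredView, "job-id-prefix");

        // ...so reading that view works.
        System.out.println("wired view resolves to: " + step.sideInput(wiredView));

        // Reading a different view fails only when the read is attempted.
        try {
            step.sideInput(otherView);
        } catch (IllegalArgumentException e) {
            System.out.println("mismatch fails with: " + e.getMessage());
        }
    }
}
```

In this model the mismatch is invisible until `sideInput` is called, which is consistent with the point above that only a test exercising the batch copy job path would have caught it.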