Update releases 2.33.0 and 2.34.0 to include broken BQ copy jobs in batch as a known issue #27563

Merged
merged 5 commits into from
Jul 20, 2023

Contributor

@ahmedabu98 ahmedabu98 commented Jul 19, 2023

In the code that was released for versions 2.33.0 and 2.34.0, the wrong side input is passed into WriteRename: https://github.com/apache/beam/blob/d916c1f55e57a61b54135d0922ad8660735bd287/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L430-L442C47, resulting in the following error:

Example error:

org.apache.beam.sdk.util.UserCodeException: java.lang.IllegalArgumentException: Attempting to access unknown side input SimplePCollectionView{tag=Tag:1261#815f08b6005eb55f>, viewFn=org.apache.beam.sdk.values.PCollectionViews$SingletonViewFn2@f6a7c0f, coder=StringUtf8Coder, windowMappingFn=GlobalWindowMappingFn{}, pCollection=null}.
    at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteRename$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForWindowObservingParDo(FnApiDoFnRunner.java:780)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:266)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:218)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.outputTo(FnApiDoFnRunner.java:1757)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.access$2500(FnApiDoFnRunner.java:144)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$WindowObservingProcessBundleContext.outputWithTimestamp(FnApiDoFnRunner.java:2160)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$ProcessBundleContextBase.output(FnApiDoFnRunner.java:2442)
    at org.apache.beam.sdk.io.gcp.bigquery.ReifyAsIterable$1.processElement(ReifyAsIterable.java:49)
    at org.apache.beam.sdk.io.gcp.bigquery.ReifyAsIterable$1$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForWindowObservingParDo(FnApiDoFnRunner.java:780)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:266)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:218)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.outputTo(FnApiDoFnRunner.java:1757)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.access$2500(FnApiDoFnRunner.java:144)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$WindowObservingProcessBundleContext.outputWithTimestamp(FnApiDoFnRunner.java:2160)
    at org.apache.beam.sdk.transforms.DoFnOutputReceivers$WindowedContextOutputReceiver.outputWithTimestamp(DoFnOutputReceivers.java:87)
    at org.apache.beam.sdk.io.Read$BoundedSourceAsSDFWrapperFn.processElement(Read.java:311)
    at org.apache.beam.sdk.io.Read$BoundedSourceAsSDFWrapperFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForWindowObservingSizedElementAndRestriction(FnApiDoFnRunner.java:1065)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.access$1000(FnApiDoFnRunner.java:144)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$4.accept(FnApiDoFnRunner.java:645)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$4.accept(FnApiDoFnRunner.java:640)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:266)
    at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:218)
    at org.apache.beam.fn.harness.BeamFnDataReadRunner.forwardElementToConsumer(BeamFnDataReadRunner.java:221)
    at org.apache.beam.sdk.fn.data.DecodingFnDataReceiver.accept(DecodingFnDataReceiver.java:43)
    at org.apache.beam.sdk.fn.data.DecodingFnDataReceiver.accept(DecodingFnDataReceiver.java:25)
    at org.apache.beam.fn.harness.data.QueueingBeamFnDataClient$ConsumerAndData.accept(QueueingBeamFnDataClient.java:316)
    at org.apache.beam.fn.harness.data.QueueingBeamFnDataClient.drainAndBlock(QueueingBeamFnDataClient.java:219)
    at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:353)
    at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:140)
    at org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:110)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Attempting to access unknown side input SimplePCollectionView{tag=Tag:1261#815f08b6005eb55f>, viewFn=org.apache.beam.sdk.values.PCollectionViews$SingletonViewFn2@f6a7c0f, coder=StringUtf8Coder, windowMappingFn=GlobalWindowMappingFn{}, pCollection=null}.
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:216)
    at org.apache.beam.fn.harness.state.FnApiStateAccessor.get(FnApiStateAccessor.java:152)
    at org.apache.beam.fn.harness.FnApiDoFnRunner$WindowObservingProcessBundleContext.sideInput(FnApiDoFnRunner.java:2103)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteRename.startWriteRename(WriteRename.java:197)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteRename.processElement(WriteRename.java:132)
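
For readers unfamiliar with this failure mode, the following is a minimal plain-Java sketch of the general idea, not Beam's actual code: a step can only read the side inputs that were registered for it, so a step that is handed one view but asks for another fails at lookup time. The class `SideInputModel` and the view names `registeredPrefixView` and `requestedPrefixView` are hypothetical and chosen only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (not Beam code) of a step that may only read side inputs registered for it. */
final class SideInputModel {
    private final Set<String> registeredViews = new HashSet<>();

    /** Models wiring a view into the step when the pipeline is constructed. */
    SideInputModel register(String viewName) {
        registeredViews.add(viewName);
        return this;
    }

    /** Models the runtime lookup: a view that was never registered is rejected. */
    String sideInput(String viewName) {
        if (!registeredViews.contains(viewName)) {
            throw new IllegalArgumentException(
                "Attempting to access unknown side input " + viewName);
        }
        // Placeholder value; a real runner would return the materialized view contents.
        return "value-of-" + viewName;
    }
}

final class SideInputModelDemo {
    public static void main(String[] args) {
        // The step is wired with one view (hypothetical name)...
        SideInputModel step = new SideInputModel().register("registeredPrefixView");
        try {
            // ...but its processing code asks for a different one.
            step.sideInput("requestedPrefixView");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the mismatch only surfaces when the lookup actually runs, which is consistent with the error appearing inside `WriteRename.processElement` in the trace above rather than at pipeline construction.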

We don't have any batch copy job tests in BigQueryIOWriteTest that would have otherwise caught this (all of our tests for copy jobs use a streaming pipeline).

Tests for this batch case added in: #27434

@ahmedabu98
Contributor Author

R: @liferoad

@@ -79,6 +79,7 @@ notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527
* Spark 2.x users will need to update Spark's Jackson runtime dependencies (`spark.jackson.version`) to at least version 2.9.2, due to Beam updating its dependencies.
* See a full list of open [issues that affect](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20affectedVersion%20%3D%202.33.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC) this version.
* Go SDK jobs may produce "Failed to deduce Step from MonitoringInfo" messages following successful job execution. The messages are benign and don't indicate job failure. These are due to not yet handling PCollection metrics.
* Large BigQueryIO writes that use file loads method will fail in batch mode. Specifically, writes that are large enough to use copy jobs.
Collaborator

We should mention what kind of errors users will see and what action they need to take. If you have a GitHub issue, you could add more details there.

Contributor Author

We don't have an issue for this, but I'll add an error description.

Contributor Author

PTAL

Collaborator

How about this?

BigQueryIO writes with the FILE_LOADS method might fail in batch mode when they are large enough to use copy jobs. The resulting error message is `IllegalArgumentException: Attempting to access unknown side input`. Please upgrade to a newer version (> 2.34.0) or use another write method (STORAGE_WRITE_API).

And does this only affect the Java SDK?
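
As a rough illustration of the conditions in the suggested wording, here is a small plain-Java sketch of a check a team might run against its own pipeline settings. The class `KnownIssueCheck` and the method `mayHitCopyJobBug` are hypothetical names, not part of Beam; the version list is taken from this PR's title, and the write method is passed as a plain string rather than Beam's enum so the sketch stays dependency-free.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not a Beam API) encoding the conditions named in the known-issue text. */
final class KnownIssueCheck {
    // Affected releases, as listed in this PR's title.
    private static final Set<String> AFFECTED_VERSIONS =
        new HashSet<>(Arrays.asList("2.33.0", "2.34.0"));

    /**
     * Returns true when a pipeline matches the conditions in the known issue:
     * batch mode, FILE_LOADS write method, and an affected Beam version.
     * A true result means the pipeline may fail, since only writes large
     * enough to use copy jobs are affected.
     */
    static boolean mayHitCopyJobBug(String beamVersion, String writeMethod, boolean batch) {
        return batch
            && "FILE_LOADS".equals(writeMethod)
            && AFFECTED_VERSIONS.contains(beamVersion);
    }
}
```

For example, `KnownIssueCheck.mayHitCopyJobBug("2.34.0", "FILE_LOADS", true)` returns `true`, while the same call with `"STORAGE_WRITE_API"` or with a version outside the list returns `false`.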

Contributor Author

changed it, PTAL

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@Abacn
Contributor

Abacn commented Jul 19, 2023

Thanks! Please fix the whitespace, then I will merge.

@ahmedabu98
Contributor Author

@Abacn fixed whitespace

@Abacn Abacn merged commit 3d501ee into apache:master Jul 20, 2023
cushon pushed a commit to cushon/beam that referenced this pull request May 24, 2024
…atch as a known issue (apache#27563)

* update release docs

* add error description and workaround

* update

* add java

* fix whitespace