All runners seem to be capable of migrating to splittable DoFn for non-portable execution, except for Dataflow runner v1, which will internalize the current primitive Read implementation that is shared across runner implementations.
Imported from Jira BEAM-10670. Original Jira may contain additional context.
Reported by: lcwik.
I was trying KafkaIO with the FlinkRunner but am facing the following issue:
Exception in thread "main" java.lang.IllegalStateException: No translator known for org.apache.beam.runners.core.construction.SplittableParDo$PrimitiveUnboundedRead
at org.apache.beam.runners.core.construction.PTransformTranslation.urnForTransform(PTransformTranslation.java:283)
at org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.visitPrimitiveTransform(FlinkStreamingPipelineTranslator.java:135)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:593)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:585)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:585)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:585)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$500(TransformHierarchy.java:240)
at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:214)
at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:469)
at org.apache.beam.runners.flink.FlinkPipelineTranslator.translate(FlinkPipelineTranslator.java:38)
at org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.translate(FlinkStreamingPipelineTranslator.java:92)
at org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.translate(FlinkPipelineExecutionEnvironment.java:115)
at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:105)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:323)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:309)
at BeamPipelineKafka.main(BeamPipelineKafka.java:54)
As you mentioned, all the runners are capable of splittable DoFn; is there anything I am missing?
I have also tried "--experiments=use_deprecated_read" to use the primitive read, but I am still facing the same issue.
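For reference, this is a minimal sketch of the kind of pipeline that hits the error; the class name, topic, and bootstrap servers are placeholders, not taken from this issue:

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BeamPipelineKafka {
  public static void main(String[] args) {
    // --runner=FlinkRunner and --experiments=use_deprecated_read can also be
    // passed on the command line instead of being set here.
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);

    Pipeline p = Pipeline.create(options);

    // An unbounded Kafka read; broker and topic are placeholders.
    p.apply(
        KafkaIO.<String, String>read()
            .withBootstrapServers("localhost:9092")
            .withTopic("my-topic")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata()); // yields KV<String, String>

    p.run().waitUntilFinish();
  }
}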
I don't think anyone is actively pursuing this goal at the moment. I believe the portable FlinkRunner is the one that has splittable DoFn support; the portable and classic Flink runners are pretty independent.
The current (bad) status is that all non-Dataflow runners will use legacy read if the runner is set up prior to expansion. This results in non-portable expansion behaviors.
The desired status would be that runners override the SDF-based read to the legacy read when desired. The code to do this is already shipped with KafkaIO and used in the Dataflow runner, but it would be some real work, and probably throwaway work, to adjust the other runners to use the override. More likely, we will just push everything to SDF.
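For what it's worth, "use_deprecated_read" is just an experiment flag on the pipeline options. Below is a minimal sketch of setting and checking it programmatically with the SDK's ExperimentalOptions; the class name is illustrative only:

import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class UseDeprecatedReadExample {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();

    // Equivalent to passing --experiments=use_deprecated_read on the command line.
    ExperimentalOptions.addExperiment(
        options.as(ExperimentalOptions.class), "use_deprecated_read");

    // A runner or IO expansion can consult the flag like this when deciding
    // whether to fall back to the legacy primitive Read.
    boolean useLegacyRead =
        ExperimentalOptions.hasExperiment(options, "use_deprecated_read");
    System.out.println("use_deprecated_read set: " + useLegacyRead);
  }
}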