Disable KafkaIO SDF while it is tested and fixed #22261

johnjcasey · 2022-07-13T15:17:13Z

See #22303

chamikaramj · 2022-07-13T15:35:58Z

sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java

+ * <h1>Reading from Kafka SDF is currently broken, as re-starting the pipeline will cause the
+ * consumer to start from scratch. See <a
+ * href="https://github.com/apache/beam/issues/21730">this</a>. Current workaround is to use
+ * --experimental_option=use_deprecated_read to use the Unbounded implementation</h1>


I don't think this will work for Dataflow Runner v2 Java pipelines. Can you try it?

"use_unbounded_sdf_wrapper" should work, but I have only tried it for x-lang.

johnjcasey · 2022-07-13

Will do once your change is merged.
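For readers following the thread, here is a minimal sketch of how an experiment name such as "use_unbounded_sdf_wrapper" might be looked up in a pipeline's command-line arguments. It is plain Python with no Beam dependency. The helpers `parse_experiments` and `has_experiment` are hypothetical, and the `--experiments` flag spelling (repeatable, comma-separated) is an assumption of this sketch, not something confirmed in this PR.

```python
import argparse


def parse_experiments(argv):
    """Collect experiment names from argv (hypothetical helper).

    Assumes a repeatable, comma-separated --experiments flag; all other
    arguments are ignored.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--experiments", action="append", default=None)
    known, _unknown = parser.parse_known_args(argv)
    names = []
    for value in known.experiments or []:
        # Split comma-separated values and drop empty fragments.
        names.extend(part.strip() for part in value.split(",") if part.strip())
    return names


def has_experiment(argv, name):
    """Return True if the named experiment was requested in argv."""
    return name in parse_experiments(argv)


if __name__ == "__main__":
    args = ["--runner=DataflowRunner", "--experiments=use_unbounded_sdf_wrapper"]
    print(has_experiment(args, "use_unbounded_sdf_wrapper"))
```

Whether a given runner honors the experiment is a separate question, which is what the request to try it on Dataflow Runner v2 is about.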

johnjcasey · 2022-07-13T16:00:28Z

Run Spotless Precommit

johnjcasey · 2022-07-13T16:48:29Z

Run Java PreCommit

johnjcasey · 2022-07-13T16:49:38Z

Run PythonDocs PreCommit

github-actions · 2022-07-13T17:10:39Z

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

codecov · 2022-07-14T14:29:20Z

Codecov Report

Merging #22261 (0cb617e) into master (50346b5) will increase coverage by 0.00%.
The diff coverage is 50.00%.

@@           Coverage Diff           @@
##           master   #22261   +/-   ##
=======================================
  Coverage   74.17%   74.17%           
=======================================
  Files         706      704    -2     
  Lines       93190    93159   -31     
=======================================
- Hits        69124    69105   -19     
+ Misses      22798    22787   -11     
+ Partials     1268     1267    -1

Flag	Coverage Δ
python	`83.53% <50.00%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
sdks/python/apache_beam/io/kafka.py	`80.00% <50.00%> (ø)`
.../python/apache_beam/testing/test_stream_service.py	`88.09% <0.00%> (-4.77%)`	⬇️
sdks/python/apache_beam/utils/interactive_utils.py	`95.12% <0.00%> (-2.44%)`	⬇️
...che_beam/runners/interactive/interactive_runner.py	`90.06% <0.00%> (-1.33%)`	⬇️
...eam/runners/portability/fn_api_runner/execution.py	`92.44% <0.00%> (-0.65%)`	⬇️
sdks/python/apache_beam/transforms/combiners.py	`93.05% <0.00%> (-0.39%)`	⬇️
...eam/runners/interactive/interactive_environment.py	`91.71% <0.00%> (-0.31%)`	⬇️
...hon/apache_beam/runners/worker/bundle_processor.py	`93.54% <0.00%> (-0.13%)`	⬇️
.../pkg/beam/core/runtime/harness/diagnostics_hook.go
sdks/go/pkg/beam/util/harnessopts/heap_dump.go
... and 2 more

github-actions · 2022-07-14T16:04:54Z

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @pabloem for label python.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

johnjcasey · 2022-07-14T16:37:14Z

Run Portable_Python PreCommit

johnjcasey · 2022-07-14T16:37:22Z

Run Python PreCommit

johnjcasey · 2022-07-14T16:37:29Z

Run PythonDocs PreCommit

Abacn · 2022-07-14T17:49:11Z

pydoc precommit failure is due to

packages/apache_beam/io/kafka.py:docstring of apache_beam.io.kafka:67: WARNING: Unexpected indentation.

johnjcasey · 2022-07-14T20:24:38Z

The codecov report reflects only unrelated changes, so there is no need to act on it.

johnjcasey · 2022-07-15T15:15:25Z

R: @chamikaramj

github-actions · 2022-07-15T16:07:39Z

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

chamikaramj

Thanks.

chamikaramj · 2022-07-15T16:46:02Z

sdks/python/apache_beam/io/kafka.py

@@ -76,16 +76,33 @@

  For more information specific to Flink runner see:
  - https://beam.apache.org/documentation/runners/flink/
+
+  Reading via Kafka SDF is currently broken, and will cause the pipeline


I think we can remove the Python updates, since we made UnboundedSource-wrapped SDF Kafka the default for now: #22286

chamikaramj · 2022-07-15T16:49:51Z

sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java

+ * <h1>Reading from Kafka SDF is currently broken, as re-starting the pipeline will cause the
+ * consumer to start from scratch. See <a
+ * href="https://github.com/apache/beam/issues/21730">this</a>. Current workaround is to use
+ * --experimental_option=use_unbounded_sdf_wrapper to use the Unbounded implementation</h1>


"For runners that require SDF, current workaround is to use ..."

Also, please confirm that this works for Java pipelines that use Dataflow Runner v2.

sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java

chamikaramj

Thanks. LGTM other than one comment.

chamikaramj · 2022-07-15T18:11:45Z

sdks/python/apache_beam/transforms/external.py

@@ -706,27 +706,12 @@ class JavaJarExpansionService(object):
  This can be passed into an ExternalTransform as the expansion_service
  argument which will spawn a subprocess using this jar to expand the
  transform.
-
-  Args:


Please don't revert changes to external.py since this feature and doc updates are useful in general.

johnjcasey · 2022-07-20T22:04:30Z

Run Python_PVR_Flink PreCommit

johnjcasey · 2022-07-20T22:29:45Z

Run Java PreCommit

chamikaramj

LGTM. Thanks.

johnjcasey · 2022-07-20T22:35:28Z

Run Python_PVR_Flink PreCommit

nbali · 2022-07-20T22:51:52Z

sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java

-        return input.apply(new ReadFromKafkaViaUnbounded<>(this, keyCoder, valueCoder));
-      }
-      return input.apply(new ReadFromKafkaViaSDF<>(this, keyCoder, valueCoder));
+      // Reading from Kafka SDF is currently broken, as re-starting the pipeline will cause the


This isn't a valid reason to completely disable Kafka SDF. What if the consumer is totally fine with starting from scratch? I have a business need that requires scanning time ranges, which is only supported by SDF, without caring about any previous consumer offset. Disable it ONLY if group.id is provided.

beam/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIOReadImplementationCompatibility.java

Lines 93 to 94 in 367173f

START_READ_TIME,

STOP_READ_TIME(SDF),

Basically just add to the whole chunk of code you commented here a check like this:

    || ExperimentalOptions.hasExperiment(
        input.getPipeline().getOptions(), "use_unbounded_sdf_wrapper")
    || getConsumerConfig().get(ConsumerConfig.GROUP_ID_CONFIG) != null
    || compatibility.supportsOnly(KafkaIOReadImplementation.LEGACY)

johnjcasey · 2022-07-21

This is temporary. We primarily want to make sure that a new user won't run into this problem. We intend to fix this as rapidly as possible. If you have a less typical use case, it will still work on existing versions of Beam while we try to get this fixed.

nbali · 2022-07-21

If this gets merged into master and gets released, it's not temporary. Totally removing the SDF support is a breaking change. It should be as minimally breaking as possible. I have shown one precondition that indicates it works just fine even in this bugged state. There could be even more.

johnjcasey · 2022-07-21

If you have this use case, there are probably others. We won't disable this at all, then.
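The condition proposed in this thread can be modeled in a few lines. The following is only a sketch in plain Python, not the KafkaIO code: `ReadConfig` and `use_unbounded_read` are hypothetical names, `legacy_only` stands in for `compatibility.supportsOnly(KafkaIOReadImplementation.LEGACY)`, and only the three quoted clauses are modeled (the quoted snippet begins with `||`, so any earlier clauses are not shown here). The key `"group.id"` is the string value of Kafka's `ConsumerConfig.GROUP_ID_CONFIG`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReadConfig:
    """Hypothetical stand-in for the inputs the proposed check looks at."""
    experiments: List[str] = field(default_factory=list)
    consumer_config: Dict[str, str] = field(default_factory=dict)
    # Stand-in for compatibility.supportsOnly(KafkaIOReadImplementation.LEGACY).
    legacy_only: bool = False


def use_unbounded_read(cfg: ReadConfig) -> bool:
    """Return True when the sketch would fall back to the unbounded read.

    Mirrors the three quoted clauses: explicit experiment, a configured
    group.id (where restarting from scratch is the reported problem), or a
    configuration that only the legacy implementation supports.
    """
    return (
        "use_unbounded_sdf_wrapper" in cfg.experiments
        or cfg.consumer_config.get("group.id") is not None
        or cfg.legacy_only
    )


if __name__ == "__main__":
    # A time-range scan with no group.id would keep using SDF in this model.
    print(use_unbounded_read(ReadConfig()))
    print(use_unbounded_read(ReadConfig(consumer_config={"group.id": "my-group"})))
```

Under this model, a read with no consumer group, such as the time-range scan described above, would stay on the SDF path, while reads that depend on committed offsets would fall back.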

johnjcasey added 3 commits July 13, 2022 11:03

21730 add warning message

2bf47c9

add python warnings

4ae0939

add python warnings

4ef4c00

chamikaramj reviewed Jul 13, 2022

Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment

8b20e60

github-actions bot added io java kafka python labels Jul 13, 2022

johnjcasey added 2 commits July 14, 2022 09:50

Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment

476cc54

update comments

aeb6b6e

github-actions bot added the Next Action: Reviewers label Jul 14, 2022

johnjcasey added 3 commits July 14, 2022 13:51

fix indentation for pydoc

ac6c369

Merge remote-tracking branch 'origin/add-kafka-sdf-comment' into add-kafka-sdf-comment

1f55b1a

Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment

5b0e77a

chamikaramj reviewed Jul 15, 2022

update comment

ba3f3b8

johnjcasey added 4 commits July 15, 2022 13:31

remove python comments

b408303

Merge remote-tracking branch 'origin/add-kafka-sdf-comment' into add-kafka-sdf-comment

fa33138

Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment

0d01374

Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment

d3997c0

github-actions bot removed the python label Jul 15, 2022

johnjcasey added 3 commits July 15, 2022 13:53

revert Cham's changes

6daf3a0

Merge remote-tracking branch 'origin/add-kafka-sdf-comment' into add-kafka-sdf-comment

7670410

Disable kafka sdf by adding "use_unbounded_sdf_wrapper"

d6345ad

johnjcasey changed the title ~~Add comments and logs to warn about Kafka sdf not properly restarting~~ Disable KafkaIO SDF while it is tested and fixed Jul 15, 2022

chamikaramj reviewed Jul 15, 2022

github-actions bot added the python label Jul 15, 2022

add link to higher level issue

e44d2dd

chamikaramj reviewed Jul 15, 2022

johnjcasey added 8 commits July 15, 2022 14:13

unrevert changes to external.py

bf4e978

Merge remote-tracking branch 'origin/add-kafka-sdf-comment' into add-kafka-sdf-comment

7ddb931

exclude SDF specific tests

beb3383

Merge remote-tracking branch 'origin/add-kafka-sdf-comment' into add-kafka-sdf-comment

6ec8808

temporarily remove kafka specific tests

21af25c

change workaround to just use the unbounded reader, instead of setting an experimental option

0f8a586

remove method temporarily

4d3cb12

run spotless

7b1bfae

chamikaramj approved these changes Jul 20, 2022

nbali suggested changes Jul 20, 2022

Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment

0cb617e

johnjcasey closed this Jul 21, 2022
