Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable KafkaIO SDF while it is tested and fixed #22261

Closed
wants to merge 27 commits into from

Conversation

johnjcasey
Copy link
Contributor

@johnjcasey johnjcasey commented Jul 13, 2022

See #22303


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

* <h1>Reading from Kafka SDF is currently broken, as re-starting the pipeline will cause the
* consumer to start from scratch. See <a
* href="https://github.com/apache/beam/issues/21730">this</a>. Current workaround is to use
* --experimental_option=use_deprecated_read to use the Unbounded implementation</h1>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work for Dataflow Runner v2 Java pipelines. Can you try ?

"use_unbounded_sdf_wrapper" should work but I only tried it for x-lang.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do once your change is merged

@johnjcasey
Copy link
Contributor Author

Run Spotless Precommit

@johnjcasey
Copy link
Contributor Author

Run Java PreCommit

@johnjcasey
Copy link
Contributor Author

Run PythonDocs PreCommit

@github-actions
Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@codecov
Copy link

codecov bot commented Jul 14, 2022

Codecov Report

Merging #22261 (0cb617e) into master (50346b5) will increase coverage by 0.00%.
The diff coverage is 50.00%.

@@           Coverage Diff           @@
##           master   #22261   +/-   ##
=======================================
  Coverage   74.17%   74.17%           
=======================================
  Files         706      704    -2     
  Lines       93190    93159   -31     
=======================================
- Hits        69124    69105   -19     
+ Misses      22798    22787   -11     
+ Partials     1268     1267    -1     
Flag Coverage Δ
python 83.53% <50.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/python/apache_beam/io/kafka.py 80.00% <50.00%> (ø)
.../python/apache_beam/testing/test_stream_service.py 88.09% <0.00%> (-4.77%) ⬇️
sdks/python/apache_beam/utils/interactive_utils.py 95.12% <0.00%> (-2.44%) ⬇️
...che_beam/runners/interactive/interactive_runner.py 90.06% <0.00%> (-1.33%) ⬇️
...eam/runners/portability/fn_api_runner/execution.py 92.44% <0.00%> (-0.65%) ⬇️
sdks/python/apache_beam/transforms/combiners.py 93.05% <0.00%> (-0.39%) ⬇️
...eam/runners/interactive/interactive_environment.py 91.71% <0.00%> (-0.31%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.54% <0.00%> (-0.13%) ⬇️
.../pkg/beam/core/runtime/harness/diagnostics_hook.go
sdks/go/pkg/beam/util/harnessopts/heap_dump.go
... and 2 more

Help us with your feedback. Take ten seconds to tell us how you rate us.

@github-actions
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @pabloem for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@johnjcasey
Copy link
Contributor Author

Run Portable_Python PreCommit

@johnjcasey
Copy link
Contributor Author

Run Python PreCommit

@johnjcasey
Copy link
Contributor Author

Run PythonDocs PreCommit

@Abacn
Copy link
Contributor

Abacn commented Jul 14, 2022

pydoc precommit failure is due to

packages/apache_beam/io/kafka.py:docstring of apache_beam.io.kafka:67: WARNING: Unexpected indentation.

@johnjcasey
Copy link
Contributor Author

codecov is exclusively unrelated changes, so there is no need to interact with it

@johnjcasey
Copy link
Contributor Author

R: @chamikaramj

@github-actions
Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@@ -76,16 +76,33 @@

For more information specific to Flink runner see:
- https://beam.apache.org/documentation/runners/flink/

Reading via Kafka SDF is currently broken, and will cause the pipeline
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove Python updates since we made UnboundedSource wrapped SDF Kafka the default for now: #22286

* <h1>Reading from Kafka SDF is currently broken, as re-starting the pipeline will cause the
* consumer to start from scratch. See <a
* href="https://github.com/apache/beam/issues/21730">this</a>. Current workaround is to use
* --experimental_option=use_unbounded_sdf_wrapper to use the Unbounded implementation</h1>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"For runners that require SDF, current workaround is to use ..."

Also, pls confirm that this works for Java pipelines that use Dataflow Runner v2.

@github-actions github-actions bot removed the python label Jul 15, 2022
@johnjcasey johnjcasey changed the title Add comments and logs to warn about Kafka sdf not properly restarting Disable KafkaIO SDF while it is tested and fixed Jul 15, 2022
Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM other than one comment.

@@ -706,27 +706,12 @@ class JavaJarExpansionService(object):
This can be passed into an ExternalTransform as the expansion_service
argument which will spawn a subprocess using this jar to expand the
transform.

Args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't revert changes to external.py since this feature and doc updates are useful in general.

@johnjcasey
Copy link
Contributor Author

Run Python_PVR_Flink PreCommit

@johnjcasey
Copy link
Contributor Author

Run Java PreCommit

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@johnjcasey
Copy link
Contributor Author

Run Python_PVR_Flink PreCommit

return input.apply(new ReadFromKafkaViaUnbounded<>(this, keyCoder, valueCoder));
}
return input.apply(new ReadFromKafkaViaSDF<>(this, keyCoder, valueCoder));
// Reading from Kafka SDF is currently broken, as re-starting the pipeline will cause the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a valid reason to completely disable Kafka SDF. What if the consumer is totally fine with starting from scratch? I have business need that requires scanning time ranges - that is only being supported by SDF - without caring about any previous consumer offset. Disable it ONLY if group.id is provided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically just add to the whole chunk of code you commented here a check like this:

          || ExperimentalOptions.hasExperiment(
              input.getPipeline().getOptions(), "use_unbounded_sdf_wrapper")
          || getConsumerConfig().get(ConsumerConfig.GROUP_ID_CONFIG) != null
          || compatibility.supportsOnly(KafkaIOReadImplementation.LEGACY)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary. We primarily want to make sure that a new user won't run into this problem. We intend to fix this as rapidly as possible. If you have a less typical use case, that will still work on existing versions of Beam, while we try to get this fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this gets merged into master and gets released it's not temporary. Totally removing the SDF support is a breaking change. It should be as minimally breaking as possible. I have shown one precondition that indicates it works just fine even in this bugged state. There could be even more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have this use case, there are probably others. We won't disable this at all then

@johnjcasey johnjcasey closed this Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants