Disable KafkaIO SDF while it is tested and fixed #22261
Closed
Commits (27), all by johnjcasey:
2bf47c9 21730 add warning message
4ae0939 add python warnings
4ef4c00 add python warnings
8b20e60 Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment
476cc54 Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment
aeb6b6e update comments
ac6c369 fix indentation for pydoc
1f55b1a Merge remote-tracking branch 'origin/add-kafka-sdf-comment' into add-…
5b0e77a Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment
ba3f3b8 update comment
b408303 remove python comments
fa33138 Merge remote-tracking branch 'origin/add-kafka-sdf-comment' into add-…
0d01374 Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment
d3997c0 Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment
6daf3a0 revert Cham's changes
7670410 Merge remote-tracking branch 'origin/add-kafka-sdf-comment' into add-…
d6345ad Disable kafka sdf by adding "use_unbounded_sdf_wrapper"
e44d2dd add link to higher level issue
bf4e978 unrevert changes to external.py
7ddb931 Merge remote-tracking branch 'origin/add-kafka-sdf-comment' into add-…
beb3383 exclude SDF specific tests
6ec8808 Merge remote-tracking branch 'origin/add-kafka-sdf-comment' into add-…
21af25c temporarily remove kafka specific tests
0f8a586 change workaround to just use the unbounded reader, instead of settin…
4d3cb12 remove method temporarily
7b1bfae run spotless
0cb617e Merge remote-tracking branch 'origin/master' into add-kafka-sdf-comment
@@ -76,16 +76,33 @@

For more information specific to Flink runner see:
- https://beam.apache.org/documentation/runners/flink/

Reading via Kafka SDF is currently broken, and will cause the pipeline
to re-read all data on Kafka whenever restarted.
See https://github.com/apache/beam/issues/21730.
[Review comment on the lines above: I think we can remove the Python updates, since we made UnboundedSource-wrapped SDF Kafka the default for now: #22286]
Current workaround is:
Start an expansion service with experiment "use_unbounded_sdf_wrapper"
#pylint: disable=line-too-long
java -jar sdks/java/io/expansion-service/build/libs/beam-sdks-java-io-expansion-service-2.41.0-SNAPSHOT.jar 12345 --experiments=use_unbounded_sdf_wrapper
#pylint: enable=line-too-long

Update transforms in kafka.py to use this expansion service.
https://github.com/apache/beam/blob/2c8e7eb7a39cbe3a1678a5c6b8b3f8700d4d8706/sdks/python/apache_beam/io/kafka.py#L189

Use instructions in Kafka example to run this from a Git clone.
https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/kafkataxi
"""

# pytype: skip-file

import logging
import typing

from apache_beam.transforms.external import BeamJarExpansionService
from apache_beam.transforms.external import ExternalTransform
from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder

_LOGGER = logging.getLogger(__name__)

ReadFromKafkaSchema = typing.NamedTuple(
    'ReadFromKafkaSchema',
    [('consumer_config', typing.Mapping[str, str]),

@@ -166,6 +183,10 @@ def __init__(
        Java Kafka Reader reads keys and values as 'byte[]'.
    :param expansion_service: The address (host:port) of the ExpansionService.
    """
    _LOGGER.warning(
        "Reading from Kafka via SDF is currently broken, and the pipeline will "
        "re-read all data on the topic whenever the pipeline is restarted. "
        "See kafka.py for workaround instructions.")
    if timestamp_policy not in [ReadFromKafka.processing_time_policy,
                                ReadFromKafka.create_time_policy,
                                ReadFromKafka.log_append_time]:
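A note on the wrapped warning message in this hunk: Python joins adjacent string literals with no separator, so each fragment of a multi-line log message has to carry its own trailing space. The snippet below is only an illustration with shortened, made-up strings (not the exact text from the PR) showing what happens when the space is left out.

```python
# Adjacent string literals are concatenated by the parser with nothing in between.
missing_space = (
    "the pipeline will"
    "re-read all data")

# Ending each fragment with a space keeps the words apart.
with_space = (
    "the pipeline will "
    "re-read all data")

print(missing_space)
print(with_space)
```

Checking wrapped messages for a trailing space on every fragment except the last is usually enough to avoid run-together words in logs.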
"For runners that require SDF, current workaround is to use ..."
Also, please confirm that this works for Java pipelines that use Dataflow Runner v2.
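For reference, the workaround described in the docstring of this diff (start an expansion service with the `use_unbounded_sdf_wrapper` experiment, then point the Kafka transform at it) could be wired up from Python roughly as follows. This is a sketch under assumptions, not the PR's code: the helper names `expansion_service_command` and `read_from_kafka` are hypothetical, the jar path and port 12345 are copied from the docstring, and `localhost` assumes the service runs on the same machine. Only the `expansion_service` parameter of `ReadFromKafka` comes from the source.

```python
import shlex

# Jar path and port as given in the docstring; adjust to your own build (assumption).
EXPANSION_JAR = (
    "sdks/java/io/expansion-service/build/libs/"
    "beam-sdks-java-io-expansion-service-2.41.0-SNAPSHOT.jar")
EXPANSION_PORT = 12345


def expansion_service_command(jar=EXPANSION_JAR, port=EXPANSION_PORT):
    """Hypothetical helper: argv for starting the expansion service."""
    return [
        "java", "-jar", jar, str(port),
        "--experiments=use_unbounded_sdf_wrapper",
    ]


def read_from_kafka(bootstrap_servers, topic, port=EXPANSION_PORT):
    """Hypothetical helper: a ReadFromKafka that uses the manually started service."""
    # Imported lazily so the command helper works without apache_beam installed.
    from apache_beam.io.kafka import ReadFromKafka
    return ReadFromKafka(
        consumer_config={"bootstrap.servers": bootstrap_servers},
        topics=[topic],
        # Assumes the service was started on this machine with the port above.
        expansion_service="localhost:%d" % port)


if __name__ == "__main__":
    # Print the command to run in a separate terminal before launching the pipeline.
    print(shlex.join(expansion_service_command()))
```

Whether this also covers Java pipelines on Dataflow Runner v2, as the reviewer asks, is not something the sketch addresses.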