Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop emitting upon truncate in Java PeriodicSequence #25716

Merged
merged 1 commit into from
Mar 4, 2023

Conversation

Abacn
Copy link
Contributor

@Abacn Abacn commented Mar 3, 2023

Fixes #22776. Java counterpart of the python fix: #23765

Tested draining successful on Dataflow runner v2. JobId: 2023-03-03_10_53_37-6111128436657959155
However Dataflow runner v1 still not working: JobId: 2023-03-03_10_54_38-5379949162309256925
checked that on runner v1 the truncate function never gets called after hitting the drain button. This may be a feature gap that Dataflow runner v1 does not support drain splittable DoFn souce.

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

@github-actions github-actions bot added the java label Mar 3, 2023
@@ -207,7 +213,7 @@ public ProcessContinuation processElement(

boolean claimSuccess = true;

estimator.setWatermark(Instant.ofEpochMilli(restriction.getFrom()));
estimator.setWatermark(Instant.ofEpochMilli(nextOutput));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is made in accordance with Python implementation. Currently the watermark will be set back to the initial timestamp (which violates the monotonically increasing) though it will be set back in the while clause.

@Abacn
Copy link
Contributor Author

Abacn commented Mar 3, 2023

During testing it is also found that runner v2 calls newTracker as much as 5x than runner v1, even though the runner v2 job has 4 CPU core and runner v1 job only has two. This may have performance impacts since newTracer is used in every splittable DoFn.

runner v1 log:

image

runner v2 log:

image

@Abacn
Copy link
Contributor Author

Abacn commented Mar 3, 2023

R: @damccorm

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2023

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@Abacn
Copy link
Contributor Author

Abacn commented Mar 3, 2023

Run Java_IOs_Direct PreCommit

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks

@Abacn
Copy link
Contributor Author

Abacn commented Mar 3, 2023

Java PreCommit known flake org.apache.beam.runners.flink.FlinkRequiresStableInputTest.testParDoRequiresStableInputPortable (#25247 (comment))

KafkaIO precommit known flake: testWatermarkUpdateWithSparseMessages #25261

@Abacn
Copy link
Contributor Author

Abacn commented Mar 3, 2023

Run Java PreCommit

@Abacn
Copy link
Contributor Author

Abacn commented Mar 3, 2023

Run Java_Kafka_IO_Direct PreCommit

@Abacn Abacn merged commit 4da6025 into apache:master Mar 4, 2023
@Abacn Abacn deleted the fixperiodicseqnotdrain branch March 4, 2023 00:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Periodic Impulse does not drain on Dataflow
2 participants