Implementing lull reporting at bundle level processing #29882

Merged: 54 commits merged into apache:master on Feb 26, 2024

Conversation

arvindram03
Contributor

Implements lull reporting at the bundle-processing level for the Dataflow worker. The worker dumps a stack trace when the bundle processing time exceeds 10 minutes. As part of this, it logs the step names and the time spent in each step to help users debug stuck jobs.
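
For readers unfamiliar with the idea, here is a minimal sketch of what a bundle-level lull check could look like. It is not the code in this PR: the class `BundleLullSketch`, its method names, the step names, and the message format are all hypothetical. Only the 10-minute threshold and the idea of logging per-step time alongside a stack trace come from the description above.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of bundle-level lull detection; not the Beam implementation. */
final class BundleLullSketch {
  // Threshold taken from the PR description: 10 minutes of bundle processing.
  static final Duration LULL_THRESHOLD = Duration.ofMinutes(10);

  /** Returns true when the bundle has been processing for longer than the threshold. */
  static boolean isBundleLull(Duration bundleElapsed) {
    return bundleElapsed.compareTo(LULL_THRESHOLD) > 0;
  }

  /** Builds a log message with the time spent per step followed by the stack frames. */
  static String formatLullMessage(
      Duration bundleElapsed, Map<String, Duration> timePerStep, StackTraceElement[] stack) {
    StringBuilder sb = new StringBuilder();
    sb.append("Bundle processing has run for ")
        .append(bundleElapsed.toMinutes())
        .append(" min without completing\n");
    for (Map.Entry<String, Duration> e : timePerStep.entrySet()) {
      sb.append("  step ").append(e.getKey()).append(": ")
          .append(e.getValue().getSeconds()).append(" s\n");
    }
    for (StackTraceElement frame : stack) {
      sb.append("    at ").append(frame).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Made-up step names and timings, purely for illustration.
    Map<String, Duration> timePerStep = new LinkedHashMap<>();
    timePerStep.put("ReadInput", Duration.ofSeconds(30));
    timePerStep.put("SlowDoFn", Duration.ofMinutes(11));
    Duration elapsed = Duration.ofMinutes(12);
    if (isBundleLull(elapsed)) {
      System.out.print(
          formatLullMessage(elapsed, timePerStep, Thread.currentThread().getStackTrace()));
    }
  }
}
```

In this sketch the check is a pure function of elapsed time, so a periodic sampler could call it without knowing which step is currently active.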


Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @chamikaramj added as fallback since no labels match configuration

@arvindram03
Contributor Author

This change was reviewed internally and approved by the streaming team.

@chamikaramj
Contributor

Thanks, will check (sorry about the delay, I was OOO).

@arvindram03 arvindram03 requested a review from scwhittle January 8, 2024 18:36
@arvindram03
Contributor Author

The failures seem to be unrelated to the change in this PR.

* @param trackedThread The execution thread that is in a lull.
* @param millis The milliseconds since the state was most recently entered.
*/
public abstract void reportBundleLull(
Contributor

I don't think this needs to be a method on the ExecutionState, as it is not state specific. It could just be a method within ExecutionStateTracker which represents the execution across states.

Contributor Author

This would trigger a bigger change, because the logging happens in DataflowExecutionState, which is an implementation of ExecutionState.

Contributor

I think that is worthwhile, to avoid having this rely on the step context when it should be logged regardless of whether there is a step context.

You can extract whatever shared logic prints the thread stacks into a static method in ExecutionStateTracker so that both places can use it.
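
A rough sketch of the shape suggested here, under stated assumptions: `TrackerSketch` and `StepStateSketch` are hypothetical stand-ins, not Beam's `ExecutionStateTracker` or `DataflowExecutionState`, and the message text is invented. The point is only that the stack-printing logic lives in one static helper, so the bundle-level report does not need a step context.

```java
/** Hypothetical stand-in for a tracker spanning all states; not Beam's ExecutionStateTracker. */
final class TrackerSketch {
  /** Shared static helper: renders stack frames, one per line. */
  static String formatStack(StackTraceElement[] frames) {
    StringBuilder sb = new StringBuilder();
    for (StackTraceElement frame : frames) {
      sb.append("  at ").append(frame).append('\n');
    }
    return sb.toString();
  }

  /** Bundle-level report: needs only the tracked thread, no step context. */
  static String bundleLullMessage(Thread trackedThread, long millis) {
    return "Bundle lull on thread " + trackedThread.getName() + " for " + millis + " ms\n"
        + formatStack(trackedThread.getStackTrace());
  }
}

/** Hypothetical stand-in for a per-step state; reuses the tracker's static helper. */
final class StepStateSketch {
  private final String stepName;

  StepStateSketch(String stepName) {
    this.stepName = stepName;
  }

  /** Step-level report: adds the step name, then delegates stack formatting. */
  String lullMessage(Thread trackedThread, long millis) {
    return "Step " + stepName + " lull for " + millis + " ms\n"
        + TrackerSketch.formatStack(trackedThread.getStackTrace());
  }
}
```

With this split, the step-level path keeps its step-specific prefix while both paths share the same stack formatting.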

@arvindram03 arvindram03 requested a review from scwhittle January 9, 2024 20:11
@arvindram03
Contributor Author

@scwhittle it's ready for review.

@arvindram03
Contributor Author

The failures are due to planned downtime of https://opensource.org/licenses/mit

@scwhittle
Contributor

I just made the final edit myself (replacing anyOf with allOf instead of using a separate allOf) rather than doing another round-trip. Will merge once tests pass.

@arvindram03
Contributor Author

The tests are green and this is ready to be merged.

@arvindram03
Contributor Author

Seems like an unrelated failure

@scwhittle
Contributor

Run Java_GCP_IO_Direct

@scwhittle scwhittle closed this Feb 26, 2024
@scwhittle scwhittle reopened this Feb 26, 2024
@scwhittle scwhittle merged commit ffe2dba into apache:master Feb 26, 2024
21 checks passed
@arvindram03
Contributor Author

Thanks for the thorough review and the help in getting this merged, Sam.

@arvindram03 arvindram03 deleted the lull branch February 26, 2024 22:14
arvindram03 added a commit to arvindram03/beam that referenced this pull request Mar 15, 2024
scwhittle pushed a commit that referenced this pull request Mar 18, 2024
Abacn pushed a commit to Abacn/beam that referenced this pull request Mar 18, 2024
Abacn added a commit that referenced this pull request Mar 18, 2024
hjtran pushed a commit to hjtran/beam that referenced this pull request Apr 4, 2024