[#23106] Add periodic.Sequence and periodic.Impulse transforms to Go SDK #25808
The nesting here has the added drawback of hiding the core of the example.
I'll note that the window for the side input is usually going to be larger than the window for the main processing. While this isn't wrong, the usual goal of the pattern is something like letting files that change hourly be read in once each hour, while the more frequent data reuses the cached, already-read file. (Granted, this behavior isn't yet enabled by default in the Go SDK, but that's an aside.)
I thought of making the side input window larger. Do you think it is worth making that change, and adding another configuration option to specify the side input window size?
I'm also curious to know what you mean by
Wasn't able to apply this suggestion after having changed the `periodic.Impulse` signature, but I have applied the same change in a separate commit. Thanks!
WRT the cache:
Beam's execution abstraction, the FnAPI, has provisions for caching data from the StateAPI across bundles on the SDK side, to avoid repeated deserialization and additional round trips to the runner's primary store to fetch state data. Side inputs also come across the StateAPI. This is particularly valuable for streaming jobs: a single "SDK harness" is usually responsible for the same key all the time, so with tight windows it would otherwise look up the same side input data repeatedly.
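To make the caching point concrete, here is a minimal, stdlib-only sketch of the idea, not the Go SDK's actual implementation: decoded side input values are memoized per side input window, so bundles that fall in the same window only pay for one fetch. The `sideInputCache` type, `windowKey`, and the `fetch` callback (standing in for a StateAPI round trip) are hypothetical names used purely for illustration.

```go
package main

import "fmt"

// windowKey identifies a side input window by its start time in epoch
// milliseconds. Hypothetical type, used only for this illustration.
type windowKey int64

// sideInputCache is a hypothetical, simplified stand-in for an SDK-side
// cache: it remembers decoded side input values per window so repeated
// lookups skip the fetch and the deserialization.
type sideInputCache struct {
	fetch   func(windowKey) []string // stands in for a StateAPI round trip
	entries map[windowKey][]string
	fetches int // number of times fetch was actually called
}

func newSideInputCache(fetch func(windowKey) []string) *sideInputCache {
	return &sideInputCache{fetch: fetch, entries: make(map[windowKey][]string)}
}

// Get returns the cached value for w, calling fetch only on a cache miss.
func (c *sideInputCache) Get(w windowKey) []string {
	if v, ok := c.entries[w]; ok {
		return v
	}
	c.fetches++
	v := c.fetch(w)
	c.entries[w] = v
	return v
}

func main() {
	cache := newSideInputCache(func(w windowKey) []string {
		return []string{fmt.Sprintf("file contents for window %d", w)}
	})
	// Ten bundles that all read the same side input window: one fetch total.
	for bundle := 0; bundle < 10; bundle++ {
		_ = cache.Get(windowKey(3_600_000))
	}
	// A bundle in the next side input window triggers a second fetch.
	_ = cache.Get(windowKey(7_200_000))
	fmt.Println("fetches:", cache.fetches) // fetches: 2
}
```

The wider the side input window is relative to the main window, the more bundles hit the same cache entry, which is where the savings come from.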
WRT not applying after the impulse change: that makes sense, since an int64 was being received from the "update" function.
I didn't address the larger side input window. Since it's an example, we don't need to over-configure things. I'd make the side input window 5 times larger to demonstrate the paradigm, with an explicit comment saying that's what the larger window is for.
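A rough sketch of why a 5x side input window helps, assuming 1-minute fixed main windows and assuming the default mapping for fixed windows picks the side input window that contains the main window's last instant. All names and sizes here are illustrative, and plain epoch-millisecond arithmetic stands in for the SDK's window types.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative sizes: a 1-minute main window and a side input window that is
// deliberately 5x larger, so one side input read serves several main windows.
const (
	mainWindow = time.Minute
	sideWindow = 5 * mainWindow
)

// windowStart returns the start (epoch ms) of the fixed window of sizeMs that
// contains tsMs, using floor division so negative timestamps also work.
func windowStart(tsMs, sizeMs int64) int64 {
	r := tsMs % sizeMs
	if r < 0 {
		r += sizeMs
	}
	return tsMs - r
}

// sideWindowFor maps a main window starting at mainStartMs to the side input
// window containing the main window's last millisecond (assumed mapping).
func sideWindowFor(mainStartMs int64) int64 {
	maxTs := mainStartMs + mainWindow.Milliseconds() - 1
	return windowStart(maxTs, sideWindow.Milliseconds())
}

func main() {
	distinct := map[int64]bool{}
	for i := int64(0); i < 10; i++ {
		mainStart := i * mainWindow.Milliseconds()
		side := sideWindowFor(mainStart)
		distinct[side] = true
		fmt.Printf("main window at minute %d -> side input window at minute %d\n",
			i, side/time.Minute.Milliseconds())
	}
	fmt.Println("distinct side input windows:", len(distinct)) // 2
}
```

Running it shows five consecutive main windows sharing each side input window, so the side input only needs to be materialized (and cached) once per five main windows.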
We should be able to add a short unit test if we use the `prism` runner directly. While the runner isn't fully complete yet, it does run and execute ProcessContinuation transforms and watermarks!
It doesn't do splitting yet, or actually "wait" for any process continuations at the moment. But when the sequence is done, it will terminate, so we can add a test with a period of one second and a duration of one minute, and then check that we get 60 elements out of the transform. (There's a small risk of getting 59 instead, as a flake...)
That's great! I couldn't get it working on the direct runner, but didn't try using the new prism runner explicitly. I have now added two tests, `TestImpulse` and `TestSequence`, and it looks like that's working. Let me know what you think.
The direct runner (and basically all non-portable runners) is not great, since it misaligns expectations for users when they move to executing on a "real" runner, like Flink or Dataflow. There's also no "spec" for what a direct runner should do to properly test things, so the Go Direct Runner is missing a number of features that the Java and Python ones use.
Essentially, Prism is going to replace the Go Direct Runner as the default runner, for the Go SDK at least, and will hopefully make testing and local runs easier for all facets of Beam. The prism README covers much of the vision and the desired goals, and I also gave a talk at last year's Beam Summit about the motivations, especially around all the "hidden" bits of Beam that users usually don't interact with but are affected by.
https://www.youtube.com/watch?v=G4lbkvAG6xk
While playing around locally, I added this way to mock the `time.Now` function. If it ends up unused, I'm happy to drop it if that makes more sense.
This will work in the direct runner, but that's because the direct runner won't successfully run the example or anything local. It would be better to fold things into the sequence definition to enable appropriate testing behavior.
Indeed, I struggled with the direct runner, but I have now added two working tests that use the `prism` runner. I'm struggling to understand how to fold this into the sequence definition, as you suggest, to test it properly. Instead, I went ahead and removed this entirely, and the DoFn uses `time.Now`. I'm happy to leave it there for now. If you want to expand on how to fold it into the definition and test it, I'm happy to add this as well.