-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experimental Swift SDK Implementation #28062
Conversation
… up and adding tests and Apache License labels where needed.
… one is just a special case of the other. Add the wordcount integration test to verify the refactoring does indeed work and start moving the pipeline code into the branch.
…etes successfully with the Python Portable Runner.
…ng with the typealias data
…n non-mac platforms.
…likely to be ignored by .gitignore
… a new approach with optional "real" names to get more stable transform names. Start adding "ObjectFn" for more complicated DoFn implementations.
Codecov Report
@@ Coverage Diff @@
## master #28062 +/- ##
==========================================
- Coverage 72.33% 72.32% -0.01%
==========================================
Files 678 678
Lines 99726 99730 +4
==========================================
- Hits 72133 72132 -1
- Misses 26030 26035 +5
Partials 1563 1563
Flags with carried forward coverage won't be shown. Click here to find out more. see 12 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
…art stubbing out External Transforms and Schema support. Improve automatic naming of pardos and make explicit name implementation cleaner. Add S3 and GCS dependencies to implement basic internal IOs.
…. Added a very basic composite version of WordCount to the test suite. Still need to hook up the graph generation, which requires searching for the roots from the leaves to give it more of that "SwiftUI" style.
…support to start. Doesn't do anything fancy, but does implement a true version of wordcount with the classic Shakespeare input.
…t to introduce an element-wise pardo. This better matches other SDK conventions (i.e. processElement) and allows us to implicitly transfer the timestamp and window from the input to an output. Repurposes the PInput and POutput protocols to handle this functionality. Also update the map and flatmap implementations to use pardo rather than pstream to get a sense of how well it works (well).
…table Runner as well as the Python Portable Runner. Modified the PInput/POutput functionality to just be structs, this allows us to use them for both closures as well as the eventual DoFn interface and cleans up the function signatures.
return (id,transform) | ||
//TODO: Handle timer messages | ||
default: | ||
log.info("Unhanled message \(message)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.info("Unhanled message \(message)") | |
log.info("Unhandled message \(message)") |
log.info("Submitted job \(job.jobID)") | ||
var done = false | ||
while !done { | ||
let status = try await client.getState(.with { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think where prism is failing with Swift is that I've currently got the code to be a bit blocking on state & message transitions, and I'm not sure I have it set up to "do the right thing" with the unary getState calls, vs getStateStream or getMessageStream, so eventually it gets blocked up.
Not a 100% explanation for what we've seen, but better than nothing.
Great info! I'll let you know when it's resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to stumped after replicated this code. Prism is happy to run this approach from the Go SDK side. I might try seeing how hard it is to get the Swift SDK running in Linux, and answer my questions directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested on Linux x86 but haven't tried Linux ARM yet. In theory when I was messing around with timestamps I'm doing the right thing with endianness there but I haven't verified that everywhere yet. In any case, "swift test" should work with a beta 5.9 build of Swift. I'm currently moving some of that to an "Examples" folder to make this sort of thing easier. I mostly used tests initially because it's a one button command in my IDE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've successfully installed Swift (5.9 dev snap) on my personal laptop and pulled down the PR locally, and can replicate the behaviour on prism that you were seeing. I should be able to get to the bottom of those problems, and make prism more robust!
Hot dang do I love the clarity in the Swift Docs on installing on different OS flavours, and how automated swift test
is. It failed the first time, and clearly said what else I needed to install (openSSL) to clear it up.
We should also add the following to the beam repo's .gitignore, right under the python section.
# Ignore files generated by the Swift build process
sdks/swift/.build/**/*
Building the swift SDK lead to many many files in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mystery problem has gone deeper. Prism is hosting worker services at X, swift is trying to connect to X, and then prism apparently never gets that connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it!
I just let Go's net listener determine the endpoint host spec. For local host, it uses [::]:PORT, which it's fine with parsing itself when it receives it.
The bit that made it tricky is I didn't notice the GRPCChannel override to support the beam api endpoint descriptor. But when I did, i hard changed the host to "localhost" and then the pipeline fails with a reasonable "whats's a split request?" error.
Prisms at fault here, and has become more resilient as a result. #winning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, excellent. Is the problem that Go is using IPv6 and I'm connecting on IPv4? Most of the code I looked at seemed to be doing the localhost thing so I just sort of copied that... but that's probably suboptimal in the year 2023 (At some point I also need to add the TLS support too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the sequence of events:
I ask for a port ":0", to listen on on any IP this machine is on.
It gives me "[::]:PORT"
but [::] is ipv6 unspecified.
In Go, that's explicitly handed as the loopback address.
The fix on my end is to just say the endpoint is one of "localhost:PORT" or "[::1]:PORT".
The next failure does happen on Swift side (KV<String, String> isn't KV<String,Optional>) which either means prism messed up coders, or the swift SDK did, and this is just revealing something due to prism executing a single transform.
I need to make prism gracefully mark the job failed on that SDK worker failure though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing it's case 3: A transform that is fused by the other runners that Prism doesn't fuse. From the error I'm guessing that Prism is asking for a length prefixed coder whereas the Swift SDK coder is not length prefixed. Swift's coder implementation is pretty strict right now and does not automatically convert non-optional coders into optional coders, but I think that should be allowable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2023-08-28T09:35:31-0700 info Source : [ApacheBeam] Waiting for input on inst003-stage004_source
Could not cast value of type 'ApacheBeam.KV<Swift.String, Swift.Optional<Swift.String>>' (0x7f3a980093f8) to 'ApacheBeam.KV<Swift.String, Swift.String>' (0x7f3adf6cf438).
… processing discovered during Prism Runner integration.
Ignore swift build artifacts.
…support so things should be more portable across chipsets now.
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
This PR has been obsoleted by beam-swift repo |
An experimental implementation of a Beam SDK for the Swift language. This implementation relies on the forthcoming Swift 5.9 compiler version and has been tested against the 8/11 beta release on Intel Mac and Linux platforms.
It provides basic support for pipeline construction and submission to a portable runner. An implementation of ParDo as well as most of the basic Coders. The IntegrationTests.swift file contains an example of word count which processes local files (included as test fixtures).
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.