-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[New Feature][Go SDK]: Calling Python transforms from Go pipeline results #22931
Comments
I'm indeed curious about what I am missing from docs regarding calling xlang transforms. Following the chapter 13 of Beam Programming Model guide, I wrote three Pipelines, one on each language. The Python and Java one expose their SplitWords implementation to an expansion service. I can run each pipeline, both locally and on Google Cloud Dataflow remote worker, to make sure they all are properly working by their own. Each main has some arguments to allow for a cross language call configuration. I have then tested several combinations of xlang calls and only one worked:
So, to summarize, my experiments yielded these results:
This is my Go code: https://github.com/ronoaldo/micro-beam/blob/main/05_xlang/go/pipeline.go Am my missing something? This is the Go output after I launch the Java expansion server:
|
I don't think using Python transforms from Go SDK is fully supported yet. I believe @riteshghorse is looking into this. |
Yes, Python transform from Go will be available by mid Sept in Go SDK at the Head or we will try to get it in for 2.42 release. Note: Cross Language transforms doesn't work on Go Direct Runner. You have to use Python Portable/Flink/Dataflow runner. Here is an example code for running KafkaIO in Go SDK using Cross-Language with Java SDK: https://github.com/apache/beam/blob/master/sdks/go/examples/kafka/taxi.go |
Yay! Super! Thanks
Oh, that is interesting. While reading over that example I didn't noticed it was not using the Direct Runner from Go. Just to make sure I understood properly, if I use a different runner, such as Dataflow, my sample that calls Go from Java should work, right? I mean, using a custom-made jar from the Java side instead of using a Java Beam SDK transform. Calling Python from Go would be very nice in order to have access to some very handy transforms like DataframeTransform. To avoid cluttering the issue tracker, you guys can close / relate this issue to the one tracking the implementation of Go -> Python xlang calls. |
@ronoaldo Yes, Dataflow, Flink, and the Python Portable runner support executing Cross Language transforms. The Go Direct Runner doesn't and is going to be replaced with something more useful in the next 6 months. I think we currently plan to wrap the Dataframe transforms too, along with Run Inference. At this point, we're aiming for the 2.43.0 or 2.44.0 releases though, as 2.42.0 is cut. I don't think we have a public issue for tracking this though, so I'll assign to @riteshghorse , and he can close this issue after referencing any existing issues. Thank you for the clear reports though! |
(swapped some labels and edited the title things for book keeping) |
Thanks @lostluck! A small follow up - using the Dataflow Runner I could call Java from Python, but could not call Java from Go in a sample/custom pipeline. The error in Dataflow was ... odd! I could not decipher it. I'll open a separate report to track this use case (Go -> Java xlang for a custom Java pipeline). |
I would also point out that the docs state this as being a possible workflow as of now, which is what confused me in the beginning:
If 2.42 docs can be updated, it worth mentioning that calling Python from Go is unsupported, instead of telling that we must start the expansion service manually. |
What happened?
Following the documentation available here for Beam 2.4.1, I am trying to write a simple pipeline in Go using an external transform defined in Python.
I followed the docs with my own "word count" code but ended up having an error that was very obscure. Then I decided to run the Python test cross lang suite (pip installed at
apache_beam/runners/portability/expansion_service_test.py
) to use it as expansion service entrypoint and call a sample from the Go SDK tree, specifically, the xlang/wordcount sample. To my surprise, It failed with the same kind of error.After several trial and errors I failed to get Go pipeline call an external Python transform following the docs and even reading some code from the stack traces.
Here is the output from the Python expansion service:
This is the error from the pipeline (very similar one I got with my own code, except it was complaining for 'n5' but same KeyError at the same place):
I tested this using a Python virtual env with
apache-beam[gcp]
andgo install
'ed this example:github.com/apache/beam/sdks/v2/go/examples/xlang/wordcount@latest
. From one terminal session I launched the expansion service and on another the wordcount Go program.Issue Priority
Priority: 2
Issue Component
Component: sdk-go
The text was updated successfully, but these errors were encountered: