This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

PCollection<Void> instead of PDone #645

Open
brucedeen opened this issue Mar 18, 2019 · 2 comments

Comments

@brucedeen

I don't expect this to be done, but I do want to highlight the use case I have for it.
My environment is as follows:

  1. I only allow templates to be run in my environment. For batch jobs, I can invoke the template very easily from Composer (aka Airflow).
  2. I want to publish a notification to a Pub/Sub topic when the job completes. This can carve 2.5 minutes off a successful Dataflow completion, and I would like to take advantage of that.

Given both constraints, I cannot wait for the pipeline to finish and then publish a message afterwards; the notification must be handled by the pipeline itself.

Currently I have replaced the PDone returned by many of the output transforms on the provided IO classes with a PCollection<Void>. This lets me wait for the completion of, say, a save to Bigtable or a save to Datastore and then publish a message.

Is there any way of getting this functionality without changing the PDone into a PCollection?
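
To make the request concrete, here is a minimal plain-Java analogy of the pattern described above. It is not Beam or Dataflow code: `CompletionSignalSketch`, `writeAll`, and `publishDone` are hypothetical names, and `CompletableFuture<Void>` only stands in for the idea of a sink returning a completion signal rather than a terminal marker like PDone. In the Beam Java SDK itself, the `Wait.on` transform is one way to hold a step back until a signal collection is complete; whether a given IO exposes such a signal depends on the connector.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Plain-Java analogy only: none of these names are Beam or Dataflow APIs.
final class CompletionSignalSketch {

    // Stand-in for a sink. Instead of returning nothing (the PDone analogue),
    // it returns a future that completes once every record has been written.
    static CompletableFuture<Void> writeAll(List<String> records, List<String> log) {
        return CompletableFuture.runAsync(() -> {
            for (String record : records) {
                log.add("write:" + record);
            }
        });
    }

    // Stand-in for publishing a "job finished" message to a topic.
    static void publishDone(List<String> log) {
        log.add("notify:done");
    }

    // Chains the notification onto the sink's completion signal and waits for both.
    static List<String> runDemo() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        writeAll(List.of("a", "b"), log)
            .thenRun(() -> publishDone(log))
            .join();
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints [write:a, write:b, notify:done]
    }
}
```

Because the sink hands back a signal, the notification can be sequenced after the write inside the same program, which is the property a PDone result does not offer to downstream steps.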

@alec-ferguson-sunrun

I second this, it would be incredibly useful.

I think development of DataflowIO specifically now lives in the core Beam repo, https://github.com/apache/beam. I created the ticket below to propose adding an option for DatastoreIO.v1().write() to return PCollection<Void>, and I plan on submitting a PR soon. Comment there and discuss?
https://issues.apache.org/jira/browse/BEAM-9491

@kennknowles
Contributor

You are correct. This repository is for archival purposes only. Thanks for finding the Jira and linking to it!
