This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Version 1.7.0

@dhalperi dhalperi released this 13 Sep 18:44
· 266 commits to master since this release
v1.7.0
  • Added support for Cloud Datastore API v1 in the new com.google.cloud.dataflow.sdk.io.datastore.DatastoreIO. Deprecated the old DatastoreIO class, which supports only the deprecated Cloud Datastore API v1beta2.
  • Improved DatastoreIO.Read to support dynamic work rebalancing, and added an option to control the number of query splits using withNumQuerySplits.
  • Improved DatastoreIO.Write to work with an unbounded PCollection, supporting writing to Cloud Datastore when using the DataflowPipelineRunner in streaming mode.
  • Added the ability to delete Cloud Datastore Entity objects directly using DatastoreIO.v1().deleteEntity or to delete entities by key using DatastoreIO.v1().deleteKey.
  • Added support to the DataflowPipelineRunner for reading from a BoundedSource in streaming mode. This enables the use of TextIO.Read, AvroIO.Read, and other bounded sources in streaming pipelines.
  • Added support for optionally writing a header and/or footer to text files produced with TextIO.Write.
  • Added the ability to control the number of output shards produced when using a Sink.
  • Added TestStream to enable testing of triggers with multiple panes and late data with the InProcessPipelineRunner.
  • Added the ability to control the rate at which UnboundedCountingInput produces elements using withRate(long, Duration).
  • Improved performance and stability for pipelines using the DataflowPipelineRunner in streaming mode.
  • To support TestStream, reimplemented DataflowAssert to check assertions using GroupByKey instead of side inputs. This is an update-incompatible change to DataflowAssert for pipelines run on the DataflowPipelineRunner in streaming mode.
  • Fixed an issue in which a FileBasedSink would produce no files when writing an empty PCollection.
  • Fixed an issue in which BigQueryIO.Read could not query a table in a non-US region when using the DirectPipelineRunner or the InProcessPipelineRunner.
  • Fixed an issue in which the combination of timestamps near the end of the global window and a large allowedLateness could cause an IllegalStateException for pipelines run on the DirectPipelineRunner.
  • Fixed a NullPointerException that could be thrown during pipeline submission when using an AfterWatermark trigger with no late firings.