This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Version 1.9.0

@dhalperi dhalperi released this 09 Jan 20:00
· 137 commits to master since this release
v1.9.0
  • Added the ValueProvider interface for use in pipeline options. Making an option of type ValueProvider<T> instead of T allows its value to be supplied at runtime (rather than pipeline construction time) and enables Dataflow templates. Support for ValueProvider has been added to TextIO, PubSubIO, and BigQueryIO and can be added to arbitrary PTransforms as well.
  • Added the ability to automatically save profiling information to Google Cloud Storage using the --saveProfilesToGcs pipeline option. For more information on profiling pipelines executed by the DataflowPipelineRunner, see issue #72.
  • Deprecated the --enableProfilingAgent pipeline option, which saved profiles to the individual worker disks. For more information on profiling pipelines executed by the DataflowPipelineRunner, see issue #72.
  • Changed FileBasedSource to throw an exception when reading from a file pattern that has no matches. Pipelines will now fail at runtime rather than silently reading no data in this case. This change affects TextIO.Read and AvroIO.Read when they are configured with withoutValidation.
  • Enhanced Coder validation in the DirectPipelineRunner to catch coders that cannot properly encode and decode their input.
  • Improved display data throughout core transforms, including properly handling arrays in PipelineOptions.
  • Improved performance for pipelines using the DataflowPipelineRunner in streaming mode.
  • Improved scalability of the InProcessRunner, enabling testing with larger datasets.
  • Improved the cleanup of temporary files created by TextIO, AvroIO, and other FileBasedSource implementations.
  • Modified the default version range in the archetypes to exclude beta releases of the Dataflow SDK for Java (version 2.0.0 and later).
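
To illustrate the ValueProvider item above, here is a minimal, self-contained sketch of the idea of deferring an option's value from pipeline construction time to runtime. It does not use the Dataflow SDK: DeferredValue, RuntimeValue, fixed, and describeRead are hypothetical stand-ins invented for this example, and the gs:// path is a placeholder. The real interface lives in the SDK and may differ in detail.

```java
public class ValueProviderSketch {

    /** Hypothetical stand-in for the SDK's ValueProvider<T>; NOT the SDK class. */
    interface DeferredValue<T> {
        T get();                // reads the value; only valid once it is available
        boolean isAccessible(); // true once the value can be read
    }

    /** A value that is already known at construction time. */
    static <T> DeferredValue<T> fixed(final T value) {
        return new DeferredValue<T>() {
            @Override public T get() { return value; }
            @Override public boolean isAccessible() { return true; }
        };
    }

    /** A value that is supplied later, for example when a template is launched. */
    static final class RuntimeValue<T> implements DeferredValue<T> {
        private T value;
        private boolean supplied;

        void supply(T newValue) {
            this.value = newValue;
            this.supplied = true;
        }

        @Override public T get() {
            if (!supplied) {
                throw new IllegalStateException("value is not available until runtime");
            }
            return value;
        }

        @Override public boolean isAccessible() { return supplied; }
    }

    /** A pretend transform step: it reads the provider only when it executes. */
    static String describeRead(DeferredValue<String> path) {
        return "read from " + path.get();
    }

    public static void main(String[] args) {
        // Construction time: the option exists, but its value is not known yet.
        RuntimeValue<String> input = new RuntimeValue<>();
        System.out.println("accessible at construction: " + input.isAccessible());

        // Runtime: the value is supplied, and only now does the step read it.
        input.supply("gs://example-bucket/input.txt"); // placeholder path
        System.out.println(describeRead(input));

        // A fixed value behaves the same way from the step's point of view.
        System.out.println(describeRead(fixed("local-file.txt")));
    }
}
```

The design point is that code holding the provider never calls get() while the pipeline is being built, which is what lets the same constructed pipeline be reused with different values at launch.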