This repository has been archived by the owner on Nov 11, 2022. It is now read-only.
Version 1.9.0
- Added the ValueProvider interface for use in pipeline options. Making an option of type ValueProvider<T> instead of T allows its value to be supplied at runtime (rather than pipeline construction time) and enables Dataflow templates. Support for ValueProvider has been added to TextIO, PubSubIO, and BigQueryIO and can be added to arbitrary PTransforms as well.
- Added the ability to automatically save profiling information to Google Cloud Storage using the --saveProfilesToGcs pipeline option. For more information on profiling pipelines executed by the DataflowPipelineRunner, see issue #72.
- Deprecated the --enableProfilingAgent pipeline option that saved profiles to the individual worker disks. For more information on profiling pipelines executed by the DataflowPipelineRunner, see issue #72.
- Changed FileBasedSource to throw an exception when reading from a file pattern that has no matches. Pipelines will now fail at runtime rather than silently reading no data in this case. This change affects TextIO.Read or AvroIO.Read when configured withoutValidation.
- Enhanced Coder validation in the DirectPipelineRunner to catch coders that cannot properly encode and decode their input.
- Improved display data throughout core transforms, including properly handling arrays in PipelineOptions.
- Improved performance for pipelines using the DataflowPipelineRunner in streaming mode.
- Improved scalability of the InProcessRunner, enabling testing with larger datasets.
- Improved the cleanup of temporary files created by TextIO, AvroIO, and other FileBasedSource implementations.
- Modified the default version range in the archetypes to exclude beta releases of Dataflow SDK for Java, version 2.0.0 and later.
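
The ValueProvider item above rests on one idea: a transform holds a handle to an option value and reads it only when the job runs, not when the pipeline is built. The sketch below models that idea in plain Java so it can run without the SDK on the classpath. Every name in it (DeferredValueDemo, DeferredValue, StaticValue, RuntimeValue, ReadStep, and the gs://example-bucket path) is hypothetical and chosen for illustration; none of them are the SDK's own classes, and the SDK's actual interface may differ in detail.

```java
/** Illustrative model of construction-time vs. runtime option values. Not SDK code. */
class DeferredValueDemo {

    /** Hypothetical stand-in for a deferred option value. */
    interface DeferredValue<T> {
        T get();                 // returns the value, or throws if it is not bound yet
        boolean isAccessible();  // true once the value can be read
    }

    /** A value that is already known when the pipeline is constructed. */
    static final class StaticValue<T> implements DeferredValue<T> {
        private final T value;
        StaticValue(T value) { this.value = value; }
        @Override public T get() { return value; }
        @Override public boolean isAccessible() { return true; }
    }

    /** A value that is bound later, for example when a templated job is launched. */
    static final class RuntimeValue<T> implements DeferredValue<T> {
        private T value;
        private boolean bound;

        void bind(T value) {
            this.value = value;
            this.bound = true;
        }

        @Override public T get() {
            if (!bound) {
                throw new IllegalStateException("Value is not available until runtime");
            }
            return value;
        }

        @Override public boolean isAccessible() { return bound; }
    }

    /** A step that stores the handle at construction time and reads it only on execution. */
    static final class ReadStep {
        private final DeferredValue<String> inputFile;

        ReadStep(DeferredValue<String> inputFile) { this.inputFile = inputFile; }

        /** Safe to call at construction time: never forces an unbound value. */
        String describe() {
            return inputFile.isAccessible()
                ? "read " + inputFile.get()
                : "read <supplied at runtime>";
        }

        /** Called at execution time, when the value is expected to be bound. */
        String execute() {
            return "reading " + inputFile.get();
        }
    }

    public static void main(String[] args) {
        RuntimeValue<String> input = new RuntimeValue<>();
        ReadStep step = new ReadStep(input);       // the "pipeline" is built without a value
        System.out.println(step.describe());

        input.bind("gs://example-bucket/input.txt"); // hypothetical path, supplied at launch
        System.out.println(step.execute());
    }
}
```

The design point is that construction-time code (here describe) checks accessibility and never forces the value, while execution-time code (here execute) is the only place that reads it. That separation is what lets one constructed pipeline be reused with different option values.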