This repository has been archived by the owner on Nov 11, 2022. It is now read-only.
Version 2.0.0
The Dataflow SDK for Java 2.0.0 is the first stable 2.x release of the Dataflow SDK for Java, based on a subset of Apache Beam 2.0.0. See the Apache Beam 2.0.0 release notes for additional change information.
Note for users upgrading from version 1.x
This is a new major version, and therefore comes with the following caveats:
- Breaking Changes: The Dataflow SDK 2.x for Java has a number of breaking changes from the 1.x series of releases.
- Update Incompatibility: The Dataflow SDK 2.x for Java is update-incompatible with Dataflow 1.x. Streaming jobs using a Dataflow 1.x SDK cannot be updated to use a Dataflow 2.x SDK. Dataflow 2.x pipelines may only be updated across versions starting with SDK version 2.0.0.
Updates and improvements since 2.0.0-beta3
Version 2.0.0 is based on a subset of Apache Beam 2.0.0. The most relevant changes in this release for Cloud Dataflow customers include:
- Added new API in BigQueryIO for writing into multiple tables, possibly with different schemas, based on data. See BigQueryIO.Write.to(SerializableFunction) and BigQueryIO.Write.to(DynamicDestinations).
- Added new API for writing windowed and unbounded collections to TextIO and AvroIO. For example, see TextIO.Write.withWindowedWrites() and TextIO.Write.withFilenamePolicy(FilenamePolicy).
- Added TFRecordIO to read and write TensorFlow TFRecord files.
- Added the ability to automatically register CoderProviders in the default CoderRegistry. CoderProviders are registered by a ServiceLoader via concrete implementations of CoderProviderRegistrar.
- Changed the order of parameters for ParDo with side inputs and outputs.
- Changed the order of parameters for the MapElements and FlatMapElements transforms when specifying an output type.
- Changed the pattern for reading and writing custom types with PubsubIO and KafkaIO.
- Changed the syntax for reading from and writing to TextIO, AvroIO, TFRecordIO, KinesisIO, and BigQueryIO.
- Changed the syntax for using the Window transform to configure windowing parameters other than the WindowFn itself.
- Consolidated XmlSource and XmlSink into XmlIO.
- Renamed CountingInput to GenerateSequence and unified the syntax for producing bounded and unbounded sequences.
- Renamed BoundedSource#splitIntoBundles to #split.
- Renamed UnboundedSource#generateInitialSplits to #split.
- Output from @StartBundle is no longer possible. Instead of accepting a parameter of type Context, this method may optionally accept an argument of type StartBundleContext to access PipelineOptions.
- Output from @FinishBundle now always requires an explicit timestamp and window. Instead of accepting a parameter of type Context, this method may optionally accept an argument of type FinishBundleContext to access PipelineOptions and emit output to specific windows.
- XmlIO is no longer part of the SDK core. It must be added manually using the new xml-io package.
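The new ParDo parameter order can be illustrated with a minimal sketch: the DoFn is passed to ParDo.of(...) first, and side inputs and additional outputs are attached afterwards with withSideInputs(...) and withOutputTags(...). This assumes Apache Beam 2.0.0 (the Dataflow SDK 2.x uses the same org.apache.beam packages) plus the direct runner on the classpath; the class name, the tags, and the "cutoff" side input are hypothetical and exist only for illustration.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

/** Hypothetical example class: splits words by a length cutoff supplied as a side input. */
public class ParDoSideInputsExample {
  // Anonymous subclasses keep the element type available for coder inference.
  static final TupleTag<String> LONG_WORDS = new TupleTag<String>() {};
  static final TupleTag<String> SHORT_WORDS = new TupleTag<String>() {};

  public static PipelineResult.State run() {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    PCollection<String> words = p.apply("Words", Create.of("a", "bb", "ccc", "dddd"));
    // Hypothetical cutoff value, delivered to the DoFn as a singleton side input.
    final PCollectionView<Integer> cutoff =
        p.apply("Cutoff", Create.of(3)).apply(View.<Integer>asSingleton());

    // 2.x order: ParDo.of(fn) first, then withSideInputs(...) and withOutputTags(...).
    PCollectionTuple results = words.apply(
        ParDo.of(new DoFn<String, String>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                String word = c.element();
                if (word.length() >= c.sideInput(cutoff)) {
                  c.output(word);              // main output (LONG_WORDS)
                } else {
                  c.output(SHORT_WORDS, word); // additional output
                }
              }
            })
            .withSideInputs(cutoff)
            .withOutputTags(LONG_WORDS, TupleTagList.of(SHORT_WORDS)));

    PAssert.that(results.get(LONG_WORDS)).containsInAnyOrder("ccc", "dddd");
    PAssert.that(results.get(SHORT_WORDS)).containsInAnyOrder("a", "bb");
    return p.run().waitUntilFinish();
  }
}
```

In 1.x the equivalent chain started from ParDo.withSideInputs(...) and ended with .of(fn); moving the DoFn to the front keeps the required argument first and the optional configuration after it.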
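For MapElements and FlatMapElements, the output type is now given first through into(...), followed by the function in via(...). A minimal sketch, assuming Apache Beam 2.0.0 and the direct runner on the classpath; the class name and the sample input are hypothetical.

```java
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

/** Hypothetical example class: splits lines into words, then maps words to their lengths. */
public class MapElementsExample {
  public static PipelineResult.State run() {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    PCollection<String> lines = p.apply(Create.of("a bb", "ccc"));

    // Output type first (into), then the function (via).
    PCollection<String> words = lines.apply(
        FlatMapElements.into(TypeDescriptors.strings())
            .via((String line) -> Arrays.asList(line.split(" "))));

    PCollection<Integer> lengths = words.apply(
        MapElements.into(TypeDescriptors.integers())
            .via((String word) -> word.length()));

    // "a", "bb", "ccc" have lengths 1, 2, 3.
    PAssert.that(lengths).containsInAnyOrder(1, 2, 3);
    return p.run().waitUntilFinish();
  }
}
```

Declaring the lambda parameter type explicitly lets Java infer the input type of via(...), while into(...) supplies the output TypeDescriptor that a lambda alone cannot provide for coder inference.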
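The renamed GenerateSequence transform uses one entry point for both kinds of sequence: adding to(...) makes it bounded, while leaving it out (optionally pacing output with withRate(...)) makes it unbounded. A minimal sketch, assuming Apache Beam 2.0.0 and the direct runner; only the bounded pipeline is executed, because counting an unbounded sequence globally would require windowing. The class and method names are hypothetical.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

/** Hypothetical example class contrasting bounded and unbounded GenerateSequence. */
public class GenerateSequenceExample {
  public static PipelineResult.State run() {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    // Bounded: the longs 0 through 9 (the upper bound passed to to() is exclusive).
    PCollection<Long> bounded = p.apply("Bounded", GenerateSequence.from(0).to(10));
    PAssert.thatSingleton(bounded.apply(Count.<Long>globally())).isEqualTo(10L);
    return p.run().waitUntilFinish();
  }

  /** Unbounded: no to(...), emitting one element per second. Constructed only, not run here. */
  static GenerateSequence unboundedPerSecond() {
    return GenerateSequence.from(0).withRate(1, Duration.standardSeconds(1));
  }
}
```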
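The new bundle lifecycle rules can be sketched with a DoFn that counts the elements in each bundle and emits that count from @FinishBundle, passing the explicit timestamp and window that 2.x now requires; @StartBundle only resets state and produces no output. This assumes Apache Beam 2.0.0 with the direct runner and that the input is in the global window; CountPerBundleFn is a hypothetical name. Because the runner chooses bundle boundaries, only the sum of the per-bundle counts is checked.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.apache.beam.sdk.values.PCollection;

/** Hypothetical example class illustrating @StartBundle and @FinishBundle in 2.x. */
public class BundleLifecycleExample {
  /** Emits one count per bundle; assumes the input is globally windowed. */
  static class CountPerBundleFn extends DoFn<String, Long> {
    private long count;

    @StartBundle
    public void startBundle() {
      // Output is not allowed here in 2.x; only reset per-bundle state.
      count = 0;
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
      count++;
    }

    @FinishBundle
    public void finishBundle(FinishBundleContext c) {
      // An explicit timestamp and window are required; the global window is assumed.
      c.output(count, GlobalWindow.INSTANCE.maxTimestamp(), GlobalWindow.INSTANCE);
    }
  }

  public static PipelineResult.State run() {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    PCollection<Long> perBundle =
        p.apply(Create.of("a", "b", "c")).apply(ParDo.of(new CountPerBundleFn()));
    // However the runner splits the input into bundles, the counts add up to 3.
    PAssert.thatSingleton(perBundle.apply(Sum.longsGlobally())).isEqualTo(3L);
    return p.run().waitUntilFinish();
  }
}
```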
More information
Please see the Cloud Dataflow documentation and release notes for version 2.0.