This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Version 2.0.0

@davorbonaci davorbonaci released this 31 May 05:56
· 53 commits to master since this release
v2.0.0

The Dataflow SDK for Java 2.0.0 is the first stable 2.x release of the Dataflow SDK for Java, based on a subset of Apache Beam 2.0.0. See the Apache Beam 2.0.0 release notes for additional information about these changes.

Note for users upgrading from version 1.x

This is a new major version, and therefore comes with the following caveats:

  • Breaking Changes: The Dataflow SDK 2.x for Java has a number of breaking changes from the 1.x series of releases.
  • Update Incompatibility: The Dataflow SDK 2.x for Java is update-incompatible with Dataflow 1.x. Streaming jobs using a Dataflow 1.x SDK cannot be updated to use a Dataflow 2.x SDK. Dataflow 2.x pipelines may only be updated across versions starting with SDK version 2.0.0.

Updates and improvements since 2.0.0-beta3

Version 2.0.0 is based on a subset of Apache Beam 2.0.0. The most relevant changes in this release for Cloud Dataflow customers include:

  • Added new API in BigQueryIO for writing into multiple tables, possibly with different schemas, based on data. See BigQueryIO.Write.to(SerializableFunction) and BigQueryIO.Write.to(DynamicDestinations).
  • Added new API for writing windowed and unbounded collections to TextIO and AvroIO. For example, see TextIO.Write.withWindowedWrites() and TextIO.Write.withFilenamePolicy(FilenamePolicy).
  • Added TFRecordIO to read and write TensorFlow TFRecord files.
  • Added the ability to automatically register CoderProviders in the default CoderRegistry. CoderProviders are registered by a ServiceLoader via concrete implementations of a CoderProviderRegistrar.
  • Changed order of parameters for ParDo with side inputs and outputs.
  • Changed order of parameters for MapElements and FlatMapElements transforms when specifying an output type.
  • Changed the pattern for reading and writing custom types to PubsubIO and KafkaIO.
  • Changed the syntax for reading from and writing to TextIO, AvroIO, TFRecordIO, KinesisIO, and BigQueryIO.
  • Changed syntax for configuring windowing parameters other than the WindowFn itself using the Window transform.
  • Consolidated XmlSource and XmlSink into XmlIO.
  • Renamed CountingInput to GenerateSequence and unified the syntax for producing bounded and unbounded sequences.
  • Renamed BoundedSource#splitIntoBundles to #split.
  • Renamed UnboundedSource#generateInitialSplits to #split.
  • Output from @StartBundle is no longer possible. Instead of accepting a parameter of type Context, this method may optionally accept an argument of type StartBundleContext to access PipelineOptions.
  • Output from @FinishBundle now always requires an explicit timestamp and window. Instead of accepting a parameter of type Context, this method may optionally accept an argument of type FinishBundleContext to access PipelineOptions and emit output to specific windows.
  • XmlIO is no longer part of the SDK core. It must be added manually using the new xml-io package.
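
Several of the renamed and restructured APIs above appear together in the short sketch below. This is an illustrative example of the 2.0.0-style syntax, assuming a Beam/Dataflow 2.0.0 dependency on the classpath; the output path is a placeholder, not a real bucket:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class SequenceToText {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply(GenerateSequence.from(0).to(100))        // formerly CountingInput
     .apply(MapElements                              // output type is now specified first, via into(...)
         .into(TypeDescriptors.strings())
         .via((Long n) -> Long.toString(n)))
     .apply(TextIO.write()                           // 2.0.0-style TextIO write syntax
         .to("gs://my-bucket/output"));              // placeholder path

    p.run().waitUntilFinish();
  }
}
```

For windowed or unbounded collections, the same `TextIO.write()` would additionally take `withWindowedWrites()` and a `FilenamePolicy`, per the new windowed-write API noted above.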

More information

Please see the Cloud Dataflow documentation and the release notes for version 2.0.