This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Releases: GoogleCloudPlatform/DataflowJavaSDK

Version 0.4.20150727

28 Jul 17:18
Pre-release
  • Removed the requirement to explicitly set --project if a default project is configured in the Google Cloud SDK.
  • Added support for creating BigQuery sources from a query.
  • Added support for custom unbounded sources in the DirectPipelineRunner and DataflowPipelineRunner. See UnboundedSource for details.
  • Removed unnecessary ExecutionContext argument in BoundedSource.createReader and related methods.
  • Changed BoundedReader.splitAtFraction to require thread-safety (i.e. it must be safe to call asynchronously with advance or start). Added RangeTracker to help implement thread-safe readers. Users are strongly encouraged to use RangeTracker rather than implementing an ad-hoc solution.
  • Improved the performance of Combine transforms by lifting them into (and above) the GroupByKey.
  • Modified triggers such that after a GroupByKey, the system will switch to a "Continuation Trigger", which attempts to preserve the original intention regarding handling of speculative and late triggerings instead of returning to the default trigger.
  • Added WindowFn.getOutputTimestamp and changed GroupByKey behavior so that incomplete overlapping windows do not hold up progress of earlier, completed windows.
  • Changed triggering behavior so that empty panes are produced if they are the first pane after the watermark (ON_TIME) or the final pane.
  • Removed the Window.Trigger intermediate builder class.
  • Added validation that allowed lateness is specified on the Window PTransform when a trigger is specified.
  • Re-enabled verification of GroupByKey usage. Specifically, the key must have a deterministic coder and using GroupByKey with an unbounded PCollection requires windowing or triggers.
  • Changed PTransform names so that they may no longer contain the = or ; characters.
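
The thread-safety requirement on splitAtFraction can be illustrated with a small, self-contained sketch. The class below is hypothetical and is not the SDK's RangeTracker; its names (OffsetRangeSketch, tryClaim, trySplitAtFraction) are assumptions made for illustration. It only shows the general idea of guarding the range with one lock so that a split request arriving on another thread cannot race with the reader claiming the next record.

```java
/**
 * Hypothetical, simplified offset tracker (NOT the SDK's RangeTracker).
 * All state changes go through synchronized methods, so the reading thread
 * and a thread requesting a split always see a consistent range.
 */
final class OffsetRangeSketch {
    private final long start;
    private long stop;        // exclusive; shrinks when a split succeeds
    private long lastClaimed; // start - 1 means nothing has been claimed yet

    OffsetRangeSketch(long start, long stop) {
        if (start < 0 || stop < start) {
            throw new IllegalArgumentException("invalid range");
        }
        this.start = start;
        this.stop = stop;
        this.lastClaimed = start - 1;
    }

    /** Called by the reading thread before it returns the record at offset. */
    synchronized boolean tryClaim(long offset) {
        if (offset <= lastClaimed) {
            throw new IllegalStateException("offsets must strictly increase");
        }
        if (offset >= stop) {
            return false; // the record belongs to a range that was split off
        }
        lastClaimed = offset;
        return true;
    }

    /** Callable from any thread. Returns the new stop offset, or -1 if refused. */
    synchronized long trySplitAtFraction(double fraction) {
        if (fraction <= 0.0 || fraction >= 1.0) {
            return -1;
        }
        long splitOffset = start + (long) Math.ceil(fraction * (stop - start));
        // Refuse splits that would cut off records that were already claimed.
        if (splitOffset <= lastClaimed || splitOffset <= start || splitOffset >= stop) {
            return -1;
        }
        stop = splitOffset;
        return splitOffset;
    }

    synchronized long getStop() {
        return stop;
    }
}
```

In this sketch a reader would call tryClaim before emitting each record and stop reading once it returns false, while the split request only ever shrinks the unclaimed tail of the range.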

Version 0.4.20150710

16 Jul 20:39
Pre-release
  • Added support for per-window tables to BigQueryIO.
  • Added support for a custom source implementation for Avro. See AvroSource for more details.
  • Removed 250GiB Google Cloud Storage file size upload restriction.
  • Fixed BigQueryIO.Write table creation bug in streaming mode.
  • Changed Source.createReader() and BoundedSource.createReader() to be abstract.
  • Moved Source.splitIntoBundles() to BoundedSource.splitIntoBundles().
  • Added support for reading bounded views of a PubSub stream in PubsubIO for non-streaming Dataflow pipeline runners and DirectPipelineRunner.
  • Added support for getting a Coder using a Class to the CoderRegistry.
  • Changed CoderRegistry.registerCoder(Class<T>, Coder<T>) to enforce that the provided coder actually encodes values of the given class. Using it with raw types of generic classes is now forbidden, as that rarely works correctly.
  • Use Create.withCoder() and CreateTimestamped.withCoder() instead of calling setCoder() on the output PCollection when applying the Create PTransform.
  • Added three successively more detailed WordCount examples.
  • Removed PTransform.getDefaultName() which was redundant with PTransform.getKindString().
  • Added a check during job creation that PTransform names are unique.
  • Removed PTransform.withName() and PTransform.setName(). The name of a transform is now immutable after construction. Library transforms (like Combine) can provide builder-like methods to change the name. Names can always be overridden at the location where the transform is applied using apply("name", transform).
  • Added the ability to select the network for worker VMs using DataflowPipelineWorkerPoolOptions.setNetwork(String).
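
The naming changes in this release (immutable transform names, overriding a name at the application site, and a uniqueness check at job creation) can be sketched without the SDK. Everything below is a hypothetical model: StepSketch, PipelineSketch, and createJob are illustrative names, not SDK classes or methods, and the real check may behave differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a transform: the name is fixed at construction. */
final class StepSketch {
    private final String name;

    StepSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

/** Hypothetical pipeline that records the name used at each application site. */
final class PipelineSketch {
    private final List<String> appliedNames = new ArrayList<>();

    /** Applies the step under its own, immutable name. */
    void apply(StepSketch step) {
        apply(step.getName(), step);
    }

    /** Applies the step under a name chosen at the application site. */
    void apply(String nameOverride, StepSketch step) {
        appliedNames.add(nameOverride);
    }

    /** Returns the applied names, or throws if any name was used twice. */
    List<String> createJob() {
        Set<String> seen = new HashSet<>();
        for (String name : appliedNames) {
            if (!seen.add(name)) {
                throw new IllegalStateException("Duplicate transform name: " + name);
            }
        }
        return new ArrayList<>(appliedNames);
    }
}
```

Under this model, applying the same step twice fails at job creation unless one of the applications supplies a distinct name, which mirrors the apply("name", transform) form mentioned above.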

Version 0.4.20150602

02 Jun 19:23
Pre-release
  • Added a dependency on the gcloud core component version 2015.02.05 or newer. Update to the latest version of gcloud by running gcloud components update. See Application Default Credentials for more details on how credentials can be specified.
  • Removed previously deprecated Flatten.create(). Use Flatten.pCollections() instead.
  • Removed previously deprecated Coder.isDeterministic(). Implement Coder.verifyDeterministic() instead.
  • Replaced DoFn.Context#createAggregator with DoFn#createAggregator.
  • Added support for querying the current value of an Aggregator. See PipelineResult for more information.
  • Added experimental DoFnWithContext to simplify accessing additional information from a DoFn.
  • Removed experimental RequiresKeyedState.
  • Added CannotProvideCoderException to indicate inability to infer a coder, instead of returning null in such cases.
  • Added CoderProperties for assembling test suites for user-defined coders.
  • Replaced a constructor of PDone with a static factory PDone.in(Pipeline).
  • Updated the string formatting of TIMESTAMP values returned by the BigQuery source when using DirectPipelineRunner or when BigQuery data is used as a side input, so that it matches the formatting used when BigQuery data is a main input.
  • Added a requirement that the value returned by Source.Reader.getCurrent() must be immutable and remain valid indefinitely.
  • Replaced some usage of Source with BoundedSource. For example, Read.from() transform can now only be applied to BoundedSource objects.
  • Moved experimental handling of late data (data that arrives in the streaming pipeline after the watermark has passed it) from PubsubIO to Window. By default, late data is dropped at the first GroupByKey following a Read operation. To allow late data through, use Window.Bound#withAllowedLateness.
  • Added experimental support for accumulating elements within a window across panes.
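
The late-data behavior described in this release can be pictured with a deliberately simplified rule. The sketch below is an assumption-laden model, not the SDK's logic: it compares element timestamps directly against the watermark and ignores windows entirely, and the names LatenessSketch, isLate, and isDropped are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of late-data dropping; windows are ignored for brevity. */
final class LatenessSketch {
    private LatenessSketch() {}

    /** An element is late if the watermark has already passed its timestamp. */
    static boolean isLate(Instant elementTimestamp, Instant watermark) {
        return elementTimestamp.isBefore(watermark);
    }

    /**
     * Assumed rule: a late element is dropped once the watermark is more than
     * allowedLateness past its timestamp. Zero lateness drops all late data.
     */
    static boolean isDropped(Instant elementTimestamp, Instant watermark,
                             Duration allowedLateness) {
        return elementTimestamp.plus(allowedLateness).isBefore(watermark);
    }
}
```

With a zero allowed lateness every late element is dropped in this model, which corresponds to the default described above; a positive duration lets elements through while they are within that bound of the watermark.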

Version 0.4.20150414

16 Apr 12:10
Pre-release
  • Initial Beta release of the Dataflow SDK for Java.
  • Improved execution performance in many areas of the system.
  • Added support for progress estimation and dynamic work rebalancing for user-defined sources.
  • Added support for user-defined sources to provide the timestamp of the values read via Reader.getCurrentTimestamp().
  • Added support for user-defined sinks.
  • Added support for custom types in PubsubIO.
  • Added support for reading and writing XML files. See XmlSource and XmlSink.
  • Renamed DatastoreIO.Write.to to DatastoreIO.writeTo. In addition, entities written to Cloud Datastore must have complete keys.
  • Renamed the ReadSource transform to Read.
  • Replaced Source.createBasicReader with Source.createReader.
  • Added support for triggers, which allows getting early or partial results for a window, and specifying when to process late data. See Window.into.triggering.
  • Reduced visibility of PTransform's getInput(), getOutput(), getPipeline(), and getCoderRegistry(). These methods will soon be deleted.
  • Renamed DoFn.ProcessContext#windows to DoFn.ProcessContext#window. In order for a DoFn to call DoFn.ProcessContext#window, it must implement RequiresWindowAccess.
  • Added DoFn.ProcessContext#windowingInternals to enable windowing on third-party runners.
  • Added support for side inputs when running streaming pipelines on the [Blocking]DataflowPipelineRunner.
  • Changed [Keyed]CombineFn.addInput() to return the new accumulator value. Renamed Combine.perElement().withHotKeys() to Combine.perElement().withHotKeyFanout().
  • Renamed First.of to Sample.any and RateLimiting to IntraBundleParallelization to better represent its functionality.
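
The change to addInput() (returning the new accumulator value) can be illustrated with a stand-alone sketch. The interface below is a hypothetical mirror of the combine-function shape, not the SDK's CombineFn; the type and class names are assumptions. One consequence of the returned value is that immutable accumulator types such as Long become practical, since addInput no longer has to mutate its argument.

```java
/** Hypothetical interface mirroring the shape of a combine function. */
interface CombineFnSketch<InputT, AccumT, OutputT> {
    AccumT createAccumulator();

    /** Returns the accumulator to use from now on; callers must keep the result. */
    AccumT addInput(AccumT accumulator, InputT input);

    AccumT mergeAccumulators(Iterable<AccumT> accumulators);

    OutputT extractOutput(AccumT accumulator);
}

/** Sums integers using an immutable Long accumulator. */
final class SumFnSketch implements CombineFnSketch<Integer, Long, Long> {
    @Override
    public Long createAccumulator() {
        return 0L;
    }

    @Override
    public Long addInput(Long accumulator, Integer input) {
        // Long is immutable, so the updated value has to be returned.
        return accumulator + input;
    }

    @Override
    public Long mergeAccumulators(Iterable<Long> accumulators) {
        long total = 0L;
        for (Long partial : accumulators) {
            total += partial;
        }
        return total;
    }

    @Override
    public Long extractOutput(Long accumulator) {
        return accumulator;
    }
}
```

A caller following this contract would write acc = fn.addInput(acc, value) in its loop, rather than discarding the return value.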

Version 0.3.20150326

27 Mar 16:11
Pre-release
  • Added support for accessing PipelineOptions in the Dataflow worker.
  • Removed one of the type parameters in PCollectionView, which may require simple changes to user code that uses PCollectionView.
  • Changed side input API to apply per window. Calls to sideInput() now return values only in the specific window corresponding to the window of the main input element, and not the whole side input PCollectionView. Consequently, sideInput() can no longer be called from startBundle and finishBundle of a DoFn.
  • Added support for viewing a PCollection as a Map when used as a side input. See View.asMap().
  • Renamed the custom source API to use the term "bundle" instead of "shard" in all names. Additionally, the term "fork" is replaced with "dynamic split".
  • The custom source Reader now requires implementing a new method, start(). Existing code can be fixed by adding a start() method that simply calls advance() and returns its value. Additionally, code that uses the Reader should be updated to call both start() and advance(), instead of advance() only.
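
The start()/advance() migration described in the last item can be sketched as follows. The interface and classes are hypothetical stand-ins (ReaderSketch, ListReaderSketch, ReaderLoopDemo), not the SDK's Reader; they only show a start() that delegates to advance() and a consuming loop that calls start() once and advance() afterwards.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical reader interface with the start()/advance() split. */
interface ReaderSketch<T> {
    boolean start();   // moves to the first record; false if there is none
    boolean advance(); // moves to the next record; false if there is none
    T getCurrent();
}

/** Reads from an in-memory list; start() simply delegates to advance(). */
final class ListReaderSketch<T> implements ReaderSketch<T> {
    private final Iterator<T> iterator;
    private T current;
    private boolean hasCurrent;

    ListReaderSketch(List<T> records) {
        this.iterator = records.iterator();
    }

    @Override
    public boolean start() {
        return advance(); // the minimal fix suggested for existing readers
    }

    @Override
    public boolean advance() {
        hasCurrent = iterator.hasNext();
        if (hasCurrent) {
            current = iterator.next();
        }
        return hasCurrent;
    }

    @Override
    public T getCurrent() {
        if (!hasCurrent) {
            throw new NoSuchElementException("no current record");
        }
        return current;
    }
}

final class ReaderLoopDemo {
    private ReaderLoopDemo() {}

    /** Consumer side: call start() once, then advance() for every later record. */
    static <T> List<T> readAll(ReaderSketch<T> reader) {
        List<T> out = new ArrayList<>();
        for (boolean more = reader.start(); more; more = reader.advance()) {
            out.add(reader.getCurrent());
        }
        return out;
    }
}
```

The for loop keeps the two calls in one place, so a reader whose start() does extra setup work is handled the same way as one that only delegates.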

Version 0.3.20150227

02 Mar 23:47
Pre-release
  • Initial Alpha version of the Dataflow SDK for Java with support for streaming pipelines.
  • Added determinism checker in AvroCoder to make it easier to interoperate with GroupByKey.
  • Added support for accessing PipelineOptions in the worker.
  • Added support for compressed sources.

Version 0.3.20150211

12 Feb 00:13
Pre-release
  • Removed the dependency on the gcloud core component version 2015.02.05 or newer.

Version 0.3.20150210

11 Feb 01:00
Pre-release

Caution: depends on the gcloud core component version 2015.02.05 or newer.

  • Included streaming pipeline runner, which, for now, requires additional whitelisting.
  • Renamed several windowing-related APIs in a non-backward-compatible way.
  • Added support for custom sources, which you can use to read from your own input formats.
  • Introduced worker parallelism: one task per processor.

Version 0.3.20141216

12 Jan 22:16
Pre-release
  • Initial Alpha version of the Dataflow SDK for Java.

Version 0.3.20150109

12 Jan 21:49
Pre-release
  • Fixed several platform-specific issues for Microsoft Windows.
  • Fixed several Java 8-specific issues.
  • Added a few new examples.