This page provides how to get started with Greenplum and Apache Spark. You can use these examples to apply these use cases.
- Read data from Greenplum table into Spark DataFrame
- Write data from Spark DataFrame into Greenplum table
The Pivotal Greenplum Database (GPDB) is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer delivering high analytical query performance on large data volumes.
https://pivotal.io/pivotal-greenplum
The Pivotal Greenplum-Spark Connector provides high speed, parallel data transfer between Greenplum Database and Apache Spark clusters to support:
Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing. http://spark.apache.org/
MIT