kongyew/greenplum-spark-connector

Getting started with Greenplum and Apache Spark in minutes

This page shows how to get started with Greenplum and Apache Spark. The examples cover the use cases listed below.

Greenplum - Spark Architecture:

Use Cases:

  1. Read data from Greenplum table into Spark DataFrame
  2. Write data from Spark DataFrame into Greenplum table
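The two use cases above can be sketched in PySpark. This is a minimal sketch and not the connector's own API: it swaps in Spark's built-in `jdbc` data source with the PostgreSQL JDBC driver, which works against Greenplum because Greenplum is PostgreSQL-compatible. The helper names (`greenplum_jdbc_options`, `read_table`, `write_table`) and the host, database, table, and credential values are hypothetical placeholders. The connector's own data source name and options are not shown on this page, so check its documentation before relying on them.

```python
# Sketch only: uses Spark's generic "jdbc" data source, NOT the
# Greenplum-Spark Connector's own data source. All connection values
# passed in by the caller are assumed placeholders.


def greenplum_jdbc_options(host, port, database, table, user, password):
    """Build the option map for Spark's built-in JDBC data source."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Greenplum speaks the PostgreSQL wire protocol, so the
        # PostgreSQL JDBC driver class is used here.
        "driver": "org.postgresql.Driver",
    }


def read_table(spark, options):
    """Use case 1: read a Greenplum table into a Spark DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


def write_table(df, options, mode="append"):
    """Use case 2: write a Spark DataFrame into a Greenplum table."""
    df.write.format("jdbc").options(**options).mode(mode).save()


# Example usage (requires pyspark and a reachable Greenplum master):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("gpdb-example").getOrCreate()
#   opts = greenplum_jdbc_options(
#       "gpdb-master", 5432, "testdb", "public.orders", "gpadmin", "secret"
#   )
#   orders = read_table(spark, opts)
#   write_table(orders, {**opts, "dbtable": "public.orders_copy"})
```

The PostgreSQL JDBC driver JAR has to be on the Spark classpath, for example through `spark-submit --jars`. A plain JDBC read goes through a single connection unless partitioning options are set, so it does not give the parallel transfer that the connector is designed for.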

Reference

Pivotal Greenplum

The Pivotal Greenplum Database (GPDB) is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer delivering high analytical query performance on large data volumes.

https://pivotal.io/pivotal-greenplum

Pivotal Greenplum-Spark Connector

The Pivotal Greenplum-Spark Connector provides high-speed, parallel data transfer between Greenplum Database and Apache Spark clusters.

Apache Spark

Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

http://spark.apache.org/

License

MIT