Skip to content

Commit

Permalink
[Docs] RayDP Documentation (#14018)
Browse files Browse the repository at this point in the history
* .

* done?

* Docs

* Docs

* Update raydp.rst

* Update raydp.rst

Co-authored-by: Alex Wu <[email protected]>
  • Loading branch information
Alex Wu and Alex Wu committed Feb 11, 2021
1 parent 1b1a249 commit 1e800ab
Show file tree
Hide file tree
Showing 2 changed files with 105 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -304,6 +304,7 @@ Papers
xgboost-ray.rst
dask-on-ray.rst
mars-on-ray.rst
raydp.rst
ray-client.rst
.. toctree::
Expand Down
104 changes: 104 additions & 0 deletions doc/source/raydp.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
********************
RayDP (Spark on Ray)
********************

RayDP combines your Spark and Ray clusters, making it easy to do large scale
data processing using the PySpark API and seemlessly use that data to train
your models using TensorFlow and PyTorch.

For more information and examples, see the RayDP Github page:
https://github.com/oap_project/raydp

================
Installing RayDP
================

RayDP can be installed from PyPI and supports PySpark 3.0 and 3.1.

.. code-block bash
pip install raydp
.. note::
RayDP requires ray >= 1.2.0

.. note::
In order to run Spark, the head and worker nodes will need Java installed.

========================
Creating a Spark Session
========================

To create a spark session, call ``raydp.init_spark``

For example,

.. code-block:: python
import raydp
spark = raydp.init_spark(
app_name = "example",
num_executors = 10,
executor_cores = 64,
memory_per_executor = "256GB"
)
====================================
Deep Learning with a Spark DataFrame
====================================

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Training a Spark DataFrame with TensorFlow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``raydp.tf.TFEstimator`` provides an API for training with TensorFlow.

.. code-block:: python
d = [{'age': 17 , 'grade': 12}]
df = spark.createDataFrame(d).collect()
from tensorflow import keras
model = keras.Sequential([])
estimator = raydp.tf.TFEstimator(
model = model,
num_worker = 10,
feature_columns = ["age"],
label_column = ["grade"]
)
estimator.fit_on_spark(df, test_df=None)
tensorflow_model = estimator.get_model()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Training a Spark DataFrame with TensorFlow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Similarly, ``raydp.torch.TorchEstimator`` provides an API for training with
PyTorch.

.. code-block:: python
d = [{'age': 17 , 'grade': 12}]
df = spark.createDataFrame(d).collect()
import torch
model = torch.nn.Sequential()
estimator = raydp.tf.TFEstimator(
model = model,
num_worker = 10,
feature_columns = ["age"],
label_column = ["grade"]
)
estimator.fit_on_spark(df, test_df=None)
pytorch_model = estimator.get_model()

0 comments on commit 1e800ab

Please sign in to comment.