Setting up a PySpark 2.0 notebook with MLeap and Toree
Mikhail Semeniuk edited this page Nov 7, 2016 · 1 revision
We are going to assume you already have the following installed:
- Python 2.x
- Docker (required to install Toree)
virtualenv venv
source ./venv/bin/activate
pip install jupyter
Clone the master branch of Toree from its GitHub repo into your working directory.
For the next step, make sure that Docker is running.
cd incubator-toree
make release
cd dist/toree-pip
pip install toree-0.2.0.dev1.tar.gz
SPARK_HOME=<path to spark> jupyter toree install --interpreters=PySpark
The most error-proof way to add MLeap to your project is to modify the kernel config directly (or create a new kernel for Toree and Spark 2.0).
The kernel config file is typically located at /usr/local/share/jupyter/kernels/apache_toree_pyspark/kernel.json
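The exact location depends on the platform and on whether the kernel was installed system-wide, per-user, or into a virtualenv. As a rough sketch, the following searches a few common Jupyter data directories for the Toree PySpark kernel config. The directory list and the helper name `find_kernel_json` are assumptions made for illustration; running `jupyter kernelspec list` reports the locations actually in use on your machine.

```python
import os
import sys

# Common Jupyter data directories. This list is an assumption for
# illustration; your installation may use other locations.
CANDIDATE_ROOTS = [
    os.path.join(sys.prefix, "share", "jupyter", "kernels"),  # active virtualenv
    "/usr/local/share/jupyter/kernels",
    "/usr/share/jupyter/kernels",
    os.path.expanduser("~/.local/share/jupyter/kernels"),     # Linux, per-user
    os.path.expanduser("~/Library/Jupyter/kernels"),          # macOS, per-user
]


def find_kernel_json(kernel_name="apache_toree_pyspark", roots=None):
    """Return the path of the first kernel.json found for kernel_name, or None."""
    for root in (CANDIDATE_ROOTS if roots is None else roots):
        candidate = os.path.join(root, kernel_name, "kernel.json")
        if os.path.isfile(candidate):
            return candidate
    return None
```

Calling `find_kernel_json()` with no arguments checks the default directories in order and returns `None` if the kernel is not found in any of them.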
Go ahead and add or modify __TOREE_SPARK_OPTS__ and PYTHONPATH like so:
"__TOREE_SPARK_OPTS__": "--packages com.databricks:spark-avro_2.11:3.0.1,ml.combust.mleap:mleap-spark_2.11:0.4.0",
"PYTHONPATH": "/usr/local/spark-2.0.0-bin-hadoop2.7/python:/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip:<path to mleap>/python"
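Editing the file by hand works, but the same change can also be scripted. The sketch below loads kernel.json, sets the two entries, and writes the file back. It assumes the kernel config keeps its environment variables in a top-level `env` object, as in the standard Jupyter kernel spec layout. The helper names `build_pythonpath` and `add_mleap_env` are hypothetical, and the py4j zip is located with a glob because its version number differs between Spark releases.

```python
import glob
import json
import os

# Package coordinates copied from the kernel.json entries above.
SPARK_OPTS = (
    "--packages com.databricks:spark-avro_2.11:3.0.1,"
    "ml.combust.mleap:mleap-spark_2.11:0.4.0"
)


def build_pythonpath(spark_home, mleap_home):
    """Join Spark's Python dir, its bundled py4j zip, and MLeap's Python dir."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    py4j_zips = sorted(glob.glob(pattern))
    if not py4j_zips:
        raise IOError("no py4j-*-src.zip found under %s" % spark_home)
    return os.pathsep.join([
        os.path.join(spark_home, "python"),
        py4j_zips[-1],
        os.path.join(mleap_home, "python"),
    ])


def add_mleap_env(kernel_json_path, spark_home, mleap_home):
    """Set __TOREE_SPARK_OPTS__ and PYTHONPATH in kernel.json; return the env dict."""
    with open(kernel_json_path) as f:
        spec = json.load(f)
    # Assumption: environment variables live under the top-level "env" key.
    env = spec.setdefault("env", {})
    # Note: this overwrites any existing value of either variable.
    env["__TOREE_SPARK_OPTS__"] = SPARK_OPTS
    env["PYTHONPATH"] = build_pythonpath(spark_home, mleap_home)
    with open(kernel_json_path, "w") as f:
        json.dump(spec, f, indent=2)
    return env
```

Because the function overwrites any existing values for both variables, it is worth backing up kernel.json before running it, and you may need elevated permissions if the file lives under /usr/local.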
An alternative is to use the AddDeps magic, but we've run into dependency collisions with it, so use it at your own risk:
%AddDeps ml.combust.mleap mleap-spark_2.11 0.4.0 --transitive