More comments in notebooks and readme. Added presentation

python-sevilla · Dec 19, 2019 · 854b75f
1 parent 50a6353

Showing 10 changed files with 726 additions and 373 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,3 @@
-.idea
-.ipynb_checkpoints
-mlruns
+.idea
+.ipynb_checkpoints
+mlruns
diff --git a/LICENSE b/LICENSE
diff --git a/README.md b/README.md
@@ -1,2 +1,54 @@
-# pythonsevilla2019
-TODO
+# pythonsevilla2019
+This repo contains the notebooks, code and slides for the talk [*Introducción a MLFlow y Databricks: acelerando el Machine Learning Lifecycle*](https://www.meetup.com/Python-Sevilla/events/266430587/) (Introduction to MLFlow and Databricks: accelerating the Machine Learning Lifecycle), given at the [Python Sevilla Developers](https://www.meetup.com/Python-Sevilla) meetup group.
+
+Slides are available both on [Slideshare](https://www.slideshare.net/fortega86/pythonsevilla2019-mlflow-introduction) and [in this repository](<pythonsevilla2019 - Introducción a MLFlow.pdf>).
+
+## Installation and use
+1. ``pip install -r requirements.txt``
+2. ``jupyter notebook``
+
+## MLFlow Tracking demo
+Run ``jupyter notebook`` and execute the notebook ``tracking.ipynb``.
+
+Running the last section requires an Azure or AWS account with a deployed Databricks workspace.
+
+## MLFlow Projects demo
+The first example runs a public example project from the official MLFlow GitHub organization (https://github.com/mlflow/):
+
+``mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5``
+
+The second example runs our own project, located in ``./project_example``:
+
+```bash
+cd project_example
+mlflow run . -P n_estimators=500
+```
+
+## MLFlow Models demo
+Execute the notebook ``models.ipynb``.
+
+The second part of the demo consists of serving a trained model from the ``mlruns`` folder that MLFlow creates when tracking experiments.
+
+To do so, navigate to the model folder of the selected run and execute the ``mlflow models serve`` command:
+
+```bash
+cd mlruns/<selected experiment_id>/<selected run_id>/artifacts/model/
+mlflow models serve -m . -p 1234
+```
+
+After a while, a gunicorn + Flask microservice is running on port 1234. HTTP POST requests can be sent to it with tools such as Postman. The endpoint is:
+
+``localhost:1234/invocations``
+
+This is an example of a valid request body, sent with the ``Content-Type: application/json`` header:
+
+```json
+{
+	"data": [[ 0.52444161,  0.97309661,  0.43247518,  0.38717859, -1.03377319,
+       -0.73048166, -0.70972218, -0.41044243, -1.00047971, -0.82507126,
+       -0.08818832, -0.04623819, -0.18209319, -0.0038316 , -1.04758402,
+       -0.93257644, -0.65865037, -0.69601737, -0.71241416, -0.25530814,
+        0.58767599,  1.36061943,  0.48167379,  0.44795641, -0.62887522,
+       -0.64418546, -0.62375274, -0.23693879,  0.08147618,  0.05512114]]
+}
+```
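As an alternative to Postman, the same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started with `mlflow models serve` is listening on `localhost:1234` and accepts a `Content-Type: application/json` body in the `{"data": [[...]]}` shape shown above. `build_payload` and `score` are hypothetical helper names, not part of MLFlow.

```python
import json
import urllib.request


def build_payload(rows):
    """Wrap feature rows in the {"data": [[...]]} body used in the README example."""
    # Assumption: each row holds the 30 scaled breast-cancer features, in training order.
    return json.dumps({"data": [[float(value) for value in row] for row in rows]})


def score(rows, url="http://localhost:1234/invocations"):
    """POST rows to a running model server and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, calling `score([row])`, where `row` is the 30-value list from the JSON example, returns whatever the model server answers for that row.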
diff --git a/models.ipynb b/models.ipynb
@@ -0,0 +1,134 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
    "# Introduction to MLFlow and Databricks: accelerating the Machine Learning Lifecycle - Python Sevilla 2019"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## MLFlow Models\n",
    "This section shows how to load and use a model that was trained and logged in the MLFlow tracking experiments."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load a trained model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import mlflow\n",
+    "import mlflow.sklearn\n",
+    "import numpy as np\n",
+    "from sklearn.datasets import load_breast_cancer\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "import warnings\n",
+    "warnings.filterwarnings('ignore')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
    "After deciding which model to use, paste its experiment_id and run_id here."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "experiment_id = \"...\"\n",
+    "run_id = \"...\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cancer = load_breast_cancer()\n",
+    "X = np.array(cancer.data)\n",
+    "y = np.array(cancer.target)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
    "# Train/test split and feature scaling (same settings as project_example/train.py)\n",
+    "from sklearn.preprocessing import StandardScaler\n",
+    "x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=426, test_size=143, random_state=0)\n",
+    "sc = StandardScaler()\n",
+    "x_train = sc.fit_transform(x_train)\n",
+    "x_test = sc.transform(x_test)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
    "sel = np.random.randint(len(x_test))\n",
+    "x_sel = x_test[sel]\n",
+    "y_sel = y_test[sel]\n",
+    "print(y_sel)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model = mlflow.sklearn.load_model(f'./mlruns/{experiment_id}/{run_id}/artifacts/model')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model.predict([x_sel])"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
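The notebook asks for an `experiment_id` and a `run_id`. To find candidates, a small helper can scan the local `mlruns` folder. This is a minimal sketch assuming the local file-store layout used in the README (`mlruns/<experiment_id>/<run_id>/artifacts/model/`) and that every logged model directory contains an `MLmodel` file; `list_logged_models` is a hypothetical helper, not an MLFlow API. The MLFlow tracking UI (`mlflow ui`) shows the same IDs.

```python
from pathlib import Path


def list_logged_models(mlruns="mlruns"):
    """Return (experiment_id, run_id) pairs for runs that logged a model at artifacts/model.

    Assumed layout: <mlruns>/<experiment_id>/<run_id>/artifacts/model/MLmodel
    """
    pairs = []
    for mlmodel in sorted(Path(mlruns).glob("*/*/artifacts/model/MLmodel")):
        run_dir = mlmodel.parents[2]  # .../<experiment_id>/<run_id>
        pairs.append((run_dir.parent.name, run_dir.name))
    return pairs
```

Running `list_logged_models()` from the repository root after some runs have been tracked returns pairs whose values can be pasted into the `experiment_id` and `run_id` cell of the notebook.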
diff --git a/project_example/MLProject b/project_example/MLProject
@@ -0,0 +1,11 @@
+name: tutorial
+
+conda_env: conda.yaml
+
+entry_points:
+  main:
+    parameters:
+      n_estimators: {type: int, default: 100}
+      max_depth: {type: int, default: 2}
+      criterion: {type: str, default: "gini"}
+    command: "python train.py {n_estimators} {max_depth} {criterion}"
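The `command` above is a template filled with the entry-point parameters: running `mlflow run . -P n_estimators=500`, as in the README, overrides the default of 100, while `max_depth` and `criterion` keep their defaults. The sketch below is a simplified model of that substitution, not MLFlow's implementation; `render_command` and the dictionary layout are hypothetical, and the sketch only accepts parameters declared in the entry point.

```python
def render_command(template, params, overrides):
    """Fill an MLProject-style command template (simplified, hypothetical model).

    Declared defaults are overridden by -P style string values; "int" parameters
    are converted with int() before formatting.
    """
    values = {name: spec["default"] for name, spec in params.items()}
    for name, raw in overrides.items():
        if name not in params:
            raise KeyError(f"undeclared parameter: {name}")
        values[name] = int(raw) if params[name]["type"] == "int" else raw
    return template.format(**values)


# Parameters and command copied from project_example/MLProject
PARAMS = {
    "n_estimators": {"type": "int", "default": 100},
    "max_depth": {"type": "int", "default": 2},
    "criterion": {"type": "str", "default": "gini"},
}
TEMPLATE = "python train.py {n_estimators} {max_depth} {criterion}"

print(render_command(TEMPLATE, PARAMS, {"n_estimators": "500"}))
# python train.py 500 2 gini
```

This is why `train.py` reads its three arguments positionally from `sys.argv`: the order in the template fixes the order of the arguments.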
diff --git a/project_example/conda.yaml b/project_example/conda.yaml
@@ -0,0 +1,8 @@
+name: breast-rf
+channels:
+  - defaults
+dependencies:
+  - numpy=1.14.3
+  - pip:
+    - mlflow==1.3.0
+    - scikit-learn==0.22
diff --git a/project_example/train.py b/project_example/train.py
@@ -0,0 +1,58 @@
+import mlflow
+import mlflow.sklearn
+import numpy as np
+from sklearn.datasets import load_breast_cancer
+from sklearn.model_selection import train_test_split
+from sklearn.preprocessing import StandardScaler
+import sys
+import warnings
+warnings.filterwarnings('ignore')
+
+# Function to validate a model
+def validate_model(model, x_test, y_test):
+    y_pred = model.predict(x_test)
+    y_pred = (y_pred > 0.5)
+    from sklearn.metrics import confusion_matrix
+    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
+    precision = tp / (tp + fp)
+    recall = tp / (tp + fn)
+    accuracy = (tp + tn) / (tp + fp + tn + fn)
+
+    return precision, recall, accuracy
+
+
+def breast_cancer_rf(n_estimators=100, max_depth=2, criterion="gini"):
+    from sklearn.ensemble import RandomForestClassifier
+    with mlflow.start_run() as run:
+        clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, criterion=criterion)
+        mlflow.log_param("n_estimators", n_estimators)
+        mlflow.log_param("max_depth", max_depth)
+        mlflow.log_param("criterion", criterion)
+        mlflow.set_tag("model type", "sklearn - RandomForest")
+        clf.fit(x_train, y_train)
+        precision, recall, accuracy = validate_model(clf, x_test, y_test)
+        mlflow.log_metric("precision", precision)
+        mlflow.log_metric("recall", recall)
+        mlflow.log_metric("accuracy", accuracy)
+        mlflow.sklearn.log_model(clf, "model")
+        print("Model saved in run %s" % run.info.run_id)
+
+
+if __name__ == "__main__":
+    args = sys.argv[1:]
+    n_estimators = int(args[0])
+    max_depth = int(args[1])
+    criterion = args[2]
+
+    cancer = load_breast_cancer()
+    X = np.array(cancer.data)
+    y = np.array(cancer.target)
+
+    # Feature scaling
+    x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=426, test_size=143, random_state=0)
+    sc = StandardScaler()
+    x_train = sc.fit_transform(x_train)
+    x_test = sc.transform(x_test)
+
+    breast_cancer_rf(n_estimators=n_estimators, max_depth=max_depth, criterion=criterion)
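`validate_model` derives the three logged metrics from the confusion matrix: precision = TP / (TP + FP), recall = TP / (TP + FN) and accuracy = (TP + TN) / (TP + FP + TN + FN). In `load_breast_cancer` the label 1 means benign, so these metrics treat benign as the positive class. Below is a dependency-free sketch of the same formulas; `binary_metrics` is a hypothetical name, and the zero-division guard (returning 0.0) is an assumption that the original function does not include.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall and accuracy for 0/1 labels, using the formulas of validate_model."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against empty denominators, e.g. a model that never predicts class 1.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# TP=2, FN=1, TN=1, FP=1 -> precision 2/3, recall 2/3, accuracy 0.6
print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```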
diff --git a/pythonsevilla2019 - Introducción a MLFlow.pdf b/pythonsevilla2019 - Introducción a MLFlow.pdf
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,6 @@
+Keras==2.3.1
+mlflow==1.4.0
+numpy==1.17.4
+pandas==0.25.3
+scikit-learn==0.22
+tensorflow==2.0.0