More comments in notebooks and readme. Added presentation
FernanOrtega committed Dec 19, 2019
1 parent 50a6353 commit 854b75f
Showing 10 changed files with 726 additions and 373 deletions.
6 changes: 3 additions & 3 deletions .gitignore
@@ -1,3 +1,3 @@
.idea
.ipynb_checkpoints
mlruns
.idea
.ipynb_checkpoints
mlruns
402 changes: 201 additions & 201 deletions LICENSE

Large diffs are not rendered by default.

56 changes: 54 additions & 2 deletions README.md
@@ -1,2 +1,54 @@
# pythonsevilla2019
TODO
# pythonsevilla2019
This repo contains the notebooks, code and slides for the talk [*Introducción a MLFlow y Databricks: acelerando el Machine Learning Lifecycle*](https://www.meetup.com/Python-Sevilla/events/266430587/), given at the [Python Sevilla Developers](https://www.meetup.com/Python-Sevilla) meetup group.

The slides are available both on [Slideshare](https://www.slideshare.net/fortega86/pythonsevilla2019-mlflow-introduction) and [in this repository](pythonsevilla2019 - Introducción a MLFlow.pdf).

## Installation and use
1. ``pip install -r requirements.txt``
2. ``jupyter notebook``

## MLFlow Tracking demo
Run ``jupyter notebook`` and execute the notebook ``tracking.ipynb``.

Executing the last section requires an Azure or AWS account with a deployed Databricks resource.

## MLFlow Projects demo
The first example runs a public project hosted under the official MLFlow organization (https://github.com/mlflow/):

``mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5``

The second example runs our own project (./project_example):

```bash
cd project_example
mlflow run . -P n_estimators=500
```
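
The ``-P`` flag overrides a parameter declared in the project's ``MLProject`` file; any parameter that is not overridden keeps its default. The sketch below illustrates how the entry point command could be assembled from the defaults and the overrides. ``render_command`` is a hypothetical helper written for this illustration, not MLFlow's actual implementation, which also handles validation and undeclared parameters.

```python
import shlex

# Defaults and command template copied from project_example/MLProject.
DEFAULTS = {"n_estimators": 100, "max_depth": 2, "criterion": "gini"}
COMMAND_TEMPLATE = "python train.py {n_estimators} {max_depth} {criterion}"


def render_command(overrides):
    """Merge -P style overrides into the defaults and fill the command template.

    Hypothetical, simplified helper: it only covers declared parameters.
    """
    params = {**DEFAULTS, **overrides}
    # Quote every value so it is safe to place on a shell command line.
    quoted = {name: shlex.quote(str(value)) for name, value in params.items()}
    return COMMAND_TEMPLATE.format(**quoted)


if __name__ == "__main__":
    # Roughly what `mlflow run . -P n_estimators=500` ends up executing.
    print(render_command({"n_estimators": 500}))
```

With the override above, the rendered command is ``python train.py 500 2 gini``, which matches the three positional arguments that ``train.py`` reads from ``sys.argv``.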

## MLFlow Models demo
Execute the notebook ``models.ipynb``.

The second part of the demo consists of deploying a trained model from the ``mlruns`` folder that MLFlow creates after tracking experiments.
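
To see which runs actually contain a saved model, a small script can scan the ``mlruns`` folder. This is a minimal sketch that assumes the local layout ``mlruns/<experiment_id>/<run_id>/artifacts/model/`` and the presence of an ``MLmodel`` file inside each saved model folder; ``find_saved_models`` is a hypothetical helper, not part of MLFlow.

```python
from pathlib import Path


def find_saved_models(mlruns_dir="mlruns"):
    """Return (experiment_id, run_id) pairs whose run folder holds a saved model."""
    root = Path(mlruns_dir)
    found = []
    # Assumed layout: mlruns/<experiment_id>/<run_id>/artifacts/model/MLmodel
    for mlmodel in sorted(root.glob("*/*/artifacts/model/MLmodel")):
        run_dir = mlmodel.parents[2]  # .../<experiment_id>/<run_id>
        found.append((run_dir.parent.name, run_dir.name))
    return found


if __name__ == "__main__":
    for experiment_id, run_id in find_saved_models():
        print(f"experiment_id={experiment_id} run_id={run_id}")
```

Each printed pair identifies one run folder that can be served.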

To do so, navigate to the corresponding folder and run the ``mlflow models serve`` command.

```bash
cd mlruns/<selected experiment_id>/<selected run_id>/artifacts/model/
mlflow models serve -m . -p 1234
```

After some time, a gunicorn + Flask microservice is listening on port 1234. You can send HTTP POST requests to it with a tool such as Postman. The endpoint is:

``localhost:1234/invocations``

This is an example of a valid request body:

```json
{
"data": [[ 0.52444161, 0.97309661, 0.43247518, 0.38717859, -1.03377319,
-0.73048166, -0.70972218, -0.41044243, -1.00047971, -0.82507126,
-0.08818832, -0.04623819, -0.18209319, -0.0038316 , -1.04758402,
-0.93257644, -0.65865037, -0.69601737, -0.71241416, -0.25530814,
0.58767599, 1.36061943, 0.48167379, 0.44795641, -0.62887522,
-0.64418546, -0.62375274, -0.23693879, 0.08147618, 0.05512114]]
}
```
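
The same request can also be sent from Python instead of Postman. The sketch below uses only the standard library and assumes that the server started with ``mlflow models serve`` is running locally on port 1234. The ``Content-Type`` value is an assumption; other MLFlow versions may expect a different header or payload format. ``build_request`` and ``predict`` are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint from the README; assumes the model server is running locally.
URL = "http://localhost:1234/invocations"


def build_request(rows, url=URL):
    """Build a POST request whose JSON body has the same shape as the example body."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header value
        method="POST",
    )


def predict(rows, url=URL):
    """Send the rows to the model server and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(rows, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling ``predict`` with a list containing one row of 30 scaled feature values, such as the row in the example body, should return the model's prediction for that row, provided the server is up.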
134 changes: 134 additions & 0 deletions models.ipynb
@@ -0,0 +1,134 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introducción a MLFlow y Databricks: acelerando el Machine Learning LifeCycle - Python Sevilla 2019"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## MLFlow Models\n",
"In this section, we can see an example of using a trained model from MLFlow experiments."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load a trained model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import mlflow\n",
"import mlflow.sklearn\n",
"import numpy as np\n",
"from sklearn.datasets import load_breast_cancer\n",
"from sklearn.model_selection import train_test_split\n",
"import warnings\n",
"warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After deciding which model to use, paste both its experiment_id and run_id here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_id = \"...\"\n",
"run_id = \"...\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cancer = load_breast_cancer()\n",
"X = np.array(cancer.data)\n",
"y = np.array(cancer.target)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Feature Scaling\n",
"from sklearn.preprocessing import StandardScaler\n",
"x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=426, test_size=143, random_state=0)\n",
"sc = StandardScaler()\n",
"x_train = sc.fit_transform(x_train)\n",
"x_test = sc.transform(x_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sel = np.random.randint(len(x_test))\n",
"x_sel = x_test[sel]\n",
"y_sel = y_test[sel]\n",
"print(y_sel)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = mlflow.sklearn.load_model(f'./mlruns/{experiment_id}/{run_id}/artifacts/model')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.predict([x_sel])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
11 changes: 11 additions & 0 deletions project_example/MLProject
@@ -0,0 +1,11 @@
name: tutorial

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      n_estimators: {type: int, default: 100}
      max_depth: {type: int, default: 2}
      criterion: {type: str, default: "gini"}
    command: "python train.py {n_estimators} {max_depth} {criterion}"
8 changes: 8 additions & 0 deletions project_example/conda.yaml
@@ -0,0 +1,8 @@
name: breast-rf
channels:
  - defaults
dependencies:
  - numpy=1.14.3
  - pip:
    - mlflow==1.3.0
    - scikit-learn==0.22
58 changes: 58 additions & 0 deletions project_example/train.py
@@ -0,0 +1,58 @@
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import sys
import warnings
warnings.filterwarnings('ignore')

# Function to validate a model
def validate_model(model, x_test, y_test):
    y_pred = model.predict(x_test)
    y_pred = (y_pred > 0.5)
    from sklearn.metrics import confusion_matrix
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)

    return precision, recall, accuracy


def breast_cancer_rf(n_estimators=100, max_depth=2, criterion="gini"):
    from sklearn.ensemble import RandomForestClassifier
    import mlflow.sklearn
    with mlflow.start_run() as run:
        clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, criterion=criterion)
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_param("max_depth", max_depth)
        mlflow.log_param("criterion", criterion)
        mlflow.set_tag("model type", "sklearn - RandomForest")
        clf.fit(x_train, y_train)
        precision, recall, accuracy = validate_model(clf, x_test, y_test)
        mlflow.log_metric("precision", precision)
        mlflow.log_metric("recall", recall)
        mlflow.log_metric("accuracy", accuracy)
        mlflow.sklearn.log_model(clf, "model")
        print("Model saved in run %s" % mlflow.active_run().info.run_uuid)


if __name__ == "__main__":
    args = sys.argv[1:]
    n_estimators = int(args[0])
    max_depth = int(args[1])
    criterion = args[2]

    cancer = load_breast_cancer()
    X = np.array(cancer.data)
    y = np.array(cancer.target)

    #Feature Scaling
    x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=426, test_size=143, random_state=0)
    sc = StandardScaler()
    x_train = sc.fit_transform(x_train)
    x_test = sc.transform(x_test)

    breast_cancer_rf(n_estimators=n_estimators, max_depth=max_depth, criterion=criterion)
Binary file added pythonsevilla2019 - Introducción a MLFlow.pdf
Binary file not shown.
6 changes: 6 additions & 0 deletions requirements.txt
@@ -0,0 +1,6 @@
Keras==2.3.1
mlflow==1.4.0
numpy==1.17.4
pandas==0.25.3
scikit-learn==0.22
tensorflow==2.0.0