forked from FernanOrtega/pythonsevilla2019
More comments in notebooks and readme. Added presentation
1 parent
50a6353
commit 854b75f
Showing 10 changed files with 726 additions and 373 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
.idea | ||
.ipynb_checkpoints | ||
mlruns | ||
.idea | ||
.ipynb_checkpoints | ||
mlruns |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,54 @@ | ||
# pythonsevilla2019 | ||
TODO | ||
# pythonsevilla2019 | ||
This repo contains the notebooks, code and slides from the talk [*Introducción a MLFlow y Databricks: acelerando el Machine Learning Lifecycle*](https://www.meetup.com/Python-Sevilla/events/266430587/), given at the [Python Sevilla Developers](https://www.meetup.com/Python-Sevilla) meetup group. | ||
|
||
The slides are available both on [Slideshare](https://www.slideshare.net/fortega86/pythonsevilla2019-mlflow-introduction) and [in this repository](<pythonsevilla2019 - Introducción a MLFlow.pdf>). | ||
|
||
## Installation and use | ||
1. ``pip install -r requirements.txt`` | ||
2. ``jupyter notebook`` | ||
|
||
## MLFlow Tracking demo | ||
Run ``jupyter notebook`` and execute the notebook ``tracking.ipynb``. | ||
|
||
The last section requires an Azure or AWS account with a deployed Databricks resource. | ||
|
||
## MLFlow Projects demo | ||
The first example runs a public project hosted under the official MLFlow GitHub organization (https://github.com/mlflow/): | ||
|
||
``mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5`` | ||
|
||
The second example runs our own project (``./project_example``): | ||
|
||
```bash | ||
cd project_example | ||
mlflow run . -P n_estimators=500 | ||
``` | ||
|
||
## MLFlow Models demo | ||
Execute the notebook ``models.ipynb``. | ||
|
||
The second part of the demo consists of deploying a trained model from the ``mlruns`` folder that MLFlow creates when it tracks experiments. | ||
|
||
To do so, navigate to the corresponding folder and run the ``mlflow models serve`` command: | ||
|
||
```bash | ||
cd mlruns/<selected experiment_id>/<selected run_id>/artifacts/model/ | ||
mlflow models serve -m . -p 1234 | ||
``` | ||
|
||
After a short startup delay, a gunicorn + Flask microservice listens on port 1234. You can send HTTP POST requests to it with a tool such as Postman. The endpoint is: | ||
|
||
``localhost:1234/invocations`` | ||
|
||
This is an example of a valid request body: | ||
|
||
```json | ||
{ | ||
"data": [[ 0.52444161, 0.97309661, 0.43247518, 0.38717859, -1.03377319, | ||
-0.73048166, -0.70972218, -0.41044243, -1.00047971, -0.82507126, | ||
-0.08818832, -0.04623819, -0.18209319, -0.0038316 , -1.04758402, | ||
-0.93257644, -0.65865037, -0.69601737, -0.71241416, -0.25530814, | ||
0.58767599, 1.36061943, 0.48167379, 0.44795641, -0.62887522, | ||
-0.64418546, -0.62375274, -0.23693879, 0.08147618, 0.05512114]] | ||
} | ||
``` |
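The request described in the README can also be sent from Python instead of Postman. The following is a minimal sketch using only the standard library. The helper names `build_invocation_request` and `predict` are hypothetical, and the `Content-Type: application/json` header is an assumption based on how the MLFlow 1.x scoring server accepts a `{"data": ...}` body; newer MLFlow releases may expect a different payload layout.

```python
import json
import urllib.request


def build_invocation_request(rows, host="localhost", port=1234):
    """Build (but do not send) a POST request for the /invocations endpoint.

    `rows` is a list of feature rows, e.g. [[0.52, 0.97, ...]].
    """
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=body,
        # Assumption: the server reads a plain JSON body with a "data" key.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(rows, host="localhost", port=1234, timeout=10):
    """Send the rows to a running `mlflow models serve` process and decode the reply."""
    request = build_invocation_request(rows, host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the previous step running, `predict([[...30 scaled feature values...]])` should return the decoded JSON response. Building the request separately from sending it makes the payload easy to inspect without a live server.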
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Introducción a MLFlow y Databricks: acelerando el Machine Learning LifeCycle - Python Sevilla 2019" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## MLFlow Models\n", | ||
"In this section, we can see an example of using a trained model from MLFlow experiments." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Load a trained model" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import mlflow\n", | ||
"import mlflow.sklearn\n", | ||
"import numpy as np\n", | ||
"from sklearn.datasets import load_breast_cancer\n", | ||
"from sklearn.model_selection import train_test_split\n", | ||
"import warnings\n", | ||
"warnings.filterwarnings('ignore')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"After deciding which model to use, paste both its experiment_id and run_id here." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"experiment_id = \"...\"\n", | ||
"run_id = \"...\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cancer = load_breast_cancer()\n", | ||
"X = np.array(cancer.data)\n", | ||
"y = np.array(cancer.target)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"#Feature Scaling\n", | ||
"from sklearn.preprocessing import StandardScaler\n", | ||
"x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=426, test_size=143, random_state=0)\n", | ||
"sc = StandardScaler()\n", | ||
"x_train = sc.fit_transform(x_train)\n", | ||
"x_test = sc.transform(x_test)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Pick a random row from the test split (indexing with len(X) could exceed its 143 rows)\n", | ||
"sel = np.random.randint(len(x_test))\n", | ||
"x_sel = x_test[sel]\n", | ||
"y_sel = y_test[sel]\n", | ||
"print(y_sel)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"model = mlflow.sklearn.load_model(f'./mlruns/{experiment_id}/{run_id}/artifacts/model')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"model.predict([x_sel])" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.7.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
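The notebook loads the model from a path built out of `experiment_id` and `run_id`, so a wrong or unfilled id only surfaces as an error inside `load_model`. The sketch below checks the folder first. It assumes the local file-store layout used in this repository (`mlruns/<experiment_id>/<run_id>/artifacts/model`) and that a saved model folder contains an `MLmodel` file; `model_dir` and `require_model_dir` are hypothetical helper names.

```python
from pathlib import Path


def model_dir(experiment_id, run_id, root="mlruns"):
    """Return the model folder for a run in a local MLFlow file store."""
    return Path(root) / str(experiment_id) / str(run_id) / "artifacts" / "model"


def require_model_dir(experiment_id, run_id, root="mlruns"):
    """Return the model folder, or raise if it does not hold an MLmodel file."""
    path = model_dir(experiment_id, run_id, root)
    if not (path / "MLmodel").is_file():
        raise FileNotFoundError(
            f"No MLmodel file under {path}; check experiment_id and run_id"
        )
    return path
```

In the notebook, the result could be passed on as `mlflow.sklearn.load_model(str(require_model_dir(experiment_id, run_id)))`.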
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
name: tutorial | ||
|
||
conda_env: conda.yaml | ||
|
||
entry_points: | ||
  main: | ||
    parameters: | ||
      n_estimators: {type: int, default: 100} | ||
      max_depth: {type: int, default: 2} | ||
      criterion: {type: str, default: "gini"} | ||
    command: "python train.py {n_estimators} {max_depth} {criterion}" |
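The `command` template above is filled with the declared parameters, where any value passed with `-P` replaces the default. The following sketch imitates that substitution in plain Python to show which command line `train.py` ends up receiving. It is an illustration only, not MLFlow's implementation: `render_command` is a hypothetical helper, and type checking, quoting and the handling of undeclared parameters are left out.

```python
# Defaults and command template copied from the MLproject file above.
DEFAULTS = {"n_estimators": 100, "max_depth": 2, "criterion": "gini"}
COMMAND = "python train.py {n_estimators} {max_depth} {criterion}"


def render_command(overrides=None, defaults=DEFAULTS, template=COMMAND):
    """Merge -P style overrides into the defaults and fill the command template."""
    params = {**defaults, **(overrides or {})}
    return template.format(**params)


# Equivalent of: mlflow run . -P n_estimators=500
print(render_command({"n_estimators": 500}))  # python train.py 500 2 gini
```

Because the values arrive as positional arguments, `train.py` depends on the order of the placeholders in the template, not on the parameter names.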
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
name: breast-rf | ||
channels: | ||
  - defaults | ||
dependencies: | ||
  - numpy=1.14.3 | ||
  - pip: | ||
    - mlflow==1.3.0 | ||
    - scikit-learn==0.22 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
import mlflow | ||
import mlflow.sklearn | ||
import numpy as np | ||
from sklearn.datasets import load_breast_cancer | ||
from sklearn.model_selection import train_test_split | ||
from sklearn.preprocessing import StandardScaler | ||
import sys | ||
import warnings | ||
warnings.filterwarnings('ignore') | ||
|
||
# Function to validate a model | ||
def validate_model(model, x_test, y_test): | ||
    y_pred = model.predict(x_test) | ||
    y_pred = (y_pred > 0.5) | ||
    from sklearn.metrics import confusion_matrix | ||
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel() | ||
    precision = tp / (tp + fp) | ||
    recall = tp / (tp + fn) | ||
    accuracy = (tp + tn) / (tp + fp + tn + fn) | ||
|
||
    return precision, recall, accuracy | ||
|
||
|
||
def breast_cancer_rf(n_estimators=100, max_depth=2, criterion="gini"): | ||
    from sklearn.ensemble import RandomForestClassifier | ||
    import mlflow.sklearn | ||
    with mlflow.start_run() as run: | ||
        clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, criterion=criterion) | ||
        mlflow.log_param("n_estimators", n_estimators) | ||
        mlflow.log_param("max_depth", max_depth) | ||
        mlflow.log_param("criterion", criterion) | ||
        mlflow.set_tag("model type", "sklearn - RandomForest") | ||
        clf.fit(x_train, y_train) | ||
        precision, recall, accuracy = validate_model(clf, x_test, y_test) | ||
        mlflow.log_metric("precision", precision) | ||
        mlflow.log_metric("recall", recall) | ||
        mlflow.log_metric("accuracy", accuracy) | ||
        mlflow.sklearn.log_model(clf, "model") | ||
        print("Model saved in run %s" % mlflow.active_run().info.run_uuid) | ||
|
||
|
||
if __name__ == "__main__": | ||
    args = sys.argv[1:] | ||
    n_estimators = int(args[0]) | ||
    max_depth = int(args[1]) | ||
    criterion = args[2] | ||
|
||
    cancer = load_breast_cancer() | ||
    X = np.array(cancer.data) | ||
    y = np.array(cancer.target) | ||
|
||
    #Feature Scaling | ||
    x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=426, test_size=143, random_state=0) | ||
    sc = StandardScaler() | ||
    x_train = sc.fit_transform(x_train) | ||
    x_test = sc.transform(x_test) | ||
|
||
    breast_cancer_rf(n_estimators=n_estimators, max_depth=max_depth, criterion=criterion) |
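`validate_model` derives its three metrics from the four cells of the binary confusion matrix. The sketch below reproduces the same formulas on plain counts so the arithmetic can be checked by hand. The function name `classification_metrics` and the choice to return 0.0 when a denominator is zero are assumptions of this sketch; the script itself divides without such a guard.

```python
def classification_metrics(tn, fp, fn, tp):
    """Return (precision, recall, accuracy) from binary confusion-matrix counts."""

    def safe_div(numerator, denominator):
        # Assumption of this sketch: report 0.0 instead of dividing by zero,
        # e.g. when the model never predicts the positive class.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    return precision, recall, accuracy


# Hypothetical counts, chosen only to illustrate the formulas.
precision, recall, accuracy = classification_metrics(tn=50, fp=10, fn=5, tp=35)
print(recall, accuracy)  # 0.875 0.85
```

For these counts, precision is 35 / 45, recall is 35 / 40 and accuracy is 85 / 100.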
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
Keras==2.3.1 | ||
mlflow==1.4.0 | ||
numpy==1.17.4 | ||
pandas==0.25.3 | ||
scikit-learn==0.22 | ||
tensorflow==2.0.0 |