chantilly
is a deployment tool for online machine learning models. It is designed to work hand in hand with river
.
⚠️ Chantilly is not being worked on anymore. It's being replaced by Beaver. The codebase is now frozen.
- Table of contents
- Introduction
- Installation
- User guide
- Examples
- Development
- Roadmap
- Technical stack
- Similar alternatives
- License
There are many tools for deploying machine learning models. However, none of them support online models that can learn on the fly, but chantilly
does.
Here are some advantages:
- Simple:
chantilly
is essentially just a Flask app. - Straightforward: you provide a model,
chantilly
provides a few API routes to do the rest. - Convenient:
chantilly
rids you of the burden of storing features between prediction and learning steps. - Flexible: use any model you wish, as long as it's written in Python and implements a couple of required methods.
Note that chantilly
is very young, and is therefore subject to evolve. We're also eager for feedback and are happy to work hand in hand with you if you have specific needs.
chantilly
is intended to work with Python 3.7 or above. You can install it from PyPI:
> pip install chantilly
You can also install the latest development version as so:
> pip install git+https://github.com/online-ml/chantilly
Once you've followed the installation step, you'll get access to the chantilly
CLI. You can see the available commands by running chantilly --help
. You can start a server with the run
command:
> chantilly run
This will start a Flask server with all the necessary routes for uploading a model, training it, making predictions with it, and monitoring it. By default, the server will be accessible at localhost:5000
, which is what we will be using in the rest of the examples in the user guide. You can run chantilly routes
in order to see all the available routes.
The first thing you need to do is pick a flavor. Currently, the available flavors are:
regression
for regression tasks.binary
for binary classification tasks.multiclass
for multi-class classification tasks.
You can set the flavor by sending a POST request to @/api/init
, as so:
import requests
config = {'flavor': 'regression'}
requests.post('http://localhost:5000/api/init', json=config)
You can also set the flavor via the CLI:
> chantilly init regression
You can view the current flavor by sending a GET request to @/api/init
:
r = requests.get('http://localhost:5000/api/init')
print(r.json()['flavor'])
You can upload a model by sending a POST request to the @/api/model
route. You need to provide a model which has been serialized with pickle
or dill
(we recommend the latter). For example:
from river import compose
from river import linear_model
from river import preprocessing
import dill
import requests
model = compose.Pipeline(
preprocessing.StandardScaler(),
linear_model.LinearRegression()
)
requests.post('http://localhost:5000/api/model', data=dill.dumps(model))
Likewise, the model can be retrieved by sending a GET request to @/api/model
:
r = requests.get('http://localhost:5000/api/model')
model = pickle.loads(r.content)
Note that chantilly
will validate the model you provide to make sure it works with the flavor you picked. For instance, if you picked the regression
flavor, then the model has to implement fit_one
and predict_one
.
You can also add a upload by using the CLI. First, you need to serialize a model and dump it to a file:
with open('model.pkl', 'wb') as file:
dill.dump(model, file)
Then, call the add-model
sub-command:
> chantilly add-model model.pkl
Predictions can be obtained by sending a POST request to @/api/predict
. The payload you send has to contain a field named features
. The value of this field will be passed to the predict_one
method of the model you uploaded earlier on. If the model you provided predict_proba_one
then that will be used instead. Here is an example:
import requests
r = requests.post('http://localhost:5000/api/predict', json={
'id': 42,
'features': {
'shop': 'Ikea',
'item': 'Dombäs',
'date': '2020-03-22T10:42:29Z'
}
})
print(r.json()['prediction'])
Note that in the previous snippet we've also provided an id
field. This field is optional. If is is provided, then the features will be stored by the chantilly
server, along with the prediction. This allows not having to provide the features again when you want to update the model later on.
The model can be updated by sending a POST request to @/api/learn
. If you've provided an ID in an earlier call to @/api/predict
, then you only have to provide said ID along with the ground truth:
import requests
requests.post('http://localhost:5000/api/learn', json={
'id': 42,
'ground_truth': 10.21
})
However, if you haven't passed an ID earlier on, then you have to provide the features yourself:
requests.post('http://localhost:5000/api/learn', json={
'features': {
'shop': 'Ikea',
'item': 'Dombäs',
'date': '2020-03-22T10:42:29Z'
},
'ground_truth': 10.21
})
Note that the id
field will have precedence in case both of id
and features
are provided. We highly recommend you to use the id
field. First of all it means that you don't have to take care of storing the features between calls to @/api/predict
and @/api/learn
. Secondly it makes the metrics more reliable because they will be using the predictions that were made at the time @/api/predict
was called.
You can access the current metrics via a GET request to the @/api/metrics
route.
Additionally, you can access a stream of metric updates by using the @/api/stream/metrics
. This is a streaming route which implements server-sent events (SSE). As such it will notify listeners every time the metrics are updates. For instance, you can use the sseclient
, which is a thin layer on top of requests
:
import json
import sseclient
messages = sseclient.SSEClient('http://localhost:5000/api/stream/metrics')
for msg in messages:
metrics = json.loads(msg.data)
print(metrics)
You can use the following piece of JavaScript to do the same thing in a browser:
var es = new EventSource('http://localhost:5000/api/stream/metrics');
es.onmessage = e => {
var metrics = JSON.parse(e.data);
console.log(metrics)
};
You can also listen to all the prediction and learning events via the @/api/stream/events
route. This will yield SSE events with an event name attached, which is either 'predict' or 'learn'. From a Python interpreter, you can do the following:
import json
import sseclient
messages = sseclient.SSEClient('http://localhost:5000/api/stream/events')
for msg in messages:
data = json.loads(msg.data)
if msg.event == 'learn':
print(data['model'], data['features'], data['prediction'], data['ground_truth'])
else:
print(data['model'], data['features'], data['prediction'])
In JavaScript, you can you use the addEventListener
method:
var es = new EventSource('http://localhost:5000/api/stream/events');
es.addEventListener('learn', e => {
var data = JSON.parse(e.data);
console.log(data.model, data.features, data.prediction, data.ground_truth)
};
es.addEventListener('predict', e => {
var data = JSON.parse(e.data);
console.log(data.model, data.features, data.prediction)
};
A live dashboard is accessible if you navigate to localhost:5000
in your browser.
Under the hood the dashboard is simply listening to the API's streaming routes.
You can obtain some essential statistics, by querying the @/api/stats
routes:
import requests
r = requests.get('http://localhost:5000/api/stats')
print(r.json())
Here is an output example:
{
"learn": {
"ewm_duration": 3408682,
"ewm_duration_human": "3ms408μs682ns",
"mean_duration": 6541916,
"mean_duration_human": "6ms541μs916ns",
"n_calls": 98
},
"predict": {
"ewm_duration": 3190724,
"ewm_duration_human": "3ms190μs724ns",
"mean_duration": 5248635,
"mean_duration_human": "5ms248μs635ns",
"n_calls": 213
}
}
The mean_duration
fields contain the average duration of each endpoint. The ewm_duration
fields contain an exponential moving average of said duration, and therefore gives you an idea of the recent performance, which can allow you to detect arising performance issues. Note that these durations do not include the time it takes to transmit the response over the network. These durations only pertain to the processing time on chantilly
's side, including but not limited to calls to the model.
These statistic are voluntarily very plain. Their only purpose is to provide a quick healthcheck. The proper way to monitor a web application's performance, including a Flask app, is to use purpose-built tools. For instance you could use Loki to monitor the application logs and Grafana to visualize and analyze them.
You can use different models by giving them names. You can provide a name to a model by adding a suffix to @/api/model
:
from river import tree
import dill
import requests
model = tree.DecisionTreeClassifier()
requests.post('http://localhost:5000/api/model/barney-stinson', data=dill.dumps(model))
You can also pick a name when you add the model through the CLI:
> chantilly add-model model.pkl --name barney-stinson
You can then choose which model to use when you make a prediction:
r = requests.post('http://localhost:5000/api/predict', json={
'id': 42,
'features': {
'shop': 'Ikea',
'item': 'Dombäs',
'date': '2020-03-22T10:42:29Z'
},
'model': 'barney-stinson'
})
The model which was provided last will be used by default if the model
field is not specified. If you provide an id
, then the model which was used for making the prediction will be the one that is updated once the ground truth is made available. You can also specify which model to update directly as so:
requests.post('http://localhost:5000/api/learn', json={
'id': 42,
'ground_truth': 10.21,
'model': 'barney-stinson'
})
Note that the data associated with the given id
is deleted once the model has been updated. In other words you can't call the @/api/model
with the same id
twice.
You can view the available models as well as the default model by sending a GET request to @/api/models
:
r = requests.get('http://localhost:5000/api/models')
print(r.json())
You can delete a model by sending a DELETE request to @/api/model
:
requests.delete('http://localhost:5000/api/model/barney-stinson')
chantilly
follows Flask's instance folder pattern. This means that you can configure chantilly
via a file named instance/config.py
. Note that the location is relative to where you are launching the chantilly run
command from (more information can be found here). You can also configure chantilly
by setting environment variables.
You can set all the builtin variables that Flask provides. You can also set the following variables which are specific to chantilly
:
STORAGE_BACKEND
: determines which storage backend to use.SHELVE_PATH
: location of the shelve database file. Only applies ifSTORAGE_BACKEND
is set toshelve
.REDIS_HOST
: required ifSTORAGE_BACKEND
is set toredis
.REDIS_PORT
: required ifSTORAGE_BACKEND
is set toredis
.REDIS_DB
: required ifSTORAGE_BACKEND
is set toredis
.
The instance/config.py
is a Python file that gets executed before the app starts, therefore this is also where you can configure logging. Here is an example instance/config.py
file:
from logging.config import dictConfig
dictConfig({
'version': 1,
'formatters': {'default': {
'format': '[%(asctime)s] %(levelname)s in %(module)s: %(message)s',
}},
'handlers': {'wsgi': {
'class': 'logging.FileHandler',
'filename': '/var/log/chantilly/error.log',
'formatter': 'default'
}},
'root': {
'level': 'INFO',
'handlers': ['wsgi']
}
})
SECRET_KEY = 'keep_it_secret_keep_it_safe'
SHELVE_PATH = '/usr/local/chantilly'
Currently, the default storage backend is based on the shelve module. It's possible to use a different backend by setting the STORAGE_BACKEND
environment variable.
Add the following to your instance/config.py
file:
STORAGE_BACKEND = 'redis'
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
REDIS_DB = 0
Naturally, the values have to be chosen according to your Redis setup.
It's highly likely that your model will be using external dependencies. A prime example is the datetime
module, which you'll probably want to use to parse datetime strings. Instead of specifying which libraries you want chantilly
to import, the current practice is to import your requirements within your model. For instance, here is an excerpt taken from the New-York city taxi trips example:
from river import compose
from river import linear_model
from river import preprocessing
def parse(trip):
import datetime as dt
trip['pickup_datetime'] = dt.datetime.fromisoformat(trip['pickup_datetime'])
return trip
def datetime_info(trip):
import calendar
day_no = trip['pickup_datetime'].weekday()
return {
'hour': trip['pickup_datetime'].hour,
**{day: i == day_no for i, day in enumerate(calendar.day_name)}
}
model = compose.FuncTransformer(parse)
model |= compose.FuncTransformer(datetime_info)
model |= preprocessing.StandardScaler()
model |= linear_model.LinearRegression()
Note that you need to make sure that the Python interpreter you're running chantilly
with has access to the libraries you want to use.
Essentially, chantilly
is just a Flask application. Therefore, it allows the same deployment options as any other Flask application.
git clone https://github.com/online-ml/chantilly
cd chantilly
python3 -m venv env
source env/bin/activate
pip install -e ".[dev]"
python setup.py develop
There are some extra dependencies that can be installed if necessary.
pip install -e ".[dev,redis]"
You can then run tests.
pytest
The default testing environment uses the shelve module; you can also use redis:
pytest --redis
You may also run the app in development mode.
export FLASK_ENV=development
chantilly run
To deploy to PyPI:
- Update
chantilly/__version__.py
pip install twine
python setup.py sdist
twine check dist/*
twine upload dist/*
- HTTP long polling: Currently, clients can interact with
river
over a straightforward HTTP protocol. Therefore the speed bottleneck comes from the web requests, not from the machine learning. We would like to provide a way to interact withchantilly
via long-polling. This means that a single connection can be used to process multiple predictions and model updates, which reduces the overall latency. - Scaling: At the moment
chantilly
is designed to be run as a single server. Ideally we want to allowchantilly
in a multi-server environment. Predictions are simple to scale because the model can be used concurrently. However, updating the model concurrently leads to reader-write problems. We have some ideas in the pipe, but this is going to need some careful thinking. - Grafana dashboard: The current dashboard is a quick-and-dirty proof of concept. In the long term, we would like to provide a straighforward way to connect with a Grafana instance without having to get your hands dirty. Ideally, we would like to use SQLite as a data source for simplicity reasons. However, The Grafana team has not planned to add support for SQLite. Instead, they encourage users to use plugins. We might also look into Prometheus and InfluxDB.
- Support more paradigms: For the moment we cater to regression and classification models. In the future we also want to support other paradigms, such as time series forecasting and recommender systems.
- Flask for the web server.
- dill for model serialization.
- marshmallow for the API input validation.
- Vue.js, Chart.js, and Moment.js for the web interface.
Most machine learning deployment tools only support making predictions with a trained model. They don't cater to online models which can be updated on the fly. Nonetheless, some of them are quite interesting and are very much worth looking into!
river
is free and open-source software licensed under the 3-clause BSD license.