This repository has been archived by the owner on Jul 3, 2024. It is now read-only.
Margriet Groenendijk edited this page Aug 1, 2018 · 4 revisions

Short Name

Build a machine learning recommendation engine to encourage additional purchases based on past buying behaviour

Short Description

Use Jupyter Notebooks with IBM Watson Studio to build an interactive recommendation engine PixieApp. A clustering model is deployed with Watson Machine Learning and exposed as an API, which is explored and tested in a notebook and then used in the interactive PixieApp.

Offering Type

Cognitive

Introduction

This code pattern shows how to build a recommendation engine from customer data with Jupyter notebooks, Apache Spark and PixieDust, which are all open-source projects. Combined with Watson Studio and Watson Machine Learning, these tools let a user quickly produce an interactive dashboard to explore and test a recommendation model.

Author

By Patrick Titzler and Margriet Groenendijk

Code

Demo

  • link to demo video

Video

  • link to youtube video

Overview

In this code pattern historical shopping data is used to build a recommendation engine with Spark and Watson Machine Learning. The model is then used in an interactive PixieApp in which a shopping basket is simulated and used to create a list of recommendations.

When the reader has completed this code pattern, they will understand how to:

Flow

  1. Log in to IBM Watson Studio
  2. Load the provided notebook into Watson Studio
  3. Load and transform the customer data in the notebook
  4. Build a k-means clustering model with SparkML
  5. Deploy the model to Watson Machine Learning
  6. Test and compare the models built in the notebook and through the Watson Machine Learning API
  7. Use the API to build an interactive PixieApp
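As a rough illustration of the "load and transform" step in the flow above, the sketch below turns raw purchase records into one count vector per customer, which is the shape a clustering model expects. The record layout `(customer, product, quantity)`, the sample data and the function name `build_feature_vectors` are assumptions for illustration; the notebook in this pattern does the equivalent work with Spark.

```python
from collections import defaultdict


def build_feature_vectors(purchases, products):
    """Turn (customer, product, quantity) rows into one count vector per customer."""
    totals = defaultdict(lambda: defaultdict(int))
    for customer, product, quantity in purchases:
        totals[customer][product] += quantity
    # Use a fixed product order so every vector has the same layout.
    return {
        customer: [counts.get(product, 0) for product in products]
        for customer, counts in totals.items()
    }


# Hypothetical sample data, not taken from the pattern's dataset.
purchases = [
    ("c1", "A", 2), ("c1", "B", 1), ("c1", "A", 1),
    ("c2", "B", 4),
    ("c3", "A", 5), ("c3", "C", 1),
]
products = ["A", "B", "C"]

vectors = build_feature_vectors(purchases, products)
print(vectors["c1"])  # [3, 1, 0]
```

Each position in a vector corresponds to one product, so every additional product adds one dimension to the data that is clustered.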

Included components

  • IBM Watson Studio: a suite of tools and a collaborative environment for data scientists, developers and domain experts
  • Apache Spark: an open source cluster computing framework optimized for extremely fast and large scale data processing
  • IBM Watson Machine Learning: a set of REST APIs to develop applications that make smarter decisions, solve tough problems, and improve user outcomes

Featured technologies

  • Jupyter notebooks: an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text
  • PixieDust: Python helper library for Jupyter notebooks
  • PixieApps: Python library to write and run UI elements for analytics directly in a Jupyter notebook

Blog

Blog Title: Build an interactive product recommender with Spark and PixieDust

Blog Author: Margriet Groenendijk

Blog Content - see below

Title: Build an interactive product recommender with Spark and PixieDust

Most websites selling products online will show you a list of items that you might be interested in. The better the recommendations, the more likely you are to buy one of these items, which increases the seller's sales. But how are these recommendations created?

The most straightforward way is to use the purchase data of all customers. From this data you can create groups of customers that have bought similar products. A statistical method to do this is called clustering, where you create groups in which the customers in each group are more similar to each other than to the customers in other groups. One of the algorithms you could use is called k-means clustering, where each customer is assigned to the cluster with the nearest mean. The example below shows how many units of products A and B each customer has bought.

The dots are customers and they can be clustered into n groups, where you can define as many groups as you need. Below is an example of what this could look like. Note this is just a sketch, so a real k-means algorithm will probably calculate something different.

With a Machine Learning algorithm you can do the same with many more products, where each additional product will add an extra dimension to the above example. In this code pattern the k-means algorithm from Spark ML is used.
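To make the idea concrete, here is a minimal plain-Python sketch of k-means for the two-product case described above. The code pattern itself relies on the k-means implementation in Spark ML; this version only illustrates the assign-then-average loop. The sample points, the function names and the explicitly supplied starting centroids are assumptions for illustration.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, max_iter=100):
    """Alternate between assigning points and recomputing means until stable."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(max_iter):
        new_assignment = [nearest(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # no customer changed cluster, so the result is stable
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


# Hypothetical customers as (units of product A, units of product B).
points = [(1, 8), (2, 9), (1, 7), (8, 1), (9, 2), (7, 1)]
centroids, assignment = kmeans(points, centroids=[points[0], points[3]])
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

With more products, each point simply gets more coordinates; the loop itself does not change.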

We are not there yet. After clustering all customers into groups there is still no list of products to recommend. A simple way to create this list is to rank the products in a cluster by how often they were bought and recommend the top ones, leaving out, of course, the products that are already in the customer's basket.
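The ranking step described above could look roughly like the following sketch. The data layout (one dictionary of product quantities per customer in the cluster), the product names and the function name `recommend` are hypothetical and only serve to show the logic.

```python
from collections import Counter


def recommend(cluster_baskets, basket, top_n=3):
    """Most bought products in the cluster, skipping what is already in the basket."""
    counts = Counter()
    for purchased in cluster_baskets:
        counts.update(purchased)  # purchased maps product -> quantity
    ranked = [product for product, _ in counts.most_common() if product not in basket]
    return ranked[:top_n]


# Hypothetical purchase histories of the customers in one cluster.
cluster_baskets = [
    {"tent": 1, "sleeping bag": 2},
    {"tent": 1, "stove": 1, "lantern": 1},
    {"sleeping bag": 1, "stove": 3},
]

print(recommend(cluster_baskets, basket={"tent"}, top_n=2))  # ['stove', 'sleeping bag']
```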

One of my favorite tools to clean the data and build a model is a Jupyter notebook where you can easily run code, add comments and also explore data with charts and tables. You can run notebooks in Watson Studio where you can use a Spark kernel.

After building the model you probably want to show or share it with others. If you want to use your model in a web application to recommend products you can use Watson Machine Learning. Directly from the notebook you can deploy the model as an API, which you can then use from anywhere.
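Calling a deployed model usually means sending the basket as JSON and reading the predicted cluster from the response. The sketch below only builds and parses such messages and makes no network call. The `fields`/`values` layout, the `prediction` column name and the product field names are assumptions for illustration; the exact schema depends on the service version, so check the Watson Machine Learning documentation for your deployment.

```python
import json


def build_scoring_payload(fields, basket_counts):
    """Build a scoring request body (assumed layout) with one row of product counts."""
    values = [basket_counts.get(name, 0) for name in fields]
    return {"fields": list(fields), "values": [values]}


def extract_cluster(response, column="prediction"):
    """Read the predicted cluster from the first row of a response (assumed layout)."""
    index = response["fields"].index(column)
    return response["values"][0][index]


# Hypothetical feature names; a real model defines its own input columns.
fields = ["product_a", "product_b", "product_c"]
payload = build_scoring_payload(fields, {"product_a": 2, "product_c": 1})
print(json.dumps(payload))

# Hand-written stand-in for a service response, for illustration only.
fake_response = {"fields": fields + ["prediction"], "values": [[2, 0, 1, 4]]}
print(extract_cluster(fake_response))  # 4
```

In a real application the payload would be sent in an authenticated HTTP POST to the deployment's scoring endpoint, and the returned cluster would then be used to look up the products to recommend.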

But before using this API in an application it is a good idea to test the model. You can do this in a notebook by running code. But when you want to let others understand what you have built, a PixieApp is a tool that shows much more clearly how the recommendation engine works. The example below shows an interactive PixieApp of a shopping basket where you can add and delete products and then create a list of recommendations based on the contents of the basket.

If you want to learn more, this code pattern shows you all the code that you need to build a recommender engine from customer data in a Jupyter notebook. You will learn how to use Spark to build a k-means model, deploy this model to Watson Machine Learning and then use this model through an API to build an interactive shopping cart as a PixieApp.

Links

Build a recommender with Apache Spark and Elasticsearch

Create a web-based mobile health app using Watson services on IBM Cloud and IBM Watson Studio

Use machine learning to predict U.S. opioid prescribers with Watson Studio and scikit-learn
