License

Copyright 2020 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

How to build an end-to-end propensity to purchase solution using BigQuery ML and Kubeflow Pipelines

You’ll learn how to build a propensity model (whether a customer is going to buy) and how to orchestrate an ML pipeline for doing so. You can use the code as a reference guide. Amend, replace, or add new AI Pipeline components according to your use case. Please refer to the notebook, which contains further documentation and detailed instructions.

To see a step-by-step tutorial that walks you through implementing this solution, see Predicting customer propensity to buy by using BigQuery ML and AI Platform.

The notebook does the following:

  • Environment Setup
    • Set up Cloud AI Platform Pipelines (using the Cloud Console)
    • Install the KFP client
    • Install Python packages for Google Cloud services
  • Kubeflow Pipelines (KFP) Setup
    • Prepare data for training
      • Create/validate a Google Cloud Storage bucket/folder
      • Create the input table in BigQuery
    • Train the model (see the sketch after this list)
    • Evaluate the model
    • Prepare the model for batch prediction
      • Prepare a test dataset (a table)
      • Run batch prediction with the model in BigQuery
    • Prepare the model for online prediction
      • Create a new revision (model revision management)
      • Export the BigQuery model
        • Export the model from BigQuery to Google Cloud Storage
        • Export the training stats to Google Cloud Storage
        • Export the eval metrics to Google Cloud Storage
      • Deploy to Cloud AI Platform Prediction
      • Run predictions against the model in Cloud AI Platform Prediction
  • Data exploration using BigQuery, Pandas, and matplotlib
  • SDLC methodology adherence (opinionated)
    • Variable naming conventions
      • Uppercase names for immutable variables
      • Lowercase names for mutable variables
      • Naming prefixes of rpm_ or RPM_
    • Unit tests
    • Cleanup/reset utility functions
  • KFP knowledge share (demonstration)
    • Pass input params through function args
    • Pass params through pipeline args
    • Pass the output of one component as the input of another
    • Create an external shared volume available to all the components
    • Use built-in operators
    • Build a lightweight component
    • Set a component not to cache
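
For orientation, the "Train the model" step above essentially runs a BigQuery ML CREATE MODEL statement from the notebook. Below is a minimal sketch, assuming a logistic regression propensity model; the project, dataset, table, model, and label column names are hypothetical placeholders, not the notebook's actual identifiers:

# Minimal training sketch (hypothetical names, logistic regression assumed).
from google.cloud import bigquery

client = bigquery.Client(project='your-gcp-project')

create_model_sql = """
CREATE OR REPLACE MODEL `your-gcp-project.propensity_ds.propensity_model`
OPTIONS (model_type = 'logistic_reg',
         input_label_cols = ['will_buy_later']) AS
SELECT * FROM `your-gcp-project.propensity_ds.training_table`
"""

client.query(create_model_sql).result()  # blocks until the training job finishes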

Architecture of the pipeline

Pipeline components

Data Exploration and Visualization in the notebook

Data Exploration

Data Visualization

Running the Unit tests

Create a local context and use it to unit test KFP pipeline components locally. Below is an example of testing a component:

# Unit test create_gcs_bucket_folder locally
import json

local_context = get_local_context()
update_local_context(create_gcs_bucket_folder(
    json.dumps(local_context),
    local_context['RPM_GCP_STORAGE_BUCKET'],
    local_context['RPM_GCP_PROJECT'],
    local_context['RPM_DEFAULT_BUCKET_EXT'],
    local_context['RPM_GCP_STORAGE_BUCKET_FOLDER'],
    local_context['RPM_DEFAULT_BUCKET_FOLDER_NAME']
))
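
get_local_context and update_local_context are helpers defined in the notebook. Here is a hedged sketch of how they might work, assuming the local context is simply a dict of the RPM_* variables that is passed to components as a JSON string and merged back from the component's first output field:

# Hypothetical sketch of the notebook's local-context helpers (not the
# notebook's exact implementation).
import json

_local_context = {}  # assumed: a plain dict of RPM_* variables

def get_local_context():
    # Return the current dict of pipeline variables.
    return _local_context

def update_local_context(comp_output):
    # Assumed: the component returns a NamedTuple whose first field is the
    # updated context serialized as JSON; merge it back into the dict.
    _local_context.update(json.loads(comp_output[0]))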

Utility functions

Below is a utility function that purges artifacts created during unit/integration testing; this one deletes a BigQuery table (a companion sketch for purging GCS artifacts follows):

# Delete a BQ table if it's no longer needed ...!!!BE CAREFUL!!!
def delete_table(table_id):
    from google.cloud import bigquery
    # Construct a BigQuery client object.
    client = bigquery.Client()
    # client.delete_table(table_id, not_found_ok=True)  # Ignore "not found" errors.
    client.delete_table(table_id)  # Make an API request.
    print("Deleted table '{}'.".format(table_id))

# Delete the table in BigQuery.
delete_table(get_local_context()['rpm_table_id'])
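
And a companion sketch for purging GCS artifacts created during testing. The helper below is illustrative rather than taken from the notebook, and assumes the bucket and folder names live in the same local context:

# Delete every object under a GCS folder ...!!!BE CAREFUL!!!
def delete_gcs_folder(bucket_name, folder_name):
    from google.cloud import storage
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    # Remove every blob whose name starts with the folder prefix.
    for blob in bucket.list_blobs(prefix=folder_name):
        blob.delete()
    print("Deleted gs://{}/{}".format(bucket_name, folder_name))

# Purge the test folder (bucket/folder keys as used earlier in the notebook).
ctx = get_local_context()
delete_gcs_folder(ctx['RPM_GCP_STORAGE_BUCKET'], ctx['RPM_GCP_STORAGE_BUCKET_FOLDER'])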

Mandatory Variables

You must set the values of these parameters; refer to the instructions in the notebook for details:

RPM_GCP_PROJECT='<Your GCP Project>'
RPM_GCP_KFP_HOST='<Your KFP pipeline host>'
RPM_GCP_APPLICATION_CREDENTIALS='<Full path, including the file name, to the downloaded JSON key file>'
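
In the notebook these are plain Python assignments. A minimal sketch with placeholder values, which additionally exports the credentials path through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable so the Google Cloud client libraries can pick it up (the extra export is a suggestion, not necessarily how the notebook does it):

import os

# Placeholder values; replace with your own project, KFP host, and key file.
RPM_GCP_PROJECT = 'your-gcp-project'
RPM_GCP_KFP_HOST = 'https://your-kfp-host.pipelines.googleusercontent.com'
RPM_GCP_APPLICATION_CREDENTIALS = '/path/to/downloaded-key.json'

# Standard environment variable read by the Google Cloud client libraries.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = RPM_GCP_APPLICATION_CREDENTIALS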

A screen grab of the output of the KFP pipeline

KFP Graph

KFP Knowledge Share

The code snippets below demonstrate various KFP syntaxes and show several ways to pass parameters. Use whichever approach works for you.

Pass input params through function args example:

# create BQ DS only if it doesn't exist
from typing import NamedTuple
def create_bq_ds (ctx:str, 
    RPM_GCP_PROJECT: str,
    RPM_BQ_DATASET_NAME: str, 
    RPM_LOCATION: str
    ) -> NamedTuple('Outputs', [
    ('rpm_context', str), 
    ('rpm_bq_ds_name', str), 
    ]):

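The component body is omitted in the snippet above. A hedged sketch of one possible completion, assuming the component creates the dataset with the BigQuery client (only if it doesn't already exist) and returns the unchanged context plus the dataset name:

    # Possible body (illustrative, not the notebook's exact implementation):
    from google.cloud import bigquery
    client = bigquery.Client(project=RPM_GCP_PROJECT)
    dataset = bigquery.Dataset('{}.{}'.format(RPM_GCP_PROJECT, RPM_BQ_DATASET_NAME))
    dataset.location = RPM_LOCATION
    client.create_dataset(dataset, exists_ok=True)  # no-op if it already exists
    return (ctx, RPM_BQ_DATASET_NAME)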

Pass params through pipeline args example:

def bq_googlestr_dataset_to_bq_to_caip_pipeline(
    data_path = all_vars['RPM_PVC_NAME'] #you can pass input variables
):
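
In KFP v1, the pipeline function above would typically carry the dsl.pipeline decorator so it can be compiled and submitted; a minimal sketch (the name and description strings are illustrative):

import kfp.dsl as dsl

@dsl.pipeline(
    name='bq-googlestr-dataset-to-bq-to-caip-pipeline',  # illustrative name
    description='Propensity-to-purchase demo pipeline'   # illustrative description
)
def bq_googlestr_dataset_to_bq_to_caip_pipeline(
    data_path = all_vars['RPM_PVC_NAME']  # you can pass input variables
):
    ...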

Pass Output from one Kubeflow Pipelines Component as Input of another Kubeflow Pipelines Component example:

The output 'rpm_table_id' of the load_bq_ds_op component is passed as an input to the create_bq_ml_op component. Because create_bq_ml_op consumes these outputs, KFP also schedules it to run only after load_bq_ds_op and create_bq_ds_op have completed:

    create_bq_ml_op = create_kfp_comp(create_bq_ml)(
        load_bq_ds_op.outputs['rpm_context'],
        all_vars['RPM_GCP_PROJECT'],
        all_vars['RPM_MODEL_NAME'],
        all_vars['RPM_DEFAULT_MODEL_NAME'],
        create_bq_ds_op.outputs['rpm_bq_ds_name'],
        load_bq_ds_op.outputs['rpm_table_id']
        )

Create an external shared volume available to all the Kubeflow Pipelines components example:

    #create a volume where the dataset will be temporarily stored.
    pvc_op = VolumeOp(
        name=all_vars['RPM_PVC_NAME'],
        resource_name=all_vars['RPM_PVC_NAME'],
        size="20Gi",
        modes=dsl.VOLUME_MODE_RWO
    )
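
Downstream components can then mount this volume. A minimal sketch using add_pvolumes, with the argument wiring repeated from the earlier snippet purely for illustration and an arbitrary mount path:

    # Attach the shared volume to a component at an illustrative mount path.
    create_bq_ml_op = create_kfp_comp(create_bq_ml)(
        load_bq_ds_op.outputs['rpm_context'],
        all_vars['RPM_GCP_PROJECT'],
        all_vars['RPM_MODEL_NAME'],
        all_vars['RPM_DEFAULT_MODEL_NAME'],
        create_bq_ds_op.outputs['rpm_bq_ds_name'],
        load_bq_ds_op.outputs['rpm_table_id']
    ).add_pvolumes({'/mnt/rpm': pvc_op.volume})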

Use built-in ops example (VolumeOp is one of KFP's built-in operators):

    #create a volume where the dataset will be temporarily stored.
    pvc_op = VolumeOp(
        name=all_vars['RPM_PVC_NAME'],
        resource_name=all_vars['RPM_PVC_NAME'],
        size="20Gi",
        modes=dsl.VOLUME_MODE_RWO
    )

Build a lightweight Kubeflow Pipelines component example:

# converting functions to container operations
import kfp.components as comp
def create_kfp_comp(rpm_comp):
    return comp.func_to_container_op(
        func=rpm_comp, 
        base_image="google/cloud-sdk:latest")
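
The factory returned by func_to_container_op is what is being called in the earlier create_bq_ml_op snippet; the base_image only needs to provide the dependencies that the wrapped function imports at runtime.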

Set a Kubeflow Pipelines component not to cache example:

    get_versioned_bqml_model_export_path_op.execution_options.caching_strategy.max_cache_staleness = "P0D"
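
Setting max_cache_staleness to "P0D" (an ISO 8601 duration of zero days) tells KFP never to reuse a cached result for this component, so it re-runs on every pipeline execution.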

Questions? Feedback?

If you have any questions or feedback, please open a new issue.