
[READY] Bring deployment docs up-to-date and add new pages for additional targets #2557

Merged: 21 commits, May 11, 2023
48 changes: 29 additions & 19 deletions docs/source/deployment/airflow_astronomer.md
@@ -1,22 +1,40 @@
# How to deploy your Kedro pipeline on Apache Airflow with Astronomer
# Apache Airflow

This tutorial explains how to deploy a Kedro project on [Apache Airflow](https://airflow.apache.org/) with [Astronomer](https://www.astronomer.io/). Apache Airflow is an extremely popular open-source workflow management platform. Workflows in Airflow are modelled and organised as [DAGs](https://en.wikipedia.org/wiki/Directed_acyclic_graph), making it a suitable engine to orchestrate and execute a pipeline authored with Kedro. [Astronomer](https://docs.astronomer.io/astro/install-cli) is a managed Airflow platform which allows users to spin up and run an Airflow cluster easily in production. Additionally, it also provides a set of tools to help users get started with Airflow locally in the easiest way possible.
Apache Airflow is a popular open-source workflow management platform. It is a suitable engine to orchestrate and execute a pipeline authored with Kedro because workflows in Airflow are modelled and organised as [DAGs](https://en.wikipedia.org/wiki/Directed_acyclic_graph).

The following discusses how to run the [example Iris classification pipeline](../get_started/new_project.md#create-a-new-project-containing-example-code) on a local Airflow cluster with Astronomer.
## How to run a Kedro pipeline on Apache Airflow using a Kubernetes cluster

## Strategy
The `kedro-airflow-k8s` plugin from GetInData | Part of Xebia enables you to run a Kedro pipeline on Airflow with a Kubernetes cluster. The plugin can be used together with `kedro-docker` to prepare a Docker image for pipeline execution.
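A minimal sketch of how the two plugins might be combined, assuming you install both from PyPI into the project's virtual environment (the commands the plugin adds for generating and scheduling the DAG are described in its own documentation):

```shell
# Install both plugins alongside the Kedro project
pip install kedro-airflow-k8s kedro-docker

# Build a Docker image of the pipeline with kedro-docker; the kedro-airflow-k8s
# documentation covers how to generate and upload the Airflow DAG from here
kedro docker build
```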

Consult the [GitHub repository for `kedro-airflow-k8s`](https://github.com/getindata/kedro-airflow-k8s) for further details, or take a look at the [documentation](https://kedro-airflow-k8s.readthedocs.io/).


## How to run a Kedro pipeline on Apache Airflow with Astronomer

The following tutorial uses a different approach and shows how to deploy a Kedro project on [Apache Airflow](https://airflow.apache.org/) with [Astronomer](https://www.astronomer.io/).

[Astronomer](https://docs.astronomer.io/astro/install-cli) is a managed Airflow platform which allows users to spin up and run an Airflow cluster easily in production. It also provides a set of tools to help users get started with Airflow locally.

The tutorial discusses how to run the [example Iris classification pipeline](../get_started/new_project.md#create-a-new-project-containing-example-code) on a local Airflow cluster with Astronomer. You may also consider using our [`astro-airflow-iris` starter](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris) which provides a template containing the boilerplate code that the tutorial describes:

```shell
kedro new --starter=astro-airflow-iris
```


### Strategy

The general strategy to deploy a Kedro pipeline on Apache Airflow is to run every Kedro node as an [Airflow task](https://airflow.apache.org/docs/apache-airflow/stable/concepts/tasks.html) while the whole pipeline is converted into a [DAG](https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html) for orchestration purposes. This approach mirrors the principles of [running Kedro in a distributed environment](distributed.md).
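To make the node-to-task mapping concrete: each Airflow task produced by this approach is responsible for exactly one Kedro node, conceptually equivalent to the single-node run sketched below (the node name `split_data` is hypothetical, and depending on your Kedro version the CLI flag is `--node` or `--nodes`):

```shell
# Conceptual unit of work for one Airflow task: run a single Kedro node.
# The DAG generated later by `kedro airflow create` wraps the same unit of
# work in Python rather than shelling out, but the granularity is identical.
kedro run --node=split_data
```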

## Prerequisites
### Prerequisites

To follow this tutorial, ensure you have the following:

* An Airflow cluster: you can follow [Astronomer's quickstart guide](https://docs.astronomer.io/astro/category/install-astro) to set one up.
* The [Astro CLI installed](https://docs.astronomer.io/astro/install-cli)
* `kedro>=0.17` installed

## Project Setup
### Tutorial project setup

1. [Initialise an Airflow project with Astro](https://docs.astronomer.io/astro/create-project). Let's call it `kedro-airflow-iris`

@@ -68,9 +86,9 @@ To follow this tutorial, ensure you have the following:

5. Run `pip install -r src/requirements.txt` to install all dependencies.

## Deployment process
### Deployment process

### Step 1. Create new configuration environment to prepare a compatible `DataCatalog`
#### Step 1. Create new configuration environment to prepare a compatible `DataCatalog`

* Create a `conf/airflow` directory in your Kedro project
* Create a `catalog.yml` file in this directory with the following content
@@ -101,7 +119,7 @@ example_predictions:

This ensures that all datasets are persisted so all Airflow tasks can read them without the need to share memory. In the example here we assume that all Airflow tasks share one disk, but for a distributed environment you would need to use non-local filepaths.
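For illustration only, a persisted entry in `conf/airflow/catalog.yml` could look like the following sketch; the dataset name comes from the Iris example, while the type and filepath are assumptions to adapt to your project:

```shell
# Append an illustrative persisted dataset entry to the airflow environment's
# catalog (dataset name, type and filepath are examples, not prescriptions)
mkdir -p conf/airflow
cat >> conf/airflow/catalog.yml <<'EOF'
example_train_x:
  type: pandas.CSVDataSet
  filepath: data/05_model_input/example_train_x.csv
EOF
```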

### Step 2. Package the Kedro pipeline as an Astronomer-compliant Docker image
#### Step 2. Package the Kedro pipeline as an Astronomer-compliant Docker image

* **Step 2.1**: Package the Kedro pipeline as a Python package so you can install it into the container later on:

@@ -125,13 +143,13 @@ FROM quay.io/astronomer/ap-airflow:2.0.0-buster-onbuild
RUN pip install --user dist/new_kedro_project-0.1-py3-none-any.whl
```

### Step 3. Convert the Kedro pipeline into an Airflow DAG with `kedro airflow`
#### Step 3. Convert the Kedro pipeline into an Airflow DAG with `kedro airflow`

```shell
kedro airflow create --target-dir=dags/ --env=airflow
```
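To verify the conversion worked, you can check that a DAG file was written to the target directory; the file name below is hypothetical, since `kedro airflow` derives it from your package name:

```shell
# List the generated DAG file (the name shown is an assumption based on the
# package name used earlier in this tutorial)
ls dags/
# new_kedro_project_dag.py
```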

### Step 4. Launch the local Airflow cluster with Astronomer
#### Step 4. Launch the local Airflow cluster with Astronomer

```shell
astro dev start
@@ -142,11 +160,3 @@ If you visit the Airflow UI, you should now see the Kedro pipeline as an Airflow
![](../meta/images/kedro_airflow_dag.png)

![](../meta/images/kedro_airflow_dag_run.png)

## Final thought

This tutorial walks you through the manual process of deploying an existing Kedro project on Apache Airflow with Astronomer. However, if you are starting out, consider using our `astro-airflow-iris` starter which provides all the aforementioned boilerplate out of the box:

```shell
kedro new --starter=astro-airflow-iris
```
9 changes: 9 additions & 0 deletions docs/source/deployment/amazon_sagemaker.md
@@ -0,0 +1,9 @@
# Amazon SageMaker

Amazon SageMaker provides the components used for machine learning in a single toolset that supports both classical machine learning libraries like [`scikit-learn`](https://scikit-learn.org/) or [`XGBoost`](https://xgboost.readthedocs.io/), and deep learning frameworks such as [`TensorFlow`](https://www.tensorflow.org/) or [`PyTorch`](https://pytorch.org/).

Amazon SageMaker is a fully-managed service and its features are covered by the [official service documentation](https://docs.aws.amazon.com/sagemaker/index.html).

## The `kedro-sagemaker` plugin

The `kedro-sagemaker` plugin from GetInData | Part of Xebia enables you to run a Kedro pipeline on Amazon SageMaker. Consult the [GitHub repository for `kedro-sagemaker`](https://github.com/getindata/kedro-sagemaker) for further details, or take a look at the [documentation](https://kedro-sagemaker.readthedocs.io/).
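As a minimal sketch, assuming the plugin is installed from PyPI into the same environment as your Kedro project (the commands it adds to the Kedro CLI and the required AWS configuration are covered in the plugin documentation):

```shell
# Install the plugin; AWS credentials, roles and SageMaker execution settings
# are configured as described in the plugin documentation
pip install kedro-sagemaker
```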
10 changes: 8 additions & 2 deletions docs/source/deployment/argo.md
@@ -1,6 +1,11 @@
# Deployment with Argo Workflows
# Argo Workflows (deprecated)

```{important}
This page contains legacy documentation that has not been tested against recent Kedro releases.
```

<div style="color:gray">This page explains how to convert your Kedro pipeline to use [Argo Workflows](https://github.com/argoproj/argo-workflows), an open-source container-native workflow engine for orchestrating parallel jobs on [Kubernetes](https://kubernetes.io/).

This page explains how to convert your Kedro pipeline to use [Argo Workflows](https://github.com/argoproj/argo-workflows), an open-source container-native workflow engine for orchestrating parallel jobs on [Kubernetes](https://kubernetes.io/).

## Why would you use Argo Workflows?

@@ -240,3 +245,4 @@ As an alternative, you can use [Kedro-Argo plugin](https://pypi.org/project/kedr
```{warning}
The plugin is not supported by the Kedro team and we cannot guarantee that it works.
```
</div>
8 changes: 7 additions & 1 deletion docs/source/deployment/aws_batch.md
@@ -1,4 +1,9 @@
# Deployment with AWS Batch
# AWS Batch (deprecated)

```{important}
This page contains legacy documentation that has not been tested against recent Kedro releases.
```
<div style="color:gray">

## Why would you use AWS Batch?
[AWS Batch](https://aws.amazon.com/batch/) is optimised for batch computing and applications that scale with the number of jobs running in parallel. It manages job execution and compute resources, dynamically provisioning the optimal quantity and type of resources. AWS Batch can assist with planning, scheduling, and executing your batch computing workloads, using [Amazon EC2](https://aws.amazon.com/ec2/) On-Demand and [Spot Instances](https://aws.amazon.com/ec2/spot/), and it has native integration with [CloudWatch](https://aws.amazon.com/cloudwatch/) for log collection.
@@ -354,3 +359,4 @@ kedro run --env=aws_batch --runner=kedro_tutorial.runner.AWSBatchRunner
You should start seeing jobs appearing on your Jobs dashboard, under the `Runnable` tab, meaning they are ready to start as soon as resources are provisioned in the compute environment.

AWS Batch has native integration with CloudWatch, where you can check the logs for a particular job. You can either click on [the Batch job in the Jobs tab](https://console.aws.amazon.com/batch/home/jobs) and click `View logs` in the pop-up panel, or go to the [CloudWatch dashboard](https://console.aws.amazon.com/cloudwatch), click `Log groups` in the sidebar and find `/aws/batch/job`.
</div>