ML Pipeline Templates: End-to-End Tutorial
ML Pipeline Templates provide step-by-step guidance on implementing typical machine learning scenarios. Each template introduces a machine learning project structure that lets you modularize data processing, model definition, model training, validation, and inference tasks. Using distinct steps makes it possible to rerun only the steps you need as you tweak and test your workflow. A well-defined, standard project structure helps all team members understand how a model was created.
ML pipeline templates are based on popular open source frameworks such as Kubeflow, Keras, and Seldon to implement end-to-end ML pipelines that can run on AWS, on-prem hardware, and at the edge. Kubeflow is an open source, cloud-native solution for ML.
The following Kubeflow extensions are introduced to address common challenges:
- get_secret for managing Kubernetes secrets from Jupyter notebooks
- S3 filesystem for Kubeflow
- Kaniko KFP component for building Docker images
- Template magic for Notebooks
- Extensions for Notebooks: debugging, variable explorer
- Environment configuration for Notebooks based on configmaps and secrets
- Continuous deployment for Kubeflow Pipelines
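The get_secret helper above is specific to these templates, but the underlying idea is simple: the Kubernetes API returns secret values base64-encoded, and a notebook needs them decoded. A minimal sketch of that decoding step, with a hypothetical secret key and value, might look like this:

```python
import base64

def decode_secret_data(data):
    """Decode the base64-encoded values of a Kubernetes secret's data field."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}

# The Kubernetes API returns secret values base64-encoded; a hypothetical example:
raw = {"AWS_ACCESS_KEY_ID": base64.b64encode(b"example-key-id").decode("ascii")}
print(decode_secret_data(raw))  # {'AWS_ACCESS_KEY_ID': 'example-key-id'}
```

In a real notebook the `raw` dictionary would come from the Kubernetes API (for example via the official `kubernetes` Python client) rather than being constructed by hand.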
Machine learning is increasingly used in real-world systems, such as autonomous vehicles, voice recognition, language translation, and many others. However, to generate a positive ROI, a machine learning model needs to be operationalized (deployed into production). Since machine learning is still a new field, early adopters have run into obstacles deploying machine learning into production due to friction with engineering and IT teams. In some cases, work done by data scientists and machine learning engineers is wasted because it never escapes the lab due to technical constraints, or cannot be scaled to larger data sets.
To address these problems, Machine Learning can take a page from a DevOps playbook. In order to deliver business value and to build intelligent software products, data science teams need to focus on the following priorities:
- Start by asking the right questions: define clear requirements for what machine learning models should answer, predict, or estimate.
- Run several experiments, then train and evaluate the model in the lab environment. Ingesting, cleaning, and labeling training data sets are important steps in this process, because data scientists need as much quality data as possible to build and train their ML models. The more high-quality data they get, the more accurate the model predictions become.
- As soon as possible, deploy the model to a test or production environment in order to expose it to real-life data.
- Based on user feedback and learnings from real-life data, iterate frequently to build even better and more intelligent products.
Machine learning pipelines play an important role in building production-ready AI/ML systems. Using ML pipelines, data scientists, data engineers, and IT operations can collaborate on the steps involved in data preparation, model training, model validation, model deployment, and model testing. Machine learning pipelines address two main problems of traditional model development: the long cycle time between training models and deploying them to production, which often includes manually converting the model to production-ready code, and the use of production models that were trained on stale data. Many common steps in ML pipelines should be automated and tracked. For example, model validation steps must be tracked to help data scientists evaluate model accuracy and pick the optimal hyperparameters. Agile Stacks provides reusable machine learning pipelines that can be used as a template for your machine learning scenarios.
The following diagram shows a typical ML Pipeline. It provides a platform for running machine learning experiments: making it easy for you to try numerous ideas and techniques and manage your various trials/experiments. The arrows indicate that machine learning projects are highly iterative. As you progress through pipeline steps, you will find yourself iterating on a step until reaching desired model accuracy, then proceeding to the next step. Here is an excellent blog by Jeremy Jordan that discusses machine learning workflow in more detail. The workflow is not complete even after you deploy the model into production: you get feedback from real-world interactions and repeat data preparation and training steps.
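The idea of distinct, rerunnable pipeline steps can be sketched in plain Python: each step is a function that consumes the previous step's output, so any single step can be rerun in isolation while earlier results are reused. The step names and data here are illustrative, not the actual template code.

```python
def prepare_data(raw):
    # Step 1: clean raw records and keep only usable (body, title) pairs.
    return [(r["body"].strip(), r["title"].strip()) for r in raw if r.get("title")]

def train(pairs):
    # Step 2: placeholder for model training; returns a stand-in "model".
    return {"num_samples": len(pairs)}

def validate(model):
    # Step 3: placeholder validation step reporting a metric for tracking.
    return {"num_samples": model["num_samples"], "accuracy": None}

raw_issues = [{"body": " crash on start ", "title": "Startup crash"},
              {"body": "no title", "title": ""}]
pairs = prepare_data(raw_issues)   # rerun only this step when cleaning logic changes
model = train(pairs)               # rerun only this step when tuning the model
report = validate(model)
print(report)  # {'num_samples': 1, 'accuracy': None}
```

In a real Kubeflow pipeline each of these functions would run in its own container, with intermediate data passed through object storage.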
In this tutorial we will cover how to leverage Kubeflow Pipeline templates to get your ML experiments from the lab into the real world as quickly as possible. Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Kubernetes. We use Kubernetes for automating deployment, scaling, and management of containerized applications. Kubernetes has evolved into the de facto industry standard for container management and for machine learning at scale. There are many reasons why Kubernetes provides the best platform for machine learning at scale: repeatable experiments, portable and reproducible environments, efficient utilization of CPUs and GPUs, tracking and monitoring of metrics in production, and proven scalability.
Based on the Agile Stacks Kubeflow Pipeline template, we will implement a machine learning pipeline for training, monitoring, and deployment of deep learning models. We will use popular open source frameworks such as Kubeflow, Keras, and Seldon to implement end-to-end ML pipelines. The Kubeflow project is designed to simplify the deployment of machine learning frameworks like Keras and TensorFlow on Kubernetes. These frameworks can leverage multiple GPUs in the Kubernetes cluster for machine learning tasks. We will also cover how you can use Kubeflow pipelines to continuously deploy models to production and retrain models on real-life data.
- Learn how to build pipelines for training, monitoring and deployment of deep learning models.
- Prepare and store training data in S3 buckets or NFS volumes.
- Build, train, and deploy models from Jupyter notebooks.
- Train a sequence-to-sequence NLP model using multiple GPUs.
- Deploy your machine learning models and experiments at scale on the AWS Kubernetes service.
- Reduce cost with automation, node autoscaling, and AWS spot instances.
- Deploy machine learning models on AWS, GCP, or on-prem hardware.
- Compare results of experiments with monitoring and experiment tracking tools.
- Automate experiment tracking, hyperparameter optimization, model versioning, and deployment to production.
- Implement simple web applications and send prediction requests to the deployed models using Seldon.
- Sample pipelines, algorithms, and training data sets will be provided for solving common problems such as data cleansing and training on a large number of samples.
Step 1: Define the Problem
Text summarization is the problem of creating a short and accurate summary of a longer text document. We will use a sequence-to-sequence neural network to summarize text found in GitHub issues and make predictions: extract meaning from the issue text and generate an issue title based on it. This type of model can also be used for text translation or for free-form question answering (generating a natural language answer given a natural language question); it is applicable any time you need to generate text. To train the model, we will gather many training samples (issue title, issue body), with the goal of teaching our model to summarize issues and generate titles. The idea is that by seeing many examples of issue descriptions and titles, a model can learn how to summarize new issues. We are going to implement a complete ML pipeline based on the GitHub issue summarization model by Hamel Husain. To test the model on real-life data, we will implement a web application that requests inferences from the deployed model.
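The raw material for this model is simply pairs of issue bodies and titles. As a rough sketch of the kind of preprocessing involved (the truncation lengths and field names here are illustrative assumptions, not the template's actual code), turning raw issues into training pairs might look like:

```python
def build_training_pairs(issues, max_body_words=70, max_title_words=12):
    """Turn raw GitHub issues into (body, title) pairs truncated to fixed lengths.

    The truncation lengths are illustrative; the real preprocessing in the
    template may differ.
    """
    pairs = []
    for issue in issues:
        body = issue.get("body", "").split()[:max_body_words]
        title = issue.get("title", "").split()[:max_title_words]
        if body and title:  # skip issues missing either field
            pairs.append((" ".join(body), " ".join(title)))
    return pairs

issues = [{"body": "App crashes when clicking save button twice",
           "title": "Crash on double save"}]
print(build_training_pairs(issues))
# [('App crashes when clicking save button twice', 'Crash on double save')]
```

Each pair then serves as one supervised training example: the body is the model input and the title is the target summary.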
Step 2: Create ML Pipeline from Template
You can use the Agile Stacks Machine Learning Stack to create ML pipelines from several reusable templates. When building pipelines with Agile Stacks ML Pipeline Templates, you can focus on machine learning, rather than on infrastructure.
If you are working on this tutorial during a workshop, please follow the instructions provided at the start of the workshop to log in to the development environments. If you are going through this tutorial outside of a workshop, you need to provision a development environment in your own AWS or GCP cloud account. You can register for an Agile Stacks trial account or subscribe via AWS Marketplace.
Once you receive an activation email, log in to the Agile Stacks Console and create a cloud account by clicking Cloud > Cloud Accounts > Create. Then create an environment: Cloud > Environments > Create. For example, name your environment DEV and associate it with the cloud account you just added. Now we can create a stack template: Templates > Create. Give your stack template a unique name, and select the following stack options from the catalog: Kubernetes, Dashboard, Harbor, Minio, Prometheus, ACM (or Let's Encrypt), Kubeflow.
Click Build Now: this saves the stack template in Git as a set of Terraform and CloudFormation scripts and deploys it into your cloud account. On the next page, name your Kubernetes cluster, for example dev.demo05.superhub.io. For this tutorial, you can use the default settings for the count and types of Kubernetes nodes. Notice that if the "On Demand" check box is left unchecked, Kubernetes nodes are deployed on AWS spot instances, which offer a cost-effective way to run Kubernetes. Later in this workshop, you can add a few GPU nodes to this cluster in order to train the model on a large number of training samples. Refer to the following screenshot for recommended settings. Now wait 10-15 minutes for the Kubernetes cluster to be deployed.
This tutorial uses billable components of AWS or GCP, in the range of $1 to $2 per hour. To minimize costs, follow the cleanup instructions when you've finished with the tutorial.
Note that in order to provide sufficient CPU and RAM for Jupyter notebooks, you need to select at least one worker node of size r4.xlarge or larger. Later you can add a GPU node to train the model faster.
After your Kubernetes cluster is deployed, create a new ML Pipeline from a template by clicking Stacks > Applications > Install. Select the "Machine Learning" template, and then Kubeflow (Keras and Seldon). Compose your machine learning application: select the previously created environment (DEV) and use the following options:
Kubeflow engine: Kubeflow
Artifact object storage: Minio
Source code repository: Github-token-dev
Docker registry: Harbor
Application Name: "kfp1"
Bucket Name: "kfp1"
Source code repository name: "kfp1"
Docker registry: "kfp1"
Click the Install button. After the application is created, you will receive three URLs:
Source code: Git repository URL where the ML pipeline source code is stored
Notebook: URL to the pipeline editor
Bucket: URL to the object storage bucket where all training data and model files are stored
Click on Notebook button to open the Jupyter notebook.
Step 3: Define Pipeline Parameters
You can click the Notebook button to open Kubeflow Pipeline Workbench. Alternatively, navigate to your stack instance by clicking Stacks > Instances > List. Select your stack instance and click Kubeflow button.
Click Activity (top left icon), and then click Notebooks. You should see a list of Jupyter notebooks for each pipeline template that was created. Select your Notebook server name and click Connect.
A Jupyter notebook integrates code and its output into a single document that combines visualizations, narrative text, mathematical equations, and other rich media. This intuitive workflow promotes iterative and rapid development, making notebooks a preferred tool of many data scientists. You can review all the steps of the machine learning pipeline by browsing the Python files in the workspace > src folder.
Step 4: Examine project structure
A well-organized machine learning codebase should modularize data processing, model definition, model training, validation, and inference tasks. Using distinct steps makes it possible to rerun only the steps you need as you tweak and test your workflow. A step is a computational unit in the pipeline. A well-defined, standard project structure helps all team members understand how a model was created.
Data sources and intermediate data are reused across the pipeline, which saves compute time and resources.
├── training-latest                    <- Data volume (S3 bucket) where training data and the model are stored
├── workspace
│   ├── components                     <- Project components used for pipeline steps such as training
│   ├── nbextentions                   <- Notebook extension libraries that implement common tasks
│   ├── 101-training.ipynb             <- Wrapper notebook for initial experimentation and fine tuning the model
│   ├── 201-distributed-training.ipynb <- Wrapper notebook for training at scale using containers and a Kubeflow pipeline
│   └── 201-data-pipeline.ipynb        <- Example notebook implementing pipeline steps in Go
└── README.md                          <- The top-level README for developers using this project.
Step 5: Build Keras model
Next, open the pipeline definition notebook: "workspace > 101-training.ipynb"
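The model itself is defined in that notebook. As a rough sketch of what a sequence-to-sequence summarization model looks like in Keras (the vocabulary size and layer dimensions below are arbitrary placeholders, not the notebook's actual values), an encoder-decoder pair can be wired up like this:

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 8000   # assumption: vocabulary size after text preprocessing
LATENT_DIM = 128    # assumption: embedding and hidden state size

# Encoder: embeds the issue body tokens and compresses them into a state vector.
encoder_inputs = keras.Input(shape=(None,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, LATENT_DIM, mask_zero=True)(encoder_inputs)
_, state_h, state_c = layers.LSTM(LATENT_DIM, return_state=True)(x)

# Decoder: generates the issue title token by token, conditioned on the encoder state.
decoder_inputs = keras.Input(shape=(None,), dtype="int32")
y = layers.Embedding(VOCAB_SIZE, LATENT_DIM, mask_zero=True)(decoder_inputs)
y = layers.LSTM(LATENT_DIM, return_sequences=True)(y, initial_state=[state_h, state_c])
decoder_outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(y)

model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

During training the decoder receives the true title shifted by one token (teacher forcing); at inference time it generates the title one token at a time from its own previous output.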
Step 6: Run the Training Pipeline
Next, open the pipeline definition notebook: "workspace > 201-distributed-training.ipynb"
Step 7: Deploy model serving application
After you train your model, you can deploy it to get predictions on test data or on real-life data that the model has never seen before. We are going to package the model as a Docker image and deploy it as a REST API. In addition, we are going to create a simple web application to send prediction requests to the model.
Seldon is a great framework for deploying and managing models on Kubernetes. Seldon makes models available as REST APIs or gRPC endpoints and helps deploy multiple versions of the same model for A/B testing or multi-armed bandits. Seldon takes care of scaling the model and keeps all your models running behind a standard API.
Open the file "serving.py" to take a look at the code that uses the trained model files to generate a prediction. We are going to package this code in a Docker container and deploy it as a Seldon API. In the notebook we first create a Dockerfile, and then deploy the model to the Kubernetes cluster by running a kubectl command from the Jupyter notebook. Note that "templates/seldon.yaml" provides a template for the Kubernetes deployment file. All environment-specific details are injected automatically via parameters: model name, model version, path to model files, and Seldon authentication secrets. The Keras model is referenced from the Seldon container via the parameter MODEL_FILE with the value "/mnt/s3/latest/training/training1.h5". You can examine the seldon.yaml file, but you don't need to change any parameters. The following screen shows the output from model deployment step 3.2. You can test the model by sending it some test data for predictions using the seldon.prediction API as shown in step 3.3.
It takes a few minutes to deploy the model via Seldon; if you run validation too soon, you may get an error. Just wait 2-3 minutes and try sending a test request to Seldon again. Step 3.3 provides a code sample you can use to test a Keras model deployed via Seldon. You can edit the value of test_payload to send several test payloads to your model.
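Seldon's REST protocol wraps model inputs in a data/ndarray envelope. A minimal sketch of building such a request payload (the token IDs here are placeholder values, and the endpoint URL in the comment is hypothetical) might look like:

```python
import json

def build_seldon_payload(token_ids):
    """Wrap one encoded input sequence in Seldon's REST request envelope."""
    return {"data": {"ndarray": [token_ids]}}

payload = build_seldon_payload([12, 7, 93, 4])
print(json.dumps(payload))  # {"data": {"ndarray": [[12, 7, 93, 4]]}}

# A request would then be sent with something like:
#   requests.post(seldon_url, json=payload, headers=auth_headers)
# where seldon_url and auth_headers come from your deployment.
```

The response comes back in the same envelope, with the model's output under the data key.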
Step 8: Deploy Web Application
Now that the model is deployed as a REST API, you will build a simple Python Flask application that provides a web UI for end users to interact with the model. The test application pulls a random issue from GitHub when a user clicks the "Populate Random Issue" button. When a user clicks the "Generate Title" button, the application sends the issue text to the model via the REST API to generate the issue summary. In step 4.1 you will build an application container. The source code for the web application is available in a Python file which you can view or edit from the Jupyter notebook: workspace / components / flask / app / src / app.py.
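The shape of such a Flask application can be sketched as follows. This is not the template's app.py: the /summarize route and the predict_title stub are hypothetical, and in the real application predict_title would POST the issue text to the Seldon endpoint instead of truncating locally.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_title(issue_body):
    # Placeholder for the call to the deployed model's REST endpoint.
    # Here we just take the first sentence as a stand-in "summary".
    return issue_body.split(".")[0][:60]

@app.route("/summarize", methods=["POST"])
def summarize():
    # The UI sends the issue text; we return the generated title as JSON.
    issue_body = request.get_json().get("body", "")
    return jsonify({"title": predict_title(issue_body)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

The "Generate Title" button in the UI would POST the issue body to this route and display the returned title.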
Once the application is deployed you can access it using the following URL:
Step 9: Wrapping Up
We have completed an end-to-end ML pipeline that supports production ML lifecycle management.
For more information about Kubeflow, visit https://www.kubeflow.org.
The code for this tutorial is on GitHub (Kubeflow Extensions), and it’s available for one-click deployment as ML pipeline template “Kubeflow Pipeline” on Agile Stacks. While you can run this tutorial on any Kubernetes cluster, the easiest way to create a lab environment is with Agile Stacks. You can register for a free trial account using the following link:
In this tutorial we implemented a complete machine learning pipeline from data preparation to model training and serving. Feel free to experiment with it and adapt it to your needs. The objective is to make your machine learning projects more agile, your iterations faster, and your models better.