This guide describes how to deploy Kubeflow and a series of Kubeflow components on Google Kubernetes Engine (GKE). If you want to use only Kubeflow Pipelines, refer to Installation Options for Kubeflow Pipelines to choose an installation option.
At a high level, you need to create one Management cluster, which lets you manage Google Cloud resources via Config Connector. The Management cluster can create, manage, and delete multiple Kubeflow clusters while remaining independent of the Kubeflow clusters' activities. Below is a simplified view of the deployment structure. Note that the Management cluster can live in a different Google Cloud project from the Kubeflow clusters; an admin should assign owner permission to the Management cluster’s service account. This is explained in detail in the deployment steps.
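The relationship described above can be sketched as a small Python model. This is only an illustration, not part of the Kubeflow tooling: the class names, project IDs, and service account email are hypothetical, and it assumes the owner grant is made on each project that hosts a Kubeflow cluster. The `gcloud projects add-iam-policy-binding` command it prints is one common way to grant a role; the deployment steps may do this differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KubeflowCluster:
    name: str
    project: str  # Google Cloud project that hosts this Kubeflow cluster


@dataclass
class ManagementCluster:
    name: str
    project: str  # may differ from the Kubeflow clusters' projects
    service_account: str  # hypothetical service account email
    kubeflow_clusters: list = field(default_factory=list)

    def projects_needing_owner_grant(self):
        """Distinct projects hosting Kubeflow clusters managed by this cluster.

        Assumption: an admin grants owner permission on each of these
        projects to the Management cluster's service account.
        """
        return sorted({c.project for c in self.kubeflow_clusters})


def owner_binding_command(project, service_account):
    """Format a gcloud command that grants roles/owner on a project."""
    return (
        f"gcloud projects add-iam-policy-binding {project} "
        f"--member=serviceAccount:{service_account} --role=roles/owner"
    )


# One Management cluster managing three Kubeflow clusters in two projects.
mgmt = ManagementCluster(
    name="mgmt",
    project="mgmt-project",
    service_account="mgmt-sa@mgmt-project.iam.gserviceaccount.com",
    kubeflow_clusters=[
        KubeflowCluster("kf-dev", "ml-dev-project"),
        KubeflowCluster("kf-prod", "ml-prod-project"),
        KubeflowCluster("kf-test", "ml-dev-project"),
    ],
)

for project in mgmt.projects_needing_owner_grant():
    print(owner_binding_command(project, mgmt.service_account))
```

The model keeps the Management cluster separate from the clusters it manages, which mirrors the point that one Management cluster can serve several Kubeflow clusters across projects.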
Follow the steps below to set up a Kubeflow environment on Google Cloud. Some of these steps are one-time only; for example, an OAuth Client can be shared by multiple Kubeflow clusters in the same Google Cloud project.
- Set up Google Cloud project.
- Set up OAuth Client.
- Deploy Management Cluster.
- Deploy Kubeflow Cluster.
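To make the one-time versus per-cluster distinction concrete, here is a minimal Python sketch that builds an ordered checklist for several Kubeflow clusters. The function name and the example project and cluster names are hypothetical. It assumes that project setup and the OAuth Client are done once per Google Cloud project, that a single Management cluster is deployed once, and that only the final step repeats for each Kubeflow cluster.

```python
def deployment_plan(clusters):
    """Build an ordered checklist for (project, cluster_name) pairs.

    Assumptions: project setup and the OAuth Client are needed once per
    project, and one Management cluster serves all Kubeflow clusters.
    """
    plan = []
    seen_projects = []
    for project, _ in clusters:
        if project not in seen_projects:
            seen_projects.append(project)
            plan.append(f"Set up Google Cloud project: {project}")
            plan.append(f"Set up OAuth Client: {project}")
    plan.append("Deploy Management Cluster")
    for project, name in clusters:
        plan.append(f"Deploy Kubeflow Cluster: {name} ({project})")
    return plan


# Two Kubeflow clusters sharing one project and one OAuth Client.
for step in deployment_plan([("proj-a", "kf-1"), ("proj-a", "kf-2")]):
    print(step)
```

With two clusters in the same project, the checklist contains the project and OAuth Client steps once and the Kubeflow cluster step twice.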
If you encounter an issue during the deployment steps, refer to Troubleshooting deployments on GKE for common issues and debugging approaches. If the issue is new, file a bug in kubeflow/gcp-blueprints for GKE-related issues, or file a bug against the corresponding component in Kubeflow on GitHub if the issue is component-specific.
Once you finish deployment, you will be able to:
- manage a running Kubernetes cluster with multiple Kubeflow components installed.
- get a Cloud Endpoint that is accessible via IAP (Identity-Aware Proxy).
- enable the multi-user feature for resource and access isolation.
- take advantage of GKE’s Cluster Autoscaler to automatically resize the number of nodes in a node pool.
- choose GPUs and Cloud TPU to accelerate your workload.
- use Cloud Logging to help with debugging and troubleshooting.
- access many managed services offered by Google Cloud.
Next steps:
- Repeat Deploy Kubeflow Cluster if you want to deploy multiple clusters.
- Run a full ML workflow on Kubeflow, using the end-to-end MNIST tutorial.