Authenticating Pipelines to GCP

Authentication and authorization to Google Cloud Platform (GCP) in Pipelines

This page describes authentication for Kubeflow Pipelines to GCP.

Before you begin

Installation Options for Kubeflow Pipelines describes the available ways to install Pipelines. Authentication support and cluster setup instructions vary depending on which option you used to install Kubeflow Pipelines.

Securing the cluster with fine-grained GCP permission control

Workload Identity

Workload identity is the recommended way for your GKE applications to consume services provided by Google APIs. You accomplish this by configuring a Kubernetes service account to act as a Google service account. Any Pods running as the Kubernetes service account then use the Google service account to authenticate to cloud services.

The description above is taken from the Workload Identity documentation. Read that documentation for:

  • A detailed introduction to workload identity.
  • Instructions to enable it on your cluster.
  • Its limitations, and whether they affect your adoption.

Terminology

This document distinguishes between Kubernetes service accounts (KSAs) and Google service accounts (GSAs). KSAs are Kubernetes resources, while GSAs are specific to Google Cloud. Other documentation usually refers to both of them as just “service accounts”.

Authoring pipelines to use workload identity

Pipelines don’t need any changes to authenticate to GCP; they transparently use the GSA bound to the pipeline-runner KSA.

However, existing pipelines that use the KFP SDK operator use_gcp_secret must remove that usage in order to use the bound GSA. You can continue to use use_gcp_secret in a cluster with workload identity enabled, but pipeline steps that apply use_gcp_secret use the GSA corresponding to the provided secret instead of the bound GSA.

Cluster setup to use workload identity for Pipelines Standalone

After you enable workload identity, you need to bind workload identities for the KSAs used by Pipelines Standalone. The following helper bash scripts bind workload identities for the KSAs provided by Pipelines Standalone:

  • gcp-workload-identity-setup.sh helps you create GSAs and bind them to the KSAs used by pipeline workloads. This script provides an interactive command-line dialog with explanatory messages.
  • wi-utils.sh provides minimal utility bash functions that let you customize your setup. Because the utilities are minimal, they are easy to read and to use programmatically.
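
For orientation, the standard workload identity binding described in the GKE documentation comes down to two operations per KSA: granting the KSA the roles/iam.workloadIdentityUser role on the GSA, and annotating the KSA with the GSA's email address. The following is a minimal sketch of those two operations, not the contents of either helper script. The function names (run, wi_member, sketch_bind_ksa) and the DRY_RUN switch are hypothetical and introduced only for illustration. With DRY_RUN left at its default of 1, the sketch only prints the commands it would run.

```shell
# Hypothetical sketch, not the helper scripts' actual contents.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build the IAM member string that identifies a KSA under workload identity.
# Arguments: project ID, namespace, KSA name.
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Bind one KSA to one GSA.
# Arguments: project ID, namespace, KSA name, GSA email.
sketch_bind_ksa() {
  # Allow the KSA to act as the GSA.
  run gcloud iam service-accounts add-iam-policy-binding "$4" \
    --project "$1" \
    --role roles/iam.workloadIdentityUser \
    --member "$(wi_member "$1" "$2" "$3")"
  # Record on the KSA which GSA it should act as.
  run kubectl annotate serviceaccount "$3" \
    --namespace "$2" --overwrite \
    "iam.gke.io/gcp-service-account=$4"
}
```

Calling sketch_bind_ksa with a project ID, a namespace, the KSA name pipeline-runner, and a GSA email prints one gcloud command and one kubectl command for review. Setting DRY_RUN=0 executes them instead, which assumes gcloud and kubectl are already authenticated against the right project and cluster.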

Pipelines use the pipeline-runner KSA. You can configure the IAM permissions of the GSA bound to this KSA to allow pipelines to use GCP APIs.

The Pipelines UI uses the ml-pipeline-ui KSA. If you need to view visualizations stored in Google Cloud Storage (GCS) from the Pipelines UI, grant the Storage Object Viewer permission to the GSA bound to that KSA.
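
Granting a permission to a bound GSA can be sketched as a single IAM policy binding. The example below is a minimal sketch under stated assumptions: the function names (run, grant_role_to_gsa) and the DRY_RUN switch are hypothetical, and the binding is made at the project level, which is the broadest scope. A narrower, bucket-level grant may suit your setup better. The role ID for Storage Object Viewer is roles/storage.objectViewer.

```shell
# Hypothetical sketch; DRY_RUN=1 (the default) prints the command
# instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Grant a project-level IAM role to a GSA.
# Arguments: project ID, GSA email, role ID.
grant_role_to_gsa() {
  run gcloud projects add-iam-policy-binding "$1" \
    --member "serviceAccount:$2" \
    --role "$3"
}
```

For the Pipelines UI case, you would pass the email of the GSA bound to ml-pipeline-ui and the role roles/storage.objectViewer. The same function applies to the GSA bound to pipeline-runner, with whichever roles your pipelines need.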

Google service account keys stored as Kubernetes secrets

Authoring pipelines to use GSA keys

Each pipeline step describes a container that is run independently. To grant a single step access to one of your service accounts, use kfp.gcp.use_gcp_secret(). Examples of how to use this function are in the Kubeflow examples repo.

Cluster setup to use use_gcp_secret for Full Kubeflow

You don’t need to do anything. The full Kubeflow deployment has already created the user-gcp-sa secret for you.

Cluster setup to use use_gcp_secret for Pipelines Standalone and Hosted GCP ML Pipelines

Pipelines Standalone and Hosted GCP ML Pipelines require you to manually set up the user-gcp-sa secret used by use_gcp_secret.

Instructions to set up the secret:

  1. First, create and download a key for the Google service account (refer to GCP documentation for more information):

    gcloud iam service-accounts keys create application_default_credentials.json \
      --iam-account [SA-NAME]@[PROJECT-ID].iam.gserviceaccount.com
    
  2. Run:

    kubectl create secret -n [your-namespace] generic user-gcp-sa \
      --from-file=user-gcp-sa.json=application_default_credentials.json
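
After creating the secret, you may want to confirm that it exists in the expected namespace and contains the user-gcp-sa.json key. The following is a minimal sketch of such a check. The function names (run, check_user_gcp_sa_secret) and the DRY_RUN switch are hypothetical. The kubectl describe command lists the secret's keys and their sizes without printing the key material.

```shell
# Hypothetical sketch; DRY_RUN=1 (the default) prints the command
# instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Show the keys stored in the user-gcp-sa secret.
# Arguments: namespace.
check_user_gcp_sa_secret() {
  run kubectl describe secret user-gcp-sa --namespace "$1"
}
```

With DRY_RUN=0, the output should include a Data section that lists user-gcp-sa.json, assuming the secret was created with the command in step 2.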