Enabling GPU and TPU

Enable GPU and TPU for Kubeflow Pipelines on Google Kubernetes Engine (GKE)

This page describes how to enable GPU or TPU for a pipeline on Google Kubernetes Engine by using the Pipelines DSL language.

Prerequisites

To enable GPU and TPU on your Kubeflow cluster, follow the instructions on how to customize the Google Kubernetes Engine cluster for Kubeflow before setting up the cluster.

Configure ContainerOp to consume GPUs

After enabling the GPU, the Kubeflow setup script installs a default GPU pool with type nvidia-tesla-k80 with auto-scaling enabled. The following code consumes 2 GPUs in a ContainerOp.

import kfp.dsl as dsl
gpu_op = dsl.ContainerOp(name='gpu-op', ...).set_gpu_limit(2)

The code above will be compiled into Kubernetes Pod spec:

container:
  ...
  resources:
    limits:
      nvidia.com/gpu: "2"

If the cluster has multiple node pools with different GPU types, you can specify the GPU type by the following code.

import kfp.dsl as dsl
gpu_op = dsl.ContainerOp(name='gpu-op', ...).set_gpu_limit(2)
gpu_op.add_node_selector_constraint('cloud.google.com/gke-accelerator', 'nvidia-tesla-p4')

The code above will be compiled into Kubernetes Pod spec:

container:
  ...
  resources:
    limits:
      nvidia.com/gpu: "2"
nodeSelector:
  cloud.google.com/gke-accelerator: nvidia-tesla-p4

See GPU tutorial for a complete example to build a Kubeflow pipeline that uses GPUs.

Check the Google Kubernetes Engine GPU guide to learn more about GPU settings.

Configure ContainerOp to consume TPUs

Use the following code to configure ContainerOp to consume TPUs on Google Kubernetes Engine:

import kfp.dsl as dsl
import kfp.gcp as gcp
tpu_op = dsl.ContainerOp(name='tpu-op', ...).apply(gcp.use_tpu(
  tpu_cores = 8, tpu_resource = 'v2', tf_version = '1.12'))

The above code uses 8 v2 TPUs with TF version to be 1.12. The code above will be compiled into Kubernetes Pod spec:

container:
  ...
  resources:
    limits:
      cloud-tpus.google.com/v2: "8"
  metadata:
    annotations:
      tf-version.cloud-tpus.google.com: "1.12"

To learn more, see an example pipeline that uses a preemptible node pool with TPU or GPU..

See the Google Kubernetes Engine TPU Guide to learn more about TPU settings.

Feedback

Was this page helpful?


Last modified January 3, 2023: Fix hyper-links (8c0f3b4)