Compile a Pipeline

Compile a pipeline definition to YAML

While pipelines and components are authored in Python, they can be compiled to an intermediate representation (IR) YAML file.

A YAML pipeline/component definition preserves a static representation of your pipeline/component. This YAML can be submitted to the KFP backend for execution or deserialized by the KFP SDK for integration into another pipeline.

Compiling

First, let’s define a very simple pipeline:

from kfp import compiler
from kfp import dsl

@dsl.component
def addition_component(num1: int, num2: int) -> int:
    return num1 + num2

@dsl.pipeline(name='addition-pipeline')
def my_pipeline(a: int, b: int, c: int = 10):
    add_task_1 = addition_component(num1=a, num2=b)
    add_task_2 = addition_component(num1=add_task_1.output, num2=c)

Now we can compile the pipeline to the file my_pipeline.yaml:

cmplr = compiler.Compiler()
cmplr.compile(my_pipeline, package_path='my_pipeline.yaml')

Just as a pipeline is a template for a multi-step workflow, a component is a template for a single-step workflow. We can also compile the component addition_component directly:

cmplr.compile(addition_component, package_path='addition_component.yaml')
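
As mentioned above, a compiled YAML file can also be deserialized by the KFP SDK and reused in another pipeline. The following is a minimal sketch, assuming addition_component.yaml was compiled as shown; loaded_component and reuse_pipeline are illustrative names:

from kfp import components, dsl

# Load the compiled component back into a Python object.
loaded_component = components.load_component_from_file('addition_component.yaml')

# Use the loaded component like any other component.
@dsl.pipeline(name='reuse-pipeline')
def reuse_pipeline(x: int, y: int):
    loaded_component(num1=x, num2=y)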

The Compiler.compile method accepts a few additional optional parameters, demonstrated together in the sketch after this list:

pipeline_name (string)

Sets the name of the pipeline (or component). This is written to IR as the pipelineInfo.name field and overrides the name passed to the @dsl.pipeline decorator.

The pipeline name, whether set through the decorator or the compiler, names your pipeline template. When you upload your pipeline, a pipeline context by this name will be created. The pipeline context enables the backend and the Dashboard to associate artifacts and executions created from runs of the same pipeline template. This allows you, for example, to compare metrics artifacts from multiple runs of the same training pipeline to find the best model.

pipeline_parameters (Dict[str, Any])

A map of parameter names to argument values. This amounts to providing default values for pipeline or component parameters. These defaults can be overridden at pipeline submission time.

type_check (bool)

Whether to enable static type checking during compilation. For more information about type checking, see Component I/O: Component interfaces and type checking.
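
To make these parameters concrete, here is a sketch that recompiles the addition pipeline from above with all three options set; the name override and the default values are illustrative, not required:

compiler.Compiler().compile(
    my_pipeline,
    package_path='my_pipeline.yaml',
    pipeline_name='addition-pipeline-v2',  # overrides the @dsl.pipeline name
    pipeline_parameters={'a': 1, 'b': 2},  # defaults, overridable at submission time
    type_check=True,                       # static type checking is on by default
)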

IR YAML

When you compile a pipeline, it is written to intermediate representation (IR) YAML. An IR YAML file is an instance of the PipelineSpec protocol buffer message type, a platform-agnostic pipeline representation protocol.

IR YAML is considered an intermediate representation because the KFP backend compiles PipelineSpec to Argo Workflow YAML, which serves as the final workflow definition executed on Kubernetes.

Unlike v1 component YAML, IR YAML is not intended to be written directly. For a KFP v2 authoring experience similar to the v1 component YAML authoring experience, see Author a Pipeline: Custom Container Components.

IR YAML contains seven top-level fields. They are listed below by their PipelineSpec proto field names; note that multi-word names appear in camelCase in the serialized YAML (for example, deployment_spec is written as deploymentSpec):

components

The components section is a map of component name to ComponentSpec for all components used in the pipeline. ComponentSpec defines the interface (inputs and outputs) of a component. For primitive components, ComponentSpec contains a reference to the executor containing the component implementation. For pipelines used as components, ComponentSpec contains a DagSpec which includes references to its underlying primitive components.

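For the addition pipeline compiled above, the components section would look roughly like this (an abridged, illustrative sketch; exact contents vary by SDK version):

components:
  comp-addition-component:
    executorLabel: exec-addition-component
    inputDefinitions:
      parameters:
        num1:
          parameterType: NUMBER_INTEGER
        num2:
          parameterType: NUMBER_INTEGER
    outputDefinitions:
      parameters:
        Output:
          parameterType: NUMBER_INTEGER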

deployment_spec

The deployment_spec section contains a map of executor name to ExecutorSpec. ExecutorSpec contains the implementation for a primitive component.

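An abridged, illustrative sketch of the executor for the addition component; the bootstrap command generated by the SDK is elided:

deploymentSpec:
  executors:
    exec-addition-component:
      container:
        image: python:3.7
        command: [sh, -c, '... SDK-generated script that runs the Python function ...']
        args: [--executor_input, '{{$}}', --function_to_execute, addition_component]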

root

root defines the steps of the outermost (root) pipeline definition. It is itself a ComponentSpec. This is the pipeline executed when the YAML is submitted.

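An abridged, illustrative sketch of root for the addition pipeline; the second task and some fields are elided:

root:
  dag:
    tasks:
      addition-component:
        componentRef:
          name: comp-addition-component
        inputs:
          parameters:
            num1:
              componentInputParameter: a
            num2:
              componentInputParameter: b
        taskInfo:
          name: addition-component
      # ... addition-component-2, which consumes the first task's output ...
  inputDefinitions:
    parameters:
      a:
        parameterType: NUMBER_INTEGER
      b:
        parameterType: NUMBER_INTEGER
      c:
        defaultValue: 10
        parameterType: NUMBER_INTEGER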

pipeline_info

pipeline_info contains pipeline metadata, including the pipeline name.

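For the pipeline above, this is simply:

pipelineInfo:
  name: addition-pipeline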

sdk_version

sdk_version records which version of the KFP SDK compiled the pipeline.

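For example (the exact value depends on the installed SDK):

sdkVersion: kfp-2.0.0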

schema_version

schema_version records which version of the PipelineSpec schema is used for the IR YAML.

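For example:

schemaVersion: 2.1.0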

default_pipeline_root

default_pipeline_root is the remote storage root path where the pipeline outputs will be written, such as a Google Cloud Storage URI (gs://my/path).

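For example (illustrative bucket name; this field is only present when a default pipeline root is set):

defaultPipelineRoot: gs://my-bucket/my-pipeline-root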
