Compile a Pipeline

Compile a pipeline definition to YAML

While pipelines and components are authored in Python, they can be compiled to an intermediate representation (IR) YAML file.

A YAML pipeline/component definition preserves a static representation of your pipeline/component. This YAML can be submitted to the KFP backend for execution or deserialized by the KFP SDK for integration into another pipeline.

Compiling

First, let’s define a very simple pipeline:

from kfp import compiler
from kfp import dsl

@dsl.component
def addition_component(num1: int, num2: int) -> int:
    return num1 + num2

@dsl.pipeline(name='addition-pipeline')
def my_pipeline(a: int, b: int, c: int = 10):
    add_task_1 = addition_component(num1=a, num2=b)
    add_task_2 = addition_component(num1=add_task_1.output, num2=c)

Now we can compile the pipeline to the file my_pipeline.yaml:

cmplr = compiler.Compiler()
cmplr.compile(my_pipeline, package_path='my_pipeline.yaml')

Just as a pipeline is a template for a multi-step workflow, a component is a template for a single-step workflow. We can also compile the component addition_component directly:

cmplr.compile(addition_component, package_path='addition_component.yaml')
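
As mentioned above, a compiled YAML file can also be deserialized by the KFP SDK and reused in another pipeline. The following is a minimal sketch, assuming addition_component.yaml was compiled as shown; loaded_component and reuse_pipeline are illustrative names:

from kfp import components, dsl

# Load the compiled component back into a Python object.
loaded_component = components.load_component_from_file('addition_component.yaml')

# Use the loaded component like any other component.
@dsl.pipeline(name='reuse-pipeline')
def reuse_pipeline(x: int, y: int):
    loaded_component(num1=x, num2=y)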

The Compiler.compile method accepts a few additional optional parameters, demonstrated together in the sketch after this list:

pipeline_name (string)

Sets the name of the pipeline (or component). This is written to IR as the pipelineInfo.name field and overrides the name passed to the @dsl.pipeline decorator.

The pipeline name, whether set through the decorator or the compiler, names your pipeline template. When you upload your pipeline, a pipeline context by this name will be created. The pipeline context enables the backend and the Dashboard to associate artifacts and executions created from runs of the same pipeline template. This allows you, for example, to compare metrics artifacts from multiple runs of the same training pipeline to find the best model.

pipeline_parameters (Dict[str, Any])

A map of parameter names to argument values. This amounts to providing default values for pipeline or component parameters. These defaults can be overridden at pipeline submission time.

type_check (bool)

Whether to enable static type checking during compilation. For more information about type checking, see Component I/O: Component interfaces and type checking.
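
To make these parameters concrete, here is a sketch that recompiles the addition pipeline from above with all three options set; the name override and the default values are illustrative, not required:

compiler.Compiler().compile(
    my_pipeline,
    package_path='my_pipeline.yaml',
    pipeline_name='addition-pipeline-v2',  # overrides the @dsl.pipeline name
    pipeline_parameters={'a': 1, 'b': 2},  # defaults, overridable at submission time
    type_check=True,                       # static type checking is on by default
)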

IR YAML

When you compile a pipeline, it is written to intermediate representation (IR) YAML. An IR YAML file is an instance of the PipelineSpec protocol buffer message type, a platform-agnostic pipeline representation protocol.

IR YAML is considered an intermediate representation because the KFP backend compiles PipelineSpec to Argo Workflow YAML, which serves as the final workflow definition executed on Kubernetes.

Unlike v1 component YAML, IR YAML is not intended to be written directly. For a KFP v2 authoring experience similar to the v1 component YAML authoring experience, see Author a Pipeline: Custom Container Components.

IR YAML contains seven top-level fields. They are listed below by their PipelineSpec proto field names; note that multi-word names appear in camelCase in the serialized YAML (for example, deployment_spec is written as deploymentSpec):

components

The components section is a map of component name to ComponentSpec for all components used in the pipeline. ComponentSpec defines the interface (inputs and outputs) of a component. For primitive components, ComponentSpec contains a reference to the executor containing the component implementation. For pipelines used as components, ComponentSpec contains a DagSpec which includes references to its underlying primitive components.

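For the addition pipeline compiled above, the components section would look roughly like this (an abridged, illustrative sketch; exact contents vary by SDK version):

components:
  comp-addition-component:
    executorLabel: exec-addition-component
    inputDefinitions:
      parameters:
        num1:
          parameterType: NUMBER_INTEGER
        num2:
          parameterType: NUMBER_INTEGER
    outputDefinitions:
      parameters:
        Output:
          parameterType: NUMBER_INTEGER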

deployment_spec

The deployment_spec section contains a map of executor name to ExecutorSpec. ExecutorSpec contains the implementation for a primitive component.

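An abridged, illustrative sketch of the executor for the addition component; the bootstrap command generated by the SDK is elided:

deploymentSpec:
  executors:
    exec-addition-component:
      container:
        image: python:3.7
        command: [sh, -c, '... SDK-generated script that runs the Python function ...']
        args: [--executor_input, '{{$}}', --function_to_execute, addition_component]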

root

root defines the steps of the outermost (root) pipeline definition. It is itself a ComponentSpec. This is the pipeline executed when the YAML is submitted.

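An abridged, illustrative sketch of root for the addition pipeline; the second task and some fields are elided:

root:
  dag:
    tasks:
      addition-component:
        componentRef:
          name: comp-addition-component
        inputs:
          parameters:
            num1:
              componentInputParameter: a
            num2:
              componentInputParameter: b
        taskInfo:
          name: addition-component
      # ... addition-component-2, which consumes the first task's output ...
  inputDefinitions:
    parameters:
      a:
        parameterType: NUMBER_INTEGER
      b:
        parameterType: NUMBER_INTEGER
      c:
        defaultValue: 10
        parameterType: NUMBER_INTEGER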

pipeline_info

pipeline_info contains pipeline metadata, including the pipeline name.

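For the pipeline above, this is simply:

pipelineInfo:
  name: addition-pipeline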

sdk_version

sdk_version records which version of the KFP SDK compiled the pipeline.

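For example (the exact value depends on the installed SDK):

sdkVersion: kfp-2.0.0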

schema_version

schema_version records which version of the PipelineSpec schema is used for the IR YAML.

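For example:

schemaVersion: 2.1.0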

default_pipeline_root

default_pipeline_root is the remote storage root path where the pipeline outputs will be written, such as a Google Cloud Storage URI (gs://my/path).

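For example (illustrative bucket name; this field is only present when a default pipeline root is set):

defaultPipelineRoot: gs://my-bucket/my-pipeline-root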
