Compile a Pipeline
While pipelines and components are authored in Python, they can be compiled to an intermediate representation (IR) YAML file.
A YAML pipeline/component definition preserves a static representation of your pipeline/component. This YAML can be submitted to the KFP backend for execution or deserialized by the KFP SDK for integration into another pipeline.
First let’s define a very simple pipeline:
```python
from kfp import compiler
from kfp import dsl

@dsl.component
def addition_component(num1: int, num2: int) -> int:
    return num1 + num2

@dsl.pipeline(name='addition-pipeline')
def my_pipeline(a: int, b: int, c: int = 10):
    add_task_1 = addition_component(num1=a, num2=b)
    add_task_2 = addition_component(num1=add_task_1.output, num2=c)
```
Now we can compile the pipeline to the file `my_pipeline.yaml`:
```python
cmplr = compiler.Compiler()
cmplr.compile(my_pipeline, package_path='my_pipeline.yaml')
```
Just as a pipeline is a template for a multi-step workflow, a component is a template for a single-step workflow. We can also compile the component on its own.
The `Compiler.compile` method accepts a few optional parameters:
- `pipeline_name` (`str`): Sets the name of the pipeline (or component). This is written to IR as the `pipelineInfo.name` field and overrides the name passed to the `@dsl.pipeline` decorator.
The pipeline name, whether set through the decorator or the compiler, names your pipeline template. When you upload your pipeline, a pipeline context with this name is created. The pipeline context enables the backend and the Dashboard to associate artifacts and executions created from runs of the same pipeline template. This allows you, for example, to compare metrics artifacts from multiple runs of the same training pipeline to find the best model.
- `pipeline_parameters` (`Dict[str, Any]`): A map of parameter names to argument values. This amounts to providing default values for pipeline or component parameters. These defaults can be overridden at pipeline submission time.
- `type_check` (`bool`): Whether to enable static type checking during compilation. For more information about type checking, see Component I/O: Component interfaces and type checking.
When you compile a pipeline, it is written to intermediate representation (IR) YAML. An IR YAML file is an instance of the PipelineSpec protocol buffer message type, a platform-agnostic representation of a pipeline.
IR YAML is considered an intermediate representation because the KFP backend compiles PipelineSpec to Argo Workflow YAML, the final workflow definition executed on Kubernetes.
Unlike v1 component YAML, IR YAML is not intended to be written directly. For a KFP v2 authoring experience similar to the v1 component YAML authoring experience, see Author a Pipeline: Custom Container Components.
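To make the structure concrete, the skeleton below sketches an abridged compiled IR YAML file. All values are illustrative, nested content is elided, and note that field names appear in camelCase in the serialized YAML:

```yaml
# Abridged, illustrative skeleton of a compiled IR YAML file.
components:
  comp-addition-component:
    executorLabel: exec-addition-component
    inputDefinitions: {}     # parameter/artifact definitions elided
deploymentSpec:
  executors:
    exec-addition-component: {}  # container spec elided
defaultPipelineRoot: gs://my-bucket/pipeline-root
pipelineInfo:
  name: addition-pipeline
root:
  dag:
    tasks: {}                # task graph elided
schemaVersion: 2.1.0
sdkVersion: kfp-2.0.1
```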
IR YAML contains 7 top-level fields:
- `components`: A map of component name to `ComponentSpec` for all components used in the pipeline. `ComponentSpec` defines the interface (inputs and outputs) of a component. For primitive components, `ComponentSpec` contains a reference to the executor containing the component implementation. For pipelines used as components, `ComponentSpec` contains a `DagSpec`, which includes references to the underlying primitive components.
- `pipeline_info`: Contains pipeline metadata, including the pipeline name.
- `sdk_version`: Records the version of the KFP SDK that compiled the pipeline.
- `schema_version`: Records the version of the `PipelineSpec` schema used for the IR YAML.
- `default_pipeline_root`: The remote storage root path where the pipeline outputs will be written, such as a Google Cloud Storage URI (a `gs://` path).