Component docstring format
KFP allows you to document your components and pipelines using Python docstrings. The KFP SDK automatically parses your docstrings and includes certain fields in the IR YAML when you compile components and pipelines.
For components, KFP can extract your component input descriptions and output descriptions.
For pipelines, KFP can extract your pipeline input descriptions and output descriptions, as well as a description of your full pipeline.
For the KFP SDK to parse your docstrings correctly, write them in the KFP docstring style. The KFP docstring style is a variant of the Google docstring style, with the following changes:
- The Returns: section takes the same structure as the Args: section: each return value in the Returns: section should take the form <name>: <description>. This is distinct from the typical Google docstring Returns: section, which takes the form <type>: <description>, with no names for return values.
- Component outputs should be included in the Returns: section, even though they are declared via component function input parameters. This applies to function parameters annotated with the Output[<Artifact>] type marker for declaring output artifacts.
- Suggested: Type information, including which inputs are optional/required, should be omitted from the input/output descriptions. This information is duplicative of the annotations.
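To make the <name>: <description> convention concrete, here is a minimal stdlib-only sketch that splits a KFP-style docstring into its free-text description and the Args: and Returns: entries. The helper parse_kfp_docstring is hypothetical and is not the KFP SDK's parser. It assumes each entry fits on one line. Under the typical Google style, a line such as "str: The concatenated string." would be read as a type, but in the KFP style the token before the colon is always a name.

```python
import inspect
import re

# Hypothetical helper for illustration only; this is NOT the KFP SDK's parser.
_SECTION_RE = re.compile(r"^(Args|Returns):\s*$")  # a bare section header line
_ENTRY_RE = re.compile(r"^(\w+):\s*(.*)$")  # "<name>: <description>"


def parse_kfp_docstring(docstring):
    """Return (description, args, returns) from a KFP-style docstring.

    Assumes every Args:/Returns: entry fits on a single line.
    """
    description_lines = []
    sections = {"Args": {}, "Returns": {}}
    current = None  # None while still reading the free-text description
    for line in inspect.cleandoc(docstring).splitlines():
        stripped = line.strip()
        header = _SECTION_RE.match(stripped)
        if header:
            current = header.group(1)
        elif current is None:
            if stripped:
                description_lines.append(stripped)
        else:
            entry = _ENTRY_RE.match(stripped)
            if entry:
                # In the KFP style, Returns: entries are named like Args: entries.
                sections[current][entry.group(1)] = entry.group(2)
    return " ".join(description_lines), sections["Args"], sections["Returns"]


EXAMPLE_DOC = """Concatenates two datasets.

    Args:
        dataset_a: First dataset.
        dataset_b: Second dataset.

    Returns:
        out_dataset: The concatenated dataset.
        Output: The concatenated string.
    """

description, args, returns = parse_kfp_docstring(EXAMPLE_DOC)
```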
For example, the KFP SDK can extract input and output descriptions from the following component docstring which uses the KFP docstring style:
from kfp import dsl
from kfp.dsl import Dataset, Input, Output


@dsl.component
def join_datasets(
    dataset_a: Input[Dataset],
    dataset_b: Input[Dataset],
    out_dataset: Output[Dataset],
) -> str:
    """Concatenates two datasets.

    Args:
        dataset_a: First dataset.
        dataset_b: Second dataset.

    Returns:
        out_dataset: The concatenated dataset.
        Output: The concatenated string.
    """
    ...
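In this example, out_dataset is a function parameter, yet it is documented under Returns: rather than Args:. The following sketch shows a hypothetical lint helper, missing_output_docs (not part of the KFP SDK), that reports outputs absent from the Returns: section. The caller supplies the names of the Output[...] parameters. The helper assumes, as the example docstring does, that the single return value is documented under the name Output. It uses only the standard library so it runs without KFP installed.

```python
import inspect
import re


# Hypothetical lint helper for illustration only (not a KFP SDK API).
def missing_output_docs(docstring, output_params, has_return_value):
    """Return the sorted output names not documented under Returns:."""
    documented = set()
    in_returns = False
    for line in inspect.cleandoc(docstring).splitlines():
        stripped = line.strip()
        if re.fullmatch(r"\w+:", stripped):  # a section header such as Args:
            in_returns = stripped == "Returns:"
        elif in_returns:
            match = re.match(r"(\w+):", stripped)
            if match:
                documented.add(match.group(1))
    # Outputs = Output[...] parameters plus the single return value, which this
    # sketch assumes is documented as "Output", matching the example above.
    expected = set(output_params) | ({"Output"} if has_return_value else set())
    return sorted(expected - documented)


GOOD_DOC = """Concatenates two datasets.

    Args:
        dataset_a: First dataset.
        dataset_b: Second dataset.

    Returns:
        out_dataset: The concatenated dataset.
        Output: The concatenated string.
    """

# Mistake: the output artifact is documented under Args: instead of Returns:.
BAD_DOC = """Concatenates two datasets.

    Args:
        dataset_a: First dataset.
        dataset_b: Second dataset.
        out_dataset: The concatenated dataset.

    Returns:
        Output: The concatenated string.
    """
```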
Similarly, KFP can extract the pipeline input descriptions, the pipeline output descriptions, and the pipeline description from the following pipeline docstring:
from kfp import dsl
from kfp.dsl import Dataset, Input


@dsl.pipeline(display_name='Concatenation pipeline')
def dataset_concatenator(
    string: str,
    in_dataset: Input[Dataset],
) -> Dataset:
    """Pipeline to convert string to a Dataset, then concatenate with in_dataset.

    Args:
        string: String to concatenate to in_dataset.
        in_dataset: Dataset to which to concatenate string.

    Returns:
        Output: The final concatenated dataset.
    """
    ...
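Because pipeline input descriptions are matched to parameters by name, an Args: entry that names a nonexistent parameter, or a parameter with no entry, leaves that input undocumented. The following stdlib-only sketch compares a function's signature with its Args: section. The helper check_args_section is hypothetical and is not part of the KFP SDK. The function it inspects is a plain-Python stand-in for the pipeline above, with no KFP decorator or artifact types, so the sketch runs without KFP installed.

```python
import inspect
import re


# Hypothetical helper for illustration only (not a KFP SDK API).
def check_args_section(func):
    """Return (inputs missing from Args:, Args: names that are not parameters)."""
    params = set(inspect.signature(func).parameters)
    documented = set()
    in_args = False
    for line in inspect.cleandoc(func.__doc__ or "").splitlines():
        stripped = line.strip()
        if re.fullmatch(r"\w+:", stripped):  # section header, e.g. Returns:
            in_args = stripped == "Args:"
        elif in_args and (match := re.match(r"(\w+):", stripped)):
            documented.add(match.group(1))
    return sorted(params - documented), sorted(documented - params)


# Plain-Python stand-in for the pipeline above (hypothetical; no KFP types).
def dataset_concatenator(string: str, in_dataset: object) -> object:
    """Pipeline to convert string to a Dataset, then concatenate with in_dataset.

    Args:
        string: String to concatenate to in_dataset.
        in_dataset: Dataset to which to concatenate string.

    Returns:
        Output: The final concatenated dataset.
    """
```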
Note that if you provide a description argument to the @dsl.pipeline decorator, KFP will use this description instead of the docstring description.
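The precedence rule in this note can be summarized in a few lines. The following sketch uses a hypothetical function, resolve_pipeline_description. It mirrors the rule as stated here and is not the KFP SDK's implementation. An explicit description argument wins. Otherwise, the sketch treats the docstring text before the first Args: or Returns: header as the pipeline description, which is an assumption of this sketch.

```python
import inspect


# Hypothetical sketch of the precedence rule; NOT the KFP SDK's implementation.
def resolve_pipeline_description(func, description=None):
    """Prefer an explicit description; otherwise fall back to the docstring."""
    if description is not None:
        return description
    summary = []
    for line in inspect.cleandoc(func.__doc__ or "").splitlines():
        if line.strip() in ("Args:", "Returns:"):
            break  # the description ends where the first section begins
        if line.strip():
            summary.append(line.strip())
    return " ".join(summary) or None  # None when there is no docstring text


# Plain-Python stand-in for a pipeline function (hypothetical; no KFP needed).
def my_pipeline(string: str):
    """Pipeline to convert string to a Dataset, then concatenate with in_dataset.

    Args:
        string: String to concatenate to in_dataset.
    """
```

With this rule, resolve_pipeline_description(my_pipeline) returns the docstring's first line, while resolve_pipeline_description(my_pipeline, "Explicit description.") returns the explicit string.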