Index of Reusable Components
A Kubeflow Pipelines component is a self-contained set of code that performs one step in a pipeline, such as data preprocessing, data transformation, or model training. Each component is packaged as a Docker image. You can add existing components to your pipeline, whether they are components you have created yourself or components that someone else has created and made available.
The Kubeflow Pipelines repository includes a variety of reusable components that you can add to your pipeline. This page highlights the components that include usage documentation in the form of README files.
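Reusable components are defined by a component.yaml specification that the Kubeflow Pipelines SDK can load and turn into a pipeline step. The sketch below shows the general pattern using the KFP Python SDK; the component URL and the input name are placeholders rather than real values, so substitute the component.yaml path and parameters of the component you actually want to use.

```python
import kfp
from kfp import dsl

# Placeholder URL: point this at the raw component.yaml file of the reusable
# component you want to use from the Kubeflow Pipelines repository.
reusable_op = kfp.components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/<path-to-component>/component.yaml')

@dsl.pipeline(
    name='reusable-component-demo',
    description='Runs a single reusable component as a one-step pipeline.')
def demo_pipeline(input_path: str):
    # Calling the loaded factory adds a step to the pipeline. The keyword
    # arguments must match the inputs declared in the component.yaml file;
    # 'input_path' here is only an example name.
    reusable_op(input_path=input_path)

if __name__ == '__main__':
    # Compile to a package that you can upload through the Kubeflow Pipelines UI.
    kfp.compiler.Compiler().compile(demo_pipeline, 'demo_pipeline.yaml')
```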
Cloud Machine Learning (ML) Engine
The following components submit jobs to Cloud ML Engine on Google Cloud Platform (GCP); a usage sketch follows the list.
- Component: Cloud ML Engine model training
- Submits a Python training job to Cloud ML Engine. The job writes the trained model and other training results to a Cloud Storage location of your choice. The component output is the ID of the training job on Cloud ML Engine.
- Component: Cloud ML Engine model deployment
- Deploys a trained model to Cloud ML Engine from a Cloud Storage path. The component output is the Cloud ML Engine resource name of the deployed model version.
- Component: Cloud ML Engine batch prediction
- Submits a batch prediction request to a trained model deployed on Cloud ML Engine. The job writes the prediction results to a Cloud Storage location of your choice. The component output is the ID of the batch prediction job on Cloud ML Engine.
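To illustrate how the training and deployment components can be chained, the sketch below loads both components and passes the training job's output directory to the deploy step. The URLs, input names, and output names are assumptions made for the example; the README of each component documents the actual interface.

```python
import kfp
from kfp import dsl

# Placeholder URLs: substitute the raw component.yaml paths of the Cloud ML
# Engine train and deploy components from the Kubeflow Pipelines repository.
mlengine_train_op = kfp.components.load_component_from_url(
    '<url-of-ml-engine-train-component.yaml>')
mlengine_deploy_op = kfp.components.load_component_from_url(
    '<url-of-ml-engine-deploy-component.yaml>')

@dsl.pipeline(
    name='cmle-train-and-deploy',
    description='Train a model on Cloud ML Engine, then deploy it.')
def train_and_deploy(project_id: str, job_dir: str):
    # Input names below are illustrative; check the component README for the
    # real parameters (training package, Python module, region, and so on).
    train_task = mlengine_train_op(
        project_id=project_id,
        job_dir=job_dir)
    # The training component writes the trained model under the Cloud Storage
    # job directory; the deploy component picks the model up from that path.
    # The output name 'job_dir' is also an assumption for this sketch.
    mlengine_deploy_op(
        project_id=project_id,
        model_uri=train_task.outputs['job_dir'])
```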
BigQuery
The following component submits a query job to BigQuery on GCP; an example follows the list.
- Component: BigQuery query
- Submits a query to BigQuery and writes the component output to a Cloud Storage location of your choice.
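Below is a minimal sketch of using the BigQuery query component, again with a placeholder URL and illustrative parameter names; the component's README lists the actual inputs (query text, project, output location, and so on).

```python
import kfp
from kfp import dsl

# Placeholder URL for the BigQuery query component's component.yaml.
bigquery_query_op = kfp.components.load_component_from_url(
    '<url-of-bigquery-query-component.yaml>')

@dsl.pipeline(
    name='bigquery-extract',
    description='Run a BigQuery query and stage the results on Cloud Storage.')
def bigquery_pipeline(project_id: str, output_gcs_path: str):
    # Parameter names are illustrative, and the table in the query is a made-up
    # example; see the component README for the actual interface. The result is
    # written to the given Cloud Storage path.
    bigquery_query_op(
        project_id=project_id,
        query='SELECT * FROM `my_dataset.my_table` LIMIT 100',
        output_gcs_path=output_gcs_path)
```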
Cloud Dataflow
The following components submit jobs to Cloud Dataflow on GCP; see the sketch after the list for an example.
- Component: Dataflow Python Apache Beam job
- Submits an Apache Beam job authored in Python to Cloud Dataflow. The Cloud Dataflow pipeline runner executes the Python code. The component output is the ID of the Dataflow job.
- Component: Dataflow job from template
- Submits a job to Cloud Dataflow based on a template. The template must be stored in Cloud Storage. The component output is the ID of the Dataflow job.
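The Dataflow components follow the same pattern. The sketch below launches an Apache Beam job written in Python; the URL, Beam script path, and parameter names are placeholders, and the component's output (the Dataflow job ID) would be available to downstream steps through the task's outputs.

```python
import kfp
from kfp import dsl

# Placeholder URL for the Dataflow "launch Python" component's component.yaml.
dataflow_python_op = kfp.components.load_component_from_url(
    '<url-of-dataflow-launch-python-component.yaml>')

@dsl.pipeline(
    name='dataflow-beam-job',
    description='Run an Apache Beam pipeline on Cloud Dataflow.')
def dataflow_pipeline(project_id: str, staging_dir: str):
    # Parameter names and the Beam script location are illustrative; the
    # component README documents the real interface.
    beam_task = dataflow_python_op(
        project_id=project_id,
        python_file_path='gs://my-bucket/beam/wordcount.py',
        staging_dir=staging_dir)
    # beam_task.outputs exposes the Dataflow job ID for downstream steps.
```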
For usage instructions for each of the above components, see the README file of the linked component on GitHub.
Next steps
- See how to build your own pipeline components.