Architecture

An overview of Kubeflow’s architecture

This guide introduces Kubeflow subprojects and how they fit in each stage of the AI lifecycle.

Read the introduction to learn more about Kubeflow subprojects, the Kubeflow ecosystem, and Kubeflow distributions.

Kubeflow Overview Diagram

The following diagram shows the Kubeflow subprojects that cover each stage of the AI lifecycle on top of Kubernetes.

Kubeflow Overview Diagram

Kubeflow Landscape

The following diagram gives an overview of the Kubeflow Landscape and how it relates to the wider Kubernetes and AI landscape. Kubeflow builds on Kubernetes as a system for deploying, scaling, and managing AI platforms.

Kubeflow Ecosystem Diagram

Introducing the AI Lifecycle

When you develop and deploy an AI application, the AI lifecycle typically consists of several stages. Developing an AI system is an iterative process. You need to evaluate the output of various stages of the AI lifecycle, and apply changes to the model and parameters when necessary to ensure the model keeps producing the results you need.

The following diagram shows the AI lifecycle stages in sequence:

AI Lifecycle

Looking at the stages in more detail:

In the Data Preparation step, you ingest raw data, perform feature engineering to extract ML features for the offline feature store, and prepare training data for model development. This step is usually associated with data processing tools such as Spark, Dask, Flink, or Ray.
In the Model Development step, you choose an ML framework, develop your model architecture, and explore existing pre-trained models, such as BERT or Llama, for fine-tuning.
In the Model Training step, you train or fine-tune your models in a large-scale compute environment. Use distributed training if a single GPU cannot handle your model size. The result of model training is a trained model artifact that you can store in the Model Registry.
In the Model Optimization step, you tune your model hyperparameters and optimize the model with AutoML algorithms such as neural architecture search and model compression. During model optimization, you can store ML metadata in the Model Registry.
In the Model Serving step, you serve your model artifact for online or batch inference. Your model may perform predictive or generative AI tasks, depending on the use case. During model serving, you may use an online feature store to extract features. You monitor model performance and feed the results back into the previous steps of the AI lifecycle.
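The stages above and their iterative nature can be sketched in a few lines of Python. This is only an illustrative model, not part of any Kubeflow API: the names Stage, next_stage, and iterate are hypothetical, and the assumption that monitoring results from serving lead back to Data Preparation is a simplification of "feeding results into previous steps".

```python
from enum import Enum


class Stage(Enum):
    """AI lifecycle stages, in the order described above."""
    DATA_PREPARATION = "Data Preparation"
    MODEL_DEVELOPMENT = "Model Development"
    MODEL_TRAINING = "Model Training"
    MODEL_OPTIMIZATION = "Model Optimization"
    MODEL_SERVING = "Model Serving"


def next_stage(stage: Stage) -> Stage:
    """Return the stage after `stage`.

    Assumption: after Model Serving, monitoring results send you back
    to the first stage, which models the lifecycle as a loop.
    """
    order = list(Stage)
    return order[(order.index(stage) + 1) % len(order)]


def iterate(start: Stage, steps: int) -> list[Stage]:
    """Walk the lifecycle for `steps` transitions, starting at `start`."""
    path = [start]
    for _ in range(steps):
        path.append(next_stage(path[-1]))
    return path


if __name__ == "__main__":
    # One full pass plus the return to the first stage.
    for stage in iterate(Stage.DATA_PREPARATION, 5):
        print(stage.value)
```

In practice the feedback can target any earlier stage, for example retraining with new data or re-tuning hyperparameters; the single loop here is just the simplest way to show that the lifecycle does not end at serving.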

AI Lifecycle for Production and Development Phases

The AI lifecycle for AI applications can be conceptually split into development and production phases. This diagram shows which stages fit into each phase:

AI Lifecycle with Development and Production

Kubeflow Landscape in the AI Lifecycle

The next diagram shows how Kubeflow subprojects and Kubeflow ecosystem projects fit into each stage of the AI lifecycle:

Kubeflow Landscape in the AI Lifecycle

See the following links for more information about each Kubeflow project:

Kubeflow Spark Operator can be used for the data preparation and feature engineering steps.
Kubeflow Notebooks can be used for model development and interactive data science to experiment with your AI workflows.
Kubeflow Trainer can be used for large-scale distributed training or LLM fine-tuning.
Kubeflow Katib can be used for model optimization and hyperparameter tuning using various AutoML algorithms.
Kubeflow Hub can be used to store ML metadata and model artifacts, and to prepare models for production serving.
Kubeflow Pipelines can be used to build, deploy, and manage each step in the AI lifecycle.
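As a quick reference, the list above can be written as a small lookup table. This is a sketch for illustration only: PROJECT_USES and projects_for are hypothetical names, the use descriptions are paraphrased from the list above, and nothing here calls a Kubeflow API.

```python
# Each Kubeflow project mapped to the uses named in the list above.
PROJECT_USES = {
    "Kubeflow Spark Operator": ["data preparation", "feature engineering"],
    "Kubeflow Notebooks": ["model development", "interactive data science"],
    "Kubeflow Trainer": ["distributed training", "LLM fine-tuning"],
    "Kubeflow Katib": ["model optimization", "hyperparameter tuning"],
    "Kubeflow Hub": ["ML metadata", "model artifacts", "preparing models for serving"],
    "Kubeflow Pipelines": ["building, deploying, and managing each lifecycle step"],
}


def projects_for(keyword: str) -> list[str]:
    """Return the projects whose listed uses mention `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [
        project
        for project, uses in PROJECT_USES.items()
        if any(needle in use.lower() for use in uses)
    ]


if __name__ == "__main__":
    print(projects_for("hyperparameter"))
    print(projects_for("feature engineering"))
```

Kubeflow Pipelines appears with a single broad entry because it spans every step of the lifecycle rather than one stage.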

Kubeflow Interfaces

This section introduces the interfaces that you can use to interact with the Kubeflow subprojects.

Kubeflow Dashboard

The Kubeflow Central Dashboard looks like this:

Kubeflow Central Dashboard - Homepage

The Kubeflow Community Distribution includes the Kubeflow Central Dashboard, which acts as a hub for your AI platform and tools by exposing the UIs of components running in the cluster.

Kubeflow SDK

The Kubeflow SDK is a set of unified Pythonic APIs that let you run any AI workload at any scale – without the need to learn Kubernetes. It provides simple and consistent APIs across the Kubeflow subprojects, enabling users to focus on building AI applications rather than managing complex infrastructure.

Visit the Kubeflow SDK website to learn more about it.

Next steps

Follow Installing Kubeflow to set up your environment and install Kubeflow.


Last modified June 13, 2026: community: Introduce Kubeflow Community Distribution and Kubeflow Subprojects (#4385) (be2408e7)