XGBoost Guide
This guide describes how to use TrainJob to run distributed XGBoost training on Kubernetes.
Prerequisites
Before exploring this guide, make sure to follow the Getting Started guide to understand the basics of Kubeflow Trainer.
XGBoost Overview
XGBoost supports distributed training through the Collective communication protocol (historically known as Rabit). In a distributed setting, multiple worker processes each operate on a shard of the data and synchronize histogram bin statistics via AllReduce to agree on the best tree splits.
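To make the AllReduce step concrete, here is a minimal pure-Python simulation (illustrative only, not XGBoost's actual implementation): each worker builds a histogram from its own data shard, and an element-wise sum shared back to all workers guarantees every worker sees identical global statistics and therefore agrees on the same split.

```python
def allreduce_sum(local_histograms):
    """Simulate AllReduce: element-wise sum, with the result
    distributed back to every worker."""
    global_hist = [sum(bins) for bins in zip(*local_histograms)]
    # Every worker receives its own copy of the identical global histogram.
    return [global_hist[:] for _ in local_histograms]

# Two workers, each holding per-bin gradient sums for 4 histogram bins
# computed from its own data shard.
worker_hists = [
    [1.0, 2.0, 0.5, 0.0],  # worker 0's shard
    [0.5, 1.0, 1.5, 2.0],  # worker 1's shard
]
synced = allreduce_sum(worker_hists)
assert synced[0] == synced[1] == [1.5, 3.0, 2.0, 2.0]
```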
Kubeflow Trainer integrates with XGBoost by:
- Deploying worker pods as a JobSet.
- Automatically injecting the DMLC_* environment variables required by XGBoost's Collective communication layer (DMLC_TRACKER_URI, DMLC_TRACKER_PORT, DMLC_TASK_ID, DMLC_NUM_WORKER).
- Providing the rank-0 pod with the tracker address so user code can start a RabitTracker for worker coordination.
- Supporting both CPU and GPU training workloads.
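As a sketch of how a training script might consume the injected variables, the stdlib-only helper below parses them into a plain dict. The variable names come from the list above; the helper itself is hypothetical and not part of Kubeflow Trainer or XGBoost.

```python
import os

def read_collective_config(environ=os.environ):
    # DMLC_* variables are injected by Kubeflow Trainer into each worker pod.
    cfg = {
        "tracker_host": environ["DMLC_TRACKER_URI"],
        "tracker_port": int(environ["DMLC_TRACKER_PORT"]),
        "rank": int(environ["DMLC_TASK_ID"]),
        "world_size": int(environ["DMLC_NUM_WORKER"]),
    }
    # Rank 0 is the pod expected to start the RabitTracker.
    cfg["is_rank_zero"] = cfg["rank"] == 0
    return cfg
```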
The built-in runtime is called xgboost-distributed and uses the container image
ghcr.io/kubeflow/trainer/xgboost-runtime:latest, which includes XGBoost with
CUDA 12 support, NumPy, and scikit-learn.
Worker Count
The total number of XGBoost workers is calculated as:
DMLC_NUM_WORKER = numNodes Ă— workersPerNode
- CPU training: 1 worker per node. Each worker uses OpenMP to parallelize across all available CPU cores.
- GPU training: 1 worker per GPU. The GPU count is derived from
resourcesPerNode limits in the TrainJob.
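The rule above can be sketched as a small helper (a hedged illustration of the calculation, not Kubeflow Trainer code; gpus_per_node stands in for the GPU count taken from the resourcesPerNode limits):

```python
def dmlc_num_worker(num_nodes, gpus_per_node=0):
    # GPU training: one worker per GPU; CPU training: one worker per node,
    # with OpenMP parallelizing across the node's CPU cores.
    workers_per_node = gpus_per_node if gpus_per_node > 0 else 1
    return num_nodes * workers_per_node

assert dmlc_num_worker(num_nodes=4) == 4                    # CPU: 4 nodes x 1
assert dmlc_num_worker(num_nodes=2, gpus_per_node=4) == 8   # GPU: 2 nodes x 4 GPUs
```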
Next Steps
- Check out the XGBoost example.
- Learn more about the TrainerClient() APIs in the Kubeflow SDK.
- Explore the XGBoost documentation for advanced configuration options.