Run your first InferenceService

A tutorial on building and deploying a model using the KServe Python SDK

The InferenceService custom resource is the primary interface for deploying models on KServe. Inside an InferenceService, you can specify multiple components that handle inference requests: the predictor, the transformer, and the explainer. Learn more here.

In this tutorial, you will deploy an InferenceService with a predictor that loads a scikit-learn model trained on the iris dataset. This dataset has three output classes: Iris Setosa, Iris Versicolour, and Iris Virginica.

You will then send an inference request to your deployed model in order to get a prediction for the class of iris plant your request corresponds to.

Before you begin

First, install the KServe SDK using the following command. If you run this command in a Jupyter notebook, restart the kernel after installing the SDK.

$ pip install kserve==0.7.0

Import kubernetes.client and kserve packages

from kubernetes import client 
from kserve import KServeClient
from kserve import constants
from kserve import utils
from kserve import V1beta1InferenceService
from kserve import V1beta1InferenceServiceSpec
from kserve import V1beta1PredictorSpec
from kserve import V1beta1SKLearnSpec

Declare Namespace

This will retrieve the current namespace of your Kubernetes context. The InferenceService will be deployed in this namespace.

namespace = utils.get_default_target_namespace()
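
For intuition, a lookup of this kind can be pictured as the sketch below. It assumes the notebook runs inside a pod, where Kubernetes mounts the service account namespace as a file at a standard path, and it falls back to `default` elsewhere. The function name `read_pod_namespace` and its `fallback` parameter are hypothetical, and the SDK's own implementation may differ.

```python
from pathlib import Path

# Standard location of the service account namespace file inside a pod.
NAMESPACE_FILE = Path('/var/run/secrets/kubernetes.io/serviceaccount/namespace')


def read_pod_namespace(path=NAMESPACE_FILE, fallback='default'):
    """Return the namespace recorded in `path`, or `fallback` if it is missing or empty."""
    try:
        return Path(path).read_text().strip() or fallback
    except OSError:
        # The file does not exist or is unreadable, e.g. when running outside a cluster.
        return fallback


print(read_pod_namespace())
```

Outside a cluster this prints the fallback value, so an InferenceService created from a local machine would need the namespace set explicitly.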

Define InferenceService

Next, define the InferenceService based on several key parameters. In the predictor parameter, a V1beta1PredictorSpec object with an embedded V1beta1SKLearnSpec object is created. Inside the V1beta1SKLearnSpec object, a storage URI is provided, pointing to the location of the trained iris model in cloud storage.

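name = 'sklearn-iris'  # example name (an assumption); any valid resource name works
kserve_version = 'v1beta1'  # matches the V1beta1* classes imported earlier
api_version = constants.KSERVE_GROUP + '/' + kserve_version

isvc = V1beta1InferenceService(api_version=api_version, kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(name=name, namespace=namespace, annotations={'':'false'}),
    spec=V1beta1InferenceServiceSpec(predictor=V1beta1PredictorSpec(
        sklearn=V1beta1SKLearnSpec(storage_uri='<storage-uri-of-trained-iris-model>'))))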
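
The SDK objects in this step mirror the fields of an InferenceService manifest. As a rough sketch, the same definition can be pictured as a plain dictionary. The helper `build_isvc_manifest`, the name `sklearn-iris`, and the storage URI placeholder are illustrative assumptions, not values from this tutorial, and `serving.kserve.io` is the group that `constants.KSERVE_GROUP` is assumed to resolve to.

```python
import json


def build_isvc_manifest(name, namespace, storage_uri):
    """Build a dict shaped like an InferenceService with a scikit-learn predictor."""
    return {
        'apiVersion': 'serving.kserve.io/v1beta1',  # assumed group + version
        'kind': 'InferenceService',
        'metadata': {'name': name, 'namespace': namespace},
        'spec': {'predictor': {'sklearn': {'storageUri': storage_uri}}},
    }


manifest = build_isvc_manifest(
    name='sklearn-iris',                            # hypothetical name
    namespace='kubeflow-user-example-com',
    storage_uri='<storage-uri-of-trained-model>',   # placeholder, supply your own
)
print(json.dumps(manifest, indent=2))
```

Seeing the nesting spelled out this way can help when comparing the SDK call against a YAML manifest for the same resource.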

Create InferenceService

With the InferenceService defined, you can create it by calling the create method of the KServeClient.

KServe = KServeClient()
KServe.create(isvc)

Check the InferenceService

Run the following command to watch the InferenceService until it is ready (or times out).

KServe.get(name, namespace=namespace, watch=True, timeout_seconds=120)
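
If you prefer to check readiness programmatically, one option is to inspect the resource returned by a plain `get` call. The sketch below assumes the returned dictionary carries a `status.conditions` list whose entries have `type` and `status` keys, with a `Ready` entry set to `'True'` once the service is up. The helper name `is_ready` and the sample data are hypothetical.

```python
def is_ready(isvc):
    """Return True if the resource dict reports a Ready condition with status 'True'."""
    conditions = isvc.get('status', {}).get('conditions', [])
    return any(
        c.get('type') == 'Ready' and c.get('status') == 'True'
        for c in conditions
    )


# Hand-written sample shaped like a response, for illustration only.
sample = {
    'status': {
        'conditions': [
            {'type': 'PredictorReady', 'status': 'True'},
            {'type': 'Ready', 'status': 'True'},
        ]
    }
}

print(is_ready(sample))
print(is_ready({'status': {}}))
```

In a notebook, you could call such a helper on the result of `KServe.get(name, namespace=namespace)` inside a loop with a short sleep between attempts.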

Perform Inference

Next, send an inference request to the deployed model to get predictions. This notebook assumes that you are running it in your Kubeflow cluster and uses the internal URL of the InferenceService.

The Python requests library will be used to send a POST request containing your payload.

import requests

isvc_resp = KServe.get(name, namespace=namespace)
isvc_url = isvc_resp['status']['address']['url']


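inference_input = {
  'instances': [
    [6.8,  2.8,  4.8,  1.4],
    [6.0,  3.4,  4.5,  1.6]
  ]
}

# Send the payload to the InferenceService and print the predictions.
response = requests.post(isvc_url, json=inference_input)
print(response.text)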
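
The request body in this step is an `instances` list with one row of four feature values per flower. A small helper, sketched here under that assumption, can catch malformed rows before they are sent. The name `build_payload` and the fixed feature count are hypothetical choices for this example.

```python
import json

N_FEATURES = 4  # each row in the tutorial's payload has four values


def build_payload(rows):
    """Validate feature rows and wrap them in an {'instances': [...]} request body."""
    instances = []
    for i, row in enumerate(rows):
        if len(row) != N_FEATURES:
            raise ValueError(f'row {i} has {len(row)} values, expected {N_FEATURES}')
        instances.append([float(v) for v in row])
    return {'instances': instances}


payload = build_payload([[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]])
print(json.dumps(payload))
```

The resulting dictionary can be passed as the `json` argument of a POST request in place of a hand-written payload.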

You should see two predictions returned (i.e. {"predictions": [1, 1]}). Both sets of data points sent for inference correspond to the flower with index 1. In this case, the model predicts that both flowers are “Iris Versicolour”.
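
To turn the returned indices into class names, a lookup like the following sketch can help. It assumes the model was trained with the usual scikit-learn iris label order (0 for Setosa, 1 for Versicolour, 2 for Virginica); the helper name `label_predictions` is hypothetical.

```python
import json

# Index order assumed to follow scikit-learn's bundled iris dataset.
CLASS_NAMES = ['Iris Setosa', 'Iris Versicolour', 'Iris Virginica']


def label_predictions(response_text):
    """Parse a JSON response body and map each predicted index to a class name."""
    body = json.loads(response_text)
    return [CLASS_NAMES[i] for i in body['predictions']]


labels = label_predictions('{"predictions": [1, 1]}')
print(labels)  # ['Iris Versicolour', 'Iris Versicolour']
```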

To learn more about sending inference requests, please check out the KServe guide.

Run Performance Test (Optional)

If you want to load test the deployed model, try deploying the Kubernetes Job to drive load to the InferenceService.

$ kubectl create -f -n kubeflow-user-example-com

Get Job Name

$ kubectl get pods --namespace=kubeflow-user-example-com | grep load

Check the Job Logs

$ kubectl logs <job-name> -n kubeflow-user-example-com

The output should look similar to the following:

Requests      [total, rate, throughput]         30000, 500.02, 499.99
Duration      [total, attack, wait]             1m0s, 59.998s, 3.336ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.743ms, 2.748ms, 2.494ms, 3.363ms, 4.091ms, 7.749ms, 46.354ms
Bytes In      [total, mean]                     690000, 23.00
Bytes Out     [total, mean]                     2460000, 82.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:30000
Error Set:
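
As a sanity check when reading a report like this, the derived figures can be recomputed from the totals. The sketch below assumes the common convention that rate is total requests divided by the attack duration, and throughput is successful requests divided by the attack duration plus the final wait. The variable names are illustrative.

```python
# Values copied from the sample report.
total_requests = 30000
attack_s = 59.998
wait_s = 0.003336        # 3.336ms
success_ratio = 1.0      # 100.00%
bytes_in_total = 690000
bytes_out_total = 2460000

rate = total_requests / attack_s
throughput = total_requests * success_ratio / (attack_s + wait_s)
mean_bytes_in = bytes_in_total / total_requests
mean_bytes_out = bytes_out_total / total_requests

print(f'rate={rate:.2f} throughput={throughput:.2f}')
print(f'mean bytes in={mean_bytes_in:.2f} out={mean_bytes_out:.2f}')
```

Under that convention the recomputed values match the report: a rate of 500.02, a throughput of 499.99, and mean sizes of 23.00 and 82.00 bytes.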

Delete InferenceService

When you are done with your InferenceService, you can delete it by running the following.

KServe.delete(name, namespace=namespace)

Next Steps

Kubeflow Pipelines E2E MNIST Tutorial - provides an end-to-end test sequence (i.e. start a notebook, run a pipeline, execute training, hyperparameter tuning, and model serving with KServe).

