Model serving with BentoML

BentoML is an open-source framework for serving, managing, and deploying machine learning models. It provides a standardized structure for packaging trained models together with the dependencies required to run them, so that models can be shared and deployed reproducibly. BentoML also tracks the lifecycle of models through versioning and artifact management, which supports collaboration and production deployment.

Data scientists and ML engineers use BentoML to:

  • Accelerate and standardize the process of taking ML models to production.
  • Build scalable and high-performance prediction services.
  • Continuously deploy, monitor, and operate prediction services in production.

This guide demonstrates how to serve a bento, BentoML's deployable artifact, in a Kubernetes cluster.


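Before starting this tutorial, make sure you have the tools that the steps below rely on: the bentoml CLI, Docker with a Docker Hub account, and kubectl access to a Kubernetes cluster.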

Download and import the example bento

This step downloads and imports the pre-built example Iris Classifier bento. BentoML provides CLI commands for importing and exporting bentos. To learn how to build bentos from scratch, follow the BentoML quickstart tutorial.

On a system already configured with AWS access:

pip install fs-s3fs
bentoml import s3://

On a system without AWS configuration:

# Download the example bento to a local directory
curl star/iris_classifier.bento --output ./iris_classifier.bento
bentoml import ./iris_classifier.bento

After the import is complete, use the bentoml list iris_classifier command to confirm that it was successful.

# example output

Tag                               Size       Creation Time        Path
iris_classifier:6mxwnbar3wj672ue  15.87 KiB  2022-08-01 14:07:36  ~/bentoml/bentos/iris_classifier/6mxwnbar3wj672ue
iris_classifier:bazyb4hyegn272ue  15.92 KiB  2022-06-29 20:02:28  ~/bentoml/bentos/iris_classifier/bazyb4hyegn272ue
iris_classifier:x33xm2gp5wrpb2ue  19.69 KiB  2022-05-09 16:14:17  ~/bentoml/bentos/iris_classifier/x33xm2gp5wrpb2ue
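
The listing can contain several versions of the same bento, and it is sometimes useful to pin one explicit tag instead of relying on latest. The following Python sketch parses the example output above and picks the most recently created tag. The helper name newest_tag and the whitespace-based parsing are assumptions made for illustration; they are not part of BentoML, which resolves latest on its own.

```python
from datetime import datetime

# Example output copied from the listing above.
LISTING = """\
Tag                               Size       Creation Time        Path
iris_classifier:6mxwnbar3wj672ue  15.87 KiB  2022-08-01 14:07:36  ~/bentoml/bentos/iris_classifier/6mxwnbar3wj672ue
iris_classifier:bazyb4hyegn272ue  15.92 KiB  2022-06-29 20:02:28  ~/bentoml/bentos/iris_classifier/bazyb4hyegn272ue
iris_classifier:x33xm2gp5wrpb2ue  19.69 KiB  2022-05-09 16:14:17  ~/bentoml/bentos/iris_classifier/x33xm2gp5wrpb2ue
"""


def newest_tag(listing: str) -> str:
    """Return the tag whose creation time is the most recent (hypothetical helper)."""
    rows = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        # Each row splits into: tag, size value, size unit, date, time, path.
        tag, _size, _unit, date, time, _path = line.split()
        created = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        rows.append((created, tag))
    return max(rows)[1]


print(newest_tag(LISTING))
```

The returned tag can be passed to later commands in place of iris_classifier:latest when a reproducible, pinned version is preferred.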

BentoML provides first-class support for containerization. Use the bentoml containerize command to build a Docker image, then push the image to the Docker registry.

# Replace {docker_username} with your Docker Hub username
bentoml containerize iris_classifier:latest -t {docker_username}/iris_classifier
docker push {docker_username}/iris_classifier

Deploy model server to Kubernetes

The following is an example YAML file for specifying the resources required to run and expose a BentoML model server in a Kubernetes cluster. Replace {docker_username} with your Docker Hub username and save it to iris-classifier.yaml:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: iris-classifier
  name: iris-classifier
  namespace: kubeflow
spec:
  ports:
  - name: classify
    port: 3000
    targetPort: 3000
  selector:
    app: iris-classifier
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: iris-classifier
  name: iris-classifier
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: iris-classifier
  template:
    metadata:
      labels:
        app: iris-classifier
    spec:
      containers:
      - image: {docker_username}/iris_classifier
        imagePullPolicy: IfNotPresent
        name: iris-classifier
        ports:
        - containerPort: 3000
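
Replacing {docker_username} by hand is easy to get wrong when the manifest is reused. As a minimal sketch, the substitution could be scripted in Python as shown here. The helper name render_manifest and the username alice are hypothetical and used only for illustration.

```python
PLACEHOLDER = "{docker_username}"


def render_manifest(template: str, docker_username: str) -> str:
    """Fill the Docker Hub username into a manifest template (hypothetical helper)."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template does not contain {PLACEHOLDER}")
    # str.replace is used rather than str.format so that any other
    # braces appearing in the YAML are left untouched.
    return template.replace(PLACEHOLDER, docker_username)


# A one-line fragment of the manifest; "alice" is a made-up username.
fragment = "      - image: {docker_username}/iris_classifier\n"
print(render_manifest(fragment, "alice"), end="")
```

In practice the full template would be read from a file, rendered, and written to iris-classifier.yaml before it is applied.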

Use the kubectl CLI to deploy the model API server to the Kubernetes cluster:

kubectl apply -f iris-classifier.yaml

Send prediction request

Use the kubectl describe command to get the NODE_PORT:

kubectl describe svc iris-classifier --namespace kubeflow
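
Reading the port out of the describe output by eye works, but it can also be extracted programmatically. The Python sketch below assumes the Service has been fetched as JSON (for example with kubectl get svc iris-classifier --namespace kubeflow -o json) and looks up the nodePort of the port named classify. The helper name node_port and the sample value 31234 are hypothetical.

```python
import json


def node_port(service_json: str, port_name: str = "classify") -> int:
    """Return the nodePort of the named Service port (hypothetical helper)."""
    service = json.loads(service_json)
    for port in service["spec"]["ports"]:
        if port.get("name") == port_name:
            # Raises KeyError if the cluster did not allocate a node port.
            return port["nodePort"]
    raise KeyError(f"no port named {port_name!r} in Service")


# Trimmed, made-up sample of a Service object; 31234 is an illustrative value.
SAMPLE = json.dumps({
    "spec": {
        "type": "LoadBalancer",
        "ports": [
            {"name": "classify", "port": 3000, "targetPort": 3000, "nodePort": 31234}
        ],
    }
})

print(node_port(SAMPLE))
```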

Then send the request:

curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \
