Kubeflow provides two supported open source model serving systems that allow multi-framework model serving: KFServing and Seldon Core. You should choose the framework that best supports your model serving requirements. A rough comparison between KFServing and Seldon Core is shown below:
|Feature|KFServing|Seldon Core|
|---|---|---|
|NVIDIA TensorRT Inference Server|x|x|
|Routers incl (MAB)|Roadmap|x|
|Language Wrappers||Python, Java, R|
- Both projects share technology including Explainability (via Seldon Alibi Explain) and Payload Logging amongst other areas.
- A commercial product Seldon Deploy is available from Seldon that supports both KFServing and Seldon in production.
- KFServing is part of the Kubeflow project ecosystem. Seldon is an external project supported within Kubeflow.
For TensorFlow models you can use TensorFlow Serving for both real-time and batch prediction. Documentation is also provided on using TensorFlow Serving via Istio. However, if you are thinking of utilizing multiple frameworks, we suggest you use KFServing or Seldon Core as described above.
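For real-time prediction, TensorFlow Serving exposes a REST predict endpoint. As a minimal sketch of how a client request is shaped (the model name `my_model`, the optional version, and the input values below are placeholder assumptions, not names from this page), you can build the URL path and JSON body like this:

```python
import json

def build_predict_request(model_name, instances, version=None):
    """Return (url_path, json_body) for TensorFlow Serving's REST
    predict endpoint: POST /v1/models/<name>[/versions/<v>]:predict."""
    version_part = f"/versions/{version}" if version is not None else ""
    path = f"/v1/models/{model_name}{version_part}:predict"
    # TF Serving's REST API accepts a JSON object with an "instances" list.
    body = json.dumps({"instances": instances})
    return path, body

# Placeholder model name and input row for illustration only.
path, body = build_predict_request("my_model", [[1.0, 2.0, 3.0]])
print(path)   # /v1/models/my_model:predict
print(body)
```

You would POST `body` to the serving host at `path` with a `Content-Type: application/json` header; the response contains a matching `"predictions"` list.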
NVIDIA TensorRT Inference Server
NVIDIA TensorRT Inference Server is a REST and gRPC service for deep-learning inferencing of TensorRT, TensorFlow, and Caffe2 models. The server is optimized to deploy machine learning and deep learning models on both GPUs and CPUs at scale.
You can use NVIDIA TensorRT Inference Server standalone, but we also recommend looking at KFServing, which includes support for NVIDIA TensorRT Inference Server.
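As an illustration of the KFServing route, a manifest along the following lines deploys a model behind the TensorRT server. This is a sketch assuming KFServing's v1alpha2 `InferenceService` API; the service name and storage URI are placeholders you would replace with your own:

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: tensorrt-example        # placeholder service name
spec:
  default:
    predictor:
      tensorrt:
        storageUri: gs://your-bucket/your-model   # placeholder model store
```

Applying this with `kubectl apply -f` asks KFServing to stand up the predictor and route traffic to it.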