6 Matching Annotations
- Mar 2023
-
A cluster with 4096 IP addresses can deploy at most 1024 models, assuming each InferenceService has 4 pods on average (two transformer replicas and two predictor replicas).
Kubernetes clusters have a maximum IP address limitation
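A quick back-of-the-envelope check of the quoted figure, using only the numbers from the highlight (the variable names are mine):

```python
cluster_ips = 4096              # IP addresses available in the cluster
pods_per_inference_service = 4  # 2 transformer replicas + 2 predictor replicas
max_models = cluster_ips // pods_per_inference_service
print(max_models)               # 1024, matching the highlighted claim
```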
-
According to Kubernetes best practice, a node shouldn't run more than 100 pods.
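Combining this per-node ceiling with the 4-pods-per-InferenceService assumption above gives a rough per-node model limit; a minimal sketch:

```python
pods_per_node_limit = 100       # Kubernetes best-practice ceiling from the quote
pods_per_inference_service = 4  # same assumption as above
models_per_node = pods_per_node_limit // pods_per_inference_service
print(models_per_node)          # at most ~25 single-model InferenceServices per node
```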
-
Each model's resource overhead is 1 CPU and 1 GB of memory. Deploying many models using the current approach will quickly use up a cluster's computing resources. With multi-model serving, these models can be loaded in one InferenceService, bringing each model's average overhead down to 0.1 CPU and 0.1 GB of memory.
If I am not mistaken, the multi-model approach reduces per-model resource overhead by 90% in this case (0.1 CPU / 0.1 GB instead of 1 CPU / 1 GB per model)
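A minimal sketch of that arithmetic, assuming a hypothetical fleet of 1000 models (the per-model figures are from the quote):

```python
models = 1000                               # hypothetical fleet size for illustration
dedicated_cpu, dedicated_mem_gb = 1.0, 1.0  # per-model overhead, one InferenceService each
shared_cpu, shared_mem_gb = 0.1, 0.1        # per-model overhead under multi-model serving

print(models * dedicated_cpu, models * dedicated_mem_gb)  # 1000 CPUs, 1000 GB total
print(models * shared_cpu, models * shared_mem_gb)        # 100 CPUs, 100 GB total
print(1 - shared_cpu / dedicated_cpu)                     # 0.9, i.e. the ~90% reduction
```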
-
Multi-model serving is designed to address three types of limitations KServe will run into
Benefits of multi-model serving: it addresses the three limitations annotated above (cluster IP address exhaustion, the per-node pod limit, and per-model compute overhead)
-
While you get the benefit of better inference accuracy and data privacy by building models for each use case, it is more challenging to deploy thousands to hundreds of thousands of models on a Kubernetes cluster.
With more separation comes the problem of distribution
-
- Sep 2021
-
blog.kubeflow.org
-
we will be releasing KServe 0.7 outside of the Kubeflow Project and will provide more details on how to migrate from KFServing to KServe with minimal disruptions
KFServing is now KServe
-