Improved GPU utilization, better LLM storage solutions, and prompt caching features in deployment tools like KServe will continue to make it easier to deploy a variety of models.
MLOps prediction for 2025
A cluster with 4096 IP addresses can deploy at most 1024 models, assuming each InferenceService has 4 pods on average (two transformer replicas and two predictor replicas).
Kubernetes clusters have a maximum IP address limitation
According to Kubernetes best practice, a node shouldn't run more than 100 pods.
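To make the arithmetic behind these two limits concrete, here is a small Python sketch. The 4096-IP, 4-pods-per-InferenceService, and 100-pods-per-node figures come from the text above; the node count and the max_models helper are purely illustrative assumptions, not part of KServe.

```python
# Back-of-the-envelope sketch of the IP-address and pods-per-node limits described above.
# Figures (4096 IPs, 4 pods per InferenceService, 100 pods per node) come from the text;
# the 50-node cluster and the helper function are hypothetical.

def max_models(cluster_ips: int, pods_per_isvc: int, nodes: int, pods_per_node: int = 100) -> int:
    """Upper bound on 'one model, one InferenceService' deployments."""
    ip_bound = cluster_ips // pods_per_isvc               # every pod needs its own IP address
    pod_bound = (nodes * pods_per_node) // pods_per_isvc  # pods-per-node best practice
    return min(ip_bound, pod_bound)

# A cluster with 4096 IP addresses and, say, 50 nodes:
print(max_models(cluster_ips=4096, pods_per_isvc=4, nodes=50))  # -> 1024 (IP limit binds before the pod limit of 1250)
```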
Each model’s resource overhead is 1 CPU and 1 GB of memory. Deploying many models with one InferenceService per model will quickly use up a cluster's computing resources. With multi-model serving, these models can be loaded into one InferenceService, and each model's average overhead drops to 0.1 CPU and 0.1 GB of memory.
In this case, the multi-model approach reduces the per-model resource overhead by 90%.
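The sketch below works through that reduction for a fleet of models, using the per-model figures quoted above (1 CPU / 1 GB versus 0.1 CPU / 0.1 GB). The overhead helper and the 2000-model example are hypothetical, chosen only to show the aggregate effect.

```python
# Illustrative comparison of aggregate resource overhead for N models,
# using the per-model figures from the text. The function and the
# 2000-model example are assumptions for illustration only.

def overhead(num_models: int, cpu_per_model: float, mem_gb_per_model: float):
    """Return (total CPUs, total GB of memory) consumed by model overhead."""
    return num_models * cpu_per_model, num_models * mem_gb_per_model

one_per_isvc = overhead(2000, cpu_per_model=1.0, mem_gb_per_model=1.0)  # (2000 CPUs, 2000 GB)
multi_model  = overhead(2000, cpu_per_model=0.1, mem_gb_per_model=0.1)  # (200 CPUs, 200 GB)

saving = 1 - multi_model[0] / one_per_isvc[0]
print(f"CPU overhead reduced by {saving:.0%}")  # -> 90%
```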
Multi-model serving is designed to address the three types of limitations that KServe runs into: compute resource overhead, the maximum number of pods per node, and the maximum number of IP addresses in a cluster.
Benefits of multi-model serving
While you get the benefit of better inference accuracy and data privacy by building models for each use case, it is more challenging to deploy thousands to hundreds of thousands of models on a Kubernetes cluster.
With more separation, comes the problem of distribution
We will be releasing KServe 0.7 outside of the Kubeflow Project and will provide more details on how to migrate from KFServing to KServe with minimal disruptions.
KFServing is now KServe