  1. Nov 2025
    1. How we slashed our EKS costs by 43% with one simple scheduler tweak 🚀
      • AWS EKS costs can escalate due to massive, parallel workloads in life sciences/drug development (e.g., genomic sequencing, molecular modeling).
      • The default Kubernetes scheduler uses the leastAllocated scoring strategy, which spreads pods across many nodes for fairness and high availability.
      • leastAllocated leaves many nodes partially utilized rather than empty, so autoscalers cannot scale them down, and costs rise.
      • The mostAllocated scoring strategy "packs" pods onto fewer nodes, maximizing utilization and letting autoscalers such as Karpenter remove the nodes that become idle.
      • Switching to mostAllocated can cut runtime costs significantly (reported: ~10% in UAT, 43% in PROD).
      • Deploying a custom scheduler on AWS EKS requires a service account, ClusterRoleBindings, a RoleBinding, a ConfigMap containing the mostAllocated scoring strategy, and a Deployment whose container image matches the cluster's Kubernetes version.
      • Resource weights can prioritize packing of expensive resources (e.g., a high weight on GPUs for ML workloads).
      • Testing in non-production environments is recommended before full rollout.
      • Implementing mostAllocated scheduling can dramatically reduce costs by letting cluster autoscalers shut down unused nodes.
  2. Apr 2024
    1. The problem occurs when a pod has to move to another node, for example during cluster rebalancing, Spot interruptions, or similar events. This is because EBS volumes are zone-bound: they can only be attached to EC2 instances in the Availability Zone where they were originally provisioned. This is a key limitation that CAS is not able to take into account when provisioning a new node.

      Key limitation of CAS

    2. Since Karpenter can provision nodes faster, it will most often win this race and provide a new node for the pending workload. CAS will still attempt to create a new node, but it is slower and will most likely have to remove that node after some time because it sits empty. This adds unnecessary costs to your cloud bill.
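To illustrate the leastAllocated vs mostAllocated trade-off from the Nov 2025 notes, here is a minimal Python sketch of how one scheduler scoring plugin might rank two nodes under each strategy. It assumes the per-resource formulas described in the Kubernetes resource bin-packing documentation (free share of capacity for leastAllocated, used share for mostAllocated, combined as a weighted average by resource weight). The node sizes, pod requests, and helper names are hypothetical, and the real scheduler combines several plugins' scores and handles more edge cases.

```python
from dataclasses import dataclass, field

MAX_NODE_SCORE = 100  # kube-scheduler plugin scores range from 0 to 100


@dataclass
class Node:
    name: str
    capacity: dict[str, int]  # allocatable: CPU in millicores, memory in MiB
    requested: dict[str, int] = field(default_factory=dict)  # already requested


def least_allocated(requested: int, capacity: int) -> int:
    # Score the share of capacity left free after placing the pod.
    if capacity == 0 or requested > capacity:
        return 0
    return (capacity - requested) * MAX_NODE_SCORE // capacity


def most_allocated(requested: int, capacity: int) -> int:
    # Score the share of capacity used after placing the pod (capped at 100).
    if capacity == 0:
        return 0
    return min(requested, capacity) * MAX_NODE_SCORE // capacity


def node_score(node, pod, weights, strategy):
    """Weighted average of per-resource scores; resources the node lacks are skipped."""
    total = weight_sum = 0
    for resource, weight in weights.items():
        capacity = node.capacity.get(resource, 0)
        if capacity == 0:
            continue
        requested = node.requested.get(resource, 0) + pod.get(resource, 0)
        total += strategy(requested, capacity) * weight
        weight_sum += weight
    return total // weight_sum if weight_sum else 0


def pick_node(nodes, pod, weights, strategy):
    # Simplification: only this plugin's score decides the winner.
    return max(nodes, key=lambda n: node_score(n, pod, weights, strategy)).name


# Hypothetical example: two identical nodes, one already busy.
WEIGHTS = {"cpu": 1, "memory": 1}
busy = Node("busy", {"cpu": 4000, "memory": 16384}, {"cpu": 3000, "memory": 12288})
empty = Node("empty", {"cpu": 4000, "memory": 16384})
pod = {"cpu": 500, "memory": 1024}

if __name__ == "__main__":
    for strategy in (least_allocated, most_allocated):
        print(strategy.__name__, "->", pick_node([busy, empty], pod, WEIGHTS, strategy))
```

With the same two candidates, least_allocated picks the empty node while most_allocated packs the pod onto the busy one, so the empty node stays empty and an autoscaler can remove it. Raising one resource's entry in `WEIGHTS` (for example `"nvidia.com/gpu": 5`, an illustrative value) makes fullness of that resource dominate the score.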
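The ConfigMap step from the Nov 2025 deployment notes can be sketched in Python as well. This builds a KubeSchedulerConfiguration that sets the NodeResourcesFit plugin's scoring strategy to MostAllocated and wraps it in a ConfigMap. It assumes the `kubescheduler.config.k8s.io/v1` API and a single scheduler replica; the scheduler name, namespace, ConfigMap name, and data key are hypothetical, and the output is JSON because JSON is also valid YAML.

```python
import json

# Hypothetical names; adjust to your cluster.
SCHEDULER_NAME = "most-allocated-scheduler"
NAMESPACE = "kube-system"
CONFIG_KEY = "scheduler-config.yaml"


def scheduler_config(scheduler_name: str, weights: dict[str, int]) -> dict:
    """KubeSchedulerConfiguration with a single MostAllocated profile."""
    return {
        "apiVersion": "kubescheduler.config.k8s.io/v1",
        "kind": "KubeSchedulerConfiguration",
        # Single replica assumed, so leader election is disabled, as in the
        # upstream "Configure Multiple Schedulers" example.
        "leaderElection": {"leaderElect": False},
        "profiles": [
            {
                "schedulerName": scheduler_name,
                "pluginConfig": [
                    {
                        "name": "NodeResourcesFit",
                        "args": {
                            "scoringStrategy": {
                                "type": "MostAllocated",
                                "resources": [
                                    {"name": name, "weight": weight}
                                    for name, weight in weights.items()
                                ],
                            }
                        },
                    }
                ],
            }
        ],
    }


def config_map(name: str, namespace: str, config: dict) -> dict:
    """Wrap the scheduler config in a ConfigMap (the JSON text is valid YAML)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{name}-config", "namespace": namespace},
        "data": {CONFIG_KEY: json.dumps(config, indent=2)},
    }


if __name__ == "__main__":
    # Equal CPU/memory weights; a GPU fleet might add "nvidia.com/gpu" with a higher weight.
    cfg = scheduler_config(SCHEDULER_NAME, {"cpu": 1, "memory": 1})
    print(json.dumps(config_map(SCHEDULER_NAME, NAMESPACE, cfg), indent=2))
```

The printed manifest can be piped to `kubectl apply -f -`. The custom scheduler's Deployment would then mount this ConfigMap and pass the file to kube-scheduler with `--config`, and workloads opt in by setting `spec.schedulerName` to the profile's `schedulerName`; pods without it keep using the default scheduler.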
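The zonal EBS limitation in the Apr 2024 notes comes down to a simple placement constraint. The Python sketch below assumes each candidate node carries the well-known `topology.kubernetes.io/zone` label and that the zone of the pod's existing volume is known; the option names and relative costs are hypothetical. It does not model how CAS or Karpenter choose nodes internally; it only shows why a replacement node picked without regard to the volume's zone cannot run the pod.

```python
from dataclasses import dataclass
from typing import Optional

# Well-known node label carrying the node's Availability Zone.
ZONE_LABEL = "topology.kubernetes.io/zone"


@dataclass
class NodeOption:
    """A node an autoscaler could launch (names and costs are hypothetical)."""
    name: str
    labels: dict[str, str]
    relative_cost: float


def can_host(option: NodeOption, volume_zone: str) -> bool:
    # An EBS volume attaches only to instances in the AZ where it was created.
    return option.labels.get(ZONE_LABEL) == volume_zone


def cheapest(options: list[NodeOption]) -> Optional[NodeOption]:
    return min(options, key=lambda o: o.relative_cost, default=None)


def cheapest_zone_aware(options: list[NodeOption], volume_zone: str) -> Optional[NodeOption]:
    """Pick the cheapest option that can actually attach the pod's volume."""
    return cheapest([o for o in options if can_host(o, volume_zone)])


OPTIONS = [
    NodeOption("option-a", {ZONE_LABEL: "us-east-1a"}, 1.0),
    NodeOption("option-b", {ZONE_LABEL: "us-east-1b"}, 1.2),
]

if __name__ == "__main__":
    volume_zone = "us-east-1b"  # zone of the pod's existing EBS volume
    naive = cheapest(OPTIONS)
    print("zone-unaware:", naive.name, "can host:", can_host(naive, volume_zone))
    print("zone-aware:", cheapest_zone_aware(OPTIONS, volume_zone).name)
```

Here the cheapest option overall sits in `us-east-1a`, so a node launched there would leave the pod Pending; only the `us-east-1b` option can attach the volume.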