Hypothesis

116 Matching Annotations

Aug 2025
www.spectrocloud.com

Spectro Cloud’s “2025 State of Production Kubernetes” Report Finds AI Driving Growth as Cost Pressures Bite

2
1. pyxelr 17 Aug 2025
  
  in Public
  
  With the majority of app workloads on K8s already, 31% plan to migrate their remaining VMs into Kubernetes
  
  Kubernetes VMware
2. pyxelr 17 Aug 2025
  
  in Public
  
  AI is the top growth driver: 90% of respondents expect their AI workloads on Kubernetes to grow in the next 12 months.
  
  Kubernetes AI

Tags

Kubernetes

VMware

AI

Annotators

pyxelr

URL

spectrocloud.com/news/2025-state-of-production-kubernetes-report
kubernetes.io

Kubernetes v1.34 Sneak Peek

1
1. pyxelr 17 Aug 2025
  
  in Public
  
  Featured enhancements of Kubernetes v1.34
  
  DRA goes mainstream: stable core with ResourceClaim/DeviceClass and structured parameters — finally flexible, sane GPU/accelerator management without node‑selector tricks or quota hacks.
  
  Ops bonuses: production‑ready kubelet/API Server tracing (end‑to‑end debugging), new Deployment pod replacement policies (faster vs. resource‑conservative), PreferSameNode/SameZone traffic distribution, KYAML output in kubectl, and per‑HPA configurable tolerance.
  
  Zero breaking changes: no deprecations or removals — smooth upgrade for production teams.
  
  Release planned for late August
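
The per-HPA tolerance mentioned above can be illustrated with the replica formula from the Kubernetes autoscaling documentation: desired = ceil(current × currentMetric / targetMetric), with no change when the ratio is within the tolerance of 1.0 (0.1 by default). The sketch below is only an approximation for building intuition; the function name and the single symmetric `tolerance` parameter are assumptions for illustration, not the HPA API.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """HPA-style replica count with a tolerance band around a ratio of 1.0.

    Assumption: one symmetric tolerance value; this is a simplification.
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Ratio 1.08: ignored with the default tolerance, acted on with a tighter one.
print(desired_replicas(10, 54, 50))                  # 10
print(desired_replicas(10, 54, 50, tolerance=0.05))  # 11
print(desired_replicas(4, 90, 60))                   # 6
```

A tighter tolerance makes a workload react to smaller deviations, at the cost of more frequent scaling.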
  
  Kubernetes

Tags

Kubernetes

Annotators

pyxelr

URL

kubernetes.io/blog/2025/07/28/kubernetes-v1-34-sneak-peek/
Jul 2025
kubernetes.io

Navigating Failures in Pods With Devices

1
1. pyxelr 18 Jul 2025
  
  in Public
  
  Navigating Failures in Pods With Devices
  
  Summary: Navigating Failures in Pods With Devices
  
  This article examines the unique challenges Kubernetes faces in managing specialized hardware (e.g., GPUs, accelerators) within AI/ML workloads, and explores current pain points, DIY solutions, and the future roadmap for more robust device failure handling.
  
  Why AI/ML Workloads Are Different
  
  Heavy Dependence on Specialized Hardware: AI/ML jobs require devices like GPUs, with hardware failures causing significant disruptions.
  
  Complex Scheduling: Tasks may consume entire machines or need coordinated scheduling across nodes due to device interconnects.
  
  High Running Costs: Specialized nodes are expensive; idle time is wasteful.
  
  Non-Traditional Failure Models: Standard Kubernetes assumptions (like treating nodes as fungible, or pods as easily replaceable) don’t apply well; failures can trigger large-scale restarts or job aborts.
  
  Major Failure Modes in Kubernetes With Devices
  
  Kubernetes Infrastructure Failures
  
  Multiple actors (device plugin, kubelet, scheduler) must work together; failures can occur at any stage.
  
  Issues include pods failing admission, poor scheduling, or pods unable to run despite healthy hardware.
  
  Best Practices: Early restarts, close monitoring, canary deployments, use of verified device plugins and drivers.
  
  Device Failures
  
  Kubernetes has limited built-in ability to handle device failures—unhealthy devices simply reduce the allocatable count.
  
  Lacks correlation between device failure and pod/container failure.
  
  DIY Solutions:
  
  Node Health Controllers: Restart nodes if device capacity drops, but these can be slow and blunt.
  
  Pod Failure Policies: Pods exit with special codes for device errors, but support is limited and mostly for batch jobs.
  
  Custom Pod Watchers: Scripts or controllers watch pod/device status, forcibly delete pods attached to failed devices, prompting rescheduling.
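
The decision step of a custom pod watcher, as described in the last item above, might look roughly like this. It is a minimal sketch with no Kubernetes API calls: `PodInfo`, `pods_to_evict`, and the (node, device) health report format are hypothetical names chosen for illustration, and a real watcher would still have to collect this data and delete the pods itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodInfo:
    """Hypothetical record of a pod and the devices allocated to it."""
    name: str
    node: str
    device_ids: tuple


def pods_to_evict(pods, unhealthy):
    """Return names of pods that hold at least one unhealthy device.

    `unhealthy` is an iterable of (node, device_id) pairs (assumed format).
    """
    bad = set(unhealthy)
    return [p.name for p in pods if any((p.node, d) in bad for d in p.device_ids)]


pods = [
    PodInfo("trainer-0", "node-a", ("gpu-0", "gpu-1")),
    PodInfo("trainer-1", "node-a", ("gpu-2",)),
    PodInfo("trainer-2", "node-b", ("gpu-0",)),
]

# Only the pod on node-a that holds gpu-0 is selected for deletion.
print(pods_to_evict(pods, [("node-a", "gpu-0")]))  # ['trainer-0']
```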
  
  Container Code Failures
  
  Kubernetes can only restart containers or reschedule pods, with limited expressiveness about what counts as failure.
  
  For large AI/ML jobs: Orchestration wrappers restart failed main executables, aiming to avoid expensive full job restart cycles.
  
  Device Degradation
  
  Not all device issues result in outright failure; degraded performance now occurs more frequently (e.g., one slow GPU dragging down training).
  
  Detection and remediation are largely DIY; Kubernetes does not yet natively express "degraded" status.
  
  Current Workarounds & Limitations
  
  Most device-failure strategies are manual or require high privileges.
  
  Workarounds are often fragile, costly, or disruptive.
  
  Kubernetes lacks standardized abstractions for device health and device importance at pod or cluster level.
  
  Roadmap: What’s Next for Kubernetes
  
  SIG Node and the Kubernetes community are focusing on:
  
  Improving core reliability: Ensuring kubelet, device manager, and plugins handle failures gracefully.
  
  Making Failure Signals Visible: Initiatives like KEP 4680 aim to expose device health at pod status level.
  
  Integration With Pod Failure Policies: Plans to recognize device failures as first-class events for triggering recovery.
  
  Pod Descheduling: Enabling pods to be rescheduled off failed/unhealthy devices, even with restartPolicy: Always.
  
  Better Handling for Large-Scale AI/ML Workloads: More granular recovery, fast in-place restarts, state snapshotting.
  
  Device Degradation Signals: Early discussions on tracking performance degradation, but no mature standard yet.
  
  Key Takeaway
  
  Kubernetes remains the platform of choice for AI/ML, but device- and hardware-aware failure handling is still evolving. Most robust solutions are still "DIY," but community and upstream investment is underway to standardize and automate recovery and resilience for workloads depending on specialized hardware.
  
  Kubernetes AI ML MLOps GPU

Tags

Kubernetes

AI

ML

MLOps

GPU

Annotators

pyxelr

URL

kubernetes.io/blog/2025/07/03/navigating-failures-in-pods-with-devices/
Mar 2025
repost.aws

Difference between EKS AMI and Bottle Rocket

1
1. pyxelr 05 Mar 2025
  
  in Public
  
  The main difference between the Amazon EKS-optimized AMI (amazon-eks-node-1.29) and the Bottlerocket AMI (bottlerocket-aws-k8s-1.29) lies in their purpose
  
  See the summary below this highlight
  
  BottleRocket AWS EKS Kubernetes Docker MLOps

Tags

Kubernetes

AWS

Docker

MLOps

EKS

BottleRocket

Annotators

pyxelr

URL

repost.aws/questions/QU5-aVwCeCQT6paofciywPXA/difference-between-eks-ami-and-bottle-rocket
aws.amazon.com

Reduce container startup time on Amazon EKS with Bottlerocket data volume | Amazon Web Services

1
1. pyxelr 04 Mar 2025
  
  in Public
  
  Reduce container startup time on Amazon EKS with Bottlerocket data volume
  
  Introduction
  
  Containers are widely used for scalable applications but face challenges with startup times for large images (e.g., AI/ML workloads).
  
  Pulling large images from Amazon Elastic Container Registry (ECR) can take several minutes, impacting performance.
  
  Bottlerocket, an AWS open-source Linux OS optimized for containers, offers a solution to reduce container startup time.
  
  Solution Overview
  
  Bottlerocket's data volume feature allows prefetching container images locally, eliminating the need for downloading during startup.
  
  Prefetching is achieved by creating an Amazon Elastic Block Store (EBS) snapshot of Bottlerocket's data volume and mapping it to new Amazon EKS nodes.
  
  Steps to implement:
  
  Spin up an Amazon EC2 instance with Bottlerocket AMI.
  
  Pull application images from the repository.
  
  Create an EBS snapshot of the data volume.
  
  Map the snapshot to Amazon EKS node groups.
  
  Benefits of Bottlerocket
  
  It separates OS and container data volumes, ensuring consistency and security during updates.
  
  Prefetched images significantly reduce startup times for large containers.
  
  Implementation Walkthrough
  
  Step 1: Build EBS Snapshot
  
  Automate snapshot creation using a script.
  
  Prefetch images like Jupyter-PyTorch and Kubernetes pause containers.
  
  Export the snapshot ID for use in node group configuration.
  
  Step 2: Setup Amazon EKS Cluster
  
  Create two node groups:
  
  no-prefetch-mng: Without prefetched images.
  
  prefetch-mng: With prefetched images mapped via EBS snapshot.
  
  Step 3: Deploy Pods
  
  Test deployment on both node groups.
  
  Prefetched nodes start pods in just 3 seconds, compared to 49 seconds without prefetching.
  
  Results
  
  Prefetching reduced container startup time from 49 seconds to 3 seconds, improving efficiency and user experience.
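
To put the reported numbers in perspective, the arithmetic can be worked out directly. This is a small sketch that uses only the 49-second and 3-second figures quoted above; the function name is an assumption for illustration.

```python
def startup_improvement(before_s, after_s):
    """Return (speedup factor, percentage reduction) for two startup times."""
    speedup = before_s / after_s
    reduction_pct = (before_s - after_s) / before_s * 100
    return speedup, reduction_pct


speedup, reduction = startup_improvement(49, 3)
print(f"{speedup:.1f}x faster, {reduction:.1f}% less startup time")
# 16.3x faster, 93.9% less startup time
```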
  
  Further Enhancements
  
  Use Karpenter for automated scaling with Bottlerocket nodes.
  
  Automate snapshot creation in CI pipelines using GitHub Actions.
  
  Cleaning Up
  
  Delete AWS resources (EKS cluster, Cloud9 environment, EBS snapshots) to avoid charges after testing.
  
  Conclusion
  
  Bottlerocket's data volume prefetching dramatically enhances container startup performance for large workloads on Amazon EKS.
  
  BottleRocket AWS EKS Kubernetes Docker MLOps

Tags

Kubernetes

AWS

Docker

MLOps

EKS

BottleRocket

Annotators

pyxelr

URL

aws.amazon.com/blogs/containers/reduce-container-startup-time-on-amazon-eks-with-bottlerocket-data-volume/
Feb 2025
paulbutler.org

The hater’s guide to Kubernetes

1
1. pyxelr 23 Feb 2025
  
  in Public
  
  The hater’s guide to Kubernetes
  
  Why use Kubernetes
  
  Best for running multiple processes/servers/jobs with redundancy and load balancing
  
  Enables infrastructure-as-code configuration for service relationships
  
  Outsourced infrastructure management via cloud providers (e.g., Google Kubernetes Engine)
  
  What they use
  
  Core resources: Deployments (with rolling updates), Services (ClusterIP/LoadBalancer), CronJobs
  
  Configuration: ConfigMaps and Secrets via Pulumi (TypeScript) instead of raw YAML
  
  Cautious adoptions: StatefulSets for limited persistence, RBAC only when necessary
  
  What they avoid
  
  Hand-written YAML and Helm charts ("fragility for no gain")
  
  Operators, custom resources, service meshes, and most third-party controllers
  
  Local k8s stack replication (prefer Docker Compose for local dev)
  
  Key insights
  
  "A human should never wait for a pod" - unsuitable for interactive workloads requiring fast startup
  
  Use managed databases/storage for critical data instead of k8s volumes
  
  Alternatives like Railway/Render may be better for simpler SaaS apps
  
  Recently adopted Ingress controllers for Cloud Armor integration despite initial reservations
  
  MLOps Kubernetes Helm

Tags

Kubernetes

MLOps

Helm

Annotators

pyxelr

URL

paulbutler.org/2024/the-haters-guide-to-kubernetes/
Jan 2025
www.figma.com

How We Migrated onto K8s in Less Than 12 months | Figma Blog

1
1. pyxelr 19 Jan 2025
  
  in Public
  
  How we migrated onto K8s in less than 12 months
  
  Figma's Initial Infrastructure Challenges:
  
  Figma's monolithic architecture struggled with resource allocation inefficiencies and limited scalability.
  
  High traffic spikes from collaborative design workflows required more robust solutions for resource autoscaling and failover.
  
  Why Kubernetes Was Chosen:
  
  Kubernetes' container orchestration capabilities promised better resource management and service isolation.
  
  Features like Horizontal Pod Autoscaling (HPA), robust networking via Kubernetes Services, and support for StatefulSets made it an ideal fit for Figma’s needs.
  
  The platform also wanted better alignment with cloud-native practices and modern CI/CD workflows.
  
  Incremental Migration Approach:
  
  Step 1: Non-Critical Services: Figma migrated stateless services first, allowing experimentation without risking core functionality.
  
  Step 2: Custom Tooling: Internal tooling was built to manage Kubernetes manifests and automate Helm chart creation for standardization.
  
  Step 3: Stateful Services: For databases and other stateful components, Figma relied on Kubernetes' StatefulSets and persistent volumes (PVs) to ensure data integrity during the migration.
  
  Step 4: Observability Enhancements: Kubernetes-native tools like Prometheus and Grafana were integrated to provide detailed metrics and system insights.
  
  Key Technical Adjustments During Migration:
  
  Service Discovery: Transitioned to Kubernetes-native DNS for internal service communication, replacing legacy methods.
  
  Load Balancing: Leveraged Kubernetes Ingress and external load balancers (e.g., NGINX or cloud-native solutions) for traffic routing.
  
  Networking Complexity: Resolved challenges around multi-cluster networking using Kubernetes CNI plugins like Calico.
  
  Resource Management: Used Resource Quotas and Limits to prevent pod overcommitment and optimize cluster utilization.
  
  Challenges Faced:
  
  Stateful Services: Ensuring zero-downtime migration for databases required careful orchestration of PersistentVolumeClaims (PVCs) and StatefulSets.
  
  Networking: Handling cross-region traffic and external dependencies required tweaking Kubernetes Ingress configurations.
  
  Resource Constraints: Balancing costs and performance involved tuning cluster-autoscaler configurations and evaluating node pool setups.
  
  Benefits Realized Post-Migration:
  
  Scalability: Kubernetes' HPA allowed Figma to scale pods dynamically based on traffic patterns, ensuring consistent performance.
  
  Deployment Efficiency: CI/CD pipelines integrated seamlessly with Kubernetes, enabling faster and more reliable rollouts using tools like Argo CD.
  
  Reliability: Self-healing capabilities, such as pod restarts and node failover, reduced downtime during failures.
  
  Observability: Improved system monitoring with Kubernetes' native metrics server and integrations with Prometheus and Grafana.
  
  Future Enhancements Planned:
  
  Service Mesh Integration: Adoption of Istio or Linkerd to enhance observability, security (e.g., mutual TLS), and traffic management.
  
  Cost Optimization: Further tuning autoscaling policies and resource limits to minimize waste.
  
  Edge Improvements: Deploying Kubernetes clusters closer to end-users for reduced latency, potentially using Kubernetes' Cluster Federation.
  
  Kubernetes ArgoCD MLOps Figma

Tags

ArgoCD

Kubernetes

MLOps

Figma

Annotators

pyxelr

URL

figma.com/blog/migrating-onto-kubernetes/
araji.medium.com

How Tesla is using Kubernetes and Kafka to handle trillions of events per day

1
1. pyxelr 19 Jan 2025
  
  in Public
  
  How Tesla is using Kubernetes and Kafka to handle trillions of events per day
  
  Overview of Tesla's Data Infrastructure Challenges:
  
  Modern Tesla vehicles generate an enormous volume of telemetry data related to sensor readings, driver behavior, energy consumption, and more.
  
  The primary challenge is ingesting, processing, and analyzing this data at scale while maintaining real-time capabilities.
  
  Kubernetes for Orchestration:
  
  Tesla uses Kubernetes to manage containerized microservices across a distributed cloud environment.
  
  Kubernetes ensures dynamic scaling to handle fluctuating workloads, providing high availability for critical services.
  
  Each microservice is encapsulated in its own container, improving isolation and deployability.
  
  Kafka for Real-Time Event Streaming:
  
  Apache Kafka is the backbone of Tesla’s data pipeline, managing trillions of events daily from globally distributed vehicles.
  
  Kafka topics are structured to partition and replicate data efficiently, ensuring fault tolerance and high throughput.
  
  Producers (vehicles) send data to Kafka brokers, while consumers (analytics systems, data lakes) process these streams in real-time.
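
How keyed events end up on partitions and replicas can be sketched as follows. This is an illustration of the general idea only: Kafka's default partitioner hashes keys with murmur2, whereas the sketch uses CRC32 as a stand-in, and the round-robin replica placement, the broker names, and the `vehicle-123` key are all assumptions rather than anything taken from Tesla's setup.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition so the same key always lands together.

    Stand-in hash: CRC32 here, not the murmur2 hash Kafka actually uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def replicas_for(partition: int, brokers: list, replication_factor: int) -> list:
    """Pick distinct brokers for a partition using simplified round-robin placement."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return [brokers[(partition + i) % len(brokers)] for i in range(replication_factor)]


brokers = ["broker-0", "broker-1", "broker-2", "broker-3"]  # hypothetical cluster
p = partition_for("vehicle-123", num_partitions=12)
print(p, replicas_for(p, brokers, replication_factor=3))
```

Because every event for one key maps to the same partition, per-key ordering is preserved, and with a replication factor of 3 the partition survives the loss of a single broker.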
  
  Data Processing Pipelines:
  
  Data from Kafka is ingested into processing systems for real-time analytics, anomaly detection, and predictive maintenance.
  
  Stream processing frameworks (e.g., Apache Flink or Kafka Streams) analyze data for immediate feedback.
  
  Batch systems handle aggregation and storage in Tesla’s data lake for long-term insights.
  
  Key Technical Advantages:
  
  Scalability: Kubernetes dynamically allocates resources based on the volume of incoming data and computational requirements.
  
  Resilience: Kafka’s replication factor ensures that no single broker failure impacts the system.
  
  Low Latency: Data streams from Kafka enable Tesla to act on insights in milliseconds, critical for safety and performance monitoring.
  
  Simplified Management:
  
  The platform supports multi-cluster Kubernetes configurations for geographic data segregation.
  
  A central control plane monitors system health, manages deployments, and ensures compliance with data regulations.
  
  Future Goals and Improvements:
  
  Enhancing AI-driven analytics to derive deeper insights from vehicle data.
  
  Further optimizing Kafka’s cluster topology to improve fault tolerance and reduce operational costs.
  
  Expanding edge processing capabilities in vehicles to pre-filter data, reducing bandwidth requirements to the cloud.
  
  Kubernetes Kafka MLOps Tesla

Tags

Tesla

Kubernetes

MLOps

Kafka

Annotators

pyxelr

URL

araji.medium.com/how-tesla-is-using-kubernetes-and-kafka-to-handle-trillions-of-events-per-day-01e6c370d49e
thenewstack.io

Reddit No Longer Haunted by Drifting Kubernetes Configurations

1
1. pyxelr 19 Jan 2025
  
  in Public
  
  Reddit No Longer Haunted by Drifting Kubernetes Configurations
  
  Kubernetes Configuration Drift Issue:
  
  Reddit experienced a significant outage on March 14, 2023, during a Kubernetes upgrade from version 1.23 to 1.24.
  
  The outage was caused by configuration drift, where unintended changes accumulated over time, leading to inconsistencies across clusters.
  
  New Platform Abstraction with Declarative APIs:
  
  Reddit adopted a declarative approach, leveraging Kubernetes controllers to manage configurations and enforce consistency.
  
  The implementation of these controllers enabled Reddit to abstract platform complexities and ensure a uniform deployment environment.
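
The controller pattern behind this declarative approach boils down to a reconcile loop: compare desired state with actual state and emit the actions that close the gap. The sketch below is a generic illustration in plain Python, not the Achilles SDK or any Kubernetes client API; the function names, resource names, and dictionary-based state are assumptions.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return the (verb, name) actions needed to make `actual` match `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_actions(actions: list, desired: dict, actual: dict) -> None:
    """Apply the actions in place; afterwards `actual` equals `desired`."""
    for verb, name in actions:
        if verb == "delete":
            actual.pop(name)
        else:
            actual[name] = desired[name]


desired = {"ingress": {"replicas": 2}, "dns": {"version": "1.2"}}
actual = {"ingress": {"replicas": 3}, "legacy-proxy": {"replicas": 1}}

actions = reconcile(desired, actual)
print(actions)  # [('update', 'ingress'), ('create', 'dns'), ('delete', 'legacy-proxy')]
apply_actions(actions, desired, actual)
print(reconcile(desired, actual))  # []
```

Running the loop continuously means that any manual change to a cluster shows up as a difference and is reverted, which is how drift is prevented by design.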
  
  Centralized Control Plane for Multi-Cluster Management:
  
  The team built a centralized control plane to manage multiple Kubernetes clusters effectively.
  
  Cluster provisioning time was drastically reduced from over 30 hours to approximately 2 hours.
  
  Centralization facilitated standardized configurations and reduced operational overhead.
  
  Development of Achilles SDK:
  
  Achilles, an SDK developed in-house by Reddit, simplified the creation of Kubernetes controllers.
  
  It allowed infrastructure engineers to automate and manage resources programmatically without deep Kubernetes expertise.
  
  The SDK supported a more proactive approach to problem-solving, preventing drift by design.
  
  Benefits and Lessons Learned:
  
  The new system ensured robust monitoring, minimized manual intervention, and improved scalability.
  
  Configuration drift was effectively mitigated, providing a more stable and predictable infrastructure.
  
  The experience highlighted the importance of using Kubernetes-native solutions and declarative configurations for managing large-scale deployments.
  
  Future Goals:
  
  Further refinement of the platform to address edge cases and improve developer experience.
  
  Continued investment in tools and processes to maintain infrastructure consistency at scale.
  
  Kubernetes Reddit MLOps

Tags

Kubernetes

MLOps

Reddit

Annotators

pyxelr

URL

thenewstack.io/reddit-no-longer-haunted-by-drifting-kubernetes-configurations/
Nov 2024
www.gitpod.io

We’re leaving Kubernetes - Blog

1
1. pyxelr 17 Nov 2024
  
  in Public
  
  We’re leaving Kubernetes
  
  Why Gitpod is Leaving Kubernetes
  
  Gitpod has decided to transition away from Kubernetes for managing cloud development environments, opting instead for a custom-built solution better suited to their needs. While Kubernetes is powerful for orchestrating stateless application workloads, Gitpod identified several challenges that made it less ideal for their dynamic, stateful workloads.
  
  Key Challenges of Kubernetes
  
  Resource Overhead: Kubernetes introduces significant complexity and resource consumption, which is inefficient for scaling ephemeral development environments.
  
  Latency in Scaling: The time required to scale pods and handle stateful workloads can slow down developer workflows that demand near-instant provisioning.
  
  Stateful Workloads: Kubernetes is designed for stateless applications, and adapting it for stateful environments adds operational complexity.
  
  Cost Inefficiency: Running dynamic workloads on Kubernetes incurs higher operational costs due to the constant need for scaling and resource orchestration.
  
  Security Concerns: Managing multi-tenant security on Kubernetes is challenging, requiring considerable effort to ensure workload isolation and permission control.
  
  Operational Complexity: Maintaining Kubernetes clusters at scale involves a significant operational burden, including updates, monitoring, and configuration management.
  
  Gitpod is now focusing on Gitpod Flex, a new solution tailored to better meet the demands of developers, offering improved scalability, efficiency, and simplicity.
  
  Kubernetes Gitpod

Tags

Kubernetes

Gitpod

Annotators

pyxelr

URL

gitpod.io/blog/we-are-leaving-kubernetes
kubernetes.io

Kubernetes 1.31: Read Only Volumes Based On OCI Artifacts (alpha)

1
1. pyxelr 17 Nov 2024
  
  in Public
  
  Data scientists, MLOps engineers, or AI developers, can mount large language model weights or machine learning model weights in a pod alongside a model-server, so that they can efficiently serve them without including them in the model-server container image. They can package these in an OCI object to take advantage of OCI distribution and ensure efficient model deployment. This allows them to separate the model specifications/content from the executables that process them.
  
  The introduction of the Image Volume Source feature in Kubernetes 1.31 allows MLOps practitioners to mount OCI-compatible artifacts, such as large language model weights or machine learning models, directly into pods without embedding them in container images. This streamlines model deployment, enhances efficiency, and leverages OCI distribution mechanisms for effective model management.
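
A Pod that mounts model weights from an OCI artifact might be described roughly as follows. The sketch builds the manifest as a Python dictionary and prints it as JSON, which `kubectl` accepts alongside YAML. The `image` volume with `reference` and `pullPolicy` follows the Kubernetes 1.31 announcement; the registry URLs, names, and mount path are hypothetical, and the feature is alpha, so it presumably needs the `ImageVolume` feature gate and a supporting container runtime.

```python
import json


def model_server_pod(server_image: str, model_artifact: str, mount_path: str = "/models") -> dict:
    """Build a Pod manifest that mounts an OCI artifact as a read-only image volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "model-server"},  # hypothetical name
        "spec": {
            "containers": [{
                "name": "server",
                "image": server_image,
                "volumeMounts": [{"name": "model-weights", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "model-weights",
                # Image volume source: the weights live in their own OCI object.
                "image": {"reference": model_artifact, "pullPolicy": "IfNotPresent"},
            }],
        },
    }


pod = model_server_pod(
    "registry.example.com/serving/model-server:1.0",  # hypothetical image
    "registry.example.com/models/llm-weights:v1",     # hypothetical artifact
)
print(json.dumps(pod, indent=2))
```

Keeping the weights out of the server image means either one can be updated and redistributed without rebuilding the other.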
  
  Kubernetes MLOps LLM LLMOps

Tags

LLM

Kubernetes

MLOps

LLMOps

Annotators

pyxelr

URL

kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/
www.devopsdigest.com

Optimizing Kubernetes Costs with Multi-Tenancy and Virtual Clusters | DEVOPSdigest

1
1. pyxelr 03 Nov 2024
  
  in Public
  
  Optimizing Kubernetes Costs with Multi-Tenancy and Virtual Clusters
  
  The blog post by Cliff Malmborg from Loft Labs discusses optimizing Kubernetes costs using multi-tenancy and virtual clusters. With Kubernetes expenses rising rapidly at scale, traditional cost-saving methods like autoscaling, resource quotas, and monitoring tools help but are not enough for complex environments where underutilized clusters are common. Multi-tenancy enables resource sharing, reducing the number of clusters and, in turn, management and operational costs.
  
  A virtual cluster is a fully functional Kubernetes cluster running within a larger host cluster, providing better isolation and flexibility than namespaces. Unlike namespaces, each virtual cluster has its own Kubernetes control plane, so resources like StatefulSets and webhooks are isolated within it, while only core resources (like pods and services) are shared with the host cluster. This setup addresses the "noisy neighbor" problem, where workloads in a shared environment interfere with each other due to resource contention.
  
  Virtual clusters offer the isolation benefits of individual physical clusters but are cheaper and easier to manage than deploying separate physical clusters for each tenant or application. They also support "sleep mode," automatically scaling down unused resources to save costs, and allow shared use of central tools (like ingress controllers) installed in the host cluster. By transitioning to virtual clusters, companies can balance security, isolation, and cost-effectiveness, reducing the need for multiple physical clusters and making Kubernetes infrastructure scalable for modern, resource-demanding applications.
  
  Kubernetes MLOps FinOps

Tags

Kubernetes

MLOps

FinOps

Annotators

pyxelr

URL

devopsdigest.com/optimizing-kubernetes-costs-with-multi-tenancy-and-virtual-clusters
Jun 2024
spacelift.io

How to Run Kubernetes on AWS - Tutorial

5
1. pyxelr 09 Jun 2024
  
  in Public
  
  Neither of the methods shown above is ideal in environments where you require several clusters or need them to be provisioned in a consistent way by multiple people.
  
  In this case, IaC is favored over using EKS directly or manually deploying on EC2
  
  Kubernetes AWS EKS EC2 IaC Terraform
2. pyxelr 09 Jun 2024
  
  in Public
  
  Running a cluster directly on EC2 also gives you the choice of using any available Kubernetes distribution, such as Minikube, K3s, or standard Kubernetes as deployed by Kubeadm.
  
  Kubernetes AWS EKS EC2 Minikube K3s Kubeadm
3. pyxelr 09 Jun 2024
  
  in Public
  
  EKS is popular because it’s so simple to configure and maintain. You don’t need to understand the details of how Kubernetes works or how Nodes are joined to your cluster and secured. The EKS service automates cluster management procedures, leaving you free to focus on your workloads. This simplicity can come at a cost, though: you could find EKS becomes inflexible as you grow, and it might be challenging to migrate from if you switch to a different cloud provider.
  
  Why use EKS
  
  Kubernetes AWS EKS
4. pyxelr 09 Jun 2024
  
  in Public
  
  The EKS managed Kubernetes engine isn’t included in the free tier. You’ll always be billed $0.10 per hour for each cluster you create, in addition to the EC2 or Fargate costs associated with your Nodes. The basic EKS charge only covers the cost of running your managed control plane. Even if you don’t use EKS, you’ll still need to pay to run Kubernetes on AWS. The free tier gives you access to EC2 for 750 hours per month on a 12-month trial, but this is restricted to the t2.micro and t3.micro instance types. These only offer 1 GiB of RAM so they’re too small to run most Kubernetes distributions.
  
  Cost of EKS
  
  Kubernetes AWS EKS
5. pyxelr 09 Jun 2024
  
  in Public
  
  Some of the other benefits of Kubernetes on AWS include
  
  Benefits of using Kubernetes on AWS: scalability, cost efficiency, and high availability.
  
  Kubernetes AWS EKS
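
The cost figures in the fourth highlight above translate into a quick estimate. The sketch uses only the $0.10 per cluster-hour and 750 free-tier hours quoted there; current AWS pricing may differ, node (EC2 or Fargate) costs are deliberately left out, and the function name is an assumption for illustration.

```python
EKS_CONTROL_PLANE_USD_PER_HOUR = 0.10  # figure quoted in the highlight
FREE_TIER_EC2_HOURS_PER_MONTH = 750    # figure quoted in the highlight


def control_plane_cost(clusters: int, hours: float) -> float:
    """Control-plane charge only; node costs are billed separately."""
    return clusters * hours * EKS_CONTROL_PLANE_USD_PER_HOUR


hours_in_30_days = 24 * 30  # 720
print(f"1 cluster:  ${control_plane_cost(1, hours_in_30_days):.2f}")  # $72.00
print(f"3 clusters: ${control_plane_cost(3, hours_in_30_days):.2f}")  # $216.00

# 750 free hours cover one small instance running all month (744 hours at most).
print(24 * 31 <= FREE_TIER_EC2_HOURS_PER_MONTH)  # True
```
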

Tags

K3s

Kubernetes

IaC

AWS

Minikube

EC2

Kubeadm

Terraform

EKS

Annotators

pyxelr

URL

spacelift.io/blog/kubernetes-on-aws
May 2024
developers.redhat.com

Podman and Buildah for Docker users - Red Hat Developer

1
1. TylerRick 14 May 2024
  
  in Public
  
  Podman provides some extra features that help developers and operators in Kubernetes environments. There are extra commands provided by Podman that are not available in Docker.
  
  Podman compatibility Kubernetes tool

Tags

compatibility

Kubernetes

tool

Podman

Annotators

TylerRick

URL

developers.redhat.com/blog/2019/02/21/podman-and-buildah-for-docker-users/
Feb 2024
marvelousmlops.substack.com

What do teams really want from an MLOps Engineer?

1
1. pyxelr 02 Feb 2024
  
  in Public
  
  We’ve (painstakingly) manually reviewed 310 live MLOps positions, advertised across various platforms in Q4 this year
  
  They went through 310 role descriptions and, even though role descriptions may vary significantly, they found 3 core skills that a large percentage of MLOps roles required:
  
  📦 Docker and Kubernetes 🐍 Python 🌥 Cloud
  
  MLOps career Docker Kubernetes Python Cloud

Tags

Kubernetes

Docker

Cloud

MLOps

Python

career

Annotators

pyxelr

URL

marvelousmlops.substack.com/p/what-do-teams-really-want-from-an
Nov 2023
aws.amazon.com

Amazon EKS extended support for Kubernetes versions available in preview | Amazon Web Services

1
1. pyxelr 14 Nov 2023
  
  in Public
  
  You can now run Amazon EKS clusters on a Kubernetes version for up to 26 months from the time the version is generally available on Amazon EKS.
  
  AWS Amazon EKS Kubernetes

Tags

Kubernetes

EKS

AWS

Amazon

Annotators

pyxelr

URL

aws.amazon.com/blogs/containers/amazon-eks-extended-support-for-kubernetes-versions-available-in-preview/
gitlab.com

Document how backups are supposed to work with containerized Gitlab (docker/kubernetes) (#1960) · Issues · GitLab.org / omnibus-gitlab · GitLab

1
1. TylerRick 13 Nov 2023
  
  in Public
  
  But rather than do all that work to identify the running pod, why not run the backup using the deployment? (That's what I've always done)
  
  kubectl exec deployments/gitlab --namespace gitlab -- gitlab-rake gitlab:backup:create
  
  Kubernetes

Tags

Kubernetes

Annotators

TylerRick

URL

gitlab.com/gitlab-org/omnibus-gitlab/-/issues/1960
Oct 2023
docs.gitlab.com

Grant users Kubernetes access | GitLab

1
1. TylerRick 23 Oct 2023
  
  in Public
  
  User impersonation workflow
  
  user impersonation feature Kubernetes

Tags

Kubernetes

user impersonation feature

Annotators

TylerRick

URL

docs.gitlab.com/ee/user/clusters/agent/user_access.html
Mar 2023
kserve.io

The Scalability Problem - KServe Documentation Website

2
1. pyxelr 14 Mar 2023
  
  in Public
  
  cluster with 4096 IP addresses can deploy at most 1024 models assuming each InferenceService has 4 pods on average (two transformer replicas and two predictor replicas).
  
  Kubernetes clusters have a maximum IP address limitation
  
  Kubernetes Kubeflow KServe MLOps
2. pyxelr 14 Mar 2023
  
  in Public
  
  According to Kubernetes best practice, a node shouldn't run more than 100 pods.
  
  Kubernetes Kubeflow KServe MLOps
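
The two limits quoted in these highlights reduce to simple arithmetic. This is a small sketch that uses only the figures above (4096 IP addresses, 4 pods per InferenceService, 100 pods per node); the function names are assumptions for illustration.

```python
def max_models_by_ip(ip_addresses: int, pods_per_service: int) -> int:
    """Each pod needs its own IP, so the IP pool caps the number of models."""
    return ip_addresses // pods_per_service


def min_nodes_for_pods(total_pods: int, max_pods_per_node: int) -> int:
    """Smallest node count that stays under the per-node pod limit (ceiling division)."""
    return -(-total_pods // max_pods_per_node)


print(max_models_by_ip(4096, 4))      # 1024
print(min_nodes_for_pods(4096, 100))  # 41
```
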

Tags

Kubernetes

Kubeflow

MLOps

KServe

Annotators

pyxelr

URL

kserve.io/website/0.10/modelserving/mms/multi-model-serving/
Feb 2023
aruva.medium.com

~100 Bash Aliases for supersonic Productivity

1
1. pyxelr 27 Feb 2023
  
  in Public
  
  Kubernetes Aliases
  
  Some of my favourite k8s aliases:
  
  Bash Kubernetes

Tags

Bash

Kubernetes

Annotators

pyxelr

URL

aruva.medium.com/100-bash-aliases-for-supersonic-productivity-d54a796422d9
Jan 2023
www.mankier.com

kubectl-auth-can-i command man page - kubernetes-client

1
1. pyxelr 18 Jan 2023
  
  in Public
  
  kubectl auth can-i
  
  Command to check whether an action is allowed
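
A permission check could be scripted around this command roughly as follows. The sketch assumes `kubectl` is installed and configured for a cluster, and that the command prints `yes` or `no`; the helper names and the `kubeflow` namespace are hypothetical.

```python
import subprocess


def can_i_command(verb: str, resource: str, namespace: str = None) -> list:
    """Build the argument list for `kubectl auth can-i VERB RESOURCE`."""
    cmd = ["kubectl", "auth", "can-i", verb, resource]
    if namespace:
        cmd += ["--namespace", namespace]
    return cmd


def can_i(verb: str, resource: str, namespace: str = None) -> bool:
    """Run the check against the current kubeconfig context."""
    result = subprocess.run(
        can_i_command(verb, resource, namespace),
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() == "yes"


# Print the command without running it, so the sketch works without a cluster.
print(" ".join(can_i_command("create", "pods", "kubeflow")))
# kubectl auth can-i create pods --namespace kubeflow
```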
  
  Kubernetes Kubeflow kubectl MLOps

Tags

Kubeflow

Kubernetes

MLOps

kubectl

Annotators

pyxelr

URL

mankier.com/1/kubectl-auth-can-i
Dec 2022
www.digitalocean.com

Introduction to Containers | DigitalOcean

2
1. kip2 07 Dec 2022
  
  in Public
  
  Kubernetes is a purpose-built container orchestration solution
  
  kubernetes
2. kip2 07 Dec 2022
  
  in Public
  
  While a full dive into container orchestration is beyond the scope of this article, two prominent players are Docker with Docker Compose and Docker Swarm mode, and Kubernetes. In rough order of complexity, Docker Compose is a container orchestration solution that deals with multi-container deployments on a single host. When there are multiple hosts involved, Docker Swarm mode is required.
  
  docker docker-compose kubernetes
Visit annotations in context

Tags

docker

docker-compose

kubernetes

Annotators

kip2

URL

digitalocean.com/community/conceptual-articles/introduction-to-containers
Nov 2022
github.com github.com

alexellis/arkade: Open Source Kubernetes Marketplace

1
1. almereyda 28 Nov 2022
  
  in Public
  
  Kubernetes
Visit annotations in context

Tags

Kubernetes

Annotators

almereyda

URL

github.com/alexellis/arkade
www.digitalocean.com www.digitalocean.com

Introduction to Containers | DigitalOcean

1
1. kip2 01 Nov 2022
  
  in Public
  
  Orchestration involves provisioning, configuration, scheduling, scaling, monitoring, deployment, and more. Kubernetes is an example of a popular container orchestration solution.
  
  orchestration containers kubernetes
Visit annotations in context

Tags

containers

orchestration

kubernetes

Annotators

kip2

URL

digitalocean.com/community/conceptual-articles/introduction-to-containers
Jan 2022
developers.redhat.com developers.redhat.com

Why Kubernetes native instead of cloud native? | Red Hat Developer

2
1. pyxelr 25 Jan 2022
  
  in Public
  
  Adopting Kubernetes-native environments ensures true portability for the hybrid cloud. However, we also need a Kubernetes-native framework to provide the "glue" for applications to seamlessly integrate with Kubernetes and its services. Without application portability, the hybrid cloud is relegated to an environment-only benefit. That framework is Quarkus.
  
  Quarkus framework
  
  Kubernetes MLOps Quarkus Java
2. pyxelr 25 Jan 2022
  
  in Public
  
  Kubernetes-native is a specialization of cloud-native, and not divorced from what cloud native defines. Whereas a cloud-native application is intended for the cloud, a Kubernetes-native application is designed and built for Kubernetes.
  
  Kubernetes-native application
  
  Kubernetes MLOps
Visit annotations in context

Tags

Kubernetes

Java

Quarkus

MLOps

Annotators

pyxelr

URL

developers.redhat.com/blog/2020/04/08/why-kubernetes-native-instead-of-cloud-native
Nov 2021
thenewstack.io thenewstack.io

Living with Kubernetes: Debug Clusters in 8 Commands - The New Stack

8
1. pyxelr 11 Nov 2021
  
  in Public
  
  If for some reason you don’t see a running pod from this command, then using kubectl describe po a is your next-best option. Look at the events to find errors for what might have gone wrong.
  
  kubectl run a --image alpine --command -- /bin/sleep 1d
  
  Kubernetes MLOps
2. pyxelr 11 Nov 2021
  
  in Public
  
  As with listing nodes, you should first look at the status column and look for errors. The ready column will show how many pods are desired and how many are running.
  
  kubectl get pods -A -o wide
  
  Kubernetes MLOps
3. pyxelr 11 Nov 2021
  
  in Public
  
  -o wide option will tell us additional details like operating system (OS), IP address and container runtime. The first thing you should look for is the status. If the node doesn’t say “Ready” you might have a problem, but not always.
  
  kubectl get nodes -o wide
  
  Kubernetes MLOps
4. pyxelr 11 Nov 2021
  
  in Public
  
  This command will be the easiest way to discover if your scheduler, controller-manager and etcd node(s) are healthy.
  
  kubectl get componentstatus
  
  Kubernetes MLOps
5. pyxelr 11 Nov 2021
  
  in Public
  
  If something broke recently, you can look at the cluster events to see what was happening before and after things broke.
  
  kubectl get events -A
  
  Kubernetes MLOps
6. pyxelr 11 Nov 2021
  
  in Public
  
  this command will tell you what CRDs (custom resource definitions) have been installed in your cluster and what API version each resource is at. This could give you some insights into looking at logs on controllers or workload definitions.
  
  kubectl api-resources -o wide --sort-by name
  
  Kubernetes MLOps
7. pyxelr 11 Nov 2021
  
  in Public
  
  kubectl get --raw '/healthz?verbose'
  
  Alternative to kubectl get componentstatus. It does not show scheduler or controller-manager output, but it adds many additional checks that might be valuable if things are broken.
  
  Kubernetes MLOps
8. pyxelr 11 Nov 2021
  
  in Public
  
  Here are the eight commands to run
  
  8 commands to debug Kubernetes cluster:
  
  kubectl version --short

  kubectl cluster-info

  kubectl get componentstatus

  kubectl api-resources -o wide --sort-by name

  kubectl get events -A

  kubectl get nodes -o wide

  kubectl get pods -A -o wide

  kubectl run a --image alpine --command -- /bin/sleep 1d
  
  Kubernetes MLOps
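The eight commands above could be kept in a small script so they run in the same order every time. This is a minimal sketch: the names `DEBUG_COMMANDS` and `run_debug_commands` are hypothetical, and the default is a dry run because the last command creates a pod. Flags are copied from the annotation as written. `componentstatus` has been deprecated since Kubernetes v1.19, and newer `kubectl` releases may reject `--short`, so adjust for your version.

```python
import shlex
import shutil
import subprocess

# The eight commands exactly as listed in the annotation, in order.
DEBUG_COMMANDS = [
    "kubectl version --short",
    "kubectl cluster-info",
    "kubectl get componentstatus",
    "kubectl api-resources -o wide --sort-by name",
    "kubectl get events -A",
    "kubectl get nodes -o wide",
    "kubectl get pods -A -o wide",
    "kubectl run a --image alpine --command -- /bin/sleep 1d",
]


def run_debug_commands(dry_run=True):
    """Print each command; execute it only if dry_run is False and kubectl exists."""
    have_kubectl = shutil.which("kubectl") is not None
    for command in DEBUG_COMMANDS:
        print(f"$ {command}")
        if not dry_run and have_kubectl:
            # check=False: keep going so later commands still report their state
            subprocess.run(shlex.split(command), check=False)
    return list(DEBUG_COMMANDS)


if __name__ == "__main__":
    run_debug_commands(dry_run=True)
```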
Visit annotations in context

Tags

Kubernetes

MLOps

Annotators

pyxelr

URL

thenewstack.io/living-with-kubernetes-debug-clusters-in-8-commands/
Oct 2021
medium.com medium.com

What to consider before choosing Argo Workflow?

4
1. pyxelr 10 Oct 2021
  
  in Public
  
  Argo Workflow is part of the Argo project, which offers a range of, as they like to call it, Kubernetes-native get-stuff-done tools (Workflow, CD, Events, Rollouts).
  
  High level definition of Argo Workflow
  
  Kubernetes MLOps Argo
2. pyxelr 10 Oct 2021
  
  in Public
  
  Argo is designed to run on top of k8s. Not a VM, not AWS ECS, not Container Instances on Azure, not Google Cloud Run or App Engine. This means you get all the good of k8s, but also the bad.
  
  Pros of Argo Workflow:
  
  Resilience
  
  Autoscaling
  
  Configurability
  
  Support for RBAC
  
  Cons of Argo Workflow:
  
  A lot of YAML files required
  
  k8s knowledge required
  
  Kubernetes MLOps Argo
3. pyxelr 10 Oct 2021
  
  in Public
  
  If you are already heavily invested in Kubernetes, then yes look into Argo Workflow (and its brothers and sisters from the parent project). The broader and harder question you should ask yourself is: to go full k8s-native or not? Look at your team’s cloud and k8s experience, size, growth targets. Most probably you will land somewhere in the middle first, as there is no free lunch.
  
  Should you go into Argo, or not?
  
  Argo Kubernetes MLOps
4. pyxelr 10 Oct 2021
  
  in Public
  
  In order to reduce the number of lines of text in Workflow YAML files, use WorkflowTemplate. This allows for re-use of common components.
  
  kind: WorkflowTemplate
  
  argo Kubernetes MLOps
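To make the WorkflowTemplate tip concrete, the sketch below builds two manifests as Python dictionaries and prints them as JSON, which `kubectl` accepts alongside YAML. The names `shared-steps` and `say`, the `alpine:3.19` image, and the echo step are all hypothetical placeholders. The point is only that the Workflow carries a short `workflowTemplateRef` instead of repeating the template body.

```python
import json

TEMPLATE_NAME = "shared-steps"  # hypothetical name for the reusable template

# Reusable definition: written once, referenced by many Workflows.
workflow_template = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "WorkflowTemplate",
    "metadata": {"name": TEMPLATE_NAME},
    "spec": {
        "entrypoint": "say",
        "templates": [
            {
                "name": "say",
                "container": {
                    "image": "alpine:3.19",  # placeholder image
                    "command": ["echo"],
                    "args": ["hello from a shared template"],
                },
            }
        ],
    },
}

# A Workflow that reuses the template by reference instead of copying it.
workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "uses-shared-steps-"},
    "spec": {"workflowTemplateRef": {"name": TEMPLATE_NAME}},
}


def to_manifest(obj):
    """Serialize a manifest dictionary to indented JSON."""
    return json.dumps(obj, indent=2)


if __name__ == "__main__":
    print(to_manifest(workflow_template))
    print(to_manifest(workflow))
```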
Visit annotations in context

Tags

Kubernetes

Argo

argo

MLOps

Annotators

pyxelr

URL

medium.com/datamindedbe/what-to-consider-before-choosing-argo-workflow-54f6067307a8
Sep 2021
podman.io podman.io

What is podman?

2
1. TylerRick 29 Sep 2021
  
  in Public
  
  Hence, Podman allows the creation and execution of Pods from a Kubernetes YAML file (see podman-play-kube). Podman can also generate Kubernetes YAML based on a container or Pod (see podman-generate-kube), which allows for an easy transition from a local development environment to a production Kubernetes cluster.
  
  podman kubernetes interoperability compatibility
2. TylerRick 29 Sep 2021
  
  in Public
  
  We believe that Kubernetes is the de facto standard for composing Pods and for orchestrating containers, making Kubernetes YAML a de facto standard file format.
  
  opinion podman kubernetes de facto standard
Visit annotations in context

Tags

compatibility

kubernetes

opinion

de facto standard

podman

interoperability

Annotators

TylerRick

URL

podman.io/whatis.html
matt-rickard.com matt-rickard.com

An Overview of Docker Desktop Alternatives

1
1. pyxelr 10 Sep 2021
  
  in Public
  
  kind, microk8s, or k3s are replacements for Docker Desktop. False. Minikube is the only drop-in replacement. The other tools require a Linux distribution, which makes them a non-starter on macOS or Windows. Running any of these in a VM misses the point – you don't want to be managing the Kubernetes lifecycle and a virtual machine lifecycle. Minikube abstracts all of this.
  
  At the current moment the best approach is to use minikube with a preferred backend (Docker Engine and Podman are already there), and you can simply run one command to configure Docker CLI to use the engine from the cluster.
  
  Drivers | minikube
  
  docker-env | minikube
  
  Docker minikube Kubernetes kind microk8s k3s MLOps
Visit annotations in context

Tags

k3s

Kubernetes

microk8s

Docker

minikube

MLOps

kind

Annotators

pyxelr

URL

matt-rickard.com/docker-desktop-alternatives/
Aug 2021
zhuanlan.zhihu.com zhuanlan.zhihu.com

Dissecting kube-proxy: A Worked Analysis of How iptables Mode Is Implemented

1
1. jtr109 24 Aug 2021
  
  in Public
  
  First, let's look at the OUTPUT rules of the nat table:
  
  You can view the result with the following command:
  
  sudo iptables -t nat -nvL OUTPUT
  
  iptables Kubernetes
Visit annotations in context

Tags

Kubernetes

iptables

Annotators

jtr109

URL

zhuanlan.zhihu.com/p/196393839
segmentfault.com segmentfault.com

Exploring the iptables Routing Rules Inside a K8s Service

1
1. jtr109 24 Aug 2021
  
  in Public
  
  kubectl run --image=nginx nginx-web-1 --image-pull-policy='IfNotPresent'
  
  A Deployment should be created here instead:
  
  kubectl create deployment nginx-web-1 --image=nginx
  
  Kubernetes
Visit annotations in context

Tags

Kubernetes

Annotators

jtr109

URL

segmentfault.com/a/1190000019608055
yankee.dev yankee.dev

6 Tools to Run Kubernetes Locally

7
1. pyxelr 09 Aug 2021
  
  in Public
  
  k3d is basically running k3s inside of Docker. It provides an instant benefit over using k3s on a local machine, that is, multi-node clusters. Running inside Docker, we can easily spawn multiple instances of our k3s Nodes.
  
  k3d <--- k3s that allows running multi-node clusters on a local machine
  
  Kubernetes MLOps
2. pyxelr 09 Aug 2021
  
  in Public
  
  Kubernetes in Docker (KinD) is similar to minikube but it does not spawn VMs to run clusters and works only with Docker. KinD for the most part has the least bells and whistles and offers an intuitive developer experience in getting started with Kubernetes in no time.
  
  KinD (Kubernetes in Docker) <--- sounds like the most recommended solution to learn k8s locally
  
  Kubernetes MLOps
3. pyxelr 09 Aug 2021
  
  in Public
  
  Contrary to the name, it comes in a larger binary of 150 MB+. It can be run as a binary or in DinD mode. k0s takes security seriously and out of the box, it meets the FIPS compliance.
  
  k0s <--- similar to k3s, but not as lightweight
  
  Kubernetes MLOps
4. pyxelr 09 Aug 2021
  
  in Public
  
  k3s is a lightweight Kubernetes distribution from Rancher Labs. It is specifically targeted for running on IoT and Edge devices, meaning it is a perfect candidate for your Raspberry Pi or a virtual machine.
  
  k3s <--- lightweight solution
  
  Kubernetes MLOps
5. pyxelr 09 Aug 2021
  
  in Public
  
  All of the tools listed here more or less offer the same feature, including but not limited to
  
  7 tools for learning k8s locally:
  
  k3s
  
  k0s
  
  Microk8s
  
  DinD
  
  minikube
  
  KinD
  
  k3d
  
  Kubernetes MLOps
6. pyxelr 09 Aug 2021
  
  in Public
  
  There are multiple tools for running Kubernetes on your local machine, but it basically boils down to two approaches on how it is done
  
  We can run Kubernetes locally as a:
  
  binary package
  
  container using dind
  
  Kubernetes Docker MLOps
7. pyxelr 09 Aug 2021
  
  in Public
  
  Before we move on to talk about all the tools, it will be beneficial if you installed arkade on your machine.
  
  With arkade, we can quickly set up different k8s tools, while using a single command:
  
  e.g. arkade get k9s
  
  Kubernetes MLOps
Visit annotations in context

Tags

Kubernetes

Docker

MLOps

Annotators

pyxelr

URL

yankee.dev/6-tools-to-run-kubernetes-locally
Jul 2021
medium.com medium.com

Python With Docker-Compose | FastAPI [Part-2]

1
1. pyxelr 18 Jul 2021
  
  in Public
  
  there is a drawback: docker-compose runs on a single node, which makes scaling hard, manual and very limited. To be able to scale services across multiple hosts/nodes, orchestrators like docker-swarm or kubernetes come into play.
  
  docker-compose runs on a single node (hard to scale)
  
  docker-swarm or kubernetes run on multiple nodes
  
  Docker docker-compose Kubernetes docker-swarm
Visit annotations in context

Tags

Kubernetes

docker-swarm

docker-compose

Docker

Annotators

pyxelr

URL

medium.com/swlh/python-with-docker-compose-fastapi-part-2-88e164d6ef86
www.openshift.com www.openshift.com

Kubernetes is Removing Docker Support, Kubernetes is Not Removing Docker Support

1
1. pyxelr 08 Jul 2021
  
  in Public
  
  Even though Kubernetes is moving away from Docker, it will always support the OCI and Docker image formats. Kubernetes doesn’t pull and run images itself, instead the Kubelet relies on container engines like CRI-O and containerd to pull and run the images. These are the two main container engines used with Kubernetes and they both support the Docker and OCI image formats, so no worries on this one.
  
  Reason why one should not be worried about k8s deprecating Docker
  
  Docker Kubernetes OCI
Visit annotations in context

Tags

Kubernetes

OCI

Docker

Annotators

pyxelr

URL

openshift.com/blog/kubernetes-is-removing-docker-support-kubernetes-is-not-removing-docker-support
Jun 2021
istio.io istio.io

The Istio service mesh

1
1. jtr109 17 Jun 2021
  
  in Public
  
  Secure service-to-service communication in a cluster with TLS encryption, strong identity-based authentication and authorization
  
  What about between clusters?
  
  Kubernetes Istio
Visit annotations in context

Tags

Istio

Kubernetes

Annotators

jtr109

URL

istio.io/latest/about/service-mesh/
github.com github.com

jamiehannaford/what-happens-when-k8s

1
1. jtr109 17 Jun 2021
  
  in Public
  
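  After finishing this article, Chinese readers can refer to this post to learn more. It adds some supplementary material to this article's source-code analysis.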
  
  Kubernetes Chinese
Visit annotations in context

Tags

Kubernetes

Chinese

Annotators

jtr109

URL

github.com/jamiehannaford/what-happens-when-k8s
github.com github.com

caicloud/kube-ladder

1
1. jtr109 07 Jun 2021
  
  in Public
  
  Kube-proxy client credentials
  
  Shouldn't there be a kube-proxy on every node?
  
  doubt Kubernetes
Visit annotations in context

Tags

Kubernetes

doubt

Annotators

jtr109

URL

github.com/caicloud/kube-ladder/blob/master/tutorials/lab3-manual-installtion.md
May 2021
towardsdatascience.com towardsdatascience.com

Serverless Machine Learning Pipelines with Vertex AI: An Introduction

1
1. pyxelr 20 May 2021
  
  in Public
  
  The only problem is that Kubeflow Pipelines must be deployed on a Kubernetes Cluster. You will struggle with permissions, VPC and lots of problems to deploy and use it if you are in a small company that uses sensitive data, which makes it a bit difficult to adopt. Vertex AI solves this problem with a managed pipeline runner: you can define a Pipeline and it will execute it, being responsible for provisioning all resources, storing all the artifacts you want and passing them through each of the wanted steps.
  
  How Vertex AI solves the problem/need of deploying on a Kubernetes Cluster
  
  VertexAI Kubeflow GCP Kubernetes
Visit annotations in context

Tags

VertexAI

Kubernetes

Kubeflow

GCP

Annotators

pyxelr

URL

towardsdatascience.com/serverless-machine-learning-pipelines-with-vertex-ai-an-introduction-30af8b53188e
Apr 2021
datamechanics.co datamechanics.co

Apache Spark 3.1 Release: Spark on Kubernetes is now Generally Available

1
1. pyxelr 01 Apr 2021
  
  in Public
  
  With Spark 3.1, the Spark-on-Kubernetes project is now considered Generally Available and Production-Ready.
  
  With Spark 3.1 k8s becomes the right option to replace YARN
  
  Spark Kubernetes MLOps YARN
Visit annotations in context

Tags

YARN

Kubernetes

MLOps

Spark

Annotators

pyxelr

URL

datamechanics.co/blog-post/apache-spark-3-1-release-spark-on-kubernetes-is-now-ga
Mar 2021
blog.usejournal.com blog.usejournal.com

You Don’t Need All That Complex/Expensive/Distracting Infrastructure

1
1. pyxelr 14 Mar 2021
  
  in Public
  
  Simple … a single Linode VPS.
  
  You might not need Kubernetes clusters at all and could run well on a single Linode VPS.
  
  Twitter thread: https://twitter.com/levelsio/status/1101581928489078784
  
  MLOps Kubernetes
Visit annotations in context

Tags

Kubernetes

MLOps

Annotators

pyxelr

URL

blog.usejournal.com/you-dont-need-all-that-complex-expensive-distracting-infrastructure-a70dbe0dbccb
openai.com openai.com

Scaling Kubernetes to 7,500 Nodes

2
1. pyxelr 07 Mar 2021
  
  in Public
  
  We use Prometheus to collect time-series metrics and Grafana for graphs, dashboards, and alerts.
  
  How Prometheus and Grafana can be used to collect information from running ML on K8s
  
  Kubernetes MLOps Prometheus Grafana
2. pyxelr 07 Mar 2021
  
  in Public
  
  large machine learning job spans many nodes and runs most efficiently when it has access to all of the hardware resources on each node. This allows GPUs to cross-communicate directly using NVLink, or GPUs to directly communicate with the NIC using GPUDirect. So for many of our workloads, a single pod occupies the entire node.
  
  The way OpenAI runs large ML jobs on K8s
  
  Kubernetes MLOps
Visit annotations in context

Tags

Kubernetes

Grafana

MLOps

Prometheus

Annotators

pyxelr

URL

openai.com/blog/scaling-kubernetes-to-7500-nodes/
openai.com openai.com

Scaling Kubernetes to 2,500 Nodes

6
1. pyxelr 07 Mar 2021
  
  in Public
  
  We use Kubernetes mainly as a batch scheduling system and rely on our autoscaler to dynamically scale up and down our cluster — this lets us significantly reduce costs for idle nodes, while still providing low latency while iterating rapidly.
  
  Kubernetes autoscaler
  
  MLOps Kubernetes
2. pyxelr 07 Mar 2021
  
  in Public
  
  For high availability, we always have at least 2 masters, and set the --apiserver-count flag to the number of apiservers we’re running (otherwise Prometheus monitoring can get confused between instances).
  
  Tip for high availability:
  
  have at least 2 masters
  
  set --apiserver-count flag to the number of running apiservers
  
  MLOps Kubernetes
3. pyxelr 07 Mar 2021
  
  in Public
  
  We’ve increased the max etcd size with the --quota-backend-bytes flag, and the autoscaler now has a sanity check not to take action if it would terminate more than 50% of the cluster.
  
  With more than 1k nodes, etcd can hit its hard storage limit and stop accepting writes
  
  MLOps Kubernetes etcd
4. pyxelr 07 Mar 2021
  
  in Public
  
  Another helpful tweak was storing Kubernetes Events in a separate etcd cluster, so that spikes in Event creation wouldn’t affect performance of the main etcd instances.
  
  Another trick apart from tweaking default settings of Fluentd & Datadog
  
  MLOps Kubernetes
5. pyxelr 07 Mar 2021
  
  in Public
  
  The root cause: the default setting for Fluentd’s and Datadog’s monitoring processes was to query the apiservers from every node in the cluster (for example, this issue which is now fixed). We simply changed these processes to be less aggressive with their polling, and load on the apiservers became stable again:
  
  Default settings of Fluentd and Datadog might not be suited for running many nodes
  
  Datadog Fluentd Kubernetes MLOps
6. pyxelr 07 Mar 2021
  
  in Public
  
  We then moved the etcd directory for each node to the local temp disk, which is an SSD connected directly to the instance rather than a network-attached one. Switching to the local disk brought write latency to 200us, and etcd became healthy!
  
  One of the solutions for etcd using only about 10% of the available IOPS. It worked up to about 1k nodes
  
  MLOps Kubernetes etcd
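The autoscaler sanity check quoted above (take no action if it would terminate more than 50% of the cluster) can be expressed as a small guard. This is only a sketch under assumptions: the function name and signature are hypothetical, and the source does not show how OpenAI's autoscaler actually implements the check.

```python
def scale_down_allowed(cluster_size, nodes_to_terminate, max_fraction=0.5):
    """Return True only if terminating the nodes stays within max_fraction of the cluster."""
    if nodes_to_terminate < 0:
        raise ValueError("nodes_to_terminate must be >= 0")
    if cluster_size <= 0:
        # An empty or unknown cluster size is treated as unsafe: take no action.
        return False
    return nodes_to_terminate / cluster_size <= max_fraction


if __name__ == "__main__":
    print(scale_down_allowed(2500, 1250))  # exactly half: still allowed
    print(scale_down_allowed(2500, 1251))  # more than half: refuse
```

Treating an unknown cluster size as unsafe is a deliberate choice here: a guard like this exists to catch cases where the autoscaler's view of the cluster is wrong, so failing closed is the cautious default.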
Visit annotations in context

Tags

Kubernetes

etcd

Datadog

Fluentd

MLOps

Annotators

pyxelr

URL

openai.com/blog/scaling-kubernetes-to-2500-nodes/
Dec 2020
www.pidramble.com www.pidramble.com

Home | Raspberry Pi Dramble

1
1. almereyda 10 Dec 2020
  
  in Public
  
  Drupal Kubernetes Raspberry Pi
Visit annotations in context

Tags

Raspberry Pi

Kubernetes

Drupal

Annotators

almereyda

URL

pidramble.com/home
Nov 2020
stackoverflow.com stackoverflow.com

How to join the default bridge network with docker-compose v2?

1
1. TylerRick 02 Nov 2020
  
  in Public
  
  Docker Swarm has lost. Kubernetes has won. My advice? Use docker-compose.yml for development only, stick to version: 2.4 and forget 3 exists :+1
  
  endorsement (picking a winner) advice docker kubernetes Docker Swarm
Visit annotations in context

Tags

Docker Swarm

advice

endorsement (picking a winner)

kubernetes

docker

Annotators

TylerRick

URL

stackoverflow.com/questions/43754095/how-to-join-the-default-bridge-network-with-docker-compose-v2
Oct 2020
marketplace.visualstudio.com marketplace.visualstudio.com

Bridge to Kubernetes - Visual Studio Marketplace

1
1. TylerRick 20 Oct 2020
  
  in Public
  
  VS Code: extensions Kubernetes microservices
Visit annotations in context

Tags

Kubernetes

microservices

VS Code: extensions

Annotators

TylerRick

URL

marketplace.visualstudio.com/items
www.spectrocloud.com www.spectrocloud.com

AI/ML and GPUs in Kubernetes - Spectro Cloud

1
1. simfox3 19 Oct 2020
  
  in Public
  
  Kubernetes doesn’t have the ability to schedule and manage GPU resources
  
  But it's provided as a plugin
  
  kubernetes
Visit annotations in context

Tags

kubernetes

Annotators

simfox3

URL

spectrocloud.com/blog/vGPUs-in-kubernetes-land/
May 2020
docs.gitlab.com docs.gitlab.com

Using the Container Registry | GitLab

1
1. TylerRick 28 May 2020
  
  in Public
  
  Kubernetes docker registry
Visit annotations in context

Tags

Kubernetes

docker registry

Annotators

TylerRick

URL

docs.gitlab.com/charts/charts/registry/
Apr 2020
www.reddit.com www.reddit.com

r/kubernetes - ELI5 what is Kubernetes?

1
1. pyxelr 28 Apr 2020
  
  in Public
  
  It's responsible for allocating and scheduling containers, providing them with abstracted functionality like internal networking and file storage, and then monitoring the health of all of these elements and stepping in to repair or adjust them as necessary. In short, it's all about abstracting how, when and where containers are run.
  
  Kubernetes (simple explanation)
  
  webarchitecture Kubernetes
Visit annotations in context

Tags

Kubernetes

webarchitecture

Annotators

pyxelr

URL

reddit.com/r/kubernetes/comments/a9w3tr/eli5_what_is_kubernetes/
dev.to dev.to

Architecture for Everyone - DEV Community 👩‍💻👨‍💻

5
1. pyxelr 28 Apr 2020
  
  in Public
  
  You’ll see pressure to push towards “Cloud neutral” solutions using Kubernetes in various places
  
  Maybe Kubernetes has the advantage of being cloud neutral, but: you pay the cost of a cloud migration:
  
  maintaining abstractions
  
  isolating yourself from useful vendor-specific features
  
  webarchitecture Kubernetes
2. pyxelr 28 Apr 2020
  
  in Public
  
  Heroku? App Services? App Engine?
  
  You can get yourself set up in production in anywhere from minutes to a few hours
  
  webarchitecture Kubernetes
3. pyxelr 28 Apr 2020
  
  in Public
  
  Kubernetes (often irritatingly abbreviated to k8s, along with its wonderful ecosystem of esoterically named additions like helm, and flux) requires a full-time ops team to operate, and even in “managed vendor mode” on EKS/AKS/GKE the learning curve is far steeper than the alternatives.
  
  Kubernetes:
  
  requires a full-time ops team to operate
  
  the learning curve is far steeper than the alternatives
  
  webarchitecture Kubernetes
4. pyxelr 28 Apr 2020
  
  in Public
  
  Azure App Services, Google App Engine and AWS Lambda will be several orders of magnitude more productive for you as a programmer. They’ll be easier to operate in production, and more explicable and supported.
  
  Use the closest thing to a pure-managed platform as you possibly can. It will be easier to operate in production, and more explicable and supported:
  
  Azure App Service
  
  Google App Engine
  
  AWS Lambda
  
  webarchitecture Kubernetes
5. pyxelr 28 Apr 2020
  
  in Public
  
  With the popularisation of docker and containers, there’s a lot of hype gone into things that provide “almost platform like” abstractions over Infrastructure-as-a-Service. These are all very expensive and hard work.
  
  Kubernetes isn't required unless you work on huge problems
  
  webarchitecture Kubernetes
Visit annotations in context

Tags

Kubernetes

webarchitecture

Annotators

pyxelr

URL

dev.to/david_whitney/architecture-for-everyone-1b0h
Mar 2020
pythonspeed.com pythonspeed.com

“Let’s use Kubernetes!” Now you have 8 problems

11
1. pyxelr 07 Mar 2020
  
  in Public
  
  from Docker Compose on a single machine, to Heroku and similar systems, to something like Snakemake for computational pipelines.
  
  Other alternatives to Kubernetes:
  
  Docker Compose on a single machine
  
  Heroku and similar systems
  
  Snakemake for computational pipelines
  
  Kubernetes Docker DataEngineering
2. pyxelr 07 Mar 2020
  
  in Public
  
  if what you care about is downtime, your first thought shouldn’t be “how do I reduce deployment downtime from 1 second to 1ms”, it should be “how can I ensure database schema changes don’t prevent rollback if I screw something up.”
  
  Caring about downtime
  
  Kubernetes DataEngineering
3. pyxelr 07 Mar 2020
  
  in Public
  
  The features Kubernetes provides for reliability (health checks, rolling deploys), can be implemented much more simply, or already built-in in many cases. For example, nginx can do health checks on worker processes, and you can use docker-autoheal or something similar to automatically restart those processes.
  
  Kubernetes' health checks can be replaced with nginx on worker processes + docker-autoheal to automatically restart those processes
  
  Kubernetes DataEngineering
4. pyxelr 07 Mar 2020
  
  in Public
  
  Scaling for many web applications is typically bottlenecked by the database, not the web workers.
  
  Kubernetes DataEngineering
5. pyxelr 07 Mar 2020
  
  in Public
  
  Kubernetes might be useful if you need to scale a lot. But let’s consider some alternatives
  
  Kubernetes alternatives:
  
  cloud VMs with up to 416 vCPUs and 8 TiB RAM
  
  scale many web apps with Heroku
  
  Kubernetes DataEngineering
6. pyxelr 07 Mar 2020
  
  in Public
  
  Distributed applications are really hard to write correctly. Really. The more moving parts, the more these problems come into play. Distributed applications are hard to debug. You need whole new categories of instrumentation and logging to get an understanding that isn’t quite as good as what you’d get from the logs of a monolithic application.
  
  Microservices stay as a hard nut to crack.
  
  They are fine as an organisational scaling technique: when you have 500 developers working on one live website, microservices let them work independently. For example, each team of 5 developers can be given one microservice
  
  Kubernetes DataEngineering
7. pyxelr 07 Mar 2020
  
  in Public
  
  you need to spin up a complete K8s system just to test anything, via a VM or nested Docker containers.
  
  You need a complete K8s to run your code, or you can use Telepresence to code locally against a remote Kubernetes cluster
  
  Kubernetes DataEngineering
8. pyxelr 07 Mar 2020
  
  in Public
  
  “Kubernetes is a large system with significant operational complexity. The assessment team found configuration and deployment of Kubernetes to be non-trivial, with certain components having confusing default settings, missing operational controls, and implicitly defined security controls.”
  
  Deployment of Kubernetes is non-trivial
  
  Kubernetes DataEngineering
9. pyxelr 07 Mar 2020
  
  in Public
  
  Before you can run a single application, you need the following highly-simplified architecture
  
  Before running the simplest Kubernetes app, you need at least this architecture:
  
  Kubernetes DataEngineering
10. pyxelr 07 Mar 2020
  
  in Public
  
  the Kubernetes codebase has significant room for improvement. The codebase is large and complex, with large sections of code containing minimal documentation and numerous dependencies, including systems external to Kubernetes.
  
  As of March 2020, the Kubernetes code base has more than 580 000 lines of Go code
  
  Kubernetes DataEngineering
11. pyxelr 07 Mar 2020
  
  in Public
  
  Kubernetes has plenty of moving parts—concepts, subsystems, processes, machines, code—and that means plenty of problems.
  
  Kubernetes might not be the best solution for a smaller team
  
  Kubernetes DataEngineering
Visit annotations in context

Tags

Kubernetes

Docker

DataEngineering

Annotators

pyxelr

URL

pythonspeed.com/articles/dont-need-kubernetes/
Feb 2020
github.com github.com

cloudpilots/gkubenv

1
1. TylerRick 18 Feb 2020
  
  in Public
  
  Kubernetes direnv
Visit annotations in context

Tags

Kubernetes

direnv

Annotators

TylerRick

URL

github.com/cloudpilots/gkubenv
Jan 2020
console.cloud.google.com console.cloud.google.com

Settings - Kloudless - Google Cloud Platform

1
1. TylerRick 22 Jan 2020
  
  in Public
  
  Knative
  
  Kubernetes Knative serverless
Visit annotations in context

Tags

Knative

Kubernetes

serverless

Annotators

TylerRick

URL

console.cloud.google.com/
knative.dev knative.dev

Knative

1
1. TylerRick 22 Jan 2020
  
  in Public
  
  Kubernetes
  
  containers serverless Kubernetes Knative
Visit annotations in context

Tags

Kubernetes

Knative

containers

serverless

Annotators

TylerRick

URL

knative.dev/
Jul 2019
engineering.dollarshaveclub.com engineering.dollarshaveclub.com

Elixir OTP applications on Kubernetes – DSC Engineering

1
1. SamRose 07 Jul 2019
  
  in Public
  
  elixir otp kubernetes
Visit annotations in context

Tags

otp

elixir

kubernetes

Annotators

SamRose

URL

engineering.dollarshaveclub.com/elixir-otp-applications-on-kubernetes-9944636b8609
Jun 2019
www.tweag.io www.tweag.io

Configuring and testing Kubernetes clusters with KubeNix and kind

1
1. SamRose 11 Jun 2019
  
  in Public
  
  nix k8s kubernetes
Visit annotations in context

Tags

k8s

nix

kubernetes

Annotators

SamRose

URL

tweag.io/posts/2019-03-07-configuring-and-testing-kubernetes-clusters.html
May 2019
kubernetes.io kubernetes.io

Installing kubeadm

2
1. linuxtrek 16 May 2019
  
  in Public
  
  Installing runtime
  
  apt-get install -y docker.io
  
  kubernetes
2. linuxtrek 16 May 2019
  
  in Public
  
  apt-get update && apt-get install -y apt-transport-https curl
  curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
  cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
  deb https://apt.kubernetes.io/ kubernetes-xenial main
  EOF
  apt-get update
  apt-get install -y kubelet kubeadm kubectl
  apt-mark hold kubelet kubeadm kubectl
  
  Install Docker container runtime first.
  
  apt-get install -y docker.io
  
  kubernetes
Visit annotations in context

Tags

kubernetes

Annotators

linuxtrek

URL

kubernetes.io/docs/setup/independent/install-kubeadm/

Joining your nodes

Install runtime.

sudo -i
apt-get update && apt-get upgrade -y
apt-get install -y docker.io

Install kubeadm, kubelet and kubectl.

https://kubernetes.io/docs/setup/independent/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl

apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

kubernetes

Visit annotations in context

Annotators

linuxtrek

URL

kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

Apr 2019
sookocheff.com sookocheff.com

A Guide to the Kubernetes Networking Model

1
1. SamRose 13 Apr 2019
  
  in Public
  
  kubernetes network
Visit annotations in context

Tags

network

kubernetes

Annotators

SamRose

URL

sookocheff.com/post/kubernetes/understanding-kubernetes-networking-model/
Mar 2019
testdriven.io testdriven.io

Deploying Spark on Kubernetes

1
1. SamRose 29 Mar 2019
  
  in Public
  
  apache spark sparkml kubernetes
Visit annotations in context

Tags

sparkml

apache spark

kubernetes

Annotators

SamRose

URL

testdriven.io/blog/deploying-spark-on-kubernetes/
www.shapeblock.com www.shapeblock.com

OpenShift vs Kubernetes - ShapeBlock

1
1. hotcyv 28 Mar 2019
  
  in Public
  
  OpenShift vs Kubernetes
  
  openshift kubernetes
Visit annotations in context

Tags

openshift

kubernetes

Annotators

hotcyv

URL

shapeblock.com/openshift-vs-kubernetes/
www.quora.com www.quora.com

How are OpenShift, OpenStack, Kubernetes, and Docker comparable and different? - Quora

1
1. hotcyv 28 Mar 2019
  
  in Public
  
  openstack openshift kubernetes docker
Visit annotations in context

Tags

kubernetes

docker

openstack

openshift

Annotators

hotcyv

URL

quora.com/How-are-OpenShift-OpenStack-Kubernetes-and-Docker-comparable-and-different
www.thedevelopersconference.com.br www.thedevelopersconference.com.br

DevOps Tools Track - TDC 2019 Florianópolis

1
1. cesarbrod 08 Mar 2019
  
  in Public
  
  CI/CD pipeline on Kubernetes using Jenkins and Spinnaker
  
  Wow! Many topics from the LPI DevOps exam are covered in this talk. Keep an eye on topic 702 Container Management.
  
  Containers Kubernetes Docker Jenkins Spinnaker DevOps LPI
Visit annotations in context

Tags

Kubernetes

DevOps

Docker

Containers

Jenkins

LPI

Spinnaker

Annotators

cesarbrod

URL

thedevelopersconference.com.br/tdc/2019/florianopolis/trilha-devops-tools
Feb 2019
github.com github.com

kubernetes/community

1
1. SamRose 13 Feb 2019
  
  in Public
  
  kubernetes aws
Visit annotations in context

Tags

aws

kubernetes

Annotators

SamRose

URL

github.com/kubernetes/community/blob/master/sig-aws/kubernetes-on-aws.md
forum.sentry.io forum.sentry.io

Solved: Keeping state in Sentry onpremise with Kubernetes pod

1
1. SamRose 04 Feb 2019
  
  in Public
  
  sentry on-premise kubernetes state
Visit annotations in context

Tags

kubernetes

on-premise

state

sentry

Annotators

SamRose

URL

forum.sentry.io/t/solved-keeping-state-in-sentry-onpremise-with-kubernetes-pod/724
Jan 2019
dasunhegoda.com dasunhegoda.com

Getting Started with Kubernetes and Running locally via Minikube

1
1. SamRose 17 Jan 2019
  
  in Public
  
  minikube kubernetes
Visit annotations in context

Tags

minikube

kubernetes

Annotators

SamRose

URL

dasunhegoda.com/kubernetes-running-locally-via-minikube/1204/
Dec 2018
github.com github.com

kubernetes/kops

1
1. SamRose 29 Dec 2018
  
  in Public
  
  kubernetes kops
Visit annotations in context

Tags

kops

kubernetes

Annotators

SamRose

URL

github.com/kubernetes/kops/blob/master/docs/install.md
offlinehacker.github.io offlinehacker.github.io

NixOS loves k8s

1
1. SamRose 15 Dec 2018
  
  in Public
  
  nixos kubernetes
Visit annotations in context

Tags

nixos

kubernetes

Annotators

SamRose

URL

offlinehacker.github.io/slides.kubernetes_on_nixos/
rzetterberg.github.io rzetterberg.github.io

Setting up Kubernetes on NixOS

1
1. SamRose 15 Dec 2018
  
  in Public
  
  kubernetes nixos nginx
Visit annotations in context

Tags

nixos

nginx

kubernetes

Annotators

SamRose

URL

rzetterberg.github.io/kubernetes-nixos.html
Jan 2018
blog.kubernauts.io blog.kubernauts.io

Your Container Orchestration Needs: Kubernetes vs. Mesos vs. Docker Swarm

1
1. SamRose 11 Jan 2018
  
  in Public
  
  kubernetes mesos
Visit annotations in context

Tags

mesos

kubernetes

Annotators

SamRose

URL

blog.kubernauts.io/your-container-orchestration-needs-kubernetes-vs-mesos-vs-docker-swarm-1efa9acb69be
Jul 2017
www.infoq.com www.infoq.com

The Deep Learning Framework TensorFlow in Practice on Kubernetes

1
1. flyisland 13 Jul 2017
  
  in Public
  
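  This figure shows the Inception-v3 model that Google proposed in 2015. The model reaches 95% accuracy on the ImageNet dataset. However, it has 25 million parameters, and classifying a single image takes 5 billion addition or multiplication operations.
  
  A 95% success rate requires 25,000,000 parameters!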
  
  ai tensorflow kubernetes
Visit annotations in context

Tags

ai

tensorflow

kubernetes

Annotators

flyisland

URL

infoq.com/cn/articles/practise-of-tensorflow-on-kubernetes
