75 Matching Annotations
  1. Oct 2021
    1. Argo Workflow is part of the Argo project, which offers a range of, as they like to call it, Kubernetes-native get-stuff-done tools (Workflow, CD, Events, Rollouts).

      High level definition of Argo Workflow

    2. Argo is designed to run on top of k8s. Not a VM, not AWS ECS, not Container Instances on Azure, not Google Cloud Run or App Engine. This means you get all the good of k8s, but also the bad.

      Pros of Argo Workflow:

      • Resilience
      • Autoscaling
      • Configurability
      • Support for RBAC

      Cons of Argo Workflow:

      • A lot of YAML files required
      • k8s knowledge required
    3. If you are already heavily invested in Kubernetes, then yes, look into Argo Workflow (and its brothers and sisters from the parent project). The broader and harder question you should ask yourself is: to go full k8s-native or not? Look at your team’s cloud and k8s experience, size, growth targets. Most probably you will land somewhere in the middle first, as there is no free lunch.

      Should you go into Argo, or not?

    4. To reduce the number of lines in Workflow YAML files, use WorkflowTemplate. This allows reuse of common components.

      kind: WorkflowTemplate
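
      A minimal sketch of what such a template might look like, written to a file from the shell. The names `hello-template` and `say-hello` and the `alpine:3.19` image are hypothetical, and the fields assume the `argoproj.io/v1alpha1` schema; check them against the Argo version in use.

```shell
# Write a hypothetical WorkflowTemplate manifest to disk.
# The quoted 'EOF' keeps the shell from expanding the {{...}} placeholders.
cat > hello-template.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: hello-template        # hypothetical name
spec:
  templates:
    - name: say-hello         # reusable step, hypothetical name
      inputs:
        parameters:
          - name: message
      container:
        image: alpine:3.19    # assumed image
        command: [echo]
        args: ["{{inputs.parameters.message}}"]
EOF
```

      A Workflow step could then point at it with `templateRef` (`name: hello-template`, `template: say-hello`) instead of repeating the container spec. In a cluster where the Argo CRDs are installed, `kubectl apply -f hello-template.yaml` would register it.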

  2. Sep 2021
    1. Hence, Podman allows the creation and execution of Pods from a Kubernetes YAML file (see podman-play-kube). Podman can also generate Kubernetes YAML based on a container or Pod (see podman-generate-kube), which allows for an easy transition from a local development environment to a production Kubernetes cluster.
    2. We believe that Kubernetes is the de facto standard for composing Pods and for orchestrating containers, making Kubernetes YAML a de facto standard file format.
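
      The round trip described in these two quotes could be sketched as below. It is wrapped in a function (the name `podman_roundtrip` is made up for this note) so nothing runs until it is called, and it assumes a pod with the given name already exists in Podman.

```shell
# Export an existing local pod to Kubernetes YAML, then recreate it
# from that file. $1 is the name of a pod that already exists in Podman.
podman_roundtrip() {
  pod_name=$1
  # podman-generate-kube: write the pod definition as Kubernetes YAML
  podman generate kube "$pod_name" > "${pod_name}.yaml" || return 1
  # Remove the original so the replay does not clash on names
  podman pod rm -f "$pod_name" || return 1
  # podman-play-kube: recreate the pod from the YAML file
  podman play kube "${pod_name}.yaml"
}
```

      The generated YAML file is the same artifact one would later hand to `kubectl apply -f` on a real cluster, though it may need edits before it suits production.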
    1. kind, microk8s, or k3s are replacements for Docker Desktop. False. Minikube is the only drop-in replacement. The other tools require a Linux distribution, which makes them a non-starter on macOS or Windows. Running any of these in a VM misses the point – you don't want to be managing the Kubernetes lifecycle and a virtual machine lifecycle. Minikube abstracts all of this.

      Currently, the best approach is to use minikube with a preferred backend (Docker Engine and Podman are already supported); a single command then configures the Docker CLI to use the engine from the cluster.
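
      A sketch of that setup, assuming minikube and the docker CLI are installed and the cluster uses the default Docker container runtime. The function name `use_minikube_docker` is made up for this note; it is defined but not called, so nothing starts until you run it.

```shell
# Start a local cluster on the Docker driver and point the docker CLI
# at the Docker daemon running inside that cluster.
use_minikube_docker() {
  minikube start --driver=docker || return 1
  # docker-env prints export statements (DOCKER_HOST and related
  # variables); eval applies them to the current shell
  eval "$(minikube docker-env)"
}
```

      After calling `use_minikube_docker`, commands such as `docker build` in the same shell should talk to the engine inside the cluster, so images built there are visible to Kubernetes without a registry push.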

  3. Aug 2021
    1. First, let's look at the OUTPUT rules of the nat table:

      The following command shows the result:

      sudo iptables -t nat -nvL OUTPUT
      
    1. kubectl run --image=nginx nginx-web-1 --image-pull-policy='IfNotPresent'

      A Deployment should be created here instead:

      kubectl create deployment nginx-web-1 --image=nginx
      
    1. k3d is basically running k3s inside of Docker. It provides an instant benefit over using k3s on a local machine, that is, multi-node clusters. Running inside Docker, we can easily spawn multiple instances of our k3s Nodes.

      k3d <--- k3s that can run multi-node clusters on a local machine
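
      A sketch of spawning such a multi-node cluster, assuming k3d v3 or later (where the `cluster create` subcommand and the `--servers`/`--agents` flags exist) and kubectl on the PATH. The function name and the default cluster name `demo` are made up for this note.

```shell
# Create a k3s cluster inside Docker with one server (control plane)
# node and two agent (worker) nodes, each running as a container.
create_k3d_cluster() {
  cluster_name=${1:-demo}   # hypothetical default name
  k3d cluster create "$cluster_name" --servers 1 --agents 2 || return 1
  # List the nodes to confirm that all three registered
  kubectl get nodes
}
```

      `k3d cluster delete demo` should remove the containers again once the experiment is over.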

    2. Kubernetes in Docker (KinD) is similar to minikube, but it does not spawn VMs to run clusters and works only with Docker. KinD for the most part has the fewest bells and whistles and offers an intuitive developer experience for getting started with Kubernetes in no time.

      KinD (Kubernetes in Docker) <--- sounds like the most recommended solution to learn k8s locally

    3. Contrary to the name, it comes in a larger binary of 150 MB+. It can be run as a binary or in DinD mode. k0s takes security seriously and, out of the box, it meets FIPS compliance requirements.

      k0s <--- similar to k3s, but not as lightweight

    4. k3s is a lightweight Kubernetes distribution from Rancher Labs. It is specifically targeted for running on IoT and Edge devices, meaning it is a perfect candidate for your Raspberry Pi or a virtual machine.

      k3s <--- lightweight solution

    5. All of the tools listed here more or less offer the same feature, including but not limited to

      7 tools for learning k8s locally:

      1. k3s
      2. k0s
      3. Microk8s
      4. DinD
      5. minikube
      6. KinD
      7. k3d
    6. There are multiple tools for running Kubernetes on your local machine, but it basically boils down to two approaches on how it is done

      We can run Kubernetes locally as a:

      1. binary package
      2. container using dind
    7. Before we move on to talk about all the tools, it will be beneficial if you installed arkade on your machine.

      With arkade, we can quickly set up different k8s tools, while using a single command:

      e.g. arkade get k9s

  4. Jul 2021
    1. there is a drawback: docker-compose runs on a single node, which makes scaling hard, manual and very limited. To be able to scale services across multiple hosts/nodes, orchestrators like docker-swarm or kubernetes come into play.
      • docker-compose runs on a single node (hard to scale)
      • docker-swarm or kubernetes run on multiple nodes
    1. Even though Kubernetes is moving away from Docker, it will always support the OCI and Docker image formats. Kubernetes doesn’t pull and run images itself; instead, the Kubelet relies on container engines like CRI-O and containerd to pull and run the images. These are the two main container engines used with Kubernetes, and they both support the Docker and OCI image formats, so no worries on this one.

      Reason why one should not be worried about k8s deprecating Docker

  5. Jun 2021
    1. Secure service-to-service communication in a cluster with TLS encryption, strong identity-based authentication and authorization

      What about between clusters?

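    1. After finishing this article, Chinese-speaking readers can refer to this article to learn more. It adds some supplementary material to the source code analysis given here.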

  6. May 2021
    1. The only problem is that Kubeflow Pipelines must be deployed on a Kubernetes Cluster. You will struggle with permissions, VPC and lots of problems to deploy and use it if you are in a small company that uses sensitive data, which makes it a bit difficult to adopt. Vertex AI solves this problem with a managed pipeline runner: you can define a Pipeline and it will execute it, taking responsibility for provisioning all resources, storing all the artifacts you want and passing them through each of the wanted steps.

      How Vertex AI solves the problem/need of deploying on a Kubernetes Cluster

  7. Apr 2021
    1. With Spark 3.1, the Spark-on-Kubernetes project is now considered Generally Available and Production-Ready.

      With Spark 3.1 k8s becomes the right option to replace YARN

  8. Mar 2021
    1. We use Prometheus to collect time-series metrics and Grafana for graphs, dashboards, and alerts.

      How Prometheus and Grafana can be used to collect information from running ML on K8s

    2. A large machine learning job spans many nodes and runs most efficiently when it has access to all of the hardware resources on each node. This allows GPUs to cross-communicate directly using NVLink, or GPUs to directly communicate with the NIC using GPUDirect. So for many of our workloads, a single pod occupies the entire node.

      The way OpenAI runs large ML jobs on K8s

    1. We use Kubernetes mainly as a batch scheduling system and rely on our autoscaler to dynamically scale up and down our cluster — this lets us significantly reduce costs for idle nodes, while still providing low latency while iterating rapidly.
    2. For high availability, we always have at least 2 masters, and set the --apiserver-count flag to the number of apiservers we’re running (otherwise Prometheus monitoring can get confused between instances).

      Tip for high availability:

      • have at least 2 masters
      • set --apiserver-count flag to the number of running apiservers
    3. We’ve increased the max etcd size with the --quota-backend-bytes flag, and the autoscaler now has a sanity check not to take action if it would terminate more than 50% of the cluster.

      With more than 1k nodes, etcd may hit its hard storage limit and stop accepting writes
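
      The autoscaler sanity check from the quote can be sketched as a small guard. This is only an illustration of the 50% rule, not OpenAI's actual autoscaler code; the function name and its arguments are made up for this note.

```shell
# Return 0 (safe) only if terminating $2 nodes out of a cluster of $1
# nodes would remove at most 50% of the cluster.
safe_to_scale_down() {
  total=$1
  to_terminate=$2
  # An empty cluster is never safe to act on
  [ "$total" -gt 0 ] || return 1
  # Integer form of: to_terminate / total <= 0.5
  [ $((to_terminate * 2)) -le "$total" ]
}
```

      For example, `safe_to_scale_down 1000 400 && echo proceed` would go ahead, while a request to terminate 600 of 1000 nodes would be refused.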

    4. Another helpful tweak was storing Kubernetes Events in a separate etcd cluster, so that spikes in Event creation wouldn’t affect performance of the main etcd instances.

      Another trick apart from tweaking default settings of Fluentd & Datadog

    5. The root cause: the default setting for Fluentd’s and Datadog’s monitoring processes was to query the apiservers from every node in the cluster (for example, this issue which is now fixed). We simply changed these processes to be less aggressive with their polling, and load on the apiservers became stable again:

      Default settings of Fluentd and Datadog might not be suited for running many nodes

    6. We then moved the etcd directory for each node to the local temp disk, which is an SSD connected directly to the instance rather than a network-attached one. Switching to the local disk brought write latency to 200us, and etcd became healthy!

      One of the fixes for etcd using only about 10% of the available IOPS. It held up until about 1k nodes

  9. Dec 2020
  10. Nov 2020
  11. Oct 2020
    1. Kubernetes doesn’t have the ability to schedule and manage GPU resources

      But it's provided as a plugin

  12. May 2020
  13. Apr 2020
    1. It's responsible for allocating and scheduling containers, providing them with abstracted functionality like internal networking and file storage, and then monitoring the health of all of these elements and stepping in to repair or adjust them as necessary. In short, it's all about abstracting how, when and where containers are run.

      Kubernetes (simple explanation)

    1. You’ll see pressure to push towards “Cloud neutral” solutions using Kubernetes in various places

      Kubernetes may have the advantage of being cloud neutral, but you pay the cost of a cloud migration:

      • maintaining abstractions
      • isolating yourself from useful vendor-specific features
    2. Heroku? App Services? App Engine?

      You can get yourself set up in production in anywhere from minutes to a few hours

    3. Kubernetes (often irritatingly abbreviated to k8s, along with its wonderful ecosystem of esoterically named additions like helm, and flux) requires a full time ops team to operate, and even in “managed vendor mode” on EKS/AKS/GKE the learning curve is far steeper than the alternatives.

      Kubernetes:

      • requires a full-time ops team to operate
      • the learning curve is far steeper than the alternatives
    4. Azure App Services, Google App Engine and AWS Lambda will be several orders of magnitude more productive for you as a programmer. They’ll be easier to operate in production, and more explicable and supported.

      Use the closest thing to a pure-managed platform as you possibly can. It will be easier to operate in production, and more explicable and supported:

      • Azure App Service
      • Google App Engine
      • AWS Lambda
    5. With the popularisation of docker and containers, there’s a lot of hype gone into things that provide “almost platform like” abstractions over Infrastructure-as-a-Service. These are all very expensive and hard work.

      Kubernetes isn't always required unless you work on huge problems

  14. Mar 2020
    1. from Docker Compose on a single machine, to Heroku and similar systems, to something like Snakemake for computational pipelines.

      Other alternatives to Kubernetes:

      • Docker Compose on a single machine
      • Heroku and similar systems
      • Snakemake for computational pipelines
    2. if what you care about is downtime, your first thought shouldn’t be “how do I reduce deployment downtime from 1 second to 1ms”, it should be “how can I ensure database schema changes don’t prevent rollback if I screw something up.”

      Caring about downtime

    3. The features Kubernetes provides for reliability (health checks, rolling deploys), can be implemented much more simply, or already built-in in many cases. For example, nginx can do health checks on worker processes, and you can use docker-autoheal or something similar to automatically restart those processes.

      Kubernetes' health checks can be replaced with nginx health checks on worker processes, plus docker-autoheal to automatically restart those processes

    4. Scaling for many web applications is typically bottlenecked by the database, not the web workers.
    5. Kubernetes might be useful if you need to scale a lot. But let’s consider some alternatives

      Kubernetes alternatives:

      • cloud VMs with up to 416 vCPUs and 8 TiB RAM
      • scale many web apps with Heroku
    6. Distributed applications are really hard to write correctly. Really. The more moving parts, the more these problems come into play. Distributed applications are hard to debug. You need whole new categories of instrumentation and logging to get an understanding that isn’t quite as good as what you’d get from the logs of a monolithic application.

      Microservices remain a hard nut to crack.

      They work well as an organisational scaling technique: when you have 500 developers working on one live website, microservices let them work independently. For example, each team of 5 developers can be given one microservice

    7. you need to spin up a complete K8s system just to test anything, via a VM or nested Docker containers.

      You need a complete K8s to run your code, or you can use Telepresence to code locally against a remote Kubernetes cluster

    8. “Kubernetes is a large system with significant operational complexity. The assessment team found configuration and deployment of Kubernetes to be non-trivial, with certain components having confusing default settings, missing operational controls, and implicitly defined security controls.”

      Deployment of Kubernetes is non-trivial

    9. Before you can run a single application, you need the following highly-simplified architecture

      Before running the simplest Kubernetes app, you need at least this architecture:

    10. the Kubernetes codebase has significant room for improvement. The codebase is large and complex, with large sections of code containing minimal documentation and numerous dependencies, including systems external to Kubernetes.

      As of March 2020, the Kubernetes code base has more than 580 000 lines of Go code

    11. Kubernetes has plenty of moving parts—concepts, subsystems, processes, machines, code—and that means plenty of problems.

      Kubernetes might not be the best solution for a smaller team

  15. Feb 2020
  16. Jan 2020
  17. Jul 2019
  18. Jun 2019
  19. May 2019
    1. Installing runtime

      apt-get install -y docker.io

    2. apt-get update && apt-get install -y apt-transport-https curl
       curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
       cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
       deb https://apt.kubernetes.io/ kubernetes-xenial main
       EOF
       apt-get update
       apt-get install -y kubelet kubeadm kubectl
       apt-mark hold kubelet kubeadm kubectl

      Install Docker container runtime first.

      apt-get install -y docker.io

    1. Joining your nodes

      Install runtime.

      sudo -i
      apt-get update && apt-get upgrade -y
      apt-get install -y docker.io
      

      Install kubeadm, kubelet and kubectl.

      https://kubernetes.io/docs/setup/independent/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl

      apt-get update && apt-get install -y apt-transport-https curl
      curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
      cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
      deb https://apt.kubernetes.io/ kubernetes-xenial main
      EOF
      apt-get update
      apt-get install -y kubelet kubeadm kubectl
      apt-mark hold kubelet kubeadm kubectl
      
  20. Apr 2019
  21. Mar 2019
    1. CI/CD pipeline on Kubernetes using Jenkins and Spinnaker

      Wow! Many topics from the LPI DevOps exam are covered in this talk. Keep an eye on the topic: 702 Container Management.

  22. Feb 2019
  23. Jan 2019
  24. Dec 2018
  25. Jan 2018
  26. Jul 2017
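    1. This figure shows the Inception-v3 model proposed by Google in 2015. The model can reach 95% accuracy on the ImageNet dataset. However, it has 25 million parameters, and classifying a single image takes 5 billion addition or multiplication operations.

      95% accuracy requires 25,000,000 parameters!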
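
      A quick back-of-the-envelope check on what those numbers imply. The parameter and operation counts come from the quote; the 4 bytes per parameter is an assumption (32-bit floats), not something the source states.

```shell
params=25000000            # 25 million parameters (from the quote)
ops_per_image=5000000000   # 5 billion add/multiply operations (from the quote)
bytes_per_param=4          # assumption: weights stored as 32-bit floats

# Approximate size of the weights in megabytes (10^6 bytes)
model_mb=$((params * bytes_per_param / 1000000))
# Average number of operations each parameter takes part in per image
ops_per_param=$((ops_per_image / params))

echo "weights: ${model_mb} MB, operations per parameter per image: ${ops_per_param}"
```

      Under that assumption the weights alone come to roughly 100 MB, and each parameter is used in about 200 operations per classified image.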