34 Matching Annotations
  1. Jan 2024
    1. HAProxy may also make use of a UNIX socket to send its logs to the local syslog daemon, but it is not recommended at all, because if the syslog server is restarted while haproxy runs, the socket will be replaced and new logs will be lost.

      Unix domain socket not recommended for logging

  2. Sep 2022
    1. These connections would then be reset, potentially causing errors for our customers.

      ECMP causing connection resets

    2. However, issues would quickly surface whenever the set of load balancers changed.

      ECMP is flawed when the size of the next-hop set changes
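A minimal sketch of why a change in the set of load balancers disturbs existing connections, assuming the router picks a next hop with a plain hash-modulo-N scheme. Real ECMP implementations vary; the FNV hash, the flow-key format, and the function names here are illustrative assumptions, not taken from the annotated article.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickNextHop maps a flow key to one of n next hops using hash-modulo-n,
// a simplified stand-in for how a router might spread flows.
func pickNextHop(flow string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(flow))
	return int(h.Sum32() % uint32(n))
}

// remapped counts how many flows land on a different next hop
// when the next-hop set size changes from before to after.
func remapped(flows []string, before, after int) int {
	moved := 0
	for _, f := range flows {
		if pickNextHop(f, before) != pickNextHop(f, after) {
			moved++
		}
	}
	return moved
}

func main() {
	// Hypothetical flow keys (source address and port).
	flows := make([]string, 0, 1000)
	for i := 0; i < 1000; i++ {
		flows = append(flows, fmt.Sprintf("10.0.0.%d:%d", i%250, 40000+i))
	}
	// Removing one of four load balancers moves many flows that were
	// never on the removed one; those connections would be reset.
	fmt.Printf("%d of %d flows changed next hop\n", remapped(flows, 4, 3), len(flows))
}
```

Every flow counted by `remapped` arrives at a load balancer that holds no state for it, which is the reset scenario the quotes describe.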

  3. Sep 2021
    1. Google SRE relies on on-call playbooks, in addition to exercises such as the "Wheel of Misfortune," to prepare engineers to react to on-call events

      Runbooks are important, and so are exercises such as the Wheel of Misfortune

    2. Direct costs are neither subtle nor ambiguous. Running a service with a team that relies on manual intervention for both change management and event handling becomes expensive as the service and/or traffic to the service grows, because the size of the team necessarily scales with the load generated by the system.

      Costs associated with manual work

    3. SREs should receive a maximum of two events per 8–12-hour on-call shift. This target volume gives the on-call engineer enough time to handle the event accurately and quickly, clean up and restore normal service, and then conduct a postmortem

      Healthy page schedule for SREs

    4. SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s)

      SRE responsibility in a team

    5. we want systems that are automatic, not just automated.

      Systems should be automatic, not automated.

    6. hiring for SRE is that we end up with a team of people who (a) will quickly become bored by performing tasks by hand, and (b) have the skill set necessary to write software to replace their previously manual work, even when the solution is complicated

      What kind of engineer should be hired for SRE?

  4. Jul 2021
    1. Uprobes allow you to intercept a userspace program by inserting a debug trap instruction (int3 on an x86) that triggers a soft-interrupt

      Which assembly instruction is used to "interrupt" code execution and run our own code, e.g. for uprobes in eBPF?

  5. Jun 2021
    1. we created a virtual unit for CPU rate called "GCU" (Google Compute Units). GCUs became the standard for modeling CPU rates, and were used to maintain a mapping from each CPU architecture in our datacenters to its corresponding GCU based upon its performance.

      How to deal with heterogeneous hardware in a datacenter?

    2. The backend task is listening on its port and can serve, but is explicitly asking clients to stop sending requests.

      What is the lame-duck approach to task distribution?
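A minimal sketch of how a client might honor the lame-duck signal, assuming each backend task advertises a state that the client checks before sending new requests. The type names, task names, and the two-state model are simplifying assumptions for illustration.

```go
package main

import "fmt"

// state is a client's view of a backend task.
type state int

const (
	healthy  state = iota // accepting and serving requests
	lameDuck              // still listening and able to serve, but asking clients to stop sending
)

// task is a hypothetical backend task as seen by one client.
type task struct {
	name  string
	state state
}

// pickable returns the names of tasks a client should still send new requests to.
func pickable(tasks []task) []string {
	var out []string
	for _, t := range tasks {
		if t.state == healthy {
			out = append(out, t.name)
		}
	}
	return out
}

func main() {
	tasks := []task{{"task-0", healthy}, {"task-1", lameDuck}, {"task-2", healthy}}
	fmt.Println(pickable(tasks)) // prints [task-0 task-2]
}
```

The lame-duck task is skipped for new requests even though it could still serve them, which lets it finish in-flight work before shutting down.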

    3. When this active-request count reaches a configured limit, the client treats the backend as unhealthy and no longer sends it requests. For most backends, 100 is a reasonable limit; in the average case, requests tend to finish fast enough that it is very rare for the number of active requests from a given client to reach this limit under normal operating conditions.

      Managing task distribution by flow control
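The active-request limit can be sketched as a per-backend counter kept by the client. This is an illustration under stated assumptions: the `backend` type and its methods are hypothetical, and only the limit of 100 comes from the quoted passage.

```go
package main

import "fmt"

// maxActive is the per-backend limit; 100 is the value the quoted passage suggests.
const maxActive = 100

// backend tracks the requests this client currently has in flight to one backend task.
type backend struct {
	name   string
	active int
}

// healthy reports whether the client may still send this backend requests.
func (b *backend) healthy() bool { return b.active < maxActive }

// begin records a new in-flight request and refuses once the limit is reached.
func (b *backend) begin() bool {
	if !b.healthy() {
		return false
	}
	b.active++
	return true
}

// end records that an in-flight request finished.
func (b *backend) end() {
	if b.active > 0 {
		b.active--
	}
}

func main() {
	b := &backend{name: "task-0"}
	for i := 0; i < maxActive; i++ {
		b.begin()
	}
	fmt.Println(b.healthy()) // prints false: the limit is reached
	b.end()
	fmt.Println(b.healthy()) // prints true: one request finished
}
```

A real client would guard the counter against concurrent access; that is omitted here for brevity.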

    1. Encapsulation introduces overhead (24 bytes in the case of IPv4+GRE, to be precise), which can cause the packet to exceed the available Maximum Transmission Unit (MTU) size and require fragmentation.

      What's the overhead for IPv4+GRE encapsulation?
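The 24-byte figure is the sum of a 20-byte IPv4 header (no options) and a 4-byte base GRE header (no optional fields). A small sketch of the arithmetic follows; the 1500-byte link MTU in the usage is an assumed example value and the function names are illustrative.

```go
package main

import "fmt"

const (
	ipv4HeaderBytes = 20 // IPv4 header without options
	greHeaderBytes  = 4  // base GRE header without optional key, checksum, or sequence fields
	greOverhead     = ipv4HeaderBytes + greHeaderBytes
)

// innerMTU returns the largest inner packet that fits in one outer packet
// of the given link MTU after IPv4+GRE encapsulation.
func innerMTU(linkMTU int) int { return linkMTU - greOverhead }

// needsFragmentation reports whether an inner packet of the given size
// would exceed the link MTU once encapsulated.
func needsFragmentation(innerLen, linkMTU int) bool {
	return innerLen+greOverhead > linkMTU
}

func main() {
	fmt.Println(greOverhead)                    // prints 24
	fmt.Println(innerMTU(1500))                 // prints 1476
	fmt.Println(needsFragmentation(1500, 1500)) // prints true
}
```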

    2. As a result, we can usually use simple connection tracking, but fall back to consistent hashing when the system is under pressure (e.g., during an ongoing denial of service attack).

      What's better for TCP load balancing: connection tracking or consistent hashing?

  6. May 2021
  7. Apr 2021
    1. The router on a TOR-switch knows only two routes - to the rack's subnet and the default one, pointing to its distribution layer switch.

      How does routing in TOR switches work?
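The two-route table can be sketched as a longest-prefix-match lookup. This is a minimal illustration: the rack subnet `10.1.2.0/24`, the next-hop labels, and the helper names are hypothetical, and it uses the standard `net/netip` package (Go 1.18 or later).

```go
package main

import (
	"fmt"
	"net/netip"
)

// route pairs a destination prefix with a human-readable next hop.
type route struct {
	prefix  netip.Prefix
	nextHop string
}

// torTable builds the two-route table; the subnet and labels are hypothetical.
func torTable() []route {
	return []route{
		{netip.MustParsePrefix("10.1.2.0/24"), "local rack"},
		{netip.MustParsePrefix("0.0.0.0/0"), "distribution switch"},
	}
}

// lookup returns the next hop of the longest prefix that contains dst.
func lookup(table []route, dst string) (string, error) {
	addr, err := netip.ParseAddr(dst)
	if err != nil {
		return "", err
	}
	best, bestBits := "", -1
	for _, r := range table {
		if r.prefix.Contains(addr) && r.prefix.Bits() > bestBits {
			best, bestBits = r.nextHop, r.prefix.Bits()
		}
	}
	return best, nil
}

func main() {
	t := torTable()
	hop, _ := lookup(t, "10.1.2.17")
	fmt.Println(hop) // prints local rack
	hop, _ = lookup(t, "10.9.0.5")
	fmt.Println(hop) // prints distribution switch
}
```

Anything outside the rack's subnet falls through to the default route, so the TOR switch never needs to know about other racks.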

    2. Router - non-transparent Layer 3 device performing IP packet forwarding between multiple L3 segments.

      What is a router?

    3. Bridges transparently combine network nodes into Layer 2 segments creating Layer 2 broadcast domains. Nodes of a single segment can exchange data link layer frames with each other using either unicast (MAC) or broadcast addresses.

      What is a bridge?

  8. Mar 2021
    1. func main() {
           s := "한국어" // 3 Korean characters, encoded in 9 bytes
           byteLen := len(s)
           runeLen := utf8.RuneCountInString(s)
           runeLen2 := len([]rune(s)) // same thing as doing RuneCountInString
           fmt.Println(byteLen, runeLen, runeLen2) // prints 9 3 3
       }

      The length of a string is its number of bytes, not its number of UTF-8 code points

    2. One way to modify a string is to first convert it to a byte slice and then back to a string

      Why use byte slice vs string?
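A minimal sketch of the round trip through a byte slice. The helper name `replaceFirstByte` is illustrative; it works on bytes, so it is only safe when the first character is a single-byte (ASCII) character. For multi-byte characters, converting to `[]rune` is the analogous approach.

```go
package main

import "fmt"

// replaceFirstByte returns a copy of s with its first byte replaced by c.
func replaceFirstByte(s string, c byte) string {
	if len(s) == 0 {
		return s
	}
	b := []byte(s) // copies the string's bytes into a mutable slice
	b[0] = c
	return string(b) // copies again into a new immutable string
}

func main() {
	s := "hello"
	t := replaceFirstByte(s, 'j')
	fmt.Println(s, t) // prints hello jello
}
```

The original string is untouched because strings are immutable; both conversions allocate a copy.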

    3. If you pass an array into a function, the whole array will be copied. You can still pass a pointer to an array to not have it copied.

      Arrays are passed to functions by value (copied)
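A short sketch contrasting the two ways of passing an array; the function names are illustrative.

```go
package main

import "fmt"

// setFirstByValue receives a copy of the array, so the caller's array is unchanged.
func setFirstByValue(a [3]int) { a[0] = 99 }

// setFirstByPointer receives a pointer, so it modifies the caller's array.
func setFirstByPointer(a *[3]int) { a[0] = 99 }

func main() {
	arr := [3]int{1, 2, 3}
	setFirstByValue(arr)
	fmt.Println(arr) // prints [1 2 3]
	setFirstByPointer(&arr)
	fmt.Println(arr) // prints [99 2 3]
}
```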

    4. Arrays of different length are considered to be different incompatible types.

      Different size arrays are different types
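A small sketch showing that the length is part of an array's type, and that slicing is one common way around it. The function names are illustrative.

```go
package main

import "fmt"

// sum3 accepts only arrays of exactly three ints.
func sum3(a [3]int) int { return a[0] + a[1] + a[2] }

// sumAll accepts a slice, so it works for backing arrays of any length.
func sumAll(s []int) int {
	total := 0
	for _, v := range s {
		total += v
	}
	return total
}

func main() {
	a := [3]int{1, 2, 3}
	b := [4]int{1, 2, 3, 4}
	fmt.Printf("%T %T\n", a, b) // prints [3]int [4]int
	fmt.Println(sum3(a))        // prints 6
	// sum3(b) would not compile: [4]int is a different type from [3]int.
	fmt.Println(sumAll(a[:]), sumAll(b[:])) // prints 6 10
}
```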

    5. func main() {
           v1 := 1
           if v1 == 1 {
               v1, v2 := 2, 3
               fmt.Println(v1, v2) // prints 2 3
           }
           fmt.Println(v1) // prints 1 !
       }

      A shorthand declaration in a new block is treated as a new declaration that shadows the outer variable

    6. func main() {
           var s myStruct
           // error: non-name s.Field on the left side of :=
           s.Field, newVar := 1, 2

      Shorthand declaration doesn't work for struct fields

    7. Exceptions to that rule are global variables and function arguments:

      Global variables and function arguments can stay unused

    1. But we also use Rust to deliver services such as Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2), Amazon CloudFront, Amazon Route 53, and more.

      Amazon uses Rust for a lot of their cloud offerings

  9. Feb 2021
    1. The difference between "--" and "-f" is that one "-f" must be placed before each file name, while a single "--" is needed before all file names. Both options can be used together, the command line ordering still applies. When more than one file is specified, each file must start on a section boundary, so the first keyword of each file must be one of "global", "defaults", "peers", "listen", "frontend", "backend", and so on. A file cannot contain just a server list for example.

      With HAProxy, you can specify either a single configuration file or a collection of files.

    2. On Linux this is done using taskset (for haproxy) or using cpu-map (from the haproxy config), and the interrupts are assigned under /proc/irq. Many network interfaces support multiple queues and multiple interrupts. In general it helps to spread them across a small number of CPU cores provided they all share the same L3 cache. Please always stop irq_balance which always does the worst possible thing on such workloads.

      How to improve IRQ performance with HAProxy?

    3. It measures the time spent waiting in poll() compared to the time spent doing processing events. The ratio of polling time vs total time is called the "idle" time, it's the amount of time spent waiting for something to happen. This ratio is reported in the stats page on the "idle" line, or "Idle_pct" on the CLI.

      How to find if HAProxy is busy?
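The idle ratio described above is polling time divided by total time. A minimal sketch of that calculation follows; the function name and the choice to report 100% when no time has elapsed are assumptions for illustration, not HAProxy's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// idlePct returns the share of total time spent waiting in poll(), as a percentage.
// With no elapsed time it reports 100, which is an assumption made for this sketch.
func idlePct(polling, processing time.Duration) float64 {
	total := polling + processing
	if total == 0 {
		return 100
	}
	return 100 * float64(polling) / float64(total)
}

func main() {
	// 900 ms waiting in poll() and 100 ms processing events.
	fmt.Printf("%.0f%%\n", idlePct(900*time.Millisecond, 100*time.Millisecond)) // prints 90%
}
```

A low value means the process spends most of its time processing events rather than waiting, so it is close to saturation.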

    4. By default, since the focus is set on performance, each released object is put back into the pool it came from, and allocated objects are never freed since they are expected to be reused very soon. On the CLI, it is possible to check how memory is being used in pools thanks to the "show pools" command

      How to check how HAProxy memory is being utilized?

    5. This may be used when splice() is suspected to behave improperly or to cause performance issues, or when using strace to see the forwarded data (which do not appear when using splice()).

      "splice" is a way to transfer data between file descriptors without copying memory.

    1. the ability to enable the "defer-accept" bind option to only get notified of an incoming connection once data become available in the kernel buffers

      How to be notified of an incoming connection only once its data is ready in the kernel buffers?

    2. TCP splicing to let the kernel forward data between the two sides of a connection, thus avoiding multiple memory copies

      How to efficiently copy network packets between destinations?