Partition Leader Balancing
balances leader load across brokers. The preferred replica is the first replica in each partition's replica list, and preferred replicas are evenly distributed among brokers. Kafka tries to make the preferred replica the leader, which balances the load
Replication
followers that lag behind the leader for longer than replica.lag.time.max.ms are removed from the ISR
Kafka
a follower failure is handled by removing the lagging or failed follower from the ISR; the leader then considers only the remaining in-sync replicas when it moves the high watermark
Protocol
a record is committed once every replica in the ISR (in-sync replica set) has it; the high watermark then moves forward
Data Plane
only data up to the high watermark is visible to consumers
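A minimal sketch of the two notes above, assuming the high watermark is the smallest log end offset among the ISR members and that a log end offset is the next offset to be written. The function names and broker names are hypothetical, not Kafka APIs.

```go
package main

import "fmt"

// highWatermark returns the smallest log end offset among the in-sync
// replicas. Every offset below it is present on all ISR members.
// (Hypothetical helper for illustration, not Kafka code.)
func highWatermark(isrLogEndOffsets map[string]int64) int64 {
	first := true
	var hw int64
	for _, leo := range isrLogEndOffsets {
		if first || leo < hw {
			hw = leo
			first = false
		}
	}
	return hw
}

// visibleTo reports whether a record at offset can be read by consumers.
func visibleTo(offset, hw int64) bool {
	return offset < hw
}

func main() {
	isr := map[string]int64{"broker-1": 12, "broker-2": 10, "broker-3": 11}
	hw := highWatermark(isr)
	fmt.Println(hw, visibleTo(9, hw), visibleTo(10, hw))
}
```

Removing a lagging follower from the map raises the minimum, which is why shrinking the ISR lets the high watermark advance.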
Replication
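leaders & followers:
the leader appends records to its log with an epoch (indicates the generation, i.e. which leader term the record was written in)
a follower sends a fetch request with the offset it needs next, to sync with the leader
the leader responds with the new records and their epoch
on the follower's next fetch request for offset n, the leader knows the follower has everything up to n-1 and moves the high watermark based on that
in that fetch response the leader passes the high watermark to the follower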
course
how data is received: client -> socket -> network thread (lightweight) -> request queue -> I/O thread (CRC check & append to commit log [.index and .log files]) -> response queue -> network thread -> socket
purgatory map -> holds the request until the data is replicated
how data is fetched: client -> socket -> network thread -> request queue -> I/O thread (fetch ranges are calculated with .index) -> response queue -> network thread -> socket
purgatory map -> holds the fetch request until enough data has arrived, based on consumer config
also uses zero copy, meaning data goes from the file directly to the network socket, mostly served from the page cache
if not cached -> the read has to go to disk, which is slower; tiered storage may be needed here
Inside the Apache Kafka Broker
consumer properties => analogous to the producer's: a wait time and a minimum amount of data (fetch.max.wait.ms, fetch.min.bytes)
Kafka Broker
producer key properties => linger time -> linger.ms; batch size -> batch.size
these determine the throughput and latency of Kafka producers
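A rough sketch of how the two producer settings trade latency for throughput, assuming a batch is sent when it is full or when the linger time has elapsed, whichever comes first. The helper `batchReady` and the numbers are hypothetical; a real producer has more conditions than this.

```go
package main

import "fmt"

// batchReady reports whether a batch should be sent: either it has reached
// batchSize bytes or it has waited at least lingerMs milliseconds.
// (Simplified illustration, not the Kafka producer's actual logic.)
func batchReady(batchBytes, batchSize int, waitedMs, lingerMs int64) bool {
	return batchBytes >= batchSize || waitedMs >= lingerMs
}

func main() {
	const batchSize = 16384 // illustrative batch.size in bytes
	const lingerMs = 5      // illustrative linger.ms

	fmt.Println(batchReady(2000, batchSize, 1, lingerMs))  // small and young: keep waiting
	fmt.Println(batchReady(2000, batchSize, 5, lingerMs))  // linger expired: send
	fmt.Println(batchReady(16384, batchSize, 0, lingerMs)) // full: send immediately
}
```

A larger linger time gives batches more time to fill (better throughput), at the cost of added latency per record.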
embarrassingly parallel
independent work, where no task depends on any other
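A small sketch of what "embarrassingly parallel" looks like in Go: each goroutine works on its own input and writes only to its own output slot, so no coordination is needed beyond waiting for completion. The function `squareAll` is a made-up example.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll squares every element in its own goroutine. Each goroutine
// touches only out[i], so the tasks are fully independent.
func squareAll(in []int) []int {
	out := make([]int, len(in))
	var wg sync.WaitGroup
	for i, v := range in {
		wg.Add(1)
		go func(i, v int) {
			defer wg.Done()
			out[i] = v * v
		}(i, v)
	}
	wg.Wait() // the only synchronization point
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4}))
}
```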
Part 1
Breakdown
Dana (Donella) Meadows Lecture: Sustainable Systems
deep insight - systems thinking - a system's behaviour comes out of the system itself, its interrelationships and its goals, not out of the individual elements, people, or actors in it
Amdahl's law
splitting up tasks stops being a useful strategy once part of the work can't be split any further
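A quick worked version of the law, using the standard formula speedup = 1 / ((1 - p) + p/n), where p is the fraction of the work that can be parallelized and n is the number of workers. The function name is my own.

```go
package main

import "fmt"

// amdahlSpeedup returns the theoretical speedup for a workload in which a
// fraction p (0..1) can be parallelized across n workers.
func amdahlSpeedup(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	// With 90% of the work parallelizable, the speedup can never reach 10x,
	// no matter how many workers are added.
	for _, n := range []int{1, 10, 100, 1000} {
		fmt.Printf("n=%d speedup=%.2f\n", n, amdahlSpeedup(0.9, n))
	}
}
```

The serial 10% sets the ceiling: as n grows, the speedup approaches 1 / (1 - p).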
Containerization solves these proble
integration test with containerization looks like exactly what I need
calculate the inuse_
inuse is calculated as allocated minus freed (inuse = alloc - free)
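The same subtraction is exposed by the standard library: `runtime.MemProfileRecord` has `InUseBytes` and `InUseObjects` methods that return alloc minus free. A minimal sketch with made-up numbers (the wrapper `inUseFromCounts` is hypothetical):

```go
package main

import (
	"fmt"
	"runtime"
)

// inUseFromCounts builds a MemProfileRecord from raw counters and returns
// the in-use bytes and objects as the runtime computes them.
func inUseFromCounts(allocBytes, freeBytes, allocObjs, freeObjs int64) (int64, int64) {
	r := runtime.MemProfileRecord{
		AllocBytes:   allocBytes,
		FreeBytes:    freeBytes,
		AllocObjects: allocObjs,
		FreeObjects:  freeObjs,
	}
	return r.InUseBytes(), r.InUseObjects()
}

func main() {
	bytes, objs := inUseFromCounts(4096, 1024, 8, 2)
	fmt.Println(bytes, objs) // 3072 6
}
```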
allocations
the runtime scales the collected sample values
the runtime scales the sampled values, so the reported allocations approximate the actual ones regardless of MemProfileRate
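A sketch of how such scaling can work, modeled on my recollection of the heap sample scaling in Go's pprof encoder; treat the details as an assumption rather than the exact runtime code. It assumes an allocation of average size s is sampled with probability 1 - exp(-s/rate), so each sample is multiplied by the inverse of that probability.

```go
package main

import (
	"fmt"
	"math"
)

// scaleHeapSample estimates the real allocation count and size from the
// sampled count and size, given the sampling rate in bytes.
// (Sketch under the stated assumption, not a copy of the runtime.)
func scaleHeapSample(count, size, rate int64) (int64, int64) {
	if count == 0 || size == 0 {
		return 0, 0
	}
	if rate <= 1 {
		// Every allocation was sampled, so nothing needs scaling.
		return count, size
	}
	avgSize := float64(size) / float64(count)
	scale := 1 / (1 - math.Exp(-avgSize/float64(rate)))
	return int64(float64(count) * scale), int64(float64(size) * scale)
}

func main() {
	// One sampled 64-byte allocation at a 512 KiB rate stands in for
	// several thousand real allocations.
	fmt.Println(scaleHeapSample(1, 64, 512*1024))
}
```

Small objects get a large scale factor because they are rarely sampled; objects much larger than the rate are almost always sampled and scale by roughly 1.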
gained from optimizing this function
we should look at the percentage of total time a function accounts for rather than at individual functions in isolation, since optimizing a function with a small share probably won't make any difference
profiler labels to be missing from CPU profiles
again, use at least Go 1.18 to avoid these errors
lets you control the sampling rate of the CPU profiler
better to use Go 1.18 or later to avoid these limitations
The example above is highly simplified and omits many details around return values, frame pointers, return addresses and function inlining. In fact, as of Go 1.17, the program above may not even need any space on the stack as the small amount of data can be managed using CPU registers by the compiler
interesting: if the data is small, the compiler can keep it in CPU registers, and the stack is not used
recommend disabling optimizations when building the code being debugged
debugging should be done with compiler optimizations disabled
Go users can create their custom profiles via pprof.Profile and use the existing tools
are any custom implementations of these profiles available publicly?
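For reference, a minimal sketch of a custom profile built with `pprof.NewProfile`, `Add`, `Remove`, `Count`, and `WriteTo` from `runtime/pprof`. The profile name and the `conn` type are hypothetical, chosen only to show the pattern of tracking open resources.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// openConns tracks connections that were opened but not yet closed.
// The name is made up; NewProfile panics if the name is already taken.
var openConns = pprof.NewProfile("example.openConns")

type conn struct{ id int }

// open creates a conn and records the caller's stack in the profile.
func open(id int) *conn {
	c := &conn{id: id}
	openConns.Add(c, 1) // skip=1 starts the recorded stack at open's caller
	return c
}

// Close removes the conn from the profile.
func (c *conn) Close() { openConns.Remove(c) }

func main() {
	a, b := open(1), open(2)
	a.Close()
	fmt.Println("still open:", openConns.Count())
	// debug=1 writes a human-readable dump; debug=0 writes the binary
	// format that `go tool pprof` reads.
	openConns.WriteTo(os.Stdout, 1)
	b.Close()
}
```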
Turing & The Halting Problem
very interesting
HackAttic
haven't heard about this, but looks great
Between Java, Scala, Kotlin and search tools like ElasticSearch and Solr, or for old school data houses using Hadoop, I don't think there are *any* tech companies in the world that do not use Java or something JVM based. The career penalty for not knowing Java is incredibly high
I agree with this
I want a tiny binary, for a monitoring/logging agent, or a systems tool like docker agent, or kubectl, fantastic, Go (or Rust) is the best language to go for. I want to make REST APIs, and some DB calls - no worse choices than Go to do that. Even Python and Node are far better
would like to know more about why Go isn't suited for web servers
Project Structure
Go project structure: is this a more idiomatic approach to building microservices in Go?