Hypothesis

138 Matching Annotations

May 2025
book.originit.top book.originit.top

极客时间-轻松学习，高效学习-极客邦

1
1. xuchen.xia 19 May 2025
  
  in Public
  
  group.instance.id
  
  组静态成员，避免瞬时的消费者不可用导致的rebalance
  
  kafka消费者
Visit annotations in context

Tags

kafka消费者

Annotators

xuchen.xia

URL

book.originit.top/single
Apr 2025
book.originit.top book.originit.top

极客时间-轻松学习，高效学习-极客邦

1
1. xuchen.xia 20 Apr 2025
  
  in Public
  
  ReplicaManager主要负责日志的写入以及副本的同步，如果ack配置的不是0，则需要等待副本写入满足acks才会回复响应。同时ReplicaManager会定时检查ISR是否已经不再同步，主要是看当前副本的LEO与Leader的LEO是否相同，或者上次lag的时间是否超过replica.lag.time.max.ms，默认30s
  
  Kafka副本
Visit annotations in context

Tags

Kafka副本

Annotators

xuchen.xia

URL

book.originit.top/single
book.originit.top book.originit.top

极客时间-轻松学习，高效学习-极客邦

6
1. xuchen.xia 17 Apr 2025
  
  in Public
  
  分区读取状态
  
  分区读取状态一般是Leader的状态吧
  
  Kafka问题
2. xuchen.xia 17 Apr 2025
  
  in Public
  
  可见，副本读取状态有截断中和获取中两个：当副本执行截断操作时，副本状态被设置成 Truncating；当副本被读取时，副本状态被设置成 Fetching。
  
  就是当前副本的状态，是否可以去Fetch,如果当前在截断中，就不能Fetch,而如果被delay了同理
  
  Kafka问题
3. xuchen.xia 17 Apr 2025
  
  in Public
  
  def isReplicaInSync: Boolean = lag.isDefined && lag.get <= 0
  
  副本处于ISR中居然是lag为0
  
  Kafka副本
4. xuchen.xia 17 Apr 2025
  
  in Public
  
  highWatermark
  
  各副本已经同步的消息offset, HW leader节点从所有的副本中获取LEO,取最小的LEO做HW,HW为所有已经成功写入全部副本的消息的最新位置。Follower的HW则是他的LEO和leader的HW的较小值
  
  kafka缩写
5. xuchen.xia 17 Apr 2025
  
  in Public
  
  lastStableOffset 是最新的 LSO 值，属于 Kafka 事务的概念。
  
  LSO，最早的一个未提交事务前的位点，也就是read commit能读取的位置
  
  kafka缩写
6. xuchen.xia 17 Apr 2025
  
  in Public
  
  副本对外提供读取服务
  
  Kafka副本
Visit annotations in context

Tags

Kafka问题

Kafka副本

kafka缩写

Annotators

xuchen.xia

URL

book.originit.top/single
book.originit.top book.originit.top

极客时间-轻松学习，高效学习-极客邦

3
1. xuchen.xia 16 Apr 2025
  
  in Public
  
  这是社区为了规避因多线程访问产生锁争用导致线程阻塞，从而引发请求超时问题而做的努力
  
  解决的问题
  
  当多线程同时执行该方法进行检查的时候，拿到锁的线程complete失败，而没拿到锁的线程直接跳过，那么如果没有其他线程再去处理的话，就永远也不会complete。
  
  如何解决？
  
  如果第一个线程成功将retry设置为false，那么第二个线程就会进行重试，而如果本身retry就是true，那么说明被其他线程先一步设置了，该线程不重试。但是如果多个线程顺序执行，都完成不了，那不是一样？
  
  kafka思考 Kafka问题
2. xuchen.xia 16 Apr 2025
  
  in Public
  
  bucket.flush(reinsert)
  
  相当于重新添加进时间轮，但是如果已经到了时间，那么就会执行。而且由于时间轮向后滚动，如果是最下面一级的bucket则所有元素都会过期，而上一级的bucket则会往下一级插入.
  
  由于DelayQueue的存在，时间复杂度和直接使用DelayQueue有什么不同？
  
  首先是以bucket为单位写入DelayQueue，bucket通过分层，使得每一层只有20个bucket，那么n层就只有n*20个bucket，大大减少了DelayQueue元素的数量，减小时间复杂度。
  
  kafka思考
3. xuchen.xia 16 Apr 2025
  
  in Public
  
  delayQueue
  
  为什么这里有一个存储所有TimerTaskList的DelayQueue?那我为什么不直接把所有任务往队列里面塞？这样不还是O(logn)时间复杂度吗插入
  
  Kafka问题
Visit annotations in context

Tags

Kafka问题

kafka思考

Annotators

xuchen.xia

URL

book.originit.top/single
book.originit.top book.originit.top

极客时间-轻松学习，高效学习-极客邦

8
1. xuchen.xia 15 Apr 2025
  
  in Public
  
  选择存活副本列表的第一个副本作为 Leader；选择存活副本列表作为 ISR
  
  NewPartition状态的分区，进行初始化（因为是初始化，所以数据都为空），会将存活副本设置为ISR,而将ISR第一个设置为Leader
  
  kafka思考
2. xuchen.xia 15 Apr 2025
  
  in Public
  
  基本上就是，找出 AR 列表（或给定副本列表）中首个处于存活状态，且在 ISR 列表的副本，将其作为新 Leader。
  
  Leader选举
  
  kafka分区
3. xuchen.xia 15 Apr 2025
  
  in Public
  
  即副本所在的 Broker 依然在运行中
  
  心跳机制来确定broker的状态
  
  kafka
4. xuchen.xia 15 Apr 2025
  
  in Public
  
  AR(Assigned Replicas)与ISR的区别
  
  AR表示分区的副本，而ISR表示与Leader同步的副本
  
  Kafka问题 kafka思考
5. xuchen.xia 15 Apr 2025
  
  in Public
  
  这是分区的副本列表。该列表有个专属的名称，叫 Assigned Replicas，简称 AR
  
  AR
  
  kafka缩写
6. xuchen.xia 15 Apr 2025
  
  in Public
  
  实际上消息队列都是以分区为单位组织的。而一个topic拥有多个分区，客户端可以根据一定的规则(Partitioner)往这多个分区中写入数据，而消费者则通过一定规则或者随机分配到部分partition，从partition中顺序进行消费
  
  kafka思考
7. xuchen.xia 15 Apr 2025
  
  in Public
  
  PartitionStateMachine定义的是分区是否有Leader
  
  kafka原理
8. xuchen.xia 15 Apr 2025
  
  in Public
  
  PartitionLeaderElectionStrategy 接口及其实现对象：定义 4 类分区 Leader 选举策略。你可以认为它们是发生 Leader 选举的 4 种场景。
  
  什么时候会发生分区Leader选举？
  
  Kafka问题
Visit annotations in context

Tags

kafka分区

kafka思考

kafka缩写

kafka原理

Kafka问题

kafka

Annotators

xuchen.xia

URL

book.originit.top/single
book.originit.top book.originit.top

16 | TopicDeletionManager： Topic是怎么被删除的？-极客时间

6
1. xuchen.xia 15 Apr 2025
  
  in Public
  
  代码会给所有符合状态转换的副本所在的 Broker，发送 StopReplicaRequest 请求，显式地告诉这些 Broker 停掉其上的对应副本。Kafka 的副本管理器组件（ReplicaManager）负责处理这个逻辑。后面我们会用两节课的时间专门讨论 ReplicaManager 的实现，这里你只需要了解，StopReplica 请求被发送出去之后，这些 Broker 上对应的副本就停止工作了。
  
  Kafka如何保证请求发送后能够按预期执行？
  
  通过重试兜底保证最终一致性？同时如果主从切换的话，会进行检查然后重新进行状态同步
  
  Kafka问题
2. xuchen.xia 15 Apr 2025
  
  in Public
  
  Controller给Broker发送请求是否需要保证Broker变更成功
  
  不需要，Controller发送请求后会异步的等待broker心跳中包含的响应。同时由于MQ本身的设计能够进行容错，即旧的状态如果不对，那么会进行重试，或者刷新缓存元数据
  
  Kafka问题
3. xuchen.xia 15 Apr 2025
  
  in Public
  
  源码会获取对应分区的详细数据，然后向该副本对象所在的 Broker 发送 LeaderAndIsrRequest 请求，令其同步获知，并保存该分区数据。
  
  为什么不是NewReplica状态就可以直接让其他broker同步？
  
  因为最前面判定了validReplicas,所以这些状态的变更都是允许的
  
  kafka思考
4. xuchen.xia 15 Apr 2025
  
  in Public
  
  尝试从元数据缓存中，
  
  为什么要以元数据缓存为准？
  
  Kafka问题
5. xuchen.xia 15 Apr 2025
  
  in Public
  
  doHandleStateChanges 方法
  
  副本状态变更流程
  
  kafka流程
6. xuchen.xia 15 Apr 2025
  
  in Public
  
  它会将 replicas 按照 Broker ID 进行分组。
  
  按brokerId分组进行副本状态变更
  
  kafka原理
Visit annotations in context

Tags

kafka流程

kafka原理

Kafka问题

kafka思考

Annotators

xuchen.xia

URL

book.originit.top/single
book.originit.top book.originit.top

极客时间-轻松学习，高效学习-极客邦

4
1. xuchen.xia 10 Apr 2025
  
  in Public
  
  日志是如何清理的呢？
  
  deletion策略，正常情况下，根据日志的配置，自动进行最大留存时间、最大留存大小的清理，以segment为单位,具体看kafka-log-retention任务
  
  compaction策略，topic可以设置compaction策略为compaction、delete + compaction，这样的话LogCleaner会定时进行compaction。优化策略，segments氛围cleaned和uncleaned，从uncleaned中遍历消息(在逻辑中可以根据需求读取需要的大小以节省内存)构建offsetMap来存储这一段中最新的key的offset，而从头遍历segments，清理掉offsetMap中已有的已有的消息(也就是最新消息)
  
  kafka cleaner
2. xuchen.xia 10 Apr 2025
  
  in Public
  
  (validBytes - lastIndexEntry > indexIntervalBytes)
  
  recover过程中会重置索引，然后遍历每个batch来添加索引项，这样就能避免掉原本写入消息时，一次性写入的多batch的消息最多只写入一次索引导致查询效率低的问题
  
  kafka log
3. xuchen.xia 10 Apr 2025
  
  in Public
  
  recover 方法
  
  什么时候recover？
  
  kafka在shutdown的时候会将segment刷盘并维护checkpoint和.kafka_cleanshutdown文件，每个partition一份.重启的时候会进行判定
  
  kafka log
4. xuchen.xia 03 Apr 2025
  
  in Public
  
  因为目前 Broker 端日志段新增倒计时是全局设置，这就是说，在未来的某个时刻可能同时创建多个日志段对象，这将极大地增加物理磁盘 I/O 压力。有了 rollJitterMs 值的干扰，每个新增日志段在创建时会彼此岔开一小段时间，这样可以缓解物理磁盘的 I/O 负载瓶颈。
  
  配置topic的segment.jitter.ms或者segment.ms让segment根据时间自动滚动时错峰，避免io压力过大
  
  即使segment的数据量不大，超过固定时间(默认一周)没写入数据，也会滚动segment，那如果很多segment都没写入的话，那么就会同时创建segment导致大量的io占用 rollJitterMs则是一个随机的扰动值,来源于{@link LogConfig#randomSegmentJitter()}，通过这个值可以将segment滚动的时间错开，缓解物理磁盘的 I/O 负载瓶颈
  
  kafka最佳实践
Visit annotations in context

Tags

kafka cleaner

kafka log

kafka最佳实践

Annotators

xuchen.xia

URL

book.originit.top/single
Mar 2025
book.originit.top book.originit.top

极客时间-轻松学习，高效学习-极客邦

1
1. xuchen.xia 30 Mar 2025
  
  in Public
  
  Follower 副本拉取消息后写入副本；
  
  不需要顾及acks，直接写入副本即可
  
  kafka副本
Visit annotations in context

Tags

kafka副本

Annotators

xuchen.xia

URL

book.originit.top/single
book.originit.top book.originit.top

极客时间-轻松学习，高效学习-极客邦

1
1. xuchen.xia 28 Mar 2025
  
  in Public
  
  但是为什么 AbstractFetcherThread 线程总要不断尝试去做截断呢？这是因为，分区的 Leader 可能会随时发生变化。每当有新 Leader 产生时，Follower 副本就必须主动执行截断操作，将自己的本地日志裁剪成与 Leader 一模一样的消息序列，甚至，Leader 副本本身也需要执行截断操作，将 LEO 调整到分区高水位处。
  
  当leader变更时，follower需要truncate自己的日志来保证与leader一致，而leader切换的时候，自身也需要truncate到HW来保持数据的一致性，但是这样会丢消息，当然如果没有到HW，那么需要所有节点同步的消息应该收不到回复
  
  kafka副本
Visit annotations in context

Tags

kafka副本

Annotators

xuchen.xia

URL

book.originit.top/single
book.originit.top book.originit.top

极客时间-轻松学习，高效学习-极客邦

4
1. xuchen.xia 28 Mar 2025
  
  in Public
  
  lastStableOffset
  
  LSO,和事务相关，当前如果有事务的话，没有提交的事务的消息对用户是不可见的，因此LSO在这个事务消息的前面来进行标记
  
  Kafka位点
2. xuchen.xia 28 Mar 2025
  
  in Public
  
  Leader如何知道当前的同步进度？
  
  partition的高水位线表示当前已同步的offset，这些消息是安全的，可以被consumer消费，因此通过这个可以看到消息的同步进度参考
  
  kafka副本
3. xuchen.xia 28 Mar 2025
  
  in Public
  
  如果Leader挂了，follower要达到什么标准才能成为leader？是否可能丢失数据？
  
  kafka副本
4. xuchen.xia 28 Mar 2025
  
  in Public
  
  副本同步的时候，谁主动进行同步？
  
  kafka副本
Visit annotations in context

Tags

kafka副本

Kafka位点

Annotators

xuchen.xia

URL

book.originit.top/single
Jan 2025
araji.medium.com araji.medium.com

How Tesla is using Kubernetes and Kafka to handle trillions of events per day

1
1. pyxelr 19 Jan 2025
  
  in Public
  
  How Tesla is using Kubernetes and Kafka to handle trillions of events per day
  
  Overview of Tesla's Data Infrastructure Challenges:
  
  Modern Tesla vehicles generate an enormous volume of telemetry data related to sensor readings, driver behavior, energy consumption, and more.
  
  The primary challenge is ingesting, processing, and analyzing this data at scale while maintaining real-time capabilities.
  
  Kubernetes for Orchestration:
  
  Tesla uses Kubernetes to manage containerized microservices across a distributed cloud environment.
  
  Kubernetes ensures dynamic scaling to handle fluctuating workloads, providing high availability for critical services.
  
  Each microservice is encapsulated in its own container, improving isolation and deployability.
  
  Kafka for Real-Time Event Streaming:
  
  Apache Kafka is the backbone of Tesla’s data pipeline, managing trillions of events daily from globally distributed vehicles.
  
  Kafka topics are structured to partition and replicate data efficiently, ensuring fault tolerance and high throughput.
  
  Producers (vehicles) send data to Kafka brokers, while consumers (analytics systems, data lakes) process these streams in real-time.
  
  Data Processing Pipelines:
  
  Data from Kafka is ingested into processing systems for real-time analytics, anomaly detection, and predictive maintenance.
  
  Stream processing frameworks (e.g., Apache Flink or Kafka Streams) analyze data for immediate feedback.
  
  Batch systems handle aggregation and storage in Tesla’s data lake for long-term insights.
  
  Key Technical Advantages:
  
  Scalability: Kubernetes dynamically allocates resources based on the volume of incoming data and computational requirements.
  
  Resilience: Kafka’s replication factor ensures that no single broker failure impacts the system.
  
  Low Latency: Data streams from Kafka enable Tesla to act on insights in milliseconds, critical for safety and performance monitoring.
  
  Simplified Management:
  
  The platform supports multi-cluster Kubernetes configurations for geographic data segregation.
  
  A central control plane monitors system health, manages deployments, and ensures compliance with data regulations.
  
  Future Goals and Improvements:
  
  Enhancing AI-driven analytics to derive deeper insights from vehicle data.
  
  Further optimizing Kafka’s cluster topology to improve fault tolerance and reduce operational costs.
  
  Expanding edge processing capabilities in vehicles to pre-filter data, reducing bandwidth requirements to the cloud.
  
  Kubernetes Kafka MLOps Tesla
Visit annotations in context

Tags

Tesla

Kafka

MLOps

Kubernetes

Annotators

pyxelr

URL

araji.medium.com/how-tesla-is-using-kubernetes-and-kafka-to-handle-trillions-of-events-per-day-01e6c370d49e
Aug 2023
developer.confluent.io developer.confluent.io

Kafka Data Replication Protocol: A Complete Guide

6
1. joobisb 11 Aug 2023
  
  in Public
  
  Partition Leader Balancing
  
  balance the load on leader preferred replica - first replica of each partition preferred replica is evenly distributed among brokers tries to make the preferred replica as the leader and hence balancing the load
  
  kafka replication load-balancing
2. joobisb 11 Aug 2023
  
  in Public
  
  Replication
  
  followers that lag behind replica lag time max ms is removed from the ISR
  
  kafka follower-replica replication
3. joobisb 11 Aug 2023
  
  in Public
  
  Kafka
  
  follower failure is handled by removing the lagging or failed followers from the ISR and considering only the rest and moves the high watermark
  
  kafka replication
4. joobisb 11 Aug 2023
  
  in Public
  
  Protocol
  
  a log is committed when all replicas are in sync - ISR - then we move watermark
  
  kafka replication
5. joobisb 11 Aug 2023
  
  in Public
  
  Data Plane
  
  only data upto high watermark is visible to the consumers
  
  kafka
6. joobisb 11 Aug 2023
  
  in Public
  
  Replication
  
  leaders&followers leader commits log with an epoch(indicates generation of lifetime of a log) follower fetch with offset no to sync with leader leader reponds with new record logs with epoch in the next fetch request, leader moves high watermark based on offset in request (n-1) in that response it passes the high watermark to the follower
  
  kafka replication
Visit annotations in context

Tags

replication

follower-replica

kafka

load-balancing

Annotators

joobisb

URL

developer.confluent.io/courses/architecture/data-replication/
developer.confluent.io developer.confluent.io

Apache Kafka Broker Performance: A Brief Introduction and Guide

3
1. joobisb 11 Aug 2023
  
  in Public
  
  course
  
  how data is received client -> socket -> n/w thread(lightweight) -> req queue -> i/o thread(crc check & append to commit log[.index and .log file]) -> resp queue -> n/w-> socket
  
  purgatory map -> used for req until data is replicated
  
  how data is fetched client -> socket -> n/w -> req queue -> i/o thread (fetch ranges are calcualted with .index) -> res queue -> n/w -> socket
  
  purgatory map -> waits until is arrived based on consumer config
  
  also follows zero copy - meaning data fetched from file directly to n/w thread - mostly page cached
  
  if not cached -> may need to use tiered storage
  
  kafka broker-internals
2. joobisb 11 Aug 2023
  
  in Public
  
  Inside the Apache Kafka Broker
  
  consumer properties => same as producer, time and batch size.
  
  kafka kafka-consumer-perf-test
3. joobisb 11 Aug 2023
  
  in Public
  
  Kafka Broker
  
  producer key properties => linger time -> linger.ms batch size -> batch.size
  
  these determine the throughput and latency of kafkaproducers
  
  kafka kafka-producer-perf-test
Visit annotations in context

Tags

broker-internals

kafka-producer-perf-test

kafka

kafka-consumer-perf-test

Annotators

joobisb

URL

developer.confluent.io/courses/architecture/broker/
May 2023
dalelane.co.uk dalelane.co.uk

Why are Kafka messages still on the topic after the retention time has expired?

1
1. pyxelr 29 May 2023
  
  in Public
  
  Today is 9th Feb. The oldest segment – segment 100 – still can’t be deleted by the 7-day topic retention policy, because the most recent message in that segment is only 5 days old. The result is that they can see messages dating back to 28th January on this topic, even though the 28th Jan is now 12 days ago. In a couple of days, all the messages in segment 100 will be old enough to exceed the retention threshold so that segment file will be deleted.
  
  retention.ms set to 7 days doesn't guarantee that you will only see topic messages from the last 7 days. Think of it as a threshold that the Kafka broker can use to decide when messages are eligible for being automatically deleted.
  
  Kafka
Visit annotations in context

Tags

Kafka

Annotators

pyxelr

URL

dalelane.co.uk/blog/
Mar 2023
www.theparisreview.org www.theparisreview.org

We Need the Eggs: On Annie Hall, Love, and Delusion

1
1. oswarren 15 Mar 2023
  
  in Public
  
  Wonderful conversation between Sheila Heti, her brother, and three of their friends, about the Annie Hall 'I need the eggs' joke that ends the film.
  
  humour mysticism relationships delusion conversation judaism romance tension religion myth kafka walter benjamin intellectualism
Visit annotations in context

Tags

relationships

romance

myth

religion

tension

delusion

kafka

walter benjamin

intellectualism

humour

mysticism

conversation

judaism

Annotators

oswarren

URL

theparisreview.org/blog/2022/04/22/we-need-the-eggs/
Feb 2023
hub.docker.com hub.docker.com

bitnami/kafka - Docker Image | Docker Hub

1
1. bodagovsky 27 Feb 2023
  
  in Public
  
  The Bitnami Apache Kafka docker image disables the PLAINTEXT listener for security reasons. You can enable the PLAINTEXT listener by adding the next environment variable, but remember that this configuration is not recommended for production.
  
  production note
  
  kafka note
Visit annotations in context

Tags

kafka note

Annotators

bodagovsky

URL

hub.docker.com/r/bitnami/kafka
Dec 2022
www.zhihu.com www.zhihu.com

RabbitMQ，ZeroMQ，Kafka 是一个层级的东西吗？相互之间有哪些优缺点？ - 知乎

1
1. caocao485 14 Dec 2022
  
  in Public
  
  RabbitMQ，ZeroMQ，Kafka 是一个层级的东西吗？相互之间有哪些优缺点？
  
  mq 消息队列 Rabbitmq kafka
Visit annotations in context

Tags

消息队列

mq

kafka

Rabbitmq

Annotators

caocao485

URL

zhihu.com/question/22480085
Aug 2022
time.geekbang.org time.geekbang.org

13 | 日志：日志记录真没你想象的那么简单-极客时间

1
1. caocao485 28 Aug 2022
  
  in Public
  
  没有用户态到核心态的拷贝
  
  注意点 kafka
Visit annotations in context

Tags

kafka

注意点

Annotators

caocao485

URL

time.geekbang.org/column/article/220307
www.twosigma.com www.twosigma.com

Building a High-Throughput Metrics System Using Open Source Software - Two Sigma

1
1. rocksun 07 Aug 2022
  
  in Public
  
  构建一个高吞吐量的 metrics 系统的思考过程
  
  elk metrics kafka elasticsearch
Visit annotations in context

Tags

elk

elasticsearch

kafka

metrics

Annotators

rocksun

URL

twosigma.com/articles/building-a-high-throughput-metrics-system-using-open-source-software/
developer.51cto.com developer.51cto.com

我与消息队列的八年情缘-51CTO.COM

1
1. caocao485 04 Aug 2022
  
  in Public
  
  它采用pull机制，而不是一般MQ的push模型
  
  rabbitmq是push模型，kafka是pull模型
  
  kafka 消息队列
Visit annotations in context

Tags

消息队列

kafka

Annotators

caocao485

URL

developer.51cto.com/article/695528.html
Mar 2022
medium.com medium.com

Real-time Twitter Map with JavaScript, Python, and Kafka

1
1. SamRose 05 Mar 2022
  
  in Public
  
  python kafka twitter leaflet
Visit annotations in context

Tags

leaflet

twitter

python

kafka

Annotators

SamRose

URL

medium.com/an-idea/real-time-twitter-map-with-javascript-python-and-kafka-a95d6bf34b92
Sep 2021
betterdatascience.com betterdatascience.com

How to Install Apache Kafka Using Docker — The Easy Way

7
1. pyxelr 27 Sep 2021
  
  in Public
  
  You don’t have to download them manually, as a docker-compose.yml will do that for you. Here’s the code, so you can copy it to your machine:
  
  Sample docker-compose.yml file to download both: Kafka and Zookeeper containers
  
  Kafka Zookeper Docker
2. pyxelr 27 Sep 2021
  
  in Public
  
  Kafka version 2.8.0 introduced early access to a Kafka version without Zookeeper, but it’s not ready yet for production environments.
  
  In the future, Zookeeper might be no longer needed to operate Kafka
  
  Kafka Zookeper
3. pyxelr 27 Sep 2021
  
  in Public
  
  Kafka consumer — A program you write to get data out of Kafka. Sometimes a consumer is also a producer, as it puts data elsewhere in Kafka.
  
  Simple Kafka consumer terminology
  
  Kafka
4. pyxelr 27 Sep 2021
  
  in Public
  
  Kafka producer — An application (a piece of code) you write to get data to Kafka.
  
  Simple producer terminology
  
  Kafka
5. pyxelr 27 Sep 2021
  
  in Public
  
  Kafka topic — A category to which records are published. Imagine you had a large news site — each news category could be a single Kafka topic.
  
  Simple Kafka topic terminology
  
  Kafka
6. pyxelr 27 Sep 2021
  
  in Public
  
  Kafka broker — A single Kafka Cluster is made of Brokers. They handle producers and consumers and keeps data replicated in the cluster.
  
  Simple Kafka broker terminology
  
  Kafka
7. pyxelr 27 Sep 2021
  
  in Public
  
  Kafka — Basically an event streaming platform. It enables users to collect, store, and process data to build real-time event-driven applications. It’s written in Java and Scala, but you don’t have to know these to work with Kafka. There’s also a Python API.
  
  Simple Kafka terminology
  
  Kafka
Visit annotations in context

Tags

Zookeper

Kafka

Docker

Annotators

pyxelr

URL

betterdatascience.com/how-to-install-apache-kafka-using-docker-the-easy-way/
jaceklaskowski.gitbooks.io jaceklaskowski.gitbooks.io

Running Kafka Broker in Docker · The Internals of Apache Kafka

1
1. SamRose 03 Sep 2021
  
  in Public
  
  kafka docker
Visit annotations in context

Tags

kafka

docker

Annotators

SamRose

URL

jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-docker.html
medium.com medium.com

Kafka on AWS without breaking the bank

1
1. SamRose 03 Sep 2021
  
  in Public
  
  kafka aws terraform
Visit annotations in context

Tags

terraform

aws

kafka

Annotators

SamRose

URL

medium.com/investing-in-tech/cost-effective-kafka-on-aws-6c02f9b0d7de
Oct 2020
kafka.apache.org kafka.apache.org

Apache Kafka

1
1. zajo 02 Oct 2020
  
  in Public
  
  User topics must be created and manually managed ahead of time
  
  Javadoc says: "should be created".
  
  The reason is: Auto-creation of topics may be disabled in your Kafka cluster. Auto-creation automatically applies the default topic settings such as the replicaton factor. These default settings might not be what you want for certain output topics (e.g., auto.create.topics.enable=true in the Kafka broker configuration).
  
  kafka-streams
Visit annotations in context

Tags

kafka-streams

Annotators

zajo

URL

kafka.apache.org/26/documentation/streams/developer-guide/manage-topics.html
Jun 2020
ceur-ws.org ceur-ws.org

paper43.pdf

1
1. SamRose 11 Jun 2020
  
  in Public
  
  food21 spark kafka sparql rdf
Visit annotations in context

Tags

rdf

sparql

spark

kafka

food21

Annotators

SamRose

URL

ceur-ws.org/Vol-1690/paper43.pdf
Apr 2020
en.wikipedia.org en.wikipedia.org

In the Penal Colony - Wikipedia

1
1. AnatDan 13 Apr 2020
  
  in Public
  
  In the Penal Settlement
  
  בעברית: במושבת העונשין
  
  kafka German_literature
Visit annotations in context

Tags

German_literature

kafka

Annotators

AnatDan

URL

en.wikipedia.org/wiki/In_the_Penal_Colony
Feb 2020
medium.com medium.com

Keeping your ML model in shape with Kafka, Airflow and MLFlow

1
1. SamRose 22 Feb 2020
  
  in Public
  
  airflow kafka Mlflow
Visit annotations in context

Tags

Mlflow

airflow

kafka

Annotators

SamRose

URL

medium.com/vantageai/keeping-your-ml-model-in-shape-with-kafka-airflow-and-mlflow-143d20024ba6
Jul 2019
lithub.com lithub.com

Meet the Man Who Introduced Jacques Derrida to America

1
1. chrisaldrich 24 Jul 2019
  
  in Public
  
  Macksey is also responsible for one of his first short films. “When I was at Hopkins, there was no film program. We talked to Dick [about making a short film] and he said ‘let’s do it,’ and we ended up doing a movie in which he has a small part.”
  
  The short black and white film of just a few minutes was called Fratricide and is based on the Franz Kafka story A Fratricide.
  
  <small>Caleb Deschanel on the set of Fratricide. This may likely have been his first DP job circa ’66 while a student at Johns Hopkins. Photo courtesy of classmate Henry James Korn.</small>
  
  fratricide Caleb Deschanel Franz Kafka Richard Macksey
Visit annotations in context

Tags

fratricide

Caleb Deschanel

Richard Macksey

Franz Kafka

Annotators

chrisaldrich

URL

lithub.com/meet-the-man-who-introduced-derrida-to-america/
Feb 2018
www.thisiscolossal.com www.thisiscolossal.com

A Single Book Disrupts the Foundation of a Brick Wall by Jorge Méndez Blake

1
1. otterscotter 20 Feb 2018
  
  in Public
  
  This minimal, yet poignant presence is reflected in the brick work—Kafka’s novel showcasing how a small idea can have a monumental presence.
  
  love it!
  
  kafka brick_wall art
Visit annotations in context

Tags

art

brick_wall

kafka

Annotators

otterscotter

URL

thisiscolossal.com/2018/02/the-castle-by-jorge-mendez-blake/
inscription.commons.gc.cuny.edu inscription.commons.gc.cuny.edu

Kafka, “In the Penal Colony” (trans. Ian Johnson)

3
1. jallred 01 Feb 2018
  
  in Public
  
  Now he stood there naked.
  
  Why naked? How do you read this little ritual of disrobing? What might it have to do with the comedy that happens in the preceding paragraph?
  
  allred494 kafka
2. ariqj 01 Feb 2018
  
  in Public
  
  his women
  
  Third or fourth mention of the Commandant's 'women.' Connection to 'Honour your superiors'? Objectification of marginalized groups.
  
  allred494 kafka
3. ariqj 01 Feb 2018
  
  in Public
  
  I usually kneel down at this point and observe the phenomenon.
  
  The fetishistic obsession with inscription is particularly disturbing here.
  
  allred494 kafka
Visit annotations in context

Tags

kafka

allred494

Annotators

ariqj

jallred

URL

inscription.commons.gc.cuny.edu/etexts/kafka-in-the-penal-colony-trans-ian-johnson/
Jan 2018
inscription.commons.gc.cuny.edu inscription.commons.gc.cuny.edu

Kafka, “In the Penal Colony” (trans. Ian Johnson)

14
1. ariqj 31 Jan 2018
  
  in Public
  
  Guilt is always beyond a doubt.
  
  Unlikely.
  
  allred494 kafka
2. ariqj 31 Jan 2018
  
  in Public
  
  “It would be useless to give him that information. He experiences it on his own body.
  
  But hasn't the Condemned Man already experienced this subjection and degradation in his body? (like a vandalized house) The inscription of it on his body simply embodies what has already been true.
  
  allred494 kafka
3. ariqj 31 Jan 2018
  
  in Public
  
  “That’s cotton wool?” asked the Traveler and bent down. “Yes, it is,” said the Officer smiling, “feel it for yourself.”
  
  Kafka keeps bringing our attention to the cotton, perhaps to ensure we recognize the historical implications?
  
  allred494 kafka
4. ariqj 31 Jan 2018
  
  in Public
  
  harrow
  
  Common term used in farming/planting.
  
  allred494 kafka
5. ariqj 31 Jan 2018
  
  in Public
  
  epaulettes
  
  According to Google, "an ornamental shoulder piece on an item of clothing, typically on the coat or jacket of a military uniform"
  
  allred494 kafka
6. ariqj 31 Jan 2018
  
  in Public
  
  the administration of the colony was so self-contained that even if his successor had a thousand new plans in mind, he would not be able to alter anything of the old plan, at least not for several years.
  
  Horrifying.
  
  allred494 kafka
7. ariqj 31 Jan 2018
  
  in Public
  
  vacant-looking man with a broad mouth and dilapidated hair and face
  
  The descriptors "Vacant-looking" and "dilapidated" summon up imagery of haunted houses and manors left in ruin rather than people. These terms are primarily used to describe things, not people.
  
  Why then is our "Condemned" an empty house? What has pushed him from subject to object in this way?
  
  allred494 kafka
8. ariqj 31 Jan 2018
  
  in Public
  
  Officer to the Traveler,
  
  Officer, Traveler, Condemned. Everyone is defined solely by the roles that they inhabit.
  
  allred494 kafka
9. jallred 26 Jan 2018
  
  in Public
  
  Then the Traveler heard a cry of rage from the Officer.
  
  How does affect work in this tale? What kinds of feelings are evoked in whom by what kinds of stimuli? What do these eruptions of feeling tell us about the unspoken value system that undergirds this society?
  
  allred494 kafka
10. jallred 26 Jan 2018
  
  in Public
  
  That gave rise to certain technical difficulties with fastening the needles securely, but after several attempts we were successful. We didn’t spare any efforts. And now, as the inscription is made on the body, everyone can see through the glass. Don’t you want to come closer and see the needles for yourself.”
  
  Why glass? Given that the Apparatus is a mere tool, an agent of "justice," why such pains to make its workings visible? Why talk about it so much?
  
  allred494 kafka
11. jallred 26 Jan 2018
  
  in Public
  
  The Traveler wanted to raise various questions, but after looking at the Condemned Man he merely asked, “Does he know his sentence?” “No,” said the Officer. He wished to get on with his explanation right away, but the Traveler interrupted him: “He doesn’t know his own sentence?” “No,” said the Officer once more. He then paused for a moment, as if he was asking the Traveler for a more detailed reason for his question, and said, “It would be useless to give him that information. He experiences it on his own body.”
  
  How you you read this crucial moment? Who knows what in this story, and how does Kafka exploit the lack of symmetry between Commandant, Officer, Traveler, Condemned, and so on?
  
  allred494 kafka
12. jallred 26 Jan 2018
  
  in Public
  
  “He was indeed,” said the Officer, nodding his head with a fixed and thoughtful expression. Then he looked at his hands, examining them. They didn’t seem to him clean enough to handle the diagrams. So he went to the bucket and washed them again. Then he pulled out a small leather folder and said, “Our sentence does not sound severe. The law which a condemned man has violated is inscribed on his body with the harrow. This Condemned Man, for example,” and the Officer pointed to the man, “will have inscribed on his body, ‘Honour your superiors.’”
  
  Alas, the double entendre of "sentence" as a grammatical and legal entity at once is not active in German, but the slippage certainly fits here!
  
  allred494 kafka
13. jallred 26 Jan 2018
  
  in Public
  
  “However,” the Officer said, interrupting himself, “I’m chattering, and his apparatus stands here in front of us. As you see, it consists of three parts. With the passage of time certain popular names have been developed for each of these parts. The one underneath is called the bed, the upper one is called the inscriber, and here in the middle, this moving part is called the harrow.” “The harrow?” the Traveler asked. He had not been listening with full attention. The sun was excessively strong, trapped in the shadowless valley, and one could hardly collect one’s thoughts. So the Officer appeared to him all the more admirable in his tight tunic weighed down with epaulettes and festooned with braid, ready to go on parade, as he explained the matter so eagerly and, while he was talking, adjusted screws here and there with a screwdriver.
  
  What's the effect of Kafka's use of abstraction here? Those who know his other works are perhaps used to this stylistic feature, but why the abstract titles/names, from Commandant to Traveler to apparatus?
  
  allred494 kafka
14. jallred 26 Jan 2018
  
  in Public
  
  Of course, interest in the execution was not very high, not even in the penal colony itself.
  
  What's the tone of this story? Why does it matter that no one is interested in the execution, including the condemned?
  
  allred494 kafka
Visit annotations in context

Tags

kafka

allred494

Annotators

ariqj

jallred

URL

inscription.commons.gc.cuny.edu/etexts/kafka-in-the-penal-colony-trans-ian-johnson/
engineering.linkedin.com engineering.linkedin.com

The Log: What every software engineer should know about real-time data's unifying abstraction

1
1. yorelog 04 Jan 2018
  
  in Public
  
  The Log: What every software engineer should know about real-time data's unifying abstraction
  
  log kafka
Visit annotations in context

Tags

kafka

log

Annotators

yorelog

URL

engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Nov 2017
spark.apache.org spark.apache.org

Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher) - Spark 2.2.0 Documentation

2
1. ramlinuxprasad 05 Nov 2017
  
  in Public
  
  SubscribePattern allows you to use a regex to specify topics of interest
  
  This can remove the need to reload the kafka writers in order to take consume messages.
  
  regex - "topic-ua-*"
  
  Kafka consumer topic spark_kafka
2. ramlinuxprasad 05 Nov 2017
  
  in Public
  
  The cache for consumers has a default maximum size of 64. If you expect to be handling more than (64 * number of executors) Kafka partitions, you can change this setting via spark.streaming.kafka.consumer.cache.maxCapacity.
  
  You might need this for keeping track of all partitions consumed.
  
  Kafka consumer spark_kafka
Visit annotations in context

Tags

consumer

spark_kafka

Kafka

topic

Annotators

ramlinuxprasad

URL

spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html
Jul 2017
docs.confluent.io docs.confluent.io

Concepts — Confluent Platform 3.2.2 documentation

2
1. ramlinuxprasad 25 Jul 2017
  
  in Public
  
  In distributed mode, you start many worker processes using the same group.id and they automatically coordinate to schedule execution of connectors and tasks across all available workers. I
  
  Distributed workers.
  
  group.id = "SHOUDL BE THE SAME FOR ALL WORKERS"
  
  confluent Kafka distributed workers
2. ramlinuxprasad 25 Jul 2017
  
  in Public
  
  Connectors and tasks are logical units of work and must be scheduled to execute in a process. Kafka Connect calls these processes workers and has two types of workers: standalone and distributed.
  
  Workers = JVM processes
  
  confluent Kafka
Visit annotations in context

Tags

distributed workers

Kafka

confluent

Annotators

ramlinuxprasad

URL

docs.confluent.io/current/connect/concepts.html
medium.com medium.com

Exactly-once Support in Apache Kafka – Jay Kreps – Medium

1
1. flyisland 24 Jul 2017
  
  in Public
  
  exactly-once toread kafka
Visit annotations in context

Tags

toread

kafka

exactly-once

Annotators

flyisland

URL

medium.com/@jaykreps/exactly-once-support-in-apache-kafka-55e1fdd0a35f
www.confluent.io www.confluent.io

Exactly-once Semantics is Possible: Here's How Apache Kafka Does it

1
1. flyisland 24 Jul 2017
  
  in Public
  
  kafka exactly-once toread
Visit annotations in context

Tags

toread

kafka

exactly-once

Annotators

flyisland

URL

confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
stackoverflow.com stackoverflow.com

How to create topics in apache kafka?

1
1. ramlinuxprasad 18 Jul 2017
  
  in Public
  
  up vote 7 down vote accepted When you are starting your kafka broker you can define set of properties in conf/server.properties file. This file is just key value property file. One of the property is auto.create.topics.enable if it set tot true(by default) kafka will create topic automatically when you send message to non existing topic. All config options you can find here Imho Simple rule for creating topics is the following: number of replicas must be not less than number of nodes that you have. Number of topics must be the multiplier of number of node in your cluster for example: You have 9 node cluster your topic must have 9 partitions and 9 replicas or 18 partitions and 9 replicas or 36 partitions and 9 replicas and so on
  
  Number of replicas = #replicas Number of nodes = #nodes Number of topics = #topic
  
  replicas >= #nodes
  
  k x (#topics) = #nodes
  
  Kafka replicas topic number of replicas
Visit annotations in context

Tags

replicas

Kafka

number of replicas

topic

Annotators

ramlinuxprasad

URL

stackoverflow.com/questions/36441768/how-to-create-topics-in-apache-kafka
github.com github.com

confluentinc/kafka-rest

1
1. ramlinuxprasad 06 Jul 2017
  
  in Public
  
  ab -t 15 -k -T "application/vnd.kafka.binary.v1+json" -p postfile http://localhost:8082/topics/test
  
  ab benchmark
  
  Kafka
Visit annotations in context

Tags

Kafka

Annotators

ramlinuxprasad

URL

github.com/confluentinc/kafka-rest/issues/93
Jun 2017
www.datadoghq.com www.datadoghq.com

Monitoring Kafka performance metrics

2
1. ramlinuxprasad 29 Jun 2017
  
  in Public
  
  in sync replicas (ISRs) should be exactly equal to the total number of replicas.
  
  ISRs are a very imp metric
  
  Kafka monitoring ISR
2. ramlinuxprasad 29 Jun 2017
  
  in Public
  
  Kafka metrics can be broken down into three categories:Kafka server (broker) metricsProducer metricsConsumer metrics
  
  3 Metrics:
  
  Broker
  
  Producer (Netty)
  
  Consumer (SECOR)
  
  Kafka monitoring
Visit annotations in context

Tags

ISR

Kafka

monitoring

Annotators

ramlinuxprasad

URL

datadoghq.com/
kafka.apache.org kafka.apache.org

Apache Kafka

1
1. ramlinuxprasad 21 Jun 2017
  
  in Public
  
  "isr" is the set of "in-sync" replicas.
  
  ISR are pretty import as when nodes go down you will see replicas created later.
  
  Kafka ISR replicas broker
Visit annotations in context

Tags

replicas

Kafka

ISR

broker

Annotators

ramlinuxprasad

URL

kafka.apache.org/documentation.html
mail-archives.apache.org mail-archives.apache.org

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

1
1. ramlinuxprasad 20 Jun 2017
  
  in Public
  
  We run Kafka on the old and trusty m1.xlarge
  
  aws kafka m1.xlarge
  
  Kafka aws production
Visit annotations in context

Tags

Kafka

aws

production

Annotators

ramlinuxprasad

URL

mail-archives.apache.org/mod_mbox/kafka-users/201506.mbox/<CAMR1f-fvFZun7-gY3R9QQK88KEGfzn8dsgWBFZ+EafueePGQoQ@mail.gmail.com>
www.confluent.io www.confluent.io

How to choose the number of topics/partitions in a Kafka cluster? - Confluent

1
1. ramlinuxprasad 15 Jun 2017
  
  in Public
  
  You measure the throughout that you can achieve on a single partition for production (call it p) and consumption (call it c). Let’s say your target throughput is t.
  
  t = throughput (QPS) p = single partition for production c = consumption
  
  number of partitions Kafka producer consumer
Visit annotations in context

Tags

Kafka

consumer

number of partitions

producer

Annotators

ramlinuxprasad

URL

confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
engineering.linkedin.com engineering.linkedin.com

Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)

1
1. ramlinuxprasad 14 Jun 2017
  
  in Public
  
  Messages are immediately written to the filesystem when they are received. Messages are not deleted when they are read but retained with some configurable SLA (say a few days or a week)
  
  Kafka architecture
Visit annotations in context

Tags

architecture

Kafka

Annotators

ramlinuxprasad

URL

engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
docs.confluent.io docs.confluent.io

Production Deployment — Confluent Platform 3.2.1 documentation

2
1. ramlinuxprasad 13 Jun 2017
  
  in Public
  
  ZooKeeper snapshots can be one such a source of concurrent writes, and ideally should be written on a disk group separate from the transaction log.
  
  zookeeper maintains concurrency in its own way.
  
  Kafka zookeeper config production
2. ramlinuxprasad 13 Jun 2017
  
  in Public
  
  If you do end up sharing the ensemble, you might want to use the chroot feature. With chroot, you give each application its own namespace.
  
  jail zookeeper instance from the other apps
  
  zookeeper production Kafka config
Visit annotations in context

Tags

config

zookeeper

Kafka

production

Annotators

ramlinuxprasad

URL

docs.confluent.io/current/kafka/deployment.html
www.cloudera.com www.cloudera.com

Using Kafka Command-line Tools

1
1. ramlinuxprasad 13 Jun 2017
  
  in Public
  
  Very useful kafka command-line tools to keep track of what's happening in your kafka cluster.
  
  Kafka command-line-tools monitoring
Visit annotations in context

Tags

Kafka

command-line-tools

monitoring

Annotators

ramlinuxprasad

URL

cloudera.com/documentation/kafka/latest/topics/kafka_command_line.html
cwiki.apache.org cwiki.apache.org

Consumer Group Example - Apache Kafka - Apache Software Foundation

2
1. ramlinuxprasad 08 Jun 2017
  
  in Public
  
  Designing a High Level Consumer
  
  By far the most important thing you need to know to make SECOR operate with Kafkaf
  
  Kafka secor consumer consumer_groups number of partitions number of consumers
2. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  the High Level Consumer is provided to abstract most of the details of consuming events from Kafka.
  
  Kafka consumer high level consumer api
Visit annotations in context

Tags

consumer_groups

number of consumers

number of partitions

consumer

high level consumer api

Kafka

secor

Annotators

ramlinuxprasad

URL

cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
github.com github.com

Using KafkaConsumer in Secor · Issue #341 · pinterest/secor

1
1. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  In merced, we used the low-level simple consumer and wrote our own work dispatcher to get precise control.
  
  difference between merced and secor
  
  secor merced Kafka consumer
Visit annotations in context

Tags

merced

consumer

secor

Kafka

Annotators

ramlinuxprasad

URL

github.com/pinterest/secor/issues/341
sookocheff.com sookocheff.com

Kafka in a Nutshell

4
1. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  A better alternative is at least once message delivery. For at least once delivery, the consumer reads data from a partition, processes the message, and then commits the offset of the message it has processed. In this case, the consumer could crash between processing the message and committing the offset and when the consumer restarts it will process the message again. This leads to duplicate messages in downstream systems but no data loss.
  
  This is what SECOR does.
  
  Kafka consumer consumer_groups
2. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  no data loss will occur as long as producers and consumers handle this possibility and retry appropriately.
  
  Retries should be built into the consumer and producer code. If leader for the partition fails, you will see a LeaderNotAvailable Exception.
  
  Kafka Leader replicas consistency availability partitions
3. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  By electing a new leader as soon as possible messages may be dropped but we will minimized downtime as any new machine can be leader.
  
  two scenarios to get the leader back: 1.) Wait to bring the master back online. 2.) Or elect the first node that comes back up. But in this scenario if that replica partition was a bit behind the master then the time from when this replica went down to when the master went down. All that data is Lost.
  
  SO there is a trade off between availability and consistency. (Durability)
  
  Kafka availability consistency consumer producer replicas
4. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  keep in mind that these guarantees hold as long as you are producing to one partition and consuming from one partition.
  
  This is very important a 1-to-1 mapping between writer and reader with partition. If you have more producers per partition or more consumers per partition your consistency is going to go haywire
  
  Kafka producer consumer consistency availability
Visit annotations in context

Tags

consumer_groups

Leader

availability

replicas

consumer

partitions

Kafka

consistency

producer

Annotators

ramlinuxprasad

URL

sookocheff.com/post/kafka/kafka-in-a-nutshell/
www.confluent.io www.confluent.io

Introducing the Kafka Consumer: Getting Started with the New Apache Kafka 0.9 Consumer Client - Confluent

4
1. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  On every received heartbeat, the coordinator starts (or resets) a timer. If no heartbeat is received when the timer expires, the coordinator marks the member dead and signals the rest of the group that they should rejoin so that partitions can be reassigned. The duration of the timer is known as the session timeout and is configured on the client with the setting session.timeout.ms.
  
  Time to live for the consumers. If the heartbeat doesn't reach the co-ordindator in this duration then the co-ordinator redistributes the partitions to the remaining consumers in the consumer group.
  
  Kafka consumer consumer_groups time to live
2. ramlinuxprasad 06 Jun 2017
  
  in Public
  
  The high watermark is the offset of the last message that was successfully copied to all of the log’s replicas.
  
  High Watermark: messages copied over to log replicas
  
  Kafka consumer consumer_groups producer replicas
3. ramlinuxprasad 06 Jun 2017
  
  in Public
  
  Kafka new Client which uses a different protocol for consumption in a distributed environment.
  
  Kafka consumer_groups consumer
4. ramlinuxprasad 06 Jun 2017
  
  in Public
  
  Kafka scales topic consumption by distributing partitions among a consumer group, which is a set of consumers sharing a common group identifier.
  
  Topic consumption is distributed among a list of consumer group.
  
  Kafka consumer consumer_groups
Visit annotations in context

Tags

consumer_groups

replicas

consumer

time to live

Kafka

producer

Annotators

ramlinuxprasad

URL

confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0-9-consumer-client/
engineering.pinterest.com engineering.pinterest.com

Introducing Pinterest Secor – Pinterest Engineering – Medium

2
1. ramlinuxprasad 03 Jun 2017
  
  in Public
  
  Kafka consumer offset management protocol to keep track of what’s been uploaded to S3
  
  consumers keep track of what's written and where it left off by looking at kafka consumer offsets rather than checking S3 since S3 is an eventually consistent system.
  
  Kafka secor consumer s3 consistency
2. ramlinuxprasad 03 Jun 2017
  
  in Public
  
  Data lost or corrupted at this stage isn’t recoverable so the greatest design objective for Secor is data integrity.
  
  data loss in S3 is being mitigated.
  
  secor Kafka s3 consumer
Visit annotations in context

Tags

consumer

Kafka

consistency

secor

s3

Annotators

ramlinuxprasad

URL

engineering.pinterest.com/blog/introducing-pinterest-secor
www.elastic.co www.elastic.co

Basic Concepts | Elasticsearch Reference [2.3] | Elastic

1
1. ramlinuxprasad 01 Jun 2017
  
  in Public
  
  An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.
  
  Indexes may overflow the disk space. Hence you want to get the most out of your instances by indexing the nodes.
  
  shards elastic Kafka
Visit annotations in context

Tags

shards

Kafka

elastic

Annotators

ramlinuxprasad

URL

elastic.co/guide/en/elasticsearch/reference/2.3/_basic_concepts.html
github.com github.com

pinterest/secor

2
1. ramlinuxprasad 01 Jun 2017
  
  in Public
  
  incidents are an unavoidable reality of working with distributed systems, no matter how reliable. A prompt alerting solution should be an integral part of the design,
  
  see how it can hook into the current logging mechanism
  
  Kafka consumer producer logging alerting mechanism
2. ramlinuxprasad 01 Jun 2017
  
  in Public
  
  Consumers in this group are designed to be dead-simple, performant, and highly resilient. Since the data copied verbatim, no code upgrades are required to support new message types.
  
  exactly what we want
  
  Kafka consumer consumer_groups realtime
Visit annotations in context

Tags

consumer_groups

realtime

consumer

logging

alerting mechanism

Kafka

producer

Annotators

ramlinuxprasad

URL

github.com/pinterest/secor/blob/master/DESIGN.md
May 2017
medium.com medium.com

Scalable and reliable data ingestion at Pinterest – Pinterest Engineering – Medium

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  More events may arrive late for various reasons, we need to handle late-arrived events consistently.
  
  May not be needed for our use case.
  
  Kafka producer consumer
Visit annotations in context

Tags

Kafka

producer

consumer

Annotators

ramlinuxprasad

URL

medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754
www.quora.com www.quora.com

(1/1) What are the most significant differences between Flume and Kafka? - Quora

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  With Flume & FlumeNG, and a File channel, if you loose a broker node you will lose access to those events until you recover that disk.
  
  In flume you loose events if the disk is down. This is very bad for our usecase.
  
  flume Kafka broker producer
Visit annotations in context

Tags

Kafka

flume

producer

broker

Annotators

ramlinuxprasad

URL

quora.com/What-are-the-most-significant-differences-between-Flume-and-Kafka
kafka.apache.org kafka.apache.org

Apache Kafka

6
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  The Kafka cluster retains all published records—whether or not they have been consumed—using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so storing data for a long time is not a problem.
  
  irrespective of the fact that the consumer has consumed the message that message is kept in kafka for the entire retention policy duration.
  
  You can have two or more consumer groups: 1 -> real time 2 -> back up consumer group
  
  Kafka topic consumer retention policy producer
2. ramlinuxprasad 24 May 2017
  
  in Public
  
  Kafka for Stream Processing
  
  Could be something we can consider for directing data from a raw log to a tenant based topic.
  
  Kafka tenant producer
3. ramlinuxprasad 24 May 2017
  
  in Public
  
  replication factor N, we will tolerate up to N-1 server failures without losing any records
  
  Replication Factor means number of nodes/brokers which could go down before we start losing data.
  
  So if you have a replication factor of 6 for a 11 node cluster, then you will be fault tolerant till 5 nodes go down. After that point you are going to loose data for a particular partition.
  
  Kafka producer data_loss replication_factor
4. ramlinuxprasad 24 May 2017
  
  in Public
  
  Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a record M1 is sent by the same producer as a record M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.
  
  ordering is guaranteed.
  
  Kafka producer
5. ramlinuxprasad 24 May 2017
  
  in Public
  
  Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
  
  kafka takes care of the consumer groups. Just create one Consumer Group for each topic.
  
  Kafka consumer_groups
6. ramlinuxprasad 24 May 2017
  
  in Public
  
  The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions.
  
  partitions of logs are per TOPIC basis
  
  topic Kafka
Visit annotations in context

Tags

consumer_groups

retention policy

tenant

topic

consumer

replication_factor

Kafka

producer

data_loss

Annotators

ramlinuxprasad

URL

kafka.apache.org/intro
www.quora.com www.quora.com

How many topics can be created in Apache Kafka? - Quora

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  The first limitation is that each partition is physically represented as a directory of one or more segment files. So you will have at least one directory and several files per partition. Depending on your operating system and filesystem this will eventually become painful. However this is a per-node limit and is easily avoided by just adding more total nodes in the cluster.
  
  total number of topics supported depends on the total number of partitions per topic.
  
  partition = directory of 1 or more segment files This is a per node limit
  
  Kafka partitions topic topics
Visit annotations in context

Tags

Kafka

partitions

topics

topic

Annotators

ramlinuxprasad

URL

quora.com/How-many-topics-can-be-created-in-Apache-Kafka
stackoverflow.com stackoverflow.com

Apache Kafka Scaling Topics using partitions

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  the number of partitions -- there's no real "formula" other than this: you can have no more parallelism than you have partitions.
  
  This is an important thing to keep in mind. If we need massive parallelism we need to have more partitions.
  
  Kafka broker partitions number of partitions
Visit annotations in context

Tags

Kafka

partitions

number of partitions

broker

Annotators

ramlinuxprasad

URL

stackoverflow.com/questions/36945521/apache-kafka-scaling-topics-using-partitions
sookocheff.com sookocheff.com

Kafka in a Nutshell

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  The offset the ordering of messages as an immutable sequence. Kafka maintains this message ordering for you.
  
  Kafka maintains the ordering for you...
  
  producer Kafka offset
Visit annotations in context

Tags

Kafka

offset

producer

Annotators

ramlinuxprasad

URL

sookocheff.com/post/kafka/kafka-in-a-nutshell/
kafka.apache.org kafka.apache.org

Apache Kafka

8
1. ramlinuxprasad 25 May 2017
  
  in Public
  
  replication-factor 3
  
  If n-1=2 nodes go down you will start loosing data. So that means if both the nodes go down you will loose data.
  
  replication_factor Kafka topic
2. ramlinuxprasad 24 May 2017
  
  in Public
  
  For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log.
  
  for Eg for a given topic there are 11 brokers/servers and for each topic the replication factor is 6. That means the topic will start loosing data if more than 5 brokers go down.
  
  Kafka data_loss consumer
3. ramlinuxprasad 24 May 2017
  
  in Public
  
  The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time. This process of maintaining membership in the group is handled by the Kafka protocol dynamically. If new instances join the group they will take over some partitions from other members of the group; if an instance dies, its partitions will be distributed to the remaining instances.
  
  The coolest feature: this way all you need to do is add new consumers in a consumer group to auto scale per topic
  
  consumer_groups consumer Kafka
4. ramlinuxprasad 24 May 2017
  
  in Public
  
  Consumers label themselves with a consumer group name
  
  maintain separate consumer group per tenant basis. Helps to scale out when we have more load per tenant.
  
  data_loss Kafka consumer_groups consumer
5. ramlinuxprasad 24 May 2017
  
  in Public
  
  The producer is responsible for choosing which record to assign to which partition within the topic.
  
  Producer can publish to a specific topics
  
  Kafka producer topic
6. ramlinuxprasad 24 May 2017
  
  in Public
  
  individual partition must fit on the servers that host it
  
  Each Partition is bounded by the server that hosts that partition.
  
  Kafka producer
7. ramlinuxprasad 24 May 2017
  
  in Public
  
  the only metadata retained on a per-consumer basis is the offset or position of that consumer in the log. This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads records, but, in fact, since the position is controlled by the consumer it can consume records in any order it likes.
  
  partition offset maintained by kafka. Offset number is maintained so that if the consumer goes down nothing breaks.
  
  data_loss producer Kafka
8. ramlinuxprasad 24 May 2017
  
  in Public
  
  the retention policy is set to two days, then for the two days after a record is published,
  
  Might have to tweek this based on the persistence level we want to keep.
  
  data_loss Kafka
Visit annotations in context

Tags

Kafka

consumer

topic

replication_factor

consumer_groups

producer

data_loss

Annotators

ramlinuxprasad

URL

kafka.apache.org/documentation.html

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

解决的问题

如何解决？

由于DelayQueue的存在，时间复杂度和直接使用DelayQueue有什么不同？

Tags

Annotators

URL

AR(Assigned Replicas)与ISR的区别

Tags

Annotators

URL

Kafka如何保证请求发送后能够按预期执行？

Controller给Broker发送请求是否需要保证Broker变更成功

为什么不是NewReplica状态就可以直接让其他broker同步？

Tags

Annotators

URL

日志是如何清理的呢？

什么时候recover？

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators