96 Matching Annotations

Oct 2019
learn.akamai.com learn.akamai.com

Overview

1
1. ramlinuxprasad 01 Oct 2019
  
  in Public
  
  OD is designed to deliver non-HTML cacheable objects (that is, objects that aren't text/html content type) under 100 MB in size.
  
  Does Akamai support text content types? This is a MUST-have for us to move forward.
Visit annotations in context

Annotators

ramlinuxprasad

URL

learn.akamai.com/en-us/webhelp/object-delivery/object-delivery-implementation-guide/GUID-920AE35F-979B-4294-AFD1-22A624833708.html
Nov 2017
timmurphy.org timmurphy.org

Checking for empty string in Bash « timmurphy.org

1
1. ramlinuxprasad 24 Nov 2017
  
  in Public
  
  In Bash you quite often need to check to see if a variable has been set or has a value other than an empty string. This can be done using the -n or -z string comparison operators.
  
  Two of the most useful string tests in Bash.
  
  linux command-line-tools bash
Visit annotations in context

Tags

linux

bash

command-line-tools

Annotators

ramlinuxprasad

URL

timmurphy.org/2010/05/19/checking-for-empty-string-in-bash/
stackoverflow.com stackoverflow.com

SELECT ONE Row with the MAX() value on a column

1
1. ramlinuxprasad 23 Nov 2017
  
  in Public
  
  select top 1 * from newsletters where IsActive = 1 order by PublishDate desc
  
  Given an index on PublishDate, this doesn't require a full table scan or a join operation. That's just COOL
  
  SQL database
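
A minimal sketch of the same query, assuming SQLite through Python's built-in sqlite3 module rather than SQL Server, so `TOP 1` is spelled `LIMIT 1`. The sample rows and the index name `idx_newsletters_publishdate` are made up for illustration. Whether the engine can skip a scan depends on an index like this existing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE newsletters ("
    "Id INTEGER PRIMARY KEY, IsActive INTEGER, PublishDate TEXT)"
)
# Hypothetical index: lets the engine read PublishDate in order instead of sorting.
conn.execute(
    "CREATE INDEX idx_newsletters_publishdate ON newsletters (PublishDate)"
)
conn.executemany(
    "INSERT INTO newsletters (IsActive, PublishDate) VALUES (?, ?)",
    [(1, "2017-10-01"), (0, "2017-11-20"), (1, "2017-11-05")],
)

# SQLite spelling of:
#   select top 1 * from newsletters where IsActive = 1 order by PublishDate desc
latest_active = conn.execute(
    "SELECT * FROM newsletters "
    "WHERE IsActive = 1 ORDER BY PublishDate DESC LIMIT 1"
).fetchone()
print(latest_active)
```

The newest row overall is inactive, so the query should return the most recent active row rather than the row with the maximum date.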
Visit annotations in context

Tags

SQL

database

Annotators

ramlinuxprasad

URL

stackoverflow.com/questions/13752023/select-one-row-with-the-max-value-on-a-column
www.wikiwand.com www.wikiwand.com

Quickselect | Wikiwand

1
1. ramlinuxprasad 19 Nov 2017
  
  in Public
  
  // The search space within the array is changing for each round - but the list // is still the same size. Thus, k does not need to be updated with each round.
  
  If you do not do this, you have to keep re-basing the index k in every round, relative to the size of the current subarray.
  
  if (pivotIdx < k - 1): quickSelect(A, pivotIdx + 1, end, k-1 - (pivotIdx + 1)); if (pivotIdx > k - 1): quickSelect(A, start, pivotIdx - 1, k - 1);
  
  algos quickselect
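
A minimal sketch of the "k stays fixed" variant described in the quote, written in Python rather than the pseudocode above. The function names and the 0-based k are choices made here, not taken from the Wikiwand article. Only `start` and `end` move, and k is always an absolute index into the same list.

```python
import random


def partition(a, start, end):
    """Lomuto partition around a random pivot; returns the pivot's final index."""
    p = random.randint(start, end)
    a[p], a[end] = a[end], a[p]
    pivot = a[end]
    store = start
    for i in range(start, end):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[end] = a[end], a[store]
    return store


def quickselect(items, k):
    """Return the k-th smallest element of items (k is 0-based)."""
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    a = list(items)  # work on a copy so the caller's list is untouched
    start, end = 0, len(a) - 1
    while start < end:
        pivot_idx = partition(a, start, end)
        if pivot_idx == k:
            break
        if pivot_idx < k:
            start = pivot_idx + 1  # search right; k is NOT adjusted
        else:
            end = pivot_idx - 1    # search left; k is NOT adjusted
    return a[k]


print(quickselect([7, 2, 9, 4, 1], 2))
```

The version in the note passes a re-based k into the right-hand recursion. Keeping k absolute avoids that bookkeeping.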
Visit annotations in context

Tags

quickselect

algos

Annotators

ramlinuxprasad

URL

wikiwand.com/en/Quickselect
www.scribendi.com www.scribendi.com

Using Articles—A, An, The | Scribendi

2
1. ramlinuxprasad 13 Nov 2017
  
  in Public
  
  an hour
  
  "Hour" is pronounced more like "our", so you use the article "an" and NOT "a".
  
  english grammer article
2. ramlinuxprasad 13 Nov 2017
  
  in Public
  
  A European (again we see the y sound coming from a vowel)
  
  When to use "a" and when to use "an".
  
  article english grammer
Visit annotations in context

Tags

article

english grammer

Annotators

ramlinuxprasad

URL

scribendi.com/advice/using_articles_a_an_the.en.html
www.datavisor.com www.datavisor.com

Guest Post: End the False Positive Alerts Plague in Anti-Money Laundering (AML) Systems | DataVisor

2
1. ramlinuxprasad 08 Nov 2017
  
  in Public
  
  They have a very simplistic view of the activity being monitored by only distilling it down into only a few dimensions for the rule to interrogate
  
  The number of dimensions needs to be large. In typical database systems the number of dimensions is small.
  
  Banks AML Datavisor Database BigData
2. ramlinuxprasad 08 Nov 2017
  
  in Public
  
  UML automatically finds these hidden patterns to link seemingly unrelated accounts and customers. These links can be one of thousands of data fields that the UML model ingests.
  
  Why does this have to be done in a different system?
  
  Unsupervised Machine Learning
Visit annotations in context

Tags

BigData

Database

Banks

AML

Unsupervised Machine Learning

Datavisor

Annotators

ramlinuxprasad

URL

datavisor.com/technical-posts/guest-post-end-the-false-positive-alerts-plague-in-anti-money-laundering-aml-systems/
spark.apache.org spark.apache.org

Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher) - Spark 2.2.0 Documentation

2
1. ramlinuxprasad 05 Nov 2017
  
  in Public
  
  SubscribePattern allows you to use a regex to specify topics of interest
  
  This can remove the need to reload the Kafka writers in order to consume messages.
  
  regex - "topic-ua-.*"
  
  Kafka consumer topic spark_kafka
2. ramlinuxprasad 05 Nov 2017
  
  in Public
  
  The cache for consumers has a default maximum size of 64. If you expect to be handling more than (64 * number of executors) Kafka partitions, you can change this setting via spark.streaming.kafka.consumer.cache.maxCapacity.
  
  You might need this for keeping track of all partitions consumed.
  
  Kafka consumer spark_kafka
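
For the SubscribePattern note above: a pattern written glob-style as `topic-ua-*` behaves differently as a regular expression, because the trailing `*` only repeats the final hyphen, whereas `topic-ua-.*` accepts any suffix. A small sketch of the difference, using Python's `re` as a stand-in for Java's regex engine and assuming the pattern has to match the whole topic name. The topic names are hypothetical.

```python
import re

topics = [
    "topic-ua-web",
    "topic-ua-mobile",
    "topic-ua-",
    "topic-ua---",
    "topic-eu-web",
]


def matching(pattern, names):
    """Names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


glob_style = matching(r"topic-ua-*", topics)    # '*' repeats the last '-'
regex_style = matching(r"topic-ua-.*", topics)  # '.*' allows any suffix

print(glob_style)
print(regex_style)
```

Under these assumptions only the second pattern picks up topics such as `topic-ua-web`.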
Visit annotations in context

Tags

topic

spark_kafka

Kafka

consumer

Annotators

ramlinuxprasad

URL

spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html
Sep 2017
stratechery.com stratechery.com

Stratechery by Ben Thompson

1
1. ramlinuxprasad 29 Sep 2017
  
  in Public
  
  Amazon integrated customer data and payment information with e-book distribution and its Amazon publishing initiative
  
  Customer data (big data) + payment info (where's the money) + e-book distribution (infrastructure: kindle store and kindle device's seamless integration)
  
  Earlier publishers integrated procurement (writers' initial drafts) + editing + marketing + distribution. Think book reviews and author tours on talk shows.
  
  Amazon's idea is more insightful and focussed on individual customers and not shooting in the DARK :)
  
  strategy business startups
Visit annotations in context

Tags

business

strategy

startups

Annotators

ramlinuxprasad

URL

stratechery.com/
www.chris-granger.com www.chris-granger.com

Chris Granger - Coding is not the new literacy

1
1. ramlinuxprasad 28 Sep 2017
  
  in Public
  
  Coding is not the fundamental skill
  
  It's not the only thing in the world.
  
  self-help
Visit annotations in context

Tags

self-help

Annotators

ramlinuxprasad

URL

chris-granger.com/2015/01/26/coding-is-not-the-new-literacy/
vi.stackexchange.com vi.stackexchange.com

How do I change the current split's width and height?

1
1. ramlinuxprasad 21 Sep 2017
  
  in Public
  
  This is one of the few reasons I like to use vim's mouse mode. If you use the GUI version, or your terminal supports sending drag events (such as xterm or rxvt-unicode), you can click on the split line and drag to resize the window exactly where you want, without a lot of guesswork using the ctrl-w plus, minus, less, greater combinations. In terminal versions, you have to set mouse mode properly for this to work with :set mouse=n (I use 'n', but 'a' also works), and you have to set the tty mouse type with :set ttymouse=xterm2. A lot of people say that a lot of time is wasted using the mouse (mostly due to the time it takes to move your hand from the keyboard to the mouse and back), but I find that, in this case, the time saved by having immediate feedback while adjusting window sizes and the quickness of re-resizing (keep moving the mouse instead of typing another key sequence) outweighs the delay of moving my hand.
  
  Simply amazing setup for resizing multiple split windows in Vim.
  
  vim mouse
Visit annotations in context

Tags

vim

mouse

Annotators

ramlinuxprasad

URL

vi.stackexchange.com/questions/514/how-do-i-change-the-current-splits-width-and-height
www.cnbc.com www.cnbc.com

Facebook VP David Marcus explains how Messenger will make money

1
1. ramlinuxprasad 19 Sep 2017
  
  in Public
  
  "What we think about is that there's a conversation, and inside of that conversation you have a contextual place where you can have all of the interactions that you want or have or need to have with a brand or service, and it can take multiple forms. It can be buttons, it can be UI [user interface] and it can be conversational when it needs to be," Marcus said.
  
  This is exactly what in-context payments means.
  
  contextual payments
Visit annotations in context

Tags

contextual payments

Annotators

ramlinuxprasad

URL

cnbc.com/2016/09/12/facebook-vp-david-marcus-explains-how-messenger-will-make-money.html
Aug 2017
vim.wikia.com vim.wikia.com

Switching case of characters

1
1. ramlinuxprasad 05 Aug 2017
  
  in Public
  
  gUiw Change current word to uppercase.
  
  uppercase word
  
  vim upper case
Visit annotations in context

Tags

upper case

vim

Annotators

ramlinuxprasad

URL

vim.wikia.com/wiki/Switching_case_of_characters
www.brookings.edu www.brookings.edu

Why are interest rates so low?

2
1. ramlinuxprasad 03 Aug 2017
  
  in Public
  
  “throwing seniors under the bus” (to use the words of one senator) by keeping interest rates low.
  
  Since interest rates are low, seniors who are trying to live off of their savings will be in trouble because their money doesn't earn anything for them.
  
  interest rate equilibrium interest rate
2. ramlinuxprasad 03 Aug 2017
  
  in Public
  
  real interest rates are determined by a wide range of economic factors, including prospects for economic growth—not by the Fed.
  
  Real interest rate ≈ nominal interest rate - inflation rate
  
  interest rate fed market financial
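
The subtraction in the second note above is the usual approximation. The exact Fisher relation is (1 + nominal) = (1 + real) × (1 + inflation). A small sketch comparing the two, with function names and sample rates chosen here purely for illustration:

```python
def real_rate_approx(nominal, inflation):
    """Approximate real rate: nominal minus inflation."""
    return nominal - inflation


def real_rate_exact(nominal, inflation):
    """Exact real rate from (1 + nominal) = (1 + real) * (1 + inflation)."""
    return (1 + nominal) / (1 + inflation) - 1


nominal, inflation = 0.03, 0.02  # hypothetical 3% nominal rate, 2% inflation
print(real_rate_approx(nominal, inflation))
print(real_rate_exact(nominal, inflation))
```

At low rates the two agree closely, which is why the shorthand is common. The gap widens as inflation rises.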
Visit annotations in context

Tags

fed

market

equilibrium interest rate

financial

interest rate

Annotators

ramlinuxprasad

URL

brookings.edu/blog/ben-bernanke/2015/03/30/why-are-interest-rates-so-low/
Jul 2017
docs.confluent.io docs.confluent.io

Concepts — Confluent Platform 3.2.2 documentation

2
1. ramlinuxprasad 25 Jul 2017
  
  in Public
  
  In distributed mode, you start many worker processes using the same group.id and they automatically coordinate to schedule execution of connectors and tasks across all available workers.
  
  Distributed workers.
  
  group.id = "SHOULD BE THE SAME FOR ALL WORKERS"
  
  confluent Kafka distributed workers
2. ramlinuxprasad 25 Jul 2017
  
  in Public
  
  Connectors and tasks are logical units of work and must be scheduled to execute in a process. Kafka Connect calls these processes workers and has two types of workers: standalone and distributed.
  
  Workers = JVM processes
  
  confluent Kafka
Visit annotations in context

Tags

confluent

distributed workers

Kafka

Annotators

ramlinuxprasad

URL

docs.confluent.io/current/connect/concepts.html
stackoverflow.com stackoverflow.com

How to create topics in apache kafka?

1
1. ramlinuxprasad 18 Jul 2017
  
  in Public
  
  When you are starting your Kafka broker you can define a set of properties in the conf/server.properties file. This file is just a key-value property file. One of the properties is auto.create.topics.enable: if it is set to true (the default), Kafka will create a topic automatically when you send a message to a non-existing topic. All config options you can find here. IMHO a simple rule for creating topics is the following: the number of replicas must be not less than the number of nodes that you have. The number of topics must be a multiple of the number of nodes in your cluster. For example: you have a 9-node cluster; your topic must have 9 partitions and 9 replicas, or 18 partitions and 9 replicas, or 36 partitions and 9 replicas, and so on.
  
  #replicas = number of replicas, #nodes = number of nodes, #partitions = number of partitions per topic
  
  #replicas >= #nodes
  
  #partitions = k x #nodes (k = 1, 2, 3, ...), going by the 9-node example in the quote
  
  Kafka replicas topic number of replicas
Visit annotations in context

Tags

topic

number of replicas

Kafka

replicas

Annotators

ramlinuxprasad

URL

stackoverflow.com/questions/36441768/how-to-create-topics-in-apache-kafka
www.vox.com www.vox.com

The myth of self-control

1
1. ramlinuxprasad 11 Jul 2017
  
  in Public
  
  The students who exerted more self-control were not more successful in accomplishing their goals. It was the students who experienced fewer temptations overall who were more successful when the researchers checked back in at the end of the semester.
  
  Reduce the number of distractions and you get better results.
  
  self-help will power social science research
Visit annotations in context

Tags

will power

social science research

self-help

Annotators

ramlinuxprasad

URL

vox.com/science-and-health/2016/11/3/13486940/self-control-psychology-myth
github.com github.com

confluentinc/kafka-rest

1
1. ramlinuxprasad 06 Jul 2017
  
  in Public
  
  ab -t 15 -k -T "application/vnd.kafka.binary.v1+json" -p postfile http://localhost:8082/topics/test
  
  ab benchmark
  
  Kafka
Visit annotations in context

Tags

Kafka

Annotators

ramlinuxprasad

URL

github.com/confluentinc/kafka-rest/issues/93
www.investopedia.com www.investopedia.com

Stocks Basics: What Are Stocks?

1
1. ramlinuxprasad 05 Jul 2017
  
  in Public
  
  Owning stock gives you the right to vote in shareholder meetings, receive dividends (which are the company’s profits) if and when they are distributed, and it gives you the right to sell your shares to somebody else.
  
  Dividends are profits shared among shareholders.
  
  stocks trading
Visit annotations in context

Tags

trading

stocks

Annotators

ramlinuxprasad

URL

investopedia.com/university/stocks/stocks1.asp
Jun 2017
www.datadoghq.com www.datadoghq.com

Monitoring Kafka performance metrics

2
1. ramlinuxprasad 29 Jun 2017
  
  in Public
  
  in sync replicas (ISRs) should be exactly equal to the total number of replicas.
  
  ISRs are a very important metric.
  
  Kafka monitoring ISR
2. ramlinuxprasad 29 Jun 2017
  
  in Public
  
  Kafka metrics can be broken down into three categories: Kafka server (broker) metrics, producer metrics, and consumer metrics
  
  3 Metrics:
  
  Broker
  
  Producer (Netty)
  
  Consumer (SECOR)
  
  Kafka monitoring
Visit annotations in context

Tags

ISR

Kafka

monitoring

Annotators

ramlinuxprasad

URL

datadoghq.com/
www.oreilly.com www.oreilly.com

The world beyond batch: Streaming 101

1
1. ramlinuxprasad 26 Jun 2017
  
  in Public
  
  with notions of completeness being a convenient optimization rather than a semantic necessity.
  
  completeness cannot be assumed.
  
  apache beem correctness completeness
Visit annotations in context

Tags

apache beem

completeness

correctness

Annotators

ramlinuxprasad

URL

oreilly.com/ideas/the-world-beyond-batch-streaming-101
kafka.apache.org kafka.apache.org

Apache Kafka

1
1. ramlinuxprasad 21 Jun 2017
  
  in Public
  
  "isr" is the set of "in-sync" replicas.
  
  ISRs are pretty important, because when nodes go down you will see replicas re-created later.
  
  Kafka ISR replicas broker
Visit annotations in context

Tags

broker

ISR

Kafka

replicas

Annotators

ramlinuxprasad

URL

kafka.apache.org/documentation.html
github.com github.com

dhawangayash/secor

2
1. ramlinuxprasad 21 Jun 2017
  
  in Public
  
  public void increment(String label, String topic) { + Stats.incr(label); + }
  
  import io.prometheus.client.Counter;
  
  // Counter shared by all calls; register() adds it to the default registry.
  public static final Counter requests = Counter.build()
      .name("requests_total").help("Total requests.").register();
  
  public void increment(String label, String topic) {
      requests.inc();
      // Your code here.
  }
  
  secor prometheus instrumentation
2. ramlinuxprasad 21 Jun 2017
  
  in Public
  
  The component is pluggable, you can configure ingestion of metrics to any monitoring system by implementing MetricCollector interface. By default metrics are sent to Ostrich, statistic library bundled into secor.
  
  Just inject the Prometheus simple client into this module and you could get Prometheus working for Secor.
  
  secor prometheus instrumentation writers
Visit annotations in context

Tags

writers

secor

instrumentation

prometheus

Annotators

ramlinuxprasad

URL

github.com/dhawangayash/secor/commit/09d553a136e7cf8e22e65a33062e562b70e3d7ee
mail-archives.apache.org mail-archives.apache.org

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

1
1. ramlinuxprasad 20 Jun 2017
  
  in Public
  
  We run Kafka on the old and trusty m1.xlarge
  
  aws kafka m1.xlarge
  
  Kafka aws production
Visit annotations in context

Tags

production

aws

Kafka

Annotators

ramlinuxprasad

URL

mail-archives.apache.org/mod_mbox/kafka-users/201506.mbox/<CAMR1f-fvFZun7-gY3R9QQK88KEGfzn8dsgWBFZ+EafueePGQoQ@mail.gmail.com>
lwn.net lwn.net

Language summit lightning talks

1
1. ramlinuxprasad 17 Jun 2017
  
  in Public
  
  Shannon held up a BBC micro:bit board, which runs MicroPython and has been given to students in the UK, and noted that it only has 16KB of memory.
  
  check out MicroPython
  
  python MicroPython
Visit annotations in context

Tags

MicroPython

python

Annotators

ramlinuxprasad

URL

lwn.net/Articles/723823/
www.confluent.io www.confluent.io

How to choose the number of topics/partitions in a Kafka cluster? - Confluent

1
1. ramlinuxprasad 15 Jun 2017
  
  in Public
  
  You measure the throughput that you can achieve on a single partition for production (call it p) and consumption (call it c). Let’s say your target throughput is t.
  
  t = target throughput; p = throughput achievable on a single partition for production; c = throughput achievable on a single partition for consumption
  
  number of partitions Kafka producer consumer
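
With t, p and c as in the note above, the annotated post goes on to suggest at least max(t/p, t/c) partitions. A small sketch of that arithmetic. The function name, the rounding up to a whole partition, and the sample throughput figures are assumptions made here.

```python
import math


def min_partitions(target, produce_per_partition, consume_per_partition):
    """Smallest whole number of partitions satisfying max(t/p, t/c)."""
    return max(
        math.ceil(target / produce_per_partition),
        math.ceil(target / consume_per_partition),
    )


# Hypothetical figures in MB/s: t = 100, p = 10, c = 20.
print(min_partitions(100, 10, 20))
```

Whichever side is slower per partition, producing or consuming, ends up dictating the partition count.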
Visit annotations in context

Tags

producer

number of partitions

Kafka

consumer

Annotators

ramlinuxprasad

URL

confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
engineering.linkedin.com engineering.linkedin.com

Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)

1
1. ramlinuxprasad 14 Jun 2017
  
  in Public
  
  Messages are immediately written to the filesystem when they are received. Messages are not deleted when they are read but retained with some configurable SLA (say a few days or a week)
  
  Kafka architecture
Visit annotations in context

Tags

architecture

Kafka

Annotators

ramlinuxprasad

URL

engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
zookeeper.apache.org zookeeper.apache.org

ZooKeeper Administrator's Guide

1
1. ramlinuxprasad 13 Jun 2017
  
  in Public
  
  Clustered (Multi-Server) Setup
  
  production setup for zookeeper
  
  zookeeper production
Visit annotations in context

Tags

production

zookeeper

Annotators

ramlinuxprasad

URL

zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html
docs.confluent.io docs.confluent.io

Production Deployment — Confluent Platform 3.2.1 documentation

2
1. ramlinuxprasad 13 Jun 2017
  
  in Public
  
  ZooKeeper snapshots can be one such a source of concurrent writes, and ideally should be written on a disk group separate from the transaction log.
  
  zookeeper maintains concurrency in its own way.
  
  Kafka zookeeper config production
2. ramlinuxprasad 13 Jun 2017
  
  in Public
  
  If you do end up sharing the ensemble, you might want to use the chroot feature. With chroot, you give each application its own namespace.
  
  Use chroot to jail each application's ZooKeeper namespace away from the other apps.
  
  zookeeper production Kafka config
Visit annotations in context

Tags

Kafka

zookeeper

config

production

Annotators

ramlinuxprasad

URL

docs.confluent.io/current/kafka/deployment.html
www.cloudera.com www.cloudera.com

Using Kafka Command-line Tools

1
1. ramlinuxprasad 13 Jun 2017
  
  in Public
  
  Very useful kafka command-line tools to keep track of what's happening in your kafka cluster.
  
  Kafka command-line-tools monitoring
Visit annotations in context

Tags

monitoring

Kafka

command-line-tools

Annotators

ramlinuxprasad

URL

cloudera.com/documentation/kafka/latest/topics/kafka_command_line.html
cwiki.apache.org cwiki.apache.org

Consumer Group Example - Apache Kafka - Apache Software Foundation

2
1. ramlinuxprasad 08 Jun 2017
  
  in Public
  
  Designing a High Level Consumer
  
  By far the most important thing you need to know to make SECOR operate with Kafka.
  
  Kafka secor consumer consumer_groups number of partitions number of consumers
2. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  the High Level Consumer is provided to abstract most of the details of consuming events from Kafka.
  
  Kafka consumer high level consumer api
Visit annotations in context

Tags

Kafka

secor

number of partitions

high level consumer api

number of consumers

consumer_groups

consumer

Annotators

ramlinuxprasad

URL

cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
github.com github.com

Using KafkaConsumer in Secor · Issue #341 · pinterest/secor

1
1. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  In merced, we used the low-level simple consumer and wrote our own work dispatcher to get precise control.
  
  difference between merced and secor
  
  secor merced Kafka consumer
Visit annotations in context

Tags

consumer

merced

Kafka

secor

Annotators

ramlinuxprasad

URL

github.com/pinterest/secor/issues/341
sookocheff.com sookocheff.com

Kafka in a Nutshell

4
1. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  A better alternative is at least once message delivery. For at least once delivery, the consumer reads data from a partition, processes the message, and then commits the offset of the message it has processed. In this case, the consumer could crash between processing the message and committing the offset and when the consumer restarts it will process the message again. This leads to duplicate messages in downstream systems but no data loss.
  
  This is what SECOR does.
  
  Kafka consumer consumer_groups
2. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  no data loss will occur as long as producers and consumers handle this possibility and retry appropriately.
  
  Retries should be built into the consumer and producer code. If the leader for the partition fails, you will see a LeaderNotAvailableException.
  
  Kafka Leader replicas consistency availability partitions
3. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  By electing a new leader as soon as possible messages may be dropped but we will minimize downtime as any new machine can be leader.
  
  Two ways to get a leader back: 1) wait for the old leader (master) to come back online, or 2) elect the first replica that comes back up. In the second case, if that replica was behind the old leader, every message written between the time the replica went down and the time the leader went down is lost.
  
  So there is a trade-off between availability and consistency (durability).
  
  Kafka availability consistency consumer producer replicas
4. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  keep in mind that these guarantees hold as long as you are producing to one partition and consuming from one partition.
  
  This is very important: a 1-to-1 mapping between writer, partition, and reader. If you have more producers per partition or more consumers per partition, your consistency is going to go haywire.
  
  Kafka producer consumer consistency availability
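
The at-least-once behaviour from the first note above (process, then commit, with a crash in between) can be simulated without a broker. This is a toy sketch and does not use the Kafka client API. `OffsetStore`, `run_consumer` and the in-memory `log` list are hypothetical stand-ins for the committed offset, the consumer loop and a single partition.

```python
class CrashBeforeCommit(Exception):
    """Simulated crash after processing a message but before committing it."""


class OffsetStore:
    """Stand-in for the durable committed offset of one partition."""

    def __init__(self):
        self.committed = 0


def run_consumer(log, store, sink, crash_at=None):
    """Resume from the committed offset; process first, commit second."""
    offset = store.committed
    while offset < len(log):
        sink.append(log[offset])      # 1. process (deliver downstream)
        if offset == crash_at:
            raise CrashBeforeCommit   # crash before the commit below
        store.committed = offset + 1  # 2. commit the processed offset
        offset += 1


log = ["m0", "m1", "m2", "m3"]
store = OffsetStore()
downstream = []

try:
    run_consumer(log, store, downstream, crash_at=2)
except CrashBeforeCommit:
    pass                              # consumer dies, offset 2 uncommitted

run_consumer(log, store, downstream)  # restart from the committed offset
print(downstream)
```

After the restart, the message that was processed but not committed is delivered a second time. Nothing is lost, but the downstream system has to tolerate the duplicate.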
Visit annotations in context

Tags

Kafka

producer

Leader

replicas

availability

consumer_groups

partitions

consistency

consumer

Annotators

ramlinuxprasad

URL

sookocheff.com/post/kafka/kafka-in-a-nutshell/
www.confluent.io www.confluent.io

Introducing the Kafka Consumer: Getting Started with the New Apache Kafka 0.9 Consumer Client - Confluent

4
1. ramlinuxprasad 07 Jun 2017
  
  in Public
  
  On every received heartbeat, the coordinator starts (or resets) a timer. If no heartbeat is received when the timer expires, the coordinator marks the member dead and signals the rest of the group that they should rejoin so that partitions can be reassigned. The duration of the timer is known as the session timeout and is configured on the client with the setting session.timeout.ms.
  
  Time to live for the consumers. If the heartbeat doesn't reach the coordinator within this duration, the coordinator redistributes the partitions to the remaining consumers in the consumer group.
  
  Kafka consumer consumer_groups time to live
2. ramlinuxprasad 06 Jun 2017
  
  in Public
  
  The high watermark is the offset of the last message that was successfully copied to all of the log’s replicas.
  
  High Watermark: messages copied over to log replicas
  
  Kafka consumer consumer_groups producer replicas
3. ramlinuxprasad 06 Jun 2017
  
  in Public
  
  Kafka's new client, which uses a different protocol for consumption in a distributed environment.
  
  Kafka consumer_groups consumer
4. ramlinuxprasad 06 Jun 2017
  
  in Public
  
  Kafka scales topic consumption by distributing partitions among a consumer group, which is a set of consumers sharing a common group identifier.
  
  Topic consumption is distributed among the consumers in a consumer group.
  
  Kafka consumer consumer_groups
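
The session-timeout and consumer-group notes above fit together: when a member misses its heartbeat window, its partitions are handed to the remaining members. A toy sketch of that idea. The round-robin strategy, the function names and the millisecond timestamps are assumptions made here, and Kafka's actual coordinator protocol and assignors are more involved.

```python
def live_members(last_heartbeat_ms, now_ms, session_timeout_ms):
    """Members whose most recent heartbeat is within the session timeout."""
    return {
        member
        for member, seen_ms in last_heartbeat_ms.items()
        if now_ms - seen_ms <= session_timeout_ms
    }


def assign(partitions, members):
    """Deal partitions out round-robin over the sorted member ids."""
    ordered = sorted(members)
    assignment = {member: [] for member in ordered}
    if not ordered:
        return assignment
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


partitions = range(6)
heartbeats = {"c1": 9_000, "c2": 1_000, "c3": 9_500}  # last heartbeat times

before = assign(partitions, heartbeats)
alive = live_members(heartbeats, now_ms=10_000, session_timeout_ms=6_000)
after = assign(partitions, alive)

print(before)
print(after)
```

Here `c2` has been silent for longer than the timeout, so the second assignment spreads all six partitions over `c1` and `c3`.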
Visit annotations in context

Tags

Kafka

producer

replicas

time to live

consumer_groups

consumer

Annotators

ramlinuxprasad

URL

confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0-9-consumer-client/
www.securityweek.com www.securityweek.com

Head to the Cloud for a Head's Up on Fraud | SecurityWeek.Com

1
1. ramlinuxprasad 05 Jun 2017
  
  in Public
  
  In a recent analysis of more than 500 billion events collected from multiple global online services, 18% of user accounts that originated from cloud service IP ranges were fraudulent.
  
  Really cool statistics.
  
  fraud datavisor blog
Visit annotations in context

Tags

fraud

datavisor blog

Annotators

ramlinuxprasad

URL

securityweek.com/head-cloud-heads-fraud
engineering.pinterest.com engineering.pinterest.com

Introducing Pinterest Secor – Pinterest Engineering – Medium

2
1. ramlinuxprasad 03 Jun 2017
  
  in Public
  
  Kafka consumer offset management protocol to keep track of what’s been uploaded to S3
  
  consumers keep track of what's written and where it left off by looking at kafka consumer offsets rather than checking S3 since S3 is an eventually consistent system.
  
  Kafka secor consumer s3 consistency
2. ramlinuxprasad 03 Jun 2017
  
  in Public
  
  Data lost or corrupted at this stage isn’t recoverable so the greatest design objective for Secor is data integrity.
  
  data loss in S3 is being mitigated.
  
  secor Kafka s3 consumer
Visit annotations in context

Tags

Kafka

secor

s3

consistency

consumer

Annotators

ramlinuxprasad

URL

engineering.pinterest.com/blog/introducing-pinterest-secor
www.elastic.co www.elastic.co

Basic Concepts | Elasticsearch Reference [2.3] | Elastic

4
1. ramlinuxprasad 02 Jun 2017
  
  in Public
  
  Replication is important for two primary reasons:
  
  HA with fail-over mechanism.
  
  scale out your search volume/throughput as searches can happen based on your replicas in parallel.
  
  shards elastic advantages of replicating your shards
2. ramlinuxprasad 02 Jun 2017
  
  in Public
  
  Sharding is important for two primary reasons:
  
  horizontal scaling of a single index
  
  parallelize seek operations on multiple shards when index gets too big
  
  shards elastic advantages of sharding indexes
3. ramlinuxprasad 02 Jun 2017
  
  in Public
  
  , Elasticsearch provides the ability to subdivide your index into multiple pieces called shards.
  
  Think of shard as the indexes broken down further to span over multiple nodes in your cluster.
  
  elastic shards index create index
4. ramlinuxprasad 01 Jun 2017
  
  in Public
  
  An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.
  
  A single index may outgrow the disk space of one node. Hence you want to get the most out of your instances by sharding the index across nodes.
  
  shards elastic Kafka
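
To make the sharding notes above concrete: Elasticsearch picks the shard for a document with a rule of the form hash(routing) % number_of_primary_shards. A minimal sketch of that rule, using `zlib.crc32` purely as a stand-in hash (an assumption, since Elasticsearch uses its own hash function) and hypothetical document ids as routing keys.

```python
import zlib
from collections import Counter


def shard_for(routing_key, num_primary_shards):
    """Map a routing key to a shard number in [0, num_primary_shards)."""
    # crc32 is only a stand-in for Elasticsearch's internal hash.
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % num_primary_shards


num_shards = 5
doc_ids = [f"doc-{i}" for i in range(1000)]  # hypothetical document ids
placement = Counter(shard_for(doc_id, num_shards) for doc_id in doc_ids)
print(sorted(placement.items()))
```

Because the shard number depends on the primary shard count, that count is normally fixed when the index is created.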
Visit annotations in context

Tags

advantages of sharding indexes

Kafka

create index

advantages of replicating your shards

elastic

index

shards

Annotators

ramlinuxprasad

URL

elastic.co/guide/en/elasticsearch/reference/2.3/_basic_concepts.html
www.elastic.co www.elastic.co

Create Index | Elasticsearch Reference [5.4] | Elastic

1
1. ramlinuxprasad 02 Jun 2017
  
  in Public
  
  The create index API allows to instantiate an index. Elasticsearch provides support for multiple indices, including executing operations across several indices.
  
  With this API you can set a different number of shards on a per-index basis in Elasticsearch. Super useful when you have a single multi-tenant cluster.
  
  elastic shards multi tenant
Visit annotations in context

Tags

multi tenant

shards

elastic

Annotators

ramlinuxprasad

URL

elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html
github.com github.com

pinterest/secor

2
1. ramlinuxprasad 01 Jun 2017
  
  in Public
  
  incidents are an unavoidable reality of working with distributed systems, no matter how reliable. A prompt alerting solution should be an integral part of the design,
  
  see how it can hook into the current logging mechanism
  
  Kafka consumer producer logging alerting mechanism
2. ramlinuxprasad 01 Jun 2017
  
  in Public
  
  Consumers in this group are designed to be dead-simple, performant, and highly resilient. Since the data is copied verbatim, no code upgrades are required to support new message types.
  
  exactly what we want
  
  Kafka consumer consumer_groups realtime
Visit annotations in context

Tags

alerting mechanism

Kafka

producer

realtime

logging

consumer_groups

consumer

Annotators

ramlinuxprasad

URL

github.com/pinterest/secor/blob/master/DESIGN.md
May 2017
github.com github.com

dhawangayash/secor

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  tar -zxvf target/secor-0.1-SNAPSHOT-bin.tar.gz -C ${SECOR_INSTALL_DIR}
  
  Linux command to extract a tarball into a different directory (-x extracts, -C sets the target directory). Might come in handy.
  
  linux tar create tar
Visit annotations in context

Tags

tar

linux

create tar

Annotators

ramlinuxprasad

URL

github.com/dhawangayash/secor
medium.com medium.com

Scalable and reliable data ingestion at Pinterest – Pinterest Engineering – Medium

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  More events may arrive late for various reasons, we need to handle late-arrived events consistently.
  
  May not be needed for our use case.
  
  Kafka producer consumer
Visit annotations in context

Tags

producer

Kafka

consumer

Annotators

ramlinuxprasad

URL

medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754
www.quora.com www.quora.com

(1/1) What are the most significant differences between Flume and Kafka? - Quora

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  With Flume & FlumeNG, and a File channel, if you lose a broker node you will lose access to those events until you recover that disk.
  
  In Flume you lose access to events while the disk is down. This is very bad for our use case.
  
  flume Kafka broker producer
Visit annotations in context

Tags

producer

flume

Kafka

broker

Annotators

ramlinuxprasad

URL

quora.com/What-are-the-most-significant-differences-between-Flume-and-Kafka
stackoverflow.com stackoverflow.com

How do you determine the ideal buffer size when using FileInputStream?

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  Optimum buffer size is related to a number of things: file system block size, CPU cache size and cache latency.
  
  Most file systems are configured to use block sizes of 4096 or 8192. In theory, if you configure your buffer size so you are reading a few bytes more than the disk block, the operations with the file system can be extremely inefficient (i.e. if you configured your buffer to read 4100 bytes at a time, each read would require 2 block reads by the file system). If the blocks are already in cache, then you wind up paying the price of RAM -> L3/L2 cache latency. If you are unlucky and the blocks are not in cache yet, then you pay the price of the disk->RAM latency as well.
  
  This is why you see most buffers sized as a power of 2, and generally larger than (or equal to) the disk block size. This means that one of your stream reads could result in multiple disk block reads - but those reads will always use a full block - no wasted reads.
  
  Now, this is offset quite a bit in a typical streaming scenario because the block that is read from disk is going to still be in memory when you hit the next read (we are doing sequential reads here, after all) - so you wind up paying the RAM -> L3/L2 cache latency price on the next read, but not the disk->RAM latency. In terms of order of magnitude, disk->RAM latency is so slow that it pretty much swamps any other latency you might be dealing with.
  
  So, I suspect that if you ran a test with different cache sizes (haven't done this myself), you will probably find a big impact of cache size up to the size of the file system block. Above that, I suspect that things would level out pretty quickly.
  
  There are a ton of conditions and exceptions here - the complexities of the system are actually quite staggering (just getting a handle on L3 -> L2 cache transfers is mind bogglingly complex, and it changes with every CPU type).
  
  This leads to the 'real world' answer: If your app is like 99% out there, set the cache size to 8192 and move on (even better, choose encapsulation over performance and use BufferedInputStream to hide the details). If you are in the 1% of apps that are highly dependent on disk throughput, craft your implementation so you can swap out different disk interaction strategies, and provide the knobs and dials to allow your users to test and optimize (or come up with some self optimizing system).
  
  What buffer size should you use when reading from a file into a buffer?
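A minimal Java sketch of the "real world" advice quoted above: read with an 8192-byte buffer, or let BufferedInputStream hide the buffering. The class and method names (`BufferSizeSketch`, `countWithExplicitBuffer`, `countWithBufferedStream`) are hypothetical and exist only for illustration. The 8192 figure comes from the quoted answer and is not a measured optimum.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class BufferSizeSketch {
    // Power of two, at least as large as the common 4096/8192 block sizes.
    static final int BUFFER_SIZE = 8192;

    // Reads the file in BUFFER_SIZE chunks and returns the number of bytes seen.
    static long countWithExplicitBuffer(Path file) {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Same result, but BufferedInputStream owns the buffer, so even
    // byte-at-a-time reads do not hit the file system on every call.
    static long countWithBufferedStream(Path file) {
        long total = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file), BUFFER_SIZE)) {
            while (in.read() != -1) {
                total++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

Both methods return the same count. The second hides the buffer behind the stream, which is the "encapsulation over performance" option the answer recommends.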
  
  cache buffer byteinputstream java cpu
Visit annotations in context

Tags

buffer

byteinputstream

cache

cpu

java

Annotators

ramlinuxprasad

URL

stackoverflow.com/questions/236861/how-do-you-determine-the-ideal-buffer-size-when-using-fileinputstream
kafka.apache.org kafka.apache.org

Apache Kafka

6
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  The Kafka cluster retains all published records—whether or not they have been consumed—using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so storing data for a long time is not a problem.
  
  Regardless of whether a consumer has already consumed a message, the message is kept in Kafka for the entire retention period.
  
  You can have two or more consumer groups, for example: (1) a real-time group and (2) a backup group.
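A toy Java model of the retention behaviour described above, assuming a single in-memory log and time measured in whole hours. This is not Kafka's implementation or API. `RetentionSketch` and its methods are hypothetical names. It only shows that expiry depends on record age, not on whether any consumer group has read the record.

```java
import java.util.ArrayList;
import java.util.List;

final class RetentionSketch {
    // A record plus the hour at which it was published.
    private static final class Rec {
        final String value;
        final long publishedAtHours;

        Rec(String value, long publishedAtHours) {
            this.value = value;
            this.publishedAtHours = publishedAtHours;
        }
    }

    private final List<Rec> log = new ArrayList<>();
    private final long retentionHours;

    RetentionSketch(long retentionHours) {
        this.retentionHours = retentionHours;
    }

    void publish(String value, long nowHours) {
        log.add(new Rec(value, nowHours));
    }

    // Drops records older than the retention period; consumption plays no part.
    void expire(long nowHours) {
        log.removeIf(r -> nowHours - r.publishedAtHours > retentionHours);
    }

    // Any consumer group may read whatever is still retained, as often as it likes.
    List<String> readAll() {
        List<String> values = new ArrayList<>();
        for (Rec r : log) {
            values.add(r.value);
        }
        return values;
    }
}
```

With a 48-hour retention, a real-time group and a backup group can both call `readAll()` and see the same records until `expire` removes them.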
  
  Kafka topic consumer retention policy producer
2. ramlinuxprasad 24 May 2017
  
  in Public
  
  Kafka for Stream Processing
  
  Could be something we can consider for directing data from a raw log to a tenant based topic.
  
  Kafka tenant producer
3. ramlinuxprasad 24 May 2017
  
  in Public
  
  replication factor N, we will tolerate up to N-1 server failures without losing any records
  
  Replication factor is the number of copies of each partition kept on different nodes/brokers. With a replication factor of N, up to N-1 of those brokers can go down before we start losing data.
  
  So if you have a replication factor of 6 on an 11-node cluster, a partition stays safe while up to 5 of the brokers holding its replicas are down. If the sixth also fails, you lose data for that partition.
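The N-1 rule can be sketched in Java as a simple set check. This is a simplification that ignores in-sync replicas and producer acknowledgement settings. `ReplicationSketch`, `tolerableFailures`, and `partitionSurvives` are hypothetical names, and the broker IDs are made up.

```java
import java.util.Set;

final class ReplicationSketch {
    // With replication factor N, up to N-1 replica-holding brokers may fail.
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - 1;
    }

    // A partition's data survives while at least one broker holding a replica is alive.
    static boolean partitionSurvives(Set<Integer> replicaBrokers, Set<Integer> failedBrokers) {
        for (int broker : replicaBrokers) {
            if (!failedBrokers.contains(broker)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that failures of brokers which hold no replica of the partition do not count against the N-1 budget for that partition.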
  
  Kafka producer data_loss replication_factor
4. ramlinuxprasad 24 May 2017
  
  in Public
  
  Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a record M1 is sent by the same producer as a record M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.
  
  Ordering is guaranteed within a partition.
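A minimal Java sketch of why append order implies offset order, assuming one partition modelled as an in-memory list. `PartitionLogSketch` is a hypothetical name, not a Kafka class.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionLogSketch {
    private final List<String> records = new ArrayList<>();

    // Appends a record to the end of the log and returns its offset,
    // which is simply its position in the append-only sequence.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Reads the record stored at the given offset.
    String read(long offset) {
        return records.get((int) offset);
    }
}
```

If M1 is appended before M2, M1 receives the lower offset and is read first. The guarantee holds per partition, not across the partitions of a topic.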
  
  Kafka producer
5. ramlinuxprasad 24 May 2017
  
  in Public
  
  Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
  
  Kafka takes care of consumer group membership. Just create one consumer group for each topic.
  
  Kafka consumer_groups
6. ramlinuxprasad 24 May 2017
  
  in Public
  
  The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions.
  
  Log partitions are defined on a per-topic basis.
  
  topic Kafka
Visit annotations in context

Tags

topic

Kafka

tenant

producer

retention policy

data_loss

replication_factor

consumer_groups

consumer

Annotators

ramlinuxprasad

URL

kafka.apache.org/intro
www.quora.com www.quora.com

How many topics can be created in Apache Kafka? - Quora

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  The first limitation is that each partition is physically represented as a directory of one or more segment files. So you will have at least one directory and several files per partition. Depending on your operating system and filesystem this will eventually become painful. However this is a per-node limit and is easily avoided by just adding more total nodes in the cluster.
  
  The total number of topics a cluster can support depends on the number of partitions per topic.
  
  Partition = a directory of one or more segment files. This is a per-node limit.
  
  Kafka partitions topic topics
Visit annotations in context

Tags

topics

topic

partitions

Kafka

Annotators

ramlinuxprasad

URL

quora.com/How-many-topics-can-be-created-in-Apache-Kafka
stackoverflow.com stackoverflow.com

Apache Kafka Scaling Topics using partitions

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  the number of partitions -- there's no real "formula" other than this: you can have no more parallelism than you have partitions.
  
  This is an important thing to keep in mind. If we need massive parallelism we need to have more partitions.
  
  Kafka broker partitions number of partitions
Visit annotations in context

Tags

number of partitions

partitions

Kafka

broker

Annotators

ramlinuxprasad

URL

stackoverflow.com/questions/36945521/apache-kafka-scaling-topics-using-partitions
sookocheff.com sookocheff.com

Kafka in a Nutshell

1
1. ramlinuxprasad 31 May 2017
  
  in Public
  
  The offset defines the ordering of messages as an immutable sequence. Kafka maintains this message ordering for you.
  
  Kafka maintains the ordering for you...
  
  producer Kafka offset
Visit annotations in context

Tags

producer

offset

Kafka

Annotators

ramlinuxprasad

URL

sookocheff.com/post/kafka/kafka-in-a-nutshell/
news.ycombinator.com news.ycombinator.com

Startup Mistakes | Hacker News

2
1. ramlinuxprasad 27 May 2017
  
  in Public
  
  So to a start up founder, timing correctly when to do something is far more important than doing that thing amazingly well
  
  Words of wisdom - The act of doing should go hand in hand with the act of figuring out when to do something.
  
  startup enterpreneurship
2. ramlinuxprasad 27 May 2017
  
  in Public
  
  "I don't have a system and need to build one."
  
  It's a POC - GSD and most importantly see what sticks.
  
  Estimation enterpreneurship startup
Visit annotations in context

Tags

enterpreneurship

Estimation

startup

Annotators

ramlinuxprasad

URL

news.ycombinator.com/item
dopeboy.github.io dopeboy.github.io

Startup mistakes – Manish Sinha – Engineer

2
1. ramlinuxprasad 27 May 2017
  
  in Public
  
  ($20*3)-($20*3*.1) = $54
  
  1% of $2000 (cost of camera) per day * 3 days = Rental Price
  
  Rental Price - Commission = Rental Made. This guy totally forgot taxes here... :)
  
  $54 for 3 days. Over 365 days a year at about 50% usage, that is roughly 180 rented days, or 60 three-day rentals: 60 * $54 = $3240. That is about $1240 profit in the first year on a $2000 investment if he's 50% utilized over the year.
  
  Cameras! Man, this guy needed to crunch some more numbers. Cameras have compatibility issues...
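The back-of-the-napkin numbers above can be checked with a few lines of Java. The $20 daily rate, 3-day rental, and 10% commission come from the quoted formula. The $2000 camera cost and 180 rented days are the note's own assumptions. `RentalEstimate` and its method names are hypothetical. Whole dollars are used to keep the arithmetic exact.

```java
final class RentalEstimate {
    // Net income for one rental after the platform takes its commission.
    static long netPerRental(long dailyRate, long days, long commissionPercent) {
        long gross = dailyRate * days;
        return gross - gross * commissionPercent / 100;
    }

    // Net income over a year, given how many days the item is actually rented out.
    static long yearlyNet(long netPerRental, long daysPerRental, long rentedDaysPerYear) {
        long rentals = rentedDaysPerYear / daysPerRental;
        return rentals * netPerRental;
    }

    public static void main(String[] args) {
        long perRental = netPerRental(20, 3, 10);
        long perYear = yearlyNet(perRental, 3, 180);
        System.out.println("net per rental: $" + perRental);
        System.out.println("net per year:   $" + perYear);
        System.out.println("after camera:   $" + (perYear - 2000));
    }
}
```

Under these assumptions the net is $54 per rental and $3240 per year, which leaves $1240 after the $2000 camera, before taxes and any other costs.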
  
  back of the napkiin analysis Estimation P/L Analysis enterpreneurship startup
2. ramlinuxprasad 27 May 2017
  
  in Public
  
  We should have built the absolute minimum and then manually maintained transactions through the backend if need be. To hell with formality. We were in the business of proving a hypothesis yet we acted as if it had already been proven.
  
  quick and dirty solution and see if it sticks...
  
  enterpreneurship startup
Visit annotations in context

Tags

back of the napkiin analysis

enterpreneurship

Estimation

startup

P/L Analysis

Annotators

ramlinuxprasad

URL

dopeboy.github.io/Lessons/
www.goodreads.com www.goodreads.com

Earth Unaware (The First Formic War, #1)

1
1. ramlinuxprasad 27 May 2017
  
  in Public
  
  the asteroid mining and the cultures that sprout up
  
  That was a good idea.
  
  enders-game book-review Earth unaware
Visit annotations in context

Tags

Earth unaware

book-review

enders-game

Annotators

ramlinuxprasad

URL

goodreads.com/book/show/13151129-earth-unaware
docs.aws.amazon.com docs.aws.amazon.com

S3Object (AWS SDK for Java - 1.11.136)

2
1. ramlinuxprasad 26 May 2017
  
  in Public
  
  Represents an object stored in Amazon S3.
  
  S3Object is a pointer to the data object.
  
  s3 reading data aws
2. ramlinuxprasad 26 May 2017
  
  in Public
  
  S3ObjectInputStream
  
  Provides an InputStream to read the data.
  
  s3 aws reading data
Visit annotations in context

Tags

s3

aws

reading data

Annotators

ramlinuxprasad

URL

docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/S3Object.html
www.virgin.com www.virgin.com

The best thing to do with your bright idea? Destroy it

4
1. ramlinuxprasad 26 May 2017
  
  in Public
  
  Listerine, for example, started life on the shelf as an antiseptic, sold as both floor cleaner and a treatment for gonorrhoea. But it wasn't a runaway success until it was marketed as a treatment for bad breath.
  
  Interesting use-case
  
  career-advice entreprenuership anecdotes
2. ramlinuxprasad 26 May 2017
  
  in Public
  
  But many ideas are destined for improvement
  
  you may start with something and end up with something totally different
  
  entreprenuership career-advice
3. ramlinuxprasad 26 May 2017
  
  in Public
  
  Brainstorm.
  
  Break the idea and fix idea. (Scrutinize)
  
  Permeate and let the idea seep into you.
  
  Execute - execute -execute
  
  entreprenuership work career-advice
4. ramlinuxprasad 26 May 2017
  
  in Public
  
  There are many people with good ideas who don't have the means, the will, or the courage to action them. Similarly, there are very talented business people who have no ideas, but are brilliant at the execution.
  
  figure out what you want to be and continue on that path, get better at it, and invest time and effort into it.
  
  entreprenuership focus career-advice work
Visit annotations in context

Tags

anecdotes

entreprenuership

focus

career-advice

work

Annotators

ramlinuxprasad

URL

virgin.com/entrepreneur/best-thing-do-your-bright-idea-destroy-it
www.safaribooksonline.com www.safaribooksonline.com

Chapter 1 : Big Data Technology Landscape

1
1. ramlinuxprasad 25 May 2017
  
  in Public
  
  volume, velocity, and variety
  
  Volume: The actual size of the traffic.
  
  Velocity: How fast the traffic shows up.
  
  Variety: Refers to data that can be unstructured, semi-structured or multi-structured.
  
  big-data data quality attributes
Visit annotations in context

Tags

quality attributes

big-data

data

Annotators

ramlinuxprasad

URL

safaribooksonline.com/library/view/big-data-analytics/9781484209646/9781484209653_Ch01.xhtml
kafka.apache.org kafka.apache.org

Apache Kafka

8
1. ramlinuxprasad 25 May 2017
  
  in Public
  
  replication-factor 3
  
  With a replication factor of 3, N-1 = 2 nodes can go down without losing data. You only start losing data if all three nodes holding the replicas go down.
  
  replication_factor Kafka topic
2. ramlinuxprasad 24 May 2017
  
  in Public
  
  For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log.
  
  For example, say there are 11 brokers/servers and the topic's replication factor is 6. That means the topic will start losing data only if more than 5 of the brokers holding its replicas go down.
  
  Kafka data_loss consumer
3. ramlinuxprasad 24 May 2017
  
  in Public
  
  The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time. This process of maintaining membership in the group is handled by the Kafka protocol dynamically. If new instances join the group they will take over some partitions from other members of the group; if an instance dies, its partitions will be distributed to the remaining instances.
  
  The coolest feature: this way all you need to do is add new consumers in a consumer group to auto scale per topic
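A rough Java sketch of the "fair share" idea, assuming a plain round-robin split. Kafka's own assignment strategies are more involved and this is not its API. `GroupAssignmentSketch` and `assign` are hypothetical names. Re-running `assign` with a different consumer list stands in for a rebalance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupAssignmentSketch {
    // Spreads partitions 0..partitionCount-1 over the group's consumers so that
    // each partition has exactly one owner within the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

Adding a consumer to the group shrinks everyone's share, which is the scaling effect noted above. Once there are more consumers than partitions, the extra consumers receive nothing, so the partition count caps the parallelism.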
  
  consumer_groups consumer Kafka
4. ramlinuxprasad 24 May 2017
  
  in Public
  
  Consumers label themselves with a consumer group name
  
  Maintain a separate consumer group per tenant. This helps us scale out when a tenant's load grows.
  
  data_loss Kafka consumer_groups consumer
5. ramlinuxprasad 24 May 2017
  
  in Public
  
  The producer is responsible for choosing which record to assign to which partition within the topic.
  
  The producer can publish to a specific partition within a topic.
  
  Kafka producer topic
6. ramlinuxprasad 24 May 2017
  
  in Public
  
  individual partition must fit on the servers that host it
  
  Each Partition is bounded by the server that hosts that partition.
  
  Kafka producer
7. ramlinuxprasad 24 May 2017
  
  in Public
  
  the only metadata retained on a per-consumer basis is the offset or position of that consumer in the log. This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads records, but, in fact, since the position is controlled by the consumer it can consume records in any order it likes.
  
  The partition offset is maintained per consumer. Because the offset is tracked, nothing breaks if the consumer goes down: it can resume from its last position.
  
  data_loss producer Kafka
8. ramlinuxprasad 24 May 2017
  
  in Public
  
  the retention policy is set to two days, then for the two days after a record is published,
  
  Might have to tweak this based on the persistence level we want to keep.
  
  data_loss Kafka
Visit annotations in context

Tags

topic

Kafka

producer

replication_factor

consumer

consumer_groups

data_loss

Annotators

ramlinuxprasad

URL

kafka.apache.org/documentation.html
winterbe.com winterbe.com

winterbe.com

1
1. ramlinuxprasad 24 May 2017
  
  in Public
  
  Executing the above code results in a TimeoutException:
  
  For some reason this did not work for me. No idea why.
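For reference, a self-contained Java sketch of the behaviour the tutorial describes: `Future.get` with a timeout throws `TimeoutException` only when the wait is shorter than the time the task needs. `FutureTimeoutSketch` and `timesOut` are hypothetical names, and the durations are arbitrary. This is not the tutorial's exact code.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FutureTimeoutSketch {
    // Submits a task that sleeps for taskMillis, then waits at most waitMillis
    // for its result. Returns true if the wait timed out.
    static boolean timesOut(long taskMillis, long waitMillis) {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        try {
            Future<Integer> future = executor.submit(() -> {
                TimeUnit.MILLISECONDS.sleep(taskMillis);
                return 123;
            });
            future.get(waitMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Interrupts the sleeping task so the worker thread can exit.
            executor.shutdownNow();
        }
    }
}
```

If the task finishes before the wait expires, `get` returns normally and no exception is raised, so it is worth checking that the sleep is longer than the timeout when reproducing the tutorial's result.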
Visit annotations in context

Annotators

ramlinuxprasad

URL

winterbe.com/posts/2015/04/07/java8-concurrency-tutorial-thread-executor-examples/
www.tutorialspoint.com www.tutorialspoint.com

Apache Spark RDD

1
1. ramlinuxprasad 23 May 2017
  
  in Public
  
  The key idea of spark is Resilient Distributed Datasets (RDD); it supports in-memory processing computation.
  
  An RDD is the way Spark holds datasets in memory.
Visit annotations in context

Annotators

ramlinuxprasad

URL

tutorialspoint.com/apache_spark/apache_spark_rdd.htm
www.infoq.com www.infoq.com

REST Anti-Patterns

3
1. ramlinuxprasad 13 May 2017
  
  in Public
  
  But ideally, a client should have to know a single URI only; everything else – individual URIs, as well as recipes for constructing them e.g. in case of queries – should be communicated via hypermedia, as links within resource representations.
  
  For example, something like http://www.example.com/user-ids/1234: this makes each user ID a separate URI and hence directly addressable.
2. ramlinuxprasad 13 May 2017
  
  in Public
  
  REST simply means using HTTP to expose some application functionality. The fundamental and most important operation (strictly speaking, “verb” or “method” would be a better term) is an HTTP GET.
  
  Almost everybody is going to come into the RESTful world with this mindset. I started here and still fall back on this thought process most times when things start to hurt my head. :)
3. ramlinuxprasad 13 May 2017
  
  in Public
  
  HTTP fixes them at GET, PUT, POST and DELETE (primarily, at least), and casting all of your application semantics into just these four verbs takes some getting used to. But once you’ve done that, people start using a subset of what actually makes up REST – a sort of Web-based CRUD (Create, Read, Update, Delete) architecture. Applications that expose this anti-pattern are not really “unRESTful” (if there even is such a thing), they just fail to exploit another of REST’s core concepts: hypermedia as the engine of application state.
  
  This thought process is pretty hard to get used to initially, especially with distributed systems.
Visit annotations in context

Annotators

ramlinuxprasad

URL

infoq.com/articles/rest-anti-patterns

RamDGG

I am a Senior Director at Boeing focusing on building aerospace systems.

Annotations: 96

Joined: November 11, 2016

Link: linkedin.com/in/ram-prasad-07638591/
