Hypothesis

79 Matching Annotations

May 2017
nikhilnkini.files.wordpress.com nikhilnkini.files.wordpress.com

probabilisticprogrammingconcepts.pdf

22
1. nkini 03 May 2017
  
  in Public
  
  the termconceptto denote the underlying notion.
  
  Concepts according to section 4 of this paper:
  
  Flexible Probabilities: compute prob on the fly by allowing prob to be part of an atom.
  
  Distributional Clauses: A shorthand notation to assign a distribution to a random variable
  
  Unknown Objects : Dropping the closed world assumption, and dropping the unique names assumption.
  
  Stochastic Memoization: "If a random variable in Church is memoized, subsequent calls to it simply look up the result of the first call, similarly to tabling in logic programming"
  
  Constraints
  
  Negation as Failure
  
  Second Order Predicates
  
  Meta-Calls
  
  Time and Dynamics
  
  Generalized Labels for Facts and Queries
2. nkini 03 May 2017
  
  in Public
  
  It is often assumed that probabilistic facts do not unify with other probabilisticfacts or heads of rules
  
  I think this is saying that you can't follow a chain of reasoning without considering the probabilities.
  
  e.g. All men are social + Socrates is a man => Socrates is social, disregards the probabilities of all men are social and Socrates is a man.
3. nkini 03 May 2017
  
  in Public
  
  Legal groundings of such facts can also be restricted by providing a domain, asin the following variant of our alarm example where all persons have the sameprobability of independently hearing the alarm
  
  Just a special case where several probabilistic facts have the same probability.
4. nkini 03 May 2017
  
  in Public
  
  All ground instances o
  
  Umm, are they non-ground or ground?
  
  question
5. nkini 03 May 2017
  
  in Public
  
  b^e^hm^:hj
  
  This conflicts with the possible world (3), so... is this a new example?
  
  question
6. nkini 03 May 2017
  
  in Public
  
  we obtain the success probabilityofcalls(mary),P(calls(mary)) = 0:196
  
  OK, but this is the marginal distribution that Mary calls? Not conditional on the facts that we already know, that a burglary happened, that an earthquake did not, that Mary heard the alarm?
  
  question
7. nkini 03 May 2017
  
  in Public
  
  to provecall
  
  Does this mean to prove 'there exists a call'?
  
  question
8. nkini 03 May 2017
  
  in Public
  
  backward reasoning
  
  query: call
  
  It seems to me that we need to prove that 'call' happened. ```
  
  For this we need to prove that calls(X) happened.
  
  For this we need to prove that alarm,hears_alarm(X) happened.
  
  For this we need to prove that either burglary or an earthquake happened.
  
  Since burglary is true, subbing X = Mary proves calls.
9. nkini 03 May 2017
  
  in Public
  
  orward reasoning
  
  Adds additional facts
10. nkini 03 May 2017
  
  in Public
  
  queryq
  
  What are two examples of queries?
  
  question
11. nkini 03 May 2017
  
  in Public
  
  unique least Herbrand model (i.e., a unique least model using onlysymbols from the program
  
  What does this mean?
  
  "using only symbols from the program" as opposed to what?
  
  question
12. nkini 03 May 2017
  
  in Public
  
  0:1 ::burglary:0:7 ::hearsalarm(mary):0:2 ::earthquake:0:4 ::hearsalarm(john):
  
  Probabilistic facts. Each corresponds to a bool RV.
13. nkini 03 May 2017
  
  in Public
  
  The corresponding logicprogram obtained by adding the set of rulesRto the set of facts, also called apossible world
  
  Notice that probabilities are gone.
14. nkini 03 May 2017
  
  in Public
  
  alarm:earthquake:(1)alarm:burglary:calls(X):alarm;hearsalarm(X):call:calls(X)
  
  Rules / Definite clauses
15. nkini 03 May 2017
  
  in Public
  
  To summarize, the key contributions of this paper ar
  
  Identify core concepts used by various probabilstic languages
  
  Discussion of the execution mechanism that they require
  
  Positioning of state-of-the-art prob languages and implementations wrt these concepts
16. nkini 03 May 2017
  
  in Public
  
  Inference, that is, evaluating the probability distribution dened by a programor model
  
  Remember this definition.
17. nkini 03 May 2017
  
  in Public
  
  Typical probabilistic programming languages, on the other hand, employa variant of Sato's distribution semantics (Sato, 1995), in which random variablesdirectly correspond to ground facts and a traditional program species how to de-duce further knowledge from these facts
  
  Difference:
  
  SRL: Knowledge-based model construction, logic is used as a template for constructing a graphical model.
  
  Typical prob prog languages: Sato's distribution semantics, in which RVs directly correspond to ground facts. A traditional program specifies how to deduce further knowledge from these facts.
18. nkini 03 May 2017
  
  in Public
  
  probabilistic extensions of logic programming languages
  
  ICL
  
  SLP
  
  PRISM
  
  BLP
  
  CLP(BN)
  
  LPAD
  
  P-log
  
  Dyna
  
  CP-Logic
  
  ProbLog
  
  PROPPR
19. nkini 03 May 2017
  
  in Public
  
  statistical relational learningmodels
  
  RBN
  
  PRM
  
  MLN
20. nkini 03 May 2017
  
  in Public
  
  alternative probabilistic programming language
  
  IBAL
  
  BLOG
  
  Church
  
  Figaro
21. nkini 03 May 2017
  
  in Public
  
  By now, it is commonly accepted that the more interestingquestion is concerned with the underlying concepts that these languages employand their eect on the inference mechanisms, as their expressive power is oftenvery similar.
  
  Expressiveness is OK. Interesting? This:
  
  Underlying concepts employed
  
  Effect on inference mechanisms
22. nkini 03 May 2017
  
  in Public
  
  To obtain a better understanding of probabilistic programming, we identify anumber of core programming concepts underlying the primitives used by variousprobabilistic languages, discuss the execution mechanisms that they require anduse these to position and survey state-of-the-art probabilistic languages and theirimplementation.While doing so, we focus on probabilistic extensions oflogicprogramminglanguages such as Prolog, which have been considered for over 20 years.
  
  A survey of probabilistic logic languages, with a bit of talk about probabilistic extensions to only logic languages.
Visit annotations in context

Tags

question

Annotators

nkini

URL

nikhilnkini.files.wordpress.com/2017/05/probabilisticprogrammingconcepts.pdf
Apr 2017
nikhilnkini.files.wordpress.com nikhilnkini.files.wordpress.com

D:/arvind/research/papers/2006/simjoin/paper.dvi

2
1. nkini 27 Apr 2017
  
  in Public
  
  We performed experiments on two real data sets: the first is a pro-prietary address data set and the second is the publicly availableDBLP data set. The address data set contains 1 million strings,each a concatenation of an organization name and address (street,city, zip, state). The DBLP data set contains around 0.5 millionstrings, each a concatenation of authors and title of a publication.
  
  Datasets
2. nkini 27 Apr 2017
  
  in Public
  
  The self-proclaimed be all and end all - does everything exactly and better than everything else - like throwing a bunch of performance metric magnets at the refrigerator and seeing that everything sticks.
  
  Challenge: Find holes in the supremacy claims.
Visit annotations in context

Annotators

nkini

URL

nikhilnkini.files.wordpress.com/2017/04/arasu-vldb06_efficientexactsetsimjoins.pdf
www.iacr.org www.iacr.org

http://note.sonots.com/?plugin=attach&refer=SciSoftware%2FFaceRecognition&openfile=EigenFisherFace.pdf

10
1. nkini 27 Apr 2017
  
  in Public
  
  Motivation of the paper is articulated in the conclusion: We have divided the problem into two aspects: theblack-box functions that match and merge records, and theER algorithm that invokes these functions. In our opinion,this division has two important advantages: (i) it yields gene-ric ER algorithms that can be used, with well-defined se-mantics, in many applications, and (ii) it lets us focus onour performance measure, the number of black-box invoca-tions.
2. nkini 27 Apr 2017
  
  in Public
  
  n summary, ER is an inherently expensive process, sothat only relatively small sets of records can be handled.Thus, large data sets need to be partitioned into smaller setsthat can be resolved in detail. How large a data set can beexhaustively resolved depends on the application. It is alsopossible to distribute the ER computations across multipleprocessors, in order to handle larger data sets. In [7] westudy various strategies for distributing the work done byR-Swoosh.
  
  Worth a look, methinks.
3. nkini 27 Apr 2017
  
  in Public
  
  F-Swoosh
  
  Feature-level
4. nkini 27 Apr 2017
  
  in Public
  
  R-Swoosh:
  
  Record-level
5. nkini 27 Apr 2017
  
  in Public
  
  G-Swoosh
  
  General
6. nkini 27 Apr 2017
  
  in Public
  
  We identify the ICAR properties
  
  Idempotence Commutativity Associativity Representativity
7. nkini 27 Apr 2017
  
  in Public
  
  The particular variant of the ER problem that we studyin this paper may not be the most sophisticated
  
  Amen.
8. nkini 27 Apr 2017
  
  in Public
  
  Un-like other works that focus only on identifying matchingrecords
  
  Unlike horses that don't have wheels, our car comes with 4.
9. nkini 27 Apr 2017
  
  in Public
  
  Because record matching is inherently expensive, largesets of input records are often divided into “buckets” usingapplication knowledge, and then ER is run on each bucket.
  
  Blocking. They're talking about blocking.
10. nkini 27 Apr 2017
  
  in Public
  
  The one with a lot of very strong assumptions. "Could this BE any less relational?"
Visit annotations in context

Annotators

nkini

URL

iacr.org/archive/crypto2007/46220317/46220317.pdf
nikhilnkini.files.wordpress.com nikhilnkini.files.wordpress.com

becker-wsdm10-learningsimilaritymetricsforeventidinsocialmedia.pdf

2
1. nkini 27 Apr 2017
  
  in Public
  
  One key challenge for the classication-based approachinvolves the selection of training examples from which tolearn the similarity classiers. Ideally, we want our modelto correctly predict the similarity of every document to everyother document (or every centroid, based on the modelingchoice described above) in the dataset. However, creating atraining example for each document (or document-centroid)pair results in a skewed label distribution, since a large ma-jority of pairs in the training dataset do not belong to thesame event
  
  Sound familiar?
2. nkini 27 Apr 2017
  
  in Public
  
  Radiohead
  
  Never thought I'd read a paper with "Radiohead" and "my favorite band" in the same sentence. Most certainly not one that wasn't my own anyway.
Visit annotations in context

Annotators

nkini

URL

nikhilnkini.files.wordpress.com/2017/04/becker-wsdm10-learningsimilaritymetricsforeventidinsocialmedia.pdf
nikhilnkini.files.wordpress.com nikhilnkini.files.wordpress.com

baxter-kdd03_comparisonoffastblockingmethodsrecordlinkage.pdf

2
1. nkini 27 Apr 2017
  
  in Public
  
  [6] M. Hernandez and S. Stolfo. Real-world data is dirty:data cleansing and the merge/purge problem.Journalof Data Mining and Knowledge Discovery, 1(2), 1998.
  
  Say the folks that didn't really run experiments on real-world data :/
2. nkini 27 Apr 2017
  
  in Public
  
  The experimental comparisons should be extended to non-DBGendata sets to investigate dependencies on data sourcesand their error characteristics.
  
  Yeah. Tell me about it.
Visit annotations in context

Annotators

nkini

URL

nikhilnkini.files.wordpress.com/2017/04/baxter-kdd03_comparisonoffastblockingmethodsrecordlinkage.pdf
nikhilnkini.files.wordpress.com nikhilnkini.files.wordpress.com

Microsoft Word - vldb_camera2.doc

1
1. nkini 27 Apr 2017
  
  in Public
  
  Weobservedsimilarresultsonthepublicationinformationofabibliographydatabase.Weomitresultsduetospaceconstraints
  
  Oh come on :/
Visit annotations in context

Annotators

nkini

URL

nikhilnkini.files.wordpress.com/2017/04/ananthakrishna-vldb02_eliminatingfuzzyduplicates.pdf
nikhilnkini.files.wordpress.com nikhilnkini.files.wordpress.com

wiri_aiz_final.dvi

1
1. nkini 27 Apr 2017
  
  in Public
  
  Table 2. Target databases.
  
  Wow, they wrote and collected 15million academic articles? Pretty impressive, no?
Visit annotations in context

Annotators

nkini

URL

nikhilnkini.files.wordpress.com/2017/04/aizawa-wiri2005_fastlinkagedetectionschemeformultisourceinfointegration.pdf
Oct 2016
people.ucsc.edu people.ucsc.edu

ldfi-molly.pdf

1
1. nkini 18 Oct 2016
  
  in Public
  
  Had this model failed to identify a bug, we could subsequently haveenriched the sketched specifications.
  
  This works backwards, saying, well, we know there's a bug, and we will change the sketched spec until MOLLY is able to find one...
  
  criticism
Visit annotations in context

Tags

criticism

Annotators

nkini

URL

people.ucsc.edu/~palvaro/molly.pdf
nikhilnkini.files.wordpress.com nikhilnkini.files.wordpress.com

Untitled document

1
1. nkini 09 Oct 2016
  
  in Public
  
  In this survey I explain h
  
  Notion of knowledge can be formalized
  
  Usefulness of formalization
Visit annotations in context

Annotators

nkini

URL

nikhilnkini.files.wordpress.com/2016/09/using-reasoning-about-knowledge-to-analyze-distributed-systems.pdf
groups.csail.mit.edu groups.csail.mit.edu

impossibility-of-distributed-consensus-with-one-faulty-process.pdf

8
1. nkini 07 Oct 2016
  
  in Public
  
  Let C be a bivalent configuration of P, and let e = (p, m) be an event that is applicable to C. Let %? be the set of configurations reachable from C without applying e, and let 9 = e(g) = (e(E) I E E %? and e is applicable to E). Then, 9 contains a bivalent configuration.
  
  This is the setup for the second part:
  
  "Second, we construct an admissible run that avoids ever taking a step that would commit the system to a particular decision."
  
  As I understand it, what the authors say is that for an arbitrary event e and for an arbitrary configuration C, there is always a bivalent state that is reachable from C (the set fancy D in this case). And reaching a decision means that there needs to be a univalent state.
2. nkini 07 Oct 2016
  
  in Public
  
  assumed partial correctness.
  
  It is a valid assumption because total correctness is assumed which subsumes partial correctness.
  
  "First, we argue that there is some initial configuration in which the decision is not already predetermined." - this is the proof of this part.
  
  But why is a proof of this needed when the partial correctness already assumes that it is true that the configuration is bivalent (if all P begins with an initial config)?
  
  question
3. nkini 06 Oct 2016
  
  in Public
  
  The basic idea is to show circumstances under which the protocol remains forever indecisive. This involves two steps. First, we argue that there is some initial configuration in which the decision is not already predetermined. Second, we construct an admissible run that avoids ever taking a step that would commit the system to a particular decision.
  
  simple enough, right?
4. nkini 06 Oct 2016
  
  in Public
  
  The associated sequence of steps is called a run.
  
  What's the relation between a step and an event?
  
  question
5. nkini 06 Oct 2016
  
  in Public
  
  Thus, the message system acts nondeterministically, subject only to the condition that if receive( p) is performed infinitely many times, then every message (p, m) in the message buffer is eventually delivered. In particular, the message system is allowed to return 0 a finite number of times in response to receive(p), even though a message (p, m) is present in the buffer
  
  Hard to follow. Explain please?
  
  Also, why is this particular assumption made, that at infinity receives, every message is eventually delivered - how does not making this assumption change things?
  
  question
6. nkini 06 Oct 2016
  
  in Public
  
  Our system model is rather strong
  
  Can we discuss the choices of models that we have and why this one might be particularly stronger than the others?
  
  question
7. nkini 06 Oct 2016
  
  in Public
  
  Assumptions:
  
  No Byzantine failures
  
  Reliable message system
  
  No access to sync clocks (timeout based algos can't be used)
  
  If one non-faulty process receives a message, all non-faulty processes will and the sender knows this.
  
  No assumptions on:
  
  Relative speeds of processes
  
  Delay time in delivering a message
  
  The ability to detect the death of a process
8. nkini 06 Oct 2016
  
  in Public
  
  . In one atomic step, a process can attempt to receive a message, perform local computation on the basis of
  
  One atomic step:
  
  Attempt to receive a message
  
  perform local computation on the basis of whether or not a message was delivered to it
  
  Send an arbitrary but finite set of messages to other processes
Visit annotations in context

Tags

question

Annotators

nkini

URL

groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf
pdos.csail.mit.edu pdos.csail.mit.edu

chord_sigcomm.pdf

1
1. nkini 04 Oct 2016
  
  in Public
  
  adaptsefficientlyasnodesjoinandleavethesystem,andcananswerqueriesevenifthesystemiscontinuouslychanging
  
  how?
  
  anticipate
Visit annotations in context

Tags

anticipate

Annotators

nkini

URL

pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf
www.allthingsdistributed.com www.allthingsdistributed.com

Microsoft Word - sosp067-decandia1.doc

12
1. nkini 03 Oct 2016
  
  in Public
  
  some application developers may not want to write their own conflict resolution mechanisms and choose to push it down to the data store
  
  Of course they will, it's the job of the data store, right?
2. nkini 03 Oct 2016
  
  in Public
  
  give services control over their system properties, such as durability and consistency, and to let services make their own tradeoffs between functionality, performance and cost-effectiveness
  
  Interesting... need to unpack this
3. nkini 03 Oct 2016
  
  in Public
  
  State management then becomes the main component of a service’s SLA.
  
  Why?
  
  question
4. nkini 03 Oct 2016
  
  in Public
  
  which are in general measured at the 99.9th percentile of the distribution.
  
  what does this mean?
  
  question
5. nkini 03 Oct 2016
  
  in Public
  
  object versioning
  
  What's this?
  
  question
6. nkini 03 Oct 2016
  
  in Public
  
  An SLA stated in terms of mean or median response times will not address the performance of this important customer segment.
  
  Important insight
7. nkini 03 Oct 2016
  
  in Public
  
  non-hostile and there are no security related requirements such as authentication and authorizatio
  
  Oooh, that's a pretty useful assumption
8. nkini 03 Oct 2016
  
  in Public
  
  eventually-consistent storage system can be used in production with demanding application
  
  Very interesting
9. nkini 03 Oct 2016
  
  in Public
  
  In the past year, Dynamo has been the underlying storage technology for a number of the core services in Amazon’s e-commerce platform. It was able to scale to extreme peak loads efficiently without any downtime during the busy holiday shopping season. For example, the service that maintains shopping cart (Shopping Cart Service) served tens of millions requests that resulted in well over 3 million checkouts in a single day and the service that manages session state handled hundreds of thousands of concurrently active sessions.
  
  Much ads
10. nkini 03 Oct 2016
  
  in Public
  
  scalability and availabilit
  
  The two important things they are going for
11. nkini 03 Oct 2016
  
  in Public
  
  To achieve this level of availability, Dynamo sacrifices consistency
  
  Clear trade-off choice
12. nkini 03 Oct 2016
  
  in Public
  
  destroyed by tornados
  
  Oh the drama
Visit annotations in context

Tags

question

Annotators

nkini

URL

allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
Sep 2016
courses.mpi-sws.org courses.mpi-sws.org

implementing-remote-procedure-calls.pdf

8
1. nkini 29 Sep 2016
  
  in Public
  
  Exception Handling
  
  Whereas the exceptions relating to the distributed nature of the system (network partition etc) should have been part of this section, it unfortunately isn't. It is reduced to a call failed exception without too much detail about how to handle it.
2. nkini 29 Sep 2016
  
  in Public
  
  Having read the Waldo et al. paper first, I instinctively feel the need to defend this paper, which is kind of unprofessional. But I haven't enough time to shake off the emotions, so here goes.
  
  The problem that this paper has and also what sort of shields it from the Waldo et al. note is that it doesn't really try too hard to be a seminal contribution to the area of DS as we know it today. All it claims to do is provide an efficient, secure, programmer-of-that-day friendly way for more than one computer to communicate. They do back these claims up with their evaluations. So if they were honest, it is clear that in their evaluation, they did not come across problems that are fundamental to DS such as network partitions and node failures.
  
  Surely, this paper has important contributions considering that the DS paradigm of choice today - Map Reduce, makes use of some form of RPC to make calls to the map and reduce functions.
3. nkini 29 Sep 2016
  
  in Public
  
  Violation of this principle seemed likely to lead us into the complexities that have made previous communication packages and protocols difficult to use
  
  This line makes it pretty clear that "Distributed systems" and the common problems associated with them was not the concern that was being addressed, but it truly was creating a distributed communication protocol.
4. nkini 29 Sep 2016
  
  in Public
  
  imbedded in our major language, Mesa
  
  It does seem to be a major theme that implementation language choices ARE affecting the design choices, as was correctly pointed out in the Waldo et al paper.
5. nkini 29 Sep 2016
  
  in Public
  
  use procedure calls as the paradigm for expressing control and data transfers. For example, message passing might be a plausible alternative.
  
  Could someone elaborate the difference between the two mentioned paradigms? Doesn't procedure calls involve message passing?
6. nkini 29 Sep 2016
  
  in Public
  
  he primary purpose
  
  When I read this, having read the criticism that has been leveled against this paper in the A note on distributed computing paper. I think I see where the problem lies - this paper was set in a different time, and the problems that were trying to be solved were very different from the problems prevalent in 1994. It feels like at this stage of inception, the very problem of making calls to remote machines was a very new field, and various simplifying assumptions were made, and that performance was thought of, but not in the way that one would think of it ten years later.
7. nkini 29 Sep 2016
  
  in Public
  
  these should make it easier to build distributed computations, and to get them right. Another is efficiency: procedure calls seem simple enough for the communication to be quite rapid. A third is generality: in singie-machine com- putations, procedures are often the most important mechanism for communica- tion between parts of the algorithm.
  
  It's very hard to buy in to these assumptions, to be honest. So I hope they have justified why they think so in the pages to come.
8. nkini 29 Sep 2016
  
  in Public
  
  these should make it easier to build distributed computations, and to get them right.
  
  Easier to build, sure. Get them right? How did they get that idea?
Visit annotations in context

Annotators

nkini

URL

courses.mpi-sws.org/ds-ws12/papers/birrell-rpcs.pdf
nikhilnkini.files.wordpress.com nikhilnkini.files.wordpress.com

a-note-on-distributed-computing.pdf

8
1. nkini 29 Sep 2016
  
  in Public
  
  that all current invocations or callsfor system services will be eventually converted into callsthat might be to an object residing on some other machine.
  
  Why couldn't this be true, if for example, TCP were used?
2. nkini 29 Sep 2016
  
  in Public
  
  for easily under-standable reasons
  
  Are they implying the overhead of changing a lot of code? Or something more fundamental? I don't see the easily understandable reason.
3. nkini 29 Sep 2016
  
  in Public
  
  Unless the naming interfaceincludes an operation to lock a naming context, there is noway that I can make this operation completely robust.
  
  But isn't this true even today, and is just inherently a problem of distributed systems rather than any particular assumptions in a particular paradigm?
4. nkini 29 Sep 2016
  
  in Public
  
  This is not to argue against pleasant programming inter-faces.
  
  There seems to be an inherent assumption that programming interfaces that address DS problems have to be unpleasant, and I don't exactly see why this should be true.
5. nkini 29 Sep 2016
  
  in Public
  
  The hard problems have to do with differ-ences in memory access paradigms between local and dis-tributed entities.
  
  How so? In any way that's fundamentally different than process discovery?
6. nkini 29 Sep 2016
  
  in Public
  
  Thehard problems in distributed computing concern dealingwith partial failure and the lack of a central resource man-ager.
  
  Obviously...
7. nkini 29 Sep 2016
  
  in Public
  
  A less optimistic explanation of the failure of each attemptat unification holds that any such attempt will fail for thesimple reason that programming distributed applications isnot the same as programming non-distributed applications
  
  Did the development of BOOM, BLOOM and BUD take the view that this explanation takes?
8. nkini 29 Sep 2016
  
  in Public
  
  what-ever programming model is currently in vogue
  
  What is it in the 2000s and 2010s? Have we converged to a unified model? How have thing changed?
Visit annotations in context

Annotators

nkini

URL

nikhilnkini.files.wordpress.com/2016/09/a-note-on-distributed-computing.pdf

Tags

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Tags

Annotators

URL

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Annotators

URL

Annotators

URL