8 Matching Annotations
  1. Feb 2019
  2. Oct 2018
    1. 5/20 Understanding Bayesian Statistics in One Hour

      Computing statistics such as the mean and variance only counts as 【descriptive statistics】. The real core of statistics lies in 【inferential statistics】, and inferential statistics has two schools: frequentist vs. Bayesian (the frequentist school vs. the Bayesian school).

      What Bayes actually solves is an 【inference】 problem: we want to know what nature looks like, so we collect a lot of data and draw conclusions from it; that is inference.

      【Note: the professor says that prediction and inference are different things.】

      What is Bayesian statistics?

      Different targets for Bayesian and frequentist statistics

      【Note: here we think in terms of parametric models.】

      • a particular mathematical approach to apply probability to statistical problems.

      • incorporating our prior beliefs and evidence, to produce new posterior beliefs.

      This is the Bayesian school's way of thinking. The frequentist way, by contrast, is: for a given parameter of the model, first obtain a 【point estimate】, then report the accuracy of that point estimate (a confidence interval). Bayes does not produce a point estimate but a 【distribution】.

      The origin of Bayes' rule

      \(P(A|B) = \frac{P(B|A)P(A)}{P(B)}\)

      \(P(B|A) = \frac{P(A|B)P(B)}{P(A)}\)

      Bayes comes from Bayes' Rule --- a formula for conditional probability.

      Forward: the probability that I bring an umbrella given that it rains today; reverse: the probability that it is raining given that I brought an umbrella.
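The umbrella example can be worked through numerically. A minimal sketch; all the probabilities below (prior chance of rain, chance of carrying an umbrella in each case) are invented for illustration:

```python
# Bayes' rule for the umbrella example.  All numbers are hypothetical.
p_rain = 0.3                # prior P(rain)
p_umb_given_rain = 0.9      # forward: P(umbrella | rain)
p_umb_given_dry = 0.2       # P(umbrella | no rain)

# evidence P(umbrella) via the law of total probability
p_umb = p_umb_given_rain * p_rain + p_umb_given_dry * (1 - p_rain)

# reverse: P(rain | umbrella) by Bayes' rule
p_rain_given_umb = p_umb_given_rain * p_rain / p_umb
print(round(p_rain_given_umb, 3))  # ≈ 0.659
```

Multiplying the forward probability by the prior and dividing by the evidence inverts the conditioning direction.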

      Bayesian inference

      \(P(\Theta|D)=\frac{P(D|\Theta)P(\Theta)}{P(D)}\)

      • posterior: \(P(\Theta|D)\)
      • likelihood: \(P(D|\Theta)\), which can be learned from the data.
      • prior: \(P(\Theta)\)
      • evidence: \(P(D)\)

      Because the evidence \(P(D)\) does not depend on \(\Theta\), it can be dropped, so we often write Bayesian inference as:

      \(P(\Theta|D)\propto P(D|\Theta)P(\Theta)\)

      posterior ∝ likelihood × prior
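The proportionality above can be demonstrated on a grid of Θ values. A minimal sketch, assuming a coin-flip model with a flat prior and made-up data (7 heads in 10 flips):

```python
import numpy as np

# "posterior ∝ likelihood × prior" evaluated on a grid of candidate Θ.
theta = np.linspace(0.001, 0.999, 999)   # candidate values of Θ
prior = np.ones_like(theta)              # flat prior P(Θ)
heads, flips = 7, 10                     # hypothetical data D
likelihood = theta**heads * (1 - theta)**(flips - heads)  # P(D|Θ)

unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()  # dividing by the evidence

print(theta[posterior.argmax()])  # posterior mode, close to 7/10
```

Normalizing at the end plays the role of the evidence \(P(D)\): it rescales the curve without changing its shape.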

      Frequentist vs. Bayesian

      The biggest difference between the frequentist and Bayesian schools is where the randomness comes from: the former holds that it comes from the data, the latter that it comes from the parameters.

      • Frequentist statistics: the probabilities are the long-run frequency of random events in repeated trials

      Most university statistics courses teach the frequentist school; Duke is an exception, teaching Bayesian statistics. MLE is a typical frequentist tool. The hallmark of this school is that it yields a 【point estimate】 of the model parameters. It holds that behind the appearances of the universe there is an 【absolute law】 governing everything, and that randomness comes from the data. For example, a linear model \(y = wx+b\) has only two parameters, w and b, and there exist two absolutely correct values for them (a standard answer). Why, then, do the parameters I learn or estimate disagree with that standard answer? Because my data comes from a random sample.

      亦即:parameter is fixed, data is random.

      So this also explains:

      • where the errors come from --- the random sample;
      • why we get a single value, not a distribution --- the parameter is fixed.

      • Bayesian statistics: preserve and refine uncertainty by adjusting individual beliefs in light of new evidence.

      The Bayesian school yields a 【distribution】 over the model parameters; it holds that a parameter is not a fixed value, which is where the statistical notion of randomness lives. Unlike the frequentists, Bayesians say the data is already there; I cannot change the data to fit the parameters, so: data is fixed, parameter is random. The parameter is not fixed but follows some distribution. For the two parameters w and b of a linear model, a Bayesian holds that they have 【no absolutely correct】 values, so I start from a prior over them and update it by fitting different data (ps: this looks a bit like an iterative optimization method), obtaining the posterior, and the posterior is the model's estimate of the parameter.
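The update described above can be sketched with Bayesian linear regression in its conjugate form. Assumptions for illustration: a Gaussian prior on the parameters, known noise variance, and synthetic data:

```python
import numpy as np

# "Data is fixed, parameter is random": with a Gaussian prior on w and
# known noise σ², the posterior over w is a full Gaussian distribution,
# not a point.  The data here is synthetic, generated for illustration.
rng = np.random.default_rng(0)
w_true, b_true, sigma = 2.0, -1.0, 0.5
x = rng.uniform(-1, 1, size=50)
y = w_true * x + b_true + rng.normal(0, sigma, size=50)

X = np.column_stack([np.ones_like(x), x])   # columns: bias, slope
prior_mean = np.zeros(2)                    # broad prior on (b, w)
prior_cov = np.eye(2) * 10.0

# conjugate update: posterior precision = prior precision + X^T X / σ²
post_cov = np.linalg.inv(np.linalg.inv(prior_cov) + X.T @ X / sigma**2)
post_mean = post_cov @ (np.linalg.inv(prior_cov) @ prior_mean
                        + X.T @ y / sigma**2)

print(post_mean)                    # posterior mean, near (b_true, w_true)
print(np.sqrt(np.diag(post_cov)))   # remaining uncertainty per parameter
```

Unlike a point estimate, `post_cov` quantifies how uncertain each parameter still is; feeding in more data shrinks it.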

      Frequentist vs. Bayesian: a Linear Regression example

      Linear Regression model: \(y_i = w_0 + \sum_jw_jx_{ij}=w_0+w^Tx_i\)

      Maximum Likelihood Estimation (MLE):

      • Frequentist approach
      • \(y_i=w^Tx_i+\epsilon_i,\ \text{where}\ \epsilon_i \sim N(0,\sigma^2)\)
      • choose the parameters that maximize the likelihood of the data given those parameters
      • under Gaussian noise, maximizing the likelihood is equivalent to minimizing the squared error: \(w_{MLE} = \operatorname{argmin}_{w}\sum_i(y_i-w^Tx_i)^2\)
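A numerical sketch of the MLE above, using synthetic data (the true weights, noise level, and seed are invented for illustration):

```python
import numpy as np

# MLE for linear regression with Gaussian noise reduces to least squares:
# argmax of the likelihood = argmin of Σ (y_i - w^T x_i)².
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.uniform(-1, 1, 100)])  # bias + feature
w_true = np.array([0.5, 2.0])
y = X @ w_true + rng.normal(0, 0.1, 100)

# closed-form least-squares solution: w = (X^T X)^{-1} X^T y
w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_mle)  # a single point estimate, not a distribution
```

`np.linalg.lstsq` solves the least-squares problem directly, returning one point estimate of w, which is exactly the frequentist behavior the notes describe.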
  3. Jan 2017
    1. Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks
  4. Jul 2016
    1. following equation

      $$ y={argmax} _{c_{j}\in C}\sum _{h_{i}\in H}{P(c_{j}|h_{i})P(T|h_{i})P(h_{i})} $$

      $$ ={argmax} _{c_{j}\in C}\sum _{h_{i}\in H}{P(c_{j}|h_{i})P(T,h_{i})} $$

      $$= {argmax}_{c_{j}\in C}\sum _{h_{i}\in H}{P(c_{j}|h_{i})P(h_{i}|T)}$$

      (Annotator's note: \propto doesn't render well here. The last step drops the constant factor \(P(T)\), which does not change the argmax, so the equality still holds.)
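The quoted equation can be sketched as Bayesian model averaging over a finite hypothesis set H: average each hypothesis' class probabilities P(c|h), weighted by its posterior P(h|T). All numbers below (two classes, three hypotheses, their probabilities) are invented for illustration:

```python
import numpy as np

# y = argmax_c Σ_h P(c|h) P(h|T) with made-up hypotheses and weights.
p_h_given_T = np.array([0.5, 0.3, 0.2])   # posterior over hypotheses P(h|T)
p_c_given_h = np.array([[0.9, 0.1],       # row i: P(c|h_i) over the classes
                        [0.4, 0.6],
                        [0.2, 0.8]])

p_c = p_h_given_T @ p_c_given_h           # Σ_h P(c|h) P(h|T), per class
y = p_c.argmax()                          # pick the most probable class
print(y)  # → 0
```

No single hypothesis is chosen; each votes in proportion to how plausible it is given the training data T.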

  5. Jan 2016
    1. P(B|E) = P(B) X P(E|B) / P(E), with P standing for probability, B for belief and E for evidence. P(B) is the probability that B is true, and P(E) is the probability that E is true. P(B|E) means the probability of B if E is true, and P(E|B) is the probability of E if B is true.
    2. The probability that a belief is true given new evidence equals the probability that the belief is true regardless of that evidence times the probability that the evidence is true given that the belief is true divided by the probability that the evidence is true regardless of whether the belief is true. Got that?
    3. Initial belief plus new evidence = new and improved belief.
  6. Oct 2015
    1. Nearly all applications of probability to cryptography depend on the factor principle (or Bayes' Theorem).

      This is easily the most interesting sentence in the paper: Turing used Bayesian analysis for code-breaking during WWII.