  1. Nov 2023
    1. 6.3. Distribution of the sample mean \(\hat{p}\) for a Bernoulli RV.

      1. Sample \(n\) points \(X_1, \dots, X_n\) and construct a new Binomial RV \(Y = X_1 + \cdots + X_n\). Note \(Y \sim Binomial(n,p)\) with \(\mu = np\), \(\sigma^2 = np(1-p)\).
      2. \(Y\) can be approximated by \(N(np,npq)\), where \(q = 1-p\), if \(\mu \pm 3\sigma \in (0,n)\), see Wikipedia.
      3. Hence \(Y/n\) can be approximated by \(N(p,pq/n)\) under the same condition, \(np \pm 3\sqrt{npq} \in (0,n)\), or equivalently \(p \pm 3\sqrt{pq/n} \in (0,1)\).
      4. sample mean \(\hat{p} = \mu_X = \mu_{\bar{X}} = E(\bar{X})\) is
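
The approximation in steps 1 to 3 can be checked numerically. The following is a minimal sketch using only the Python standard library; the values `n = 100`, `p = 0.3`, the cut-off `35`, and the helper names `three_sigma_ok` and `binom_cdf` are illustrative assumptions, and the continuity correction (`35.5`) is a common refinement that the notes themselves do not mention.

```python
from math import comb, sqrt
from statistics import NormalDist


def three_sigma_ok(n, p):
    """Rule of thumb from the notes: np +/- 3*sqrt(npq) must lie inside (0, n)."""
    q = 1 - p
    mu, sigma = n * p, sqrt(n * p * q)
    return 0 < mu - 3 * sigma and mu + 3 * sigma < n


def binom_cdf(k, n, p):
    """Exact P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p = 100, 0.3  # illustrative values, not from the notes
q = 1 - p

# Normal approximation N(np, npq); NormalDist takes the standard deviation.
approx_y = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

exact = binom_cdf(35, n, p)
normal = approx_y.cdf(35.5)  # assumed continuity correction

# Standard deviation of the sample proportion Y/n, i.e. sqrt(pq/n).
sd_p_hat = sqrt(p * q / n)

print("rule of thumb satisfied:", three_sigma_ok(n, p))
print("exact P(Y <= 35):       ", round(exact, 4))
print("normal approximation:   ", round(normal, 4))
print("sd of Y/n:              ", round(sd_p_hat, 4))
```

For small `n` or `p` close to 0 or 1, `three_sigma_ok` returns `False` and the two probabilities can be expected to drift apart.
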
    1. 1.1. BASICS

      1. Population and Sample: Whole set and subset.
      2. Parameter and statistic: Numeric property of Population and Sample respectively.
      3. Descriptive and Inferential Statistics: Describe sample data and infer about population from sample data.
      4. Qualitative and Quantitative data: Non numeric and numeric.

      1.2. Three main tasks

      Assume a "Random Sample".

      1. Parameter Estimation.
      2. Confidence Interval for Parameter.
      3. Hypothesis Testing.

      2.1. Data Display:

      1. stem and leaf diagrams,
      2. frequency histograms,
      3. relative frequency histograms. (PMF or PDF)
      4. Cumulative frequency graph. (CDF)
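
As a rough illustration of how the frequency, relative frequency, and cumulative frequency displays relate, here is a small sketch on a made-up sample (the data values are assumptions chosen only for the example). It prints a table rather than drawing a plot.

```python
from collections import Counter
from itertools import accumulate

data = [2, 3, 3, 5, 5, 5, 7, 8, 8, 9]  # made-up sample

freq = Counter(data)           # frequency of each distinct value
values = sorted(freq)
n = len(data)

rel_freq = [freq[v] / n for v in values]   # empirical PMF
cum_freq = list(accumulate(rel_freq))      # empirical CDF at each value

print("value  freq  rel.freq  cum.freq")
for v, r, c in zip(values, rel_freq, cum_freq):
    print(f"{v:>5}  {freq[v]:>4}  {r:>8.2f}  {c:>8.2f}")
```

The relative frequencies sum to 1, and the last cumulative value is 1, which is what ties these displays to the PMF and CDF.
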

      2.2. Measures of central tendency

      1. Population mean and sample mean
      2. Population Median and sample median
      3. Population mode and sample mode
      4. Relation between mean and median indicates skewness: mean below median typically suggests left skewed, mean above median typically suggests right skewed.
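
A short sketch of these measures with Python's `statistics` module, on a made-up sample containing one large value. The comparison of mean and median is only a rule-of-thumb hint at skewness, not a formal skewness measure, and the variable name `skew_hint` is hypothetical.

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 3, 3, 4, 20]  # made-up sample with one large value

m = mean(data)
med = median(data)
mo = mode(data)

# Rule of thumb: a mean pulled above the median hints at a long right tail.
if m > med:
    skew_hint = "right skewed"
elif m < med:
    skew_hint = "left skewed"
else:
    skew_hint = "roughly symmetric"

print(f"mean={m}, median={med}, mode={mo}, hint: {skew_hint}")
```
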

      2.3. Variability

      1. Range
      2. Sample variance and sample standard deviation. Note the denominator \(n-1\).
      3. Population variance and standard deviation.
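
The difference in denominators can be seen directly in Python's `statistics` module, where `variance`/`stdev` divide by \(n-1\) and `pvariance`/`pstdev` divide by \(n\). The data below are made up for the example.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data; mean is 5

data_range = max(data) - min(data)

# Sample versions: sum of squared deviations divided by n - 1.
s2, s = variance(data), stdev(data)

# Population versions: sum of squared deviations divided by n.
sigma2, sigma = pvariance(data), pstdev(data)

print("range:", data_range)
print("sample variance / sd:    ", round(s2, 4), round(s, 4))
print("population variance / sd:", sigma2, sigma)
```
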

      2.4. Relative position

      1. Percentile: Pth percentile is $$F_X^{-1}\Big(\frac{P}{100}\Big)$$
      2. Quartile: Q1,Q2,Q3.
      3. Box plot: displays the five number summary $$\{ X_{min}, Q_1, Q_2, Q_3, X_{max} \}$$
      4. IQR = Q3-Q1
      5. Z-score, for population and sample respectively: $$z(x) = \frac{x-\mu}{\sigma}$$ $$z(x) = \frac{x-\bar{x}}{s}$$
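
The quantities in this list can be computed as below. This is a sketch on made-up data; `statistics.quantiles` uses its default "exclusive" method here, and other textbooks or tools may use a different quartile convention and give slightly different values. The helper `z_score` is a hypothetical name implementing the sample formula above.

```python
from statistics import mean, quantiles, stdev

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]  # made-up sample

# Quartiles are the 25th, 50th and 75th percentiles.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

five_number_summary = (min(data), q1, q2, q3, max(data))


def z_score(x, sample):
    """Sample z-score: (x - sample mean) / sample standard deviation."""
    return (x - mean(sample)) / stdev(sample)


print("five number summary:", five_number_summary)
print("IQR:", iqr)
print("z-score of 15:", round(z_score(15, data), 3))
```
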

      2.5.1. The Empirical Rule: Assume the data have a normal distribution.

      1. approx 68% of the data is in \(\mu \pm \sigma\)
      2. approx 95% of the data is in \(\mu \pm 2\sigma\)
      3. approx 99.7% of the data is in \(\mu \pm 3\sigma\)
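
The three percentages follow from the standard normal CDF, since the coverage of \(\mu \pm k\sigma\) does not depend on \(\mu\) or \(\sigma\). A minimal check with the standard library (the dictionary name `coverage` is just for the example):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# P(-k <= Z <= k) for k = 1, 2, 3 standard deviations.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"within {k} sigma: {prob:.4f}")
```
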

      2.5.2. Chebyshev's Theorem: For any distribution and any \(k > 1\), at least \(1-\frac{1}{k^2}\) of the data is in \(\mu \pm k\sigma\)
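
Unlike the Empirical Rule, this bound needs no normality assumption. The sketch below compares the bound with the observed fraction on a made-up, strongly skewed data set, treating the data as the whole population (so the population standard deviation `pstdev` is used). The helper name `fraction_within` is hypothetical.

```python
from statistics import mean, pstdev


def fraction_within(data, k):
    """Fraction of values lying in [mu - k*sigma, mu + k*sigma]."""
    mu, sigma = mean(data), pstdev(data)
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)


data = [1, 1, 1, 2, 2, 3, 4, 6, 10, 30]  # made-up skewed data

for k in (1.5, 2, 3):
    bound = 1 - 1 / k**2
    observed = fraction_within(data, k)
    print(f"k={k}: Chebyshev bound {bound:.3f}, observed {observed:.3f}")
```

The observed fraction is usually well above the bound; Chebyshev's Theorem only guarantees a floor.
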