1.1. BASICS
- Population and Sample: Whole set and subset.
- Parameter and statistic: Numeric property of Population and Sample respectively.
- Descriptive and Inferential Statistics: Describe sample data, and infer properties of the population from sample data, respectively.
- Qualitative and Quantitative data: Non-numeric and numeric.
1.2. Three main tasks
Assume "Random Sample"
1. Parameter Estimation.
2. Confidence Interval for Parameter
3. Hypothesis Testing
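
The three tasks can be sketched on one sample as follows. This is a minimal illustration, not a prescribed method: the data are simulated, the hypothesized value `mu_0` is chosen arbitrarily, and the interval and test use a large-sample normal approximation (for small samples a t-distribution would usually be used instead).

```python
import random
from statistics import NormalDist, mean, stdev

# Hypothetical random sample: 100 simulated draws (true mean 50, sd 10).
rng = random.Random(0)
sample = [rng.gauss(50, 10) for _ in range(100)]
n = len(sample)

# 1. Parameter estimation: the sample mean estimates the population mean.
x_bar = mean(sample)
s = stdev(sample)              # sample standard deviation (n - 1 denominator)
std_err = s / n ** 0.5         # estimated standard error of the mean

# 2. Confidence interval: approximate 95% interval (normal approximation).
z_crit = NormalDist().inv_cdf(0.975)   # about 1.96
ci = (x_bar - z_crit * std_err, x_bar + z_crit * std_err)

# 3. Hypothesis testing: two-sided z-test of H0: mu = mu_0.
mu_0 = 50                      # hypothesized value, assumed for illustration
z_stat = (x_bar - mu_0) / std_err
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"estimate = {x_bar:.2f}")
print(f"95% CI   = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"p-value  = {p_value:.3f}")
```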
2.1. Data Display:
- stem and leaf diagrams,
- frequency histograms,
- relative frequency histograms (approximate the PMF or PDF),
- cumulative frequency graph (approximates the CDF).
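
As a rough sketch of how these displays relate, the snippet below tabulates frequencies, relative frequencies, and cumulative relative frequencies for a small made-up discrete sample, and prints a text histogram. The data values are hypothetical.

```python
from collections import Counter
from itertools import accumulate

data = [1, 2, 2, 3, 3, 3, 4]   # hypothetical discrete sample
n = len(data)

counts = Counter(data)
values = sorted(counts)
freq = [counts[v] for v in values]          # frequency histogram heights
rel_freq = [f / n for f in freq]            # relative frequencies (empirical PMF)
cum_freq = list(accumulate(freq))           # cumulative frequencies
cum_rel_freq = [c / n for c in cum_freq]    # empirical CDF at each value

for v, f, r, c in zip(values, freq, rel_freq, cum_rel_freq):
    print(f"{v}: {'*' * f:<4} freq={f} rel={r:.2f} cum={c:.2f}")
```

The relative frequencies sum to 1, and the last cumulative relative frequency is 1, mirroring the properties of a PMF and a CDF.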
2.2. Measures of central tendency
- Population mean and sample mean
- Population Median and sample median
- Population mode and sample mode
- The relation between mean and median indicates skewness: mean < median suggests left-skewed data, mean > median suggests right-skewed data.
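
A small sketch of these measures using Python's standard `statistics` module. The sample is made up, with one large value included so that the mean is pulled above the median.

```python
import statistics

data = [2, 3, 3, 4, 5, 6, 19]   # hypothetical sample with one large value

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Rule of thumb: a mean above the median hints at a right skew,
# a mean below the median hints at a left skew.
if mean > median:
    skew_hint = "right-skewed"
elif mean < median:
    skew_hint = "left-skewed"
else:
    skew_hint = "roughly symmetric"

print(mean, median, mode, skew_hint)
```

The mean/median comparison is only a heuristic; it can fail for some multimodal or discrete distributions.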
2.3. Variability
- Range
- Sample variance and sample standard deviation. Note the denominator: \(n-1\), not \(n\).
- Population variance and standard deviation (denominator \(N\), the population size).
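
To make the difference in denominators concrete, here is a short sketch on a made-up data set. `statistics.variance` and `statistics.stdev` divide by \(n-1\), while `statistics.pvariance` and `statistics.pstdev` divide by \(n\).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5
n = len(data)
mean = statistics.mean(data)
sum_sq_dev = sum((x - mean) ** 2 for x in data)   # sum of squared deviations = 32

value_range = max(data) - min(data)

# Treating the data as a sample: divide by n - 1.
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Treating the data as the whole population: divide by n.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

print(value_range, sample_var, sample_sd, pop_var, pop_sd)
```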
2.4. Relative position
- Percentile: Pth percentile is $$F_X^{-1}\Big(\frac{P}{100}\Big)$$
- Quartile: Q1,Q2,Q3.
- Box plot: a graphical display of the five-number summary $$\{ X_{min}, Q_1, Q_2, Q_3, X_{max} \}$$
- IQR = Q3-Q1
- Z-score, for a population and for a sample respectively:
$$z(x) = \frac{x-\mu}{\sigma}$$
$$z(x) = \frac{x-\bar{x}}{s}$$
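
The measures of relative position above can be sketched with the standard library. The data are hypothetical, and the quartile values depend on the interpolation convention; `method="inclusive"` is one common choice, assumed here, and other conventions can give slightly different results.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical sample

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
five_number_summary = (min(data), q1, q2, q3, max(data))

# 90th percentile: the 90th of the 99 cut points for n=100.
p90 = statistics.quantiles(data, n=100, method="inclusive")[89]

# Sample z-score: distance from the sample mean in units of s.
x_bar = statistics.mean(data)
s = statistics.stdev(data)

def z_score(x):
    return (x - x_bar) / s

print(five_number_summary, iqr, p90, z_score(9))
```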
2.5.1. The Empirical Rule:
Assume the data have a normal distribution.
1. approximately 68% of the data lie within \(\mu \pm \sigma\)
2. approximately 95% of the data lie within \(\mu \pm 2\sigma\)
3. approximately 99.7% of the data lie within \(\mu \pm 3\sigma\)
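
These percentages can be checked against the normal CDF. A minimal sketch using `statistics.NormalDist` from the standard library (the helper name `coverage` is introduced here for illustration):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage(k):.4f}")
```

Because the z-score standardizes any normal distribution, the same coverage values apply for every \(\mu\) and \(\sigma\).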
2.5.2. Chebyshev's Theorem:
For any distribution with finite mean and variance, and any \(k > 1\),
at least \(1-\frac{1}{k^2}\) of the data lie within \(\mu \pm k\sigma\)
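
As an illustrative check, the sketch below compares the Chebyshev lower bound with the observed fraction on a deliberately skewed, made-up data set. The data are treated as the whole population, so the population standard deviation is used; the helper names are assumptions introduced for this example.

```python
import statistics

def chebyshev_bound(k):
    """Lower bound on the fraction of data within k standard deviations of the mean."""
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed fraction of data strictly within k population standard deviations."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mu) < k * sigma]
    return len(inside) / len(data)

data = [1, 1, 1, 1, 2, 2, 3, 5, 8, 40]   # hypothetical, strongly right-skewed

for k in (1.5, 2, 3):
    print(f"k={k}: bound={chebyshev_bound(k):.3f}, observed={fraction_within(data, k):.3f}")
```

The observed fraction is never below the bound, but the bound is often loose; for normal data the Empirical Rule gives much tighter figures.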