Figure 3.1
There's no Figure 3.1 in this chapter, or at least it's not identified
\begin{equation}
p\text{-value} = P_{H_0}\left[ \lvert \overline{Y} - \mu_{Y,0} \rvert > \lvert \overline{Y}^{act} - \mu_{Y,0} \rvert \right] \tag{3.2}
\end{equation}

In (3.2), $\overline{Y}^{act}$ is the mean of the sample actually computed. Consequently, in order to compute the $p$-value as in (3.2), knowledge about the sampling distribution of $\overline{Y}$ when the null hypothesis is true is required. However, in most cases the sampling distribution of $\overline{Y}$ is unknown. Fortunately, as stated by the CLT (see Key Concept 2.7), the large-sample approximation

$$\overline{Y} \approx \mathcal{N}(\mu_{Y,0}, \, \sigma^2_{\overline{Y}}) \ \ , \ \ \sigma^2_{\overline{Y}} = \frac{\sigma_Y^2}{n}$$

can be made under the null. Thus,

$$\frac{\overline{Y} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}} \sim \mathcal{N}(0,1).$$

So in large samples, the $p$-value can be computed without knowledge of the exact sampling distribution of $\overline{Y}$.

Calculating the p-Value when the Standard Deviation is Known

For now, let us assume that $\sigma_{\overline{Y}}$ is known. Then, we can rewrite (3.2) as

\begin{align}
p\text{-value} =& \, P_{H_0}\left[ \left\lvert \frac{\overline{Y} - \mu_{Y,0}}{\sigma_{\overline{Y}}} \right\rvert > \left\lvert \frac{\overline{Y}^{act} - \mu_{Y,0}}{\sigma_{\overline{Y}}} \right\rvert \right] \\
=& \, 2 \cdot \Phi\left[ - \left\lvert \frac{\overline{Y}^{act} - \mu_{Y,0}}{\sigma_{\overline{Y}}} \right\rvert \right]. \tag{3.3}
\end{align}

The $p$-value can be seen as the area in the tails of the $\mathcal{N}(0,1)$ distribution that lies beyond $\pm \left\lvert \frac{\overline{Y}^{act} - \mu_{Y,0}}{\sigma_{\overline{Y}}} \right\rvert$.
This part is very confusing
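A short R sketch may make (3.3) less confusing. All numbers below (sample size, sample mean, known standard deviation, null value) are hypothetical, chosen only to illustrate the computation:

```r
# two-sided p-value under H_0: mu_Y = mu_0, with sigma_Y assumed known,
# following equation (3.3): p-value = 2 * Phi(-|z|)
n       <- 100    # hypothetical sample size
sigma_Y <- 2      # population standard deviation, assumed known
mu_0    <- 5      # value of mu_Y under the null
ybar    <- 5.4    # hypothetical sample mean actually computed

# standardize the sample mean
z <- (ybar - mu_0) / (sigma_Y / sqrt(n))

# area in both tails of N(0,1) beyond |z|
p_value <- 2 * pnorm(-abs(z))
p_value   # here z = 2, so the p-value is about 0.0455
```

The final line is just (3.3) with $\Phi$ replaced by R's `pnorm()`.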
b <- c(5, 10, 15, 30)
I don't get the need for this command; it seems we are not using any "b" in the formula.
menas
mean
2, 10, 50 and 100
These are not the numbers that are in the sample.sizes command below
distribution of the sample mean $\overline{Y}$ of the Bernoulli distributed random variables $Y_i$, $i=1,\dots,n$, is well approximated by the normal distribution with parameters $\mu_Y = p = 0.5$ and $\sigma^2_Y = p(1-p) = 0.25$ for large
Is it correct to state that the sample mean is well approximated by the normal distribution with parameters $p = 0.5$ and $\sigma^2 = 0.25$ for large $n$? Shouldn't it be $\sigma^2 = 0.25/n$? (That is, there's a "divided by $n$" missing?)
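This point is easy to check by simulation: the variance of the sample mean of Bernoulli draws with $p = 0.5$ shrinks like $p(1-p)/n$, not $p(1-p)$. A minimal R sketch (sample size and replication count are arbitrary choices):

```r
# simulate many sample means of n Bernoulli(0.5) draws and
# compare their variance to p(1 - p) / n = 0.25 / n
set.seed(1)
n    <- 100     # sample size (hypothetical)
reps <- 10000   # number of simulated samples

sample_means <- replicate(reps, mean(rbinom(n, size = 1, prob = 0.5)))

var(sample_means)   # close to 0.25 / n = 0.0025, not 0.25
0.25 / n
```

So the variance $0.25$ belongs to the $Y_i$ themselves; the sample mean has variance $0.25/n$.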
By setting the argument lower.tail to TRUE we ensure that R computes $1 - P(Y \leq 2)$, i.e., the probability mass in the tail right of $2$.
I think the correct explanation, according to the R help, is: "lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x]."
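A quick check in R confirms the help-page behavior: the upper tail is obtained with lower.tail = FALSE, which is equivalent to subtracting the default lower-tail probability from one.

```r
# lower.tail = TRUE (the default) gives P(X <= x);
# lower.tail = FALSE gives the upper tail P(X > x) directly
pnorm(2)                       # P(Z <= 2), about 0.977
pnorm(2, lower.tail = FALSE)   # P(Z > 2), about 0.0228
1 - pnorm(2)                   # same value as the previous line
```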
# compute density at x = -1.96, x = 0 and x = 1.96
Why is it relevant to compute the density at specific points (and not intervals), if the probability that a continuous distribution takes any one specific value is zero? I would recommend a note stating that this function does not return probabilities for continuous variables, as those can only be measured over intervals. I saw on a few websites that this function gives the height of the density curve at the chosen point; maybe this could also be noted.
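The distinction the comment asks for can be shown in a few lines: dnorm() returns the height of the density curve (useful e.g. for plotting it), while probabilities for a continuous variable come from areas under the curve via pnorm().

```r
# dnorm() returns the height of the N(0,1) density, not a probability;
# for a continuous variable, P(X = x) is zero for any single point x
dnorm(0)       # 1/sqrt(2*pi), about 0.3989: the peak of the standard normal
dnorm(1.96)    # about 0.0584: the height of the curve at x = 1.96

# probabilities are areas under the curve, computed with pnorm():
pnorm(1.96) - pnorm(-1.96)   # P(-1.96 <= Z <= 1.96), about 0.95
```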
denstiy
density