Hypothesis

142 Matching Annotations

Jul 2022
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 18 Linear Mixed Effects Models | Statistics for Ecologists

31
1. rjbuck000 20 Jul 2022
  
  in Public
  
  correlated due to sharing a random intercep
  
  This correlation only exists if b_i varies over i = 1. , ... sampling units, i.e., sigma_bi > 0.
2. rjbuck000 20 Jul 2022
  
  in Public
  
  you might consider
  
  It seems like you "need to", not "might", consider adding a random effect. The repeated observations is a sampling design parameter as it is restricting the patient-time pairing that could be sampled in a simple random sample. You might consider not including a random intercept if there is clearly no impact from the clustering factor. Even then the observations are only behaving as if they are independent because of no cluster (e.g., lake) effect; they still are correlated observations from sampling perspective.
3. rjbuck000 20 Jul 2022
  
  in Public
  
  random effects induces
  
  I get the impression that adding the random effects to the statistical model induces correlation, but isn't it the case that the study design induces the correlation and the model is trying to do just that, model the correlation.
4. rjbuck000 20 Jul 2022
  
  in Public
  
  corrlation
  
  "correlation"
5. rjbuck000 20 Jul 2022
  
  in Public
  
  Thus, a common recommendation is to use ML when comparing models (e.g., using likelihood ratio tests) that differ only in their fixed effects
  
  Piniero & Bates p.87 do not recommend using likelhood ratio tests for fixed effects, but rather use t-tests from summary(lme).
6. rjbuck000 20 Jul 2022
  
  in Public
  
  would
  
  "would be"
7. rjbuck000 20 Jul 2022
  
  in Public
  
  If this seems confusing, you are not alone. This is another situation where life just seems easier if you are a Bayesian since all parameters are treated as random variables and assigned probability distributions.
  
  I find these sentences unhelpful. There are times when it is appropriate to consider results as a fixed effect rather than a random effect. For example for Se fish, what if those 9 lakes sampled were all the lakes "nearby"? Then assuming a model where they are just a sample from a larger population will lead to poorer estimates than the correct fixed effects model. One could argue that it is more important to do things correctly, rather than just the easy way...
8. rjbuck000 20 Jul 2022
  
  in Public
  
  The deviations from the average intercept and slope, b0ib0ib_{0i} and b1ib1ib_{1i}, are also missing in action. What gives?
  
  These can be found quite easily. I am not sure why you include this sentence rather than a helpful sentence on how easy it is to generate these values, which you show just a paragraph later.
9. rjbuck000 20 Jul 2022
  
  in Public
  
  and
  
  I think this is an important "and" as it applies to all scenarios described in Fig 18.5
10. rjbuck000 20 Jul 2022
  
  in Public
  
  within cluster variability is low
  
  Do you mean when trends are strong, i.e., variability of residuals for within cluster regression is low? It seems likely that if cluster variability is low you would go with random intercept model.
11. rjbuck000 20 Jul 2022
  
  in Public
  
  Schielzeth & Forstmeier
  
  At least for this paper, the authors compared random intercept vs random slope model in a simulation where a random slope was included. What if there was no random slope in the true model? I suspect the random intercept model estimator would be better. Generally speaking the best model is the correct model. How well a model does under alternative truths has not been studied much as it depends on too many alternative truths. One issue to note is that if there is a big difference between between- and within-subject variance, iid N models will work well. If the between-subject variance is small compared to within-subject variance then the model should be at the subject level (fish), but if it is large, then model should be at the lake level (and take averages of the fish within lakes).
12. rjbuck000 20 Jul 2022
  
  in Public
  
  In this model, each site has its own intercept β0iβ0i\beta_{0i} and slope β1iβ1i\beta_{1i}
  
  I think it is important to note that if you take this approach you are no longer estimating how the population mean for each lake varies with Se in lake water, but how B_0 and B_1 vary with Se in lake water. B_0 may or may not equate well with lake population means.
13. rjbuck000 20 Jul 2022
  
  in Public
  
  predictor variable varies within a lake
  
  and assuming impacts the response in some way.
14. rjbuck000 20 Jul 2022
  
  in Public
  
  powerful
  
  "useful"?
15. rjbuck000 20 Jul 2022
  
  in Public
  
  We see that the estimated standard errors for β0β0\beta_0 and β1β1\beta_1 are much narrower in the naive linear model, which is not a surprise since this approach treats the data as though they were independent
  
  Wouldn't that be a good thing? It is bad only if they are biased, but you do not present any evidence that is the case or at least worse than lme model. In any case if the true model had no Lake effect, SE estimates would be much to large for mixed effects model and iid N case would be fine.
16. rjbuck000 20 Jul 2022
  
  in Public
  
  Naive linear model
  
  Aren't the true values 1.16 for intercept and 0.26 for Slope? Would this not indicate that "Naive" model is least biased method?
17. rjbuck000 20 Jul 2022
  
  in Public
  
  mixed-effect model fit using lmer, which was the model used to generate the data
  
  Would the mixed effects model work better than the naive model if we had simulated the naive model? We need to remember that the best model is always going to be the correct model. The naive model is simply an estimator using the incorrect model, it has nothing to do being a poor or naive estimator.
18. rjbuck000 20 Jul 2022
  
  in Public
  
  often
  
  Often? When is it not reasonable?
19. rjbuck000 20 Jul 2022
  
  in Public
  
  data are balanced
  
  You will have the sample problems if sample sizes are balanced, just less egregious ; as noted previously, the issue is the sample is not a simple random sample, not unequal sample sizes.
20. rjbuck000 20 Jul 2022
  
  in Public
  
  estimator of the intercept is biased
  
  This is a biased comparison of the two models. In linear regression there is no adjustment for differences between lakes, however awkward that may be, as there is in the lme model. For a fair comparison lakes factor needs to be included in the model; of course an estimator needs to be selected carefully so they both are estimates of population mean (or at least the same thing).
21. rjbuck000 20 Jul 2022
  
  in Public
  
  SEs estimate the standard deviation
  
  I believe a better estimate would be sqrt(mean(se.hat[,1]^2)) as we would rather take mean of variances than stdDev.
22. rjbuck000 20 Jul 2022
  
  in Public
  
  repeated measures data
  
  I don't believe this data fit the description of what is usually thought of as repeated measures data. The sampling unit is fish and we only take one observation per fish. Based on simulation this is a kind of stratified random sample or cluster sampling, depending on how many lakes there are in the population, with population of fish stratified by randomly selecting lakes and then "random" samples taken from each lake; but sampling plan was not described in the text. The sampling plan can not be ignored when discussing the nature of the model.
23. rjbuck000 20 Jul 2022
  
  in Public
  
  factors
  
  is this statistical factors or ecological factors?
24. rjbuck000 20 Jul 2022
  
  in Public
  
  naive estimates of uncertainty will be too small
  
  I don't understand this. If the sample of fish are a simple random sample from the population of 9 (all?) lakes, then estimates will be unbiased. As it turns out the simulation given as an example is not a simple random sample, so there is an error in model specification.
  
  I am not sure I know what a "naive" estimate is. If Lake is not important is the mixed effects model the naive estimator?
25. rjbuck000 20 Jul 2022
  
  in Public
  
  the number of fish sampled per lake does not depend on these other factors
  
  This is not what is happening in the simulation given. Lakes are first selected, then fish are caught in those lakes. If there were a lot of lakes, 10000 in example, we would rarely catch more than 1 fish from any given lake if a simple random sample (SRS) was taken. Instead we have about 9 fish caught per selected lakes. Hence not all fish are equally likely to be caught.
26. rjbuck000 20 Jul 2022
  
  in Public
  
  sample sizes are random
  
  I am not sure why this is important. The question is whether the data are collected as a simple random sample from the population of fish across all lakes defining the population of fish.
27. rjbuck000 20 Jul 2022
  
  in Public
  
  Well, it depends on the process that leads to the imbalance
  
  It also depends on the question one is trying to answer. If you want an estimate of the population mean, it will matter. If you want to understand the relationship between lake SE and bioaccumulation, then not so much, except in the efficiency of your estimator; in which case equal sample sizes will be most efficient.
28. rjbuck000 19 Jul 2022
  
  in Public
  
  not independent
  
  Independence is a very hard thing to prove or disprove. There could be difference in lakes that lead to different bioaccumulation rates, e.g., the fish caught in different lakes may be different species and have different bioaccumulation rates. The sampling plan, which is not described, would tell us whether fish were selected independently. (As best we can tell given the complexities of sampling wildlife.). Even if the fish were a true simple random sample from the population of fish from nearby lakes, results could look like what is given in the text. Keep in mind what we are observing in the figures are the residuals, not the errors. What is missing here is Lake as a factor in the model, so assuming a simple random sample took place it is model error (lack of fit) not lack of independence that is leading to this picture. On the other hand if Lake has no effect, then a stratified random sample will look like independent data. The issue is that Lake is presumed to have an effect. Note Lake and log(Se) will be confounded as we only have one observation of Se in water per lake.
29. rjbuck000 19 Jul 2022
  
  in Public
  
  locations
  
  Based on the text "to determine whether Selenium concentrations in nearby lakes are bioaccumulating in fish" the population of fish in these lakes are of interest and are the sampling units, not locations, at least at this point in the conversation. So it is the sampled fish that need to be independent. This may or may not have occurred, but a sampling plan has not been stated.
30. rjbuck000 19 Jul 2022
  
  in Public
  
  nearby
  
  Are the 9 sampled lakes all the "nearby" lakes? How many nearby lakes are there? This is an important question that is not directly assessed.
31. rjbuck000 19 Jul 2022
  
  in Public
  
  concentrations
  
  delete? Selenium can bioaccumulate, but not Se concentrations.
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/mixed.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 19 Generalized linear mixed effects models (GLMMs) | Statistics for Ecologists

6
1. rjbuck000 19 Jul 2022
  
  in Public
  
  Based on the significant interactions
  
  Could these interactions be due to sparsity of data and large number of nesting boxes?
2. rjbuck000 19 Jul 2022
  
  in Public
  
  Population proportion of skaters
  
  This is the sample proportion as it is based on summarize(PopProportion = mean(Y) rather than the parameters B_0 and B_1 correct?
3. rjbuck000 19 Jul 2022
  
  in Public
  
  model’s design matrix
  
  The response variable is not clearly stated. Is it presence/absence of eggs or clutch size? I believe both have been discussed. The model implies it is presence/absence, but this should be clarified.
4. rjbuck000 19 Jul 2022
  
  in Public
  
  we load the data
  
  I am not sure I understand this sentence. Before we load the data we convert several variables? How can we do that before we load the data? What does "loading" the data mean?
5. rjbuck000 19 Jul 2022
  
  in Public
  
  test statistics, and AIC’s
  
  test statistics and AICs are not presented for both Random Intercept and GLS in the table above this sentence.
6. rjbuck000 19 Jul 2022
  
  in Public
  
  were
  
  was?
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/glmm.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 20 Generalized Estimating Equations (GEE) | Statistics for Ecologists

2
1. rjbuck000 19 Jul 2022
  
  in Public
  
  estimates
  
  of the parameter values? Are estimates of SEs of parameter values impacted if assumptions about variance/covariance matrix are incorrect?
2. rjbuck000 19 Jul 2022
  
  in Public
  
  Thus, ∂μi∂βVi=Xi∂μi∂βVi=Xi\frac{\partial \mu_i}{\partial \beta}V_i = X_i
  
  V^-1, i.e., V inverse in equation here?
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/gee.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 17 Models for zero-inflated data | Statistics for Ecologists

4
1. rjbuck000 15 Jul 2022
  
  in Public
  
  [more on this later…reference last section of book…if I get there:)]
  
  delete
2. rjbuck000 15 Jul 2022
  
  in Public
  
  This is one of the reasons why I like to cover zero-inflated models
  
  I find this reasoning confusing. In next example, it appears you carry out analysis without Bayesian priors. In the following section you implement using Bayesian. As far as I can tell you get the same answer for both (the intercept is different but I assume that is due to switch of definition of zero-inflation (which is confusing in of itself). Since the answers are the same, it seems the non-Bayesian approach in Sec 17.6 is "easier" as it requires fewer assumptions and less complicated modeling process.
3. rjbuck000 15 Jul 2022
  
  in Public
  
  zero-inflation
  
  Has this been defined?
4. rjbuck000 15 Jul 2022
  
  in Public
  
  we should expect fewer zeros
  
  why should we expect fewer zeroes? It seems this is a conditional distribution problem. First there is the presence absence condition. Second, there is the distribution of values when "present". At least theoretically, even under present" conditions, you may still get zero values. As the mean of the distribution under "present" conditions increases from zero the frequency of zeroes will decline to a point where getting a zero is unlikely. So whether zeroes increase with mean or not, the zero-inflated model could be appropriate.
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/zimodels.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 16 Logistic regression | Statistics for Ecologists

7
1. rjbuck000 15 Jul 2022
  
  in Public
  
  95% credible interval
  
  One limitation to use of two sided intervals, confidence or credible, is that it removes 0 probability from any estimated interval (of interest, i.e., discounting bounds of +/-Inf); yet, all we have seen are 0s. Another approach is to note that for p = P(Y = 1) > 0.05, the chance of getting 59 0s in a row is less than "p-value" = 5%. Hence either p > 0.05 and something very unusual happened ("p-value"< 0.05) or p < 0.05 with the most likely value being zero (given no further information).
2. rjbuck000 15 Jul 2022
  
  in Public
  
  primarily
  
  I am confused. What value is a non-informative prior? If it is completely non-informative wouldn't the answers be determined completely by the data? This seems to be the case as Bayesian results generally mirror frequentist results when uninformed priors are used. In Sec 11.2 it is implied that we need informed priors (from data) otherwise the probability is subjective. In the example just given, we need an informed prior about beta_0 to achieve an uninformed prior about p?? So does this model have an informative or uninformative prior?
3. rjbuck000 15 Jul 2022
  
  in Public
  
  it turns out that a N(0,3)N(0,3)N(0, 3) distribution is a much better option
  
  What prior information leads us to believe that beta_0 has a distribution with mean 0 and std dev 3?
4. rjbuck000 15 Jul 2022
  
  in Public
  
  Contrast this process with fitting a model using glm where users may not even know that the mean is being modeled on a transformed scale
  
  I could also write this as "Note how in JAGS we have to spell everything out in tedious detail, but glm does all that for us!"
5. rjbuck000 15 Jul 2022
  
  in Public
  
  an
  
  and?
6. rjbuck000 15 Jul 2022
  
  in Public
  
  get will get the
  
  delete?
7. rjbuck000 15 Jul 2022
  
  in Public
  
  included
  
  "include"
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/logistic
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 14 Introduction to generalized linear models (GLMs) | Statistics for Ecologists

3
1. rjbuck000 15 Jul 2022
  
  in Public
  
  we will no longer be able to write our models in terms of a “signal + error”
  
  I don't see why we can not continue to write models as signal + error. Just because the variance may be a function of the mean doesn't mean we can't view it as signal + error. The link functions are essentially the signal function, Hence, the deviations we observe, i.e., the residuals, are estimates of the "errors". I believe, what is really happening is we are getting more nuanced about modelling the "error"; now we have different distributions, then "random effects models", measurement error, etc. It is important to keep in mind that the "error" is presumed additive, i.e., signal + error. (Except when it is not, of course, but then we do something like a transform to make it additive.)
2. rjbuck000 15 Jul 2022
  
  in Public
  
  Poisson regression
  
  I think it would be useful to add "models" to the end of the sentence.
3. rjbuck000 15 Jul 2022
  
  in Public
  
  linear regression
  
  This is a false comparison between "linear regression", by which I presume the iid N subset of linear regression models is meant, and the Poisson model. In one the signal function is linear and the other it is exponential function. Depending on the truth one of these is going to be wrong. A better comparison is between log(Y) = beta_0 + ... and Poisson.
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/glm.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 15 Regression models for count data | Statistics for Ecologists

4
1. rjbuck000 15 Jul 2022
  
  in Public
  
  interpretation of regression coefficients
  
  This is a false indication of the capabilities of linear regression, by which I presume you are referring to the iid N subset of linear regression models. In the Poisson model you are stating that response is an exponential function of the predictors. Yet log(response) with iid N is not considered or compared to the Poisson model. This is like saying a model is Y = X^2, but not allowing linear regression to use polynomials. I am not claiming log(response) is going to be better or worse, but this is an unfair comparison.
2. rjbuck000 15 Jul 2022
  
  in Public
  
  can thus only take on integer values
  
  but this really only matters if the integer range is small and particularly so if near near zero. This is important information when choosing models.
3. rjbuck000 15 Jul 2022
  
  in Public
  
  assumed model
  
  What is the assumed model? I presume it is the model assumed by the null hypothesis.
4. rjbuck000 15 Jul 2022
  
  in Public
  
  other methods might include
  
  Why not include log(response) iid Normal case? This should be considered as a possibility.
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/poissonr
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 13 Bayesian linear regression | Statistics for Ecologists

4
1. rjbuck000 12 Jul 2022
  
  in Public
  
  the easy way
  
  Sorry, but frequentist model gives the same answer and seems way easier than going through all this.
2. rjbuck000 12 Jul 2022
  
  in Public
  
  Confidence
  
  Again, is this meant to be "Credible"? If not what is a Bayesian Confidence Interval?
3. rjbuck000 12 Jul 2022
  
  in Public
  
  If you get a p-value near 0.5, that is ideal as it suggests that the observed data look very similar to the data generated under the assume model (at least when it comes to our chosen test statistic)
  
  What if you get a value near 1? What does that mean?
4. rjbuck000 12 Jul 2022
  
  in Public
  
  assumed model
  
  What is the assumed model? I assume it is the one behind the null hypothesis. These hypotheses are either True or False just like frequentist hypotheses, are they not?
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/bayesian-linear-regression.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 11 Introduction to Bayesian statistics | Statistics for Ecologists

10
1. rjbuck000 12 Jul 2022
  
  in Public
  
  By contrast, a frequentist will say that this particular interval either does contain ppp or it does not contain ppp (with probability = 1). The parameter itself is a fixed, but unknown quantity and the interval is random (it depends on the observed data). To interpret the frequentist interval in terms of a probability statement, we need consider creating many such intervals across repeated events (collect data, form interval). We expect that close to 95% of these intervals would contain the fixed and unknown ppp (see Section 11.3.2). Thus, we say that we are 95% sure that our interval contains the true parameter ppp
  
  I find all of this confusing and contradictory; see previous comment as well. The first probability is one of little interest as it is essentially a tautology. The true value is either in the interval or it is not - once you know the truth. We do not, so rely on the second definition to guide our uncertainty in the parameter value. The first probability also holds for Bayesian credible interval as probability is the "belief" that the true parameter is in the interval. So for a give credible interval the true value is either in or outside the interval.
2. rjbuck000 12 Jul 2022
  
  in Public
  
  This statement summarizes our belief about reasonable values of ppp given the observed data
  
  I am puzzled why a frequentist assuming their underlying model, e.g., normal distribution, is correct, can't say that there is a 95% chance that the 95%CI interval covers the true value, i.e., "probability that p is between 0.39 and 0.56 is 0.95." Isn't this the frequentist probability belief model? The fact we have chosen specific values, 0.39 and 0.56, to identify the CI doesn't matter, the CI itself has a 95% chance of covering the true parameter, it just so happens the one we observe was labelled with those specific values.
3. rjbuck000 12 Jul 2022
  
  in Public
  
  how well do these frequentist intervals work
  
  Why not ask how well do these Bayesian Intervals work?
4. rjbuck000 12 Jul 2022
  
  in Public
  
  and
  
  "as"? I think starting at "Inference" would be better.
5. rjbuck000 12 Jul 2022
  
  in Public
  
  one obtains more and more data
  
  the experiment contains more and more data. Data collected piecemeal presumably is captured by increasingly important priors.
6. rjbuck000 12 Jul 2022
  
  in Public
  
  result in informed beliefs (i.e., informed by data)
  
  This is an integral part of Bayesian estimation and inference, but not a word on the challenges of building an informed prior. The text covers all the blemishes of frequentist modelling, so why not mention this important challenge in Bayesian analysis? These challenges should be acknowledged if not briefly described. If you know of any papers using a good informed prior, they should be at least cited.
7. rjbuck000 12 Jul 2022
  
  in Public
  
  belief
  
  about what?
8. rjbuck000 12 Jul 2022
  
  in Public
  
  probability represents something a little different
  
  when it comes to assessing uncertainty in estimation and inference.
9. rjbuck000 12 Jul 2022
  
  in Public
  
  Data are used to test hypotheses which are either True or False
  
  Does this not hold for Bayesian hypothesis testing? That is whatever hypothesis you state or however you state it is either true or false?
10. rjbuck000 12 Jul 2022
  
  in Public
  
  n frequentist statistics
  
  Don't Bayesians define their probability models in the same way? For example, would a Bayesian not use a binomial to assess the probability of getting 7 or more Heads from 10 tosses of a fair coin? Isn't the root of this model the relative frequency of some event? I agree that there are differences in defining and assessing uncertainty in parameter estimates, but this is a whole different principle than probability in general.
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/bayes
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 10 Maximum likelihood | Statistics for Ecologists

15
1. rjbuck000 12 Jul 2022
  
  in Public
  
  of hatθhatθhat{\theta}
  
  hat notation is not correct.
2. rjbuck000 12 Jul 2022
  
  in Public
  
  have many desirable properties
  
  assuming the choice of distribution used in ML is correct.
3. rjbuck000 12 Jul 2022
  
  in Public
  
  If you find that the results from using optim depend on the starting value, then you should be cautious, try multiple starting values, and select the one that results in the smallest negative log-likelihood
  
  It might also imply the likelihood function is very flat.
4. rjbuck000 12 Jul 2022
  
  in Public
  
  be
  
  delete
5. rjbuck000 12 Jul 2022
  
  in Public
  
  can
  
  "which can"
6. rjbuck000 12 Jul 2022
  
  in Public
  
  returns
  
  space after "returns" needed.
7. rjbuck000 12 Jul 2022
  
  in Public
  
  Therefore, it is more common to work with the log-likelihood
  
  I believe it is near universal; why not skip the example with likelihood function and just start here?
8. rjbuck000 12 Jul 2022
  
  in Public
  
  null
  
  "the null"
9. rjbuck000 12 Jul 2022
  
  in Public
  
  we have
  
  and y values are non-negative.
10. rjbuck000 12 Jul 2022
  
  in Public
  
  Fortunately, there are many websites that can perform differentiation if you get stuck
  
  Also, this has been worked out for all the common distributions.
11. rjbuck000 12 Jul 2022
  
  in Public
  
  Depending on how rusty your math is, you may or may not recall ways to simplify this expression.
  
  I don't feel this sentence is necessary.
12. rjbuck000 12 Jul 2022
  
  in Public
  
  given by
  
  assuming tiles are all the same size
13. rjbuck000 12 Jul 2022
  
  in Public
  
  unknown parameters with the data being fixed
  
  I prefer "... unknown parameters conditional on observed data." or "... unknown parameters given the observed data."
14. rjbuck000 12 Jul 2022
  
  in Public
  
  Alternative mode
  
  The notation between Null and Alternative models is confusing. Subscript j from 1 to 80 for null and i from 1 to 80 in Alternative, even though there are not 80 measurements in each of the j sites. Also, I don't believe you want individual parameter values for each observation in Alternative, so lambda should be j = 1, 2 as noted in following sentence.
15. rjbuck000 12 Jul 2022
  
  in Public
  
  our large sample sizes are large
  
  or the data are actually normally distributed, which is also common.
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/mle.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 9 Introduction to probability distributions | Statistics for Ecologists

20
1. rjbuck000 12 Jul 2022
  
  in Public
  
  Model for when a large number of factors act in an additive way
  
  Not sure this a good description. We are discussing a single random variable.
2. rjbuck000 12 Jul 2022
  
  in Public
  
  Number of events in a fixed time interval or spatial unit
  
  Is this correct?
3. rjbuck000 12 Jul 2022
  
  in Public
  
  continuous
  
  or take on a wide range of values so the data are essentially continuous
4. rjbuck000 12 Jul 2022
  
  in Public
  
  F distribution
  
  I have not seen this formulation for displaying df for F distribution before; usually I see F_k1,k2 or F(k1,k2).
5. rjbuck000 12 Jul 2022
  
  in Public
  
  distribution is often encountered
  
  It is also the distribution of estimates of variance when data are normally distributed.
6. rjbuck000 12 Jul 2022
  
  in Public
  
  the uniform distribution is often used as a model of ignorance for prior distributions
  
  It is also the distribution of p-values when the null hypothesis and model assumptions are true.
7. rjbuck000 12 Jul 2022
  
  in Public
  
  The last trial will be a success
  
  Technically the last trial will be a success with probability 1 as it is a requirement of the study design that the last trial be a success. The model still works because presumably we chose r before the study starts as a stopping rule. Hence we are just sampling an infinite sequence of successes and failures. Now if we change r midstream, then we have problems.
8. rjbuck000 12 Jul 2022
  
  in Public
  
  a
  
  "as a"
9. rjbuck000 12 Jul 2022
  
  in Public
  
  derive
  
  We can derive the mass function regardless of what is observed. Perhaps you mean "estimate" or simply want to say the geometric probability mass function has y failures followed a success.
10. rjbuck000 12 Jul 2022
  
  in Public
  
  the
  
  "then"
11. rjbuck000 12 Jul 2022
  
  in Public
  
  Binomial distribution
  
  R has a Bernoulli distribution; it is a binomial distribution with n = 1.
12. rjbuck000 12 Jul 2022
  
  in Public
  
  returns the value of the probability density function
  
  While stated above, I think it would be useful to make clear that mean 0 and std dev 1 are implied in the function.
13. rjbuck000 12 Jul 2022
  
  in Public
  
  and
  
  in next line "0|x=1"
14. rjbuck000 12 Jul 2022
  
  in Public
  
  taken
  
  "take"
15. rjbuck000 12 Jul 2022
  
  in Public
  
  frequentist statistics
  
  Don't Bayesians' use these probabilities as well?
16. rjbuck000 12 Jul 2022
  
  in Public
  
  postivie
  
  "positive"
17. rjbuck000 12 Jul 2022
  
  in Public
  
  population
  
  "general population"
18. rjbuck000 12 Jul 2022
  
  in Public
  
  but it is also often used for frequentist inference
  
  I am not sure what being a frequentist or a Bayesian has to do with this problem. We "know" the underlying probabilities 1% Down's syndrome, 90% true positive and 1% false positive. The rest is calculating the probability that a positive test will be correct. Why should it be a surprise frequentists use Bayes theorem?
19. rjbuck000 12 Jul 2022
  
  in Public
  
  Multiplicative rule:
  
  Isn't this the same as the conditional probability rule?
20. rjbuck000 10 Jul 2022
  
  in Public
  
  needed
  
  "need"
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/probability.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 8 Modeling Strategies | Statistics for Ecologists

10
1. rjbuck000 12 Jul 2022
  
  in Public
  
  have no ability to discover important associations
  
  But we learn that changes in observed predictors do not impact the binary data. This is no small piece of information.
2. rjbuck000 12 Jul 2022
  
  in Public
  
  use
  
  delete "use"
3. rjbuck000 12 Jul 2022
  
  in Public
  
  Because the model is pre-specified, we can trust our inferences using confidence intervals and p-values as long as the assumptions are not violated.
  
  I am not sure this is true. Consider an example with two highly correlated predictors and only one is cause of changes in response.
4. rjbuck000 12 Jul 2022
  
  in Public
  
  simple simulation example
  
  This highlights my point above that lots of predictors is a multiple comparison problem as well. Out of 100 variables 5 or 6 are marginally significant, so why should I believe any are truly significant. A look at anova of full vs intercept model shows no significance. Now add one variable that does have a relationship, say y = X1. We see that the overall anova for test is not significant, but X1 is pretty significant (I get 0.0004). How do you manage expectations in these large variable models? One possibility is false discovery rates. So this is a problem with large sets of variables, but I am not convinced it is a big issue with small (say < 10) variables, which most studies are likely to have observed. In the end multiple comparisons and multicollinearity are the big issues in model selection.
5. rjbuck000 12 Jul 2022
  
  in Public
  
  A variable is more likely to be included if by chance its importance in the original sample was overestimated than if it was underestimated (Copas & Long, 1991).
  
  It seems to me that the estimated parameter value and SE are both equally at issue here. If the estimate is near 0, but SE is small, then we may incorrectly include the parameter in the model.
6. rjbuck000 12 Jul 2022
  
  in Public
  
  of
  
  delete "of"
7. rjbuck000 12 Jul 2022
  
  in Public
  
  a
  
  "an"
8. rjbuck000 12 Jul 2022
  
  in Public
  
  two approaches
  
  This is a great example, but the professor set up a false dichotomy in laying out your choices. You need to know the subject matter in both cases. Your poor model selection highlights that need.
9. rjbuck000 12 Jul 2022
  
  in Public
  
  our
  
  "your"
10. rjbuck000 12 Jul 2022
  
  in Public
  
  Modeling
  
  I think Model selection strategies is a better title. Also, I think this more naturally follows Ch. 6. See also bertv
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/mmi.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 7 Causal Inference | Statistics for Ecologists

2
1. rjbuck000 12 Jul 2022
  
  in Public
  
  unless we include ZZZ
  
  I am confused. I am not sure what it means to exclude a variable. If you mean you have X -> Y then it is "open" (and no longer a chain). If Z is in path as described, then does not change in X lead to a change in Z which leads to a change in Y. So is that not open as water can get from X to Y. If Z can change on its own to impact Y is another question.
  
  For the fork, if we exclude Z there is no connection between X and Y, i.e., no change in Z leads to no change in either X or Y. So seems like Z is necessary to have an open path, but the analogy falters as water is not going from X TO Y, but going to X AND Y.
  
  From a correlation perspective are not chains and forks the same? After all, correlation does not care whether it is X -> Z or X <- Z.
2. rjbuck000 12 Jul 2022
  
  in Public
  
  confounding variables
  
  Is this not multicollinearity? I think it would it be helpful to make this connection here.
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/causal.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 6 Multicollinearity | Statistics for Ecologists

5
1. rjbuck000 12 Jul 2022
  
  in Public
  
  together
  
  "each"?
2. rjbuck000 12 Jul 2022
  
  in Public
  
  not
  
  "no"
3. rjbuck000 12 Jul 2022
  
  in Public
  
  parameter estimates may change in magnitude (and even sign) depending on what over variables are included in the model
  
  This might be a good time to de-emphasize the actual estimated values coming out of the model. The goal is really to just understand relationships. This includes the relationship between predictors as well. We should not worry about the specific number because 1) it is only an estimate, so is a ballpark figure (hopefully) in the first place, and 2) can change with model (multicollinearity), population, etc.
4. rjbuck000 12 Jul 2022
  
  in Public
  
  such data-driven approaches can lead to overfitting
  
  This seems to imply that only dropping variables can lead to overfitting. Keeping them in can lead to overfitting as well.
5. rjbuck000 09 Jul 2022
  
  in Public
  
  it
  
  "it is"
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/multicol.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 5 Generalized least squares (GLS) | Statistics for Ecologists

4
1. rjbuck000 12 Jul 2022
  
  in Public
  
  case to
  
  Should this be "male" to match the model in the previous section?
2. rjbuck000 12 Jul 2022
  
  in Public
  
  he gls function
  
  Technically, this is not a generalized least squares estimator as function uses ML or REML to get estimates. Generalized least squares requires all values of covariance matrix to be known up to a constant.
3. rjbuck000 09 Jul 2022
  
  in Public
  
  library(here) # Read in escapement data sdata<-read.csv(here("data", "Sockeyedata.csv"), header=T)
  
  I couldn't get this to work. Where is this data?
4. rjbuck000 09 Jul 2022
  
  in Public
  
  be
  
  "by"
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/gls.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 4 Modeling Non-linear relationships | Statistics for Ecologists

2
1. rjbuck000 12 Jul 2022
  
  in Public
  
  than a linear model
  
  This is not a fair comparison of the linear model to the non-linear approach because you are not comparing the best of the splines with the best of the linear models, i.e., log(Species) vs log(Area). They may not be directly comparable via AIC due to transform of response variable, but that doesn't mean it is not a good (or even better) model. Besides, which model is easier to understand at the end of the day?
2. rjbuck000 12 Jul 2022
  
  in Public
  
  better
  
  A "different" approach. In Sec. 15, the log response model works equally well with negative binomial GLM.
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/nonlin.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 3 Multiple regression | Statistics for Ecologists

4
1. rjbuck000 12 Jul 2022
  
  in Public
  
  This question brings up the thorny issue of multiple comparisons
  
  This problem also holds if you have lots of different variables in the model.
2. rjbuck000 12 Jul 2022
  
  in Public
  
  It turns out that there are other ways to code
  
  This might be a good type to introduce other contrast ideas such as (-0.5, 0.5) to have intercept represent the mean and slope the difference.
3. rjbuck000 12 Jul 2022
  
  in Public
  
  β0+β1X1,i+β2X2,i+…
  
  Should these betas be hatted?
4. rjbuck000 12 Jul 2022
  
  in Public
  
  It
  
  "In"
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/matrixreg
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 2 Bootstrapping | Statistics for Ecologists

4
1. rjbuck000 11 Jul 2022
  
  in Public
  
  Without this assumption, the sampling distribution of our estimators9 does not follow a t-distribution
  
  With or without this assumption, the sampling distribution will or will not follow a t-distribution. It is what it is. If our assumptions about the model, i.e., t-distribution, are incorrect inference about intercept and slope may not be accurate. Luckily, the CLT says we will be in the ballpark most of the time. This holds true for Bayesian models too.
2. rjbuck000 11 Jul 2022
  
  in Public
  
  fewer
  
  I prefer "different". Not all assumptions are the same, and one big assumption may outweigh 3 little ones.
3. rjbuck000 11 Jul 2022
  
  in Public
  
  modern
  
  I do not like the use of "modern", it connotes the iid N model is outdated. The other models are simply expansions on the general principles of the iid case.
4. rjbuck000 11 Jul 2022
  
  in Public
  
  linear regression models
  
  I believe Zuur et al really mean iid Normal case of linear regression. This confusion is continued in the next sentence when include random effects models in list. Random effects models are linear regression models too.
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/boot.html
fw8051statistics4ecologists.netlify.app fw8051statistics4ecologists.netlify.app

Chapter 1 Linear regression review | Statistics for Ecologists

5
1. rjbuck000 11 Jul 2022
  
  in Public
  
  Technically, this is an incorrect statement, at least as far as frequentist statistics go
  
  I don't believe this is an incorrect statement, as I am viewing the probability as the chance my randomly observed experiment has in containing the true parameter based on my model assumptions (and the data validating those assumptions).
2. rjbuck000 11 Jul 2022
  
  in Public
  
  should ideally
  
  I believe 95% CIs do contain the true parameter 95% of the time. Estimators of 95% confidence intervals may be biased, so the expected value of the estimator may be 92%, but then it is a 92% confidence interval.
3. rjbuck000 11 Jul 2022
  
  in Public
  
  must
  
  I don't think "must" applies as there are alternative methods for explaining sampling distributions. I prefer an alternative view in that the observed experiment is simply one "randomly" selected experiment from the implied series of identical experiments. The sampling distribution describes the potential outcomes (and their probabilities) of this series of experiments. The outcome is the same. The sampling distribution is often determined from assumptions made in the regression model, sometimes it is derived empirically.
  
  I agree that simulations are an excellent way to study sampling distributions.
4. rjbuck000 11 Jul 2022
  
  in Public
  
  lm(formula = age ~ proportion.black, data = LionNoses)
  
  The use of percent.black was proposed as a better model in previous section. Why not use it here?
5. rjbuck000 11 Jul 2022
  
  in Public
  
  Independence of the errors
  
  I believe this assumption needs to be stated explicitly, it is not implied in the statements above.
Visit annotations in context

Annotators

rjbuck000

URL

fw8051statistics4ecologists.netlify.app/linreg.html

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL