31 Matching Annotations
  1. Nov 2025
    1. we have 35 in each group

      There may be an issue here in performing a power analysis with a different model than the one that will be used (i.e., GLMM vs. GAMM). I don't think that's necessarily a problem, and maybe it can be justified on the basis of simplicity.

      However, the power analysis only concerns the overall proportions, not the effects of time, nor the onsets (the overall proportions correspond only to the parametric part of the GAMM).

      But I suspect that onsets are quite hard to estimate precisely, certainly much harder than the overall proportion effects. So there is a risk that with a between-group design and 35 per group, the onsets will be very uncertain, especially in the noisy conditions of the low-quality webcam. If there's any possibility of increasing the sample size (resource-wise, etc.), the onsets will thank you in the future.

    2. below

      Perhaps a set of exploratory analyses could also be interesting, for example, on cohort vs. rhyme, target vs. cohort, or whatever else (but of course, if they are exploratory they can also be decided upon later and then reported as such...)

    3. earlier

      One issue: Our onset detection method is based on statistical significance, i.e., the onset is the earliest time point of a significant increase in the cohort (versus unrelated) smooth. One of our reviewers (McMurray) thinks this is not appropriate, because this means that more noisy data and/or data based on smaller samples would lead to later onsets, thus reducing comparability between experiments.

      We think of the use of significance as a feature, not a bug: For one, it reduces researcher degrees of freedom because the criterion is automatically determined. Also, this criterion is very broadly applicable (even to other data types, models, and tasks). Finally, we show in our simulation study that sample size and noise play little role in the coverage properties of our method (whereas they affect the bootstrap-based method of Stone et al. much more dramatically).

      Nevertheless, ... McMurray is still correct that our method conflates two things: noise and early/late timing. In response, I have implemented an option in the package that allows you to specify a "magnitude threshold" for onset detection, which is not based on significance. It's called 'onset_criterion', and by default it detects an onset when a magnitude of 0.075 logits relative to the baseline is reached (the value can be changed with 'onset_threshold').

      What does this mean for the RR? It seems to me that what is meant by "earlier" in your hypotheses is already connected to the influence of noise? I.e., data from lower-quality webcams can be much noisier, so it'll be harder to detect a significant difference in that condition. In other words, you need a larger effect in terms of proportions for it to be detected, and this may only emerge later? If that is true, the default operation of the method (which uses significance) will indeed align well with your hypotheses.

      Still, this is something to keep in mind: (1) you might want to make the distinction between noise and early/late clearer in the RR hypotheses. And/or (2) you might want to preregister a secondary analysis with a magnitude criterion rather than a significance-based one, in an attempt to separate noise from a magnitude-based increase in the proportion of looks.
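
      For example, such a preregistered secondary analysis might look something like this (for illustration only; in particular, the value passed to onset_criterion here is a placeholder, and argument names may differ slightly across package versions):

      onsets_magnitude <- get_onsets(model = m1, time_var = "time", by_var = "camera",
                                     compare = TRUE, onset_criterion = "magnitude",
                                     onset_threshold = 0.075, n_samples = 10000, seed = 1)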

    4. H2a

      In the "new way" of setting up cohort vs. unrelated as part of the DV, I think these three reduce to a single hypothesis (which is about the same as what is currently expressed in H2b). The package will estimate the temporal onset for the cohort-vs-unrelated preference in the two conditions, and then the two onsets will be compared.

    5. Onset

      Consider adding a hypothesis about the "timecourse", i.e., different effects of time in the two webcam conditions (see below, where I expand on that with respect to the different smooths)

    6. (H1a) Participants will show a competition effect, with more looks directed toward cohort competitors than unrelated distractors.

      I was trying to tie the hypotheses to the analyses (as I comment again below); this is more for me to understand it, maybe nothing needs to be changed here.

      H1a is tested by the parametric part of the model, with a significant intercept indicating log-odds different from 0 (so, a difference from 50%) over the whole critical window, and thus a cohort-versus-unrelated advantage. One possible complication is that in half of the trials there are two unrelated images; I'll think about that. Another possible complication is whether this will be estimated across cameras (as an omnibus "main effect") or within each camera.

      H1b is the effect of camera. If we set up the model in this cohort-versus-unrelated way, then the effect of camera will express the difference in the cohort-versus-unrelated advantage between the two cameras. But that is currently H1c, it's just that it won't be an "interaction". But then maybe H1b is lost? Maybe it can still be tested, but with some other model, e.g., a model on the proportion of looks to any image out of the total of looks (including those outside of the areas of interest). But I'm not sure this is what you were getting at.
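
      Just as a sketch of that idea (fix_any_image and fix_elsewhere would be hypothetical counts of samples on any of the images vs. outside all areas of interest, per participant x bin, in a data frame called dat):

      library(mgcv)
      m_h1b <- bam(cbind(fix_any_image, fix_elsewhere) ~ 1 + camera +
                     s(time, by = camera, k = 10) +
                     s(participant, by = camera, bs = "re") +
                     s(time, participant, by = camera, bs = "re"),
                   family = binomial, data = dat)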

    7. dynamic

      For the smooth comparison (i.e., for comparing the effects of time) this is a bit trickier. Model comparison is a common option, but the problem is that this requires fitting models with ML rather than fREML, and then it is no longer possible to model autocorrelation.

      The other option is ordered factors. Camera would be coded as an ordered factor and the smooth needs to be "duplicated" (so to speak) in order to have a reference smooth and a difference smooth. HOWEVER, this can produce different results (and different p-values) depending on which level is chosen as the reference. You could arguably use the lower-quality camera as the reference, because you'd like to estimate that level as well as you can (and estimating via a difference is more uncertain). More conservatively, you could do it both ways and only take a p-value as significant when it is significant in both models.
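
      A minimal sketch of that setup, using the binomial-counts DV and the hypothetical object names from my other comments (camera_ord is the ordered version of camera; the random effects can keep the unordered factor):

      dat$camera_ord <- as.ordered(dat$camera)
      contrasts(dat$camera_ord) <- "contr.treatment"       # treatment contrasts for the parametric term

      m_ord <- bam(cbind(fix_cohort, fix_unrelated) ~ 1 + camera_ord +
                     s(time, k = 10) +                      # reference smooth (reference level of camera_ord)
                     s(time, by = camera_ord, k = 10) +     # difference smooth for the non-reference level
                     s(participant, by = camera, bs = "re") +
                     s(time, participant, by = camera, bs = "re"),
                   family = binomial, data = dat)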

      The fact that the ordered factor approach can lead to different results is why I prefer to conduct onset detection from a model with unordered factors (i.e., to estimate a separate smooth in each level).

      Your hypotheses don't currently refer directly to the analysis of timecourse differences (via difference smooths), but maybe this could be added? i.e., that you expect a different timecourse in the two camera conditions, as revealed by different time smooths. Then the onset estimation would be a particular aspect of that timecourse.

    8. a

      Perhaps these could be linked directly to the hypotheses above. Specifically, the parametric effect of camera corresponds to the hypothesised difference between conditions (i.e., greater proportion of looks to cohort in one camera condition than in the other).

      The intercept will indicate whether there is a preference for cohort vs. unrelated images during the critical window (however, depending on the contrasts of camera, this could be averaged across both camera conditions, refer to only one of them, or even be estimated separately in each; e.g., one could fit a model with 1+camera using -0.5/0.5 contrasts and then a model with 0+camera to get the cohort-versus-unrelated preference in each camera condition).
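
      For example (a sketch with hypothetical object names; random effects omitted here for brevity, see the full model further below):

      # intercept = cohort-vs-unrelated preference averaged over the two camera conditions;
      # camera = difference in that preference between the conditions:
      contrasts(dat$camera) <- c(-0.5, 0.5)
      m_avg  <- bam(cbind(fix_cohort, fix_unrelated) ~ 1 + camera + s(time, by = camera, k = 10),
                    family = binomial, data = dat)

      # cell means: the cohort-vs-unrelated preference within each camera condition separately:
      m_each <- bam(cbind(fix_cohort, fix_unrelated) ~ 0 + camera + s(time, by = camera, k = 10),
                    family = binomial, data = dat)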

    9. time

      If condition here is not ordered (and, as I say above, for the purpose of detecting onsets it probably shouldn't be ordered), then s(time) by itself should not appear, because s(time, by=cond4) already estimates a separate smooth for each level.

    10. get_onsets

      Some arguments have changed their names (current version is 0.5.13). So this would read something like this:

      onsets_comp <- get_onsets(model = m1, time_var = "time", by_var = "camera", compare = TRUE, n_samples = 10000, seed = 1)

    11. method

      The stored rho value is missing in this model. Also, one needs to include the start-of-event indicator via the AR.start argument of bam. I find the itsadug package convenient for this (its start_event() function adds the indicator column).
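
      For example (hypothetical column names; the aggregated data would use participant rather than participant x trial as the event):

      library(itsadug)

      # mark the first observation of each time series
      dat <- start_event(dat, column = "time", event = c("participant", "trial"))

      # rho can be estimated from a model fitted without AR(1), e.g. rho1 <- start_value_rho(m0),
      # and then passed to bam() together with the start-of-event indicator:
      #   bam(..., rho = rho1, AR.start = dat$start.event)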

    12. select

      You probably don't want select in these models, as this will cause every smooth (incl. the "fixed effect" time smooths) to have even its linear component penalized. This means that these effects will be pulled toward a flat line (a bit like applying shrinkage to them, or as if they were random effects).

    13. m1

      So taking it all together, m0/m1 would look something like this (with binomial counts as the DV, assuming the data frame is called dat):

      bam(cbind(fix_cohort, fix_unrelated) ~ 1 + camera + s(time, by = camera, k = 10) +
            s(participant, by = camera, bs = "re") + s(time, participant, by = camera, bs = "re"),
          family = binomial, data = dat)

    14. re

      The random effects are missing the by=cond4 in the s(participant, bs="re"). So this model would force different participants to have the same overall condition difference.

      That is, it should be: s(participant, by=condition, bs = "re") + s(time, participant, by=condition, bs="re")

      There is another complication here, which is that there are different ways to include random slopes. One is s(participant, by=camera, bs="re"), which is basically two sets of random intercepts (as if we had 0 + camera | participant in lmer, which automatically uses "cell-mean coding" for the two levels of camera). Another is s(participant, bs="re") + s(participant, camera, bs="re"), which separates intercepts and slopes.

      Given that there are also condition x time slopes,... I think the way with by=condition is simply easier to set up.

    15. item type

      Here I would set it up in a different way. I've seen "this way" before (IIRC, Dan Mirman has models like this in his book). However, the issue with this model is the mutual exclusivity between responses (because looks to the cohort can't be looks to the unrelated image, and vice versa). One could say that this kind of perfect correlation then gets captured by the model, but even so, this would only work if certain random effects are included. Worse, the sample size ends up being wrong, because the model sees two counts for every binomial occurrence (one 1 and one 0 for every "real" observation).

      Instead, I treat looks to one or the other object in the display as part of the dependent variable (conceptually, this also seems more correct, because objects aren't quite manipulations, but part of the outcome). So the model will have a DV like cbind(fix_cohort, fix_unrelated), which means that only the webcam condition predictor is necessary, with no interaction. I.e., there'll be 2 smooths, one for each camera condition, and each smooth reflects cohort-versus-unrelated proportions. The "onsets" are the earliest points at which a cohort-versus-unrelated preference is detected--which is hypothesised to be earlier in one webcam condition than the other.

      (note that in the example dataset from Ito et al. in our preprint, we do have cohort and unrelated as levels of a condition predictor; but that's because that dataset was a target-absent design with different displays in each condition)

    16. binomial counts

      As I commented just above, these binomial counts/proportions will be counts/proportions only within participant x bin combinations (because for participant x trial x bin combinations they are 1 or 0, given the downsampling/binarization)

    17. participant × trial × time bin

      In our previous work we added two steps about here: (1) Binarization, i.e., within each time bin, only one data point in each trial is counted. The main reason for this was not to overstate the data, which would happen when counting multiple samples within each bin. Naturally, this intersects with "bin size" (as I mentioned above, we've used pretty small bins, but then tried to prevent the overstating of evidence by modelling autocorrelation). When you say "resampled into 100-ms bins", I suppose this means this "binarization" also takes place (as part of downsampling)? Still, the fact that you end up with 1 count within each participant x trial x bin combination could be made explicit.

      (2) Aggregation. Given that we ended up with 1 count per participant x trial x bin, we then summed counts across trials (i.e., items) in each participant x bin combination. The main purpose of this was again to avoid large autocorrelation values, which in by-trial data could be as high as .99. Another advantage is that we end up with smaller (more parsimonious) models without item random effects (though this could also be seen as a disadvantage). Still another advantage is that both the model fitting and the onset detection are extremely fast.
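
      Just as a sketch of those two steps (with hypothetical column names; raw_samples would have one row per eye-tracking sample and an aoi column indicating which image, if any, is currently fixated):

      library(dplyr)

      binned <- raw_samples %>%
        mutate(bin = floor(time / 100) * 100) %>%                  # downsample into 100-ms bins
        group_by(participant, camera, trial, bin) %>%
        slice(1) %>%                                               # binarization: keep a single sample per trial x bin
        ungroup() %>%
        mutate(fix_cohort    = as.integer(aoi %in% "cohort"),
               fix_unrelated = as.integer(aoi %in% "unrelated"))

      aggregated <- binned %>%                                     # aggregation: sum counts across trials/items
        group_by(participant, camera, bin) %>%
        summarise(fix_cohort    = sum(fix_cohort),
                  fix_unrelated = sum(fix_unrelated),
                  .groups = "drop")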

    18. 100-ms bins

      In our previous work, we've used 20 ms bins, attempting to strike a compromise between temporal resolution and not overstating the evidence (essentially, the combination of bin size, binarization within each bin, subject aggregation, and modelling of autocorrelation all matter in some way for this balance). I imagine that in the low-quality condition the sampling rate will not be high enough for something like that, but is something like 50 ms possible?

    19. Webcams

      should sampling rate also be mentioned here, for both cameras? (also, above you mention "variable sampling rate", which suggests that this is not constant?)

    20. We then subtract 200 ms

      There's a question here about whether to "chop off" a bit of time from the analysis (100 ms if starting at word onset, or 300 ms if also accounting for saccade planning/execution), or instead to model the whole curve in the GAM, including that initial part. Given that the images are previewed, I'd probably fit the whole curve (or chop off 100 ms). On top of that, the onset detection method also allows defining a window, which can be smaller than the time window in the GAM. But we probably don't need to touch that.
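
      E.g. (illustrative only, assuming time is expressed relative to word onset):

      dat_trimmed <- subset(dat, time >= 100)   # drop the first 100 ms after word onset before fitting the GAM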

    21. stability

      Perhaps this sentence could move higher up so that the two manipulated aspects are clear from the start. Alternatively, hardware selection and stability could already appear above as something like: "We plan to manipulate two factors, hardware selection and participant stability, across two experiments" or some such. The only reason I'm suggesting this is that I was mentally connecting the "two factors" to the "environmental and technical sources" sentence that appears just before, which is not quite the same distinction?

    22. noise

      Here there is an explicit reference to noise, but then effect sizes and timing are mentioned as the result, perhaps as a consequence of the reduced noise? This is not super clear to me. My immediate feeling is that they're separate notions, and I'm not sure that reduced noise would lead to all these effects.

      On the other hand: (1) In the proportions/binomial case, I suppose more variation could be linked to lower proportions of looks. E.g., maximum variation between trials/participants would be a mixture of 0% and 100% of looks, so 50% overall, and minimum variation could be that only a certain image was looked at, which would correspond to 100% of looks (although it could also be 0% of looks, which would also be minimum variation)... but I don't know if you had something like this in mind?

      (2) Our onset detection method can be sensitive to the amount of noise and indeed might end up detecting less noisy effects as being earlier. This is both good and bad. I discuss this more below.

      Still, despite these points, it seems to me that the link between noise and the other aspects needs to be made more explicit.

    23. statistical power

      In addition, and besides effect size, perhaps the notion of noise/variation could also be mentioned in this paragraph as a separate point. This might help link it to the references to "noise" further down. Moreover, both effect sizes (in the sense of overall proportion of looks) and increased noise/variation then impact statistical power and the need for larger sample sizes.

    24. effect sizes

      "Effect size" is a bit of a tricky term because it can refer to both standardized or unstandardized effects, i.e., taking into account the noise/uncertainty/variation or not. In this case, I think you mean, e.g., proportion of looks, irrespective of noise, but maybe that can be made more explicit.