Reviewer #2 (Public Review):
Summary
The study used eye tracking, with a focus on pupillometry, to examine how infants can learn to distinguish between informative and uninformative visual cues. Infants (n = 30, mean age = 8.2 months) viewed displays consisting of a sequence of stimuli: a fixation point, a central cue that predicted a subsequent informative or uninformative signal, the signal itself, and the target event (a cartoon animal, referred to as the reward). The key results are that: (1) pupil size differs depending on whether the infants anticipated an informative or uninformative signal, (2) this difference develops across trials, consistent with a slow learning process, and (3) there is rapid generalization when new shapes are introduced that share features with the informative vs uninformative cues. The study complements a rich literature, including work from this same group, showing that children are sensitive to information gains, and it is interesting and important in revealing that pupil size is a physiological marker of information anticipation. We have several comments and concerns and believe that addressing them would substantially strengthen the manuscript.
Our major points concern interpretation, statistical robustness, and clarity.
1. There is a tendency to overinterpret the findings.
a. Throughout, the authors interpret the findings as showing that pupil size tracks the "value" of information; however, the results do not conclusively demonstrate whether information has value in this task, or what kind of value it has. A natural hypothesis is that infants are intrinsically motivated to predict, i.e., that they value the ability to predict the target event as early as possible. In a supplementary figure, the authors present evidence that infants indeed fixate on the target event sooner after seeing informative vs uninformative cues, consistent with the idea that they use the information to improve their predictions. However, those results are not fully convincing, as we detail in point 2. Most importantly, this analysis is not integrated into, or even mentioned in, the main analyses. Linking the pupil reaction to the use of the information would greatly strengthen the paper (whether or not the supplementary findings hold up to more thorough scrutiny). Either this link should be made and discussed, or the authors should soften their conclusions about the utility of the informative cues.
b. On line 236, the text states that the evidence "...supports the growing body of evidence indicating that infants are proactive in shaping their learning environment by searching for and focusing on information-rich stimuli". The results do not show that the infants search for information, only that they have a pupil reaction that differentiates between informative and uninformative stimuli.
c. On lines 248-249, relating the changes in pupil dilation to a transfer of information value onto the cue seems a stretch. Without some other measure (e.g., EEG), this remains speculative. While we believe the suggestion is plausible, the language should be softened to present it as a question for follow-up research that the present data cannot directly address.
2. Several findings are statistically weak and several analyses are insufficiently controlled.
a. The analysis in Supplementary Figure 2, which shows that target fixation latencies are shorter after informative vs uninformative cues, raises several questions.
i. We were unable to fully check these analyses because the OSF project appears to contain latency data for only 33 participants (including 22 of the 30 in the final sample).
ii. The results are described as revealing a significant difference, but the 89% confidence interval of the difference contains 0. How did the authors establish significance here?
iii. How do the authors distinguish incidental fixations (which just happened to land near the target) from true predictive gaze shifts? Fixations were pooled if they occurred between 1.25 seconds before and 1 second after target onset. This is enough time for the eye to move in and out of the window several times. The authors should analyse the distributions of fixation durations to rule out artifacts unrelated to target prediction.
iv. Fixation latencies were standardized, bringing each participant's mean to 0, yet the statistical model includes a random intercept; is there a justification for this?
v. Standardizing removes information about whether fixations were proactive or reactive. It would be very interesting to see whether and how information affects these two differently.
vi. Since informativeness was learned across trials, the model should ideally include trial number, and the interaction between trial number and informativeness, as random effects. This would allow learning to predict to be compared with the pupil reaction. Are infants who have a stronger (or earlier) pupil reaction also more likely to show stronger learning to anticipate?
b. The main finding that pupil size differs between informative and uninformative cues is based on a 3-second analysis window. This long window most likely spans many saccades, which can affect pupil size directly or by moving the eye onto or off visual stimuli. There is no analysis showing that saccade statistics or fixation locations are equivalent between the two trial types, but such an analysis is necessary to convincingly rule out a spurious artifact.
c. The second main finding, that the effect of informativeness grows across trials, seems statistically weak. The text (line 138) states that the interaction had a beta of 0.002, equal to the lower bound of the 89% HDI ([0.002, 0.003]). For the second claim, that pupil size decreased across informative trials, the beta is -0.002 and the 89% HDI has zero width, i.e., [-0.002, -0.002]. In general, the authors should check their numbers carefully and report them with enough precision for the reader to interpret them meaningfully.
d. The analyses do not indicate how well the TD model fits; we are shown only that it fits better than a linear model. On line 177, a correlation analysis between the data and the model is mentioned, but the statistic cited for this test on line 179 is a mean beta coefficient, so it is impossible to know what it means. An analysis of goodness of fit or, at the very least, a figure superimposing the model predictions on the data would be much more convincing.
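To make points 2.a.iv and 2.a.v concrete, the following minimal Python sketch standardizes hypothetical latencies within each participant. The participant IDs and values are invented for illustration and are not taken from the OSF data. By construction, each participant's mean standardized latency is zero (over the trials used for standardization), which leaves little or no between-participant offset for a random intercept to absorb. The sign that separated anticipatory fixations (before target onset) from reactive ones is also no longer preserved.

```python
from statistics import mean, pstdev

def zscore_within(latencies_by_participant):
    """Standardize each participant's latencies with that participant's own mean and SD."""
    standardized = {}
    for pid, latencies in latencies_by_participant.items():
        m, s = mean(latencies), pstdev(latencies)
        standardized[pid] = [(x - m) / s for x in latencies]
    return standardized

# HYPOTHETICAL latencies in seconds relative to target onset (negative = before onset,
# i.e., anticipatory); invented for illustration, not taken from the study's data.
raw = {
    "p01": [-0.40, -0.10, 0.20, 0.35],
    "p02": [0.30, 0.55, 0.60, 0.90],   # every fixation is reactive (after onset)
    "p03": [-0.80, -0.60, -0.20, 0.10],
}
z = zscore_within(raw)

for pid in raw:
    # Raw means differ between participants; standardized means are all ~0.
    print(pid, f"raw mean = {mean(raw[pid]):+.3f}", f"z mean = {mean(z[pid]):+.1e}")

# After standardization, p02's purely reactive latencies are partly negative,
# so the sign no longer separates proactive from reactive fixations.
print("p02 standardized:", [round(v, 2) for v in z["p02"]])
```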
3. The descriptions of some key parts of the paper are very unclear.
a. The description of the TD model applied to pupil learning (starting on line 391) is very unclear. The model has to include some measure of informativeness (i.e., the match between the cued and true target location), but it is unclear how this was formalized. It is also very unclear how time within the trial is incorporated (i.e., the meaning of the TDE equation).
b. The description of the generalization analysis (Fig. 5) is also very unclear. Every sentence in it caused some confusion, so we go through them one by one. "A Bayesian additive model showed that infants' pupil dilation was reduced for novel cues." Reduced relative to what? "This was specific to those novel cues that shared the features of the familiar informative cues (estimated mean difference = -0.05, 89%HDI = [-0.062, -0.038])." All the novel cues shared features with the informative cues; do the authors mean the novel cues that had the critical feature indicative of the informative cue? "The size of this effect approximated the difference between conditions that were observed for familiar stimuli (estimated mean difference = -0.067, 89% HDI = [-0.077, -0.057])." What is "this effect"? "Crucially, this difference was not observable at the start of the task, when the familiar stimuli were first introduced (estimated mean difference = -0.007, 89%HDI = [-0.015, 0.001])." At the start of the task, those stimuli were novel, not familiar.
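As an illustration of the level of detail requested in point 3.a, the Python sketch below shows one conventional way that time within a trial can enter a TD model: each within-trial time step is treated as a separate state, and the TD error at step t is `delta_t = r_{t+1} + gamma * V(t+1) - V(t)`. Everything specific here is a hypothetical assumption, not the authors' model: the number of steps, the step at which the signal appears, the learning rate and discount, and the choice to treat the information gained from an informative signal (log2 of the number of possible target locations) as the reward.

```python
import math

def simulate_td(n_trials=40, alpha=0.1, gamma=0.9, n_steps=5, signal_step=3, n_locations=2):
    """Tabular TD(0) over within-trial time steps (a 'complete serial compound').

    HYPOTHETICAL assumptions, not the authors' model:
      * step 0 is cue onset, `signal_step` is signal onset, the last step is the target;
      * the only 'reward' is the information gained when the signal appears, in bits:
        log2(n_locations) after an informative cue and 0 after an uninformative one.
    Returns, for each cue type, the learned value of the cue-onset step after every trial.
    """
    values = {"informative": [0.0] * n_steps, "uninformative": [0.0] * n_steps}
    cue_value_history = {cue: [] for cue in values}
    for _ in range(n_trials):
        for cue, V in values.items():
            gain = math.log2(n_locations) if cue == "informative" else 0.0
            # reward_on_entry[t] is the reward received on arriving at step t
            reward_on_entry = [gain if t == signal_step else 0.0 for t in range(n_steps)]
            for t in range(n_steps):
                if t + 1 < n_steps:
                    # TD error: delta_t = r_{t+1} + gamma * V(t+1) - V(t)
                    delta = reward_on_entry[t + 1] + gamma * V[t + 1] - V[t]
                else:
                    delta = 0.0 - V[t]  # end of trial: no further reward or value
                V[t] += alpha * delta
            cue_value_history[cue].append(V[0])
    return cue_value_history

history = simulate_td()
print(round(history["informative"][-1], 3), history["uninformative"][-1])
```

Under these assumptions, the cue-onset value grows only for the informative cue and only gradually across trials, and the number of steps between cue and signal determines how many trials pass before that value becomes non-zero.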