- Jul 2018
-
europepmc.org
-
On 2017 May 12, Lydia Maniatis commented:
"The high levels of correlation between the four measures used in this study (fixations, interest points, taps and computed salience; see Fig. 3) support the conclusion that the tapping paradigm is a valid measure of salience."
Ascertaining that people look where they point (how could they guide their movement if they didn't?) has to be very high on the list of predictable predictions. With respect to "computed salience": "Finally, saliency maps computed from the Itti et al. (1998) model were compared against the tap data and found to correlate beyond the null hypothesis, R(bT) = 0.21; p = 4.3 × 10^-15, though not significantly below the sample error hypothesis, R(eST) = 0.25; p = 0.075. This relatively low value of R(eST) is obtained because the computed saliency maps were relatively diffuse."
"In the absence of a specific task ("free viewing"), it seems reasonable to assume that at least for the first few images, and for the first few fixations in these images, observers let themselves be guided by the visual input, rather than by some more complex strategy..."
The criterion of "it seems reasonable [to us] to assume that..." is the contemporary definition of rigor (as opposed to providing a solid rationale or even testing assumptions, in this case at the least by debriefing subjects). In contrast, it seems reasonable to me to assume that if someone asks me to freely select a place in a picture to point to, I would want to point at something interesting or meaningful, not at the first thing that caught my attention, e.g. the brightest spot. That is, observers' awareness that someone else is observing and in some way assessing their choices makes the authors' assumption that they are limiting "top-down" influences seem very weak to me. Of course, the top-down/bottom-up distinction is itself completely vague. If, in the image, I see two chairs and a sofa and point to the one that I immediately recognize as having seen in IKEA, is this top-down or bottom-up?
Relatedly, the authors casually address the issue of how many fixations preceded the pointing during the 1.4 seconds of viewing time: "Note that for the tapping study, the reaction time includes the time after the subject has decided where to tap, the movement of the hand, as well as the (relatively short) delay between the tap on the initialization screen and the presentation of the image. We therefore estimate that the majority of subjects performed three or fewer saccades before deciding where to tap." So, at least 127/252? Is this really an adequate assumption? And what is the rationale for "three or fewer" being an important cut-off?
It's also typical of the contemporary approach that the experimental emphasis is wholly on technique and statistics and completely agnostic to the actual stimuli/conditions and to the percepts to which they give rise, as well as to the many fundamental conceptual issues that such considerations entail, and of course the effect of stimulus variations on the shape of the data.
This empirical agnosticism is reflected in the use of the term "natural scene" to characterize stimuli; it is completely uninformative as to the characteristics of the stimuli. (This is especially the case as "natural scene" here includes, as it often does in scholarly publications, images of buildings on a college campus).
Surely, certain sets of such stimuli would produce greater or smaller inter-individual differences than others, altering the already weak data significantly as to "saliency maps." For example, if an image contained one person, then attention would generally fall on this person. But if there were two people, the outcome would probably be divided between the two, and so on. (Is seeing a person in a brief presentation top-down or bottom-up?)
Wouldn't it be weird if "attentive pointing" DIDN'T correlate with "other measures of attention"? So weird that the interpretation of the results would probably be chalked up to the many sampling uncertainties and confounding factors that are, in the predictable case, bustled through with lots of convenient (or "reasonable") assumptions and special pleading for weak data.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
-