On 2016 Jan 27, Lydia Maniatis commented:
A little way into their introduction, the authors of this article make the following clear and unequivocal assertion:
“These findings underscore the idea that the encoding of objects and shapes is accomplished by a hierarchical feedforward process along the ventral pathway (Cadieu et al., 2007; Serre, Kouh, Cadieu, Knoblich, & Kreiman, 2005; Serre, Oliva, & Poggio, 2007; Van Essen, Anderson, & Felleman, 1992). The question that arises is how local information detected in the early visual areas is integrated to encode more complex stimuli at subsequent stages.”
As all vision scientists are aware, the processes involved at every level of vision are both hierarchical and parallel, both feedforward and feedback. These processes do not consist of summing “local information” to produce more complex percepts; a small local stimulus change can reconfigure the entire percept, even when the remaining “local information” is unchanged. This has been solidly established, at both the perceptual and the neural level, by experiment and logical argument, over many decades. (The authors' use of the term “complex stimuli,” rather than “complex percepts,” is also misjudged, since all stimuli are simple in the sense that they stimulate individual retinal photoreceptors in the same, simple way. Complexity arises as a result of processing; it is not a feature of the retinal (i.e., proximal) stimulus.)
The inaccurate description of the visual process aligns with the authors' attempt to frame the problem of vision as a “summation” problem (using the assumptions of signal detection theory), which, again, it decidedly is not. If the theoretical relevance of this study hinges on this inaccurate description, then the study has no such relevance. Even on its own terms, methodological problems render it without merit.
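To make concrete what the “summation” framing amounts to: under signal detection theory, probability summation is usually modeled as a MAX rule over independent channels, and linear summation as a sum over them. The following is a minimal sketch of that contrast (my own illustration, not the authors' code; the unit-variance Gaussian channels and the particular parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def trial_responses(d, n, Q, trials):
    """Internal responses for the signal and noise intervals of a 2AFC trial.

    Each of Q monitored channels is unit-variance Gaussian noise; n of them
    additionally carry signal of strength d in the signal interval.
    """
    sig = rng.normal(0.0, 1.0, (trials, Q))
    sig[:, :n] += d
    noise = rng.normal(0.0, 1.0, (trials, Q))
    return sig, noise

def pc_max_rule(d, n, Q, trials=100_000):
    """Probability summation (MAX rule): pick the interval with the larger maximum."""
    sig, noise = trial_responses(d, n, Q, trials)
    return np.mean(sig.max(axis=1) > noise.max(axis=1))

def pc_additive(d, n, Q, trials=100_000):
    """Linear (additive) summation: pick the interval with the larger sum."""
    sig, noise = trial_responses(d, n, Q, trials)
    return np.mean(sig.sum(axis=1) > noise.sum(axis=1))

# Example: 4 signal-bearing channels out of 100 monitored.
print(pc_max_rule(1.0, 4, 100), pc_additive(1.0, 4, 100))
```

Note that whichever rule is assumed, the framework treats the percept as an aggregation of fixed local responses, which is exactly the premise disputed above.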
In order to apply their paradigm, the authors have constructed an unnatural task, made highly challenging by design: very brief exposures that produce high levels of uncertainty and, consequently, many errors, together with unnaturally ambiguous stimuli. The task demands cut across detection, form perception, attention, and cognition (at the limit, where the subjects are instructed to guess, it is purely cognitive). (Such procedures may be common and old (“popular,” according to the authors), but this on its own doesn't lend them theoretical merit.)
On this basis, the investigators generate a dataset reflecting declining performance in the ever more difficult task. The prediction of their particular model seems to be generic: in terms of the types of models the authors are comparing, the probability of success appears to be 50/50; either a particular exponent (“beta”) in their psychometric function will decline, or it will be flat. (In a personal communication, one of the authors notes that no alternative model would predict a rising beta.) The fitting is highly motivated and the criteria for success permissive. Half of the conditions produced non-significant results. Muscular and theory-neutral attempts to fit the data couldn't discover a single value of “Q” to fit the model, so the authors “have chosen different values for each experiment,” ranging from 75 to 1,500. The data of one of five subjects were “extreme.” In addition, the results were “approximately half as strong as some previous reports,” but “It ...remains somewhat of a mystery as to why the threshold versus signal area slopes found here are shallower than in previous studies, and why there is no difference in our study between the thresholds for Glass patterns and Gabor textures.” In other words, it is not known whether such results are replicable, or what mysterious forces are responsible for this lack of replicability.
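For what it's worth, the generic character of the “declining beta” prediction can be seen in a back-of-the-envelope simulation (again my own sketch; the Weibull form, the transducer d′ = c, and parameter choices such as Q = 100 are assumptions, not the authors' values). Under the MAX rule, the fitted Weibull slope falls as more channels carry signal:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
Q = 100                                 # assumed number of monitored channels
levels = np.linspace(0.5, 5.0, 10)      # assumed stimulus intensities (d' = c)

def pc_max_rule(c, n, trials=50_000):
    """2AFC percent correct for a MAX-rule observer; n signal channels of Q."""
    sig = rng.normal(0.0, 1.0, (trials, Q))
    sig[:, :n] += c
    noise = rng.normal(0.0, 1.0, (trials, Q))
    return np.mean(sig.max(axis=1) > noise.max(axis=1))

def weibull(c, alpha, beta):
    """2AFC Weibull psychometric function with a 50% guessing floor."""
    return 0.5 + 0.5 * (1.0 - np.exp(-(c / alpha) ** beta))

for n in (1, 4, 16):
    pc = np.array([pc_max_rule(c, n) for c in levels])
    (alpha, beta), _ = curve_fit(weibull, levels, pc, p0=[2.0, 2.0],
                                 bounds=([0.01, 0.1], [20.0, 20.0]))
    print(f"n = {n:2d}: threshold alpha = {alpha:.2f}, slope beta = {beta:.2f}")
```

The declining slope falls mechanically out of the uncertainty built into the MAX rule; nothing about shape integration in particular is being tested.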
It is not clear (to me) how questions of physiology or principle in the visual, or any, system can be elucidated by a rough fit of a model that makes such general, low-risk predictions (predictions that can be virtually assured by a-theoretical methodological choices) to a particular dataset generated by an unnaturally challenging task implicating multiple, complex, methodologically and theoretically undifferentiated visual processes.
Finally, although the authors state that their goal is to decide whether their model “could be rejected as a model of signal integration in Glass pattern and Glass-pattern-like textures” (does this mean they think there are special mechanisms for such patterns?), they do not claim to reject the only alternative that they compare (“linear summation”), only that “probability and not linear summation is the most likely basis for the detection of circular, orientation-defined textures.”
It is not clear what “most likely” means here. Most likely that their hypothesis about the visual system is true (and what is that hypothesis)? Most likely to have fit their data better than the alternative? (If we take their analysis at face value, then this is 100% true.) Is there a critical experiment that could allow us to reject one or the other? If no alternatives can be rejected, then what is the point of such exercises? If some can be, what would be the theoretical implications? Is there a value in simply knowing that a particular method can produce datasets that can be fit (more or less) to a particular algorithm?
The "summation" approach seen here is typical of an active and productive (in a manner of speaking) subdiscipline (e.g. Kingdom, F. A. A., Baldwin, A. S., & Schmidtmann, G. (2015). Modeling probability and additive summation for detection across multiple mecha- nisms under the assumptions of signal detection theory. Journal of Vision, 15(5):1, 1–16; Meese, T. S., & Summers, R. J. (2012). Theory and data for area summation of contrast with and without uncertainty: Evidence for a noisy energy model. Journal of Vision, 12(11):9, 1–28; Tyler, C. W., & Chen, C.-C. (2000). Signal detection theory in the 2AFC paradigm: Attention, channel uncertainty and probability summation. Vision Research, 40, 3121–3144.)
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.