Matching Annotations
  1. Jul 2018
    1. On 2016 Feb 05, Lydia Maniatis commented:

      Some additional examples of the casual approach to theory and method exhibited here, an approach that seems to have become the norm in the vision literature:

      1. A tolerance for extremely ad hoc suggestions: "Meese (personal communication) has suggested an alternative explanation for why β declines with signal area that does not preclude the possibility that the maximum, i.e., whole-area signal condition is detected by a global linear integrator. He suggests that the visual system might employ linear filters matched in shape to the various signal-shape conditions. Thus for the single pie-wedge and windmill conditions these would be pie-wedge and windmill-shaped filters matched to the signal area, culminating in full-circle global linear integrators for the 100% signal area conditions."

      There is no rationale for the suggestion that there are special mechanisms for "wedge- and windmill-shaped" areas; they just happen to be the shapes of the stimuli the authors chose (also without a rationale). If they had used square or heart-shaped stimuli, the existence of the corresponding "filters" would apparently have been equally conceivable.

      2. In their introduction the authors indicate that they are studying basic visual perception. However, their definition of "external noise" is contingent not on the spontaneous appearance of stimuli, but on the instructions given to observers about what to attempt to locate in the spontaneously arising percept. They are instructed to detect a particular "texture" in a surface that contains more than one such texture, and in which the different textures tend to blend perceptually. The non-target texture is labelled "external noise" for the purpose of creating the noise terms demanded by the "model." If the task had been to estimate the presence or proportion of vertical bars in the entire stimulus, the noise term would presumably have been all the non-vertical bars. The definition of the term is thus completely arbitrary, designated without consulting the visual system, so to speak, as to functionally relevant concepts. In a recent article, Solomon, May and Tyler (2016) defined "external noise" in terms of the standard deviations of the distributions from which they draw their stimuli; this standard deviation is given two different values simply because the model they choose to fit calls for two "external noise" terms.

      3. The title itself (as well as the text) indicates that the vague conclusions are to be applied only to the particular stimuli used: stimuli contained in a round envelope (with wedge- or windmill-shaped target areas).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2016 Jan 27, Lydia Maniatis commented:

      A little way into their introduction, the authors of this article make the following clear and unequivocal assertion:

      “These findings underscore the idea that the encoding of objects and shapes is accomplished by a hierarchical feedforward process along the ventral pathway (Cadieu et al., 2007; Serre, Kouh, Cadieu, Knoblich, & Kreiman, 2005; Serre, Oliva, & Poggio, 2007; Van Essen, Anderson, & Felleman, 1992). The question that arises is how local information detected in the early visual areas is integrated to encode more complex stimuli at subsequent stages.”

      As all vision scientists are aware, the processes involved at every level of vision are both hierarchical and parallel, feedforward and feedback. These processes do not consist of summing “local information” to produce more complex percepts; a small local stimulus change can reconfigure the entire percept, even if the remaining “local information” is unchanged. This has been solidly established, at both the perceptual and the neural level, by experiment and logical argument, over many decades. (The authors' use of the term “complex stimuli,” rather than “complex percepts,” is also misjudged, as all stimuli are simple in the sense that they stimulate individual retinal photoreceptors in the same simple way. Complexity arises as a result of processing; it is not a feature of the retinal (i.e. proximal) stimulus.)

      The inaccurate description of the visual process aligns with the authors' attempt to frame the problem of vision as a “summation” problem (using assumptions of signal detection theory), which, again, it decidedly is not. If the theoretical relevance of this study hinges on this inaccurate description, then it has no relevance. Even on its own terms, methodological problems render it without merit.

      In order to apply their paradigm, the authors have constructed an unnatural task, highly challenging because of unnatural conditions: very brief exposures that produce high levels of uncertainty by design, and thus many errors, together with unnaturally ambiguous stimuli. The task demands cut across detection, form perception, attention, and cognition (at the limit, where the subjects are instructed to guess, it is purely cognitive). (Such procedures may be common and old (“popular,” according to the authors), but this on its own doesn't lend them theoretical merit.)

      On this basis, the investigators generate a dataset reflecting declining performance in the ever more difficult task. The prediction of their particular model seems to be generic: in terms of the type of models the authors are comparing, the probability of success appears to be 50/50; either a particular exponent (“beta”) in their psychometric function will decline, or it will be flat. (In a personal communication, one of the authors notes that no alternative model would predict a rising beta.) The fitting is highly motivated and the criteria for success permissive. Half of the conditions produced non-significant results. Muscular and theory-neutral attempts to fit the data couldn't discover a value of “Q” to fit the model, so the authors “have chosen different values for each experiment,” ranging from 75 to 1,500. The data of one of five subjects were “extreme.” In addition, the results were “approximately half as strong as some previous reports,” but “It ... remains somewhat of a mystery as to why the threshold versus signal area slopes found here are shallower than in previous studies, and why there is no difference in our study between the thresholds for Glass patterns and Gabor textures.” In other words, it is not known whether such results are replicable, or what mysterious forces are responsible for this lack of replicability.
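      For readers unfamiliar with the prediction at issue, the generic signal-detection-theory account of why beta should decline with signal area can be sketched with a small Monte Carlo simulation. This is not the authors' model code; it is a standard MAX-rule 2AFC observer with illustrative parameter values, showing only that channel uncertainty (signal in one of many monitored channels) steepens the psychometric function relative to a signal that fills all channels.

```python
# Monte Carlo sketch of a 2AFC MAX-rule (signal detection theory) observer.
# Illustrates the generic probability-summation prediction: under channel
# uncertainty (signal in 1 of Q monitored channels) the psychometric
# function is steeper than when the signal fills all Q channels, i.e. the
# slope ("beta") declines with signal area. All parameter values are
# illustrative; this is not the authors' fitted model.
import numpy as np

rng = np.random.default_rng(0)

def pc_max_rule(dprime, n_signal, q_channels, trials=8000):
    """Proportion correct for a 2AFC MAX observer: n_signal of the
    q_channels channels in the signal interval have mean dprime; all
    channels in both intervals carry unit-variance Gaussian noise."""
    sig = rng.standard_normal((trials, q_channels))
    sig[:, :n_signal] += dprime
    noise = rng.standard_normal((trials, q_channels))
    return np.mean(sig.max(axis=1) > noise.max(axis=1))

def slope_near_threshold(n_signal, q_channels=50):
    """Crude slope of proportion correct vs. log d' near Pc = 0.75."""
    ds = np.logspace(-1, 1.5, 40)            # range of per-channel d'
    pcs = np.array([pc_max_rule(d, n_signal, q_channels) for d in ds])
    i = int(np.argmin(np.abs(pcs - 0.75)))   # point closest to threshold
    return (pcs[i + 1] - pcs[i - 1]) / (np.log(ds[i + 1]) - np.log(ds[i - 1]))

steep = slope_near_threshold(n_signal=1)     # signal in 1 of 50 channels
shallow = slope_near_threshold(n_signal=50)  # signal fills all channels
print(steep, shallow)                        # uncertain case is steeper
```

      The point of the sketch is the one made in the comment: the direction of the effect (steeper under uncertainty, shallower as signal area grows) falls out of the MAX rule almost regardless of parameter choices, which is what makes the prediction low-risk.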

      It is not clear (to me) how a rough fit, to a particular dataset generated from an unnaturally challenging task implicating multiple, complex, methodologically and theoretically undifferentiated visual processes, of a model that makes such general, low-risk predictions (predictions that can be virtually assured by atheoretical methodological choices) can elucidate questions of physiology or principle in the visual, or any, system.

      Finally, although the authors state as their goal to decide whether their model “could be rejected as a model of signal integration in Glass pattern and Glass-pattern-like textures” (does this mean they think there are special mechanisms for such patterns?), they do not claim to reject the only alternative that they compare (“linear summation”), only that “probability and not linear summation is the most likely basis for the detection of circular, orientation-defined textures.”

      It is not clear what the “most likely” term means here. Most likely that their hypothesis about the visual system is true (what is the hypothesis)? Most likely to have fit their data better than the alternative? (If we take their analysis at face value, then this is 100% true). Is there a critical experiment that could allow us to reject one or the other? If no alternatives can be rejected, then what is the point of such exercises? If some can be, what would be the theoretical implications? Is there a value in simply knowing that a particular method can produce datasets that can be fit (more or less) to a particular algorithm?
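      For contrast, the divergent threshold-versus-area predictions classically attributed to the two summation rules can be written down in a few lines. These are the generic textbook forms (Quick pooling for probability summation; square-root d' growth for linear summation against independent channel noise), with an illustrative Weibull slope of 3; they are not the specific SDT model fitted in the paper.

```python
# Generic threshold-vs-signal-area predictions for the two summation
# rules being compared. Textbook forms with illustrative beta = 3,
# not the paper's fitted SDT model.
import numpy as np

n = np.array([1.0, 2.0, 4.0, 8.0, 16.0])  # relative signal area
beta = 3.0                                 # illustrative Weibull slope

thr_prob = n ** (-1.0 / beta)  # probability summation: threshold ~ n^(-1/beta)
thr_lin = n ** (-0.5)          # linear summation: threshold ~ n^(-1/2)

# The log-log slopes are what such studies compare: -1/beta vs. -1/2.
slope_prob = np.polyfit(np.log(n), np.log(thr_prob), 1)[0]
slope_lin = np.polyfit(np.log(n), np.log(thr_lin), 1)[0]
print(slope_prob, slope_lin)   # ≈ -0.33 and -0.5
```

      A difference of this size in log-log slope is the kind of quantitative separation that would let one rule be rejected outright; the shallower-than-expected slopes the authors themselves report are what makes the “most likely” phrasing so equivocal.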

      The "summation" approach seen here is typical of an active and productive (in a manner of speaking) subdiscipline (e.g., Kingdom, F. A. A., Baldwin, A. S., & Schmidtmann, G. (2015). Modeling probability and additive summation for detection across multiple mechanisms under the assumptions of signal detection theory. Journal of Vision, 15(5):1, 1–16; Meese, T. S., & Summers, R. J. (2012). Theory and data for area summation of contrast with and without uncertainty: Evidence for a noisy energy model. Journal of Vision, 12(11):9, 1–28; Tyler, C. W., & Chen, C.-C. (2000). Signal detection theory in the 2AFC paradigm: Attention, channel uncertainty and probability summation. Vision Research, 40, 3121–3144).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
