  1. Jul 2018
    1. On 2016 Apr 19, Lydia Maniatis commented:

      I don't think it is ever explained why "noise" is added to the stimuli.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2016 Apr 19, Lydia Maniatis commented:

      The authors state that "the conclusions of this paper do not depend on our claim that Gabor "letters" are letters; it is enough that they are objects." This seems to imply some kind of generality of the conclusions, extending to "objects" generally. But later they seem unsure if the method can be extended beyond collections of Gabor patches: "It may be possible to extend our approach beyond Gabor letters to other stimuli, such as words, faces and scenes, whose features are unknown. If one assumes the separability found here, then it may be easy to factor out the efficiency of detecting."

      If the "features" of these other entities are unknown, then what is to be separated, and how? How might the "features" come to be known?

      What is clear is that the authors have not considered, and do not seem obliged to consider, whether in principle (and on the basis of what principle) their specific experimental conditions and results have any generality - i.e. theoretical significance for perception - at all. In this case, they have simply collected some data and crunched the numbers in some arbitrary (because theoretically ambiguous) fashion. Yet this is supposed to be a top journal, whose editors, one might assume, would take such factors into consideration.

      The fact is that the whole conversation is moot because perception demonstrably does not and cannot consist of an arbitrarily asserted "two-step process of feature detection and combination" even if the authors could explain what they mean by "feature."


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    3. On 2016 Apr 19, Lydia Maniatis commented:

      You refer to “detection” and “identification.” In vision science, people often refer to “perception.” Where does perception come in, in your scheme? It would seem that identification of any percept presupposes perceiving it in the first place. In addition, we are capable of seeing things that we cannot “identify.” A random blob, for instance. Given that you refer (however invalidly) to the primary visual areas (V1), it would seem that you are interested in the way percepts arise, i.e. perception. Perceiving an object obviously doesn't correspond to detection, which you describe, in effect, as perceiving any inhomogeneity in the surface on which the stimulus is presented, and again, identification is post-perception. So where in your model does object perception come in?

      Relatedly, your equations are founded on terms for contrast, but it seems inappropriate to model identification or recognition on contrast, since this is a post-perception act of comparison. You might say, in response, that your viewing conditions are so poor that observers need to guess, but what does guessing have to do with basic visual processes? And since the reliability of their guesses depends on an arbitrary selection of “letters” and, in particular, their frequencies, of what relevance is such reliability to understanding perception?

      Finally, are you aware that the perceptual emergence of parts of a stimulus is contingent on its structure? This was ascertained in experiments performed by Gottschaldt (1929) (in Ellis, A Source Book of Gestalt Psychology) using line drawings (and you state that you consider lines - “bars” - to be “features”). He showed that for particular figures, the order of emergence of parts with increasing illumination (analogous to increasing contrast here) was repeatable and structure-dependent. You take for granted that all of the parts of your individual Gabors are “detected” (or identified, I'm not sure which) simultaneously, but if this is the case it is because of their structure, which you don't consider.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    4. On 2016 Apr 19, Lydia Maniatis commented:

      I would like to pose a few questions to the authors of this paper.

      First, with respect to the meaning of the term “feature”

      In paragraph 1, the authors state that: “Identifying a letter requires two steps of visual processing: the observer first detects the letter's features and then combines them to recognize the letter (Pelli, Burns, Farrell & Moore-Page, 2006).”

      The insertion of the reference to Pelli et al (2006) seems to imply that this assertion has been corroborated by those authors, but it turns out that in fact, the study in question does not even clarify what it means by the term “features” nor does it employ “features” as independent variables.

      Specifically, Pelli et al. (2006) define features “as an image, or image component...” So they have merely equated “feature” with “image” or “part of an image.” (That this is pretty useless is confirmed by them when they say that “Rather than start with a given list of features (e.g., Gibson, 1969), we left the features unspecified.”) Substituting “image” for “feature” in Suchow and Pelli's (2013) title results in the somewhat nonsensical “Learning to detect and combine images (or parts of images) of an object.”

      In their second paragraph Suchow and Pelli (2013) decide to tackle the question themselves, asking “However, what is a feature?” They “narrowly define features as discrete components of an image that are detected independently of each other (Pelli et al 2006).” (It's not clear what the significance of the Pelli et al (2006) reference is here, given the vagueness described above.) The term “component” is wholly uninformative – and there is no indication of which “components of an image” tend to be “detected independently of each other.” So the authors don't seem to have answered their own question. There is nothing in their definition of the term "feature" to help the reader understand what they mean by it. Claims built on terms without an intelligible working definition are not testable in principle, and therefore are not scientific.

      In fact, the authors admit they don't know what they mean by the term “feature:” “To separate the [arbitrarily hypothesized] steps, we need to know the letters features; they are uncertain for traditional letters, so we use Gabor letters instead.” So now “feature” is equivalent to "Gabor". The title should read: “Learning to detect and combine the Gabors of an object.”

      The authors “suppose that our Gabors are features, detected independently (Watson (1979) Robson, Graham (1981).” So we're not even stating on principle that our Gabors are features, merely "supposing" that they are. Neither of the two references provided to support this “supposing” seems to refer to features.

      On what rationale are Gabors to be referred to as “features” rather than as “objects”?

      It seems as though the authors just want something they can label a “feature” without worrying what they mean by it. They state that the “juxtaposition of n Gabors creates an n-feature letter” but this claim is not supported by the two citations they provide; one deals with peripheral vision and the other uses structures made from Gabors and calls them features but offers no definitions or rationale.

      So I would ask the authors a. What is the basis for your claim, quoted above, that “Identifying a letter requires two steps of visual processing...” b. How do you (or do you) distinguish a “feature” from an image or an arbitrary part of an image, or from an “object”? c. What do you mean when you say that an (undefined) feature is “detected independently”?

      With respect to the phrase “learning to detect.”

      In normal circumstances, we don't need to learn to see the world around us. In your experiments, you're making it so difficult that people have to guess, and practice discerning particular forms when they are very faint or lack the homogeneous surface structure that our visual system relies on to segregate figure from ground. Even if it is possible to learn to harness expectation to achieve better quasi-guesses, why do you consider these difficult and unnatural conditions conducive to learning about the normal functioning of our visual system?

      Is the 75% cut-off point used to define contrast threshold derived from studies of the visual system? On what basis is it chosen? If it is arbitrary, then how can calculations based on it illuminate visual function?

      With respect to the use of “Gabors”

      You say that: “Gabors are fairly well matched to the receptive fields of simple cells in the primary visual cortex” though you offer no references to support this statement. Do you mean to imply that different figures, e.g. drawings of puppies, might be poorly-matched to V1 receptive fields? Would this impair our perception of them? Given that V1 neurons are presumably involved in all of our visual percepts, on what basis do you infer a “well-matched” vs “poorly-matched” dichotomy? Are you aware of Teller's (1984) arguments that the notion that particular stimuli tap into particular neurons or groups of neurons is highly problematic on logical and empirical grounds?


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
