4 Matching Annotations
  1. Jul 2018
    1. On 2017 Apr 24, Lydia Maniatis commented:

      This article’s casual approach to theory is evident in the first few sentences. After noting, irrelevantly, that “Since their introduction (Wilkinson, Wilson, & Habak, 1998), RF patterns have become a popular class of stimuli in vision science, commonly used to study various aspects of shape perception,” the authors immediately go on to say that “Theoretically, RF pattern detection (discrimination against a circle) could be realized either by local filters matched to the parts of the pattern, or by a global mechanism that integrates local parts operating on the scale of the entire pattern.” No citation is offered for this vague and breezy assertion, which raises a number of questions.

      1. How did we jump from “shape perception” to “RF detection against a circle”? How is the latter related to the former?

      2. Is the popularity of a pattern sufficient reason to assume that there exist special mechanisms – special detectors, or filters – tailored to its characteristics? Is there any basis whatsoever for this assertion?

      3. Given that we know that the whole does determine the parts perceived, why are we talking about integration of “local” elements? And how do we define local? Doesn’t a piece of a shape also consist of smaller pieces, etc? What is the criterion for designating part and whole in a stimulus pattern (as opposed to the fully-formed percept)?
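
      For reference, the stimulus class at issue is conventionally defined as a circle whose radius is modulated sinusoidally with polar angle, r(θ) = r0(1 + A sin(ωθ + φ)). The comment does not give this formula; the sketch below assumes that standard definition, and the function and parameter names are illustrative only. With A = 0 the contour is the comparison circle, so “detection against a circle” amounts to telling A > 0 from A = 0.

```python
import math


def rf_radius(theta, mean_radius=1.0, amplitude=0.0, frequency=5, phase=0.0):
    """Radius of an RF contour at polar angle theta (radians).

    Assumed standard form: r0 * (1 + A * sin(omega * theta + phi)).
    """
    return mean_radius * (1.0 + amplitude * math.sin(frequency * theta + phase))


def rf_contour(n_points=360, **params):
    """Sample the closed contour as a list of (x, y) points."""
    points = []
    for i in range(n_points):
        theta = 2.0 * math.pi * i / n_points
        r = rf_radius(theta, **params)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


# amplitude=0.0 gives the reference circle; amplitude>0 gives the RF pattern.
circle = rf_contour(amplitude=0.0)
pattern = rf_contour(amplitude=0.1, frequency=5)
```

      Nothing in this parameterization designates “local” parts or a “global” whole; those are interpretive terms layered on top of a single closed curve.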

      Apparently, there have been many ‘models’ proposed for special mechanisms for “RF detection against a circle,” addressing the question in these local/local-to-global terms. Could the mechanism involve maximum curvature integration, tangent orientations at inflection points, etc.? These proposals simply take for granted the underlying assumption that there are special “filters” for “RF discrimination against a circle.” The only question is which details of the figure these mechanisms are attuned to.

      What if we were dealing with different types of shapes? What if the RF boundary shape were formed by different-sized dots, or dashes, or rays of different lengths radiating from a center? Would we be talking about dot filters, or line-length filters? Why put RF patterns in general, and RF patterns of this type in particular, on such an explanatory pedestal?

      More critically, how is it possible to leverage such patterns to dissect the neural processes underlying perception? When I look at one of these patterns, I don’t have any trouble distinguishing it from a circle. What can this tell me about the underlying process?

      A subculture of vision science has opted to uncritically embrace the view that underlying processes can be inferred quite straightforwardly on the basis of certain procedures that mimic the general framework of signal detection. This view is labeled “signal detection theory” or SDT, but “theory” is overstating it. As noted in my earlier comment, Schmidtmann and Kingdom (2017) never explain why they make what, to a naïve observer, must seem very arbitrary methodological choices, nor does their main reference, Wilkinson, Wilson and Habak (1998). So we have to go back further to find some suggestion of a rationale.

      The founding fathers of the aforementioned subculture include Swets, Tanner and Birdsall (e.g. 1961). As may be seen from a quote from that article (below), the framing of the problem is artificial; major assumptions are adopted wholesale; “perception” is casually converted to “detection” (in order to fit the analogy of a radar observer attempting to guess which blip is the object of interest).

      “In the fundamental detection problem, an observation is made of events occurring in a fixed interval of time and a decision is made, based on this observation, whether the interval contained only the background interference or a signal as well. The interference, which is random, we shall refer to as noise and denote as N; the other alternative we shall term signal plus noise, SN. In the fundamental problem, only these two alternatives exist…We shall, in the following, use the term observation to refer to the sensory datum on which the decision is based. We assume that this observation may be represented as varying continuously along a single dimension…it may be helpful to think of the observation as…the number of impulses arriving at a given point in the cortex within a given time.” Also: “We imagine the process of signal detection to be a choice between Gaussian variables….The particular decision that is made depends on whether or not the observation exceeds a criterion value….This description of the detection process is an almost direct translation of the theory of statistical decision.”
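
      The framework quoted above is usually formalized as the equal-variance Gaussian model: noise (N) and signal-plus-noise (SN) observations are unit-variance Gaussians along a single dimension, separated by d′, and the observer responds “signal” whenever the observation exceeds a criterion. What follows is a minimal sketch of that textbook model, not code from Swets et al. or from the paper under discussion; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def yes_no_rates(d_prime, criterion):
    """Hit and false-alarm rates for a fixed criterion.

    Assumes N ~ Normal(0, 1) and SN ~ Normal(d_prime, 1); the observer
    says "signal" when the observation exceeds `criterion`.
    """
    noise = NormalDist(mu=0.0, sigma=1.0)
    signal_plus_noise = NormalDist(mu=d_prime, sigma=1.0)
    hit_rate = 1.0 - signal_plus_noise.cdf(criterion)
    false_alarm_rate = 1.0 - noise.cdf(criterion)
    return hit_rate, false_alarm_rate


def d_prime_from_rates(hit_rate, false_alarm_rate):
    """Recover d' from observed rates: z(hit) - z(false alarm)."""
    return STD_NORMAL.inv_cdf(hit_rate) - STD_NORMAL.inv_cdf(false_alarm_rate)


# Example: separation of 1 standard deviation, criterion midway between means.
hits, false_alarms = yes_no_rates(d_prime=1.0, criterion=0.5)
recovered = d_prime_from_rates(hits, false_alarms)
```

      Every quantity here (the single dimension, the Gaussian shape, the fixed criterion) is an assumption of the model rather than something measured, which is the point at issue in the paragraphs that follow.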

      In what sense does the above framework relate to visual perception? I think we can easily show that, in concept and application, it is wholly incoherent and irrational.

      I submit, first, that when I look around me, I don’t see any noise, I just see things. I’m also not conscious of looking for a signal to compare to noise; I just see whatever comes up. I don’t have a criterion for spotting what I don’t know will come up, and I don’t feel uncertain of - I certainly hardly ever have to guess at – what I’m seeing. The very effortlessness of perception is what made it so difficult to discern the fundamental theoretical problems. This is not, of course, to say that what the visual system does in constructing the visual percept from the retinal stimulation isn’t guesswork; but the actual process is light years more complex and subtle than a clumsy and artificial “signal detection” framework.

      Given the psychological certainty of normal perceptual experience, it’s hard to see how to apply this SDT framework. The key seems to be to make conditions of observation so poor as to impede normal perception, making the observer so unsure of what they saw or didn’t see that they must be forced to choose a response, i.e. to guess. One way to degrade viewing conditions is to make the image of interest very low contrast, so that it is barely discernible; another way is to flash it for very brief intervals. Now, in these presentations, the observer presumably sees something; so these manipulations don’t necessarily produce an uncertain perceptual situation (though the brevity of the presentation may make the recollection of that impression mnemonically challenging). Where the uncertainty comes in is in the demand by investigators that observers decide whether the impression is consistent with a quick, degraded glimpse of a particular figure, in this case an RF of a certain type or a circle. I don’t see how one can defend the notion put forth by Swets et al. (1961) that this decision, which is more a conscious, cognitive one than a spontaneous perceptual one, is based on a continuously varying criterion. The decision, for example, may be based on a glimpse of one diagnostic feature or another, or on where, by chance, the fovea happens to fall in the 180ms (Schmidtmann and Kingdom, 2017) or 167ms (Wilkinson et al., 1998) interval allowed. But the forced noisiness (due to the poor conditions), the Gaussian presumptions, the continuous variable assumption, and the binary forced choice outputs are needed for the SDT framework to be laid on top of the data.

      For the rest of this comment (cut short here by the comment size limit), please see PubPeer.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2017 Apr 23, Lydia Maniatis commented:

      It is oddly difficult to explain why a particular publication has no scientific content, even when, here, this is unequivocally the case. I think it’s important to try and make this quite clear.

      Before addressing the serious theoretical problems, I would like to make the easier points that show that, even on its own terms, the project is sloppy and unsuccessful.

      According to the authors, whatever it is they are proposing is “physiologically unrealistic” (p. 24). Yet they continue on to say: “Nonetheless, the model presented here will hopefully serve as a basis for developing a more physiological model of LF and RF detection.” There is no rationale offered to underpin this inarticulate hope, which seems even more misplaced given that “there is a modest, systematic mismatch between the [unrealistic] model and the data,” despite the very permissive (three free parameters, the post hoc introduction of a corrective function) model-fitting. That the modeling is strictly post hoc and ad hoc in character is reflected in the following statements: “The CFSF model presented here does not predict the inevitable increase in thresholds at frequencies higher than those explored in the present study. To do so would require CFSF with a somewhat different shape to the one shown in Figure 4…However, because we do not have the requisite data showing an upturn in thresholds at very high frequencies, we have not incorporated this feature into our present model.” (p. 24). We are dealing with atheoretical, condition/data-specific post hoc model-fitting with no heuristic value.

      There is also a lack of methodological care in the procedure. As is usual in papers of this type, the number of observers is very small, and they are not all naïve (here, three of the four are). One is apparently an author (GS). If naïveté doesn’t matter, then why mention it, and if it does, why the author participation? Also, while we’re given quite detailed descriptions of many aspects of the stimuli per se – details whose theoretical basis or relevance is unclear – we’re only told that the “monitor’s background was initially set to a mean luminance (grey).” The reference to “grey” is uninformative with respect to actual luminance. The monitor is part of the stimulus. (I don’t understand the reference to “initially.” Maybe I’m missing something.) The following statement also seems strangely casual and vague: “Observers usually completed two experimental blocks for each experimental conditions…” Usually?

      As for this: "The cross-sectional luminance profile was defined by a Gaussian with a standard deviation of 0.05 deg" -- it's just a part of the culture, no explanation needed.

      And then this - in the context of trying to rationalize differences between the present results and those of previous studies: “In addition to the reported data, we conducted a control experiment to measure detection thresholds for RF and LF patterns with a modulation frequency of 30 for two additional naïve observers. Results show that thresholds are no higher than for a modulation frequency of 20.” Why are we discussing unreported data? Why wasn’t this control experiment reported in the body of the paper?

      Experimental stimuli were exposed for 180ms, with a 400ms ISI. Why not 500ms, with a 900ms ISI? Or something else? 180ms is very short, when we consider the time it takes to initiate a saccade. Was this taken into consideration? Does it matter? In general, on what theoretical basis were conditions selected? Would changing some or all change the results? What would it mean with respect to theory? Is the model so narrowly applicable that it extends only to these specific and apparently arbitrary conditions? If changing conditions would lead to different results, and to different post hoc models, and if the authors can’t predict and assign a theoretical meaning to these different possible outcomes, then it should be clear that the model has no explanatory status, that it is merely an ad hoc mathematical exercise.

      The idea that binary forced choices, with their necessary loss of information, are a good idea is mind-boggling, compounded by the arbitrariness of defining “thresholds” based on a 75% correct rate. Why not 99%? (As I'll discuss later, the SDT rationale is wholly inappropriate here.) Why wouldn’t vision scientists be interested in what observers are actually seeing, instead of lumping together who knows what impressions experienced under extremely suboptimal conditions? (The reason for this SDT-related, unfortunate indifference to perception by vision scientists will be discussed in a following comment.) Generating data in the required form seems more important than understanding what natural phenomena it reflects and explains, if any. Relatedly, I would note that it is indispensable to the evaluation of any visual perception study for the actual stimuli to be presented for interested readers’ inspection. I have asked the authors for access to these stimuli but haven’t yet received a response.
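
      For readers unfamiliar with the convention being questioned, the 75% figure can be located within the SDT framework as follows. The sketch assumes the textbook equal-variance Gaussian model of a two-alternative forced-choice task, in which proportion correct equals Φ(d′/√2). It is not the authors' fitting procedure, which is not described here, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def d_prime_for_2afc(proportion_correct):
    """d' implied by a given 2AFC proportion correct.

    Assumes equal-variance Gaussian SDT, where
    proportion_correct = Phi(d' / sqrt(2)).
    """
    return sqrt(2.0) * NormalDist().inv_cdf(proportion_correct)


# Chance performance (50%) maps to d' = 0; the 75% convention and a
# stricter 99% level are just two points on the same continuous curve.
at_chance = d_prime_for_2afc(0.50)
at_75 = d_prime_for_2afc(0.75)
at_99 = d_prime_for_2afc(0.99)
```

      Under these assumptions 75% correct corresponds to d′ of roughly 0.95, halfway between chance and perfect performance in percent-correct terms; nothing in the model itself singles out that level over any other.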

      But these are minor problems. The fundamental problem is that the authors have implicitly and explicitly adopted assumptions of visual system function that are never tested and are demonstrably lacking in face validity. (In a nutshell we are talking about the major fallacy of treating perception as a signal detection problem and neurons as "detectors.") In other words, even if the assumptions are false, the experiments premised on them are not designed to reveal this. (Yet, not only do existing facts and logical analysis falsify the premises, it would be easy to design similar experiments within the same framework that would falsify or render its arbitrariness evident, as I'll discuss in my second comment). Rather, data generated are simply assumed to reflect the claimed mechanisms, and loosely, with the help of lots of free parameters and ad hoc manipulations, are perpetually interpreted (via model-fitting) in these terms, with tweaks and excuses for every new and slightly different data set that comes along.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
