24 Matching Annotations
  1. Jul 2018
    1. On 2017 Aug 06, John Greenwood commented:

      (cross-posted from PubPeer; comment numbers refer to that discussion but the content is the same)

      To address your comments in reverse order -

      Spatial vision and spatial maps (Comment 19):

      We use the term “spatial vision” in the sense defined by Russell & Karen De Valois: “We consider spatial vision to encompass both the perception of the distribution of light across space and the perception of the location of visual objects within three-dimensional space. We thus include sections on depth perception, pattern vision, and more traditional topics such as acuity." De Valois, R. L., & De Valois, K. K. (1980). Spatial Vision. Annual Review of Psychology, 31(1), 309-341. doi:doi:10.1146/annurev.ps.31.020180.001521

      The idea of a "spatial map” refers to the representation of the visual field in cortical regions. There is extensive evidence that visual areas are organised retinotopically across the cortical surface, making them “maps". See e.g. Wandell, B. A., Dumoulin, S. O., & Brewer, A. A. (2007). Visual field maps in human cortex. Neuron, 56(2), 366-383.

      Measurement of lapse rates (Comments 4, 17, 18):

      There really is no issue here. In Experiment 1, we fit a psychometric function in the form of a cumulative Gaussian to responses plotted as a function of (e.g.) target-flanker separation (as in Fig. 1B), with three free parameters: midpoint, slope, and lapse rate. The lapse rate is 100-x, where x is the asymptote of the curve. It accounts for lapses (keypress errors etc.) when performance is otherwise high - i.e. it is independent of the chance level. In this dataset it is never above 5%. However, its inclusion does improve estimates of the slope (and therefore the threshold), which is what we are interested in. Individual differences are therefore better estimated by factoring out individual differences in lapse rate. Its removal does not qualitatively affect the pattern of results in any case. You cite Wichmann and Hill (2001), and that is indeed the basis of this three-parameter fit (though ours is custom code that doesn't apply the bootstrapping procedures etc. that they use).
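      As a concrete illustration, such a three-parameter fit can be sketched as follows (a minimal sketch using SciPy, not the authors' custom code; the chance-level parameterization and starting values are assumptions):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit

def psychometric(x, midpoint, slope, lapse, chance=0.5):
    """Cumulative Gaussian rising from `chance` to an asymptote of
    1 - lapse; `slope` is the SD of the underlying Gaussian."""
    return chance + (1.0 - chance - lapse) * norm.cdf(x, loc=midpoint, scale=slope)

def fit_psychometric(x, p_correct, chance=0.5):
    """Fit midpoint, slope, and lapse rate to proportion-correct data."""
    x = np.asarray(x, dtype=float)
    popt, _ = curve_fit(
        lambda x, m, s, l: psychometric(x, m, s, l, chance),
        x, p_correct,
        p0=[np.median(x), np.ptp(x) / 4.0, 0.02],
        bounds=([x.min(), 1e-6, 0.0], [x.max(), np.ptp(x), 0.2]),
    )
    return dict(midpoint=popt[0], slope=popt[1], lapse=popt[2])
```

      The threshold is then read off the fitted curve (e.g. the separation at which performance reaches a criterion level), with the lapse rate absorbing occasional keypress errors rather than distorting the slope estimate.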

      Spatial representations (comment 8):

      We were testing the proposal that crowding and saccadic preparation might depend on some degree of shared processes within the visual system. Specific predictions for shared vs distinct spatial representations are made on p E3574 and in more detail on p E3576 of our manuscript. The idea comes from several prior studies arguing for a link between the two, as we cite, e.g.: Nandy, A. S., & Tjan, B. S. (2012). Saccade-confounded image statistics explain visual crowding. Nature Neuroscience, 15(3), 463-469. Harrison, W. J., Mattingley, J. B., & Remington, R. W. (2013). Eye movement targets are released from visual crowding. The Journal of Neuroscience, 33(7), 2927-2933.

      Bisection (Comments 7, 13, 15):

      Your issue relates to biases in bisection. This is indeed an interesting area, mostly studied for foveal presentation. These biases are however small in relation to the size of thresholds for discrimination, particularly for the thresholds seen in peripheral vision where our measurements were made. An issue with bias for vertical judgements would lead to higher thresholds for vertical vs. horizontal judgements, which we don’t see. The predominant pattern in bisection thresholds (as with the other tasks) is a radial/tangential anisotropy, so vertical thresholds are worse than horizontal on the vertical meridian, but better than horizontal thresholds on the horizontal meridian. The role of biases in that anisotropy is an interesting question, but again these biases tend to be small relative to threshold.

      Vernier acuity (Comment 6):

      We don’t measure vernier acuity, for exactly the reasons you outline (stated on p E3577).

      Data analyses (comment 5):

      The measurement of crowding/interference zones follows conventions established by others, as we cite, e.g.: Pelli, D. G., Palomares, M., & Majaj, N. J. (2004). Crowding is unlike ordinary masking: Distinguishing feature integration from detection. Journal of Vision, 4(12), 1136-1169.

      Our analyses are certainly not post-hoc exercises in data mining. The logic is outlined at the end of the introduction for both studies (p E3574).

      Inclusion of the authors as subjects (Comment 3):

      In what way should this affect the results? This can certainly be an issue for studies where knowledge of the various conditions can bias outcomes. Here this is not true. We did of course check that data from the authors did not differ in any meaningful way from other subjects (aside from individual differences), and it did not. Testing (and training) experienced psychophysical observers takes time, and authors tend to be experienced psychophysical observers.

      The theoretical framework of our experiments (Comments 1 & 2):

      We make an assumption about hierarchical processing within the visual system, as we outline in the introduction, and we test predictions that arise from it. We don't deny that feedback connections exist, but I don't think their presence would alter the predictions outlined at the end of the introduction. We also make assumptions regarding the potential processing stages/sites underlying the various tasks examined. Of course we can't be certain about this (and psychophysics is indeed ill-suited to testing these assumptions), and that is the reason that no one task is linked to any specific neural locus - e.g. crowding shows neural correlates in visual areas V1-V4, as we state (e.g. p E3574). Considerable parts of the paper are then devoted to considering whether some tasks may be lower- or higher-level than others, and we outline a range of justifications for the arguments made. These are all testable assumptions, and it will be interesting to see how future work addresses them.

      All of these comments are really fixated on aspects of our theoretical background and minor details of the methods. None of this in any way negates our findings. Namely, there are distinct processes within the visual system, e.g. crowding and saccadic precision, that nonetheless show similarities in their pattern of variations across the visual field. We show several results that suggest these two processes to be dissociable (e.g. that the distribution of saccadic errors is identical for trials where crowded targets were correctly vs incorrectly identified). If they’re clearly dissociable tasks, how then to explain the correlation in their pattern of variation? We propose that these properties are inherited from earlier stages in the visual system. Future work can put this to the test.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2017 Jul 08, Lydia Maniatis commented:

      None


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    3. On 2017 Jul 07, Lydia Maniatis commented:

      I'm not an expert in statistics, but it seems to me that the authors have conducted multiple and sequential comparisons without applying the appropriate correction. In addition, the number of subjects is small.

      Also, the definition of key variables - "crowding zone" and "saccade error zone" - seems arbitrary given that they are supposed to tap into fundamental neural features of the brain. The former is defined as "target-flanker separation at which performance reached 80% correct [i.e. 20% incorrect]...which we take as the dimensions of the crowding zone," the latter by fitting "2D Gaussian functions to the landing errors and defin[ing] an ellipse with major and minor axes that captured 80% of the landing positions (shown with a black dashed line in Fig. 1C). The major and minor axes of this ellipse were taken as the radial and tangential dimensions of the “saccade error zone.”"

      What is the relationship between what the authors "take as" the crowding/saccade error zones and a presumptive objective definition? What is the theoretical significance of the 80% cut-off? What would the data look like if we used a 90% cut-off?
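      The quoted 80%-coverage ellipse for saccadic landing errors can be made concrete as follows (an illustrative sketch of the general approach described in the quote, not the authors' code; it assumes a Gaussian model in which the coverage level maps to a chi-square radius):

```python
import numpy as np
from scipy.stats import chi2

def error_zone_axes(landing_errors, coverage=0.80):
    """Fit a 2D Gaussian to N x 2 landing errors via the sample
    covariance and return the (major, minor) semi-axes of the ellipse
    expected to capture `coverage` of the landing positions."""
    cov = np.cov(np.asarray(landing_errors, dtype=float).T)
    eigvals = np.linalg.eigvalsh(cov)   # variances along the principal axes, ascending
    r2 = chi2.ppf(coverage, df=2)       # squared Mahalanobis radius for this coverage
    minor, major = np.sqrt(eigvals * r2)
    return major, minor
```

      Note that under this Gaussian model, moving the cut-off from 80% to 90% simply rescales both axes by sqrt(chi2.ppf(0.9, 2) / chi2.ppf(0.8, 2)), roughly a factor of 1.2: the absolute zone size changes, but the ratio of the radial to tangential dimensions does not.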

      Is a "finding" that a hierarchical linear regression "explains" 7.3% of the variance meaningful? The authors run two models, and in one saccades are a "significant predictor" of the data while in the other they are no longer significant, while gap resolution and bisection are. Conclusions seem to be based more on chance than necessity, so to speak.
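      The quantity at issue here - the variance a predictor explains over and above the others in a hierarchical regression - can be sketched as follows (an illustration of the statistic itself, not a reconstruction of the authors' models; the predictor names in the test are hypothetical):

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def incremental_r2(X_base, x_new, y):
    """Variance explained by x_new over and above the base predictors:
    the difference in R^2 between the full and the base model."""
    return r_squared(np.column_stack([X_base, x_new]), y) - r_squared(X_base, y)
```

      Whether a given increment (such as 7.3%) is meaningful is then a substantive question about effect size, separate from whether it passes a significance test in one model but not another.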


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    4. On 2017 Jul 07, Lydia Maniatis commented:

      What would it mean for "crowding and saccade errors" to rely on a common spatial representation of the visual field? The phenomena are clearly not identical - one involves motor planning, for example - and thus their neural substrates will not be identical. To the extent that "spatial map" refers to a neural substrate, then these will not be identical. So I'm not understanding the distinction being made between spatial maps "with inherited topological properties" and "distinct spatial maps."


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    5. On 2017 Jul 02, Lydia Maniatis commented:

      None


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    6. On 2017 Jul 02, Lydia Maniatis commented:

      Part 6

      With respect to line bisection:

      It is mentioned by Arnheim (Art and Visual Perception) that if you ask a person to bisect a vertical line under the best conditions - that is, conditions of free-viewing without time limits - they will tend to place the mark too high:

      "An experimental demonstration with regard to size is mentioned by Langfeld: "If one is asked to bisect a perpendicular line without measuring it, one almost invariably places the mark too high. If a line is actually bisected, it is with difficulty that one can convince oneself that the upper half is not longer than the lower half." This means that if one wants the two halves to look alike, one must make the upper half shorter. " (p. 30).

      As the authors of this study don't seem to have taken this apparent, systematic bias into account, their "correct" and "incorrect" criterion of line bisection under the adverse conditions they impose may not be appropriate. It is also obvious that the results of the method used did not alert the authors to the possibility of such a bias.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    7. On 2017 Jun 29, Lydia Maniatis commented:

      Part 5

      With respect to Vernier acuity, and in addition to my earlier objections, I would add that a "low-level" description seems to be at odds with the fact that Vernier acuity, which is also described as hyperacuity, is better than would be expected on the basis of the spacing of the receptors in the retina.

      "Yet spatial distinctions can be made on a finer scale still: misalignment of borders can be detected with a precision up to 10 times better than visual acuity. This hyperacuity, transcending by far the size limits set by the retinal 'pixels', depends on sophisticated information processing in the brain....[the] quintessential example and the one for which the word was initially coined,[1] is vernier acuity: alignment of two edges or lines can be judged with a precision five or ten times better than acuity. " (Wikipedia entry on hyperacuity).

      When an observer is asked a question about the alignment of two line segments, the answer they give is always based on the percept, i.e. a high-level, conscious product of visual processing. It is paradoxical to argue that some percepts are high- and others low-level, because even if one wanted to argue that some percepts reflect low-level activity, the decision to derive the percept or features thereof from a particular level in one case and another level in another case would have to be high-level. The perceived better-than-it-should-be performance that occurs in instances of so-called hyperacuity is effectively an inference, as are all interpretations of the retinal stimulation, whether a 3D Necker cube or the Mona Lisa. It is not always the case that two lines that are actually aligned will appear aligned. (Even a single continuous line may appear bent - yet line segments are supposed to be the V1 specialty.) It all depends on the structure of the whole retinal configuration, and the particular, high-level inferences to which this whole stimulation gives rise in perception.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    8. On 2017 Jun 29, Lydia Maniatis commented:

      None


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    9. On 2017 Jun 28, Lydia Maniatis commented:

      Part 3

      I don't understand why it has become normalized for observers in psychophysical experiments to include the authors of a study. Here, authors form one quarter of the participants in the first experiment and nearly one third in the second. Aside from the authors, participants are described as "naive." As this practice is accepted at PNAS, I can only imagine that psychophysical experiments require a mix of subjects who are naive to the purpose and subjects who are highly motivated to achieve a certain result. I only wish the reasons for the practice were made explicit. Because it seems to me that if it's too difficult to find enough naive participants for a study that requires them, then it's too difficult to do the study.

      If the inclusion of authors as subjects seems to taint the raw data, there is also a problem with the procedure to which the data are subjected prior to analysis. This essentially untestable, assumption-laden procedure is completely opaque, and mentioned fleetingly in the Methods:

      "Psychometric functions were fitted to behavioral data using a cumulative Gaussian with three parameters (midpoint, slope, and lapse rate). "

      The key term here is "lapse rate." The lapse rate concept is a controversial theoretical patch-up developed to deal with the coarseness of the methods adopted in psychophysics, specifically the use of forced choices. When subjects are forced to make a choice even when what they perceive doesn't fall into the two, three or four choices preordained by the experimenters, then they are forced to guess. The problem is serious because most psychophysical experiments are conducted under perceptually very poor conditions, such as low contrast and very brief stimulus presentations. This obviously corrupts the data. At some point, practitioners of the method decided they had to take into account this "lapse rate," i.e. the "guess rate." That the major uncertainty incorporated into the forced-choice methodology could not be satisfactorily resolved is illustrated in comments by Prins (2012/JOV), whose abstract I quote in full below:

      "In their influential paper, Wichmann and Hill (2001) have shown that the threshold and slope estimates of a psychometric function may be severely biased when it is assumed that the lapse rate equals zero but lapses do, in fact, occur. Based on a large number of simulated experiments, Wichmann and Hill claim that threshold and slope estimates are essentially unbiased when one allows the lapse rate to vary within a rectangular prior during the fitting procedure. Here, I replicate Wichmann and Hill's finding that significant bias in parameter estimates results when one assumes that the lapse rate equals zero but lapses do occur, but fail to replicate their finding that freeing the lapse rate eliminates this bias. Instead, I show that significant and systematic bias remains in both threshold and slope estimates even when one frees the lapse rate according to Wichmann and Hill's suggestion. I explain the mechanisms behind the bias and propose an alternative strategy to incorporate the lapse rate into psychometric function models, which does result in essentially unbiased parameter estimates."

      It should be obvious that calculating the rate at which subjects are forced to guess is highly condition-sensitive and subject-sensitive, and that even if one believes the uncertainty can be removed by a data manipulation, there can be no one-size-fits-all method. Which strategy for calculating the guessing rate have Greenwood et al (2017) adopted? Why? What was the "lapse rate"? There would seem to be no point in even looking at the data unless their data manipulation and its rationale are made explicit.
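      The bias that Wichmann and Hill (2001) and Prins (2012) debate can be demonstrated in a few lines (a sketch of the general phenomenon, not either paper's procedure; here the "data" are noise-free probabilities generated with a known 5% lapse rate):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit

def pf(x, midpoint, slope, lapse, chance=0.5):
    """Cumulative-Gaussian psychometric function with a lapse rate:
    performance rises from `chance` to an asymptote of 1 - lapse."""
    return chance + (1.0 - chance - lapse) * norm.cdf(x, midpoint, slope)

# Noise-free "data" generated with a true lapse rate of 5%
x = np.linspace(0.2, 3.0, 8)
p_obs = pf(x, 1.5, 0.4, 0.05)

# Fit with the lapse rate (wrongly) fixed at zero...
(fix_mid, fix_slope), _ = curve_fit(
    lambda x, m, s: pf(x, m, s, 0.0), x, p_obs, p0=[1.0, 0.5])

# ...and with the lapse rate as a free parameter.
(free_mid, free_slope, free_lapse), _ = curve_fit(
    lambda x, m, s, l: pf(x, m, s, l), x, p_obs,
    p0=[1.0, 0.5, 0.02], bounds=([0.0, 1e-6, 0.0], [3.0, 3.0, 0.2]))

# The zero-lapse fit inflates the spread parameter (a shallower, biased
# slope); freeing the lapse rate recovers the generating values here.
```

      Whether freeing the lapse rate removes the bias in noisy, finite data is exactly the point Prins disputes; the sketch only shows why fixing it at zero is biased when lapses occur.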


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    10. On 2017 Jun 27, Lydia Maniatis commented:

      Part 2b (due to size limit)

      I would note, finally, that unless the authors are also believers in a transparent brain for some, but not other, perceived features resulting from a retinal stimulation event, the claimed idiosyncratic/summative/inherited/low-level effects should presumably be detectable in a wide range of normal perceptual experiences - not only in peripheral vision, under conditions so poor that observers have to guess at a response some unknown proportion of the time, producing very noisy data interpreted in vague terms, with a large number of researcher degrees of freedom and a great deal of theoretical special pleading. Why not look for these hypothesized effects where they would be expected to be most clearly expressed?


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    11. On 2017 Jun 27, Lydia Maniatis commented:

      Part 2

      A related and equally untenable theoretical claim casually adopted by Greenwood et al (2017) is one which Teller (1984) has explicitly criticized and Graham (2011) has uncritically embraced and unintentionally satirized. It is the notion that some aspects of perceptual experience are directly related to - and can be used to discern - the behavior of neurons in the "lower levels" of the visual system, usually V1:

      "Prior studies have linked variations in both acuity (26) and perceived object size (59) with idiosyncrasies in visual cortical regions as early as V1."

      "To consider the origin of the relationship between crowding and saccades, we conducted a second experiment to compare crowding with two "lower-level" measures of spatial localization: gap resolution and bisection thresholds."

      "If these similarities were to arise due to an inheritance of a common topology from earlier stages of the visual system, we would expect to see similar patterns of variations in tasks that derive from lower-level processes."

      Before addressing the fatal flaws with the theoretical premise, I would like to note that the two references provided in no way rise to the occasion. Both are attempts to link some measures of task performance to area V1 based on fMRI results. fMRI is still a very crude method of studying neural function to begin with. Additionally, the interpretation of the scans is assumption-laden, and we are supposed to take all of the underlying assumptions as given, with no arguments or evidence. For example, from citation 26:

      "To describe the topology of a given observer's V1, we fit these fMRI activity maps with a template derived from a conformal mapping method developed by Schwartz (Schwartz 1980, Schwartz 1994). According to Schwartz, two-dimensional visual space can be projected onto the two-dimensional flattened cortex using the formula w=k x log(z + a), where z is a complex number representing a point in visual space, and w represents the corresponding point on the flattened cortex. [n.b. It is well-known that visual experience cannot be explained on a point by point basis]. The parameter a reflects the proportion of V1 devoted to the foveal representation, and the parameter k is an overall scaling factor."

      The 1994 Schwartz reference is to a book chapter, and the method being referenced appears to have been proposed in 1980 (pre-fMRI?). I guess we have to take it as given that it is valid.
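      For reference, the mapping quoted above is simple enough to state in code (the parameter values below are illustrative placeholders, not estimates from the cited study):

```python
import numpy as np

def schwartz_map(z, k=15.0, a=0.7):
    """Schwartz's log-polar map w = k * log(z + a): z is a point in the
    visual field (complex, in degrees; z = eccentricity * exp(1j * angle)),
    w the corresponding point on the flattened cortex (in mm).
    k and a here are placeholders, not fitted values."""
    return k * np.log(z + a)
```

      The derivative dw/dz = k / (z + a) shrinks with eccentricity, which is how the formula encodes cortical magnification: a degree of visual field near the fovea maps to far more cortex than a degree at 10 deg eccentricity.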

      From ref. 59:

      "For pRF spread we used the raw, unsmoothed pRF spread estimates produced by our fine-fitting procedure. However, the quantification of surface area requires a smooth gradient in the eccentricity map without any gaps in the map and with minimal position scatter in pRF positions. Therefore, we used the final smoothed parameter maps for this analysis. The results for pRF spread are very consistent when using smoothed parameter maps, but we reasoned that the unsmoothed data make fewer assumptions."

      One would ask that the assumptions be made explicit and rationalized. So, again, the references act as window-dressing for unwarranted assertions that the tasks used by the authors directly reflect V1 activity.

      The theoretical problem is that finding some correlation between some perceptual task and some empirical observations of the behavior of neurons in some part of the visual system in no way licenses the inference that the perceptual experience tapped by the task is a direct reflection of the activities of those particular neurons. Such correlations are easy to come by, but the inference is not tenable in principle. If the presumed response properties of neurons in V1, for example, are supposed to directly cause feature x of a percept, we have to ask: a. how is this assumption reconciled with the fact that the activities of the same "low-level" neurons underlie all features of the percept? and b. how is it that, for this feature, all of the other interconnectivities with other neural layers and populations are bypassed?

      Tolerance for the latter problem was dubbed by Teller (1984) the "nothing mucks it up proviso." As an example of the fallacious nature of such thinking, she refers to the Mach bands and their supposed connection to the responses of ganglion cells as observed via single cell recordings:

      "Under the right conditions, the physiological data "look like" the psychophysical data. The analogy is very appealing, but the question is, to what extent, or in what sense, do these results provide an explanation of why we see Mach bands?" (And how, I would add, is this presumed effect supposed to be expressed perceptually in response to all other patterns of retinal stimulation? How does it come about that the responses of ganglion cells are simultaneously shunted directly to perceptual experience, and at the same time participate in the normal course of events underlying the visual process as a whole?)

      Teller then points out that, in the absence of an explicit treatment of "the constraints that the hypothesis puts on models of the composite map from the peripheral neural level [in this she includes V1] and the bridge locus, and between the bridge locus and phenomenal states," the proposal is nothing more than a "remote homunculus theory," with a homunculus peering down at ganglion cell activity through "a magical Maxwellian telescope." The ganglion cell explanation continues to feature in perception textbooks and university perception course websites.

      It is interesting to note that Greenwood et al's first mention of "lower-level" effects (see quote above) is placed between scare quotes, yet nowhere do they qualify the term explicitly.

      The ease with which one can discover analogies between presumed neural behavior and psychophysical data was well-described by Graham (2011):

      "The simple multiple-analyzers model shown in the top panel of Fig. 1 was and is a very good account, qualitatively and quantitatively, of the results of psychophysical experiments using near-threshold contrasts. And by 1985 there were hundreds of published papers each typically with many such experiments. It was quite clear by that time, however, that area V1 was only one of 10 or more different areas in the cortex devoted to vision. ...The success of this simple multiple-analyzers model seemed almost magical therefore. [Like a magical Maxwellian telescope?] How could a model account for so many experimental results when it represented most areas of the visual cortex and the whole rest of the brain by a simple decision rule? One possible explanation of the magic is this: In response to near-threshold patterns, only a small proportion of the analyzers are being stimulated above their baseline. Perhaps this sparseness of information going upstream limits the kinds of processing that the higher levels can do, and limits them to being described by simple decision rules because such rules may be close to optimal given the sparseness. It is as if the near-threshold experiments made all higher levels of visual processing transparent, therefore allowing the properties of the low-level analyzers to be seen."

      Rather than challenging the "nothing mucks it up proviso" on logical and empirical grounds, Graham has uncritically and absurdly embraced it. (I would note that the reference to "near-threshold" refers only to a specific feature of the stimulation in question, not the stimulation as a whole, e.g. the computer screen on which stimuli are being flashed, which, of course, is above-threshold and stimulating the same neurons.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    12. On 2017 Jun 26, Lydia Maniatis commented:

      Over thirty years ago, Teller (1984) attempted to inspire a course correction in a field that had become much too reliant on very weak arguments and untested, often implausible assumptions. In other words, she tried to free it from practices appropriate to pseudoscience. Unfortunately, as Greenwood et al (2017) illustrates so beautifully, the field not only ignored her efforts, it became, if anything, even less rigorous.

      The crux of Teller's plea is captured by the passage below (emphasis mine). Her reference is to "linking propositions" which she defines as "statements that relate perceptual states to physiological states, and as such are one of the fundamental building blocks of visual science."

      "Twenty years ago, Brindley pointed out that in using linking hypotheses, visual scientists often introduce unacknowledged, non-rigorous steps into their arguments. Brindley's remarks correctly sensitized us to the lack of rigor **with which linking propositions have undoubtedly often been used, but led to few detailed, explicit discussions of linking propositions. It would seem useful to encourage such discussions, and to encourage visual scientists to make linking propositions explicit** so that linking propositions can be subjected to the requirements of consistency and the risks of falsification appropriate to the evaluation of all scientific [as opposed to pseudoscientific] propositions."

      Data itself tells us nothing; it must be interpreted. The interpretation of data requires a clear theoretical framework. One of the requirements of a valid theoretical framework is that its assumptions be a. consistent with each other; b. consistent with known facts; and c. testable, in the sense that it makes new predictions about potentially observable natural phenomena. ("Linking propositions" are effectively just another term for the link between data and theory, applied to a particular field). Theoretical claims, in other words, are not to be made arbitrarily and casually because they are key to the valid interpretation of data.

      The major theoretical premise of Greenwood et al (2017) is arbitrary and inconsistent with the facts as we know them and as we can infer them logically. The authors don't even try to provide supporting citations that are anything more than window-dressing. The premise is contained in the following two excerpted statements:

      "Given the hierarchical structure of the visual system, with inherited receptive field properties at each stage (35), variations in this topological representation could arise early in the visual system, with patterns specific to each individual that are inherited throughout later stages." (Introduction, p. E3574).

      "Given that the receptive fields at each stage in the visual system are likely built via the summation of inputs from the preceding stages (e.g. 58)..." (Discussion, p. E3580).

      The statements are false, so it is no surprise that neither of the references provided is anywhere near adequate to support what we are supposed to accept as "given."

      The first reference is to Hubel and Wiesel (1962), an early study recording from the striate cortex of the cat. Its theoretical conclusions are early, speculative, based on a narrow set of stimulus conditions, and apply to a species with rather different visual skills than humans. Even so, the paper does not support Greenwood et al's breezy claim; it includes statements that contradict both of the quoted assertions, e.g. (emphasis mine):

      "Receptive fields were termed complex when the response to light could not be predicted from the arrangements of excitatory and inhibitory regions. Such regions could generally not be demonstrated; when they could the laws of summation and mutual antagonism did not apply." (p. 151). Even the conclusions that may seem to apply are subject to a conceptual error noted by Teller (1984); the notion that a neuron is specialized to detect the stimulus (of the set selected for testing) to which it fires the fastest. (She likens this error to treating each retinal cone as a detector of wavelength to which it fires the fastest, or at all, when as we know the neural code for color is contingent on relative firing rates of all three cones).

      Well before Hubel and Wiesel, it had become abundantly clear that the link between retinal stimulation and perception could not remotely be described in terms of summative processes. (What receptive field properties have been inherited by the neurons whose activity is responsible for the perception of an edge in the absence of a luminance or spectral step? Or an amodal contour? or a double-layer? etc). Other than as a crude reflection of the fact that neurons are all interconnected in some way, the "inherited" story has no substance and no support.

      And of course, it is well-known that neural connections in the brain are so extraordinarily dynamic and complex - feedforward, feedback, feed-sideways, diagonally...even the effect of the feedforward component, so to speak, is contingent on the general system state at a given moment of stimulation...that to describe it as "hierarchical" is basically to mislead.

      The second supporting citation, to Felleman and Van Essen (1991), is also to a paper in which the relevant claims are presented in a speculative fashion.

      To be continued (in addition to further theoretical problems, the method and analysis - mostly post hoc - are also highly problematic).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

  2. Feb 2018
    1. On 2017 Jun 26, Lydia Maniatis commented:

      Over thirty years ago, Teller (1984) attempted to inspire a course correction in a field that had become much too reliant on very weak arguments and untested, often implausible assumptions. In other words, she tried to free it from practices appropriate to pseudoscience. Unfortunately, as Greenwood et al (2017) illustrates so beautifully, the field not only ignored her efforts, it became, if anything, even less rigorous.

      The crux of Teller's plea is captured by the passage below (emphasis mine). Her reference is to "linking propositions" which she defines as "statements that relate perceptual states to physiological states, and as such are one of the fundamental building blocks of visual science."

      "Twenty years ago, Brindley pointed out that in using linking hypotheses, visual scientists often introduce unacknowledged, non-rigorous steps into their arguments. Brindley's remarks correctly sensitized us to the lack of rigor ** with which linking propositions have undoubltedly often been used, but led to few detailed, explicit discussions of linking propositions. it would seem usefule to encourage such discussions, and to encourage visual scientists to make linking propositions explicit **so that linking propositions can be subjected to the requirements of consistency and the risks of falsification appropriate to the evaluation of all scientific [as opposed to pseudoscientific] propositions."

      Data itself tells us nothing; it must be interpreted. The interpretation of data requires a clear theoretical framework. One of the requirements of a valid theoretical framework is that its assumptions be a. consistent with each other; b. consistent with known facts; and c. testable, in the sense that it makes new predictions about potentially observable natural phenomena. ("Linking propositions" are effectively just another term for the link between data and theory, applied to a particular field). Theoretical claims, in other words, are not to be made arbitrarily and casually because they are key to the valid interpretation of data.

      The major theoretical premise of Greenwood et al (2017) is arbitrary and inconsistent with the facts as we know them and as we can infer them logically. The authors don't even try to provide supporting citations that are anything more than window-dressing. The premise is contained in the following two excerpted statements:

      "Given the hierarchical structure of th eviusal system, with inherited receptive field properties at each stage (35), variations in this topological representation could arise early in the viusal system, with pattenrs specific to each individual that are inherited throughout later stages." (Introduction, p. E3574).

      "Given that the receptive fields at each stage in the visual system are likely built via the summation of inputs from the preceding stages (e.g. 58)..." (Discussion, p. E3580).

      The statements are false, so it is no surprise that neither of the references provided is anywhere near adequate to support what we are supposed to accept as "given."

      The first reference is to Hubel and Wiesel (1962), an early study recording from the striate cortex of the cat. Its theoretical conclusions are early, speculative, based on a narrow set of stimulus conditions, and apply to a species with rather different visual skills than humans. Even so, the paper does not support Greenwood et al's breezy claim; it includes statements that contradict both of the quoted assertions, e.g. (emphasis mine):

      "Receptive fields were termed complex when the response to light could not be predicted from the arrangements of excitatory and inhibitory regions. Such regions could generally not be demonstrated; when they could the laws of summation and mutual antagonism did not apply." (p. 151). Even the conclusions that may seem to apply are subject to a conceptual error noted by Teller (1984); the notion that a neuron is specialized to detect the stimulus (of the set selected for testing) to which it fires the fastest. (She likens this error to treating each retinal cone as a detector of wavelength to which it fires the fastest, or at all, when as we know the neural code for color is contingent on relative firing rates of all three cones).

      Well before Hubel and Wiesel, it had become abundantly clear that the link between retinal stimulation and perception could not remotely be described in terms of summative processes. (What receptive field properties have been inherited by the neurons whose activity is responsible for the perception of an edge in the absence of a luminance or spectral step? Or an amodal contour? or a double-layer? etc). Other than as a crude reflection of the fact that neurons are all interconnected in some way, the "inherited" story has no substance and no support.

      And of course, it is well-known that neural connections in the brain are so extraordinarily dynamic and complex - feedforward, feedback, feed-sideways, diagonally...even the effect of the feedforward component, so to speak, is contingent on the general system state at a given moment of stimulation...that to describe it as "hierarchical" is basically to mislead.

      The second supporting citation, to Felleman and van Essen (1991) is also to a paper in which the relevant claims are presented in a speculative fashion.

      To be continued (in addition to additional theoretical problems, the method and analysis - mostly post hoc - is also highly problematic).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2017 Jun 27, Lydia Maniatis commented:

      Part 2 A related and equally untenable theoretical claim casually adopted by Greenwood et al (2017) is one which Teller (1984) has explicitly criticized and Graham (2011) has uncritically embraced and unintentionally satirized. It is the notion that some aspects of perceptual experience are directly related to - and can be used to discern - the behavior of neurons in the "lower levels" of the visual system, usually V1:

      "Prior studies have linked variations in both acuity (26) and perceived object size (59) with idiosyncrasies in visual cortical regions as early as V1."

      "To consider the origin of the relationship between crowding and saccades, we conducted a second experiment to compare crowding with two "lower-level" measures of spatial localization: gap resolution and bisection thresholds."

      "If these similarities were to arise due to an inheritance of a common topology from earlier stages of the visual system, we would expect to see similar patterns of variations in tasks that derive from lower-level processes."

      Before addressing the fatal flaws with the theoretical premise, I would like to note that the two references provided in no way rise to the occasion. Both are attempts to link some measures of task performance to area V1 based on fMRI results. fMRI is still a very crude method of studying neural function to begin with. Additionally, the interpretation of the scans is assumption-laden, and we are supposed to take all of the underlying assumptions as given, with no arguments or evidence. For example, from citation 26:

      "To describe the topology of a given observer's V1, we fit these fMRI activity maps with a template derived from a conformal mapping method developed by Schwartz (Schwartz 1980, Schwartz 1994). According to Schwartz, two-dimensional visual space can be projected onto the two-dimensional flattened cortex using the formula w=k x log(z + a), where z is a complex number representing a point in visual space, and w represents the corresponding point on the flattened cortex. [n.b. It is well-known that visual experience cannot be explained on a point by point basis]. The parameter a reflects the proportion of V1 devoted to the foveal representation, and the parameter k is an overall scaling factor."

      The 1994 Schwartz reference is to a book chapter, and the method being referenced appears to have been proposed in 1980 (pre-fMRI?). I guess we have to take it as given that it is valid.
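      For concreteness, the mapping being invoked is a complex logarithmic transform. A minimal sketch of it follows; the values of k and a are illustrative placeholders, not the parameters fitted in the cited study:

```python
import numpy as np

def schwartz_map(z, k=15.0, a=0.7):
    """Log-polar mapping from a visual-field position z (complex, in degrees)
    to a flattened-cortex position w (arbitrary mm units): w = k * log(z + a).
    k and a here are illustrative, not values from the cited study."""
    return k * np.log(z + a)

# Cortical magnification under this model: a 1-degree step near the fovea
# spans more cortex than the same step at 10 degrees eccentricity.
near = abs(schwartz_map(1.0 + 0j) - schwartz_map(0.0 + 0j))
far = abs(schwartz_map(11.0 + 0j) - schwartz_map(10.0 + 0j))
```

      The parameter a controls how much of the map is devoted to the fovea (near > far above); whether such a two-parameter template adequately describes an individual's V1 is, of course, precisely the assumption at issue.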

      From ref. 59:

      "For pRF spread we used the raw, unsmoothed pRF spread estimates produced by our fine-fitting procedure. However, the quantification of surface area requires a smooth gradient in the eccentricity map without any gaps in the map and with minimal position scatter in pRF positions. therefore, we used the final smoothed prameter maps for this analysis. The results for pRF spread are very consistent when using smoothed parameter maps, but we reasoned that the unsmoothed data make fewer assumptions."

      One would ask that the assumptions be made explicit and rationalized. So, again, references act as window-dressing for unwarranted assertions that the tasks used by the authors directly reflect V1 activity.

      The theoretical problem is that finding some correlation between some perceptual task and some empirical observations of the behavior of neurons in some part of the visual system in no way licenses the inference that the perceptual experience tapped by the task is a direct reflection of the activities of those particular neurons. Such correlations are easy to come by, but the inference is not tenable in principle. If the presumed response properties of neurons in V1, for example, are supposed to directly cause feature x of a percept, we have to ask: a. how is this assumption reconciled with the fact that the activities of the same "low-level" neurons underlie all features of the percept, and b. how is it that, for this feature, all of the other interconnectivities with other neural layers and populations are bypassed?

      Tolerance for the latter problem was dubbed by Teller (1984) the "nothing mucks it up proviso." As an example of the fallacious nature of such thinking, she refers to the Mach bands and their supposed connection to the responses of ganglion cells as observed via single cell recordings:

      "Under the right conditions, the physiological data "look like" the psychophysical data. The analogy is very appealing, but the question is, to what extent, or in what sense, do these results provide an explanation of why we see Mach bands?" (And how, I would add, is this presumed effect supposed to be expressed perceptually in response to all other patterns of retinal stmulation? How does it come about that the responses of ganglion cells are simultaneously shunted directly to perceptual experience, and at the same time participate in the normal course of events underlying visual process as a whole?)

      Teller then points out that, in the absence of an explicit treatment of "the constraints that the hypothesis puts on models of the composite map from the peripheral neural level [in this she includes V1] and the bridge locus, and between the bridge locus and phenomenal states," the proposal is nothing more than a "remote homunculus theory," with a homunculus peering down at ganglion cell activity through "a magical Maxwellian telescope." The ganglion cell explanation continues to feature in perception textbooks and university perception course websites.

      It is interesting to note that Greenwood et al's first mention of "lower-level" effects (see quote above) is placed between scare quotes, yet nowhere do they qualify the term explicitly.

      The ease with which one can discover analogies between presumed neural behavior and psychophysical data was well-described by Graham (2011):

      "The simple multiple-analyzers model shown in the top panel of Fig. 1 was and is a very good account, qualitatively and quantitatively, of the results of psychophysical experiments using near-threshold contrasts . And by 1985 there were hundreds of published papers each typically with many such experiments. It was quite clear by that time, however, that area V1 was only one of 10 or more different areas in the cortex devoted to vision. ...The success of this simple multiple-analyzers model seemed almost magical therefore. [Like a magical Maxwellian telescope?] How could a model account for so many experimental results when it represented most areas of the visual cortex and the whole rest of the brain by a simple decision rule? One possible explanation of the magic is this: In response to near-threshold patterns, only a small proportion of the analyzers are being stimulated above their baseline. Perhaps this sparseness of information going upstream limits the kinds of processing that the higher levels can do, and limits them to being described by simple decision rules because such rules may be close to optimal given the sparseness. It is as if the near-threshold experiments made all higher levels of visual processing transparent, therefore allowing the properties of the low-level analyzers to be seen." Rather than challenging the “nothing mucks it up proviso” on logical and empirical grounds, Graham has uncritically and absurdly embraced it. (I would note that the reference to "near-threshold" refers only to a specific feature of the stimulation in question, not the stimulation as a whole, e.g. the computer screen on which stimuli are being flashed, which, of course, is above-threshold and stimulating the same neurons.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    3. On 2017 Jun 27, Lydia Maniatis commented:

      Part 2b (due to size limit)

      I would note, finally, that unless the authors are also believers in a transparent brain for some, but not other, perceived features resulting from a retinal stimulation event, the idiosyncratic/summative/inherited/low-level effects claims should presumably be detectable in a wide range of normal perceptual experiences, not only in peripheral vision under conditions which are so poor that observers have to guess at a response some unknown proportion of the time, producing very noisy data interpreted in vague terms with a large number of researcher degrees of freedom and a great deal of theoretical special pleading. Why not look for these hypothesized effects where they would be expected to be most clearly expressed?


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    4. On 2017 Jun 28, Lydia Maniatis commented:

      Part 3

      I don't understand why it has become normalized for observers in psychophysical experiments to include the authors of a study. Here, authors form one quarter of the participants in the first experiment and nearly one third in the second. Aside from the authors, participants are described as "naive." As this practice is accepted at PNAS, I can only imagine that psychophysical experiments require a mix of subjects who are naive to the purpose and subjects who are highly motivated to achieve a certain result. I only wish the reasons for the practice were made explicit. Because it seems to me that if it's too difficult to find enough naive participants for a study that requires them, then it's too difficult to do the study.

      If the inclusion of authors as subjects seems to taint the raw data, there is also a problem with the procedure to which the data are subjected prior to analysis. This essentially untestable, assumption-laden procedure is completely opaque, and mentioned fleetingly in the Methods:

      "Psychometric functions were fitted to behavioral data using a cumulative Gaussian with three parameters (midpoint, slope, and lapse rate). "

      The key term here is "lapse rate." The lapse rate concept is a controversial theoretical patch-up developed to deal with the coarseness of the methods adopted in psychophysics, specifically the use of forced choices. When subjects are forced to make a choice even when what they perceive doesn't fall into the two, three or four choices preordained by the experimenters, then they are forced to guess. The problem is serious because most psychophysical experiments are conducted under perceptually very poor conditions, such as low contrast and very brief stimulus presentations. This obviously corrupts the data. At some point, practitioners of the method decided they had to take into account this "lapse rate," i.e. the "guess rate." That the major uncertainty incorporated into the forced-choice methodology could not be satisfactorily resolved is illustrated in comments by Prins (2012/JOV), whose abstract I quote in full below:

      "In their influential paper, Wichmann and Hill (2001) have shown that the threshold and slope estimates of a psychometric function may be severely biased when it is assumed that the lapse rate equals zero but lapses do, in fact, occur. Based on a large number of simulated experiments, Wichmann and Hill claim that threshold and slope estimates are essentially unbiased when one allows the lapse rate to vary within a rectangular prior during the fitting procedure. Here, I replicate Wichmann and Hill's finding that significant bias in parameter estimates results when one assumes that the lapse rate equals zero but lapses do occur, but fail to replicate their finding that freeing the lapse rate eliminates this bias. Instead, I show that significant and systematic bias remains in both threshold and slope estimates even when one frees the lapse rate according to Wichmann and Hill's suggestion. I explain the mechanisms behind the bias and propose an alternative strategy to incorporate the lapse rate into psychometric function models, which does result in essentially unbiased parameter estimates."

      It should be obvious that calculating the rate at which subjects are forced to guess is highly condition-sensitive and subject-sensitive, and that even if one believes the uncertainty can be removed by a data manipulation, there can be no one-size-fits-all method. Which strategy for calculating the guessing rate have Greenwood et al (2017) adopted? Why? What was the "lapse rate"? There would seem to be no point in even looking at the data unless their data manipulation and its rationale are made explicit.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    5. On 2017 Jun 29, Lydia Maniatis commented:

      None


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    6. On 2017 Jun 29, Lydia Maniatis commented:

      Part 5

      With respect to Vernier acuity, and in addition to my earlier objections, I would add that a "low-level" description seems to be at odds with the fact that Vernier acuity, which is also described as hyperacuity, is better than would be expected on the basis of the spacing of the receptors in the retina.

      "Yet spatial distinctions can be made on a finer scale still: misalignment of borders can be detected with a precision up to 10 times better than visual acuity. This hyperacuity, transcending by far the size limits set by the retinal 'pixels', depends on sophisticated information processing in the brain....[the] quintessential example and the one for which the word was initially coined,[1] is vernier acuity: alignment of two edges or lines can be judged with a precision five or ten times better than acuity. " (Wikipedia entry on hyperacuity).

      When an observer is asked a question about the alignment of two line segments, the answer they give is, always, based on the percept, i.e. a high-level, conscious product of visual processing. It is paradoxical to argue that some percepts are high-level and others low-level, because even if one wanted to argue that some percepts reflect low-level activity, the decision to derive the percept or features thereof from a particular level in one case and another level in another case would have to be high-level. The perceived better-than-it-should-be performance that occurs in instances of so-called hyperacuity is effectively an inference, as are all interpretations of the retinal stimulation, whether a 3D Necker cube or the Mona Lisa. It's not always the case that two lines that are actually aligned will appear aligned. (Even a single continuous line may appear bent - yet line segments are supposed to be the V1 specialty). It all depends on the structure of the whole retinal configuration, and the particular, high-level, inferences to which this whole stimulation gives rise in perception.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    7. On 2017 Jul 02, Lydia Maniatis commented:

      Part 6

      With respect to line bisection:

      It is mentioned by Arnheim (Art and Visual Perception) that if you ask a person to bisect a vertical line under the best conditions - that is, conditions of free-viewing without time limits - they will tend to place the mark too high:

      "An experimental demonstration with regard to size is mentioned by Langfeld: "If one is asked to bisect a perpendicular line without measuring it, one almost invariably places the mark too high. If a line is actually bisected, it is with difficulty that one can convince oneself that the upper half is not longer than the lower half." This means that if one wants the two halves to look alike, one must make the upper half shorter. " (p. 30).

      As the authors of this study don't seem to have taken this apparent, systematic bias into account, their "correct" and "incorrect" criterion of line bisection under the adverse conditions they impose may not be appropriate. It is also obvious that the results of the method used did not alert the authors to the possibility of such a bias.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    8. On 2017 Jul 02, Lydia Maniatis commented:

      None


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    9. On 2017 Jul 07, Lydia Maniatis commented:

      What would it mean for "crowding and saccade errors" to rely on a common spatial representation of the visual field? The phenomena are clearly not identical - one involves motor planning, for example - and thus their neural substrates will not be identical. To the extent that "spatial map" refers to a neural substrate, then these will not be identical. So I'm not understanding the distinction being made between spatial maps "with inherited topological properties" and "distinct spatial maps."


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    10. On 2017 Jul 07, Lydia Maniatis commented:

      I'm not an expert in statistics, but it seems to me that the authors have conducted multiple and sequential comparisons without applying the appropriate correction. In addition, the number of subjects is small.

      Also, the definition of key variables - "crowding zone" and "saccade error zone" - seems arbitrary given that they are supposed to tap into fundamental neural features of the brain. The former is defined as "target-flanker separation at which performance reached 80% correct [i.e. 20% incorrect]...which we take as the dimensions of the crowding zone," the latter by fitting "2D Gaussian functions to the landing errors and defin[ing] an ellipse with major and minor axes that captured 80% of the landing positions (shown with a black dashed line in Fig. 1C). The major and minor axes of this ellipse were taken as the radial and tangential dimensions of the “saccade error zone.”"

      What is the relationship between what the authors "take as" the crowding/saccade error zones and a presumptive objective definition? What is the theoretical significance of the 80% cut-off? What would the data look like if we used a 90% cut-off?
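      For what it is worth, under the 2D Gaussian model the choice of cut-off only rescales the fitted ellipse by a fixed factor. A short sketch of that standard chi-square calculation (my own illustration, not the authors' code):

```python
import numpy as np
from scipy.stats import chi2

# For a 2D Gaussian, the ellipse containing a fraction p of the mass has
# Mahalanobis radius sqrt(chi2.ppf(p, df=2)); both axes scale by this factor.
r80 = np.sqrt(chi2.ppf(0.80, df=2))  # ~1.79 standard deviations
r90 = np.sqrt(chi2.ppf(0.90, df=2))  # ~2.15 standard deviations
scale = r90 / r80  # a 90% cut-off enlarges every zone by the same factor
```

      So a 90% criterion would enlarge all zones by roughly 20%, leaving relative comparisons between conditions unchanged, but only on the assumption that the Gaussian fit is adequate everywhere.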

      Is a "finding" that a hierarchical linear regression "explains" 7.3% of the variance meaningful? The authors run two models, and in one saccades are a "significant predictor" of the data while in the other they are no longer significant, while gap resolution and bisection are. Conclusions seem to be based more on chance than necessity, so to speak.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    11. On 2017 Jul 08, Lydia Maniatis commented:

      None


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    12. On 2017 Aug 06, John Greenwood commented:

      (cross-posted from Pub Peer, comment numbers refer to that discussion but content is the same)

      To address your comments in reverse order -

      Spatial vision and spatial maps (Comment 19):

      We use the term “spatial vision” in the sense defined by Russell & Karen De Valois: “We consider spatial vision to encompass both the perception of the distribution of light across space and the perception of the location of visual objects within three-dimensional space. We thus include sections on depth perception, pattern vision, and more traditional topics such as acuity." De Valois, R. L., & De Valois, K. K. (1980). Spatial Vision. Annual Review of Psychology, 31(1), 309-341. doi:doi:10.1146/annurev.ps.31.020180.001521

      The idea of a "spatial map” refers to the representation of the visual field in cortical regions. There is extensive evidence that visual areas are organised retinotopically across the cortical surface, making them “maps". See e.g. Wandell, B. A., Dumoulin, S. O., & Brewer, A. A. (2007). Visual field maps in human cortex. Neuron, 56(2), 366-383.

      Measurement of lapse rates (Comments 4, 17, 18):

      There really is no issue here. In Experiment 1, we fit a psychometric function in the form of a cumulative Gaussian to responses plotted as a function of (e.g.) target-flanker separation (as in Fig. 1B), with three free parameters: midpoint, slope, and lapse rate. The lapse rate is 100 - x, where x is the asymptote of the curve (in percent). It accounts for lapses (keypress errors, etc.) when performance is otherwise high - i.e. it is independent of the chance level. In this dataset it is never above 5%. However, its inclusion does improve estimates of slope (and therefore threshold), which we are interested in. Any individual differences are therefore better estimated by factoring out individual differences in lapse rate. Its removal does not qualitatively affect the pattern of results in any case. You cite Wichmann and Hill (2001) and that is indeed the basis of this three-parameter fit (though ours is custom code that doesn’t apply the bootstrapping procedures etc. that they use).
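      As a concrete illustration of the three-parameter fit described above, here is a minimal sketch assuming a two-alternative task with a 50% chance level and synthetic data; this is not the authors' custom code:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

GAMMA = 0.5  # chance level for a two-alternative task (assumed here)

def psychometric(x, mu, sigma, lam):
    """Cumulative Gaussian with midpoint mu, slope 1/sigma, lapse rate lam.
    The upper asymptote is 1 - lam, independent of the chance level."""
    return GAMMA + (1.0 - GAMMA - lam) * norm.cdf(x, loc=mu, scale=sigma)

# synthetic proportion-correct data vs. target-flanker separation (deg)
seps = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 4.0])
pcorr = np.array([0.52, 0.60, 0.75, 0.88, 0.94, 0.95])

(mu, sigma, lam), _ = curve_fit(
    psychometric, seps, pcorr,
    p0=[1.5, 0.5, 0.02], bounds=([0, 0.01, 0], [10, 5, 0.06]))

# e.g. the separation at which performance reaches 80% correct, the
# criterion used for the "crowding zone" in the paper
sep80 = mu + sigma * norm.ppf((0.80 - GAMMA) / (1.0 - GAMMA - lam))
```

      With the lapse rate bounded near zero, as described above, dropping the lapse parameter shifts sep80 only slightly, which is the sense in which threshold estimates are robust to it.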

      Spatial representations (comment 8):

      We were testing the proposal that crowding and saccadic preparation might depend on some degree of shared processes within the visual system. Specific predictions for shared vs distinct spatial representations are made on p E3574 and in more detail on p E3576 of our manuscript. The idea comes from several prior studies arguing for a link between the two, as we cite, e.g.: Nandy, A. S., & Tjan, B. S. (2012). Saccade-confounded image statistics explain visual crowding. Nature Neuroscience, 15(3), 463-469. Harrison, W. J., Mattingley, J. B., & Remington, R. W. (2013). Eye movement targets are released from visual crowding. The Journal of Neuroscience, 33(7), 2927-2933.

      Bisection (Comments 7, 13, 15):

      Your issue relates to biases in bisection. This is indeed an interesting area, mostly studied for foveal presentation. These biases are however small in relation to the size of thresholds for discrimination, particularly for the thresholds seen in peripheral vision where our measurements were made. An issue with bias for vertical judgements would lead to higher thresholds for vertical vs. horizontal judgements, which we don’t see. The predominant pattern in bisection thresholds (as with the other tasks) is a radial/tangential anisotropy, so vertical thresholds are worse than horizontal on the vertical meridian, but better than horizontal thresholds on the horizontal meridian. The role of biases in that anisotropy is an interesting question, but again these biases tend to be small relative to threshold.

      Vernier acuity (Comment 6):

      We don’t measure vernier acuity, for exactly the reasons you outline (stated on p E3577).

      Data analyses (comment 5):

      The measurement of crowding/interference zones follows conventions established by others, as we cite, e.g.: Pelli, D. G., Palomares, M., & Majaj, N. J. (2004). Crowding is unlike ordinary masking: Distinguishing feature integration from detection. Journal of Vision, 4(12), 1136-1169.

      Our analyses are certainly not post-hoc exercises in data mining. The logic is outlined at the end of the introduction for both studies (p E3574).

      Inclusion of the authors as subjects (Comment 3):

      In what way should this affect the results? This can certainly be an issue for studies where knowledge of the various conditions can bias outcomes. Here this is not true. We did of course check that data from the authors did not differ in any meaningful way from other subjects (aside from individual differences), and it did not. Testing (and training) experienced psychophysical observers takes time, and authors tend to be experienced psychophysical observers.

      The theoretical framework of our experiments (Comments 1 & 2):

      We make an assumption about hierarchical processing within the visual system, as we outline in the introduction. We test predictions that arise from this. We don’t deny that feedback connections exist, but I don’t think their presence would alter the predictions outlined at the end of the introduction. We also make assumptions regarding the potential processing stages/sites underlying the various tasks examined. Of course we can’t be certain about this (and psychophysics is indeed ill-poised to test these assumptions) and that is the reason that no one task is linked to any specific neural locus, e.g. crowding shows neural correlates in visual areas V1-V4, as we state (e.g. p E3574). Considerable parts of the paper are then addressed at considering whether some tasks may be lower- or higher-level than others, and we outline a range of justifications for the arguments made. These are all testable assumptions, and it will be interesting to see how future work then addresses this.

      All of these comments are really fixated on aspects of our theoretical background and minor details of the methods. None of this in any way negates our findings. Namely, there are distinct processes within the visual system, e.g. crowding and saccadic precision, that nonetheless show similarities in their pattern of variations across the visual field. We show several results that suggest these two processes to be dissociable (e.g. that the distribution of saccadic errors is identical for trials where crowded targets were correctly vs incorrectly identified). If they’re clearly dissociable tasks, how then to explain the correlation in their pattern of variation? We propose that these properties are inherited from earlier stages in the visual system. Future work can put this to the test.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.