2 Matching Annotations
  1. Jul 2018
    1. On 2017 May 22, Lydia Maniatis commented:

      Part 1

      This publication is burdened with an unproductive theoretical approach as well as methodological problems (including intractable sampling problems). Its conclusions range from trivial to doubtful.

      Contemporary vision science seems determined to take the organization of retinal stimulation out of the picture and replace it with raw numbers, whether neural firing rates or statistics. This is a fundamental error. A statistical-heuristic strategy doesn’t work in any discipline, including physics. For example, a histogram of the relative heights of all the point masses in a particular patch of the world wouldn’t tell us anything about the mechanical properties of the objects in that scene, because it would not tell us about the distribution and cohesiveness of those masses. (Would it tell us anything of interest?)

      In perception, it is more than well established that the appearance of any point in the visual field (with respect to lightness, color, shape, etc.) depends intimately on the intensities and spectral compositions of the points in the surrounding (indeed the entire) field (specifically, on their effects on the retina), and on the principles of organization that the visual process effectively applies to that stimulation. Thus, a compilation of, for example, the spectral statistics of Purves’ colored cube would not allow us either to explain or to predict the appearance of colored illumination or transparent overlays. Or, rather, it wouldn’t allow us to predict these things unless we employed a very special sample of images, all of which produced such impressions of colored illumination. Then we might get a relatively weak correlation. This is because, within this sample, a preponderance of certain wavelengths would tend to correlate with, e.g., a yellow-illumination impression, rather than being due, as might be true in the general case, to the presence of a number of unified, apparently yellow and opaque surfaces. Thus we see how improper sampling can allow us to make better (and, I would add, predictable) predictions without implying explanatory power. In perception, explanatory power strictly requires that we take principles of organization into account.

      In contrast, the authors here take the statistics route. They want to show (or rather, they do not completely fail to corroborate) the observation that when surfaces are wet their colors look deeper and more vivid, and also to corroborate the fact that changes in perception are linked to changes in the retinal stimulation. Using a set of ready-made images (criteria for the selection of which are not provided), they apply to them a manipulation (among others) whose general effect is to increase the saturation of the perceived colors. One way to ascertain whether this manipulation causes a surface to appear wet would be simply to ask observers to describe the surface, without any clues as to what was expected. Would the surface spontaneously be described as “wet” or “moist”? This would be the more challenging test, but it is not the approach taken.

      Instead, observers are first trained on images (examples of which are not provided; I have requested them) that we are told appear very wet, along with their dry versions, and that include shape-based cues such as drops of water or puddles. They are told to use these as a guide to what counts as very wet, i.e., a rating of 5. They are then shown a series comprising both original and manipulated images (with more saturated colors, but lacking any shape-based cues) and asked to rate wetness from 1 to 5.
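
      (For concreteness, the general kind of saturation-increasing manipulation at issue might look something like the sketch below. This is only an illustration in HSV space, with an arbitrary boost factor; it is not the authors’ actual WET transformation, whose parameters are specified in the paper.)

      ```python
      # Minimal illustration (not the authors' WET transformation): a generic,
      # uniform saturation boost applied to an RGB image in HSV space.
      import numpy as np
      from skimage import color, io

      def boost_saturation(rgb_image, boost_factor=1.5):
          """Scale the saturation channel of a float RGB image (values in [0, 1])."""
          hsv = color.rgb2hsv(rgb_image)                      # channels: H, S, V
          hsv[..., 1] = np.clip(hsv[..., 1] * boost_factor, 0.0, 1.0)
          return color.hsv2rgb(hsv)

      # Hypothetical usage:
      # img = io.imread("surface.png") / 255.0               # float RGB in [0, 1]
      # more_saturated = boost_saturation(img, boost_factor=1.5)
      ```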

      The results are messy, with some transformed images getting higher ratings than the originals and others not, though on average the transformed images are rated more highly. But the ratings for all the images are relatively low, and we have to ask how the observers understood their task. Are they reporting an authentic perception of wetness or moistness, or do they believe they are supposed to guess at how wet a surface actually is, based on a rule of thumb adopted during the training phase, in which, presumably, the wet images were also more color-saturated? (In other words, is the task authentically perceptual, or is it more cognitive guesswork?) What does it mean to rate the wetness of a surface at, e.g., the “2” level?

      The cost of ignoring the factor of shape/structure is evident in the authors’ attempt to explain why the ratings for all images were so low, reaching 4 in only one case. They explain that this may be because their manipulation didn’t include areas that looked like drops or puddles. Does this mean that the presence of drops or puddles actually changes the appearance of the surrounding areas, and/or that those very different training images included other organized features that were overlooked and that affected perception? Did the training teach observers to apply, in practice, a cue that by itself produces somewhat different perceptual outcomes? I suppose we could ask the observers about their strategy, but this would muddy the facade of quantitative purity.

      At any rate, the manipulation (like most ad hoc assumptions) fails as a tool for prediction, leading the authors to acknowledge that “The image transformation greatly increased the wetness rating for some images but not for others…” (Again, it isn’t clear that the “wetness rating” corresponds to an authentically perceptual scale.) The relative success or failure of the transformation is therefore image-specific, and thus sample-specific; some samples and sample sets would very likely not reach statistical significance. The decision to investigate further (Experiment 1b) using, if I’m reading this correctly, only a single custom-made image that was not part of the original set (on what basis was this chosen?) therefore seems unwise. (This might seem to worsen the sampling problem, but the problem is intractable anyway. Since there is no possible sample that would allow the researchers to generate reliable statistics-based predictions for the individual case, any generalization would be instantly falsifiable, and would thus lack explanatory power.)

      The degree to which any conclusions are tied to the specific (and unrationalized) sample is illustrated by the fact that the technical manipulations were tailored to it (from Experiment 1a): “In deciding [the] parameters of the WET transformation, we preliminarily explored a range of parameters and chose ones that did not disturb the apparent naturalness of all the images used in Experiment 1a.” (Note the lack of objective criteria for “naturalness.”) (We’re not told on what basis the parameters in Experiment 1b were chosen.) In short, I don’t think this numbers game can tell us anything more, from a theoretical point of view, than casual observation and, e.g., artists’ trial and error already have.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2017 May 22, Lydia Maniatis commented:

      Part II

      The authors also generate a hypothesis via data-digging on the statistics of the images used in 1a. This hypothesis is that greater “hue entropy” is correlated with a larger wetness impression (whatever this may mean given the study’s methods). It is tested with a new set of artificial images made using third-party software, which are also obviously subject to sampling/confounding problems. The results of manipulating this variable were mixed, with one comparison non-significant, one significant only if we apply a relatively lax standard (p = 0.04), and one significant at the 0.005 level. So the results are inconclusive, even for this particular sample. The authors note, further, that “the effect of hue entropy cannot be explained by ecological optics” and rationalize their (ambiguous) results in the following very casual and logically incoherent manner:

      “Since there is a significant overlap in the distribution of color saturation between dry and wet samples…[t]he key to resolving these ambiguities is to increase the number of samples. When wet-related image features are shared by many different parts in the scene, the image features are likely to be produced by a global common factor, such as wetting. In other words, the more independent colors the scene contains, the more reliably the visual system can judge scene wetness.”

      I don’t see why a larger sample would necessarily be a more colorful sample. The authors are also suggesting that a larger patch of the visual scene will be more likely to receive a higher wetness score than a small patch, which seems very implausible. A Bayesian gloss of this explanation follows, complete with arbitrarily chosen “prior probabilities.” Such a mechanism would render the verisimilitude of human perception highly unreliable on a case-by-case basis, far more unreliable than it actually is. The fact is that the visual system doesn’t have to rely on weak probabilities for weakly correlated features when it has much more reliable structural principles to work with.
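
      (To make explicit the sort of argument the authors appear to be relying on, here is a toy version of independent-cue pooling, with made-up likelihoods and a made-up prior; the arbitrariness of exactly these numbers is part of the problem. Under an assumed conditional independence, even a weakly diagnostic cue, repeated across many scene regions, drives the posterior toward “wet.”)

      ```python
      # Toy sketch of independent-cue pooling (all numbers are made up):
      # under a conditional-independence assumption, n weakly diagnostic
      # "wet-like" cues push the posterior probability of "wet" toward 1.
      def posterior_wet(n_cues, p_cue_wet=0.6, p_cue_dry=0.5, prior_wet=0.5):
          """Posterior P(wet) after observing n_cues independent wet-like cues."""
          like_wet = p_cue_wet ** n_cues
          like_dry = p_cue_dry ** n_cues
          return like_wet * prior_wet / (like_wet * prior_wet + like_dry * (1 - prior_wet))

      # for n in (1, 5, 20):
      #     print(n, round(posterior_wet(n), 3))   # 0.545, 0.713, 0.975
      ```

      Whether anything like this pooling describes what observers actually do is, of course, exactly what is in question.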

      The description of the stimulus as “a natural texture” is not informative from an experimental point of view; the potential choices are infinitely variable.

      In the text, the authors use the term “color” as though it were an objective feature of the stimulus rather than a perceptual factor, which is confusing and should be avoided. (From Wikipedia: “As colorfulness, chroma and saturation are defined as attributes of perception they can not be physically measured as such.”)
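
      (What can actually be measured in such analyses is a statistic of the image’s pixel values, not perceived color. The “hue entropy” invoked earlier is presumably something along the lines of the rough sketch below, assuming Shannon entropy over a binned hue histogram; the authors’ exact definition may differ.)

      ```python
      # Rough sketch of a hue-entropy measure over image pixels (assumed:
      # Shannon entropy, in bits, of a binned hue histogram; the authors'
      # exact definition may differ).
      import numpy as np
      from skimage import color

      def hue_entropy(rgb_image, n_bins=36):
          """Shannon entropy (bits) of the hue distribution of a float RGB image."""
          hsv = color.rgb2hsv(rgb_image)
          hues = hsv[..., 0].ravel()                          # hue values in [0, 1]
          counts, _ = np.histogram(hues, bins=n_bins, range=(0.0, 1.0))
          p = counts / counts.sum()
          p = p[p > 0]                                        # drop empty bins
          return float(-(p * np.log2(p)).sum())
      ```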

      Bottom line: Statistical compilations divorced from reference to principles of organization lack explanatory and general predictive power, in perception as in every other discipline. They are not productive tools of scientific discovery.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
