2 Matching Annotations
  1. Jul 2018
    1. On 2016 Jan 14, Lydia Maniatis commented:

      Conceptual problems

      The authors argue that when the visual system gets things wrong, the problem is that the scene is “impoverished” and “unnatural.” Such terms are not specific enough to be evaluated scientifically. If we use the term “natural” to mean created by natural processes, then the stimuli used in this study were not natural. If the authors mean more than that, they need both to specify what they mean and to control experimentally for the relevant factors. Otherwise, the definition of natural becomes tautological – any scene that produces more or less veridical percepts is defined as “natural.” A non-tautological step forward requires examining cases where vision fails, discerning a potential distinction between the features of the cases where it succeeds and the cases where it fails, and testing the assumption that the distinguishing factors matter. Simply defining conditions where vision fails as “impoverished” and “unnatural” doesn't allow such a test (and/or yields claims that are easily falsified); it leads to confusion, since, for example, it would be difficult to argue that a photograph is an “impoverished” stimulus, or, again, that man-made objects are “natural.” If the authors are suggesting that all types of asymmetry compromise the accuracy of aspects of the 3D percept, then they need to justify the claim theoretically and/or frame it precisely enough that it can be tested. (I'm quite sure any clearly-framed symmetry claim can be easily falsified.)

      It is surprising, finally, that the authors claim not to understand the value of illusions in studying perception. The value lies in testing hypotheses vis-à-vis the principles underlying visual performance. For example, the pictorial perception of 3D shapes and the Ames room contradict Marr's 2.5D-sketch concept, i.e. the primacy of depth maps in the perception of the third dimension.

      The confusion as to the role of illusions is correlated with an epistemological confusion as to the use and logic of falsification. “It follows,” write the authors, “that with the types of models we are using, falsification is not your “best friend” as it often is elsewhere. In 3D vision, there is usually no model you can turn to, so there is nothing to falsify. Simply put, scientific discovery in 3D vision is not accomplished by using an ANOVA or Bayesian tests to reject some hypotheses. Discovery is accomplished by correctly guessing which cost function is actually being used by the visual system.” But to corroborate a guess (the assumptions incorporated in a model) we must test it, and the test may prove it wrong, i.e. may falsify it. This does not mean testing merely by using ANOVAs to show significance, or lack thereof, in cases where the outcome is foggy, or “Bayesian” tests (to show...whatever), but actually showing that our assumptions – e.g. that binocular vision is neither necessary nor sufficient for the perception of 3D shape – do, or do not, hold up to rigorous testing.

      Finally, it is not made clear which “Gestalt-like” constraints are being referred to. (Organisational principles are necessary for all perception; is something more specific meant?)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2016 Jan 14, Lydia Maniatis commented:

      This study does not seem to extend the reasonable (a rare thing) approach of the Pizlo group to 3D shape perception (a fundamental and much-neglected topic). The study addresses (a) the ability of humans to accurately perceive the proportions of “3D indoor scenes” and (b) the ability of a robot to reproduce such scenes using a computational model constructed (in part) by the investigators. However, while in the past the group was interested in discerning principles underlying human performance and using them as the basis for building computational models (which could then serve both the theoretical goal of testing the hypothesized principles and the practical goal of constructing a computer that can “see like us”), here there appears to be no theoretical connection between human and robot performance.

      The way the robot goes about the task is not the way humans can or do. It involves constructing a depth map (using a camera's inbuilt program) on the basis of binocular disparity and triangulation. Pizlo (2008) is clear on the fact that human 3D perception is not built on the basis of depth maps, or '2.5D sketches' exploiting binocular disparities: “There seems little reason to afford binocularity, disparity...a critical role in natural shape perception” (p. 176).
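
      To make concrete the kind of computation at issue: in a rectified stereo rig, depth is recovered from disparity as Z = fB/d (focal length times baseline over disparity). A minimal sketch, with hypothetical camera parameters rather than the paper's actual pipeline:

      ```python
      # Minimal sketch of depth-from-disparity triangulation under a
      # rectified pinhole stereo model. All parameters are hypothetical;
      # this is not the paper's actual pipeline.
      import numpy as np

      def depth_from_disparity(disparity_px, focal_px, baseline_m):
          """Z = f * B / d: depth is focal length times baseline over disparity."""
          d = np.asarray(disparity_px, dtype=float)
          depth = np.full_like(d, np.inf)   # zero disparity -> infinitely distant
          depth[d > 0] = focal_px * baseline_m / d[d > 0]
          return depth

      # e.g. a 700 px focal length, 12 cm baseline, 35 px disparity:
      print(depth_from_disparity([35.0], focal_px=700.0, baseline_m=0.12))  # [2.4] m
      ```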

      Conversely, the robot does not employ monocular principles – it cannot see with one eye. Yet in humans, monocular performance was not much worse than binocular (which is necessary but not sufficient for the best accuracy), and the former can override the latter (e.g. the zero disparity in viewing a photograph of a 3D scene or object doesn't prevent 3D percepts from arising).

      Thus, none of the (monocular) principles that the investigators have effectively applied in previous work on 3D shape perception were in play in this computational model (which, if I understand correctly, does little more than group points associated with objects, whose relative locations are given by the camera software). If the goal is to model human vision, then the present study has not taken any additional steps toward that goal. If the goal is to make robots that can navigate the environment, then non-human mechanisms – e.g. some kind of radar – might be more straightforward.
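
      As I read it, that grouping step amounts to distance-based clustering of the camera's 3D points into object groups. A naive sketch of that kind of step (my guess at the procedure; the threshold and method are hypothetical, not the model's actual grouping):

      ```python
      # Sketch of naive distance-threshold clustering of 3D points into
      # object groups (a guess at the kind of step involved; hypothetical
      # threshold, not the model's actual grouping procedure).
      import numpy as np

      def cluster_points(points, threshold=0.3):
          """Flood-fill clustering: a point joins a cluster if it lies within
          `threshold` metres of any point already in that cluster."""
          points = np.asarray(points, dtype=float)
          labels = -np.ones(len(points), dtype=int)
          next_label = 0
          for i in range(len(points)):
              if labels[i] != -1:
                  continue
              labels[i] = next_label
              stack = [i]
              while stack:              # expand the cluster over nearby points
                  j = stack.pop()
                  near = np.linalg.norm(points - points[j], axis=1) < threshold
                  for k in np.where(near & (labels == -1))[0]:
                      labels[k] = next_label
                      stack.append(k)
              next_label += 1
          return labels

      pts = np.array([[0, 0, 0], [0.1, 0, 0], [2, 2, 0], [2.1, 2, 0]])
      print(cluster_points(pts))  # -> [0 0 1 1]: two object groups
      ```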

      At any rate, the human and the robot sections of this paper are unconnected, theoretically and practically.

      Problems with methods

      Focussing strictly on the experiments with humans, we can note that the only goal the method allowed, in principle, was to confirm that under some conditions (mirror-symmetrical objects in a rectangular room with a floor perpendicular to the direction of gravity), human perception of the relative sizes and relative locations of objects is reasonably good. Performance was not great, with high within- and between-subject variability, and scale biases.

      It is not clear whether the problem was with perception or with the task, which involved producing a pictorial representation of the ground plan. For non-artists, copying from life to paper/tablet can be a difficult task, and flaws in the picture don't necessarily correspond to flaws in perception (meaning that the experiment might better have used artists or draftsmen). Memory also comes into play, as the relative distances in the picture are repeatedly, and serially, checked against those of the scene. It is also known that between-object/figure distances are less salient perceptually than within-object distances (see e.g. Arnheim, Art and Visual Perception, 1974, pp. 236-238. This book should be read by all vision scientists!)

      Given these difficulties, it's doubtful that the high degree of variability in performance found by the researchers corresponds to real variability in perception. All this doesn't matter much, though, because the investigators' criteria for veridicality were permissive: “The intra-trial Bias was removed by applying a uniform size scaling of the recovered scene in each trial.” Moreover, their definition of veridical corresponds narrowly to the absence of systematic, non-Euclidean geometrical distortion: “We can, therefore, conclude that our subjects’ visual space was not distorted, which means that they saw the 3D spatial arrangement of objects within this space veridically.” Given the apparent difficulty of the task and the unreliability of the data, it's not actually clear that such distortions would have been detectable. Still, the conclusion that in certain conditions we see the world more or less veridically should not come as a surprise to scientists, given our ability to navigate it and act on it effectively. (In the case of those metaphysicians, e.g. D. D. Hoffman, who do not concede realism to any degree, no evidence can undermine their position; see e.g. Maniatis, Perception, 2015, Vol. 44(10), 1149–1152.)
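
      To see how permissive the scaling criterion is: a least-squares uniform scale factor can always be fitted and divided out, so even a large global size error counts as perfectly “veridical.” A minimal sketch of the idea (my reconstruction, not the authors' analysis code):

      ```python
      # Sketch of removing a uniform size-scaling bias before comparing a
      # recovered layout to ground truth (a reconstruction of the idea, not
      # the authors' actual analysis code).
      import numpy as np

      def remove_uniform_scale(recovered, true):
          """Fit the least-squares scale s minimizing ||s*recovered - true||,
          i.e. s = <recovered, true> / <recovered, recovered>, and apply it."""
          recovered = np.asarray(recovered, dtype=float)
          true = np.asarray(true, dtype=float)
          s = np.sum(recovered * true) / np.sum(recovered * recovered)
          return s * recovered, s

      # Example: a layout recovered at 80% of true size is scored as perfect.
      true_pts = np.array([[0, 0], [2, 0], [2, 3]])
      rec_pts = 0.8 * true_pts
      rescaled, s = remove_uniform_scale(rec_pts, true_pts)
      print(s)                                # 1.25 undoes the 0.8 shrinkage
      print(np.allclose(rescaled, true_pts))  # True: the scale error is invisible
      ```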

      The fact that the objects in the room were mirror-symmetrical and the floor perpendicular to the direction of gravity only shows that such characteristics of a scene are not inconsistent with the degree of performance rated by the investigators as “veridical.” It does not show that performance would not be equivalent if asymmetrical objects were used, or if the floor were tilted or slanted, because there were no such control conditions. Indeed, one could argue that performance would be equivalent under ANY conditions. That claim would be incorrect, but the results of this experiment could not be used to contradict it. That is, the study does not actually exclude any possible outcomes/theoretical assumptions with respect to the features of a stimulus that enable veridical perception.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
