- Jul 2018
-
europepmc.org
-
On 2016 Jun 22, Lydia Maniatis commented:
Reading the title of this article, one might get the idea that we learn the 3D structure of objects from 2D views. Looking at it a little more closely, we might experience some confusion as to the distinction being made between “structure” and “shape.” Isn't it a little tautological to say that grasping structure depends on grasping shape? And what do they mean by “format”?
In fact, the idea that we learn 3D structure from 2D views, which the authors want us to take for granted, is false. But isn't it the case that “a large body of research shows that viewpoint-invariant recognition is learned by viewing 2-D examples of an object (Bulthoff, Edelman, & Tarr, 1995; Hayward & Tarr, 1997; Tarr, Williams, Hayward & Gauthier, 1998)”? In fact, the references cited in no way support this bold claim. One of them is irrelevant, and the other two make claims that are highly qualified, tentative, and in need of corroboration. Given that nothing more definite seems to have developed in the two decades or so since these hopeful publications, the relevant body of research seems neither large nor solid. Another claim is also unsupported by its accompanying citation: “It is thought that during learning people use structural information in the 2-D image to infer the 3-D structure of the object (Nakayama, Shimojo, & Silverman, 1989).” In fact, the cited article addresses issues of figure-ground segmentation, not 3-D shape. Its text doesn't even include the terms 3-D, structure, or shape. So the authors are giving readers the false impression that their project is on empirically solid ground.
In fact, the claim that 3-D structure is learned from 2-D views has long been known to be false on logical and empirical grounds. In a more general sense, the idea that we either can or do learn to perceive in 3-D from 2-D experience is not defensible. No one has to learn to see 3-D objects; it happens automatically. “Newborns can recognize...the same rectangle placed at different slants” (http://nwkpsych.rutgers.edu/~alan/Gilchrist_NN_2003.pdf). Individual 2D views produce 3D percepts; these percepts may or may not be veridical, but knowledge does not affect the basic process. Each view entails its own necessary 3D perceptual response based on shape rules that assume characteristics such as convexity. View “x” will not produce a different shape percept even if we know the one we see is false, just as a 3D-looking flat image does not look flat even though we know it is. We can't learn to see a Necker cube as flat, no matter how many angles we inspect it from.
The idea that we learn 3-D structure also fails for logical reasons. How does a collection of 2D images, in effect a pack of flat cards, become a qualitatively different 3D percept? On what basis is this conversion made? Is each new sample, of the infinite number of possible samples, a new piece of information? Multiple 2D views don't resolve the fundamental ambiguity of the projection/3D percept relationship. The Ames window is not correctly seen despite a complete set of views around an axis. Shape ambiguity is resolved on the basis of rules of organization, not on the basis of collecting ever-new views which are themselves ambiguous. (Even when we know a shape is unchanging, the necessary link between individual views and 3D percepts can produce a changing percept, e.g. https://www.youtube.com/watch?v=jRqYkQz0G-g). Furthermore, how or when does the visual system draw the conclusion that a series of projections, such as are shown to subjects in the task chosen by these authors, are projections of the same single and unchanging 3D shape?
Don't the study's subjects exhibit learning of 3D structure from 2D views? If they did, it would be a kind of miracle, but the results don't require us to believe in miracles. Subjects' (rather poor) performance in no way requires that their exposure to a series of views of a very strangely-shaped object produce a consistent, learned, 3D percept. Memory for details can do the trick (a single sharper angle, for example, in one of two objects to be discriminated can allow them to eliminate it without having grasped the whole or even part of the 3D shape). And a lot of the time they are just guessing.
A final thing that should be clear to researchers with some knowledge of perceptual phenomena is that the results of this task are totally contingent on the shapes used. The authors here crudely divide objects into novel and familiar. But whether projections of a single, novel object consistently produce the same 3D percept (which, as we saw, applies to some objects viewed by infants) depends on the shape of the object. There is no absolute expectation of constancy. Different, more rational (as far as the expectations of the visual system) shapes would have produced better outcomes than the random ones used here; others may have produced worse. The numbers here are meaningless because the authors haven't analyzed the role of shape.
As for the finding with regard to “format,” it is well known that line drawings and chiaroscuro drawings can both produce good 3D shape percepts, and that silhouettes tend to look flat, lacking any indication of relief. If the results of this study had not borne this out, these facts would have remained intact.
In sum, the results of this study can be interpreted as consistent with a false (on the basis of logic and experiment) premise because the task was not designed to distinguish between this (im)possibility and more plausible (and theoretically uninteresting) alternatives.
It would be nice if people doing research in perception spent a little time actually learning the basics.
p.s. In their discussion, Tian et al. "suggest that stereo provides a behavioral benefit only if it resolves ambiguity in the interpretation of the 3-D structure of objects that cannot be resolved from other sources of information."
This is very similar to the titular claim of an earlier article by Pizlo, Li, & Steinman (2008): "Binocular disparity only comes into play when everything else fails; a finding with broader implications than one might suppose."
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
-