Theories of Visual Perception: Problems and Perspectives

 

Greek theories of visual perception

The Greeks had two clearly opposing views on the way visual perception works - intromission theories and extramission theories. Intromission theorists, such as Democritus (c. 425 B.C.) and Epicurus (342-270 B.C.), believed that objects cast off resemblances of themselves, called eidola, rather in the way that snakes cast off their skins. These eidola are captured by the eye, and it is their entry into the eye that allows us to see the shape of objects. As evidence, the intromission theorists pointed to the fact that objects can be seen mirrored in the cornea of an observer. However, this approach leaves several questions unanswered. How do eidola pass through one another without interference? How do eidola of large objects shrink to enter the eye? How do eidola from a single object reach many people simultaneously? Extramission theorists, such as Plato (c. 427-347 B.C.), believed that visual fire emanated from the eye and coalesced with light to form a conduit that allows "motions" of the object to pass to the sensorium. However, as Aristotle (384-322 B.C.) pointed out, it is unreasonable to think that a ray from the eye could reach as far as the stars.

 

These theories lack a modern understanding of physics and optics, but the idea that perception involves the presence of a copy of the object in the eye or brain survives in modern theories of template matching.

 

Johannes Kepler and the retinal image

Modern theories of vision start with Johannes Kepler, who in Ad Vitellionem paralipomena (1604) first correctly described the formation of the retinal image in the eye. A few years later Christoph Scheiner (1619) observed the retinal image by scraping away the sclera of the eye of an ox, which was placed in a hole in a shutter (reported by Descartes, 1637). However, there was a problem - the retinal image was upside down. Why do we not see the world upside down? The answer is that the retinal image is not itself observed. If there were a small man in the brain (a homunculus) looking at the retinal image, we would still need to explain how he sees the world, and so on in an infinite regress.

 

Kepler's theory of the retinal image is pivotal. Old problems are not solved; they are explained away, and new problems arise which still set the agenda today. Since the retinal image is two-dimensional, how do we see a three-dimensional world? How do we work out the real size of objects from their retinal size? How do we recognise that an object is the same when it is seen from different views? How can we see features that are not present in the retinal image?

 

Perspective ambiguity

Perspective drawing in art was developed by the fifteenth-century Italian artists and architects Brunelleschi and Alberti. A convenient way of thinking about perspective derives from Leonardo's window. This is a technique for perspective drawing in which the artist views a scene through a pane of glass from a fixed vantage point. The artist then simply copies onto canvas what he sees in the window. However, there are many possible three-dimensional scenes that can give rise to the same two-dimensional image.

 

This was forcefully brought home by Adelbert Ames's demonstrations in the 1940s. The Ames chair demonstration involves a collection of rods and shapes in 3D space, which looks like a chair from one vantage point only. The point of the demonstration is that the visual input to a single eye is ambiguous. We cannot know the true 3D layout of surfaces in a scene from a single viewpoint.
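
The ambiguity can be illustrated with a minimal sketch of perspective projection. It assumes an idealised pinhole eye at the origin looking along the z axis, with an image plane (Leonardo's window) at a hypothetical distance focal_length in front of it. The function names and the coordinates are illustrative assumptions, not part of the original demonstrations. Sliding any point along its own line of sight changes the 3D layout but leaves the image unchanged.

```python
def project(point, focal_length=1.0):
    """Perspective-project a 3D point (x, y, z) onto a window at z = focal_length."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the eye (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def slide_along_ray(point, k):
    """Move a point along its line of sight by scaling its coordinates by k > 0."""
    x, y, z = point
    return (k * x, k * y, k * z)


# A small square close to the eye (hypothetical coordinates).
near_square = [(-1.0, -1.0, 4.0), (1.0, -1.0, 4.0), (1.0, 1.0, 4.0), (-1.0, 1.0, 4.0)]

# A square four times larger and four times farther away.
far_square = [slide_along_ray(p, 4.0) for p in near_square]

# An Ames-style layout: each corner pushed to a different depth, so the
# corners no longer form a flat square in 3D.
scattered = [slide_along_ray(p, k) for p, k in zip(near_square, [1.0, 2.5, 0.5, 4.0])]

near_image = [project(p) for p in near_square]
far_image = [project(p) for p in far_square]
scattered_image = [project(p) for p in scattered]

# All three layouts produce the same image at the single vantage point.
print(near_image == far_image == scattered_image)
```

From the single vantage point the three layouts are indistinguishable; moving the eye sideways would give each a different image, which is why the Ames chair only looks like a chair from one position.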

 

Perceptual hypotheses

Constructivists such as Hermann von Helmholtz and Richard Gregory start from the position that the external world cannot be directly perceived because of the poverty of the information in the retinal images. Since information is not directly given, we have to interpret the sensory data in order to construct percepts. Images are interpreted on the basis of stored knowledge acquired through learning.

 

Helmholtz believed the visual system drew "unconscious inferences", which he later referred to as "inductive conclusions". Induction is the process of drawing a general conclusion from individual instances - if all the swans we ever see are white, we draw the conclusion that "all swans are white". This is the same process as is used in the formation of scientific hypotheses. Gregory takes this further and argues that perception is a collection of hypotheses about the world. Evidence for this view comes from the analysis of many visual illusions, which can be attributed to calibration errors (e.g. the tilt illusion), to misplaced assumptions (e.g. Kanizsa's triangle) and to the top-down influence of knowledge and expectation.

 

The ecological approach to perception

In the 1950s James Gibson challenged this view of visual processing. He referred to his theory as an ecological approach because, rather than emphasising the poverty of the retinal image, he emphasised the information available in the visual environment to an active observer. He believed that perception was direct, by which he meant that perception is not mediated by a process of inference and that percepts are not constructed from sensations. Gibson emphasised relations in the environment. Whereas the constructivists argue that size constancy requires us to scale the retinal image by the viewing distance, Gibson argues that we judge size in relation to the amount of background texture covered by the object. Motion of the observer gives rise to optic flow, which specifies how the observer is moving in relation to the environment. Theories of direct perception, however, do not provide very satisfactory explanations of visual illusions.
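
The two accounts of size constancy can be contrasted in a short sketch. It assumes a simple pinhole model in which image extent equals focal length times size divided by distance, and it assumes the background texture elements lie at the same distance as the object (for example, where the object meets the ground). All names and numbers are hypothetical and chosen only for illustration; neither Helmholtz nor Gibson expressed their ideas in this form.

```python
def image_extent(size, distance, focal_length=1.0):
    """Extent of the retinal image of a frontal object under a pinhole model."""
    return focal_length * size / distance


def size_by_distance_scaling(extent, distance, focal_length=1.0):
    """Constructivist-style estimate: scale the image extent by a known viewing distance."""
    return extent * distance / focal_length


def size_in_texture_units(object_extent, texture_extent):
    """Gibson-style estimate: how many texture elements the object covers in the image."""
    return object_extent / texture_extent


object_size = 0.5       # metres (hypothetical)
texture_spacing = 0.25  # metres per texture element (hypothetical)

results = []
for distance in (2.0, 4.0, 8.0):
    obj = image_extent(object_size, distance)
    tex = image_extent(texture_spacing, distance)
    results.append((
        obj,                                      # shrinks as distance grows
        size_by_distance_scaling(obj, distance),  # constant, but needs the distance
        size_in_texture_units(obj, tex),          # constant, and needs no distance
    ))

for row in results:
    print(row)
```

The image extent halves each time the distance doubles, yet both estimates stay fixed. The difference is in what each needs: the scaling estimate requires a separate value for the viewing distance, whereas the texture ratio is available in the image itself.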

 

The Gestalt school

Gestalt psychologists, such as Wertheimer, Koffka and Köhler, also rejected the structuralist idea that perceptions are constructed from sensations. They addressed the question "Why do things look as they do?" (Koffka). They noted the spontaneous tendency to split scenes into figure and ground. They also studied the rules by which material is grouped and segmented. The so-called laws of grouping include good continuation, proximity, symmetry, similarity and common fate. These laws may simply reflect the statistical regularities of the natural visual environment - similar patterns normally arise from the same surface. The core Gestalt idea, that the whole is greater than the sum of the parts, emphasises relations between parts. The melody of a tune is still recognisable even when it is played on different instruments. Köhler attempted to explain perception through neural isomorphism, i.e. what we see reflects isomorphic patterns in the brain. A good example of this kind of theorising is Köhler's explanation of phi motion. If two spatially separated lights flash on and off in sequence, one experiences continuous motion from the first position to the second. Köhler supposed that each flash sets up an electric field in the brain and that the interaction of these fields causes the perception of motion. Recently, there has been a resurgence of interest in the difficult problems of grouping, segmentation and perceptual constancy studied by the Gestalt school.
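
As a toy illustration of grouping by proximity, the sketch below splits a row of dots into groups wherever the gap between neighbours exceeds a threshold. The threshold rule, the function name and the dot positions are assumptions made for illustration; the Gestalt psychologists stated proximity as a qualitative law, not as this algorithm.

```python
def group_by_proximity(positions, max_gap):
    """Group 1D dot positions: neighbours closer than max_gap join the same group."""
    groups = []
    for p in sorted(positions):
        if groups and p - groups[-1][-1] <= max_gap:
            groups[-1].append(p)   # close to the previous dot: same group
        else:
            groups.append([p])     # large gap: start a new group
    return groups


# Eight dots in a row with two wide gaps (hypothetical positions).
dots = [0.0, 1.0, 2.0, 6.0, 7.0, 8.0, 12.0, 13.0]
groups = group_by_proximity(dots, max_gap=1.5)
print(groups)
```

With these positions the row is seen as three clusters rather than eight separate dots. The choice of threshold is arbitrary here, which hints at why grouping remains a difficult problem.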

 

The computational approach

The computational approach, illustrated well by the work of David Marr, aims to understand visual processes by building computer models of them. Vision is seen as the process of forming a description of what is in the scene from the retinal images. This process is sometimes referred to as inverse graphics. Starting from a description of the geometry of a scene, the reflectances of surfaces, the position of light sources and the position of a viewer, it is possible to construct a realistic image of the scene. The task of the visual system is to reverse this process and recover the properties of the scene that gave rise to the images on the retina. Computational vision aims to specify mathematically how this is done and to assign a functional role to the neural components involved in this computation.
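
A minimal sketch can show why the inverse direction is harder than the forward one. It assumes a single matte (Lambertian) surface patch whose image intensity is reflectance times light strength times the cosine of the angle of incidence. This shading model, the function name and the candidate values are assumptions chosen for illustration, not a model taken from Marr.

```python
import math


def render_intensity(reflectance, light_strength, incidence_deg):
    """Forward (graphics) direction: scene description in, image intensity out."""
    cos_incidence = max(0.0, math.cos(math.radians(incidence_deg)))
    return reflectance * light_strength * cos_incidence


# Forward problem: one scene gives exactly one intensity.
observed = render_intensity(reflectance=0.5, light_strength=1.0, incidence_deg=0.0)

# Inverse problem: search a small grid of hypothetical scenes for every
# combination of causes that reproduces the observed intensity.
candidates = [
    (reflectance, light, angle)
    for reflectance in (0.25, 0.5, 1.0)
    for light in (0.5, 1.0, 2.0)
    for angle in (0.0, 60.0)
]
explanations = [
    scene for scene in candidates
    if math.isclose(render_intensity(*scene), observed)
]

for scene in explanations:
    print(scene)
```

Several different scenes (a dark surface under strong light, a light surface under weak light, a tilted surface, and so on) account for the same intensity. Recovering a single description therefore needs further constraints or assumptions about the world, which is one way of restating the problems raised earlier in these notes.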

 

 

Reading:

Gordon, I.E. (1997) Theories of Visual Perception, John Wiley, Chichester.

Lindberg, D.C. (1976) Theories of Vision from Al-Kindi to Kepler, University of Chicago Press, Chicago.

 

 

Originally written by: 

Prof. Alan Johnston

Division of Psychology and Language Sciences

University College London

 

Prof. Johnston's original version is available at:

http://www.psychol.ucl.ac.uk/alan.johnston/Theories.html

 

The present version has been edited for typographical errors, and some of the hyperlinks have been modified.