4.3 Vision II: Recognizing What We Perceive

We correctly combine features into unified objects; so, for example, we see that the young man is wearing a red shirt and the young woman is wearing a yellow shirt.
FUSE/PUNCHSTOCK

Our journey into the visual system has already revealed how it accomplishes some pretty astonishing feats. But the system needs to do much more in order for us to be able to interact effectively with our visual worlds. Let’s now consider how the system links together individual visual features into whole objects, allows us to recognize what those objects are, organizes objects into visual scenes, and detects motion and change in those scenes. Along the way we’ll see that studying visual errors and illusions provides key insights into how these processes work.

Attention: The “Glue” That Binds Individual Features into a Whole

Specialized feature detectors in different parts of the visual system analyze each of the multiple features of a visible object: orientation, color, size, shape, and so forth. But how are different features combined into single, unified objects? What allows us to perceive so easily and correctly that the young man in the photo is wearing a red shirt and the young woman is wearing a yellow shirt? Why don’t we see free-floating patches of red and yellow, or even incorrect combinations, such as the young man wearing a yellow shirt and the young woman wearing a red shirt? These questions refer to what researchers call the binding problem in perception: how features are linked together so that we see unified objects in our visual world rather than free-floating or miscombined features (Treisman, 1998, 2006).


Illusory Conjunctions: Perceptual Mistakes

Figure 4.16: Illusory Conjunctions Illusory conjunctions occur when features such as color and shape are combined incorrectly. For example, when participants are shown a red A and a blue X, they sometimes report seeing a blue A and a red X. Other kinds of errors, such as a misreported letter (e.g., reporting T when no T was presented) or a misreported color (reporting green when no green was presented), occur rarely, indicating that illusory conjunctions are not the result of guessing (based on Robertson, 2003).

In everyday life, we correctly combine features into unified objects so automatically and effortlessly that it may be difficult to appreciate that binding is ever a problem at all. However, researchers have discovered errors in binding that reveal important clues about how the process works. One such error is known as an illusory conjunction, a perceptual mistake in which features from multiple objects are incorrectly combined. In a pioneering study of illusory conjunctions, Anne Treisman and Hilary Schmidt (1982) briefly showed study participants visual displays in which black digits flanked colored letters, then instructed them first to report the black digits and second to describe the colored letters. Participants frequently reported illusory conjunctions, claiming to have seen, for example, a blue A or a red X instead of the red A and the blue X that had actually been shown (see FIGURE 4.14a and b). These illusory conjunctions were not just the result of guessing; they occurred more frequently than other kinds of errors, such as reporting a letter or color that was not present in the display (see FIGURE 4.14c). Illusory conjunctions looked real to the participants, who were just as confident they had seen them as they were about the actual colored letters they perceived correctly.

How does the study of illusory conjunctions help us understand the role of attention in feature binding?

Why do illusory conjunctions occur? Treisman and her colleagues have tried to explain them by proposing a feature-integration theory (Treisman, 1998, 2006; Treisman & Gelade, 1980; Treisman & Schmidt, 1982), which holds that focused attention is not required to detect the individual features that make up a stimulus, such as the color, shape, size, and location of letters, but is required to bind those individual features together. From this perspective, attention provides the “glue” necessary to bind features together, and illusory conjunctions occur when it is difficult for participants to pay full attention to the features that need to be glued together. For example, in the experiments we just considered, participants were required to process the digits that flanked the colored letters, thereby reducing attention to the letters and allowing illusory conjunctions to occur. When experimental conditions are changed so that participants can pay full attention to the colored letters, they are able to bind the features together correctly, and illusory conjunctions disappear (Treisman, 1998; Treisman & Schmidt, 1982).
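
The logic of the theory can be made concrete with a minimal toy simulation in Python (a sketch of our own, not a model from Treisman’s work; the misbinding probability of 0.3 is invented for illustration). Note that feature detection always succeeds in the sketch, consistent with the finding that misreported letters and colors are rare; only the binding step can fail when attention is divided.

```python
import random

# Toy sketch of feature-integration theory (illustrative only; the
# misbinding probability is invented, not estimated from data).
# Each object on the display is a (letter, color) pair.

def report_display(objects, p_misbind):
    """Return perceived (letter, color) pairs; binding can fail."""
    letters = [letter for letter, _ in objects]
    colors = [color for _, color in objects]
    # Features themselves are always detected (no attention needed) ...
    if random.random() < p_misbind:
        colors.reverse()  # ... but without attention, binding can miscombine them
    return list(zip(letters, colors))

display = [("A", "red"), ("X", "blue")]
trials = 10_000

for attention, p_misbind in [("full", 0.0), ("divided", 0.3)]:
    errors = sum(report_display(display, p_misbind) != display
                 for _ in range(trials))
    print(f"{attention} attention: {errors / trials:.0%} illusory conjunctions")
```

With full attention the display is always reported correctly; with divided attention the letters keep their identities but sometimes trade colors, which is exactly the pattern participants reported.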

The Role of the Parietal Lobe

The binding process makes use of feature information processed by structures within the ventral visual stream, the “what” pathway (Seymour et al., 2010; see FIGURE 4.12). But because binding involves linking together features processed in distinct parts of the ventral stream at a particular spatial location, it also depends critically on the parietal lobe in the dorsal stream, the “where” pathway (Robertson, 1999). For example, Treisman and others studied R.M., who had suffered strokes that destroyed both his left and right parietal lobes. Although many aspects of his visual function were intact, he had severe problems attending to spatially distinct objects. When presented with stimuli such as those in Figure 4.14, R.M. perceived an abnormally large number of illusory conjunctions, even when he was given as long as 10 seconds to look at the displays (Friedman-Hill, Robertson, & Treisman, 1995; Robertson, 2003). More recent studies of persons with similar brain injuries suggest that damage to the upper and posterior portions of the parietal lobe is likely to produce similar problems (Braet & Humphreys, 2009; McCrea, Buxbaum, & Coslett, 2006). These same parietal regions are activated in healthy individuals when they perform the kind of visual feature binding that persons with parietal lobe damage are unable to perform (Shafritz, Gore, & Marois, 2002), as well as when they search for conjunctions of features (Corbetta et al., 1995; Donner et al., 2002).


Recognizing Objects by Sight

Take a quick look at the letters in the accompanying illustration. Even though they’re quite different from one another, you probably effortlessly recognized them as all being examples of the letter G. Now consider the same kind of demonstration using your best friend’s face. Suppose one day your friend gets a dramatic new haircut, or adds glasses, hair dye, or a nose ring. Even though your friend now looks strikingly different, you still recognize that person with ease. Just as with the variable Gs, you somehow extract the underlying features of the face that allow you to identify your friend accurately.

A quick glance and you recognize all these letters as G, but their varying sizes, shapes, angles, and orientations ought to make this recognition task difficult. What is it about the process of object recognition that allows us to perform this task effortlessly?

This thought exercise may seem trivial, but it’s no small perceptual feat. If the visual system were somehow stumped each time a minor variation occurred in an object being perceived, the inefficiency of it all would be overwhelming. We’d have to process information effortfully just to perceive our friend as the same person from one meeting to another, not to mention laboring through the process of knowing when a G is really a G. In general, though, object recognition proceeds fairly smoothly, in large part due to the operation of the feature detectors we discussed earlier.

Our visual systems allow us to identify people as the same individual even when they change such features as hair style and skin color. Despite the extreme changes in these two photos, you can probably tell that they both portray Johnny Depp.
WIREIMAGE/GETTY IMAGES
WARNER BROS./THE KOBAL COLLECTION/ART RESOURCE

How do feature detectors help the visual system get from a spatial array of light hitting the eye to the accurate perception of an object in different circumstances, such as your friend’s face? Some researchers argue for a modular view: Specialized brain areas, or modules, detect and represent faces or houses or even body parts. Using fMRI to examine visual processing in healthy young adults, researchers found a subregion in the temporal lobe that responds most strongly to faces compared to just about any other object category, whereas a nearby area responds most strongly to buildings and landscapes (Kanwisher, McDermott, & Chun, 1997). This view suggests we not only have feature detectors to aid in visual perception but also “face detectors,” “building detectors,” and possibly other types of neurons specialized for particular types of object perception (Downing et al., 2006; Kanwisher & Yovel, 2006). Other researchers argue for a more distributed representation of object categories. In this view, it is the pattern of activity across multiple brain regions that identifies any viewed object, including faces (Haxby et al., 2001). Each view explains some data better than the other, and researchers continue to debate their relative merits.


Another perspective on this issue is provided by experiments designed to measure precisely where seizures originate; these experiments have also provided insights into how single neurons in the human brain respond to objects and faces (Suthana & Fried, 2012). For example, in a study by Quiroga et al. (2005), electrodes were placed in the temporal lobes of people who suffer from epilepsy. The volunteers were then shown photographs of faces and objects as the researchers recorded their neural responses. The researchers found that some neurons in the temporal lobe responded to specific objects or people across changes in viewing angle, clothing, and facial expression. In some cases, the neurons also responded to the words for these same objects. For example, a neuron that responded to photographs of the Sydney Opera House also responded when the words Sydney Opera were displayed, but not when the words Eiffel Tower were displayed.

How do we recognize our friends, even when they’re hidden behind sunglasses?

Taken together, these experiments demonstrate the principle of perceptual constancy: Even as aspects of sensory signals change, perception remains consistent. Think back once again to our discussion of difference thresholds early in this chapter. Our perceptual systems are sensitive to relative differences in changing stimulation and make allowances for varying sensory input. This general principle helps explain why you still recognize your friend despite changes in hair color or style or the addition of facial jewelry. It’s not as though your visual perceptual system responds to a change with, “Here’s a new and unfamiliar face to perceive.” Rather, it’s as though it responds with, “Interesting…here’s a deviation from the way this face usually looks.” Perception is sensitive to changes in stimuli, but perceptual constancies allow us to notice the differences in the first place.

Principles of Perceptual Organization

Before object recognition can even kick in, the visual system must perform another important task: grouping the image regions that belong together into a representation of an object. The idea that we tend to perceive a unified, whole object rather than a collection of separate parts is the foundation of Gestalt psychology, which you read about in Chapter 1. Gestalt principles characterize many aspects of human perception. Among the foremost are the Gestalt perceptual grouping rules, such as simplicity, closure, and continuity, which govern how the features and regions of things fit together (Koffka, 1935).

Separating Figure from Ground

Figure 4.16: Ambiguous Edges Here’s how Rubin’s classic reversible figure–ground illusion works: Fixate your eyes on the center of the image and your perception will alternate between a vase and facing silhouettes, even as the sensory stimulation remains constant.

Perceptual grouping is a powerful aid to our ability to recognize objects by sight. Grouping involves visually separating an object from its surroundings. In Gestalt terms, this means identifying a figure apart from the (back)ground in which it resides. For example, the words on this page are perceived as figural: They stand out from the ground of the sheet of paper on which they’re printed. Similarly, your instructor is perceived as the figure against the backdrop of all the other elements in your classroom. You certainly can perceive these elements differently, of course: The words and the paper are all part of a thing called a page, and your instructor and the classroom can all be perceived as your learning environment. Typically, though, our perceptual systems focus attention on some objects as distinct from their environments.

Size provides one clue to what’s figure and what’s ground: Smaller regions are likely to be figures, such as tiny letters on a big sheet of paper. Movement also helps: Your instructor is (we hope) a dynamic lecturer, moving around in a static environment. Another critical step toward object recognition is edge assignment. Given an edge, or boundary, between figure and ground, which region does that edge belong to? If the edge belongs to the figure, it helps define the object’s shape, and the background continues behind the edge. Sometimes, though, it’s not easy to tell which is which.

Edgar Rubin (1886–1951), a Danish psychologist, capitalized on this ambiguity and developed a famous illusion called the Rubin vase or, more generally, a reversible figure–ground relationship. You can view this “face–vase” illusion in FIGURE 4.16 in two ways: either as a vase on a black background or as a pair of silhouettes facing each other. Your visual system settles on one or the other interpretation and fluctuates between them every few seconds. This happens because the edge that would normally separate figure from ground is really part of neither: It defines the contours of the vase just as much as it defines the contours of the faces. fMRI evidence shows, quite nicely, that when people see the Rubin image as faces, there is greater activity in the face-selective region of the temporal lobe we discussed earlier than when they see it as a vase (Hasson et al., 2001).


Theories of Object Recognition

What is an important difference between template-based and parts-based theories of object recognition?

Researchers have proposed two broad explanations of object recognition: image-based (or template) theories, in which we recognize an object by matching it as a whole against stored representations of objects we have seen before, and parts-based theories, in which we recognize an object by first analyzing its component parts.

Each set of theories has strengths and weaknesses, making object recognition an active area of study in psychology. Researchers are developing hybrid theories that attempt to exploit the strengths of each approach (Peissig & Tarr, 2007).

Perceiving Depth and Size

Objects in the world are arranged in three dimensions—length, width, and depth—but the retinal image contains only two dimensions, length and width. How does the brain process a flat, 2-D retinal image so that we perceive the depth of an object and how far away it is? The answer lies in a collection of depth cues that change as you move through space. Monocular, binocular, and motion-based depth cues all help visual perception (Howard, 2002).

Monocular Depth Cues

Monocular depth cues are aspects of a scene that yield information about depth when viewed with only one eye. These cues rely on the relationship between distance and size. Even with one eye closed, the retinal image of an object you’re focused on grows smaller as that object moves farther away, and larger as it moves closer. Our brains routinely use these differences in retinal image size, or relative size, to perceive distance.
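
The geometry behind this cue can be made explicit with the standard visual-angle formula (a sketch we have added; the formula is basic trigonometry, and the 1.8 m height and the distances are illustrative values, not figures from the chapter):

```python
import math

# The retinal image of an object shrinks as the object moves away.
# Visual angle = 2 * atan(size / (2 * distance)) -- standard geometry.

def visual_angle_deg(size_m, distance_m):
    """Visual angle (in degrees) subtended by an object of a given size."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

person_height = 1.8  # an adult roughly 6 feet tall (illustrative)
for d in (2, 4, 8, 16):
    print(f"at {d:>2} m: {visual_angle_deg(person_height, d):5.2f} degrees")
# Doubling the distance roughly halves the visual angle, so the brain
# can read relative retinal size as relative distance.
```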


Figure 4.18: Familiar Size and Relative Size When you view images of people, such as the people in the left-hand photo, or of things you know well, the object you perceive as smaller appears farther away. With a little image manipulation, you can see in the right-hand photo that the relative size difference projected on your retinas is far greater than you perceive. The image of the person in the blue vest is exactly the same size in both photos.
THE PHOTO WORKS

This works particularly well in a monocular depth cue called familiar size. Most adults, for example, fall within a familiar range of heights (perhaps 5–7 feet tall), so retinal image size alone is usually a reliable cue to how far away they are. Our visual system automatically corrects for size differences and attributes them to differences in distance. FIGURE 4.18 demonstrates how strong this mental correction for familiar size is.

In addition to relative size and familiar size, there are several more monocular depth cues, such as linear perspective, texture gradient, interposition, and relative height (see FIGURE 4.19).

Figure 4.19: Pictorial Depth Cues Visual artists rely on a variety of monocular cues to make their work come to life. You can rely on cues such as linear perspective (a), texture gradient (b), interposition (c), and relative height (d) in an image to infer distance, depth, and position, even if you’re wearing an eye patch.
SUPERSTOCK
AGE FOTOSTOCK/SUPERSTOCK
ALTRENDO/GETTY IMAGES
ROB BLAKERS/GETTY IMAGES


Binocular Depth Cues

Figure 4.20: Binocular Disparity We see the world in three dimensions because our eyes are a distance apart and the image of an object falls on the retinas of each eye at a slightly different place. In this two-object scene, the images of the square and the circle fall on different points of the retina in each eye. The disparity in the positions of the circle’s retinal images provides a compelling cue to depth.

We can also obtain depth information through binocular disparity, the difference in the retinal images of the two eyes that provides information about depth. Because our eyes are slightly separated, each registers a slightly different view of the world. Your brain computes the disparity between the two retinal images to perceive how far away objects are, as shown in FIGURE 4.20. Viewed from above in the figure, the images of the more distant square and the closer circle each fall at different points on each retina.
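
For readers who want the numbers, here is a back-of-the-envelope sketch of the disparity geometry (our own illustration using standard trigonometry; the 6.4 cm eye separation is a typical adult value, and the 0.5 m and 1.0 m object distances are assumptions chosen to echo the figure’s near circle and far square):

```python
import math

# Each eye views an object at a slightly different angle; the difference
# between the two viewing angles is the disparity the brain exploits.

EYE_SEPARATION = 0.064  # typical interocular distance, ~6.4 cm (assumption)

def vergence_angle_deg(distance_m):
    """Angle between the two eyes' lines of sight to a point straight ahead."""
    return math.degrees(2 * math.atan(EYE_SEPARATION / (2 * distance_m)))

near, far = 0.5, 1.0  # the circle at 0.5 m, the square at 1.0 m (assumed)
disparity = vergence_angle_deg(near) - vergence_angle_deg(far)
print(f"disparity between objects: {disparity:.2f} degrees")
# The nearer object subtends a larger vergence angle; the difference
# (here about 3.7 degrees) signals how far apart the objects are in depth.
```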

Binocular disparity as a cue to depth perception was first discussed by Sir Charles Wheatstone in 1838. Wheatstone went on to invent the stereoscope, essentially a holder for a pair of photographs or drawings taken from two horizontally displaced locations. (Wheatstone did not lack for original ideas; he also invented the accordion and an early telegraph and coined the term microphone.) When viewed one by each eye, the pair of images evoked a vivid sense of depth. The View-Master toy is the modern successor to Wheatstone’s invention, and 3-D movies are based on this same idea.

The View-Master has been a popular toy for decades. It is based on the principle of binocular disparity: Two images taken from slightly different angles produce a stereoscopic effect.
CORBIS/ALAMY
©NMPFT/SSPL/THE IMAGE WORKS

Illusions of Depth and Size

We all are vulnerable to illusions, which, as you’ll remember from the Psychology: Evolution of a Science chapter, are errors of perception, memory, or judgment in which subjective experience differs from objective reality (Wade, 2005). The relation between size and distance has been used to create elaborate illusions that depend on fooling the visual system about how far away objects are. All these illusions depend on the same principle: When you view two objects that project the same retinal image size, the object you perceive as farther away will be perceived as larger. One of the most famous illusions is the Ames room, constructed by the American ophthalmologist Adelbert Ames in 1946. The room is trapezoidal in shape rather than square: Only two sides are parallel (see FIGURE 4.21a). A person standing in one corner of an Ames room is physically twice as far away from the viewer as a person standing in the other corner. But when viewed with one eye through the small peephole placed in one wall, the Ames room looks square because the shapes of the windows and the flooring tiles are carefully crafted to look square from the viewing port (Ittelson, 1952).

Figure 4.21: The Amazing Ames Room (a) A diagram showing the actual proportions of the Ames room reveals its secrets. The sides of the room form a trapezoid with parallel sides but a back wall that’s way off square. The uneven floor makes the room’s height in the far back corner shorter than in the other corner. Add misleading cues such as specially designed windows and flooring, position the room’s occupants in each far corner, and you’re ready to lure an unsuspecting observer. (b) Looking into the Ames room through the viewing port with only one eye, the observer infers a normal size–distance relationship: that both people are the same distance away. But the different image sizes they project on the retina lead the viewer to conclude, based on the monocular cue of familiar size, that one person is very small and the other is very large.
(B) PHIL SCHERMEISTER/CORBIS

The visual system perceives the far wall as perpendicular to the line of sight so that people standing at different positions along that wall appear to be at the same distance, and the viewer’s judgments of their sizes are based directly on retinal image size. As a result, a person standing in the right corner appears to be much larger than a person standing in the left corner (see FIGURE 4.21b).
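
The arithmetic of the illusion can be sketched in a few lines (an illustration we have added, using the chapter’s statement that one corner is twice as far away; perceived size is modeled as retinal size multiplied by assumed distance, and the 1.8 m height and 2 m / 4 m distances are assumed values):

```python
# Size-distance logic behind the Ames room (illustrative numbers).
# Perceived size ~ retinal size x assumed distance, so if the room tricks
# you into assuming equal distances, retinal size alone drives perceived size.

def retinal_size(true_height, true_distance):
    return true_height / true_distance  # proportional to visual angle

person_a = retinal_size(true_height=1.8, true_distance=2.0)  # near corner
person_b = retinal_size(true_height=1.8, true_distance=4.0)  # far corner

assumed_distance = 2.0  # viewer wrongly assumes both corners are equidistant
print(f"perceived height A: {person_a * assumed_distance:.1f} m")  # 1.8 m
print(f"perceived height B: {person_b * assumed_distance:.1f} m")  # 0.9 m
# Two equally tall people are perceived as giant and tiny because the
# misjudged distance is folded into the size estimate.
```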


Perceiving Motion and Change

You should now have a good sense of how we see what and where objects are, a process made substantially easier when the objects stay in one place. But real life, of course, is full of moving targets. Objects change position over time: Birds fly and horses gallop; rain and snow fall; and trees bend in the wind. Understanding how we perceive motion and why we sometimes fail to perceive change can bring us closer to appreciating how visual perception works in everyday life.

Motion Perception

To sense motion, the visual system must encode information about both space and time. The simplest case to consider is an observer who does not move trying to perceive an object that does.

As an object moves across an observer’s stationary visual field, it first stimulates one location on the retina, and then a little later it stimulates another location on the retina. Neural circuits in the brain can detect this change in position over time and respond to specific speeds and directions of motion (Emerson, Bergen, & Adelson, 1992). A region in the middle of the temporal lobe referred to as MT (part of the dorsal stream we discussed earlier) is specialized for the visual perception of motion (Born & Bradley, 2005; Newsome & Paré, 1988), and brain damage in this area leads to a deficit in normal motion perception (Zihl, von Cramon, & Mai, 1983).

Of course, in the real world, rarely are you a stationary observer. As you move around, your head and eyes move all the time, and motion perception is not as simple. The motion-perception system must take into account the position and movement of your eyes, and ultimately of your head and body, in order to perceive the motions of objects correctly and allow you to approach or avoid them. The brain accomplishes this by monitoring your eye and head movements and “subtracting” them from the motion in the retinal image.
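
In its simplest form, that “subtraction” can be written as one line of arithmetic (a deliberate simplification we have added, not a neural model; the velocities are illustrative):

```python
# Motion on the retina mixes object motion with your own eye motion,
# so the brain must remove the self-generated part.

def perceived_object_motion(retinal_motion_deg_s, eye_motion_deg_s):
    """World motion = retinal motion plus the eye's own motion.

    When the eye tracks rightward at 10 deg/s, a stationary object slips
    leftward across the retina at -10 deg/s; adding the eye signal back
    correctly yields zero perceived motion.
    """
    return retinal_motion_deg_s + eye_motion_deg_s

print(perceived_object_motion(retinal_motion_deg_s=-10, eye_motion_deg_s=10))  # 0: world is still
print(perceived_object_motion(retinal_motion_deg_s=5, eye_motion_deg_s=10))    # 15: object really moves
```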

Color perception and motion perception both rely partially on opponent processing, which is why we fall prey to illusions such as color aftereffects and the waterfall illusion.
YURIY BRYKAYLO/ALAMY

Motion perception, like color perception, operates in part on opponent processes and is subject to sensory adaptation. A motion aftereffect called the waterfall illusion is analogous to color aftereffects. If you stare at the downward rush of a waterfall for several seconds, you’ll experience an upward motion aftereffect when you then look at stationary objects near the waterfall such as trees or rocks. What’s going on here?


The process is similar to seeing green after staring at a patch of red. Motion-sensitive neurons are connected to motion detector cells in the brain that encode motion in opposite directions. A sense of motion comes from the difference in the strength of these two opposing sensors. If one set of motion detector cells is fatigued through adaptation to motion in one direction, then the opposing sensor will take over. The net result is that motion is perceived in the opposite direction. Evidence from fMRIs indicates that when people experience the waterfall illusion while viewing a stationary stimulus, there is increased activity in region MT, which plays a key role in motion perception (Tootell et al., 1995).
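
A toy opponent-process calculation makes the logic of the aftereffect visible (illustrative response values of our own; real MT responses are far more complex):

```python
# Perceived motion as the difference between opposed motion detectors.

def perceived_motion(down_response, up_response):
    diff = down_response - up_response
    if diff > 0:
        return "downward"
    if diff < 0:
        return "upward"
    return "stationary"

# Before adaptation: a stationary scene drives both detectors equally.
print(perceived_motion(down_response=1.0, up_response=1.0))  # stationary

# After staring at a waterfall, the "down" detector is fatigued; the same
# stationary scene now drives it more weakly, so the difference favors "up".
print(perceived_motion(down_response=0.6, up_response=1.0))  # upward aftereffect
```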

How can flashing lights on a casino sign give the impression of movement?

The movement of objects in the world is not the only event that can evoke the perception of motion. The successively flashing lights of a Las Vegas casino sign can evoke a strong sense of motion because people perceive a series of flashing lights as a whole, moving object (see FIGURE 4.15f). This perception of movement as a result of alternating signals appearing in rapid succession in different locations is called apparent motion.

Video technology and animation depend on apparent motion. Motion pictures flash 24 still frames per second (fps). A slower rate would produce a much choppier sense of motion; a faster rate would be a waste of resources because we would not perceive the motion as any smoother than it appears at 24 fps.
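
The arithmetic is simple enough to check directly (a quick sketch we have added; 24 fps is the standard film rate named in the text, and the other rates are common video values included for comparison):

```python
# Time each still frame is on screen at common frame rates.
for fps in (12, 24, 60):
    print(f"{fps:>2} fps -> {1000 / fps:5.1f} ms per frame")
# 12 fps -> 83.3 ms (visibly choppy), 24 fps -> 41.7 ms, 60 fps -> 16.7 ms
```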

Change Blindness and Inattentional Blindness

Motion involves a change in an object’s position over time, but objects in the visual environment can change in ways that do not involve motion (Rensink, 2002). You might walk by the same clothing store window every day and notice when a new suit or dress is on display or register surprise when you see a friend’s new haircut. Intuitively, we feel that we can easily detect changes to our visual environment. However, our comfortable intuitions have been challenged by experimental demonstrations of change blindness, which occurs when people fail to detect changes to the visual details of a scene (Rensink, 2002; Simons & Rensink, 2005). Strikingly, change blindness occurs even when major details of a scene are changed, including changes that we incorrectly believe we could not miss (Beck, Levin, & Angelone, 2007). For example, participants in one study were shown a movie in which a young blond man sits at a desk, gets up and walks away from the desk, and exits the room (Levin & Simons, 1997). The scene then shifts outside the room, where the young man makes a phone call. This all sounds straightforward, but unknown to the participants, the man sitting at the desk was not the same person as the man who made the phone call. Although both were young, blond, and wearing glasses, they were clearly different people. Still, two-thirds of the participants failed to notice the change.

It’s one thing to create change blindness by splicing a film, but does change blindness also occur in live interactions? Another study tested this idea by having an experimenter ask a person on a college campus for directions (Simons & Levin, 1998). While they were talking, two men walked between them holding a door that hid a second experimenter (see FIGURE 4.22). Behind the door, the two experimenters traded places, so that when the men carrying the door moved on, a different person was asking for directions than the one who had been there just a second or two earlier. Remarkably, only 7 of 15 participants reported noticing this change.

Figure 4.22: Change Blindness The white-haired man was giving directions to one experimenter (a), who disappeared behind the moving door (b), only to be replaced by another experimenter (c). Like many other people, the man failed to detect a seemingly obvious change.
After Simons, D. J., & Levin, D. T. (1998). Failure to detect changes to people during a real-world interaction. Psychonomic Bulletin & Review, 5(4), 644–649.


How can a failure of focused attention explain change blindness?

Although it is surprising that people can be blind to such dramatic changes, these findings once again illustrate the importance of focused attention for visual perception (see the discussion of feature-integration theory, p. 146). Just as focused attention is critical for binding together the features of objects, it is also necessary for detecting changes to objects and scenes (Rensink, 2002; Simons & Rensink, 2005). Change blindness is most likely to occur when people fail to focus attention on the changed object (even though the object is registered by the visual system) and is least likely to occur for items that draw attention to themselves (Rensink, O’Regan, & Clark, 1997).

College students who were using their cell phones while walking through campus failed to notice the unicycling clown more frequently than students who were not using their cell phones.
HYMAN ET AL, 2010

The role of focused attention in conscious visual experience is also dramatically illustrated by the closely related phenomenon of inattentional blindness, a failure to perceive objects that are not the focus of attention. Imagine the following scenario: You are watching a circle of people passing around a basketball when somebody dressed in a gorilla costume walks through the circle, stopping to beat his chest before moving on. It seems inconceivable that you would fail to notice the gorilla, right? Think again. Simons and Chabris (1999) filmed such a scene, using two teams of three players each who passed the ball to one another as the costumed gorilla made his entrance and exit. Participants watched the film and were asked to track the movement of the ball by counting the number of passes made by one of the teams. With their attention focused on the moving ball, approximately half the participants failed to notice the chest-beating gorilla.

This has interesting implications for a world in which many of us are busy texting and talking on our cell phones while carrying on other kinds of everyday business. We’ve already seen that using cell phones has negative effects on driving (see The Real World: Multitasking). Ira Hyman and colleagues (2010) asked whether cell phone use contributes to inattentional blindness in everyday life. They recruited a clown to ride a unicycle around a large square in the middle of the campus at Western Washington University. On a pleasant afternoon, the researchers asked 151 students who had just walked through the square whether they saw the clown. Seventy-five percent of the students who were using cell phones failed to notice the clown, compared with fewer than 50% of those who were not using cell phones. Using cell phones draws on focused attention, resulting in increased inattentional blindness and emphasizing again that our conscious experience of the visual environment is restricted to those features or objects selected by focused attention.


CULTURE & COMMUNITY: Does culture influence change blindness?

The experiments discussed in this section of the chapter show that change blindness is dramatic and occurs across a range of situations. But the evidence for change blindness that we’ve considered comes from studies using participants from Western cultures, mainly Americans. Would change blindness occur in individuals from other cultures? If so, is there any reason to suspect that it would work differently across cultures? Think back to the Culture & Community box from the Psychology: Evolution of a Science chapter, where we discussed evidence showing that people from Western cultures rely on an analytic style of processing information (i.e., they tend to focus on an object without paying much attention to the surrounding context), whereas people from Eastern cultures tend to adopt a holistic style (i.e., they tend to focus on the relationship between an object and the surrounding context; Kitayama et al., 2003; Nisbett & Miyamoto, 2005).

With this distinction in mind, Masuda and Nisbett (2006) noted that previous studies of change blindness, using mainly American participants, had shown that participants are more likely to detect changes in the main or focal object in a scene, and less likely to detect changes in the surrounding context. They hypothesized that individuals from an Eastern culture would be more focused on, and therefore more likely to notice, changes in the surrounding context than individuals from a Western culture. To test their prediction, they conducted three experiments examining change detection in American (University of Michigan) and Japanese (Kyoto University) college students using still photos and brief movie-type vignettes (Masuda & Nisbett, 2006). In each experiment, they made changes either to the main or focal object in a scene or to the surrounding context (e.g., objects in the background of a scene).

The results of the experiments were consistent with predictions: Japanese students detected more changes to contextual information than did American students, whereas American students detected more changes to focal objects than did Japanese students. These findings extend earlier reports that people from Eastern and Western cultures see the world differently, with Easterners focusing more on the context in which an object appears and Westerners focusing more on the object itself.

  • Illusory conjunctions occur when features from separate objects are mistakenly combined. According to feature-integration theory, attention provides the glue necessary to bind features together. The parietal lobe is important for attention and contributes to feature binding.
  • Some regions in the occipital and temporal lobes respond selectively to specific object categories, supporting the modular view that specialized brain areas represent particular classes of objects such as faces or houses or body parts.
  • The principle of perceptual constancy holds that even as sensory signals change, perception remains consistent. Gestalt principles of perceptual grouping, such as simplicity, closure, and continuity, govern how the features and regions of things fit together.
  • Image-based and parts-based theories each explain some, but not all, features of object recognition.
  • Depth perception depends on monocular cues, such as familiar size and linear perspective; binocular cues, such as retinal disparity; and motion-based cues, which are based on the movement of the head over time.
  • We experience a sense of motion through the differences in the strengths of output from motion-sensitive neurons. These processes can give rise to illusions such as apparent motion.
  • Change blindness and inattentional blindness occur when we fail to notice visible and even salient features of our environment, emphasizing that our conscious visual experience depends on focused attention.