8.6 Multisensory Perception: Combining Senses

In this chapter and the previous one we’ve been discussing the various senses as if they were independent and totally separable. We smell a rose, hear a voice, and see a face. Yet, in everyday experience we rarely experience an event with only a single sense. Rather, most of our experiences are multisensory. Moreover, most of the time, we don’t experience these inputs to our various systems as a chaotic mish-mash, but as an integrated whole. Multisensory integration (sometimes referred to as multimodal integration) refers to the integration of information from different senses by the nervous system.

Multisensory Integration

31

What is the McGurk effect, and how does it demonstrate visual dominance?

Have you ever had the experience of watching a movie or television show when the audio and video did not match? Most people find this mildly disconcerting and often have trouble figuring out what exactly is being said by whom. We also are more likely to understand what a speaker says if we can see his or her lips. The correlation between speech and lip movements is so consistent that infants as young as 2 months of age are able to match a person’s lip movements to the corresponding sounds (Kuhl & Meltzoff, 1982), although this ability improves with age and experience (see Soto-Faraco et al., 2012).

Of course, when we receive conflicting information from two senses (sight and sound, for example), the result is not necessarily an average of the two senses. For humans (and pigeons; Randich et al., 1978), when sight and sound are put in conflict with one another, vision usually “wins.” This is referred to as the visual dominance effect (Colavita, 1974; Posner et al., 1976). In one set of studies, subjects were presented with tones, lights, or, on a few trials, both tones and lights. They were instructed to press one key, as quickly as possible, when a tone was presented and another key when a light was presented. Subjects were told that the bimodal trials (tones and lights) were simply mistakes made by the experimenter. When the bimodal trials were presented, subjects pressed the “light key” on 49 of 50 trials (Colavita, 1974).

Another example of vision having greater influence on audition than vice versa is the McGurk effect (McGurk & MacDonald, 1976). This is experienced when one hears a person speak one sound (the phoneme “ba,” for example), but watches a face articulating a different sound (the phoneme “ga,” for example). Although the exact same sound is impinging upon your ears, in one case you will hear “ba” (what is actually being said), while in the other you will hear a different sound. In the “ba/ga” example most people hear “da” when hearing “ba” but watching a person mouthing “ga.” You’ve really got to see this to appreciate it (look up “McGurk effect” online to see it for yourself).

Neuroscience of Multisensory Integration

For someone to experience multisensory integration, the brain must somehow be able to respond appropriately to stimuli from two sensory modalities, such as sound and vision. Researchers discovered neurons in the superior colliculus of cats, and later of other mammals, that respond to more than one type of sensory stimulus (Stein & Meredith, 1993; Wallace et al., 2012). Multisensory neurons are neurons that are influenced by stimuli from more than one sense modality. Based on single-cell recording with nonhuman animals, researchers determined that multisensory integration is most apt to occur when the individual sensory stimuli (1) come from the same location, (2) arise at approximately the same time, and (3) evoke relatively weak responses when presented in isolation.

Multisensory neurons are found throughout the brain, but multisensory integration also occurs when the outputs of unimodal neurons (neurons that respond only to a single type of sensory stimulation) are integrated (Wallace et al., 2012).
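
The third condition listed above (weak individual responses favor integration) is easier to see with numbers. The following is a minimal sketch, not a method described in this chapter: it assumes a simple percent-enhancement index that compares a neuron's response to the combined stimulus with its best response to either stimulus alone. The function names and all response values are hypothetical and were invented for illustration.

```python
def enhancement_index(combined, best_unimodal):
    """Percent change of the combined response relative to the best single-sense response.

    Assumed formula: 100 * (combined - best_unimodal) / best_unimodal.
    """
    if best_unimodal <= 0:
        raise ValueError("best_unimodal must be positive")
    return 100.0 * (combined - best_unimodal) / best_unimodal


def summarize(responses):
    """Compute the index for one neuron's visual, auditory, and combined responses."""
    best = max(responses["visual"], responses["auditory"])
    return enhancement_index(responses["combined"], best)


# Hypothetical mean responses (spikes per trial); these are NOT real data.
weak = {"visual": 2.0, "auditory": 1.0, "combined": 6.0}
strong = {"visual": 20.0, "auditory": 15.0, "combined": 24.0}

weak_gain = summarize(weak)
strong_gain = summarize(strong)

print(f"weak stimuli:   {weak_gain:.0f}% enhancement")
print(f"strong stimuli: {strong_gain:.0f}% enhancement")
```

With these made-up values, the combined stimulus adds the same absolute amount in both cases (4 spikes per trial), yet the proportional gain is far larger for the weak stimuli. That is the pattern condition (3) describes.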

The “Bouba/Kiki” Effect

Figure 8.35: The “bouba/kiki” effect. Which of these figures is called “bouba” and which is called “kiki”? Most people from diverse language groups pick the rounded figure as “bouba” and the jagged figure as “kiki,” showing that the relation between an object and what it is called is not entirely arbitrary.

Look at the shapes in Figure 8.35. Now if you had to guess, which shape do you think is called “bouba” and which is called “kiki”? If you are like the vast majority of people, you would guess that the rounded shape on the right was “bouba” and the jagged shape on the left was “kiki.” This was first demonstrated in 1929 by the German Gestalt psychologist Wolfgang Köhler, who tested Spanish speakers and used the made-up words “takete” and “baluba.” It has since been repeated a number of times, with Vilayanur S. Ramachandran and Edward Hubbard (2001) reporting that greater than 95 percent of American college students and Tamil speakers from India selected the jagged figure as “kiki” and the rounded figure as “bouba.”

Why should this be? One of the defining characteristics of language is that what we call something is arbitrary, as Shakespeare’s Juliet observed: “A rose by any other name would smell as sweet.” But it seems that what we call some objects is not totally arbitrary. Ramachandran and Hubbard (2001) proposed that the more rounded stimulus corresponds to the rounded way one must form one’s mouth when saying “bouba,” whereas the sharper-looking stimulus corresponds to the way one forms one’s mouth when saying “kiki.” In other words, there is an implicit multisensory match between a visual stimulus and sound, or perhaps the muscle patterns we use to make those sounds. This effect is not due to the appearance of the letters, but to the sounds of the words, as 2.5-year-old toddlers, who can’t yet read, also show the same effect (Maurer et al., 2006).

The Development of Multisensory Integration

The fact that even toddlers show the “bouba/kiki” effect suggests that multisensory perception is well established by early childhood. But how early do children display multisensory integration? Do infants begin life with each of their senses separable and distinct, and over time learn to integrate their senses? This was the position favored by the famous Swiss psychologist Jean Piaget. (We’ll have much more to say about Piaget in Chapter 11.) Or do children start out with their senses initially fused, and only later with maturation and experience do they become separable? This was the position favored by one of the pioneers of psychology, William James (1890/1950), and later championed by the developmental psychologist Eleanor Gibson (1969). Research over the past several decades generally favors the ideas of James and Gibson (Bremner et al., 2012). Although research clearly indicates that intermodal abilities improve with age (see Bremner et al., 2012, for a review), even newborns are capable of recognizing the equivalence between stimuli in two different modalities (a bright light and a loud sound, for instance; Lewkowicz & Turkewitz, 1980).

Elizabeth Spelke (1976) provided an interesting demonstration of multisensory integration in 4-month-old infants. Babies watched films on two side-by-side screens. On one screen a woman was playing peek-a-boo, and on the other screen was a hand holding a stick and drumming it against a block of wood. The babies also heard a sound track corresponding either to the peek-a-boo or the drumming video. Infants spent more time looking at the screen that matched the sound. That is, these 4-month-old infants realized that certain sound sequences corresponded to certain visual displays and looked at those displays in which the sounds and the sights matched (Bahrick, 2002; Lewkowicz, 1992).

Although we tend to think of our senses as providing unique information that is separable from information from other senses, the truth is a bit different. Multisensory integration is the norm rather than the exception, and it is present early in life. However, some people’s multisensory experiences fail to show this typical integration, and it is to this topic that we now turn.

Synesthesia

32

What are the defining features of synesthesia? Might synesthesia have any adaptive value?

When you listen to music, do you see color? Or perhaps you taste music, or you see numbers and letters in specific colors. If you do, you are one of the roughly 1 in 20 people with synesthesia (sĭ-nǝs-thē’-zhǝ), meaning literally “joined perception” (from Greek), a condition in which sensory stimulation in one modality induces a sensation in a different modality. Though known for centuries, synesthesia was more a curiosity than a topic of serious scientific investigation until recently. Few people claimed to be synesthetes (people with synesthesia), and little wonder. Several hundred years ago such claims would brand one as a witch, and in more enlightened times experiences such as “seeing sounds” were apt to be viewed as hallucinations and perhaps a sign of schizophrenia. Since the 1980s or so, however, the situation has changed substantially, as modern science has taken the phenomenon of synesthesia seriously. Current estimates are that between 1 and 4 percent of the population are synesthetes, and, unlike schizophrenia, synesthesia does not interfere with normal functioning and is not classified as a mental disorder (Hochel & Milán, 2008; Simner et al., 2006).

Synesthesia can come in many forms. To date, 61 different types of synesthesia have been identified, with the most common being grapheme-color (Simner, 2012; Simner et al., 2006). Figure 8.36 shows the frequencies of the types of stimuli that evoke synesthesia (called inducers) and the types of synesthetic sensations (called concurrents). As you can see, letters and/or numbers (lexical stimuli) are the most common inducers of synesthesia, followed by music and sound (adapted from Hochel & Milán, 2008). Figure 8.37 shows how one grapheme-color synesthete views the numbers 0 through 8.

Figure 8.36: Frequency of synesthesia. These are the relative frequencies of synesthetic inducers (stimuli that induce synesthesia) and concurrents (the sensory system in which one experiences the synesthesia). As you can see, music and lexical (numbers or letters) stimuli are most apt to induce synesthesia, and the vast majority of the synesthesia effect is in terms of colors (for example, “hearing colors” or seeing specific numbers as specific colors). However, many other experiences can induce synesthesia, and they can be expressed in just about all of one’s senses (for example, “tasting numbers” or “smelling sounds”).
(Adapted from Hochel & Milán, 2008, based on data from Day, 2007.)
Figure 8.37: Number-color associations for one synesthete. Notice that the numbers 7 and 8 are composed of two colors each.
(With permission from Ramachandran & Brang, 2008.)

How do you know if you’re a synesthete? As currently defined, synesthetic perception is: (a) involuntary and automatic; (b) consistent (for example, if you see “5” as red today, you will see it as red in 2 weeks); (c) spatially extended (look at how the numbers are distributed in Figure 8.37); (d) memorable; and (e) emotional (Cytowic, 2002). Most people report that most of their synesthetic experiences are emotionally positive. Synesthetes report that they have been synesthetic since childhood, and it apparently is a stable characteristic, lasting a lifetime (Hochel & Milán, 2008). Synesthesia runs in families, suggesting a genetic component, but it often skips generations. Moreover, monozygotic (genetically identical) twins do not always share the trait (Smilek et al., 2005), indicating that the genetic route to synesthesia is not a simple one. In one large-scale study, synesthesia was found to be disproportionally frequent in artists (Rich et al., 2005), something that had been anecdotally reported earlier.
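
Criterion (b), consistency, is the easiest of these features to check with a simple calculation. The sketch below is only an illustration under stated assumptions: it scores consistency as the fraction of inducers that receive the same color name in two sessions. Exact matching of color names is an assumption made here for simplicity, and the function name and all reports are hypothetical.

```python
def consistency(first_session, second_session):
    """Fraction of inducers tested in both sessions that received the same color."""
    shared = sorted(set(first_session) & set(second_session))
    if not shared:
        raise ValueError("no inducers were tested in both sessions")
    matches = sum(first_session[item] == second_session[item] for item in shared)
    return matches / len(shared)


# Hypothetical reports (inducer -> reported color); these are NOT real data.
week_0 = {"2": "blue", "5": "red", "7": "green", "A": "yellow"}
week_2_synesthete = {"2": "blue", "5": "red", "7": "green", "A": "yellow"}
week_2_guesser = {"2": "green", "5": "red", "7": "yellow", "A": "blue"}

synesthete_score = consistency(week_0, week_2_synesthete)
guesser_score = consistency(week_0, week_2_guesser)

print(f"synesthete: {synesthete_score:.2f}")
print(f"guesser:    {guesser_score:.2f}")
```

In this invented example, the person who sees “5” as red today and again in 2 weeks scores at the top of the scale, whereas someone who is merely guessing agrees with an earlier report only occasionally.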

A number of different theories about the origins of synesthesia have been proposed, and neuroscience has shown that the brains of synesthetes are different from those of “typical” perceivers. Variants of the most common interpretation of synesthesia, the sensory cross-activation hypothesis, propose that it is due to cross-activation between different areas of the brain (Hubbard et al., 2011; Ramachandran & Hubbard, 2001). For the most common type of synesthesia, grapheme-color, the cross-activation is proposed to occur within the fusiform gyrus, between one area that represents the visual appearance of graphemes (numbers and letters) and an adjacent area associated with color vision, called V4 (Ramachandran & Hubbard, 2001; see Figure 8.38). One hypothesis related to the cross-activation theory was originally proposed by Daphne and Charles Maurer (1988), who suggested that human infants are synesthetes, having many neural connections between different sensory areas. Over the course of typical development, most of these synaptic connections get pruned, resulting in increased segregation of the senses (Holcombe et al., 2009; Maurer & Mondloch, 2005). Adult synesthetes, on this view, are people who fail to display the typical pruning of these cross-modal connections. In support of this, we know that for typically developing children the number of synapses in the sensory and association areas of the brain peaks in childhood and declines thereafter (Huttenlocher & Dabholkar, 1997).

Figure 8.38: Regions thought to be cross-activated in grapheme-color synesthesia (green = grapheme recognition area, red = V4 color area).
From Ramachandran, V.S. & Hubbard, E.M. (2001) “Synaesthesia: A window into perception, thought and language.” Journal of Consciousness Studies.

A finding consistent with Maurer and Maurer’s interpretation is that adult synesthetes have greater structural connectivity (as reflected by patterns of white matter, the myelin coating on axons, described in Chapter 5) for different parts of the brain than non-synesthetes (Rouw & Scholte, 2007). Other brain-imaging research has shown differences in gray matter (neurons) between different areas of the brains of synesthetes and non-synesthetes (Banissy et al., 2012). Much is still to be learned about the neuroscience of synesthesia, but the more we learn, the more complex the picture becomes. For example, a recent review of the neuroscience literature concluded that a network of brain areas, rather than just a single area, is involved in synesthesia (Rouw et al., 2011).

One question you may ask is, why is synesthesia so common? Why hasn’t natural selection weeded out people with excessive connections between different sensory areas? To put it another way, might people with synesthesia have some adaptive advantage? Vilayanur Ramachandran and his colleagues (Ramachandran & Brang, 2008; Ramachandran & Hubbard, 2001) proposed that the cross-wiring that seems to occur in synesthesia may play a role in seeing connections between higher-order concepts, such as metaphors (“Juliet is the sun,” “soft blue”). The fact that artists are more likely to be synesthetes than the general population is consistent with this theory.
