Project Summary/Abstract
Effective communication combines auditory and visual (AV) cues in a way that supports linguistic
comprehension. It is widely recognized, for example, that viewing a speaker’s face enhances speech
intelligibility. The specific anatomical and physiological mechanisms by which A and V stimuli related to
communication are integrated into a neural representation that can guide adaptive behavior remain elusive. In
terms of anatomy, the superior temporal gyrus (STG) appears to play a prominent role in AV speech
processing. In addition to being activated during auditory processing, the STG is activated by visual facial
stimuli. Further, silent lip-reading activates the STG, suggesting that the STG is involved in
language processing even in the absence of auditory input. Thus, we will focus on the STG. In terms of
physiology, it has been hypothesized that in lower-level areas, auditory inputs drive neuronal firing patterns
carrying the bulk of the neural representation of verbal information, while visual inputs operate to enhance or
otherwise modulate this firing, and thus the representation. This project will examine an extension of this
hypothesis into the level of the STG. Because the anatomical and physiological resolution with which we can
study AV integration in the normal human STG region is limited, we propose to study the processing and
integration of A and V components of natural vocalizations by macaque monkeys as a model for rudimentary
aspects of AV communication in humans. To this end, we will use a sensory discrimination task in which monkeys are
trained to discriminate between different conspecific vocalizations while undergoing intracranial recordings from the STG.
Our first aim is to define the physiology of AV interactions in the STG during communication
processing. Current source density (CSD) profiles (reflecting local synaptic activity) and multiunit activity
(MUA) profiles (reflecting concomitant neuronal firing patterns) will be obtained from the STG for A-only, V-
only, and AV components of vocalizations. Based upon multisensory integration effects in low-level auditory
areas, we predict that the V input will be modulatory in nature, priming the excitability of ensembles
of STG neurons such that the auditory input arrives when the neurons are at maximal excitability, thereby
amplifying STG output. To further explore this idea, our second aim is to determine how altering AV timing
impacts physiology in STG and behavioral performance. Based again on results from lower-level auditory
areas, we predict that disrupting the natural (and well learned) timing of A and V components will disrupt V
enhancement of A responses. Our findings will yield further insight into how the brain integrates AV stimuli into
a coherent percept, and ultimately, into the mechanisms of the language deficits associated with psychiatric
conditions such as autism and schizophrenia.