151
Abstract
The brain's circuitry for perceiving and producing speech may show a notable level of overlap that is crucial for normal development and behavior. The extent to which sensorimotor integration plays a role in speech perception remains highly controversial, however. Methodological constraints related to experimental designs and analysis methods have so far prevented the disentanglement of neural responses to acoustic versus articulatory speech features. Using a passive listening paradigm and multivariate decoding of single-trial fMRI responses to spoken syllables, we investigated brain-based generalization of articulatory features (place and manner of articulation, and voicing) beyond their acoustic (surface) form in adult human listeners. For example, we trained a classifier to discriminate place of articulation within stop syllables (e.g., /pa/ vs /ta/) and tested whether this training generalizes to fricatives (e.g., /fa/ vs /sa/). This novel approach revealed generalization of place and manner of articulation at multiple cortical levels within the dorsal auditory pathway, including auditory, sensorimotor, motor, and somatosensory regions, suggesting the representation of sensorimotor information. Additionally, generalization of voicing included the right anterior superior temporal sulcus associated with the perception of human voices as well as somatosensory regions bilaterally. Our findings highlight the close connection between brain systems for speech perception and production, and in particular, indicate the availability of articulatory codes during passive speech perception.

SIGNIFICANCE STATEMENT: Sensorimotor integration is central to verbal communication and provides a link between auditory signals of speech perception and motor programs of speech production. It remains highly controversial, however, to what extent the brain's speech perception system actively uses articulatory (motor), in addition to acoustic/phonetic, representations. In this study, we examine the role of articulatory representations during passive listening using carefully controlled stimuli (spoken syllables) in combination with multivariate fMRI decoding. Our approach enabled us to disentangle brain responses to acoustic and articulatory speech properties. In particular, it revealed articulatory-specific brain responses of speech at multiple cortical levels, including auditory, sensorimotor, and motor regions, suggesting the representation of sensorimotor information during passive speech perception.
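The cross-decoding logic at the heart of this design is compact enough to sketch. Below is a minimal illustration with simulated data and a scikit-learn classifier; all variable names are hypothetical placeholders, not the authors' pipeline.

```python
# Minimal sketch of the cross-decoding ("generalization") test described
# above, with random arrays standing in for single-trial fMRI patterns.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_voxels = 80, 500

# Single-trial response patterns (trials x voxels) for two syllable sets.
X_stops = rng.normal(size=(n_trials, n_voxels))       # e.g., /pa/ vs /ta/
y_stops = rng.integers(0, 2, size=n_trials)           # place-of-articulation labels
X_fricatives = rng.normal(size=(n_trials, n_voxels))  # e.g., /fa/ vs /sa/
y_fricatives = rng.integers(0, 2, size=n_trials)      # same label coding

# Train on stops, test on fricatives: accuracy above chance (0.5) would
# indicate a place-of-articulation code that generalizes across manner.
clf = LinearSVC().fit(X_stops, y_stops)
print("cross-manner accuracy:", clf.score(X_fricatives, y_fricatives))
```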
152
Stiers P, Falbo L, Goulas A, van Gog T, de Bruin A. Reverse inference of memory retrieval processes underlying metacognitive monitoring of learning using multivariate pattern analysis. Neuroimage 2016; 132:11-23. [PMID: 26883066] [DOI: 10.1016/j.neuroimage.2016.02.008]
Abstract
Monitoring of learning is only accurate at some time after learning. It is thought that immediate monitoring is based on working memory, whereas later monitoring requires re-activation of stored items, yielding accurate judgements. Such interpretations are difficult to test because they require reverse inference, which presupposes specificity of brain activity for the hidden cognitive processes. We investigated whether multivariate pattern classification can provide this specificity. We used a word recall task to create single-trial examples of immediate and long-term retrieval and trained a learning algorithm to discriminate them. Next, participants performed a similar task involving monitoring instead of recall. The recall-trained classifier recognized the retrieval patterns underlying immediate and long-term monitoring and classified delayed monitoring examples as long-term retrieval. This result demonstrates the feasibility of decoding cognitive processes, rather than their content.
Affiliation(s)
- Peter Stiers
- Department of Neuropsychology and Psychopharmacology, Maastricht University, Maastricht, The Netherlands.
- Luciana Falbo
- Department of Neuropsychology and Psychopharmacology, Maastricht University, Maastricht, The Netherlands
- Alexandros Goulas
- Department of Neuropsychology and Psychopharmacology, Maastricht University, Maastricht, The Netherlands
- Tamara van Gog
- Department of Educational Psychology, Erasmus University Rotterdam, The Netherlands
- Anique de Bruin
- Department of Educational Research & Development, Maastricht University, The Netherlands
153
Zhang Q, Hu X, Luo H, Li J, Zhang X, Zhang B. Deciphering phonemes from syllables in blood oxygenation level-dependent signals in human superior temporal gyrus. Eur J Neurosci 2016; 43:773-81. [DOI: 10.1111/ejn.13164]
Affiliation(s)
- Qingtian Zhang
- Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Room 4-504, FIT Building, Beijing 100084, China
- Xiaolin Hu
- Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Room 4-504, FIT Building, Beijing 100084, China
- Center for Brain-Inspired Computing Research (CBICR), Tsinghua University, Beijing, China
- Huan Luo
- Department of Psychology, Peking University, Beijing, China
- IDG/McGovern Institute for Brain Research, Peking University, Beijing, China
- Jianmin Li
- Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Room 4-504, FIT Building, Beijing 100084, China
- Xiaolu Zhang
- Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Room 4-504, FIT Building, Beijing 100084, China
- Bo Zhang
- Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Room 4-504, FIT Building, Beijing 100084, China
- Center for Brain-Inspired Computing Research (CBICR), Tsinghua University, Beijing, China
154
Pinheiro AP, Rezaii N, Nestor PG, Rauber A, Spencer KM, Niznikiewicz M. Did you or I say pretty, rude or brief? An ERP study of the effects of speaker's identity on emotional word processing. Brain Lang 2016; 153-154:38-49. [PMID: 26894680] [DOI: 10.1016/j.bandl.2015.12.003]
Abstract
During speech comprehension, multiple cues need to be integrated on a millisecond timescale, including semantic information as well as voice identity and affect cues. A processing advantage has been demonstrated for self-related stimuli compared with non-self stimuli, and for emotional relative to neutral stimuli. However, few studies have investigated self-other speech discrimination and, in particular, how emotional valence and voice identity interactively modulate speech processing. In the present study we probed how the processing of words' semantic valence is modulated by the speaker's identity (self vs. non-self voice). Sixteen healthy subjects listened to 420 prerecorded adjectives differing in voice identity (self vs. non-self) and semantic valence (neutral, positive, and negative) while electroencephalographic data were recorded. Participants were instructed to decide whether the speech they heard was their own (self-speech condition), someone else's (non-self speech), or whether they were unsure. The ERP results demonstrated interactive effects of speaker's identity and emotional valence at both early (N1, P2) and late (late positive potential, LPP) processing stages: compared with non-self speech, self-speech with neutral valence elicited more negative N1 amplitude, self-speech with positive valence elicited more positive P2 amplitude, and self-speech with both positive and negative valence elicited more positive LPP. ERP differences between self and non-self speech occurred in spite of similar accuracy in the recognition of both types of stimuli. Together, these findings suggest that emotion and speaker's identity interact during speech processing, in line with observations of partially dependent processing of speech and speaker information.
Affiliation(s)
- Ana P Pinheiro
- Neuropsychophysiology Laboratory, Psychology Research Center (CIPsi), School of Psychology, University of Minho, Braga, Portugal; Clinical Neuroscience Division, Laboratory of Neuroscience, VA Boston Healthcare System-Brockton Division, Department of Psychiatry, Harvard Medical School, Brockton, MA, United States; Faculty of Psychology, University of Lisbon, Lisbon, Portugal
- Neguine Rezaii
- Clinical Neuroscience Division, Laboratory of Neuroscience, VA Boston Healthcare System-Brockton Division, Department of Psychiatry, Harvard Medical School, Brockton, MA, United States
- Paul G Nestor
- Clinical Neuroscience Division, Laboratory of Neuroscience, VA Boston Healthcare System-Brockton Division, Department of Psychiatry, Harvard Medical School, Brockton, MA, United States; Department of Psychology, University of Massachusetts, Boston, MA, United States
- Andréia Rauber
- International Studies in Computational Linguistics, University of Tübingen, Tübingen, Germany
- Kevin M Spencer
- Neural Dynamics Laboratory, Research Service, VA Boston Healthcare System, and Department of Psychiatry, Harvard Medical School, Boston, MA, United States
- Margaret Niznikiewicz
- Clinical Neuroscience Division, Laboratory of Neuroscience, VA Boston Healthcare System-Brockton Division, Department of Psychiatry, Harvard Medical School, Brockton, MA, United States
155
Damarla SR, Cherkassky VL, Just MA. Modality-independent representations of small quantities based on brain activation patterns. Hum Brain Mapp 2016; 37:1296-307. [PMID: 26749189] [DOI: 10.1002/hbm.23102]
Abstract
Machine learning (multivoxel pattern analysis, MVPA) studies have shown that the neural representation of quantities of objects can be decoded from fMRI patterns in cases where the quantities were visually displayed. Here we apply these techniques to investigate whether neural representations of quantities depicted in one modality (say, visual) can be decoded from brain activation patterns evoked by quantities depicted in the other modality (say, auditory). The main finding, demonstrated for the first time, was that quantities of dots could be decoded by a classifier trained on the neural patterns evoked by quantities of auditory tones, and vice versa. The representations that were common across modalities were mainly right-lateralized in frontal and parietal regions. A second finding was that the neural patterns in parietal cortex that represent quantities were common across participants. These findings demonstrate a common neuronal foundation for the representation of quantities across sensory modalities and participants and provide insight into the role of parietal cortex in the representation of quantity information.
Affiliation(s)
- Saudamini Roy Damarla
- Department of Psychology, Center for Cognitive Brain Imaging, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Vladimir L Cherkassky
- Department of Psychology, Center for Cognitive Brain Imaging, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Marcel Adam Just
- Department of Psychology, Center for Cognitive Brain Imaging, Carnegie Mellon University, Pittsburgh, Pennsylvania
156
Pure word deafness with auditory object agnosia after bilateral lesion of the superior temporal sulcus. Cortex 2015; 73:24-35. [DOI: 10.1016/j.cortex.2015.08.001]
157
Lindquist MA, Krishnan A, López-Solà M, Jepma M, Woo CW, Koban L, Roy M, Atlas LY, Schmidt L, Chang LJ, Reynolds Losin EA, Eisenbarth H, Ashar YK, Delk E, Wager TD. Group-regularized individual prediction: theory and application to pain. Neuroimage 2015; 145:274-287. [PMID: 26592808] [DOI: 10.1016/j.neuroimage.2015.10.074]
Abstract
Multivariate pattern analysis (MVPA) has become an important tool for identifying brain representations of psychological processes and clinical outcomes using fMRI and related methods. Such methods can be used to predict or 'decode' psychological states in individual subjects. Single-subject MVPA approaches, however, are limited by the amount and quality of individual-subject data. In spite of higher spatial resolution, predictive accuracy from single-subject data often does not exceed what can be accomplished using coarser, group-level maps, because single-subject patterns are trained on limited amounts of often-noisy data. Here, we present a method that combines population-level priors, in the form of biomarker patterns developed on prior samples, with single-subject MVPA maps to improve single-subject prediction. Theoretical results and simulations motivate a weighting based on the relative variances of biomarker-based prediction (based on population-level predictive maps from prior groups) and individual-subject, cross-validated prediction. Empirical results predicting pain from brain activity on a trial-by-trial basis (single-trial prediction) across 6 studies (N=180 participants) confirm the theoretical predictions. Regularization based on a population-level biomarker (in this case, the Neurologic Pain Signature, NPS) improved single-subject prediction accuracy compared with idiographic maps based on the individuals' data alone. The regularization scheme that we propose, which we term group-regularized individual prediction (GRIP), can be applied broadly to within-person MVPA-based prediction. We also show how GRIP can be used to evaluate data quality and provide benchmarks for the appropriateness of population-level maps like the NPS for a given individual or study.
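The variance-based weighting the abstract describes can be illustrated with a simple inverse-variance (precision-weighted) combination. This is a hedged sketch of the idea under that assumption, not necessarily the paper's exact estimator; names and numbers are illustrative.

```python
# Hedged sketch of a GRIP-style combination: weight a population
# biomarker's prediction and an individual's cross-validated prediction
# by their inverse error variances. An assumed precision weighting;
# the paper's estimator may differ in detail.

def grip_combine(pred_pop, var_pop, pred_ind, var_ind):
    """Inverse-variance weighted average of two predictions."""
    w_pop = (1.0 / var_pop) / (1.0 / var_pop + 1.0 / var_ind)
    return w_pop * pred_pop + (1.0 - w_pop) * pred_ind

# Stable population map (low variance) vs noisy individual map:
pred_pop, var_pop = 4.2, 0.5   # biomarker-based pain prediction
pred_ind, var_ind = 5.1, 2.0   # individual cross-validated prediction
print(grip_combine(pred_pop, var_pop, pred_ind, var_ind))  # 4.38, nearer 4.2
```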
Affiliation(s)
- Anjali Krishnan
- University of Colorado Boulder, USA; Brooklyn College of the City University of New York, USA
- Lauren Y Atlas
- National Center for Complementary and Integrative Health, National Institutes of Health, USA
- Liane Schmidt
- INSEAD, France; Cognitive Neuroscience Laboratory, INSERM U960, Department of Cognitive Sciences, Ecole Normale Supérieure, Paris, France
158
Who is That? Brain Networks and Mechanisms for Identifying Individuals. Trends Cogn Sci 2015; 19:783-796. [PMID: 26454482] [PMCID: PMC4673906] [DOI: 10.1016/j.tics.2015.09.002]
Abstract
Social animals can identify conspecifics by many forms of sensory input. However, whether the neuronal computations that support this ability to identify individuals rely on modality-independent convergence or involve ongoing synergistic interactions along the multiple sensory streams remains controversial. Direct neuronal measurements at relevant brain sites could address such questions, but this requires better bridging the work in humans and animal models. Here, we overview recent studies in nonhuman primates on voice and face identity-sensitive pathways and evaluate the correspondences to relevant findings in humans. This synthesis provides insights into converging sensory streams in the primate anterior temporal lobe (ATL) for identity processing. Furthermore, we advance a model and suggest how alternative neuronal mechanisms could be tested.
159
Pell MD, Rothermich K, Liu P, Paulmann S, Sethi S, Rigoulot S. Preferential decoding of emotion from human non-linguistic vocalizations versus speech prosody. Biol Psychol 2015; 111:14-25. [PMID: 26307467] [DOI: 10.1016/j.biopsycho.2015.08.008]
Abstract
This study used event-related brain potentials (ERPs) to compare the time course of emotion processing from non-linguistic vocalizations versus speech prosody, to test whether vocalizations are treated preferentially by the neurocognitive system. Participants passively listened to vocalizations or pseudo-utterances conveying anger, sadness, or happiness as the EEG was recorded. Simultaneous effects of vocal expression type and emotion were analyzed for three ERP components (N100, P200, late positive component). Emotional vocalizations and speech were differentiated very early (N100) and vocalizations elicited stronger, earlier, and more differentiated P200 responses than speech. At later stages (450-700 ms), anger vocalizations evoked a stronger late positivity (LPC) than other vocal expressions, which was similar but delayed for angry speech. Individuals with high trait anxiety exhibited early, heightened sensitivity to vocal emotions (particularly vocalizations). These data provide new neurophysiological evidence that vocalizations, as evolutionarily primitive signals, are accorded precedence over speech-embedded emotions in the human voice.
Affiliation(s)
- M D Pell
- School of Communication Sciences and Disorders, McGill University, Montreal, Canada; International Laboratory for Brain, Music, and Sound Research, Montreal, Canada
- K Rothermich
- School of Communication Sciences and Disorders, McGill University, Montreal, Canada
- P Liu
- School of Communication Sciences and Disorders, McGill University, Montreal, Canada
- S Paulmann
- Department of Psychology and Centre for Brain Science, University of Essex, Colchester, United Kingdom
- S Sethi
- School of Communication Sciences and Disorders, McGill University, Montreal, Canada
- S Rigoulot
- International Laboratory for Brain, Music, and Sound Research, Montreal, Canada
160
Abstract
Designing a "cocktail party listener" that functionally mimics the selective perception of a human auditory system has been pursued over the past decades. By exploiting acoustic metamaterials and compressive sensing, we present here a single-sensor listening device that separates simultaneous overlapping sounds from different sources. The device with a compact array of resonant metamaterials is demonstrated to distinguish three overlapping and independent sources with 96.67% correct audio recognition. Segregation of the audio signals is achieved using physical layer encoding without relying on source characteristics. This hardware approach to multichannel source separation can be applied to robust speech recognition and hearing aids and may be extended to other acoustic imaging and sensing applications.
161
Lee YS, Peelle JE, Kraemer D, Lloyd S, Granger R. Multivariate sensitivity to voice during auditory categorization. J Neurophysiol 2015; 114:1819-26. [PMID: 26245316] [DOI: 10.1152/jn.00407.2014]
Abstract
Past neuroimaging studies have documented discrete regions of human temporal cortex that are more strongly activated by conspecific voice sounds than by nonvoice sounds. However, the mechanisms underlying this voice sensitivity remain unclear. In the present functional MRI study, we took a novel approach to examining voice sensitivity, in which we applied a signal detection paradigm to the assessment of multivariate pattern classification among several living and nonliving categories of auditory stimuli. Within this framework, voice sensitivity can be interpreted as a distinct neural representation of brain activity that correctly distinguishes human vocalizations from other auditory object categories. Across a series of auditory categorization tests, we found that bilateral superior and middle temporal cortex consistently exhibited robust sensitivity to human vocal sounds. Although the strongest categorization was in distinguishing human voice from other categories, subsets of these regions were also able to distinguish reliably between nonhuman categories, suggesting a general role in auditory object categorization. Our findings complement the current evidence of cortical sensitivity to human vocal sounds by revealing that the greatest sensitivity during categorization tasks is devoted to distinguishing voice from nonvoice categories within human temporal cortex.
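As a generic illustration of reading classifier output through a signal-detection lens, the sketch below converts hit and false-alarm rates from a hypothetical voice-vs-nonvoice classification into the sensitivity index d'; this is the textbook computation, not the authors' code, and the counts are made up.

```python
# Signal-detection summary of a binary "voice vs. non-voice" classifier:
# d' = z(hit rate) - z(false-alarm rate). Rates of exactly 0 or 1 would
# need a standard correction before calling norm.ppf.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Voice trials treated as "signal", non-voice trials as "noise".
print(d_prime(hits=42, misses=8, false_alarms=12, correct_rejections=38))
```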
Affiliation(s)
- Yune Sang Lee
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire
- Jonathan E Peelle
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri
- David Kraemer
- Department of Education, Dartmouth College, Hanover, New Hampshire
- Samuel Lloyd
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire
- Richard Granger
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire
162
Chen YP, Nelson LD, Hsu M. From "Where" to "What": Distributed Representations of Brand Associations in the Human Brain. J Mark Res 2015; 52:453-466. [PMID: 27065490] [PMCID: PMC4822556] [DOI: 10.1509/jmr.14.0606]
Abstract
Considerable attention has been given to the notion that there exists a set of human-like characteristics associated with brands, referred to as brand personality. Here we combine newly available machine learning techniques with functional neuroimaging data to characterize the set of processes that give rise to these associations. We show that brand personality traits can be captured by the weighted activity across a widely distributed set of brain regions previously implicated in reasoning, imagery, and affective processing. That is, as opposed to being constructed via reflective processes, brand personality traits appear to exist a priori inside the minds of consumers, such that we were able to predict what brand a person is thinking about based solely on the relationship between brand personality associations and brain activity. These findings represent an important advance in the application of neuroscientific methods to consumer research, moving from work focused on cataloguing brain regions associated with marketing stimuli to testing and refining mental constructs central to theories of consumer behavior.
Affiliation(s)
- Yu-Ping Chen
- Haas School of Business, University of California, Berkeley
- Helen Wills Neuroscience Institute, University of California, Berkeley
- Leif D. Nelson
- Haas School of Business, University of California, Berkeley
- Ming Hsu
- Haas School of Business, University of California, Berkeley
- Helen Wills Neuroscience Institute, University of California, Berkeley
163
Abstract
Sensory processing involves identification of stimulus features, but also integration with the surrounding sensory and cognitive context. Previous work in animals and humans has shown fine-scale sensitivity to context in the form of learned knowledge about the statistics of the sensory environment, including relative probabilities of discrete units in a stream of sequential auditory input. These statistics are a defining characteristic of one of the most important sequential signals humans encounter: speech. For speech, extensive exposure to a language tunes listeners to the statistics of sound sequences. To address how speech sequence statistics are neurally encoded, we used high-resolution direct cortical recordings from human lateral superior temporal cortex as subjects listened to words and nonwords with varying transition probabilities between sound segments. In addition to their sensitivity to acoustic features (including contextual features, such as coarticulation), we found that neural responses dynamically encoded the language-level probability of both preceding and upcoming speech sounds. Transition probability first negatively modulated neural responses, followed by positive modulation of neural responses, consistent with coordinated predictive and retrospective recognition processes, respectively. Furthermore, transition probability encoding was different for real English words compared with nonwords, providing evidence for online interactions with high-order linguistic knowledge. These results demonstrate that sensory processing of deeply learned stimuli involves integrating physical stimulus features with their contextual sequential structure. Despite not being consciously aware of phoneme sequence statistics, listeners use this information to process spoken input and to link low-level acoustic representations with linguistic information about word identity and meaning.
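To make segment-to-segment transition probability concrete, here is a toy computation of P(next sound | previous sound) over a three-word mini-corpus; real estimates would come from a large pronunciation lexicon, and the phoneme strings below are only illustrative.

```python
# Toy estimate of phoneme transition probabilities P(next | previous),
# the statistic related to neural responses above. Mini-corpus is
# hypothetical ("cat", "cab", "bat" in rough phonemic form).
from collections import Counter

corpus = [["k", "ae", "t"], ["k", "ae", "b"], ["b", "ae", "t"]]

pair_counts = Counter()
context_counts = Counter()
for word in corpus:
    for prev, nxt in zip(word, word[1:]):
        pair_counts[(prev, nxt)] += 1
        context_counts[prev] += 1

transition_prob = {pair: n / context_counts[pair[0]]
                   for pair, n in pair_counts.items()}
print(transition_prob[("ae", "t")])  # 2 of 3 "ae" transitions go to "t": 0.667
```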
164
Cvikel N, Levin E, Hurme E, Borissov I, Boonman A, Amichai E, Yovel Y. On-board recordings reveal no jamming avoidance in wild bats. Proc Biol Sci 2015; 282:20142274. [PMID: 25429017] [DOI: 10.1098/rspb.2014.2274]
Abstract
Animals often deal with situations in which vast sensory input is received simultaneously. They therefore must possess sophisticated mechanisms to select important input and ignore the rest. In bat echolocation, this problem is at its extreme. Echolocating bats emit sound signals and analyse the returning echoes to sense their environment. Bats from the same species use signals with similar frequencies. Nearby bats therefore face the difficulty of distinguishing their own echoes from the signals of other bats, a problem often referred to as jamming. Because bats commonly fly in large groups, jamming might simultaneously occur from numerous directions and at many frequencies. Jamming is a special case of the general phenomenon of sensory segregation. Another well-known example is the human problem of following conversation within a crowd. In both situations, a flood of auditory incoming signals must be parsed into important versus irrelevant information. Here, we present a novel method, fitting wild bats with a miniature microphone, which allows studying jamming from the bat's 'point of view'. Previous studies suggested that bats deal with jamming by shifting their echolocation frequency. On-board recordings suggest otherwise. Bats shifted their frequencies, but they did so because they were responding to the conspecifics as though they were nearby objects rather than avoiding being jammed by them. We show how bats could use alternative measures to deal with jamming instead of shifting their frequency. Despite its intuitive appeal, a spectral jamming avoidance response might not be the prime mechanism to avoid sensory interference from conspecifics.
Affiliation(s)
- Noam Cvikel
- Department of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Eran Levin
- Department of Entomology, University of Arizona, Tucson, AZ 85721, USA
- Edward Hurme
- Department of Biology, University of Maryland, College Park, MD 20742, USA
- Ivailo Borissov
- Department of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Arjan Boonman
- Department of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Eran Amichai
- Department of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Yossi Yovel
- Department of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel; Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
165
Evans S, Davis MH. Hierarchical Organization of Auditory and Motor Representations in Speech Perception: Evidence from Searchlight Similarity Analysis. Cereb Cortex 2015; 25:4772-88. [PMID: 26157026] [PMCID: PMC4635918] [DOI: 10.1093/cercor/bhv136]
Abstract
How humans extract the identity of speech sounds from highly variable acoustic signals remains unclear. Here, we use searchlight representational similarity analysis (RSA) to localize and characterize neural representations of syllables at different levels of the hierarchically organized temporo-frontal pathways for speech perception. We asked participants to listen to spoken syllables that differed considerably in their surface acoustic form, by changing speaker and degrading surface acoustics using noise-vocoding and sine-wave synthesis, while we recorded neural responses with functional magnetic resonance imaging. We found evidence for a graded hierarchy of abstraction across the brain. At the peak of the hierarchy, neural representations in somatomotor cortex encoded syllable identity but not surface acoustic form; at the base of the hierarchy, primary auditory cortex showed the reverse. In contrast, bilateral temporal cortex exhibited an intermediate response, encoding both syllable identity and the surface acoustic form of speech. Regions of somatomotor cortex associated with encoding syllable identity in perception were also engaged when producing the same syllables in a separate session. These findings are consistent with a hierarchical account of how variable acoustic signals are transformed into abstract representations of the identity of speech sounds.
Affiliation(s)
- Samuel Evans
- MRC Cognition and Brain Sciences Unit, Cambridge CB2 7EF, UK; Institute of Cognitive Neuroscience, University College London, WC1 3AR, UK
- Matthew H Davis
- MRC Cognition and Brain Sciences Unit, Cambridge CB2 7EF, UK
166
Pernet CR, McAleer P, Latinus M, Gorgolewski KJ, Charest I, Bestelmeyer PEG, Watson RH, Fleming D, Crabbe F, Valdes-Sosa M, Belin P. The human voice areas: Spatial organization and inter-individual variability in temporal and extra-temporal cortices. Neuroimage 2015; 119:164-74. [PMID: 26116964] [PMCID: PMC4768083] [DOI: 10.1016/j.neuroimage.2015.06.050]
Abstract
fMRI studies increasingly examine functions and properties of non-primary areas of human auditory cortex. However, there is currently no standardized localization procedure to reliably identify specific areas across individuals, such as the standard 'localizers' available in the visual domain. Here we present an fMRI 'voice localizer' scan allowing rapid and reliable localization of the voice-sensitive 'temporal voice areas' (TVA) of human auditory cortex. We describe results obtained using this standardized localizer scan in a large cohort of normal adult subjects. Most participants (94%) showed bilateral patches of significantly greater response to vocal than non-vocal sounds along the superior temporal sulcus/gyrus (STS/STG). Individual activation patterns, although reproducible, showed high inter-individual variability in precise anatomical location. Cluster analysis of individual peaks from the large cohort highlighted three bilateral clusters of voice sensitivity, or "voice patches", along posterior (TVAp), mid (TVAm), and anterior (TVAa) STS/STG, respectively. A series of extra-temporal areas, including bilateral inferior prefrontal cortex and the amygdalae, showed small but reliable voice sensitivity as part of a large-scale cerebral voice network. Stimuli for the voice localizer scan and probabilistic maps in MNI space are available for download.
Affiliation(s)
- Cyril R Pernet
- Centre for Clinical Brain Sciences, Neuroimaging Sciences, The University of Edinburgh, United Kingdom
- Phil McAleer
- Institute of Neuroscience and Psychology, University of Glasgow, United Kingdom
- Marianne Latinus
- Institut des Neurosciences de La Timone, UMR 7289, CNRS & Université Aix-Marseille, France
- Ian Charest
- Cognition and Brain Sciences Unit, Medical Research Council, Cambridge, United Kingdom
- Rebecca H Watson
- Faculty of Psychology and Neuroscience, Maastricht University, The Netherlands
- David Fleming
- Institute of Neuroscience and Psychology, University of Glasgow, United Kingdom
- Frances Crabbe
- Institute of Neuroscience and Psychology, University of Glasgow, United Kingdom
- Pascal Belin
- Institute of Neuroscience and Psychology, University of Glasgow, United Kingdom; Institut des Neurosciences de La Timone, UMR 7289, CNRS & Université Aix-Marseille, France; Département de Psychologie, Université de Montréal, Canada
167
Herff C, Heger D, de Pesters A, Telaar D, Brunner P, Schalk G, Schultz T. Brain-to-text: decoding spoken phrases from phone representations in the brain. Front Neurosci 2015; 9:217. [PMID: 26124702] [PMCID: PMC4464168] [DOI: 10.3389/fnins.2015.00217]
Abstract
It has long been speculated whether communication between humans and machines based on natural speech related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity while speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.
Affiliation(s)
- Christian Herff
- Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Dominic Heger
- Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Adriana de Pesters
- New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, NY, USA; Department of Biomedical Sciences, State University of New York at Albany, Albany, NY, USA
- Dominic Telaar
- Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Peter Brunner
- New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, NY, USA; Department of Neurology, Albany Medical College, Albany, NY, USA
- Gerwin Schalk
- New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, NY, USA; Department of Biomedical Sciences, State University of New York at Albany, Albany, NY, USA; Department of Neurology, Albany Medical College, Albany, NY, USA
- Tanja Schultz
- Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany
168
Floren A, Naylor B, Miikkulainen R, Ress D. Accurately decoding visual information from fMRI data obtained in a realistic virtual environment. Front Hum Neurosci 2015; 9:327. [PMID: 26106315] [PMCID: PMC4460535] [DOI: 10.3389/fnhum.2015.00327]
Abstract
Three-dimensional interactive virtual environments (VEs) are a powerful but presently under-utilized tool for brain-imaging-based cognitive neuroscience. This paper presents machine-learning-based methods for identifying brain states induced by realistic VEs with improved accuracy, as well as the capability to map their spatial topography on the neocortex. VEs provide the ability to study the brain under conditions closer to the environment in which humans evolved, and thus to probe deeper into the complexities of human cognition. As a test case, we designed a stimulus to reflect a military combat situation in the Middle East, motivated by the potential of using real-time functional magnetic resonance imaging (fMRI) in the treatment of post-traumatic stress disorder. Each subject experienced moving through the virtual town, where they encountered 1-6 animated combatants at different locations, while fMRI data were collected. To analyze the data from what is, compared to most studies, a more complex and less controlled stimulus, we employed statistical machine learning in the form of multivoxel pattern analysis (MVPA), with special attention given to artificial neural networks (NNs). Extensions to NNs that exploit the block structure of the stimulus were developed to improve the accuracy of the classification, achieving performances from 58 to 93% (chance was 16.7%) with six subjects. This demonstrates that MVPA can decode a complex cognitive state, viewing a number of characters, in a dynamic virtual environment. To better understand the source of this information in the brain, a novel form of sensitivity analysis was developed that uses NNs to quantify the degree to which each voxel contributed to classification. Compared with maps produced by general linear models and the searchlight approach, these sensitivity maps revealed a more diverse pattern of information relevant to the classification of cognitive state.
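The paper's sensitivity analysis is specific to its network architecture, but the underlying idea, scoring each voxel by how much classification degrades when its information is destroyed, can be conveyed with a generic permutation-importance sketch; this is a named stand-in, not the authors' method, and all data below are simulated.

```python
# Generic permutation-importance stand-in for voxelwise sensitivity
# mapping: shuffle one voxel's values at a time and record the drop in
# classification accuracy. Illustrative only.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))    # trials x voxels (simulated)
y = rng.integers(0, 6, size=120)  # six states (1-6 combatants)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                    random_state=0).fit(X, y)
base_acc = clf.score(X, y)        # evaluated on training data for brevity

sensitivity = np.zeros(X.shape[1])
for v in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, v] = rng.permutation(X_perm[:, v])  # destroy voxel v's info
    sensitivity[v] = base_acc - clf.score(X_perm, y)

print("most informative voxel:", int(np.argmax(sensitivity)))
```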
Affiliation(s)
- Andrew Floren
- Electrical and Computer Engineering Department, The University of Texas at Austin, Austin, TX, USA
- Bruce Naylor
- Department of Neuroscience, The University of Texas at Austin, Austin, TX, USA
- Risto Miikkulainen
- Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
- David Ress
- Human Neuroimaging Laboratory, Baylor College of Medicine, Houston, TX, USA
169
Overath T, McDermott JH, Zarate JM, Poeppel D. The cortical analysis of speech-specific temporal structure revealed by responses to sound quilts. Nat Neurosci 2015; 18:903-11. [PMID: 25984889] [PMCID: PMC4769593] [DOI: 10.1038/nn.4021]
Abstract
Speech contains temporal structure that the brain must analyze to enable linguistic processing. To investigate the neural basis of this analysis, we used sound quilts, stimuli constructed by shuffling segments of a natural sound, approximately preserving its properties on short timescales while disrupting them on longer scales. We generated quilts from foreign speech to eliminate language cues and manipulated the extent of natural acoustic structure by varying the segment length. Using functional magnetic resonance imaging, we identified bilateral regions of the superior temporal sulcus (STS) whose responses varied with segment length. This effect was absent in primary auditory cortex and did not occur for quilts made from other natural sounds or acoustically matched synthetic sounds, suggesting tuning to speech-specific spectrotemporal structure. When examined parametrically, the STS response increased with segment length up to ∼500 ms. Our results identify a locus of speech analysis in human auditory cortex that is distinct from lexical, semantic or syntactic processes.
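The quilting manipulation itself is a simple segment-shuffling operation. Here is a toy version with a placeholder signal; the published algorithm also matches segment boundaries to avoid splicing artifacts, which this sketch omits.

```python
# Toy version of "sound quilting": cut a waveform into fixed-length
# segments and shuffle their order, approximately preserving
# short-timescale structure while disrupting longer-timescale structure.
import numpy as np

def quilt(signal, segment_len, rng):
    n_seg = len(signal) // segment_len
    segs = signal[:n_seg * segment_len].reshape(n_seg, segment_len)
    return segs[rng.permutation(n_seg)].ravel()

rng = np.random.default_rng(0)
fs = 16000                              # sampling rate (Hz)
speech = rng.normal(size=2 * fs)        # placeholder for a 2 s speech clip
quilted = quilt(speech, segment_len=int(0.03 * fs), rng=rng)  # 30 ms segments
print(quilted.shape)
```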
Affiliation(s)
- Tobias Overath
- Duke Institute for Brain Sciences, Duke University, Durham, North Carolina, USA; Department of Psychology, New York University, New York, New York, USA
- Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, USA
- Jean Mary Zarate
- Department of Psychology, New York University, New York, New York, USA
- David Poeppel
- Department of Psychology, New York University, New York, New York, USA; Center for Neural Science, New York University, New York, New York, USA; Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
170
Decoding speech perception from single cell activity in humans. Neuroimage 2015; 117:151-9. [PMID: 25976925] [DOI: 10.1016/j.neuroimage.2015.05.001]
Abstract
Deciphering the content of continuous speech is a challenging task performed daily by the human brain. Here, we tested whether the activity of single cells in auditory cortex could be used to support such a task. We recorded neural activity from the auditory cortex of two neurosurgical patients while they were presented with a short video segment containing speech. Population spiking activity (~20 cells per patient) allowed detection of word onset and decoding of the identity of perceived words at accuracy levels significantly above chance. The oscillation phase of local field potentials (8-12 Hz) also allowed decoding of word identity, although at lower accuracy levels. Our results provide evidence that the spiking activity of a relatively small population of cells in human primary auditory cortex contains significant information for classification of words in ongoing speech. Given previous evidence for overlapping neural representation during speech perception and production, this may have implications for developing brain-machine interfaces for patients with deficits in speech production.
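Extracting the 8-12 Hz phase feature that supports this decoding is a standard signal-processing step; the sketch below uses SciPy on a simulated trace and is not the authors' pipeline, with filter settings chosen only for illustration.

```python
# Generic extraction of 8-12 Hz local field potential phase via band-pass
# filtering plus the Hilbert transform, the kind of feature decoded above.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000.0                                   # sampling rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
rng = np.random.default_rng(0)
lfp = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)  # fake LFP

b, a = butter(4, [8 / (fs / 2), 12 / (fs / 2)], btype="bandpass")
narrowband = filtfilt(b, a, lfp)              # zero-phase 8-12 Hz filter
phase = np.angle(hilbert(narrowband))         # instantaneous phase (radians)

# Phase at a word-onset sample could then be fed to a classifier.
print(phase[500])
```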
171
Occelli F, Suied C, Pressnitzer D, Edeline JM, Gourévitch B. A Neural Substrate for Rapid Timbre Recognition? Neural and Behavioral Discrimination of Very Brief Acoustic Vowels. Cereb Cortex 2015; 26:2483-2496. [PMID: 25947234] [DOI: 10.1093/cercor/bhv071]
Abstract
The timbre of a sound plays an important role in our ability to discriminate between behaviorally relevant auditory categories, such as different vowels in speech. Here, we investigated, in the primary auditory cortex (A1) of anesthetized guinea pigs, the neural representation of vowels with impoverished timbre cues. Five different vowels were presented with durations ranging from 2 to 128 ms. A psychophysical experiment involving human listeners showed that identification performance was near ceiling for the longer durations and degraded close to chance level for the shortest durations. This was likely due to spectral splatter, which reduced the contrast between the spectral profiles of the vowels at short durations. Effects of vowel duration on cortical responses were well predicted by the linear frequency responses of A1 neurons. Using mutual information, we found that auditory cortical neurons in the guinea pig could be used to reliably identify several vowels for all durations. Information carried by each cortical site was low on average, but the population code was accurate even for durations where human behavioral performance was poor. These results suggest that a place population code is available at the level of A1 to encode spectral profile cues for even very short sounds.
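The per-site mutual-information measure can be sketched generically: discretize each site's response, then compute MI between stimulus identity and response. A toy version with simulated data (not the study's recordings) follows; real analyses also bias-correct MI estimates for small samples.

```python
# Toy mutual information between vowel identity and a recording site's
# discretized spike-count response, the per-site measure used above.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(1)
vowels = rng.integers(0, 5, size=400)            # five vowel labels
spikes = rng.poisson(lam=3 + vowels)             # counts weakly tuned to vowel
binned = np.digitize(spikes, bins=[2, 4, 6, 8])  # discretize responses

print(mutual_info_score(vowels, binned))         # MI in nats
```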
Affiliation(s)
- F Occelli
- UMR CNRS 9197, Institut de NeuroScience Paris-Saclay (NeuroPSI)
- Université Paris-Sud, Institut de NeuroScience Paris-Saclay (NeuroPSI), 91405 Orsay Cedex, France
- C Suied
- Département Action et Cognition en Situation Opérationnelle, Institut de Recherche Biomédicale des Armées, 91223 Brétigny sur Orge, France
- D Pressnitzer
- UMR CNRS 8248, LSP
- DEC, LSP, Ecole Normale Supérieure, 29 rue d'Ulm, 75005 Paris, France
- J-M Edeline
- UMR CNRS 9197, Institut de NeuroScience Paris-Saclay (NeuroPSI)
- Université Paris-Sud, Institut de NeuroScience Paris-Saclay (NeuroPSI), 91405 Orsay Cedex, France
- B Gourévitch
- UMR CNRS 9197, Institut de NeuroScience Paris-Saclay (NeuroPSI)
- Université Paris-Sud, Institut de NeuroScience Paris-Saclay (NeuroPSI), 91405 Orsay Cedex, France
172
Zhang X, Zhang Q, Hu X, Zhang B. Neural representation of three-dimensional acoustic space in the human temporal lobe. Front Hum Neurosci 2015; 9:203. [PMID: 25932011] [PMCID: PMC4399328] [DOI: 10.3389/fnhum.2015.00203]
Abstract
Sound localization is an important function of the human brain, but the underlying cortical mechanisms remain unclear. In this study, we recorded auditory stimuli in three-dimensional space and then replayed the stimuli through earphones during functional magnetic resonance imaging (fMRI). By employing a machine learning algorithm, we successfully decoded sound location from the blood oxygenation level-dependent signals in the temporal lobe. Analysis of the data revealed that different cortical patterns were evoked by sounds from different locations. Specifically, discrimination of sound location along the abscissa axis evoked robust responses in the left posterior superior temporal gyrus (STG) and right mid-STG, discrimination along the elevation (EL) axis evoked robust responses in the left posterior middle temporal lobe (MTL) and right STG, and discrimination along the ordinate axis evoked robust responses in the left mid-MTL and right mid-STG. These results support a distributed representation of acoustic space in human cortex.
Affiliation(s)
- Xiaolu Zhang
- State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Qingtian Zhang
- State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Xiaolin Hu
- State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Beijing, China; Center for Brain-Inspired Computing Research (CBICR), Tsinghua University, Beijing, China
- Bo Zhang
- State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Beijing, China; Center for Brain-Inspired Computing Research (CBICR), Tsinghua University, Beijing, China
173
Abstract
A fundamental goal of the human auditory system is to map complex acoustic signals onto stable internal representations of the basic sound patterns of speech. Phonemes and the distinctive features that they comprise constitute the basic building blocks from which higher-level linguistic representations, such as words and sentences, are formed. Although the neural structures underlying phonemic representations have been well studied, there is considerable debate regarding frontal-motor cortical contributions to speech as well as the extent of lateralization of phonological representations within auditory cortex. Here we used functional magnetic resonance imaging (fMRI) and multivoxel pattern analysis to investigate the distributed patterns of activation that are associated with the categorical and perceptual similarity structure of 16 consonant exemplars in the English language used in Miller and Nicely's (1955) classic study of acoustic confusability. Participants performed an incidental task while listening to phonemes in the MRI scanner. Neural activity in bilateral anterior superior temporal gyrus and supratemporal plane was correlated with the first two components derived from a multidimensional scaling analysis of a behaviorally derived confusability matrix. We further showed that neural representations corresponding to the categorical features of voicing, manner of articulation, and place of articulation were widely distributed throughout bilateral primary, secondary, and association areas of the superior temporal cortex, but not motor cortex. Although classification of phonological features was generally bilateral, we found that multivariate pattern information was moderately stronger in the left compared with the right hemisphere for place but not for voicing or manner of articulation.
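A stripped-down version of the analysis logic (MDS on a behavioral confusability matrix, then relating the resulting dissimilarity structure to brain data) can be sketched as follows; all matrices are random placeholders, not the study's data or exact pipeline.

```python
# Skeleton of the representational analysis described above: derive
# perceptual dimensions from a consonant confusability matrix via
# multidimensional scaling (MDS), then compare dissimilarity structure
# with neural data, RSA-style.
import numpy as np
from scipy.stats import spearmanr
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n = 16  # 16 consonant exemplars

# Symmetric behavioral dissimilarity (e.g., 1 - normalized confusability).
behav = rng.random((n, n))
behav = (behav + behav.T) / 2
np.fill_diagonal(behav, 0.0)

# First two perceptual dimensions, as in the MDS analysis above; these
# are what would be correlated with activity patterns in the brain.
dims = MDS(n_components=2, dissimilarity="precomputed",
           random_state=0).fit_transform(behav)

# Generic RSA-style check: rank-correlate neural and behavioral
# dissimilarities over the off-diagonal entries.
neural = rng.random((n, n))
neural = (neural + neural.T) / 2
iu = np.triu_indices(n, k=1)
rho, p = spearmanr(neural[iu], behav[iu])
print(dims.shape, f"rho={rho:.2f}")
```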
174
Poliva O. From where to what: a neuroanatomically based evolutionary model of the emergence of speech in humans. F1000Res 2015; 4:67. [PMID: 28928931] [PMCID: PMC5600004] [DOI: 10.12688/f1000research.6175.1]
Abstract
In the brain of primates, the auditory cortex connects with the frontal lobe via the temporal pole (auditory ventral stream; AVS) and via the inferior parietal lobule (auditory dorsal stream; ADS). The AVS is responsible for sound recognition, and the ADS for sound localization, voice detection and audio-visual integration. I propose that the primary role of the ADS in monkeys/apes is the perception of and response to contact calls. These calls are exchanged between tribe members (e.g., mother-offspring) and are used for monitoring location. Perception of contact calls occurs by the ADS detecting a voice, localizing it, and verifying that the corresponding face is out of sight. The auditory cortex then projects to parieto-frontal visuospatial regions (visual dorsal stream) to search for the caller, and via a series of frontal lobe-brainstem connections, a contact call is produced in return. Because the human ADS also processes speech production and repetition, I further describe a course for the development of speech in humans. I propose that, due to duplication of a parietal region and its frontal projections, and strengthening of direct frontal-brainstem connections, the ADS converted auditory input directly to vocal regions in the frontal lobe, which endowed early Hominans with partial vocal control. This enabled offspring to modify their contact calls with intonations for signaling different distress levels to their mother. Vocal control could then enable question-answer conversations, with offspring emitting a low-level distress call to inquire about the safety of objects, and mothers responding with high- or low-level distress calls. Gradually, the ADS and the direct frontal-brainstem connections became more robust and vocal control became more volitional. Eventually, individuals were capable of inventing new words, and offspring were capable of inquiring about objects in their environment and learning their names via mimicry.
175
Poliva O. From where to what: a neuroanatomically based evolutionary model of the emergence of speech in humans. F1000Res 2015; 4:67. [PMID: 28928931] [PMCID: PMC5600004] [DOI: 10.12688/f1000research.6175.3]
Abstract
In the brain of primates, the auditory cortex connects with the frontal lobe via the temporal pole (auditory ventral stream; AVS) and via the inferior parietal lobe (auditory dorsal stream; ADS). The AVS is responsible for sound recognition, and the ADS for sound-localization, voice detection and integration of calls with faces. I propose that the primary role of the ADS in non-human primates is the detection and response to contact calls. These calls are exchanged between tribe members (e.g., mother-offspring) and are used for monitoring location. Detection of contact calls occurs by the ADS identifying a voice, localizing it, and verifying that the corresponding face is out of sight. Once a contact call is detected, the primate produces a contact call in return via descending connections from the frontal lobe to a network of limbic and brainstem regions. Because the ADS of present day humans also performs speech production, I further propose an evolutionary course for the transition from contact call exchange to an early form of speech. In accordance with this model, structural changes to the ADS endowed early members of the genus Homo with partial vocal control. This development was beneficial as it enabled offspring to modify their contact calls with intonations for signaling high or low levels of distress to their mother. Eventually, individuals were capable of participating in yes-no question-answer conversations. In these conversations the offspring emitted a low-level distress call for inquiring about the safety of objects (e.g., food), and his/her mother responded with a high- or low-level distress call to signal approval or disapproval of the interaction. Gradually, the ADS and its connections with brainstem motor regions became more robust and vocal control became more volitional. Speech emerged once vocal control was sufficient for inventing novel calls.
|
177
|
Ji X, Han J, Jiang X, Hu X, Guo L, Han J, Shao L, Liu T. Analysis of music/speech via integration of audio content and functional brain response. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2014.11.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
178
|
Decoding multiple sound categories in the human temporal cortex using high resolution fMRI. PLoS One 2015; 10:e0117303. [PMID: 25692885 PMCID: PMC4333227 DOI: 10.1371/journal.pone.0117303] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2013] [Accepted: 12/22/2014] [Indexed: 11/19/2022] Open
Abstract
The categorization of sounds is an important aspect of auditory perception. The extent to which the brain's representation of sound categories is encoded in specialized subregions or distributed across the auditory cortex remains unclear. Recent studies using multivariate pattern analysis (MVPA) of brain activations have provided important insights into how the brain decodes perceptual information. In the large existing literature on brain decoding using MVPA methods, relatively few studies have addressed multi-class categorization in the auditory domain. Here, we investigated the representation and processing of auditory categories within the human temporal cortex using high-resolution fMRI and MVPA methods. Importantly, we decoded multiple sound categories simultaneously, using multi-class support vector machine-recursive feature elimination (MSVM-RFE) as our MVPA tool. Results show that for all classifications MSVM-RFE was able to learn the functional relation between the multiple sound categories and the corresponding evoked spatial patterns, and to classify unlabeled sound-evoked patterns significantly above chance. This indicates the feasibility of decoding multiple sound categories not only within but also across subjects. However, across-subject variation affected classification performance more than within-subject variation, with significantly lower classification accuracies in the across-subject analysis. Sound category-selective brain maps were identified on the basis of the multi-class classification and revealed distributed patterns of brain activity in the superior temporal gyrus and the middle temporal gyrus. This accords with previous studies, indicating that information in these spatially distributed patterns may reflect a more abstract, perceptual level of representation of sound categories. Further, we show that across-subject classification performance can be significantly improved by averaging the fMRI images over items, because irrelevant variation between items of the same sound category is reduced and, in turn, the proportion of signal relevant to sound categorization increases.
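As a rough illustration of this style of analysis, the sketch below cross-validates a multi-class linear SVM wrapped in recursive feature elimination on simulated voxel patterns. It is a minimal sketch, not the authors' MSVM-RFE pipeline: the data, dimensions, and all parameter values are invented.

```python
# Minimal sketch of multi-class SVM decoding with recursive feature
# elimination (RFE). Voxel patterns, labels, and parameters are all
# hypothetical placeholders, not values from the study above.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_voxels, n_categories = 120, 500, 4   # assumed sizes
X = rng.standard_normal((n_trials, n_voxels))    # trial-wise voxel patterns
y = rng.integers(0, n_categories, n_trials)      # sound-category labels

# RFE iteratively discards the least informative voxels (10% per step),
# then a multi-class linear SVM classifies on the surviving features.
clf = make_pipeline(
    RFE(LinearSVC(C=1.0, max_iter=5000), n_features_to_select=100, step=0.1),
    LinearSVC(C=1.0, max_iter=5000),
)
scores = cross_val_score(clf, X, y, cv=StratifiedKFold(5))
print(f"mean accuracy: {scores.mean():.2f} (chance = {1 / n_categories:.2f})")
```

With random data the accuracy should hover around chance; the point of the sketch is only the structure of the feature-elimination-plus-classification loop.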
|
179
|
Correia JM, Jansma B, Hausfeld L, Kikkert S, Bonte M. EEG decoding of spoken words in bilingual listeners: from words to language invariant semantic-conceptual representations. Front Psychol 2015; 6:71. [PMID: 25705197 PMCID: PMC4319403 DOI: 10.3389/fpsyg.2015.00071] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2014] [Accepted: 01/13/2015] [Indexed: 11/13/2022] Open
Abstract
Spoken word recognition and production require fast transformations between acoustic, phonological, and conceptual neural representations. Bilinguals perform these transformations in native and non-native languages, deriving unified semantic concepts from equivalent but acoustically different words. Here we exploit this capacity of bilinguals to investigate input-invariant semantic representations in the brain. We acquired EEG data while Dutch subjects, highly proficient in English, listened to four monosyllabic and acoustically distinct animal words in both languages (e.g., “paard”–“horse”). Multivariate pattern analysis (MVPA) was applied to identify EEG response patterns that discriminate between individual words within one language (within-language discrimination) and generalize meaning across the two languages (across-language generalization). Furthermore, employing two EEG feature-selection approaches, we assessed the contribution of temporal and oscillatory EEG features to our classification results. MVPA revealed that within-language discrimination was possible in a broad time window (~50–620 ms) after word onset, probably reflecting acoustic-phonetic and semantic-conceptual differences between the words. Most interestingly, significant across-language generalization was possible around 550–600 ms, suggesting the activation of common semantic-conceptual representations by the Dutch and English nouns. Both types of classification showed a strong contribution of oscillations below 12 Hz, indicating the importance of low-frequency oscillations in the neural representation of individual words and concepts. This study demonstrates the feasibility of using MVPA to decode individual spoken words from EEG responses and to assess the spectro-temporal dynamics of their language-invariant semantic-conceptual representations. We discuss how this method and these results may help track the neural mechanisms underlying conceptual encoding in comprehension and production.
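The train-on-one-language, test-on-the-other logic at the heart of this design can be summarized in a few lines. The toy sketch below uses random data and invented dimensions purely to show the generalization step, under the assumption that trials are already matched by concept across languages.

```python
# Toy sketch of across-language generalization: train a classifier on
# EEG patterns evoked by words in one language and test it on the
# acoustically different translations. All data are invented.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_trials, n_features = 80, 64 * 10      # e.g., 64 channels x 10 time bins
concepts = rng.integers(0, 4, n_trials) # four animal concepts

X_dutch = rng.standard_normal((n_trials, n_features))    # "paard", ...
X_english = rng.standard_normal((n_trials, n_features))  # "horse", ...

clf = LinearSVC(max_iter=5000).fit(X_dutch, concepts)
acc = clf.score(X_english, concepts)  # above-chance accuracy would suggest
print(f"across-language accuracy: {acc:.2f}")  # shared semantic codes
```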
Affiliation(s)
- João M Correia: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht Brain Imaging Center (M-BIC), Maastricht University, Maastricht, Netherlands
- Bernadette Jansma: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht Brain Imaging Center (M-BIC), Maastricht University, Maastricht, Netherlands
- Lars Hausfeld: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht Brain Imaging Center (M-BIC), Maastricht University, Maastricht, Netherlands
- Sanne Kikkert: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht Brain Imaging Center (M-BIC), Maastricht University, Maastricht, Netherlands
- Milene Bonte: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht Brain Imaging Center (M-BIC), Maastricht University, Maastricht, Netherlands
|
180
|
Moerel M, De Martino F, Santoro R, Yacoub E, Formisano E. Representation of pitch chroma by multi-peak spectral tuning in human auditory cortex. Neuroimage 2015; 106:161-9. [PMID: 25479020 PMCID: PMC4388253 DOI: 10.1016/j.neuroimage.2014.11.044] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2014] [Revised: 10/31/2014] [Accepted: 11/20/2014] [Indexed: 01/04/2023] Open
Abstract
Musical notes played at octave intervals (i.e., having the same pitch chroma) are perceived as similar. This well-known perceptual phenomenon lies at the foundation of melody recognition and music perception, yet its neural underpinnings remain largely unknown. Using fMRI with high sensitivity and spatial resolution, we examined the contribution of multi-peak spectral tuning to the neural representation of pitch chroma in human auditory cortex in two experiments. In experiment 1, our estimation of population spectral tuning curves from responses to natural sounds confirmed, with new data, our recent results on the existence of cortical ensemble responses finely tuned to multiple frequencies one octave apart (Moerel et al., 2013). In experiment 2, we fitted a mathematical model consisting of pitch chroma and height components to explain the measured fMRI responses to piano notes. This analysis revealed that the octave-tuned populations, but not other cortical populations, harbored a neural representation of musical notes according to their pitch chroma. These results indicate that the responses of auditory cortical populations selectively tuned to multiple frequencies one octave apart predict well the perceptual similarity of musical notes with the same chroma, beyond the physical (frequency) distance between notes.
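A chroma-plus-height decomposition of the kind described here can be written down compactly. The sketch below codes each note by its pitch class (chroma) and octave (height) and fits a linear model to made-up responses; the stimulus range, the one-hot coding, and all values are assumptions for illustration, not the paper's model.

```python
# Sketch of a chroma + height decomposition for responses to piano notes.
# Notes are coded by pitch class (chroma) and octave (height); a linear
# model is then fit to hypothetical response amplitudes.
import numpy as np

midi_notes = np.arange(48, 84)   # C3..B5, an assumed stimulus set
chroma = midi_notes % 12         # pitch class, 0..11
height = midi_notes // 12        # octave number

# Twelve one-hot chroma regressors plus a linear height regressor.
design = np.column_stack([chroma == c for c in range(12)] + [height])
responses = np.random.default_rng(2).standard_normal(len(midi_notes))

betas, *_ = np.linalg.lstsq(design.astype(float), responses, rcond=None)
print("chroma weights:", betas[:12].round(2))
print("height weight:", betas[12].round(2))
```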
Affiliation(s)
- Michelle Moerel: Department of Radiology, Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455, USA
- Federico De Martino: Faculty of Psychology and Neuroscience, Department of Cognitive Neuroscience, Maastricht University, Maastricht, 6200 MD, the Netherlands; Maastricht Brain Imaging Center (MBIC), Maastricht University, Maastricht, 6229 EV, the Netherlands
- Roberta Santoro: Faculty of Psychology and Neuroscience, Department of Cognitive Neuroscience, Maastricht University, Maastricht, 6200 MD, the Netherlands; Maastricht Brain Imaging Center (MBIC), Maastricht University, Maastricht, 6229 EV, the Netherlands
- Essa Yacoub: Department of Radiology, Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455, USA
- Elia Formisano: Faculty of Psychology and Neuroscience, Department of Cognitive Neuroscience, Maastricht University, Maastricht, 6200 MD, the Netherlands; Maastricht Brain Imaging Center (MBIC), Maastricht University, Maastricht, 6229 EV, the Netherlands
|
181
|
Kriengwatana B, Escudero P, ten Cate C. Revisiting vocal perception in non-human animals: a review of vowel discrimination, speaker voice recognition, and speaker normalization. Front Psychol 2015; 5:1543. [PMID: 25628583 PMCID: PMC4292401 DOI: 10.3389/fpsyg.2014.01543] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2014] [Accepted: 12/12/2014] [Indexed: 12/03/2022] Open
Abstract
The extent to which human speech perception evolved by taking advantage of predispositions and pre-existing features of vertebrate auditory and cognitive systems remains a central question in the evolution of speech. This paper reviews asymmetries in vowel perception, speaker voice recognition, and speaker normalization in non-human animals, topics that have not been thoroughly discussed in relation to the abilities of non-human animals but are nonetheless important aspects of vocal perception. Throughout this paper we demonstrate that addressing these issues in non-human animals is relevant and worthwhile because many non-human animals must deal with similar issues in their natural environment. That is, they must also discriminate between similar-sounding vocalizations, determine signaler identity from vocalizations, and resolve signaler-dependent variation in vocalizations from conspecifics. Overall, we find that, although plausible, the current evidence is insufficiently strong to conclude that directional asymmetries in vowel perception are specific to humans, or that non-human animals can use voice characteristics to recognize human individuals. However, we do find some indication that non-human animals can normalize speaker differences. Accordingly, we identify avenues for future research that would greatly improve and advance our understanding of these topics.
Affiliation(s)
- Buddhamas Kriengwatana: Behavioural Biology, Institute for Biology Leiden, Leiden University, Leiden, Netherlands; Leiden Institute for Brain and Cognition, Leiden University, Leiden, Netherlands
- Paola Escudero: The MARCS Institute, University of Western Sydney, Sydney, NSW, Australia
- Carel ten Cate: Behavioural Biology, Institute for Biology Leiden, Leiden University, Leiden, Netherlands; Leiden Institute for Brain and Cognition, Leiden University, Leiden, Netherlands
|
182
|
Anders S, Heussen Y, Sprenger A, Haynes JD, Ethofer T. Social gating of sensory information during ongoing communication. Neuroimage 2015; 104:189-98. [PMID: 25315788 DOI: 10.1016/j.neuroimage.2014.10.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2014] [Revised: 09/30/2014] [Accepted: 10/05/2014] [Indexed: 11/17/2022] Open
Abstract
Social context plays an important role in human communication. Depending on the nature of the source, the same communication signal might be processed in fundamentally different ways. However, the selective modulation (or "gating") of the flow of neural information during communication is not fully understood. Here, we use multivoxel pattern analysis (MVPA) and multivoxel connectivity analysis (MVCA), a novel technique for analyzing context-dependent changes in the strength of interregional coupling between ensembles of voxels, to examine how the human brain differentially gates content-specific sensory information during ongoing perception of communication signals. In a simulated electronic communication experiment, participants received one of two alternative text messages ("happy" or "sad") during fMRI, which they believed had been sent either by their real-life friend outside the scanner or by a computer. A region in the dorsal medial prefrontal cortex (dmPFC) selectively increased its functional coupling with sensory-content-encoding regions in the visual cortex when a text message was perceived as being sent by the participant's friend, and decreased its functional coupling with these regions when a text message was perceived as being sent by the computer. Furthermore, the strength of neural encoding of the content-specific information of text messages in the dmPFC was modulated by the social tie between the participant and her friend: the more spare time a participant reported spending with her friend, the stronger the neural encoding. This suggests that the human brain selectively gates sensory information into the network relevant for processing the mental states of others, depending on the source of the communication signal.
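MVCA is the authors' own technique and its exact computation is not spelled out in this abstract. The loose sketch below only illustrates the general idea of comparing interregional coupling between contexts; the pattern-projection summary (first singular vector), the region sizes, and the data are all assumptions, not the published method.

```python
# Loose sketch of context-dependent multivoxel coupling: summarize each
# region's trial-wise pattern expression, then compare inter-regional
# correlation between two contexts. Everything here is hypothetical.
import numpy as np

rng = np.random.default_rng(3)
n_trials = 60
dmpfc = rng.standard_normal((n_trials, 200))   # trials x voxels, region 1
visual = rng.standard_normal((n_trials, 350))  # trials x voxels, region 2
context = rng.integers(0, 2, n_trials)         # 0 = computer, 1 = friend

def pattern_expression(roi):
    """Project each trial onto the ROI's dominant spatial pattern."""
    _, _, vt = np.linalg.svd(roi - roi.mean(0), full_matrices=False)
    return roi @ vt[0]

a, b = pattern_expression(dmpfc), pattern_expression(visual)
for label, mask in (("computer", context == 0), ("friend", context == 1)):
    r = np.corrcoef(a[mask], b[mask])[0, 1]
    print(f"{label:>8s} coupling: r = {r:.2f}")
```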
Affiliation(s)
- Silke Anders: Department of Neurology, Universität zu Lübeck, Lübeck, Germany
- Yana Heussen: Department of Neurology, Universität zu Lübeck, Lübeck, Germany
- John-Dylan Haynes: Bernstein Center for Computational Neuroscience Berlin, Charité-Universitätsmedizin, Berlin, Germany
- Thomas Ethofer: Department of Psychiatry, University of Tübingen, Tübingen, Germany; Department of Biomedical Magnetic Resonance, University of Tübingen, Tübingen, Germany
|
183
|
Raschle NM, Smith SA, Zuk J, Dauvermann MR, Figuccio MJ, Gaab N. Investigating the neural correlates of voice versus speech-sound directed information in pre-school children. PLoS One 2014; 9:e115549. [PMID: 25532132 PMCID: PMC4274095 DOI: 10.1371/journal.pone.0115549] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2014] [Accepted: 11/24/2014] [Indexed: 02/06/2023] Open
Abstract
Studies in sleeping newborns and infants suggest that the superior temporal sulcus is involved in speech processing soon after birth. Speech processing also implicitly requires analysis of the human voice, which conveys both linguistic and extra-linguistic information. However, owing to the technical and practical challenges of neuroimaging young children, evidence on the neural correlates of speech and/or voice processing in toddlers and young children remains scarce. In the current study, we used functional magnetic resonance imaging (fMRI) in 20 typically developing preschool children (average age = 5.8 y; range 5.2-6.8 y) to investigate brain activation during judgments about vocal identity versus the initial speech sound of spoken object words. The fMRI results reveal brain regions common to voice-specific and speech-sound-specific processing of spoken object words, including bilateral primary and secondary language areas of the brain. Contrasting voice-specific with speech-sound-specific processing predominantly activates the anterior part of the right-hemispheric superior temporal sulcus. Furthermore, the right STS is functionally correlated with left-hemispheric temporal and right-hemispheric prefrontal regions. This finding underlines the importance of the right superior temporal sulcus as a temporal voice area and indicates that this brain region is specialized, and functions similarly to that of adults, by the age of five. We thus extend previous knowledge of voice-specific regions and their functional connections to the young brain, which may further our understanding of the neural mechanisms of speech-specific processing in children with developmental disorders such as autism or specific language impairment.
Affiliation(s)
- Nora Maria Raschle: Laboratories of Cognitive Neuroscience, Division of Developmental Medicine, Department of Developmental Medicine, Boston Children's Hospital, Boston, Massachusetts, United States of America; Harvard Medical School, Boston, Massachusetts, United States of America; Psychiatric University Clinics Basel, Department of Child and Adolescent Psychiatry, Basel, Switzerland
- Sara Ashley Smith: Laboratories of Cognitive Neuroscience, Division of Developmental Medicine, Department of Developmental Medicine, Boston Children's Hospital, Boston, Massachusetts, United States of America
- Jennifer Zuk: Laboratories of Cognitive Neuroscience, Division of Developmental Medicine, Department of Developmental Medicine, Boston Children's Hospital, Boston, Massachusetts, United States of America; Harvard Medical School, Boston, Massachusetts, United States of America
- Maria Regina Dauvermann: Laboratories of Cognitive Neuroscience, Division of Developmental Medicine, Department of Developmental Medicine, Boston Children's Hospital, Boston, Massachusetts, United States of America; Harvard Medical School, Boston, Massachusetts, United States of America
- Michael Joseph Figuccio: Laboratories of Cognitive Neuroscience, Division of Developmental Medicine, Department of Developmental Medicine, Boston Children's Hospital, Boston, Massachusetts, United States of America
- Nadine Gaab: Laboratories of Cognitive Neuroscience, Division of Developmental Medicine, Department of Developmental Medicine, Boston Children's Hospital, Boston, Massachusetts, United States of America; Harvard Medical School, Boston, Massachusetts, United States of America; Harvard Graduate School of Education, Cambridge, Massachusetts, United States of America
|
184
|
Abstract
Listeners can recognize familiar human voices from variable utterances, suggesting the acquisition of speech-invariant voice representations during familiarization. However, the neurocognitive mechanisms mediating the learning and recognition of voices from natural speech are currently unknown. Using electrophysiology, we investigated how representations are formed during intentional learning of initially unfamiliar voices that were later recognized among novel voices. To probe the acquisition of speech-invariant voice representations, we compared a "same sentence" condition, in which speakers repeated the study utterances at test, and a "different sentence" condition. Although recognition performance was higher for same compared with different sentences, substantial voice learning also occurred for different sentences, with recognition performance increasing across consecutive study-test cycles. During study, voices that were subsequently remembered elicited a larger sustained parietal positivity (~250-1400 ms) than voices that were subsequently forgotten. This difference due to memory was unaffected by the test-sentence condition and may thus reflect the acquisition of speech-invariant voice representations. At test, voices correctly classified as "old" elicited a larger late positive component (300-700 ms) at Pz than voices correctly classified as "new." This event-related potential OLD/NEW effect was limited to the same-sentence condition and may thus reflect speech-dependent retrieval of voices from episodic memory. Importantly, a speech-independent effect for learned compared with novel voices was found in beta-band oscillations (16-17 Hz) between 290 and 370 ms at central and right temporal sites. Our results are a first step toward elucidating the electrophysiological correlates of voice learning and recognition.
|
185
|
Bernstein LE, Liebenthal E. Neural pathways for visual speech perception. Front Neurosci 2014; 8:386. [PMID: 25520611 PMCID: PMC4248808 DOI: 10.3389/fnins.2014.00386] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2014] [Accepted: 11/10/2014] [Indexed: 12/03/2022] Open
Abstract
This paper examines the questions of what levels of speech can be perceived visually and how visual speech is represented by the brain. A review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their ability to do so, and that there are visual modality-specific representations of speech qua speech in higher-level visual brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread, diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) the visual perception of speech relies on visual pathway representations of speech qua speech; (2) a proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to the multisensory posterior superior temporal sulcus (pSTS); and (3) given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.
Affiliation(s)
- Lynne E Bernstein: Department of Speech and Hearing Sciences, George Washington University, Washington, DC, USA
- Einat Liebenthal: Department of Neurology, Medical College of Wisconsin, Milwaukee, WI, USA; Department of Psychiatry, Brigham and Women's Hospital, Boston, MA, USA
|
186
|
Visual abilities are important for auditory-only speech recognition: Evidence from autism spectrum disorder. Neuropsychologia 2014; 65:1-11. [DOI: 10.1016/j.neuropsychologia.2014.09.031] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2014] [Revised: 08/25/2014] [Accepted: 09/18/2014] [Indexed: 11/22/2022]
|
187
|
Junger J, Habel U, Bröhr S, Neulen J, Neuschaefer-Rube C, Birkholz P, Kohler C, Schneider F, Derntl B, Pauly K. More than just two sexes: the neural correlates of voice gender perception in gender dysphoria. PLoS One 2014; 9:e111672. [PMID: 25375171 PMCID: PMC4222943 DOI: 10.1371/journal.pone.0111672] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2014] [Accepted: 10/03/2014] [Indexed: 01/28/2023] Open
Abstract
Gender dysphoria (also known as “transsexualism”) is characterized as a discrepancy between anatomical sex and gender identity. Research points towards neurobiological influences. Because of the sexually dimorphic characteristics of the human voice, voice gender perception serves a biologically relevant function, e.g., in the context of mate selection. There is evidence for better recognition of voices of the opposite sex and for a differentiation of the sexes in its underlying functional cerebral correlates, namely the prefrontal and middle temporal areas. This fMRI study investigated the neural correlates of voice gender perception in 32 male-to-female gender dysphoric individuals (MtFs) compared to 20 non-gender-dysphoric men and 19 non-gender-dysphoric women. Participants indicated the sex of 240 voice stimuli modified in semitone steps in the direction of the other gender. Compared to men and women, MtFs showed differences in a neural network including the medial prefrontal gyrus, the insula, and the precuneus when responding to male vs. female voices. With increased voice morphing, men recruited more prefrontal areas compared to women and MtFs, while MtFs revealed a pattern more similar to women. On the behavioral and neuronal level, our results are consistent with MtFs' reports that they cannot identify with their assigned sex.
Affiliation(s)
- Jessica Junger: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany; Jülich Aachen Research Alliance-Translational Brain Medicine, Jülich, Germany
- Ute Habel: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany; Jülich Aachen Research Alliance-Translational Brain Medicine, Jülich, Germany
- Sabine Bröhr: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany
- Josef Neulen: Department of Gynaecological Endocrinology and Reproductive Medicine, Medical School, RWTH Aachen University, Aachen, Germany
- Christiane Neuschaefer-Rube: Department of Phoniatrics, Pedaudiology and Communication Disorders, Medical School, RWTH Aachen University, Aachen, Germany
- Peter Birkholz: Department of Phoniatrics, Pedaudiology and Communication Disorders, Medical School, RWTH Aachen University, Aachen, Germany
- Christian Kohler: Department of Psychiatry, Neuropsychiatry Division, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Frank Schneider: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany; Jülich Aachen Research Alliance-Translational Brain Medicine, Jülich, Germany
- Birgit Derntl: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany; Jülich Aachen Research Alliance-Translational Brain Medicine, Jülich, Germany
- Katharina Pauly: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany; Jülich Aachen Research Alliance-Translational Brain Medicine, Jülich, Germany
|
188
|
Kuo PC, Chen YS, Chen LF, Hsieh JC. Decoding and encoding of visual patterns using magnetoencephalographic data represented in manifolds. Neuroimage 2014; 102 Pt 2:435-50. [DOI: 10.1016/j.neuroimage.2014.07.046] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2014] [Revised: 06/27/2014] [Accepted: 07/22/2014] [Indexed: 11/17/2022] Open
|
189
|
Zilles K, Bacha-Trams M, Palomero-Gallagher N, Amunts K, Friederici AD. Common molecular basis of the sentence comprehension network revealed by neurotransmitter receptor fingerprints. Cortex 2014; 63:79-89. [PMID: 25243991 PMCID: PMC4317196 DOI: 10.1016/j.cortex.2014.07.007] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2014] [Revised: 06/02/2014] [Accepted: 07/10/2014] [Indexed: 01/08/2023]
Abstract
The language network is a well-defined, large-scale neural network of anatomically and functionally interacting cortical areas. Successful language processing requires the transmission of information between these areas. Since neurotransmitter receptors are key molecules of information processing, we hypothesized that cortical areas belonging to the same functional language network may show highly similar multireceptor expression patterns ("receptor fingerprints"), whereas areas that are not part of this network should have different fingerprints. Here we demonstrate that the relations between the densities of 15 different excitatory, inhibitory, and modulatory receptors in eight language-related areas are highly similar and differ considerably from those of 18 other brain regions not directly involved in language processing. Thus, the shared fingerprint of the cortical areas underlying a large-scale cognitive domain such as language is a characteristic, functionally relevant feature of this network and an important prerequisite for the underlying neuronal processes of language functions.
Affiliation(s)
- Karl Zilles: Institute of Neuroscience and Medicine (INM-1), Research Centre Juelich, Germany; Department of Psychiatry, Psychotherapy, and Psychosomatics, University Hospital Aachen, RWTH Aachen University, Germany
- Maraike Bacha-Trams: Institute of Neuroscience and Medicine (INM-1), Research Centre Juelich, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Department of Neuropsychology, Leipzig, Germany
- Katrin Amunts: Institute of Neuroscience and Medicine (INM-1), Research Centre Juelich, Germany; C. & O. Vogt Institute for Brain Research, Heinrich-Heine-University Duesseldorf, Germany
- Angela D Friederici: Max Planck Institute for Human Cognitive and Brain Sciences, Department of Neuropsychology, Leipzig, Germany
|
190
|
Steinschneider M, Nourski KV, Rhone AE, Kawasaki H, Oya H, Howard MA. Differential activation of human core, non-core and auditory-related cortex during speech categorization tasks as revealed by intracranial recordings. Front Neurosci 2014; 8:240. [PMID: 25157216 PMCID: PMC4128221 DOI: 10.3389/fnins.2014.00240] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2014] [Accepted: 07/22/2014] [Indexed: 11/21/2022] Open
Abstract
Speech perception requires that sounds be transformed into speech-related objects with lexical and semantic meaning. It is unclear at what level in the auditory pathways this transformation emerges. Primary auditory cortex has been implicated in both the representation of acoustic sound attributes and of sound objects. While non-primary auditory cortex located on the posterolateral superior temporal gyrus (PLST) is clearly involved in acoustic-to-phonetic pre-lexical representations, it is unclear what role this region plays in auditory object formation. Some data support the importance of prefrontal cortex in the formation of auditory objects, while other data implicate this region in auditory object selection. To help clarify the respective roles of auditory and auditory-related cortex in the formation and selection of auditory objects, we examined high-gamma activity recorded simultaneously and directly from Heschl's gyrus (HG), PLST, and prefrontal cortex while subjects performed auditory semantic detection tasks. Subjects were patients undergoing evaluation for treatment of medically intractable epilepsy. We found that activity in posteromedial HG and early activity on PLST were robustly driven by sound stimuli regardless of their context, and were minimally modulated by task. Later activity on PLST could be strongly modulated by semantic context, but not by behavioral performance. Activity within prefrontal cortex was also related to semantic context, and did co-vary with behavior. We propose that activity in posteromedial HG and early activity on PLST primarily reflect the representation of spectrotemporal sound attributes. Later activity on PLST represents a pre-lexical processing stage and is an intermediate step in the formation of word objects. Activity in prefrontal cortex appears directly involved in word object selection. The roles of other auditory and auditory-related cortical areas in the formation of word objects remain to be explored.
Affiliation(s)
- Mitchell Steinschneider: Departments of Neurology and Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
- Kirill V. Nourski: Human Brain Research Laboratory, Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA
- Ariane E. Rhone: Human Brain Research Laboratory, Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA
- Hiroto Kawasaki: Human Brain Research Laboratory, Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA
- Hiroyuki Oya: Human Brain Research Laboratory, Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA
- Matthew A. Howard: Human Brain Research Laboratory, Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA
|
191
|
De Martino F, Moerel M, Ugurbil K, Formisano E, Yacoub E. Less noise, more activation: Multiband acquisition schemes for auditory functional MRI. Magn Reson Med 2014; 74:462-7. [PMID: 25105832 DOI: 10.1002/mrm.25408] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2014] [Revised: 06/08/2014] [Accepted: 07/24/2014] [Indexed: 11/11/2022]
Abstract
PURPOSE: To improve acquisition in fMRI studies of audition by using multiband (MB) gradient-echo echo planar imaging (GE-EPI). METHODS: Data were acquired at 3T (Siemens Skyra) with a 32-channel head coil. Functional responses were obtained by presenting stimuli [tones and natural sounds (voices, speech, music, tools, animal cries)] in silent gaps between image acquisitions. Two-fold slice acceleration (MB2) was compared with standard GE-EPI (MB1). Coverage and sampling rate (TR = 3 s) were kept constant across acquisition schemes. The longer gap in MB2 scans was used to present (i) sounds of the same length as in conventional GE-EPI (type 1; 800 ms stimuli) and (ii) sounds of double the length (type 2; 1600 ms stimuli). RESULTS: Functional responses to all sounds (i.e., the main effect) were stronger when acquired with slice acceleration (i.e., shorter acquisition time). The difference between voice and nonvoice responses was greater in MB2 type 1 acquisitions (i.e., sounds of the same length as in GE-EPI but presented in a longer silent gap) than in standard GE-EPI acquisitions (interaction effect). CONCLUSION: Reducing the duration of scanner noise results in stronger functional responses. Longer "silent" periods (i.e., keeping the sound length the same as in standard acquisitions) result in stronger responses to voice compared with nonvoice stimuli.
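The silent-gap logic can be made concrete with a little arithmetic: with the repetition time fixed, halving the per-volume acquisition time lengthens the silent gap available for stimulus presentation. The acquisition-time value below is an assumed figure for illustration only, not a number reported in the paper.

```python
# Back-of-the-envelope sketch of the sparse-sampling gap computation.
TR = 3.0             # volume repetition time in seconds (fixed)
TA_MB1 = 2.0         # assumed acquisition time of standard GE-EPI (s)
TA_MB2 = TA_MB1 / 2  # two-fold slice acceleration halves acquisition time

for label, ta in (("MB1", TA_MB1), ("MB2", TA_MB2)):
    gap = TR - ta    # silence left over for stimulus presentation
    print(f"{label}: acquisition {ta:.1f} s, silent gap {gap:.1f} s")
# MB2's longer gap accommodates either the original 800 ms sounds with
# more surrounding silence (type 1) or 1600 ms sounds (type 2).
```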
Affiliation(s)
- Federico De Martino: Department of Cognitive Neurosciences, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands; Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA
- Michelle Moerel: Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA
- Kamil Ugurbil: Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA
- Elia Formisano: Department of Cognitive Neurosciences, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Essa Yacoub: Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA
|
192
|
Mapping genetically controlled neural circuits of social behavior and visuo-motor integration by a preliminary examination of atypical deletions with Williams syndrome. PLoS One 2014; 9:e104088. [PMID: 25105779 PMCID: PMC4126723 DOI: 10.1371/journal.pone.0104088] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2013] [Accepted: 07/10/2014] [Indexed: 01/09/2023] Open
Abstract
In this study of eight rare atypical deletion cases of Williams-Beuren syndrome (WS; also known as 7q11.23 deletion syndrome), comprising three different deletion patterns and compared with typical WS and typically developing (TD) individuals, we show preliminary evidence of dissociable genetic contributions to brain structure and human cognition. Univariate and multivariate pattern-classification results on morphometric brain patterns, complemented by behavior, implicate a possible role for the chromosomal regions that include: 1) GTF2I/GTF2IRD1 in visuo-spatial/motor integration and intraparietal as well as overall gray matter structures; 2) the region spanning ABHD11 through RFC2, including LIMK1, in social cognition, in particular approachability, as well as orbitofrontal, amygdala, and fusiform anatomy; and 3) the regions including STX1A and/or CYLN2 in overall white matter structure. This knowledge contributes to our understanding of the role of genetics in human brain structure, cognition, and the pathophysiology of altered cognition in WS. The current study builds on ongoing research designed to characterize the impact of multiple genes, gene-gene interactions, and changes in gene expression on the human brain.
|
193
|
Moerel M, De Martino F, Formisano E. An anatomical and functional topography of human auditory cortical areas. Front Neurosci 2014; 8:225. [PMID: 25120426 PMCID: PMC4114190 DOI: 10.3389/fnins.2014.00225] [Citation(s) in RCA: 147] [Impact Index Per Article: 14.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2014] [Accepted: 07/08/2014] [Indexed: 12/22/2022] Open
Abstract
While advances in magnetic resonance imaging (MRI) over the last decades have enabled detailed anatomical and functional inspection of the human brain non-invasively, to date there is no consensus regarding the precise subdivision and topography of the areas forming the human auditory cortex. Here, we propose a topography of the human auditory areas based on insights into their anatomical and functional properties as revealed by studies of cyto- and myelo-architecture and by fMRI investigations at ultra-high magnetic field (7 Tesla). Importantly, we illustrate that, whereas a group-based approach to analyzing functional (tonotopic) maps is appropriate for highlighting the main tonotopic axis, examination of tonotopic maps at the single-subject level is required to detail the topography of primary and non-primary areas, which may be more variable across subjects. Furthermore, we show that considering multiple maps indicative of anatomical (i.e., myelination) as well as functional properties (e.g., breadth of frequency tuning) is helpful in identifying auditory cortical areas in individual human brains. We propose and discuss a topography of areas that is consistent with earlier and recent anatomical post-mortem characterizations of the human auditory cortex and that may serve as a working model for neuroscience studies of auditory functions.
Affiliation(s)
- Michelle Moerel: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Center, Maastricht University, Maastricht, Netherlands; Department of Radiology, Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN, USA
- Federico De Martino: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Center, Maastricht University, Maastricht, Netherlands
- Elia Formisano: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Center, Maastricht University, Maastricht, Netherlands
|
194
|
Anzellotti S, Caramazza A. The neural mechanisms for the recognition of face identity in humans. Front Psychol 2014; 5:672. [PMID: 25018745 PMCID: PMC4072087 DOI: 10.3389/fpsyg.2014.00672] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2014] [Accepted: 06/10/2014] [Indexed: 01/06/2023] Open
Abstract
Every day we encounter dozens of people, and in order to interact with them appropriately we need to recognize their identity. The face is a crucial source of information for recognizing a person’s identity. However, recognizing the identity of a face is challenging because it requires distinguishing between very similar images (e.g., the front views of two different faces) while categorizing very different images (e.g., a front view and a profile) as the same person. Neuroimaging has the whole-brain coverage needed to investigate where representations of face identity are encoded, but it is limited in terms of spatial and temporal resolution. In this article, we review recent neuroimaging research that attempted to investigate the representation of face identity, the challenges it faces, and the proposed solutions, and conclude that, given the current state of the evidence, the right anterior temporal lobe is the most promising candidate region for the representation of face identity.
Affiliation(s)
- Stefano Anzellotti: Department of Psychology, Harvard University, Cambridge, MA, USA; Center for Mind/Brain Sciences, University of Trento, Trento, Italy
- Alfonso Caramazza: Department of Psychology, Harvard University, Cambridge, MA, USA; Center for Mind/Brain Sciences, University of Trento, Trento, Italy
|
195
|
Haxby JV, Connolly AC, Guntupalli JS. Decoding neural representational spaces using multivariate pattern analysis. Annu Rev Neurosci 2014; 37:435-56. [PMID: 25002277 DOI: 10.1146/annurev-neuro-062012-170325] [Citation(s) in RCA: 398] [Impact Index Per Article: 39.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
A major challenge for systems neuroscience is to break the neural code. Computational algorithms for encoding information into neural activity and extracting information from measured activity afford understanding of how percepts, memories, thought, and knowledge are represented in patterns of brain activity. The past decade and a half has seen significant advances in the development of methods for decoding human neural activity, such as multivariate pattern classification, representational similarity analysis, hyperalignment, and stimulus-model-based encoding and decoding. This article reviews these advances and integrates neural decoding methods into a common framework organized around the concept of high-dimensional representational spaces.
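One of the methods surveyed here, representational similarity analysis, reduces to a short computation: build a representational dissimilarity matrix (RDM) from condition-mean response patterns. The sketch below does exactly that on random placeholder patterns; the condition and voxel counts are invented.

```python
# Minimal representational similarity sketch: an RDM from condition-mean
# response patterns, using correlation distance (1 - Pearson r).
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(4)
patterns = rng.standard_normal((8, 300))  # 8 conditions x 300 voxels

rdm = squareform(pdist(patterns, metric="correlation"))
print(np.round(rdm, 2))  # symmetric 8 x 8 dissimilarity matrix
```

Comparing such RDMs across brain regions, models, or subjects is what lets these methods characterize representational spaces without requiring voxel-to-voxel correspondence.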
Affiliation(s)
- James V Haxby: Department of Psychological and Brain Sciences, Center for Cognitive Neuroscience, Dartmouth College, Hanover, New Hampshire 03755
|
196
|
Giordano BL, Pernet C, Charest I, Belizaire G, Zatorre RJ, Belin P. Automatic domain-general processing of sound source identity in the left posterior middle frontal gyrus. Cortex 2014; 58:170-85. [PMID: 25038309 DOI: 10.1016/j.cortex.2014.06.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2013] [Revised: 03/24/2014] [Accepted: 06/09/2014] [Indexed: 11/18/2022]
Abstract
Identifying sound sources is fundamental to developing a stable representation of the environment in the face of variable auditory information. The cortical processes underlying this ability have received little attention. In two fMRI experiments, we investigated passive adaptation to (Exp. 1) and explicit discrimination of (Exp. 2) source identities for different categories of auditory objects (voices, musical instruments, environmental sounds). All cortical effects of source identity were independent of high-level category information, and were accounted for by sound-to-sound differences in low-level structure (e.g., loudness). A conjunction analysis revealed that the left posterior middle frontal gyrus (pMFG) adapted to identity repetitions during both passive listening and active discrimination tasks. These results indicate that the comparison of sound source identities in a stream of auditory stimulation recruits the pMFG in a domain-general way, i.e., independent of the sound category, based on information contained in the low-level acoustical structure. pMFG recruitment during both passive listening and explicit identity comparison tasks also suggests its automatic engagement in sound source identity processing.
Affiliation(s)
- Bruno L Giordano: Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, Scotland, UK
- Cyril Pernet: Brain Research Imaging Center, Neuroimaging Sciences, University of Edinburgh, Western General Hospital, Edinburgh, Scotland, UK
- Ian Charest: Medical Research Council - Cognition and Brain Sciences Unit, Cambridge, UK
- Guylaine Belizaire: International Laboratory for Brain, Music and Sound (BRAMS), Université de Montréal, Montréal, QC, Canada; Centre de Recherche de l'Institut Universitaire de Gériatrie de Montréal, Université de Montréal, Montréal, Québec, Canada
- Robert J Zatorre: Montréal Neurological Institute, McGill University, Montreal, QC, Canada; International Laboratory for Brain, Music and Sound (BRAMS), Université de Montréal, Montréal, QC, Canada
- Pascal Belin: Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, Scotland, UK; Institut des Neurosciences de la Timone, UMR7289, CNRS-Université Aix Marseille, Marseille, France; International Laboratory for Brain, Music and Sound (BRAMS), Université de Montréal, Montréal, QC, Canada
|
197
|
Leonard MK, Chang EF. Dynamic speech representations in the human temporal lobe. Trends Cogn Sci 2014; 18:472-9. [PMID: 24906217 DOI: 10.1016/j.tics.2014.05.001] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2013] [Revised: 04/30/2014] [Accepted: 05/06/2014] [Indexed: 11/20/2022]
Abstract
Speech perception requires rapid integration of acoustic input with context-dependent knowledge. Recent methodological advances have allowed researchers to identify underlying information representations in primary and secondary auditory cortex and to examine how context modulates these representations. We review recent studies that focus on contextual modulations of neural activity in the superior temporal gyrus (STG), a major hub for spectrotemporal encoding. Recent findings suggest a highly interactive flow of information processing through the auditory ventral stream, including influences of higher-level linguistic and metalinguistic knowledge, even within individual areas. Such mechanisms may give rise to more abstract representations, such as those for words. We discuss the importance of characterizing representations of context-dependent and dynamic patterns of neural activity in the approach to speech perception research.
Affiliation(s)
- Matthew K Leonard: Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, CA 94158, USA
- Edward F Chang: Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, CA 94158, USA
|
198
|
Ley A, Vroomen J, Formisano E. How learning to abstract shapes neural sound representations. Front Neurosci 2014; 8:132. [PMID: 24917783 PMCID: PMC4043152 DOI: 10.3389/fnins.2014.00132] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2014] [Accepted: 05/14/2014] [Indexed: 12/04/2022] Open
Abstract
The transformation of acoustic signals into abstract perceptual representations is the essence of the efficient and goal-directed neural processing of sounds in complex natural environments. While the human and animal auditory systems are well equipped to process spectrotemporal sound features, adequate sound identification and categorization require neural sound representations that are invariant to irrelevant stimulus parameters. Crucially, what is relevant and irrelevant is not necessarily intrinsic to the physical stimulus structure but needs to be learned over time, often through the integration of information from other senses. This review discusses the main principles underlying categorical sound perception, with a special focus on the role of learning and neural plasticity. We examine the role of different neural structures along the auditory processing pathway in the formation of abstract sound representations with respect to hierarchical as well as dynamic and distributed processing models. Whereas most fMRI studies on categorical sound processing have employed speech sounds, the emphasis of the current review lies on the contribution of empirical studies using natural or artificial sounds, which enable acoustic and perceptual processing levels to be separated and avoid interference with existing category representations. Finally, we discuss the opportunities offered by modern analysis techniques such as multivariate pattern analysis (MVPA) for studying categorical sound representations. With their increased sensitivity to distributed activation changes, even in the absence of changes in overall signal level, these analysis techniques provide a promising tool for revealing the neural underpinnings of perceptually invariant sound representations.
Affiliation(s)
- Anke Ley: Department of Medical Psychology and Neuropsychology, Tilburg School of Social and Behavioral Sciences, Tilburg University, Tilburg, Netherlands; Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands
- Jean Vroomen: Department of Medical Psychology and Neuropsychology, Tilburg School of Social and Behavioral Sciences, Tilburg University, Tilburg, Netherlands
- Elia Formisano: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands
|
199
|
Task-dependent decoding of speaker and vowel identity from auditory cortical response patterns. J Neurosci 2014; 34:4548-57. [PMID: 24672000 DOI: 10.1523/jneurosci.4339-13.2014] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Selective attention to relevant sound properties is essential for everyday listening situations. It enables the formation of different perceptual representations of the same acoustic input and is at the basis of flexible, goal-dependent behavior. Here, we investigated the role of the human auditory cortex in forming behavior-dependent representations of sounds. We used single-trial fMRI and analyzed cortical responses collected while subjects listened to the same speech sounds (vowels /a/, /i/, and /u/) spoken by different speakers (boy, girl, man) and performed a delayed-match-to-sample task on either speech-sound or speaker identity. Univariate analyses showed a task-specific activation increase in the right superior temporal gyrus/sulcus (STG/STS) during speaker categorization and in the right posterior temporal cortex during vowel categorization. Beyond regional differences in activation levels, multivariate classification of single-trial responses demonstrated that the success with which individual speakers and vowels can be decoded from auditory cortical activation patterns depends on task demands and on the subjects' behavioral performance. Speaker/vowel classification relied on distinct but overlapping regions across the (right) mid-anterior STG/STS (speakers) and bilateral mid-posterior STG/STS (vowels), as well as the superior temporal plane including Heschl's gyrus/sulcus. The task dependency of speaker/vowel classification demonstrates that the informative fMRI response patterns reflect the top-down enhancement of behaviorally relevant sound representations. Furthermore, our findings suggest that the successful selection, processing, and retention of task-relevant sound properties rely on the joint encoding of information across early and higher-order regions of the auditory cortex.
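To make the task-dependency analysis concrete, the toy sketch below decodes both stimulus dimensions separately within each task from simulated single-trial patterns; with real data, better decoding of the attended dimension would be the signature effect. All sizes, labels, and data here are invented.

```python
# Toy sketch: decode vowel and speaker identity within each task and
# compare accuracies. Data and dimensions are hypothetical.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(5)
n_trials, n_voxels = 90, 400
X = rng.standard_normal((n_trials, n_voxels))  # single-trial patterns
vowel = rng.integers(0, 3, n_trials)           # /a/, /i/, /u/
speaker = rng.integers(0, 3, n_trials)         # boy, girl, man
task = rng.integers(0, 2, n_trials)            # 0 = vowel, 1 = speaker task

for t, name in ((0, "vowel task"), (1, "speaker task")):
    m = task == t
    acc_v = cross_val_score(LinearSVC(max_iter=5000), X[m], vowel[m], cv=3).mean()
    acc_s = cross_val_score(LinearSVC(max_iter=5000), X[m], speaker[m], cv=3).mean()
    print(f"{name}: vowel acc {acc_v:.2f}, speaker acc {acc_s:.2f}")
```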
|
200
|
Auditory and visual modulation of temporal lobe neurons in voice-sensitive and association cortices. J Neurosci 2014; 34:2524-37. [PMID: 24523543 DOI: 10.1523/jneurosci.2805-13.2014] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Effective interactions between conspecific individuals can depend upon the receiver forming a coherent multisensory representation of communication signals, such as merging voice and face content. Neuroimaging studies have identified face- or voice-sensitive areas (Belin et al., 2000; Petkov et al., 2008; Tsao et al., 2008), some of which have been proposed as candidate regions for face and voice integration (von Kriegstein et al., 2005). However, it was unclear how multisensory influences occur at the neuronal level within voice- or face-sensitive regions, especially compared with classically defined multisensory regions in temporal association cortex (Stein and Stanford, 2008). Here, we characterize auditory (voice) and visual (face) influences on neuronal responses in a right-hemisphere voice-sensitive region in the anterior supratemporal plane (STP) of Rhesus macaques. These results were compared with those in the neighboring superior temporal sulcus (STS). Within the STP, our results show auditory sensitivity to several vocal features, which was not evident in STS units. We also identify a previously uncharacterized, functionally distinct neuronal subpopulation in the STP that appears to carry the area's sensitivity to voice-identity-related features. Audiovisual interactions were prominent in both the STP and STS. However, visual influences modulated the responses of STS neurons with greater specificity and were more often associated with congruent voice-face stimulus pairings than in STP neurons. Together, the results reveal the neuronal processes subserving voice-sensitive fMRI activity patterns in primates, generate hypotheses for testing in the visual modality, and clarify the position of voice-sensitive areas within the unisensory and multisensory processing hierarchies.
|