1. Auditory cortical delta-entrainment interacts with oscillatory power in multiple fronto-parietal networks. Neuroimage 2016;147:32-42. PMID: 27903440; PMCID: PMC5315055; DOI: 10.1016/j.neuroimage.2016.11.062. Citations in RCA: 80.
Abstract
The timing of slow auditory cortical activity aligns to the rhythmic fluctuations in speech. This entrainment is considered to be a marker of the prosodic and syllabic encoding of speech, and has been shown to correlate with intelligibility. Yet, whether and how auditory cortical entrainment is influenced by the activity in other speech–relevant areas remains unknown. Using source-localized MEG data, we quantified the dependency of auditory entrainment on the state of oscillatory activity in fronto-parietal regions. We found that delta band entrainment interacted with the oscillatory activity in three distinct networks. First, entrainment in the left anterior superior temporal gyrus (STG) was modulated by beta power in orbitofrontal areas, possibly reflecting predictive top-down modulations of auditory encoding. Second, entrainment in the left Heschl's Gyrus and anterior STG was dependent on alpha power in central areas, in line with the importance of motor structures for phonological analysis. And third, entrainment in the right posterior STG modulated theta power in parietal areas, consistent with the engagement of semantic memory. These results illustrate the topographical network interactions of auditory delta entrainment and reveal distinct cross-frequency mechanisms by which entrainment can interact with different cognitive processes underlying speech perception.
Highlights: We study auditory cortical speech entrainment from a network perspective and identify three distinct networks that interact with delta-entrainment in auditory cortex. Entrainment is modulated by frontal beta power, possibly indexing predictions; central alpha power interacts with entrainment, suggesting motor involvement; and parietal theta is modulated by entrainment, suggesting working memory compensation.
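As a rough illustration of the kind of quantities involved, the sketch below computes a delta-band speech-brain coherence value and a beta-band power value from simulated signals; spectral coherence here stands in for the paper's actual entrainment measure, and all signal names, sampling rates, and frequency bands are illustrative assumptions rather than the study's parameters.

```python
# Illustrative sketch only (not the authors' MEG pipeline): delta-band
# coherence between a speech envelope and a simulated cortical signal,
# alongside band-limited power in a second, "frontal" signal.
import numpy as np
from scipy.signal import coherence, welch

fs = 200.0                                   # sampling rate in Hz (hypothetical)
t = np.arange(0, 60, 1 / fs)                 # 60 s of toy data
rng = np.random.default_rng(0)

speech_env = np.abs(np.sin(2 * np.pi * 2.0 * t)) + 0.1 * rng.standard_normal(t.size)
auditory = np.roll(speech_env, 20) + 0.5 * rng.standard_normal(t.size)   # "entrained"
frontal = rng.standard_normal(t.size)                                    # other region

# Delta-band (1-4 Hz) speech-brain coherence as a simple entrainment index
f, coh = coherence(speech_env, auditory, fs=fs, nperseg=int(4 * fs))
delta_entrainment = coh[(f >= 1) & (f <= 4)].mean()

# Beta-band (15-25 Hz) power of the frontal signal
f_psd, psd = welch(frontal, fs=fs, nperseg=int(4 * fs))
beta_power = psd[(f_psd >= 15) & (f_psd <= 25)].mean()

print(f"delta entrainment (coherence): {delta_entrainment:.3f}")
print(f"frontal beta power (a.u.): {beta_power:.3e}")
# Across many epochs one could then test whether the two quantities covary.
```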
Publication type: Research Support, Non-U.S. Gov't.
2. Bidelman GM, Howell M. Functional changes in inter- and intra-hemispheric cortical processing underlying degraded speech perception. Neuroimage 2015;124:581-590. PMID: 26386346; DOI: 10.1016/j.neuroimage.2015.09.020. Citations in RCA: 75.
Abstract
Previous studies suggest that at poorer signal-to-noise ratios (SNRs), auditory cortical event-related potentials are weakened, prolonged, and show a shift in the functional lateralization of cerebral processing from left to right hemisphere. Increased right hemisphere involvement during speech-in-noise (SIN) processing may reflect the recruitment of additional brain resources to aid speech recognition or alternatively, the progressive loss of involvement from left linguistic brain areas as speech becomes more impoverished (i.e., nonspeech-like). To better elucidate the brain basis of SIN perception, we recorded neuroelectric activity in normal hearing listeners to speech sounds presented at various SNRs. Behaviorally, listeners obtained superior SIN performance for speech presented to the right compared to the left ear (i.e., right ear advantage). Source analysis of neural data assessed the relative contribution of region-specific neural generators (linguistic and auditory brain areas) to SIN processing. We found that left inferior frontal brain areas (e.g., Broca's areas) partially disengage at poorer SNRs but responses do not right lateralize with increasing noise. In contrast, auditory sources showed more resilience to noise in left compared to right primary auditory cortex but also a progressive shift in dominance from left to right hemisphere at lower SNRs. Region- and ear-specific correlations revealed that listeners' right ear SIN advantage was predicted by source activity emitted from inferior frontal gyrus (but not primary auditory cortex). Our findings demonstrate changes in the functional asymmetry of cortical speech processing during adverse acoustic conditions and suggest that "cocktail party" listening skills depend on the quality of speech representations in the left cerebral hemisphere rather than compensatory recruitment of right hemisphere mechanisms.
Publication type: Research Support, Non-U.S. Gov't.
3. He L, Cao C. Automated depression analysis using convolutional neural networks from speech. J Biomed Inform 2018;83:103-111. PMID: 29852317; DOI: 10.1016/j.jbi.2018.05.007. Citations in RCA: 56.
Abstract
To help clinicians efficiently diagnose the severity of a person's depression, the affective computing community and the artificial intelligence field have shown a growing interest in designing automated systems. Speech features carry useful information for the diagnosis of depression. However, feature selection still relies on manual design and domain knowledge, which makes the process labor-intensive and subjective. In recent years, deep-learned features based on neural networks have shown superior performance to hand-crafted features in various areas. In this paper, to overcome the difficulties mentioned above, we propose a combination of hand-crafted and deep-learned features which can effectively measure the severity of depression from speech. In the proposed method, Deep Convolutional Neural Networks (DCNN) are first built to learn deep-learned features from spectrograms and raw speech waveforms. Then we manually extract state-of-the-art texture descriptors, median robust extended local binary patterns (MRELBP), from spectrograms. To capture the complementary information within the hand-crafted and deep-learned features, we propose joint fine-tuning layers that combine the raw-waveform and spectrogram DCNNs to boost depression recognition performance. Moreover, to address the problem of small sample sizes, a data augmentation method is proposed. Experiments conducted on the AVEC2013 and AVEC2014 depression databases show that our approach is robust and effective for the diagnosis of depression when compared to state-of-the-art audio-based methods.
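To make the fusion idea concrete, here is a minimal PyTorch sketch of a raw-waveform branch and a spectrogram branch joined by shared "joint fine-tuning" layers that output a depression severity score. Layer sizes, input shapes, and names are hypothetical placeholders rather than the paper's architecture, and the hand-crafted MRELBP features are omitted.

```python
# Minimal sketch of joint fine-tuning over a raw-waveform branch and a
# spectrogram branch (hypothetical layer sizes; not the paper's exact model).
import torch
import torch.nn as nn

class RawBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool1d(32), nn.Flatten())      # -> 16 * 32 = 512 features

    def forward(self, x):             # x: (batch, 1, samples)
        return self.net(x)

class SpecBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 8)), nn.Flatten())  # -> 16 * 4 * 8 = 512 features

    def forward(self, x):             # x: (batch, 1, freq, time)
        return self.net(x)

class JointModel(nn.Module):
    """Joint fine-tuning layers combining both branches into one score."""
    def __init__(self):
        super().__init__()
        self.raw, self.spec = RawBranch(), SpecBranch()
        self.joint = nn.Sequential(nn.Linear(512 + 512, 128), nn.ReLU(),
                                   nn.Linear(128, 1))    # depression severity

    def forward(self, wav, spec):
        fused = torch.cat([self.raw(wav), self.spec(spec)], dim=1)
        return self.joint(fused)

model = JointModel()
wav = torch.randn(2, 1, 16000)        # two toy 1-second waveforms at 16 kHz
spec = torch.randn(2, 1, 64, 128)     # two toy spectrograms
print(model(wav, spec).shape)         # torch.Size([2, 1])
```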
Publication type: Research Support, Non-U.S. Gov't.
4. Brainstem-cortical functional connectivity for speech is differentially challenged by noise and reverberation. Hear Res 2018;367:149-160. PMID: 29871826; DOI: 10.1016/j.heares.2018.05.018. Citations in RCA: 39.
Abstract
Everyday speech perception is challenged by external acoustic interferences that hinder verbal communication. Here, we directly compared how different levels of the auditory system (brainstem vs. cortex) code speech and how their neural representations are affected by two acoustic stressors: noise and reverberation. We recorded multichannel (64 ch) brainstem frequency-following responses (FFRs) and cortical event-related potentials (ERPs) simultaneously in normal hearing individuals to speech sounds presented in mild and moderate levels of noise and reverb. We matched signal-to-noise and direct-to-reverberant ratios to equate the severity between classes of interference. Electrode recordings were parsed into source waveforms to assess the relative contribution of region-specific brain areas [i.e., brainstem (BS), primary auditory cortex (A1), inferior frontal gyrus (IFG)]. Results showed that reverberation was less detrimental to (and in some cases facilitated) the neural encoding of speech compared to additive noise. Inter-regional correlations revealed associations between BS and A1 responses, suggesting subcortical speech representations influence higher auditory-cortical areas. Functional connectivity analyses further showed that directed signaling toward A1 in both feedforward cortico-collicular (BS→A1) and feedback cortico-cortical (IFG→A1) pathways were strong predictors of degraded speech perception and differentiated "good" vs. "poor" perceivers. Our findings demonstrate a functional interplay within the brain's speech network that depends on the form and severity of acoustic interference. We infer that in addition to the quality of neural representations within individual brain regions, listeners' success at the "cocktail party" is modulated based on how information is transferred among subcortical and cortical hubs of the auditory-linguistic network.
Publication type: Research Support, Non-U.S. Gov't.
5. Giroud N, Lemke U, Reich P, Matthes KL, Meyer M. The impact of hearing aids and age-related hearing loss on auditory plasticity across three months - an electrical neuroimaging study. Hear Res 2017;353:162-175. PMID: 28705608; DOI: 10.1016/j.heares.2017.06.012. Citations in RCA: 36.
Abstract
The present study investigates behavioral and electrophysiological auditory and cognitive-related plasticity in three groups of healthy older adults (60-77 years). Group 1 comprised moderately hearing-impaired, experienced hearing aid users fitted with new hearing aids using non-linear frequency compression (NLFC on); Group 2, also moderately hearing-impaired, used the same type of hearing aids but with NLFC switched off for the entire study period (NLFC off); Group 3 comprised individuals with age-appropriate hearing (NHO) as controls, who did not differ in IQ, gender, or age from Groups 1 and 2. At five measurement time points (M1-M5) across three months, a series of active oddball tasks was administered while EEG was recorded. The stimuli comprised syllables containing naturally high-pitched fricatives (/sh/, /s/, and /f/), which are hard to distinguish for individuals with presbycusis. By applying a data-driven microstate approach to obtain global field power (GFP) as a measure of processing effort, the modulations of perceptual (P50, N1, P2) and cognitive-related (N2b, P3b) auditory evoked potentials were calculated and subsequently related to behavioral changes (accuracy and reaction time) across time. All groups improved their performance across time, but the NHO group showed consistently higher accuracy and faster reaction times than the hearing-impaired groups, especially under difficult conditions. Electrophysiological results complemented this finding by demonstrating longer P50 and N1 peak latencies in hearing aid users. Furthermore, the GFP of cognitive-related evoked potentials decreased from M1 to M2 in the NHO group, while a comparable decrease in the hearing-impaired groups was only evident at M5. After twelve weeks of hearing aid use for eight hours each day, we found a significantly lower GFP in the P3b of the group with NLFC on compared to the group with NLFC off. These findings suggest higher processing effort, as evidenced by higher GFP, in hearing-impaired individuals compared to those with normal hearing, although the hearing-impaired groups show a decrease in processing effort after repeated stimulus exposure. In addition, our findings indicate that acclimatization to a new hearing aid algorithm may take several weeks.
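Global field power itself is straightforward to compute: it is the standard deviation across electrodes at each time point of an average-referenced signal. The sketch below applies this to a toy ERP epoch; the channel count and epoch length are arbitrary assumptions, not the study's montage.

```python
# Sketch: global field power (GFP) = standard deviation across channels
# at each sample of an average-referenced ERP epoch (toy data).
import numpy as np

rng = np.random.default_rng(1)
n_channels, n_samples = 64, 500               # hypothetical montage and epoch length
erp = rng.standard_normal((n_channels, n_samples))

erp = erp - erp.mean(axis=0, keepdims=True)   # average reference
gfp = erp.std(axis=0)                         # GFP time course, one value per sample

print(gfp.shape, gfp[:5])
# Higher GFP over a component's time window is read here as greater processing effort.
```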
Publication type: Research Support, Non-U.S. Gov't.
6. Tremblay P, Baroni M, Hasson U. Processing of speech and non-speech sounds in the supratemporal plane: auditory input preference does not predict sensitivity to statistical structure. Neuroimage 2012;66:318-332. PMID: 23116815; DOI: 10.1016/j.neuroimage.2012.10.055. Citations in RCA: 34.
Abstract
The supratemporal plane contains several functionally heterogeneous subregions that respond strongly to speech. Much of the prior work on the issue of speech processing in the supratemporal plane has focused on neural responses to single speech vs. non-speech sounds rather than focusing on higher-level computations that are required to process more complex auditory sequences. Here we examined how information is integrated over time for speech and non-speech sounds by quantifying the BOLD fMRI response to stochastic (non-deterministic) sequences of speech and non-speech naturalistic sounds that varied in their statistical structure (from random to highly structured sequences) during passive listening. Behaviorally, the participants were accurate in segmenting speech and non-speech sequences, though they were more accurate for speech. Several supratemporal regions showed increased activation magnitude for speech sequences (preference), but, importantly, this did not predict sensitivity to statistical structure: (i) several areas showing a speech preference were sensitive to statistical structure in both speech and non-speech sequences, and (ii) several regions that responded to both speech and non-speech sounds showed distinct responses to statistical structure in speech and non-speech sequences. While the behavioral findings highlight the tight relation between statistical structure and segmentation processes, the neuroimaging results suggest that the supratemporal plane mediates complex statistical processing for both speech and non-speech sequences and emphasize the importance of studying the neurocomputations associated with auditory sequence processing. These findings identify new partitions of functionally distinct areas in the supratemporal plane that cannot be evoked by single stimuli. The findings demonstrate the importance of going beyond input preference to examine the neural computations implemented in the superior temporal plane.
Publication type: Research Support, Non-U.S. Gov't.
7. Word reading skill predicts anticipation of upcoming spoken language input: a study of children developing proficiency in reading. J Exp Child Psychol 2014;126:264-279. PMID: 24955519; DOI: 10.1016/j.jecp.2014.05.004. Citations in RCA: 33.
Abstract
Despite the efficiency with which language users typically process spoken language, a growing body of research finds substantial individual differences in both the speed and accuracy of spoken language processing potentially attributable to participants' literacy skills. Against this background, the current study examined the role of word reading skill in listeners' anticipation of upcoming spoken language input in children at the cusp of learning to read; if reading skills affect predictive language processing, then children at this stage of literacy acquisition should be most susceptible to the effects of reading skills on spoken language processing. We tested 8-year-olds on their prediction of upcoming spoken language input in an eye-tracking task. As in previous studies, children were able to anticipate upcoming spoken language input; moreover, there was a strong positive correlation between children's word reading skills (but not their pseudo-word reading, meta-phonological awareness, or spoken word recognition skills) and their prediction skills. We suggest that these findings are most compatible with the notion that the process of learning orthographic representations during reading acquisition sharpens pre-existing lexical representations, which in turn also supports anticipation of upcoming spoken words.
Publication type: Journal Article.
8. Herrmann B, Butler BE. Hearing loss and brain plasticity: the hyperactivity phenomenon. Brain Struct Funct 2021;226:2019-2039. PMID: 34100151; DOI: 10.1007/s00429-021-02313-9. Citations in RCA: 30.
Abstract
Many aging adults experience some form of hearing problems that may arise from auditory peripheral damage. However, it has been increasingly acknowledged that hearing loss is not only a dysfunction of the auditory periphery but also results from changes within the entire auditory system, from periphery to cortex. Damage to the auditory periphery is associated with an increase in neural activity at various stages throughout the auditory pathway. Here, we review neurophysiological evidence of hyperactivity and the auditory perceptual difficulties that may result from it, and outline open conceptual and methodological questions related to the study of hyperactivity. We suggest that hyperactivity alters all aspects of hearing, including spectral, temporal, and spatial hearing, and, in turn, impairs speech comprehension when background sound is present. By focusing on the perceptual consequences of hyperactivity and the potential challenges of investigating hyperactivity in humans, we hope to bring animal and human electrophysiologists closer together to better understand hearing problems in older adulthood.
Publication type: Review.
9. König A, Linz N, Tröger J, Wolters M, Alexandersson J, Robert P. Fully Automatic Speech-Based Analysis of the Semantic Verbal Fluency Task. Dement Geriatr Cogn Disord 2018;45:198-209. PMID: 29886493; DOI: 10.1159/000487852. Citations in RCA: 30.
Abstract
BACKGROUND Semantic verbal fluency (SVF) tests are routinely used in screening for mild cognitive impairment (MCI). In this task, participants name as many items as possible of a semantic category under a time constraint. Clinicians measure task performance manually by summing the number of correct words and errors. More fine-grained variables add valuable information to clinical assessment, but are time-consuming to obtain manually. Therefore, the aim of this study is to investigate whether automatic analysis of the SVF could provide these measures as accurately as manual annotation and thus support qualitative screening of neurocognitive impairment. METHODS SVF data were collected from 95 older people: those with MCI (n = 47), those with Alzheimer's or related dementias (ADRD; n = 24), and healthy controls (HC; n = 24). All data were annotated manually and automatically with clusters and switches. The obtained metrics were validated using a classifier to distinguish HC, MCI, and ADRD. RESULTS Automatically extracted clusters and switches were highly correlated (r = 0.9) with manually established values, and performed as well on the classification task separating HC from persons with ADRD (area under curve [AUC] = 0.939) and MCI (AUC = 0.758). CONCLUSION The results show that it is possible to automate fine-grained analyses of SVF data for the assessment of cognitive decline.
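A minimal sketch of the cluster/switch bookkeeping: given a produced word sequence and some mapping from words to semantic subcategories, a switch is counted whenever consecutive words fall in different subcategories, and mean cluster size follows from the run lengths. The subcategory dictionary below is invented for illustration and is not the taxonomy used in the study.

```python
# Sketch: counting semantic clusters and switches in a verbal fluency response.
from itertools import groupby

# Hypothetical subcategory lookup; the study derives clusters automatically.
SUBCATEGORY = {"dog": "pets", "cat": "pets", "hamster": "pets",
               "lion": "wild", "tiger": "wild", "cow": "farm", "sheep": "farm"}

def cluster_metrics(words):
    labels = [SUBCATEGORY.get(w, "unknown") for w in words]
    runs = [len(list(g)) for _, g in groupby(labels)]   # consecutive same-category runs
    switches = len(runs) - 1                            # category changes
    mean_cluster_size = sum(runs) / len(runs)
    return switches, mean_cluster_size

print(cluster_metrics(["dog", "cat", "lion", "tiger", "cow", "sheep", "hamster"]))
# -> (3, 1.75): three switches, mean run length of 1.75 words
```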
10. Hayakawa S, Marian V. Consequences of multilingualism for neural architecture. Behav Brain Funct 2019;15:6. PMID: 30909931; PMCID: PMC6432751; DOI: 10.1186/s12993-019-0157-z. Citations in RCA: 30.
Abstract
Language has the power to shape cognition, behavior, and even the form and function of the brain. Technological and scientific developments have recently yielded an increasingly diverse set of tools with which to study the way language changes neural structures and processes. Here, we review research investigating the consequences of multilingualism as revealed by brain imaging. A key feature of multilingual cognition is that two or more languages can become activated at the same time, requiring mechanisms to control interference. Consequently, extensive experience managing multiple languages can influence cognitive processes as well as their neural correlates. We begin with a brief discussion of how bilinguals activate language, and of the brain regions implicated in resolving language conflict. We then review evidence for the pervasive impact of bilingual experience on the function and structure of neural networks that support linguistic and non-linguistic cognitive control, speech processing and production, and language learning. We conclude that even seemingly distinct effects of language on cognitive operations likely arise from interdependent functions, and that future work directly exploring the interactions between multiple levels of processing could offer a more comprehensive view of how language molds the mind.
Publication type: Review.
11. Flemotomos N, Martinez VR, Chen Z, Singla K, Ardulov V, Peri R, Caperton DD, Gibson J, Tanana MJ, Georgiou P, Van Epps J, Lord SP, Hirsch T, Imel ZE, Atkins DC, Narayanan S. Automated evaluation of psychotherapy skills using speech and language technologies. Behav Res Methods 2022;54:690-711. PMID: 34346043; PMCID: PMC8810915; DOI: 10.3758/s13428-021-01623-4. Citations in RCA: 29.
Abstract
With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domain. This is, however, a cost-prohibitive and time-consuming method that leads to poor feasibility and limited use in real-world settings. To facilitate this process, we have developed an automated competency rating tool able to process the raw recorded audio of a session, analyzing who spoke when, what they said, and how the health professional used language to provide therapy. Focusing on a use case of a specific type of psychotherapy called "motivational interviewing", our system gives comprehensive feedback to the therapist, including information about the dynamics of the session (e.g., therapist's vs. client's talking time), low-level psychological language descriptors (e.g., type of questions asked), as well as other high-level behavioral constructs (e.g., the extent to which the therapist understands the clients' perspective). We describe our platform and its performance using a dataset of more than 5000 recordings drawn from its deployment in a real-world clinical setting used to assist training of new therapists. Widespread use of automated psychotherapy rating tools may augment experts' capabilities by providing an avenue for more effective training and skill improvement, eventually leading to more positive clinical outcomes.
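One of the session-dynamics features mentioned above, therapist versus client talking time, follows directly from diarized speech segments. The sketch below assumes segments arrive as (speaker, start, end) tuples, an illustrative format rather than the system's actual output schema.

```python
# Sketch: session dynamics from diarization output (hypothetical segment format).
from collections import defaultdict

segments = [("therapist", 0.0, 12.3), ("client", 12.5, 40.1),
            ("therapist", 40.4, 55.0), ("client", 55.2, 90.7)]

talk_time = defaultdict(float)
for speaker, start, end in segments:
    talk_time[speaker] += end - start            # accumulate seconds per speaker

total = sum(talk_time.values())
for speaker, seconds in talk_time.items():
    print(f"{speaker}: {seconds:.1f} s ({100 * seconds / total:.1f}% of speech)")
```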
Publication type: Research Support, N.I.H., Extramural.
12. Seery A, Tager-Flusberg H, Nelson CA. Event-related potentials to repeated speech in 9-month-old infants at risk for autism spectrum disorder. J Neurodev Disord 2014;6:43. PMID: 25937843; PMCID: PMC4416338; DOI: 10.1186/1866-1955-6-43. Citations in RCA: 26.
Abstract
BACKGROUND Atypical neural responses to repeated auditory and linguistic stimuli have been reported both in individuals with autism spectrum disorder (ASD) and their first-degree relatives. Recent work suggests that the younger siblings of children with ASD have atypical event-related potentials (ERPs) to repeated tones at 9 months of age; however, the functional significance is unclear, and it is unknown whether this atypicality is also present in response to linguistic stimuli. METHODS We analyzed ERPs to repetitive and deviant consonant-vowel stimuli at 9 months in 35 unaffected high-risk-for-autism (HRA) infant siblings of children with ASD and 45 low-risk control (LRC) infants. We examined a positive component, the P150, over frontal and central electrode sites and investigated the relationships between this component and later behavior. RESULTS Over frontal electrodes, HRA infants had larger-amplitude ERPs to repetitions of the standard than LRC infants, whereas ERPs to the deviant did not differ between HRA and LRC infants. Furthermore, for HRA infants, the amplitude of ERPs to the standards was positively correlated with later language ability. CONCLUSIONS Our work suggests that atypical ERPs to repeated speech during infancy are a possible endophenotype of ASD but that this atypicality is associated with beneficial, rather than disordered, language development. Potential mechanisms driving these relationships and implications for development are discussed.
Publication type: Journal Article.
13. Roldan-Vasco S, Orozco-Duque A, Suarez-Escudero JC, Orozco-Arroyave JR. Machine learning based analysis of speech dimensions in functional oropharyngeal dysphagia. Comput Methods Programs Biomed 2021;208:106248. PMID: 34260973; DOI: 10.1016/j.cmpb.2021.106248. Citations in RCA: 20.
Abstract
BACKGROUND AND OBJECTIVE The normal swallowing process requires a complex coordination of anatomical structures driven by sensory and cranial nerves. Alterations in such coordination cause swallowing malfunctions, namely dysphagia. The dysphagia screening methods are quite subjective and experience dependent. Bearing in mind that the swallowing process and speech production share some anatomical structures and mechanisms of neurological control, this work aims to evaluate the suitability of automatic speech processing and machine learning techniques for screening of functional dysphagia. METHODS Speech recordings were collected from 46 patients with functional oropharyngeal dysphagia produced by neurological causes, and 46 healthy controls. The dimensions of speech including phonation, articulation, and prosody were considered through different speech tasks. Specific features per dimension were extracted and analyzed using statistical tests. Machine learning models were applied per dimension via nested cross-validation. Hyperparameters were selected using the AUC - ROC as optimization criterion. RESULTS The Random Forest in the articulation related speech tasks retrieved the highest performance measures (AUC=0.86±0.10, sensitivity=0.91±0.12) for individual analysis of dimensions. In addition, the combination of speech dimensions with a voting ensemble improved the results, which suggests a contribution of information from different feature sets extracted from speech signals in dysphagia conditions. CONCLUSIONS The proposed approach based on speech related models is suitable for the automatic discrimination between dysphagic and healthy individuals. These findings seem to have potential use in the screening of functional oropharyngeal dysphagia in a non-invasive and inexpensive way.
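A compact scikit-learn sketch of the evaluation scheme described above: a Random Forest tuned inside a nested cross-validation loop with ROC-AUC as the selection criterion, followed by a simple soft vote across per-dimension models. The feature matrices, hyperparameter grid, and sample sizes are synthetic placeholders, not the study's data.

```python
# Sketch: nested cross-validation with ROC-AUC model selection, plus a simple
# soft vote across per-dimension Random Forests (all data here is synthetic).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                      cross_val_score, train_test_split)

rng = np.random.default_rng(0)
X_artic = rng.standard_normal((92, 20))     # articulation features (toy)
X_phon = rng.standard_normal((92, 10))      # phonation features (toy)
y = np.array([1] * 46 + [0] * 46)           # 46 patients, 46 controls

# Nested CV for one speech dimension: the inner loop tunes hyperparameters by AUC.
inner = GridSearchCV(RandomForestClassifier(random_state=0),
                     {"n_estimators": [100, 300], "max_depth": [None, 5]},
                     scoring="roc_auc", cv=StratifiedKFold(5))
outer_auc = cross_val_score(inner, X_artic, y, scoring="roc_auc",
                            cv=StratifiedKFold(5, shuffle=True, random_state=0))
print("articulation nested-CV AUC:", outer_auc.mean())

# Soft vote across dimensions: average the per-dimension class probabilities.
Xa_tr, Xa_te, Xp_tr, Xp_te, y_tr, y_te = train_test_split(
    X_artic, X_phon, y, test_size=0.3, stratify=y, random_state=0)
rf_a = RandomForestClassifier(random_state=0).fit(Xa_tr, y_tr)
rf_p = RandomForestClassifier(random_state=0).fit(Xp_tr, y_tr)
vote = (rf_a.predict_proba(Xa_te)[:, 1] + rf_p.predict_proba(Xp_te)[:, 1]) / 2
print("voted AUC:", roc_auc_score(y_te, vote))
```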
14. Collard MJ, Fifer MS, Benz HL, McMullen DP, Wang Y, Milsap GW, Korzeniewska A, Crone NE. Cortical subnetwork dynamics during human language tasks. Neuroimage 2016;135:261-272. PMID: 27046113; DOI: 10.1016/j.neuroimage.2016.03.072. Citations in RCA: 18.
Abstract
Language tasks require the coordinated activation of multiple subnetworks-groups of related cortical interactions involved in specific components of task processing. Although electrocorticography (ECoG) has sufficient temporal and spatial resolution to capture the dynamics of event-related interactions between cortical sites, it is difficult to decompose these complex spatiotemporal patterns into functionally discrete subnetworks without explicit knowledge of each subnetwork's timing. We hypothesized that subnetworks corresponding to distinct components of task-related processing could be identified as groups of interactions with co-varying strengths. In this study, five subjects implanted with ECoG grids over language areas performed word repetition and picture naming. We estimated the interaction strength between each pair of electrodes during each task using a time-varying dynamic Bayesian network (tvDBN) model constructed from the power of high gamma (70-110Hz) activity, a surrogate for population firing rates. We then reduced the dimensionality of this model using principal component analysis (PCA) to identify groups of interactions with co-varying strengths, which we term functional network components (FNCs). This data-driven technique estimates both the weight of each interaction's contribution to a particular subnetwork, and the temporal profile of each subnetwork's activation during the task. We found FNCs with temporal and anatomical features consistent with articulatory preparation in both tasks, and with auditory and visual processing in the word repetition and picture naming tasks, respectively. These FNCs were highly consistent between subjects with similar electrode placement, and were robust enough to be characterized in single trials. Furthermore, the interaction patterns uncovered by FNC analysis correlated well with recent literature suggesting important functional-anatomical distinctions between processing external and self-produced speech. Our results demonstrate that subnetwork decomposition of event-related cortical interactions is a powerful paradigm for interpreting the rich dynamics of large-scale, distributed cortical networks during human cognitive tasks.
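The decomposition step can be illustrated with plain PCA: stack the time-varying strength of every pairwise interaction into an edges-by-time matrix, and each principal component then yields a weight per interaction plus a temporal activation profile. The random matrix below merely stands in for the tvDBN interaction estimates, and the exact orientation of the decomposition is an illustrative choice.

```python
# Sketch: functional network components as principal components of
# time-varying interaction strengths (synthetic stand-in for tvDBN output).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_edges, n_timepoints = 120, 200              # e.g., electrode pairs x task time bins
interaction_strength = rng.standard_normal((n_edges, n_timepoints))

pca = PCA(n_components=5)
# Treat each edge as an observation described by its temporal profile.
edge_weights = pca.fit_transform(interaction_strength)   # (edges, components)
temporal_profiles = pca.components_                       # (components, time)

print("edge weights per component:", edge_weights.shape)
print("temporal profile per component:", temporal_profiles.shape)
print("variance explained:", pca.explained_variance_ratio_.round(3))
```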
Publication type: Research Support, N.I.H., Extramural.
15. Finch KH, Seery AM, Talbott MR, Nelson CA, Tager-Flusberg H. Lateralization of ERPs to speech and handedness in the early development of Autism Spectrum Disorder. J Neurodev Disord 2017;9:4. PMID: 28174606; PMCID: PMC5292148; DOI: 10.1186/s11689-017-9185-x. Citations in RCA: 18.
Abstract
BACKGROUND Language is a highly lateralized function, with typically developing individuals showing left hemispheric specialization. Individuals with autism spectrum disorder (ASD) often show reduced or reversed hemispheric lateralization in response to language. However, it is unclear when this difference emerges and whether or not it can serve as an early ASD biomarker. Additionally, atypical language lateralization is not specific to ASD as it is also seen more frequently in individuals with mixed- and left-handedness. Here, we examined early asymmetry patterns measured through neural responses to speech sounds at 12 months and behavioral observations of handedness at 36 months in children with and without ASD. METHODS Three different groups of children participated in the study: low-risk controls (LRC), high risk for ASD (HRA; infants with older sibling with ASD) without ASD, and HRA infants who later receive a diagnosis of ASD (ASD). Event-related potentials (ERPs) to speech sounds were recorded at 12 months. Utilizing a novel observational approach, handedness was measured by hand preference on a variety of behaviors at 36 months. RESULTS At 12 months, lateralization patterns of ERPs to speech stimuli differed across the groups with the ASD group showing reversed lateralization compared to the LRC group. At 36 months, factor analysis of behavioral observations of hand preferences indicated a one-factor model with medium to high factor loadings. A composite handedness score was derived; no group differences were observed. There was no association between lateralization to speech at 12 months and handedness at 36 months in the LRC and HRA groups. However, children with ASD did show an association such that infants with lateralization patterns more similar to the LRC group at 12 months were stronger right-handers at 36 months. CONCLUSIONS These results highlight early developmental patterns that might be specific to ASD, including a potential early biomarker of reversed lateralization to speech stimuli at 12 months, and a relation between behavioral and neural asymmetries. Future investigations of early asymmetry patterns, especially atypical hemispheric specialization, may be informative in the early identification of ASD.
Publication type: Journal Article.
16. Christmann CA, Berti S, Steinbrink C, Lachmann T. Differences in sensory processing of German vowels and physically matched non-speech sounds as revealed by the mismatch negativity (MMN) of the human event-related brain potential (ERP). Brain Lang 2014;136:8-18. PMID: 25108306; DOI: 10.1016/j.bandl.2014.07.004. Citations in RCA: 15.
Abstract
We compared processing of speech and non-speech by means of the mismatch negativity (MMN). For this purpose, the MMN elicited by vowels was compared to those elicited by two non-speech stimulus types: spectrally rotated vowels, having the same stimulus complexity as the speech stimuli, and sounds based on the bands of formants of the vowels, representing non-speech stimuli of lower complexity as compared to the other stimulus types. This design allows controlling for effects of stimulus complexity when comparing neural correlates of processing speech to non-speech. Deviants within a modified multi-feature design differed either in duration or spectral property. Moreover, the difficulty to discriminate between the standard and the two deviants was controlled for each stimulus type by means of an additional active discrimination task. Vowels elicited a larger MMN compared to both non-speech stimulus types, supporting the concept of language-specific phoneme representations and the role of the participants' prior experience.
Publication type: Comparative Study.
17. Söderström P, Horne M, Roll M. Stem Tones Pre-activate Suffixes in the Brain. J Psycholinguist Res 2017;46:271-280. PMID: 27240896; PMCID: PMC5368231; DOI: 10.1007/s10936-016-9434-2. Citations in RCA: 15.
Abstract
Results from the present event-related potentials (ERP) study show that tones on Swedish word stems can rapidly pre-activate upcoming suffixes, even when the word stem does not carry any lexical meaning. Results also show that listeners are able to rapidly restore suffixes which are replaced with a cough. Accuracy in restoring suffixes correlated positively with the amplitude of an anterior negative ERP elicited by stem tones. This effect is proposed to reflect suffix pre-activation. Suffixes that were cued by an incorrect tone elicited a left-anterior negativity and a P600, suggesting that the correct processing of the suffix is crucially tied to the activation of the preceding validly associated tone.
Publication type: Research article.
18. Differences in Neural Correlates of Speech Perception in 3-Month-Olds at High and Low Risk for Autism Spectrum Disorder. J Autism Dev Disord 2018;47:3125-3138. PMID: 28688078; DOI: 10.1007/s10803-017-3222-1. Citations in RCA: 14.
Abstract
In this study, we investigated neural precursors of language acquisition as potential endophenotypes of autism spectrum disorder (ASD) in 3-month-old infants at high and low familial ASD risk. Infants were imaged using functional near-infrared spectroscopy while they listened to auditory stimuli containing syllable repetitions; their neural responses were analyzed over left and right temporal regions. While female low risk infants showed initial neural activation that decreased over exposure to repetition-based stimuli, potentially indicating a habituation response to repetition in speech, female high risk infants showed no changes in neural activity over exposure. This finding may indicate a potential neural endophenotype of language development or ASD specific to females at risk for the disorder.
Publication type: Journal Article.
19. Kumar S, Chaube MK, Alsamhi SH, Gupta SK, Guizani M, Gravina R, Fortino G. A novel multimodal fusion framework for early diagnosis and accurate classification of COVID-19 patients using X-ray images and speech signal processing techniques. Comput Methods Programs Biomed 2022;226:107109. PMID: 36174422; PMCID: PMC9465496; DOI: 10.1016/j.cmpb.2022.107109. Citations in RCA: 14.
Abstract
BACKGROUND AND OBJECTIVE The COVID-19 outbreak has become one of the most challenging problems for human beings. It is a communicable disease caused by a new coronavirus strain, which has already infected over 375 million people and caused almost 6 million deaths. This paper aims to develop and design a framework for early diagnosis and fast classification of COVID-19 symptoms using multimodal deep learning techniques. METHODS We collected chest X-ray and cough sample data from open-source datasets (including the Cohen dataset) and local hospitals, and extracted features from the chest X-ray images. We also used cough audio datasets from the Coswara project and local hospitals. The publicly available Coughvid, DetectNow, and Virufy datasets are used to evaluate COVID-19 detection based on speech, respiratory, and cough sounds. The collected audio data comprise slow and fast breathing, shallow and deep coughing, spoken digits, and phonation of sustained vowels. Gender, geographical location, age, pre-existing medical conditions, and current health status (COVID-19 and non-COVID-19) are recorded. RESULTS The proposed framework uses a selection algorithm over pre-trained networks to determine the best fusion model, characterized by the pre-trained chest X-ray and cough models. Deep feature fusion by discriminant correlation analysis is then used to fuse discriminatory features from the two models. The proposed framework achieved recognition accuracy, specificity, and sensitivity of 98.91%, 96.25%, and 97.69%, respectively; with the fusion method we obtained 94.99% accuracy. CONCLUSION This paper examines the effectiveness of well-known ML architectures on a joint collection of chest X-rays and cough samples for early classification of COVID-19. It shows that existing methods can be used effectively for diagnosis and suggests that the fusion learning paradigm could be a crucial asset in diagnosing future unknown illnesses. The proposed framework supports health informatics through early diagnosis, clinical decision support, and accurate prediction.
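The fusion step can be approximated with canonical correlation analysis from scikit-learn, used here as a deliberately simplified stand-in for discriminant correlation analysis: project X-ray and cough feature vectors into a shared correlated space and concatenate the projections before classification. All feature matrices and labels below are synthetic placeholders.

```python
# Sketch: fusing X-ray and cough feature vectors via CCA (a simplified stand-in
# for the paper's discriminant correlation analysis) before classification.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
xray_feats = rng.standard_normal((n, 64))    # deep features from an X-ray CNN (toy)
cough_feats = rng.standard_normal((n, 32))   # deep features from a cough CNN (toy)
y = rng.integers(0, 2, n)                    # COVID-19 vs non-COVID-19 labels (toy)

Xr_tr, Xr_te, Co_tr, Co_te, y_tr, y_te = train_test_split(
    xray_feats, cough_feats, y, test_size=0.25, random_state=0)

cca = CCA(n_components=8).fit(Xr_tr, Co_tr)
fused_tr = np.hstack(cca.transform(Xr_tr, Co_tr))   # correlated projections, concatenated
fused_te = np.hstack(cca.transform(Xr_te, Co_te))

clf = LogisticRegression(max_iter=1000).fit(fused_tr, y_tr)
print("fused-feature accuracy (toy):", accuracy_score(y_te, clf.predict(fused_te)))
```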
Publication type: Research article.
20. Ghio M, Cara C, Tettamanti M. The prenatal brain readiness for speech processing: A review on foetal development of auditory and primordial language networks. Neurosci Biobehav Rev 2021;128:709-719. PMID: 34274405; DOI: 10.1016/j.neubiorev.2021.07.009. Citations in RCA: 13.
Abstract
Despite consolidated evidence for the prenatal ability to elaborate and respond to sounds and speech stimuli, the ontogenetic functional brain maturation of language responsiveness in the foetus is still poorly understood. Recent advances in in-vivo foetal neuroimaging have contributed to a finely detailed picture of the anatomo-functional hallmarks that define the prenatal neurodevelopment of auditory and language-related networks. Here, we first outline available evidence for the prenatal development of auditory and language-related brain structures and of their anatomical connections. Second, we focus on functional connectivity data showing the emergence of auditory and primordial language networks in the foetal brain. Third, we recapitulate functional neuroimaging studies assessing the prenatal readiness for sound processing, as a crucial prerequisite for the foetus to experientially respond to spoken language. In conclusion, we suggest that the state of the art has reached sufficient maturity to directly assess the neural mechanisms underlying the prenatal readiness for speech processing and to evaluate whether foetal neuromarkers can predict the postnatal development of language acquisition abilities and disabilities.
Publication type: Review.
21. Woodruff Carr K, Tierney A, White-Schwoch T, Kraus N. Intertrial auditory neural stability supports beat synchronization in preschoolers. Dev Cogn Neurosci 2016;17:76-82. PMID: 26760457; PMCID: PMC4763990; DOI: 10.1016/j.dcn.2015.12.003. Citations in RCA: 13.
Abstract
The ability to synchronize motor movements along with an auditory beat places stringent demands on the temporal processing and sensorimotor integration capabilities of the nervous system. Links between millisecond-level precision of auditory processing and the consistency of sensorimotor beat synchronization implicate fine auditory neural timing as a mechanism for forming stable internal representations of, and behavioral reactions to, sound. Here, for the first time, we demonstrate a systematic relationship between consistency of beat synchronization and trial-by-trial stability of subcortical speech processing in preschoolers (ages 3 and 4 years old). We conclude that beat synchronization might provide a useful window into millisecond-level neural precision for encoding sound in early childhood, when speech processing is especially important for language acquisition and development.
Publication type: Research Support, N.I.H., Extramural.
22. Xiao B, Huang C, Imel ZE, Atkins DC, Georgiou P, Narayanan SS. A technology prototype system for rating therapist empathy from audio recordings in addiction counseling. PeerJ Comput Sci 2016;2:e59. PMID: 28286867; PMCID: PMC5344199; DOI: 10.7717/peerj-cs.59. Citations in RCA: 13.
Abstract
Scaling up psychotherapy services such as addiction counseling is a critical societal need. One challenge is ensuring quality of therapy, due to the heavy cost of manual observational assessment. This work proposes a speech technology-based system to automate the assessment of therapist empathy, a key therapy quality index, from audio recordings of the psychotherapy interactions. We designed a speech processing system that includes voice activity detection and diarization modules, and an automatic speech recognizer plus a speaker role matching module to extract the therapist's language cues. We employed Maximum Entropy models, Maximum Likelihood language models, and a Lattice Rescoring method to characterize high vs. low empathic language. We estimated therapy-session-level empathy codes using utterance-level evidence obtained from these models. Our experiments showed that the fully automated system achieved a correlation of 0.643 between expert-annotated empathy codes and machine-derived estimations, and an accuracy of 81% in classifying high vs. low empathy, in comparison to a 0.721 correlation and 86% accuracy in the oracle setting using manual transcripts. The results show that the system provides useful information that can contribute to automatic quality assurance and therapist training.
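The language-model component of such a system can be caricatured with two add-one-smoothed unigram models, one estimated from high-empathy and one from low-empathy therapist utterances; each utterance is scored by its log-likelihood ratio, and the session-level estimate is the mean over utterances. The tiny training sets below are invented for illustration and do not come from the study's data.

```python
# Sketch: maximum-likelihood unigram models for high- vs low-empathy language,
# scored as a per-utterance log-likelihood ratio (toy training data).
import math
from collections import Counter

high_empathy = ["it sounds like that was really hard for you",
                "you are worried about how this affects your family"]
low_empathy = ["you just need to stop drinking",
               "we talked about this last time already"]

def unigram_model(utterances):
    counts = Counter(w for u in utterances for w in u.split())
    total = sum(counts.values())
    vocab = len(counts)
    # Add-one smoothing, with one extra pseudo-count reserved for unseen words.
    return lambda w: math.log((counts[w] + 1) / (total + vocab + 1))

log_p_high = unigram_model(high_empathy)
log_p_low = unigram_model(low_empathy)

def empathy_score(utterance):
    words = utterance.split()
    return sum(log_p_high(w) - log_p_low(w) for w in words) / len(words)

session = ["it sounds like you are worried", "you need to stop"]
print("session empathy estimate:", sum(map(empathy_score, session)) / len(session))
```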
Publication type: Research article.
23. Shader MJ, Luke R, Gouailhardou N, McKay CM. The use of broad vs restricted regions of interest in functional near-infrared spectroscopy for measuring cortical activation to auditory-only and visual-only speech. Hear Res 2021;406:108256. PMID: 34051607; DOI: 10.1016/j.heares.2021.108256. Citations in RCA: 12.
Abstract
As an alternative to fMRI, functional near-infrared spectroscopy (fNIRS) is a relatively new tool for observing cortical activation. However, spatial resolution is reduced compared to fMRI and often the exact locations of fNIRS optodes and specific anatomical information is not known. The aim of this study was to explore the location and range of specific regions of interest that are sensitive to detecting cortical activation using fNIRS in response to auditory- and visual-only connected speech. Two approaches to a priori region-of-interest selection were explored. First, broad regions corresponding to the auditory cortex and occipital lobe were analysed. Next, the fNIRS Optode Location Decider (fOLD) tool was used to divide the auditory and visual regions into two subregions corresponding to distinct anatomical structures. The Auditory-A and -B regions corresponded to Heschl's gyrus and planum temporale, respectively. The Visual-A region corresponded to the superior occipital gyrus and the cuneus, and the Visual-B region corresponded to the middle occipital gyrus. The experimental stimulus consisted of a connected speech signal segmented into 12.5-sec blocks and was presented in either an auditory-only or visual-only condition. Group-level results for eight normal-hearing adult participants averaged over the broad regions of interest revealed significant auditory-evoked activation for both the left and right broad auditory regions of interest. No significant activity was observed for any other broad region of interest in response to any stimulus condition. When divided into subregions, there was a significant positive auditory-evoked response in the left and right Auditory-A regions, suggesting activation near the primary auditory cortex in response to auditory-only speech. There was a significant positive visual-evoked response in the Visual-B region, suggesting middle occipital gyrus activation in response to visual-only speech. In the Visual-A region, however, there was a significant negative visual-evoked response. This result suggests a significant decrease in oxygenated hemoglobin in the superior occipital gyrus as well as the cuneus in response to visual-only speech. Distinct response characteristics, either positive or negative, in adjacent subregions within the temporal and occipital lobes were fairly consistent on the individual level. Results suggest that temporal regions near Heschl's gyrus may be the most advantageous location in adults for identifying hemodynamic responses to complex auditory speech signals using fNIRS. In the occipital lobe, regions corresponding to the facial processing pathway may prove advantageous for measuring positive responses to visual speech using fNIRS.
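The region-of-interest logic reduces to averaging the oxygenated-hemoglobin response over the channels assigned to each ROI and testing the mean amplitude against zero. The sketch below uses random data and hypothetical channel groupings rather than fOLD-derived montage information or the study's subject counts.

```python
# Sketch: ROI-averaged fNIRS responses with a one-sample t-test against zero
# (synthetic HbO data; channel-to-ROI assignments are hypothetical).
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
n_subjects, n_channels = 8, 24
hbo_amplitude = rng.standard_normal((n_subjects, n_channels))   # block-averaged HbO

rois = {"Auditory-A (left)": [0, 1, 2], "Auditory-B (left)": [3, 4],
        "Visual-A": [12, 13], "Visual-B": [14, 15, 16]}

for name, channels in rois.items():
    roi_mean = hbo_amplitude[:, channels].mean(axis=1)   # one value per subject
    t, p = ttest_1samp(roi_mean, 0.0)                    # positive t = activation
    print(f"{name}: mean={roi_mean.mean():+.2f}, t={t:.2f}, p={p:.3f}")
```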
Publication type: Journal Article.
24. ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings. Behav Res Methods 2021;53:818-835. PMID: 32875399; PMCID: PMC8062390; DOI: 10.3758/s13428-020-01460-x. Citations in RCA: 12.
Abstract
Recordings captured by wearable microphones are a standard method for investigating young children's language environments. A key measure to quantify from such data is the amount of speech present in children's home environments. To this end, the LENA recorder and software, a popular system for measuring linguistic input, estimates the number of adult words that children may hear over the course of a recording. However, word count estimation is challenging to do in a language-independent manner; the relationship between observable acoustic patterns and language-specific lexical entities is far from uniform across human languages. In this paper, we ask whether some alternative linguistic units, namely phone(me)s or syllables, could be measured instead of, or in parallel with, words in order to achieve improved cross-linguistic applicability and comparability of an automated system for measuring child language input. We discuss the advantages and disadvantages of measuring different units from theoretical and technical points of view. We also investigate the practical applicability of measuring such units using a novel system called Automatic LInguistic unit Count Estimator (ALICE) together with audio from seven child-centered daylong audio corpora from diverse cultural and linguistic environments. We show that language-independent measurement of phoneme counts is somewhat more accurate than syllables or words, but all three are highly correlated with human annotations on the same data. We share an open-source implementation of ALICE for use by the language research community, enabling automatic phoneme, syllable, and word count estimation from child-centered audio recordings.
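Validating such counts against human annotation comes down to a correlation per linguistic unit; the sketch below computes Pearson correlations for phoneme, syllable, and word counts over a handful of annotated segments. All numbers are fabricated for illustration and are not ALICE or LENA output.

```python
# Sketch: agreement between automated and human unit counts per audio segment
# (fabricated toy numbers; not output from ALICE or LENA).
import numpy as np
from scipy.stats import pearsonr

human = {"phonemes": np.array([120, 340, 80, 210, 150]),
         "syllables": np.array([48, 140, 33, 90, 61]),
         "words": np.array([25, 70, 15, 44, 30])}
automatic = {"phonemes": np.array([130, 320, 90, 200, 160]),
             "syllables": np.array([50, 130, 40, 85, 65]),
             "words": np.array([22, 75, 18, 40, 33])}

for unit in human:
    r, p = pearsonr(human[unit], automatic[unit])   # correlation per unit type
    print(f"{unit}: r = {r:.2f} (p = {p:.3f})")
```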
Publication type: Research Support, Non-U.S. Gov't.
25. Mintz TH, Walker RL, Welday A, Kidd C. Infants' sensitivity to vowel harmony and its role in segmenting speech. Cognition 2018;171:95-107. PMID: 29121588; PMCID: PMC5818326; DOI: 10.1016/j.cognition.2017.10.020. Citations in RCA: 11.
Abstract
A critical part of infants' ability to acquire any language involves segmenting continuous speech input into discrete word forms. Certain properties of words could provide infants with reliable cues to word boundaries. Here we investigate the potential utility of vowel harmony (VH), a phonological property whereby vowels within a word systematically exhibit similarity ("harmony") for some aspect of the way they are pronounced. We present evidence that infants with no experience of VH in their native language nevertheless actively use these patterns to generate hypotheses about where words begin and end in the speech stream. In two sets of experiments, we exposed infants learning English, a language without VH, to a continuous speech stream in which the only systematic patterns available to be used as cues to word boundaries came from syllable sequences that showed VH or those that showed vowel disharmony (dissimilarity). After hearing less than one minute of the streams, infants showed evidence of sensitivity to VH cues. These results suggest that infants have an experience-independent sensitivity to VH, and are predisposed to segment speech according to harmony patterns. We also found that when the VH patterns were more subtle (Experiment 2), infants required more exposure to the speech stream before they segmented based on VH, consistent with previous work on infants' preferences relating to processing load. Our findings evidence a previously unknown mechanism by which infants could discover the words of their language, and they shed light on the perceptual mechanisms that might be responsible for the emergence of vowel harmony as an organizing principle for the sound structure of words in many languages.
Publication type: Research Support, N.I.H., Extramural.