1
Isaev DY, Vlasova RM, Di Martino JM, Stephen CD, Schmahmann JD, Sapiro G, Gupta AS. Uncertainty of Vowel Predictions as a Digital Biomarker for Ataxic Dysarthria. Cerebellum (London, England) 2024; 23:459-470. [PMID: 37039956 PMCID: PMC10826261 DOI: 10.1007/s12311-023-01539-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/27/2023] [Indexed: 04/12/2023]
Abstract
Dysarthria is a common manifestation across cerebellar ataxias, leading to impairments in communication, reduced social connections, and decreased quality of life. While dysarthria symptoms may be present in other neurological conditions, ataxic dysarthria is a perceptually distinct motor speech disorder, with the most prominent characteristics being articulation and prosody abnormalities along with distorted vowels. We hypothesized that the uncertainty of vowel predictions by an automatic speech recognition system can capture speech changes present in cerebellar ataxia. Speech of participants with ataxia (N=61) and healthy controls (N=25) was recorded during the "picture description" task. Additionally, participants' dysarthric speech and ataxia severity were assessed on the Brief Ataxia Rating Scale (BARS). Eight participants with ataxia had speech and BARS data at two timepoints. A neural network trained for phoneme prediction was applied to the speech recordings. The average entropy of vowel token predictions (AVE) was computed for each participant's recording, together with mean pitch and intensity standard deviations (MPSD and MISD) in the vowel segments. AVE and MISD demonstrated associations with the BARS speech score (Spearman's rho=0.45 and 0.51), and AVE demonstrated an association with BARS total (rho=0.39). In the longitudinal cohort, a Wilcoxon pairwise signed-rank test demonstrated an increase in BARS total and AVE, while BARS speech and acoustic measures did not significantly increase. The relationship of AVE to both BARS speech and BARS total, as well as its ability to capture disease progression even in the absence of measured speech decline, indicates the potential of AVE as a digital biomarker for cerebellar ataxia.
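As a rough illustration of the AVE idea described in this abstract, the sketch below averages the Shannon entropy of per-token phoneme posteriors over tokens whose top-scoring label is a vowel. It is a minimal sketch under stated assumptions, not the authors' implementation: the vowel label set, the token-level input format, the rule for deciding which tokens count as vowels, and the use of natural logarithms are all hypothetical choices that the abstract does not specify.

```python
import math

# Hypothetical vowel inventory (ARPAbet-style labels); the label set used by
# the network in the study is not given in the abstract.
VOWELS = {"AA", "AE", "AH", "AO", "EH", "ER", "IH", "IY", "OW", "UH", "UW"}


def entropy(probs):
    """Shannon entropy, in nats, of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def average_vowel_entropy(tokens):
    """Mean entropy over tokens whose most probable label is a vowel.

    `tokens` is a list of dicts mapping phoneme label -> posterior probability.
    Selecting vowel tokens by the top-scoring label is an assumption.
    """
    vowel_entropies = []
    for posterior in tokens:
        best_label = max(posterior, key=posterior.get)
        if best_label in VOWELS:
            vowel_entropies.append(entropy(posterior.values()))
    if not vowel_entropies:
        raise ValueError("no vowel tokens found")
    return sum(vowel_entropies) / len(vowel_entropies)


# Illustrative posteriors only; these numbers are not data from the study.
confident = [{"IY": 0.98, "IH": 0.01, "T": 0.01}]
uncertain = [{"IY": 0.40, "IH": 0.35, "EH": 0.25}]
```

Under these assumptions, a recording whose vowels are predicted less decisively (as in `uncertain`) yields a higher average entropy than one with sharply peaked predictions (as in `confident`), which is the direction of effect the abstract associates with greater dysarthria severity.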
Affiliation(s)
- Dmitry Yu Isaev
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Roza M Vlasova
  - Department of Psychiatry, UNC School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- J Matias Di Martino
  - Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA
- Christopher D Stephen
  - Ataxia Center & Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Jeremy D Schmahmann
  - Ataxia Center & Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Guillermo Sapiro
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
  - Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA
  - Departments of Mathematics & Computer Science, Duke University, Durham, NC, USA
- Anoopum S Gupta
  - Ataxia Center & Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
2
Simeone PJ, Green JR, Tager-Flusberg H, Chenausky KV. Vowel distinctiveness as a concurrent predictor of expressive language function in autistic children. Autism Res 2024; 17:419-431. [PMID: 38348589 DOI: 10.1002/aur.3102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 01/10/2024] [Indexed: 02/21/2024]
Abstract
Speech ability may limit spoken language development in some minimally verbal autistic children. In this study, we aimed to determine whether an acoustic measure of speech production, vowel distinctiveness, is concurrently related to expressive language (EL) for autistic children. Syllables containing the vowels [i] and [a] were recorded remotely from 27 autistic children (4;1-7;11) with a range of spoken language abilities. Vowel distinctiveness was calculated using automatic formant tracking software. Robust hierarchical regressions were conducted with receptive language (RL) and vowel distinctiveness as predictors of EL. Hierarchical regressions were also conducted within a High EL and a Low EL subgroup. Vowel distinctiveness accounted for 29% of the variance in EL for the entire group, RL for 38%. For the Low EL group, only vowel distinctiveness was significant, accounting for 38% of variance in EL. Conversely, in the High EL group, only RL was significant and accounted for 26% of variance in EL. Replicating previous results, speech production and RL significantly predicted concurrent EL in autistic children, with speech production being the sole significant predictor for the Low EL group and RL the sole significant predictor for the High EL group. Further work is needed to determine whether vowel distinctiveness longitudinally, as well as concurrently, predicts EL. Findings have important implications for the early identification of language impairment and in developing language interventions for autistic children.
Affiliation(s)
- Paul J Simeone
  - Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA
  - Division of Allied Health and Supportive Technology, May Institute, Randolph, Massachusetts, USA
- Jordan R Green
  - Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA
  - Department of Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard University, Cambridge, Massachusetts, USA
- Helen Tager-Flusberg
  - Department of Psychological & Brain Sciences, College of Arts and Sciences, Boston University, Boston, Massachusetts, USA
- Karen V Chenausky
  - Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA
  - Department of Neurology, Harvard Medical School, Boston, Massachusetts, USA
3
Olah J, Spencer T, Cummins N, Diederen K. Automated analysis of speech as a marker of sub-clinical psychotic experiences. Front Psychiatry 2024; 14:1265880. [PMID: 38361830 PMCID: PMC10867252 DOI: 10.3389/fpsyt.2023.1265880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 12/22/2023] [Indexed: 02/17/2024] [Open Access]
Abstract
Automated speech analysis techniques, when combined with artificial intelligence and machine learning, show potential in capturing and predicting a wide range of psychosis symptoms, garnering attention from researchers. These techniques hold promise in predicting the transition to clinical psychosis from at-risk states, as well as relapse or treatment response in individuals with clinical-level psychosis. However, challenges in scientific validation hinder the translation of these techniques into practical applications. Although sub-clinical research could help tackle most of these challenges, only a few studies in speech and psychosis research have been conducted in non-clinical populations. This review aims to facilitate such research by summarizing automated speech analytical concepts and the intersection of this field with psychosis research. First, we review the psychosis continuum and sub-clinical psychotic experiences, and the benefits of researching them. Second, we discuss the connection between speech and psychotic symptoms. Third, we give an overview of current and state-of-the-art approaches to the automated analysis of speech, both in terms of language use (text-based analysis) and vocal features (audio-based analysis). Then, we review techniques applied in subclinical populations and the findings in these samples. Finally, we discuss research challenges in the field, recommend future research endeavors, and outline how research in subclinical populations can tackle the listed challenges.
Affiliation(s)
- Julianna Olah
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Thomas Spencer
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Nicholas Cummins
  - Department of Biostatistics & Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Kelly Diederen
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
4
Kim JA, Jang H, Choi Y, Min YG, Hong YH, Sung JJ, Choi SJ. Subclinical articulatory changes of vowel parameters in Korean amyotrophic lateral sclerosis patients with perceptually normal voices. PLoS One 2023; 18:e0292460. [PMID: 37831677 PMCID: PMC10575489 DOI: 10.1371/journal.pone.0292460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 09/21/2023] [Indexed: 10/15/2023] [Open Access]
Abstract
The available quantitative methods for evaluating bulbar dysfunction in patients with amyotrophic lateral sclerosis (ALS) are limited. We aimed to characterize vowel properties in Korean ALS patients, investigate associations between vowel parameters and clinical features of ALS, and analyze subclinical articulatory changes of vowel parameters in those with perceptually normal voices. Forty-three patients with ALS (27 with dysarthria and 16 without dysarthria) and 20 healthy controls were prospectively enrolled in the study. Dysarthria was assessed using the ALS Functional Rating Scale-Revised (ALSFRS-R) speech subscore, with any loss from the maximum of 4 points indicating the presence of dysarthria. Structured speech samples were recorded and analyzed using Praat software. For three corner vowels (/a/, /i/, and /u/), data on vowel duration, fundamental frequency, frequencies of the first two formants (F1 and F2), harmonics-to-noise ratio, vowel space area (VSA), and vowel articulation index (VAI) were extracted from the speech samples. Corner vowel durations were significantly longer in ALS patients with dysarthria than in healthy controls. The F1 frequency of /a/, the F2 frequencies of /i/ and /u/, the VSA, and the VAI showed significant differences between ALS patients with dysarthria and healthy controls. The area under the curve (AUC) was 0.912. The F1 frequency of /a/ and the VSA were the major determinants for differentiating ALS patients who had not yet developed apparent dysarthria from healthy controls (AUC 0.887). In linear regression analyses, both the VSA and the VAI decreased as the ALSFRS-R speech subscore decreased, whereas vowel durations were prolonged. The analyses of vowel parameters provided a useful metric, correlated with disease severity, for detecting subclinical bulbar dysfunction in ALS patients.
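The two vowel-space measures named in this abstract can be sketched from corner-vowel formants as follows. This is a minimal sketch assuming the commonly used definitions: VSA as the area of the /a/-/i/-/u/ triangle in the F1-F2 plane (shoelace formula), and VAI as (F2i + F1a) / (F1i + F1u + F2u + F2a). The abstract does not state the exact formulas the authors applied, and the function names and formant values below are hypothetical.

```python
def vowel_space_area(f_a, f_i, f_u):
    """Triangular vowel space area in Hz^2.

    Each argument is an (F1, F2) pair in Hz for one corner vowel.
    The area is computed with the shoelace formula.
    """
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = f_a, f_i, f_u
    return 0.5 * abs(
        f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a)
    )


def vowel_articulation_index(f_a, f_i, f_u):
    """VAI: formants that grow with vowel expansion over those that shrink."""
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = f_a, f_i, f_u
    return (f2i + f1a) / (f1i + f1u + f2u + f2a)


# Illustrative (F1, F2) values in Hz; these are not data from the study.
distinct = {"f_a": (800, 1300), "f_i": (300, 2300), "f_u": (350, 900)}
centralized = {"f_a": (700, 1400), "f_i": (400, 2000), "f_u": (420, 1100)}

print(vowel_space_area(**distinct), vowel_space_area(**centralized))
print(vowel_articulation_index(**distinct), vowel_articulation_index(**centralized))
```

With these illustrative inputs, both measures are smaller for the centralized vowel set, which matches the direction of change the abstract reports as the ALSFRS-R speech subscore decreases.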
Affiliation(s)
- Jin-Ah Kim
  - Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
  - Department of Translational Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
  - Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, Republic of Korea
- Hayeun Jang
  - Division of English, Busan University of Foreign Studies, Busan, Republic of Korea
- Yoonji Choi
  - Department of Korean Language and Literature, Seoul National University, Seoul, Republic of Korea
- Young Gi Min
  - Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
  - Department of Translational Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Yoon-Ho Hong
  - Department of Neurology, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Seoul, Republic of Korea
- Jung-Joon Sung
  - Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
  - Neuroscience Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea
- Seok-Jin Choi
  - Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
  - Center for Hospital Medicine, Seoul National University Hospital, Seoul, Republic of Korea
5
Borrie SA, Hepworth TJ, Wynn CJ, Hustad KC, Barrett TS, Lansford KL. Perceptual Learning of Dysarthria in Adolescence. Journal of Speech, Language, and Hearing Research: JSLHR 2023; 66:3791-3803. [PMID: 37616225 PMCID: PMC10713018 DOI: 10.1044/2023_jslhr-23-00231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 05/28/2023] [Accepted: 06/20/2023] [Indexed: 08/26/2023]
Abstract
PURPOSE As evidenced by perceptual learning studies involving adult listeners and speakers with dysarthria, adaptation to dysarthric speech is driven by signal predictability (speaker property) and a flexible speech perception system (listener property). Here, we extend adaptation investigations to adolescent populations and examine whether adult and adolescent listeners can learn to better understand an adolescent speaker with dysarthria. METHOD Classified by developmental stage, adult (n = 42) and adolescent (n = 40) listeners completed a three-phase perceptual learning protocol (pretest, familiarization, and posttest). During pretest and posttest, all listeners transcribed speech produced by a 13-year-old adolescent with spastic dysarthria associated with cerebral palsy. During familiarization, half of the adult and adolescent listeners engaged in structured familiarization (audio and lexical feedback) with the speech of the adolescent speaker with dysarthria, and the other half with the speech of a neurotypical adolescent speaker (control). RESULTS Intelligibility scores increased from pretest to posttest for all listeners. However, listeners who received dysarthria familiarization achieved greater intelligibility improvements than those who received control familiarization. Furthermore, there was a significant effect of developmental stage, where the adults achieved greater intelligibility improvements relative to the adolescents. CONCLUSIONS This study provides the first tranche of evidence that adolescent dysarthric speech is learnable, a finding that holds even for adolescent listeners whose speech perception systems are not yet fully developed. Given the formative role that social interactions play during adolescence, these findings of improved intelligibility afford important clinical implications.
Affiliation(s)
- Stephanie A. Borrie
  - Department of Communicative Disorders and Deaf Education, Utah State University, Logan
- Taylor J. Hepworth
  - Department of Communicative Disorders and Deaf Education, Utah State University, Logan
- Camille J. Wynn
  - Department of Communication Science and Disorders, University of Houston
- Katherine C. Hustad
  - Waisman Center, University of Wisconsin–Madison
  - Department of Communication Sciences and Disorders, University of Wisconsin–Madison
- Kaitlin L. Lansford
  - Department of Communication Science and Disorders, Florida State University, Tallahassee
6
Olah J, Diederen K, Gibbs-Dean T, Kempton MJ, Dobson R, Spencer T, Cummins N. Online speech assessment of the psychotic spectrum: Exploring the relationship between overlapping acoustic markers of schizotypy, depression and anxiety. Schizophr Res 2023; 259:11-19. [PMID: 37080802 DOI: 10.1016/j.schres.2023.03.044] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/04/2022] [Revised: 03/22/2023] [Accepted: 03/23/2023] [Indexed: 04/22/2023]
Abstract
BACKGROUND Remote assessment of acoustic alterations in speech holds promise to increase scalability and validity in research across the psychosis spectrum. A feasible first step in establishing a procedure for online assessments is to assess acoustic alterations in psychometric schizotypy. However, to date, the complex relationship between alterations in speech related to schizotypy and those related to comorbid conditions such as symptoms of depression and anxiety has not been investigated. This study tested (1) whether depression, generalized anxiety and high psychometric schizotypy have similar voice characteristics, (2) which acoustic markers of online collected speech are the strongest predictors of psychometric schizotypy, and (3) whether including generalized anxiety and depression symptoms in the model can improve the prediction of schizotypy. METHODS We collected cross-sectional, online-recorded speech data from 441 participants, assessing demographics, symptoms of depression, generalized anxiety and psychometric schizotypy. RESULTS Speech samples collected online could predict psychometric schizotypy, depression, and anxiety symptoms with weak to moderate predictive power, and with moderate to good predictive power when basic demographic variables were added to the models. The most influential features of these models largely overlapped. The predictive power of speech marker-based models of schizotypy significantly improved after including symptom scores of depression and generalized anxiety in the models (from R2 = 0.296 to R2 = 0.436). CONCLUSIONS Acoustic features of online collected speech are predictive of psychometric schizotypy as well as generalized anxiety and depression symptoms. The acoustic characteristics of schizotypy, depression and anxiety symptoms significantly overlap. Speech models that are designed to predict schizotypy or symptoms of the schizophrenia spectrum might therefore benefit from controlling for symptoms of depression and anxiety.
Affiliation(s)
- Julianna Olah
  - Institute of Psychiatry, Psychology and Neuroscience, Department of Psychosis Studies, King's College London, London SE5 8AF, UK
- Kelly Diederen
  - Institute of Psychiatry, Psychology and Neuroscience, Department of Psychosis Studies, King's College London, London SE5 8AF, UK
- Toni Gibbs-Dean
  - Institute of Psychiatry, Psychology and Neuroscience, Department of Psychosis Studies, King's College London, London SE5 8AF, UK
- Matthew J Kempton
  - Institute of Psychiatry, Psychology and Neuroscience, Department of Psychosis Studies, King's College London, London SE5 8AF, UK
- Richard Dobson
  - Institute of Psychiatry, Psychology and Neuroscience, Department of Biostatistics & Health Informatics, King's College London, London SE5 8AF, UK
- Thomas Spencer
  - Institute of Psychiatry, Psychology and Neuroscience, Department of Psychosis Studies, King's College London, London SE5 8AF, UK
- Nicholas Cummins
  - Institute of Psychiatry, Psychology and Neuroscience, Department of Biostatistics & Health Informatics, King's College London, London SE5 8AF, UK
7
Vorperian HK, Kent RD, Lee Y, Buhr KA. Vowel Production in Children and Adults With Down Syndrome: Fundamental and Formant Frequencies of the Corner Vowels. Journal of Speech, Language, and Hearing Research: JSLHR 2023; 66:1208-1239. [PMID: 37015000 PMCID: PMC10187968 DOI: 10.1044/2022_jslhr-22-00510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2022] [Revised: 12/01/2022] [Accepted: 12/21/2022] [Indexed: 05/18/2023]
Abstract
PURPOSE Atypical vowel production contributes to reduced speech intelligibility in children and adults with Down syndrome (DS). This study compares the acoustic data of the corner vowels /i/, /u/, /æ/, and /ɑ/ from speakers with DS against typically developing/developed (TD) speakers. METHOD Measurements of the fundamental frequency (fo) and first four formant frequencies (F1-F4) were obtained from single-word recordings containing the target vowels from 81 participants with DS (ages 3-54 years) and 293 TD speakers (ages 4-92 years), all native speakers of English. The data were used to construct developmental trajectories and to determine interspeaker and intraspeaker variability. RESULTS Trajectories for DS differed from TD based on age and sex, but the groups were similar in the striking change in fo and F1-F4 frequencies around age 10 years. Findings confirm higher fo in DS, and vowel-specific differences between DS and TD in F1 and F2 frequencies, but not F3 and F4. The measure of F2 differences of front-versus-back vowels was more sensitive to compression than reduced vowel space area/centralization across age and sex. Low vowels had more pronounced F2 compression as related to reduced speech intelligibility. Intraspeaker variability was significantly greater for DS than TD for nearly all frequency values across age. DISCUSSION Vowel production differences between DS and TD are age- and sex-specific, which helps explain contradictory results in previous studies. Increased intraspeaker variability across age in DS confirms the presence of a persisting motor speech disorder. Atypical vowel production in DS is common and related to dysmorphology, delayed development, and disordered motor control.
Affiliation(s)
- Houri K. Vorperian
  - Vocal Tract Development Lab, Waisman Center, University of Wisconsin–Madison
- Raymond D. Kent
  - Vocal Tract Development Lab, Waisman Center, University of Wisconsin–Madison
- Yen Lee
  - Department of Educational Leadership, Edgewood College, Madison, Wisconsin
- Kevin A. Buhr
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison
8
Kabakoff H, Beames SP, Tiede M, Whalen DH, Preston JL, McAllister T. Comparing metrics for quantification of children's tongue shape complexity using ultrasound imaging. Clinical Linguistics & Phonetics 2023; 37:169-195. [PMID: 35243947 PMCID: PMC9440959 DOI: 10.1080/02699206.2022.2039300] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/10/2021] [Revised: 12/20/2021] [Accepted: 02/01/2022] [Indexed: 06/01/2023]
Abstract
Speech sound disorders can pose a challenge to communication in children that may persist into adulthood. As some speech sounds are known to require differential control of anterior versus posterior regions of the tongue body, valid measurement of the degree of differentiation of a given tongue shape has the potential to shed light on development of motor skill in typical and disordered speakers. The current study sought to compare the success of multiple techniques in quantifying tongue shape complexity as an index of degree of lingual differentiation in child and adult speakers. Using a pre-existing data set of ultrasound images of tongue shapes from adult speakers producing a variety of phonemes, we compared the extent to which three metrics of tongue shape complexity differed across phonemes/phoneme classes that were expected to differ in articulatory complexity. We then repeated this process with ultrasound tongue shapes produced by a sample of young children. The results of these comparisons suggested that a modified curvature index and a metric representing the number of inflection points best reflected small changes in tongue shapes across individuals differing in vocal tract size. Ultimately, these metrics have the potential to reveal delays in motor skill in young children, which could inform assessment procedures and treatment decisions for children with speech delays and disorders.
Affiliation(s)
- Heather Kabakoff
  - Department of Communicative Sciences and Disorders, New York University, New York, NY, USA
- Sam Pearl Beames
  - Department of Communicative Sciences and Disorders, New York University, New York, NY, USA
- Mark Tiede
  - Haskins Laboratories, New Haven, CT, USA
- D. H. Whalen
  - Haskins Laboratories, New Haven, CT, USA
  - Speech-Language-Hearing Sciences, The Graduate Center, City University of New York, New York, NY, USA
  - Linguistics Department, Yale University, New Haven, CT, USA
- Jonathan L. Preston
  - Haskins Laboratories, New Haven, CT, USA
  - Department of Communication Sciences and Disorders, Syracuse University, Syracuse, NY, USA
- Tara McAllister
  - Department of Communicative Sciences and Disorders, New York University, New York, NY, USA
9
León M, Washington KN, McKenna VS, Crowe K, Fritz K, Boyce S. Characterizing Speech Sound Productions in Bilingual Speakers of Jamaican Creole and English: Application of Durational Acoustic Methods. Journal of Speech, Language, and Hearing Research: JSLHR 2023; 66:61-83. [PMID: 36580548 PMCID: PMC10023179 DOI: 10.1044/2022_jslhr-22-00304] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/27/2022] [Revised: 09/11/2022] [Accepted: 09/16/2022] [Indexed: 06/17/2023]
Abstract
PURPOSE This study examined the speech acoustic characteristics of Jamaican Creole (JC) and English in bilingual preschoolers and adults using acoustic duration measures. The aims were to determine if, for JC and English, (a) child and adult acoustic duration characteristics differ, (b) differences occur in preschoolers' duration patterns based on the language spoken, and (c) relationships exist between the preschoolers' personal contextual factors (i.e., age, sex, and percentage of language [%language] exposure and use) and acoustic duration. METHOD Data for this cross-sectional study were collected in Kingston, Jamaica, and New York City, New York, United States, during 2013-2019. Participants included typically developing simultaneous bilingual preschoolers (n = 120, ages 3;4-5;11 [years;months]) and adults (n = 15, ages 19;0-54;4) from the same linguistic community. Audio recordings of single-word productions of JC and English were collected through elicited picture-based tasks and used for acoustic analysis. Durational features (voice onset time [VOT], vowel duration, whole-word duration, and the proportion of vowel to whole-word duration) were measured using Praat, a speech analysis software program. RESULTS JC-English-speaking children demonstrated developing speech motor control through differences in durational patterns compared with adults, including VOT for voiced plosives. Children's VOT, vowel duration, and whole-word duration were produced similarly across JC and English. The contextual factor %language use was predictive of vowel and whole-word duration in English. CONCLUSIONS The findings from this study contribute to a foundation of understanding typical bilingual speech characteristics and motor development as well as schema in JC-English speakers. In particular, minimal acoustic duration differences were observed across the post-Creole continuum, a feature that may be attributed to the JC-English bilingual environment. 
SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.21760469.
Affiliation(s)
- Michelle León
  - Department of Communication Sciences & Disorders, University of Cincinnati, OH
- Karla N. Washington
  - Department of Communication Sciences & Disorders, University of Cincinnati, OH
  - Department of Speech-Language Pathology, University of Toronto, Ontario, Canada
  - Department of Communicative Sciences and Disorders, New York University, NY
- Victoria S. McKenna
  - Department of Communication Sciences & Disorders, University of Cincinnati, OH
  - Department of Biomedical Engineering, University of Cincinnati, OH
  - Department of Otolaryngology–Head and Neck Surgery, University of Cincinnati, OH
- Kathryn Crowe
  - School of Education, Charles Sturt University, Bathurst, New South Wales, Australia
  - School of Health Sciences, University of Iceland, Reykjavík
- Kristina Fritz
  - Department of Psychology, California State University Northridge, Los Angeles
- Suzanne Boyce
  - Department of Communication Sciences & Disorders, University of Cincinnati, OH
10
León M, Washington KN, McKenna VS, Crowe K, Fritz K. Linguistically Informed Acoustic and Perceptual Analysis of Bilingual Children's Speech Productions: An Exploratory Study in the Jamaican Context. Journal of Speech, Language, and Hearing Research: JSLHR 2022; 65:2490-2509. [PMID: 35858256 PMCID: PMC9584129 DOI: 10.1044/2022_jslhr-21-00386] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/15/2021] [Revised: 12/17/2021] [Accepted: 04/08/2022] [Indexed: 05/03/2023]
Abstract
PURPOSE The aim of this study was to characterize speech acoustics in bilingual preschoolers who speak Jamaican Creole (JC) and English. We compared a standard approach with a culturally responsive approach for characterizing speech sound productions. Preschoolers' speech productions were compared to adult models from the same linguistic community as a means of providing confirmatory evidence of typical speech patterns specific to JC-English speakers. METHOD Two protocols were applied to the data collected using the Diagnostic Evaluation of Articulation and Phonology (DEAP) Articulation subtest: (a) the standardized DEAP protocol and (b) a culturally and linguistically adapted protocol reflective of the Jamaican post-Creole (English to Creole) continuum. The protocols were used to analyze responses from JC-English-speaking preschoolers (n = 119) and adults (n = 15). Responses were analyzed using acoustic (voice onset time, whole-word duration, and vowel duration) and perceptual (percentage of consonants correct-revised and response frequencies) measures. RESULTS The culturally responsive protocol captured variation in the frequency and acoustic differences produced in the post-Creole continuum, with higher numbers of "other" responses compared to "standard" target responses for both children and adults. Adults' whole-word durations were shorter, and adults showed more consistent prevoicing during initial plosives, compared to the children. CONCLUSIONS Applying culturally responsive methods, including knowledge of the variation produced in the post-Creole continuum and with adult models from the same linguistic community, improved the ecological validity of speech characterizations for JC-English preschoolers. Acoustic properties of speech should be investigated further as a means of describing bilingual development and distinguishing between difference and disorder. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.20249382.
Affiliation(s)
- Michelle León
- Department of Communication Sciences and Disorders, University of Cincinnati, OH
- Karla N. Washington
- Department of Communication Sciences and Disorders, University of Cincinnati, OH
- Department of Communicative Sciences and Disorders, New York University, New York
- Victoria S. McKenna
- Department of Communication Sciences and Disorders, University of Cincinnati, OH
- Department of Biomedical Engineering, University of Cincinnati, OH
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati, OH
- Kathryn Crowe
- School of Education, Charles Sturt University, Bathurst, New South Wales, Australia
- School of Health Sciences, University of Iceland, Reykjavík
- Kristina Fritz
- Department of Psychology, California State University Northridge, Los Angeles
11
An Acoustic Feature-Based Deep Learning Model for Automatic Thai Vowel Pronunciation Recognition. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12136595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
In Thai, mispronouncing a vowel can change the meaning of a word completely, so effective and standardized practice is essential for learners who want to pronounce words as a native speaker would. Since the COVID-19 pandemic, online learning has become increasingly popular. For example, an online pronunciation application system was introduced that has virtual teachers and an intelligent process for evaluating students, similar to standardized training by a teacher in a real classroom. This research presents an online automatic computer-assisted pronunciation training (CAPT) system that uses deep learning to recognize Thai vowels in speech. The automatic CAPT system is developed to address the shortage of instruction specialists and the complexity of the vowel teaching process. It is a unique system that integrates computing techniques with linguistic theory. The deep learning model, which recognizes the pronounced vowels, is the most significant part of the automatic CAPT system. The major challenge in Thai vowel recognition is the correct identification of Thai vowels when spoken in real-world situations. A convolutional neural network (CNN), a deep learning model, is developed and applied to the classification of pronounced Thai vowels. A new dataset of Thai vowels was designed, collected, and examined by linguists. The optimal CNN model with Mel spectrogram (MS) input achieves the highest accuracy, 98.61%, compared with 94.44% for Mel frequency cepstral coefficients (MFCC) with the baseline long short-term memory (LSTM) model and 90.00% for MS with the baseline LSTM model.
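As a rough illustration of the Mel spectrogram input mentioned in the abstract, the sketch below converts frequencies to the Mel scale and spaces filter centre frequencies evenly on it. It assumes the common HTK-style formula mel = 2595 log10(1 + f/700); the abstract does not state which Mel variant, sample rate, or number of bands the authors used, so the function names and the values of 40 bands and 8000 Hz are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mel (HTK-style formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_bands, f_min, f_max):
    """Centre frequencies (Hz) of n_bands filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Hypothetical settings: 40 bands between 0 Hz and 8000 Hz.
centers = mel_center_frequencies(40, 0.0, 8000.0)
```

The centres are densely packed at low frequencies and spread out at high frequencies, which is what gives a Mel spectrogram finer resolution in the region where vowel formants lie.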
12
Chenausky KV, Tager-Flusberg H. The importance of deep speech phenotyping for neurodevelopmental and genetic disorders: a conceptual review. J Neurodev Disord 2022; 14:36. [PMID: 35690736 PMCID: PMC9188130 DOI: 10.1186/s11689-022-09443-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/30/2021] [Accepted: 05/06/2022] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Speech is the most common modality through which language is communicated, and delayed, disordered, or absent speech production is a hallmark of many neurodevelopmental and genetic disorders. Yet, speech is not often carefully phenotyped in neurodevelopmental disorders. In this paper, we argue that such deep phenotyping, defined as phenotyping that is specific to speech production and not conflated with language or cognitive ability, is vital if we are to understand how genetic variations affect the brain regions that are associated with spoken language. Speech is distinct from language, though the two are related behaviorally and share neural substrates. We present a brief taxonomy of developmental speech production disorders, with particular emphasis on the motor speech disorders childhood apraxia of speech (a disorder of motor planning) and childhood dysarthria (a set of disorders of motor execution). We review the history of discoveries concerning the KE family, in whom a hereditary form of communication impairment was identified as childhood apraxia of speech and linked to dysfunction in the FOXP2 gene. The story demonstrates how instrumental deep phenotyping of speech production was in this seminal discovery in the genetics of speech and language. There is considerable overlap between the neural substrates associated with speech production and with FOXP2 expression, suggesting that further genes associated with speech dysfunction will also be expressed in similar brain regions. We then show how a biologically accurate computational model of speech production, in combination with detailed information about speech production in children with developmental disorders, can generate testable hypotheses about the nature, genetics, and neurology of speech disorders. 
CONCLUSIONS Though speech and language are distinct, specific types of developmental speech disorder are associated with far-reaching effects on verbal communication in children with neurodevelopmental disorders. Therefore, detailed speech phenotyping, in collaboration with experts on pediatric speech development and disorders, can lead us to a new generation of discoveries about how speech development is affected in genetic disorders.
Affiliation(s)
- Karen V Chenausky
- Speech in Autism and Neurodevelopmental Disorders Lab, Massachusetts General Hospital Institute of Health Professions, 36 1st Avenue, Boston, MA, 02129, USA.
- Department of Neurology, Harvard Medical School, Boston, USA.
- Department of Psychological and Brain Sciences, Boston University, Boston, USA.
13
Olmstead AJ, Lee J, Chen J. Perceptual Learning of Altered Vowel Space Improves Identification of Vowels Produced by Individuals With Dysarthria Secondary to Amyotrophic Lateral Sclerosis. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2022; 65:2204-2214. [PMID: 35623135 DOI: 10.1044/2022_jslhr-21-00567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
PURPOSE This study examines the efficacy of perceptual training for improving typical listeners' identification of vowels produced by individuals with dysarthria. We examined whether training on a subset of vowels can generalize to (a) untrained vowels and (b) other speakers with similar overall intelligibility. METHOD Sixty naive listeners completed a pretest/posttest perceptual learning task. In the pretraining test and posttraining test, participants identified nine American English monophthongs produced by two speakers with dysarthria secondary to amyotrophic lateral sclerosis (ALS). In the 20-min training task, a two-alternative forced choice (2AFC) task with feedback trained listeners on a subset of the vowels and speakers presented in the pretraining test. RESULTS Vowel identification accuracy improved overall as a function of training. However, patterns of generalization between speakers and vowel types were not symmetric. Specifically, listeners generalized training from front vowels to back vowels but not vice versa. Likewise, listeners generalized from one speaker to another but not in the opposite direction. Examination of confusion matrices for the pretraining and posttraining revealed complex patterns of vowel-specific improvement. CONCLUSIONS This study demonstrates that listeners benefit from a very simple training paradigm targeting vowels. Additionally, error patterns revealed that vowels are both resistant to and responsive to perceptual learning. Implications for future research and clinical training paradigms are discussed.
Affiliation(s)
- Annie J Olmstead
- Department of Communication Sciences and Disorders, The Pennsylvania State University, University Park
- Jimin Lee
- Department of Communication Sciences and Disorders, The Pennsylvania State University, University Park
- Janice Chen
- Department of Communication Sciences and Disorders, The Pennsylvania State University, University Park
14
Roepke E, Brosseau-Lapré F. Vowel errors produced by preschool-age children on a single-word test of articulation. CLINICAL LINGUISTICS & PHONETICS 2021; 35:1161-1183. [PMID: 33459085 PMCID: PMC8285462 DOI: 10.1080/02699206.2020.1869834] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/15/2020] [Revised: 12/20/2020] [Accepted: 12/23/2020] [Indexed: 06/12/2023]
Abstract
Eighty-four children, age 4-5 years, with and without speech sound disorder (SSD) completed a battery of standardized speech and language tests, including the Goldman-Fristoe Test of Articulation, Third Edition (GFTA-3). Children with SSD produced more vowel errors than children with typical speech abilities. Percentage vowels correct and consonant error variability were highly correlated, suggesting that poorly specified phonological representations affect both consonants and vowels within a child's phonological system. However, the GFTA-3 did not contain sufficient target words to determine full vowel inventory. Using words from the GFTA-3, we present a case study of a child with vowel errors along with a sample analysis of these errors, primarily in terms of consonant-vowel feature interactions. Children who exhibit vowel errors on standardized single-word tests of speech accuracy may benefit from further vowel probes to determine how vowel and consonant errors interact in their phonological systems for more targeted therapy.
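The percentage-vowels-correct measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual definition (vowels produced correctly divided by vowels attempted, times 100) and a position-by-position comparison of already-aligned target and produced transcriptions. The function name, the ARPAbet-style vowel labels, and the sample words are hypothetical and are not taken from the GFTA-3 or from the study's scoring procedure.

```python
# Illustrative ARPAbet-style vowel labels (an assumption, not the study's symbol set).
VOWELS = {"iy", "ih", "ey", "eh", "ae", "aa", "ao", "ow", "uh", "uw", "ah", "ax"}


def percent_vowels_correct(pairs):
    """Return 100 * correct vowels / attempted vowels.

    pairs: iterable of (target, produced) phoneme lists that are already
    aligned position by position and therefore of equal length.
    """
    attempted = 0
    correct = 0
    for target, produced in pairs:
        if len(target) != len(produced):
            raise ValueError("target and produced transcriptions must be aligned")
        for t, p in zip(target, produced):
            if t in VOWELS:
                attempted += 1
                if t == p:
                    correct += 1
    if attempted == 0:
        raise ValueError("no vowel targets in the sample")
    return 100.0 * correct / attempted


# Hypothetical sample: four words, one vowel substitution.
sample = [
    (["k", "ae", "t"], ["k", "ae", "t"]),    # vowel correct
    (["b", "eh", "d"], ["b", "ih", "d"]),    # vowel substituted
    (["s", "uw", "p"], ["t", "uw", "p"]),    # consonant error only
    (["f", "ih", "sh"], ["f", "ih", "s"]),   # consonant error only
]
pvc = percent_vowels_correct(sample)
```

Consonant errors do not affect the score here, which is why a separate consonant measure is needed to study how the two interact.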
Affiliation(s)
- Elizabeth Roepke
- Department of Speech, Language and Hearing Sciences, Purdue University, West Lafayette, Indiana, USA
- Françoise Brosseau-Lapré
- Department of Speech, Language and Hearing Sciences, Purdue University, West Lafayette, Indiana, USA
15
Hidalgo-De la Guía I, Garayzábal-Heinze E, Gómez-Vilda P, Martínez-Olalla R, Palacios-Alonso D. Acoustic Analysis of Phonation in Children With Smith-Magenis Syndrome. Front Hum Neurosci 2021; 15:661392. [PMID: 34149380 PMCID: PMC8209519 DOI: 10.3389/fnhum.2021.661392] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/30/2021] [Accepted: 04/27/2021] [Indexed: 11/13/2022] Open
Abstract
Complex simultaneous neuropsychophysiological mechanisms are responsible for the processing of the information to be transmitted and for the neuromotor planning of the articulatory organs involved in speech. The nature of this set of mechanisms is closely linked to the clinical state of the subject. Thus, for example, in populations with neurodevelopmental deficits, these underlying neuropsychophysiological procedures are deficient and determine their phonation. Most of these cases with neurodevelopmental deficits are due to a genetic abnormality, as is the case in the population with Smith–Magenis syndrome (SMS). SMS is associated with neurodevelopmental deficits, intellectual disability, and a cohort of characteristic phenotypic features, including voice quality, which does not seem to be in line with the gender, age, and complexion of the diagnosed subject. The phonatory profile and speech features in this syndrome are dysphonia, high f0, excess vocal muscle stiffness, fluency alterations, numerous syllabic simplifications, phoneme omissions, and unintelligibility of speech. This exploratory study investigates whether the neuromotor deficits in children with SMS adversely affect phonation as compared to typically developing children without neuromotor deficits, which has not been previously determined. The authors compare the phonatory performance of a group of children with SMS (N = 12) with a healthy control group of children (N = 12) matched in age and gender and grouped into two age ranges. The first group ranges from 5 to 7 years old, and the second group from 8 to 12 years old. Group differences were determined for two forms of acoustic analysis performed on repeated recordings of the sustained vowel /a/: F1 and F2 extraction and cepstral peak prominence (CPP). It is expected that the results will shed light on the underlying neuromotor aspects of phonation in the SMS population. These findings could provide evidence of the susceptibility of phonation of speech to neuromotor disturbances, regardless of their origin.
Affiliation(s)
- Pedro Gómez-Vilda
- Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain
- Daniel Palacios-Alonso
- Escuela Técnica Superior de Ingeniería Informática, Universidad Rey Juan Carlos, Madrid, Spain
16
Kent RD. Developmental Functional Modules in Infant Vocalizations. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:1581-1604. [PMID: 33861626 DOI: 10.1044/2021_jslhr-20-00703] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 05/27/2023]
Abstract
Purpose Developmental functional modules (DFMs) are biological modules that are defined by their structural (morphological), functional, or developmental elements, and, in some cases, all three of these. This review article considers the hypothesis that vocal development in the first year of life can be understood in large part with respect to DFMs that characterize the speech production system. Method Literature is reviewed on relevant embryology, orofacial reflexes, craniofacial muscle properties, stages of vocal development, and related topics to identify candidates for DFMs. Results The following DFMs are identified and described: laryngeal, pharyngo-laryngeal, mandibular, velopharyngeal, labial complex, and lingual complex. These DFMs and their submodules, considered along with phenomena such as rhythmic movements, account for several well-documented features of vocal development in the first year of life. The proposed DFMs, rooted in embryologic, histologic, and kinematic properties, serve as low-dimensional control variables for the developing vocal tract. Each DFM is semi-autonomous but interacts with other DFMs to produce patterns of vocal behavior. Discussion Considered in relation to contemporary profiles and models of vocal development in the first year of life, DFMs have interpretive and explanatory value. DFMs complement other approaches in the study of infant vocalizations and are grounded in biology.
Affiliation(s)
- Ray D Kent
- Department of Communication Sciences & Disorders, University of Wisconsin-Madison
17
Kent RD, Eichhorn J, Wilson EM, Suk Y, Bolt DM, Vorperian HK. Auditory-Perceptual Features of Speech in Children and Adults With Down Syndrome: A Speech Profile Analysis. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:1157-1175. [PMID: 33789057 PMCID: PMC8608145 DOI: 10.1044/2021_jslhr-20-00617] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/20/2020] [Revised: 01/02/2021] [Accepted: 01/03/2021] [Indexed: 05/17/2023]
Abstract
Purpose The aim of this study was to determine how the speech disorder profiles in Down syndrome (DS) relate to reduced intelligibility, atypical overall quality, and impairments in the subsystems of speech production (phonation, articulation, resonance, and prosody). Method Auditory-perceptual ratings of intelligibility, overall quality, and features associated with the subsystems of speech production were obtained from recordings of 79 children and adults with DS. Ratings were made for sustained vowels (62 of 79 speakers) and short sentences (79 speakers). The data were analyzed to determine the severity of the affected features in each speaking task and to detect patterns in the group data by means of principal components analysis. Results Reduced intelligibility was noted in 90% of the speakers, and atypical overall speech quality was noted in 100%. Affected speech features were distributed across the speech production subsystems. Principal components analysis revealed four components each for the vowel and sentence tasks, showing that individuals with DS are not homogeneous in the features of their speech disorder. Discussion The speech disorder in DS is complex in its perceptual features and reflects impairments across the subsystems of speech production, but the pattern is not uniform across individuals, indicating that attention must be given to individual variation in designing treatments.
18
Mogren Å, McAllister A, Sjögreen L. Range of motion (ROM) in the lips and jaw during vowels assessed with 3D motion analysis in Swedish children with typical speech development and children with speech sound disorders. LOGOP PHONIATR VOCO 2021; 47:219-229. [PMID: 33660562 DOI: 10.1080/14015439.2021.1890207] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Abstract
PURPOSE The aim was to compare movement patterns of lips and jaw in lateral, vertical and anteroposterior directions during vowel production in children with typical speech development (TSD) and in children with speech sound disorders (SSD) persisting after the age of six. METHODS A total of 93 children were included, 42 children with TSD (6:0-12:2 years, mean age 8:9 ± 1:5, 19 girls and 23 boys) and 51 children with SSD (6:0-16:7 years, mean age 8:5 ± 3:0, 14 girls and 37 boys). Range of motion (ROM) in lips and jaw in the vowels [a, ʊ, ɪ] produced in a syllable repetition task and median values in resting position were measured with a system for 3D motion analysis. The analysis was based on the coordinates for the mouth corners and the chin centre. RESULTS There were significant differences between the groups on movements in lateral direction in both lips and jaw. Children with TSD had generally smaller and more symmetrical movements in the lips and jaw in all three dimensions compared to children with SSD. There were no significant differences between the groups in resting position. CONCLUSION Children with SSD persisting after the age of six years show more asymmetrical and more variable movement patterns in lips and jaw during vowel production compared with children with TSD in a simple syllable repetition task. Differences were more pronounced in lateral direction in both lips and jaw.
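As a rough sketch of the range-of-motion measure described in the abstract, the following computes per-axis ROM for one marker as the difference between its maximum and minimum coordinate across frames. The abstract does not give the exact definition used by the 3D motion analysis system, so the max-minus-min definition, the axis order (lateral, vertical, anteroposterior), the millimetre units, and the sample coordinates are all assumptions.

```python
def range_of_motion(frames):
    """Per-axis range of motion for a single marker.

    frames: list of (x, y, z) positions, one per time sample, with axes
    assumed to be lateral, vertical, and anteroposterior.
    Returns a tuple (lateral, vertical, anteroposterior) of max - min values.
    """
    if not frames:
        raise ValueError("need at least one frame")
    xs, ys, zs = zip(*frames)
    return tuple(max(axis) - min(axis) for axis in (xs, ys, zs))


# Hypothetical chin-centre trajectory (mm) during one syllable repetition.
chin = [
    (0.0, -50.0, 10.0),
    (0.5, -58.0, 11.0),
    (-0.5, -62.0, 12.5),
    (0.0, -51.0, 10.5),
]
rom = range_of_motion(chin)
```

Applying the same function to the left and right mouth-corner markers and comparing the results would give one simple way to look at the asymmetry the abstract reports.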
Affiliation(s)
- Åsa Mogren
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Mun-H-Center, Orofacial Resource Centre for Rare Diseases, Public Dental Service, Gothenburg, Sweden
- Anita McAllister
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Functional Area Speech and Language Pathology, Karolinska University Hospital, Stockholm, Sweden
- Lotta Sjögreen
- Mun-H-Center, Orofacial Resource Centre for Rare Diseases, Public Dental Service, Gothenburg, Sweden
- Department of Health and Rehabilitation, Sahlgrenska Academy, Institute of Neuroscience and Physiology, Gothenburg, Sweden