1
Rogers HP, Hseu A, Kim J, Silberholz E, Jo S, Dorste A, Jenkins K. Voice as a Biomarker of Pediatric Health: A Scoping Review. Children (Basel) 2024; 11:684. [PMID: 38929263] [PMCID: PMC11201680] [DOI: 10.3390/children11060684]
Abstract
The human voice has the potential to serve as a valuable biomarker for the early detection, diagnosis, and monitoring of pediatric conditions. This scoping review synthesizes the current knowledge on the application of artificial intelligence (AI) in analyzing pediatric voice as a biomarker for health. The included studies featured voice recordings from pediatric populations aged 0-17 years, utilized feature extraction methods, and analyzed pathological biomarkers using AI models. Data from 62 studies were extracted, encompassing study and participant characteristics, recording sources, feature extraction methods, and AI models. Data from 39 models across 35 studies were evaluated for accuracy, sensitivity, and specificity. The review showed a global representation of pediatric voice studies, with a focus on developmental, respiratory, speech, and language conditions. The most frequently studied conditions were autism spectrum disorder, intellectual disabilities, asphyxia, and asthma. Mel-Frequency Cepstral Coefficients were the most utilized feature extraction method, while Support Vector Machines were the predominant AI model. The analysis of pediatric voice using AI demonstrates promise as a non-invasive, cost-effective biomarker for a broad spectrum of pediatric conditions. Further research is necessary to standardize the feature extraction methods and AI models utilized for the evaluation of pediatric voice as a biomarker for health. Standardization has significant potential to enhance the accuracy and applicability of these tools in clinical settings across a variety of conditions and voice recording types. Further development of this field has enormous potential for the creation of innovative diagnostic tools and interventions for pediatric populations globally.
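The MFCC-plus-SVM pipeline that this review found to dominate the literature can be made concrete in a few lines. The following is an illustrative sketch, not any reviewed study's code: it computes MFCCs from scratch with NumPy/SciPy and trains a scikit-learn SVM on synthetic tones standing in for voice recordings; every parameter choice (sample rate, filter counts, the two tone "classes") is an assumption for demonstration.

```python
import numpy as np
from scipy.fftpack import dct
from sklearn.svm import SVC

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Mel-Frequency Cepstral Coefficients: frame -> power spectrum -> mel filterbank -> log -> DCT."""
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    hz2mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel2hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    edges = np.floor((n_fft + 1) * mel2hz(np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):  # triangular mel filters
        l, c, r = edges[i], edges[i + 1], edges[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return dct(np.log(power @ fbank.T + 1e-10), axis=1, norm="ortho")[:, :n_ceps]

# Synthetic stand-in data: two acoustic "classes" (low vs. high fundamental frequency)
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
X, y = [], []
for label, f0 in [(0, 220.0), (1, 880.0)]:
    for _ in range(20):
        sig = np.sin(2 * np.pi * f0 * t) + 0.1 * rng.standard_normal(t.size)
        X.append(mfcc(sig).mean(axis=0))  # one mean-MFCC feature vector per "recording"
        y.append(label)
X, y = np.array(X), np.array(y)
train = np.r_[0:15, 20:35]  # 15 recordings of each class for training
test = np.r_[15:20, 35:40]
clf = SVC(kernel="rbf").fit(X[train], y[train])
acc = clf.score(X[test], y[test])
```

Real studies would replace the synthetic tones with labeled pediatric voice recordings and report accuracy, sensitivity, and specificity on held-out data.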
Affiliation(s)
- Hannah Paige Rogers
- Department of Cardiology, Boston Children’s Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
- Anne Hseu
- Department of Otolaryngology, Boston Children’s Hospital, 333 Longwood Ave, Boston, MA 02115, USA
- Jung Kim
- Department of Pediatrics, Boston Children’s Hospital, Boston, MA 02115, USA
- Stacy Jo
- Department of Otolaryngology, Boston Children’s Hospital, 333 Longwood Ave, Boston, MA 02115, USA
- Anna Dorste
- Boston Children’s Hospital, 300 Longwood Avenue, Boston, MA 02115, USA
- Kathy Jenkins
- Department of Cardiology, Boston Children’s Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
2
Oberloier S, Whisman NG, Hafting F, Pearce JM. Open source framework for a Broadly Expandable and Reconfigurable data acquisition and automation device (BREAD). HardwareX 2023; 15:e00467. [PMID: 37711733] [PMCID: PMC10498007] [DOI: 10.1016/j.ohx.2023.e00467]
Abstract
Though open source data acquisition (DAQ) systems have been published, closed source proprietary systems remain the standard despite often being prohibitively expensive. High costs, however, limit access to high-quality DAQ in low-resource settings. In many cases the functions executed by closed source, proprietary DAQ cards could be carried out by an open source alternative; however, as the desired function count increases, the simplicity of integrating the designs decreases substantially. Although the global library of open source electronic designs is expanding rapidly, and there is clear evidence that they can reduce costs for scientists one device at a time, these designs are generally made to carry out a single function well and are often not capable of scaling up or being easily integrated with other designs. Just as other open source projects have found success by having modular frameworks and clearly documented specifications, a framework to unify and enable interoperation of these open source electronics systems would greatly benefit the scientific community. To meet these needs and ensure greater accessibility to high-quality electronics sensing and DAQ systems, this article shares and tests a new framework in which open source electronics can be developed with plug-and-play functionality. The Broadly Reconfigurable and Expandable Automation Device (BREAD) consists of a basic set of guidelines and requirements to which others can contribute. Here, seven slices (boards) are provided, demonstrated, and validated: 1) Amplified Analog Input, 2) Audio Analysis/Fourier Transform, 3) ±10 A Current Sensor, 4) 4-Channel Relay Controller, 5) 4-Channel Stepper Motor Controller, 6) 4-Channel Type-K Thermocouple Reader, and 7) 2-Channel USB Port. Implementing systems using BREAD rather than closed source, proprietary alternatives can result in cost savings of up to 93%.
Affiliation(s)
- Shane Oberloier
- Department of Electrical & Computer Engineering, Michigan Technological University, Houghton, MI 49931, USA
- Nicholas G. Whisman
- Department of Electrical & Computer Engineering, Michigan Technological University, Houghton, MI 49931, USA
- Finn Hafting
- Department of Electrical & Computer Engineering, Western University, London, ON, Canada
- Joshua M. Pearce
- Department of Electrical & Computer Engineering, Western University, London, ON, Canada
3
Chang CH, Lu CT, Chen TL, Huang WT, Torng PC, Chang CW, Chen YC, Yu YL, Chuang YN. The association of bisphenol A and paraben exposure with sensorineural hearing loss in children. Environ Sci Pollut Res Int 2023; 30:100552-100561. [PMID: 37635162] [DOI: 10.1007/s11356-023-29426-4]
Abstract
Bisphenol A (BPA) and parabens (PBs) are chemicals that are extensively used in personal care products (PCPs). In early childhood development, hearing is critical to speech and language development, communication, and learning. In vitro and in vivo, BPA and PBs have exhibited neurotoxicity through elevated levels of oxidative stress, and BPA also has the potential to be an ototoxicant. Therefore, this study aimed to determine the association of exposure to BPA/PBs with sensorineural hearing loss in children. A cross-sectional study based on hearing tests was conducted, enrolling 320 children aged 6-12 years from elementary school. Urinary BPA and PB concentrations were analyzed using liquid chromatography-tandem mass spectrometry (LC-MS/MS), and logistic regression models were employed to determine the association of BPA/PB exposure with sensorineural hearing loss. Children with sensorineural hearing loss had higher BPA concentrations than normal-hearing children (0.22 ng/ml vs. 0.10 ng/ml, p = 0.05). After adjustment for covariates, the odds of hearing loss at middle frequencies increased 1.83-fold (95% CI: 1.12-2.99) per log10 increase in BPA concentration, and the odds of slight hearing loss increased 2.24-fold (95% CI: 1.05-4.78) per tenfold increase in ethyl paraben (EP) concentration. This study clarifies the role of exposure to BPA/PBs in hearing loss in children. Future research should expand to cohort designs and nationwide studies to establish causality.
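The per-log10 odds ratios reported above come from ordinary logistic regression on log-transformed urinary concentrations. As a hedged illustration only (simulated data, with an assumed true odds ratio loosely matching the abstract; not the study's dataset or covariate set), the modeling step looks like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 5000
log10_bpa = rng.normal(-0.7, 0.5, n)  # simulated log10 urinary BPA (ng/ml); parameters are invented
true_or = 1.83                        # assumed odds ratio per tenfold (1 log10) increase
logit = -1.0 + np.log(true_or) * log10_bpa
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # 1 = sensorineural hearing loss

# very large C makes this essentially unpenalized maximum-likelihood logistic regression
model = LogisticRegression(C=1e6).fit(log10_bpa.reshape(-1, 1), y)
est_or = float(np.exp(model.coef_[0, 0]))  # estimated odds ratio per 1 log10 unit
```

Exponentiating the fitted coefficient recovers the odds ratio; in the actual study this regression additionally adjusts for covariates, which would enter as extra columns of the design matrix.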
Affiliation(s)
- Chia-Huang Chang
- School of Public Health, Taipei Medical University, Taipei, Taiwan
- Chun-Ting Lu
- School of Public Health, Taipei Medical University, Taipei, Taiwan
- Tai-Ling Chen
- Department of Otorhinolaryngology, Taipei City Hospital, Ren-Ai Branch, Taipei, Taiwan
- Wen-Tzu Huang
- School of Public Health, Taipei Medical University, Taipei, Taiwan
- Pao-Chuan Torng
- Department of Speech-Language Pathology and Audiology, National Taipei University of Nursing and Health Sciences, Taipei, Taiwan
- Chen-Wei Chang
- Department of Speech-Language Pathology and Audiology, National Taipei University of Nursing and Health Sciences, Taipei, Taiwan
- Yu-Chun Chen
- School of Psychology, Speech and Hearing, University of Canterbury, Christchurch, New Zealand
- Yu-Lin Yu
- Department of Otorhinolaryngology, Taipei City Hospital, Ren-Ai Branch, Taipei, Taiwan
- Yung-Ning Chuang
- Master Program in Food Safety, College of Nutrition, Taipei Medical University, Taipei, Taiwan
4
Perry LK, Mitsven SG, Custode S, Vitale L, Laursen B, Song C, Messinger DS. Reciprocal Patterns of Peer Speech in Preschoolers with and without Hearing Loss. Early Child Res Q 2022; 60:201-213. [PMID: 35273424] [PMCID: PMC8903181] [DOI: 10.1016/j.ecresq.2022.02.003]
Abstract
Children with hearing loss often attend inclusive preschool classrooms aimed at improving their spoken language skills. Although preschool classrooms are fertile environments for vocal interaction with peers, little is known about the dyadic processes that influence children's speech to one another and foster their language abilities and how these processes may vary in children with hearing loss. We used new objective measurement approaches to identify and quantify children's vocalizations during social contact, as determined by children's proximity and mutual orientation. The contributions of peer vocalizations to children's future vocalizations and language abilities were examined in oral language inclusion classrooms containing children with hearing loss who use hearing aids or cochlear implants and their typically hearing peers. Across over 600 hours of recorded vocal interactions of twenty-nine 2.5-3.5 year olds (16 girls) in three cohorts of children in a classroom, we found that vocalizations from each peer on a given observation predicted a child's vocalizations to that same peer on the subsequent observation. Children who produced more vocalizations to their peers had higher receptive and expressive language abilities, as measured by a standardized end-of-year language assessment. In fact, vocalizations from peers had an indirect association with end-of-year language abilities as mediated by children's vocalizations to peers. These findings did not vary as a function of hearing status. Overall, then, the results demonstrate the importance of dyadic peer vocal interactions for children's language use and abilities.
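The "indirect association ... as mediated by children's vocalizations" is a classic product-of-coefficients mediation: peer speech → child speech (path a), child speech → language (path b), indirect effect = a·b. A minimal sketch with simulated data (the effect sizes, noise levels, and variable names are our assumptions, not the study's):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 2000
peer_voc = rng.normal(0, 1, n)                    # X: vocalizations received from peers
child_voc = 0.6 * peer_voc + rng.normal(0, 1, n)  # M: child's vocalizations to peers (mediator)
language = 0.5 * child_voc + rng.normal(0, 1, n)  # Y: end-of-year language score

coef = lambda X, y: LinearRegression().fit(X, y).coef_
a = coef(peer_voc[:, None], child_voc)[0]                      # path a: X -> M
b = coef(np.column_stack([child_voc, peer_voc]), language)[0]  # path b: M -> Y, controlling for X
indirect = a * b  # indirect (mediated) effect of peer speech on language
```

In practice the mediation estimate would be accompanied by a bootstrap confidence interval; with the simulated coefficients above the true indirect effect is 0.6 × 0.5 = 0.3.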
Affiliation(s)
- Brett Laursen
- Department of Psychology, Florida Atlantic University
- Daniel S. Messinger
- Department of Psychology, University of Miami
- Department of Pediatrics, Department of Electrical & Computer Engineering, Department of Music Engineering, University of Miami
5
Abstract
OBJECTIVES This systematic review is designed to (a) describe measures used to quantify vocal development in pediatric cochlear implant (CI) users, (b) synthesize the evidence on prelinguistic vocal development in young children before and after cochlear implantation, and (c) analyze the application of the current evidence for evaluating change in vocal development before and after cochlear implantation for young children. Investigations of prelinguistic vocal development after cochlear implantation are only beginning to uncover the expected course of prelinguistic vocal development in children with CIs and what factors influence that course, which varies substantially across pediatric CI users. A deeper understanding of prelinguistic vocal development will improve professionals' abilities to determine whether a child with a CI is exhibiting sufficient progress soon after implantation and to adjust intervention as needed. DESIGN We systematically searched PubMed, ProQuest, and CINAHL databases for primary reports of children who received a CI before 5 years 0 months of age that included at least one measure of nonword, nonvegetative vocalizations. We also completed supplementary searches. RESULTS Of the 1916 identified records, 59 met inclusion criteria. The included records included 1125 total participants, which came from 36 unique samples. Records included a median of 8 participants and rarely included children with disabilities other than hearing loss. Nearly all of the records met criteria for level 3 for quality of evidence on a scale of 1 (highest) to 4 (lowest). Records utilized a wide variety of vocalization measures but often incorporated features related to canonical babbling. The limited evidence from pediatric CI candidates before implantation suggests that they are likely to exhibit deficits in canonical syllables, a critical vocal development skill, and phonetic inventory size. 
Following cochlear implantation, multiple studies report similar patterns of growth but faster rates of producing canonical syllables in children with CIs than in peers with comparable durations of robust hearing. However, caution is warranted because these vocal development skills still emerge at older chronological ages in children with CIs than in chronological-age peers with typical hearing. CONCLUSIONS Despite including a relatively large number of records, the evidence in this review regarding changes in vocal development before and after cochlear implantation in young children remains limited. A deeper understanding is needed of when prelinguistic skills are expected to develop, of the factors that explain deviation from that course, and of the long-term impacts of variations in prelinguistic vocal development. The diverse and dynamic nature of the relatively small population of pediatric CI users, as well as relatively new vocal development measures, presents challenges for documenting and predicting vocal development in pediatric CI users before and after cochlear implantation. Synthesizing results across multiple institutions and completing rigorous studies with theoretically motivated, falsifiable research questions will address a number of challenges for understanding prelinguistic vocal development in children with CIs and its relations with other current and future skills. Clinical implications include the need to measure prelinguistic vocalizations regularly and systematically to inform intervention planning.
6
Sola AM, Brodie KD, Stephans J, Scarpelli C, Chan DK. Tracking Home Language Production and Environment in Children Who Are Deaf or Hard of Hearing. Otolaryngol Head Neck Surg 2021; 166:171-178. [PMID: 34032520] [DOI: 10.1177/01945998211013785]
Abstract
OBJECTIVE To use an automated speech-processing technology to identify patterns in sound environments and language output for deaf or hard-of-hearing infants and toddlers. STUDY DESIGN Observational study based on a convenience sample. SETTING Home observation conducted by tertiary children's hospital. METHODS The system analyzed 115 naturalistic recordings of 28 children <3.5 years old. Hearing ability was stratified into groups by access to sound. Outcomes were compared across hearing groups, and multivariable linear regression was used to test associations. RESULTS There was a significant difference in age-adjusted child vocalizations (P = .042), conversational turns (P = .022), and language development scores (P = .05) between hearing groups but no significant difference in adult words (P = .11). Conversational turns were positively associated with each language development measure, while adult words were not. For each hour of electronic media, there were significant reductions in child vocalizations (β = -0.47; 95% CI, -0.71 to -0.19), conversational turns (β = -0.45; 95% CI, -0.65 to -0.22), and language development (β = -0.37; 95% CI, -0.61 to -0.15). CONCLUSIONS Conversational turn scores differ among hearing groups and are positively associated with language development outcomes. Electronic media is associated with reduced discernible adult speech, child vocalizations, conversational turns, and language development scores. This effect was larger in children who are deaf or hard of hearing as compared with other reports in typically hearing populations. These findings underscore the need to optimize early language environments and limit electronic noise exposure in children who are deaf or hard of hearing.
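The β coefficients above come from multivariable linear regression of a standardized language outcome on predictors such as conversational turns and hours of electronic media. A sketch on synthetic data (the sample size echoes the 115 recordings, but the effect sizes, distributions, and variable names are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 115  # one row per naturalistic daylong recording (count from the abstract; data simulated)
turns = rng.poisson(400, n).astype(float)   # conversational turns per recording
media = rng.uniform(0, 6, n)                # hours of electronic media per recording
z = lambda x: (x - x.mean()) / x.std()
# standardized language score: turns help, electronic media hurts (plus noise)
language = 0.5 * z(turns) - 0.4 * z(media) + rng.normal(0, 0.5, n)

# standardized betas, directly comparable to the reported coefficients
beta = LinearRegression().fit(np.column_stack([z(turns), z(media)]), language).coef_
```

Because both predictors are z-scored, the fitted coefficients are standardized βs: a positive β for turns and a negative β for media mirror the direction of the study's findings.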
Affiliation(s)
- Ana Marija Sola
- School of Medicine, University of California-San Francisco, San Francisco, California, USA
- Kara D Brodie
- Department of Otolaryngology-Head and Neck Surgery, University of California-San Francisco, San Francisco, California, USA
- Jihyun Stephans
- Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, University of California-San Francisco, San Francisco, California, USA
- Chiara Scarpelli
- Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, University of California-San Francisco, San Francisco, California, USA
- Dylan K Chan
- Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, University of California-San Francisco, San Francisco, California, USA
7
Abstract
We explore here the application of modern computer hardware and software to the collection and analysis of behavioral data. We discuss the issues of ecological validity, storage and processing, data permanence, automation, validity, and algorithmic determinism. Taking the modern landscape into account, we then present four research projects we have recently undertaken as proofs of concept of the viability and utility of this approach: work on child-directed speech; the application of automatic methods to clinical populations, including children with hearing loss; quality control and the assessment of validity; and the sharing of data in a public database. We conclude by pointing out how the methodology described here can be extended to a wide variety of interdisciplinary and detailed projects that are likely to lead to better science and improved outcomes for populations served by the behavioral, social, and health sciences.
8
VanDam M, Yoshinaga-Itano C. Use of the LENA Autism Screen with Children who are Deaf or Hard of Hearing. Medicina (Kaunas) 2019; 55:E495. [PMID: 31426435] [PMCID: PMC6723169] [DOI: 10.3390/medicina55080495]
Abstract
Background and Objectives: This systematic review reports the evidence from the literature concerning the potential for using an automated vocal analysis, the Language ENvironment Analysis (LENA, LENA Research Foundation, Boulder, CO, USA) in the screening process for children at risk for autism spectrum disorder (ASD) and deaf or hard of hearing (D/HH). ASD and D/HH have increased comorbidity, but current behavioral diagnostic and screening tools have limitations. The LENA Language Autism Screen (LLAS) may offer an additional tool to disambiguate ASD from D/HH in young children. Materials and Methods: We examine empirical reports that use automatic vocal analysis methods to differentiate disordered from typically developing children. Results: Consensus across the sampled scientific literature shows support for use of automatic methods for screening and disambiguation of children with ASD and D/HH. There is some evidence of vocal differentiation between ASD, D/HH, and typically-developing children warranting use of the LLAS, but additional empirical evidence is needed to better understand the strengths and weaknesses of the tool. Conclusions: The findings reported here warrant further, more substantive, methodologically-sound research that is fully powered to show a reliable difference. Findings may be useful for both clinicians and researchers in better identification and understanding of communication disorders.
Affiliation(s)
- Mark VanDam
- Department of Speech & Hearing Sciences, Elson S. Floyd College of Medicine, Washington State University, Spokane, WA 99202, USA.
- Hearing Oral Program of Excellence (HOPE), Spokane, WA 99202, USA.
9
Bredin-Oja SL, Fielding H, Fleming KK, Warren SF. Clinician vs. Machine: Estimating Vocalizations Rates in Young Children With Developmental Disorders. Am J Speech Lang Pathol 2018; 27:1066-1072. [PMID: 29893787] [PMCID: PMC6195029] [DOI: 10.1044/2018_ajslp-17-0016]
Abstract
PURPOSE The purpose of this study was to investigate the reliability of an automated language analysis system, the Language Environment Analysis (LENA), compared with a human transcriber to determine the rate of child vocalizations during recording sessions that were significantly shorter than recommended for the automated device. METHOD Participants were 6 nonverbal male children between the ages of 28 and 46 months. Two children had autism diagnoses, 2 had Down syndrome, 1 had a chromosomal deletion, and 1 had developmental delay. Participants were recorded by the LENA digital language processor during 14 play-based interactions with a responsive adult. Rate of child vocalizations during each of the 84 recordings was determined by both a human transcriber and the LENA software. RESULTS A statistically significant difference between the 2 methods was observed for 4 of the 6 participants. Effect sizes were moderate to large. Variation in syllable structure did not explain the difference between the 2 methods. Vocalization rates from the 2 methods were highly correlated for 5 of the 6 participants. CONCLUSIONS Estimates of vocalization rates from nonverbal children produced by the LENA system differed from human transcription during sessions that were substantially shorter than the recommended recording length. These results confirm the recommendation of the LENA Foundation to record sessions of at least 1 hr.
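The study's two-part comparison (a paired test for systematic differences between methods, plus a correlation showing the methods still rank sessions alike) is straightforward to sketch. The counts below are simulated; the 1.25 overcount factor and noise level are inventions for illustration, not the study's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sessions = 84  # number of recordings in the study; the counts themselves are simulated
human = rng.poisson(60, sessions).astype(float)      # transcriber's vocalization counts
machine = 1.25 * human + rng.normal(0, 4, sessions)  # automated counts: biased but correlated

t_stat, p_paired = stats.ttest_rel(machine, human)  # systematic difference between methods?
r, _ = stats.pearsonr(machine, human)               # do the two methods rank sessions alike?
diff = machine - human
cohens_d = diff.mean() / diff.std(ddof=1)           # paired-samples effect size
```

This reproduces the abstract's pattern in miniature: a significant paired difference with a large effect size can coexist with a high between-method correlation.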
Affiliation(s)
- Heather Fielding
- The Schiefelbusch Institute of Life Span Studies, University of Kansas, Lawrence
- Kandace K. Fleming
- The Schiefelbusch Institute of Life Span Studies, University of Kansas, Lawrence
- Steven F. Warren
- The Schiefelbusch Institute of Life Span Studies, University of Kansas, Lawrence
10
Greenwood CR, Schnitz AG, Irvin D, Tsai SF, Carta JJ. Automated Language Environment Analysis: A Research Synthesis. Am J Speech Lang Pathol 2018; 27:853-867. [PMID: 29594313] [PMCID: PMC7242915] [DOI: 10.1044/2017_ajslp-17-0033]
Abstract
PURPOSE The Language Environment Analysis (LENA®) represents a breakthrough in automatic speech detection because it makes one's language environment, what adults and children actually hear and say, efficiently measurable. The purpose of this article was to examine (a) current dimensions of LENA research, (b) LENA's sensitivity to differences in populations and language environments, and (c) what has been achieved in closing the Word Gap. METHOD From electronic and human searches, 83 peer-reviewed articles using LENA were identified, and 53 met inclusionary criteria and were included in a systematic literature review. Each article reported results of 1 study. RESULTS Originally developed to make natural language research more efficient and feasible, systematic review identified a broad landscape of relevant LENA findings focused primarily on the environments and communications of young children but also older adults and teachers. LENA's automated speech indicators (adult input, adult-child interaction, and child production) and the audio environment were shown to meet high validity standards, including accuracy, sensitivity to individual differences, and differences in populations, settings, contexts within settings, speakers, and languages. Researchers' own analyses of LENA audio recordings have extended our knowledge of microlevel processes in adult-child interaction. To date, intervention research using LENA has consisted of small pilot experiments, primarily on the effects of brief parent education plus quantitative linguistic feedback to parents. CONCLUSION Evidence showed that automated analysis has made a place in the repertoire of language research and practice. Implications, limitations, and future research are discussed.
Affiliation(s)
- Alana G. Schnitz
- Juniper Gardens Children's Project, The University of Kansas, Kansas City
- Dwight Irvin
- Juniper Gardens Children's Project, The University of Kansas, Kansas City
- Judith J. Carta
- Juniper Gardens Children's Project, The University of Kansas, Kansas City
11
Ganek H, Eriks-Brophy A. Language ENvironment Analysis (LENA) system investigation of day long recordings in children: A literature review. J Commun Disord 2018; 72:77-85. [PMID: 29402382] [DOI: 10.1016/j.jcomdis.2017.12.005]
Abstract
The Language ENvironment Analysis (LENA) System is a relatively new recording technology that can be used to investigate typical child language acquisition and populations with language disorders. The purpose of this paper is to familiarize language acquisition researchers and speech-language pathologists with how the LENA System is currently being used in research. The authors outline issues in peer-reviewed research based on the device. Considerations when using the LENA System are discussed.
Affiliation(s)
- Hillary Ganek
- The Department of Speech-Language Pathology, University of Toronto, 500 University Ave. Toronto, ON, M5G 1V7, Canada.
- Alice Eriks-Brophy
- The Department of Speech-Language Pathology, University of Toronto, 500 University Ave., Toronto, ON, M5G 1V7, Canada.
12
Beckman ME, Plummer AR, Munson B, Reidy PF. Methods for eliciting, annotating, and analyzing databases for child speech development. Comput Speech Lang 2017; 45:278-299. [PMID: 28943715] [DOI: 10.1016/j.csl.2017.02.010]
Abstract
Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver-infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. 
Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of age-appropriate computational models.
Affiliation(s)
- Patrick F Reidy
- Callier Center for Communication Disorders, University of Texas at Dallas
13
Richards JA, Xu D, Gilkerson J, Yapanel U, Gray S, Paul T. Automated Assessment of Child Vocalization Development Using LENA. J Speech Lang Hear Res 2017; 60:2047-2063. [PMID: 28609511] [DOI: 10.1044/2017_jslhr-l-16-0157]
Abstract
PURPOSE To produce a novel, efficient measure of children's expressive vocal development on the basis of automatic vocalization assessment (AVA), child vocalizations were automatically identified and extracted from audio recordings using Language Environment Analysis (LENA) System technology. METHOD Assessment was based on full-day audio recordings collected in a child's unrestricted, natural language environment. AVA estimates were derived using automatic speech recognition modeling techniques to categorize and quantify the sounds in child vocalizations (e.g., protophones and phonemes). These were expressed as phone and biphone frequencies, reduced to principal components, and entered into age-based multiple linear regression models to predict independently collected criterion expressive language scores. From these models, we generated vocal development AVA estimates as age-standardized scores and development age estimates. RESULTS AVA estimates demonstrated strong statistical reliability and validity when compared with standard criterion expressive language assessments. CONCLUSIONS Automated analysis of child vocalizations extracted from full-day recordings in natural settings offers a novel and efficient means to assess children's expressive vocal development. More research remains to identify specific mechanisms of operation.
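The AVA pipeline described above (phone/biphone frequencies → principal components → regression against criterion scores → standardized estimates) can be sketched schematically. Everything below is simulated: the latent "development" factor, the number of phone categories, and the score scaling are our assumptions, standing in for LENA's proprietary models:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n_children, n_phones = 200, 40
latent = rng.normal(0, 1, n_children)       # latent vocal-development level (simulated)
weights = rng.normal(0, 1, n_phones)        # how each phone category loads on development
rates = np.clip(5 + 2 * np.outer(latent, weights), 0.5, None)
counts = rng.poisson(rates).astype(float)   # phone frequencies extracted from each child's recordings
criterion = 100 + 10 * latent + rng.normal(0, 3, n_children)  # criterion expressive language score

pcs = PCA(n_components=10).fit_transform(counts)        # reduce phone frequencies to principal components
pred = LinearRegression().fit(pcs, criterion).predict(pcs)  # regression against criterion scores
validity_r = np.corrcoef(pred, criterion)[0, 1]         # concurrent validity of the automated estimate
ava_standard = 100 + 15 * (pred - pred.mean()) / pred.std()  # standardized-score style output
```

In the actual system the regression models are fit per age band, and validity is evaluated against independently collected assessments rather than in-sample.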
Affiliation(s)
- Dongxin Xu
- LENA Research Foundation, Boulder, CO; University of Colorado, Boulder
- Jill Gilkerson
- LENA Research Foundation, Boulder, CO; University of Colorado, Boulder

14
Abney DH, Warlaumont AS, Oller DK, Wallot S, Kello CT. Multiple Coordination Patterns in Infant and Adult Vocalizations. INFANCY 2016; 22:514-539. [PMID: 29375276 DOI: 10.1111/infa.12165] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
Abstract
The study of vocal coordination between infants and adults has led to important insights into the development of social, cognitive, emotional and linguistic abilities. We used an automatic system to identify vocalizations produced by infants and adults over the course of the day for fifteen infants studied longitudinally during the first two years of life. We measured three different types of vocal coordination: coincidence-based, rate-based, and cluster-based. Coincidence-based and rate-based coordination are established measures in the developmental literature. Cluster-based coordination is new and measures the strength of matching in the degree to which vocalization events occur in hierarchically nested clusters. We investigated whether various coordination patterns differ as a function of vocalization type, whether different coordination patterns provide unique information about the dynamics of vocal interaction, and how the various coordination patterns each relate to infant age. All vocal coordination patterns displayed greater coordination for infant speech-related vocalizations, adults adapted the hierarchical clustering of their vocalizations to match that of infants, and each of the three coordination patterns had unique associations with infant age. Altogether, our results indicate that vocal coordination between infants and adults is multifaceted, suggesting a complex relationship between vocal coordination and the development of vocal communication.
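Of the three measures, coincidence-based coordination is the simplest to illustrate: it asks how often one partner's vocalization onsets fall within a short time window of the other's. A minimal sketch follows; the window length and onset times are hypothetical, not the study's actual parameters or data.

```python
def coincidence_rate(infant_onsets, adult_onsets, window=2.0):
    """Fraction of infant vocal onsets with an adult onset within +/- window seconds."""
    hits = sum(
        any(abs(i - a) <= window for a in adult_onsets)
        for i in infant_onsets
    )
    return hits / len(infant_onsets)

# Hypothetical onset times in seconds from the start of a recording.
infant = [1.0, 5.5, 12.0, 30.0]
adult = [1.8, 13.1, 29.0]
print(coincidence_rate(infant, adult))  # 0.75: three of four infant onsets answered
```

Rate-based and cluster-based measures build on the same event-time representation, but compare vocalization rates over longer intervals and the hierarchical clustering of event times, respectively.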
Affiliation(s)
- Drew H Abney
- Cognitive and Information Sciences, University of California, Merced
- Anne S Warlaumont
- Cognitive and Information Sciences, University of California, Merced
- D Kimbrough Oller
- School of Communication Sciences and Disorders, University of Memphis

15
VanDam M, Silbert NH. Fidelity of Automatic Speech Processing for Adult and Child Talker Classifications. PLoS One 2016; 11:e0160588. [PMID: 27529813 PMCID: PMC4986949 DOI: 10.1371/journal.pone.0160588] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2015] [Accepted: 07/21/2016] [Indexed: 11/18/2022] Open
Abstract
Automatic speech processing (ASP) has recently been applied to very large datasets of naturalistically collected, daylong recordings of child speech via an audio recorder worn by young children. The system developed by the LENA Research Foundation analyzes children's speech for research and clinical purposes, with a special focus on identifying and tagging family speech dynamics and the at-home acoustic environment from the auditory perspective of the child. A primary issue for researchers, clinicians, and families using the Language ENvironment Analysis (LENA) system is to what degree the segment labels are valid. This classification study evaluates the performance of the computer ASP output against 23 trained human judges who made about 53,000 classification judgments on segments tagged by the LENA ASP. Results indicate performance consistent with modern ASP systems such as those using hidden Markov model (HMM) methods, with the acoustic characteristics of fundamental frequency and segment duration most important for both human and machine classifications. Results are likely to be important for interpreting and improving ASP output.
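The validity check described here, comparing machine segment labels against trained human judges, is commonly summarized with a chance-corrected agreement statistic such as Cohen's kappa. The sketch below uses made-up labels to show the computation; it is not the study's actual analysis or data.

```python
from collections import Counter

def cohens_kappa(machine_labels, human_labels):
    """Chance-corrected agreement between machine and human segment labels."""
    n = len(machine_labels)
    # Observed agreement: proportion of segments labeled identically.
    observed = sum(m == h for m, h in zip(machine_labels, human_labels)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    m_counts, h_counts = Counter(machine_labels), Counter(human_labels)
    expected = sum(m_counts[lab] * h_counts[lab] for lab in m_counts) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical segment labels: ASP output vs. one human judge.
machine = ["child", "adult", "child", "noise", "adult", "child"]
human = ["child", "adult", "adult", "noise", "adult", "child"]
print(round(cohens_kappa(machine, human), 3))  # 0.739
```

Raw percent agreement overstates performance when one label dominates the recordings, which is why chance correction matters for daylong naturalistic audio.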
Affiliation(s)
- Mark VanDam
- Department of Speech & Hearing Sciences, Elson S. Floyd College of Medicine, Washington State University, Spokane, Washington, United States of America
- Spokane Hearing Oral Program of Excellence (HOPE), Spokane, Washington, United States of America
- Noah H. Silbert
- Department of Communication Sciences & Disorders, University of Cincinnati, Cincinnati, Ohio, United States of America

16
VanDam M, Warlaumont AS, Bergelson E, Cristia A, Soderstrom M, De Palma P, MacWhinney B. HomeBank: An Online Repository of Daylong Child-Centered Audio Recordings. Semin Speech Lang 2016; 37:128-42. [PMID: 27111272 PMCID: PMC5570530 DOI: 10.1055/s-0036-1580745] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
HomeBank is introduced here. It is a public, permanent, extensible, online database of daylong audio recorded in naturalistic environments. HomeBank serves two primary purposes. First, it is a repository for raw audio and associated files: one database requires special permissions, and another redacted database allows unrestricted public access. Associated files include metadata such as participant demographics and clinical diagnostics, automated annotations, and human-generated transcriptions and annotations. Many recordings use the child-perspective LENA recorders (LENA Research Foundation, Boulder, Colorado, United States), but various recordings and metadata can be accommodated. The HomeBank database can have both vetted and unvetted recordings, with different levels of accessibility. Additionally, HomeBank is an open repository for processing and analysis tools for HomeBank or similar data sets. HomeBank is flexible for users and contributors, making primary data available to researchers, especially those in child development, linguistics, and audio engineering. HomeBank facilitates researchers' access to large-scale data and tools, linking the acoustic, auditory, and linguistic characteristics of children's environments with a variety of variables including socioeconomic status, family characteristics, language trajectories, and disorders. Automated processing applied to daylong home audio recordings is now becoming widely used in early intervention initiatives, helping parents to provide richer speech input to at-risk children.
Affiliation(s)
- Mark VanDam
- Department of Speech and Hearing Sciences, Elson S. Floyd College of Medicine, Washington State University, and Spokane Hearing Oral Program of Excellence (HOPE), Spokane, Washington
- Anne S. Warlaumont
- Cognitive and Information Sciences, University of California, Merced, California
- Elika Bergelson
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York
- Alejandrina Cristia
- Laboratoire de Sciences Cognitives et Psycholinguistique (ENS, EHESS, CNRS), Département d’Etudes Cognitives, Ecole Normale Supérieure, PSL Research University, Paris, France
- Paul De Palma
- Department of Computer Science, School of Engineering and Applied Science, Gonzaga University, Spokane, Washington
- Brian MacWhinney
- Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania