1. Gustafsson JK, Södersten M, Ternström S, Schalling E. Voice Use in Daily Life Studied With a Portable Voice Accumulator in Individuals With Parkinson's Disease and Matched Healthy Controls. J Speech Lang Hear Res 2019; 62:4324-4334. [PMID: 31830844] [DOI: 10.1044/2019_jslhr-19-00037]
Abstract
Purpose: The purpose of this work was to study how voice use in daily life is impacted by Parkinson's disease (PD), specifically whether there is a difference in voice sound level and phonation ratio during everyday activities between individuals with PD and matched healthy controls. A further aim was to study how variations in environmental noise impact voice use.
Method: Long-term registration of voice use during 1 week in daily life was performed for 21 participants with PD (11 male, 10 female) and 21 matched healthy controls using the portable voice accumulator VoxLog. Voice use was assessed through registrations of spontaneous speech in different ranges of environmental noise in daily life and in a controlled studio recording setting.
Results: Individuals with PD used their voice 50%-60% less than their matched healthy controls in daily life, and the difference increased in high levels of environmental noise. Individuals with PD used an average voice sound level in daily life that was 8.11 dB (female) and 6.7 dB (male) lower than that of their matched healthy controls. The difference in mean voice sound level between individuals with PD and controls during spontaneous speech in a controlled studio registration was 3.0 dB for the female group and 4.1 dB for the male group.
Conclusions: The observed difference in voice use in daily life between individuals with PD and matched healthy controls is a first step toward objectively quantifying the impact of PD on communicative participation. The variations in voice use in different levels of environmental noise, and between controlled and variable environments, support the idea that the study of voice use should include methods to assess function in less controlled situations outside the clinical setting.
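The phonation ratio reported above is, in essence, the fraction of monitored time during which phonation is detected. A minimal sketch of that computation on hypothetical frame-level voicing flags (the VoxLog's actual detection algorithm is not reproduced here):

```python
def phonation_ratio(voiced_flags):
    """Fraction of analysis frames in which phonation was detected."""
    if not voiced_flags:
        return 0.0
    return sum(voiced_flags) / len(voiced_flags)

# Hypothetical example: 1 = voiced frame, 0 = silent/unvoiced frame.
frames = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
print(phonation_ratio(frames))  # 0.4
```

A lower ratio over a week of monitoring is what the study interprets as reduced voice use.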
Affiliation(s)
- Joakim Körner Gustafsson
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Functional Area Speech and Language Pathology, Karolinska University Hospital, Stockholm, Sweden
- Maria Södersten
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Functional Area Speech and Language Pathology, Karolinska University Hospital, Stockholm, Sweden
- Sten Ternström
- School of Computer Science and Communication, Department of Speech, Music and Hearing, Royal Institute of Technology (KTH), Stockholm, Sweden
- Ellika Schalling
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Functional Area Speech and Language Pathology, Karolinska University Hospital, Stockholm, Sweden
2. Harper S, Lee S, Goldstein L, Byrd D. Simultaneous electromagnetic articulography and electroglottography data acquisition of natural speech. J Acoust Soc Am 2018; 144:EL380. [PMID: 30522297] [PMCID: PMC6219895] [DOI: 10.1121/1.5066349]
Abstract
This paper reports on the concurrent use of electroglottography (EGG) and electromagnetic articulography (EMA) in the acquisition of EMA trajectory data for running speech. Static and dynamic intersensor distances, standard deviations, and coefficients of variation associated with inter-sample distances were compared in two conditions: with and without EGG present. Results indicate that measurement discrepancies between the two conditions are within the EMA system's measurement uncertainty. Therefore, potential electromagnetic interference from EGG does not appear to cause differences of practical importance in EMA trajectory behaviors, suggesting that simultaneous EMA and EGG data acquisition is a viable laboratory procedure for speech research.
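The coefficient of variation used to compare the two conditions is simply the standard deviation of the inter-sample distances normalized by their mean. A sketch with illustrative numbers (not the study's data):

```python
import statistics

def coefficient_of_variation(distances):
    """Standard deviation of the distances divided by their mean
    (dimensionless, so comparable across sensor pairs)."""
    return statistics.stdev(distances) / statistics.mean(distances)

# Illustrative intersensor distances (mm), with and without EGG attached.
without_egg = [30.1, 30.0, 30.2, 29.9, 30.0]
with_egg = [30.2, 29.9, 30.1, 30.0, 30.3]
print(coefficient_of_variation(without_egg))
print(coefficient_of_variation(with_egg))
```

If the two values differ by less than the system's stated measurement uncertainty, the conditions are indistinguishable in practice, which is the paper's conclusion.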
Affiliation(s)
- Sarah Harper
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
- Sungbok Lee
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
- Louis Goldstein
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
- Dani Byrd
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
3. Lee Y, Gordon Danner S, Parrell B, Lee S, Goldstein L, Byrd D. Articulatory, acoustic, and prosodic accommodation in a cooperative maze navigation task. PLoS One 2018; 13:e0201444. [PMID: 30086554] [PMCID: PMC6081084] [DOI: 10.1371/journal.pone.0201444]
Abstract
This study uses a maze navigation task in conjunction with a quasi-scripted, prosodically controlled speech task to examine acoustic and articulatory accommodation in pairs of interacting speakers. The experiment uses a dual electromagnetic articulography set-up to collect synchronized acoustic and articulatory kinematic data from two facing speakers simultaneously. We measure the members of a dyad individually before they interact, while they are interacting in a cooperative task, and again individually after they interact. The design is ideally suited to measure speech convergence, divergence, and persistence effects during and after speaker interaction. This study specifically examines how convergence and divergence effects during a dyadic interaction may be related to prosodically salient positions, such as preceding a phrase boundary. The findings of accommodation in fine-grained prosodic measures illuminate our understanding of how the realization of linguistic phrasal structure is coordinated across interacting speakers. Our findings on individual speaker variability and the time course of accommodation provide novel evidence for accommodation at the level of cognitively specified motor control of individual articulatory gestures. Taken together, these results have implications for understanding the cognitive control of interactional behavior in spoken language communication.
Affiliation(s)
- Yoonjeong Lee
- Department of Linguistics, University of Southern California, Los Angeles, California, United States of America
- Samantha Gordon Danner
- Department of Linguistics, University of Southern California, Los Angeles, California, United States of America
- Benjamin Parrell
- Department of Linguistics, University of Southern California, Los Angeles, California, United States of America
- Department of Linguistics, University of Delaware, Newark, Delaware, United States of America
- Sungbok Lee
- Department of Linguistics, University of Southern California, Los Angeles, California, United States of America
- Louis Goldstein
- Department of Linguistics, University of Southern California, Los Angeles, California, United States of America
- Dani Byrd
- Department of Linguistics, University of Southern California, Los Angeles, California, United States of America
4. Astolfi A, Castellana A, Carullo A, Puglisi GE. Uncertainty of speech level parameters measured with a contact-sensor-based device and a headworn microphone. J Acoust Soc Am 2018; 143:EL496. [PMID: 29960427] [DOI: 10.1121/1.5042761]
Abstract
This work estimates the uncertainty contributions of speech level parameters measured with a contact-sensor-based device and a headworn microphone. Four contributions are considered: (1) instrumental uncertainty, related to device calibration; (2) method repeatability and (3) reproducibility, estimated through repeated measurements without and with device repositioning, respectively; and (4) source reproducibility, due to the variability of human speech. To ascertain changes in speech production, differences between measures should exceed the expanded uncertainty. In the case of device repositioning, the expanded uncertainty combines contributions (1), (3), and (4). When the device is not repositioned, it combines contributions (2) and (4).
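Assuming the contributions are independent, a combined standard uncertainty is conventionally obtained by summing them in quadrature, and an expanded uncertainty by multiplying by a coverage factor (k = 2 for roughly 95% coverage). A sketch with hypothetical values in dB, not the paper's estimates:

```python
import math

def expanded_uncertainty(contributions_db, k=2.0):
    """Root-sum-of-squares of independent uncertainty contributions,
    scaled by coverage factor k (k=2 gives ~95% coverage)."""
    combined = math.sqrt(sum(u ** 2 for u in contributions_db))
    return k * combined

# Hypothetical contributions: instrumental, reproducibility, source.
u = expanded_uncertainty([0.3, 0.4, 1.2])
print(round(u, 2))  # 2.6
```

A measured level difference smaller than this expanded uncertainty cannot be attributed to a real change in speech production.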
Affiliation(s)
- Arianna Astolfi
- Politecnico di Torino, Department of Energy, Corso Duca degli Abruzzi, 24, 10129, Torino, Italy
- Antonella Castellana
- Politecnico di Torino, Department of Energy, Corso Duca degli Abruzzi, 24, 10129, Torino, Italy
- Alessio Carullo
- Politecnico di Torino, Department of Electronics and Telecommunications, Corso Duca degli Abruzzi, 24, 10129, Torino, Italy
- Giuseppina Emma Puglisi
- Politecnico di Torino, Department of Energy, Corso Duca degli Abruzzi, 24, 10129, Torino, Italy
5. Greenwood CR, Schnitz AG, Irvin D, Tsai SF, Carta JJ. Automated Language Environment Analysis: A Research Synthesis. Am J Speech Lang Pathol 2018; 27:853-867. [PMID: 29594313] [PMCID: PMC7242915] [DOI: 10.1044/2017_ajslp-17-0033]
Abstract
PURPOSE: The Language Environment Analysis (LENA®) system represents a breakthrough in automatic speech detection because it makes one's language environment, what adults and children actually hear and say, efficiently measurable. The purpose of this article was to examine (a) current dimensions of LENA research, (b) LENA's sensitivity to differences in populations and language environments, and (c) what has been achieved in closing the Word Gap.
METHOD: From electronic and human searches, 83 peer-reviewed articles using LENA were identified; 53 met inclusionary criteria and were included in a systematic literature review. Each article reported the results of 1 study.
RESULTS: Although originally developed to make natural language research more efficient and feasible, LENA has generated a broad landscape of relevant findings focused primarily on the environments and communications of young children, but also on older adults and teachers. LENA's automated speech indicators (adult input, adult-child interaction, and child production) and the audio environment were shown to meet high validity standards, including accuracy, sensitivity to individual differences, and differences in populations, settings, contexts within settings, speakers, and languages. Researchers' own analyses of LENA audio recordings have extended our knowledge of microlevel processes in adult-child interaction. To date, intervention research using LENA has consisted of small pilot experiments, primarily on the effects of brief parent education plus quantitative linguistic feedback to parents.
CONCLUSION: Evidence showed that automated analysis has made a place in the repertoire of language research and practice. Implications, limitations, and future research are discussed.
Affiliation(s)
- Alana G. Schnitz
- Juniper Gardens Children's Project, The University of Kansas, Kansas City
- Dwight Irvin
- Juniper Gardens Children's Project, The University of Kansas, Kansas City
- Judith J. Carta
- Juniper Gardens Children's Project, The University of Kansas, Kansas City
6. Carignan C. Using ultrasound and nasalance to separate oral and nasal contributions to formant frequencies of nasalized vowels. J Acoust Soc Am 2018; 143:2588. [PMID: 29857694] [DOI: 10.1121/1.5034760]
Abstract
The experimental method described in this manuscript offers a possible means to address a well-known issue in research on the independent effects of nasalization on vowel acoustics: given that the separate transfer functions associated with the oral and nasal cavities are merged in the acoustic signal, the task of teasing apart the respective effects of the two cavities seems to be an intractable problem. The proposed method uses ultrasound and nasalance to predict the effect of lingual configuration on formant frequencies of nasalized vowels, thus accounting for acoustic variation due to changing lingual posture and excluding its contribution to the acoustic signal. The results reveal that the independent effect of nasalization on the acoustic vowel quadrilateral resembles a counterclockwise chain shift of nasal compared to non-nasal vowels. The results from the productions of 11 vowels by six speakers of different language backgrounds are compared to predictions presented in previous modeling studies and discussed in light of sound change in nasal vowel systems.
Affiliation(s)
- Christopher Carignan
- Institut für Phonetik und Sprachverarbeitung, Ludwig-Maximilians-Universität München, Schellingstraße 3, 80799 Munich, Germany
7. Švec JG, Granqvist S. Tutorial and Guidelines on Measurement of Sound Pressure Level in Voice and Speech. J Speech Lang Hear Res 2018; 61:441-461. [PMID: 29450495] [DOI: 10.1044/2017_jslhr-s-17-0095]
Abstract
PURPOSE: Sound pressure level (SPL) measurement of voice and speech is often considered a trivial matter, but the measured levels are often reported incorrectly or incompletely, making them difficult to compare among various studies. This article aims at explaining the fundamental principles behind these measurements and providing guidelines to improve their accuracy and reproducibility.
METHOD: Basic information is put together from standards, technical, voice, and speech literature, and from the practical experience of the authors, and is explained for nontechnical readers.
RESULTS: Variation of SPL with distance, sound level meters and their accuracy, frequency and time weightings, and background noise topics are reviewed. Several calibration procedures for SPL measurements are described for stand-mounted and head-mounted microphones.
CONCLUSIONS: SPL of voice and speech should be reported together with the mouth-to-microphone distance so that the levels can be related to vocal power. Sound level measurement settings (i.e., frequency weighting and time weighting/averaging) should always be specified. Classified sound level meters should be used to assure measurement accuracy. Head-mounted microphones placed in the proximity of the mouth improve signal-to-noise ratio and, when calibrated, can be taken advantage of for voice SPL measurements. Background noise levels should be reported besides the sound levels of voice and speech.
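Under idealized free-field conditions, SPL drops by about 6 dB per doubling of distance, which is why the mouth-to-microphone distance must accompany any reported level. A sketch of the standard distance correction (spherical spreading is assumed; real rooms deviate from it):

```python
import math

def spl_at_distance(spl_ref_db, d_ref_m, d_new_m):
    """Free-field SPL extrapolated from a reference distance
    (inverse-square law: subtract 20*log10 of the distance ratio)."""
    return spl_ref_db - 20.0 * math.log10(d_new_m / d_ref_m)

# A level of 80 dB measured at 0.15 m corresponds to about 63.5 dB at 1 m.
print(round(spl_at_distance(80.0, 0.15, 1.0), 1))  # 63.5
```

This is why levels reported without the measurement distance cannot be compared across studies.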
Affiliation(s)
- Jan G Švec
- Department of Biophysics, Faculty of Science, Palacký University, Olomouc, Czech Republic
- Svante Granqvist
- Department of Basic Science and Biomedicine, School of Technology and Health, Royal Institute of Technology, Stockholm, Sweden
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
8. Bottalico P, Ipsaro Passione I, Astolfi A, Carullo A, Hunter EJ. Accuracy of the quantities measured by four vocal dosimeters and its uncertainty. J Acoust Soc Am 2018; 143:1591. [PMID: 29604673] [PMCID: PMC5864503] [DOI: 10.1121/1.5027816]
Abstract
Although vocal dosimeters are often used for long-term voice monitoring, the uncertainty of the quantities measured by these devices is not always stated. In this study, two common vocal dosimetry quantities, mean vocal sound pressure level and mean vocal fundamental frequency, were measured by four vocal dosimeters (VocaLog2, VoxLog, Voice Care, and APM3200). The expanded uncertainty of the mean error in the estimation of these two quantities was evaluated for each dosimeter by simultaneously comparing signals acquired through a reference microphone and the devices themselves. Dosimeters, assigned in random order, were worn by the participants (22 vocally healthy adults), along with a head-mounted microphone, which acted as a reference. For each device, participants produced a sustained /a/ vowel four times and then read a text with three different vocal efforts (relaxed, normal, and raised). The measurement uncertainty was obtained by comparing data from the microphone and the dosimeters. Mean vocal sound pressure level was captured most accurately by the Voice Care and the VoxLog, while the APM3200 was the least accurate. Mean vocal fundamental frequency was estimated most accurately by the Voice Care and the APM3200, while the VoxLog was the least accurate.
Affiliation(s)
- Pasquale Bottalico
- Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
- Ivano Ipsaro Passione
- Voice Biomechanics and Acoustics Laboratory, Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan 48824, USA
- Alessio Carullo
- Department of Electronics and Telecommunications, Politecnico di Torino, Torino, Italy
- Eric J Hunter
- Voice Biomechanics and Acoustics Laboratory, Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan 48824, USA
9. Gunjawate DR, Ravi R, Bellur R. Acoustic Analysis of Voice in Singers: A Systematic Review. J Speech Lang Hear Res 2018; 61:40-51. [PMID: 29344619] [DOI: 10.1044/2017_jslhr-s-17-0145]
Abstract
PURPOSE: Singers are vocal athletes with specific demands on their voice who require special consideration during voice evaluation. At present, there is a lack of standards for acoustic evaluation in this population. The aim of the present study was to systematically review the available literature on the acoustic analysis of voice in singers.
METHOD: A systematic review of studies on acoustic analysis of voice in singers (PubMed/MEDLINE, CINAHL, Scopus, ProQuest, Cochrane, Ovid, Science Direct, and Shodhganga) was carried out. Key words based on PIO (population-investigation-outcome) were used to develop search strings. Titles and abstracts were screened independently, and appropriate studies were read in full for data extraction.
RESULTS: Of the 895 studies, 26 met the inclusion criteria. Great variability was noted in the instruments and tasks used. Different acoustic measures were employed, such as fundamental frequency, perturbation, cepstral and spectral measures, the dysphonia severity index, and the singing power ratio.
CONCLUSION: Overall, great heterogeneity was noted regarding populations, tasks, instruments, and parameters. There is a lack of standardized criteria for the evaluation of the singing voice. In order to implement acoustic analysis as part of a comprehensive voice evaluation for singers, there is a clear need for methodologically sound studies.
Affiliation(s)
- Dhanshree R Gunjawate
- Department of Audiology and Speech Language Pathology, Kasturba Medical College, Mangalore, Manipal Academy of Higher Education, Karnataka, India
- Rohit Ravi
- Department of Speech and Hearing, School of Allied Health Sciences, Manipal Academy of Higher Education, Karnataka, India
- Rajashekhar Bellur
- Department of Speech and Hearing, School of Allied Health Sciences, Manipal Academy of Higher Education, Karnataka, India
10.
Abstract
Objective: The objective of this study was to evaluate test-retest nasalance score variability in subjects with hypernasal resonance.
Design: Two groups of subjects with hypernasal speech recited both the Turtle Passage and the Mouse Passage two times each. For one group, the Nasometer headgear was not changed between repetitions of each stimulus (NCHG; n = 17); for the other group, the headgear was changed between repetitions (CHG; n = 18). Three subjects in the CHG group would not recite the Mouse Passage two times.
Participants: The subjects were 35 patients with hypernasal speech followed by a cleft palate team.
Main Outcome Measures: The outcome measures were the four nasalance scores obtained for each subject.
Results: There was no significant difference between first and second repetitions for either stimulus in either the NCHG or the CHG group. Cumulative frequency tables showed that for the Turtle Passage-NCHG condition, 15 of the 17 (88%) repeated nasalance scores were within 5 nasalance points of each other. For the CHG condition, however, only 9 of 18 (50%) repeated nasalance scores were within 5 points. For the Mouse Passage-NCHG condition, 15 of the 17 (88%) repeated nasalance scores were within 5 points. For the CHG condition, only 11 of 15 (73%) repeated scores were within 5 points.
Conclusions: Test-retest variability was greater in a population of hypernasal patients than that reported in other studies for normal speakers, and headgear change increased test-retest variability.
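The cumulative-frequency criterion used in these results can be reproduced directly: count the score pairs whose two repetitions differ by no more than 5 nasalance points. A sketch with hypothetical score pairs, not the study's data:

```python
def within_tolerance(pairs, tol=5):
    """Fraction of (first, second) score pairs whose absolute
    difference is at most tol nasalance points."""
    hits = sum(1 for a, b in pairs if abs(a - b) <= tol)
    return hits / len(pairs)

# Hypothetical first/second nasalance scores for six speakers.
scores = [(32, 35), (48, 41), (55, 57), (29, 36), (61, 63), (44, 44)]
print(within_tolerance(scores))  # 4 of 6 pairs within 5 points
```

A lower fraction under headgear change is what the study reports as increased test-retest variability.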
Affiliation(s)
- Thomas Watterson
- University of Nevada School of Medicine, Reno, Nevada 89557, USA.
11.
Abstract
Objective: To examine the validity of the Nasometer (KayPENTAX, Lincoln Park, NJ) in measuring the temporal characteristics of nasalization by comparing the Nasometer measures to the measures from an external criterion procedure.
Design: Speech samples consisted of three rate-controlled nonsense syllables, which varied in their vowel compositions: /izinizi/, /azanaza/, and /uzunuzu/. Acoustic data were recorded simultaneously through the Nasometer and an external criterion procedure (a specialized microphone set that collected acoustic signals separately for the nasal and oral channels). Speech segment durations measured under the two instrumental conditions were compared on the Nasometer display and the Computerized Speech Lab (KayPENTAX, Lincoln Park, NJ) display. Five durational variables were measured: total utterance duration, nasal onset interval, nasal consonant duration, nasal offset interval, and total nasalization duration.
Participants: Fourteen normal adults who speak American English as their first language participated in the study.
Results: No significant differences were found between the measures from the Nasometer and those from the external criterion procedure for any of the durational variables pertinent to nasalization. Different vowels, however, yielded significantly different patterns in these durational variables, in which the low vowel /a/ context revealed significantly longer total nasalization duration than did the high vowel /i/ and /u/ contexts.
Conclusions: The results suggest that the Nasometer can be used as a valid tool to measure the temporal characteristics underlying nasalization and confirm significant vowel effects on the temporal patterns of nasalization.
Affiliation(s)
- Youkyung Bae
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign, Illinois 61820, USA.
12.
Abstract
Objective: To compare nasalance scores obtained with the Nasometer, the NasalView, and the OroNasal System; to evaluate test-retest reliability of the three systems; and to explore whether three common text passages used for nasalance analysis could be shortened to a sentence each.
Subjects: Seventy-six adults with normal speech and hearing (mean age 26.5 years).
Procedures: Subjects read the complete Zoo Passage, Rainbow Passage, and Nasal Sentences.
Main Outcome Measures: Mean nasalance magnitudes and mean nasalance distances were obtained with the three devices.
Results: The Nasometer had the lowest nasalance scores for the nonnasal Zoo Passage. The NasalView had the highest nasalance scores for the phonetically balanced Rainbow Passage. The OroNasal System had the lowest nasalance scores for the Nasal Sentences. The nasalance distance was largest for the Nasometer and smallest for the OroNasal System. Over 90% of the recordings were within 4% to 6% nasalance for most materials recorded with the Nasometer and the NasalView, and within 7% to 9% for materials recorded with the OroNasal System. There were significant differences between the complete Zoo Passage and Nasal Sentences and the individual sentences from these passages for the Nasometer and the OroNasal System.
Conclusions: The three systems measure nasalance in different ways and provide nasalance scores that are not interchangeable. Test-retest variability for the Nasometer and the NasalView may be higher than previously reported. Individual sentences from the Zoo Passage and the Nasal Sentences do not provide nasalance scores that are equivalent to the complete passages.
Affiliation(s)
- Tim Bressmann
- Graduate Department of Speech-Language Pathology, Faculty of Medicine, University of Toronto, Toronto, Canada.
13.
Abstract
Objective: To obtain normal nasalance values during the production of a standardized speech sample for Irish children and to determine whether significantly different scores exist for different speech stimuli for female and male speakers.
Design: Mean nasalance scores were obtained for normal-speaking children during the repetition of 16 test sentences that were categorized according to consonant type within the sentences (high-pressure consonants, low-pressure consonants, nasal consonants).
Participants: Seventy children (36 girls and 34 boys, aged 4 years 11 months to 13 years) with normal articulation, resonance, and voice were included.
Procedures: Children repeated each of the 16 test sentences individually. The sentences were presented in groups according to consonant type, referred to as sentence categories. Data were collected and analyzed using the Kay Nasometer (model 6200.3). Nasalance scores were obtained for the total speech sample and each sentence category. Data were statistically analyzed to investigate the effects of gender, sentence category, and gender by sentence category.
Results: Normative nasalance scores were obtained for the total speech sample (26%), high-pressure consonant sentences (14%), low-pressure consonant sentences (16%), and a nasal consonant sentence (51%). There was no significant difference in nasalance scores between male and female speakers. Significant differences were found between each sentence category (p ≤ .001), except between the high-pressure and low-pressure consonant sentence categories (p = .09).
Conclusion: The present study provides normative nasalance data for English-speaking Irish children. There was a significant difference between nasalance scores for different speech stimuli.
14. Savariaux C, Badin P, Samson A, Gerber S. A Comparative Study of the Precision of Carstens and Northern Digital Instruments Electromagnetic Articulographs. J Speech Lang Hear Res 2017; 60:322-340. [PMID: 28152131] [DOI: 10.1044/2016_jslhr-s-15-0223]
Abstract
PURPOSE: This study compares the precision of the electromagnetic articulographs used in speech research: Northern Digital Instruments' Wave and Carstens' AG200, AG500, and AG501 systems.
METHOD: The fluctuation of distances between 3 pairs of sensors attached to a manually rotated device that can position them inside the measurement volumes was determined. For each device, 2 precision estimates made on the basis of the 95% quantile range of these distances (QR95) were defined: the local QR95 was computed for bins around specific rotation angles, and the global QR95 was computed for all angles pooled.
RESULTS: For all devices, although the local precision lies around 0.1 cm, the global precision is much more worrisome, ranging from 0.03 cm to 2.18 cm, and displays large variations as a function of the position of the sensors in the measurement volume. No influence of the rotational speed was found. The AG501 produced by far the lowest errors, in particular concerning the global precision.
CONCLUSIONS: The local precision can be considered suitable for speech articulatory measurements, but the variations of the global precision need to be taken into account through knowledge of the spatial distribution of errors. A guideline for good practice in EMA recording is proposed for each system.
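Assuming QR95 denotes the spread between the 2.5th and 97.5th percentiles of the intersensor distances, it can be computed as follows (illustrative distances, not the study's measurements):

```python
import statistics

def qr95(values):
    """95% quantile range: spread between the 2.5th and 97.5th
    percentiles of the measured distances."""
    q = statistics.quantiles(values, n=40)  # cut points every 2.5%
    return q[-1] - q[0]

# Illustrative sensor-pair distances (cm) fluctuating around 3.0 cm.
distances = [3.00, 2.98, 3.02, 3.01, 2.99, 3.05, 2.95, 3.00, 3.01, 2.99]
print(round(qr95(distances), 3))
```

Computed per angle bin this gives the local estimate; computed over all angles pooled it gives the global estimate.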
Affiliation(s)
- Pierre Badin
- Univ. Grenoble Alpes, CNRS, GIPSA-Lab, Grenoble, France
- Adeline Samson
- Univ. Grenoble Alpes, CNRS, Laboratoire Jean Kuntzmann, Grenoble, France
15. Bocquelet F, Hueber T, Girin L, Savariaux C, Yvert B. Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces. PLoS Comput Biol 2016; 12:e1005119. [PMID: 27880768] [PMCID: PMC5120792] [DOI: 10.1371/journal.pcbi.1005119]
Abstract
Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open to future speech BCI applications using such articulatory-based speech synthesizer.
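The articulatory-to-acoustic mapping can be pictured as a feedforward network that converts one frame of EMA sensor coordinates into one frame of acoustic parameters. The toy sketch below uses random, untrained weights purely to illustrate the data flow; the layer sizes and feature counts are invented, not the authors' architecture:

```python
import math
import random

random.seed(0)

def init_layer(n_in, n_out):
    """One dense layer: small random weight matrix and zero bias."""
    w = [[random.gauss(0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    b = [0.0] * n_out
    return w, b

def forward(frame, layers):
    """Map one frame of articulatory features to acoustic parameters."""
    h = frame
    for i, (w, b) in enumerate(layers):
        out = [sum(h[j] * w[j][k] for j in range(len(h))) + b[k]
               for k in range(len(b))]
        # tanh on hidden layers, linear output layer
        h = out if i == len(layers) - 1 else [math.tanh(v) for v in out]
    return h

# Hypothetical sizes: 18 inputs (x/y/z of 6 EMA sensors) -> 25 acoustic parameters.
layers = [init_layer(18, 64), init_layer(64, 64), init_layer(64, 25)]
acoustic = forward([random.gauss(0, 1) for _ in range(18)], layers)
print(len(acoustic))  # 25
```

In the paper's pipeline, the output vector would then drive a vocoder to produce the audio signal in real time.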
Affiliation(s)
- Florent Bocquelet: INSERM, BrainTech Laboratory U1205, Grenoble, France; Univ. Grenoble Alpes, BrainTech Laboratory U1205, Grenoble, France; CNRS, GIPSA-Lab, Saint-Martin-d'Hères, France; Univ. Grenoble Alpes, GIPSA-Lab, Saint-Martin-d'Hères, France
- Thomas Hueber: CNRS, GIPSA-Lab, Saint-Martin-d'Hères, France; Univ. Grenoble Alpes, GIPSA-Lab, Saint-Martin-d'Hères, France
- Laurent Girin: Univ. Grenoble Alpes, GIPSA-Lab, Saint-Martin-d'Hères, France; INRIA Grenoble Rhône-Alpes, Montbonnot, France
- Christophe Savariaux: CNRS, GIPSA-Lab, Saint-Martin-d'Hères, France; Univ. Grenoble Alpes, GIPSA-Lab, Saint-Martin-d'Hères, France
- Blaise Yvert: INSERM, BrainTech Laboratory U1205, Grenoble, France; Univ. Grenoble Alpes, BrainTech Laboratory U1205, Grenoble, France
16
Aron M, Berger MO, Kerrien E, Wrobel-Dautcourt B, Potard B, Laprie Y. Multimodal acquisition of articulatory data: Geometrical and temporal registration. J Acoust Soc Am 2016; 139:636-648. [PMID: 26936548] [DOI: 10.1121/1.4940666]
Abstract
Acquisition of dynamic articulatory data is of major importance for studying speech production. One technique alone is often not enough to achieve correct coverage of the whole vocal tract at a sufficient sampling rate. Ultrasound (US) imaging has been proposed as a good acquisition technique for the tongue surface because it offers good temporal sampling, does not alter speech production, is cheap, and is widely available. However, it cannot be used alone, and this paper describes a multimodal acquisition system that uses electromagnetic sensors to locate the US probe. The paper particularly focuses on the calibration of the US modality, which is the key point of the system. This approach enables US data to be merged with other data. The use of the system is illustrated by an experiment measuring the minimal tongue-to-palate distance in order to evaluate and design magnetic resonance imaging protocols well suited to the acquisition of three-dimensional images of the vocal tract. Compared with the manual registration of acquisition modalities often used in articulatory data acquisition, the approach presented relies on automatic techniques that are well founded from geometrical and mathematical points of view.
Affiliation(s)
- Michaël Aron: Institut Supérieur de l'Electronique et du Numérique, Brest, France
- Marie-Odile Berger, Erwan Kerrien, Brigitte Wrobel-Dautcourt, Blaise Potard, Yves Laprie: Institut de Recherche en Informatique et en Automatique, Centre National de la Recherche Scientifique, Université de Lorraine, Laboratoire Lorrain de Recherche en Informatique et ses Applications, Vandœuvre-lès-Nancy, France
17
Kaganov AS, Kir'yanov PA. [The application of cybernetic modeling methods for the forensic medical personality identification based on the voice and sounding speech characteristics]. Sud Med Ekspert 2015; 58:40-43. [PMID: 26245103] [DOI: 10.17116/sudmed201558340-43]
Abstract
The objective of the present publication was to discuss the possibility of applying cybernetic modeling methods to overcome the apparent discrepancy between two kinds of speech records: the initial ones (e.g., obtained in the course of special investigation activities) and the voice prints obtained from the persons subjected to criminalistic examination. The paper is based on literature sources and on the materials of original criminalistic expert examinations performed by the authors.
Affiliation(s)
- A Sh Kaganov: Institute of Linguistics, Russian Academy of Sciences, Moscow, Russia, 125009
- P A Kir'yanov: Federal state budgetary institution 'Russian Centre for Forensic Medical Expertise', Russian Ministry of Health, Moscow, Russia, 125284
18
Awan SN, Bressmann T, Poburka B, Roy N, Sharp H, Watts C. Dialectical effects on nasalance: a multicenter, cross-continental study. J Speech Lang Hear Res 2015; 58:69-77. [PMID: 25260176] [DOI: 10.1044/2014_jslhr-s-14-0077]
Abstract
PURPOSE This study investigated nasalance in speakers from six different dialectal regions across North America using recent versions of the Nasometer. It was hypothesized that many of the sound changes observed in regional dialects of North American English would have a significant impact on measures of nasalance. METHOD Samples of the Zoo Passage, the Rainbow Passage, and the Nasal Sentences were collected from young adult male and female speakers (N = 300) from six North American dialectal regions (Midland/Mid-Atlantic, Canada, Inland North, North Central, South, and Western dialects). RESULTS Across the three passage types, effect sizes for dialect were moderate in strength and accounted for approximately 7%-9% of the variation in nasalance. Increased differences in nasalance tended to occur between speakers from distinctly different geographical regions, with the highest nasalance across all passages observed for speakers from the Texas (South) dialect region. CONCLUSION Clinicians and researchers who use perceptual and instrumental measures of speech production should be aware that dialectal and socially acquired speech patterns may influence the acoustic characteristics of speech and may also influence the interpretation of normative expectations and of typical-versus-disordered cutoff scores for instruments such as the Nasometer.
19
Vatin L, Lagier A, Legou T, Galant C, Arnaud-Pellet MN, Hadj M, Cheynet F, Chossegros C, Giovanni A. [Dynamic palatography: Diagnostic tool for dysfunctional swallowing? Feasibility study]. Rev Laryngol Otol Rhinol (Bord) 2015; 136:181-184. [PMID: 29400042]
Abstract
OBJECTIVE Dysfunctional swallowing may cause transverse occlusal disorders. Speech re-education of dysfunctional swallowing aims to correct or prevent the recurrence of occlusal disorders. The main objective was to test dynamic palatography as a tool for the diagnosis and quantification of dysfunctional swallowing. MATERIAL AND METHODS The study was prospective and descriptive. Twelve women (mean age 23.5 years) with clinically dysfunctional swallowing were included between January and May 2014. None was aware of presenting atypical swallowing or a class II dento-facial dysmorphism. The dynamic palatography device measured the pressure exerted by the tongue on the palate during lingual rest and during swallowing of saliva and of water. The parameters measured were the duration and magnitude of tongue support against the palate. RESULTS Dynamic palatography showed a trend toward predominant anterior contact during the rest position (25%), and a lower tongue position with little contact during swallowing of saliva and water. DISCUSSION The palatography results are consistent with the clinical diagnostic criteria of atypical swallowing. Our palatography tool has the advantage of being unobtrusive in the mouth compared with other pre-existing systems. The device should be tested on larger patient populations and could enable monitoring of the efficiency of atypical swallowing rehabilitation. Palatography could complete the swallowing assessment and serve as a real-time monitoring and rehabilitation tool.
20
Xu D, Richards JA, Gilkerson J. Automated analysis of child phonetic production using naturalistic recordings. J Speech Lang Hear Res 2014; 57:1638-1650. [PMID: 24824489] [DOI: 10.1044/2014_jslhr-s-13-0037]
Abstract
PURPOSE Conventional resource-intensive methods for child phonetic development studies are often impractical for sampling and analyzing child vocalizations in sufficient quantity. The purpose of this study was to provide new information on early language development through an automated analysis of child phonetic production using naturalistic recordings. The new approach was evaluated relative to conventional manual transcription methods. Its effectiveness was demonstrated by a case study with 106 children with typical development (TD) ages 8-48 months, 71 children with autism spectrum disorder (ASD) ages 16-48 months, and 49 children with language delay (LD) not related to ASD ages 10-44 months. METHOD A small digital recorder in the chest pocket of the child's clothing captured full-day natural child vocalizations, which were automatically segmented into consonant, vowel, nonspeech, and silence categories, producing the average count per utterance (ACPU) for consonants and vowels. RESULTS Clear child utterances were identified with above 72% accuracy. Correlations between machine-estimated and human-transcribed ACPUs were above 0.82. Children with TD produced significantly more consonants and vowels per utterance than did the other children. Children with LD produced significantly more consonants, but not vowels, per utterance than did children with ASD. CONCLUSION The authors provide new information on typical and atypical language development in children with TD, ASD, and LD using an automated computational approach.
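The ACPU metric described above reduces to a simple average once utterances are labeled. A minimal sketch, under an assumed label encoding (the 'C'/'V'/'N'/'S' codes are illustrative, not the recorder software's actual format):

```python
def acpu(utterances, phone_class):
    """Average count per utterance (ACPU) of a phone class ('C' or 'V').

    `utterances` is a list of label sequences, one per clear child
    utterance, with labels 'C' (consonant), 'V' (vowel), 'N' (nonspeech),
    or 'S' (silence) -- an illustrative encoding for this sketch.
    """
    if not utterances:
        return 0.0
    counts = [sum(1 for label in u if label == phone_class) for u in utterances]
    return sum(counts) / len(utterances)

utts = [list("CVCV"), list("CVVS"), list("NCV")]
print(acpu(utts, "C"))  # (2 + 1 + 1) / 3
print(acpu(utts, "V"))  # (2 + 2 + 1) / 3
```

Comparing a machine-derived ACPU against one computed from human transcripts of the same utterances is then just a correlation over per-child values, as in the study's evaluation.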
21
Narayanan S, Toutios A, Ramanarayanan V, Lammert A, Kim J, Lee S, Nayak K, Kim YC, Zhu Y, Goldstein L, Byrd D, Bresch E, Ghosh P, Katsamanis A, Proctor M. Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research (TC). J Acoust Soc Am 2014; 136:1307. [PMID: 25190403] [PMCID: PMC4165284] [DOI: 10.1121/1.4890284]
Abstract
USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have also been collected from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460-sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community.
Affiliation(s)
- Shrikanth Narayanan, Asterios Toutios, Vikram Ramanarayanan, Adam Lammert, Jangwon Kim, Sungbok Lee: Signal Analysis and Interpretation Laboratory, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089-2564
- Krishna Nayak, Yoon-Chul Kim, Yinghua Zhu: Magnetic Resonance Engineering Laboratory, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089-2564
- Louis Goldstein, Dani Byrd: Department of Linguistics, University of Southern California, 3601 Watt Way, Los Angeles, California 90089-1693
- Erik Bresch: Philips Research, High Tech Campus 5, 5656 AE, Eindhoven, Netherlands
- Prasanta Ghosh: Department of Electrical Engineering, Indian Institute of Science, Bangalore, Karnataka, 560012, India
- Athanasios Katsamanis: School of Electrical and Computer Engineering, National Technical University of Athens, Iroon Polytexneiou Street, Athens 15773, Greece
- Michael Proctor: ARC Centre of Excellence in Cognition and its Disorders and Department of Linguistics, Macquarie University, New South Wales 2109, Australia
22
Davidow JH. Systematic studies of modified vocalization: the effect of speech rate on speech production measures during metronome-paced speech in persons who stutter. Int J Lang Commun Disord 2014; 49:100-112. [PMID: 24372888] [PMCID: PMC4461240] [DOI: 10.1111/1460-6984.12050]
Abstract
BACKGROUND Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control speech rate between conditions limits our ability to determine if the changes were necessary for fluency. AIMS This study examined the effect of speech rate on several speech production variables during one-syllable-per-beat metronomic speech in order to determine changes that may be important for fluency during this fluency-inducing condition. METHODS & PROCEDURES Thirteen persons who stutter (PWS), aged 18-62 years, completed a series of speaking tasks. Several speech production variables were compared between conditions produced at different metronome beat rates, and between a control condition and a metronome-paced speech condition produced at a rate equal to the control condition. OUTCOMES & RESULTS Vowel duration, voice onset time, pressure rise time and phonated intervals were significantly impacted by metronome beat rate. Voice onset time and the percentage of short (30-100 ms) phonated intervals significantly decreased from the control condition to the equivalent rate metronome-paced speech condition. CONCLUSIONS & IMPLICATIONS A reduction in the percentage of short phonated intervals may be important for fluency during syllable-based metronome-paced speech for PWS. Future studies should continue examining the necessity of this reduction. In addition, speech rate must be controlled in future fluency-inducing condition studies, including neuroimaging investigations, in order for this research to make a substantial contribution to finding the fluency-inducing mechanism of fluency-inducing conditions.
Affiliation(s)
- Jason H Davidow: Department of Speech-Language-Hearing Sciences, Hofstra University, Hempstead, NY, USA
23
Huo X, Park H, Kim J, Ghovanloo M. A dual-mode human computer interface combining speech and tongue motion for people with severe disabilities. IEEE Trans Neural Syst Rehabil Eng 2013; 21:979-991. [PMID: 23475380] [PMCID: PMC4445087] [DOI: 10.1109/tnsre.2013.2248748]
Abstract
We present a new wireless and wearable human-computer interface called the dual-mode Tongue Drive System (dTDS), which is designed to allow people with severe disabilities to use computers more effectively, with increased speed, flexibility, usability, and independence, through their tongue motion and speech. The dTDS detects users' tongue motion using a magnetic tracer and an array of magnetic sensors embedded in a compact and ergonomic wireless headset. It also captures the users' voice wirelessly using a small microphone embedded in the same headset. Preliminary evaluation results based on 14 able-bodied subjects and three individuals with high-level spinal cord injuries (C3-C5) indicated that the dTDS headset, combined with commercially available speech recognition (SR) software, can provide end users with significantly higher performance than either unimodal form based on tongue motion or speech alone, particularly in completing tasks that require both pointing and text entry.
Affiliation(s)
- Xueliang Huo: GT-Bionics Laboratory, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30308 USA (currently with the Interactive Entertainment Business of Microsoft, Redmond, WA 98052 USA)
- Hangue Park, Jeonghee Kim, Maysam Ghovanloo: GT-Bionics Laboratory, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30308 USA
24
Thorp EB, Virnik BT, Stepp CE. Comparison of nasal acceleration and nasalance across vowels. J Speech Lang Hear Res 2013; 56:1476-1484. [PMID: 23838984] [DOI: 10.1044/1092-4388(2013/12-0239)]
Abstract
PURPOSE The purpose of this study was to determine the performance of normalized nasal acceleration (NNA) relative to nasalance as estimates of nasalized versus nonnasalized vowel and sentence productions. METHOD Participants were 18 healthy speakers of American English. NNA was measured using a custom sensor, and nasalance was measured using the KayPentax Nasometer II. Speech stimuli consisted of CVC syllables with the vowels /ɑ/, /æ/, /i/, and /u/, and sentences loaded with high front, high back, low front, and low back vowels in both nasal and nonnasal contexts. RESULTS NNA showed a small but significant effect of the vowel produced during syllable stimuli but no significant effect of vowel loading during sentence stimuli. Nasalance was significantly affected by the vowel being produced during both syllables and sentences, with large effect sizes. Both NNA and nasalance were highly sensitive and specific to nasalization. CONCLUSIONS NNA was less affected by vowel than nasalance was. Discrimination of nasal versus nonnasal stimuli using NNA and nasalance was comparable, suggesting the potential of NNA for biofeedback applications. Future work to improve the calibration of NNA is needed to lower intersubject variability.
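Nasalance, as produced by Nasometer-type devices, is conventionally the nasal share of total acoustic energy, expressed in percent. A minimal sketch, assuming per-channel energy values are already available (the `eps` guard is an implementation convenience, not part of the definition):

```python
import numpy as np

def nasalance(nasal, oral, eps=1e-12):
    """Nasalance score in percent: nasal energy / (nasal + oral) * 100.

    `nasal` and `oral` are same-length arrays of acoustic energy from the
    nasal and oral channels of a two-microphone (Nasometer-style) setup.
    """
    nasal = np.asarray(nasal, dtype=float)
    oral = np.asarray(oral, dtype=float)
    return float(100.0 * nasal.sum() / (nasal.sum() + oral.sum() + eps))

print(nasalance([1.0, 1.0], [3.0, 3.0]))  # 25% of the energy is nasal
```

Comparing NNA against this score, as the study does, then amounts to computing both measures over the same utterances and testing their sensitivity and specificity to nasal contexts.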
25
Tran PK, Letowski TR, McBride ME. The effect of bone conduction microphone placement on intensity and spectrum of transmitted speech items. J Acoust Soc Am 2013; 133:3900-3908. [PMID: 23742344] [DOI: 10.1121/1.4803870]
Abstract
Speech signals can be converted into electrical audio signals using either a conventional air conduction (AC) microphone or a contact bone conduction (BC) microphone. The goal of this study was to investigate the effects of the location of a BC microphone on the intensity and frequency spectrum of the recorded speech. Twelve locations, 11 on the talker's head and 1 on the collar bone, were investigated. The speech sounds were three vowels (/u/, /a/, /i/) and two consonants (/m/, /ʃ/). The sounds were produced by 12 talkers. Each sound was recorded simultaneously with two BC microphones and an AC microphone. Analysis of the spectral data showed that BC recordings made at the forehead of the talker were the most similar to the AC recordings, whereas the collar bone recordings were the most different. Comparison of the spectral data with speech intelligibility data collected in another study revealed a strong negative relationship between BC speech intelligibility and the degree of deviation of the BC speech spectrum from the AC spectrum. In addition, the head locations that resulted in the highest speech intelligibility were associated with the lowest output signals among all tested locations. Implications of these findings for BC communication are discussed.
Affiliation(s)
- Phuong K Tran: U.S. Army Research Laboratory, Human Research and Engineering Directorate, Aberdeen Proving Ground, Maryland 21005, USA
26
Abstract
BACKGROUND There are differing reports of the usefulness of the Nasometer™ as a complement to listening, often presented as correlations between listening and nasalance measurements. Differences between findings have been attributed to listener experience and types of speech stimuli. AIMS To compare nasalance scores from the Nasometer with perceptual assessments, for the same and different Swedish speech stimuli, using three groups of listeners with differing levels of experience in judging speech nasality. METHODS & PROCEDURES Nasalance scores were compared with blinded listener ratings of randomized recordings using three groups of listeners and two groups of speakers. Speakers were classified as having either hypernasal speech or speech with typical resonance. Listeners were speech-language pathologists (SLPs) working predominantly with resonance disorders, other SLPs, and untrained listeners. OUTCOMES & RESULTS Spearman correlations (r_s) between hypernasality ratings and nasalance scores were calculated for each listener group and speech stimulus. For both groups of SLPs, all correlations between perceptual ratings and nasalance scores were significant at p = 0.01. The correlations between nasalance scores and ratings by listeners in the SLP groups were higher than those for the untrained listener group regardless of stimulus type; post hoc Mann-Whitney U-tests showed that the only significant difference was between the expert SLP group and the untrained listener group. Second, correlations between perceptual ratings and oral-stimulus nasalance scores were higher when the perceptual ratings were based on spontaneous speech rather than on the oral stimulus, although a Wilcoxon signed-rank test showed that this difference was not significant. A third finding was that correlations between oral-stimulus nasalance scores and perceptual scores were higher than those between mixed-stimulus nasalance scores and perceptual scores; a Wilcoxon signed-rank test showed that this difference was significant. CONCLUSIONS & IMPLICATIONS The Nasometer might be useful to the SLP with limited experience in assessing resonance disorders for differentiating between hyper- and hyponasality. With listener reliability for ratings of hypernasality still an issue, using a nasalance score as a complement to the perceptual evaluation will also aid the expert SLP: it gives an alternative way of quantifying speech resonance and might help in especially hard-to-judge cases.
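The rank correlation used throughout this abstract (Spearman's r_s) needs no statistics library: it is the Pearson correlation of the two rank vectors, with ties sharing their mean rank. A minimal sketch on hypothetical nasalance/rating data (the numbers below are invented for illustration):

```python
def ranks(xs):
    """Average ranks (1-based), with ties sharing the mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1        # mean of tied positions, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's r_s = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical data: nasalance scores vs. 5-point hypernasality ratings.
nasalance_scores = [18, 25, 31, 40, 55, 62]
listener_ratings = [1, 1, 2, 3, 4, 5]
print(spearman(nasalance_scores, listener_ratings))  # close to 1
```

A near-monotone relation like this one yields r_s close to 1; less experienced listeners would be expected to produce noisier ratings and hence lower r_s, which is the pattern the study reports.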
Affiliation(s)
- Karin Brunnegård: Speech and Language Pathology, Department of Clinical Sciences, Umeå University, Umeå, Sweden
27
Abstract
Electropalatographic specification of alveolar fricatives in Croatian is aimed at providing speech therapists with normative data about the range of acceptable productions of /s/ and /z/ in adult speakers of Croatian. Four variables were analysed: place of articulation, total contact, groove width and hold phase duration. Intra- and inter-speaker variability for each variable was analysed. Lingual palatal cues for voicing difference were also quantified and discussed. Results show that Croatian /s/ and /z/ are alveolar and not dental as previously reported. The comparison between the voiced and the voiceless fricative shows that durational measures provide the best differentiation. The voiceless counterpart is significantly longer. The difference between voiced and voiceless is also found in the total contact, with /z/ having more contact in the anterior four rows of electrodes, while /s/ has more contact in the posterior four rows of electrodes. This difference is also reflected in the anterior and the posterior groove widths. Possibilities of using these results as normative data for the diagnosis and treatment of atypical articulation of /s/ and /z/ are discussed.
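Electropalatographic measures such as total contact and groove width are computed from binary contact frames. A sketch on a simplified 8×8 electrode grid; both the contact pattern and the groove-width definition below are illustrative assumptions, not the study's measurement protocol:

```python
import numpy as np

# One illustrative 8x8 EPG frame for an /s/-like pattern: 1 = contact.
# Row 0 is the most anterior (alveolar) row; a midline groove stays open.
frame = np.array([
    [1, 1, 1, 0, 0, 1, 1, 1],   # anterior: lateral contact, central groove
    [1, 1, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 1, 1, 1],   # posterior
])

def total_contact(f):
    """Fraction of activated electrodes in the frame."""
    return f.sum() / f.size

def groove_width(f, rows):
    """Minimum over `rows` of the number of non-contacted electrodes
    lying between the lateral contacts (an illustrative definition)."""
    widths = []
    for r in rows:
        cols = np.flatnonzero(f[r])
        if len(cols) == 0:
            widths.append(f.shape[1])     # fully open row
        else:
            widths.append(int(cols.max() - cols.min() + 1 - len(cols)))
    return min(widths)

print(round(total_contact(frame), 3))     # 0.469
print(groove_width(frame, rows=range(4))) # 2: narrowest anterior groove
```

Averaging such per-frame values over the fricative's hold phase, and splitting total contact into anterior and posterior rows, gives the kinds of voiced/voiceless contrasts the abstract reports.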
Affiliation(s)
- Marko Liker: Department of Phonetics, University of Zagreb, Zagreb, Croatia
28
Myers FL, Bakker K, St Louis KO, Raphael LJ. Disfluencies in cluttered speech. J Fluency Disord 2012; 37:9-19. [PMID: 22325918] [DOI: 10.1016/j.jfludis.2011.10.001]
Abstract
UNLABELLED The purpose of this study was to examine the nature and frequency of occurrence of disfluencies, as they occur in singletons and in clusters, in the conversational speech of individuals who clutter compared with typical speakers. Except for two disfluency types (revisions in clusters and word repetitions in clusters), nearly all disfluency types were virtually indistinguishable in frequency of occurrence between the two groups. These findings shed light on cluttering in several respects, foremost of which is that they document the nature of disfluencies in cluttering. The findings also have implications for our understanding of the relationship between cluttering and typical speech, cluttering and stuttering, the Cluttering Spectrum Hypothesis, and the Lowest Common Denominator definition of cluttering. EDUCATIONAL OBJECTIVES At the end of this activity the reader will be able to: (a) identify types of disfluency associated with cluttered speech; (b) contrast disfluencies in cluttered speech with those associated with stuttering; (c) compare the disfluencies of typical speakers with those of individuals who clutter; and (d) explain the perceptual nature of cluttering.
Affiliation(s)
- Florence L Myers: Adelphi University, Department of Communication Sciences and Disorders, Garden City, NY, USA
29
Fitzsimons DA, Jones DL, Barton B, North KN. A procedure for the computerized analysis of cleft palate speech transcription. Clin Linguist Phon 2012; 26:18-38. [PMID: 21728832] [DOI: 10.3109/02699206.2011.584270]
Abstract
The phonetic symbols used by speech-language pathologists to transcribe speech contain underlying hexadecimal values used by computers to correctly display and process transcription data. This study aimed to develop a procedure to utilise these values as the basis for subsequent computerized analysis of cleft palate speech. A computer keyboard file and a modified font file were developed using symbols from the International Phonetic Alphabet and extensions to the International Phonetic Alphabet to improve the computerized storage of phonetic symbols used in cleft palate speech transcription. Computerized coding procedures were written to retrieve hexadecimal values of transcribed symbols and match these to their phonetic attributes as defined in the International Phonetic Alphabet and extensions to the International Phonetic Alphabet. Computerized procedures were subsequently developed to analyse transcription data based on these matched hexadecimal values and their associated phonetic attributes, with respect to cleft palate speech. This method will be a useful addition to existing computerized speech analysis tools.
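Retrieving the underlying hexadecimal value of a transcription symbol and matching it to phonetic attributes can be sketched in a few lines: each character maps to a Unicode code point, whose hexadecimal form can key an attribute table. The table below is a toy stand-in for the study's full IPA/extIPA mapping:

```python
def hex_codes(transcription):
    """Hexadecimal Unicode code point of each transcription character."""
    return [f"{ord(ch):04X}" for ch in transcription]

# Toy attribute table keyed on hexadecimal code points (illustrative,
# not the study's actual IPA/extIPA attribute database).
ATTRIBUTES = {
    "0073": ("s", "voiceless alveolar fricative"),
    "0283": ("\u0283", "voiceless postalveolar fricative"),  # IPA esh
    "0303": ("combining tilde", "nasalization diacritic"),
}

for code in hex_codes("s\u0283"):
    symbol, description = ATTRIBUTES[code]
    print(code, description)
```

Analyses of a transcript (e.g., counting compensatory articulations in cleft palate speech) can then operate on these matched attributes rather than on raw glyphs, which sidesteps font and rendering differences.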
Affiliation(s)
- David A Fitzsimons: The Cleft Palate Clinic, The Children's Hospital at Westmead, Westmead, NSW, Australia
30
Hong WH, Chen HC, Yang FPG, Wu CY, Chen CL, Wong AMK. Speech-associated labiomandibular movement in Mandarin-speaking children with quadriplegic cerebral palsy: a kinematic study. Res Dev Disabil 2011; 32:2595-2601. [PMID: 21775100] [DOI: 10.1016/j.ridd.2011.06.016]
Abstract
The purpose of this study was to investigate speech-associated labiomandibular movement during articulation in Mandarin-speaking children with spastic quadriplegic (SQ) cerebral palsy (CP). Twelve children with SQ CP (aged 7-11 years) and 12 age-matched healthy children as controls were enrolled in the study. All children underwent analysis of the percentage of consonants correct (PCC) and kinematic analysis of speech tasks using the Vicon Motion 370 system. Kinematic parameters included utterance duration, displacement and velocity of the lip and jaw, coefficient of variation (CV) of lip utterance duration, and spatial and temporal coupling of labiomandibular movement for speech produced in mono-syllable (MS) and poly-syllable (PS) tasks. Children with CP showed lower temporal coupling (MS, p = 0.015; PS, p = 0.007), but not lower spatial coupling, of labiomandibular movement than healthy children. Children with CP also had greater CVs (MS, p = 0.003; PS, p = 0.010), greater peak opening displacement and velocity of the lower lip and jaw (p < 0.05), and lower PCC (p < 0.001) than healthy children. Children with SQ CP displayed impaired labiomandibular coupling, especially temporal coupling. These children also had high temporal oromotor variability and needed to make more effort to coordinate labiomandibular movement for speech production.
Affiliation(s)
- Wei-Hsien Hong
- Department of Sports Medicine, China Medical University, No. 91, Hsueh-Shih Road, Taichung 40402, Taiwan
31
Abstract
PURPOSE The purpose of this study was to determine similarities and differences in nasalance scores observed with different computerized nasalance systems in the context of vowel-loaded sentences. METHOD Subjects were 46 Caucasian adults with no perceived hyper- or hyponasality. Nasalance scores were obtained using the Nasometer 6200 (Kay Elemetrics Corp.), the Nasometer II 6400 (Kay Elemetrics Corp.), and the NasalView (Tiger DRS, Inc.) for sentences loaded with mixed, high front, high back, low front, or low back vowels. RESULTS Measures of nasalance obtained with the NasalView were significantly higher than those obtained with the Nasometer 6200, and the measures of nasalance obtained with the Nasometer 6200 were significantly higher than those obtained with the Nasometer II 6400. However, similar effects of vowel loading on measures of nasalance were observed, regardless of system. For all systems, the high front vowel sentence tended to result in higher measures of nasalance than did the high back, low front, and low back vowel sentences; the mixed vowel sentence tended to have a higher degree of nasalance than did any of the other sentences. CONCLUSIONS Although nasalance data computed using different systems are not readily comparable, all three systems that were evaluated produced similar effects of vowel loading on nasalance. Increased nasalance for high front versus low back vowels may be due to factors such as increased oral impedance, reduced radiated oral sound pressure, possible increases in airflow via the nasal cavity, and increased transpalatal nasalance.
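For orientation, the score all three systems above report is, at its core, the ratio of nasal to total (nasal plus oral) acoustic amplitude, expressed as a percentage. The following is a minimal sketch, not any vendor's implementation; the commercial systems band-pass filter each channel and average frame by frame, which is omitted here:

```python
import numpy as np

def nasalance_score(nasal: np.ndarray, oral: np.ndarray) -> float:
    """Nasalance (%) from simultaneously recorded nasal- and oral-channel
    signals: 100 * nasal / (nasal + oral), computed here on
    whole-utterance RMS amplitudes."""
    n = np.sqrt(np.mean(np.square(nasal)))  # RMS, nasal microphone
    o = np.sqrt(np.mean(np.square(oral)))   # RMS, oral microphone
    return 100.0 * n / (n + o)
```

An all-oral utterance scores near 0; equal amplitude in both channels scores 50.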
Affiliation(s)
- Shaheen N Awan
- Bloomsburg University of Pennsylvania, Bloomsburg, PA, USA.
32
Abstract
PURPOSE This work provides a quantitative assessment of the positional tracking accuracy of the NDI Wave Speech Research System. METHOD Three experiments were completed: (a) static rigid-body tracking across different locations in the electromagnetic field volume, (b) dynamic rigid-body tracking across different locations within the electromagnetic field volume, and (c) human jaw-movement tracking during speech. Rigid-body experiments were completed for 4 different instrumentation settings, permuting 2 electromagnetic field volume sizes with and without automated reference sensor processing. RESULTS Within the anthropometrically pertinent "near field" (< 200 mm) of the NDI Wave field generator, at the 300-mm³ volume setting, 88% of dynamic positional errors were < 0.5 mm and 98% were < 1.0 mm. Extreme tracking errors (> 2 mm) occurred within the near field for < 1% of position samples. For human jaw-movement tracking, 95% of position samples had < 0.5 mm errors for 9 out of 10 subjects. CONCLUSIONS Static tracking accuracy is modestly superior to dynamic tracking accuracy. Dynamic tracking accuracy is best for the 300-mm³ field setting in the 200-mm near field. The use of automated head correction has no deleterious effect on tracking. Tracking errors for jaw movements during speech are typically < 0.5 mm.
33
Abstract
PURPOSE To improve lingual ultrasound imaging with the corrected high frame rate anchored ultrasound with software alignment (CHAUSA; Miller, 2008) method. METHOD A production study of the IsiXhosa alveolar click is presented. Articulatory-to-acoustic alignment is demonstrated using a Tri-Modal 3-ms pulse generator. Images from 2 simultaneous data collection paths, using dominant ultrasound technology and the CHAUSA method, are compared. The probe stabilization and head movement correction paradigm is demonstrated. RESULTS The CHAUSA method increases the frame rate from the standard National Television System Committee (NTSC) video rate (29.97) to the ultrasound internal machine rate--in this case, 124 frames per second (fps)--by using Digital Imaging and Communications in Medicine (DICOM; National Electrical Manufacturers Association, 2008) data transfer. DICOM avoids spatiotemporal inaccuracies introduced by dominant ultrasound export techniques. The data display alignment of the acoustic and articulatory signals to the correct high-frame rate (FR) frame (± 4 ms at 124 fps). CONCLUSIONS CHAUSA produces high-FR, high-spatial-quality ultrasound images, which are head corrected to 1 mm. The method reveals tongue dorsum retraction during the posterior release of the alveolar click and tongue tip recoil following the anterior release of the alveolar click, both of which were previously undetectable. CHAUSA visualizes most of the tongue in studies of dynamic consonants with a major reduction in field problems, opening up important areas of speech research.
34
Folker JE, Murdoch BE, Cahill LM, Delatycki MB, Corben LA, Vogel AP. Kinematic analysis of lingual movements during consonant productions in dysarthric speakers with Friedreich's ataxia: A case-by-case analysis. Clin Linguist Phon 2011; 25:66-79. PMID: 20932172. DOI: 10.3109/02699206.2010.511760.
Abstract
Articulatory kinematics were investigated using electromagnetic articulography (EMA) in four dysarthric speakers with Friedreich's ataxia (FRDA). Specifically, tongue-tip and tongue-back movements were recorded by the AG-200 EMA system during production of the consonants /t/ and /k/ as produced within a sentence utterance and during a rapid syllable repetition task. The results obtained for each of the participants with FRDA were individually compared to those obtained by a control group (n = 10). Results revealed significantly greater movement durations and increased articulatory distances, most predominantly during the approach phase of consonant production. A task difference was observed with lingual kinematics more disturbed during the syllable repetition task than during the sentence utterance. Despite expectations of slowed articulatory movements in FRDA dysarthria, the EMA data indicated that the observed prolongation of consonant phase durations was generally associated with greater articulatory distances, rather than slowed movement execution.
Affiliation(s)
- Joanne E Folker
- School of Health and Rehabilitation Sciences, The University of Queensland, Brisbane, QLD, Australia.
35
Oller DK, Niyogi P, Gray S, Richards JA, Gilkerson J, Xu D, Yapanel U, Warren SF. Automated vocal analysis of naturalistic recordings from children with autism, language delay, and typical development. Proc Natl Acad Sci U S A 2010; 107:13354-9. PMID: 20643944. PMCID: PMC2922144. DOI: 10.1073/pnas.1003882107.
Abstract
For generations the study of vocal development and its role in language has been conducted laboriously, with human transcribers and analysts coding and taking measurements from small recorded samples. Our research illustrates a method to obtain measures of early speech development through automated analysis of massive quantities of day-long audio recordings collected naturalistically in children's homes. A primary goal is to provide insights into the development of infant control over infrastructural characteristics of speech through large-scale statistical analysis of strategically selected acoustic parameters. In pursuit of this goal we have discovered that the first automated approach we implemented is not only able to track children's development on acoustic parameters known to play key roles in speech, but also is able to differentiate vocalizations from typically developing children and children with autism or language delay. The method is totally automated, with no human intervention, allowing efficient sampling and analysis at unprecedented scales. The work shows the potential to fundamentally enhance research in vocal development and to add a fully objective measure to the battery used to detect speech-related disorders in early childhood. Thus, automated analysis should soon be able to contribute to screening and diagnosis procedures for early disorders, and more generally, the findings suggest fundamental methods for the study of language in natural environments.
Affiliation(s)
- D K Oller
- School of Audiology and Speech-Language Pathology, University of Memphis, Memphis, TN 38105, USA.
36
O'Brian S, Jones M, Pilowsky R, Onslow M, Packman A, Menzies R. A new method to sample stuttering in preschool children. Int J Speech Lang Pathol 2010; 12:173-177. PMID: 20433336. DOI: 10.3109/17549500903464338.
Abstract
This study reports a new method for sampling the speech of preschool stuttering children outside the clinic environment. Twenty parents engaged their stuttering children in an everyday play activity in the home with a telephone handset nearby. A remotely located researcher telephoned the parent and recorded the play session with a phone-recording jack attached to a digital audio recorder at the remote location. The parent placed an audio recorder near the child for comparison purposes. Children as young as 2 years complied with the remote method of speech sampling. The quality of the remote recordings was superior to that of the in-home recordings. There was no difference in means or reliability of stutter-count measures made from the remote recordings compared with those made in-home. Advantages of the new method include: (1) cost efficiency of real-time measurement of percent syllables stuttered in naturalistic situations, (2) reduction of bias associated with parent-selected timing of home recordings, (3) standardization of speech sampling procedures, (4) improved parent compliance with sampling procedures, (5) clinician or researcher on-line control of the acoustic and linguistic quality of recordings, and (6) elimination of the need to lend equipment to parents for speech sampling.
Affiliation(s)
- Sue O'Brian
- Australian Stuttering Research Centre, The University of Sydney, Australia
37
McNeil MR, Katz WF, Fossett TRD, Garst DM, Szuminsky NJ, Carter G, Lim KY. Effects of online augmented kinematic and perceptual feedback on treatment of speech movements in apraxia of speech. Folia Phoniatr Logop 2010; 62:127-33. PMID: 20424468. PMCID: PMC2871060. DOI: 10.1159/000287211.
Abstract
Apraxia of speech (AOS) is a motor speech disorder characterized by disturbed spatial and temporal parameters of movement. Research on motor learning suggests that augmented feedback may provide a beneficial effect for training movement. This study examined the effects of the presence and frequency of online augmented visual kinematic feedback (AVKF) and clinician-provided perceptual feedback on speech accuracy in 2 adults with acquired AOS. Within a single-subject multiple-baseline design, AVKF was provided using electromagnetic midsagittal articulography (EMA) in 2 feedback conditions (50 or 100%). Articulator placement was specified for speech motor targets (SMTs). Treated and baselined SMTs were in the initial or final position of single-syllable words, in varying consonant-vowel or vowel-consonant contexts. SMTs were selected based on each participant's pre-assessed erred productions. Productions were digitally recorded and online perceptual judgments of accuracy (including segment and intersegment distortions) were made. Inter- and intra-judge reliability for perceptual accuracy was high. Results measured by visual inspection and effect size revealed positive acquisition and generalization effects for both participants. Generalization occurred across vowel contexts and to untreated probes. Results of the frequency manipulation were confounded by presentation order. Maintenance of learned and generalized effects was demonstrated for 1 participant. These data provide support for the role of augmented feedback in treating speech movements that result in perceptually accurate speech production. Future investigations will explore the independent contributions of each feedback type (i.e. kinematic and perceptual) in producing efficient and effective training of SMTs in persons with AOS.
Affiliation(s)
- M R McNeil
- Geriatric Research Education and Clinical Center, VA Pittsburgh Healthcare System, Pittsburgh, PA 15260, USA.
38
Abstract
Among teachers, music teachers are roughly four times more likely than classroom teachers to develop voice-related problems. Although it has been established that music teachers use their voices at high intensities and durations in the course of their workday, voice-use profiles concerning the amount and intensity of vocal use and vocal load have neither been quantified nor has vocal load for music teachers been compared with classroom teachers using these same voice-use parameters. In this study, total phonation time, fundamental frequency (F₀), and vocal intensity (dB SPL [sound pressure level]) were measured or estimated directly using a KayPENTAX Ambulatory Phonation Monitor (KayPENTAX, Lincoln Park, NJ). Vocal load was calculated as cycle and distance dose, as defined by Švec et al (2003), which integrates total phonation time, F₀, and vocal intensity. Twelve participants (n = 7 elementary music teachers and n = 5 elementary classroom teachers) were monitored during five full teaching days of one workweek to determine average vocal load for these two groups of teachers. Statistically significant differences in all measures were found between the two groups (P < 0.05) with large effect sizes for all parameters. These results suggest that typical vocal loads for music teachers are substantially higher than those experienced by classroom teachers (P < 0.01). This study suggests that reducing vocal load may have immediate clinical and educational benefits in vocal health in music teachers.
Affiliation(s)
- Sharon L Morrow
- Department of Music Education, Westminster Choir College of Rider University, Princeton, New Jersey, USA.
39
Sawigun C, Ngamkham W, Serdijn WA. Comparison of speech processing strategies for the design of an ultra low-power analog bionic ear. Annu Int Conf IEEE Eng Med Biol Soc 2010; 2010:1374-1377. PMID: 21096335. DOI: 10.1109/iembs.2010.5626737.
Abstract
Miniaturizing the area and power consumption of cochlear prosthetic devices is strongly required for full implantation. In this paper, several speech encoding strategies are studied and compared in order to find a compact speech processor that allows for full implantation and is able to convey both time and frequency components of the incoming speech to a set of electrical pulse stimuli. The study covers the widely recognized continuous interleaved sampling (CIS) strategy and strategies that convey the temporal fine structure (TFS), including race-to-spike asynchronous interleaved sampling (AIS), phase-locking (PL) using zero-crossing detection (ZCD), and PL using a peak-picking (PP) technique. To estimate the performances of the four systems, a spike-based reconstruction algorithm is employed to retrieve the original sounds after being processed by the different strategies. The correlation factors between the reconstructed and original signals imply that strategies that convey TFS outperform CIS. Among them, the peak-picking technique combines good performance with great compactness, since envelope detectors are not required.
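The CIS baseline compared above can be sketched in a few lines: split the signal into analysis bands, rectify, and smooth to obtain per-channel envelopes, which a real processor would then compress and map to interleaved biphasic pulse amplitudes. A numpy-only toy, where FFT band-limiting stands in for the analog filter bank and the band edges are illustrative, not taken from the paper:

```python
import numpy as np

def cis_envelopes(x, fs, edges=(300, 700, 1400, 2700, 5000), frame=64):
    """Toy continuous interleaved sampling (CIS) front end: band-pass
    analysis (via FFT masking), full-wave rectification, and
    moving-average envelope smoothing. Returns one envelope per
    analysis channel."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    kernel = np.ones(frame) / frame  # short moving-average smoother
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        Y = np.where((freqs >= lo) & (freqs < hi), X, 0)  # band-limit
        band = np.fft.irfft(Y, n=len(x))
        envs.append(np.convolve(np.abs(band), kernel, mode="same"))
    return np.array(envs)
```

A 500 Hz tone, for instance, produces a large envelope only in the 300-700 Hz channel; the TFS-preserving strategies in the paper differ precisely in that they keep the within-band waveform timing that this envelope extraction discards.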
Affiliation(s)
- Chutham Sawigun
- Biomedical Electronics Group, Electronics Research Laboratory, Delft University of Technology, the Netherlands.
40
Deng X, Chen J, Shuai J. [Detection of endpoint for segmentation between consonants and vowels in aphasia rehabilitation software based on artificial intelligence scheduling]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi 2009; 26:886-899. PMID: 19813633.
Abstract
To improve the efficiency of aphasia rehabilitation training, an artificial-intelligence scheduling function was added to the aphasia rehabilitation software, improving its performance. Taking into account the characteristics of aphasic patients' voices and the requirements of the scheduling function, the authors designed an endpoint detection algorithm. It determines reference endpoints and then uses them to extract each word and locate reasonable segmentation points between consonants and vowels. Experimental results show that the algorithm achieves detection with a high accuracy rate and is therefore applicable to endpoint detection in the speech of patients with aphasia.
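The abstract does not give the algorithm's details, but endpoint detection of this kind conventionally rests on two frame-wise features, short-time energy and zero-crossing rate: vowels show high energy and low ZCR, voiceless consonants the reverse, and thresholds on the two curves yield the endpoints and rough consonant/vowel boundaries. A generic sketch of the feature extraction, not the authors' method:

```python
import numpy as np

def frame_features(x, frame=256, hop=128):
    """Short-time energy and zero-crossing rate per frame -- the two
    classic features for endpoint detection and rough consonant/vowel
    segmentation (vowels: high energy, low ZCR; voiceless consonants:
    low energy, high ZCR)."""
    feats = []
    for start in range(0, len(x) - frame, hop):
        w = x[start:start + frame]
        energy = float(np.sum(w ** 2))                       # short-time energy
        zcr = float(np.mean(np.abs(np.diff(np.sign(w))) > 0))  # crossing rate
        feats.append((energy, zcr))
    return feats
```

In practice a fixed or adaptive threshold on the energy curve marks the word endpoints, and the ZCR contrast within each word suggests the consonant-to-vowel transition.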
Affiliation(s)
- Xingjuan Deng
- College of Bioengineering, Chongqing University, Chongqing 400030, China.
41
Abstract
This paper presents the results of open quotient (OQ) measurements in electroglottographic (EGG) signals of young (18-30 years) and elderly (60-82 years) male and female speakers. The paper further presents quantitative results on the relation between the EGG OQ and the perception of a speaker's age. Higgins and Saxman [1] found a decreased EGG OQ with increased age for females, while the EGG OQ increased for males as the speaker's age increased in sustained vowel material. Although laryngeal degeneration due to increased age seems to occur to a lesser extent in females, the significant decrease of the OQ in elderly female voices could not be explained in terms of age-related physiological changes. Linville [3] found increased spectral amplitudes in the region of F0 for the elderly (obtained by long-term average spectra (LTAS) measurements of read speech material), independent of gender, which could be indirectly interpreted as an increasing OQ. We measured the EGG OQ not only for sustained vowels but also in vowels taken from isolated words and read speech material. To analyse the relation between breathiness in terms of an increased EGG OQ and the mean perceived age per stimulus, a perception test was carried out in which listeners were asked to estimate the speaker's age based on sustained /a/-vowels varying in vocal effort (soft-normal-loud) during production. (1) The decreased EGG OQ for elderly females originally found by Higgins and Saxman [1] is not apparent in our data for sustained /a/-vowels; for males, however, we also found an increased EGG OQ for the elderly speakers. (2) In addition, an increased EGG OQ for the group of elderly in comparison to the younger males occurs for the unstressed syllable of the word material. (3) Our results show a strong positive relation between perceived age and EGG OQ in male vowel stimuli. Regarding (2), depending on the speech task, at least a male speaker's voice gets more breathy as age increases. Considering (3), increased breathiness may contribute to the listener's perception of increased age.
Affiliation(s)
- Ralf Winkler
- Department of Speech Communication and Phonetics, Technical University, Berlin, Germany.
42
Abstract
The results from six published electroglottographic (EGG-based) methods for calculating the EGG contact quotient (CQEGG) were compared to closed quotients derived from simultaneous videokymographic imaging (CQKYM). Two trained male singers phonated in falsetto and in chest register, with two degrees of adduction in both registers. The maximum difference between methods in the CQEGG was 0.3 (out of 1.0). The CQEGG was generally lower than the CQKYM. Within subjects, the CQEGG co-varied with the CQKYM, but with changing offsets depending on method. The CQEGG cannot be calculated for falsetto phonation with little adduction, since there is no complete glottal closure. Basic criterion-level methods with thresholds of 0.2 or 0.25 gave the best match to the CQKYM data. The results suggest that contacting and de-contacting in the EGG might not refer to the same physical events as do the beginning and cessation of airflow.
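The basic criterion-level method that fared best above can be stated compactly: threshold the EGG cycle at a fixed fraction of its peak-to-peak amplitude and take the fraction of samples on the contact side as the contact quotient. A minimal sketch, assuming one isolated cycle of a contact-positive EGG signal:

```python
import numpy as np

def contact_quotient(egg_cycle, level=0.25):
    """Criterion-level contact quotient for one EGG cycle: the fraction
    of the cycle during which the (contact-positive) EGG signal exceeds
    `level` of the peak-to-peak amplitude above the cycle minimum."""
    lo, hi = float(np.min(egg_cycle)), float(np.max(egg_cycle))
    threshold = lo + level * (hi - lo)  # criterion level, e.g. 0.25
    return float(np.mean(egg_cycle > threshold))
```

The changing offsets reported above correspond to the choice of `level`: raising the criterion shrinks the computed contact quotient for the same cycle.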
43
Abstract
An experimental method for quantifying the amount of voicing over time is described in a tutorial manner. A new procedure for obtaining calibrated sound pressure levels (SPL) of speech from a head-mounted microphone is offered. An algorithm for voicing detection (kv) and fundamental frequency (F0) extraction from an electroglottographic signal is described. The extracted values of SPL, F0, and kv are used to derive five vocal doses: the time dose (total voicing time), the cycle dose (total number of vocal fold oscillatory cycles), the distance dose (total distance travelled by the vocal folds in an oscillatory path), the energy dissipation dose (total amount of heat energy dissipated in the vocal folds) and the radiated energy dose (total acoustic energy radiated from the mouth). The doses measure the vocal load and can be used for studying the effects of vocal fold tissue exposure to vibration.
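The dose definitions above accumulate straightforwardly from frame-wise measurements. A sketch of three of the five doses, assuming the voicing flag kv, F0, and a vocal-fold vibration amplitude estimate per frame are already extracted (the paper derives the amplitude empirically from SPL; here it is taken as given):

```python
import numpy as np

def vocal_doses(kv, f0, amp, dt):
    """Accumulate three vocal doses from frame-wise voicing decisions
    kv (0/1), fundamental frequency f0 (Hz), and estimated vocal-fold
    vibration amplitude amp (m), with frame duration dt (s):

        time dose      D_t = sum(kv * dt)                 [s]
        cycle dose     D_c = sum(kv * f0 * dt)            [cycles]
        distance dose  D_d = sum(kv * 4 * amp * f0 * dt)  [m]

    (the folds travel roughly four amplitudes per oscillatory cycle)."""
    kv = np.asarray(kv, dtype=float)
    d_t = float(np.sum(kv) * dt)
    d_c = float(np.sum(kv * f0) * dt)
    d_d = float(np.sum(kv * 4.0 * amp * f0) * dt)
    return d_t, d_c, d_d
```

For example, 3 voiced frames of 30 ms at F0 = 200 Hz give a time dose of 0.09 s and a cycle dose of 18 cycles; the energy doses omitted here additionally require tissue parameters.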
Affiliation(s)
- Jan G Svec
- National Center for Voice and Speech, the Denver Center for the Performing Arts, 1245 Champa Street, Denver, CO 80204, USA.
44
Abstract
The main purpose of the present acoustical study was to delineate further the changes in nasal resonance in childhood and young adulthood. An additional objective was to collect reference nasal resonance scores for normal Flemish-speaking children. Scores were recorded with a Nasometer while 33 children produced sounds and read three standard passages. We compared the nasal resonance data from the children with those of 58 adults that had been obtained in a previous study. Age had a significant effect on three sounds and two texts. The results indicated that young Flemish adults had higher nasal resonance scores than children, particularly when the reading stimuli included nasal consonants for which a co-ordinated opening and closing function of the velopharyngeal mechanism was required. These results reflect anatomical changes and differences in speech programming associated with growth.
45
Munger JB, Thomson SL. Frequency response of the skin on the head and neck during production of selected speech sounds. J Acoust Soc Am 2008; 124:4001-4012. PMID: 19206823. DOI: 10.1121/1.3001703.
Abstract
Vibrations within the vocal tract during speech are transmitted through tissue to the skin surface and can be used to transmit speech. Achieving quality speech signals using skin vibration is desirable but problematic, primarily due to the several sound production locations along the vocal tract. The objective of this study was to characterize the frequency content of speech signals on various locations of the head and neck. Signals were recorded using a microphone and accelerometers attached to 15 locations on the heads and necks of 14 males and 10 females. The subjects voiced various phonemes and one phrase. The power spectral densities (PSD) of the phonemes were used to determine a quality ranking for each location and sound. Spectrograms were used to examine signal frequency content for selected locations. A perceptual listening test was conducted and compared to the PSD rankings. The signal-to-noise ratio was found for each location with and without background noise. These results are presented and discussed. Notably, while high-frequency content is attenuated at the throat, it is shown to be detectable at some other locations. The best locations for speech transmission were found to be generally common to males and females.
Affiliation(s)
- Jacob B Munger
- Department of Mechanical Engineering, Brigham Young University, Provo, Utah 84602, USA
46
George NA, de Mul FFM, Qiu Q, Rakhorst G, Schutte HK. New laryngoscope for quantitative high-speed imaging of human vocal folds vibration in the horizontal and vertical direction. J Biomed Opt 2008; 13:064024. PMID: 19123670. DOI: 10.1117/1.3041164.
Abstract
We report the design of a novel laser line-triangulation laryngoscope for the quantitative visualization of the three-dimensional movements of human vocal folds during phonation. This is the first successful in vivo recording of the three-dimensional movements of human vocal folds in absolute values. Triangulation images of the vocal folds are recorded at the rate of 4000 fps with a resolution of 256 × 256 pixels. A special image-processing algorithm is developed to precisely follow the subpixel movements of the laser line image. Vibration profiles in both horizontal and vertical directions are calibrated and measured in absolute SI units with a resolution of ±50 µm. We also present a movie showing the vocal fold dynamics in vertical cross section.
Affiliation(s)
- Nibu A George
- University Medical Centre Groningen and University of Groningen, Groningen Voice Research Lab, Department of Biomedical Engineering, 9700 AD Groningen, The Netherlands
47
Abstract
Velopharyngeal closure is required for normal speech production. Incomplete velopharyngeal closure manifests as resonance disorders and nasal air escape. Management of velopharyngeal insufficiency requires a general knowledge of speech production as well as a more detailed understanding of the velopharyngeal mechanism. Comprehensive evaluation by a velopharyngeal insufficiency team includes medical assessment focusing on airway obstructive symptoms, perceptual speech analysis, and instrumental assessment, which is utilized to characterize the velopharyngeal gap. Options for intervention include speech therapy, intraoral prosthetic devices, and surgery. Surgical interventions can be categorized as palatal, palatopharyngeal, or pharyngeal procedures. The therapeutic challenge lies in achieving velopharyngeal closure during speech production while maintaining patency of the upper airway. We present our protocol for evaluation of velopharyngeal function with a focus on indications for palatoplasty and pharyngoplasty. We also discuss surgical modifications of sphincter pharyngoplasty.
Affiliation(s)
- Kathleen C Y Sie
- Pediatric Otolaryngology-Head and Neck Surgery, Children's Hospital and Regional Medical Center, University of Washington, Seattle, WA 98105, USA
48
Scheuerle J. Velopharyngeal dysfunction in perspective: a commentary on the Smith and Kuehn article. J Craniofac Surg 2007; 18:262-4. PMID: 17414270. DOI: 10.1097/scs.0b013e3180341dc4.
49
Abstract
This article reviews concepts basic to the evaluation of the speech of persons with velopharyngeal dysfunction. It defines velopharyngeal dysfunction as well as reviews normal and abnormal velopharyngeal function for speech. It defines the common speech characteristics of persons with velopharyngeal dysfunction, including hypernasality, hyponasality, nasal emission, compensatory articulations, and weak pressure consonants. Speech sounds commonly impacted by velopharyngeal dysfunction are discussed. This article identifies the components of a complete speech evaluation as well as identifies anatomic and physiologic measurements of palatal function used to corroborate perceptual speech judgments indicating palatal problems. It identifies special considerations in the evaluation of persons with suspected velopharyngeal dysfunction. It briefly discusses management of velopharyngeal dysfunction. Review questions follow the article.
Affiliation(s)
- Bonnie E Smith
- Division of Speech Pathology, Department of Otolaryngology-Head and Neck Surgery, The Craniofacial Center, University of Illinois Medical Center at Chicago, Chicago, Illinois 60612, USA.
50
Kaburagi T, Kawai K, Abe S. Analysis of voice source characteristics using a constrained polynomial representation of voice source signals. J Acoust Soc Am 2007; 121:745-8. PMID: 17348497. DOI: 10.1121/1.2359234.
Abstract
To analyze the characteristics of voice source signals from speech, a model is presented in the form of a polynomial function by extending the definition of the Rosenberg model. In combination with the all-pole assumption of the vocal-tract filter, methods are described for pitch-synchronous speech analysis and the temporal search of the glottal opening and closing instants. Because the source and filter models are both linear, the parameter estimation problem can be conveniently solved. In addition, the temporal search method can refine the locations of the glottal events and improve the accuracy of the parameter estimation. Analyses of non-nasalized voiced speech are conducted using an electroglottographic device, from which the initial estimate of the temporal information is obtained.
Affiliation(s)
- Tokihiko Kaburagi
- Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka, 815-8540, Japan