1. Correcting the record: Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934-2022). Am J Primatol 2024:e23637. PMID: 38741274. DOI: 10.1002/ajp.23637. Received 03/10/2023; revised 04/24/2024; accepted 04/27/2024.
Abstract
The phonetic potential of nonhuman primate vocal tracts has been the subject of considerable contention in recent literature. Here, the work of Philip Lieberman (1934-2022) is considered at length, and two research papers (both purported challenges to Lieberman's theoretical work) and a review of Lieberman's scientific legacy are critically examined. I argue that various aspects of Lieberman's research have been consistently misinterpreted in the literature. A paper by Fitch et al. overestimates the would-be "speech-ready" capacities of a rhesus macaque, and the data presented nonetheless support Lieberman's principal position: that nonhuman primates cannot articulate the full extent of human speech sounds. The suggestion that no vocal anatomical evolution was necessary for the evolution of human speech (as spoken by all normally developing humans) is not supported by phonetic or anatomical data. The second challenge, by Boë et al., attributes vowel-like qualities of baboon calls to articulatory capacities based on audio data; I argue that such "protovocalic" properties likely result from articulatory maneuvers disparate from those of human speakers. A review of Lieberman's scientific legacy by Boë et al. ascribes to Lieberman a view of speech evolution (which the authors term "laryngeal descent theory") that contradicts his writings. The present article documents a pattern of incorrect interpretations of Lieberman's theoretical work in recent literature. Finally, the apparent trend of vowel-like formant dispersions in the great ape vocalization literature is discussed with regard to Lieberman's theoretical work. The review concludes that the "Lieberman account" of primate vocal tract phonetic capacities remains supported by research: the ready articulation of fully human speech reflects species-unique anatomy.
2. Effects of speech rate modifications on phonatory acoustic outcomes in Parkinson's disease. Front Hum Neurosci 2024; 18:1331816. PMID: 38450224. PMCID: PMC10914948. DOI: 10.3389/fnhum.2024.1331816. Received 11/01/2023; accepted 01/30/2024. Open access.
Abstract
Speech rate reduction is a global speech therapy approach for speech deficits in Parkinson's disease (PD) that has the potential to result in changes across multiple speech subsystems. While the overall goal of rate reduction is usually improvements in speech intelligibility, not all people with PD benefit from this approach. Speech rate is often targeted as a means of improving articulatory precision, though less is known about rate-induced changes in other speech subsystems that could help or hinder communication. The purpose of this study was to quantify phonatory changes associated with speech rate modification across a broad range of speech rates from very slow to very fast in talkers with and without PD. Four speaker groups participated: younger and older healthy controls, and people with PD with and without deep brain stimulation of the subthalamic nucleus (STN-DBS). Talkers read aloud standardized sentences at 7 speech rates elicited using magnitude production: habitual, three slower rates, and three faster rates. Acoustic measures of speech intensity, cepstral peak prominence, and fundamental frequency were measured as a function of speech rate and group. Overall, slower rates of speech were associated with differential effects on phonation across the four groups. While all talkers spoke at a lower pitch in slow speech, younger talkers showed increases in speech intensity and cepstral peak prominence, while talkers with PD and STN-DBS showed the reverse pattern. Talkers with PD without STN-DBS and older healthy controls behaved in between these two extremes. At faster rates, all groups uniformly demonstrated increases in cepstral peak prominence. While speech rate reductions are intended to promote positive changes in articulation to compensate for speech deficits in dysarthria, the present results highlight that undesirable changes may be invoked across other subsystems, such as at the laryngeal level. 
In particular, talkers with STN-DBS, who often demonstrate speech deterioration following DBS surgery, demonstrated more phonatory detriments at slowed speech rates. Findings have implications for speech rate candidacy considerations and speech motor control processes in PD.
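The phonatory measures in this abstract (fundamental frequency and speech intensity) can be illustrated with a minimal, generic sketch; the autocorrelation pitch estimator and RMS-based intensity below are textbook stand-ins, not the study's actual analysis pipeline:

```python
import math

SR = 16000  # sampling rate in Hz (illustrative)

def rms_db(x):
    """Root-mean-square intensity in dB re full scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(rms + 1e-12)

def f0_autocorr(x, sr=SR, fmin=80.0, fmax=400.0):
    """Pick the autocorrelation lag with the highest value in a plausible f0 range."""
    n = len(x)
    best_lag, best_r = None, -float("inf")
    for lag in range(int(sr / fmax), int(sr / fmin) + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
        if r > best_r:
            best_r, best_lag = r, lag
    return sr / best_lag

# synthetic 200 ms "vowel": a 120 Hz sine at moderate amplitude
tone = [0.3 * math.sin(2 * math.pi * 120.0 * t / SR) for t in range(int(0.2 * SR))]
est_f0 = f0_autocorr(tone)   # close to 120 Hz
est_db = rms_db(tone)        # constant across the utterance here
```

On real speech, both measures would be computed frame by frame so that rate-induced changes in pitch and intensity can be tracked over the utterance, as in the study.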
3. Speech Perception as a Function of the Number of Channels and Channel Interaction in Cochlear Implant Simulation. Medeni Med J 2023; 38:276-283. PMID: 38148725. PMCID: PMC10759942. DOI: 10.4274/mmj.galenos.2023.73454. Received 09/29/2023; accepted 12/12/2023. Open access.
Abstract
Objective Speech perception relies on precise spectral and temporal cues. However, cochlear implant (CI) processing is confined to a limited frequency range, affecting the information transmitted to the auditory system. This study analyzes the influence of channel interaction and the number of channels on word recognition scores (WRS) within a CI simulation framework. Methods Two experiments were conducted. The first (n=29, mean age 23 years, 14 females) assessed the effect of the number of channels using 8-, 12-, 16-, and 22-channel vocoded word lists plus a non-vocoded list for WRS assessment. The second (n=29, mean age 25 years, 16 females) explored channel interaction across low-, middle-, and high-interaction conditions. Results In the first experiment, participants scored 57.93%, 80.97%, 83.59%, 91.03%, and 95.45% under the 8-, 12-, 16-, and 22-channel vocoder and non-vocoder conditions, respectively. The number of vocoder channels significantly affected WRS, with significant differences in all pairwise comparisons except between the 12- and 16-channel conditions (p<0.01). In the second experiment, participants scored 2.2%, 20.6%, and 50.6% under the high-, mid-, and low-interaction conditions, respectively, with statistically significant differences across all conditions (p<0.01). Conclusions While the number of channels had a notable impact on WRS, certain conditions (12 vs. 16 channels) did not differ significantly, and the observed differences in WRS were eclipsed by the pronounced effects of channel interaction, which differed significantly across all conditions. These findings underscore the paramount importance of prioritizing channel interaction in signal processing and CI fitting.
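A noise-excited channel vocoder of the kind used in CI simulation can be sketched as follows. The filter layout (RBJ band-pass biquads, rectify-and-smooth envelope extraction, band-limited noise carriers) is a common textbook design, and the channel center frequencies are illustrative assumptions, not the parameters of this study:

```python
import math
import random

SR = 16000

def biquad_bandpass(x, fc, bw_oct=1.0, sr=SR):
    # RBJ Audio-EQ-Cookbook band-pass (constant 0 dB peak gain)
    w = 2 * math.pi * fc / sr
    alpha = math.sin(w) * math.sinh(math.log(2) / 2 * bw_oct * w / math.sin(w))
    b0, b2 = alpha, -alpha                     # b1 is zero for this design
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w), 1 - alpha
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for v in x:
        out = (b0 * v + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(out)
        x2, x1, y2, y1 = x1, v, y1, out
    return y

def envelope(x, cutoff=30.0, sr=SR):
    # full-wave rectification followed by a one-pole low-pass smoother
    a = math.exp(-2 * math.pi * cutoff / sr)
    out, state = [], 0.0
    for v in x:
        state = a * state + (1 - a) * abs(v)
        out.append(state)
    return out

def noise_vocoder(x, centers, sr=SR):
    # per channel: band-pass the input, extract its envelope,
    # and impose that envelope on a band-limited noise carrier
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in x]
    out = [0.0] * len(x)
    for fc in centers:
        env = envelope(biquad_bandpass(x, fc, sr=sr), sr=sr)
        carrier = biquad_bandpass(noise, fc, sr=sr)
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += e * c
    return out

# vocode a 500 Hz tone through four illustrative channels
sig = [0.3 * math.sin(2 * math.pi * 500.0 * t / SR) for t in range(1600)]
voc = noise_vocoder(sig, [250.0, 500.0, 1000.0, 2000.0])
```

Channel interaction can be modeled in this framework by widening the carrier filters (so neighboring channels overlap), which is one plausible reading of the low/middle/high interaction conditions.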
4. Comparison between the acoustic fundamental frequency of the voice and the vibration frequency of the vocal folds analyzed by digital kymography. Codas 2023; 35:e20220173. PMID: 37909493. PMCID: PMC10702710. DOI: 10.1590/2317-1782/20232022173pt. Received 06/29/2022; accepted 11/07/2022. Open access.
Abstract
PURPOSE To compare the frequency of vocal fold opening variation, analyzed by digital kymography, with the fundamental voice frequency obtained by acoustic analysis, in individuals without laryngeal alteration. METHODS Observational analytical cross-sectional study. The participants were 48 women and 38 men aged 18 to 55 years. The evaluation comprised acoustic voice analysis (habitual emission of the vowel /a/ for 3 seconds and of the days of the week) and digital kymography (DKG; habitual emission of the vowels /i/ and /ɛ/). The measurements analyzed were the acoustic fundamental frequency (f0), extracted with the Computerized Speech Lab (CSL) program, and the dominant frequency of the variation of right (R-freq) and left (L-freq) vocal fold opening, obtained with the KIPS image-processing program. Mounting the kymograms consisted of manually demarcating the region with vertical lines delimiting width and horizontal lines separating the posterior, middle, and anterior thirds of the rima glottidis. In the statistical analysis, the Anderson-Darling test was used to verify the normality of the sample, the ANOVA and Tukey tests were performed to compare measurements between groups, and the Mann-Whitney test was used to compare age between groups. RESULTS No differences were found between the frequency measurements analyzed by digital kymography and the acoustic fundamental frequency in individuals without laryngeal alteration. CONCLUSION The dominant frequency of vocal fold opening variation, as assessed by digital kymography, and the acoustic fundamental frequency of the voice are similar, allowing comparison between these measurements in the multidimensional evaluation of the voice in individuals without laryngeal alteration.
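The "dominant frequency" of a kymographic opening waveform can be extracted the same way as an acoustic f0: locate the strongest spectral component. A minimal sketch on a synthetic 180 Hz glottal-area signal (naive DFT peak picking; not the KIPS program's actual method):

```python
import math

def dominant_freq(x, sr):
    """Frequency of the strongest positive-frequency DFT bin (naive O(n^2) DFT)."""
    n = len(x)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_mag, best_k = mag, k
    return best_k * sr / n

# synthetic 1 s glottal-area trace opening and closing 180 times per second
SR_KYMO = 500  # kymographic line rate in Hz (illustrative)
area = [0.5 + 0.5 * math.sin(2 * math.pi * 180.0 * t / SR_KYMO) for t in range(SR_KYMO)]
dkg_freq = dominant_freq(area, SR_KYMO)
```

Comparing `dkg_freq` against an acoustic f0 estimate from the same phonation mirrors the comparison the study performs between R-freq/L-freq and f0.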
5. Deep Learning of Speech Data for Early Detection of Alzheimer's Disease in the Elderly. Bioengineering (Basel) 2023; 10:1093. PMID: 37760195. PMCID: PMC10525115. DOI: 10.3390/bioengineering10091093. Received 06/04/2023; revised 08/16/2023; accepted 08/24/2023. Open access.
Abstract
BACKGROUND Alzheimer's disease (AD) is the most common form of dementia and makes the lives of patients and their families difficult for various reasons. Early detection of AD is therefore crucial to alleviating symptoms through medication and treatment. OBJECTIVE Given that AD strongly induces language disorders, this study aims to detect AD rapidly by analyzing language characteristics. MATERIALS AND METHODS The mini-mental state examination for dementia screening (MMSE-DS), the instrument most commonly used in South Korean public health centers, was used to elicit answers to the questionnaire. Among the acquired voice recordings, significant questionnaire items and answers were selected and converted into mel-frequency cepstral coefficient (MFCC)-based spectrogram images. After accumulating the significant answers, data augmentation was applied and validated using the Densenet121 model. Five deep learning models (Inception v3, VGG19, Xception, Resnet50, and Densenet121) were trained and evaluated. RESULTS Given the amount of data, the five-fold cross-validation results are more reliable than those of the hold-out method. Densenet121 achieved a sensitivity of 0.9550, a specificity of 0.8333, and an accuracy of 0.9000 in five-fold cross-validation separating AD patients from the control group. CONCLUSIONS The potential for remote health care can be increased by simplifying the AD screening process. Furthermore, by facilitating remote health care, the proposed method can enhance the accessibility of AD screening and increase the rate of early AD detection.
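The mel scale underlying MFCC spectrograms maps frequency onto an auditory-motivated axis before filterbank analysis. A minimal sketch of the standard HTK-style mel conversion and triangular-filter center placement (generic textbook formulas, not the authors' exact front end):

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centers(n_filters, fmin, fmax):
    # centers equally spaced on the mel scale, as used to place the
    # triangular filterbank behind MFCC/mel-spectrogram features
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# 26 filters from 0 to 8 kHz: centers crowd the low frequencies,
# mirroring the ear's finer resolution there
centers = mel_filter_centers(26, 0.0, 8000.0)
```

A full MFCC pipeline would additionally apply a windowed FFT, the triangular filters at these centers, a log, and a discrete cosine transform.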
6. Harnessing acoustic speech parameters to decipher amyloid status in individuals with mild cognitive impairment. Front Neurosci 2023; 17:1221401. PMID: 37746151. PMCID: PMC10512723. DOI: 10.3389/fnins.2023.1221401. Received 05/12/2023; accepted 08/08/2023. Open access.
Abstract
Alzheimer's disease (AD) is a neurodegenerative condition characterized by a gradual decline in cognitive functions. Currently, there are no effective treatments for AD, underscoring the importance of identifying individuals in the preclinical stages of mild cognitive impairment (MCI) to enable early interventions. Among the neuropathological events associated with the onset of the disease is the accumulation of amyloid protein in the brain, which correlates with decreased levels of Aβ42 peptide in the cerebrospinal fluid (CSF). Consequently, the development of non-invasive, low-cost, and easy-to-administer proxies for detecting Aβ42 positivity in CSF becomes particularly valuable. A promising approach to achieve this is spontaneous speech analysis, which, combined with machine learning (ML) techniques, has proven highly useful in AD. In this study, we examined the relationship between amyloid status in CSF and acoustic features derived from the description of the Cookie Theft picture in MCI patients from a memory clinic. The cohort consisted of fifty-two patients with MCI (mean age 73 years, 65% female, and 57% positive amyloid status). Eighty-eight acoustic parameters were extracted from voice recordings using the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), and several ML models were used to classify the amyloid status. Furthermore, interpretability techniques were employed to examine the influence of input variables on the determination of amyloid-positive status. The best model, based on acoustic variables, achieved an accuracy of 75% with an area under the curve (AUC) of 0.79 in the prediction of amyloid status evaluated by bootstrapping and Leave-One-Out Cross Validation (LOOCV), outperforming conventional neuropsychological tests (AUC = 0.66). Our results showed that the automated analysis of voice recordings derived from spontaneous speech tests offers valuable insights into AD biomarkers during the preclinical stages.
These findings introduce novel possibilities for the use of digital biomarkers to identify subjects at high risk of developing AD.
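Leave-one-out cross-validation of the kind used to evaluate the amyloid-status classifiers can be sketched with a toy nearest-centroid model; the one-dimensional features and labels below are made-up illustrations, not the study's eGeMAPS data or its actual (more complex) models:

```python
def nearest_centroid_loocv(features, labels):
    """Leave-one-out cross-validation with a nearest-centroid classifier."""
    correct = 0
    for i in range(len(features)):
        # recompute class centroids without the held-out sample
        sums, counts = {}, {}
        for j, (x, y) in enumerate(zip(features, labels)):
            if j == i:
                continue
            counts[y] = counts.get(y, 0) + 1
            sums.setdefault(y, [0.0] * len(x))
            sums[y] = [s + v for s, v in zip(sums[y], x)]
        centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(features[i], centroids[c])),
        )
        correct += pred == labels[i]
    return correct / len(features)

# made-up, well-separated "acoustic features" for two amyloid-status classes
feats = [[0.10], [0.20], [0.15], [0.05], [0.90], [1.00], [0.95], [1.10]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
acc = nearest_centroid_loocv(feats, labels)
```

The key property LOOCV buys on a small cohort (n = 52 here) is that every sample is scored by a model that never saw it, at the cost of n model fits.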
7. Effect of clear speech on acoustic measures of dysprosody in Parkinson disease for different reading tasks. Int J Speech Lang Pathol 2023:1-14. PMID: 37668056. DOI: 10.1080/17549507.2023.2240041.
Abstract
PURPOSE The purpose of the study was to determine the effect of clear speech instruction on acoustic measures of dysprosody between reading passages of differing linguistic content for speakers with and without Parkinson disease (PD). METHOD Ten speakers with PD and 10 controls read five simple and three standard reading stimuli twice: first habitually, and then following clear speech instruction. Acoustic measures of fundamental frequency variation (semitone standard deviation, STSD), articulation rate, and between-complex pause durations were calculated. RESULT Speakers with PD exhibited less fundamental frequency variation than controls across reading stimuli and instructions. All speakers exhibited lower STSD and longer between-complex pause durations for the standard compared with the simple reading stimuli. For clear speech, all speakers reduced articulation rate and increased between-complex pause durations in both simple and standard reading stimuli. However, speakers with PD exhibited a significantly less robust reduction in articulation rate for clear speech than control speakers for all reading stimuli. CONCLUSION The linguistic content of reading stimuli contributes to differences in fundamental frequency variation and pause duration for all speakers. All speakers reduced articulation rate for clear speech compared with habitual instruction, but speakers with PD did so to a lesser extent than controls. The linguistic content of reading stimuli used to examine dysprosody in PD should therefore be considered in clinical application.
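Semitone standard deviation (STSD) is computed by converting an f0 contour from Hz to semitones and taking the standard deviation. A minimal sketch (population SD; the reference frequency convention, here the first f0 value by default, is an assumption, as studies vary in their choice of reference):

```python
import math

def stsd(f0_values, ref=None):
    """Semitone standard deviation of an f0 contour given in Hz.

    Each f0 is converted to semitones relative to a reference
    (the first value by default); the population SD is returned.
    """
    ref = ref or f0_values[0]
    st = [12.0 * math.log2(f / ref) for f in f0_values]
    mean = sum(st) / len(st)
    return math.sqrt(sum((s - mean) ** 2 for s in st) / len(st))

# made-up contour: a monotone speaker yields a small STSD
monotone = stsd([118.0, 120.0, 119.0, 121.0])
varied = stsd([100.0, 140.0, 110.0, 160.0])
```

Because the semitone scale is logarithmic, STSD is comparable across speakers with different habitual pitch levels, which is why it is preferred over the raw SD of f0 in Hz.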
8. Consonant articulation acoustics and intelligibility in Swedish speakers with Parkinson's disease: a pilot study. Clin Linguist Phon 2023; 37:845-865. PMID: 35833475. DOI: 10.1080/02699206.2022.2095926. Received 08/09/2021; revised 05/16/2022; accepted 06/22/2022.
Abstract
Imprecise consonant articulation is common in speakers with Parkinson's disease and can affect intelligibility. The research on the relationship between acoustic speech measures and intelligibility in Parkinson's disease is limited, and most of the research has been conducted on English. This pilot study investigated aspects of consonant articulation acoustics in eleven Swedish speakers with Parkinson's disease and six neurologically healthy persons. The focus of the study was on consonant cluster production, articulatory motion rate and variation, and voice onset time, and how these acoustic features correlate with speech intelligibility. Among the measures in the present study, typicality ratings of heterorganic consonant clusters /spr/ and /skr/ had the strongest correlations with intelligibility. Measures based on syllable repetition, such as repetition rate and voice onset time, showed varying results with weak to moderate correlations with intelligibility. One conclusion is that some acoustic measures may be more sensitive than others to the impact of the underlying sensory-motor impairment and dysarthria on speech production and intelligibility in speakers with Parkinson's disease. Some aspects of articulation appear to be equally demanding in terms of acoustic realisation for elderly healthy speakers and for speakers with Parkinson's disease, such as sequential motion rate measures. Clinically, this would imply that for the purpose of detecting signs of disordered speech motor control, choosing measures with less variation among older speakers without articulation impairment would lead to more robust results.
9. Quantifying articulatory impairments in neurodegenerative motor diseases: A scoping review and meta-analysis of interpretable acoustic features. Int J Speech Lang Pathol 2023; 25:486-499. PMID: 36001500. PMCID: PMC9950294. DOI: 10.1080/17549507.2022.2089234.
Abstract
PURPOSE Neurodegenerative motor diseases (NMDs) have devastating effects on the lives of patients and their loved ones, in part due to the impact of neurologic abnormalities on speech, which significantly limits functional communication. Clinical speech researchers have thus spent decades investigating speech features in populations suffering from NMDs. Features of impaired articulatory function are of particular interest given their detrimental impact on intelligibility, their ability to encode a variety of distinct movement disorders, and their potential as diagnostic indicators of neurodegenerative diseases. The objectives of this scoping review were to identify (1) which components of articulation (i.e. coordination, consistency, speed, precision, and repetition rate) are the most represented in the acoustic literature on NMDs; (2) which acoustic articulatory features demonstrate the most potential for detecting speech motor dysfunction in NMDs; and (3) which articulatory components are the most impaired within each NMD. METHOD This review examined literature published between 1976 and 2020. Studies were identified from six electronic databases using predefined key search terms. The first research objective was addressed using a frequency count of studies investigating each articulatory component, while the second and third objectives were addressed using meta-analyses. RESULT Findings from 126 studies revealed a considerable emphasis on articulatory precision. Of the 24 features included in the meta-analyses, vowel dispersion/distance and stop gap duration exhibited the largest effects when comparing the NMD population to controls. The meta-analyses also revealed divergent patterns of articulatory performance across disease types, providing evidence of unique profiles of articulatory impairment. CONCLUSION This review illustrates the current state of the literature on acoustic articulatory features in NMDs. 
By highlighting the areas of need within each articulatory component and disease group, this work provides a foundation on which clinical researchers, speech scientists, neurologists, and computer science engineers can develop research questions that will both broaden and deepen the understanding of articulatory impairments in NMDs.
10. Articulatory speech measures can be related to the severity of multiple sclerosis. Front Neurol 2023; 14:1075736. PMID: 37384284. PMCID: PMC10294674. DOI: 10.3389/fneur.2023.1075736. Received 10/20/2022; accepted 05/11/2023. Open access.
Abstract
Background Dysarthria is one of the most frequent communication disorders in patients with multiple sclerosis (MS), with an estimated prevalence of around 50%. However, it is unclear whether there is a relationship between dysarthria and the severity or duration of the disease. Objective To describe the speech pattern in MS, correlate it with clinical data, and compare it with controls. Methods A group of MS patients (n = 73) was matched to healthy controls (n = 37) by sex and age. Individuals with neurological and/or systemic conditions that could interfere with speech were excluded. Clinical data for the MS group were obtained from medical records. The speech assessment consisted of auditory-perceptual and acoustic analysis of recordings of the following speech tasks: phonation and breathing (sustained vowel /a/); prosody (sentences with different intonation patterns); and articulation (diadochokinesis; spontaneous speech; the diphthong /iu/ produced repeatedly). Results In MS, 72.6% of the individuals presented mild dysarthria, with alterations in the speech subsystems of phonation, breathing, resonance, and articulation. In the acoustic analysis, individuals with MS were significantly worse than the control group (CG) on the standard deviation of the fundamental frequency (p = 0.001) and maximum phonation time (p = 0.041). In diadochokinesis, individuals with MS produced fewer syllables, shorter durations and phonation times, and more pauses per second, and in spontaneous speech they showed a higher number of pauses than the CG. Correlations were found between phonation time in spontaneous speech and the Expanded Disability Status Scale (EDSS) (r = -0.238, p = 0.043) and between phonation ratio in spontaneous speech and EDSS (r = -0.265, p = 0.023), indicating a correlation between the number of pauses during spontaneous speech and the severity of the disease.
Conclusion The speech profile in MS was mild dysarthria, with decline in the phonatory, respiratory, resonance, and articulatory subsystems, in order of prevalence. The increased number of pauses during speech and lower phonation ratios can reflect the severity of MS.
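Pause counts and phonation ratio of the kind correlated with EDSS here can be derived from a frame-level energy contour by thresholding. The sketch below uses made-up frame energies and an arbitrary threshold, not the study's actual segmentation procedure:

```python
def pause_stats(frame_energies, threshold):
    """Count pauses (runs of sub-threshold frames) and the phonation ratio.

    A pause is a maximal run of consecutive frames below `threshold`;
    the phonation ratio is the fraction of frames at or above it.
    """
    pauses, voiced, in_pause = 0, 0, False
    for e in frame_energies:
        if e < threshold:
            if not in_pause:
                pauses += 1     # a new pause run begins
                in_pause = True
        else:
            voiced += 1
            in_pause = False
    return pauses, voiced / len(frame_energies)

# made-up energy contour: two pauses, 5 of 8 frames phonated
energies = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0]
n_pauses, phon_ratio = pause_stats(energies, 0.5)
```

In practice a minimum pause duration (e.g., a few consecutive frames) would be imposed to avoid counting stop closures as pauses.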
11. The efficacy of memory load on speech-based detection of Alzheimer's disease. Front Aging Neurosci 2023; 15:1186786. PMID: 37333455. PMCID: PMC10272350. DOI: 10.3389/fnagi.2023.1186786. Received 03/15/2023; accepted 05/16/2023. Open access.
Abstract
Introduction This study tested whether an increase in memory load could improve the efficacy of Alzheimer's disease detection and prediction of the Mini-Mental State Examination (MMSE) score. Methods Speech from 45 mild-to-moderate Alzheimer's disease patients and 44 healthy older adults was collected using three speech tasks with varying memory loads. We investigated and compared speech characteristics of Alzheimer's disease across the tasks to examine the effect of memory load on speech characteristics. Finally, we built Alzheimer's disease classification models and MMSE prediction models to assess the diagnostic value of the speech tasks. Results Speech characteristics of Alzheimer's disease in pitch, loudness, and speech rate were observed, and the high-memory-load task intensified these characteristics. The high-memory-load task performed best, with an AD classification accuracy of 81.4% and an MMSE prediction mean absolute error of 4.62. Discussion The high-memory-load recall task is an effective method for speech-based Alzheimer's disease detection.
12. Staffs' physiological responses to irrelevant background speech and mental workload in open-plan bank office workspaces. Work 2023; 76:623-636. PMID: 36938764. DOI: 10.3233/wor-220502. Open access.
Abstract
BACKGROUND Acoustic comfort is one of the most critical challenges in open-plan workspaces. OBJECTIVE This study aimed to assess the effect of irrelevant background speech (IBS) and mental workload (MWL) on staff's physiological parameters in open-plan bank office workspaces. METHODS In this study, 109 male cashier staff of the banks were randomly selected. The 30-minute equivalent noise level (LAeq) was measured in three intervals: at the beginning (section A), middle (section B), and end (section C) of working hours. Heart rate (HR) and heart rate variability (HRV) measures, namely low-frequency (LF) power, high-frequency (HF) power, and the LF/HF ratio, were also recorded in sections A, B, and C. Staff additionally rated their MWL using the NASA Task Load Index. RESULTS The dominant frequency of the LAeq was 500 Hz, and the LAeq in the 250-2000 Hz range was higher than at other frequencies. The LAeq (500 Hz) was 55.82, 69.35, and 69.64 dB(A) in sections A, B, and C, respectively. The results show that IBS affects staff's physiological responses: as IBS increased, HF power decreased. Moreover, under higher MWL, increasing noise exposure, especially IBS, produced larger increases in LF power and the LF/HF ratio. CONCLUSION IBS appears to affect physiological responses and increase staff stress in open-plan bank office workspaces, and mental workload can intensify these consequences.
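LF and HF power are conventionally defined as the spectral power of the (evenly resampled) RR-interval series in the 0.04-0.15 Hz and 0.15-0.4 Hz bands, respectively. A minimal sketch on a synthetic RR series (naive DFT band power; real HRV analysis would interpolate the raw RR tachogram and typically use Welch's method):

```python
import math

def band_power(x, sr, f_lo, f_hi):
    # naive DFT power summed over bins with f_lo < f <= f_hi, excluding DC
    n = len(x)
    power = 0.0
    for k in range(1, n // 2):
        f = k * sr / n
        if f_lo < f <= f_hi:
            re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            power += re * re + im * im
    return power

def lf_hf_ratio(rr, sr=4.0):
    # standard HRV bands: LF 0.04-0.15 Hz, HF 0.15-0.4 Hz
    return band_power(rr, sr, 0.04, 0.15) / band_power(rr, sr, 0.15, 0.4)

# synthetic 64 s RR series at 4 Hz: an LF oscillation twice the
# amplitude of the HF oscillation, so the power ratio is 4
n, sr = 256, 4.0
rr = [0.8
      + 0.010 * math.sin(2 * math.pi * 0.09375 * t / sr)
      + 0.005 * math.sin(2 * math.pi * 0.3125 * t / sr)
      for t in range(n)]
ratio = lf_hf_ratio(rr, sr)
```

A rising LF/HF ratio under noise and workload, as reported here, is usually read as a shift toward sympathetic dominance, though that interpretation is debated in the HRV literature.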
13. Acoustic Measurements of Speech and Voice in Men with Angle Class II, Division 1, Malocclusion. Int Arch Otorhinolaryngol 2022; 27:e10-e15. PMID: 36714887. PMCID: PMC9879633. DOI: 10.1055/s-0041-1730428. Received 10/14/2019; accepted 03/03/2021. Open access.
Abstract
Introduction The acoustic analysis of speech (measurements of the fundamental frequency and formant frequencies) of different vowels produced by speakers with Angle class II, division 1, malocclusion can provide information about the relationship between articulatory and phonatory mechanisms in this type of maxillomandibular disproportion. Objectives To investigate acoustic measurements related to the fundamental frequency (F0) and formant frequencies (F1 and F2) of the oral vowels of Brazilian Portuguese (BP) produced by male speakers with Angle class II, division 1, malocclusion (study group) and compare them with men with Angle class I malocclusion (control group). Methods In total, 60 men (20 with class II, 40 with class I) aged between 18 and 40 years were included in the study. Measurements of F0, F1, and F2 of the seven oral vowels of BP were estimated from audio samples containing repetitions of carrier sentences. Statistical analysis used the Student t-test, and effect sizes were calculated. Results Significant differences were detected in F0 for five vowels ([e], [i], [ᴐ], [o], and [u]), and in F1 for the vowels [a] and [ᴐ], with higher values for class II, division 1. Conclusion Statistical differences were found in the F0 measurements, with higher values in five of the seven vowels analysed in subjects with Angle class II, division 1. The formant frequencies differed only in F1, in two vowels, with higher values in the study group. The findings suggest that voice and speech production data should be included in the assessment protocol for patients with malocclusion.
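Formant frequencies such as F1 and F2 are typically estimated with linear predictive coding (LPC). A minimal sketch on a synthetic two-resonance signal (autocorrelation LPC with spectral peak picking; illustrative only, not the authors' measurement procedure, and the resonance frequencies and pole radius are made-up parameters):

```python
import math

SR = 8000

def resonator(x, f, r, sr=SR):
    # two-pole resonance at frequency f with pole radius r
    a1, a2 = 2 * r * math.cos(2 * math.pi * f / sr), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for v in x:
        out = v + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

def lpc(x, order):
    # autocorrelation method with the Levinson-Durbin recursion
    n = len(x)
    r = [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(order + 1)]
    a, e = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / e
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a, e = new_a, e * (1 - k * k)
    return a

def formant_peaks(a, sr=SR, step=5.0):
    # local maxima of the all-pole spectral envelope 1/|A(e^jw)|
    gains, f = [], 0.0
    while f <= sr / 2:
        w = 2 * math.pi * f / sr
        re = sum(c * math.cos(w * j) for j, c in enumerate(a))
        im = -sum(c * math.sin(w * j) for j, c in enumerate(a))
        gains.append((f, 1.0 / math.hypot(re, im)))
        f += step
    return [gains[i][0] for i in range(1, len(gains) - 1)
            if gains[i][1] > gains[i - 1][1] and gains[i][1] > gains[i + 1][1]]

# synthetic vowel-like impulse response with resonances near 500 and 1500 Hz
impulse = [1.0] + [0.0] * 399
signal = resonator(resonator(impulse, 500.0, 0.95), 1500.0, 0.95)
peaks = formant_peaks(lpc(signal, 4))
f1, f2 = peaks[0], peaks[1]
```

On real vowels the LPC order is chosen higher (roughly 2 + sampling rate in kHz) and the analysis is done on pre-emphasized, windowed frames.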
14. Leg movements affect speech intensity. J Neurophysiol 2022; 128:1106-1116. PMID: 36130171. PMCID: PMC9621708. DOI: 10.1152/jn.00282.2022. Open access.
Abstract
Coordination between speech acoustics and manual gestures has been conceived as “not biologically mandated” (McClave E. J Psycholinguist Res 27(1): 69–89, 1998). However, recent work suggests a biomechanical entanglement between the upper limbs and the respiratory-vocal system (Pouw W, de Jonge-Hoekstra D, Harrison SJ, Paxton A, Dixon JA. Ann NY Acad Sci 1491(1): 89–105, 2021). Pouw et al. found that for movements with a high physical impulse, speech acoustics co-occur with the physical impulses of upper limb movements. They interpret this result in terms of biomechanical coupling between arm motion and speech via the breathing system. This coupling could support the synchrony observed between speech prosody and arm gestures during communication. The present study investigates whether the effect of physical impulse on speech acoustics can be extended to leg motion, assumed to be controlled independently from oral communication. The study involved 25 native speakers of German who recalled short stories while biking with their arms or their legs. These conditions were compared with a static condition in which participants could not move their arms. Our analyses are similar to those of Pouw et al. (Pouw W, de Jonge-Hoekstra D, Harrison SJ, Paxton A, Dixon JA. Ann NY Acad Sci 1491(1): 89–105, 2021). Results reveal that intensity peaks in the acoustic signal co-occur with the peak acceleration of the legs’ biking movements. However, this was not observed when biking with the arms, which corresponded to lower acceleration peaks. In contrast to intensity, F0 was not affected in the arm and leg conditions. These results suggest that 1) the biomechanical entanglements between the respiratory-vocal system and the lower limbs may also impact speech; 2) the physical impulse may have to reach a threshold to impact speech acoustics.
NEW & NOTEWORTHY The link between speech and limb motion is an interdisciplinary challenge and a core issue in motor control and language research. Our research aims to disentangle the potential biomechanical links between lower limbs and the speech apparatus, by investigating the effect of leg movements on speech acoustics.
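The co-occurrence analysis described here (whether acoustic intensity peaks fall near limb-acceleration peaks) can be sketched as a windowed matching of peak times. The peak times and window below are made-up values, and the original study's statistical treatment was more elaborate:

```python
def cooccurrence_rate(intensity_peaks, accel_peaks, window=0.1):
    """Fraction of intensity peaks lying within +/- `window` seconds
    of some limb-acceleration peak (all times in seconds)."""
    hits = sum(
        1 for tp in intensity_peaks
        if any(abs(tp - ta) <= window for ta in accel_peaks)
    )
    return hits / len(intensity_peaks)

# made-up peak times: two of three intensity peaks land near a pedal stroke
accel_times = [1.00, 2.00, 3.00]
intensity_times = [1.05, 2.40, 3.02]
rate = cooccurrence_rate(intensity_times, accel_times)
```

A permutation test (shuffling one peak train and recomputing the rate) would then establish whether the observed co-occurrence exceeds chance.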
|
15
|
A generalizable speech emotion recognition model reveals depression and remission. Acta Psychiatr Scand 2022; 145:186-199. [PMID: 34850386 DOI: 10.1111/acps.13388] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/01/2021] [Revised: 11/24/2021] [Accepted: 11/25/2021] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Affective disorders are associated with atypical voice patterns; however, automated voice analyses suffer from small sample sizes and untested generalizability on external data. We investigated a generalizable approach to aid clinical evaluation of depression and remission from voice using transfer learning: We train machine learning models on easily accessible non-clinical datasets and test them on novel clinical data in a different language. METHODS A Mixture of Experts machine learning model was trained to infer happy/sad emotional state using three publicly available emotional speech corpora in German and US English. We examined the model's predictive ability to classify the presence of depression in Danish-speaking healthy controls (N = 42), patients with first-episode major depressive disorder (MDD) (N = 40), and the subset of the same patients who entered remission (N = 25), based on recorded clinical interviews. The model was evaluated on raw, de-noised, and speaker-diarized data. RESULTS The model showed separation between healthy controls and depressed patients at the first visit, obtaining an AUC of 0.71. Further, speech from patients in remission was indistinguishable from that of the control group. Model predictions were stable throughout the interview, suggesting that 20-30 s of speech might be enough to accurately screen a patient. Background noise (but not speaker diarization) heavily impacted predictions. CONCLUSION A generalizable speech emotion recognition model can effectively reveal changes in speaker depressive states before and after remission in patients with MDD. Data collection settings and data cleaning are crucial when considering automated voice analysis for clinical purposes.
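The transfer-style evaluation can be sketched in miniature: train a classifier on one (non-clinical) dataset and compute an AUC on a separate (clinical) dataset. The study used a Mixture of Experts model on speech features; this sketch substitutes plain logistic regression on simulated features, so everything below is an assumption for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Stand-in "non-clinical" training data (happy vs. sad features) and a
# smaller "clinical" test set (42 controls vs. 40 patients, as in the
# abstract) with a weaker but related class separation.
X_train = np.vstack([rng.normal(0, 1, (200, 5)), rng.normal(1.5, 1, (200, 5))])
y_train = np.array([0] * 200 + [1] * 200)
X_test = np.vstack([rng.normal(0, 1, (42, 5)), rng.normal(0.8, 1, (40, 5))])
y_test = np.array([0] * 42 + [1] * 40)

# Train on the proxy task, evaluate discrimination (AUC) on the held-out
# "clinical" data -- the same logic as testing generalizability across
# datasets and languages.
model = LogisticRegression().fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```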
|
16
|
Screening major depressive disorder using vocal acoustic features in the elderly by sex. J Affect Disord 2021; 291:15-23. [PMID: 34022551 DOI: 10.1016/j.jad.2021.04.098] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/09/2020] [Revised: 01/12/2021] [Accepted: 04/25/2021] [Indexed: 10/21/2022]
Abstract
BACKGROUND Vocal acoustic features are potential biomarkers of depression in the elderly. Previous automated diagnostic tests for depression have employed unstandardized speech samples, and few studies have considered differences in voice reactivity. We aimed to develop a voice-based screening test for depression that measures vocal acoustic features of elderly Koreans while they read a series of mood-inducing sentences (MIS). METHODS In this case-control study, we recruited 61 individuals with major depressive disorder and 143 healthy controls (mean age [SD]: 72 [6]; female, 70%) from the community-dwelling elderly population. Participants were asked to read the MIS; the variation patterns of their acoustic features, represented by the correlation distance between two MIS, were analyzed as input features using a univariate feature selection technique and subsequently classified with AdaBoost. RESULTS Acoustic features showing significant discriminatory performance were spectral and energy-related features for males (sensitivity 0.95, specificity 0.88, and accuracy 0.86) and prosody-related features for females (sensitivity 0.73, specificity 0.86, and accuracy 0.77). The correlation distance between negative and positive MIS was significantly shorter in the depressed group than in the healthy control group (F = 18.574, P < 0.001). LIMITATIONS The small sample size and relatively homogeneous clinical profile of depression could limit generalizability. CONCLUSIONS While participants read MIS, spectral and energy-related acoustic features for males and prosody-related features for females were good discriminators of major depressive disorder. These features may be used as biomarkers of depression in the elderly.
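The study's key input, the correlation distance between acoustic feature vectors from two mood-inducing sentences, can be illustrated directly; the feature values below are invented for illustration:

```python
import numpy as np
from scipy.spatial.distance import correlation

# Illustrative acoustic feature vectors (e.g. spectral/energy features)
# from readings of a negative and a positive mood-inducing sentence.
# The correlation distance is 1 - Pearson r, so flatter affective
# modulation (more similar feature vectors across sentences) yields a
# shorter distance, as reported for the depressed group.
neg_features = np.array([0.8, 1.2, 0.5, 2.0, 1.1])
pos_features_control = np.array([2.0, 0.4, 1.6, 0.3, 0.9])    # distinct reading
pos_features_depressed = np.array([0.9, 1.1, 0.6, 1.8, 1.0])  # similar reading

d_control = correlation(neg_features, pos_features_control)
d_depressed = correlation(neg_features, pos_features_depressed)
```

In the study such distances (computed per feature set) served as classifier inputs rather than being interpreted on their own.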
|
17
|
Acoustic Speech Analytics Are Predictive of Cerebellar Dysfunction in Multiple Sclerosis. THE CEREBELLUM 2021; 19:691-700. [PMID: 32556973 DOI: 10.1007/s12311-020-01151-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Speech production relies on motor control and cognitive processing and is linked to cerebellar function. In diseases where the cerebellum is impaired, such as multiple sclerosis (MS), speech abnormalities are common and can be detected by instrumental assessments. However, the potential of speech assessments to be used to monitor cerebellar impairment in MS remains unexplored. The aim of this study is to build an objectively measured speech score that reflects cerebellar function, pathology and quality of life in MS. Eighty-five people with MS and 21 controls participated in the study. Speech was independently assessed through objective acoustic analysis and blind expert listener ratings. Cerebellar function and overall disease disability were measured through validated clinical scores; cerebellar pathology was assessed via magnetic resonance imaging, and validated questionnaires informed quality of life. Selected speech variables were entered in a regression model to predict cerebellar function. The resulting model was condensed into one composite speech score and tested for prediction of abnormal 9-hole peg test (9HPT), and for correlations with the remaining cerebellar scores, imaging measurements and self-assessed quality of life. Slow rate of syllable repetition and increased free speech pause percentage were the strongest predictors of cerebellar impairment, complemented by phonatory instability. Those variables formed the acoustic composite score that accounted for 54% of variation in cerebellar function, correlated with cerebellar white matter volume (r = 0.3, p = 0.017), quality of life (r = 0.5, p < 0.001) and predicted an abnormal 9HPT with 85% accuracy. An objective multi-feature speech metric was highly representative of motor cerebellar impairment in MS.
|
18
|
Gesture-speech physics in fluent speech and rhythmic upper limb movements. Ann N Y Acad Sci 2021; 1491:89-105. [PMID: 33336809 PMCID: PMC8246948 DOI: 10.1111/nyas.14532] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2020] [Revised: 10/15/2020] [Accepted: 10/23/2020] [Indexed: 12/18/2022]
Abstract
It is commonly understood that hand gesture and speech coordination in humans is culturally and cognitively acquired, rather than having a biological basis. Recently, however, the biomechanical coupling of arm movements to speech vocalization has been studied in steady-state vocalization and monosyllabic utterances, where forces produced during gesturing are transferred onto the tensioned body, leading to changes in respiratory-related activity and thereby affecting vocalization F0 and intensity. In the current experiment (n = 37), we extend this previous line of work to show that gesture-speech physics also impacts fluent speech. Compared with a no-movement condition, participants producing fluent self-formulated speech while rhythmically moving their limbs demonstrate heightened F0 and amplitude envelope, and such effects are more pronounced for higher-impulse arm movements than for lower-impulse wrist movements. We replicate that acoustic peaks arise especially during moments of peak impulse (i.e., the beat) of the movement, namely around its deceleration phases. Finally, higher deceleration rates of higher-mass arm movements were related to higher peaks in acoustics. These results confirm a role for physical impulses of gesture affecting the speech system. We discuss the implications of gesture-speech physics for understanding the emergence of communicative gesture, both ontogenetically and phylogenetically.
|
19
|
Neuromuscular and biomechanical adjustments of the speech mechanism during modulation of vocal loudness in children with cerebral palsy and dysarthria. Neurocase 2021; 27:30-38. [PMID: 33347384 DOI: 10.1080/13554794.2020.1862240] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
Children with cerebral palsy (CP) are characterized as difficult to understand because of poor articulation and breathy voice quality. This case series describes the subsystems of the speech mechanism (i.e., respiratory, laryngeal, oroarticulatory) in four children with CP and four matched typically developing children (TDC) during the modulation of vocal loudness. TDC used biomechanically efficient strategies among speech subsystems to increase vocal loudness. Children with CP made fewer breathing adjustments but recruited greater chest wall muscle activity and neuromuscular drive for louder productions. These results inform future clinical research and identify speech treatment targets for children with motor speech disorders.
|
20
|
Association between suicidal ideation and acoustic parameters of university students' voice and speech: a pilot study. LOGOP PHONIATR VOCO 2020; 46:55-62. [PMID: 32138570 DOI: 10.1080/14015439.2020.1733075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
PURPOSE Suicide is a worldwide public health problem; although rates display downward trends in several areas of the world, they have increased in many countries. Early and dynamic evaluation is one element that contributes to its prevention. The objective of this study was therefore to determine the association between acoustic parameters of voice and speech (F0, F1, F2, F3, dB, and jitter) and suicidal ideation among university students from the city of Temuco, Chile. METHODS A cross-sectional study was conducted with a non-probabilistic sample of sixty 18- and 19-year-old adolescents from Temuco, who underwent an acoustic evaluation of their voice and speech after completing a test for suicidal ideation. Data were analyzed with IBM SPSS version 23.0 (IBM SPSS Statistics, Armonk, NY) using exploratory, descriptive, and inferential statistics appropriate to the variables' levels of measurement and types of distribution. RESULTS Thirty percent of the adolescents, of both sexes, displayed suicidal ideation. Among the acoustic measures, fundamental frequency (F0), the formants F1 and F2, and jitter were most strongly associated with the presence of suicidal ideation in both women and men (p < .05); the characteristics of F3 were associated with suicidal ideation only in men (p < .05). CONCLUSIONS The acoustic parameters of voice and speech differ in adolescents with suicidal ideation, and they may therefore represent a useful tool in suicide risk assessment.
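Of the acoustic parameters listed, jitter is the least self-explanatory. A minimal sketch of local jitter (mean cycle-to-cycle period perturbation relative to the mean period), using made-up glottal period durations:

```python
import numpy as np

# Illustrative jitter (local) computation from successive glottal period
# durations in seconds.  A perfectly steady voice has jitter 0; cycle-to-
# cycle perturbation raises it.  The period values are invented.
periods = np.array([0.0100, 0.0102, 0.0099, 0.0101, 0.0100, 0.0103])

mean_period = periods.mean()
jitter_local = np.abs(np.diff(periods)).mean() / mean_period  # often given as %
f0_mean = 1.0 / mean_period  # mean fundamental frequency in Hz
```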
|
21
|
Effects of Vocal Training on Students' Voices in a Professional Drama School. OTO Open 2019; 3:2473974X19866384. [PMID: 31428732 PMCID: PMC6684147 DOI: 10.1177/2473974x19866384] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2019] [Accepted: 07/09/2019] [Indexed: 11/16/2022] Open
Abstract
Objective The purpose of this study was to investigate the effect of vocal training on acoustic and aerodynamic characteristics of student actors’ voices. Study Design Prospective cohort study. Setting Tertiary medical facility speech and swallow center. Subjects and Methods Acoustic, aerodynamic, and Voice Handicap Index–10 measures were collected from 14 first-year graduate-level drama students before and after a standard vocal training program and analyzed for changes over time. Results Among the aerodynamic measures that were collected, mean expiratory airflow was significantly reduced after vocal training. Among the acoustic measures that were collected, mean fundamental frequency was significantly increased after vocal training. On average, Voice Handicap Index–10 scores were unchanged after vocal training. Conclusion The cohort of drama students undergoing vocal training demonstrated improvements in voice aerodynamics, which indicate enhanced glottal efficiency after training. The present study also found an increased average fundamental frequency among the actors during sustained voicing and no changes in jitter and shimmer despite frequent performance.
|
22
|
Abstract
In both practicing audiology and speech language pathology, as well as in speech and hearing science research, the space where the work is done is an integral part of the function. Hence, for all of these endeavors it can be important to measure the acoustics of a room. This article provides a tutorial regarding the measurement of room reverberation and background noise, both of which are important when evaluating a space's strengths and limitations for speech communication. As the privacy of patients and research participants is a primary concern, the tutorial also describes a method for measuring the amount of acoustical insulation provided by a room's barriers. Several room measurement data sets - all obtained from the assessment of clinical and research spaces within our own department - are presented here as examples.
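One standard way to measure room reverberation of the kind this tutorial covers is Schroeder backward integration of a measured impulse response. The sketch below uses a synthetic exponential "impulse response" with a known RT60 of 0.5 s (an assumption for illustration, not a measurement from the article):

```python
import numpy as np

# Schroeder backward integration: square the impulse response, integrate
# from the tail, convert to dB, fit the -5 to -25 dB decay (T20), and
# extrapolate to a 60 dB decay.  The synthetic response decays 60 dB in
# energy over 0.5 s, so the estimate should recover roughly 0.5 s.
fs = 8000
rt60_true = 0.5
t = np.arange(0, 1.0, 1 / fs)
h = np.exp(-3 * np.log(10) * t / rt60_true)  # energy falls 60 dB in 0.5 s

edc = np.cumsum(h[::-1] ** 2)[::-1]          # energy decay curve
edc_db = 10 * np.log10(edc / edc[0])

# Linear fit over the -5 to -25 dB portion, extrapolated to -60 dB
mask = (edc_db <= -5) & (edc_db >= -25)
slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second
rt60_est = -60 / slope
```

With a real measurement, `h` would be a recorded impulse response (or one derived from a sweep), typically analyzed per octave band.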
|
23
|
Modelling category goodness judgments in children with residual sound errors. CLINICAL LINGUISTICS & PHONETICS 2018; 33:295-315. [PMID: 29792525 PMCID: PMC6733520 DOI: 10.1080/02699206.2018.1477834] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/23/2018] [Revised: 05/12/2018] [Accepted: 05/14/2018] [Indexed: 05/29/2023]
Abstract
This study investigates category goodness judgments of /r/ in adults and children with and without residual speech errors (RSEs) using natural speech stimuli. Thirty adults, 38 children with RSE (ages 7-16) and 35 age-matched typically developing (TD) children provided category goodness judgments on whole words, recorded from 27 child speakers, with /r/ in various phonetic environments. The salient acoustic property of /r/ - the lowered third formant (F3) - was normalized in two ways. A logistic mixed-effect model quantified the relationships between listeners' responses and the third formant frequency, vowel context and clinical group status. Goodness judgments from the adult group showed a statistically significant interaction with the F3 parameter when compared to both child groups (p < 0.001) using both normalization methods. The RSE group did not differ significantly from the TD group in judgments of /r/. All listeners were significantly more likely to judge /r/ as correct in a front-vowel context. Our results suggest that normalized /r/ F3 is a statistically significant predictor of category goodness judgments for both adults and children, but children do not appear to make adult-like judgments. Category goodness judgments do not have a clear relationship with /r/ production abilities in children with RSE. These findings may have implications for clinical activities that include category goodness judgments in natural speech, especially for recorded productions.
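The fixed-effect core of such a listener model can be sketched with plain logistic regression on simulated data. The study's actual model was a logistic mixed-effects model with listener, group, and vowel-context terms; those are omitted here, and the data and the true slope of -2 are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Simulated normalized third-formant (F3) values for /r/ tokens: lower F3
# (a more /r/-like token) is more likely to be judged "correct".
f3_norm = rng.normal(0, 1, 500)
p_correct = 1 / (1 + np.exp(2.0 * f3_norm))  # true logit slope of -2 on F3
judged_correct = rng.random(500) < p_correct

# Fit the listener-judgment model; a negative coefficient reproduces the
# direction of the reported F3 effect.
model = LogisticRegression().fit(f3_norm.reshape(-1, 1), judged_correct)
f3_slope = model.coef_[0][0]
```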
|
24
|
Articulatory-acoustic vowel space: Associations between acoustic and perceptual measures of clear speech. INTERNATIONAL JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2017; 19:184-194. [PMID: 27328115 DOI: 10.1080/17549507.2016.1193897] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/09/2015] [Accepted: 04/23/2016] [Indexed: 06/06/2023]
Abstract
PURPOSE The current investigation examined the relationship between perceptual ratings of speech clarity and acoustic measures of speech production. Included among the acoustic measures was the Articulatory-Acoustic Vowel Space (AAVS), which provides a measure of working formant space derived from continuously sampled formant trajectories in connected speech. METHOD Acoustic measures of articulation and listener ratings of speech clarity were obtained from habitual and clear speech samples produced by 10 neurologically healthy adults. Perceptual ratings of speech clarity were obtained from visual-analogue scale ratings and acoustic measures included the AAVS measure, articulation rate and percentage pause. RESULT Clear speech was characterised by a higher perceptual clarity rating, slower articulation rate, greater percentage pause and larger AAVS compared to habitual speech. Additionally, correlation analysis revealed a significant relationship between the perceptual clear speech effect and the relative clarity-related change in the AAVS and articulation rate measures. CONCLUSION The current findings suggest that, along with speech rate measures, the recently introduced AAVS is sensitive to changes in speech clarity.
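As a rough illustration of a "working formant space" computed from sampled (F1, F2) points, the sketch below uses a convex-hull area. Note that this is not the published AAVS formulation (which summarizes trajectory variability); it only illustrates that a larger covered F1-F2 region yields a larger area, as reported for clear versus habitual speech. All formant values are invented:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical continuously sampled (F1, F2) points in Hz from connected
# speech in two conditions; clear speech covers a wider formant region.
habitual = np.array([[400, 1500], [500, 1200], [600, 1400], [480, 1380]])
clear = np.array([[300, 2200], [700, 1000], [800, 1800], [350, 1100]])

# For 2-D point sets, ConvexHull.volume is the enclosed area.
area_habitual = ConvexHull(habitual).volume
area_clear = ConvexHull(clear).volume
```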
|
25
|
Predicting speech intelligibility with a multiple speech subsystems approach in children with cerebral palsy. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2014; 57:1666-78. [PMID: 24824584 PMCID: PMC4192090 DOI: 10.1044/2014_jslhr-s-13-0292] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/23/2013] [Accepted: 05/08/2014] [Indexed: 05/06/2023]
Abstract
PURPOSE Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. METHOD Nine acoustic variables reflecting different subsystems, and speech intelligibility, were measured in 22 children with CP. These children included 13 with a clinical diagnosis of dysarthria (speech motor impairment [SMI] group) and 9 judged to be free of dysarthria (no SMI [NSMI] group). Data from children with CP were compared to data from age-matched typically developing children. RESULTS Multiple acoustic variables reflecting the articulatory subsystem were different in the SMI group, compared to the NSMI and typically developing groups. A significant speech intelligibility prediction model was obtained with all variables entered into the model (adjusted R2 = .801). The articulatory subsystem showed the most substantial independent contribution (58%) to speech intelligibility. Incremental R2 analyses revealed that any single variable explained less than 9% of speech intelligibility variability. CONCLUSIONS Children in the SMI group had articulatory subsystem problems as indexed by acoustic measures. As in the adult literature, the articulatory subsystem makes the primary contribution to speech intelligibility variance in dysarthria, with minimal or no contribution from other systems.
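The incremental-R² logic used here (fit the full model, refit with one predictor dropped, and take the fall in R² as that predictor's unique contribution) can be sketched on simulated data. Predictor names echo the subsystem framing, but the values are not the study's:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Simulated subsystem measures; intelligibility is driven mostly by the
# "articulatory" predictor, mirroring the pattern reported in the abstract.
n = 200
articulatory = rng.normal(size=n)
phonatory = rng.normal(size=n)
respiratory = rng.normal(size=n)
intelligibility = 3.0 * articulatory + 0.5 * phonatory + rng.normal(scale=0.5, size=n)

X_full = np.column_stack([articulatory, phonatory, respiratory])
r2_full = LinearRegression().fit(X_full, intelligibility).score(X_full, intelligibility)

X_drop = np.column_stack([phonatory, respiratory])  # drop articulatory
r2_drop = LinearRegression().fit(X_drop, intelligibility).score(X_drop, intelligibility)

incremental_r2_articulatory = r2_full - r2_drop  # unique contribution
```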
|
26
|
Color and texture associations in voice-induced synesthesia. Front Psychol 2013; 4:568. [PMID: 24032023 PMCID: PMC3759022 DOI: 10.3389/fpsyg.2013.00568] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2013] [Accepted: 08/09/2013] [Indexed: 11/17/2022] Open
Abstract
Voice-induced synesthesia, a form of synesthesia in which synesthetic perceptions are induced by the sounds of people's voices, appears to be relatively rare and has not been systematically studied. In this study we investigated the synesthetic color and visual texture perceptions experienced in response to different types of “voice quality” (e.g., nasal, whisper, falsetto). Experiences of three different groups—self-reported voice synesthetes, phoneticians, and controls—were compared using both qualitative and quantitative analysis in a study conducted online. Whilst, in the qualitative analysis, synesthetes used more color and texture terms to describe voices than either phoneticians or controls, only weak differences, and many similarities, between groups were found in the quantitative analysis. Notable consistent results between groups were the matching of higher speech fundamental frequencies with lighter and redder colors, the matching of “whispery” voices with smoke-like textures, and the matching of “harsh” and “creaky” voices with textures resembling dry cracked soil. These data are discussed in the light of current thinking about definitions and categorizations of synesthesia, especially in cases where individuals apparently have a range of different synesthetic inducers.
|
27
|
Basic parameters of articulatory movements and acoustics in individuals with Parkinson's disease. Mov Disord 2012; 27:843-50. [PMID: 22729986 PMCID: PMC3418799 DOI: 10.1002/mds.24888] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2011] [Revised: 11/15/2011] [Accepted: 11/25/2011] [Indexed: 11/09/2022] Open
Abstract
It has long been recognized that lesions of the basal ganglia frequently result in dysarthria, in part because many individuals with Parkinson's disease (PD) have impaired speech. Earlier studies of speech production in PD using perceptual, acoustic, and/or kinematic analyses have yielded mixed findings about the characteristics of articulatory movements underlying hypokinetic dysarthria associated with PD: in some cases reporting reduced articulatory output, and in other instances revealing orofacial movement parameters within the normal range. The central aim of this experiment was to address these inconsistencies by providing an integrative description of basic kinematic and acoustic parameters of speech production in individuals with PD. Recordings of lip and jaw movements and acoustic data were collected in 16 individuals with PD and 16 age- and sex-matched neurologically healthy older adults. Our results revealed a downscaling of articulatory dynamics in the individuals with PD, evidenced by decreased amplitude and velocity of lower lip and jaw movements, decreased vocal intensity (dB sound pressure level [SPL]), and reduced second formant (F2) slopes. However, speech rate did not differ between groups. Our finding of an overall downscaling of speech movement and acoustic parameters in some participants with PD provides support for speech therapies directed at increasing speech effort in individuals with PD.
|
28
|
Classification of speech and language profiles in 4-year-old children with cerebral palsy: a prospective preliminary study. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2010; 53:1496-513. [PMID: 20643795 PMCID: PMC2962882 DOI: 10.1044/1092-4388(2010/09-0176)] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/19/2023]
Abstract
PURPOSE In this study, the authors proposed and tested a preliminary speech and language classification system for children with cerebral palsy. METHOD Speech and language assessment data were collected in a laboratory setting from 34 children with cerebral palsy (CP; 18 male, 16 female) with a mean age of 54 months (SD = 1.8). Measures of interest were vowel area, speech rate, language comprehension scores, and speech intelligibility ratings. RESULTS Canonical discriminant function analysis showed that 3 functions accounted for 100% of the variance among profile groups, with speech variables accounting for 93% of the variance. Classification agreement varied from 74% to 97% based on 4 different classification paradigms. CONCLUSIONS The results of this study provide preliminary support for the classification of speech and language abilities of children with CP into 4 initial profile groups. Further research is necessary to validate the full classification system.
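The canonical discriminant setup can be sketched with a linear discriminant analysis: four profile groups yield at most three discriminant functions, which jointly account for 100% of the between-group variance, as in the abstract. The data here are simulated stand-ins, not the study's measures:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)

# Four hypothetical profile groups (25 cases each) described by four
# simulated measures standing in for vowel area, speech rate, language
# comprehension, and intelligibility.
groups = np.repeat([0, 1, 2, 3], 25)
means = np.array([[0, 0, 0, 0],
                  [2, 0, 1, 0],
                  [0, 2, 0, 1],
                  [1, 1, 2, 2]], dtype=float)
X = rng.normal(size=(100, 4)) + means[groups]

# With 4 classes there are at most 3 discriminant functions; retaining all
# of them accounts for the full between-group variance.
lda = LinearDiscriminantAnalysis(n_components=3).fit(X, groups)
variance_accounted = lda.explained_variance_ratio_.sum()
```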
|
29
|
Evaluating the spectral distinction between sibilant fricatives through a speaker-centered approach. JOURNAL OF PHONETICS 2010; 38:548-554. [PMID: 21278849 PMCID: PMC3027155 DOI: 10.1016/j.wocn.2010.07.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
This study was designed to examine the feasibility of using the spectral mean and/or spectral skewness to distinguish between alveolar and palato-alveolar fricatives produced by individual adult speakers of English. Five male and five female speaker participants produced 100 CVC words with an initial consonant /s/ or /ʃ/. The spectral mean and skewness were derived every 10 milliseconds throughout the fricative segments and plotted for all productions. Distinctions were examined for each speaker through visual inspection of these time history plots and statistical comparisons were completed for analysis windows centered 50 ms after the onset of the fricative segment. The results showed significant differences between the alveolar and palato-alveolar fricatives for both the mean and skewness values. However, there was considerable inter-speaker overlap, limiting the utility of the measures to evaluate the adequacy of the phonetic distinction. When the focus shifted to individual speakers rather than average group performance, only the spectral mean distinguished consistently between the two phonetic categories. The robustness of the distinction suggests that intra-speaker overlap in spectral mean between prevocalic /s/ and /ʃ/ targets may be indicative of abnormal fricative production and a useful measure for clinical applications.
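The spectral mean and skewness used here are the first and standardized third moments of a magnitude spectrum treated as a distribution over frequency. A minimal sketch on a white-noise frame (a stand-in for a fricative segment; window length and sampling rate are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Windowed "fricative" frame: white noise, so the spectrum is roughly flat
# and the moments below land near their flat-spectrum values.
fs = 22050
frame = rng.normal(size=1024) * np.hanning(1024)

spectrum = np.abs(np.fft.rfft(frame))
freqs = np.fft.rfftfreq(1024, d=1 / fs)

p = spectrum / spectrum.sum()  # normalized magnitude "distribution"
spectral_mean = np.sum(freqs * p)
variance = np.sum(((freqs - spectral_mean) ** 2) * p)
skewness = np.sum(((freqs - spectral_mean) ** 3) * p) / variance ** 1.5
```

Computed every 10 ms through a real fricative, a higher spectral mean would point toward /s/ and a lower one toward /ʃ/, matching the study's per-speaker comparison.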
|
30
|
Formant centralization ratio: a proposal for a new acoustic measure of dysarthric speech. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2010; 53:114-25. [PMID: 19948755 PMCID: PMC2821466 DOI: 10.1044/1092-4388(2009/08-0184)] [Citation(s) in RCA: 149] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
PURPOSE The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA-the formant centralization ratio (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register treatment effects. METHOD Speech recordings of 38 individuals with idiopathic Parkinson's disease and dysarthria (19 of whom received 1 month of intensive speech therapy [Lee Silverman Voice Treatment; LSVT LOUD]) and 14 healthy control participants were acoustically analyzed. Vowels were extracted from short phrases. The same vowel-formant elements were used to construct the FCR, expressed as (F2u + F2a + F1i + F1u) / (F2i + F1a), the VSA, expressed as ABS([F1i x (F2a - F2u) + F1a x (F2u - F2i) + F1u x (F2i - F2a)] / 2), a logarithmically scaled version of the VSA (LnVSA), and the F2i /F2u ratio. RESULTS Unlike the VSA and the LnVSA, the FCR and F2i/F2u ratio robustly differentiated dysarthric from healthy speech and were not gender sensitive. All metrics effectively registered treatment effects and were strongly correlated with each other. CONCLUSION Albeit preliminary, the present findings indicate that the FCR is a sensitive, valid, and reliable acoustic metric for distinguishing dysarthric from unimpaired speech and for monitoring treatment effects, probably because of reduced sensitivity to interspeaker variability and enhanced sensitivity to vowel centralization.
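The FCR, VSA, and F2i/F2u ratio given in the abstract can be computed directly. The corner-vowel formant values below are classic adult-male averages used for illustration, not data from the study:

```python
# Corner-vowel formants in Hz (illustrative adult-male averages).
F1i, F2i = 270, 2290   # /i/
F1u, F2u = 300, 870    # /u/
F1a, F2a = 730, 1090   # /a/

# Formant centralization ratio: vowel centralization pushes it up toward 1
# and beyond, while shrinking the vowel space area.
FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a)

# Vowel space area (triangle area from the three corner vowels).
VSA = abs(F1i * (F2a - F2u) + F1a * (F2u - F2i) + F1u * (F2i - F2a)) / 2

F2_ratio = F2i / F2u
```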
|