1. Trayvick J, Barkley SB, McGowan A, Srivastava A, Peters AW, Cecchi GA, Foss-Feig JH, Corcoran CM. Speech and language patterns in autism: Towards natural language processing as a research and clinical tool. Psychiatry Res 2024; 340:116109. [PMID: 39106814 PMCID: PMC11371491 DOI: 10.1016/j.psychres.2024.116109] [Received: 04/01/2024] [Revised: 07/22/2024] [Accepted: 07/26/2024] [Indexed: 08/09/2024]
Abstract
Speech and language differences have long been described as important characteristics of autism spectrum disorder (ASD). Linguistic abnormalities range from prosodic differences in pitch, intensity, and rate of speech to language idiosyncrasies and difficulties with pragmatics and reciprocal conversation. Heterogeneity of findings and a reliance on qualitative, subjective ratings, however, limit a full understanding of linguistic phenotypes in autism. This review summarizes evidence of both speech and language differences in ASD. We also describe recent advances in linguistic research, aided by automated methods such as natural language processing (NLP) and speech analysis software. Such approaches allow objective, quantitative measurement of speech and language patterns that may be more tractable and less biased. Future research integrating both speech and language features and capturing "natural language" samples may yield a more comprehensive understanding of language differences in autism, offering potential implications for diagnosis, intervention, and research.
Affiliation(s)
- Jadyn Trayvick
- Seaver Autism Center for Research and Treatment, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, Box 1230, New York, NY 10029, USA; Department of Psychology, Stony Brook University, Stony Brook, NY 11794, USA
- Sarah B Barkley
- Seaver Autism Center for Research and Treatment, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, Box 1230, New York, NY 10029, USA; Department of Psychology, Stony Brook University, Stony Brook, NY 11794, USA
- Alessia McGowan
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, Box 1230, New York, NY 10029, USA
- Agrima Srivastava
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, Box 1230, New York, NY 10029, USA
- Arabella W Peters
- Seaver Autism Center for Research and Treatment, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, Box 1230, New York, NY 10029, USA
- Guillermo A Cecchi
- Computational Biology Center-Neuroscience, IBM T.J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA
- Jennifer H Foss-Feig
- Seaver Autism Center for Research and Treatment, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, Box 1230, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, Box 1230, New York, NY 10029, USA; Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, Box 1230, New York, NY 10029, USA
- Cheryl M Corcoran
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, Box 1230, New York, NY 10029, USA; James J. Peters Veterans Administration, 130 W Kingsbridge Rd, Bronx, NY 10468, USA.
2. TaghiBeyglou B, Čuljak I, Bagheri F, Suntharalingam H, Yadollahi A. Estimating the severity of obstructive sleep apnea during wakefulness using speech: A review. Comput Biol Med 2024; 181:109020. [PMID: 39173487 DOI: 10.1016/j.compbiomed.2024.109020] [Received: 12/31/2023] [Revised: 06/12/2024] [Accepted: 08/09/2024] [Indexed: 08/24/2024]
Abstract
Obstructive sleep apnea (OSA) is a chronic breathing disorder during sleep that affects 10-30% of adults in North America. The gold standard for diagnosing OSA is polysomnography (PSG). However, PSG has several drawbacks. It is cumbersome and expensive, which can make it quite inconvenient for patients, and patients often have to endure long waitlists before they can undergo it. As a result, alternative approaches to screening for OSA have gained attention. Speech is an accessible modality. It is generated by variations in the pharyngeal airway, the vocal tract, and the soft tissues of the pharynx, which are the same anatomical structures that contribute to OSA. Consequently, in this study, we aim to provide a comprehensive review of existing research on the use of speech for estimating the severity of OSA. A total of 851 papers were initially identified from the PubMed database. They were found using a set of keywords defined by population, intervention, comparison and outcome (PICO) criteria, along with a concatenated graph of the 5 most cited papers in the field extracted from the ConnectedPapers platform. The papers then went through a rigorous filtering process based on the preferred reporting items for systematic reviews and meta-analyses (PRISMA) approach, and 32 papers were ultimately included in this review. Of these, 28 papers primarily focused on developing methodology, while the remaining 4 examined the clinical association between OSA and speech. Next, we investigate the physiological similarities between OSA and speech. We then highlight the features extracted from speech, the feature selection techniques employed, and the details of the models developed to predict OSA severity. By thoroughly discussing the current findings and limitations of studies in the field, we provide insights into the gaps that future research needs to address.
Affiliation(s)
- Behrad TaghiBeyglou
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada; KITE Research Institute, Toronto Rehabilitation Institute- University Health Network, Toronto, ON, Canada
- Ivana Čuljak
- KITE Research Institute, Toronto Rehabilitation Institute- University Health Network, Toronto, ON, Canada
- Fatemeh Bagheri
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada; North York General Hospital, Toronto, ON, Canada
- Haarini Suntharalingam
- KITE Research Institute, Toronto Rehabilitation Institute- University Health Network, Toronto, ON, Canada
- Azadeh Yadollahi
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada; KITE Research Institute, Toronto Rehabilitation Institute- University Health Network, Toronto, ON, Canada.
3. Guo C, Chen F, Kuang C, Dong L. An investigation of acoustic cues to tonal registers and voicing in Donglei Kam. J Acoust Soc Am 2024; 156:655-671. [PMID: 39051719 DOI: 10.1121/10.0028009] [Received: 01/19/2024] [Accepted: 06/02/2024] [Indexed: 07/27/2024]
Abstract
The Kam language has experienced historical tonal splits, resulting in the development of a complex tonal system. However, there is still limited knowledge regarding the acoustic characteristics associated with aspiration-based tone splitting. This study aims to investigate the acoustic cues related to the tonal registers and laryngeal configurations in Donglei Kam, a dialect of Southern Kam. Sixteen native speakers of Donglei Kam participated, producing lexical tones. Statistical analyses were conducted to examine the acoustic distinctions between tonal registers, using measurements of voice onset time, spectral tilt, noise, and energy. The results indicated that Donglei Kam retained a two-way contrast of aspiration, albeit with a trend toward gradual loss. Additionally, a breathy voice was detected in the Ciyin tonal register, characterized by elevated spectral tilt values and spectral noise throughout the vowels. Moreover, machine learning classifiers effectively identified tonal registers using voice-quality data, suggesting that the phonation contrast between breathy and modal voice could contribute to the tonal split alongside pitch contrast. In summary, these findings enhance our understanding of the acoustic implementation of breathiness in Kam and offer valuable insights into the role of laryngeal contrast in tonal splits.
Affiliation(s)
- Chengyu Guo
- Faculty of Arts and Sciences, Beijing Normal University, Zhuhai 519087, China
- Fei Chen
- School of Foreign Languages, Hunan University, Changsha 410082, China
- Chen Kuang
- School of Foreign Languages, Hunan University, Changsha 410082, China
- Longjie Dong
- School of Foreign Languages, Hunan University, Changsha 410082, China
4. Nylén F, Holmberg J, Södersten M. Acoustic cues to femininity and masculinity in spontaneous speech. J Acoust Soc Am 2024; 155:3090-3100. [PMID: 38717212 DOI: 10.1121/10.0025932] [Received: 12/28/2023] [Accepted: 04/21/2024] [Indexed: 09/20/2024]
Abstract
The perceived level of femininity and masculinity is a prominent property by which a speaker's voice is indexed, and a vocal expression incongruent with the speaker's gender identity can greatly contribute to gender dysphoria. Our understanding of the acoustic cues to the levels of masculinity and femininity that listeners perceive in voices is not well developed, and a better understanding of them would benefit the communication of therapy goals and evaluation in gender-affirming voice training. We developed a voice bank of 132 voices expressing a range of levels of femininity and masculinity, as rated by 121 listeners in independent, individually randomized perceptual evaluations. Acoustic models were developed from measures identified as markers of femininity or masculinity in the literature, using penalized regression and tenfold cross-validation procedures. The 223 most important acoustic cues explained 89% and 87% of the variance in the perceived level of femininity and masculinity in the evaluation set, respectively. Median fundamental frequency (fo) was confirmed as the primary cue, but other acoustic properties must be considered in accurate models of femininity and masculinity perception. The developed models are proposed as a means to support communication and evaluation of gender-affirming voice training goals and to improve voice synthesis efforts.
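In tenfold cross-validation, the voice bank is split into ten disjoint folds. Each model is fitted on nine folds and evaluated on the held-out one. The minimal standard-library sketch below shows only the fold partition for 132 voices, assuming a single seeded shuffle. The helper name `k_fold_indices` and the seed are hypothetical. The abstract does not describe the authors' exact splitting scheme or their penalized-regression settings.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: indices are shuffled once with a fixed seed, then
    dealt round-robin into k disjoint folds whose sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        # The training set is every index outside the held-out fold.
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test


# 132 voices into 10 folds: eight folds of 13 and two folds of 14.
splits = list(k_fold_indices(132, k=10))
print(sorted(len(test) for _, test in splits))  # [13, 13, 13, 13, 13, 13, 13, 13, 14, 14]
```

A penalized regression (for example, ridge or lasso) would be fitted on each `train` set and scored on the matching `test` set. The results are then averaged over the ten folds.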
Affiliation(s)
- Fredrik Nylén
- Department of Clinical Sciences, Division of Speech and Language Pathology, Umeå University, Umeå SE901 87, Sweden
- Jenny Holmberg
- Department of Clinical Sciences, Division of Speech and Language Pathology, Umeå University, Umeå SE901 87, Sweden
- Maria Södersten
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm SE141 86, Sweden
- Speech and Language Pathology, Medical Unit, Karolinska University Hospital, Stockholm SE-141 86, Sweden
5. Yaşar Ö, Tahir E, Erensoy I, Terzi M. Comparing dysphonia severity index, objective, subjective, and perceptual analysis of voice in patients with multiple sclerosis and healthy controls. Mult Scler Relat Disord 2024; 82:105378. [PMID: 38142514 DOI: 10.1016/j.msard.2023.105378] [Received: 07/20/2023] [Revised: 11/17/2023] [Accepted: 12/11/2023] [Indexed: 12/26/2023]
Abstract
BACKGROUND Impairments in voice quality in multiple sclerosis (MS) have recently been investigated, with inconsistent results. A voice-centered multidimensional assessment protocol including patient-reported outcome measures was used to evaluate all aspects of the voice changes. OBJECTIVES The study aimed to compare objective, subjective, and perceptual measures of voice between people with MS and a healthy control group. METHODS A total of 128 participants were enrolled in the study: 64 people with MS and 64 age- and gender-matched healthy controls. Subjective, objective, and auditory-perceptual voice assessments were performed for all participants. The auditory-perceptual evaluation used the GRBAS scale. The Dysphonia Severity Index (DSI) was computed for both groups. All participants completed the Turkish versions of the Voice Handicap Index-10 (VHI-10) and the Voice-Related Quality of Life (VRQoL) questionnaire. RESULTS Acoustic and aerodynamic voice parameters differed significantly between the MS and control groups for both males and females. The DSI also differed significantly between the MS group and the control group for both males and females (p<0.05). All components of the GRBAS scale were significantly higher in the MS group (p<0.001). A multivariate regression model showed that age, gender, EDSS score, number of MS attacks, and disease duration did not affect the DSI. The overall VHI-10 score was higher in the MS group (median = 1.0, range = 0-28) than in the control group (median = 0, range = 0-4). VRQoL scores were lower in the MS group (median = 95, range = 62.5-100) than in controls (median = 100, range = 85-100) (p<0.001). CONCLUSION Our results indicate that people with MS differ significantly from healthy individuals in acoustic and aerodynamic voice parameters. A substantial proportion of people with MS are aware that their voice problem affects their quality of life.
People with MS must be monitored for voice changes, and a multidimensional voice assessment protocol should be implemented.
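The abstract says the DSI was computed for both groups but does not give the formula. A widely used definition is the weighting introduced by Wuyts and colleagues (2000). It combines maximum phonation time, highest phonational frequency, lowest intensity, and jitter. Higher values indicate better voice quality, running from roughly +5 for normal voices to roughly -5 for severe dysphonia. The sketch below assumes the study used this standard weighting. The function name and the input values are illustrative, not the authors' code or data.

```python
def dysphonia_severity_index(mpt_s: float, f0_high_hz: float, i_low_db: float, jitter_pct: float) -> float:
    """Dysphonia Severity Index with the Wuyts et al. (2000) weights (assumed).

    mpt_s:      maximum phonation time in seconds
    f0_high_hz: highest phonational frequency in Hz
    i_low_db:   lowest intensity in dB SPL
    jitter_pct: jitter in percent
    """
    return 0.13 * mpt_s + 0.0053 * f0_high_hz - 0.26 * i_low_db - 1.18 * jitter_pct + 12.4


# Hypothetical measurements for one speaker, not data from the study.
print(round(dysphonia_severity_index(mpt_s=20, f0_high_hz=880, i_low_db=50, jitter_pct=0.5), 3))  # 6.074
```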
Affiliation(s)
- Özlem Yaşar
- Ondokuz Mayıs University Faculty of Health Sciences, Department of Speech and Language Therapy, Samsun, Turkey.
- Emel Tahir
- Ondokuz Mayıs University School of Medicine, Department of Otolaryngology, Samsun, Turkey
- Ibrahim Erensoy
- Ondokuz Mayıs University Faculty of Health Sciences, Department of Speech and Language Therapy, Samsun, Turkey.
- Murat Terzi
- Ondokuz Mayıs University School of Medicine, Department of Neurology, Samsun, Turkey
6. Kankare E, Laukkanen AM. Validation of the Acoustic Breathiness Index in Speakers of Finnish Language. J Clin Med 2023; 12:7607. [PMID: 38137676 PMCID: PMC10743974 DOI: 10.3390/jcm12247607] [Received: 10/14/2023] [Revised: 11/27/2023] [Accepted: 12/07/2023] [Indexed: 12/24/2023]
Abstract
Breathiness (perception of turbulence noise in the voice) is one of the major components of hoarseness in dysphonic voices. This study aims to validate a multiparameter analysis tool, the Acoustic Breathiness Index (ABI), for quantifying breathiness in the speaking voice, including both sustained vowels and continuous speech. One hundred and eight speakers with dysphonia (28 M, 80 F, mean age 50, SD 15.4 years) and 87 non-dysphonic controls (18 M, 69 F, mean age 42, SD 14 years) volunteered as participants. They read a standard text and sustained the vowel /a:/. Acoustic recordings were made using a head-mounted microphone. The samples were evaluated perceptually by nine voice experts from different backgrounds (speech therapists, vocologists, and laryngologists), who listened through headphones. The experts rated Breathiness (B) from the GRBAS scale. The dysphonic and non-dysphonic speakers differed significantly in the auditory-perceptual evaluation of breathiness. A significant difference was also found for ABI, which had a mean value of 2.26 (SD 1.15) for non-dysphonic and 3.07 (SD 1.75) for dysphonic speakers. ABI correlated strongly with B (rs = 0.823, p = 0.01). ABI's power to distinguish the groups was high (88.6%). The highest sensitivity and specificity of ABI (80%) were obtained at a threshold value of 2.68. ABI is a valid tool for differentiating breathiness in non-dysphonic and dysphonic speakers of Finnish.
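The abstract reports that ABI reached its best sensitivity and specificity at a threshold of 2.68 but does not say how that cut-off was chosen. One common criterion is to maximize Youden's J (sensitivity + specificity - 1) over the candidate thresholds. The sketch below makes three assumptions:

- scores at or above the threshold are classified as dysphonic;
- both classes are present;
- Youden's J is the selection rule.

The helper names and the scores are hypothetical, not the study's data or procedure.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called dysphonic (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J over observed score values.

    Ties keep the lowest threshold that reached the maximum.
    """
    best_t, best_j = scores[0], -1.0
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Invented ABI-like scores: 0 = non-dysphonic, 1 = dysphonic.
scores = [1.9, 2.1, 2.4, 2.7, 2.5, 3.0, 3.4, 4.2]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
t, j = youden_threshold(scores, labels)
print(t, j, sens_spec(scores, labels, t))  # 2.5 0.75 (1.0, 0.75)
```

Other criteria, such as requiring equal sensitivity and specificity, can select a different cut-off on the same data.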
Affiliation(s)
- Elina Kankare
- Department of Rehabilitation and Psychosocial Support, Logopedics, Phoniatrics, Tampere University Hospital, 33520 Tampere, Finland
- Anne-Maria Laukkanen
- Speech and Voice Research Laboratory, Tampere University, 33100 Tampere, Finland
7. Bruder C, Larrouy-Maestri P. Classical singers are also proficient in non-classical singing. Front Psychol 2023; 14:1215370. [PMID: 38023013 PMCID: PMC10630913 DOI: 10.3389/fpsyg.2023.1215370] [Received: 05/01/2023] [Accepted: 10/02/2023] [Indexed: 12/01/2023]
Abstract
Classical singers train intensively for many years to achieve a high level of vocal control and specific sound characteristics. However, the actual span of singers' activities often includes venues other than opera halls and requires performing in styles outside their strict training (e.g., singing pop songs at weddings). We examine classical singers' ability to adjust their vocal productions to other styles, in relation to their formal training. Twenty-two highly trained female classical singers (aged 22 to 45 years; vocal training ranging from 4.5 to 27 years) performed six different melody excerpts a cappella in contrasting ways: as an opera aria, as a pop song, and as a lullaby. All melodies were sung both with lyrics and with a /lu/ sound. All productions were acoustically analyzed using seven common acoustic descriptors of voice/singing performances. They were also perceptually evaluated by a total of 50 lay listeners (aged 21 to 73 years), who identified the intended singing style in a forced-choice lab experiment. Acoustic analyses of the 792 performances suggest distinct acoustic profiles, implying that singers were able to produce contrasting performances. Furthermore, the high overall style recognition rate (78.5% correct responses, CR) confirmed singers' proficiency in the operatic style (86% CR). It also confirmed their versatility in lullaby (80% CR) and pop performances (69% CR), albeit with occasional confusion between the latter two. Interestingly, competence varied among singers, with versatility (estimated from correct recognition of the pop/lullaby styles) ranging from 62 to 83% depending on the singer. Importantly, this variability was not linked to formal training per se.
Our results indicate that classical singers are versatile, and prompt the need for further investigations to clarify the role of singers' broader professional and personal experiences in the development of this valuable ability.
Affiliation(s)
- Camila Bruder
- Department of Music, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Pauline Larrouy-Maestri
- Department of Music, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Max Planck-NYU Center for Language, Music, and Emotion (CLaME), New York, NY, United States
8. Fung RSY, Wong EYC. Separated and reunified: An apparent time investigation of the voice quality differences between Hong Kong Cantonese and Guangzhou Cantonese. PLoS One 2023; 18:e0293058. [PMID: 37851598 PMCID: PMC10584129 DOI: 10.1371/journal.pone.0293058] [Received: 08/09/2022] [Accepted: 10/04/2023] [Indexed: 10/20/2023]
Abstract
Hong Kong Cantonese (HKC) and Guangzhou Cantonese (GZC) are two major accents of Cantonese spoken in two geographically non-contiguous cities in Southern China. Previous studies were unable to identify the phonetic features that distinguish the two accents, since they share the same phonological system. This study attempted to solve the puzzle by investigating voice quality differences between the two accents. It did so through acoustic analysis of the speech output of 191 talkers in three age groups, ranging from 18 to 65 years old. Among the various spectral and noise measures of voice quality, Cepstral Peak Prominence (CPP) was the best acoustic measure for distinguishing the two accents. Based on CPP, GZC showed more noise overall than HKC. The covariation of voice quality and tone was also studied. The greatest CPP differences between the two accents were found in the two extreme tones: the high-level and the extra-low-level tones. Furthermore, creaky voice was tied mainly to the extra-low-level tone in both accents. However, HKC exhibited creaky voice more frequently than GZC. Creaky voice in GZC was characterized by more noise and greater tension than in HKC. Finally, age was found to be a mediating factor in the voice quality of the two accents. Interpreted within the Apparent Time Framework, the results suggest that voice quality in the two cities has changed over time. The voice quality of the younger generations of the two accents has merged among the three low tones. Furthermore, the prevalence of creaky voice increased across age groups in both accents, and at a faster rate in HKC than in GZC.
Affiliation(s)
- Roxana S. Y. Fung
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Eugene Y. C. Wong
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota, United States of America
9. Kim JA, Jang H, Choi Y, Min YG, Hong YH, Sung JJ, Choi SJ. Subclinical articulatory changes of vowel parameters in Korean amyotrophic lateral sclerosis patients with perceptually normal voices. PLoS One 2023; 18:e0292460. [PMID: 37831677 PMCID: PMC10575489 DOI: 10.1371/journal.pone.0292460] [Received: 04/18/2023] [Accepted: 09/21/2023] [Indexed: 10/15/2023]
Abstract
The available quantitative methods for evaluating bulbar dysfunction in patients with amyotrophic lateral sclerosis (ALS) are limited. We aimed to characterize vowel properties in Korean ALS patients, investigate associations between vowel parameters and clinical features of ALS, and analyze subclinical articulatory changes in vowel parameters in patients with perceptually normal voices. Forty-three patients with ALS (27 with dysarthria and 16 without dysarthria) and 20 healthy controls were prospectively enrolled in the study. Dysarthria was assessed using the speech subscore of the ALS Functional Rating Scale-Revised (ALSFRS-R), with any loss from the maximum score of 4 points indicating the presence of dysarthria. Structured speech samples were recorded and analyzed using Praat software. For the three corner vowels (/a/, /i/, and /u/), the following were extracted from the speech samples: vowel duration, fundamental frequency, the frequencies of the first two formants (F1 and F2), harmonics-to-noise ratio, vowel space area (VSA), and vowel articulation index (VAI). Corner vowel durations were significantly longer in ALS patients with dysarthria than in healthy controls. The F1 frequency of /a/, the F2 frequencies of /i/ and /u/, the VSA, and the VAI differed significantly between ALS patients with dysarthria and healthy controls. The area under the curve (AUC) was 0.912. The F1 frequency of /a/ and the VSA were the major determinants for differentiating ALS patients who had not yet developed apparent dysarthria from healthy controls (AUC 0.887). In linear regression analyses, both the VSA and the VAI decreased as the ALSFRS-R speech subscore decreased, whereas vowel durations were prolonged. Vowel parameter analysis provided a useful metric, correlated with disease severity, for detecting subclinical bulbar dysfunction in ALS patients.
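The abstract lists the VSA and the VAI among the extracted parameters but does not state their formulas. The sketch below uses the definitions commonly applied to the three corner vowels in the dysarthria literature:

- The VSA is the area of the /a/-/i/-/u/ triangle in the F1-F2 plane (shoelace formula).
- The VAI is (F2i + F1a) / (F1i + F1u + F2u + F2a).

Assuming these definitions, vowel centralization shrinks the triangle and lowers the VAI. The formant values are hypothetical and not taken from the study.

```python
from typing import Dict, Tuple

# Formants per corner vowel as (F1, F2) in Hz.
Formants = Dict[str, Tuple[float, float]]


def vowel_space_area(f: Formants) -> float:
    """Area (Hz^2) of the /a/-/i/-/u/ triangle in the F1-F2 plane (shoelace formula)."""
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = f["a"], f["i"], f["u"]
    return 0.5 * abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a))


def vowel_articulation_index(f: Formants) -> float:
    """VAI = (F2i + F1a) / (F1i + F1u + F2u + F2a); lower values suggest centralized vowels."""
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = f["a"], f["i"], f["u"]
    return (f2i + f1a) / (f1i + f1u + f2u + f2a)


# Hypothetical formant values, not data from the study.
speaker = {"a": (800, 1300), "i": (300, 2300), "u": (350, 900)}
print(vowel_space_area(speaker))  # 325000.0
print(round(vowel_articulation_index(speaker), 3))  # 1.088
```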
Affiliation(s)
- Jin-Ah Kim
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Translational Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, Republic of Korea
- Hayeun Jang
- Division of English, Busan University of Foreign Studies, Busan, Republic of Korea
- Yoonji Choi
- Department of Korean Language and Literature, Seoul National University, Seoul, Republic of Korea
- Young Gi Min
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Translational Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Yoon-Ho Hong
- Department of Neurology, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Seoul, Republic of Korea
- Jung-Joon Sung
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
- Neuroscience Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea
- Seok-Jin Choi
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
- Center for Hospital Medicine, Seoul National University Hospital, Seoul, Republic of Korea
10. Ikuma T, McWhorter AJ, Oral E, Kunduk M. Formant-Aware Spectral Analysis of Sustained Vowels of Pathological Breathy Voice. J Voice 2023:S0892-1997(23)00154-6. [PMID: 37302909 DOI: 10.1016/j.jvoice.2023.05.002] [Received: 02/03/2023] [Revised: 05/07/2023] [Accepted: 05/08/2023] [Indexed: 06/13/2023]
Abstract
OBJECTIVES This paper reports the effectiveness of formant-aware spectral parameters in predicting perceptual breathiness ratings. A breathy voice has a steeper spectral slope and more turbulent noise than a normal voice. Measuring spectral parameters of acoustic signals over the lower formant regions is a known approach to capturing the properties related to breathiness. This study examines this approach by testing contemporary spectral parameters and algorithms within the framework, alternate frequency band designs, and vowel effects. METHODS Sustained vowel recordings (/a/, /i/, and /u/) of speakers with voice disorders in the German Saarbruecken Voice Database were considered (n = 367). Recordings with signal irregularities, such as subharmonics, or with perceived roughness were excluded from the study. Four speech-language pathologists perceptually rated the recordings for breathiness on a 100-point scale, and their averages were used in the analysis. The acoustic spectra were segmented into four frequency bands according to the vowel formant structures. Five spectral parameters (intraband harmonics-to-noise ratio, HNR; interband harmonics ratio, HHR; interband noise ratio, NNR; and interband glottal-to-noise energy, GNE, ratio) were evaluated in each band to predict the perceptual breathiness rating. Four HNR algorithms were tested. RESULTS Multiple linear regression models of spectral parameters, led by the HNRs, explained up to 85% of the variance in perceptual breathiness ratings, exceeding the acoustic breathiness index (82%). Individually, the HNR over the first two formants best explained the variance in breathiness (78%), exceeding the smoothed cepstral peak prominence (74%). The performance of HNR was highly algorithm-dependent (10% spread). Some vowel effects were observed in the perceptual rating (higher for /u/), predictability (5% lower for /u/), and model parameter selection.
CONCLUSIONS Strong per-vowel breathiness acoustic models were found by segmenting the spectrum to isolate the portion most affected by breathiness.
Affiliation(s)
- Takeshi Ikuma
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana.
- Andrew J McWhorter
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana
- Evrim Oral
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana
- Melda Kunduk
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana; Dept. of Communication Sciences & Disorders, Louisiana State University, Baton Rouge, Louisiana
11. Chan RKW. Evidential value of voice quality acoustics in forensic voice comparison. Forensic Sci Int 2023; 348:111725. [PMID: 37182279 DOI: 10.1016/j.forsciint.2023.111725] [Received: 06/08/2022] [Revised: 03/28/2023] [Accepted: 05/05/2023] [Indexed: 05/16/2023]
Abstract
Voice recordings in forensic voice comparison casework typically involve speech style mismatch and are separated by days or weeks. However, studies that aim to empirically validate the evidential value of speech features rarely include systematic comparisons of contemporaneous vs. non-contemporaneous recordings and of matched vs. mismatched speech style. This study addresses this gap and focuses on the acoustics of laryngeal voice quality (VQ), since voice quality has been reported to be one of the most popular and useful features for forensic voice comparison. Seventy-five male speakers aged 18-45 were selected from a forensically oriented database of Australian English speakers in Sydney/New South Wales. The evidential strength of a number of spectral tilt and additive noise parameters was tested under the Bayesian likelihood-ratio framework. Results show that system performance using these parameters as input was stable across 50 replications. When speech style was controlled for, VQ parameters yielded promising results, and system validity improved as more VQ parameters were used. However, they offered limited speaker-discriminatory value when speech style mismatch was involved, whereas non-contemporaneous recordings led to only a small decline in performance. Overall, forensic practitioners should be cautious when using spectral tilt and additive noise measures as speaker discriminants in forensic casework.
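The abstract evaluates system performance and validity under the likelihood-ratio framework but does not name the metric. A metric widely used in forensic voice comparison is the log-likelihood-ratio cost (Cllr). It penalizes same-speaker comparisons with low LRs and different-speaker comparisons with high LRs. The minimal sketch below assumes Cllr as the performance measure. The function name and the LR values are hypothetical, not results from the study.

```python
import math
from typing import Sequence


def cllr(same_speaker_lrs: Sequence[float], different_speaker_lrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost in bits.

    An uninformative system (every LR = 1) scores exactly 1. Values approach 0
    for well-calibrated systems that give large LRs to same-speaker pairs and
    small LRs to different-speaker pairs.
    """
    ss = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs) / len(same_speaker_lrs)
    ds = sum(math.log2(1 + lr) for lr in different_speaker_lrs) / len(different_speaker_lrs)
    return 0.5 * (ss + ds)


# Hypothetical LRs from same-speaker and different-speaker comparisons.
print(cllr([1.0, 1.0], [1.0, 1.0]))  # 1.0
print(round(cllr([10, 100], [0.1, 0.01]), 4))  # 0.0759
```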
12. Nguyen DD, Madill C. Auditory-perceptual Parameters as Predictors of Voice Acoustic Measures. J Voice 2023:S0892-1997(23)00088-7. [PMID: 37003863 DOI: 10.1016/j.jvoice.2023.02.030] [Received: 12/15/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 04/03/2023]
Abstract
BACKGROUND Much research has examined the relationship between perceptual and acoustic measures. However, little is known about the predictive value of perceptual measures for individual acoustic parameters. AIMS This study used simulated and disordered voice samples to investigate the predictive value of breathiness, roughness, and strain ratings for selected time-based and spectral-based measures of voice quality. METHOD This study retrospectively analysed two sets of precollected data. The experimental data had been collected from nine trained speakers manipulating false vocal fold activity, true vocal fold mass, and larynx height. The voice-disordered data had been extracted from a clinical database for 68 patients with muscle tension voice disorders (MTVD). Both data sets had been perceptually rated for breathiness, roughness, and strain. Voice samples (prolonged vowel /ɑ/ and Rainbow Passage readings) had undergone acoustic analysis using Praat for harmonics-to-noise ratio (HNR) and the program "Analysis of Dysphonia in Speech and Voice" (ADSV) for cepstral peak prominence (CPP), Cepstral/Spectral Index of Dysphonia (CSID), and Low/High spectral ratio (L/H ratio). Perceptual parameters were regressed against these acoustic measures to test their predictive value. RESULTS Reliability data showed satisfactory intra- and inter-rater reliability of perceptual ratings for both data sets. Breathiness significantly predicted CPP (both vocal tasks) and CSID (Rainbow Passage) in the experimental data and predicted all the acoustic measures in the MTVD data. Roughness significantly predicted HNR, CPP, and CSID in the experimental data, and CPP (Rainbow Passage) and CSID (both vocal tasks) in the MTVD data. Strain (both vocal tasks) significantly predicted the L/H ratio in both data sets. CONCLUSIONS Breathiness ratings predicted selected measures among HNR, CPP, and CSID; roughness ratings predicted CPP and CSID; and strain ratings predicted the L/H ratio.
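CPP appears in most of the reported predictions. The abstract does not describe ADSV's implementation, so the Python/NumPy sketch below shows only a simplified, unsmoothed single-frame CPP in the general sense of the measure: take the cepstrum of the log-magnitude spectrum, find its peak within the quefrency range of plausible pitch periods, and measure the peak's height above a regression line. The 60-300 Hz search range and fitting the line only over that range are assumptions; ADSV and Praat apply their own smoothing and fitting choices, so values from this sketch are not comparable with either tool.

```python
import numpy as np


def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=300.0):
    """Simplified, unsmoothed CPP (in dB-scaled cepstral units) for one frame."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    # Log-magnitude spectrum in dB; the small offset avoids log10(0).
    log_mag = 20 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag)
    quefrency = np.arange(len(cepstrum)) / fs  # seconds
    # Restrict the search to quefrencies matching plausible pitch periods.
    mask = (quefrency >= 1 / f0_max) & (quefrency <= 1 / f0_min)
    q, c = quefrency[mask], cepstrum[mask]
    peak = int(np.argmax(c))
    # Regression line of cepstral value on quefrency over the same range.
    slope, intercept = np.polyfit(q, c, 1)
    return float(c[peak] - (slope * q[peak] + intercept))


fs = 16000
t = np.arange(2048) / fs
# Strongly periodic test tone: 20 equal harmonics of 160 Hz.
voiced = sum(np.sin(2 * np.pi * 160 * k * t) for k in range(1, 21))
noise = np.random.default_rng(0).standard_normal(2048)
print(cepstral_peak_prominence(voiced, fs), cepstral_peak_prominence(noise, fs))
```

The periodic tone produces a regular ripple in the log spectrum and hence a prominent cepstral peak near 1/160 s, whereas white noise yields only a small one; that contrast is the intuition behind CPP falling as breathiness rises.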
Affiliation(s)
- Duy Duong Nguyen
- Voice Research Laboratory, Sydney School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- Catherine Madill
- Voice Research Laboratory, Sydney School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
13
Yaslıkaya S, Geçkil AA, Birişik Z. Is There a Relationship between Voice Quality and Obstructive Sleep Apnea Severity and Cumulative Percentage of Time Spent at Saturations below Ninety Percent: Voice Analysis in Obstructive Sleep Apnea Patients. Medicina (Kaunas) 2022; 58:medicina58101336. [PMID: 36295497 PMCID: PMC9608866 DOI: 10.3390/medicina58101336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2022] [Revised: 09/08/2022] [Accepted: 09/19/2022] [Indexed: 11/19/2022]
Abstract
Background and Objectives: The apnea-hypopnea index (AHI) is the most important criterion for determining the severity of obstructive sleep apnea (OSA), while the cumulative percentage of recording time during polysomnography in which oxygen saturation is below 90% (CT90%) is important for determining the severity of hypoxemia. As hypoxemia increases in OSA, inflammation also increases, and inflammation in the respiratory tract may affect phonation. We aimed to determine the effects of the degree of OSA and of CT90% on phonation. Materials and Methods: Patients aged 18-60 years were divided into four groups: normal, mild, moderate, and severe OSA. For voice recording, patients were asked to sustain the vowels /α:/ and /i:/ for 5 s, and maximum phonation time (MPT) was recorded. Jitter%, Shimmer%, harmonics-to-noise ratio (HNR), and f0 values were obtained with the Praat voice analysis program. Results: Seventy-two patients were included. For the vowel /α:/, there was a significant difference in Jitter%, Shimmer%, and HNR between the 1st and 4th groups (p < 0.001 for each), and CT90% correlated with Shimmer% and HNR (p < 0.001 and p < 0.021, respectively). For the vowel /i:/, there was a significant difference in f0 between the 1st group and the 2nd and 4th groups (p < 0.028 and p < 0.015, respectively) and in Jitter%, Shimmer%, and HNR between the 1st and 4th groups (p < 0.04, p < 0.001, and p < 0.001, respectively), and CT90% correlated with Shimmer% and HNR (p < 0.016 and p < 0.003, respectively). MPT differed significantly between the 1st group and the 3rd and 4th groups (p < 0.03 and p < 0.003, respectively). Conclusions: Glottic phonation can be affected, especially in patients with AHI ≥ 15. Voice quality can decrease as the degree of OSA increases. An increase in CT90% can be associated with worsening voice and may serve as a predictor in the evaluation of voice disorders in the future.
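Praat reports several jitter and shimmer variants, and the abstract does not say which were used. Assuming the common "local" variants, both reduce to the same computation over per-cycle values: the mean absolute difference between consecutive cycles divided by the mean value, in percent. The Python sketch below applies it to hypothetical period and amplitude sequences. It omits Praat's period-floor, period-ceiling, and maximum-period-factor checks, and in practice the per-cycle values would come from a pitch-period detector.

```python
def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycles / mean value, in %.

    Applied to cycle durations this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_abs_diff / mean_value


# Hypothetical per-cycle measurements from a sustained vowel.
periods_s = [0.0100, 0.0102, 0.0099, 0.0101]
peak_amplitudes = [0.80, 0.76, 0.82, 0.79]
jitter_local = local_perturbation_percent(periods_s)  # about 2.32 %
shimmer_local = local_perturbation_percent(peak_amplitudes)  # about 5.47 %
```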
Affiliation(s)
- Serhat Yaslıkaya
- Department of Otorhinolaryngology, Faculty of Medicine, Adıyaman University, Adıyaman 02100, Turkey
- Correspondence: ; Tel.: +90-4162161015
- Ayşegül Altıntop Geçkil
- Department of Chest Diseases, Faculty of Medicine, Malatya Turgut Özal University, Malatya 44210, Turkey
- Zehra Birişik
- Department of Speech and Language Therapy, Malatya Training and Research Hospital, Malatya 44000, Turkey
14
Ikuma T, Story B, McWhorter AJ, Adkins L, Kunduk M. Harmonics-to-noise ratio estimation with deterministically time-varying harmonic model for pathological voice signals. J Acoust Soc Am 2022; 152:1783. [PMID: 36182331 DOI: 10.1121/10.0014177] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/04/2022] [Accepted: 09/01/2022] [Indexed: 06/16/2023]
Abstract
The harmonics-to-noise ratio (HNR) and other spectral noise parameters are important in clinical objective voice assessment because they can indicate the presence of nonharmonic phenomena, which are tied to the perception of hoarseness or breathiness. Existing HNR estimators are built on the assumption that voice signals are nearly periodic (fixed over a short period), although voice pathology can induce involuntary slow modulation that violates this assumption. This paper proposes using a deterministically time-varying harmonic model to improve HNR measurements. To estimate the time-varying model, a two-stage iterative least-squares algorithm is proposed to reduce model overfitting. The efficacy of the proposed HNR estimator is demonstrated with synthetic signals, simulated tremor signals, and recorded acoustic signals. Results indicate that the proposed algorithm produces consistent HNR measures as the extent and rate of tremor are varied.
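For contrast with the proposed estimator, the sketch below shows the kind of conventional short-time estimator the abstract describes. It assumes the frame is stationary and nearly periodic, finds the largest normalized autocorrelation r within a pitch-lag range, and converts it with HNR = 10·log10(r / (1 - r)), the relation used in Boersma's autocorrelation method. It is a simplified illustration (no window correction or peak interpolation), not the time-varying harmonic model the paper proposes, and the 75-500 Hz pitch range is an assumption.

```python
import math
import random


def autocorrelation_hnr_db(x, fs, f0_min=75.0, f0_max=500.0):
    """Short-time HNR (dB) from the peak normalized autocorrelation of one frame."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]  # remove DC offset
    best_r = 0.0
    for lag in range(int(fs / f0_max), int(fs / f0_min) + 1):
        if lag >= n:
            break
        num = sum(x[i] * x[i + lag] for i in range(n - lag))
        # Normalize by the energies of the two overlapping segments.
        e1 = sum(v * v for v in x[: n - lag])
        e2 = sum(v * v for v in x[lag:])
        if e1 > 0 and e2 > 0:
            best_r = max(best_r, num / math.sqrt(e1 * e2))
    # Clamp so perfectly periodic or fully aperiodic frames stay finite.
    best_r = min(max(best_r, 1e-12), 1 - 1e-12)
    return 10 * math.log10(best_r / (1 - best_r))


fs = 16000
tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(1600)]  # 0.1 s, 200 Hz
rng = random.Random(0)
noisy = [v + rng.gauss(0, 0.5) for v in tone]  # added Gaussian noise
print(autocorrelation_hnr_db(tone, fs), autocorrelation_hnr_db(noisy, fs))
```

Because the whole frame is summarized by one lag, slow period or amplitude modulation such as tremor lowers r even when little noise is present, which is the bias the time-varying model is designed to avoid.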
Affiliation(s)
- Takeshi Ikuma
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Brad Story
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Andrew J McWhorter
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Lacey Adkins
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Melda Kunduk
- Department of Communication Disorders, Louisiana State University, Baton Rouge, Louisiana 70803, USA
16
Barreda S, Assmann PF. Perception of gender in children's voices. J Acoust Soc Am 2021; 150:3949. [PMID: 34852594 DOI: 10.1121/10.0006785] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/30/2020] [Accepted: 09/30/2021] [Indexed: 06/13/2023]
Abstract
To investigate the perception of gender from children's voices, adult listeners were presented with /hVd/ syllables, in isolation and in sentence context, produced by children between 5 and 18 years. Half the listeners were informed of the age of the talker during trials, while the other half were not. Correct gender identifications increased with talker age; however, performance was above chance even for age groups where the cues most often associated with gender differentiation (i.e., average fundamental frequency and formant frequencies) were not consistently different between boys and girls. The results of acoustic models suggest that cues were used in an age-dependent manner, whether listeners were explicitly told the age of the talker or not. Overall, results are consistent with the hypothesis that talker age and gender are estimated jointly in the process of speech perception. Furthermore, results show that the gender of individual talkers can be identified accurately well before reliable anatomical differences arise in the vocal tracts of females and males. In general, results support the notion that the transmission of gender information from voice depends substantially on gender-dependent patterns of articulation, rather than following deterministically from anatomical differences between male and female talkers.
Affiliation(s)
- Santiago Barreda
- Department of Linguistics, University of California, Davis, California 95616, USA
- Peter F Assmann
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas 75080, USA
17
Gómez-García J, Moro-Velázquez L, Arias-Londoño J, Godino-Llorente J. On the design of automatic voice condition analysis systems. Part III: review of acoustic modelling strategies. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2020.102049] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/26/2022]
18
Using Pitch Height and Pitch Strength to Characterize Type 1, 2, and 3 Voice Signals. J Voice 2021; 35:181-193. [DOI: 10.1016/j.jvoice.2019.08.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/03/2019] [Revised: 08/05/2019] [Accepted: 08/08/2019] [Indexed: 11/19/2022]
19
Kreiman J, Lee Y, Garellek M, Samlan R, Gerratt BR. Validating a psychoacoustic model of voice quality. J Acoust Soc Am 2021; 149:457. [PMID: 33514179 PMCID: PMC7822631 DOI: 10.1121/10.0003331] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/23/2020] [Revised: 12/07/2020] [Accepted: 12/16/2020] [Indexed: 05/19/2023]
Abstract
No agreed-upon method currently exists for objective measurement of perceived voice quality. This paper describes validation of a psychoacoustic model designed to fill this gap. This model includes parameters to characterize the harmonic and inharmonic voice sources, vocal tract transfer function, fundamental frequency, and amplitude of the voice, which together serve to completely quantify the integral sound of a target voice sample. In experiment 1, 200 voices with and without diagnosed vocal pathology were fit with the model using analysis-by-synthesis. The resulting synthetic voice samples were not distinguishable from the original voice tokens, suggesting that the model has all the parameters it needs to fully quantify voice quality. In experiment 2, parameters that model the harmonic voice source were removed one by one, and the voice tokens were re-synthesized with the reduced model. In every case the lower-dimensional models provided worse perceptual matches to the quality of the natural tokens than did the original set, indicating that the psychoacoustic model cannot be reduced in dimensionality without loss of fit to the data. Results confirm that this model can be validly applied to quantify voice quality in clinical and research applications.
Affiliation(s)
- Jody Kreiman
- Departments of Head and Neck Surgery and Linguistics, University of California-Los Angeles, Los Angeles, California 90095-1794, USA
- Yoonjeong Lee
- Departments of Head and Neck Surgery and Linguistics, University of California-Los Angeles, Los Angeles, California 90095-1794, USA
- Marc Garellek
- Department of Linguistics, University of California-San Diego, San Diego, California 92093-0108, USA
- Robin Samlan
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Bruce R Gerratt
- Department of Head and Neck Surgery, University of California-Los Angeles School of Medicine, Los Angeles, California 90095-1794, USA
20
Asiaee M, Vahedian-Azimi A, Atashi SS, Keramatfar A, Nourbakhsh M. Voice Quality Evaluation in Patients With COVID-19: An Acoustic Analysis. J Voice 2020; 36:879.e13-879.e19. [PMID: 33051108 PMCID: PMC7528943 DOI: 10.1016/j.jvoice.2020.09.024] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 08/02/2020] [Revised: 09/26/2020] [Accepted: 09/29/2020] [Indexed: 01/19/2023]
Abstract
Objectives With the COVID-19 outbreak around the globe and its potential effect on infected patients' voices, this study set out to objectively evaluate and compare the acoustic parameters of voice between healthy and infected people. Methods Voice samples of 64 COVID-19 patients and 70 healthy Persian speakers who produced a sustained vowel /a/ were evaluated. Between-group comparisons were performed using two-way ANOVA and Wilcoxon's rank-sum test. Results There were significant differences in CPP, HNR, H1-H2, F0SD, jitter, shimmer, and MPT values between COVID-19 patients and healthy participants. There were also significant differences between male and female participants in all acoustic parameters except jitter, shimmer, and MPT. No interaction was observed between gender and health status in any acoustic parameter. Conclusion The statistical analysis revealed significant differences between the experimental and control groups. The changes in the acoustic parameters of voice are caused by insufficient airflow and by increased aperiodicity, irregularity, signal perturbation, and noise level, which are consequences of pulmonary and laryngeal involvement in patients with COVID-19.
Affiliation(s)
- Maral Asiaee
- Department of Linguistics, Faculty of Literature, Alzahra University, Tehran, Iran
- Amir Vahedian-Azimi
- Trauma Research Center, Nursing Faculty, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Seyed Shahab Atashi
- Department of Food and Drug Control, Jundishapour University of Medical Sciences, Ahvaz, Iran
- Mandana Nourbakhsh
- Department of Linguistics, Faculty of Literature, Alzahra University, Tehran, Iran
21
Barreira RR, Ling LL. Kullback–Leibler divergence and sample skewness for pathological voice quality assessment. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2019.101697] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/25/2022]
22
Excitation modelling using epoch features for statistical parametric speech synthesis. Comput Speech Lang 2020. [DOI: 10.1016/j.csl.2019.101029] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
23
Hejná M, Šturm P, Tylečková L, Bořil T. Normophonic Breathiness in Czech and Danish: Are Females Breathier Than Males? J Voice 2020; 35:498.e1-498.e22. [PMID: 31902679 DOI: 10.1016/j.jvoice.2019.10.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/27/2019] [Revised: 10/30/2019] [Accepted: 10/31/2019] [Indexed: 10/25/2022]
Abstract
The present study compares the voice quality of female and male speech in two languages: Czech, a Slavic language, and Danish, a Germanic language. For both languages, the results, based on a total of 120 vocally healthy speakers, are in line with the claim that females are universally breathier than males. This was supported by the Cepstral Peak Prominence (CPP) and H1*-H2* measures, which are generally known as the most robust correlates of breathiness, and also by the H1*-A3* measure. However, the sex distinction was unsupported or even contradicted by some other measures suggested to reflect breathiness, which argues for employing a number of acoustic measures in future voice research. The perceptual component of the study nevertheless suggests that these contradictory findings are due to differences in perceived roughness rather than breathiness, and that CPP and H1*-H2* (CPP in particular) do reflect breathiness differences. We therefore conclude that female speakers are indeed breathier than male speakers. Finally, in terms of the two robust measures (CPP and H1*-H2*), no language-specific differences were found in the magnitude of the effect of sex on breathiness.
Affiliation(s)
- Míša Hejná
- Department of English, Aarhus University, Aarhus C, Denmark
- Pavel Šturm
- Institute of Phonetics, Charles University, Praha 1, Czech Republic
- Lea Tylečková
- Institute of Phonetics, Charles University, Praha 1, Czech Republic
- Tomáš Bořil
- Institute of Phonetics, Charles University, Praha 1, Czech Republic
24
Braun B, Dehé N, Neitsch J, Wochner D, Zahner K. The Prosody of Rhetorical and Information-Seeking Questions in German. Lang Speech 2019; 62:779-807. [PMID: 30563430 DOI: 10.1177/0023830918816351] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/09/2023]
Abstract
This paper reports on the prosody of rhetorical questions (RQs) and information-seeking questions (ISQs) in German for two question types: polar questions and constituent questions (henceforth "wh-questions"). The results are as follows: Phonologically, polar RQs were mainly realized with H-% (high plateau), while polar ISQs mostly ended in H-^H% (high-rise). Wh-RQs almost exclusively terminated in a low edge tone, whereas wh-ISQs allowed for more tonal variation (L-%, L-H%, H-^H%). Irrespective of question type, RQs were mainly produced with L*+H accents. Phonetically, RQs were more often realized with breathy voice quality than ISQs, in particular at the beginning of the interrogative. Furthermore, they were produced with longer constituent durations than ISQs, in particular at the end of the interrogative. While the difference between RQs and ISQs is reflected in the intonational terminus of the utterance, this does not happen in the way suggested in the semantic literature, and accent type and phonetic parameters also play a role. Crucially, a simple distinction between rising and falling intonation is insufficient to capture the realization of the different illocution types (RQs, ISQs), contrary to frequent claims in the semantic and pragmatic literature. We suggest alternative ways to interpret the findings.
25
Chiaramonte R, Bonfiglio M. Acoustic analysis of voice in bulbar amyotrophic lateral sclerosis: a systematic review and meta-analysis of studies. Logoped Phoniatr Vocol 2019; 45:151-163. [DOI: 10.1080/14015439.2019.1687748] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022]
Affiliation(s)
- Rita Chiaramonte
- Department of Physical Medicine and Rehabilitation, University of Catania, Catania, Italy
- Marco Bonfiglio
- Department for Health Activities, ASP Siracusa, Siracusa, Italy
26
The Teager-Kaiser Energy Cepstral Coefficients as an Effective Structural Health Monitoring Tool. Appl Sci (Basel) 2019. [DOI: 10.3390/app9235064] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 11/17/2022]
Abstract
Recently, features and techniques from speech processing have started to gain increasing attention in the Structural Health Monitoring (SHM) community, in the context of vibration analysis. In particular, cepstral coefficients (CCs) have proved apt at discerning the response of a damaged structure from that of a given undamaged baseline. Previous works relied on Mel-Frequency Cepstral Coefficients (MFCCs). This approach, while efficient and still very common in applications such as speech and speaker recognition, has since been followed by more advanced and competitive techniques for the same aims. The Teager-Kaiser Energy Cepstral Coefficients (TECCs) are one of these alternatives. These features are very closely related to MFCCs but offer useful additional properties, such as improved robustness to noise. The goal of this paper is to introduce the use of TECCs for damage detection, highlighting their competitiveness with closely related features. Promising results were obtained on both numerical and experimental data.
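The building block that distinguishes TECCs from MFCCs is the Teager-Kaiser energy operator. In common formulations the signal passes through a filterbank (often gammatone), the operator is applied in each band, and the band energies are log-compressed and decorrelated with a DCT, much as for MFCCs. The abstract does not give the paper's exact configuration, so the Python sketch below shows only the discrete operator, Ψ[x](n) = x(n)² - x(n-1)·x(n+1). For a pure tone A·cos(Ωn + φ) the operator returns the constant A²·sin²(Ω), so its output tracks both amplitude and frequency, which the usage lines check.

```python
import math


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator: x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input (no edge padding).
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Pure tone A*cos(omega*n + phi): every output sample should equal A^2 * sin(omega)^2.
A, omega, phi = 2.0, 0.3, 0.1
tone = [A * math.cos(omega * n + phi) for n in range(50)]
energy = teager_kaiser(tone)
expected = A ** 2 * math.sin(omega) ** 2
print(min(energy), max(energy), expected)
```

The operator needs only three neighbouring samples per output value, which makes it cheap and highly local in time; that locality is one reason it is used for tracking energy in narrow bands.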
27
Suire A, Raymond M, Barkat-Defradas M. Male Vocal Quality and Its Relation to Females' Preferences. Evol Psychol 2019; 17:1474704919874675. [PMID: 31564128 PMCID: PMC10367192 DOI: 10.1177/1474704919874675] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/09/2019] [Accepted: 08/16/2019] [Indexed: 11/16/2022]
Abstract
In both correlational and experimental settings, studies of women's vocal preferences have reported negative relationships between perceived attractiveness and men's vocal pitch, supporting the idea of an adaptive preference. However, the studies behind this consensus on vocal attractiveness have mostly been conducted with native English speakers, and some evidence suggests that it may be culture-dependent. Moreover, other overlooked acoustic components of vocal quality, such as intonation, perceived breathiness, and roughness, may influence vocal attractiveness. In this context, the present study aims to contribute to the literature by investigating vocal attractiveness in an underrepresented language (French) and by shedding light on its relationship with understudied acoustic components of vocal quality. More specifically, we investigated the relationships between attractiveness ratings by female raters and male voice pitch, its variation, formant dispersion and position, and harmonics-to-noise and jitter ratios. Results show that women were significantly more attracted to lower vocal pitch and higher intonation patterns. However, they did not show directional preferences for any of the other acoustic features. We discuss our results in light of the adaptive functions of vocal preferences in a mate choice context.
Affiliation(s)
- Alexandre Suire
- ISEM, University Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Michel Raymond
- ISEM, University Montpellier, CNRS, EPHE, IRD, Montpellier, France
28
On the design of automatic voice condition analysis systems. Part I: Review of concepts and an insight to the state of the art. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2018.12.024] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/14/2022]
29
On the design of automatic voice condition analysis systems. Part II: Review of speaker recognition techniques and study on the effects of different variability factors. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2018.09.003] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 11/24/2022]
30
Effect of vowel context in cepstral and entropy analysis of pathological voices. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2018.08.021] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
31
Delgado-Hernández J, León-Gómez NM, Izquierdo-Arteaga LM, Llanos-Fumero Y. Cepstral Analysis of Normal and Pathological Voice in Spanish Adults. Smoothed Cepstral Peak Prominence in Sustained Vowels Versus Connected Speech. Acta Otorrinolaringol Esp (Engl Ed) 2018. [DOI: 10.1016/j.otoeng.2017.05.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022]
32
Delgado-Hernández J, León-Gómez NM, Izquierdo-Arteaga LM, Llanos-Fumero Y. Análisis cepstral de la voz normal y patológica en adultos españoles. Medida de la prominencia del pico cepstral suavizado en vocales sostenidas versus habla conectada [Cepstral analysis of normal and pathological voice in Spanish adults: measurement of smoothed cepstral peak prominence in sustained vowels versus connected speech]. Acta Otorrinolaringol Esp 2018; 69:134-140. [DOI: 10.1016/j.otorri.2017.05.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/27/2017] [Accepted: 05/24/2017] [Indexed: 10/18/2022]
33
Mahato NB, Regmi D, Bista M, Sherpa P. Acoustic Analysis of Voice in School Teachers. JNMA J Nepal Med Assoc 2018; 56:658-661. [PMID: 30381759 PMCID: PMC8997266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
INTRODUCTION The term 'voice' refers to the acoustic energy generated by the vocal tract, which is characterized by its dependence on the vocal fold vibratory pattern. Teachers, as professional voice users, are often afflicted with dysphonia, which can discourage them in their jobs and lead them to seek alternative employment. Loud speaking and voice straining may lead to vocal fatigue and vocal fold tissue damage. The objective of this study was to assess the voice quality of school teachers before and after teaching practice. METHODS Sixty teachers from various schools volunteered to participate in this study. Voice data were collected before and after teaching practice and analyzed with the acoustic analysis software Doctor Speech (Tiger Electronics, USA). Analysis was performed in terms of perturbation (jitter and shimmer), fundamental frequency, harmonics-to-noise ratio, and maximum phonation time. RESULTS There were statistically significant differences in all parameters except jitter. Fundamental frequency and shimmer increased significantly after teaching practice (P<0.001 and P=0.002, respectively). Conversely, harmonics-to-noise ratio (P<0.001) and maximum phonation time (P<0.01) decreased significantly after teaching practice. CONCLUSIONS Vocal abuse, overuse, or misuse in teaching practice over a long period of time can result in an inadequate phonatory pattern due to vocal fold tissue damage, which can ultimately result in vocal nodules or polyps. Voice evaluation is therefore particularly important for professional voice users and for people who are concerned about the quality of their voice.
Affiliation(s)
- Nain Bahadur Mahato
- Department of ENT-HNS, Kathmandu Medical College, Sinamangal, Kathmandu, Nepal
- Deepak Regmi
- Department of ENT-HNS, Kathmandu Medical College, Sinamangal, Kathmandu, Nepal
- Meera Bista
- Department of ENT-HNS, Kathmandu Medical College, Sinamangal, Kathmandu, Nepal
- Pema Sherpa
- Department of ENT-HNS, Kathmandu Medical College, Sinamangal, Kathmandu, Nepal
34
Abstract
Objective To investigate the effects of breathy voice sources on ratings of hypernasality using synthesized speech. Methods Speech samples were obtained from children with cleft palate who demonstrated varying degrees of hypernasality and from a child with a voice disorder. Sources with 6 degrees of breathiness were created: a breathy source and five synthesized sources whose harmonics-to-noise ratio (HNR) was lowered by adding impulses. These sources and each original (clear) source were combined with three kinds of filters: mildly, moderately, and severely hypernasal. Consequently, 21 ([6 + 1] × 3) stimuli were obtained for each vowel (/a/ and /i/) for rating. Participants Thirteen speech pathologists with academic training and varying clinical experience with cleft palate speech rated the hypernasality of the stimuli on a 5-point scale. Main Outcome Measures Ratings of hypernasality for breathy and clear stimuli were analyzed using repeated-measures analysis of variance. Results The breathy sources had a significant effect on hypernasality ratings for the following filters: mild hypernasal /a/, severe hypernasal /a/, mild hypernasal /i/, and moderate hypernasal /i/. A post hoc comparison test demonstrated that the more breathy sources (BH0 or BH2) generally increased the hypernasality score for mild hypernasal filters and decreased it for moderate and severe hypernasal filters. The less breathy sources (BH3, BH4, and BH5) hardly affected the ratings. Conclusion Breathiness appears to moderate rather than mask perceived hypernasality: it raises ratings of slight hypernasality and reduces ratings of severe hypernasality.
35
Parkinson’s Disease and Aging: Analysis of Their Effect in Phonation and Articulation of Speech. Cognit Comput 2017. [DOI: 10.1007/s12559-017-9497-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 10/19/2022]
36
On the harmonic-to-noise ratio as an acoustic cue of vocal timbre of Parkinson speakers. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2016.09.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
37
Gerratt BR, Kreiman J, Garellek M. Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech. J Speech Lang Hear Res 2016; 59:994-1001. [PMID: 27626612 PMCID: PMC5345563 DOI: 10.1044/2016_jslhr-s-15-0307] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 09/02/2015] [Revised: 12/10/2015] [Accepted: 03/24/2016] [Indexed: 05/21/2023]
Abstract
PURPOSE The question of which type of utterance, a sustained vowel or continuous speech, is best for voice quality analysis has been extensively studied, but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. METHOD Speakers with voice disorders sustained vowels and read sentences. Vowel samples were excerpted from the steadiest portion of each vowel in the sentences. In addition to the sustained and excerpted vowels, a third set of stimuli was created by shortening sustained vowel productions to match the duration of the vowels excerpted from continuous speech. Acoustic measures were made on the stimuli, and listeners judged the severity of vocal quality deviation. RESULTS Sustained vowels and those extracted from continuous speech contain essentially the same acoustic and perceptual information about vocal quality deviation. CONCLUSIONS Perceived and/or measured differences between continuous speech and sustained vowels derive largely from voice source variability across segmental and prosodic contexts and not from variations in vocal fold vibration in the quasi-steady portion of the vowels. Approaches to voice quality assessment that use continuous speech samples average across utterances and may not adequately quantify the variability they are intended to assess.
38
Garellek M, Samlan R, Gerratt BR, Kreiman J. Modeling the voice source in terms of spectral slopes. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 139:1404-10. [PMID: 27036277 PMCID: PMC4818273 DOI: 10.1121/1.4944474] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/16/2015] [Revised: 12/24/2015] [Accepted: 03/03/2016] [Indexed: 05/20/2023]
Abstract
A psychoacoustic model of the voice source spectrum is proposed. The model is characterized by four spectral slope parameters: the difference in amplitude between the first two harmonics (H1-H2), the second and fourth harmonics (H2-H4), the fourth harmonic and the harmonic nearest 2 kHz in frequency (H4-2 kHz), and the harmonic nearest 2 kHz and that nearest 5 kHz (2 kHz-5 kHz). As a step toward model validation, experiments were conducted to establish the acoustic and perceptual independence of these parameters. In experiment 1, the model was fit to a large number of voice sources. Results showed that parameters are predictable from one another, but that these relationships are due to overall spectral roll-off. Two additional experiments addressed the perceptual independence of the source parameters. Listener sensitivity to H1-H2, H2-H4, and H4-2 kHz did not change as a function of the slope of an adjacent component, suggesting that sensitivity to these components is robust. Listener sensitivity to changes in spectral slope from 2 kHz to 5 kHz depended on complex interactions between spectral slope, spectral noise levels, and H4-2 kHz. It is concluded that the four parameters represent non-redundant acoustic and perceptual aspects of voice quality.
Affiliation(s)
- Marc Garellek
- Department of Linguistics, University of California, San Diego, 9500 Gilman Drive #0108, La Jolla, California 92093-0108, USA
- Robin Samlan
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Bruce R Gerratt
- Department of Head and Neck Surgery, UCLA School of Medicine, Los Angeles, California 90095-1794, USA
- Jody Kreiman
- Department of Head and Neck Surgery, UCLA School of Medicine, Los Angeles, California 90095-1794, USA
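The four parameters in the Garellek et al. abstract above are level differences between particular harmonics. The following is a minimal sketch of how such differences can be read off a magnitude spectrum. It assumes that F0 is already known and constant over the analysis window. The paper models the voice source spectrum, whereas this sketch measures the harmonics of the signal as given. It does not remove formant effects (for example by inverse filtering), so on recorded speech its values would not be source parameters. The function names harmonic_levels_db and spectral_slopes are hypothetical, and the ±10% search band around each harmonic is an arbitrary choice.

```python
import numpy as np


def harmonic_levels_db(x, fs, f0, n_harmonics):
    """Level (dB) of harmonics 1..n_harmonics of a steady, periodic signal.

    Assumes f0 is known and constant over x. Each harmonic's level is the
    largest Hann-windowed spectral magnitude within +/-10% of f0 of k*f0.
    """
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    levels = []
    for k in range(1, n_harmonics + 1):
        band = np.abs(freqs - k * f0) <= 0.1 * f0
        levels.append(20.0 * np.log10(spectrum[band].max()))
    return np.array(levels)


def spectral_slopes(x, fs, f0):
    """Return the H1-H2, H2-H4, H4-2kHz and 2kHz-5kHz level differences (dB)."""
    k_2k = max(1, round(2000.0 / f0))  # harmonic number nearest 2 kHz
    k_5k = max(1, round(5000.0 / f0))  # harmonic number nearest 5 kHz
    h = harmonic_levels_db(x, fs, f0, max(4, k_5k))
    return {
        "H1-H2": h[0] - h[1],
        "H2-H4": h[1] - h[3],
        "H4-2kHz": h[3] - h[k_2k - 1],
        "2kHz-5kHz": h[k_2k - 1] - h[k_5k - 1],
    }


# Usage: a synthetic signal whose k-th harmonic has amplitude 1/k.
fs, f0 = 16000, 100.0
t = np.arange(fs) / fs  # 1 s, so every harmonic falls exactly on an FFT bin
x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 60))
slopes = spectral_slopes(x, fs, f0)
print({name: round(float(value), 2) for name, value in slopes.items()})
# {'H1-H2': 6.02, 'H2-H4': 6.02, 'H4-2kHz': 13.98, '2kHz-5kHz': 7.96}
```

In the synthetic signal, harmonic amplitude falls by about 6 dB per doubling of harmonic number. H1-H2 and H2-H4 therefore both come out at about 6 dB. H4-2kHz and 2kHz-5kHz reflect the larger harmonic-number ratios (4 to 20 and 20 to 50).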
39
Vocal Characteristics of Elderly Women Engaged in Aerobics in Private Institutions of Salvador, Bahia. J Voice 2016; 30:127.e9-19. [DOI: 10.1016/j.jvoice.2015.02.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/28/2014] [Accepted: 02/12/2015] [Indexed: 11/20/2022]
40
Balasubramanium RK, Shastry A, Singh M, Bhat JS. Cepstral Characteristics of Voice in Indian Female Classical Carnatic Singers. J Voice 2015; 29:693-5. [DOI: 10.1016/j.jvoice.2015.01.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 10/20/2013] [Accepted: 01/14/2015] [Indexed: 10/23/2022]
41
Mohseni R, Sandoughdar N. Survey of Voice Acoustic Parameters in Iranian Female Teachers. J Voice 2015; 30:507.e1-5. [PMID: 26275636 PMCID: PMC4943854 DOI: 10.1016/j.jvoice.2015.05.020] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 02/14/2015] [Accepted: 05/29/2015] [Indexed: 11/15/2022]
Abstract
Objectives Teachers are professional voice users, and voice problems are common among them. Female teachers are known to have more voice problems than male teachers, yet there are only a few studies on the voices of Iranian female teachers. The present study investigated the acoustic parameters of voice in Iranian female teachers and compared them with those of nonteachers. Methods In this cross-sectional study, 90 Iranian female elementary teachers aged 30–50 years and 90 Iranian female nonteachers of the same age were assessed between May 2010 and October 2011. Data were collected with the Dr. Speech software (subprogram: Vocal Assessment, Version 4.0, from Tiger Electronics) at the speech therapy clinic, with participants phonating comfortably. Normal voice was judged by perceptual evaluation by a voice therapist and by indirect laryngoscopy performed by an otorhinolaryngologist. Voice characteristics were rated on the GRBAS scale. The speech sample was a sustained /â/ produced with a habitual, constant voice for 10 seconds, and three tokens were obtained from each subject. Each subject was then asked to read a standard passage in Farsi. Finally, differences in F0, jitter, shimmer, harmonic-to-noise ratio (HNR), and maximum phonation time (MPT) between the two groups were analyzed with SPSS 19.0 (IBM Corp., 2010). Results F0 was higher in teachers (210.03 Hz) than in nonteachers (194.11 Hz; P < 0.001). Perturbation measures were also greater in teachers (jitter 0.32%, shimmer 4.63%) than in the control group (jitter 0.22%, shimmer 3.15%; P < 0.001), whereas nonteachers showed higher HNR and MPT values (P < 0.001). HNR was 18.84 ± 1.56 in teachers versus 21.3 ± 1.73 in nonteachers, and MPT was 16.83 ± 3.65 in teachers versus 22.5 ± 5.2 in nonteachers.
Conclusions It can be concluded that vocal overuse, abuse, or misuse during teaching over time results in an inadequate phonatory pattern with excessive musculoskeletal tension, with possible tissue changes in teachers' voices as a consequence. In addition, acoustic analysis of voice parameters may contribute significantly to the objective voice examination of this group.
Affiliation(s)
- R Mohseni
- Department of Speech and Language Pathology, Hazrat-e-Rasoul Hospital, Tehran, Iran
- N Sandoughdar
- Department of Speech and Language Pathology, Taleghani General Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
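The abstract reports jitter, shimmer, and HNR but does not define them, and the exact formulas used by the Dr. Speech software are not given. The sketch below therefore assumes common definitions. Local jitter and local shimmer are the mean absolute difference between consecutive cycles divided by the mean, as in Praat's voice report. HNR is obtained from the normalized autocorrelation peak r at the period lag as 10·log10(r / (1 − r)), the autocorrelation-based relation used in Praat. The sketch starts from cycle periods and peak amplitudes that are assumed to have been extracted already. The function names and example values are hypothetical and are not data from the study.

```python
import numpy as np


def local_jitter_percent(periods):
    """Mean absolute difference between consecutive periods / mean period, in %."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)


def local_shimmer_percent(amplitudes):
    """Mean absolute difference between consecutive cycle peaks / mean peak, in %."""
    a = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(a))) / np.mean(a)


def hnr_db(r_peak):
    """HNR (dB) from the normalized autocorrelation peak at the period lag (0 < r < 1)."""
    return 10.0 * np.log10(r_peak / (1.0 - r_peak))


# Usage with made-up cycle measurements (not data from the study).
periods_s = [0.00500, 0.00502, 0.00500, 0.00502]  # alternating 5.00 / 5.02 ms
peaks = [1.00, 0.95, 1.00, 0.95]
print(round(float(local_jitter_percent(periods_s)), 2))  # 0.4
print(round(float(local_shimmer_percent(peaks)), 2))     # 5.13
print(round(float(hnr_db(0.99)), 2))                     # 19.96
```

Programs differ in cycle detection and perturbation formulas. Values such as the 0.32% jitter reported above are therefore best compared only across studies that used the same software and settings.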
42
Kreiman J, Garellek M, Chen G, Alwan A, Gerratt BR. Perceptual evaluation of voice source models. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2015; 138:1-10. [PMID: 26233000 PMCID: PMC4491021 DOI: 10.1121/1.4922174] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 05/09/2023]
Abstract
Models of the voice source differ in their fits to natural voices, but it is unclear which differences in fit are perceptually salient. This study examined the relationship between the fit of five voice source models to 40 natural voices, and the degree of perceptual match among stimuli synthesized with each of the modeled sources. Listeners completed a visual sort-and-rate task to compare versions of each voice created with the different source models, and the results were analyzed using multidimensional scaling. Neither fits to pulse shapes nor fits to landmark points on the pulses predicted observed differences in quality. Further, the source models fit the opening phase of the glottal pulses better than they fit the closing phase, but at the same time similarity in quality was better predicted by the timing and amplitude of the negative peak of the flow derivative (part of the closing phase) than by the timing and/or amplitude of peak glottal opening. Results indicate that simply knowing how (or how well) a particular source model fits or does not fit a target source pulse in the time domain provides little insight into what aspects of the voice source are important to listeners.
Affiliation(s)
- Jody Kreiman
- Department of Head and Neck Surgery, University of California-Los Angeles School of Medicine, 31-24 Rehabilitation Center, Los Angeles, California 90095-1794, USA
- Marc Garellek
- Department of Linguistics, University of California-San Diego, 9500 Gilman Drive #0108, La Jolla, California 92093-0108, USA
- Gang Chen
- Department of Electrical Engineering, University of California-Los Angeles, 66-147 G Engineering IV, Los Angeles, California 90095-1594, USA
- Abeer Alwan
- Department of Electrical Engineering, University of California-Los Angeles, 66-147 G Engineering IV, Los Angeles, California 90095-1594, USA
- Bruce R Gerratt
- Department of Head and Neck Surgery, University of California-Los Angeles School of Medicine, 31-24 Rehabilitation Center, Los Angeles, California 90095-1794, USA
43
Watts CR, Ronshaugen R, Saenz D. The effect of age and vocal task on cepstral/spectral measures of vocal function in adult males. CLINICAL LINGUISTICS & PHONETICS 2015; 29:415-423. [PMID: 25651197 DOI: 10.3109/02699206.2015.1005673] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 06/04/2023]
Abstract
This study investigated the effect of aging on cepstral/spectral acoustic measures calculated from clinical stimuli (vowels and sentences from the Consensus Auditory-Perceptual Evaluation of Voice). Thirty younger adult males (20-49 years of age) and thirty older males (50-79 years of age) produced sustained vowels and read a connected speech stimulus. These samples were submitted to cepstral/spectral acoustic analyses to derive the multiparametric Cepstral/Spectral Index of Dysphonia (CSID). Older males exhibited significantly greater CSID values than younger males in connected speech (p=0.001; d=0.98), but not for the vowel. Linear regression revealed a moderate correlation between age and CSID in connected speech. These results further inform our understanding of how aging influences voice production in varied contexts. They also show how commonly utilised clinical voice tasks, when subjected to cepstral/spectral acoustic analyses, might differentially inform our knowledge of underlying vocal physiology.
Affiliation(s)
- Christopher R Watts
- Department of Communication Sciences & Disorders, Texas Christian University, Fort Worth, TX, USA
44
Skowronski MD, Shrivastav R, Hunter EJ. Cepstral Peak Sensitivity: A Theoretic Analysis and Comparison of Several Implementations. J Voice 2015; 29:670-81. [PMID: 25944288 DOI: 10.1016/j.jvoice.2014.11.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 08/12/2014] [Accepted: 11/11/2014] [Indexed: 10/23/2022]
Abstract
OBJECTIVE The aim of this study was to develop a theoretic analysis of the cepstral peak (CP), to compare several CP software programs, and to propose methods for reducing variability in CP estimation. STUDY DESIGN Descriptive, experimental study. METHODS The theoretic CP value of a pulse train was derived and compared with estimates computed for pulse train WAV files using available CP software programs: (1) Hillenbrand's CP prominence (CPP) software (Western Michigan University, Kalamazoo, MI), (2) KayPENTAX (Montvale, NJ) Multi-Speech implementation of CPP, and (3) a MATLAB (The Mathworks, Natick, MA, version R2014a) implementation using cepstral interpolation. The CP variation was also investigated for synthetic breathy vowels. RESULTS For pulse trains with period T samples, the theoretic CP is 1/2 + ε/T, with |ε| < 0.1 for all pulse trains (ε = 0 for integer T). For fundamental frequencies between 70 and 230 Hz, the CP mean ± standard deviation was 0.496 ± 0.002 using cepstral interpolation and 0.29 ± 0.03 using Hillenbrand's software, whereas CPP was 35.0 ± 3.8 dB using Hillenbrand's software and 20.5 ± 2.7 dB using KayPENTAX's software. The CP and CPP versus signal-to-noise ratio for synthetic breathy vowels were fit to a logistic model for the Hillenbrand (R² = 0.92) and KayPENTAX (R² = 0.82) estimators, as well as for an ideal estimator (R² = 0.98) that used a period-synchronous analysis. CONCLUSIONS Several variables unrelated to the signal itself impact CP values, and some factors introduce large variability in CP values that would otherwise be attributed to the signal (eg, voice quality). Variability may be reduced by using a period-synchronous analysis with Hann windows.
Affiliation(s)
- Mark D Skowronski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Rahul Shrivastav
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Eric J Hunter
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
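The Skowronski et al. abstract shows that CP and CPP values depend on implementation choices. To make those choices concrete, the following is a minimal sketch of one generic CPP computation for a single frame. It applies a Hann window, takes the log-magnitude spectrum in dB, and computes the real cepstrum expressed in dB. It then searches for a peak over quefrencies corresponding to 60–330 Hz and measures prominence above a linear regression line fitted over the same range. All of these settings, and the function name cepstral_peak_prominence, are assumptions. This is not Hillenbrand's, KayPENTAX's, or the authors' MATLAB implementation. It uses neither cepstral interpolation nor period-synchronous analysis, so its absolute values should not be compared with the figures reported above.

```python
import numpy as np


def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=330.0):
    """Return (cpp_db, peak_quefrency_s) for one analysis frame.

    Hypothetical generic CPP sketch: Hann window, dB log spectrum, real
    cepstrum in dB, peak searched between 1/f0_max and 1/f0_min seconds,
    prominence measured above a regression line fitted over that same range.
    The frame must be longer than 2 * fs / f0_min samples.
    """
    windowed = frame * np.hanning(len(frame))
    # Log-magnitude spectrum in dB; the small offset avoids log10(0).
    log_spec_db = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    # Real cepstrum of the dB spectrum (real up to rounding), expressed in dB.
    cepstrum_db = 20.0 * np.log10(np.abs(np.fft.ifft(log_spec_db)) + 1e-12)
    quefrency = np.arange(len(frame)) / fs
    lo = int(np.ceil(fs / f0_max))
    hi = int(np.floor(fs / f0_min))
    q, c = quefrency[lo:hi + 1], cepstrum_db[lo:hi + 1]
    peak = int(np.argmax(c))
    # Regression line over the search range only (implementations differ here).
    slope, intercept = np.polyfit(q, c, 1)
    cpp_db = c[peak] - (slope * q[peak] + intercept)
    return float(cpp_db), float(q[peak])


# Usage: one 40 ms frame of a synthetic 200 Hz voice (harmonic k has amplitude 1/k)
# with a little added noise, compared with the noise alone.
fs, f0 = 16000, 200.0
t = np.arange(640) / fs
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(t.size)
voiced = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 40)) + noise
cpp_voiced, q_voiced = cepstral_peak_prominence(voiced, fs)
cpp_noise, _ = cepstral_peak_prominence(noise, fs)
print(f"voiced: CPP {cpp_voiced:.1f} dB, peak at {1000 * q_voiced:.2f} ms")
print(f"noise:  CPP {cpp_noise:.1f} dB")
```

Changing the window, the frame length, the search range, or the span of the regression line shifts the resulting CPP. This is the kind of implementation-dependent variability the study quantifies.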
45
Kacha A, Grenez F, Schoentgen J. Multiband vocal dysperiodicities analysis using empirical mode decomposition in the log-spectral domain. Biomed Signal Process Control 2015. [DOI: 10.1016/j.bspc.2014.08.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
46
Abramson AS, Tiede MK, Luangthongkum T. Voice Register in Mon: Acoustics and Electroglottography. PHONETICA 2015; 72:237-56. [PMID: 26636544 PMCID: PMC4751869 DOI: 10.1159/000441728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/11/2014] [Accepted: 10/14/2015] [Indexed: 05/20/2023]
Abstract
Mon is spoken in villages in Thailand and Myanmar. The dialect of Ban Nakhonchum, Thailand, has 2 voice registers, modal and breathy; these phonation types, along with other phonetic properties, distinguish minimal pairs. Four native speakers of this dialect recorded repetitions of 14 randomized words (7 minimal pairs) for acoustic analysis. We used a subset of these pairs in a listening test to verify the perceptual robustness of the register distinction. Acoustic analysis found significant differences in noise component, spectral slope and fundamental frequency. In a subsequent session 4 speakers were also recorded using electroglottography, which showed systematic differences in the contact quotient. The salience of these properties in maintaining the register distinction is discussed in the context of possible tonogenesis for this language.
Affiliation(s)
- Arthur S. Abramson
- Haskins Laboratories, New Haven, Conn., U.S.A
- Department of Linguistics, University of Connecticut, Storrs, Conn., U.S.A
47
Jannetts S, Lowit A. Cepstral Analysis of Hypokinetic and Ataxic Voices: Correlations With Perceptual and Other Acoustic Measures. J Voice 2014; 28:673-80. [DOI: 10.1016/j.jvoice.2014.01.013] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 12/12/2013] [Accepted: 01/23/2014] [Indexed: 10/25/2022]
48
Solé-Casals J, Munteanu C, Martín OC, Barbé F, Queipo C, Amilibia J, Durán-Cantolla J. Detection of severe obstructive sleep apnea through voice analysis. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2014.06.017] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 10/25/2022]
49
Rruqja N, Dejonckere P, Cantarella G, Schoentgen J, Orlandi S, Barbagallo S, Manfredi C. Testing software tools with synthesized deviant voices for medicolegal assessment of occupational dysphonia. Biomed Signal Process Control 2014. [DOI: 10.1016/j.bspc.2014.03.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022]
50
Akbari A, Arjmandi MK. An efficient voice pathology classification scheme based on applying multi-layer linear discriminant analysis to wavelet packet-based features. Biomed Signal Process Control 2014. [DOI: 10.1016/j.bspc.2013.11.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/17/2022]