1
Cox SR, Huang T, Chen WR, Ng ML. An acoustic study of Cantonese alaryngeal speech in different speaking conditions. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023; 153:2973. [PMID: 37212513 PMCID: PMC10205142 DOI: 10.1121/10.0019471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 04/30/2023] [Accepted: 05/02/2023] [Indexed: 05/23/2023]
Abstract
Esophageal (ES) speech, tracheoesophageal (TE) speech, and the electrolarynx (EL) are common methods of communication following the removal of the larynx. Our recent study demonstrated that intelligibility may increase for Cantonese alaryngeal speakers using clear speech (CS) compared to their everyday "habitual speech" (HS), but the reason is still unclear [Hui, Cox, Huang, Chen, and Ng (2022). Folia Phoniatr. Logop. 74, 103-111]. The purpose of this study was to assess the acoustic characteristics of vowels and tones produced by Cantonese alaryngeal speakers using HS and CS. Thirty-one alaryngeal speakers (9 EL, 10 ES, and 12 TE speakers) read The North Wind and the Sun passage in HS and CS. Vowel formants, vowel space area (VSA), speaking rate, pitch, and intensity were examined, and their relationship to intelligibility was evaluated. Statistical models suggest that larger VSAs significantly improved intelligibility, but slower speaking rate did not. Vowel and tonal contrasts did not differ between HS and CS for all three groups, but the amount of information encoded in fundamental frequency and intensity differences between high and low tones positively correlated with intelligibility for TE and ES groups, respectively. Continued research is needed to understand the effects of different speaking conditions toward improving acoustic and perceptual characteristics of Cantonese alaryngeal speech.
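The abstract does not say how VSA was computed. One common approach, shown here only as a minimal sketch, treats the mean (F1, F2) values of the corner vowels as polygon vertices and applies the shoelace formula. The function name `vowel_space_area` and the formant values are hypothetical illustrations, not data or code from the study.

```python
def vowel_space_area(corners):
    """Polygon area (Hz^2) from (F1, F2) vertices listed in order around the perimeter."""
    n = len(corners)
    twice_area = 0.0
    for i in range(n):
        f1_a, f2_a = corners[i]
        f1_b, f2_b = corners[(i + 1) % n]  # wrap around to close the polygon
        twice_area += f1_a * f2_b - f1_b * f2_a  # shoelace cross term
    return abs(twice_area) / 2.0


# Hypothetical mean formants (Hz) for three corner vowels; NOT values from the study.
habitual = [(300, 2200), (700, 1300), (350, 900)]
clear = [(280, 2400), (800, 1300), (330, 800)]

print("HS VSA:", vowel_space_area(habitual))
print("CS VSA:", vowel_space_area(clear))
```

With these made-up values the clear-speech polygon is larger, which is the kind of difference the statistical models would relate to intelligibility.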
Affiliation(s)
- Steven R Cox
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, New York 11530, USA
- Ting Huang
- Haskins Laboratories, New Haven, Connecticut 06511, USA
- Wei-Rong Chen
- Haskins Laboratories, New Haven, Connecticut 06511, USA
- Manwa L Ng
- Speech Science Laboratory, Faculty of Education, University of Hong Kong, Hong Kong SAR, China
2
Cox SR, McNicholl K, Shadle CH, Chen WR. Variability of Electrolaryngeal Speech Intelligibility in Multitalker Babble. AMERICAN JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2020; 29:2012-2022. [PMID: 32870708 PMCID: PMC8740568 DOI: 10.1044/2020_ajslp-20-00092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2020] [Revised: 06/08/2020] [Accepted: 06/29/2020] [Indexed: 06/11/2023]
Abstract
Purpose: The purpose of this study was to report the variability of electrolarynx (EL) users' speech intelligibility in quiet and in multitalker babble. Method: Ten EL users (five Servox® Digital, five TruTone™) who were at least 2 years postlaryngectomy provided recordings of five sentences from the 1965 Revised List of Phonetically Balanced Sentences. Recordings were judged by two groups of naïve listeners in quiet and in the presence of multitalker babble. Fifteen listeners orthographically transcribed a total of 750 sentences containing 3,750 key words in quiet, and another 15 listeners orthographically transcribed the same sentences mixed with multitalker babble. Results: Significant differences in speech intelligibility were observed between listening conditions; 17.9% more key words were correctly identified in quiet compared to multitalker babble. Significant differences in fundamental frequency (F0) standard deviation and range but not speech intelligibility were observed between EL device types. A positive correlation of moderate significance was observed between F0 standard deviation and intelligibility for TruTone users in multitalker babble. Conclusions: Findings suggest that listeners are able to identify a significantly higher percentage of EL users' speech in quiet compared to multitalker babble, but a large variability in EL users' speech intelligibility exists. Continued investigation involving a larger number of EL users is necessary to confirm this study's findings. Future research should explore the relationships among F0 measures, speaker characteristics (e.g., rate of speech, articulatory precision), and speech intelligibility, in addition to improving alaryngeal rehabilitation training protocols for EL users.
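The counts in the Method can be checked with simple arithmetic: 10 speakers × 5 sentences × 15 listeners gives 750 transcribed sentences, and 3,750 key words over 750 sentences implies 5 key words per sentence. The sketch below reproduces that arithmetic and shows one plausible way to score a transcription. The exact-match rule, the function name `keyword_score`, and the example sentence are assumptions for illustration; the abstract does not describe the study's scoring rules or list its stimuli.

```python
SPEAKERS, SENTENCES_PER_SPEAKER, LISTENERS = 10, 5, 15

transcribed = SPEAKERS * SENTENCES_PER_SPEAKER * LISTENERS  # 750 sentences per condition
key_words_per_sentence = 3750 // transcribed                # 5, implied by the abstract


def keyword_score(key_words, transcription):
    """Percent of key words found in a listener's transcription (assumed exact-match rule)."""
    heard = set(transcription.lower().split())
    hits = sum(1 for word in key_words if word.lower() in heard)
    return 100.0 * hits / len(key_words)


# Illustrative sentence and key words; not confirmed to be a stimulus used in the study.
keys = ["birch", "canoe", "slid", "smooth", "planks"]
print(keyword_score(keys, "the birch canoe slid on the smooth planks"))
print(keyword_score(keys, "the birch canoe slid on the smooth banks"))
```

A real scoring protocol would also need rules for misspellings, homophones, and morphological variants, none of which are handled here.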
Affiliation(s)
- Steven R. Cox
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY
- Kimberly McNicholl
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY
3
Al-Zanoon N, Parsa V, Doyle PC. Using visual feedback to enhance intonation control with a variable pitch electrolarynx. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:1802. [PMID: 32237840 DOI: 10.1121/10.0000936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2019] [Accepted: 03/03/2020] [Indexed: 06/11/2023]
Abstract
This study evaluated the effectiveness of using visual feedback to facilitate pitch control by a speaker using a pressure sensitive onset controlled electrolarynx (EL). This proof-of-concept study was conducted with one healthy adult. The participant-speaker was provided with computer generated visual feedback over five sessions within a consecutive period of three weeks. Changes in force control accuracy were gathered and analyzed. An improvement in finger (thumb) force control accuracy from the first to the last training session was documented. The results of this study provide data toward the development of a clinical training protocol for the use of a pressure sensitive onset controlled EL by laryngectomized speakers. Further, these results highlight the importance of developing a relevant multimodality training protocol for the improvement of postlaryngectomy EL speech production.
Affiliation(s)
- Noor Al-Zanoon
- Department of Communication Sciences and Disorders, University of Alberta, 116 Street and 85 Avenue, Edmonton, Alberta T6G 2R3, Canada
- Vijay Parsa
- School of Communication Sciences and Disorders, Elborn College, Western University, London, Ontario N6A 3K7, Canada
- Philip C Doyle
- School of Communication Sciences and Disorders, Elborn College, Western University, London, Ontario N6A 3K7, Canada
4
Qian Z, Wang L, Zhang S, Liu C, Niu H. Mandarin Electrolaryngeal Speech Recognition Based on WaveNet-CTC. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2019; 62:2203-2212. [PMID: 31200617 DOI: 10.1044/2019_jslhr-s-18-0313] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Purpose: The application of Chinese Mandarin electrolaryngeal (EL) speech for laryngectomees has been limited by drawbacks such as a single fundamental frequency, mechanical sound, and large radiation noise. To improve the intelligibility of Chinese Mandarin EL speech, a new perspective using an automatic speech recognition (ASR) system was proposed, which can convert EL speech into healthy speech if combined with text-to-speech. Method: An ASR system was designed to recognize EL speech based on the deep learning model WaveNet and connectionist temporal classification (WaveNet-CTC). This system mainly consists of 3 parts: the acoustic model, the language model, and the decoding model. The acoustic features are extracted during speech preprocessing, and 3,230 utterances of EL speech mixed with 10,000 utterances of healthy speech are used to train the ASR system. A comparative experiment was designed to evaluate the performance of the proposed method. Results: The results show that the proposed ASR system has higher stability and generalizability compared with the traditional methods, manifesting superiority in terms of Chinese characters, Chinese words, short sentences, and long sentences. Phoneme confusion occurs more easily in the stops and affricates of EL speech than in healthy speech. However, the highest accuracy of the ASR could reach 83.24% when 3,230 utterances of EL speech were used to train the ASR system. Conclusions: This study indicates that EL speech could be recognized effectively by the ASR based on WaveNet-CTC. This proposed method has a higher generalization performance and better stability than the traditional methods. A higher accuracy of the ASR system based on WaveNet-CTC can be obtained, which means that EL speech can be converted into healthy speech. Supplemental Material: https://doi.org/10.23641/asha.8250830.
Affiliation(s)
- Zhaopeng Qian
- School of Biological Science & Medical Engineering, Beihang University, Beijing, China
- Li Wang
- School of Biological Science & Medical Engineering, Beihang University, Beijing, China
- Beijing Research Center of Urban System Engineering, China
- Shaochuan Zhang
- School of Biological Science & Medical Engineering, Beihang University, Beijing, China
- Chan Liu
- School of Biological Science & Medical Engineering, Beihang University, Beijing, China
- Haijun Niu
- School of Biological Science & Medical Engineering, Beihang University, Beijing, China
5
Li W, Zhaopeng Q, Yijun F, Haijun N. Design and Preliminary Evaluation of Electrolarynx With F0 Control Based on Capacitive Touch Technology. IEEE Trans Neural Syst Rehabil Eng 2018. [PMID: 29522407 DOI: 10.1109/tnsre.2018.2805338] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
Abstract
An electrolarynx (EL) is one of the most popular voice rehabilitation technologies used after laryngectomy. However, most ELs generate monotonic EL speech, which has been shown to create a particular deficit in speech intelligibility, especially for Chinese Mandarin (Mandarin). Mandarin is a tonal language that makes lexical distinctions using variations in tone. Our purpose is to design an EL that can produce the four Mandarin tones, and to evaluate its performance. We designed a fundamental frequency (F0) control method for Mandarin EL speech and manufactured a touch-controlled electrolarynx (T-EL) prototype. Using monosyllables, disyllabic words, and frequently used phrases, we evaluated speech produced with a T-EL, as well as with monotone (M-EL) and variable-frequency modes (P-EL) of a commercially available TruTone EL. A male native Mandarin speaker with laryngectomy volunteered to be the speaker. Results show that the normal speech pitch contours of the four Mandarin tones were most closely matched by the characteristics produced with T-EL. The statistical accuracy of the T-EL's tone and word perception was significantly higher than that of the other EL types. Moreover, the confusion matrix indicates that the listeners could correctly identify the tones of monosyllables and disyllabic words in T-EL speech. Accurate tone judgment can improve the intelligibility of EL speech in Mandarin. The mean opinion score was used to evaluate the listeners' acceptability of EL speech. The scores of the T-EL and M-EL were very close, and the score of the P-EL was significantly lower than that of the other two ELs. However, the results from a single speaker cannot provide sufficient data to conclude which EL has a higher acceptability. The evaluation of multiple EL speakers with different EL types at different levels of proficiency should be studied in future research.
6
Wang L, Feng Y, Yang Z, Niu H. Development and evaluation of wheel-controlled pitch-adjustable electrolarynx. Med Biol Eng Comput 2016; 55:1463-1472. [PMID: 28013472 DOI: 10.1007/s11517-016-1606-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/25/2016] [Accepted: 12/03/2016] [Indexed: 10/20/2022]
Abstract
Tone is important in tonal languages, especially in Mandarin. However, there is presently no commercially available electrolarynx (EL) for laryngectomized Mandarin speakers. Moreover, few studies have focused on this area. Our purpose is to design an EL that produces the four Mandarin tones and to evaluate its performance. We designed a wheel-controlled pitch-adjustable EL and manufactured a prototype (Wheel-EL). Using monosyllables, disyllabic segments, and frequently used phrases, we evaluated speech produced by Wheel-EL and by monotone (M-TruTone) and variable-frequency modes (V-TruTone) of the commercially available TruTone EL. The pitch contours of the high-level (HL), middle-rising (MR), and falling-rising (FR) tones produced by Wheel-EL most closely matched the natural speech characteristics of a native speaker. However, redundant sounds were generated in the high-falling (HF) tone. The statistical accuracy of Wheel-EL's tone and word perception was significantly higher than that of other EL types. However, no significant differences existed in acceptability among the three EL speech types. Wheel-EL produces better HL, MR, and FR tones in Mandarin than either M-TruTone or V-TruTone. Nevertheless, redundant sounds affect HF phonation. Accurate tone judgment can improve the intelligibility of EL speech in Mandarin but has no obvious effect on acceptability.
Affiliation(s)
- Li Wang
- School of Biological Science and Medical Engineering, Beihang University, No. 37, XueYuan Road, Haidian District, Beijing, 100191, China
- Yijun Feng
- School of Biological Science and Medical Engineering, Beihang University, No. 37, XueYuan Road, Haidian District, Beijing, 100191, China
- Ze Yang
- School of Biological Science and Medical Engineering, Beihang University, No. 37, XueYuan Road, Haidian District, Beijing, 100191, China
- Haijun Niu
- School of Biological Science and Medical Engineering, Beihang University, No. 37, XueYuan Road, Haidian District, Beijing, 100191, China.
7
Kyong JS, Scott SK, Rosen S, Howe TB, Agnew ZK, McGettigan C. Exploring the roles of spectral detail and intonation contour in speech intelligibility: an FMRI study. J Cogn Neurosci 2014; 26:1748-63. [PMID: 24568205 DOI: 10.1162/jocn_a_00583] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 11/04/2022]
Abstract
The melodic contour of speech forms an important perceptual aspect of tonal and nontonal languages and an important limiting factor on the intelligibility of speech heard through a cochlear implant. Previous work exploring the neural correlates of speech comprehension identified a left-dominant pathway in the temporal lobes supporting the extraction of an intelligible linguistic message, whereas the right anterior temporal lobe showed an overall preference for signals clearly conveying dynamic pitch information [Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain, 123, 155-163, 2000; Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400-2406, 2000]. The current study combined modulations of overall intelligibility (through vocoding and spectral inversion) with a manipulation of pitch contour (normal vs. falling) to investigate the processing of spoken sentences in functional MRI. Our overall findings replicate and extend those of Scott et al. [Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400-2406, 2000], where greater sentence intelligibility was predominately associated with increased activity in the left STS, and the greatest response to normal sentence melody was found in right superior temporal gyrus. These data suggest a spatial distinction between brain areas associated with intelligibility and those involved in the processing of dynamic pitch information in speech. By including a set of complexity-matched unintelligible conditions created by spectral inversion, this is additionally the first study reporting a fully factorial exploration of spectrotemporal complexity and spectral inversion as they relate to the neural processing of speech intelligibility. 
Perhaps surprisingly, there was little evidence for an interaction between the two factors; we discuss the implications for the processing of sound and speech in the dorsolateral temporal lobes.
8
Nagle KF, Eadie TL, Wright DR, Sumida YA. Effect of fundamental frequency on judgments of electrolaryngeal speech. AMERICAN JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2012; 21:154-166. [PMID: 22355005 DOI: 10.1044/1058-0360(2012/11-0050)] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/31/2023]
Abstract
PURPOSE: To determine (a) the effect of fundamental frequency (f₀) on speech intelligibility, acceptability, and perceived gender in electrolaryngeal (EL) speakers, and (b) the effect of known gender on speech acceptability in EL speakers. METHOD: A 2-part study was conducted. In Part 1, 34 healthy adults provided speech recordings using electrolarynges set at 75 Hz, 130 Hz, and 175 Hz, and 36 listeners transcribed the recordings. In Part 2, 22 speech samples were presented to 16 listeners. First, listeners identified the gender of each speaker and judged his or her speech acceptability using rating scales. Second, listeners judged the same samples for speech acceptability when gender information was provided. RESULTS: In Part 1, speakers were significantly more intelligible when using 75-Hz devices. In Part 2, the f₀ of the speech signal significantly impacted listeners' accuracy in perceiving the speaker's gender: In gender-incongruent conditions (males using 175-Hz devices, females using 75-Hz devices), listeners were unable to identify female speakers. Speech acceptability judgments were directly related to intelligibility. Finally, listeners differentially penalized female speakers who used 75-Hz devices when gender information was known. CONCLUSION: Low f₀ facilitated speech intelligibility. However, at low f₀, listeners were unable to identify females as female, and females were differentially penalized for speech acceptability. Results may have implications for rehabilitation.
9
Heaton JT, Robertson M, Griffin C. Development of a wireless electromyographically controlled electrolarynx voice prosthesis. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2012; 2011:5352-5. [PMID: 22255547 DOI: 10.1109/iembs.2011.6091324] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
Abstract
The most common artificial voice source for post-laryngectomy speech rehabilitation is the handheld buzzer or electrolarynx (EL). EL speech is often described as mechanical-sounding (robotic), and typically lacks pitch variation, making it monotone and unnatural. Prior studies have shown improved perceptual ratings of speech naturalness when pitch variation is added to EL speech, and a proof-of-concept EL prosthesis has been developed to provide pitch variation and voice on/off control in relation to neck muscle electromyographic (EMG) signals. The goal of the present study was to design a new wireless version of the EMG-controlled EL (EMG-EL) that could provide a flexible mixture of manual (push button) and automatic (EMG-based) control options for voice onset/offset and pitch, and that could be manufactured at a reasonable cost for widespread patient use. This paper describes both technical and human factors considered while designing the new EMG-EL voice prosthesis.
Affiliation(s)
- James T Heaton
- Massachusetts General Hospital Department of Surgery and Harvard Medical School, One Bowdoin Square Floor 11, Boston, MA 02114, USA.
10
Miller SE, Schlauch RS, Watson PJ. The effects of fundamental frequency contour manipulations on speech intelligibility in background noise. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 128:435-43. [PMID: 20649237 DOI: 10.1121/1.3397384] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 05/20/2023]
Abstract
Previous studies have documented that speech with flattened or inverted fundamental frequency (F0) contours is less intelligible than speech with natural variations in F0. The purpose of this present study was to further investigate how F0 manipulations affect speech intelligibility in background noise. Speech recognition in noise was measured for sentences having the following F0 contours: unmodified, flattened at the median, natural but exaggerated, inverted, and sinusoidally frequency modulated at rates of 2.5 and 5.0 Hz, rates shown to make vowels more perceptually salient in background noise. Five talkers produced 180 stimulus sentences, with 30 unique sentences per F0 contour condition. Flattening or exaggerating the F0 contour reduced key word recognition performance by 13% relative to the naturally produced speech. Inverting or sinusoidally frequency modulating the F0 contour reduced performance by 23% relative to typically produced speech. These results support the notion that linguistically incorrect or misleading cues have a greater deleterious effect on speech understanding than linguistically neutral cues.
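The contour conditions named in the abstract can be illustrated on a list of per-frame F0 values. This is only a rough sketch of the general idea: the abstract does not give the exaggeration factor, the modulation depth, the frame rate, or whether the manipulations were applied on a linear Hz or logarithmic scale, so every parameter and function name below is an assumption, and all operations are taken about the median in Hz.

```python
import math
from statistics import median


def flatten(f0):
    """Replace every frame with the median F0 (a monotone contour)."""
    m = median(f0)
    return [m] * len(f0)


def exaggerate(f0, factor=1.75):
    """Scale excursions from the median; `factor` is an assumed value."""
    m = median(f0)
    return [m + factor * (f - m) for f in f0]


def invert(f0):
    """Mirror the contour about the median, so rises become falls."""
    m = median(f0)
    return [2 * m - f for f in f0]


def sinusoidal(f0, rate_hz, frame_s=0.01, depth_hz=20.0):
    """Replace the contour with a sinusoid around the median (assumed depth and frame step)."""
    m = median(f0)
    return [m + depth_hz * math.sin(2 * math.pi * rate_hz * i * frame_s)
            for i in range(len(f0))]


contour = [100, 120, 140]  # made-up rising contour in Hz
print(flatten(contour), invert(contour), exaggerate(contour, 2.0))
print(sinusoidal(contour, rate_hz=2.5))
```

Resynthesizing speech with a modified contour requires a pitch-modification tool and is outside the scope of this sketch.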
Affiliation(s)
- Sharon E Miller
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA.