1
Does the wearing of masks change voice and speech parameters? Eur Arch Otorhinolaryngol 2021; 279:1701-1708. [PMID: 34550454 PMCID: PMC8456395 DOI: 10.1007/s00405-021-07086-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/27/2021] [Accepted: 09/10/2021] [Indexed: 11/01/2022]
Abstract
PURPOSE The authors aim to review available reports on the potential effects of masks on voice and speech parameters. METHODS A literature search was conducted using the MEDLINE and Google Scholar databases through July 2021. Several target populations, mask scenarios and methodologies were considered. The assessed voice parameters were divided into self-reported, acoustic and aerodynamic. RESULTS Wearing a face mask was shown to induce several changes in voice parameters: (1) self-reported: significantly increased vocal effort and fatigue, increased vocal tract discomfort and increased values of the voice handicap index (VHI); (2) acoustic: increased voice intensity, altered formant frequencies (F2 and F3) with no change in fundamental frequency, increased harmonics-to-noise ratio (HNR) and increased mean spectral values in high-frequency levels (1000-8000 Hz), especially with the KN95 mask; (3) aerodynamic: maximum phonatory time was assessed in only two reports and showed no alterations. CONCLUSION Despite the different populations, mask-type scenarios and methodologies described by each study, the results of this review outline significant changes in voice characteristics with the use of face masks. Wearing a mask appears to increase the perception of vocal effort and to alter vocal tract length and speech articulatory movements, leading to spectral sound changes and impaired communication and perception. Studies analyzing the effect of masks on voice aerodynamics are lacking. Further research is required to study the long-term effects of face masks on the potential development of voice pathology.
2
Nguyen DD, McCabe P, Thomas D, Purcell A, Doble M, Novakovic D, Chacon A, Madill C. Acoustic voice characteristics with and without wearing a facemask. Sci Rep 2021; 11:5651. [PMID: 33707509 PMCID: PMC7970997 DOI: 10.1038/s41598-021-85130-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 09/17/2020] [Accepted: 02/19/2021] [Indexed: 01/31/2023]
Abstract
Facemasks are essential for healthcare workers but characteristics of the voice whilst wearing this personal protective equipment are not well understood. In the present study, we compared acoustic voice measures in recordings of sixteen adults producing standardised vocal tasks with and without wearing either a surgical mask or a KN95 mask. Data were analysed for mean spectral levels at 0-1 kHz and 1-8 kHz regions, an energy ratio between 0-1 and 1-8 kHz (LH1000), harmonics-to-noise ratio (HNR), smoothed cepstral peak prominence (CPPS), and vocal intensity. In connected speech there was significant attenuation of mean spectral level at the 1-8 kHz region and there was no significant change in this measure at 0-1 kHz. Mean spectral levels of the vowel did not change significantly in mask-wearing conditions. LH1000 for connected speech significantly increased whilst wearing either a surgical mask or KN95 mask but no significant change in this measure was found for the vowel. HNR was higher in the mask-wearing conditions than the no-mask condition. CPPS and vocal intensity did not change in mask-wearing conditions. These findings imply an attenuation effect of wearing these types of masks on the voice spectra, with the surgical mask showing less impact than the KN95.
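The LH1000 measure named in this abstract is a low-to-high band energy ratio. A minimal sketch of one way such a ratio could be computed is given here; the exact procedure used in the study is not stated in the abstract, so the naive DFT, the band-edge handling, the function names and the synthetic two-tone signal are all illustrative assumptions.

```python
import cmath
import math


def power_spectrum(samples, sample_rate):
    """Return (frequencies_hz, power) for the non-negative half of a naive DFT."""
    n = len(samples)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        freqs.append(k * sample_rate / n)
        power.append(abs(acc) ** 2)
    return freqs, power


def band_energy(freqs, power, lo_hz, hi_hz):
    """Sum the power of all bins with lo_hz <= f < hi_hz."""
    return sum(p for f, p in zip(freqs, power) if lo_hz <= f < hi_hz)


def lh1000_db(samples, sample_rate):
    """Energy in 0-1 kHz relative to energy in 1-8 kHz, in dB (assumed definition)."""
    freqs, power = power_spectrum(samples, sample_rate)
    low = band_energy(freqs, power, 0.0, 1000.0)
    # The small offset keeps the 8 kHz bin inside the upper band.
    high = band_energy(freqs, power, 1000.0, 8000.0 + 1e-9)
    return 10.0 * math.log10(low / high)


def two_tone(high_amp, sample_rate=16000, n=320):
    """Hypothetical test signal: a 500 Hz tone plus a weaker 3 kHz tone."""
    return [math.sin(2 * math.pi * 500 * i / sample_rate)
            + high_amp * math.sin(2 * math.pi * 3000 * i / sample_rate)
            for i in range(n)]


# Halving the 3 kHz amplitude stands in for high-frequency attenuation.
unmasked_db = lh1000_db(two_tone(high_amp=0.1), 16000)
masked_db = lh1000_db(two_tone(high_amp=0.05), 16000)
print(round(unmasked_db, 2), round(masked_db, 2))
```

In this toy signal the high-frequency tone has one tenth of the low tone's amplitude, so the ratio is 20 dB by construction; attenuating only the upper band raises the ratio, which is the direction of change the abstract reports for connected speech.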
Affiliation(s)
- Duy Duong Nguyen
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006 Australia
- Patricia McCabe
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006 Australia
- Donna Thomas
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006 Australia
- Alison Purcell
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006 Australia
- Maree Doble
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006 Australia
- Daniel Novakovic
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006 Australia
- Antonia Chacon
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006 Australia
- Catherine Madill
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006 Australia
3
Talkington WJ, Donai J, Kadner AS, Layne ML, Forino A, Wen S, Gao S, Gray MM, Ashraf AJ, Valencia GN, Smith BD, Khoo SK, Gray SJ, Lass N, Brefczynski-Lewis JA, Engdahl S, Graham D, Frum CA, Lewis JW. Electrophysiological Evidence of Early Cortical Sensitivity to Human Conspecific Mimic Voice as a Distinct Category of Natural Sound. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2020; 63:3539-3559. [PMID: 32936717 PMCID: PMC8060013 DOI: 10.1044/2020_jslhr-20-00063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2020] [Revised: 04/29/2020] [Accepted: 07/01/2020] [Indexed: 06/11/2023]
Abstract
Purpose From an anthropological perspective of hominin communication, the human auditory system likely evolved to enable special sensitivity to sounds produced by the vocal tracts of human conspecifics whether attended or passively heard. While numerous electrophysiological studies have used stereotypical human-produced verbal (speech voice and singing voice) and nonverbal vocalizations to identify human voice-sensitive responses, controversy remains as to when (and where) processing of acoustic signal attributes characteristic of "human voiceness" per se initiates in the brain. Method To explore this, we used animal vocalizations and human-mimicked versions of those calls ("mimic voice") to examine late auditory evoked potential responses in humans. Results Here, we revealed an N1b component (96-120 ms poststimulus) during a nonattending listening condition showing significantly greater magnitude in response to mimics, beginning as early as primary auditory cortices, preceding the time window reported in previous studies that revealed species-specific vocalization processing initiating in the range of 147-219 ms. During a sound discrimination task, a P600 (500-700 ms poststimulus) component showed specificity for accurate discrimination of human mimic voice. Distinct acoustic signal attributes and features of the stimuli were used in a classifier model, which could distinguish most human from animal voice comparably to behavioral data, though none of these single features could adequately distinguish human voiceness. Conclusions These results provide novel ideas for algorithms used in neuromimetic hearing aids, as well as direct electrophysiological support for a neurocognitive model of natural sound processing that informs both neurodevelopmental and anthropological models regarding the establishment of auditory communication systems in humans. Supplemental Material https://doi.org/10.23641/asha.12903839.
Affiliation(s)
- William J. Talkington
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- Jeremy Donai
- Department of Communication Sciences and Disorders, College of Education and Human Services, West Virginia University, Morgantown
- Alexandra S. Kadner
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- Molly L. Layne
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- Andrew Forino
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- Sijin Wen
- Department of Biostatistics, West Virginia University, Morgantown
- Si Gao
- Department of Biostatistics, West Virginia University, Morgantown
- Margeaux M. Gray
- Department of Biology, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- Alexandria J. Ashraf
- Department of Biology, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- Gabriela N. Valencia
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- Brandon D. Smith
- Department of Biology, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- Stephanie K. Khoo
- Department of Biology, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- Stephen J. Gray
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- Norman Lass
- Department of Communication Sciences and Disorders, College of Education and Human Services, West Virginia University, Morgantown
- Susannah Engdahl
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- David Graham
- Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown
- Chris A. Frum
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
- James W. Lewis
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown
4
Monson BB, Hunter EJ, Lotto AJ, Story BH. The perceptual significance of high-frequency energy in the human voice. Front Psychol 2014; 5:587. [PMID: 24982643 PMCID: PMC4059169 DOI: 10.3389/fpsyg.2014.00587] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 03/16/2014] [Accepted: 05/26/2014] [Indexed: 11/25/2022]
Abstract
While human vocalizations generate acoustical energy at frequencies up to (and beyond) 20 kHz, the energy at frequencies above about 5 kHz has traditionally been neglected in speech perception research. The intent of this paper is to review (1) the historical reasons for this research trend and (2) the work that continues to elucidate the perceptual significance of high-frequency energy (HFE) in speech and singing. The historical and physical factors reveal that, while HFE was believed to be unnecessary and/or impractical for applications of interest, it was never shown to be perceptually insignificant. Rather, the main reasons for the focus on low-frequency energy appear to be that the low-frequency portion of the speech spectrum was seen to be sufficient (from a perceptual standpoint), or that the difficulty of HFE research was too great to be justifiable (from a technological standpoint). The advancement of technology continues to overcome concerns stemming from the latter reason. Likewise, advances in our understanding of the perceptual effects of HFE now cast doubt on the former. Emerging evidence indicates that HFE plays a more significant role than previously believed, and should thus be considered in speech and voice perception research, especially in research involving children and the hearing impaired.
Affiliation(s)
- Brian B. Monson
- Department of Pediatric Newborn Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- National Center for Voice and Speech, University of Utah, Salt Lake City, UT, USA
- Eric J. Hunter
- National Center for Voice and Speech, University of Utah, Salt Lake City, UT, USA
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI, USA
- Andrew J. Lotto
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, USA
- Brad H. Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, USA
5
Monson BB, Lotto AJ, Story BH. Analysis of high-frequency energy in long-term average spectra of singing, speech, and voiceless fricatives. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 132:1754-64. [PMID: 22978902 PMCID: PMC3460988 DOI: 10.1121/1.4742724] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 09/30/2011] [Revised: 07/04/2012] [Accepted: 07/16/2012] [Indexed: 05/04/2023]
Abstract
The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech.
Affiliation(s)
- Brian B Monson
- National Center for Voice and Speech, University of Utah, 136 S. Main Street, Suite 320, Salt Lake City, Utah 84101, USA.
6
Monson BB, Lotto AJ, Ternström S. Detection of high-frequency energy changes in sustained vowels produced by singers. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:2263-8. [PMID: 21476681 PMCID: PMC5570078 DOI: 10.1121/1.3557033] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 05/04/2023]
Abstract
The human voice spectrum above 5 kHz receives little attention. However, there are reasons to believe that this high-frequency energy (HFE) may play a role in perceived quality of voice in singing and speech. To fulfill this role, differences in HFE must first be detectable. To determine human ability to detect differences in HFE, the levels of the 8- and 16-kHz center-frequency octave bands were individually attenuated in sustained vowel sounds produced by singers and presented to listeners. Relatively small changes in HFE were in fact detectable, suggesting that this frequency range potentially contributes to the perception of the singing voice in particular. Detection ability was greater in the 8-kHz octave than in the 16-kHz octave and varied with band energy level.
Affiliation(s)
- Brian B Monson
- Department of Speech, Language, and Hearing Sciences, University of Arizona, PO Box 210071, Tucson, Arizona 85721, USA.
7
Liss JM, LeGendre S, Lotto AJ. Discriminating dysarthria type from envelope modulation spectra. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2010; 53:1246-55. [PMID: 20643800 PMCID: PMC3738168 DOI: 10.1044/1092-4388(2010/09-0121)] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 05/15/2023]
Abstract
PURPOSE Previous research demonstrated the ability of temporally based rhythm metrics to distinguish among dysarthrias with different prosodic deficit profiles (J. M. Liss et al., 2009). The authors examined whether comparable results could be obtained by an automated analysis of speech envelope modulation spectra (EMS), which quantifies the rhythmicity of speech within specified frequency bands. METHOD EMS was conducted on sentences produced by 43 speakers with 1 of 4 types of dysarthria and healthy controls. The EMS consisted of the spectra of the slow-rate (up to 10 Hz) amplitude modulations of the full signal and 7 octave bands ranging in center frequency from 125 to 8000 Hz. Six variables were calculated for each band relating to peak frequency and amplitude and relative energy above, below, and in the region of 4 Hz. Discriminant function analyses (DFA) determined which sets of predictor variables best discriminated between and among groups. RESULTS Each of 6 DFAs identified 2-6 of the 48 predictor variables. These variables achieved 84%-100% classification accuracy for group membership. CONCLUSIONS Dysarthrias can be characterized by quantifiable temporal patterns in acoustic output. Because EMS analysis is automated and requires no editing or linguistic assumptions, it shows promise as a clinical and research tool.
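The envelope modulation spectrum described in this abstract can be illustrated with a small sketch: extract a slow amplitude envelope, then take the spectrum of that envelope up to 10 Hz and locate its peak. This is not the authors' implementation. It analyzes only the full signal (no octave bands), uses rectification plus block averaging as an assumed envelope extractor, and all function names and the synthetic 4 Hz modulated tone are hypothetical.

```python
import cmath
import math


def block_envelope(samples, block):
    """Rectify the signal and average it over non-overlapping blocks.

    This smooths and downsamples in one step, so the envelope is
    sampled at sample_rate / block.
    """
    n_blocks = len(samples) // block
    return [sum(abs(x) for x in samples[b * block:(b + 1) * block]) / block
            for b in range(n_blocks)]


def modulation_spectrum(envelope, envelope_rate, max_hz=10.0):
    """Return (freqs_hz, magnitudes) of the mean-removed envelope up to max_hz."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [e - mean for e in envelope]
    freqs, mags = [], []
    k = 1  # skip the DC bin, which is zero after mean removal
    while k <= n // 2 and k * envelope_rate / n <= max_hz:
        acc = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                  for i, c in enumerate(centered))
        freqs.append(k * envelope_rate / n)
        mags.append(abs(acc))
        k += 1
    return freqs, mags


def peak_modulation_hz(freqs, mags):
    """Frequency of the largest modulation component."""
    return freqs[mags.index(max(mags))]


# Hypothetical test signal: a 200 Hz carrier amplitude-modulated at 4 Hz, 2 s long.
sample_rate = 2000
signal = [(1.0 + 0.5 * math.cos(2 * math.pi * 4 * i / sample_rate))
          * math.sin(2 * math.pi * 200 * i / sample_rate)
          for i in range(2 * sample_rate)]

envelope = block_envelope(signal, block=20)  # envelope sampled at 100 Hz
freqs, mags = modulation_spectrum(envelope, envelope_rate=sample_rate / 20)
peak_hz = peak_modulation_hz(freqs, mags)
print(peak_hz)
```

Peak frequency is only one of the six per-band variables the abstract mentions; the relative energy above, below, and around 4 Hz could be read off the same `freqs` and `mags` lists.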
Affiliation(s)
- Julie M Liss
- Motor Speech Disorders Laboratory, Arizona State University Coor, 870102, Tempe, AZ 85287, USA.
8
Mendoza E, Valencia N, Muñoz J, Trujillo H. Differences in voice quality between men and women: use of the long-term average spectrum (LTAS). J Voice 1996; 10:59-66. [PMID: 8653179 DOI: 10.1016/s0892-1997(96)80019-1] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.3] [Indexed: 02/01/2023]
Abstract
The goal of this study was to determine whether there are acoustical differences between male and female voices and, if so, where exactly these differences lie. Extended speech samples were used. The recorded readings of a text by 31 women and by 24 men were analyzed by means of the Long-Term Average Spectrum (LTAS), extracting the amplitude values (in decibels) at intervals of 160 Hz over a range of 8 kHz. The results showed a significant difference between genders, as well as an interaction of gender and frequency level. The female voice showed greater levels of aspiration noise, located in the spectral regions corresponding to the third formant, which causes the female voice to have a more "breathy" quality than the male voice. The lower spectral tilt in the women's voices is another consequence of this presence of greater aspiration noise.
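A long-term average spectrum sampled every 160 Hz up to 8 kHz can be sketched as follows. The abstract does not give the sampling rate, window or overlap, so the 16 kHz rate, the rectangular non-overlapping 100-sample frames (which yield exactly 160 Hz bin spacing) and the function name `ltas_db` are assumptions made for illustration only.

```python
import cmath
import math


def ltas_db(samples, sample_rate, bin_hz=160.0):
    """Average the power spectrum over consecutive frames and return dB levels.

    The frame length is chosen so that DFT bins fall every bin_hz.
    """
    frame_len = int(round(sample_rate / bin_hz))
    n_frames = len(samples) // frame_len
    n_bins = frame_len // 2 + 1
    mean_power = [0.0] * n_bins
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        for k in range(n_bins):
            acc = sum(x * cmath.exp(-2j * math.pi * k * i / frame_len)
                      for i, x in enumerate(frame))
            mean_power[k] += abs(acc) ** 2 / n_frames
    freqs = [k * sample_rate / frame_len for k in range(n_bins)]
    # A tiny floor avoids log10(0) in empty bins.
    levels = [10.0 * math.log10(p + 1e-12) for p in mean_power]
    return freqs, levels


# Hypothetical test signal: a 1600 Hz tone, ten frames long at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 1600 * i / sample_rate) for i in range(1000)]
freqs, levels = ltas_db(tone, sample_rate)
peak_hz = freqs[levels.index(max(levels))]
print(len(freqs), peak_hz)
```

With real speech, the per-bin dB values from two speaker groups could then be compared bin by bin, which is the kind of gender by frequency comparison the abstract describes.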
Affiliation(s)
- E Mendoza
- Department of Personality, Evaluation and Psychological Treatment, University of Granada, Spain
9
Valencia Naranjo N, Mendoza Lara E, Mateo Rodríguez I, Carballo García G. High-frequency components of normal and dysphonic voices. J Voice 1994; 8:157-62. [PMID: 8061771 DOI: 10.1016/s0892-1997(05)80307-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.4] [Indexed: 01/28/2023]
Abstract
High-frequency energy levels can presently be considered to be influenced more by the characteristics of the speaker than by the phonetic material, and thus to serve as an indicator of voice quality. In this study the energy levels in the frequency intervals of 6-10 and 10-16 kHz were evaluated, both in dysphonic voices and in subjects with no voice problems. The material consisted of consonant-vowel syllables formed by stop phonemes and the basic Spanish vowels, of which the vocalic segment was analyzed. The results indicate that the energy, obtained from the averaged power spectrum, is significantly greater in both frequency regions in the group with dysphonic voices. The analysis in the 10-16 kHz region shows additional significant differences in energy between the vowels, progressively decreasing from front to back vowels.
Affiliation(s)
- N Valencia Naranjo
- Departamento de Personalidad, Evaluación, y Tratamiento Psicológico, Universidad de Granada, Spain