1
Quinn S, Oates J, Dacakis G. The Effectiveness of Gender Affirming Voice Training for Transfeminine Clients: A Comparison of Traditional Versus Intensive Delivery Schedules. J Voice 2024; 38:1250.e25-1250.e52. [PMID: 35400554 DOI: 10.1016/j.jvoice.2022.03.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/16/2021] [Revised: 03/01/2022] [Accepted: 03/02/2022] [Indexed: 11/24/2022]
Abstract
INTRODUCTION Gender affirming voice training is a service provided by speech language pathologists to members of the trans and gender diverse community. While there is some evidence to support the effectiveness of this training, the evidence base is limited by a lack of prospective studies with large sample sizes. In addition, there has been only limited research investigating the effectiveness of this training when delivered on intensive (compressed) schedules, even though such schedules are used in clinical practice and may have practical benefits such as increasing service access for this vulnerable population. METHODOLOGY This study aimed to investigate and compare the effectiveness of gender affirming voice training among 34 trans individuals presumed male at birth aiming to develop a perceptually feminine/female-sounding voice. Among these 34 participants, 17 received their training on a traditional schedule (one 45-minute session per week over 12 weeks) and 17 on an intensive schedule (three 45-minute sessions per week over 4 weeks). Building on a previous mixed methodological study which indicated that these two training groups were equally satisfied with training outcomes, the current study utilised a wide range of self-report, acoustic, and auditory-perceptual outcome measures (including self-ratings and listener-ratings of voice) to investigate training effectiveness. DISCUSSION Results from this study indicated that both training programs were similarly effective, producing positive statistically significant change among participants on a range of outcome measures. Participants in both groups demonstrated significant auditory-perceptual and acoustic voice change and reported increased satisfaction with voice, increased congruence between gender identity and expression, and a reduction in the negative impact of voice concerns on everyday life.
However, as has been the case in past studies, training was not sufficient for all participants to achieve their goal of developing a consistently feminine/female-sounding voice. CONCLUSION This study provides evidence to suggest that gender affirming voice training for transfeminine clients may be similarly effective whether delivered intensively or traditionally. This study provides evidence to support the practice of using a wide range of outcome measures to gain holistic insight into client progress in gender affirming voice training programs.
Affiliation(s)
- Sterling Quinn
- Discipline of Speech Pathology, La Trobe University, Bundoora, Australia.
- Jennifer Oates
- Discipline of Speech Pathology, La Trobe University, Bundoora, Australia
- Georgia Dacakis
- Discipline of Speech Pathology, La Trobe University, Bundoora, Australia
2
Lester-Smith RA, Jebaily CG, Story BH. The Effects of Remote Signal Transmission and Recording on Acoustical Measures of Simulated Essential Vocal Tremor: Considerations for Remote Treatment Research and Telepractice. J Voice 2024; 38:325-336. [PMID: 34702610 PMCID: PMC9033886 DOI: 10.1016/j.jvoice.2021.09.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2021] [Revised: 09/08/2021] [Accepted: 09/09/2021] [Indexed: 10/20/2022]
Abstract
PURPOSE Studies on medical and behavioral interventions for essential vocal tremor (EVT) have shown inconsistent effects on acoustical and perceptual outcome measures across studies and across participants. Remote acoustical and perceptual assessments might facilitate studies with larger samples of participants and repeated measures that could clarify treatment effects and identify optimal treatment candidates. Furthermore, remote acoustical and perceptual assessment might allow clinicians to monitor clients' treatment responses and optimize treatment approaches during telepractice. Thus, the purpose of this study was to evaluate the accuracy of remote signal transmission and recording for acoustical and perceptual assessment of EVT. METHOD Simulations of EVT were produced using a computational model and were recorded using local and remote procedures to represent client- and clinician-end recordings respectively. Acoustical analyses measured the extent and rate of fundamental frequency (fo) and intensity modulation to represent vocal tremor severity and the cepstral peak prominence (CPPS) to represent voice quality. The data were analyzed using repeated measures analysis of variance (ANOVA) with recording as the within-subjects factor and sex of the computational model as the between-subjects factor. RESULTS There was a significant main effect of recording on the rate of fo modulation and significant interactions of recording and sex for the extent of intensity modulation, rate of intensity modulation, and CPPS. Posthoc pairwise comparisons and analysis of effect size indicated that recording procedures had the largest effect on the extent of intensity modulation for male simulations, the rate of intensity modulation for male and female simulations, and the CPPS for male and female simulations. 
Despite having disabled all known software and computer audio enhancing options and having stable ethernet connections, there was inconsistent attenuation of signal amplitude in remote recordings that was most problematic for samples with a breathy voice quality but also affected samples with typical and pressed voice qualities. CONCLUSIONS Acoustical measures that correlate to perception of vocal tremor and voice quality were altered by remote signal transmission and recording. In particular, signal transmission and recording in Zoom altered time-based estimates of intensity modulation and CPPS with male and female simulations of EVT and magnitude-based estimates of intensity modulation with male simulations of EVT. In contrast, signal transmission and recording in Zoom minimally altered time- and magnitude-based estimates of fo modulation with male and female simulations of EVT. Therefore, acoustical and perceptual assessments of EVT should be performed using audio recordings that are collected locally on the participant- or client-end, particularly when measuring modulation of intensity and CPP or estimating vocal tremor severity and voice quality. Development of procedures for collecting local audio recordings in remote settings may expand data collection for treatment research and enhance telepractice.
Affiliation(s)
- Rosemary A Lester-Smith
- Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, Texas.
- Charles G Jebaily
- Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, Texas; Texas NeuroRehab Center, Austin, Texas
- Brad H Story
- Department of Speech, Language, and Hearing Sciences, The University of Arizona, Tucson, Arizona
3
Lester-Smith RA, Derrick E, Larson CR. Characterization of Source-Filter Interactions in Vocal Vibrato Using a Neck-Surface Vibration Sensor: A Pilot Study. J Voice 2024; 38:1-9. [PMID: 34649740 PMCID: PMC8995401 DOI: 10.1016/j.jvoice.2021.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Revised: 08/18/2021] [Accepted: 08/23/2021] [Indexed: 11/23/2022]
Abstract
PURPOSE Vocal vibrato is a singing technique that involves periodic modulation of fundamental frequency (fo) and intensity. The physiological sources of modulation within the speech mechanism and the interactions between the laryngeal source and vocal tract filter in vibrato are not fully understood. Therefore, the purpose of this study was to determine if differences in the rate and extent of fo and intensity modulation could be captured using simultaneously recorded signals from a neck-surface vibration sensor and a microphone, which represent features of the source before and after supraglottal vocal tract filtering. METHOD Nine classically-trained singers produced sustained vowels with vibrato while simultaneous signals were recorded using a vibration sensor and a microphone. Acoustical analyses were performed to measure the rate and extent of fo and intensity modulation for each trial. Paired-samples sign tests were used to analyze differences between the rate and extent of fo and intensity modulation in the vibration sensor and microphone signals. RESULTS The rate and extent of fo modulation and the extent of intensity modulation were equivalent in the vibration sensor and microphone signals, but the rate of intensity modulation was significantly higher in the microphone signal than in the vibration sensor signal. Larger differences in the rate of intensity modulation were seen with vowels that typically have smaller differences between the first and second formant frequencies. CONCLUSIONS This study demonstrated that the rate of intensity modulation at the source prior to supraglottal vocal tract filtering, as measured in neck-surface vibration sensor signals, was lower than the rate of intensity modulation after supraglottal vocal tract filtering, as measured in microphone signals. The difference in rate varied based on the vowel. These findings provide further support of the resonance-harmonics interaction in vocal vibrato. 
Further investigation is warranted to determine if differences in the physiological source(s) of vibrato account for inconsistent relationships between the extent of intensity modulation in neck-surface vibration sensor and microphone signals.
Affiliation(s)
- Rosemary A Lester-Smith
- Department of Physical Medicine & Rehabilitation, Feinberg School of Medicine, Northwestern University, Chicago, Illinois.
- Elaina Derrick
- Department of Speech, Language and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, Texas
- Charles R Larson
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, Illinois
4
Pinto CM. Listeners' Perception of Vocal Effects During Singing. J Voice 2023:S0892-1997(23)00123-6. [PMID: 37156682 DOI: 10.1016/j.jvoice.2023.03.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 03/30/2023] [Accepted: 03/30/2023] [Indexed: 05/10/2023]
Abstract
INTRODUCTION Effective communication is a key feature of vocal music. Singers can communicate during singing by changing their voice qualities to express emotion. Performers use varying standards of acceptable voice quality depending on musical genre. Vocal effects are types of voice quality that have historically been perceived as abusive by some teachers of singing (ToS) and speech-language pathologists (SLPs). This study investigates the perceptions of vocal effects in professional and nonprofessional listeners (NPLs). METHODS Participants (n = 100) completed an online survey. Participants were divided into four professional groups: Classical ToS, Contemporary ToS, SLPs, and NPLs. Participants completed an identification task to assess their ability to identify the use of a vocal effect. Secondly, participants analyzed a singer performing a vocal effect, rated their preferences towards the effect, and gave objective performance ratings using a Likert scale. Finally, participants were asked if they had concerns about the singer's voice. If the participant responded yes, they were asked to whom they would refer the singer: an SLP, a ToS, or a medical doctor (MD). RESULTS Statistically significant differences were observed in SLPs' ability to identify the use of vocal effects compared to classical ToS (P = 0.01) and contemporary ToS (P = 0.001), as well as in NPLs compared to contemporary ToS (P = 0.009). NPLs reported a statistically lower rate of concern than professional listeners (P = 0.006). Statistically significant differences were found when comparing performance rating scores secondary to preference for the vocal effect when comparisons were larger than one Likert rating interval, with listeners giving higher performance ratings if they reported higher preference ratings. Finally, no significant differences were identified when comparing referral scores secondary to occupation.
CONCLUSIONS Findings provide support for the presence of specific biases towards the use of vocal effects although no bias was found in management and care recommendations. Future research is recommended to investigate the nature of these biases.
Affiliation(s)
- Cory M Pinto
- Department of Communication Sciences and Disorders, Montclair State University, Bloomfield, NJ.
5
Nestorova T, Brandner M, Gingras B, Herbst CT. Vocal Vibrato Characteristics in Historical and Contemporary Opera, Operetta, and Schlager. J Voice 2023:S0892-1997(22)00428-3. [PMID: 37080891 DOI: 10.1016/j.jvoice.2022.12.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 12/29/2022] [Accepted: 12/29/2022] [Indexed: 04/22/2023]
Abstract
OBJECTIVES/HYPOTHESIS Vibrato is a core aesthetic element in singing. It varies considerably by both genre and era. Though studied extensively in Western classical singing over the years, there is a dearth of studies on vibrato in contemporary commercial music. In addressing this research gap, the objective of this study was to find and investigate common crossover song material from the opera, operetta, and Schlager singing styles from the historical early 20th to the contemporary 21st century epochs. STUDY DESIGN/METHODS A total of 51 commercial recordings of two songs, "Es muss was Wunderbares sein" by Ralph Benatzky, and "Die ganze Welt ist himmelblau" by Robert Stolz, from "The White Horse Inn" ("Im weißen Rößl") were collected from opera, operetta, and Schlager singers. Each sample was annotated using Praat and analyzed in a custom Matlab- and Python-based algorithmic approach of singing voice separation and sine wave fitting novel to vibrato research. RESULTS With respect to vibrato rate and extent, the three most notable findings were that (1) fo and vibrato were inherently connected; (2) Schlager, as a historical aesthetic category, has unique vibrato characteristics, with higher overall rate and lower overall extent; and (3) fo and vibrato extent varied over time based on the historical or contemporary recording year for each genre. CONCLUSIONS Though these results should be interpreted with caution due to the limited sample size, conducting such acoustical analysis is relevant for voice pedagogy. This study sheds light on the complexity of vocal vibrato production physiology and acoustics while providing insight into various aesthetic choices when performing music of different genres and stylistic time periods. In the age of crossover singing training and commercially available recordings, this investigation reveals important distinctions regarding vocal vibrato across genres and eras that bear beneficial implications for singers and teachers of singing.
Affiliation(s)
- Manuel Brandner
- Institute of Electronic Music and Acoustics, University of Music and Performing Arts, Graz, Styria, Austria
- Christian T Herbst
- Mozarteum University Salzburg, Salzburg, Austria; Shenandoah University, Winchester, Virginia.
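The sine wave fitting mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' Matlab- and Python-based algorithm, which the abstract does not specify; it assumes an fo contour already expressed in cents and fits one sinusoid per candidate rate by least squares, keeping the best fit. The function name `fit_vibrato`, the 100 Hz frame rate, and the 3 to 9 Hz search grid are all hypothetical choices made for illustration.

```python
import math


def fit_vibrato(cents, frame_rate_hz, rates_hz):
    """Return (rate_hz, extent_cents) of the best-fitting sinusoid.

    cents: fo contour in cents, one value per analysis frame (assumed input).
    rates_hz: candidate vibrato rates to try (simple grid search).
    """
    n = len(cents)
    mean = sum(cents) / n
    x = [v - mean for v in cents]  # remove the mean pitch before fitting
    best = None
    for rate in rates_hz:
        w = 2.0 * math.pi * rate / frame_rate_hz
        s = [math.sin(w * i) for i in range(n)]
        c = [math.cos(w * i) for i in range(n)]
        # Normal equations for x ~ a*sin + b*cos (2x2 linear system).
        sss = sum(v * v for v in s)
        scc = sum(v * v for v in c)
        ssc = sum(p * q for p, q in zip(s, c))
        sxs = sum(p * q for p, q in zip(x, s))
        sxc = sum(p * q for p, q in zip(x, c))
        det = sss * scc - ssc * ssc
        if abs(det) < 1e-12:
            continue  # degenerate candidate, skip it
        a = (sxs * scc - sxc * ssc) / det
        b = (sxc * sss - sxs * ssc) / det
        resid = sum((xi - a * si - b * ci) ** 2 for xi, si, ci in zip(x, s, c))
        if best is None or resid < best[0]:
            best = (resid, rate, math.hypot(a, b))  # amplitude = half-extent
    if best is None:
        raise ValueError("no usable candidate rate")
    return best[1], best[2]


# Synthetic contour (illustrative values only): 2 s at 100 frames/s,
# 5.5 Hz vibrato with an amplitude of 50 cents.
frame_rate = 100.0
contour = [50.0 * math.sin(2.0 * math.pi * 5.5 * i / frame_rate + 0.3)
           for i in range(200)]
rate, extent = fit_vibrato(contour, frame_rate, [k / 10 for k in range(30, 91)])
print(rate, round(extent, 3))
```

A grid search keeps the sketch dependency-free; a real analysis would more likely refine the rate with a nonlinear optimizer and fit short overlapping windows so that rate and extent can vary over the course of a note.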
6
Simulation of English Speech Recognition Based on Improved Extreme Random Forest Classification. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1948159. [PMID: 35814545 PMCID: PMC9270152 DOI: 10.1155/2022/1948159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Revised: 06/05/2022] [Accepted: 06/20/2022] [Indexed: 11/17/2022]
Abstract
Existing speech recognition systems are only for mainstream audio types; there is little research on language types; the system is subject to relatively large restrictions; and the recognition rate is not high. Therefore, how to use an efficient classifier to make a speech recognition system with a high recognition rate is one of the current research focuses. Based on the idea of machine learning, this study combines the computational random forest classification method to improve the algorithm and builds an English speech recognition model based on machine learning. Moreover, this study uses a lightweight model and its improved model to recognize speech signals and directly performs adaptive wavelet threshold shrinkage and denoising on the generated time-frequency images. In addition, this study uses the EI strong classifier to replace the softmax of the lightweight AlexNet model, which further improves the recognition accuracy under a low signal-to-noise ratio. Finally, this study designs experiments to verify the model effect. The research results show that the effect of the model constructed in this study is good.
7
8
Herbst CT. Performance Evaluation of Subharmonic-to-Harmonic Ratio (SHR) Computation. J Voice 2021; 35:365-375. [DOI: 10.1016/j.jvoice.2019.11.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/10/2019] [Revised: 11/09/2019] [Accepted: 11/11/2019] [Indexed: 10/24/2022]
9
Wei X. Simulation of English intelligent system based on CA-IAFSA algorithm and artificial intelligence. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-189796] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
The traditional English teaching mode relies mostly on rote memorization of textbooks and lacks both training in oral expression skills and intelligent guidance for students. Taking a machine learning algorithm as the system algorithm, this paper uses the CA-IAFSA algorithm to construct an English intelligent system based on artificial intelligence. The system uses image recognition technology, introduces population pheromone and tribal pheromone, and adopts multiple ant colony planning and dual pheromone feedback strategies. Moreover, this paper improves the heuristic information search strategy, pheromone update strategy, and state transition probability of the basic ant colony algorithm. In addition, this paper proposes the MACDPA path planning algorithm to realize the intelligent analysis of English textbook images. Finally, after constructing the model, this paper analyzes its performance using controlled experiments and mathematical statistics. The research results show that the model constructed in this paper performs well in assisted teaching and intelligent translation and meets the expected requirements.
Affiliation(s)
- Xinyu Wei
- Inner Mongolia University for Nationalities, Tongliao, Inner Mongolia, China
10
Glasner JD, Johnson AM. Effects of Historical Recording Technology on Vibrato in Modern-Day Opera Singers. J Voice 2020; 36:464-478. [PMID: 32819779 DOI: 10.1016/j.jvoice.2020.07.022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/19/2020] [Revised: 07/15/2020] [Accepted: 07/21/2020] [Indexed: 10/23/2022]
Abstract
OBJECTIVE Past literature indicates that vibrato measurements of singers objectively changed (i.e., vibrato rate decreased and vibrato extent increased) from 1900 to the present day; however, historical audio recording technology may distort acoustic measurements of the voice output signal, including vibrato. As such, the listener's perception of historical singing may be influenced by the limitations of historical technology. This study attempts to show how the wax cylinder phonograph system-the oldest form of mass-produced audio recording technology-alters the recorded voice output signal of modern-day singers and, thus, provides an objective lens through which to study the effect(s) of historical audio recording technology on vibrato measurements. METHODS Twenty professional Western opera singers sang a messa di voce on the vowel [a] and on the pitch C4 for male singers and C5 for female singers, three times into a flat-response omnidirectional microphone and onto an Edison Home Phonograph simultaneously. The middle 1-3 seconds (6-10 vibrato cycles) of each sample was analyzed for vibrato rate, vibrato extent, jitter (ddp), shimmer (dda), and fundamental frequency for each recording condition (wax cylinder phonograph or microphone). Steady-state and frequency-modulating sinewave test signals were also recorded under the multiple recording conditions. RESULTS Results indicated no significant effect of recording condition on vibrato rate (mean [standard deviation], cylinder: 5.3 Hz [0.5], microphone: 5.3 Hz [0.5]) and no significant difference was found for mean fundamental frequency (cylinder: 389 Hz [137], microphone: 390 Hz [137]). A significant main effect of recording condition was found for vibrato extent (cylinder: ±103 cents [30], microphone: ±100 cents [31]). 
Additionally, mean jitter (ddp) (cylinder: 1.22% [1.09], microphone: 0.24% [0.12]) and mean shimmer (dda) (cylinder: 9.40% [4.90], microphone: 1.92% [0.94]) were significantly higher for the cylinder recording condition, indicating more cycle-to-cycle variability in the wax cylinder recorded signal. Analysis of test signals revealed similar patterns based on recording condition. DISCUSSION This study validates past scholarly inquiry about vibrato measurements as extracted from digitized wax cylinder phonograph recordings by demonstrating that measured vibrato rate remains constant during both recording conditions. In other words, vibrato rate as measured from historical recordings can be viewed as an accurate representation of the historical singer being studied. Furthermore, it suggests that the value of prior vibrato extent measurements from these acoustic recordings may be slightly overestimated from the original voice output signals produced by singers near the beginning of the 20th century (i.e., a narrow vibrato extent might have been numerically smaller on average). Increased jitter and shimmer in the wax cylinder recording conditions may be indicative of nonlinearities in the phonograph recording or playback systems.
Affiliation(s)
- Joshua D Glasner
- Department of Music and Performing Arts Professions, New York University, New York, New York.
- Aaron M Johnson
- Department of Otolaryngology-Head and Neck Surgery, New York University School of Medicine, New York, New York
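The measures reported in the abstract above (vibrato extent in cents, jitter (ddp), shimmer (dda)) can be sketched as follows. This is a simplified illustration, not the authors' analysis pipeline. It assumes the usual Praat-style definition of ddp and dda, namely the mean absolute difference between consecutive differences of consecutive periods (or peak amplitudes) divided by the mean, and it omits the period-validity checks a full implementation applies. The function names, the reading of "±" extent as half the peak-to-trough excursion, and all input values are hypothetical.

```python
import math


def hz_to_cents(f_hz, ref_hz):
    """Interval from ref_hz up to f_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def vibrato_extent_cents(f_peak_hz, f_trough_hz):
    """Half the peak-to-trough fo excursion in cents (assumed '±' convention)."""
    return 0.5 * hz_to_cents(f_peak_hz, f_trough_hz)


def relative_second_difference(values):
    """Mean |(v[i+1] - v[i]) - (v[i] - v[i-1])| divided by mean(v).

    Applied to glottal periods this corresponds to jitter (ddp); applied to
    per-cycle peak amplitudes it corresponds to shimmer (dda).
    """
    if len(values) < 3:
        raise ValueError("need at least three cycles")
    diffs = [abs((values[i + 1] - values[i]) - (values[i] - values[i - 1]))
             for i in range(1, len(values) - 1)]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Illustrative inputs only, not data from the study.
periods_s = [0.00256, 0.00257, 0.00255, 0.00256, 0.00258]
amplitudes = [0.81, 0.80, 0.82, 0.79, 0.80]
print(round(vibrato_extent_cents(415.0, 369.0), 1))
print(round(100.0 * relative_second_difference(periods_s), 2))
print(round(100.0 * relative_second_difference(amplitudes), 2))
```

Because both perturbation measures are normalized by the mean, they are dimensionless and are usually reported as percentages, as in the abstract.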