1
|
Sara JDS, Orbelo D, Maor E, Lerman LO, Lerman A. Guess What We Can Hear-Novel Voice Biomarkers for the Remote Detection of Disease. Mayo Clin Proc 2023; 98:1353-1375. [PMID: 37661144 PMCID: PMC10043966 DOI: 10.1016/j.mayocp.2023.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 02/08/2023] [Accepted: 03/16/2023] [Indexed: 03/30/2023]
Abstract
The advancement of digital biomarkers and the provision of remote health care greatly progressed during the coronavirus disease 2019 global pandemic. Combining voice/speech data with artificial intelligence and machine-based learning offers a novel solution to the growing demand for telemedicine. Voice biomarkers, obtained from the extraction of characteristic acoustic and linguistic features, are associated with a variety of diseases and even coronavirus disease 2019. In the current review, we (1) describe the basis on which digital voice biomarkers could facilitate "telemedicine," (2) discuss potential mechanisms that may explain the association between voice biomarkers and disease, (3) offer a novel classification system to conceptualize voice biomarkers depending on different methods for recording and analyzing voice/speech samples, (4) outline evidence revealing an association between voice biomarkers and a number of disease states, and (5) describe the process of developing a voice biomarker from recording, storing voice samples, and extracting acoustic and linguistic features relevant to training and testing deep and machine-based learning algorithms to detect disease. We further explore several important future considerations in this area of research, including the necessity for clinical trials and the importance of safeguarding data and individual privacy. To this end, we searched PubMed and Google Scholar to identify studies evaluating the relationship between voice/speech features and biomarkers and various diseases. Search terms included digital biomarker, telemedicine, voice features, voice biomarker, speech features, speech biomarkers, acoustics, linguistics, cardiovascular disease, neurologic disease, psychiatric disease, and infectious disease. The search was limited to studies published in English in peer-reviewed journals between 1980 and the present. 
To identify potential studies not captured by our database search strategy, we also searched studies listed in the bibliography of relevant publications and reviews.
Affiliation(s)
- Diana Orbelo
  - Division of Otolaryngology, Mayo Clinic College of Medicine and Science, Rochester, MN; Chaim Sheba Medical Center, Tel HaShomer, Israel
- Elad Maor
  - Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Lilach O Lerman
  - Division of Nephrology and Hypertension, Mayo Clinic Rochester, MN
- Amir Lerman
  - Department of Cardiovascular Medicine, Mayo Clinic College of Medicine and Science, Rochester, MN.
|
2
|
Anikin A. The honest sound of physical effort. PeerJ 2023; 11:e14944. [PMID: 37033726 PMCID: PMC10078454 DOI: 10.7717/peerj.14944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 02/02/2023] [Indexed: 04/05/2023]
Abstract
Acoustic correlates of physical effort are still poorly understood, even though effort is vocally communicated in a variety of contexts with crucial fitness consequences, including both confrontational and reproductive social interactions. In this study 33 lay participants spoke during a brief, but intense isometric hold (L-sit), first without any voice-related instructions, and then asked either to conceal their effort or to imitate it without actually performing the exercise. Listeners in two perceptual experiments then rated 383 recordings on perceived level of effort (n = 39 listeners) or categorized them as relaxed speech, actual effort, pretended effort, or concealed effort (n = 102 listeners). As expected, vocal effort increased compared to baseline, but the accompanying acoustic changes (increased loudness, pitch, and tense voice quality) were under voluntary control, so that they could be largely suppressed or imitated at will. In contrast, vocal tremor at approximately 10 Hz was most pronounced under actual load, and its experimental addition to relaxed baseline recordings created the impression of concealed effort. In sum, a brief episode of intense physical effort causes pronounced vocal changes, some of which are difficult to control. Listeners can thus estimate the true level of exertion, whether to judge the condition of their opponent in a fight or to monitor a partner’s investment into cooperative physical activities.
|
3
|
Validation of a Speech Database for Assessing College Students' Physical Competence under the Concept of Physical Literacy. International Journal of Environmental Research and Public Health 2022; 19:ijerph19127046. [PMID: 35742295 PMCID: PMC9222620 DOI: 10.3390/ijerph19127046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 06/04/2022] [Accepted: 06/07/2022] [Indexed: 02/01/2023]
Abstract
This study developed a speech database for assessing one element of physical literacy: physical competence. Thirty-one healthy native Cantonese speakers were instructed to read a passage aloud after various exercises. The speech database contained four types of speech, collected at rest and after three exercises from the Canadian Assessment of Physical Literacy, 2nd Edition. To show that each exercise state can be detected, a support vector machine (SVM) was trained on the acoustic features. Two speech feature sets, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) and the Computational Paralinguistics Challenge (ComParE) set, were used for speech signal processing. The results showed that the two-stage four-class SVM performed better than the one-stage model. Both feature sets could achieve 70% accuracy (unweighted average recall, UAR) in the three-class model after five-fold cross-validation. With the ComParE feature set, the two-class model reached a UAR of 97% for the resting versus vigorous states and 74% for the resting versus moderate states. This study introduced the process of constructing a speech database and a method for short-time automatic classification of physical states. Suggested future work on this corpus includes predicting the physical competence of young people, comparing speech features with other age groups, and further spectral analysis.
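The abstract reports its results as unweighted average recall (UAR), which is the mean of the per-class recalls. The following is a minimal sketch of that metric, not the study's own code; the function name and the example labels are hypothetical and chosen only to show how UAR differs from plain accuracy when classes are imbalanced.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally regardless of size."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == guess:
            hits[truth] += 1
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, deliberately imbalanced labels (not data from the study).
y_true = ["rest"] * 8 + ["vigorous"] * 2
y_pred = ["rest"] * 8 + ["rest", "vigorous"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case the minority class is recognized only half of the time, so UAR falls below accuracy. That sensitivity to small classes is a common reason for reporting UAR in paralinguistic classification tasks.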
|
4
|
Lebedeva S, Shved D, Savinkina A. Assessment of the Psychophysiological State of Female Operators Under Simulated Microgravity. Front Physiol 2022; 12:751016. [PMID: 35222056 PMCID: PMC8873526 DOI: 10.3389/fphys.2021.751016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2021] [Accepted: 12/29/2021] [Indexed: 11/13/2022]
Abstract
The article describes methods for analyzing non-verbal speech characteristics that were used to determine the psychophysiological state of female subjects under simulated microgravity conditions ("dry" immersion, DI), as well as the results of the study. A number of indicators of the acute period of adaptation to microgravity conditions are described. The acute adaptation period in female subjects began earlier (evening of the 1st day of DI) and ended faster than in male subjects in previous studies (2nd day of DI). This was indicated by a decrease in the level of state anxiety (STAI, p < 0.05) and depression-dejection [Profile of Mood States (POMS), p < 0.05], as well as a decrease in pitch (p < 0.05) and voice intensity (p < 0.05). In addition, the women apparently used the "freeze" coping strategy: the proportion of neutral facial expressions was at its maximum on the most intense days of the experiment. The subjects in this experiment assessed their feelings and emotions better, giving more accurate answers in self-assessment questionnaires, but at the same time tried to look and sound as calm and confident as possible, controlling their expressions. The same trends in the subjects' cognitive performance were identified as in earlier experiments under similar conditions: the subjects' psychophysiological excitement corresponded to better performance in sensorimotor tasks. The difference was in the speed of mathematical computation: women in the present study performed the computation faster on the same days when they made fewer pauses in speech, while in men in previous experiments this relationship was inverse.
Affiliation(s)
- Svetlana Lebedeva
  - Russian Federation State Scientific Center, Institute of Biomedical Problems of the Russian Academy of Sciences, Moscow, Russia
- Dmitry Shved
  - Russian Federation State Scientific Center, Institute of Biomedical Problems of the Russian Academy of Sciences, Moscow, Russia
  - Moscow Aviation Institute, National Research University, Moscow, Russia
- Alexandra Savinkina
  - Russian Federation State Scientific Center, Institute of Biomedical Problems of the Russian Academy of Sciences, Moscow, Russia
|
5
|
Peifer C, Pollak A, Flak O, Pyszka A, Nisar MA, Irshad MT, Grzegorzek M, Kordyaka B, Kożusznik B. The Symphony of Team Flow in Virtual Teams. Using Artificial Intelligence for Its Recognition and Promotion. Front Psychol 2021; 12:697093. [PMID: 34566774 PMCID: PMC8455848 DOI: 10.3389/fpsyg.2021.697093] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/18/2021] [Accepted: 07/26/2021] [Indexed: 11/13/2022]
Abstract
More and more teams are collaborating virtually across the globe, and the COVID-19 pandemic has further encouraged the dissemination of virtual teamwork. However, there are challenges for virtual teams – such as reduced informal communication – with implications for team effectiveness. Team flow is a concept with high potential for promoting team effectiveness; however, its measurement and promotion are challenging. Traditional team flow measurements rely on self-report questionnaires that require interrupting the team process. Approaches in artificial intelligence, i.e., machine learning, offer methods to identify an algorithm based on behavioral and sensor data that is able to identify team flow and its dynamics over time without interrupting the process. Thus, in this article we present an approach to identify team flow in virtual teams, using machine learning methods. First, based on a literature review, we provide a model of team flow characteristics, composed of characteristics that are shared with individual flow and characteristics that are unique to team flow. It is argued that the characteristics that are unique to team flow are represented by the concept of collective communication. Based on that, we present physiological and behavioral correlates of team flow which are suitable for – but not limited to – being assessed in virtual teams and which can be used as input data for a machine learning system to assess team flow in real time. Finally, we suggest interventions to support team flow that can be implemented in real time, in virtual environments, and controlled by artificial intelligence. This article thus contributes to finding indicators and dynamics of team flow in virtual teams, to stimulate future research and to promote team effectiveness.
Affiliation(s)
- Corinna Peifer
  - Department of Psychology, University of Lübeck, Lübeck, Germany
- Anita Pollak
  - Department of Social Science, Institute of Psychology, University of Silesia in Katowice, Katowice, Poland
- Olaf Flak
  - University of Silesia in Katowice, Katowice, Poland
- Adrian Pyszka
  - Department of Human Resource Management, College of Management, University of Economics in Katowice, Katowice, Poland
- Marcin Grzegorzek
  - Institute of Medical Informatics, University of Lübeck, Lübeck, Germany
- Barbara Kożusznik
  - Department of Social Science, Institute of Psychology, University of Silesia in Katowice, Katowice, Poland
|
6
|
Abstract
Recently, the possibilities of detecting psychosocial stress from speech have been discussed. Yet, there are mixed effects and a current lack of clarity in relations and directions for parameters derived from stressed speech. The aim of the current study is – in a controlled psychosocial stress induction experiment – to apply network modeling to (1) look into the unique associations between specific speech parameters, comparing speech networks containing fundamental frequency (F0), jitter, mean voiced segment length, and Harmonics-to-Noise Ratio (HNR) pre- and post-stress induction, and (2) examine how changes pre- versus post-stress induction (i.e., change network) in each of the parameters are related to changes in self-reported negative affect. Results show that the network of speech parameters is similar after versus before the stress induction, with a central role of HNR, which shows that the complex interplay and unique associations between each of the used speech parameters is not impacted by psychosocial stress (aim 1). Moreover, we found a change network (consisting of pre-post stress difference values) with changes in jitter being positively related to changes in self-reported negative affect (aim 2). These findings illustrate – for the first time in a well-controlled but ecologically valid setting – the complex relations between different speech parameters in the context of psychosocial stress. Longitudinal and experimental studies are required to further investigate these relationships and to test whether the identified paths in the networks are indicative of causal relationships.
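Two of the parameters named in this abstract, fundamental frequency (F0) and jitter, can be derived from the durations of successive glottal cycles. The sketch below uses the widely used "local jitter" definition (mean absolute difference between consecutive periods divided by the mean period). It is an illustration under that assumption; the abstract does not state which jitter variant or extraction tool the study used, and the period values are hypothetical.

```python
def local_jitter(periods):
    """Mean absolute difference of consecutive cycle lengths, relative to the mean cycle length."""
    if len(periods) < 2:
        raise ValueError("need at least two glottal periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


def mean_f0(periods):
    """Mean fundamental frequency in Hz, taken as the reciprocal of the mean period in seconds."""
    return len(periods) / sum(periods)


# Hypothetical cycle lengths in seconds (roughly a 100 Hz voice), not study data.
periods = [0.0100, 0.0102, 0.0099, 0.0101]
print(f"F0 = {mean_f0(periods):.1f} Hz, local jitter = {100 * local_jitter(periods):.2f} %")
```

A perfectly periodic signal gives a jitter of zero; cycle-to-cycle irregularity raises it, which is why jitter is often read as a marker of reduced phonatory stability.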
|
7
|
Izumi K, Minato K, Shiga K, Sugio T, Hanashiro S, Cortright K, Kudo S, Fujita T, Sado M, Maeno T, Takebayashi T, Mimura M, Kishimoto T. Unobtrusive Sensing Technology for Quantifying Stress and Well-Being Using Pulse, Speech, Body Motion, and Electrodermal Data in a Workplace Setting: Study Concept and Design. Front Psychiatry 2021; 12:611243. [PMID: 33995141 PMCID: PMC8113638 DOI: 10.3389/fpsyt.2021.611243] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/28/2020] [Accepted: 03/23/2021] [Indexed: 01/02/2023]
Abstract
Introduction: Mental disorders are a leading cause of disability worldwide. Depression has a significant impact in the field of occupational health because it is particularly prevalent during working age. On the other hand, there are a growing number of studies on the relationship between "well-being" and employee productivity. To promote healthy and productive workplaces, this study aims to develop a technique to quantify stress and well-being in a way that does not disturb the workplace. Methods and analysis: This is a single-arm prospective observational study. The target population is adult (>20 years old) workers at companies that often engage in desk work; specifically, a person who sits in front of a computer for at least half their work hours. The following data will be collected: (a) participants' background characteristics; (b) participants' biological data during the 4-week observation period using sensing devices such as a camera built into the computer (pulse wave data extracted from the facial video images), a microphone built into their work computer (voice data), and a wristband-type wearable device (electrodermal activity data, body motion data, and body temperature); (c) stress, well-being, and depression rating scale assessment data. The analysis workflow is as follows: (1) primary analysis, comprised of using software to digitalize participants' vital information; (2) secondary analysis, comprised of examining the relationship between the quantified vital data from (1), stress, well-being, and depression; (3) tertiary analysis, comprised of generating machine learning algorithms to estimate stress, well-being, and degree of depression in relation to each set of vital data as well as multimodal vital data. 
Discussion: This study will evaluate digital phenotype regarding stress and well-being of white-collar workers over a 4-week period using persistently obtainable biomarkers such as heart rate, acoustic characteristics, body motion, and electrodermal activity. Eventually, this study will lead to the development of a machine learning algorithm to determine people's optimal levels of stress and well-being. Ethics and dissemination: Collected data and study results will be disseminated widely through conference presentations, journal publications, and/or mass media. The summarized results of our overall analysis will be supplied to participants. Registration: UMIN000036814.
Affiliation(s)
- Keisuke Izumi
  - Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, Japan
  - National Hospital Organization Tokyo Medical Center, Tokyo, Japan
  - Medical AI Center, Keio University, Tokyo, Japan
- Kazumichi Minato
  - Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
- Kiko Shiga
  - Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
- Tatsuki Sugio
  - Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
- Sayaka Hanashiro
  - Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
- Kelley Cortright
  - Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
- Shun Kudo
  - Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
- Takanori Fujita
  - Medical AI Center, Keio University, Tokyo, Japan
  - Department of Health Policy and Management, Keio University School of Medicine, Tokyo, Japan
  - World Economic Forum Centre for the Fourth Industrial Revolution Japan, Tokyo, Japan
- Mitsuhiro Sado
  - Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
  - Center for Stress Research, Keio University, Tokyo, Japan
- Takashi Maeno
  - Human System Design Laboratory, Graduate School of System Design and Management, Keio University, Tokyo, Japan
- Toru Takebayashi
  - Medical AI Center, Keio University, Tokyo, Japan
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Masaru Mimura
  - Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
- Taishiro Kishimoto
  - Medical AI Center, Keio University, Tokyo, Japan
  - Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
  - Department of Psychiatry, Donald and Barbara Zucker School of Medicine, New York, NY, United States
|
8
|
Perrine BL, Scherer RC. Aerodynamic and Acoustic Voice Measures Before and After an Acute Public Speaking Stressor. Journal of Speech, Language, and Hearing Research (JSLHR) 2020; 63:3311-3325. [PMID: 32916082 DOI: 10.1044/2020_jslhr-19-00252] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Purpose: The goal of this study was to determine if differences in stress system activation lead to changes in speaking fundamental frequency, average oral airflow, and estimated subglottal pressure before and after an acute, psychosocial stressor. Method: Eighteen vocally healthy adult females experienced the Trier Social Stress Test (TSST) to activate the hypothalamic-pituitary-adrenal axis. The TSST includes public speaking and performing mental arithmetic in front of an audience. At seven time points, three before the stressor and four after the stressor, the participants produced /pa/ repetitions, read the Rainbow Passage, and provided a saliva sample. Measures included (a) salivary cortisol level, (b) oral airflow, (c) estimated subglottal pressure, and (d) speaking fundamental frequency from the second sentence of the Rainbow Passage. Results: Ten of the 18 participants experienced a hypothalamic-pituitary-adrenal axis response to stress as indicated by a 2.5-nmol/L increase in salivary cortisol from before the TSST to after the TSST. Those who experienced a response to stress had a significantly higher speaking fundamental frequency before and immediately after the stressor than later after the stressor. No other variable varied significantly due to the stressor. Conclusions: This study suggests that the idiosyncratic and inconsistent voice changes reported in the literature may be explained by differences in stress system activation. In addition, laryngeal aerodynamic measures appear resilient to changes due to acute stress. Further work is needed to examine the influence of other stress systems and if these findings hold for dysphonic individuals.
Affiliation(s)
- Brittany L Perrine
  - Department of Communication Sciences and Disorders, Baylor University, Waco, TX
- Ronald C Scherer
  - Department of Communication Sciences and Disorders, Bowling Green State University, OH
|
9
|
Fundamental frequency during cognitive preparation and its impact on therapy outcome for panic disorder with Agoraphobia. Behav Res Ther 2020; 135:103728. [PMID: 32987282 DOI: 10.1016/j.brat.2020.103728] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/18/2016] [Revised: 08/04/2020] [Accepted: 09/14/2020] [Indexed: 01/31/2023]
Abstract
BACKGROUND: Cognitive preparation plays a crucial role in CBT with exposure for panic disorder and agoraphobia. High emotional arousal while developing the exposure rationale might impair patients' cognitive capacities for processing information about treatment and impede therapeutic outcome. OBJECTIVE: This study investigates whether patients' vocally encoded emotional arousal, assessed by fundamental frequency (f0), during rationale development is associated with premature treatment dropout, insight into the rationale, and symptom reduction. METHODS: Patients' (N = 197, mean age 36.1 years, 79.2% female) f0 during rationale development was measured based on treatment videos from a randomized controlled trial of CBT for panic disorder and agoraphobia. Insight was rater assessed. Symptom severity was self- and rater assessed at the beginning and end of therapy. RESULTS: Higher f0 mean during rationale development was associated with lower probability of insight and less reduction in avoidance behavior. f0 was not associated with dropout. Insight was associated with lower probability of dropout and partially mediated the association between f0 and avoidance reduction. DISCUSSION: This study highlights the importance of emotional arousal during cognitive preparation for exposure. Therapists should ensure that patients are not too highly aroused while learning about the exposure rationale as an important step in treatment.
|
10
|
Sara JDS, Maor E, Borlaug B, Lewis BR, Orbelo D, Lerman LO, Lerman A. Non-invasive vocal biomarker is associated with pulmonary hypertension. PLoS One 2020; 15:e0231441. [PMID: 32298301 PMCID: PMC7162478 DOI: 10.1371/journal.pone.0231441] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 09/27/2019] [Accepted: 03/08/2020] [Indexed: 12/19/2022]
Abstract
Emerging data suggest that noninvasive voice biomarker analysis is associated with coronary artery disease. We recently showed that a vocal biomarker was associated with hospitalization and heart failure in patients with heart failure. We evaluate the association between a vocal biomarker and invasively measured indices of pulmonary hypertension (PH). Patients were referred for an invasive cardiac hemodynamic study between January 2017 and December 2018, and had their voices recorded on three separate occasions to their smartphone prior to each study. A pre-established vocal biomarker was determined based on each individual recording. The intra-class correlation coefficient between the separate voice recording biomarker values for each individual participant was 0.829 (95% CI 0.740-0.889), implying very good agreement between values. Thus, the mean biomarker was calculated for each patient. Patients were divided into two groups: high pulmonary arterial pressure (PAP), defined as ≥ 35 mmHg (moderate or greater PH), versus lower PAP. Eighty-three patients, mean age 61.6 ± 15.1 years, 37 (44.6%) male, were included. Patients with a high mean PAP (≥ 35 mmHg) had on average significantly higher values of the mean voice biomarker compared to those with a lower mean PAP (0.74 ± 0.85 vs. 0.40 ± 0.88, p = 0.046). Multivariate logistic regression showed that an increase in the mean voice biomarker by 1 unit was associated with a high PAP (odds ratio 2.31, 95% CI 1.05-5.07, p = 0.038). This study shows a relationship between a noninvasive vocal biomarker and an invasively derived hemodynamic index related to PH obtained during clinically indicated cardiac catheterization. These results may have important practical clinical implications for telemedicine and remote monitoring of patients with heart failure and PH.
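The odds ratio, confidence interval, and p value reported for the logistic regression are linked through the log-odds scale. As a rough consistency check, and assuming a Wald-type interval that is symmetric on that scale (the abstract does not say how the interval was computed), the coefficient, its standard error, and an approximate two-sided p value can be recovered from the published figures. The function name below is hypothetical.

```python
import math


def wald_from_odds_ratio(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-odds coefficient, its standard error, z, and two-sided p from an OR and 95% CI."""
    beta = math.log(odds_ratio)  # coefficient per 1-unit increase
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


# Figures quoted in the abstract: OR 2.31, 95% CI 1.05-5.07.
beta, se, z, p = wald_from_odds_ratio(2.31, 1.05, 5.07)
print(f"beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

The recovered p value comes out close to the reported 0.038. Small differences are expected because the published odds ratio and interval limits are rounded.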
Affiliation(s)
- Jaskanwal Deep Singh Sara
  - Department of Cardiovascular Diseases, Mayo College of Medicine, Rochester, MN, United States of America
- Elad Maor
  - Chaim Sheba Medical Center, Tel Hashomer, Israel
  - Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Barry Borlaug
  - Department of Cardiovascular Diseases, Mayo College of Medicine, Rochester, MN, United States of America
- Bradley R. Lewis
  - Division of Biomedical Statistics and Informatics, Mayo College of Medicine, Rochester, MN, United States of America
- Diana Orbelo
  - Division of Laryngology, Mayo College of Medicine, Rochester, MN, United States of America
- Lilach O. Lerman
  - Division of Nephrology and Hypertension, Mayo Clinic, Rochester, MN, United States of America
- Amir Lerman
  - Department of Cardiovascular Diseases, Mayo College of Medicine, Rochester, MN, United States of America
|
11
|
Van Puyvelde M, Neyt X, McGlone F, Pattyn N. Voice Stress Analysis: A New Framework for Voice and Effort in Human Performance. Front Psychol 2018; 9:1994. [PMID: 30515113 PMCID: PMC6255927 DOI: 10.3389/fpsyg.2018.01994] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 07/17/2018] [Accepted: 09/28/2018] [Indexed: 11/24/2022]
Abstract
People rely on speech for communication, both in a personal and professional context, and often under different conditions of physical, cognitive and/or emotional load. Since vocalization is entirely integrated within both our central (CNS) and autonomic nervous system (ANS), a mounting number of studies have examined the relationship between voice output and the impact of stress. In the current paper, we will outline the different stages of voice output, i.e., breathing, phonation and resonance, in relation to a neurovisceral integrated perspective on stress and human performance. In reviewing the function of these three stages of voice output, we will give an overview of the voice parameters encountered in studies on voice stress analysis (VSA) and review the impact of the different types of physiological, cognitive and/or emotional load. In the section "Discussion," with regard to physical load, a competition between the ventilation processes required to speak and those required to meet the metabolic demand of exercised muscles is described. With regard to cognitive and emotional load, we will present the "Model for Voice and Effort" (MoVE) that comprises the integration of ongoing top-down and bottom-up activity under different types of load and combined patterns of voice output. In the MoVE, it is proposed that the fundamental frequency (F0) values as well as jitter give insight into bottom-up/arousal activity and the effort a subject is capable of generating, but that its range and variance are related to ongoing top-down processes and the amount of control a subject can maintain. Within the MoVE, a key role is given to the anterior cingulate cortex (ACC), which is known to be involved in both the equilibration between bottom-up arousal and top-down regulation and vocal activity. Moreover, the connectivity between the ACC and the nervus vagus (NV) is underlined as an indication of the importance of respiration. Since respiration is the driving force of both stress and voice production, it is hypothesized to be the missing link in our understanding of the underlying mechanisms of the dynamic between speech and stress.
Affiliation(s)
- Martine Van Puyvelde
  - VIPER Research Unit, LIFE Department, Royal Military Academy, Brussels, Belgium
  - Brain, Body and Cognition, Experimental and Applied Psychology, Department of Psychological and Educational Sciences, Vrije Universiteit Brussel, Brussels, Belgium
  - Clinical and Lifespan Psychology, Department of Psychological and Educational Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Xavier Neyt
  - VIPER Research Unit, LIFE Department, Royal Military Academy, Brussels, Belgium
- Francis McGlone
  - School of Natural Sciences and Psychology, Faculty of Science, Liverpool John Moores University, Liverpool, United Kingdom
- Nathalie Pattyn
  - VIPER Research Unit, LIFE Department, Royal Military Academy, Brussels, Belgium
  - Brain, Body and Cognition, Experimental and Applied Psychology, Department of Psychological and Educational Sciences, Vrije Universiteit Brussel, Brussels, Belgium
  - MFYS-BLITS, Department of Human Physiology, Vrije Universiteit Brussel, Brussels, Belgium
|
12
|
Maor E, Sara JD, Orbelo DM, Lerman LO, Levanon Y, Lerman A. Voice Signal Characteristics Are Independently Associated With Coronary Artery Disease. Mayo Clin Proc 2018; 93:840-847. [PMID: 29656789 DOI: 10.1016/j.mayocp.2017.12.025] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 08/14/2017] [Revised: 11/30/2017] [Accepted: 12/19/2017] [Indexed: 11/24/2022]
Abstract
OBJECTIVE: Voice signal analysis is an emerging noninvasive diagnostic tool. The current study tested the hypothesis that patient voice signal characteristics are associated with the presence of coronary artery disease (CAD). METHODS: The study population included 138 patients who were enrolled between January 1, 2015, and February 28, 2017: 37 control subjects and 101 subjects who underwent planned coronary angiogram. All subjects had their voice signal recorded to their smartphone 3 times: reading a text, describing a positive emotional experience, and describing a negative emotional experience. The Mel Frequency Cepstral Coefficients were used to extract prespecified voice features from all 3 recordings. Voice was recorded before the angiogram and analysis was blinded with respect to patient data. RESULTS: Final study cohort included 101 patients, of whom 71 (71%) had CAD. Compared with subjects without CAD, patients with CAD were older (median, 63 years; interquartile range [IQR], 55-68 years vs median, 53 years; IQR, 42-66 years; P=.003) and had a higher 10-year atherosclerotic cardiovascular disease (ASCVD) risk score (9.4%; IQR, 5.0-18.7 vs 2.7%; IQR, 1.6-11.8; P=.005). Univariate binary logistic regression analysis identified 5 voice features that were associated with CAD (P<.05 for all). Multivariate binary logistic regression with adjustment for ASCVD risk score identified 2 voice features that were independently associated with CAD (odds ratio [OR], 0.37; 95% CI, 0.18-0.79; and 4.01; 95% CI, 1.25-12.84; P=.009 and P=.02, respectively). Both features were more strongly associated with CAD when patients were asked to describe an emotionally significant experience. CONCLUSION: This study suggests a potential relationship between voice characteristics and CAD, with clinical implications for telemedicine, when clinical health care is provided at a distance.
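Mel Frequency Cepstral Coefficients are built on the mel scale, which spaces frequency bands roughly the way pitch is perceived. The sketch below shows only that frequency-warping step, using one commonly cited conversion formula (2595 · log10(1 + f/700)). It is an illustrative assumption, not the feature extraction used in the study, and the helper names and band settings are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels using a commonly cited formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(f_min_hz, f_max_hz, n_bands):
    """Center frequencies (Hz) of n_bands filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Hypothetical settings: 10 bands between 0 Hz and 4000 Hz.
centers = mel_band_centers(0.0, 4000.0, 10)
print([round(c) for c in centers])
```

Because the bands are evenly spaced in mels, their centers in Hz sit close together at low frequencies and spread out at high frequencies. In a full MFCC pipeline, filterbank energies at these centers would then be log-compressed and passed through a discrete cosine transform.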
Affiliation(s)
- Elad Maor
- Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN
- Jaskanwal D Sara
- Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN
- Diana M Orbelo
- Department of Otorhinolaryngology, Mayo Clinic, Rochester, MN
- Lilach O Lerman
- Division of Nephrology and Hypertension, Mayo Clinic, Rochester, MN
- Amir Lerman
- Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN
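The odds ratios and confidence intervals reported in the abstract above follow the usual logistic regression relationship, OR = exp(beta), with a Wald interval exp(beta ± z·SE). The sketch below only illustrates that arithmetic using the first reported feature (OR 0.37; 95% CI, 0.18-0.79). The function names are hypothetical, and the standard error is back-calculated from the published interval under the assumption of a symmetric Wald interval on the log scale; it is not a value reported by the study.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def odds_ratio(beta):
    """Convert a logistic regression coefficient to an odds ratio."""
    return math.exp(beta)


def wald_ci(beta, se, z=Z_95):
    """Wald confidence interval for the odds ratio, built on the log-odds scale."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Back-calculate the coefficient's standard error from a reported OR interval.

    Assumes the interval is a symmetric Wald interval on the log scale.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


# First voice feature from the abstract: OR 0.37 (95% CI, 0.18-0.79).
beta = math.log(0.37)          # implied coefficient (negative: lower odds of CAD)
se = se_from_ci(0.18, 0.79)    # implied standard error (an approximation)
low, high = wald_ci(beta, se)

print(f"beta={beta:.3f}, SE={se:.3f}, reconstructed 95% CI=({low:.2f}, {high:.2f})")
```

The reconstructed interval will not match the published one exactly, because published bounds are rounded and need not be perfectly symmetric around log(OR).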
|
13
|
|
14
|
Yu C, Hansen JHL. A study of voice production characteristics of astronaut speech during Apollo 11 for speaker modeling in space. The Journal of the Acoustical Society of America 2017; 141:1605. [PMID: 28372057 DOI: 10.1121/1.4976048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Human physiology has evolved to accommodate environmental conditions, including temperature, pressure, and air chemistry unique to Earth. However, the environment in space varies significantly compared to that on Earth and, therefore, variability is expected in astronauts' speech production mechanism. In this study, the variations of astronaut voice characteristics during the NASA Apollo 11 mission are analyzed. Specifically, acoustical features such as fundamental frequency and phoneme formant structure that are closely related to the speech production system are studied. For a further understanding of astronauts' vocal tract spectrum variation in space, a maximum likelihood frequency warping based analysis is proposed to detect the vocal tract spectrum displacement during space conditions. The results from fundamental frequency, formant structure, as well as vocal spectrum displacement indicate that astronauts change their speech production mechanism when in space. Moreover, the experimental results for astronaut voice identification tasks indicate that current speaker recognition solutions are highly vulnerable to astronaut voice production variations in space conditions. Future recommendations from this study suggest that successful applications of speaker recognition during extended space missions require robust speaker modeling techniques that could effectively adapt to voice production variation caused by diverse space conditions.
Affiliation(s)
- Chengzhu Yu
- Center for Robust Speech Systems (CRSS), The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- John H L Hansen
- Center for Robust Speech Systems (CRSS), The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
|
15
|
Johannes B, Sitev AS, Vinokhodova AG, Salnitski VP, Savchenko EG, Artyukhova AE, Bubeev YA, Morukov BV, Tafforin C, Basner M, Dinges DF, Rittweger J. Wireless Monitoring of Changes in Crew Relations during Long-Duration Mission Simulation. PLoS One 2015; 10:e0134814. [PMID: 26252656 PMCID: PMC4529101 DOI: 10.1371/journal.pone.0134814] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/18/2015] [Accepted: 07/14/2015] [Indexed: 11/23/2022] Open
Abstract
Group structure and cohesion along with their changes over time play an important role in the success of missions where crew members spend prolonged periods of time under conditions of isolation and confinement. Therefore, an objective system for unobtrusive monitoring of crew cohesion and possible individual stress reactions is of high interest. For this purpose, an experimental wireless group structure (WLGS) monitoring system integrated into a mobile psychophysiological system was developed. In the presented study the WLGS module was evaluated separately in six male subjects (27-38 years old) participating in a 520-day simulated mission to Mars. Two days per week, each crew member wore a small sensor that registered the presence and distance of the sensors either worn by the other subjects or strategically placed throughout the isolation facility. The registration between two sensors was on average 91.0% in accordance. A correspondence of 95.7% with the survey video on day 475 confirmed external reliability. An integrated score of the "crew relation time index" was calculated and analyzed over time. Correlation analyses of a sociometric questionnaire (r = .35-.55, p < .05) and an ethological group approach (r = .45-.66, p < .05) provided initial evidence of the method's validity as a measure of cohesion when taking behavioral and activity patterns into account (e.g. only including activity phases in the afternoon). This confirms our assumption that the registered amount of time spent together during free time is associated with the intensity of personal relationships.
Affiliation(s)
- Bernd Johannes
- Division of Space Physiology, Institute of Aerospace Medicine, German Aerospace Center (DLR), Cologne, Germany
- Alexej S. Sitev
- Division of Psychophysiology and Neurophysiology of operator’s activity, State Research Center of Russian Federation, Institute for Biomedical Problems RAS, Moscow, Russia
- Alla G. Vinokhodova
- Division of Psychophysiology and Neurophysiology of operator’s activity, State Research Center of Russian Federation, Institute for Biomedical Problems RAS, Moscow, Russia
- Vyacheslav P. Salnitski
- Division of Psychophysiology and Neurophysiology of operator’s activity, State Research Center of Russian Federation, Institute for Biomedical Problems RAS, Moscow, Russia
- Eduard G. Savchenko
- Division of Psychophysiology and Neurophysiology of operator’s activity, State Research Center of Russian Federation, Institute for Biomedical Problems RAS, Moscow, Russia
- Anna E. Artyukhova
- Division of Psychophysiology and Neurophysiology of operator’s activity, State Research Center of Russian Federation, Institute for Biomedical Problems RAS, Moscow, Russia
- Yuri A. Bubeev
- Division of Psychophysiology and Neurophysiology of operator’s activity, State Research Center of Russian Federation, Institute for Biomedical Problems RAS, Moscow, Russia
- Boris V. Morukov
- State Research Center of Russian Federation, Institute for Biomedical Problems RAS, Moscow, Russia
- Carole Tafforin
- Research and Study Group in Human and Space Ethology, Ethospace, Toulouse, France
- Mathias Basner
- Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- David F. Dinges
- Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Jörn Rittweger
- Division of Space Physiology, Institute of Aerospace Medicine, German Aerospace Center (DLR), Cologne, Germany
|
16
|
Järvinen K, Laukkanen AM. Vocal Loading in Speaking a Foreign Language. Folia Phoniatr Logop 2015; 67:1-7. [PMID: 25925665 DOI: 10.1159/000381183] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/03/2014] [Accepted: 02/23/2015] [Indexed: 11/19/2022] Open
Abstract
AIMS: This study investigated whether speaking a foreign language affects the subjective notions of vocal fatigue, and whether acoustic measurements reveal a higher vocal loading. METHODS: The speech samples of 20 native Finnish-speaking and 23 native English-speaking subjects were recorded in Finnish and in English. From the speech samples, fundamental frequency, equivalent sound level, total duration of voiced speech, speech rate, alpha ratio and L1-L0 level difference were analyzed. Vocal doses were calculated. RESULTS: According to subjective notions, the voice gets tired more quickly when speaking a foreign language. The mean fundamental frequency increased but the speech rate and total duration of voiced speech decreased significantly when speaking a foreign language. Thus, the vocal doses decreased. CONCLUSIONS: The subjective sensations of increased vocal fatigue may be due to increased mental stress rather than to higher vocal loading. However, a trend that speaking a foreign language may involve more loading was found in L1-L0 level difference and in the doses normalized to time dose. Longer speech samples should be studied. Voice quality-based indicators of vocal loading are worth testing in addition to the measures based on the amount of voicing in speech.
Affiliation(s)
- Kati Järvinen
- Speech and Voice Research Laboratory, School of Education, University of Tampere, Tampere, Finland
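The abstract above notes that vocal doses fell even though mean fundamental frequency rose, because the amount of voiced speech dropped. The sketch below illustrates how that can happen, assuming the commonly used definitions of time dose (total voicing time) and cycle dose (number of vocal fold oscillations, i.e. F0 summed over voiced time). The abstract does not say which doses were computed or how, so these definitions, the function name `vocal_doses`, and the toy F0 tracks are all assumptions for illustration.

```python
def vocal_doses(f0_track, frame_s):
    """Return (time_dose_seconds, cycle_dose_cycles) for a frame-wise F0 track.

    f0_track: F0 in Hz for each analysis frame; 0 marks an unvoiced frame.
    frame_s:  frame duration in seconds.
    """
    voiced = [f0 for f0 in f0_track if f0 > 0]
    time_dose = len(voiced) * frame_s                 # seconds of voicing
    cycle_dose = sum(f0 * frame_s for f0 in voiced)   # accumulated oscillation cycles
    return time_dose, cycle_dose


# Made-up 10 s samples with 10 ms frames (not data from the study).
native = [200.0] * 600 + [0.0] * 400    # more voicing, lower F0
foreign = [210.0] * 500 + [0.0] * 500   # less voicing, higher F0

native_time, native_cycles = vocal_doses(native, 0.01)
foreign_time, foreign_cycles = vocal_doses(foreign, 0.01)

# Cycle dose normalized to time dose recovers the mean F0 during voicing.
native_rate = native_cycles / native_time
foreign_rate = foreign_cycles / foreign_time

print(native_time, native_cycles, native_rate)
print(foreign_time, foreign_cycles, foreign_rate)
```

In this toy case both raw doses are lower for the foreign-language sample, while the dose normalized to time dose is higher, which mirrors the direction of the trend described in the conclusions.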
|
17
|
A methodology to compensate for individual differences in psychophysiological assessment. Biol Psychol 2013; 96:77-85. [PMID: 24315952 DOI: 10.1016/j.biopsycho.2013.11.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 01/16/2013] [Revised: 09/10/2013] [Accepted: 11/08/2013] [Indexed: 11/21/2022]
Abstract
The main methodological drawback to use physiological measures as indicators of arousal is the large interindividual variability of autonomic responses hindering the direct comparability between individuals. The present methodology has been tested in two cohorts (n1=910, n2=845) of pilot applicants during a selection procedure. Physiological data were obtained during two mentally demanding tasks and during a Flight Simulator Test. Five typical Autonomic Response Patterns (ARP) were identified by cluster analyses. Autonomic spaces were constructed separately for each group of subjects having the same typical ARP, on the basis of their normalized eigenvectors. The length of the vector sum of scores on autonomic space dimensions provided an integral index for arousal, labeled Psychophysiological Arousal Value (PAV). The PAV still reflected the changes in mental load during the tests, but equalized physiological differences among ARP-groups. The results obtained in the first cohort were verified in the second cohort.
|
18
|
Weusthoff S, Baucom BR, Hahlweg K. The siren song of vocal fundamental frequency for romantic relationships. Front Psychol 2013; 4:439. [PMID: 23874321 PMCID: PMC3710992 DOI: 10.3389/fpsyg.2013.00439] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 03/28/2013] [Accepted: 06/25/2013] [Indexed: 11/13/2022] Open
Abstract
A multitude of factors contribute to why and how romantic relationships are formed as well as whether they ultimately succeed or fail. Drawing on evolutionary models of attraction and speech production as well as integrative models of relationship functioning, this review argues that paralinguistic cues (more specifically the fundamental frequency of the voice) that are initially a strong source of attraction also increase couples' risk for relationship failure. Conceptual similarities and differences between the multiple operationalizations and interpretations of vocal fundamental frequency are discussed and guidelines are presented for understanding both convergent and non-convergent findings. Implications for clinical practice and future research are discussed.
Affiliation(s)
- Sarah Weusthoff
- Clinical Psychology, Psychotherapy, and Assessment, Department of Psychology, Technische Universität Braunschweig Braunschweig, Germany
|
19
|
Alvear RMBD, Barón-López FJ, Alguacil MD, Dawid-Milner MS. Interactions between voice fundamental frequency and cardiovascular parameters. Preliminary results and physiological mechanisms. Logop Phoniatr Voco 2012; 38:52-8. [PMID: 22741554 DOI: 10.3109/14015439.2012.696140] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
OBJECTIVES: To determine heart rate influence on voice fundamental frequency under stress conditions. METHODS: In 14 healthy volunteers, heart rate and blood pressure variables were analyzed during three classical autonomic tasks. Sustained voice samples were obtained to analyze F0. RESULTS: Cold pressure test increased mean blood pressure, without effect on heart rate; isometric and mental tasks increased heart rate and blood pressure. Voice F0 was only affected by mental and cold ice tasks; it significantly correlated with the heart rate that occurred before and during every vocal emission. DISCUSSION: Cardiovascular changes showed that subjects were significantly stressed during autonomic tasks. Heartbeat variations had a regular and significant influence on phonatory frequency, and this effect occurred during baseline and stress conditions.
Affiliation(s)
- Rosa M Bermúdez de Alvear
- Radiology and Physical Medicine, Ophthalmology and Otorhinolaryngology Department, Medical Faculty, Malaga University, Spain.
|
20
|
Godin KW, Hansen JHL. Analysis of the effects of physical task stress on the speech signal. The Journal of the Acoustical Society of America 2011; 130:3992-3998. [PMID: 22225053 DOI: 10.1121/1.3647301] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
Physical task stress is known to affect the fundamental frequency and other measurements of the speech signal. A corpus of physical task stress speech is analyzed using a spectrum F-ratio and frame score distribution divergences. The measurements differ between phone classes, and are greater for vowels and nasals than for plosives and fricatives. In further analysis, frame score distribution divergences are used to measure the spectral dissimilarity between neutral and physical task stress speech. Frame scores are the log likelihood ratios between Gaussian mixture models (GMMs) of physical task stress and of neutral speech. Mel-frequency cepstral coefficients are used as the acoustic feature inputs to the GMMs. A Laplacian distribution is fitted to the frame scores for each of ten phone classes, and the symmetric Kullback-Leibler divergence is employed to measure the change in distribution from neutral to physical task stress. The results suggest that the spectral dissimilarity is greatest for the second level of a four level exertion measurement, and that spectral dissimilarity is greater for nasal phones than for plosives and fricatives. Further, the results suggest that different phone classes are affected differently by physical task stress.
Affiliation(s)
- Keith W Godin
- Center for Robust Speech Systems (CRSS), The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
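The abstract above describes fitting a Laplacian distribution to frame scores and comparing conditions with the symmetric Kullback-Leibler divergence. A minimal sketch of that last step follows, assuming a maximum-likelihood Laplace fit (location = median, scale = mean absolute deviation from the median) and the closed-form KL divergence between two Laplace distributions. The GMM log-likelihood-ratio scoring is not reproduced here; the frame scores below are made-up numbers, and the function names are hypothetical rather than taken from the paper.

```python
import math
import statistics


def fit_laplace(samples):
    """Maximum-likelihood Laplace fit: returns (location, scale)."""
    mu = statistics.median(samples)
    b = sum(abs(x - mu) for x in samples) / len(samples)
    return mu, b


def kl_laplace(mu1, b1, mu2, b2):
    """KL( Laplace(mu1, b1) || Laplace(mu2, b2) ) in closed form."""
    d = abs(mu1 - mu2)
    return math.log(b2 / b1) + d / b2 + (b1 / b2) * math.exp(-d / b1) - 1.0


def symmetric_kl(p, q):
    """Symmetric KL divergence: KL(p||q) + KL(q||p) for (location, scale) pairs."""
    return kl_laplace(*p, *q) + kl_laplace(*q, *p)


# Toy frame scores for one phone class (illustrative values only).
neutral_scores = [-1.2, -0.4, 0.0, 0.3, 0.9]
stress_scores = [0.5, 1.1, 1.6, 2.4, 3.0]

neutral_fit = fit_laplace(neutral_scores)
stress_fit = fit_laplace(stress_scores)
divergence = symmetric_kl(neutral_fit, stress_fit)

print(neutral_fit, stress_fit, divergence)
```

Repeating this per phone class would give one divergence value per class, which is the kind of quantity the abstract compares across vowels, nasals, plosives, and fricatives.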
|
21
|
Giddens CL, Barron KW, Clark KF, Warde WD. Beta-adrenergic blockade and voice: a double-blind, placebo-controlled trial. J Voice 2009; 24:477-89. [PMID: 19846273 DOI: 10.1016/j.jvoice.2008.12.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 09/20/2008] [Accepted: 12/02/2008] [Indexed: 11/26/2022]
Abstract
This study investigated the effects of laboratory-induced stress and beta-adrenergic blockade on acoustic and aerodynamic voice measures. In a double-blind, placebo-controlled trial, 12 participants, six males and six females, underwent cold pressor-induced sympathetic activation followed by placebo or treatment with 40 mg propranolol. Aerodynamic and acoustic parameters of voice were collected at baseline, during cold pressor and after treatment with propranolol or placebo. Fundamental frequency, jitter, shimmer, maximum airflow declination rate, voice onset time, speaking rate, and subglottal pressure were measured at baseline, during cold pressor-induced stress, and after treatment with propranolol or placebo. Cardiovascular measures served as indicators of sympathetic nervous system (SNS) activation by cold pressor and antagonism by propranolol, and were collected during all conditions. Cold pressor appeared to adequately agonize the SNS as indicated by significant increases in resting systolic and diastolic blood pressure and heart rate. Propranolol appeared to adequately antagonize the SNS for the participants. Jitter ratio demonstrated a statistically significant increase in the participants treated with propranolol. Speaking rate demonstrated a small but significant increase in the placebo control group during cold pressor. Gender differences were observed in a few measures. Cold pressor adequately agonized and propranolol adequately antagonized the SNS. No statistically significant differences across subjects were observed in the voice parameters during cold pressor-induced stress before treatment. Jitter ratio increased significantly during propranolol treatment and cold pressor. Speaking rate demonstrated a statistically significant increase during cold pressor in the placebo control group. Gender differences were observed, but were few.
Affiliation(s)
- Cheryl L Giddens
- Communication Sciences & Disorders, Oklahoma State University, Stillwater, Oklahoma, USA.
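Jitter and shimmer, two of the acoustic measures listed in the abstract above, are cycle-to-cycle perturbation measures of pitch period and amplitude. The sketch below shows one common way to compute them, as the mean absolute difference between consecutive cycles divided by the mean value. The scaling (×1000 for jitter ratio, ×100 for shimmer percent) is a widely used convention but is an assumption here, since the abstract does not define its measures; the function names and input values are illustrative only.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def jitter_ratio(periods):
    """Period perturbation scaled by 1000 (assumed convention)."""
    return 1000.0 * relative_perturbation(periods)


def shimmer_percent(amplitudes):
    """Amplitude perturbation expressed as a percentage (assumed convention)."""
    return 100.0 * relative_perturbation(amplitudes)


# Made-up glottal cycle measurements (periods in ms, peak amplitudes in arbitrary units).
periods_ms = [5.0, 5.1, 4.9, 5.0]
amplitudes = [1.00, 0.98, 1.02, 1.00]

jr = jitter_ratio(periods_ms)
sh = shimmer_percent(amplitudes)

print(jr, sh)
```

A perfectly periodic signal gives zero for both measures, so larger values indicate less stable phonation from one cycle to the next.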
|