1. Thoret E, Andrillon T, Gauriau C, Léger D, Pressnitzer D. Sleep deprivation detected by voice analysis. PLoS Comput Biol 2024;20:e1011849. PMID: 38315733; PMCID: PMC10890756; DOI: 10.1371/journal.pcbi.1011849.
Abstract
Sleep deprivation has an ever-increasing impact on individuals and societies. Yet, to date, there is no quick and objective test for sleep deprivation. Here, we used automated acoustic analyses of the voice to detect sleep deprivation. Building on current machine-learning approaches, we focused on interpretability by introducing two novel ideas: the use of a fully generic auditory representation as input feature space, combined with an interpretation technique based on reverse correlation. The auditory representation consisted of a spectro-temporal modulation analysis derived from neurophysiology. The interpretation method aimed to reveal the regions of the auditory representation that supported the classifiers' decisions. Results showed that generic auditory features could be used to detect sleep deprivation successfully, with an accuracy comparable to state-of-the-art speech features. Furthermore, the interpretation revealed two distinct effects of sleep deprivation on the voice: changes in slow temporal modulations related to prosody and changes in spectral features related to voice quality. Importantly, the relative balance of the two effects varied widely across individuals, even though the amount of sleep deprivation was controlled, thus confirming the need to characterize sleep deprivation at the individual level. Moreover, while the prosody factor correlated with subjective sleepiness reports, the voice quality factor did not, consistent with the presence of both explicit and implicit consequences of sleep deprivation. Overall, the findings show that individual effects of sleep deprivation may be observed in vocal biomarkers. Future investigations correlating such markers with objective physiological measures of sleep deprivation could enable "sleep stethoscopes" for the cost-effective diagnosis of the individual effects of sleep deprivation.
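The reverse-correlation interpretation described above can be illustrated with a minimal sketch: randomly mask parts of the input feature space, record the classifier's decision score, and credit each feature region by how much its visibility co-varies with the score. Everything below (the function name, the toy classifier, the masking probability) is a hypothetical illustration of the general technique, not the authors' implementation.

```python
import random

def reverse_correlate(classify, n_features, n_trials=2000, keep_prob=0.5, seed=0):
    """Estimate how strongly each feature region supports a classifier's
    decision: randomly mask features, then compare the mean decision score
    when a region is visible against the overall mean score.
    `classify` maps a binary mask (list of 0/1) to a scalar decision score."""
    rng = random.Random(seed)
    sums = [0.0] * n_features   # sum of scores on trials where feature i is visible
    counts = [0] * n_features
    total, n = 0.0, 0
    for _ in range(n_trials):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(n_features)]
        score = classify(mask)
        total += score
        n += 1
        for i, visible in enumerate(mask):
            if visible:
                sums[i] += score
                counts[i] += 1
    mean = total / n
    # Diagnostic weight: mean score when visible minus overall mean score.
    return [sums[i] / counts[i] - mean if counts[i] else 0.0
            for i in range(n_features)]

# Toy classifier that only "listens" to feature regions 2 and 3.
weights = reverse_correlate(lambda m: float(m[2] + m[3]), n_features=6)
```

Regions the classifier actually uses come out with large positive weights; irrelevant regions hover near zero.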
Affiliation(s)
- Etienne Thoret
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, Paris, France
- Aix-Marseille University, CNRS, Institut de Neurosciences de la Timone (INT) UMR7289, Perception Representation Image Sound Music (PRISM) UMR7061, Laboratoire d’Informatique et Systèmes (LIS) UMR7020, Marseille, France
- Institute of Language Communication and the Brain, Aix-Marseille University, Marseille, France
- Thomas Andrillon
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, Mov’it team, Inserm, CNRS, Paris, France
- Université Paris Cité, VIFASOM, ERC 7330, Vigilance Fatigue Sommeil et santé publique, Paris, France
- APHP, Hôtel-Dieu, Centre du Sommeil et de la Vigilance, Paris, France
- Caroline Gauriau
- Université Paris Cité, VIFASOM, ERC 7330, Vigilance Fatigue Sommeil et santé publique, Paris, France
- APHP, Hôtel-Dieu, Centre du Sommeil et de la Vigilance, Paris, France
- Damien Léger
- Université Paris Cité, VIFASOM, ERC 7330, Vigilance Fatigue Sommeil et santé publique, Paris, France
- APHP, Hôtel-Dieu, Centre du Sommeil et de la Vigilance, Paris, France
- Daniel Pressnitzer
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, Paris, France
Collapse
|
2
|
Virk JS, Singh M, Singh M, Panjwani U, Ray K. A Multimodal Feature Fusion Framework for Sleep-Deprived Fatigue Detection to Prevent Accidents. SENSORS (BASEL, SWITZERLAND) 2023; 23:4129. [PMID: 37112470 PMCID: PMC10144633 DOI: 10.3390/s23084129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/20/2023] [Revised: 04/16/2023] [Accepted: 04/18/2023] [Indexed: 06/19/2023]
Abstract
A sleep-deprived, fatigued person is likely to commit more errors, some of which may even prove fatal, so it is necessary to recognize this fatigue. The novelty of the proposed detection approach is that it is nonintrusive and based on multimodal feature fusion. In the proposed methodology, fatigue is detected from features obtained in four domains: visual images, thermal images, keystroke dynamics, and voice. Samples from each volunteer (subject) are obtained in all four domains for feature extraction, and empirical weights are assigned to the four domains. Young, healthy volunteers (n = 60) aged 20 to 30 years participated in the experimental study; they abstained from alcohol, caffeine, and other drugs that could affect their sleep pattern during the study. Through this multimodal technique, appropriate weights are given to the features obtained from the four domains. The results are compared with k-nearest neighbors (kNN), support vector machine (SVM), random tree, random forest, and multilayer perceptron classifiers. The proposed nonintrusive technique obtained an average detection accuracy of 93.33% in 3-fold cross-validation.
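Late fusion with empirically assigned weights, as described above, can be sketched as a weighted average of per-modality scores. The modality names, weights, and decision threshold below are illustrative placeholders, not the values used in the study.

```python
def fuse_fatigue_scores(scores, weights):
    """Weighted late fusion of per-modality fatigue scores in [0, 1].
    `scores` and `weights` are dicts keyed by modality name; weights are
    normalized over the modalities present, so the fused score stays in [0, 1]."""
    common = scores.keys() & weights.keys()
    total_w = sum(weights[k] for k in common)
    return sum(scores[k] * weights[k] for k in common) / total_w

# Hypothetical empirical weights for the four domains (not the paper's values).
weights = {"visual": 0.35, "thermal": 0.25, "keystroke": 0.20, "voice": 0.20}
scores = {"visual": 0.8, "thermal": 0.6, "keystroke": 0.9, "voice": 0.4}

fused = fuse_fatigue_scores(scores, weights)
is_fatigued = fused >= 0.5  # illustrative decision threshold
```

Normalizing over the modalities actually present also lets the fusion degrade gracefully when one sensor is unavailable.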
Affiliation(s)
- Jitender Singh Virk
- EIE Department, Thapar Institute of Engineering and Technology, Patiala 147001, India
- Mandeep Singh
- EIE Department, Thapar Institute of Engineering and Technology, Patiala 147001, India
- Mandeep Singh
- EIE Department, Thapar Institute of Engineering and Technology, Patiala 147001, India
- Usha Panjwani
- DIPAS, Defence Research and Development Organisation, Delhi 110054, India
- Koushik Ray
- DIPAS, Defence Research and Development Organisation, Delhi 110054, India
3. Zhao Q, Fan HZ, Li YL, Liu L, Wu YX, Zhao YL, Tian ZX, Wang ZR, Tan YL, Tan SP. Vocal Acoustic Features as Potential Biomarkers for Identifying/Diagnosing Depression: A Cross-Sectional Study. Front Psychiatry 2022;13:815678. PMID: 35573349; PMCID: PMC9095973; DOI: 10.3389/fpsyt.2022.815678.
Abstract
BACKGROUND At present, there is no established biomarker for the diagnosis of depression, while studies show that acoustic features convey emotional information. This study therefore explored differences in acoustic characteristics between depressed patients and healthy individuals to investigate whether these characteristics can identify depression. METHODS Participants included 71 patients diagnosed with depression at a regional hospital in Beijing, China, and 62 normal controls from the greater community. We assessed the clinical symptoms of depression in all participants using the Hamilton Depression Scale (HAMD), the Hamilton Anxiety Scale (HAMA), and the Patient Health Questionnaire (PHQ-9), and recorded each participant's voice as they read positive, neutral, and negative texts. openSMILE was used to extract acoustic characteristics from the recordings. RESULTS There were significant differences between the depression and control groups in all acoustic characteristics (p < 0.05). Several mel-frequency cepstral coefficients (MFCCs), including MFCC2, MFCC3, MFCC8, and MFCC9, differed significantly between emotion tasks; MFCC4 and MFCC7 correlated positively with PHQ-9 scores, and these correlations were stable across all emotion tasks. The zero-crossing rate in the positive-emotion task correlated positively with the HAMA total score and the HAMA somatic anxiety score (r = 0.31 and r = 0.34, respectively), and MFCC9 in the neutral-emotion task correlated negatively with the HAMD anxiety/somatization score (r = -0.34). Linear regression showed that MFCC7 in the negative-emotion task predicted the PHQ-9 score (β = 0.90, p = 0.01) and that MFCC9 in the neutral-emotion task predicted the HAMD anxiety/somatization score (β = -0.45, p = 0.049). Logistic regression showed a superior discriminant effect, with a discrimination accuracy of 89.66%. CONCLUSION The acoustic expression of emotion among patients with depression differs from that of normal controls. Some acoustic characteristics are related to the severity of depressive symptoms and may serve as objective biomarkers of depression. A systematic method of assessing vocal acoustic characteristics could provide an accurate and discreet means of screening for depression, used instead of, or in conjunction with, traditional screening methods, as it is not subject to the limitations of self-reported assessments, in which subjects may provide socially acceptable rather than truthful responses.
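The feature-score correlations reported above are ordinary Pearson correlations. As a worked illustration (with made-up numbers, not study data), the coefficient can be computed directly:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-participant values: one MFCC summary statistic and the
# corresponding PHQ-9 total (illustrative numbers only).
mfcc7 = [1.2, 0.8, 1.9, 2.4, 0.5, 1.7, 2.1, 0.9]
phq9  = [9,   6,   14,  18,  4,   12,  16,  7]

r = pearson_r(mfcc7, phq9)
```

A coefficient near +1 would indicate that higher values of the acoustic feature track higher depression scores, as reported for MFCC4 and MFCC7.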
Affiliation(s)
- Qing Zhao
- Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
- Hong-Zhen Fan
- Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
- Yan-Li Li
- Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
- Lei Liu
- Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
- Ya-Xue Wu
- Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
- Yan-Li Zhao
- Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
- Zhan-Xiao Tian
- Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
- Zhi-Ren Wang
- Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
- Yun-Long Tan
- Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
- Shu-Ping Tan
- Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
4. Martin VP, Rouas JL, Micoulaud-Franchi JA, Philip P, Krajewski J. How to Design a Relevant Corpus for Sleepiness Detection Through Voice? Front Digit Health 2021;3:686068. PMID: 34713156; PMCID: PMC8521834; DOI: 10.3389/fdgth.2021.686068.
Abstract
This article presents research on the detection, through automatic analysis, of pathologies that affect speech. Voice processing has indeed been used to evaluate several conditions such as Parkinson's disease, Alzheimer's disease, and depression. While some studies present results that seem sufficient for clinical applications, this is not the case for the detection of sleepiness: even two international challenges and the recent advent of deep learning techniques have not managed to change this situation. This article explores the hypothesis that the modest average performance of automatic processing stems from the design of the corpora. To this aim, we first discuss and refine the concept of sleepiness in relation to the ground-truth labels. Second, we present an in-depth study of four corpora, bringing to light the methodological choices that were made and the biases they may have induced. Finally, in light of this information, we propose guidelines for the design of new corpora.
Affiliation(s)
- Vincent P. Martin
- Laboratoire Bordelais de Recherche en Informatique, University of Bordeaux, CNRS–UMR 5800, Bordeaux INP, Talence, France
- Jean-Luc Rouas
- Laboratoire Bordelais de Recherche en Informatique, University of Bordeaux, CNRS–UMR 5800, Bordeaux INP, Talence, France
- Pierre Philip
- Sommeil, Addiction et Neuropsychiatrie, University of Bordeaux, CNRS–USR 3413, CHU Pellegrin, Bordeaux, France
- Jarek Krajewski
- Engineering Psychology, Rhenish University of Applied Science, Cologne, Germany
5. Kaduk SI, Roberts APJ, Stanton NA. The circadian effect on psychophysiological driver state monitoring. Theoretical Issues in Ergonomics Science 2020. DOI: 10.1080/1463922x.2020.1842548.
Affiliation(s)
- Sylwia I. Kaduk
- Human Factors Engineering, Transportation Research Group, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, United Kingdom
- Aaron P. J. Roberts
- Human Factors Engineering, Transportation Research Group, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, United Kingdom
- Neville A. Stanton
- Human Factors Engineering, Transportation Research Group, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, United Kingdom
6. Voleti R, Liss JM, Berisha V. A Review of Automated Speech and Language Features for Assessment of Cognitive and Thought Disorders. IEEE Journal of Selected Topics in Signal Processing 2020;14:282-298. PMID: 33907590; PMCID: PMC8074691; DOI: 10.1109/jstsp.2019.2952087.
Abstract
It is widely accepted that information derived from analyzing speech (the acoustic signal) and language production (words and sentences) serves as a useful window into the health of an individual's cognitive ability. In fact, most neuropsychological testing batteries have a component related to speech and language in which clinicians elicit speech from patients for subjective evaluation across a broad set of dimensions. With advances in speech signal processing and natural language processing, there has been recent interest in developing tools to detect more subtle changes in cognitive-linguistic function. This work relies on extracting a set of features from recorded and transcribed speech for objective assessment of speech and language, early diagnosis of neurological disease, and tracking of disease after diagnosis. With an emphasis on cognitive and thought disorders, in this paper we provide a review of existing speech and language features used in this domain, discuss their clinical application, and highlight their advantages and disadvantages. Broadly speaking, the review is split into two categories: language features based on natural language processing and speech features based on speech signal processing. Within each category, we consider features that aim to measure complementary dimensions of cognitive-linguistic function, including language diversity, syntactic complexity, semantic coherence, and timing. We conclude the review with a proposal of new research directions to further advance the field.
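One of the simplest language-diversity features covered by such reviews is the type-token ratio. A minimal sketch (a generic illustration, not code from the review) of plain and windowed variants:

```python
def type_token_ratio(tokens):
    """Lexical diversity: number of unique word types divided by total tokens."""
    return len(set(tokens)) / len(tokens)

def moving_average_ttr(tokens, window=4):
    """Windowed TTR, less sensitive to transcript length than plain TTR:
    average the TTR over every contiguous window of `window` tokens."""
    ttrs = [type_token_ratio(tokens[i:i + window])
            for i in range(len(tokens) - window + 1)]
    return sum(ttrs) / len(ttrs)

sample = "the cat sat on the mat and the dog sat too".lower().split()
ttr = type_token_ratio(sample)          # 8 unique types over 11 tokens
ma_ttr = moving_average_ttr(sample)
```

Plain TTR shrinks as transcripts get longer, which is why length-robust variants such as the windowed average are preferred in practice.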
Affiliation(s)
- Rohit Voleti
- School of Electrical, Computer, & Energy Engineering, Arizona State University, Tempe, AZ, 85281 USA
7. Boyer S, Paubel PV, Ruiz R, El Yagoubi R, Daurat A. Human Voice as a Measure of Mental Load Level. Journal of Speech, Language, and Hearing Research 2018;61:2722-2734. PMID: 30383160; DOI: 10.1044/2018_jslhr-s-18-0066.
Abstract
PURPOSE The aim of this study was to determine a reliable and efficient set of acoustic parameters of the human voice able to estimate individuals' mental load level. Implementing detection methods and real-time analysis of mental load is a major challenge for monitoring and enhancing human task performance, especially during high-risk activities (e.g., flying aircraft). METHOD The voices of 32 participants were recorded during a cognitive task featuring word list recall. The difficulty of the task was manipulated by varying the number of words in each list (i.e., between 1 and 7, corresponding to 7 mental load conditions). Evoked pupillary response, known to be a useful proxy of mental load, was recorded simultaneously with speech to attest variations in mental load level during the experimental task. RESULTS Classic features (fundamental frequency, its standard deviation, number of periods) and original features (frequency modulation and short-term variation in digital amplitude length) of the acoustic signals were predictive of memory load condition. They varied significantly according to the number of words to recall, specifically beyond a threshold of 3-5 words to recall, that is, when memory performance started to decline. CONCLUSIONS Some acoustic parameters of the human voice could be an appropriate and efficient means for detecting mental load levels.
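Several of the classic features mentioned above derive from the fundamental frequency (F0). As a toy sketch only (synthetic tone, illustrative parameter choices, not the authors' analysis pipeline), one standard way to estimate F0 is to locate the autocorrelation peak within a plausible pitch range:

```python
import math

def estimate_f0(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Crude F0 estimate: find the lag with maximum autocorrelation within
    the plausible pitch-period range for speech (fmin..fmax Hz)."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag

# Synthetic 200 Hz tone sampled at 8 kHz (0.2 s of signal).
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(1600)]
f0 = estimate_f0(tone, sr)
```

Real voiced speech needs framing, windowing, and voicing detection on top of this, but the autocorrelation peak is the core of many F0 trackers.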
Affiliation(s)
- Stanislas Boyer
- Cognition, Languages, Language, Ergonomics-Work & Cognition Laboratory (CLLE-LTC), University of Toulouse and Centre National de la Recherche Scientifique, France
- Pierre-Vincent Paubel
- Cognition, Languages, Language, Ergonomics-Work & Cognition Laboratory (CLLE-LTC), University of Toulouse and Centre National de la Recherche Scientifique, France
- Robert Ruiz
- Audiovisual Research Laboratory (LARA), University of Toulouse, France
- Radouane El Yagoubi
- Cognition, Languages, Language, Ergonomics-Work & Cognition Laboratory (CLLE-LTC), University of Toulouse and Centre National de la Recherche Scientifique, France
- Agnès Daurat
- Cognition, Languages, Language, Ergonomics-Work & Cognition Laboratory (CLLE-LTC), University of Toulouse and Centre National de la Recherche Scientifique, France
8. Li L, Ngan CK. A weight-adjusted-voting framework on an ensemble of classifiers for improving sensitivity. Intell Data Anal 2017. DOI: 10.3233/ida-163184.
Affiliation(s)
- Lin Li
- Department of Computer Science and Software Engineering, Seattle University, Seattle, WA 98122, USA
- Chun-Kit Ngan
- Division of Engineering and Information Science, The Pennsylvania State University, Malvern, PA 19355, USA
9. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System. ScientificWorldJournal 2015;2015:573068. PMID: 26346654; PMCID: PMC4539500; DOI: 10.1155/2015/573068.
Abstract
This paper discusses the impact of the classification method and of feature selection on speech emotion recognition accuracy. Selecting the correct parameters in combination with the classifier is an important step in reducing the computational complexity of the system, which is necessary especially for systems to be deployed in real-time applications. The motivation for developing and improving speech emotion recognition systems is their wide applicability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured for different selections of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and feature groups for stress detection in human speech. The research contribution lies in the design of a speech emotion recognition system that balances accuracy and efficiency.
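Of the classifiers compared above, k-nearest neighbours is the easiest to sketch. The toy feature vectors and labels below are hypothetical, chosen only to show the mechanics of the distance-based majority vote:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.
    `train` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D features (e.g., normalized mean F0 and mean energy) with
# hypothetical labels; not data from the Berlin database.
train = [((0.90, 0.80), "stressed"), ((0.80, 0.90), "stressed"),
         ((0.85, 0.75), "stressed"),
         ((0.20, 0.10), "neutral"), ((0.10, 0.20), "neutral"),
         ((0.15, 0.25), "neutral")]

label = knn_predict(train, (0.80, 0.80), k=3)
```

In a real system the feature vectors would be the selected prosodic, spectral, and voice-quality features, and k would be tuned by cross-validation.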
10. Zhou Y, Zhao H, Pan X, Shang L. Deception detecting from speech signal using relevance vector machine and non-linear dynamics features. Neurocomputing 2015. DOI: 10.1016/j.neucom.2014.04.083.
11. Schuller B, Steidl S, Batliner A, Schiel F, Krajewski J, Weninger F, Eyben F. Medium-term speaker states: a review on intoxication, sleepiness and the first challenge. Comput Speech Lang 2014. DOI: 10.1016/j.csl.2012.12.002.
12. Caraty MJ, Montacié C. Vocal fatigue induced by prolonged oral reading: Analysis and detection. Comput Speech Lang 2014. DOI: 10.1016/j.csl.2012.12.003.
13. Montero Benavides A, Fernández Pozo R, Toledano DT, Blanco Murillo JL, López Gonzalo E, Hernández Gómez L. Analysis of voice features related to obstructive sleep apnoea and their application in diagnosis support. Comput Speech Lang 2014. DOI: 10.1016/j.csl.2013.08.002.
14. Sustainable Reduction of Sleepiness through Salutogenic Self-Care Procedure in Lunch Breaks: A Pilot Study. Evidence-Based Complementary and Alternative Medicine 2014;2013:387356. PMID: 24381633; PMCID: PMC3870120; DOI: 10.1155/2013/387356.
Abstract
The aim of the study was to elucidate the immediate, intermediate, and anticipatory sleepiness-reducing effects of a salutogenic self-care procedure, progressive muscle relaxation (PMR), during lunch breaks. A second, exploratory aim was to determine the onset and long-term time course of changes in sleepiness. To evaluate the intraday range and interday change of the proposed relaxation effects, 14 call center agents were assigned to either a daily 20-minute self-administered PMR group or a small-talk (ST) group over a period of seven months. Participants' levels of sleepiness were analyzed in a controlled trial using anticipatory, post-lunchtime, and afternoon changes in sleepiness, as indicated by continuously recorded objective reaction-time measures (16,464 measurements) and by self-reports administered five times per day, once per month (490 measurements). The results indicate that, in comparison with ST, the PMR break (a) induces immediate, intermediate, and anticipatory reductions in sleepiness, and (b) these significant effects appear after one month, with sleepiness continuing to decrease for at least another five months. Although further research is required to identify the specific mediating variables responsible, our results suggest that relaxation-based lunch breaks are both accepted by employees and have a sustained impact on sleepiness.
15. An automated optimal engagement and attention detection system using electrocardiogram. Computational and Mathematical Methods in Medicine 2012;2012:528781. PMID: 22924060; PMCID: PMC3424596; DOI: 10.1155/2012/528781.
Abstract
This research proposes a monitoring system that uses the electrocardiogram (ECG) as a fundamental physiological signal to analyze and predict the presence or absence of cognitive attention in individuals during task execution. The primary focus of this study is to identify the correlation between fluctuating levels of attention and their implications for the cardiac rhythm recorded in the ECG. Furthermore, electroencephalogram (EEG) signals are also analyzed and classified to serve as a benchmark for comparison with the ECG analysis. Several advanced signal processing techniques were implemented and investigated to derive multiple latent and informative features from both physiological signals. Decomposition and feature extraction are performed using the Stockwell transform for the ECG signal, while the discrete wavelet transform (DWT) is used for the EEG. These features are then fed to various machine-learning algorithms to produce classification models capable of differentiating between the cases of a person being attentive and a person not being attentive. The presented results show that detection and classification of cognitive attention using the ECG are fairly comparable to the EEG.
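The EEG branch above relies on the discrete wavelet transform. As an illustration, here is a single decomposition level of the Haar DWT, the simplest wavelet; the abstract does not name the mother wavelet used, so Haar is an assumption made for brevity, and the signal values are made up.

```python
import math

def haar_dwt_level(signal):
    """One level of the Haar discrete wavelet transform for an even-length
    signal: returns (approximation, detail) coefficient lists. The scaling
    by 1/sqrt(2) makes the transform orthonormal (energy-preserving)."""
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    return approx, detail

# Toy even-length signal standing in for an EEG epoch.
sig = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_dwt_level(sig)
```

Feature extraction pipelines typically recurse on the approximation coefficients for several levels and then summarize each sub-band (energy, entropy, statistics) before classification.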