1
Neumann M, Kothare H, Ramanarayanan V. Multimodal speech biomarkers for remote monitoring of ALS disease progression. Comput Biol Med 2024; 180:108949. PMID: 39126786. DOI: 10.1016/j.compbiomed.2024.108949.
Abstract
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that severely impacts affected persons' speech and motor functions, yet early detection and tracking of disease progression remain challenging. The current gold standard for monitoring ALS progression, the ALS Functional Rating Scale-Revised (ALSFRS-R), is based on subjective ratings of symptom severity and may not capture subtle but clinically meaningful changes due to a lack of granularity. Multimodal speech measures that can be automatically collected from patients in a remote fashion allow us to bridge this gap because they are continuous-valued and therefore potentially more granular at capturing disease progression. Here we investigate the responsiveness and sensitivity of multimodal speech measures in persons with ALS (pALS), collected via a remote patient monitoring platform, in an effort to quantify how long it takes to detect a clinically meaningful change associated with disease progression. We recorded audio and video from 278 participants and automatically extracted multimodal speech biomarkers (acoustic, orofacial, linguistic) from the data. We find that the timing alignment of pALS speech relative to a canonical elicitation of the same prompt, and the number of words used to describe a picture, are the most responsive measures for detecting such change in both pALS with bulbar (n = 36) and non-bulbar onset (n = 107). Interestingly, the responsiveness of these measures is stable even at small sample sizes. We further found that certain speech measures are sensitive enough to track bulbar decline even when there is no patient-reported clinical change, i.e., the ALSFRS-R speech score remains unchanged at 3 out of a total possible score of 4. The findings of this study have the potential to facilitate improved, accelerated, and cost-effective clinical trials and care.
Affiliation(s)
- Vikram Ramanarayanan
- Modality.AI, Inc., San Francisco, CA, USA; University of California, San Francisco, CA, USA.
2
Berisha V, Liss JM. Responsible development of clinical speech AI: Bridging the gap between clinical research and technology. NPJ Digit Med 2024; 7:208. PMID: 39122889. PMCID: PMC11316053. DOI: 10.1038/s41746-024-01199-1.
Abstract
This perspective article explores the challenges and potential of using speech as a biomarker in clinical settings, particularly when constrained by the small clinical datasets typically available in such contexts. We contend that by integrating insights from speech science and clinical research, we can reduce sample complexity in clinical speech AI models, with the potential to decrease timelines to translation. Most existing models are based on high-dimensional feature representations trained with limited sample sizes and often do not leverage insights from speech science and clinical research. This approach can lead to overfitting, where the models perform exceptionally well on training data but fail to generalize to new, unseen data. Additionally, without incorporating theoretical knowledge, these models may lack interpretability and robustness, making them challenging to troubleshoot or improve post-deployment. We propose a framework for organizing health conditions based on their impact on speech and promote the use of speech analytics in diverse clinical contexts beyond cross-sectional classification. For high-stakes clinical use cases, we advocate a focus on explainable and individually validated measures and stress the importance of rigorous validation frameworks and ethical considerations for responsible deployment. Bridging the gap between AI research and clinical speech research presents new opportunities for more efficient translation of speech-based AI tools and for the advancement of scientific discoveries in this interdisciplinary space, particularly when work is limited to small or retrospective datasets.
Affiliation(s)
- Visar Berisha
- School of Electrical, Computer and Energy Engineering and College of Health Solutions, Arizona State University, Tempe, AZ, USA.
- Julie M Liss
- College of Health Solutions, Arizona State University, Tempe, AZ, USA
3
Taşcı B. Multilevel hybrid handcrafted feature extraction based depression recognition method using speech. J Affect Disord 2024:S0165-0327(24)01215-1. PMID: 39127304. DOI: 10.1016/j.jad.2024.08.002.
Abstract
BACKGROUND AND PURPOSE Diagnosis of depression is based on tests performed by psychiatrists and information provided by patients or their relatives. In the field of machine learning (ML), numerous models have been devised to detect depression automatically through the analysis of speech audio signals. While deep learning approaches often achieve superior classification accuracy, they are notably resource-intensive. This research introduces an innovative, multilevel hybrid feature extraction-based classification model, specifically designed for depression detection, which exhibits reduced time complexity. MATERIALS AND METHODS The MODMA dataset, consisting of audio signals from 29 healthy controls and 23 patients with major depressive disorder, was used. The constructed model architecture integrates multilevel hybrid feature extraction, iterative feature selection, and classification processes. During the Hybrid Handcrafted Feature (HHF) generation stage, a combination of textural and statistical methods was employed to extract low-level features from speech audio signals. To enhance this process for high-level feature creation, a Multilevel Discrete Wavelet Transform (MDWT) was applied. This technique produced wavelet subbands, which were then input into the hybrid feature extractor, enabling the extraction of both high- and low-level features. For the selection of the most pertinent features from these extracted vectors, Iterative Neighborhood Component Analysis (INCA) was utilized. Finally, in the classification phase, a one-dimensional nearest neighbor classifier with ten-fold cross-validation was implemented. RESULTS The HHF-based speech audio signal classification model attained excellent performance, with a classification accuracy of 94.63%. CONCLUSIONS The findings validate the remarkable proficiency of the introduced HHF-based model in depression classification, underscoring its computational efficiency.
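The overall shape of the pipeline in this abstract (wavelet subbands, low-level statistical features per subband, nearest-neighbor classification) can be sketched as follows. This is an illustrative reconstruction under stated assumptions only: the helper names (`haar_dwt`, `extract_features`, `knn1_predict`), the choice of Haar filters, and the four simple statistical descriptors are not from the paper, which uses textural feature extractors and INCA feature selection that are not reproduced here.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT: return (approximation, detail) subbands."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                          # pad to even length
        x = np.append(x, x[-1])
    a = (x[0::2] + x[1::2]) / np.sqrt(2)    # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)    # high-pass (detail)
    return a, d

def multilevel_subbands(x, levels=3):
    """Decompose a signal into `levels` detail subbands plus the final approximation."""
    bands, a = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        a, d = haar_dwt(a)
        bands.append(d)
    bands.append(a)
    return bands

def stat_features(band):
    """Low-level statistical descriptors of one subband (assumed set)."""
    return [band.mean(), band.std(), np.abs(band).max(), (band ** 2).sum()]

def extract_features(signal, levels=3):
    """Concatenate statistics over the raw signal and all wavelet subbands."""
    feats = stat_features(np.asarray(signal, dtype=float))
    for band in multilevel_subbands(signal, levels):
        feats.extend(stat_features(band))
    return np.array(feats)

def knn1_predict(train_X, train_y, test_x):
    """Nearest-neighbor (k=1) classification by Euclidean distance."""
    dists = np.linalg.norm(train_X - test_x, axis=1)
    return train_y[int(np.argmin(dists))]
```

In a full evaluation, `extract_features` would be applied to each recording and `knn1_predict` wrapped in ten-fold cross-validation, as the abstract describes.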
Affiliation(s)
- Burak Taşcı
- Vocational School of Technical Sciences, Firat University, Elazig 23119, Turkey.
4
Kaczmarek-Majer K, Dominiak M, Antosik AZ, Hryniewicz O, Kamińska O, Opara K, Owsiński J, Radziszewska W, Sochacka M, Święcicki Ł. Acoustic features from speech as markers of depressive and manic symptoms in bipolar disorder: A prospective study. Acta Psychiatr Scand 2024. PMID: 39118422. DOI: 10.1111/acps.13735.
Abstract
INTRODUCTION Voice features could be a sensitive marker of affective state in bipolar disorder (BD). Smartphone apps offer an excellent opportunity to collect voice data in natural settings and could become a useful tool for phase prediction in BD. AIMS OF THE STUDY We investigate the relations between the symptoms of BD, evaluated by psychiatrists, and patients' voice characteristics. A smartphone app extracted acoustic parameters from the daily phone calls of n = 51 patients. We show how prosodic, spectral, and voice quality features correlate with clinically assessed affective states and explore their usefulness in predicting the BD phase. METHODS A smartphone app (BDmon) was developed to collect the voice signal and extract its physical features. BD patients used the application for 208 days on average. Psychiatrists assessed the severity of BD symptoms using the Hamilton Depression Rating Scale-17 and the Young Mania Rating Scale. We analyzed the relations between acoustic features of speech and patients' mental states using generalized linear mixed-effects models. RESULTS The prosodic, spectral, and voice quality parameters are valid markers for assessing the severity of manic and depressive symptoms. The accuracy of the predictive generalized mixed-effects model is 70.9%-71.4%. Significant differences in effect sizes and directions are observed between the female and male subgroups. The greater the severity of mania in males, the louder (β = 1.6) and higher the tone of voice (β = 0.71), the more clearly (β = 1.35) and more sharply (β = 0.95) they speak, and the longer their conversations are (β = 1.64). For females, the observations are either exactly the opposite (the greater the severity of mania, the quieter (β = -0.27) and lower the tone of voice (β = -0.21) and the less clearly (β = -0.25) they speak) or no correlation is found (length of speech). Conversely, the greater the severity of bipolar depression in males, the quieter (β = -1.07) and less clearly (β = -1.00) they speak. In females, no distinct correlations between the severity of depressive symptoms and changes in voice parameters are found. CONCLUSIONS Speech analysis provides physiological markers of affective symptoms in BD, and acoustic features extracted from speech are effective in predicting BD phases. This could personalize monitoring and care for BD patients, helping to decide whether a specialist should be consulted.
Affiliation(s)
- Katarzyna Kaczmarek-Majer
- Department of Stochastic Methods, Systems Research Institute Polish Academy of Sciences, Warsaw, Poland
- Monika Dominiak
- Department of Pharmacology and Physiology of the Nervous System, Institute of Psychiatry and Neurology, Warsaw, Poland
- Section of Biological Psychiatry, Polish Psychiatric Association, Warsaw, Poland
- Anna Z Antosik
- Section of Biological Psychiatry, Polish Psychiatric Association, Warsaw, Poland
- Department of Psychiatry, Faculty of Medicine, Collegium Medicum, Cardinal Wyszynski University in Warsaw, Warsaw, Poland
- Olgierd Hryniewicz
- Department of Stochastic Methods, Systems Research Institute Polish Academy of Sciences, Warsaw, Poland
- Olga Kamińska
- Department of Stochastic Methods, Systems Research Institute Polish Academy of Sciences, Warsaw, Poland
- Karol Opara
- Department of Stochastic Methods, Systems Research Institute Polish Academy of Sciences, Warsaw, Poland
- Jan Owsiński
- Department of Stochastic Methods, Systems Research Institute Polish Academy of Sciences, Warsaw, Poland
- Weronika Radziszewska
- Department of Stochastic Methods, Systems Research Institute Polish Academy of Sciences, Warsaw, Poland
- Łukasz Święcicki
- Department of Affective Disorders, II Psychiatric Clinic, Institute of Psychiatry and Neurology, Warsaw, Poland
5
Liu L, Liu L, Wafa HA, Tydeman F, Xie W, Wang Y. Diagnostic accuracy of deep learning using speech samples in depression: a systematic review and meta-analysis. J Am Med Inform Assoc 2024:ocae189. PMID: 39013193. DOI: 10.1093/jamia/ocae189.
Abstract
OBJECTIVE This study aims to conduct a systematic review and meta-analysis of the diagnostic accuracy of deep learning (DL) using speech samples in depression. MATERIALS AND METHODS This review included studies reporting diagnostic results of DL algorithms in depression using speech data, published from inception to January 31, 2024, in the PubMed, Medline, Embase, PsycINFO, Scopus, IEEE, and Web of Science databases. Pooled accuracy, sensitivity, and specificity were obtained using random-effects models. The Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) was used to assess the risk of bias. RESULTS A total of 25 studies met the inclusion criteria and 8 of them were used in the meta-analysis. The pooled estimates of accuracy, specificity, and sensitivity for depression detection models were 0.87 (95% CI, 0.81-0.93), 0.85 (95% CI, 0.78-0.91), and 0.82 (95% CI, 0.71-0.94), respectively. When stratified by model structure, the highest pooled diagnostic accuracy was 0.89 (95% CI, 0.81-0.97), in the handcrafted-feature group. DISCUSSION To our knowledge, this is the first meta-analysis of the diagnostic performance of DL for depression detection from speech samples. All studies included in the meta-analysis used convolutional neural network (CNN) models, making it difficult to assess the performance of other DL algorithms. Handcrafted-feature models performed better than end-to-end models in speech-based depression detection. CONCLUSIONS The application of DL to speech provides a useful tool for depression detection. CNN models with handcrafted acoustic features could help to improve diagnostic performance. PROTOCOL REGISTRATION The study protocol was registered on PROSPERO (CRD42023423603).
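For readers unfamiliar with the pooling step behind estimates like those above, a random-effects meta-analysis can be sketched with the DerSimonian-Laird estimator. This is a generic illustration, not the review's actual code: the function name and the 1.96 multiplier for a 95% CI are assumptions, and proportions such as sensitivity would normally be pooled on a transformed (e.g., logit) scale.

```python
import numpy as np

def dersimonian_laird(estimates, variances):
    """Pool study-level estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled_estimate, lower_95, upper_95).
    """
    y = np.asarray(estimates, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                              # fixed-effect (inverse-variance) weights
    y_fe = (w * y).sum() / w.sum()           # fixed-effect pooled mean
    q = (w * (y - y_fe) ** 2).sum()          # Cochran's Q heterogeneity statistic
    df = len(y) - 1
    c = w.sum() - (w ** 2).sum() / w.sum()
    tau2 = max(0.0, (q - df) / c)            # between-study variance estimate
    w_re = 1.0 / (v + tau2)                  # random-effects weights
    pooled = (w_re * y).sum() / w_re.sum()
    se = np.sqrt(1.0 / w_re.sum())
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se
```

When study estimates are heterogeneous, the estimated between-study variance inflates the standard error, so the confidence interval widens relative to a fixed-effect analysis.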
Affiliation(s)
- Lidan Liu
- Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Lu Liu
- Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Hatem A Wafa
- Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Florence Tydeman
- Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Wanqing Xie
- Department of Intelligent Medical Engineering, School of Biomedical Engineering, Anhui Medical University, Hefei, 230032, China
- Department of Psychology, School of Mental Health and Psychological Sciences, Anhui Medical University, Hefei, 230032, China
- Beth Israel Deaconess Medical Center, Harvard Medical School, Harvard University, Boston, MA, 02115, United States
- Yanzhong Wang
- Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
6
Hartnagel LM, Ebner-Priemer UW, Foo JC, Streit F, Witt SH, Frank J, Limberger MF, Horn AB, Gilles M, Rietschel M, Sirignano L. Linguistic style as a digital marker for depression severity: An ambulatory assessment pilot study in patients with depressive disorder undergoing sleep deprivation therapy. Acta Psychiatr Scand 2024. PMID: 38987940. DOI: 10.1111/acps.13726.
Abstract
BACKGROUND Digital phenotyping and monitoring tools are among the most promising approaches to automatically detect upcoming depressive episodes. Linguistic style in particular has been seen as a potential behavioral marker of depression: cross-sectional studies have shown, for example, less frequent use of positive emotion words, intensified use of negative emotion words, and more self-references in patients with depression compared to healthy controls. However, longitudinal studies are sparse, and it therefore remains unclear whether within-person fluctuations in depression severity are associated with individuals' linguistic style. METHODS To capture affective states and concomitant speech samples longitudinally, we used an ambulatory assessment approach, sampling multiple times a day via smartphones in patients diagnosed with depressive disorder undergoing sleep deprivation therapy. This intervention promises a rapid change of affective symptoms within a short period of time, ensuring sufficient variability in depressive symptoms. We extracted word categories from the transcribed speech samples using the Linguistic Inquiry and Word Count (LIWC). RESULTS Our analyses revealed that more pleasant momentary affective states (lower reported depression severity, lower negative affective state, higher positive affective state, (positive) valence, energetic arousal, and calmness) are mirrored in the use of fewer negative emotion words and more positive emotion words. CONCLUSION We conclude that a patient's linguistic style, especially the use of positive and negative emotion words, is associated with self-reported affective states and is thus a promising feature for speech-based automated monitoring and prediction of upcoming episodes, ultimately leading to better patient care.
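The word-category extraction this abstract relies on reduces, at its core, to counting dictionary hits per transcript. A minimal sketch of that idea is below; the tiny `POSITIVE` and `NEGATIVE` lexica and the category names are placeholder assumptions for illustration only, whereas the actual study uses the full, validated LIWC dictionaries.

```python
import re

# Tiny illustrative lexica; the study uses the full LIWC dictionaries.
POSITIVE = {"happy", "good", "love", "nice", "calm", "hope"}
NEGATIVE = {"sad", "bad", "hate", "tired", "alone", "worthless"}
SELF = {"i", "me", "my", "mine", "i'm"}

def word_category_rates(text):
    """Return the share of tokens in each emotion-word category,
    plus the rate of first-person-singular self-references."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero for empty samples
    return {
        "posemo": sum(t in POSITIVE for t in tokens) / n,
        "negemo": sum(t in NEGATIVE for t in tokens) / n,
        "self": sum(t in SELF for t in tokens) / n,
    }
```

Rates rather than raw counts are used so that transcripts of different lengths, as arise in ambulatory sampling, remain comparable.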
Affiliation(s)
- Lisa-Marie Hartnagel
- Mental mHealth Lab, Institute of Sports and Sports Science, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Ulrich W Ebner-Priemer
- Mental mHealth Lab, Institute of Sports and Sports Science, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Jerome C Foo
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Institute for Psychopharmacology, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Neuroscience and Mental Health Institute, University of Alberta, Edmonton, Alberta, Canada
- Department of Psychiatry, College of Health Sciences, University of Alberta, Edmonton, Alberta, Canada
- Fabian Streit
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Hector Institute for Artificial Intelligence in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Stephanie H Witt
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Josef Frank
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Matthias F Limberger
- Mental mHealth Lab, Institute of Sports and Sports Science, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Andrea B Horn
- University Research Priority Program (URPP) Dynamics of Healthy Aging, Healthy Longevity Center, University of Zürich, Zürich, Switzerland
- Maria Gilles
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Marcella Rietschel
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Lea Sirignano
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
7
Ramanarayanan V. Multimodal Technologies for Remote Assessment of Neurological and Mental Health. J Speech Lang Hear Res 2024:1-13. PMID: 38984943. DOI: 10.1044/2024_jslhr-24-00142.
Abstract
PURPOSE Automated remote assessment and monitoring of patients' neurological and mental health is increasingly becoming an essential component of the digital clinic and telehealth ecosystem, especially after the COVID-19 pandemic. This article reviews various modalities of health information that are useful for developing such remote clinical assessments in the real world at scale. APPROACH We first present an overview of the various modalities of health information (speech acoustics, natural language, conversational dynamics, orofacial or full-body movement, eye gaze, respiration, cardiopulmonary, and neural), each of which can be extracted from various signal sources (audio, video, text, or sensors). We further motivate their clinical utility with examples of how information from each modality can help us characterize how different disorders affect different aspects of patients' spoken communication. We then elucidate the advantages of combining one or more of these modalities toward a more holistic, informative, and robust assessment. FINDINGS We find that combining multiple modalities of health information allows for improved scientific interpretability, improved performance on downstream health applications such as early detection and progress monitoring, improved technological robustness, and improved user experience. We illustrate how these principles can be leveraged for remote clinical assessment at scale using a real-world case study of the Modality assessment platform. CONCLUSION This review motivates the combination of human-centric information from multiple modalities to measure various aspects of patients' health, arguing that remote clinical assessment integrating this complementary information can be more effective and lead to better clinical outcomes than using any one data stream in isolation.
Affiliation(s)
- Vikram Ramanarayanan
- Modality.AI, Inc., San Francisco, CA
- Department of Otolaryngology-Head and Neck Surgery, University of California, San Francisco
8
Cordella C, Di Filippo L, Kolachalama VB, Kiran S. Connected Speech Fluency in Poststroke and Progressive Aphasia: A Scoping Review of Quantitative Approaches and Features. Am J Speech Lang Pathol 2024; 33:2091-2128. PMID: 38652820. PMCID: PMC11253646. DOI: 10.1044/2024_ajslp-23-00208.
Abstract
PURPOSE Speech fluency has important diagnostic implications for individuals with poststroke aphasia (PSA) as well as primary progressive aphasia (PPA), and quantitative assessment of connected speech has emerged as a widely used approach across both etiologies. The purpose of this review was to provide a clearer picture of the range, nature, and utility of individual quantitative speech/language measures and methods used to assess connected speech fluency in PSA and PPA, and to compare approaches across etiologies. METHOD We conducted a scoping review of literature published between 2012 and 2022 following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews guidelines. Forty-five studies were included in the review. Literature was charted and summarized by etiology and by characteristics of the included patient populations and the method(s) used for derivation and analysis of speech/language features. For a subset of included articles, we also charted the individual quantitative speech/language features reported and the level of significance of reported results. RESULTS Results showed that similar methodological approaches have been used to quantify connected speech fluency in both PSA and PPA. Two hundred nine individual speech-language features were analyzed in total, with low levels of convergence across etiologies on specific features but greater agreement on the most salient features. The most useful features for differentiating fluent from nonfluent aphasia in both PSA and PPA were those related to overall speech quantity, speech rate, or grammatical competence. CONCLUSIONS Data from this review demonstrate the feasibility and utility of quantitative approaches for indexing connected speech fluency in PSA and PPA. We identified emergent trends toward automated analysis methods and data-driven approaches, which offer promising avenues for clinical translation. There remains a need for improved consensus on which subset of individual features might be most clinically useful for assessment and monitoring of fluency. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.25537237.
Affiliation(s)
- Claire Cordella
- Department of Speech, Language and Hearing Sciences, Boston University, MA
- Lauren Di Filippo
- Department of Speech, Language and Hearing Sciences, Boston University, MA
- Vijaya B. Kolachalama
- Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, MA
- Department of Computer Science and Faculty of Computing & Data Sciences, Boston University, MA
- Swathi Kiran
- Department of Speech, Language and Hearing Sciences, Boston University, MA
9
Scroggins JK, Topaz M, Song J, Zolnoori M. Does synthetic data augmentation improve the performances of machine learning classifiers for identifying health problems in patient-nurse verbal communications in home healthcare settings? J Nurs Scholarsh 2024. PMID: 38961517. DOI: 10.1111/jnu.13004.
Abstract
BACKGROUND Identifying health problems in audio-recorded patient-nurse communication is important for improving outcomes in home healthcare patients, who often have complex conditions and increased risks of hospital utilization. Training machine learning classifiers to identify problems requires resource-intensive human annotation. OBJECTIVE To generate synthetic patient-nurse communication, automatically annotated for common health problems encountered in home healthcare settings, using GPT-4. We also examined whether augmenting real-world patient-nurse communication with synthetic data can improve the performance of machine learning classifiers in identifying health problems. DESIGN Secondary analysis of patient-nurse verbal communication data in home healthcare settings. METHODS The data were collected from one of the largest home healthcare organizations in the United States. We used 23 audio recordings of patient-nurse communications from 15 patients. The audio recordings were transcribed verbatim and manually annotated for health problems (e.g., circulation, skin, pain) indicated in the Omaha System classification scheme. Synthetic patient-nurse communication data were generated using the in-context learning prompting method, enhanced by chain-of-thought prompting to improve automatic annotation performance. Machine learning classifiers were applied to three training datasets: real-world communication, synthetic communication, and real-world communication augmented by synthetic communication. RESULTS Average F1 scores improved from 0.62 to 0.63 after training data were augmented with synthetic communication. The largest increase was observed with the XGBoost classifier, where F1 scores improved from 0.61 to 0.64 (about a 5% improvement). When trained solely on either real-world or synthetic communication, the classifiers showed comparable F1 scores of 0.62 and 0.61, respectively. CONCLUSION Integrating synthetic data improves machine learning classifiers' ability to identify health problems in home healthcare, with performance comparable to training on real-world data alone, highlighting the potential of synthetic data in healthcare analytics. CLINICAL RELEVANCE This study demonstrates the clinical relevance of leveraging synthetic patient-nurse communication data to enhance machine learning classifier performance in identifying health problems in home healthcare settings, contributing to more accurate and efficient problem identification for home healthcare patients with complex health conditions.
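The F1 scores this abstract compares across training datasets are the harmonic mean of precision and recall on the positive class. A minimal reference implementation is sketched below; it illustrates only the metric, not the study's classifiers or data, and the function name is an assumption.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision/recall undefined or zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Averaging per-label F1 over the annotated problem categories would yield the kind of average F1 the study reports for each training condition.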
Affiliation(s)
- Maxim Topaz
- Columbia University School of Nursing, New York, New York, USA
- Data Science Institute, Columbia University, New York, New York, USA
- Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Jiyoun Song
- University of Pennsylvania School of Nursing, Philadelphia, Pennsylvania, USA
- Maryam Zolnoori
- Columbia University School of Nursing, New York, New York, USA
- Center for Home Care Policy & Research, VNS Health, New York, New York, USA
10
Neumann M, Kothare H, Ramanarayanan V. Multimodal Speech Biomarkers for Remote Monitoring of ALS Disease Progression. medRxiv 2024:2024.06.26.24308811. PMID: 38978682. PMCID: PMC11230328. DOI: 10.1101/2024.06.26.24308811.
Abstract
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that severely impacts affected persons' speech and motor functions, yet early detection and tracking of disease progression remain challenging. The current gold standard for monitoring ALS progression, the ALS Functional Rating Scale-Revised (ALSFRS-R), is based on subjective ratings of symptom severity and may not capture subtle but clinically meaningful changes due to a lack of granularity. Multimodal speech measures that can be automatically collected from patients in a remote fashion allow us to bridge this gap because they are continuous-valued and therefore potentially more granular at capturing disease progression. Here we investigate the responsiveness and sensitivity of multimodal speech measures in persons with ALS (pALS), collected via a remote patient monitoring platform, in an effort to quantify how long it takes to detect a clinically meaningful change associated with disease progression. We recorded audio and video from 278 participants and automatically extracted multimodal speech biomarkers (acoustic, orofacial, linguistic) from the data. We find that the timing alignment of pALS speech relative to a canonical elicitation of the same prompt, and the number of words used to describe a picture, are the most responsive measures for detecting such change in both pALS with bulbar (n = 36) and non-bulbar onset (n = 107). Interestingly, the responsiveness of these measures is stable even at small sample sizes. We further found that certain speech measures are sensitive enough to track bulbar decline even when there is no patient-reported clinical change, i.e., the ALSFRS-R speech score remains unchanged at 3 out of a total possible score of 4. The findings of this study have the potential to facilitate improved, accelerated, and cost-effective clinical trials and care.
Affiliation(s)
- Vikram Ramanarayanan
- Modality.AI, Inc., San Francisco, CA, USA
- University of California, San Francisco, CA, USA

11
Ciharova M, Amarti K, van Breda W, Peng X, Lorente-Català R, Funk B, Hoogendoorn M, Koutsouleris N, Fusar-Poli P, Karyotaki E, Cuijpers P, Riper H. Use of Machine Learning Algorithms Based on Text, Audio, and Video Data in the Prediction of Anxiety and Posttraumatic Stress in General and Clinical Populations: A Systematic Review. Biol Psychiatry 2024:S0006-3223(24)01362-3. [PMID: 38866173] [DOI: 10.1016/j.biopsych.2024.06.002]
Abstract
Research in machine learning (ML) algorithms using natural behavior (i.e., text, audio, and video data) suggests that these techniques could contribute to personalization in psychology and psychiatry. However, a systematic review of the current state of the art is missing. Moreover, individual studies often target ML experts, who may overlook potential clinical implications of their findings. In a narrative accessible to mental health professionals, we present a systematic review conducted in 5 psychology and 2 computer science databases. We included 128 studies that assessed the predictive power of ML algorithms using text, audio, and/or video data in the prediction of anxiety and posttraumatic stress disorder. Most studies (n = 87) were aimed at predicting anxiety, while the remainder (n = 41) focused on posttraumatic stress disorder. They were mostly published since 2019 in computer science journals and tested algorithms using text (n = 72) rather than audio or video. Studies focused mainly on general populations (n = 92) and less on laboratory experiments (n = 23) or clinical populations (n = 13). Methodological quality varied, as did the reported metrics of predictive power, hampering comparison across studies. For both disorders, two-thirds of the studies reported acceptable to very good predictive power, a finding that held when only high-quality studies were considered. The results of 33 studies were uninterpretable, mainly due to missing information. Research into ML algorithms using natural behavior is in its infancy but shows potential to contribute to the diagnostics of mental disorders, such as anxiety and posttraumatic stress disorder, in the future if standardization of methods, reporting of results, and research in clinical populations are improved.
Affiliation(s)
- Marketa Ciharova
- Department of Clinical, Neuro, and Developmental Psychology, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Black Dog Institute, University of New South Wales, Sydney, New South Wales, Australia.
- Khadicha Amarti
- Department of Clinical, Neuro, and Developmental Psychology, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Ward van Breda
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Xianhua Peng
- Department of Clinical, Neuro, and Developmental Psychology, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Department of Methodology and Statistics, Tilburg School of Social and Behavioral Sciences, Tilburg University, Tilburg, the Netherlands
- Rosa Lorente-Català
- Department of Basic and Clinical Psychology and Psychobiology, Universitat Jaume I, Castellon, Spain
- Burkhardt Funk
- Institute of Information Systems, Leuphana University, Lüneburg, Germany
- Mark Hoogendoorn
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Nikolaos Koutsouleris
- Artificial Intelligence in Mental Health Group, Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom; Precision Psychiatry Group, Max Planck Institute, Munich, Germany; Section for Precision Psychiatry, Department of Psychiatry and Psychotherapy, University Medical Center, Ludwig-Maximilians-University Munich, Munich, Germany
- Paolo Fusar-Poli
- Section for Precision Psychiatry, Department of Psychiatry and Psychotherapy, University Medical Center, Ludwig-Maximilians-University Munich, Munich, Germany; Early Psychosis: Interventions and Clinical-Detection (EPIC) Lab, Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom; Department of Brain and Behavioural Sciences, University of Pavia, Pavia, Italy; OASIS Service, South London and the Maudsley National Health Service Foundation Trust, London, United Kingdom
- Eirini Karyotaki
- Department of Clinical, Neuro, and Developmental Psychology, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; WHO Collaborating Center for Research and Dissemination of Psychological Interventions, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Pim Cuijpers
- Department of Clinical, Neuro, and Developmental Psychology, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; WHO Collaborating Center for Research and Dissemination of Psychological Interventions, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Babeș-Bolyai University, International Institute for Psychotherapy, Cluj-Napoca, Romania
- Heleen Riper
- Department of Clinical, Neuro, and Developmental Psychology, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Department of Psychiatry, Amsterdam Public Health Research Institute, Amsterdam University Medical Centre, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands

12
Ben Moshe T, Ziv I, Dershowitz N, Bar K. The contribution of prosody to machine classification of schizophrenia. Schizophrenia (Heidelb) 2024; 10:53. [PMID: 38762536] [PMCID: PMC11102498] [DOI: 10.1038/s41537-024-00463-3]
Abstract
We show how acoustic prosodic features, such as pitch and gaps, can be used computationally to detect symptoms of schizophrenia from a single spoken response. We compare the individual contributions of the acoustic and the previously employed text modalities to the algorithmic determination of whether the speaker has schizophrenia. Our classification results clearly show that the acoustic features we extract are more informative than the textual ones. We find that, when combined with those acoustic features, textual features improve classification only slightly.
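The "gap" features above are silence-based prosodic measures. As a rough illustration of how such a feature can be computed from raw audio, the sketch below estimates the fraction of low-energy frames in a signal with a simple RMS threshold. This is a minimal, NumPy-only sketch under assumed parameters (25 ms frames, 10 ms hop, a relative energy threshold), not the paper's actual feature pipeline.

```python
import numpy as np

def pause_fraction(signal, sr, frame_ms=25, hop_ms=10, rel_threshold=0.1):
    """Fraction of frames whose RMS energy falls below a relative
    threshold -- a crude proxy for pause ('gap') features in speech."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    if len(signal) < frame:
        return 0.0
    # Frame the signal and compute per-frame RMS energy.
    n_frames = 1 + (len(signal) - frame) // hop
    rms = np.array([
        np.sqrt(np.mean(signal[i * hop : i * hop + frame] ** 2))
        for i in range(n_frames)
    ])
    # Frames quieter than rel_threshold * max energy count as gaps.
    return float(np.mean(rms < rel_threshold * rms.max()))

# Toy example: 1 s of "speech" (noise) followed by 1 s of silence,
# so roughly half of the frames should register as a gap.
sr = 16000
voiced = np.random.default_rng(0).normal(0, 1, sr)
silent = np.zeros(sr)
frac = pause_fraction(np.concatenate([voiced, silent]), sr)
```

In practice, pause features would be derived from a proper voice-activity detector; the fixed relative threshold here is only for illustration.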
Affiliation(s)
- Tomer Ben Moshe
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Ido Ziv
- Behavioral Sciences, Netanya Academic College, Netanya, Israel.
- Nachum Dershowitz
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Kfir Bar
- Effi Arazi School of Computer Science, Reichman University, Herzliya, Israel

13
13
|
Low DM, Rao V, Randolph G, Song PC, Ghosh SS. Identifying bias in models that detect vocal fold paralysis from audio recordings using explainable machine learning and clinician ratings. PLOS Digit Health 2024; 3:e0000516. [PMID: 38814939] [PMCID: PMC11139298] [DOI: 10.1371/journal.pdig.0000516]
Abstract
Detecting voice disorders from voice recordings could allow for frequent, remote, and low-cost screening before costly clinical visits and a more invasive laryngoscopy examination. Our goals were to detect unilateral vocal fold paralysis (UVFP) from voice recordings using machine learning, to identify which acoustic variables were important for prediction in order to increase trust, and to determine model performance relative to clinician performance. Patients with UVFP confirmed through endoscopic examination (N = 77) and controls with normal voices matched for age and sex (N = 77) were included. Voice samples were elicited by reading the Rainbow Passage and sustaining phonation of the vowel "a". Four machine learning models of differing complexity were used. SHapley Additive exPlanations (SHAP) was used to identify important features. The highest median bootstrapped ROC AUC score was 0.87, exceeding clinicians' performance on the same recordings (range: 0.74-0.81). However, recording durations differed between UVFP recordings and controls because of how the data were originally processed for storage, and we show that duration alone can separate the two groups. Counterintuitively, many UVFP recordings also had higher intensity than controls, even though UVFP patients tend to have weaker voices, revealing a dataset-specific bias that we mitigate in an additional analysis. We thus demonstrate that recording biases in audio duration and intensity created dataset-specific differences between patients and controls, which the models exploited to improve classification. Clinicians' ratings provide further evidence that patients were over-projecting their voices and were recorded at a higher amplitude than controls. Interestingly, after matching audio durations and removing variables associated with intensity to mitigate these biases, the models still achieved similarly high performance. We provide a set of recommendations for avoiding bias when building and evaluating machine learning models for screening in laryngology.
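The "median bootstrapped ROC AUC" reported above summarizes classifier performance over resampled evaluation sets. The sketch below shows the general recipe on synthetic scores, using a rank-based AUC so it needs only NumPy; the class sizes and score distributions are assumptions for illustration, not the study's models or data.

```python
import numpy as np

def roc_auc(y_true, y_score):
    """Rank-based ROC AUC: probability that a random positive case
    scores higher than a random negative case (ties count half)."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

def bootstrap_auc(y_true, y_score, n_boot=500, seed=0):
    """Median and 95% interval of the AUC over bootstrap resamples
    of the evaluation set."""
    rng = np.random.default_rng(seed)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():  # need both classes
            continue
        aucs.append(roc_auc(y_true[idx], y_score[idx]))
    aucs = np.array(aucs)
    return np.median(aucs), np.percentile(aucs, [2.5, 97.5])

# Synthetic example: 77 "patients" score higher on average than 77 "controls".
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 77)
scores = rng.normal(loc=1.5 * y, scale=1.0)
median_auc, ci = bootstrap_auc(y, scores)
```

Reporting the bootstrap interval alongside the median makes clear how much of an AUC figure is sampling noise on a small evaluation set.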
Affiliation(s)
- Daniel M. Low
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, United States of America
- McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
- Vishwanatha Rao
- Department of Biomedical Engineering, Columbia University, New York, New York, United States of America
- Department of Otolaryngology–Head and Neck Surgery, Massachusetts Eye and Ear Infirmary, Boston, Massachusetts, United States of America
- Gregory Randolph
- Department of Otolaryngology–Head and Neck Surgery, Massachusetts Eye and Ear Infirmary, Boston, Massachusetts, United States of America
- Department of Otolaryngology–Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts, United States of America
- Phillip C. Song
- Department of Otolaryngology–Head and Neck Surgery, Massachusetts Eye and Ear Infirmary, Boston, Massachusetts, United States of America
- Department of Otolaryngology–Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts, United States of America
- Satrajit S. Ghosh
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, United States of America
- McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
- Department of Otolaryngology–Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts, United States of America

14
|
Cao S, Rosenzweig I, Bilotta F, Jiang H, Xia M. Automatic detection of obstructive sleep apnea based on speech or snoring sounds: a narrative review. J Thorac Dis 2024; 16:2654-2667. [PMID: 38738242] [PMCID: PMC11087644] [DOI: 10.21037/jtd-24-310]
Abstract
Background and Objective: Obstructive sleep apnea (OSA) is a common chronic disorder characterized by repeated breathing pauses during sleep caused by upper airway narrowing or collapse. The gold standard for OSA diagnosis is the polysomnography test, which is time-consuming, expensive, and invasive. In recent years, more cost-effective approaches to OSA detection based on the predictive value of speech and snoring sounds have emerged. In this paper, we offer a comprehensive summary of current research progress on the application of speech or snoring sounds to the automatic detection of OSA and discuss the key challenges that need to be overcome in future research on this novel approach. Methods: The PubMed, IEEE Xplore, and Web of Science databases were searched with related keywords. Literature published between 1989 and 2022 examining the potential of using speech or snoring sounds for automated OSA detection was reviewed. Key Content and Findings: Speech and snoring sounds contain a large amount of information about OSA, and they have been extensively studied for automatic OSA screening. By importing features extracted from speech and snoring sounds into artificial intelligence models, clinicians can automatically screen for OSA. Features such as formants, linear prediction cepstral coefficients, and mel-frequency cepstral coefficients, together with artificial intelligence algorithms including support vector machines, Gaussian mixture models, and hidden Markov models, have been extensively studied for the detection of OSA. Conclusions: Owing to the significant advantages of noninvasive, low-cost, and contactless data collection, an automatic approach based on speech or snoring sounds seems to be a promising tool for the detection of OSA.
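The linear prediction cepstral coefficients mentioned in the findings start from linear prediction coefficients (LPCs), which model each speech sample as a weighted sum of its predecessors. The textbook Levinson-Durbin recursion below computes LPCs from the autocorrelation sequence; it is a bare-bones sketch of that first step (the AR(1) sanity check and the chosen order are illustrative assumptions), not any specific OSA system from the review.

```python
import numpy as np

def lpc(signal, order):
    """Linear prediction coefficients a[0..order] (a[0] = 1) via the
    Levinson-Durbin recursion on the autocorrelation sequence.
    The predictor is x[n] ~= -sum_k a[k] * x[n-k] for k = 1..order."""
    x = np.asarray(signal, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                  # reflection coefficient
        new_a = a.copy()
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        err *= 1.0 - k * k              # prediction error shrinks each step
    return a

# Sanity check on a synthetic AR(1) process x[n] = 0.9 x[n-1] + e[n]:
# an order-1 LPC fit should recover a[1] close to -0.9.
rng = np.random.default_rng(0)
e = rng.normal(0, 1, 20000)
x = np.empty_like(e)
x[0] = e[0]
for t in range(1, len(e)):
    x[t] = 0.9 * x[t - 1] + e[t]
coeffs = lpc(x, order=1)
```

A full LPC-cepstral front end would frame and window the signal, fit LPCs per frame, and then convert them to cepstral coefficients.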
Affiliation(s)
- Shuang Cao
- Department of Anesthesiology, The Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Ivana Rosenzweig
- Sleep and Brain Plasticity Centre, CNS, IoPPN, King’s College London, London, UK
- Sleep Disorders Centre, Guy’s and St Thomas’ Hospital, GSTT NHS, London, UK
- Federico Bilotta
- Department of Anaesthesia and Critical Care Medicine, Policlinico Umberto 1 Hospital, Sapienza University of Rome, Rome, Italy
- Hong Jiang
- Department of Anesthesiology, The Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Ming Xia
- Department of Anesthesiology, The Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China

15
|
Weisenburger RL, Mullarkey MC, Labrada J, Labrousse D, Yang MY, MacPherson AH, Hsu KJ, Ugail H, Shumake J, Beevers CG. Conversational assessment using artificial intelligence is as clinically useful as depression scales and preferred by users. J Affect Disord 2024; 351:489-498. [PMID: 38290584] [DOI: 10.1016/j.jad.2024.01.212]
Abstract
BACKGROUND Depression is prevalent, chronic, and burdensome. Due to limited screening access, depression often remains undiagnosed. Artificial intelligence (AI) models based on spoken responses to interview questions may offer an effective, efficient alternative to other screening methods. OBJECTIVE The primary aim was to use a demographically diverse sample to validate an AI model, previously trained on human-administered interviews, on novel bot-administered interviews, and to check for algorithmic biases related to age, sex, race, and ethnicity. METHODS Using the Aiberry app, adults recruited via social media (N = 393) completed a brief bot-administered interview and a depression self-report form. An AI model was used to predict form scores based on interview responses alone. For all meaningful discrepancies between model inference and form score, clinicians performed a masked review to determine which one they preferred. RESULTS There was strong concurrent validity between the model predictions and raw self-report scores (r = 0.73, MAE = 3.3). 90% of AI predictions agreed either with self-report or, where the AI contradicted self-report, with clinical expert opinion. There was no differential model performance across age, sex, race, or ethnicity. LIMITATIONS Limitations include access restrictions (English-speaking ability and access to a smartphone or computer with broadband internet) and potential self-selection of participants more favorably predisposed toward AI technology. CONCLUSION The Aiberry model made accurate predictions of depression severity based on remotely collected spoken responses to a bot-administered interview. This study shows promising results for the use of AI as a mental health screening tool on par with self-report measures.
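The concurrent-validity figures quoted in the results (r = 0.73, MAE = 3.3) are two standard agreement metrics between model predictions and self-report scores. The sketch below computes both on toy data; the 0-27 score range and noise level are illustrative assumptions, not the study's data.

```python
import numpy as np

def concurrent_validity(predicted, observed):
    """Pearson correlation and mean absolute error between model
    predictions and observed (self-reported) scores."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    r = np.corrcoef(p, o)[0, 1]          # linear agreement in rank/shape
    mae = float(np.mean(np.abs(p - o)))  # average absolute score error
    return r, mae

# Toy data: predictions tracking a 0-27 self-report score with noise.
rng = np.random.default_rng(2)
truth = rng.integers(0, 28, size=200).astype(float)
pred = truth + rng.normal(0, 4, size=200)
r, mae = concurrent_validity(pred, truth)
```

Reporting both is informative: a model can correlate well with self-report (high r) while still being miscalibrated in absolute terms (high MAE), and vice versa.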
Affiliation(s)
- Rachel L Weisenburger
- Department of Psychology and Institute for Mental Health Research, The University of Texas at Austin, United States of America.
- Daniel Labrousse
- Department of Psychiatry, Georgetown University Medical Center, United States of America
- Michelle Y Yang
- Department of Psychiatry, Georgetown University Medical Center, United States of America
- Allison Huff MacPherson
- Department of Family and Community Medicine, College of Medicine, University of Arizona, United States of America
- Kean J Hsu
- Department of Psychiatry, Georgetown University Medical Center, United States of America; Department of Psychology, National University of Singapore, Singapore
- Hassan Ugail
- Centre for Visual Computing, University of Bradford, United Kingdom of Great Britain and Northern Ireland
- Christopher G Beevers
- Department of Psychology and Institute for Mental Health Research, The University of Texas at Austin, United States of America

16
16
|
Stein F, Gruber M, Mauritz M, Brosch K, Pfarr JK, Ringwald KG, Thomas-Odenthal F, Wroblewski A, Evermann U, Steinsträter O, Grumbach P, Thiel K, Winter A, Bonnekoh LM, Flinkenflügel K, Goltermann J, Meinert S, Grotegerd D, Bauer J, Opel N, Hahn T, Leehr EJ, Jansen A, de Lange SC, van den Heuvel MP, Nenadić I, Krug A, Dannlowski U, Repple J, Kircher T. Brain Structural Network Connectivity of Formal Thought Disorder Dimensions in Affective and Psychotic Disorders. Biol Psychiatry 2024; 95:629-638. [PMID: 37207935] [DOI: 10.1016/j.biopsych.2023.05.010]
Abstract
BACKGROUND The psychopathological syndrome of formal thought disorder (FTD) is not only present in schizophrenia (SZ) but also highly prevalent in major depressive disorder and bipolar disorder. It remains unknown how alterations in the structural white matter connectome of the brain correlate with psychopathological FTD dimensions across affective and psychotic disorders. METHODS Using the FTD items of the Scale for the Assessment of Positive Symptoms and the Scale for the Assessment of Negative Symptoms, we performed exploratory and confirmatory factor analyses in 864 patients with major depressive disorder (n = 689), bipolar disorder (n = 108), or SZ (n = 67) to identify psychopathological FTD dimensions. We used T1- and diffusion-weighted magnetic resonance imaging to reconstruct the structural connectome of the brain. To investigate the association of FTD subdimensions and global structural connectome measures, we employed linear regression models. We used the network-based statistic to identify subnetworks of white matter fiber tracts associated with FTD symptomatology. RESULTS Three psychopathological FTD dimensions were delineated, i.e., disorganization, emptiness, and incoherence. Disorganization and incoherence were associated with global dysconnectivity. The network-based statistic identified subnetworks associated with the FTD dimensions disorganization and emptiness but not with the FTD dimension incoherence. Post hoc analyses on subnetworks did not reveal diagnosis × FTD dimension interaction effects. Results remained stable after correcting for medication and disease severity. Confirmatory analyses showed a substantial overlap of nodes from both subnetworks with cortical brain regions previously associated with FTD in SZ. CONCLUSIONS We demonstrated white matter subnetwork dysconnectivity in major depressive disorder, bipolar disorder, and SZ associated with FTD dimensions that predominantly comprise brain regions implicated in speech. Our results open an avenue for transdiagnostic, psychopathology-informed, dimensional studies in pathogenetic research.
Affiliation(s)
- Frederike Stein
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany; Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany.
- Marius Gruber
- Institute for Translational Psychiatry, University of Münster, Münster, Germany; Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital Frankfurt, Goethe University, Frankfurt, Germany
- Marco Mauritz
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Katharina Brosch
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany; Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Julia-Katharina Pfarr
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany; Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Kai G Ringwald
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany; Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Florian Thomas-Odenthal
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany; Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Adrian Wroblewski
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany; Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Ulrika Evermann
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany; Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Olaf Steinsträter
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany; Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Pascal Grumbach
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Katharina Thiel
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Alexandra Winter
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Linda M Bonnekoh
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Kira Flinkenflügel
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Janik Goltermann
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Susanne Meinert
- Institute for Translational Psychiatry, University of Münster, Münster, Germany; Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Dominik Grotegerd
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Jochen Bauer
- Department of Radiology, University of Münster, Münster, Germany
- Nils Opel
- Institute for Translational Psychiatry, University of Münster, Münster, Germany; Department of Psychiatry, Jena University Hospital/Friedrich Schiller University Jena, Jena, Germany
- Tim Hahn
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Elisabeth J Leehr
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Andreas Jansen
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany; Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Siemon C de Lange
- Connectome Lab, Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, Amsterdam, the Netherlands; Department of Sleep and Cognition, Netherlands Institute for Neuroscience, an institute of the Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands
- Martijn P van den Heuvel
- Connectome Lab, Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, Amsterdam, the Netherlands; Department of Child and Adolescent Psychiatry and Psychology, Section Complex Trait Genetics, Amsterdam Neuroscience, Vrije Universiteit Medical Center, Amsterdam UMC, Amsterdam, the Netherlands
- Igor Nenadić
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany; Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Axel Krug
- Department of Psychiatry and Psychotherapy, University of Bonn, Bonn, Germany
- Udo Dannlowski
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Jonathan Repple
- Institute for Translational Psychiatry, University of Münster, Münster, Germany; Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital Frankfurt, Goethe University, Frankfurt, Germany
- Tilo Kircher
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany; Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany

17
17
|
Zaher F, Diallo M, Achim AM, Joober R, Roy MA, Demers MF, Subramanian P, Lavigne KM, Lepage M, Gonzalez D, Zeljkovic I, Davis K, Mackinley M, Sabesan P, Lal S, Voppel A, Palaniyappan L. Speech markers to predict and prevent recurrent episodes of psychosis: A narrative overview and emerging opportunities. Schizophr Res 2024; 266:205-215. [PMID: 38428118] [DOI: 10.1016/j.schres.2024.02.036]
Abstract
Preventing relapse in schizophrenia improves long-term health outcomes. Repeated episodes of psychotic symptoms shape the trajectory of this illness and can be a detriment to functional recovery. Despite early intervention programs, high relapse rates persist, calling for alternative approaches to relapse prevention. Predicting imminent relapse at an individual level is critical for effective intervention. While clinical profiles are often used to foresee relapse, they lack the specificity and sensitivity needed for timely prediction. Here, we review the use of speech analysis through natural language processing (NLP) to predict a recurrent psychotic episode. Recent advances in NLP of speech have demonstrated the ability to detect linguistic markers related to thought disorder and other language disruptions within the 2-4 weeks preceding a relapse. This approach has been shown to capture individual speech patterns, showing promise as a prediction tool. We outline current developments in remote monitoring for psychotic relapses, discuss the challenges and limitations, and present the speech-NLP-based approach as an alternative that can detect relapses with sufficient accuracy, construct validity, and lead time to support clinical actions toward prevention.
Affiliation(s)
- Farida Zaher
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, QC, Canada
- Mariama Diallo
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, QC, Canada
- Amélie M Achim
- Département de Psychiatrie et Neurosciences, Université Laval, Québec City, QC, Canada; Vitam - Centre de Recherche en Santé Durable, Québec City, QC, Canada; Centre de Recherche CERVO, Québec City, QC, Canada
- Ridha Joober
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, QC, Canada
- Marc-André Roy
- Département de Psychiatrie et Neurosciences, Université Laval, Québec City, QC, Canada; Centre de Recherche CERVO, Québec City, QC, Canada
- Marie-France Demers
- Centre de Recherche CERVO, Québec City, QC, Canada; Faculté de Pharmacie, Université Laval, Québec City, QC, Canada
- Priya Subramanian
- Department of Psychiatry, Schulich School of Medicine, Western University, London, ON, Canada
- Katie M Lavigne
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, QC, Canada
- Martin Lepage
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, QC, Canada
- Daniela Gonzalez
- Prevention and Early Intervention Program for Psychosis, London Health Sciences Center, Lawson Health Research Institute, London, ON, Canada
- Irnes Zeljkovic
- Department of Psychiatry, Schulich School of Medicine, Western University, London, ON, Canada
- Kristin Davis
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, QC, Canada
- Michael Mackinley
- Department of Psychiatry, Schulich School of Medicine, Western University, London, ON, Canada; Prevention and Early Intervention Program for Psychosis, London Health Sciences Center, Lawson Health Research Institute, London, ON, Canada
- Priyadharshini Sabesan
- Lakeshore General Hospital and Department of Psychiatry, McGill University, Montreal, QC, Canada
- Shalini Lal
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, QC, Canada; Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montréal, QC, Canada; School of Rehabilitation, Faculty of Medicine, University of Montréal, Montréal, QC, Canada
- Alban Voppel
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, QC, Canada
- Lena Palaniyappan
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, QC, Canada; Department of Psychiatry, Schulich School of Medicine, Western University, London, ON, Canada; Robarts Research Institute, Western University, London, ON, Canada.

18
18
|
Casten LG, Koomar T, Elsadany M, McKone C, Tysseling B, Sasidharan M, Tomblin JB, Michaelson JJ. Lingo: an automated, web-based deep phenotyping platform for language ability. medRxiv [Preprint] 2024:2024.03.29.24305034. [PMID: 38585791] [PMCID: PMC10996758] [DOI: 10.1101/2024.03.29.24305034]
Abstract
Background Language and the ability to communicate effectively are key factors in mental health and well-being. Despite this critical importance, research on language is limited by the lack of a scalable phenotyping toolkit. Methods Here, we describe and showcase Lingo - a flexible online battery of language and nonverbal reasoning skills based on seven widely used tasks (COWAT, picture narration, vocal rhythm entrainment, rapid automatized naming, following directions, sentence repetition, and nonverbal reasoning). The current version of Lingo takes approximately 30 minutes to complete, is entirely open source, and allows for a wide variety of performance metrics to be extracted. We asked > 1,300 individuals from multiple samples to complete Lingo, then investigated the validity and utility of the resulting data. Results We conducted an exploratory factor analysis across 14 features derived from the seven assessments, identifying five factors. Four of the five factors showed acceptable test-retest reliability (Pearson's R > 0.7). Factor 2 showed the highest reliability (Pearson's R = 0.95) and loaded primarily on sentence repetition task performance. We validated Lingo with objective measures of language ability by comparing performance to gold-standard assessments: CELF-5 and the VABS-3. Factor 2 was significantly associated with the CELF-5 "core language ability" scale (Pearson's R = 0.77, p-value < 0.05) and the VABS-3 "communication" scale (Pearson's R = 0.74, p-value < 0.05). Factor 2 was positively associated with phenotypic and genetic measures of socieconomic status. Interestingly, we found the parents of children with language impairments had lower Factor 2 scores (p-value < 0.01). Finally, we found Lingo factor scores were significantly predictive of numerous psychiatric and neurodevelopmental conditions. Conclusions Together, these analyses support Lingo as a powerful platform for scalable deep phenotyping of language and other cognitive abilities. 
Additionally, exploratory analyses provide supporting evidence for the heritability of language ability and the complex relationship between mental health and language.
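The test-retest reliabilities quoted above are Pearson correlations between two administrations of the battery. A minimal sketch of how such a reliability coefficient is computed, using synthetic factor scores rather than the study's data:

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic factor scores for 100 participants at two sessions;
# session 2 repeats session 1 plus independent noise, so reliability
# is high but imperfect (illustrative data, not the study's)
session1 = rng.normal(size=100)
session2 = session1 + 0.3 * rng.normal(size=100)

# test-retest reliability = Pearson's R between the two sessions
r = np.corrcoef(session1, session2)[0, 1]
```

With a noise standard deviation of 0.3, the population value of R is about 0.96, in the range the paper labels highly reliable.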
Affiliation(s)
- Lucas G. Casten
- Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA
- Department of Psychiatry, University of Iowa, Iowa City, IA
- Tanner Koomar
- Department of Psychiatry, University of Iowa, Iowa City, IA
- Muhammad Elsadany
- Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA
- Department of Psychiatry, University of Iowa, Iowa City, IA
- Caleb McKone
- Department of Psychiatry, University of Iowa, Iowa City, IA
- Ben Tysseling
- Department of Psychiatry, University of Iowa, Iowa City, IA
- J. Bruce Tomblin
- Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA
- Jacob J. Michaelson
- Department of Psychiatry, University of Iowa, Iowa City, IA
- Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA
- Iowa Neuroscience Institute, University of Iowa, Iowa City, IA
- Hawkeye Intellectual and Developmental Disabilities Research Center (Hawk-IDDRC), University of Iowa, Iowa City, IA
19
Maleki Varnosfaderani S, Forouzanfar M. The Role of AI in Hospitals and Clinics: Transforming Healthcare in the 21st Century. Bioengineering (Basel) 2024; 11:337. [PMID: 38671759 PMCID: PMC11047988 DOI: 10.3390/bioengineering11040337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2024] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/28/2024] Open
Abstract
As healthcare systems around the world face challenges such as escalating costs, limited access, and growing demand for personalized care, artificial intelligence (AI) is emerging as a key force for transformation. This review is motivated by the urgent need to harness AI's potential to mitigate these issues and aims to critically assess AI's integration in different healthcare domains. We explore how AI empowers clinical decision-making, optimizes hospital operation and management, refines medical image analysis, and revolutionizes patient care and monitoring through AI-powered wearables. Through several case studies, we review how AI has transformed specific healthcare domains and discuss the remaining challenges and possible solutions. Additionally, we discuss methodologies for assessing AI healthcare solutions, ethical challenges of AI deployment, and the importance of data privacy and bias mitigation for responsible technology use. By presenting a critical assessment of AI's transformative potential, this review equips researchers with a deeper understanding of AI's current and future impact on healthcare. It encourages an interdisciplinary dialogue between researchers, clinicians, and technologists to navigate the complexities of AI implementation, fostering the development of AI-driven solutions that prioritize ethical standards, equity, and a patient-centered approach.
Affiliation(s)
- Mohamad Forouzanfar
- Département de Génie des Systèmes, École de Technologie Supérieure (ÉTS), Université du Québec, Montréal, QC H3C 1K3, Canada
- Centre de Recherche de L’institut Universitaire de Gériatrie de Montréal (CRIUGM), Montréal, QC H3W 1W5, Canada
20
Low DM, Rao V, Randolph G, Song PC, Ghosh SS. Identifying bias in models that detect vocal fold paralysis from audio recordings using explainable machine learning and clinician ratings. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2020.11.23.20235945. [PMID: 33501466 PMCID: PMC7836138 DOI: 10.1101/2020.11.23.20235945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Introduction Detecting voice disorders from voice recordings could allow for frequent, remote, and low-cost screening before costly clinical visits and more invasive laryngoscopy examinations. Our goals were to detect unilateral vocal fold paralysis (UVFP) from voice recordings using machine learning, to identify which acoustic variables were important for prediction in order to increase trust, and to determine model performance relative to clinician performance. Methods Patients with UVFP confirmed through endoscopic examination (N=77) and controls with normal voices matched for age and sex (N=77) were included. Voice samples were elicited by reading the Rainbow Passage and sustaining phonation of the vowel "a". Four machine learning models of differing complexity were used. SHapley Additive exPlanations (SHAP) was used to identify important features. Results The highest median bootstrapped ROC AUC score was 0.87, exceeding clinicians' performance on the same recordings (range: 0.74-0.81). Recording durations differed between UVFP recordings and controls because of how the data were originally processed during storage, and we show that duration alone can classify the two groups. Counterintuitively, many UVFP recordings also had higher intensity than controls, even though UVFP patients tend to have weaker voices, revealing a dataset-specific bias that we mitigate in an additional analysis. Conclusion We demonstrate that recording biases in audio duration and intensity created dataset-specific differences between patients and controls, which the models exploited to improve classification. Clinicians' ratings provide further evidence that patients were over-projecting their voices and were recorded at a higher amplitude than controls. Interestingly, after matching audio duration and removing variables associated with intensity to mitigate these biases, the models still achieved similarly high performance. We provide a set of recommendations to avoid bias when building and evaluating machine learning models for screening in laryngology.
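SHAP itself requires the `shap` package; a dependency-free stand-in that conveys the same idea of attributing a model's behavior to individual features is permutation importance. The sketch below uses toy data in which one feature (standing in for a biased variable such as recording duration) carries all of the signal; the data, model, and feature layout are hypothetical, not the study's:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: feature 0 carries the signal, features 1-2 are noise
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

def model(X):
    """Stand-in 'trained model': thresholds feature 0, mimicking a
    classifier that latched onto a single biased variable."""
    return (X[:, 0] > 0).astype(int)

def permutation_importance(model, X, y, n_repeats=20, rng=rng):
    """Accuracy drop when each feature column is shuffled in turn."""
    base = np.mean(model(X) == y)
    drops = []
    for j in range(X.shape[1]):
        accs = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j's link to y
            accs.append(np.mean(model(Xp) == y))
        drops.append(base - np.mean(accs))
    return np.array(drops)

imp = permutation_importance(model, X, y)
```

Here the importance of feature 0 dwarfs the others, which is exactly the signature the paper describes: a model whose predictions hinge on one dataset artifact.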
Affiliation(s)
- Daniel M. Low
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA
- McGovern Institute for Brain Research, MIT, Cambridge, MA, USA
- Vishwanatha Rao
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Department of Otolaryngology–Head and Neck Surgery, Massachusetts Eye and Ear Infirmary, Boston, MA, USA
- Gregory Randolph
- Department of Otolaryngology–Head and Neck Surgery, Massachusetts Eye and Ear Infirmary, Boston, MA, USA
- Department of Otolaryngology–Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
- Phillip C. Song
- Department of Otolaryngology–Head and Neck Surgery, Massachusetts Eye and Ear Infirmary, Boston, MA, USA
- Department of Otolaryngology–Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
- Satrajit S. Ghosh
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA
- McGovern Institute for Brain Research, MIT, Cambridge, MA, USA
- Department of Otolaryngology–Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
21
Larsen E, Murton O, Song X, Joachim D, Watts D, Kapczinski F, Venesky L, Hurowitz G. Validating the efficacy and value proposition of mental fitness vocal biomarkers in a psychiatric population: prospective cohort study. Front Psychiatry 2024; 15:1342835. [PMID: 38505797 PMCID: PMC10948552 DOI: 10.3389/fpsyt.2024.1342835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/22/2023] [Accepted: 02/14/2024] [Indexed: 03/21/2024] Open
Abstract
Background The utility of vocal biomarkers for mental health assessment has gained increasing attention. This study aims to further this line of research by introducing a novel vocal scoring system designed to provide mental fitness tracking insights to users in real-world settings. Methods A prospective cohort study with 104 outpatient psychiatric participants was conducted to validate the "Mental Fitness Vocal Biomarker" (MFVB) score. The MFVB score was derived from eight vocal features, selected based on literature review. Participants' mental health symptom severity was assessed using the M3 Checklist, which serves as a transdiagnostic tool for measuring depression, anxiety, post-traumatic stress disorder, and bipolar symptoms. Results The MFVB demonstrated an ability to stratify individuals by their risk of elevated mental health symptom severity. Continuous observation enhanced the MFVB's efficacy, with risk ratios improving from 1.53 (1.09-2.14, p=0.0138) for single 30-second voice samples to 2.00 (1.21-3.30, p=0.0068) for data aggregated over two weeks. A higher risk ratio of 8.50 (2.31-31.25, p=0.0013) was observed in participants who used the MFVB 5-6 times per week, underscoring the utility of frequent and continuous observation. Participant feedback confirmed the user-friendliness of the application and its perceived benefits. Conclusions The MFVB is a promising tool for objective mental health tracking in real-world conditions, with potential to be a cost-effective, scalable, and privacy-preserving adjunct to traditional psychiatric assessments. User feedback suggests that vocal biomarkers can offer personalized insights and support clinical therapy and other beneficial activities that are associated with improved mental health risks and outcomes.
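Risk ratios of the kind reported above compare the rate of an outcome (e.g. elevated symptom severity) between an exposed and an unexposed group, with the confidence interval computed on the log scale. A minimal sketch; the counts are illustrative, not the study's:

```python
import math

def risk_ratio(a, n_exposed, c, n_unexposed, z=1.96):
    """Risk ratio of an outcome in an 'exposed' group (e.g. low vocal
    biomarker score) vs. an unexposed group, with a 95% CI computed on
    the log scale (Katz method). Counts here are hypothetical."""
    r1, r0 = a / n_exposed, c / n_unexposed
    rr = r1 / r0
    # standard error of ln(RR)
    se = math.sqrt(1/a - 1/n_exposed + 1/c - 1/n_unexposed)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# hypothetical 2x2 counts: 30/50 exposed vs 20/50 unexposed with the outcome
rr, lo, hi = risk_ratio(30, 50, 20, 50)
```

As in the abstract, a CI whose lower bound sits near 1 (here roughly 1.0-2.3 around RR = 1.5) signals a weaker stratification than one bounded well above 1.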
Affiliation(s)
- Devon Watts
- Neuroscience Graduate Program, Department of Health Sciences, McMaster University, Hamilton, ON, Canada
- St. Joseph’s Healthcare Hamilton, Hamilton, ON, Canada
- Flavio Kapczinski
- Neuroscience Graduate Program, Department of Health Sciences, McMaster University, Hamilton, ON, Canada
- Department of Psychiatry, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
22
Wang L, Liu R, Wang Y, Xu X, Zhang R, Wei Y, Zhu R, Zhang X, Wang F. Effectiveness of a Biofeedback Intervention Targeting Mental and Physical Health Among College Students Through Speech and Physiology as Biomarkers Using Machine Learning: A Randomized Controlled Trial. Appl Psychophysiol Biofeedback 2024; 49:71-83. [PMID: 38165498 DOI: 10.1007/s10484-023-09612-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/24/2023] [Indexed: 01/03/2024]
Abstract
Biofeedback therapy is mainly based on the analysis of physiological features to improve an individual's affective state, but there are insufficient objective indicators to assess symptom improvement after biofeedback. In addition to psychological and physiological features, speech features can precisely convey information about emotions, and their use can improve the objectivity of psychiatric assessments. Therefore, biofeedback evaluated against subjective symptom scales together with objective speech and physiological features provides a new approach for early screening and treatment of emotional problems in college students. A 4-week, randomized, controlled, parallel biofeedback therapy study was conducted with college students with symptoms of anxiety or depression. Speech samples, physiological samples, and clinical symptoms were collected at baseline and at the end of treatment, and the extracted speech and physiological features were used for between-group comparisons and correlation analyses between the biofeedback and wait-list groups. Based on the speech features that differed between the biofeedback intervention and wait-list groups, an artificial neural network (ANN) was used to predict the therapeutic effect and response after biofeedback therapy. Through biofeedback therapy, improvements in depression (p = 0.001), anxiety (p = 0.001), insomnia (p = 0.013), and stress (p = 0.004) severity were observed in college students (n = 52). The speech and physiological features in the biofeedback group also changed significantly compared to the wait-list group (n = 52) and were related to the change in symptoms. Among the speech features, the energy parameters and Mel-frequency cepstral coefficients (MFCC) can predict whether the biofeedback intervention effectively improves anxiety and insomnia symptoms, as well as treatment response. The accuracy of the ANN classification model for treatment response versus non-response was approximately 60%. The results of this study provide valuable information about biofeedback for improving the mental health of college students. The study identified speech features, such as the energy parameters and MFCC, as more accurate and objective indicators for tracking biofeedback therapy response and predicting efficacy. Trial Registration Chinese Clinical Trial Registry ChiCTR2100045542.
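The "energy parameters" referenced above are short-time features computed over sliding frames of the speech waveform; full MFCC extraction is usually delegated to a library such as librosa. A dependency-light sketch of the framing and RMS-energy step, run on a synthetic tone (the 25 ms frame / 10 ms hop at 16 kHz are typical choices, not necessarily the study's settings):

```python
import numpy as np

def frame_signal(y, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames
    (400 samples / 160-sample hop = 25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(y) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return y[idx]

def rms_energy(frames):
    """Root-mean-square energy of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

# toy input: 1 s of a 220 Hz tone at 16 kHz, amplitude 0.5
sr = 16000
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 220 * t)

frames = frame_signal(y)
energy = rms_energy(frames)   # one energy value per 10 ms hop
```

For a steady 0.5-amplitude sine the per-frame RMS hovers around 0.5/√2 ≈ 0.354; in real speech this trajectory (and its statistics) is what feeds the classifier.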
Affiliation(s)
- Lifei Wang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Rongxun Liu
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Henan Key Laboratory of Immunology and Targeted Drugs, School of Laboratory Medicine, Xinxiang Medical University, Xinxiang, People's Republic of China
- Yang Wang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Psychology Institute, Inner Mongolia Normal University, Hohhot, Inner Mongolia, People's Republic of China
- Xiao Xu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
- Ran Zhang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Yange Wei
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Rongxin Zhu
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Xizhe Zhang
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
- Fei Wang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Department of Mental Health, School of Public Health, Nanjing Medical University, Nanjing, China
23
Evangelista E, Kale R, McCutcheon D, Rameau A, Gelbard A, Powell M, Johns M, Law A, Song P, Naunheim M, Watts S, Bryson PC, Crowson MG, Pinto J, Bensoussan Y. Current Practices in Voice Data Collection and Limitations to Voice AI Research: A National Survey. Laryngoscope 2024; 134:1333-1339. [PMID: 38087983 DOI: 10.1002/lary.31052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2023] [Revised: 08/08/2023] [Accepted: 08/29/2023] [Indexed: 02/17/2024]
Abstract
INTRODUCTION The accuracy and validity of voice AI algorithms rely on substantial amounts of quality voice data. Although considerable amounts of voice data are captured daily in voice centers across North America, there is no standardized protocol for acoustic data management, which limits the usability of these datasets for voice artificial intelligence (AI) research. OBJECTIVE The aim was to capture current practices of voice data collection, storage, and analysis, and perceived limitations to collaborative voice research. METHODS A 30-question online survey was developed with expert guidance from members of voicecollab.ai, an international collaborative of voice AI researchers. The survey was disseminated via REDCap to an estimated 200 practitioners at North American voice centers. Survey questions assessed respondents' current practices in terms of acoustic data collection, storage, and retrieval, as well as limitations to collaborative voice research. RESULTS Seventy-two respondents completed the survey, of whom 81.7% were laryngologists and 18.3% were speech-language pathologists (SLPs). Eighteen percent of respondents reported seeing 40-60, and 55% reported seeing more than 60, patients with voice disorders weekly (a conservative estimate of over 4000 patients/week). Only 28% of respondents reported utilizing standardized protocols for collection and storage of acoustic data. Although 87% of respondents conduct voice research, only 38% report doing so on a multi-institutional level. Perceived limitations to conducting collaborative voice research include a lack of standardized methodology for collection (30%) and a lack of human resources to prepare and label voice data adequately (55%). CONCLUSION To conduct large-scale multi-institutional voice research with AI, there is a pertinent need for standardization of acoustic data management, as well as an infrastructure for secure and efficient data sharing. LEVEL OF EVIDENCE 5 Laryngoscope, 134:1333-1339, 2024.
Affiliation(s)
- Emily Evangelista
- University of South Florida Morsani College of Medicine, Tampa, Florida, U.S.A.
- Rohan Kale
- Department of Biology, University of South Florida, Tampa, Florida, U.S.A.
- Anais Rameau
- Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medical College, Ithaca, New York, U.S.A.
- Alexander Gelbard
- Department of Otolaryngology-Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, U.S.A.
- Maria Powell
- Department of Otolaryngology-Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, U.S.A.
- Michael Johns
- Department of Otolaryngology-Head and Neck Surgery, Keck College of Medicine, University of Southern California, Los Angeles, California, U.S.A.
- Anthony Law
- Department of Otolaryngology, Emory University School of Medicine, Atlanta, Georgia, U.S.A.
- Phillip Song
- Division of Laryngology, Otolaryngology-Head and Neck Surgery, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, U.S.A.
- Matthew Naunheim
- Division of Laryngology, Otolaryngology-Head and Neck Surgery, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, U.S.A.
- Stephanie Watts
- Department of Otolaryngology-Head and Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida, U.S.A.
- Paul C Bryson
- Department of Otolaryngology-Head and Neck Surgery, Cleveland Clinic, Cleveland, Ohio, U.S.A.
- Matthew G Crowson
- Otolaryngology-Head and Neck Surgery, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, U.S.A.
- Jeremy Pinto
- Mila Quebec Artificial Intelligence Institute, Montreal, Quebec, Canada
- Yael Bensoussan
- Division of Laryngology, Department of Otolaryngology-Head and Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida, U.S.A.
24
Treccarichi S, Failla P, Vinci M, Musumeci A, Gloria A, Vasta A, Calabrese G, Papa C, Federico C, Saccone S, Calì F. UNC5C: Novel Gene Associated with Psychiatric Disorders Impacts Dysregulation of Axon Guidance Pathways. Genes (Basel) 2024; 15:306. [PMID: 38540364 PMCID: PMC10970690 DOI: 10.3390/genes15030306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2024] [Revised: 02/23/2024] [Accepted: 02/25/2024] [Indexed: 06/14/2024] Open
Abstract
The UNC-5 family of netrin receptor genes, predominantly expressed in brain tissues, plays a pivotal role in various neuronal processes. Mutations in genes involved in axon development contribute to a wide spectrum of human diseases, including developmental, neuropsychiatric, and neurodegenerative disorders. The NTN1/DCC signaling pathway, interacting with UNC5C, plays a crucial role in central nervous system axon guidance and has been associated with psychiatric disorders during adolescence in humans. Whole-exome sequencing unveiled two compound heterozygous causative mutations within the UNC5C gene in a patient diagnosed with psychiatric disorders. In silico analysis showed that neither of the observed variants affected the allosteric linkage between UNC5C and NTN1. Instead, these mutations are located within crucial cytoplasmic domains, specifically ZU5 and the region required for netrin-mediated axon repulsion of neuronal growth cones. These domains play a critical role in forming the supramodular protein structure and directly interact with microtubules, thereby ensuring the functionality of the axon repulsion process. We emphasize that these mutations disrupt the aforementioned processes, thereby associating the UNC5C gene with psychiatric disorders for the first time and expanding the set of genes implicated in such disorders. Further research is required to validate this association, but we suggest including UNC5C in the genetic analysis of patients with psychiatric disorders.
Affiliation(s)
- Simone Treccarichi
- Oasi Research Institute-IRCCS, 94018 Troina, Italy
- Pinella Failla
- Oasi Research Institute-IRCCS, 94018 Troina, Italy
- Mirella Vinci
- Oasi Research Institute-IRCCS, 94018 Troina, Italy
- Antonino Musumeci
- Oasi Research Institute-IRCCS, 94018 Troina, Italy
- Angelo Gloria
- Oasi Research Institute-IRCCS, 94018 Troina, Italy
- Anna Vasta
- Oasi Research Institute-IRCCS, 94018 Troina, Italy
- Giuseppe Calabrese
- Oasi Research Institute-IRCCS, 94018 Troina, Italy
- Carla Papa
- Oasi Research Institute-IRCCS, 94018 Troina, Italy
- Concetta Federico
- Department of Biological, Geological and Environmental Sciences, University of Catania, Via Androne 81, 95124 Catania, Italy
- Salvatore Saccone
- Department of Biological, Geological and Environmental Sciences, University of Catania, Via Androne 81, 95124 Catania, Italy
- Francesco Calì
- Oasi Research Institute-IRCCS, 94018 Troina, Italy
25
Albert P, Haider F, Luz S. CUSCO: An Unobtrusive Custom Secure Audio-Visual Recording System for Ambient Assisted Living. SENSORS (BASEL, SWITZERLAND) 2024; 24:1506. [PMID: 38475042 DOI: 10.3390/s24051506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Revised: 02/21/2024] [Accepted: 02/24/2024] [Indexed: 03/14/2024]
Abstract
The ubiquity of digital technology has facilitated detailed recording of human behaviour. Ambient technology has been used to capture behaviours in a broad range of applications, from healthcare and monitoring to the assessment of cooperative work. However, existing systems often face challenges in terms of autonomy, usability, and privacy. This paper presents a portable, easy-to-use, and privacy-preserving system for capturing behavioural signals unobtrusively in home or office settings. The system focuses on the capture of audio, video, and depth imaging. It is based on a device built on a small form-factor platform that incorporates ambient sensors which can be integrated with the audio and depth video hardware for multimodal behaviour tracking. The system can be accessed remotely and integrated into a network of sensors. Data are encrypted in real time to ensure safety and privacy. We illustrate uses of the device in two different settings: a healthy-ageing IoT application, where the device is used in conjunction with a range of IoT sensors to monitor an older person's mental well-being at home, and a healthcare communication quality assessment application, where the device is used to capture a patient-clinician interaction for consultation quality appraisal. CUSCO can automatically detect active speakers, extract acoustic features, record video and depth streams, and recognise emotions and cognitive impairment with promising accuracy.
Affiliation(s)
- Pierre Albert
- National Institute for Public Health and the Environment, 3721 MA Bilthoven, The Netherlands
- Fasih Haider
- School of Engineering, The University of Edinburgh, Edinburgh EH9 3JW, UK
- Saturnino Luz
- Usher Institute, Edinburgh Medical School, The University of Edinburgh, Edinburgh EH8 9YL, UK
26
Li S, Nair R, Naqvi SM. Acoustic and Text Features Analysis for Adult ADHD Screening: A Data-Driven Approach Utilizing DIVA Interview. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2024; 12:359-370. [PMID: 38606391 PMCID: PMC11008805 DOI: 10.1109/jtehm.2024.3369764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/18/2023] [Revised: 01/09/2024] [Accepted: 02/15/2024] [Indexed: 04/13/2024]
Abstract
Attention deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder commonly seen in childhood that leads to behavioural changes in social development and communication patterns. It often continues undiagnosed into adulthood due to a global shortage of psychiatrists, resulting in delayed diagnoses with lasting consequences for individuals' well-being and for society. Recently, machine learning methodologies have been incorporated into healthcare systems to facilitate diagnosis and enhance the prediction of treatment outcomes for mental health conditions. Previous research on ADHD detection focused on functional magnetic resonance imaging (fMRI) or electroencephalography (EEG) signals, which require costly equipment and trained personnel for data collection. In recent years, speech and text modalities have garnered increasing attention due to their cost-effectiveness and non-wearable sensing in data collection. In this research, conducted in collaboration with the Cumbria, Northumberland, Tyne and Wear NHS Foundation Trust, we gathered audio data from both ADHD patients and normal controls based on the clinically popular Diagnostic Interview for ADHD in Adults (DIVA). We then transcribed the speech data to text using the Google Cloud Speech API. We extracted both acoustic and text features from the data, encompassing traditional acoustic features (e.g., MFCC), specialized feature sets (e.g., eGeMAPS), and deep-learned linguistic and semantic features derived from pre-trained deep learning models. These features were employed in conjunction with a support vector machine for ADHD classification, yielding promising outcomes for effective adult ADHD screening from audio and text data. Clinical impact: This research introduces a transformative approach to ADHD diagnosis, employing speech and text analysis to facilitate earlier and more accessible detection, particularly beneficial in areas with limited psychiatric resources. Clinical and Translational Impact Statement: The successful application of machine learning techniques to audio and text data for ADHD screening represents a significant advancement in mental health diagnostics, paving the way for integration into clinical settings and potentially improving patient outcomes on a broader scale.
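The pipeline described above concatenates acoustic and text feature vectors ("early fusion") before feeding them to a linear classifier. A minimal sketch on synthetic data, using logistic regression trained by gradient descent as a dependency-free stand-in for the paper's support vector machine; the feature dimensions, labels, and injected signal are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical feature dimensions: 13 acoustic (e.g. MFCC means) and
# 8 text features per interview; labels: 1 = ADHD, 0 = control
n = 40
X_acoustic = rng.normal(size=(n, 13))
X_text = rng.normal(size=(n, 8))
y = np.array([1] * 20 + [0] * 20)
X_acoustic[:20, 0] += 3.0            # inject a separable toy signal

X = np.hstack([X_acoustic, X_text])  # early fusion by concatenation
X = (X - X.mean(0)) / X.std(0)       # z-score each fused feature

# logistic regression by full-batch gradient descent
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    z = np.clip(X @ w + b, -30, 30)
    p = 1.0 / (1.0 + np.exp(-z))     # predicted probabilities
    w -= 0.5 * X.T @ (p - y) / n     # gradient step on weights
    b -= 0.5 * np.mean(p - y)        # gradient step on bias

train_acc = np.mean(((X @ w + b) > 0) == y)
```

The point of the sketch is the fusion step: acoustic and text features live in one vector, so a single linear decision boundary can weigh evidence from both modalities at once.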
Affiliation(s)
- Shuanglin Li
- Intelligent Sensing and Communications Group, School of Engineering, Newcastle University, Newcastle Upon Tyne NE1 7RU, U.K.
- Rajesh Nair
- Adult ADHD Services, Cumbria, Northumberland, Tyne and Wear NHS Foundation Trust, Newcastle Upon Tyne NE3 3XT, U.K.
- Syed Mohsen Naqvi
- Intelligent Sensing and Communications Group, School of Engineering, Newcastle University, Newcastle Upon Tyne NE1 7RU, U.K.
27
Luo J, Wu Y, Liu M, Li Z, Wang Z, Zheng Y, Feng L, Lu J, He F. Differentiation between depression and bipolar disorder in child and adolescents by voice features. Child Adolesc Psychiatry Ment Health 2024; 18:19. [PMID: 38287442 PMCID: PMC10826007 DOI: 10.1186/s13034-024-00708-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/30/2023] [Accepted: 01/11/2024] [Indexed: 01/31/2024] Open
Abstract
OBJECTIVE Major depressive disorder (MDD) and bipolar disorder (BD) are serious, chronic, disabling mental and emotional disorders whose symptoms often manifest atypically in children and adolescents, making diagnosis difficult without objective physiological indicators. We therefore aimed to objectively identify MDD and BD in children and adolescents by exploring their voiceprint features. METHODS This study included 150 participants aged 6 to 16 years: 50 MDD patients, 50 BD patients, and 50 healthy controls. After collecting voiceprint data, the chi-square test was used to screen and extract voiceprint features specific to emotional disorders in children and adolescents. The selected voiceprint features were then split into training and testing datasets in a 7:3 ratio. The performance of various machine learning and deep learning algorithms was compared on the training dataset, and the optimal algorithm was selected to classify the testing dataset and calculate sensitivity, specificity, accuracy, and the ROC curve. RESULTS The three groups showed differences in clustering centers for various voice features such as root mean square energy, power spectral slope, low-frequency percentile energy level, high-frequency spectral slope, spectral harmonic gain, and audio signal energy level. The linear SVM model showed the best performance on the training dataset, achieving a total accuracy of 95.6% in classifying the three groups on the testing dataset, with a sensitivity of 93.3% for MDD and 100% for BD, a specificity of 93.3%, an AUC of 1 for BD, and an AUC of 0.967 for MDD. CONCLUSION By exploring voice features in children and adolescents, machine learning can effectively differentiate between MDD and BD, and voice features hold promise as an objective physiological indicator for the auxiliary diagnosis of mood disorders in clinical practice.
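Feature screening with the chi-square test, as described above, compares observed group-by-feature-bin counts against the counts expected under independence. A minimal sketch with hypothetical counts (not the study's data):

```python
import numpy as np

def chi_square(table):
    """Pearson chi-square statistic for a contingency table
    (rows: diagnostic groups, columns: feature bins)."""
    table = np.asarray(table, dtype=float)
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row @ col / table.sum()   # counts expected under independence
    return ((table - expected) ** 2 / expected).sum()

# hypothetical counts of a binarized voice feature (low/high energy)
# across three groups of 50 (e.g. MDD, BD, controls)
table = [[35, 15],
         [20, 30],
         [10, 40]]
stat = chi_square(table)
```

A large statistic (here well above the critical value for 2 degrees of freedom) marks the feature as group-discriminating and worth keeping; in practice the p-value would come from `scipy.stats.chi2_contingency` or an equivalent.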
Affiliation(s)
- Jie Luo
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Yuanzhen Wu
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Mengqi Liu
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Zhaojun Li
- Beijing Institute of Technology, School of Integrated Circuits and Electronics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China
- Zhuo Wang
- Beijing Institute of Technology, School of Integrated Circuits and Electronics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China
- Yi Zheng
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Lihui Feng
- Beijing Institute of Technology, School of Optics and Photonics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China
- Jihua Lu
- Beijing Institute of Technology, School of Integrated Circuits and Electronics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China
- Fan He
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China

28
Zolnoori M, Sridharan S, Zolnour A, Vergez S, McDonald MV, Kostic Z, Bowles KH, Topaz M. Utilizing patient-nurse verbal communication in building risk identification models: the missing critical data stream in home healthcare. J Am Med Inform Assoc 2024; 31:435-444. PMID: 37847651; PMCID: PMC10797261; DOI: 10.1093/jamia/ocad195.
Abstract
BACKGROUND In the United States, over 12 000 home healthcare agencies annually serve more than 6 million patients, mostly aged 65+ years with chronic conditions. One in three of these patients ends up visiting the emergency department (ED) or being hospitalized. Existing risk identification models based on electronic health record (EHR) data have suboptimal performance in detecting these high-risk patients. OBJECTIVES To measure the added value of integrating audio-recorded home healthcare patient-nurse verbal communication into a risk identification model built on home healthcare EHR data and clinical notes. METHODS This pilot study was conducted at one of the largest not-for-profit home healthcare agencies in the United States. We audio-recorded 126 patient-nurse encounters for 47 patients, 8 of whom experienced ED visits or hospitalization. The risk model was developed and tested iteratively using (1) structured data from the Outcome and Assessment Information Set (OASIS), (2) clinical notes, and (3) verbal communication features. We used various natural language processing methods to model the communication between patients and nurses. RESULTS Using a Support Vector Machine classifier trained on the most informative features from OASIS, clinical notes, and verbal communication, we achieved an AUC-ROC of 99.68 and an F1-score of 94.12. Integrating verbal communication into the risk models improved the F1-score by 26%. The analysis revealed that patients at high risk tended to interact more with risk-associated cues, exhibited more "sadness" and "anxiety," and had extended periods of silence during conversation. CONCLUSION This study underscores the value of incorporating patient-nurse verbal communication into risk prediction models for hospitalizations and ED visits, suggesting the need for an evolved clinical workflow that integrates routine recording of patient-nurse verbal communication into the medical record.
Affiliation(s)
- Maryam Zolnoori
- School of Nursing, Columbia University, New York, NY 10032, United States
- Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Ali Zolnour
- School of Electrical and Computer Engineering, University of Tehran, Tehran 14395-515, Iran
- Sasha Vergez
- Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Margaret V McDonald
- Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Zoran Kostic
- Electrical Engineering Department, Columbia University, New York, NY 10027, United States
- Kathryn H Bowles
- Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- School of Nursing, University of Pennsylvania, Philadelphia, PA 19104, United States
- Maxim Topaz
- School of Nursing, Columbia University, New York, NY 10032, United States
- Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States

29
Wadle LM, Ebner-Priemer UW, Foo JC, Yamamoto Y, Streit F, Witt SH, Frank J, Zillich L, Limberger MF, Ablimit A, Schultz T, Gilles M, Rietschel M, Sirignano L. Speech Features as Predictors of Momentary Depression Severity in Patients With Depressive Disorder Undergoing Sleep Deprivation Therapy: Ambulatory Assessment Pilot Study. JMIR Ment Health 2024; 11:e49222. PMID: 38236637; PMCID: PMC10835582; DOI: 10.2196/49222.
Abstract
BACKGROUND The use of mobile devices to continuously monitor objectively extracted parameters of depressive symptomatology is seen as an important step in the understanding and prevention of upcoming depressive episodes. Speech features such as pitch variability, speech pauses, and speech rate are promising indicators, but empirical evidence is limited given the variability of study designs. OBJECTIVE Previous studies have found different speech patterns when comparing single speech recordings between patients and healthy controls, but only a few have used repeated assessments to compare depressive and nondepressive episodes within the same patient. To our knowledge, no study has used a series of measurements within patients with depression (eg, intensive longitudinal data) to model the dynamic ebb and flow of subjectively reported depression and concomitant speech samples. Such data are, however, indispensable for detecting and ultimately preventing upcoming episodes. METHODS We captured voice samples and momentary affect ratings over the course of 3 weeks in a sample of patients (N=30) with an acute depressive episode receiving inpatient care. Patients underwent sleep deprivation therapy, a chronotherapeutic intervention that can rapidly improve depressive symptomatology. We hypothesized that within-person variability in depressive and affective momentary states would be reflected in 3 speech features: pitch variability, speech pauses, and speech rate. We parametrized them using the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) of the open-source openSMILE toolkit (Speech and Music Interpretation by Large-Space Extraction; audEERING GmbH) and extracted them from a transcript. We analyzed the speech features along with self-reported momentary affect ratings using multilevel linear regression analysis, with an average of 32 (SD 19.83) assessments per patient.
RESULTS Analyses revealed that pitch variability, speech pauses, and speech rate were associated with depression severity, positive affect, valence, and energetic arousal; furthermore, speech pauses and speech rate were associated with negative affect, and speech pauses were additionally associated with calmness. Specifically, pitch variability was negatively associated with improved momentary states (ie, lower pitch variability was linked to lower depression severity as well as to higher positive affect, valence, and energetic arousal). Speech pauses were likewise negatively associated with improved momentary states, whereas speech rate was positively associated with them. CONCLUSIONS Pitch variability, speech pauses, and speech rate are promising features for clinical prediction technologies that could improve patient care as well as the timely diagnosis and monitoring of treatment response. Our research is a step toward an automated depression monitoring system, facilitating individually tailored treatments and increased patient empowerment.
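Two of the feature families in this abstract, pause behavior and pitch variability, can be sketched with plain NumPy on a synthetic "recording". This is a toy illustration only: the study used openSMILE's eGeMAPS feature set, and the thresholds and F0 track below are invented for demonstration.

```python
# Toy sketch: pause ratio from frame energies, and pitch variability as the
# std of a frame-level F0 track. Not the eGeMAPS implementation.
import numpy as np

rng = np.random.default_rng(1)
sr = 16_000
frame = 400                                                   # 25 ms at 16 kHz
voiced = 0.3 * np.sin(2 * np.pi * 180 * np.arange(sr) / sr)   # 1 s of "speech"
silence = 0.001 * rng.normal(size=sr // 2)                    # 0.5 s pause
signal = np.concatenate([voiced, silence, voiced])

n_frames = len(signal) // frame
frames = signal[: n_frames * frame].reshape(n_frames, frame)
rms = np.sqrt((frames ** 2).mean(axis=1))                     # per-frame energy

pause_ratio = float((rms < 0.01).mean())   # fraction of low-energy frames

# Hypothetical F0 track (Hz) for the voiced frames; its std is "pitch variability"
f0_track = rng.normal(loc=180, scale=12, size=int((rms >= 0.01).sum()))
pitch_variability = float(f0_track.std())

print(f"pause ratio: {pause_ratio:.2f}, "
      f"pitch variability: {pitch_variability:.1f} Hz")
```

In a longitudinal design like this study's, such per-recording features would then enter a multilevel regression against momentary affect ratings.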
Affiliation(s)
- Lisa-Marie Wadle
- Mental mHealth Lab, Institute of Sports and Sports Science, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Ulrich W Ebner-Priemer
- Mental mHealth Lab, Institute of Sports and Sports Science, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany
- Jerome C Foo
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany
- Institute for Psychopharmacology, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany
- Department of Psychiatry, College of Health Sciences, University of Alberta, Edmonton, AB, Canada
- Yoshiharu Yamamoto
- Educational Physiology Laboratory, Graduate School of Education, University of Tokyo, Tokyo, Japan
- Fabian Streit
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany
- Stephanie H Witt
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany
- Josef Frank
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany
- Lea Zillich
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany
- Matthias F Limberger
- Mental mHealth Lab, Institute of Sports and Sports Science, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Tanja Schultz
- Cognitive Systems Lab, University of Bremen, Bremen, Germany
- Maria Gilles
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany
- Marcella Rietschel
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany
- Lea Sirignano
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany

30
Bélisle-Pipon JC, Powell M, English R, Malo MF, Ravitsky V, Bensoussan Y. Stakeholder perspectives on ethical and trustworthy voice AI in health care. Digit Health 2024; 10:20552076241260407. PMID: 39055787; PMCID: PMC11271113; DOI: 10.1177/20552076241260407.
Abstract
Objective Voice as a health biomarker analyzed with artificial intelligence (AI) is gaining momentum in research. The noninvasiveness of voice data collection through accessible technology (such as smartphones, telehealth, and ambient recordings) or within clinical contexts means voice AI may help address health disparities and promote the inclusion of marginalized communities. However, developing AI-ready voice datasets free from bias and discrimination is a complex task. The objective of this study is to better understand the perspectives of engaged and interested stakeholders regarding ethical and trustworthy voice AI, to inform both further ethical inquiry and technology innovation. Methods A questionnaire was administered to voice AI experts, clinicians, scholars, patients, trainees, and policy-makers who participated in the 2023 Voice AI Symposium organized by the Bridge2AI-Voice AI Consortium. The survey used a mix of Likert-scale, ranking, and open-ended questions. A total of 27 stakeholders participated in the study. Results The main results are the identification of priority ethical issues, an initial definition of ethically sourced data for voice AI, insights into the use of synthetic voice data, and proposals for strengthening the trustworthiness of voice AI. The study shows a diversity of perspectives and adds nuance to the planning and development of ethical and trustworthy voice AI. Conclusions This study represents the first published stakeholder survey related to voice as a biomarker of health and sheds light on the critical importance of ethics and trustworthiness in the development of voice AI technologies for health applications.
Affiliation(s)
- Maria Powell
- Vanderbilt University Medical Center, Department of Otolaryngology-Head & Neck Surgery, Nashville, TN, USA
- Renee English
- Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
- Vardit Ravitsky
- Hastings Center, Garrison, NY, USA
- Department of Global Health and Social Medicine, Harvard University, Cambridge, MA, USA
- Yael Bensoussan
- Department of Otolaryngology-Head & Neck Surgery, University of South Florida, Tampa, FL, USA

31
Silva WJ, Lopes L, Galdino MKC, Almeida AA. Voice Acoustic Parameters as Predictors of Depression. J Voice 2024; 38:77-85. PMID: 34353686; DOI: 10.1016/j.jvoice.2021.06.018.
Abstract
OBJECTIVE To analyze whether voice acoustic parameters can discriminate between and predict depression in patients with and without the condition. METHODS Observational case-control study. The following instruments were administered to participants: the Self-Reporting Questionnaire (SRQ-20), the Beck Depression Inventory-Second Edition (BDI-II), and the Voice Symptom Scale (VoiSS), together with voice recordings for subsequent extraction of the following acoustic parameters: mean, mode, and standard deviation (SD) of the fundamental frequency (F0); jitter; shimmer; glottal-to-noise excitation ratio (GNE); smoothed cepstral peak prominence (CPPS); and spectral tilt. A total of 144 individuals participated: 54 patients diagnosed with depression (case group) and 90 without a diagnosis of depression (control group). RESULTS Mean acoustic parameters differed between the groups: F0 (SD), jitter, and shimmer were higher, while GNE, CPPS, and spectral tilt were lower, in the case group than in the control group. There were significant associations between the BDI-II and jitter, shimmer, CPPS, and spectral tilt, and between CPPS and the class of antidepressants used. A multiple linear regression model showed that jitter and CPPS were predictors of depression as measured by the BDI-II. CONCLUSION Acoustic parameters discriminated between patients with and without depression and were associated with BDI-II scores. The class of antidepressants used was associated with CPPS, and jitter and CPPS predicted the presence of depression as measured by the BDI-II clinical score.
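Two of the perturbation measures named in this abstract, jitter and shimmer, have simple textbook definitions that can be sketched directly. The code below is not the authors' pipeline: the per-cycle periods and peak amplitudes are hypothetical stand-ins for values that would normally come from a pitch-tracking step.

```python
# Minimal sketch of local jitter (cycle-to-cycle period perturbation) and
# local shimmer (cycle-to-cycle amplitude perturbation), each as a percentage.
import numpy as np

rng = np.random.default_rng(2)
n_cycles = 200
periods = 1 / 120 + rng.normal(scale=5e-5, size=n_cycles)   # ~120 Hz F0, in s
amplitudes = 1.0 + rng.normal(scale=0.02, size=n_cycles)    # arbitrary units

def local_jitter(T):
    """Mean absolute difference of consecutive periods / mean period (%)."""
    return 100 * np.mean(np.abs(np.diff(T))) / np.mean(T)

def local_shimmer(A):
    """Mean absolute difference of consecutive amplitudes / mean amplitude (%)."""
    return 100 * np.mean(np.abs(np.diff(A))) / np.mean(A)

jit = local_jitter(periods)
shim = local_shimmer(amplitudes)
print(f"jitter: {jit:.2f}%  shimmer: {shim:.2f}%")
```

Higher jitter and shimmer, as the study reports for the depression group, correspond to larger cycle-to-cycle irregularity in these two series.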
Affiliation(s)
- Wegina Jordana Silva
- Department of Speech Therapy, Federal University of Paraíba (UFPB) and Federal University of Rio Grande do Norte (UFRN), João Pessoa, Paraíba, Brazil.
- Leonardo Lopes
- Department of Speech Therapy, Federal University of Paraíba (UFPB), Graduate Program in Speech Therapy, Federal University of Paraíba (UFPB) and Federal University of Rio Grande do Norte (UFRN - PPgFon), Graduate Program in Decision and Health Models (PPgMDS), and Graduate Program in Linguistic (PROLING) of UFPB, João Pessoa, Paraíba, Brazil.
- Melyssa Kellyane Cavalcanti Galdino
- Department of Psychology, Federal University of Paraíba (UFPB), Graduate Program in Cognitive Neuroscience and Behavior (PPgNeC) of UFPB, João Pessoa, Paraíba, Brazil.
- Anna Alice Almeida
- Department of Speech Therapy, Federal University of Paraíba (UFPB), Graduate Program in Speech Therapy, Federal University of Paraíba (UFPB) and Federal University of Rio Grande do Norte (UFRN - PPgFon), Graduate Program in Decision and Health Models (PPgMDS), and Graduate Program in Cognitive Neuroscience and Behavior (PPgNeC) of UFPB, João Pessoa, Paraíba, Brazil.

32
Chopra H, Annu, Shin DK, Munjal K, Priyanka, Dhama K, Emran TB. Revolutionizing clinical trials: the role of AI in accelerating medical breakthroughs. Int J Surg 2023; 109:4211-4220. PMID: 38259001; PMCID: PMC10720846; DOI: 10.1097/js9.0000000000000705.
Abstract
Clinical trials are the essential assessment for safe, reliable, and effective drug development. Data-related limitations, extensive manual effort, the demands of remote patient monitoring, and the complexity of traditional clinical trials drive the application of Artificial Intelligence (AI) in medical and healthcare organisations. For expeditious and streamlined clinical trials, personalised AI solutions are particularly well suited. AI offers broad utility through structured, standardised, and digitally driven elements of medical research. Clinical trials are a time-consuming process involving patient recruitment, enrolment, frequent monitoring, and medication adherence and retention. AI-powered tools can generate and manage data across the trial lifecycle, keeping the patient's full medical history on record in a patient-centric fashion. AI can intelligently interpret data, feed downstream systems, and automatically complete the required analysis reports. This article explains how AI has enabled innovative ways of collecting data, performing biosimulation, and achieving early disease diagnosis for clinical trials, and how it addresses these challenges through cost and time reduction, improved efficiency, and improved drug development research with less need for rework. The future implications of AI for accelerating clinical trials matter to medical research because of its fast output and overall utility.
Affiliation(s)
- Hitesh Chopra
- Department of Biosciences, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai - 602105, Tamil Nadu, India
- Annu
- Thin Film and Materials Laboratory, School of Mechanical Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
- Dong K. Shin
- Thin Film and Materials Laboratory, School of Mechanical Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
- Kavita Munjal
- Department of Pharmacy, Amity Institute of Pharmacy, Amity University, Noida, Uttar Pradesh 201303, India
- Priyanka
- Department of Veterinary Microbiology, College of Veterinary Science, Guru Angad Dev Veterinary and Animal Sciences University (GADVASU), Rampura Phul, Bathinda, Punjab
- Kuldeep Dhama
- Indian Veterinary Research Institute (IVRI), Izatnagar, Bareilly, Uttar Pradesh
- Talha B. Emran
- Department of Pharmacy, BGC Trust University Bangladesh, Chittagong
- Department of Pharmacy, Faculty of Allied Health Sciences, Daffodil International University, Dhaka, Bangladesh

33
Mao K, Wu Y, Chen J. A systematic review on automated clinical depression diagnosis. NPJ Ment Health Res 2023; 2:20. PMID: 38609509; PMCID: PMC10955993; DOI: 10.1038/s44184-023-00040-z.
Abstract
Assessing mental health disorders and determining treatment can be difficult for a number of reasons, including access to healthcare providers. Assessments and treatments may not be continuous and can be limited by the unpredictable nature of psychiatric symptoms. Machine-learning models using data collected in a clinical setting can improve diagnosis and treatment. Studies have used speech, text, and facial expression analysis to identify depression. Still, more research is needed to address challenges such as the need for multimodal machine-learning models for clinical use. We conducted a review of studies from the past decade that utilized speech, text, and facial expression analysis to detect depression, as defined by the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline. We provide information on the number of participants, techniques used to assess clinical outcomes, speech-eliciting tasks, machine-learning algorithms, metrics, and other important discoveries for each study. A total of 544 studies were examined, 264 of which satisfied the inclusion criteria. A database has been created containing the query results and a summary of how different features are used to detect depression. While machine learning shows its potential to enhance mental health disorder evaluations, some obstacles must be overcome, especially the requirement for more transparent machine-learning models for clinical purposes. Considering the variety of datasets, feature extraction techniques, and metrics used in this field, guidelines have been provided to collect data and train machine-learning models to guarantee reproducibility and generalizability across different contexts.
Affiliation(s)
- Kaining Mao
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Yuqi Wu
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Jie Chen
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6G 2R3, Canada

34
Cummins N, Dineley J, Conde P, Matcham F, Siddi S, Lamers F, Carr E, Lavelle G, Leightley D, White KM, Oetzmann C, Campbell EL, Simblett S, Bruce S, Haro JM, Penninx BWJH, Ranjan Y, Rashid Z, Stewart C, Folarin AA, Bailón R, Schuller BW, Wykes T, Vairavan S, Dobson RJB, Narayan VA, Hotopf M. Multilingual markers of depression in remotely collected speech samples: A preliminary analysis. J Affect Disord 2023; 341:128-136. PMID: 37598722; DOI: 10.1016/j.jad.2023.08.097.
Abstract
BACKGROUND Speech contains neuromuscular, physiological, and cognitive components, and so is a potential biomarker of mental disorders. Previous studies indicate that speaking rate and pausing are associated with major depressive disorder (MDD). However, results are inconclusive, as many studies are small and underpowered and do not include clinical samples. These studies have also been unilingual and use speech collected in controlled settings. If speech markers are to help understand the onset and progress of MDD, we need to uncover markers that are robust to language and establish the strength of associations in real-world data. METHODS We collected speech data from 585 participants with a history of MDD in the United Kingdom, Spain, and the Netherlands as part of the RADAR-MDD study. Participants recorded their speech via smartphones every two weeks for 18 months. Linear mixed models were used to estimate the strength of specific markers of depression from a set of 28 speech features. RESULTS Increased depressive symptoms were associated with speech rate, articulation rate, and intensity of speech elicited from a scripted task. These features had consistently stronger effect sizes than pauses. LIMITATIONS Our findings are derived at the cohort level, so they may have limited impact on identifying intra-individual speech changes associated with changes in symptom severity. The analysis of features averaged over the entire recording may have underestimated the importance of some features. CONCLUSIONS Participants with more severe depressive symptoms spoke more slowly and quietly. Our findings come from a real-world, multilingual, clinical dataset, so they represent a step-change in the usefulness of speech as a digital phenotype of MDD.
Affiliation(s)
- Nicholas Cummins
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
- Judith Dineley
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany
- Pauline Conde
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Faith Matcham
- School of Psychology, University of Sussex, Falmer, UK; Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Sara Siddi
- Parc Sanitari Sant Joan de Déu, Fundació Sant Joan de Déu, CIBERSAM, Barcelona, Spain
- Femke Lamers
- Department of Psychiatry, Amsterdam Public Health Research Institute and Amsterdam Neuroscience, Amsterdam University Medical Centre, Vrije Universiteit and GGZ InGeest, Amsterdam, the Netherlands
- Ewan Carr
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Grace Lavelle
- School of Psychology, University of Sussex, Falmer, UK
- Daniel Leightley
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Katie M White
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Carolin Oetzmann
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Edward L Campbell
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; GTM research group, AtlanTTic Research Center, University of Vigo, Spain
- Sara Simblett
- Department of Psychology, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Stuart Bruce
- RADAR-CNS Patient Advisory Board, King's College London, UK
- Josep Maria Haro
- Parc Sanitari Sant Joan de Déu, Fundació Sant Joan de Déu, CIBERSAM, Barcelona, Spain
- Brenda W J H Penninx
- Department of Psychiatry, Amsterdam Public Health Research Institute and Amsterdam Neuroscience, Amsterdam University Medical Centre, Vrije Universiteit and GGZ InGeest, Amsterdam, the Netherlands
- Yatharth Ranjan
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Zulqarnain Rashid
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Callum Stewart
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Amos A Folarin
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London, Maudsley NHS Foundation Trust, King's College London, London, UK
- Raquel Bailón
- Biomedical Signal Interpretation and Computational Simulation (BSICoS) group, Aragon Institute for Engineering Research, University of Zaragoza, Zaragoza, Spain; Biomedical Research Networking Center in Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Spain
- Björn W Schuller
- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany; GLAM - Group on Language, Audio, & Music, Imperial College London, London, UK
- Til Wykes
- Department of Psychology, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London, Maudsley NHS Foundation Trust, King's College London, London, UK
- Richard J B Dobson
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Institute of Health Informatics, University College London, London, UK
- Matthew Hotopf
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London, Maudsley NHS Foundation Trust, King's College London, London, UK

35
Gerczuk M, Triantafyllopoulos A, Amiriparian S, Kathan A, Bauer J, Berking M, Schuller BW. Zero-shot personalization of speech foundation models for depressed mood monitoring. Patterns (N Y) 2023; 4:100873. PMID: 38035199; PMCID: PMC10682756; DOI: 10.1016/j.patter.2023.100873.
Abstract
The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient's affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mostly build population-level models that aim to predict each individual's diagnosis as a (mostly) static property. Because of inter-individual differences in symptomatology and mood regulation behaviors, these approaches are ill-suited to detect smaller temporal variations in depressed mood. We address this issue by introducing a zero-shot personalization of large speech foundation models. Compared with other personalization strategies, our work does not require labeled speech samples for enrollment. Instead, the approach makes use of adapters conditioned on subject-specific metadata. On a longitudinal dataset, we show that the method improves performance compared with a set of suitable baselines. Finally, applying our personalization strategy improves individual-level fairness.
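The core idea in this abstract, adapters conditioned on subject metadata so that a shared model is personalized without labeled enrollment samples, can be sketched conceptually. The NumPy code below is a heavily simplified illustration, not the paper's architecture: the layer shapes, the FiLM-style scale/shift adapter, and the metadata encoding are all assumptions made for the example.

```python
# Conceptual sketch: a frozen "foundation model" layer followed by an adapter
# whose affine parameters are generated from subject metadata (zero-shot
# personalization: no labeled speech from the subject is needed).
import numpy as np

rng = np.random.default_rng(3)
d_hidden, d_meta = 16, 4

W_frozen = rng.normal(size=(d_hidden, d_hidden)) / np.sqrt(d_hidden)
W_scale = rng.normal(size=(d_meta, d_hidden)) * 0.1   # hypernetwork weights
W_shift = rng.normal(size=(d_meta, d_hidden)) * 0.1

def adapted_forward(h, metadata):
    """Frozen layer, then a metadata-conditioned feature-wise affine adapter."""
    gamma = 1.0 + metadata @ W_scale   # per-feature scale from metadata
    beta = metadata @ W_shift          # per-feature shift from metadata
    return gamma * (h @ W_frozen.T) + beta

h = rng.normal(size=(1, d_hidden))                  # a hidden speech embedding
meta_a = np.array([[1.0, 0.0, 0.35, 1.0]])          # illustrative subject codes
meta_b = np.array([[0.0, 1.0, 0.62, 0.0]])

out_a = adapted_forward(h, meta_a)
out_b = adapted_forward(h, meta_b)
# The same input yields subject-specific representations under different metadata
print(np.abs(out_a - out_b).max())
```

In the paper's setting, the adapter parameters would be trained end-to-end on other subjects and then applied zero-shot to a new subject via that subject's metadata alone.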
Affiliation(s)
- Maurice Gerczuk
- Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Shahin Amiriparian
- Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alexander Kathan
- Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Jonathan Bauer
- Department of Clinical Psychology and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Matthias Berking
- Department of Clinical Psychology and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Björn W. Schuller
- Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- GLAM, Imperial College, London, UK
36
Zhou Y, Han W, Yao X, Xue J, Li Z, Li Y. Developing a machine learning model for detecting depression, anxiety, and apathy in older adults with mild cognitive impairment using speech and facial expressions: A cross-sectional observational study. Int J Nurs Stud 2023; 146:104562. [PMID: 37531702] [DOI: 10.1016/j.ijnurstu.2023.104562] [Received: 01/04/2023] [Revised: 06/23/2023] [Accepted: 07/01/2023]
Abstract
BACKGROUND Depression, anxiety, and apathy are highly prevalent in older people with preclinical dementia and mild cognitive impairment. These symptoms have also proven valuable in predicting the progression from mild cognitive impairment to dementia, enabling timely diagnosis and treatment. However, objective and reliable indicators to detect and distinguish depression, anxiety, and apathy are relatively scarce. OBJECTIVE This study aimed to develop a machine learning model to detect and distinguish depression, anxiety, and apathy based on speech and facial expressions. DESIGN An observational, cross-sectional study design. SETTING(S) The memory outpatient department of a tertiary hospital. PARTICIPANTS 319 older adults diagnosed with mild cognitive impairment. METHODS Depression, anxiety, and apathy were evaluated with the Patient Health Questionnaire, the Generalized Anxiety Disorder scale, and the Apathy Evaluation Scale, respectively. Speech and facial expressions of older adults with mild cognitive impairment were digitally captured using audio and video recording software. Open-source data analysis toolkits were utilized to extract speech, facial, and text features. Multiclass classification was used to develop classification models, and Shapley additive explanations were used to explain the contribution of each feature within the model. RESULTS The random forest method was used to develop a multiclass emotion classification model, which performed well in classifying emotions with a weighted-average F1 score of 96.6%. The model also demonstrated high accuracy, precision, and recall of 87.4%, 86.6%, and 87.6%, respectively. CONCLUSIONS The machine learning model developed in this study demonstrated strong classification performance in detecting and differentiating depression, anxiety, and apathy. This innovative approach combines text, audio, and video to provide objective methods for precise classification and remote monitoring of these symptoms in nursing practice.
REGISTRATION This study was registered at the Chinese Clinical Trial Registry (registration number: ChiCTR1900023892; registration date: June 19th, 2019).
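To illustrate the weighted-average F1 metric reported above, here is a minimal pure-Python sketch that computes per-class F1 scores and weights them by class support; the emotion labels and predictions are invented for illustration, not data from the study.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1 scores weighted by class support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += n * f1
    return total / len(y_true)

# invented labels over four classes: depression / anxiety / apathy / none
y_true = ["depression", "anxiety", "apathy", "none", "depression", "anxiety"]
y_pred = ["depression", "anxiety", "apathy", "none", "anxiety", "anxiety"]
print(round(weighted_f1(y_true, y_pred), 3))  # → 0.822
```

Weighting by support makes the score robust to class imbalance, which matters when one symptom is much rarer than the others.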
Affiliation(s)
- Ying Zhou
- School of Nursing, Shanghai Jiao Tong University, Shanghai, China
- Wei Han
- Department of Epidemiology and Biostatistics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiuyu Yao
- School of Nursing, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- JiaJun Xue
- School of Nursing, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Zheng Li
- School of Nursing, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yingxin Li
- Institute of Biomedical Engineering, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, China
37
Wang JZ, Zhao S, Wu C, Adams RB, Newman MG, Shafir T, Tsachor R. Unlocking the Emotional World of Visual Media: An Overview of the Science, Research, and Impact of Understanding Emotion: Drawing Insights From Psychology, Engineering, and the Arts, This Article Provides a Comprehensive Overview of the Field of Emotion Analysis in Visual Media and Discusses the Latest Research, Systems, Challenges, Ethical Implications, and Potential Impact of Artificial Emotional Intelligence on Society. Proc IEEE Inst Electr Electron Eng 2023; 111:1236-1286. [PMID: 37859667] [PMCID: PMC10586271] [DOI: 10.1109/jproc.2023.3273517]
Abstract
The emergence of artificial emotional intelligence technology is revolutionizing the fields of computers and robotics, allowing for a new level of communication and understanding of human behavior that was once thought impossible. While recent advancements in deep learning have transformed the field of computer vision, automated understanding of evoked or expressed emotions in visual media remains in its infancy. This foundering stems from the absence of a universally accepted definition of "emotion," coupled with the inherently subjective nature of emotions and their intricate nuances. In this article, we provide a comprehensive, multidisciplinary overview of the field of emotion analysis in visual media, drawing on insights from psychology, engineering, and the arts. We begin by exploring the psychological foundations of emotion and the computational principles that underpin the understanding of emotions from images and videos. We then review the latest research and systems within the field, accentuating the most promising approaches. We also discuss the current technological challenges and limitations of emotion analysis, underscoring the necessity for continued investigation and innovation. We contend that this represents a "Holy Grail" research problem in computing and delineate pivotal directions for future inquiry. Finally, we examine the ethical ramifications of emotion-understanding technologies and contemplate their potential societal impacts. Overall, this article endeavors to equip readers with a deeper understanding of the domain of emotion analysis in visual media and to inspire further research and development in this captivating and rapidly evolving field.
Affiliation(s)
- James Z Wang
- College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA
- Sicheng Zhao
- Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing 100084, China
- Chenyan Wu
- College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA
- Reginald B Adams
- Department of Psychology, The Pennsylvania State University, University Park, PA 16802, USA
- Michelle G Newman
- Department of Psychology, The Pennsylvania State University, University Park, PA 16802, USA
- Tal Shafir
- Emily Sagol Creative Arts Therapies Research Center, University of Haifa, Haifa 3498838, Israel
- Rachelle Tsachor
- School of Theatre and Music, University of Illinois at Chicago, Chicago, IL 60607, USA
38
Chandler C, Diaz-Asper C, Turner RS, Reynolds B, Elvevåg B. An explainable machine learning model of cognitive decline derived from speech. Alzheimers Dement (Amst) 2023; 15:e12516. [PMID: 38155915] [PMCID: PMC10752754] [DOI: 10.1002/dad2.12516] [Received: 05/16/2023] [Revised: 11/26/2023] [Accepted: 11/27/2023]
Abstract
INTRODUCTION Traditional Alzheimer's disease (AD) and mild cognitive impairment (MCI) screening lacks the sensitivity and timeliness required to detect subtle indicators of cognitive decline. Multimodal artificial intelligence technologies using only speech data promise improved detection of neurodegenerative disorders. METHODS Speech collected over the telephone from 91 older participants who were cognitively healthy (n = 29) or had diagnoses of AD (n = 30) or amnestic MCI (aMCI; n = 32) was analyzed with multimodal natural language and speech processing methods. An explainable ensemble decision tree classifier for the multiclass prediction of cognitive decline was created. RESULTS This approach was 75% accurate overall, an improvement over traditional speech-based screening tools and a unimodal language-based model. We include a dashboard for the examination of the results, allowing for novel ways of interpreting such data. DISCUSSION This work provides a foundation for a meaningful change in medicine, as clinical translation, scalability, and user friendliness were core to the methodologies. Highlights: Remote assessments and artificial intelligence (AI) models allow greater access to cognitive decline screening. Speech impairments differ significantly between mild AD, amnestic mild cognitive impairment (aMCI), and healthy controls. AI predictions of cognitive decline are more accurate than experts and standard tools. The AI model was 75% accurate in classifying mild AD, aMCI, and healthy controls.
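As a toy illustration of majority-vote ensembling over simple decision trees (not the authors' actual classifier), the sketch below combines three one-split "stumps" for the same three classes; the feature names and thresholds are invented.

```python
from collections import Counter

def stump(feature_idx, threshold, below, above):
    """A one-split decision tree: predict `above` if the feature exceeds
    the threshold, otherwise `below`."""
    return lambda x: above if x[feature_idx] > threshold else below

def ensemble_predict(trees, x):
    """Majority vote over the individual trees' predictions."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# invented features: [pause_rate, lexical_diversity]; invented thresholds
trees = [stump(0, 0.5, "healthy", "aMCI"),
         stump(0, 0.8, "aMCI", "AD"),
         stump(1, 0.4, "AD", "healthy")]
print(ensemble_predict(trees, [0.9, 0.3]))  # → AD
```

An explainable ensemble additionally records which splits fired, which is what makes per-prediction dashboards like the one described possible.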
Affiliation(s)
- Chelsea Chandler
- Institute of Cognitive Science, University of Colorado Boulder, Boulder, Colorado, USA
- Raymond S. Turner
- Department of Neurology, Georgetown University, Washington, District of Columbia, USA
- Brigid Reynolds
- Department of Neurology, Georgetown University, Washington, District of Columbia, USA
- Brita Elvevåg
- Department of Clinical Medicine, University of Tromsø – The Arctic University of Norway, Tromsø, Norway
39
Zolnoori M, Vergez S, Sridharan S, Zolnour A, Bowles K, Kostic Z, Topaz M. Is the patient speaking or the nurse? Automatic speaker type identification in patient-nurse audio recordings. J Am Med Inform Assoc 2023; 30:1673-1683. [PMID: 37478477] [PMCID: PMC10531109] [DOI: 10.1093/jamia/ocad139] [Received: 03/23/2023] [Revised: 06/06/2023] [Accepted: 07/16/2023]
Abstract
OBJECTIVES Patient-clinician communication provides valuable explicit and implicit information that may indicate adverse medical conditions and outcomes. However, practical and analytical approaches for audio-recording and analyzing this data stream remain underexplored. This study aimed to 1) analyze patients' and nurses' speech in audio-recorded verbal communication, and 2) develop machine learning (ML) classifiers to effectively differentiate between patient and nurse language. MATERIALS AND METHODS Pilot studies were conducted at VNS Health, the largest not-for-profit home healthcare agency in the United States, to optimize audio-recording of patient-nurse interactions. We recorded and transcribed 46 interactions, resulting in 3494 "utterances" that were annotated to identify the speaker. We employed natural language processing techniques to generate linguistic features and built various ML classifiers to distinguish between patient and nurse language at both individual and encounter levels. RESULTS A support vector machine classifier trained on selected linguistic features from term frequency-inverse document frequency, Linguistic Inquiry and Word Count, Word2Vec, and Medical Concepts in the Unified Medical Language System achieved the highest performance, with an AUC-ROC = 99.01 ± 1.97 and an F1-score = 96.82 ± 4.1. The analysis revealed patients' tendency to use informal language and keywords related to "religion," "home," and "money," while nurses utilized more complex sentences focusing on health-related matters and medical issues and were more likely to ask questions. CONCLUSION The methods and analytical approach we developed to differentiate patient and nurse language are an important precursor for downstream tasks that aim to analyze patient speech to identify patients at risk of disease and negative health outcomes.
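One of the feature families named above, term frequency-inverse document frequency (TF-IDF), can be sketched in a few lines of pure Python; the smoothed idf variant and the two example utterances are illustrative assumptions, not the study's actual pipeline.

```python
import math
from collections import Counter

def tfidf(docs):
    """Map each document to {term: tf-idf weight}, using a smoothed idf."""
    n = len(docs)
    # document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc.split()))
    vectors = []
    for doc in docs:
        counts = Counter(doc.split())
        total = sum(counts.values())
        vectors.append({term: (c / total) * (math.log((1 + n) / (1 + df[term])) + 1)
                        for term, c in counts.items()})
    return vectors

# hypothetical patient and nurse utterances from one encounter
vecs = tfidf(["my home and money worry me",
              "please describe your medication schedule"])
```

Terms concentrated in one speaker's utterances get high weights, which is exactly why keywords like "home" and "money" surfaced as discriminative in the analysis above.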
Affiliation(s)
- Maryam Zolnoori
- School of Nursing, Columbia University, New York, New York, USA
- Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Sasha Vergez
- Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Sridevi Sridharan
- Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Ali Zolnour
- School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
- Kathryn Bowles
- Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Zoran Kostic
- Department of Electrical Engineering, Columbia University, New York, New York, USA
- Maxim Topaz
- School of Nursing, Columbia University, New York, New York, USA
- Center for Home Care Policy & Research, VNS Health, New York, New York, USA
40
Sprotte Y. Computerized text and voice analysis of patients with chronic schizophrenia in art therapy. Sci Rep 2023; 13:16062. [PMID: 37749186] [PMCID: PMC10520069] [DOI: 10.1038/s41598-023-43069-y] [Received: 08/08/2022] [Accepted: 09/19/2023]
Abstract
This explorative study of patients with chronic schizophrenia aimed to clarify whether group art therapy followed by a therapist-guided picture review could influence patients' communication behaviour. Data on voice and speech characteristics were obtained via objective technological instruments, and these characteristics were selected as indicators of communication behaviour. Seven patients were recruited to participate in weekly group art therapy over a period of 6 months. Three days after each group meeting, they talked about their last picture during a standardized interview that was digitally recorded. The audio recordings were evaluated using validated computer-assisted procedures: the transcribed texts were evaluated using the German version of the LIWC2015 program, and the voice recordings were evaluated using the audio analysis software VocEmoApI. The dual methodological approach was intended to form an internal control of the study results. An exploratory factor analysis of the complete sets of output parameters was carried out with the expectation of obtaining typical speech and voice characteristics that map barriers to communication in patients with schizophrenia. The parameters of both methods were thus processed into five factors each, i.e., into a quantitative digitized classification of the texts and voices. The factor scores were subjected to a linear regression analysis to capture possible process-related changes. Most patients continued to participate in the study, which resulted in high-quality datasets for statistical analysis. To answer the study question, two results were summarized: First, a text analysis factor called Presence proved to be a potential surrogate parameter for positive language development. Second, quantitative changes in vocal emotional factors were detected, demonstrating differentiated activation patterns of emotions. These results can be interpreted as an expression of a cathartic healing process.
The methods presented in this study make a potentially significant contribution to quantitative research into the effectiveness and mode of action of art therapy.
Affiliation(s)
- Yvonne Sprotte
- Art Therapy Department, Dresden University of Fine Arts (Hochschule für Bildende Künste Dresden), Dresden, Germany
41
Berardi M, Brosch K, Pfarr JK, Schneider K, Sültmann A, Thomas-Odenthal F, Wroblewski A, Usemann P, Philipsen A, Dannlowski U, Nenadić I, Kircher T, Krug A, Stein F, Dietrich M. Relative importance of speech and voice features in the classification of schizophrenia and depression. Transl Psychiatry 2023; 13:298. [PMID: 37726285] [PMCID: PMC10509176] [DOI: 10.1038/s41398-023-02594-0] [Received: 02/03/2023] [Revised: 08/10/2023] [Accepted: 09/08/2023]
Abstract
Speech is a promising biomarker for schizophrenia spectrum disorder (SSD) and major depressive disorder (MDD). This proof of principle study investigates previously studied speech acoustics in combination with a novel application of voice pathology features as objective and reproducible classifiers for depression, schizophrenia, and healthy controls (HC). Speech and voice features for classification were calculated from recordings of picture descriptions from 240 speech samples (20 participants with SSD, 20 with MDD, and 20 HC each with 4 samples). Binary classification support vector machine (SVM) models classified the disorder groups and HC. For each feature, the permutation feature importance was calculated, and the top 25% most important features were used to compare differences between the disorder groups and HC including correlations between the important features and symptom severity scores. Multiple kernels for SVM were tested and the pairwise models with the best performing kernel (3-degree polynomial) were highly accurate for each classification: 0.947 for HC vs. SSD, 0.920 for HC vs. MDD, and 0.932 for SSD vs. MDD. The relatively most important features were measures of articulation coordination, number of pauses per minute, and speech variability. There were moderate correlations between important features and positive symptoms for SSD. The important features suggest that speech characteristics relating to psychomotor slowing, alogia, and flat affect differ between HC, SSD, and MDD.
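Permutation feature importance, used above to rank the speech and voice features, measures how much a model's performance drops when one feature column is randomly shuffled. A minimal sketch with an invented threshold "model"; the feature interpretation in the comment is hypothetical:

```python
import random

def accuracy(model, X, y):
    """Fraction of rows the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature_idx, n_repeats=30, seed=0):
    """Average drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        col = [row[feature_idx] for row in X]
        rng.shuffle(col)
        X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, col)]
        drops.append(base - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats

# toy model: thresholds feature 0 (say, pauses per minute); feature 1 is ignored
model = lambda row: int(row[0] > 0.5)
X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.3]]
y = [1, 1, 0, 0]
imp0 = permutation_importance(model, X, y, 0)
imp1 = permutation_importance(model, X, y, 1)
```

Because the toy model ignores feature 1, shuffling it costs nothing (importance 0), while shuffling feature 0 degrades accuracy, mirroring how the study separates influential from inert features.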
Affiliation(s)
- Mark Berardi
- Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Katharina Brosch
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Julia-Katharina Pfarr
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Katharina Schneider
- Institute for Linguistics: General Linguistics, University of Mainz, Mainz, Germany
- Angela Sültmann
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Florian Thomas-Odenthal
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Adrian Wroblewski
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Paula Usemann
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Alexandra Philipsen
- Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Udo Dannlowski
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Igor Nenadić
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Tilo Kircher
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Axel Krug
- Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Frederike Stein
- Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Maria Dietrich
- Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
42
Fusaroli M, Simonsen A, Borrie SA, Low DM, Parola A, Raschi E, Poluzzi E, Fusaroli R. Identifying Medications Underlying Communication Atypicalities in Psychotic and Affective Disorders: A Pharmacovigilance Study Within the FDA Adverse Event Reporting System. J Speech Lang Hear Res 2023; 66:3242-3259. [PMID: 37524118] [DOI: 10.1044/2023_jslhr-22-00739]
Abstract
PURPOSE Communication atypicalities are considered promising markers of a broad range of clinical conditions. However, little is known about the mechanisms and confounders underlying them. Medications might have a crucial, relatively unknown role both as potential confounders and offering an insight on the mechanisms at work. The integration of regulatory documents with disproportionality analyses provides a more comprehensive picture to account for in future investigations of communication-related markers. The aim of this study was to identify a list of drugs potentially associated with communicative atypicalities within psychotic and affective disorders. METHOD We developed a query using the Medical Dictionary for Regulatory Activities to search for communicative atypicalities within the FDA Adverse Event Reporting System (updated June 2021). A Bonferroni-corrected disproportionality analysis (reporting odds ratio) was separately performed on spontaneous reports involving psychotic, affective, and non-neuropsychiatric disorders, to account for the confounding role of different underlying conditions. Drug-adverse event associations not already reported in the Side Effect Resource database of labeled adverse drug reactions (unexpected) were subjected to further robustness analyses to account for expected biases. RESULTS A list of 291 expected and 91 unexpected potential confounding medications was identified, including drugs that may irritate (inhalants) or desiccate (anticholinergics) the larynx, impair speech motor control (antipsychotics), or induce nodules (acitretin) or necrosis (vascular endothelial growth factor receptor inhibitors) on vocal cords; sedatives and stimulants; neurotoxic agents (anti-infectives); and agents acting on neurotransmitter pathways (dopamine agonists). CONCLUSIONS We provide a list of medications to account for in future studies of communication-related markers in affective and psychotic disorders. 
The current test case illustrates rigorous procedures for digital phenotyping, and the methodological tools implemented for large-scale disproportionality analyses can be considered a road map for investigations of communication-related markers in other clinical populations. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.23721345.
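The reporting odds ratio (ROR) behind such a disproportionality analysis compares the odds of an adverse event being reported with a target drug versus all other drugs. A sketch with the standard log-normal 95% confidence interval; the 2x2 counts are hypothetical:

```python
import math

def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR from a 2x2 table of spontaneous reports.
    a: target drug & event   b: target drug, no event
    c: other drugs & event   d: other drugs, no event"""
    ror = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(ROR)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper

# hypothetical counts for one drug-dysphonia pair
ror, lower, upper = reporting_odds_ratio(a=40, b=960, c=200, d=98800)
# a signal is conventionally flagged when the lower CI bound exceeds 1
```

In practice the threshold is applied after multiplicity correction (Bonferroni in the study above) and separately within each underlying-condition stratum to limit confounding by indication.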
Affiliation(s)
- Michele Fusaroli
- Pharmacology Unit, Department of Medical and Surgical Sciences, University of Bologna, Italy
- Arndis Simonsen
- Psychosis Research Unit, Department of Clinical Medicine, Aarhus University, Denmark
- Interacting Minds Centre, School of Culture and Society, Aarhus University, Denmark
- Stephanie A Borrie
- Department of Communicative Disorders and Deaf Education, Utah State University, Logan
- Daniel M Low
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge
- Speech and Hearing Bioscience and Technology Program, Harvard Medical School, Boston, MA
- Alberto Parola
- Department of Psychology, University of Turin, Italy
- Department of Linguistics, Cognitive Science and Semiotics, School of Communication and Culture, Aarhus University, Denmark
- Emanuel Raschi
- Pharmacology Unit, Department of Medical and Surgical Sciences, University of Bologna, Italy
- Elisabetta Poluzzi
- Pharmacology Unit, Department of Medical and Surgical Sciences, University of Bologna, Italy
- Riccardo Fusaroli
- Interacting Minds Centre, School of Culture and Society, Aarhus University, Denmark
- Department of Linguistics, Cognitive Science and Semiotics, School of Communication and Culture, Aarhus University, Denmark
- Linguistic Data Consortium, School of Arts & Sciences, University of Pennsylvania, Philadelphia
43
Cohen J, Richter V, Neumann M, Black D, Haq A, Wright-Berryman J, Ramanarayanan V. A multimodal dialog approach to mental state characterization in clinically depressed, anxious, and suicidal populations. Front Psychol 2023; 14:1135469. [PMID: 37767217] [PMCID: PMC10520716] [DOI: 10.3389/fpsyg.2023.1135469] [Received: 12/31/2022] [Accepted: 08/14/2023]
Abstract
Background The rise of depression, anxiety, and suicide rates has led to increased demand for telemedicine-based mental health screening and remote patient monitoring (RPM) solutions to alleviate the burden on, and enhance the efficiency of, mental health practitioners. Multimodal dialog systems (MDS) that conduct on-demand, structured interviews offer a scalable and cost-effective solution to address this need. Objective This study evaluates the feasibility of a cloud-based MDS agent, Tina, for mental state characterization in participants with depression, anxiety, and suicide risk. Method Sixty-eight participants were recruited through an online health registry and completed 73 sessions, with 15 (20.6%), 21 (28.8%), and 26 (35.6%) sessions screening positive for depression, anxiety, and suicide risk, respectively, using conventional screening instruments. Participants then interacted with Tina as they completed a structured interview designed to elicit calibrated, open-ended responses regarding the participants' feelings and emotional state. Simultaneously, the platform streamed their speech and video recordings in real time to a HIPAA-compliant cloud server to compute speech, language, and facial movement-based biomarkers. After their sessions, participants completed user experience surveys. Machine learning models were developed using extracted features and evaluated with the area under the receiver operating characteristic curve (AUC). Results For both depression and suicide risk, affected individuals tended to have a higher percent pause time, while those positive for anxiety showed reduced lip movement relative to healthy controls. In terms of single-modality classification models, speech features performed best for depression (AUC = 0.64; 95% CI = 0.51-0.78), facial features for anxiety (AUC = 0.57; 95% CI = 0.43-0.71), and text features for suicide risk (AUC = 0.65; 95% CI = 0.52-0.78).
Best overall performance was achieved by decision fusion of all models in identifying suicide risk (AUC = 0.76; 95% CI = 0.65-0.87). Participants reported that the experience was comfortable and that they were able to share their feelings. Conclusion MDS is a feasible, useful, effective, and interpretable solution for RPM in real-world clinically depressed, anxious, and suicidal populations. Facial information is more informative for anxiety classification, while speech and language are more discriminative of depression and suicidality markers. In general, combining speech, language, and facial information improved model performance on all classification tasks.
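Decision fusion, which gave the best overall performance above, can be sketched as a weighted average of per-modality scores followed by thresholding; the modality scores below are invented for illustration:

```python
def fuse_decisions(scores, weights=None):
    """Late (decision-level) fusion: weighted average of per-modality scores."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# hypothetical per-modality suicide-risk scores for one session
speech_p, face_p, text_p = 0.55, 0.40, 0.70
fused = fuse_decisions([speech_p, face_p, text_p])
positive = fused >= 0.5  # threshold the fused score for a screening decision
```

Fusing at the decision level, rather than concatenating raw features, lets each modality keep its own classifier and makes the combined score easy to interpret per modality.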
Affiliation(s)
- Allie Haq
- Clarigent Health, Mason, OH, United States
- Jennifer Wright-Berryman
- Department of Social Work, College of Allied Health Sciences, University of Cincinnati, Cincinnati, OH, United States
- Vikram Ramanarayanan
- Modality.AI, Inc., San Francisco, CA, United States
- Otolaryngology - Head and Neck Surgery (OHNS), University of California, San Francisco, San Francisco, CA, United States
44
Duey AH, Rana A, Siddi F, Hussein H, Onnela JP, Smith TR. Daily Pain Prediction Using Smartphone Speech Recordings of Patients With Spine Disease. Neurosurgery 2023; 93:670-677. [PMID: 36995101] [DOI: 10.1227/neu.0000000000002474] [Received: 09/18/2022] [Accepted: 02/02/2023]
Abstract
BACKGROUND Pain evaluation remains largely subjective in neurosurgical practice, but machine learning provides the potential for objective pain assessment tools. OBJECTIVE To predict daily pain levels using speech recordings from personal smartphones of a cohort of patients with diagnosed neurological spine disease. METHODS Patients with spine disease were enrolled through a general neurosurgical clinic with approval from the institutional ethics committee. At-home pain surveys and speech recordings were administered at regular intervals through the Beiwe smartphone application. Praat audio features were extracted from the speech recordings to be used as input to a K-nearest neighbors (KNN) machine learning model. The pain scores were transformed from a 0 to 10 scale to low and high pain for better discriminative capacity. RESULTS A total of 60 patients were enrolled, and 384 observations were used to train and test the prediction model. Using the KNN prediction model, an accuracy of 71% with a positive predictive value of 0.71 was achieved in classifying pain intensity into high and low. The model showed 0.71 precision for high pain and 0.70 precision for low pain. Recall of high pain was 0.74, and recall of low pain was 0.67. The overall F1 score was 0.73. CONCLUSION Our study uses a KNN to model the relationship between speech features and pain levels collected from personal smartphones of patients with spine disease. The proposed model is a stepping stone for the development of objective pain assessment in neurosurgery clinical practice.
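A k-nearest-neighbors classifier like the one described can be sketched in pure Python; the two Praat-style feature dimensions and the training points below are invented for illustration, not the study's data:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify a query point by majority vote among its k nearest neighbors."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# hypothetical (jitter, shimmer)-style acoustic features -> high/low pain
train = [((0.9, 0.8), "high"), ((0.8, 0.9), "high"), ((0.7, 0.6), "high"),
         ((0.2, 0.1), "low"), ((0.1, 0.3), "low"), ((0.3, 0.2), "low")]
print(knn_predict(train, (0.75, 0.7)))  # → high
```

KNN makes no distributional assumptions, which suits small, noisy at-home recording datasets, though features should be normalized so no single dimension dominates the distance.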
Collapse
Affiliation(s)
- Akiro H Duey
- Department of Neurosurgery, Computational Neuroscience Outcomes Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Aakanksha Rana
- Department of Neurosurgery, Computational Neuroscience Outcomes Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Francesca Siddi
- Department of Neurosurgery, Computational Neuroscience Outcomes Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Department of Neurosurgery, Leiden University Medical Center, Leiden, The Netherlands
- Helweh Hussein
- Department of Neurosurgery, Computational Neuroscience Outcomes Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Jukka-Pekka Onnela
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Timothy R Smith
- Department of Neurosurgery, Computational Neuroscience Outcomes Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
45
Gao CX, Dwyer D, Zhu Y, Smith CL, Du L, Filia KM, Bayer J, Menssink JM, Wang T, Bergmeir C, Wood S, Cotton SM. An overview of clustering methods with guidelines for application in mental health research. Psychiatry Res 2023; 327:115265. [PMID: 37348404 DOI: 10.1016/j.psychres.2023.115265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 05/20/2023] [Accepted: 05/21/2023] [Indexed: 06/24/2023]
Abstract
Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements. In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles are subsequently introduced. How to choose algorithms to address common issues as well as methods for pre-clustering data processing, clustering evaluation and validation are then discussed. Importantly, we also provide general guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries.
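As a concrete instance of the workflow this paper recommends (pre-clustering data processing, algorithm choice, internal validation), here is a minimal sketch. It is written in Python rather than the R functions the paper catalogs, and the synthetic data and silhouette-based choice of k are illustrative assumptions, not the paper's procedure:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic data with three well-separated latent subgroups, standing in
# for multivariate symptom scores.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 4))
               for c in (-3.0, 0.0, 3.0)])
# Pre-clustering processing: scale features so no variable dominates.
X = StandardScaler().fit_transform(X)

# Internal validation: select k by silhouette score, one common criterion.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
```

On this toy data the silhouette criterion recovers the three planted subgroups; on real mental-health data the paper's broader advice (external validation, reporting of algorithm settings) still applies.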
Affiliation(s)
- Caroline X Gao
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia; Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia.
- Dominic Dwyer
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
- Ye Zhu
- School of Information Technology, Deakin University, Geelong, VIC, Australia
- Catherine L Smith
- Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
- Lan Du
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Kate M Filia
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
- Johanna Bayer
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
- Jana M Menssink
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
- Teresa Wang
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Christoph Bergmeir
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia; Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Stephen Wood
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
- Sue M Cotton
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
46
Foltz PW, Chandler C, Diaz-Asper C, Cohen AS, Rodriguez Z, Holmlund TB, Elvevåg B. Reflections on the nature of measurement in language-based automated assessments of patients' mental state and cognitive function. Schizophr Res 2023; 259:127-139. [PMID: 36153250 DOI: 10.1016/j.schres.2022.07.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Revised: 07/12/2022] [Accepted: 07/13/2022] [Indexed: 11/23/2022]
Abstract
Modern advances in computational language processing methods have enabled new approaches to the measurement of mental processes. However, the field has primarily focused on model accuracy in predicting performance on a task or a diagnostic category. Instead, the field should focus on determining which computational analyses align best with the targeted neurocognitive/psychological functions that we want to assess. In this paper we reflect on two decades of experience with the application of language-based assessment to patients' mental state and cognitive function by addressing the questions of what we are measuring, how it should be measured and why we are measuring the phenomena. We address the questions by advocating for a principled framework for aligning computational models to the constructs being assessed and the tasks being used, as well as defining how those constructs relate to patient clinical states. We further examine the assumptions that go into the computational models and the effects that model design decisions may have on the accuracy, bias and generalizability of models for assessing clinical states. Finally, we describe how this principled approach can further the goal of transitioning language-based computational assessments into clinical practice while gaining the trust of critical stakeholders.
Affiliation(s)
- Peter W Foltz
- Institute of Cognitive Science, University of Colorado Boulder, United States of America.
- Chelsea Chandler
- Institute of Cognitive Science, University of Colorado Boulder, United States of America; Department of Computer Science, University of Colorado Boulder, United States of America
- Alex S Cohen
- Department of Psychology, Louisiana State University, United States of America; Center for Computation and Technology, Louisiana State University, United States of America
- Zachary Rodriguez
- Department of Psychology, Louisiana State University, United States of America; Center for Computation and Technology, Louisiana State University, United States of America
- Terje B Holmlund
- Department of Clinical Medicine, University of Tromsø - the Arctic University of Norway, Tromsø, Norway
- Brita Elvevåg
- Department of Clinical Medicine, University of Tromsø - the Arctic University of Norway, Tromsø, Norway; Norwegian Centre for eHealth Research, University Hospital of North Norway, Tromsø, Norway.
47
Granrud OE, Rodriguez Z, Cowan T, Masucci MD, Cohen AS. Alogia and pressured speech do not fall on a continuum of speech production using objective speech technologies. Schizophr Res 2023; 259:121-126. [PMID: 35864001 DOI: 10.1016/j.schres.2022.07.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Revised: 07/02/2022] [Accepted: 07/04/2022] [Indexed: 10/17/2022]
Abstract
Speech production is affected in a variety of serious mental illnesses (SMI; e.g., schizophrenia, unipolar depression, bipolar disorders) and at its extremes can be observed in the gross reduction of speech (e.g., alogia) or increase of speech (e.g., pressured speech). The present study evaluated whether clinically-rated alogia and pressured speech represent antithetical constructs when analyzed using objective metrics of speech production. We examined natural speech using acoustic and natural language processing features from two archival studies using several different speaking tasks and a combined 107 patients meeting criteria for SMI. Contrary to expectations, we did not find that alogia and pressured speech presented as opposing ends of a speech production continuum. Objective speech markers were associated with clinically rated alogia but not pressured speech, and these results were consistent across speaking tasks and studies. Implications for our understanding of speech production symptoms in SMI are discussed, as well as implications for Natural Language Processing and digital phenotyping efforts more generally.
Affiliation(s)
- Ole Edvard Granrud
- Louisiana State University, Department of Psychology, United States of America
- Zachary Rodriguez
- Louisiana State University, Department of Psychology, United States of America; Louisiana State University, Center for Computation and Technology, United States of America
- Tovah Cowan
- Louisiana State University, Department of Psychology, United States of America
- Michael D Masucci
- Louisiana State University, Department of Psychology, United States of America
- Alex S Cohen
- Louisiana State University, Department of Psychology, United States of America; Louisiana State University, Center for Computation and Technology, United States of America.
48
Mizuguchi D, Yamamoto T, Omiya Y, Endo K, Tano K, Oya M, Takano S. Novel Screening Tool Using Non-linguistic Voice Features Derived from Simple Phrases to Detect Mild Cognitive Impairment and Dementia. JAR LIFE 2023; 12:72-76. [PMID: 37637273 PMCID: PMC10450207 DOI: 10.14283/jarlife.2023.12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/02/2023] [Accepted: 07/13/2023] [Indexed: 08/29/2023]
Abstract
Appropriate intervention and care in detecting cognitive impairment early are essential to effectively prevent the progression of cognitive deterioration. Diagnostic voice analysis is a noninvasive and inexpensive screening method that could be useful for detecting cognitive deterioration at earlier stages such as mild cognitive impairment. We aimed to distinguish between patients with dementia or mild cognitive impairment and healthy controls by using purely acoustic features (i.e., nonlinguistic features) extracted from two simple phrases. We analyzed voice in 195 recordings from 150 patients (age 45-95 years). We applied a machine learning algorithm (LightGBM; Microsoft, Redmond, WA, USA) to test whether the healthy control, mild cognitive impairment, and dementia groups could be accurately classified, based on acoustic features. Our algorithm performed well: the area under the curve was 0.81 and accuracy was 66.7% for the 3-class classification. Thus, our vocal biomarker is useful for automated assistance in diagnosing early cognitive deterioration.
Affiliation(s)
- K Endo
- PST Inc., Yokohama, Japan
- K Tano
- Takeyama Hospital, Yokohama, Japan
- M Oya
- Takeyama Hospital, Yokohama, Japan
- S Takano
- Honjo Kodama Hospital, Honjo, Japan
49
Neumann M, Kothare H, Ramanarayanan V. Combining Multiple Multimodal Speech Features into an Interpretable Index Score for Capturing Disease Progression in Amyotrophic Lateral Sclerosis. INTERSPEECH 2023; 2023:2353-2357. [PMID: 39006832 PMCID: PMC11246072 DOI: 10.21437/interspeech.2023-2100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/16/2024]
Abstract
Multiple speech biomarkers have been shown to carry useful information regarding Amyotrophic Lateral Sclerosis (ALS) pathology. We propose a two-step framework to compute optimal linear combinations (indexes) of these biomarkers that are more discriminative and noise-robust than the individual markers, which is important for clinical care and pharmaceutical trial applications. First, we use a hierarchical clustering based method to select representative speech metrics from a dataset comprising 143 people with ALS and 135 age- and sex-matched healthy controls. Second, we analyze three methods of index computation that optimize linear discriminability, Youden Index, and sparsity of logistic regression model weights, respectively, and evaluate their performance with 5-fold cross validation. We find that the proposed indexes are generally more discriminative of bulbar vs non-bulbar onset in ALS than their individual component metrics as well as an equally-weighted baseline.
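One standard way to compute an "optimal linear combination" of metrics for group discrimination is Fisher's linear discriminant. The sketch below is an illustration of that general idea on synthetic data, not the authors' two-step framework (which additionally uses hierarchical clustering for metric selection and compares several optimization criteria):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
# Stand-in speech metrics for bulbar-onset vs. non-bulbar-onset groups,
# with a planted mean shift so the groups are separable.
X_bulbar = rng.normal(loc=0.8, size=(40, 5))
X_nonbulbar = rng.normal(loc=0.0, size=(60, 5))
X = np.vstack([X_bulbar, X_nonbulbar])
y = np.array([1] * 40 + [0] * 60)

# Fisher's linear discriminant yields weights that maximize between-group
# relative to within-group variance, giving one interpretable index score
# per participant instead of many separate metrics.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
index = X @ lda.coef_.ravel()
```

The resulting index separates the two groups more cleanly than any single synthetic metric, which is the motivation the abstract gives for combining markers.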
50
Wang J, Ravi V, Alwan A. Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals. INTERSPEECH 2023; 2023:2343-2347. [PMID: 38045821 PMCID: PMC10691447 DOI: 10.21437/interspeech.2023-2101] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/05/2023]
Abstract
While speech-based depression detection methods that use speaker-identity features, such as speaker embeddings, are popular, they often compromise patient privacy. To address this issue, we propose a speaker disentanglement method that utilizes a non-uniform mechanism of adversarial SID loss maximization. This is achieved by varying the adversarial weight between different layers of a model during training. We find that a greater adversarial weight for the initial layers leads to performance improvement. Our approach using the ECAPA-TDNN model achieves an F1-score of 0.7349 (a 3.7% improvement over audio-only SOTA) on the DAIC-WoZ dataset, while simultaneously reducing the speaker-identification accuracy by 50%. Our findings suggest that identifying depression through speech signals can be accomplished without placing undue reliance on a speaker's identity, paving the way for privacy-preserving approaches to depression detection.
Affiliation(s)
- Jinhan Wang
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA
- Vijay Ravi
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA
- Abeer Alwan
- Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA