Demiris G, Oliver DP, Washington KT, Chadwick C, Voigt JD, Brotherton S, Naylor MD. Examining spoken words and acoustic features of therapy sessions to understand family caregivers’ anxiety and quality of life.
Int J Med Inform 2022;
160:104716. [PMID:
35183870 PMCID:
PMC8902633 DOI:
10.1016/j.ijmedinf.2022.104716]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2021] [Revised: 02/07/2022] [Accepted: 02/08/2022] [Indexed: 11/16/2022]
Abstract
BACKGROUND
Speech and language cues are considered significant data sources that can reveal insights into one's behavior and well-being. The goal of this study is to evaluate how different machine learning (ML) classifiers trained both on the spoken word and acoustic features during live conversations between family caregivers and a therapist, correlate to anxiety and quality of life (QoL) as assessed by validated instruments.
METHODS
The dataset comprised of 124 audio-recorded and professionally transcribed discussions between family caregivers of hospice patients and a therapist, of challenges they faced in their caregiving role, and standardized assessments of self-reported QoL and anxiety. We custom-built and trained an Automated Speech Recognition (ASR) system on older adult voices and created a logistic regression-based classifier that incorporated audio-based features. The classification process automated the QoL scoring and display of the score in real time, replacing hand-coding for self-reported assessments with a machine learning identified classifier.
FINDINGS
Of the 124 audio files and their transcripts, 87 of these transcripts (70%) were selected to serve as the training set, holding the remaining 30% of the data for evaluation. For anxiety, the results of adding the dimension of sound and an automated speech-to-text transcription outperformed the prior classifier trained only on human-rendered transcriptions. Specifically, precision improved from 86% to 92%, accuracy from 81% to 89%, and recall from 78% to 88%.
INTERPRETATION
Classifiers can be developed through ML techniques which can indicate improvements in QoL measures with a reasonable degree of accuracy. Examining the content, sound of the voice and context of the conversation provides insights into additional factors affecting anxiety and QoL that could be addressed in tailored therapy and the design of conversational agents serving as therapy chatbots.
Collapse