1
Kodish-Wachs J, Agassi E, Kenny P, Overhage JM. A systematic comparison of contemporary automatic speech recognition engines for conversational clinical speech. AMIA Annu Symp Proc 2018; 2018:683-689. [PMID: 30815110] [PMCID: PMC6371385]
Abstract
Conversations, especially between a clinician and a patient, are important sources of data to support clinical care. To date, clinicians act as the sensor that captures these data and records them in the medical record. Automatic speech recognition (ASR) engines have advanced to support continuous speech, to work independently of the speaker, and to deliver continuously improving performance. Near-human levels of performance have been reported for several ASR engines. We undertook a systematic comparison of selected ASR engines for clinical conversational speech. Using audio recorded from unscripted clinical scenarios with two microphones, we evaluated eight ASR engines using word error rate (WER) and the precision, recall, and F1 scores for concept extraction. We found a wide range of word error rates across the ASR engines, with values ranging from 34% to 65%, all well above the error rates achieved for other types of conversational speech. Recall for health concepts also ranged from 22% to 74%. Concept recall rates matched or exceeded expectations given the measured word error rates, suggesting that vocabulary is not the dominant issue.
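Several entries in this list report word error rate (WER). As a minimal sketch of how the metric is computed, the Python function below derives WER from a word-level Levenshtein alignment; the two example transcripts are invented for illustration and are not taken from the study's data.

```python
# Minimal WER sketch: WER = (substitutions + insertions + deletions)
# divided by the number of words in the reference transcript.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] holds the edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / len(ref)

# Invented example: 3 edits against a 6-word reference -> WER = 0.5
print(wer("patient reports chest pain on exertion",
          "patient report chest pains exertion"))
```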
2
3
Kumah-Crystal YA, Pirtle CJ, Whyte HM, Goode ES, Anders SH, Lehmann CU. Electronic Health Record Interactions through Voice: A Review. Appl Clin Inform 2018; 9:541-552. [PMID: 30040113] [DOI: 10.1055/s-0038-1666844]
Abstract
BACKGROUND Usability problems in the electronic health record (EHR) lead to workflow inefficiencies when navigating charts and entering or retrieving data using standard keyboard and mouse interfaces. Voice input technology has been used to overcome some of the challenges associated with conventional interfaces and continues to evolve as a promising way to interact with the EHR. OBJECTIVE This article reviews the literature and evidence on voice input technology used to facilitate work in the EHR. It also reviews the benefits and challenges of implementation and use of voice technologies, and discusses emerging opportunities with voice assistant technology. METHODS We performed a systematic review of the literature to identify articles that discuss the use of voice technology to facilitate health care work. We searched MEDLINE and the Google search engine to identify relevant articles. We evaluated articles that discussed the strengths and limitations of voice technology to facilitate health care work. Consumer articles from leading technology publications addressing emerging use of voice assistants were reviewed to ascertain functionalities in existing consumer applications. RESULTS Using a MEDLINE search, we identified 683 articles that were reviewed for inclusion eligibility. The references of included articles were also reviewed. Sixty-one papers that discussed the use of voice tools in health care were included, of which 32 detailed the use of voice technologies in production environments. Articles were organized into three domains: Voice for (1) documentation, (2) commands, and (3) interactive response and navigation for patients. Of 31 articles that discussed usability attributes of consumer voice assistant technology, 12 were included in the review. CONCLUSION We highlight the successes and challenges of voice input technologies in health care and discuss opportunities to incorporate emerging voice assistant technologies used in the consumer domain.
Affiliation(s)
- Yaa A Kumah-Crystal: Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
- Claude J Pirtle: Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
- Harrison M Whyte: Department of Computer Science, Vanderbilt University College of Arts and Science, Vanderbilt University, Nashville, Tennessee, United States
- Edward S Goode: Department of Computer Science, Vanderbilt University College of Arts and Science, Vanderbilt University, Nashville, Tennessee, United States
- Shilo H Anders: Department of Biomedical Informatics and Department of Anesthesiology, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
- Christoph U Lehmann: Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
4
Retrospective Analysis of Clinical Performance of an Estonian Speech Recognition System for Radiology: Effects of Different Acoustic and Language Models. J Digit Imaging 2018; 31:615-621. [PMID: 29713836] [PMCID: PMC6148813] [DOI: 10.1007/s10278-018-0085-8]
Abstract
The aim of this study was to retrospectively analyze the influence of different acoustic and language models in order to determine the most important effects on the clinical performance of an Estonian-language, non-commercial, radiology-oriented automatic speech recognition (ASR) system. The ASR system was developed for the Estonian language in the radiology domain using open-source software components (the Kaldi toolkit and Thrax), and was trained with real radiology text reports and dictations collected during the development phases. The final version of the ASR system was tested by 11 radiologists, who dictated 219 reports in total, in a spontaneous manner in a real clinical environment. The audio files collected in this final phase were then used to measure the performance of the different versions of the ASR system retrospectively. ASR system versions were evaluated by word error rate (WER) for each speaker and modality, and by the WER difference between the first and last versions of the system. The total average WER across all material improved from 18.4% for the first version (v1) to 5.8% for the last version (v8), corresponding to a relative improvement of 68.5%. The WER improvement was strongly related to modality and radiologist. In summary, the performance of the final ASR system version was close to optimal, delivering similar results across all modalities and being largely independent of the user, the complexity of the radiology reports, user experience, and speech characteristics.
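The reported relative improvement follows directly from the two absolute WER figures:

```latex
\text{relative improvement}
  = \frac{\mathrm{WER}_{v1} - \mathrm{WER}_{v8}}{\mathrm{WER}_{v1}}
  = \frac{18.4\% - 5.8\%}{18.4\%}
  \approx 68.5\%
```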
5
Abstract
BACKGROUND Health question-answering (QA) systems have become a typical application of artificial intelligence (AI). An annotated question corpus is a prerequisite for training machines to understand the health information needs of users. We therefore aimed to develop an annotated classification corpus of Chinese health questions (Qcorp) and make it openly accessible. METHODS We developed a two-layered classification schema and corresponding annotation rules on the basis of our previous work. Using the schema, we annotated 5000 questions randomly selected from 5 Chinese health websites across 6 broad sections. Eight annotators participated in the annotation task, and inter-annotator agreement was evaluated to ensure corpus quality. Furthermore, the distribution of and relationships among the annotated tags were measured by descriptive statistics and a social network map. RESULTS The questions were annotated with 7101 tags covering 29 topic categories in the two-layered schema. Because a question could receive more than one tag, the category percentages sum to more than 100%: in the released corpus, the distribution of questions across the top-layer categories was treatment 64.22%, diagnosis 37.14%, epidemiology 14.96%, healthy lifestyle 10.38%, and health provider choice 4.54%. Both the annotated health questions and the annotation schema are openly accessible on the Qcorp website, where users can download the annotated Chinese questions in CSV, XML, and HTML formats. CONCLUSIONS We developed a Chinese health question corpus comprising 5000 manually annotated questions. It is openly accessible and should contribute to the development of intelligent health QA systems.
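The abstract reports inter-annotator agreement without naming the statistic used; a common choice for a pair of annotators is Cohen's kappa, sketched below in Python. The label sequences are invented toy data, not the Qcorp annotations.

```python
# Hedged sketch of Cohen's kappa for two annotators:
# kappa = (observed agreement - chance agreement) / (1 - chance agreement).
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over categories of the product of the marginals.
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in set(labels_a) | set(labels_b))
    return (p_o - p_e) / (1 - p_e)

# Invented two-annotator toy labels over five questions -> kappa ~ 0.286
a = ["treatment", "diagnosis", "treatment", "epidemiology", "treatment"]
b = ["treatment", "treatment", "treatment", "epidemiology", "diagnosis"]
print(cohens_kappa(a, b))
```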
Affiliation(s)
- Haihong Guo: Institute of Medical Information / Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Xu Na: Institute of Medical Information / Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Jiao Li: Institute of Medical Information / Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
6
Lancioni GE, Singh NN, O'Reilly MF, Sigafoos J, Ferlisi G, Zullo V, Schirone S, Prisco R, Denitto F. A computer-aided program for helping patients with moderate Alzheimer's disease engage in verbal reminiscence. Res Dev Disabil 2014; 35:3026-3033. [PMID: 25124700] [DOI: 10.1016/j.ridd.2014.07.047]
Abstract
This study assessed a simple computer-aided program for helping patients with moderate Alzheimer's disease engage in verbal reminiscence. In practice, the program was aimed at fostering each patient's verbal engagement with a number of life experiences/topics previously selected for him or her and introduced in the sessions by a friendly female figure who appeared on the computer screen. This figure asked the patient about the aforementioned experiences/topics, provided him or her with positive attention, and, where needed, offered verbal guidance (i.e., prompts/encouragements). Eight patients were involved in the study, which was carried out according to non-concurrent multiple baseline designs across participants. Seven of them showed clear improvement during the intervention phase (i.e., with the program): their mean percentages of intervals with verbal engagement/reminiscence ranged from close to zero to about 15 during baseline and from above 50 to above 75 during the intervention. The results are discussed in relation to previous literature on reminiscence therapy, with specific emphasis on the need for (a) replication studies and (b) the development of new versions of the technology-aided program to improve its impact and reach a wider number of patients.
Affiliation(s)
- Nirbhay N Singh: Medical College of Georgia, Georgia Regents University, Augusta, USA
7
Natural Language Processing, Electronic Health Records, and Clinical Research. Health Informatics 2012. [DOI: 10.1007/978-1-84882-448-5_16]
8
Liu F, Antieau LD, Yu H. Toward automated consumer question answering: automatically separating consumer questions from professional questions in the healthcare domain. J Biomed Inform 2011; 44:1032-1038. [PMID: 21856442] [PMCID: PMC3226885] [DOI: 10.1016/j.jbi.2011.08.008]
Abstract
OBJECTIVE Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of the two groups differ in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers. DESIGN We obtained two sets of consumer questions (~10,000 questions in total) from Yahoo! Answers. The professional questions consist of two collections: 4654 point-of-care questions (denoted PointCare) obtained from interviews with a group of family doctors following patient visits, and 5378 questions from physician practices collected through professional online services (denoted OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification of consumer versus professional questions. To evaluate the robustness of our models, we tested the model trained on the Consumer-PointCare dataset against the Consumer-OnlinePractice dataset. We evaluated both linguistic and statistical features and examined how the characteristics of the two types of professional questions (PointCare vs. OnlinePractice) affect classification performance. We also explored information gain for feature reduction and back-off linguistic category features. RESULTS The 10-fold cross-validation results showed best F1-measures of 0.936 and 0.946 on Consumer-PointCare and Consumer-OnlinePractice, respectively, and a best F1-measure of 0.891 when testing the Consumer-PointCare model on the Consumer-OnlinePractice dataset. CONCLUSION Healthcare consumer questions posted in Yahoo! online communities can be reliably distinguished from professional questions posted by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study should significantly benefit further development of automated consumer question answering.
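As a rough sketch of the classification task this study describes, the following pipeline trains a bag-of-words classifier to separate consumer from professional questions and scores it with cross-validated F1. It assumes scikit-learn; the paper's actual features and learner may differ, and the six questions are invented stand-ins for the real collections.

```python
# Hedged sketch of a consumer-vs-professional question classifier,
# assuming scikit-learn; not the study's own feature set or model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Invented toy questions standing in for the Yahoo! Answers and
# PointCare/OnlinePractice collections.
questions = [
    "what can i take for a bad headache",             # consumer
    "is it ok to give my kid benadryl for a cold",    # consumer
    "my back hurts when i wake up what should i do",  # consumer
    "first-line therapy for stage 2 hypertension",    # professional
    "dosing of enoxaparin in renal impairment",       # professional
    "differential diagnosis of acute monoarthritis",  # professional
]
labels = ["consumer"] * 3 + ["professional"] * 3

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
# The study used 10-fold cross-validation; cv=3 here only because
# the toy dataset is tiny.
scores = cross_val_score(model, questions, labels, cv=3, scoring="f1_macro")
print(scores.mean())
```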
Affiliation(s)
- Feifan Liu: Department of Health Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI 53211, United States
9
Miller T, Ravvaz K, Cimino JJ, Yu H. An investigation into the feasibility of spoken clinical question answering. AMIA Annu Symp Proc 2011; 2011:954-959. [PMID: 22195154] [PMCID: PMC3243288]
Abstract
Spoken question answering for clinical decision support is a potentially revolutionary technology for improving the efficiency and quality of health care delivery. This application involves many technologies currently being researched, including automatic speech recognition (ASR), information retrieval (IR), and summarization, all in the biomedical domain. In certain domains, the problem of spoken document retrieval has been declared solved because of the robustness of IR to ASR errors. This study investigates the extent to which spoken medical question answering benefits from that same robustness. We used the best results from previous speech recognition experiments as inputs to a clinical question answering system and had physicians perform blind evaluations of answers generated from both ASR transcripts and gold-standard transcripts of the same questions. Our results suggest that the medical domain differs enough from the open domain to require additional work on automatic speech recognition adapted to the biomedical domain.
Affiliation(s)
- Tim Miller: College of Health Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI, USA