1
Safari N, Fang H, Veerareddy A, Xu P, Krueger F. The anatomical structure of sex differences in trust propensity: A voxel-based morphometry study. Cortex 2024; 176:260-273. [PMID: 38677959 DOI: 10.1016/j.cortex.2024.02.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Revised: 08/14/2023] [Accepted: 02/28/2024] [Indexed: 04/29/2024]
Abstract
Trust is a key component of human relationships. Sex differences in trust behavior have been explained by parental investment theory and social role theory, which attribute men's higher trust propensity to their increased engagement in physically and socially risky activities aimed at securing additional resources. Although sex differences in trust behavior exist and the neuropsychological signatures of trust are known, the underlying anatomical structure of sex differences is still unexplored. Our study aimed to investigate the anatomical structure of sex differences in trust behavior toward strangers (i.e., trust propensity, TP) by employing voxel-based morphometry (VBM) in a sample of healthy young adults. We collected behavioral data for TP, measured with participants in the role of trustors completing the one-shot trust game (TG) with anonymous partners as trustees. We conducted primary region of interest (ROI) and exploratory whole-brain (WB) VBM analyses of high-resolution structural images to test for the association between TP and regional gray matter volume (GMV) associated with sex differences. Confirming previous studies, our behavioral results demonstrated that men trusted more than women during the one-shot TG. Our WB analysis showed a greater GMV related to TP in men than women in the precuneus (PreC), whereas our ROI analysis showed it in regions of the default-mode network (dorsomedial prefrontal cortex [dmPFC], PreC, superior temporal gyrus) involved in simulating the partner's trustworthiness, the central-executive network (ventrolateral PFC) involved in implementing a calculus-based trust strategy, and the action-perception network (precentral gyrus) involved in performing cost-benefit calculations, as proposed by a neuropsychoeconomic model of trust. Our findings advance the neuropsychological understanding of sex differences in TP, which has implications for interpersonal partnerships, financial transactions, and societal engagements.
Affiliation(s)
- Nooshin Safari
  - School of Systems Biology, George Mason University, Fairfax, VA, USA
- Huihua Fang
  - Shenzhen Key Laboratory of Affective and Social Neuroscience, Magnetic Resonance Imaging, China
  - Department of Psychology, University of Mannheim, Mannheim, Germany
- Pengfei Xu
  - Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education (BNU), Faculty of Psychology, Beijing Normal University, Beijing, China
  - Center for Neuroimaging, Shenzhen Institute of Neuroscience, Shenzhen, China
- Frank Krueger
  - School of Systems Biology, George Mason University, Fairfax, VA, USA
  - Department of Psychology, University of Mannheim, Mannheim, Germany
2
Wang H, Alanis N, Haygood L, Swoboda TK, Hoot N, Phillips D, Knowles H, Stinson SA, Mehta P, Sambamoorthi U. Using natural language processing in emergency medicine health service research: A systematic review and meta-analysis. Acad Emerg Med 2024. [PMID: 38757352 DOI: 10.1111/acem.14937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/15/2024] [Accepted: 04/17/2024] [Indexed: 05/18/2024]
Abstract
OBJECTIVES Natural language processing (NLP) represents one of the adjunct technologies within artificial intelligence and machine learning, creating structure out of unstructured data. This study aims to assess the performance of employing NLP to identify and categorize unstructured data within the emergency medicine (EM) setting. METHODS We systematically searched publications related to EM research and NLP across databases including MEDLINE, Embase, Scopus, CENTRAL, and ProQuest Dissertations & Theses Global. Independent reviewers screened, reviewed, and evaluated article quality and bias. NLP usage was categorized into syndromic surveillance, radiologic interpretation, and identification of specific diseases/events/syndromes, with respective sensitivity analysis reported. Performance metrics for NLP usage were calculated and the overall area under the summary receiver operating characteristic (SROC) curve was determined. RESULTS A total of 27 studies underwent meta-analysis. Findings indicated an overall mean sensitivity (recall) of 82%-87%, specificity of 95%, with the area under the SROC curve at 0.96 (95% CI 0.94-0.98). Optimal performance using NLP was observed in radiologic interpretation, demonstrating an overall mean sensitivity of 93% and specificity of 96%. CONCLUSIONS Our analysis revealed a generally favorable performance accuracy in using NLP within EM research, particularly in the realm of radiologic interpretation. Consequently, we advocate for the adoption of NLP-based research to augment EM health care management.
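The sensitivity and specificity figures reported above are per-study quantities derived from a 2x2 comparison of NLP output against a reference standard. The following is a minimal Python sketch of those two definitions only; it does not reproduce the review's pooling or SROC estimation, and the class name, function names, and counts are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing NLP output against a reference standard."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def sensitivity(c: ConfusionCounts) -> float:
    """Recall: share of reference-positive records that the NLP tool flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of reference-negative records that the NLP tool left unflagged."""
    return c.tn / (c.tn + c.fp)


# Hypothetical single-study counts (not taken from any study in the review).
example = ConfusionCounts(tp=85, fp=10, tn=190, fn=15)
print(f"sensitivity={sensitivity(example):.2f}, specificity={specificity(example):.2f}")
```

With these invented counts the sketch prints a sensitivity of 0.85 and a specificity of 0.95; a meta-analysis would combine many such per-study pairs with a dedicated statistical model rather than a simple average.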
Affiliation(s)
- Hao Wang
  - Department of Emergency Medicine, JPS Health Network, Fort Worth, Texas, USA
- Naomi Alanis
  - Department of Emergency Medicine, JPS Health Network, Fort Worth, Texas, USA
- Laura Haygood
  - Health Sciences Librarian for Public Health, Brown University, Providence, Rhode Island, USA
- Thomas K Swoboda
  - Department of Emergency Medicine, The Valley Health System, Touro University Nevada School of Osteopathic Medicine, Las Vegas, Nevada, USA
- Nathan Hoot
  - Department of Emergency Medicine, JPS Health Network, Fort Worth, Texas, USA
- Daniel Phillips
  - Department of Emergency Medicine, JPS Health Network, Fort Worth, Texas, USA
- Heidi Knowles
  - Department of Emergency Medicine, JPS Health Network, Fort Worth, Texas, USA
- Sara Ann Stinson
  - Mary Couts Burnett Library, Burnett School of Medicine at Texas Christian University, Fort Worth, Texas, USA
- Prachi Mehta
  - Department of Emergency Medicine, JPS Health Network, Fort Worth, Texas, USA
- Usha Sambamoorthi
  - College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, USA
3
Wissel BD, Greiner HM, Glauser TA, Mangano FT, Holland-Bouley KD, Zhang N, Szczesniak RD, Santel D, Pestian JP, Dexheimer JW. Automated, machine learning-based alerts increase epilepsy surgery referrals: A randomized controlled trial. Epilepsia 2023; 64:1791-1799. [PMID: 37102995 PMCID: PMC10524622 DOI: 10.1111/epi.17629] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/23/2023] [Revised: 04/25/2023] [Accepted: 04/25/2023] [Indexed: 04/28/2023]
Abstract
OBJECTIVE To determine whether automated, electronic alerts increased referrals for epilepsy surgery. METHODS We conducted a prospective, randomized controlled trial of a natural language processing-based clinical decision support system embedded in the electronic health record (EHR) at 14 pediatric neurology outpatient clinic sites. Children with epilepsy and at least two prior neurology visits were screened by the system prior to their scheduled visit. Patients classified as potential surgical candidates were randomized 2:1 for their provider to receive an alert or standard of care (no alert). The primary outcome was referral for a neurosurgical evaluation. The likelihood of referral was estimated using a Cox proportional hazards regression model. RESULTS Between April 2017 and April 2019, a total of 4858 children were screened by the system, and 284 (5.8%) were identified as potential surgical candidates. Two hundred four patients received an alert, and 96 patients received standard care. Median follow-up time was 24 months (range: 12-36 months). Compared to the control group, patients whose provider received an alert were more likely to be referred for a presurgical evaluation (9.8% vs 3.1% in the control group; adjusted hazard ratio [HR] = 3.21, 95% confidence interval [CI]: 0.95-10.8; one-sided p = .03). Nine patients (4.4%) in the alert group underwent epilepsy surgery, compared to none (0%) in the control group (one-sided p = .03). SIGNIFICANCE Machine learning-based automated alerts may improve the utilization of referrals for epilepsy surgery evaluations.
Affiliation(s)
- Benjamin D Wissel
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Hansel M Greiner
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
  - Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Tracy A Glauser
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
  - Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Francesco T Mangano
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
  - Division of Neurosurgery, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Katherine D Holland-Bouley
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
  - Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Nanhua Zhang
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
  - Division of Biostatistics & Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Rhonda D Szczesniak
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
  - Division of Biostatistics & Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Daniel Santel
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- John P Pestian
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Judith W Dexheimer
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
  - Division of Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
4
Dolatabadi E, Chen B, Buchan SA, Austin AM, Azimaee M, McGeer A, Mubareka S, Kwong JC. Natural Language Processing for Clinical Laboratory Data Repository Systems: Implementation and Evaluation for Respiratory Viruses. JMIR AI 2023; 2:e44835. [PMID: 38875570 PMCID: PMC11057455 DOI: 10.2196/44835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 03/31/2023] [Accepted: 04/18/2023] [Indexed: 06/16/2024]
Abstract
BACKGROUND With the growing volume and complexity of laboratory repositories, it has become tedious to parse unstructured data into structured and tabulated formats for secondary uses such as decision support, quality assurance, and outcome analysis. However, advances in natural language processing (NLP) approaches have enabled efficient and automated extraction of clinically meaningful medical concepts from unstructured reports. OBJECTIVE In this study, we aimed to determine the feasibility of using the NLP model for information extraction as an alternative approach to a time-consuming and operationally resource-intensive handcrafted rule-based tool. Therefore, we sought to develop and evaluate a deep learning-based NLP model to derive knowledge and extract information from text-based laboratory reports sourced from a provincial laboratory repository system. METHODS The NLP model, a hierarchical multilabel classifier, was trained on a corpus of laboratory reports covering testing for 14 different respiratory viruses and viral subtypes. The corpus includes 87,500 unique laboratory reports annotated by 8 subject matter experts (SMEs). The classification task involved assigning the laboratory reports to labels at 2 levels: 24 fine-grained labels in level 1 and 6 coarse-grained labels in level 2. A "label" also refers to the status of a specific virus or strain being tested or detected (eg, influenza A is detected). The model's performance stability and variation were analyzed across all labels in the classification task. Additionally, the model's generalizability was evaluated internally and externally on various test sets. RESULTS Overall, the NLP model performed well on internal, out-of-time (pre-COVID-19), and external (different laboratories) test sets with microaveraged F1-scores >94% across all classes. Higher precision and recall scores with less variability were observed for the internal and pre-COVID-19 test sets. As expected, the model's performance varied across categories and virus types due to the imbalanced nature of the corpus and sample sizes per class. There were intrinsically fewer classes of viruses being detected than those tested; therefore, the model's performance (lowest F1-score of 57%) was noticeably lower in the detected cases. CONCLUSIONS We demonstrated that deep learning-based NLP models are promising solutions for information extraction from text-based laboratory reports. These approaches enable scalable, timely, and practical access to high-quality and encoded laboratory data if integrated into laboratory information system repositories.
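The abstract reports a high microaveraged F1-score alongside a much lower F1-score for the weakest label. A short Python sketch can show how both can hold at once on an imbalanced corpus. It uses the standard definition of F1 from true positives, false positives, and false negatives; the label names and counts are hypothetical and are not the study's data.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one label
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score written directly in terms of counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_label: Dict[str, Counts]) -> float:
    """Pool the counts over all labels first, then compute a single F1."""
    tp = sum(c[0] for c in per_label.values())
    fp = sum(c[1] for c in per_label.values())
    fn = sum(c[2] for c in per_label.values())
    return f1(tp, fp, fn)


# Hypothetical counts: one frequent label and one rare label.
per_label = {
    "virus_tested": (900, 30, 20),
    "virus_detected": (30, 10, 30),
}

for name, counts in per_label.items():
    print(f"{name}: F1={f1(*counts):.3f}")
print(f"micro-averaged F1={micro_f1(per_label):.3f}")
```

In this invented example the rare label scores an F1 of 0.600 while the microaveraged F1 is about 0.954, because pooled counts are dominated by the frequent label. This is why per-label scores are worth reporting next to a microaverage.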
Affiliation(s)
- Elham Dolatabadi
  - Vector Institute, Toronto, ON, Canada
  - School of Health Policy and Management, Faculty of Health, York University, Toronto, ON, Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
- Sarah A Buchan
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
  - ICES, Toronto, ON, Canada
  - Public Health Ontario, Toronto, ON, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Mahmoud Azimaee
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
  - ICES, Toronto, ON, Canada
- Allison McGeer
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Sinai Health System, Toronto, ON, Canada
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Samira Mubareka
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
  - Sunnybrook Research Institute, Toronto, ON, Canada
- Jeffrey C Kwong
  - ICES, Toronto, ON, Canada
  - Public Health Ontario, Toronto, ON, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - University Health Network, Toronto, ON, Canada
  - Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
5
Aronis JM, Ye Y, Espino J, Hochheiser H, Michaels MG, Cooper GF. A Bayesian System to Track Outbreaks of Influenza-Like Illnesses Including Novel Diseases. medRxiv 2023:2023.05.10.23289799. [PMID: 37293033 PMCID: PMC10246032 DOI: 10.1101/2023.05.10.23289799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
It would be highly desirable to have a tool that detects the outbreak of a new influenza-like illness, such as COVID-19, accurately and early. This paper describes the ILI Tracker algorithm, which first models the daily occurrence of a set of known influenza-like illnesses in a hospital emergency department using findings extracted from patient-care reports with natural language processing. We include results based on modeling the diseases influenza, respiratory syncytial virus, human metapneumovirus, and parainfluenza for five emergency departments in Allegheny County, Pennsylvania, from June 1, 2010 through May 31, 2015. We then show how the algorithm can be extended to detect the presence of an unmodeled disease, which may represent a novel disease outbreak. We also include results for detecting an outbreak of an unmodeled disease during the mentioned time period, which in retrospect was very likely an outbreak of Enterovirus D68.
6
Hao T, Wissel B, Ni Y, Pajor N, Glauser T, Pestian J, Dexheimer JW. Implementation of Machine Learning Pipelines for Clinical Practice: Development and Validation Study. JMIR Med Inform 2022; 10:e37833. [PMID: 36525289 PMCID: PMC9804095 DOI: 10.2196/37833] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/08/2022] [Revised: 09/01/2022] [Accepted: 09/19/2022] [Indexed: 01/03/2023]
Abstract
BACKGROUND Artificial intelligence (AI) technologies, such as machine learning and natural language processing, have the potential to provide new insights into complex health data. Although powerful, these algorithms rarely move from experimental studies to direct clinical care implementation. OBJECTIVE We aimed to describe the key components for successful development and integration of two AI technology-based research pipelines for clinical practice. METHODS We summarized the approach, results, and key learnings from the implementation of the following two systems at a large, tertiary care children's hospital: (1) epilepsy surgical candidate identification (or epilepsy ID) in an ambulatory neurology clinic; and (2) an automated clinical trial eligibility screener (ACTES) for the real-time identification of patients for research studies in a pediatric emergency department. RESULTS The epilepsy ID system performed as well as board-certified neurologists in identifying surgical candidates (with a sensitivity of 71% and positive predictive value of 77%). The ACTES system decreased coordinator screening time by 12.9%. The success of each project was largely dependent upon the collaboration between machine learning experts, research and operational information technology professionals, longitudinal support from clinical providers, and institutional leadership. CONCLUSIONS These projects showcase novel interactions between machine learning recommendations and providers during clinical care. Our deployment provides seamless, real-time integration of AI technology to provide decision support and improve patient care.
Affiliation(s)
- Benjamin Wissel
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Yizhao Ni
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
- Nathan Pajor
  - Division of Pulmonary Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
- Tracy Glauser
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
  - Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- John Pestian
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
- Judith W Dexheimer
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
  - Division of Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
7
Tang KY, Hsiao CH, Hwang GJ. A scholarly network of AI research with an information science focus: Global North and Global South perspectives. PLoS One 2022; 17:e0266565. [PMID: 35427381 PMCID: PMC9012391 DOI: 10.1371/journal.pone.0266565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Accepted: 03/22/2022] [Indexed: 11/19/2022]
Abstract
This paper primarily aims to provide a citation-based method for exploring the scholarly network of artificial intelligence (AI)-related research in the information science (IS) domain, especially from Global North (GN) and Global South (GS) perspectives. Three research objectives were addressed, namely (1) the publication patterns in the field, (2) the most influential articles and researched keywords in the field, and (3) the visualization of the scholarly network between GN and GS researchers between the years 2010 and 2020. On the basis of the PRISMA statement, longitudinal research data were retrieved from the Web of Science and analyzed. Thirty-two AI-related keywords were used to retrieve relevant quality articles. Finally, 149 articles accompanying the follow-up 8838 citing articles were identified as eligible sources. A co-citation network analysis was adopted to scientifically visualize the intellectual structure of AI research in GN and GS networks. The results revealed that the United States, Australia, and the United Kingdom are the most productive GN countries; by contrast, China and India are the most productive GS countries. Next, the 10 most frequently co-cited AI research articles in the IS domain were identified. Third, the scholarly networks of AI research in the GN and GS areas were visualized. Between 2010 and 2015, GN researchers in the IS domain focused on applied research involving intelligent systems (e.g., decision support systems); between 2016 and 2020, GS researchers focused on big data applications (e.g., geospatial big data research). Both GN and GS researchers focused on technology adoption research (e.g., AI-related products and services) throughout the investigated period. Overall, this paper reveals the intellectual structure of the scholarly network on AI research and several applications in the IS literature. The findings provide research-based evidence for expanding global AI research.
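The co-citation analysis mentioned above rests on a simple count: two articles are co-cited each time a single citing article lists both in its references. The Python sketch below illustrates only that counting step, not the paper's own pipeline or visualization; the function name and the article identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def cocitation_counts(reference_lists: Iterable[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count how often each pair of cited articles shares a citing article.

    Each element of `reference_lists` is the reference list of one citing article.
    Pairs are stored in sorted order so that (A, B) and (B, A) are the same key.
    """
    counts: Counter = Counter()
    for refs in reference_lists:
        # De-duplicate within one citing article, then enumerate unordered pairs.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return dict(counts)


# Hypothetical citing articles, each given as a list of cited-article identifiers.
citing_articles = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C", "C"],  # the duplicate reference is counted once
]

for pair, n in sorted(cocitation_counts(citing_articles).items()):
    print(pair, n)
```

The resulting pair counts can serve as edge weights in a network whose nodes are the cited articles. Thresholding or normalizing those weights before drawing the network is a separate design choice that this sketch leaves open.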
Affiliation(s)
- Kai-Yu Tang
  - Department of International Business, Ming Chuan University, Taipei, Taiwan
- Gwo-Jen Hwang
  - Graduate Institute of Digital Learning and Education, National Taiwan University of Science and Technology, Taipei, Taiwan
8
Automating and Improving Cardiovascular Disease Prediction Using Machine Learning and EMR Data Features from a Regional Healthcare System. Int J Med Inform 2022; 163:104786. [DOI: 10.1016/j.ijmedinf.2022.104786] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/15/2022] [Revised: 04/23/2022] [Accepted: 04/25/2022] [Indexed: 02/05/2023]
9
Detection and Prevention of Virus Infection. Adv Exp Med Biol 2022; 1368:21-52. [DOI: 10.1007/978-981-16-8969-7_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
10
Chamola V, Hassija V, Gupta S, Goyal A, Guizani M, Sikdar B. Disaster and Pandemic Management Using Machine Learning: A Survey. IEEE Internet Things J 2021; 8:16047-16071. [PMID: 35782181 PMCID: PMC8768997 DOI: 10.1109/jiot.2020.3044966] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/16/2020] [Revised: 11/26/2020] [Accepted: 12/10/2020] [Indexed: 05/14/2023]
Abstract
This article provides a literature review of state-of-the-art machine learning (ML) algorithms for disaster and pandemic management. Most nations are concerned about disasters and pandemics, which, in general, are highly unlikely events. To date, various technologies, such as IoT, object sensing, UAV, 5G and cellular networks, smartphone-based systems, and satellite-based systems, have been used for disaster and pandemic management. ML algorithms can handle multidimensional, large volumes of data that occur naturally in environments related to disaster and pandemic management and are particularly well suited for important related tasks, such as recognition and classification. ML algorithms are useful for predicting disasters and assisting in disaster management tasks, such as determining crowd evacuation routes, analyzing social media posts, and handling the post-disaster situation. ML algorithms also find great application in pandemic management scenarios, such as predicting pandemics, monitoring pandemic spread, and disease diagnosis. This article first presents a tutorial on ML algorithms. It then presents a detailed review of several ML algorithms and how we can combine these algorithms with other technologies to address disaster and pandemic management. It also discusses various challenges, open issues, and directions for future research.
Affiliation(s)
- Vinay Chamola
  - Department of Electrical and Electronics Engineering & APPCAIR, Birla Institute of Technology and Science at Pilani, Pilani 333031, India
- Vikas Hassija
  - Department of Computer Science and IT, Jaypee Institute of Information Technology, Noida 201304, India
- Sakshi Gupta
  - Department of Computer Science and IT, Jaypee Institute of Information Technology, Noida 201304, India
- Adit Goyal
  - Department of Computer Science and IT, Jaypee Institute of Information Technology, Noida 201304, India
- Mohsen Guizani
  - Department of Computer Science and Engineering, Qatar University, Doha, Qatar
- Biplab Sikdar
  - Department of Electrical and Computer Engineering, National University of Singapore, Singapore 119077
11
Dipaola F, Shiffer D, Gatti M, Menè R, Solbiati M, Furlan R. Machine Learning and Syncope Management in the ED: The Future Is Coming. Medicina (Kaunas) 2021; 57:351. [PMID: 33917508 PMCID: PMC8067452 DOI: 10.3390/medicina57040351] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/06/2021] [Revised: 03/30/2021] [Accepted: 04/02/2021] [Indexed: 11/16/2022]
Abstract
In recent years, machine learning (ML) has been promisingly applied in many fields of clinical medicine, both for diagnosis and prognosis prediction. The aims of this narrative review were to summarize the basic concepts of ML applied to clinical medicine and explore its main applications in the emergency department (ED) setting, with a particular focus on syncope management. Through an extensive literature search in PubMed and Embase, we found increasing evidence suggesting that the use of ML algorithms can improve ED triage, diagnosis, and risk stratification of many diseases. However, the lack of external validation and of reliable diagnostic standards currently limits their implementation in clinical practice. Syncope represents a challenging problem for the emergency physician, both because its diagnosis is not supported by specific tests and because the available prognostic tools have proved to be inefficient. ML algorithms have the potential to overcome these limitations and, in the future, they could support the clinician in managing syncope patients more efficiently. However, at present only a few studies have addressed this issue, albeit with encouraging results.
Affiliation(s)
- Franca Dipaola
  - Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, 20090 Milan, Italy
  - Internal Medicine, Humanitas Clinical and Research Center – IRCCS, Rozzano, 20089 Milan, Italy
  - Correspondence: Tel.: +39-0282247266
- Dana Shiffer
  - Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, 20090 Milan, Italy
- Mauro Gatti
  - IBM, Active Intelligence Center, 40121 Bologna, Italy
- Roberto Menè
  - Department of Medicine and Surgery, University of Milano-Bicocca, 20126 Milan, Italy
- Monica Solbiati
  - Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, 20122 Milan, Italy
  - Dipartimento di Scienze Cliniche e di Comunità, Università degli Studi di Milano, 20122 Milan, Italy
- Raffaello Furlan
  - Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, 20090 Milan, Italy
  - Internal Medicine, Humanitas Clinical and Research Center – IRCCS, Rozzano, 20089 Milan, Italy
12
Oliveira CR, Niccolai P, Ortiz AM, Sheth SS, Shapiro ED, Niccolai LM, Brandt CA. Natural Language Processing for Surveillance of Cervical and Anal Cancer and Precancer: Algorithm Development and Split-Validation Study. JMIR Med Inform 2020; 8:e20826. [PMID: 32469840 PMCID: PMC7671846 DOI: 10.2196/20826] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/29/2020] [Revised: 09/18/2020] [Accepted: 10/04/2020] [Indexed: 12/13/2022]
Abstract
Background: Accurate identification of new diagnoses of human papillomavirus–associated cancers and precancers is an important step toward the development of strategies that optimize the use of human papillomavirus vaccines. The diagnosis of human papillomavirus cancers hinges on a histopathologic report, which is typically stored in electronic medical records as free-form, or unstructured, narrative text. Previous efforts to perform surveillance for human papillomavirus cancers have relied on the manual review of pathology reports to extract diagnostic information, a process that is both labor- and resource-intensive. Natural language processing can be used to automate the structuring and extraction of clinical data from unstructured narrative text in medical records and may provide a practical and effective method for identifying patients with vaccine-preventable human papillomavirus disease for surveillance and research.
Objective: This study's objective was to develop and assess the accuracy of a natural language processing algorithm for the identification of individuals with cancer or precancer of the cervix and anus.
Methods: A pipeline-based natural language processing algorithm was developed, which incorporated machine learning and rule-based methods to extract diagnostic elements from the narrative pathology reports. To test the algorithm’s classification accuracy, we used a split-validation study design. Full-length cervical and anal pathology reports were randomly selected from 4 clinical pathology laboratories. Two study team members, blinded to the classifications produced by the natural language processing algorithm, manually and independently reviewed all reports and classified them at the document level according to 2 domains (diagnosis and human papillomavirus testing results). Using the manual review as the gold standard, the algorithm’s performance was evaluated using standard measurements of accuracy, recall, precision, and F-measure.
Results: The natural language processing algorithm’s performance was validated on 949 pathology reports. The algorithm demonstrated accurate identification of abnormal cytology, histology, and positive human papillomavirus tests with accuracies greater than 0.91. Precision was lowest for anal histology reports (0.87, 95% CI 0.59-0.98) and highest for cervical cytology (0.98, 95% CI 0.95-0.99). The natural language processing algorithm missed 2 out of the 15 abnormal anal histology reports, which led to a relatively low recall (0.68, 95% CI 0.43-0.87).
Conclusions: This study outlines the development and validation of a freely available and easily implementable natural language processing algorithm that can automate the extraction and classification of clinical data from cervical and anal cytology and histology.
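The evaluation measures named in the Methods (accuracy, recall, precision, and F-measure) can be computed from document-level labels roughly as follows. This is a minimal sketch for illustration, not the authors' code: the class and function names and the toy labels are hypothetical, and the F-measure shown is the balanced F1 form.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    """Counts of agreement between the gold standard and the algorithm."""
    tp: int = 0  # abnormal in both manual review and algorithm output
    fp: int = 0  # algorithm says abnormal, manual review says normal
    fn: int = 0  # algorithm misses a report the reviewers called abnormal
    tn: int = 0  # normal in both


def confusion_from_labels(gold: Sequence[int], predicted: Sequence[int]) -> Confusion:
    """Tally a 2x2 confusion table from binary labels (1 = abnormal)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    c = Confusion()
    for g, p in zip(gold, predicted):
        if g and p:
            c.tp += 1
        elif p:
            c.fp += 1
        elif g:
            c.fn += 1
        else:
            c.tn += 1
    return c


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c: Confusion) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(c: Confusion) -> float:
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical toy labels: manual review (gold) versus algorithm output.
gold = [1, 1, 1, 0, 0, 1]
pred = [1, 0, 1, 0, 1, 1]
table = confusion_from_labels(gold, pred)
print(precision(table), recall(table), f1(table), accuracy(table))
```

With few positive reports, a single missed case moves recall substantially, which is one reason the abstract reports wide confidence intervals for anal histology.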
Affiliation(s)
- Carlos R Oliveira
  - Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
- Patrick Niccolai
  - Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
- Anette Michelle Ortiz
  - Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
- Sangini S Sheth
  - Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale University School of Medicine, New Haven, CT, United States
- Eugene D Shapiro
  - Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
  - Departments of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, United States
- Linda M Niccolai
  - Departments of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, United States
- Cynthia A Brandt
  - Departments of Emergency Medicine, Biostatistics, and Health Informatics, Yale Schools of Medicine and Public Health, New Haven, CT, United States
  - Veteran Affairs Connecticut Healthcare System, West Haven, CT, United States
13
Abstract
Infectious disease research spans scales from the molecular to the global—from specific mechanisms of pathogen drug resistance, virulence, and replication to the movement of people, animals, and pathogens around the world. All of these research areas have been impacted by the recent growth of large-scale data sources and data analytics. Some of these advances rely on data or analytic methods that are common to most biomedical data science, while others leverage the unique nature of infectious disease, namely its communicability. This review outlines major research progress in the past few years and highlights some remaining opportunities, focusing on data or methodological approaches particular to infectious disease.
Affiliation(s)
- Peter M. Kasson
  - Department of Biomedical Engineering and Department of Molecular Physiology, University of Virginia, Charlottesville, Virginia 22908, USA
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, 752 37 Uppsala, Sweden
14
Lu W, Ng R. Automated Analysis of Public Health Laboratory Test Results. AMIA Joint Summits on Translational Science Proceedings 2020; 2020:393-402. [PMID: 32477660 PMCID: PMC7233052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
This study investigates the use of machine learning methods for classifying and extracting structured information from laboratory reports stored as semi-structured point-form English text. This is a novel data format that has not been evaluated in conjunction with machine learning classifiers in previous literature. Our classifiers achieve human-level predictive accuracy on the binary Test Performed and 4-class Test Outcome labels. We consider symbolic approaches for predicting the highly multi-class Organism Genus and Organism Species labels. Results are discussed from the viewpoint of interpretability and generalizability to new incoming laboratory reports. Code has been made public at https://github.com/enchainingrealm/UbcDssgBccdc-Research/tree/master/src.
Affiliation(s)
- William Lu
  - University of British Columbia, Vancouver, British Columbia, Canada
- Raymond Ng
  - University of British Columbia, Vancouver, British Columbia, Canada
15
Tsui F, Ye Y, Ruiz V, Cooper GF, Wagner MM. Automated influenza case detection for public health surveillance and clinical diagnosis using dynamic influenza prevalence method. J Public Health (Oxf) 2019; 40:878-885. [PMID: 29059331 DOI: 10.1093/pubmed/fdx141] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/30/2017] [Indexed: 11/13/2022] Open
Abstract
Objectives: To assess the performance of a Bayesian case detector (BCD) for influenza surveillance and clinical diagnosis.
Methods: BCD uses a Bayesian network classifier to compute the posterior probability of a patient having influenza based on 31 findings from narrative clinical notes. To assess the potential for disease surveillance, we calculated the area under the receiver operating characteristic curve (AUC) to indicate BCD's ability to differentiate between influenza and non-influenza encounters in emergency department settings. To assess the potential for clinical diagnosis, we measured AUC for diagnosing influenza cases among encounters having influenza-like illnesses. We also evaluated the performance of BCD using dynamically estimated influenza prevalence, and measured sensitivity, specificity and positive predictive value.
Results: For influenza surveillance, BCD differentiated well between influenza and non-influenza encounters, with an AUC of 0.90, and 0.97 with dynamic influenza prevalence (P < 0.0001). For clinical diagnosis, the addition of dynamic influenza prevalence to BCD significantly improved AUC from 0.63 to 0.85 for distinguishing influenza from other causes of influenza-like illness.
Conclusions and policy implications: BCD can serve as an influenza surveillance tool and a differential diagnosis tool via our dynamic prevalence approach. It enhances the communication between public health and clinical practice.
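Why a prevalence estimate matters can be seen from Bayes' rule in odds form: the same clinical evidence yields a different posterior probability when the prior (here, the current influenza prevalence) changes. The sketch below is a deliberate simplification and not the BCD Bayesian network; the function name, the single pooled likelihood ratio, and the prevalence values are assumptions chosen for illustration.

```python
def posterior_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Posterior P(influenza | findings) from a prior prevalence and a likelihood ratio.

    Uses the odds form of Bayes' rule:
        posterior odds = prior odds * likelihood ratio
    where the likelihood ratio is P(findings | influenza) / P(findings | no influenza).
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prevalence / (1.0 - prevalence)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical values: identical findings (likelihood ratio 9) seen
# off-season (1% prevalence) and at the epidemic peak (10% prevalence).
off_season = posterior_probability(0.01, 9.0)
peak_season = posterior_probability(0.10, 9.0)
print(f"off-season: {off_season:.3f}, peak: {peak_season:.3f}")
```

Updating the prior as prevalence changes over the season is the general idea behind a dynamic prevalence approach; how the prevalence itself is estimated is outside this sketch.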
Affiliation(s)
- Fuchiang Tsui
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Ye Ye
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Victor Ruiz
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Gregory F Cooper
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Michael M Wagner
  - Real-time Outbreak and Disease Surveillance Laboratory (RODS), Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
16
Abstract
Electronic health records (EHRs) are a rich repository of valuable clinical information held in primary and secondary care databases. To utilize EHRs for medical observational research, a range of algorithms for automatically identifying individuals with a specific phenotype have been developed. This review summarizes and critically evaluates the literature on the development of EHR phenotyping systems. It describes phenotyping systems and techniques based on structured and unstructured EHR data. Articles published on PubMed and Google Scholar between 2013 and 2017 were reviewed, using search terms derived from Medical Subject Headings (MeSH). The popularity of Natural Language Processing (NLP) techniques for extracting features from narrative text has increased. This increased attention is due to the availability of open source NLP algorithms, combined with improvements in accuracy. In this review, concept extraction is the most popular NLP technique, used by more than 50% of the reviewed papers to extract features from EHRs. High-throughput phenotyping systems using unsupervised machine learning techniques have gained popularity due to their ability to efficiently and automatically extract a phenotype with minimal human effort.
17
Shafaf N, Malek H. Applications of Machine Learning Approaches in Emergency Medicine; a Review Article. Archives of Academic Emergency Medicine 2019; 7:34. [PMID: 31555764 PMCID: PMC6732202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/30/2022]
Abstract
The use of artificial intelligence and machine learning techniques in medical fields, especially emergency medicine, is growing rapidly. In this paper, recent studies on the use of artificial intelligence in emergency medicine have been collected and assessed. These studies fall into three categories: prediction and detection of disease; prediction of the need for admission, discharge, and mortality; and machine learning based triage systems. In each category, the most important studies have been chosen, and the accuracy and results of the algorithms are briefly evaluated, noting the machine learning techniques and datasets used.
Affiliation(s)
- Negin Shafaf
  - Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
- Hamed Malek
  - Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
18
Conway M, Keyhani S, Christensen L, South BR, Vali M, Walter LC, Mowery DL, Abdelrahman S, Chapman WW. Moonstone: a novel natural language processing system for inferring social risk from clinical narratives. J Biomed Semantics 2019; 10:6. [PMID: 30975223 PMCID: PMC6458709 DOI: 10.1186/s13326-019-0198-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 10/01/2018] [Accepted: 03/18/2019] [Indexed: 11/10/2022] Open
Abstract
Background: Social risk factors are important dimensions of health and are linked to access to care, quality of life, health outcomes and life expectancy. However, in the Electronic Health Record, data related to many social risk factors are primarily recorded in free-text clinical notes, rather than as more readily computable structured data, and hence cannot currently be easily incorporated into automated assessments of health. In this paper, we present Moonstone, a new, highly configurable rule-based clinical natural language processing system designed to automatically extract information that requires inferencing from clinical notes. Our initial use case for the tool is focused on the automatic extraction of social risk factor information (in this case, housing situation, living alone, and social support) from clinical notes. Nursing notes, social work notes, emergency room physician notes, primary care notes, hospital admission notes, and discharge summaries, all derived from the Veterans Health Administration, were used for algorithm development and evaluation.
Results: An evaluation of Moonstone demonstrated that the system is highly accurate in extracting and classifying the three variables of interest (housing situation, living alone, and social support). The system achieved positive predictive value (i.e. precision) scores ranging from 0.66 (homeless/marginally housed) to 0.98 (lives at home/not homeless), accuracy scores ranging from 0.63 (lives in facility) to 0.95 (lives alone), and sensitivity (i.e. recall) scores ranging from 0.75 (lives in facility) to 0.97 (lives alone).
Conclusions: The Moonstone system is, to the best of our knowledge, the first freely available, open source natural language processing system designed to extract social risk factors from clinical text with good (lives in facility) to excellent (lives alone) performance. Although developed with the social risk factor identification task in mind, Moonstone provides a powerful tool to address a range of clinical natural language processing tasks, especially those tasks that require nuanced linguistic processing in conjunction with inference capabilities.
Affiliation(s)
- Mike Conway
  - Department of Biomedical Informatics, 421 Wakara Way, University of Utah, Salt Lake City, 84108, UT, USA
- Salomeh Keyhani
  - San Francisco VA Medical Center, 4150 Clement Street, San Francisco, 94121, CA, USA
  - Department of Medicine, University of California San Francisco, 505 Parnassus Ave, San Francisco, 94143, CA, USA
- Lee Christensen
  - Department of Biomedical Informatics, 421 Wakara Way, University of Utah, Salt Lake City, 84108, UT, USA
- Brett R South
  - Department of Biomedical Informatics, 421 Wakara Way, University of Utah, Salt Lake City, 84108, UT, USA
  - Salt Lake City VA Health Care System, 500 Foothill Drive, Salt Lake City, 84148, UT, USA
- Marzieh Vali
  - San Francisco VA Medical Center, 4150 Clement Street, San Francisco, 94121, CA, USA
- Louise C Walter
  - San Francisco VA Medical Center, 4150 Clement Street, San Francisco, 94121, CA, USA
  - Department of Medicine, University of California San Francisco, 505 Parnassus Ave, San Francisco, 94143, CA, USA
- Danielle L Mowery
  - Department of Biomedical Informatics, 421 Wakara Way, University of Utah, Salt Lake City, 84108, UT, USA
  - Salt Lake City VA Health Care System, 500 Foothill Drive, Salt Lake City, 84148, UT, USA
- Samir Abdelrahman
  - Department of Biomedical Informatics, 421 Wakara Way, University of Utah, Salt Lake City, 84108, UT, USA
- Wendy W Chapman
  - Department of Biomedical Informatics, 421 Wakara Way, University of Utah, Salt Lake City, 84108, UT, USA
  - Salt Lake City VA Health Care System, 500 Foothill Drive, Salt Lake City, 84148, UT, USA
19
Smooth Bayesian network model for the prediction of future high-cost patients with COPD. Int J Med Inform 2019; 126:147-155. [PMID: 31029256 DOI: 10.1016/j.ijmedinf.2019.03.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/05/2018] [Revised: 02/28/2019] [Accepted: 03/26/2019] [Indexed: 02/05/2023]
Abstract
INTRODUCTION: The clinical course of chronic obstructive pulmonary disease (COPD) is marked by acute exacerbation events that increase hospitalization rates and healthcare spending. The early identification of future high-cost patients with COPD may decrease healthcare spending by informing individualized interventions that prevent exacerbation events and decelerate disease progression. Existing studies of cost prediction of other chronic diseases have applied regression and machine-learning methods that cannot capture the complex causal relationships between COPD factors. Thus, the exploration of these factors through nonlinear, high-dimensional but explainable modeling is greatly needed.
OBJECTIVES: We aimed to develop a machine-learning model to identify future high-cost patients with COPD. Such a model should incorporate expert knowledge about causal relationships, and the method for estimating the model could provide more accurate predictions than other machine learning methods.
METHODS: We used the 2011-2013 medical insurance data of patients with COPD in a large city. The data set included demographic information and admission records. Leveraging developments in graphical modeling methods, we proposed a smooth Bayesian network (SBN) model for the prediction of high-cost individuals using medical insurance data. The modeling method incorporated some expert knowledge about causal relationships (i.e., about the Bayesian network structure). We employed a smoothing kernel based on the weighted nearest neighborhood method in the SBN model to address overfitting, case-mix effect, and data sparsity (i.e., using data about "similar patients").
RESULTS: The proposed SBN achieved an area under the curve (AUC) of 0.80 and showed considerable improvement over the baseline machine-learning methods. Besides confirming the known factors from the literature, we found "region" (i.e., a suburban or urban area) to be a significant factor, and that in a 3-tier system with primary, secondary and tertiary hospitals, COPD patients who had been admitted to primary hospitals were more likely to develop into future high-cost patients than patients who had been admitted to tertiary hospitals.
CONCLUSION: The proposed SBN model not only obtained higher prediction accuracy and stronger generalizability than a number of benchmark machine-learning methods, but also used the Bayesian network to capture the complex causal relationships between different predictors by incorporating expert knowledge. Furthermore, a framework was developed to establish the relationships between exposure to historical trajectory and future outcome, which can also be applied to other temporal data to model different trajectory information and predict other outcomes.
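The idea of borrowing information from "similar patients" can be illustrated with a kernel-weighted nearest-neighbor estimate of the high-cost rate. This is only a sketch of the general smoothing technique, not the SBN model described above: the function name, the Gaussian kernel, the Euclidean distance, and the toy feature vectors are all assumptions made for illustration.

```python
import math
from typing import Sequence


def smoothed_high_cost_rate(
    query: Sequence[float],
    patients: Sequence[Sequence[float]],
    outcomes: Sequence[int],
    k: int = 3,
    bandwidth: float = 1.0,
) -> float:
    """Estimate P(high cost) for `query` from its k most similar patients.

    Each neighbor's outcome (1 = high cost, 0 = not) is weighted by a
    Gaussian kernel of its distance, so closer patients count more.
    """
    if k <= 0 or bandwidth <= 0:
        raise ValueError("k and bandwidth must be positive")
    if not patients or len(patients) != len(outcomes):
        raise ValueError("patients and outcomes must be non-empty and aligned")

    # Keep the k nearest patients by Euclidean distance.
    nearest = sorted(
        (math.dist(query, x), y) for x, y in zip(patients, outcomes)
    )[:k]

    weights = [math.exp(-((d / bandwidth) ** 2)) for d, _ in nearest]
    total = sum(weights)
    if total == 0.0:
        # All neighbors are too far for the kernel; fall back to a plain mean.
        return sum(y for _, y in nearest) / len(nearest)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total


# Hypothetical, already-scaled feature vectors (e.g. age, prior admissions).
patients = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
outcomes = [1, 0, 1]
print(smoothed_high_cost_rate((0.0, 0.0), patients, outcomes, k=2))
```

Smoothing of this kind trades some variance for bias: sparse cells in the data no longer produce estimates of exactly 0 or 1, at the cost of blending in patients who are only approximately comparable.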
20
Zhou S, Ren X, Yang J, Jin Q. Evaluating the Value of Defensins for Diagnosing Secondary Bacterial Infections in Influenza-Infected Patients. Front Microbiol 2018; 9:2762. [PMID: 30524393 PMCID: PMC6256186 DOI: 10.3389/fmicb.2018.02762] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/30/2018] [Accepted: 10/29/2018] [Indexed: 11/13/2022] Open
Abstract
Acute respiratory infections by influenza viruses are common causes of severe pneumonia, which can further deteriorate if secondary bacterial infections occur. Although the viral and bacterial agents are quite diverse, defensins, a set of antimicrobial peptides expressed by the host, may provide promising biomarkers that would greatly improve diagnosis and treatment. We examined the correlations between the gene expression levels of defensins and the viral and bacterial loads in the blood in a longitudinal, precision-medicine study of a patient with severe pneumonia infected by influenza A H7N9 virus. We found that DEFA5 is positively correlated with the blood load of influenza A H7N9 virus (r = 0.735, p < 0.05, Spearman correlation). DEFB116 and DEFB127 are positively, and DEFB108B and DEFB114 negatively, correlated with the bacterial load. The diagnostic potential of defensins to discriminate bacterial and viral infections was then evaluated on an independent dataset with 61 bacterial pneumonia patients and 39 viral pneumonia patients infected by influenza A viruses, and reached 93% accuracy. Expression levels of defensins in the blood may be of important diagnostic value in the clinic for indicating viral and bacterial infections.
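The Spearman correlation reported above is a Pearson correlation computed on ranks, which makes it sensitive to monotone rather than strictly linear relationships. A minimal pure-Python sketch follows; the function names and the toy expression and viral-load values are hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical time series: gene expression level versus viral load.
expression = [10.0, 20.0, 30.0, 40.0]
viral_load = [1.0, 3.0, 2.0, 4.0]
print(spearman(expression, viral_load))  # 0.8 up to floating-point rounding
```

In practice a library routine such as `scipy.stats.spearmanr` also returns a p-value; the hand-written version is shown only to make the rank transformation explicit.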
Affiliation(s)
- Siyu Zhou
  - MOH Key Laboratory of Systems Biology of Pathogens, Peking Union Medical College, Institute of Pathogen Biology, Chinese Academy of Medical Sciences, Beijing, China
- Xianwen Ren
  - BIOPIC, School of Life Sciences, Peking University, Beijing, China
- Jian Yang
  - MOH Key Laboratory of Systems Biology of Pathogens, Peking Union Medical College, Institute of Pathogen Biology, Chinese Academy of Medical Sciences, Beijing, China
- Qi Jin
  - MOH Key Laboratory of Systems Biology of Pathogens, Peking Union Medical College, Institute of Pathogen Biology, Chinese Academy of Medical Sciences, Beijing, China
21
Tou H, Yao L, Wei Z, Zhuang X, Zhang B. Automatic infection detection based on electronic medical records. BMC Bioinformatics 2018; 19:117. [PMID: 29671399 PMCID: PMC5907141 DOI: 10.1186/s12859-018-2101-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND: Making accurate patient care decisions as early as possible is a constant challenge, especially for physicians in the emergency department. The increasing volume of electronic medical records (EMRs) opens new horizons for automatic diagnosis. In this paper, we propose to use machine learning approaches for automatic infection detection based on EMRs. Five categories of information are utilized for prediction: personal information, admission notes, vital signs, diagnostic test results, and medical image diagnoses.
RESULTS: Experimental results on a newly constructed EMR dataset from an emergency department show that machine learning models can achieve a decent performance for infection detection, with an area under the receiver operating characteristic curve (AUC) of 0.88. Of the five types of information, the admission note in text form contributes the most, with an AUC of 0.87.
CONCLUSIONS: This study provides a state-of-the-art EMR processing system to automatically make medical decisions. It extracts five types of features associated with infection and achieves a decent performance on automatic infection detection based on machine learning models.
Affiliation(s)
- Huaixiao Tou
  - School of Data Science, Fudan University, Shanghai, China
- Lu Yao
  - Zhongshan Hospital Affiliated to Fudan University, Shanghai, China
- Zhongyu Wei
  - School of Data Science, Fudan University, Shanghai, China
- Xiahai Zhuang
  - School of Data Science, Fudan University, Shanghai, China
- Bo Zhang
  - Zhongshan Hospital Affiliated to Fudan University, Shanghai, China
22
Kennell TI, Willig JH, Cimino JJ. Clinical Informatics Researcher's Desiderata for the Data Content of the Next Generation Electronic Health Record. Appl Clin Inform 2017; 8:1159-1172. [PMID: 29270955 DOI: 10.4338/aci-2017-06-r-0101] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 01/29/2023] Open
Abstract
OBJECTIVE: Clinical informatics researchers depend on the availability of high-quality data from the electronic health record (EHR) to design and implement new methods and systems for clinical practice and research. However, these data are frequently unavailable or present in a format that requires substantial revision. This article reports the results of a review of informatics literature published from 2010 to 2016 that addresses these issues by identifying categories of data content that might be included or revised in the EHR.
MATERIALS AND METHODS: We used an iterative review process on 1,215 biomedical informatics research articles. We placed them into generic categories, reviewed and refined the categories, and then assigned additional articles, for a total of three iterations.
RESULTS: Our process identified eight categories of data content issues: Adverse Events, Clinician Cognitive Processes, Data Standards Creation and Data Communication, Genomics, Medication List Data Capture, Patient Preferences, Patient-reported Data, and Phenotyping.
DISCUSSION: These categories summarize discussions in biomedical informatics literature that concern data content issues restricting clinical informatics research. These barriers to research result from data that are either absent from the EHR or are inadequate (e.g., in narrative text form) for the downstream applications of the data. In light of these categories, we discuss changes to EHR data storage that should be considered in the redesign of EHRs, to promote continued innovation in clinical informatics.
CONCLUSION: Based on published literature of clinical informaticians' reuse of EHR data, we characterize eight types of data content that, if included in the next generation of EHRs, would find immediate application in advanced informatics tools and techniques.
Affiliation(s)
- Timothy I Kennell
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- James H Willig
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
  - Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- James J Cimino
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
  - Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
23
Walsh JA, Shao Y, Leng J, He T, Teng CC, Redd D, Treitler Zeng Q, Burningham Z, Clegg DO, Sauer BC. Identifying Axial Spondyloarthritis in Electronic Medical Records of US Veterans. Arthritis Care Res (Hoboken) 2017; 69:1414-1420. [PMID: 27813310 DOI: 10.1002/acr.23140] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/18/2016] [Revised: 10/19/2016] [Accepted: 11/01/2016] [Indexed: 11/09/2022]
Abstract
OBJECTIVE: Large database research in axial spondyloarthritis (SpA) is limited by a lack of methods for identifying most types of axial SpA. Our objective was to develop methods for identifying axial SpA concepts in the free text of documents from electronic medical records.
METHODS: Veterans with documents in the national Veterans Health Administration Corporate Data Warehouse between January 1, 2005 and June 30, 2015 were included. Methods were developed for exploring, selecting, and extracting meaningful terms that were likely to represent axial SpA concepts. With annotation, clinical experts reviewed sections of text containing the meaningful terms (snippets) and classified the snippets according to whether or not they represented the intended axial SpA concept. With natural language processing (NLP) tools, computers were trained to replicate the clinical experts' snippet classifications.
RESULTS: Three axial SpA concepts were selected by clinical experts, including sacroiliitis, terms including the prefix spond*, and HLA-B27 positivity (HLA-B27+). With supervised machine learning on annotated snippets, NLP models were developed with accuracies of 91.1% for sacroiliitis, 93.5% for spond*, and 97.2% for HLA-B27+. With independent validation, the accuracies were 92.0% for sacroiliitis, 91.0% for spond*, and 99.0% for HLA-B27+.
CONCLUSION: We developed feasible and accurate methods for identifying axial SpA concepts in the free text of clinical notes. Additional research is required to determine combinations of concepts that will accurately identify axial SpA phenotypes. These novel methods will facilitate previously impractical observational research in axial SpA and may be applied to research with other diseases.
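The snippet step described in the Methods, pulling out the text surrounding each meaningful term so that experts can annotate it, can be sketched with a regular expression. This is an illustrative assumption about how such extraction might look, not the authors' pipeline: the function name, the `spond*` pattern translation, the character window size, and the sample note are all hypothetical.

```python
import re
from typing import List

# Hypothetical regex rendering of the "spond*" prefix search:
# any word that starts with "spond" (spondylitis, spondyloarthritis, ...).
SPOND_PATTERN = r"\bspond\w*"


def extract_snippets(text: str, pattern: str, window: int = 40) -> List[str]:
    """Return the text around each match, `window` characters on either side."""
    snippets = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        snippets.append(text[start:end])
    return snippets


# Made-up note text for illustration only.
note = (
    "MRI shows bilateral sacroiliitis. "
    "History of ankylosing spondylitis; HLA-B27 positive."
)
for snippet in extract_snippets(note, SPOND_PATTERN, window=15):
    print(repr(snippet))
```

Each snippet would then be labeled by a clinical expert (concept present or not), and those labels would serve as training data for a supervised classifier, as the abstract describes.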
Affiliation(s)
- Jessica A Walsh
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City
- Yijun Shao
  - George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, Utah, and George Washington University, Washington, DC
- Jianwei Leng
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City
- Tao He
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City
- Chia-Chen Teng
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City
- Doug Redd
  - George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, Utah, and George Washington University, Washington, DC
- Qing Treitler Zeng
  - George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, Utah, and George Washington University, Washington, DC
- Daniel O Clegg
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City
- Brian C Sauer
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City
24
Ferraro JP, Ye Y, Gesteland PH, Haug PJ, Tsui FR, Cooper GF, Van Bree R, Ginter T, Nowalk AJ, Wagner M. The effects of natural language processing on cross-institutional portability of influenza case detection for disease surveillance. Appl Clin Inform 2017; 8:560-580. [PMID: 28561130 PMCID: PMC6241736 DOI: 10.4338/aci-2016-12-ra-0211] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/31/2016] [Accepted: 03/11/2017] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVES: This study evaluates the accuracy and portability of a natural language processing (NLP) tool for extracting clinical findings of influenza from clinical notes across two large healthcare systems. Effectiveness is evaluated on how well NLP supports downstream influenza case-detection for disease surveillance.
METHODS: We independently developed two NLP parsers, one at Intermountain Healthcare (IH) in Utah and the other at University of Pittsburgh Medical Center (UPMC), using local clinical notes from emergency department (ED) encounters of influenza. We measured NLP parser performance for the presence and absence of 70 clinical findings indicative of influenza. We then developed Bayesian network models from NLP processed reports and tested their ability to discriminate among cases of (1) influenza, (2) non-influenza influenza-like illness (NI-ILI), and (3) 'other' diagnosis.
RESULTS: On Intermountain Healthcare reports, recall and precision of the IH NLP parser were 0.71 and 0.75, respectively, and of the UPMC NLP parser, 0.67 and 0.79. On University of Pittsburgh Medical Center reports, recall and precision of the UPMC NLP parser were 0.73 and 0.80, respectively, and of the IH NLP parser, 0.53 and 0.80. Bayesian case-detection performance measured by AUROC for influenza versus non-influenza on Intermountain Healthcare cases was 0.93 (using IH NLP parser) and 0.93 (using UPMC NLP parser). Case-detection on University of Pittsburgh Medical Center cases was 0.95 (using UPMC NLP parser) and 0.83 (using IH NLP parser). For influenza versus NI-ILI on Intermountain Healthcare cases, performance was 0.70 (using IH NLP parser) and 0.76 (using UPMC NLP parser). On University of Pittsburgh Medical Center cases, it was 0.76 (using UPMC NLP parser) and 0.65 (using IH NLP parser).
CONCLUSION: In all but one instance (influenza versus NI-ILI using IH cases), local parsers were more effective at supporting case-detection, although performances of non-local parsers were reasonable.
Affiliation(s)
- Jeffrey P Ferraro
  - Jeffrey P. Ferraro, Homer Warner Center | Intermountain Healthcare, 5171 South Cottonwood St, Suite 220, Murray, Utah 84107, Tel: 801-244-6570
25
Ye Y, Wagner MM, Cooper GF, Ferraro JP, Su H, Gesteland PH, Haug PJ, Millett NE, Aronis JM, Nowalk AJ, Ruiz VM, López Pineda A, Shi L, Van Bree R, Ginter T, Tsui F. A study of the transferability of influenza case detection systems between two large healthcare systems. PLoS One 2017; 12:e0174970. [PMID: 28380048 PMCID: PMC5381795 DOI: 10.1371/journal.pone.0174970] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 09/27/2016] [Accepted: 03/17/2017] [Indexed: 01/16/2023] Open
Abstract
Objectives This study evaluates the accuracy and transferability of Bayesian case detection systems (BCD) that use clinical notes from emergency department (ED) to detect influenza cases. Methods A BCD uses natural language processing (NLP) to infer the presence or absence of clinical findings from ED notes, which are fed into a Bayesian network classifier (BN) to infer patients’ diagnoses. We developed BCDs at the University of Pittsburgh Medical Center (BCDUPMC) and Intermountain Healthcare in Utah (BCDIH). At each site, we manually built a rule-based NLP parser and trained a Bayesian network classifier from over 40,000 ED encounters between Jan. 2008 and May 2010 using feature selection, machine learning, and an expert debiasing approach. Transferability of a BCD in this study may be impacted by seven factors: development (source) institution, development parser, application (target) institution, application parser, NLP transfer, BN transfer, and classification task. We employed an ANOVA analysis to study their impacts on BCD performance. Results Both BCDs discriminated well between influenza and non-influenza on local test cases (AUCs > 0.92). When tested for transferability using the other institution’s cases, BCDUPMC discriminations declined minimally (AUC decreased from 0.95 to 0.94, p<0.01), and BCDIH discriminations declined more (from 0.93 to 0.87, p<0.0001). We attributed the BCDIH decline to the lower recall of the IH parser on UPMC notes. The ANOVA analysis showed five significant factors: development parser, application institution, application parser, BN transfer, and classification task. Conclusion We demonstrated high influenza case detection performance in two large healthcare systems in two geographically separated regions, providing evidentiary support for the use of automated case detection from routinely collected electronic clinical notes in national influenza surveillance. The transferability could be improved by training the Bayesian network classifier locally and increasing the accuracy of the NLP parser.
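The two-stage design described in this abstract (an NLP parser emits present/absent findings, and a Bayesian classifier turns them into a diagnosis probability) can be illustrated with a small sketch. It makes strong assumptions: a naive Bayes model stands in for the study's learned Bayesian network, and the finding names, likelihoods, and priors are hypothetical values chosen for illustration only.

```python
import math

# Hypothetical P(finding present | class); not estimates from the study.
LIKELIHOODS = {
    "influenza": {"fever": 0.85, "cough": 0.80, "myalgia": 0.60},
    "other": {"fever": 0.20, "cough": 0.25, "myalgia": 0.10},
}
# Hypothetical prior probability of each class.
PRIORS = {"influenza": 0.05, "other": 0.95}


def posterior(findings):
    """Return P(class | findings) under a naive Bayes assumption.

    `findings` maps a finding name to True (present) or False (absent),
    as an NLP parser might report it. Findings that are not mentioned
    are simply left out of the dictionary and do not affect the result.
    """
    log_joint = {}
    for cls, prior in PRIORS.items():
        total = math.log(prior)
        for name, present in findings.items():
            p = LIKELIHOODS[cls][name]
            total += math.log(p if present else 1.0 - p)
        log_joint[cls] = total
    # Normalize in a numerically stable way.
    peak = max(log_joint.values())
    unnorm = {cls: math.exp(v - peak) for cls, v in log_joint.items()}
    norm = sum(unnorm.values())
    return {cls: v / norm for cls, v in unnorm.items()}


post = posterior({"fever": True, "cough": True, "myalgia": False})
print(post)
```

In this framing, a drop in parser recall at the target site means that fewer true findings reach the classifier as present, which is one way a transferred system can lose discrimination.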
Affiliation(s)
- Ye Ye
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Michael M. Wagner
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Gregory F. Cooper
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jeffrey P. Ferraro
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
  - Intermountain Healthcare, Salt Lake City, Utah, United States of America
- Howard Su
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Per H. Gesteland
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
  - Intermountain Healthcare, Salt Lake City, Utah, United States of America
  - Department of Pediatrics, University of Utah, Salt Lake City, Utah, United States of America
- Peter J. Haug
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
  - Intermountain Healthcare, Salt Lake City, Utah, United States of America
- Nicholas E. Millett
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- John M. Aronis
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Andrew J. Nowalk
  - Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pennsylvania, United States of America
- Victor M. Ruiz
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Arturo López Pineda
  - Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Lingyun Shi
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Rudy Van Bree
  - Intermountain Healthcare, Salt Lake City, Utah, United States of America
- Thomas Ginter
  - VA Salt Lake City Healthcare System, Salt Lake City, Utah, United States of America
- Fuchiang Tsui
  - Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
|
26
|
Gaut G, Steyvers M, Imel ZE, Atkins DC, Smyth P. Content Coding of Psychotherapy Transcripts Using Labeled Topic Models. IEEE J Biomed Health Inform 2017; 21:476-487. [PMID: 26625437 PMCID: PMC4879602 DOI: 10.1109/jbhi.2015.2503985] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 12/24/2022]
Abstract
Psychotherapy represents a broad class of medical interventions received by millions of patients each year. Unlike most medical treatments, its primary mechanisms are linguistic; i.e., the treatment relies directly on a conversation between a patient and provider. However, the evaluation of patient-provider conversation suffers from critical shortcomings, including intensive labor requirements, coder error, nonstandardized coding systems, and inability to scale up to larger data sets. To overcome these shortcomings, psychotherapy analysis needs a reliable and scalable method for summarizing the content of treatment encounters. We used a publicly available psychotherapy corpus from Alexander Street Press, comprising a large collection of transcripts of patient-provider conversations, to compare coding performance for two machine learning methods. We used the labeled latent Dirichlet allocation (L-LDA) model to learn associations between text and codes, to predict codes in psychotherapy sessions, and to localize specific passages of within-session text representative of a session code. We compared the L-LDA model to a baseline lasso logistic regression model using predictive accuracy and model generalizability (measured by the area under the receiver operating characteristic curve, AUC). The L-LDA model outperforms the lasso logistic regression model at predicting session-level codes, with average AUC scores of 0.79 and 0.70, respectively. For fine-grained coding, L-LDA and logistic regression are able to identify specific talk-turns representative of symptom codes. However, model performance for talk-turn identification is not yet as reliable as human coders. We conclude that the L-LDA model has the potential to be an objective, scalable method for accurate automated coding of psychotherapy sessions that performs better than comparable discriminative methods at session-level coding and can also predict fine-grained codes.
|
27
|
Ford E, Carroll JA, Smith HE, Scott D, Cassell JA. Extracting information from the text of electronic medical records to improve case detection: a systematic review. J Am Med Inform Assoc 2016; 23:1007-15. [PMID: 26911811 PMCID: PMC4997034 DOI: 10.1093/jamia/ocv180] [Citation(s) in RCA: 205] [Impact Index Per Article: 25.6] [Received: 05/12/2015] [Revised: 10/13/2015] [Accepted: 10/26/2015] [Indexed: 12/15/2022]
Abstract
BACKGROUND Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. METHODS A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. RESULTS Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). CONCLUSIONS Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall).
Affiliation(s)
- Elizabeth Ford
  - Division of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, UK
- John A Carroll
  - Department of Informatics, University of Sussex, Brighton, UK
- Helen E Smith
  - Division of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, UK
- Donia Scott
  - Department of Informatics, University of Sussex, Brighton, UK
- Jackie A Cassell
  - Division of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, UK
|
28
|
López Pineda A, Ye Y, Visweswaran S, Cooper GF, Wagner MM, Tsui FR. Comparison of machine learning classifiers for influenza detection from emergency department free-text reports. J Biomed Inform 2015; 58:60-69. [PMID: 26385375 PMCID: PMC4684714 DOI: 10.1016/j.jbi.2015.08.019] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 03/02/2015] [Revised: 05/28/2015] [Accepted: 08/21/2015] [Indexed: 12/31/2022]
Abstract
Influenza is a yearly recurrent disease that has the potential to become a pandemic. An effective biosurveillance system is required for early detection of the disease. In our previous studies, we have shown that electronic Emergency Department (ED) free-text reports can be of value to improve influenza detection in real time. This paper studies seven machine learning (ML) classifiers for influenza detection, compares their diagnostic capabilities against an expert-built influenza Bayesian classifier, and evaluates different ways of handling missing clinical information from the free-text reports. We identified 31,268 ED reports from 4 hospitals between 2008 and 2011 to form two different datasets: training (468 cases, 29,004 controls), and test (176 cases and 1620 controls). We employed Topaz, a natural language processing (NLP) tool, to extract influenza-related findings and to encode them into one of three values: Acute, Non-acute, and Missing. Results show that all ML classifiers had areas under the ROC curve (AUC) ranging from 0.88 to 0.93 and performed significantly better than the expert-built Bayesian model. Encoding missing clinical information with an explicit value of Missing (not missing at random) consistently improved performance for 3 (out of 4) ML classifiers compared with the configuration that did not assign a value of Missing (missing completely at random). The case/control ratios did not affect the classification performance given the large number of training cases. Our study demonstrates that ED reports, combined with ML, NLP, and explicit handling of missing values, have great potential for the detection of infectious diseases.
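The two ways of handling missing findings compared in this abstract can be made concrete with a short sketch. It assumes each report is a dictionary from finding name to "Acute" or "Non-acute", with unmentioned findings absent from the dictionary; the finding names and both encoder functions are hypothetical illustrations, not the encoding produced by Topaz.

```python
FINDINGS = ["fever", "cough"]  # hypothetical finding names
VALUES = ["Acute", "Non-acute", "Missing"]


def encode_explicit(report):
    """Treat Missing as its own category (not missing at random).

    Each finding becomes three indicator columns, one per value,
    so the classifier can learn from the fact that a finding was
    not mentioned."""
    row = []
    for finding in FINDINGS:
        value = report.get(finding, "Missing")
        row.extend(1 if value == v else 0 for v in VALUES)
    return row


def encode_unassigned(report):
    """Assign no value to missing findings (missing completely at random).

    Each finding becomes one column: 1 for Acute, 0 for Non-acute,
    and None when not mentioned, which a classifier must then skip
    or marginalize over."""
    row = []
    for finding in FINDINGS:
        value = report.get(finding)
        if value is None:
            row.append(None)
        else:
            row.append(1 if value == "Acute" else 0)
    return row


report = {"fever": "Acute"}  # cough is not mentioned in the note
print(encode_explicit(report))
print(encode_unassigned(report))
```

The first encoding lets the absence of a mention carry signal of its own; the second assumes that absence says nothing about the patient.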
Affiliation(s)
- Arturo López Pineda
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States
- Ye Ye
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Gregory F Cooper
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Michael M Wagner
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
- Fuchiang Rich Tsui
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Pittsburgh, PA, United States; Intelligent System Program, University of Pittsburgh Dietrich School of Arts and Sciences, 210 South Bouquet Street, Pittsburgh, PA, United States
|
29
|
Han D, Wang S, Jiang C, Jiang X, Kim HE, Sun J, Ohno-Machado L. Trends in biomedical informatics: automated topic analysis of JAMIA articles. J Am Med Inform Assoc 2015; 22:1153-63. [PMID: 26555018 PMCID: PMC5009912 DOI: 10.1093/jamia/ocv157] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 08/31/2015] [Revised: 09/08/2015] [Accepted: 09/14/2015] [Indexed: 01/26/2023]
Abstract
Biomedical Informatics is a growing interdisciplinary field in which research topics and citation trends have been evolving rapidly in recent years. To analyze these data in a fast, reproducible manner, automation of certain processes is needed. JAMIA is a "generalist" journal for biomedical informatics. Its articles reflect the wide range of topics in informatics. In this study, we retrieved Medical Subject Headings (MeSH) terms and citations of JAMIA articles published between 2009 and 2014. We used tensors (i.e., multidimensional arrays) to represent the interaction among topics, time, and citations, and applied tensor decomposition to automate the analysis. The trends represented by tensors were then carefully interpreted and the results were compared with previous findings based on manual topic analysis. A list of most cited JAMIA articles, their topics, and publication trends over recent years is presented. The analyses confirmed previous studies and showed that, from 2012 to 2014, articles related to the MeSH terms Methods, Organization & Administration, and Algorithms increased significantly in both number of publications and citations. Citation trends varied widely by topic, with Natural Language Processing having a large number of citations in particular years, and Medical Record Systems, Computerized remaining a very popular topic in all years.
Affiliation(s)
- Dong Han
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA; School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, USA
- Shuang Wang
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Chao Jiang
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA; School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, USA
- Xiaoqian Jiang
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Hyeon-Eui Kim
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Jimeng Sun
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, S30313, USA
- Lucila Ohno-Machado
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
|
30
|
|
31
|
An Introduction to Natural Language Processing: How You Can Get More From Those Electronic Notes You Are Generating. Pediatr Emerg Care 2015; 31:536-41. [PMID: 26148107 DOI: 10.1097/pec.0000000000000484] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Indexed: 12/26/2022]
Abstract
Electronically stored clinical documents may contain both structured data and unstructured data. The use of structured clinical data varies by facility, but clinicians are familiar with coded data such as International Classification of Diseases, Ninth Revision and Systematized Nomenclature of Medicine-Clinical Terms codes, and commonly with other data including patient chief complaints or laboratory results. Most electronic health records have much more clinical information stored as unstructured data; for example, clinical narrative such as history of present illness, procedure notes, and clinical decision making is stored as unstructured data. Despite the importance of this information, electronic capture or retrieval of unstructured clinical data has been challenging. The field of natural language processing (NLP) is undergoing rapid development, and existing tools can be successfully used for quality improvement, research, healthcare coding, and even billing compliance. In this brief review, we provide examples of successful uses of NLP using emergency medicine physician visit notes for various projects, describe the challenges of retrieving specific data, and finally present practical methods that can run on a standard personal computer as well as high-end state-of-the-art funded processes run by leading NLP informatics researchers.
|
32
|
Li Q, Spooner SA, Kaiser M, Lingren N, Robbins J, Lingren T, Tang H, Solti I, Ni Y. An end-to-end hybrid algorithm for automated medication discrepancy detection. BMC Med Inform Decis Mak 2015; 15:37. [PMID: 25943550 PMCID: PMC4427951 DOI: 10.1186/s12911-015-0160-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 12/15/2014] [Accepted: 04/27/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND In this study we implemented and developed state-of-the-art machine learning (ML) and natural language processing (NLP) technologies and built a computerized algorithm for medication reconciliation. Our specific aims are: (1) to develop a computerized algorithm for medication discrepancy detection between patients' discharge prescriptions (structured data) and medications documented in free-text clinical notes (unstructured data); and (2) to assess the performance of the algorithm on real-world medication reconciliation data. METHODS We collected clinical notes and discharge prescription lists for all 271 patients enrolled in the Complex Care Medical Home Program at Cincinnati Children's Hospital Medical Center between 1/1/2010 and 12/31/2013. A double-annotated, gold-standard set of medication reconciliation data was created for this collection. We then developed a hybrid algorithm consisting of three processes: (1) an ML algorithm to identify medication entities from clinical notes, (2) a rule-based method to link medication names with their attributes, and (3) an NLP-based, hybrid approach to match medications with structured prescriptions in order to detect medication discrepancies. The performance was validated on the gold-standard medication reconciliation data, where precision (P), recall (R), F-value (F) and workload were assessed. RESULTS The hybrid algorithm achieved 95.0%/91.6%/93.3% of P/R/F on medication entity detection and 98.7%/99.4%/99.1% of P/R/F on attribute linkage. The medication matching achieved 92.4%/90.7%/91.5% (P/R/F) on identifying matched medications in the gold-standard and 88.6%/82.5%/85.5% (P/R/F) on discrepant medications. By combining all processes, the algorithm achieved 92.4%/90.7%/91.5% (P/R/F) and 71.5%/65.2%/68.2% (P/R/F) on identifying the matched and the discrepant medications, respectively. The error analysis on algorithm outputs identified challenges to be addressed in order to improve medication discrepancy detection. CONCLUSION By leveraging ML and NLP technologies, an end-to-end, computerized algorithm achieves promising results in reconciling medications between clinical notes and discharge prescriptions.
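The abstract reports an F-value alongside precision and recall without defining it. A minimal sketch follows, assuming the F-value is the balanced F1 score (the harmonic mean of precision and recall); under that assumption the reported entity-detection figures are reproduced.

```python
def f_value(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall.

    Inputs may be fractions or percentages, as long as both use the
    same scale; the result is on that scale too."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Medication entity detection, P = 95.0% and R = 91.6% as reported.
print(round(f_value(95.0, 91.6), 1))  # 93.3
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the end-to-end score on discrepant medications sits closer to the 65.2% recall than a simple average of precision and recall would.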
Affiliation(s)
- Qi Li
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Stephen Andrew Spooner
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA; Chief Medical Information Officer, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Megan Kaiser
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Nataline Lingren
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Jessica Robbins
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Todd Lingren
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Huaxiu Tang
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Imre Solti
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA; James M. Anderson Center for Health Systems Excellence, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Yizhao Ni
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
|
33
|
Cooper GF, Villamarin R, Rich Tsui FC, Millett N, Espino JU, Wagner MM. A method for detecting and characterizing outbreaks of infectious disease from clinical reports. J Biomed Inform 2014; 53:15-26. [PMID: 25181466 DOI: 10.1016/j.jbi.2014.08.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/05/2014] [Revised: 08/04/2014] [Accepted: 08/22/2014] [Indexed: 11/30/2022]
Abstract
Outbreaks of infectious disease can pose a significant threat to human health. Thus, detecting and characterizing outbreaks quickly and accurately remains an important problem. This paper describes a Bayesian framework that links clinical diagnosis of individuals in a population to epidemiological modeling of disease outbreaks in the population. Computer-based diagnosis of individuals who seek healthcare is used to guide the search for epidemiological models of population disease that explain the pattern of diagnoses well. We applied this framework to develop a system that detects influenza outbreaks from emergency department (ED) reports. The system diagnoses influenza in individuals probabilistically from evidence in ED reports that is extracted using natural language processing. These diagnoses guide the search for epidemiological models of influenza that explain the pattern of diagnoses well. Those epidemiological models with a high posterior probability determine the most likely outbreaks of specific diseases; the models are also used to characterize properties of an outbreak, such as its expected peak day and estimated size. We evaluated the method using both simulated data and data from a real influenza outbreak. The results provide support that the approach can detect and characterize outbreaks early and well enough to be valuable. We describe several extensions to the approach that appear promising.
Affiliation(s)
- Gregory F Cooper
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Ricardo Villamarin
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Fu-Chiang Rich Tsui
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Nicholas Millett
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Jeremy U Espino
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
- Michael M Wagner
  - Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA
|