1. Zitu MM, Zhang S, Owen DH, Chiang C, Li L. Generalizability of machine learning methods in detecting adverse drug events from clinical narratives in electronic medical records. Front Pharmacol 2023; 14:1218679. [PMID: 37502211 PMCID: PMC10368879 DOI: 10.3389/fphar.2023.1218679] [Received: 05/07/2023] [Accepted: 06/26/2023] [Indexed: 07/29/2023]
Abstract
We assessed the generalizability of machine learning methods using natural language processing (NLP) techniques to detect adverse drug events (ADEs) from clinical narratives in electronic medical records (EMRs). We constructed a new corpus correlating drugs with adverse drug events using 1,394 clinical notes of 47 randomly selected patients who received immune checkpoint inhibitors (ICIs) from 2011 to 2018 at The Ohio State University James Cancer Hospital, annotating 189 drug-ADE relations in single sentences within the medical records. We also used data from Harvard's publicly available 2018 National Clinical Challenge (n2c2), which includes 505 discharge summaries with annotations of 1,355 single-sentence drug-ADE relations. We applied classical machine learning (support vector machine (SVM)), deep learning (convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM)), and state-of-the-art transformer-based (bidirectional encoder representations from transformers (BERT) and ClinicalBERT) methods trained and tested in the two different corpora and compared performance among them to detect drug-ADE relationships. ClinicalBERT detected drug-ADE relationships better than the other methods when trained using our dataset and tested in n2c2 (ClinicalBERT F-score, 0.78; other methods, F-scores, 0.61-0.73) and when trained using the n2c2 dataset and tested in ours (ClinicalBERT F-score, 0.74; other methods, F-scores, 0.55-0.72). Comparison among several machine learning methods demonstrated the superior performance and, therefore, the greatest generalizability of findings of ClinicalBERT for the detection of drug-ADE relations from clinical narratives in electronic medical records.
Affiliation(s)
- Md Muntasir Zitu
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Shijun Zhang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Dwight H. Owen
  - Department of Internal Medicine, College of Medicine, The Ohio State University, Columbus, OH, United States
- Chienwei Chiang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Lang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
2. Marras C, Arbatti L, Hosamath A, Amara A, Anderson KE, Chahine LM, Eberly S, Kinel D, Mantri S, Mathur S, Oakes D, Purks JL, Standaert DG, Tanner CM, Weintraub D, Shoulson I. What Patients Say: Large-Scale Analyses of Replies to the Parkinson's Disease Patient Report of Problems (PD-PROP). Journal of Parkinson's Disease 2023; 13:757-767. [PMID: 37334615 PMCID: PMC10473108 DOI: 10.3233/jpd-225083] [Accepted: 05/14/2023] [Indexed: 06/20/2023]
Abstract
BACKGROUND: Free-text, verbatim replies in the words of people with Parkinson's disease (PD) have the potential to provide unvarnished information about their feelings and experiences. Challenges of processing such data on a large scale are a barrier to analyzing verbatim data collection in large cohorts.
OBJECTIVE: To develop a method for curating responses from the Parkinson's Disease Patient Report of Problems (PD-PROP), open-ended questions that ask people with PD to report their most bothersome problems and associated functional consequences.
METHODS: Human curation, natural language processing, and machine learning were used to develop an algorithm to convert verbatim responses to classified symptoms. Nine curators, including clinicians, people with PD, and a non-clinician PD expert, classified a sample of responses as reporting each symptom or not. Responses to the PD-PROP were collected within the Fox Insight cohort study.
RESULTS: Approximately 3,500 PD-PROP responses were curated by a human team. Subsequently, approximately 1,500 responses were used in the validation phase; median age of respondents was 67 years, 55% were men, and median time since PD diagnosis was 3 years. A total of 168,260 verbatim responses were classified by machine. Accuracy of machine classification was 95% on a held-out test set. Sixty-five symptoms were grouped into 14 domains. The most frequently reported symptoms at first report were tremor (by 46% of respondents), gait and balance problems (>39%), and pain/discomfort (33%).
CONCLUSION: A human-in-the-loop method of curation provides both accuracy and efficiency, permitting a clinically useful analysis of large datasets of verbatim reports about the problems that bother PD patients.
Affiliation(s)
- Connie Marras
  - Edmond J Safra Program in Parkinson’s Disease, University Health Network, University of Toronto, Toronto, Canada
- Lakshmi Arbatti
  - Grey Matter Technologies, a Wholly Owned Subsidiary of Modality.ai, San Francisco, CA, USA
- Abhishek Hosamath
  - Grey Matter Technologies, a Wholly Owned Subsidiary of Modality.ai, San Francisco, CA, USA
- Amy Amara
  - Department of Neurology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Karen E. Anderson
  - Departments of Psychiatry and Neurology, Georgetown University, Washington, DC, USA
- Lana M. Chahine
  - Department of Neurology, University of Pittsburgh, Pittsburgh, PA, USA
- Shirley Eberly
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, USA
- Dan Kinel
  - Department of Neurology, University of Rochester, Rochester, NY, USA
- Sneha Mantri
  - Department of Neurology, Duke University, Durham, NC, USA
- David Oakes
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, USA
- Caroline M. Tanner
  - Department of Neurology, Weill Institute for Neurosciences, University of California – San Francisco, San Francisco, CA, USA
- Daniel Weintraub
  - Departments of Psychiatry and Neurology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Ira Shoulson
  - Grey Matter Technologies, a Wholly Owned Subsidiary of Modality.ai, San Francisco, CA, USA
  - Department of Neurology, University of Rochester, Rochester, NY, USA
3. Binkheder S, Wu HY, Quinney SK, Zhang S, Zitu MM, Chiang CW, Wang L, Jones J, Li L. Correction: PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature. J Biomed Semantics 2022; 13:20. [PMID: 35858945 PMCID: PMC9297579 DOI: 10.1186/s13326-022-00275-3] [Indexed: 11/10/2022]
Affiliation(s)
- Samar Binkheder
  - Department of Biohealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, IN, USA
  - Medical Informatics Unit, Department of Medical Education, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Heng-Yi Wu
  - Development Science Informatics, Genentech, South San Francisco, CA, USA
- Sara K Quinney
  - Department of Obstetrics and Gynecology, Indiana University School of Medicine, Indianapolis, IN, USA
- Shijun Zhang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Md Muntasir Zitu
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Chien-Wei Chiang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Lei Wang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Josette Jones
  - Department of Biohealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, IN, USA
- Lang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
  - 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH, 43210, USA