1
Harandi AA, McPherson K, Lo Y, Gutiérrez R, Chao JY. A pragmatic methodology to extract anesthetic and physiological data from the electronic health record. Paediatr Anaesth 2024; 34:318-323. [PMID: 38055618 PMCID: PMC10922302 DOI: 10.1111/pan.14817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 11/03/2023] [Accepted: 11/19/2023] [Indexed: 12/08/2023]
Abstract
BACKGROUND/AIMS Traditional manual methods of extracting anesthetic and physiological data from the electronic health record rely upon visual transcription by a human analyst, which can be labor-intensive and prone to error. Technical complexity, relative inexperience in computer coding, and decreased access to data warehouses can deter investigators from obtaining valuable electronic health record data for research studies, especially in under-resourced settings. We therefore aimed to develop, pilot, and demonstrate the effectiveness and utility of a pragmatic data extraction methodology. METHODS Expired sevoflurane concentration data from the electronic health record transcribed by eye was compared to an intermediate preprocessing method in which the entire anesthetic flowsheet narrative report was selected, copy-pasted, and processed using only Microsoft Word and Excel software to generate a comma-delimited (.csv) file. A step-by-step description of this method is presented. Concordance rates, Pearson correlation coefficients, and scatterplots with lines of best fit were used to compare the two methods of data extraction. RESULTS A total of 1132 datapoints across eight subjects were analyzed, accounting for 18.9 h of anesthesia time. There was a high concordance rate of data extracted using the two methods (median concordance rate 100%, range [96%, 100%]). The median time required to complete manual data extraction was significantly longer than the time required using the intermediate method (240 seconds, IQR [199, 482.5] vs 92.5 seconds, IQR [69, 99]; p = .01) and was linearly associated with the number of datapoints (r_manual = .97, p < .0001), whereas the time required to complete data extraction using the intermediate approach was independent of the number of datapoints (r_intermediate = -.02, p = .99).
CONCLUSIONS We describe a pragmatic data extraction methodology that requires no additional software or coding skills and is intended to enhance the ease, speed, and accuracy of data collection; it could assist in clinician investigator-initiated research and quality/process improvement projects.
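The two comparison statistics named in this abstract, concordance rate and Pearson correlation, can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' analysis: the function names, the `tol` parameter, and the sample sevoflurane values are hypothetical.

```python
from math import sqrt


def concordance_rate(manual, extracted, tol=0.0):
    """Fraction of paired datapoints whose values agree within `tol`."""
    if len(manual) != len(extracted) or not manual:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for m, e in zip(manual, extracted) if abs(m - e) <= tol)
    return matches / len(manual)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical expired sevoflurane values (%), one pair per timepoint.
    manual = [2.1, 2.3, 2.5, 2.4, 2.0]
    extracted = [2.1, 2.3, 2.5, 2.4, 2.2]
    print("concordance:", concordance_rate(manual, extracted))
    print("pearson r:", pearson_r(manual, extracted))
```

An exact-match tolerance (`tol=0.0`) suits values copied from the same record; a small non-zero tolerance would be a design choice for data that pass through rounding.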
Affiliation(s)
- Arshia Aalami Harandi
  - Department of Anesthesiology, Montefiore Medical Center, Albert Einstein College of Medicine, Bronx, New York, USA
  - Albert Einstein College of Medicine, Bronx, New York, USA
- Katherine McPherson
  - Department of Anesthesiology, Montefiore Medical Center, Albert Einstein College of Medicine, Bronx, New York, USA
  - Albert Einstein College of Medicine, Bronx, New York, USA
- Yungtai Lo
  - Department of Epidemiology & Population Health (Biostatistics), Albert Einstein College of Medicine, Bronx, New York, USA
- Rodrigo Gutiérrez
  - Department of Anesthesiology and Perioperative Medicine, Center of Advanced Clinical Research, University of Chile, Santiago, Chile
  - Department of Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Jerry Y. Chao
  - Department of Anesthesiology, Montefiore Medical Center, Albert Einstein College of Medicine, Bronx, New York, USA
2
Chai D, Liu Z, Wang L, Duan H, Zhao C, Xu C, Zhang D, Zhao Q, Ma P. Effectiveness of Medication Reconciliation in a Chinese Hospital: A Pilot Randomized Controlled Trial. J Multidiscip Healthc 2023; 16:3641-3650. [PMID: 38034875 PMCID: PMC10683647 DOI: 10.2147/jmdh.s432522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 11/02/2023] [Indexed: 12/02/2023] Open
Abstract
Background Implementing medication reconciliation (MR) was complex and challenging because of variability in the guidance on how to conduct it. The MR processes adopted in China differed from those recommended by the World Health Organization (WHO). A pilot study was undertaken to inform the design of a future randomized controlled trial to determine the effectiveness of these two workflows. Methods Patients taking at least one home/regular medication for hypertension, diabetes, or coronary heart disease were recruited at admission and then randomized using a computer-generated random number in a closed envelope. In the study group, the pharmacist reviewed electronic medical record systems before communicating with patients. In the control group, pharmacists communicated with patients at the patient's admission. The time pharmacists invested in the MR process, the number of unintended medication discrepancies, and physician acceptance were the outcome measures. Results One hundred and forty adult patients were randomized, of whom 66 patients in the intervention group received MR within 24 hours, while 58 patients in the control group received MR at some point during admission. The most common condition in the study group was hypertension (coronary heart disease in the control group). The study-group workflow saved an average of 7 minutes per patient compared with the WHO-recommended process [17.5 minutes (IQR 14.00, 28.25) vs 24.5 minutes (IQR 17.75, 35.25), p = 0.004]. The number of unintended discrepancies was 42 in the study group and 34 in the control group (p = 0.33). Physician acceptance in the study and control groups was 87.5% and 92.3%, respectively (p = 0.87). Conclusion The results suggest that changes in outcome measures were in the appropriate direction and that the time limit for implementing MR can be set within 48 hours. A future multi-centre RCT to determine the effectiveness of MR is feasible and warranted.
Affiliation(s)
- Dongyan Chai
  - Department of Pharmacy, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Henan University People’s Hospital, Zhengzhou, Henan, People’s Republic of China
  - International Medical Center of Henan Province, Henan Provincial People’s Hospital, Zhengzhou, Henan, People’s Republic of China
- Zhihui Liu
  - Department of General Practice, Henan Provincial People’s Hospital, Zhengzhou, Henan, People’s Republic of China
- Liuyi Wang
  - Department of General Practice, Henan Provincial People’s Hospital, Zhengzhou, Henan, People’s Republic of China
- Hongyan Duan
  - International Medical Center of Henan Province, Henan Provincial People’s Hospital, Zhengzhou, Henan, People’s Republic of China
  - Department of General Practice, Henan Provincial People’s Hospital, Zhengzhou, Henan, People’s Republic of China
- Chenglong Zhao
  - Department of Pharmacy, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Henan University People’s Hospital, Zhengzhou, Henan, People’s Republic of China
- Chengyang Xu
  - International Medical Center of Henan Province, Henan Provincial People’s Hospital, Zhengzhou, Henan, People’s Republic of China
- Dongyan Zhang
  - Department of Pharmacy, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Henan University People’s Hospital, Zhengzhou, Henan, People’s Republic of China
- Qiongrui Zhao
  - Department of Clinical Research Service Center, Henan Provincial People’s Hospital, Zhengzhou, Henan, People’s Republic of China
- Peizhi Ma
  - Department of Pharmacy, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Henan University People’s Hospital, Zhengzhou, Henan, People’s Republic of China
3
Ma MW, Gao XS, Zhang ZY, Shang SY, Jin L, Liu PL, Lv F, Ni W, Han YC, Zong H. Extracting laboratory test information from paper-based reports. BMC Med Inform Decis Mak 2023; 23:251. [PMID: 37932733 PMCID: PMC10629084 DOI: 10.1186/s12911-023-02346-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 10/20/2023] [Indexed: 11/08/2023] Open
Abstract
BACKGROUND In the healthcare domain today, despite the substantial adoption of electronic health information systems, a significant proportion of medical reports still exist in paper-based formats. As a result, there is a significant demand for the digitization of information from these paper-based reports. However, the digitization of paper-based laboratory reports into a structured data format can be challenging due to their non-standard layouts, which include various data types such as text, numeric values, reference ranges, and units. Therefore, it is crucial to develop a highly scalable and lightweight technique that can effectively identify and extract information from laboratory test reports and convert them into a structured data format for downstream tasks. METHODS We developed an end-to-end Natural Language Processing (NLP)-based pipeline for extracting information from paper-based laboratory test reports. Our pipeline consists of two main modules: an optical character recognition (OCR) module and an information extraction (IE) module. The OCR module is applied to locate and identify text from scanned laboratory test reports using state-of-the-art OCR algorithms. The IE module is then used to extract meaningful information from the OCR results to form digitalized tables of the test reports. The IE module consists of five sub-modules: time detection, headline position, line normalization, Named Entity Recognition (NER) with a Conditional Random Fields (CRF)-based method, and step detection for multi-column. Finally, we evaluated the performance of the proposed pipeline on 153 laboratory test reports collected from Peking University First Hospital (PKU1). RESULTS In the OCR module, we evaluated the accuracy of text detection and recognition results at three different levels and achieved an average accuracy of 0.93.
In the IE module, we extracted four laboratory test entities, including test item name, test result, test unit, and reference value range. The overall F1 score is 0.86 on the 153 laboratory test reports collected from PKU1. With a single CPU, the average inference time of each report is only 0.78 s. CONCLUSION In this study, we developed a practical lightweight pipeline to digitalize and extract information from paper-based laboratory test reports in diverse types and with different layouts that can be adopted in real clinical environments with the lowest possible computing resources requirements. The high evaluation performance on the real-world hospital dataset validated the feasibility of the proposed pipeline.
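To make the four target entities concrete (test item name, result, unit, reference range), here is a deliberately simplified sketch. It substitutes a single regular expression for the paper's CRF-based NER, so it is not the authors' method; the line layout ("name result unit low-high"), the function name `parse_lab_line`, and the sample line are all assumptions for illustration.

```python
import re
from typing import NamedTuple, Optional


class LabResult(NamedTuple):
    name: str
    result: float
    unit: str
    ref_low: float
    ref_high: float


# Assumed layout of one normalized OCR line: "<name> <result> <unit> <low>-<high>".
# Real reports vary widely; this pattern only covers that one hypothetical layout.
_LINE = re.compile(
    r"^(?P<name>.+?)\s+"
    r"(?P<result>\d+(?:\.\d+)?)\s+"
    r"(?P<unit>\S+)\s+"
    r"(?P<low>\d+(?:\.\d+)?)\s*-\s*(?P<high>\d+(?:\.\d+)?)$"
)


def parse_lab_line(line: str) -> Optional[LabResult]:
    """Return the four entities from one line, or None if the layout differs."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return LabResult(
        name=match["name"],
        result=float(match["result"]),
        unit=match["unit"],
        ref_low=float(match["low"]),
        ref_high=float(match["high"]),
    )


if __name__ == "__main__":
    # Hypothetical OCR output line, not taken from the study data.
    print(parse_lab_line("Hemoglobin 13.5 g/dL 12.0-16.0"))
```

A fixed pattern like this breaks as soon as layouts differ between hospitals, which is the kind of variability a learned sequence labeler such as a CRF is meant to absorb.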
Affiliation(s)
- Ming-Wei Ma
  - Department of Radiation Oncology, Peking University First Hospital, No.7 Xishiku Street, Beijing, 100034, China
- Xian-Shu Gao
  - Department of Radiation Oncology, Peking University First Hospital, No.7 Xishiku Street, Beijing, 100034, China
- Ze-Yu Zhang
  - Philips Research China, Shanghai, 200072, China
- Shi-Yu Shang
  - Department of Radiation Oncology, Peking University First Hospital, No.7 Xishiku Street, Beijing, 100034, China
- Ling Jin
  - Philips Research China, Shanghai, 200072, China
- Pei-Lin Liu
  - Department of Radiation Oncology, Peking University First Hospital, No.7 Xishiku Street, Beijing, 100034, China
- Feng Lv
  - Department of Radiation Oncology, Peking University First Hospital, No.7 Xishiku Street, Beijing, 100034, China
- Wei Ni
  - Philips Research China, Shanghai, 200072, China
- Yu-Chen Han
  - Philips Research China, Shanghai, 200072, China
- Hui Zong
  - Philips Research China, Shanghai, 200072, China
4
Huang S, Cai T, Weber BN, He Z, Dahal KP, Hong C, Hou J, Seyok T, Cagan A, DiCarli MF, Joseph J, Kim SC, Solomon DH, Cai T, Liao KP. Association Between Inflammation, Incident Heart Failure, and Heart Failure Subtypes in Patients With Rheumatoid Arthritis. Arthritis Care Res (Hoboken) 2023; 75:1036-1045. [PMID: 34623035 PMCID: PMC8989720 DOI: 10.1002/acr.24804] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 06/24/2021] [Revised: 09/27/2021] [Accepted: 10/05/2021] [Indexed: 12/14/2022]
Abstract
OBJECTIVE In rheumatoid arthritis (RA), there are limited data on risk factors for the clinical heart failure (HF) subtypes of HF with reduced ejection fraction (HFrEF) and HF with preserved ejection fraction (HFpEF). This study examined the association between inflammation and incident HF subtypes in RA. Because inflammation changes over time with disease activity, we hypothesized that the effect of inflammation may be stronger at the 5-year follow-up than at the standard 10-year follow-up from general population studies of cardiovascular risk. METHODS We studied an electronic health record (EHR)-based RA cohort with data pre- and post-RA incidence. We applied a validated approach to identify HF and extract ejection fraction to classify HFrEF and HFpEF. Follow-up started from the RA incidence date (index date) to the earliest occurrence of incident HF, death, last EHR encounter, or 10 years. Baseline inflammation was assessed using erythrocyte sedimentation rate or C-reactive protein values. Covariates included demographic characteristics, established HF risk factors, and RA-related factors. We tested the association between baseline inflammation with incident HF and its subtypes using Cox proportional hazards models. RESULTS We studied 9,087 patients with RA; 8.2% developed HF during 10 years of follow-up. Elevated inflammation was associated with increased risk for HF at both 5- and 10-year follow-ups (hazard ratio [HR] 1.66, 95% confidence interval [95% CI] 1.12-2.46 and HR 1.46, 95% CI 1.13-1.90, respectively), which is also seen for HFpEF at 5 years (HR 1.72, 95% CI 1.09-2.70) and 10 years (HR 1.45, 95% CI 1.07-1.94). HFrEF was not associated with inflammation for either follow-up time. CONCLUSION Elevated inflammation early in RA diagnosis was associated with HF; this association was driven by HFpEF and not HFrEF, suggesting a window of opportunity for prevention of HFpEF in RA.
Affiliation(s)
- Sicong Huang
  - Brigham and Women’s Hospital and Harvard Medical School
  - Division of Rheumatology, Inflammation, and Immunity
  - Section of Rheumatology
  - Veterans Administration Boston Healthcare System
- Tianrun Cai
  - Brigham and Women’s Hospital and Harvard Medical School
  - Division of Rheumatology, Inflammation, and Immunity
  - Veterans Administration Boston Healthcare System
- Brittany N. Weber
  - Brigham and Women’s Hospital and Harvard Medical School
  - Cardiovascular Division
- Zeling He
  - Brigham and Women’s Hospital and Harvard Medical School
  - Division of Rheumatology, Inflammation, and Immunity
- Kumar P. Dahal
  - Brigham and Women’s Hospital and Harvard Medical School
  - Division of Rheumatology, Inflammation, and Immunity
  - Veterans Administration Boston Healthcare System
- Chuan Hong
  - Veterans Administration Boston Healthcare System
  - Department of Biomedical Informatics, Harvard Medical School
  - Biostatistics, Harvard T.H. Chan School of Public Health
- Jue Hou
  - Veterans Administration Boston Healthcare System
  - Biostatistics, Harvard T.H. Chan School of Public Health
- Thany Seyok
  - Brigham and Women’s Hospital and Harvard Medical School
  - Division of Rheumatology, Inflammation, and Immunity
- Andrew Cagan
  - Brigham and Women’s Hospital and Harvard Medical School
  - Research Information Science and Computing, Mass General Brigham
- Marcelo F. DiCarli
  - Brigham and Women’s Hospital and Harvard Medical School
  - Cardiovascular Division
- Jacob Joseph
  - Brigham and Women’s Hospital and Harvard Medical School
  - Veterans Administration Boston Healthcare System
  - Cardiovascular Division
- Seoyoung C. Kim
  - Brigham and Women’s Hospital and Harvard Medical School
  - Division of Rheumatology, Inflammation, and Immunity
  - Division of Pharmacoepidemiology and Pharmacoeconomics
- Daniel H. Solomon
  - Brigham and Women’s Hospital and Harvard Medical School
  - Division of Rheumatology, Inflammation, and Immunity
- Tianxi Cai
  - Veterans Administration Boston Healthcare System
  - Department of Biomedical Informatics, Harvard Medical School
  - Biostatistics, Harvard T.H. Chan School of Public Health
- Katherine P. Liao
  - Brigham and Women’s Hospital and Harvard Medical School
  - Division of Rheumatology, Inflammation, and Immunity
  - Section of Rheumatology
  - Veterans Administration Boston Healthcare System
  - Department of Biomedical Informatics, Harvard Medical School
5
Le TD, Noumeir R, Rambaud J, Sans G, Jouvet P. Detecting of a Patient's Condition From Clinical Narratives Using Natural Language Representation. IEEE Open Journal of Engineering in Medicine and Biology 2022; 3:142-149. [PMID: 36712317 PMCID: PMC9870264 DOI: 10.1109/ojemb.2022.3209900] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/27/2022] [Revised: 06/27/2022] [Accepted: 09/22/2022] [Indexed: 02/01/2023] Open
Abstract
The rapid progress in clinical data management systems and artificial intelligence approaches enables the era of personalized medicine. Intensive care units (ICUs) are ideal clinical research environments for such development because they collect large amounts of clinical data and are highly computerized. Goal: We designed a retrospective clinical study on a prospective ICU database using clinical natural language to help in the early diagnosis of heart failure in critically ill children. Methods: The methodology consisted of empirical experiments with a learning algorithm to learn the hidden interpretation and presentation of the French clinical note data. This study included 1386 patients' clinical notes with 5444 single lines of notes. There were 1941 positive cases (36% of total) and 3503 negative cases classified by two independent physicians using a standardized approach. Results: The multilayer perceptron neural network outperformed other discriminative and generative classifiers. Consequently, the proposed framework yields an overall classification performance with 89% accuracy, 88% recall, and 89% precision. Conclusions: This study successfully applied learning representation and machine learning algorithms to detect heart failure in a single French institution from clinical natural language. Further work is needed to use the same methodology in other languages and institutions.
Affiliation(s)
- Thanh-Dung Le
  - Biomedical Information Processing Lab, École de Technologie Supérieure, University of Québec, Montreal, QB H3G 1M8, Canada
  - Research Center at CHU Sainte-Justine Hospital, University of Montreal, Montreal, QB H3T 1J4, Canada
- Rita Noumeir
  - Biomedical Information Processing Lab, École de Technologie Supérieure, University of Québec, Montreal, QB H3G 1M8, Canada
- Jerome Rambaud
  - Research Center at CHU Sainte-Justine Hospital, University of Montreal, Montreal, QB H3T 1J4, Canada
- Guillaume Sans
  - Research Center at CHU Sainte-Justine Hospital, University of Montreal, Montreal, QB H3T 1J4, Canada
- Philippe Jouvet
  - Research Center at CHU Sainte-Justine Hospital, University of Montreal, Montreal, QB H3T 1J4, Canada
6
Singh P, Haimovich J, Reeder C, Khurshid S, Lau ES, Cunningham JW, Philippakis A, Anderson CD, Ho JE, Lubitz SA, Batra P. One Clinician Is All You Need-Cardiac Magnetic Resonance Imaging Measurement Extraction: Deep Learning Algorithm Development. JMIR Med Inform 2022; 10:e38178. [PMID: 35960155 PMCID: PMC9526125 DOI: 10.2196/38178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 07/22/2022] [Accepted: 08/11/2022] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Cardiac magnetic resonance imaging (CMR) is a powerful diagnostic modality that provides detailed quantitative assessment of cardiac anatomy and function. Automated extraction of CMR measurements from clinical reports that are typically stored as unstructured text in electronic health record systems would facilitate their use in research. Existing machine learning approaches either rely on large quantities of expert annotation or require the development of engineered rules that are time-consuming and are specific to the setting in which they were developed. OBJECTIVE We hypothesize that the use of pretrained transformer-based language models may enable label-efficient numerical extraction from clinical text without the need for heuristics or large quantities of expert annotations. Here, we fine-tuned pretrained transformer-based language models on a small quantity of CMR annotations to extract 21 CMR measurements. We assessed the effect of clinical pretraining to reduce labeling needs and explored alternative representations of numerical inputs to improve performance. METHODS Our study sample comprised 99,252 patients that received longitudinal cardiology care in a multi-institutional health care system. There were 12,720 available CMR reports from 9280 patients. We adapted PRAnCER (Platform Enabling Rapid Annotation for Clinical Entity Recognition), an annotation tool for clinical text, to collect annotations from a study clinician on 370 reports. We experimented with 5 different representations of numerical quantities and several model weight initializations. We evaluated extraction performance using macroaveraged F1-scores across the measurements of interest. We applied the best-performing model to extract measurements from the remaining CMR reports in the study sample and evaluated established associations between selected extracted measures with clinical outcomes to demonstrate validity. 
RESULTS All combinations of weight initializations and numerical representations obtained excellent performance on the gold-standard test set, suggesting that transformer models fine-tuned on a small set of annotations can effectively extract numerical quantities. Our results further indicate that custom numerical representations did not appear to have a significant impact on extraction performance. The best-performing model achieved a macroaveraged F1-score of 0.957 across the evaluated CMR measurements (range 0.92 for the lowest-performing measure of left atrial anterior-posterior dimension to 1.0 for the highest-performing measures of left ventricular end systolic volume index and left ventricular end systolic diameter). Application of the best-performing model to the study cohort yielded 136,407 measurements from all available reports in the study sample. We observed expected associations between extracted left ventricular mass index, left ventricular ejection fraction, and right ventricular ejection fraction with clinical outcomes like atrial fibrillation, heart failure, and mortality. CONCLUSIONS This study demonstrated that a domain-agnostic pretrained transformer model is able to effectively extract quantitative clinical measurements from diagnostic reports with a relatively small number of gold-standard annotations. The proposed workflow may serve as a roadmap for other quantitative entity extraction.
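The macroaveraged F1-score reported in this abstract is the unweighted mean of the per-measurement F1-scores, so a rarely reported measurement counts as much as a common one. A minimal sketch, assuming per-measurement counts of true positives, false positives, and false negatives are available; the measurement names and counts are made up for illustration and are not the study's data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: dict) -> float:
    """Unweighted mean of per-class F1, where counts maps name -> (tp, fp, fn)."""
    if not counts:
        raise ValueError("counts must not be empty")
    return sum(f1_score(*c) for c in counts.values()) / len(counts)


if __name__ == "__main__":
    # Hypothetical counts for two measurements (not from the paper).
    example = {
        "lv_ejection_fraction": (10, 0, 0),
        "la_dimension": (5, 5, 5),
    }
    print(macro_f1(example))
```

Microaveraging would instead pool the counts before computing F1, which lets frequent measurements dominate the score.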
Affiliation(s)
- Pulkit Singh
  - Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Julian Haimovich
  - Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
  - Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Christopher Reeder
  - Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Shaan Khurshid
  - Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
  - Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, United States
- Emily S Lau
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
  - Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Jonathan W Cunningham
  - Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
  - Division of Cardiology, Brigham and Women's Hospital, Boston, MA, United States
- Anthony Philippakis
  - Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
  - Eric and Wendy Schmidt Center, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Christopher D Anderson
  - Department of Neurology, Brigham and Women's Hospital, Boston, MA, United States
  - Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, MA, United States
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, United States
- Jennifer E Ho
  - Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
  - CardioVascular Institute and Division of Cardiology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, United States
- Steven A Lubitz
  - Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
  - Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
  - Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, United States
- Puneet Batra
  - Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
7
Hou J, Zhao R, Cai T, Beaulieu-Jones B, Seyok T, Dahal K, Yuan Q, Xiong X, Bonzel CL, Fox C, Christiani DC, Jemielita T, Liao KP, Liaw KL, Cai T. Temporal Trends in Clinical Evidence of 5-Year Survival Within Electronic Health Records Among Patients With Early-Stage Colon Cancer Managed With Laparoscopy-Assisted Colectomy vs Open Colectomy. JAMA Netw Open 2022; 5:e2218371. [PMID: 35737384 PMCID: PMC9227003 DOI: 10.1001/jamanetworkopen.2022.18371] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
Abstract
IMPORTANCE Temporal shifts in clinical knowledge and practice need to be adjusted for in treatment outcome assessment in clinical evidence. OBJECTIVE To use electronic health record (EHR) data to (1) assess the temporal trends in treatment decisions and patient outcomes and (2) emulate a randomized clinical trial (RCT) using EHR data with proper adjustment for temporal trends. DESIGN, SETTING, AND PARTICIPANTS The Clinical Outcomes of Surgical Therapy (COST) Study Group Trial assessing overall survival of patients with stages I to III early-stage colon cancer was chosen as the target trial. The RCT was emulated using EHR data of patients from a single health care system cohort who underwent colectomy for early-stage colon cancer from January 1, 2006, to December 31, 2017, and were followed up to January 1, 2020, from Mass General Brigham. Analyses were conducted from December 2, 2019, to January 24, 2022. EXPOSURES Laparoscopy-assisted colectomy (LAC) vs open colectomy (OC). MAIN OUTCOMES AND MEASURES The primary outcome was 5-year overall survival. To address confounding in the emulation, pretreatment variables were selected and adjusted. The temporal trends were adjusted by stratification of the calendar year when the colectomies were performed with cotraining across strata. 
RESULTS A total of 943 patients met key RCT eligibility criteria in the EHR emulation cohort, including 518 undergoing LAC (median age, 63 [range, 20-95] years; 268 [52%] women; 121 [23%] with stage I, 165 [32%] with stage II, and 232 [45%] with stage III cancer; 32 [6%] with colon adhesion; 278 [54%] with right-sided colon cancer; 18 [3%] with left-sided colon cancer; and 222 [43%] with sigmoid colon cancer) and 425 undergoing OC (median age, 65 [range, 28-99] years; 223 [52%] women; 61 [14%] with stage I, 153 [36%] with stage II, and 211 [50%] with stage III cancer; 39 [9%] with colon adhesion; 202 [47%] with right-sided colon cancer; 39 [9%] with left-sided colon cancer; and 201 [47%] with sigmoid colon cancer). Tests for temporal trends in treatment assignment (χ2 = 60.3; P < .001) and overall survival (χ2 = 137.2; P < .001) were significant. The adjusted EHR emulation reached the same conclusion as the RCT: LAC is not inferior to OC in overall survival rate with risk difference at 5 years of -0.007 (95% CI, -0.070 to 0.057). The results were consistent for stratified analysis within each temporal period. CONCLUSIONS AND RELEVANCE These findings suggest that confounding bias from temporal trends should be considered when conducting clinical evidence studies with long time spans. Stratification of calendar time and cotraining of models is one solution. With proper adjustment, clinical evidence may supplement RCTs in the assessment of treatment outcome over time.
Collapse
Affiliation(s)
- Jue Hou
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- Rachel Zhao
  - Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Tianrun Cai
  - Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts
- Brett Beaulieu-Jones
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Thany Seyok
  - Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts
- Kumar Dahal
  - Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts
- Qianyu Yuan
  - Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- Xin Xiong
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- Clara-Lea Bonzel
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- David C. Christiani
  - Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- Katherine P. Liao
  - Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Tianxi Cai
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
|
8
|
Artificial Intelligence in Clinical Immunology. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_83] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
9
|
de Oliveira JM, da Costa CA, Antunes RS. Data structuring of electronic health records: a systematic review. Health and Technology 2021. [DOI: 10.1007/s12553-021-00607-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/10/2023]
|
10
|
Yuan Q, Cai T, Hong C, Du M, Johnson BE, Lanuti M, Cai T, Christiani DC. Performance of a Machine Learning Algorithm Using Electronic Health Record Data to Identify and Estimate Survival in a Longitudinal Cohort of Patients With Lung Cancer. JAMA Netw Open 2021; 4:e2114723. [PMID: 34232304 PMCID: PMC8264641 DOI: 10.1001/jamanetworkopen.2021.14723] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Indexed: 01/21/2023]
Abstract
IMPORTANCE Electronic health records (EHRs) provide a low-cost means of accessing detailed longitudinal clinical data for large populations. A lung cancer cohort assembled from EHR data would be a powerful platform for clinical outcome studies. OBJECTIVE To investigate whether a clinical cohort assembled from EHRs could be used in a lung cancer prognosis study. DESIGN, SETTING, AND PARTICIPANTS In this cohort study, patients with lung cancer were identified among 76 643 patients with at least 1 lung cancer diagnostic code deposited in an EHR in Mass General Brigham health care system from July 1988 to October 2018. Patients were identified via a semisupervised machine learning algorithm, for which clinical information was extracted from structured and unstructured data via natural language processing tools. Data completeness and accuracy were assessed by comparing with the Boston Lung Cancer Study and against criterion standard EHR review results. A prognostic model for non-small cell lung cancer (NSCLC) overall survival was further developed for clinical application. Data were analyzed from March 2019 through July 2020. EXPOSURES Clinical data deposited in EHRs for cohort construction and variables of interest for the prognostic model were collected. MAIN OUTCOMES AND MEASURES The primary outcomes were the performance of the lung cancer classification model and the quality of the extracted variables; the secondary outcome was the performance of the prognostic model. RESULTS Among 76 643 patients with at least 1 lung cancer diagnostic code, 42 069 patients were identified as having lung cancer, with a positive predictive value of 94.4%. The study cohort consisted of 35 375 patients (16 613 men [47.0%] and 18 756 women [53.0%]; 30 140 White individuals [85.2%], 1040 Black individuals [2.9%], and 857 Asian individuals [2.4%]) after excluding patients with lung cancer history and less than 14 days of follow-up after initial diagnosis. 
The median (interquartile range) age at diagnosis was 66.7 (58.4-74.1) years. The area under the receiver operating characteristic curves of the prognostic model for overall survival with NSCLC were 0.828 (95% CI, 0.815-0.842) for 1-year prediction, 0.825 (95% CI, 0.812-0.836) for 2-year prediction, 0.814 (95% CI, 0.800-0.826) for 3-year prediction, 0.814 (95% CI, 0.799-0.828) for 4-year prediction, and 0.812 (95% CI, 0.798-0.825) for 5-year prediction. CONCLUSIONS AND RELEVANCE These findings suggest the feasibility of assembling a large-scale EHR-based lung cancer cohort with detailed longitudinal clinical measurements and that EHR data may be applied in cancer progression with a set of generalizable approaches.
Affiliation(s)
- Qianyu Yuan
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Tianrun Cai
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Boston, Massachusetts
- Chuan Hong
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Mulong Du
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
  - Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Bruce E. Johnson
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
  - Center for Cancer Genomics, Dana-Farber Cancer Institute, Boston, Massachusetts
- Michael Lanuti
  - Center for Thoracic Cancers, Division of Thoracic Surgery, Massachusetts General Hospital Cancer Center, Boston, Massachusetts
- Tianxi Cai
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- David C. Christiani
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
  - Department of Medicine, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts
|
11
|
Artificial Intelligence in Clinical Immunology. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_83-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|