Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Cai T, Zhang L, Yang N, Kumamaru KK, Rybicki FJ, Cai T, Liao KP. EXTraction of EMR numerical data: an efficient and generalizable tool to EXTEND clinical research. BMC Med Inform Decis Mak 2019;19:226. [PMID: 31730484 PMCID: PMC6858776 DOI: 10.1186/s12911-019-0970-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/25/2019] [Accepted: 11/06/2019] [Indexed: 11/12/2022] Open

Cited by Other Article(s)

Harandi AA, McPherson K, Lo Y, Gutiérrez R, Chao JY. A pragmatic methodology to extract anesthetic and physiological data from the electronic health record. Paediatr Anaesth 2024;34:318-323. [PMID: 38055618 PMCID: PMC10922302 DOI: 10.1111/pan.14817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 11/03/2023] [Accepted: 11/19/2023] [Indexed: 12/08/2023]

Abstract

BACKGROUND/AIMS

Traditional manual methods of extracting anesthetic and physiological data from the electronic health record rely upon visual transcription by a human analyst that can be labor-intensive and prone to error. Technical complexity, relative inexperience in computer coding, and decreased access to data warehouses can deter investigators from obtaining valuable electronic health record data for research studies, especially in under-resourced settings. We therefore aimed to develop, pilot, and demonstrate the effectiveness and utility of a pragmatic data extraction methodology.

METHODS

Expired sevoflurane concentration data from the electronic health record transcribed by eye were compared to an intermediate preprocessing method in which the entire anesthetic flowsheet narrative report was selected, copy-pasted, and processed using only Microsoft Word and Excel software to generate a comma-delimited (.csv) file. A step-by-step description of this method is presented. Concordance rates, Pearson correlation coefficients, and scatterplots with lines of best fit were used to compare the two methods of data extraction.
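
As a minimal sketch of the comparison statistics named above, the following Python computes a concordance rate and a Pearson correlation coefficient for two paired series. The abstract does not define its concordance rate, so the exact-match definition used here, the function names, and the sample values are all assumptions for illustration and not the authors' procedure.

```python
from math import sqrt


def concordance_rate(manual, automated, tol=0.0):
    """Fraction of paired values that agree within `tol` (assumed definition)."""
    if len(manual) != len(automated) or not manual:
        raise ValueError("series must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(manual, automated) if abs(a - b) <= tol)
    return matches / len(manual)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical values, not data from the study.
manual_values = [1.0, 2.0, 3.0, 4.0]
pasted_values = [1.0, 2.0, 3.0, 4.5]

rate = concordance_rate(manual_values, pasted_values)
r = pearson_r(manual_values, pasted_values)
```

Passing a small positive `tol` would treat near-identical readings as concordant, which may suit values that were rounded differently by the two methods.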

RESULTS

A total of 1132 datapoints across eight subjects were analyzed, accounting for 18.9 h of anesthesia time. There was a high concordance rate between data extracted using the two methods (median concordance rate 100%, range [96%, 100%]). The median time required to complete manual data extraction was significantly longer than the time required using the intermediate method (240 seconds, IQR [199, 482.5] vs 92.5 seconds, IQR [69, 99]; p = .01) and was linearly associated with the number of datapoints (r_manual = .97, p < .0001), whereas the time required to complete data extraction using the intermediate approach was independent of the number of datapoints (r_intermediate = -.02, p = .99).

CONCLUSIONS

We describe a pragmatic data extraction methodology that does not require additional software or coding skills intended to enhance the ease, speed, and accuracy of data collection that could assist in clinician investigator-initiated research and quality/process improvement projects.

Chai D, Liu Z, Wang L, Duan H, Zhao C, Xu C, Zhang D, Zhao Q, Ma P. Effectiveness of Medication Reconciliation in a Chinese Hospital: A Pilot Randomized Controlled Trial. J Multidiscip Healthc 2023;16:3641-3650. [PMID: 38034875 PMCID: PMC10683647 DOI: 10.2147/jmdh.s432522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 11/02/2023] [Indexed: 12/02/2023] Open

Abstract

Background

Implementing medication reconciliation (MR) was complex and challenging because of variability in the guidance for conducting it. The MR processes adopted in China were different from those recommended by the World Health Organization. A pilot study was undertaken to inform the design of a future randomized controlled trial to determine the effectiveness of these two workflows.

Methods

Patients taking at least one home/regular medication for hypertension, diabetes, or coronary heart disease were recruited at admission and then randomized using a computer-generated random number in a closed envelope. In the study group, the pharmacist reviewed electronic medical record systems before communicating with patients. In the control group, pharmacists communicated with patients at admission. The time investment of pharmacists for the MR process, the number of unintended medication discrepancies, and physician acceptance were tested as outcome measures.

Results

One hundred and forty adult patients were randomized, of whom 66 patients in the intervention group received MR within 24 hours, while 58 patients in the control group received MR at some point during admission. The most common condition in the study group was hypertension (coronary heart disease in the control group). The workflow of the study group can save an average of 7 minutes per patient compared with the WHO-recommended process [17.5 minutes (IQR 14.00, 28.25) vs 24.5 minutes (IQR 17.75, 35.25), p = 0.004]. The number of unintended discrepancies was 42 in the study group and 34 in the control group (p = 0.33). Physicians' acceptance in the study and control groups was 87.5% and 92.3%, respectively (p = 0.87).

Conclusion

The results suggest that changes in outcome measures were in the appropriate direction and that the time limit for implementing MR can be set within 48 hours. A future multi-centre RCT study to determine the effectiveness of MR is feasible and warranted.

Ma MW, Gao XS, Zhang ZY, Shang SY, Jin L, Liu PL, Lv F, Ni W, Han YC, Zong H. Extracting laboratory test information from paper-based reports. BMC Med Inform Decis Mak 2023;23:251. [PMID: 37932733 PMCID: PMC10629084 DOI: 10.1186/s12911-023-02346-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 10/20/2023] [Indexed: 11/08/2023] Open

Abstract

BACKGROUND

In the healthcare domain today, despite the substantial adoption of electronic health information systems, a significant proportion of medical reports still exist in paper-based formats. As a result, there is a significant demand for the digitization of information from these paper-based reports. However, the digitization of paper-based laboratory reports into a structured data format can be challenging due to their non-standard layouts, which include various data types such as text, numeric values, reference ranges, and units. Therefore, it is crucial to develop a highly scalable and lightweight technique that can effectively identify and extract information from laboratory test reports and convert it into a structured data format for downstream tasks.

METHODS

We developed an end-to-end Natural Language Processing (NLP)-based pipeline for extracting information from paper-based laboratory test reports. Our pipeline consists of two main modules: an optical character recognition (OCR) module and an information extraction (IE) module. The OCR module is applied to locate and identify text from scanned laboratory test reports using state-of-the-art OCR algorithms. The IE module is then used to extract meaningful information from the OCR results to form digitalized tables of the test reports. The IE module consists of five sub-modules, which are time detection, headline position, line normalization, Named Entity Recognition (NER) with a Conditional Random Fields (CRF)-based method, and step detection for multi-column. Finally, we evaluated the performance of the proposed pipeline on 153 laboratory test reports collected from Peking University First Hospital (PKU1).

RESULTS

In the OCR module, we evaluated the accuracy of text detection and recognition results at three different levels and achieved an average accuracy of 0.93. In the IE module, we extracted four laboratory test entities: test item name, test result, test unit, and reference value range. The overall F1 score was 0.86 on the 153 laboratory test reports collected from PKU1. With a single CPU, the average inference time for each report was only 0.78 s.

CONCLUSION

In this study, we developed a practical lightweight pipeline to digitalize and extract information from paper-based laboratory test reports in diverse types and with different layouts that can be adopted in real clinical environments with the lowest possible computing resources requirements. The high evaluation performance on the real-world hospital dataset validated the feasibility of the proposed pipeline.

Huang S, Cai T, Weber BN, He Z, Dahal KP, Hong C, Hou J, Seyok T, Cagan A, DiCarli MF, Joseph J, Kim SC, Solomon DH, Cai T, Liao KP. Association Between Inflammation, Incident Heart Failure, and Heart Failure Subtypes in Patients With Rheumatoid Arthritis. Arthritis Care Res (Hoboken) 2023;75:1036-1045. [PMID: 34623035 PMCID: PMC8989720 DOI: 10.1002/acr.24804] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 06/24/2021] [Revised: 09/27/2021] [Accepted: 10/05/2021] [Indexed: 12/14/2022]

Abstract

OBJECTIVE

In rheumatoid arthritis (RA), there are limited data on risk factors for the clinical heart failure (HF) subtypes of HF with reduced ejection fraction (HFrEF) and HF with preserved ejection fraction (HFpEF). This study examined the association between inflammation and incident HF subtypes in RA. Because inflammation changes over time with disease activity, we hypothesized that the effect of inflammation may be stronger at the 5-year follow-up than at the standard 10-year follow-up from general population studies of cardiovascular risk.

METHODS

We studied an electronic health record (EHR)-based RA cohort with data pre- and post-RA incidence. We applied a validated approach to identify HF and extract ejection fraction to classify HFrEF and HFpEF. Follow-up ran from the RA incidence date (index date) to the earliest occurrence of incident HF, death, last EHR encounter, or 10 years. Baseline inflammation was assessed using erythrocyte sedimentation rate or C-reactive protein values. Covariates included demographic characteristics, established HF risk factors, and RA-related factors. We tested the association of baseline inflammation with incident HF and its subtypes using Cox proportional hazards models.

RESULTS

We studied 9,087 patients with RA; 8.2% developed HF during 10 years of follow-up. Elevated inflammation was associated with increased risk for HF at both 5- and 10-year follow-ups (hazard ratio [HR] 1.66, 95% confidence interval [95% CI] 1.12-2.46 and HR 1.46, 95% CI 1.13-1.90, respectively), which is also seen for HFpEF at 5 years (HR 1.72, 95% CI 1.09-2.70) and 10 years (HR 1.45, 95% CI 1.07-1.94). HFrEF was not associated with inflammation for either follow-up time.

CONCLUSION

Elevated inflammation early in RA diagnosis was associated with HF; this association was driven by HFpEF and not HFrEF, suggesting a window of opportunity for prevention of HFpEF in RA.

Affiliation(s)

Sicong Huang Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity Section of Rheumatology Veterans Administration Boston Healthcare System
Tianrun Cai Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity Veterans Administration Boston Healthcare System
Brittany N. Weber Brigham and Women’s Hospital and Harvard Medical School Cardiovascular Division
Zeling He Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity
Kumar P. Dahal Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity Veterans Administration Boston Healthcare System
Chuan Hong Veterans Administration Boston Healthcare System Department of Biomedical Informatics, Harvard Medical School Biostatistics, Harvard T.H. Chan School of Public Health
Jue Hou Veterans Administration Boston Healthcare System Biostatistics, Harvard T.H. Chan School of Public Health
Thany Seyok Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity
Andrew Cagan Brigham and Women’s Hospital and Harvard Medical School Research Information Science and Computing, Mass General Brigham
Marcelo F. DiCarli Brigham and Women’s Hospital and Harvard Medical School Cardiovascular Division
Jacob Joseph Brigham and Women’s Hospital and Harvard Medical School Veterans Administration Boston Healthcare System Cardiovascular Division
Seoyoung C. Kim Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity Division of Pharmacoepidemiology and Pharmacoeconomics
Daniel H. Solomon Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity
Tianxi Cai Veterans Administration Boston Healthcare System Department of Biomedical Informatics, Harvard Medical School Biostatistics, Harvard T.H. Chan School of Public Health
Katherine P. Liao Brigham and Women’s Hospital and Harvard Medical School Division of Rheumatology, Inflammation, and Immunity Section of Rheumatology Veterans Administration Boston Healthcare System Department of Biomedical Informatics, Harvard Medical School

Le TD, Noumeir R, Rambaud J, Sans G, Jouvet P. Detecting of a Patient's Condition From Clinical Narratives Using Natural Language Representation. IEEE Open Journal of Engineering in Medicine and Biology 2022;3:142-149. [PMID: 36712317 PMCID: PMC9870264 DOI: 10.1109/ojemb.2022.3209900] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/27/2022] [Revised: 06/27/2022] [Accepted: 09/22/2022] [Indexed: 02/01/2023] Open

Singh P, Haimovich J, Reeder C, Khurshid S, Lau ES, Cunningham JW, Philippakis A, Anderson CD, Ho JE, Lubitz SA, Batra P. One Clinician Is All You Need-Cardiac Magnetic Resonance Imaging Measurement Extraction: Deep Learning Algorithm Development. JMIR Med Inform 2022;10:e38178. [PMID: 35960155 PMCID: PMC9526125 DOI: 10.2196/38178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 07/22/2022] [Accepted: 08/11/2022] [Indexed: 11/23/2022] Open

Abstract

BACKGROUND

Cardiac magnetic resonance imaging (CMR) is a powerful diagnostic modality that provides detailed quantitative assessment of cardiac anatomy and function. Automated extraction of CMR measurements from clinical reports that are typically stored as unstructured text in electronic health record systems would facilitate their use in research. Existing machine learning approaches either rely on large quantities of expert annotation or require the development of engineered rules that are time-consuming and are specific to the setting in which they were developed.

OBJECTIVE

We hypothesize that the use of pretrained transformer-based language models may enable label-efficient numerical extraction from clinical text without the need for heuristics or large quantities of expert annotations. Here, we fine-tuned pretrained transformer-based language models on a small quantity of CMR annotations to extract 21 CMR measurements. We assessed the effect of clinical pretraining to reduce labeling needs and explored alternative representations of numerical inputs to improve performance.

METHODS

Our study sample comprised 99,252 patients that received longitudinal cardiology care in a multi-institutional health care system. There were 12,720 available CMR reports from 9280 patients. We adapted PRAnCER (Platform Enabling Rapid Annotation for Clinical Entity Recognition), an annotation tool for clinical text, to collect annotations from a study clinician on 370 reports. We experimented with 5 different representations of numerical quantities and several model weight initializations. We evaluated extraction performance using macroaveraged F1-scores across the measurements of interest. We applied the best-performing model to extract measurements from the remaining CMR reports in the study sample and evaluated established associations between selected extracted measures with clinical outcomes to demonstrate validity.
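
The macroaveraged F1-score mentioned above can be illustrated with a short Python sketch: an F1-score is computed separately for each measurement and the per-measurement scores are then averaged with equal weight. The measurement names and counts below are hypothetical, and the function names are assumptions for illustration rather than the authors' evaluation code.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    denominator = 2 * tp + fp + fn
    # With no predictions and no gold labels, F1 is undefined; return 0.0 here.
    return 0.0 if denominator == 0 else 2 * tp / denominator


def macro_f1(counts_by_label):
    """Unweighted mean of per-label F1-scores.

    `counts_by_label` maps a label to a (tp, fp, fn) tuple.
    """
    if not counts_by_label:
        raise ValueError("at least one label is required")
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in counts_by_label.values()]
    return sum(scores) / len(scores)


# Hypothetical counts for two measurements, not results from the study.
example_counts = {
    "measurement_a": (8, 2, 2),  # F1 = 16 / 20 = 0.8
    "measurement_b": (5, 0, 5),  # F1 = 10 / 15, about 0.667
}

score = macro_f1(example_counts)
```

Because each measurement contributes equally, a rarely reported measurement with poor extraction lowers the macroaverage as much as a common one would.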

RESULTS

All combinations of weight initializations and numerical representations obtained excellent performance on the gold-standard test set, suggesting that transformer models fine-tuned on a small set of annotations can effectively extract numerical quantities. Our results further indicate that custom numerical representations did not appear to have a significant impact on extraction performance. The best-performing model achieved a macroaveraged F1-score of 0.957 across the evaluated CMR measurements (range 0.92 for the lowest-performing measure of left atrial anterior-posterior dimension to 1.0 for the highest-performing measures of left ventricular end systolic volume index and left ventricular end systolic diameter). Application of the best-performing model to the study cohort yielded 136,407 measurements from all available reports in the study sample. We observed expected associations between extracted left ventricular mass index, left ventricular ejection fraction, and right ventricular ejection fraction with clinical outcomes like atrial fibrillation, heart failure, and mortality.

CONCLUSIONS

This study demonstrated that a domain-agnostic pretrained transformer model is able to effectively extract quantitative clinical measurements from diagnostic reports with a relatively small number of gold-standard annotations. The proposed workflow may serve as a roadmap for other quantitative entity extraction.

Affiliation(s)

Pulkit Singh Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
Julian Haimovich Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States; Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
Christopher Reeder Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
Shaan Khurshid Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States; Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, United States
Emily S Lau Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States; Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
Jonathan W Cunningham Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States; Division of Cardiology, Brigham and Women's Hospital, Boston, MA, United States
Anthony Philippakis Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States; Eric and Wendy Schmidt Center, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
Christopher D Anderson Department of Neurology, Brigham and Women's Hospital, Boston, MA, United States; Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, MA, United States; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, United States
Jennifer E Ho Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States; CardioVascular Institute and Division of Cardiology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, United States
Steven A Lubitz Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States; Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States; Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, United States
Puneet Batra Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States

Hou J, Zhao R, Cai T, Beaulieu-Jones B, Seyok T, Dahal K, Yuan Q, Xiong X, Bonzel CL, Fox C, Christiani DC, Jemielita T, Liao KP, Liaw KL, Cai T. Temporal Trends in Clinical Evidence of 5-Year Survival Within Electronic Health Records Among Patients With Early-Stage Colon Cancer Managed With Laparoscopy-Assisted Colectomy vs Open Colectomy. JAMA Netw Open 2022;5:e2218371. [PMID: 35737384 PMCID: PMC9227003 DOI: 10.1001/jamanetworkopen.2022.18371] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open

Abstract

IMPORTANCE

Temporal shifts in clinical knowledge and practice need to be adjusted for in treatment outcome assessment in clinical evidence.

OBJECTIVE

To use electronic health record (EHR) data to (1) assess the temporal trends in treatment decisions and patient outcomes and (2) emulate a randomized clinical trial (RCT) using EHR data with proper adjustment for temporal trends.

DESIGN, SETTING, AND PARTICIPANTS

The Clinical Outcomes of Surgical Therapy (COST) Study Group Trial assessing overall survival of patients with stages I to III early-stage colon cancer was chosen as the target trial. The RCT was emulated using EHR data of patients from a single health care system cohort who underwent colectomy for early-stage colon cancer from January 1, 2006, to December 31, 2017, and were followed up to January 1, 2020, from Mass General Brigham. Analyses were conducted from December 2, 2019, to January 24, 2022.

EXPOSURES

Laparoscopy-assisted colectomy (LAC) vs open colectomy (OC).

MAIN OUTCOMES AND MEASURES

The primary outcome was 5-year overall survival. To address confounding in the emulation, pretreatment variables were selected and adjusted. The temporal trends were adjusted by stratification of the calendar year when the colectomies were performed with cotraining across strata.

RESULTS

A total of 943 patients met key RCT eligibility criteria in the EHR emulation cohort, including 518 undergoing LAC (median age, 63 [range, 20-95] years; 268 [52%] women; 121 [23%] with stage I, 165 [32%] with stage II, and 232 [45%] with stage III cancer; 32 [6%] with colon adhesion; 278 [54%] with right-sided colon cancer; 18 [3%] with left-sided colon cancer; and 222 [43%] with sigmoid colon cancer) and 425 undergoing OC (median age, 65 [range, 28-99] years; 223 [52%] women; 61 [14%] with stage I, 153 [36%] with stage II, and 211 [50%] with stage III cancer; 39 [9%] with colon adhesion; 202 [47%] with right-sided colon cancer; 39 [9%] with left-sided colon cancer; and 201 [47%] with sigmoid colon cancer). Tests for temporal trends in treatment assignment (χ² = 60.3; P < .001) and overall survival (χ² = 137.2; P < .001) were significant. The adjusted EHR emulation reached the same conclusion as the RCT: LAC is not inferior to OC in overall survival rate with risk difference at 5 years of -0.007 (95% CI, -0.070 to 0.057). The results were consistent for stratified analysis within each temporal period.

CONCLUSIONS AND RELEVANCE

These findings suggest that confounding bias from temporal trends should be considered when conducting clinical evidence studies with long time spans. Stratification of calendar time and cotraining of models is one solution. With proper adjustment, clinical evidence may supplement RCTs in the assessment of treatment outcome over time.

Affiliation(s)

Jue Hou Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
Rachel Zhao Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
Tianrun Cai Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts
Brett Beaulieu-Jones Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
Thany Seyok Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts
Kumar Dahal Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts
Qianyu Yuan Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
Xin Xiong Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
Clara-Lea Bonzel Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
Claire Fox Merck & Co, Inc, Kenilworth, New Jersey
David C. Christiani Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
Thomas Jemielita Merck & Co, Inc, Kenilworth, New Jersey
Katherine P. Liao Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
Kai-Li Liaw Merck & Co, Inc, Kenilworth, New Jersey
Tianxi Cai Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts

Artificial Intelligence in Clinical Immunology. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_83] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

de Oliveira JM, da Costa CA, Antunes RS. Data structuring of electronic health records: a systematic review. Health and Technology 2021. [DOI: 10.1007/s12553-021-00607-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/10/2023]

Yuan Q, Cai T, Hong C, Du M, Johnson BE, Lanuti M, Cai T, Christiani DC. Performance of a Machine Learning Algorithm Using Electronic Health Record Data to Identify and Estimate Survival in a Longitudinal Cohort of Patients With Lung Cancer. JAMA Netw Open 2021;4:e2114723. [PMID: 34232304 PMCID: PMC8264641 DOI: 10.1001/jamanetworkopen.2021.14723] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Indexed: 01/21/2023] Open

Abstract

IMPORTANCE

Electronic health records (EHRs) provide a low-cost means of accessing detailed longitudinal clinical data for large populations. A lung cancer cohort assembled from EHR data would be a powerful platform for clinical outcome studies.

OBJECTIVE

To investigate whether a clinical cohort assembled from EHRs could be used in a lung cancer prognosis study.

DESIGN, SETTING, AND PARTICIPANTS

In this cohort study, patients with lung cancer were identified among 76 643 patients with at least 1 lung cancer diagnostic code deposited in an EHR in Mass General Brigham health care system from July 1988 to October 2018. Patients were identified via a semisupervised machine learning algorithm, for which clinical information was extracted from structured and unstructured data via natural language processing tools. Data completeness and accuracy were assessed by comparing with the Boston Lung Cancer Study and against criterion standard EHR review results. A prognostic model for non-small cell lung cancer (NSCLC) overall survival was further developed for clinical application. Data were analyzed from March 2019 through July 2020.

EXPOSURES

Clinical data deposited in EHRs for cohort construction and variables of interest for the prognostic model were collected.

MAIN OUTCOMES AND MEASURES

The primary outcomes were the performance of the lung cancer classification model and the quality of the extracted variables; the secondary outcome was the performance of the prognostic model.

RESULTS

Among 76 643 patients with at least 1 lung cancer diagnostic code, 42 069 patients were identified as having lung cancer, with a positive predictive value of 94.4%. The study cohort consisted of 35 375 patients (16 613 men [47.0%] and 18 756 women [53.0%]; 30 140 White individuals [85.2%], 1040 Black individuals [2.9%], and 857 Asian individuals [2.4%]) after excluding patients with lung cancer history and less than 14 days of follow-up after initial diagnosis. The median (interquartile range) age at diagnosis was 66.7 (58.4-74.1) years. The area under the receiver operating characteristic curves of the prognostic model for overall survival with NSCLC were 0.828 (95% CI, 0.815-0.842) for 1-year prediction, 0.825 (95% CI, 0.812-0.836) for 2-year prediction, 0.814 (95% CI, 0.800-0.826) for 3-year prediction, 0.814 (95% CI, 0.799-0.828) for 4-year prediction, and 0.812 (95% CI, 0.798-0.825) for 5-year prediction.

CONCLUSIONS AND RELEVANCE

These findings suggest the feasibility of assembling a large-scale EHR-based lung cancer cohort with detailed longitudinal clinical measurements and that EHR data may be applied in cancer progression with a set of generalizable approaches.

Artificial Intelligence in Clinical Immunology. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_83-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]