1
Foer D, Strasser ZH, Cui J, Cahill KN, Boyce JA, Murphy SN, Karlson EW. Association of GLP-1 Receptor Agonists with Chronic Obstructive Pulmonary Disease Exacerbations among Patients with Type 2 Diabetes. Am J Respir Crit Care Med 2023; 208:1088-1100. [PMID: 37647574 PMCID: PMC10867930 DOI: 10.1164/rccm.202303-0491oc] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/15/2023] [Accepted: 08/30/2023] [Indexed: 09/01/2023] Open
Abstract
Rationale: Patients with chronic obstructive pulmonary disease (COPD) and type 2 diabetes (T2D) have worse clinical outcomes compared with patients without metabolic dysregulation. GLP-1 (glucagon-like peptide 1) receptor agonists (GLP-1RAs) reduce asthma exacerbation risk and improve FVC in patients with COPD. Objectives: To determine whether GLP-1RA use is associated with reduced COPD exacerbation rates, and severe and moderate exacerbation risk, compared with other T2D therapies. Methods: A retrospective, observational, electronic health records-based study was conducted using an active comparator, new-user design of 1,642 patients with COPD in a U.S. health system from 2012 to 2022. The COPD cohort was identified using a previously validated machine learning algorithm that includes a natural language processing tool. Exposures were defined as prescriptions for GLP-1RAs (reference group), DPP-4 (dipeptidyl peptidase 4) inhibitors (DPP-4is), SGLT2 (sodium-glucose cotransporter 2) inhibitors, or sulfonylureas. Measurements and Main Results: Unadjusted COPD exacerbation counts were lower in GLP-1RA users. Adjusted exacerbation rates were significantly higher in DPP-4i (incidence rate ratio, 1.48 [95% confidence interval, 1.08-2.04]; P = 0.02) and sulfonylurea (incidence rate ratio, 2.09 [95% confidence interval, 1.62-2.69]; P < 0.0001) users compared with GLP-1RA users. GLP-1RA use was also associated with significantly reduced risk of severe exacerbations compared with DPP-4i and sulfonylurea use, and of moderate exacerbations compared with sulfonylurea use. After adjustment for clinical covariates, moderate exacerbation risk was also lower in GLP-1RA users compared with DPP-4i users. No statistically significant difference in exacerbation outcomes was seen between GLP-1RA and SGLT2 inhibitor users. Conclusions: Prospective studies of COPD exacerbations in patients with comorbid T2D are warranted. 
Additional research may elucidate the mechanisms underlying these observed associations with T2D medications.
Affiliation(s)
- Dinah Foer
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
- Zachary H. Strasser
  - Harvard Medical School, Boston, Massachusetts
  - MGH Laboratory of Computer Science
  - Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts
- Jing Cui
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
- Katherine N. Cahill
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Joshua A. Boyce
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
- Shawn N. Murphy
  - Harvard Medical School, Boston, Massachusetts
  - MGH Laboratory of Computer Science
  - Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Elizabeth W. Karlson
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
2
Alsaleh MM, Allery F, Choi JW, Hama T, McQuillin A, Wu H, Thygesen JH. Prediction of disease comorbidity using explainable artificial intelligence and machine learning techniques: A systematic review. Int J Med Inform 2023; 175:105088. [PMID: 37156169 DOI: 10.1016/j.ijmedinf.2023.105088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 03/23/2023] [Accepted: 05/01/2023] [Indexed: 05/10/2023]
Abstract
OBJECTIVE Disease comorbidity is a major challenge in healthcare affecting the patient's quality of life and costs. AI-based prediction of comorbidities can overcome this issue by improving precision medicine and providing holistic care. The objective of this systematic literature review was to identify and summarise existing machine learning (ML) methods for comorbidity prediction and evaluate the interpretability and explainability of the models. MATERIALS AND METHODS The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework was used to identify articles in three databases: Ovid Medline, Web of Science and PubMed. The literature search covered a broad range of terms for the prediction of disease comorbidity and ML, including traditional predictive modelling. RESULTS Of 829 unique articles, 58 full-text papers were assessed for eligibility. A final set of 22 articles with 61 ML models was included in this review. Of the identified ML models, 33 models achieved relatively high accuracy (80-95%) and AUC (0.80-0.89). Overall, 72% of studies had high or unclear concerns regarding the risk of bias. DISCUSSION This systematic review is the first to examine the use of ML and explainable artificial intelligence (XAI) methods for comorbidity prediction. The chosen studies focused on a limited scope of comorbidities ranging from 1 to 34 (mean = 6), and no novel comorbidities were found due to limited phenotypic and genetic data. The lack of standard evaluation for XAI hinders fair comparisons. CONCLUSION A broad range of ML methods has been used to predict the comorbidities of various disorders. With further development of explainable ML capacity in the field of comorbidity prediction, there is a significant possibility of identifying unmet health needs by highlighting comorbidities in patient groups that were not previously recognised to be at risk for particular comorbidities.
Affiliation(s)
- Mohanad M Alsaleh
  - Institute of Health Informatics, University College London, London, UK; Department of Health Informatics, College of Public Health and Health Informatics, Qassim University, Al Bukayriyah, Saudi Arabia
- Freya Allery
  - Institute of Health Informatics, University College London, London, UK
- Jung Won Choi
  - Institute of Health Informatics, University College London, London, UK
- Tuankasfee Hama
  - Institute of Health Informatics, University College London, London, UK
- Honghan Wu
  - Institute of Health Informatics, University College London, London, UK
- Johan H Thygesen
  - Institute of Health Informatics, University College London, London, UK
3
Zhuang Y, Xing F, Ghosh D, Hobbs BD, Hersh CP, Banaei-Kashani F, Bowler RP, Kechris K. Deep learning on graphs for multi-omics classification of COPD. PLoS One 2023; 18:e0284563. [PMID: 37083575 PMCID: PMC10121008 DOI: 10.1371/journal.pone.0284563] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/18/2022] [Accepted: 04/03/2023] [Indexed: 04/22/2023] Open
Abstract
Network approaches have successfully been used to help reveal complex mechanisms of diseases including Chronic Obstructive Pulmonary Disease (COPD). However, despite recent advances, we remain limited in our ability to incorporate protein-protein interaction (PPI) network information with omics data for disease prediction. New deep learning methods, including convolutional Graph Neural Networks (ConvGNN), have shown great potential for disease classification using transcriptomics data and known PPI networks from existing databases. In this study, we first reconstructed the COPD-associated PPI network through the AhGlasso (Augmented High-Dimensional Graphical Lasso Method) algorithm based on one independent transcriptomics dataset including COPD cases and controls. Then we extended the existing ConvGNN methods to successfully integrate COPD-associated PPI, proteomics, and transcriptomics data and developed a prediction model for COPD classification. This approach improves accuracy over several conventional classification methods and neural networks that do not incorporate network information. We also demonstrated that the updated COPD-associated network developed using AhGlasso further improves prediction accuracy. Although deep neural networks often achieve superior statistical power in classification compared to other methods, it can be very difficult to explain how the model, especially a graph neural network, makes decisions on the given features and to identify the features that contribute the most to prediction generally and individually. To better explain how the spectral-based Graph Neural Network model works, we applied one unified explainable machine learning method, SHapley Additive exPlanations (SHAP), and identified CXCL11, IL-2, CD48, KIR3DL2, TLR2, BMP10 and several other relevant COPD genes in subnetworks of the ConvGNN model for COPD prediction.
Finally, Gene Ontology (GO) enrichment analysis identified glycosaminoglycan, heparin signaling, and carbohydrate derivative signaling pathways significantly enriched in the top important gene/proteins for COPD classifications.
Affiliation(s)
- Yonghua Zhuang
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
  - Biostatistics Shared Resource, University of Colorado Cancer Center, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
  - Department of Pediatrics, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Fuyong Xing
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Debashis Ghosh
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Brian D. Hobbs
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
  - Harvard Medical School, Boston, MA, United States of America
- Craig P. Hersh
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
  - Harvard Medical School, Boston, MA, United States of America
- Farnoush Banaei-Kashani
  - Department of Computer Science and Engineering, University of Colorado Denver, Denver, CO, United States of America
- Katerina Kechris
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
4
Deep-learning-based prognostic modeling for incident heart failure in patients with diabetes using electronic health records: A retrospective cohort study. PLoS One 2023; 18:e0281878. [PMID: 36809251 PMCID: PMC9943005 DOI: 10.1371/journal.pone.0281878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 02/02/2023] [Indexed: 02/23/2023] Open
Abstract
Patients with type 2 diabetes mellitus (T2DM) have more than twice the risk of developing heart failure (HF) compared to patients without diabetes. The present study aimed to build an artificial intelligence (AI) prognostic model that takes into account a large and heterogeneous set of clinical factors and investigates the risk of developing HF in diabetic patients. We carried out an electronic health records- (EHR-) based retrospective cohort study that included patients with cardiological clinical evaluation and no previous diagnosis of HF. Information consists of features extracted from clinical and administrative data obtained as part of routine medical care. The primary endpoint was diagnosis of HF (during out-of-hospital clinical examination or hospitalization). We developed two prognostic models using (1) elastic net regularization for Cox proportional hazard model (COX) and (2) a deep neural network survival method (PHNN), in which a neural network was used to represent a non-linear hazard function and explainability strategies were applied to estimate the influence of predictors on the risk function. Over a median follow-up of 65 months, 17.3% of the 10,614 patients developed HF. The PHNN model outperformed COX both in terms of discrimination (c-index 0.768 vs 0.734) and calibration (2-year integrated calibration index 0.008 vs 0.018). The AI approach led to the identification of 20 predictors of different domains (age, body mass index, echocardiographic and electrocardiographic features, laboratory measurements, comorbidities, therapies) whose relationship with the predicted risk corresponds to known trends in clinical practice. Our results suggest that prognostic models for HF in diabetic patients may improve using EHRs in combination with AI techniques for survival analysis, which provide high flexibility and better performance with respect to standard approaches.
5
Diagnostic Performance of a Machine Learning Algorithm (Asthma/Chronic Obstructive Pulmonary Disease [COPD] Differentiation Classification) Tool Versus Primary Care Physicians and Pulmonologists in Asthma, COPD, and Asthma/COPD Overlap. J Allergy Clin Immunol Pract 2023; 11:1463-1474.e3. [PMID: 36716998 DOI: 10.1016/j.jaip.2023.01.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 01/04/2023] [Accepted: 01/05/2023] [Indexed: 01/29/2023]
Abstract
BACKGROUND The differential diagnosis of asthma and chronic obstructive pulmonary disease (COPD) poses a challenge in clinical practice and its misdiagnosis results in inappropriate treatment, increased exacerbations, and potentially death. OBJECTIVE To investigate the diagnostic accuracy of the Asthma/COPD Differentiation Classification (AC/DC) tool compared with primary care physicians and pulmonologists in asthma, COPD, and asthma-COPD overlap. METHODS The AC/DC machine learning-based diagnostic tool was developed using 12 parameters from electronic health records of more than 400,000 patients aged 35 years and older. An expert panel of three pulmonologists and four general practitioners from five countries evaluated 119 patient cases from a prospective observational study and provided a confirmed diagnosis (n = 116) of asthma (n = 53), COPD (n = 43), asthma-COPD overlap (n = 7), or other (n = 13). Cases were then reviewed by 180 primary care physicians and 180 pulmonologists from nine countries and by the AC/DC tool, and diagnostic accuracies were compared with reference to the expert panel diagnoses. RESULTS Average diagnostic accuracy of the AC/DC tool was superior to that of primary care physicians (median difference, 24%; 95% posterior credible interval: 17% to 29%; P < .0001) and was noninferior and superior (median difference, 12%; 95% posterior credible interval: 6% to 17%; P < .0001 for noninferiority and P = .0006 for superiority) to that of pulmonologists. Average diagnostic accuracies were 73%, 50%, and 61% by AC/DC tool, primary care physicians, and pulmonologists versus expert panel diagnosis, respectively. CONCLUSION The AC/DC tool demonstrated superior diagnostic accuracy compared with primary care physicians and pulmonologists in the diagnosis of asthma and COPD in patients aged 35 years and greater and has the potential to support physicians in the diagnosis of these conditions in clinical practice.
6
George M, Camargo CA, Burnette A, Chen Y, Pawar A, Molony C, Auclair M, Wells MA, Ferro TJ. Racial and Ethnic Minorities at the Highest Risk of Uncontrolled Moderate-to-Severe Asthma: A United States Electronic Health Record Analysis. J Asthma Allergy 2023; 16:567-577. [PMID: 37200709 PMCID: PMC10187653 DOI: 10.2147/jaa.s383817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Accepted: 03/31/2023] [Indexed: 05/20/2023] Open
Abstract
Purpose The identification of risk factors associated with uncontrolled moderate-to-severe asthma is important to improve asthma outcomes. The aim of this study was to identify risk factors for uncontrolled asthma in a United States cohort using electronic health record (EHR)-derived data. Patients and Methods In this retrospective real-world study, de-identified data of adolescent and adult patients (≥12 years old) with moderate-to-severe asthma, based on asthma medications within 12 months prior to an asthma-related visit (index date), were extracted from the Optum® Humedica EHR. The baseline period was 12 months prior to the index date. Uncontrolled asthma was defined as ≥2 outpatient oral corticosteroid bursts for asthma or ≥2 emergency department visits or ≥1 inpatient visit for asthma. A Cox proportional hazard model was applied. Results There were 402,403 patients in the EHR between January 1, 2012, and December 31, 2018, who met the inclusion criteria and were analyzed. African American (AA) race (hazard ratio [HR]: 2.08), Medicaid insurance (HR: 1.71), Hispanic ethnicity (HR: 1.34), age of 12 to <18 years (HR: 1.20), body mass index of ≥35 kg/m² (HR: 1.20), and female sex (HR: 1.19) were identified as risk factors associated with uncontrolled asthma (P < 0.001). Comorbidities characterized by type 2 inflammation, including a blood eosinophil count of ≥300 cells/μL (as compared with eosinophil <150 cells/μL; HR: 1.40, P < 0.001) and food allergy (HR: 1.31), were associated with a significantly higher risk of uncontrolled asthma; pneumonia was also a comorbidity associated with an increased risk (HR: 1.35) of uncontrolled asthma. Conversely, allergic rhinitis (HR: 0.84) was associated with a significantly lower risk of uncontrolled asthma. Conclusion This large study demonstrates multiple risk factors for uncontrolled asthma.
Of note, AA and Hispanic individuals with Medicaid insurance are at a significantly higher risk of uncontrolled asthma versus their White, non-Hispanic counterparts with commercial insurance.
Affiliation(s)
- Maureen George
  - Office of Research and Scholarship, Columbia University School of Nursing, New York, NY, USA
  - Correspondence: Maureen George, Office of Research and Scholarship, Columbia University School of Nursing, New York, NY, 10032, USA, Tel +1 212-305-1175
- Carlos A Camargo
  - Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Autumn Burnette
  - Division of Allergy and Immunology, Howard University Hospital, Howard University College of Medicine, Washington, DC, USA
7
Romero D, Blanco-Almazán D, Groenendaal W, Lijnen L, Smeets C, Ruttens D, Catthoor F, Jané R. Predicting 6-minute walking test outcomes in patients with chronic obstructive pulmonary disease without physical performance measures. Comput Methods Programs Biomed 2022; 225:107020. [PMID: 35905697 DOI: 10.1016/j.cmpb.2022.107020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 06/20/2022] [Accepted: 07/10/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVE Chronic obstructive pulmonary disease (COPD) requires a multifactorial assessment, evaluating the airflow limitation and symptoms of the patients. The 6-min walk test (6MWT) is commonly used to evaluate the functional exercise capacity in these patients. This study aims to propose a novel predictive model of the major 6MWT outcomes for COPD assessment, without physical performance measurements. METHODS Cardiopulmonary and clinical parameters were obtained from fifty COPD patients. These parameters were used as inputs of a Bayesian network (BN), which integrated three multivariate models including the 6-min walking distance (6MWD), the maximum HR (HRmax) after the walking, and the HR decay 3 min after (HRR3). The use of BN allows the assessment of the patients' status by predicting the 6MWT outcomes, but also inferring disease severity parameters based on actual patient's 6MWT outcomes. RESULTS Firstly, the correlation obtained between the estimated and actual 6MWT measures was strong (R = 0.84, MAPE = 8.10% for HRmax) and moderate (R = 0.58, MAPE = 15.43% for 6MWD and R = 0.58, MAPE = 32.49% for HRR3), improving the classical methods to estimate 6MWD. Secondly, the classification of disease severity showed an accuracy of 78.3% using three severity groups, which increased up to 84.4% for two defined severity groups. CONCLUSIONS We propose a powerful two-way assessment tool for COPD patients, capable of predicting 6MWT outcomes without the need for an actual walking exercise. This model-based tool opens the way to implement a continuous monitoring system for COPD patients at home and to provide more personalized care.
Affiliation(s)
- Daniel Romero
  - Universitat Politecnica de Catalunya · BarcelonaTech (UPC), Barcelona 08019, Spain; Institute for Bioengineering of Catalonia (IBEC-BIST), Barcelona 08019, Spain; Biomedical Research Networking Center of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid 28029, Spain
- Dolores Blanco-Almazán
  - Universitat Politecnica de Catalunya · BarcelonaTech (UPC), Barcelona 08019, Spain; Institute for Bioengineering of Catalonia (IBEC-BIST), Barcelona 08019, Spain; Biomedical Research Networking Center of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid 28029, Spain
- Raimon Jané
  - Universitat Politecnica de Catalunya · BarcelonaTech (UPC), Barcelona 08019, Spain; Institute for Bioengineering of Catalonia (IBEC-BIST), Barcelona 08019, Spain; Biomedical Research Networking Center of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid 28029, Spain
8
Natural Language Processing in Radiology: Update on Clinical Applications. J Am Coll Radiol 2022; 19:1271-1285. [PMID: 36029890 DOI: 10.1016/j.jacr.2022.06.016] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 02/21/2022] [Revised: 05/25/2022] [Accepted: 06/03/2022] [Indexed: 11/24/2022]
Abstract
Radiological reports are a valuable source of information used to guide clinical care and support research. Organizing and managing this content, however, frequently requires manual curation because the reports are usually unstructured, and manual review of these reports for clinical knowledge extraction is costly and time-consuming. Natural language processing (NLP) is a set of methods developed to extract structured meaning from a body of text and can be used to optimize the workflow of health care professionals. Specifically, NLP methods can help radiologists as decision support systems and improve the management of patients' medical data. In this study, we highlight the opportunities offered by NLP in the field of radiology. A comprehensive review of the most commonly used NLP methods to extract information from radiological reports and the development of tools to improve radiological workflow using this information is presented. Finally, we review the important limitations of these tools and discuss the relevant observations and trends in the application of NLP to radiology that could benefit the field in the future.
9
Ashburner JM, Chang Y, Wang X, Khurshid S, Anderson CD, Dahal K, Weisenfeld D, Cai T, Liao KP, Wagholikar KB, Murphy SN, Atlas SJ, Lubitz SA, Singer DE. Natural Language Processing to Improve Prediction of Incident Atrial Fibrillation Using Electronic Health Records. J Am Heart Assoc 2022; 11:e026014. [PMID: 35904194 PMCID: PMC9375475 DOI: 10.1161/jaha.122.026014] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/06/2022] [Accepted: 06/29/2022] [Indexed: 11/16/2022]
Abstract
Background Models predicting atrial fibrillation (AF) risk, such as Cohorts for Heart and Aging Research in Genomic Epidemiology AF (CHARGE-AF), have not performed as well in electronic health records. Natural language processing (NLP) may improve models by using narrative electronic health record text. Methods and Results From a primary care network, we included patients aged ≥65 years with visits between 2003 and 2013 in development (n=32 960) and internal validation cohorts (n=13 992). An external validation cohort from a separate network from 2015 to 2020 included 39 051 patients. Model features were defined using electronic health record codified data and narrative data with NLP. We developed 2 models to predict 5-year AF incidence using (1) codified+NLP data and (2) codified data only and evaluated model performance. The analysis included 2839 incident AF cases in the development cohort and 1057 and 2226 cases in internal and external validation cohorts, respectively. The C-statistic was greater (P<0.001) in codified+NLP model (0.744 [95% CI, 0.735-0.753]) compared with codified-only (0.730 [95% CI, 0.720-0.739]) in the development cohort. In internal validation, the C-statistic of codified+NLP was modestly higher (0.735 [95% CI, 0.720-0.749]) compared with codified-only (0.729 [95% CI, 0.715-0.744]; P=0.06) and CHARGE-AF (0.717 [95% CI, 0.703-0.731]; P=0.002). Codified+NLP and codified-only were well calibrated, whereas CHARGE-AF underestimated AF risk. In external validation, the C-statistic of codified+NLP (0.750 [95% CI, 0.740-0.760]) remained higher (P<0.001) than codified-only (0.738 [95% CI, 0.727-0.748]) and CHARGE-AF (0.735 [95% CI, 0.725-0.746]). Conclusions Estimation of 5-year risk of AF can be modestly improved using NLP to incorporate narrative electronic health record data.
Affiliation(s)
- Jeffrey M. Ashburner
  - Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
- Yuchiao Chang
  - Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
- Xin Wang
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA
- Shaan Khurshid
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA
  - Division of Cardiology, Massachusetts General Hospital, Boston, MA
- Kumar Dahal
  - Department of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA
- Dana Weisenfeld
  - Department of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA
- Tianrun Cai
  - Harvard Medical School, Boston, MA
  - Department of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA
- Katherine P. Liao
  - Harvard Medical School, Boston, MA
  - Department of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA
- Kavishwar B. Wagholikar
  - Harvard Medical School, Boston, MA
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA
- Shawn N. Murphy
  - Harvard Medical School, Boston, MA
  - Research Information Science and Computing, Mass General Brigham, Somerville, MA
- Steven J. Atlas
  - Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
- Steven A. Lubitz
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA
  - Cardiac Arrhythmia Service, Massachusetts General Hospital, Boston, MA
- Daniel E. Singer
  - Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
10
Boueiz A, Xu Z, Chang Y, Masoomi A, Gregory A, Lutz S, Qiao D, Crapo JD, Dy JG, Silverman EK, Castaldi PJ. Machine Learning Prediction of Progression in Forced Expiratory Volume in 1 Second in the COPDGene® Study. Chronic Obstr Pulm Dis 2022; 9:349-365. [PMID: 35649102 PMCID: PMC9448009 DOI: 10.15326/jcopdf.2021.0275] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 05/18/2022] [Indexed: 05/24/2023]
Abstract
BACKGROUND The heterogeneous nature of chronic obstructive pulmonary disease (COPD) complicates the identification of the predictors of disease progression. We aimed to improve the prediction of disease progression in COPD by using machine learning and incorporating a rich dataset of phenotypic features. METHODS We included 4496 smokers with available data from their enrollment and 5-year follow-up visits in the COPD Genetic Epidemiology (COPDGene®) study. We constructed linear regression (LR) and supervised random forest models to predict 5-year progression in forced expiratory volume in 1 second (FEV1) from 46 baseline features. Using cross-validation, we randomly partitioned participants into training and testing samples. We also validated the results in the COPDGene 10-year follow-up visit. RESULTS Predicting the change in FEV1 over time is more challenging than simply predicting the future absolute FEV1 level. For random forest, R-squared was 0.15 and the area under the receiver operating characteristic (ROC) curve for the prediction of participants in the top quartile of observed progression was 0.71 in testing; the corresponding values in validation were 0.10 and 0.70, respectively. Random forest provided slightly better performance than LR. The accuracy was best for Global initiative for chronic Obstructive Lung Disease (GOLD) grades 1-2 participants, and it was harder to achieve accurate prediction in advanced stages of the disease. Predictive variables differed in their relative importance as well as for the predictions by GOLD. CONCLUSION Random forest, along with deep phenotyping, predicts FEV1 progression with reasonable accuracy. There is significant room for improvement in future models. This prediction model facilitates the identification of smokers at increased risk for rapid disease progression. Such findings may be useful in the selection of patient populations for targeted clinical trials.
Affiliation(s)
- Adel Boueiz
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
  - Pulmonary and Critical Care Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
  - *These authors contributed equally
- Zhonghui Xu
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
  - *These authors contributed equally
- Yale Chang
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts, United States
- Aria Masoomi
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts, United States
- Andrew Gregory
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
- Sharon Lutz
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, Massachusetts, United States
- Dandi Qiao
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
- James D. Crapo
  - Division of Pulmonary Medicine, Department of Medicine, National Jewish Health, Denver, Colorado, United States
- Jennifer G. Dy
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts, United States
- Edwin K. Silverman
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
  - Pulmonary and Critical Care Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
- Peter J. Castaldi
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
  - Division of General Medicine and Primary Care, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
11
Seedahmed MI, Mogilnicka I, Zeng S, Luo G, Whooley MA, McCulloch CE, Koth L, Arjomandi M. Performance of a Computational Phenotyping Algorithm for Sarcoidosis Using Diagnostic Codes in Electronic Medical Records: A Pilot Study from Two Veterans Affairs Medical Centers. JMIR Form Res 2022; 6:e31615. [PMID: 35081036 PMCID: PMC8928044 DOI: 10.2196/31615] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/29/2021] [Revised: 01/24/2022] [Accepted: 01/24/2022] [Indexed: 11/29/2022] Open
Abstract
Background Electronic medical records (EMRs) offer the promise of computationally identifying sarcoidosis cases. However, the accuracy of identifying these cases in the EMR is unknown. Objective The aim of this study is to determine the statistical performance of using the International Classification of Diseases (ICD) diagnostic codes to identify patients with sarcoidosis in the EMR. Methods We used the ICD diagnostic codes to identify sarcoidosis cases by searching the EMRs of the San Francisco and Palo Alto Veterans Affairs medical centers and randomly selecting 200 patients. To improve the diagnostic accuracy of the computational algorithm in cases where histopathological data are unavailable, we developed an index of suspicion to identify cases with a high index of suspicion for sarcoidosis (confirmed and probable) based on clinical and radiographic features alone using the American Thoracic Society practice guideline. Through medical record review, we determined the positive predictive value (PPV) of diagnosing sarcoidosis by two computational methods: using ICD codes alone and using ICD codes plus the high index of suspicion. Results Among the 200 patients, 158 (79%) had a high index of suspicion for sarcoidosis. Of these 158 patients, 142 (89.9%) had documentation of nonnecrotizing granuloma, confirming biopsy-proven sarcoidosis. The PPV of using ICD codes alone was 79% (95% CI 78.6%-80.5%) for identifying sarcoidosis cases and 71% (95% CI 64.7%-77.3%) for identifying histopathologically confirmed sarcoidosis in the EMRs. The inclusion of the generated high index of suspicion to identify confirmed sarcoidosis cases increased the PPV significantly to 100% (95% CI 96.5%-100%). Histopathology documentation alone was 90% sensitive compared with high index of suspicion. Conclusions ICD codes are reasonable classifiers for identifying sarcoidosis cases within EMRs with a PPV of 79%. Using a computational algorithm to capture index of suspicion data elements could significantly improve the case-identification accuracy.
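As a worked example of the PPV arithmetic, the sketch below recomputes the PPV for histopathologically confirmed sarcoidosis from the counts given in the abstract (142 confirmed among the 200 ICD-identified patients). It assumes a normal-approximation (Wald) interval; the abstract does not name the interval method, and the function name `wald_interval` is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts taken from the abstract: 142 biopsy-confirmed cases among 200 patients.
ppv, lower, upper = wald_interval(142, 200)
print(f"PPV = {ppv:.1%} (95% CI {lower:.1%}-{upper:.1%})")
```

Under that assumption the interval rounds to 64.7%-77.3%, which agrees with the figures reported for the 71% PPV.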
Affiliation(s)
- Mohamed Ismail Seedahmed
  - Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, 513 Parnassus Ave. HSE 1314, Box 0111, San Francisco, US
  - San Francisco Veterans Affairs Medical Center, 4150 Clement Street, San Francisco, US
- Izabella Mogilnicka
  - Department of Experimental Physiology and Pathophysiology, Laboratory of the Centre for Preclinical Research, Medical University of Warsaw, Warsaw, PL
- Siyang Zeng
  - San Francisco Veterans Affairs Medical Center, 4150 Clement Street, San Francisco, US
  - Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, US
- Gang Luo
  - Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, US
- Mary A Whooley
  - San Francisco Veterans Affairs Medical Center, 4150 Clement Street, San Francisco, US
  - Department of Medicine, University of California San Francisco, San Francisco, US
  - Measurement Science Quality Enhancement Research Initiative, San Francisco Veterans Affairs Healthcare System, San Francisco, US
- Charles E McCulloch
  - Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, US
- Laura Koth
  - Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, 513 Parnassus Ave. HSE 1314, Box 0111, San Francisco, US
- Mehrdad Arjomandi
  - San Francisco Veterans Affairs Medical Center, 4150 Clement Street, San Francisco, US
  - Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, 513 Parnassus Ave. HSE 1314, Box 0111, San Francisco, US
12
Sivasankar S, Cheng AL, Hoffman M. Ranking Methodology to Evaluate the Severity of a Quality Gap Using a National EHR Database. AMIA Jt Summits Transl Sci Proc 2021; 2021:565-574. [PMID: 34457172 PMCID: PMC8378648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Selecting quality improvement (QI) projects can often be a reactive process. To demonstrate a data-driven strategy, we used multi-site, de-identified electronic health record (EHR) data to prioritize the severity of a quality concern: inappropriate A1c test orders for sickle cell disease (SCD) patients in two randomly chosen facilities (Facility A and Facility B). The best linear unbiased predictions (BLUPs) from a generalized linear mixed model (GLMM) were estimated for all 393 facilities with 37,151 SCD patients in the Cerner Health Facts™ (HF) data warehouse, based on the ratio of inappropriate A1c orders. Ranking the BLUPs after applying the GLMM indicates that Facility A, in the second quartile, may not have a quality gap as significant as Facility B, in the top quartile, for this quality concern. This study illustrates the utility of multisite EHR data for evaluating QI projects and the utility of GLMM to enable this analysis.
Affiliation(s)
- Shivani Sivasankar
  - University of Missouri-Kansas City School of Medicine, MO
  - Children's Mercy Hospital, Kansas City, MO
- An-Lin Cheng
  - University of Missouri-Kansas City School of Medicine, MO
- Mark Hoffman
  - University of Missouri-Kansas City School of Medicine, MO
  - Children's Mercy Hospital, Kansas City, MO
13
Foer D, Beeler PE, Cui J, Karlson EW, Bates DW, Cahill KN. Asthma Exacerbations in Patients with Type 2 Diabetes and Asthma on Glucagon-like Peptide-1 Receptor Agonists. Am J Respir Crit Care Med 2021; 203:831-840. [PMID: 33052715 PMCID: PMC8017590 DOI: 10.1164/rccm.202004-0993oc] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Received: 04/08/2020] [Accepted: 10/13/2020] [Indexed: 12/22/2022] Open
Abstract
Rationale: GLP-1R (glucagon-like peptide-1 receptor) agonists are approved to treat type 2 diabetes mellitus and obesity. GLP-1R agonists reduce airway inflammation and hyperresponsiveness in preclinical models. Objectives: To compare rates of asthma exacerbations and symptoms between adults with type 2 diabetes and asthma prescribed GLP-1R agonists and those prescribed SGLT-2 (sodium-glucose cotransporter-2) inhibitors, DPP-4 (dipeptidyl peptidase-4) inhibitors, sulfonylureas, or basal insulin for diabetes treatment intensification. Methods: This study was an electronic health records-based new-user, active-comparator, retrospective cohort study of patients with type 2 diabetes and asthma newly prescribed GLP-1R agonists or comparator drugs at an academic healthcare system from January 2000 to March 2018. The primary outcome was asthma exacerbations; the secondary outcome was encounters for asthma symptoms. Propensity scores were calculated for GLP-1R agonist and non-GLP-1R agonist use. Zero-inflated Poisson regression models included adjustment for multiple covariates. Measurements and Main Results: Patients initiating GLP-1R agonists (n = 448), SGLT-2 inhibitors (n = 112), DPP-4 inhibitors (n = 435), sulfonylureas (n = 2,253), or basal insulin (n = 2,692) were identified. At 6 months, asthma exacerbation counts were lower in persons initiating GLP-1R agonists (reference) compared with SGLT-2 inhibitors (incidence rate ratio [IRR], 2.98; 95% confidence interval [CI], 1.30-6.80), DPP-4 inhibitors (IRR, 2.45; 95% CI, 1.54-3.89), sulfonylureas (IRR, 1.83; 95% CI, 1.20-2.77), and basal insulin (IRR, 2.58; 95% CI, 1.72-3.88). Healthcare encounters for asthma symptoms were also lower among GLP-1R agonist users. Conclusions: Adult patients with asthma prescribed GLP-1R agonists for type 2 diabetes had lower counts of asthma exacerbations compared with other drugs initiated for treatment intensification. GLP-1R agonists may represent a novel treatment for asthma associated with metabolic dysfunction.
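Rate ratios from Poisson-type models are usually estimated on the log scale, so their confidence limits tend to be symmetric around the point estimate after taking logarithms. The sketch below checks that property for the four IRRs quoted in the abstract and backs out an approximate standard error of log(IRR). It assumes 95% intervals built as exp(estimate ± 1.96 × SE), which the abstract does not state; the function name `log_scale_summary` is hypothetical.

```python
from math import log, sqrt

Z_95 = 1.96  # assumed normal quantile for a 95% interval

# (IRR, lower, upper) versus GLP-1R agonists, as reported in the abstract.
reported = {
    "SGLT-2 inhibitors": (2.98, 1.30, 6.80),
    "DPP-4 inhibitors": (2.45, 1.54, 3.89),
    "sulfonylureas": (1.83, 1.20, 2.77),
    "basal insulin": (2.58, 1.72, 3.88),
}


def log_scale_summary(lower, upper):
    """Geometric midpoint of the limits and the implied SE of log(IRR)."""
    midpoint = sqrt(lower * upper)
    se_log = (log(upper) - log(lower)) / (2 * Z_95)
    return midpoint, se_log


summaries = {
    drug: log_scale_summary(lower, upper)
    for drug, (_, lower, upper) in reported.items()
}

for drug, (irr, _, _) in reported.items():
    midpoint, se_log = summaries[drug]
    print(f"{drug}: reported IRR {irr:.2f}, midpoint {midpoint:.2f}, SE(log IRR) {se_log:.2f}")
```

In each case the geometric midpoint of the limits lands within rounding error of the reported IRR, and the wider interval for SGLT-2 inhibitors corresponds to the larger implied standard error, in line with that group being the smallest (n = 112).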
Affiliation(s)
- Dinah Foer
  - Division of Allergy and Clinical Immunology
- Patrick E. Beeler
  - Division of General Internal Medicine and Primary Care, and
  - Department of Internal Medicine, University Hospital Zurich, and Epidemiology, Biostatistics, and Prevention Institute, University of Zurich, Zurich, Switzerland; and
- Jing Cui
  - Division of Rheumatology, Immunity, and Inflammation, Department of Medicine, Brigham and Women’s Hospital, and Harvard Medical School, Boston, Massachusetts
- Elizabeth W. Karlson
  - Division of Rheumatology, Immunity, and Inflammation, Department of Medicine, Brigham and Women’s Hospital, and Harvard Medical School, Boston, Massachusetts
- David W. Bates
  - Division of General Internal Medicine and Primary Care, and
- Katherine N. Cahill
  - Division of Allergy, Pulmonary, and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
14
Fontanella S, Cucco A, Custovic A. Machine learning in asthma research: moving toward a more integrated approach. Expert Rev Respir Med 2021; 15:609-621. [PMID: 33618597 DOI: 10.1080/17476348.2021.1894133] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/19/2022]
Abstract
Introduction: Big data are reshaping the future of medicine. The growing availability and increasing complexity of data have favored the adoption of modern analytical and computational methodologies in every area of medicine. Over the past decades, asthma research has been characterized by a shift in the way studies are conducted and data are analyzed. Motivated by the assumption that 'data will speak for themselves', hypothesis-driven approaches have been replaced by data-driven, hypothesis-generating methods to explore hidden patterns and underlying mechanisms. However, despite the advances in technology and the important new insights gained into asthma heterogeneity, very few research findings have been translated into clinically actionable solutions. Areas covered: To investigate some of the fundamental analytical approaches adopted in the current literature and appraise their impact and usefulness in medicine, we conducted a bibliometric analysis of big data analytics in asthma research in the past 50 years. Expert opinion: No single data source or methodology can uncover the complexity of human health and disease. To fully capitalize on the potential of 'big data', we will have to embrace collaborative science and encourage the creation of integrated cross-disciplinary teams brought together around technological advances.
Affiliation(s)
- Sara Fontanella
  - National Heart and Lung Institute, Imperial College London, UK
- Alex Cucco
  - National Heart and Lung Institute, Imperial College London, UK
- Adnan Custovic
  - National Heart and Lung Institute, Imperial College London, UK
15
Landsman D, Abdelbasit A, Wang C, Guerzhoy M, Joshi U, Mathew S, Pou-Prom C, Dai D, Pequegnat V, Murray J, Chokar K, Banning M, Mamdani M, Mishra S, Batt J. Cohort profile: St. Michael's Hospital Tuberculosis Database (SMH-TB), a retrospective cohort of electronic health record data and variables extracted using natural language processing. PLoS One 2021; 16:e0247872. [PMID: 33657184 PMCID: PMC7928444 DOI: 10.1371/journal.pone.0247872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2020] [Accepted: 02/16/2021] [Indexed: 12/01/2022] Open
Abstract
Background Tuberculosis (TB) is a major cause of death worldwide. TB research draws heavily on clinical cohorts which can be generated using electronic health records (EHR), but granular information extracted from unstructured EHR data is limited. The St. Michael’s Hospital TB database (SMH-TB) was established to address gaps in EHR-derived TB clinical cohorts and provide researchers and clinicians with detailed, granular data related to TB management and treatment. Methods We collected and validated multiple layers of EHR data from the TB outpatient clinic at St. Michael’s Hospital, Toronto, Ontario, Canada to generate the SMH-TB database. SMH-TB contains structured data directly from the EHR, and variables generated using natural language processing (NLP) by extracting relevant information from free text within clinic, radiology, and other notes. NLP performance was assessed using recall, precision and F1 score averaged across variable labels. We present characteristics of the cohort population using binomial proportions and 95% confidence intervals (CI), with and without adjusting for NLP misclassification errors. Results SMH-TB currently contains retrospective patient data spanning 2011 to 2018, for a total of 3298 patients (N = 3237 with at least 1 associated dictation). Performance of TB diagnosis and medication NLP rulesets surpasses 93% in recall, precision and F1 metrics, indicating good generalizability. We estimated 20% (95% CI: 18.4–21.2%) were diagnosed with active TB and 46% (95% CI: 43.8–47.2%) were diagnosed with latent TB. After adjusting for potential misclassification, the proportion of patients diagnosed with active and latent TB was 18% (95% CI: 16.8–19.7%) and 40% (95% CI: 37.8–41.6%), respectively. Conclusion SMH-TB is a unique database that includes a breadth of structured data derived from structured and unstructured EHR data by using NLP rulesets. The data are available for a variety of research applications, such as clinical epidemiology, quality improvement and mathematical modeling studies.
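The NLP evaluation described here (recall, precision and F1 averaged across variable labels) can be sketched as a macro-average over per-label confusion counts. The label names and counts below are hypothetical and are not taken from SMH-TB; the function name `precision_recall_f1` is an assumption of this sketch.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-label precision, recall and their harmonic mean (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (true positive, false positive, false negative) counts per label.
label_counts = {
    "diagnosis": (94, 4, 6),
    "medication": (97, 3, 2),
}

per_label = {name: precision_recall_f1(*counts) for name, counts in label_counts.items()}

# Macro-average: unweighted mean of each metric across labels.
n_labels = len(per_label)
macro_precision = sum(p for p, _, _ in per_label.values()) / n_labels
macro_recall = sum(r for _, r, _ in per_label.values()) / n_labels
macro_f1 = sum(f for _, _, f in per_label.values()) / n_labels

print(f"macro precision={macro_precision:.3f} recall={macro_recall:.3f} F1={macro_f1:.3f}")
```

Macro-averaging weights every label equally regardless of how often it occurs, so a rarely mentioned variable with poor extraction quality pulls the average down as much as a common one.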
Affiliation(s)
- David Landsman
  - MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, Toronto, Ontario, Canada
- Ahmed Abdelbasit
  - Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Christine Wang
  - Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Michael Guerzhoy
  - Princeton University, Princeton, New Jersey, United States of America
  - University of Toronto, Toronto, Ontario, Canada
  - Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, Toronto, Ontario, Canada
- Ujash Joshi
  - University of Toronto, Toronto, Ontario, Canada
- Shaun Mathew
  - Department of Computer Science, Ryerson University, Toronto, Ontario, Canada
- David Dai
  - Unity Health Toronto, Toronto, Ontario, Canada
- Victoria Pequegnat
  - Decision Support Services, St. Michael’s Hospital, Unity Health Toronto, Toronto, Ontario, Canada
- Kamalprit Chokar
  - Division of Respirology, Department of Medicine, St. Michael’s Hospital, Unity Health Toronto, Toronto, Ontario, Canada
- Muhammad Mamdani
  - Department of Medicine, University of Toronto, Toronto, Ontario, Canada
  - University of Toronto, Toronto, Ontario, Canada
  - Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, Toronto, Ontario, Canada
  - Unity Health Toronto, Toronto, Ontario, Canada
  - Leslie Dan Faculty of Pharmacy, University of Toronto, Canada, Toronto, Ontario, Canada
  - Institute of Health Policy, Management, and Evaluation, University of Toronto, Toronto, Ontario, Canada
  - Vector Institute, Toronto, Ontario, Canada
  - Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
- Sharmistha Mishra
  - MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, Toronto, Ontario, Canada
  - Department of Medicine, University of Toronto, Toronto, Ontario, Canada
  - Institute of Health Policy, Management, and Evaluation, University of Toronto, Toronto, Ontario, Canada
  - Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
- Jane Batt
  - Department of Medicine, University of Toronto, Toronto, Ontario, Canada
  - Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
  - Keenan Research Center for Biomedical Science, St. Michael’s Hospital, Unity Health Toronto, Toronto, Ontario, Canada
16
Wang Y, Li Z, Li FS. Development and Assessment of Prediction Models for the Development of COPD in a Typical Rural Area in Northwest China. Int J Chron Obstruct Pulmon Dis 2021; 16:477-486. [PMID: 33664570 PMCID: PMC7924122 DOI: 10.2147/copd.s297380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2020] [Accepted: 02/07/2021] [Indexed: 11/23/2022] Open
Abstract
Objective This study aimed to construct and evaluate a clinical predictive model for the development of COPD in northwest China's rural areas. Methods A cross-sectional study of a natural population was performed in rural northwest China. After assessing demographic and disease characteristics, a clinical prediction model was developed. First, we used the least absolute shrinkage and selection operator (LASSO) regression model to screen possible factors influencing COPD. We then constructed a logistic regression model and drew a nomogram. The performance of the model was further evaluated using the calibration diagram, C-index and ROC curve. Clinical benefit was analyzed using the decision curve. Finally, 1000 bootstrap resamples and Harrell's C-index were used for internal verification of the nomogram. Results Among 3249 patients in the local rural natural population, 394 (12.13%) were diagnosed with COPD. The LASSO regression model was used to find the optimal combination of parameters, and the screened influencing factors included age, gender, barbeque, smoking, passive smoking, energy type, ventilation system and post-bronchodilator FEV1. These predictors were used to construct a nomogram. The C-index was 0.81 (95% confidence interval: 0.79-0.83). The combination of the calibration curve and ROC curve indicates that the model has high discriminability. The decision curve shows benefit in clinical practice when the threshold probability is between 6% and 58%. The internal verification result using Harrell's C-index was 0.80 (95% confidence interval: 0.78-0.83). Conclusion Information such as age, sex, barbeque, smoking, passive smoking, type of energy, ventilation systems, and post-bronchodilator FEV1 can be combined to readily predict the risk of COPD in local rural areas.
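The decision-curve result in this abstract rests on net benefit, which at a threshold probability pt is commonly defined as TP/n − (FP/n) × pt/(1 − pt). The sketch below evaluates that quantity for a model and for the "treat everyone" strategy at a single threshold. The outcomes and predicted risks are hypothetical, and the function names are assumptions of this sketch rather than anything described in the paper.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of flagging everyone whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of flagging every individual regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = COPD) and model-predicted risks for ten people.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
risk = [0.70, 0.10, 0.20, 0.55, 0.05, 0.30, 0.15, 0.40, 0.60, 0.08]

threshold = 0.25
nb_model = net_benefit(y_true, risk, threshold)
nb_all = net_benefit_treat_all(y_true, threshold)
print(f"threshold={threshold:.2f} model={nb_model:.3f} treat-all={nb_all:.3f} treat-none=0.000")
```

A decision curve repeats this calculation over a grid of thresholds; the model is considered useful over the range where its net benefit exceeds both the treat-all and treat-none (zero) lines.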
Affiliation(s)
- Yide Wang
  - Department of Integrated Pulmonology, Fourth Affiliated Hospital of Xinjiang Medical University, Urumqi, People's Republic of China
- Zheng Li
  - Department of Integrated Pulmonology, Fourth Affiliated Hospital of Xinjiang Medical University, Urumqi, People's Republic of China
  - Xinjiang National Clinical Research Base of Traditional Chinese Medicine, Xinjiang Medical University, Ürümqi, People's Republic of China
- Feng-Sen Li
  - Department of Integrated Pulmonology, Fourth Affiliated Hospital of Xinjiang Medical University, Urumqi, People's Republic of China
  - Xinjiang National Clinical Research Base of Traditional Chinese Medicine, Xinjiang Medical University, Ürümqi, People's Republic of China
17
Yuan W, Beaulieu-Jones BK, Yu KH, Lipnick SL, Palmer N, Loscalzo J, Cai T, Kohane IS. Temporal bias in case-control design: preventing reliable predictions of the future. Nat Commun 2021; 12:1107. [PMID: 33597541 PMCID: PMC7889612 DOI: 10.1038/s41467-021-21390-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/12/2020] [Accepted: 01/22/2021] [Indexed: 02/07/2023] Open
Abstract
One of the primary tools that researchers use to predict risk is the case-control study. We identify a flaw, temporal bias, that is specific to and uniquely associated with these studies that occurs when the study period is not representative of the data that clinicians have during the diagnostic process. Temporal bias acts to undermine the validity of predictions by over-emphasizing features close to the outcome of interest. We examine the impact of temporal bias across the medical literature, and highlight examples of exaggerated effect sizes, false-negative predictions, and replication failure. Given the ubiquity and practical advantages of case-control studies, we discuss strategies for estimating the influence of and preventing temporal bias where it exists.
Affiliation(s)
- William Yuan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kun-Hsing Yu
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Scott L Lipnick
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - Center for Assessment Technology and Continuous Health, Massachusetts General Hospital, Boston, MA, USA
- Nathan Palmer
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Joseph Loscalzo
  - Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Tianxi Cai
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Isaac S Kohane
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
18
Meeraus WH, Mullerova H, El Baou C, Fahey M, Hessel EM, Fahy WA. Predicting Re-Exacerbation Timing and Understanding Prolonged Exacerbations: An Analysis of Patients with COPD in the ECLIPSE Cohort. Int J Chron Obstruct Pulmon Dis 2021; 16:225-244. [PMID: 33574663 PMCID: PMC7872897 DOI: 10.2147/copd.s279315] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/28/2020] [Accepted: 12/30/2020] [Indexed: 11/30/2022] Open
Abstract
PURPOSE Understanding risk factors for an acute exacerbation of chronic obstructive pulmonary disease (AECOPD) is important for optimizing patient care. We re-analyzed data from the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) study (NCT00292552) to identify factors predictive of re-exacerbations and associated with prolonged AECOPDs. METHODS Patients with COPD from ECLIPSE with moderate/severe AECOPDs were included. The end of the first exacerbation was the index date. Timing of re-exacerbation risk was assessed in patients with 180 days' post-index-date follow-up data. Factors predictive of early (1-90 days) vs late (91-180 days) vs no re-exacerbation were identified using a multivariable partial-proportional-odds-predictive model. Explanatory logistic-regression modeling identified factors associated with prolonged AECOPDs. RESULTS Of the 1,554 eligible patients from ECLIPSE, 1,420 had 180 days' follow-up data: more patients experienced early (30.9%) than late (18.7%) re-exacerbations; 50.4% had no re-exacerbation within 180 days. Lower post-bronchodilator FEV1 (P=0.0019), a higher number of moderate/severe exacerbations on/before index date (P<0.0001), higher St. George's Respiratory Questionnaire total score (P=0.0036), and season of index exacerbation (autumn vs winter, P=0.00164) were identified as predictors of early (vs late/none) re-exacerbation risk within 180 days. Similarly, these were all predictors of any (vs none) re-exacerbation risk within 180 days. Median moderate/severe AECOPD duration was 12 days; 22.7% of patients experienced a prolonged AECOPD. The odds of experiencing a prolonged AECOPD were greater for severe vs moderate AECOPDs (adjusted odds ratio=1.917, P=0.002) and lower for spring vs winter AECOPDs (adjusted odds ratio=0.578, P=0.017). CONCLUSION Prior exacerbation history, reduced lung function, poorer respiratory-related quality-of-life (greater disease burden), and season may help identify patients who will re-exacerbate within 90 days of an AECOPD. Severe AECOPDs and winter AECOPDs are likely to be prolonged and may require close monitoring.
Affiliation(s)
- Wilhelmine H Meeraus
  - GlaxoSmithKline plc., Epidemiology – Value, Evidence and Outcomes, Middlesex, UK
- Hana Mullerova
  - GlaxoSmithKline plc., Epidemiology – Value, Evidence and Outcomes, Middlesex, UK
- Céline El Baou
  - GlaxoSmithKline plc., Research and Development, Middlesex, UK
- Marion Fahey
  - GlaxoSmithKline plc., Epidemiology – Value, Evidence and Outcomes, Middlesex, UK
- Edith M Hessel
  - GlaxoSmithKline plc., Research and Development, Middlesex, UK
- William A Fahy
  - GlaxoSmithKline plc., Research and Development, Middlesex, UK
19
Doyle OM, van der Laan R, Obradovic M, McMahon P, Daniels F, Pitcher A, Loebinger MR. Identification of potentially undiagnosed patients with nontuberculous mycobacterial lung disease using machine learning applied to primary care data in the UK. Eur Respir J 2020; 56:13993003.00045-2020. [PMID: 32430411 DOI: 10.1183/13993003.00045-2020] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/09/2020] [Accepted: 04/23/2020] [Indexed: 01/23/2023]
Abstract
Nontuberculous mycobacterial lung disease (NTMLD) is a rare lung disease often missed due to a low index of suspicion and nonspecific clinical presentation. This retrospective study was designed to characterise the prediagnosis features of NTMLD patients in primary care and to assess the feasibility of using machine learning to identify undiagnosed NTMLD patients. IQVIA Medical Research Data (incorporating THIN, a Cegedim Database), a UK electronic medical records primary care database, was used. NTMLD patients were identified between 2003 and 2017 by diagnosis in primary or secondary care or record of NTMLD treatment regimen. Risk factors and treatments were extracted in the prediagnosis period, guided by literature and expert clinical opinion. The control population was enriched to have at least one of these features. 741 NTMLD and 112 784 control patients were selected. Annual prevalence rates of NTMLD from 2006 to 2016 increased from 2.7 to 5.1 per 100 000. The most common pre-existing diagnoses for NTMLD patients were COPD and asthma, and the most common treatments were penicillin, macrolides and inhaled corticosteroids. Compared to random testing, machine learning improved detection of patients with NTMLD by almost a thousand-fold with AUC of 0.94. The total prevalence of diagnosed and undiagnosed cases of NTMLD in 2016 was estimated to range between 9 and 16 per 100 000. This study supports the feasibility of machine learning applied to primary care data to screen for undiagnosed NTMLD patients, with results indicating that there may be a substantial number of undiagnosed cases of NTMLD in the UK.
Affiliation(s)
- Orla M Doyle
  - Predictive Analytics, Real World Analytical Solutions, IQVIA, London, UK
  - These authors contributed equally
- Roald van der Laan
  - Insmed Utrecht, Utrecht, The Netherlands
  - These authors contributed equally
- Marko Obradovic
  - Insmed Utrecht, Utrecht, The Netherlands
  - These authors contributed equally
- Michael R Loebinger
  - Royal Brompton and Harefield NHS Foundation Trust and Imperial College London, London, UK
20
Wang C, Chen X, Du L, Zhan Q, Yang T, Fang Z. Comparison of machine learning algorithms for the identification of acute exacerbations in chronic obstructive pulmonary disease. Comput Methods Programs Biomed 2020; 188:105267. [PMID: 31841787 DOI: 10.1016/j.cmpb.2019.105267] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/05/2019] [Revised: 11/19/2019] [Accepted: 12/08/2019] [Indexed: 05/05/2023]
Abstract
OBJECTIVES Identifying acute exacerbations in chronic obstructive pulmonary disease (AECOPDs) is of utmost importance for reducing the associated mortality and financial burden. In this research, the authors aimed to develop identification models for AECOPDs and to compare the relative performance of different modeling paradigms to find the best model for this task. METHODS Data were extracted from electronic medical records (EMRs) of patients with chronic obstructive pulmonary disease who were admitted to the China-Japan Friendship Hospital between February 2011 and March 2017. Five machine learning algorithms (random forest, support vector machine [SVM], logistic regression, K-nearest neighbor and naïve Bayes) were used to develop the AECOPD identification models. Feature selection was performed to find an optimal feature subset. Ten-fold cross-validation was used to find the best hyperparameters for each model. The following metrics were used to evaluate the performance of these models: area under the receiver operating characteristic curve, sensitivity, specificity, positive predictive value, and negative predictive value. RESULTS A total of 303 EMRs (AECOPD patients: 135; non-AECOPD patients: 168) were included in the study. The SVM model obtained the best performance (sensitivity: 0.80, specificity: 0.83, positive predictive value: 0.81, negative predictive value: 0.85 and area under the receiver operating characteristic curve: 0.90) after performing feature selection. CONCLUSIONS Our research confirms that the proposed model based on the support vector machine is a powerful tool to identify AECOPD patients, and it shows promise for providing decision support to clinicians who are struggling to reach a confirmed clinical diagnosis.
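The four threshold-based metrics listed in this abstract all derive from a single 2×2 confusion matrix, as the sketch below shows. The class sizes (135 and 168) match the abstract, but the individual cell counts are hypothetical and are not the paper's results; the reported figures were presumably obtained under cross-validation and need not correspond to one pooled table. The function name `diagnostic_metrics` is an assumption of this sketch.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
    }


# Hypothetical cells: 135 AECOPD records (tp + fn) and 168 non-AECOPD records (tn + fp).
metrics = diagnostic_metrics(tp=108, fn=27, tn=140, fp=28)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity depend only on the classifier, whereas PPV and NPV also shift with the proportion of AECOPD cases in the sample, so the latter two would not carry over unchanged to a population with a different case mix.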
Affiliation(s)
- Chenshuo Wang
  - Institute of Electronics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Xianxiang Chen
  - Institute of Electronics, Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, China
- Lidong Du
  - Institute of Electronics, Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, China
- Ting Yang
  - China-Japan Friendship Hospital, Beijing, China
- Zhen Fang
  - Institute of Electronics, Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, China; University of Chinese Academy of Sciences, Beijing, China
|
21
|
Hossain ME, Uddin S, Khan A, Moni MA. A Framework to Understand the Progression of Cardiovascular Disease for Type 2 Diabetes Mellitus Patients Using a Network Approach. International Journal of Environmental Research and Public Health 2020; 17:E596. [PMID: 31963383 PMCID: PMC7013570 DOI: 10.3390/ijerph17020596] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/09/2020] [Accepted: 01/14/2020] [Indexed: 12/13/2022]
Abstract
The prevalence of chronic disease comorbidity has increased worldwide. Comorbidity (i.e., the presence of multiple chronic diseases) is associated with adverse health outcomes in terms of mobility and quality of life as well as financial burden. Understanding the progression of comorbidities can provide valuable insights towards the prevention and better management of chronic diseases. Administrative data can be used in this regard as they contain semantic information on patients' health conditions. Most studies in this field are focused on understanding the progression of one chronic disease rather than multiple diseases. This study aims to understand the progression of two chronic diseases in the Australian health context. It specifically focuses on the comorbidity progression of cardiovascular disease (CVD) in patients with type 2 diabetes mellitus (T2DM), as the prevalence of these chronic diseases in Australians is high. A research framework is proposed to understand and represent the progression of CVD in patients with T2DM using graph theory and social network analysis techniques. Two study cohorts (i.e., patients with both T2DM and CVD and patients with only T2DM) were selected from an administrative dataset obtained from an Australian health insurance company. Two baseline disease networks were constructed from these two selected cohorts. A final disease network was then generated from the two baseline disease networks by weight adjustments in a normalized way. The prevalence of renal failure, fluid and electrolyte disorders, hypertension and obesity was significantly higher in patients with both CVD and T2DM than in patients with only T2DM. This showed that these chronic diseases occurred frequently during the progression of CVD in patients with T2DM. The proposed network-based model may potentially help the healthcare provider to understand high-risk diseases and the progression patterns between the recurrence of T2DM and CVD. Also, the framework could be useful for stakeholders including governments and private health insurers to adopt appropriate preventive health management programs for patients at a high risk of developing multiple chronic diseases.
Affiliation(s)
- Md Ekramul Hossain
  - Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia
- Shahadat Uddin
  - Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia
- Arif Khan
  - Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia
- Mohammad Ali Moni
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
|
22
|
Wang X, Zhang Y, Hao S, Zheng L, Liao J, Ye C, Xia M, Wang O, Liu M, Weng CH, Duong SQ, Jin B, Alfreds ST, Stearns F, Kanov L, Sylvester KG, Widen E, McElhinney DB, Ling XB. Prediction of the 1-Year Risk of Incident Lung Cancer: Prospective Study Using Electronic Health Records from the State of Maine. J Med Internet Res 2019; 21:e13260. [PMID: 31099339 PMCID: PMC6542253 DOI: 10.2196/13260] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/31/2018] [Revised: 04/18/2019] [Accepted: 04/23/2019] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND Lung cancer is the leading cause of cancer death worldwide. Early detection of individuals at risk of lung cancer is critical to reduce the mortality rate. OBJECTIVE The aim of this study was to develop and validate a prospective risk prediction model to identify patients at risk of new incident lung cancer within the next 1 year in the general population. METHODS Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. The study population consisted of patients with at least one EHR between April 1, 2016, and March 31, 2018, who had no history of lung cancer. A retrospective cohort (N=873,598) and a prospective cohort (N=836,659) were formed for model construction and validation. An Extreme Gradient Boosting (XGBoost) algorithm was adopted to build the model. It assigned a score to each individual to quantify the probability of a new incident lung cancer diagnosis from October 1, 2016, to September 31, 2017. The model was trained with the clinical profile in the retrospective cohort from the preceding 6 months and validated with the prospective cohort to predict the risk of incident lung cancer from April 1, 2017, to March 31, 2018. RESULTS The model had an area under the curve (AUC) of 0.881 (95% CI 0.873-0.889) in the prospective cohort. Two thresholds of 0.0045 and 0.01 were applied to the predictive scores to stratify the population into low-, medium-, and high-risk categories. The incidence of lung cancer in the high-risk category (579/53,922, 1.07%) was 7.7 times higher than that in the overall cohort (1167/836,659, 0.14%). Age, a history of pulmonary diseases and other chronic diseases, medications for mental disorders, and social disparities were found to be associated with new incident lung cancer. CONCLUSIONS We retrospectively developed and prospectively validated an accurate risk prediction model of new incident lung cancer occurring in the next 1 year. 
Through statistical learning from the statewide EHR data in the preceding 6 months, our model was able to identify statewide high-risk patients, which will benefit the population health through establishment of preventive interventions or more intensive surveillance.
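The stratification and the "7.7 times higher" figure in this abstract can be checked with simple arithmetic. The sketch below uses the two thresholds and the case counts quoted above; the function names are hypothetical, and the handling of a score that falls exactly on a threshold is an assumption, since the abstract does not specify it.

```python
def risk_category(score: float, low_cut: float = 0.0045, high_cut: float = 0.01) -> str:
    """Map a predicted probability to a risk tier using the two quoted thresholds.

    Assumption: a score equal to a threshold is placed in the higher tier.
    """
    if score >= high_cut:
        return "high"
    if score >= low_cut:
        return "medium"
    return "low"


def incidence(cases: int, population: int) -> float:
    """Proportion of a population with a new diagnosis during follow-up."""
    return cases / population


# Counts quoted in the abstract for the prospective cohort.
high_risk = incidence(579, 53_922)
overall = incidence(1167, 836_659)
relative_incidence = high_risk / overall

print(f"high-risk incidence: {high_risk:.2%}")
print(f"overall incidence:   {overall:.2%}")
print(f"ratio:               {relative_incidence:.1f}")
```

The ratio here is the incidence in the high-risk tier divided by the incidence in the whole cohort, which includes the high-risk patients themselves; a comparison against the remaining tiers only would give a larger number.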
Affiliation(s)
- Xiaofang Wang
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
  - Department of Surgery, Stanford University, Stanford, CA, United States
- Yan Zhang
  - Department of Oncology, The First Hospital of Shijiazhuang, Shijiazhuang, China
- Shiying Hao
  - Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States
  - Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Le Zheng
  - Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States
  - Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Jiayu Liao
  - Department of Bioengineering, University of California, Riverside, CA, United States
  - West China-California Multiomics Research Center, West China Hospital, Sichuan University, Chengdu, China
- Chengyin Ye
  - Department of Health Management, Hangzhou Normal University, Hangzhou, China
- Minjie Xia
  - Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Oliver Wang
  - Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Modi Liu
  - Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Ching Ho Weng
  - Department of Surgery, Stanford University, Stanford, CA, United States
- Son Q Duong
  - Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Bo Jin
  - Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Frank Stearns
  - Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Laura Kanov
  - Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Karl G Sylvester
  - Department of Surgery, Stanford University, Stanford, CA, United States
- Eric Widen
  - Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Doff B McElhinney
  - Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States
  - Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Xuefeng B Ling
  - Department of Surgery, Stanford University, Stanford, CA, United States
  - Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
|
23
|
Smooth Bayesian network model for the prediction of future high-cost patients with COPD. Int J Med Inform 2019; 126:147-155. [PMID: 31029256 DOI: 10.1016/j.ijmedinf.2019.03.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/05/2018] [Revised: 02/28/2019] [Accepted: 03/26/2019] [Indexed: 02/05/2023]
Abstract
INTRODUCTION The clinical course of chronic obstructive pulmonary disease (COPD) is marked by acute exacerbation events that increase hospitalization rates and healthcare spending. The early identification of future high-cost patients with COPD may decrease healthcare spending by informing individualized interventions that prevent exacerbation events and decelerate disease progression. Existing studies of cost prediction of other chronic diseases have applied regression and machine-learning methods that cannot capture the complex causal relationships between COPD factors. Thus, the exploration of these factors through nonlinear, high-dimensional but explainable modeling is greatly needed. OBJECTIVES We aimed to develop a machine-learning model to identify future high-cost patients with COPD. Such a model should incorporate expert knowledge about causal relationships, and the method for estimating the model could provide more accurate predictions than other machine learning methods. METHODS We used the 2011-2013 medical insurance data of patients with COPD in a large city. The data set included demographic information and admission records. Leveraging on developments in graphical modeling methods, we proposed a smooth Bayesian network (SBN) model for the prediction of high-cost individuals using medical insurance data. The modeling method incorporated some expert knowledge about causal relationships (i.e., about the Bayesian network structure). We employed a smoothing kernel based on the weighted nearest neighborhood method in the SBN model to address overfitting, case-mix effect, and data sparsity (i.e., using data about "similar patients"). RESULTS The proposed SBN achieved the area under curve (AUC) of 0.80 and showed considerable improvement over the baseline machine-learning methods. 
Besides confirming the known factors from the literature, we found "region" (i.e., a suburban or urban area) to be a significant factor, and that in a 3-tier system with primary, secondary and tertiary hospitals, COPD patients who had been admitted to primary hospitals were more likely to develop into future high-cost patients than patients who had been admitted to tertiary hospitals. CONCLUSION The proposed SBN model not only obtained higher prediction accuracy and stronger generalizability than a number of benchmark machine-learning methods, but also used the Bayesian network to capture the complex causal relationships between different predictors by incorporating expert knowledge. Furthermore, a framework was developed to establish the relationships between exposure to historical trajectory and future outcome, which can also be applied to other temporal data to model different trajectory information and predict other outcomes.
|
24
|
Baowaly MK, Lin CC, Liu CL, Chen KT. Synthesizing electronic health records using improved generative adversarial networks. J Am Med Inform Assoc 2019; 26:228-241. [PMID: 30535151 PMCID: PMC7647178 DOI: 10.1093/jamia/ocy142] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 03/22/2018] [Revised: 09/21/2018] [Accepted: 10/24/2018] [Indexed: 11/14/2022] Open
Abstract
Objective The aim of this study was to generate synthetic electronic health records (EHRs). The generated EHR data will be more realistic than those generated using the existing medical Generative Adversarial Network (medGAN) method. Materials and Methods We modified medGAN to obtain two synthetic data generation models, designated as medical Wasserstein GAN with gradient penalty (medWGAN) and medical boundary-seeking GAN (medBGAN), and compared the results obtained using the three models. We used 2 databases: MIMIC-III and National Health Insurance Research Database (NHIRD), Taiwan. First, we trained the models and generated synthetic EHRs by using these 3 models. We then analyzed and compared the models' performance by using a few statistical methods (Kolmogorov-Smirnov test, dimension-wise probability for binary data, and dimension-wise average count for count data) and 2 machine learning tasks (association rule mining and prediction). Results We conducted a comprehensive analysis and found our models were adequately efficient for generating synthetic EHR data. The proposed models outperformed medGAN in all cases, and among the 3 models, boundary-seeking GAN (medBGAN) performed the best. Discussion To generate realistic synthetic EHR data, the proposed models will be effective in the medical industry and related research from the viewpoint of providing better services. Moreover, they will eliminate barriers including limited access to EHR data and thus accelerate research on medical informatics. Conclusion The proposed models can adequately learn the data distribution of real EHRs and efficiently generate realistic synthetic EHRs. The results show the superiority of our models over the existing model.
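Two of the statistical checks named in this abstract are easy to state in code. The following is a minimal sketch, not the authors' implementation: a two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical distribution functions) and a dimension-wise probability (the per-column frequency of 1s in a binary patient-by-code matrix). The function names and toy matrices are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Both ECDFs are right-continuous step functions, so the supremum of
    # their difference is attained at one of the observed values.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


def dimension_wise_probability(rows):
    """Per-column proportion of 1s in a binary matrix (one row per patient)."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


# Toy binary matrices (assumption): 4 patients x 3 diagnosis codes each.
real = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
synthetic = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [1, 0, 1]]

real_probs = dimension_wise_probability(real)
synthetic_probs = dimension_wise_probability(synthetic)
print(list(zip(real_probs, synthetic_probs)))
print(ks_statistic([1, 2, 2, 3], [2, 3, 3, 4]))
```

In this kind of comparison, a synthetic dataset is considered closer to the real one when the paired per-column probabilities lie near the diagonal and the KS statistic for each compared variable is small.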
Affiliation(s)
- Mrinal Kanti Baowaly
  - Social Networks and Human-Centered Computing, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
  - Department of Computer Science, National Chengchi University, Taipei, Taiwan
- Chia-Ching Lin
  - Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan
  - Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Chao-Lin Liu
  - Department of Computer Science, National Chengchi University, Taipei, Taiwan
- Kuan-Ta Chen
  - Institute of Information Science, Academia Sinica, Taipei, Taiwan
|
25
|
Greenblatt RE, Zhao EJ, Henrickson SE, Apter AJ, Hubbard RA, Himes BE. Factors associated with exacerbations among adults with asthma according to electronic health record data. Asthma Res Pract 2019; 5:1. [PMID: 30680222 PMCID: PMC6339400 DOI: 10.1186/s40733-019-0048-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/11/2018] [Accepted: 01/10/2019] [Indexed: 11/11/2022] Open
Abstract
BACKGROUND Asthma is a chronic inflammatory lung disease that affects 18.7 million U.S. adults. Electronic health records (EHRs) are a unique source of information that can be leveraged to understand factors associated with asthma in real-life populations. In this study, we identify demographic factors and comorbidities associated with asthma exacerbations among adults according to EHR-derived data and compare these findings to those of epidemiological studies. METHODS We obtained University of Pennsylvania Hospital System EHR-derived data for asthma encounters occurring between 2011 and 2014. Regression analyses were performed to model asthma exacerbation frequency as explained by age, sex, race/ethnicity, health insurance type, smoking status, body mass index (BMI) and various comorbidities. We analyzed data from the National Health and Nutrition Examination Survey (NHANES) from 2001 to 2012 to compare findings with those from the EHR-derived data. RESULTS Based on data from 9068 adult patients with asthma, 33.37% had at least one exacerbation over the four-year study period. In a proportional odds logistic regression predicting number of exacerbations during the study period (levels: 0, 1-2, 3-4, 5+ exacerbations), after controlling for age, race/ethnicity, sex, health insurance type, and smoking status, the highest odds ratios (ORs) of significantly associated factors were: chronic bronchitis (2.70), sinusitis (1.50), emphysema (1.39), fluid and electrolyte disorders (1.35), class 3 obesity (1.32), and diabetes (1.28). An analysis of NHANES data showed associations for class 3 obesity, anemia and chronic bronchitis with exacerbation frequency in an adjusted model controlling for age, race/ethnicity, sex, financial class and smoking status. CONCLUSIONS EHR-derived data is helpful to understand exacerbations in real-life asthma patients, facilitating design of detailed studies and interventions tailored for specific populations.
Affiliation(s)
- Rebecca E. Greenblatt
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- Edward J. Zhao
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- Sarah E. Henrickson
  - Division of Allergy-Immunology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
  - Institute for Immunology, University of Pennsylvania, Philadelphia, PA 19104 USA
- Andrea J. Apter
  - Pulmonary, Allergy and Critical Care Division, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- Rebecca A. Hubbard
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- Blanca E. Himes
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
|
26
|
Xie S, Himes BE. Approaches to Link Geospatially Varying Social, Economic, and Environmental Factors with Electronic Health Record Data to Better Understand Asthma Exacerbations. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2018; 2018:1561-1570. [PMID: 30815202 PMCID: PMC6371292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Electronic health record (EHR)-derived data has become an invaluable resource for biomedical research, but is seldom used for the study of the health impacts of social and environmental factors due in part to the unavailability of relevant variables. We describe how EHR-derived data can be enhanced via linking of external sources of social, economic and environmental data when patient-related geospatial information is available, and we illustrate an approach to better understand the geospatial patterns of asthma exacerbation rates in Philadelphia. Specifically, we relate the spatial distribution of asthma exacerbations observed in EHR-derived data to that of known and potential risk factors (i.e., economic deprivation, crime, vehicular traffic, tree cover). Areas of highest risk based on integrated social and environmental data were consistent with an area with increased asthma exacerbations, demonstrating that data external to the EHR can enhance our understanding of negative health-related outcomes.
Affiliation(s)
- Sherrie Xie
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Blanca E Himes
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
|
27
|
Uncovering the mechanism of Maxing Ganshi Decoction on asthma from a systematic perspective: A network pharmacology study. Sci Rep 2018; 8:17362. [PMID: 30478434 PMCID: PMC6255815 DOI: 10.1038/s41598-018-35791-9] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 04/16/2018] [Accepted: 11/10/2018] [Indexed: 01/12/2023] Open
Abstract
Maxing Ganshi Decoction (MXGSD) has been widely used for asthma for thousands of years, but its underlying pharmacological mechanisms remain unclear. In this study, systematic and comprehensive network pharmacology was utilized for the first time to reveal the potential pharmacological mechanisms of MXGSD on asthma. Specifically, we collected 141 bioactive components from the 600 components in MXGSD, which shared 52 targets common to asthma-related ones. In-depth network analysis of these 52 common targets indicated that asthma might be a manifestation of systemic neuro-immuno-inflammatory dysfunction in the respiratory system, and MXGSD could treat asthma through relieving airway inflammation, improving airway remodeling, and increasing drug responsiveness. After further cluster and enrichment analysis of the protein-protein interaction network of MXGSD bioactive component targets and asthma-related targets, we found that the neurotrophin signaling pathway, estrogen signaling pathway, PI3K-Akt signaling pathway, and ErbB signaling pathway might serve as the key points and principal pathways of MXGSD gene therapy for asthma from a systemic and holistic perspective, a finding that also provides a novel idea for the development of new drugs for asthma.
|
28
|
Tran H, Kim J, Kim D, Choi M, Choi M. Impact of air pollution on cause-specific mortality in Korea: Results from Bayesian Model Averaging and Principle Component Regression approaches. The Science of the Total Environment 2018; 636:1020-1031. [PMID: 29729505 DOI: 10.1016/j.scitotenv.2018.04.273] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/08/2018] [Revised: 04/17/2018] [Accepted: 04/20/2018] [Indexed: 06/08/2023]
Abstract
Health effects related to air pollution are a major global concern. Related studies based on reliable exposure assessment methods would potentially enable policy makers to propose appropriate environmental management policies. In this study, integrated Bayesian Model Averaging (BMA) and Principle Component Regression (PCR) were adopted to assess the severity of air pollution impacts on mortality related to circulatory, respiratory and skin diseases in 25 districts of Seoul, South Korea for the years 2005-2015. These methods were consistent in determining the best regression models and most important pollutants related to mortality in those highly susceptible to poor air quality. Specifically, the results demonstrated that pneumonia was highly associated with air pollution, with a large determination coefficient (BMA: 0.46, PCR: 0.51) and a high posterior model probability (0.47). The most reliable prediction model for pneumonia was indicated by the lowest Bayesian Information Criterion. Among the pollutants, particulate matter with an aerodynamic diameter of 10 μm or less (PM10) was associated with serious health risks on evaluation, with the highest posterior inclusion probabilities (range, 80.20 to 100.00%) and significantly positive correlation coefficients (range, 0.14 to 0.34, p < 0.05). In addition, excessive PM10 concentration (approximately 2.54 times the threshold) and a continuous increase in mortality due to respiratory diseases (approximately 1.50-fold in 10 years) were also exhibited. Overall, the results of this study suggest that socio-environmental policies and international collaboration to mitigate the health effects of air pollution are currently necessary in Seoul, Korea. Moreover, consideration of uncertainty of the regression model, which was verified in this research, will facilitate further application of this approach and enable optimal prediction of interactions between human and environmental factors.
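The abstract reports posterior model probabilities, posterior inclusion probabilities and a lowest-BIC model. One common way these quantities are linked, shown here as a sketch under stated assumptions rather than as the authors' procedure, is the BIC approximation to Bayesian model averaging with equal prior weight on every candidate model: each model's weight is proportional to exp(-BIC/2), and a variable's inclusion probability is the summed weight of the models that contain it. The BIC values and the pollutants other than PM10 are hypothetical.

```python
import math


def bic_model_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probability for every candidate model.
    Subtracting the minimum BIC first avoids underflow in exp().
    """
    best = min(bics)
    raw = [math.exp(-0.5 * (bic - best)) for bic in bics]
    total = sum(raw)
    return [value / total for value in raw]


def posterior_inclusion_probabilities(models, weights):
    """Sum the weights of all models in which each variable appears."""
    inclusion = {}
    for variables, weight in zip(models, weights):
        for variable in variables:
            inclusion[variable] = inclusion.get(variable, 0.0) + weight
    return inclusion


# Hypothetical candidate regressions and BIC values (illustration only).
models = [{"PM10"}, {"PM10", "NO2"}, {"NO2", "O3"}]
bics = [100.0, 102.0, 110.0]

weights = bic_model_weights(bics)
pip = posterior_inclusion_probabilities(models, weights)

for variable in sorted(pip, key=pip.get, reverse=True):
    print(f"{variable}: {pip[variable]:.3f}")
```

Under this approximation the lowest-BIC model always receives the largest weight, which is consistent with the abstract describing the lowest-BIC model as the most reliable.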
Affiliation(s)
- Hien Tran
  - Graduate School of Water Resources, Sungkyunkwan University, Suwon 440-746, Republic of Korea
- Jeongyeong Kim
  - Graduate School of Water Resources, Sungkyunkwan University, Suwon 440-746, Republic of Korea
- Daeun Kim
  - Graduate School of Water Resources, Sungkyunkwan University, Suwon 440-746, Republic of Korea
- Minyoung Choi
  - Department of Medical Business Administration, Kyunghee University, Republic of Korea
- Minha Choi
  - Graduate School of Water Resources, Sungkyunkwan University, Suwon 440-746, Republic of Korea
|
29
|
Matheson MC, Bowatte G, Perret JL, Lowe AJ, Senaratna CV, Hall GL, de Klerk N, Keogh LA, McDonald CF, Waidyatillake NT, Sly PD, Jarvis D, Abramson MJ, Lodge CJ, Dharmage SC. Prediction models for the development of COPD: a systematic review. Int J Chron Obstruct Pulmon Dis 2018; 13:1927-1935. [PMID: 29942125 PMCID: PMC6005295 DOI: 10.2147/copd.s155675] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 12/22/2022] Open
Abstract
Early identification of people at risk of developing COPD is crucial for implementing preventive strategies. We aimed to systematically review and assess the performance of all published models that predicted development of COPD. A search was conducted to identify studies that developed a prediction model for COPD development. The Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies was followed when extracting data and appraising the selected studies. Of the 4,481 records identified, 30 articles were selected for full-text review, and only four of these were eligible to be included in the review. The only consistent predictor across all four models was a measure of smoking. Sex and age were used in most models; however, other factors varied widely. Two of the models had good ability to discriminate between people who were correctly or incorrectly classified as at risk of developing COPD. Overall none of the models were particularly useful in accurately predicting future risk of COPD, nor were they good at ruling out future risk of COPD. Further studies are needed to develop new prediction models and robustly validate them in external cohorts.
Affiliation(s)
- Melanie C Matheson
  - Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia; Murdoch Children's Research Institute, Melbourne, VIC, Australia
- Gayan Bowatte
  - Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia; National Institute of Fundamental Studies, Kandy, Sri Lanka
- Jennifer L Perret
  - Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia; Department of Respiratory and Sleep Medicine, Institute for Breathing and Sleep, Austin Health, University of Melbourne, Melbourne, VIC, Australia
- Adrian J Lowe
  - Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia; Murdoch Children's Research Institute, Melbourne, VIC, Australia
- Chamara V Senaratna
  - Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia; Department of Community Medicine, University of Sri Jayewardenepura, Nugegoda, Sri Lanka
- Graham L Hall
  - Telethon Kids Institute, Perth, WA, Australia; School of Physiotherapy and Exercise Science, Curtin University, Perth, WA, Australia; Centre of Child Health Research, University of Western Australia, Perth, WA, Australia
- Nick de Klerk
  - Telethon Kids Institute, Perth, WA, Australia; Centre of Child Health Research, University of Western Australia, Perth, WA, Australia
- Louise A Keogh
  - Centre for Health Equity, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC, Australia
- Christine F McDonald
  - Department of Respiratory and Sleep Medicine, Institute for Breathing and Sleep, Austin Health, University of Melbourne, Melbourne, VIC, Australia
- Nilakshi T Waidyatillake
  - Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia
- Peter D Sly
  - Child Health Research Centre, The University of Queensland, Brisbane, QLD, Australia
- Deborah Jarvis
  - MRC-PHE Centre for Environment and Health, Imperial College London, London, UK; Population Health and Occupational Diseases, National Heart and Lung Institute, Imperial College London, London, UK
- Michael J Abramson
  - School of Public Health & Preventive Medicine, Monash University, Melbourne, VIC, Australia
- Caroline J Lodge
  - Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia; Murdoch Children's Research Institute, Melbourne, VIC, Australia
- Shyamali C Dharmage
  - Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia; Murdoch Children's Research Institute, Melbourne, VIC, Australia
|
30
|
Bhattacharya M, Jurkovitz C, Shatkay H. Co-occurrence of medical conditions: Exposing patterns through probabilistic topic modeling of snomed codes. J Biomed Inform 2018; 82:31-40. [PMID: 29655947 PMCID: PMC6510486 DOI: 10.1016/j.jbi.2018.04.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/13/2018] [Revised: 04/10/2018] [Accepted: 04/11/2018] [Indexed: 01/03/2023]
Abstract
Patients with multiple co-occurring health conditions often face aggravated complications and less favorable outcomes. Co-occurring conditions are especially prevalent among individuals with kidney disease, an increasingly widespread condition affecting 13% of the general population in the US. This study aims to identify and characterize patterns of co-occurring medical conditions in patients using a probabilistic framework. Specifically, we apply topic modeling in a non-traditional way to find associations across SNOMED-CT codes assigned and recorded in the EHRs of >13,000 patients diagnosed with kidney disease. Unlike most prior work on topic modeling, we apply the method to codes rather than to natural language. Moreover, we quantitatively evaluate the topics, assessing their tightness and distinctiveness, and also assess the medical validity of our results. Our experiments show that each topic is succinctly characterized by a few highly probable and unique disease codes, indicating that the topics are tight. Furthermore, the inter-topic distance between each pair of topics is typically high, illustrating distinctiveness. Last, most coded conditions grouped together within a topic are indeed reported to co-occur in the medical literature. Notably, our results uncover a few indirect associations among conditions that have hitherto not been reported as correlated in the medical literature.
Affiliation(s)
- Moumita Bhattacharya
- Computational Biomedicine Lab, Computer and Information Sciences, University of Delaware, Newark, DE, USA
- Hagit Shatkay
- Computational Biomedicine Lab, Computer and Information Sciences, University of Delaware, Newark, DE, USA; Center for Bioinformatics and Computational Biology, Delaware Biotechnology Inst, University of Delaware, DE, USA; School of Computing, Queen's University, Kingston, ON K7L 3N6, Canada
31
Safdari R, Rezaei-Hachesu P, GhaziSaeedi M, Samad-Soltani T, Zolnoori M. Evaluation of Classification Algorithms vs Knowledge-Based Methods for Differential Diagnosis of Asthma in Iranian Patients. International Journal of Information Systems in the Service Sector 2018. [DOI: 10.4018/ijisss.2018040102] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/09/2022]
Abstract
Medical data mining aims to solve real-world problems in the diagnosis and treatment of diseases. The process applies various techniques and algorithms with differing levels of accuracy and precision. The purpose of this article is to apply data mining techniques to the diagnosis of asthma. The sensitivity, specificity, and accuracy of K-nearest neighbor, support vector machine (SVM), naive Bayes, artificial neural network, classification tree, and CN2 algorithms were evaluated and compared with related studies. ROC curves were plotted to show the performance of the authors' approach. The SVM algorithm achieved the highest accuracy at 98.59%, with a sensitivity of 98.59% and a specificity of 98.61% for class 1. The other algorithms all achieved accuracy greater than 87%. The results show that the authors can accurately diagnose asthma approximately 98% of the time based on demographic and clinical data. The approach also has a higher sensitivity than expert and knowledge-based systems.
Affiliation(s)
- Reza Safdari
- Department of Health Information Technology, Tehran University of Medical Sciences, Tehran, Iran
- Peyman Rezaei-Hachesu
- Department of Health Information Technology, Tabriz University of Medical Sciences, Tabriz, Iran
- Marjan GhaziSaeedi
- Department of Health Information Technology, Tehran University of Medical Sciences, Tehran, Iran
- Taha Samad-Soltani
- Department of Health Information Technology, Tabriz University of Medical Sciences, Tabriz, Iran
32
Afzal N, Mallipeddi VP, Sohn S, Liu H, Chaudhry R, Scott CG, Kullo IJ, Arruda-Olson AM. Natural language processing of clinical notes for identification of critical limb ischemia. Int J Med Inform 2018; 111:83-89. [PMID: 29425639 PMCID: PMC5808583 DOI: 10.1016/j.ijmedinf.2017.12.024] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 09/05/2017] [Revised: 12/17/2017] [Accepted: 12/27/2017] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Critical limb ischemia (CLI) is a complication of advanced peripheral artery disease (PAD), with diagnosis based on the presence of clinical signs and symptoms. However, automated identification of cases from electronic health records (EHRs) is challenging because there is no single definitive International Classification of Diseases (ICD-9 or ICD-10) code for CLI. METHODS AND RESULTS: In this study, we extend a previously validated natural language processing (NLP) algorithm for PAD identification to develop and validate a subphenotyping NLP algorithm (CLI-NLP) for identification of CLI cases from clinical notes. We compared the performance of the CLI-NLP algorithm with that of CLI-related ICD-9 billing codes. The gold standard for validation was human abstraction of clinical notes from EHRs. Compared with billing codes, the CLI-NLP algorithm had higher positive predictive value (PPV) (CLI-NLP 96%, billing codes 67%, p < 0.001), specificity (CLI-NLP 98%, billing codes 74%, p < 0.001), and F1-score (CLI-NLP 90%, billing codes 76%, p < 0.001). The sensitivity of the two methods was similar (CLI-NLP 84%; billing codes 88%; p < 0.12). CONCLUSIONS: The CLI-NLP algorithm for identification of CLI from narrative clinical notes in an EHR had excellent PPV and has potential for translation to patient care, as it will enable automated identification of CLI cases for quality projects and clinical decision support tools and will support a learning healthcare system.
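As a quick consistency check on the figures in this abstract, the F1-score is the harmonic mean of PPV (precision) and sensitivity (recall). The sketch below is not taken from the paper: the function name `f1_score` is a hypothetical helper, and the inputs are the rounded percentages quoted above, so the result is only approximate.

```python
def f1_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of positive predictive value and sensitivity."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Rounded values quoted in the abstract, expressed as fractions.
cli_nlp_f1 = f1_score(ppv=0.96, sensitivity=0.84)
billing_f1 = f1_score(ppv=0.67, sensitivity=0.88)

print(f"CLI-NLP F1: {cli_nlp_f1:.2f}")
print(f"Billing codes F1: {billing_f1:.2f}")
```

Under these assumptions, both values round to the F1-scores reported in the abstract (90% and 76%).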
Affiliation(s)
- Naveed Afzal
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Vishnu Priya Mallipeddi
- Department of Cardiovascular Diseases, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Sunghwan Sohn
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Rajeev Chaudhry
- Division of Primary Care Medicine, Knowledge Delivery Center and Center for Innovation, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Christopher G Scott
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Iftikhar J Kullo
- Department of Cardiovascular Diseases, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
- Adelaide M Arruda-Olson
- Department of Cardiovascular Diseases, Mayo Clinic and Mayo Foundation, Rochester, MN, United States
33
Ye C, Fu T, Hao S, Zhang Y, Wang O, Jin B, Xia M, Liu M, Zhou X, Wu Q, Guo Y, Zhu C, Li YM, Culver DS, Alfreds ST, Stearns F, Sylvester KG, Widen E, McElhinney D, Ling X. Prediction of Incident Hypertension Within the Next Year: Prospective Study Using Statewide Electronic Health Records and Machine Learning. J Med Internet Res 2018; 20:e22. [PMID: 29382633 PMCID: PMC5811646 DOI: 10.2196/jmir.9268] [Citation(s) in RCA: 114] [Impact Index Per Article: 19.0] [Received: 10/26/2017] [Revised: 12/05/2017] [Accepted: 12/06/2017] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND: As a high-prevalence health condition, hypertension is clinically costly, difficult to manage, and often leads to severe and life-threatening diseases such as cardiovascular disease (CVD) and stroke. OBJECTIVE: The aim of this study was to develop and prospectively validate a risk prediction model of incident essential hypertension within the following year. METHODS: Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. Retrospective (N=823,627, calendar year 2013) and prospective (N=680,810, calendar year 2014) cohorts were formed. A machine learning algorithm, XGBoost, was adopted in the process of feature selection and model building. It generated an ensemble of classification trees and assigned a final predictive risk score to each individual. RESULTS: The 1-year incident hypertension risk model attained areas under the curve (AUCs) of 0.917 and 0.870 in the retrospective and prospective cohorts, respectively. Risk scores were calculated and stratified into five risk categories, with 4526 out of 381,544 patients (1.19%) in the lowest risk category (score 0-0.05) and 21,050 out of 41,329 patients (50.93%) in the highest risk category (score 0.4-1) receiving a diagnosis of incident hypertension in the following year. Type 2 diabetes, lipid disorders, CVDs, mental illness, clinical utilization indicators, and socioeconomic determinants were recognized as driving or associated features of incident essential hypertension. The very high risk population mainly comprised elderly (age >50 years) individuals with multiple chronic conditions, especially those receiving medications for mental disorders. Disparities were also found in social determinants, including some community-level factors associated with higher risk and others that were protective against hypertension.
CONCLUSIONS: With statewide EHR datasets, our study prospectively validated an accurate 1-year risk prediction model for incident essential hypertension. Our real-time predictive analytic model has been deployed in the state of Maine, providing implications for interventions for hypertension and related diseases and hopefully enhancing hypertension care.
Affiliation(s)
- Chengyin Ye
- Department of Health Management, Hangzhou Normal University, Hangzhou, China; Department of Surgery, Stanford University, Stanford, CA, United States
- Tianyun Fu
- HBI Solutions Inc, Palo Alto, CA, United States
- Shiying Hao
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Stanford, CA, United States
- Yan Zhang
- Department of Oncology, The First Hospital of Shijiazhuang, Shijiazhuang, China
- Oliver Wang
- HBI Solutions Inc, Palo Alto, CA, United States
- Bo Jin
- HBI Solutions Inc, Palo Alto, CA, United States
- Minjie Xia
- HBI Solutions Inc, Palo Alto, CA, United States
- Modi Liu
- HBI Solutions Inc, Palo Alto, CA, United States
- Xin Zhou
- Tianjin Key Laboratory of Cardiovascular Remodeling and Target Organ Injury, Pingjin Hospital Heart Center, Tianjin, China
- Qian Wu
- China Electric Power Research Institute, Beijing, China
- Yanting Guo
- Department of Surgery, Stanford University, Stanford, CA, United States; School of Management, Zhejiang University, Hangzhou, China
- Yu-Ming Li
- Tianjin Key Laboratory of Cardiovascular Remodeling and Target Organ Injury, Pingjin Hospital Heart Center, Tianjin, China
- Karl G Sylvester
- Department of Surgery, Stanford University, Stanford, CA, United States
- Eric Widen
- HBI Solutions Inc, Palo Alto, CA, United States
- Doff McElhinney
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Stanford, CA, United States
- Xuefeng Ling
- Department of Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Stanford, CA, United States; Health Care Big Data Center, School of Public Health, Zhejiang University, Hangzhou, China
34
Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, Liu S, Zeng Y, Mehrabi S, Sohn S, Liu H. Clinical information extraction applications: A literature review. J Biomed Inform 2018; 77:34-49. [PMID: 29162496 PMCID: PMC5771858 DOI: 10.1016/j.jbi.2017.11.011] [Citation(s) in RCA: 340] [Impact Index Per Article: 56.7] [Received: 06/30/2017] [Revised: 11/01/2017] [Accepted: 11/17/2017] [Indexed: 12/24/2022]
Abstract
BACKGROUND: With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. OBJECTIVES: In this literature review, we survey recently published research on clinical IE applications. METHODS: A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS: A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies, and clinical workflow optimizations. CONCLUSIONS: Clinical IE has been used for a wide range of applications; however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge it.
Affiliation(s)
- Yanshan Wang
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Liwei Wang
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Majid Rastegar-Mojarad
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sungrim Moon
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Feichen Shen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Naveed Afzal
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sijia Liu
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Yuqun Zeng
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Saeed Mehrabi
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sunghwan Sohn
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Hongfang Liu
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
35
Spathis D, Vlamos P. Diagnosing asthma and chronic obstructive pulmonary disease with machine learning. Health Informatics J 2017; 25:811-827. [PMID: 28820010 DOI: 10.1177/1460458217723169] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 11/16/2022]
Abstract
This study examines clinical decision support systems in healthcare, in particular for the prevention, diagnosis, and treatment of respiratory diseases such as asthma and chronic obstructive pulmonary disease (COPD). An empirical pulmonology study of a representative sample (n = 132) attempts to identify the major factors that contribute to the diagnosis of these diseases. Machine learning results show that, for COPD, the Random Forest classifier outperforms other techniques with 97.7 per cent precision, while the most prominent attributes for diagnosis are smoking, forced expiratory volume in 1 second (FEV1), age, and forced vital capacity. For asthma, the best precision, 80.3 per cent, is again achieved with the Random Forest classifier, while the most prominent attribute is MEF2575.
Affiliation(s)
- Dimitris Spathis
- Ionian University, Greece; Aristotle University of Thessaloniki, Greece
36
37
Hao S, Fu T, Wu Q, Jin B, Zhu C, Hu Z, Guo Y, Zhang Y, Yu Y, Fouts T, Ng P, Culver DS, Alfreds ST, Stearns F, Sylvester KG, Widen E, McElhinney DB, Ling XB. Estimating One-Year Risk of Incident Chronic Kidney Disease: Retrospective Development and Validation Study Using Electronic Medical Record Data From the State of Maine. JMIR Med Inform 2017; 5:e21. [PMID: 28747298 PMCID: PMC5550735 DOI: 10.2196/medinform.7954] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 05/01/2017] [Revised: 06/29/2017] [Accepted: 07/10/2017] [Indexed: 01/28/2023] Open
Abstract
Background: Chronic kidney disease (CKD) is a major public health concern in the United States with high prevalence, growing incidence, and serious adverse outcomes. Objective: We aimed to develop and validate a model to identify patients at risk of receiving a new diagnosis of CKD (incident CKD) during the next 1 year in a general population. Methods: The study population consisted of patients who had visited any care facility in the Maine Health Information Exchange network any time between January 1, 2013, and December 31, 2015, and had no history of CKD diagnosis. Two retrospective cohorts of electronic medical records (EMRs) were constructed for model derivation (N=1,310,363) and validation (N=1,430,772). The model was derived using a gradient tree-based boost algorithm to assign a score to each individual that measured the probability of receiving a new diagnosis of CKD from January 1, 2014, to December 31, 2014, based on the preceding 1-year clinical profile. A feature selection process was conducted to reduce the dimension of the data from 14,680 EMR features to 146 predictors in the final model. Relative risk was calculated by the model to gauge the ratio of the individual's risk to the population mean risk of receiving a CKD diagnosis in the next 1 year. The model was tested on the validation cohort to predict risk of CKD diagnosis in the period from January 1, 2015, to December 31, 2015, using the preceding 1-year clinical profile. Results: The final model had a c-statistic of 0.871 in the validation cohort. It stratified patients into low-risk (score 0-0.005), intermediate-risk (score 0.005-0.05), and high-risk (score ≥ 0.05) levels. The incidence of CKD in the high-risk patient group was 7.94%, 13.7 times higher than the incidence in the overall cohort (0.58%). Survival analysis showed that patients in the 3 risk categories had significantly different CKD outcomes as a function of time (P<.001), indicating an effective classification of patients by the model.
Conclusions: We developed and validated a model that is able to identify patients at high risk of having CKD in the next 1 year by statistically learning from the EMR-based clinical history in the preceding 1 year. Identification of these patients indicates care opportunities such as monitoring and adopting intervention plans that may benefit the quality of care and outcomes in the long term.
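The "13.7 times higher" figure in the Results can be reproduced by dividing the high-risk group incidence by the overall cohort incidence. The sketch below is an illustration only, not the authors' code: `incidence_ratio` is a hypothetical helper name, and the inputs are the rounded percentages quoted in the abstract.

```python
def incidence_ratio(group_incidence_pct: float, overall_incidence_pct: float) -> float:
    """Ratio of a subgroup's incidence to the overall cohort incidence."""
    if overall_incidence_pct <= 0:
        raise ValueError("overall incidence must be positive")
    return group_incidence_pct / overall_incidence_pct


# Percentages quoted in the abstract: high-risk group vs. overall cohort.
ratio = incidence_ratio(group_incidence_pct=7.94, overall_incidence_pct=0.58)
print(f"High-risk incidence is {ratio:.1f} times the overall incidence")
```

With the rounded inputs, the ratio agrees with the reported value to one decimal place.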
Affiliation(s)
- Shiying Hao
- Department of Epidemiology and Health Statistics, School of Public Health, School of Medicine, Zhejiang University, Hangzhou, China; Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Tianyun Fu
- HBI Solutions Inc, Palo Alto, CA, United States
- Qian Wu
- Department of Surgery, Stanford University, Stanford, CA, United States; China Electric Power Research Institute, Beijing, China
- Bo Jin
- HBI Solutions Inc, Palo Alto, CA, United States
- Zhongkai Hu
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Yanting Guo
- Department of Surgery, Stanford University, Stanford, CA, United States; School of Management, Zhejiang University, Hangzhou, China
- Yan Zhang
- Department of Surgery, Stanford University, Stanford, CA, United States; Department of Oncology, The First Hospital of Shijiazhuang, Shijiazhuang, China
- Yunxian Yu
- Department of Epidemiology and Health Statistics, School of Public Health, School of Medicine, Zhejiang University, Hangzhou, China
- Terry Fouts
- Empactful Capital, San Francisco, CA, United States
- Phillip Ng
- Sequoia Hospital, Redwood City, CA, United States
- Karl G Sylvester
- Department of Surgery, Stanford University, Stanford, CA, United States
- Eric Widen
- HBI Solutions Inc, Palo Alto, CA, United States
- Doff B McElhinney
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Xuefeng B Ling
- Department of Epidemiology and Health Statistics, School of Public Health, School of Medicine, Zhejiang University, Hangzhou, China; Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States; Department of Surgery, Stanford University, Stanford, CA, United States
38
Xie S, Greenblatt R, Levy MZ, Himes BE. Enhancing Electronic Health Record Data with Geospatial Information. AMIA Joint Summits on Translational Science Proceedings 2017; 2017:123-132. [PMID: 28815121 PMCID: PMC5543367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
Electronic Health Record (EHR)-derived data is a valuable resource for research, and efforts are underway to overcome some of its limitations by using data from external sources to gain a fuller picture of patient characteristics, symptoms, and exposures. Our goal was to assess the utility of augmenting EHR data with geocoded patient addresses to identify geospatial variation of disease that is not explained by EHR-derived demographic factors. Using 2011-2014 encounter data from 27,604 University of Pennsylvania Hospital System asthma patients, we identified factors associated with asthma exacerbations: risk was higher in female, black, middle-aged to elderly, and obese patients, as well as in those with a positive smoking history and those with Medicare or Medicaid vs. private insurance. Significant geospatial variability of asthma exacerbations was found using generalized additive models, even after adjusting for demographic factors. Our work shows that geospatial data can be used to cost-effectively enhance EHR data.
Affiliation(s)
- Sherrie Xie
- Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA
- Rebecca Greenblatt
- Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA
- Michael Z Levy
- Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA
- Blanca E Himes
- Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA
39
Abstract
Over the last three decades, researchers have extensively examined how context-aware systems can help people, particularly those with incurable diseases, cope with their illness. Over the years, a large number of studies on chronic obstructive pulmonary disease (COPD) have been published. However, deriving relevant attributes and detecting COPD exacerbations early remain challenging. In this work, we use an efficient algorithm to select relevant attributes, a task for which no established approach exists in this domain. The algorithm predicts exacerbations with high accuracy by adding a discretization step, and it ranks the pertinent attributes by their impact to facilitate emergency medical treatment. In this paper, we propose an extension of our existing Helper Context-Aware Engine System (HCES) for COPD. The project uses a Bayesian network to represent the dependencies between COPD symptoms (attributes), overcoming the limitations of the independence assumption of naïve Bayes. In addition, the dependencies in the Bayesian network are learned using the TAN algorithm rather than elicited from pneumologists. Together, these algorithms (discretization, selection, dependency modeling, and ordering of the relevant attributes) constitute an effective prediction model that compares favorably with existing ones. Moreover, we investigate and compare different orderings of these algorithms to determine which sequence of steps yields the most accurate results. Finally, we designed and validated a computer-aided support application that integrates the steps of this model. Our HCES system has shown promising results, with an area under the receiver operating characteristic curve (AUC) of 81.5%.
40
Kang MJ, Jin Y, Jin T, Lee SM. Automated Medication Error Risk Assessment System (Auto-MERAS). J Nurs Care Qual 2017; 33:86-93. [PMID: 28505057 DOI: 10.1097/ncq.0000000000000266] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
Abstract
This study developed the Automated Medication Error Risk Assessment System (Auto-MERAS), which was incorporated into the electronic health record system. The system maintained high predictive validity for medication errors, with areas under the receiver operating characteristic curve above 0.80 at both development and validation. The study shows that it is possible to predict the risk of medication errors in a way that is sensitive to situational and environmental risks, without additional data entry from nurses.
Affiliation(s)
- Min-Jeoung Kang
- College of Nursing, The Catholic University of Korea, Seoul, Korea
41
Aref-Eshghi E, Oake J, Godwin M, Aubrey-Bassler K, Duke P, Mahdavian M, Asghari S. Identification of Dyslipidemic Patients Attending Primary Care Clinics Using Electronic Medical Record (EMR) Data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) Database. J Med Syst 2017; 41:45. [PMID: 28188559 DOI: 10.1007/s10916-017-0694-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/28/2016] [Accepted: 01/30/2017] [Indexed: 12/14/2022]
Abstract
The objective of this study was to define the optimal algorithm to identify patients with dyslipidemia using electronic medical records (EMRs). EMRs of patients attending primary care clinics in St. John's, Newfoundland and Labrador (NL), Canada, during 2009-2010 were studied to determine the best algorithm for identification of dyslipidemia. Six algorithms containing three components (dyslipidemia ICD coding, lipid-lowering medication use, and abnormal laboratory lipid levels) were tested against a gold standard, defined as the existence of any of the three criteria. Linear discriminant analysis and bootstrapping were performed following sensitivity/specificity testing and receiver operating characteristic curve analysis. Two validating datasets, NL records of 2011-2014 and Canada-wide records of 2010-2012, were used to replicate the results. Relative to the gold standard, combining laboratory data with lipid-lowering medication consumption yielded the highest sensitivity (99.6%), negative predictive value (NPV, 98.1%), Kappa agreement (0.98), and area under the curve (AUC, 0.998). The linear discriminant analysis for this combination resulted in an error rate of 0.15 and an eigenvalue of 1.99, and the bootstrapping led to an AUC of 0.998 (95% confidence interval: 0.997-0.999) and a Kappa of 0.99. In the first validating dataset, this algorithm yielded a sensitivity of 97%, an NPV of 83%, a Kappa of 0.88, and an AUC of 0.98. These figures for the second validating dataset were 98%, 93%, 0.95, and 0.99, respectively. Combining laboratory data with lipid-lowering medication consumption within the EMR is the best algorithm for detecting dyslipidemia. These results can generate standardized information systems for dyslipidemia and other chronic disease investigations using EMRs.
Affiliation(s)
- Erfan Aref-Eshghi
- Faculty of Medicine, Center for Rural Health Studies, Agnes Cowan Hostel, Health Sciences Centre, Memorial University of Newfoundland, Room 425, 300 Prince Philip Drive, St. John's, NL, A1B 3V6, Canada; Primary Healthcare Research Unit, Department of Family Medicine, Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, Canada
- Justin Oake
- Faculty of Medicine, Center for Rural Health Studies, Agnes Cowan Hostel, Health Sciences Centre, Memorial University of Newfoundland, Room 425, 300 Prince Philip Drive, St. John's, NL, A1B 3V6, Canada; Primary Healthcare Research Unit, Department of Family Medicine, Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, Canada
- Marshall Godwin
- Primary Healthcare Research Unit, Department of Family Medicine, Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, Canada
- Kris Aubrey-Bassler
- Faculty of Medicine, Center for Rural Health Studies, Agnes Cowan Hostel, Health Sciences Centre, Memorial University of Newfoundland, Room 425, 300 Prince Philip Drive, St. John's, NL, A1B 3V6, Canada
- Pauline Duke
- Primary Healthcare Research Unit, Department of Family Medicine, Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, Canada
- Masoud Mahdavian
- Department of Medicine, Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, Canada
- Shabnam Asghari
- Faculty of Medicine, Center for Rural Health Studies, Agnes Cowan Hostel, Health Sciences Centre, Memorial University of Newfoundland, Room 425, 300 Prince Philip Drive, St. John's, NL, A1B 3V6, Canada; Primary Healthcare Research Unit, Department of Family Medicine, Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, Canada
42
Non-obvious correlations to disease management unraveled by Bayesian artificial intelligence analyses of CMS data. Artif Intell Med 2016; 74:1-8. [PMID: 27964799 DOI: 10.1016/j.artmed.2016.11.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 07/25/2016] [Revised: 11/01/2016] [Accepted: 11/07/2016] [Indexed: 12/23/2022]
Abstract
OBJECTIVE: Given the availability of extensive digitized healthcare data from medical records, claims and prescription information, it is now possible to use hypothesis-free, data-driven approaches to mine medical databases for novel insight. The goal of this analysis was to demonstrate the use of artificial intelligence-based methods such as Bayesian networks to open up opportunities for creation of new knowledge in management of chronic conditions. MATERIALS AND METHODS: Hospital-level Medicare claims data containing discharge numbers for the most common diagnoses were analyzed in a hypothesis-free manner using Bayesian network learning methodology. RESULTS: While many interactions identified between discharge rates of diagnoses using this data set are supported by current medical knowledge, a novel interaction linking asthma and renal failure was discovered. This interaction is non-obvious and had not been examined by the research and clinical communities in epidemiological or clinical data. A plausible pharmacological explanation of this link is proposed together with a verification of the risk significance by conventional statistical analysis. CONCLUSION: Potential clinical and molecular pathways defining the relationship between commonly used asthma medications and renal disease are discussed. The study underscores the need for further epidemiological research to validate this novel hypothesis. Validation will lead to advances in the clinical treatment of asthma and bronchitis, thereby improving patient outcomes and leading to long-term cost savings. In summary, this study demonstrates that application of advanced artificial intelligence methods in healthcare has the potential to enhance the quality of care by discovering non-obvious, clinically relevant relationships and enabling timely care intervention.
|
43
|
Marín de Mas I, Fanchon E, Papp B, Kalko S, Roca J, Cascante M. Molecular mechanisms underlying COPD-muscle dysfunction unveiled through a systems medicine approach. Bioinformatics 2016; 33:95-103. [PMID: 27794560 DOI: 10.1093/bioinformatics/btw566] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/29/2016] [Revised: 08/26/2016] [Accepted: 08/29/2016] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION: Skeletal muscle dysfunction is a systemic effect in one-third of patients with chronic obstructive pulmonary disease (COPD), characterized by high reactive-oxygen-species (ROS) production and abnormal endurance training-induced adaptive changes. However, the role of ROS in COPD remains unclear, not least because of the lack of appropriate tools to study multifactorial diseases. RESULTS: We describe a discrete model-driven method combining mechanistic and probabilistic approaches to decipher the role of ROS on the activity state of the skeletal muscle regulatory network, assessed before and after an 8-week endurance training program in COPD patients and healthy subjects. In COPD, our computational analysis indicates abnormal training-induced regulatory responses leading to defective tissue remodeling and abnormal energy metabolism. Moreover, we identified tnf, insr, inha and myc as key regulators of abnormal training-induced adaptations in COPD. The tnf-insr pair was identified as a promising target for therapeutic interventions. Our work sheds new light on skeletal muscle dysfunction in COPD, opening new avenues for cost-effective therapies. It overcomes limitations of previous computational approaches, showing high potential for the study of other multifactorial diseases such as diabetes or cancer. CONTACT: jroca@clinic.ub.es or martacascante@ub.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Igor Marín de Mas
- Department of Biochemistry and Molecular Biology, Faculty of Biology, Institute of Biomedicine of University of Barcelona (IBUB) and IDIBAPS, Diagonal 645, Barcelona 08028, Spain; Institut d' Investigacions Biomediques August Pi i Sunyer (IDIBAPS), Barcelona 08028, Spain; Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Center of the Hungarian Academy of Sciences, Temesvári krt. 62, Szeged H-6726, Hungary
- Eric Fanchon
- Université Grenoble Alpes-CNRS, TIMC-IMAG UMR 5525, Faculté de Médecine, Grenoble 38041, France
- Balázs Papp
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Center of the Hungarian Academy of Sciences, Temesvári krt. 62, Szeged H-6726, Hungary
- Susana Kalko
- Bioinformatics Core Facility, IDIBAPS-CEK, Hospital Clínic, University de Barcelona, Barcelona 08036, Spain
- Josep Roca
- Institut d' Investigacions Biomediques August Pi i Sunyer (IDIBAPS), Barcelona 08028, Spain; Department of Pulmonary Medicine, Hospital Clínic, IDIBAPS, CIBERES, Universitat de Barcelona, Barcelona 08036, Spain
- Marta Cascante
- Department of Biochemistry and Molecular Biology, Faculty of Biology, Institute of Biomedicine of University of Barcelona (IBUB) and IDIBAPS, Diagonal 645, Barcelona 08028, Spain; Institut d' Investigacions Biomediques August Pi i Sunyer (IDIBAPS), Barcelona 08028, Spain
|
44
|
Karmakar C, Luo W, Tran T, Berk M, Venkatesh S. Predicting Risk of Suicide Attempt Using History of Physical Illnesses From Electronic Medical Records. JMIR Ment Health 2016; 3:e19. [PMID: 27400764 PMCID: PMC4960407 DOI: 10.2196/mental.5475] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/21/2015] [Revised: 02/23/2016] [Accepted: 02/26/2016] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND: Although physical illnesses, routinely documented in electronic medical records (EMR), have been found to be a contributing factor to suicides, no automated systems use this information to predict suicide risk. OBJECTIVE: The aim of this study is to quantify the impact of physical illnesses on suicide risk, and develop a predictive model that captures this relationship using EMR data. METHODS: We used history of physical illnesses (except chapter V: Mental and behavioral disorders) from EMR data over different time-periods to build a lookup table that contains the probability of suicide risk for each chapter of the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) codes. The lookup table was then used to predict the probability of suicide risk for any new assessment. Based on the different lengths of history of physical illnesses, we developed six different models to predict suicide risk. We tested the performance of the developed models to predict 90-day risk using historical data over differing time-periods ranging from 3 to 48 months. A total of 16,858 assessments from 7399 mental health patients with at least one risk assessment were used for the validation of the developed model. The performance was measured using area under the receiver operating characteristic curve (AUC). RESULTS: The best predictive results were derived (AUC=0.71) using combined data across all time-periods, which significantly outperformed the clinical baseline derived from routine risk assessment (AUC=0.56). The proposed approach thus shows potential to be incorporated in the broader risk assessment processes used by clinicians. CONCLUSIONS: This study provides a novel approach to exploit the history of physical illnesses extracted from EMR (ICD-10 codes without chapter V-mental and behavioral disorders) to predict suicide risk, and this model outperforms existing clinical assessments of suicide risk.
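One way to picture the lookup-table approach described in this abstract is sketched below in Python. The abstract does not say how the per-chapter probabilities are combined for a new assessment, so averaging them is purely an assumption made here for illustration, as are the function names `build_lookup` and `predict` and the toy records.

```python
from collections import defaultdict


def build_lookup(records):
    """Estimate an outcome probability for each ICD-10 chapter.

    records: iterable of (chapters, outcome) pairs, where chapters is a
    set of chapter labels seen in the patient's history and outcome is
    1 if the event followed the assessment, else 0.
    """
    counts = defaultdict(lambda: [0, 0])  # chapter -> [events, total]
    for chapters, outcome in records:
        for chapter in chapters:
            counts[chapter][0] += outcome
            counts[chapter][1] += 1
    return {ch: events / total for ch, (events, total) in counts.items()}


def predict(lookup, chapters, default=0.0):
    """Score a new assessment from the chapters in its history.

    ASSUMPTION: chapter probabilities are averaged; the combination
    rule used in the paper is not given in the abstract.
    """
    probs = [lookup[ch] for ch in chapters if ch in lookup]
    return sum(probs) / len(probs) if probs else default


# Toy records for illustration only (not study data).
history = [({"IX", "X"}, 1), ({"IX"}, 0), ({"X"}, 1), ({"XI"}, 0)]
table = build_lookup(history)
print(predict(table, {"IX", "X"}))
```

Chapters never seen during table construction are ignored at prediction time, and an assessment with no known chapters falls back to `default`.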
Affiliation(s)
- Chandan Karmakar
- Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong, Australia.
|
45
|
Goldstein BA, Navar AM, Pencina MJ, Ioannidis JPA. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc 2016; 24:198-208. [PMID: 27189013 DOI: 10.1093/jamia/ocw042] [Citation(s) in RCA: 449] [Impact Index Per Article: 56.1] [Received: 10/27/2015] [Revised: 01/25/2016] [Accepted: 02/20/2016] [Indexed: 12/23/2022] Open
Abstract
OBJECTIVE: Electronic health records (EHRs) are an increasingly common data source for clinical risk prediction, presenting both unique analytic opportunities and challenges. We sought to evaluate the current state of EHR-based risk prediction modeling through a systematic review of clinical prediction studies using EHR data. METHODS: We searched PubMed for articles that reported on the use of an EHR to develop a risk prediction model from 2009 to 2014. Articles were extracted by two reviewers, and we abstracted information on study design, use of EHR data, model building, and performance from each publication and supplementary documentation. RESULTS: We identified 107 articles from 15 different countries. Studies were generally very large (median sample size = 26 100) and utilized a diverse array of predictors. Most used validation techniques (n = 94 of 107) and reported model coefficients for reproducibility (n = 83). However, studies did not fully leverage the breadth of EHR data, as they uncommonly used longitudinal information (n = 37) and employed relatively few predictor variables (median = 27 variables). Fewer than half of the studies were multicenter (n = 50) and only 26 performed validation across sites. Many studies did not fully address biases of EHR data such as missing data or loss to follow-up. Average c-statistics for different outcomes were: mortality (0.84), clinical prediction (0.83), hospitalization (0.71), and service utilization (0.71). CONCLUSIONS: EHR data present both opportunities and challenges for clinical risk prediction. There is room for improvement in designing such studies.
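For a binary outcome, the c-statistic summarized in this abstract is the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. A minimal Python sketch of that definition follows; the function name `c_statistic` and the example scores are hypothetical and do not come from any of the reviewed studies.

```python
def c_statistic(scores, labels):
    """Concordance (c-statistic / AUC) for a binary outcome.

    scores: predicted risks; labels: 1 for patients with the outcome,
    0 for patients without it. Uses the direct pairwise definition,
    which is O(n_pos * n_neg) and intended only for small examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return concordant / (len(pos) * len(neg))


# Illustrative values only.
print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the averages quoted in the abstract.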
Affiliation(s)
- Benjamin A Goldstein
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC 27710, USA; Center for Predictive Medicine, Duke Clinical Research Institute, Duke University, Durham, NC 27710, USA
- Ann Marie Navar
- Center for Predictive Medicine, Duke Clinical Research Institute, Duke University, Durham, NC 27710, USA; Division of Cardiology at Duke University Medical Center, Durham, NC 27710, USA
- Michael J Pencina
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC 27710, USA; Center for Predictive Medicine, Duke Clinical Research Institute, Duke University, Durham, NC 27710, USA
- John P A Ioannidis
- Department of Medicine, Stanford University, Palo Alto, CA 94305, USA; Department of Health Research and Policy, and Statistics and Meta-Research Innovation Center at Stanford, Stanford University, Palo Alto, CA 94305, USA
|
46
|
Afzal N, Sohn S, Abram S, Liu H, Kullo IJ, Arruda-Olson AM. Identifying Peripheral Arterial Disease Cases Using Natural Language Processing of Clinical Notes. IEEE-EMBS International Conference on Biomedical and Health Informatics 2016; 2016:126-131. [PMID: 28111640 PMCID: PMC5248569 DOI: 10.1109/bhi.2016.7455851] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/31/2022]
Abstract
Peripheral arterial disease (PAD) is a chronic disease that affects millions of people worldwide. Ascertaining PAD status from clinical notes by manual chart review is labor intensive and time consuming. In this paper, we describe a natural language processing (NLP) algorithm for automated ascertainment of PAD status from clinical notes using predetermined criteria. We developed and evaluated our system against a gold standard that was created by medical experts based on manual chart review. Our system ascertained PAD status from clinical notes with high sensitivity (0.96), positive predictive value (0.92), negative predictive value (0.99) and specificity (0.98). NLP approaches can be used for rapid, efficient and automated ascertainment of PAD cases with implications for patient care and epidemiologic research.
Affiliation(s)
- Naveed Afzal
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester MN
- Sunghwan Sohn
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester MN
- Sara Abram
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester MN
- Hongfang Liu
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester MN
|
47
|
Uzuner Ö, Stubbs A. Practical applications for natural language processing in clinical research: The 2014 i2b2/UTHealth shared tasks. J Biomed Inform 2015; 58 Suppl:S1-S5. [PMID: 26515500 PMCID: PMC4978169 DOI: 10.1016/j.jbi.2015.10.007] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 09/21/2015] [Revised: 10/08/2015] [Accepted: 10/14/2015] [Indexed: 12/29/2022]
Affiliation(s)
- Özlem Uzuner
- Department of Information Studies, State University of New York at Albany, Albany, NY, USA.
- Amber Stubbs
- School of Library and Information Science, Simmons College, Boston, MA, USA.
|
48
|
Canino G, Guzzi PH, Tradigo G, Zhang A, Veltri P. On the Analysis of Diseases and Their Related Geographical Data. IEEE J Biomed Health Inform 2015; 21:228-237. [PMID: 26540721 DOI: 10.1109/jbhi.2015.2496424] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
Abstract
Electronic medical records (EMRs) store data about patients collected during their stay in health structures. Data stored in EMRs range from data gathered from biological laboratories to textual descriptions of diseases and diagnostic device results (e.g., biomedical images). Each EMR is related to a diagnosis related group (DRG) record. A DRG record is a record associated with a citizen who has been treated in a hospital. It contains a code, called major diagnostic category (MDC), which summarizes the treated disease and allows the costs of patient treatment during the stay in health structures to be reimbursed. DRGs are used for administrative processes (e.g., costs and reimbursement management) as well as disease monitoring. Associating diagnostic codes with external information (such as environmental and geographical data) and with information filtered from EMRs (e.g., biological results or analyte values) can be useful to monitor citizens' wellness status. We propose a methodology to analyze such data based on a multistep process. First, we cross-reference data by using a semantics-based clustering procedure, extract information from EMRs, and then cluster them by looking for similar patterns of diseases. Then, biological records in each disease cluster are analyzed to evaluate intracluster similarity by selecting analyte typologies and values. Finally, biological data are related to diagnosis codes and geometrically projected in areas of interest in order to map calculated outlier patients. We applied the methodology to two case studies: 1) diagnosis codes and biochemical analytes of 20 000 biological analyses of patients hospitalized during one observation year and 2) the correlation between cardiovascular diseases and water quality in a southern Italian region. Preliminary findings show the effectiveness of our method.
|
49
|
Solomon JW, Nielsen RD. Predicting changes in systolic blood pressure using longitudinal patient records. J Biomed Inform 2015. [PMID: 26210360 DOI: 10.1016/j.jbi.2015.06.024] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Abstract
OBJECTIVE: This paper introduces a model that predicts future changes in systolic blood pressure (SBP) based on structured and unstructured (text-based) information from longitudinal clinical records. METHOD: For each patient, the clinical records are sorted in chronological order and SBP measurements are extracted from them. The model predicts future changes in SBP based on the preceding clinical notes. This is accomplished using least median squares regression on salient features found using a feature selection algorithm. RESULTS: Using the prediction model, a correlation coefficient of 0.47 is achieved on unseen test data (p<.0001). This is in contrast to a baseline correlation coefficient of 0.39.
|
50
|
Relationship between serum leptin and chronic obstructive pulmonary disease in US adults: results from the third National Health and Nutrition Examination Survey. J Investig Med 2015; 62:934-7. [PMID: 25118115 DOI: 10.1097/jim.0000000000000104] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
Abstract
BACKGROUND: Recent studies suggest an important role for leptin in respiratory immune responses and pathogenesis of inflammatory respiratory diseases. There has been interest in exploring whether leptin plays any role in the pathogenesis of chronic obstructive pulmonary disease (COPD). OBJECTIVE: We conducted a population-based study to evaluate the relationship between serum leptin and COPD in the third US National Health and Nutrition Examination Survey participants. PARTICIPANTS AND DESIGN: Our study group comprised 6415 adults who had fasting serum leptin measured and underwent spirometry. MAIN OUTCOME MEASURES: Serum leptin levels were compared (1) between subjects with normal lung function and those with COPD and (2) among COPD subjects with different severities. RESULTS: Among male participants, 2257 were controls, and 680 had COPD. Compared with controls, COPD subjects were older (62 vs 43 years) and had higher prevalence of smokers (78% vs 58%), lower body mass index (BMI) (26.3 vs 26.9), and higher serum leptin levels (6.6 vs 5.9). For female participants, 2918 were controls, and 560 had COPD. Those with COPD were older (60 vs 43 years) and had lower BMI (26.9 vs 27.7). No differences in serum leptin levels were observed. The independent predictors of COPD in both sexes were age, BMI, and smoking, but not serum leptin. There were no differences in serum leptin among COPD subjects with different severities. CONCLUSIONS: We did not find any significant difference in the levels of serum leptin in subjects with COPD. Our data provide indirect evidence against a major role for serum leptin in the pathogenesis of COPD in humans.
|