1
Chaudhary R, Nourelahi M, Thoma FW, Gellad WF, Lo-Ciganic WH, Chaudhary R, Dua A, Bliden KP, Gurbel PA, Neal MD, Jain S, Bhonsale A, Mulukutla SR, Wang Y, Harinstein ME, Saba S, Visweswaran S. Machine Learning Predicts Bleeding Risk in Atrial Fibrillation Patients on Direct Oral Anticoagulant. Am J Cardiol 2025:S0002-9149(25)00115-8. [PMID: 40015543 DOI: 10.1016/j.amjcard.2025.02.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Revised: 02/01/2025] [Accepted: 02/23/2025] [Indexed: 03/01/2025]
Abstract
Predicting major bleeding in non-valvular atrial fibrillation (AF) patients on direct oral anticoagulants (DOACs) is crucial for personalized care. Alternatives like left atrial appendage closure devices lower stroke risk with fewer non-procedural bleeds. This study compares machine learning (ML) models with conventional bleeding risk scores (HAS-BLED, ORBIT, and ATRIA) for predicting bleeding events requiring hospitalization in AF patients on DOACs at their index cardiologist visit. This retrospective cohort study used electronic health records from 2010-2022 at the University of Pittsburgh Medical Center. It included 24,468 non-valvular AF patients (age ≥18) on DOACs, excluding those with prior significant bleeding or warfarin use. The primary outcome was hospitalization for bleeding within one year, with follow-up at one, two, and five years. ML algorithms (logistic regression, classification trees, random forest, XGBoost, k-nearest neighbor, naïve Bayes) were compared for performance. Of 24,468 patients, 553 (2.3%) had bleeding within one year, 829 (3.5%) within two years, and 1,292 (5.8%) within five years. ML models outperformed HAS-BLED, ATRIA, and ORBIT in 1-year predictions. The random forest model achieved an AUC of 0.76 (0.70-0.81), G-Mean of 0.67, and net reclassification index of 0.14 compared to HAS-BLED's AUC of 0.57 (p<0.001). ML models showed superior results across all timepoints and for hemorrhagic stroke. SHAP analysis identified new risk factors, including BMI, cholesterol profile, and insurance type. In conclusion, ML models demonstrated improved performance over conventional bleeding risk scores and uncovered novel risk factors, offering potential for more personalized bleeding risk assessment in AF patients on DOACs.
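The abstract reports a G-Mean alongside the AUC but does not define it. As a minimal sketch, assuming the definition commonly used for imbalanced classification (the geometric mean of sensitivity and specificity), the metric can be computed from confusion-matrix counts. The function name `g_mean` and the counts below are hypothetical illustrations, not values or code from the study.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts for illustration only (not from the study).
balanced = g_mean(tp=60, fn=40, tn=750, fp=250)
# A classifier that never predicts the rare event misses every positive case.
always_negative = g_mean(tp=0, fn=100, tn=1000, fp=0)
print(round(balanced, 3), always_negative)
```

With an outcome as rare as the 2.3% one-year bleeding rate reported above, a model that never predicts bleeding would be about 97.7% accurate yet score a G-Mean of 0, which is why the metric is informative for this kind of data.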
Affiliation(s)
- Rahul Chaudhary
  - Heart and Vascular Institute, University of Pittsburgh Medical Center, Pittsburgh, PA, USA; Department of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA; AI-HEART Lab, Pittsburgh, PA, USA.
- Mehdi Nourelahi
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Floyd W Thoma
  - Heart and Vascular Institute, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Walid F Gellad
  - Division of General Internal Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania
- Wei-Hsuan Lo-Ciganic
  - Division of General Internal Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania; Center for Pharmaceutical Policy and Prescribing, University of Pittsburgh, Pittsburgh, USA; Geriatric Research Education and Clinical Center, North Florida/South Georgia Veterans Health System, Gainesville, USA
- Rohit Chaudhary
  - Uniting New South Wales.Australian Capital Territory, Sydney, AUS
- Anahita Dua
  - Division of Vascular and Endovascular Surgery, Massachusetts General Hospital, Boston, MA, USA
- Kevin P Bliden
  - Sinai Center of Thrombosis Research and Drug Development, Sinai Hospital of Baltimore, Baltimore, MD, USA
- Paul A Gurbel
  - Sinai Center of Thrombosis Research and Drug Development, Sinai Hospital of Baltimore, Baltimore, MD, USA
- Matthew D Neal
  - Trauma and Transfusion Medicine Research Center, Department of Surgery, University of Pittsburgh, Pittsburgh, PA, USA
- Sandeep Jain
  - Heart and Vascular Institute, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Aditya Bhonsale
  - Heart and Vascular Institute, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Suresh R Mulukutla
  - Heart and Vascular Institute, University of Pittsburgh Medical Center, Pittsburgh, PA, USA; Clinical Analytics, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Yanshan Wang
  - Department of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA; Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA; Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Matthew E Harinstein
  - Heart and Vascular Institute, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Samir Saba
  - Heart and Vascular Institute, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Shyam Visweswaran
  - Department of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA; Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
2
Oneto L, Chicco D. Eight quick tips for biologically and medically informed machine learning. PLoS Comput Biol 2025; 21:e1012711. [PMID: 39787089 PMCID: PMC11717244 DOI: 10.1371/journal.pcbi.1012711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2025] Open
Abstract
Machine learning has become a powerful tool for computational analysis in the biomedical sciences, with its effectiveness significantly enhanced by integrating domain-specific knowledge. This integration has given rise to informed machine learning, in contrast to studies that lack domain knowledge and treat all variables equally (uninformed machine learning). While the application of informed machine learning to bioinformatics and health informatics datasets has become more seamless, the likelihood of errors has also increased. To address this drawback, we present eight guidelines outlining best practices for employing informed machine learning methods in biomedical sciences. These quick tips offer recommendations on various aspects of informed machine learning analysis, aiming to assist researchers in generating more robust, explainable, and dependable results. Although we originally crafted these eight simple suggestions for novices, we believe they are relevant for expert computational researchers as well.
Affiliation(s)
- Luca Oneto
  - Dipartimento di Informatica Bioingegneria Robotica e Ingegneria dei Sistemi, Università di Genova, Genoa, Italy
- Davide Chicco
  - Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
3
Laffafchi S, Ebrahimi A, Kafan S. Efficient management of pulmonary embolism diagnosis using a two-step interconnected machine learning model based on electronic health records data. Health Inf Sci Syst 2024; 12:17. [PMID: 38464464 PMCID: PMC10917730 DOI: 10.1007/s13755-024-00276-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 01/17/2024] [Indexed: 03/12/2024] Open
Abstract
Pulmonary Embolism (PE) is a life-threatening clinical disease with no specific clinical symptoms and Computed Tomography Angiography (CTA) is used for diagnosis. Clinical decision support scoring systems like Wells and rGeneva based on PE risk factors have been developed to estimate the pre-test probability but are underused, leading to continuous overuse of CTA imaging. This diagnostic study aimed to propose a novel approach for efficient management of PE diagnosis using a two-step interconnected machine learning framework directly by analyzing patients' Electronic Health Records data. First, we performed feature importance analysis according to the result of LightGBM superiority for PE prediction, then four state-of-the-art machine learning methods were applied for PE prediction based on the feature importance results, enabling swift and accurate pre-test diagnosis. Throughout the study, patients' data from different departments were collected from Sina Educational Hospital, affiliated with the Tehran University of Medical Sciences in Iran. Overall, the Ridge classification method obtained the best performance with an F1 score of 0.96. Extensive experimental findings showed the effectiveness and simplicity of this diagnostic process of PE in comparison with the existing scoring systems. The main strength of this approach centered on PE disease management procedures, which would reduce avoidable invasive CTA imaging and be applied as a primary prognosis of PE, hence assisting the healthcare system, clinicians, and patients by reducing costs and promoting treatment quality and patient satisfaction.
Affiliation(s)
- Soroor Laffafchi
  - Department of Business Administration and Entrepreneurship, Faculty of Management and Economics, Science and Research Branch, Islamic Azad University, Daneshgah Blvd, Simon Bulivar Blvd, Tehran, Iran
- Ahmad Ebrahimi
  - Department of Industrial and Technology Management, Faculty of Management and Economics, Science and Research Branch, Islamic Azad University, Daneshgah Blvd, Simon Bulivar Blvd, Tehran, Iran
- Samira Kafan
  - Department of Pulmonary Medicine, Sina Hospital, International Relations Office, Medical School, Tehran University of Medical Sciences, PourSina St., Tehran, 1417613151 Iran
4
Soares Dias Portela A, Saxena V, Rosenn E, Wang SH, Masieri S, Palmieri J, Pasinetti GM. Role of Artificial Intelligence in Multinomial Decisions and Preventative Nutrition in Alzheimer's Disease. Mol Nutr Food Res 2024; 68:e2300605. [PMID: 38175857 DOI: 10.1002/mnfr.202300605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 10/04/2023] [Indexed: 01/06/2024]
Abstract
Alzheimer's disease (AD) affects 50 million people worldwide, an increase of 35 million since 2015, and it is known for memory loss and cognitive decline. Considering the morbidity associated with AD, it is important to explore lifestyle elements influencing the chances of developing AD, with special emphasis on nutritional aspects. This review will first discuss how dietary factors have an impact on AD development and the possible role of Artificial Intelligence (AI) and Machine Learning (ML) in preventative care of AD patients through nutrition. The Mediterranean-DASH diets provide individuals with many nutritional benefits that assist in preventing neurodegeneration through neuroprotective roles. Lack of micronutrients, protein-energy, and polyunsaturated fatty acids increases the chance of cognitive decline, loss of memory, and synaptic dysfunction, among others. ML software has the ability to design models of algorithms from data introduced to present practical solutions that are accessible and easy to use. It can give predictions for a precision medicine approach to evaluate individuals as a whole. There is no doubt the future of nutritional science lies in customizing diets for individuals to reduce dementia risk factors and maintain overall health and brain function.
Affiliation(s)
- Vrinda Saxena
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Eric Rosenn
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Shu-Han Wang
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Sibilla Masieri
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Joshua Palmieri
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Giulio Maria Pasinetti
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
  - Geriatrics Research, Education and Clinical Center, JJ Peters VA Medical Center, Bronx, NY, 10468, USA
5
Chaudhary R, Nourelahi M, Thoma FW, Gellad WF, Lo-Ciganic WH, Bliden KP, Gurbel PA, Neal MD, Jain SK, Bhonsale A, Mulukutla SR, Wang Y, Harinstein ME, Saba S, Visweswaran S. Machine Learning - Based Bleeding Risk Predictions in Atrial Fibrillation Patients on Direct Oral Anticoagulants. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.05.27.24307985. [PMID: 38854094 PMCID: PMC11160827 DOI: 10.1101/2024.05.27.24307985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Importance: Accurately predicting major bleeding events in non-valvular atrial fibrillation (AF) patients on direct oral anticoagulants (DOACs) is crucial for personalized treatment and improving patient outcomes, especially with emerging alternatives like left atrial appendage closure devices. The left atrial appendage closure devices reduce stroke risk comparably but with significantly fewer non-procedural bleeding events. Objective: To evaluate the performance of machine learning (ML) risk models in predicting clinically significant bleeding events requiring hospitalization and hemorrhagic stroke in non-valvular AF patients on DOACs compared to conventional bleeding risk scores (HAS-BLED, ORBIT, and ATRIA) at the index visit to a cardiologist for AF management. Design: Prognostic modeling with retrospective cohort study design using electronic health record (EHR) data, with clinical follow-up at one, two, and five years. Setting: University of Pittsburgh Medical Center (UPMC) system. Participants: 24,468 non-valvular AF patients aged ≥18 years treated with DOACs, excluding those with prior history of significant bleeding, other indications for DOACs, on warfarin or contraindicated to DOACs. Exposures: DOAC therapy for non-valvular AF. Main Outcomes and Measures: The primary endpoint was clinically significant bleeding requiring hospitalization within one year of index visit. The models incorporated demographic, clinical, and laboratory variables available in the EHR at the index visit. Results: Among 24,468 patients, 553 (2.3%) had bleeding events within one year, 829 (3.5%) within two years, and 1,292 (5.8%) within five years of index visit. We evaluated multivariate logistic regression and ML models including random forest, classification trees, k-nearest neighbor, naive Bayes, and extreme gradient boosting (XGBoost), which modestly outperformed HAS-BLED, ATRIA, and ORBIT scores in predicting clinically significant bleeding at 1-year follow-up. The best performing model (random forest) showed area under the curve (AUC-ROC) 0.76 (0.70-0.81), G-Mean score of 0.67, and net reclassification index 0.14, compared to AUC-ROC 0.57 (0.50-0.63) and G-Mean score of 0.57 for the HAS-BLED score (p-value for difference <0.001). The ML models had improved performance compared to conventional risk scores at the 2-year and 5-year time-points and within the subgroup of hemorrhagic stroke. SHAP analysis identified novel risk factors including measures from body mass index, cholesterol profile, and insurance type beyond those used in conventional risk scores. Conclusions and Relevance: Our findings demonstrate the superior performance of ML models compared to conventional bleeding risk scores and identify novel risk factors highlighting the potential for personalized bleeding risk assessment in AF patients on DOACs.
6
Cerono G, Melaiu O, Chicco D. Clinical Feature Ranking Based on Ensemble Machine Learning Reveals Top Survival Factors for Glioblastoma Multiforme. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2024; 8:1-18. [PMID: 38273986 PMCID: PMC10805687 DOI: 10.1007/s41666-023-00138-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/05/2022] [Revised: 07/06/2023] [Accepted: 07/07/2023] [Indexed: 01/27/2024]
Abstract
Glioblastoma multiforme (GM) is a malignant tumor of the central nervous system considered to be highly aggressive and often carrying a terrible survival prognosis. An accurate prognosis is therefore pivotal for deciding a good treatment plan for patients. In this context, computational intelligence applied to data of electronic health records (EHRs) of patients diagnosed with this disease can be useful to predict the patients' survival time. In this study, we evaluated different machine learning models to predict survival time in patients suffering from glioblastoma and further investigated which features were the most predictive for survival time. We applied our computational methods to three different independent open datasets of EHRs of patients with glioblastoma: the Shieh dataset of 84 patients, the Berendsen dataset of 647 patients, and the Lammer dataset of 60 patients. Our survival time prediction techniques obtained concordance index (C-index) = 0.583 in the Shieh dataset, C-index = 0.776 in the Berendsen dataset, and C-index = 0.64 in the Lammer dataset, as best results in each dataset. Since the original studies regarding the three datasets analyzed here did not provide insights about the most predictive clinical features for survival time, we investigated the feature importance among these datasets. To this end, we then utilized Random Survival Forests, which is a decision tree-based algorithm able to model non-linear interaction between different features and might be able to better capture the highly complex clinical and genetic status of these patients. Our discoveries can impact clinical practice, aiding clinicians and patients alike to decide which therapy plan is best suited for their unique clinical status.
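The C-index values quoted above measure how well a model's predicted risk orders patients by survival time. As a minimal sketch, assuming Harrell's pairwise definition with right-censoring and, as a simplification, skipping pairs with tied observed times, the computation looks roughly like this. The function `concordance_index` and the toy data are hypothetical and are not the authors' code or data.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs whose predicted risk ordering
    agrees with the observed survival ordering (ties in risk count 0.5)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (simplification) and pairs where the shorter
        # time is censored, since the true ordering is then unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data for illustration only: months observed, event flag (1 = death
# observed, 0 = censored), and a model's predicted risk score.
times = [5, 10, 15]
events = [1, 1, 0]
perfect = concordance_index(times, events, [0.9, 0.5, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.5, 0.9])
print(perfect, reversed_order)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported 0.583, 0.776, and 0.64.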
Affiliation(s)
- Gabriel Cerono
  - Department of Neurology, University of California San Francisco, San Francisco, CA USA
- Davide Chicco
  - Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario Canada
7
Lubiana T, Lopes R, Medeiros P, Silva JC, Goncalves ANA, Maracaja-Coutinho V, Nakaya HI. Ten quick tips for harnessing the power of ChatGPT in computational biology. PLoS Comput Biol 2023; 19:e1011319. [PMID: 37561669 PMCID: PMC10414555 DOI: 10.1371/journal.pcbi.1011319] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 08/12/2023] Open
Affiliation(s)
- Tiago Lubiana
  - School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Rafael Lopes
  - Department of Epidemiology of Microbial Diseases and Public Health Modeling Unit, Yale School of Public Health, New Haven, Connecticut, United States of America
- Juan Carlo Silva
  - School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Vinicius Maracaja-Coutinho
  - Advanced Center for Chronic Diseases, Universidad de Chile, Santiago, Chile
  - Centro de Modelamiento Molecular, Biofísica y Bioinformática—CM2B2, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
  - ANID Anillo ACT210004 SYSTEMIX, Rancagua, Chile
  - Anillo Inflammation in HIV/AIDS—InflammAIDS, Santiago, Chile
  - Beagle Bioinformatics, São Paulo, Brasil & Santiago, Chile
- Helder I. Nakaya
  - School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
  - Hospital Israelita Albert Einstein, São Paulo, Brazil
8
Chicco D, Cumbo F, Angione C. Ten quick tips for avoiding pitfalls in multi-omics data integration analyses. PLoS Comput Biol 2023; 19:e1011224. [PMID: 37410704 DOI: 10.1371/journal.pcbi.1011224] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 07/08/2023] Open
Abstract
Data are the most important elements of bioinformatics: Computational analysis of bioinformatics data, in fact, can help researchers infer new knowledge about biology, chemistry, biophysics, and sometimes even medicine, influencing treatments and therapies for patients. Bioinformatics and high-throughput biological data coming from different sources can be even more helpful, because each of these different data chunks can provide alternative, complementary information about a specific biological phenomenon, similar to multiple photos of the same subject taken from different angles. In this context, the integration of bioinformatics and high-throughput biological data plays a pivotal role in running a successful bioinformatics study. In the last decades, data originating from proteomics, metabolomics, metagenomics, phenomics, transcriptomics, and epigenomics have been labelled -omics data, as a unique name to refer to them, and the integration of these omics data has gained importance in all biological areas. Even if this omics data integration is useful and relevant, due to its heterogeneity, it is not uncommon to make mistakes during the integration phases. We therefore decided to present these ten quick tips to perform an omics data integration correctly, avoiding common mistakes we experienced or noticed in published studies in the past. Even if we designed our ten guidelines for beginners, using simple language that (we hope) can be understood by anyone, we believe our ten recommendations should be taken into account by all the bioinformaticians performing omics data integration, including experts.
Affiliation(s)
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Fabio Cumbo
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Claudio Angione
  - School of Computing Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom
9
Chicco D, Ferraro Petrillo U, Cattaneo G. Ten quick tips for bioinformatics analyses using an Apache Spark distributed computing environment. PLoS Comput Biol 2023; 19:e1011272. [PMID: 37471333 PMCID: PMC10358940 DOI: 10.1371/journal.pcbi.1011272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/22/2023] Open
Abstract
Some scientific studies involve huge amounts of bioinformatics data that cannot be analyzed on personal computers usually employed by researchers for day-to-day activities but rather necessitate effective computational infrastructures that can work in a distributed way. For this purpose, distributed computing systems have become useful tools to analyze large amounts of bioinformatics data and to generate relevant results on virtual environments, where software can be executed for hours or even days without affecting the personal computer or laptop of a researcher. Even if distributed computing resources have become pivotal in multiple bioinformatics laboratories, often researchers and students use them in the wrong ways, making mistakes that can cause the distributed computers to underperform or that can even generate wrong outcomes. In this context, we present here ten quick tips for the usage of Apache Spark distributed computing systems for bioinformatics analyses: ten simple guidelines that, if taken into account, can help users avoid common mistakes and can help them run their bioinformatics analyses smoothly. Even if we designed our recommendations for beginners and students, they should be followed by experts too. We think our quick tips can help anyone make use of Apache Spark distributed computing systems more efficiently and ultimately help generate better, more reliable scientific results.
Affiliation(s)
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Giuseppe Cattaneo
  - Dipartimento di Informatica, Università di Salerno, Fisciano (Salerno), Italy