1
Alhazmi A, Mahmud R, Idris N, Mohamed Abo ME, Eke CI. Code-mixing unveiled: Enhancing the hate speech detection in Arabic dialect tweets using machine learning models. PLoS One 2024; 19:e0305657. [PMID: 39018339 PMCID: PMC11253949 DOI: 10.1371/journal.pone.0305657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 06/03/2024] [Indexed: 07/19/2024]
Abstract
Technological developments over the past few decades have changed the way people communicate, with platforms like social media and blogs becoming vital channels for international conversation. Even though hate speech is vigorously suppressed on social media, it remains a concern that requires continuous detection and monitoring. Although considerable effort has gone into detecting hate speech in English-language social media content, Arabic poses particular difficulties: its many dialects and linguistic nuances call for special consideration. The widespread practice of "code-mixing," in which users switch smoothly between languages, adds another layer of complexity. Recognizing this research gap, the study examines how well machine learning models using different features can detect hate speech, especially in Arabic tweets that feature code-mixing. Specifically, the objective is to assess and compare the effectiveness of different features and machine learning models for hate speech detection on Arabic hate speech and code-mixing hate speech datasets. The methodology comprises data collection, data pre-processing, feature extraction, construction of classification models, and evaluation of those models. The analysis revealed that the TF-IDF feature, combined with the SGD (stochastic gradient descent) model, attained the highest accuracy, 98.21%. These results were then compared with outcomes from three existing studies, and the proposed method outperformed them, underscoring its significance. The study therefore carries practical implications and serves as a foundational exploration of automated hate speech detection in text.
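The best-performing combination above pairs TF-IDF features with an SGD-trained classifier. The following is a minimal, dependency-free sketch of that pipeline, not the authors' implementation. Several assumptions are not taken from the abstract. Tokenization is plain whitespace splitting with no Arabic normalization. The idf uses the smoothed form log((1 + N) / (1 + df)) + 1, which is also scikit-learn's `TfidfVectorizer` default. The classifier uses logistic (log) loss, whereas the abstract does not state the loss or hyperparameters. The toy corpus and the names `fit_idf`, `tfidf`, `train_sgd`, and `predict` are hypothetical.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Assumption: whitespace tokenization only; real Arabic / code-mixed text
    # would need normalization and dialect-aware preprocessing.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_sgd(vectors, labels, epochs=100, lr=0.5, seed=0):
    """Logistic regression updated one sample at a time (stochastic gradient descent)."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            p = sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in x.items()))
            grad = p - y  # derivative of the log loss with respect to the logit
            for t, v in x.items():
                weights[t] = weights.get(t, 0.0) - lr * grad * v
            bias -= lr * grad
    return weights, bias

def predict(doc, idf, weights, bias):
    x = tfidf(doc, idf)
    score = bias + sum(weights.get(t, 0.0) * v for t, v in x.items())
    return int(sigmoid(score) >= 0.5)

# Hypothetical toy corpus (1 = hateful, 0 = not hateful); not the study's data.
train_docs = [
    "you people are vermin get out",
    "vermin like you should leave",
    "have a lovely day friend",
    "lovely weather today friend",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_docs)
vectors = [tfidf(d, idf) for d in train_docs]
weights, bias = train_sgd(vectors, train_labels)
```

In practice a library implementation (for example scikit-learn's `SGDClassifier`, whose default loss is hinge rather than log loss) would replace the hand-written loop. The sketch only makes the feature weighting and per-sample updates explicit.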
Affiliation(s)
- Ali Alhazmi
- Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia
- Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
- Rohana Mahmud
- Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia
- Norisma Idris
- Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia
- Christopher Ifeanyi Eke
- Faculty of Computing, Department of Computer Science, Federal University of Lafia, Lafia, Nasarawa State, Nigeria
2
Wang J, Xue Q, Zhang CWJ, Wong KKL, Liu Z. Explainable coronary artery disease prediction model based on AutoGluon from AutoML framework. Front Cardiovasc Med 2024; 11:1360548. [PMID: 39011494 PMCID: PMC11246996 DOI: 10.3389/fcvm.2024.1360548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Accepted: 06/11/2024] [Indexed: 07/17/2024]
Abstract
Objective This study focuses on the innovative application of Automated Machine Learning (AutoML) technology in cardiovascular medicine to construct an explainable Coronary Artery Disease (CAD) prediction model to support the clinical diagnosis of CAD. Methods This study utilizes a combined data set of five public data sets related to CAD. An ensemble model is constructed using the AutoML open-source framework AutoGluon to evaluate the feasibility of AutoML in constructing a disease prediction model in cardiovascular medicine. The performance of the ensemble model is compared against individual baseline models. Finally, the disease prediction ensemble model is explained using SHapley Additive exPlanations (SHAP). Results The experimental results show that the AutoGluon-based ensemble model performs better than the individual baseline models in predicting CAD. It achieved an accuracy of 0.9167 and an AUC of 0.9562 in 4-fold cross-bagging. SHAP measures the importance of each feature to the prediction of the model and explains the prediction results of the model. Conclusion This study demonstrates the feasibility and efficacy of AutoML technology in cardiovascular medicine and highlights its potential in disease prediction. AutoML reduces the barriers to model building and significantly improves prediction accuracy. Additionally, the integration of SHAP enhances model transparency and explainability, which is critical to ensuring model credibility and widespread adoption in cardiovascular medicine.
Affiliation(s)
- Jianghong Wang
- Faculty of Information Engineering and Automation, Center for Precision Medicine, Yan'an Hospital of Kunming City & Kunming University of Science and Technology, Kunming, China
- Qiang Xue
- Faculty of Information Engineering and Automation, Center for Precision Medicine, Yan'an Hospital of Kunming City & Kunming University of Science and Technology, Kunming, China
- Chris W J Zhang
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Zhihua Liu
- Faculty of Information Engineering and Automation, Center for Precision Medicine, Yan'an Hospital of Kunming City & Kunming University of Science and Technology, Kunming, China
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Bayer HealthCare & Dana-Farber Cancer Institute, Harvard University, Boston, MA, United States
3
Muse ED, Topol EJ. Transforming the cardiometabolic disease landscape: Multimodal AI-powered approaches in prevention and management. Cell Metab 2024; 36:670-683. [PMID: 38428435 PMCID: PMC10990799 DOI: 10.1016/j.cmet.2024.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 01/25/2024] [Accepted: 02/06/2024] [Indexed: 03/03/2024]
Abstract
The rise of artificial intelligence (AI) has revolutionized various scientific fields, particularly in medicine, where it has enabled the modeling of complex relationships from massive datasets. Initially, AI algorithms focused on improved interpretation of diagnostic studies such as chest X-rays and electrocardiograms in addition to predicting patient outcomes and future disease onset. However, AI has evolved with the introduction of transformer models, allowing analysis of the diverse, multimodal data sources existing in medicine today. Multimodal AI holds great promise in more accurate disease risk assessment and stratification as well as optimizing the key driving factors in cardiometabolic disease: blood pressure, sleep, stress, glucose control, weight, nutrition, and physical activity. In this article we outline the current state of medical AI in cardiometabolic disease, highlighting the potential of multimodal AI to augment personalized prevention and treatment strategies in cardiometabolic disease.
Affiliation(s)
- Evan D Muse
- Scripps Research Translational Institute, Scripps Research, La Jolla, CA 92037, USA; Division of Cardiovascular Diseases, Scripps Clinic, La Jolla, CA 92037, USA
- Eric J Topol
- Scripps Research Translational Institute, Scripps Research, La Jolla, CA 92037, USA; Division of Cardiovascular Diseases, Scripps Clinic, La Jolla, CA 92037, USA
4
Parvin S, Nimmy SF, Kamal MS. Convolutional neural network based data interpretable framework for Alzheimer's treatment planning. Vis Comput Ind Biomed Art 2024; 7:3. [PMID: 38296864 PMCID: PMC10830981 DOI: 10.1186/s42492-024-00154-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 01/08/2024] [Indexed: 02/02/2024]
Abstract
Alzheimer's disease (AD) is a neurological disorder that predominantly affects the brain. Its prevalence is expected to rise rapidly in the coming years, while progress in diagnostic techniques has been limited. Various machine learning (ML) and artificial intelligence (AI) algorithms have been employed to detect AD using single-modality data. However, recent developments in ML have made it possible to apply these methods to multiple data sources and input modalities for AD prediction. In this study, we developed a framework that uses multimodal data (tabular data, magnetic resonance imaging (MRI) images, and genetic information) to classify AD. As part of the pre-processing phase, we generated a knowledge graph from the tabular data and MRI images. We employed graph neural networks for knowledge graph creation and a region-based convolutional neural network approach for image-to-knowledge-graph generation. Additionally, we integrated various explainable AI (XAI) techniques to interpret and elucidate the prediction outcomes derived from the multimodal data. Layer-wise relevance propagation was used to explain the layer-wise outcomes for the MRI images. We also incorporated submodular pick local interpretable model-agnostic explanations (SP-LIME) to interpret the decision-making process based on the tabular data. Genetic expression values play a crucial role in AD analysis, and we used a graphical gene tree to identify genes associated with the disease. Finally, a dashboard was designed to display the XAI outcomes, enabling experts and medical professionals to easily comprehend the prediction results.
Affiliation(s)
- Sazia Parvin
- Information Technology, Melbourne Polytechnic, Melbourne, VIC 3072, Australia.
- Sonia Farhana Nimmy
- Faculty of Economics and Business, University of New South Wales, Sydney, ACT 2612, Australia
- Md Sarwar Kamal
- School of Computer Science, Faculty of Engineering and IT, University of Technology Sydney, Sydney, NSW 2007, Australia
5
Hughes JW, Tooley J, Torres Soto J, Ostropolets A, Poterucha T, Christensen MK, Yuan N, Ehlert B, Kaur D, Kang G, Rogers A, Narayan S, Elias P, Ouyang D, Ashley E, Zou J, Perez MV. A deep learning-based electrocardiogram risk score for long term cardiovascular death and disease. NPJ Digit Med 2023; 6:169. [PMID: 37700032 PMCID: PMC10497604 DOI: 10.1038/s41746-023-00916-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/10/2023] [Accepted: 08/30/2023] [Indexed: 09/14/2023]
Abstract
The electrocardiogram (ECG) is the most frequently performed cardiovascular diagnostic test, but it is unclear how much information resting ECGs contain about long-term cardiovascular risk. Here we report that a deep convolutional neural network can accurately predict the long-term risk of cardiovascular mortality and disease based on a resting ECG alone. Using a large dataset of resting 12-lead ECGs collected at Stanford University Medical Center, we developed SEER, the Stanford Estimator of Electrocardiogram Risk. SEER predicts 5-year cardiovascular mortality with an area under the receiver operating characteristic curve (AUC) of 0.83 in a held-out test set at Stanford, and with AUCs of 0.78 and 0.83, respectively, when independently evaluated at Cedars-Sinai Medical Center and Columbia University Irving Medical Center. SEER predicts 5-year atherosclerotic cardiovascular disease (ASCVD) with an AUC of 0.67, similar to the Pooled Cohort Equations for ASCVD Risk, while being only modestly correlated with them. When used in conjunction with the Pooled Cohort Equations, SEER accurately reclassified 16% of patients from low to moderate risk, uncovering a group with an actual average 10-year ASCVD risk of 9.9% who would not otherwise have been indicated for statin therapy. SEER can also predict several other cardiovascular conditions, such as heart failure and atrial fibrillation. Using only lead I of the ECG, it predicts 5-year cardiovascular mortality with an AUC of 0.80. Used alongside the Pooled Cohort Equations and other risk tools, SEER can substantially improve cardiovascular risk stratification and aid medical decision-making.
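The AUC values quoted above have a direct probabilistic reading. The AUC is the probability that a randomly chosen patient who had the outcome receives a higher risk score than a randomly chosen patient who did not, with ties counted as half. The sketch below computes it that way. The labels and scores are hypothetical, not SEER outputs. The pairwise loop is chosen for clarity over speed.

```python
from itertools import product

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a correct ranking. This O(P * N) pairwise version is
    meant for illustration on small inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores (1 = died within 5 years); not data from the study.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)  # 5 of 6 positive-negative pairs ranked correctly
```

An AUC of 0.83 therefore means that, in roughly 83% of such patient pairs, the patient who died was assigned the higher score.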
Affiliation(s)
- J Weston Hughes
- Department of Computer Science, Stanford University, Palo Alto, CA, USA.
- James Tooley
- Department of Medicine, Stanford University, Palo Alto, CA, USA
- Jessica Torres Soto
- Department of Biomedical Informatics, Stanford University, Palo Alto, CA, USA
- Anna Ostropolets
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Tim Poterucha
- Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Matthew Kai Christensen
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Neal Yuan
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Ben Ehlert
- Department of Biomedical Informatics, Stanford University, Palo Alto, CA, USA
- Guson Kang
- Department of Medicine, Stanford University, Palo Alto, CA, USA
- Albert Rogers
- Department of Medicine, Stanford University, Palo Alto, CA, USA
- Sanjiv Narayan
- Department of Medicine, Stanford University, Palo Alto, CA, USA
- Pierre Elias
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- David Ouyang
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Euan Ashley
- Department of Medicine, Stanford University, Palo Alto, CA, USA
- James Zou
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
- Marco V Perez
- Department of Medicine, Stanford University, Palo Alto, CA, USA
6
Forrest IS, Petrazzini BO, Duffy Á, Park JK, O'Neal AJ, Jordan DM, Rocheleau G, Nadkarni GN, Cho JH, Blazer AD, Do R. A machine learning model identifies patients in need of autoimmune disease testing using electronic health records. Nat Commun 2023; 14:2385. [PMID: 37169741 PMCID: PMC10130143 DOI: 10.1038/s41467-023-37996-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/15/2022] [Accepted: 04/05/2023] [Indexed: 05/13/2023]
Abstract
Systemic autoimmune rheumatic diseases (SARDs) can lead to irreversible damage if left untreated, yet these patients often endure long diagnostic journeys before being diagnosed and treated. Machine learning may help overcome the challenges of diagnosing SARDs and inform clinical decision-making. Here, we developed and tested a machine learning model to identify patients who should receive rheumatological evaluation for SARDs using longitudinal electronic health records of 161,584 individuals from two institutions. The model demonstrated high performance for predicting cases of autoantibody-tested individuals in a validation set, an external test set, and an independent cohort with a broader case definition. This approach identified more individuals for autoantibody testing compared with current clinical standards and a greater proportion of autoantibody carriers among those tested. Diagnoses of SARDs and other autoimmune conditions increased with higher model probabilities. The model detected a need for autoantibody testing and rheumatology encounters up to five years before the test date and assessment date, respectively. Altogether, these findings illustrate that the clinical manifestations of a diverse array of autoimmune conditions are detectable in electronic health records using machine learning, which may help systematize and accelerate autoimmune testing.
Affiliation(s)
- Iain S Forrest
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ben O Petrazzini
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Áine Duffy
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Joshua K Park
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Anya J O'Neal
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, USA
- Daniel M Jordan
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ghislain Rocheleau
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Girish N Nadkarni
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Judy H Cho
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ashira D Blazer
- Division of Rheumatology, Hospital for Special Surgery, New York, NY, USA
- Ron Do
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
7
Huang AA, Huang SY. Use of machine learning to identify risk factors for coronary artery disease. PLoS One 2023; 18:e0284103. [PMID: 37058460 PMCID: PMC10104376 DOI: 10.1371/journal.pone.0284103] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/27/2022] [Accepted: 03/23/2023] [Indexed: 04/15/2023]
Abstract
Coronary artery disease (CAD) is the leading cause of death in both developed and developing nations. The objective of this study was to identify risk factors for CAD through machine learning and to assess this methodology. A retrospective, cross-sectional cohort study was conducted using the publicly available National Health and Nutrition Examination Survey (NHANES) and included patients who completed the demographic, dietary, exercise, and mental health questionnaires and had laboratory and physical exam data. Univariate logistic models, with CAD as the outcome, were used to identify covariates associated with CAD. Covariates with p<0.0001 on univariate analysis were included in the final machine learning model. The XGBoost model was used because of its prevalence in the literature and its increased predictive accuracy in healthcare prediction. Model covariates were ranked by the Cover statistic to identify risk factors for CAD. SHapley Additive exPlanations (SHAP) were used to visualize the relationship between these potential risk factors and CAD. Of the 7,929 patients who met the inclusion criteria, 4,055 (51%) were female and 3,874 (49%) were male. The mean age was 49.2 years (SD = 18.4), with 2,885 (36%) White patients, 2,144 (27%) Black patients, 1,639 (21%) Hispanic patients, and 1,261 (16%) patients of other race. A total of 338 (4.5%) patients had CAD. The XGBoost model fitted to these data achieved AUROC = 0.89, sensitivity = 0.85, and specificity = 0.87 (Fig 1). The four highest-ranked features by Cover, a measure of the percentage contribution of a covariate to the overall model prediction, were age (Cover = 21.1%), platelet count (Cover = 5.1%), family history of heart disease (Cover = 4.8%), and total cholesterol (Cover = 4.1%).
Machine learning models can effectively predict coronary artery disease using demographic, laboratory, physical exam, and lifestyle covariates and can identify key risk factors.
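Unlike the AUROC, the sensitivity and specificity reported above depend on a probability cut-off applied to the model's outputs. The abstract does not state that cut-off, so the sketch below assumes 0.5. It shows how both metrics fall out of the confusion matrix. The labels and predicted probabilities are hypothetical, not NHANES data.

```python
def sensitivity_specificity(labels, probs, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumption: a case is predicted positive when its probability is >= threshold.
    """
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical outcomes (1 = CAD) and model probabilities; not study data.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
sens, spec = sensitivity_specificity(labels, probs)
```

Raising the threshold trades sensitivity for specificity. With an outcome as rare as the 4.5% CAD prevalence reported here, the choice of threshold matters a great deal.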
Affiliation(s)
- Alexander A. Huang
- Department of Statistics and Data Science, Cornell University, Ithaca, New York, United States of America
- Department of MD Education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Samuel Y. Huang
- Department of Statistics and Data Science, Cornell University, Ithaca, New York, United States of America
- Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, Virginia, United States of America
8
Li S, Zhang Y, Xu W, Lv Z, Xu L, Zhao Z, Zhu D, Song Y. C Allele of the PPARδ+294T>C Polymorphism Confers a Higher Risk of Hypercholesterolemia, but not Obesity and Insulin Resistance: A Systematic Review and Meta-Analysis. Horm Metab Res 2023; 55:355-366. [PMID: 37011890 DOI: 10.1055/a-2043-7707] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 04/05/2023]
Abstract
The relationships of the PPARα Leu162Val and PPARδ +294T>C polymorphisms with metabolic indexes have been reported to be inconsistent and even contradictory. This meta-analysis was conducted to clarify the relationships between the two variants and indexes of obesity, insulin resistance, and blood lipids. PubMed, Google Scholar, Embase, and the Cochrane Library were searched for eligible studies. Standardized mean differences with 95% confidence intervals were calculated to estimate differences in the metabolic indexes between the genotypes of the Leu162Val and +294T>C polymorphisms. Heterogeneity among studies was assessed with Cochran's χ²-based Q-statistic test, and publication bias was evaluated with Begg's test. Forty-one studies (44,585 subjects) and 33 studies (23,018 subjects) were included in the analyses of the Leu162Val and +294T>C polymorphisms, respectively. C allele carriers of the +294T>C polymorphism had significantly higher levels of total cholesterol and low-density lipoprotein cholesterol than TT homozygotes in the whole population. Notably, compared with TT homozygotes, C allele carriers had significantly higher levels of triglycerides and total cholesterol in East Asians but lower levels of triglycerides in West Asians. For the Leu162Val polymorphism, Val allele carriers had significantly higher blood glucose levels than Leu/Leu homozygotes only in European Caucasians. The meta-analysis demonstrates that the C allele of the +294T>C polymorphism in the PPARδ gene confers a higher risk of hypercholesterolemia, which may partly explain the relationship between this variant and coronary artery disease.
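Two statistical quantities carry the meta-analysis described above: per-study standardized mean differences (SMDs) pooled with a 95% CI, and Cochran's Q for heterogeneity. The sketch below computes them under stated assumptions. It uses Cohen's d with a pooled SD, the usual large-sample variance approximation, and a fixed-effect inverse-variance model. The abstract does not say whether Hedges' g or a random-effects model was used. The per-study summaries are hypothetical.

```python
import math

def smd(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d (pooled SD) and its approximate sampling variance.

    var(d) ~ (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    """
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / pooled_sd
    var = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, var

def fixed_effect_pool(effects):
    """Inverse-variance pooled SMD, its 95% CI, and Cochran's Q.

    Q is compared against a chi-squared distribution with k - 1 degrees of freedom.
    """
    weights = [1.0 / var for _, var in effects]
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    q = sum(w * (d - pooled) ** 2 for w, (d, _) in zip(weights, effects))
    return pooled, ci, q

# Hypothetical per-study summaries: (mean, SD, n) for C carriers vs. TT homozygotes.
studies = [
    smd(5.2, 1.0, 100, 5.0, 1.0, 150),
    smd(5.3, 1.1, 80, 5.0, 1.0, 120),
    smd(4.9, 0.9, 60, 5.0, 1.1, 90),
]
pooled, ci, q = fixed_effect_pool(studies)
```

A Q much larger than k - 1 signals heterogeneity beyond sampling error. That kind of heterogeneity is what the subgroup findings (East Asians versus West Asians) above illustrate.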
Affiliation(s)
- Shujin Li
- Central Laboratory, Clinical Medical College & Affiliated Hospital of Chengdu University, Chengdu, China
- Youjin Zhang
- Central Laboratory, Clinical Medical College & Affiliated Hospital of Chengdu University, Chengdu, China
- Wenhao Xu
- Clinical Medical College of Chengdu University, Chengdu, China
- Zhimin Lv
- Clinical Medical College of Chengdu University, Chengdu, China
- Luying Xu
- Clinical Medical College of Chengdu University, Chengdu, China
- Zixuan Zhao
- Clinical Medical College of Chengdu University, Chengdu, China
- Dan Zhu
- Clinical Medical College of Chengdu University, Chengdu, China
- Yongyan Song
- Central Laboratory, Clinical Medical College & Affiliated Hospital of Chengdu University, Chengdu, China
9
Integrated multimodal artificial intelligence framework for healthcare applications. NPJ Digit Med 2022; 5:149. [PMID: 36127417 PMCID: PMC9489871 DOI: 10.1038/s41746-022-00689-4] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 02/26/2022] [Accepted: 08/31/2022] [Indexed: 11/24/2022]
Abstract
Artificial intelligence (AI) systems hold great promise to improve healthcare over the next decades. Specifically, AI systems leveraging multiple data sources and input modalities are poised to become a viable method to deliver more accurate results and deployable pipelines across a wide range of applications. In this work, we propose and evaluate a unified Holistic AI in Medicine (HAIM) framework to facilitate the generation and testing of AI systems that leverage multimodal inputs. Our approach uses generalizable data pre-processing and machine learning modeling stages that can be readily adapted for research and deployment in healthcare environments. We evaluate our HAIM framework by training and characterizing 14,324 independent models based on HAIM-MIMIC-MM, a multimodal clinical database (N = 34,537 samples) containing 7279 unique hospitalizations and 6485 patients, spanning all possible input combinations of 4 data modalities (i.e., tabular, time-series, text, and images), 11 unique data sources and 12 predictive tasks. We show that this framework can consistently and robustly produce models that outperform similar single-source approaches across various healthcare demonstrations (by 6–33%), including 10 distinct chest pathology diagnoses, along with length-of-stay and 48 h mortality predictions. We also quantify the contribution of each modality and data source using Shapley values, which demonstrates the heterogeneity in data modality importance and the necessity of multimodal inputs across different healthcare-relevant tasks. The generalizable properties and flexibility of our Holistic AI in Medicine (HAIM) framework could offer a promising pathway for future multimodal predictive systems in clinical and operational healthcare settings.
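The abstract quantifies each modality's contribution with Shapley values. To show the idea, the sketch below computes exact Shapley values over a small set of modalities. The "value" of a subset of modalities is taken to be the AUC of a model trained on that subset. This exact enumeration is a swapped-in illustration: the abstract does not say how HAIM computed its Shapley values, and the AUC table is hypothetical. Each modality's value is its marginal contribution averaged over all orderings, φ_i = Σ_{S ⊆ N∖{i}} |S|!(n − |S| − 1)!/n! · (v(S ∪ {i}) − v(S)).

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a score."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that player i joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value[s | {i}] - value[s])
        phi[i] = total
    return phi

# Hypothetical AUCs for models trained on each subset of modalities;
# the empty set is a chance-level baseline. Not HAIM results.
auc = {
    frozenset(): 0.50,
    frozenset({"tabular"}): 0.70,
    frozenset({"images"}): 0.65,
    frozenset({"text"}): 0.60,
    frozenset({"tabular", "images"}): 0.78,
    frozenset({"tabular", "text"}): 0.74,
    frozenset({"images", "text"}): 0.69,
    frozenset({"tabular", "images", "text"}): 0.80,
}
phi = shapley_values(["tabular", "images", "text"], auc)
```

The values sum to v(all modalities) − v(none), here 0.30. Exact enumeration needs 2^n subset models, so for many features or data sources practical tools approximate it instead.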
10
Petrazzini BO, Chaudhary K, Márquez-Luna C, Forrest IS, Rocheleau G, Cho J, Narula J, Nadkarni G, Do R. Coronary Risk Estimation Based on Clinical Data in Electronic Health Records. J Am Coll Cardiol 2022; 79:1155-1166. [PMID: 35331410 PMCID: PMC8956801 DOI: 10.1016/j.jacc.2022.01.021] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/02/2021] [Accepted: 01/05/2022] [Indexed: 12/25/2022]
Abstract
BACKGROUND Clinical features from electronic health records (EHRs) can be used to build a complementary tool to predict coronary artery disease (CAD) susceptibility. OBJECTIVES The purpose of this study was to determine whether an EHR score can improve CAD prediction and reclassification 1 year before diagnosis, beyond conventional clinical guidelines as determined by the pooled cohort equations (PCE) and a polygenic risk score for CAD. METHODS We applied a machine learning framework using clinical features from the EHR in a multiethnic, clinical care cohort (BioMe) comprising 555 CAD cases and 6,349 control subjects and in a population-based cohort (UK Biobank) comprising 3,130 CAD cases and 378,344 control subjects for external validation. RESULTS Compared with the PCE, the EHR score improved CAD prediction by 12% in the BioMe Biobank and by 9% in the UK Biobank. Compared with the PCE score, the EHR score reclassified 25.8% and 15.2% of individuals in the two cohorts, respectively. We observed larger improvements of the EHR score over the PCE in a subgroup of individuals with low CAD risk, with 20% increased discrimination and 34.4% increased reclassification. In all models, the polygenic risk score for CAD did not improve CAD prediction compared with the PCE or EHR score. CONCLUSIONS The EHR score improved prediction and reclassification of CAD, demonstrating its potential use for population health monitoring of short-term CAD risk in large health systems.
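One simple reading of "reclassified" in the abstract above is the share of individuals whose risk category changes when one score replaces another. The sketch below computes that share. The abstract does not give the risk categories used, so the cut-points here are hypothetical, and the study may report a different reclassification statistic. A probability equal to a cut-point falls into the higher band.

```python
from bisect import bisect_right

def risk_category(prob, cutpoints):
    """Index of the risk band a probability falls into (0 = lowest band)."""
    return bisect_right(cutpoints, prob)

def fraction_reclassified(probs_a, probs_b, cutpoints):
    """Share of individuals placed in a different risk band by score B than by score A."""
    moved = sum(
        risk_category(a, cutpoints) != risk_category(b, cutpoints)
        for a, b in zip(probs_a, probs_b)
    )
    return moved / len(probs_a)

cutpoints = [0.05, 0.20]  # hypothetical low / intermediate / high boundaries
pce_like = [0.02, 0.04, 0.10, 0.30, 0.15]  # hypothetical baseline-score risks
ehr_like = [0.03, 0.08, 0.10, 0.25, 0.22]  # hypothetical new-score risks
share = fraction_reclassified(pce_like, ehr_like, cutpoints)  # 2 of 5 move bands
```

This raw share does not say whether the moves were in the right direction. Statistics such as the net reclassification index additionally check whether cases moved up and controls moved down.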
Affiliation(s)
- Ben O Petrazzini
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA. https://twitter.com/OmegaPetrazzini
- Kumardeep Chaudhary
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Carla Márquez-Luna
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Iain S Forrest
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ghislain Rocheleau
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Judy Cho
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jagat Narula
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Girish Nadkarni
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ron Do
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
11
Song Y, Li S, He C. PPARγ Gene Polymorphisms, Metabolic Disorders, and Coronary Artery Disease. Front Cardiovasc Med 2022; 9:808929. [PMID: 35402540 PMCID: PMC8984027 DOI: 10.3389/fcvm.2022.808929] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/18/2021] [Accepted: 02/22/2022] [Indexed: 01/14/2023]
Abstract
When activated by endogenous or exogenous ligands, the nuclear receptor peroxisome proliferator-activated receptor gamma (PPARγ) enhances insulin sensitivity, promotes adipocyte differentiation, stimulates adipogenesis, and has anti-atherosclerotic, anti-inflammatory, and antioxidant properties. The human PPARγ gene (PPARG) contains thousands of polymorphic loci; among them, two promoter-region polymorphisms (rs10865710 and rs7649970) and two exonic polymorphisms (rs1801282 and rs3856806) have been widely reported to be significantly associated with coronary artery disease (CAD). Mechanistically, PPARG polymorphisms lead to abnormal expression of the PPARG gene and/or dysfunction of the PPARγ protein, causing metabolic disorders such as hypercholesterolemia and hypertriglyceridemia and thereby increasing susceptibility to CAD.
Affiliation(s)
- Yongyan Song
- Central Laboratory, Clinical Medical College and Affiliated Hospital of Chengdu University, Chengdu, China
- Shujin Li
- Central Laboratory, Clinical Medical College and Affiliated Hospital of Chengdu University, Chengdu, China
- Chuan He
- Department of Cardiology, Clinical Medical College and Affiliated Hospital of Chengdu University, Chengdu, China
- *Correspondence: Chuan He,
12
Kulm S, Kofman L, Mezey J, Elemento O. Simple Linear Cancer Risk Prediction Models With Novel Features Outperform Complex Approaches. JCO Clin Cancer Inform 2022; 6:e2100166. [PMID: 35239414 PMCID: PMC8920463 DOI: 10.1200/cci.21.00166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2021] [Revised: 01/17/2022] [Accepted: 01/28/2022] [Indexed: 11/20/2022]
Abstract
PURPOSE The ability to accurately predict an individual's risk of cancer is critical to implementing precision prevention measures. Current cancer risk predictions are frequently made with simple models that use a few proven risk factors, such as the Gail model for breast cancer; these models are easy to interpret but may, in theory, be less accurate than advanced machine learning (ML) models. METHODS Using the UK Biobank, a large prospective study, we developed models that predicted 13 cancer diagnoses within a 10-year time span. We assessed ML and linear models fit with all features, linear models fit with 10 features, and externally developed QCancer models, which are available to more than 4,000 general practices. RESULTS When all 931 features were used, the average area under the receiver operating characteristic curve (AUC) of the linear models (0.722, SE = 0.015) was greater than that of the ML models (0.720, SE = 0.016). Linear models with only 10 features generated an average AUC of 0.706 (SE = 0.015), which was comparable to the complex models using all features and greater than the average AUC of the QCancer models (0.684, SE = 0.021). The high performance of the 10-feature linear models may be due to the inclusion of often-omitted feature types, including census records and genetic information. CONCLUSION The high performance of the 10-feature linear models indicates that unbiased selection of diverse features, rather than ML models, may lead to highly accurate predictions, possibly enabling personalized screening schedules that increase cancer survival.
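The AUC values compared in this abstract have a simple probabilistic meaning: the chance that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). The Python sketch below computes exactly that by brute-force pairwise comparison; the function name `auc` and the toy labels and scores are illustrative assumptions, not the study's code or data.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(case score > control score), counting ties as 1/2.

    labels: 1 for a case (e.g. cancer diagnosed), 0 for a control.
    scores: predicted risk for each individual, same order as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    # Compare every case with every control (O(n_pos * n_neg))
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four case-control pairs are correctly ordered
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise loop keeps the definition visible but is quadratic in cohort size; for cohorts on the scale of the UK Biobank, an equivalent rank-based computation (sort once, sum the ranks of the cases) is the practical choice.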
Affiliation(s)
- Scott Kulm
- Caryl and Israel Englander Institute of Precision Medicine, Weill Cornell Medicine, New York, NY
- Physiology, Biophysics and Systems Biology Graduate Program, Weill Cornell Medicine, New York, NY
- Lior Kofman
- Caryl and Israel Englander Institute of Precision Medicine, Weill Cornell Medicine, New York, NY
- Department of Computer Science, Tufts University, Medford, MA
- Jason Mezey
- Department of Genetic Medicine, Weill Cornell Medicine, New York, NY
- Department of Computational Biology, Cornell University, Ithaca, NY
- Olivier Elemento
- Caryl and Israel Englander Institute of Precision Medicine, Weill Cornell Medicine, New York, NY
- Physiology, Biophysics and Systems Biology Graduate Program, Weill Cornell Medicine, New York, NY
13
Li S, He C, Nie H, Pang Q, Wang R, Zeng Z, Song Y. G Allele of the rs1801282 Polymorphism in PPARγ Gene Confers an Increased Risk of Obesity and Hypercholesterolemia, While T Allele of the rs3856806 Polymorphism Displays a Protective Role Against Dyslipidemia: A Systematic Review and Meta-Analysis. Front Endocrinol (Lausanne) 2022; 13:919087. [PMID: 35846293 PMCID: PMC9276935 DOI: 10.3389/fendo.2022.919087] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/13/2022] [Accepted: 05/30/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND The relationships between the rs1801282 and rs3856806 polymorphisms in the nuclear receptor peroxisome proliferator-activated receptor gamma (PPARγ) gene and obesity indexes and serum lipid levels have been investigated extensively, but the results were inconsistent and even contradictory. METHODS PubMed, Google Scholar, Embase, Cochrane Library, Web of Science, Wanfang, CNKI and VIP databases were searched for eligible studies. A random-effects model was used, and the standardized mean difference (SMD) with its 95% confidence interval (CI) was calculated to estimate the differences in obesity indexes and serum lipid levels between subjects with different genotypes under a dominant model. Heterogeneity among studies was assessed with Cochran's χ²-based Q test, and publication bias with Begg's test. RESULTS One hundred and twenty studies (70,317 subjects) and 33 studies (18,353 subjects) were included in the analyses of the rs1801282 and rs3856806 polymorphisms, respectively. G allele carriers of the rs1801282 polymorphism had higher body mass index (SMD = 0.08, 95% CI = 0.04 to 0.12, p < 0.001), waist circumference (SMD = 0.12, 95% CI = 0.06 to 0.18, p < 0.001) and total cholesterol (SMD = 0.07, 95% CI = 0.02 to 0.11, p < 0.01) than CC homozygotes. T allele carriers of the rs3856806 polymorphism had lower low-density lipoprotein cholesterol (SMD = -0.09, 95% CI = -0.15 to -0.03, p < 0.01) and higher high-density lipoprotein cholesterol (SMD = 0.06, 95% CI = 0.02 to 0.10, p < 0.01) than CC homozygotes.
CONCLUSIONS The meta-analysis suggests that the G allele of the rs1801282 polymorphism confers an increased risk of obesity and hypercholesterolemia, while the T allele of the rs3856806 polymorphism displays a protective role against dyslipidemia, which can partly explain the associations between these polymorphisms and cardiovascular disease. SYSTEMATIC REVIEW REGISTRATION https://www.crd.york.ac.uk/prospero/, identifier [CRD42022319347].
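The methods above pool per-study SMDs with a random-effects model and assess heterogeneity with Cochran's Q. The Python sketch below shows one common way to do this, inverse-variance weighting with the DerSimonian-Laird estimator of the between-study variance τ²; the abstract does not name its τ² estimator, so that choice, the function name `random_effects` and the 1.96 normal quantile for the 95% CI are assumptions for illustration. The inputs are each study's SMD and its sampling variance.

```python
import math
from typing import NamedTuple, Sequence


class PooledEffect(NamedTuple):
    estimate: float  # random-effects pooled SMD
    ci_low: float
    ci_high: float
    q: float         # Cochran's Q heterogeneity statistic
    tau2: float      # estimated between-study variance


def random_effects(
    effects: Sequence[float], variances: Sequence[float], z: float = 1.96
) -> PooledEffect:
    """Pool per-study effects (e.g. SMDs) with DerSimonian-Laird random effects."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and pooled mean
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # DerSimonian-Laird moment estimator of tau^2, truncated at zero
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    se = math.sqrt(1.0 / sw_re)
    return PooledEffect(est, est - z * se, est + z * se, q, tau2)


# Two hypothetical studies with clearly different effects
print(random_effects([0.0, 1.0], [0.01, 0.01]))
```

Under homogeneity Q follows approximately a χ² distribution with k − 1 degrees of freedom (k studies), which is the basis of the Q test; the p-value would need a χ² survival function (for example from SciPy) and is omitted here to keep the sketch dependency-free.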
Affiliation(s)
- Shujin Li
- Central Laboratory, Clinical Medical College and Affiliated Hospital of Chengdu University, Chengdu, China
- Chuan He
- Department of Cardiology, Clinical Medical College and Affiliated Hospital of Chengdu University, Chengdu, China
- Haiyan Nie
- Clinical Medical College of Chengdu University, Chengdu, China
- Qianyin Pang
- Clinical Medical College of Chengdu University, Chengdu, China
- Ruixia Wang
- Clinical Medical College of Chengdu University, Chengdu, China
- Zhifu Zeng
- Clinical Medical College of Chengdu University, Chengdu, China
- Yongyan Song
- Central Laboratory, Clinical Medical College and Affiliated Hospital of Chengdu University, Chengdu, China
- *Correspondence: Yongyan Song,