1
Konstantonis G, Singh KV, Sfikakis PP, Jamthikar AD, Kitas GD, Gupta SK, Saba L, Verrou K, Khanna NN, Ruzsa Z, Sharma AM, Laird JR, Johri AM, Kalra M, Protogerou A, Suri JS. Cardiovascular disease detection using machine learning and carotid/femoral arterial imaging frameworks in rheumatoid arthritis patients. Rheumatol Int 2022; 42:215-239. [PMID: 35013839 DOI: 10.1007/s00296-021-05062-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/25/2021] [Accepted: 11/29/2021] [Indexed: 12/31/2022]
Abstract
The study proposes a novel machine learning (ML) paradigm for cardiovascular disease (CVD) detection in individuals at medium to high cardiovascular risk. It uses data from a Greek cohort of 542 individuals with rheumatoid arthritis, diabetes mellitus, and/or arterial hypertension, drawing on conventional (office-based) risk factors, laboratory-based blood biomarkers, and carotid/femoral ultrasound image-based phenotypes. Two kinds of data were collected at two time points, (i) visit 1 and (ii) visit 2, 3 years later: CVD risk factors, and the presence of CVD (defined as stroke, myocardial infarction, coronary artery syndrome, peripheral artery disease, or coronary heart disease), which served as the ground truth. The CVD risk factors were divided into three clusters (conventional or office-based, laboratory-based blood biomarkers, and carotid ultrasound image-based phenotypes) to study their effect on the ML classifiers. Three ML classifiers (Random Forest, Support Vector Machine, and Linear Discriminant Analysis) were applied in a two-fold cross-validation framework using data augmented by the synthetic minority over-sampling technique (SMOTE), and their performance was recorded. The cohort had 46 CVD risk factors (covariates) overall, implemented in an online cardiovascular framework that requires less than 1 s of calculation time per patient. For CVD presence detection, a mean accuracy of 98.40% and an area under the curve (AUC) of 0.98 (p < 0.0001) were obtained at visit 1, and 98.39% and 0.98 (p < 0.0001), respectively, at visit 2. The cardiovascular framework performed significantly better than the classical CVD risk score. The ML paradigm proved to be powerful for CVD prediction in individuals at medium to high cardiovascular risk.
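SMOTE counters class imbalance (for example, when CVD-positive patients form the minority class). It creates synthetic minority-class samples at random points on the line segment between a minority sample and one of its nearest minority-class neighbours. The pure-Python sketch below is a minimal illustration only. It assumes numeric feature vectors and Euclidean distance. The function name `smote`, the random choice of base samples and the defaults (`k=5`, a fixed `seed`) are illustrative assumptions, not the study's implementation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE-style oversampler (illustrative sketch, not the study's code).

    minority: list of numeric feature vectors (lists of floats) from the minority class.
    Returns n_synthetic new vectors, each interpolated between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducibility
    k = min(k, len(minority) - 1)      # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()             # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

For example, `smote([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]], n_synthetic=6, k=2)` returns six new two-feature vectors. Because each one is an interpolation, every synthetic point lies within the range spanned by the original minority samples.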
Affiliation(s)
- George Konstantonis
- Rheumatology Unit, National Kapodistrian University of Athens, Athens, Greece
- Petros P Sfikakis
- Rheumatology Unit, National Kapodistrian University of Athens, Athens, Greece
- Ankush D Jamthikar
- Research Scientist, AtheroPoint™, Roseville, CA, USA; Visvesvaraya National Institute of Technology, Nagpur, India
- George D Kitas
- Academic Affairs, Dudley Group NHS Foundation Trust, Dudley, UK; Arthritis Research UK Epidemiology Unit, Manchester University, Manchester, M13, UK
- Suneet K Gupta
- Department of Computer Science, Bennett University, Gr. Noida, India
- Luca Saba
- Department of Radiology, University of Cagliari, Cagliari, Italy
- Kleio Verrou
- Department of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Narendra N Khanna
- Department of Cardiology, Indraprastha Apollo Hospitals, New Delhi, India
- Zoltan Ruzsa
- Department of Internal Medicines, Invasive Cardiology Division, University of Szeged, Szeged, Hungary
- Aditya M Sharma
- Division of Cardiovascular Medicine, University of Virginia, Charlottesville, VA, USA
- John R Laird
- Heart and Vascular Institute, Adventist Health St. Helena, St Helena, CA, USA
- Amer M Johri
- Department of Medicine, Division of Cardiology, Queen's University, Kingston, ON, Canada
- Manudeep Kalra
- Department of Radiology, Massachusetts General Hospital, 55 Fruit Street, Boston, MA, USA
- Athanasios Protogerou
- Cardiovascular Prevention Unit, Department of Pathophysiology, National Kapodistrian University of Athens, Athens, Greece
- Jasjit S Suri
- Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, 95661, USA.
2
Abstract
For COVID-19, predictive modeling in the literature broadly relies on SEIR/SIR, agent-based, and curve-fitting techniques and models. Machine learning models built on statistical tools and techniques are also widely used. Predictions aim to make states and citizens aware of possible threats and consequences. However, for the COVID-19 outbreak, state-of-the-art prediction models fail to account for crucial and unprecedented uncertainties and factors, such as (a) hospital settings and capacity; (b) daily test capacity and rate; (c) demographics; (d) population density; (e) vulnerable people; and (f) income versus commodities (poverty). Predictions can be short-term or long-term, depending on which factors a model employs. In this paper, we discuss how such continuous and unprecedented factors lead us to design complex models, rather than relying solely on stochastic and/or discrete models driven by randomly generated parameters. Further, it is time to employ data-driven, mathematically proven models that can tune their parameters dynamically and automatically over time.
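The following sketch serves as a reference point for the compartmental (SIR) models mentioned in the abstract. It integrates the classic SIR equations with a forward-Euler step. It is a minimal illustration under assumed values, not a model from this paper. The function name `simulate_sir`, the transmission rate `beta`, the recovery rate `gamma` and the step size `dt` are all hypothetical choices. In this sketch `beta` and `gamma` stay fixed for the whole run, which is the kind of static parameterisation the abstract argues should instead be tuned from data over time. An SEIR model would add an exposed compartment between S and I.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Forward-Euler integration of the classic SIR model (illustrative sketch).

    s, i, r are fractions of a closed population, so s + i + r stays equal to 1.
    beta (transmission) and gamma (recovery) are hypothetical constant rates per day.
    Returns one (s, i, r) tuple per day, including day 0.
    """
    s, i, r = s0, i0, r0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt   # flow S -> I during this step
            new_recoveries = gamma * i * dt      # flow I -> R during this step
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history
```

For example, `simulate_sir(0.3, 0.1, 0.99, 0.01, 0.0, days=160)` returns 161 daily `(s, i, r)` tuples. Their components always sum to 1, and the infected fraction rises above its starting value before declining.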
Affiliation(s)
- K C Santosh
- Department of Computer Science, University of South Dakota, 414 E Clark St, Vermillion, SD, 57069, USA.
3
Li Y, Umbach DM, Bingham A, Li QJ, Zhuang Y, Li L. Putative biomarkers for predicting tumor sample purity based on gene expression data. BMC Genomics 2019; 20:1021. [PMID: 31881847 PMCID: PMC6933652 DOI: 10.1186/s12864-019-6412-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/28/2019] [Accepted: 12/18/2019] [Indexed: 12/29/2022]
Abstract
Background: Tumor purity is the percentage of cancer cells present in a sample of tumor tissue. The non-cancerous cells (immune cells, fibroblasts, etc.) play an important role in tumor biology. The ability to determine tumor purity is important for understanding the roles of cancerous and non-cancerous cells in a tumor.
Methods: We applied a supervised machine learning method, XGBoost, to RNA-seq gene expression data from 33 TCGA tumor types to predict tumor purity.
Results: Across the 33 tumor types, the median correlation between observed and predicted tumor purity ranged from 0.75 to 0.87, with small root mean square errors. This suggests that tumor purity can be accurately predicted using expression data. We further confirmed that expression levels of a ten-gene set (CSF2RB, RHOH, C1S, CCDC69, CCL22, CYTIP, POU2AF1, FGR, CCL21, and IL7R) were predictive of tumor purity regardless of tumor type. We tested whether our set of ten genes could accurately predict tumor purity in a TCGA-independent data set. The expression levels of these ten genes were highly correlated (ρ = 0.88) with the actual observed tumor purity.
Conclusions: Our analyses suggest that the ten-gene set may serve as a biomarker for tumor purity prediction using gene expression data.
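The abstract judges prediction quality by the correlation between observed and predicted purity within each tumor type, summarised by a median across tumor types, together with the root mean square error. The pure-Python sketch below covers only that evaluation step; it does not reproduce the XGBoost model. It assumes the correlation is Pearson's r. The helper names (`pearson`, `rmse`, `summarize`) and any tumor-type labels passed in are hypothetical.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted purity values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def summarize(by_tumor_type):
    """by_tumor_type maps a (hypothetical) tumor-type label to (observed, predicted).

    Returns per-type (correlation, RMSE) pairs and the median correlation across types.
    """
    per_type = {name: (pearson(obs, pred), rmse(obs, pred))
                for name, (obs, pred) in by_tumor_type.items()}
    return per_type, median(r for r, _ in per_type.values())
```

Usage: call `summarize({"type_A": (observed_a, predicted_a), "type_B": (observed_b, predicted_b)})` with per-sample purity fractions in [0, 1]. The call returns a per-type table and a single median correlation.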
Affiliation(s)
- Yuanyuan Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA MD A3-03, Durham, NC, 27709, USA.
- David M Umbach
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA MD A3-03, Durham, NC, 27709, USA
- Adrienna Bingham
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA MD A3-03, Durham, NC, 27709, USA
- Qi-Jing Li
- Department of Immunology, Duke University, Durham, North Carolina, 27710, USA
- Yuan Zhuang
- Department of Immunology, Duke University, Durham, North Carolina, 27710, USA
- Leping Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA MD A3-03, Durham, NC, 27709, USA