1
Saito S, Shahbaz S, Osman M, Redmond D, Bozorgmehr N, Rosychuk RJ, Lam G, Sligl W, Cohen Tervaert JW, Elahi S. Diverse immunological dysregulation, chronic inflammation, and impaired erythropoiesis in long COVID patients with chronic fatigue syndrome. J Autoimmun 2024; 147:103267. [PMID: 38797051 DOI: 10.1016/j.jaut.2024.103267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 05/16/2024] [Accepted: 05/22/2024] [Indexed: 05/29/2024]
Abstract
A substantial number of patients recovering from acute SARS-CoV-2 infection present serious lingering symptoms, often referred to as long COVID (LC). However, a subset of these patients exhibits the most debilitating symptoms characterized by ongoing myalgic encephalomyelitis or chronic fatigue syndrome (ME/CFS). We specifically identified and studied ME/CFS patients from two independent LC cohorts, at least 12 months post the onset of acute disease, and compared them to the recovered group (R). ME/CFS patients had relatively increased neutrophils and monocytes but reduced lymphocytes. Selective T cell exhaustion with reduced naïve but increased terminal effector T cells was observed in these patients. LC was associated with elevated levels of plasma pro-inflammatory cytokines, chemokines, Galectin-9 (Gal-9), and artemin (ARTN). A defined threshold of Gal-9 and ARTN concentrations had a strong association with LC. The expansion of immunosuppressive CD71+ erythroid cells (CECs) was noted. These cells may modulate the immune response and contribute to increased ARTN concentration, which correlated with pain and cognitive impairment. Serology revealed an elevation in a variety of autoantibodies in LC. Intriguingly, we found that the frequency of 2B4+CD160+ and TIM3+CD160+ CD8+ T cells completely separated LC patients from the R group. Our further analyses using a multiple regression model revealed that the elevated frequency/levels of CD4 terminal effector, ARTN, CEC, Gal-9, CD8 terminal effector, and MCP1 but lower frequency/levels of TGF-β and MAIT cells can distinguish LC from the R group. Our findings provide a new paradigm in the pathogenesis of ME/CFS to identify strategies for its prevention and treatment.
Affiliation(s)
- Suguru Saito
- School of Dentistry, Division of Foundational Sciences, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada
- Shima Shahbaz
- School of Dentistry, Division of Foundational Sciences, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada
- Mohammed Osman
- Department of Medicine, Division of Rheumatology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada
- Desiree Redmond
- Department of Medicine, Division of Rheumatology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada
- Najmeh Bozorgmehr
- School of Dentistry, Division of Foundational Sciences, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada
- Rhonda J Rosychuk
- Department of Pediatrics, Division of Infectious Disease, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada
- Grace Lam
- Department of Medicine, Division of Pulmonary Medicine, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada
- Wendy Sligl
- Department of Critical Care Medicine, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada; Department of Medicine, Division of Infectious Diseases, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada
- Jan Willem Cohen Tervaert
- Department of Medicine, Division of Rheumatology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada
- Shokrollah Elahi
- School of Dentistry, Division of Foundational Sciences, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada; Department of Oncology, University of Alberta, Edmonton, T6G 2E1, AB, Canada; Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada; Li Ka Shing Institute of Virology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, T6G 2E1, AB, Canada.
2
Raoufi S, Jafarinejad-Farsangi S, Dehesh T, Hadizadeh M. Investigating unique genes of five molecular subtypes of breast cancer using penalized logistic regression. J Cancer Res Ther 2023; 19:S126-S137. [PMID: 37147992 DOI: 10.4103/jcrt.jcrt_811_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2023]
Abstract
Background Breast cancer (BC) is the most common cancer and the fifth leading cause of death in women worldwide. Exploring genes unique to particular cancers has attracted interest. Patients and Methods This study aimed to explore unique genes of five molecular subtypes of BC in women using penalized logistic regression models. For this purpose, microarray data of five independent GEO data sets were combined. This combination includes genetic information of 324 women with BC and 12 healthy women. Least absolute shrinkage and selection operator (LASSO) logistic regression and adaptive LASSO logistic regression were used to extract unique genes. The biological process of extracted genes was evaluated in an open-source GOnet web application. R software version 3.6.0 with the glmnet package was used for fitting the models. Results In total, 119 genes were extracted among 15 pairwise comparisons. Seventeen genes (14%) showed overlap between comparative groups. According to GO enrichment analysis, the biological process of extracted genes was enriched in negative and positive regulation biological processes, and molecular function tracking revealed that most genes are involved in kinase and transferring activities. On the other hand, we identified unique genes for each comparative group and the subsequent pathways for them. However, a significant pathway was not identified for genes in normal-like versus ERBB2 and luminal A, basal versus control, and luminal B versus luminal A groups. Conclusion Most genes selected by LASSO logistic regression and adaptive LASSO logistic regression were unique to their comparative subgroups of BC and pointed to related pathways, which should help clarify the molecular differences between subgroups and could be considered for further research and therapeutic approaches in the future.
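To make the selection mechanism in this abstract concrete, here is a minimal sketch of L1-penalized (LASSO) logistic regression fitted by proximal gradient descent. It is not the authors' glmnet pipeline: the solver, the synthetic data, the penalty `lam`, the step size, and every function name are assumptions chosen for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal step for the L1 penalty: shrink toward zero, clipping small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam, step=0.5, iterations=300):
    """Minimise mean logistic loss + lam * sum(|w_j|) by proximal gradient descent."""
    n, p = len(X), len(X[0])
    weights, intercept = [0.0] * p, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            error = sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) - label
            grad_b += error / n
            for j in range(p):
                grad_w[j] += error * row[j] / n
        intercept -= step * grad_b  # the intercept is not penalised
        weights = [soft_threshold(w - step * g, step * lam)
                   for w, g in zip(weights, grad_w)]
    return intercept, weights


# Synthetic data (hypothetical): only feature 0 carries signal, features 1-4 are noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [1 if row[0] + 0.3 * rng.gauss(0, 1) > 0 else 0 for row in X]

intercept, weights = fit_lasso_logistic(X, y, lam=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("selected feature indices:", selected)
```

Features whose coefficients are shrunk exactly to zero are dropped, which is how a LASSO fit doubles as a gene selector. In a pairwise subtype comparison the same fit would be repeated for each pair of groups.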
Affiliation(s)
- Sadegh Raoufi
- Modeling in Health Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
- Tania Dehesh
- Department of Epidemiology and Biostatistics, School of Public Health, Kerman University of Medical Sciences, Kerman, Iran
- Morteza Hadizadeh
- Cardiovascular Research Centre, Institute of Basic and Clinical Physiology Sciences, Kerman University of Medical Sciences, Kerman, Iran
3
Zhang B, Dong X, Hu Y, Jiang X, Li G. Classification and prediction of spinal disease based on the SMOTE-RFE-XGBoost model. PeerJ Comput Sci 2023; 9:e1280. [PMID: 37346612 PMCID: PMC10280425 DOI: 10.7717/peerj-cs.1280] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/21/2022] [Accepted: 02/15/2023] [Indexed: 06/23/2023]
Abstract
Spinal diseases are killers that cause long-term disturbance to people with complex and diverse symptoms and may cause other conditions. At present, the diagnosis and treatment of these diseases depend mainly on the professional level and clinical experience of doctors, which is a problem awaiting a breakthrough in the field of medicine. This article proposes the SMOTE-RFE-XGBoost model, which takes the physical angle of human bone as the research index for feature selection and classification model construction to predict spinal diseases. The research process is as follows: two groups of people with normal and abnormal spine conditions are taken as the research objects of this article, and the synthetic minority oversampling technique (SMOTE) algorithm is used to address category imbalance. Three methods, least absolute shrinkage and selection operator (LASSO), tree-based feature selection, and recursive feature elimination (RFE), are used for feature selection. Logistic regression (LR), support vector machine (SVM), naive Bayes, decision tree (DT), random forest (RF), gradient boosting tree (GBT), extreme gradient boosting (XGBoost), and ridge regression models are used to classify the samples, construct single and combined classification models, and rank the feature importance. According to the accuracy and mean square error (MSE) values, the SMOTE-RFE-XGBoost combined model has the best classification, with accuracy, MSE and F1 values of 97.56%, 0.1111 and 0.8696, respectively. The importance of four indicators, lumbar slippage, cervical tilt, pelvic radius and pelvic tilt, was higher.
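The oversampling step named in this abstract can be sketched as follows. This is a simplified, standard-library illustration of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation; the function name, the parameters `k` and `seed`, and the toy data are assumptions.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # k nearest minority neighbours of the base point (squared Euclidean distance)
        others = [p for i, p in enumerate(minority) if i != base_index]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class with four samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_samples = smote(minority, n_synthetic=6)
print(len(new_samples))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation data remain untouched.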
Affiliation(s)
- Biao Zhang
- School of Computer Science, Liaocheng University, Liaocheng, Shandong, China
- Xinyan Dong
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei, China
- Yuwei Hu
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei, China
- Xuchu Jiang
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei, China
- Gongchi Li
- Union Hospital Affiliated to Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
4
Zheng W, Zhang G, Fu C, Jin B. An adaptive feature selection algorithm based on MDS with uncorrelated constraints for tumor gene data classification. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:6652-6665. [PMID: 37161122 DOI: 10.3934/mbe.2023286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
The development of DNA microarray technology has made it possible to study cancer at the level of genes. Because the correlation between genes is not considered, current unsupervised feature selection models may select many redundant genes during feature selection owing to an over-focus on genes with similar attributes, which may deteriorate the clustering performance of the model. To tackle this problem, we propose an adaptive feature selection model in which a reconstructed coefficient matrix with an additional constraint is introduced to transform the original data from a high-dimensional space into a low-dimensional space while preventing over-focus on genes with similar attributes. Moreover, Alternative Optimization (AO) is also proposed to handle the nonconvex optimization induced by solving the proposed model. The experimental results on four different cancer datasets show that the proposed model is superior to existing models in aspects such as clustering accuracy and sparsity of selected genes.
Affiliation(s)
- Wenkui Zheng
- School of Computer and Information Engineering, Henan University, Kaifeng 475004, China
- Guangyao Zhang
- School of Artificial Intelligence, Henan University, Zhengzhou 450046, China
- Chunling Fu
- School of Physics and Electronics, Henan University, Kaifeng 475004, China
- Bo Jin
- School of Artificial Intelligence, Henan University, Zhengzhou 450046, China
5
Risk Stratification for Breast Cancer Patient by Simultaneous Learning of Molecular Subtype and Survival Outcome Using Genetic Algorithm-Based Gene Set Selection. Cancers (Basel) 2022; 14:cancers14174120. [PMID: 36077657 PMCID: PMC9454699 DOI: 10.3390/cancers14174120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 08/18/2022] [Accepted: 08/20/2022] [Indexed: 11/26/2022] Open
Abstract
Simple Summary Patient stratification is clinically important because it allows us to understand the characteristics and establish treatment strategies for a group. Transcriptomic data play an important role in determining molecular subtypes and predicting survival. In the case of breast cancer, although the order of prognosis according to molecular subtypes is well known, there is heterogeneity even within a subtype. Therefore, patient stratification considering both molecular subtypes and survival outcomes is required. In this study, a methodology to handle this problem is presented. A genetic algorithm is used to select a set of genes, and a risk score is assigned to each patient using their expression level. According to the risk score, patients are ordered and stratified considering molecular subtypes and survival outcomes. Consequently, informative genes for patient stratification with respect to both aspects could be nominated, and the usefulness of the risk score was shown through comparison with other indicators. Abstract Patient stratification is a clinically important task because it allows us to establish and develop efficient treatment strategies for particular groups of patients. Molecular subtypes have been successfully defined using transcriptomic profiles, and they are used effectively in clinical practice, e.g., PAM50 subtypes of breast cancer. Survival prediction contributed to understanding diseases and also identifying genes related to prognosis. It is desirable to stratify patients considering these two aspects simultaneously. However, there are no methods for patient stratification that consider molecular subtypes and survival outcomes at once. Here, we propose a methodology to deal with the problem. A genetic algorithm is used to select a gene set from transcriptome data, and their expression quantities are utilized to assign a risk score to each patient. The patients are ordered and stratified according to the score. 
A gene set was selected by our method on a breast cancer cohort (TCGA-BRCA), and we examined its clinical utility using an independent cohort (SCAN-B). In this experiment, our method was successful in stratifying patients with respect to both molecular subtype and survival outcome. We demonstrated that the orders of patients were consistent across repeated experiments, and prognostic genes were successfully nominated. Additionally, it was observed that the risk score can be used to evaluate the molecular aggressiveness of individual patients.
6
Li L, Ching WK, Liu ZP. Robust biomarker screening from gene expression data by stable machine learning-recursive feature elimination methods. Comput Biol Chem 2022; 100:107747. [DOI: 10.1016/j.compbiolchem.2022.107747] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/14/2022] [Revised: 06/17/2022] [Accepted: 07/25/2022] [Indexed: 11/03/2022]
7
Qian X, Li Y, Zhang X, Guo H, He J, Wang X, Yan Y, Ma J, Ma R, Guo S. A Cardiovascular Disease Prediction Model Based on Routine Physical Examination Indicators Using Machine Learning Methods: A Cohort Study. Front Cardiovasc Med 2022; 9:854287. [PMID: 35783868 PMCID: PMC9247206 DOI: 10.3389/fcvm.2022.854287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2022] [Accepted: 05/23/2022] [Indexed: 11/24/2022] Open
Abstract
Background Cardiovascular diseases (CVD) are currently the leading cause of premature death worldwide. Model-based early detection of high-risk populations for CVD is the key to CVD prevention. Thus, this research aimed to use machine learning (ML) algorithms to establish a CVD prediction model based on routine physical examination indicators suitable for the Xinjiang rural population. Method The research cohort data collection was divided into two stages. The first stage involved a baseline survey from 2010 to 2012, with follow-up ending in December 2017. The second-phase baseline survey was conducted from September to December 2016, and follow-up ended in August 2021. A total of 12,692 participants (10,407 Uyghur and 2,285 Kazak) were included in the study. Screening predictors and establishing variable subsets were based on least absolute shrinkage and selection operator (Lasso) regression, logistic regression forward partial likelihood estimation (FLR), random forest (RF) feature importance, and RF variable importance. The selected subset of variables was compared with L1 regularized logistic regression (L1-LR), RF, support vector machine (SVM), and AdaBoost algorithm to establish a CVD prediction model suitable for this population. The incidence of CVD in this population was then analyzed. Result After 4.94 years of follow-up, a total of 1,176 people were diagnosed with CVD (cumulative incidence: 9.27%). In the comparison of discrimination and calibration, the prediction performance of the subset of variables selected based on FLR was better than that of other models. Combining the results of discrimination, calibration, and clinical validity, the prediction model based on L1-LR had the best prediction performance. Age, systolic blood pressure, low-density lipoprotein-L/high-density lipoproteins-C, triglyceride blood glucose index, body mass index, and body adiposity index were all important predictors of the onset of CVD in the Xinjiang rural population. 
Conclusion In the Xinjiang rural population, the prediction model based on L1-LR had the best prediction performance.
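The cohort figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The numbers are taken from the abstract; the variable names are illustrative only.

```python
# Counts reported in the abstract.
uyghur, kazak = 10_407, 2_285
cvd_cases = 1_176

participants = uyghur + kazak
# Cumulative incidence = new cases during follow-up / participants at baseline.
cumulative_incidence_pct = round(100 * cvd_cases / participants, 2)
print(participants, cumulative_incidence_pct)
```

The subgroup counts sum to the stated 12,692 participants, and 1,176 cases out of that total reproduces the reported cumulative incidence of 9.27%.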
Affiliation(s)
- Xin Qian
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Yu Li
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Xianghui Zhang
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Heng Guo
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Jia He
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Xinping Wang
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Yizhong Yan
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Jiaolong Ma
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Rulin Ma
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Shuxia Guo
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Department of NHC Key Laboratory of Prevention and Treatment of Central Asia High Incidence Diseases, The First Affiliated Hospital of Shihezi University Medical College, Shihezi, China
8
Chen X, Zhang Q, Zhang Q. Predicting potential biomarkers and immune infiltration characteristics in heart failure. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:8671-8688. [PMID: 35942730 DOI: 10.3934/mbe.2022402] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
BACKGROUND Studies have demonstrated that immune cell activation and their infiltration in the myocardium can have adverse effects on the heart, contributing to the pathogenesis of heart failure (HF). The purpose of this study is to use bioinformatics analysis to determine potential diagnostic markers of heart failure and to establish an applicable model to predict the association between heart failure and immune cell infiltration. METHODS Firstly, gene expression profiles of dilated heart disease GSE3585 and GSE120895 were obtained from the Gene Expression Omnibus (GEO) database. This study then selected differentially expressed genes (DEGs) in 54 patients with HF and 13 healthy controls. In this study, biomarkers were identified using Least Absolute Shrinkage and Selection Operator (LASSO) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE). Additionally, we evaluated the prognostic discrimination performance by the receiver operating characteristic (ROC) curve. Cell type Identification by Estimating Relative Subsets of RNA Transcripts (CIBERSORT) was used for analyzing immune cell infiltration in HF tissues. Lastly, immune biomarkers were correlated with each other. RESULT After 24 DEGs were analyzed using a combinatorial model of LASSO regression and SVM-RFE analysis, four key genes were obtained, namely NSG1, NPPB, PHLDA1, and SERPINE2. The area under the curve (AUC) of each of these four genes was greater than 0.8. Subsequently, using CIBERSORT, we also found that compared with normal people, the proportion of M1 macrophages and activated mast cells in heart failure tissues decreased. In addition, correlation analysis showed that NPPB, PHLDA1 and SERPINE2 were associated with immune cell infiltration. CONCLUSION NSG1, NPPB, PHLDA1 and SERPINE2 were identified as potential biomarkers of heart failure. 
It reveals the comprehensive role of relevant central genes in immune infiltration, which provides a new research idea for the treatment and early detection in heart failure.
Affiliation(s)
- Xuesi Chen
- Cardiovascular Department, the Affiliated People's Hospital of Ningbo University, Ningbo, Zhejiang, China
- Qijun Zhang
- Cardiovascular Department, the Affiliated People's Hospital of Ningbo University, Ningbo, Zhejiang, China
- Qin Zhang
- Cardiovascular Department, the Affiliated People's Hospital of Ningbo University, Ningbo, Zhejiang, China
9
Li L, Liu ZP. A connected network-regularized logistic regression model for feature selection. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02877-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
10
Li L, Liu ZP. Detecting prognostic biomarkers of breast cancer by regularized Cox proportional hazards models. J Transl Med 2021; 19:514. [PMID: 34930307 PMCID: PMC8686664 DOI: 10.1186/s12967-021-03180-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/16/2021] [Accepted: 12/03/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND The successful identification of breast cancer (BRCA) prognostic biomarkers is essential for the strategic interference of BRCA patients. Recently, various methods have been proposed for exploring a small prognostic gene set that can distinguish the high-risk group from the low-risk group. METHODS Regularized Cox proportional hazards (RCPH) models were proposed to discover prognostic biomarkers of BRCA from gene expression data. Firstly, the maximum connected network with 1142 genes was constructed by mapping 956 differentially expressed genes (DEGs) and 677 previously BRCA-related genes into the gene regulatory network (GRN). Then, the 72 union genes of the four feature gene sets identified by Lasso-RCPH, Enet-RCPH, [Formula: see text]-RCPH and SCAD-RCPH models were recognized as the robust prognostic biomarkers. These biomarkers were validated by literature checks, BRCA-specific GRN and functional enrichment analysis. Finally, an index of prognostic risk score (PRS) for BRCA was established based on univariate and multivariate Cox regression analysis. Survival analysis was performed to investigate the PRS on 1080 BRCA patients from the internal validation. In particular, a nomogram was constructed to express the relationship between PRS and other clinical information on the discovery dataset. The PRS was also verified on 1848 BRCA patients of ten external validation datasets or collected cohorts. RESULTS The nomogram highlighted the importance of the PRS in guiding the prognosis of BRCA patients. In addition, the PRS of 301 normal samples and 306 tumor samples from five independent datasets showed that it is significantly higher in tumors than in normal tissues ([Formula: see text]). The protein expression profiles of the three genes, i.e., ADRB1, SAV1 and TSPAN14, involved in the PRS model demonstrated that the latter two genes are more strongly stained in tumor specimens. 
More importantly, the high-risk group has worse survival than the low-risk group ([Formula: see text]) in both internal and external validations. CONCLUSIONS The proposed pipelines for detecting and validating prognostic biomarker genes for BRCA are effective and efficient. Moreover, the proposed PRS is very promising as an important indicator for judging the prognosis of BRCA patients.
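A prognostic risk score of the kind described in this abstract is commonly a weighted sum of gene expression values, with patients split into high-risk and low-risk groups at a cutoff. The sketch below illustrates that general pattern only. The coefficient values are made up for illustration and are not from the paper, the median cutoff is an assumption (the abstract does not state one), and the patient data are hypothetical.

```python
from statistics import median

# Made-up coefficients for illustration only; NOT the values fitted in the paper.
HYPOTHETICAL_COEFFICIENTS = {"ADRB1": -0.4, "SAV1": 0.3, "TSPAN14": 0.5}


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the model's genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(patients, coefficients):
    """Label each patient 'high' or 'low' risk relative to the median score (assumed cutoff)."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical expression values for four patients.
patients = {
    "p1": {"ADRB1": 2.0, "SAV1": 1.0, "TSPAN14": 1.0},
    "p2": {"ADRB1": 1.0, "SAV1": 2.0, "TSPAN14": 2.0},
    "p3": {"ADRB1": 3.0, "SAV1": 1.0, "TSPAN14": 0.5},
    "p4": {"ADRB1": 0.5, "SAV1": 3.0, "TSPAN14": 3.0},
}
groups = stratify(patients, HYPOTHETICAL_COEFFICIENTS)
print(groups)
```

The resulting groups would then be compared with a survival analysis, which is the step the abstract reports for its internal and external validation cohorts.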
Affiliation(s)
- Lingyu Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, China
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, China.
11
Xiong D, Wang Y, You M. Reply to: "Inconsistent prediction capability of ImmuneCells.Sig across different RNA-seq datasets". Nat Commun 2021; 12:4168. [PMID: 34234120 PMCID: PMC8263738 DOI: 10.1038/s41467-021-24304-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/08/2020] [Accepted: 06/10/2021] [Indexed: 12/29/2022] Open
Affiliation(s)
- Donghai Xiong
- Center for Cancer Prevention, Houston Methodist Cancer Center, Houston Methodist Research Institute, Houston, TX, United States
- Yian Wang
- Center for Cancer Prevention, Houston Methodist Cancer Center, Houston Methodist Research Institute, Houston, TX, United States
- Ming You
- Center for Cancer Prevention, Houston Methodist Cancer Center, Houston Methodist Research Institute, Houston, TX, United States.
12
Xiao MX, Lu CH, Ta N, Wei HC, Haryadi B, Wu HT. Machine learning prediction of future peripheral neuropathy in type 2 diabetics with percussion entropy and body mass indices. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2021.08.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022]
13
Jiang Y, Zhang X, Ma R, Wang X, Liu J, Keerman M, Yan Y, Ma J, Song Y, Zhang J, He J, Guo S, Guo H. Cardiovascular Disease Prediction by Machine Learning Algorithms Based on Cytokines in Kazakhs of China. Clin Epidemiol 2021; 13:417-428. [PMID: 34135637 PMCID: PMC8200454 DOI: 10.2147/clep.s313343] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 03/29/2021] [Accepted: 05/17/2021] [Indexed: 12/17/2022] Open
Abstract
Background Cardiovascular disease (CVD) is the leading cause of mortality worldwide. Accurately identifying subjects at high risk of CVD may improve CVD outcomes. We sought to systematically examine the feasibility and performance of 7 widely used machine learning (ML) algorithms in predicting CVD risks. Methods The final analysis included 1508 Kazakh subjects in China without CVD at baseline who completed follow-up. All subjects were randomly divided into the training set (80%) and the test set (20%). L1-penalized logistic regression (LR), support vector machine with radial basis function (SVM), decision tree (DT), random forest (RF), k-nearest neighbors (KNN), Gaussian naive Bayes (NB), and extreme gradient boosting (XGB) were employed to predict CVD outcomes. Ten-fold cross-validation was used during model development and hyperparameter tuning in the training set. Model performance was evaluated in the test set in light of discrimination, calibration, and clinical usefulness. RF was applied to obtain the variable importance of included variables. Twenty-two variables, including sociodemographic characteristics, medical history, cytokines, and synthetic indices, were used for model development. Results Among 1508 subjects, 203 were diagnosed with CVD over a median follow-up of 5.17 years. All 7 models had moderate to excellent discrimination (AUC ranged from 0.770 to 0.872) and were well calibrated. LR and SVM performed similarly, with AUCs of 0.872 (95% CI: 0.829–0.907) and 0.868 (95% CI: 0.825–0.904), respectively. LR had the lowest Brier score (0.078) and the highest sensitivity (97.1%). Decision curve analysis indicated that SVM was slightly better than LR. The inflammatory cytokines, such as hs-CRP and IL-6, were identified as strong predictors of CVD. Conclusion SVM and LR can be applied to guide clinical decision-making in the Kazakh Chinese population, and further study is required to ensure their accuracies.
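The two headline metrics in this abstract, AUC for discrimination and the Brier score for calibration, can be computed from predicted probabilities as sketched below. This is a plain-Python illustration with hypothetical labels and probabilities, not the authors' evaluation code; the function names are assumptions.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted probabilities for four subjects.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(auc, brier)
```

A higher AUC means better ranking of cases above non-cases, while a lower Brier score means predicted probabilities sit closer to the observed outcomes, which is why the abstract reports both.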
Affiliation(s)
- Yunxing Jiang
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China
- Xianghui Zhang
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China
- Rulin Ma
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China
- Xinping Wang
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China
- Jiaming Liu
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China
- Mulatibieke Keerman
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China
- Yizhong Yan
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China
- Jiaolong Ma
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China
- Yanpeng Song
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China; The First Affiliated Hospital of Shihezi University Medical College, Shihezi, Xinjiang, People's Republic of China
- Jingyu Zhang
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China
- Jia He
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China
- Shuxia Guo
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China; Department of Pathology and Key Laboratory of Xinjiang Endemic and Ethnic Diseases (Ministry of Education), Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China
- Heng Guo
- Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People's Republic of China