1. Ainiwaer A, Hou WQ, Kadier K, Rehemuding R, Liu PF, Maimaiti H, Qin L, Ma X, Dai JG. A Machine Learning Framework for Diagnosing and Predicting the Severity of Coronary Artery Disease. Rev Cardiovasc Med 2023; 24:168. [PMID: 39077543 PMCID: PMC11264126 DOI: 10.31083/j.rcm2406168] [Received: 01/31/2023] [Revised: 03/02/2023] [Accepted: 03/06/2023] [Indexed: 07/31/2024]
Abstract
Background Although machine learning (ML)-based prediction of coronary artery disease (CAD) has gained increasing attention, assessing the severity of suspected CAD in symptomatic patients remains challenging. Methods The training set for this study consisted of 284 retrospectively enrolled participants, and the test set consisted of 116 prospectively enrolled participants; for all participants we collected 53 baseline variables and coronary angiography results. The data were preprocessed with outlier handling and one-hot encoding. In the first stage, we constructed an ML model that used baseline information to predict the presence of CAD as a binary (dichotomous) outcome. In the second stage, baseline information was used to construct ML regression models for predicting the severity of CAD; the non-CAD population was included, and two different scores (SYNTAX and GENSINI) were used as output variables. Finally, statistical analysis and SHAP plot visualization were used to explore the relationship between baseline information and CAD. Results The study included 269 CAD patients and 131 healthy controls. The eXtreme Gradient Boosting (XGBoost) model performed best among the models for predicting CAD, with an area under the receiver operating characteristic curve of 0.728 (95% CI 0.623-0.824). The main correlates were left ventricular ejection fraction, homocysteine, and hemoglobin (p < 0.001). The XGBoost model also performed best for predicting the SYNTAX score, with the main correlates being brain natriuretic peptide (BNP), left ventricular ejection fraction, and glycated hemoglobin (p < 0.001). The main relevant features in the model predicting the GENSINI score were BNP, high-density lipoprotein, and homocysteine (p < 0.001). Conclusions This data-driven approach provides a foundation for the risk stratification and severity assessment of CAD. Clinical Trial Registration The study was registered in the www.clinicaltrials.gov protocol registration system (number NCT05018715).
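The abstract reports an AUC of 0.728 with a 95% CI but does not say how the interval was obtained. As an illustration only, the sketch below computes ROC AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and wraps it in a percentile bootstrap. The bootstrap method, the 2000 resamples, and the function names `auc` and `bootstrap_auc_ci` are assumptions made for this example, not the authors' procedure.

```python
import random
from typing import List, Sequence, Tuple

def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true: Sequence[int], scores: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI (assumed method): resample cases with replacement, recompute AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auc(y, s))  # 0.75
```

The pairwise loop is quadratic, which is fine for a test set of about a hundred participants; a rank-based formula scales better for large data.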
Affiliation(s)
- Aikeliyaer Ainiwaer
- Department of Cardiology, The First Affiliated Hospital of Xinjiang Medical University, 830011 Urumqi, Xinjiang, China
- Wen Qing Hou
- College of Information Science and Technology, Shihezi University, 832003 Shihezi, Xinjiang, China
- Kaisaierjiang Kadier
- Department of Cardiology, The First Affiliated Hospital of Xinjiang Medical University, 830011 Urumqi, Xinjiang, China
- Rena Rehemuding
- Department of Cardiology, The First Affiliated Hospital of Xinjiang Medical University, 830011 Urumqi, Xinjiang, China
- Peng Fei Liu
- Department of Cardiology, The First Affiliated Hospital of Xinjiang Medical University, 830011 Urumqi, Xinjiang, China
- Halimulati Maimaiti
- Department of Cardiology, The First Affiliated Hospital of Xinjiang Medical University, 830011 Urumqi, Xinjiang, China
- Lian Qin
- Department of Cardiology, The First Affiliated Hospital of Xinjiang Medical University, 830011 Urumqi, Xinjiang, China
- Xiang Ma
- Department of Cardiology, The First Affiliated Hospital of Xinjiang Medical University, 830011 Urumqi, Xinjiang, China
- Jian Guo Dai
- College of Information Science and Technology, Shihezi University, 832003 Shihezi, Xinjiang, China
2. Wang K, Li J, Meng D, Zhang Z, Liu S. Machine learning based on metabolomics reveals potential targets and biomarkers for primary Sjogren’s syndrome. Front Mol Biosci 2022; 9:913325. [PMID: 36133908 PMCID: PMC9483105 DOI: 10.3389/fmolb.2022.913325] [Received: 04/12/2022] [Accepted: 08/10/2022] [Indexed: 11/13/2022]
Abstract
Background: Using machine learning based on metabolomics, this study aimed to construct an effective diagnostic model for primary Sjogren’s syndrome (pSS) and to reveal potential targets and biomarkers of pSS. Methods: Serum specimens were collected from 39 patients with pSS and 38 healthy controls (HCs) and analyzed by ultra-high-performance liquid chromatography coupled with high-resolution mass spectrometry. Three machine learning algorithms, namely the least absolute shrinkage and selection operator (LASSO), random forest (RF), and extreme gradient boosting (XGBoost), were used to build pSS diagnosis models. Afterward, four machine learning methods were used to reduce the dimensionality of the metabolomics data. Finally, metabolites with significant differences were screened and pathway analysis was conducted. Results: The area under the curve (AUC), sensitivity, and specificity of LASSO, RF, and XGBoost on the test set all reached 1.00. Orthogonal partial least squares discriminant analysis was used to classify the metabolomics data. By combining the univariate false discovery rate with the variable importance in projection, we identified 21 significantly different metabolites. Using these 21 metabolites for diagnostic modeling, the AUC, sensitivity, and specificity of LASSO, RF, and XGBoost all reached 1.00. Metabolic pathway analysis revealed that these 21 metabolites are highly correlated with amino acid and lipid metabolism. On the basis of the 21 metabolites, we screened the important variables in the models, and five common variables were obtained by intersecting the important variables of the three models. Based on these five common variables, the AUC, sensitivity, and specificity of LASSO, RF, and XGBoost all reached 1.00. 2-Hydroxypalmitic acid, L-carnitine, and cyclic AMP were found to be potential targets and specific biomarkers for pSS. Conclusion: The combination of machine learning and metabolomics can accurately distinguish patients with pSS from HCs. 2-Hydroxypalmitic acid, L-carnitine, and cyclic AMP are potential targets and biomarkers for pSS.
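The selection steps described above (combining a false discovery rate filter with variable importance in projection, then intersecting the important variables of three models) can be sketched as follows. This is a minimal illustration, assuming Benjamini-Hochberg adjustment for the FDR, cutoffs of FDR < 0.05 and VIP > 1 (common conventions, not values stated in the abstract), and VIP scores already computed by an OPLS-DA model. The function names are hypothetical.

```python
from typing import Dict, List, Set

def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_metabolites(pvalues: Dict[str, float], vip: Dict[str, float],
                       fdr_cut: float = 0.05, vip_cut: float = 1.0) -> Set[str]:
    """Keep features passing both filters; both cutoffs are assumed, not from the study."""
    names = list(pvalues)
    qvalues = bh_adjust([pvalues[n] for n in names])
    return {n for n, q in zip(names, qvalues) if q < fdr_cut and vip[n] > vip_cut}

def common_features(*importance_sets: Set[str]) -> Set[str]:
    """Features ranked important by every model (set intersection)."""
    return set.intersection(*importance_sets)

p = {"a": 0.001, "b": 0.2, "c": 0.01}
vip = {"a": 1.5, "b": 2.0, "c": 0.8}
print(select_metabolites(p, vip))  # {'a'}
```

Feature "c" passes the FDR filter but is dropped by the VIP filter, which is the point of requiring both criteria.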
3. Rahman MS, Chowdhury AH, Amrin M. Accuracy comparison of ARIMA and XGBoost forecasting models in predicting the incidence of COVID-19 in Bangladesh. PLOS Glob Public Health 2022; 2:e0000495. [PMID: 36962227 PMCID: PMC10021465 DOI: 10.1371/journal.pgph.0000495] [Received: 08/29/2021] [Accepted: 04/27/2022] [Indexed: 04/19/2023]
Abstract
Accurate predictive time-series modelling is important for public health planning and response during the emergence of a novel pandemic. The aims of the study were therefore threefold: (a) to model the overall trend of confirmed COVID-19 cases and deaths in Bangladesh; (b) to generate a short-term, 8-week forecast of COVID-19 cases and deaths; and (c) to compare the predictive accuracy of the Autoregressive Integrated Moving Average (ARIMA) and eXtreme Gradient Boosting (XGBoost) models for modelling the non-linear features and seasonal trends of the time series. Data from the onset of the epidemic in Bangladesh were collected from the Directorate General of Health Service (DGHS) and the Institute of Epidemiology, Disease Control and Research (IEDCR). Daily confirmed COVID-19 cases and deaths in Bangladesh over 633 days were divided into several training and test sets. ARIMA and XGBoost models were fitted on the training data, each model's forecasting ability was evaluated on the test sets, and the predictive performances were averaged to choose the best model. Predictive accuracy was assessed using the mean absolute error (MAE), mean percentage error (MPE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The findings reveal a nonlinear trend and weekly seasonality in the dataset. The average error measures of the ARIMA model for both confirmed COVID-19 cases and deaths were lower than those of the XGBoost model. Hence, in our study, the ARIMA model performed better than the XGBoost model in predicting confirmed COVID-19 cases and deaths in Bangladesh. The suggested prediction model might play a critical role in estimating the spread of a novel pandemic in Bangladesh and similar countries.
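The four accuracy measures named in this abstract are simple averages over forecast errors. The sketch below computes them for one test window; it assumes the error is defined as actual minus predicted and that MPE and MAPE are expressed in percent. Sign conventions for MPE differ between sources, so this is one common choice rather than the authors' exact definition, and `forecast_errors` is a hypothetical name.

```python
import math
from typing import Dict, Sequence

def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, MPE (%), RMSE and MAPE (%) with error = actual - predicted (assumed convention)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MPE and MAPE are undefined when an actual value is zero")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        # Signed percentage error: over- and under-forecasts can cancel out.
        "MPE": 100.0 * sum(e / a for e, a in zip(errors, actual)) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

print(forecast_errors([100.0, 200.0], [110.0, 190.0]))
```

For the two-point example, MAE and RMSE are both 10, MAPE is 7.5%, and MPE is -2.5% because the over-forecast of the first value outweighs, in relative terms, the under-forecast of the second. Averaging these dictionaries over several train/test splits gives the kind of pooled comparison the study describes.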
4. Hua D, Desaire H. Improved Discrimination of Disease States Using Proteomics Data with the Updated Aristotle Classifier. J Proteome Res 2021; 20:2823-2829. [PMID: 33909976 DOI: 10.1021/acs.jproteome.1c00066] [Indexed: 11/28/2022]
Abstract
Mass spectrometry data sets from omics studies are an optimal information source for discriminating patients with disease and identifying biomarkers. Thousands of proteins or endogenous metabolites can be queried in each analysis, spanning several orders of magnitude in abundance. Machine learning tools that effectively leverage these data to accurately identify disease states are in high demand. While mass spectrometry data sets are rich with potentially useful information, using the data effectively can be challenging because of missing entries in the data sets and because the number of samples is typically much smaller than the number of features, two challenges that make machine learning difficult. To address this problem, we have modified a new supervised classification tool, the Aristotle Classifier, so that omics data sets can be better leveraged for identifying disease states. The optimized classifier, AC.2021, is benchmarked on multiple data sets against its predecessor and two leading supervised classification tools, Support Vector Machine (SVM) and XGBoost. The new classifier, AC.2021, outperformed existing tools on multiple tests using proteomics data. The underlying code for the classifier, provided herein, would be useful for researchers who desire improved classification accuracy when using their omics data sets to identify disease states.
Affiliation(s)
- David Hua
- Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
- Heather Desaire
- Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
5. Yang F, Li Q, Xiang J, Zhang H, Sun H, Ruan G, Tang Y. NMR-based plasma metabolomics of adult B-cell acute lymphoblastic leukemia. Mol Omics 2020; 17:153-159. [PMID: 33295915 DOI: 10.1039/d0mo00067a] [Indexed: 02/06/2023]
Abstract
Acute lymphoblastic leukemia (ALL) is a common malignant tumor. Compared with childhood ALL, adult B-cell ALL responds less well to treatment and remains a major challenge. To explore the pathogenesis of adult B-cell ALL and find new diagnostic biomarkers for developing sensitive diagnostic tools, we investigated the plasma metabolites of adult B-cell ALL using 1H NMR (nuclear magnetic resonance) metabolomics. Relative to healthy controls, adult B-cell ALL patients showed abnormal metabolism, including glycolysis, gluconeogenesis, amino acid metabolism, fatty acid metabolism, and choline phospholipid metabolism. More importantly, we found that the optimal combination of choline, tyrosine, and unsaturated lipids has the potential to diagnose and prognose adult B-cell ALL in the clinic.
Affiliation(s)
- Fengmin Yang
- National Laboratory for Molecular Sciences, Center for Molecular Sciences, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing, P. R. China.
6. Alim M, Ye GH, Guan P, Huang DS, Zhou BS, Wu W. Comparison of ARIMA model and XGBoost model for prediction of human brucellosis in mainland China: a time-series study. BMJ Open 2020; 10:e039676. [PMID: 33293308 PMCID: PMC7722837 DOI: 10.1136/bmjopen-2020-039676] [Indexed: 12/17/2022]
Abstract
OBJECTIVES Human brucellosis is a public health problem endangering health and property in China. Predicting the trend and seasonality of human brucellosis is of great significance for its prevention. In this study, the autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model were compared to determine which was more suitable for predicting the occurrence of brucellosis in mainland China. DESIGN Time-series study. SETTING Mainland China. METHODS Data on human brucellosis in mainland China were provided by the National Health and Family Planning Commission of China. The data were divided into a training set and a test set. The training set comprised the monthly incidence of human brucellosis in mainland China from January 2008 to June 2018, and the test set comprised the monthly incidence from July 2018 to June 2019. The mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) were used to evaluate model fitting and prediction. RESULTS The number of human brucellosis patients in mainland China increased from 30 002 in 2008 to 40 328 in 2018. The original time series showed an increasing trend and an obvious seasonal distribution. For the training set, the MAE, RMSE, and MAPE of the ARIMA(0,1,1)×(0,1,1)₁₂ model were 338.867, 450.223, and 10.323, respectively, and those of the XGBoost model were 189.332, 262.458, and 4.475, respectively. For the test set, the MAE, RMSE, and MAPE of the ARIMA(0,1,1)×(0,1,1)₁₂ model were 529.406, 586.059, and 17.676, respectively, and those of the XGBoost model were 249.307, 280.645, and 7.643, respectively. CONCLUSIONS The XGBoost model performed better than the ARIMA model and is more suitable for predicting cases of human brucellosis in mainland China.
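The abstract fixes a chronological split: monthly data from January 2008 to June 2018 for training and July 2018 to June 2019 for testing. The sketch below reproduces that split over month start dates (126 training months, 12 test months). It also shows a lag-feature transform, a common way to present a series to a tree model such as XGBoost; the abstract does not describe how the authors built their XGBoost inputs, so `lag_features` and its `n_lags` parameter are hypothetical.

```python
from datetime import date
from typing import List, Sequence, Tuple

def month_starts(start: date, end: date) -> List[date]:
    """First day of every month from start to end, inclusive."""
    months = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        months.append(date(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return months

def chronological_split(months: Sequence[date], cutoff: date) -> Tuple[List[date], List[date]]:
    """Months before the cutoff train the model; the cutoff month onward is held out."""
    train = [d for d in months if d < cutoff]
    test = [d for d in months if d >= cutoff]
    return train, test

def lag_features(series: Sequence[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: each row holds the previous n_lags values; the target is the next value."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y

months = month_starts(date(2008, 1, 1), date(2019, 6, 1))
train, test = chronological_split(months, cutoff=date(2018, 7, 1))
print(len(train), len(test))  # 126 12
```

Splitting by date rather than at random keeps every test month strictly after the training period, which is what a forecasting comparison between ARIMA and XGBoost requires. With monthly data and a 12-month seasonal pattern, 12 lags would be a natural (assumed) starting choice.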
Affiliation(s)
- Mirxat Alim
- Department of Epidemiology, China Medical University, Shenyang, China
- Guo-Hua Ye
- Department of Epidemiology, China Medical University, Shenyang, China
- Peng Guan
- Department of Epidemiology, China Medical University, Shenyang, China
- De-Sheng Huang
- Department of Mathematics, China Medical University, Shenyang, China
- Bao-Sen Zhou
- Department of Epidemiology, China Medical University, Shenyang, China
- Wei Wu
- Department of Epidemiology, China Medical University, Shenyang, China
7. Gu J, Zhang Z, Lang T, Ma X, Yang L, Xu J, Tian C, Han K, Qiu J. PTPRU, As A Tumor Suppressor, Inhibits Cancer Stemness By Attenuating Hippo/YAP Signaling Pathway. Onco Targets Ther 2019; 12:8095-8104. [PMID: 31632062 PMCID: PMC6782031 DOI: 10.2147/ott.s218125] [Received: 06/03/2019] [Accepted: 09/13/2019] [Indexed: 12/11/2022]
Abstract
Background PTPRU is an important signaling molecule that regulates a variety of cellular processes; however, its role in cancer development has remained elusive. Here, we report that PTPRU serves as a tumor suppressor that inhibits cancer stemness by attenuating the Hippo/YAP signaling pathway. Methods Primary cancer cells and cell lines were used in the study. Gene expression data were downloaded from the R2 analysis and visualization platform, and Kaplan–Meier analysis was performed to study the relationship between survival and PTPRU expression. qRT-PCR and Western blot were used to study the expression of target genes in tissues and cells. Sphere and colony formation, proliferation, migration, and the expression of stem cell and EMT markers were used to characterize stemness. Gene manipulation was achieved with a lentivirus-mediated gene delivery system. A luciferase reporter gene assay was used to study the transcriptional activity of the promoter, and ChIP-qPCR was used to identify the target binding sequence of the protein. Spearman correlation analysis was performed to study the correlation between two genes. Student’s t-test was used to determine significance between two experimental groups. Results PTPRU is downregulated in colorectal and gastric cancer tissues and in cancer stem cells. High expression of PTPRU predicts poor prognosis. Overexpression of PTPRU attenuates the stemness of gastric cancer stem cells, and knockdown of PTPRU improves the maintenance of cancer stem cell stemness. Mechanistic analysis showed that PTPRU inhibits Hippo/YAP signaling by suppressing YAP expression at the transcriptional level. Overexpression of YAP restored the stemness of gastric cancer stem cells that had been inhibited by PTPRU. Conclusion PTPRU serves as a tumor suppressor that inhibits the stemness of cancer stem cells by inhibiting the Hippo/YAP signaling pathway.
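The abstract uses Spearman correlation to relate the expression of two genes. As a generic illustration of that statistic, not the authors' analysis code, the sketch below computes Spearman's rho as the Pearson correlation of the two rank vectors, giving tied values the average of the ranks they occupy. The function names are hypothetical.

```python
import math
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)

print(average_ranks([3, 1, 3]))  # [2.5, 1.0, 2.5]
```

Because only ranks enter the calculation, any strictly increasing relationship between two expression profiles gives rho of 1 (up to floating-point rounding), even when the relationship is far from linear.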
Affiliation(s)
- Jiayi Gu
- Department of Gastrointestinal Surgery, Renji Hospital Shanghai Jiao Tong University School of Medicine, Shanghai 200127, People's Republic of China
- Zhiqi Zhang
- Department of General Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, People's Republic of China
- Tingyuan Lang
- Chongqing Key Laboratory of Translational Research for Cancer Metastasis and Individualized Treatment, Chongqing University Cancer Hospital and Chongqing Cancer Institute and Chongqing Cancer Hospital, Chongqing 400030, People's Republic of China
- Xinlin Ma
- Department of Gastrointestinal Surgery, Renji Hospital Shanghai Jiao Tong University School of Medicine, Shanghai 200127, People's Republic of China
- Linxi Yang
- Department of Gastrointestinal Surgery, Renji Hospital Shanghai Jiao Tong University School of Medicine, Shanghai 200127, People's Republic of China
- Jia Xu
- Department of Gastrointestinal Surgery, Renji Hospital Shanghai Jiao Tong University School of Medicine, Shanghai 200127, People's Republic of China
- Cong Tian
- Department of Medical Oncology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, People's Republic of China
- Kun Han
- Department of Medical Oncology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, People's Republic of China
- Jiangfeng Qiu
- Department of Gastrointestinal Surgery, Renji Hospital Shanghai Jiao Tong University School of Medicine, Shanghai 200127, People's Republic of China
8. Wang L, Zhang H, Lei D. microRNA-146a Promotes Growth of Acute Leukemia Cells by Downregulating Ciliary Neurotrophic Factor Receptor and Activating JAK2/STAT3 Signaling. Yonsei Med J 2019; 60:924-934. [PMID: 31538427 PMCID: PMC6753346 DOI: 10.3349/ymj.2019.60.10.924] [Received: 05/07/2019] [Revised: 07/17/2019] [Accepted: 07/30/2019] [Indexed: 02/07/2023]
Abstract
PURPOSE Acute leukemia (AL) is classified as acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML). This study aimed to investigate the effect of miR-146a on childhood AL and its underlying molecular mechanisms. MATERIALS AND METHODS Bone marrow samples were obtained from 39 children with AL and 10 non-cancer controls. Expression of miR-146a and ciliary neurotrophic factor receptor (CNTFR) was measured by quantitative real-time polymerase chain reaction (qRT-PCR) in pediatric ALL and AML patients, as well as in ALL (Jurkat) and AML (HL-60) cells. Correlations between miR-146a and clinical indicators were explored. The targeting relationship between miR-146a and CNTFR was assessed by a dual luciferase reporter gene assay. In Jurkat and HL-60 cells, proliferation was measured by MTT assay, apoptosis by flow cytometry, and migration and invasion by transwell assay. LIF expression was detected by qRT-PCR in Jurkat and HL-60 cells. Expression of p-JAK2, JAK2, p-STAT3, and STAT3 in HL-60 cells was measured by Western blot. RESULTS miR-146a was increased in pediatric ALL and AML patients, while CNTFR was decreased. miR-146a expression was associated with immunophenotype, karyotype, fusion gene, and SIL-TAL1. CNTFR was a target gene of miR-146a. By downregulating CNTFR, miR-146a could promote cell proliferation, migration, and invasion and inhibit apoptosis in Jurkat and HL-60 cells. Meanwhile, miR-146a inhibited LIF expression and activated the JAK2/STAT3 pathway by downregulating CNTFR. CONCLUSION miR-146a could promote the proliferation, migration, and invasion and inhibit the apoptosis of AL Jurkat and HL-60 cells by downregulating CNTFR and activating the JAK2/STAT3 pathway.
Affiliation(s)
- Lei Wang
- Department of Pediatrics, Jinan Second Maternal and Child Health Hospital, Jinan, Shandong, China
- Hongyan Zhang
- Department of Pediatrics, Jinan First People's Hospital, Jinan, Shandong, China
- Donghong Lei
- Department of Pediatrics II, Yulin First Hospital, Suide, Shaanxi, China