1
Zhang Z, Lan H, Zhao S. Analysis of the Value of Quantitative Features in Multimodal MRI Images to Construct a Radio-Omics Model for Breast Cancer Diagnosis. BREAST CANCER (DOVE MEDICAL PRESS) 2024; 16:305-318. [PMID: 38895649 PMCID: PMC11182731 DOI: 10.2147/bctt.s458036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024]
Abstract
Objective: To analyze the diagnostic value of quantitative features in multimodal magnetic resonance imaging (MRI) images for constructing a radio-omics model for breast cancer. Methods: Ninety-five patients with breast-related diseases seen from January 2020 to January 2021 were divided into a benign group (n=57) and a malignant group (n=38) according to the pathological findings. All cases were assigned to a training group (n=66) or a validation group (n=29) in a 7:3 ratio based on examination time. All subjects underwent multimodal MRI comprising T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), dynamic contrast enhancement (DCE), and apparent diffusion coefficient (ADC). The MRI findings were compared against the pathological findings. A diagnostic breast cancer radiomics model was constructed, and its diagnostic efficacy in the validation group was analyzed using the receiver operating characteristic (ROC) curve. Results: Fibroadenoma accounted for 49.12% of benign breast diseases, and invasive ductal carcinoma accounted for 73.68% of malignant breast diseases. Using the four-fold table method, the sensitivity of T1WI, T2WI, DWI, ADC, and DCE in diagnosing breast cancer was 61.14%, 66.67%, 73.30%, 78.95%, and 85.96%, respectively. The areas under the curve (AUCs) of T1WI, T2WI, DWI, ADC, and DCE for diagnosing breast cancer were 0.715, 0.769, 0.785, 0.835, and 0.792, respectively. The AUCs of plain scan, diffuse, enhanced, plain scan + diffuse, plain scan + enhanced, enhanced + diffuse, and plain scan + enhanced + diffuse for diagnosing breast cancer were 0.746, 0.798, 0.816, 0.839, 0.890, 0.906, and 0.927, respectively. Conclusion: Constructing a radio-omics model from quantitative features in multimodal MRI images was valuable in the diagnosis of breast cancer. The plain scan + enhanced + diffuse radio-omics model had higher diagnostic value than the other models and could be widely applied in clinical practice.
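The four-fold (2x2) table method named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the class name `FourfoldTable` and all counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourfoldTable:
    """2x2 table of imaging result versus pathology (the reference standard)."""

    tp: int  # imaging positive, pathology malignant
    fn: int  # imaging negative, pathology malignant
    fp: int  # imaging positive, pathology benign
    tn: int  # imaging negative, pathology benign

    def sensitivity(self) -> float:
        # Share of malignant cases that the test flags as positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of benign cases that the test flags as negative.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total


# Hypothetical counts for illustration only; not data from the paper.
table = FourfoldTable(tp=8, fn=2, fp=3, tn=7)
print(f"sensitivity = {table.sensitivity():.2%}")
print(f"specificity = {table.specificity():.2%}")
print(f"accuracy    = {table.accuracy():.2%}")
```

Sensitivity depends only on the malignant row of the table, so it should be read alongside specificity or the AUC rather than on its own.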
Affiliation(s)
- Zhitao Zhang
- Department of Galactophore, Fujian Maternity and Child Health Hospital, Fuzhou, Fujian Province, 350001, People’s Republic of China
- Huan Lan
- Department of Galactophore, Fujian Maternity and Child Health Hospital, Fuzhou, Fujian Province, 350001, People’s Republic of China
- Shuai Zhao
- Department of Galactophore, Fujian Maternity and Child Health Hospital, Fuzhou, Fujian Province, 350001, People’s Republic of China
2
Patil AR, Schug J, Liu C, Lahori D, Descamps HC, Naji A, Kaestner KH, Faryabi RB, Vahedi G. Modeling type 1 diabetes progression using machine learning and single-cell transcriptomic measurements in human islets. Cell Rep Med 2024; 5:101535. [PMID: 38677282 PMCID: PMC11148720 DOI: 10.1016/j.xcrm.2024.101535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 01/22/2024] [Accepted: 04/07/2024] [Indexed: 04/29/2024]
Abstract
Type 1 diabetes (T1D) is a chronic condition in which beta cells are destroyed by immune cells. Despite progress in immunotherapies that could delay T1D onset, early detection of autoimmunity remains challenging. Here, we evaluate the utility of machine learning for early prediction of T1D using single-cell analysis of islets. Using gradient-boosting algorithms, we model changes in gene expression of single cells from pancreatic tissues in T1D and non-diabetic organ donors. We assess if mathematical modeling could predict the likelihood of T1D development in non-diabetic autoantibody-positive donors. While most autoantibody-positive donors are predicted to be non-diabetic, select donors with unique gene signatures are classified as T1D. Our strategy also reveals a shared gene signature in distinct T1D-associated models across cell types, suggesting a common effect of the disease on transcriptional outputs of these cells. Our study establishes a precedent for using machine learning in early detection of T1D.
Affiliation(s)
- Abhijeet R Patil
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Immunology and Immune Health, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Jonathan Schug
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Chengyang Liu
- Institute for Immunology and Immune Health, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Deeksha Lahori
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Hélène C Descamps
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Ali Naji
- Institute for Immunology and Immune Health, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Klaus H Kaestner
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Robert B Faryabi
- Institute for Immunology and Immune Health, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Abramson Family Cancer Research Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Golnaz Vahedi
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Immunology and Immune Health, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA; Abramson Family Cancer Research Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
3
Su L, Hounye AH, Pan Q, Miao K, Wang J, Hou M, Xiong L. Explainable cancer factors discovery: Shapley additive explanation for machine learning models demonstrates the best practices in the case of pancreatic cancer. Pancreatology 2024; 24:404-423. [PMID: 38342661 DOI: 10.1016/j.pan.2024.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 01/07/2024] [Accepted: 02/05/2024] [Indexed: 02/13/2024]
Abstract
Pancreatic cancer is a digestive tract cancer with a high mortality rate. Despite the wide range of available treatments and improvements in surgery, chemotherapy, and radiation therapy, the five-year prognosis for individuals diagnosed with pancreatic cancer remains poor. Whether immunotherapy can be used to treat pancreatic cancer still requires research. The goals of our research were to understand the tumor microenvironment of pancreatic cancer, find a useful biomarker to assess patient prognosis, and investigate its biological relevance. In this paper, machine learning methods such as random forest were combined with weighted gene co-expression networks to screen hub immune-related genes (hub-IRGs). A LASSO regression model was used for further screening, yielding eight hub-IRGs. Based on these hub-IRGs, we created a prognostic risk prediction model for PAAD that can stratify patients accurately and produce a prognostic risk score (IRG_Score) for each patient. In the raw data set and the validation data set, the five-year area under the curve (AUC) for this model was 0.9 and 0.7, respectively. Shapley additive explanation (SHAP) was used to rank the importance of the factors influencing prognostic risk prediction from a machine learning perspective and to identify the most influential genes or clinical factors. The five most important factors were TRIM67, CORT, PSPN, SCAMP5, and RFXAP, all of which are genes. In summary, the eight hub-IRGs showed accurate risk prediction performance and biological significance, which was validated in other cancers. The SHAP results helped to clarify the molecular mechanism of pancreatic cancer.
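The idea behind the Shapley additive explanation step can be illustrated with an exact, brute-force Shapley computation on a toy model. This is a sketch under stated assumptions: it does not use the SHAP library, the function `toy_risk_score` and its three inputs are hypothetical stand-ins for a fitted risk model, and "absent" features are simply replaced by a baseline value. Enumerating all orderings is only feasible for a handful of features.

```python
from itertools import permutations
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Each feature's value is its marginal contribution averaged over
    every order in which features can be switched from baseline to x.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]  # reveal feature i
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


def toy_risk_score(features: Sequence[float]) -> float:
    # Hypothetical model with one interaction term (a * c).
    a, b, c = features
    return 2.0 * a + 1.0 * b + 0.5 * a * c


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk_score, x, baseline)
print(phi)
print(sum(phi), toy_risk_score(x) - toy_risk_score(baseline))
```

The attributions sum to the difference between the prediction and the baseline prediction, which is the additivity property that makes a per-factor importance ranking meaningful.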
Affiliation(s)
- Liuyan Su
- School of Mathematics and Statistics, Central South University, Changsha, 410083, China
- Qi Pan
- School of Mathematics and Statistics, Central South University, Changsha, 410083, China
- Kexin Miao
- School of Mathematics and Statistics, Central South University, Changsha, 410083, China
- Jiaoju Wang
- School of Mathematics and Statistics, Central South University, Changsha, 410083, China
- Muzhou Hou
- School of Mathematics and Statistics, Central South University, Changsha, 410083, China.
- Li Xiong
- Department of General Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410011, China; Hunan Clinical Research Center for Intelligent General Surgery, Changsha, 410011, China.
4
Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, Buchstab O, Alber M, Ruff L, Montavon G, Müller KR. Toward Explainable Artificial Intelligence for Precision Pathology. ANNUAL REVIEW OF PATHOLOGY 2024; 19:541-570. [PMID: 37871132 DOI: 10.1146/annurev-pathmechdis-051222-113147] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 10/25/2023]
Abstract
The rapid development of precision medicine in recent years has started to challenge diagnostic pathology with respect to its ability to analyze histological images and increasingly large molecular profiling data in a quantitative, integrative, and standardized way. Artificial intelligence (AI) and, more precisely, deep learning technologies have recently demonstrated the potential to facilitate complex data analysis tasks, including clinical, histological, and molecular data for disease classification; tissue biomarker quantification; and clinical outcome prediction. This review provides a general introduction to AI and describes recent developments with a focus on applications in diagnostic pathology and beyond. We explain limitations including the black-box character of conventional AI and describe solutions to make machine learning decisions more transparent with so-called explainable AI. The purpose of the review is to foster a mutual understanding of both the biomedical and the AI side. To that end, in addition to providing an overview of the relevant foundations in pathology and machine learning, we present worked-through examples for a better practical understanding of what AI can achieve and how it should be done.
Affiliation(s)
- Frederick Klauschen
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany;
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Jonas Dippel
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany;
- Philipp Keyl
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany;
- Philipp Jurmeister
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany;
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Michael Bockmayr
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Andreas Mock
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany;
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Oliver Buchstab
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany;
- Maximilian Alber
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Aignostics, Berlin, Germany
- Grégoire Montavon
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany;
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Klaus-Robert Müller
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany;
- Department of Artificial Intelligence, Korea University, Seoul, Korea
- Max Planck Institute for Informatics, Saarbrücken, Germany
5
Teng X, Wang Z. Online COVID-19 diagnosis prediction using complete blood count: an innovative tool for public health. BMC Public Health 2023; 23:2536. [PMID: 38114942 PMCID: PMC10729447 DOI: 10.1186/s12889-023-17477-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 12/13/2023] [Indexed: 12/21/2023]
Abstract
BACKGROUND COVID-19, caused by SARS-CoV-2, presents distinct diagnostic challenges due to its wide range of clinical manifestations and the overlap of its symptoms with other common respiratory diseases. This study addresses these difficulties by employing machine learning (ML) methodologies, particularly the XGBoost algorithm, to use Complete Blood Count (CBC) parameters for predictive analysis. METHODS We performed a retrospective study involving 2114 COVID-19 patients treated between December 2022 and January 2023 at our healthcare facility. These patients were classified into a fever group (1057 patients) and a pneumonia group (1057 patients) based on their clinical symptoms. The CBC data were used to create predictive models, with model performance evaluated through metrics such as Area Under the Receiver Operating Characteristic Curve (AUC), accuracy, sensitivity, specificity, and precision. We selected the top 10 predictive variables based on their significance in disease prediction. The data were then split into a training set (70% of patients) and a validation set (30% of patients) for model validation. RESULTS We identified 31 indicators with significant disparities. The XGBoost model outperformed others, with an AUC of 0.920 and high precision, sensitivity, specificity, and accuracy. The top 10 features (Age, Monocyte%, Mean Platelet Volume, Lymphocyte%, SIRI, Eosinophil count, Platelet count, Hemoglobin, Platelet Distribution Width, and Neutrophil count) were crucial in constructing a more precise predictive model. The model demonstrated strong performance on both training (AUC = 0.977) and validation (AUC = 0.912) datasets, validated by decision curve analysis and calibration curve. CONCLUSION ML models that incorporate CBC parameters offer an innovative and effective tool for data analysis in COVID-19. They potentially enhance diagnostic accuracy and the efficacy of therapeutic interventions, ultimately contributing to a reduction in the mortality rate of this infectious disease.
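The AUC metric reported in the abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way in plain Python; the function name `roc_auc` and the four example scores are hypothetical and unrelated to the study's data or to any XGBoost output.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and model scores for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases; for cohorts of a few thousand patients a rank-based implementation from an established library would be the usual choice.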
Affiliation(s)
- Xiaojing Teng
- Department of Clinical Laboratory, Affiliated Hangzhou First People's Hospital, Westlake University School of Medicine, Hangzhou, Zhejiang, 310000, China
- Zhiyi Wang
- Department of Clinical Laboratory, Hangzhou Women's Hospital (Hangzhou Maternity and Child Health Care Hospital), No. 369, Kunpeng Road, Shangcheng District Hangzhou, Hangzhou, Zhejiang, 310008, China.
6
Umar H, Aliyu MR, Usman AG, Ghali UM, Abba SI, Ozsahin DU. Prediction of cell migration potential on human breast cancer cells treated with Albizia lebbeck ethanolic extract using extreme machine learning. Sci Rep 2023; 13:22242. [PMID: 38097683 PMCID: PMC10721884 DOI: 10.1038/s41598-023-49363-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023]
Abstract
Cancer is one of the major causes of death in the modern world, and its incidence varies considerably with race, ethnicity, and region. Novel cancer treatments, such as surgery and immunotherapy, are ineffective and expensive. In this situation, ion channels responsible for cell migration have emerged as the most promising targets for cancer treatment. This research presents findings on the organic compounds present in Albizia lebbeck ethanolic extracts (ALEE), as well as their anti-migratory, anti-proliferative, and cytotoxic potential on MDA-MB 231 and MCF-7 human breast cancer cell lines. In addition, artificial intelligence (AI) based models, namely multilayer perceptron (MLP), extreme gradient boosting (XGB), and extreme learning machine (ELM), were applied to predict in vitro cancer cell migration in both cell lines, based on our experimental data. The organic compound composition of the ALEE was studied using gas chromatography-mass spectrometry (GC-MS) analysis. The cytotoxic, anti-proliferative, and anti-migratory activity of the extract was assessed using Trypan Blue, MTT, and wound healing assays, respectively. Among the various concentrations (2.5-200 μg/mL) of the ALEE used in our study, 2.5-10 μg/mL showed anti-migratory potential that increased with concentration, without any effect on cell proliferation (P < 0.05; n ≥ 3). Furthermore, the three data-driven models (MLP, XGB, and ELM) predicted the migration potential of the cells treated with the extract, based on our experimental data. Overall, the concentrations of the plant extract that did not affect the proliferation of the cell types used showed promising effects in reducing cell migration. XGB outperformed the MLP and ELM models, improving on their performance efficiency by up to 3% and 1% for MCF-7 and 1% and 2% for MDA-MB 231, respectively, in the testing phase.
Affiliation(s)
- Huzaifa Umar
- Near East University, Operational Research Centre in Healthcare, TRNC Mersin 10, 99138, Nicosia, Turkey.
- Maryam Rabiu Aliyu
- Department of Energy System Engineering, Cyprus International University, Northern Cyprus via Mersin 10, 99258, Nicosia, Turkey
- Abdullahi Garba Usman
- Near East University, Operational Research Centre in Healthcare, TRNC Mersin 10, 99138, Nicosia, Turkey
- Department of Analytical Chemistry, Faculty of Pharmacy, Near East University, TRNC, Mersin 10, 99138, Nicosia, Turkey
- Umar Muhammad Ghali
- Department of Chemistry, Faculty of Natural and Applied Sciences, Firat University, Merkezi, 23199, Elazig, Turkey
- Sani Isah Abba
- Interdisciplinary Research Centre for Membranes and Water Security, King Fahd University of Petroleum and Minerals, 31261, Dhahran, Saudi Arabia
- Dilber Uzun Ozsahin
- Department of Medical Diagnostic Imaging, College of Health Sciences, University of Sharjah, P.O. Box 27272, Sharjah, United Arab Emirates.
- Research Institute for Medical and Health Sciences, University of Sharjah, P.O. Box 27272, Sharjah, United Arab Emirates.
7
Wang Y, Wei B, Zhao T, Shen H, Liu X, Wang J, Wang Q, Shen R, Feng D. Machine learning-based prediction models for parathyroid carcinoma using pre-surgery cognitive function and clinical features. Sci Rep 2023; 13:19007. [PMID: 37923800 PMCID: PMC10624903 DOI: 10.1038/s41598-023-46294-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Accepted: 10/30/2023] [Indexed: 11/06/2023]
Abstract
Patients with parathyroid carcinoma (PC) are often diagnosed only postoperatively, and incomplete resection during the initial surgery results in poor outcomes. The aim of our study was to investigate pre-surgery indicators of PC and to develop a predictive model for PC using machine learning. Pre-surgery neuropsychological function was evaluated and pathology was confirmed in 133 patients with primary hyperparathyroidism at Beijing Chaoyang Hospital from December 2019 to January 2023. Patients were randomly divided into a training cohort (n = 93) and a validation cohort (n = 40). To analyze the clinical dataset, two machine learning methods, extreme gradient boosting (XGBoost) and least absolute shrinkage and selection operator (LASSO) regression, were used to develop the prediction model for PC. Logistic regression analysis was also conducted for comparison. Parathyroid hormone was significantly elevated and serum phosphorus significantly decreased in PC compared with BP. Lower MMSE and MOCA scores were observed in PC, and a cutoff of MMSE < 24 was the optimal threshold to distinguish PC from BP (area under the curve [AUC] 0.699 vs 0.625). The probability of PC predicted by machine learning was similar to the observed probability in the test set, whereas the logistic model tended to overpredict the probability of PC. The XGBoost model attained a higher AUC than the logistic regression and LASSO models (0.835 vs 0.683 vs 0.607). Preoperative cognitive function may be a predictor of PC. The cognitive function-based prediction model built on the XGBoost algorithm outperformed LASSO and logistic regression, providing valuable preoperative assistance to surgeons in clinical decision-making for patients with suspected PC.
Affiliation(s)
- Yuting Wang
- Department of Thyroid and Neck Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Bojun Wei
- Department of Thyroid and Neck Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China.
- Teng Zhao
- Department of Thyroid and Neck Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Hong Shen
- Department of Thyroid and Neck Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Xing Liu
- Department of Thyroid and Neck Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Jiacheng Wang
- Department of Thyroid and Neck Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Qian Wang
- Department of Thyroid and Neck Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Rongfang Shen
- Department of Thyroid and Neck Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Dalin Feng
- Department of Thyroid and Neck Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
8
Chen G, Dai X, Zhang M, Tian Z, Jin X, Mei K, Huang H, Wu Z. Machine learning-based prediction model and visual interpretation for prostate cancer. BMC Urol 2023; 23:164. [PMID: 37838656 PMCID: PMC10576344 DOI: 10.1186/s12894-023-01316-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 09/03/2023] [Indexed: 10/16/2023]
Abstract
BACKGROUND Most prostate cancers (PCa) are referred for biopsy confirmation on the basis of serum prostate-specific antigen (PSA) testing, but its accuracy needs further improvement. PCa prediction models with high clinical application value are still needed. METHODS Benign prostatic hyperplasia (BPH) and prostate cancer data were obtained from the Chinese National Clinical Medical Science Data Center for retrospective analysis. The model was constructed using the XGBoost algorithm, with patients' age, body mass index (BMI), PSA-related parameters, and serum biochemical parameters as model variables. Decision curve analysis (DCA) was used to evaluate the clinical utility of the models. The Shapley additive explanation (SHAP) framework was used to analyze the importance ranking and risk thresholds of the variables. RESULTS A total of 1915 patients were included in this study, comprising 823 (43.0%) BPH patients and 1092 (57.0%) PCa patients. The XGBoost model performed better (AUC 0.82) than f/tPSA (AUC 0.75), tPSA (AUC 0.68), and fPSA (AUC 0.61). Based on SHAP values, f/tPSA was the most important variable, and the top five most important biochemical parameter variables were inorganic phosphorus (P), potassium (K), creatine kinase MB isoenzyme (CKMB), low-density lipoprotein cholesterol (LDL-C), and creatinine (Cre). PCa risk thresholds for these risk markers were f/tPSA (0.13), P (1.29 mmol/L), K (4.29 mmol/L), CKMB (11.6 U/L), LDL-C (3.05 mmol/L), and Cre (74.5-99.1 μmol/L). CONCLUSION The present model has the advantages of widespread availability and high net benefit, especially for underdeveloped countries and regions. Furthermore, these risk thresholds can assist in the diagnosis and screening of prostate cancer in clinical practice.
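The "net benefit" that a decision curve plots can be sketched with the standard formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. This is a minimal illustration rather than the authors' analysis: the function names and the eight example cases are hypothetical.

```python
from typing import Sequence


def net_benefit(
    labels: Sequence[int], probabilities: Sequence[float], threshold: float
) -> float:
    """Net benefit of acting on every case with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy: act on every case regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes (1 = cancer) and predicted risks, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
risks = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.7]
model_nb = net_benefit(labels, risks, threshold=0.5)
all_nb = net_benefit_treat_all(labels, threshold=0.5)
print(f"model: {model_nb:.3f}, treat all: {all_nb:.3f}")
```

A decision curve repeats this calculation over a range of thresholds and compares the model against the "treat all" and "treat none" (net benefit 0) strategies.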
Affiliation(s)
- Gang Chen
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, 325035, China
- Xuchao Dai
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, 325035, China
- Mengqi Zhang
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, 325035, China
- Zhujun Tian
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, 325035, China
- Xueke Jin
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, 325035, China
- Kun Mei
- School of Environmental Science and Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Hong Huang
- Center for Health Assessment, Wenzhou Medical University, Wenzhou, 325035, China.
- Zhejiang Provincial Key Laboratory of Watershed Sciences and Health, Wenzhou, 325035, China.
- Zhigang Wu
- Department of Urology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325035, China.
- Reproductive Health Research Center, Health Assessment Center of Wenzhou Medical University, Wenzhou, 325000, China.
9
Zhao T, Wu H, Wang X, Zhao Y, Wang L, Pan J, Mei H, Han J, Wang S, Lu K, Li M, Gao M, Cao Z, Zhang H, Wan K, Li J, Fang L, Zhang T, Guan X. Integration of eQTL and machine learning to dissect causal genes with pleiotropic effects in genetic regulation networks of seed cotton yield. Cell Rep 2023; 42:113111. [PMID: 37676770 DOI: 10.1016/j.celrep.2023.113111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 06/19/2023] [Accepted: 08/24/2023] [Indexed: 09/09/2023]
Abstract
The dissection of a gene regulatory network (GRN) that complements the genome-wide association study (GWAS) locus and the crosstalk underlying multiple agronomical traits remains a major challenge. In this study, we generate 558 transcriptional profiles of lint-bearing ovules at one day post-anthesis from a selective core cotton germplasm, from which 12,207 expression quantitative trait loci (eQTLs) are identified. Sixty-six known phenotypic GWAS loci are colocalized with 1,090 eQTLs, forming 38 functional GRNs associated predominantly with seed yield. Of the eGenes, 34 exhibit pleiotropic effects. Combining the eQTLs within the seed yield GRNs significantly increases the portion of narrow-sense heritability. The extreme gradient boosting (XGBoost) machine learning approach is applied to predict seed cotton yield phenotypes on the basis of gene expression. Top-ranking eGenes (NF-YB3, FLA2, and GRDP1) derived with pleiotropic effects on yield traits are validated, along with their potential roles by correlation analysis, domestication selection analysis, and transgenic plants.
Affiliation(s)
- Ting Zhao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Hongyu Wu
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Xutong Wang
- Hubei Hongshan Laboratory, Wuhan 430070, China
- Yongyan Zhao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
| | - Luyao Wang
- Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
| | - Jiaying Pan
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
| | - Huan Mei
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
| | - Jin Han
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
| | - Siyuan Wang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
| | - Kening Lu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
| | - Menglin Li
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
| | - Mengtao Gao
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
| | - Zeyi Cao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
| | - Hailin Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
| | - Ke Wan
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
| | - Jie Li
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
| | - Lei Fang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
| | - Tianzhen Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
| | - Xueying Guan
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China.
|
10
|
Guan SW, Lin Q, Wu XD, Yu HB. Weighted gene coexpression network analysis and machine learning reveal oncogenome associated microbiome plays an important role in tumor immunity and prognosis in pan-cancer. J Transl Med 2023; 21:537. [PMID: 37573394 PMCID: PMC10422781 DOI: 10.1186/s12967-023-04411-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/26/2023] [Accepted: 08/02/2023] [Indexed: 08/14/2023]
Abstract
BACKGROUND For many years, the role of the microbiome in tumor progression, particularly the tumor microbiome, was largely overlooked. The connection between the tumor microbiome and the tumor genome still requires further investigation. METHODS The TCGA microbiome and genome data were obtained from Haziza et al.'s article and the UCSC Xena database, respectively. Separate WGCNA networks were constructed for the tumor microbiome and genomic data after filtering the datasets. Correlation analysis between the microbial and mRNA modules was conducted to identify oncogenome-associated microbiome (OAM) modules, with three microbial modules selected for each tumor type. Reactome analysis was used to enrich biological processes. Machine learning techniques were implemented to explore the tumor type-specific enrichment and prognostic value of OAM, as well as the ability of the tumor microbiome to differentiate TP53 mutations. RESULTS We constructed a total of 182 tumor microbiome and 570 mRNA WGCNA modules. Our results show that there is a correlation between the tumor microbiome and the tumor genome. Gene enrichment analysis results suggest that the genes in the mRNA module with the highest correlation with the tumor microbiome group are mainly enriched in infection, transcriptional regulation by TP53 and antigen presentation. The correlation analysis of OAM with CD8+ T cells or TAM1 cells suggests the existence of many microbiota that may be involved in tumor immune suppression or promotion, such as Williamsia in breast cancer, Biostraticola in stomach cancer, Megasphaera in cervical cancer and Lottiidibacillus in ovarian cancer. In addition, the results show that the microbiome-genome prognostic model has good predictive value for short-term prognosis. The analysis of tumor TP53 mutations shows that the tumor microbiota has a certain ability to distinguish TP53 mutations, with an AUROC value of 0.755. The tumor microbiota with high importance scores are Corallococcus, Bacillus and Saezia. Finally, we identified a potential anti-cancer microbiota, Tissierella, which has been shown to be associated with improved prognosis in tumors including breast cancer, lung adenocarcinoma and gastric cancer. CONCLUSION There is an association between the tumor microbiome and the tumor genome, and the existence of this association is not accidental and could change the landscape of tumor research.
Affiliation(s)
- Shi-Wei Guan
- Department of Hepatobiliary Surgery, Wenzhou Central Hospital, The Dingli Clinical Institute of Wenzhou Medical University, Wenzhou, 325000 Zhejiang People’s Republic of China
| | - Quan Lin
- Department of Hepatobiliary Surgery, Wenzhou Central Hospital, The Dingli Clinical Institute of Wenzhou Medical University, Wenzhou, 325000 Zhejiang People’s Republic of China
| | - Xi-Dong Wu
- Department of Neurosurgery Surgery, Wenzhou Central Hospital, The Dingli Clinical Institute of Wenzhou Medical University, Wenzhou, 325000 Zhejiang People’s Republic of China
| | - Hai-Bo Yu
- Department of Hepatobiliary Surgery, Wenzhou Central Hospital, The Dingli Clinical Institute of Wenzhou Medical University, Wenzhou, 325000 Zhejiang People’s Republic of China
|
11
|
Mirza Z, Ansari MS, Iqbal MS, Ahmad N, Alganmi N, Banjar H, Al-Qahtani MH, Karim S. Identification of Novel Diagnostic and Prognostic Gene Signature Biomarkers for Breast Cancer Using Artificial Intelligence and Machine Learning Assisted Transcriptomics Analysis. Cancers (Basel) 2023; 15:3237. [PMID: 37370847 DOI: 10.3390/cancers15123237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/10/2023] [Accepted: 06/13/2023] [Indexed: 06/29/2023]
Abstract
BACKGROUND Breast cancer (BC) is one of the most common female cancers. Clinical and histopathological information is collectively used for diagnosis, but is often not precise. We applied machine learning (ML) methods to identify the valuable gene signature model based on differentially expressed genes (DEGs) for BC diagnosis and prognosis. METHODS A cohort of 701 samples from 11 GEO BC microarray datasets was used for the identification of significant DEGs. Seven ML methods, including RFECV-LR, RFECV-SVM, LR-L1, SVC-L1, RF, and Extra-Trees were applied for gene reduction and the construction of a diagnostic model for cancer classification. Kaplan-Meier survival analysis was performed for prognostic signature construction. The potential biomarkers were confirmed via qRT-PCR and validated by another set of ML methods including GBDT, XGBoost, AdaBoost, KNN, and MLP. RESULTS We identified 355 DEGs and predicted BC-associated pathways, including kinetochore metaphase signaling, PTEN, senescence, and phagosome-formation pathways. A hub of 28 DEGs and a novel diagnostic nine-gene signature (COL10A, S100P, ADAMTS5, WISP1, COMP, CXCL10, LYVE1, COL11A1, and INHBA) were identified using stringent filter conditions. Similarly, a novel prognostic model consisting of eight-gene signatures (CCNE2, NUSAP1, TPX2, S100P, ITM2A, LIFR, TNXA, and ZBTB16) was also identified using disease-free survival and overall survival analysis. Gene signatures were validated by another set of ML methods. Finally, qRT-PCR results confirmed the expression of the identified gene signatures in BC. CONCLUSION The ML approach helped construct novel diagnostic and prognostic models based on the expression profiling of BC. The identified nine-gene signature and eight-gene signatures showed excellent potential in BC diagnosis and prognosis, respectively.
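The Kaplan-Meier survival analysis mentioned in this abstract can be illustrated with a minimal product-limit estimator. This is only a sketch under stated assumptions: the function name `kaplan_meier` and the follow-up times and event flags are hypothetical illustration values, not data or code from the study.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    `events[i]` is 1 if the event (e.g. relapse or death) was observed at
    `times[i]`, and 0 if the subject was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events observed exactly at time t
        leaving = 0   # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Subjects censored at the same time as an observed event are counted as still at risk for that event, which is the usual convention. A prognostic signature would typically be assessed by comparing such curves between high-risk and low-risk groups.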
Affiliation(s)
- Zeenat Mirza
- King Fahd Medical Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Department of Medical Laboratory Science, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Md Shahid Ansari
- Department of Clinical Data Analytics, Max Super Speciality Hospital, Saket, New Delhi 110017, India
| | - Md Shahid Iqbal
- Department of Statistics and Computer Applications, Tilka Manjhi Bhagalpur University, Bhagalpur 812007, India
| | - Nesar Ahmad
- Department of Statistics and Computer Applications, Tilka Manjhi Bhagalpur University, Bhagalpur 812007, India
| | - Nofe Alganmi
- Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Centre of Artificial Intelligence in Precision Medicines, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Haneen Banjar
- Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Centre of Artificial Intelligence in Precision Medicines, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Mohammed H Al-Qahtani
- Department of Medical Laboratory Science, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Sajjad Karim
- Department of Medical Laboratory Science, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
12
|
Guan X, Du Y, Ma R, Teng N, Ou S, Zhao H, Li X. Construction of the XGBoost model for early lung cancer prediction based on metabolic indices. BMC Med Inform Decis Mak 2023; 23:107. [PMID: 37312179 DOI: 10.1186/s12911-023-02171-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/12/2022] [Accepted: 04/05/2023] [Indexed: 06/15/2023]
Abstract
BACKGROUND Lung cancer is a malignant tumour, and early diagnosis has been shown to improve the survival rate of lung cancer patients. In this study, we assessed the use of plasma metabolites as biomarkers for lung cancer diagnosis. We used a novel interdisciplinary mechanism, applied for the first time to lung cancer, to detect biomarkers for early lung cancer diagnosis by combining metabolomics and machine learning approaches. RESULTS In total, 478 lung cancer patients and 370 subjects with benign lung nodules were enrolled from a hospital in Dalian, Liaoning Province. We selected 47 serum amino acid and carnitine indicators from targeted metabolomics studies using LC‒MS/MS, together with the age and sex of the subjects as demographic indicators. After screening by a stepwise regression algorithm, 16 metrics were included. Among the machine learning algorithms, the XGBoost model showed superior predictive power (AUC = 0.81, accuracy = 75.29%, sensitivity = 74%), with the metabolic biomarkers ornithine and palmitoylcarnitine being potential biomarkers to screen for lung cancer. The machine learning model XGBoost is proposed as a tool for early lung cancer prediction. This study provides strong support for the feasibility of blood-based screening for metabolites and provides a safer, faster and more accurate tool for early diagnosis of lung cancer. CONCLUSIONS This study proposes an interdisciplinary approach combining metabolomics with a machine learning model (XGBoost) for early prediction of lung cancer. The metabolic biomarkers ornithine and palmitoylcarnitine showed significant power for early lung cancer diagnosis.
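Accuracy and sensitivity, as reported in this abstract, are both derived from a confusion matrix. The sketch below shows the arithmetic only; the class `ConfusionCounts` and the counts in `example` are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # cancer cases correctly flagged
    fn: int  # cancer cases missed
    tn: int  # benign cases correctly cleared
    fp: int  # benign cases wrongly flagged


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true cases that the model detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-cases that the model correctly clears."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all subjects classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts chosen only to make the arithmetic easy to follow.
example = ConfusionCounts(tp=74, fn=26, tn=80, fp=20)
print(f"sensitivity={sensitivity(example):.2f}")
print(f"specificity={specificity(example):.2f}")
print(f"accuracy={accuracy(example):.2f}")
```

Sensitivity ignores the benign group entirely, so a model can report high sensitivity alongside modest accuracy when specificity is low.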
Affiliation(s)
- Xiuliang Guan
- School of Public Health, Dalian Medical University, Dalian, 116000, China
| | - Yue Du
- School of Public Health, Dalian Medical University, Dalian, 116000, China
| | - Rufei Ma
- School of Public Health, Dalian Medical University, Dalian, 116000, China
| | - Nan Teng
- School of Public Health, Dalian Medical University, Dalian, 116000, China
| | - Shu Ou
- School of Public Health, Dalian Medical University, Dalian, 116000, China
| | - Hui Zhao
- Department of Health Examination Center, The Second Affiliated Hospital of Dalian Medical University, Dalian, China.
| | - Xiaofeng Li
- School of Public Health, Dalian Medical University, Dalian, 116000, China.
|
13
|
Fonseca-Montaño MA, Vázquez-Santillán KI, Hidalgo-Miranda A. The current advances of lncRNAs in breast cancer immunobiology research. Front Immunol 2023; 14:1194300. [PMID: 37342324 PMCID: PMC10277570 DOI: 10.3389/fimmu.2023.1194300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2023] [Accepted: 05/24/2023] [Indexed: 06/22/2023]
Abstract
Breast cancer is the most frequently diagnosed malignancy and the leading cause of cancer-related death in women worldwide. Breast cancer development and progression are mainly associated with tumor-intrinsic alterations in diverse genes and signaling pathways and with tumor-extrinsic dysregulations linked to the tumor immune microenvironment. Significantly, abnormal expression of lncRNAs affects the tumor immune microenvironment characteristics and modulates the behavior of different cancer types, including breast cancer. In this review, we provide the current advances about the role of lncRNAs as tumor-intrinsic and tumor-extrinsic modulators of the antitumoral immune response and the immune microenvironment in breast cancer, as well as lncRNAs which are potential biomarkers of tumor immune microenvironment and clinicopathological characteristics in patients, suggesting that lncRNAs are potential targets for immunotherapy in breast cancer.
Affiliation(s)
- Marco Antonio Fonseca-Montaño
- Laboratorio de Genómica del Cáncer, Instituto Nacional de Medicina Genómica (INMEGEN), Mexico City, Mexico
- Programa de Doctorado, Posgrado en Ciencias Biológicas, Unidad de Posgrado, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | | | - Alfredo Hidalgo-Miranda
- Laboratorio de Genómica del Cáncer, Instituto Nacional de Medicina Genómica (INMEGEN), Mexico City, Mexico
|
14
|
Applying Explainable Machine Learning Models for Detection of Breast Cancer Lymph Node Metastasis in Patients Eligible for Neoadjuvant Treatment. Cancers (Basel) 2023; 15:cancers15030634. [PMID: 36765592 PMCID: PMC9913601 DOI: 10.3390/cancers15030634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 01/16/2023] [Accepted: 01/17/2023] [Indexed: 01/22/2023]
Abstract
BACKGROUND Due to recent changes in breast cancer treatment strategy, significantly more patients are treated with neoadjuvant systemic therapy (NST). Radiological methods do not precisely determine axillary lymph node status, with up to 30% of patients being misdiagnosed. Hence, supplementary methods for lymph node status assessment are needed. This study aimed to apply and evaluate machine learning models on clinicopathological data, with a focus on patients meeting NST criteria, for lymph node metastasis prediction. METHODS From the total breast cancer patient data (n = 8381), 719 patients were identified as eligible for NST. Machine learning models were applied for the NST-criteria group and the total study population. Model explainability was obtained by calculating Shapley values. RESULTS In the NST-criteria group, random forest achieved the highest performance (AUC: 0.793 [0.713, 0.865]), while in the total study population, XGBoost performed the best (AUC: 0.762 [0.726, 0.795]). Shapley values identified tumor size, Ki-67, and patient age as the most important predictors. CONCLUSION Tree-based models achieve a good performance in assessing lymph node status. Such models can lead to more accurate disease stage prediction and consecutively better treatment selection, especially for NST patients where radiological and clinical findings are often the only way of lymph node assessment.
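The Shapley values used here for explainability can be illustrated by computing them exactly for a tiny model. The sketch below is a toy under stated assumptions: `toy_value` is a hypothetical coalition value function with made-up contributions, and the feature names are borrowed from the abstract only as labels. Practical tools approximate these values for real models rather than enumerating every subset.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    players: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical model output when only `coalition` features are known."""
    base = {"tumor_size": 0.30, "ki67": 0.20, "age": 0.10}
    score = sum(base[f] for f in coalition)
    # Hypothetical interaction between two of the features.
    if "tumor_size" in coalition and "ki67" in coalition:
        score += 0.10
    return score


features = ["tumor_size", "ki67", "age"]
phi = shapley_values(features, toy_value)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

The interaction term is split equally between the two features that share it, and the contributions sum to the value of the full feature set, which is the efficiency property that makes Shapley values attractive for attribution.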
|
15
|
Yuan L, Ji M, Wang S, Wen X, Huang P, Shen L, Xu J. Machine learning model identifies aggressive acute pancreatitis within 48 h of admission: a large retrospective study. BMC Med Inform Decis Mak 2022; 22:312. [PMID: 36447180 PMCID: PMC9707001 DOI: 10.1186/s12911-022-02066-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 11/23/2022] [Indexed: 12/05/2022]
Abstract
BACKGROUND Acute pancreatitis (AP) with critical illness is linked to increased morbidity and mortality. Current risk scores to identify high-risk AP patients have certain limitations. OBJECTIVE To develop and validate a machine learning tool within 48 h after admission for predicting which patients with AP will develop critical illness based on ubiquitously available clinical, laboratory, and radiologic variables. METHODS A total of 5460 AP patients were enrolled. Clinical, laboratory, and imaging variables were collected within 48 h after hospital admission. The least absolute shrinkage and selection operator (LASSO) with the bootstrap method was employed to select the most informative variables. Five different machine learning models were constructed to predict the likelihood of critical illness, and the optimal model (APCU) was selected. An external cohort was used to validate APCU. APCU and other risk scores were compared using multivariate analysis. Models were evaluated by area under the curve (AUC). Decision curve analysis was employed to evaluate the standardized net benefit. RESULTS XGBoost was constructed and selected as APCU, involving age, comorbid disease, mental status, pulmonary infiltrates, procalcitonin (PCT), neutrophil percentage (Neu%), ALT/AST, ratio of albumin and globulin, cholinesterase, Urea, Glu, AST and serum total cholesterol. The APCU performed excellently in discriminating AP risk in the internal cohort (AUC = 0.95) and the external cohort (AUC = 0.873). The APCU was significant for biliogenic AP (OR = 4.25 [2.08-8.72], P < 0.001), alcoholic AP (OR = 3.60 [1.67-7.72], P = 0.001), hyperlipidemic AP (OR = 2.63 [1.28-5.37], P = 0.008) and tumor AP (OR = 4.57 [2.14-9.72], P < 0.001). APCU yielded the highest clinical net benefit among the compared scores. CONCLUSION A machine learning tool based on ubiquitously available clinical variables accurately predicts the development of AP, optimizing the management of AP.
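The AUC used here to compare models equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function `roc_auc` and the labels and scores are hypothetical, not the study's predictions.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks: 1 = developed critical illness, 0 = did not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based formulation gives the same value more efficiently on large cohorts.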
Affiliation(s)
- Lei Yuan
- School of Automation, Nanjing University of Information Science and Technology, Nanjing, China; Department of Information Center, Wuhan University Renmin Hospital, Wuhan, Hubei, China; Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
| | - Mengyao Ji
- Department of Gastroenterology, Wuhan University Renmin Hospital, Wuhan, Hubei, China
| | - Shuo Wang
- Department of Gastroenterology, Wuhan University Renmin Hospital, Wuhan, Hubei, China
| | - Xinyu Wen
- Department of Gastroenterology, Wuhan University Renmin Hospital, Wuhan, Hubei, China
| | - Pingxiao Huang
- Department of Gastroenterology, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Lei Shen
- Department of Gastroenterology, Wuhan University Renmin Hospital, Wuhan, Hubei, China
| | - Jun Xu
- School of Automation, Nanjing University of Information Science and Technology, Nanjing, China; Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
|
16
|
Li Q, Wang P, Yuan J, Zhou Y, Mei Y, Ye M. A two-stage hybrid gene selection algorithm combined with machine learning models to predict the rupture status in intracranial aneurysms. Front Neurosci 2022; 16:1034971. [PMID: 36340761 PMCID: PMC9631203 DOI: 10.3389/fnins.2022.1034971] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/02/2022] [Accepted: 09/30/2022] [Indexed: 07/31/2023]
Abstract
An intracranial aneurysm (IA) is an abnormal swelling of a cerebral vessel, and a subset of IAs can rupture, causing aneurysmal subarachnoid hemorrhage (aSAH), which often results in death or severe disability. Few studies have combined an appropriate feature selection method with machine learning to analyze transcriptomic sequencing data and identify new molecular biomarkers. Following gene ontology (GO) and enrichment analysis of all 913 differentially expressed genes, we found that the distinct status of IAs could lead to differential innate immune responses. Because many of these genes are irrelevant or redundant, we propose a mixed filter- and wrapper-based feature selection. First, we used the Fast Correlation-Based Filter (FCBF) algorithm to filter out a large number of irrelevant and redundant genes in the raw dataset. We then applied a wrapper feature selection method based on the Multi-layer Perceptron (MLP) neural network and Particle Swarm Optimization (PSO), with accuracy (ACC) and mean square error (MSE) as the evaluation criteria. Finally, we constructed a novel 10-gene signature (YIPF1, RAB32, WDR62, ANPEP, LRRCC1, AADAC, GZMK, WBP2NL, PBX1, and TOR1B) by the proposed two-stage hybrid algorithm FCBF-MLP-PSO and used different machine learning models to predict the rupture status in IAs. The highest ACC value increased from 0.817 to 0.919 (12.5% increase), the highest area under the ROC curve (AUC) value increased from 0.87 to 0.94 (8.0% increase), and all evaluation metrics improved by approximately 10% after being processed by our proposed gene selection algorithm. Therefore, these 10 informative genes can complement imaging examinations in the clinic when predicting the rupture status of IAs, and the selected gene signature also provides new targets and approaches for the treatment of ruptured IAs.
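The percentage increases quoted in this abstract are relative changes, not differences in percentage points. The short check below reproduces them from the reported before and after values; the helper name `relative_increase` is introduced here only for illustration.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Values reported in the abstract for the best model before and after gene selection.
acc_gain = relative_increase(0.817, 0.919)
auc_gain = relative_increase(0.87, 0.94)
print(f"ACC: {acc_gain:.1f}%  AUC: {auc_gain:.1f}%")  # ACC: 12.5%  AUC: 8.0%
```

In absolute terms the same changes are 10.2 percentage points for ACC and 7 for AUC, so it matters which convention a reported improvement uses.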
Affiliation(s)
- Qingqing Li
- School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
- Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
| | - Peipei Wang
- School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
- Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
| | - Jinlong Yuan
- Department of Neurosurgery, Yijishan Hospital of Wannan Medical College, Wannan Medical College, Wuhu, Anhui, China
| | - Yunfeng Zhou
- Department of Radiology, Yijishan Hospital of Wannan Medical College, Wannan Medical College, Wuhu, Anhui, China
| | - Yaxin Mei
- School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
- Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
| | - Mingquan Ye
- School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
- Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
|
17
|
Sun CK, Tang YX, Liu TC, Lu CJ. An Integrated Machine Learning Scheme for Predicting Mammographic Anomalies in High-Risk Individuals Using Questionnaire-Based Predictors. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph19159756. [PMID: 35955112 PMCID: PMC9368335 DOI: 10.3390/ijerph19159756] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Revised: 08/02/2022] [Accepted: 08/06/2022] [Indexed: 05/09/2023]
Abstract
This study aimed to identify important predictors of positive mammographic findings from questionnaire-based demographic and obstetric/gynecological parameters using the proposed integrated machine learning (ML) scheme. The scheme combines the benefits of two well-known ML algorithms, namely, least absolute shrinkage and selection operator (Lasso) logistic regression and extreme gradient boosting (XGB), to provide adequate prediction of mammographic anomalies in high-risk individuals and to identify significant risk factors. We collected questionnaire data on 18 breast-cancer-related risk factors from women who participated in a national mammographic screening program between January 2017 and December 2020 at a single tertiary referral hospital to correlate with their mammographic findings. The acquired data were retrospectively analyzed using the proposed integrated ML scheme. Based on the data from 21,107 valid questionnaires, the results showed that the Lasso logistic regression models with variable combinations generated by XGB could provide more effective prediction results. The top five significant predictors for positive mammography results were younger age, breast self-examination, older age at first childbirth, nulliparity, and history of mammography within 2 years, suggesting a need for timely mammographic screening for women with these risk factors.
Affiliation(s)
- Cheuk-Kay Sun
- Division of Hepatology and Gastroenterology, Department of Internal Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei 11101, Taiwan
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- School of Medicine, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- School of Medicine, Taipei Medical University, Taipei 11031, Taiwan
| | - Yun-Xuan Tang
- Department of Radiology, Shin Kong Wu Ho-Su Memorial Hospital, Taipei 11101, Taiwan
- Department of Medical Imaging and Radiological Technology, Yuanpei University of Medical Technology, Hsinchu 30015, Taiwan
| | - Tzu-Chi Liu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 24205, Taiwan
| | - Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City 24205, Taiwan
|
18
|
Mammographic Classification of Breast Cancer Microcalcifications through Extreme Gradient Boosting. ELECTRONICS 2022. [DOI: 10.3390/electronics11152435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
In this paper, we proposed an effective and efficient approach to the classification of breast cancer microcalcifications and evaluated the mathematical model for calcification on mammography with a large medical dataset. We employed several semi-automatic segmentation algorithms to extract 51 calcification features from mammograms, including morphologic and textural features. We adopted extreme gradient boosting (XGBoost) to classify microcalcifications. Then, we compared other machine learning techniques, including k-nearest neighbor (kNN), adaboostM1, decision tree, random decision forest (RDF), and gradient boosting decision tree (GBDT), with XGBoost. XGBoost showed the highest accuracy (90.24%) for classifying microcalcifications, and kNN demonstrated the lowest accuracy. This result demonstrates that it is essential for the classification of microcalcification to use the feature engineering method for the selection of the best composition of features. One of the contributions of this study is to present the best composition of features for efficient classification of breast cancers. This paper finds a way to select the best discriminative features as a collection to improve the accuracy. This study showed the highest accuracy (90.24%) for classifying microcalcifications with AUC = 0.89. Moreover, we highlighted the performance of various features from the dataset and found ideal parameters for classifying microcalcifications. Furthermore, we found that the XGBoost model is suitable both in theory and practice for the classification of calcifications on mammography.
|