1
Jiang Z, Yu Y, Yu X, Huang M, Wang Q, Huang K, Song C. Predictive model of prognosis index for invasive micropapillary carcinoma of the breast based on machine learning: a SEER population-based study. BMC Med Inform Decis Mak 2024; 24:268. [PMID: 39334146 PMCID: PMC11428430 DOI: 10.1186/s12911-024-02669-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 09/06/2024] [Indexed: 09/30/2024] Open
Abstract
BACKGROUND Invasive micropapillary carcinoma (IMPC) is a rare subtype of breast cancer. Its epidemiological features, treatment principles, and prognostic factors remain controversial. OBJECTIVE This study aimed to develop an improved machine learning-based model to predict the prognosis of patients with IMPC. METHODS A total of 1123 patients diagnosed with IMPC after surgery between 1998 and 2019 were identified from the Surveillance, Epidemiology, and End Results (SEER) database for survival analysis. Univariate and multivariate analyses were performed to identify independent prognostic factors for overall and disease-specific survival in patients with IMPC. Five machine learning algorithms were developed to predict the 5-year survival of these patients. RESULTS Cox regression analysis indicated that patients aged > 65 years had a significantly worse prognosis than younger patients, while unmarried patients had a better prognosis than married patients. Patients diagnosed between 2001 and 2005 had a significantly lower risk of mortality than those diagnosed in other periods. The XGBoost model outperformed the other models, with a precision of 0.818 and an area under the curve of 0.863. CONCLUSIONS A machine learning model for patients with IMPC of the breast was developed to estimate 5-year overall survival (OS). The XGBoost model showed promising performance and can help clinicians determine the early prognosis of patients with IMPC; the model may therefore improve clinical outcomes by informing management strategies and patient health care decisions.
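The two metrics reported in this abstract, precision and area under the ROC curve (AUC), can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for the example.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def precision(labels, predictions):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy data: 1 = event of interest, scores = model probabilities.
labels = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.7]
# Assumed decision threshold of 0.5 for turning scores into class predictions.
predictions = [1 if s >= 0.5 else 0 for s in scores]

auc_value = roc_auc(labels, scores)
precision_value = precision(labels, predictions)
```

AUC is threshold-free, whereas precision depends on the chosen cut-off, which is one reason the two are usually reported together.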
Affiliation(s)
- Zirong Jiang
  - Department of Breast Surgery, Fujian Cancer Hospital, Clinical Oncology School of Fujian Medical University, No.420, Fu Ma Road Jinan District, Fuzhou, Fujian Province, 350011, China
  - Department of Thyroid and Breast Surgery, Ningde Municipal Hospital of Ningde Normal University, Ningde, 352100, China
- Yushuai Yu
  - Department of Breast Surgery, Fujian Cancer Hospital, Clinical Oncology School of Fujian Medical University, No.420, Fu Ma Road Jinan District, Fuzhou, Fujian Province, 350011, China
- Xin Yu
  - Department of Breast Surgery, Fujian Cancer Hospital, Clinical Oncology School of Fujian Medical University, No.420, Fu Ma Road Jinan District, Fuzhou, Fujian Province, 350011, China
- Mingyao Huang
  - Department of Breast Surgery, Fujian Cancer Hospital, Clinical Oncology School of Fujian Medical University, No.420, Fu Ma Road Jinan District, Fuzhou, Fujian Province, 350011, China
- Qing Wang
  - Department of Breast Surgery, Fujian Cancer Hospital, Clinical Oncology School of Fujian Medical University, No.420, Fu Ma Road Jinan District, Fuzhou, Fujian Province, 350011, China
- Kaiyan Huang
  - Department of Breast Surgery, Fujian Cancer Hospital, Clinical Oncology School of Fujian Medical University, No.420, Fu Ma Road Jinan District, Fuzhou, Fujian Province, 350011, China
- Chuangui Song
  - Department of Breast Surgery, Fujian Cancer Hospital, Clinical Oncology School of Fujian Medical University, No.420, Fu Ma Road Jinan District, Fuzhou, Fujian Province, 350011, China
2
Chen Y, Wen Y, Xie C, Chen X, He S, Bo X, Zhang Z. MOCSS: Multi-omics data clustering and cancer subtyping via shared and specific representation learning. iScience 2023; 26:107378. [PMID: 37559907 PMCID: PMC10407241 DOI: 10.1016/j.isci.2023.107378] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/17/2023] [Revised: 05/23/2023] [Accepted: 07/07/2023] [Indexed: 08/11/2023] Open
Abstract
Cancer is an extremely complex disease, and each type of cancer usually has several different subtypes. Multi-omics data can provide more comprehensive biological information for identifying and discovering cancer subtypes. However, existing unsupervised cancer subtyping methods cannot effectively learn comprehensive shared and specific information from multi-omics data. Therefore, a novel method is proposed based on shared and specific representation learning. For each omics data type, two autoencoders are applied to extract shared and specific information, respectively. To reduce redundancy and mutual interference, an orthogonality constraint is introduced to separate shared and specific information. In addition, contrastive learning is applied to align the shared information and strengthen its consistency. Finally, the shared and specific information obtained for all samples is used in clustering tasks to achieve cancer subtyping. Experimental results demonstrate that the proposed method can effectively capture shared and specific information from multi-omics data and outperforms other state-of-the-art methods on cancer subtyping.
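One common way to express an orthogonality constraint between a shared and a specific representation is to penalize the squared Frobenius norm of their cross-product matrix. The sketch below illustrates that idea only; the abstract does not state the exact loss, so the formulation, the function name `orthogonality_penalty`, and the toy embeddings are assumptions rather than the method used in MOCSS.

```python
def orthogonality_penalty(shared, specific):
    """Squared Frobenius norm of shared^T @ specific.

    shared, specific: lists of per-sample embedding vectors with the same
    number of samples. The penalty is zero when every shared dimension is
    orthogonal (across samples) to every specific dimension.
    """
    if len(shared) != len(specific):
        raise ValueError("both representations must cover the same samples")
    n_samples = len(shared)
    shared_dim = len(shared[0])
    specific_dim = len(specific[0])
    penalty = 0.0
    for a in range(shared_dim):
        for b in range(specific_dim):
            # Entry (a, b) of shared^T @ specific.
            cross = sum(shared[i][a] * specific[i][b] for i in range(n_samples))
            penalty += cross * cross
    return penalty


# Hypothetical one-dimensional embeddings for two samples.
separated = orthogonality_penalty([[1.0], [0.0]], [[0.0], [1.0]])
redundant = orthogonality_penalty([[1.0], [1.0]], [[1.0], [1.0]])
```

In a training loop, a term like this would typically be added to the reconstruction and contrastive losses with a weighting factor, so that the two encoders are pushed toward capturing non-overlapping information.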
Affiliation(s)
- Yuxin Chen
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Yuqi Wen
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Chenyang Xie
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Xinjian Chen
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Song He
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Zhongnan Zhang
  - School of Informatics, Xiamen University, Xiamen 361005, China
3
Li Z, Wang M, Peng D, Liu J, Xie Y, Dai Z, Zou X. Identification of Chemical-Disease Associations Through Integration of Molecular Fingerprint, Gene Ontology and Pathway Information. Interdiscip Sci 2022; 14:683-696. [PMID: 35391615 DOI: 10.1007/s12539-022-00511-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 03/16/2022] [Accepted: 03/17/2022] [Indexed: 06/14/2023]
Abstract
The identification of chemical-disease association types is helpful not only for discovering lead compounds and studying drug repositioning, but also for treating disease and deciphering pathomechanisms. There is an urgent need for computational methods that identify potential chemical-disease association types, since wet-lab methods are usually expensive, laborious, and time-consuming. In this study, molecular fingerprints, gene ontology, and pathways are used to characterize chemicals and diseases. A novel predictor is proposed that recognizes potential chemical-disease associations at the first layer and then distinguishes whether each relationship is a biomarker or a therapeutic relation at the second layer. The prediction performance of the current method is assessed on a benchmark dataset using ten-fold cross-validation. The practical prediction accuracies of the first and second layers are 78.47% and 72.07%, respectively. The ability to recognize lead compounds, new drug indications, and potential and true chemical-disease association pairs has also been investigated and confirmed by constructing a variety of datasets and performing a series of experiments. It is anticipated that the current method can serve as a powerful high-throughput virtual screening tool for drug research and development.
Affiliation(s)
- Zhanchao Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
  - NMPA Key Laboratory for Technology Research and Evaluation of Pharmacovigilance, Guangzhou, 510006, People's Republic of China
  - Key Laboratory of Digital Quality Evaluation of Chinese Materia Medica of State Administration of Traditional Chinese Medicine, Guangzhou, 510006, People's Republic of China
- Mengru Wang
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Dongdong Peng
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Jie Liu
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Yun Xie
  - HuiZhou University, Huizhou, 516007, People's Republic of China
- Zong Dai
  - School of Biomedical Engineering, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Xiaoyong Zou
  - School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
4
Gutiérrez-Cárdenas J, Wang Z. Prediction of binding miRNAs involved with immune genes to the SARS-CoV-2 by using sequence features extraction and One-class SVM. Informatics in Medicine Unlocked 2022; 30:100958. [PMID: 35528315 PMCID: PMC9057929 DOI: 10.1016/j.imu.2022.100958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Revised: 04/25/2022] [Accepted: 04/25/2022] [Indexed: 10/24/2022] Open
Abstract
The prediction of host human miRNA binding to the SARS-CoV-2 RNA sequence is of particular interest. This biological process could lead to virus repression, and the miRNAs involved could serve as biomarkers for diagnosis or as potential treatments for the disease. One open question is which viral regions this binding could occur in, and how the binding of these miRNAs could affect the processes of the SARS-CoV-2 virus. Using sequence features extracted from this base pairing, we predicted the relationships between miRNAs that interact with genes involved in immune function and bind to the 5' UTR region of the SARS-CoV-2 genome. We compared two supervised models, SVM and Random Forest, with an unsupervised One-Class SVM. When the confusion matrices were inspected, the results of the supervised models proved misleading, resulting in a Type II error. However, with the latter model, we achieved an average accuracy of 92%, sensitivity of 96.18%, and specificity of 78%. We hypothesize that studying the binding of miRNAs that affect immunological genes and bind to the SARS-CoV-2 virus will lead to potential genetic therapies for fighting the disease or to a better understanding of how the immune system is affected when this type of viral infection occurs.
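The three figures quoted in this abstract all derive from a confusion matrix, and the sketch below shows how. It is illustrative only: the function name `confusion_metrics` and the counts are hypothetical and are not taken from the paper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / total,
        # True positive rate: positives that were recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # True negative rate: negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical balanced-ish classifier.
balanced = confusion_metrics(tp=90, fp=10, tn=40, fn=10)

# Hypothetical degenerate classifier that labels every sample positive
# on an imbalanced dataset (95 positives, 5 negatives).
always_positive = confusion_metrics(tp=95, fp=5, tn=0, fn=0)
```

The second case shows why inspecting the full confusion matrix matters: accuracy is 0.95 even though specificity is 0, so a single headline number can hide a classifier that never identifies one of the classes.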
Affiliation(s)
- Juan Gutiérrez-Cárdenas
  - Universidad de Lima, Lima, Peru
  - College of Science, Engineering and Technology, University of South Africa, Florida, 1710, South Africa
- Zenghui Wang
  - College of Science, Engineering and Technology, University of South Africa, Florida, 1710, South Africa
5
LPI-HyADBS: a hybrid framework for lncRNA-protein interaction prediction integrating feature selection and classification. BMC Bioinformatics 2021; 22:568. [PMID: 34836494 PMCID: PMC8620196 DOI: 10.1186/s12859-021-04485-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/24/2021] [Accepted: 11/09/2021] [Indexed: 12/03/2022] Open
Abstract
Background Long noncoding RNAs (lncRNAs) are closely linked to a wide range of important cellular activities. lncRNAs exert their functions by interacting with corresponding RNA-binding proteins. Since experimental techniques for detecting lncRNA-protein interactions (LPIs) are laborious and time-consuming, a few computational methods have been reported for LPI prediction. However, computational LPI identification methods have the following limitations: (1) Most methods were evaluated on a single dataset, so their generalization ability may not have been measured. (2) The majority of methods were validated under cross validation on lncRNA-protein pairs and did not investigate performance under other cross validations, especially cross validation on independent lncRNAs and independent proteins. (3) lncRNAs and proteins carry abundant biological information, and how to select informative features needs further investigation. Results Under a hybrid framework (LPI-HyADBS) that integrates feature selection based on AdaBoost with classification models including a deep neural network (DNN), extreme gradient boosting (XGBoost), and an SVM with a penalty coefficient of misclassification (C-SVM), this work focuses on finding new LPIs. First, five datasets are arranged. Each dataset contains lncRNA sequences, protein sequences, and an LPI network. Second, biological features of lncRNAs and proteins are acquired with Pyfeat. Third, the obtained features of lncRNAs and proteins are selected based on AdaBoost and concatenated to depict each LPI sample. Fourth, DNN, XGBoost, and C-SVM are used to classify lncRNA-protein pairs based on the concatenated features. Finally, a hybrid framework is developed to integrate the classification results from the above three classifiers. LPI-HyADBS is compared to six classical LPI prediction approaches (LPI-SKF, LPI-NRLMF, Capsule-LPI, LPI-CNNCP, LPLNP, and LPBNI) on five datasets under 5-fold cross validations on lncRNAs, proteins, lncRNA-protein pairs, and independent lncRNAs and independent proteins. The results show that LPI-HyADBS has the best LPI prediction performance under four different cross validations. In particular, LPI-HyADBS obtains better classification ability than the other six approaches on the constructed independent dataset. Case analyses suggest an association between ZNF667-AS1 and Q15717. Conclusions By integrating a feature selection approach based on AdaBoost with three classification techniques (DNN, XGBoost, and C-SVM), this work develops a hybrid framework to identify new linkages between lncRNAs and proteins. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04485-x.
6
Wang J, Wang C, Shen L, Zhou L, Peng L. Screening Potential Drugs for COVID-19 Based on Bound Nuclear Norm Regularization. Front Genet 2021; 12:749256. [PMID: 34691157 PMCID: PMC8529063 DOI: 10.3389/fgene.2021.749256] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/29/2021] [Accepted: 08/23/2021] [Indexed: 01/04/2023] Open
Abstract
The novel coronavirus pneumonia COVID-19, caused by SARS-CoV-2, has attracted worldwide attention. There is an urgent need for effective therapeutic strategies to stop COVID-19. In this study, a Bounded Nuclear Norm Regularization (BNNR) method is developed to predict anti-SARS-CoV-2 drug candidates. First, three virus-drug association datasets are compiled. Second, a heterogeneous virus-drug network is constructed. Third, complete genomic sequences and Gaussian association profiles are integrated to compute virus similarities; chemical structures and Gaussian association profiles are integrated to calculate drug similarities. Fourth, a BNNR model based on kernel similarity (VDA-GBNNR) is proposed to predict possible anti-SARS-CoV-2 drugs. VDA-GBNNR is compared with four existing advanced methods under fivefold cross-validation. The results show that VDA-GBNNR achieves better AUCs of 0.8965, 0.8562, and 0.8803 on the three datasets, respectively. Six anti-SARS-CoV-2 drugs overlap in any two datasets, namely remdesivir, favipiravir, ribavirin, mycophenolic acid, niclosamide, and mizoribine. Molecular docking is conducted between the 6 small molecules and the junction of the SARS-CoV-2 spike protein and human angiotensin-converting enzyme 2. In particular, niclosamide and mizoribine show higher binding energies of −8.06 and −7.06 kcal/mol with the junction, respectively. G496 and K353 may be potential key residues between anti-SARS-CoV-2 drugs and the interface junction. We hope that the predicted results can contribute to the treatment of COVID-19.
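A similarity derived from "Gaussian association profiles" is commonly computed with a Gaussian kernel over the rows of the association matrix, with the bandwidth normalized by the mean squared profile norm. The sketch below follows that widely used formulation; the abstract does not give the formula, so the kernel form, the bandwidth normalization, the function name, and the toy matrix are assumptions rather than the paper's exact definition.

```python
import math


def gaussian_profile_similarity(profiles):
    """Pairwise Gaussian kernel similarity between association profiles.

    profiles: one row per entity (for example, one virus), where each row is
    that entity's binary association vector over the other entity type
    (for example, drugs).
    """
    n = len(profiles)
    squared_norms = [sum(x * x for x in row) for row in profiles]
    mean_squared_norm = sum(squared_norms) / n
    if mean_squared_norm == 0:
        raise ValueError("all profiles are empty; bandwidth is undefined")
    # Assumed bandwidth: inverse of the mean squared profile norm.
    gamma = 1.0 / mean_squared_norm
    similarity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            squared_distance = sum(
                (a - b) ** 2 for a, b in zip(profiles[i], profiles[j])
            )
            similarity[i][j] = math.exp(-gamma * squared_distance)
    return similarity


# Hypothetical association matrix: 3 viruses x 3 drugs.
toy_profiles = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
toy_similarity = gaussian_profile_similarity(toy_profiles)
```

Entities with identical association profiles receive a similarity of 1, and the value decays toward 0 as the profiles diverge. In an integrated scheme such a matrix would then be combined with a sequence-based or structure-based similarity.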
Affiliation(s)
- Juanjuan Wang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Chang Wang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Ling Shen
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
  - College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China