1
Choi JM, Park C, Chae H. meth-SemiCancer: a cancer subtype classification framework via semi-supervised learning utilizing DNA methylation profiles. BMC Bioinformatics 2023; 24:168. [PMID: 37101254 PMCID: PMC10131478 DOI: 10.1186/s12859-023-05272-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/01/2022] [Accepted: 04/05/2023] [Indexed: 04/28/2023] Open
Abstract
BACKGROUND: Identification of the cancer subtype plays a crucial role in providing an accurate diagnosis and proper treatment to improve the clinical outcomes of patients. Recent studies have shown that DNA methylation is one of the key factors in tumorigenesis and tumor growth, and that DNA methylation signatures have the potential to be used as cancer subtype-specific markers. However, because of the high dimensionality of the data and the low number of DNA methylome cancer samples with subtype information, no cancer subtype classification method utilizing DNA methylome datasets has been proposed to date. RESULTS: In this paper, we present meth-SemiCancer, a semi-supervised cancer subtype classification framework based on DNA methylation profiles. The proposed model was first pre-trained on the methylation datasets with cancer subtype labels. After that, meth-SemiCancer generated pseudo-subtypes for the cancer datasets without subtype information based on the model's predictions. Finally, fine-tuning was performed using both the labeled and unlabeled datasets. CONCLUSIONS: In a performance comparison with standard machine learning-based classifiers, meth-SemiCancer achieved the highest average F1-score and Matthews correlation coefficient, outperforming the other methods. Fine-tuning the model with the unlabeled patient samples, by providing proper pseudo-subtypes, encouraged meth-SemiCancer to generalize better than the supervised neural network-based subtype classification method. meth-SemiCancer is publicly available at https://github.com/cbi-bioinfo/meth-SemiCancer .
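The three steps named in the abstract (pre-train on labeled samples, assign pseudo-subtypes to unlabeled samples, then refit on both) can be sketched in a few lines. This is only an illustration of generic pseudo-labeling, not the authors' implementation: a nearest-centroid classifier stands in for the neural network, and the function names, the toy "methylation" values, and the subtype labels "A" and "B" are all hypothetical.

```python
import random
from statistics import fmean

def fit_centroids(X, y):
    """Return {label: centroid}, where a centroid is the per-feature mean."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {lab: [fmean(col) for col in zip(*rows)] for lab, rows in groups.items()}

def predict(centroids, X):
    """Assign each sample the label of its closest centroid (squared Euclidean)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda lab: dist(row, centroids[lab])) for row in X]

def pseudo_label_fit(X_lab, y_lab, X_unlab):
    # Step 1: "pre-train" on the samples that carry subtype labels.
    model = fit_centroids(X_lab, y_lab)
    # Step 2: generate pseudo-subtypes for the unlabeled samples.
    pseudo = predict(model, X_unlab)
    # Step 3: refit on labeled and pseudo-labeled samples together.
    return fit_centroids(X_lab + X_unlab, y_lab + pseudo), pseudo

# Toy data: two hypothetical subtypes with different mean beta values.
rng = random.Random(0)

def sample(center, n):
    return [[rng.gauss(c, 0.05) for c in center] for _ in range(n)]

X_lab = sample([0.2, 0.2, 0.2], 5) + sample([0.8, 0.8, 0.8], 5)
y_lab = ["A"] * 5 + ["B"] * 5
X_unlab = sample([0.2, 0.2, 0.2], 20) + sample([0.8, 0.8, 0.8], 20)

model, pseudo = pseudo_label_fit(X_lab, y_lab, X_unlab)
print(pseudo.count("A"), pseudo.count("B"))
```

In practice, pseudo-labeling schemes often keep only high-confidence predictions before refitting; whether and how meth-SemiCancer filters pseudo-subtypes is not stated in the abstract.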
Affiliation(s)
- Joung Min Choi: Department of Computer Science, Virginia Tech, Blacksburg, USA
- Chaelin Park: Division of Computer Science, Sookmyung Women's University, Seoul, Republic of Korea
- Heejoon Chae: Division of Computer Science, Sookmyung Women's University, Seoul, Republic of Korea
2
Zhang L, Li C, Peng D, Yi X, He S, Liu F, Zheng X, Huang WE, Zhao L, Huang X. Raman spectroscopy and machine learning for the classification of breast cancers. Spectrochim Acta A Mol Biomol Spectrosc 2022; 264:120300. [PMID: 34455388 DOI: 10.1016/j.saa.2021.120300] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 04/08/2021] [Revised: 07/26/2021] [Accepted: 08/16/2021] [Indexed: 06/13/2023]
Abstract
Breast cancer is a major health threat for women. The drug responses associated with different breast cancer subtypes have clear effects on therapeutic outcomes; therefore, the accurate classification of breast cancer subtypes is critical. Breast cancer subtype classification has recently been examined using various methods, and Raman spectroscopy has emerged as an effective technique for noninvasive breast cancer analysis. However, the accurate and rapid classification of breast cancer subtypes currently requires a great deal of effort and experience in processing and analyzing Raman spectral data. Here, we adopted Raman spectroscopy and machine learning techniques to simplify and accelerate the process used to distinguish normal from breast cancer cells and to classify breast cancer subtypes. Raman spectra were obtained from cultured breast cancer cell lines, and the data were analyzed by two machine learning algorithms: principal component analysis (PCA)-discriminant function analysis (DFA) and PCA-support vector machine (SVM). Both algorithms distinguished normal breast cells from breast cancer cells with accuracies greater than 97%, and both classified breast cancer subtypes with accuracies greater than 92%. Moreover, our results support the use of characteristic Raman spectral features as cancer cell biomarkers, such as the intensity of intrinsic Raman bands, which increased in cancer cells. Raman spectroscopy combined with machine learning techniques provides a rapid method for breast cancer analysis that can reveal differences in intracellular composition and molecular structure among subtypes.
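A PCA-SVM pipeline of the kind the abstract mentions can be sketched with scikit-learn. Everything below is an assumption made for illustration: the spectra are synthetic, the band position (1450 cm^-1), the noise level, the number of principal components, and the linear kernel are arbitrary choices, and none of it reproduces the authors' preprocessing or parameters.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
shifts = np.linspace(600, 1800, 200)  # hypothetical Raman shift axis (cm^-1)

def synthetic_spectra(band_height, n):
    """n noisy spectra with one Gaussian band of the given height (toy data)."""
    band = band_height * np.exp(-((shifts - 1450.0) ** 2) / (2 * 15.0 ** 2))
    return band + rng.normal(0.0, 0.05, size=(n, shifts.size))

# Two toy classes that differ only in the intensity of one band.
X = np.vstack([synthetic_spectra(0.5, 60), synthetic_spectra(1.0, 60)])
y = np.array([0] * 60 + [1] * 60)

# Standardize, reduce to a few principal components, then classify with an SVM.
model = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="linear"))
scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
print(scores.mean())
```

Wrapping PCA inside the pipeline means the components are refit on each training fold, so the held-out fold does not leak into the dimensionality reduction.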
Affiliation(s)
- Lihao Zhang: Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Keling Road, Suzhou, Jiangsu Province, 215163, China
- Chengjian Li: Department of Pharmacy, Shanghai Baoshan Luodian Hospital, Baoshan District, Shanghai, 201908, China; Luodian Clinical Drug Research Center, Institute for Translational Medicine Research, Shanghai University, Shanghai, 200444, China
- Di Peng: Shanghai D-band Medical Instrument Co., Ltd, Huyi Highway, Jiading District, Shanghai, 201800, China
- Xiaofei Yi: Shanghai D-band Medical Instrument Co., Ltd, Huyi Highway, Jiading District, Shanghai, 201800, China
- Shuai He: Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Keling Road, Suzhou, Jiangsu Province, 215163, China
- Fengxiang Liu: Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Keling Road, Suzhou, Jiangsu Province, 215163, China
- Xiangtai Zheng: Luodian Clinical Drug Research Center, Institute for Translational Medicine Research, Shanghai University, Shanghai, 200444, China
- Wei E Huang: Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, UK
- Liang Zhao: Department of Pharmacy, Shanghai Baoshan Luodian Hospital, Baoshan District, Shanghai, 201908, China; Luodian Clinical Drug Research Center, Institute for Translational Medicine Research, Shanghai University, Shanghai, 200444, China
- Xia Huang: Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Keling Road, Suzhou, Jiangsu Province, 215163, China; Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
3
Xu J, Wu P, Chen Y, Meng Q, Dawood H, Dawood H. A hierarchical integration deep flexible neural forest framework for cancer subtype classification by integrating multi-omics data. BMC Bioinformatics 2019; 20:527. [PMID: 31660856 PMCID: PMC6819613 DOI: 10.1186/s12859-019-3116-7] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 05/10/2019] [Accepted: 09/27/2019] [Indexed: 12/11/2022] Open
Abstract
Background: Cancer subtype classification is of great importance for the accurate diagnosis and personalized treatment of cancer. Recent developments in high-throughput sequencing technologies have rapidly produced multi-omics data for the same cancer sample. Many computational methods have been proposed to classify cancer subtypes; however, most of them build the model using only gene expression data. It has been shown that integrating multi-omics data contributes to cancer subtype classification. Results: A new hierarchical integration deep flexible neural forest framework, named HI-DFNForest, is proposed to integrate multi-omics data for cancer subtype classification. A stacked autoencoder (SAE) is used to learn high-level representations of each omics dataset; complex representations are then learned by integrating all of the learned representations in an autoencoder layer. The final learned data representations (from the stacked autoencoder) are used to classify patients into different cancer subtypes with a deep flexible neural forest (DFNForest) model. Cancer subtype classification is verified on the BRCA, GBM and OV data sets from TCGA by integrating gene expression, miRNA expression and DNA methylation data. These results demonstrate that integrating multiple omics data improves the accuracy of cancer subtype classification compared with using only gene expression data, and that the proposed framework achieves better performance than other conventional methods. Conclusion: The new hierarchical integration deep flexible neural forest framework (HI-DFNForest) is an effective method for integrating multi-omics data to classify cancer subtypes.
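The hierarchical integration idea (encode each omics layer separately, then encode the concatenated codes once more) can be illustrated by its data flow alone. The sketch below deliberately swaps in a PCA projection as a linear stand-in for each autoencoder, uses random matrices in place of real TCGA data, and stops before the classifier; the dimensions, the dictionary keys, and the helper name `pca_encode` are hypothetical.

```python
import numpy as np

def pca_encode(X, k):
    """Project centered X onto its top-k principal directions.

    A linear stand-in for a trained encoder, used here only to show shapes.
    """
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

rng = np.random.default_rng(0)
n_samples = 40  # the same patients are measured in every omics layer

# Random placeholders for three omics matrices (samples x features).
omics = {
    "gene_expression": rng.normal(size=(n_samples, 300)),
    "mirna_expression": rng.normal(size=(n_samples, 80)),
    "dna_methylation": rng.uniform(size=(n_samples, 200)),
}

# Level 1: one encoder per omics layer.
per_omics = [pca_encode(X, 10) for X in omics.values()]

# Level 2: concatenate the per-omics codes and encode them jointly.
stacked = np.hstack(per_omics)
integrated = pca_encode(stacked, 8)

print(stacked.shape, integrated.shape)
```

The `integrated` matrix plays the role of the final representation that a downstream classifier would consume; in the paper that classifier is DFNForest, which is not reproduced here.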
Affiliation(s)
- Jing Xu: School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Peng Wu: School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Yuehui Chen: School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Qingfang Meng: School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Hussain Dawood: Department of Computer and Network Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Hassan Dawood: Department of Software Engineering, University of Engineering and Technology, Taxila, Pakistan
4
Qin Y, Feng H, Chen M, Wu H, Zheng X. InfiniumPurify: An R package for estimating and accounting for tumor purity in cancer methylation research. Genes Dis 2018; 5:43-45. [PMID: 30258934 PMCID: PMC6147081 DOI: 10.1016/j.gendis.2018.02.003] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 02/02/2018] [Accepted: 02/04/2018] [Indexed: 01/07/2023] Open
Abstract
The proportion of cancer cells in a tumor sample, known as tumor purity, is an intrinsic property of tumor samples and can strongly influence a variety of analyses, including differential methylation, subclonal deconvolution and subtype clustering. InfiniumPurify is an integrated R package for estimating and accounting for tumor purity based on DNA methylation Infinium 450k array data. InfiniumPurify has three main functions, getPurity, InfiniumDMC and InfiniumClust, which respectively infer tumor purity, perform differential methylation analysis, and cluster tumor samples while accounting for estimated or user-provided tumor purities. The InfiniumPurify package provides a comprehensive analysis of tumor purity in cancer methylation research.
Affiliation(s)
- Yufang Qin: College of Information Technology, Shanghai Ocean University, Shanghai, 201306, PR China; Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, 201306, PR China
- Hao Feng: Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Georgia 30322, USA
- Ming Chen: College of Information Technology, Shanghai Ocean University, Shanghai, 201306, PR China; Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, 201306, PR China
- Hao Wu: Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Georgia 30322, USA
- Xiaoqi Zheng: Department of Mathematics, Shanghai Normal University, Shanghai, 200234, PR China