1
Huang RH, Ge ZL, Xu G, Zeng QM, Jiang B, Xiao GC, Xia W, Wu YT, Liao YF. Prognosis and diagnosis of prostate cancer based on hypergraph regularization sparse least partial squares regression algorithm. Aging (Albany NY) 2024; 16:9599-9624. [PMID: 38829766 PMCID: PMC11210239 DOI: 10.18632/aging.205889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 02/29/2024] [Indexed: 06/05/2024]
Abstract
BACKGROUND Prostate cancer (PCa) is a malignant tumor of the male reproductive system, and its incidence has increased significantly in recent years. This study aimed to identify candidate biomarkers with prognostic and diagnostic significance by integrating gene expression and DNA methylation data from PCa patients through association analysis. MATERIAL AND METHODS To this end, this paper proposes a sparse partial least squares regression algorithm based on hypergraph regularization (HR-SPLS) that integrates and clusters the two kinds of data. Next, module 2, which had the most significant weight, was selected for further analysis according to the weight of each module related to DNA methylation and mRNAs. Based on the DNA methylation sites in module 2, multiple machine learning methods were used to construct a PCa diagnostic model of 10 DNA methylation sites. RESULTS Receiver operating characteristic (ROC) analysis showed that the DNA methylation-related diagnostic model could diagnose PCa patients with high accuracy. Subsequently, based on the mRNAs in module 2, we constructed a 7-mRNA prognostic model (MYH11, ACTG2, DDR2, CDC42EP3, MARCKSL1, LMOD1, and MYLK) using multivariate Cox regression analysis. The prognostic model could predict the disease-free survival of PCa patients with moderate to high accuracy (area under the curve (AUC) = 0.761). In addition, Gene Set Enrichment Analysis (GSEA) and immune analysis indicated that the prognosis of patients in the risk group might be related to immune cell infiltration. CONCLUSIONS Our findings may provide new methods and insights for identifying disease-related biomarkers by integrating DNA methylation and gene expression data.
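The AUC quoted in this abstract summarizes how well a risk score separates patients with and without an event. As a minimal sketch (not the authors' pipeline, which may rely on time-dependent ROC analysis for survival data), the AUC for a binary outcome can be computed as the fraction of positive/negative pairs that the score ranks correctly. The function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: iterable of 0/1 outcomes (1 = event).
    scores: iterable of risk scores, higher meaning more likely positive.
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data (hypothetical): three positives, three negatives.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of the 9 pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples; it is used here only because it makes the definition explicit.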
Affiliation(s)
- Ruo-Hui Huang: Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Zi-Lu Ge: First Clinical Medical College, Gannan Medical University, Ganzhou, Jiangxi, China
- Gang Xu: Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Qing-Ming Zeng: Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Bo Jiang: Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Guan-Cheng Xiao: Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Wei Xia: Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Yu-Ting Wu: Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Yun-Feng Liao: Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
2
Ding P, Zeng M, Yin R. Editorial: Computational methods to analyze RNA data for human diseases. Front Genet 2023; 14:1270334. [PMID: 37674479 PMCID: PMC10478215 DOI: 10.3389/fgene.2023.1270334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 08/14/2023] [Indexed: 09/08/2023] Open
Affiliation(s)
- Pingjian Ding: Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, United States
- Min Zeng: School of Computer Science and Engineering, Central South University, Changsha, China
- Rui Yin: Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States
3
Shi Y, Zhou L, Zeng W, Wei B, Deng J. Sparse Independence Component Analysis for Competitive Endogenous RNA Co-Module Identification in Liver Hepatocellular Carcinoma. IEEE Journal of Translational Engineering in Health and Medicine 2023; 11:384-393. [PMID: 37465460 PMCID: PMC10351610 DOI: 10.1109/jtehm.2023.3283519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2022] [Revised: 05/31/2023] [Accepted: 06/04/2023] [Indexed: 07/20/2023]
Abstract
OBJECTIVE Long non-coding RNAs (lncRNAs) have been shown to be associated with the pathogenesis of different kinds of diseases and play important roles in various biological processes. Although numerous lncRNAs have been found, the study of the functions and physiological/pathological significance of most lncRNAs is still in its infancy. Meanwhile, their expression patterns and regulation mechanisms are also far from being fully understood. METHODS To reveal functional lncRNAs and identify the key lncRNAs, we develop a new sparse independence component analysis (ICA) method to identify lncRNA-mRNA-miRNA expression co-modules based on the competitive endogenous RNA (ceRNA) theory using sample-matched lncRNA, mRNA and miRNA expression profiles. The combined expression data of the three RNA types are approximated sparsely to obtain the corresponding sparsity coefficients, which are then decomposed using ICA constraint optimization to obtain the common basis and modules. Subsequently, affinity propagation clustering is applied to the common basis under multiple running conditions to obtain the co-modules for the selection of different RNA elements. RESULTS We applied sparse ICA to the Liver Hepatocellular Carcinoma (LIHC) dataset, and the experimental results demonstrate that the proposed sparse ICA method can effectively discover biologically functional expression common modules. CONCLUSION The method may provide insights into the function of lncRNAs and the molecular mechanism of LIHC. Clinical and Translational Impact Statement: The results on the LIHC dataset demonstrate that the proposed sparse ICA method can effectively discover biologically functional expression common modules, which may provide insights into the function of lncRNAs and the molecular mechanism of LIHC.
Affiliation(s)
- Yuhu Shi: Information Engineering College, Shanghai Maritime University, Shanghai 201306, China
- Lili Zhou: Yangpu District Central Hospital, Shanghai 200433, China
- Weiming Zeng: Information Engineering College, Shanghai Maritime University, Shanghai 201306, China
- Boyang Wei: Information Engineering College, Shanghai Maritime University, Shanghai 201306, China
- Jin Deng: College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
4
Meng Y, Jin M. HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis. Front Cell Dev Biol 2021; 9:696359. [PMID: 34277640 PMCID: PMC8278475 DOI: 10.3389/fcell.2021.696359] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/16/2021] [Accepted: 05/19/2021] [Indexed: 11/15/2022] Open
Abstract
The emergence of high-throughput RNA-seq data has offered unprecedented opportunities for cancer diagnosis. However, most existing approaches for cancer diagnosis struggle to capture the highly nonlinear and complex associations in biological data. In this study, we propose a novel hierarchical feature selection and second learning probability error ensemble model (named HFS-SLPEE) for precision cancer diagnosis. Specifically, we first integrated protein-coding gene expression profiles, non-coding RNA expression profiles, and DNA methylation data to provide rich information; afterward, we designed a novel hierarchical feature selection method, which takes the CpG-gene biological associations into account and can select a compact set of superior features; next, we used four individual classifiers with significant differences and apparent complementarity to build the heterogeneous classifiers; lastly, we developed a second learning probability error ensemble model called SLPEE to thoroughly learn the new data consisting of classifier-predicted class probability values and the actual labels, further realizing the self-correction of diagnosis errors. Benchmarking comparisons on TCGA showed that HFS-SLPEE performs better than the state-of-the-art approaches. Moreover, we analyzed 10 groups of selected features in depth and found several novel HFS-SLPEE-predicted epigenomics and epigenetics biomarkers for breast invasive carcinoma (BRCA) (e.g., TSLP and ADAMTS9-AS2), lung adenocarcinoma (LUAD) (e.g., HBA1 and CTB-43E15.1), and kidney renal clear cell carcinoma (KIRC) (e.g., IRX2 and BMPR1B-AS1).
Affiliation(s)
- Min Jin: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
5
Lu C, Zeng M, Zhang F, Wu FX, Li M, Wang J. Deep Matrix Factorization Improves Prediction of Human CircRNA-Disease Associations. IEEE J Biomed Health Inform 2021; 25:891-899. [PMID: 32750925 DOI: 10.1109/jbhi.2020.2999638] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Indexed: 11/11/2022]
Abstract
In recent years, more and more evidence has indicated that circular RNAs (circRNAs), which have a covalently closed-loop structure, play various roles in biological processes. Dysregulation and mutation of circRNAs may be implicated in diseases. Owing to their stable structure and resistance to degradation, circRNAs have great potential as diagnostic biomarkers. Therefore, predicting circRNA-disease associations is helpful in disease diagnosis. However, there are few experimentally validated associations between circRNAs and diseases. Although several computational methods have been proposed, precisely representing underlying features and grasping the complex structures of data are still challenging. In this paper, we design a new method, called DMFCDA (Deep Matrix Factorization CircRNA-Disease Association), to infer potential circRNA-disease associations. DMFCDA takes both explicit and implicit feedback into account. Then, it uses a projection layer to automatically learn latent representations of circRNAs and diseases. With multi-layer neural networks, DMFCDA can model the non-linear associations to grasp the complex structure of data. We assess the performance of DMFCDA using leave-one-out cross-validation and 5-fold cross-validation on two datasets. Computational results show that DMFCDA efficiently infers circRNA-disease associations according to AUC values, the percentage of precisely retrieved associations in various top ranks, and statistical comparison. We also conduct case studies to evaluate DMFCDA. All results show that DMFCDA provides accurate predictions.
6
Xiao Q, Zhong J, Tang X, Luo J. iCDA-CMG: identifying circRNA-disease associations by federating multi-similarity fusion and collective matrix completion. Mol Genet Genomics 2020; 296:223-233. [PMID: 33159254 DOI: 10.1007/s00438-020-01741-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/11/2020] [Accepted: 10/23/2020] [Indexed: 01/22/2023]
Abstract
Circular RNAs (circRNAs) are a special class of non-coding RNAs with covalently closed-loop structures. Studies have shown that circRNAs perform critical roles in various biological processes, and that the aberrant expression of circRNAs is closely related to tumorigenesis. Therefore, identifying potential circRNA-disease associations helps in understanding the pathogenesis of complex diseases at the circRNA level and helps biomedical researchers and practitioners to discover diagnostic biomarkers accurately. However, it is tremendously laborious and time-consuming to discover disease-related circRNAs with conventional biological experiments. In this study, we develop an integrative framework, called iCDA-CMG, to predict potential associations between circRNAs and diseases. By incorporating multi-source prior knowledge, including known circRNA-disease associations, disease similarities and circRNA similarities, we adopt a collective matrix completion-based graph learning model to prioritize the most promising disease-related circRNAs for guiding laborious clinical trials. The results show that iCDA-CMG outperforms other state-of-the-art models in terms of cross-validation and independent prediction. Moreover, the case studies for several representative cancers suggest the effectiveness of iCDA-CMG in screening circRNA candidates for human diseases, which will contribute to elucidating the pathogenesis mechanisms and unveiling new opportunities for disease diagnosis and targeted therapy.
Affiliation(s)
- Qiu Xiao: Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China; Hunan Xiangjiang Artificial Intelligence Academy, Changsha, 410000, China
- Jiancheng Zhong: Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Xiwei Tang: School of Information Science and Engineering, Hunan First Normal University, Changsha, 410205, China
- Jiawei Luo: College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
7
Xiao Q, Zhang N, Luo J, Dai J, Tang X. Adaptive multi-source multi-view latent feature learning for inferring potential disease-associated miRNAs. Brief Bioinform 2020; 22:2043-2057. [PMID: 32186712 DOI: 10.1093/bib/bbaa028] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/13/2019] [Revised: 02/16/2020] [Accepted: 01/14/2020] [Indexed: 12/13/2022] Open
Abstract
Accumulating evidence has shown that microRNAs (miRNAs) play crucial roles in different biological processes, and their mutations and dysregulation have been shown to contribute to tumorigenesis. In silico identification of disease-associated miRNAs is a cost-effective strategy to discover the most promising biomarkers for disease diagnosis and treatment. The increasingly available omics data sources provide unprecedented opportunities to decipher the underlying relationships between miRNAs and diseases by computational models. However, most existing methods are biased towards a single representation of miRNAs or diseases and are also not capable of discovering unobserved associations for new miRNAs or diseases without association information. In this study, we present a novel computational method with adaptive multi-source multi-view latent feature learning (M2LFL) to infer potential disease-associated miRNAs. First, we adopt multiple data sources to obtain similarity profiles and capture different latent features according to the geometric characteristics of the miRNA and disease spaces. Then, the multi-modal latent features are projected to a common subspace to discover unobserved miRNA-disease associations in both miRNA and disease views, and an adaptive joint graph regularization term is developed to preserve the intrinsic manifold structures of multiple similarity profiles. Meanwhile, Lp,q-norms are imposed on the projection matrices to ensure sparsity and improve interpretability. The experimental results confirm the superior performance of our proposed method in screening reliable candidate disease miRNAs, which suggests that M2LFL could be an efficient tool to discover diagnostic biomarkers for guiding laborious clinical trials.
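To make the sparsity penalty mentioned in this abstract concrete, here is a minimal sketch of a row-wise Lp,q matrix norm. The abstract does not state which p, q, or row/column convention M2LFL uses, so the definition below (Lp norm within each row, Lq norm across rows, with the common L2,1 case as default) is an assumption, and `lpq_norm` is a hypothetical helper name.

```python
import numpy as np


def lpq_norm(W, p=2.0, q=1.0):
    """Row-wise L_{p,q} norm: (sum_i (sum_j |w_ij|^p)^(q/p))^(1/q).

    Assumed convention: the inner L_p norm is taken over each row and the
    outer L_q norm over the vector of row norms. With p=2, q=1 this is the
    L2,1 norm, which pushes whole rows of W towards zero when penalized.
    """
    W = np.asarray(W, dtype=float)
    row_norms = np.sum(np.abs(W) ** p, axis=1) ** (1.0 / p)
    return float(np.sum(row_norms ** q) ** (1.0 / q))


# Toy projection matrix (hypothetical values): the middle row is entirely zero.
W_toy = np.array([[3.0, 4.0],
                  [0.0, 0.0],
                  [6.0, 8.0]])
l21 = lpq_norm(W_toy)                # row norms are 5, 0, 10, so the L2,1 norm is 15
l22 = lpq_norm(W_toy, p=2.0, q=2.0)  # p = q = 2 reduces to the Frobenius norm
```

A zero row contributes nothing to the L2,1 value, which is why minimizing it tends to switch off entire features rather than individual entries.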
8
Xiao Q, Luo J, Liang C, Cai J, Li G, Cao B. CeModule: an integrative framework for discovering regulatory patterns from genomic data in cancer. BMC Bioinformatics 2019; 20:67. [PMID: 30732558 PMCID: PMC6367773 DOI: 10.1186/s12859-019-2654-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/06/2017] [Accepted: 01/24/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Non-coding RNAs (ncRNAs) are emerging as key regulators and play critical roles in the tumorigenesis of a wide range of cancers. Recent studies have suggested that long non-coding RNAs (lncRNAs) could interact with microRNAs (miRNAs) and indirectly regulate miRNA targets through competing interactions. Therefore, uncovering the competing endogenous RNA (ceRNA) regulatory mechanism of lncRNAs, miRNAs and mRNAs at the post-transcriptional level will aid in deciphering the underlying pathogenesis of human polygenic diseases and may unveil new diagnostic and therapeutic opportunities. However, the functional roles of the vast majority of cancer-specific ncRNAs and their combinational regulation patterns are still insufficiently understood. RESULTS Here we develop an integrative framework called CeModule to discover lncRNA, miRNA and mRNA-associated regulatory modules. We fully utilize the matched expression profiles of lncRNAs, miRNAs and mRNAs and establish a model based on joint orthogonality non-negative matrix factorization for identifying modules. Meanwhile, we incorporate the experimentally verified miRNA-lncRNA interactions, the validated miRNA-mRNA interactions and the weighted gene-gene network into this framework to improve the module accuracy through network-based penalties. Sparse regularizations are also used to help the model obtain modular sparse solutions. Finally, an iterative multiplicative updating algorithm is adopted to solve the optimization problem. CONCLUSIONS We applied CeModule to two cancer datasets, ovarian cancer (OV) and uterine corpus endometrial carcinoma (UCEC), obtained from TCGA. The modular analysis indicated that the identified modules involving lncRNAs, miRNAs and mRNAs are significantly associated and functionally enriched in cancer-related biological processes and pathways, which may provide new insights into the complex regulatory mechanism of human diseases at the system level.
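The iterative multiplicative updating idea named in this abstract can be illustrated with plain non-negative matrix factorization. The sketch below uses the classic multiplicative updates for the Frobenius objective on a single matrix; it is a deliberately simplified stand-in, not CeModule itself, which factorizes several matched profiles jointly and adds orthogonality, network-based, and sparsity terms. The function name `nmf_multiplicative`, its parameters, and the random toy matrix are all hypothetical.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (n x m) as W @ H with W, H >= 0.

    Uses the standard multiplicative updates for ||X - W H||_F^2. Each update
    multiplies the current factor by a non-negative ratio, so non-negativity
    is preserved without any explicit projection step.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update coefficients
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update basis
        errors.append(float(np.linalg.norm(X - W @ H)))
    return W, H, errors


# Toy "expression" matrix (hypothetical): 20 samples by 15 features.
X_toy = np.random.default_rng(1).random((20, 15))
W_toy, H_toy, err_toy = nmf_multiplicative(X_toy, rank=3)
```

In a module-discovery setting, large entries in a row of `H_toy` would indicate which features load on the same latent component; how modules are actually thresholded in CeModule is not specified in the abstract.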
Affiliation(s)
- Qiu Xiao: College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China; Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha, 410081, China
- Jiawei Luo: College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Cheng Liang: College of Information Science and Engineering, Shandong Normal University, Jinan, 250000, China
- Jie Cai: College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Guanghui Li: College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Buwen Cao: College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China