1
Florentino BR, Parmezan Bonidia R, Sanches NH, da Rocha UN, de Carvalho AC. BioPrediction-RPI: Democratizing the prediction of interaction between non-coding RNA and protein with end-to-end machine learning. Comput Struct Biotechnol J 2024; 23:2267-2276. [PMID: 38827228 PMCID: PMC11140557 DOI: 10.1016/j.csbj.2024.05.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Revised: 05/16/2024] [Accepted: 05/16/2024] [Indexed: 06/04/2024]
Abstract
Machine Learning (ML) algorithms have been important tools for extracting useful knowledge from biological sequences, particularly in healthcare, agriculture, and the environment. However, because these sequences are categorical and unstructured, additional feature engineering steps are usually required before an ML algorithm can be applied efficiently. Adding these steps to the ML algorithm creates a processing pipeline, known as end-to-end ML. Despite the excellent results obtained by applying end-to-end ML to biotechnology problems, the performance obtained depends on the user's expertise in the components of the pipeline. In this work, we propose an end-to-end ML-based framework called BioPrediction-RPI, which can identify implicit interactions between sequences, such as pairs of non-coding RNA and proteins, without requiring specialized expertise in end-to-end ML. The framework applies feature engineering to represent each sequence by structural and topological features. These features are divided into feature groups and used to train partial models, whose partial decisions are combined into a final decision that provides insights to the user through an interpretability report. In our experiments, the framework was competitive with various expert-created models. We assessed BioPrediction-RPI on 12 datasets, where it showed performance equal to or better than all tools in 40% to 100% of cases, depending on the experiment. Finally, BioPrediction-RPI can fine-tune models based on new data and perform at the same level as ML experts, democratizing end-to-end ML and increasing its accessibility to those working in the biological sciences.
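BioPrediction-RPI combines the partial decisions of models trained on separate feature groups into one final decision. The abstract does not say how they are combined, so the sketch below assumes simple soft voting (averaging predicted probabilities) and derives an interpretability report from each group's margin relative to the decision threshold. The function name and feature-group names are hypothetical.

```python
from statistics import mean


def combine_partial_decisions(partial_probs, threshold=0.5):
    """Soft-vote the outputs of partial models trained on separate feature groups.

    partial_probs maps a feature-group name to the interaction probability
    predicted by that group's partial model. Returns the final 0/1 label, the
    averaged probability, and a small report ranking the groups by how strongly
    each one pushed toward its side of the threshold.
    """
    if not partial_probs:
        raise ValueError("need at least one partial model output")
    final_prob = mean(partial_probs.values())
    label = int(final_prob >= threshold)
    # Signed margin from the threshold: > 0 favours "interaction", < 0 favours "no interaction".
    report = sorted(
        ((group, prob - threshold) for group, prob in partial_probs.items()),
        key=lambda item: abs(item[1]),
        reverse=True,
    )
    return label, final_prob, report


# Hypothetical feature groups and probabilities, for illustration only.
label, prob, report = combine_partial_decisions(
    {"structural": 0.9, "topological": 0.7, "kmer": 0.2}
)
```

Averaging keeps every partial model's vote visible in the report; a learned meta-classifier (stacking) would be an equally plausible alternative.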
Affiliation(s)
- Bruno Rafael Florentino
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, 13566-590, São Paulo, Brazil
- Robson Parmezan Bonidia
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, 13566-590, São Paulo, Brazil
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, 86300-000, Paraná, Brazil
- Natan Henrique Sanches
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, 13566-590, São Paulo, Brazil
- Ulisses N. da Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Saxony, Germany
- André C.P.L.F. de Carvalho
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, 13566-590, São Paulo, Brazil
2
Zhang X, Zhao L, Chai Z, Wu H, Yang W, Li C, Jiang Y, Liu Q. NPI-DCGNN: An Accurate Tool for Identifying ncRNA-Protein Interactions Using a Dual-Channel Graph Neural Network. J Comput Biol 2024; 31:742-756. [PMID: 38923911 DOI: 10.1089/cmb.2023.0449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/28/2024]
Abstract
Noncoding RNA (ncRNA)-protein interactions (NPIs) play fundamentally important roles in carrying out cellular activities. Although various predictors based on molecular features and graphs have been published to boost the identification of NPIs, most of them either ignore the information between known NPIs or learn insufficiently from graphs, posing a significant challenge to identifying NPIs effectively. To develop a more reliable and accurate predictor for NPIs, in this article we propose NPI-DCGNN, an end-to-end NPI predictor based on a dual-channel graph neural network (DCGNN). NPI-DCGNN first treats the known NPIs as an ncRNA-protein bipartite graph. Then, for each ncRNA-protein pair, NPI-DCGNN extracts two local subgraphs from the bipartite graph, centered on the ncRNA and the protein, respectively. Next, it uses a GNN-based dual-channel graph representation learning layer to generate high-level feature representations for the ncRNA-protein pair. Finally, it employs a fully connected network and an output layer to predict whether an interaction exists between the ncRNA and the protein. Experimental results on four experimentally validated datasets demonstrate that NPI-DCGNN outperforms several state-of-the-art NPI predictors. Our case studies on the NPInter database further demonstrate the predictive power of NPI-DCGNN. With the source code available (https://github.com/zhangxin11111/NPI-DCGNN), we anticipate that NPI-DCGNN can facilitate studies of the ncRNA interactome by providing highly reliable NPI candidates for further experimental validation.
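The subgraph-extraction step can be sketched as a breadth-first search on the bipartite graph of known NPIs. This is a minimal sketch, assuming h-hop neighborhoods and removal of the query edge before extraction; neither detail comes from the abstract, and all function names are hypothetical.

```python
from collections import deque


def build_bipartite(pairs):
    """Undirected adjacency sets for known (ncRNA, protein) interactions."""
    adj = {}
    for rna, prot in pairs:
        adj.setdefault(rna, set()).add(prot)
        adj.setdefault(prot, set()).add(rna)
    return adj


def h_hop_subgraph(adj, center, h):
    """Return the set of nodes within h hops of `center` (breadth-first search)."""
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == h:
            continue  # do not expand beyond h hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def pair_subgraphs(pairs, rna, prot, h=1):
    """Two local subgraphs, one centered on each member of the query pair.

    The query edge itself is dropped first so the label cannot be read directly
    off the graph (an assumed link-prediction precaution).
    """
    known = [(r, p) for r, p in pairs if (r, p) != (rna, prot)]
    adj = build_bipartite(known)
    return h_hop_subgraph(adj, rna, h), h_hop_subgraph(adj, prot, h)
```

Each returned node set would then be turned into a subgraph and fed to one channel of the GNN.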
Affiliation(s)
- Xin Zhang
- College of Information Engineering, Northwest A&F University, Yangling, China
- Liangwei Zhao
- College of Information Engineering, Northwest A&F University, Yangling, China
- Ziyi Chai
- College of Information Engineering, Northwest A&F University, Yangling, China
- Hao Wu
- School of Software, Shandong University, Jinan, China
- Wei Yang
- National Clinical Research Center for Infectious Diseases, Shenzhen, China
- Chen Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
- Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling, China
3
Zhang X, Liu M, Li Z, Zhuo L, Fu X, Zou Q. Fusion of multi-source relationships and topology to infer lncRNA-protein interactions. Mol Ther Nucleic Acids 2024; 35:102187. [PMID: 38706631 PMCID: PMC11066462 DOI: 10.1016/j.omtn.2024.102187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/03/2024] [Indexed: 05/07/2024]
Abstract
Long non-coding RNAs (lncRNAs) are important factors in biological regulatory networks. Accurately predicting lncRNA-protein interactions (LPIs) is vital for clarifying the functions and pathogenic mechanisms of lncRNAs. Existing deep learning models have yet to yield satisfactory results in LPI prediction. Recently, graph autoencoders (GAEs) have developed rapidly, excelling in tasks such as link prediction and node classification. We applied GAEs to LPI prediction, devising the FMSRT-LPI model based on path masking and degree regression strategies, and achieved satisfactory outcomes. This is the first known integration of path masking and degree regression strategies into the GAE framework for inferring potential LPIs. The effectiveness of FMSRT-LPI relies primarily on four key aspects. First, within the GAE framework, our model integrates multi-source relationships of lncRNAs and proteins with the topological data of the lncRNA-protein network (LPN). Second, the masking strategy efficiently identifies the LPN's key paths, reconstructs the network, and reduces the impact of redundant or incorrect data. Third, the integrated degree decoder balances degree and structural information, enhancing node representations. Fourth, the PolyLoss function we introduced is better suited to LPI prediction tasks. Results on multiple public datasets further demonstrate our model's potential for LPI prediction.
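PolyLoss writes a classification loss as a polynomial expansion in (1 - p_t), where p_t is the predicted probability of the true class. Its simplest variant, Poly-1, adds one term to cross-entropy: L = -log(p_t) + ε(1 - p_t). The abstract does not state which PolyLoss variant or ε value FMSRT-LPI uses, so the binary Poly-1 sketch below (ε = 1 by default) is an assumption.

```python
import math


def poly1_bce(prob, label, epsilon=1.0, clip=1e-12):
    """Poly-1 loss for one binary prediction: -log(p_t) + epsilon * (1 - p_t)."""
    p = min(max(prob, clip), 1.0 - clip)  # avoid log(0)
    p_t = p if label == 1 else 1.0 - p    # probability assigned to the true class
    return -math.log(p_t) + epsilon * (1.0 - p_t)


def mean_poly1(probs, labels, epsilon=1.0):
    """Average Poly-1 loss over a batch of (probability, label) pairs."""
    losses = [poly1_bce(p, y, epsilon) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With epsilon = 0 the function reduces to ordinary binary cross-entropy; a positive epsilon adds extra penalty to confident mistakes on the true class.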
Affiliation(s)
- Xinyu Zhang
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Mingzhe Liu
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Zhen Li
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou 510000, China
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611730, China
4
Sun DZ, Sun ZL, Liu M, Yong SH. LPI-SKMSC: Predicting LncRNA-Protein Interactions with Segmented k-mer Frequencies and Multi-space Clustering. Interdiscip Sci 2024; 16:378-391. [PMID: 38206558 DOI: 10.1007/s12539-023-00598-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 11/25/2023] [Accepted: 12/05/2023] [Indexed: 01/12/2024]
Abstract
Long noncoding RNAs (lncRNAs) play significant regulatory roles in gene expression, and interacting with proteins is one of the ways they exert these roles. Because experiments to determine lncRNA-protein interactions (LPIs) are expensive and time-consuming, many computational methods for predicting LPIs have been proposed as alternatives. LPI prediction commonly faces an imbalance between positive and negative samples, yet few existing methods address this problem specifically. In this paper, we propose a new clustering-based LPI prediction method using segmented k-mer frequencies and multi-space clustering (LPI-SKMSC), designed to handle the imbalance between positive and negative samples. We constructed segmented k-mer frequencies to obtain global and local features of lncRNA and protein sequences. Multi-space clustering was then applied: convolutional neural network (CNN)-based encoders map different features of a sample to different spaces, and the multiple spaces jointly constrain the classification of samples. Finally, the distances between the encoder output features and the cluster center in each space are calculated, and the sum of distances across all spaces is compared with the cluster radius to predict LPIs. We performed cross-validation on 3 public datasets, and LPI-SKMSC showed the best performance compared with other existing methods. The experimental results showed that LPI-SKMSC predicted LPIs more effectively when faced with imbalanced positive and negative samples. In addition, we showed that our model was better at uncovering potential lncRNA-protein interaction pairs.
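Segmented k-mer frequencies combine a global k-mer profile of the whole sequence with local profiles of its segments. A minimal sketch follows, assuming equal-length segments and normalized counts; the abstract does not specify LPI-SKMSC's segmentation scheme or k, so the defaults (k = 2, two segments, RNA alphabet) are illustrative only. For proteins, the alphabet would be the 20 amino acids.

```python
from itertools import product


def kmer_freqs(seq, k, alphabet):
    """Normalized frequency of every k-mer over `alphabet`, in lexicographic order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing symbols outside the alphabet
            counts[j] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def segmented_kmer_features(seq, k=2, n_segments=2, alphabet="ACGU"):
    """Global k-mer profile followed by one profile per equal-length segment."""
    seq = seq.upper()
    features = kmer_freqs(seq, k, alphabet)   # global features
    seg_len = -(-len(seq) // n_segments)      # ceiling division
    for s in range(n_segments):
        segment = seq[s * seg_len:(s + 1) * seg_len]
        features += kmer_freqs(segment, k, alphabet)  # local features
    return features
```

The feature vector has len(alphabet)**k * (n_segments + 1) entries, so k and the number of segments trade resolution against dimensionality.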
Affiliation(s)
- Dian-Zheng Sun
- School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
- Zhan-Li Sun
- School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
- Mengya Liu
- School of Computer Science and Technology, Anhui University, Hefei, 230601, China
- Shuang-Hao Yong
- School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
5
Liang Y, Yin X, Zhang Y, Guo Y, Wang Y. Predicting lncRNA-protein interactions through deep learning framework employing multiple features and random forest algorithm. BMC Bioinformatics 2024; 25:108. [PMID: 38475723 DOI: 10.1186/s12859-024-05727-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Accepted: 03/01/2024] [Indexed: 03/14/2024]
Abstract
RNA-protein interaction (RPI) is crucial to the life processes of diverse organisms. Researchers have identified RPIs through long-term, high-cost biological experiments. Although numerous machine learning and deep learning-based methods for predicting RPI exist, their robustness and generalizability leave significant room for improvement. To address these issues, this study proposes LPI-MFF, an RPI prediction model based on multi-source information fusion. LPI-MFF uses protein-protein interaction features, sequence features, secondary structure features, and physicochemical properties as information sources, each with a corresponding coding scheme, followed by the random forest algorithm for feature screening. Finally, all information is combined and classified by a method based on convolutional neural networks. In fivefold cross-validation, LPI-MFF achieved accuracies of 97.60% on RPI1807 and 97.67% on NPInter. In addition, its accuracy was 84.9% on the independent test set RPI1168 and 90.91% on the Mus musculus dataset. Accordingly, LPI-MFF demonstrated greater robustness and generalization than other prevalent RPI prediction methods.
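The fivefold cross-validation accuracies above come from splitting the data into five folds, training on four and testing on the held-out fold in turn, then averaging. A generic sketch of the split and the accuracy metric follows; the shuffling seed and the unstratified split are assumptions, not LPI-MFF's documented protocol.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Every sample is tested exactly once, so the mean of the per-fold accuracies is a reasonable estimate of performance on unseen pairs drawn from the same dataset.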
Affiliation(s)
- Ying Liang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Zhimin Avenue, Nanchang, China
- XingRui Yin
- College of Computer and Information Engineering, Jiangxi Agricultural University, Zhimin Avenue, Nanchang, China
- YangSen Zhang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Zhimin Avenue, Nanchang, China
- You Guo
- First Affiliated Hospital, Gannan Medical University, Medical College Road, Ganzhou, China
- YingLong Wang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Zhimin Avenue, Nanchang, China
6
Xie W, Chen X, Zheng Z, Wang F, Zhu X, Lin Q, Sun Y, Wong KC. LncRNA-Top: Controlled deep learning approaches for lncRNA gene regulatory relationship annotations across different platforms. iScience 2023; 26:108197. [PMID: 37965148 PMCID: PMC10641498 DOI: 10.1016/j.isci.2023.108197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 08/10/2023] [Accepted: 10/10/2023] [Indexed: 11/16/2023]
Abstract
By sponging microRNAs (miRNAs), long non-coding RNAs (lncRNAs) can regulate gene expression. Few methods based on this mechanism have been developed to predict lncRNA-gene relationships. Hence, we present lncRNA-Top to predict potential lncRNA-gene regulatory relationships. Specifically, we constructed controlled deep learning methods using 12,417 lncRNAs and 16,127 genes. We provide retrospective and innovative views on negative sampling, random seeds, cross-validation, metrics, and independent datasets. The AUC, AUPR, and our own definition of precision@k were used to evaluate performance. In-depth case studies show that 47 of the top 100 predicted previously unknown pairs were reported in publications, supporting the method's predictive power. Our accompanying software can annotate the scores with target candidates. lncRNA-Top should be a helpful tool to uncover prospective lncRNA targets and better understand the regulatory processes of lncRNAs.
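lncRNA-Top uses its own definition of precision@k, which the abstract does not give. The sketch below uses the conventional definition instead (the fraction of the k top-ranked candidates that are known positives), so it may differ from the paper's metric. Under this conventional reading, the case study's 47 confirmed pairs among the top 100 would correspond to a precision@100 of 0.47. The function name and toy pairs are hypothetical.

```python
def precision_at_k(scores, positives, k):
    """Fraction of the k highest-scoring candidates that are known positives.

    scores maps a candidate (e.g. an (lncRNA, gene) tuple) to its predicted score;
    positives is the set of candidates confirmed by evidence.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for pair in ranked if pair in positives) / k
```

Dividing by k (rather than by the number of candidates actually ranked) penalizes a model that cannot supply k candidates.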
Affiliation(s)
- Weidun Xie
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zetian Zheng
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xiaowei Zhu
- Department of Neuroscience, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Qiuzhen Lin
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Yanni Sun
- Department of Electrical Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
- Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
7
Peng L, Yuan R, Han C, Han G, Tan J, Wang Z, Chen M, Chen X. CellEnBoost: A Boosting-Based Ligand-Receptor Interaction Identification Model for Cell-to-Cell Communication Inference. IEEE Trans Nanobioscience 2023; 22:705-715. [PMID: 37216267 DOI: 10.1109/tnb.2023.3278685] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 05/24/2023]
Abstract
Cell-to-cell communication (CCC) plays important roles in multicellular organisms. Identifying communication among cancer cells themselves and between cancer cells and normal cells in the tumor microenvironment helps in understanding cancer genesis, development, and metastasis. CCC is usually mediated by ligand-receptor interactions (LRIs). In this manuscript, we developed a boosting-based LRI identification model (CellEnBoost) for CCC inference. First, potential LRIs are predicted through data collection, feature extraction, dimensionality reduction, and classification with an ensemble of a light gradient boosting machine (LightGBM) and AdaBoost combined with a convolutional neural network. Next, the predicted and known LRIs are filtered. Third, the filtered LRIs are applied to CCC elucidation by combining a CCC strength measurement with single-cell RNA sequencing data. Finally, CCC inference results are visualized using a heatmap view, a Circos plot view, and a network view. The experimental results show that CellEnBoost obtained the best AUCs and AUPRs on the four collected LRI datasets. A case study in human head and neck squamous cell carcinoma (HNSCC) tissues shows that fibroblasts were more likely to communicate with HNSCC cells, in accordance with the results from iTALK. We anticipate that this work can contribute to the diagnosis and treatment of cancers.
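The CCC strength measurement is not defined in the abstract. One commonly used scoring rule, assumed here rather than taken from CellEnBoost, multiplies the mean ligand expression in the sender cell type by the mean receptor expression in the receiver cell type and sums over the filtered LRIs. Function names and the toy data are hypothetical.

```python
from statistics import mean


def lri_strength(expr, cell_types, ligand, receptor, sender, receiver):
    """Assumed strength of one ligand-receptor pair from `sender` to `receiver`.

    expr maps gene -> per-cell expression values, in the same cell order as
    cell_types (one cell-type label per cell).
    """
    send = [v for v, t in zip(expr[ligand], cell_types) if t == sender]
    recv = [v for v, t in zip(expr[receptor], cell_types) if t == receiver]
    if not send or not recv:
        return 0.0
    return mean(send) * mean(recv)


def ccc_matrix(expr, cell_types, lris):
    """Total strength over all (ligand, receptor) pairs for every sender/receiver pair."""
    types = sorted(set(cell_types))
    return {
        (s, r): sum(lri_strength(expr, cell_types, lig, rec, s, r) for lig, rec in lris)
        for s in types
        for r in types
    }
```

A matrix of this shape is what a heatmap or Circos view would typically display.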
8
Su Z, Lu H, Wu Y, Li Z, Duan L. Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM. Front Genet 2023; 14:1238095. [PMID: 37655066 PMCID: PMC10466784 DOI: 10.3389/fgene.2023.1238095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Accepted: 07/19/2023] [Indexed: 09/02/2023]
Abstract
Introduction: Lung cancer is one of the most frequent neoplasms worldwide, with approximately 2.2 million new cases and 1.8 million deaths each year. The expression levels of programmed death ligand-1 (PDL1) show a complex association with lung cancer. Neuroblastoma is a high-risk malignant tumor that mainly affects children. Identifying new biomarkers for these two diseases could significantly advance their diagnosis and therapy. However, in vivo experiments to discover potential biomarkers are costly and laborious. Artificial intelligence technologies, especially machine learning methods, therefore provide a powerful avenue for finding new biomarkers for various diseases. Methods: We developed a machine learning-based method named LDAenDL to detect potential long noncoding RNA (lncRNA) biomarkers for lung cancer and neuroblastoma using an ensemble of a deep neural network and LightGBM. LDAenDL first computes the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases to obtain their similarity networks. Next, LDAenDL combines a graph convolutional network, a graph attention network, and a convolutional neural network to learn the biological features of lncRNAs and diseases from their similarity networks. Third, these features are concatenated and fed to an ensemble model composed of a deep neural network and LightGBM to find new lncRNA-disease associations (LDAs). Finally, LDAenDL is applied to identify possible lncRNA biomarkers associated with lung cancer and neuroblastoma. Results: LDAenDL achieved the best AUCs of 0.8701, 0.8953, and 0.9110 under cross-validation on lncRNAs, diseases, and lncRNA-disease pairs on Dataset 1, respectively, and 0.9490, 0.9157, and 0.9708 on Dataset 2, respectively.
Furthermore, AUPRs of 0.8903, 0.9061, and 0.9166 under the three cross-validations were obtained on Dataset 1, and 0.9582, 0.9122, and 0.9743 on Dataset 2. The results demonstrate that LDAenDL significantly outperformed four other classical LDA prediction methods (SDLDA, LDNFSGB, IPCAF, and LDASR). Case studies suggest that CCDC26 and IFNG-AS1 may be new biomarkers of lung cancer, SNHG3 may be associated with PDL1 in lung cancer, and HOTAIR and BDNF-AS may be potential biomarkers of neuroblastoma. Conclusion: We hope that LDAenDL can help the development of targeted therapies for these two diseases.
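For the Gaussian kernel similarity step, a widely used formulation is the Gaussian interaction profile kernel, KS(i, j) = exp(-γ‖IP(i) - IP(j)‖²) with γ = 1 / ((1/n) Σ‖IP(i)‖²), where IP(i) is entity i's row of the association matrix. The abstract does not give LDAenDL's exact formula, so this is a sketch under that assumption. Rows of the lncRNA-disease association matrix give lncRNA similarity; passing the transposed matrix gives disease similarity.

```python
import math


def gaussian_kernel_similarity(assoc):
    """Gaussian interaction profile kernel similarity between rows of `assoc`.

    assoc[i] is the binary association profile of entity i. The bandwidth is
    normalized by the mean squared profile norm, with the base bandwidth set to 1.
    """
    n = len(assoc)
    mean_sq_norm = sum(sum(x * x for x in row) for row in assoc) / n
    gamma = 1.0 / mean_sq_norm if mean_sq_norm else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim
```

Entities with identical association profiles get similarity 1, and similarity decays with the squared Euclidean distance between profiles.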
Affiliation(s)
- Zhenguo Su
- Clinical Lab, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, China
- Huihui Lu
- Department of Thoracic Cardiovascular Surgery, Hunan Province Directly Affiliated TCM Hospital, Zhuzhou, China
- Yan Wu
- Geneis (Beijing) Co., Ltd., Beijing, China
- Zejun Li
- School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Lian Duan
- Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
- Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
- Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
9
Kim Y, Lee M. Deep Learning Approaches for lncRNA-Mediated Mechanisms: A Comprehensive Review of Recent Developments. Int J Mol Sci 2023; 24:10299. [PMID: 37373445 DOI: 10.3390/ijms241210299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 06/17/2023] [Indexed: 06/29/2023]
Abstract
This review paper provides an extensive analysis of the rapidly evolving convergence of deep learning and long non-coding RNAs (lncRNAs). Considering the recent advancements in deep learning and the increasing recognition of lncRNAs as crucial components in various biological processes, this review aims to offer a comprehensive examination of these intertwined research areas. The remarkable progress in deep learning necessitates thoroughly exploring its latest applications in the study of lncRNAs. Therefore, this review provides insights into the growing significance of incorporating deep learning methodologies to unravel the intricate roles of lncRNAs. By scrutinizing the most recent research spanning from 2021 to 2023, this paper provides a comprehensive understanding of how deep learning techniques are employed in investigating lncRNAs, thereby contributing valuable insights to this rapidly evolving field. The review is aimed at researchers and practitioners looking to integrate deep learning advancements into their lncRNA studies.
Affiliation(s)
- Yoojoong Kim
- School of Computer Science and Information Engineering, The Catholic University of Korea, Bucheon 14662, Republic of Korea
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
10
Gu C, Li X. Prediction of disease-related miRNAs by voting with multiple classifiers. BMC Bioinformatics 2023; 24:177. [PMID: 37122001 PMCID: PMC10150488 DOI: 10.1186/s12859-023-05308-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 04/26/2023] [Indexed: 05/02/2023]
Abstract
There is strong evidence that mutations and dysregulation of miRNAs are associated with a variety of diseases, including cancer. However, the experimental methods used to identify disease-related miRNAs are expensive and time-consuming. Effective computational approaches to identify disease-related miRNAs are in high demand and would aid in the detection of miRNA biomarkers for disease diagnosis, treatment, and prevention. In this study, we develop an ensemble learning framework to reveal potential associations between miRNAs and diseases (ELMDA). The ELMDA framework does not rely on known associations when calculating miRNA and disease similarities, and it uses voting across multiple classifiers to predict disease-related miRNAs. The average AUC of the ELMDA framework was 0.9229 on the HMDD v2.0 database in fivefold cross-validation. All potential associations in the HMDD v2.0 database were predicted, and 90% of the top 50 results were verified with the updated HMDD v3.2 database. ELMDA was applied to gastric neoplasms, prostate neoplasms, and colon neoplasms, and 100%, 94%, and 90%, respectively, of the top 50 potential miRNAs were validated by the HMDD v3.2 database. Moreover, the ELMDA framework can predict miRNAs related to isolated diseases. In conclusion, ELMDA appears to be a reliable method to uncover disease-associated miRNAs.
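ELMDA predicts disease-related miRNAs by voting across multiple classifiers. A minimal hard-majority-vote sketch follows, together with the top-k validation ratio used in the abstract (90% of the top 50 corresponds to 45 confirmed pairs). The tie-breaking rule and function names are assumptions; the abstract does not say whether ELMDA uses hard or soft voting.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier 0/1 labels for each sample by majority vote.

    predictions[c][i] is classifier c's label for sample i. Ties are broken
    toward the positive class (an arbitrary, assumed rule).
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(clf[i] for clf in predictions)
        combined.append(1 if votes[1] >= votes[0] else 0)
    return combined


def top_k_validated(ranked_pairs, validated, k=50):
    """Share of the k top-ranked candidate pairs found in an updated database."""
    return sum(p in validated for p in ranked_pairs[:k]) / k
```

Using an odd number of classifiers avoids ties altogether, which makes the tie-breaking rule irrelevant.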
Affiliation(s)
- Changlong Gu
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xiaoying Li
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
11
Zhao Z, Luo Q, Liu Y, Jiang K, Zhou L, Dai R, Wang H. Multi-level integrative analysis of the roles of lncRNAs and differential mRNAs in the progression of chronic pancreatitis to pancreatic ductal adenocarcinoma. BMC Genomics 2023; 24:101. [PMID: 36879212 PMCID: PMC9990329 DOI: 10.1186/s12864-023-09209-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 02/27/2023] [Indexed: 03/08/2023]
Abstract
BACKGROUND Pancreatic ductal adenocarcinoma (PDAC) is one of the most malignant tumors, and approximately 5% of patients with chronic pancreatitis (CP) eventually develop PDAC. This study aims to explore the key gene regulation involved in the progression of CP to PDAC, with a particular emphasis on the function of lncRNAs. RESULTS A total of 103 pancreatic tissue samples, collected from 11 patients with CP and 92 patients with PDAC, were included in this study. After the original data were normalized and log-transformed, differentially expressed lncRNAs (DElncRNAs) and mRNAs (DEGs) in each dataset were selected. To determine the main functional pathways of the differential mRNAs, we further annotated DEGs using gene ontology (GO) and analyzed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. In addition, lncRNA-miRNA-mRNA interactions were clarified, and a protein-protein interaction (PPI) network was constructed to screen for key modules and determine hub genes. Finally, quantitative real-time polymerase chain reaction (qPCR) was used to detect changes in non-coding RNAs and key mRNAs in the pancreatic tissues of patients with CP and PDAC. In this study, 230 lncRNAs and 17,668 mRNAs were included. There were nine upregulated and 188 downregulated lncRNAs. Furthermore, 2,334 upregulated and 10,341 downregulated differential mRNAs were included in the enrichment analysis. In the KEGG enrichment analysis, cytokine-cytokine receptor interaction, the calcium signaling pathway, the cAMP signaling pathway, and nicotine addiction showed significant differences. Additionally, a total of 52 lncRNAs, 104 miRNAs, and 312 mRNAs were included in the construction of a potential lncRNA-miRNA-mRNA regulatory network.
A PPI network was established, and two of the five central DEGs were located in this module, suggesting that lysophosphatidic acid receptor 1 (LPAR1) and regulator of calcineurin 2 (RCAN2) may play significant roles in the progression from CP to PDAC. Finally, the PCR results suggested that LINC01547/hsa-miR-4694-3p/LPAR1 and LINC00482/hsa-miR-6756-3p/RCAN2 play important roles in the carcinogenesis of CP. CONCLUSION Two signaling axes critical in the progression of CP to PDAC were identified. Our findings may provide novel insights into the molecular mechanism of this progression and into potential diagnostic or therapeutic biomarkers for CP and PDAC.
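The selection of differentially expressed genes after log transformation can be sketched as a fold-change filter. This minimal sketch assumes a log2(x + 1) transform and a |log2 fold change| ≥ 1 cutoff, neither of which is stated in the abstract, and it omits the statistical testing a real analysis would use. Gene names in the example are placeholders.

```python
import math
from statistics import mean


def differential_genes(case, control, lfc_cut=1.0):
    """Split genes into up- and down-regulated lists by log2 fold change.

    case/control map gene -> list of normalized expression values. Values are
    log2(x + 1) transformed before averaging. Significance testing (e.g. a
    t-test with multiple-testing correction) is deliberately omitted.
    """
    up, down = [], []
    for gene in case:
        lfc = mean(math.log2(x + 1) for x in case[gene]) - mean(
            math.log2(x + 1) for x in control[gene]
        )
        if lfc >= lfc_cut:
            up.append(gene)
        elif lfc <= -lfc_cut:
            down.append(gene)
    return up, down
```

The +1 pseudocount keeps zero counts finite after the log transform.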
Affiliation(s)
- Zhirong Zhao
- Affiliated Hospital of Southwest Jiaotong University, The General Hospital of Western Theater Command, Chengdu, 610031, Sichuan, China
- Pancreatic injury and repair Key laboratory of Sichuan Province, The General Hospital of Western Theater Command, Chengdu, Sichuan, China
- Qiang Luo
- Department of Cardiology, Affiliated Hospital of Southwest Jiaotong University, The Third People's Hospital of Chengdu, Chengdu, Sichuan, China
- Yi Liu
- School of Medicine, Jianghan University, 430056, Wuhan, Hubei, China
- Kexin Jiang
- Affiliated Hospital of Southwest Jiaotong University, The General Hospital of Western Theater Command, Chengdu, 610031, Sichuan, China
- Lichen Zhou
- Affiliated Hospital of Southwest Jiaotong University, The General Hospital of Western Theater Command, Chengdu, 610031, Sichuan, China
- Ruiwu Dai
- Affiliated Hospital of Southwest Jiaotong University, The General Hospital of Western Theater Command, Chengdu, 610031, Sichuan, China
- Pancreatic injury and repair Key laboratory of Sichuan Province, The General Hospital of Western Theater Command, Chengdu, Sichuan, China
- Han Wang
- Department of Cardiology, Affiliated Hospital of Southwest Jiaotong University, The Third People's Hospital of Chengdu, Chengdu, Sichuan, China
12
Fu Y, Si A, Wei X, Lin X, Ma Y, Qiu H, Guo Z, Pan Y, Zhang Y, Kong X, Li S, Shi Y, Wu H. Combining a machine-learning derived 4-lncRNA signature with AFP and TNM stages in predicting early recurrence of hepatocellular carcinoma. BMC Genomics 2023; 24:89. [PMID: 36849926 PMCID: PMC9972730 DOI: 10.1186/s12864-023-09194-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/09/2023] [Accepted: 02/17/2023] [Indexed: 03/01/2023]
Abstract
BACKGROUND Nearly 70% of hepatocellular carcinoma (HCC) recurrences are early recurrences within 2 years after surgery. Long non-coding RNAs (lncRNAs) are intensively involved in HCC progression and serve as biomarkers for HCC prognosis. The aim of this study is to construct a lncRNA-based signature for predicting early HCC recurrence. METHODS RNA expression data and associated clinical information were accessed from The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) database. Recurrence-associated differentially expressed lncRNAs (DELncs) were determined by three differential expression methods and two survival analysis methods. DELncs included in the signature were selected by three machine learning methods and multivariate Cox analysis. Additionally, the signature was validated in an external cohort of HCC patients. To gain insight into the biological functions of this signature, gene set enrichment analyses, immune infiltration analyses, and immune and drug therapy prediction analyses were conducted. RESULTS A 4-lncRNA signature consisting of AC108463.1, AF131217.1, CMB9-22P13.1, and TMCC1-AS1 was constructed. Patients in the high-risk group showed a significantly higher early recurrence rate than those in the low-risk group. Combining the signature with AFP and TNM further improved predictive performance for early HCC recurrence. Several molecular pathways and gene sets associated with HCC pathogenesis are enriched in the high-risk group. Antitumor immune cells, such as activated B cells, type 1 T helper cells, natural killer cells, and effector memory CD8 T cells, are enriched in patients with low-risk HCCs. HCC patients in the low- and high-risk groups had different sensitivities to various antitumor drugs. Finally, the predictive performance of this signature was validated in an external cohort of patients with HCC. CONCLUSION Combined with TNM and AFP, the 4-lncRNA signature shows excellent predictive ability for early HCC recurrence.
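A signature built with multivariate Cox analysis typically scores each patient with the linear predictor Σ βᵢxᵢ (coefficient times expression, summed over the signature lncRNAs) and splits patients into high- and low-risk groups. The abstract does not report the coefficients for AC108463.1, AF131217.1, CMB9-22P13.1, and TMCC1-AS1 or the cutoff used, so the lncRNA names, coefficients, and median cutoff below are hypothetical.

```python
from statistics import median


def risk_scores(expr, coefs):
    """Cox linear predictor per patient: sum of coefficient * expression.

    expr maps patient id -> {lncRNA: expression}; coefs maps lncRNA -> coefficient.
    """
    return {pid: sum(coefs[g] * values[g] for g in coefs) for pid, values in expr.items()}


def stratify(scores):
    """Label patients high- or low-risk using the median score as the cutoff."""
    cut = median(scores.values())
    return {pid: ("high" if s > cut else "low") for pid, s in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefs = {"lncA": 0.5, "lncB": -0.2}
expr = {
    "P1": {"lncA": 2.0, "lncB": 1.0},
    "P2": {"lncA": 0.0, "lncB": 1.0},
    "P3": {"lncA": 1.0, "lncB": 0.0},
}
groups = stratify(risk_scores(expr, coefs))
```

A positive coefficient marks a lncRNA whose higher expression raises the estimated hazard; an optimal cutoff chosen from survival curves is a common alternative to the median.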
Affiliation(s)
- Yi Fu
- Shanghai Key Laboratory of Molecular Imaging, Zhoupu Hospital, Shanghai University of Medicine and Health Sciences, Shanghai, China; Collaborative Innovation Center for Biomedicines, Shanghai University of Medicine and Health Sciences, Shanghai, China; School of Medical Instruments, Shanghai University of Medicine and Health Sciences, Shanghai, China
- Anfeng Si
- Department of Surgical Oncology, Jinling Hospital, Medical School of Nanjing University, Nanjing, China
- Xindong Wei
- Central Laboratory, Department of Liver Diseases, Shuguang Hospital, Shanghai University of Chinese Traditional Medicine, Shanghai, China
- Xinjie Lin
- Shanghai Key Laboratory of Molecular Imaging, Zhoupu Hospital, Shanghai University of Medicine and Health Sciences, Shanghai, China; Collaborative Innovation Center for Biomedicines, Shanghai University of Medicine and Health Sciences, Shanghai, China
- Yujie Ma
- Shanghai Key Laboratory of Molecular Imaging, Zhoupu Hospital, Shanghai University of Medicine and Health Sciences, Shanghai, China; Collaborative Innovation Center for Biomedicines, Shanghai University of Medicine and Health Sciences, Shanghai, China
- Huimin Qiu
- Collaborative Innovation Center for Biomedicines, Shanghai University of Medicine and Health Sciences, Shanghai, China; School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Zhinan Guo
- Collaborative Innovation Center for Biomedicines, Shanghai University of Medicine and Health Sciences, Shanghai, China; School of Kinesiology, Shanghai University of Sport, Shanghai, China
- Yong Pan
- Department of Infectious Disease, Zhoushan Hospital, Wenzhou Medical University, Zhoushan, China
- Yiru Zhang
- Department of Infectious Disease, Zhoushan Hospital, Wenzhou Medical University, Zhoushan, China
- Xiaoni Kong
- Central Laboratory, Department of Liver Diseases, Shuguang Hospital, Shanghai University of Chinese Traditional Medicine, Shanghai, China
- Shibo Li
- Department of Infectious Disease, Zhoushan Hospital, Wenzhou Medical University, Zhoushan, China
- Yanjun Shi
- Abdominal Transplantation Center, General Surgery, School of Medicine, Ruijin Hospital, Shanghai Jiao Tong University, Shanghai, China
- Hailong Wu
- Shanghai Key Laboratory of Molecular Imaging, Zhoupu Hospital, Shanghai University of Medicine and Health Sciences, Shanghai, China; Collaborative Innovation Center for Biomedicines, Shanghai University of Medicine and Health Sciences, Shanghai, China; School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China; School of Kinesiology, Shanghai University of Sport, Shanghai, China
13
Ma Y, Zhang H, Jin C, Kang C. Predicting lncRNA-protein interactions with bipartite graph embedding and deep graph neural networks. Front Genet 2023; 14:1136672. [PMID: 36845380 PMCID: PMC9948011 DOI: 10.3389/fgene.2023.1136672] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/03/2023] [Accepted: 01/30/2023] [Indexed: 02/11/2023]
Abstract
Background: Long non-coding RNAs (lncRNAs) play crucial roles in numerous biological processes. Investigating lncRNA-protein interactions helps uncover undiscovered molecular functions of lncRNAs. In recent years, computational approaches have increasingly replaced time-consuming traditional experiments for uncovering unknown associations. However, the heterogeneity of lncRNA-protein association prediction remains insufficiently explored, and integrating this heterogeneity into graph neural network (GNN) algorithms remains challenging. Methods: In this paper, we constructed a deep GNN-based architecture, BiHo-GNN, which is the first to integrate the properties of homogeneous and heterogeneous networks through bipartite graph embedding. Unlike previous work, BiHo-GNN captures the mechanism of molecular association through a data encoder for heterogeneous networks. We also designed a mutual optimization process between the homogeneous and heterogeneous networks, which improves the robustness of BiHo-GNN. Results: We collected four datasets for predicting lncRNA-protein interactions and compared current prediction models on a benchmark dataset. BiHo-GNN outperformed existing bipartite graph-based methods. Conclusion: BiHo-GNN integrates the bipartite graph with homogeneous graph networks. With this model structure, lncRNA-protein interactions and potential associations can be predicted and discovered accurately.
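The bipartite view treats lncRNAs and proteins as two node sets joined only by interaction edges. As a minimal sketch of one message-passing step on such a graph, each node below receives the mean feature vector of its neighbors on the other side. This is plain mean aggregation chosen for illustration, not BiHo-GNN's encoder (which the abstract does not specify), and the node names and features are hypothetical:

```python
from collections import defaultdict


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def aggregate_bipartite(edges, lnc_features, prot_features):
    """One mean-aggregation step across a bipartite lncRNA-protein graph.

    edges: iterable of (lncRNA_id, protein_id) interaction pairs.
    Returns (lncRNA messages, protein messages): each lncRNA receives the mean
    feature vector of its interacting proteins, and each protein the mean of its lncRNAs.
    """
    lnc_nbrs = defaultdict(list)
    prot_nbrs = defaultdict(list)
    for lnc, prot in edges:
        lnc_nbrs[lnc].append(prot)
        prot_nbrs[prot].append(lnc)
    prot_dim = len(next(iter(prot_features.values())))
    lnc_dim = len(next(iter(lnc_features.values())))
    lnc_msgs = {
        lnc: mean_vector([prot_features[p] for p in lnc_nbrs[lnc]], prot_dim)
        for lnc in lnc_features
    }
    prot_msgs = {
        prot: mean_vector([lnc_features[n] for n in prot_nbrs[prot]], lnc_dim)
        for prot in prot_features
    }
    return lnc_msgs, prot_msgs
```

In a learned model these messages would then pass through trainable weights and a nonlinearity; those are omitted to keep the aggregation step visible.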
Affiliation(s)
- Yuzhou Ma
- College of Artificial Intelligence, Nankai University, Tianjin, China
- Han Zhang
- College of Artificial Intelligence, Nankai University, Tianjin, China; *Correspondence: Han Zhang
- Chen Jin
- College of Computer Science, Nankai University, Tianjin, China
- Chuanze Kang
- College of Artificial Intelligence, Nankai University, Tianjin, China
14
Li S, Chang M, Tong L, Wang Y, Wang M, Wang F. Screening potential lncRNA biomarkers for breast cancer and colorectal cancer combining random walk and logistic matrix factorization. Front Genet 2023; 13:1023615. [PMID: 36744179 PMCID: PMC9895102 DOI: 10.3389/fgene.2022.1023615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 10/10/2022] [Indexed: 01/21/2023]
Abstract
Breast cancer and colorectal cancer are two of the most common malignant tumors worldwide and are among the leading causes of cancer mortality. Many studies have demonstrated that long noncoding RNAs (lncRNAs) are closely linked to the occurrence and development of both cancers. It is therefore essential to design an effective way to identify potential lncRNA biomarkers for them. In this study, we developed a computational method (LDA-RWLMF) that integrates random walk with restart and logistic matrix factorization to investigate the roles of lncRNA biomarkers in the prognosis and diagnosis of the two cancers. First, we fuse the semantic and Gaussian association profile similarities of diseases, and the functional and Gaussian association profile similarities of lncRNAs. Second, we design a negative selection algorithm based on random walk to extract negative lncRNA-disease associations (LDAs). Third, we develop a logistic matrix factorization model to predict possible LDAs. We compare LDA-RWLMF with four classical LDA prediction methods: LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM. Under 5-fold cross-validation on the MNDR dataset, LDA-RWLMF achieves the best AUC of 0.9312, outperforming all four methods. Finally, having established the performance of LDA-RWLMF, we rank all candidate lncRNA biomarkers for each of the two cancers. We find that 48 and 50 lncRNAs, respectively, have the highest association scores with breast cancer and colorectal cancer among all lncRNAs known to be associated with them in the MNDR dataset. We predict that lncRNAs HULC and HAR1A may be potential biomarkers for breast cancer and colorectal cancer, respectively, pending biomedical experimental validation.
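The Gaussian association profile similarity mentioned above is commonly computed as a Gaussian kernel over rows of the binary association matrix, with the bandwidth normalized by the mean squared profile norm. A minimal sketch following that common convention, with two similarity matrices fused by simple averaging; both the normalization and the averaging rule are assumptions, since the abstract does not give LDA-RWLMF's exact formulas:

```python
import math


def gip_similarity(assoc):
    """Gaussian interaction profile (GIP) kernel over the rows of a 0/1 association matrix.

    assoc[i] is entity i's association profile (e.g. an lncRNA's row across diseases).
    Bandwidth gamma = 1 / mean squared row norm, a common normalization (assumed here).
    """
    n = len(assoc)
    mean_sq_norm = sum(sum(x * x for x in row) for row in assoc) / n
    gamma = 1.0 / mean_sq_norm if mean_sq_norm > 0 else 1.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


def fuse(sim_a, sim_b):
    """Element-wise average of two similarity matrices (one simple fusion rule, assumed here)."""
    return [[(x + y) / 2 for x, y in zip(row_a, row_b)] for row_a, row_b in zip(sim_a, sim_b)]
```

For diseases, the same function would be applied to the columns of the lncRNA-disease matrix (i.e. the rows of its transpose).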
15
Zhang P, Sun W, Wei D, Li G, Xu J, You Z, Zhao B, Li L. PDA-PRGCN: identification of Piwi-interacting RNA-disease associations through subgraph projection and residual scaling-based feature augmentation. BMC Bioinformatics 2023; 24:18. [PMID: 36650439 PMCID: PMC9843905 DOI: 10.1186/s12859-022-05073-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/12/2021] [Accepted: 05/10/2022] [Indexed: 01/19/2023]
Abstract
BACKGROUND Emerging evidence shows that Piwi-interacting RNAs (piRNAs) play a pivotal role in numerous complex human diseases. Identifying potential piRNA-disease associations (PDAs) is crucial for understanding disease pathogenesis at the molecular level. Compared with wet-lab experiments, computational methods provide a cost-effective strategy; however, few such methods have been developed so far. RESULTS Here, we propose an end-to-end model, PDA-PRGCN (PDA prediction using subgraph Projection and Residual scaling-based feature augmentation through Graph Convolutional Network). Specifically, starting from the known piRNA-disease associations represented as a graph, we applied subgraph projection, for the first time, to construct piRNA-piRNA and disease-disease subgraphs, followed by a residual scaling-based feature augmentation algorithm for initial node representations. We then adopted a graph convolutional network (GCN) to identify potential PDAs as a link prediction task on the constructed heterogeneous graph. Comprehensive experiments, including a performance comparison of the individual components of PDA-PRGCN, showed that integrating subgraph projection, node feature augmentation and a dual-loss mechanism into the GCN significantly improves PDA prediction. Compared with state-of-the-art approaches, PDA-PRGCN gave more accurate and robust predictions. Finally, case studies further corroborated that PDA-PRGCN can reliably detect PDAs. CONCLUSION PDA-PRGCN provides a powerful method for PDA prediction and can also serve as a screening tool for studies of complex diseases.
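Subgraph projection can be illustrated by projecting the bipartite piRNA-disease association matrix onto a piRNA-piRNA graph in which two piRNAs are linked with a weight equal to the number of diseases they share; the same construction on the transposed matrix gives a disease-disease graph. The second function applies one propagation step of the standard symmetric-normalized GCN rule D^-1/2 (A + I) D^-1/2 H, without learnable weights or activation. This is a generic sketch under those assumptions, not PDA-PRGCN's exact projection or its residual-scaling feature augmentation:

```python
import math


def project(assoc):
    """Project a piRNA x disease 0/1 matrix onto a weighted piRNA-piRNA graph.

    Entry (i, j) counts the diseases shared by piRNA i and piRNA j; the diagonal is zero.
    """
    n = len(assoc)
    proj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                proj[i][j] = sum(a * b for a, b in zip(assoc[i], assoc[j]))
    return proj


def gcn_propagate(adj, features):
    """One GCN propagation step D^-1/2 (A + I) D^-1/2 H (no weight matrix, no activation)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # weighted degree including the self-loop
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if a_hat[i][j]:
                weight = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k in range(dim):
                    out[i][k] += weight * features[j][k]
    return out
```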
Affiliation(s)
- Ping Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Weicheng Sun
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Dengguo Wei
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Shenzhen, 518000, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
- Guodong Li
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jinsheng Xu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Zhuhong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710129, China
- Bowei Zhao
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Li Li
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, People's Republic of China
16
Peng Y, Zhao S, Zeng Z, Hu X, Yin Z. LGBMDF: A cascade forest framework with LightGBM for predicting drug-target interactions. Front Microbiol 2023; 13:1092467. [PMID: 36687573 PMCID: PMC9849804 DOI: 10.3389/fmicb.2022.1092467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 12/07/2022] [Indexed: 01/07/2023]
Abstract
Prediction of drug-target interactions (DTIs) plays an important role in drug development. However, determining DTIs with traditional laboratory methods is time-consuming and costly. In recent years, many studies have shown that predicting DTIs with machine learning can speed up drug development and reduce costs. An excellent DTI prediction method should combine high prediction accuracy with low computational cost. In this study, noting that previous deep-forest research used XGBoost as the estimator in the cascade, we used LightGBM instead of XGBoost as the estimator in the cascade forest. The estimator group was then determined experimentally to be three LightGBMs and three ExtraTrees; we call this new model LGBMDF. We conducted 5-fold cross-validation of LGBMDF and other state-of-the-art methods on the same dataset and compared their sensitivity (Sn), specificity (Sp), Matthews correlation coefficient (MCC), AUC and AUPR. Our method showed better performance and faster computation.
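The Sn, Sp and MCC values used in this comparison follow standard confusion-matrix definitions. A minimal sketch, assuming binary labels (1 = interaction, 0 = no interaction) and hard predictions at a fixed threshold; AUC and AUPR, which require ranked scores, are omitted:

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sn_sp_mcc(y_true, y_pred):
    """Sensitivity TP/(TP+FN), specificity TN/(TN+FP) and Matthews correlation coefficient.

    Undefined ratios (zero denominators) are reported as 0.0, a common convention.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc
```

For example, three true interactions with two recovered and three non-interactions with one false alarm give Sn = Sp = 2/3 and MCC = 1/3.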
17
Zhong W, He C, Xiao C, Liu Y, Qin X, Yu Z. Long-distance dependency combined multi-hop graph neural networks for protein-protein interactions prediction. BMC Bioinformatics 2022; 23:521. [PMID: 36471248 PMCID: PMC9724439 DOI: 10.1186/s12859-022-05062-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2022] [Accepted: 11/16/2022] [Indexed: 12/10/2022]
Abstract
BACKGROUND Protein-protein interactions are widespread in biological systems and play an important role in cell biology. Because traditional laboratory-based methods are time-consuming and costly, a large number of deep learning methods have emerged. However, these methods do not take into account long-distance dependency information between pairs of amino acids in a sequence. In addition, most existing graph neural network models aggregate only first-order neighbors in the protein-protein interaction (PPI) network. Although multi-order neighbor information can be aggregated by increasing the number of network layers, doing so easily causes over-fitting. It is therefore necessary to design a network that captures long-distance dependency information between amino acids in the sequence and directly captures multi-order neighbor information in the PPI network. RESULTS In this study, we propose a multi-hop neural network model combining long-distance dependency information (LDMGNN) to predict multi-label protein-protein interactions. In LDMGNN, we design a protein amino acid sequence encoding (PAASE) module with a multi-head self-attention Transformer block, which extracts amino acid sequence features by computing the interdependence between every pair of amino acids. We also expand the spatial receptive field by constructing a two-hop protein-protein interaction (THPPI) network. We combine the PPI network and the THPPI network, respectively, with the amino acid sequence features and feed them simultaneously into two identical GIN blocks to obtain two embeddings. The two embeddings are then fused and fed into a classifier to predict multi-label protein-protein interactions. Compared with other state-of-the-art methods, LDMGNN shows the best performance on both the SHS27K and SHS148K datasets. Ablation experiments show that the PAASE module and the construction of the THPPI network are feasible and effective. CONCLUSIONS Overall, our proposed LDMGNN model achieves satisfactory results in predicting multi-label protein-protein interactions.
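A two-hop network such as THPPI can be sketched by linking every pair of proteins joined by a path of length two in the original PPI graph. Whether pairs that already interact directly are also kept is not stated in the abstract, so the sketch exposes that choice as a hypothetical `exclude_direct` flag; protein identifiers are illustrative:

```python
def two_hop_edges(edges, exclude_direct=True):
    """Return undirected pairs (u, w), u < w, connected by a path u - v - w.

    edges: iterable of (protein_a, protein_b) pairs, treated as undirected.
    exclude_direct (hypothetical option): drop pairs that are already direct neighbors.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    result = set()
    for nbrs in adj.values():  # every node acts once as the middle vertex v
        for u in nbrs:
            for w in nbrs:
                if u < w and not (exclude_direct and w in adj[u]):
                    result.add((u, w))
    return result
```

Feeding both the original edge set and this two-hop edge set to a standard one-hop GNN layer is one way to reach second-order neighbors without stacking extra layers.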
Affiliation(s)
- Wen Zhong
- College of Science, University of Shanghai for Science and Technology, Jungong Road, Shanghai, 200093 China
- Changxiang He
- College of Science, University of Shanghai for Science and Technology, Jungong Road, Shanghai, 200093 China
- Chen Xiao
- College of Science, University of Shanghai for Science and Technology, Jungong Road, Shanghai, 200093 China
- Yuru Liu
- College of Science, University of Shanghai for Science and Technology, Jungong Road, Shanghai, 200093 China
- Xiaofei Qin
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Jungong Road, Shanghai, 200093 China
- Zhensheng Yu
- College of Science, University of Shanghai for Science and Technology, Jungong Road, Shanghai, 200093 China
18
Peng L, Yang J, Wang M, Zhou L. Editorial: Machine learning-based methods for RNA data analysis—Volume II. Front Genet 2022; 13:1010089. [DOI: 10.3389/fgene.2022.1010089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 09/20/2022] [Indexed: 12/02/2022]
19
Pepe G, Appierdo R, Carrino C, Ballesio F, Helmer-Citterich M, Gherardini PF. Artificial intelligence methods enhance the discovery of RNA interactions. Front Mol Biosci 2022; 9:1000205. [PMID: 36275611 PMCID: PMC9585310 DOI: 10.3389/fmolb.2022.1000205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022]
Abstract
Understanding how RNAs interact with proteins, RNAs, or other molecules remains a central challenge in biology, given the importance of these complexes in both normal and pathological cellular processes. As experimental datasets become available for hundreds of functional interactions between RNAs and other biomolecules, several machine learning and deep learning algorithms have been proposed for predicting RNA-RNA or RNA-protein interactions. However, most of these approaches were evaluated on a single dataset, making performance comparisons difficult. With this review, we aim to summarize recent computational methods developed in this broad research area, highlighting the feature encoding and machine learning strategies adopted. Given the large effect that dataset size and quality have on performance, we explore the characteristics of these datasets. Additionally, we discuss multiple approaches to generating datasets of negative examples for training. Finally, we describe the best-performing methods for predicting interactions between proteins and specific classes of RNA molecules, such as circular RNAs (circRNAs) and long non-coding RNAs (lncRNAs), as well as methods that predict RNA-RNA or RNA-RBP interactions independently of the RNA type.
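A common baseline for generating negative examples (not necessarily one the review endorses) treats RNA-protein pairs with no recorded interaction as negatives and samples them uniformly at random. A minimal sketch of that strategy; identifiers are illustrative, and sampled pairs may include true interactions that simply have not been observed yet:

```python
import random


def sample_negatives(positives, rnas, proteins, n, seed=0):
    """Draw n distinct RNA-protein pairs uniformly from pairs absent from the positive set.

    positives: iterable of known (rna, protein) interaction pairs.
    Raises ValueError if fewer than n unlabeled pairs exist.
    """
    positive_set = set(positives)
    candidates = [(r, p) for r in rnas for p in proteins if (r, p) not in positive_set]
    if n > len(candidates):
        raise ValueError("not enough unlabeled pairs to sample from")
    rng = random.Random(seed)  # fixed seed for a reproducible negative set
    return rng.sample(candidates, n)
```

Enumerating all pairs is fine for small examples; for large RNA and protein sets, rejection sampling avoids materializing the full candidate list.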
Affiliation(s)
- G Pepe
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
- R Appierdo
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- C Carrino
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- F Ballesio
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- M Helmer-Citterich
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
- PF Gherardini
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
20
Su Q, Tan Q, Liu X, Wu L. Prioritizing potential circRNA biomarkers for bladder cancer and bladder urothelial cancer based on an ensemble model. Front Genet 2022; 13:1001608. [PMID: 36186429 PMCID: PMC9521272 DOI: 10.3389/fgene.2022.1001608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 08/15/2022] [Indexed: 12/03/2022]
Abstract
Bladder cancer is the most common cancer of the urinary system, and bladder urothelial cancer accounts for 90% of bladder cancer cases. These two cancers have high morbidity and mortality rates worldwide. Identifying biomarkers for bladder cancer and bladder urothelial cancer aids their diagnosis and treatment. In cancers, circRNAs are considered oncogenes or tumor suppressors, and they play important roles in cancer occurrence and development. In this manuscript, we developed an ensemble model, CDA-EnRWLRLS, which combines random walk with restart and Laplacian regularized least squares to predict circRNA-disease associations (CDAs) and to screen potential biomarkers for bladder cancer and bladder urothelial cancer. First, we compute disease similarity by combining the semantic similarity and association profile similarity of diseases, and circRNA similarity by combining the functional similarity and association profile similarity of circRNAs. Second, we score each circRNA-disease pair separately with random walk with restart and with Laplacian regularized least squares. Third, the circRNA-disease association scores from these models are integrated by soft voting to obtain the final CDAs. Finally, we use CDA-EnRWLRLS to screen potential circRNA biomarkers for bladder cancer and bladder urothelial cancer. Compared with three classical CDA prediction methods (CD-LNLP, DWNN-RLS, and KATZHCDA) and two individual models (CDA-RWR and CDA-LRLS), CDA-EnRWLRLS obtains a better AUC of 0.8654. We predict that circHIPK3 has the highest association with bladder cancer and may be a potential biomarker for it. In addition, circSMARCA5 has the highest association with bladder urothelial cancer and may be a possible biomarker for it.
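Two ingredients of this ensemble can be sketched directly: random walk with restart (RWR) on a similarity network, iterating p ← (1 − r)Wp + r·p0 with a column-normalized W, and soft voting, which averages the score vectors produced by different models. A minimal sketch under assumptions: the restart probability 0.3 is a hypothetical value, equal voting weights are assumed, and the Laplacian regularized least squares component is omitted:

```python
def column_normalize(matrix):
    """Divide each column by its sum so the matrix is column-stochastic (zero columns stay zero)."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [
        [matrix[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(sim, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 on a column-normalized similarity W.

    seed (p0) should be a probability vector, e.g. 1.0 on the query node and 0.0 elsewhere.
    """
    w = column_normalize(sim)
    n = len(w)
    p = list(seed)
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * seed[i]
            for i in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(nxt, p)) < tol  # L1 change
        p = nxt
        if converged:
            break
    return p


def soft_vote(*score_vectors, weights=None):
    """Soft voting: weighted average of several models' score vectors (equal weights by default)."""
    k = len(score_vectors)
    weights = weights if weights is not None else [1.0 / k] * k
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_vectors))
        for i in range(len(score_vectors[0]))
    ]
```

Because W is column-stochastic when every node has at least one neighbor, the RWR scores stay a probability distribution at each step.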
21
Robin V, Bodein A, Scott-Boyer MP, Leclercq M, Périn O, Droit A. Overview of methods for characterization and visualization of a protein-protein interaction network in a multi-omics integration context. Front Mol Biosci 2022; 9:962799. [PMID: 36158572 PMCID: PMC9494275 DOI: 10.3389/fmolb.2022.962799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/16/2022] [Indexed: 11/26/2022]
Abstract
Protein-protein interactions (PPIs) lie at the heart of the cellular machinery and play a significant role in regulating cellular functions. PPIs can be analyzed with network approaches: all PPIs together form a network, and constructing a PPI network requires predicting the interactions. Various biases, such as missing data, redundant information, and false interactions, make the network unstable. Integrated strategies help address these challenges and have shown encouraging results for understanding molecular mechanisms and drug mechanisms of action and for identifying target genes. To prioritize interactions, each one is evaluated with different confidence scores. These scores allow the network to be filtered and thus make it easier to represent, which are essential steps toward identifying and understanding molecular mechanisms. In this review, we discuss the main computational methods for predicting PPIs, including methods that confirm an interaction, as well as the integration of PPIs into a network, and we discuss the visualization of these complex data.
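Confidence-based filtering can be illustrated by keeping only edges whose score meets a threshold and then inspecting node degrees in the filtered network. A minimal sketch, assuming scores on a 0 to 1 scale and a hypothetical threshold of 0.7; protein names are illustrative:

```python
def filter_by_confidence(scored_edges, threshold):
    """Keep PPI edges whose confidence score is at least the threshold.

    scored_edges: iterable of (protein_a, protein_b, score) triples.
    """
    return {(a, b) for a, b, score in scored_edges if score >= threshold}


def degree(edges):
    """Count how many retained edges touch each protein (edges treated as undirected)."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts
```

Raising the threshold trades coverage for reliability: the filtered network is smaller and easier to visualize, but weakly supported true interactions are lost.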
Affiliation(s)
- Vivian Robin
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Mickaël Leclercq
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
22
Guo Z, Hui Y, Kong F, Lin X. Finding Lung-Cancer-Related lncRNAs Based on Laplacian Regularized Least Squares With Unbalanced Bi-Random Walk. Front Genet 2022; 13:933009. [PMID: 35938010 PMCID: PMC9355720 DOI: 10.3389/fgene.2022.933009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 06/03/2022] [Indexed: 11/13/2022]
Abstract
Lung cancer is one of the leading causes of cancer-related deaths, so it is important to find biomarkers for it. Furthermore, an increasing number of studies report that long noncoding RNAs (lncRNAs) are closely linked to multiple complex human diseases. Inferring new lncRNA-disease associations helps identify potential biomarkers for lung cancer, further understand its pathogenesis, design new drugs, and formulate individualized therapeutic options for lung cancer patients. This study developed a computational method (LDA-RLSURW) that integrates Laplacian regularized least squares and unbalanced bi-random walk to discover possible lncRNA biomarkers for lung cancer. First, lncRNA and disease similarities were computed. Second, unbalanced bi-random walk was applied to the lncRNA and disease networks to score associations between diseases and lncRNAs. Third, Laplacian regularized least squares was used to compute the association probability of each lncRNA-disease pair from the random walk scores. LDA-RLSURW was compared with 10 classical LDA prediction methods and obtained the best AUC, 0.9027, on the lncRNADisease database. We identified the top 30 lncRNAs associated with lung cancer and inferred that lncRNAs TUG1, PTENP1, and UCA1 may be biomarkers of lung neoplasms, non-small-cell lung cancer, and lung adenocarcinoma (LUAD), respectively.
23
Peng L, Yang J, Wang M, Zhou L. Editorial: Machine Learning-Based Methods for RNA Data Analysis. Front Genet 2022; 13:828575. [PMID: 35692815 PMCID: PMC9175173 DOI: 10.3389/fgene.2022.828575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Accepted: 04/12/2022] [Indexed: 11/13/2022]
Affiliation(s)
- Lihong Peng
- College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
- School of Computer, Hunan University of Technology, Zhuzhou, China
- Minxian Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Liqian Zhou
- College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
- *Correspondence: Liqian Zhou
24
Deng L, Liu Z, Qian Y, Zhang J. Predicting circRNA-drug sensitivity associations via graph attention auto-encoder. BMC Bioinformatics 2022; 23:160. [PMID: 35508967 PMCID: PMC9066932 DOI: 10.1186/s12859-022-04694-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/24/2022] [Accepted: 04/20/2022] [Indexed: 11/18/2022]
Abstract
Background Circular RNAs (circRNAs) play essential roles in cancer development and therapy resistance, and many studies have shown that circRNAs are closely related to human health. The expression of circRNAs also affects the sensitivity of cells to drugs and thereby significantly affects drug efficacy. However, validating drug-related circRNAs with traditional biological experiments is time-consuming and expensive. Developing an effective computational method for predicting unknown circRNA-drug associations is therefore an important and urgent task. Results In this work, we propose a computational framework (GATECDA) based on a graph attention auto-encoder to predict circRNA-drug sensitivity associations. GATECDA draws on multiple databases containing the sequences of circRNA host genes, drug structures, and circRNA-drug sensitivity associations. From these data, GATECDA employs a graph attention auto-encoder (GATE) to extract low-dimensional representations of circRNAs and drugs, retaining critical information from sparse high-dimensional features and effectively fusing the neighborhood information of nodes. Experimental results indicate that GATECDA achieves an average AUC of 89.18% under 10-fold cross-validation, and case studies further show its strong performance. Conclusions Experimental results and case studies show that GATECDA can effectively predict circRNA-drug sensitivity associations.
Affiliation(s)
- Lei Deng
- School of Software, Xinjiang University, Urumqi, China; School of Computer Science and Engineering, Central South University, Changsha, China
- Zixuan Liu
- School of Software, Xinjiang University, Urumqi, China
- Yurong Qian
- School of Software, Xinjiang University, Urumqi, China
- Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan, China
25
Zhu B, Xu Y, Zhao P, Yiu SM, Yu H, Shi JY. NNAN: Nearest Neighbor Attention Network to Predict Drug–Microbe Associations. Front Microbiol 2022; 13:846915. [PMID: 35479616 PMCID: PMC9035839 DOI: 10.3389/fmicb.2022.846915] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/31/2021] [Accepted: 02/14/2022] [Indexed: 11/13/2022]
Abstract
Many drugs can be metabolized by human microbes; the resulting metabolites can significantly alter pharmacological effects and lower therapeutic efficacy for patients. Hence, it is crucial to identify potential drug–microbe associations (DMAs) before drug administration. Nevertheless, traditional DMA determination cannot be applied widely because of the tremendous number of microbe species, its high cost, and its time-consuming nature. Computational prediction of possible DMAs is therefore an essential topic. Inspired by other problems addressed by deep learning, we designed a deep learning-based model named Nearest Neighbor Attention Network (NNAN). The model consists of four components: a similarity network constructor, a nearest-neighbor aggregator, a feature attention block, and a predictor. In brief, the similarity network constructor contains a microbe similarity network and a drug similarity network. The nearest-neighbor aggregator generates embeddings of drug–microbe pairs by integrating the drug neighbors and microbe neighbors of each drug–microbe pair in the network. The feature attention block evaluates the importance of each dimension of the drug–microbe pair embedding using a set of ordinary multi-layer neural networks. The predictor is an ordinary fully connected deep neural network that acts as a binary classifier to distinguish potential DMAs among unlabeled drug–microbe pairs. Several experiments on two benchmark databases were performed to evaluate NNAN. First, comparison with state-of-the-art baseline approaches demonstrates the superior predictive performance of NNAN under cross-validation. Moreover, the interpretability inspection reveals that a drug tends to be associated with a microbe if its top-l most similar neighbors are associated with that microbe.
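The interpretability finding above can be illustrated with a plain similarity-weighted k-nearest-neighbor score: a drug-microbe pair scores highly when the drug's most similar neighbors are associated with that microbe. This is a hypothetical baseline sketch, not NNAN itself, which learns pair embeddings with attention and a deep classifier; the similarity values and k (playing the role of l) are illustrative:

```python
def knn_score(drug, microbe, drug_sim, assoc, k):
    """Similarity-weighted fraction of the drug's k nearest neighbors associated with the microbe.

    drug_sim: dict mapping drug -> {other_drug: similarity}.
    assoc: set of known (drug, microbe) association pairs.
    Returns a score in [0, 1]; 0.0 if the neighbors have zero total similarity.
    """
    others = [d for d in drug_sim[drug] if d != drug]
    neighbors = sorted(others, key=lambda d: drug_sim[drug][d], reverse=True)[:k]
    total = sum(drug_sim[drug][d] for d in neighbors)
    if total == 0:
        return 0.0
    hits = sum(drug_sim[drug][d] for d in neighbors if (d, microbe) in assoc)
    return hits / total
```

A symmetric score over the microbe's nearest neighbors could be added the same way; NNAN aggregates both sides, which this sketch does not attempt.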
Affiliation(s)
- Bei Zhu
- School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
- Yi Xu
- School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
- Pengcheng Zhao
- School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
- Siu-Ming Yiu
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Hui Yu
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- *Correspondence: Hui Yu
- Jian-Yu Shi
- School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
- *Correspondence: Jian-Yu Shi
26
Cai L, Gao M, Ren X, Fu X, Xu J, Wang P, Chen Y. MILNP: Plant lncRNA-miRNA Interaction Prediction Based on Improved Linear Neighborhood Similarity and Label Propagation. Front Plant Sci 2022; 13:861886. [PMID: 35401586 PMCID: PMC8990282 DOI: 10.3389/fpls.2022.861886] [Received: 01/25/2022] [Accepted: 02/21/2022] [Indexed: 06/14/2023]
Abstract
Knowledge of the interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) is the basis for understanding various biological activities and designing new drugs. Computational methods for predicting lncRNA-miRNA interactions in plants have been lacking, and existing methods suffer from various limitations that affect their prediction accuracy and applicability. Research on plant lncRNA-miRNA interactions is still in its infancy. In this paper, we propose an accurate predictor, MILNP, for plant lncRNA-miRNA interactions based on an improved linear neighborhood similarity measure and the linear neighborhood propagation algorithm. Specifically, we propose a novel similarity measure based on linear neighborhood similarity computed from multiple similarity profiles of lncRNAs and miRNAs, and we derive more precise neighborhood ranges to overcome the limitations of existing methods. We then simultaneously update the lncRNA-miRNA interactions predicted from both similarity matrices via label propagation. We comprehensively evaluate MILNP on the latest plant lncRNA-miRNA interaction benchmark datasets. The results demonstrate that MILNP outperforms the most up-to-date methods. Moreover, MILNP can be applied to isolated plant lncRNAs (or miRNAs). Case studies suggest that MILNP can identify novel plant lncRNA-miRNA interactions, which are confirmed by classical tools. The implementation is available at https://github.com/HerSwain/gra/tree/MILNP.
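The label-propagation step can be sketched with the generic update F ← αWF + (1 - α)Y, where W is a row-normalized similarity matrix and Y holds the known interactions. This minimal sketch assumes plain precomputed similarity matrices rather than MILNP's improved linear neighborhood similarity, and it averages the lncRNA-side and miRNA-side propagations. The averaging rule, α = 0.5, and all function names are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def row_normalize(w: Matrix) -> Matrix:
    """Scale each row to sum to 1; all-zero rows stay zero."""
    out = []
    for row in w:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def label_propagation(w: Matrix, y: Matrix, alpha: float = 0.5, iters: int = 50) -> Matrix:
    """Iterate F <- alpha * W F + (1 - alpha) * Y, starting from F = Y.

    With a row-stochastic W and 0 <= alpha < 1 this converges to
    (1 - alpha) * (I - alpha * W)^-1 * Y.
    """
    w = row_normalize(w)
    f = [row[:] for row in y]
    for _ in range(iters):
        wf = matmul(w, f)
        f = [[alpha * wf[i][j] + (1 - alpha) * y[i][j] for j in range(len(y[0]))]
             for i in range(len(y))]
    return f


def two_sided_scores(lnc_sim: Matrix, mi_sim: Matrix, y: Matrix,
                     alpha: float = 0.5, iters: int = 50) -> Matrix:
    """Propagate over lncRNA similarity (rows) and miRNA similarity (columns), then average.
    The simple average is an assumption, not MILNP's published combination rule."""
    f_lnc = label_propagation(lnc_sim, y, alpha, iters)
    f_mi = transpose(label_propagation(mi_sim, transpose(y), alpha, iters))
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(f_lnc, f_mi)]


# Toy data (invented): 2 lncRNAs x 2 miRNAs, one known interaction (lncRNA 0, miRNA 0).
LNC_SIM = [[0.0, 1.0], [1.0, 0.0]]
MI_SIM = [[0.0, 1.0], [1.0, 0.0]]
Y = [[1.0, 0.0], [0.0, 0.0]]

if __name__ == "__main__":
    # lncRNA 1 gains a score for miRNA 0, and miRNA 1 for lncRNA 0, via similar neighbors.
    print(two_sided_scores(LNC_SIM, MI_SIM, Y))
```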
Collapse
Affiliation(s)
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Peng Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
27
Xu D, Xu H, Zhang Y, Gao R. Novel Collaborative Weighted Non-negative Matrix Factorization Improves Prediction of Disease-Associated Human Microbes. Front Microbiol 2022; 13:834982. [PMID: 35369503 PMCID: PMC8965656 DOI: 10.3389/fmicb.2022.834982] [Received: 12/14/2021] [Accepted: 01/19/2022] [Indexed: 12/14/2022]
Abstract
Extensive clinical and biomedical studies have shown that the microbiome plays a prominent role in human health. Identifying potential microbe–disease associations (MDAs) can help reveal the pathological mechanisms of human diseases and support their prevention, diagnosis, and treatment. Effective computational models are therefore needed to reduce the cost and time of biological experiments. Here, we developed a novel machine learning-based joint framework called CWNMF-GLapRLS for human MDA prediction, combining the proposed collaborative weighted non-negative matrix factorization (CWNMF) technique with graph Laplacian regularized least squares. In particular, to fuse more similarity information, we calculated the functional similarity of microbes. To handle missing values and overcome the data sparsity problem, we proposed a collaborative weighted NMF technique to reconstruct the original association matrix. In addition, we developed a graph Laplacian regularized least-squares method for prediction. Under fivefold and leave-one-out cross-validation, our method achieved the best performance in comparison with five state-of-the-art methods on the benchmark dataset. Case studies further showed that the proposed method is an effective tool for predicting potential MDAs and can assist biomedical researchers.
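One common way to write a graph Laplacian regularized least-squares predictor is to minimize ||F - Y||² + λ·tr(FᵀLF), where L = D - S is the Laplacian of a microbe similarity matrix S. Setting the gradient to zero gives (I + λL)F = Y. The sketch below solves that system in plain Python. It is a simplified stand-in, not the exact CWNMF-GLapRLS objective: Y here plays the role of the association matrix after the CWNMF reconstruction step, and λ, the toy data, and the function names are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(s: Matrix) -> Matrix:
    """Unnormalized graph Laplacian L = D - S of a symmetric similarity matrix S."""
    n = len(s)
    return [[(sum(s[i]) if i == j else 0.0) - s[i][j] for j in range(n)]
            for i in range(n)]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [a[i][:] + b[i][:] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * v for x, v in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def laplacian_rls(y: Matrix, s: Matrix, lam: float) -> Matrix:
    """Minimize ||F - Y||_F^2 + lam * tr(F^T L F); stationarity gives (I + lam L) F = Y.
    I + lam L is symmetric positive definite for lam >= 0, so the system is solvable."""
    n = len(s)
    lap = laplacian(s)
    a = [[(1.0 if i == j else 0.0) + lam * lap[i][j] for j in range(n)]
         for i in range(n)]
    return solve(a, y)


# Toy data (invented): 2 similar microbes, 1 disease; only microbe 0 is a known association.
S = [[0.0, 1.0], [1.0, 0.0]]
Y = [[1.0], [0.0]]

if __name__ == "__main__":
    print(laplacian_rls(Y, S, lam=1.0))  # approximately [[0.667], [0.333]]
```

The regularizer pulls the scores of similar microbes toward each other, so microbe 1 receives a non-zero score for the disease even though it has no known association.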
Affiliation(s)
- Da Xu
- School of Mathematics and Statistics, Shandong University, Weihai, China
- Hanxiao Xu
- School of Mathematics and Statistics, Shandong University, Weihai, China
- Yusen Zhang
- School of Mathematics and Statistics, Shandong University, Weihai, China
- *Correspondence: Yusen Zhang
- Rui Gao
- School of Control Science and Engineering, Shandong University, Jinan, China
- *Correspondence: Rui Gao
28
Li G, Wang D, Zhang Y, Liang C, Xiao Q, Luo J. Using Graph Attention Network and Graph Convolutional Network to Explore Human CircRNA-Disease Associations Based on Multi-Source Data. Front Genet 2022; 13:829937. [PMID: 35198012 PMCID: PMC8859418 DOI: 10.3389/fgene.2022.829937] [Received: 12/06/2021] [Accepted: 01/10/2022] [Indexed: 11/13/2022]
Abstract
A growing body of research has verified that multiple circRNAs are closely associated with pathogenic mechanisms at the cellular level. Exploring human circRNA-disease relationships is significant for deciphering pathogenic mechanisms and devising treatment plans. Several computational models have been designed to infer potential relationships between diseases and circRNAs. However, most existing approaches cannot effectively exploit multisource data and perform poorly on sparse networks. In this study, we develop an advanced method, GATGCN, that uses a graph attention network (GAT) and a graph convolutional network (GCN) to detect potential circRNA-disease relationships. First, several sources of biomedical information are fused via the centered kernel alignment (CKA) model, which calculates the corresponding weights of the different kernels. Second, we adopt the graph attention network to learn latent representations of diseases and circRNAs. Third, the graph convolutional network is deployed to extract features of associations by aggregating the feature vectors of neighbors. GATGCN achieves an AUC of 0.951 under leave-one-out cross-validation and an AUC of 0.932 under 5-fold cross-validation. Furthermore, case studies on lung cancer, diabetic retinopathy, and prostate cancer verify the reliability of GATGCN for detecting latent circRNA-disease pairs.
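Centered kernel alignment measures how well a kernel matrix agrees with a target kernel after double-centering: A(K₁, K₂) = ⟨HK₁H, HK₂H⟩_F / (‖HK₁H‖_F ‖HK₂H‖_F), with H = I - (1/n)11ᵀ. The sketch below computes this score and then weights each source kernel in proportion to its (non-negative) alignment with a target kernel. That proportional weighting is a simple heuristic assumed here for illustration. The CKA model used by GATGCN may instead optimize the weights jointly, and the toy kernels are invented.

```python
import math
from typing import List

Matrix = List[List[float]]


def center(k: Matrix) -> Matrix:
    """Double-center a kernel: H K H with H = I - (1/n) 1 1^T."""
    n = len(k)
    row_means = [sum(r) / n for r in k]
    col_means = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[k[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


def frobenius(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product <A, B>_F."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def cka(k1: Matrix, k2: Matrix) -> float:
    """Centered kernel alignment between two kernels (assumed non-constant)."""
    c1, c2 = center(k1), center(k2)
    return frobenius(c1, c2) / math.sqrt(frobenius(c1, c1) * frobenius(c2, c2))


def alignment_weights(kernels: List[Matrix], target: Matrix) -> List[float]:
    """Heuristic: weight each kernel by its clipped alignment with the target."""
    scores = [max(cka(k, target), 0.0) for k in kernels]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]


def fuse(kernels: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of kernels."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


# Toy data (invented): a target kernel y y^T for labels y = [1, 1, 0, 0],
# plus an uninformative identity kernel.
TARGET = [[1.0 if i < 2 and j < 2 else 0.0 for j in range(4)] for i in range(4)]
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

if __name__ == "__main__":
    w = alignment_weights([TARGET, IDENTITY], TARGET)
    print(w)  # the kernel matching the target receives the larger weight
    fused = fuse([TARGET, IDENTITY], w)
```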
Affiliation(s)
- Guanghui Li
- School of Information Engineering, East China Jiaotong University, Nanchang, China
- Diancheng Wang
- School of Information Engineering, East China Jiaotong University, Nanchang, China
- Yuejin Zhang
- School of Information Engineering, East China Jiaotong University, Nanchang, China
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Qiu Xiao
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
29
Chen M, Deng Y, Li A, Tan Y. Inferring Latent Disease-lncRNA Associations by Label-Propagation Algorithm and Random Projection on a Heterogeneous Network. Front Genet 2022; 13:798632. [PMID: 35186029 PMCID: PMC8854791 DOI: 10.3389/fgene.2022.798632] [Received: 12/07/2021] [Accepted: 01/18/2022] [Indexed: 11/13/2022]
Abstract
Long non-coding RNAs (lncRNAs), non-coding RNAs longer than 200 nucleotides, are related to various complex diseases. Precisely identifying potential lncRNA–disease associations is important for understanding disease pathogenesis, developing new drugs, and designing individualized diagnosis and treatment methods for different human diseases. Compared with complex and costly biological experiments, computational methods can quickly and effectively predict potential lncRNA–disease associations, which makes developing such methods a promising avenue. However, because state-of-the-art methods have low prediction accuracy, accurately and effectively identifying lncRNA–disease associations remains highly challenging. This article proposes an integrated method called LPARP, based on a label-propagation algorithm and random projection, to address this issue. Specifically, the label-propagation algorithm is first used to obtain estimated scores of lncRNA–disease associations, and random projections are then used to accurately predict disease-related lncRNAs. Empirical experiments showed that LPARP achieved good prediction on three gold-standard datasets, outperforming existing state-of-the-art prediction methods. It can also be used to predict associations for isolated diseases and new lncRNAs. Case studies of bladder cancer, esophageal squamous-cell carcinoma, and colorectal cancer further support the reliability of the method. The proposed LPARP algorithm can predict potential lncRNA–disease interactions stably and effectively with less data, and it can serve as an effective and reliable tool for biomedical research.
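Random projection maps high-dimensional vectors into a lower-dimensional space using a random matrix while approximately preserving pairwise distances (the Johnson–Lindenstrauss lemma). A minimal Gaussian version is sketched below and applied to a hypothetical matrix of label-propagation scores. The abstract does not describe how LPARP builds its projections or turns them into final predictions, so the projection dimension, seed, toy scores, and function names here are assumptions.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def gaussian_projection_matrix(d: int, k: int, seed: int = 0) -> Matrix:
    """d x k matrix with i.i.d. N(0, 1/k) entries, so squared norms are preserved in expectation."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]


def project(x: Matrix, r: Matrix) -> Matrix:
    """Map each row of X (n x d) to k dimensions by computing X R."""
    return [[sum(row[t] * r[t][j] for t in range(len(r))) for j in range(len(r[0]))]
            for row in x]


# Hypothetical label-propagation scores: 3 lncRNAs x 4 diseases.
SCORES = [[0.9, 0.1, 0.0, 0.3],
          [0.2, 0.8, 0.1, 0.0],
          [0.0, 0.1, 0.7, 0.6]]

if __name__ == "__main__":
    R = gaussian_projection_matrix(d=4, k=2, seed=42)
    print(project(SCORES, R))  # 3 rows, each now 2-dimensional
```

Fixing the seed makes the projection reproducible. In practice the target dimension k trades distance distortion against compactness.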