1
Zhang X, Zhao L, Chai Z, Wu H, Yang W, Li C, Jiang Y, Liu Q. NPI-DCGNN: An Accurate Tool for Identifying ncRNA-Protein Interactions Using a Dual-Channel Graph Neural Network. J Comput Biol 2024; 31:742-756. [PMID: 38923911 DOI: 10.1089/cmb.2023.0449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/28/2024] Open
Abstract
Noncoding RNA (ncRNA)-protein interactions (NPIs) play fundamentally important roles in carrying out cellular activities. Although various predictors based on molecular features and graphs have been published to boost the identification of NPIs, most of them ignore the information between known NPIs or exhibit insufficient learning ability from graphs, which makes it difficult to identify NPIs effectively. To develop a more reliable and accurate predictor for NPIs, we propose NPI-DCGNN, an end-to-end NPI predictor based on a dual-channel graph neural network (DCGNN). NPI-DCGNN first treats the known NPIs as an ncRNA-protein bipartite graph. Then, for each ncRNA-protein pair, it extracts two local subgraphs from the bipartite graph, one centered on the ncRNA and one on the protein. Next, it uses a dual-channel graph representation learning layer based on GNN to generate high-level feature representations for the ncRNA-protein pair. Finally, it employs a fully connected network and output layer to predict whether the ncRNA and protein interact. Results on four experimentally validated datasets show that NPI-DCGNN outperforms several state-of-the-art NPI predictors. Case studies on the NPInter database further demonstrate its predictive power. With the source code available (https://github.com/zhangxin11111/NPI-DCGNN), we anticipate that NPI-DCGNN could facilitate studies of the ncRNA interactome by providing highly reliable NPI candidates for further experimental validation.
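The first two steps of the abstract (building the bipartite graph and extracting a local subgraph around each member of a pair) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the node labelling, and the 1-hop radius are assumptions made here for illustration, and the abstract does not say how the candidate edge itself is handled.

```python
from collections import deque

def build_bipartite(pairs):
    """Adjacency map of an undirected ncRNA-protein bipartite graph."""
    adj = {}
    for rna, protein in pairs:
        adj.setdefault(("rna", rna), set()).add(("protein", protein))
        adj.setdefault(("protein", protein), set()).add(("rna", rna))
    return adj

def local_subgraph(adj, center, hops=1):
    """Return the nodes within `hops` of `center` and the edges among them."""
    depth = {center: 0}
    queue = deque([center])
    while queue:  # breadth-first search, stopped at the hop limit
        node = queue.popleft()
        if depth[node] == hops:
            continue
        for neighbor in adj.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    nodes = set(depth)
    edges = {frozenset((u, v)) for u in nodes for v in adj.get(u, ()) if v in nodes}
    return nodes, edges

# Hypothetical toy interactions, not taken from any dataset in the paper.
known = [("r1", "p1"), ("r1", "p2"), ("r2", "p2"), ("r3", "p3")]
adj = build_bipartite(known)
rna_nodes, rna_edges = local_subgraph(adj, ("rna", "r1"))
prot_nodes, prot_edges = local_subgraph(adj, ("protein", "p2"))
```

For the pair (r1, p2), each of the two subgraphs would then feed one channel of the graph representation learning layer described in the abstract.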
Affiliation(s)
- Xin Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Liangwei Zhao
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Ziyi Chai
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Hao Wu
  - School of Software, Shandong University, Jinan, China
- Wei Yang
  - National Clinical Research Center for Infectious Diseases, Shenzhen, China
- Chen Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
- Yu Jiang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, Yangling, China
2
Zhang M, Zhang L, Liu T, Feng H, He Z, Li F, Zhao J, Liu H. CBIL-VHPLI: a model for predicting viral-host protein-lncRNA interactions based on machine learning and transfer learning. Sci Rep 2024; 14:17549. [PMID: 39080344 PMCID: PMC11289117 DOI: 10.1038/s41598-024-68750-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Accepted: 07/26/2024] [Indexed: 08/02/2024] Open
Abstract
Virus‒host protein‒lncRNA interaction (VHPLI) predictions are critical for decoding the molecular mechanisms of viral pathogens and host immune processes. Although VHPLIs have been predicted in both plants and animals, they have not been extensively studied in viruses. For the first time, we propose a new deep learning-based approach, named CBIL‒VHPLI, that consists mainly of convolutional neural network and bidirectional long short-term memory network modules in combination with transfer learning to predict viral-host protein‒lncRNA interactions. The models were first trained on large and diverse datasets (including plants, animals, etc.). Protein sequence features were extracted using a k-mer method combined with the one-hot encoding and composition-transition-distribution (CTD) methods, and lncRNA sequence features were extracted using a k-mer method combined with the one-hot encoding and Z curve methods. The results obtained on three independent external validation datasets showed that the pre-trained CBIL‒VHPLI model performed the best, with an accuracy of approximately 0.9. Pretraining was followed by transfer learning on a viral protein-human lncRNA dataset, and the fine-tuning results showed that the accuracy of CBIL‒VHPLI was 0.946, which was significantly greater than that of the previous models. The final case study results showed that CBIL‒VHPLI achieved a prediction reproducibility rate of 91.6% for the RIP-Seq experimental screening results. The model was then used to predict the interactions between human lncRNA PIK3CD-AS2 and the nonstructural protein 1 (NS1) of the H5N1 virus, and RNA pull-down experiments were used to validate the model's predictions. The source code of CBIL‒VHPLI and the datasets used in this work are available at https://github.com/Liu-Lab-Lnu/CBIL-VHPLI for academic usage.
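Two of the sequence encodings named in the abstract, k-mer frequencies and one-hot encoding, can be sketched as follows. This is a generic illustration and not the CBIL‒VHPLI code: the function names, the choice of k, and the handling of unknown characters are assumptions, and the CTD and Z curve features are not shown.

```python
from itertools import product

def kmer_frequencies(seq, k, alphabet):
    """Normalized frequency of every possible k-mer over `alphabet`."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with unknown characters are skipped
            counts[window] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]

def one_hot(seq, alphabet):
    """One row per residue; an unknown character gives an all-zero row."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in seq:
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows

rna_vec = kmer_frequencies("ACGA", k=2, alphabet="ACGU")  # 16-dimensional vector
rna_matrix = one_hot("AU", alphabet="ACGU")               # 2 x 4 matrix
```

The k-mer vector has a fixed length regardless of sequence length, whereas the one-hot matrix grows with the sequence, which is one reason the two are often combined.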
Affiliation(s)
- Man Zhang
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Li Zhang
  - School of Life Science, Liaoning University, Shenyang, 110036, China
  - Technology Innovation Center for Computer Simulating and Information Processing of Bio-Macromolecules of Liaoning Province, Shenyang, 110036, China
  - Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China
- Ting Liu
  - School of Life Science, Liaoning University, Shenyang, 110036, China
  - China Medical University-Queen's University Belfast Joint College, China Medical University, Shenyang, 110036, China
- Huawei Feng
  - Technology Innovation Center for Computer Simulating and Information Processing of Bio-Macromolecules of Liaoning Province, Shenyang, 110036, China
  - Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China
  - School of Pharmacy, Liaoning University, No. 66, Chongshan Zhonglu, Shenyang, 110036, Liaoning, China
- Zhe He
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Feng Li
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Jian Zhao
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Hongsheng Liu
  - Technology Innovation Center for Computer Simulating and Information Processing of Bio-Macromolecules of Liaoning Province, Shenyang, 110036, China
  - Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China
  - School of Pharmacy, Liaoning University, No. 66, Chongshan Zhonglu, Shenyang, 110036, Liaoning, China
3
Liu L, Sun P, Zhang W. A pan-cancer interrogation of intronic polyadenylation and its association with cancer characteristics. Brief Bioinform 2024; 25:bbae376. [PMID: 39082645 PMCID: PMC11289681 DOI: 10.1093/bib/bbae376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 06/26/2024] [Accepted: 07/17/2024] [Indexed: 08/03/2024] Open
Abstract
3'UTR-APAs have been extensively studied, but intronic polyadenylations (IPAs) remain largely unexplored. We characterized the profiles of 22 260 IPAs in 9679 patient samples across 32 cancer types from the Cancer Genome Atlas cohort. By comparing tumor and paired normal tissues, we identified 180 to 4645 dysregulated IPAs in 132 to 2249 genes in each of 690 patient tumors from 22 cancer types that showed consistent patterns within individual cancer types. We selected 2741 genes that showed consistent patterns across cancer types, including 1834 pan-cancer tumor-enriched and 907 tumor-depleted IPA genes; the former were amply represented in functional pathways such as deoxyribonucleic acid damage repair. Expression of IPA isoforms was associated with tumor mutation burden and patient characteristics (e.g. sex, race, cancer stages, and subtypes) in cancer-specific and feature-specific manners, and could be a more accurate prognostic marker than gene expression (summary of all isoforms). In summary, our study reveals the roles and the clinical relevance of tumor-associated IPAs.
Affiliation(s)
- Liang Liu
  - Department of Cancer Biology, Wake Forest University School of Medicine, Medical Center Blvd, Winston-Salem, NC 27157, United States
  - Center for Cancer Genomics and Precision Oncology, Atrium Health Wake Forest Baptist Comprehensive Cancer Center, Medical Center Blvd, Winston-Salem, NC 27157, United States
- Peiqing Sun
  - Department of Cancer Biology, Wake Forest University School of Medicine, Medical Center Blvd, Winston-Salem, NC 27157, United States
- Wei Zhang
  - Department of Cancer Biology, Wake Forest University School of Medicine, Medical Center Blvd, Winston-Salem, NC 27157, United States
  - Center for Cancer Genomics and Precision Oncology, Atrium Health Wake Forest Baptist Comprehensive Cancer Center, Medical Center Blvd, Winston-Salem, NC 27157, United States
4
Sun DZ, Sun ZL, Liu M, Yong SH. LPI-SKMSC: Predicting LncRNA-Protein Interactions with Segmented k-mer Frequencies and Multi-space Clustering. Interdiscip Sci 2024; 16:378-391. [PMID: 38206558 DOI: 10.1007/s12539-023-00598-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 11/25/2023] [Accepted: 12/05/2023] [Indexed: 01/12/2024]
Abstract
Long noncoding RNAs (lncRNAs) have significant regulatory roles in gene expression. Interactions with proteins are one of the ways lncRNAs play their roles. Since experiments to determine lncRNA-protein interactions (LPIs) are expensive and time-consuming, many computational methods for predicting LPIs have been proposed as alternatives. In the LPIs prediction problem, there commonly exists the imbalance in the distribution of positive and negative samples. However, there are few existing methods that give specific consideration to this problem. In this paper, we proposed a new clustering-based LPIs prediction method using segmented k-mer frequencies and multi-space clustering (LPI-SKMSC). It was dedicated to handling the imbalance of positive and negative samples. We constructed segmented k-mer frequencies to obtain global and local features of lncRNA and protein sequences. Then, the multi-space clustering was applied to LPI-SKMSC. The convolutional neural network (CNN)-based encoders were used to map different features of a sample to different spaces. It used multiple spaces to jointly constrain the classification of samples. Finally, the distances between the output features of the encoder and the cluster center in each space were calculated. The sum of distances in all spaces was compared with the cluster radius to predict the LPIs. We performed cross-validation on 3 public datasets and LPI-SKMSC showed the best performance compared to other existing methods. Experimental results showed that LPI-SKMSC could predict LPIs more effectively when faced with imbalanced positive and negative samples. In addition, we illustrated that our model was better at uncovering potential lncRNA-protein interaction pairs.
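The final decision step described above (sum the distances to the cluster center over all spaces, then compare with the cluster radius) can be written as a short sketch. This is not the LPI-SKMSC code: Euclidean distance, the function name, and the convention that a sample within the radius is assigned to the cluster are all assumptions, and the CNN encoders that would produce the embeddings are not shown.

```python
import math

def within_cluster(embeddings, centers, radius):
    """Sum per-space distances to each center and compare with the radius.

    `embeddings[i]` is the encoder output of one sample in space i and
    `centers[i]` is the cluster center of that space.
    """
    total = sum(math.dist(e, c) for e, c in zip(embeddings, centers))
    return total <= radius, total

# Hypothetical two-space example with 2-D embeddings.
sample = [(0.0, 0.0), (1.0, 1.0)]
centers = [(3.0, 4.0), (1.0, 1.0)]
inside, total = within_cluster(sample, centers, radius=6.0)
```

Because the distances are summed, a sample that lies far from the center in one space can still be accepted if it lies close to the centers in the others, which is one way multiple spaces can jointly constrain the classification.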
Affiliation(s)
- Dian-Zheng Sun
  - School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
- Zhan-Li Sun
  - School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
- Mengya Liu
  - School of Computer Science and Technology, Anhui University, Hefei, 230601, China
- Shuang-Hao Yong
  - School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
5
Li X, Qu W, Yan J, Tan J. RPI-EDLCN: An Ensemble Deep Learning Framework Based on Capsule Network for ncRNA-Protein Interaction Prediction. J Chem Inf Model 2024; 64:2221-2235. [PMID: 37158609 DOI: 10.1021/acs.jcim.3c00377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
Noncoding RNAs (ncRNAs) play crucial roles in many cellular life activities by interacting with proteins. Identification of ncRNA-protein interactions (ncRPIs) is key to understanding the function of ncRNAs. Although a number of computational methods for predicting ncRPIs have been developed, the problem of predicting ncRPIs remains challenging. It has always been the focus of ncRPIs research to select suitable feature extraction methods and develop a deep learning architecture with better recognition performance. In this work, we proposed an ensemble deep learning framework, RPI-EDLCN, based on a capsule network (CapsuleNet) to predict ncRPIs. In terms of feature input, we extracted the sequence features, secondary structure sequence features, motif information, and physicochemical properties of ncRNA/protein. The sequence and secondary structure sequence features of ncRNA/protein are encoded by the conjoint k-mer method and then input into an ensemble deep learning model based on CapsuleNet by combining the motif information and physicochemical properties. In this model, the encoding features are processed by convolution neural network (CNN), deep neural network (DNN), and stacked autoencoder (SAE). Then the advanced features obtained from the processing are input into the CapsuleNet for further feature learning. Compared with other state-of-the-art methods under 5-fold cross-validation, the performance of RPI-EDLCN is the best, and the accuracy of RPI-EDLCN on RPI1807, RPI2241, and NPInter v2.0 data sets was 93.8%, 88.2%, and 91.9%, respectively. The results of the independent test indicated that RPI-EDLCN can effectively predict potential ncRPIs in different organisms. In addition, RPI-EDLCN successfully predicted hub ncRNAs and proteins in Mus musculus ncRNA-protein networks. Overall, our model can be used as an effective tool to predict ncRPIs and provides some useful guidance for future biological studies.
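A conjoint k-mer encoding first maps each residue to a small number of classes and then counts k-mers over the reduced alphabet. The sketch below illustrates the idea for proteins. It is not the RPI-EDLCN code: the seven-class amino acid grouping is the one commonly used with the conjoint triad method and is assumed here, as are the function name and k = 3.

```python
from collections import Counter
from itertools import product

# Assumed seven-class grouping of the 20 amino acids (conjoint triad style).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}

def conjoint_kmer(protein, k=3):
    """Normalized k-mer frequencies over the reduced seven-letter alphabet."""
    reduced = "".join(AA_TO_CLASS[aa] for aa in protein if aa in AA_TO_CLASS)
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    total = sum(counts.values())
    keys = ["".join(p) for p in product("0123456", repeat=k)]
    return [counts[key] / total if total else 0.0 for key in keys]

features = conjoint_kmer("AGVC")  # 7**3 = 343 features
```

Reducing 20 amino acids to 7 classes shrinks the 3-mer feature space from 8000 to 343 dimensions, which keeps the vector dense enough to learn from on small datasets.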
Affiliation(s)
- Xiaoyi Li
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Wenyan Qu
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Jing Yan
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Jianjun Tan
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
6
Yan J, Qu W, Li X, Wang R, Tan J. GATLGEMF: A graph attention model with line graph embedding multi-complex features for ncRNA-protein interactions prediction. Comput Biol Chem 2024; 108:108000. [PMID: 38070456 DOI: 10.1016/j.compbiolchem.2023.108000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 11/27/2023] [Accepted: 12/03/2023] [Indexed: 01/22/2024]
Abstract
Non-coding RNA (ncRNA) plays an important role in many fundamental biological processes, and it may be closely associated with many complex human diseases. NcRNAs exert their functions by interacting with proteins. Therefore, identifying novel ncRNA-protein interactions (NPIs) is important for understanding the mechanisms by which ncRNAs act. Computational approaches have the advantages of low cost and high efficiency. Machine learning and deep learning have achieved great success by making full use of sequence information and structure information. The graph neural network (GNN) is a deep learning algorithm for complex network link prediction that can extract and discover features in graph topology data. In this study, we propose a new computational model called GATLGEMF. We used a line graph transformation strategy to obtain the most valuable feature information and input this feature information into the attention network to predict NPIs. The results on four benchmark datasets show that our method achieves superior performance. We further compare GATLGEMF with state-of-the-art existing methods to evaluate the model performance. GATLGEMF shows the best performance, with an area under the curve (AUC) of 92.41% and 98.93% on the RPI2241 and NPInter v2.0 datasets, respectively. In addition, a case study shows that GATLGEMF can predict new interactions based on known interactions. The source code is available at https://github.com/JianjunTan-Beijing/GATLGEMF.
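A line graph transformation turns every edge of the original graph into a node, and connects two such nodes when the original edges share an endpoint. The sketch below shows this standard construction on a toy ncRNA-protein graph. It is not the GATLGEMF code, and the function name and toy edges are assumptions.

```python
from itertools import combinations

def line_graph(edges):
    """Return (nodes, edges) of the line graph of an undirected graph.

    Each original edge becomes a node; two nodes are linked when the
    corresponding original edges share an endpoint.
    """
    nodes = [tuple(edge) for edge in edges]
    links = set()
    for a, b in combinations(nodes, 2):
        if set(a) & set(b):
            links.add((a, b))
    return nodes, links

# Hypothetical toy graph: r* are ncRNAs, p* are proteins.
pairs = [("r1", "p1"), ("r1", "p2"), ("r2", "p2")]
lg_nodes, lg_links = line_graph(pairs)
```

After the transformation, predicting whether an ncRNA-protein edge exists becomes a node classification problem on the line graph, which is the form that a graph attention network can consume directly.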
Affiliation(s)
- Jing Yan
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Wenyan Qu
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Xiaoyi Li
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Ruobing Wang
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Jianjun Tan
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
7
Das G, Das T, Parida S, Ghosh Z. LncRTPred: Predicting RNA-RNA mode of interaction mediated by lncRNA. IUBMB Life 2024; 76:53-68. [PMID: 37606159 DOI: 10.1002/iub.2778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Accepted: 07/19/2023] [Indexed: 08/23/2023]
Abstract
Long non-coding RNAs (lncRNAs) play a significant role in various biological processes. Hence, it is of utmost importance to elucidate their functions in order to understand the molecular mechanism of a complex biological system. This versatile RNA molecule has diverse modes of interaction, one of which is lncRNA-mRNA interaction. Identifying the target mRNA is therefore essential to understand the function of an lncRNA explicitly. Existing lncRNA target prediction tools mainly adopt a thermodynamics approach. Long execution times and the inability to perform real-time prediction limit their usage. Further, the lack of a negative training dataset has hindered the development of machine learning (ML) based lncRNA target prediction tools. In this work, we have developed an ML-based lncRNA-mRNA target prediction model, LncRTPred. Here we have addressed the existing problems by generating a reliable negative dataset and creating robust ML models. We have identified the non-interacting lncRNAs and mRNAs from the unlabelled dataset using BLAT. This set is further filtered to obtain a reliable set of outliers. LncRTPred provides a cumulative_model_score as the final output for each query. In terms of prediction accuracy, LncRTPred outperforms other popular target prediction protocols like LncTar. Further, we have tested its performance against experimentally validated disease-specific lncRNA-mRNA interactions. Overall, the performance of LncRTPred depends heavily on the size of the training dataset, as reflected by the difference in its performance for human and mouse. It performs better for human than for mouse on unseen data because the mouse training dataset is smaller than the human one. The availability of more lncRNA-mRNA interaction data for mouse will improve the performance of LncRTPred in the future.
Both webserver and standalone versions of LncRTPred are available. Web server link: http://bicresources.jcbose.ac.in/zhumur/lncrtpred/index.html. Github Link: https://github.com/zglabDIB/LncRTPred.
Affiliation(s)
- Gourab Das
  - Division of Bioinformatics, Bose Institute, Kolkata, India
- Troyee Das
  - Division of Bioinformatics, Bose Institute, Kolkata, India
- Sibun Parida
  - Division of Bioinformatics, Bose Institute, Kolkata, India
- Zhumur Ghosh
  - Division of Bioinformatics, Bose Institute, Kolkata, India
8
Huiwen J, Kai S. Prediction of LncRNA-protein Interactions Using Auto-Encoder, SE-ResNet Models and Transfer Learning. Microrna 2024; 13:155-165. [PMID: 38591194 DOI: 10.2174/0122115366288068240322064431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 02/26/2024] [Accepted: 03/09/2024] [Indexed: 04/10/2024]
Abstract
BACKGROUND Long non-coding RNA (lncRNA) plays a crucial role in various biological processes, and mutations or imbalances of lncRNAs can lead to several diseases, including cancer, Prader-Willi syndrome, autism, Alzheimer's disease, cartilage-hair hypoplasia, and hearing loss. Understanding lncRNA-protein interactions (LPIs) is vital for elucidating basic cellular processes, human diseases, viral replication, transcription, and plant pathogen resistance. Despite the development of several LPI calculation methods, predicting LPI remains challenging, with the selection of variables and deep learning structure being the focus of LPI research. METHODS We propose a deep learning framework called AR-LPI, which extracts sequence and secondary structure features of proteins and lncRNAs. The framework utilizes an auto-encoder for feature extraction and employs SE-ResNet for prediction. Additionally, we apply transfer learning to the deep neural network SE-ResNet for predicting small-sample datasets. RESULTS Through comprehensive experimental comparison, we demonstrate that the AR-LPI architecture performs better in LPI prediction. Specifically, the accuracy of AR-LPI increases by 2.86% to 94.52%, while the F-value of AR-LPI increases by 2.71% to 94.73%. CONCLUSION Our experimental results show that the overall performance of AR-LPI is better than that of other LPI prediction tools.
Affiliation(s)
- Jiang Huiwen
  - School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong, China
- Song Kai
  - School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong, China
9
Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure, function prediction, and de novo protein design has significantly accelerated the process of intelligent protein design and led to many noteworthy achievements. This advancement in protein intelligent design holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is the key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors, sequence-based and structure-based protein characterization. It discusses their advantages, disadvantages, and scope of application. It is hoped that this could help researchers to better understand the limitations and application scenarios of these methods, and provide valuable references for choosing appropriate protein characterization techniques for related research in the field, so as to better carry out protein research.
Affiliation(s)
- Junjie Ding
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Liangliang Wang
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Hui Jiang
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
10
Wang Y, Pan Z, Mou M, Xia W, Zhang H, Zhang H, Liu J, Zheng L, Luo Y, Zheng H, Yu X, Lian X, Zeng Z, Li Z, Zhang B, Zheng M, Li H, Hou T, Zhu F. A task-specific encoding algorithm for RNAs and RNA-associated interactions based on convolutional autoencoder. Nucleic Acids Res 2023; 51:e110. [PMID: 37889083 PMCID: PMC10682500 DOI: 10.1093/nar/gkad929] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/10/2022] [Revised: 08/01/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023] Open
Abstract
RNAs play essential roles in diverse physiological and pathological processes by interacting with other molecules (RNA/protein/compound), and various computational methods are available for identifying these interactions. However, the encoding features provided by existing methods are limited, and the existing tools do not offer an effective way to integrate the interacting partners. In this study, a task-specific encoding algorithm for RNAs and RNA-associated interactions was therefore developed. This new algorithm was unique in (a) realizing comprehensive RNA feature encoding by introducing a large number of novel features and (b) enabling task-specific integration of interacting partners using convolutional autoencoder-directed feature embedding. Compared with existing methods/tools, this novel algorithm demonstrated superior performance in diverse benchmark testing studies. This algorithm, together with its source code, can be readily accessed by all users at: https://idrblab.org/corain/ and https://github.com/idrblab/corain/.
Collapse
Affiliation(s)
- Yunxia Wang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Weiqi Xia
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Hongning Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Jin Liu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Yongchao Luo
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Hanqi Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Xinyuan Yu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Xichen Lian
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Zhenyu Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Mingyue Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | - Honglin Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
| |
|
11
|
Gong L, Chen J, Cui X, Liu Y. RPIPCM: A deep network model for predicting lncRNA-protein interaction based on sequence feature encoding. Comput Biol Med 2023; 165:107366. [PMID: 37633089 DOI: 10.1016/j.compbiomed.2023.107366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 07/29/2023] [Accepted: 08/12/2023] [Indexed: 08/28/2023]
Abstract
LncRNA-protein interaction plays an important regulatory role in biological processes. In this paper, the proposed RPIPCM, based on a novel deep network model, uses the sequence feature encoding of both RNA and protein to predict lncRNA-protein interactions (LPIs). A sliding-window negative sampling method is proposed to address the imbalance between positive and negative samples. Comparative experiments show that the proposed negative sampling method is effective in mitigating the data imbalance problem in existing LPI research. Experimental results also show that the proposed sequence feature encoding method performs well in predicting LPIs across datasets of different sizes and types. On the animal-related RPI488 dataset, compared with the direct original sequence encoding model, the accuracy of the sequence feature encoding model increased by 1.02%, the recall by 4.08%, and the MCC by 1.67%. On the plant dataset ATH948, the sequence feature-based encoding demonstrated a 1.58% higher accuracy, a 1.53% higher recall, a 1.62% higher specificity, a 1.62% higher precision, and a 3.16% higher MCC compared to the direct original sequence-based encoding. Compared with the latest prediction work on the ZEA22133 dataset, RPIPCM is more effective, with the accuracy increased by 2.23%, the recall by 1.78%, the specificity by 2.67%, the precision by 2.52%, and the MCC by 4.43%, which also demonstrates the effectiveness and robustness of RPIPCM. In conclusion, RPIPCM, a deep network model based on sequence feature encoding, can automatically mine the hidden feature information of the sequences in lncRNA-protein interactions without relying on external features or prior biomedical knowledge, and its low cost and high efficiency can provide a reference for biomedical researchers.
Affiliation(s)
- Lejun Gong
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China.
| | - Jingmei Chen
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Xiong Cui
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Yang Liu
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| |
|
12
|
Chen P, Shen H, Zhang Y, Wang B, Gu P. SGNet: Sequence-Based Convolution and Ligand Graph Network for Protein Binding Affinity Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3257-3266. [PMID: 37030867 DOI: 10.1109/tcbb.2023.3262821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Protein-ligand binding plays an important role in many fields, and it is of great importance to accurately predict the binding affinity between molecules by computational methods. Most computational binding affinity methods require molecular structures. However, there are still a large number of protein molecules with known amino acid sequences whose structures have not yet been solved. To address this issue, this paper proposes a sequence-based convolution and ligand graph network, called SGNet, to fuse the molecular graph information and the amino acid sequence information. This method combines Conjoint Triad (CT) encoding of the amino acid sequence with a one-dimensional convolutional neural network module to extract protein features, uses a graph attention network to extract molecular features of the ligand, and then fuses the two feature sets to predict the binding affinity between molecules through a fully connected layer. As a result, SGNet achieves good prediction performance on both KIKD and IC50 data sets, with prediction error RMSEs of 1.287 and 1.58, and Pearson correlation coefficients (R) of 0.687 and 0.592, respectively. Comparative experimental results under the same conditions showed that SGNet outperformed Kdeep and GraphDTA in predicting binding affinities between protein-ligand molecules.
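The Conjoint Triad encoding mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used seven-class grouping of the 20 amino acids and a simple min-max style normalization; the function name `conjoint_triad` and the handling of non-standard residues are illustrative choices, not details taken from SGNet.

```python
# Seven residue classes of the commonly used conjoint-triad scheme
# (assumed here; the abstract does not list the grouping).
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_CLASS = {aa: i for i, group in enumerate(CT_CLASSES) for aa in group}


def conjoint_triad(sequence):
    """Return a 343-dimensional conjoint-triad feature vector (hypothetical helper)."""
    counts = [0] * (7 ** 3)
    classes = [RESIDUE_TO_CLASS.get(aa) for aa in sequence.upper()]
    # Slide a window of three consecutive residues over the sequence.
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        if a is None or b is None or c is None:
            continue  # skip windows containing non-standard residues
        counts[a * 49 + b * 7 + c] += 1
    lo, hi = min(counts), max(counts)
    if hi == 0:
        return [0.0] * len(counts)  # sequence too short or no valid triads
    # Normalization (f - min) / max; an assumption, not confirmed by the abstract.
    return [(x - lo) / hi for x in counts]


features = conjoint_triad("AGVRK")
print(len(features), sum(features))
```

Each position of the vector corresponds to one of the 7 × 7 × 7 possible class triads, so proteins of any length map to a fixed-size input that a one-dimensional convolution can consume.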
|
13
|
Zhou Z, Du Z, Wei J, Zhuo L, Pan S, Fu X, Lian X. MHAM-NPI: Predicting ncRNA-protein interactions based on multi-head attention mechanism. Comput Biol Med 2023; 163:107143. [PMID: 37339574 DOI: 10.1016/j.compbiomed.2023.107143] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/04/2023] [Revised: 05/20/2023] [Accepted: 06/06/2023] [Indexed: 06/22/2023]
Abstract
Non-coding RNA (ncRNA) is a functional RNA molecule that plays a key role in various fundamental biological processes, such as gene regulation. Therefore, studying the connection between ncRNAs and proteins is important for exploring the function of ncRNAs. Although many efficient methods have been developed, accurate prediction remains a major challenge. In our approach, we combine a multi-head attention mechanism with residual connections, allowing for the automatic learning of ncRNA and protein sequence features. Specifically, the proposed method projects node features into multiple spaces based on the multi-head attention mechanism, thereby obtaining different feature interaction patterns in these spaces. By stacking interaction layers, higher-order interaction modes can be derived, while the initial feature information is preserved through the residual connection. This strategy effectively leverages the sequence information of ncRNAs and proteins, enabling the capture of hidden high-order features. The final experimental results demonstrate the effectiveness of our method, with AUC values of 97.4%, 98.5%, and 94.8% achieved on the NPInter v2.0, RPI807, and RPI488 datasets, respectively. These results establish our method as a powerful tool for exploring the connection between ncRNAs and proteins. We have uploaded the implementation code on GitHub: https://github.com/ZZCrazy00/MHAM-NPI.
Affiliation(s)
- Zhecheng Zhou
- Wenzhou University of Technology, Wenzhou, 325000, China
| | - Zhenya Du
- Guangzhou Xinhua University, Guangzhou, 510520, China
| | - Jinhang Wei
- Wenzhou University of Technology, Wenzhou, 325000, China
| | - Linlin Zhuo
- Wenzhou University of Technology, Wenzhou, 325000, China; Hunan University, Changsha, 410000, China.
| | - Shiyao Pan
- Wenzhou University of Technology, Wenzhou, 325000, China
| | | | - Xinze Lian
- Wenzhou University of Technology, Wenzhou, 325000, China.
| |
|
14
|
Han Y, Zhang SW. ncRPI-LGAT: Prediction of ncRNA-Protein Interactions with Line Graph Attention Network Framework. Comput Struct Biotechnol J 2023; 21:2286-2295. [PMID: 37035546 PMCID: PMC10073990 DOI: 10.1016/j.csbj.2023.03.027] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/21/2022] [Revised: 03/11/2023] [Accepted: 03/16/2023] [Indexed: 03/19/2023] Open
Abstract
Identification of ncRNA-protein interactions (ncRPIs) through wet experiments is still time-consuming and costly. Although several computational approaches have been developed to predict ncRPIs using the structure and sequence information of ncRNAs and proteins, the prediction accuracy needs to be improved, and the results lack interpretability. In this work, we proposed a novel computational method (called ncRPI-LGAT) to predict ncRNA-protein interactions by transforming the link prediction (i.e., subgraph classification) task into a node classification task in the line network, and introducing a Line Graph ATtention network framework. ncRPI-LGAT first extracts the ncRNA/protein attributes using node2vec, and then generates the local enclosing subgraph of a target ncRNA-protein pair with SEAL. Because using pooling operations in local enclosing subgraphs to learn a fixed-size feature vector for representing ncRNAs/proteins causes information loss, ncRPI-LGAT converts the local enclosing subgraphs into their corresponding line graphs, in which each node corresponds to an edge (i.e., an ncRNA-protein pair) of the local enclosing subgraph. Then, the attention mechanism-based graph neural network GATv2 is applied to these line graphs to efficiently learn the embedding features of the target nodes (i.e., ncRNA-protein pairs) by learning the significance of one ncRNA-protein pair to another. The embedding features of an ncRNA-protein pair obtained from multi-head attention are concatenated and then fed into a fully connected network to predict ncRPIs. Compared with other state-of-the-art methods in the 5CV test, ncRPI-LGAT shows superior performance on three benchmark datasets, demonstrating the effectiveness of the ncRPI-LGAT method in predicting ncRNA-protein interactions.
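The line-graph conversion described in this abstract can be sketched in a few lines. This is a minimal illustration of the general line-graph construction only, not the authors' implementation; the function name `line_graph` and the toy node identifiers are hypothetical.

```python
from itertools import combinations


def line_graph(edges):
    """Build the line graph of an undirected graph given as (ncRNA, protein) pairs.

    Each original edge becomes a node; two such nodes are adjacent
    when the original edges share an endpoint.
    """
    nodes = sorted(set(edges))
    adjacency = {edge: set() for edge in nodes}
    for e1, e2 in combinations(nodes, 2):
        if set(e1) & set(e2):  # shared ncRNA or shared protein
            adjacency[e1].add(e2)
            adjacency[e2].add(e1)
    return adjacency


# Toy enclosing subgraph: r* are ncRNAs, p* are proteins (made-up identifiers).
subgraph = [("r1", "p1"), ("r1", "p2"), ("r2", "p2")]
for pair, neighbours in line_graph(subgraph).items():
    print(pair, "->", sorted(neighbours))
```

In this representation the target ncRNA-protein pair is itself a node, so link prediction on the original subgraph becomes classification of that single node, which is the reformulation the abstract describes.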
|
15
|
Wang H, Zhang Z, Li H, Li J, Li H, Liu M, Liang P, Xi Q, Xing Y, Yang L, Zuo Y. A cost-effective machine learning-based method for preeclampsia risk assessment and driver genes discovery. Cell Biosci 2023; 13:41. [PMID: 36849879 PMCID: PMC9972636 DOI: 10.1186/s13578-023-00991-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/09/2022] [Accepted: 02/15/2023] [Indexed: 03/01/2023] Open
Abstract
BACKGROUND The placenta, as a unique exchange organ between mother and fetus, is essential for successful human pregnancy and fetal health. Preeclampsia (PE) caused by placental dysfunction contributes to both maternal and infant morbidity and mortality. Accurate identification of PE patients plays a vital role in the formulation of treatment plans. However, the traditional clinical methods of PE have a high misdiagnosis rate. RESULTS Here, we first designed a computational biology method that used single-cell transcriptome (scRNA-seq) of healthy pregnancy (38 wk) and early-onset PE (28-32 wk) to identify pathological cell subpopulations and predict PE risk. Based on machine learning methods and feature selection techniques, we observed that the Tuning ReliefF (TURF) score hybrid with XGBoost (TURF_XGB) achieved optimal performance, with 92.61% accuracy and 92.46% recall for classifying nine cell subpopulations of healthy placentas. Biological landscapes of placenta heterogeneity could be mapped by the 110 marker genes screened by TURF_XGB, which revealed the superiority of the TURF feature mining. Moreover, we processed the PE dataset with LASSO to obtain 497 biomarkers. Integration analysis of the above two gene sets revealed that dendritic cells were closely associated with early-onset PE, and C1QB and C1QC might drive preeclampsia by mediating inflammation. In addition, an ensemble model-based risk stratification card was developed to classify preeclampsia patients, and its area under the receiver operating characteristic curve (AUC) could reach 0.99. For broader accessibility, we designed an accessible online web server ( http://bioinfor.imu.edu.cn/placenta ). CONCLUSION Single-cell transcriptome-based preeclampsia risk assessment using an ensemble machine learning framework is a valuable asset for clinical decision-making. 
C1QB and C1QC may be involved in the development and progression of early-onset PE by affecting the complement and coagulation cascades pathway that mediate inflammation, which has important implications for better understanding the pathogenesis of PE.
Affiliation(s)
- Hao Wang
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010 China
| | - Zhaoyue Zhang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054 China
| | - Haicheng Li
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010 China
| | - Jinzhao Li
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China
| | - Hanshuang Li
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China
| | - Mingzhu Liu
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010 China
| | - Pengfei Liang
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China
| | - Qilemuge Xi
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China
| | - Yongqiang Xing
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China.
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China.
| | - Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010, China.
| |
|
16
|
Constructing discriminative feature space for LncRNA-protein interaction based on deep autoencoder and marginal fisher analysis. Comput Biol Med 2023; 157:106711. [PMID: 36924738 DOI: 10.1016/j.compbiomed.2023.106711] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/16/2022] [Revised: 01/26/2023] [Accepted: 02/26/2023] [Indexed: 03/04/2023]
Abstract
Long non-coding RNAs (lncRNAs) play important roles by regulating proteins in many biological processes and life activities. To uncover the molecular mechanisms of lncRNAs, it is necessary to identify their interactions with proteins. Recently, some machine learning methods were proposed to detect lncRNA-protein interactions according to the distribution of known interactions. The performance of these methods depends largely on: (1) how exactly the distribution of known interactions is characterized by the feature space; (2) how discriminative the feature space is for distinguishing lncRNA-protein interactions. Because the known interactions may be multimodal and complex, it remains a challenge to construct a discriminative feature space for lncRNA-protein interactions. To resolve this problem, a novel method named DFRPI was developed based on a deep autoencoder and marginal fisher analysis in this paper. Firstly, some initial features of lncRNA-protein interactions were extracted from the primary sequences and secondary structures of lncRNAs and proteins. Secondly, a deep autoencoder was exploited to learn encoding parameters of the initial features to describe the known interactions precisely. Next, marginal fisher analysis was employed to optimize the encoding parameters of the features to characterize a discriminative feature space of the lncRNA-protein interactions. Finally, a random forest-based predictor was trained on the discriminative feature space to detect lncRNA-protein interactions. In a series of experiments, our predictor achieved a precision of 0.920, recall of 0.916, accuracy of 0.918, MCC of 0.836, specificity of 0.920, sensitivity of 0.916 and AUC of 0.906, outperforming the compared methods for predicting lncRNA-protein interactions. 
It may be suggested that the proposed method can generate a reasonable and effective feature space for distinguishing lncRNA-protein interactions accurately. The code and data are available on https://github.com/D0ub1e-D/DFRPI.
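The evaluation metrics quoted in this and neighbouring abstracts (precision, recall or sensitivity, specificity, accuracy, MCC) all derive from the four confusion-matrix counts. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts in the usage line are made up for illustration and do not come from any of the cited studies.

```python
from math import sqrt


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),  # identical to sensitivity
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts, chosen only to exercise the formulas.
print(binary_metrics(tp=90, fp=20, tn=80, fn=10))
```

Because recall and sensitivity are the same quantity, an abstract that reports both is restating one number. MCC uses all four counts, which makes it less sensitive than accuracy to class imbalance.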
|
17
|
Zhao J, Sun J, Shuai SC, Zhao Q, Shuai J. Predicting potential interactions between lncRNAs and proteins via combined graph auto-encoder methods. Brief Bioinform 2023; 24:6896030. [PMID: 36515153 DOI: 10.1093/bib/bbac527] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 09/02/2022] [Revised: 10/23/2022] [Accepted: 11/06/2022] [Indexed: 12/15/2022] Open
Abstract
Long noncoding RNA (lncRNA) is a kind of noncoding RNA with a length of more than 200 nucleotides. Numerous studies have shown that although lncRNAs cannot be directly translated into proteins, they still play an important role in human growth processes by interacting with proteins. Since traditional biological experiments often require a lot of time and material costs to explore potential lncRNA-protein interactions (LPIs), several computational models have been proposed for this task. In this study, we introduce a novel deep learning method known as combined graph auto-encoders (LPICGAE) to predict potential human LPIs. First, we apply a variational graph auto-encoder to learn low-dimensional representations from the high-dimensional features of lncRNAs and proteins. Then the graph auto-encoder is used to reconstruct the adjacency matrix for inferring potential interactions between lncRNAs and proteins. Finally, we minimize the loss of the two processes alternately to obtain the final predicted interaction matrix. The results of 5-fold cross-validation experiments show that our method achieves an average area under the receiver operating characteristic curve of 0.974 and an average accuracy of 0.985, which is better than those of six existing state-of-the-art computational methods. We believe that LPICGAE can help researchers to identify more potential relationships between lncRNAs and proteins effectively.
Affiliation(s)
- Jingxuan Zhao
- University of Science and Technology Liaoning, 66459, Anshan, China
| | | | - Stella C Shuai
- Northwestern University, 3270, Evanston, Illinois, United States
| | - Qi Zhao
- University of Science and Technology Liaoning, 66459, Anshan, China
| | - Jianwei Shuai
- Department of Physics, Xiamen University, Xiamen, China
| |
|
18
|
Wang A, Wang J, Mao M, Zhao X, Li Q, Xuan R, Li F, Chao T. Analyses of lncRNAs, circRNAs, and the Interactions between ncRNAs and mRNAs in Goat Submandibular Glands Reveal Their Potential Function in Immune Regulation. Genes (Basel) 2023; 14:187. [PMID: 36672927 PMCID: PMC9859278 DOI: 10.3390/genes14010187] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/18/2022] [Revised: 01/01/2023] [Accepted: 01/06/2023] [Indexed: 01/13/2023] Open
Abstract
In goats, one of the main ruminants, the salivary glands secrete hardly any digestive enzymes but play an important role in immunity. The immune function of goat salivary glands changes significantly with age, while the expression profile and specific function of non-coding RNA during this process are unknown. In this study, transcriptome sequencing was performed on submandibular gland (SMG) tissues of 1-month-old, 12-month-old, and 24-month-old goats, revealing the expression patterns of lncRNA and circRNA at different ages. A total of 369 lncRNAs and 1699 circRNAs were found to be differentially expressed. Functional enrichment analyses showed that the lncRNA-regulated target mRNAs and circRNA host genes were significantly enriched in immune-related GO terms and pathways. CeRNA network analysis showed that the key differentially expressed circRNAs and lncRNAs mainly regulate the key immune-related genes ITGB2, LCP2, PTPRC, SYK, and ZAP70 through competitive binding with miR-141-x, miR-29-y, and chi-miR-29b-3p, thereby affecting the natural killer cell-mediated cytotoxicity pathway, the T cell receptor signaling pathway, and other immune-related pathways. It should be noted that the expression of key circRNAs, lncRNAs, and key immune-related genes in goat SMGs decreased significantly as the goats aged. This is the first report of lncRNAs, circRNAs, and ceRNA network regulation in goat SMGs. Our study contributes to the knowledge of changes in the expression of non-coding RNAs during SMG development in goats and provides new insights into the relationship between non-coding RNAs and salivary gland immune function in goats.
Affiliation(s)
- Aili Wang
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
| | - Jianmin Wang
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Key Laboratory of Efficient Utilization of Non-Grain Feed Resources (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Shandong Agricultural University, Taian 271000, China
| | - Meina Mao
- Shandong Peninsula Engineering Research Center of Comprehensive Brine Utilization, Weifang University of Science and Technology, Shouguang 262700, China
| | - Xiaodong Zhao
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Shandong Vocational Animal Science and Veterinary College, Weifang 261000, China
| | - Qing Li
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Key Laboratory of Efficient Utilization of Non-Grain Feed Resources (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Shandong Agricultural University, Taian 271000, China
| | - Rong Xuan
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Key Laboratory of Efficient Utilization of Non-Grain Feed Resources (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Shandong Agricultural University, Taian 271000, China
| | - Fajun Li
- Shandong Peninsula Engineering Research Center of Comprehensive Brine Utilization, Weifang University of Science and Technology, Shouguang 262700, China
| | - Tianle Chao
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Key Laboratory of Efficient Utilization of Non-Grain Feed Resources (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Shandong Agricultural University, Taian 271000, China
| |
|
19
|
Han S, Yang X, Sun H, Yang H, Zhang Q, Peng C, Fang W, Li Y. LION: an integrated R package for effective prediction of ncRNA-protein interaction. Brief Bioinform 2022; 23:6713512. [PMID: 36155620 DOI: 10.1093/bib/bbac420] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/07/2022] [Revised: 08/03/2022] [Accepted: 08/30/2022] [Indexed: 12/14/2022] Open
Abstract
Understanding ncRNA-protein interaction is of critical importance to unveil ncRNAs' functions. Here, we propose an integrated package LION which comprises a new method for predicting ncRNA/lncRNA-protein interaction as well as a comprehensive strategy to meet the requirement of customisable prediction. Experimental results demonstrate that our method outperforms its competitors on multiple benchmark datasets. LION can also improve the performance of some widely used tools and build adaptable models for species- and tissue-specific prediction. We expect that LION will be a powerful and efficient tool for the prediction and analysis of ncRNA/lncRNA-protein interaction. The R Package LION is available on GitHub at https://github.com/HAN-Siyu/LION/.
Affiliation(s)
- Siyu Han
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, China
| | - Xiao Yang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Hang Sun
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Hu Yang
- 964 Hospital of Joint Logistic Support Force of the Chinese People's Liberation Army
| | - Qi Zhang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Cheng Peng
- School of Software, Tsinghua University, Beijing, China
| | - Wensi Fang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Ying Li
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| |
|
20
|
Shaath H, Vishnubalaji R, Elango R, Kardousha A, Islam Z, Qureshi R, Alam T, Kolatkar PR, Alajez NM. Long non-coding RNA and RNA-binding protein interactions in cancer: Experimental and machine learning approaches. Semin Cancer Biol 2022; 86:325-345. [PMID: 35643221 DOI: 10.1016/j.semcancer.2022.05.013] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 01/12/2022] [Revised: 05/16/2022] [Accepted: 05/20/2022] [Indexed: 01/27/2023]
Abstract
Understanding the complex and specific roles played by non-coding RNAs (ncRNAs), which comprise the bulk of the genome, is important for understanding virtually every hallmark of cancer. This large group of molecules plays pivotal roles in key regulatory mechanisms in various cellular processes. Regulatory mechanisms, mediated by long non-coding RNA (lncRNA) and RNA-binding protein (RBP) interactions, are well documented in several types of cancer. Their effects are enabled through networks affecting lncRNA and RBP stability, RNA metabolism including N6-methyladenosine (m6A) and alternative splicing, subcellular localization, and numerous other mechanisms involved in cancer. In this review, we discuss the reciprocal interplay between lncRNAs and RBPs and their involvement in epigenetic regulation via histone modifications, as well as their key role in resistance to cancer therapy. Other aspects of RBPs, including their structural domains, provide deeper knowledge of how lncRNAs and RBPs interact and exert their biological functions. In addition, current state-of-the-art knowledge, facilitated by machine and deep learning approaches, unravels such interactions in greater detail to further enhance our understanding of the field, and the potential to harness RNA-based therapeutics as an alternative treatment modality for cancer is discussed.
Affiliation(s)
- Hibah Shaath
- Translational Cancer and Immunity Center (TCIC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar
| | - Radhakrishnan Vishnubalaji
- Translational Cancer and Immunity Center (TCIC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar
| | - Ramesh Elango
- Translational Cancer and Immunity Center (TCIC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar
| | - Ahmed Kardousha
- College of Health & Life Sciences, Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar
| | - Zeyaul Islam
- Diabetes Research Center (DRC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation, PO Box 34110, Doha, Qatar
| | - Rizwan Qureshi
- College of Science and Engineering, Hamad Bin Khalifa University (HBKU), Qatar Foundation, PO Box 34110, Doha, Qatar
| | - Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University (HBKU), Qatar Foundation, PO Box 34110, Doha, Qatar
| | - Prasanna R Kolatkar
- College of Health & Life Sciences, Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar; Diabetes Research Center (DRC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation, PO Box 34110, Doha, Qatar
| | - Nehad M Alajez
- Translational Cancer and Immunity Center (TCIC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar; College of Health & Life Sciences, Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar.
| |
|
21
|
Bheemireddy S, Sandhya S, Srinivasan N, Sowdhamini R. Computational tools to study RNA-protein complexes. Front Mol Biosci 2022; 9:954926. [PMID: 36275618 PMCID: PMC9585174 DOI: 10.3389/fmolb.2022.954926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Accepted: 09/20/2022] [Indexed: 11/19/2022] Open
Abstract
RNA is the key player in many cellular processes such as signal transduction, replication, transport, cell division, transcription, and translation. These diverse functions are accomplished through interactions of RNA with proteins. However, protein–RNA interactions are still poorly understood in contrast to protein–protein and protein–DNA interactions. This knowledge gap can be attributed to the limited availability of protein-RNA structures along with the experimental difficulties in studying these complexes. Recent progress in computational resources has expanded the number of tools available for studying protein-RNA interactions at various molecular levels. These include tools for predicting interacting residues from primary sequences, modelling protein-RNA complexes, predicting hotspots in these complexes, and gaining insight into the dynamics of their interactions. Each of these tools has its strengths and limitations, which makes it important to select an optimal approach for the question of interest. Here we present a mini review of computational tools to study different aspects of protein-RNA interactions, with a focus on overall application, development of the field, and future perspectives.
Affiliation(s)
- Sneha Bheemireddy
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Sankaran Sandhya
- Department of Biotechnology, Faculty of Life and Allied Health Sciences, M.S. Ramaiah University of Applied Sciences, Bengaluru, India
- *Correspondence: Sankaran Sandhya; Ramanathan Sowdhamini
| | | | - Ramanathan Sowdhamini
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- National Centre for Biological Sciences, TIFR, GKVK Campus, Bangalore, India
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
- *Correspondence: Sankaran Sandhya; Ramanathan Sowdhamini
|
22
|
Pepe G, Appierdo R, Carrino C, Ballesio F, Helmer-Citterich M, Gherardini PF. Artificial intelligence methods enhance the discovery of RNA interactions. Front Mol Biosci 2022; 9:1000205. [PMID: 36275611 PMCID: PMC9585310 DOI: 10.3389/fmolb.2022.1000205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022] Open
Abstract
Understanding how RNAs interact with proteins, RNAs, or other molecules remains a challenge of main interest in biology, given the importance of these complexes in both normal and pathological cellular processes. Since experimental datasets are starting to be available for hundreds of functional interactions between RNAs and other biomolecules, several machine learning and deep learning algorithms have been proposed for predicting RNA-RNA or RNA-protein interactions. However, most of these approaches were evaluated on a single dataset, making performance comparisons difficult. With this review, we aim to summarize recent computational methods, developed in this broad research area, highlighting feature encoding and machine learning strategies adopted. Given the magnitude of the effect that dataset size and quality have on performance, we explored the characteristics of these datasets. Additionally, we discuss multiple approaches to generate datasets of negative examples for training. Finally, we describe the best-performing methods to predict interactions between proteins and specific classes of RNA molecules, such as circular RNAs (circRNAs) and long non-coding RNAs (lncRNAs), and methods to predict RNA-RNA or RNA-RBP interactions independently of the RNA type.
Affiliation(s)
- G Pepe
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - R Appierdo
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - C Carrino
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - F Ballesio
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - M Helmer-Citterich
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - PF Gherardini
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
|
23
|
Zhuo L, Chen Y, Song B, Liu Y, Su Y. A model for predicting ncRNA-protein interactions based on graph neural networks and community detection. Methods 2022; 207:74-80. [PMID: 36108992 DOI: 10.1016/j.ymeth.2022.09.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/24/2022] [Revised: 08/07/2022] [Accepted: 09/03/2022] [Indexed: 10/31/2022] Open
Abstract
Non-coding RNAs (ncRNAs) play a considerable role in biological processes such as gene transcription and gene expression. Exploring ncRNA-protein interactions (NPIs) is of great significance, but experimental techniques are expensive in terms of time and labor. This has prompted the development of computational algorithms based on traditional statistics and artificial intelligence. However, these algorithms usually require the sequence or structural feature vector of the molecule. Although graph neural networks (GNNs) have been widely used in recent academic and industrial research, their potential remains unexplored in the field of detecting NPIs. Hence, we present a novel GNN-based model to detect NPIs in this paper, where the NPI detection problem is transformed into a graph link prediction problem. Specifically, the proposed method utilizes two groups of labels to distinguish two different types of nodes, ncRNA and protein, which alleviates the problem of over-coupling in the graph network. Subsequently, the ncRNA and protein embeddings are initially optimized based on the cluster membership of nodes in the graph. Moreover, the model applies a self-attention mechanism to preserve the graph topology and reduce information loss during pooling. The experimental results indicate that the proposed model achieves superior performance.
Affiliation(s)
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, Zhejiang 325035, China; College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
| | - Yifan Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China.
| | - Bosheng Song
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
| | - Yansen Su
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China.
|
24
|
Zhuo L, Song B, Liu Y, Li Z, Fu X. Predicting ncRNA-protein interactions based on dual graph convolutional network and pairwise learning. Brief Bioinform 2022; 23:6691912. [PMID: 36063562 DOI: 10.1093/bib/bbac339] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/08/2022] [Revised: 07/05/2022] [Accepted: 07/25/2022] [Indexed: 11/14/2022] Open
Abstract
Noncoding RNAs (ncRNAs) have recently attracted considerable attention due to their key roles in biology. ncRNA-protein interactions (NPIs) are often explored to reveal the biological activities that ncRNAs may affect, such as biological traits and diseases. Traditional experimental methods can accomplish this work but are often labor-intensive and expensive. Machine learning and deep learning methods have achieved great success by exploiting sufficient sequence or structure information. Graph Neural Network (GNN)-based methods consider the topology in ncRNA-protein graphs and perform well on tasks like NPI prediction. Based on GNNs, some pairwise constraint methods have been developed for homogeneous networks, but they have not been used for NPI prediction on heterogeneous networks. In this paper, we construct a pairwise constrained NPI predictor based on a dual Graph Convolutional Network (GCN), called NPI-DGCN. To our knowledge, our method is the first to train a heterogeneous graph-based model using a pairwise learning strategy. Instead of binary classification, we use a rank layer to calculate the score of an ncRNA-protein pair. Moreover, our model is the first to predict NPIs on the ncRNA-protein bipartite graph rather than the homogeneous graph. We transform the original ncRNA-protein bipartite graph into two homogeneous graphs on which to explore second-order implicit relationships. At the same time, we model direct interactions between the two homogeneous graphs to explore explicit relationships. Experimental results on four standard datasets indicate that our method achieves performance competitive with other state-of-the-art methods. The model is available at https://github.com/zhuoninnin1992/NPIPredict.
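The transformation of a bipartite graph into two homogeneous graphs, as described in the abstract above, can be illustrated with a minimal sketch. The abstract does not say how NPI-DGCN builds these graphs, so the shared-neighbour projection used here is an assumption, and the function name `project_bipartite` and the toy identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_bipartite(edges):
    """Split ncRNA-protein edges into two shared-neighbour graphs.

    Assumed rule (not taken from the paper): two ncRNAs are linked if they
    bind a common protein, and two proteins are linked if they bind a
    common ncRNA. Both are second-order relationships.
    """
    rnas_of = defaultdict(set)      # protein -> ncRNAs that bind it
    proteins_of = defaultdict(set)  # ncRNA -> proteins it binds
    for rna, protein in edges:
        rnas_of[protein].add(rna)
        proteins_of[rna].add(protein)

    rna_graph = {
        frozenset(pair)
        for group in rnas_of.values()
        for pair in combinations(sorted(group), 2)
    }
    protein_graph = {
        frozenset(pair)
        for group in proteins_of.values()
        for pair in combinations(sorted(group), 2)
    }
    return rna_graph, protein_graph


# Toy interaction list; the identifiers are invented for illustration.
edges = [("r1", "pA"), ("r2", "pA"), ("r2", "pB"), ("r3", "pB")]
rna_graph, protein_graph = project_bipartite(edges)
print(sorted(sorted(e) for e in rna_graph))
print(sorted(sorted(e) for e in protein_graph))
```

In this toy graph, r1 and r3 never share a protein, so they stay unlinked in the ncRNA graph even though both are two hops from r2.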
Affiliation(s)
- Linlin Zhuo
- College of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027, Wenzhou, China
| | - Bosheng Song
- College of Computer Science and Electronic Engineering, Hunan University, 410082, Changsha, China
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, 410082, Changsha, China
| | - Zejun Li
- School of Computer and Information Science, Hunan Institute of Technology, 421000, Hengyang, China
| | - Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, 410082, Changsha, China
|
25
|
Jiang Y, Wang Y, Shen L, Adjeroh DA, Liu Z, Lin J. Identification of all-against-all protein-protein interactions based on deep hash learning. BMC Bioinformatics 2022; 23:266. [PMID: 35804303 PMCID: PMC9264577 DOI: 10.1186/s12859-022-04811-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 06/17/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein-protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient when compared to traditional wet-lab experiments. Given a new protein, one may wish to find whether the protein has any PPI relationship with other existing proteins. Current computational PPI prediction methods usually compare the new protein to existing proteins one by one in a pairwise manner, which is time-consuming. RESULTS In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the protein sequences using deep learning techniques. This encoding scheme enables us to turn the PPI discrimination problem into a much simpler searching problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database with M proteins can be transformed into a much simpler problem: finding a number inside a sorted array of length M. This pre-screening process narrows down the search to a much smaller set of candidate proteins for further confirmation. As a final step, DHL-PPI uses the Hamming distance to verify the final PPI relationship. CONCLUSIONS The experimental results confirmed that DHL-PPI is feasible and effective. Using a dataset with strictly negative PPI examples of four species, DHL-PPI is shown to be superior or competitive when compared to the other state-of-the-art methods in terms of precision, recall or F1 score. Furthermore, in the prediction stage, the proposed DHL-PPI reduced the time complexity from [Formula: see text] to [Formula: see text] for performing an all-against-all PPI prediction for a database with M proteins. With the proposed approach, a protein database can be preprocessed and stored for later search using the proposed encoding scheme. This can provide a more efficient way to cope with the rapidly increasing volume of protein datasets.
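The pre-screening and verification steps described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the DHL-PPI implementation: the hash codes are made-up 8-bit integers rather than codes learned by a deep network, and the function names, the `window` of neighbouring entries, and the `max_dist` threshold are all hypothetical. Closeness in sorted numeric order does not guarantee a small Hamming distance, which is why the verification step is kept separate.

```python
from bisect import bisect_left


def hamming(a, b):
    """Count the bits that differ between two integer hash codes."""
    return bin(a ^ b).count("1")


def build_index(codes):
    """Sort the codes once; returns parallel lists of codes and protein ids."""
    pairs = sorted((code, pid) for pid, code in codes.items())
    return [c for c, _ in pairs], [p for _, p in pairs]


def search(keys, ids, query, window=2, max_dist=1):
    """Pre-screen by binary search, then verify with the Hamming distance."""
    pos = bisect_left(keys, query)  # O(log M) lookup in the sorted array
    lo, hi = max(0, pos - window), min(len(keys), pos + window)
    return [ids[i] for i in range(lo, hi) if hamming(keys[i], query) <= max_dist]


# Invented 8-bit codes standing in for learned hash codes.
codes = {"P1": 0b10110010, "P2": 0b10110011, "P3": 0b01001100, "P4": 0b11110000}
keys, ids = build_index(codes)
hits = search(keys, ids, 0b10110010)
print(hits)
```

Sorting is paid once when the database is preprocessed, so each later query costs a binary search plus a handful of bit comparisons.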
Affiliation(s)
- Yue Jiang
- College of Computer and Cyber Security, Fujian Normal University, Fuzhou, 350108, People's Republic of China
| | - Yuxuan Wang
- No. 2 Thoracic Surgery Department Beijing Chest Hospital, Capital Medical University, Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, 101149, People's Republic of China
| | - Lin Shen
- College of Computer and Cyber Security, Fujian Normal University, Fuzhou, 350108, People's Republic of China
| | - Donald A Adjeroh
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, 26506, USA
| | - Zhidong Liu
- No. 2 Thoracic Surgery Department Beijing Chest Hospital, Capital Medical University, Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, 101149, People's Republic of China.
| | - Jie Lin
- College of Computer and Cyber Security, Fujian Normal University, Fuzhou, 350108, People's Republic of China.
|
26
|
Simple synthesis of massively parallel RNA microarrays via enzymatic conversion from DNA microarrays. Nat Commun 2022; 13:3772. [PMID: 35773271 PMCID: PMC9246885 DOI: 10.1038/s41467-022-31370-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/08/2021] [Accepted: 06/14/2022] [Indexed: 11/20/2022] Open
Abstract
RNA catalytic and binding interactions with proteins and small molecules are fundamental elements of cellular life processes as well as the basis for RNA therapeutics and molecular engineering. In the absence of quantitative predictive capacity for such bioaffinity interactions, high-throughput experimental approaches are needed to sufficiently sample RNA sequence space. Here we report on a simple and highly accessible approach to convert commercially available customized DNA microarrays of any complexity and density to RNA microarrays via a T7 RNA polymerase-mediated extension of photocrosslinked methyl RNA primers and subsequent degradation of the DNA templates. RNA microarrays have many potential applications, but are difficult to produce. Here, the authors present a method for converting commercial, customizable DNA microarrays into RNA microarrays using an accessible three-step process involving primer photocrosslinking, extension, and template degradation.
|
27
|
Huang X, Shi Y, Yan J, Qu W, Li X, Tan J. LPI-CSFFR: Combining serial fusion with feature reuse for predicting LncRNA-protein interactions. Comput Biol Chem 2022; 99:107718. [PMID: 35785626 DOI: 10.1016/j.compbiolchem.2022.107718] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/28/2022] [Revised: 05/24/2022] [Accepted: 06/22/2022] [Indexed: 11/03/2022]
Abstract
Long non-coding RNAs (lncRNAs) play important roles in a series of life activities, and they function primarily with proteins. Wet-lab experimental methods for studying lncRNA-protein interactions (lncRPIs) are time-consuming and expensive. In this study, we propose for the first time a novel feature fusion method, LPI-CSFFR, to train and predict lncRPIs based on a Convolutional Neural Network (CNN) with feature reuse and serial fusion of sequences, secondary structures, and physicochemical properties of proteins and lncRNAs. The experimental results indicate that LPI-CSFFR achieves excellent performance on the datasets RPI1460 and RPI1807, with accuracies of 83.7% and 98.1%, respectively. We further compare LPI-CSFFR with existing state-of-the-art methods on the same benchmark datasets to evaluate its performance. In addition, to test the generalization performance of the model, we independently test sample pairs from five model organisms, among which Mus musculus yields the highest prediction accuracy of 99.5%, and we find multiple hotspot proteins after constructing an interaction network. Finally, we test the predictive power of LPI-CSFFR for sample pairs with unknown interactions. The results indicate that LPI-CSFFR is promising for predicting potential lncRPIs. The relevant source code and the data used in this study are available at https://github.com/JianjunTan-Beijing/LPI-CSFFR.
Affiliation(s)
- Xiaoqian Huang
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| | - Yi Shi
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| | - Jing Yan
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| | - Wenyan Qu
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| | - Xiaoyi Li
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| | - Jianjun Tan
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China.
|
28
|
Yang R, Liu H, Yang L, Zhou T, Li X, Zhao Y. RPpocket: An RNA–Protein Intuitive Database with RNA Pocket Topology Resources. Int J Mol Sci 2022; 23:6903. [PMID: 35805909 PMCID: PMC9266927 DOI: 10.3390/ijms23136903] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/24/2022] [Revised: 06/13/2022] [Accepted: 06/20/2022] [Indexed: 02/04/2023] Open
Abstract
RNA–protein complexes regulate a variety of biological functions. Thus, it is essential to explore and visualize RNA–protein structural interaction features, especially pocket interactions. In this work, we develop an easy-to-use bioinformatics resource: RPpocket. This database provides RNA–protein complex interactions based on sequence, secondary structure, and pocket topology analysis. We extracted 793 pockets from 74 non-redundant RNA–protein structures. Then, we calculated the binding- and non-binding pocket topological properties and analyzed the binding mechanism of the RNA–protein complex. The results showed that the binding pockets were more extended than the non-binding pockets. We also found that long-range forces were the main interaction for RNA–protein recognition, while short-range forces strengthened and optimized the binding. RPpocket could facilitate RNA–protein engineering for biological or medical applications.
|
29
|
Chu Y, Guo S, Cui D, Fu X, Ma Y. DeephageTP: a convolutional neural network framework for identifying phage-specific proteins from metagenomic sequencing data. PeerJ 2022; 10:e13404. [PMID: 35698617 PMCID: PMC9188312 DOI: 10.7717/peerj.13404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2021] [Accepted: 04/18/2022] [Indexed: 01/14/2023] Open
Abstract
Bacteriophages (phages) are the most abundant and diverse biological entities on Earth. Due to the lack of universal gene markers and database representatives, about 50-90% of phage genes cannot be assigned functions. This makes it a challenge to identify phage genomes and annotate the functions of phage genes efficiently by homology search on a large scale, especially for new phages. Portal (portal protein), TerL (large terminase subunit protein), and TerS (small terminase subunit protein) are three specific proteins of Caudovirales phages. Here, we developed a CNN (convolutional neural network)-based framework, DeephageTP, to identify these three specific proteins from metagenomic data. The framework takes one-hot encoded data of the original protein sequences as input and automatically extracts predictive features during modeling. To overcome the false positive problem, a cutoff-loss-value strategy is introduced based on the distributions of the loss values of protein sequences within the same category. The proposed model with a set of cutoff-loss-values demonstrates high performance in terms of precision in identifying TerL and Portal sequences (94% and 90%, respectively) from the mimic metagenomic dataset. Finally, we tested the efficacy of the framework using three real metagenomic datasets, and the results showed that, compared to conventional alignment-based methods, our proposed framework had a particular advantage in identifying novel phage-specific protein sequences of Portal and TerL with remote homology to their counterparts in the training datasets. In summary, our study develops for the first time a CNN-based framework for identifying phage-specific protein sequences with high complexity and low conservation, and this framework will help us find novel phages in metagenomic sequencing data. DeephageTP is available at https://github.com/chuym726/DeephageTP.
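The two ingredients named in this abstract, one-hot encoding of protein sequences and a per-class loss cutoff, can be sketched as below. This is a hedged illustration only: the 20-letter alphabet, the padding rule, the use of cross-entropy of the predicted class as "the loss", and the cutoff values are assumptions made for the example, and `one_hot` and `accept` are hypothetical names rather than functions from the DeephageTP repository.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: the 20 standard residues


def one_hot(seq, length):
    """Encode a protein as a length x 20 matrix, truncated or zero-padded."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in seq[:length]:
        row = [0] * len(AMINO_ACIDS)
        if aa in index:  # unknown residues such as X stay all-zero
            row[index[aa]] = 1
        rows.append(row)
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(length - len(rows)))
    return rows


def accept(probabilities, cutoffs):
    """Keep the predicted class only if its loss is within that class's cutoff."""
    label = max(probabilities, key=probabilities.get)
    loss = -math.log(probabilities[label])  # assumed loss: cross-entropy
    return label if loss <= cutoffs[label] else None


encoded = one_hot("ACX", 4)
cutoffs = {"TerL": 0.1, "Portal": 0.1, "other": 0.1}  # made-up thresholds
confident = accept({"TerL": 0.95, "Portal": 0.03, "other": 0.02}, cutoffs)
uncertain = accept({"TerL": 0.60, "Portal": 0.30, "other": 0.10}, cutoffs)
print(confident, uncertain)
```

Rejecting predictions whose loss exceeds the cutoff trades recall for precision, which matches the abstract's stated aim of reducing false positives.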
Affiliation(s)
- Yunmeng Chu
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China,Department of Bioengineering and Biotechnology, Huaqiao University, Xiamen, Fujian, P.R. China
| | - Shun Guo
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
| | - Dachao Cui
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
| | - Xiongfei Fu
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
| | - Yingfei Ma
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
|
30
|
Yin A, Chen W, Tang L, Zhong M, Jia B. Pseudogene CLEC4GP1 modulates trophoblast cell apoptosis and invasion via IL-15 inhibition. Exp Cell Res 2022; 418:113215. [DOI: 10.1016/j.yexcr.2022.113215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2021] [Revised: 05/16/2022] [Accepted: 05/17/2022] [Indexed: 11/04/2022]
|
31
|
Yu B, Wang X, Zhang Y, Gao H, Wang Y, Liu Y, Gao X. RPI-MDLStack: Predicting RNA-protein interactions through deep learning with stacking strategy and LASSO. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108676] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/05/2023]
|
32
|
Ren ZH, Yu CQ, Li LP, You ZH, Guan YJ, Li YC, Pan J. SAWRPI: A Stacking Ensemble Framework With Adaptive Weight for Predicting ncRNA-Protein Interactions Using Sequence Information. Front Genet 2022; 13:839540. [PMID: 35360836 PMCID: PMC8963817 DOI: 10.3389/fgene.2022.839540] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/20/2021] [Accepted: 02/07/2022] [Indexed: 11/13/2022] Open
Abstract
Non-coding RNAs (ncRNAs) play essential roles in biological processes such as gene regulation. One critical way in which ncRNAs execute biological functions is through interactions with RNA-binding proteins (RBPs). Identifying the proteins involved in ncRNA-protein interactions helps to understand the function of ncRNAs. Many high-throughput experiments have been applied to recognize these interactions. Because these approaches are time- and labor-consuming, a great number of computational methods have been developed to advance ncRNA-protein interaction research. However, these methods may not be applicable to all RNAs and proteins, particularly when processing new RNAs and proteins. Additionally, most of them do not handle long sequences well. In this work, a computational method, SAWRPI, is proposed to predict ncRNA-protein interactions from sequence information. More specifically, the raw features of proteins and ncRNAs are first extracted through a k-mer sparse matrix with SVD reduction and through learning nucleic acid symbols by natural language processing with a local fusion strategy, respectively. Then, to ease classification, the Hilbert transformation is used to transform the raw feature data into a new feature space. Finally, a stacking ensemble strategy is adopted to learn high-level abstraction features automatically and generate the final prediction results. To confirm robustness and stability, three different datasets containing two kinds of interactions are utilized. In comparison with state-of-the-art methods and other classification or feature extraction strategies, SAWRPI achieved high performance on the three datasets, which contain two kinds of lncRNA-protein interactions. Based on our findings, SAWRPI is trustworthy, robust, yet simple, and can be used as a beneficial supplement for the task of predicting ncRNA-protein interactions.
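The first feature-extraction step mentioned in this abstract, a k-mer sparse matrix, can be sketched as follows. The sketch covers only the counting step and deliberately omits the SVD reduction; the value k=3, the row-as-dictionary sparse layout, and the names `kmer_counts` and `sparse_kmer_matrix` are assumptions for illustration and are not taken from SAWRPI.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping substring of length k in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def sparse_kmer_matrix(seqs, k=3):
    """Build one sparse row per sequence as {column index: count}.

    Columns are assigned in order of first appearance, so only k-mers that
    actually occur take up space.
    """
    vocab = {}  # k-mer -> column index
    rows = []
    for seq in seqs:
        row = {}
        for kmer, n in kmer_counts(seq, k).items():
            col = vocab.setdefault(kmer, len(vocab))
            row[col] = n
        rows.append(row)
    return rows, vocab


# Short invented sequences; real inputs would be full protein sequences.
rows, vocab = sparse_kmer_matrix(["MKVMKV", "KVLA"])
print(rows)
print(vocab)
```

A dimensionality reduction such as a truncated SVD would then be applied to this matrix to obtain dense, low-dimensional protein features.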
Affiliation(s)
- Zhong-Hao Ren
- School of Information Engineering, Xijing University, Xi’an, China
| | - Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi’an, China
- *Correspondence: Li-Ping Li; Chang-Qing Yu
| | - Li-Ping Li
- School of Information Engineering, Xijing University, Xi’an, China
- *Correspondence: Li-Ping Li; Chang-Qing Yu
| | - Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
| | - Yong-Jian Guan
- School of Information Engineering, Xijing University, Xi’an, China
| | - Yue-Chao Li
- School of Information Engineering, Xijing University, Xi’an, China
| | - Jie Pan
- School of Information Engineering, Xijing University, Xi’an, China
|
33
|
Zhao G, Li P, Qiao X, Han X, Liu ZP. Predicting lncRNA–Protein Interactions by Heterogenous Network Embedding. Front Genet 2022; 12:814073. [PMID: 35186016 PMCID: PMC8854746 DOI: 10.3389/fgene.2021.814073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/12/2021] [Accepted: 12/27/2021] [Indexed: 12/25/2022] Open
Abstract
lncRNA–protein interactions play essential roles in a variety of cellular processes. However, the experimental methods for systematic mapping of lncRNA–protein interactions remain time-consuming and expensive. Therefore, reliable computational methods for predicting lncRNA–protein interactions are urgently needed. In this study, we propose a computational method called LncPNet to predict potential lncRNA–protein interactions by embedding an lncRNA–protein heterogeneous network. The experimental results indicate that LncPNet achieves promising performance on benchmark datasets extracted from the NPInter database, with an accuracy of 0.930 and an area under the ROC curve (AUC) of 0.971. In addition, we compare our method with eight other state-of-the-art methods, and the results illustrate that our method achieves superior prediction performance. LncPNet provides an effective method via a new perspective on representing the lncRNA–protein heterogeneous network, which will greatly benefit the prediction of lncRNA–protein interactions.
Affiliation(s)
- Guoqing Zhao
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
| | - Pengpai Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
| | - Xu Qiao
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
| | - Xianhua Han
- Faculty of Science, Yamaguchi University, Yamaguchi, Japan
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- *Correspondence: Zhi-Ping Liu
|
34
|
Cui F, Zhang Z, Cao C, Zou Q, Chen D, Su X. Protein-DNA/RNA interactions: Machine intelligence tools and approaches in the era of artificial intelligence and big data. Proteomics 2022; 22:e2100197. [PMID: 35112474 DOI: 10.1002/pmic.202100197] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 11/30/2021] [Revised: 01/02/2022] [Accepted: 01/17/2022] [Indexed: 11/09/2022]
Abstract
With the development of artificial intelligence technologies and the availability of large amounts of biological data, computational methods for proteomics have undergone a developmental process from traditional machine learning to deep learning. This review focuses on computational approaches and tools for the prediction of protein-DNA/RNA interactions using machine intelligence techniques. We provide an overview of the development progress of computational methods and summarize the advantages and shortcomings of these methods. We further compiled applications in tasks related to the protein-DNA/RNA interactions, and pointed out possible future application trends. Moreover, biological sequence-digitizing representation strategies used in different types of computational methods are also summarized and discussed.
Collapse
Affiliation(s)
- Feifei Cui
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China.,Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Zilong Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China.,Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Chen Cao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China.,Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, 324000, China
| | - Xi Su
- Foshan Maternal and Child Health Hospital, Foshan, Guangdong, China
|
35
|
3D Modeling of Non-coding RNA Interactions. Adv Exp Med Biol 2022; 1385:281-317. [DOI: 10.1007/978-3-031-08356-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|
36
|
Zhao D, Wang C, Yan S, Chen R. Advances in the identification of long non-coding RNA binding proteins. Anal Biochem 2021; 639:114520. [PMID: 34896376 DOI: 10.1016/j.ab.2021.114520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/16/2021] [Revised: 12/04/2021] [Accepted: 12/04/2021] [Indexed: 02/06/2023]
Abstract
Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nt without evident protein coding function. They play important regulatory roles in many biological processes, e.g., gene regulation, chromatin remodeling, and cell fate determination during development. Dysregulation of lncRNAs has been observed in various diseases including cancer. Interacting with proteins is a crucial way for lncRNAs to play their biological roles. Therefore, the characterization of lncRNA binding proteins is important to understand their functions and to delineate the underlying molecular mechanism. Large-scale studies based on mass spectrometry have characterized over a thousand new RNA binding proteins without known RNA-binding domains, thus revealing the complexity and diversity of RNA-protein interactions. In addition, several methods have been developed to identify the binding proteins for particular RNAs of interest. Here we review the progress of the RNA-centric methods for the identification of RNA-protein interactions, focusing on the studies involving lncRNAs, and discuss their strengths and limitations.
Affiliation(s)
- Dongqing Zhao
- School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, 300072, China
| | - Chunqing Wang
- The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, 250014, China
| | - Shuai Yan
- Peking University First Hospital, Peking University Health Science Center, Beijing, 100191, China
| | - Ruibing Chen
- School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, 300072, China.
|
37
|
Peng L, Yuan R, Shen L, Gao P, Zhou L. LPI-EnEDT: an ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification. BioData Min 2021; 14:50. [PMID: 34861891 PMCID: PMC8642957 DOI: 10.1186/s13040-021-00277-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/29/2021] [Accepted: 08/22/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) have dense linkages with various biological processes. Identifying interacting lncRNA-protein pairs contributes to understanding the functions and mechanisms of lncRNAs. Wet experiments are costly and time-consuming. Most computational methods failed to account for the imbalanced characteristics of lncRNA-protein interaction (LPI) data. More importantly, they were evaluated on a single dataset, which produced prediction bias. RESULTS In this study, we develop an ensemble framework (LPI-EnEDT) with extra tree and decision tree classifiers to implement imbalanced LPI data classification. First, five LPI datasets are arranged. Second, lncRNAs and proteins are separately characterized based on Pyfeat and BioTriangle, and the features are concatenated as a vector to represent each lncRNA-protein pair. Finally, an ensemble framework with extra tree and decision tree classifiers is developed to classify unlabeled lncRNA-protein pairs. The comparative experiments demonstrate that LPI-EnEDT outperforms four classical LPI prediction methods (LPI-BLS, LPI-CatBoost, LPI-SKF, and PLIPCOM) under cross validations on lncRNAs, proteins, and LPIs. The average AUC values on the five datasets are 0.8480, 0.7078, and 0.9066 under the three cross validations, respectively. The average AUPRs are 0.8175, 0.7265, and 0.8882, respectively. Case analyses suggest that there are underlying associations between HOTTIP and Q9Y6M1, and between NRON and Q15717. CONCLUSIONS By fusing diverse biological features of lncRNAs and proteins and exploiting an ensemble learning model with extra tree and decision tree classifiers, this work focuses on imbalanced LPI data classification as well as interaction information inference for a new lncRNA (or protein).
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China.,College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
| | - Ruya Yuan
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
| | - Ling Shen
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
| | - Pengfei Gao
- College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
| | - Liqian Zhou
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China.
|
38
|
LPI-HyADBS: a hybrid framework for lncRNA-protein interaction prediction integrating feature selection and classification. BMC Bioinformatics 2021; 22:568. [PMID: 34836494 PMCID: PMC8620196 DOI: 10.1186/s12859-021-04485-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/24/2021] [Accepted: 11/09/2021] [Indexed: 12/03/2022] Open
Abstract
Background Long noncoding RNAs (lncRNAs) have dense linkages with a plethora of important cellular activities. lncRNAs exert functions by linking with corresponding RNA-binding proteins. Since experimental techniques to detect lncRNA-protein interactions (LPIs) are laborious and time-consuming, a few computational methods have been reported for LPI prediction. However, computation-based LPI identification methods have the following limitations: (1) Most methods were evaluated on a single dataset, so their generalization ability may not have been measured. (2) The majority of methods were validated under cross validation on lncRNA-protein pairs and did not investigate performance under other cross validations, especially cross validation on independent lncRNAs and independent proteins. (3) lncRNAs and proteins carry abundant biological information, and how to select informative features needs further investigation. Results Under a hybrid framework (LPI-HyADBS) integrating feature selection based on AdaBoost and classification models including a deep neural network (DNN), extreme gradient boosting (XGBoost), and an SVM with a penalty coefficient of misclassification (C-SVM), this work focuses on finding new LPIs. First, five datasets are arranged. Each dataset contains lncRNA sequences, protein sequences, and an LPI network. Second, biological features of lncRNAs and proteins are acquired based on Pyfeat. Third, the obtained features of lncRNAs and proteins are selected based on AdaBoost and concatenated to depict each LPI sample. Fourth, DNN, XGBoost, and C-SVM are used to classify lncRNA-protein pairs based on the concatenated features. Finally, a hybrid framework is developed to integrate the classification results from the above three classifiers.
LPI-HyADBS is compared to six classical LPI prediction approaches (LPI-SKF, LPI-NRLMF, Capsule-LPI, LPI-CNNCP, LPLNP, and LPBNI) on five datasets under 5-fold cross validations on lncRNAs, proteins, lncRNA-protein pairs, and independent lncRNAs and independent proteins. The results show that LPI-HyADBS has the best LPI prediction performance under the four different cross validations. In particular, LPI-HyADBS obtains better classification ability than the other six approaches on the constructed independent dataset. Case analyses suggest that there is relevance between ZNF667-AS1 and Q15717. Conclusions Integrating a feature selection approach based on AdaBoost and three classification techniques (DNN, XGBoost, and C-SVM), this work develops a hybrid framework to identify new linkages between lncRNAs and proteins. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04485-x.
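The difference between cross validation on lncRNA-protein pairs and cross validation on lncRNAs (or proteins) can be illustrated with a small sketch. This is not code from LPI-HyADBS; the function names, the toy data, and the round-robin fold assignment are assumptions made only for illustration. In the grouped variant, every lncRNA (or protein) appears in exactly one test fold, so the model is always tested on entities it has never seen during training.

```python
import random
from collections import defaultdict

def kfold_by_group(pairs, key_index, k=5, seed=0):
    """Yield (train, test) so that each entity at key_index
    (0 = lncRNA, 1 = protein) occurs in exactly one test fold."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair[key_index]].append(pair)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    for fold in range(k):
        test_keys = set(keys[fold::k])  # round-robin assignment of entities to folds
        test = [p for key in keys if key in test_keys for p in groups[key]]
        train = [p for key in keys if key not in test_keys for p in groups[key]]
        yield train, test

def kfold_by_pair(pairs, k=5, seed=0):
    """Yield (train, test) by splitting individual pairs; the same lncRNA
    or protein may appear on both sides of the split."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    for fold in range(k):
        test = shuffled[fold::k]
        train = [p for i, p in enumerate(shuffled) if i % k != fold]
        yield train, test

# Hypothetical toy data: 70 distinct (lncRNA, protein, label) triples.
pairs = [(f"lnc{i % 10}", f"prot{i % 7}", i % 2) for i in range(70)]

for train, test in kfold_by_group(pairs, key_index=0):
    held_out = {p[0] for p in test}
    print(sorted(held_out), len(train), len(test))
```

Passing key_index=1 gives the protein-wise split. Scores under the grouped splits are usually a stricter test of generalization than scores under the pair-wise split, which is why reporting all of them is informative.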
|
39
|
Ma Y, Wang X, Luo W, Xiao J, Song X, Wang Y, Shuai H, Ren Z, Wang Y. Roles of Emerging RNA-Binding Activity of cGAS in Innate Antiviral Response. Front Immunol 2021; 12:741599. [PMID: 34899698 PMCID: PMC8660693 DOI: 10.3389/fimmu.2021.741599] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/15/2021] [Accepted: 10/25/2021] [Indexed: 12/12/2022] Open
Abstract
cGAS, a DNA sensor in mammalian cells, catalyzes the generation of 2'-3'-cyclic AMP-GMP (cGAMP) once activated by the binding of free DNA. cGAMP can bind to STING, activating downstream TBK1-IRF-3 signaling to initiate the expression of type I interferons. Although cGAS has been considered a traditional DNA-binding protein, several lines of evidence suggest that cGAS is a potential RNA-binding protein (RBP), which is mainly supported by its interactions with RNAs, RBP partners, and RNA/cGAS phase separations, as well as its structural similarity with the dsRNA recognition receptor 2'-5' oligoadenylate synthase. Moreover, two influential studies reported that the cGAS-like receptors (cGLRs) of the fly Drosophila melanogaster sense RNA and control 3'-2'-cGAMP signaling. In this review, we summarize and discuss in depth recent studies that identified or implied cGAS as an RBP. We also comprehensively summarize current experimental methods and computational tools that can identify or predict RNAs that bind to cGAS. Based on these discussions, we argue that the RNA-binding activity of cGAS cannot be ignored in the cGAS-mediated innate antiviral response. It will be important to identify RNAs that can bind and regulate the activity of cGAS in cells with or without virus infection. Our review provides novel insight into the regulation of cGAS by its RNA-binding activity and extends beyond its DNA-binding activity. Our review would be significant for understanding the precise modulation of cGAS activity, providing the foundation for the future development of drugs against cGAS-triggering autoimmune diseases such as Aicardi-Goutières syndrome.
Affiliation(s)
- Yuying Ma
- Guangzhou Jinan Biomedicine Research and Development Center, National Engineering Research Center of Genetic Medicine, Institute of Biomedicine, College of Life Science and Technology, Jinan University, Guangzhou, China
- Key Laboratory of Virology of Guangdong Province, Jinan University, Guangzhou, China
- Guangdong Province Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou, China
| | - Xiaohui Wang
- Guangzhou Jinan Biomedicine Research and Development Center, National Engineering Research Center of Genetic Medicine, Institute of Biomedicine, College of Life Science and Technology, Jinan University, Guangzhou, China
- Key Laboratory of Virology of Guangdong Province, Jinan University, Guangzhou, China
- Guangdong Province Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou, China
| | - Weisheng Luo
- Guangzhou Jinan Biomedicine Research and Development Center, National Engineering Research Center of Genetic Medicine, Institute of Biomedicine, College of Life Science and Technology, Jinan University, Guangzhou, China
- Key Laboratory of Virology of Guangdong Province, Jinan University, Guangzhou, China
- Guangdong Province Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou, China
| | - Ji Xiao
- Guangzhou Jinan Biomedicine Research and Development Center, National Engineering Research Center of Genetic Medicine, Institute of Biomedicine, College of Life Science and Technology, Jinan University, Guangzhou, China
- Key Laboratory of Virology of Guangdong Province, Jinan University, Guangzhou, China
- Guangdong Province Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou, China
| | - Xiaowei Song
- Guangzhou Jinan Biomedicine Research and Development Center, National Engineering Research Center of Genetic Medicine, Institute of Biomedicine, College of Life Science and Technology, Jinan University, Guangzhou, China
- Key Laboratory of Virology of Guangdong Province, Jinan University, Guangzhou, China
- Guangdong Province Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou, China
| | - Yifei Wang
- Guangzhou Jinan Biomedicine Research and Development Center, National Engineering Research Center of Genetic Medicine, Institute of Biomedicine, College of Life Science and Technology, Jinan University, Guangzhou, China
- Key Laboratory of Virology of Guangdong Province, Jinan University, Guangzhou, China
- Guangdong Province Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou, China
| | - Hanlin Shuai
- Department of Obstetrics and Gynecology, The Fifth Affiliated Hospital of Jinan University, Heyuan, China
| | - Zhe Ren
- Guangzhou Jinan Biomedicine Research and Development Center, National Engineering Research Center of Genetic Medicine, Institute of Biomedicine, College of Life Science and Technology, Jinan University, Guangzhou, China
- Key Laboratory of Virology of Guangdong Province, Jinan University, Guangzhou, China
- Guangdong Province Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou, China
| | - Yiliang Wang
- Guangzhou Jinan Biomedicine Research and Development Center, National Engineering Research Center of Genetic Medicine, Institute of Biomedicine, College of Life Science and Technology, Jinan University, Guangzhou, China
- Key Laboratory of Virology of Guangdong Province, Jinan University, Guangzhou, China
- Guangdong Province Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou, China
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
|
40
|
Mushtaq M, Naveed H, Khalid Z. Computational Prediction of lncRNA-Protein Interactions using Machine learning. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:2100-2103. [PMID: 34891703 DOI: 10.1109/embc46164.2021.9630282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Long non-coding RNAs (lncRNAs) have generated much scientific interest because of their functional significance in regulating various biological processes and because their dysfunction has been implicated in disease progression. LncRNAs usually bind with proteins to perform their function. The experimental approaches for identifying these interactions are time-consuming and expensive. Lately, numerous methods for predicting lncRNA-protein interactions have been reported, yet they all have some prevalent drawbacks that limit their prediction performance. In this research, we propose a computational method based on a similarity scheme that integrates features derived from sequence and structure similarities. When compared with the state of the art, the proposed method achieved the highest performance, with an accuracy of 98.6% and an F1 measure of 98.7% using XGBoost as the classifier. Our results show that by combining sequence- and structure-based features, lncRNA-protein interactions can be better predicted, and that such predictions can complement the experimental techniques for this task. Clinical Relevance: LncRNA-protein interactions play a significant role in regulating various biological processes. This can help in providing early diagnosis and better treatment for cancer-related diseases.
|
41
|
Yu H, Shen ZA, Du PF. NPI-RGCNAE: Fast predicting ncRNA-protein interactions using the Relational Graph Convolutional Network Auto-Encoder. IEEE J Biomed Health Inform 2021; 26:1861-1871. [PMID: 34699377 DOI: 10.1109/jbhi.2021.3122527] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
ncRNAs play important roles in a variety of biological processes by interacting with RNA-binding proteins. Therefore, identifying ncRNA-protein interactions is important to understanding the biological functions of ncRNAs. Since experimental methods to determine ncRNA-protein interactions are always costly and time-consuming, computational methods have been proposed as alternative approaches. We developed a novel method, NPI-RGCNAE (predicting ncRNA-Protein Interactions by the Relational Graph Convolutional Network Auto-Encoder). With a reliable negative sample selection strategy, we applied the Relational Graph Convolutional Network encoder and the DistMult decoder to predict ncRNA-protein interactions in an accurate and efficient way. Using 5-fold cross-validation, we found that our method achieved performance comparable to all state-of-the-art methods. Our method requires less than 10% of the training time of all state-of-the-art methods. It is a more efficient choice for large datasets in practice. All datasets and source codes of NPI-RGCNAE have been deposited in a public GitHub repository (https://github.com/Angelia0hh/NPI-RGCNAE).
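As a rough illustration of what a DistMult decoder computes, the sketch below scores one ncRNA-protein pair from two embedding vectors and a diagonal relation vector. It is not taken from the NPI-RGCNAE repository: the embeddings are made-up numbers standing in for encoder output, and passing the score through a sigmoid to obtain a probability is an assumption about how such a score might be used.

```python
import math

def distmult_score(rna_vec, relation_diag, protein_vec):
    """DistMult score: sum_i rna[i] * relation[i] * protein[i],
    i.e. a bilinear form with a diagonal relation matrix."""
    return sum(r * w * p for r, w, p in zip(rna_vec, relation_diag, protein_vec))

def interaction_probability(rna_vec, relation_diag, protein_vec):
    """Map the raw score to (0, 1) with a sigmoid (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-distmult_score(rna_vec, relation_diag, protein_vec)))

# Hypothetical 3-dimensional embeddings; in practice these would come
# from a trained graph encoder, not be written by hand.
rna = [0.5, -1.0, 2.0]
protein = [1.0, 0.5, 0.25]
relation = [1.0, 2.0, 4.0]

score = distmult_score(rna, relation, protein)
prob = interaction_probability(rna, relation, protein)
print(score, prob)
```

Because the relation matrix is diagonal, the score is symmetric in its two entity arguments and costs only one pass over the embedding dimensions, which fits the emphasis on efficiency in the abstract.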
|
42
|
Zhou H, Wekesa JS, Luan Y, Meng J. PRPI-SC: an ensemble deep learning model for predicting plant lncRNA-protein interactions. BMC Bioinformatics 2021; 22:415. [PMID: 34429059 PMCID: PMC8385908 DOI: 10.1186/s12859-021-04328-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/01/2020] [Accepted: 11/09/2020] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Plant long non-coding RNAs (lncRNAs) play vital roles in many biological processes, mainly through interactions with RNA-binding proteins (RBPs). To understand the function of lncRNAs, a fundamental approach is to identify which types of proteins interact with them. However, modeling the rules of interaction remains a major challenge when estimating the types of RBP. RESULTS In this study, we propose an ensemble deep learning model, named PRPI-SC, to predict plant lncRNA-protein interactions using a stacked denoising autoencoder and a convolutional neural network based on sequence and structural information. PRPI-SC predicts interactions between lncRNAs and proteins based on the k-mer features of RNAs and proteins. Experiments showed good results on Arabidopsis thaliana and Zea mays datasets (ATH948 and ZEA22133). The accuracy rates on the ATH948 and ZEA22133 datasets were 88.9% and 82.6%, respectively. PRPI-SC also performed well on some public RNA-protein interaction datasets. CONCLUSIONS PRPI-SC accurately predicts the interaction between plant lncRNAs and proteins, which plays a guiding role in studying the function and expression of plant lncRNAs. At the same time, PRPI-SC has strong generalization ability and a good prediction effect on non-plant data.
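The k-mer features mentioned above can be sketched as a normalized frequency vector over all possible length-k substrings. This is a generic illustration, not the PRPI-SC implementation: the function name, the choice of k = 3, and the plain four-letter RNA alphabet are assumptions, and the abstract does not state the k values or any alphabet reduction the authors used.

```python
from itertools import product

def kmer_frequencies(sequence, k, alphabet):
    """Return the normalized frequency of every possible k-mer over
    `alphabet`, in lexicographic order of the alphabet string."""
    kmers = ["".join(t) for t in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows with unexpected symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

# Hypothetical short RNA sequence, used only to show the output shape.
rna_features = kmer_frequencies("AUGGCUAUGC", 3, "ACGU")
print(len(rna_features))
```

The vector length is fixed at len(alphabet) ** k regardless of sequence length, which is what lets sequences of different lengths feed the same network input layer.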
Affiliation(s)
- Haoran Zhou
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
| | - Jael Sanyanda Wekesa
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
| | - Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, 116024 Liaoning China
| | - Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
|
43
|
Das A, Sinha T, Shyamal S, Panda AC. Emerging Role of Circular RNA-Protein Interactions. Noncoding RNA 2021; 7:48. [PMID: 34449657 PMCID: PMC8395946 DOI: 10.3390/ncrna7030048] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 07/01/2021] [Revised: 07/26/2021] [Accepted: 07/29/2021] [Indexed: 12/17/2022] Open
Abstract
Circular RNAs (circRNAs) are emerging as novel regulators of gene expression in various biological processes. CircRNAs regulate gene expression by interacting with cellular regulators such as microRNAs and RNA binding proteins (RBPs) to regulate downstream gene expression. The accumulation of high-throughput RNA-protein interaction data revealed the interaction of RBPs with the coding and noncoding RNAs, including recently discovered circRNAs. RBPs are a large family of proteins known to play a critical role in gene expression by modulating RNA splicing, nuclear export, mRNA stability, localization, and translation. However, the interaction of RBPs with circRNAs and their implications on circRNA biogenesis and function has been emerging in the last few years. Recent studies suggest that circRNA interaction with target proteins modulates the interaction of the protein with downstream target mRNAs or proteins. This review outlines the emerging mechanisms of circRNA-protein interactions and their functional role in cell physiology.
Affiliation(s)
- Arundhati Das
- Institute of Life Sciences, Nalco Square, Bhubaneswar 751023, India; (A.D.); (T.S.); (S.S.)
- School of Biotechnology, KIIT University, Bhubaneswar 751024, India
| | - Tanvi Sinha
- Institute of Life Sciences, Nalco Square, Bhubaneswar 751023, India; (A.D.); (T.S.); (S.S.)
| | - Sharmishtha Shyamal
- Institute of Life Sciences, Nalco Square, Bhubaneswar 751023, India; (A.D.); (T.S.); (S.S.)
| | - Amaresh Chandra Panda
- Institute of Life Sciences, Nalco Square, Bhubaneswar 751023, India; (A.D.); (T.S.); (S.S.)
|
44
|
Karuppasamy MP, Venkateswaran S, Subbiah P. PDB-2-PBv3.0: An updated protein block database. J Bioinform Comput Biol 2021; 18:2050009. [PMID: 32404014 DOI: 10.1142/s0219720020500092] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 11/18/2022]
Abstract
Our protein block (PB) sequence database PDB-2-PBv1.0 provides PB sequences and dihedral angles for 74,297 protein structures comprising 103,252 protein chains of the Protein Data Bank (PDB) as of 2011. Since PBs have many practical applications and the PDB continues to grow, it has become necessary to provide PB sequences for all PDB protein structures. The current updated PDB-2-PBv3.0 contains PB sequences for 147,602 PDB structures comprising 400,355 protein chains as of October 2019. Compared to our previous version PDB-2-PBv1.0, the current PDB-2-PBv3.0 contains 2- and 4-fold increases in the number of protein structures and chains, respectively. Notably, it provides PB information for any protein chain, regardless of missing atom records in the protein structure data in the PDB. It includes protein interaction information with DNA and RNA along with their corresponding functional classes from the Nucleic Acid Database (NDB) and PDB. The updated version now allows the user to download multiple PB records by parameter search and/or by a given list. This database is freely accessible at http://bioinfo.bdu.ac.in/pb3.
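The stated 2- and 4-fold increases can be checked directly from the counts given in the abstract. The short calculation below uses only those four numbers; the variable names are chosen here for illustration.

```python
# Counts quoted in the abstract for the two database versions.
structures_v1, chains_v1 = 74_297, 103_252    # PDB-2-PBv1.0 (2011)
structures_v3, chains_v3 = 147_602, 400_355   # PDB-2-PBv3.0 (October 2019)

structure_fold = structures_v3 / structures_v1
chain_fold = chains_v3 / chains_v1

print(f"structures: {structure_fold:.2f}-fold, chains: {chain_fold:.2f}-fold")
```

Both ratios fall slightly below the rounded figures, so "2- and 4-fold" should be read as approximate.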
Affiliation(s)
- Muthuvel Prasath Karuppasamy
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620 024, Tamil Nadu, India
| | - Suresh Venkateswaran
- Department of Paediatrics, Emory University School of Medicine & Children's Healthcare of Atlanta, GA, USA
| | - Parthasarathy Subbiah
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620 024, Tamil Nadu, India
|
45
|
Yu H, Shen ZA, Zhou YK, Du PF. Recent advances in predicting protein-lncRNA interactions using machine learning methods. Curr Gene Ther 2021; 22:228-244. [PMID: 34254917 DOI: 10.2174/1566523221666210712190718] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/18/2021] [Revised: 05/01/2021] [Accepted: 05/31/2021] [Indexed: 11/22/2022]
Abstract
Long non-coding RNAs (lncRNAs) are a type of RNA with little or no protein-coding ability and a length of more than 200 nucleotides. A large number of studies have indicated that lncRNAs play a significant role in various biological processes, including chromatin organization, epigenetic programming, transcriptional regulation, post-transcriptional processing, and circadian mechanisms at the cellular level. Since lncRNAs perform many of their functions through interactions with proteins, identifying lncRNA-protein interactions is crucial to understanding the molecular functions of lncRNAs. However, because experimental methods are costly and time-consuming, a variety of computational methods have emerged. Recently, many effective and novel machine learning methods have been developed. In general, these methods fall into two categories: semi-supervised learning methods and supervised learning methods. The latter category can be further classified into deep learning-based methods, ensemble learning-based methods, and hybrid methods. In this paper, we focus on supervised learning methods. We summarize the state-of-the-art methods for predicting lncRNA-protein interactions. Furthermore, the performance and characteristics of different methods are compared. Considering the limits of the existing models, we analyze the open problems and discuss future research potential.
Affiliation(s)
- Han Yu
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Zi-Ang Shen
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Yuan-Ke Zhou
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
|
46
|
Chowdhary A, Satagopam V, Schneider R. Long Non-coding RNAs: Mechanisms, Experimental, and Computational Approaches in Identification, Characterization, and Their Biomarker Potential in Cancer. Front Genet 2021; 12:649619. [PMID: 34276764 PMCID: PMC8281131 DOI: 10.3389/fgene.2021.649619] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 01/05/2021] [Accepted: 04/20/2021] [Indexed: 01/09/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are a diverse class of non-coding RNA molecules >200 base pairs in length with various functions such as gene regulation, dosage compensation, and epigenetic regulation. Dysregulation and genomic variations of several lncRNAs have been implicated in several diseases. Their tissue-specific and development-specific expression makes them viable indicators of the physiological states of cells. Here we present a comprehensive review of the molecular mechanisms and functions, the state-of-the-art experimental and computational pipelines, and the challenges involved in the identification and functional annotation of lncRNAs, as well as their prospects as biomarkers. We also illustrate the application of co-expression networks to the TCGA-LIHC dataset for putative functional predictions of lncRNAs with therapeutic potential in hepatocellular carcinoma (HCC).
Affiliation(s)
- Anshika Chowdhary
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Venkata Satagopam
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
|
47
|
Zooming in on protein-RNA interactions: a multi-level workflow to identify interaction partners. Biochem Soc Trans 2021; 48:1529-1543. [PMID: 32820806 PMCID: PMC7458403 DOI: 10.1042/bst20191059] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/12/2020] [Revised: 07/17/2020] [Accepted: 07/20/2020] [Indexed: 02/01/2023]
Abstract
Interactions between proteins and RNA are at the base of numerous cellular regulatory and functional phenomena. The investigation of the biological relevance of non-coding RNAs has led to the identification of numerous novel RNA-binding proteins (RBPs). However, defining the RNA sequences and structures that are selectively recognised by an RBP remains challenging, since these interactions can be transient and highly dynamic, and may be mediated by unstructured regions in the protein, as in the case of many non-canonical RBPs. Numerous experimental and computational methodologies have been developed to predict, identify and verify the binding between a given RBP and potential RNA partners, but navigating across the vast ocean of data can be frustrating and misleading. In this mini-review, we propose a workflow for the identification of the RNA binding partners of putative, newly identified RBPs. The large pool of potential binders selected by in-cell experiments can be enriched by in silico tools such as catRAPID, which is able to predict the RNA sequences more likely to interact with specific RBP regions with high accuracy. The RNA candidates with the highest potential can then be analysed in vitro to determine the binding strength and to precisely identify the binding sites. The results thus obtained can furthermore validate the computational predictions, offering an all-round solution to the issue of finding the most likely RNA binding partners for a newly identified potential RBP.
|
48
|
Li Y, Sun H, Feng S, Zhang Q, Han S, Du W. Capsule-LPI: a LncRNA-protein interaction predicting tool based on a capsule network. BMC Bioinformatics 2021; 22:246. [PMID: 33985444 PMCID: PMC8120853 DOI: 10.1186/s12859-021-04171-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/22/2020] [Accepted: 05/05/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Long noncoding RNAs (lncRNAs) play important roles in multiple biological processes. Identifying lncRNA-protein interactions (LPIs) is key to understanding lncRNA functions. Although some computational methods for LPI prediction have been developed, the problem remains challenging. How to integrate multimodal features from more perspectives and how to build deep learning architectures with better recognition performance have long been the focus of research on LPIs. RESULTS: We present Capsule-LPI, a novel multichannel capsule network framework that integrates multimodal features for LPI prediction. Capsule-LPI integrates four groups of multimodal features: sequence features, motif information, physicochemical properties and secondary structure features. It is composed of four feature-learning subnetworks and one capsule subnetwork. Through comprehensive experimental comparisons and evaluations, we demonstrate that both the multimodal features and the architecture of the multichannel capsule network can significantly improve the performance of LPI prediction. The experimental results show that Capsule-LPI performs better than the existing state-of-the-art tools. The precision of Capsule-LPI is 87.3%, which represents a 1.7% improvement. The F-value of Capsule-LPI is 92.2%, which represents a 1.4% improvement. CONCLUSIONS: This study provides a novel and feasible LPI prediction tool based on the integration of multimodal features and a capsule network. A web server (http://csbg-jlu.site/lpc/predict) has been developed for the convenience of users.
Affiliation(s)
- Ying Li
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Hang Sun
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Shiyao Feng
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Qi Zhang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Siyu Han
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
  - Department of Computer Science, Faculty of Engineering, University of Bristol, Bristol, BS8 1UB, UK
- Wei Du
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
|
49
|
Choudhary C, Sharma S, Meghwanshi KK, Patel S, Mehta P, Shukla N, Do DN, Rajpurohit S, Suravajhala P, Shukla JN. Long Non-Coding RNAs in Insects. Animals (Basel) 2021; 11:1118. [PMID: 33919662 PMCID: PMC8069800 DOI: 10.3390/ani11041118] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/04/2021] [Revised: 03/30/2021] [Accepted: 04/06/2021] [Indexed: 12/27/2022] Open
Abstract
Only a small subset of all transcribed RNAs are used as templates for protein translation, whereas RNA molecules that are not translated play a very important role as regulatory non-coding RNAs (ncRNAs). Besides the traditionally known RNAs (ribosomal and transfer RNAs), ncRNAs also include small non-coding RNAs (sncRNAs) and long non-coding RNAs (lncRNAs). The lncRNAs, which were initially thought to be junk, have gained a great deal of attention because of their regulatory roles in diverse biological processes in animals and plants. Insects are the most abundant and diverse group of animals on this planet. Recent studies have demonstrated the role of lncRNAs in almost all aspects of insect development, reproduction, and genetic plasticity. In this review, we describe the functions and molecular mechanisms of action of the different insect lncRNAs discovered to date.
Affiliation(s)
- Chhavi Choudhary
  - Department of Biotechnology, School of Life Sciences, Central University of Rajasthan, Bandarsindari, Ajmer 305801, India
- Shivasmi Sharma
  - Department of Biotechnology, Amity University Jaipur, Jaipur 303002, India
- Keshav Kumar Meghwanshi
  - Department of Biotechnology, School of Life Sciences, Central University of Rajasthan, Bandarsindari, Ajmer 305801, India
- Smit Patel
  - Department of Biotechnology, Amity University Jaipur, Jaipur 303002, India
- Prachi Mehta
  - Division of Biological & Life Sciences, School of Arts and Sciences, Ahmedabad University, Gujarat 380009, India
- Nidhi Shukla
  - Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Jaipur 302001, India
- Duy Ngoc Do
  - Institute of Research and Development, Duy Tan University, Danang 550000, Vietnam
- Subhash Rajpurohit
  - Division of Biological & Life Sciences, School of Arts and Sciences, Ahmedabad University, Gujarat 380009, India
- Prashanth Suravajhala
  - Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Jaipur 302001, India
  - Bioclues.org, Vivekananda Nagar, Kukatpally, Hyderabad, Telangana 500072, India
- Jayendra Nath Shukla
  - Department of Biotechnology, School of Life Sciences, Central University of Rajasthan, Bandarsindari, Ajmer 305801, India
|
50
|
Shen ZA, Luo T, Zhou YK, Yu H, Du PF. NPI-GNN: Predicting ncRNA-protein interactions with deep graph neural networks. Brief Bioinform 2021; 22:6210071. [PMID: 33822882 DOI: 10.1093/bib/bbab051] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 11/17/2020] [Revised: 01/29/2021] [Accepted: 02/01/2021] [Indexed: 12/23/2022] Open
Abstract
Noncoding RNAs (ncRNAs) play crucial roles in many biological processes. Experimental methods for identifying ncRNA-protein interactions (NPIs) are costly and time-consuming, and many computational approaches have been developed as alternatives. In this work, we collected five benchmarking datasets for predicting NPIs. Based on these datasets, we evaluated and compared the prediction performance of existing machine-learning-based methods. Graph neural networks (GNNs) are a recently developed class of deep learning algorithms for link prediction on complex networks that had not previously been applied to predicting NPIs. We constructed a GNN-based method, called Noncoding RNA-Protein Interaction prediction using Graph Neural Networks (NPI-GNN), to predict NPIs. NPI-GNN achieved performance comparable to that of state-of-the-art methods in 5-fold cross-validation. In addition, it is capable of predicting novel interactions based on network information and sequence information. We also found that insufficient sequence information does not much affect the prediction performance of NPI-GNN, which makes it more robust than other methods. To our knowledge, NPI-GNN is the first end-to-end GNN predictor for NPIs. All benchmarking datasets in this work and all source code of the NPI-GNN method have been deposited, with documentation, in a GitHub repository (https://github.com/AshuiRUA/NPI-GNN).
Affiliation(s)
- Zi-Ang Shen
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Tao Luo
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Yuan-Ke Zhou
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Han Yu
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Pu-Feng Du
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
|