1. Long J, Zhang Q, Lu X, Wen J, Zhao L, Xie W. Multi-scale locality preserving projection for partial multi-view incomplete multi-label learning. Neural Netw 2024; 180:106748. [PMID: 39332211 DOI: 10.1016/j.neunet.2024.106748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 08/25/2024] [Accepted: 09/17/2024] [Indexed: 09/29/2024]
Abstract
Amidst advancements in feature extraction techniques, research on multi-view multi-label classifications has attracted widespread interest in recent years. However, real-world scenarios often pose a challenge where the completeness of multiple views and labels cannot be ensured. At present, only a handful of techniques have attempted to address the complex issue of partial multi-view incomplete multi-label classification, and the majority of these approaches overlook the significance of manifold structures between instances. To tackle these challenges, we propose a novel partial multi-view incomplete multi-label learning model, termed MSLPP. Differing from existing studies, MSLPP emphasizes retaining the effective inherent structure of data during the feature extraction process, thereby facilitating a richer semantic information extraction. Specifically, MSLPP captures and integrates four types of information: the distance and similarity information in the original feature space, and the distance and similarity information in the extracted feature space. Further, by adopting the graph embedding technique, it simultaneously preserves the intrinsic structure with multi-scale information through a constraint term. Moreover, taking into account the negative impact of the missing views on the model and the possible impact of missing views on the data inherent structure, we further propose a shielding strategy for missing views, which not only eliminates the negative effects of missing views on the model but also more accurately captures the inherent data structure. The experimental results on five widely recognized datasets indicate that the model performs better than many excellent methods.
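The shielding idea for missing views can be illustrated with a small sketch. This is not the MSLPP implementation: the Gaussian kernel, the `sigma` parameter, the zero-filled placeholder for a missing view, and the function names `masked_affinity` and `lpp_penalty` are all assumptions made for illustration. The sketch covers only one scale of one view. It shows how a presence mask keeps a placeholder row from creating spurious graph edges, and how the resulting graph enters a locality-preserving penalty.

```python
import math

def masked_affinity(view, present, sigma=1.0):
    """Gaussian-kernel affinity for one view (assumed kernel).

    Any pair that involves an instance whose view is missing gets weight 0,
    so placeholder features never contribute an edge.
    """
    n = len(view)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or not (present[i] and present[j]):
                continue  # shield: skip self-loops and missing instances
            d2 = sum((a - b) ** 2 for a, b in zip(view[i], view[j]))
            W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W

def lpp_penalty(Z, W):
    """Locality-preserving term: sum over i, j of W[i][j] * ||z_i - z_j||^2."""
    total = 0.0
    for i in range(len(Z)):
        for j in range(len(Z)):
            if W[i][j]:
                total += W[i][j] * sum((a - b) ** 2 for a, b in zip(Z[i], Z[j]))
    return total

# Instance 2 is missing in this view and is stored as a zero placeholder,
# which happens to coincide with instance 0.
view = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.0], [5.0, 5.0]]
present = [True, True, False, True]
W = masked_affinity(view, present)
```

Without the mask, the placeholder row would look identical to instance 0 and receive the maximum affinity; with it, the missing instance has no neighbors and does not affect the penalty.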
Affiliation(s)
- Jiang Long
  - College of Big Data and Information Engineering, Guizhou University, Guiyang, China
- Qi Zhang
  - Faculty of Data Science, City University of Macau, Macao Special Administrative Region of China
- Xiaohuan Lu
  - College of Big Data and Information Engineering, Guizhou University, Guiyang, China
- Jie Wen
  - Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, 518055, China
- Lian Zhao
  - College of Big Data and Information Engineering, Guizhou University, Guiyang, China
- Wulin Xie
  - College of Big Data and Information Engineering, Guizhou University, Guiyang, China

2. Zhang TH, Jo S, Zhang M, Wang K, Gao SJ, Huang Y. Understanding YTHDF2-mediated mRNA degradation by m6A-BERT-Deg. Brief Bioinform 2024; 25:bbae170. [PMID: 38622358 PMCID: PMC11018547 DOI: 10.1093/bib/bbae170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 03/05/2024] [Accepted: 03/20/2024] [Indexed: 04/17/2024] Open
Abstract
N6-methyladenosine (m6A) is the most abundant mRNA modification within mammalian cells, holding pivotal significance in the regulation of mRNA stability, translation and splicing. Furthermore, it plays a critical role in the regulation of RNA degradation by primarily recruiting the YTHDF2 reader protein. However, the selective regulation of mRNA decay of the m6A-methylated mRNA through YTHDF2 binding is poorly understood. To improve our understanding, we developed m6A-BERT-Deg, a BERT model adapted for predicting YTHDF2-mediated degradation of m6A-methylated mRNAs. We meticulously assembled a high-quality training dataset by integrating multiple data sources for the HeLa cell line. To overcome the limitation of small training samples, we employed a pre-training-fine-tuning strategy by first performing a self-supervised pre-training of the model on 427 760 unlabeled m6A site sequences. The test results demonstrated the importance of this pre-training strategy in enabling m6A-BERT-Deg to outperform other benchmark models. We further conducted a comprehensive model interpretation and revealed a surprising finding that the presence of co-factors in proximity to m6A sites may disrupt YTHDF2-mediated mRNA degradation, subsequently enhancing mRNA stability. We also extended our analyses to the HEK293 cell line, shedding light on the context-dependent YTHDF2-mediated mRNA degradation.
Affiliation(s)
- Ting-He Zhang
  - Cancer Virology Program, UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA 15232, USA
  - Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
- Sumin Jo
  - Cancer Virology Program, UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA 15232, USA
  - Department of Electrical and Computer Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, 15261, USA
- Michelle Zhang
  - Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, 78249, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Shou-Jiang Gao
  - Cancer Virology Program, UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA 15232, USA
  - Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15219, USA
- Yufei Huang
  - Cancer Virology Program, UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA 15232, USA
  - Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
  - Department of Electrical and Computer Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, 15261, USA
  - Department of Pharmaceutical Sciences, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA

3. Tang W, Deng Z, Zhou H, Zhang W, Hu F, Choi KS, Wang S. MVDINET: A Novel Multi-Level Enzyme Function Predictor With Multi-View Deep Interactive Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:84-94. [PMID: 38015669 DOI: 10.1109/tcbb.2023.3337158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
As an extremely significant class of biocatalysts, enzymes play an important role in biological reproduction and metabolism. Therefore, the prediction of enzyme function is of great significance in biomedicine. Recently, computational methods for predicting enzyme function have been proposed, and they effectively reduce the cost of enzyme function prediction. However, existing methods are still deficient in mining the discriminant information needed for enzyme function recognition. In this study, we present MVDINET, a novel method for multi-level enzyme function prediction. First, the initial multi-view feature data is extracted from the enzyme sequence. Then, these initial views are fed into various deep specific network modules to learn the depth-specificity information. Further, a deep view interaction network is designed to extract the interaction information. Finally, the specificity information and interaction information are fed into a multi-view adaptively weighted classifier. We comprehensively evaluate MVDINET on benchmark datasets and demonstrate that MVDINET is superior to existing methods.

4. Horlacher M, Cantini G, Hesse J, Schinke P, Goedert N, Londhe S, Moyon L, Marsico A. A systematic benchmark of machine learning methods for protein-RNA interaction prediction. Brief Bioinform 2023; 24:bbad307. [PMID: 37635383 PMCID: PMC10516373 DOI: 10.1093/bib/bbad307] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/31/2023] [Revised: 06/15/2023] [Accepted: 07/18/2023] [Indexed: 08/29/2023] Open
Abstract
RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation. Experiments to profile binding sites of RBPs in vivo are limited to transcripts expressed in the experimental cell type, creating the need for computational methods to infer missing binding information. While numerous machine learning-based methods have been developed for this task, their use of heterogeneous training and evaluation datasets across different sets of RBPs and CLIP-seq protocols makes a direct comparison of their performance difficult. Here, we compile a set of 37 machine learning (primarily deep learning) methods for in vivo RBP-RNA interaction prediction and systematically benchmark a subset of 11 representative methods across hundreds of CLIP-seq datasets and RBPs. Using homogenized sample pre-processing and two negative-class sample generation strategies, we evaluate methods in terms of predictive performance and assess the impact of neural network architectures and input modalities on model performance. We believe that this study will not only enable researchers to choose the optimal prediction method for their tasks at hand, but also aid method developers in developing novel, high-performing methods by introducing a standardized framework for their evaluation.
Affiliation(s)
- Marc Horlacher
  - Computational Health Center, Helmholtz Center Munich, Germany
  - School of Computation, Information and Technology, Technical University Munich (TUM), Germany
- Giulia Cantini
  - Computational Health Center, Helmholtz Center Munich, Germany
- Julian Hesse
  - Computational Health Center, Helmholtz Center Munich, Germany
- Patrick Schinke
  - Computational Health Center, Helmholtz Center Munich, Germany
- Nicolas Goedert
  - Computational Health Center, Helmholtz Center Munich, Germany
- Lambert Moyon
  - Computational Health Center, Helmholtz Center Munich, Germany

5. Feng W, Zhang H, Cao Y, Yang C, Khalid MHB, Yang Q, Li W, Wang Y, Fu F, Yu H. Comprehensive Identification of the Pum Gene Family and Its Involvement in Kernel Development in Maize. Int J Mol Sci 2023; 24:14036. [PMID: 37762337 PMCID: PMC10530998 DOI: 10.3390/ijms241814036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 09/07/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023] Open
Abstract
The Pumilio (Pum) RNA-binding protein family regulates post-transcription and plays crucial roles in stress response and growth. However, little is known about Pum in plants. In this study, a total of 19 ZmPum genes were identified in maize and classified into two groups. Although each ZmPum contains the conserved Pum domain, the ZmPum members show diversity in gene and protein architecture, physicochemical properties, chromosomal location, collinearity, cis-elements, and expression patterns. The typical ZmPum proteins have eight α-helical repeats, except for ZmPum2, 3, 5, 7, and 14, which have fewer α-helices. Moreover, we examined the expression profiles of ZmPum genes and found their involvement in kernel development. Except for ZmPum2, ZmPum genes are expressed in maize embryos, endosperms, or whole seeds. Notably, ZmPum4, 7, and 13 exhibited dramatically high expression levels during seed development. The study not only contributes valuable information for further validating the functions of ZmPum genes but also provides insights for improving and enhancing maize yield.
Affiliation(s)
- Wenqi Feng
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Hongwanjun Zhang
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Yang Cao
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Cheng Yang
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Muhammad Hayder Bin Khalid
  - National Research Centre of Intercropping, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
- Qingqing Yang
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Wanchen Li
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Yingge Wang
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Fengling Fu
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Haoqiang Yu
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China

6. Liu D, Lin Z, Jia C. NeuroCNN_GNB: an ensemble model to predict neuropeptides based on a convolution neural network and Gaussian naive Bayes. Front Genet 2023; 14:1226905. [PMID: 37576553 PMCID: PMC10414792 DOI: 10.3389/fgene.2023.1226905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 06/30/2023] [Indexed: 08/15/2023] Open
Abstract
Neuropeptides contain more chemical information than other classical neurotransmitters and have multiple receptor recognition sites. These characteristics allow neuropeptides to have a correspondingly higher selectivity for nerve receptors and fewer side effects. Traditional experimental methods, such as mass spectrometry and liquid chromatography technology, still need the support of a complete neuropeptide precursor database and the basic characteristics of neuropeptides. Incomplete neuropeptide precursor and information databases will lead to false-positives or reduce the sensitivity of recognition. In recent years, studies have proven that machine learning methods can rapidly and effectively predict neuropeptides. In this work, we have made a systematic attempt to create an ensemble tool based on four convolution neural network models. These baseline models were separately trained on one-hot encoding, AAIndex, G-gap dipeptide encoding and word2vec and integrated using Gaussian Naive Bayes (NB) to construct our predictor designated NeuroCNN_GNB. Both 5-fold cross-validation tests using benchmark datasets and independent tests showed that NeuroCNN_GNB outperformed other state-of-the-art methods. Furthermore, this novel framework provides essential interpretations that aid the understanding of model success by leveraging the powerful Shapley Additive exPlanation (SHAP) algorithm, thereby highlighting the most important features relevant for predicting neuropeptides.
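Two of the sequence encodings named in the abstract, one-hot and G-gap dipeptide encoding, can be sketched as follows. The abstract does not give the exact conventions, so the choices here are assumptions: the 20-letter alphabet order, the gap definition (`g` residues sit between the two paired residues), frequency normalization, and the function names `one_hot` and `g_gap_dipeptide`.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues

def one_hot(seq):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in seq]

def g_gap_dipeptide(seq, g=1):
    """Frequencies of residue pairs (seq[i], seq[i + g + 1]) over all 400 pairs.

    With g = 0 this reduces to the ordinary dipeptide composition.
    Pairs containing a non-standard residue are ignored.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]

vec = g_gap_dipeptide("ACAC", g=1)  # the only pairs are A.A and C.C
```

Each encoding yields a fixed-layout numeric input that a separate baseline model can be trained on, which is what makes a later ensemble over per-encoding predictions possible.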
Affiliation(s)
- Di Liu
  - Information Science and Technology College, Dalian Maritime University, Dalian, China
- Zhengkui Lin
  - Information Science and Technology College, Dalian Maritime University, Dalian, China
- Cangzhi Jia
  - School of Science, Dalian Maritime University, Dalian, China

7. Wang N, Yan K, Zhang J, Liu B. iDRNA-ITF: identifying DNA- and RNA-binding residues in proteins based on induction and transfer framework. Brief Bioinform 2022; 23:6609520. [PMID: 35709747 DOI: 10.1093/bib/bbac236] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/15/2022] [Revised: 05/06/2022] [Accepted: 05/20/2022] [Indexed: 11/14/2022] Open
Abstract
Protein-DNA and protein-RNA interactions are involved in many biological activities. In the post-genome era, accurate identification of DNA- and RNA-binding residues in protein sequences is of great significance for studying protein functions and promoting new drug design and development. Therefore, some sequence-based computational methods have been proposed for identifying DNA- and RNA-binding residues. However, they failed to fully utilize the functional properties of residues, leading to limited prediction performance. In this paper, a sequence-based method iDRNA-ITF was proposed to incorporate the functional properties in residue representation by using an induction and transfer framework. The properties of nucleic acid-binding residues were induced by the nucleic acid-binding residue feature extraction network, and then transferred into the feature integration modules of the DNA-binding residue prediction network and the RNA-binding residue prediction network for the final prediction. Experimental results on four test sets demonstrate that iDRNA-ITF achieves the state-of-the-art performance, outperforming the other existing sequence-based methods. The webserver of iDRNA-ITF is freely available at http://bliulab.net/iDRNA-ITF.
Affiliation(s)
- Ning Wang
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Ke Yan
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Jun Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China

8. Du X, Zhao X, Zhang Y. DeepBtoD: Improved RNA-binding proteins prediction via integrated deep learning. J Bioinform Comput Biol 2022; 20:2250006. [PMID: 35451938 DOI: 10.1142/s0219720022500068] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
RNA-binding proteins (RBPs) have crucial roles in various cellular processes such as alternative splicing and gene regulation. Therefore, the analysis and identification of RBPs is an essential issue. However, although many computational methods have been developed for predicting RBPs, few studies simultaneously consider local and global information from the perspective of the RNA sequence. Facing this challenge, we present a novel method called DeepBtoD, which predicts RBPs directly from RNA sequences. First, a [Formula: see text]-BtoD encoding is designed, which takes into account the composition of [Formula: see text]-nucleotides and their relative positions and forms a local module. Second, we designed a multi-scale convolutional module embedded with a self-attentive mechanism, the ms-focusCNN, which is used to further learn more effective, diverse, and discriminative high-level features. Finally, global information is considered to supplement local modules with ensemble learning to predict whether the target RNA binds to RBPs. Preliminary results on 24 independent test datasets show that our proposed method can classify RBPs with an area under the curve of 0.933. Remarkably, DeepBtoD shows competitive results against seven state-of-the-art methods, suggesting that RBPs can be recognized accurately by integrating local [Formula: see text]-BtoD and global information from RNA sequences alone. Hence, our integrative method may be useful to improve the power of RBP prediction, which might be particularly useful for modeling protein-nucleic acid interactions in systems biology studies. Our DeepBtoD server can be accessed at http://175.27.228.227/DeepBtoD/.
Affiliation(s)
- XiuQuan Du
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, Anhui, P. R. China
  - School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, P. R. China
- XiuJuan Zhao
  - School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, P. R. China
- YanPing Zhang
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, Anhui, P. R. China

9. Chalupová E, Vaculík O, Poláček J, Jozefov F, Majtner T, Alexiou P. ENNGene: an Easy Neural Network model building tool for Genomics. BMC Genomics 2022; 23:248. [PMID: 35361122 PMCID: PMC8973509 DOI: 10.1186/s12864-022-08414-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/05/2021] [Accepted: 02/23/2022] [Indexed: 11/17/2022] Open
Abstract
Background: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field. Results: Here we present ENNGene, an Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein. Conclusions: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08414-x.
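Integrated Gradients, which the abstract names as the attribution method, can be illustrated on a toy model. This is a generic sketch, not ENNGene's code: the logistic model, its `WEIGHTS` and `BIAS`, the all-zero baseline, and the midpoint Riemann sum with `steps=200` are all assumptions chosen to keep the example self-contained.

```python
import math

WEIGHTS = [0.8, -0.5, 0.3]  # hypothetical model parameters
BIAS = -0.1

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x):
    """Toy differentiable model: logistic regression on three inputs."""
    return sigmoid(sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS)

def gradient(x):
    """Analytic gradient of the logistic model with respect to its inputs."""
    y = model(x)
    return [y * (1.0 - y) * w for w in WEIGHTS]

def integrated_gradients(x, baseline, steps=200):
    """Approximate the path integral of the gradient from baseline to x.

    Uses a midpoint Riemann sum along the straight line between the two
    points, then scales each averaged gradient by (x_i - baseline_i).
    """
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            avg_grad[i] += g / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

x = [1.0, 0.0, 1.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(x, baseline)
```

A useful sanity check is the completeness property: the attributions sum, up to discretization error, to `model(x) - model(baseline)`, and an input equal to its baseline value receives zero attribution.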
Affiliation(s)
- Eliška Chalupová
  - Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia
  - Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- Ondřej Vaculík
  - Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia
  - Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- Jakub Poláček
  - Faculty of Informatics, Masaryk University, Brno, Czechia
- Filip Jozefov
  - Faculty of Informatics, Masaryk University, Brno, Czechia
- Tomáš Majtner
  - Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- Panagiotis Alexiou
  - Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia

10. Yu B, Wang X, Zhang Y, Gao H, Wang Y, Liu Y, Gao X. RPI-MDLStack: Predicting RNA-protein interactions through deep learning with stacking strategy and LASSO. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108676] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/05/2023]

11. Wang C, Zhang W, Tian R, Zhang J, Zhang L, Deng Z, Lv X, Li J, Liu L, Du G, Liu Y. Model-driven design of synthetic N-terminal coding sequences for regulating gene expression in yeast and bacteria. Biotechnol J 2022; 17:e2100655. [DOI: 10.1002/biot.202100655] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/04/2021] [Revised: 01/12/2022] [Accepted: 01/13/2022] [Indexed: 11/12/2022]
Affiliation(s)
- Chenyun Wang
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
- Wei Zhang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
  - Jiangsu Key Laboratory of Media Design and Software Technology, Wuxi 214122, China
- Rongzhen Tian
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
- Jianing Zhang
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
- Linpei Zhang
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Zhaohong Deng
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
  - Jiangsu Key Laboratory of Media Design and Software Technology, Wuxi 214122, China
- Xueqin Lv
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
- Jianghua Li
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
- Long Liu
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
- Guocheng Du
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
- Yanfeng Liu
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
  - Qingdao Special Food Research Institute, Wuxi 214122, China

12. DFpin: Deep learning-based protein-binding site prediction with feature-based non-redundancy from RNA level. Comput Biol Med 2022; 142:105216. [PMID: 35030497 DOI: 10.1016/j.compbiomed.2022.105216] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/01/2021] [Revised: 12/19/2021] [Accepted: 01/02/2022] [Indexed: 11/20/2022]
Abstract
The interaction between proteins and RNA is closely related to various human diseases. Computer-aided drug design can be facilitated by detecting the RNA sites that bind proteins. However, because binding sites cluster together in RNA sequences, extracting RNA fragments with a sliding window yields highly similar samples. To address this problem, we present a method, DFpin, to predict protein-interacting nucleotides in RNA. To retain more key nucleotide sites, we used a redundancy-removal method based on feature similarity; that is, redundancy is removed based on the RNA mono-nucleotide composition to maintain the diversity of RNA samples and avoid leaving residual redundant data. In addition, to extract key abstract features and avoid over-fitting, we used the cascade structure of a deep forest model to predict protein-interacting nucleotides. Overall, DFpin demonstrated excellent classification with 85.4% accuracy and 93.3% area under the curve. Compared with other methods, the accuracy of DFpin was better, suggesting that feature-based redundancy removal and deep forest can help predict protein-interacting nucleotides. The source code and all datasets are available at: https://github.com/zhaoxj-tech/DFpin.git.
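The fragment extraction and feature-based redundancy removal described above can be sketched in a few lines. This is a minimal illustration, not the DFpin code: the L1 distance between compositions, the `threshold` parameter, the greedy keep-first order, and the function names are assumptions, since the abstract does not specify the similarity measure.

```python
def sliding_windows(rna, width, step=1):
    """Extract every fragment of the given width from an RNA sequence."""
    return [rna[i:i + width] for i in range(0, len(rna) - width + 1, step)]

def mono_composition(fragment):
    """Mono-nucleotide composition: fraction of A, C, G and U in the fragment."""
    n = len(fragment)
    return tuple(fragment.count(base) / n for base in "ACGU")

def remove_redundant(fragments, threshold=0.0):
    """Greedily keep a fragment only if its composition differs from every
    already kept fragment by more than `threshold` (L1 distance, assumed)."""
    kept, kept_features = [], []
    for frag in fragments:
        feat = mono_composition(frag)
        if all(sum(abs(a - b) for a, b in zip(feat, other)) > threshold
               for other in kept_features):
            kept.append(frag)
            kept_features.append(feat)
    return kept

windows = sliding_windows("AAGGCAAGG", width=4)
unique = remove_redundant(windows)
```

On this toy sequence the six overlapping windows collapse to three, because shifted windows such as `AGGC` and `GGCA` share the same composition. Raising `threshold` would prune near-duplicates as well as exact ties.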

13. The Role of Pumilio RNA Binding Protein in Plants. Biomolecules 2021; 11:biom11121851. [PMID: 34944494 PMCID: PMC8699478 DOI: 10.3390/biom11121851] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/13/2021] [Revised: 12/01/2021] [Accepted: 12/07/2021] [Indexed: 11/27/2022] Open
Abstract
Eukaryotic organisms have a posttranscriptional/translational regulation system for the control of translational efficiency. RNA binding proteins (RBPs) are known to control target genes. One type of protein, the Pumilio (Pum)/Puf family of RNA binding proteins, shows specific binding to the 3′ untranslated region (3′ UTR) of target mRNA and functions as a post-transcriptional/translational regulator in eukaryotic cells. Plant Pum proteins are involved in development and biotic/abiotic stresses. Interestingly, Arabidopsis Pum can control target genes in a sequence-specific manner and rRNA processing in a sequence-nonspecific manner. As shown by in silico Pum gene expression analysis, Arabidopsis and rice Pum genes are responsive to biotic/abiotic stresses. Plant Pum can commonly contribute to host gene regulation at the post-transcriptional/translational step, as can mammalian Pum. However, the function of plant Pum proteins is not yet fully known. In this review, we briefly summarize the function of plant Pum in defense, development, and environmental responses via recent research and bioinformatics data.