1
|
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Most of the proteins do not have experimentally determined localization information, computational prediction methods and tools have been acting as an active research area for more than two decades now. Knowledge of the subcellular location of a protein provides valuable information about its functionalities, the functioning of the cell, and other possible interactions with proteins. Fast, reliable, and accurate predictors provides platforms to harness the abundance of sequence data to predict subcellular locations accordingly. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by prediction tools taxonomies that highlight their rele- vant area and examples for uncomplicated categorization and ease of understandability. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools to facilitate readers and can be considered a quick guide for researchers to identify and explore the recent literature advancements.
Collapse
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
| | - Gianluca Pollastri
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
| |
Collapse
|
2
|
Xiao C, Zhou Z, She J, Yin J, Cui F, Zhang Z. PEL-PVP: Application of plant vacuolar protein discriminator based on PEFT ESM-2 and bilayer LSTM in an unbalanced dataset. Int J Biol Macromol 2024; 277:134317. [PMID: 39094861 DOI: 10.1016/j.ijbiomac.2024.134317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2024] [Revised: 07/10/2024] [Accepted: 07/28/2024] [Indexed: 08/04/2024]
Abstract
Plant vacuoles, play a crucial role in maintaining cellular stability, adapting to environmental changes, and responding to external pressures. The accurate identification of vacuolar proteins (PVPs) is crucial for understanding the biosynthetic mechanisms of intracellular vacuoles and the adaptive mechanisms of plants. In order to more accurately identify vacuole proteins, this study developed a new predictive model PEL-PVP based on ESM-2. Through this study, the feasibility and effectiveness of using advanced pre-training models and fine-tuning techniques for bioinformatics tasks were demonstrated, providing new methods and ideas for plant vacuolar protein research. In addition, previous datasets for vacuolar proteins were balanced, but imbalance is more closely related to the actual situation. Therefore, this study constructed an imbalanced dataset UB-PVP from the UniProt database,helping the model better adapt to the complexity and uncertainty in real environments, thereby improving the model's generalization ability and practicality. The experimental results show that compared with existing recognition techniques, achieving significant improvements in multiple indicators, with 6.08 %, 13.51 %, 11.9 %, and 5 % improvements in ACC, SP, MCC, and AUC, respectively. The accuracy reaches 94.59 %, significantly higher than the previous best model GraphIdn. This provides an efficient and precise tool for the study of plant vacuole proteins.
Collapse
Affiliation(s)
- Cuilin Xiao
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
| | - Zheyu Zhou
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
| | - Jiayi She
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
| | - Jinfen Yin
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
| | - Feifei Cui
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
| | - Zilong Zhang
- School of Computer Science and Technology, Hainan University, Haikou 570228, China.
| |
Collapse
|
3
|
García Sánchez N, Ugarte Carro E, Prieto-Santamaría L, Rodríguez-González A. Protein sequence analysis in the context of drug repurposing. BMC Med Inform Decis Mak 2024; 24:122. [PMID: 38741115 DOI: 10.1186/s12911-024-02531-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2023] [Accepted: 05/08/2024] [Indexed: 05/16/2024] Open
Abstract
MOTIVATION Drug repurposing speeds up the development of new treatments, being less costly, risky, and time consuming than de novo drug discovery. There are numerous biological elements that contribute to the development of diseases and, as a result, to the repurposing of drugs. METHODS In this article, we analysed the potential role of protein sequences in drug repurposing scenarios. For this purpose, we embedded the protein sequences by performing four state of the art methods and validated their capacity to encapsulate essential biological information through visualization. Then, we compared the differences in sequence distance between protein-drug target pairs of drug repurposing and non - drug repurposing data. Thus, we were able to uncover patterns that define protein sequences in repurposing cases. RESULTS We found statistically significant sequence distance differences between protein pairs in the repurposing data and the rest of protein pairs in non-repurposing data. In this manner, we verified the potential of using numerical representations of sequences to generate repurposing hypotheses in the future.
Collapse
Affiliation(s)
- Natalia García Sánchez
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain
| | - Esther Ugarte Carro
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain
| | - Lucía Prieto-Santamaría
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, 28660, Spain
| | - Alejandro Rodríguez-González
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain.
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, 28660, Spain.
| |
Collapse
|
4
|
Anteghini M, Santos VAMD, Saccenti E. PortPred: Exploiting deep learning embeddings of amino acid sequences for the identification of transporter proteins and their substrates. J Cell Biochem 2023; 124:1803-1824. [PMID: 37877557 DOI: 10.1002/jcb.30490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2023] [Revised: 09/29/2023] [Accepted: 10/03/2023] [Indexed: 10/26/2023]
Abstract
The physiology of every living cell is regulated at some level by transporter proteins which constitute a relevant portion of membrane-bound proteins and are involved in the movement of ions, small and macromolecules across bio-membranes. The importance of transporter proteins is unquestionable. The prediction and study of previously unknown transporters can lead to the discovery of new biological pathways, drugs and treatments. Here we present PortPred, a tool to accurately identify transporter proteins and their substrate starting from the protein amino acid sequence. PortPred successfully combines pre-trained deep learning-based protein embeddings and machine learning classification approaches and outperforms other state-of-the-art methods. In addition, we present a comparison of the most promising protein sequence embeddings (Unirep, SeqVec, ProteinBERT, ESM-1b) and their performances for this specific task.
Collapse
Affiliation(s)
- Marco Anteghini
- LifeGlimmer GmbH, Berlin, Germany
- Department of Systems and Synthetic Biology, Wageningen University & Research, Wageningen WE, The Netherlands
- Department of Visual and Data-Centric Computing, Zuse Institute Berlin, Berlin, Germany
| | - Vitor Ap Martins Dos Santos
- LifeGlimmer GmbH, Berlin, Germany
- Department of Bioprocess Engineering, Wageningen University & Research, Wageningen WE, The Netherlands
| | - Edoardo Saccenti
- Department of Systems and Synthetic Biology, Wageningen University & Research, Wageningen WE, The Netherlands
| |
Collapse
|
5
|
Sui J, Chen J, Chen Y, Iwamori N, Sun J. Identification of plant vacuole proteins by using graph neural network and contact maps. BMC Bioinformatics 2023; 24:357. [PMID: 37740195 PMCID: PMC10517492 DOI: 10.1186/s12859-023-05475-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2023] [Accepted: 09/12/2023] [Indexed: 09/24/2023] Open
Abstract
Plant vacuoles are essential organelles in the growth and development of plants, and accurate identification of their proteins is crucial for understanding their biological properties. In this study, we developed a novel model called GraphIdn for the identification of plant vacuole proteins. The model uses SeqVec, a deep representation learning model, to initialize the amino acid sequence. We utilized the AlphaFold2 algorithm to obtain the structural information of corresponding plant vacuole proteins, and then fed the calculated contact maps into a graph convolutional neural network. GraphIdn achieved accuracy values of 88.51% and 89.93% in independent testing and fivefold cross-validation, respectively, outperforming previous state-of-the-art predictors. As far as we know, this is the first model to use predicted protein topology structure graphs to identify plant vacuole proteins. Furthermore, we assessed the effectiveness and generalization capability of our GraphIdn model by applying it to identify and locate peroxisomal proteins, which yielded promising outcomes. The source code and datasets can be accessed at https://github.com/SJNNNN/GraphIdn .
Collapse
Affiliation(s)
- Jianan Sui
- School of Information Science and Engineering, University of Jinan, Jinan, China
| | - Jiazi Chen
- Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-Shi, Fukuoka, Japan
| | - Yuehui Chen
- School of Artificial Intelligence Institute and Information Science and Engineering, University of Jinan, Jinan, China.
| | - Naoki Iwamori
- Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-Shi, Fukuoka, Japan
| | - Jin Sun
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
| |
Collapse
|
6
|
Savojardo C, Martelli PL, Casadio R. Finding functional motifs in protein sequences with deep learning and natural language models. Curr Opin Struct Biol 2023; 81:102641. [PMID: 37385080 DOI: 10.1016/j.sbi.2023.102641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Revised: 04/17/2023] [Accepted: 05/24/2023] [Indexed: 07/01/2023]
Abstract
Recently, prediction of structural/functional motifs in protein sequences takes advantage of powerful machine learning based approaches. Protein encoding adopts protein language models overpassing standard procedures. Different combinations of machine learning and encoding schemas are available for predicting different structural/functional motifs. Particularly interesting is the adoption of protein language models to encode proteins in addition to evolution information and physicochemical parameters. A thorough analysis of recent predictors developed for annotating transmembrane regions, sorting signals, lipidation and phosphorylation sites allows to investigate the state-of-the-art focusing on the relevance of protein language models for the different tasks. This highlights that more experimental data are necessary to exploit available powerful machine learning methods.
Collapse
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy.
| |
Collapse
|
7
|
Shi Z, Deng R, Yuan Q, Mao Z, Wang R, Li H, Liao X, Ma H. Enzyme Commission Number Prediction and Benchmarking with Hierarchical Dual-core Multitask Learning Framework. RESEARCH (WASHINGTON, D.C.) 2023; 6:0153. [PMID: 37275124 PMCID: PMC10232324 DOI: 10.34133/research.0153] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Accepted: 04/28/2023] [Indexed: 06/07/2023]
Abstract
Enzyme commission (EC) numbers, which associate a protein sequence with the biochemical reactions it catalyzes, are essential for the accurate understanding of enzyme functions and cellular metabolism. Many ab initio computational approaches were proposed to predict EC numbers for given input protein sequences. However, the prediction performance (accuracy, recall, and precision), usability, and efficiency of existing methods decreased seriously when dealing with recently discovered proteins, thus still having much room to be improved. Here, we report HDMLF, a hierarchical dual-core multitask learning framework for accurately predicting EC numbers based on novel deep learning techniques. HDMLF is composed of an embedding core and a learning core; the embedding core adopts the latest protein language model for protein sequence embedding, and the learning core conducts the EC number prediction. Specifically, HDMLF is designed on the basis of a gated recurrent unit framework to perform EC number prediction in the multi-objective hierarchy, multitasking manner. Additionally, we introduced an attention layer to optimize the EC prediction and employed a greedy strategy to integrate and fine-tune the final model. Comparative analyses against 4 representative methods demonstrate that HDMLF stably delivers the highest performance, which improves accuracy and F1 score by 60% and 40% over the state of the art, respectively. An additional case study of tyrB predicted to compensate for the loss of aspartate aminotransferase aspC, as reported in a previous experimental study, shows that our model can also be used to uncover the enzyme promiscuity. Finally, we established a web platform, namely, ECRECer (https://ecrecer.biodesign.ac.cn), using an entirely could-based serverless architecture and provided an offline bundle to improve usability.
Collapse
Affiliation(s)
- Zhenkun Shi
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology,
Chinese Academy of Sciences, 300308, Tianjin, China
- National Center of Technology Innovation for Synthetic Biology, 300308, Tianjin, China
| | - Rui Deng
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology,
Chinese Academy of Sciences, 300308, Tianjin, China
- National Center of Technology Innovation for Synthetic Biology, 300308, Tianjin, China
- College of Biotechnology,
Tianjin University of Science & Technology, Tianjin, China
| | - Qianqian Yuan
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology,
Chinese Academy of Sciences, 300308, Tianjin, China
- National Center of Technology Innovation for Synthetic Biology, 300308, Tianjin, China
| | - Zhitao Mao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology,
Chinese Academy of Sciences, 300308, Tianjin, China
- National Center of Technology Innovation for Synthetic Biology, 300308, Tianjin, China
| | - Ruoyu Wang
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology,
Chinese Academy of Sciences, 300308, Tianjin, China
- National Center of Technology Innovation for Synthetic Biology, 300308, Tianjin, China
| | - Haoran Li
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology,
Chinese Academy of Sciences, 300308, Tianjin, China
- National Center of Technology Innovation for Synthetic Biology, 300308, Tianjin, China
| | - Xiaoping Liao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology,
Chinese Academy of Sciences, 300308, Tianjin, China
- National Center of Technology Innovation for Synthetic Biology, 300308, Tianjin, China
- Haihe Laboratory of Synthetic Biology, 300308, Tianjin, China
| | - Hongwu Ma
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology,
Chinese Academy of Sciences, 300308, Tianjin, China
- National Center of Technology Innovation for Synthetic Biology, 300308, Tianjin, China
| |
Collapse
|
8
|
Anteghini M, Martins Dos Santos VAP. Computational Approaches for Peroxisomal Protein Localization. Methods Mol Biol 2023; 2643:405-411. [PMID: 36952202 DOI: 10.1007/978-1-0716-3048-8_29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/24/2023]
Abstract
Computational approaches are practical when investigating putative peroxisomal proteins and for sub-peroxisomal protein localization in unknown protein sequences. Nowadays, advancements in computational methods and Machine Learning (ML) can be used to hasten the discovery of novel peroxisomal proteins and can be combined with more established computational methodologies. Here, we explain and list some of the most used tools and methodologies for novel peroxisomal protein detection and localization.
Collapse
Affiliation(s)
- Marco Anteghini
- Lifeglimmer GmbH, Berlin, Germany.
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, WE, The Netherlands.
- Zuse Institut Berlin, Visual and Data-Centric Computing, Berlin, Germany.
| | - Vitor A P Martins Dos Santos
- Lifeglimmer GmbH, Berlin, Germany
- BioProcess Engineering, Wageningen University & Research, Wageningen, WE, The Netherlands
| |
Collapse
|
9
|
Anteghini M, Haja A, Martins dos Santos VA, Schomaker L, Saccenti E. OrganelX web server for sub-peroxisomal and sub-mitochondrial protein localization and peroxisomal target signal detection. Comput Struct Biotechnol J 2022; 21:128-133. [PMID: 36544474 PMCID: PMC9747352 DOI: 10.1016/j.csbj.2022.11.058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Revised: 11/28/2022] [Accepted: 11/28/2022] [Indexed: 12/12/2022] Open
Abstract
We present the OrganelX e-Science Web Server that provides a user-friendly implementation of the In-Pero and In-Mito classifiers for sub-peroxisomal and sub-mitochondrial localization of peroxisomal and mitochondrial proteins and the Is-PTS1 algorithm for detecting and validating potential peroxisomal proteins carrying a PTS1 signal sequence. The OrganelX e-Science Web Server is available at https://organelx.hpc.rug.nl/fasta/.
Collapse
Affiliation(s)
- Marco Anteghini
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands,LifeGlimmer GmbH, Berlin, Germany,Corresponding author at: Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands (M. Anteghini).
| | - Asmaa Haja
- Bernoulli Institute, University of Groningen, Groningen, The Netherlands
| | - Vitor A.P. Martins dos Santos
- LifeGlimmer GmbH, Berlin, Germany,Bioprocess Engineering, Wageningen University & Research, Wageningen, The Netherlands
| | - Lambert Schomaker
- Bernoulli Institute, University of Groningen, Groningen, The Netherlands
| | - Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands,Corresponding author at: Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands (M. Anteghini).
| |
Collapse
|
10
|
Nakai K, Wei L. Recent Advances in the Prediction of Subcellular Localization of Proteins and Related Topics. FRONTIERS IN BIOINFORMATICS 2022; 2:910531. [PMID: 36304291 PMCID: PMC9580943 DOI: 10.3389/fbinf.2022.910531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Accepted: 04/25/2022] [Indexed: 11/13/2022] Open
Abstract
Prediction of subcellular localization of proteins from their amino acid sequences has a long history in bioinformatics and is still actively developing, incorporating the latest advances in machine learning and proteomics. Notably, deep learning-based methods for natural language processing have made great contributions. Here, we review recent advances in the field as well as its related fields, such as subcellular proteomics and the prediction/recognition of subcellular localization from image data.
Collapse
Affiliation(s)
- Kenta Nakai
- Institute of Medical Science, The University of Tokyo, Minato-Ku, Japan
- *Correspondence: Kenta Nakai,
| | - Leyi Wei
- School of Software, Shandong University, Jinan, China
| |
Collapse
|
11
|
Kamoshita M, Kumar R, Anteghini M, Kunze M, Islinger M, Martins dos Santos V, Schrader M. Insights Into the Peroxisomal Protein Inventory of Zebrafish. Front Physiol 2022; 13:822509. [PMID: 35295584 PMCID: PMC8919083 DOI: 10.3389/fphys.2022.822509] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2021] [Accepted: 02/07/2022] [Indexed: 12/19/2022] Open
Abstract
Peroxisomes are ubiquitous, oxidative subcellular organelles with important functions in cellular lipid metabolism and redox homeostasis. Loss of peroxisomal functions causes severe disorders with developmental and neurological abnormalities. Zebrafish are emerging as an attractive vertebrate model to study peroxisomal disorders as well as cellular lipid metabolism. Here, we combined bioinformatics analyses with molecular cell biology and reveal the first comprehensive inventory of Danio rerio peroxisomal proteins, which we systematically compared with those of human peroxisomes. Through bioinformatics analysis of all PTS1-carrying proteins, we demonstrate that D. rerio lacks two well-known mammalian peroxisomal proteins (BAAT and ZADH2/PTGR3), but possesses a putative peroxisomal malate synthase (Mlsl) and verified differences in the presence of purine degrading enzymes. Furthermore, we revealed novel candidate peroxisomal proteins in D. rerio, whose function and localisation is discussed. Our findings confirm the suitability of zebrafish as a vertebrate model for peroxisome research and open possibilities for the study of novel peroxisomal candidate proteins in zebrafish and humans.
Collapse
Affiliation(s)
- Maki Kamoshita
- College of Life and Environmental Sciences, Biosciences, University of Exeter, Exeter, United Kingdom
| | - Rechal Kumar
- College of Life and Environmental Sciences, Biosciences, University of Exeter, Exeter, United Kingdom
| | - Marco Anteghini
- LifeGlimmer GmbH, Berlin, Germany
- Systems and Synthetic Biology, Wageningen University & Research, Wageningen, Netherlands
| | - Markus Kunze
- Center for Brain Research, Medical University of Vienna, Vienna, Austria
| | - Markus Islinger
- Institute of Neuroanatomy, Mannheim Center for Translational Neuroscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Vítor Martins dos Santos
- LifeGlimmer GmbH, Berlin, Germany
- Systems and Synthetic Biology, Wageningen University & Research, Wageningen, Netherlands
| | - Michael Schrader
- College of Life and Environmental Sciences, Biosciences, University of Exeter, Exeter, United Kingdom
- *Correspondence: Michael Schrader,
| |
Collapse
|
12
|
Jiao S, Zou Q. Identification of plant vacuole proteins by exploiting deep representation learning features. Comput Struct Biotechnol J 2022; 20:2921-2927. [PMID: 35765653 PMCID: PMC9207291 DOI: 10.1016/j.csbj.2022.06.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2022] [Revised: 05/30/2022] [Accepted: 06/01/2022] [Indexed: 12/04/2022] Open
Abstract
Plant vacuoles are the most important organelles for plant growth, development, and defense, and they play an important role in many types of stress responses. An important function of vacuole proteins is the transport of various classes of amino acids, ions, sugars, and other molecules. Accurate identification of vacuole proteins is crucial for revealing their biological functions. Several automatic and rapid computational tools have been proposed for the subcellular localization of proteins. Regrettably, they are not specific for the identification of plant vacuole proteins. To the best of our knowledge, there is only one computational software specifically trained for plant vacuolar proteins. Although its accuracy is acceptable, the prediction performance and stability of this method in practical applications can still be improved. Hence, in this study, a new predictor named iPVP-DRLF was developed to identify plant vacuole proteins specifically and effectively. This prediction software is designed using the light gradient boosting machine (LGBM) algorithm and hybrid features composed of classic sequence features and deep representation learning features. iPVP-DRLF achieved fivefold cross-validation and independent test accuracy values of 88.25 % and 87.16 %, respectively, both outperforming previous state-of-the-art predictors. Moreover, the blind dataset test results also showed that the performance of iPVP-DRLF was significantly better than the existing tools. The results of comparative experiments confirmed that deep representation learning features have an advantage over other classic sequence features in the identification of plant vacuole proteins. We believe that iPVP-DRLF would serve as an effective computational technique for plant vacuole protein prediction and facilitate related future research. The online server is freely accessible at https://lab.malab.cn/~acy/iPVP-DRLF. In addition, the source code and datasets are also accessible at https://github.com/jiaoshihu/iPVP-DRLF.
Collapse
Affiliation(s)
- Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Corresponding author at: Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.
| |
Collapse
|