1
|
Bahammou D, Recorbet G, Mamode Cassim A, Robert F, Balliau T, Van Delft P, Haddad Y, Mongrand S, Fouillen L, Simon-Plas F. A combined lipidomic and proteomic profiling of Arabidopsis thaliana plasma membrane. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2024. [PMID: 38761101 DOI: 10.1111/tpj.16810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 04/24/2024] [Accepted: 04/30/2024] [Indexed: 05/20/2024]
Abstract
The plant plasma membrane (PM) plays a key role in perception of environmental signals, and set-up of adaptive responses. An exhaustive and quantitative description of the whole set of lipids and proteins constituting the PM is necessary to understand how these components allow to fulfill such essential physiological functions. Here we provide by state-of-the-art approaches the first combined reference of the plant PM lipidome and proteome from Arabidopsis thaliana suspension cell culture. We identified and quantified a reproducible core set of 2165 proteins, which is by far the largest set of available data concerning this plant PM proteome. Using the same samples, combined lipidomic approaches, allowing the identification and quantification of an unprecedented repertoire of 414 molecular species of lipids showed that sterols, phospholipids, and sphingolipids are present in similar proportions in the plant PM. Within each lipid class, the precise amount of each lipid family and the relative proportion of each molecular species were further determined, allowing to establish the complete lipidome of Arabidopsis PM, and highlighting specific characteristics of the different molecular species of lipids. Results obtained point to a finely tuned adjustment of the molecular characteristics of lipids and proteins. More than a hundred proteins related to lipid metabolism, transport, or signaling have been identified and put in perspective of the lipids with which they are associated. This set of data represents an innovative resource to guide further research relative to the organization and functions of the plant PM.
Collapse
Affiliation(s)
- Delphine Bahammou
- Laboratoire de Biogenèse Membranaire, CNRS, Université, Bordeaux, (UMR 5200), F-33140, Villenave d'Ornon, France
| | - Ghislaine Recorbet
- UMR Agroécologie, INRAE, Institut Agro Dijon, Université Bourgogne Franche-Comté, F-21000, Dijon, France
| | - Adiilah Mamode Cassim
- UMR Agroécologie, INRAE, Institut Agro Dijon, Université Bourgogne Franche-Comté, F-21000, Dijon, France
| | - Franck Robert
- UMR Agroécologie, INRAE, Institut Agro Dijon, Université Bourgogne Franche-Comté, F-21000, Dijon, France
| | - Thierry Balliau
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, PAPPSO, F-91190, Gif-Sur-Yvette, France
| | - Pierre Van Delft
- Laboratoire de Biogenèse Membranaire, CNRS, Université, Bordeaux, (UMR 5200), F-33140, Villenave d'Ornon, France
| | - Youcef Haddad
- Laboratoire de Biogenèse Membranaire, CNRS, Université, Bordeaux, (UMR 5200), F-33140, Villenave d'Ornon, France
| | - Sébastien Mongrand
- Laboratoire de Biogenèse Membranaire, CNRS, Université, Bordeaux, (UMR 5200), F-33140, Villenave d'Ornon, France
| | - Laetitia Fouillen
- Laboratoire de Biogenèse Membranaire, CNRS, Université, Bordeaux, (UMR 5200), F-33140, Villenave d'Ornon, France
| | - Françoise Simon-Plas
- UMR Agroécologie, INRAE, Institut Agro Dijon, Université Bourgogne Franche-Comté, F-21000, Dijon, France
| |
Collapse
|
2
|
Xiao H, Zou Y, Wang J, Wan S. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules 2024; 14:409. [PMID: 38672426 PMCID: PMC11048326 DOI: 10.3390/biom14040409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024] Open
Abstract
Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer's disease. Knowing where a target protein resides within a cell will give insights into tailored drug design for a disease. As the gold validation standard, the conventional wet lab uses fluorescent microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags for protein subcellular location identification. However, the booming era of proteomics and high-throughput sequencing generates tons of newly discovered proteins, making protein subcellular localization by wet-lab experiments a mission impossible. To tackle this concern, in the past decades, artificial intelligence (AI) and machine learning (ML), especially deep learning methods, have made significant progress in this research area. In this article, we review the latest advances in AI-based method development in three typical types of approaches, including sequence-based, knowledge-based, and image-based methods. We also elaborately discuss existing challenges and future directions in AI-based method development in this research field.
Collapse
Affiliation(s)
- Hanyu Xiao
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA;
| | - Yijin Zou
- College of Veterinary Medicine, China Agricultural University, Beijing 100193, China;
| | - Jieqiong Wang
- Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA;
| | - Shibiao Wan
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA;
| |
Collapse
|
3
|
Nielsen H. Protein Sorting Prediction. Methods Mol Biol 2024; 2715:27-63. [PMID: 37930519 DOI: 10.1007/978-1-0716-3445-5_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2023]
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global property-based, and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Collapse
Affiliation(s)
- Henrik Nielsen
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark.
| |
Collapse
|
4
|
Zou K, Wang S, Wang Z, Zhang Z, Yang F. HAR_Locator: a novel protein subcellular location prediction model of immunohistochemistry images based on hybrid attention modules and residual units. Front Mol Biosci 2023; 10:1171429. [PMID: 37664182 PMCID: PMC10470064 DOI: 10.3389/fmolb.2023.1171429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2023] [Accepted: 08/04/2023] [Indexed: 09/05/2023] Open
Abstract
Introduction: Proteins located in subcellular compartments have played an indispensable role in the physiological function of eukaryotic organisms. The pattern of protein subcellular localization is conducive to understanding the mechanism and function of proteins, contributing to investigating pathological changes of cells, and providing technical support for targeted drug research on human diseases. Automated systems based on featurization or representation learning and classifier design have attracted interest in predicting the subcellular location of proteins due to a considerable rise in proteins. However, large-scale, fine-grained protein microscopic images are prone to trapping and losing feature information in the general deep learning models, and the shallow features derived from statistical methods have weak supervision abilities. Methods: In this work, a novel model called HAR_Locator was developed to predict the subcellular location of proteins by concatenating multi-view abstract features and shallow features, whose advanced advantages are summarized in the following three protocols. Firstly, to get discriminative abstract feature information on protein subcellular location, an abstract feature extractor called HARnet based on Hybrid Attention modules and Residual units was proposed to relieve gradient dispersion and focus on protein-target regions. Secondly, it not only improves the supervision ability of image information but also enhances the generalization ability of the HAR_Locator through concatenating abstract features and shallow features. Finally, a multi-category multi-classifier decision system based on an Artificial Neural Network (ANN) was introduced to obtain the final output results of samples by fitting the most representative result from five subset predictors. Results: To evaluate the model, a collection of 6,778 immunohistochemistry (IHC) images from the Human Protein Atlas (HPA) database was used to present experimental results, and the accuracy, precision, and recall evaluation indicators were significantly increased to 84.73%, 84.77%, and 84.70%, respectively, compared with baseline predictors.
Collapse
Affiliation(s)
- Kai Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Simeng Wang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Ziqian Wang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Zhihai Zhang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Fan Yang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Artificial Intelligence and Bioinformation Cognition Laboratory, Jiangxi Science and Technology Normal University, Nanchang, China
| |
Collapse
|
5
|
Jiang Y, Wang D, Wang W, Xu D. Computational methods for protein localization prediction. Comput Struct Biotechnol J 2021; 19:5834-5844. [PMID: 34765098 PMCID: PMC8564054 DOI: 10.1016/j.csbj.2021.10.023] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Revised: 10/12/2021] [Accepted: 10/13/2021] [Indexed: 12/16/2022] Open
Abstract
The accurate annotation of protein localization is crucial in understanding protein function in tandem with a broad range of applications such as pathological analysis and drug design. Since most proteins do not have experimentally-determined localization information, the computational prediction of protein localization has been an active research area for more than two decades. In particular, recent machine-learning advancements have fueled the development of new methods in protein localization prediction. In this review paper, we first categorize the main features and algorithms used for protein localization prediction. Then, we summarize a list of protein localization prediction tools in terms of their coverage, characteristics, and accessibility to help users find suitable tools based on their needs. Next, we evaluate some of these tools on a benchmark dataset. Finally, we provide an outlook on the future exploration of protein localization methods.
Collapse
Affiliation(s)
- Yuexu Jiang
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Duolin Wang
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Weiwei Wang
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| |
Collapse
|
6
|
Spectrum of Protein Location in Proteomes Captures Evolutionary Relationship Between Species. J Mol Evol 2021; 89:544-553. [PMID: 34328525 PMCID: PMC8379119 DOI: 10.1007/s00239-021-10022-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2020] [Accepted: 07/16/2021] [Indexed: 11/10/2022]
Abstract
The native subcellular location (also referred to as localization or cellular compartment) of a protein is the one in which it acts most frequently; it is one aspect of protein function. Do ten eukaryotic model organisms differ in their location spectrum, i.e., the fraction of its proteome in each of seven major cellular compartments? As experimental annotations of locations remain biased and incomplete, we need prediction methods to answer this question. After systematic bias corrections, the complete but faulty prediction methods appeared to be more appropriate to compare location spectra between species than the incomplete more accurate experimental data. This work compared the location spectra for ten eukaryotes: Homo sapiens (human), Gorilla gorilla (gorilla), Pan troglodytes (chimpanzee), Mus musculus (mouse), Rattus norvegicus (rat), Drosophila melanogaster (fruit/vinegar fly), Anopheles gambiae (African malaria mosquito), Caenorhabitis elegans (nematode), Saccharomyces cerevisiae (baker’s yeast), and Schizosaccharomyces pombe (fission yeast). The two largest classes were predicted to be the nucleus and the cytoplasm together accounting for 47–62% of all proteins, while 7–21% of the proteins were predicted in the plasma membrane and 4–15% to be secreted. Overall, the predicted location spectra were largely similar. However, in detail, the differences sufficed to plot trees (UPGMA) and 2D (PCA) maps relating the ten organisms using a simple Euclidean distance in seven states (location classes). The relations based on the simple predicted location spectra captured aspects of cross-species comparisons usually revealed only by much more detailed evolutionary comparisons. Most interestingly, known phylogenetic relations were reproduced better by paralog-only than by ortholog-only trees.
Collapse
|
7
|
ANOX: A robust computational model for predicting the antioxidant proteins based on multiple features. Anal Biochem 2021; 631:114257. [PMID: 34043981 DOI: 10.1016/j.ab.2021.114257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2021] [Revised: 05/12/2021] [Accepted: 05/14/2021] [Indexed: 11/20/2022]
Abstract
As an indispensable component of various living organisms, the antioxidant proteins have been studied for anti-aging and prevention of various diseases, such as altitude sickness, coronary heart disease, and even cancer. However, the traditional experimental methods for identifying the antioxidant proteins are very expensive and time-consuming. Thus, to address the challenge, a new predictor, named ANOX, was developed in this study. Multiple features, such as frequency matrix features (FRE), amino acid and dipeptide composition (AADP), evolutionary difference formula features (EEDP), k-separated bigrams (KSB), and PSI-PRED secondary structure (PRED), were extracted to generate the original feature space. To find the optimized feature subset, the Max-Relevance-Max-Distance (MRMD) algorithm was implemented for feature ranking and our model received the best performance with the top 1170 features. Rigorous tests were performed to evaluate the performance of ANOX, and the results showed that ANOX achieved a major improvement in the prediction accuracy of the antioxidant proteins (AUC:0.930 and 0.935 using 5-fold cross-validation or the jackknife test) compared to the state-of-the-art predictor AOPs-SVM (AUC:0.869 and 0.885). The dataset used in this study and the source code of ANOX are all available at https://github.com/NWAFU-LiuLab/ANOX.
Collapse
|
8
|
Wang W, Chen Y, Zhao J, Chen L, Song W, Li L, Lin GN. Alternatively Splicing Interactomes Identify Novel Isoform-Specific Partners for NSD2. Front Cell Dev Biol 2021; 9:612019. [PMID: 33718354 PMCID: PMC7947288 DOI: 10.3389/fcell.2021.612019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2020] [Accepted: 02/05/2021] [Indexed: 11/13/2022] Open
Abstract
Nuclear receptor SET domain protein (NSD2) plays a fundamental role in the pathogenesis of Wolf-Hirschhorn Syndrome (WHS) and is overexpressed in multiple human myelomas, but its protein-protein interaction (PPI) patterns, particularly at the isoform/exon levels, are poorly understood. We explored the subcellular localizations of four representative NSD2 transcripts with immunofluorescence microscopy. Next, we used label-free quantification to perform immunoprecipitation mass spectrometry (IP-MS) analyses of the transcripts. Using the interaction partners for each transcript detected in the IP-MS results, we identified 890 isoform-specific PPI partners (83% are novel). These PPI networks were further divided into four categories of the exon-specific interactome. In these exon-specific PPI partners, two genes, RPL10 and HSPA8, were successfully confirmed by co-immunoprecipitation and Western blotting. RPL10 primarily interacted with Isoforms 1, 3, and 5, and HSPA8 interacted with all four isoforms, respectively. Using our extended NSD2 protein interactions, we constructed an isoform-level PPI landscape for NSD2 to serve as reference interactome data for NSD2 spliceosome-level studies. Furthermore, the RNA splicing processes supported by these isoform partners shed light on the diverse roles NSD2 plays in WHS and myeloma development. We also validated the interactions using Western blotting, RPL10, and the three NSD2 (Isoform 1, 3, and 5). Our results expand gene-level NSD2 PPI networks and provide a basis for the treatment of NSD2-related developmental diseases.
Collapse
Affiliation(s)
- Weidi Wang
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai, China
| | - Yucan Chen
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Jingjing Zhao
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Liang Chen
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Weichen Song
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Li Li
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Guan Ning Lin
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai, China
| |
Collapse
|
9
|
Littmann M, Heinzinger M, Dallago C, Olenyi T, Rost B. Embeddings from deep learning transfer GO annotations beyond homology. Sci Rep 2021; 11:1160. [PMID: 33441905 PMCID: PMC7806674 DOI: 10.1038/s41598-020-80786-0] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2020] [Accepted: 12/24/2020] [Indexed: 11/09/2022] Open
Abstract
Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with < 20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
Collapse
Affiliation(s)
- Maria Littmann
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany.
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
| | - Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
| | - Christian Dallago
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
| | - Tobias Olenyi
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
| | - Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany
- School of Life Sciences Weihenstephan (TUM-WZW), TUM (Technical University of Munich), Alte Akademie 8, Freising, Germany
- Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
| |
Collapse
|
10
|
Kumar R, Dhanda SK. Bird Eye View of Protein Subcellular Localization Prediction. Life (Basel) 2020; 10:E347. [PMID: 33327400 PMCID: PMC7764902 DOI: 10.3390/life10120347] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2020] [Revised: 12/11/2020] [Accepted: 12/11/2020] [Indexed: 12/12/2022] Open
Abstract
Proteins are made up of long chain of amino acids that perform a variety of functions in different organisms. The activity of the proteins is determined by the nucleotide sequence of their genes and by its 3D structure. In addition, it is essential for proteins to be destined to their specific locations or compartments to perform their structure and functions. The challenge of computational prediction of subcellular localization of proteins is addressed in various in silico methods. In this review, we reviewed the progress in this field and offered a bird eye view consisting of a comprehensive listing of tools, types of input features explored, machine learning approaches employed, and evaluation matrices applied. We hope the review will be useful for the researchers working in the field of protein localization predictions.
Collapse
Affiliation(s)
- Ravindra Kumar
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, NIH, 9609 Medical Center Drive, Rockville, MD 20850, USA
| | - Sandeep Kumar Dhanda
- Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| |
Collapse
|
11
|
Grasso S, van Rij T, van Dijl JM. GP4: an integrated Gram-Positive Protein Prediction Pipeline for subcellular localization mimicking bacterial sorting. Brief Bioinform 2020; 22:5998864. [PMID: 33227814 PMCID: PMC8294519 DOI: 10.1093/bib/bbaa302] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2020] [Revised: 10/08/2020] [Accepted: 10/09/2020] [Indexed: 11/17/2022] Open
Abstract
Subcellular localization is a critical aspect of protein function and the potential application of proteins either as drugs or drug targets, or in industrial and domestic applications. However, the experimental determination of protein localization is time consuming and expensive. Therefore, various localization predictors have been developed for particular groups of species. Intriguingly, despite their major representation amongst biotechnological cell factories and pathogens, a meta-predictor based on sorting signals and specific for Gram-positive bacteria was still lacking. Here we present GP4, a protein subcellular localization meta-predictor mainly for Firmicutes, but also Actinobacteria, based on the combination of multiple tools, each specific for different sorting signals and compartments. Novelty elements include improved cell-wall protein prediction, including differentiation of the type of interaction, prediction of non-canonical secretion pathway target proteins, separate prediction of lipoproteins and better user experience in terms of parsability and interpretability of the results. GP4 aims at mimicking protein sorting as it would happen in a bacterial cell. As GP4 is not homology based, it has a broad applicability and does not depend on annotated databases with homologous proteins. Non-canonical usage may include little studied or novel species, synthetic and engineered organisms, and even re-use of the prediction data to develop custom prediction algorithms. Our benchmark analysis highlights the improved performance of GP4 compared to other widely used subcellular protein localization predictors. A webserver running GP4 is available at http://gp4.hpc.rug.nl/
Collapse
Affiliation(s)
| | | | - Jan Maarten van Dijl
- University of Groningen and the University Medical Center Groningen, the Netherlands
| |
Collapse
|
12
|
Bouziane H, Chouarfia A. Use of Chou's 5-steps rule to predict the subcellular localization of gram-negative and gram-positive bacterial proteins by multi-label learning based on gene ontology annotation and profile alignment. J Integr Bioinform 2020; 18:51-79. [PMID: 32598314 PMCID: PMC8035964 DOI: 10.1515/jib-2019-0091] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2019] [Accepted: 04/08/2020] [Indexed: 12/31/2022] Open
Abstract
To date, many proteins generated by large-scale genome sequencing projects are still uncharacterized and subject to intensive investigations by both experimental and computational means. Knowledge of protein subcellular localization (SCL) is of key importance for protein function elucidation. However, it remains a challenging task, especially for multiple sites proteins known to shuttle between cell compartments to perform their proper biological functions and proteins which do not have significant homology to proteins of known subcellular locations. Due to their low-cost and reasonable accuracy, machine learning-based methods have gained much attention in this context with the availability of a plethora of biological databases and annotated proteins for analysis and benchmarking. Various predictive models have been proposed to tackle the SCL problem, using different protein sequence features pertaining to the subcellular localization, however, the overwhelming majority of them focuses on single localization and cover very limited cellular locations. The prediction was basically established on sorting signals, amino acids compositions, and homology. To improve the prediction quality, focus is actually on knowledge information extracted from annotation databases, such as protein-protein interactions and Gene Ontology (GO) functional domains annotation which has been recently a widely adopted and essential information for learning systems. To deal with such problem, in the present study, we considered SCL prediction task as a multi-label learning problem and tried to label both single site and multiple sites unannotated bacterial protein sequences by mining proteins homology relationships using both GO terms of protein homologs and PSI-BLAST profiles. The experiments using 5-fold cross-validation tests on the benchmark datasets showed a significant improvement on the results obtained by the proposed consensus multi-label prediction model which discriminates six compartments for Gram-negative and five compartments for Gram-positive bacterial proteins.
Collapse
Affiliation(s)
- Hafida Bouziane
- Département d’Informatique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, USTO-MB BP 1505, El M’Naouer, 31000, Oran, Algeria
| | - Abdallah Chouarfia
- Département d’Informatique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, USTO-MB BP 1505, El M’Naouer, 31000, Oran, Algeria
| |
Collapse
|
13
|
Singh V, Singh G, Singh V. TulsiPIN: An Interologous Protein Interactome of Ocimum tenuiflorum. J Proteome Res 2020; 19:884-899. [PMID: 31789043 DOI: 10.1021/acs.jproteome.9b00683] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
Abstract
Ocimum tenuiflorum, commonly known as holy basil or tulsi, is globally recognized for its multitude of medicinal properties. However, a comprehensive study revealing the complex interplay among its constituent proteins at subcellular level is still lacking. To bridge this gap, in this work, a genome-scale interologous protein-protein interaction (PPI) network, TulsiPIN, is developed using 36 template plants, which consists of 13 660 nodes and 327 409 binary interactions. A high confidence network, hc-TulsiPIN, consisting of 7719 nodes having 95 532 interactions is inferred using domain-domain interaction information along with interolog-based statistics, and its reliability is assessed using pathway enrichment, functional homogeneity, and protein colocalization of PPIs. Examination of topological features revealed that hc-TulsiPIN possesses conventional properties, like small-world, scale-free, and modular architecture. A total of 1625 vital proteins are predicted by statistically evaluating hc-TulsiPIN with two ensembles of corresponding random networks, each consisting of 10 000 realizations of Erdoős-Rényi and Barabási-Albert models. Also, numerous regulatory proteins like transcription factors, transcription regulators, and protein kinases are profiled. Using 36 guide genes participating in 9 secondary metabolite biosynthetic pathways, a subnetwork consisting of 171 proteins and 612 interactions was constructed, and 127 of these proteins could be successfully characterized. Detailed information of TulsiPIN is available at https://cuhpcbbtulsipin.shinyapps.io/tulsipin_v0/ .
Collapse
Affiliation(s)
- Vikram Singh
- Centre for Computational Biology and Bioinformatics , Central University of Himahcal Pradesh , Dharamshala 176206 , India
| | - Gagandeep Singh
- Centre for Computational Biology and Bioinformatics , Central University of Himahcal Pradesh , Dharamshala 176206 , India
| | - Vikram Singh
- Centre for Computational Biology and Bioinformatics , Central University of Himahcal Pradesh , Dharamshala 176206 , India
| |
Collapse
|
14
|
Nielsen H, Petsalaki EI, Zhao L, Stühler K. Predicting eukaryotic protein secretion without signals. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2019; 1867:140174. [DOI: 10.1016/j.bbapap.2018.11.011] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/01/2018] [Revised: 10/30/2018] [Accepted: 11/29/2018] [Indexed: 10/27/2022]
|
15
|
Yang F, Liu Y, Wang Y, Yin Z, Yang Z. MIC_Locator: a novel image-based protein subcellular location multi-label prediction model based on multi-scale monogenic signal representation and intensity encoding strategy. BMC Bioinformatics 2019; 20:522. [PMID: 31655541 PMCID: PMC6815465 DOI: 10.1186/s12859-019-3136-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2019] [Accepted: 10/09/2019] [Indexed: 12/20/2022] Open
Abstract
Background Protein subcellular localization plays a crucial role in understanding cell function. Proteins need to be in the right place at the right time, and combine with the corresponding molecules to fulfill their functions. Furthermore, prediction of protein subcellular location not only should be a guiding role in drug design and development due to potential molecular targets but also be an essential role in genome annotation. Taking the current status of image-based protein subcellular localization as an example, there are three common drawbacks, i.e., obsolete datasets without updating label information, stereotypical feature descriptor on spatial domain or grey level, and single-function prediction algorithm’s limited capacity of handling single-label database. Results In this paper, a novel human protein subcellular localization prediction model MIC_Locator is proposed. Firstly, the latest datasets are collected and collated as our benchmark dataset instead of obsolete data while training prediction model. Secondly, Fourier transformation, Riesz transformation, Log-Gabor filter and intensity coding strategy are employed to obtain frequency feature based on three components of monogenic signal with different frequency scales. Thirdly, a chained prediction model is proposed to handle multi-label instead of single-label datasets. The experiment results showed that the MIC_Locator can achieve 60.56% subset accuracy and outperform the existing majority of prediction models, and the frequency feature and intensity coding strategy can be conducive to improving the classification accuracy. Conclusions Our results demonstrate that the frequency feature is more beneficial for improving the performance of model compared to features extracted from spatial domain, and the MIC_Locator proposed in this paper can speed up validation of protein annotation, knowledge of protein function and proteomics research.
Collapse
Affiliation(s)
- Fan Yang
- School of Communications and Electronics, Jiangxi Science & Technology Normal University, Nanchang, 330003, China. .,Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, 02115, USA.
| | - Yang Liu
- School of Communications and Electronics, Jiangxi Science & Technology Normal University, Nanchang, 330003, China
| | - Yanbin Wang
- School of Communications and Electronics, Jiangxi Science & Technology Normal University, Nanchang, 330003, China
| | - Zhijian Yin
- School of Communications and Electronics, Jiangxi Science & Technology Normal University, Nanchang, 330003, China
| | - Zhen Yang
- School of Communications and Electronics, Jiangxi Science & Technology Normal University, Nanchang, 330003, China
| |
Collapse
|
16
|
Chou KC. Advances in Predicting Subcellular Localization of Multi-label Proteins and its Implication for Developing Multi-target Drugs. Curr Med Chem 2019; 26:4918-4943. [PMID: 31060481 DOI: 10.2174/0929867326666190507082559] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 12/16/2022]
Abstract
The smallest unit of life is a cell, which contains numerous protein molecules. Most
of the functions critical to the cell’s survival are performed by these proteins located in its different
organelles, usually called ‘‘subcellular locations”. Information of subcellular localization
for a protein can provide useful clues about its function. To reveal the intricate pathways at the
cellular level, knowledge of the subcellular localization of proteins in a cell is prerequisite.
Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine
the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing
and selecting the right targets for drug development. Unfortunately, it is both timeconsuming
and costly to determine the subcellular locations of proteins purely based on experiments.
With the avalanche of protein sequences generated in the post-genomic age, it is highly
desired to develop computational methods for rapidly and effectively identifying the subcellular
locations of uncharacterized proteins based on their sequences information alone. Actually,
considerable progresses have been achieved in this regard. This review is focused on those
methods, which have the capacity to deal with multi-label proteins that may simultaneously
exist in two or more subcellular location sites. Protein molecules with this kind of characteristic
are vitally important for finding multi-target drugs, a current hot trend in drug development.
Focused in this review are also those methods that have use-friendly web-servers established so
that the majority of experimental scientists can use them to get the desired results without the
need to go through the detailed mathematics involved.
Collapse
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
| |
Collapse
|
17
|
Abstract
The smallest unit of life is a cell, which contains numerous protein molecules. Most
of the functions critical to the cell’s survival are performed by these proteins located in its different
organelles, usually called ‘‘subcellular locations”. Information of subcellular localization
for a protein can provide useful clues about its function. To reveal the intricate pathways at the
cellular level, knowledge of the subcellular localization of proteins in a cell is prerequisite.
Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine
the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing
and selecting the right targets for drug development. Unfortunately, it is both timeconsuming
and costly to determine the subcellular locations of proteins purely based on experiments.
With the avalanche of protein sequences generated in the post-genomic age, it is highly
desired to develop computational methods for rapidly and effectively identifying the subcellular
locations of uncharacterized proteins based on their sequences information alone. Actually,
considerable progresses have been achieved in this regard. This review is focused on those
methods, which have the capacity to deal with multi-label proteins that may simultaneously
exist in two or more subcellular location sites. Protein molecules with this kind of characteristic
are vitally important for finding multi-target drugs, a current hot trend in drug development.
Focused in this review are also those methods that have use-friendly web-servers established so
that the majority of experimental scientists can use them to get the desired results without the
need to go through the detailed mathematics involved.
Collapse
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
| |
Collapse
|
18
|
Li SH, Guan ZX, Zhang D, Zhang ZM, Huang J, Yang W, Lin H. Recent Advancement in Predicting Subcellular Localization of Mycobacterial Protein with Machine Learning Methods. Med Chem 2019; 16:605-619. [PMID: 31584379 DOI: 10.2174/1573406415666191004101913] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2019] [Revised: 06/25/2019] [Accepted: 08/23/2019] [Indexed: 01/28/2023]
Abstract
Mycobacterium tuberculosis (MTB) can cause the terrible tuberculosis (TB), which is reported as one of the most dreadful epidemics. Although many biochemical molecular drugs have been developed to cope with this disease, the drug resistance-especially the multidrug-resistant (MDR) and extensively drug-resistance (XDR)-poses a huge threat to the treatment. However, traditional biochemical experimental method to tackle TB is time-consuming and costly. Benefited by the appearance of the enormous genomic and proteomic sequence data, TB can be treated via sequence-based biological computational approach-bioinformatics. Studies on predicting subcellular localization of mycobacterial protein (MBP) with high precision and efficiency may help figure out the biological function of these proteins and then provide useful insights for protein function annotation as well as drug design. In this review, we reported the progress that has been made in computational prediction of subcellular localization of MBP including the following aspects: 1) Construction of benchmark datasets. 2) Methods of feature extraction. 3) Techniques of feature selection. 4) Application of several published prediction algorithms. 5) The published results. 6) The further study on prediction of subcellular localization of MBP.
Collapse
Affiliation(s)
- Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Jian Huang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Wuritu Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,Development and Planning Department, Inner Mongolia University, Hohhot, P.R. China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|
19
|
Cao Z, Pan X, Yang Y, Huang Y, Shen HB. The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier. Bioinformatics 2019; 34:2185-2194. [PMID: 29462250 DOI: 10.1093/bioinformatics/bty085] [Citation(s) in RCA: 275] [Impact Index Per Article: 45.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2017] [Accepted: 02/14/2018] [Indexed: 01/01/2023] Open
Abstract
Motivation The long non-coding RNA (lncRNA) studies have been hot topics in the field of RNA biology. Recent studies have shown that their subcellular localizations carry important information for understanding their complex biological functions. Considering the costly and time-consuming experiments for identifying subcellular localization of lncRNAs, computational methods are urgently desired. However, to the best of our knowledge, there are no computational tools for predicting the lncRNA subcellular locations to date. Results In this study, we report an ensemble classifier-based predictor, lncLocator, for predicting the lncRNA subcellular localizations. To fully exploit lncRNA sequence information, we adopt both k-mer features and high-level abstraction features generated by unsupervised deep models, and construct four classifiers by feeding these two types of features to support vector machine (SVM) and random forest (RF), respectively. Then we use a stacked ensemble strategy to combine the four classifiers and get the final prediction results. The current lncLocator can predict five subcellular localizations of lncRNAs, including cytoplasm, nucleus, cytosol, ribosome and exosome, and yield an overall accuracy of 0.59 on the constructed benchmark dataset. Availability and implementation The lncLocator is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Zhen Cao
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
| | - Xiaoyong Pan
- Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands
| | - Yang Yang
- Department of Computer Science, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
| | - Yan Huang
- State Key Laboratory of Infrared Physics, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
| |
Collapse
|
20
|
Abstract
Background:
Revealing the subcellular location of a newly discovered protein can
bring insight into their function and guide research at the cellular level. The experimental methods
currently used to identify the protein subcellular locations are both time-consuming and expensive.
Thus, it is highly desired to develop computational methods for efficiently and effectively identifying
the protein subcellular locations. Especially, the rapidly increasing number of protein sequences
entering the genome databases has called for the development of automated analysis methods.
Methods:
In this review, we will describe the recent advances in predicting the protein subcellular
locations with machine learning from the following aspects: i) Protein subcellular location benchmark
dataset construction, ii) Protein feature representation and feature descriptors, iii) Common
machine learning algorithms, iv) Cross-validation test methods and assessment metrics, v) Web
servers.
Result & Conclusion:
Concomitant with a large number of protein sequences generated by highthroughput
technologies, four future directions for predicting protein subcellular locations with
machine learning should be paid attention. One direction is the selection of novel and effective features
(e.g., statistics, physical-chemical, evolutional) from the sequences and structures of proteins.
Another is the feature fusion strategy. The third is the design of a powerful predictor and the fourth
one is the protein multiple location sites prediction.
Collapse
Affiliation(s)
- Ting-He Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Shao-Wu Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
| |
Collapse
|
21
|
Abstract
Ever since the signal hypothesis was proposed in 1971, the exact nature of signal peptides has been a focus point of research. The prediction of signal peptides and protein subcellular location from amino acid sequences has been an important problem in bioinformatics since the dawn of this research field, involving many statistical and machine learning technologies. In this review, we provide a historical account of how position-weight matrices, artificial neural networks, hidden Markov models, support vector machines and, lately, deep learning techniques have been used in the attempts to predict where proteins go. Because the secretory pathway was the first one to be studied both experimentally and through bioinformatics, our main focus is on the historical development of prediction methods for signal peptides that target proteins for secretion; prediction methods to identify targeting signals for other cellular compartments are treated in less detail.
Collapse
Affiliation(s)
- Henrik Nielsen
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark.
| | - Konstantinos D Tsirigos
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
| | - Søren Brunak
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
- Faculty of Health and Medical Sciences, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | - Gunnar von Heijne
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Science for Life Laboratory, Stockholm University, Solna, Sweden
| |
Collapse
|
22
|
Zhang YJ, Zhu C, Ding Y, Yan ZW, Li GH, Lan Y, Wen JF, Chen B. Subcellular stoichiogenomics reveal cell evolution and electrostatic interaction mechanisms in cytoskeleton. BMC Genomics 2018; 19:469. [PMID: 29914356 PMCID: PMC6006717 DOI: 10.1186/s12864-018-4845-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2018] [Accepted: 05/31/2018] [Indexed: 01/24/2023] Open
Abstract
Background Eukaryotic cells contain a huge variety of internally specialized subcellular compartments. Stoichiogenomics aims to reveal patterns of elements usage in biological macromolecules. However, the stoichiogenomic characteristics and how they adapt to various subcellular microenvironments are still unknown. Results Here we first updated the definition of stoichiogenomics. Then we applied it to subcellular research, and detected distinctive nitrogen content of nuclear and hydrogen, sulfur content of extracellular proteomes. Specially, we found that acidic amino acids (AAs) content of cytoskeletal proteins is the highest. The increased charged AAs are mainly caused by the eukaryotic originated cytoskeletal proteins. Functional subdivision of the cytoskeleton showed that activation, binding/association, and complexes are the three largest functional categories. Electrostatic interaction analysis showed an increased electrostatic interaction between both primary sequences and PPI interfaces of 3D structures, in the cytoskeleton. Conclusions This study creates a blueprint of subcellular stoichiogenomic characteristics, and explains that charged AAs of the cytoskeleton increased greatly in evolution, which offer material basis for the eukaryotic cytoskeletal proteins to act in two ways of electrostatic interactions, and further perform their activation, binding/association and complex formation. Electronic supplementary material The online version of this article (10.1186/s12864-018-4845-0) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Yu-Juan Zhang
- Institute of Entomology and Molecular Biology, College of Life Sciences, Chongqing Normal University, Shapingba, Chongqing, 401331, People's Republic of China.,State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan Province, 650223, People's Republic of China
| | - Chengxu Zhu
- Institute of Entomology and Molecular Biology, College of Life Sciences, Chongqing Normal University, Shapingba, Chongqing, 401331, People's Republic of China
| | - Yiran Ding
- Institute of Entomology and Molecular Biology, College of Life Sciences, Chongqing Normal University, Shapingba, Chongqing, 401331, People's Republic of China
| | - Zheng-Wen Yan
- Institute of Entomology and Molecular Biology, College of Life Sciences, Chongqing Normal University, Shapingba, Chongqing, 401331, People's Republic of China
| | - Gong-Hua Li
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan Province, 650223, People's Republic of China
| | - Yang Lan
- Institute of Entomology and Molecular Biology, College of Life Sciences, Chongqing Normal University, Shapingba, Chongqing, 401331, People's Republic of China
| | - Jian-Fan Wen
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan Province, 650223, People's Republic of China.
| | - Bin Chen
- Institute of Entomology and Molecular Biology, College of Life Sciences, Chongqing Normal University, Shapingba, Chongqing, 401331, People's Republic of China.
| |
Collapse
|
23
|
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global-property-based and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches is described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Collapse
Affiliation(s)
- Henrik Nielsen
- Technical University of Denmark, Kemitorvet, Building 208, DK-2800, Kgs. Lyngby, Denmark.
| |
Collapse
|
24
|
Aloui A, Recorbet G, Lemaître-Guillier C, Mounier A, Balliau T, Zivy M, Wipf D, Dumas-Gaudot E. The plasma membrane proteome of Medicago truncatula roots as modified by arbuscular mycorrhizal symbiosis. MYCORRHIZA 2018; 28:1-16. [PMID: 28725961 DOI: 10.1007/s00572-017-0789-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/14/2017] [Accepted: 07/06/2017] [Indexed: 06/07/2023]
Abstract
In arbuscular mycorrhizal (AM) roots, the plasma membrane (PM) of the host plant is involved in all developmental stages of the symbiotic interaction, from initial recognition to intracellular accommodation of intra-radical hyphae and arbuscules. Although the role of the PM as the agent for cellular morphogenesis and nutrient exchange is especially accentuated in endosymbiosis, very little is known regarding the PM protein composition of mycorrhizal roots. To obtain a global overview at the proteome level of the host PM proteins as modified by symbiosis, we performed a comparative protein profiling of PM fractions from Medicago truncatula roots either inoculated or not with the AM fungus Rhizophagus irregularis. PM proteins were isolated from root microsomes using an optimized discontinuous sucrose gradient; their subsequent analysis by liquid chromatography followed by mass spectrometry (MS) identified 674 proteins. Cross-species sequence homology searches combined with MS-based quantification clearly confirmed enrichment in PM-associated proteins and depletion of major microsomal contaminants. Changes in protein amounts between the PM proteomes of mycorrhizal and non-mycorrhizal roots were monitored further by spectral counting. This workflow identified a set of 82 mycorrhiza-responsive proteins that provided insights into the plant PM response to mycorrhizal symbiosis. Among them, the association of one third of the mycorrhiza-responsive proteins with detergent-resistant membranes pointed at partitioning to PM microdomains. The PM-associated proteins responsive to mycorrhization also supported host plant control of sugar uptake to limit fungal colonization, and lipid turnover events in the PM fraction of symbiotic roots. Because of the depletion upon symbiosis of proteins mediating the replacement of phospholipids by phosphorus-free lipids in the plasmalemma, we propose a role of phosphate nutrition in the PM composition of mycorrhizal roots.
Collapse
Affiliation(s)
- Achref Aloui
- UMR Agroécologie, INRA/AgroSup/University Bourgogne Franche-Comté, Pôle Interactions Plantes Microrganismes, ERL 6003 CNRS, BP 86510, 21065, Dijon Cedex, France
- Laboratoire des Plantes Extrêmophiles, Centre de Biotechnologie de Borj-Cédria, BP 901, 2050, Hammam-lif, Tunisia
| | - Ghislaine Recorbet
- UMR Agroécologie, INRA/AgroSup/University Bourgogne Franche-Comté, Pôle Interactions Plantes Microrganismes, ERL 6003 CNRS, BP 86510, 21065, Dijon Cedex, France.
| | - Christelle Lemaître-Guillier
- UMR Agroécologie, INRA/AgroSup/University Bourgogne Franche-Comté, Pôle Interactions Plantes Microrganismes, ERL 6003 CNRS, BP 86510, 21065, Dijon Cedex, France
| | - Arnaud Mounier
- UMR Agroécologie, INRA/AgroSup/University Bourgogne Franche-Comté, Pôle Interactions Plantes Microrganismes, ERL 6003 CNRS, BP 86510, 21065, Dijon Cedex, France
| | - Thierry Balliau
- UMR de Génétique végétale, PAPPSO, Ferme du Moulon, 91190, Gif sur Yvette, France
| | - Michel Zivy
- UMR de Génétique végétale, PAPPSO, Ferme du Moulon, 91190, Gif sur Yvette, France
| | - Daniel Wipf
- UMR Agroécologie, INRA/AgroSup/University Bourgogne Franche-Comté, Pôle Interactions Plantes Microrganismes, ERL 6003 CNRS, BP 86510, 21065, Dijon Cedex, France
| | - Eliane Dumas-Gaudot
- UMR Agroécologie, INRA/AgroSup/University Bourgogne Franche-Comté, Pôle Interactions Plantes Microrganismes, ERL 6003 CNRS, BP 86510, 21065, Dijon Cedex, France
| |
Collapse
|
25
|
Zhou H, Yang Y, Shen HB. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features. Bioinformatics 2017; 33:843-853. [PMID: 27993784 DOI: 10.1093/bioinformatics/btw723] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2016] [Accepted: 11/17/2016] [Indexed: 11/13/2022] Open
Abstract
Motivation Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degenerate the performance of machine learning models. Results In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which will create more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F 1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals proteins co-localization preferences in the cell. Availability and Implementation www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. Contacts hbshen@sjtu.edu.cn. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Hang Zhou
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Ministry of Education of China, Shanghai, China.,Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
| | - Yang Yang
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China.,Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Ministry of Education of China, Shanghai, China.,Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
| |
Collapse
|
26
|
Nielsen H. Predicting Subcellular Localization of Proteins by Bioinformatic Algorithms. Curr Top Microbiol Immunol 2017; 404:129-158. [PMID: 26728066 DOI: 10.1007/82_2015_5006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Abstract
When predicting the subcellular localization of proteins from their amino acid sequences, there are basically three approaches: signal-based, global property-based, and homology-based. Each of these has its advantages and drawbacks, and it is important when comparing methods to know which approach was used. Various statistical and machine learning algorithms are used with all three approaches, and various measures and standards are employed when reporting the performances of the developed methods. This chapter presents a number of available methods for prediction of sorting signals and subcellular localization, but rather than providing a checklist of which predictors to use, it aims to function as a guide for critical assessment of prediction methods.
Collapse
Affiliation(s)
- Henrik Nielsen
- Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Kemitorvet building 208, 2800, Lyngby, Denmark.
| |
Collapse
|
27
|
Wan S, Mak MW, Kung SY. Transductive Learning for Multi-Label Protein Subchloroplast Localization Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:212-224. [PMID: 26887009 DOI: 10.1109/tcbb.2016.2527657] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Predicting the localization of chloroplast proteins at the sub-subcellular level is an essential yet challenging step to elucidate their functions. Most of the existing subchloroplast localization predictors are limited to predicting single-location proteins and ignore the multi-location chloroplast proteins. While recent studies have led to some multi-location chloroplast predictors, they usually perform poorly. This paper proposes an ensemble transductive learning method to tackle this multi-label classification problem. Specifically, given a protein in a dataset, its composition-based sequence information and profile-based evolutionary information are respectively extracted. These two kinds of features are respectively compared with those of other proteins in the dataset. The comparisons lead to two similarity vectors which are weighted-combined to constitute an ensemble feature vector. A transductive learning model based on the least squares and nearest neighbor algorithms is proposed to process the ensemble features. We refer to the resulting predictor to as EnTrans-Chlo. Experimental results on a stringent benchmark dataset and a novel dataset demonstrate that EnTrans-Chlo significantly outperforms state-of-the-art predictors and particularly gains more than 4% (absolute) improvement on the overall actual accuracy. For readers' convenience, EnTrans-Chlo is freely available online at http://bioinfo.eie.polyu.edu.hk/EnTransChloServer/.
Collapse
|
28
|
Wei L, Liao M, Gao X, Wang J, Lin W. mGOF-loc: A novel ensemble learning method for human protein subcellular localization prediction. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.09.137] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
29
|
Predicting protein subcellular localization based on information content of gene ontology terms. Comput Biol Chem 2016; 65:1-7. [PMID: 27665466 DOI: 10.1016/j.compbiolchem.2016.09.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2016] [Revised: 07/10/2016] [Accepted: 09/11/2016] [Indexed: 01/11/2023]
Abstract
Predicting the location where a protein resides within a cell is important in cell biology. Computational approaches to this issue have attracted more and more attentions from the community of biomedicine. Among the protein features used to predict the subcellular localization of proteins, the feature derived from Gene Ontology (GO) has been shown to be superior to others. However, most of the sights in this field are set on the presence or absence of some predefined GO terms. We proposed a method to derive information from the intrinsic structure of the GO graph. The feature vector was constructed with each element in it representing the information content of the GO term annotating to a protein investigated, and the support vector machines was used as classifier to test our extracted features. Evaluation experiments were conducted on three protein datasets and the results show that our method can enhance eukaryotic and human subcellular location prediction accuracy by up to 1.1% better than previous studies that also used GO-based features. Especially in the scenario where the cellular component annotation is absent, our method can achieved satisfied results with an overall accuracy of more than 87%.
Collapse
|
30
|
Wang X, Li H, Zhang Q, Wang R. Predicting Subcellular Localization of Apoptosis Proteins Combining GO Features of Homologous Proteins and Distance Weighted KNN Classifier. BIOMED RESEARCH INTERNATIONAL 2016; 2016:1793272. [PMID: 27213149 PMCID: PMC4860209 DOI: 10.1155/2016/1793272] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/05/2016] [Revised: 03/30/2016] [Accepted: 03/31/2016] [Indexed: 02/06/2023]
Abstract
Apoptosis proteins play a key role in maintaining the stability of organism; the functions of apoptosis proteins are related to their subcellular locations which are used to understand the mechanism of programmed cell death. In this paper, we utilize GO annotation information of apoptosis proteins and their homologous proteins retrieved from GOA database to formulate feature vectors and then combine the distance weighted KNN classification algorithm with them to solve the data imbalance problem existing in CL317 data set to predict subcellular locations of apoptosis proteins. It is found that the number of homologous proteins can affect the overall prediction accuracy. Under the optimal number of homologous proteins, the overall prediction accuracy of our method on CL317 data set reaches 96.8% by Jackknife test. Compared with other existing methods, it shows that our proposed method is very effective and better than others for predicting subcellular localization of apoptosis proteins.
Collapse
Affiliation(s)
- Xiao Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
| | - Hui Li
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
| | - Qiuwen Zhang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
| | - Rong Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
| |
Collapse
|
31
|
Wan S, Mak MW, Kung SY. Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins. BMC Bioinformatics 2016; 17:97. [PMID: 26911432 PMCID: PMC4765148 DOI: 10.1186/s12859-016-0940-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2015] [Accepted: 01/27/2016] [Indexed: 11/10/2022] Open
Abstract
Background Predicting protein subcellular localization is indispensable for inferring protein functions. Recent studies have been focusing on predicting not only single-location proteins, but also multi-location proteins. Almost all of the high performing predictors proposed recently use gene ontology (GO) terms to construct feature vectors for classification. Despite their high performance, their prediction decisions are difficult to interpret because of the large number of GO terms involved. Results This paper proposes using sparse regressions to exploit GO information for both predicting and interpreting subcellular localization of single- and multi-location proteins. Specifically, we compared two multi-label sparse regression algorithms, namely multi-label LASSO (mLASSO) and multi-label elastic net (mEN), for large-scale predictions of protein subcellular localization. Both algorithms can yield sparse and interpretable solutions. By using the one-vs-rest strategy, mLASSO and mEN identified 87 and 429 out of more than 8,000 GO terms, respectively, which play essential roles in determining subcellular localization. More interestingly, many of the GO terms selected by mEN are from the biological process and molecular function categories, suggesting that the GO terms of these categories also play vital roles in the prediction. With these essential GO terms, not only where a protein locates can be decided, but also why it resides there can be revealed. Conclusions Experimental results show that the output of both mEN and mLASSO are interpretable and they perform significantly better than existing state-of-the-art predictors. Moreover, mEN selects more features and performs better than mLASSO on a stringent human benchmark dataset. For readers’ convenience, an online server called SpaPredictor for both mLASSO and mEN is available at http://bioinfo.eie.polyu.edu.hk/SpaPredictorServer/.
Collapse
Affiliation(s)
- Shibiao Wan
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, SAR, China.
| | - Man-Wai Mak
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, SAR, China.
| | - Sun-Yuan Kung
- Department of Electrical Engineering, Princeton University, New Jersey, USA.
| |
Collapse
|
32
|
Predicting subcellular localization of multi-location proteins by improving support vector machines with an adaptive-decision scheme. INT J MACH LEARN CYB 2015. [DOI: 10.1007/s13042-015-0460-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
|
33
|
Wan S, Mak MW, Kung SY. mLASSO-Hum: A LASSO-based interpretable human-protein subcellular localization predictor. J Theor Biol 2015; 382:223-34. [DOI: 10.1016/j.jtbi.2015.06.042] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2015] [Revised: 06/25/2015] [Accepted: 06/26/2015] [Indexed: 02/03/2023]
|
34
|
Papadopoulos A, Kirmitzoglou I, Promponas VJ, Theocharides T. GPU technology as a platform for accelerating local complexity analysis of protein sequences. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2015; 2013:2684-7. [PMID: 24110280 DOI: 10.1109/embc.2013.6610093] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
The use of GPGPU programming paradigm (running CUDA-enabled algorithms on GPU cards) in Bioinformatics showed promising results [1]. As such a similar approach can be used to speedup other algorithms such as CAST, a popular tool used for masking low-complexity regions (LCRs) in protein sequences [2] with increased sensitivity. We developed and implemented a CUDA-enabled version (GPU_CAST) of the multi-threaded version of CAST software first presented in [3] and optimized in [4]. The proposed software implementation uses the nVIDIA CUDA libraries and the GPGPU programming paradigm to take advantage of the inherent parallel characteristics of the CAST algorithm to execute the calculations on the GPU card of the host computer system. The GPU-based implementation presented in this work, is compared against the multi-threaded, multi-core optimized version of CAST [4] and yielded speedups of 5x-10x for large protein sequence datasets.
Collapse
|
35
|
mPLR-Loc: An adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction. Anal Biochem 2015; 473:14-27. [DOI: 10.1016/j.ab.2014.10.014] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2014] [Revised: 09/29/2014] [Accepted: 10/21/2014] [Indexed: 01/16/2023]
|
36
|
Zhao XF, Wang J, Liu GX, Fan TP, Zhang YJ, Yu J, Wang SX, Li ZJ, Zhang YY, Zheng XH. Binding mechanism of nine N-phenylpiperazine derivatives and α1A-adrenoceptor using site-directed molecular docking and high performance affinity chromatography. RSC Adv 2015. [DOI: 10.1039/c5ra10812h] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Investigating the binding mechanism of α1A-adrenoceptor and its specific ligands by affinity chromatography.
Collapse
|
37
|
Chen J, Tang YY, Chen CLP, Fang B, Lin Y, Shang Z. Multi-Label Learning With Fuzzy Hypergraph Regularization for Protein Subcellular Location Prediction. IEEE Trans Nanobioscience 2014; 13:438-47. [DOI: 10.1109/tnb.2014.2341111] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
38
|
Text as data: using text-based features for proteins representation and for computational prediction of their characteristics. Methods 2014; 74:54-64. [PMID: 25448299 DOI: 10.1016/j.ymeth.2014.10.027] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2014] [Revised: 09/21/2014] [Accepted: 10/21/2014] [Indexed: 11/21/2022] Open
Abstract
The current era of large-scale biology is characterized by a fast-paced growth in the number of sequenced genomes and, consequently, by a multitude of identified proteins whose function has yet to be determined. Simultaneously, any known or postulated information concerning genes and proteins is part of the ever-growing published scientific literature, which is expanding at a rate of over a million new publications per year. Computational tools that attempt to automatically predict and annotate protein characteristics, such as function and localization patterns, are being developed along with systems that aim to support the process via text mining. Most work on protein characterization focuses on features derived directly from protein sequence data. Protein-related work that does aim to utilize the literature typically concentrates on extracting specific facts (e.g., protein interactions) from text. In the past few years we have taken a different route, treating the literature as a source of text-based features, which can be employed just as sequence-based protein-features were used in earlier work, for predicting protein subcellular location and possibly also function. We discuss here in detail the overall approach, along with results from work we have done in this area demonstrating the value of this method and its potential use.
Collapse
|
39
|
Wan S, Mak MW, Kung SY. R3P-Loc: a compact multi-label predictor using ridge regression and random projection for protein subcellular localization. J Theor Biol 2014; 360:34-45. [PMID: 24997236 DOI: 10.1016/j.jtbi.2014.06.031] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2014] [Revised: 06/24/2014] [Accepted: 06/25/2014] [Indexed: 12/21/2022]
Abstract
Locating proteins within cellular contexts is of paramount significance in elucidating their biological functions. Computational methods based on knowledge databases (such as gene ontology annotation (GOA) database) are known to be more efficient than sequence-based methods. However, the predominant scenarios of knowledge-based methods are that (1) knowledge databases typically have enormous size and are growing exponentially, (2) knowledge databases contain redundant information, and (3) the number of extracted features from knowledge databases is much larger than the number of data samples with ground-truth labels. These properties render the extracted features liable to redundant or irrelevant information, causing the prediction systems suffer from overfitting. To address these problems, this paper proposes an efficient multi-label predictor, namely R3P-Loc, which uses two compact databases for feature extraction and applies random projection (RP) to reduce the feature dimensions of an ensemble ridge regression (RR) classifier. Two new compact databases are created from Swiss-Prot and GOA databases. These databases possess almost the same amount of information as their full-size counterparts but with much smaller size. Experimental results on two recent datasets (eukaryote and plant) suggest that R3P-Loc can reduce the dimensions by seven-folds and significantly outperforms state-of-the-art predictors. This paper also demonstrates that the compact databases reduce the memory consumption by 39 times without causing degradation in prediction accuracy. For readers׳ convenience, the R3P-Loc server is available online at url:http://bioinfo.eie.polyu.edu.hk/R3PLocServer/.
Collapse
Affiliation(s)
- Shibiao Wan
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China.
| | - Man-Wai Mak
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China.
| | - Sun-Yuan Kung
- Department of Electrical Engineering, Princeton University, NJ, USA.
| |
Collapse
|
40
|
Abdallah C, Valot B, Guillier C, Mounier A, Balliau T, Zivy M, van Tuinen D, Renaut J, Wipf D, Dumas-Gaudot E, Recorbet G. The membrane proteome of Medicago truncatula roots displays qualitative and quantitative changes in response to arbuscular mycorrhizal symbiosis. J Proteomics 2014; 108:354-68. [PMID: 24925269 DOI: 10.1016/j.jprot.2014.05.028] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2013] [Revised: 04/07/2014] [Accepted: 05/12/2014] [Indexed: 11/25/2022]
Abstract
UNLABELLED Arbuscular mycorrhizal (AM) symbiosis that associates roots of most land plants with soil-borne fungi (Glomeromycota), is characterized by reciprocal nutritional benefits. Fungal colonization of plant roots induces massive changes in cortical cells where the fungus differentiates an arbuscule, which drives proliferation of the plasma membrane. Despite the recognized importance of membrane proteins in sustaining AM symbiosis, the root microsomal proteome elicited upon mycorrhiza still remains to be explored. In this study, we first examined the qualitative composition of the root membrane proteome of Medicago truncatula after microsome enrichment and subsequent in depth analysis by GeLC-MS/MS. The results obtained highlighted the identification of 1226 root membrane protein candidates whose cellular and functional classifications predispose plastids and protein synthesis as prevalent organelle and function, respectively. Changes at the protein abundance level between the membrane proteomes of mycorrhizal and nonmycorrhizal roots were further monitored by spectral counting, which retrieved a total of 96 proteins that displayed a differential accumulation upon AM symbiosis. Besides the canonical markers of the periarbuscular membrane, new candidates supporting the importance of membrane trafficking events during mycorrhiza establishment/functioning were identified, including flotillin-like proteins. The data have been deposited to the ProteomeXchange with identifier PXD000875. BIOLOGICAL SIGNIFICANCE During arbuscular mycorrhizal symbiosis, one of the most widespread mutualistic associations in nature, the endomembrane system of plant roots is believed to undergo qualitative and quantitative changes in order to sustain both the accommodation process of the AM fungus within cortical cells and the exchange of nutrients between symbionts. Large-scale GeLC-MS/MS proteomic analysis of the membrane fractions from mycorrhizal and nonmycorrhizal roots of M. truncatula coupled to spectral counting retrieved around one hundred proteins that displayed changes in abundance upon mycorrhizal establishment. The symbiosis-related membrane proteins that were identified mostly function in signaling/membrane trafficking and nutrient uptake regulation. Besides extending the coverage of the root membrane proteome of M. truncatula, new candidates involved in the symbiotic program emerged from the current study, which pointed out a dynamic reorganization of microsomal proteins during the accommodation of AM fungi within cortical cells.
Collapse
Affiliation(s)
- Cosette Abdallah
- UMR Agroécologie INRA 1347/Agrosup/Université de Bourgogne, Pôle Interactions Plantes Microorganismes ERL 6300 CNRS, BP 86510, 21065 Dijon Cedex, France; Environmental and Agro-Biotechnologies Department, Centre de Recherche Public-Gabriel Lippmann, 41, rue du Brill, Belvaux L-4422, Luxembourg.
| | - Benoit Valot
- UMR de Génétique Végétale, PAPPSO, Ferme du Moulon, 91190 Gif sur Yvette, France.
| | - Christelle Guillier
- UMR Agroécologie INRA 1347/Agrosup/Université de Bourgogne, Pôle Interactions Plantes Microorganismes ERL 6300 CNRS, BP 86510, 21065 Dijon Cedex, France.
| | - Arnaud Mounier
- UMR Agroécologie INRA 1347/Agrosup/Université de Bourgogne, Pôle Interactions Plantes Microorganismes ERL 6300 CNRS, BP 86510, 21065 Dijon Cedex, France.
| | - Thierry Balliau
- UMR de Génétique Végétale, PAPPSO, Ferme du Moulon, 91190 Gif sur Yvette, France.
| | - Michel Zivy
- UMR de Génétique Végétale, PAPPSO, Ferme du Moulon, 91190 Gif sur Yvette, France.
| | - Diederik van Tuinen
- UMR Agroécologie INRA 1347/Agrosup/Université de Bourgogne, Pôle Interactions Plantes Microorganismes ERL 6300 CNRS, BP 86510, 21065 Dijon Cedex, France.
| | - Jenny Renaut
- Environmental and Agro-Biotechnologies Department, Centre de Recherche Public-Gabriel Lippmann, 41, rue du Brill, Belvaux L-4422, Luxembourg.
| | - Daniel Wipf
- UMR Agroécologie INRA 1347/Agrosup/Université de Bourgogne, Pôle Interactions Plantes Microorganismes ERL 6300 CNRS, BP 86510, 21065 Dijon Cedex, France.
| | - Eliane Dumas-Gaudot
- UMR Agroécologie INRA 1347/Agrosup/Université de Bourgogne, Pôle Interactions Plantes Microorganismes ERL 6300 CNRS, BP 86510, 21065 Dijon Cedex, France.
| | - Ghislaine Recorbet
- Environmental and Agro-Biotechnologies Department, Centre de Recherche Public-Gabriel Lippmann, 41, rue du Brill, Belvaux L-4422, Luxembourg.
| |
Collapse
|
41
|
Yu CS, Cheng CW, Su WC, Chang KC, Huang SW, Hwang JK, Lu CH. CELLO2GO: a web server for protein subCELlular LOcalization prediction with functional gene ontology annotation. PLoS One 2014; 9:e99368. [PMID: 24911789 PMCID: PMC4049835 DOI: 10.1371/journal.pone.0099368] [Citation(s) in RCA: 282] [Impact Index Per Article: 25.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2013] [Accepted: 05/14/2014] [Indexed: 01/15/2023] Open
Abstract
CELLO2GO (http://cello.life.nctu.edu.tw/cello2go/) is a publicly available, web-based system for screening various properties of a targeted protein and its subcellular localization. Herein, we describe how this platform is used to obtain a brief or detailed gene ontology (GO)-type categories, including subcellular localization(s), for the queried proteins by combining the CELLO localization-predicting and BLAST homology-searching approaches. Given a query protein sequence, CELLO2GO uses BLAST to search for homologous sequences that are GO annotated in an in-house database derived from the UniProt KnowledgeBase database. At the same time, CELLO attempts predict at least one subcellular localization on the basis of the species in which the protein is found. When homologs for the query sequence have been identified, the number of terms found for each of their GO categories, i.e., cellular compartment, molecular function, and biological process, are summed and presented as pie charts representing possible functional annotations for the queried protein. Although the experimental subcellular localization of a protein may not be known, and thus not annotated, CELLO can confidentially suggest a subcellular localization. CELLO2GO should be a useful tool for research involving complex subcellular systems because it combines CELLO and BLAST into one platform and its output is easily manipulated such that the user-specific questions may be readily addressed.
Collapse
Affiliation(s)
- Chin-Sheng Yu
- Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
- Master's Program in Biomedical Informatics and Biomedical Engineering, Feng Chia University, Taichung, Taiwan
| | - Chih-Wen Cheng
- Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
| | - Wen-Chi Su
- Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
| | - Kuei-Chung Chang
- Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
| | - Shao-Wei Huang
- Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan
| | - Jenn-Kang Hwang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Center of Bioinformatics Research, National Chiao Tung University, Hsinchu, Taiwan
| | - Chih-Hao Lu
- Graduate Institute of Basic Medical Science, China Medical University, Taichung, Taiwan
- * E-mail:
| |
Collapse
|
42
|
Song T, Gu H. Discriminative motif discovery via simulated evolution and random under-sampling. PLoS One 2014; 9:e87670. [PMID: 24551063 PMCID: PMC3923751 DOI: 10.1371/journal.pone.0087670] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2013] [Accepted: 12/29/2013] [Indexed: 11/22/2022] Open
Abstract
Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
Collapse
Affiliation(s)
- Tao Song
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China
| | - Hong Gu
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China
- * E-mail:
| |
Collapse
|
43
|
Fukasawa Y, Leung RKK, Tsui SKW, Horton P. Plus ça change - evolutionary sequence divergence predicts protein subcellular localization signals. BMC Genomics 2014; 15:46. [PMID: 24438075 PMCID: PMC3906766 DOI: 10.1186/1471-2164-15-46] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2013] [Accepted: 01/06/2014] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Protein subcellular localization is a central problem in understanding cell biology and has been the focus of intense research. In order to predict localization from amino acid sequence a myriad of features have been tried: including amino acid composition, sequence similarity, the presence of certain motifs or domains, and many others. Surprisingly, sequence conservation of sorting motifs has not yet been employed, despite its extensive use for tasks such as the prediction of transcription factor binding sites. RESULTS Here, we flip the problem around, and present a proof of concept for the idea that the lack of sequence conservation can be a novel feature for localization prediction. We show that for yeast, mammal and plant datasets, evolutionary sequence divergence alone has significant power to identify sequences with N-terminal sorting sequences. Moreover sequence divergence is nearly as effective when computed on automatically defined ortholog sets as on hand curated ones. Unfortunately, sequence divergence did not necessarily increase classification performance when combined with some traditional sequence features such as amino acid composition. However a post-hoc analysis of the proteins in which sequence divergence changes the prediction yielded some proteins with atypical (i.e. not MPP-cleaved) matrix targeting signals as well as a few misannotations. CONCLUSION We report the results of the first quantitative study of the effectiveness of evolutionary sequence divergence as a feature for protein subcellular localization prediction. We show that divergence is indeed useful for prediction, but it is not trivial to improve overall accuracy simply by adding this feature to classical sequence features. Nevertheless we argue that sequence divergence is a promising feature and show anecdotal examples in which it succeeds where other features fail.
Collapse
Affiliation(s)
- Yoshinori Fukasawa
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan
- Japan Society for the Promotion of Science, Tokyo Chiyoda, Japan
| | - Ross KK Leung
- Hong Kong Bioinformatics Centre and School of Biomedical Sciences, Chinese University of Hong Kong, Shatin, China
| | - Stephen KW Tsui
- Hong Kong Bioinformatics Centre and School of Biomedical Sciences, Chinese University of Hong Kong, Shatin, China
| | - Paul Horton
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan
- Computational Biology Research Center, Advanced Industrial Science and Technology, Tokyo, Japan
| |
Collapse
|
44
|
Wang X, Wang R, Zhang Y, Zhang H. Evolutionary survey of druggable protein targets with respect to their subcellular localizations. Genome Biol Evol 2013; 5:1291-7. [PMID: 23749117 PMCID: PMC3730344 DOI: 10.1093/gbe/evt092] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open
Abstract
The druggable subset of the human genome, termed the “druggable genome,” provides the pharmaceutical industry with a unique opportunity for the advancement of new therapeutic interventions for a multitude of diseases and disorders. To date, there is no systematic assessment of the evolutionary history and nature of the defined druggable proteins derived from the contemporary druggable genome (i.e., proteins that bind or are predicted to bind with high affinity to a biologic). An understanding of drug–protein target interactions in specific cellular compartments is crucial for the optimal therapeutic delivery of pharmaceutical agents, as well as for preclinical drug trials in model animals. This study applied the concept of pharmacophylogenomics, the study of genes, evolution, and drug targets, to conduct an evolutionary survey of drug targets with respect to their subcellular localizations. Using multiple models and modes of druggable genome comparison, the results concordantly indicated that orthologous drug targets with a nuclear localization in the human, macaque, mouse, and rat showed a higher trend for evolutionary conservation compared with drug targets in the cell membrane and the extracellular compartment. As such, this study provides important information regarding druggable protein targets and the druggable genome at the pharmacophylogenomics level.
Collapse
Affiliation(s)
- Xiaotong Wang
- School of Agriculture, Ludong University, Yantai, China
| | | | | | | |
Collapse
|
45
|
A novel approach for protein subcellular location prediction using amino acid exposure. BMC Bioinformatics 2013; 14:342. [PMID: 24283794 PMCID: PMC4219330 DOI: 10.1186/1471-2105-14-342] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2013] [Accepted: 11/25/2013] [Indexed: 11/10/2022] Open
Abstract
Background Proteins perform their functions in associated cellular locations. Therefore, the study of protein function can be facilitated by predictions of protein location. Protein location can be predicted either from the sequence of a protein alone by identification of targeting peptide sequences and motifs, or by homology to proteins of known location. A third approach, which is complementary, exploits the differences in amino acid composition of proteins associated to different cellular locations, and can be useful if motif and homology information are missing. Here we expand this approach taking into account amino acid composition at different levels of amino acid exposure. Results Our method has two stages. For stage one, we trained multiple Support Vector Machines (SVMs) to score eukaryotic protein sequences for membership to each of three categories: nuclear, cytoplasmic and extracellular, plus extra category nucleocytoplasmic, accounting for the fact that a large number of proteins shuttles between those two locations. In stage two we use an artificial neural network (ANN) to propose a category from the scores given to the four locations in stage one. The method reaches an accuracy of 68% when using as input 3D-derived values of amino acid exposure. Calibration of the method using predicted values of amino acid exposure allows classifying proteins without 3D-information with an accuracy of 62% and discerning proteins in different locations even if they shared high levels of identity. Conclusions In this study we explored the relationship between residue exposure and protein subcellular location. We developed a new algorithm for subcellular location prediction that uses residue exposure signatures. Our algorithm uses a novel approach to address the multiclass classification problem. The algorithm is implemented as web server 'NYCE’ and can be accessed at http://cbdm.mdc-berlin.de/~amer/nyce.
Collapse
|
46
|
Feng PM, Lin H, Chen W. Identification of antioxidants from sequence information using naïve Bayes. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2013; 2013:567529. [PMID: 24062796 PMCID: PMC3766563 DOI: 10.1155/2013/567529] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 05/03/2013] [Revised: 07/20/2013] [Accepted: 07/22/2013] [Indexed: 12/22/2022]
Abstract
Antioxidant proteins are substances that protect cells from the damage caused by free radicals. Accurate identification of new antioxidant proteins is important in understanding their roles in delaying aging. Therefore, it is highly desirable to develop computational methods to identify antioxidant proteins. In this study, a Naïve Bayes-based method was proposed to predict antioxidant proteins using amino acid compositions and dipeptide compositions. In order to remove redundant information, a novel feature selection technique was employed to single out optimized features. In the jackknife test, the proposed method achieved an accuracy of 66.88% for the discrimination between antioxidant and nonantioxidant proteins, which is superior to that of other state-of-the-art classifiers. These results suggest that the proposed method could be an effective and promising high-throughput method for antioxidant protein identification.
Collapse
Affiliation(s)
- Peng-Mian Feng
- School of Public Health, Hebei United University, Tangshan 063000, China
| | - Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China
| |
Collapse
|
47
|
Xu YY, Yang F, Zhang Y, Shen HB. An image-based multi-label human protein subcellular localization predictor (iLocator) reveals protein mislocalizations in cancer tissues. ACTA ACUST UNITED AC 2013; 29:2032-40. [PMID: 23740749 DOI: 10.1093/bioinformatics/btt320] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
MOTIVATION Human cells are organized into compartments of different biochemical cellular processes. Having proteins appear at the right time to the correct locations in the cellular compartments is required to conduct their functions in normal cells, whereas mislocalization of proteins can result in pathological diseases, including cancer. RESULTS To reveal the cancer-related protein mislocalizations, we developed an image-based multi-label subcellular location predictor, iLocator, which covers seven cellular localizations. The iLocator incorporates both global and local image descriptors and generates predictions by using an ensemble multi-label classifier. The algorithm has the ability to treat both single- and multiple-location proteins. We first trained and tested iLocator on 3240 normal human tissue images that have known subcellular location information from the human protein atlas. The iLocator was then used to generate protein localization predictions for 3696 protein images from seven cancer tissues that have no location annotations in the human protein atlas. By comparing the output data from normal and cancer tissues, we detected eight potential cancer biomarker proteins that have significant localization differences with P-value < 0.01. AVAILABILITY http://www.csbio.sjtu.edu.cn/bioinf/iLocator/
Collapse
Affiliation(s)
- Ying-Ying Xu
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
| | | | | | | |
Collapse
|
48
|
Abstract
Motivation: Subcellular localization is one aspect of protein function. Despite advances in high-throughput imaging, localization maps remain incomplete. Several methods accurately predict localization, but many challenges remain to be tackled. Results: In this study, we introduced a framework to predict localization in life's three domains, including globular and membrane proteins (3 classes for archaea; 6 for bacteria and 18 for eukaryota). The resulting method, LocTree2, works well even for protein fragments. It uses a hierarchical system of support vector machines that imitates the cascading mechanism of cellular sorting. The method reaches high levels of sustained performance (eukaryota: Q18=65%, bacteria: Q6=84%). LocTree2 also accurately distinguishes membrane and non-membrane proteins. In our hands, it compared favorably with top methods when tested on new data. Availability: Online through PredictProtein (predictprotein.org); as standalone version at http://www.rostlab.org/services/loctree2. Contact:localization@rostlab.org Supplementary Information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Tatyana Goldberg
- TUM, Bioinformatik-I12, Informatik, Boltzmannstrasse 3, Garching 85748, Germany.
| | | | | |
Collapse
|
49
|
LIN HAO, DING CHEN, YUAN LUFENG, CHEN WEI, DING HUI, LI ZIQIANG, GUO FENGBIAO, HUANG JIAN, RAO NINI. PREDICTING SUBCHLOROPLAST LOCATIONS OF PROTEINS BASED ON THE GENERAL FORM OF CHOU'S PSEUDO AMINO ACID COMPOSITION: APPROACHED FROM OPTIMAL TRIPEPTIDE COMPOSITION. INT J BIOMATH 2013. [DOI: 10.1142/s1793524513500034] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Chloroplasts are organelles found in plant cells that conduct photosynthesis. The subchloroplast locations of proteins are correlated with their functions. With the availability of a great number of protein data, it is highly desired to develop a computational method to predict the subchloroplast locations of chloroplast proteins. In this study, we proposed a novel method to predict subchloroplast locations of proteins using tripeptide compositions. It first used the binomial distribution to optimize the feature sets. Then the support vector machine was selected to perform the prediction of subchloroplast locations of proteins. The proposed method was tested on a reliable and rigorous dataset including 259 chloroplast proteins with sequence identity ≤ 25%. In the jack-knife cross-validation, 92.21% envelope proteins, 93.20% thylakoid membrane, 52.63% thylakoid lumen and 85.00% stroma can be correctly identified. The overall accuracy achieves 88.03% which is higher than that of other models. Based on this method, a predictor called ChloPred has been built and can be freely available from http://cobi.uestc.edu.cn/people/hlin/tools/ChloPred/ . The predictor will provide important information for theoretical and experimental research of chloroplast proteins.
Collapse
Affiliation(s)
- HAO LIN
- Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China
| | - CHEN DING
- Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China
| | - LU-FENG YUAN
- Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China
| | - WEI CHEN
- Center for Genomics and Computational Biology, Department of Physics, College of Sciences, Hebei United University, Tangshan 063000, P. R. China
| | - HUI DING
- Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China
| | - ZI-QIANG LI
- School of Information and Engineering, Sichuan Agricultural University, Yaan 625014, P. R. China
| | - FENG-BIAO GUO
- Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China
| | - JIAN HUANG
- Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China
| | - NI-NI RAO
- Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China
| |
Collapse
|
50
|
Donati C, Rappuoli R. Reverse vaccinology in the 21st century: improvements over the original design. Ann N Y Acad Sci 2013; 1285:115-32. [PMID: 23527566 DOI: 10.1111/nyas.12046] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Abstract
Reverse vaccinology (RV), the first application of genomic technologies in vaccine research, represented a major revolution in the process of discovering novel vaccines. By determining their entire antigenic repertoire, researchers could identify protective targets and design efficacious vaccines for pathogens where conventional approaches had failed. Bexsero, the first vaccine developed using RV, has recently received positive opinion from the European Medicines Agency. The use of RV initiated a cascade of changes that affected the entire vaccine development process, shifting the focus from the identification of a list of vaccine candidates to the definition of a set of high throughput screens to reduce the need for costly and labor intensive tests in animal models. It is now clear that a deep understanding of the epidemiology of vaccine candidates, and their regulation and role in host-pathogen interactions, must become an integral component of the screening workflow. Far from being outdated by technological advancements, RV still represents a paradigm of how high-throughput technologies and scientific insight can be integrated into biotechnology research.
Collapse
|