1
Hu G, Moon J, Hayashi T. Protein Classes Predicted by Molecular Surface Chemical Features: Machine Learning-Assisted Classification of Cytosol and Secreted Proteins. J Phys Chem B 2024; 128:8423-8436. [PMID: 39185763 PMCID: PMC11382266 DOI: 10.1021/acs.jpcb.4c02461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Chemical structures of protein surfaces govern intermolecular interaction, and protein functions include specific molecular recognition, transport, self-assembly, etc. Therefore, the relationship between the chemical structure and protein functions provides insights into the understanding of the mechanism underlying protein functions and developments of new biomaterials. In this study, we analyze protein surface features, including surface amino acid populations and secondary structure ratios, instead of entire sequences as input for the classifier, intending to provide deeper insights into the determination of protein classes (cytosol or secreted). We employed a random forest-based classifier for the prediction of protein locations. Our training and testing data sets consisting of secreted and cytosol proteins were constructed using filtered information from UniProt and 3D structures from AlphaFold. The classifier achieved a testing accuracy of 93.9% with a feature importance ranking and quantitative boundary values for the top three features. We discuss the significance of these features quantitatively and the hidden rules to determine the protein classes (cytosol or secreted).
Affiliation(s)
- Guanghao Hu
- Department of Materials Science and Engineering, School of Materials Science and Chemical Technology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama-shi, Kanagawa-ken 226-8502, Japan
- Jooa Moon
- Department of Materials Science and Engineering, School of Materials Science and Chemical Technology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama-shi, Kanagawa-ken 226-8502, Japan
- Tomohiro Hayashi
- Department of Materials Science and Engineering, School of Materials Science and Chemical Technology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama-shi, Kanagawa-ken 226-8502, Japan
- The Institute for Solid State Physics, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa, Chiba 277-0882, Japan
2
Nielsen H. Protein Sorting Prediction. Methods Mol Biol 2024; 2715:27-63. [PMID: 37930519 DOI: 10.1007/978-1-0716-3445-5_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2023]
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global property-based, and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark.
3
Li J, Zou Q, Yuan L. A review from biological mapping to computation-based subcellular localization. Molecular Therapy. Nucleic Acids 2023; 32:507-521. [PMID: 37215152 PMCID: PMC10192651 DOI: 10.1016/j.omtn.2023.04.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Subcellular localization is crucial to the study of viruses and diseases. Specifically, research on protein subcellular localization can help identify clues between viruses and host cells that can aid in the design of targeted drugs. Research on RNA subcellular localization is significant for human diseases (such as Alzheimer's disease and colon cancer). To date, only reviews addressing subcellular localization of proteins have been published, which are outdated for reference, and reviews of RNA subcellular localization are not comprehensive. Therefore, we collated the most up-to-date literature on protein and RNA subcellular localization to help researchers understand changes in the field of protein and RNA subcellular localization. Extensive and complete methods for constructing subcellular localization models have also been summarized, which can help readers understand the changes in the application of biotechnology and computer science in subcellular localization research and explore how to use biological data to construct improved subcellular localization models. This paper is the first review to cover both protein subcellular localization and RNA subcellular localization. We urge researchers from biology and computational biology to jointly pay attention to transformation patterns, interrelationships, differences, and causality of protein subcellular localization and RNA subcellular localization.
Affiliation(s)
- Jing Li
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang 324000, China
- School of Biomedical Sciences, University of Hong Kong, Hong Kong, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang 324000, China
- Lei Yuan
- Department of Hepatobiliary Surgery, Quzhou People's Hospital, 100 Minjiang Main Road, Quzhou, Zhejiang 324000, China
4
Anteghini M, Haja A, Martins dos Santos VA, Schomaker L, Saccenti E. OrganelX web server for sub-peroxisomal and sub-mitochondrial protein localization and peroxisomal target signal detection. Comput Struct Biotechnol J 2022; 21:128-133. [PMID: 36544474 PMCID: PMC9747352 DOI: 10.1016/j.csbj.2022.11.058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 11/28/2022] [Accepted: 11/28/2022] [Indexed: 12/12/2022] Open
Abstract
We present the OrganelX e-Science Web Server that provides a user-friendly implementation of the In-Pero and In-Mito classifiers for sub-peroxisomal and sub-mitochondrial localization of peroxisomal and mitochondrial proteins and the Is-PTS1 algorithm for detecting and validating potential peroxisomal proteins carrying a PTS1 signal sequence. The OrganelX e-Science Web Server is available at https://organelx.hpc.rug.nl/fasta/.
Affiliation(s)
- Marco Anteghini
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands
- LifeGlimmer GmbH, Berlin, Germany
- Asmaa Haja
- Bernoulli Institute, University of Groningen, Groningen, The Netherlands
- Vitor A.P. Martins dos Santos
- LifeGlimmer GmbH, Berlin, Germany
- Bioprocess Engineering, Wageningen University & Research, Wageningen, The Netherlands
- Lambert Schomaker
- Bernoulli Institute, University of Groningen, Groningen, The Netherlands
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands
5
Masnoddin M, Ling CMWV, Yusof NA. Functional Analysis of Conserved Hypothetical Proteins from the Antarctic Bacterium, Pedobacter cryoconitis Strain BG5 Reveals Protein Cold Adaptation and Thermal Tolerance Strategies. Microorganisms 2022; 10:1654. [PMID: 36014072 PMCID: PMC9415557 DOI: 10.3390/microorganisms10081654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Revised: 08/04/2022] [Accepted: 08/12/2022] [Indexed: 11/16/2022] Open
Abstract
Pedobacter cryoconitis BG5 is an obligate psychrophilic bacterium that was first isolated on King George Island, Antarctica. Over the last 50 years, the West Antarctic, including King George Island, has been one of the most rapidly warming places on Earth, hence making it an excellent area to measure the resilience of living species in warmed areas exposed to the constantly changing environment due to climate change. This bacterium encodes a genome of approximately 5694 protein-coding genes. However, 35% of the gene models for this species are found to be hypothetical proteins (HP). In this study, three conserved HP genes of P. cryoconitis, designated pcbg5hp1, pcbg5hp2 and pcbg5hp12, were cloned and the proteins were expressed, purified and their functions and structures were evaluated. Real-time quantitative PCR analysis revealed that these genes were expressed constitutively, suggesting a potentially important role where the expression of these genes under an almost constant demand might have some regulatory functions in thermal stress tolerance. Functional analysis showed that these proteins maintained their activities at low and moderate temperatures. Meanwhile, a low citrate synthase aggregation at 43 °C in the presence of PCBG5HP1 suggested the characteristics of chaperone activity. Furthermore, our comparative structural analysis demonstrated that the HPs exhibited cold-adapted traits, most notably increased flexibility in their 3D structures compared to their counterparts. Concurrently, the presence of a disulphide bridge and aromatic clusters was attributed to PCBG5HP1’s unusual protein stability and chaperone activity. Thus, this suggested that the HPs examined in this study acquired strategies to maintain a balance between molecular stability and structural flexibility. Conclusively, this study has established the structure–function relationships of the HPs produced by P. cryoconitis and provided crucial experimental evidence indicating their importance in thermal stress response.
Affiliation(s)
- Makdi Masnoddin
- Biotechnology Research Institute, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu 88400, Sabah, Malaysia
- Preparatory Centre for Science and Technology, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu 88400, Sabah, Malaysia
- Nur Athirah Yusof
- Biotechnology Research Institute, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu 88400, Sabah, Malaysia
6
Lu Z, Yin G, Chai M, Sun L, Wei H, Chen J, Yang Y, Fu X, Li S. Systematic analysis of CNGCs in cotton and the positive role of GhCNGC32 and GhCNGC35 in salt tolerance. BMC Genomics 2022; 23:560. [PMID: 35931984 PMCID: PMC9356423 DOI: 10.1186/s12864-022-08800-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/30/2022] [Accepted: 07/27/2022] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: Cyclic nucleotide-gated ion channels (CNGCs) are calcium-permeable channels that participate in a variety of biological functions, such as signaling pathways, plant development, and environmental stress and stimulus responses. Nevertheless, there have been few studies on the CNGC gene family in cotton. RESULTS: In this study, a total of 114 CNGC genes were identified from the genomes of 4 cotton species. These genes clustered into 5 main groups: I, II, III, IVa, and IVb. Gene structure and protein motif analysis showed that CNGCs on the same branch were highly conserved. In addition, collinearity analysis showed that the CNGC gene family had expanded mainly by whole-genome duplication (WGD). Promoter analysis of the GhCNGCs showed that there were a large number of cis-acting elements related to abscisic acid (ABA). Combination of transcriptome data and the results of quantitative RT-PCR (qRT-PCR) analysis revealed that some GhCNGC genes were induced in response to salt and drought stress and to exogenous ABA. Virus-induced gene silencing (VIGS) experiments showed that the silencing of the GhCNGC32 and GhCNGC35 genes decreased the salt tolerance of cotton plants (TRV:00). Specifically, physiological indexes showed that the malondialdehyde (MDA) content in gene-silenced plants (TRV:GhCNGC32 and TRV:GhCNGC35) increased significantly under salt stress but that the peroxidase (POD) activity decreased. After salt stress, the expression level of ABA-related genes increased significantly, indicating that salt stress can trigger the ABA signal regulatory mechanism. CONCLUSIONS: We comprehensively analyzed CNGC genes in four cotton species and found that the GhCNGC32 and GhCNGC35 genes play an important role in cotton salt tolerance. These results laid a foundation for the subsequent study of the involvement of cotton CNGC genes in salt tolerance.
Affiliation(s)
- Zhengying Lu
- Handan Academy of Agricultural Sciences, Handan, China
- Guo Yin
- Handan Academy of Agricultural Sciences, Handan, China
- Mao Chai
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences (CAAS), Anyang, China
- Lu Sun
- Handan Academy of Agricultural Sciences, Handan, China
- Hengling Wei
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences (CAAS), Anyang, China
- Jie Chen
- Handan Academy of Agricultural Sciences, Handan, China
- Yufeng Yang
- Handan Academy of Agricultural Sciences, Handan, China
- Xiaokang Fu
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences (CAAS), Anyang, China
- Shiyun Li
- Handan Academy of Agricultural Sciences, Handan, China
7
Mendik P, Kerestély M, Kamp S, Deritei D, Kunšič N, Vassy Z, Csermely P, Veres DV. Translocating proteins compartment-specifically alter the fate of epithelial-mesenchymal transition in a compartmentalized Boolean network model. NPJ Syst Biol Appl 2022; 8:19. [PMID: 35680961 PMCID: PMC9184490 DOI: 10.1038/s41540-022-00228-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Accepted: 05/20/2022] [Indexed: 11/13/2022] Open
Abstract
Regulation of translocating proteins is crucial in defining cellular behaviour. Epithelial-mesenchymal transition (EMT) is important in cellular processes, such as cancer progression. Several orchestrators of EMT, such as key transcription factors, are known to translocate. We show that translocating proteins become enriched in EMT-signalling. To simulate the compartment-specific functions of translocating proteins we created a compartmentalized Boolean network model. This model successfully reproduced known biological traits of EMT and as a novel feature it also captured organelle-specific functions of proteins. Our results predicted that glycogen synthase kinase-3 beta (GSK3B) compartment-specifically alters the fate of EMT, amongst others the activation of nuclear GSK3B halts transforming growth factor beta-1 (TGFB) induced EMT. Moreover, our results recapitulated that the nuclear activation of glioma associated oncogene transcription factors (GLI) is needed to achieve a complete EMT. Compartmentalized network models will be useful to uncover novel control mechanisms of biological processes. Our algorithmic procedures can be automatically rerun on the https://translocaboole.linkgroup.hu website, which provides a framework for similar future studies.
Affiliation(s)
- Péter Mendik
- Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Márk Kerestély
- Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Dávid Deritei
- Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Nina Kunšič
- Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Zsolt Vassy
- Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Péter Csermely
- Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Daniel V Veres
- Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Turbine Ltd, Budapest, Hungary
8
Ma D, Lai Z, Ding Q, Zhang K, Chang K, Li S, Zhao Z, Zhong F. Identification, Characterization and Function of Orphan Genes Among the Current Cucurbitaceae Genomes. Frontiers in Plant Science 2022; 13:872137. [PMID: 35599909 PMCID: PMC9114813 DOI: 10.3389/fpls.2022.872137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 03/28/2022] [Indexed: 06/15/2023]
Abstract
Orphan genes (OGs) that are missing identifiable homologs in other lineages may potentially make contributions to a variety of biological functions. The Cucurbitaceae family consists of a wide range of fruit crops of worldwide or local economic significance. To date, very few functional mechanisms of OGs in Cucurbitaceae are known. In this study, we systematically identified the OGs of eight Cucurbitaceae species using a comparative genomics approach. The content of OGs varied widely among the eight Cucurbitaceae species, ranging from 1.63% in chayote to 16.55% in wax gourd. Genetic structure analysis showed that OGs have significantly shorter protein lengths and fewer exons in Cucurbitaceae. The subcellular localizations of OGs were basically the same, with only subtle differences. Except for aggregation in some chromosomal regions, the distribution density of OGs was higher near the telomeres and relatively evenly distributed on the chromosomes. Gene expression analysis revealed that OGs had less abundant and highly tissue-specific expression. Interestingly, the largest proportion of these OGs showed significantly more tissue-specific expression in the flower than in other tissues, and more detectable expression was found in the male flower. Functional prediction of OGs showed that (1) 18 OGs are associated with male sterility in watermelon; (2) 182 OGs are associated with flower development in cucumber; (3) 51 OGs are associated with environmental adaptation in watermelon; and (4) 520 OGs may help with the large fruit size in wax gourd. Our results provide the molecular basis and research direction for some important mechanisms in Cucurbitaceae species and domesticated crops.
Affiliation(s)
- Dongna Ma
- College of Horticulture, Fujian Agriculture and Forestry University, Fujian, China
- College of the Environment and Ecology, Xiamen University, Fujian, China
- Zhengfeng Lai
- Subtropical Agricultural Research Institute, Fujian Academy of Agriculture Sciences, Fujian, China
- Qiansu Ding
- College of the Environment and Ecology, Xiamen University, Fujian, China
- Kun Zhang
- College of Horticulture, Fujian Agriculture and Forestry University, Fujian, China
- Kaizhen Chang
- College of Horticulture, Fujian Agriculture and Forestry University, Fujian, China
- Shuhao Li
- College of Horticulture, Fujian Agriculture and Forestry University, Fujian, China
- Zhizhu Zhao
- College of the Environment and Ecology, Xiamen University, Fujian, China
- Fenglin Zhong
- College of Horticulture, Fujian Agriculture and Forestry University, Fujian, China
9
Wang G, Zhai YJ, Xue ZZ, Xu YY. Improving Protein Subcellular Location Classification by Incorporating Three-Dimensional Structure Information. Biomolecules 2021; 11:1607. [PMID: 34827605 PMCID: PMC8615982 DOI: 10.3390/biom11111607] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2021] [Revised: 10/27/2021] [Accepted: 10/27/2021] [Indexed: 12/12/2022] Open
Abstract
The subcellular locations of proteins are closely related to their functions. In the past few decades, the application of machine learning algorithms to predict protein subcellular locations has been an important topic in proteomics. However, most studies in this field used only amino acid sequences as the data source. Only a few works focused on other protein data types. For example, three-dimensional structures, which contain far more functional protein information than sequences, remain to be explored. In this work, we extracted various handcrafted features to describe the protein structures from physical, chemical, and topological aspects, as well as the learned features obtained by deep neural networks. We then used these features to classify the protein subcellular locations. Our experimental results demonstrated that some of these structural features have a certain effect on the protein location classification, and can help improve the performance of sequence-based location predictors. Our method provides a new view for the analysis of protein spatial distribution, and is anticipated to be used in revealing the relationships between protein structures and functions.
Affiliation(s)
- Ge Wang
- School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
- Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Yu-Jia Zhai
- Guangzhou Women and Children’s Medical Center, Department of Pharmacy, Guangzhou Medical University, Guangzhou 510623, China
- Zhen-Zhen Xue
- School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
- Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Ying-Ying Xu
- School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
- Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
10
Anteghini M, Martins dos Santos V, Saccenti E. In-Pero: Exploiting Deep Learning Embeddings of Protein Sequences to Predict the Localisation of Peroxisomal Proteins. Int J Mol Sci 2021; 22:6409. [PMID: 34203866 PMCID: PMC8232616 DOI: 10.3390/ijms22126409] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/13/2021] [Revised: 05/31/2021] [Accepted: 06/09/2021] [Indexed: 01/28/2023] Open
Abstract
Peroxisomes are ubiquitous membrane-bound organelles, and aberrant localisation of peroxisomal proteins contributes to the pathogenesis of several disorders. Many computational methods focus on assigning protein sequences to subcellular compartments, but there are no specific tools tailored for the sub-localisation (matrix vs. membrane) of peroxisome proteins. We present here In-Pero, a new method for predicting protein sub-peroxisomal cellular localisation. In-Pero combines standard machine learning approaches with recently proposed multi-dimensional deep-learning representations of the protein amino-acid sequence. It showed a classification accuracy above 0.9 in predicting peroxisomal matrix and membrane proteins. The method is trained and tested using a double cross-validation approach on a curated data set comprising 160 peroxisomal proteins with experimental evidence for sub-peroxisomal localisation. We further show that the proposed approach can be easily adapted (In-Mito) to the prediction of mitochondrial protein localisation obtaining performances for certain classes of proteins (matrix and inner-membrane) superior to existing tools.
Affiliation(s)
- Marco Anteghini
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands
- LifeGlimmer GmbH, 12163 Berlin, Germany
- Vitor Martins dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands
- LifeGlimmer GmbH, 12163 Berlin, Germany
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands
11
Frutiger A, Tanno A, Hwu S, Tiefenauer RF, Vörös J, Nakatsuka N. Nonspecific Binding-Fundamental Concepts and Consequences for Biosensing Applications. Chem Rev 2021; 121:8095-8160. [PMID: 34105942 DOI: 10.1021/acs.chemrev.1c00044] [Citation(s) in RCA: 98] [Impact Index Per Article: 32.7] [Indexed: 12/15/2022]
Abstract
Nature achieves differentiation of specific and nonspecific binding in molecular interactions through precise control of biomolecules in space and time. Artificial systems such as biosensors that rely on distinguishing specific molecular binding events in a sea of nonspecific interactions have struggled to overcome this issue. Despite the numerous technological advancements in biosensor technologies, nonspecific binding has remained a critical bottleneck due to the lack of a fundamental understanding of the phenomenon. To date, the identity, cause, and influence of nonspecific binding remain topics of debate within the scientific community. In this review, we discuss the evolution of the concept of nonspecific binding over the past five decades based upon the thermodynamic, intermolecular, and structural perspectives to provide classification frameworks for biomolecular interactions. Further, we introduce various theoretical models that predict the expected behavior of biosensors in physiologically relevant environments to calculate the theoretical detection limit and to optimize sensor performance. We conclude by discussing existing practical approaches to tackle the nonspecific binding challenge in vitro for biosensing platforms and how we can both address and harness nonspecific interactions for in vivo systems.
Affiliation(s)
- Andreas Frutiger
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Alexander Tanno
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Stephanie Hwu
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Raphael F Tiefenauer
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- János Vörös
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Nako Nakatsuka
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
12
Abstract
The elucidation of the subcellular localization of proteins is very important for a deep understanding of their functions. In fact, protein activities are strictly correlated to the cellular compartment and microenvironment in which the proteins are present. In recent years, several effective and reliable proteomics techniques and computational methods have been developed and implemented to identify protein subcellular localization. This process is often time-consuming and expensive, but recent technological and bioinformatics progress has allowed the development of more accurate and simpler workflows to determine the localization, interactions, and functions of proteins. In the following chapter, a brief introduction to the importance of knowing the subcellular localization of proteins will be presented. Then, sample preparation protocols, proteomic methods, data analysis strategies, and software for the prediction of protein localization will be presented and discussed. Finally, the more recent and advanced spatial proteomics techniques will be shown.
Affiliation(s)
- Elettra Barberis
- Department of Translational Medicine, University of Piemonte Orientale, Novara, Italy
- Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
- Emilio Marengo
- Department of Sciences and Technological Innovation, University of Piemonte Orientale, Alessandria, Italy
- Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
- Marcello Manfredi
- Department of Translational Medicine, University of Piemonte Orientale, Novara, Italy
- Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
13
Zhang J, Yu J, Lin D, Guo X, He H, Shi S. DeepCLA: A Hybrid Deep Learning Approach for the Identification of Clathrin. J Chem Inf Model 2020; 61:516-524. [PMID: 33347303 DOI: 10.1021/acs.jcim.0c00979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Clathrin is a highly evolutionarily conserved protein, which can affect membrane cleavage and membrane release of vesicles. The absence of clathrin in the cellular system affects a variety of human diseases. Effective recognition of clathrin plays an important role in the development of drugs to treat related diseases. In recent years, deep learning has been widely applied in the field of bioinformatics because of its high efficiency and accuracy. In this study, we propose a deep learning framework, DeepCLA, which combines two different network structures, including a convolutional neural network and a bidirectional long short-term memory network to identify clathrin. The investigation of different deep network architectures demonstrates that the prediction performance of a hybrid depth network model is better than that of a single depth network. On the independent test dataset, DeepCLA outperforms the state-of-the-art methods. It suggests that DeepCLA is an effective approach for clathrin prediction and can provide more instructive guidance for further experimental investigation of clathrin. Moreover, the source code and training data of DeepCLA are provided at https://github.com/ZhangZhang89/DeepCLA.
Affiliation(s)
- Ju Zhang
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
| | - Jialin Yu
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
| | - Dan Lin
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
| | - Xinyun Guo
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
| | - Huan He
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
| | - Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
| |
|
14
|
Li FM, Gao XW. Predicting Gram-Positive Bacterial Protein Subcellular Location by Using Combined Features. Biomed Res Int 2020; 2020:9701734. [PMID: 32802888 PMCID: PMC7421015 DOI: 10.1155/2020/9701734] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/23/2020] [Revised: 06/30/2020] [Accepted: 07/13/2020] [Indexed: 12/14/2022]
Abstract
There are a lot of bacteria in the environment, and Gram-positive bacteria are the most common ones. Some Gram-positive bacteria are very harmful to the human body, so it is significant to predict Gram-positive bacterial protein subcellular location. And identification of Gram-positive bacterial protein subcellular location is important for developing effective drugs. In this paper, a new Gram-positive bacterial protein subcellular location dataset was established. The amino acid composition, the gene ontology annotation information, the hydropathy dipeptide composition information, the amino acid dipeptide composition information, and the autocovariance average chemical shift information were selected as characteristic parameters, then these parameters were combined. The locations of Gram-positive bacterial proteins were predicted by the Support Vector Machine (SVM) algorithm, and the overall accuracy (OA) reached 86.1% under the Jackknife test. The overall accuracy (OA) in our predictive model was higher than those in existing methods. This improved method may be helpful for protein function prediction.
Affiliation(s)
- Feng-Min Li
- College of Science, Inner Mongolia Agricultural University, Hohhot 010018, China
| | - Xiao-Wei Gao
- College of Science, Inner Mongolia Agricultural University, Hohhot 010018, China
| |
|
15
|
Some illuminating remarks on molecular genetics and genomics as well as drug development. Mol Genet Genomics 2020; 295:261-274. [PMID: 31894399 DOI: 10.1007/s00438-019-01634-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/21/2019] [Accepted: 12/05/2019] [Indexed: 02/07/2023]
Abstract
Facing the explosive growth of biological sequences unearthed in the post-genomic age, one of the most important but also most difficult problems in computational biology is how to express a biological sequence with a discrete model or a vector, but still keep it with considerable sequence-order information or its special pattern. To deal with such a challenging problem, the ideas of "pseudo amino acid components" and "pseudo K-tuple nucleotide composition" have been proposed. The ideas and their approaches have further stimulated the birth for "distorted key theory", "wenxing diagram", and substantially strengthening the power in treating the multi-label systems, as well as the establishment of the famous "5-steps rule". All these logic developments are quite natural that are very useful not only for theoretical scientists but also for experimental scientists in conducting genetics/genomics analysis and drug development. Presented in this review paper are also their future perspectives; i.e., their impacts will become even more significant and propounding.
|
16
|
Nielsen H, Petsalaki EI, Zhao L, Stühler K. Predicting eukaryotic protein secretion without signals. Biochim Biophys Acta Proteins Proteom 2019; 1867:140174. [DOI: 10.1016/j.bbapap.2018.11.011] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/01/2018] [Revised: 10/30/2018] [Accepted: 11/29/2018] [Indexed: 10/27/2022]
|
17
|
Chou KC. Advances in Predicting Subcellular Localization of Multi-label Proteins and its Implication for Developing Multi-target Drugs. Curr Med Chem 2019; 26:4918-4943. [PMID: 31060481 DOI: 10.2174/0929867326666190507082559] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 09/28/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 12/16/2022]
Abstract
The smallest unit of life is a cell, which contains numerous protein molecules. Most of the functions critical to the cell's survival are performed by these proteins located in its different organelles, usually called "subcellular locations". Information on the subcellular localization of a protein can provide useful clues about its function. To reveal the intricate pathways at the cellular level, knowledge of the subcellular localization of proteins in a cell is a prerequisite. Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing and selecting the right targets for drug development. Unfortunately, it is both time-consuming and costly to determine the subcellular locations of proteins purely based on experiments. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop computational methods for rapidly and effectively identifying the subcellular locations of uncharacterized proteins based on their sequence information alone. Considerable progress has been achieved in this regard. This review focuses on those methods that have the capacity to deal with multi-label proteins, which may simultaneously exist in two or more subcellular location sites. Protein molecules with this kind of characteristic are vitally important for finding multi-target drugs, a current hot trend in drug development. This review also focuses on those methods that have user-friendly web servers established, so that the majority of experimental scientists can use them to get the desired results without needing to go through the detailed mathematics involved.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
| |
|
19
|
Li SH, Guan ZX, Zhang D, Zhang ZM, Huang J, Yang W, Lin H. Recent Advancement in Predicting Subcellular Localization of Mycobacterial Protein with Machine Learning Methods. Med Chem 2019; 16:605-619. [PMID: 31584379 DOI: 10.2174/1573406415666191004101913] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/01/2019] [Revised: 06/25/2019] [Accepted: 08/23/2019] [Indexed: 01/28/2023]
Abstract
Mycobacterium tuberculosis (MTB) causes tuberculosis (TB), which is reported as one of the most dreadful epidemics. Although many biochemical molecular drugs have been developed to cope with this disease, drug resistance, especially multidrug resistance (MDR) and extensive drug resistance (XDR), poses a huge threat to treatment. Moreover, traditional biochemical experimental methods to tackle TB are time-consuming and costly. Thanks to the enormous amount of genomic and proteomic sequence data now available, TB can be tackled via the sequence-based computational approach of bioinformatics. Studies on predicting the subcellular localization of mycobacterial proteins (MBP) with high precision and efficiency may help determine the biological function of these proteins and provide useful insights for protein function annotation as well as drug design. In this review, we report the progress that has been made in the computational prediction of subcellular localization of MBP, covering the following aspects: (1) construction of benchmark datasets; (2) methods of feature extraction; (3) techniques of feature selection; (4) application of several published prediction algorithms; (5) the published results; and (6) further study on the prediction of subcellular localization of MBP.
Affiliation(s)
- Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Jian Huang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Wuritu Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Development and Planning Department, Inner Mongolia University, Hohhot, P.R. China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
20
|
Bernhofer M, Goldberg T, Wolf S, Ahmed M, Zaugg J, Boden M, Rost B. NLSdb-major update for database of nuclear localization signals and nuclear export signals. Nucleic Acids Res 2019; 46:D503-D508. [PMID: 29106588 PMCID: PMC5753228 DOI: 10.1093/nar/gkx1021] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 09/15/2017] [Accepted: 10/18/2017] [Indexed: 11/13/2022] Open
Abstract
NLSdb is a database collecting nuclear export signals (NES) and nuclear localization signals (NLS) along with experimentally annotated nuclear and non-nuclear proteins. NES and NLS are short sequence motifs related to protein transport out of and into the nucleus. The updated NLSdb now contains 2253 NLS and introduces 398 NES. The potential sets of novel NES and NLS have been generated by a simple 'in silico mutagenesis' protocol. We started with motifs annotated by experiments. In step 1, we increased specificity such that no known non-nuclear protein matched the refined motif. In step 2, we increased the sensitivity trying to match several different families with a motif. We then iterated over steps 1 and 2. The final set of 2253 NLS motifs matched 35% of 8421 experimentally verified nuclear proteins (up from 21% for the previous version) and none of 18 278 non-nuclear proteins. We updated the web interface providing multiple options to search protein sequences for NES and NLS motifs, and to evaluate your own signal sequences. NLSdb can be accessed via Rostlab services at: https://rostlab.org/services/nlsdb/.
Affiliation(s)
- Michael Bernhofer
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748 Garching/Munich, Germany
| | - Tatyana Goldberg
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748 Garching/Munich, Germany
| | - Silvana Wolf
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748 Garching/Munich, Germany
| | - Mohamed Ahmed
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748 Garching/Munich, Germany
| | - Julian Zaugg
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
| | - Mikael Boden
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
| | - Burkhard Rost
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748 Garching/Munich, Germany; Institute of Advanced Study (TUM-IAS), Lichtenbergstrasse 2a, 85748 Garching/Munich, Germany; Institute for Food and Plant Sciences WZW-Weihenstephan, Alte Akademie 8, 85354 Freising, Germany; Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
| |
|
21
|
Abstract
Ever since the signal hypothesis was proposed in 1971, the exact nature of signal peptides has been a focus point of research. The prediction of signal peptides and protein subcellular location from amino acid sequences has been an important problem in bioinformatics since the dawn of this research field, involving many statistical and machine learning technologies. In this review, we provide a historical account of how position-weight matrices, artificial neural networks, hidden Markov models, support vector machines and, lately, deep learning techniques have been used in the attempts to predict where proteins go. Because the secretory pathway was the first one to be studied both experimentally and through bioinformatics, our main focus is on the historical development of prediction methods for signal peptides that target proteins for secretion; prediction methods to identify targeting signals for other cellular compartments are treated in less detail.
Affiliation(s)
- Henrik Nielsen
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark.
| | - Konstantinos D Tsirigos
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
| | - Søren Brunak
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
- Faculty of Health and Medical Sciences, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | - Gunnar von Heijne
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Science for Life Laboratory, Stockholm University, Solna, Sweden
| |
|
22
|
Perdigão N, Rosa A. Dark Proteome Database: Studies on Dark Proteins. High Throughput 2019; 8:ht8020008. [PMID: 30934744 PMCID: PMC6630768 DOI: 10.3390/ht8020008] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/01/2018] [Revised: 03/12/2019] [Accepted: 03/15/2019] [Indexed: 12/27/2022] Open
Abstract
The dark proteome, as we define it, is the part of the proteome where 3D structure has not been observed either by homology modeling or by experimental characterization in the protein universe. From the 550.116 proteins available in Swiss-Prot (as of July 2016), 43.2% of the eukarya universe and 49.2% of the virus universe are part of the dark proteome. In bacteria and archaea, the percentage of the dark proteome presence is significantly less, at 12.6% and 13.3% respectively. In this work, we present a necessary step to complete the dark proteome picture by introducing the map of the dark proteome in the human and in other model organisms of special importance to mankind. The most significant result is that around 40% to 50% of the proteome of these organisms are still in the dark, where the higher percentages belong to higher eukaryotes (mouse and human organisms). Due to the amount of darkness present in the human organism being more than 50%, deeper studies were made, including the identification of ‘dark’ genes that are responsible for the production of so-called dark proteins, as well as the identification of the ‘dark’ tissues where dark proteins are over represented, namely, the heart, cervical mucosa, and natural killer cells. This is a step forward in the direction of gaining a deeper knowledge of the human dark proteome.
Affiliation(s)
- Nelson Perdigão
- Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal.
- Instituto de Sistemas e Robótica, 1049-001 Lisbon, Portugal.
| | - Agostinho Rosa
- Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal.
- Instituto de Sistemas e Robótica, 1049-001 Lisbon, Portugal.
| |
|
23
|
Marginal protein stability drives subcellular proteome isoelectric point. Proc Natl Acad Sci U S A 2018; 115:11778-11783. [PMID: 30385634 DOI: 10.1073/pnas.1809098115] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023] Open
Abstract
There exists a positive correlation between the pH of subcellular compartments and the median isoelectric point (pI) for the associated proteomes. Proteins in the human lysosome, a highly acidic compartment in the cell, have a median pI of ∼6.5, whereas proteins in the more basic mitochondria have a median pI of ∼8.0. Proposed mechanisms reflect potential adaptations to pH. For example, enzyme active site general acid/base residue pKs are likely evolved to match environmental pH. However, such effects would be limited to a few residues on specific proteins, and might not affect the proteome at large. A protein model that considers residue burial upon folding recapitulates the correlation between proteome pI and environmental pH. This correlation can be fully described by a neutral evolution process; no functional selection is included in the model. Proteins in acidic environments incur a lower energetic penalty for burying acidic residues than basic residues, resulting in a net accumulation of acidic residues in the protein core. The inverse is true under alkaline conditions. The pI distributions of subcellular proteomes are likely not a direct result of functional adaptations to pH, but a molecular spandrel stemming from marginal stability.
|
24
|
da Costa WLO, Araújo CLDA, Dias LM, Pereira LCDS, Alves JTC, Araújo FA, Folador EL, Henriques I, Silva A, Folador ARC. Functional annotation of hypothetical proteins from the Exiguobacterium antarcticum strain B7 reveals proteins involved in adaptation to extreme environments, including high arsenic resistance. PLoS One 2018; 13:e0198965. [PMID: 29940001 PMCID: PMC6016940 DOI: 10.1371/journal.pone.0198965] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 01/12/2018] [Accepted: 05/28/2018] [Indexed: 02/07/2023] Open
Abstract
Exiguobacterium antarcticum strain B7 is a psychrophilic Gram-positive bacterium that possesses enzymes that can be used for several biotechnological applications. However, many proteins from its genome are considered hypothetical proteins (HPs). These functionally unknown proteins may indicate important functions regarding the biological role of this bacterium, and the use of bioinformatics tools can assist in the biological understanding of this organism through functional annotation analysis. Thus, our study aimed to assign functions to proteins previously described as HPs, present in the genome of E. antarcticum B7. We used an extensive in silico workflow combining several bioinformatics tools for function annotation, sub-cellular localization and physicochemical characterization, three-dimensional structure determination, and protein-protein interactions. This genome contains 2772 genes, of which 765 CDS were annotated as HPs. The amino acid sequences of all HPs were submitted to our workflow and we successfully attributed function to 132 HPs. We identified 11 proteins that play important roles in the mechanisms of adaptation to adverse environments, such as flagellar biosynthesis, biofilm formation, carotenoids biosynthesis, and others. In addition, three predicted HPs are possibly related to arsenic tolerance. Through an in vitro assay, we verified that E. antarcticum B7 can grow at high concentrations of this metal. The approach used was important to precisely assign function to proteins from diverse classes and to infer relationships with proteins with functions already described in the literature. This approach aims to produce a better understanding of the mechanism by which this bacterium adapts to extreme environments and to the finding of targets with biotechnological interest.
Affiliation(s)
- Wana Lailan Oliveira da Costa
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
| | - Carlos Leonardo de Aragão Araújo
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
| | - Larissa Maranhão Dias
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
| | - Lino César de Sousa Pereira
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
| | - Jorianne Thyeska Castro Alves
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
| | - Fabrício Almeida Araújo
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
| | - Edson Luiz Folador
- Biotechnology Center, Federal University of Paraiba, João Pessoa, Paraíba, Brazil
| | - Isabel Henriques
- Biology Department & CESAM, University of Aveiro, Aveiro, Portugal
| | - Artur Silva
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
| | - Adriana Ribeiro Carneiro Folador
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
| |
|
25
|
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global-property-based and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches is described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
- Technical University of Denmark, Kemitorvet, Building 208, DK-2800, Kgs. Lyngby, Denmark.
| |
|
26
|
Brüne D, Andrade-Navarro MA, Mier P. Proteome-wide comparison between the amino acid composition of domains and linkers. BMC Res Notes 2018; 11:117. [PMID: 29426365 PMCID: PMC5807739 DOI: 10.1186/s13104-018-3221-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 12/04/2017] [Accepted: 02/01/2018] [Indexed: 02/01/2023] Open
Abstract
Objective: Amino acid composition is a sequence feature that has been extensively used to characterize proteomes of many species and protein families. Yet the analysis of amino acid composition of protein domains and the linkers connecting them has received less attention. Here, we perform both a comprehensive full-proteome amino acid composition analysis and a similar analysis focusing on domains and linkers, to uncover domain- or linker-specific differential amino acid usage patterns.
Results: The amino acid composition in the 38 proteomes studied showcases the greater variability found in archaea and bacteria species compared to eukaryotes. When focusing on domains and linkers, we describe the preferential use of polar residues in linkers and hydrophobic residues in domains. To let any user perform this analysis on a given domain (or set of them), we developed a dedicated R script called RACCOON, which can be easily used and can provide interesting insights into the compositional differences between a domain and its surrounding linkers.
Affiliation(s)
- Daniel Brüne
- Institute of Pharmacy and Molecular Biotechnology, Ruprecht Karls University Heidelberg, 69120, Heidelberg, Germany
| | | | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, 55128, Mainz, Germany.
| |
|
27
|
Kumar R, Kumari B, Kumar M. Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine. PeerJ 2017; 5:e3561. [PMID: 28890846 PMCID: PMC5588793 DOI: 10.7717/peerj.3561] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/25/2017] [Accepted: 06/20/2017] [Indexed: 12/15/2022] Open
Abstract
Background: The endoplasmic reticulum plays an important role in many cellular processes, including protein synthesis, folding and post-translational processing of newly synthesized proteins. It is also the site for quality control of misfolded proteins and the entry point of extracellular proteins to the secretory pathway. Hence, at any given point in time, the endoplasmic reticulum contains two different cohorts of proteins: (i) proteins involved in endoplasmic reticulum-specific functions, which reside in the lumen of the endoplasmic reticulum and are called endoplasmic reticulum resident proteins, and (ii) proteins that are in the process of moving to the extracellular space. Thus, endoplasmic reticulum resident proteins must somehow be distinguished from newly synthesized secretory proteins, which pass through the endoplasmic reticulum on their way out of the cell. Only approximately 50% of the proteins used in this study as training data had an endoplasmic reticulum retention signal, which shows that these signals are not necessarily present in all endoplasmic reticulum resident proteins. This also strongly indicates the role of additional factors in the retention of endoplasmic reticulum-specific proteins inside the endoplasmic reticulum.
Methods: This is a support vector machine based method, in which we used different forms of protein features as inputs for the support vector machine to develop the prediction models. During training, the leave-one-out approach to cross-validation was used. Maximum performance was obtained with a combination of amino acid compositions of different parts of proteins.
Results: In this study, we report a novel support vector machine based method for predicting endoplasmic reticulum resident proteins, named ERPred. During training we achieved a maximum accuracy of 81.42% with the leave-one-out approach to cross-validation. When evaluated on an independent dataset, ERPred achieved a sensitivity of 72.31% and a specificity of 83.69%. We have also annotated six different proteomes to predict the candidate endoplasmic reticulum resident proteins in them. A webserver, ERPred, was developed to make the method available to the scientific community, which can be accessed at http://proteininformatics.org/mkumar/erpred/index.html.
Discussion: We found that out of 124 proteins of the training dataset, only 66 proteins had endoplasmic reticulum retention signals, which shows that these signals are not an absolute necessity for endoplasmic reticulum resident proteins to remain inside the endoplasmic reticulum. This observation also strongly indicates the role of additional factors in the retention of proteins inside the endoplasmic reticulum. Our proposed predictor, ERPred, is a signal-independent tool. It is tuned for the prediction of endoplasmic reticulum resident proteins, even if the query protein does not contain a specific ER-retention signal.
Affiliation(s)
- Ravindra Kumar
- Department of Biophysics, University of Delhi South Campus, New Delhi, India.,Current affiliation: Newe-Ya'ar Research Center, Agricultural Research Organization, Ramat Yishay, Israel
| | - Bandana Kumari
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
| | - Manish Kumar
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
| |
|
28
|
pLoc-mVirus: Predict subcellular localization of multi-location virus proteins via incorporating the optimal GO information into general PseAAC. Gene 2017; 628:315-321. [DOI: 10.1016/j.gene.2017.07.036] [Citation(s) in RCA: 135] [Impact Index Per Article: 19.3] [Received: 04/21/2017] [Revised: 07/08/2017] [Accepted: 07/11/2017] [Indexed: 12/25/2022]
|
29
|
Nielsen H. Predicting Subcellular Localization of Proteins by Bioinformatic Algorithms. Curr Top Microbiol Immunol 2017; 404:129-158. [PMID: 26728066 DOI: 10.1007/82_2015_5006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 02/08/2023]
Abstract
When predicting the subcellular localization of proteins from their amino acid sequences, there are basically three approaches: signal-based, global property-based, and homology-based. Each of these has its advantages and drawbacks, and it is important when comparing methods to know which approach was used. Various statistical and machine learning algorithms are used with all three approaches, and various measures and standards are employed when reporting the performances of the developed methods. This chapter presents a number of available methods for prediction of sorting signals and subcellular localization, but rather than providing a checklist of which predictors to use, it aims to function as a guide for critical assessment of prediction methods.
Affiliation(s)
- Henrik Nielsen
- Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Kemitorvet building 208, 2800, Lyngby, Denmark.
| |
|
30
|
Chaiyasit P, Tongraar A, Kerdcharoen T. Characteristics of methylammonium ion (CH3NH3+) in aqueous electrolyte solution: An ONIOM-XS MD simulation study. Chem Phys 2017. [DOI: 10.1016/j.chemphys.2017.06.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
|
31
|
Genome-wide analysis of the CCCH zinc finger family identifies tissue specific and stress responsive candidates in chickpea (Cicer arietinum L.). PLoS One 2017; 12:e0180469. [PMID: 28704400 PMCID: PMC5507508 DOI: 10.1371/journal.pone.0180469] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 02/06/2017] [Accepted: 06/15/2017] [Indexed: 12/15/2022] Open
Abstract
The CCCH zinc finger proteins are a group of proteins characterised by a typical motif consisting of three cysteine residues and one histidine residue. These proteins have been reported to play important roles in the regulation of plant growth, developmental processes and environmental responses. In the present study, a genome-wide analysis of the CCCH zinc finger gene family was carried out in the available chickpea genome. Various bioinformatics tools were employed to predict 58 CCCH zinc finger genes in chickpea (designated CarC3H1-58), which were analysed for their physicochemical properties. Phylogenetic analysis classified the proteins into 12 groups in which members of a particular group had similar structural organization. Further, the numbers as well as the types of CCCH motifs present in the CarC3H proteins were compared with those from Arabidopsis and Medicago truncatula. Synteny analysis revealed valuable information regarding the evolution of this gene family. Tandem and segmental duplication events were identified and their Ka/Ks values revealed that the CarC3H gene family in chickpea had undergone purifying selection. Digital as well as real-time qRT-PCR expression analyses were performed, which helped in the identification of several CarC3H members that were expressed preferentially in specific chickpea tissues as well as during abiotic stresses (desiccation, cold, salinity). Moreover, molecular characterization of an important member, CarC3H45, was carried out. This study provides comprehensive genomic information about the important CCCH zinc finger gene family in chickpea. The identified tissue-specific and abiotic stress-specific CCCH genes could be potential candidates for further characterization to delineate their functional roles in development and stress.
|
32
|
Orfanoudaki G, Markaki M, Chatzi K, Tsamardinos I, Economou A. MatureP: prediction of secreted proteins with exclusive information from their mature regions. Sci Rep 2017; 7:3263. [PMID: 28607462 PMCID: PMC5468347 DOI: 10.1038/s41598-017-03557-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 01/23/2017] [Accepted: 04/28/2017] [Indexed: 11/09/2022] Open
Abstract
More than a third of the cellular proteome is non-cytoplasmic. Most secretory proteins use the Sec system for export and are targeted to membranes using signal peptides and mature domains. To specifically analyze bacterial mature domain features, we developed MatureP, a classifier that predicts secretory sequences through features exclusively computed from their mature domains. MatureP was trained using Just Add Data Bio, an automated machine learning tool. Mature domains are predicted efficiently with ~92% success, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC). Predictions were validated using experimental datasets of mutated secretory proteins. The features selected by MatureP reveal prominent differences in amino acid content between secreted and cytoplasmic proteins. Amino-terminal mature domain sequences have enhanced disorder, more hydroxyl and polar residues and fewer hydrophobic residues. Cytoplasmic proteins have prominent amino-terminal hydrophobic stretches and charged regions downstream. Presumably, secretory mature domains comprise a distinct protein class. They balance properties that promote the flexibility required for the maintenance of non-folded states during targeting and secretion with the ability to fold after secretion. These findings provide novel insight into protein trafficking, sorting and folding mechanisms and may benefit protein secretion biotechnology.
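The AUC measure quoted in this abstract has a simple rank-based reading: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal Python sketch of that pairwise definition follows; the function name and the toy scores are assumptions for illustration and are unrelated to MatureP's implementation.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive example scores higher; ties count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The quadratic loop keeps the definition visible; for large datasets a sort-based computation would be the usual choice.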
Affiliation(s)
- Georgia Orfanoudaki
- Institute of Molecular Biology and Biotechnology-FORTH and Department of Biology-University of Crete, PO Box 1385, Heraklion, Crete, Greece
| | - Maria Markaki
- Computer Science Department, University of Crete, Heraklion, Greece
| | - Katerina Chatzi
- KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, B-3000, Leuven, Belgium
| | - Ioannis Tsamardinos
- Computer Science Department, University of Crete, Heraklion, Greece.,Gnosis Data Analysis PC, Heraklion, Greece
| | - Anastassios Economou
- Institute of Molecular Biology and Biotechnology-FORTH and Department of Biology-University of Crete, PO Box 1385, Heraklion, Crete, Greece. .,KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, B-3000, Leuven, Belgium.
| |
|
33
|
Oligopeptidase B and B2: comparative modelling and virtual screening as searching tools for new antileishmanial compounds. Parasitology 2016; 144:536-545. [DOI: 10.1017/s0031182016002237] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/27/2022]
Abstract
SUMMARY Leishmaniases are diseases caused by parasites of the genus Leishmania and transmitted to humans by the bite of infected insects of the subfamily Phlebotominae. Current drug therapy shows high toxicity and severe adverse effects. Recently, two oligopeptidases (OPBs) were identified in Leishmania amazonensis, namely oligopeptidase B (OPB) and oligopeptidase B2 (OPB2). These OPBs could be ideal targets, since both enzymes are expressed throughout the parasite life cycle and have not been identified in humans. This work aimed to identify possible dual inhibitors of OPB and OPB2 from L. amazonensis. The three-dimensional structures of both enzymes were built by comparative modelling and used to perform a virtual screening of the ZINC database with the DOCK Blaster server. This is the first time that OPB models from L. amazonensis have been used in a virtual screening approach. Four hundred compounds were identified as possible inhibitors of each enzyme. The top-scoring compounds were submitted to refinement with the AutoDock program. The best results suggest that the compounds interact with important residues, such as Tyr490, Glu612 and Arg655 (OPB numbering). The identified compounds showed better results than antipain and drugs currently used against leishmaniasis when in silico ADMET analyses were performed. These compounds could be explored in order to find dual inhibitors of OPB and OPB2 from L. amazonensis.
|
34
|
Ikpeme E, Udensi O, Kooffreh M, Etta H, Ushie B, Echea E, Ozoje M. In silico Analysis of BRCA1 Gene and its Phylogenetic Relationship in some Selected Domestic Animal Species. ACTA ACUST UNITED AC 2016. [DOI: 10.3923/tb.2017.1.10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
|
35
|
Genome-wide identification of multifunctional laccase gene family in cotton (Gossypium spp.); expression and biochemical analysis during fiber development. Sci Rep 2016; 6:34309. [PMID: 27679939 PMCID: PMC5041144 DOI: 10.1038/srep34309] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 02/10/2016] [Accepted: 09/12/2016] [Indexed: 12/27/2022] Open
Abstract
The single-celled cotton fibers, produced from seed coat epidermal cells, are the largest natural source of textile fibers. The economic value of cotton fiber lies in its length and quality. The multifunctional laccase enzymes play important roles in cell elongation, lignification and pigmentation in plants and could play a crucial role in cotton fiber quality. Genome-wide analysis of the cultivated allotetraploid (G. hirsutum) and its progenitor diploid (G. arboreum and G. raimondii) cotton species identified 84, 44 and 46 laccase genes, respectively. Analysis of chromosomal location, phylogeny, conserved domains and physical properties showed the highly conserved nature of laccases across the three cotton species. Gene expression, enzymatic activity and biochemical analyses of developing cotton fibers were performed using G. arboreum. Of the total 44 laccases, 40 showed expression during different stages of fiber development. The higher enzymatic activity of laccases correlated with higher lignin content at 25 DPA (Days Post Anthesis). Further, analysis of cotton fiber phenolic compounds showed an overall decrease at 25 DPA, indicating possible incorporation of these substrates into the lignin polymer during secondary cell wall biosynthesis. Overall, the data indicate significant roles of laccases in cotton fiber development and present an excellent opportunity for manipulation of fiber development and quality.
|
36
|
Abstract
We surveyed the "dark" proteome, that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44-54% of the proteome in eukaryotes and viruses was dark, compared with only ~14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology.
|
37
|
Dwivedi A, Srivastava AK, Bajpai A. Vibrational spectra, HOMO, LUMO, MESP surfaces and reactivity descriptors of amylamine and its isomers: A DFT study. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2015; 149:343-351. [PMID: 25965519 DOI: 10.1016/j.saa.2015.04.042] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 11/14/2014] [Revised: 03/25/2015] [Accepted: 04/20/2015] [Indexed: 06/04/2023]
Abstract
Amylamine constitutes an important class of organic compounds which exists in a variety of ammonia derivatives. In the present study, a comparative analysis of amylamine and its two potential isomers, iso-amylamine and tert-amylamine, has been performed using density functional theory with the B3LYP method and 6-311G(d,p) as the basis set. The equilibrium structures of amylamine as well as its iso and tert forms have been obtained. The vibrational spectroscopic analysis has been carried out for the three molecules and complete assignments to all possible modes have been offered. The HOMO, LUMO and MESP surfaces are analyzed to discuss the chemical reactivity patterns in the molecules. A number of reactivity parameters have been calculated to further explain their chemical reactivity. The thermodynamic and nonlinear optical parameters are also calculated and discussed.
Affiliation(s)
- Apoorva Dwivedi
- Department of Physics, Govt. Kakatiya P.G. College, Jagdalpur, Dist. Bastar, Chhattisgarh 494001, India
| | | | - Abhishek Bajpai
- Department of Physics, Govt. Kakatiya P.G. College, Jagdalpur, Dist. Bastar, Chhattisgarh 494001, India.
| |
|
38
|
Affiliation(s)
- Fahrul Huyop
- Universiti Teknologi Malaysia, Faculty of Biosciences and Bioengineering, Johor, Malaysia
| | - Ismaila Yada Sudi
- Universiti Teknologi Malaysia, Faculty of Biosciences and Bioengineering, Johor, Malaysia
| |
|
39
|
Predicting human protein subcellular locations by the ensemble of multiple predictors via protein-protein interaction network with edge clustering coefficients. PLoS One 2014; 9:e86879. [PMID: 24466278 PMCID: PMC3900678 DOI: 10.1371/journal.pone.0086879] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 10/14/2013] [Accepted: 12/18/2013] [Indexed: 12/14/2022] Open
Abstract
One of the fundamental tasks in biology is to identify the functions of all proteins to reveal the primary machinery of a cell. Knowledge of the subcellular locations of proteins will provide key hints to reveal their functions and to understand the intricate pathways that regulate biological processes at the cellular level. Protein subcellular location prediction has been extensively studied in the past two decades. Many methods have been developed based on protein primary sequences as well as protein-protein interaction networks. In this paper, we propose to use the protein-protein interaction network as an infrastructure to integrate existing sequence-based predictors. When predicting the subcellular locations of a given protein, not only the protein itself but also all its interacting partners were considered. Unlike existing methods, our method requires neither comprehensive knowledge of the protein-protein interaction network nor experimentally annotated subcellular locations for most proteins in the network. In addition, our method can be used as a framework to integrate multiple predictors. Our method achieved an absolute-true rate of 56% on the human proteome, which is higher than that of state-of-the-art methods.
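For multi-location prediction, the absolute-true rate is commonly taken to be the fraction of proteins whose predicted set of locations exactly matches the annotated set, with no missing and no extra locations. The Python sketch below assumes that reading; the function name and the toy location labels are hypothetical and are not taken from the paper.

```python
def absolute_true_rate(true_sets, predicted_sets):
    """Fraction of proteins whose predicted location set equals the true set.

    Each element of true_sets / predicted_sets is an iterable of location
    labels for one protein. A protein counts only if the two sets match
    exactly (assumed definition of the absolute-true rate).
    """
    if len(true_sets) != len(predicted_sets):
        raise ValueError("true and predicted lists must have the same length")
    if not true_sets:
        return 0.0
    exact = sum(
        1 for t, p in zip(true_sets, predicted_sets) if set(t) == set(p)
    )
    return exact / len(true_sets)
```

Because partially correct predictions earn no credit, this measure is stricter than per-label accuracy, which helps explain why reported values tend to look low next to single-location benchmarks.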
|
40
|
Fukasawa Y, Leung RKK, Tsui SKW, Horton P. Plus ça change - evolutionary sequence divergence predicts protein subcellular localization signals. BMC Genomics 2014; 15:46. [PMID: 24438075 PMCID: PMC3906766 DOI: 10.1186/1471-2164-15-46] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/31/2013] [Accepted: 01/06/2014] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Protein subcellular localization is a central problem in understanding cell biology and has been the focus of intense research. In order to predict localization from amino acid sequence, a myriad of features have been tried, including amino acid composition, sequence similarity, the presence of certain motifs or domains, and many others. Surprisingly, sequence conservation of sorting motifs has not yet been employed, despite its extensive use for tasks such as the prediction of transcription factor binding sites. RESULTS Here, we flip the problem around and present a proof of concept for the idea that the lack of sequence conservation can be a novel feature for localization prediction. We show that for yeast, mammal and plant datasets, evolutionary sequence divergence alone has significant power to identify sequences with N-terminal sorting sequences. Moreover, sequence divergence is nearly as effective when computed on automatically defined ortholog sets as on hand-curated ones. Unfortunately, sequence divergence did not necessarily increase classification performance when combined with some traditional sequence features such as amino acid composition. However, a post-hoc analysis of the proteins in which sequence divergence changes the prediction yielded some proteins with atypical (i.e. not MPP-cleaved) matrix targeting signals as well as a few misannotations. CONCLUSION We report the results of the first quantitative study of the effectiveness of evolutionary sequence divergence as a feature for protein subcellular localization prediction. We show that divergence is indeed useful for prediction, but it is not trivial to improve overall accuracy simply by adding this feature to classical sequence features. Nevertheless, we argue that sequence divergence is a promising feature and show anecdotal examples in which it succeeds where other features fail.
Affiliation(s)
- Yoshinori Fukasawa
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan
- Japan Society for the Promotion of Science, Tokyo Chiyoda, Japan
| | - Ross KK Leung
- Hong Kong Bioinformatics Centre and School of Biomedical Sciences, Chinese University of Hong Kong, Shatin, China
| | - Stephen KW Tsui
- Hong Kong Bioinformatics Centre and School of Biomedical Sciences, Chinese University of Hong Kong, Shatin, China
| | - Paul Horton
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan
- Computational Biology Research Center, Advanced Industrial Science and Technology, Tokyo, Japan
| |
|
41
|
Du P, Xu C. Predicting multisite protein subcellular locations: progress and challenges. Expert Rev Proteomics 2014; 10:227-37. [DOI: 10.1586/epr.13.16] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/08/2022]
|
42
|
A novel approach for protein subcellular location prediction using amino acid exposure. BMC Bioinformatics 2013; 14:342. [PMID: 24283794 PMCID: PMC4219330 DOI: 10.1186/1471-2105-14-342] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 08/23/2013] [Accepted: 11/25/2013] [Indexed: 11/10/2022] Open
Abstract
Background Proteins perform their functions in associated cellular locations. Therefore, the study of protein function can be facilitated by predictions of protein location. Protein location can be predicted either from the sequence of a protein alone, by identification of targeting peptide sequences and motifs, or by homology to proteins of known location. A third, complementary approach exploits the differences in amino acid composition of proteins associated with different cellular locations, and can be useful if motif and homology information are missing. Here we expand this approach by taking into account amino acid composition at different levels of amino acid exposure. Results Our method has two stages. For stage one, we trained multiple Support Vector Machines (SVMs) to score eukaryotic protein sequences for membership of each of three categories (nuclear, cytoplasmic and extracellular) plus an extra category, nucleocytoplasmic, accounting for the fact that a large number of proteins shuttle between those two locations. In stage two, we use an artificial neural network (ANN) to propose a category from the scores given to the four locations in stage one. The method reaches an accuracy of 68% when using 3D-derived values of amino acid exposure as input. Calibration of the method using predicted values of amino acid exposure allows classifying proteins without 3D information with an accuracy of 62% and discerning proteins in different locations even if they share high levels of identity. Conclusions In this study we explored the relationship between residue exposure and protein subcellular location. We developed a new algorithm for subcellular location prediction that uses residue exposure signatures. Our algorithm uses a novel approach to address the multiclass classification problem. The algorithm is implemented as the web server 'NYCE' and can be accessed at http://cbdm.mdc-berlin.de/~amer/nyce.
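The idea of computing amino acid composition separately at different exposure levels can be sketched in Python as follows. This is a minimal illustration under stated assumptions: per-residue exposure is given as a relative value between 0 and 1, residues are split into three bins, and the cutoffs 0.2 and 0.5 are hypothetical placeholders rather than the thresholds used by NYCE.

```python
# Standard 20-letter amino acid alphabet (assumed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def exposure_binned_composition(seq, exposure, cutoffs=(0.2, 0.5)):
    """Amino acid composition per exposure bin.

    seq      : protein sequence (one letter per residue)
    exposure : relative exposure per residue, same length as seq
    cutoffs  : (low, high) bin boundaries; hypothetical default values

    Returns a dict mapping 'buried', 'intermediate' and 'exposed' to a
    list of 20 fractions, one per amino acid in AMINO_ACIDS order.
    """
    if len(seq) != len(exposure):
        raise ValueError("seq and exposure must have the same length")
    low, high = cutoffs
    bins = {"buried": [], "intermediate": [], "exposed": []}
    for aa, value in zip(seq.upper(), exposure):
        if value < low:
            bins["buried"].append(aa)
        elif value < high:
            bins["intermediate"].append(aa)
        else:
            bins["exposed"].append(aa)
    features = {}
    for name, residues in bins.items():
        n = len(residues)
        features[name] = [
            residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS
        ]
    return features
```

Concatenating the three 20-value lists would give a 60-dimensional feature vector that a classifier such as an SVM could take as input.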
|
43
|
Kaundal R, Sahu SS, Verma R, Weirick T. Identification and characterization of plastid-type proteins from sequence-attributed features using machine learning. BMC Bioinformatics 2013; 14 Suppl 14:S7. [PMID: 24266945 PMCID: PMC3851450 DOI: 10.1186/1471-2105-14-s14-s7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Plastids are an important component of plant cells, being the site of manufacture and storage of chemical compounds used by the cell, and contain pigments such as those used in photosynthesis, starch synthesis/storage, cell color etc. They are essential organelles of the plant cell, also present in algae. Recent advances in genomic technology and sequencing efforts are generating a huge amount of DNA sequence data every day. The predicted proteomes of these genomes need annotation at a faster pace. In view of this, one such annotation need is to develop an automated system that can distinguish between plastid and non-plastid proteins accurately, and further classify plastid types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity search and machine learning. RESULTS In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing the plastids in two steps: first distinguishing plastid from non-plastid proteins, and then classifying the identified plastids into their various types based on their function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features (amino acid composition, dipeptide composition, pseudo amino acid composition, N(terminal)-Center-C(terminal) composition and protein physicochemical properties) are used to develop the SVM models. Overall, the dipeptide composition-based module shows the best performance, with an accuracy of 86.80% and a Matthews Correlation Coefficient (MCC) of 0.74 in phase-I and 78.60% with an MCC of 0.44 in phase-II. On independent test data, this model also performs better, with an overall accuracy of 76.58% and 74.97% in phase-I and phase-II, respectively.
The similarity-based PSI-BLAST module shows very low performance, with about 50% prediction accuracy for distinguishing plastid from non-plastid proteins and only 20% in classifying the various plastid types, indicating the need for and importance of machine learning algorithms. CONCLUSION The current work is a first attempt to develop a methodology for classifying various plastid-type proteins. The prediction modules have also been made available as a web tool, PLpred, available at http://bioinfo.okstate.edu/PLpred/ for real-time identification/characterization. We believe this tool will be very useful in the functional annotation of various genomes.
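Two quantities named in this abstract, the dipeptide composition feature and the Matthews Correlation Coefficient, can be written down in a few lines of Python. The sketch below is illustrative only: the function names and the feature ordering are assumptions, and it does not reproduce the PLpred pipeline or its SVM training.

```python
import math
from itertools import product

# Standard 20-letter amino acid alphabet (assumed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return 400 fractions: the frequency of each adjacent residue pair."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 to 1 and stays informative when the two classes differ in size, which is one reason it is often reported alongside accuracy.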
|
44
|
Abstract
Motivation: Subcellular localization is one aspect of protein function. Despite advances in high-throughput imaging, localization maps remain incomplete. Several methods accurately predict localization, but many challenges remain to be tackled. Results: In this study, we introduced a framework to predict localization in life's three domains, including globular and membrane proteins (3 classes for archaea; 6 for bacteria and 18 for eukaryota). The resulting method, LocTree2, works well even for protein fragments. It uses a hierarchical system of support vector machines that imitates the cascading mechanism of cellular sorting. The method reaches high levels of sustained performance (eukaryota: Q18=65%, bacteria: Q6=84%). LocTree2 also accurately distinguishes membrane and non-membrane proteins. In our hands, it compared favorably with top methods when tested on new data. Availability: Online through PredictProtein (predictprotein.org); as standalone version at http://www.rostlab.org/services/loctree2. Contact: localization@rostlab.org Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tatyana Goldberg
- TUM, Bioinformatik-I12, Informatik, Boltzmannstrasse 3, Garching 85748, Germany.
| | | | | |
|
45
|
Sun XY, Shi SP, Qiu JD, Suo SB, Huang SY, Liang RP. Identifying protein quaternary structural attributes by incorporating physicochemical properties into the general form of Chou's PseAAC via discrete wavelet transform. Mol Biosyst 2013; 8:3178-84. [PMID: 22990717 DOI: 10.1039/c2mb25280e] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.7] [Indexed: 11/21/2022]
Abstract
In vivo, some proteins exist as monomers and others as oligomers. Oligomers can be further classified into homo-oligomers (formed by identical subunits) and hetero-oligomers (formed by different subunits), and they form the structural components of various biological functions, including cooperative effects, allosteric mechanism and ion-channel gating. Therefore, with the avalanche of protein sequences generated in the post-genomic era, it is very important for both basic research and the pharmaceutical industry to acquire the possible knowledge about quaternary structural attributes of their proteins of interest. In view of this, a high throughput method (DWT_DT), a 2-layer approach by fusing discrete wavelet transform (DWT) and decision-tree algorithm (DT) with physicochemical features, has been developed to predict protein quaternary structures. The 1st layer is to assign a query protein to one of the 10 main quaternary structural attributes. The 2nd layer is to evaluate whether the protein in question is composed of homo- or hetero-oligomers. The overall accuracy by jackknife test for the 1st layer identification was 89.60%. The overall accuracy of the 2nd layer varies from 88.23 to 100%. The results suggest that this newly developed protocol (DWT_DT) is very promising in predicting quaternary structures with complicated composition.
Affiliation(s)
- Xing-Yu Sun
- Department of Chemistry, Nanchang University, Nanchang 330031, P.R. China.
| | | | | | | | | | | |
|
46
|
Flower DR, Perrie Y. Identification of Candidate Vaccine Antigens In Silico. Immunomic Discovery of Adjuvants and Candidate Subunit Vaccines 2013. [PMCID: PMC7120937 DOI: 10.1007/978-1-4614-5070-2_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
The identification of immunogenic whole-protein antigens is fundamental to the successful discovery of candidate subunit vaccines and their rapid, effective, and efficient transformation into clinically useful, commercially successful vaccine formulations. In the wider context of the experimental discovery of vaccine antigens, with particular reference to reverse vaccinology, this chapter adumbrates the principal computational approaches currently deployed in the hunt for novel antigens: genome-level prediction of antigens, antigen identification through the use of protein sequence alignment-based approaches, antigen detection through the use of subcellular location prediction, and the use of alignment-independent approaches to antigen discovery. Reference is also made to the recent emergence of various expert systems for protein antigen identification.
Affiliation(s)
- Darren R. Flower
- Aston Pharmacy School, School of Life and Health Sciences, University of Aston, Aston Triangle, Birmingham, B4 7ET United Kingdom
| | - Yvonne Perrie
- Aston Pharmacy School, School of Life and Health Sciences, Aston University, Aston Triangle, Birmingham, B4 7ET United Kingdom
| |
|
47
|
White AD, Huang W, Jiang S. Role of nonspecific interactions in molecular chaperones through model-based bioinformatics. Biophys J 2012; 103:2484-91. [PMID: 23260050 DOI: 10.1016/j.bpj.2012.10.040] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/24/2012] [Revised: 10/22/2012] [Accepted: 10/31/2012] [Indexed: 01/16/2023] Open
Abstract
Molecular chaperones are large proteins or protein complexes from which many proteins require assistance in order to fold. One unique property of molecular chaperones is the cavity they provide in which proteins fold. The interior surface residues which make up the cavities of molecular chaperone complexes from different organisms have recently been identified, including those of the well-studied GroEL-GroES chaperonin complex found in Escherichia coli. It was found that the interior of these protein complexes is significantly different from other protein surfaces and that the residues found on the interior surface are able to resist protein adsorption when immobilized on a surface. Yet it remains unknown whether these residues passively resist protein binding inside GroEL-GroES (as demonstrated by experiments that created synthetic mimics of the interior cavity) or whether the interior also actively stabilizes protein folding. To answer this question, we have extended entropic models of substrate protein folding inside GroEL-GroES to include interaction energies between substrate proteins and the GroEL-GroES chaperone complex. This model was tested on a set of 528 proteins and the results qualitatively match experimental observations. The interior residues were found to strongly discourage the exposure of any hydrophobic residues, providing an enhanced hydrophobic effect inside the cavity that actively influences protein folding. This work provides both a mechanism for active protein stabilization in GroEL-GroES and a model that matches contemporary understanding of the chaperone protein.
Affiliation(s)
- Andrew D White
- Department of Chemical Engineering, University of Washington, Seattle, Washington, USA
|
48
|
Predict mycobacterial proteins subcellular locations by incorporating pseudo-average chemical shift into the general form of Chou’s pseudo amino acid composition. J Theor Biol 2012; 304:88-95. [DOI: 10.1016/j.jtbi.2012.03.017] [Citation(s) in RCA: 89] [Impact Index Per Article: 7.4] [Received: 12/17/2011] [Revised: 03/13/2012] [Accepted: 03/14/2012] [Indexed: 11/18/2022]
|
49
|
Evaluation of hydropathy of amino acids from a comparison of their viscosities inside vesicles and on supported lipid bilayers. Colloids Surf B Biointerfaces 2012; 91:63-7. [PMID: 22118892 DOI: 10.1016/j.colsurfb.2011.10.038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/06/2011] [Revised: 10/20/2011] [Accepted: 10/20/2011] [Indexed: 11/21/2022]
Abstract
The viscosity of amino acids enclosed in giant lipid vesicles (η(out)) subjected to a shear flow near a solid surface has been studied using quartz crystal microbalance (QCM). This viscosity has been compared with shear viscosity for the different amino acids adsorbed on supported bilayers (SLBs) (η(in)) of the lipids on quartz. Using a first approximation of vesicles as model rigid spheres, the measured viscosities and the extent of deformation of vesicles observed using optical microscopy, two non-dimensional parameters: the reduced volume and the ratio of (η(in))/(η(out)) have been analyzed as a function of physical parameters: vesicle substrate distance (vesicle vs. supported lipid bilayers), vesicle size and their variation as a function of the viscosity. The kinematics of the vesicles with the amino acids compared with the shear at supported lipid bilayers seems to describe a reasonable hydropathy scale for the amino acids. The results show that there is a direct correlation between the above parameters and the polarity variations in amino acids suggesting that the viscous force may be an important parameter and should be taken into account in studies on membrane proteins interacting with cells and cell adhesion in flow chambers where cell membrane and the adhesive substrate are in relative motion.
|
50
|
White AD, Nowinski AK, Huang W, Keefe AJ, Sun F, Jiang S. Decoding nonspecific interactions from nature. Chem Sci 2012. [DOI: 10.1039/c2sc21135a] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.8] [Indexed: 01/01/2023] Open
|