1
Koga N, Tatsumi-Koga R. Inventing Novel Protein Folds. J Mol Biol 2024:168791. [PMID: 39260686 DOI: 10.1016/j.jmb.2024.168791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 09/04/2024] [Accepted: 09/05/2024] [Indexed: 09/13/2024]
Abstract
How much of the protein fold universe remains unexplored has been a significant open question. Through systematic de novo design of proteins with novel αβ-folds, we demonstrated that nature has explored only a tiny portion of the possible folds, and that numerous possible protein folds remain untouched by nature. This review outlines this study and discusses the prospects for designing functional proteins with novel folds.
Affiliation(s)
- Nobuyasu Koga
- Laboratory for Protein Design, Institute for Protein Research (IPR), Osaka University, Suita, Osaka 565-0871, Japan; Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Aichi 444-8585, Japan.
- Rie Tatsumi-Koga
- Laboratory for Protein Design, Institute for Protein Research (IPR), Osaka University, Suita, Osaka 565-0871, Japan
2
Jacques F, Bolivar P, Pietras K, Hammarlund EU. Roadmap to the study of gene and protein phylogeny and evolution-A practical guide. PLoS One 2023; 18:e0279597. [PMID: 36827278 PMCID: PMC9955684 DOI: 10.1371/journal.pone.0279597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 12/12/2022] [Indexed: 02/25/2023]
Abstract
Developments in sequencing technologies and the sequencing of an ever-increasing number of genomes have revolutionised studies of biodiversity and organismal evolution. This accumulation of data has been paralleled by the creation of numerous public biological databases through which the scientific community can mine the sequences and annotations of genomes, transcriptomes, and proteomes of multiple species. However, finding the appropriate databases and bioinformatic tools for a given inquiry and aim can be challenging. Here, we present a compilation of DNA and protein databases, as well as bioinformatic tools for phylogenetic reconstruction and a wide range of studies on molecular evolution. We provide a protocol for extracting information from biological databases and for simple phylogenetic reconstruction using probabilistic and distance methods, facilitating the study of biodiversity and evolution at the molecular level for the broad scientific community.
Affiliation(s)
- Florian Jacques
- Lund University Cancer Centre, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Paulina Bolivar
- Lund University Cancer Centre, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Kristian Pietras
- Lund University Cancer Centre, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Emma U. Hammarlund
- Lund University Cancer Centre, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
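Simple phylogenetic reconstruction with distance methods begins from pairwise evolutionary distances. Below is a minimal sketch of that step. It assumes nucleotide sequences that are already aligned, and it applies the Jukes-Cantor (JC69) correction d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing ungapped sites. JC69 is our illustrative choice and not necessarily the model the authors' protocol uses. The function names are hypothetical.

```python
import math
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip every column that contains a gap in either sequence.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """JC69-corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # Saturated: the log argument is <= 0, so the correction is undefined.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def distance_matrix(alignment: dict) -> dict:
    """JC69 distances for every unordered pair of named, aligned sequences."""
    return {
        (name_a, name_b): jukes_cantor(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    aln = {"taxonA": "ACGTACGTAC", "taxonB": "ACGTACGTTC", "taxonC": "ACGAACG-TC"}
    for pair, d in distance_matrix(aln).items():
        print(pair, round(d, 4))
```

A matrix like this is the usual input to distance-based tree builders such as neighbor joining. Protein alignments would need a different substitution model.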
3
Zhao W, Zhong B, Zheng L, Tan P, Wang Y, Leng H, de Souza N, Liu Z, Hong L, Xiao X. Proteome-wide 3D structure prediction provides insights into the ancestral metabolism of ancient archaea and bacteria. Nat Commun 2022; 13:7861. [PMID: 36543797 PMCID: PMC9772386 DOI: 10.1038/s41467-022-35523-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/21/2022] [Accepted: 12/07/2022] [Indexed: 12/24/2022]
Abstract
Ancestral metabolism has remained controversial because of a lack of evidence beyond sequence-based reconstructions. Although prebiotic chemists have provided hints that metabolism might have originated from non-enzymatic protometabolic pathways, gaps between ancestral reconstruction and prebiotic processes mean that much is still unknown. Here, we apply proteome-wide 3D structure predictions and comparisons to investigate the ancestral metabolism of ancient bacteria and archaea, providing information beyond sequence as a bridge to prebiotic processes. Microbiological and biophysical experiments comparing representative bacterial and archaeal strains reveal surprisingly similar physiological and metabolic characteristics. Pairwise comparison of protein structures identifies metabolic modules conserved between bacteria and archaea, despite interference from highly variable sequences. The conserved modules (for example, the middle of glycolysis, part of the TCA cycle, proton/sulfur respiration, and building-block biosynthesis) constitute basic functions that possibly existed in the archaeal-bacterial common ancestor, and they are remarkably consistent with experimentally confirmed protometabolic pathways. These structure-based findings provide a new perspective on reconstructing ancestral metabolism and understanding its origin. They suggest that high-throughput protein 3D structure prediction is a promising approach that deserves broader application in future ancestral exploration.
Affiliation(s)
- Weishu Zhao
- State Key Laboratory of Microbial Metabolism, International Center for Deep Life Investigation (IC-DLI), School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China
- Bozitao Zhong
- State Key Laboratory of Microbial Metabolism, International Center for Deep Life Investigation (IC-DLI), School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China
- Institute of Natural Sciences, Shanghai National Center for Applied Mathematics (SJTU Center) and MOE-LSC, Shanghai Jiao Tong University, 200240, Shanghai, China
- Lirong Zheng
- Institute of Natural Sciences, Shanghai National Center for Applied Mathematics (SJTU Center) and MOE-LSC, Shanghai Jiao Tong University, 200240, Shanghai, China
- Pan Tan
- Institute of Natural Sciences, Shanghai National Center for Applied Mathematics (SJTU Center) and MOE-LSC, Shanghai Jiao Tong University, 200240, Shanghai, China
- Yinzhao Wang
- State Key Laboratory of Microbial Metabolism, International Center for Deep Life Investigation (IC-DLI), School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China
- Hao Leng
- State Key Laboratory of Microbial Metabolism, International Center for Deep Life Investigation (IC-DLI), School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China
- Nicolas de Souza
- Australian Nuclear Science and Technology Organisation (ANSTO), Locked Bag 2001, Kirrawee DC, Sydney, NSW, 2232, Australia
- Zhuo Liu
- Institute of Natural Sciences, Shanghai National Center for Applied Mathematics (SJTU Center) and MOE-LSC, Shanghai Jiao Tong University, 200240, Shanghai, China
- Shanghai Artificial Intelligence Laboratory, 200232, Shanghai, China
- School of Physics and Astronomy, Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, 200240, Shanghai, China
- Liang Hong
- Institute of Natural Sciences, Shanghai National Center for Applied Mathematics (SJTU Center) and MOE-LSC, Shanghai Jiao Tong University, 200240, Shanghai, China
- Shanghai Artificial Intelligence Laboratory, 200232, Shanghai, China
- School of Physics and Astronomy, Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, 200240, Shanghai, China
- Xiang Xiao
- State Key Laboratory of Microbial Metabolism, International Center for Deep Life Investigation (IC-DLI), School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, Guangdong, China
4
Paul SK, Saddam M, Rahaman KA, Choi JG, Lee SS, Hasan M. Molecular modeling, molecular dynamics simulation, and essential dynamics analysis of grancalcin: An upregulated biomarker in experimental autoimmune encephalomyelitis mice. Heliyon 2022; 8:e11232. [PMID: 36340004 PMCID: PMC9626934 DOI: 10.1016/j.heliyon.2022.e11232] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/17/2022] [Revised: 05/30/2022] [Accepted: 10/20/2022] [Indexed: 11/06/2022]
Abstract
The experimental autoimmune encephalomyelitis (EAE) mouse model is the most commonly used animal model of multiple sclerosis and the one that best represents the disease. Grancalcin (GCA) was found to be upregulated in EAE mice. GCA comprises 220 amino acids and has been assigned UniProtKB ID Q8VC88. It is a calcium-binding protein that helps neutrophils adhere to fibronectin and form focal adhesions. However, the Protein Data Bank (PDB) does not contain a crystal structure of mouse GCA. The current study aims to analyze the structural and physicochemical properties of GCA. Mouse GCA shows high sequence identity (87%) with the crystal structure of des(1-52) grancalcin with bound calcium from Homo sapiens (PDB ID 1K94, chain A). Using the SWISS-MODEL server with 1K94_A as the template, we modeled the mouse GCA protein. By comparison with the template structure 1K94, three potential calcium-binding sites were proposed, spanning residues 13-20, 80-91, and 109-120. A 100 ns molecular dynamics (MD) simulation was run with GROMACS 2020.1 on an Intel Core i5 personal computer with 8 GB of RAM. Root-mean-square deviation (RMSD), radius of gyration (Rg), and root-mean-square fluctuation (RMSF) analyses of the MD trajectory indicate that the modeled protein remains stable and compact throughout the simulation. We found that GCA is primarily alpha-helical (Class 1), with eight alpha helices. The essential dynamics analysis, comprising principal component analysis (PCA) and solvent-accessible surface area (SASA), captures the biological motions corresponding to the last 1000 frames. These findings will aid the development of potential inhibitors and the determination of binding pockets and residues for drug-like molecules.
Affiliation(s)
- Shamrat Kumar Paul
- Department of Biochemistry and Molecular Biology, Life Science Faculty, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh
- Md. Saddam
- Department of Biochemistry and Molecular Biology, Life Science Faculty, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh
- Khandoker Asiqur Rahaman
- Division of Biomedical Science and Technology, KIST-School, Korea University of Science and Technology, Seoul 02792, South Korea
- Jong-Gu Choi
- Department of Oriental Biomedical Engineering, Sangji University, Wonju 26339, South Korea
- Sang-Suk Lee
- Department of Oriental Biomedical Engineering, Sangji University, Wonju 26339, South Korea
- Mahbub Hasan
- Department of Biochemistry and Molecular Biology, Life Science Faculty, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh
- Department of Oriental Biomedical Engineering, Sangji University, Wonju 26339, South Korea
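The RMSD, Rg, and RMSF analyses named in this abstract are simple functions of atomic coordinates. Below is a minimal, standard-library sketch of the three quantities. It is not the authors' GROMACS workflow. It assumes every frame has already been superposed on a reference, since MD analysis tools normally fit frames before computing RMSD and RMSF. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally sized, pre-fitted frames."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("frames must be non-empty and the same size")
    total = sum(sum((a[k] - b[k]) ** 2 for k in range(3))
                for a, b in zip(frame, reference))
    return math.sqrt(total / len(frame))


def radius_of_gyration(frame: Sequence[Coord],
                       masses: Optional[Sequence[float]] = None) -> float:
    """Mass-weighted radius of gyration; equal masses are used if none are given."""
    if not frame:
        raise ValueError("frame must be non-empty")
    weights = list(masses) if masses is not None else [1.0] * len(frame)
    total_mass = sum(weights)
    # Center of mass, then the mass-weighted mean squared distance from it.
    center = [sum(m * atom[k] for m, atom in zip(weights, frame)) / total_mass
              for k in range(3)]
    spread = sum(m * sum((atom[k] - center[k]) ** 2 for k in range(3))
                 for m, atom in zip(weights, frame))
    return math.sqrt(spread / total_mass)


def rmsf(frames: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-atom root-mean-square fluctuation about each atom's mean position."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3))
                  for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result
```

Evaluating `rmsd` and `radius_of_gyration` on each frame yields the time series usually plotted to judge stability and compactness. `rmsf` yields one value per atom (often per C-alpha) for the whole run.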
5
Rosa HVD, Leonardo DA, Brognara G, Brandão-Neto J, D'Muniz Pereira H, Araújo APU, Garratt RC. Molecular Recognition at Septin Interfaces: The Switches Hold the Key. J Mol Biol 2020; 432:5784-5801. [DOI: 10.1016/j.jmb.2020.09.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/11/2020] [Revised: 08/25/2020] [Accepted: 09/01/2020] [Indexed: 01/22/2023]
6
Decomposing Structural Response Due to Sequence Changes in Protein Domains with Machine Learning. J Mol Biol 2020; 432:4435-4446. [PMID: 32485208 DOI: 10.1016/j.jmb.2020.05.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2020] [Revised: 05/06/2020] [Accepted: 05/27/2020] [Indexed: 10/24/2022]
Abstract
How protein domain structure changes in response to mutations is not well understood. Some mutations change the structure drastically, while most result in only small changes. To gain an understanding of this, we decompose the relationship between changes in domain sequence and structure using machine learning. We select pairs of evolutionarily related domains spanning a broad range of evolutionary distances. In contrast to earlier studies, we do not find a strictly linear relationship between sequence and structural changes. We train a random forest regressor that predicts the structural similarity between pairs with an average error of 0.029 lDDT (local Distance Difference Test) score and a correlation coefficient of 0.92. Decomposing the feature importance shows that domain length, or equivalently size, is the most important feature. Our model enables the assessment of deviations in relative structural response, and thus the prediction of evolutionary trajectories, in protein domains.
7
Dimarogona M, Topakas E, Christakopoulos P, Chrysina ED. The crystal structure of a Fusarium oxysporum feruloyl esterase that belongs to the tannase family. FEBS Lett 2020; 594:1738-1749. [PMID: 32297315 DOI: 10.1002/1873-3468.13776] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/29/2019] [Revised: 03/13/2020] [Accepted: 03/17/2020] [Indexed: 12/31/2022]
Abstract
Feruloyl esterases are enzymes of industrial interest that catalyse the hydrolysis of the ester bond between hydroxycinnamic acids, such as ferulic acid, and sugars present in the plant cell wall. Although the structures of several biochemically characterized feruloyl esterases are available, the structural determinants of their substrate specificity are not yet fully understood. Here, we present the crystal structure of a feruloyl esterase from Fusarium oxysporum (FoFaeC) at 2.3 Å resolution. Like the two other tannase-like feruloyl esterases, FoFaeC features a large lid domain covering the active site, with a potential regulatory role, and a disulphide bond that brings together the serine and histidine of the catalytic triad. Differences are mainly observed in the metal coordination site and the substrate-binding pocket. ENZYMES: EC 3.1.1.73. DATABASES: The sequence of FoFaeC has been deposited in UniProt under accession code A0A1D3S5H0_FUSOX, and the atomic coordinates of the three-dimensional structure in the Protein Data Bank under PDB code 6FAT.
Affiliation(s)
- Maria Dimarogona
- Institute of Chemical Biology, National Hellenic Research Foundation, Athens, Greece
- School of Chemical Engineering, National Technical University of Athens, Greece
- Evangelos Topakas
- School of Chemical Engineering, National Technical University of Athens, Greece
- Paul Christakopoulos
- Biochemical Process Engineering, Division of Chemical Engineering, Department of Civil, Environmental and Natural Resources Engineering, Luleå University of Technology, Sweden
- Evangelia D Chrysina
- Institute of Chemical Biology, National Hellenic Research Foundation, Athens, Greece
8
Kumar AP, Verma CS, Lukman S. Structural dynamics and allostery of Rab proteins: strategies for drug discovery and design. Brief Bioinform 2020; 22:270-287. [PMID: 31950981 DOI: 10.1093/bib/bbz161] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/17/2019] [Revised: 08/29/2019] [Accepted: 11/15/2019] [Indexed: 01/09/2023]
Abstract
Rab proteins represent the largest family of the Ras superfamily of guanosine triphosphatases (GTPases). Aberrant human Rab proteins are associated with multiple diseases, including cancers and neurological disorders. Rab subfamily members display subtle conformational variations that confer specificity on their physiological functions and can be targeted for subfamily-specific drug design. However, drug discovery efforts have not focused much on Rab allosteric non-nucleotide binding sites. These sites are under less evolutionary pressure to be conserved and hence are likely to offer subfamily specificity and to be less prone to undesirable off-target interactions and side effects. To discover druggable allosteric binding sites, Rab structural dynamics must first be incorporated using multiple experimentally and computationally obtained structures. The high-dimensional structural data may necessitate feature extraction methods to identify a manageable set of representative structures for subsequent analyses. We detail state-of-the-art computational methods to (i) identify binding sites using data on sequence, shape, energy, etc., (ii) determine the allosteric nature of these binding sites based on structural ensembles, residue networks and correlated motions and (iii) identify small molecule binders through structure- and ligand-based virtual screening. To benefit future studies targeting Rab allosteric sites, we herein detail a refined workflow comprising multiple available computational methods, which have been used successfully alone or in combination. This workflow is also applicable to drug discovery efforts targeting other medically important proteins. Depending on the structural dynamics of the proteins of interest, researchers can select suitable strategies for allosteric drug discovery and design from the computational methods and tools listed in the workflow.
Affiliation(s)
- Ammu Prasanna Kumar
- Department of Chemistry, College of Arts and Sciences, Khalifa University, Abu Dhabi, United Arab Emirates
- Research Unit in Bioinformatics, Department of Biochemistry and Microbiology, Rhodes University, South Africa
- Chandra S Verma
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore
- Department of Biological Sciences, National University of Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore
- Suryani Lukman
- Department of Chemistry, College of Arts and Sciences, Khalifa University, Abu Dhabi, United Arab Emirates
9
Schaeffer RD, Kinch L, Medvedev KE, Pei J, Cheng H, Grishin N. ECOD: identification of distant homology among multidomain and transmembrane domain proteins. BMC Mol Cell Biol 2019; 20:18. [PMID: 31226926 PMCID: PMC6588880 DOI: 10.1186/s12860-019-0204-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/23/2019] [Accepted: 06/02/2019] [Indexed: 12/03/2022]
Abstract
The manual classification of protein domains is approaching its 20th anniversary. ECOD is our mixed manual-automatic domain classification. Over time, the types of proteins that require manual curation have changed. Depositions with complex multidomain and multichain arrangements are commonplace, and transmembrane domains are regularly classified. Repeatedly, domains that are initially believed to be novel are found to have homologous links to existing classified domains. Here we present a brief summary of recent manual curation efforts in ECOD, combined with specific case studies of transmembrane and multidomain proteins in which manual curation was useful for discovering new homologous relationships. We present a new taxonomy for the classification of ABC transporter transmembrane domains. We examine alternate topologies of the leucine-specific (LS) domain of leucyl-tRNA synthetase. Finally, we elaborate on a distant homologous link between two helical dimerization domains.
Affiliation(s)
- R Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390-9050, USA
- Lisa Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390-9050, USA
- Kirill E Medvedev
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390-9050, USA
- Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390-9050, USA
- Hua Cheng
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390-9050, USA
- Nick Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390-9050, USA
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390-9050, USA
10
Song K, Zhang J, Lu S. Progress in Allosteric Database. Adv Exp Med Biol 2019; 1163:65-87. [PMID: 31707700 DOI: 10.1007/978-981-13-8719-7_4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
Abstract
An allosteric mechanism refers to the biological regulation process wherein macromolecules propagate the effect of ligand binding at one site to a spatially distant orthosteric locus, thus affecting activity. The theory has remained a trending topic in biology research for over 50 years, since understanding allostery is fundamental to explaining numerous biological processes and to developing new drug therapies. In the past two decades, the allosteric paradigm has evolved into more descriptive models, with ever-expanding amounts of experimental data on newly identified allosteric molecules. Since 2009, the AlloSteric Database (ASD, accessible at http://mdl.shsmu.edu.cn/ASD), a comprehensive knowledge repository, has provided the public with integrated information on allosteric proteins, modulators, sites, pathways, and networks for investigating allostery. In this chapter, we introduce the history and usage of the ASD and highlight specific applications that have benefited from it.
Affiliation(s)
- Kun Song
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
- Jian Zhang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
- Shaoyong Lu
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
11
Shanthirabalan S, Chomilier J, Carpentier M. Structural effects of point mutations in proteins. Proteins 2018; 86:853-867. [DOI: 10.1002/prot.25499] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/28/2017] [Revised: 03/19/2018] [Accepted: 03/20/2018] [Indexed: 12/21/2022]
Affiliation(s)
- Suvethigaa Shanthirabalan
- Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE; Paris France
- Mathilde Carpentier
- Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE; Paris France
- Sorbonne Université, CNRS, MNHN, IRD, IMPMC, BiBiP; Paris France
12
Abstract
Comparing and classifying protein domain interactions according to their three-dimensional (3D) structures can help to understand protein structure-function and evolutionary relationships. Additionally, structural knowledge of existing domain-domain interactions can provide a useful way to find structural templates with which to model the 3D structures of unsolved protein complexes. Here we present a straightforward guide to using the "Kbdock" protein domain structure database and its associated web site for exploring and comparing protein domain-domain interactions (DDIs) and domain-peptide interactions (DPIs) at the Pfam domain family level. We also briefly explain how the Kbdock web site works, and we provide some notes and suggestions which should help to avoid some common pitfalls when working with 3D protein domain structures.
13
NSiteMatch: Prediction of Binding Sites of Nucleotides by Identifying the Structure Similarity of Local Surface Patches. Comput Math Methods Med 2017; 2017:5471607. [PMID: 28811833 PMCID: PMC5547728 DOI: 10.1155/2017/5471607] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/04/2017] [Accepted: 06/14/2017] [Indexed: 12/01/2022]
Abstract
Nucleotides play a central role in metabolism by interacting with proteins and mediating their function. Nucleotides are estimated to constitute about 15% of the biologically relevant ligands in the PDB. Predicting nucleotide-binding sites is useful for understanding protein function and can facilitate in silico drug design. In this study, we propose a nucleotide-binding site predictor, NSiteMatch. The NSiteMatch algorithm integrates three different strategies: geometrical analysis, energy calculation, and template comparison. A traditional template-based predictor identifies global similarity between the target structure and a template. NSiteMatch instead considers the local similarity between a surface patch of the target protein and the binding sites of templates. As a result, NSiteMatch identifies more templates than traditional template-based predictors. NSiteMatch is compared with three representative methods, Findsite, Q-SiteFinder, and MetaPocket. An extensive evaluation demonstrates that NSiteMatch achieves higher success rates than all three in predicting the binding sites of ATP, ADP, and AMP.
14
Yoneda JS, Miles AJ, Araujo APU, Wallace BA. Differential dehydration effects on globular proteins and intrinsically disordered proteins during film formation. Protein Sci 2017; 26:718-726. [PMID: 28097742 PMCID: PMC5368061 DOI: 10.1002/pro.3118] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/14/2016] [Revised: 01/06/2017] [Accepted: 01/09/2017] [Indexed: 12/22/2022]
Abstract
Globular proteins composed of different secondary structures and fold types were examined by synchrotron radiation circular dichroism spectroscopy to determine the effects of dehydration on their secondary structures. They exhibited only minor changes upon removal of bulk water during film formation, in contrast to previously reported studies of proteins dehydrated by lyophilization (where substantial loss of helical structure and gain in sheet structure were detected). This near lack of conformational change in globular proteins contrasts with intrinsically disordered proteins (IDPs) dried in the same manner. The IDPs, which have almost completely unordered structures in solution, exhibited increased amounts of regular (mostly helical) secondary structure when dehydrated. This suggests the formation of new intra-protein hydrogen bonds replacing solvent-protein hydrogen bonds, in a process that may mimic interactions that occur when IDPs bind to partner molecules. This study has thus shown that the secondary structures of globular and intrinsically disordered proteins behave very differently upon dehydration, and that films are a potentially useful format for examining dehydrated soluble proteins and assessing IDP structures.
Affiliation(s)
- Juliana Sakamoto Yoneda
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, UK
- Instituto de Física de São Carlos, Universidade de São Paulo, São Carlos, Brazil
- Andrew J Miles
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, UK
- B A Wallace
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, UK
15
Mavridis L, Janes RW. PDB2CD: a web-based application for the generation of circular dichroism spectra from protein atomic coordinates. Bioinformatics 2016; 33:56-63. [PMID: 27651482 PMCID: PMC5408769 DOI: 10.1093/bioinformatics/btw554] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 04/12/2016] [Revised: 08/13/2016] [Accepted: 08/21/2016] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Circular dichroism (CD) spectroscopy is extensively used to determine the percentages of secondary structure content in proteins. However, although secondary structure is a large contributor, it is not the only factor that influences the shape and magnitude of a CD spectrum. Other structural features also contribute, so the entire conformation of a protein gives rise to its spectrum. An application capable of generating protein CD spectra from atomic coordinates is therefore needed, but no empirically derived method to do this currently exists.
RESULTS: PDB2CD has been created as an empirical approach to generating protein CD spectra from atomic coordinates. The method uses a combination of structural features of a protein's conformation. These include not only its percentage secondary structure content, but also the juxtaposition of these structural components relative to one another. They also include the overall structural similarity of the query protein to proteins in our dataset, the SP175 dataset, the 'gold standard' set obtained from the Protein Circular Dichroism Data Bank (PCDDB). A significant number of the CD spectra associated with the 71 proteins in this dataset were produced with excellent accuracy using a leave-one-out cross-validation process. The method also creates spectra in good agreement with those of a test set of 14 proteins from the PCDDB. The PDB2CD package provides a web-based, user-friendly way for researchers to produce CD spectra from protein atomic coordinates.
AVAILABILITY AND IMPLEMENTATION: http://pdb2cd.cryst.bbk.ac.uk
CONTACT: r.w.janes@qmul.ac.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lazaros Mavridis
- School of Biological and Chemical Sciences, Queen Mary University of London, London, E1 4NS, UK
- Robert W Janes
- School of Biological and Chemical Sciences, Queen Mary University of London, London, E1 4NS, UK
16
Nguyen MN, Sim AYL, Wan Y, Madhusudhan MS, Verma C. Topology independent comparison of RNA 3D structures using the CLICK algorithm. Nucleic Acids Res 2016; 45:e5. [PMID: 27634929 PMCID: PMC5741206 DOI: 10.1093/nar/gkw819] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/21/2015] [Revised: 09/01/2016] [Accepted: 09/02/2016] [Indexed: 01/15/2023]
Abstract
RNA molecules are attractive therapeutic targets because non-coding RNA molecules have increasingly been found to play key regulatory roles in the cell. Comparing and classifying RNA 3D structures yields unique insights into RNA evolution and function. With the rapid increase in the number of atomic-resolution RNA structures, it is crucial to have effective tools to classify RNA structures and to investigate them for structural similarities at different resolutions. We previously developed the algorithm CLICK to superimpose a pair of protein 3D structures by clique matching and 3D least squares fitting. In this study, we extend and optimize the CLICK algorithm, as Rclick, to superimpose pairs of RNA 3D structures and RNA-protein complexes, independent of the associated topologies. Benchmarking Rclick on four different datasets showed that it is either comparable to or better than other structural alignment methods in terms of the extent of structural overlaps. Rclick also recognizes conformational changes between RNA structures and produces complementary alignments to maximize the extent of detectable similarity. Applying Rclick to Ribonuclease III correctly aligned the RNA binding sites of RNase III with its substrate. Rclick can be further extended to identify ligand-binding pockets in RNA. A web server is available at http://mspc.bii.a-star.edu.sg/minhn/rclick.html.
Affiliation(s)
- Minh N Nguyen
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671
- Adelene Y L Sim
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671
- Yue Wan
- Genome Institute of Singapore, 60 Biopolis Street, Genome, #02-01, Singapore 138672
- M S Madhusudhan
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671; Indian Institute of Science Education and Research, Pune, India
- Chandra Verma
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671; Department of Biological Sciences, National University of Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore

17
Making sense of genomes of parasitic worms: Tackling bioinformatic challenges. Biotechnol Adv 2016; 34:663-686. [DOI: 10.1016/j.biotechadv.2016.03.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 11/05/2015] [Revised: 02/25/2016] [Accepted: 03/01/2016] [Indexed: 01/25/2023]

18
Paz I, Kligun E, Bengad B, Mandel-Gutfreund Y. BindUP: a web server for non-homology-based prediction of DNA and RNA binding proteins. Nucleic Acids Res 2016; 44:W568-74. [PMID: 27198220 PMCID: PMC4987955 DOI: 10.1093/nar/gkw454] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 03/01/2016] [Accepted: 05/11/2016] [Indexed: 12/12/2022]
Abstract
Gene expression is a multi-step process involving many layers of regulation. The main regulators of the pathway are DNA and RNA binding proteins. Although a large number of DNA and RNA binding proteins have been identified and extensively studied over the years, it is still expected that many other proteins, some of which already have another known function, await discovery. Here we present a new web server, BindUP, freely accessible through the website http://bindup.technion.ac.il/, for predicting DNA and RNA binding proteins using a non-homology-based approach. Our method is based on the electrostatic features of the protein surface and other general properties of the protein. BindUP predicts nucleic acid binding function given the protein's three-dimensional structure or a structural model. Additionally, BindUP provides information on the largest electrostatic surface patches, visualized on the server. The server was tested on several datasets of DNA and RNA binding proteins, including proteins which do not possess DNA or RNA binding domains and have no similarity to known nucleic acid binding proteins, achieving very high accuracy. BindUP is applicable in either single or batch mode and can be applied for testing hundreds of proteins simultaneously in a highly efficient manner.
Affiliation(s)
- Inbal Paz
- Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Efrat Kligun
- Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Barak Bengad
- Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Yael Mandel-Gutfreund
- Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel

19
Ramírez-Sarmiento CA, Baez M, Zamora RA, Balasubramaniam D, Babul J, Komives EA, Guixé V. The folding unit of phosphofructokinase-2 as defined by the biophysical properties of a monomeric mutant. Biophys J 2016; 108:2350-61. [PMID: 25954892 DOI: 10.1016/j.bpj.2015.04.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/20/2015] [Revised: 04/01/2015] [Accepted: 04/02/2015] [Indexed: 10/23/2022]
Abstract
Escherichia coli phosphofructokinase-2 (Pfk-2) is an obligate homodimer that follows a highly cooperative three-state folding mechanism N2 ↔ 2I ↔ 2U. The strong coupling between dissociation and unfolding is a consequence of the structural features of its interface: a bimolecular domain formed by intertwining of the small domain of each subunit into a flattened β-barrel. Although isolated monomers of E. coli Pfk-2 have been observed upon modification of the environment (changes in temperature, addition of chaotropic agents), no isolated subunits have been obtained under native conditions. Based on in silico estimations of the change in free energy and the local energetic frustration upon binding, we engineered a single-point mutant to destabilize the interface of Pfk-2. This mutant, L93A, is an inactive monomer at protein concentrations below 30 μM, as determined by analytical ultracentrifugation, dynamic light scattering, size exclusion chromatography, small-angle X-ray scattering, and enzyme kinetics. Active dimer formation can be induced by increasing the protein concentration and by addition of its substrate fructose-6-phosphate (F6P). Chemical and thermal unfolding of the L93A monomer followed by circular dichroism and dynamic light scattering suggests that it unfolds noncooperatively and that the isolated subunit is partially unstructured and marginally stable. The detailed structural features of the L93A monomer and the F6P-induced dimer were ascertained by high-resolution hydrogen/deuterium exchange mass spectrometry. Our results show that the isolated subunit has overall higher solvent accessibility than the native dimer, with the exception of residues 240-309. These residues correspond to most of the β-meander module and show the same extent of deuterium uptake as the native dimer.
Our results support the idea that the hydrophobic core of the isolated monomer of Pfk-2 is solvent-penetrated in native conditions and that the β-meander module is not affected by monomerizing mutations.
Affiliation(s)
- Mauricio Baez
- Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Casilla 233, Santiago, Chile
- Ricardo A Zamora
- Departamento de Biología, Facultad de Ciencias, Universidad de Chile, Casilla 653, Santiago, Chile
- Deepa Balasubramaniam
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California
- Jorge Babul
- Departamento de Biología, Facultad de Ciencias, Universidad de Chile, Casilla 653, Santiago, Chile
- Elizabeth A Komives
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California
- Victoria Guixé
- Departamento de Biología, Facultad de Ciencias, Universidad de Chile, Casilla 653, Santiago, Chile

20
Chen ASY, Westwood NJ, Brear P, Rogers GW, Mavridis L, Mitchell JBO. A Random Forest Model for Predicting Allosteric and Functional Sites on Proteins. Mol Inform 2016; 35:125-35. [DOI: 10.1002/minf.201500108] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 08/28/2015] [Accepted: 12/28/2015] [Indexed: 01/17/2023]

21
Abstract
Soluble N-ethylmaleimide-sensitive factor attachment protein receptor (SNARE) proteins constitute the core membrane fusion machinery of intracellular transport and intercellular communication. A little more than ten years ago, it was proposed that the long N-terminal domain of a subset of SNAREs, henceforth called the longin domain, could be a crucial regulator with multiple functions in membrane trafficking. Structural, biochemical and cell biology studies have now produced a large set of data that support this hypothesis and indicate a role for the longin domain in regulating the sorting and activity of SNAREs. Here, we review the first decade of structure-function data on the three prototypical longin SNAREs: Ykt6, VAMP7 and Sec22b. We will, in particular, highlight the conserved molecular mechanisms that allow longin domains to fold back onto the fusion-inducing SNARE coiled-coil domain, thereby inhibiting membrane fusion, and describe the interactions of longin SNAREs with proteins that regulate their intracellular sorting. This dual function of the longin domain in regulating both the membrane localization and membrane fusion activity of SNAREs points to its role as a key regulatory module of intracellular trafficking.
Affiliation(s)
- Frédéric Daste
- Université Paris Diderot, Sorbonne Paris Cité, Institut Jacques Monod, CNRS UMR 7592, Membrane Traffic in Health & Disease, INSERM ERL U950, Paris F-75013, France
- Thierry Galli
- Université Paris Diderot, Sorbonne Paris Cité, Institut Jacques Monod, CNRS UMR 7592, Membrane Traffic in Health & Disease, INSERM ERL U950, Paris F-75013, France
- David Tareste
- Université Paris Diderot, Sorbonne Paris Cité, Institut Jacques Monod, CNRS UMR 7592, Membrane Traffic in Health & Disease, INSERM ERL U950, Paris F-75013, France

22
Fox NK, Brenner SE, Chandonia JM. The value of protein structure classification information-Surveying the scientific literature. Proteins 2015; 83:2025-38. [PMID: 26313554 PMCID: PMC4609302 DOI: 10.1002/prot.24915] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/08/2015] [Revised: 08/06/2015] [Accepted: 08/18/2015] [Indexed: 11/08/2022]
Abstract
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein or a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
Affiliation(s)
- Naomi K Fox
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
- Steven E Brenner
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720; Department of Plant and Microbial Biology, University of California, Berkeley, California, 94720
- John-Marc Chandonia
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720

23
Kelley LA, Sternberg MJE. Partial protein domains: evolutionary insights and bioinformatics challenges. Genome Biol 2015; 16:100. [PMID: 25986583 PMCID: PMC4436111 DOI: 10.1186/s13059-015-0663-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
Protein domains are generally thought to correspond to units of evolution. New research raises questions about how such domains are defined with bioinformatics tools and sheds light on how evolution has enabled partial domains to be viable.
Affiliation(s)
- Lawrence A Kelley
- Structural Bioinformatics Group, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.
- Michael J E Sternberg
- Structural Bioinformatics Group, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.

24
Banach M, Prudhomme N, Carpentier M, Duprat E, Papandreou N, Kalinowska B, Chomilier J, Roterman I. Contribution to the prediction of the fold code: application to immunoglobulin and flavodoxin cases. PLoS One 2015; 10:e0125098. [PMID: 25915049 PMCID: PMC4411048 DOI: 10.1371/journal.pone.0125098] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/16/2014] [Accepted: 03/20/2015] [Indexed: 12/19/2022]
Abstract
Background: Formation of the folding nucleus of globular proteins starts with the mutual interaction of a group of hydrophobic amino acids whose close contacts allow the subsequent formation and stability of the 3D structure. These early steps can be predicted by simulating the folding process with a Monte Carlo (MC) coarse-grained model in a discrete space. We previously defined MIRs (Most Interacting Residues) as the set of residues presenting a large number of non-covalent neighbour interactions during such a simulation. MIRs are good candidates to define the minimal number of residues giving rise to a given fold instead of another one, although their proportion is rather high, typically 15-20% of the sequence. Having in mind experiments with two sequences of very high sequence identity (up to 90%) but different folds, we combined the MIR method, which takes the sequence as its single input, with the “fuzzy oil drop” (FOD) model, which requires a 3D structure, in order to estimate the residues coding for the fold. FOD assumes that a globular protein follows an idealised 3D Gaussian distribution of hydrophobicity density, with the maximum in the centre and minima at the surface of the “drop”. If the actual local density of hydrophobicity around a given amino acid is as high as the ideal one, then this amino acid is assigned to the core of the globular protein, and it is assumed to follow the FOD model. One therefore obtains a distribution of the amino acids of a protein according to their agreement or disagreement with the FOD model. Results: We compared and combined the MIR and FOD methods to define the minimal nucleus, or keystone, of two populated folds: immunoglobulin-like (Ig) and flavodoxins (Flav). The combination of these two approaches defines some positions both predicted as a MIR and assigned as accordant with the FOD model. It is shown here that for these two folds, the intersection of the predicted sets of residues significantly differs from random selection.
It reduces the number of selected residues by each individual method and allows a reasonable agreement with experimentally determined key residues coding for the particular fold. In addition, the intersection of the two methods significantly increases the specificity of the prediction, providing a robust set of residues that constitute the folding nucleus.
Affiliation(s)
- Mateusz Banach
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Krakow, Poland
- Nicolas Prudhomme
- Protein Structure Prediction group, IMPMC, UPMC & CNRS, Paris, France
- Mathilde Carpentier
- Protein Structure Prediction group, IMPMC, UPMC & CNRS, Paris, France
- RPBS, 35 rue Hélène Brion, 75013, Paris, France
- Elodie Duprat
- Protein Structure Prediction group, IMPMC, UPMC & CNRS, Paris, France
- RPBS, 35 rue Hélène Brion, 75013, Paris, France
- Nikolaos Papandreou
- Genetics Department, Agricultural University of Athens, Iera Odos 75, Athens, Greece
- Barbara Kalinowska
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Krakow, Poland
- Jacques Chomilier
- Protein Structure Prediction group, IMPMC, UPMC & CNRS, Paris, France
- RPBS, 35 rue Hélène Brion, 75013, Paris, France
- Irena Roterman
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Krakow, Poland

25
A structure-based classification and analysis of protein domain family binding sites and their interactions. Biology 2015; 4:327-43. [PMID: 25860777 PMCID: PMC4498303 DOI: 10.3390/biology4020327] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/16/2015] [Revised: 03/24/2015] [Accepted: 03/31/2015] [Indexed: 11/29/2022]
Abstract
While the number of solved 3D protein structures continues to grow rapidly, the structural rules that distinguish protein-protein interactions between different structural families are still not clear. Here, we classify and analyse the secondary structural features and promiscuity of a comprehensive non-redundant set of domain family binding sites (DFBSs) and hetero domain-domain interactions (DDIs) extracted from our updated KBDOCK resource. We have partitioned 4001 DFBSs into five classes using their propensities for three types of secondary structural elements (“α” for helices, “β” for strands, and “γ” for irregular structure) and we have analysed how frequently these classes occur in DDIs. Our results show that β elements are not highly represented in DFBSs compared to α and γ elements. At the DDI level, all classes of binding sites tend to preferentially bind to the same class of binding sites and α/β contacts are significantly disfavored. Very few DFBSs are promiscuous: 80% of them interact with just one Pfam domain. About 50% of our Pfam domains bear only one single-partner DFBS and are therefore monogamous in their interactions with other domains. Conversely, promiscuous Pfam domains bear several DFBSs among which one or two are promiscuous, thereby multiplying the promiscuity of the concerned protein.

26
Computational tools for epitope vaccine design and evaluation. Curr Opin Virol 2015; 11:103-12. [PMID: 25837467 DOI: 10.1016/j.coviro.2015.03.013] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 10/26/2014] [Revised: 03/13/2015] [Accepted: 03/16/2015] [Indexed: 12/15/2022]
Abstract
Rational approaches will be required to develop universal vaccines for viral pathogens such as human immunodeficiency virus, hepatitis C virus, and influenza, for which empirical approaches have failed. The main objective of a rational vaccine strategy is to design novel immunogens that are capable of inducing long-term protective immunity. In practice, this requires structure-based engineering of the target neutralizing epitopes and a quantitative readout of vaccine-induced immune responses. Therefore, computational tools that can facilitate these two areas have played increasingly important roles in rational vaccine design in recent years. Here we review the computational techniques developed for protein structure prediction and antibody repertoire analysis, and demonstrate how they can be applied to the design and evaluation of epitope vaccines.

27
Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem Soc Rev 2015; 44:1172-239. [PMID: 25503938 PMCID: PMC4349129 DOI: 10.1039/c4cs00351a] [Citation(s) in RCA: 256] [Impact Index Per Article: 28.4] [Received: 10/20/2014] [Indexed: 12/21/2022]
Abstract
The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. 
Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.
Affiliation(s)
- Andrew Currin
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Neil Swainston
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- School of Computer Science, The University of Manchester, Manchester M13 9PL, UK
- Philip J. Day
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Faculty of Medical and Human Sciences, The University of Manchester, Manchester M13 9PT, UK
- Douglas B. Kell
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
- School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK

28
Rappoport N, Stern A, Linial N, Linial M. Entropy-driven partitioning of the hierarchical protein space. Bioinformatics 2015; 30:i624-30. [PMID: 25161256 PMCID: PMC4147929 DOI: 10.1093/bioinformatics/btu478] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 02/03/2023]
Abstract
Motivation: Modern protein sequencing techniques have led to the determination of >50 million protein sequences. ProtoNet is a clustering system that provides a continuous hierarchical agglomerative clustering tree for all proteins. While ProtoNet performs unsupervised classification of all included proteins, finding an optimal level of granularity for the purpose of focusing on protein functional groups remains elusive. Here, we ask whether knowledge-based annotations on protein families can support automatic unsupervised methods for identifying high-quality protein families. We present a method that yields, within the ProtoNet hierarchy, an optimal partition of clusters relative to manual annotation schemes. The method’s principle is to minimize the entropy-derived distance between annotation-based partitions and all available hierarchical partitions. We describe the best front (BF) partition of 2 478 328 proteins from UniRef50. Of 4 929 553 ProtoNet tree clusters, the BF based on Pfam annotations contains 26 891 clusters. The high quality of the partition is validated by its close correspondence with the set of clusters that best describe thousands of Pfam keywords. The BF is shown to be superior to a naïve cut in the ProtoNet tree that yields a similar number of clusters. Finally, we used parameters intrinsic to the clustering process to enrich a priori the BF’s clusters. We present the entropy-based method’s benefit in overcoming the unavoidable limitations of nested clusters in ProtoNet. We suggest that this automatic information-based cluster selection can be useful for other large-scale annotation schemes, as well as for systematically testing and comparing putative families derived from alternative clustering methods. Availability and implementation: A catalog of BF clusters for thousands of Pfam keywords is provided at http://protonet.cs.huji.ac.il/bestFront/ Contact: michall@cc.huji.ac.il
Affiliation(s)
- Nadav Rappoport
- School of Computer Science and Engineering and Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, 91904, Israel
- Amos Stern
- School of Computer Science and Engineering and Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, 91904, Israel
- Nathan Linial
- School of Computer Science and Engineering and Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, 91904, Israel
- Michal Linial
- School of Computer Science and Engineering and Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, 91904, Israel

29
Abstract
A key reason three-dimensional (3-D) protein structures are annotated with supporting or derived information is to understand the molecular basis of protein function. To this end, protein structure annotation databases curate key facts and observations, based on community-accepted standards, about the ~100,000 3-D experimental protein structures to date. This review will introduce the primary structure repositories, databases, and value-added structural annotation databases, as well as the range of information they provide. The different levels of annotation data (primary vs. derived vs. inferred) and how they should all be considered accordingly will also be described.
Affiliation(s)
- Margaret J. Gabanyi
- Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Helen M. Berman
- Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA

30

31
Hepburn L, Prajsnar TK, Klapholz C, Moreno P, Loynes CA, Ogryzko NV, Brown K, Schiebler M, Hegyi K, Antrobus R, Hammond KL, Connolly J, Ochoa B, Bryant C, Otto M, Surewaard B, Seneviratne SL, Grogono DM, Cachat J, Ny T, Kaser A, Török ME, Peacock SJ, Holden M, Blundell T, Wang L, Ligoxygakis P, Minichiello L, Woods CG, Foster SJ, Renshaw SA, Floto RA. Innate immunity. A Spaetzle-like role for nerve growth factor β in vertebrate immunity to Staphylococcus aureus. Science 2014; 346:641-646. [PMID: 25359976 PMCID: PMC4255479 DOI: 10.1126/science.1258705] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Indexed: 12/11/2022]
Abstract
Many key components of innate immunity to infection are shared between Drosophila and humans. However, the fly Toll ligand Spaetzle is not thought to have a vertebrate equivalent. We have found that the structurally related cystine-knot protein, nerve growth factor β (NGFβ), plays an unexpected Spaetzle-like role in immunity to Staphylococcus aureus infection in chordates. Deleterious mutations of either human NGFβ or its high-affinity receptor tropomyosin-related kinase receptor A (TRKA) were associated with severe S. aureus infections. NGFβ was released by macrophages in response to S. aureus exoproteins through activation of the NOD-like receptors NLRP3 and NLRP4 and enhanced phagocytosis and superoxide-dependent killing, stimulated proinflammatory cytokine production, and promoted calcium-dependent neutrophil recruitment. TrkA knockdown in zebrafish increased susceptibility to S. aureus infection, confirming an evolutionarily conserved role for NGFβ-TRKA signaling in pathogen-specific host immunity.
Collapse
Affiliation(s)
- Lucy Hepburn
- Cambridge Institute for Medical Research, University of Cambridge, UK
- Department of Medicine, University of Cambridge, UK
- Tomasz K. Prajsnar
- Krebs Institute, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Department of Molecular Biology and Biotechnology, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Bateson Centre, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Catherine Klapholz
- Cambridge Institute for Medical Research, University of Cambridge, UK
- Department of Medicine, University of Cambridge, UK
- Pablo Moreno
- Cambridge Institute for Medical Research, University of Cambridge, UK
- Catherine A. Loynes
- Bateson Centre, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Department of Infection and Immunity, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Nikolay V. Ogryzko
- Bateson Centre, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Karen Brown
- Cambridge Institute for Medical Research, University of Cambridge, UK
- Department of Medicine, University of Cambridge, UK
- Cambridge Centre for Lung Infection, Papworth Hospital, Cambridge, UK
- Mark Schiebler
- Cambridge Institute for Medical Research, University of Cambridge, UK
- Department of Medicine, University of Cambridge, UK
- Krisztina Hegyi
- Cambridge Institute for Medical Research, University of Cambridge, UK
- Department of Medicine, University of Cambridge, UK
- Robin Antrobus
- Cambridge Institute for Medical Research, University of Cambridge, UK
- Katherine L. Hammond
- Bateson Centre, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Department of Infection and Immunity, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- John Connolly
- Krebs Institute, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Department of Molecular Biology and Biotechnology, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Clare Bryant
- Department of Veterinary Medicine, University of Cambridge, UK
- Michael Otto
- Laboratory of Human Bacterial Pathogenesis, NIAID, NIH, Bethesda, USA
- Bas Surewaard
- Department of Medical Microbiology, University Medical Centre, Utrecht, Netherlands
- Dorothy M. Grogono
- Department of Medicine, University of Cambridge, UK
- Cambridge Centre for Lung Infection, Papworth Hospital, Cambridge, UK
- Julien Cachat
- Department of Pathology and Immunology, Geneva University, Switzerland
- Tor Ny
- Department of Medical Biochemistry and Biophysics, Umeå University, Sweden
- Arthur Kaser
- Department of Medicine, University of Cambridge, UK
- Sharon J. Peacock
- Department of Medicine, University of Cambridge, UK
- Wellcome Trust Sanger Institute, Hinxton, UK
- Tom Blundell
- Department of Biochemistry, University of Cambridge, UK
- Lihui Wang
- Biochemistry Department, Oxford University, UK
- C. Geoff Woods
- Cambridge Institute for Medical Research, University of Cambridge, UK
- Department of Medical Genetics, University of Cambridge, UK
- Simon J. Foster
- Krebs Institute, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Department of Molecular Biology and Biotechnology, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Stephen A. Renshaw
- Krebs Institute, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Bateson Centre, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Department of Infection and Immunity, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- R. Andres Floto
- Cambridge Institute for Medical Research, University of Cambridge, UK
- Department of Medicine, University of Cambridge, UK
- Cambridge Centre for Lung Infection, Papworth Hospital, Cambridge, UK
32
Berman HM, Kleywegt GJ, Nakamura H, Markley JL. The Protein Data Bank archive as an open data resource. J Comput Aided Mol Des 2014; 28:1009-14. [PMID: 25062767 PMCID: PMC4196035 DOI: 10.1007/s10822-014-9770-y] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Received: 04/30/2014] [Accepted: 06/23/2014] [Indexed: 02/08/2023]
Abstract
The Protein Data Bank archive was established in 1971, and recently celebrated its 40th anniversary (Berman et al. in Structure 20:391, 2012). An analysis of interrelationships of the science, technology and community leads to further insights into how this resource evolved into one of the oldest and most widely used open-access data resources in biology.
Affiliation(s)
- Helen M Berman
- RCSB PDB, Department of Chemistry and Chemical Biology and Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
33
Ma J, Ma Z, Kang B, Lu K. A method of protein model classification and retrieval using bag-of-visual-features. Comput Math Methods Med 2014; 2014:269394. [PMID: 25258644 PMCID: PMC4165735 DOI: 10.1155/2014/269394] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/20/2014] [Accepted: 07/30/2014] [Indexed: 11/18/2022]
Abstract
In this paper we propose a novel visual method for protein model classification and retrieval. Unlike conventional methods, the proposed method extracts image features of proteins and measures the visual similarity between them. First, multiview images are captured from the vertices and planes of an octahedron surrounding the protein. Second, local features are extracted from each view with the SURF algorithm and vector-quantized into visual words using a visual codebook. Finally, the Kullback-Leibler divergence (KLD) is used to calculate the similarity distance between two feature vectors. Experimental results show that the proposed method performs encouragingly for protein retrieval and categorization in comparison with other methods.
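The final matching step of this bag-of-visual-features pipeline can be illustrated with a short Python sketch. It assumes that SURF extraction and codebook quantization have already produced one visual-word ID per local feature (not shown). The sketch pools the IDs from all views into a normalized histogram and compares two proteins with a symmetrised Kullback-Leibler divergence. The function names, the epsilon smoothing and the symmetrisation are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter

def bow_histogram(word_ids, vocab_size):
    """Normalized histogram of visual-word IDs (one bin per codebook word).
    Assumes the IDs were already produced by feature extraction + quantization."""
    counts = Counter(word_ids)
    total = len(word_ids)
    return [counts.get(w, 0) / total for w in range(vocab_size)]

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q). The small epsilon (an assumption) keeps empty bins from giving log(0)."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))

def symmetric_kld(p, q):
    """Symmetrised KL divergence, used here as a (non-metric) dissimilarity score."""
    return kl_divergence(p, q) + kl_divergence(q, p)

# Hypothetical visual-word IDs pooled over all views of two proteins (codebook of 4 words).
protein_a = bow_histogram([0, 0, 1, 2, 2, 2], vocab_size=4)
protein_b = bow_histogram([0, 1, 1, 3, 3, 2], vocab_size=4)
print(round(symmetric_kld(protein_a, protein_a), 6))  # 0.0 for identical histograms
print(symmetric_kld(protein_a, protein_b) > 0)        # True
```

Retrieval would then rank database proteins by increasing divergence from the query histogram.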
Affiliation(s)
- Jinlin Ma
- School of Information and Technology, Northwest University, Xi'an 710120, China
- School of Mathematics and Information Science, North University of Nationalities, Yinchuan 750021, China
- Ziping Ma
- School of Mathematics and Information Science, North University of Nationalities, Yinchuan 750021, China
- Baosheng Kang
- School of Information and Technology, Northwest University, Xi'an 710120, China
- Ke Lu
- College of Computing & Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
34
Loong BK, Knotts TA. Communication: Using multiple tethers to stabilize proteins on surfaces. J Chem Phys 2014; 141:051104. [DOI: 10.1063/1.4891971] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/19/2022]
Affiliation(s)
- Brandon K. Loong
- Department of Chemical Engineering, Brigham Young University, Provo, Utah 84602, USA
- Thomas A. Knotts
- Department of Chemical Engineering, Brigham Young University, Provo, Utah 84602, USA
35
Musayeva K, Henderson T, Mitchell JB, Mavridis L. PFClust: an optimised implementation of a parameter-free clustering algorithm. Source Code Biol Med 2014; 9:5. [PMID: 24490618 PMCID: PMC3940029 DOI: 10.1186/1751-0473-9-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/18/2013] [Accepted: 01/28/2014] [Indexed: 11/25/2022]
Abstract
Background: A well-known problem in cluster analysis is finding an optimal number of clusters that reflects the inherent structure of the data. Unlike many widely used clustering algorithms, PFClust is a partitioning-based clustering algorithm that can automatically propose an optimal number of clusters for the data. Results: Tests on various types of data showed that PFClust can discover clusters of arbitrary shapes, sizes and densities. The previous implementation of the algorithm had already been used successfully to cluster large macromolecular structures and small drug-like compounds. We have greatly improved the algorithm through a more efficient implementation, which enables PFClust to process large data sets acceptably fast. Conclusions: In this paper we present a new optimized implementation of the PFClust algorithm that runs considerably faster than the original.
Affiliation(s)
- Lazaros Mavridis
- EaStCHEM School of Chemistry and Biomedical Sciences Research Complex, University of St Andrews, North Haugh, St Andrews, Scotland KY16 9ST, UK.
36
De Franceschi N, Wild K, Schlacht A, Dacks JB, Sinning I, Filippini F. Longin and GAF domains: structural evolution and adaptation to the subcellular trafficking machinery. Traffic 2013; 15:104-21. [PMID: 24107188 DOI: 10.1111/tra.12124] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 04/16/2013] [Revised: 09/18/2013] [Accepted: 09/23/2013] [Indexed: 11/28/2022]
Abstract
Endomembrane trafficking is one of the most prominent cytological features of eukaryotes. Given their widespread distribution and specialization, coiled-coil domains, coatomer domains, small GTPases and Longin domains are considered primordial 'building blocks' of the membrane trafficking machineries. Longin domains are conserved across eukaryotes and were likely to be present in the Last Eukaryotic Common Ancestor. The Longin fold is based on the α-β-α sandwich architecture and a unique topology, possibly accounting for the special adaptation to the eukaryotic trafficking machinery. The ancient Per ARNT Sim (PAS) and cGMP-specific phosphodiesterases, Adenylyl cyclases and FhlA (GAF) family domains show a similar architecture, and the identification of prokaryotic counterparts of GAF domains involved in trafficking provides an additional connection for the endomembrane system back into the pre-eukaryotic world. Proteome-wide, comparative bioinformatic analyses of the domains reveal three binding regions (A, B and C) mediating either specific or conserved protein-protein interactions. While the A region mediates intra- and inter-molecular interactions, the B region is involved in binding small GTPases, thus providing an evolutionary connection among major building blocks in the endomembrane system. Finally, we propose that the peculiar interaction surface of the C region of the Longin domain allowed it to extensively integrate into the endomembrane trafficking machinery in the earliest stages of building the eukaryotic cell.
Affiliation(s)
- Nicola De Franceschi
- Molecular Biology and Bioinformatics Unit, Department of Biology, University of Padova, Padova, Italy; Current address: Centre for Biotechnology, University of Turku, Turku, Finland
37
38
Vance SJ, McDonald RE, Cooper A, Smith BO, Kennedy MW. The structure of latherin, a surfactant allergen protein from horse sweat and saliva. J R Soc Interface 2013; 10:20130453. [PMID: 23782536 PMCID: PMC4043175 DOI: 10.1098/rsif.2013.0453] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 05/17/2013] [Accepted: 05/29/2013] [Indexed: 12/30/2022]
Abstract
Latherin is a highly surface-active allergen protein found in the sweat and saliva of horses and other equids. Its surfactant activity is intrinsic to the protein in its native form, and is manifest without associated lipids or glycosylation. Latherin probably functions as a wetting agent in evaporative cooling in horses, but it may also assist in mastication of fibrous food as well as inhibition of microbial biofilms. It is a member of the PLUNC family of proteins abundant in the oral cavity and saliva of mammals, one of which has also been shown to be a surfactant and capable of disrupting microbial biofilms. How these proteins work as surfactants while remaining soluble and cell membrane-compatible is not known. Nor have their structures previously been reported. We have used protein nuclear magnetic resonance spectroscopy to determine the conformation and dynamics of latherin in aqueous solution. The protein is a monomer in solution with a slightly curved cylindrical structure exhibiting a 'super-roll' motif comprising a four-stranded anti-parallel β-sheet and two opposing α-helices which twist along the long axis of the cylinder. One end of the molecule has prominent, flexible loops that contain a number of apolar amino acid side chains. This, together with previous biophysical observations, leads us to a plausible mechanism for surfactant activity in which the molecule is first localized to the non-polar interface via these loops, and then unfolds and flattens to expose its hydrophobic interior to the air or non-polar surface. Intrinsically surface-active proteins are relatively rare in nature, and this is the first structure of such a protein from mammals to be reported. Both its conformation and proposed method of action are different from other, non-mammalian surfactant proteins investigated so far.
Affiliation(s)
- Steven J. Vance
- School of Chemistry, University of Glasgow, Glasgow G12 8QQ, UK
- Rhona E. McDonald
- Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow G12 8QQ, UK
- Alan Cooper
- School of Chemistry, University of Glasgow, Glasgow G12 8QQ, UK
- Brian O. Smith
- Institute of Molecular, Cell and Systems Biology, University of Glasgow, Glasgow G12 8QQ, UK
- Malcolm W. Kennedy
- Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow G12 8QQ, UK
- Institute of Molecular, Cell and Systems Biology, University of Glasgow, Glasgow G12 8QQ, UK
39
Mavridis L, Nath N, Mitchell JBO. PFClust: a novel parameter free clustering algorithm. BMC Bioinformatics 2013; 14:213. [PMID: 23819480 PMCID: PMC3747858 DOI: 10.1186/1471-2105-14-213] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 01/29/2013] [Accepted: 07/01/2013] [Indexed: 12/02/2022]
Abstract
Background: We present the algorithm PFClust (Parameter Free Clustering), which can automatically cluster data and identify a suitable number of clusters without requiring the user to specify any parameters. The algorithm partitions a dataset into a number of clusters that share some common attributes, such as their minimum expectation value and variance of intra-cluster similarity. A set of n objects can be clustered into any number of clusters from one to n, and many different hierarchical and partitional, agglomerative and divisive clustering methodologies are available for doing this. Nonetheless, automatically determining the number of clusters present in a dataset is a significant challenge for clustering algorithms. Identifying a putative optimum number of clusters involves computing and evaluating a range of clusterings with different numbers of clusters. However, there is no agreed or unique definition of optimum in this context. Thus, we test PFClust on datasets for which an external gold standard of ‘correct’ cluster definitions exists, noting that this division into clusters may be suboptimal according to other reasonable criteria. PFClust is heuristic in the sense that it cannot be described in terms of optimising any single simply-expressed metric over the space of possible clusterings. Results: We first validate PFClust on a number of synthetic datasets consisting of 2D vectors, showing that its clustering performance is at least equal to that of six other leading methodologies, even though five of the other methods are told in advance how many clusters to use. We also demonstrate the ability of PFClust to classify the three-dimensional structures of protein domains, using a set of folds taken from the structural bioinformatics database CATH.
Conclusions: We show that PFClust clusters the test datasets slightly better, on average, than any of the other algorithms, and does so without the need to specify any external parameters. Results on the synthetic datasets demonstrate that PFClust generates meaningful clusters, and our algorithm also shows excellent agreement with the correct assignments for a dataset extracted from CATH, a partly manually curated classification of protein domain structures.
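The abstract characterizes clusters by the expectation value and variance of intra-cluster similarity. The Python sketch below computes only those two statistics for a given partition of a similarity matrix. It is not the PFClust algorithm itself, which also searches over candidate clusterings. The function name and toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance

def intra_cluster_stats(similarity, clusters):
    """Mean and population variance of pairwise similarities inside each cluster.

    similarity: symmetric matrix (list of lists) of pairwise similarities.
    clusters: list of lists of object indices forming a partition of the data.
    Singleton clusters have no pairs and are reported as None.
    """
    stats = []
    for members in clusters:
        pairs = [similarity[i][j] for i, j in combinations(members, 2)]
        stats.append((mean(pairs), pvariance(pairs)) if pairs else None)
    return stats

# Hypothetical similarity matrix for five objects forming two obvious groups.
S = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.85, 0.15, 0.1],
    [0.8, 0.85, 1.0, 0.2, 0.1],
    [0.1, 0.15, 0.2, 1.0, 0.95],
    [0.2, 0.1, 0.1, 0.95, 1.0],
]
good = intra_cluster_stats(S, [[0, 1, 2], [3, 4]])
bad = intra_cluster_stats(S, [[0, 1, 3], [2, 4]])
print(good[0][0] > bad[0][0])  # True: the natural grouping has higher mean intra-cluster similarity
```

A parameter-free method can use statistics like these to decide whether splitting or merging clusters improves a partition, but the exact acceptance rule PFClust applies is not reproduced here.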
Affiliation(s)
- Lazaros Mavridis
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, KY16 9ST, Scotland, UK.
40
Hugo W, Sung WK, Ng SK. Discovering interacting domains and motifs in protein-protein interactions. Methods Mol Biol 2013. [PMID: 23192537 DOI: 10.1007/978-1-62703-107-3_2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/13/2023]
Abstract
Many important biological processes, such as signaling pathways, require protein-protein interactions (PPIs) that are designed for fast response to stimuli. These interactions are usually transient, easily formed and disrupted, yet specific. Many of these transient interactions involve the binding of a protein domain to a short stretch (3-10 residues) of amino acids, which can be characterized by a sequence pattern, i.e., a short linear motif (SLiM). We call these interacting domains and motifs domain-SLiM interactions. Existing methods have focused on discovering SLiMs in the sequence data of interacting proteins. With the recent increase in the number of protein structures, we have a new opportunity to detect SLiMs directly from the proteins' 3D structures instead of their linear sequences. In this chapter, we describe a computational method called SLiMDIet to directly detect SLiMs on domain interfaces extracted from 3D structures of PPIs. SLiMDIet comprises two steps: (1) interaction interfaces belonging to the same domain are extracted and grouped together using structural clustering, and (2) the extracted interaction interfaces in each cluster are structurally aligned to extract the corresponding SLiM. Using SLiMDIet, de novo SLiMs interacting with protein domains can be computationally detected from structurally clustered domain-SLiM interactions for Pfam domains that have 3D structures available in the PDB.
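A SLiM described as a sequence pattern can be searched for with an ordinary regular expression. The sketch below assumes the widely cited PxxP core associated with SH3-domain binding, written as the regex P..P, and uses a made-up sequence. It illustrates only the sequence-pattern view of SLiMs. It does not implement the structure-based SLiMDIet method, which works from 3D interfaces.

```python
import re

def find_slim(sequence, pattern):
    """Return (0-based start, matched text) for every, possibly overlapping, match of a SLiM regex."""
    # Wrapping the pattern in a lookahead makes each match zero-width, so overlapping
    # occurrences are reported (e.g. "PPPPP" contains two P..P hits, at 0 and 1).
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({pattern}))", sequence)]

# "P..P" encodes the PxxP core; the sequence below is hypothetical.
hits = find_slim("MAPPLPRSPKAPEL", "P..P")
print(hits)  # [(2, 'PPLP'), (5, 'PRSP'), (8, 'PKAP')]
```

Pattern matching of this kind finds every sequence occurrence, including many that never bind a domain. This is one reason a structure-based approach that starts from observed interfaces is attractive.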
Affiliation(s)
- Willy Hugo
- School of Computing, National University of Singapore, Singapore, Singapore
41
Dai Q, Li Y, Liu X, Yao Y, Cao Y, He P. Comparison study on statistical features of predicted secondary structures for protein structural class prediction: From content to position. BMC Bioinformatics 2013; 14:152. [PMID: 23641706 PMCID: PMC3652764 DOI: 10.1186/1471-2105-14-152] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 11/26/2012] [Accepted: 04/03/2013] [Indexed: 11/10/2022]
Abstract
Background: Many content-based statistical features of predicted secondary structural elements (CBF-PSSEs) have been proposed and have achieved promising results in protein structural class prediction. Until now, however, the position distribution of the successive occurrences of an element in predicted secondary structure sequences has not been used, so appropriate position-based features of the secondary structural elements need to be extracted for the prediction task. Results: We proposed several position-based features of predicted secondary structural elements (PBF-PSSEs) and assessed their intrinsic ability relative to the available CBF-PSSEs. This not only offers a systematic and quantitative experimental assessment of these statistical features, but also naturally complements the available comparison of the CBF-PSSEs. We also analyzed the performance of the CBF-PSSEs combined with the PBF-PSSE and further constructed a new combined feature set, PBF11CBF-PSSE. Based on these experiments, we obtained new guidelines for the use of PBF-PSSEs and CBF-PSSEs. Conclusions: PBF-PSSEs and CBF-PSSEs have a substantial impact on protein structural class prediction. When combined with the PBF-PSSE, most of the CBF-PSSEs achieve much higher prediction accuracies, so the PBF-PSSEs and the CBF-PSSEs should be used together, as they make significant and complementary contributions to protein structural class prediction. In addition, the performance of the proposed PBF-PSSE is extremely sensitive to the choice of the parameter k. In summary, our quantitative analysis shows that exploiting the position information of predicted secondary structural elements is a promising way to improve protein structural class prediction.
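The following Python sketch makes the content-versus-position distinction concrete. It computes content-based features (fractions of H, E and C) and two simple position-based features from a predicted secondary-structure string. The position features used here are illustrative assumptions: the mean normalized start position of each element's segments and the mean spacing between successive segments. They are not the PBF-PSSE definitions from the paper, which depend on a parameter k.

```python
def content_features(ss):
    """Content-based features: fraction of residues predicted as helix (H), strand (E) or coil (C)."""
    n = len(ss)
    return {e: ss.count(e) / n for e in "HEC"}

def segment_starts(ss, element):
    """0-based start indices of the maximal runs of `element` in the predicted string."""
    return [i for i, c in enumerate(ss) if c == element and (i == 0 or ss[i - 1] != element)]

def position_features(ss):
    """Illustrative (hypothetical) position-based features for helices and strands:
    mean normalized segment start, and mean normalized spacing between successive starts."""
    n = len(ss)
    feats = {}
    for e in "HE":
        starts = segment_starts(ss, e)
        mean_pos = sum(starts) / (len(starts) * n) if starts else 0.0
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        mean_gap = sum(gaps) / (len(gaps) * n) if gaps else 0.0
        feats[e] = (mean_pos, mean_gap)
    return feats

# Two hypothetical predictions with identical composition but opposite H/E order.
alpha_first = "CHHHHCCEEEECC"
beta_first = "CEEEECCHHHHCC"
print(content_features(alpha_first) == content_features(beta_first))    # True
print(position_features(alpha_first) == position_features(beta_first))  # False
```

Content features alone cannot separate these two strings, while even crude position features can. This is the kind of complementary information the abstract argues for.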
Affiliation(s)
- Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Yan Li
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Xiaoqing Liu
- College of Science, Hangzhou Dianzi University, Hangzhou, 310018, China
- Yuhua Yao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Yunjie Cao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Pingan He
- College of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
42
Choi Y, Griswold KE, Bailey-Kellogg C. Structure-based redesign of proteins for minimal T-cell epitope content. J Comput Chem 2013; 34:879-91. [PMID: 23299435 PMCID: PMC3763725 DOI: 10.1002/jcc.23213] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 09/08/2012] [Revised: 11/16/2012] [Accepted: 11/28/2012] [Indexed: 12/31/2022]
Abstract
The protein universe displays a wealth of therapeutically relevant activities, but T-cell driven immune responses to non-"self" biological agents present a major impediment to harnessing the full diversity of these molecular functions. Mutagenic T-cell epitope deletion seeks to mitigate the immune response, but can typically address only a small number of epitopes. Here, we pursue a "bottom-up" approach that redesigns an entire protein to remain native-like but contain few if any immunogenic epitopes. We do so by extending the Rosetta flexible-backbone protein design software with an epitope scoring mechanism and appropriate constraints. The method is benchmarked with a diverse panel of proteins and applied to three targets of therapeutic interest. We show that the deimmunized designs indeed have minimal predicted epitope content and are native-like in terms of various quality measures, and moreover that they display levels of native sequence recovery comparable to those of non-deimmunized designs.
Affiliation(s)
- Yoonjoo Choi
- Department of Computer Science, Dartmouth College, New Hampshire 03755, USA
43
Rappoport N, Linial M. Functional inference by ProtoNet family tree: the uncharacterized proteome of Daphnia pulex. BMC Bioinformatics 2013; 14 Suppl 3:S11. [PMID: 23514195 PMCID: PMC3584848 DOI: 10.1186/1471-2105-14-s3-s11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
Abstract
Background: Daphnia pulex (water flea) is the first crustacean whose genome has been fully sequenced. Crustaceans and insects diverged from a common ancestor. D. pulex is a model organism for studying the molecular makeup that allows organisms to cope with environmental challenges. Its complete proteome contains 30,550 putative proteins, about 10,000 of which have no known homologues. Currently, UniProtKB reports 95% of Daphnia proteins as putative and uncharacterized. Results: We applied ProtoNet, an unsupervised hierarchical protein clustering method that covers about 10 million sequences, to the automatic annotation of the Daphnia proteome. Of the Daphnia full-length proteins, 98.7% (26,625) were successfully mapped to 13,880 ProtoNet stable clusters, and only 1.3% remained unmapped. We compared the properties of the Daphnia protein families with those of the mouse and fruit fly proteomes. Functional annotations were successfully assigned to 86% of the proteins. Most proteins (61%) were mapped to only 2,953 clusters that contain Daphnia duplicated genes. We focused on the functionality of maximally amplified paralogs. Cuticle structure components and a variety of ion-channel protein families were associated with a maximal level of gene amplification. We focused on gene amplification as a leading strategy of Daphnia in coping with environmental toxicity. Conclusions: Automatic inference is achieved by mapping sequences to the protein family tree of ProtoNet 6.0. Applying a careful inference protocol resulted in functional assignments for over 86% of the complete proteome. We conclude that the ProtoNet scaffold can be used as an alignment-free protocol for large-scale annotation of uncharacterized proteomes.
Affiliation(s)
- Nadav Rappoport
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
44
Sadowski MI. Prediction of protein domain boundaries from inverse covariances. Proteins 2013; 81:253-60. [PMID: 22987736 PMCID: PMC3563215 DOI: 10.1002/prot.24181] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 05/23/2012] [Revised: 08/10/2012] [Accepted: 09/04/2012] [Indexed: 01/04/2023]
Abstract
It has been known since relatively few structures had been solved that longer protein chains often contain multiple domains, which may fold separately and act as reusable functional modules found in many contexts. In many structural biology tasks, in particular structure prediction, it is very useful to be able to identify domains within the structure and analyze these regions separately. However, when using sequence data alone this task has proven exceptionally difficult, with relatively little improvement over the naive method of choosing boundaries based on size distributions of observed domains. The recent significant improvement in contact prediction provides a new source of information for domain prediction. We test several methods for using this information, including a kernel smoothing-based approach and methods based on building alpha-carbon models. We compare their performance with a length-based predictor, a homology search method and four published sequence-based predictors: DOMCUT, DomPRO, DLP-SVM, and SCOOBY-DOmain. We show that the kernel-smoothing method is significantly better than the other ab initio predictors when both single-domain and multidomain targets are considered, and is not significantly different from the homology-based method. Considering only multidomain targets, the kernel-smoothing method outperforms all of the published methods except DLP-SVM. The kernel-smoothing method therefore represents a potentially useful improvement to ab initio domain prediction.
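One generic way to turn predicted contacts into a domain-boundary call has three steps:

1. Count how many contacts each residue takes part in.
2. Smooth that profile with a Gaussian kernel.
3. Place the boundary at the smoothed minimum.

The Python sketch below follows that idea under stated assumptions: 0-based residue indices, plus a hypothetical bandwidth `sigma` and terminal `margin`. It is not the published kernel-smoothing method, whose details the abstract does not give.

```python
import math

def contact_profile(contacts, length):
    """Number of predicted contacts each residue takes part in (0-based residue indices)."""
    profile = [0.0] * length
    for i, j in contacts:
        profile[i] += 1
        profile[j] += 1
    return profile

def gaussian_smooth(values, sigma):
    """Gaussian-kernel weighted average of a 1-D profile (Nadaraya-Watson smoother)."""
    n = len(values)
    smoothed = []
    for x in range(n):
        weights = [math.exp(-0.5 * ((x - y) / sigma) ** 2) for y in range(n)]
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return smoothed

def predict_boundary(contacts, length, sigma=3.0, margin=20):
    """Residue with the lowest smoothed contact density, ignoring `margin` residues at each end.
    sigma and margin are hypothetical defaults, not values taken from the paper."""
    smoothed = gaussian_smooth(contact_profile(contacts, length), sigma)
    return min(range(margin, length - margin), key=lambda p: smoothed[p])

# Toy 100-residue chain: two "domains" (0-49 and 55-99) with only local i/i+4 contacts,
# joined by a contact-free linker at residues 50-54.
contacts = [(i, i + 4) for i in range(0, 46)] + [(i, i + 4) for i in range(55, 96)]
print(predict_boundary(contacts, 100))  # a residue inside the 50-54 linker
```

The terminal margin matters because residues near the chain ends always have few contacts. Without it, the minimum would often fall trivially at a terminus.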
Affiliation(s)
- Michael I Sadowski
- MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, United Kingdom.
45
Li J, Wu J, Chen K. PFP-RFSM: Protein fold prediction by using random forests and sequence motifs. J Biomed Sci Eng 2013. [DOI: 10.4236/jbise.2013.612145] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/20/2022]
46
Abstract
The prediction of gene function from genome sequences is one of the main issues in bioinformatics. Most computational approaches infer gene function from the similarity between sequences. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles. Phylogenetic profiles are vectors that indicate the presence or absence of a gene in other genomes. The main concept behind phylogenetic profiles is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi-level clustering algorithm for phylogenetic profiles is presented, which aims to detect inter- and intra-genome gene clusters.
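A phylogenetic profile can be represented as a binary presence/absence vector over an ordered list of reference genomes. The sketch below builds such profiles and groups genes whose profiles differ in at most a given number of positions (Hamming distance). This single-pass grouping is a simple stand-in chosen for illustration. It is not the multi-level clustering algorithm described in the abstract, and the gene and genome names are hypothetical.

```python
def phylogenetic_profile(present_in, genomes):
    """Binary presence (1) / absence (0) vector of a gene over an ordered list of genomes."""
    return tuple(1 if g in present_in else 0 for g in genomes)

def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

def group_profiles(profiles, max_distance=0):
    """Greedy single-pass grouping: each gene joins the first group whose representative
    profile is within `max_distance` mismatches, otherwise it founds a new group."""
    groups = []  # list of (representative profile, member genes)
    for gene, prof in profiles.items():
        for rep, members in groups:
            if hamming(rep, prof) <= max_distance:
                members.append(gene)
                break
        else:
            groups.append((prof, [gene]))
    return [members for _, members in groups]

genomes = ["G1", "G2", "G3", "G4"]  # hypothetical reference genomes
hits = {"geneA": {"G1", "G2"}, "geneB": {"G1", "G2"}, "geneC": {"G3", "G4"}, "geneD": {"G1", "G2", "G3"}}
profiles = {gene: phylogenetic_profile(found, genomes) for gene, found in hits.items()}
print(group_profiles(profiles))                  # [['geneA', 'geneB'], ['geneC'], ['geneD']]
print(group_profiles(profiles, max_distance=1))  # [['geneA', 'geneB', 'geneD'], ['geneC']]
```

Relaxing `max_distance` merges genes with nearly co-occurring profiles. A multi-level scheme would, roughly speaking, repeat such grouping at several tolerances, though the paper's actual procedure is not reproduced here.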
Affiliation(s)
- Fotis E. Psomopoulos
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki GR-54124, Greece
- Pericles A. Mitkas
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki GR-54124, Greece
47
Ritchie DW, Ghoorah AW, Mavridis L, Venkatraman V. Fast protein structure alignment using Gaussian overlap scoring of backbone peptide fragment similarity. Bioinformatics 2012; 28:3274-81. [DOI: 10.1093/bioinformatics/bts618] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022]
48
Usifo E, Leigh SEA, Whittall RA, Lench N, Taylor A, Yeats C, Orengo CA, Martin ACR, Celli J, Humphries SE. Low-Density Lipoprotein Receptor Gene Familial Hypercholesterolemia Variant Database: Update and Pathological Assessment. Ann Hum Genet 2012; 76:387-401. [DOI: 10.1111/j.1469-1809.2012.00724.x] [Citation(s) in RCA: 159] [Impact Index Per Article: 13.3] [Indexed: 02/04/2023]
49
Lewis JI, Moss DJ, Knotts TA. Multiple molecule effects on the cooperativity of protein folding transitions in simulations. J Chem Phys 2012; 136:245101. [DOI: 10.1063/1.4729604] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/25/2023]
50
Skolnick J, Zhou H, Brylinski M. Further evidence for the likely completeness of the library of solved single domain protein structures. J Phys Chem B 2012; 116:6654-64. [PMID: 22272723 DOI: 10.1021/jp211052j] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022]
Abstract
Recent studies questioned whether the Protein Data Bank (PDB) contains all compact, single-domain protein structures. Here, we show that all quasi-spherical (QS) random protein structures devoid of secondary structure are in the PDB and are excellent templates for all native PDB proteins up to 250 residues. Because QS templates have a global contour similar to that of the native structure, TASSER can refine 98% (90%) of those whose TM-score is 0.4 (0.35) to structures at or above the 0.5 TM-score threshold (0.74 (0.64) mean TM-score) for CATH/SCOP assignment. On the basis of this, and the fact that at a TM-score of 0.4, 83% (90%) of all (internal) core secondary structure elements are recovered, a TM-score of 0.40 is an appropriate threshold for assigning fold similarity. Despite the claims of Taylor, Trovato, and Zhou that many of their structures lack a PDB counterpart, using fr-TM-align at a 0.45 (0.5) TM-score threshold, essentially all (most) are found in the PDB. Thus, the conclusion that the PDB is likely complete is further supported.
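The abstract uses TM-score as its similarity threshold. For a fixed superposition, the standard definition is TM = (1/L) Σ 1/(1 + (d_i/d0)^2). Here L is the target length and d_i is the distance between the i-th pair of aligned residues. The distance scale is d0(L) = 1.24 (L - 15)^(1/3) - 1.8 Å. The Python sketch below evaluates this expression for a given list of distances. Real tools such as TM-align also optimize the superposition and alignment, which this sketch assumes has already been done.

```python
def d0(target_length):
    """Length-dependent distance scale in angstroms from the TM-score definition (meant for L > 15)."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition/alignment.

    distances: distances (angstroms) between aligned residue pairs after superposition.
    Unaligned target residues contribute nothing to the sum but still count in target_length.
    """
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length

L = 100
print(tm_score([0.0] * L, L))    # 1.0: every residue aligned with zero deviation
print(tm_score([d0(L)] * L, L))  # 0.5: every aligned pair sits exactly d0 apart
print(tm_score([0.0] * 40, L))   # 0.4: only 40% of the target aligned, but perfectly
```

The last line shows why a 0.4 threshold is a rather permissive criterion: it can be reached by matching well under half of the target.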
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, Georgia 30318, USA.