1
Peters DL, Gaudreault F, Chen W. Functional domains of Acinetobacter bacteriophage tail fibers. Front Microbiol 2024; 15:1230997. [PMID: 38690360 PMCID: PMC11058221 DOI: 10.3389/fmicb.2024.1230997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 03/08/2024] [Indexed: 05/02/2024]
Abstract
A rapid increase in antimicrobial-resistant bacterial infections around the world is causing a global health crisis. The World Health Organization categorizes the Gram-negative bacterium Acinetobacter baumannii as a Priority 1 pathogen for the research and development of new antimicrobials because of its numerous intrinsic antibiotic resistance mechanisms and its ability to quickly acquire new resistance determinants. Specialized phage enzymes called depolymerases degrade the bacterial capsular polysaccharide layer. They show therapeutic potential by sensitizing the bacterium to phages, select antibiotics and serum killing. The functional domains responsible for capsule degradation are often found in the tail fibers of select A. baumannii phages. To further explore the functional domains associated with depolymerase activity, tail-associated proteins of 71 sequenced and fully characterized phages were identified from the published literature and analyzed for functional domains using InterProScan. Multiple sequence alignments and phylogenetic analyses were conducted on the domain groups and assessed in the context of reported halo formation or depolymerase characterization. Some proteins came from phages reported to form halos or to have a functional depolymerase but produced no functional domain hits. These were modeled with AlphaFold2 Multimer and compared with other protein models using the DALI server. The domains associated with depolymerase function were pectin lyase-like (SSF51126), tailspike binding (cd20481), (Trans)glycosidases (SSF51445) and potentially SGNH hydrolases. These findings expand our knowledge of phage depolymerases and will help researchers exploit these enzymes therapeutically against the antimicrobial resistance crisis.
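The domain-screening step described above can be sketched as a filter over InterProScan results. This is a minimal illustration and not the authors' pipeline. It assumes InterProScan's tab-separated output format, with the protein accession in column 1 and the signature accession in column 5. The helper name `depolymerase_hits` and the sample protein IDs are hypothetical. Only the three accessions the abstract names are flagged; SGNH hydrolases are left out because no accession is given for them.

```python
import csv
import io

# Signature accessions the abstract associates with depolymerase function.
DEPOLYMERASE_SIGNATURES = {
    "SSF51126": "pectin lyase-like",
    "cd20481": "tailspike binding",
    "SSF51445": "(Trans)glycosidases",
}


def depolymerase_hits(tsv_text):
    """Map each protein ID to the sorted depolymerase-linked signatures it carries.

    Hypothetical helper. Assumes InterProScan TSV output where
    column 1 = protein accession and column 5 = signature accession.
    """
    hits = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 5:
            continue  # skip blank or truncated lines
        protein, signature = row[0], row[4]
        if signature in DEPOLYMERASE_SIGNATURES:
            hits.setdefault(protein, set()).add(signature)
    return {protein: sorted(sigs) for protein, sigs in hits.items()}


if __name__ == "__main__":
    # Hypothetical rows: one tail fiber with two hits, one protein with none.
    sample = (
        "tf_demo1\tmd5a\t700\tSUPERFAMILY\tSSF51126\tPectin lyase-like\t100\t400\t1e-20\tT\t01-01-2024\n"
        "tf_demo1\tmd5a\t700\tCDD\tcd20481\tTailspike binding\t10\t90\t1e-5\tT\t01-01-2024\n"
        "tf_demo2\tmd5b\t300\tPfam\tPF00001\tUnrelated\t1\t50\t1e-3\tT\t01-01-2024\n"
    )
    print(depolymerase_hits(sample))  # {'tf_demo1': ['SSF51126', 'cd20481']}
```

The filter deliberately keeps only accession matches. Proteins with no hits, which the study went on to model structurally, simply do not appear in the result.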
Affiliation(s)
- Danielle L. Peters
- Human Health Therapeutics (HHT) Research Center, National Research Council Canada, Ottawa, ON, Canada
- Wangxue Chen
- Human Health Therapeutics (HHT) Research Center, National Research Council Canada, Ottawa, ON, Canada
- Department of Biology, Brock University, St. Catharines, ON, Canada
2
Gullotto D. Fine tuned exploration of evolutionary relationships within the protein universe. Stat Appl Genet Mol Biol 2021; 20:17-36. [PMID: 33594839 DOI: 10.1515/sagmb-2019-0039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2019] [Accepted: 01/12/2021] [Indexed: 11/15/2022]
Abstract
At the resolution of domain classifications, the protein universe appears as a discrete set of folds connected by hierarchical relationships. At sub-domain resolution, by contrast, networks of protein motifs depict a continuous view that lies beyond the reach of hierarchical classification schemes. This is because physical constraints, which do not necessarily require evolution, can shape polypeptide chains. A number of studies, however, suggest that universal sub-sequences could be the descendants of peptides that emerged in an ancient pre-biotic world. If so, evolutionary signals retained by structurally conserved motifs, together with hierarchical features of ancient domains, could link folds that have diverged beyond the point where homology is discernible. On this rationale, this paper explores relationships between ancient folds using three components. The first is a network that combines hierarchical and continuous levels of the protein space. The second is a set of sequence profiles that probe the extent of sequence similarity. The third is a set of contacting residues that capture the transition from the pre-biotic world to the domain world. Statistics of the detected signals are reported. As an example, an emergent sub-network that makes sense from an evolutionary perspective is discussed, in which conserved signals retrieved from the assessed protein space have been co-opted.
Affiliation(s)
- Danilo Gullotto
- Advanced Computational Biostructural Research Collaboratory, I-95019, Zafferana Etnea, Italy
3
Oldfield CJ, Chen K, Kurgan L. Computational Prediction of Secondary and Supersecondary Structures from Protein Sequences. Methods Mol Biol 2019; 1958:73-100. [PMID: 30945214 DOI: 10.1007/978-1-4939-9161-7_4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
Many new methods for the sequence-based prediction of secondary and supersecondary structures have been developed over the last several years. These and older sequence-based predictors are widely applied to characterize and predict protein structure and function. These efforts have produced many accurate predictors, a large number of which rely on state-of-the-art machine learning models and on evolutionary information generated from multiple sequence alignments. We describe and motivate both types of predictions. We introduce concepts related to the annotation and computational prediction of three-state and eight-state secondary structure. We also cover several types of supersecondary structures, such as β hairpins, coiled coils and α-turn-α motifs. We review 34 predictors, focusing on recent tools, and provide detailed information for a selected set of 14 secondary structure and 3 supersecondary structure predictors. We conclude with several practical notes for end users of these predictive methods.
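The three-state versus eight-state distinction mentioned above is often bridged by collapsing the eight DSSP states into three. The sketch below uses one common reduction: H, G and I become helix; E and B become strand; everything else becomes coil. It then computes Q3, the per-residue three-state accuracy. Other reductions exist (some map G and I to coil), and the function names are illustrative rather than taken from the chapter.

```python
# One common reduction of the eight DSSP states to three:
# H, G, I -> H (helix); E, B -> E (strand); everything else (T, S, '-', ...) -> C (coil).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp8: str) -> str:
    """Collapse an eight-state DSSP string to three states."""
    return "".join(EIGHT_TO_THREE.get(state, "C") for state in dssp8)


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted three-state label matches the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    observed = to_three_state("HHGGIEEBTS-")
    print(observed)                     # HHHHHEEECCC
    print(q3("HHHHHEEECCC", observed))  # 1.0
```

Because the choice of reduction changes which residues count as helix or strand, reported Q3 values are only comparable when the same mapping was used.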
Affiliation(s)
- Christopher J Oldfield
- Department of Computer Science, College of Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Ke Chen
- School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin, People's Republic of China
- Lukasz Kurgan
- Department of Computer Science, College of Engineering, Virginia Commonwealth University, Richmond, VA, USA
4
Trevizani R, Custódio FL. Supersecondary Structures and Fragment Libraries. Methods Mol Biol 2019; 1958:283-295. [PMID: 30945224 DOI: 10.1007/978-1-4939-9161-7_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The use of smotifs and fragment libraries has proven useful both to simplify protein modeling and to increase the quality of protein models. Here, we present Profrager, a tool that takes a target sequence and automatically generates putative structural fragments that reproduce local motifs of proteins. Profrager is highly customizable. The user can select the number of fragments per library and the ranking method, and the tool can generate fragments of any size. It was recently modified to offer the option of outputting smotifs exclusively.
5
MacCarthy E, Perry D, Kc DB. Advances in Protein Super-Secondary Structure Prediction and Application to Protein Structure Prediction. Methods Mol Biol 2019; 1958:15-45. [PMID: 30945212 DOI: 10.1007/978-1-4939-9161-7_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
Owing to advances in various sequencing technologies, the gap between the number of protein sequences and the number of experimentally determined protein structures keeps widening. Community-wide initiatives such as CASP have spurred considerable effort in developing computational methods that accurately model protein structures from sequence. Sequence-based prediction of super-secondary structure has direct applications in protein structure prediction, and there have been significant efforts in this area over the last decade. In this chapter, we first introduce the protein structure prediction problem and highlight some of the important progress in the field. Next, we discuss recent methods for predicting super-secondary structures. We then discuss applications of super-secondary structure prediction to the structure prediction and analysis of proteins. This includes proteins composed of simple super-secondary structure repeats and proteins composed of complex ones. Finally, we discuss recent trends in the field.
Affiliation(s)
- Elijah MacCarthy
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
- Derrick Perry
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
- Dukka B Kc
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
6
Abstract
Drugs modulate disease states through their actions on targets in the body. Identifying these targets aids the focused development of new treatments and helps to better characterize those already in use. One way to achieve this is to deploy in silico methods, which harness computational power for analysis and prediction to produce educated hypotheses for experimental verification. Here, we provide an overview of the current state of the art and describe some well-established methods in detail. We also reflect on how these methods can improve our understanding of (poly)pharmacology, together with emerging technologies that incorporate complex and heterogeneous data sets.
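The overview above does not name its well-established methods, so the following is an assumption: an illustration of one widely used ligand-based approach. The sketch ranks candidate targets for a query compound by its maximum Tanimoto similarity to reference ligands with known targets. Fingerprints are represented as Python sets of "on" bit indices, and the reference data are hypothetical. Real work would use fingerprints computed by a cheminformatics toolkit and a curated ligand-target database.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints given as on-bit sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_targets(query: set[int], reference: list[tuple[set[int], str]]) -> list[tuple[str, float]]:
    """Score each target by the best similarity of any of its known ligands to the query."""
    best: dict[str, float] = {}
    for fingerprint, target in reference:
        score = tanimoto(query, fingerprint)
        if score > best.get(target, -1.0):
            best[target] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical reference ligands annotated with hypothetical targets.
    reference = [({1, 2, 3, 4}, "target_A"), ({1, 2, 9}, "target_B"), ({7, 8}, "target_C")]
    print(rank_targets({1, 2, 3, 4}, reference))
    # [('target_A', 1.0), ('target_B', 0.4), ('target_C', 0.0)]
```

Taking the maximum over a target's ligands is only one aggregation choice; mean or summed similarity, or similarity-ensemble statistics, are common alternatives.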
Affiliation(s)
- Ryan Byrne
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland
7
Holliday GL, Brown SD, Akiva E, Mischel D, Hicks MA, Morris JH, Huang CC, Meng EC, Pegg SCH, Ferrin TE, Babbitt PC. Biocuration in the structure-function linkage database: the anatomy of a superfamily. Database (Oxford) 2017; 2017:bax006. [PMID: 28365730 PMCID: PMC5467563 DOI: 10.1093/database/bax006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/28/2016] [Accepted: 01/23/2017] [Indexed: 12/11/2022]
Abstract
With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, annotating sequences with molecular function has become a bottleneck. This article describes the biocuration process and methods used in the structure-function linkage database (SFLD) to help address some of these challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies. In these superfamilies, all members are homologous, conserve an aspect of their chemical function and share conserved structural features that enable that chemistry. We also present the Enzyme Structure-Function Ontology (ESFO). It is designed to capture the relationships between enzyme sequence, structure and function that underlie the SFLD, and it guides the biocuration processes within the SFLD. Database URL: http://sfld.rbvi.ucsf.edu/
Affiliation(s)
- Gemma L Holliday
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Shoshana D Brown
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Eyal Akiva
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- David Mischel
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Michael A Hicks
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Human Longevity, Inc, San Diego, CA 92121, USA
- John H Morris
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
- Conrad C Huang
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
- Elaine C Meng
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
- Thomas E Ferrin
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
- California Institute for Quantitative Biosciences, University of California, San Francisco, CA 94158, USA
- Patricia C Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
- California Institute for Quantitative Biosciences, University of California, San Francisco, CA 94158, USA
8
Abstract
The dramatic increase in the number of protein sequences and structures deposited in biological databases has led to the development of many bioinformatics tools and programs to manage, validate, compare, and interpret this large volume of data. In addition, powerful tools are being developed to use this sequence and structural data to facilitate protein classification and infer biological function of newly identified proteins. This chapter covers freely available bioinformatics resources on the World Wide Web that are commonly used for protein structure analysis.
Affiliation(s)
- Jason J Paxman
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Rm 521, LIMS1, Kingsbury Drive, Bundoora, Melbourne, VIC, 3086, Australia
- Begoña Heras
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Rm 521, LIMS1, Kingsbury Drive, Bundoora, Melbourne, VIC, 3086, Australia
9
Adamczak R, Meller J. UQlust: combining profile hashing with linear-time ranking for efficient clustering and analysis of big macromolecular data. BMC Bioinformatics 2016; 17:546. [PMID: 28031034 PMCID: PMC5198500 DOI: 10.1186/s12859-016-1381-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/02/2016] [Accepted: 11/23/2016] [Indexed: 12/01/2022]
Abstract
Background: Advances in computing have enabled current protein and RNA structure prediction and molecular simulation methods to dramatically increase their sampling of conformational space. The rapidly growing number of experimentally resolved structures in databases such as the Protein Data Bank also calls for large-scale structural similarity analyses to retrieve and classify macromolecular data. Consequently, the computational cost of comparing and clustering large sets of macromolecular structures has become a bottleneck. Further algorithmic improvements and efficient software solutions are needed.
Results: uQlust is a versatile, easy-to-use tool for ultrafast ranking and clustering of macromolecular structures. It uses structural profiles of proteins and nucleic acids. It combines a linear-time algorithm for implicit comparison of all pairs of models with profile hashing, which enables efficient clustering of large data sets with a low memory footprint. Beyond ranking and clustering large sets of models of the same protein or RNA molecule, uQlust can also be used with fragment-based profiles to cluster structures of arbitrary length. For example, hierarchical clustering of the entire PDB using profile hashing can be performed on a typical laptop, opening an avenue for structural explorations previously limited to dedicated resources. The uQlust package is freely available under the GNU General Public License at https://github.com/uQlust.
Conclusion: uQlust drastically reduces computational complexity and memory requirements compared with existing clustering and model quality assessment methods for macromolecular structure analysis. Its results are on par with traditional approaches for both proteins and RNAs.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1381-2) contains supplementary material, which is available to authorized users.
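The "implicit comparison of all pairs of models" in linear time can be illustrated with a simple counting identity. Suppose each model is reduced to a string of per-residue structural states. Summing the matching positions between one model and every other model then gives the same total as summing, over positions, how many other models share its state there. The sketch below uses that identity to rank models in O(N·L) rather than O(N²·L). It uses exact-match hashing of whole profiles as a simplified stand-in for profile hashing. This is an illustration of the general idea only, not uQlust's algorithm or code. The state alphabet and function names are hypothetical.

```python
from collections import Counter, defaultdict


def rank_models(profiles: dict[str, str]) -> list[tuple[str, int]]:
    """Rank models by total positional agreement with all other models.

    Equivalent to summing pairwise identity over every pair of models, but
    computed in O(N * L) from per-position state counts instead of O(N^2 * L).
    """
    lengths = {len(p) for p in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("profiles must be non-empty and of equal length")
    (length,) = lengths
    counts = [Counter() for _ in range(length)]
    for profile in profiles.values():
        for pos, state in enumerate(profile):
            counts[pos][state] += 1
    scores = {
        # subtract 1 at each position to exclude the model's match with itself
        name: sum(counts[pos][state] - 1 for pos, state in enumerate(profile))
        for name, profile in profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def hash_groups(profiles: dict[str, str]) -> dict[str, list[str]]:
    """Group models with identical profiles, using the profile itself as the hash key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, profile in profiles.items():
        groups[profile].append(name)
    return dict(groups)


if __name__ == "__main__":
    models = {"m1": "HHEEC", "m2": "HHEEC", "m3": "HCEEC", "m4": "CCCCC"}
    print(rank_models(models))  # [('m1', 10), ('m2', 10), ('m3', 10), ('m4', 4)]
    print(hash_groups(models))  # {'HHEEC': ['m1', 'm2'], 'HCEEC': ['m3'], 'CCCCC': ['m4']}
```

The highest-scoring model is the one most consistent with the whole ensemble, which is the usual consensus-ranking rationale. Grouping identical profiles first means each distinct profile needs to be scored only once.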
Affiliation(s)
- Rafal Adamczak
- Department of Informatics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100, Torun, Poland
- Jarek Meller
- Department of Informatics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100, Torun, Poland
- Departments of Environmental Health and Electrical Engineering & Computing Systems, University of Cincinnati, Cincinnati, USA
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, USA
10
Stogios PJ, Cox G, Spanogiannopoulos P, Pillon MC, Waglechner N, Skarina T, Koteva K, Guarné A, Savchenko A, Wright GD. Rifampin phosphotransferase is an unusual antibiotic resistance kinase. Nat Commun 2016; 7:11343. [PMID: 27103605 PMCID: PMC4844700 DOI: 10.1038/ncomms11343] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 12/17/2015] [Accepted: 03/15/2016] [Indexed: 11/11/2022]
Abstract
Rifampin (RIF) phosphotransferase (RPH) confers antibiotic resistance by converting RIF and ATP to inactive phospho-RIF, AMP and Pi. Here we present the crystal structure of RPH from Listeria monocytogenes (RPH-Lm). The structure reveals that the enzyme comprises three domains: two substrate-binding domains (the ATP-grasp and RIF-binding domains) and a smaller, phosphate-carrying His swivel domain. Using solution small-angle X-ray scattering and mutagenesis, we reveal a mechanism in which the swivel domain moves between the spatially distinct substrate-binding sites during catalysis. RPHs are previously uncharacterized dikinases that are widespread in environmental and pathogenic bacteria. They belong to a large, little-studied group of bacterial enzymes whose substrate affinities have yet to be fully explored. Such an enzymatically complex mechanism of antibiotic resistance broadens the spectrum of strategies bacteria use to evade antimicrobial compounds.
Affiliation(s)
- Peter J. Stogios
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada M5G 1L6
- Georgina Cox
- M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Peter Spanogiannopoulos
- M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Monica C. Pillon
- Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Nicholas Waglechner
- M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Tatiana Skarina
- M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Kalinka Koteva
- M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Alba Guarné
- Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Alexei Savchenko
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada M5G 1L6
- Gerard D. Wright
- M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
11
Tiwari SP, Reuter N. Similarity in Shape Dictates Signature Intrinsic Dynamics Despite No Functional Conservation in TIM Barrel Enzymes. PLoS Comput Biol 2016; 12:e1004834. [PMID: 27015412 PMCID: PMC4807811 DOI: 10.1371/journal.pcbi.1004834] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 09/03/2015] [Accepted: 02/25/2016] [Indexed: 11/19/2022]
Abstract
The conservation of proteins' intrinsic dynamics becomes an important question as we try to understand the relationship between sequence, structure and functional conservation. We characterise the conservation of such dynamics in a case where structure is conserved but function differs greatly. The triosephosphate isomerase barrel fold (TBF) is renowned for its eight β-strand/α-helix repeats that close to form a barrel. It is one of the most diverse and abundant folds found in known protein structures. Proteins with this fold have diverse enzymatic functions spanning five of the six Enzyme Commission classes. We selected five different superfamily candidates for our analysis using elastic network models. We find that overall shape is a major determinant of the similarity of intrinsic dynamics, regardless of function. In particular, the β-barrel core is highly rigid, while the α-helices that flank the β-strands have greater relative mobility, which allows many possible placements of catalytic residues. We find that these elements are correlated with each other through the loops that link them rather than directly. We also analyse the types of motion encoded by the normal mode vectors of the α-helices. We suggest that the global conservation of intrinsic dynamics in the TBF contributes greatly to its success as an enzymatic scaffold, both in evolution and in enzyme design.
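The elastic network models mentioned above can be illustrated with the Gaussian network model (GNM), the simplest isotropic variant. The sketch builds the Kirchhoff (connectivity) matrix from Cα coordinates. It then obtains relative residue mobilities from the diagonal of the matrix's pseudo-inverse, which is proportional to the GNM mean-square fluctuations. The 7 Å cutoff and uniform spring constant are common defaults assumed here; the study's own model and parameters may differ. Mode shapes would additionally need an eigensolver, which this pure-Python sketch omits.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """Kirchhoff (connectivity) matrix of a Gaussian network model.

    Off-diagonal entries are -1 for residue pairs within `cutoff` angstroms;
    each diagonal entry is that residue's number of contacts.
    """
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small examples)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gnm_fluctuations(coords, cutoff=7.0):
    """Relative mean-square fluctuation per residue (up to the factor kT / spring constant).

    Uses pinv(K) = inv(K + J/n) - J/n with J the all-ones matrix, which holds
    when the contact network is connected; otherwise the shifted matrix is singular.
    """
    gamma = kirchhoff(coords, cutoff)
    n = len(gamma)
    inv = invert([[g + 1.0 / n for g in row] for row in gamma])
    return [inv[i][i] - 1.0 / n for i in range(n)]


if __name__ == "__main__":
    # Hypothetical straight chain of five C-alpha atoms spaced 3.8 angstroms apart.
    chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
    print([round(f, 3) for f in gnm_fluctuations(chain)])  # [1.2, 0.6, 0.4, 0.6, 1.2]
```

Even in this toy chain, the termini come out most mobile and the centrally connected residue least mobile. In the same way, the model links mobility to a residue's position in the contact network rather than to sequence or function.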
Affiliation(s)
- Sandhya P. Tiwari
- Department of Molecular Biology, University of Bergen, Pb. 7803, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Pb. 7803, Bergen, Norway
- Nathalie Reuter
- Department of Molecular Biology, University of Bergen, Pb. 7803, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Pb. 7803, Bergen, Norway
12
Xu J, Zhang J. Impact of structure space continuity on protein fold classification. Sci Rep 2016; 6:23263. [PMID: 27006112 PMCID: PMC4804218 DOI: 10.1038/srep23263] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/24/2015] [Accepted: 03/03/2016] [Indexed: 11/09/2022]
Abstract
Protein structure classification hierarchically clusters domain structures based on structure and/or sequence similarities. It plays important roles in the study of protein structure-function relationships and protein evolution. Among many classifications, SCOP and CATH are widely viewed as the gold standards. Fold classification is of special interest because it is the lowest level of classification that does not depend on protein sequence similarity. Current fold classifications, such as those in SCOP and CATH, are controversial because they implicitly assume that folds are discrete islands in structure space. Increasing evidence, however, suggests significant similarities among folds and supports a continuous fold space. Although this problem is widely recognized, its impact on fold classification has not been quantitatively evaluated. Here we develop a likelihood method that classifies a domain into the existing folds of CATH or SCOP using both query-fold structure similarities and within-fold structure heterogeneities. The new classification differs from the original for 3.4% to 12% of domains, depending on factors such as the structure similarity score and the original classification scheme used. Because these factors differ between biological purposes, our results indicate that the importance of considering structure space continuity in fold classification depends on the specific question asked.
Affiliation(s)
- Jinrui Xu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
13
Scott RA, Lindow SE. Transcriptional control of quorum sensing and associated metabolic interactions in Pseudomonas syringae strain B728a. Mol Microbiol 2016; 99:1080-98. [DOI: 10.1111/mmi.13289] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Revised: 12/02/2015] [Indexed: 11/29/2022]
Affiliation(s)
- Russell A. Scott
- Department of Plant and Microbial Biology; University of California; 111 Koshland Hall Berkeley CA 94720-3102 USA
- Steven E. Lindow
- Department of Plant and Microbial Biology; University of California; 111 Koshland Hall Berkeley CA 94720-3102 USA
14
Kuang X, Dhroso A, Han JG, Shyu CR, Korkin D. DOMMINO 2.0: integrating structurally resolved protein-, RNA-, and DNA-mediated macromolecular interactions. Database (Oxford) 2016; 2016:bav114. [PMID: 26827237 PMCID: PMC4733329 DOI: 10.1093/database/bav114] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/01/2015] [Accepted: 11/16/2015] [Indexed: 11/14/2022]
Abstract
Macromolecular interactions are formed between proteins, DNA and RNA molecules. As principal building blocks of macromolecular assemblies and pathways, these interactions underlie most cellular functions. Malfunctioning macromolecular interactions are also linked to a number of diseases. Structural knowledge of a macromolecular interaction allows one to understand its mechanism and determine its functional implications. It also helps characterize the effects of genetic variations, such as single nucleotide polymorphisms, on the interaction. Unfortunately, until now, interactions mediated by different types of macromolecules, e.g. protein-protein or protein-DNA interactions, have been collected in separate, unrelated structural databases. This presents a significant obstacle to the analysis of macromolecular interactions. For instance, such homogeneous structural interaction databases prevent scientists from studying structural interactions of different types that occur in the same macromolecular complex. Here, we introduce DOMMINO 2.0, a structural Database Of Macro-Molecular INteractiOns. DOMMINO 1.0 was a comprehensive database of protein-protein interactions; DOMMINO 2.0 adds interactions among all three basic types of macromolecules extracted from PDB files. DOMMINO 2.0 is automatically updated weekly. It currently includes ∼1,040,000 interactions between two polypeptide subunits (e.g. domains, peptides, termini and interdomain linkers), ∼43,000 RNA-mediated interactions and ∼12,000 DNA-mediated interactions. All protein structures in the database are annotated using SCOP and SUPERFAMILY family annotation. As a result, protein-mediated interactions involving protein domains, interdomain linkers, C- and N-termini, and peptides are identified.
Our database provides an intuitive web interface, allowing one to investigate interactions at three different resolution levels: whole subunit network, binary interaction and interaction interface. Database URL: http://dommino.org.
Affiliation(s)
- Xingyan Kuang
- Informatics Institute, University of Missouri, Columbia, MO, USA
- Andi Dhroso
- Department of Computer Science and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Jing Ginger Han
- Informatics Institute, University of Missouri, Columbia, MO, USA
- Chi-Ren Shyu
- Informatics Institute, University of Missouri, Columbia, MO, USA
- Department of Electrical and Computer Engineering, Department of Computer Science, University of Missouri, Columbia, MO, USA
- Dmitry Korkin
- Department of Computer Science and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
15
Scaiewicz A, Levitt M. The language of the protein universe. Curr Opin Genet Dev 2015; 35:50-6. [PMID: 26451980 DOI: 10.1016/j.gde.2015.08.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 06/30/2015] [Revised: 08/20/2015] [Accepted: 08/25/2015] [Indexed: 11/17/2022]
Abstract
Proteins, the main cellular machinery, play a major role in nearly every cellular process and have always been a central focus of biology. We live in the post-genomic era, and inferring information from massive data sets is a steadily growing universal challenge. The increasing availability of fully sequenced genomes can be regarded as the 'Rosetta Stone' of the protein universe. It allows us to understand genomes and their evolution, just as the original Rosetta Stone allowed Champollion to decipher ancient Egyptian hieroglyphs. In this review, we consider aspects of the repertoire of protein domain architectures that closely parallel features of human languages. We aim to provide some insights into the language of proteins.
Affiliation(s)
- Andrea Scaiewicz
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States
16
Xue Z, Jang R, Govindarajoo B, Huang Y, Wang Y. Extending Protein Domain Boundary Predictors to Detect Discontinuous Domains. PLoS One 2015; 10:e0141541. [PMID: 26502173 PMCID: PMC4621036 DOI: 10.1371/journal.pone.0141541] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/03/2015] [Accepted: 10/10/2015] [Indexed: 11/18/2022]
Abstract
In recent years, a variety of predictors have been developed to predict protein domain boundaries, but most of them cannot predict discontinuous domains. Because nearly 40% of multidomain proteins contain one or more discontinuous domains, we have developed DomEx to enable domain boundary predictors to detect discontinuous domains by assembling continuous domain segments. Discontinuous domains are predicted by matching the sequence profile of concatenated continuous domain segments against the profiles in a single-domain library derived from SCOP, CATH and Pfam. The matches are then filtered by similarity to library templates, a symmetric index score and a profile-profile alignment score. When tested on 97 non-homologous protein chains containing 58 continuous and 99 discontinuous domains, with predicted domain segments required to lie within ±20 residues of the boundary definitions in CATH 3.5, DomEx recalled 32.3% of discontinuous domains with 86.5% precision. In a benchmark of 29 discontinuous-domain chains, DomEx recalled 26.7% of discontinuous domains with 72.7% precision, whereas our recently developed predictor ThreaDom, the state-of-the-art tool for detecting discontinuous domains, failed to predict any. Furthermore, when combined with ThreaDom, the method ranked first among 10 predictors. The source code and datasets are available at https://github.com/xuezhidong/DomEx.
Affiliation(s)
- Zhidong Xue
- School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Richard Jang
- School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, United States of America
- Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, United States of America
- Yichu Huang
- School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Yan Wang
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
17
Molecular modeling, mutational analysis and conformational switching in IL27: An in silico structural insight towards AIDS research. Gene 2015; 576:72-8. [PMID: 26432006 DOI: 10.1016/j.gene.2015.09.075] [Received: 02/14/2015] [Revised: 08/09/2015] [Accepted: 09/25/2015]
Abstract
Advances in proteomics and bioinformatics call for a molecular-level probe of the HIV inhibitor human interleukin-27 (IL27). Previous reports document that tyrosine residues in IL27 play a pivotal role in interacting with HIV, causing apoptosis of HIV+ cells. First, a 3D structure of human wild-type (WT) IL27 was built using multiple molecular modeling techniques and checked for satisfactory stereochemical properties, and its essential tyrosine residues were identified. Two mutant models of IL27 were prepared following a similar protocol, by substituting the tyrosine residues in the WT protein first with glycine (MT_G) and then with alanine (MT_A). Molecular dynamics (MD) simulation was performed to obtain stable conformations. Comparing the conformations of WT, MT_G and MT_A before and after MD simulation showed that MT_A was the most stable, with the best secondary structure conformation, supported by statistical significance. Although large RMSD variations were observed when each MT structure was superimposed on WT, the MTs were found to share a similar SCOP/CATH fold with WT (TM-score = 0.8), indicating that they retained their functionality after mutation. Electrostatic surface potential analysis again showed MT_A to be the most stable. MT_A was thereby identified as a potent peptide inhibitor of HIV. This study provides a route to investigate and compare the biomolecular interactions of WT IL27 and MT_A IL27 (the strongest model) with HIV in future work. This is the first report on the structural biology of IL27 with alterations at its genetic level, delving into the unknown residue-level and functional biochemistry toward the eradication of AIDS.
18
An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life. Sci Rep 2015; 5:14717. [PMID: 26434770 PMCID: PMC4592975 DOI: 10.1038/srep14717] [Received: 05/27/2015] [Accepted: 09/07/2015]
Abstract
Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been, and to a large extent remains, heavily biased toward culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct the current sequencing bias. The extent to which these efforts affect structural diversity remains unclear, although preliminary results suggest that uncultured organisms could constitute a source of new folds. We investigate to what extent genomes from uncultured and under-sampled phyla, accessed through single-cell sequencing, metagenomics and high-throughput culturing efforts, have the potential to increase protein fold space, and conclude that i) genomes from under-sampled phyla appear enriched in sequences not covered by current protein family and fold profile libraries, ii) this enrichment is linked to an excess of short (and possibly partly spurious) sequences in some of the datasets, and iii) the discovery rate of novel folds among sequences not covered by current fold and family profile libraries may be as high as 36%, but would ultimately translate into a marginal increase in the global discovery of novel folds. Thus, genomes from under-sampled phyla should have a rather limited impact on increasing coarse-grained, tertiary-structure-level novelty.
19
Fox NK, Brenner SE, Chandonia JM. The value of protein structure classification information-Surveying the scientific literature. Proteins 2015; 83:2025-38. [PMID: 26313554 PMCID: PMC4609302 DOI: 10.1002/prot.24915] [Received: 05/08/2015] [Revised: 08/06/2015] [Accepted: 08/18/2015]
Abstract
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and to guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) studying protein structure or evolution (27% of articles), B) training and/or benchmarking algorithms (28% of articles), C) augmenting non-SCOP datasets with SCOP classification (21% of articles), and D) examining the classification of one protein or a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
Affiliation(s)
- Naomi K Fox
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
- Steven E Brenner
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
- Department of Plant and Microbial Biology, University of California, Berkeley, California, 94720
- John-Marc Chandonia
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
20
Stolte C, Sabir KS, Heinrich J, Hammang CJ, Schafferhans A, O'Donoghue SI. Integrated visual analysis of protein structures, sequences, and feature data. BMC Bioinformatics 2015; 16 Suppl 11:S7. [PMID: 26329268 PMCID: PMC4547178 DOI: 10.1186/1471-2105-16-s11-s7]
Abstract
BACKGROUND To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. RESULTS To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. CONCLUSIONS The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be carried out more easily and quickly using Aquaria.
Affiliation(s)
- Kenneth S Sabir
- The Garvan Institute of Medical Research, Sydney, Australia
- The University of Sydney, Sydney, Australia
- Seán I O'Donoghue
- CSIRO, Sydney, Australia
- The Garvan Institute of Medical Research, Sydney, Australia
- The University of Sydney, Sydney, Australia
21
Vallat B, Madrid-Aliste C, Fiser A. Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures. PLoS Comput Biol 2015; 11:e1004419. [PMID: 26252221 PMCID: PMC4529212 DOI: 10.1371/journal.pcbi.1004419] [Received: 03/10/2015] [Accepted: 06/30/2015]
Abstract
Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, modeling the remaining protein sequences increasingly requires template-free methods. However, template-free modeling methods are much less reliable and are usually applicable to smaller proteins, leaving much room for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Because the Smotif fragments are larger than those used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively compared with other state-of-the-art prediction methods, especially as the sequence signal from remote homologs diminishes. Smotif-based modeling is complementary to current prediction methods and provides a promising direction for addressing the structure prediction problem, especially when targeting larger proteins for modeling.
Each protein folds into a unique three-dimensional structure that enables it to carry out its biological function. Knowledge of the atomic details of protein structures is therefore key to understanding their function. Advances in high-throughput experimental technologies have led to an exponential increase in the availability of known protein sequences. Although strong progress has been made in experimental protein structure determination, it remains a fact that more than 99% of structural information is provided by computational modeling methods. We describe here a novel structure prediction method, SmotifTF, which uses a unique library of known protein fragments to assemble the three-dimensional structure of a sequence. The fragment library has saturated over time and therefore provides a complete set of building blocks required for model building. The method performs competitively compared to existing methods of structure prediction.
Affiliation(s)
- Brinda Vallat
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
- Carlos Madrid-Aliste
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
- Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
22
De novo protein design: how do we expand into the universe of possible protein structures? Curr Opin Struct Biol 2015; 33:16-26. [DOI: 10.1016/j.sbi.2015.05.009] [Received: 03/25/2015] [Revised: 05/15/2015] [Accepted: 05/25/2015]
23
Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem Soc Rev 2015; 44:1172-239. [PMID: 25503938 PMCID: PMC4349129 DOI: 10.1039/c4cs00351a] [Received: 10/20/2014]
Abstract
The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. 
Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.
Affiliation(s)
- Andrew Currin
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Neil Swainston
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- School of Computer Science, The University of Manchester, Manchester M13 9PL, UK
- Philip J. Day
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Faculty of Medical and Human Sciences, The University of Manchester, Manchester M13 9PT, UK
- Douglas B. Kell
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
- School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
24
Roche DB, Brüls T. The enzymatic nature of an anonymous protein sequence cannot reliably be inferred from superfamily level structural information alone. Protein Sci 2015; 24:643-50. [PMID: 25559918 DOI: 10.1002/pro.2635] [Received: 08/18/2014] [Revised: 12/13/2014] [Accepted: 12/29/2014]
Abstract
As the largest fraction of any proteome does not carry out enzymatic functions, and in order to leverage 3D structural data for annotating ever-larger volumes of sequence data, we wanted to assess the strength of the link between coarse-grained structural data (i.e., at the homologous superfamily level) and the enzymatic versus non-enzymatic nature of protein sequences. To probe this relationship, we took advantage of 41 phylogenetically diverse genomes (encompassing 11 distinct phyla) recently sequenced within the GEBA initiative, for which we integrated structural information, as defined by CATH, with enzyme-level information, as defined by Enzyme Commission (EC) numbers. This analysis revealed that only a very small fraction (about 1%) of domain sequences occurring in the analyzed genomes was associated with homologous superfamilies strongly indicative of enzymatic function. Resorting to less stringent criteria to define enzyme- versus non-enzyme-biased structural classes, or excluding highly prevalent folds from the analysis, had only a modest effect on this proportion. Thus, the low genomic coverage by structurally anchored protein domains strongly associated with catalytic activities indicates that, on its own, coarse-grained structural information has rather limited power to infer the general property of being an enzyme.
Affiliation(s)
- Daniel Barry Roche
- Laboratoire de génomique et biochimie du métabolisme, Genoscope, Institut de Génomique, Commissariat à l'Energie Atomique et aux Energies Alternatives, Evry, Essonne, 91057, France; UMR 8030 - Génomique Métabolique, Centre National de la Recherche Scientifique, Evry, Essonne, 91057, France; Département de Biologie, Université d'Evry-Val-d'Essonne, Evry, Essonne, 91000, France; PRES UniverSud Paris, Saint-Aubin, Essonne, 91190, France
25
Abstract
Regulated interactions between proteins govern signaling pathways within and between cells. Structural studies on protein complexes formed reversibly and/or transiently illustrate the remarkable diversity of interactions, both in terms of interfacial size and nature. In recent years, "domain-peptide" interactions have gained much greater recognition and may be viewed as both pre-translational- and post-translational-dependent functional switches. Our understanding of the multistep regulation of auto-inhibited multidomain proteins has also grown. Their activity may be understood as the "combinatorial" output of multiple input signals, including phosphorylation, location, and mechanical force. The prospects for bridging the gap between the new "systems biology" data and the traditional "reductionist" data are also discussed.
Affiliation(s)
- Robert C Liddington
- Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA, 92037, USA
26
Fernandes P, Aldeborgh H, Carlucci L, Walsh L, Wasserman J, Zhou E, Lefurgy ST, Mundorff EC. Alteration of substrate specificity of alanine dehydrogenase. Protein Eng Des Sel 2014; 28:29-35. [PMID: 25538307 DOI: 10.1093/protein/gzu053]
Abstract
The L-alanine dehydrogenase (AlaDH) has a natural history that suggests it would not be a promising candidate for expansion of substrate specificity by protein engineering: it is the only amino acid dehydrogenase in its fold family, it has no sequence or structural similarity to any known amino acid dehydrogenase, and it has a strong preference for L-alanine over all other substrates. By contrast, engineering of the amino acid dehydrogenase superfamily members has produced catalysts with expanded substrate specificity; yet, this enzyme family already contains members that accept a broad range of substrates. To test whether the natural history of an enzyme is a predictor of its innate evolvability, directed evolution was carried out on AlaDH. A single mutation identified through molecular modeling, F94S, introduced into the AlaDH from Mycobacterium tuberculosis (MtAlaDH) completely alters its substrate specificity pattern, enabling activity toward a range of larger amino acids. Saturation mutagenesis libraries in this mutant background additionally identified a double mutant (F94S/Y117L) showing improved activity toward hydrophobic amino acids. The catalytic efficiencies achieved in AlaDH are comparable with those that resulted from similar efforts in the amino acid dehydrogenase superfamily and demonstrate the evolvability of MtAlaDH specificity toward other amino acid substrates.
Affiliation(s)
- Puja Fernandes
- Chemistry Department, Hofstra University, Hempstead, NY 11549, USA
- Hannah Aldeborgh
- Chemistry Department, Vassar College, Poughkeepsie, NY 12604, USA
- Lauren Carlucci
- Chemistry Department, Hofstra University, Hempstead, NY 11549, USA
- Lauren Walsh
- Chemistry Department, Hofstra University, Hempstead, NY 11549, USA
- Jordan Wasserman
- Chemistry Department, Hofstra University, Hempstead, NY 11549, USA
- Edward Zhou
- Chemistry Department, Hofstra University, Hempstead, NY 11549, USA
- Scott T Lefurgy
- Chemistry Department, Hofstra University, Hempstead, NY 11549, USA
- Emily C Mundorff
- Chemistry Department, Hofstra University, Hempstead, NY 11549, USA
27
Archbold JK, Whitten AE, Hu SH, Collins BM, Martin JL. SNARE-ing the structures of Sec1/Munc18 proteins. Curr Opin Struct Biol 2014; 29:44-51. [DOI: 10.1016/j.sbi.2014.09.003] [Received: 05/15/2014] [Revised: 09/09/2014] [Accepted: 09/12/2014]
28
Mitchell A, Chang HY, Daugherty L, Fraser M, Hunter S, Lopez R, McAnulla C, McMenamin C, Nuka G, Pesseat S, Sangrador-Vegas A, Scheremetjew M, Rato C, Yong SY, Bateman A, Punta M, Attwood TK, Sigrist CJA, Redaschi N, Rivoire C, Xenarios I, Kahn D, Guyot D, Bork P, Letunic I, Gough J, Oates M, Haft D, Huang H, Natale DA, Wu CH, Orengo C, Sillitoe I, Mi H, Thomas PD, Finn RD. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res 2014; 43:D213-21. [PMID: 25428371 PMCID: PMC4383996 DOI: 10.1093/nar/gku1243]
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36 766 member database signatures integrated into 26 238 InterPro entries, an increase of over 3993 entries (5081 signatures) since 2012.
Affiliation(s)
- Alex Mitchell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Hsin-Yu Chang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Louise Daugherty
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Matthew Fraser
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sarah Hunter
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Rodrigo Lopez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Craig McAnulla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Conor McMenamin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Gift Nuka
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sebastien Pesseat
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Amaia Sangrador-Vegas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Maxim Scheremetjew
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Claudia Rato
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Siew-Yit Yong
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Marco Punta
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Teresa K Attwood
- Faculty of Life Science and School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Christian J A Sigrist
- Swiss Institute of Bioinformatics (SIB), CMU - Rue Michel-Servet, 1211 Geneva 4, Switzerland
- Nicole Redaschi
- Swiss Institute of Bioinformatics (SIB), CMU - Rue Michel-Servet, 1211 Geneva 4, Switzerland
- Catherine Rivoire
- Swiss Institute of Bioinformatics (SIB), CMU - Rue Michel-Servet, 1211 Geneva 4, Switzerland
- Ioannis Xenarios
- Swiss Institute of Bioinformatics (SIB), CMU - Rue Michel-Servet, 1211 Geneva 4, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Department of Biochemistry, University of Geneva, 1211 Geneva, Switzerland
- Daniel Kahn
- Pôle Rhône-Alpin de Bio-Informatique (PRABI), Batiment G. Mendel, Universite Claude Bernard, 43 bd du 11 novembre 1918, 69622 Villeurbanne Cedex, France
- Dominique Guyot
- Pôle Rhône-Alpin de Bio-Informatique (PRABI), Batiment G. Mendel, Universite Claude Bernard, 43 bd du 11 novembre 1918, 69622 Villeurbanne Cedex, France
| | - Peer Bork
- European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Ivica Letunic
- European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Julian Gough
- Department of Computer Science, University of Bristol, Woodland Road, Bristol, BS8 1UB, UK
- Matt Oates
- Department of Computer Science, University of Bristol, Woodland Road, Bristol, BS8 1UB, UK
- Daniel Haft
- J. Craig Venter Institute (JCVI), 9704 Medical Center Drive, Rockville, MD 20850, USA
- Hongzhan Huang
- Protein Information Resource (PIR), Georgetown University Medical Center, Washington, DC 20007, USA
- Darren A Natale
- Protein Information Resource (PIR), Georgetown University Medical Center, Washington, DC 20007, USA
- Cathy H Wu
- Protein Information Resource (PIR), Georgetown University Medical Center, Washington, DC 20007, USA; Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Christine Orengo
- Structural and Molecular Biology Department, University College London, University of London, London, WC1E 6BT, UK
- Ian Sillitoe
- Structural and Molecular Biology Department, University College London, University of London, London, WC1E 6BT, UK
- Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

29
Zhang C, Tao L, Qin C, Zhang P, Chen S, Zeng X, Xu F, Chen Z, Yang SY, Chen YZ. CFam: a chemical families database based on iterative selection of functional seeds and seed-directed compound clustering. Nucleic Acids Res 2014; 43:D558-65. [PMID: 25414339 PMCID: PMC4383987 DOI: 10.1093/nar/gku1212] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023] Open
Abstract
Similarity-based clustering and classification of compounds enable the search for drug leads and support structural and chemogenomic studies in chemical, biomedical, agricultural, material and other industrial applications. A database that organizes compounds into similarity-based as well as scaffold-based and property-based families is useful for these tasks. The CFam Chemical Family database (http://bidd2.cse.nus.edu.sg/cfam) was developed to hierarchically cluster drugs, bioactive molecules, human metabolites, natural products, patented agents and other molecules into functional families, superfamilies and classes of structurally similar compounds based on literature-reported high, intermediate and remote similarity measures. The compounds were represented by molecular fingerprints, and molecular similarity was measured by the Tanimoto coefficient. The functional seeds of CFam families were derived from hierarchically clustered drugs, bioactive molecules, human metabolites, natural products and patented agents, respectively, and were used to characterize families and cluster compounds into families, superfamilies and classes. CFam currently contains 11 643 classes, 34 880 superfamilies and 87 136 families of 490 279 compounds (1691 approved drugs, 1228 clinical trial drugs, 12 386 investigative drugs, 262 881 highly active molecules, 15 055 human metabolites, 80 255 ZINC-processed natural products and 116 783 patented agents). Efforts will be made to further expand the CFam database and add more functional categories and families based on other types of molecular representations.
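As an illustration of the similarity measure named in this abstract, the sketch below computes the Tanimoto coefficient c / (a + b - c) between binary fingerprints stored as sets of on-bit indices, and adds a hypothetical nearest-seed assignment. The set-based fingerprint encoding, the function names and the threshold are assumptions for illustration; this is not CFam's actual seed-directed clustering procedure or its literature-derived cut-offs.

```python
from typing import Dict, Optional, Set

Fingerprint = Set[int]  # indices of the "on" bits in a binary fingerprint (assumed encoding)


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Tanimoto coefficient c / (a + b - c) for two binary fingerprints."""
    shared = len(fp_a & fp_b)                 # c: bits set in both fingerprints
    union = len(fp_a) + len(fp_b) - shared    # a + b - c
    return shared / union if union else 0.0   # two empty fingerprints: treated as 0.0 here


def assign_to_seeds(compounds: Dict[str, Fingerprint],
                    seeds: Dict[str, Fingerprint],
                    threshold: float) -> Dict[str, Optional[str]]:
    """Hypothetical nearest-seed assignment: each compound joins its most similar
    seed when that similarity reaches the (assumed) threshold, otherwise None."""
    assignment: Dict[str, Optional[str]] = {}
    for name, fp in compounds.items():
        best_seed: Optional[str] = None
        best_sim = -1.0
        for seed_name, seed_fp in seeds.items():
            sim = tanimoto(fp, seed_fp)
            if sim > best_sim:
                best_seed, best_sim = seed_name, sim
        assignment[name] = best_seed if best_sim >= threshold else None
    return assignment


print(tanimoto({1, 4, 7, 9}, {1, 4, 8}))  # 2 / (4 + 3 - 2) = 0.4
```

Sets keep the formula easy to read; a production pipeline would more likely use fixed-length bit vectors from a cheminformatics toolkit.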
Affiliation(s)
- Cheng Zhang
- Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543; State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Chengdu 610041, China; Computational and Systems Biology, Singapore-MIT Alliance, National University of Singapore, Singapore
- Lin Tao
- Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543; NUS Graduate School for Integrative Sciences and Engineering, Singapore 117456
- Chu Qin
- Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543; NUS Graduate School for Integrative Sciences and Engineering, Singapore 117456
- Peng Zhang
- Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543
- Shangying Chen
- Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543
- Xian Zeng
- Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543
- Feng Xu
- College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Tianjin 300071, China; State Key Laboratory of Medicinal Chemistry & Biology, Tianjin International Joint Academy of Biotechnology & Medicine, Tianjin 300457, China
- Zhe Chen
- State Key Laboratory of Medicinal Chemistry & Biology, Tianjin International Joint Academy of Biotechnology & Medicine, Tianjin 300457, China
- Sheng Yong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Chengdu 610041, China
- Yu Zong Chen
- Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543; State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Chengdu 610041, China

30
Yeats C, Dessailly BH, Glass EM, Fremont DH, Orengo CA. Target selection for structural genomics of infectious diseases. Methods Mol Biol 2014; 1140:35-51. [PMID: 24590707 DOI: 10.1007/978-1-4939-0354-2_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/09/2023]
Abstract
This chapter describes the protocols used to identify, filter, and annotate potential protein targets from an organism associated with infectious diseases. Protocols often combine computational approaches for mining information in public databases or for checking whether the protein has already been targeted for structure determination, with manual strategies that examine the literature for information on the biological role of the protein or the experimental strategies that explore the effects of knocking out the protein. Publicly available computational tools have been cited as much as possible. Where these do not exist, the concepts underlying in-house tools developed for the Center for Structural Genomics of Infectious Diseases have been described.
Affiliation(s)
- Corin Yeats
- Dept. of Structural and Molecular Biology, University College London, Gower Street, WC1E 6BT, London, UK

31
Nagano N, Nakayama N, Ikeda K, Fukuie M, Yokota K, Doi T, Kato T, Tomii K. EzCatDB: the enzyme reaction database, 2015 update. Nucleic Acids Res 2014; 43:D453-8. [PMID: 25324316 PMCID: PMC4384017 DOI: 10.1093/nar/gku946] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 01/19/2023] Open
Abstract
The EzCatDB database (http://ezcatdb.cbrc.jp/EzCatDB/) has emphasized manual classification of enzyme reactions from the viewpoints of enzyme active-site structures and their catalytic mechanisms based on literature information, amino acid sequences of enzymes (UniProtKB) and the corresponding tertiary structures from the Protein Data Bank (PDB). Reaction types such as hydrolysis, transfer, addition, elimination, isomerization, hydride transfer and electron transfer have been included in the reaction classification, RLCP. This database includes information related to ligand molecules on the enzyme structures in the PDB data, classified in terms of cofactors, substrates, products and intermediates, which are also necessary to elucidate the catalytic mechanisms. Recently, the database system was updated. The 3D structures of active sites for each PDB entry can be viewed using Jmol or RasMol software. Moreover, two types of sequence search systems were developed for the EzCatDB database: EzCat-BLAST and EzCat-FORTE. EzCat-BLAST, which adopts the BLAST algorithm, is suitable for quick searches, whereas EzCat-FORTE, which adopts the algorithm of the FORTE protein structure prediction software, is better suited to detecting remote homologues. Another system, EzMetAct, is also available for searching for major active-site structures in EzCatDB using PDB-formatted queries.
Affiliation(s)
- Nozomi Nagano
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo Waterfront Bio-IT Research Building, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Naoko Nakayama
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo Waterfront Bio-IT Research Building, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Kazuyoshi Ikeda
- Level Five Co. Ltd., Grove Tower 4112, 4-21-1, Shibaura, Minato-ku, Tokyo 108-0023, Japan
- Masaru Fukuie
- Level Five Co. Ltd., Grove Tower 4112, 4-21-1, Shibaura, Minato-ku, Tokyo 108-0023, Japan
- Kiyonobu Yokota
- Level Five Co. Ltd., Grove Tower 4112, 4-21-1, Shibaura, Minato-ku, Tokyo 108-0023, Japan
- Takuo Doi
- Level Five Co. Ltd., Grove Tower 4112, 4-21-1, Shibaura, Minato-ku, Tokyo 108-0023, Japan
- Tsuyoshi Kato
- Faculty of Science and Engineering, Gunma University, 1-5-1 Tenjin-cho, Kiryu, Gunma 376-8515, Japan; Center for Informational Biology, Ochanomizu University, 2-1-1 Otsuka, Bunkyo-ku, Tokyo 112-8610, Japan
- Kentaro Tomii
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo Waterfront Bio-IT Research Building, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan

32
Berman HM, Kleywegt GJ, Nakamura H, Markley JL. The Protein Data Bank archive as an open data resource. J Comput Aided Mol Des 2014; 28:1009-14. [PMID: 25062767 PMCID: PMC4196035 DOI: 10.1007/s10822-014-9770-y] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Received: 04/30/2014] [Accepted: 06/23/2014] [Indexed: 02/08/2023]
Abstract
The Protein Data Bank archive was established in 1971, and recently celebrated its 40th anniversary (Berman et al. in Structure 20:391, 2012). An analysis of interrelationships of the science, technology and community leads to further insights into how this resource evolved into one of the oldest and most widely used open-access data resources in biology.
Affiliation(s)
- Helen M Berman
- RCSB PDB, Department of Chemistry and Chemical Biology and Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA

33
Tóth-Petróczy A, Tawfik DS. The robustness and innovability of protein folds. Curr Opin Struct Biol 2014; 26:131-8. [PMID: 25038399 DOI: 10.1016/j.sbi.2014.06.007] [Citation(s) in RCA: 93] [Impact Index Per Article: 8.5] [Received: 11/26/2013] [Revised: 06/26/2014] [Accepted: 06/26/2014] [Indexed: 11/30/2022]
Abstract
Assignment of protein folds to functions indicates that >60% of folds carry out one or two enzymatic functions, while few folds, for example the TIM-barrel and Rossmann folds, exhibit hundreds. Are there structural features that make a fold amenable to functional innovation (innovability)? Do these features relate to robustness, the ability to readily accumulate sequence changes? We discuss several hypotheses regarding the relationship between the architecture of a protein and its evolutionary potential. We describe how, in a seemingly paradoxical manner, opposite properties, such as high stability and rigidity versus conformational plasticity and structural order versus disorder, promote robustness and/or innovability. We hypothesize that polarity (differentiation and low connectivity between a protein's scaffold and its active site) is a key prerequisite for innovability.
Affiliation(s)
- Agnes Tóth-Petróczy
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Dan S Tawfik
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel

34
Zhang D, Iyer LM, Burroughs AM, Aravind L. Resilience of biochemical activity in protein domains in the face of structural divergence. Curr Opin Struct Biol 2014; 26:92-103. [PMID: 24952217 DOI: 10.1016/j.sbi.2014.05.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 04/15/2014] [Accepted: 05/20/2014] [Indexed: 01/07/2023]
Abstract
Recent studies point to the prevalence of the evolutionary phenomenon of drastic structural transformation of protein domains while continuing to preserve their basic biochemical function. These transformations span a wide spectrum, including simple domains incorporated into larger structural scaffolds, changes in the structural core, major active site shifts, topological rewiring and extensive structural transmogrifications. Proteins from biological conflict systems, such as toxin-antitoxin, restriction-modification, CRISPR/Cas, polymorphic toxin and secondary metabolism systems commonly display such transformations. These include endoDNases, metal-independent RNases, deaminases, ADP ribosyltransferases, immunity proteins, kinases and E1-like enzymes. In eukaryotes such transformations are seen in domains involved in chromatin-related peptide recognition and protein/DNA-modification. Intense selective pressures from 'arms-race'-like situations in conflict and macromolecular modification systems could favor drastic structural divergence while preserving function.
Affiliation(s)
- Dapeng Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Lakshminarayan M Iyer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- A Maxwell Burroughs
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

35
Anand P, Nagarajan D, Mukherjee S, Chandra N. PLIC: protein-ligand interaction clusters. Database (Oxford) 2014; 2014:bau029. [PMID: 24763918 PMCID: PMC3998096 DOI: 10.1093/database/bau029] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 12/27/2022]
Abstract
Most biological processes are governed by specific protein–ligand interactions. Discerning the different components that contribute toward a favorable protein–ligand interaction could contribute significantly toward a better understanding of protein function, rationalizing drug design and obtaining design principles for protein engineering. The Protein Data Bank (PDB) currently hosts the structures of ∼68 000 protein–ligand complexes. Although several databases exist that classify proteins according to sequence and structure, only a handful of them annotate and classify protein–ligand interactions and provide information on different attributes of molecular recognition. In this study, an exhaustive comparison of all the biologically relevant ligand-binding sites (84 846 sites) has been conducted using PocketMatch: a rapid, parallel, in-house algorithm. PocketMatch quantifies the similarity between binding sites based on structural descriptors and residue attributes. A similarity network was constructed using binding sites whose PocketMatch scores exceeded a high similarity threshold (0.80). The binding site similarity network was clustered into discrete sets of similar sites using the Markov clustering (MCL) algorithm. Furthermore, various computational tools have been used to study different attributes of interactions within the individual clusters. The attributes can be roughly divided into (i) binding site characteristics including pocket shape, nature of residues and interaction profiles with different kinds of atomic probes, (ii) atomic contacts consisting of various types of polar, hydrophobic and aromatic contacts along with binding site water molecules that could play crucial roles in protein–ligand interactions and (iii) binding energetics involved in interactions derived from scoring functions developed for docking.
For each ligand-binding site in each protein in the PDB, site similarity information, clusters they belong to and description of site attributes are provided as a relational database—protein–ligand interaction clusters (PLIC). Database URL: http://proline.biochem.iisc.ernet.in/PLIC
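The network-and-clustering step described in this abstract can be sketched as follows: pairs of binding sites whose similarity score exceeds the 0.80 threshold become edges, and the resulting graph is clustered with a minimal Markov clustering (MCL) loop of alternating expansion and inflation. The pairwise scores are assumed inputs (PocketMatch itself is not reproduced), and the self-loops, expansion and inflation powers and pruning cut-off are common MCL defaults assumed here, not parameters taken from PLIC.

```python
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]


def similarity_network(n: int, scored_pairs: Sequence[Tuple[int, int, float]],
                       threshold: float = 0.80) -> Matrix:
    """Unweighted adjacency matrix: an edge for every pair scoring above the threshold."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loops, a common MCL convention (assumed here)
    for i, j, score in scored_pairs:
        if score > threshold:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize_columns(m: Matrix) -> Matrix:
    """Return a copy whose columns each sum to 1 (column-stochastic)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(out[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] /= total
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj: Matrix, expansion: int = 2, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9, prune: float = 1e-9) -> List[List[int]]:
    """Markov clustering: alternate expansion (matrix power) and inflation
    (element-wise power, then column renormalization) until convergence."""
    n = len(adj)
    if n == 0:
        return []
    m = normalize_columns(adj)
    for _iteration in range(max_iter):
        expanded = m
        for _ in range(expansion - 1):
            expanded = matmul(expanded, m)  # expansion: flow spreads along longer paths
        inflated = normalize_columns(       # inflation: strong flows grow, weak ones fade
            [[v ** inflation if v > prune else 0.0 for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; each attractor's non-zero columns form a cluster.
    clusters: List[List[int]] = []
    seen: Set[Tuple[int, ...]] = set()
    for i in range(n):
        members = tuple(j for j in range(n) if m[i][j] > prune)
        if members and members not in seen:
            seen.add(members)
            clusters.append(list(members))
    return clusters


pairs = [(0, 1, 0.91), (0, 2, 0.88), (1, 2, 0.93),
         (3, 4, 0.95), (3, 5, 0.90), (4, 5, 0.86), (2, 3, 0.55)]
print(mcl(similarity_network(6, pairs)))  # the 0.55 bridge is dropped by the threshold
```

Dense pure-Python matrix products keep the sketch dependency-free but cost O(n³) per iteration; a network of tens of thousands of sites would need sparse matrices or a dedicated MCL implementation.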
Affiliation(s)
- Praveen Anand
- Department of Biochemistry, Indian Institute of Science, Bangalore 560012, Karnataka, India; IISc Mathematics Initiative, Indian Institute of Science, Bangalore 560012, Karnataka, India

36
Alderson RG, De Ferrari L, Mavridis L, McDonagh JL, Mitchell JBO, Nath N. Enzyme informatics. Curr Top Med Chem 2014; 12:1911-23. [PMID: 23116471 DOI: 10.2174/156802612804547353] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 08/15/2012] [Revised: 09/12/2012] [Accepted: 09/15/2012] [Indexed: 12/18/2022]
Abstract
Over the last 50 years, sequencing, structural biology and bioinformatics have completely revolutionised biomolecular science, with millions of sequences and tens of thousands of three dimensional structures becoming available. The bioinformatics of enzymes is well served by, mostly free, online databases. BRENDA describes the chemistry, substrate specificity, kinetics, preparation and biological sources of enzymes, while KEGG is valuable for understanding enzymes and metabolic pathways. EzCatDB, SFLD and MACiE are key repositories for data on the chemical mechanisms by which enzymes operate. At the current rate of genome sequencing and manual annotation, human curation will never finish the functional annotation of the ever-expanding list of known enzymes. Hence there is an increasing need for automated annotation, though it is not yet widespread for enzyme data. In contrast, functional ontologies such as the Gene Ontology already profit from automation. Despite our growing understanding of enzyme structure and dynamics, we are only beginning to be able to design novel enzymes. One can now begin to trace the functional evolution of enzymes using phylogenetics. The ability of enzymes to perform secondary functions, albeit relatively inefficiently, gives clues as to how enzyme function evolves. Substrate promiscuity in enzymes is one example of imperfect specificity in protein-ligand interactions. Similarly, most drugs bind to more than one protein target. This may sometimes result in helpful polypharmacology as a drug modulates multiple targets, but also often leads to adverse side-effects. Many chemoinformatics approaches can be used to model the interactions between drug-like molecules and proteins in silico. We can even use quantum chemical techniques like DFT and QM/MM to compute the structural and energetic course of enzyme catalysed chemical reaction mechanisms, including a full description of bond making and breaking.
Affiliation(s)
- Rosanna G Alderson
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland, UK

37
Abstract
Efforts from the TB Structural Genomics Consortium together with those of tuberculosis structural biologists worldwide have led to the determination of about 350 structures, making up nearly a tenth of the pathogen's proteome. Given that knowledge of protein structures is essential to obtaining a high-resolution understanding of the underlying biology, it is desirable to have a structural view of the entire proteome. Indeed, structure prediction methods have advanced sufficiently to allow structural models of many more proteins to be built based on homology modeling and fold recognition strategies. By means of these approaches, structural models for about 2,877 proteins, making up nearly 70% of the Mycobacterium tuberculosis proteome, are available. Knowledge from bioinformatics has made significant inroads into an improved annotation of the M. tuberculosis genome and in the prediction of key protein players that interact in vital pathways, some of which are unique to the organism. Functional inferences have been made for a large number of proteins based on fold-function associations. More importantly, ligand-binding pockets of the proteins are identified and scanned against a large database, leading to binding site-based ligand associations and hence structure-based function annotation. Near proteome-wide structural models provide a global perspective of the fold distribution in the genome. New insights about the folds that predominate in the genome, as well as the fold combinations that make up multidomain proteins, are also obtained. This chapter describes the structural proteome, functional inferences drawn from it, and its applications in drug discovery.

38
Kosciolek T, Jones DT. De novo structure prediction of globular proteins aided by sequence variation-derived contacts. PLoS One 2014; 9:e92197. [PMID: 24637808 PMCID: PMC3956894 DOI: 10.1371/journal.pone.0092197] [Citation(s) in RCA: 93] [Impact Index Per Article: 8.5] [Received: 11/06/2013] [Accepted: 02/19/2014] [Indexed: 12/21/2022] Open
Abstract
The advent of high-accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm, FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% correct predictions (TM-score ≥0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases arose either from conformational sampling problems or from insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts in determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step.
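The precision and recall figures quoted in this abstract come from treating model selection as binary classification: a model is a correct fold when its TM-score is at least 0.5, and a quality score "calls" it correct when the score reaches some cut-off. The sketch below shows only that bookkeeping; the decoy data, the quality cut-off and the function name are hypothetical, and it does not implement the authors' quality assessment scoring function.

```python
from typing import Sequence, Tuple


def precision_recall(tm_scores: Sequence[float], quality_scores: Sequence[float],
                     quality_cutoff: float, tm_cutoff: float = 0.5) -> Tuple[float, float]:
    """Precision and recall of a quality score at discriminating correct folds.

    A model counts as a correct fold when its TM-score is >= tm_cutoff (0.5, as in
    the abstract); the quality score calls it correct when >= quality_cutoff
    (a hypothetical cut-off)."""
    tp = fp = fn = 0
    for tm, quality in zip(tm_scores, quality_scores):
        is_correct = tm >= tm_cutoff
        called_correct = quality >= quality_cutoff
        if called_correct and is_correct:
            tp += 1   # true positive: correct fold accepted
        elif called_correct:
            fp += 1   # false positive: wrong fold accepted
        elif is_correct:
            fn += 1   # false negative: correct fold rejected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical decoys: TM-score against the native structure, and a quality score
tm = [0.62, 0.55, 0.41, 0.71, 0.30]
quality = [0.8, 0.6, 0.7, 0.9, 0.1]
print(precision_recall(tm, quality, quality_cutoff=0.5))  # (0.75, 1.0)
```

Raising the quality cut-off typically trades recall for precision, which is why a single precision/recall pair only describes one operating point of a scoring function.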
Affiliation(s)
- Tomasz Kosciolek
- Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- David T. Jones
- Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom

39
Dhifli W, Saidi R, Nguifo EM. Smoothing 3D Protein Structure Motifs Through Graph Mining and Amino Acid Similarities. J Comput Biol 2014; 21:162-72. [DOI: 10.1089/cmb.2013.0092] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022] Open
Affiliation(s)
- Wajdi Dhifli
- LIMOS, Blaise Pascal University, Clermont University, Clermont-Ferrand, France
- LIMOS, CNRS UMR 6158, Aubière, France
- Rabie Saidi
- European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
- Engelbert Mephu Nguifo
- LIMOS, Blaise Pascal University, Clermont University, Clermont-Ferrand, France
- LIMOS, CNRS UMR 6158, Aubière, France

40
Rahman SA, Cuesta SM, Furnham N, Holliday GL, Thornton JM. EC-BLAST: a tool to automatically search and compare enzyme reactions. Nat Methods 2014; 11:171-4. [PMID: 24412978 PMCID: PMC4122987 DOI: 10.1038/nmeth.2803] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.0] [Received: 07/11/2013] [Accepted: 12/10/2013] [Indexed: 11/08/2022]
Abstract
We present EC-BLAST (http://www.ebi.ac.uk/thornton-srv/software/rbl/), an algorithm and Web tool for quantitative similarity searches between enzyme reactions at three levels: bond change, reaction center and reaction structure similarity. It uses bond changes and reaction patterns for all known biochemical reactions derived from atom-atom mapping across each reaction. EC-BLAST has the potential to improve enzyme classification, identify previously uncharacterized or new biochemical transformations, improve the assignment of enzyme function to sequences, and assist in enzyme engineering.
Affiliation(s)
- Syed Asad Rahman
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, UK
- Sergio Martinez Cuesta
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, UK
- Nicholas Furnham
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, UK
- Gemma L Holliday
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, UK
- Janet M Thornton
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, UK

41
Yergens DW, Dutton DJ, Patten SB. An overview of the statistical methods reported by studies using the Canadian community health survey. BMC Med Res Methodol 2014; 14:15. [PMID: 24460595 PMCID: PMC3922729 DOI: 10.1186/1471-2288-14-15] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 08/07/2013] [Accepted: 01/23/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The Canadian Community Health Survey (CCHS) is a cross-sectional survey that has collected information on health determinants, health status and the utilization of the health system in Canada since 2001. Several hundred articles have been written utilizing the CCHS dataset. Previous analyses of statistical methods utilized in the literature have focused on a particular journal or set of journals to understand the statistical literacy required for understanding the published research. In this study, we describe the statistical methods referenced in the published literature utilizing the CCHS dataset(s). METHODS A descriptive study was undertaken of references published in Medline, Embase, Web of Knowledge and Scopus associated with the CCHS. These references were imported into a Java application utilizing the searchable Apache Lucene text database and screened based upon pre-defined inclusion and exclusion criteria. Full-text PDF articles that met the inclusion criteria were then used for the identification of descriptive, elementary and regression statistical methods referenced in these articles. The identification of statistical methods occurred through an automated search of key words on the full-text articles utilizing the Java application. RESULTS We identified 4811 references from the 4 bibliographical databases for possible inclusion. After exclusions, 663 references were used for the analysis. Descriptive statistics such as means or proportions were presented in a majority of the articles (97.7%). Elementary-level statistics such as t-tests were less frequently referenced (29.7%) than descriptive statistics. Regression methods were frequently referenced in the articles: 79.8% of articles contained reference to regression in general with logistic regression appearing most frequently in 67.1% of the articles. 
CONCLUSIONS Our study shows that a diverse set of analysis methods is referenced in the CCHS literature; however, the literature relies heavily on only a subset of all possible statistical tools. This information can be used to identify gaps in statistical methods that could be applied to future analyses of public health surveys, to provide insight into training and educational programs, and to identify the level of statistical literacy needed to understand the published literature.
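The automated keyword search described in the METHODS can be approximated in a few lines. The study used a Java application built on Apache Lucene; the sketch below swaps in Python regular expressions instead, and its keyword lists and category names are hypothetical, since the abstract does not give the actual search terms.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical search terms; the study's actual keyword lists are not given in the abstract.
METHOD_KEYWORDS: Dict[str, List[str]] = {
    "descriptive": ["mean", "proportion", "median"],
    "elementary": ["t-test", "chi-square", "anova"],
    "regression": ["regression"],
    "logistic regression": ["logistic regression"],
}


def methods_in_text(text: str) -> Set[str]:
    """Return the method categories whose keywords appear in an article's full text."""
    lowered = text.lower()
    found: Set[str] = set()
    for method, keywords in METHOD_KEYWORDS.items():
        for keyword in keywords:
            # Whole-word match with an optional plural "s": "means" matches, "meaning" does not.
            if re.search(r"\b" + re.escape(keyword) + r"s?\b", lowered):
                found.add(method)
                break
    return found


def percent_referencing(articles: Iterable[str]) -> Dict[str, float]:
    """Percentage of articles that reference each method category at least once."""
    texts = list(articles)
    if not texts:
        return {}
    counts = {method: 0 for method in METHOD_KEYWORDS}
    for text in texts:
        for method in methods_in_text(text):
            counts[method] += 1
    return {method: 100.0 * count / len(texts) for method, count in counts.items()}


articles = [
    "We report the mean age and used logistic regression.",
    "A t-test compared the two groups.",
    "The meaning of health was explored qualitatively.",
    "Linear regression was fitted to the proportions.",
]
print(percent_referencing(articles))
```

Keyword matching only shows that a method is mentioned, not that it was applied, which is one reason such counts describe what the literature "references" rather than what it uses.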
Affiliation(s)
- Dean W Yergens
- Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada

42
Sula A, Cole AR, Yeats C, Orengo C, Keep NH. Crystal structures of the human Dysferlin inner DysF domain. BMC Struct Biol 2014; 14:3. [PMID: 24438169 PMCID: PMC3898210 DOI: 10.1186/1472-6807-14-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 11/12/2013] [Accepted: 01/15/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Mutations in dysferlin, the first protein linked with the cell membrane repair mechanism, cause a group of muscular dystrophies called dysferlinopathies. Dysferlin is a type II-anchored membrane protein, with a single C-terminal transmembrane helix and most of the protein lying in the cytoplasm. Dysferlin contains several C2 domains and two DysF domains, which are nested one inside the other. Many pathogenic point mutations fall in the DysF domain region. RESULTS We describe the crystal structure of the human dysferlin inner DysF domain at 1.9 Å resolution. Most of the pathogenic mutations are part of aromatic/arginine stacks that hold the domain in a folded conformation. The high resolution of the structure shows that these interactions are a mixture of parallel ring/guanidinium stacking, perpendicular H-bond stacking and aliphatic chain packing. CONCLUSIONS The high-resolution structure of the dysferlin DysF domain provides a template on which to interpret in detail the pathogenic mutations that lead to disease.
Affiliation(s)
- Nicholas H Keep
- Crystallography, Biological Sciences, Institute for Structural and Molecular Biology, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK.
43
Jung S, Main D. Genomics and bioinformatics resources for translational science in Rosaceae. Plant Biotechnology Reports 2014; 8:49-64. [PMID: 24634697 PMCID: PMC3951882 DOI: 10.1007/s11816-013-0282-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 02/11/2013] [Accepted: 04/22/2013] [Indexed: 05/22/2023]
Abstract
Recent technological advances in biology promise unprecedented opportunities for rapid and sustainable improvement of crop quality. Following this trend, the Rosaceae research community continues to generate large amounts of genomic, genetic and breeding data. These include annotated whole-genome sequences, transcriptome and expression data, proteomic and metabolomic data, genotypic and phenotypic data, and genetic and physical maps. Analysis, storage, integration and dissemination of these data using bioinformatics tools and databases are essential to make the data useful for basic, translational and applied research. This review discusses the genomics and bioinformatics resources currently available for the Rosaceae family.
Affiliation(s)
- Sook Jung
- Department of Horticulture, Washington State University, Pullman, WA 99164 USA
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA 99164 USA
44
Lees JG, Lee D, Studer RA, Dawson NL, Sillitoe I, Das S, Yeats C, Dessailly BH, Rentzsch R, Orengo CA. Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis. Nucleic Acids Res 2013; 42:D240-5. [PMID: 24270792 PMCID: PMC3965083 DOI: 10.1093/nar/gkt1205] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 01/31/2023]
Abstract
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets, including >6000 cellular genomes and >20 million unique protein sequences, a 45% increase in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, the domain coverage of sequences has increased substantially. We have expanded this coverage further by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.
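Resolving domain overlaps means choosing a consistent, non-clashing set of domain assignments when hits from different sources (for example CATH HMMs, Pfam and SUPERFAMILY) cover the same residues. The abstract does not describe Gene3D's actual procedure, so the sketch below substitutes a simple greedy strategy as an assumption: keep the highest-scoring hit, discard any later hit that overlaps already-kept hits by more than `max_overlap` residues, and report the survivors in N-to-C order. `DomainHit`, `resolve_overlaps`, `architecture` and the family labels are hypothetical names for illustration only.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    """A predicted domain on one protein sequence (1-based, inclusive range)."""
    family: str
    start: int
    end: int
    score: float  # higher is better, e.g. an HMM bit score


def overlap(a: DomainHit, b: DomainHit) -> int:
    """Number of residues shared by two hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def resolve_overlaps(hits: List[DomainHit], max_overlap: int = 0) -> List[DomainHit]:
    """Greedily keep the best-scoring hits that do not clash with hits already kept."""
    kept: List[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(overlap(hit, k) <= max_overlap for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)  # N- to C-terminal order


def architecture(hits: List[DomainHit]) -> str:
    """Render a multi-domain architecture as ordered family labels."""
    return "-".join(h.family for h in hits)
```

Greedy selection is easy to reason about but does not guarantee the highest total score for the chosen set; if that matters, weighted interval scheduling by dynamic programming gives an exact alternative.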
Affiliation(s)
- Jonathan G Lees
- Division of Biosciences, Institute of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK, Department of Infectious Disease Epidemiology, Imperial College London, St Mary's Campus, Norfolk Place, London W2 1PG, UK and Robert Koch Institut, Research Group Bioinformatics Ng4, Nordufer 20, 13353 Berlin, Germany
45
Pérès S, Felicori L, Molina F. Elementary flux modes analysis of functional domain networks allows a better metabolic pathway interpretation. PLoS One 2013; 8:e76143. [PMID: 24204596 PMCID: PMC3812217 DOI: 10.1371/journal.pone.0076143] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/28/2013] [Accepted: 08/21/2013] [Indexed: 12/20/2022]
Abstract
Metabolic network analysis is an important step toward the functional understanding of biological systems. In these networks, enzymes are made of one or more functional domains, often involved in different catalytic activities. Elementary flux mode (EFM) analysis is a method of choice for topological studies of these enzymatic networks. In this article, we propose applying an EFM approach to networks that encompass available knowledge on structure-function relationships. We introduce a new method that represents metabolic networks as functional domain networks and applies the algorithm for computing elementary flux modes to analyse them. Any EFM that can be expressed in the classical representation can also be expressed in our functional domain network representation, but the finer granularity of functional domain networks highlights new connections in EFMs. The methodology is applied to the tricarboxylic acid (TCA) cycle of Bacillus subtilis and compared with classical analyses. Analysis of the functional domain network reveals that a specific inhibition of the second domain of the lipoamide dehydrogenase (pdhD) component of the pyruvate dehydrogenase complex leads to the loss of all fluxes. This conclusion could not be predicted with the classical approach.
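An elementary flux mode is a steady-state flux vector (S v = 0, with irreversible reactions carrying non-negative flux) whose set of active reactions cannot be reduced further without losing the steady state. The sketch below does not enumerate EFMs as the authors' algorithm does; it only checks a candidate vector with the standard rank test: a steady-state vector is elementary exactly when the stoichiometric submatrix restricted to its active reactions has a one-dimensional null space. In a functional domain network the columns of `S` would be domain-level rather than enzyme-level reactions; the toy network, the function names and the exact `Fraction` arithmetic are illustrative assumptions.

```python
from fractions import Fraction
from typing import List, Sequence


def rank(matrix: List[List[Fraction]]) -> int:
    """Rank of a matrix by Gauss-Jordan elimination over exact rationals."""
    m = [row[:] for row in matrix]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_elementary_mode(S: Sequence[Sequence[int]], v: Sequence[int],
                       reversible: Sequence[bool]) -> bool:
    """True if flux vector v is an elementary flux mode of stoichiometric matrix S.

    S has one row per internal metabolite and one column per reaction.
    """
    # 1. Steady state: every internal metabolite is balanced (S v = 0).
    for row in S:
        if sum(Fraction(s) * Fraction(x) for s, x in zip(row, v)) != 0:
            return False
    # 2. Irreversible reactions may not run backwards.
    if any(x < 0 and not rev for x, rev in zip(v, reversible)):
        return False
    # 3. Support minimality (rank test): the columns of S for the active
    #    reactions must have a one-dimensional null space.
    support = [j for j, x in enumerate(v) if x != 0]
    if not support:
        return False  # the zero vector is not a flux mode
    sub = [[Fraction(row[j]) for j in support] for row in S]
    return len(support) - rank(sub) == 1


# Hypothetical toy network: R1 imports A; A -> B (R2) and A -> C (R3);
# B -> D (R4) and C -> D (R5); R6 exports D. Rows are A, B, C, D.
TOY_S = [
    [1, -1, -1, 0, 0, 0],
    [0, 1, 0, -1, 0, 0],
    [0, 0, 1, 0, -1, 0],
    [0, 0, 0, 1, 1, -1],
]
TOY_REVERSIBLE = [False] * 6
```

In the toy network each branch (R1, R2, R4, R6 or R1, R3, R5, R6) is an elementary mode; their sum is still a valid steady state but fails the rank test because it decomposes into the two branches.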
Affiliation(s)
- Sabine Pérès
- Laboratoire de Recherche en Informatique, Université Paris-Sud, CNRS UMR 8623 and INRIA Saclay, Orsay, France
- SysDiag UMR3145 CNRS/Bio-Rad Parc Euromédecine, Montpellier, France
- Liza Felicori
- Universidade Federal de Minas Gerais, Bioquimica e Imunologia, Belo Horizonte, Brazil
- SysDiag UMR3145 CNRS/Bio-Rad Parc Euromédecine, Montpellier, France
- Franck Molina
- SysDiag UMR3145 CNRS/Bio-Rad Parc Euromédecine, Montpellier, France
46
Cloning, in silico characterization and prediction of three dimensional structure of SbDof1, SbDof19, SbDof23 and SbDof24 proteins from Sorghum [Sorghum bicolor (L.) Moench]. Mol Biotechnol 2013; 54:1-12. [PMID: 22476870 DOI: 10.1007/s12033-012-9536-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 02/02/2023]
Abstract
In the present study, four full-length Dof (DNA-binding with one finger) genes from Sorghum bicolor, namely SbDof1, SbDof19, SbDof23 and SbDof24, were PCR amplified, gel eluted, cloned and sequenced (accession numbers HQ540084, HQ540085, HQ540086 and HQ540087, respectively). These sequences were further characterized in silico by homology search, multiple sequence alignment, phylogenetic tree construction and protein functional analysis, which identified them as Dof-like proteins. Phylogenetic analysis of the cloned SbDof genes together with other reported Dof proteins revealed two major groups, A and B, with group A further divided into two sub-groups (I and II). Motif scan analysis of the SbDof proteins revealed glycine- and alanine-rich profiles in SbDof1 and a proline-rich profile in SbDof23. Asparagine-, methionine- and serine-rich profiles were common to both SbDof19 and SbDof24. The three-dimensional structures of the SbDof proteins were predicted with the I-TASSER server, which is based on a multiple threading method. The modeled structures were refined by energy minimization, and their stereochemical quality was validated with PROCHECK and the QMEAN server, indicating that the predicted models are acceptable. The final models were submitted to the PMDB database and assigned PMDB IDs PM0077395, PM0077396, PM0077397, PM0077398 and PM0076448 for SbDof1, SbDof19, SbDof23, SbDof24 and the Dof domain, respectively. Putative functions of the modeled SbDof proteins were also predicted from gene ontology (GO) terms in the I-TASSER server.
47
Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences. PLoS One 2013; 8:e75458. [PMID: 24069417 PMCID: PMC3771926 DOI: 10.1371/journal.pone.0075458] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/08/2013] [Accepted: 08/19/2013] [Indexed: 11/19/2022]
Abstract
Identifying shared sequence segments along amino acid sequences generally requires a collection of closely related proteins, most often curated manually from sequence datasets to suit the purpose at hand. Current statistical methods are strained, however, when the collection contains remote sequences that align poorly to the rest, or sequences containing multiple domains. In this paper, we propose a completely unsupervised and automated method to identify the shared sequence segments in a diverse collection of protein sequences, including segments present in only a small fraction of the sequences, using a combination of sequence alignment, residue conservation scoring and graph-theoretical approaches. Since shared sequence fragments often imply conserved functional or structural attributes, the method produces a table of associations between the sequences and the identified conserved regions that can reveal previously unknown protein families as well as new members of existing ones. We evaluated the biological relevance of the method by clustering the proteins in gold-standard datasets and comparing the clustering performance with previous methods from the literature. We then applied the method to a genome-wide dataset of 17793 human proteins and generated a global association map to each of the 4753 identified conserved regions. Investigation of the major conserved regions revealed that they corresponded strongly to annotated structural domains, suggesting that the method can be useful for predicting novel domains in protein sequences.
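Residue conservation scoring is one of the three ingredients named above. The authors' scoring function and graph-theoretical step are not described in the abstract, so the sketch below uses a common stand-in as an assumption: Shannon entropy per alignment column (lower means more conserved), with runs of low-entropy, mostly ungapped columns reported as candidate conserved segments. The thresholds `max_entropy`, `max_gap_fraction` and `min_length` are arbitrary illustrative defaults, not values from the paper.

```python
import math
from collections import Counter
from typing import List, Tuple


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return float("inf")  # an all-gap column is never conserved
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def conserved_segments(alignment: List[str], max_entropy: float = 1.0,
                       max_gap_fraction: float = 0.5,
                       min_length: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (0-based, inclusive) of conserved runs."""
    n_seqs = len(alignment)
    width = len(alignment[0])
    conserved = []
    for j in range(width):
        column = "".join(seq[j] for seq in alignment)
        gap_fraction = column.count("-") / n_seqs
        conserved.append(gap_fraction <= max_gap_fraction
                         and column_entropy(column) <= max_entropy)
    segments, start = [], None
    for j, ok in enumerate(conserved + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = j
        elif not ok and start is not None:
            if j - start >= min_length:
                segments.append((start, j - 1))
            start = None
    return segments
```

For example, for the four aligned sequences `MKVLAGHW`, `MKVIAGQW`, `MKVLSGHY` and `TRVLAGEW`, only column 6 (H, Q, H, E; 1.5 bits) exceeds the entropy cut-off, so columns 0 to 5 form one segment and the lone column 7 is dropped as shorter than `min_length`.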
48
Kim DE, Dimaio F, Yu-Ruei Wang R, Song Y, Baker D. One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Proteins 2013; 82 Suppl 2:208-18. [PMID: 23900763 DOI: 10.1002/prot.24374] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Received: 04/09/2013] [Revised: 06/12/2013] [Accepted: 06/21/2013] [Indexed: 12/19/2022]
Abstract
A number of methods have been described for identifying pairs of contacting residues in protein three-dimensional structures, but it is unclear how many contacts are required for accurate structure modeling. The CASP10 assisted contact experiment provided a blind test of contact-guided protein structure modeling. We describe the models generated for these contact-guided prediction challenges using the Rosetta structure modeling methodology. In nearly all cases the submitted models had the correct overall topology, and in some cases they had near atomic-level accuracy; for example, the model of the 384-residue homo-oligomeric tetramer (Tc680o) had a root-mean-square deviation (RMSD) of only 2.9 Å from the crystal structure. Our results suggest that experimental and bioinformatic methods for obtaining contact information may need to generate only one correct contact for every 12 residues in the protein to allow accurate topology-level modeling.
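The "one contact per twelve residues" rule is simple arithmetic, and a contact itself is usually defined geometrically. The sketch below assumes the common CASP convention of a Cβ-Cβ distance below 8 Å (the caller would supply Cα for glycine) and skips residue pairs fewer than 6 positions apart in sequence; both parameters are assumptions rather than settings taken from the paper, and `contacts` and `contacts_needed` are hypothetical helper names. Taking the 384-residue figure for Tc680o at face value, `contacts_needed(384)` gives 32 correct contacts.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contacts(cb_coords: Sequence[Coord], cutoff: float = 8.0,
             min_separation: int = 6) -> List[Tuple[int, int]]:
    """Residue pairs whose C-beta atoms lie within `cutoff` Å.

    Pairs closer than `min_separation` positions in sequence are skipped,
    since they are in contact trivially.
    """
    pairs = []
    for i in range(len(cb_coords)):
        for j in range(i + min_separation, len(cb_coords)):
            if math.dist(cb_coords[i], cb_coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def contacts_needed(n_residues: int, residues_per_contact: int = 12) -> int:
    """Correct contacts implied by a 'one contact per N residues' rule of thumb."""
    return math.ceil(n_residues / residues_per_contact)
```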
Affiliation(s)
- David E Kim
- Department of Biochemistry, University of Washington, Seattle, Washington 98195
49
Mariani V, Biasini M, Barbato A, Schwede T. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 2013; 29:2722-8. [PMID: 23986568 PMCID: PMC3799472 DOI: 10.1093/bioinformatics/btt473] [Citation(s) in RCA: 577] [Impact Index Per Article: 48.1] [Indexed: 01/27/2023]
Abstract
Motivation: The assessment of protein structure prediction techniques requires objective criteria to measure the similarity between a computational model and the experimentally determined reference structure. Conventional similarity measures based on a global superposition of carbon α atoms are strongly influenced by domain motions and do not assess the accuracy of local atomic details in the model. Results: The Local Distance Difference Test (lDDT) is a superposition-free score that evaluates local distance differences of all atoms in a model, including validation of stereochemical plausibility. The reference can be a single structure, or an ensemble of equivalent structures. We demonstrate that lDDT is well suited to assess local model quality, even in the presence of domain movements, while maintaining good correlation with global measures. These properties make lDDT a robust tool for the automated assessment of structure prediction servers without manual intervention. Availability and implementation: Source code, binaries for Linux and MacOSX, and an interactive web server are available at http://swissmodel.expasy.org/lddt Contact: torsten.schwede@unibas.ch Supplementary information: Supplementary data are available at Bioinformatics online.
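The core idea of lDDT, comparing inter-atomic distances rather than superposed coordinates, fits in a few lines. This is a simplified sketch, not the released implementation: it uses one atom per residue (for example Cα), the 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerance thresholds described for lDDT, and it omits the all-atom treatment, per-residue scores and stereochemical checks. Because pairs are selected from the reference and distances are compared within each structure, a rigidly translated or rotated model scores 1.0 without any superposition.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def lddt(reference: Sequence[Coord], model: Sequence[Coord],
         inclusion_radius: float = 15.0,
         thresholds: Sequence[float] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Simplified global lDDT on one atom per residue, without stereochemistry checks."""
    if len(reference) != len(model):
        raise ValueError("reference and model must have the same number of atoms")
    # Select pairs from the reference only; no superposition is needed because
    # each distance is measured inside one structure.
    diffs = []
    for i, j in combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref < inclusion_radius:
            diffs.append(abs(math.dist(model[i], model[j]) - d_ref))
    if not diffs:
        return 0.0
    # Fraction of preserved distances at each tolerance, then averaged.
    preserved = [sum(d < t for d in diffs) / len(diffs) for t in thresholds]
    return sum(preserved) / len(thresholds)
```

For example, with a four-residue reference spaced 3.8 Å apart along x and a model in which only the last atom is displaced 3 Å along y, the three distances involving that atom change by about 0.39, 0.57 and 1.04 Å, giving preserved fractions of 4/6, 5/6, 1 and 1 and a score of 0.875.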
Affiliation(s)
- Valerio Mariani
- Biozentrum, Universität Basel, Klingelbergstrasse 50-70 and Computational Structural Biology, SIB Swiss Institute of Bioinformatics, 4056 Basel, Switzerland
50
Gutmanas A, Oldfield TJ, Patwardhan A, Sen S, Velankar S, Kleywegt GJ. The role of structural bioinformatics resources in the era of integrative structural biology. Acta Crystallographica. Section D, Biological Crystallography 2013; 69:710-21. [PMID: 23633580 PMCID: PMC3640467 DOI: 10.1107/s0907444913001157] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 07/06/2012] [Accepted: 01/11/2013] [Indexed: 11/10/2022]
Abstract
The history and the current state of the PDB and EMDB archives are briefly described, as well as some of the challenges that they face. It seems natural that the role of structural biology archives will change from being a pure repository of historic data to becoming an indispensable resource for the wider biomedical community. As part of this transformation, it will be necessary to validate biomacromolecular structure data and ensure the highest possible quality of the archive holdings, to combine structural data from different spatial scales into a unified resource, and to integrate structural data with functional, genetic and taxonomic data as well as other information available in bioinformatics resources. Some recent developments and plans at PDBe to address these challenges are presented.
Affiliation(s)
- Aleksandras Gutmanas
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
- Thomas J. Oldfield
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
- Ardan Patwardhan
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
- Sanchayita Sen
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
- Sameer Velankar
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
- Gerard J. Kleywegt
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England