151
Haggerty LS, Jachiet PA, Hanage WP, Fitzpatrick DA, Lopez P, O'Connell MJ, Pisani D, Wilkinson M, Bapteste E, McInerney JO. A pluralistic account of homology: adapting the models to the data. Mol Biol Evol 2013; 31:501-16. [PMID: 24273322 PMCID: PMC3935183 DOI: 10.1093/molbev/mst228] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 12/13/2022]
Abstract
Defining homologous genes is important in many evolutionary studies but raises obvious issues. Some of these issues are conceptual and stem from our assumptions of how a gene evolves, others are practical, and depend on the algorithmic decisions implemented in existing software. Therefore, to make progress in the study of homology, both ontological and epistemological questions must be considered. In particular, defining homologous genes cannot be solely addressed under the classic assumptions of strong tree thinking, according to which genes evolve in a strictly tree-like fashion of vertical descent and divergence and the problems of homology detection are primarily methodological. Gene homology could also be considered under a different perspective where genes evolve as “public goods,” subjected to various introgressive processes. In this latter case, defining homologous genes becomes a matter of designing models suited to the actual complexity of the data and how such complexity arises, rather than trying to fit genetic data to some a priori tree-like evolutionary model, a practice that inevitably results in the loss of much information. Here we show how important aspects of the problems raised by homology detection methods can be overcome when even more fundamental roots of these problems are addressed by analyzing public goods thinking evolutionary processes through which genes have frequently originated. This kind of thinking acknowledges distinct types of homologs, characterized by distinct patterns, in phylogenetic and nonphylogenetic unrooted or multirooted networks. In addition, we define “family resemblances” to include genes that are related through intermediate relatives, thereby placing notions of homology in the broader context of evolutionary relationships. We conclude by presenting some payoffs of adopting such a pluralistic account of homology and family relationship, which expands the scope of evolutionary analyses beyond the traditional, yet relatively narrow focus allowed by a strong tree-thinking view on gene evolution.
Affiliation(s)
- Leanne S Haggerty
- Bioinformatics and Molecular Evolution Unit, Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland
152
Di Roberto RB, Peisajovich SG. The role of domain shuffling in the evolution of signaling networks. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 2013; 322:65-72. [DOI: 10.1002/jez.b.22551] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/08/2013] [Accepted: 10/28/2013] [Indexed: 01/05/2023]
153
Arnold R, Goldenberg F, Mewes HW, Rattei T. SIMAP--the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage. Nucleic Acids Res 2013; 42:D279-84. [PMID: 24165881 PMCID: PMC3965014 DOI: 10.1093/nar/gkt970] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/21/2022]
Abstract
The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith-Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads.
Affiliation(s)
- Roland Arnold
- Terrence Donnelly Centre for Cellular and Biomolecular Research, Kim Lab, University of Toronto, Toronto, ON M5S 3E1, Canada, CUBE-Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, 1090 Vienna, Austria and Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85764 Neuherberg, Germany
154
Anishchenko I, Kundrotas PJ, Tuzikov AV, Vakser IA. Protein models: the Grand Challenge of protein docking. Proteins 2013; 82:278-87. [PMID: 23934791 DOI: 10.1002/prot.24385] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 04/14/2013] [Revised: 07/16/2013] [Accepted: 07/26/2013] [Indexed: 12/28/2022]
Abstract
Characterization of life processes at the molecular level requires structural details of protein-protein interactions (PPIs). The number of experimentally determined protein structures accounts only for a fraction of known proteins. This gap has to be bridged by modeling, typically using experimentally determined structures as templates to model related proteins. The fraction of experimentally determined PPI structures is even smaller than that for the individual proteins, due to a larger number of interactions than the number of individual proteins, and a greater difficulty of crystallizing protein-protein complexes. The approaches to structural modeling of PPI (docking) often have to rely on modeled structures of the interactors, especially in the case of large PPI networks. Structures of modeled proteins are typically less accurate than the ones determined by X-ray crystallography or nuclear magnetic resonance. Thus the utility of approaches to dock these structures should be assessed by thorough benchmarking, specifically designed for protein models. To be credible, such benchmarking has to be based on carefully curated sets of structures with levels of distortion typical for modeled proteins. This article presents such a suite of models built for the benchmark set of the X-ray structures from the Dockground resource (http://dockground.bioinformatics.ku.edu) by a combination of homology modeling and Nudged Elastic Band method. For each monomer, six models were generated with predefined C(α) root mean square deviation from the native structure (1, 2, …, 6 Å). The sets and the accompanying data provide a comprehensive resource for the development of docking methodology for modeled proteins.
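The Cα root mean square deviation used above to grade the models can be illustrated with a minimal Python sketch. It assumes the two coordinate sets are already superimposed and of equal length; the function name `ca_rmsd` and the toy coordinates are illustrative only and are not taken from the Dockground resource or the authors' pipeline.

```python
import math


def ca_rmsd(model, native):
    """RMSD between two equally long lists of (x, y, z) C-alpha coordinates.

    Assumption: the structures are already superimposed; no optimal
    fitting (e.g. Kabsch rotation) is performed here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equally long")
    # Sum of squared per-atom distances over all matched C-alpha pairs.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, native)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


# Toy example: a three-residue trace shifted by 1 Å along y.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(ca_rmsd(model, native))
```

In practice an optimal superposition step would normally precede this calculation; it is left out here to keep the sketch short.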
Affiliation(s)
- Ivan Anishchenko
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, 66047; United Institute of Informatics Problems, National Academy of Sciences, 220012, Minsk, Belarus
155
Yegambaram K, Bulloch EMM, Kingston RL. Protein domain definition should allow for conditional disorder. Protein Sci 2013; 22:1502-18. [PMID: 23963781 DOI: 10.1002/pro.2336] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 07/01/2013] [Revised: 08/04/2013] [Accepted: 08/12/2013] [Indexed: 12/19/2022]
Abstract
Proteins are often classified in a binary fashion as either structured or disordered. However, this approach has several deficits. Firstly, protein folding is always conditional on the physicochemical environment. A protein which is structured in some circumstances will be disordered in others. Secondly, it hides a fundamental asymmetry in behavior. While all structured proteins can be unfolded through a change in environment, not all disordered proteins have the capacity for folding. Failure to accommodate these complexities confuses the definition of both protein structural domains and intrinsically disordered regions. We illustrate these points with an experimental study of a family of small binding domains, drawn from the RNA polymerase of mumps virus and its closest relatives. Assessed at face value the domains fall on a structural continuum, with folded, partially folded, and near unstructured members. Yet the disorder present in the family is conditional, and these closely related polypeptides can access the same folded state under appropriate conditions. Any heuristic definition of the protein domain emphasizing conformational stability divides this domain family in two, in a way that makes no biological sense. Structural domains would be better defined by their ability to adopt a specific tertiary structure: a structure that may or may not be realized, dependent on the circumstances. This explicitly allows for the conditional nature of protein folding, and more clearly demarcates structural domains from intrinsically disordered regions that may function without folding.
Affiliation(s)
- Kavestri Yegambaram
- School of Biological Sciences, University of Auckland, Private Bag 92019, Auckland, 1142, New Zealand
156
Huang W, Greene GL, Ravikumar KM, Yang S. Cross-talk between the ligand- and DNA-binding domains of estrogen receptor. Proteins 2013; 81:1900-9. [PMID: 23737157 DOI: 10.1002/prot.24331] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 01/09/2013] [Revised: 04/22/2013] [Accepted: 05/09/2013] [Indexed: 11/11/2022]
Abstract
Estrogen receptor alpha (ERα) is a hormone-responsive transcription factor that contains several discrete functional domains, including a ligand-binding domain (LBD) and a DNA-binding domain (DBD). Despite a wealth of knowledge about the behaviors of individual domains, the molecular mechanisms of cross-talk between LBD and DBD during signal transduction from hormone to DNA-binding of ERα remain elusive. Here, we apply a multiscale approach combining coarse-grained (CG) and atomistically detailed simulations to characterize this cross-talk mechanism via an investigation of the ERα conformational landscape. First, a CG model of ERα is built based on crystal structures of individual LBDs and DBDs, with more emphasis on their interdomain interactions. Second, molecular dynamics simulations are implemented and enhanced sampling is achieved via the "push-pull-release" strategy in the search for different LBD-DBD orientations. Third, multiple energetically stable ERα conformations are identified on the landscape. A key finding is that estradiol-bound LBDs utilize the well-described activation helix H12 to pack and stabilize LBD-DBD interactions. Our results suggest that the estradiol-bound LBDs can serve as a scaffold to position and stabilize the DBD-DNA complex, consistent with experimental observations of enhanced DNA binding with the LBD. Final assessment using atomic-level simulations shows that these CG-predicted models are significantly stable within a 15-ns simulation window and that specific pairs of lysine residues in close proximity at the domain interfaces could serve as candidate sites for chemical cross-linking studies. Together, these simulation results provide a molecular view of the role of ERα domain interactions in response to hormone binding.
Affiliation(s)
- Wei Huang
- Center for Proteomics and Department of Pharmacology, Case Western Reserve University, Cleveland, Ohio, 44106-4988
157
Tinti M, Johnson C, Toth R, Ferrier DEK, Mackintosh C. Evolution of signal multiplexing by 14-3-3-binding 2R-ohnologue protein families in the vertebrates. Open Biol 2013; 2:120103. [PMID: 22870394 PMCID: PMC3411107 DOI: 10.1098/rsob.120103] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 06/18/2012] [Accepted: 06/29/2012] [Indexed: 01/09/2023]
Abstract
14-3-3 proteins regulate cellular responses to stimuli by docking onto pairs of phosphorylated residues on target proteins. The present study shows that the human 14-3-3-binding phosphoproteome is highly enriched in 2R-ohnologues, which are proteins in families of two to four members that were generated by two rounds of whole genome duplication at the origin of the vertebrates. We identify 2R-ohnologue families whose members share a ‘lynchpin’, defined as a 14-3-3-binding phosphosite that is conserved across members of a given family, and aligns with a Ser/Thr residue in pro-orthologues from the invertebrate chordates. For example, the human receptor expression enhancing protein (REEP) 1–4 family has the commonest type of lynchpin motif in current datasets, with a phosphorylatable serine in the –2 position relative to the 14-3-3-binding phosphosite. In contrast, the second 14-3-3-binding sites of REEPs 1–4 differ and are phosphorylated by different kinases, and hence the REEPs display different affinities for 14-3-3 dimers. We suggest a conceptual model for intracellular regulation involving protein families whose evolution into signal multiplexing systems was facilitated by 14-3-3 dimer binding to lynchpins, which gave freedom for other regulatory sites to evolve. While increased signalling complexity was needed for vertebrate life, these systems also generate vulnerability to genetic disorders.
Affiliation(s)
- Michele Tinti
- MRC Protein Phosphorylation Unit, College of Life Sciences, James Black Centre, University of Dundee, Dow Street, Dundee DD1 5EH, UK
158
Hsu CH, Chen CK, Hwang MJ. The architectural design of networks of protein domain architectures. Biol Lett 2013; 9:20130268. [PMID: 23760167 DOI: 10.1098/rsbl.2013.0268] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
Protein domain architectures (PDAs), in which single domains are linked to form multiple-domain proteins, are a major molecular form used by evolution for the diversification of protein functions. However, the design principles of PDAs remain largely uninvestigated. In this study, we constructed networks to connect domain architectures that had grown out from the same single domain for every single domain in the Pfam-A database and found that there are three main distinctive types of these networks, which suggests that evolution can exploit PDAs in three different ways. Further analysis showed that these three different types of PDA networks are each adopted by different types of protein domains, although many networks exhibit the characteristics of more than one of the three types. Our results shed light on nature's blueprint for protein architecture and provide a framework for understanding architectural design from a network perspective.
Affiliation(s)
- Chia-Hsin Hsu
- Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan, Republic of China
159
Syamaladevi DP, Joshi A, Sowdhamini R. An alignment-free domain architecture similarity search (ADASS) algorithm for inferring homology between multi-domain proteins. Bioinformation 2013; 9:491-9. [PMID: 23861564 PMCID: PMC3705623 DOI: 10.6026/97320630009491] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/19/2012] [Revised: 01/01/2013] [Accepted: 01/02/2013] [Indexed: 11/23/2022]
Abstract
Annotations of genes and their products are largely guided by inferring homology. Sequence similarity is the primary measure used for annotation; however, domain content and order have been given less importance, despite the fact that domain insertions, deletions and positional changes can bring about functional variety. Of late, several methods have been developed to quantify domain architecture similarity, but they depend on alignments of the sequences and focus only on homologous proteins. We present an alignment-free domain architecture similarity search (ADASS) algorithm that identifies proteins that share very poor sequence similarity yet have similar domain architectures. We introduce a “singlet matching-triplet comparison” method in ADASS, wherein a triplet of domains is compared with other triplets in a pairwise comparison of two domain architectures. Different events in the triplet comparison are scored as per a scoring scheme, and an average pairwise distance score (Domain Architecture Distance score, DAD score) is calculated between protein domain architectures. We use the domain architectures of a selected domain, termed the centric domain, and cluster them based on DAD score. The algorithm has a high Positive Prediction Value (PPV) with respect to the clustering of the sequences of selected domain architectures. A comparison of domain architecture based dendrograms using the ADASS method and an existing method revealed that ADASS can classify proteins depending on the extent of domain architecture level similarity. ADASS is more relevant in cases of proteins with tiny domains that contribute little to the overall sequence similarity but significantly to the overall function.
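The general idea of comparing domain triplets without aligning sequences can be sketched in Python as follows. This is a toy illustration only: the abstract does not give the ADASS scoring scheme, so the scoring used here (0 when both neighbours of a shared domain match, 0.5 when one matches, 1 when none match or the domain is missing) and the names `triplets` and `toy_distance` are assumptions, not the published DAD score.

```python
def triplets(arch):
    """Map each domain to the (left, domain, right) triplets it centres.

    The architecture ends are padded with None so terminal domains
    also get a triplet.
    """
    padded = [None] + list(arch) + [None]
    out = {}
    for i in range(1, len(padded) - 1):
        out.setdefault(padded[i], []).append(
            (padded[i - 1], padded[i], padded[i + 1])
        )
    return out


def toy_distance(arch_a, arch_b):
    """Average neighbour mismatch over all domains (hypothetical scoring)."""
    ta, tb = triplets(arch_a), triplets(arch_b)
    domains = set(ta) | set(tb)
    if not domains:
        return 0.0
    total = 0.0
    for d in domains:
        if d not in ta or d not in tb:
            total += 1.0  # domain absent from one architecture
            continue
        # Best (lowest) neighbour mismatch over all triplet pairs centred on d.
        total += min(
            ((x[0] != y[0]) + (x[2] != y[2])) / 2
            for x in ta[d]
            for y in tb[d]
        )
    return total / len(domains)


print(toy_distance(["SH3", "SH2", "Kinase"], ["SH3", "SH2", "Kinase"]))
print(toy_distance(["SH3", "SH2", "Kinase"], ["SH3", "SH2"]))
```

Because only domain labels and their order are compared, two proteins with little sequence similarity can still score as close, which is the property the abstract emphasizes.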
Affiliation(s)
- Divya P Syamaladevi
- Sugarcane Breeding Institute, Indian Council of Agricultural Research, Coimbatore, India, PIN 641 007; National Center for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore 560 065, India
160
Asada Y, Sugahara M, Mizutani H, Naitow H, Tanaka T, Matsuura Y, Agari Y, Ebihara A, Shinkai A, Kuramitsu S, Yokoyama S, Kaminuma E, Kobayashi N, Nishikata K, Shimoyama S, Toyoda T, Ishikawa T, Kunishima N. Integrated database of information from structural genomics experiments. Acta Crystallographica Section D: Biological Crystallography 2013; 69:914-9. [PMID: 23633602 DOI: 10.1107/s0907444913001728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/29/2012] [Accepted: 01/17/2013] [Indexed: 02/05/2023]
Abstract
Information from structural genomics experiments at the RIKEN SPring-8 Center, Japan has been compiled and published as an integrated database. The contents of the database are (i) experimental data from nine species of bacteria that cover a large variety of protein molecules in terms of both evolution and properties (http://database.riken.jp/db/bacpedia), (ii) experimental data from mutant proteins that were designed systematically to study the influence of mutations on the diffraction quality of protein crystals (http://database.riken.jp/db/bacpedia) and (iii) experimental data from heavy-atom-labelled proteins from the heavy-atom database HATODAS (http://database.riken.jp/db/hatodas). The database integration adopts the semantic web, which is suitable for data reuse and automatic processing, thereby allowing batch downloads of full data and data reconstruction to produce new databases. In addition, to enhance the use of data (i) and (ii) by general researchers in biosciences, a comprehensible user interface, Bacpedia (http://bacpedia.harima.riken.jp), has been developed.
Affiliation(s)
- Yukuhiko Asada
- Protein Crystallography Research Group, RIKEN SPring-8 Center, Harima Institute, 1-1-1 Kouto, Sayo-cho, Sayo-gun, Hyogo 679-5148, Japan
161
Bornberg-Bauer E, Albà MM. Dynamics and adaptive benefits of modular protein evolution. Curr Opin Struct Biol 2013; 23:459-66. [PMID: 23562500 DOI: 10.1016/j.sbi.2013.02.012] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Received: 01/08/2013] [Revised: 02/15/2013] [Accepted: 02/15/2013] [Indexed: 11/29/2022]
Abstract
During protein evolution, novel domain arrangements are continuously formed. Rearrangements are important for the creation of molecular biodiversity and for functional molecular changes which underlie developmental shifts in the bauplan of organisms. Here we review the mechanisms by which new arrangements arise and the potential benefits of rearrangements. We concentrate on how new domains emerge and why they rapidly spread across genomes, gaining higher copy numbers than older, more established domains. This spread is most likely a consequence of their high adaptive potential but is unlikely to make up on its own for the drastic loss of domains, which is observed across different taxa. We show that a significant portion of the recently emerged domains, especially those in multidomain families, are highly disordered and speculate about the significance of these findings for the evolvability of novel genetic material.
Affiliation(s)
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, School of Biological Sciences, University of Münster, Hüfferstrasse 1, D48149 Münster, Germany.
162
Bukhari SA, Caetano-Anollés G. Origin and evolution of protein fold designs inferred from phylogenomic analysis of CATH domain structures in proteomes. PLoS Comput Biol 2013; 9:e1003009. [PMID: 23555236 PMCID: PMC3610613 DOI: 10.1371/journal.pcbi.1003009] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 10/25/2012] [Accepted: 02/13/2013] [Indexed: 12/22/2022]
Abstract
The spatial arrangements of secondary structures in proteins, irrespective of their connectivity, depict the overall shape and organization of protein domains. These features have been used in the CATH and SCOP classifications to hierarchically partition fold space and define the architectural make up of proteins. Here we use phylogenomic methods and a census of CATH structures in hundreds of genomes to study the origin and diversification of protein architectures (A) and their associated topologies (T) and superfamilies (H). Phylogenies that describe the evolution of domain structures and proteomes were reconstructed from the structural census and used to generate timelines of domain discovery. Phylogenies of CATH domains at T and H levels of structural abstraction and associated chronologies revealed patterns of reductive evolution, the early rise of Archaea, three epochs in the evolution of the protein world, and patterns of structural sharing between superkingdoms. Phylogenies of proteomes confirmed the early appearance of Archaea. While these findings are in agreement with previous phylogenomic studies based on the SCOP classification, phylogenies unveiled sharing patterns between Archaea and Eukarya that are recent and can explain the canonical bacterial rooting typically recovered from sequence analysis. Phylogenies of CATH domains at A level uncovered general patterns of architectural origin and diversification. The tree of A structures showed that ancient structural designs such as the 3-layer (αβα) sandwich (3.40) or the orthogonal bundle (1.10) are comparatively simpler in their makeup and are involved in basic cellular functions. In contrast, modern structural designs such as prisms, propellers, 2-solenoid, super-roll, clam, trefoil and box are not widely distributed and were probably adopted to perform specialized functions. Our timelines therefore uncover a universal tendency towards protein structural complexity that is remarkable.
Affiliation(s)
- Syed Abbas Bukhari
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
163
Protein structure prediction from sequence variation. Nat Biotechnol 2013; 30:1072-80. [PMID: 23138306 DOI: 10.1038/nbt.2419] [Citation(s) in RCA: 430] [Impact Index Per Article: 39.1] [Received: 08/28/2012] [Accepted: 10/15/2012] [Indexed: 02/07/2023]
Abstract
Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics.
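As a rough illustration of what a covariation signal between two residue positions looks like, the Python sketch below computes the mutual information between two columns of a toy alignment. Mutual information is a simple local measure and is used here as a stand-in; it is not the global statistical method the abstract refers to. The function name `mutual_information` and the four-sequence alignment are illustrative assumptions.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    `msa` is a list of equally long aligned sequences. No sequence
    weighting or pseudocounts are applied in this sketch.
    """
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pairs.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (count / n) * math.log((count * n) / (col_i[a] * col_j[b]))
    return mi


# Toy alignment: columns 0 and 1 change together, column 2 is invariant.
msa = ["AKL", "AKL", "GEL", "GEL"]
print(mutual_information(msa, 0, 1))
print(mutual_information(msa, 0, 2))
```

Columns that vary together score high, while an invariant column carries no covariation signal. Pairwise measures of this kind cannot separate direct from indirect couplings, which is one motivation for global models.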
164
Jaramillo-Garzón JA, Gallardo-Chacón JJ, Castellanos-Domínguez CG, Perera-Lluna A. Predictability of gene ontology slim-terms from primary structure information in Embryophyta plant proteins. BMC Bioinformatics 2013; 14:68. [PMID: 23441934 PMCID: PMC3660269 DOI: 10.1186/1471-2105-14-68] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/08/2012] [Accepted: 02/19/2013] [Indexed: 11/25/2022]
Abstract
Background: Proteins are the key elements on the path from genetic information to the development of life. The roles played by the different proteins are difficult to uncover experimentally, as this process involves complex procedures such as genetic modifications, injection of fluorescent proteins, gene knock-out methods and others. The knowledge learned from each protein is usually annotated in databases through different methods such as those proposed by the Gene Ontology (GO) consortium. Different methods have been proposed to predict GO terms from primary structure information, but very few are available for large-scale functional annotation of plants, and reported success rates are much lower than those reported by non-plant predictors. This paper explores the predictability of GO annotations on proteins belonging to the Embryophyta group from a set of features extracted solely from their primary amino acid sequence. Results: High predictability of several GO terms was found for Molecular Function and Cellular Component. As expected, a lower degree of predictability was found on Biological Process ontology annotations, although a few biological processes were easily predicted. Proteins related to transport and transcription were particularly well predicted from primary structure information. The most discriminant features for prediction were those related to electric charges of the amino-acid sequence and hydropathicity-derived features. Conclusions: An analysis of GO-slim term predictability in plants was carried out in order to determine single categories or groups of functions that are most related to primary structure information. For each highly predictable GO term, the features responsible for that success were identified and discussed. In contrast to most published studies, which focus on a few categories or single ontologies, the results in this paper comprise a complete landscape of GO predictability from primary structure encompassing 75 GO terms at molecular, cellular and phenotypical level. Thus, it provides a valuable guide for researchers interested in further advances in protein function prediction on Embryophyta plants.
Affiliation(s)
- Jorge Alberto Jaramillo-Garzón
- Departamento de Ingeniería Eléctrica, Electrónica y Computación, Universidad Nacional de Colombia sede Manizales, Campus La Nubia, Km 7 Vía al Magdalena, Manizales-Caldas, Colombia.
165
Yafremava LS, Wielgos M, Thomas S, Nasir A, Wang M, Mittenthal JE, Caetano-Anollés G. A general framework of persistence strategies for biological systems helps explain domains of life. Front Genet 2013; 4:16. [PMID: 23443991 PMCID: PMC3580334 DOI: 10.3389/fgene.2013.00016] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 09/17/2012] [Accepted: 01/28/2013] [Indexed: 11/13/2022]
Abstract
The nature and cause of the division of organisms in superkingdoms is not fully understood. Assuming that environment shapes physiology, here we construct a novel theoretical framework that helps identify general patterns of organism persistence. This framework is based on Jacob von Uexküll's organism-centric view of the environment and James G. Miller's view of organisms as matter-energy-information processing molecular machines. Three concepts describe an organism's environmental niche: scope, umwelt, and gap. Scope denotes the entirety of environmental events and conditions to which the organism is exposed during its lifetime. Umwelt encompasses an organism's perception of these events. The gap is the organism's blind spot, the scope that is not covered by umwelt. These concepts bring organisms of different complexity to a common ecological denominator. Ecological and physiological data suggest organisms persist using three strategies: flexibility, robustness, and economy. All organisms use umwelt information to flexibly adapt to environmental change. They implement robustness against environmental perturbations within the gap generally through redundancy and reliability of internal constituents. Both flexibility and robustness improve survival. However, they also incur metabolic matter-energy processing costs, which otherwise could have been used for growth and reproduction. Lineages evolve unique tradeoff solutions among strategies in the space of what we call "a persistence triangle." Protein domain architecture and other evidence support the preferential use of flexibility and robustness properties. Archaea and Bacteria gravitate toward the triangle's economy vertex, with Archaea biased toward robustness. Eukarya trade economy for survivability. Protista occupy a saddle manifold separating akaryotes from multicellular organisms. Plants and the more flexible Fungi share an economic stratum, and Metazoa are locked in a positive feedback loop toward flexibility.
Affiliation(s)
- Liudmila S Yafremava
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana, IL, USA
166
Moore AD, Grath S, Schüler A, Huylmans AK, Bornberg-Bauer E. Quantification and functional analysis of modular protein evolution in a dense phylogenetic tree. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1834:898-907. [PMID: 23376183 DOI: 10.1016/j.bbapap.2013.01.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 10/19/2012] [Revised: 01/06/2013] [Accepted: 01/09/2013] [Indexed: 12/24/2022]
Abstract
Modularity is a hallmark of molecular evolution. Whether considering gene regulation, the components of metabolic pathways or signaling cascades, the ability to reuse autonomous modules in different molecular contexts can expedite evolutionary innovation. Similarly, protein domains are the modules of proteins, and modular domain rearrangements can create diversity with seemingly few operations in turn allowing for swift changes to an organism's functional repertoire. Here, we assess the patterns and functional effects of modular rearrangements at high resolution. Using a well resolved and diverse group of pancrustaceans, we illustrate arrangement diversity within closely related organisms, estimate arrangement turnover frequency and establish, for the first time, branch-specific rate estimates for fusion, fission, domain addition and terminal loss. Our results show that roughly 16 new arrangements arise per million years and that between 64% and 81% of these can be explained by simple, single-step modular rearrangement events. We find evidence that the frequencies of fission and terminal deletion events increase over time, and that modular rearrangements impact all levels of the cellular signaling apparatus and thus may have strong adaptive potential. Novel arrangements that cannot be explained by simple modular rearrangements contain a significant amount of repeat domains that occur in complex patterns which we term "supra-repeats". Furthermore, these arrangements are significantly longer than those with a single-step rearrangement solution, suggesting that such arrangements may result from multi-step events. In summary, our analysis provides an integrated view and initial quantification of the patterns and functional impact of modular protein evolution in a well resolved phylogenetic tree. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.
Affiliation(s)
- Andrew D Moore
- Institute for Evolution and Biodiversity, Münster, Germany
167
Low-resolution structural modeling of protein interactome. Curr Opin Struct Biol 2013; 23:198-205. [PMID: 23294579 DOI: 10.1016/j.sbi.2012.12.003] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 11/07/2012] [Accepted: 12/03/2012] [Indexed: 11/23/2022]
Abstract
Structural characterization of protein-protein interactions across the broad spectrum of scales is key to our understanding of life at the molecular level. Low-resolution approach to protein interactions is needed for modeling large interaction networks, given the significant level of uncertainties in large biomolecular systems and the high-throughput nature of the task. Since only a fraction of protein structures in interactome are determined experimentally, protein docking approaches are increasingly focusing on modeled proteins. Current rapid advancement of template-based modeling of protein-protein complexes is following a long standing trend in structure prediction of individual proteins. Protein-protein templates are already available for almost all interactions of structurally characterized proteins, and about one third of such templates are likely correct.
168
Sigrist CJA, de Castro E, Cerutti L, Cuche BA, Hulo N, Bridge A, Bougueleret L, Xenarios I. New and continuing developments at PROSITE. Nucleic Acids Res 2012; 41:D344-7. [PMID: 23161676 PMCID: PMC3531220 DOI: 10.1093/nar/gks1067] [Citation(s) in RCA: 923] [Impact Index Per Article: 76.9] [Indexed: 11/17/2022] Open
Abstract
PROSITE (http://prosite.expasy.org/) consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE signatures, together with ProRule, are used for the annotation of domains and features of UniProtKB/Swiss-Prot entries. Here, we describe recent developments that allow users to perform whole-proteome annotation as well as a number of filtering options that can be combined to perform powerful targeted searches for biological discovery. The latest version of PROSITE (release 20.85, of 30 August 2012) contains 1308 patterns, 1039 profiles and 1041 ProRules.
Affiliation(s)
- Christian J A Sigrist
- SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire (CMU), 1 rue Michel Servet, 1211 Geneva 4, Switzerland.
169
Rekapalli B, Wuichet K, Peterson GD, Zhulin IB. Dynamics of domain coverage of the protein sequence universe. BMC Genomics 2012; 13:634. [PMID: 23157439 PMCID: PMC3557196 DOI: 10.1186/1471-2164-13-634] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/29/2012] [Accepted: 11/11/2012] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its "dark matter". RESULTS Here we suggest that true size of "dark matter" is much larger than stated by current definitions. We propose an approach to reducing the size of "dark matter" by identifying and subtracting regions in protein sequences that are not likely to contain any domain. CONCLUSIONS Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of "dark matter"; however, its absolute size increases substantially with the growth of sequence data.
Affiliation(s)
- Bhanu Rekapalli
- Joint Institute for Computational Sciences, Oak Ridge National Laboratory - University of Tennessee, Oak Ridge, TN 37831, USA
170
Yu C, Deng M, Cheng SY, Yau SC, He RL, Yau SST. Protein space: a natural method for realizing the nature of protein universe. J Theor Biol 2012; 318:197-204. [PMID: 23154188 DOI: 10.1016/j.jtbi.2012.11.005] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 07/07/2012] [Revised: 11/01/2012] [Accepted: 11/02/2012] [Indexed: 10/27/2022]
Abstract
Current methods cannot tell us what the nature of the protein universe is concretely. They are based on different models of amino acid substitution and multiple sequence alignment which is an NP-hard problem and requires manual intervention. Protein structural analysis also gives a direction for mapping the protein universe. Unfortunately, now only a minuscule fraction of proteins' 3-dimensional structures are known. Furthermore, the phylogenetic tree representations are not unique for any existing tree construction methods. Here we develop a novel method to realize the nature of protein universe. We show the protein universe can be realized as a protein space in 60-dimensional Euclidean space using a distance based on a normalized distribution of amino acids. Every protein is in one-to-one correspondence with a point in protein space, where proteins with similar properties stay close together. Thus the distance between two points in protein space represents the biological distance of the corresponding two proteins. We also propose a natural graphical representation for inferring phylogenies. The representation is natural and unique based on the biological distances of proteins in protein space. This will solve the fundamental question of how proteins are distributed in the protein universe.
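The abstract does not spell out the 60 coordinates. As a rough illustration only, the sketch below assumes three numbers per amino acid (count, mean position, and a length-normalized second central moment of positions), which gives 20 × 3 = 60 dimensions. The function names `natural_vector` and `protein_distance` are hypothetical and are not taken from the paper.

```python
from math import sqrt

# The 20 standard amino acids, one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def natural_vector(seq):
    """Map a protein sequence to a 60-dimensional vector.

    Assumed layout: 20 counts, then 20 mean positions, then 20
    normalized second central moments of the positions.
    """
    n = len(seq)
    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        # 1-based positions at which this amino acid occurs.
        positions = [i + 1 for i, c in enumerate(seq) if c == aa]
        n_k = len(positions)
        if n_k == 0:
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(positions) / n_k
        d2 = sum((p - mu) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu)
        moments.append(d2)
    return counts + means + moments


def protein_distance(seq_a, seq_b):
    """Euclidean distance between the two 60-dimensional vectors."""
    va, vb = natural_vector(seq_a), natural_vector(seq_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

Under these assumptions no alignment is needed: each sequence is reduced to a fixed-length vector, and comparison is a single Euclidean distance.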
Affiliation(s)
- Chenglong Yu
- Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, Chicago, IL, USA
171
Minkiewicz P, Bucholska J, Darewicz M, Borawska J. Epitopic hexapeptide sequences from Baltic cod parvalbumin beta (allergen Gad c 1) are common in the universal proteome. Peptides 2012; 38:105-9. [PMID: 22940202 DOI: 10.1016/j.peptides.2012.08.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/18/2012] [Revised: 08/14/2012] [Accepted: 08/14/2012] [Indexed: 01/25/2023]
Abstract
The aim of this study was to analyze the distribution of hexapeptide fragments considered as epitopes of Baltic cod parvalbumin beta (allergen Gad c 1) in the universal proteome. Cod (Gadus morhua subsp. callarias) parvalbumin hexapeptides cataloged in the Immune Epitope Database were used as query sequences. The UniProt database was screened using the WU-BLAST 2 program. The distribution of hexapeptide fragments was investigated in various protein families, classified according to the presence of the appropriate domains, and in proteins of plant, animal and microbial species. Hexapeptides from cod parvalbumin were found in the proteins of plants and animals which are food sources, microorganisms with various applications in food technology and biotechnology, microorganisms which are human symbionts and commensals as well as human pathogens. In the last case possible coverage between epitopes from pathogens and allergens should be avoided during vaccine design.
Affiliation(s)
- Piotr Minkiewicz
- University of Warmia and Mazury in Olsztyn, Chair of Food Biochemistry, Olsztyn-Kortowo, Poland.
172
Evolutionary analyses of non-genealogical bonds produced by introgressive descent. Proc Natl Acad Sci U S A 2012; 109:18266-72. [PMID: 23090996 DOI: 10.1073/pnas.1206541109] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Indexed: 11/18/2022] Open
Abstract
All evolutionary biologists are familiar with evolutionary units that evolve by vertical descent in a tree-like fashion in single lineages. However, many other kinds of processes contribute to evolutionary diversity. In vertical descent, the genetic material of a particular evolutionary unit is propagated by replication inside its own lineage. In what we call introgressive descent, the genetic material of a particular evolutionary unit propagates into different host structures and is replicated within these host structures. Thus, introgressive descent generates a variety of evolutionary units and leaves recognizable patterns in resemblance networks. We characterize six kinds of evolutionary units, of which five involve mosaic lineages generated by introgressive descent. To facilitate detection of these units in resemblance networks, we introduce terminology based on two notions, P3s (subgraphs of three nodes: A, B, and C) and mosaic P3s, and suggest an apparatus for systematic detection of introgressive descent. Mosaic P3s correspond to a distinct type of evolutionary bond that is orthogonal to the bonds of kinship and genealogy usually examined by evolutionary biologists. We argue that recognition of these evolutionary bonds stimulates radical rethinking of key questions in evolutionary biology (e.g., the relations among evolutionary players in very early phases of evolutionary history, the origin and emergence of novelties, and the production of new lineages). This line of research will expand the study of biological complexity beyond the usual genealogical bonds, revealing additional sources of biodiversity. It provides an important step to a more realistic pluralist treatment of evolutionary complexity.
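The abstract describes P3s only as subgraphs of three nodes A, B, and C. A minimal sketch, assuming the usual graph-theoretic reading (an induced path in which B is linked to both A and C while A and C are not linked to each other), could enumerate them as follows. The function name `find_p3s` is hypothetical, and the extra condition that makes a P3 "mosaic" is not modeled here.

```python
from itertools import combinations


def find_p3s(edges):
    """Return all induced three-node paths (a, b, c) with b in the middle.

    `edges` is an iterable of undirected (u, v) pairs from a
    resemblance network; node labels must be sortable.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    p3s = []
    for b in sorted(adjacency):
        # Every pair of neighbours of b that are not themselves linked
        # forms an induced path a - b - c.
        for a, c in combinations(sorted(adjacency[b]), 2):
            if c not in adjacency[a]:
                p3s.append((a, b, c))
    return p3s
```

For example, `find_p3s([("A", "B"), ("B", "C")])` yields one triple with B as the intermediate node, whereas a fully connected triangle yields none.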
173
Assessing the accuracy of template-based structure prediction metaservers by comparison with structural genomics structures. J Struct Funct Genomics 2012; 13:213-25. [PMID: 23086054 DOI: 10.1007/s10969-012-9146-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/26/2012] [Accepted: 09/26/2012] [Indexed: 12/19/2022]
Abstract
The explosion of the size of the universe of known protein sequences has stimulated two complementary approaches to structural mapping of these sequences: theoretical structure prediction and experimental determination by structural genomics (SG). In this work, we assess the accuracy of structure prediction by two automated template-based structure prediction metaservers (genesilico.pl and bioinfo.pl) by measuring the structural similarity of the predicted models to corresponding experimental models determined a posteriori. Of 199 targets chosen from SG programs, the metaservers predicted the structures of about a fourth of them "correctly." (In this case, "correct" was defined as placing more than 70 % of the alpha carbon atoms in the model within 2 Å of the experimentally determined positions.) Almost all of the targets that could be modeled to this accuracy were those with an available template in the Protein Data Bank (PDB) with more than 25 % sequence identity. The majority of those SG targets with lower sequence identity to structures in the PDB were not predicted by the metaservers with this accuracy. We also compared metaserver results to CASP8 results, finding that the models obtained by participants in the CASP competition were significantly better than those produced by the metaservers.
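The "correct" criterion quoted above (more than 70 % of alpha carbons within 2 Å of the experimental positions) can be expressed as a short check. This is a sketch under stated assumptions: the model and the experimental structure are taken to be already superimposed and to list matching residues in the same order, and the function names `fraction_within` and `is_correct` are hypothetical.

```python
from math import dist


def fraction_within(model_ca, reference_ca, cutoff=2.0):
    """Fraction of alpha carbons lying within `cutoff` angstroms.

    Both arguments are equal-length sequences of (x, y, z) tuples,
    assumed to be superimposed and residue-matched beforehand.
    """
    if not model_ca or len(model_ca) != len(reference_ca):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    close = sum(
        1 for m, r in zip(model_ca, reference_ca) if dist(m, r) <= cutoff
    )
    return close / len(model_ca)


def is_correct(model_ca, reference_ca, cutoff=2.0, min_fraction=0.70):
    """Apply the abstract's threshold: strictly more than 70 % within 2 A."""
    return fraction_within(model_ca, reference_ca, cutoff) > min_fraction
```

The superposition step itself is left out on purpose, since the abstract does not say how the models were aligned to the experimental structures.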
174
Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 2012; 490:556-60. [PMID: 23023127 PMCID: PMC3482288 DOI: 10.1038/nature11503] [Citation(s) in RCA: 485] [Impact Index Per Article: 40.4] [Received: 06/29/2011] [Accepted: 08/10/2012] [Indexed: 12/23/2022]
Abstract
The genome-wide identification of pairs of interacting proteins is an important step in the elucidation of cell regulatory mechanisms [1,2]. Much of our current knowledge derives from high-throughput techniques such as yeast two hybrid and affinity purification [3], as well as from manual curation of experiments on individual systems [4]. A variety of computational approaches based, for example, on sequence homology, gene co-expression, and phylogenetic profiles have also been developed for the genome-wide inference of protein-protein interactions (PPIs) [5,6]. Yet, comparative studies suggest that the development of accurate and complete repertoires of PPIs is still in its early stages [7–9]. Here we show that three-dimensional structural information can be used to predict PPIs with an accuracy and coverage that are superior to predictions based on non-structural evidence. Moreover, an algorithm, PrePPI, that combines structural information with other functional clues is comparable in accuracy to high-throughput experiments, yielding over 30,000 high confidence interactions for yeast and over 300,000 for human. Experimental tests of a number of predictions demonstrate the ability of the PrePPI algorithm to identify unexpected PPIs of significant biological interest. The surprising effectiveness of three-dimensional structural information can be attributed to the use of homology models combined with the exploitation of both close and remote geometric relationships between proteins.
175
Tiwari MK, Singh R, Singh RK, Kim IW, Lee JK. Computational approaches for rational design of proteins with novel functionalities. Comput Struct Biotechnol J 2012; 2:e201209002. [PMID: 24688643 PMCID: PMC3962203 DOI: 10.5936/csbj.201209002] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 05/31/2012] [Revised: 08/17/2012] [Accepted: 08/23/2012] [Indexed: 11/22/2022] Open
Abstract
Proteins are the most multifaceted macromolecules in living systems and have various important functions, including structural, catalytic, sensory, and regulatory functions. Rational design of enzymes is a great challenge to our understanding of protein structure and physical chemistry and has numerous potential applications. Protein design algorithms have been applied to design or engineer proteins that fold, fold faster, catalyze, catalyze faster, signal, and adopt preferred conformational states. The field of de novo protein design, although only a few decades old, is beginning to produce exciting results. Developments in this field are already having a significant impact on biotechnology and chemical biology. The application of powerful computational methods for functional protein designing has recently succeeded at engineering target activities. Here, we review recently reported de novo functional proteins that were developed using various protein design approaches, including rational design, computational optimization, and selection from combinatorial libraries, highlighting recent advances and successes.
Affiliation(s)
- Manish Kumar Tiwari
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea (these authors contributed equally)
- Ranjitha Singh
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea (these authors contributed equally)
- Raushan Kumar Singh
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
- In-Won Kim
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
- Jung-Kul Lee
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea; Institute of SK-KU Biomaterials, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
176
Mello LV, Rigden DJ. A new family of bacterial DNA repair proteins annotated by the integration of non-homology, distant homology and structural bioinformatic methods. FEBS Lett 2012; 586:3908-13. [DOI: 10.1016/j.febslet.2012.09.023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/03/2012] [Revised: 09/13/2012] [Accepted: 09/14/2012] [Indexed: 10/27/2022]
177
The dynamic disulphide relay of quiescin sulphydryl oxidase. Nature 2012; 488:414-8. [PMID: 22801504 DOI: 10.1038/nature11267] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Received: 08/02/2011] [Accepted: 05/28/2012] [Indexed: 12/16/2022]
Abstract
Protein stability, assembly, localization and regulation often depend on the formation of disulphide crosslinks between cysteine side chains. Enzymes known as sulphydryl oxidases catalyse de novo disulphide formation and initiate intra- and intermolecular dithiol/disulphide relays to deliver the disulphides to substrate proteins. Quiescin sulphydryl oxidase (QSOX) is a unique, multi-domain disulphide catalyst that is localized primarily to the Golgi apparatus and secreted fluids and has attracted attention owing to its overproduction in tumours. In addition to its physiological importance, QSOX is a mechanistically intriguing enzyme, encompassing functions typically carried out by a series of proteins in other disulphide-formation pathways. How disulphides are relayed through the multiple redox-active sites of QSOX and whether there is a functional benefit to concatenating these sites on a single polypeptide are open questions. Here we present the first crystal structure of an intact QSOX enzyme, derived from a trypanosome parasite. Notably, sequential sites in the disulphide relay were found more than 40 Å apart in this structure, too far for direct disulphide transfer. To resolve this puzzle, we trapped and crystallized an intermediate in the disulphide hand-off, which showed a 165° domain rotation relative to the original structure, bringing the two active sites within disulphide-bonding distance. The comparable structure of a mammalian QSOX enzyme, also presented here, shows further biochemical features that facilitate disulphide transfer in metazoan orthologues. Finally, we quantified the contribution of concatenation to QSOX activity, providing general lessons for the understanding of multi-domain enzymes and the design of new catalytic relays.
178
Arviv O, Levy Y. Folding of multidomain proteins: Biophysical consequences of tethering even in apparently independent folding. Proteins 2012; 80:2780-98. [DOI: 10.1002/prot.24161] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 04/08/2012] [Revised: 07/11/2012] [Accepted: 07/16/2012] [Indexed: 01/09/2023]
179
Caetano-Anollés G, Nasir A. Benefits of using molecular structure and abundance in phylogenomic analysis. Front Genet 2012; 3:172. [PMID: 22973296 PMCID: PMC3434437 DOI: 10.3389/fgene.2012.00172] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 07/25/2012] [Accepted: 08/18/2012] [Indexed: 12/25/2022] Open
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana-Champaign, IL, USA
180
Ravikumar K, Huang W, Yang S. Coarse-grained simulations of protein-protein association: an energy landscape perspective. Biophys J 2012; 103:837-45. [PMID: 22947945 PMCID: PMC3443792 DOI: 10.1016/j.bpj.2012.07.013] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 05/15/2012] [Revised: 07/10/2012] [Accepted: 07/12/2012] [Indexed: 01/15/2023] Open
Abstract
Understanding protein-protein association is crucial in revealing the molecular basis of many biological processes. Here, we describe a theoretical simulation pipeline to study protein-protein association from an energy landscape perspective. First, a coarse-grained model is implemented and its applications are demonstrated via molecular dynamics simulations for several protein complexes. Second, an enhanced search method is used to efficiently sample a broad range of protein conformations. Third, multiple conformations are identified and clustered from simulation data and further projected on a three-dimensional globe specifying protein orientations and interacting energies. Results from several complexes indicate that the crystal-like conformation is favorable on the energy landscape even if the landscape is relatively rugged with metastable conformations. A closer examination on molecular forces shows that the formation of associated protein complexes can be primarily electrostatics-driven, hydrophobics-driven, or a combination of both in stabilizing specific binding interfaces. Taken together, these results suggest that the coarse-grained simulations and analyses provide an alternative toolset to study protein-protein association occurring in functional biomolecular complexes.
Affiliation(s)
- Sichun Yang
- Center for Proteomics and Department of Pharmacology, Case Western Reserve University, Cleveland, Ohio
181
Garma L, Mukherjee S, Mitra P, Zhang Y. How many protein-protein interactions types exist in nature? PLoS One 2012; 7:e38913. [PMID: 22719985 PMCID: PMC3374795 DOI: 10.1371/journal.pone.0038913] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 02/16/2012] [Accepted: 05/14/2012] [Indexed: 11/18/2022] Open
Abstract
“Protein quaternary structure universe” refers to the ensemble of all protein-protein complexes across all organisms in nature. The number of quaternary folds thus corresponds to the number of ways proteins physically interact with other proteins. This study focuses on answering two basic questions: Whether the number of protein-protein interactions is limited and, if yes, how many different quaternary folds exist in nature. By all-to-all sequence and structure comparisons, we grouped the protein complexes in the protein data bank (PDB) into 3,629 families and 1,761 folds. A statistical model was introduced to obtain the quantitative relation between the numbers of quaternary families and quaternary folds in nature. The total number of possible protein-protein interactions was estimated around 4,000, which indicates that the current protein repository contains only 42% of quaternary folds in nature and a full coverage needs approximately a quarter century of experimental effort. The results have important implications to the protein complex structural modeling and the structure genomics of protein-protein interactions.
Affiliation(s)
- Leonardo Garma
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Biocenter Oulu and Department of Biochemistry, University of Oulu, Oulu, Finland
- Srayanta Mukherjee
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Pralay Mitra
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
182
Templates are available to model nearly all complexes of structurally characterized proteins. Proc Natl Acad Sci U S A 2012; 109:9438-41. [PMID: 22645367 DOI: 10.1073/pnas.1200678109] [Citation(s) in RCA: 147] [Impact Index Per Article: 12.3] [Indexed: 01/28/2023] Open
Abstract
Traditional approaches to protein-protein docking sample the binding modes with no regard to similar experimentally determined structures (templates) of protein-protein complexes. Emerging template-based docking approaches utilize such similar complexes to determine the docking predictions. The docking problem assumes the knowledge of the participating proteins' structures. Thus, it provides the possibility of aligning the structures of the proteins and the template complexes. The progress in the development of template-based docking and the vast experience in template-based modeling of individual proteins show that, generally, such approaches are more reliable than the free modeling. The key aspect of this modeling paradigm is the availability of the templates. The current common perception is that due to the difficulties in experimental structure determination of protein-protein complexes, the pool of docking templates is insignificant, and thus a broad application of template-based docking is possible only at some future time. The results of our large scale, systematic study show that, surprisingly, in spite of the limited number of protein-protein complexes in the Protein Data Bank, docking templates can be found for complexes representing almost all the known protein-protein interactions, provided the components themselves have a known structure or can be homology-built. About one-third of the templates are of good quality when they are compared to experimental structures in test sets extracted from the Protein Data Bank and would be useful starting points in modeling the complexes. This finding dramatically expands our ability to model protein interactions, and has far-reaching implications for the protein docking field in general.
183
Steczkiewicz K, Muszewska A, Knizewski L, Rychlewski L, Ginalski K. Sequence, structure and functional diversity of PD-(D/E)XK phosphodiesterase superfamily. Nucleic Acids Res 2012; 40:7016-45. [PMID: 22638584 PMCID: PMC3424549 DOI: 10.1093/nar/gks382] [Citation(s) in RCA: 109] [Impact Index Per Article: 9.1] [Indexed: 01/04/2023] Open
Abstract
Proteins belonging to PD-(D/E)XK phosphodiesterases constitute a functionally diverse superfamily with representatives involved in replication, restriction, DNA repair and tRNA-intron splicing. Their malfunction in humans triggers severe diseases, such as Fanconi anemia and Xeroderma pigmentosum. To date there have been several attempts to identify and classify new PD-(D/E)XK phosphodiesterases using remote homology detection methods. Such efforts are complicated, because the superfamily exhibits extreme sequence and structural divergence. Using advanced homology detection methods supported with superfamily-wide domain architecture and horizontal gene transfer analyses, we provide a comprehensive reclassification of proteins containing a PD-(D/E)XK domain. The PD-(D/E)XK phosphodiesterases span over 21,900 proteins, which can be classified into 121 groups of various families. Eleven of them, including DUF4420, DUF3883, DUF4263, COG5482, COG1395, Tsp45I, HaeII, Eco47II, ScaI, HpaII and Replic_Relax, are newly assigned to the PD-(D/E)XK superfamily. Some groups of PD-(D/E)XK proteins are present in all domains of life, whereas others occur within small numbers of organisms. We observed multiple horizontal gene transfers even between human pathogenic bacteria or from Prokaryota to Eukaryota. Uncommon domain arrangements greatly elaborate the PD-(D/E)XK world. These include domain architectures suggesting regulatory roles in Eukaryotes, like stress sensing and cell-cycle regulation. Our results may inspire further experimental studies aimed at identification of exact biological functions, specific substrates and molecular mechanisms of reactions performed by these highly diverse proteins.
Affiliation(s)
- Kamil Steczkiewicz
- Laboratory of Bioinformatics and Systems Biology, CENT, University of Warsaw, Zwirki i Wigury 93, 02-089 Warsaw, Poland
184
Rodrigues JPGLM, Levitt M, Chopra G. KoBaMIN: a knowledge-based minimization web server for protein structure refinement. Nucleic Acids Res 2012; 40:W323-8. [PMID: 22564897 PMCID: PMC3394243 DOI: 10.1093/nar/gks376] [Citation(s) in RCA: 109] [Impact Index Per Article: 9.1] [Indexed: 11/14/2022] Open
Abstract
The KoBaMIN web server provides an online interface to a simple, consistent and computationally efficient protein structure refinement protocol based on minimization of a knowledge-based potential of mean force. The server can be used to refine either a single protein structure or an ensemble of proteins starting from their unrefined coordinates in PDB format. The refinement method is particularly fast and accurate due to the underlying knowledge-based potential derived from structures deposited in the PDB; as such, the energy function implicitly includes the effects of solvent and the crystal environment. Our server allows for an optional but recommended step that optimizes stereochemistry using the MESHI software. The KoBaMIN server also allows comparison of the refined structures with a provided reference structure to assess the changes brought about by the refinement protocol. The performance of KoBaMIN has been benchmarked widely on a large set of decoys, all models generated at the seventh worldwide experiments on critical assessment of techniques for protein structure prediction (CASP7) and it was also shown to produce top-ranking predictions in the refinement category at both CASP8 and CASP9, yielding consistently good results across a broad range of model quality values. The web server is fully functional and freely available at http://csb.stanford.edu/kobamin.
Affiliation(s)
- João P G L M Rodrigues
- Department of Structural Biology, 299 Campus Dr W, Fairchild Bldg, Room D100, Stanford University, Stanford, CA 94305, USA
|
185
|
Moreno-Hernández S, Levitt M. Comparative modeling and protein-like features of hydrophobic-polar models on a two-dimensional lattice. Proteins 2012; 80:1683-93. [PMID: 22411636 DOI: 10.1002/prot.24067] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 11/15/2011] [Revised: 02/26/2012] [Accepted: 03/03/2012] [Indexed: 11/07/2022]
Abstract
Lattice models of proteins have been extensively used to study protein thermodynamics, folding dynamics, and evolution. Our study considers two different hydrophobic-polar (HP) models on the 2D square lattice: the purely HP model and a model where a compactness-favoring term is added. We exhaustively enumerate all the possible structures in our models and perform the study of their corresponding folds, HP arrangements in space and shapes. The two models considered differ greatly in their numbers of structures, folds, arrangements, and shapes. Despite their differences, both lattice models have distinctive protein-like features: (1) Shapes are compact in both models, especially when a compactness-favoring energy term is added. (2) The residue composition is independent of the chain length and is very close to 50% hydrophobic in both models, as we observe in real proteins. (3) Comparative modeling works well in both models, particularly in the more compact one. The fact that our models show protein-like features suggests that lattice models incorporate the fundamental physical principles of proteins. Our study supports the use of lattice models to study questions about proteins that require exactness and extensive calculations, such as protein design and evolution, which are often too complex and computationally demanding to be addressed with more detailed models.
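As a rough illustration of the exhaustive enumeration described in this abstract, the Python sketch below lists every self-avoiding chain on the 2D square lattice and scores it with the conventional pure HP energy (minus one per non-bonded H-H contact). The function names, the fixed first step used to remove rotational copies, and the energy convention are assumptions made for this example; the compactness-favoring term of the second model is not included.

```python
# Moves allowed on the 2D square lattice.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n):
    """Yield every self-avoiding chain of n residues on the square lattice.

    The chain starts at the origin and, for n > 1, its first step is fixed
    to +x so that rotated copies are not generated (mirror images remain).
    """
    def extend(path, occupied):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    if n == 1:
        yield ((0, 0),)
        return
    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start))


def hh_contacts(path, sequence):
    """Count H-H lattice neighbours that are not adjacent along the chain."""
    index = {pos: i for i, pos in enumerate(path)}
    count = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                count += 1
    return count


def ground_states(sequence):
    """Return (lowest energy, list of structures reaching it) for an HP string."""
    best, states = -1, []
    for path in self_avoiding_walks(len(sequence)):
        contacts = hh_contacts(path, sequence)
        if contacts > best:
            best, states = contacts, [path]
        elif contacts == best:
            states.append(path)
    return -best, states


energy, states = ground_states("HPPH")
print(energy, len(states))
```

The number of chains grows exponentially with length, so a sketch like this is only practical for short sequences.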
Affiliation(s)
- Sergio Moreno-Hernández
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
|
186
|
Montelione GT. The Protein Structure Initiative: achievements and visions for the future. F1000 BIOLOGY REPORTS 2012; 4:7. [PMID: 22500193 PMCID: PMC3318194 DOI: 10.3410/b4-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Indexed: 12/27/2022]
Abstract
The Protein Structure Initiative (PSI) was established in 2000 by the National Institute of General Medical Sciences with the long-term goal of providing 3D (three-dimensional) structural information for most proteins in nature. As advances in genomic sequencing, bioinformatics, homology modelling, and methods for rapid determination of 3D structures of proteins by X-ray crystallography and nuclear magnetic resonance (NMR) converged, it was proposed that our understanding of the biology of protein structure and evolution could be greatly enabled by ‘genomic-scale’ protein structure determination. Over the past 12 years, the PSI has evolved from a testing bed for new methods of sample and structure production to a core component of a wide range of biology programs.
Affiliation(s)
- Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers University; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School; Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
|
187
|
Kersting AR, Bornberg-Bauer E, Moore AD, Grath S. Dynamics and adaptive benefits of protein domain emergence and arrangements during plant genome evolution. Genome Biol Evol 2012; 4:316-29. [PMID: 22250127 PMCID: PMC3318442 DOI: 10.1093/gbe/evs004] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Indexed: 12/20/2022] Open
Abstract
Plant genomes are generally very large, mostly paleopolyploid, and have numerous gene duplicates and complex genomic features such as repeats and transposable elements. Many of these features have been hypothesized to enable plants, which cannot easily escape environmental challenges, to rapidly adapt. Another mechanism, which has recently been well described as a major facilitator of rapid adaptation in bacteria, animals, and fungi but not yet for plants, is modular rearrangement of protein-coding genes. Due to the high precision of profile-based methods, rearrangements can be well captured at the protein level by characterizing the emergence, loss, and rearrangements of protein domains, their structural, functional, and evolutionary building blocks. Here, we study the dynamics of domain rearrangements and explore their adaptive benefit in 27 plant and 3 algal genomes. We use a phylogenomic approach by which we can explain the formation of 88% of all arrangements by single-step events, such as fusion, fission, and terminal loss of domains. We find many domains are lost along every lineage, but at least 500 domains are novel, that is, they are unique to green plants and emerged more or less recently. These novel domains duplicate and rearrange more readily within their genomes than ancient domains and are overproportionally involved in stress response and developmental innovations. Novel domains more often affect regulatory proteins and show a higher degree of structural disorder than ancient domains. Whereas a relatively large and well-conserved core set of single-domain proteins exists, long multi-domain arrangements tend to be species-specific. We find that duplicated genes are more often involved in rearrangements. Although fission events typically impact metabolic proteins, fusion events often create new signaling proteins essential for environmental sensing. 
Taken together, the high volatility of single domains and complex arrangements in plant genomes demonstrates the importance of modularity for environmental adaptability of plants.
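The single-step events named in this abstract (fusion, fission, and terminal loss) can be illustrated with a small Python sketch that labels a descendant domain arrangement relative to a set of ancestral arrangements. The rules, the tuple representation, and the function name `classify_arrangement` are simplifying assumptions for this example, not the criteria used in the study, which the abstract does not specify.

```python
def classify_arrangement(descendant, ancestral, descendant_set):
    """Label one descendant arrangement with a simplified single-step event.

    Arrangements are non-empty tuples of domain names, ordered N- to C-terminus.
    ancestral and descendant_set are sets of such tuples.
    """
    if not descendant:
        raise ValueError("arrangements must contain at least one domain")
    if descendant in ancestral:
        return "conserved"

    # Fusion: the arrangement is two ancestral arrangements joined end to end.
    for cut in range(1, len(descendant)):
        if descendant[:cut] in ancestral and descendant[cut:] in ancestral:
            return "fusion"

    # Fission or terminal loss: the arrangement is one end of a longer parent.
    k = len(descendant)
    fission = terminal_loss = False
    for parent in ancestral:
        if len(parent) <= k:
            continue
        remainders = []
        if parent[:k] == descendant:
            remainders.append(parent[k:])
        if parent[-k:] == descendant:
            remainders.append(parent[:-k])
        for rest in remainders:
            if rest in descendant_set:
                fission = True        # the other piece also survives
            elif len(rest) == 1:
                terminal_loss = True  # a single terminal domain disappeared
    if fission:
        return "fission"
    if terminal_loss:
        return "terminal loss"
    return "unexplained"


ancestral = {("A", "B"), ("C",), ("D", "E", "F")}
descendants = {("A", "B", "C"), ("D", "E"), ("A",), ("B",), ("X", "Y")}
for arrangement in sorted(descendants):
    print(arrangement, classify_arrangement(arrangement, ancestral, descendants))
```

Arrangements that no single rule explains are reported as "unexplained", mirroring the fraction of arrangements that single-step events do not account for.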
Affiliation(s)
- Anna R Kersting
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Muenster (WWU), Germany
|
188
|
Isin B, Tirupula KC, Oltvai ZN, Klein-Seetharaman J, Bahar I. Identification of motions in membrane proteins by elastic network models and their experimental validation. Methods Mol Biol 2012; 914:285-317. [PMID: 22976035 DOI: 10.1007/978-1-62703-023-6_17] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/03/2023]
Abstract
Identifying the functional motions of membrane proteins is difficult because they range from large-scale collective dynamics to local small atomic fluctuations at different timescales that are difficult to measure experimentally due to the hydrophobic nature of these proteins. Elastic Network Models, and in particular their most widely used implementation, the Anisotropic Network Model (ANM), have proven to be useful computational methods in many recent applications to predict membrane protein dynamics. These models are based on the premise that biomolecules possess intrinsic mechanical characteristics uniquely defined by their particular architectures. In the ANM, interactions between residues in close proximity are represented by harmonic potentials with a uniform spring constant. The slow mode shapes generated by the ANM provide valuable information on the global dynamics of biomolecules that are relevant to their function. In its recent extension in the form of ANM-guided molecular dynamics (MD), this coarse-grained approach is augmented with atomic detail. The results from ANM and its extensions can be used to guide experiments and thus speed up the process of quantifying motions in membrane proteins. Testing the predictions can be accomplished through (a) direct observation of motions through studies of structure and biophysical probes, (b) perturbation of the motions by, e.g., cross-linking or site-directed mutagenesis, and (c) studying the effects of such perturbations on protein function, typically through ligand binding and activity assays. To illustrate the applicability of the combined computational ANM-experimental testing framework to membrane proteins, we describe here, alongside the general protocols, the application of ANM to rhodopsin, a prototypical member of the pharmacologically relevant G-protein coupled receptor family.
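The harmonic-spring description of the ANM in this abstract can be made concrete with a short Python sketch that assembles the 3N x 3N Hessian from residue coordinates. The function name `anm_hessian`, the 15 Å cutoff, and the unit spring constant are assumptions chosen for illustration; the sketch uses nested lists rather than a numerical library and stops before the mode (eigenvector) calculation.

```python
def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the ANM Hessian as a nested list of size 3N x 3N.

    coords: list of (x, y, z) positions, one per residue (e.g. C-alpha atoms).
    cutoff: distance below which two residues are joined by a spring
            (assumed value; units follow the coordinates).
    gamma:  uniform spring constant shared by all springs.
    Coordinates must be distinct, otherwise the distance is zero.
    """
    n = len(coords)
    size = 3 * n
    hessian = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(i + 1, n):
            delta = [coords[j][k] - coords[i][k] for k in range(3)]
            dist2 = sum(d * d for d in delta)
            if dist2 > cutoff * cutoff:
                continue  # residues too far apart: no spring
            for a in range(3):
                for b in range(3):
                    value = gamma * delta[a] * delta[b] / dist2
                    # Off-diagonal 3x3 blocks carry the negative coupling.
                    hessian[3 * i + a][3 * j + b] -= value
                    hessian[3 * j + a][3 * i + b] -= value
                    # Diagonal blocks accumulate the opposite sign, so each
                    # block row sums to zero (rigid translation costs nothing).
                    hessian[3 * i + a][3 * i + b] += value
                    hessian[3 * j + a][3 * j + b] += value
    return hessian


# Four pseudo-residues on a square with 3.8 Å sides (toy coordinates).
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
H = anm_hessian(square)
print(len(H), len(H[0]))
```

Diagonalising this matrix and discarding the zero-frequency rigid-body modes would give the slow modes the abstract refers to; that step is left to a linear algebra library.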
Affiliation(s)
- Basak Isin
- Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
|
189
|
The phylogenomic roots of modern biochemistry: origins of proteins, cofactors and protein biosynthesis. J Mol Evol 2012; 74:1-34. [PMID: 22210458 DOI: 10.1007/s00239-011-9480-1] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 04/19/2011] [Accepted: 12/12/2011] [Indexed: 12/20/2022]
Abstract
The complexity of modern biochemistry developed gradually on early Earth as new molecules and structures populated the emerging cellular systems. Here, we generate a historical account of the gradual discovery of primordial proteins, cofactors, and molecular functions using phylogenomic information in the sequence of 420 genomes. We focus on structural and functional annotations of the 54 most ancient protein domains. We show how primordial functions are linked to folded structures and how their interaction with cofactors expanded the functional repertoire. We also reveal protocell membranes played a crucial role in early protein evolution and show translation started with RNA and thioester cofactor-mediated aminoacylation. Our findings allow elaboration of an evolutionary model of early biochemistry that is firmly grounded in phylogenomic information and biochemical, biophysical, and structural knowledge. The model describes how primordial α-helical bundles stabilized membranes, how these were decorated by layered arrangements of β-sheets and α-helices, and how these arrangements became globular. Ancient forms of aminoacyl-tRNA synthetase (aaRS) catalytic domains and ancient non-ribosomal protein synthetase (NRPS) modules gave rise to primordial protein synthesis and the ability to generate a code for specificity in their active sites. These structures diversified producing cofactor-binding molecular switches and barrel structures. Accretion of domains and molecules gave rise to modern aaRSs, NRPS, and ribosomal ensembles, first organized around novel emerging cofactors (tRNA and carrier proteins) and then more complex cofactor structures (rRNA). The model explains how the generation of protein structures acted as scaffold for nucleic acids and resulted in crystallization of modern translation.
|
190
|
Pang E, Tan T, Lin K. Promiscuous domains: facilitating stability of the yeast protein–protein interaction network. Mol Biosyst 2012; 8:766-71. [DOI: 10.1039/c1mb05364g] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
|
191
|
Two immunoglobulin tandem proteins with a linking β-strand reveal unexpected differences in cooperativity and folding pathways. J Mol Biol 2011; 416:137-47. [PMID: 22197372 PMCID: PMC3277889 DOI: 10.1016/j.jmb.2011.12.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 09/28/2011] [Revised: 11/30/2011] [Accepted: 12/06/2011] [Indexed: 11/23/2022]
Abstract
The study of the folding of single domains, in the context of their multidomain environment, is important because more than 70% of eukaryotic proteins are composed of multiple domains. The structures of the tandem immunoglobulin (Ig) domain pairs A164–A165 and A168–A169, from the A-band of the giant muscle protein titin, reveal that they form tightly associated domain arrangements, connected by a continuous β-strand. We investigate the thermodynamic and kinetic properties of these tandem domain pairs. While A164–A165 apparently behaves as a single cooperative unit at equilibrium, unfolding without the accumulation of a large population of intermediates, domains in A168–A169 behave independently. Although A169 appears to be stabilized in the tandem protein, we show that this is due to nonspecific stabilization by extension. We elucidate the folding and unfolding pathways of both tandem pairs and show that cooperativity in A164–A165 is a manifestation of the relative refolding and unfolding rate constants of each individual domain. We infer that the differences between the two tandem pairs result from a different pattern of interactions at the domain/domain interface.
|
192
|
Wagner A. Genotype networks shed light on evolutionary constraints. Trends Ecol Evol 2011; 26:577-84. [DOI: 10.1016/j.tree.2011.07.001] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 04/26/2011] [Revised: 07/01/2011] [Accepted: 07/04/2011] [Indexed: 10/17/2022]
|
193
|
Moore AD, Bornberg-Bauer E. The dynamics and evolutionary potential of domain loss and emergence. Mol Biol Evol 2011; 29:787-96. [PMID: 22016574 PMCID: PMC3258042 DOI: 10.1093/molbev/msr250] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 12/20/2022] Open
Abstract
The wealth of available genomic data presents an unrivaled opportunity to study the molecular basis of evolution. Studies on gene family expansions and site-dependent analyses have already helped establish important insights into how proteins facilitate adaptation. However, efforts to conduct full-scale cross-genomic comparisons between species are challenged by both growing amounts of data and the inherent difficulty in accurately inferring homology between deeply rooted species. Proteins, in comparison, evolve by means of domain rearrangements, a process more amenable to study given the strength of profile-based homology inference and the lower rates with which rearrangements occur. However, adapting to a constantly changing environment can require molecular modulations beyond reach of rearrangement alone. Here, we explore rates and functional implications of novel domain emergence in contrast to domain gain and loss in 20 arthropod species of the pancrustacean clade. Emerging domains are more likely disordered in structure and spread more rapidly within their genomes than established domains. Furthermore, although domain turnover occurs at lower rates than gene family turnover, we find strong evidence that the emergence of novel domains is foremost associated with environmental adaptation such as abiotic stress response. The results presented here illustrate the simplicity with which domain-based analyses can unravel key players of nature's adaptational machinery, complementing the classical site-based analyses of adaptation.
Affiliation(s)
- Andrew D Moore
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Muenster, Germany
|
194
|
Yamamoto T, Iino H, Kim K, Kuramitsu S, Fukui K. Evidence for ATP-dependent structural rearrangement of nuclease catalytic site in DNA mismatch repair endonuclease MutL. J Biol Chem 2011; 286:42337-42348. [PMID: 21953455 DOI: 10.1074/jbc.m111.277335] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 12/28/2022] Open
Abstract
DNA mismatch repair (MMR) greatly contributes to genome integrity via the correction of mismatched bases that are mainly generated by replication errors. Postreplicative MMR excises a relatively long tract of error-containing single-stranded DNA. MutL is a widely conserved nicking endonuclease that directs the excision reaction to the error-containing strand of the duplex by specifically nicking the daughter strand. Because MutL apparently exhibits nonspecific nicking endonuclease activity in vitro, the regulatory mechanism of MutL has been argued. Recent studies suggest ATP-dependent conformational and functional changes of MutL, indicating that the regulatory mechanism involves the ATP binding and hydrolysis cycle. In this study, we investigated the effect of ATP binding on the structure of MutL. First, a cross-linking experiment confirmed that the N-terminal ATPase domain physically interacts with the C-terminal endonuclease domain. Next, hydrogen/deuterium exchange mass spectrometry clarified that the binding of ATP to the N-terminal domain induces local structural changes at the catalytic sites of MutL C-terminal domain. Finally, on the basis of the results of the hydrogen/deuterium exchange experiment, we successfully identified novel regions essential for the endonuclease activity of MutL. The results clearly show that ATP modulates the nicking endonuclease activity of MutL via structural rearrangements of the catalytic site. In addition, several Lynch syndrome-related mutations in human MutL homolog are located in the position corresponding to the newly identified catalytic region. Our data contribute toward understanding the relationship between mutations in MutL homolog and human disease.
Affiliation(s)
- Tatsuya Yamamoto
- RIKEN SPring-8 Center, Harima Institute, 1-1-1 Kouto, Sayo-cho, Sayo-gun, Hyogo 679-5148, Japan
- Hitoshi Iino
- RIKEN SPring-8 Center, Harima Institute, 1-1-1 Kouto, Sayo-cho, Sayo-gun, Hyogo 679-5148, Japan
- Kwang Kim
- Department of Biological Sciences, Graduate School of Science, Osaka University, 1-1 Machikaneyama-cho, Toyonaka, Osaka 560-0043, Japan
- Seiki Kuramitsu
- RIKEN SPring-8 Center, Harima Institute, 1-1-1 Kouto, Sayo-cho, Sayo-gun, Hyogo 679-5148, Japan; Department of Biological Sciences, Graduate School of Science, Osaka University, 1-1 Machikaneyama-cho, Toyonaka, Osaka 560-0043, Japan
- Kenji Fukui
- RIKEN SPring-8 Center, Harima Institute, 1-1-1 Kouto, Sayo-cho, Sayo-gun, Hyogo 679-5148, Japan
|
195
|
Abstract
Gene evolution has long been thought to be primarily driven by duplication and rearrangement mechanisms. However, every evolutionary lineage harbours orphan genes that lack homologues in other lineages and whose evolutionary origin is only poorly understood. Orphan genes might arise from duplication and rearrangement processes followed by fast divergence; however, de novo evolution out of non-coding genomic regions is emerging as an important additional mechanism. This process appears to provide raw material continuously for the evolution of new gene functions, which can become relevant for lineage-specific adaptations.
|
196
|
Global analysis of proline-rich tandem repeat proteins reveals broad phylogenetic diversity in plant secretomes. PLoS One 2011; 6:e23167. [PMID: 21829715 PMCID: PMC3149072 DOI: 10.1371/journal.pone.0023167] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 03/02/2011] [Accepted: 07/13/2011] [Indexed: 11/19/2022] Open
Abstract
Cell walls, constructed by precisely choreographed changes in the plant secretome, play critical roles in plant cell physiology and development. Along with structural polysaccharides, secreted proline-rich Tandem Repeat Proteins (TRPs) are important for cell wall function, yet the evolutionary diversity of these structural TRPs remains virtually unexplored. Using a systems-level computational approach to analyze taxonomically diverse plant sequence data, we identified 31 distinct Pro-rich TRP classes targeted for secretion. This analysis expands upon the known phylogenetic diversity of extensins, the most widely studied class of wall structural proteins, and demonstrates that extensins evolved before plant vascularization. Our results also show that most Pro-rich TRP classes have unexpectedly restricted evolutionary distributions, revealing considerable differences in plant secretome signatures that define unexplored diversity.
|
197
|
Abroi A, Gough J. Are viruses a source of new protein folds for organisms? - Virosphere structure space and evolution. Bioessays 2011; 33:626-35. [DOI: 10.1002/bies.201000126] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 12/27/2022]
|
198
|
Godzik A. Metagenomics and the protein universe. Curr Opin Struct Biol 2011; 21:398-403. [PMID: 21497084 DOI: 10.1016/j.sbi.2011.03.010] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 01/25/2011] [Revised: 03/07/2011] [Accepted: 03/24/2011] [Indexed: 02/07/2023]
Abstract
Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments is due to functional divergence of the known protein families rather than the emergence of new ones.
Affiliation(s)
- Adam Godzik
- Program on Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92037, USA.
|
199
|
Abstract
Around half of all protein structures now solved by solution-state nuclear magnetic resonance (NMR) spectroscopy rely on automated data analysis. The pervasiveness of computational approaches in general hides, however, a more nuanced view in which the full variety and richness of the field appears. This review is structured around a comparison of methods associated with three NMR observables: classical nuclear Overhauser effect (NOE) constraint gathering in contrast with more recent chemical shift and residual dipole coupling (RDC) based protocols. In each case, the emphasis is placed on the latest research, covering mainly the past 5 years. By describing both general concepts and representative programs, the objective is to map out a field in which, through the very profusion of approaches, it is all too easy to lose one's bearings.
|
200
|
Dai L, Zhou Y. Characterizing the existing and potential structural space of proteins by large-scale multiple loop permutations. J Mol Biol 2011; 408:585-95. [PMID: 21376059 DOI: 10.1016/j.jmb.2011.02.056] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 11/16/2010] [Revised: 02/22/2011] [Accepted: 02/24/2011] [Indexed: 10/18/2022]
Abstract
Worldwide structural genomics projects are increasing structure coverage of sequence space but have not significantly expanded the protein structure space itself (i.e., number of unique structural folds) since 2007. Discovering new structural folds experimentally by directed evolution and random recombination of secondary-structure blocks has also proved rarely successful. Meanwhile, previous computational efforts for large-scale mapping of protein structure space are limited to simple model proteins and led to an inconclusive answer on the completeness of the existing observed protein structure space. Here, we build novel protein structures by extending naturally occurring circular (single-loop) permutation to multiple loop permutations (MLPs). These structures are clustered by a structural similarity measure called TM-score. The computational technique allows us to produce different structural clusters on the same naturally occurring, packed, stable core but with alternatively connected secondary-structure segments. A large-scale MLP of 2936 domains from structural classification of protein domains reproduces those existing structural clusters (63%) mostly as hubs for many nonredundant sequences and illustrates newly discovered novel clusters as islands adopted by a few sequences only. Results further show that there exist a significant number of novel potentially stable clusters for medium-size or large-size single-domain proteins, in particular, >100 amino acid residues, that are either not yet adopted by nature or adopted only by a few sequences. This study suggests that MLP provides a simple yet highly effective tool for engineering and design of novel protein structures (including naturally knotted proteins). The implication of recovering new-fold targets from critical assessment of structure prediction techniques (CASP) by MLP on template-based structure prediction is also discussed. Our MLP structures are available for download at the publication page of the Web site http://sparks.informatics.iupui.edu.
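For readers unfamiliar with the TM-score mentioned in this abstract, the Python sketch below evaluates the standard TM-score sum for one fixed superposition, given the distances between aligned residue pairs. The function names are chosen for this example, and the sketch omits two things real implementations do: the search over superpositions that maximises the score, and the special handling of very short chains.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 used by the TM-score."""
    if target_length < 19:
        # Below this length the formula gives a non-positive d0; actual
        # programs apply a floor instead, which this sketch leaves out.
        raise ValueError("this sketch needs a target of at least 19 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score term for a single, fixed superposition.

    distances: distances between aligned residue pairs, in the same
               length unit as d0 (ångströms in the usual convention).
    target_length: number of residues in the target structure, which
               normalises the score.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect superposition of all 100 residues scores 1.0;
# aligning only half of them perfectly scores 0.5.
print(tm_score([0.0] * 100, 100))
print(tm_score([0.0] * 50, 100))
```

Because the score is normalised by the target length and each pair contributes at most one, it stays between 0 and 1, which makes it convenient as a similarity measure for clustering structures of different sizes.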
Affiliation(s)
- Liang Dai
- School of Informatics, Indiana University Purdue University Indianapolis, and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 719 Indiana Avenue, Walker Plaza Building Suite 319, Indianapolis, IN 46202, USA
|