51
Leclère L, Rentzsch F. Repeated evolution of identical domain architecture in metazoan netrin domain-containing proteins. Genome Biol Evol 2012; 4:883-99. [PMID: 22813778 PMCID: PMC3516229 DOI: 10.1093/gbe/evs061] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Accepted: 07/11/2012] [Indexed: 12/13/2022] Open
Abstract
The majority of proteins in eukaryotes are composed of multiple domains, and the number and order of these domains is an important determinant of protein function. Although multidomain proteins with a particular domain architecture were initially considered to have a common evolutionary origin, recent comparative studies of protein families or whole genomes have reported that a minority of multidomain proteins could have appeared multiple times independently. Here, we test this scenario in detail for the signaling molecules netrin and secreted frizzled-related proteins (sFRPs), two groups of netrin domain-containing proteins with essential roles in animal development. Our primary phylogenetic analyses suggest that the particular domain architectures of each of these proteins were present in the eumetazoan ancestor and evolved a second time independently within the metazoan lineage from laminin and frizzled proteins, respectively. Using an array of phylogenetic methods, statistical tests, and character sorting analyses, we show that the polyphyly of netrin and sFRP is well supported and cannot be explained by classical phylogenetic reconstruction artifacts. Despite their independent origins, the two groups of netrins and of sFRPs have the same protein interaction partners (Deleted in Colorectal Cancer/neogenin and Unc5 for netrins and Wnts for sFRPs) and similar developmental functions. Thus, these cases of convergent evolution emphasize the importance of domain architecture for protein function by uncoupling shared domain architecture from shared evolutionary history. Therefore, we propose the term merology to describe the repeated evolution of proteins with similar domain architecture and discuss the potential of merologous proteins to help in understanding protein evolution.
Affiliation(s)
- Lucas Leclère
- Sars International Centre for Marine Molecular Biology, University of Bergen, Norway.
52
Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning APJ, Dokholyan NV, Echave J, Elofsson A, Gerloff DL, Goldstein RA, Grahnen JA, Holder MT, Lakner C, Lartillot N, Lovell SC, Naylor G, Perica T, Pollock DD, Pupko T, Regan L, Roger A, Rubinstein N, Shakhnovich E, Sjölander K, Sunyaev S, Teufel AI, Thorne JL, Thornton JW, Weinreich DM, Whelan S. The interface of protein structure, protein biophysics, and molecular evolution. Protein Sci 2012; 21:769-85. [PMID: 22528593 PMCID: PMC3403413 DOI: 10.1002/pro.2071] [Citation(s) in RCA: 152] [Impact Index Per Article: 12.7] [Received: 03/02/2012] [Revised: 03/22/2012] [Accepted: 03/23/2012] [Indexed: 12/20/2022]
Abstract
The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction.
Affiliation(s)
- David A Liberles
- Department of Molecular Biology, University of Wyoming, Laramie, Wyoming 82071
- Sarah A Teichmann
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom
- Ivet Bahar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15213
- Ugo Bastolla
- Bioinformatics Unit, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autonoma de Madrid, 28049 Cantoblanco Madrid, Spain
- Jesse Bloom
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109
- Erich Bornberg-Bauer
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Muenster, Germany
- Lucy J Colwell
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom
- A P Jason de Koning
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Colorado, Aurora, Colorado
- Nikolay V Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, North Carolina 27599
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, Martín de Irigoyen 3100, 1650 San Martín, Buenos Aires, Argentina
- Arne Elofsson
- Department of Biochemistry and Biophysics, Center for Biomembrane Research, Stockholm Bioinformatics Center, Science for Life Laboratory, Swedish E-science Research Center, Stockholm University, 106 91 Stockholm, Sweden
- Dietlind L Gerloff
- Biomolecular Engineering Department, University of California, Santa Cruz, California 95064
- Richard A Goldstein
- Division of Mathematical Biology, National Institute for Medical Research (MRC), Mill Hill, London NW7 1AA, United Kingdom
- Johan A Grahnen
- Department of Molecular Biology, University of Wyoming, Laramie, Wyoming 82071
- Mark T Holder
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas 66045
- Clemens Lakner
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695
- Nicholas Lartillot
- Département de Biochimie, Faculté de Médecine, Université de Montréal, Montréal, QC H3T1J4, Canada
- Simon C Lovell
- Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, United Kingdom
- Gavin Naylor
- Department of Biology, College of Charleston, Charleston, South Carolina 29424
- Tina Perica
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom
- David D Pollock
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Colorado, Aurora, Colorado
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Lynne Regan
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven 06511
- Andrew Roger
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
- Nimrod Rubinstein
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Eugene Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138
- Kimmen Sjölander
- Department of Bioengineering, University of California, Berkeley, Berkeley, California 94720
- Shamil Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, Massachusetts 02115
- Ashley I Teufel
- Department of Molecular Biology, University of Wyoming, Laramie, Wyoming 82071
- Jeffrey L Thorne
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695
- Joseph W Thornton
- Howard Hughes Medical Institute and Institute for Ecology and Evolution, University of Oregon, Eugene, Oregon 97403
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637
- Daniel M Weinreich
- Department of Ecology and Evolutionary Biology, and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912
- Simon Whelan
- Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, United Kingdom
53
Kersting AR, Bornberg-Bauer E, Moore AD, Grath S. Dynamics and adaptive benefits of protein domain emergence and arrangements during plant genome evolution. Genome Biol Evol 2012; 4:316-29. [PMID: 22250127 PMCID: PMC3318442 DOI: 10.1093/gbe/evs004] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Indexed: 12/20/2022] Open
Abstract
Plant genomes are generally very large, mostly paleopolyploid, and have numerous gene duplicates and complex genomic features such as repeats and transposable elements. Many of these features have been hypothesized to enable plants, which cannot easily escape environmental challenges, to rapidly adapt. Another mechanism, which has recently been well described as a major facilitator of rapid adaptation in bacteria, animals, and fungi but not yet for plants, is modular rearrangement of protein-coding genes. Due to the high precision of profile-based methods, rearrangements can be well captured at the protein level by characterizing the emergence, loss, and rearrangements of protein domains, their structural, functional, and evolutionary building blocks. Here, we study the dynamics of domain rearrangements and explore their adaptive benefit in 27 plant and 3 algal genomes. We use a phylogenomic approach by which we can explain the formation of 88% of all arrangements by single-step events, such as fusion, fission, and terminal loss of domains. We find many domains are lost along every lineage, but at least 500 domains are novel, that is, they are unique to green plants and emerged more or less recently. These novel domains duplicate and rearrange more readily within their genomes than ancient domains and are overproportionally involved in stress response and developmental innovations. Novel domains more often affect regulatory proteins and show a higher degree of structural disorder than ancient domains. Whereas a relatively large and well-conserved core set of single-domain proteins exists, long multi-domain arrangements tend to be species-specific. We find that duplicated genes are more often involved in rearrangements. Although fission events typically impact metabolic proteins, fusion events often create new signaling proteins essential for environmental sensing. 
Taken together, the high volatility of single domains and complex arrangements in plant genomes demonstrate the importance of modularity for environmental adaptability of plants.
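The single-step events named in this abstract (fusion, fission, terminal loss) can be illustrated with a minimal sketch. This is not the authors' phylogenomic pipeline: the function name `classify_single_step`, the tuple representation of arrangements, and the order in which events are tested are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

Arrangement = Tuple[str, ...]  # ordered domain names, N- to C-terminus


def classify_single_step(
    child: Arrangement,
    ancestors: Sequence[Arrangement],
    siblings: Sequence[Arrangement] = (),
) -> Optional[str]:
    """Name one single-step event that could explain `child` (hypothetical helper).

    `ancestors` are arrangements at the parent node; `siblings` are the other
    arrangements observed at the same node as `child`.
    """
    if not child:
        return None
    if child in ancestors:
        return "unchanged"
    # Fusion: two ancestral arrangements joined end to end.
    for a in ancestors:
        for b in ancestors:
            if a + b == child:
                return "fusion"
    # Fission: an ancestral arrangement split into `child` plus a sibling.
    for parent in ancestors:
        for s in siblings:
            if child + s == parent or s + child == parent:
                return "fission"
    # Terminal loss: `child` is a proper prefix or suffix of an ancestor.
    n = len(child)
    for parent in ancestors:
        if len(parent) > n and (parent[:n] == child or parent[-n:] == child):
            return "terminal_loss"
    return None  # not explained by one of these single-step events
```

Arrangements that return `None` under such a scheme would need multi-step explanations or newly emerged domains.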
Affiliation(s)
- Anna R Kersting
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Muenster (WWU), Germany
54
Abstract
This chapter reviews the current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly impacts which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multidomain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly).
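As a rough illustration of what a domain versatility (promiscuity) measure can look like, the sketch below counts, for each domain, the number of distinct other domains found directly adjacent to it. This neighbor count is only one simple proxy and is not necessarily the measure used in the chapter; the function name and the input format are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set


def neighbor_versatility(architectures: Iterable[Sequence[str]]) -> Dict[str, int]:
    """Count distinct adjacent partner domains per domain (hypothetical helper)."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for arch in architectures:
        for domain in arch:
            neighbors.setdefault(domain, set())  # keep single-domain proteins
        for left, right in zip(arch, arch[1:]):
            if left != right:  # tandem repeats do not add a new partner
                neighbors[left].add(right)
                neighbors[right].add(left)
    return {domain: len(partners) for domain, partners in neighbors.items()}
```

For example, calling it on `[("SH3", "SH2", "Kinase"), ("SH2", "Kinase"), ("SH3", "PDZ")]` gives SH3 and SH2 two partners each and Kinase and PDZ one each.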
55
Stewart JJ, Coyne KJ. Analysis of raphidophyte assimilatory nitrate reductase reveals unique domain architecture incorporating a 2/2 hemoglobin. Plant Mol Biol 2011; 77:565-75. [PMID: 22038092 DOI: 10.1007/s11103-011-9831-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 04/14/2011] [Accepted: 09/19/2011] [Indexed: 05/04/2023]
Abstract
Eukaryotic assimilatory nitrate reductase (NR) is a multi-domain protein that catalyzes the rate-limiting step in nitrate assimilation. This protein is highly conserved and has been extensively characterized in plants and algae. Here, we report hybrid NRs (NR2-2/2HbN) identified in two microalgal species, Heterosigma akashiwo and Chattonella subsalsa, with a 2/2 hemoglobin (2/2Hb) inserted into the hinge 2 region of a prototypical NR. 2/2Hbs are a class of single-domain heme proteins found in bacteria, ciliates, algae and plants. Sequence analysis indicates that the C-terminal FAD/NADH reductase domain of NR2-2/2HbN retains identity with eukaryotic NR, suggesting that the 2/2Hb domain was inserted interior to the existing NR domain architecture. Phylogenetic analysis supports the placement of the 2/2Hb domain of NR2-2/2HbN within group I (N-type) 2/2Hbs with high similarity to mycobacterial 2/2HbNs, known to convert nitric oxide to nitrate. Experimental data confirms that H. akashiwo is capable of metabolizing nitric oxide and shows that HaNR2-2/2HbN expression increases in response to nitric oxide addition. Here, we propose a mechanism for the dual function of NR2-2/2HbN in which nitrate reduction and nitric oxide dioxygenase reactions are cooperative, such that conversion of nitric oxide to nitrate is followed by reduction of nitrate for assimilation as cellular nitrogen.
Affiliation(s)
- Jennifer J Stewart
- University of Delaware College of Earth, Ocean, and Environment, Lewes, DE 19958, USA
56
Lee YCG, Reinhardt JA. Widespread polymorphism in the positions of stop codons in Drosophila melanogaster. Genome Biol Evol 2011; 4:533-49. [PMID: 22051795 PMCID: PMC3342867 DOI: 10.1093/gbe/evr113] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Accepted: 10/28/2011] [Indexed: 12/19/2022] Open
Abstract
The mechanisms underlying evolutionary changes in protein length are poorly understood. Protein domains are lost and gained between species and must have arisen first as within-species polymorphisms. Here, we use Drosophila melanogaster population genomic data combined with between species divergence information to understand the evolutionary forces that generate and maintain polymorphisms causing changes in protein length in D. melanogaster. Specifically, we looked for protein length variations resulting from premature termination codons (PTCs) and stop codon losses (SCLs). We discovered that 438 genes contained polymorphisms resulting in truncation of the translated region (PTCs) and 119 genes contained polymorphisms predicted to lengthen the translated region (SCLs). Stop codon polymorphisms (SCPs) (especially PTCs) appear to be more deleterious than other polymorphisms, including protein amino acid changes. Genes harboring SCPs are in general less selectively constrained, more narrowly expressed, and enriched for dispensable biological functions. However, we also observed exceptional cases such as genes that have multiple independent SCPs, alleles that are shared between D. melanogaster and Drosophila simulans, and high-frequency alleles that cause extreme changes in gene length. SCPs likely have an important role in the evolution of these genes.
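The distinction between premature termination codons (PTCs) and stop codon losses (SCLs) can be sketched as a comparison of the first in-frame stop position in a reference and a variant allele. This is an illustrative simplification, not the authors' method: it assumes both sequences start at the same start codon, stay in frame, use the standard genetic code, and include enough downstream sequence for read-through to be visible. The function names are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_stop_index(seq: str) -> Optional[int]:
    """Return the codon index of the first in-frame stop, or None if absent."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def classify_stop_change(reference: str, variant: str) -> str:
    """Label a variant allele as 'PTC', 'SCL', or 'unchanged' (hypothetical helper)."""
    ref_stop = first_stop_index(reference)
    var_stop = first_stop_index(variant)
    if ref_stop is None:
        raise ValueError("reference has no in-frame stop codon")
    if var_stop is None or var_stop > ref_stop:
        return "SCL"  # translated region is lengthened
    if var_stop < ref_stop:
        return "PTC"  # translated region is truncated
    return "unchanged"
```

Real population data would also need handling of indels and frameshifts, which this sketch ignores.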
Affiliation(s)
- Yuh Chwen G. Lee
- Department of Evolution and Ecology, The University of California at Davis
57
Moore AD, Bornberg-Bauer E. The dynamics and evolutionary potential of domain loss and emergence. Mol Biol Evol 2011; 29:787-96. [PMID: 22016574 PMCID: PMC3258042 DOI: 10.1093/molbev/msr250] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 12/20/2022] Open
Abstract
The wealth of available genomic data presents an unrivaled opportunity to study the molecular basis of evolution. Studies on gene family expansions and site-dependent analyses have already helped establish important insights into how proteins facilitate adaptation. However, efforts to conduct full-scale cross-genomic comparisons between species are challenged by both growing amounts of data and the inherent difficulty in accurately inferring homology between deeply rooted species. Proteins, in comparison, evolve by means of domain rearrangements, a process more amenable to study given the strength of profile-based homology inference and the lower rates with which rearrangements occur. However, adapting to a constantly changing environment can require molecular modulations beyond reach of rearrangement alone. Here, we explore rates and functional implications of novel domain emergence in contrast to domain gain and loss in 20 arthropod species of the pancrustacean clade. Emerging domains are more likely disordered in structure and spread more rapidly within their genomes than established domains. Furthermore, although domain turnover occurs at lower rates than gene family turnover, we find strong evidence that the emergence of novel domains is foremost associated with environmental adaptation such as abiotic stress response. The results presented here illustrate the simplicity with which domain-based analyses can unravel key players of nature's adaptational machinery, complementing the classical site-based analyses of adaptation.
Affiliation(s)
- Andrew D Moore
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Muenster, Germany
58
Wu YC, Rasmussen MD, Kellis M. Evolution at the subgene level: domain rearrangements in the Drosophila phylogeny. Mol Biol Evol 2011; 29:689-705. [PMID: 21900599 PMCID: PMC3258039 DOI: 10.1093/molbev/msr222] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 01/24/2023] Open
Abstract
Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species.
Affiliation(s)
- Yi-Chieh Wu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Massachusetts, USA.
59
Flynn M, Saha O, Young P. Molecular evolution of the LNX gene family. BMC Evol Biol 2011; 11:235. [PMID: 21827680 PMCID: PMC3162930 DOI: 10.1186/1471-2148-11-235] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 04/12/2011] [Accepted: 08/09/2011] [Indexed: 02/07/2023] Open
Abstract
Background: LNX (Ligand of Numb Protein-X) proteins typically contain an amino-terminal RING domain adjacent to either two or four PDZ domains - a domain architecture that is unique to the LNX family. LNX proteins function as E3 ubiquitin ligases and their domain organisation suggests that their ubiquitin ligase activity may be targeted to specific substrates or subcellular locations by PDZ domain-mediated interactions. Indeed, numerous interaction partners for LNX proteins have been identified, but the in vivo functions of most family members remain largely unclear.
Results: To gain insights into their function we examined the phylogenetic origins and evolution of the LNX gene family. We find that a LNX1/LNX2-like gene arose in an early metazoan lineage by gene duplication and fusion events that combined a RING domain with four PDZ domains. These PDZ domains are closely related to the four carboxy-terminal domains from multiple PDZ domain containing protein-1 (MUPP1). Duplication of the LNX1/LNX2-like gene and subsequent loss of PDZ domains appears to have generated a gene encoding a LNX3/LNX4-like protein, with just two PDZ domains. This protein has novel carboxy-terminal sequences that include a potential modular LNX3 homology domain. The two ancestral LNX genes are present in some, but not all, invertebrate lineages. They were, however, maintained in the vertebrate lineage, with further duplication events giving rise to five LNX family members in most mammals. In addition, we identify novel interactions of LNX1 and LNX2 with three known MUPP1 ligands using yeast two-hybrid assays. This demonstrates conservation of binding specificity between LNX and MUPP1 PDZ domains.
Conclusions: The LNX gene family has an early metazoan origin with a LNX1/LNX2-like protein likely giving rise to a LNX3/LNX4-like protein through the loss of PDZ domains. The absence of LNX orthologs in some lineages indicates that LNX proteins are not essential in invertebrates. In contrast, the maintenance of both ancestral LNX genes in the vertebrate lineage suggests the acquisition of essential vertebrate-specific functions. The revelation that the LNX PDZ domains are phylogenetically related to domains in MUPP1, and have common binding specificities, suggests that LNX and MUPP1 may have similarities in their cellular functions.
Affiliation(s)
- Michael Flynn
- Department of Biochemistry, University College Cork, Cork, Ireland
60
Nagy A, Patthy L. Reassessing domain architecture evolution of metazoan proteins: the contribution of different evolutionary mechanisms. Genes (Basel) 2011; 2:578-98. [PMID: 24710211 PMCID: PMC3927616 DOI: 10.3390/genes2030578] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 06/30/2011] [Revised: 07/13/2011] [Accepted: 08/02/2011] [Indexed: 11/16/2022] Open
Abstract
In the accompanying papers we have shown that sequence errors of public databases and confusion of paralogs and epaktologs (proteins that are related only through the independent acquisition of the same domain types) significantly distort the picture that emerges from comparison of the domain architecture (DA) of multidomain Metazoan proteins since they introduce a strong bias in favor of terminal over internal DA change. The issue of whether terminal or internal DA changes occur with greater probability has very important implications for the DA evolution of multidomain proteins since gene fusion can add domains only at terminal positions, whereas domain-shuffling is capable of inserting domains both at internal and terminal positions. As a corollary, overestimation of terminal DA changes may be misinterpreted as evidence for a dominant role of gene fusion in DA evolution. In this manuscript we show that in several recent studies of DA evolution of Metazoa the authors used databases that are significantly contaminated with incomplete, abnormal and mispredicted sequences (e.g., UniProtKB/TrEMBL, EnsEMBL) and/or the authors failed to separate paralogs and epaktologs, explaining why these studies concluded that the major mechanism for gains of new domains in metazoan proteins is gene fusion. In contrast with the latter conclusion, our studies on high quality orthologous and paralogous Swiss-Prot sequences confirm that shuffling of mobile domains had a major role in the evolution of multidomain proteins of Metazoa and especially those formed in early vertebrates.
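The terminal-versus-internal distinction discussed in this abstract can be made concrete with a small sketch that compares an ancestral and a derived domain architecture differing by one inserted block of domains. It is an illustration under stated assumptions, not the procedure used in the paper: the function name is hypothetical, architectures are given as ordered tuples of domain names, and when repeated domains make the position ambiguous a terminal placement is reported first.

```python
from typing import Optional, Sequence


def classify_domain_gain(ancestral: Sequence[str], derived: Sequence[str]) -> Optional[str]:
    """Classify a single-block domain gain by position (hypothetical helper).

    Returns 'C-terminal', 'N-terminal', 'internal', or None when `derived`
    is not `ancestral` plus one contiguous inserted block.
    """
    anc, der = tuple(ancestral), tuple(derived)
    if len(der) <= len(anc):
        return None
    # Length of the shared prefix.
    prefix = 0
    while prefix < len(anc) and anc[prefix] == der[prefix]:
        prefix += 1
    # Length of the shared suffix.
    suffix = 0
    while suffix < len(anc) and anc[-1 - suffix] == der[-1 - suffix]:
        suffix += 1
    if prefix == len(anc):
        return "C-terminal"  # new domains appended after the ancestral DA
    if suffix == len(anc):
        return "N-terminal"  # new domains added before the ancestral DA
    if prefix + suffix >= len(anc):
        return "internal"    # new domains inserted between ancestral domains
    return None
```

Under the abstract's reasoning, only the two terminal outcomes are compatible with gene fusion, whereas an internal gain points to a mechanism such as domain shuffling.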
Affiliation(s)
- Alinda Nagy
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest H-1113, Hungary.
- Laszlo Patthy
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest H-1113, Hungary.
61
Bhaskara RM, Srinivasan N. Stability of domain structures in multi-domain proteins. Sci Rep 2011; 1:40. [PMID: 22355559 PMCID: PMC3216527 DOI: 10.1038/srep00040] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Received: 05/12/2011] [Accepted: 06/27/2011] [Indexed: 01/22/2023] Open
Abstract
Multi-domain proteins have many advantages with respect to stability and folding inside cells. Here we attempt to understand the intricate relationship between the domain-domain interactions and the stability of domains in isolation. We provide quantitative treatment and proof for prevailing intuitive ideas on the strategies employed by nature to stabilize otherwise unstable domains. We find that domains incapable of independent stability are stabilized by favourable interactions with tethered domains in the multi-domain context. Stability of such folds to exist independently is optimized by evolution. Specific residue mutations in the sites equivalent to inter-domain interface enhance the overall solvation, thereby stabilizing these domain folds independently. A few naturally occurring variants at these sites alter communication between domains and affect stability leading to disease manifestation. Our analysis provides safe guidelines for mutagenesis which have attractive applications in obtaining stable fragments and domain constructs essential for structural studies by crystallography and NMR.
62
Reassessing domain architecture evolution of metazoan proteins: major impact of gene prediction errors. Genes (Basel) 2011; 2:449-501. [PMID: 24710207 PMCID: PMC3927609 DOI: 10.3390/genes2030449] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/24/2011] [Revised: 06/14/2011] [Accepted: 06/20/2011] [Indexed: 11/17/2022] Open
Abstract
In view of the fact that appearance of novel protein domain architectures (DA) is closely associated with biological innovations, there is a growing interest in the genome-scale reconstruction of the evolutionary history of the domain architectures of multidomain proteins. In such analyses, however, it is usually ignored that a significant proportion of Metazoan sequences analyzed is mispredicted and that this may seriously affect the validity of the conclusions. To estimate the contribution of errors in gene prediction to differences in DA of predicted proteins, we have used the high quality manually curated UniProtKB/Swiss-Prot database as a reference. For genome-scale analysis of domain architectures of predicted proteins we focused on RefSeq, EnsEMBL and NCBI's GNOMON predicted sequences of Metazoan species with completely sequenced genomes. Comparison of the DA of UniProtKB/Swiss-Prot sequences of worm, fly, zebrafish, frog, chick, mouse, rat and orangutan with those of human Swiss-Prot entries have identified relatively few cases where orthologs had different DA, although the percentage with different DA increased with evolutionary distance. In contrast with this, comparison of the DA of human, orangutan, rat, mouse, chicken, frog, zebrafish, worm and fly RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences with those of the corresponding/orthologous human Swiss-Prot entries identified a significantly higher proportion of domain architecture differences than in the case of the comparison of Swiss-Prot entries. Analysis of RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences with DAs different from those of their Swiss-Prot orthologs confirmed that the higher rate of domain architecture differences is due to errors in gene prediction, the majority of which could be corrected with our FixPred protocol. 
We have also demonstrated that contamination of databases with incomplete, abnormal or mispredicted sequences introduces a bias in DA differences in as much as it increases the proportion of terminal over internal DA differences. Here we have shown that in the case of RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences of Metazoan species, the contribution of gene prediction errors to domain architecture differences of orthologs is comparable to or greater than those due to true gene rearrangements. We have also demonstrated that domain architecture comparison may serve as a useful tool for the quality control of gene predictions and may thus guide the correction of sequence errors. Our findings caution that earlier genome-scale studies based on comparison of predicted (frequently mispredicted) protein sequences may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. A reassessment of the DA evolution of orthologous and paralogous proteins is presented in an accompanying paper [1].
63
|
Cohen-Gihon I, Fong JH, Sharan R, Nussinov R, Przytycka TM, Panchenko AR. Evolution of domain promiscuity in eukaryotic genomes--a perspective from the inferred ancestral domain architectures. Molecular BioSystems 2011; 7:784-92. [PMID: 21127809 PMCID: PMC3321261 DOI: 10.1039/c0mb00182a] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/21/2022]
Abstract
Most eukaryotic proteins are composed of two or more domains. These assemble in a modular manner to create new proteins usually by the acquisition of one or more domains to an existing protein. Promiscuous domains which are found embedded in a variety of proteins and co-exist with many other domains are of particular interest and were shown to have roles in signaling pathways and mediating network communication. The evolution of domain promiscuity is still an open problem, mostly due to the lack of sequenced ancestral genomes. Here we use inferred domain architectures of ancestral genomes to trace the evolution of domain promiscuity in eukaryotic genomes. We find an increase in average promiscuity along many branches of the eukaryotic tree. Moreover, domain promiscuity can proceed at almost a steady rate over long evolutionary time or exhibit lineage-specific acceleration. We also observe that many signaling and regulatory domains gained domain promiscuity around the Bilateria divergence. In addition we show that those domains that played a role in the creation of two body axes and existed before the divergence of the bilaterians from fungi/metazoan achieve a boost in their promiscuities during the bilaterian evolution.
Affiliation(s)
- Inbar Cohen-Gihon
  - Sackler Institute of Molecular Medicine, Department of Human Genetics, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Jessica H. Fong
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Roded Sharan
  - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Ruth Nussinov
  - Sackler Institute of Molecular Medicine, Department of Human Genetics, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
  - Center for Cancer Research Nanobiology Program, SAIC-Frederick, Inc., NCI-Frederick, Frederick, MD 21702, USA
- Teresa M. Przytycka
  - Center for Cancer Research Nanobiology Program, SAIC-Frederick, Inc., NCI-Frederick, Frederick, MD 21702, USA
- Anna R. Panchenko
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
64
|
Rüping B, Ernst AM, Jekat SB, Nordzieke S, Reineke AR, Müller B, Bornberg-Bauer E, Prüfer D, Noll GA. Molecular and phylogenetic characterization of the sieve element occlusion gene family in Fabaceae and non-Fabaceae plants. BMC Plant Biology 2010; 10:219. [PMID: 20932300 PMCID: PMC3017817 DOI: 10.1186/1471-2229-10-219] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 01/15/2010] [Accepted: 10/08/2010] [Indexed: 05/18/2023]
Abstract
BACKGROUND The phloem of dicotyledonous plants contains specialized P-proteins (phloem proteins) that accumulate during sieve element differentiation and remain parietally associated with the cisternae of the endoplasmic reticulum in mature sieve elements. Wounding causes P-protein filaments to accumulate at the sieve plates and block the translocation of photosynthate. Specialized, spindle-shaped P-proteins known as forisomes that undergo reversible calcium-dependent conformational changes have evolved exclusively in the Fabaceae. Recently, the molecular characterization of three genes encoding forisome components in the model legume Medicago truncatula (MtSEO1, MtSEO2 and MtSEO3; SEO = sieve element occlusion) was reported, but little is known about the molecular characteristics of P-proteins in non-Fabaceae. RESULTS We performed a comprehensive genome-wide comparative analysis by screening the M. truncatula, Glycine max, Arabidopsis thaliana, Vitis vinifera and Solanum phureja genomes, and a Malus domestica EST library for homologs of MtSEO1, MtSEO2 and MtSEO3 and identified numerous novel SEO genes in Fabaceae and even non-Fabaceae plants, which do not possess forisomes. Even in Fabaceae some SEO genes appear to not encode forisome components. All SEO genes have a similar exon-intron structure and are expressed predominantly in the phloem. Phylogenetic analysis revealed the presence of several subgroups with Fabaceae-specific subgroups containing all of the known as well as newly identified forisome component proteins. We constructed Hidden Markov Models that identified three conserved protein domains, which characterize SEO proteins when present in combination. In addition, one common and three subgroup specific protein motifs were found in the amino acid sequences of SEO proteins. SEO genes are organized in genomic clusters and the conserved synteny allowed us to identify several M. truncatula vs G. max orthologs as well as paralogs within the G. max genome. 
CONCLUSIONS The unexpected occurrence of forisome-like genes in non-Fabaceae plants may indicate that these proteins encode species-specific P-proteins, which is backed up by the phloem-specific expression profiles. The conservation of gene structure, the presence of specific motifs and domains and the genomic synteny argue for a common phylogenetic origin of forisomes and other P-proteins.
Affiliation(s)
- Boris Rüping
  - Institut für Biochemie und Biotechnologie der Pflanzen, Westfälische Wilhelms-Universität Münster, Hindenburgplatz 55, D-48143 Münster, Germany
  - Fraunhofer Institute for Molecular Biology and Applied Ecology (IME), Forckenbeckstraße 6, D-52074 Aachen, Germany
- Antonia M Ernst
  - Institut für Biochemie und Biotechnologie der Pflanzen, Westfälische Wilhelms-Universität Münster, Hindenburgplatz 55, D-48143 Münster, Germany
  - Fraunhofer Institute for Molecular Biology and Applied Ecology (IME), Forckenbeckstraße 6, D-52074 Aachen, Germany
- Stephan B Jekat
  - Institut für Biochemie und Biotechnologie der Pflanzen, Westfälische Wilhelms-Universität Münster, Hindenburgplatz 55, D-48143 Münster, Germany
  - Fraunhofer Institute for Molecular Biology and Applied Ecology (IME), Forckenbeckstraße 6, D-52074 Aachen, Germany
- Steffen Nordzieke
  - Institut für Biochemie und Biotechnologie der Pflanzen, Westfälische Wilhelms-Universität Münster, Hindenburgplatz 55, D-48143 Münster, Germany
- Anna R Reineke
  - Institut für Evolution und Biodiversität, Westfälische Wilhelms-Universität Münster, Hüfferstraße 1, D-48149 Münster, Germany
- Boje Müller
  - Institut für Biochemie und Biotechnologie der Pflanzen, Westfälische Wilhelms-Universität Münster, Hindenburgplatz 55, D-48143 Münster, Germany
  - Fraunhofer Institute for Molecular Biology and Applied Ecology (IME), Forckenbeckstraße 6, D-52074 Aachen, Germany
- Erich Bornberg-Bauer
  - Institut für Evolution und Biodiversität, Westfälische Wilhelms-Universität Münster, Hüfferstraße 1, D-48149 Münster, Germany
- Dirk Prüfer
  - Institut für Biochemie und Biotechnologie der Pflanzen, Westfälische Wilhelms-Universität Münster, Hindenburgplatz 55, D-48143 Münster, Germany
  - Fraunhofer Institute for Molecular Biology and Applied Ecology (IME), Forckenbeckstraße 6, D-52074 Aachen, Germany
- Gundula A Noll
  - Institut für Biochemie und Biotechnologie der Pflanzen, Westfälische Wilhelms-Universität Münster, Hindenburgplatz 55, D-48143 Münster, Germany
|
65
|
Nebulin: A Study of Protein Repeat Evolution. J Mol Biol 2010; 402:38-51. [DOI: 10.1016/j.jmb.2010.07.011] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Received: 03/30/2010] [Revised: 06/22/2010] [Accepted: 07/07/2010] [Indexed: 11/22/2022]
|
66
|
Grassi L, Fusco D, Sellerio A, Corà D, Bassetti B, Caselle M, Lagomarsino MC. Identity and divergence of protein domain architectures after the yeast whole-genome duplication event. Molecular BioSystems 2010; 6:2305-15. [PMID: 20820472 DOI: 10.1039/c003507f] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 12/20/2022]
Abstract
Gene duplication is a key mechanism in evolution for generating new functionality, and it is known to have produced a large proportion of genes. Duplication mechanisms include small-scale, or "local", events such as unequal crossing over and retroposition, together with global events, such as chromosomal or whole genome duplication (WGD). In particular, different studies confirmed that the yeast S. cerevisiae arose from a 100-150 million-year old whole-genome duplication. Detection and study of duplications are usually based on sequence alignment, synteny and phylogenetic techniques, but protein domains are also useful in assessing protein homology. We develop a simple and computationally efficient protein domain architecture comparison method based on the domain assignments available from public databases. We test the accuracy and the reliability of this method in detecting instances of gene duplication in the yeast S. cerevisiae. In particular, we analyze the evolution of WGD and non-WGD paralogs from the domain viewpoint, in comparison with a more standard functional analysis of the genes. A large number of domains is shared by genes that underwent local and global duplications, indicating the existence of a common set of "duplicable" domains. On the other hand, WGD and non-WGD paralogs tend to have different functions. We find evidence that this comes from functional migration within similar domain superfamilies, but also from the existence of small sets of WGD and non-WGD specific domain superfamilies with largely different functions. This observation gives a novel perspective on the finding that WGD paralogs tend to be functionally different from small-scale paralogs. WGD and non-WGD superfamilies carry distinct functions. Finally, the Gene Ontology similarity of paralogs tends to decrease with duplication age, while this tendency is weaker or not observable by the comparison of the domain architectures of paralogs. 
This suggests that the set of domains composing a protein tends to be maintained, while its function, cellular process or localization diversifies. Overall, the gathered evidence gives a different viewpoint on the biological specificity of the WGD and at the same time points out the validity of domain architecture comparison as a tool for detecting homology.
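The kind of domain architecture comparison this abstract describes can be sketched in a few lines. The abstract does not state which similarity measure the authors use, so the two scores below (Jaccard overlap of domain sets, and an order-aware ratio from Python's standard `difflib`) are stand-in assumptions, and the domain names are purely illustrative.

```python
from difflib import SequenceMatcher


def domain_set_similarity(arch_a, arch_b):
    """Jaccard index of the two domain sets (ignores order and copy number)."""
    set_a, set_b = set(arch_a), set(arch_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def domain_order_similarity(arch_a, arch_b):
    """Order-aware similarity in [0, 1] based on matching runs of domains."""
    return SequenceMatcher(None, arch_a, arch_b, autojunk=False).ratio()


# Hypothetical paralog pair: same domain content, different order.
paralog_1 = ["SH3", "SH2", "Pkinase"]
paralog_2 = ["SH2", "SH3", "Pkinase"]
print(domain_set_similarity(paralog_1, paralog_2))
print(domain_order_similarity(paralog_1, paralog_2))
```

Reporting both scores separates paralogs that kept their domain content but rearranged it from paralogs that gained or lost domains outright.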
Affiliation(s)
- Luigi Grassi
- Università degli Studi di Torino, Dip. Fisica Teorica-Via Giuria 1, 10125 Torino, Italy
|
67
|
Nacher J, Hayashida M, Akutsu T. The role of internal duplication in the evolution of multi-domain proteins. Biosystems 2010; 101:127-35. [DOI: 10.1016/j.biosystems.2010.05.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 03/17/2010] [Revised: 05/24/2010] [Accepted: 05/25/2010] [Indexed: 11/30/2022]
|
68
|
Bagowski CP, Bruins W, te Velthuis AJ. The nature of protein domain evolution: shaping the interaction network. Curr Genomics 2010; 11:368-76. [PMID: 21286315 PMCID: PMC2945003 DOI: 10.2174/138920210791616725] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 05/08/2010] [Revised: 06/04/2010] [Accepted: 06/13/2010] [Indexed: 11/30/2022] Open
Abstract
The proteomes that make up the collection of proteins in contemporary organisms evolved through recombination and duplication of a limited set of domains. These protein domains are essentially the main components of globular proteins and are the most principal level at which protein function and protein interactions can be understood. An important aspect of domain evolution is their atomic structure and biochemical function, which are both specified by the information in the amino acid sequence. Changes in this information may bring about new folds, functions and protein architectures. With the present and still increasing wealth of sequences and annotation data brought about by genomics, new evolutionary relationships are constantly being revealed, unknown structures modeled and phylogenies inferred. Such investigations not only help predict the function of newly discovered proteins, but also assist in mapping unforeseen pathways of evolution and reveal crucial, co-evolving inter- and intra-molecular interactions. In turn this will help us describe how protein domains shaped cellular interaction networks and the dynamics with which they are regulated in the cell. Additionally, these studies can be used for the design of new and optimized protein domains for therapy. In this review, we aim to describe the basic concepts of protein domain evolution and illustrate recent developments in molecular evolution that have provided valuable new insights in the field of comparative genomics and protein interaction networks.
Affiliation(s)
- Christoph P Bagowski
  - German University Cairo, Faculty of Pharmacy and Biotechnology, New Cairo City, Egypt
- Wouter Bruins
  - Institute of Biology, Leiden University, 2333 AL Leiden, The Netherlands
- Aartjan J.W. te Velthuis
  - Department of Medical Microbiology, Molecular Virology Laboratory, Leiden University Medical Center, Albinusdreef 2, 2333 ZA Leiden, The Netherlands
  - Department of Bionanoscience, Delft University of Technology, Lorentzweg 1, 2628 CJ, Delft, The Netherlands
|
69
|
Buljan M, Frankish A, Bateman A. Quantifying the mechanisms of domain gain in animal proteins. Genome Biol 2010; 11:R74. [PMID: 20633280 PMCID: PMC2926785 DOI: 10.1186/gb-2010-11-7-r74] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.9] [Received: 04/23/2010] [Revised: 06/04/2010] [Accepted: 07/15/2010] [Indexed: 11/21/2022] Open
Abstract
Background: Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. By using animal gene phylogenies we were able to identify a set of high confidence domain gain events and, by looking at their coding DNA, investigate the causative mechanisms. Results: Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events. Conclusions: The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.
Affiliation(s)
- Marija Buljan
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
|
70
|
Abstract
A study of the contributions of different mechanisms of domain gain in animal proteins suggests that gene fusion is likely to be most frequent.
Affiliation(s)
- Joseph A Marsh
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, UK
|
71
|
How do new proteins arise? Curr Opin Struct Biol 2010; 20:390-6. [PMID: 20347587 DOI: 10.1016/j.sbi.2010.02.005] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 02/17/2010] [Revised: 02/24/2010] [Accepted: 02/25/2010] [Indexed: 11/23/2022]
|
72
|
Zhang Q, Zmasek CM, Godzik A. Domain architecture evolution of pattern-recognition receptors. Immunogenetics 2010; 62:263-72. [PMID: 20195594 PMCID: PMC2858798 DOI: 10.1007/s00251-010-0428-1] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Received: 09/18/2009] [Accepted: 02/03/2010] [Indexed: 12/11/2022]
Abstract
In animals, the innate immune system is the first line of defense against invading microorganisms, and the pattern-recognition receptors (PRRs) are the key components of this system, detecting microbial invasion and initiating innate immune defenses. Two families of PRRs, the intracellular NOD-like receptors (NLRs) and the transmembrane Toll-like receptors (TLRs), are of particular interest because of their roles in a number of diseases. Understanding the evolutionary history of these families and their pattern of evolutionary changes may lead to new insights into the functioning of this critical system. We found that the evolution of both NLR and TLR families included massive species-specific expansions and domain shuffling in various lineages, which resulted in the same domain architectures evolving independently within different lineages in a process that fits the definition of parallel evolution. This observation illustrates both the dynamics of the innate immune system and the effects of "combinatorially constrained" evolution, where existence of the limited numbers of functionally relevant domains constrains the choices of domain architectures for new members in the family, resulting in the emergence of independently evolved proteins with identical domain architectures, often mistaken for orthologs.
Affiliation(s)
- Qing Zhang
  - Burnham Institute for Medical Research, 10901 North Torrey Pines Road, La Jolla, CA 92037 USA
- Christian M. Zmasek
  - Burnham Institute for Medical Research, 10901 North Torrey Pines Road, La Jolla, CA 92037 USA
- Adam Godzik
  - Burnham Institute for Medical Research, 10901 North Torrey Pines Road, La Jolla, CA 92037 USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
|
73
|
Hydrothermal focusing of chemical and chemiosmotic energy, supported by delivery of catalytic Fe, Ni, Mo/W, Co, S and Se, forced life to emerge. J Mol Evol 2009; 69:481-96. [PMID: 19911220 DOI: 10.1007/s00239-009-9289-3] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.3] [Received: 06/23/2009] [Accepted: 09/18/2009] [Indexed: 10/20/2022]
Abstract
Energised by the protonmotive force and with the intervention of inorganic catalysts, at base Life reacts hydrogen from a variety of sources with atmospheric carbon dioxide. It seems inescapable that life emerged to fulfil the same role (i.e., to hydrogenate CO2) on the early Earth, thus outcompeting the slow geochemical reduction to methane. Life would have done so where hydrothermal hydrogen interfaced a carbonic ocean through inorganic precipitate membranes. Thus we argue that the first carbon-fixing reaction was the molybdenum-dependent, proton-translocating formate hydrogenlyase system described by Andrews et al. (Microbiology 143:3633-3647, 1997), but driven in reverse. Alkaline on the inside and acidic and carbonic on the outside, a submarine chambered hydrothermal mound built above an alkaline hydrothermal spring of long duration offered just the conditions for such a reverse reaction imposed by the ambient protonmotive force. Assisted by the same inorganic catalysts and potential energy stores that were to evolve into the active centres of enzymes supplied variously from ocean or hydrothermal system, the formate reaction enabled the rest of the acetyl coenzyme-A pathway to be followed exergonically, first to acetate, then separately to methane. Thus the two prokaryotic domains both emerged within the hydrothermal mound: the acetogens were the forerunners of the Bacteria and the methanogens were the forerunners of the Archaea.
|
74
|
Avonce N, Wuyts J, Verschooten K, Vandesteene L, Van Dijck P. The Cytophaga hutchinsonii ChTPSP: First Characterized Bifunctional TPS–TPP Protein as Putative Ancestor of All Eukaryotic Trehalose Biosynthesis Proteins. Mol Biol Evol 2009; 27:359-69. [DOI: 10.1093/molbev/msp241] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Indexed: 12/21/2022] Open
|
75
|
Terrapon N, Gascuel O, Maréchal E, Bréhélin L. Detection of new protein domains using co-occurrence: application to Plasmodium falciparum. Bioinformatics 2009; 25:3077-83. [PMID: 19786484 DOI: 10.1093/bioinformatics/btp560] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION Hidden Markov models (HMMs) have proved to be a powerful tool for protein domain identification in newly sequenced organisms. However, numerous domains may be missed in highly divergent proteins. This is the case for Plasmodium falciparum proteins, the main causal agent of human malaria. RESULTS We propose a method to improve the sensitivity of HMM domain detection by exploiting the tendency of the domains to appear preferentially with a few other favorite domains in a protein. When sequence information alone is not sufficient to warrant the presence of a particular domain, our method enables its detection on the basis of the presence of other Pfam or InterPro domains. Moreover, a shuffling procedure allows us to estimate the false discovery rate associated with the results. Applied to P. falciparum, our method identifies 585 new Pfam domains (versus the 3683 already known domains in the Pfam database) with an estimated error rate <20%. These new domains provide 387 new Gene Ontology (GO) annotations to the P. falciparum proteome. Analogous and congruent results are obtained when applying the method to related Plasmodium species (P. vivax and P. yoelii). AVAILABILITY Supplementary Material and a database of the new domains and GO predictions achieved on Plasmodium proteins are available at http://www.lirmm.fr/~terrapon/codd/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nicolas Terrapon
- Méthodes et algorithmes pour la Bioinformatique, LIRMM, Université Montpellier 2, CNRS, 161 rue Ada, 34392 Montpellier Cedex 5, France
|
76
|
Abstract
Protein domains are the common currency of protein structure and function. Over 10000 such protein families have now been collected in the Pfam database. Using these data along with animal gene phylogenies from TreeFam allowed us to investigate the gain and loss of protein domains. Most gains and losses of domains occur at protein termini. We show that the nature of changes is similar after speciation or duplication events. However, changes in domain architecture happen at a higher frequency after gene duplication. We suggest that the bias towards protein termini is largely because insertion and deletion of domains at most positions in a protein are likely to disrupt the structure of existing domains. We can also use Pfam to trace the evolution of specific families. For example, the immunoglobulin superfamily can be traced over 500 million years during its expansion into one of the largest families in the human genome. It can be shown that this protein family has its origins in basic animals such as the poriferan sponges where it is found in cell-surface-receptor proteins. We can trace how the structure and sequence of this family diverged during vertebrate evolution into constant and variable domains that are found in the antibodies of our immune system as well as in neural and muscle proteins.
|
77
|
Abstract
It has been known for more than 35 years that, during evolution, new proteins are formed by gene duplications, sequence and structural divergence and, in many cases, gene combinations. The genome projects have produced complete, or almost complete, descriptions of the protein repertoires of over 600 distinct organisms. Analyses of these data have dramatically increased our understanding of the formation of new proteins. At the present time, we can accurately trace the evolutionary relationships of about half the proteins found in most genomes, and it is these proteins that we discuss in the present review. Usually, the units of evolution are protein domains that are duplicated, diverge and form combinations. Small proteins contain one domain, and large proteins contain combinations of two or more domains. Domains descended from a common ancestor are clustered into superfamilies. In most genomes, the net growth of superfamily members means that more than 90% of domains are duplicates. In a section on domain duplications, we discuss the number of currently known superfamilies, their size and distribution, and superfamily expansions related to biological complexity and to specific lineages. In a section on divergence, we describe how sequences and structures diverge, the changes in stability produced by acceptable mutations, and the nature of functional divergence and selection. In a section on domain combinations, we discuss their general nature, the sequential order of domains, how combinations modify function, and the extraordinary variety of the domain combinations found in different genomes. We conclude with a brief note on other forms of protein evolution and speculations of the origins of the duplication, divergence and combination processes.
|
78
|
Wang M, Caetano-Anollés G. The evolutionary mechanics of domain organization in proteomes and the rise of modularity in the protein world. Structure 2009; 17:66-78. [PMID: 19141283 DOI: 10.1016/j.str.2008.11.008] [Citation(s) in RCA: 101] [Impact Index Per Article: 6.7] [Received: 08/17/2008] [Revised: 10/27/2008] [Accepted: 11/13/2008] [Indexed: 10/21/2022]
Abstract
Protein domains are compact evolutionary units of structure and function that usually combine in proteins to produce complex domain arrangements. In order to study their evolution, we reconstructed genome-based phylogenetic trees of architectures from a census of domain structure and organization conducted at protein fold and fold-superfamily levels in hundreds of fully sequenced genomes. These trees defined timelines of architectural discovery and revealed remarkable evolutionary patterns, including the explosive appearance of domain combinations during the rise of organismal lineages, the dominance of domain fusion processes throughout evolution, and the late appearance of a new class of multifunctional modules in Eukarya by fission of domain combinations. Our study provides a detailed account of the history and diversification of a molecular interactome and shows how the interplay of domain fusions and fissions defines an evolutionary mechanics of domain organization that is fundamentally responsible for the complexity of the protein world.
Affiliation(s)
- Minglei Wang
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
79
|
Cosentino Lagomarsino M, Sellerio AL, Heijning PD, Bassetti B. Universal features in the genome-level evolution of protein domains. Genome Biol 2009; 10:R12. [PMID: 19183449 PMCID: PMC2687789 DOI: 10.1186/gb-2009-10-1-r12] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 12/04/2008] [Revised: 01/22/2009] [Accepted: 01/30/2009] [Indexed: 12/03/2022] Open
Abstract
Novel protein domain stochastic duplication/innovation models that are independent of genome-specific features are used to interpret global trends of genome evolution. Background: Protein domains can be used to study proteome evolution at a coarse scale. In particular, they are found on genomes with notable statistical distributions. It is known that the distribution of domains with a given topology follows a power law. We focus on a further aspect: these distributions, and the number of distinct topologies, follow collective trends, or scaling laws, depending on the total number of domains only, and not on genome-specific features. Results: We present a stochastic duplication/innovation model, in the class of the so-called 'Chinese restaurant processes', that explains this observation with two universal parameters, representing a minimal number of domains and the relative weight of innovation to duplication. Furthermore, we study a model variant where new topologies are related to occurrence in genomic data, accounting for fold specificity. Conclusions: Both models have general quantitative agreement with data from hundreds of genomes, which indicates that the domains of a genome are built with a combination of specificity and robust self-organizing phenomena. The latter are related to the basic evolutionary 'moves' of duplication and innovation, and give rise to the observed scaling laws, a priori of the specific evolutionary history of a genome. We interpret this as the concurrent effect of neutral and selective drives, which increase duplication and decrease innovation in larger and more complex genomes. The validity of our model would imply that the empirical observation of a small number of folds in nature may be a consequence of their evolution.
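To make the duplication/innovation idea in this abstract concrete, here is a minimal simulation of a generic two-parameter Chinese restaurant process. It is a sketch under stated assumptions, not the paper's model: the parameter names `alpha` and `theta`, the update rule (the textbook two-parameter form), and the function name `simulate_domain_classes` are introduced here for illustration, and the paper's exact parameterization may differ.

```python
import random


def simulate_domain_classes(n_domains, alpha, theta, seed=0):
    """Grow a genome one domain at a time; return the size of each domain class.

    Assumed rule (standard two-parameter Chinese restaurant process):
      innovation:  P(new class)      = (theta + alpha * K) / (n + theta)
      duplication: P(existing class) = (size_i - alpha) / (n + theta)
    where n is the current number of domains and K the number of classes.
    Requires theta > 0 and 0 <= alpha < 1.
    """
    rng = random.Random(seed)
    class_sizes = []
    for n in range(n_domains):
        k = len(class_sizes)
        p_new = (theta + alpha * k) / (n + theta)
        if rng.random() < p_new:
            class_sizes.append(1)  # innovation: a new domain class appears
        else:
            # duplication: larger classes are proportionally more likely to grow
            weights = [size - alpha for size in class_sizes]
            chosen = rng.choices(range(k), weights=weights)[0]
            class_sizes[chosen] += 1
    return class_sizes


sizes = simulate_domain_classes(n_domains=2000, alpha=0.5, theta=10.0)
print(len(sizes), "classes; largest has", max(sizes), "domains")
```

In this toy version the number of classes depends only on the total number of domains and the two parameters, which is the flavour of genome-independent scaling the abstract refers to.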
|
80
|
Kummerfeld SK, Teichmann SA. Protein domain organisation: adding order. BMC Bioinformatics 2009; 10:39. [PMID: 19178743 PMCID: PMC2657131 DOI: 10.1186/1471-2105-10-39] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Received: 05/07/2008] [Accepted: 01/29/2009] [Indexed: 11/30/2022] Open
Abstract
Background: Domains are the building blocks of proteins. During evolution, they have been duplicated, fused and recombined, to produce proteins with novel structures and functions. Structural and genome-scale studies have shown that pairs or groups of domains observed together in a protein are almost always found in only one N to C terminal order and are the result of a single recombination event that has been propagated by duplication of the multi-domain unit. Previous studies of domain organisation have used graph theory to represent the co-occurrence of domains within proteins. We build on this approach by adding directionality to the graphs and connecting nodes based on their relative order in the protein. Most of the time, the linear order of domains is conserved. However, using the directed graph representation we have identified non-linear features of domain organization that are over-represented in genomes. Recognising these patterns and unravelling how they have arisen may allow us to understand the functional relationships between domains and understand how the protein repertoire has evolved. Results: We identify groups of domains that are not linearly conserved, but instead have been shuffled during evolution so that they occur in multiple different orders. We consider 192 genomes across all three kingdoms of life and use domain and protein annotation to understand their functional significance. To identify these features and assess their statistical significance, we represent the linear order of domains in proteins as a directed graph and apply graph theoretical methods. We describe two higher-order patterns of domain organisation: clusters and bi-directionally associated domain pairs and explore their functional importance and phylogenetic conservation. Conclusion: Taking into account the order of domains, we have derived a novel picture of global protein organization.
We found that all genomes have a higher than expected degree of clustering and more domain pairs in forward and reverse orientation in different proteins relative to random graphs with identical degree distributions. While these features were statistically over-represented, they are still fairly rare. Looking in detail at the proteins involved, we found strong functional relationships within each cluster. In addition, the domains tended to be involved in protein-protein interaction and are able to function as independent structural units. A particularly striking example was the human Jak-STAT signalling pathway which makes use of a set of domains in a range of orders and orientations to provide nuanced signaling functionality. This illustrated the importance of functional and structural constraints (or lack thereof) on domain organisation.
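The directed-graph representation described in this abstract can be illustrated with a minimal Python sketch. It assumes that an edge links each pair of adjacent domains in N-to-C order, which is one possible reading of "relative order"; the function names and the toy architectures are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def build_order_graph(architectures):
    """Map each directed edge (left, right) to the proteins containing it.

    Assumption: an edge is drawn only between adjacent domains, with
    `left` lying N-terminal to `right`.
    """
    edges = defaultdict(set)
    for protein, domains in architectures.items():
        for left, right in zip(domains, domains[1:]):
            edges[(left, right)].add(protein)
    return edges

def bidirectional_pairs(edges):
    """Return domain pairs observed in both orientations."""
    pairs = set()
    for (a, b) in edges:
        if a != b and (b, a) in edges:
            pairs.add(tuple(sorted((a, b))))
    return pairs

# Toy data (hypothetical), listed in N-to-C order.
example = {
    "protA": ["SH2", "Kinase"],
    "protB": ["Kinase", "SH2"],
    "protC": ["SH3", "SH2", "Kinase"],
}
edges = build_order_graph(example)
both = bidirectional_pairs(edges)
```

Here `both` holds the single pair that occurs in forward and reverse orientation in different proteins. Assessing over-representation, as the abstract describes, would additionally require comparison with random graphs of identical degree distribution, which this sketch does not attempt.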
Affiliation(s)
- Sarah K Kummerfeld
- Department of Developmental Biology, 279 Campus Dr, Stanford, 94305, CA, USA.
|
81
|
Basu MK, Poliakov E, Rogozin IB. Domain mobility in proteins: functional and evolutionary implications. Brief Bioinform 2009; 10:205-16. [PMID: 19151098 DOI: 10.1093/bib/bbn057] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 11/14/2022] Open
Abstract
A substantial fraction of eukaryotic proteins contains multiple domains, some of which show a tendency to occur in diverse domain architectures and can be considered mobile (or 'promiscuous'). These promiscuous domains are typically involved in protein-protein interactions and play crucial roles in interaction networks, particularly those contributing to signal transduction. They also play a major role in creating diversity of protein domain architecture in the proteome. It is now apparent that promiscuity is a volatile and relatively fast-changing feature in evolution, and that only a few domains retain their promiscuity status throughout evolution. Many such domains attained their promiscuity status independently in different lineages. Only recently, we have begun to understand the diversity of protein domain architectures and the role the promiscuous domains play in evolution of this diversity. However, many of the biological mechanisms of protein domain mobility remain shrouded in mystery. In this review, we discuss our present understanding of protein domain promiscuity, its evolution and its role in cellular function.
Affiliation(s)
- Malay Kumar Basu
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
|
82
|
Yang H, Wu Y, Feng J, Yang S, Tian D. Evolutionary pattern of protein architecture in mammal and fruit fly genomes. Genomics 2008; 93:90-7. [PMID: 18929639 DOI: 10.1016/j.ygeno.2008.09.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 05/09/2008] [Revised: 09/12/2008] [Accepted: 09/13/2008] [Indexed: 11/17/2022]
Abstract
Mutations, which can alter amino acid constitution, contribute greatly to protein evolution. However, little is reported of their pattern during protein structural evolution. We investigated the distribution of non-synonymous single nucleotide polymorphisms (nsSNPs) and insertions/deletions (indels) along mammal and fruit fly proteins. We found the nsSNPs (and d(N)) and indels increased in protein boundary regions, and this pattern is inversely correlated with the distribution of protein domain density. Additionally, synonymous substitutions (and d(S)) are reduced in 5' and 3' regions, indicating more variable protein boundaries, compared with central interior. All evidence suggests that the inner part of coding sequences (CDSs) is comparatively conserved, whereas the 5' and 3' regions, with higher evolution rates, are more variable. We assumed that due to greater frequencies of nsSNPs and indels in adaptive regions of CDSs it could be easier to ultimately alter, gain, or lose amino acids, thus becoming the front line of protein evolution.
Affiliation(s)
- Haiwang Yang
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Biology, Nanjing University, Nanjing 210093, China
|
83
|
Weiner J, Moore AD, Bornberg-Bauer E. Just how versatile are domains? BMC Evol Biol 2008; 8:285. [PMID: 18854028 PMCID: PMC2588589 DOI: 10.1186/1471-2148-8-285] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Received: 02/25/2008] [Accepted: 10/14/2008] [Indexed: 11/17/2022] Open
Abstract
Background Creating new protein domain arrangements is a frequent mechanism of evolutionary innovation. While some domains always form the same combinations, others form many different arrangements. This ability is often referred to as versatility or promiscuity of domains, and is considered here against a random evolutionary model in which a domain's promiscuity is based on its relative frequency. Results We show that there is a clear relationship across genomes between the promiscuity of a given domain and its frequency. However, the strength of this relationship differs for different domains. We thus redefine domain promiscuity by defining a new index, DVI ("domain versatility index"), which eliminates the effect of domain frequency. We explore links between a domain's versatility, when unlinked from abundance, and its biological properties. Conclusion Our results indicate that domains occurring as single domain proteins and domains appearing frequently at protein termini have a higher DVI. This is consistent with previous observations that the evolution of domain re-arrangements is primarily driven by fusion of pre-existing arrangements and single domains as well as loss of domains at protein termini. Furthermore, we studied the link between domain age, defined as the first appearance of a domain in the species tree, and the DVI. Contrary to previous studies based on domain promiscuity, it seems as if the DVI is age independent. Finally, we find that contrary to previously reported findings, versatility is lower in Eukaryotes. In summary, our measure of domain versatility indicates that a random attachment process is sufficient to explain the observed distribution of domain arrangements and that several views on domain promiscuity need to be revised.
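The relationship between a domain's frequency and its promiscuity can be made concrete with a small Python sketch. It counts, for each domain, how often it occurs and how many distinct domains it sits next to. This is only a simple neighbour count chosen for illustration; it is not the versatility index of the paper, whose formula the abstract does not give, and the function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def abundance_and_partners(architectures):
    """Return (occurrence counts, number of distinct adjacent partners).

    Assumption: "promiscuity" is approximated by the number of distinct
    domains found directly next to a domain in any architecture.
    """
    abundance = Counter()
    partners = defaultdict(set)
    for domains in architectures:
        abundance.update(domains)
        for left, right in zip(domains, domains[1:]):
            partners[left].add(right)
            partners[right].add(left)
    partner_counts = {d: len(partners[d]) for d in abundance}
    return abundance, partner_counts

# Toy architectures (hypothetical), each listed in N-to-C order.
toy = [["A", "B"], ["A", "C"], ["A"], ["B", "C"], ["D"]]
abundance, partner_counts = abundance_and_partners(toy)
```

Plotting `partner_counts` against `abundance` over a real proteome would show how strongly the raw count tracks frequency, which is the effect an abundance-corrected index aims to remove.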
Affiliation(s)
- January Weiner
- Institute for Evolution and Biodiversity, Evolutionary Bioinformatics Group, Westphalian Wilhelms-University, Münster, Germany.
|
84
|
Baertsch R, Diekhans M, Kent WJ, Haussler D, Brosius J. Retrocopy contributions to the evolution of the human genome. BMC Genomics 2008; 9:466. [PMID: 18842134 PMCID: PMC2584115 DOI: 10.1186/1471-2164-9-466] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.8] [Received: 03/17/2008] [Accepted: 10/08/2008] [Indexed: 02/06/2023] Open
Abstract
Background Evolution via point mutations is a relatively slow process and is unlikely to completely explain the differences between primates and other mammals. By contrast, 45% of the human genome is composed of retroposed elements, many of which were inserted in the primate lineage. A subset of retroposed mRNAs (retrocopies) shows strong evidence of expression in primates, often yielding functional retrogenes. Results To identify and analyze the relatively recently evolved retrogenes, we carried out BLASTZ alignments of all human mRNAs against the human genome and scored a set of features indicative of retroposition. Of over 12,000 putative retrocopy-derived genes that arose mainly in the primate lineage, 726 with strong evidence of transcript expression were examined in detail. These mRNA retroposition events fall into three categories: I) 34 retrocopies and antisense retrocopies that added potential protein coding space and UTRs to existing genes; II) 682 complete retrocopy duplications inserted into new loci; and III) an unexpected set of 13 retrocopies that contributed out-of-frame, or antisense sequences in combination with other types of transposed elements (SINEs, LINEs, LTRs), even unannotated sequence to form potentially novel genes with no homologs outside primates. In addition to their presence in human, several of the gene candidates also had potentially viable ORFs in chimpanzee, orangutan, and rhesus macaque, underscoring their potential of function. Conclusion mRNA-derived retrocopies provide raw material for the evolution of genes in a wide variety of ways, duplicating and amending the protein coding region of existing genes as well as generating the potential for new protein coding space, or non-protein coding RNAs, by unexpected contributions out of frame, in reverse orientation, or from previously non-protein coding sequence.
Affiliation(s)
- Robert Baertsch
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA.
|
85
|
Moore AD, Björklund AK, Ekman D, Bornberg-Bauer E, Elofsson A. Arrangements in the modular evolution of proteins. Trends Biochem Sci 2008; 33:444-51. [PMID: 18656364 DOI: 10.1016/j.tibs.2008.05.008] [Citation(s) in RCA: 171] [Impact Index Per Article: 10.7] [Received: 11/20/2007] [Revised: 05/28/2008] [Accepted: 05/28/2008] [Indexed: 11/17/2022]
Abstract
It has been known for the last couple of decades that proteins evolve partly through rearrangements of larger fragments, typically domains. These units are considered the basic modules of protein structure, evolution and function. In the last few years, the analysis of protein-domain rearrangements has provided us with functional and evolutionary insights and has aided improved functional predictions and domain assignments to previously uncharacterised genes and proteins. Although some mechanisms that govern modular rearrangements of protein domains have been uncovered, such as the addition or deletion of a single N- or C-terminal domain, much is still unknown about the genetics behind these arrangements.
Affiliation(s)
- Andrew D Moore
- Evolutionary Bioinformatics, IEB, University of Münster, Hüfferstrasse 1, Münster, Germany
|
86
|
Engineering of functional replication protein A homologs based on insights into the evolution of oligonucleotide/oligosaccharide-binding folds. J Bacteriol 2008; 190:5766-80. [PMID: 18586938 DOI: 10.1128/jb.01930-07] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 01/17/2023] Open
Abstract
The bacterial single-stranded DNA-binding protein (SSB) and the archaeal/eukaryotic functional homolog, replication protein A (RPA), are essential for most aspects of DNA metabolism. Structural analyses of the architecture of SSB and RPA suggest that they are composed of different combinations of a module called the oligonucleotide/oligosaccharide-binding (OB) fold. Members of the domains Bacteria and Eukarya, in general, contain one type of SSB or RPA. In contrast, organisms in the archaeal domain have different RPAs made up of different organizations of OB folds. Interestingly, the euryarchaeon Methanosarcina acetivorans harbors multiple functional RPAs named MacRPA1 (for M. acetivorans RPA 1), MacRPA2, and MacRPA3. Comparison of MacRPA1 with related proteins in the publicly available databases suggested that intramolecular homologous recombination might play an important role in generating some of the diversity of OB folds in archaeal cells. On the basis of this information, from a four-OB-fold-containing RPA, we engineered chimeric modules to create three-OB-fold-containing RPAs to mimic a novel form of RPA found in Methanococcoides burtonii and Methanosaeta thermophila. We further created two RPAs that mimicked the RPAs in Methanocaldococcus jannaschii and Methanothermobacter thermautotrophicus through fusions of modules from MacRPA1 and M. thermautotrophicus RPA. Functional studies of these engineered proteins suggested that fusion and shuffling of OB folds can lead to well-folded polypeptides with most of the known properties of SSB and RPAs. On the basis of these results, different models that attempt to explain how intramolecular and intermolecular homologous recombination can generate novel forms of SSB or RPAs are proposed.
|
87
|
Amoutzias GD, Van de Peer Y, Mossialos D. Evolution and taxonomic distribution of nonribosomal peptide and polyketide synthases. Future Microbiol 2008; 3:361-70. [DOI: 10.2217/17460913.3.3.361] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 11/21/2022] Open
Abstract
The majority of nonribosomal peptide synthases and type I polyketide synthases are multimodular megasynthases of oligopeptide and polyketide secondary metabolites, respectively. Owing to their multimodular architecture, they synthesize their metabolites in assembly line logic. The ongoing genomic revolution together with the application of computational tools has provided the opportunity to mine the various genomes for these enzymes and identify those organisms that produce many oligopeptide and polyketide metabolites. In addition, scientists have started to comprehend the molecular mechanisms of megasynthase evolution, by duplication, recombination, point mutation and module skipping. This knowledge and computational analyses have been implemented towards predicting the specificity of these megasynthases and the structure of their end products. It is an exciting field, both for gaining deeper insight into their basic molecular mechanisms and exploiting them biotechnologically.
Affiliation(s)
- Grigoris D Amoutzias
- Department of Plant Systems Biology, VIB & Department of Molecular Genetics, Ghent University, Technologiepark 927, B-9052 Ghent, Belgium
- Yves Van de Peer
- Department of Plant Systems Biology, VIB & Department of Molecular Genetics, Ghent University, Technologiepark 927, B-9052 Ghent, Belgium
- Dimitris Mossialos
- Department of Biochemistry & Biotechnology, University of Thessaly, Ploutonos & Aiolou 26, GR-41221 Larissa, Greece
|
88
|
Sequence similarity network reveals common ancestry of multidomain proteins. PLoS Comput Biol 2008; 4:e1000063. [PMID: 18475320 PMCID: PMC2377100 DOI: 10.1371/journal.pcbi.1000063] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Received: 02/15/2007] [Accepted: 03/18/2008] [Indexed: 11/25/2022] Open
Abstract
We address the problem of homology identification in complex multidomain families with varied domain architectures. The challenge is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. There are two major obstacles to multidomain homology identification: lack of a formal definition and lack of curated benchmarks for evaluating the performance of new methods. We offer preliminary solutions to both problems: 1) an extension of the traditional model of homology to include domain insertions; and 2) a manually curated benchmark of well-studied families in mouse and human. We further present Neighborhood Correlation, a novel method that exploits the local structure of the sequence similarity network to identify homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison using our curated data, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is well suited for automated, genome-scale analyses. It is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Homolog predictions obtained with our method, as well as our manually curated benchmark and a web-based visualization tool for exploratory analysis of the network neighborhood structure, are available at http://www.neighborhoodcorrelation.org. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. 
In contrast to current approaches that either focus on the homology of individual domains or consider only families with identical domain architectures, we show that homology can be rationally defined for multidomain families with diverse architectures by considering the genomic context of the genes that encode them. Our study demonstrates the utility of mining network structure for evolutionary information, suggesting this is a fertile approach for investigating evolutionary processes in the post-genomic era. New genes evolve through the duplication and modification of existing genes. As a result, genes that share common ancestry tend to have similar structure and function. Computational methods that use common ancestry have been extraordinarily successful in inferring function. The practice of discerning evolutionary relationships is stymied, however, by modular sequences made up of two or more domains. When two genes share some domains but not others, it is difficult to distinguish a case of common ancestry from insertion of the same domain into both genes. We present a formal framework to define how multidomain genes are related, and propose a novel method for rapid, robust characterization of evolutionary relationships. In an empirical comparison with the current state of the art, we demonstrate superior performance of our method using a large hand-curated set of sequences known to share common ancestry. The success of our method derives from its unique ability to infer evolutionary history from local topology in the sequence similarity network. This represents a departure from the view that protein family classification must be restricted to families with conserved architecture. By exploiting the structure of the sequence similarity network, our approach surmounts this limitation and opens the door to studies of the role of modularity in protein evolution.
|
89
|
Jiménez JL, Davletov B. Beta-strand recombination in tricalbin evolution and the origin of synaptotagmin-like C2 domains. Proteins 2007; 68:770-8. [PMID: 17510957 DOI: 10.1002/prot.21449] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
Two protein families involved in membrane traffic, tricalbins and synaptotagmins, contain several copies of C2 domains and are related based on their sequence and domain architecture. Paradoxically, tricalbin and synaptotagmin C2 domains belong to different structural types with apparent circular permutation of terminal beta-strands. To understand whether a topological switch took place, we analyzed tricalbin and synaptotagmin-like C2 domains using two-dimensional structural analysis. We found that yeast tricalbins contain five to six C2 domains. One of these C2 domains possesses many features of synaptotagmin-like C2 domains and also carries a conserved C-terminal strand that is similar to its structural equivalent in synaptotagmin-like C2 domains, suggesting a structural permutation event. Indeed, among higher eukaryotes, animal tricalbins have evolved a C2 domain with synaptotagmin-like topology indicating that the structural conversion has taken place. Investigation of plant synaptotagmins, however, proves that they are direct tricalbin orthologs. Our analysis shows that beta-strand recombination is a possible evolutionary mechanism to generate new structural topologies with altered functional properties.
Affiliation(s)
- José L Jiménez
- Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
|
90
|
Rasteiro R, Pereira-Leal JB. Multiple domain insertions and losses in the evolution of the Rab prenylation complex. BMC Evol Biol 2007; 7:140. [PMID: 17705859 PMCID: PMC1994686 DOI: 10.1186/1471-2148-7-140] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 06/16/2007] [Accepted: 08/17/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Rab proteins are regulators of vesicular trafficking that require a lipid modification, prenylation of C-terminal cysteines, for proper function. This is catalysed by a complex of a catalytic heterodimer (Rab Geranylgeranyl Transferase, RabGGTase) and an accessory protein (Rab Escort Protein, REP). Components of this complex display domain insertions relative to paralogous proteins. The function of these inserted domains is unclear. RESULTS We profiled the domain architecture of the components of the Rab prenylation complex in evolution. We identified the orthologues of the components of the Rab prenylation machinery in 43 organisms, representing the crown eukaryotic groups. We characterize in detail the domain structure of all these components and the phylogenetic relationships between the individual domains. CONCLUSION We found different domain insertions in different taxa, in alpha-subunits of RabGGTase and REP. Our results suggest that there were multiple insertions, expansions and contractions in the evolution of this prenylation complex.
Affiliation(s)
- Rita Rasteiro
- Instituto Gulbenkian de Ciência, Apartado 14, P-2781-901 Oeiras, Portugal
|
91
|
Destri C, Miccio C. Simple stochastic model for the evolution of protein lengths. Phys Rev E Stat Nonlin Soft Matter Phys 2007; 76:011924. [PMID: 17677511 DOI: 10.1103/physreve.76.011924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2007] [Indexed: 05/16/2023]
Abstract
We analyze a simple discrete-time stochastic process for the theoretical modeling of the evolution of protein lengths. At every step of the process, a new protein is produced as a modification of one of the proteins already existing, and its length is assumed to be a random variable that depends only on the length of the originating protein. Thus a random recursive tree is produced over the natural numbers. If (quasi) scale invariance is assumed, the length distribution in a single history tends to a log-normal form with a specific signature of the deviations from exact Gaussianity. Comparison with the very large Similarity Matrix of Proteins database shows good agreement.
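The process described in this abstract can be simulated in a few lines of Python. The sketch assumes that scale invariance is modelled by multiplying the parent length by a log-normally distributed factor, and that the parent is drawn uniformly from the existing proteins; the function name, the starting length, and the value of `sigma` are illustrative choices, not parameters from the paper.

```python
import math
import random

def simulate_lengths(n_steps, initial_length=300.0, sigma=0.2, seed=0):
    """Grow a population of protein lengths one protein per step.

    Assumptions: the parent is chosen uniformly at random, and the new
    length is the parent length times exp(N(0, sigma)), so the change
    depends only on the parent length and is scale invariant.
    """
    rng = random.Random(seed)
    lengths = [initial_length]
    for _ in range(n_steps):
        parent = rng.choice(lengths)
        factor = math.exp(rng.gauss(0.0, sigma))
        lengths.append(parent * factor)
    return lengths

lengths = simulate_lengths(5000)
log_lengths = [math.log(x) for x in lengths]
mean_log = sum(log_lengths) / len(log_lengths)
```

A histogram of `log_lengths` gives a quick visual check of how close the simulated distribution comes to a Gaussian in log space, which is what a log-normal length distribution implies.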
Affiliation(s)
- C Destri
- Dipartimento di Fisica G. Occhialini, Università di Milano-Bicocca and INFN, Sezione di Milano, Piazza della Scienza 3, I-20126 Milano, Italy
|
92
|
Ekman D, Björklund AK, Elofsson A. Quantification of the elevated rate of domain rearrangements in metazoa. J Mol Biol 2007; 372:1337-48. [PMID: 17689563 DOI: 10.1016/j.jmb.2007.06.022] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Received: 03/21/2007] [Revised: 06/07/2007] [Accepted: 06/08/2007] [Indexed: 11/24/2022]
Abstract
Most eukaryotic proteins consist of multiple domains created through gene fusions or internal duplications. The most frequent change of a domain architecture (DA) is insertion or deletion of a domain at the N or C terminus. Still, the mechanisms underlying the evolution of multidomain proteins are not very well studied. Here, we have studied the evolution of multidomain architectures (MDA), guided by evolutionary information in the form of a phylogenetic tree. Our results show that Pfam domain families and MDAs have been created with comparable rates (0.1-1 per million years (My)). The major changes in DA evolution have occurred in the process of multicellularization and within the metazoan lineage. In contrast, creation of domains seems to have been frequent already in the early evolution. Furthermore, most of the architectures have been created from older domains or architectures, whereas novel domains are mainly found in single-domain proteins. However, a particular group of exon-bordering domains may have contributed to the rapid evolution of novel multidomain proteins in metazoan organisms. Finally, MDAs have evolved predominantly through insertions of domains, whereas domain deletions are less common. In conclusion, the rate of creation of multidomain proteins has accelerated in the metazoan lineage, which may partly be explained by the frequent insertion of exon-bordering domains into new architectures. However, our results indicate that other factors have contributed as well.
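The most frequent change named in this abstract, insertion or deletion of a single domain at the N or C terminus, can be recognised with a short Python sketch. It compares an ancestral and a descendant architecture given as domain lists in N-to-C order. The function name and labels are hypothetical, and the simple comparison below stands in for the tree-guided analysis of the paper.

```python
def classify_change(ancestor, descendant):
    """Label a single-domain terminal change between two architectures.

    Assumption: only one-domain gains or losses at a terminus are
    recognised; everything else is reported as "other". When a change is
    ambiguous (for example A -> A-A), the N-terminal label is returned.
    """
    a, d = list(ancestor), list(descendant)
    if a == d:
        return "unchanged"
    if len(d) == len(a) + 1:
        if d[1:] == a:
            return "N-terminal insertion"
        if d[:-1] == a:
            return "C-terminal insertion"
    if len(d) == len(a) - 1:
        if a[1:] == d:
            return "N-terminal deletion"
        if a[:-1] == d:
            return "C-terminal deletion"
    return "other"

change = classify_change(["B", "C"], ["A", "B", "C"])
```

Tallying these labels along the branches of a species tree would give rough counts of insertions versus deletions, under the assumptions stated above.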
Affiliation(s)
- Diana Ekman
- Stockholm Bioinformatics Center, SCFAB, Stockholm University, SE-10691 Stockholm, Sweden
|
93
|
Beaussart F, Weiner J, Bornberg-Bauer E. Automated Improvement of Domain ANnotations using context analysis of domain arrangements (AIDAN). Bioinformatics 2007; 23:1834-6. [PMID: 17483506 DOI: 10.1093/bioinformatics/btm240] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Abstract
MOTIVATION Since protein domains are the units of evolution, databases of domain signatures such as ProDom or Pfam enable both a sensitive and selective sequence analysis. However, manually curated databases have a low coverage and automatically generated ones often miss relationships which have not yet been discovered between domains or cannot display similarities between domains which have drifted apart. METHODS We present a tool which makes use of the fact that overall domain arrangements are often conserved. AIDAN (Automated Improvement of Domain ANnotations) identifies potential annotation artifacts and domains which have drifted apart. The underlying database supplements ProDom and is interfaced by a graphical tool allowing the localization of single domain deletions or annotations which have been falsely made by the automated procedure. AVAILABILITY http://www.uni-muenster.de/Evolution/ebb/Services/AIDAN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Francois Beaussart
- Evolutionary Bioinformatics, Institute for Evolution and Biodiversity, Westfälische Wilhelms University, Schlossplatz 4, D48149 Münster, Germany
|
94
|
Han JH, Batey S, Nickson AA, Teichmann SA, Clarke J. The folding and evolution of multidomain proteins. Nat Rev Mol Cell Biol 2007; 8:319-30. [PMID: 17356578 DOI: 10.1038/nrm2144] [Citation(s) in RCA: 283] [Impact Index Per Article: 16.6] [Indexed: 11/09/2022]
Abstract
Analyses of genomes show that more than 70% of eukaryotic proteins are composed of multiple domains. However, most studies of protein folding focus on individual domains and do not consider how interactions between domains might affect folding. Here, we address this by analysing the three-dimensional structures of multidomain proteins that have been characterized experimentally and observe that where the interface is small and loosely packed, or unstructured, the folding of the domains is independent. Furthermore, recent studies indicate that multidomain proteins have evolved mechanisms to minimize the problems of interdomain misfolding.
Affiliation(s)
- Jung-Hoon Han
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
|
95
|
Fong JH, Geer LY, Panchenko AR, Bryant SH. Modeling the evolution of protein domain architectures using maximum parsimony. J Mol Biol 2006; 366:307-15. [PMID: 17166515 PMCID: PMC1858635 DOI: 10.1016/j.jmb.2006.11.017] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.2] [Received: 08/03/2006] [Revised: 09/22/2006] [Accepted: 11/06/2006] [Indexed: 10/23/2022]
Abstract
Domains are basic evolutionary units of proteins and most proteins have more than one domain. Advances in domain modeling and collection are making it possible to annotate a large fraction of known protein sequences by a linear ordering of their domains, yielding their architecture. Protein domain architectures link evolutionarily related proteins and underscore their shared functions. Here, we attempt to better understand this association by identifying the evolutionary pathways by which extant architectures may have evolved. We propose a model of evolution in which architectures arise through rearrangements of inferred precursor architectures and acquisition of new domains. These pathways are ranked using a parsimony principle, whereby scenarios requiring the fewest number of independent recombination events, namely fission and fusion operations, are assumed to be more likely. Using a data set of domain architectures present in 159 proteomes that represent all three major branches of the tree of life allows us to estimate the history of over 85% of all architectures in the sequence database. We find that the distribution of rearrangement classes is robust with respect to alternative parsimony rules for inferring the presence of precursor architectures in ancestral species. Analyzing the most parsimonious pathways, we find 87% of architectures to gain complexity over time through simple changes, among which fusion events account for 5.6 times as many architectures as fission. Our results may be used to compute domain architecture similarities, for example, based on the number of historical recombination events separating them. Domain architecture "neighbors" identified in this way may lead to new insights about the evolution of protein function.
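The parsimony idea in this abstract, preferring scenarios with the fewest fusion and fission events, can be sketched in Python for the simplest case of a single event. The function below asks whether a target architecture is already present among the precursors, can be formed by fusing two precursors end to end, or can be obtained by splitting a terminal segment off a longer precursor. The function name, the return format, and the restriction to one event are assumptions made for illustration; the paper's model ranks full pathways and is not reproduced here.

```python
def explain_by_single_event(target, precursors):
    """Explain `target` by at most one fusion or fission of precursors.

    Assumptions: architectures are domain lists in N-to-C order; fusion
    concatenates two known precursors; fission yields a prefix or suffix
    of a longer known precursor.
    """
    target = tuple(target)
    known = {tuple(p) for p in precursors}
    if not target:
        return ("unexplained",)
    if target in known:
        return ("inherited", target)
    # Fusion: split the target at every position and look up both halves.
    for cut in range(1, len(target)):
        left, right = target[:cut], target[cut:]
        if left in known and right in known:
            return ("fusion", left, right)
    # Fission: the target is a terminal segment of a longer precursor.
    n = len(target)
    for p in sorted(known):
        if len(p) > n and (p[:n] == target or p[-n:] == target):
            return ("fission", p)
    return ("unexplained",)

result = explain_by_single_event(["A", "B", "C"], [["A"], ["B", "C"]])
```

Counting how many architectures fall under each label gives a rough fusion-to-fission ratio for a data set, in the spirit of the comparison reported in the abstract.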
|
96
|
Tracing the origin of functional and conserved domains in the human proteome: implications for protein evolution at the modular level. BMC Evol Biol 2006; 6:91. [PMID: 17090320 PMCID: PMC1654190 DOI: 10.1186/1471-2148-6-91] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 07/13/2006] [Accepted: 11/07/2006] [Indexed: 11/29/2022] Open
Abstract
Background The functional repertoire of the human proteome is an incremental collection of functions accomplished by protein domains evolved along the Homo sapiens lineage. Therefore, knowledge on the origin of these functionalities provides a better understanding of the domain and protein evolution in human. The lack of proper comprehension about such origin has impelled us to study the evolutionary origin of human proteome in a unique way as detailed in this study. Results This study reports a unique approach for understanding the evolution of human proteome by tracing the origin of its constituting domains hierarchically, along the Homo sapiens lineage. The uniqueness of this method lies in subtractive searching of functional and conserved domains in the human proteome resulting in higher efficiency of detecting their origins. From these analyses the nature of protein evolution and trends in domain evolution can be observed in the context of the entire human proteome data. The method adopted here also helps delineate the degree of divergence of functional families occurred during the course of evolution. Conclusion This approach to trace the evolutionary origin of functional domains in the human proteome facilitates better understanding of their functional versatility as well as provides insights into the functionality of hypothetical proteins present in the human proteome. This work elucidates the origin of functional and conserved domains in human proteins, their distribution along the Homo sapiens lineage, occurrence frequency of different domain combinations and proteome-wide patterns of their distribution, providing insights into the evolutionary solution to the increased complexity of the human proteome.
|
97
|
Björklund ÅK, Ekman D, Elofsson A. Expansion of protein domain repeats. PLoS Comput Biol 2006; 2:e114. [PMID: 16933986 PMCID: PMC1553488 DOI: 10.1371/journal.pcbi.0020114] [Citation(s) in RCA: 191] [Impact Index Per Article: 10.6] [Received: 02/07/2006] [Accepted: 07/14/2006] [Indexed: 11/20/2022]
Abstract
Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein-protein interactions as well as binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have evolved through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well-understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose we have assigned Pfam-A domain families to 24 proteomes with more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats compared with prokaryotes. The internal sequence similarity in each protein revealed that the domain repeats are often expanded through duplications of several domains at a time, while the duplication of one domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins that mainly works through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns, e.g., nebulin domains have mainly been expanded with a unit of seven domains at a time, while duplications of other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by shuffling of exons. However, exon shuffling could not have created all repeats.
Affiliation(s)
- Åsa K Björklund
- Stockholm Bioinformatics Center, Center for Biomembrane Research, Stockholm University, Stockholm, Sweden
- Diana Ekman
- Stockholm Bioinformatics Center, Center for Biomembrane Research, Stockholm University, Stockholm, Sweden
- Arne Elofsson
- Stockholm Bioinformatics Center, Center for Biomembrane Research, Stockholm University, Stockholm, Sweden
|