1
Sowa ST, Bosetti C, Galera-Prat A, Johnson MS, Lehtiö L. An Evolutionary Perspective on the Origin, Conservation and Binding Partner Acquisition of Tankyrases. Biomolecules 2022; 12:1688. [PMID: 36421702 PMCID: PMC9688111 DOI: 10.3390/biom12111688] [Received: 08/29/2022] [Revised: 11/08/2022] [Accepted: 11/08/2022]
Abstract
Tankyrases are poly-ADP-ribosyltransferases that regulate many crucial and diverse cellular processes in humans, such as Wnt signaling, telomere homeostasis, mitotic spindle formation and glucose metabolism. While tankyrases are present in most animals, functional differences across species may exist. In this work, we confirm the widespread distribution of tankyrases throughout the branches of multicellular animal life and identify the single-celled choanoflagellates as the earliest origin of tankyrases. We further show that the sequences and structural features of tankyrases (TNKSs) are well conserved even between distantly related species. We also experimentally characterize an anciently diverged tankyrase homolog from the sponge Amphimedon queenslandica and show that its basic functional properties, such as poly-ADP-ribosylation activity and interaction with the canonical tankyrase-binding peptide motif, are conserved. Conversely, the presence of tankyrase-binding motifs in orthologs of confirmed interaction partners varies greatly between species, indicating that tankyrases may have different sets of interaction partners depending on the animal lineage. Overall, our analysis suggests that tankyrases are remarkably conserved and that their regulatory functions in cells have likely changed considerably throughout evolution.
Affiliation(s)
- Sven T. Sowa
  - Faculty for Biochemistry and Molecular Medicine & Biocenter Oulu, University of Oulu, 90220 Oulu, Finland
- Chiara Bosetti
  - Faculty for Biochemistry and Molecular Medicine & Biocenter Oulu, University of Oulu, 90220 Oulu, Finland
- Albert Galera-Prat
  - Faculty for Biochemistry and Molecular Medicine & Biocenter Oulu, University of Oulu, 90220 Oulu, Finland
- Mark S. Johnson
  - Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering and InFLAMES Research Flagship Center, Åbo Akademi University, 20520 Turku, Finland
- Lari Lehtiö
  - Faculty for Biochemistry and Molecular Medicine & Biocenter Oulu, University of Oulu, 90220 Oulu, Finland
2
Ratcliffe LE, Asiedu EK, Pickett CJ, Warburton MA, Izzi SA, Meedel TH. The Ciona myogenic regulatory factor functions as a typical MRF but possesses a novel N-terminus that is essential for activity. Dev Biol 2019; 448:210-225. [PMID: 30365920 PMCID: PMC6478573 DOI: 10.1016/j.ydbio.2018.10.010] [Received: 05/11/2018] [Revised: 08/28/2018] [Accepted: 10/16/2018]
Abstract
Electroporation-based assays were used to test whether the myogenic regulatory factor (MRF) of Ciona intestinalis (CiMRF) interferes with endogenous developmental programs, and to evaluate the importance of its unusual N-terminus for muscle development. We found that CiMRF suppresses both notochord and endoderm development when it is expressed in these tissues, by a mechanism that may involve activation of muscle-specific microRNAs. Because these results add to a large body of evidence demonstrating the exceptionally high degree of functional conservation among MRFs, we were surprised to discover that non-ascidian MRFs were not myogenic in Ciona unless they formed part of a chimeric protein containing the CiMRF N-terminus. Equally surprising, we found that despite their widely differing primary sequences, the N-termini of MRFs from other ascidian species could form chimeric MRFs that were also myogenic in Ciona. The N-terminal domain did not rescue the activity of a Brachyury protein whose transcriptional activation domain had been deleted, and so does not appear to be a transcriptional activation domain itself. Our results indicate that ascidians have previously unrecognized and potentially novel requirements for MRF-directed myogenesis. Moreover, they provide the first example of a domain that is essential to the core function of an important family of gene regulatory proteins yet, to date, has been found in only a single branch of the family.
Affiliation(s)
- Lindsay E Ratcliffe
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.
- Emmanuel K Asiedu
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.
- C J Pickett
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.
- Megan A Warburton
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.
- Stephanie A Izzi
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.
- Thomas H Meedel
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.
3
Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly determines which domain architectures can be formed in different species. Studies of how often domain families of a given size occur have shown that family sizes generally follow power-law distributions, both within genomes and across larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. We review the genome evolution models that have been proposed to explain the shape of these distributions, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we examine the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. We then review inferences of ancestral domain architectures and the conclusions about the mechanisms of domain architecture evolution that can be drawn from them. Finally, we examine whether all known cases of a given domain architecture can be assumed to share a single common origin (monophyly) or have evolved convergently (polyphyly). We end with a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.
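The power-law observation above can be made concrete with a standard estimator. The sketch below is an illustration, not a method taken from the chapter: it applies a widely used approximate maximum-likelihood estimator for a discrete power law, alpha ≈ 1 + m / Σ ln(x_i / (x_min - 1/2)), summed over the m family sizes x_i ≥ x_min. The function name `power_law_exponent`, the `x_min` cutoff and the toy family sizes are hypothetical. The approximation is known to be rough when `x_min` is very small, so the output should be read as indicative only.

```python
import math
from typing import Sequence


def power_law_exponent(family_sizes: Sequence[int], x_min: int = 1) -> float:
    """Approximate MLE of alpha for a discrete power law P(n) ~ n**(-alpha).

    Uses alpha ~= 1 + m / sum(ln(x / (x_min - 0.5))) over the m sizes x >= x_min.
    """
    if x_min < 1:
        raise ValueError("x_min must be a positive integer")
    tail = [x for x in family_sizes if x >= x_min]
    if not tail:
        raise ValueError("no family sizes at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical data: number of member domains in each domain family of one genome.
sizes = [1, 1, 1, 2, 2, 4]
print(round(power_law_exponent(sizes), 3))  # 1.866
```

In practice `x_min` is itself usually chosen from the data, and a fitted exponent says nothing on its own about whether a power law is the best description; the chapter's claim rests on the studies it reviews, not on this estimator.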
4
Putative extremely high rate of proteome innovation in lancelets might be explained by high rate of gene prediction errors. Sci Rep 2016; 6:30700. [PMID: 27476717 PMCID: PMC4967905 DOI: 10.1038/srep30700] [Received: 04/26/2016] [Accepted: 07/06/2016]
Abstract
A recent analysis of the genomes of Chinese and Florida lancelets concluded that the rate of creation of novel protein domain combinations is orders of magnitude greater in lancelets than in other metazoa, and suggested that continuous activity of transposable elements in lancelets is responsible for this increased rate of protein innovation. Since Chinese and Florida lancelets are morphologically highly conserved, this finding would contradict the observation that high rates of protein innovation are usually associated with major evolutionary innovations. Here we show that the conclusion that the rate of proteome innovation is exceptionally high in lancelets may be unjustified: the differences observed in the domain architectures of orthologous proteins of different amphioxus species probably reflect high rates of gene prediction errors rather than true innovation.
5
Sato PM, Yoganathan K, Jung JH, Peisajovich SG. The robustness of a signaling complex to domain rearrangements facilitates network evolution. PLoS Biol 2014; 12:e1002012. [PMID: 25490747 PMCID: PMC4260825 DOI: 10.1371/journal.pbio.1002012] [Received: 08/20/2014] [Accepted: 10/21/2014]
Abstract
The broad tolerance of domain-rearranging mutations by a yeast signaling network suggests that signaling complexes have loose spatial constraints, making manipulation and perhaps evolution easier. The rearrangement of protein domains is known to play key roles in the evolution of signaling networks and, consequently, is a major tool used to synthetically rewire networks. However, natural mutational events leading to the creation of proteins with novel domain combinations, such as in-frame fusions followed by domain loss, retrotranspositions, or translocations, often simultaneously replace pre-existing genes. Thus, while proteins with new domain combinations may establish novel network connections, it is not clear how the concomitant deletions are tolerated. We investigated the mechanisms that enable signaling networks to tolerate domain rearrangement-mediated gene replacements. Using the yeast mitogen-activated protein kinase (MAPK)-mediated mating pathway as a model system, we analyzed 92 domain-rearrangement events affecting 11 genes. Our results indicate that, while domain rearrangement events that result in the loss of catalytic activities within the signaling complex are not tolerated, domain rearrangements can drastically alter protein interactions without impairing function. This suggests that signaling complexes can maintain function even when some components are recruited to alternative sites within the complex. Furthermore, we found that the ability of the complex to tolerate changes in interaction partners does not depend on the long disordered linkers that often connect domains. Taken together, our results suggest that some signaling complexes are dynamic ensembles with loose spatial constraints that could be easily reshaped by evolution and are therefore ideal targets for cellular engineering.
Cells use complex protein interaction networks to sense and process external signals. Proteins involved in signaling are often composed of multiple functional units called domains. Because domains are modular, mutations that rearrange domains among proteins can create novel proteins with altered functions. On evolutionary timescales, domain rearrangements contribute to the functional diversification of signaling networks; on the shorter timescale of an individual's life, they can impair cellular functions and lead to disease. Here, we investigated how domain-rearranging mutations alter the function of signaling networks, in particular when these mutations disrupt pre-existing proteins. We used the yeast mating signaling pathway, which shares many properties with more complex pathways active in human cells, as a model system. Our results demonstrate that signaling networks are often robust to domain rearrangements that disrupt pre-existing genes. Our experiments also suggest a possible mechanism for this robustness: rather than being a rigid multi-protein machine, the yeast mating signaling complex is a dynamic ensemble with loose spatial constraints. As a result, changes in protein interaction partners caused by domain-rearranging mutations can be accommodated without disrupting network function.
Affiliation(s)
- Paloma M. Sato
  - Department of Cell and Systems Biology, and Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
- Kogulan Yoganathan
  - Department of Cell and Systems Biology, and Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
- Jae H. Jung
  - Department of Cell and Systems Biology, and Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
- Sergio G. Peisajovich
  - Department of Cell and Systems Biology, and Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
6
Nagy A, Patthy L. FixPred: a resource for correction of erroneous protein sequences. Database: The Journal of Biological Databases and Curation 2014; 2014:bau032. [PMID: 24705206 PMCID: PMC3975993 DOI: 10.1093/database/bau032]
Abstract
Protein databases are heavily contaminated with erroneous (mispredicted, abnormal and incomplete) sequences, and these erroneous data significantly distort the conclusions drawn from genome-scale protein sequence analyses. In our earlier work we described the MisPred resource, which serves to identify erroneous sequences; here we present the FixPred computational pipeline, which automatically corrects sequences identified by MisPred as erroneous. The current version of the associated FixPred database contains corrected UniProtKB/Swiss-Prot and NCBI/RefSeq sequences from Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Danio rerio, Fugu rubripes, Ciona intestinalis, Branchiostoma floridae, Drosophila melanogaster and Caenorhabditis elegans; future releases of the FixPred database will include corrected sequences of additional metazoan species. The FixPred computational pipeline and database (http://www.fixpred.com) are easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats. Database URL: http://www.fixpred.com
Affiliation(s)
- László Patthy
  - Corresponding author. Tel: +361 279 3100; Fax: +361 466 5465
7
Nagy A, Patthy L. MisPred: a resource for identification of erroneous protein sequences in public databases. Database: The Journal of Biological Databases and Curation 2013; 2013:bat053. [PMID: 23864220 PMCID: PMC3713709 DOI: 10.1093/database/bat053]
Abstract
Correct prediction of the structure of protein-coding genes of higher eukaryotes is still a difficult task; therefore, public databases are heavily contaminated with mispredicted sequences. The high rate of misprediction has serious consequences because it significantly affects the conclusions that may be drawn from genome-scale sequence analyses of eukaryotic genomes. Here we present the MisPred database and computational pipeline that provide efficient means for the identification of erroneous sequences in public databases. The MisPred database contains a collection of abnormal, incomplete and mispredicted protein sequences from 19 metazoan species identified as erroneous by MisPred quality control tools in the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, NCBI/RefSeq and EnsEMBL databases. Major releases of the database are automatically generated and updated regularly. The database (http://www.mispred.com) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats. DATABASE URL: http://www.mispred.com.
Affiliation(s)
- Alinda Nagy
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1113 Budapest, Hungary
8
Zmasek CM, Godzik A. This Déjà vu feeling--analysis of multidomain protein evolution in eukaryotic genomes. PLoS Comput Biol 2012; 8:e1002701. [PMID: 23166479 PMCID: PMC3499355 DOI: 10.1371/journal.pcbi.1002701] [Received: 02/24/2012] [Accepted: 07/27/2012]
Abstract
Evolutionary innovation in eukaryotes, and especially in animals, is at least partially driven by genome rearrangements and the resulting emergence of proteins with new domain combinations, and thus potentially novel functionality. Given the random nature of such rearrangements, one could expect that proteins with particularly useful multidomain combinations may have been rediscovered multiple times by parallel evolution. However, existing reports suggest that this phenomenon plays a minimal role in the overall evolution of eukaryotic proteomes. We assembled a collection of 172 complete eukaryotic genomes that is not only the largest but also the most phylogenetically complete set of genomes analyzed so far. By employing a maximum parsimony approach to compare repertoires of Pfam domains and their combinations, we show that independent evolution of domain combinations is significantly more prevalent than previously thought. Our results indicate that about 25% of all currently observed domain combinations have evolved multiple times. Interestingly, this percentage is even higher for sets of domain combinations in individual species; for instance, 70% of the domain combinations found in the human genome have evolved independently at least once in other species. We also show that previous, much lower estimates of this rate are most likely due to the small number and biased phylogenetic distribution of the genomes analyzed. The independent emergence of identical domain combinations is widespread and not limited to domains of specific functional categories. Besides data from large-scale analyses, we also present individual examples of independent domain combination evolution. The surprisingly large contribution of parallel evolution to the domain combination repertoire of extant genomes has profound consequences for our understanding of the evolution of pathways and cellular processes in eukaryotes and for comparative functional genomics.
Most proteins in eukaryotes are composed of two or more domains, evolutionarily independent units that (often) have their own individual functions. The specific repertoire of multidomain proteins in a given species defines the topology of the pathways and networks that carry out its metabolic and regulatory processes. When proteins with new domain combinations emerge by gene fusion and fission, this directly affects the topology of cellular networks in the organism. To better understand the evolution of such networks, we analyzed a large set of eukaryotic genomes for the evolutionary history of known domain combinations. Our analysis shows that 70% of all domain combinations present in the human genome independently appeared in at least one other eukaryotic genome. Overall, over 25% of all known multidomain architectures emerged independently several times in the history of life. The difference between the global and species-specific pictures can be explained by the existence of a core set of domain combinations that keeps reemerging in different species, accompanied by a smaller number of unique domain combinations that do not appear anywhere else.
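The abstract attributes the repeated-origin estimates to a maximum parsimony comparison of domain-combination repertoires. As a hedged illustration only, not the authors' pipeline, the sketch below applies Fitch parsimony to one binary character, the presence (1) or absence (0) of a single domain combination across species on a rooted binary tree, then traces back one most-parsimonious reconstruction and counts 0 to 1 transitions as independent gains. The tree, species names, presence values and the functions `bottom_up`, `count_gains` and `independent_gains` are hypothetical. Assuming an absent ancestor above the root and preferring the parent's state during traceback are choices among equally parsimonious reconstructions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: a leaf is a species name, an internal node a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


@dataclass
class Node:
    states: Set[int]  # Fitch candidate states (0 = absent, 1 = present)
    children: List["Node"] = field(default_factory=list)


def bottom_up(tree: Tree, presence: Dict[str, int]) -> Tuple[Node, int]:
    """Fitch pass: candidate state sets and the minimum number of state changes."""
    if isinstance(tree, str):
        return Node({presence[tree]}), 0
    (left, left_cost), (right, right_cost) = (bottom_up(t, presence) for t in tree)
    cost = left_cost + right_cost
    common = left.states & right.states
    if common:
        return Node(common, [left, right]), cost
    # Disjoint child sets: take the union and pay for one change.
    return Node(left.states | right.states, [left, right]), cost + 1


def count_gains(node: Node, parent_state: int) -> int:
    """Traceback: keep the parent's state when allowed, and count 0 -> 1 transitions."""
    state = parent_state if parent_state in node.states else min(node.states)
    gained = int(parent_state == 0 and state == 1)
    return gained + sum(count_gains(child, state) for child in node.children)


def independent_gains(tree: Tree, presence: Dict[str, int]) -> int:
    """Number of independent gains, assuming an absent ancestor above the root."""
    root, _ = bottom_up(tree, presence)
    return count_gains(root, parent_state=0)


# Hypothetical species tree and presence/absence of one domain combination.
TREE = ((("human", "mouse"), "fly"), ("yeast", "plant"))
PRESENCE = {"human": 1, "mouse": 1, "fly": 0, "yeast": 0, "plant": 1}
print(bottom_up(TREE, PRESENCE)[1], independent_gains(TREE, PRESENCE))  # 2 2
```

In this toy case, two gains and "present at the root, then lost twice" both need two changes inside the tree; the traceback resolves the tie by preferring absence at the root, so the combination is reported as having arisen twice. A genome-scale analysis would repeat this over many domain combinations and might weight gains and losses differently.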
Affiliation(s)
- Christian M. Zmasek
  - Program in Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Adam Godzik
  - Program in Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
9
Guo B, Zou M, Wagner A. Pervasive indels and their evolutionary dynamics after the fish-specific genome duplication. Mol Biol Evol 2012; 29:3005-22. [PMID: 22490820 DOI: 10.1093/molbev/mss108]
Abstract
Insertions and deletions (indels) in protein-coding genes are important sources of genetic variation. Their role in creating new proteins may be especially important after gene duplication. However, little is known about how indels affect the divergence of duplicate genes. Here we study thousands of duplicate genes in five fish (teleost) species with completely sequenced genomes. The ancestor of these species was subject to a fish-specific genome duplication (FSGD) event that occurred approximately 350 million years ago (Ma). We find that duplicate genes contain at least 25% more indels than single-copy genes. These indels accumulated preferentially in the first 40 million years after the FSGD. A lack of widespread asymmetric indel accumulation indicates that both members of a duplicate gene pair typically experience relaxed selection. Strikingly, we observe a 30-80% excess of deletions over insertions that is consistent for indels of various lengths and across the five genomes. We also find that indels preferentially accumulate inside loop regions of protein secondary structure and in regions where amino acids are exposed to solvent. We show that duplicate genes with high indel density also show high DNA sequence divergence. Indel density, but not amino acid divergence, can explain a large proportion of the tertiary structure divergence between proteins encoded by duplicate genes. Our observations are consistent across all five fish species. Taken together, they suggest a general pattern of duplicate gene evolution in which indels are important drivers of evolutionary change.
Affiliation(s)
- Baocheng Guo
  - Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
10
Reassessing domain architecture evolution of metazoan proteins: major impact of errors caused by confusing paralogs and epaktologs. Genes (Basel) 2011; 2:516-61. [PMID: 24710209 PMCID: PMC3927612 DOI: 10.3390/genes2030516] [Received: 06/07/2011] [Revised: 07/08/2011] [Accepted: 07/19/2011]
Abstract
In the accompanying paper (Nagy, Szláma, Szarka, Trexler, Bányai, Patthy, Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors) we showed that, for UniProtKB/TrEMBL, RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences of Metazoan species, the contribution of erroneous (incomplete, abnormal, mispredicted) sequences to domain architecture (DA) differences between orthologous proteins might be greater than that of true gene rearrangements. Based on these findings, we suggest that earlier genome-scale studies based on comparison of predicted (frequently mispredicted) protein sequences may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. In this manuscript we examine the impact of confusing paralogous and epaktologous multidomain proteins (i.e., those that are related only through the independent acquisition of the same domain types) on conclusions drawn about DA evolution of multidomain proteins in Metazoa. To estimate the contribution of this type of error, we used UniProtKB/Swiss-Prot sequences from protein families with well-characterized evolutionary histories as a reference. We used two types of paralogy-group construction procedures and monitored the impact of various parameters on the separation of true paralogs from epaktologs in correctly annotated Swiss-Prot entries of multidomain proteins. Our studies have shown that, although public protein family databases are contaminated with epaktologs, analysis of the structure of sequence similarity networks of multidomain proteins provides an efficient means of separating epaktologs from paralogs. We have also demonstrated that contamination of protein families with epaktologs increases the apparent rate of DA change and introduces a bias in DA differences inasmuch as it increases the proportion of terminal over internal DA differences.
We have shown that confusing paralogous and epaktologous multidomain proteins significantly increases the apparent rate of DA change in Metazoa and introduces a positional bias in favor of terminal over internal DA changes. Our findings caution that earlier studies based on analysis of datasets of protein families that were contaminated with epaktologs may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. A reassessment of the DA evolution of multidomain proteins is presented in an accompanying paper [1].
11
Reassessing domain architecture evolution of metazoan proteins: major impact of gene prediction errors. Genes (Basel) 2011; 2:449-501. [PMID: 24710207 PMCID: PMC3927609 DOI: 10.3390/genes2030449] [Received: 05/24/2011] [Revised: 06/14/2011] [Accepted: 06/20/2011]
Abstract
Because the appearance of novel protein domain architectures (DA) is closely associated with biological innovations, there is growing interest in genome-scale reconstruction of the evolutionary history of the domain architectures of multidomain proteins. Such analyses, however, usually ignore the fact that a significant proportion of the Metazoan sequences analyzed are mispredicted, and that this may seriously affect the validity of the conclusions. To estimate the contribution of gene prediction errors to differences in the DA of predicted proteins, we used the high-quality, manually curated UniProtKB/Swiss-Prot database as a reference. For genome-scale analysis of the domain architectures of predicted proteins, we focused on RefSeq, EnsEMBL and NCBI's GNOMON predicted sequences of Metazoan species with completely sequenced genomes. Comparison of the DA of UniProtKB/Swiss-Prot sequences of worm, fly, zebrafish, frog, chick, mouse, rat and orangutan with those of human Swiss-Prot entries identified relatively few cases where orthologs had different DA, although the percentage with different DA increased with evolutionary distance. In contrast, comparison of the DA of human, orangutan, rat, mouse, chicken, frog, zebrafish, worm and fly RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences with those of the corresponding orthologous human Swiss-Prot entries identified a significantly higher proportion of domain architecture differences than the comparison of Swiss-Prot entries did. Analysis of the RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences whose DAs differed from those of their Swiss-Prot orthologs confirmed that the higher rate of domain architecture differences is due to errors in gene prediction, the majority of which could be corrected with our FixPred protocol.
We have also demonstrated that contamination of databases with incomplete, abnormal or mispredicted sequences introduces a bias in DA differences inasmuch as it increases the proportion of terminal over internal DA differences. Here we have shown that, for RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences of Metazoan species, the contribution of gene prediction errors to domain architecture differences between orthologs is comparable to or greater than that of true gene rearrangements. We have also demonstrated that domain architecture comparison may serve as a useful tool for quality control of gene predictions and may thus guide the correction of sequence errors. Our findings caution that earlier genome-scale studies based on comparison of predicted (frequently mispredicted) protein sequences may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. A reassessment of the DA evolution of orthologous and paralogous proteins is presented in an accompanying paper [1].