1
Yang Y, Braga MV, Dean MD. Insertion-Deletion Events Are Depleted in Protein Regions with Predicted Secondary Structure. Genome Biol Evol 2024; 16:evae093. [PMID: 38735759 PMCID: PMC11102076 DOI: 10.1093/gbe/evae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 04/16/2024] [Accepted: 04/21/2024] [Indexed: 05/14/2024] Open
Abstract
A fundamental goal in evolutionary biology and population genetics is to understand how selection shapes the fate of new mutations. Here, we test the null hypothesis that insertion-deletion (indel) events in protein-coding regions occur randomly with respect to secondary structures. We identified indels across 11,444 sequence alignments in mouse, rat, human, chimp, and dog genomes and then quantified their overlap with four different types of secondary structure (alpha helices, beta strands, protein bends, and protein turns) predicted by the deep-learning methods of AlphaFold2. Indels overlapped secondary structures 54% as much as expected and were especially underrepresented over beta strands, which tend to form internal, stable regions of proteins. In contrast, indels were enriched by 155% over regions without any predicted secondary structures. These skews were stronger in the rodent lineages compared to the primate lineages, consistent with population genetic theory predicting that natural selection will be more efficient in species with larger effective population sizes. Nonsynonymous substitutions were also less common in regions of protein secondary structure, although not as strongly reduced as in indels. In a complementary analysis of thousands of human genomes, we showed that indels overlapping secondary structure segregated at significantly lower frequency than indels outside of secondary structure. Taken together, our study shows that indels are selected against if they overlap secondary structure, presumably because they disrupt the tertiary structure and function of a protein.
Affiliation(s)
- Yi Yang
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew V Braga
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew D Dean
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
2
Miton CM, Tokuriki N. Insertions and Deletions (Indels): A Missing Piece of the Protein Engineering Jigsaw. Biochemistry 2023; 62:148-157. [PMID: 35830609 DOI: 10.1021/acs.biochem.2c00188] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/08/2023]
Abstract
Over the years, protein engineers have studied nature and borrowed its tricks to accelerate protein evolution in the test tube. While there have been considerable advances, our ability to generate new proteins in the laboratory is seemingly limited. One explanation for these shortcomings may be that insertions and deletions (indels), which frequently arise in nature, are largely overlooked during protein engineering campaigns. The profound effect of indels on protein structures, by way of drastic backbone alterations, could be perceived as "saltation" events that bring about significant phenotypic changes in a single mutational step. Should we leverage these effects to accelerate protein engineering and gain access to unexplored regions of adaptive landscapes? In this Perspective, we describe the role played by indels in the functional diversification of proteins in nature and discuss their untapped potential for protein engineering, despite their often-destabilizing nature. We hope to spark a renewed interest in indels, emphasizing that their wider study and use may prove insightful and shape the future of protein engineering by unlocking unique functional changes that substitutions alone could never achieve.
Affiliation(s)
- Charlotte M Miton
- Michael Smith Laboratories, University of British Columbia, Vancouver, V6T 1Z4 BC, Canada
- Nobuhiko Tokuriki
- Michael Smith Laboratories, University of British Columbia, Vancouver, V6T 1Z4 BC, Canada
3
Savino S, Desmet T, Franceus J. Insertions and deletions in protein evolution and engineering. Biotechnol Adv 2022; 60:108010. [PMID: 35738511 DOI: 10.1016/j.biotechadv.2022.108010] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/05/2022] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 11/17/2022]
Abstract
Protein evolution or engineering studies are traditionally focused on amino acid substitutions and the way these contribute to fitness. Meanwhile, the insertion and deletion of amino acids is often overlooked, despite being one of the most common sources of genetic variation. Recent methodological advances and successful engineering stories have demonstrated that the time is ripe for greater emphasis on these mutations and their understudied effects. This review highlights the evolutionary importance and biotechnological relevance of insertions and deletions (indels). We provide a comprehensive overview of approaches that can be employed to include indels in random, (semi)-rational or computational protein engineering pipelines. Furthermore, we discuss the tolerance to indels at the structural level, address how domain indels can link the function of unrelated proteins, and feature studies that illustrate the surprising and intriguing potential of frameshift mutations.
Affiliation(s)
- Simone Savino
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Tom Desmet
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Jorick Franceus
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
4
Guo B, Zou M, Sakamoto T, Innan H. Functional Innovation through Gene Duplication Followed by Frameshift Mutation. Genes (Basel) 2022; 13:genes13020190. [PMID: 35205235 PMCID: PMC8872073 DOI: 10.3390/genes13020190] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/11/2021] [Revised: 01/14/2022] [Accepted: 01/18/2022] [Indexed: 11/16/2022] Open
Abstract
In his influential book “Evolution by Gene Duplication”, Ohno postulated that frameshift mutation could lead to a new function after duplication, but frameshift mutation is generally thought to be deleterious, and thus drew little attention in functional innovation in duplicate evolution. To this end, we here report an exhaustive survey of the genomes of human, mouse, zebrafish, and fruit fly. We identified 80 duplicate genes that involved frameshift mutations after duplication. The frameshift mutation was preferentially located close to the C-terminus in most cases (55/80), which indicated that a frameshift mutation that changed the reading frame in a small part at the end of a duplicate may have contributed to adaptive evolution (e.g., human genes NOTCH2NL and ARHGAP11B), whereas it would otherwise be too deleterious to survive. A few cases (11/80) involved multiple frameshift mutations, exhibiting various patterns of modifications of the reading frame. Functionality of duplicate genes involving frameshift mutations was confirmed by sequence characteristics and expression profile, suggesting a potential role of frameshift mutation in creating functional novelty. We thus showed that genomes have non-negligible numbers of genes that have experienced frameshift mutations following gene duplication. Our results demonstrated the potential importance of frameshift mutations in molecular evolution, as Ohno verbally argued 50 years ago.
Affiliation(s)
- Baocheng Guo
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Correspondence: (B.G.); (H.I.)
- Ming Zou
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Takahiro Sakamoto
- Department of Evolutionary Studies of Biosystems, Graduate University for Advanced Studies, Hayama 240-0193, Kanagawa, Japan
- Hideki Innan
- Department of Evolutionary Studies of Biosystems, Graduate University for Advanced Studies, Hayama 240-0193, Kanagawa, Japan
- Correspondence: (B.G.); (H.I.)
5
Zhao VY, Rodrigues JV, Lozovsky ER, Hartl DL, Shakhnovich EI. Switching an active site helix in dihydrofolate reductase reveals limits to subdomain modularity. Biophys J 2021; 120:4738-4750. [PMID: 34571014 PMCID: PMC8595743 DOI: 10.1016/j.bpj.2021.09.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2021] [Revised: 09/14/2021] [Accepted: 09/22/2021] [Indexed: 11/23/2022] Open
Abstract
To what degree are individual structural elements within proteins modular such that similar structures from unrelated proteins can be interchanged? We study subdomain modularity by creating 20 chimeras of an enzyme, Escherichia coli dihydrofolate reductase (DHFR), in which a catalytically important, 10-residue α-helical sequence is replaced by α-helical sequences from a diverse set of proteins. The chimeras stably fold but have a range of diminished thermal stabilities and catalytic activities. Evolutionary coupling analysis indicates that the residues of this α-helix are under selection pressure to maintain catalytic activity in DHFR. Reversion to phenylalanine at key position 31 was found to partially restore catalytic activity, which could be explained by evolutionary coupling values. We performed molecular dynamics simulations using replica exchange with solute tempering. Chimeras with low catalytic activity exhibit nonhelical conformations that block the binding site and disrupt the positioning of the catalytically essential residue D27. Simulation observables and in vitro measurements of thermal stability and substrate-binding affinity are strongly correlated. Several E. coli strains with chromosomally integrated chimeric DHFRs can grow, with growth rates that follow predictions from a kinetic flux model that depends on the intracellular abundance and catalytic activity of DHFR. Our findings show that although α-helices are not universally substitutable, the molecular and fitness effects of modular segments can be predicted by the biophysical compatibility of the replacement segment.
Affiliation(s)
- Victor Y Zhao
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- João V Rodrigues
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- Elena R Lozovsky
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Daniel L Hartl
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
6
Wang Y, Guo B. The divergence of alternative splicing between ohnologs in teleost fishes. BMC Ecol Evol 2021; 21:98. [PMID: 34034651 PMCID: PMC8146666 DOI: 10.1186/s12862-021-01833-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/27/2020] [Accepted: 05/19/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Gene duplication and alternative splicing (AS) are two distinct mechanisms generating new materials for genetic innovations. The evolutionary link between gene duplication and AS is still controversial, because earlier studies used duplicates from duplication events of inconsistent ages. With the aid of RNA-seq data, we explored the evolutionary scenario of AS divergence between duplicates using ohnologs that resulted from the teleost genome duplication event in zebrafish, medaka, and stickleback. RESULTS: Ohnologs in zebrafish have fewer AS forms compared to their singleton orthologs, supporting the function-sharing model of AS divergence between duplicates. Ohnologs in stickleback have more AS forms compared to their singleton orthologs, which supports the accelerated model of AS divergence between duplicates. The evolution of AS in ohnologs in medaka supports a combined scenario of the function-sharing and the accelerated model of AS divergence between duplicates. We also found that a small number of ohnolog pairs in each of the three teleosts showed significantly asymmetric AS divergence. For example, the well-known ovary-factor gene cyp19a1a has no AS form but its ohnolog cyp19a1b has multiple AS forms in medaka, suggesting that functional divergence between duplicates might have resulted from AS divergence. CONCLUSIONS: We found a combined scenario of function-sharing and accelerated models for AS evolution in ohnologs in teleosts and ruled out the independent model that assumes a lack of correlation between gene duplication and AS. Our study thus provides insights into the link between gene duplication and AS in general, and into ohnolog divergence in teleosts from an AS perspective in particular.
Affiliation(s)
- Yuwei Wang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Baocheng Guo
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650201, China
7
Conant GC. The lasting after-effects of an ancient polyploidy on the genomes of teleosts. PLoS One 2020; 15:e0231356. [PMID: 32298330 PMCID: PMC7161988 DOI: 10.1371/journal.pone.0231356] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 01/16/2020] [Accepted: 03/20/2020] [Indexed: 12/20/2022] Open
Abstract
The ancestor of most teleost fishes underwent a whole-genome duplication event three hundred million years ago. Despite its antiquity, the effects of this event are evident both in the structure of teleost genomes and in how the surviving duplicated genes still operate to drive form and function. I inferred a set of shared syntenic regions that survive from the teleost genome duplication (TGD) using eight teleost genomes and the outgroup gar genome (which lacks the TGD). I then phylogenetically modeled the TGD's resolution via shared and independent gene losses and applied a new simulation-based statistical test for the presence of bias toward the preservation of genes from one parental subgenome. On the basis of that test, I argue that the TGD was likely an allopolyploidy. I find that duplicate genes surviving from this duplication in zebrafish are less likely to function in early embryo development than are genes that have returned to single copy at some point in this species' history. The tissues these ohnologs are expressed in, as well as their biological functions, lend support to recent suggestions that the TGD was the source of a morphological innovation in the structure of the teleost retina. Surviving duplicates also appear less likely to be essential than singletons, despite the fact that their single-copy orthologs in mouse are no less essential than other genes.
Affiliation(s)
- Gavin C. Conant
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, United States of America
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States of America
- Program in Genetics, North Carolina State University, Raleigh, NC, United States of America
- Division of Animal Sciences, University of Missouri, Columbia, MO, United States of America
8
Chen F, Lai F, Luo M, Han YS, Cheng H, Zhou R. The genome-wide landscape of small insertion and deletion mutations in Monopterus albus. J Genet Genomics 2019; 46:75-86. [PMID: 30867123 DOI: 10.1016/j.jgg.2019.02.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/30/2018] [Revised: 12/21/2018] [Accepted: 02/01/2019] [Indexed: 11/17/2022]
Abstract
Insertion and deletion (indel) mutations, which can trigger single nucleotide substitutions in the flanking regions of genes, may generate abundant materials for disease defense, reproduction, species survival and evolution. However, the genetic and evolutionary mechanisms of indels remain elusive. We established a comparative genome-transcriptome-alignment approach for large-scale identification of indels in a Monopterus population. Over 2000 indels in 1738 indel genes, including 1-21 bp deletions and 1-15 bp insertions, were detected. Each indel gene had ∼1.1 deletions/insertions, and 2-4 alleles in the population. Frequencies of deletions were markedly higher than those of insertions at both the genome and population levels. Most of the indels were multiples of three, leading to in-frame mutations, and occurred mainly in non-domain regions, indicating functional constraint on or tolerance of the indels. All indel genes showed higher expression levels than non-indel genes during sex reversal. Sliding window analysis of global expression levels in gonads showed a significant positive correlation with indel density in the genome. Moreover, indel genes were evolutionarily conserved and evolved slowly compared to non-indel genes. Notably, the population genetic structure of indels revealed divergent evolution of the Monopterus population, as a bottleneck effect of biogeographic isolation by the Taiwan Strait, China.
Affiliation(s)
- Feng Chen
- Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Fengling Lai
- Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Majing Luo
- Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Yu-San Han
- Institute of Fisheries Science, College of Life Science, "National Taiwan University", Taipei, 10617, Taiwan, China
- Hanhua Cheng
- Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Rongjia Zhou
- Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan, 430072, China
9
Xie S, Zhou A, Feng Y, Wang Z, Fan L, Zhang Y, Zeng F, Zou J. Effects of fasting and re-feeding on mstn and mstnb genes expressions in Cranoglanis bouderius. Gene 2019; 682:1-12. [DOI: 10.1016/j.gene.2018.09.050] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/13/2018] [Revised: 09/15/2018] [Accepted: 09/25/2018] [Indexed: 12/09/2022]
10
Zhang Z, Wang J, Gong Y, Li Y. Contributions of substitutions and indels to the structural variations in ancient protein superfamilies. BMC Genomics 2018; 19:771. [PMID: 30355304 PMCID: PMC6201574 DOI: 10.1186/s12864-018-5178-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/02/2018] [Accepted: 10/16/2018] [Indexed: 11/10/2022] Open
Abstract
Background: Quantitative evaluation of protein structural evolution is important for our understanding of protein biological functions and their evolutionary adaptation, and is useful in guiding protein engineering. However, compared to the models for sequence evolution, the quantitative models for protein structural evolution received less attention. Ancient protein superfamilies are often considered versatile, allowing genetic and functional diversifications during long-term evolution. In this study, we investigated the quantitative impacts of sequence variations on the structural evolution of homologues in 68 ancient protein superfamilies that exist widely in sequenced eukaryotic, bacterial and archaeal genomes. Results: We found that the accumulated structural variations within ancient superfamilies could be explained largely by a bilinear model that simultaneously considers amino acid substitution and insertion/deletion (indel). Both substitutions and indels are essential for explaining the structural variations within ancient superfamilies. For those ancient superfamilies with high bilinear multiple correlation coefficients, the influence of each unit of substitution or indel on structural variations is almost constant within each superfamily, but varies greatly among different superfamilies. The influence of each unit indel on structural variations is always larger than that of each unit substitution within each superfamily, but the accumulated contributions of indels to structural variations are lower than those of substitutions in most superfamilies. The total contributions of sequence indels and substitutions (46% and 54%, respectively) to the structural variations that result from sequence variations are slightly different in ancient superfamilies. Conclusions: Structural variations within ancient protein superfamilies accumulated under the significantly bilinear influence of amino acid substitutions and indels in sequences. Both substitutions and indels are essential for explaining the structural variations within ancient superfamilies. For those structural variations resulting from sequence variations, the total contribution of indels is slightly lower than that of amino acid substitutions. The regular clock exists not only in protein sequences, but also probably in protein structures. Electronic supplementary material: The online version of this article (10.1186/s12864-018-5178-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Zheng Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Jinlan Wang
- Physical Examination Office of Shandong Province, Health and Family Planning Commission of Shandong Province, Jinan, 250014, China
- Ya Gong
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Yuezhong Li
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
11
Qiao X, Yin H, Li L, Wang R, Wu J, Wu J, Zhang S. Different Modes of Gene Duplication Show Divergent Evolutionary Patterns and Contribute Differently to the Expansion of Gene Families Involved in Important Fruit Traits in Pear (Pyrus bretschneideri). Front Plant Sci 2018; 9:161. [PMID: 29487610 PMCID: PMC5816897 DOI: 10.3389/fpls.2018.00161] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 11/29/2017] [Accepted: 01/29/2018] [Indexed: 05/21/2023]
Abstract
Pear is an important fruit crop of the Rosaceae family and has experienced two rounds of ancient whole-genome duplications (WGDs). However, whether different types of gene duplications evolved differently after duplication remains unclear in the pear genome. In this study, we identified the different modes of gene duplication in pear. Duplicate genes derived from WGD, tandem, proximal, retrotransposed, DNA-based transposed or dispersed duplications differ in genomic distribution, gene features, selection pressure, expression divergence, regulatory divergence and biological roles. Widespread sequence, expression and regulatory divergence have occurred between duplicate genes over the 30-45 million years of evolution after the recent genome duplication in pear. The retrotransposed genes show relatively higher expression and regulatory divergence than other gene duplication modes. In contrast, WGD genes underwent a slower sequence divergence and may be influenced by abundant gene conversion events. Moreover, the different classes of duplicate genes exhibited biased functional roles. We also investigated the evolution and expansion patterns of the gene families involved in sugar and organic acid metabolism pathways, which are closely related to the fruit quality and taste in pear. Single-gene duplications largely account for the extensive expansion of gene families involved in the sorbitol metabolism pathway in pear. Gene family expansion was also detected in the sucrose metabolism pathway and tricarboxylic acid cycle pathways. Thus, this study provides insights into the evolutionary fates of duplicated genes.
Affiliation(s)
- Shaoling Zhang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing, China
12
Complex Genes Are Preferentially Retained After Whole-Genome Duplication in Teleost Fish. J Mol Evol 2017; 84:253-258. [PMID: 28492966 DOI: 10.1007/s00239-017-9794-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/20/2017] [Accepted: 05/03/2017] [Indexed: 11/27/2022]
Abstract
Gene duplication generates new genetic material which, if retained after duplication, may contribute to organismal evolution. A whole-genome duplication occurred in the ancestry of teleost fish and consequently there are many duplicated genes in teleost genomes. Indeed, it has been proposed that the evolutionary diversification of teleost fish may have been stimulated by the fish-specific genome duplication (FSGD). However, it is not clear which factors determine which genes are retained as duplicate copies and which return to a singleton state after duplication. In the present study, gene complexity, in terms of encoded protein length and functional domain number, is compared between duplicate and singleton genes for nine well-annotated teleost genomes. A total of 933 gene families with retained duplicates and 4590 singleton gene families are analysed. Genes with retained duplicates are found to be significantly longer (27.9-38.2%) and to have more functional domains (20.5-26.5%) than singleton genes in all nine teleost genomes, suggesting that genes encoding longer proteins with more functional domains were preferentially retained after whole-genome duplication in teleosts. This differential retention of duplicated genes will have increased the genomic complexity of teleost fish after FSGD which, together with differential duplicated gene retention as a lineage-splitting force, may have greatly contributed to the successful diversification of teleost fish.
13
Jackson EL, Spielman SJ, Wilke CO. Computational prediction of the tolerance to amino-acid deletion in green-fluorescent protein. PLoS One 2017; 12:e0164905. [PMID: 28369116 PMCID: PMC5378326 DOI: 10.1371/journal.pone.0164905] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/07/2016] [Accepted: 03/21/2017] [Indexed: 01/29/2023] Open
Abstract
Proteins evolve through two primary mechanisms: substitution, where mutations alter a protein's amino-acid sequence, and insertions and deletions (indels), where amino acids are either added to or removed from the sequence. Protein structure has been shown to influence the rate at which substitutions accumulate across sites in proteins, but whether structure similarly constrains the occurrence of indels has not been rigorously studied. Here, we investigate the extent to which structural properties known to covary with protein evolutionary rates might also predict protein tolerance to indels. Specifically, we analyze a publicly available dataset of single-amino-acid deletion mutations in enhanced green fluorescent protein (eGFP) to assess how well the functional effect of deletions can be predicted from protein structure. We find that weighted contact number (WCN), which measures how densely packed a residue is within the protein's three-dimensional structure, provides the best single predictor for whether eGFP will tolerate a given deletion. We additionally find that using protein design to explicitly model deletions results in improved predictions of functional status when combined with other structural predictors. Our work suggests that structure plays a fundamental role in constraining deletions at sites in proteins, and further that similar biophysical constraints influence both substitutions and deletions. This study therefore provides a solid foundation for future work to examine how protein structure influences tolerance of more complex indel events, such as insertions or large deletions.
Affiliation(s)
- Eleisha L. Jackson
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Stephanie J. Spielman
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Claus O. Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
14
Evolution of Fish Let-7 MicroRNAs and Their Expression Correlated to Growth Development in Blunt Snout Bream. Int J Mol Sci 2017; 18:ijms18030646. [PMID: 28300776 PMCID: PMC5372658 DOI: 10.3390/ijms18030646] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 01/28/2017] [Revised: 03/12/2017] [Accepted: 03/13/2017] [Indexed: 12/12/2022] Open
Abstract
The lethal-7 (let-7) miRNA, known as one of the first founding miRNAs, is present in multiple copies in a genome and has diverse functions in animals. In this study, comparative genomic analysis of let-7 miRNA members in fish species indicated that let-7 miRNA is a sequence-conserved family in fish, while different species have variable gene copy numbers. Among the ten members, let-7a/b/c/d/e/f/g/h/i/j, the let-7a precursor sequence was more similar to ancestral sequences, whereas other let-7 miRNA members were separate from the late differentiation of let-7a. Gene Ontology (GO) enrichment analysis indicated that most predicted target genes of let-7 miRNAs are involved in biological processes, especially developmental processes and growth. In order to identify the possible different functions of these ten miRNAs in fish growth development, their expression levels were quantified in adult males and females of Megalobrama amblycephala, as well as in 3-, 6-, and 12-month-old individuals with relatively slow and fast growth rates. These ten miRNAs had similar tissue expression patterns in males and females, with higher expression levels in the brain and pituitary than in other tissues (p < 0.05). Among these miRNAs, the relative expression level of let-7a was the highest in almost all the tested tissues, followed by let-7b, let-7d and let-7c/e/f/g/h/i/j. As to the groups with different growth rates, the expression levels of let-7 miRNAs in pituitary and brain from the slow-growth group were always significantly higher than those in the fast-growth group (p < 0.05). These results suggest that let-7 miRNA members could play an important role in the regulation of growth development in M. amblycephala through negatively regulating expression of their target genes.
|
15
|
Canapa A, Barucca M, Biscotti MA, Forconi M, Olmo E. Transposons, Genome Size, and Evolutionary Insights in Animals. Cytogenet Genome Res 2016; 147:217-39. [PMID: 26967166 DOI: 10.1159/000444429] [Citation(s) in RCA: 88] [Impact Index Per Article: 11.0] [Accepted: 12/03/2015] [Indexed: 11/19/2022] Open
Abstract
The relationship between genome size and the percentage of transposons in 161 animal species evidenced that variations in genome size are linked to the amplification or the contraction of transposable elements. The activity of transposable elements could represent a response to environmental stressors. Indeed, although with different trends in protostomes and deuterostomes, comprehensive changes in genome size were recorded in concomitance with particular periods of evolutionary history or adaptations to specific environments. During evolution, genome size and the presence of transposable elements have influenced structural and functional parameters of genomes and cells. Changes of these parameters have had an impact on morphological and functional characteristics of the organism on which natural selection directly acts. Therefore, the current situation represents a balance between insertion and amplification of transposons and the mechanisms responsible for their deletion or for decreasing their activity. Among the latter, methylation and the silencing action of small RNAs likely represent the most frequent mechanisms.
Affiliation(s)
- Adriana Canapa
- Dipartimento di Scienze della Vita e dell'Ambiente, Università Politecnica delle Marche, Ancona, Italy
|
16
|
Comprehensive Transcriptome Analysis of Six Catfish Species from an Altitude Gradient Reveals Adaptive Evolution in Tibetan Fishes. G3-GENES GENOMES GENETICS 2015; 6:141-8. [PMID: 26564948 PMCID: PMC4704712 DOI: 10.1534/g3.115.024448] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 12/03/2022]
Abstract
Glyptosternoid fishes (Siluriformes), one of the three broad fish lineages (the other two are schizothoracines and Triplophysa), have a limited distribution in the rivers of the Tibetan Plateau and peripheral regions. To investigate the genetic mechanisms underlying adaptation to the Tibetan Plateau in several fish species from an altitude gradient, a total of 20,659,183–37,166,756 sequence reads from six species of catfish were generated by Illumina sequencing, resulting in six assemblies. Analysis of the 1,656 orthologs among the six assembled catfish unigene sets provided consistent evidence for genome-wide accelerated evolution in the three glyptosternoid lineages living at high altitudes. A large number of genes belonging to functional categories related to hypoxia and energy metabolism exhibited rapid evolution in the glyptosternoid lineages relative to yellowhead catfish living in plains areas. Genes showing signatures of rapid evolution and positive selection in the glyptosternoid lineages were also enriched in functions associated with energy metabolism and hypoxia. Our analyses provide novel insights into highland adaptation in fishes and can serve as a foundation for future studies aiming to identify candidate genes underlying the genetic basis of adaptation in Tibetan fishes.
|
17
|
Surkont J, Diekmann Y, Ryder PV, Pereira-Leal JB. Coiled-coil length: Size does matter. Proteins 2015; 83:2162-9. [PMID: 26387794 DOI: 10.1002/prot.24932] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/09/2015] [Revised: 08/23/2015] [Accepted: 09/14/2015] [Indexed: 11/09/2022]
Abstract
Protein evolution is governed by processes that alter primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage that decreases metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to extensively vary their length across species. Here, we analyze length conservation of coiled-coils, domains formed by a ubiquitous, repetitive peptide motif present in all domains of life, that frequently plays a structural role in the cell. We observed that, despite the repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein change, including globular domains. Length conservation is independent of primary amino acid sequence variation, and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints.
Affiliation(s)
- Yoan Diekmann
- Instituto Gulbenkian de Ciência, Oeiras, 2780-156, Portugal; Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543
- Pearl V Ryder
- Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543; Emory University School of Medicine, Atlanta, Georgia, 30322
- Jose B Pereira-Leal
- Instituto Gulbenkian de Ciência, Oeiras, 2780-156, Portugal; Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543
|
18
|
Tong C, Zhang C, Shi J, Qi H, Zhang R, Tang Y, Li G, Feng C, Zhao K. Characterization of two paralogous myostatin genes and evidence for positive selection in Tibet fish: Gymnocypris przewalskii. Gene 2015; 565:201-10. [DOI: 10.1016/j.gene.2015.04.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/20/2015] [Revised: 03/13/2015] [Accepted: 04/06/2015] [Indexed: 10/23/2022]
|
19
|
|
20
|
Mei J, Gui JF. Genetic basis and biotechnological manipulation of sexual dimorphism and sex determination in fish. SCIENCE CHINA-LIFE SCIENCES 2015; 58:124-36. [PMID: 25563981 DOI: 10.1007/s11427-014-4797-9] [Citation(s) in RCA: 174] [Impact Index Per Article: 19.3] [Received: 08/22/2014] [Accepted: 09/28/2014] [Indexed: 10/24/2022]
Abstract
Aquaculture has made an enormous contribution to world food production, especially to the sustainable supply of animal proteins. The utility of diverse reproduction strategies in fish, such as the exploitation of unisexual gynogenesis, has created a typical case of fish genetic breeding. A number of fish species show substantial sexual dimorphism that is closely linked to multiple economic traits including growth rate and body size, and the efficient development of sex-linked genetic markers and sex control biotechnologies has provided significant approaches to increase the production and value for commercial purposes. Along with the rapid development of genomics and molecular genetic techniques, the genetic basis of sexual dimorphism has been gradually deciphered, and great progress has been made in the mechanisms of fish sex determination and identification of sex-determining genes. This review summarizes the progress to provide some directive and objective thinking for further research in this field.
Affiliation(s)
- Jie Mei
- College of Fisheries, Key Laboratory of Freshwater Animal Breeding, Ministry of Agriculture, Freshwater Aquaculture Collaborative Innovation Center of Hubei Province, Huazhong Agricultural University, Wuhan, 430070, China
|
21
|
Yang L, Wang Y, Zhang Z, He S. Comprehensive transcriptome analysis reveals accelerated genic evolution in a Tibet fish, Gymnodiptychus pachycheilus. Genome Biol Evol 2014; 7:251-61. [PMID: 25543049 PMCID: PMC4316632 DOI: 10.1093/gbe/evu279] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Indexed: 12/18/2022] Open
Abstract
Elucidating the genetic mechanisms of organismal adaptation to the Tibetan Plateau at a genomic scale can provide insights into the process of adaptive evolution. Many highland species have been investigated and various candidate genes that may be responsible for highland adaptation have been identified. However, we know little about the genomic basis of adaptation to Tibet in fishes. Here, we performed transcriptome sequencing of a schizothoracine fish (Gymnodiptychus pachycheilus) and used it to identify potential genetic mechanisms of highland adaptation. We obtained a total of 66,105 assembled unigenes, of which 7,232 were assigned as putative one-to-one orthologs in zebrafish. Comparative gene annotations from several species indicated that at least 350 genes were lost and 41 gained since the divergence between G. pachycheilus and zebrafish. An analysis of 6,324 orthologs among zebrafish, fugu, medaka, and spotted gar identified consistent evidence for genome-wide accelerated evolution in G. pachycheilus, and only the terminal branch of G. pachycheilus had a higher Ka/Ks ratio than the ancestral branch. Many functional categories related to hypoxia and energy metabolism exhibited rapid evolution in G. pachycheilus relative to zebrafish. Genes showing signatures of rapid evolution and positive selection in the G. pachycheilus lineage were also enriched in functions associated with energy metabolism and hypoxia. The first genomic resources for fish in the Tibetan Plateau and evolutionary analyses provided some novel insights into highland adaptation in fishes and served as a foundation for future studies aiming to identify candidate genes underlying the genetic bases of adaptation to Tibet in fishes.
Affiliation(s)
- Liandong Yang
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Ying Wang
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Zhaolei Zhang
- Department of Molecular Genetics, University of Toronto, Ontario, Canada; Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Ontario, Canada
- Shunping He
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, People's Republic of China
|
22
|
Glasauer SMK, Neuhauss SCF. Whole-genome duplication in teleost fishes and its evolutionary consequences. Mol Genet Genomics 2014; 289:1045-60. [PMID: 25092473 DOI: 10.1007/s00438-014-0889-2] [Citation(s) in RCA: 517] [Impact Index Per Article: 51.7] [Received: 12/23/2013] [Accepted: 07/15/2014] [Indexed: 12/18/2022]
Abstract
Whole-genome duplication (WGD) events have shaped the history of many evolutionary lineages. One such duplication has been implicated in the evolution of teleost fishes, by far the most species-rich vertebrate clade. After initial controversy, there is now solid evidence that such an event took place in the common ancestor of all extant teleosts. It is termed teleost-specific (TS) WGD. After WGD, duplicate genes have different fates. The most likely outcome is non-functionalization of one duplicate gene due to the lack of selective constraint on preserving both. Mechanisms that act on preservation of duplicates are subfunctionalization (partitioning of ancestral gene functions on the duplicates), neofunctionalization (assigning a novel function to one of the duplicates) and dosage selection (preserving genes to maintain dosage balance between interconnected components). Since the frequency of these mechanisms is influenced by the genes' properties, there are over-retained classes of genes, such as highly expressed ones and genes involved in neural function. The consequences of the TS-WGD, especially its impact on the massive radiation of teleosts, have been a matter of controversial debate. It is evident that gene duplications are crucial for generating complexity and that WGDs provide large amounts of raw material for evolutionary adaptation and innovation. However, it is less clear whether the TS-WGD is directly linked to the evolutionary success of teleosts and their radiation. Recent studies let us conclude that TS-WGD has been important in generating teleost complexity, but that more recent ecological adaptations only marginally related to TS-WGD might have even contributed more to diversification. It is likely, however, that TS-WGD provided teleosts with diversification potential that can become effective much later, such as during phases of environmental change.
Affiliation(s)
- Stella M K Glasauer
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
|
23
|
Puritz JB, Hollenbeck CM, Gold JR. dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms. PeerJ 2014; 2:e431. [PMID: 24949246 PMCID: PMC4060032 DOI: 10.7717/peerj.431] [Citation(s) in RCA: 259] [Impact Index Per Article: 25.9] [Received: 03/28/2014] [Accepted: 05/27/2014] [Indexed: 12/14/2022] Open
Abstract
Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for non-model organisms with large effective population sizes and high levels of genetic polymorphism. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is due to the fact that dDocent quality trims instead of filtering, incorporates both forward and reverse reads (including reads with INDEL polymorphisms) in assembly, mapping, and SNP calling. The pipeline and a comprehensive user guide can be found at http://dDocent.wordpress.com.
Affiliation(s)
- Jonathan B Puritz
- Marine Genomics Laboratory, Harte Research Institute, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
- Christopher M Hollenbeck
- Marine Genomics Laboratory, Harte Research Institute, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
- John R Gold
- Marine Genomics Laboratory, Harte Research Institute, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
|
24
|
Caetano-Anollés G, Nasir A, Zhou K, Caetano-Anollés D, Mittenthal JE, Sun FJ, Kim KM. Archaea: the first domain of diversified life. ARCHAEA (VANCOUVER, B.C.) 2014; 2014:590214. [PMID: 24987307 PMCID: PMC4060292 DOI: 10.1155/2014/590214] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/30/2013] [Revised: 02/15/2014] [Accepted: 03/25/2014] [Indexed: 01/23/2023]
Abstract
The study of the origin of diversified life has been plagued by technical and conceptual difficulties, controversy, and apriorism. It is now popularly accepted that the universal tree of life is rooted in the akaryotes and that Archaea and Eukarya are sister groups to each other. However, evolutionary studies have overwhelmingly focused on nucleic acid and protein sequences, which partially fulfill only two of the three main steps of phylogenetic analysis, formulation of realistic evolutionary models, and optimization of tree reconstruction. In the absence of character polarization, that is, the ability to identify ancestral and derived character states, any statement about the rooting of the tree of life should be considered suspect. Here we show that macromolecular structure and a new phylogenetic framework of analysis that focuses on the parts of biological systems instead of the whole provide both deep and reliable phylogenetic signal and enable us to put forth hypotheses of origin. We review over a decade of phylogenomic studies, which mine information in a genomic census of millions of encoded proteins and RNAs. We show how the use of process models of molecular accumulation that comply with Weston's generality criterion supports a consistent phylogenomic scenario in which the origin of diversified life can be traced back to the early history of Archaea.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Kaiyue Zhou
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Derek Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Jay E. Mittenthal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Feng-Jie Sun
- School of Science and Technology, Georgia Gwinnett College, Lawrenceville, GA 30043, USA
- Kyung Mo Kim
- Microbial Resource Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Republic of Korea
|
25
|
Nagy A, Patthy L. FixPred: a resource for correction of erroneous protein sequences. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau032. [PMID: 24705206 PMCID: PMC3975993 DOI: 10.1093/database/bau032] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Protein databases are heavily contaminated with erroneous (mispredicted, abnormal and incomplete) sequences and these erroneous data significantly distort the conclusions drawn from genome-scale protein sequence analyses. In our earlier work we described the MisPred resource that serves to identify erroneous sequences; here we present the FixPred computational pipeline that automatically corrects sequences identified by MisPred as erroneous. The current version of the associated FixPred database contains corrected UniProtKB/Swiss-Prot and NCBI/RefSeq sequences from Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Danio rerio, Fugu rubripes, Ciona intestinalis, Branchiostoma floridae, Drosophila melanogaster and Caenorhabditis elegans; future releases of the FixPred database will include corrected sequences of additional Metazoan species. The FixPred computational pipeline and database (http://www.fixpred.com) are easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats. Database URL: http://www.fixpred.com
Affiliation(s)
- László Patthy
- *Corresponding author: Tel: +361 279 3100; Fax: +361 466 5465;
|
26
|
Guo B, Chain FJJ, Bornberg-Bauer E, Leder EH, Merilä J. Genomic divergence between nine- and three-spined sticklebacks. BMC Genomics 2013; 14:756. [PMID: 24188282 PMCID: PMC4046692 DOI: 10.1186/1471-2164-14-756] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 05/03/2013] [Accepted: 10/31/2013] [Indexed: 12/22/2022] Open
Abstract
Background: Comparative genomics approaches help to shed light on evolutionary processes that shape differentiation between lineages. The nine-spined stickleback (Pungitius pungitius) is a closely related species of the ecological ‘supermodel’ three-spined stickleback (Gasterosteus aculeatus). It is an emerging model system for evolutionary biology research but has garnered less attention and lacks extensive genomic resources. To expand on these resources and aid the study of sticklebacks in a phylogenetic framework, we characterized nine-spined stickleback transcriptomes from brain and liver using deep sequencing. Results: We obtained nearly eight thousand assembled transcripts, of which 3,091 were assigned as putative one-to-one orthologs to genes found in the three-spined stickleback. These sequences were used for evaluating overall differentiation and substitution rates between nine- and three-spined sticklebacks, and to identify genes that are putatively evolving under positive selection. The synonymous substitution rate was estimated to be 7.1 × 10⁻⁹ per site per year between the two species, and a total of 165 genes showed patterns of adaptive evolution in one or both species. A few nine-spined stickleback contigs lacked an obvious ortholog in three-spined sticklebacks but were found to match genes in other fish species, suggesting several gene losses within 13 million years since the divergence of the two stickleback species. We identified 47 SNPs in 25 different genes that differentiate pond and marine ecotypes. We also identified 468 microsatellites that could be further developed as genetic markers in nine-spined sticklebacks. Conclusion: With deep sequencing of nine-spined stickleback cDNA libraries, our study provides a significant increase in the number of gene sequences and microsatellite markers for this species, and identifies a number of genes showing patterns of adaptive evolution between nine- and three-spined sticklebacks. We also report several candidate genes that might be involved in differential adaptation between marine and freshwater nine-spined sticklebacks. This study provides a valuable resource for future studies aiming to identify candidate genes underlying ecological adaptation in this and other stickleback species. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-14-756) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Baocheng Guo
- Ecological Genetics Research Unit, Department of Biosciences, University of Helsinki, Helsinki, Finland.
|
27
|
Wang Y, Tan X, Paterson AH. Different patterns of gene structure divergence following gene duplication in Arabidopsis. BMC Genomics 2013; 14:652. [PMID: 24063813 PMCID: PMC3848917 DOI: 10.1186/1471-2164-14-652] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 05/02/2013] [Accepted: 09/20/2013] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND: Divergence in gene structure following gene duplication is not well understood. Gene duplication can occur via whole-genome duplication (WGD) and single-gene duplications including tandem, proximal and transposed duplications. Different modes of gene duplication may be associated with different types, levels, and patterns of structural divergence. RESULTS: In Arabidopsis thaliana, we denote levels of structural divergence between duplicated genes by differences in coding-region lengths and average exon lengths, and the number of insertions/deletions (indels) and maximum indel length in their protein sequence alignment. Among recent duplicates of different modes, transposed duplicates diverge most dramatically in gene structure. In transposed duplications, parental loci tend to have longer coding regions and exons, and smaller numbers of indels and maximum indel lengths than transposed loci, reflecting biased structural changes in transposed duplications. Structural divergence increases with evolutionary time for WGDs, but not transposed duplications, possibly because of biased gene losses following transposed duplications. Structural divergence has heterogeneous relationships with nucleotide substitution rates, but is consistently positively correlated with gene expression divergence. The NBS-LRR gene family shows higher-than-average levels of structural divergence. CONCLUSIONS: Our study suggests that structural divergence between duplicated genes is greatly affected by the mechanisms of gene duplication and may not be proportional to evolutionary time, and that certain gene families are under selection for rapid evolution of gene structure.
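Two of the divergence measures named in this abstract, the number of indels and the maximum indel length in a protein sequence alignment, can be sketched in a few lines. The example below is only one plausible reading: it assumes a pairwise alignment given as two equal-length strings with `-` as the gap character, and it counts each contiguous gap run in either sequence as one indel. The function name and the toy sequences are hypothetical and are not taken from the paper.

```python
import re

def indel_stats(aligned_a, aligned_b):
    """Return (number_of_indels, maximum_indel_length) for a pairwise alignment.

    Each contiguous run of '-' in either aligned sequence is treated as one
    indel event (an assumption of this sketch, not a statement of the
    authors' exact counting rule).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    run_lengths = []
    for seq in (aligned_a, aligned_b):
        # finditer yields one match per maximal run of gap characters
        run_lengths.extend(len(m.group()) for m in re.finditer(r"-+", seq))
    return len(run_lengths), max(run_lengths, default=0)

# Hypothetical toy alignment: one 2-residue gap in the first sequence,
# one 1-residue gap in the second.
print(indel_stats("MKT--LLV", "MKTAALL-"))  # (2, 2)
```

Counting gap runs rather than gap columns keeps a single long deletion from being scored as many separate events, which matters when comparing indel counts with maximum indel length.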
Affiliation(s)
- Yupeng Wang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA.
|
28
|
Nagy A, Patthy L. MisPred: a resource for identification of erroneous protein sequences in public databases. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2013; 2013:bat053. [PMID: 23864220 PMCID: PMC3713709 DOI: 10.1093/database/bat053] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
Correct prediction of the structure of protein-coding genes of higher eukaryotes is still a difficult task; therefore, public databases are heavily contaminated with mispredicted sequences. The high rate of misprediction has serious consequences because it significantly affects the conclusions that may be drawn from genome-scale sequence analyses of eukaryotic genomes. Here we present the MisPred database and computational pipeline that provide efficient means for the identification of erroneous sequences in public databases. The MisPred database contains a collection of abnormal, incomplete and mispredicted protein sequences from 19 metazoan species identified as erroneous by MisPred quality control tools in the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, NCBI/RefSeq and EnsEMBL databases. Major releases of the database are automatically generated and updated regularly. The database (http://www.mispred.com) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats. DATABASE URL: http://www.mispred.com.
Affiliation(s)
- Alinda Nagy
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1113 Budapest, Hungary
|
29
|
Genome-wide identification of the class III aminotransferase gene family in rice and expression analysis under abiotic stress. Genes Genomics 2013. [DOI: 10.1007/s13258-013-0108-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/15/2023]
|
30
|
Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
|