1
Buckley RM, Kortschak RD, Adelson DL. Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse. PLoS Comput Biol 2018; 14:e1006091. [PMID: 29677183 PMCID: PMC5931693 DOI: 10.1371/journal.pcbi.1006091] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/06/2017] [Revised: 05/02/2018] [Accepted: 03/15/2018] [Indexed: 12/31/2022] Open
Abstract
The forces driving the accumulation and removal of non-coding DNA and ultimately the evolution of genome size in complex organisms are intimately linked to genome structure and organisation. To further understand this connection we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Focusing on the distribution of DNA gains and losses, relationships to important structural features and potential impact on biological processes, we found that in autosomes, DNA gains and losses both followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events, suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss results in turnover or "churning" in regulatory element dense regions of open chromatin, where interruption of regulatory elements is selected against. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts.
Affiliation(s)
- Reuben M. Buckley
- Department of Genetics and Evolution, The University of Adelaide, North Tce, Adelaide, Australia
- R. Daniel Kortschak
- Department of Genetics and Evolution, The University of Adelaide, North Tce, Adelaide, Australia
- David L. Adelson
- Department of Genetics and Evolution, The University of Adelaide, North Tce, Adelaide, Australia
2
Abstract
Genome size in mammals and birds shows remarkably little interspecific variation compared with other taxa. However, genome sequencing has revealed that many mammal and bird lineages have experienced differential rates of transposable element (TE) accumulation, which would be predicted to cause substantial variation in genome size between species. Thus, we hypothesize that there has been covariation between the amount of DNA gained by transposition and lost by deletion during mammal and avian evolution, resulting in genome size equilibrium. To test this model, we develop computational methods to quantify the amount of DNA gained by TE expansion and lost by deletion over the last 100 My in the lineages of 10 species of eutherian mammals and 24 species of birds. The results reveal extensive variation in the amount of DNA gained via lineage-specific transposition, but that DNA loss counteracted this expansion to various extents across lineages. Our analysis of the rate and size spectrum of deletion events implies that DNA removal in both mammals and birds has proceeded mostly through large segmental deletions (>10 kb). These findings support a unified "accordion" model of genome size evolution in eukaryotes whereby DNA loss counteracting TE expansion is a major determinant of genome size. Furthermore, we propose that extensive DNA loss, and not necessarily a dearth of TE activity, has been the primary force maintaining the greater genomic compaction of flying birds and bats relative to their flightless relatives.
3
Secondary structure impacts patterns of selection in human lncRNAs. BMC Biol 2016; 14:60. [PMID: 27457204 PMCID: PMC4960838 DOI: 10.1186/s12915-016-0283-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 05/24/2016] [Accepted: 07/04/2016] [Indexed: 02/04/2023] Open
Abstract
Background: Metazoans transcribe many long non-coding RNAs (lncRNAs) that are poorly conserved and whose function remains unknown. This has raised the questions of what fraction of the predicted lncRNAs is actually functional, and whether selection can effectively constrain lncRNAs in species with small effective population sizes such as human populations. Results: Here we evaluate signatures of selection in human lncRNAs using inter-specific data and intra-specific comparisons from five major populations, as well as by assessing relationships between sequence variation and predictions of secondary structure. In all analyses we included a reference of functionally characterized lncRNAs. Altogether, our results show compelling evidence of recent purifying selection acting on both characterized and predicted lncRNAs. We found that RNA secondary structure constrains sequence variation in lncRNAs, so that polymorphisms are depleted in paired regions with low accessibility and tend to be neutral with respect to structural stability. Conclusions: Important implications of our results are that secondary structure plays a role in the functionality of lncRNAs, and that the set of predicted lncRNAs contains a large fraction of functional ones that may play key roles that remain to be discovered. Electronic supplementary material: The online version of this article (doi:10.1186/s12915-016-0283-0) contains supplementary material, which is available to authorized users.
4
Young RS. Lineage-specific genomics: Frequent birth and death in the human genome: The human genome contains many lineage-specific elements created by both sequence and functional turnover. Bioessays 2016; 38:654-63. [PMID: 27231054 PMCID: PMC4949557 DOI: 10.1002/bies.201500192] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
Abstract
Frequent evolutionary birth and death events have created a large quantity of biologically important, lineage‐specific DNA within mammalian genomes. The birth and death of DNA sequences is so frequent that the total number of these insertions and deletions in the human population remains unknown, although there are differences between these groups, e.g. transposable elements contribute predominantly to sequence insertion. Functional turnover – where the activity of a locus is specific to one lineage, but the underlying DNA remains conserved – can also drive birth and death. However, this does not appear to be a major driver of divergent transcriptional regulation. Both sequence and functional turnover have contributed to the birth and death of thousands of functional promoters in the human and mouse genomes. These findings reveal the pervasive nature of evolutionary birth and death and suggest that lineage‐specific regions may play an important but previously underappreciated role in human biology and disease.
Affiliation(s)
- Robert S Young
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Edinburgh, UK
5
Jubb AW, Young RS, Hume DA, Bickmore WA. Enhancer Turnover Is Associated with a Divergent Transcriptional Response to Glucocorticoid in Mouse and Human Macrophages. Journal of Immunology (Baltimore, Md. : 1950) 2016; 196:813-822. [PMID: 26663721 PMCID: PMC4707550 DOI: 10.4049/jimmunol.1502009] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 09/11/2015] [Accepted: 11/04/2015] [Indexed: 02/07/2023]
Abstract
Phenotypic differences between individuals and species are controlled in part through differences in expression of a relatively conserved set of genes. Genes expressed in the immune system are subject to especially powerful selection. We have investigated the evolution of both gene expression and candidate enhancers in human and mouse macrophages exposed to glucocorticoid (GC), a regulator of innate immunity and an important therapeutic agent. Our analyses revealed a very limited overlap in the repertoire of genes responsive to GC in human and mouse macrophages. Peaks of inducible binding of the GC receptor (GR) detected by chromatin immunoprecipitation-Seq correlated with induction, but not repression, of target genes in both species, occurred at distal regulatory sites not promoters, and were strongly enriched for the consensus GR-binding motif. Turnover of GR binding between mice and humans was associated with gain and loss of the motif. There was no detectable signal of positive selection at species-specific GR binding sites, but clear evidence of purifying selection at the small number of conserved sites. We conclude that enhancer divergence underlies the difference in transcriptional activation after GC treatment between mouse and human macrophages. Only the shared inducible loci show evidence of selection, and therefore these loci may be important for the subset of responses to GC that is shared between species.
Affiliation(s)
- Alasdair W Jubb
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, Scotland, UK
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, Scotland, UK
- Robert S Young
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, Scotland, UK
- David A Hume
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, Scotland, UK
- Wendy A Bickmore
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, Scotland, UK
6
Radó-Trilla N, Arató K, Pegueroles C, Raya A, de la Luna S, Albà MM. Key Role of Amino Acid Repeat Expansions in the Functional Diversification of Duplicated Transcription Factors. Mol Biol Evol 2015; 32:2263-72. [PMID: 25931513 PMCID: PMC4540963 DOI: 10.1093/molbev/msv103] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 12/27/2022] Open
Abstract
The high regulatory complexity of vertebrates has been related to two rounds of whole genome duplication (2R-WGD) that occurred before the divergence of the major vertebrate groups. Following these events, many developmental transcription factors (TFs) were retained in multiple copies and subsequently specialized in diverse functions, whereas others reverted to their singleton state. TFs are known to be generally rich in amino acid repeats or low-complexity regions (LCRs), such as polyalanine or polyglutamine runs, which can evolve rapidly and potentially influence the transcriptional activity of the protein. Here we test the hypothesis that LCRs have played a major role in the diversification of TF gene duplicates. We find that nearly half of the TF gene families that originated during the 2R-WGD contain LCRs. The number of gene duplicates with LCRs is 155 out of 550 analyzed (28%), about twice as many as the number of single copy genes with LCRs (15 out of 115, 13%). In addition, duplicated TFs preferentially accumulate certain LCR types, the most prominent of which are alanine repeats. We experimentally test the role of alanine-rich LCRs in two different TF gene families, PHOX2A/PHOX2B and LHX2/LHX9. In both cases, the presence of the alanine-rich LCR in one of the copies (PHOX2B and LHX2) significantly increases the capacity of the TF to activate transcription. Taken together, the results provide strong evidence that LCRs are important driving forces of evolutionary change in duplicated genes.
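As a quick check of the proportions quoted in this abstract, the short sketch below recomputes both fractions and their ratio from the four counts given in the text. The variable names are illustrative only and are not taken from the paper.

```python
# Counts quoted in the abstract above; variable names are illustrative.
dup_with_lcr, dup_total = 155, 550        # duplicated TF genes
single_with_lcr, single_total = 15, 115   # single-copy TF genes

dup_frac = dup_with_lcr / dup_total
single_frac = single_with_lcr / single_total
fold = dup_frac / single_frac             # relative frequency of LCR-containing genes

print(f"duplicates: {dup_frac:.1%}, single copy: {single_frac:.1%}, ratio: {fold:.2f}")
```

The ratio comes out slightly above two, in line with the abstract's "about twice as many".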
Affiliation(s)
- Núria Radó-Trilla
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Barcelona, Spain
- Krisztina Arató
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain; Centro de Investigación Biomèdica en Red en Enfermedades Raras (CIBERER), Barcelona, Spain
- Cinta Pegueroles
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain
- Alicia Raya
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain; Centro de Investigación Biomèdica en Red en Enfermedades Raras (CIBERER), Barcelona, Spain
- Susana de la Luna
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain; Centro de Investigación Biomèdica en Red en Enfermedades Raras (CIBERER), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- M Mar Albà
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Barcelona, Spain; Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
7
Young RS, Hayashizaki Y, Andersson R, Sandelin A, Kawaji H, Itoh M, Lassmann T, Carninci P, Bickmore WA, Forrest AR, Taylor MS. The frequent evolutionary birth and death of functional promoters in mouse and human. Genome Res 2015; 25:1546-57. [PMID: 26228054 PMCID: PMC4579340 DOI: 10.1101/gr.190546.115] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 02/04/2015] [Accepted: 07/28/2015] [Indexed: 12/04/2022]
Abstract
Promoters are central to the regulation of gene expression. Changes in gene regulation are thought to underlie much of the adaptive diversification between species and phenotypic variation within populations. In contrast to earlier work emphasizing the importance of enhancer evolution and subtle sequence changes at promoters, we show that dramatic changes such as the complete gain and loss (collectively, turnover) of functional promoters are common. Using quantitative measures of transcription initiation in both humans and mice across 52 matched tissues, we discriminate promoter sequence gains from losses and resolve the lineage of changes. We also identify expression divergence and functional turnover between orthologous promoters, finding only the latter is associated with local sequence changes. Promoter turnover has occurred at the majority (>56%) of protein-coding genes since humans and mice diverged. Tissue-restricted promoters are the most evolutionarily volatile where retrotransposition is an important, but not the sole, source of innovation. There is considerable heterogeneity of turnover rates between promoters in different tissues, but the consistency of these in both lineages suggests that the same biological systems are similarly inclined to transcriptional rewiring. The genes affected by promoter turnover show evidence of adaptive evolution. In mice, promoters are primarily lost through deletion of the promoter containing sequence, whereas in humans, many promoters appear to be gradually decaying with weak transcriptional output and relaxed selective constraint. Our results suggest that promoter gain and loss is an important process in the evolutionary rewiring of gene regulation and may be a significant source of phenotypic diversification.
Affiliation(s)
- Robert S Young
- MRC Human Genetics Unit, MRC Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
- Yoshihide Hayashizaki
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, 351-0198, Japan
- Robin Andersson
- Department of Biology and Biotech Research and Innovation Centre, Copenhagen University, 2200 Copenhagen N, Denmark
- Albin Sandelin
- Department of Biology and Biotech Research and Innovation Centre, Copenhagen University, 2200 Copenhagen N, Denmark
- Hideya Kawaji
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, 351-0198, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
- Masayoshi Itoh
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, 351-0198, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
- Timo Lassmann
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
- Piero Carninci
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
- Wendy A Bickmore
- MRC Human Genetics Unit, MRC Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
- Alistair R Forrest
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan; Systems Biology and Genomics, Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, Western Australia 6009, Australia
- Martin S Taylor
- MRC Human Genetics Unit, MRC Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
8
Park L. Ancestral alleles in the human genome based on population sequencing data. PLoS One 2015; 10:e0128186. [PMID: 26020928 PMCID: PMC4447449 DOI: 10.1371/journal.pone.0128186] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/14/2015] [Accepted: 04/23/2015] [Indexed: 12/03/2022] Open
Abstract
Ancestral allele information is useful for genetics studies. Previously, the identification of ancestral alleles was primarily based on sequence alignments between species. Alternative ways to identify ancestral alleles were proposed in this study based on population sequencing data. The methods described here utilized the diversity between haplotypes harboring ancestral and newly emerged alleles. Simulations showed that these methods were reliable for identifying ancestral alleles when the variants had not aged too greatly. Application to the human genome sequencing data suggested the role of indels in maintaining the GC content in the human genome. The deletion-to-insertion ratios and GC proportions were correlated depending on the sizes of insertions and deletions in the direction of increasing GC content. There were GC-biased fixations in single base-pair insertions and AT-biased fixations in single base-pair deletions in the results based on the proposed methods. In the current study, GC-biased gene conversions in nucleotide substitutions were very slight or insignificant. In the variants of several quantitative trait loci (QTLs), slight GC-biased gene conversion was observed in nucleotide substitutions. For the QTL indels, insertions were observed more often than deletions, and deletion-biased fixation was observed, providing new insights into the evolution of functional genes.
Affiliation(s)
- Leeyoung Park
- Natural Science Research Institute, Yonsei University, Seoul, Korea
9
Meng Y, Zhang W, Zhou J, Liu M, Chen J, Tian S, Zhuo M, Zhang Y, Zhong Y, Du H, Wang X. Genome-wide analysis of positively selected genes in seasonal and non-seasonal breeding species. PLoS One 2015; 10:e0126736. [PMID: 26000771 PMCID: PMC4441472 DOI: 10.1371/journal.pone.0126736] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/18/2014] [Accepted: 04/07/2015] [Indexed: 01/04/2023] Open
Abstract
Some mammals breed throughout the year, while others breed only at certain times of year. These differences in reproductive behavior can be explained by evolution. We identified positively-selected genes in two sets of species with different degrees of relatedness including seasonal and non-seasonal breeding species, using branch-site models. After stringent filtering by sum of pairs scoring, we revealed that more genes underwent positive selection in seasonal compared with non-seasonal breeding species. Positively-selected genes were verified by cDNA mapping of the positive sites with the corresponding cDNA sequences. The design of the evolutionary analysis can effectively lower the false-positive rate and thus identify valid positive genes. Validated, positively-selected genes, including CGA, DNAH1, INVS, and CD151, were related to reproductive behaviors such as spermatogenesis and cell proliferation in non-seasonal breeding species. Genes in seasonal breeding species, including THRAP3, TH1L, and CMTM6, may be related to the evolution of sperm and the circadian rhythm system. Identification of these positively-selected genes might help to identify the molecular mechanisms underlying seasonal and non-seasonal reproductive behaviors.
Affiliation(s)
- Yuhuan Meng
- School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China
- Wenlu Zhang
- School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China
- Jinghui Zhou
- School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China
- Mingyu Liu
- School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China
- Junhui Chen
- School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China
- Shuai Tian
- School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China
- Min Zhuo
- School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China
- Yu Zhang
- Guangdong Key Laboratory of Laboratory Animals/Guangdong laboratory animals monitoring institution, Guangzhou, China
- Yang Zhong
- School of Life Sciences, Fudan University, Shanghai, China
- Institute of Biodiversity Science, Tibet University, Lhasa, China
- Hongli Du
- School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China
- Xiaoning Wang
- School of Bioscience and Bioengineering, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, China
- Chinese PLA General Hospital, Beijing, China
10
Huang S, Li J, Xu A, Huang G, You L. Small Insertions Are More Deleterious than Small Deletions in Human Genomes. Hum Mutat 2013; 34:1642-9. [DOI: 10.1002/humu.22435] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/29/2013] [Accepted: 08/22/2013] [Indexed: 11/09/2022]
Affiliation(s)
- Shengfeng Huang
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
- Jie Li
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
- Anlong Xu
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
- Beijing University of Chinese Medicine, Chao-yang District; Beijing 100029 People's Republic of China
- Guangrui Huang
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
- Leiming You
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
11
Villanueva-Cañas JL, Laurie S, Albà MM. Improving genome-wide scans of positive selection by using protein isoforms of similar length. Genome Biol Evol 2013; 5:457-67. [PMID: 23377868 PMCID: PMC3590775 DOI: 10.1093/gbe/evt017] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 12/26/2022] Open
Abstract
Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene centered, one single-protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic because it increases the number of indels in the alignments due to the inclusion of nonhomologous regions, such as those derived from species-specific exons, increasing the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length. We examine several evolutionary parameters inferred from alignments in which the only difference is the method used to select the protein isoform combination: Longest, PALO, the combination that results in the highest sequence conservation, and a randomly selected combination. We observe that Longest tends to overestimate both nonsynonymous and synonymous substitution rates when compared with PALO, which is most likely due to an excess of misaligned positions. The estimation of the fraction of genes that have experienced positive selection by maximum likelihood is very sensitive to the method of isoform selection employed, both when alignments are constructed with MAFFT and with Prank+F. Longest performs better than a random combination but still estimates up to 3 times more positively selected genes than the combination showing the highest conservation, indicating the presence of many false positives. 
We show that PALO can eliminate the majority of such false positives and thus that it is a more appropriate approach for large-scale analyses than Longest. A web server has been set up to facilitate the use of PALO given a user-defined set of gene families; it is available at http://evolutionarygenomics.imim.es/palo.
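The selection idea described in this abstract (choosing, for each gene family, the combination of isoforms most similar in length) can be illustrated with a minimal sketch. This is not the PALO implementation: the abstract does not state how similarity in length is scored, so the sketch assumes the spread between the longest and shortest chosen isoform as the criterion, searches all combinations by brute force, and uses hypothetical identifiers and lengths.

```python
from itertools import product

def most_similar_length_combination(family):
    """Pick one isoform per gene so that the chosen lengths are as close as possible.

    family: dict mapping a gene id to a list of (isoform_id, protein_length) tuples.
    The criterion (max length minus min length) is an assumption made for this sketch.
    """
    if not family or any(not isoforms for isoforms in family.values()):
        raise ValueError("every gene needs at least one isoform")
    genes = sorted(family)
    best_combo, best_spread = None, None
    # Exhaustive search over one-isoform-per-gene combinations.
    for combo in product(*(family[g] for g in genes)):
        lengths = [length for _, length in combo]
        spread = max(lengths) - min(lengths)
        if best_spread is None or spread < best_spread:
            best_combo, best_spread = combo, spread
    chosen = {g: isoform_id for g, (isoform_id, _) in zip(genes, best_combo)}
    return chosen, best_spread

# Hypothetical three-species family; ids and lengths are invented for illustration.
family = {
    "human": [("h1", 500), ("h2", 320)],
    "mouse": [("m1", 310), ("m2", 450)],
    "rat": [("r1", 305)],
}
chosen, spread = most_similar_length_combination(family)
print(chosen, spread)
```

In this toy family, taking the longest isoform per gene would pair a 500-residue human protein with a 305-residue rat protein, whereas the length-matched choice keeps all three sequences within 15 residues of each other. Exhaustive search is only practical for small families; a real tool would need a cheaper strategy.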
Affiliation(s)
- José Luis Villanueva-Cañas
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain
12
Sun C, López Arriaza JR, Mueller RL. Slow DNA loss in the gigantic genomes of salamanders. Genome Biol Evol 2013; 4:1340-8. [PMID: 23175715 PMCID: PMC3542557 DOI: 10.1093/gbe/evs103] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 11/16/2022] Open
Abstract
Evolutionary changes in genome size result from the combined effects of mutation, natural
selection, and genetic drift. Insertion and deletion mutations (indels) directly impact
genome size by adding or removing sequences. Most species lose more DNA through small
indels (i.e., ∼1–30 bp) than they gain, which can result in genome reduction
over time. Because this rate of DNA loss varies across species, small indel dynamics have
been suggested to contribute to genome size evolution. Species with extremely large
genomes provide interesting test cases for exploring the link between small indels and
genome size; however, most large genomes remain relatively unexplored. Here, we examine
rates of DNA loss in the tetrapods with the largest genomes—the salamanders. We used
low-coverage genomic shotgun sequence data from four salamander species to examine
patterns of insertion, deletion, and substitution in neutrally evolving non-long terminal
repeat (LTR) retrotransposon sequences. For comparison, we estimated genome-wide DNA loss
rates in non-LTR retrotransposon sequences from five other vertebrate genomes:
Anolis carolinensis, Danio rerio, Gallus
gallus, Homo sapiens, and Xenopus tropicalis.
Our results show that salamanders have significantly lower rates of DNA loss than do other
vertebrates. More specifically, salamanders experience lower numbers of deletions relative
to insertions, and both deletions and insertions are skewed toward smaller sizes. On the
basis of these patterns, we conclude that slow DNA loss contributes to genomic gigantism
in salamanders. We also identify candidate molecular mechanisms underlying these
differences and suggest that natural variation in indel dynamics provides a unique
opportunity to study the basis of genome stability.
Affiliation(s)
- Cheng Sun
- Department of Biology, Colorado State University, CO, USA
13
Pegueroles C, Laurie S, Albà MM. Accelerated evolution after gene duplication: a time-dependent process affecting just one copy. Mol Biol Evol 2013; 30:1830-42. [PMID: 23625888 DOI: 10.1093/molbev/mst083] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Indexed: 12/19/2022] Open
Abstract
Gene duplication is widely regarded as a major mechanism modeling genome evolution and function. However, the mechanisms that drive the evolution of the two, initially redundant, gene copies are still ill defined. Many gene duplicates experience evolutionary rate acceleration, but the relative contribution of positive selection and random drift to the retention and subsequent evolution of gene duplicates, and for how long the molecular clock may be distorted by these processes, remains unclear. Focusing on rodent genes that duplicated before and after the mouse and rat split, we find significantly increased sequence divergence after duplication in only one of the copies, which in nearly all cases corresponds to the novel daughter copy, independent of the mechanism of duplication. We observe that the evolutionary rate of the accelerated copy, measured as the ratio of nonsynonymous to synonymous substitutions, is on average 5-fold higher in the period spanning 4-12 My after the duplication than it was before the duplication. This increase can be explained, at least in part, by the action of positive selection according to the results of the maximum likelihood-based branch-site test. Subsequently, the rate decelerates until purifying selection completely returns to preduplication levels. Reversion to the original rates has already been accomplished 40.5 My after the duplication event, corresponding to a genetic distance of about 0.28 synonymous substitutions per site. Differences in tissue gene expression patterns parallel those of substitution rates, reinforcing the role of neofunctionalization in explaining the evolution of young gene duplicates.
Affiliation(s)
- Cinta Pegueroles
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain
|
14
|
Toll-Riera M, Albà MM. Emergence of novel domains in proteins. BMC Evol Biol 2013; 13:47. [PMID: 23425224 PMCID: PMC3599535 DOI: 10.1186/1471-2148-13-47] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 11/28/2012] [Accepted: 01/31/2013] [Indexed: 12/31/2022]
Abstract
Background: Proteins are composed of a combination of discrete, well-defined, sequence domains, associated with specific functions that have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation. But currently little is known about how novel domains arise and how they subsequently evolve. Results: To gain insights into the impact of recently emerged domains in protein evolution we have identified all human young protein domains that have emerged in approximately the past 550 million years. We have classified them into vertebrate-specific and mammalian-specific groups, and compared them to older domains. We have found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. We have observed that 61.3% of them arose in newly formed genes, while the remaining 38.7% are found combined with older domains, and have very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, young domains show significantly higher non-synonymous to synonymous substitution rates than older domains using human and mouse orthologous sequence comparisons. This is also true when we compare young and old domains located in the same protein, suggesting that recently arisen domains tend to evolve in a less constrained manner than older domains. Conclusions: We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made of domains of different age, and that the fastest evolving parts correspond to the domains that have been acquired more recently.
Affiliation(s)
- Macarena Toll-Riera
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB) - Hospital del Mar Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain
|
15
|
Radó-Trilla N, Albà M. Dissecting the role of low-complexity regions in the evolution of vertebrate proteins. BMC Evol Biol 2012; 12:155. [PMID: 22920595 PMCID: PMC3523016 DOI: 10.1186/1471-2148-12-155] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Received: 04/17/2012] [Accepted: 07/30/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Given their high abundance, and their capacity to expand in relatively short periods of time through replication slippage, they can greatly contribute to increase protein sequence space and generate novel protein functions. However, little is known about the global impact of LCRs on protein evolution. RESULTS: We have traced back the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H. sapiens, M. musculus, G. gallus, D. rerio and C. intestinalis. Transcriptional factors and other regulatory functions are overrepresented in proteins containing LCRs. We have found that the gain of novel LCRs is frequently associated with repeat expansion whereas the loss of LCRs is more often due to accumulation of amino acid substitutions as opposed to deletions. This dichotomy results in net protein sequence gain over time. We have detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine and/or glycine-rich LCRs are overrepresented in recently emerged LCR sets from all branches, suggesting that their expansion is better tolerated than for other LCR types. LCRs enriched in positively charged amino acids show the contrary pattern, indicating an important effect of purifying selection in their maintenance. CONCLUSION: We have performed the first large-scale study on the evolutionary dynamics of LCRs in protein families. The study has shown that the composition of an LCR is an important determinant of its evolutionary pattern.
Affiliation(s)
- Núria Radó-Trilla
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics - IMIM Hospital del Mar Research Institute, Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona 08003, Spain
|