1
Waters E, Pucci P, Hirst M, Chapman S, Wang Y, Crea F, Heath CJ. HAR1: an insight into lncRNA genetic evolution. Epigenomics 2021; 13:1831-1843. [PMID: 34676772 DOI: 10.2217/epi-2021-0069] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) have a wide range of functions in health and disease, but many remain uncharacterized because of their complex expression patterns and structures. The genetic loci encoding lncRNAs can be subject to accelerated evolutionary changes within the human lineage. HAR1 is a region that has a significantly altered sequence compared to other primates and is a component of two overlapping lncRNA loci, HAR1A and HAR1B. Although the functions of these lncRNAs are unknown, they have been associated with neurological disorders and cancer. Here, we explore the current state of understanding of evolution in human lncRNA genes, using the HAR1 locus as the case study.
Affiliation(s)
- Ella Waters
- School of Life, Health & Chemical Sciences, The Open University, Milton Keynes, MK7 6AA, UK
- Perla Pucci
- School of Life, Health & Chemical Sciences, The Open University, Milton Keynes, MK7 6AA, UK; Division of Cellular & Molecular Pathology, Department of Pathology, University of Cambridge, Cambridge, CB2 0QQ, UK
- Mark Hirst
- School of Life, Health & Chemical Sciences, The Open University, Milton Keynes, MK7 6AA, UK
- Simon Chapman
- School of Life, Health & Chemical Sciences, The Open University, Milton Keynes, MK7 6AA, UK
- Yuzhuo Wang
- Department of Urologic Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Francesco Crea
- School of Life, Health & Chemical Sciences, The Open University, Milton Keynes, MK7 6AA, UK
- Christopher J Heath
- School of Life, Health & Chemical Sciences, The Open University, Milton Keynes, MK7 6AA, UK
2
Abstract
Transposable elements (TEs) are mobile genetic elements that were once perceived as merely selfish, but are now recognized as potent agents of adaptation. One way TEs contribute to genome evolution is through TE exaptation, a process whereby TEs, which usually persist by replicating in the genome, transform into novel host genes, which thereafter persist by conferring phenotypic benefits. Exapted TEs are known to contribute diverse and vital functions, and may facilitate punctuated equilibrium, yet we have little understanding about the process of TE exaptation. In order to facilitate our understanding of how TE coding sequences may become exapted, here we incorporate the findings of recent publications into a framework and six-step model.
Affiliation(s)
- Zoé Joly-Lopez
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
- Thomas E Bureau
- Department of Biology, McGill University, Montreal, QC H3A 1B1, Canada.
3
Abstract
Genome size in mammals and birds shows remarkably little interspecific variation compared with other taxa. However, genome sequencing has revealed that many mammal and bird lineages have experienced differential rates of transposable element (TE) accumulation, which would be predicted to cause substantial variation in genome size between species. Thus, we hypothesize that there has been covariation between the amount of DNA gained by transposition and lost by deletion during mammal and avian evolution, resulting in genome size equilibrium. To test this model, we develop computational methods to quantify the amount of DNA gained by TE expansion and lost by deletion over the last 100 My in the lineages of 10 species of eutherian mammals and 24 species of birds. The results reveal extensive variation in the amount of DNA gained via lineage-specific transposition, but that DNA loss counteracted this expansion to various extents across lineages. Our analysis of the rate and size spectrum of deletion events implies that DNA removal in both mammals and birds has proceeded mostly through large segmental deletions (>10 kb). These findings support a unified "accordion" model of genome size evolution in eukaryotes whereby DNA loss counteracting TE expansion is a major determinant of genome size. Furthermore, we propose that extensive DNA loss, and not necessarily a dearth of TE activity, has been the primary force maintaining the greater genomic compaction of flying birds and bats relative to their flightless relatives.
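The bookkeeping behind the "accordion" model can be sketched in a few lines of Python. This is only an illustration, not the authors' method: the function name, lineage labels, and megabase figures are hypothetical and do not come from the paper. It shows how two lineages with very different amounts of TE-derived gain can end up with similar net changes in genome size when deletion offsets the gain.

```python
def net_genome_change(gained_mb: float, lost_mb: float) -> float:
    """Net change in genome size (Mb): DNA gained by transposition minus DNA lost by deletion."""
    return gained_mb - lost_mb


# Hypothetical lineages (illustrative numbers only, not data from the study).
lineages = {
    "lineage_A": (300.0, 280.0),  # high TE gain, high loss
    "lineage_B": (100.0, 95.0),   # low TE gain, low loss
}

for name, (gained, lost) in lineages.items():
    print(name, net_genome_change(gained, lost))
```

Under these assumed inputs, a threefold difference in gain shrinks to a much smaller difference in net size, which is the sense in which loss "counteracts" expansion.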
4
Shapiro JA. Exploring the read-write genome: mobile DNA and mammalian adaptation. Crit Rev Biochem Mol Biol 2016; 52:1-17. [DOI: 10.1080/10409238.2016.1226748] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/31/2022]
Affiliation(s)
- James A. Shapiro
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL, USA
5
Gulko B, Hubisz MJ, Gronau I, Siepel A. A method for calculating probabilities of fitness consequences for point mutations across the human genome. Nat Genet 2015; 47:276-83. [PMID: 25599402 PMCID: PMC4342276 DOI: 10.1038/ng.3196] [Citation(s) in RCA: 173] [Impact Index Per Article: 19.2] [Received: 08/13/2014] [Accepted: 12/19/2014] [Indexed: 12/17/2022]
Abstract
We describe a novel computational method for estimating the probability that a point mutation at each position in a genome will influence fitness. These fitness consequence (fitCons) scores serve as evolution-based measures of potential genomic function. Our approach is to cluster genomic positions into groups exhibiting distinct “fingerprints” based on high-throughput functional genomic data, then to estimate a probability of fitness consequences for each group from associated patterns of genetic polymorphism and divergence. We have generated fitCons scores for three human cell types based on public data from ENCODE. Compared with conventional conservation scores, fitCons scores show considerably improved prediction power for cis-regulatory elements. In addition, fitCons scores indicate that 4.2–7.5% of nucleotides in the human genome have influenced fitness since the human-chimpanzee divergence, and they suggest that recent evolutionary turnover has had limited impact on the functional content of the genome.
Affiliation(s)
- Brad Gulko
- Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Melissa J Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Adam Siepel
- [1] Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA. [2] Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
6
Gao K, Miller J. Human-chimpanzee alignment: ortholog exponentials and paralog power laws. Comput Biol Chem 2014; 53 Pt A:59-70. [PMID: 25443749 DOI: 10.1016/j.compbiolchem.2014.08.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Accepted: 07/11/2014] [Indexed: 11/27/2022]
Abstract
Genomic subsequences conserved between closely related species such as human and chimpanzee exhibit an exponential length distribution, in contrast to the algebraic length distribution observed for sequences shared between distantly related genomes. We find that the former exponential can be further decomposed into an exponential component primarily composed of orthologous sequences, and a truncated algebraic component primarily composed of paralogous sequences.
Affiliation(s)
- Kun Gao
- Physics and Biology Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.
- Jonathan Miller
- Physics and Biology Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.
7
8.2% of the Human genome is constrained: variation in rates of turnover across functional element classes in the human lineage. PLoS Genet 2014; 10:e1004525. [PMID: 25057982 PMCID: PMC4109858 DOI: 10.1371/journal.pgen.1004525] [Citation(s) in RCA: 133] [Impact Index Per Article: 13.3] [Received: 12/04/2013] [Accepted: 06/05/2014] [Indexed: 01/27/2023] Open
Abstract
Ten years on from the finishing of the human reference genome sequence, it remains unclear what fraction of the human genome confers function, where this sequence resides, and how much is shared with other mammalian species. When addressing these questions, functional sequence has often been equated with pan-mammalian conserved sequence. However, functional elements that are short-lived, including those contributing to species-specific biology, will not leave a footprint of long-lasting negative selection. Here, we address these issues by identifying and characterising sequence that has been constrained with respect to insertions and deletions for pairs of eutherian genomes over a range of divergences. Within noncoding sequence, we find increasing amounts of mutually constrained sequence as species pairs become more closely related, indicating that noncoding constrained sequence turns over rapidly. We estimate that half of present-day noncoding constrained sequence has been gained or lost in approximately the last 130 million years (half-life in units of divergence time, d1/2 = 0.25–0.31). While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequences we identify are not detected by models optimized for wider pan-mammalian conservation. Constrained DNase 1 hypersensitivity sites, promoters and untranslated regions have been more evolutionarily stable than long noncoding RNA loci which have turned over especially rapidly. By contrast, protein coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1–5.0). From extrapolations we estimate that 8.2% (7.1–9.2%) of the human genome is presently subject to negative selection and thus is likely to be functional, while only 2.2% has maintained constraint in both human and mouse since these species diverged. 
These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction. Nearly 99% of the human genome does not encode proteins, and while there recently has been extensive biochemical annotation of the remaining noncoding fraction, it remains unclear whether or not the bulk of these DNA sequences have important functional roles. By comparing the genome sequences of different species we identify genomic regions that have evolved unexpectedly slowly, a signature of natural selection upon functional sequence. Using a high-resolution evolutionary approach to find sequence showing evolutionary signatures of functionality we estimate that a total of 8.2% (7.1–9.2%) of the human genome is presently functional, more than three times as much as is functional and shared between human and mouse. This implies that there is an abundance of sequences with short-lived lineage-specific functionality. As expected, most of the sequence involved in this functional “turnover” is noncoding, while protein-coding sequence is stably preserved over longer evolutionary timescales. More generally, we find that the rate of functional turnover varies significantly across categories of functional noncoding elements. Our results provide a pan-mammalian and whole genome perspective on how rapidly different classes of sequence have gained and lost functionality down the human lineage.
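To make the half-life figures concrete, here is a minimal Python sketch. It assumes simple exponential decay of constrained sequence with divergence, which is an illustrative simplification rather than the paper's actual estimation procedure. The function name and the example divergence values are hypothetical; only the half-life range (d1/2 = 0.25–0.31 for noncoding sequence, 2.1–5.0 for protein-coding sequence) comes from the abstract.

```python
def fraction_retained(d: float, d_half: float) -> float:
    """Fraction of constrained sequence still shared at divergence d,
    assuming exponential decay with half-life d_half (same units as d)."""
    return 0.5 ** (d / d_half)


# By definition, half remains after one half-life.
print(fraction_retained(0.25, 0.25))

# Hypothetical divergence of 0.5: compare a noncoding half-life (0.25)
# with a protein-coding half-life (2.1).
print(fraction_retained(0.5, 0.25))
print(fraction_retained(0.5, 2.1))
```

Under this assumption, the same divergence removes most noncoding constraint while leaving the large majority of coding constraint intact, which is the contrast the abstract draws.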
8
Abstract
Evolutionary conservation has been an accurate predictor of functional elements across the first decade of metazoan genomics. More recently, there has been a move to define functional elements instead from biochemical annotations. Evolutionary methods are, however, more comprehensive than biochemical approaches can be and can assess quantitatively, especially for subtle effects, how biologically important--how injurious after mutation--different types of elements are. Evolutionary methods are thus critical for understanding the large fraction (up to 10%) of the human genome that does not encode proteins and yet might convey function. These methods can also capture the ephemeral nature of much noncoding functional sequence, with large numbers of functional elements having been gained and lost rapidly along each mammalian lineage. Here, we review how different strengths of purifying selection have impacted on protein-coding and non-protein-coding loci and on transcription factor binding sites in mammalian and fruit fly genomes.
Affiliation(s)
- Wilfried Haerty
- MRC Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
9
Stindl R. The telomeric sync model of speciation: species-wide telomere erosion triggers cycles of transposon-mediated genomic rearrangements, which underlie the saltatory appearance of nonadaptive characters. Naturwissenschaften 2014; 101:163-86. [PMID: 24493020 PMCID: PMC3935097 DOI: 10.1007/s00114-014-1152-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/05/2013] [Revised: 01/13/2014] [Accepted: 01/16/2014] [Indexed: 12/16/2022]
Abstract
Charles Darwin knew that the fossil record is not overwhelmingly supportive of genetic and phenotypic gradualism; therefore, he developed the core of his theory on the basis of breeding experiments. Here, I present evidence for the existence of a cell biological mechanism that strongly points to the almost forgotten European concept of saltatory evolution of nonadaptive characters, which is in perfect agreement with the gaps in the fossil record. The standard model of chromosomal evolution has always been handicapped by a paradox, namely, how speciation can occur by spontaneous chromosomal rearrangements that are known to decrease the fertility of heterozygotes in a population. However, the hallmark of almost all closely related species is a differing chromosome complement and therefore chromosomal rearrangements seem to be crucial for speciation. Telomeres, the caps of eukaryotic chromosomes, erode in somatic tissues during life, but have been thought to remain stable in the germline of a species. Recently, a large human study spanning three healthy generations clearly found a cumulative telomere effect, which is indicative of transgenerational telomere erosion in the human species. The telomeric sync model of speciation presented here is based on telomere erosion between generations, which leads to identical fusions of chromosomes and triggers a transposon-mediated genomic repatterning in the germline of many individuals of a species. The phenotypic outcome of the telomere-triggered transposon activity is the saltatory appearance of nonadaptive characters simultaneously in many individuals. Transgenerational telomere erosion is therefore the material basis of aging at the species level.
Affiliation(s)
- Reinhard Stindl
- apo-med-center, Alpharm GesmbH, Plättenstrasse 7-9, 2380, Perchtoldsdorf, Austria
10
Park S, Infante CR, Rivera-Davila LC, Menke DB. Conserved regulation of hoxc11 by pitx1 in Anolis lizards. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 2013; 322:156-65. [DOI: 10.1002/jez.b.22554] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 09/24/2013] [Accepted: 11/26/2013] [Indexed: 11/12/2022]
Affiliation(s)
- Sungdae Park
- Department of Genetics; University of Georgia; Athens Georgia
- Laura C. Rivera-Davila
- Department of Genetics; University of Georgia; Athens Georgia
- Department of Biology; University of Puerto Rico at Cayey; RISE Program; Cayey Puerto Rico
11
Abrusán G. Integration of new genes into cellular networks, and their structural maturation. Genetics 2013; 195:1407-17. [PMID: 24056411 PMCID: PMC3832282 DOI: 10.1534/genetics.113.152256] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 04/15/2013] [Accepted: 08/27/2013] [Indexed: 12/21/2022] Open
Abstract
It has been recently discovered that new genes can originate de novo from noncoding DNA, and several biological traits including expression or sequence composition form a continuum from noncoding sequences to conserved genes. In this article, using yeast genes, I test whether the integration of new genes into cellular networks and their structural maturation show such a continuum by analyzing their changes with gene age. I show that 1) the number of regulatory, protein-protein, and genetic interactions increases continuously with gene age, although at very different rates. New regulatory interactions emerge rapidly within a few million years, while the number of protein-protein and genetic interactions increases slowly, at rates of 2-2.25 × 10⁻⁸/year and 4.8 × 10⁻⁸/year, respectively. 2) Gene essentiality evolves relatively quickly: the youngest essential genes appear in proto-genes ∼14 MY old. 3) In contrast to interactions, the secondary structure of proteins and their robustness to mutations indicate that new genes face a bottleneck in their evolution: proto-genes are characterized by high β-strand content, high aggregation propensity, and low robustness against mutations, while conserved genes are characterized by lower strand content and higher stability, most likely due to the higher probability of gene loss among young genes and accumulation of neutral mutations.
Affiliation(s)
- György Abrusán
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged H-6701, Hungary
12
Affiliation(s)
- James G. D. Prendergast
- MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
- Colin A. Semple
- MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
13
Enard W. Functional primate genomics: leveraging the medical potential. J Mol Med (Berl) 2012; 90:471-80. [DOI: 10.1007/s00109-012-0901-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 03/05/2012] [Revised: 04/04/2012] [Accepted: 04/05/2012] [Indexed: 10/28/2022]
14
Clustering of DNA words and biological function: A proof of principle. J Theor Biol 2012; 297:127-36. [DOI: 10.1016/j.jtbi.2011.12.024] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 10/17/2011] [Revised: 12/20/2011] [Accepted: 12/21/2011] [Indexed: 02/08/2023]
15
16
Abstract
Many evolutionary studies over the past decade have estimated α(sel), the proportion of all nucleotides in the human genome that are subject to purifying selection because of their biological function. Most of these studies have estimated the nucleotide substitution rates from genome sequence alignments across many diverse mammals. Some α(sel) estimates will be affected by the heterogeneity of substitution rates in neutral sequence across the genome. Most will also be inaccurate if change in the functional sequence repertoire occurs rapidly relative to the separation of lineages that are being compared. Evidence gathered from both evolutionary and experimental analyses now indicates that rates of "turnover" of functional, predominantly noncoding, sequence are, indeed, high. They are sufficiently high that an estimated 50% of mouse constrained noncoding sequence is predicted not to be shared with rat, a closely related rodent. The rapidity of turnover results in at least a twofold underestimate of α(sel) by analyses that measure constraint across the eutherian phylogeny. Approaches that take account of turnover estimate that the steady-state value of α(sel) lies between 10% and 15%. Experimental studies corroborate the predicted rates of loss and gain of noncoding functional sites. These studies show the limitations inherent in the use of deep sequence conservation for identifying functional sequence. Experimental investigations focusing on lineage-specific, noncoding, and functional sequence are now essential if we are to appreciate the complete functional repertoire of the human genome.
Affiliation(s)
- Chris P Ponting
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom.