1
Gallardo-Dodd CJ, Kutter C. The regulatory landscape of interacting RNA and protein pools in cellular homeostasis and cancer. Hum Genomics 2024; 18:109. [PMID: 39334294 DOI: 10.1186/s40246-024-00678-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2024] [Accepted: 09/22/2024] [Indexed: 09/30/2024] Open
Abstract
Biological systems encompass intricate networks governed by RNA-protein interactions that play pivotal roles in cellular functions. RNA and proteins, constituting 1.1% and 18% of the mammalian cell weight, respectively, orchestrate vital processes from genome organization to translation. To date, disentangling the functional fraction of the human genome has presented a major challenge, particularly for noncoding regions, yet recent discoveries have started to unveil a host of regulatory functions for noncoding RNAs (ncRNAs). While ncRNAs differ in size, structure, degree of evolutionary conservation and abundance within the cell, they partake in diverse roles either alone or in combination. However, certain ncRNA subtypes, including those that have been described or remain to be discovered, are poorly characterized given their heterogeneous nature. RNA activity is in most cases coordinated through interactions with RNA-binding proteins (RBPs). Extensive efforts are being made to accurately reconstruct RNA-RBP regulatory networks, which have provided unprecedented insight into cellular physiology and human disease. In this review, we provide a comprehensive view of RNAs and RBPs, focusing on how their interactions generate functional signals in living cells, particularly in the context of post-transcriptional regulatory processes and cancer.
Affiliation(s)
- Carlos J Gallardo-Dodd
  - Department of Microbiology, Tumor, and Cell Biology, Science for Life Laboratory, Karolinska Institute, Solna, Sweden
- Claudia Kutter
  - Department of Microbiology, Tumor, and Cell Biology, Science for Life Laboratory, Karolinska Institute, Solna, Sweden
2
Buffalo V, Kern AD. A quantitative genetic model of background selection in humans. PLoS Genet 2024; 20:e1011144. [PMID: 38507461 PMCID: PMC10984650 DOI: 10.1371/journal.pgen.1011144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 04/01/2024] [Accepted: 01/19/2024] [Indexed: 03/22/2024] Open
Abstract
Across the human genome, there are large-scale fluctuations in genetic diversity caused by the indirect effects of selection. This "linked selection signal" reflects the impact of selection according to the physical placement of functional regions and recombination rates along chromosomes. Previous work has shown that purifying selection acting against the steady influx of new deleterious mutations at functional portions of the genome shapes patterns of genomic variation. To date, statistical efforts to estimate purifying selection parameters from linked selection models have relied on classic Background Selection theory, which is only applicable when new mutations are so deleterious that they cannot fix in the population. Here, we develop a statistical method based on a quantitative genetics view of linked selection, that models how polygenic additive fitness variance distributed along the genome increases the rate of stochastic allele frequency change. By jointly predicting the equilibrium fitness variance and substitution rate due to both strong and weakly deleterious mutations, we estimate the distribution of fitness effects (DFE) and mutation rate across three geographically distinct human samples. While our model can accommodate weaker selection, we find evidence of strong selection operating similarly across all human samples. Although our quantitative genetic model of linked selection fits better than previous models, substitution rates of the most constrained sites disagree with observed divergence levels. We find that a model incorporating selective interference better predicts observed divergence in conserved regions, but overall our results suggest uncertainty remains about the processes generating fitness variation in humans.
Affiliation(s)
- Vince Buffalo
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
  - Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene, Oregon, United States of America
- Andrew D. Kern
  - Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene, Oregon, United States of America
3
Young RS, Talmane L, Marion de Procé S, Taylor MS. The contribution of evolutionarily volatile promoters to molecular phenotypes and human trait variation. Genome Biol 2022; 23:89. [PMID: 35379293 PMCID: PMC8978360 DOI: 10.1186/s13059-022-02634-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Accepted: 02/16/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Promoters are sites of transcription initiation that harbour a high concentration of phenotype-associated genetic variation. The evolutionary gain and loss of promoters between species (collectively termed turnover) is pervasive across mammalian genomes and may play a prominent role in driving human phenotypic diversity. RESULTS We classified human promoters by their evolutionary history during the divergence of mouse and human lineages from a common ancestor. This defined conserved, human-inserted and mouse-deleted promoters, and a class of functional-turnover promoters that align between species but are only active in humans. We show that promoters of all evolutionary categories are hotspots for substitution and, often, insertion mutations. Loci with a history of insertion and deletion continue that mode of evolution within contemporary humans. The presence of an evolutionarily volatile promoter within a gene is associated with increased expression variance between individuals, but only in the case of human-inserted and mouse-deleted promoters does that correspond to an enrichment of promoter-proximal genetic effects. Despite the enrichment of these molecular quantitative trait loci (QTL) at evolutionarily volatile promoters, this does not translate into a corresponding enrichment of phenotypic traits mapping to these loci. CONCLUSIONS Promoter turnover is pervasive in the human genome, and these promoters are rich in molecularly quantifiable but phenotypically inconsequential variation in gene expression. However, since evolutionarily volatile promoters show evidence of selection, coupled with high mutation rates and enrichment of QTLs, this implicates them as a source of evolutionary innovation and phenotypic variation, albeit with a high background of selectively neutral expression variation.
Affiliation(s)
- Robert S Young
  - Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, UK
  - Zhejiang University - University of Edinburgh Institute, Zhejiang University, 718 East Haizhou Road, 314400, Haining, China
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
- Lana Talmane
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
- Sophie Marion de Procé
  - Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, UK
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
- Martin S Taylor
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
4
Das A, Ganesan H, Sriramulu S, Marotta F, Kanna NRR, Banerjee A, He F, Duttaroy AK, Pathak S. A review on interplay between small RNAs and oxidative stress in cancer progression. Mol Cell Biochem 2021; 476:4117-4131. [PMID: 34292483 DOI: 10.1007/s11010-021-04228-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/19/2020] [Accepted: 07/16/2021] [Indexed: 02/07/2023]
Abstract
Oxidative stress has been known to be the underlying cause in many instances of cancer development. The new aspect of cancer genesis that has caught the attention of many researchers worldwide is its connection to non-coding RNAs (ncRNAs). ncRNAs may not be protein coding, but in light of the more recent discovery of their wide range of functions, the term 'dark matter of the genome' has been rendered inapplicable. There is an extensive mention of colon cancer as an example, where some of these ncRNAs and their manipulations have seen significant progress. As of now, the focus is on discovering a non-invasive, cost-effective method for diagnosis that is easier to monitor and can be conducted before visible symptoms indicate cancer in a patient, by which time it may already be too late. The concept of liquid biopsies has revolutionized recent diagnostic measures. It has been possible to detect circulating parts of the cancer genome or other biomarkers in the patients' bodily fluids, resulting in the effective management of the disease. This has led these ncRNAs to be considered effective therapeutic targets and extrinsic modifications in several tumor types, proven to be effective as therapy. However, there is a vast scope for further understanding and pertinent application of our acquired knowledge and expanding it in enhancing the utilization of ncRNAs for a better prognosis, quicker diagnosis, and improved management of cancer. This review explores the prognosis of cancer and related mutations by scrutinizing small ncRNAs in the disease.
Affiliation(s)
- Aparimita Das
  - Department of Medical Biotechnology, Faculty of Allied Health Sciences, Chettinad Academy of Research and Education (CARE), Chettinad Hospital and Research Institute (CHRI), Kelambakkam, Chennai, 603 103, India
- Harsha Ganesan
  - Department of Medical Biotechnology, Faculty of Allied Health Sciences, Chettinad Academy of Research and Education (CARE), Chettinad Hospital and Research Institute (CHRI), Kelambakkam, Chennai, 603 103, India
- Sushmitha Sriramulu
  - Department of Medical Biotechnology, Faculty of Allied Health Sciences, Chettinad Academy of Research and Education (CARE), Chettinad Hospital and Research Institute (CHRI), Kelambakkam, Chennai, 603 103, India
- Francesco Marotta
  - ReGenera R&D International for Aging Intervention and Vitality & Longevity Medical Science Commission, FEMTEC World Foundation, Milan, Italy
- N R Rajesh Kanna
  - Department of Pathology, Chettinad Academy of Research and Education (CARE), Chettinad Hospital and Research Institute (CHRI), Kelambakkam, Chennai, 603 103, India
- Antara Banerjee
  - Department of Medical Biotechnology, Faculty of Allied Health Sciences, Chettinad Academy of Research and Education (CARE), Chettinad Hospital and Research Institute (CHRI), Kelambakkam, Chennai, 603 103, India
- Fang He
  - West China School of Public Health, Sichuan University, Chengdu, China
- Asim K Duttaroy
  - Department of Nutrition, Institute of Basic Medical Science, Faculty of Medicine, University of Oslo, Oslo, Norway
- Surajit Pathak
  - Department of Medical Biotechnology, Faculty of Allied Health Sciences, Chettinad Academy of Research and Education (CARE), Chettinad Hospital and Research Institute (CHRI), Kelambakkam, Chennai, 603 103, India
5
Galeota-Sprung B, Sniegowski P, Ewens W. Mutational Load and the Functional Fraction of the Human Genome. Genome Biol Evol 2021; 12:273-281. [PMID: 32108234 PMCID: PMC7151545 DOI: 10.1093/gbe/evaa040] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 02/25/2020] [Indexed: 01/30/2023] Open
Abstract
The fraction of the human genome that is functional is a question of both evolutionary and practical importance. Studies of sequence divergence have suggested that the functional fraction of the human genome is likely to be no more than ∼15%. In contrast, the ENCODE project, a systematic effort to map regions of transcription, transcription factor association, chromatin structure, and histone modification, assigned function to 80% of the human genome. In this article, we examine whether and how an analysis based on mutational load might set a limit on the functional fraction. In order to do so, we characterize the distribution of fitness of a large, finite, diploid population at mutation-selection equilibrium. In particular, if mean fitness is ∼1, the fitness of the fittest individual likely to occur cannot be unreasonably high. We find that at equilibrium, the distribution of log fitness has variance nus, where u is the per-base deleterious mutation rate, n is the number of functional sites (and hence incorporates the functional fraction f), and s is the selection coefficient of deleterious mutations. In a large (N = 10^9) reproducing population, the fitness of the fittest individual likely to exist is ∼e^(5√(nus)). These results apply to both additive and recessive fitness schemes. Our approach is different from previous work that compared mean fitness at mutation-selection equilibrium with the fitness of an individual who has no deleterious mutations; we show that such an individual is exceedingly unlikely to exist. We find that the functional fraction is not very likely to be limited substantially by mutational load, and that any such limit, if it exists, depends strongly on the selection coefficients of new deleterious mutations.
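As a rough numerical illustration of the abstract's expression for the fittest individual, read here as exp(5·sqrt(n·u·s)) with mean fitness ∼1, the Python sketch below evaluates it for two functional fractions mentioned in the abstract (15% and 80%). The genome size, per-base deleterious mutation rate, and selection coefficient are hypothetical placeholders chosen only for illustration, not values taken from the article, and the function names are invented for this sketch.

```python
import math


def log_fitness_sd(n_sites: float, u: float, s: float) -> float:
    """Standard deviation of log fitness, assuming variance n*u*s."""
    return math.sqrt(n_sites * u * s)


def fittest_individual_fitness(n_sites: float, u: float, s: float,
                               k: float = 5.0) -> float:
    """Approximate fitness of the fittest individual likely to exist,
    relative to a mean fitness of about 1: exp(k * sd of log fitness).
    k = 5 follows the abstract's expression for N = 10^9."""
    return math.exp(k * log_fitness_sd(n_sites, u, s))


if __name__ == "__main__":
    genome_size = 3.1e9   # hypothetical haploid genome size in bases
    u = 1.0e-8            # hypothetical per-base deleterious mutation rate
    s = 0.01              # hypothetical selection coefficient
    for f in (0.15, 0.80):  # functional fractions discussed in the abstract
        n = f * genome_size  # number of functional sites
        w_max = fittest_individual_fitness(n, u, s)
        print(f"f = {f:.2f}: n*u*s = {n * u * s:.4f}, fittest ~ {w_max:.2f}")
```

Because the exponent grows with the square root of n·u·s, the result is sensitive to the assumed selection coefficient, which is in line with the abstract's remark that any limit on the functional fraction depends strongly on s.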
Affiliation(s)
- Warren Ewens
  - Department of Biology, University of Pennsylvania
6
The rate and molecular spectrum of mutation are selectively maintained in yeast. Nat Commun 2021; 12:4044. [PMID: 34193872 PMCID: PMC8245649 DOI: 10.1038/s41467-021-24364-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/28/2021] [Accepted: 06/10/2021] [Indexed: 12/25/2022] Open
Abstract
What determines the rate (μ) and molecular spectrum of mutation is a fundamental question. The prevailing hypothesis asserts that natural selection against deleterious mutations has pushed μ to the minimum achievable in the presence of genetic drift, or the drift barrier. Here we show that, contrasting this hypothesis, μ substantially exceeds the drift barrier in diverse organisms. Random mutation accumulation (MA) in yeast frequently reduces μ, and deleting the newly discovered mutator gene PSP2 nearly halves μ. These results, along with a comparison between the MA and natural yeast strains, demonstrate that μ is maintained above the drift barrier by stabilizing selection. Similar comparisons show that the mutation spectrum such as the universal AT mutational bias is not intrinsic but has been selectively preserved. These findings blur the separation of mutation from selection as distinct evolutionary forces but open the door to alleviating mutagenesis in various organisms by genome editing. How natural selection shapes the rate and molecular spectrum of mutations is debated. Yeast mutation accumulation experiments identify a gene promoting mutagenesis and show stabilizing selection maintaining the mutation rate above the drift barrier. Selection also preserves the mutation spectrum.
7
Huber CD, Kim BY, Lohmueller KE. Population genetic models of GERP scores suggest pervasive turnover of constrained sites across mammalian evolution. PLoS Genet 2020; 16:e1008827. [PMID: 32469868 PMCID: PMC7286533 DOI: 10.1371/journal.pgen.1008827] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 01/27/2020] [Revised: 06/10/2020] [Accepted: 05/05/2020] [Indexed: 01/20/2023] Open
Abstract
Comparative genomic approaches have been used to identify sites where mutations are under purifying selection and of functional consequence by searching for sequences that are conserved across distantly related species. However, the performance of these approaches has not been rigorously evaluated under population genetic models. Further, short-lived functional elements may not leave a footprint of sequence conservation across many species. We use simulations to study how one measure of conservation, the Genomic Evolutionary Rate Profiling (GERP) score, relates to the strength of selection (N_e s). We show that the GERP score is related to the strength of purifying selection. However, changes in selection coefficients or functional elements over time (i.e. functional turnover) can strongly affect the GERP distribution, leading to unexpected relationships between GERP and N_e s. Further, we show that for functional elements that have a high turnover rate, adding more species to the analysis does not necessarily increase statistical power. Finally, we use the distribution of GERP scores across the human genome to compare models with and without turnover of sites where mutations are under purifying selection. We show that mutations in 4.51% of the noncoding human genome are under purifying selection and that most of this sequence has likely experienced changes in selection coefficients throughout mammalian evolution. Our work reveals limitations to using comparative genomic approaches to identify deleterious mutations. Commonly used GERP score thresholds miss over half of the noncoding sites in the human genome where mutations are under purifying selection.
One of the most significant and challenging tasks in modern genomics is to assess the functional consequences of a particular nucleotide change in a genome. A common approach to address this challenge prioritizes sequences that share similar nucleotides across distantly related species, with the rationale that mutations at such positions were deleterious and removed from the population by purifying natural selection. Our manuscript shows that one popular measure of sequence conservation, the GERP score, performs well at identifying selected mutations if mutations at a site were under selection across all of mammalian evolution. Changes in selection at a given site dramatically reduce the power of GERP to detect selected mutations in humans. We also combine population genetic models with the distribution of GERP scores at noncoding sites across the human genome to show that the degree of selection at individual sites has changed throughout mammalian evolution. Importantly, we demonstrate that at least 80 Mb of noncoding sequence under purifying selection in humans will not have extreme GERP scores and will likely be missed by modern comparative genomic approaches. Our work argues that new approaches, potentially based on genetic variation within species, will be required to identify deleterious mutations.
Affiliation(s)
- Christian D. Huber
  - School of Biological Sciences, University of Adelaide, Adelaide, South Australia, Australia
- Bernard Y. Kim
  - Department of Biology, Stanford University, Stanford, California, United States of America
- Kirk E. Lohmueller
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
8
Woerner AE, Veeramah KR, Watkins JC, Hammer MF. The Role of Phylogenetically Conserved Elements in Shaping Patterns of Human Genomic Diversity. Mol Biol Evol 2020; 35:2284-2295. [PMID: 30113695 DOI: 10.1093/molbev/msy145] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/19/2023] Open
Abstract
Evolutionary genetic studies have shown a positive correlation between levels of nucleotide diversity and either rates of recombination or genetic distance to genes. Both positive-directional and purifying selection have been offered as the source of these correlations via genetic hitchhiking and background selection, respectively. Phylogenetically conserved elements (CEs) are short (∼100 bp), widely distributed (comprising ∼5% of genome), sequences that are often found far from genes. While the function of many CEs is unknown, CEs also are associated with reduced diversity at linked sites. Using high coverage (>80×) whole genome data from two human populations, the Yoruba and the CEU, we perform fine scale evaluations of diversity, rates of recombination, and linkage to genes. We find that the local rate of recombination has a stronger effect on levels of diversity than linkage to genes, and that these effects of recombination persist even in regions far from genes. Our whole genome modeling demonstrates that, rather than recombination or GC-biased gene conversion, selection on sites within or linked to CEs better explains the observed genomic diversity patterns. A major implication is that very few sites in the human genome are predicted to be free of the effects of selection. These sites, which we refer to as the human "neutralome," comprise only 1.2% of the autosomes and 5.1% of the X chromosome. Demographic analysis of the neutralome reveals larger population sizes and lower rates of growth for ancestral human populations than inferred by previous analyses.
Affiliation(s)
- August E Woerner
  - ARL Division of Biotechnology, University of Arizona, Tucson, AZ
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX
- Krishna R Veeramah
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY
- Michael F Hammer
  - ARL Division of Biotechnology, University of Arizona, Tucson, AZ
9
Osipova E, Hecker N, Hiller M. RepeatFiller newly identifies megabases of aligning repetitive sequences and improves annotations of conserved non-exonic elements. Gigascience 2019; 8:giz132. [PMID: 31742600 PMCID: PMC6862929 DOI: 10.1093/gigascience/giz132] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/08/2019] [Revised: 09/10/2019] [Accepted: 10/15/2019] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Transposons and other repetitive sequences make up a large part of complex genomes. Repetitive sequences can be co-opted into a variety of functions and thus provide a source for evolutionary novelty. However, comprehensively detecting ancestral repeats that align between species is difficult because considering all repeat-overlapping seeds in alignment methods that rely on the seed-and-extend heuristic results in prohibitively high runtimes. RESULTS Here, we show that ignoring repeat-overlapping alignment seeds when aligning entire genomes misses numerous alignments between repetitive elements. We present a tool, RepeatFiller, that improves genome alignments by incorporating previously undetected local alignments between repetitive sequences. By applying RepeatFiller to genome alignments between human and 20 other representative mammals, we uncover between 22 and 84 Mb of previously undetected alignments that mostly overlap transposable elements. We further show that the increased alignment coverage improves the annotation of conserved non-exonic elements, both by discovering numerous novel transposon-derived elements that evolve under constraint and by removing thousands of elements that are not under constraint in placental mammals. CONCLUSIONS RepeatFiller contributes to comprehensively aligning repetitive genomic regions, which facilitates studying transposon co-option and genome evolution. Source code: https://github.com/hillerlab/GenomeAlignmentTools.
Affiliation(s)
- Ekaterina Osipova
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01187 Dresden, Germany
  - Center for Systems Biology, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Nikolai Hecker
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01187 Dresden, Germany
  - Center for Systems Biology, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Michael Hiller
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01187 Dresden, Germany
  - Center for Systems Biology, Pfotenhauerstr. 108, 01307 Dresden, Germany
10
Functional conserved non-coding elements among tunicates and chordates. Dev Biol 2019; 448:101-110. [DOI: 10.1016/j.ydbio.2018.12.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/29/2018] [Revised: 12/10/2018] [Accepted: 12/11/2018] [Indexed: 11/22/2022]
11
Genome-wide use of high- and low-affinity Tbrain transcription factor binding sites during echinoderm development. Proc Natl Acad Sci U S A 2018; 114:5854-5861. [PMID: 28584099 DOI: 10.1073/pnas.1610611114] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 11/18/2022] Open
Abstract
Sea stars and sea urchins are model systems for interrogating the types of deep evolutionary changes that have restructured developmental gene regulatory networks (GRNs). Although cis-regulatory DNA evolution is likely the predominant mechanism of change, it was recently shown that Tbrain, a Tbox transcription factor protein, has evolved a changed preference for a low-affinity, secondary binding motif. The primary, high-affinity motif is conserved. To date, however, no genome-wide comparisons have been performed to provide an unbiased assessment of the evolution of GRNs between these taxa, and no study has attempted to determine the interplay between transcription factor binding motif evolution and GRN topology. The study here measures genome-wide binding of Tbrain orthologs by using ChIP-sequencing and associates these orthologs with putative target genes to assess global function. Targets of both factors are enriched for other regulatory genes, although nonoverlapping sets of functional enrichments in the two datasets suggest a much diverged function. The number of low-affinity binding motifs is significantly depressed in sea urchins compared with sea star, but both motif types are associated with genes from a range of functional categories. Only a small fraction (∼10%) of genes are predicted to be orthologous targets. Collectively, these data indicate that Tbr has evolved significantly different developmental roles in these echinoderms and that the targets and the binding motifs in associated cis-regulatory sequences are dispersed throughout the hierarchy of the GRN, rather than being biased toward terminal process or discrete functional blocks, which suggests extensive evolutionary tinkering.
12
Marinov GK, Kundaje A. ChIP-ping the branches of the tree: functional genomics and the evolution of eukaryotic gene regulation. Brief Funct Genomics 2018; 17:116-137. [PMID: 29529131 PMCID: PMC5889016 DOI: 10.1093/bfgp/ely004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/23/2022] Open
Abstract
Advances in the methods for detecting protein-DNA interactions have played a key role in determining the directions of research into the mechanisms of transcriptional regulation. The most recent major technological transformation happened a decade ago, with the move from using tiling arrays [chromatin immunoprecipitation (ChIP)-on-Chip] to high-throughput sequencing (ChIP-seq) as a readout for ChIP assays. In addition to the numerous other ways in which it is superior to arrays, by eliminating the need to design and manufacture them, sequencing also opened the door to carrying out comparative analyses of genome-wide transcription factor occupancy across species and studying chromatin biology in previously less accessible model and nonmodel organisms, thus allowing us to understand the evolution and diversity of regulatory mechanisms in unprecedented detail. Here, we review the biological insights obtained from such studies in recent years and discuss anticipated future developments in the field.
Affiliation(s)
- Georgi K Marinov
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA (corresponding author)
13
Wang Y, Ung MH, Xia T, Cheng W, Cheng C. Cancer cell line specific co-factors modulate the FOXM1 cistrome. Oncotarget 2017; 8:76498-76515. [PMID: 29100329 PMCID: PMC5652723 DOI: 10.18632/oncotarget.20405] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/12/2017] [Accepted: 08/14/2017] [Indexed: 12/11/2022] Open
Abstract
ChIP-seq has been commonly applied to identify genomic occupation of transcription factors (TFs) in a context-specific manner. It is generally assumed that a TF should have similar binding patterns in cells from the same or closely related tissues. Surprisingly, this assumption has not been carefully examined. To this end, we systematically compared the genomic binding of the cell cycle regulator FOXM1 in eight cell lines from seven different human tissues at the levels of binding signal, peaks and target genes. We found that FOXM1 binding patterns in the ER-positive breast cancer cell line MCF-7 are distinct from those not only in non-breast cell lines but also in MDA-MB-231, an ER-negative breast cancer cell line. However, binding sites in MDA-MB-231 and non-breast cell lines were highly consistent. The recruitment of estrogen receptor alpha (ERα) caused the unique FOXM1 binding patterns in MCF-7. Moreover, the activity of FOXM1 in MCF-7 reflects the regulatory functions of ERα, while in MDA-MB-231 and non-breast cell lines, FOXM1 activities regulate cell proliferation. Our results suggest that tissue similarity, in some specific contexts, does not hold precedence over TF-cofactor interactions in determining transcriptional states and that the genomic binding of a TF can be dramatically affected by a particular co-factor under certain conditions.
Affiliation(s)
- Yue Wang
  - School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Matthew H Ung
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Tian Xia
  - School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Wenqing Cheng
  - School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Chao Cheng
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
  - Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH 03766, USA
  - Department of Biomedical Data Sciences, Geisel School of Medicine at Dartmouth, Lebanon, NH 03766, USA

14
Villanueva‐Cañas JL, Rech GE, Cara MAR, González J. Beyond SNPs: how to detect selection on transposable element insertions. Methods Ecol Evol 2017. [DOI: 10.1111/2041-210x.12781] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 11/26/2022]
Affiliation(s)
- Gabriel E. Rech
  - Institute of Evolutionary Biology (CSIC‐Universitat Pompeu Fabra), Barcelona, Spain
- Maria Angeles Rodriguez Cara
  - Ecoanthropology and Ethnobiology Laboratory, UMR 7206, CNRS/MNHN/Universite Paris 7, Museum National d'Histoire Naturelle, F‐75116 Paris, France
- Josefa González
  - Institute of Evolutionary Biology (CSIC‐Universitat Pompeu Fabra), Barcelona, Spain

15
|
Huang YF, Gulko B, Siepel A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet 2017; 49:618-624. [PMID: 28288115 PMCID: PMC5395419 DOI: 10.1038/ng.3810] [Citation(s) in RCA: 221] [Impact Index Per Article: 31.6] [Received: 08/15/2016] [Accepted: 02/13/2017] [Indexed: 12/17/2022]
Abstract
Many genetic variants that influence phenotypes of interest are located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which, therefore, are likely to be phenotypically important. LINSIGHT combines a generalized linear model for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the 'big data' available in modern genomics. We show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. In addition, we apply LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell type, tissue specificity, and constraints at associated promoters.
Affiliation(s)
- Yi-Fei Huang
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Brad Gulko
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
  - Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Adam Siepel
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA

16
Dutoit L, Burri R, Nater A, Mugal CF, Ellegren H. Genomic distribution and estimation of nucleotide diversity in natural populations: perspectives from the collared flycatcher (Ficedula albicollis) genome. Mol Ecol Resour 2016; 17:586-597. [DOI: 10.1111/1755-0998.12602] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 02/18/2016] [Revised: 09/02/2016] [Accepted: 09/19/2016] [Indexed: 12/30/2022]
Affiliation(s)
- Ludovic Dutoit
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
- Reto Burri
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
- Alexander Nater
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
- Carina F. Mugal
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
- Hans Ellegren
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden

17
Yue JX, Kozmikova I, Ono H, Nossa CW, Kozmik Z, Putnam NH, Yu JK, Holland LZ. Conserved Noncoding Elements in the Most Distant Genera of Cephalochordates: The Goldilocks Principle. Genome Biol Evol 2016; 8:2387-405. [PMID: 27412606 PMCID: PMC5010895 DOI: 10.1093/gbe/evw158] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023] Open
Abstract
Cephalochordates, the sister group of vertebrates + tunicates, are evolving particularly slowly. Therefore, genome comparisons between two congeners of Branchiostoma revealed so many conserved noncoding elements (CNEs) that it was not clear how many are functional regulatory elements. To more effectively identify CNEs with potential regulatory functions, we compared noncoding sequences of genomes of the most phylogenetically distant cephalochordate genera, Asymmetron and Branchiostoma, which diverged approximately 120-160 million years ago. We found 113,070 noncoding elements conserved between the two species, amounting to 3.3% of the genome. The genomic distribution, target gene ontology, and enriched motifs of these CNEs all suggest that many of them are probably cis-regulatory elements. More than 90% of previously verified amphioxus regulatory elements were re-captured in this study. A search of the cephalochordate CNEs around 50 developmental genes in several vertebrate genomes revealed eight CNEs conserved between cephalochordates and vertebrates, indicating sequence conservation over >500 million years of divergence. The function of five CNEs was tested in reporter assays in zebrafish, and one was also tested in amphioxus. All five CNEs proved to be tissue-specific enhancers. Taken together, these findings indicate that even though Branchiostoma and Asymmetron are distantly related, because they are evolving slowly, comparisons between them are likely optimal for identifying most of their tissue-specific cis-regulatory elements, laying the foundation for functional characterizations and a better understanding of the evolution of developmental regulation in cephalochordates.
Affiliation(s)
- Jia-Xing Yue
  - Biosciences at Rice, Rice University, Houston, Texas. Present address: Institute for Research on Cancer and Aging, Nice (IRCAN), CNRS UMR 7284, INSERM U1081, Nice 06107, France
- Iryna Kozmikova
  - Department of Transcriptional Regulation, Institute of Molecular Genetics, Prague 14220, Czech Republic
- Hiroki Ono
  - Marine Biology Research Division, Scripps Institution of Oceanography, UC San Diego, La Jolla, California
- Carlos W Nossa
  - Biosciences at Rice, Rice University, Houston, Texas. Present address: Gene by Gene Ltd., Houston, TX 77008
- Zbynek Kozmik
  - Department of Transcriptional Regulation, Institute of Molecular Genetics, Prague 14220, Czech Republic
- Nicholas H Putnam
  - Biosciences at Rice, Rice University, Houston, Texas. Present address: Dovetail Genomics, Santa Cruz, CA 95060
- Jr-Kai Yu
  - Institute of Cellular and Organismic Biology, Academia Sinica, Taipei, Taiwan
- Linda Z Holland
  - Marine Biology Research Division, Scripps Institution of Oceanography, UC San Diego, La Jolla, California

18
Phung TN, Huber CD, Lohmueller KE. Determining the Effect of Natural Selection on Linked Neutral Divergence across Species. PLoS Genet 2016; 12:e1006199. [PMID: 27508305 PMCID: PMC4980041 DOI: 10.1371/journal.pgen.1006199] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 12/11/2015] [Accepted: 06/25/2016] [Indexed: 11/18/2022] Open
Abstract
A major goal in evolutionary biology is to understand how natural selection has shaped patterns of genetic variation across genomes. Studies in a variety of species have shown that neutral genetic diversity (intra-species differences) has been reduced at sites linked to those under direct selection. However, the effect of linked selection on neutral sequence divergence (inter-species differences) remains ambiguous. While empirical studies have reported correlations between divergence and recombination, which are interpreted as evidence for natural selection reducing linked neutral divergence, theory argues otherwise, especially for species that have diverged long ago. Here we address these outstanding issues by examining whether natural selection can affect divergence between both closely and distantly related species. We show that neutral divergence between closely related species (e.g. human-primate) is negatively correlated with functional content and positively correlated with human recombination rate. We also find that neutral divergence between distantly related species (e.g. human-rodent) is negatively correlated with functional content and positively correlated with estimates of background selection from primates. These patterns persist after accounting for the confounding factors of hypermutable CpG sites, GC content, and biased gene conversion. Coalescent models indicate that even when the contribution of ancestral polymorphism to divergence is small, background selection in the ancestral population can still explain a large proportion of the variance in divergence across the genome, generating the observed correlations. Our findings reveal that, contrary to previous intuition, natural selection can indirectly affect linked neutral divergence between both closely and distantly related species. Though we cannot formally exclude the possibility that the direct effects of purifying selection drive some of these patterns, such a scenario would be possible only if more of the genome is under purifying selection than currently believed. Our work has implications for understanding the evolution of genomes and interpreting patterns of genetic variation.

Genetic variation at neutral sites can be reduced through linkage to nearby selected sites. This pattern has been used to show the widespread effects of natural selection in shaping patterns of genetic diversity across genomes from a variety of species. However, it is not entirely clear whether natural selection has an effect on neutral divergence between species. Here we show that putatively neutral divergence between closely related species (human and chimp) and between distantly related pairs of species (humans and mice) shows signatures consistent with having been affected by linkage to selected sites. Further, our theoretical models and simulations show that natural selection indirectly affecting linked neutral sites can generate these patterns. Unless substantially more of the genome is under the direct effects of purifying selection than currently believed, our results argue that natural selection has played an important role in shaping variation in levels of putatively neutral sequence divergence across the genome. Our findings further suggest that divergence-based estimates of neutral mutation rate variation across the genome as well as certain estimators of population history may be confounded by linkage to selected sites.
Affiliation(s)
- Tanya N. Phung
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, United States of America
- Christian D. Huber
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Kirk E. Lohmueller
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, United States of America
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America

19
Costa IR, Prosdocimi F, Jennings WB. In silico phylogenomics using complete genomes: a case study on the evolution of hominoids. Genome Res 2016; 26:1257-67. [PMID: 27435933 PMCID: PMC5052044 DOI: 10.1101/gr.203950.115] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/31/2015] [Accepted: 07/14/2016] [Indexed: 01/30/2023]
Abstract
The increasing availability of complete genome data is facilitating the acquisition of phylogenomic data sets, but the process of obtaining orthologous sequences from other genomes and assembling multiple sequence alignments remains piecemeal and arduous. We designed software that performs these tasks and outputs anonymous loci (AL) or anchored enrichment/ultraconserved element loci (AE/UCE) data sets in ready-to-analyze formats. We demonstrate our program by applying it to the hominoids. Starting with human, chimpanzee, gorilla, and orangutan genomes, our software generated an exhaustive data set of 292 ALs (∼1 kb each) in ∼3 h. Not only did analyses of our AL data set validate the program by yielding a portrait of hominoid evolution in agreement with previous studies, but the accuracy and precision of our estimated ancestral effective population sizes and speciation times represent improvements. We also used our program with a published set of 512 vertebrate-wide AE "probe" sequences to generate data sets consisting of 171 and 242 independent loci (∼1 kb each) in 11 and 13 min, respectively. The former data set consisted of flanking sequences 500 bp from adjacent AEs, while the latter contained sequences bordering AEs. Although our AE data sets produced the expected hominoid species tree, coalescent-based estimates of ancestral population sizes and speciation times based on these data were considerably lower than estimates from our AL data set and previous studies. Accordingly, we suggest that loci subjected to direct or indirect selection may not be appropriate for coalescent-based methods. Complete in silico approaches, combined with the burgeoning genome databases, will accelerate the pace of phylogenomics.
Affiliation(s)
- Igor Rodrigues Costa
  - Laboratório de Genômica e Biodiversidade, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, 21941-902, Brazil
- Francisco Prosdocimi
  - Laboratório de Genômica e Biodiversidade, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, 21941-902, Brazil
- W Bryan Jennings
  - Departamento de Vertebrados, Museu Nacional, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, 20940-040, Brazil

20
Young RS. Lineage-specific genomics: Frequent birth and death in the human genome: The human genome contains many lineage-specific elements created by both sequence and functional turnover. Bioessays 2016; 38:654-63. [PMID: 27231054 PMCID: PMC4949557 DOI: 10.1002/bies.201500192] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
Abstract
Frequent evolutionary birth and death events have created a large quantity of biologically important, lineage‐specific DNA within mammalian genomes. The birth and death of DNA sequences is so frequent that the total number of these insertions and deletions in the human population remains unknown, although there are differences between these groups, e.g. transposable elements contribute predominantly to sequence insertion. Functional turnover – where the activity of a locus is specific to one lineage, but the underlying DNA remains conserved – can also drive birth and death. However, this does not appear to be a major driver of divergent transcriptional regulation. Both sequence and functional turnover have contributed to the birth and death of thousands of functional promoters in the human and mouse genomes. These findings reveal the pervasive nature of evolutionary birth and death and suggest that lineage‐specific regions may play an important but previously underappreciated role in human biology and disease.
Affiliation(s)
- Robert S Young
  - MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Edinburgh, UK

21
Ramachandran P, Palidwor GA, Perkins TJ. BIDCHIPS: bias decomposition and removal from ChIP-seq data clarifies true binding signal and its functional correlates. Epigenetics Chromatin 2015; 8:33. [PMID: 26388941 PMCID: PMC4574076 DOI: 10.1186/s13072-015-0028-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/24/2015] [Accepted: 09/07/2015] [Indexed: 12/24/2022] Open
Abstract
Background: Unraveling transcriptional regulatory networks is a central problem in molecular biology and, in this quest, chromatin immunoprecipitation and sequencing (ChIP-seq) technology has given us the unprecedented ability to identify sites of protein-DNA binding and histone modification genome wide. However, multiple systemic and procedural biases hinder harnessing the full potential of this technology. Previous studies have addressed this problem, but a thorough characterization of different, interacting biases on ChIP-seq signals is still lacking.

Results: Here, we present a novel framework where the genome-wide ChIP-seq signal is viewed as being quantifiably influenced by different, measurable sources of bias, which can then be computationally subtracted away. We use a compendium of 123 human ENCODE ChIP-seq datasets to build regression models that tell us how much of a ChIP-seq signal can be attributed to mappability, GC-content, chromatin accessibility, and factors represented in input DNA and IgG controls. When we use the model to separate out these non-binding influences from the ChIP-seq signal, we obtain a purified signal that associates better to TF-DNA-binding motifs than do other measures of peak significance. We also carry out a multiscale analysis that reveals how ChIP-seq signal biases differ across different scales. Finally, we investigate previously reported associations between gene expression and ChIP-seq signals at transcription start sites. We show that our model can be used to discriminate ChIP-seq signals that are truly related to gene expression from those that are merely correlated by virtue of bias, in particular chromatin accessibility bias, which shows up in ChIP-seq signals and also relates to gene expression.

Conclusions: Our study provides new insights into the behavior of ChIP-seq signal biases and proposes a novel mitigation framework that improves results compared to existing techniques. With ChIP-seq now being the central technology for studying transcriptional regulation, it is most crucial to accurately characterize, quantify, and adjust for the genome-wide effects of biases affecting ChIP-seq. Our study also emphasizes that properly accounting for confounders in ChIP-seq data is of paramount importance for obtaining biologically accurate insights into the workings of the complex regulatory mechanisms in living organisms. R and MATLAB packages implementing the framework can be obtained from http://www.perkinslab.ca/Software.html.
Affiliation(s)
- Parameswaran Ramachandran
  - Regenerative Medicine Program, Ottawa Hospital Research Institute, K1H 8L6 Ottawa, Canada
  - Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, K1H 8M5 Ottawa, Canada
- Gareth A Palidwor
  - Regenerative Medicine Program, Ottawa Hospital Research Institute, K1H 8L6 Ottawa, Canada
- Theodore J Perkins
  - Regenerative Medicine Program, Ottawa Hospital Research Institute, K1H 8L6 Ottawa, Canada
  - Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, K1H 8M5 Ottawa, Canada

22
Ward M, McEwan C, Mills JD, Janitz M. Conservation and tissue-specific transcription patterns of long noncoding RNAs. J Hum Transcr 2015; 1:2-9. [PMID: 27335896 PMCID: PMC4894084 DOI: 10.3109/23324015.2015.1077591] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 05/14/2015] [Accepted: 07/15/2015] [Indexed: 12/31/2022]
Abstract
Over the past decade, the focus of molecular biology has shifted from being predominantly DNA- and protein-centric to having a greater appreciation of RNA. It is now accepted that the genome is pervasively transcribed in a tissue- and cell-specific manner, to produce not only protein-coding RNAs, but also an array of noncoding RNAs (ncRNAs). Many of these ncRNAs have been found to interact with DNA, protein and other RNA molecules where they exert regulatory functions. Long ncRNAs (lncRNAs) are a subclass of ncRNAs that are particularly interesting due to their cell-specific and species-specific expression patterns and unique conservation patterns. Currently, individual lncRNAs have been classified functionally; however, for the vast majority the functional relevance is unknown. To better categorize lncRNAs, an understanding of their specific expression patterns and evolutionary constraints is needed.
Affiliation(s)
- Melanie Ward
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Callum McEwan
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- James D Mills
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Michael Janitz
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia

23
Gittelman RM, Hun E, Ay F, Madeoy J, Pennacchio L, Noble WS, Hawkins RD, Akey JM. Comprehensive identification and analysis of human accelerated regulatory DNA. Genome Res 2015; 25:1245-55. [PMID: 26104583 PMCID: PMC4561485 DOI: 10.1101/gr.192591.115] [Citation(s) in RCA: 74] [Impact Index Per Article: 8.2] [Received: 03/27/2015] [Accepted: 06/15/2015] [Indexed: 01/19/2023]
Abstract
It has long been hypothesized that changes in gene regulation have played an important role in human evolution, but regulatory DNA has been much more difficult to study compared with protein-coding regions. Recent large-scale studies have created genome-scale catalogs of DNase I hypersensitive sites (DHSs), which demark potentially functional regulatory DNA. To better define regulatory DNA that has been subject to human-specific adaptive evolution, we performed comprehensive evolutionary and population genetics analyses on over 18 million DHSs discovered in 130 cell types. We identified 524 DHSs that are conserved in nonhuman primates but accelerated in the human lineage (haDHS), and estimate that 70% of substitutions in haDHSs are attributable to positive selection. Through extensive computational and experimental analyses, we demonstrate that haDHSs are often active in brain or neuronal cell types; play an important role in regulating the expression of developmentally important genes, including many transcription factors such as SOX6, POU3F2, and HOX genes; and identify striking examples of adaptive regulatory evolution that may have contributed to human-specific phenotypes. More generally, our results reveal new insights into conserved and adaptive regulatory DNA in humans and refine the set of genomic substrates that distinguish humans from their closest living primate relatives.
Affiliation(s)
- Rachel M Gittelman
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Enna Hun
  - Division of Medical Genetics, University of Washington, Seattle, Washington 98195, USA
- Ferhat Ay
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Jennifer Madeoy
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Len Pennacchio
  - Lawrence Berkeley National Laboratory, Genomics Division, Berkeley, California 94701, USA
- William S Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- R David Hawkins
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
  - Division of Medical Genetics, University of Washington, Seattle, Washington 98195, USA
- Joshua M Akey
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA

24
A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data. Sci Rep 2015; 5:10576. [PMID: 26015273 PMCID: PMC4444969 DOI: 10.1038/srep10576] [Citation(s) in RCA: 112] [Impact Index Per Article: 12.4] [Received: 12/16/2014] [Accepted: 04/20/2015] [Indexed: 12/16/2022] Open
Abstract
Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations, thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. Its ability to predict functional regions, together with its generalizable statistical framework, makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu

25
Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution. BMC Genomics 2015; 16:87. [PMID: 25765714 PMCID: PMC4333152 DOI: 10.1186/s12864-015-1245-6] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 10/08/2014] [Accepted: 01/15/2015] [Indexed: 11/29/2022] Open
Abstract
Background: Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, relationships among sequence, conservation, and function are still poorly understood.

Results: We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence on the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity on the other species for human and mouse respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of repurposed TFos, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest exaptation of some functional regulatory sequences into new function. Despite TFos repurposing, we did not find substantial changes in their predicted target genes, suggesting that CRMs buffer evolutionary events allowing little or no change in the TFos – target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions.

Conclusion: We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse using WGA. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed on the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.

26
Cheatle Jarvela AM, Hinman VF. Evolution of transcription factor function as a mechanism for changing metazoan developmental gene regulatory networks. EvoDevo 2015; 6:3. [PMID: 25685316 PMCID: PMC4327956 DOI: 10.1186/2041-9139-6-3] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 11/05/2014] [Accepted: 12/18/2014] [Indexed: 11/10/2022] Open
Abstract
The form that an animal takes during development is directed by gene regulatory networks (GRNs). Developmental GRNs interpret maternally deposited molecules and externally supplied signals to direct cell-fate decisions, which ultimately leads to the arrangements of organs and tissues in the organism. Genetically encoded modifications to these networks have generated the wide range of metazoan diversity that exists today. Most studies of GRN evolution focus on changes to cis-regulatory DNA, and it was historically theorized that changes to the transcription factors that bind to these cis-regulatory modules (CRMs) contribute to this process only rarely. A growing body of evidence suggests that changes to the coding regions of transcription factors play a much larger role in the evolution of developmental gene regulatory networks than originally imagined. Just as cis-regulatory changes make use of modular binding site composition and tissue-specific modules to avoid pleiotropy, transcription factor coding regions also predominantly evolve in ways that limit the context of functional effects. Here, we review the recent works that have led to this unexpected change in the field of Evolution and Development (Evo-Devo) and consider the implications these studies have had on our understanding of the evolution of developmental processes.
Affiliation(s)
- Alys M Cheatle Jarvela
- Department of Biological Sciences, Carnegie Mellon University, 4400 5th Ave, Pittsburgh, PA 15213 USA
- Veronica F Hinman
- Department of Biological Sciences, Carnegie Mellon University, 4400 5th Ave, Pittsburgh, PA 15213 USA
|
27
|
Gulko B, Hubisz MJ, Gronau I, Siepel A. A method for calculating probabilities of fitness consequences for point mutations across the human genome. Nat Genet 2015; 47:276-83. [PMID: 25599402 PMCID: PMC4342276 DOI: 10.1038/ng.3196] [Citation(s) in RCA: 181] [Impact Index Per Article: 20.1] [Received: 08/13/2014] [Accepted: 12/19/2014] [Indexed: 12/17/2022]
Abstract
We describe a novel computational method for estimating the probability that a point mutation at each position in a genome will influence fitness. These fitness consequence (fitCons) scores serve as evolution-based measures of potential genomic function. Our approach is to cluster genomic positions into groups exhibiting distinct “fingerprints” based on high-throughput functional genomic data, then to estimate a probability of fitness consequences for each group from associated patterns of genetic polymorphism and divergence. We have generated fitCons scores for three human cell types based on public data from ENCODE. Compared with conventional conservation scores, fitCons scores show considerably improved prediction power for cis-regulatory elements. In addition, fitCons scores indicate that 4.2–7.5% of nucleotides in the human genome have influenced fitness since the human-chimpanzee divergence, and they suggest that recent evolutionary turnover has had limited impact on the functional content of the genome.
Affiliation(s)
- Brad Gulko
- Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Melissa J Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Adam Siepel
- [1] Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA. [2] Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
|
28
|
Ryan NM, Morris SW, Porteous DJ, Taylor MS, Evans KL. SuRFing the genomics wave: an R package for prioritising SNPs by functionality. Genome Med 2014; 6:79. [PMID: 25400697 PMCID: PMC4224693 DOI: 10.1186/s13073-014-0079-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 06/20/2014] [Accepted: 09/26/2014] [Indexed: 12/16/2022] Open
Abstract
Identifying functional non-coding variants is one of the greatest unmet challenges in genetics. To help address this, we introduce an R package, SuRFR, which integrates functional annotation and prior biological knowledge to prioritise candidate functional variants. SuRFR is publicly available, modular, flexible, fast, and simple to use. We demonstrate that SuRFR performs with high sensitivity and specificity and provide a widely applicable and scalable benchmarking dataset for model training and validation. Website: http://www.cgem.ed.ac.uk/resources/
Affiliation(s)
- Niamh M Ryan
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU UK
- Stewart W Morris
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU UK
- David J Porteous
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU UK; Centre for Cognitive Ageing and Cognitive Epidemiology, The University of Edinburgh, 7 George Square, Edinburgh, EH8 9JZ UK
- Martin S Taylor
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU UK
- Kathryn L Evans
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU UK; Centre for Cognitive Ageing and Cognitive Epidemiology, The University of Edinburgh, 7 George Square, Edinburgh, EH8 9JZ UK
|
29
|
Babarinde IA, Saitou N. Heterogeneous tempo and mode of conserved noncoding sequence evolution among four mammalian orders. Genome Biol Evol 2014; 5:2330-43. [PMID: 24259317 PMCID: PMC3879966 DOI: 10.1093/gbe/evt177] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 12/12/2022] Open
Abstract
Conserved noncoding sequences (CNSs) of vertebrates are considered to be closely linked with protein-coding gene regulatory functions. We examined the abundance and genomic distribution of CNSs in four mammalian orders: primates, rodents, carnivores, and cetartiodactyls. We defined two thresholds for CNSs based on the conservation level of coding genes: one using all three codon positions and one using only the first and second codon positions. The abundance of CNSs varied among lineages, with primates and rodents having the highest and lowest numbers of CNSs, respectively, whereas carnivores and cetartiodactyls had intermediate values. These CNSs cover 1.3-5.5% of the mammalian genomes and have signatures of selective constraint that are stronger in more ancestral CNSs than in more recent ones. Evolution of new CNSs as well as retention of ancestral CNSs contribute to the differences in abundance. The genomic distribution of CNSs is dynamic, with higher proportions of rodent and primate CNSs located in introns compared with carnivores and cetartiodactyls. In fact, 19% of orthologous single-copy CNSs between human and dog are located in different genomic regions. If CNSs can be considered as candidates of gene expression regulatory sequences, heterogeneity of CNSs among the four mammalian orders may have played an important role in creating the order-specific phenotypes. Fewer CNSs in rodents suggest that rodent diversity is related to lower regulatory conservation. With CNSs shown to cluster around genes involved in nervous systems and the higher number of primate CNSs, our result suggests that CNSs may be involved in the higher complexity of the primate nervous system.
Affiliation(s)
- Isaac Adeyemi Babarinde
- Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima Japan
|
30
|
8.2% of the Human genome is constrained: variation in rates of turnover across functional element classes in the human lineage. PLoS Genet 2014; 10:e1004525. [PMID: 25057982 PMCID: PMC4109858 DOI: 10.1371/journal.pgen.1004525] [Citation(s) in RCA: 133] [Impact Index Per Article: 13.3] [Received: 12/04/2013] [Accepted: 06/05/2014] [Indexed: 01/27/2023] Open
Abstract
Ten years on from the finishing of the human reference genome sequence, it remains unclear what fraction of the human genome confers function, where this sequence resides, and how much is shared with other mammalian species. When addressing these questions, functional sequence has often been equated with pan-mammalian conserved sequence. However, functional elements that are short-lived, including those contributing to species-specific biology, will not leave a footprint of long-lasting negative selection. Here, we address these issues by identifying and characterising sequence that has been constrained with respect to insertions and deletions for pairs of eutherian genomes over a range of divergences. Within noncoding sequence, we find increasing amounts of mutually constrained sequence as species pairs become more closely related, indicating that noncoding constrained sequence turns over rapidly. We estimate that half of present-day noncoding constrained sequence has been gained or lost in approximately the last 130 million years (half-life in units of divergence time, d1/2 = 0.25–0.31). While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequences we identify are not detected by models optimized for wider pan-mammalian conservation. Constrained DNase 1 hypersensitivity sites, promoters and untranslated regions have been more evolutionarily stable than long noncoding RNA loci which have turned over especially rapidly. By contrast, protein coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1–5.0). From extrapolations we estimate that 8.2% (7.1–9.2%) of the human genome is presently subject to negative selection and thus is likely to be functional, while only 2.2% has maintained constraint in both human and mouse since these species diverged. 
These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction. Nearly 99% of the human genome does not encode proteins, and while there recently has been extensive biochemical annotation of the remaining noncoding fraction, it remains unclear whether or not the bulk of these DNA sequences have important functional roles. By comparing the genome sequences of different species we identify genomic regions that have evolved unexpectedly slowly, a signature of natural selection upon functional sequence. Using a high-resolution evolutionary approach to find sequence showing evolutionary signatures of functionality, we estimate that a total of 8.2% (7.1–9.2%) of the human genome is presently functional, more than three times as much as is functional and shared between human and mouse. This implies that there is an abundance of sequences with short-lived lineage-specific functionality. As expected, most of the sequence involved in this functional “turnover” is noncoding, while protein coding sequence is stably preserved over longer evolutionary timescales. More generally, we find that the rate of functional turnover varies significantly across categories of functional noncoding elements. Our results provide a pan-mammalian and whole genome perspective on how rapidly different classes of sequence have gained and lost functionality down the human lineage.
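The half-life figures quoted in this abstract can be made concrete with a small calculation. The sketch below assumes a simple exponential turnover model, which is only what the term "half-life" suggests and is not necessarily the model fitted in the paper; the function name `fraction_retained` is hypothetical.

```python
def fraction_retained(d, d_half):
    """Fraction of constrained sequence still shared after divergence d.

    Assumes exponential turnover with half-life d_half, both in the same
    units of divergence. This is an illustrative assumption, not the
    fitted model from the paper.
    """
    return 0.5 ** (d / d_half)


# Half-life values quoted in the abstract (units of divergence).
noncoding_half_life = 0.25   # lower bound of d1/2 = 0.25-0.31
coding_half_life = 2.1       # lower bound of d1/2 = 2.1-5.0

d = 0.25  # a divergence equal to one noncoding half-life
print(fraction_retained(d, noncoding_half_life))  # 0.5 by definition
print(fraction_retained(d, coding_half_life))     # roughly 0.92

# Ratio behind "more than three times as much": 8.2% versus 2.2%.
print(8.2 / 2.2)
```

Under this assumed model, a divergence that halves the shared noncoding constrained sequence leaves more than nine tenths of constrained coding sequence in place, which matches the contrast the abstract draws.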
|
31
|
del Rosario RCH, Rayan NA, Prabhakar S. Noncoding origins of anthropoid traits and a new null model of transposon functionalization. Genome Res 2014; 24:1469-84. [PMID: 25043600 PMCID: PMC4158753 DOI: 10.1101/gr.168963.113] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/24/2022]
Abstract
Little is known about novel genetic elements that drove the emergence of anthropoid primates. We exploited the sequencing of the marmoset genome to identify 23,849 anthropoid-specific constrained (ASC) regions and confirmed their robust functional signatures. Of the ASC base pairs, 99.7% were noncoding, suggesting that novel anthropoid functional elements were overwhelmingly cis-regulatory. ASCs were highly enriched in loci associated with fetal brain development, motor coordination, neurotransmission, and vision, thus providing a large set of candidate elements for exploring the molecular basis of hallmark primate traits. We validated ASC192 as a primate-specific enhancer in proliferative zones of the developing brain. Unexpectedly, transposable elements (TEs) contributed to >56% of ASCs, and almost all TE families showed functional potential similar to that of nonrepetitive DNA. Three L1PA repeat-derived ASCs displayed coherent eye-enhancer function, thus demonstrating that the "gene-battery" model of TE functionalization applies to enhancers in vivo. Our study provides fundamental insights into genome evolution and the origins of anthropoid phenotypes and supports an elegantly simple new null model of TE exaptation.
Affiliation(s)
- Ricardo C H del Rosario
- Computational and Systems Biology, Genome Institute of Singapore, #02-01 Genome, Singapore 138672
- Nirmala Arul Rayan
- Computational and Systems Biology, Genome Institute of Singapore, #02-01 Genome, Singapore 138672
- Shyam Prabhakar
- Computational and Systems Biology, Genome Institute of Singapore, #02-01 Genome, Singapore 138672
|
32
|
Abstract
Evolutionary conservation has been an accurate predictor of functional elements across the first decade of metazoan genomics. More recently, there has been a move to define functional elements instead from biochemical annotations. Evolutionary methods are, however, more comprehensive than biochemical approaches can be and can assess quantitatively, especially for subtle effects, how biologically important (how injurious after mutation) different types of elements are. Evolutionary methods are thus critical for understanding the large fraction (up to 10%) of the human genome that does not encode proteins and yet might convey function. These methods can also capture the ephemeral nature of much noncoding functional sequence, with large numbers of functional elements having been gained and lost rapidly along each mammalian lineage. Here, we review how different strengths of purifying selection have impacted on protein-coding and non-protein-coding loci and on transcription factor binding sites in mammalian and fruit fly genomes.
Affiliation(s)
- Wilfried Haerty
- MRC Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
|
33
|
Abstract
With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.
|
34
|
Abstract
The ability to sequence genomes and characterize their products has begun to reveal the central role for regulatory RNAs in biology, especially in complex organisms. It is now evident that the human genome contains not only protein-coding genes, but also tens of thousands of non-protein coding genes that express small and long ncRNAs (non-coding RNAs). Rapid progress in characterizing these ncRNAs has identified a diverse range of subclasses, which vary widely in size, sequence and mechanism-of-action, but share a common functional theme of regulating gene expression. ncRNAs play a crucial role in many cellular pathways, including the differentiation and development of cells and organs and, when mis-regulated, in a number of diseases. Increasing evidence suggests that these RNAs are a major area of evolutionary innovation and play an important role in determining phenotypic diversity in animals.
|
35
|
Genome-wide analysis of promoters: clustering by alignment and analysis of regular patterns. PLoS One 2014; 9:e85260. [PMID: 24465517 PMCID: PMC3898993 DOI: 10.1371/journal.pone.0085260] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/19/2013] [Accepted: 11/26/2013] [Indexed: 01/08/2023] Open
Abstract
In this paper we perform a genome-wide analysis of H. sapiens promoters. To this aim, we developed and combined two mathematical methods that allow us to (i) classify promoters into groups characterized by specific global structural features, and (ii) recover, in full generality, any regular sequence in the different classes of promoters. One of the main findings of this analysis is that H. sapiens promoters can be classified into three main groups. Two of them are distinguished by the prevalence of weak or strong nucleotides and are characterized by short compositionally biased sequences, while the most frequent regular sequences in the third group are strongly correlated with transposons. Taking advantage of the generality of these mathematical procedures, we have compared the promoter database of H. sapiens with those of other species. We have found that the above-mentioned features characterize also the evolutionary content appearing in mammalian promoters, at variance with ancestral species in the phylogenetic tree, that exhibit a definitely lower level of differentiation among promoters.
|
36
|
Bassett AR, Liu JL. CRISPR/Cas9 and genome editing in Drosophila. J Genet Genomics 2013; 41:7-19. [PMID: 24480743 DOI: 10.1016/j.jgg.2013.12.004] [Citation(s) in RCA: 139] [Impact Index Per Article: 12.6] [Received: 11/25/2013] [Revised: 12/10/2013] [Accepted: 12/11/2013] [Indexed: 12/26/2022]
Abstract
Recent advances in our ability to design DNA binding factors with specificity for desired sequences have resulted in a revolution in genetic engineering, enabling directed changes to the genome to be made relatively easily. Traditional techniques for generating genetic mutations in most organisms have relied on selection from large pools of randomly induced mutations for those of particular interest, or time-consuming gene targeting by homologous recombination. Drosophila melanogaster has always been at the forefront of genetic analysis, and application of these new genome editing techniques to this organism will revolutionise our approach to performing analysis of gene function in the future. We discuss the recent techniques that apply the CRISPR/Cas9 system to Drosophila, highlight potential uses for this technology and speculate upon the future of genome engineering in this model organism.
Affiliation(s)
- Andrew R Bassett
- MRC Functional Genomics Unit, University of Oxford, Department of Physiology, Anatomy and Genetics, South Parks Road, Oxford OX1 3QX, United Kingdom.
- Ji-Long Liu
- MRC Functional Genomics Unit, University of Oxford, Department of Physiology, Anatomy and Genetics, South Parks Road, Oxford OX1 3QX, United Kingdom.
|
37
|
Abrusán G. Integration of new genes into cellular networks, and their structural maturation. Genetics 2013; 195:1407-17. [PMID: 24056411 PMCID: PMC3832282 DOI: 10.1534/genetics.113.152256] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 04/15/2013] [Accepted: 08/27/2013] [Indexed: 12/21/2022] Open
Abstract
It has been recently discovered that new genes can originate de novo from noncoding DNA, and several biological traits including expression or sequence composition form a continuum from noncoding sequences to conserved genes. In this article, using yeast genes I test whether the integration of new genes into cellular networks and their structural maturation shows such a continuum by analyzing their changes with gene age. I show that 1) The number of regulatory, protein-protein, and genetic interactions increases continuously with gene age, although with very different rates. New regulatory interactions emerge rapidly within a few million years, while the number of protein-protein and genetic interactions increases slowly, with a rate of 2–2.25 × 10⁻⁸/year and 4.8 × 10⁻⁸/year, respectively. 2) Gene essentiality evolves relatively quickly: the youngest essential genes appear in proto-genes ∼14 MY old. 3) In contrast to interactions, the secondary structure of proteins and their robustness to mutations indicate that new genes face a bottleneck in their evolution: proto-genes are characterized by high β-strand content, high aggregation propensity, and low robustness against mutations, while conserved genes are characterized by lower strand content and higher stability, most likely due to the higher probability of gene loss among young genes and accumulation of neutral mutations.
Affiliation(s)
- György Abrusán
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged H-6701, Hungary
|
38
|
Abstract
Antisense transcription, which was initially considered by many as transcriptional noise, is increasingly being recognized as an important regulator of gene expression. It is widespread among all kingdoms of life and has been shown to influence - either through the act of transcription or through the non-coding RNA that is produced - almost all stages of gene expression, from transcription and translation to RNA degradation. Antisense transcription can function as a fast evolving regulatory switch and a modular scaffold for protein complexes, and it can 'rewire' regulatory networks. The genomic arrangement of antisense RNAs opposite sense genes indicates that they might be part of self-regulatory circuits that allow genes to regulate their own expression.
|
39
|
Harmston N, Baresic A, Lenhard B. The mystery of extreme non-coding conservation. Philos Trans R Soc Lond B Biol Sci 2013; 368:20130021. [PMID: 24218634 PMCID: PMC3826495 DOI: 10.1098/rstb.2013.0021] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 12/12/2022] Open
Abstract
Regions of several dozen to several hundred base pairs of extreme conservation have been found in non-coding regions in all metazoan genomes. The distribution of these elements within and across genomes has suggested that many have roles as transcriptional regulatory elements in multi-cellular organization, differentiation and development. Currently, there is no known mechanism or function that would account for this level of conservation at the observed evolutionary distances. Previous studies have found that, while these regions are under strong purifying selection, and not mutational coldspots, deletion of entire regions in mice does not necessarily lead to identifiable changes in phenotype during development. These opposing findings lead to several questions regarding their functional importance and why they are under strong selection in the first place. In this perspective, we discuss the methods and techniques used in identifying and dissecting these regions, their observed patterns of conservation, and review the current hypotheses on their functional significance.
Affiliation(s)
- Nathan Harmston
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London and MRC Clinical Sciences Centre, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
|
40
|
Abstract
An enduring goal of evolutionary biology is to understand how natural selection has shaped patterns of polymorphism and divergence within and between species and to map the genetic basis of adaptations. The rapid maturation of next-generation sequencing technology has generated a deluge of genomics data from nonhuman primates, extinct hominins, and diverse human populations. These emerging genome data sets have simultaneously broadened our understanding of human evolution and sharply defined existing gaps in knowledge about the mechanistic basis of evolutionary change. In this review, we summarize recent insights into how natural selection has influenced the human genome across different timescales. Although the path to a more comprehensive understanding of selection and adaptation in humans remains arduous, some general insights are beginning to emerge, such as the importance of adaptive regulatory evolution, the absence of pervasive classic selective sweeps, and the potential roles that selection from standing variation and polygenic adaptation have likely played in recent human evolutionary history.
Affiliation(s)
- Wenqing Fu
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065
|
41
|
Behnam E, Waterman MS, Smith AD. A geometric interpretation for local alignment-free sequence comparison. J Comput Biol 2013; 20:471-85. [PMID: 23829649 PMCID: PMC3704055 DOI: 10.1089/cmb.2012.0280] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
Abstract
Local alignment-free sequence comparison arises in the context of identifying similar segments of sequences that may not be alignable in the traditional sense. We propose a randomized approximation algorithm that is both accurate and efficient. We show that under D2 and its important variant [Formula: see text] as the similarity measure, local alignment-free comparison between a pair of sequences can be formulated as the problem of finding the maximum bichromatic dot product between two sets of points in high dimensions. We introduce a geometric framework that reduces this problem to that of finding the bichromatic closest pair (BCP), allowing the properties of the underlying metric to be leveraged. Local alignment-free sequence comparison can be solved by making a quadratic number of alignment-free substring comparisons. We show both theoretically and through empirical results on simulated data that our approximation algorithm requires a subquadratic number of such comparisons and trades only a small amount of accuracy to achieve this efficiency. Therefore, our algorithm can extend the current usage of alignment-free-based methods and can also be regarded as a substitute for local alignment algorithms in many biological studies.
Affiliation(s)
- Ehsan Behnam
- Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089-2910, USA
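The quadratic baseline mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the standard D2 statistic (the dot product of two k-mer count vectors) and of a brute-force local search over fixed-length windows. It is not the randomized approximation algorithm the authors propose, and the function names and the fixed `window` parameter are assumptions made for the example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """D2 statistic: dot product of the k-mer count vectors of a and b."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * counts_b[w] for w, n in counts_a.items())


def local_d2_bruteforce(a, b, k, window):
    """Highest-scoring pair of length-`window` substrings under D2.

    Compares every window of a with every window of b, so the number of
    comparisons is quadratic. Returns (score, start_in_a, start_in_b);
    ties keep the first pair found.
    """
    best = (-1, 0, 0)
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            score = d2(a[i:i + window], b[j:j + window], k)
            if score > best[0]:
                best = (score, i, j)
    return best


print(d2("ACGT", "ACGT", 2))
print(local_d2_bruteforce("TTTTACGTAC", "GGACGTACGG", k=2, window=6))
```

The nested loops are the quadratic number of substring comparisons that, according to the abstract, the authors' geometric formulation is designed to avoid.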
|
42
|
de Souza FS, Franchini LF, Rubinstein M. Exaptation of transposable elements into novel cis-regulatory elements: is the evidence always strong? Mol Biol Evol 2013; 30:1239-51. [PMID: 23486611 PMCID: PMC3649676 DOI: 10.1093/molbev/mst045] [Citation(s) in RCA: 117] [Impact Index Per Article: 10.6] [Indexed: 12/25/2022] Open
Abstract
Transposable elements (TEs) are mobile genetic sequences that can jump around the genome from one location to another, behaving as genomic parasites. TEs have been particularly effective in colonizing mammalian genomes, and such heavy TE load is expected to have conditioned genome evolution. Indeed, studies conducted both at the gene and genome levels have uncovered TE insertions that seem to have been co-opted (or exapted) by providing transcription factor binding sites (TFBSs) that serve as promoters and enhancers, leading to the hypothesis that TE exaptation is a major factor in the evolution of gene regulation. Here, we critically review the evidence for exaptation of TE-derived sequences as TFBSs, promoters, enhancers, and silencers/insulators both at the gene and genome levels. We classify the functional impact attributed to TE insertions into four categories of increasing complexity and argue that so far very few studies have conclusively demonstrated exaptation of TEs as transcriptional regulatory regions. We also contend that many genome-wide studies dealing with TE exaptation in recent lineages of mammals are still inconclusive and that the hypothesis of rapid transcriptional regulatory rewiring mediated by TE mobilization must be taken with caution. Finally, we suggest experimental approaches that may help attribute higher-order functions to candidate exapted TEs.
Affiliation(s)
- Flávio S.J. de Souza
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Departamento de Fisiología, Biología Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- Lucía F. Franchini
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Marcelo Rubinstein
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Departamento de Fisiología, Biología Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
|
43
|
Haerty W, Ponting CP. Mutations within lncRNAs are effectively selected against in fruitfly but not in human. Genome Biol 2013; 14:R49. [PMID: 23710818 PMCID: PMC4053968 DOI: 10.1186/gb-2013-14-5-r49] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 02/25/2013] [Accepted: 05/27/2013] [Indexed: 02/07/2023] Open
Abstract
Background: Previous studies in Drosophila and mammals have revealed levels of long non-coding RNA (lncRNA) sequence conservation that are intermediate between those of neutrally evolving and protein-coding sequence. These analyses compared conservation between species that diverged up to 75 million years ago. However, analysis of sequence polymorphisms within a species' population can provide an understanding of essentially contemporaneous selective constraints that are acting on lncRNAs and can quantify the deleterious effect of mutations occurring within these loci. Results: We took advantage of polymorphisms derived from the genome sequences of 163 Drosophila melanogaster strains and 174 human individuals to calculate the distribution of fitness effects of single nucleotide polymorphisms occurring within intergenic lncRNAs and compared this to distributions for SNPs present within putatively neutral or protein-coding sequences. Our observations show that in D. melanogaster there is a significant excess of rare frequency variants within intergenic lncRNAs relative to neutrally evolving sequences, whereas selection on human intergenic lncRNAs appears to be effectively neutral. Approximately 30% of mutations within these fruitfly lncRNAs are estimated as being weakly deleterious. Conclusions: These contrasting results can be attributed to the large difference in effective population sizes between the two species. Our results suggest that while the sequences of lncRNAs will be well conserved across insect species, such loci in mammals will accumulate greater proportions of deleterious changes through genetic drift.
|
44
|
Ward LD, Kellis M. Response to comment on "Evidence of abundant purifying selection in humans for recently acquired regulatory functions". Science 2013; 340:682. [PMID: 23661743 DOI: 10.1126/science.1233366] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022]
Abstract
Green and Ewing propose corrections to our methodology, which we incorporate and extend here. The improved methodology supports our initial conclusion of extensive lineage-specific constraint concentrated in ENCODE elements. We clarify that our estimate is dependent on the constrained and neutral references used, which can further increase the number of nucleotides involved, because a particularly stringent definition was initially used.
Affiliation(s)
- Lucas D Ward
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA
|
45
|
Rands CM, Darling A, Fujita M, Kong L, Webster MT, Clabaut C, Emes RD, Heger A, Meader S, Hawkins MB, Eisen MB, Teiling C, Affourtit J, Boese B, Grant PR, Grant BR, Eisen JA, Abzhanov A, Ponting CP. Insights into the evolution of Darwin's finches from comparative analysis of the Geospiza magnirostris genome sequence. BMC Genomics 2013; 14:95. [PMID: 23402223 PMCID: PMC3575239 DOI: 10.1186/1471-2164-14-95] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 10/19/2012] [Accepted: 01/23/2013] [Indexed: 01/01/2023] Open
Abstract
Background: A classical example of repeated speciation coupled with ecological diversification is the evolution of 14 closely related species of Darwin's (Galápagos) finches (Thraupidae, Passeriformes). Their adaptive radiation in the Galápagos archipelago took place in the last 2–3 million years and some of the molecular mechanisms that led to their diversification are now being elucidated. Here we report evolutionary analyses of the genome of the large ground finch, Geospiza magnirostris. Results: 13,291 protein-coding genes were predicted from a 991.0 Mb G. magnirostris genome assembly. We then defined gene orthology relationships and constructed whole genome alignments between the G. magnirostris and other vertebrate genomes. We estimate that 15% of genomic sequence is functionally constrained between G. magnirostris and zebra finch. Genic evolutionary rate comparisons indicate that similar selective pressures acted along the G. magnirostris and zebra finch lineages, suggesting that historical effective population size values have been similar in both lineages. 21 otherwise highly conserved genes were identified that each show evidence for positive selection on amino acid changes in the Darwin's finch lineage. Two of these genes (Igf2r and Pou1f1) have been implicated in beak morphology changes in Darwin's finches. Five of 47 genes showing evidence of positive selection in early passerine evolution have cilia-related functions, and may be examples of adaptively evolving reproductive proteins. Conclusions: These results provide insights into past evolutionary processes that have shaped G. magnirostris genes and its genome, and provide the necessary foundation upon which to build population genomics resources that will shed light on more contemporaneous adaptive and non-adaptive processes that have contributed to the evolution of the Darwin's finches.
Affiliation(s)
- Chris M Rands
- Department of Physiology, Anatomy, and Genetics, MRC Functional Genomics Unit, University of Oxford, Oxford, OX1 3PT, UK
|
46
|
Vernot B, Stergachis AB, Maurano MT, Vierstra J, Neph S, Thurman RE, Stamatoyannopoulos JA, Akey JM. Personal and population genomics of human regulatory variation. Genome Res 2012; 22:1689-97. [PMID: 22955981 PMCID: PMC3431486 DOI: 10.1101/gr.134890.111] [Citation(s) in RCA: 91] [Impact Index Per Article: 8.3] [Indexed: 11/24/2022]
Abstract
The characteristics of, and evolutionary forces acting on, regulatory variation in humans remain elusive because of the difficulty in defining functionally important noncoding DNA. Here, we combine genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (DHSs) from 138 cell and tissue types with whole-genome sequences of 53 geographically diverse individuals in order to better delimit the patterns of regulatory variation in humans. We estimate that individuals likely harbor many more functionally important variants in regulatory DNA compared with protein-coding regions, although they are likely to have, on average, smaller effect sizes. Moreover, we demonstrate that there is significant heterogeneity in the level of functional constraint in regulatory DNA among different cell types. We also find marked variability in functional constraint among transcription factor motifs in regulatory DNA, with sequence motifs for major developmental regulators, such as HOX proteins, exhibiting levels of constraint comparable to protein-coding regions. Finally, we perform a genome-wide scan of recent positive selection and identify hundreds of novel substrates of adaptive regulatory evolution that are enriched for biologically interesting pathways such as melanogenesis and adipocytokine signaling. These data and results provide new insights into patterns of regulatory variation in individuals and populations and demonstrate that a large proportion of functionally important variation lies beyond the exome.
Affiliation(s)
- Benjamin Vernot
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
47
|
Graur D, Zheng Y, Price N, Azevedo RBR, Zufall RA, Elhaik E. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol 2013; 5:578-90. [PMID: 23431001 PMCID: PMC3622293 DOI: 10.1093/gbe/evt028] [Citation(s) in RCA: 302] [Impact Index Per Article: 27.5] [Accepted: 02/16/2013] [Indexed: 12/11/2022] Open
Abstract
A recent slew of ENCyclopedia Of DNA Elements (ENCODE) Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is less than 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 - 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these "functional" regions or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly by employing the seldom used "causal role" definition of biological function and then applying it inconsistently to different biochemical properties, by committing a logical fallacy known as "affirming the consequent," by failing to appreciate the crucial difference between "junk DNA" and "garbage DNA," by using analytical methods that yield biased errors and inflate estimates of functionality, by favoring statistical sensitivity over specificity, and by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.
Affiliation(s)
- Dan Graur
- Department of Biology and Biochemistry, University of Houston, TX, USA.
|
48
|
Ward LD, Kellis M. Interpreting noncoding genetic variation in complex traits and human disease. Nat Biotechnol 2012; 30:1095-106. [PMID: 23138309 PMCID: PMC3703467 DOI: 10.1038/nbt.2422] [Citation(s) in RCA: 340] [Impact Index Per Article: 28.3] [Received: 09/14/2012] [Accepted: 10/16/2012] [Indexed: 12/13/2022]
Abstract
Association studies provide genome-wide information about the genetic basis of complex disease, but medical research has primarily focused on protein-coding variants, due to the difficulty of interpreting non-coding mutations. This picture has changed with advances in the systematic annotation of functional non-coding elements. Evolutionary conservation, functional genomics, chromatin state, sequence motifs, and molecular quantitative trait loci all provide complementary information about non-coding function. These functional maps can help prioritize variants on risk haplotypes, filter mutations encountered in the clinic, and perform systems-level analyses to reveal processes underlying disease associations. Advances in predictive modeling can enable dataset integration to reveal pathways shared across loci and alleles, and richer regulatory models can guide the search for epistatic interactions. Lastly, new massively parallel reporter experiments can systematically validate regulatory predictions. Ultimately, advances in regulatory and systems genomics can help unleash the value of whole-genome sequencing for personalized genomic risk assessment, diagnosis, and treatment.
Affiliation(s)
- Lucas D Ward
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
|
49
|
Reilly SB, Marks SB, Jennings WB. Defining evolutionary boundaries across parapatric ecomorphs of Black Salamanders (Aneides flavipunctatus) with conservation implications. Mol Ecol 2012; 21:5745-61. [DOI: 10.1111/mec.12068] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 02/06/2012] [Revised: 08/30/2012] [Accepted: 09/11/2012] [Indexed: 11/29/2022]
Affiliation(s)
- Sean B. Reilly
- Department of Biological Sciences, Humboldt State University, 1 Harpst Street, Arcata, CA 95521, USA
- Sharyn B. Marks
- Department of Biological Sciences, Humboldt State University, 1 Harpst Street, Arcata, CA 95521, USA
|
50
|
Ward LD, Kellis M. Evidence of abundant purifying selection in humans for recently acquired regulatory functions. Science 2012; 337:1675-8. [PMID: 22956687 PMCID: PMC4104271 DOI: 10.1126/science.1225057] [Citation(s) in RCA: 165] [Impact Index Per Article: 13.8] [Indexed: 12/21/2022]
Abstract
Although only 5% of the human genome is conserved across mammals, a substantially larger portion is biochemically active, raising the question of whether the additional elements evolve neutrally or confer a lineage-specific fitness advantage. To address this question, we integrate human variation information from the 1000 Genomes Project and activity data from the ENCODE Project. A broad range of transcribed and regulatory nonconserved elements show decreased human diversity, suggesting lineage-specific purifying selection. Conversely, conserved elements lacking activity show increased human diversity, suggesting that some recently became nonfunctional. Regulatory elements under human constraint in nonconserved regions were found near color vision and nerve-growth genes, consistent with purifying selection for recently evolved functions. Our results suggest continued turnover in regulatory regions, with at least an additional 4% of the human genome subject to lineage-specific constraint.
Affiliation(s)
- Lucas D Ward
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA
|