1
Shakya SB, Edwards SV, Sackton TB. Convergent evolution of noncoding elements associated with short tarsus length in birds. BMC Biol 2025; 23:52. [PMID: 39984930 PMCID: PMC11846207 DOI: 10.1186/s12915-025-02156-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Accepted: 02/12/2025] [Indexed: 02/23/2025] Open
Abstract
BACKGROUND Convergent evolution is the independent evolution of similar traits in unrelated lineages across the Tree of Life. Various genomic signatures can help identify cases of convergent evolution at the molecular level, including changes in substitution rate in the same genes or gene networks. In this study, utilizing tarsus measurements of ~5400 species of birds, we identify independent shifts in tarsus length and use both comparative genomic and population genetic data to identify convergent evolutionary changes among focal clades with shifts to shorter optimal tarsus length. RESULTS Using a newly generated, comprehensive and broadly accessible set of 932,467 avian conserved non-exonic elements (CNEEs) and a whole-genome alignment of 79 birds, we find strong evidence for convergent acceleration in short-tarsus clades among 14,422 elements. Analysis of 9854 protein-coding genes, however, yielded no evidence of convergent patterns of positive selection. Accelerated elements in short-tarsus clades are concentrated near genes with functions in development, with the strongest enrichment associated with skeletal system development. Analysis of gene networks supports convergent changes in regulation of broadly homologous limb developmental genes and pathways. CONCLUSIONS Our results highlight the important role of regulatory elements undergoing convergent acceleration in convergent skeletal traits and are consistent with previous studies showing the roles of regulatory elements and skeletal phenotypes.
Affiliation(s)
- Subir B Shakya
  - Informatics Group, Harvard University, Cambridge, MA, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Scott V Edwards
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Timothy B Sackton
  - Informatics Group, Harvard University, Cambridge, MA, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
2
Naqvi S, Kim S, Tabatabaee S, Pampari A, Kundaje A, Pritchard JK, Wysocka J. Transfer learning reveals sequence determinants of the quantitative response to transcription factor dosage. Cell Genomics 2025:100780. [PMID: 40020686 DOI: 10.1016/j.xgen.2025.100780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2024] [Revised: 01/29/2025] [Accepted: 01/30/2025] [Indexed: 03/03/2025]
Abstract
Deep learning models have advanced our ability to predict cell-type-specific chromatin patterns from transcription factor (TF) binding motifs, but their application to perturbed contexts remains limited. We applied transfer learning to predict how concentrations of the dosage-sensitive TFs TWIST1 and SOX9 affect regulatory element (RE) chromatin accessibility in facial progenitor cells, achieving near-experimental accuracy. High-affinity motifs that allow for heterotypic TF co-binding and are concentrated at the center of REs buffer against quantitative changes in TF dosage and predict unperturbed accessibility. Conversely, low-affinity or homotypic binding motifs distributed throughout REs drive sensitive responses with minimal impact on unperturbed accessibility. Both buffering and sensitizing features display purifying selection signatures. We validated these sequence features through reporter assays and demonstrated that TF-nucleosome competition can explain low-affinity motifs' sensitizing effects. This combination of transfer learning and quantitative chromatin response measurements provides a novel approach for uncovering additional layers of the cis-regulatory code.
Affiliation(s)
- Sahin Naqvi
  - Departments of Chemical and Systems Biology and Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Division of Gastroenterology, Hepatology, and Nutrition, Boston Children's Hospital, Boston, MA 02115, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
- Seungsoo Kim
  - Departments of Chemical and Systems Biology and Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Saman Tabatabaee
  - Departments of Chemical and Systems Biology and Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Anusri Pampari
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Jonathan K Pritchard
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
- Joanna Wysocka
  - Departments of Chemical and Systems Biology and Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
3
Kliesmete Z, Orchard P, Lee VYK, Geuder J, Krauß SM, Ohnuki M, Jocher J, Vieth B, Enard W, Hellmann I. Evidence for compensatory evolution within pleiotropic regulatory elements. Genome Res 2024; 34:1528-1539. [PMID: 39255977 PMCID: PMC11534155 DOI: 10.1101/gr.279001.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 08/19/2024] [Indexed: 09/12/2024]
Abstract
Pleiotropy, measured as expression breadth across tissues, is one of the best predictors for protein sequence and expression conservation. In this study, we investigated its effect on the evolution of cis-regulatory elements (CREs). To this end, we carefully reanalyzed the Epigenomics Roadmap data for nine fetal tissues, assigning a measure of pleiotropic degree to nearly half a million CREs. To assess the functional conservation of CREs, we generated ATAC-seq and RNA-seq data from humans and macaques. We found that more pleiotropic CREs exhibit greater conservation in accessibility, and the mRNA expression levels of the associated genes are more conserved. This trend of higher conservation for higher degrees of pleiotropy persists when analyzing the transcription factor binding repertoire. In contrast, simple DNA sequence conservation of orthologous sites between species tends to be even lower for pleiotropic CREs than for species-specific CREs. Combining various lines of evidence, we propose that the lack of sequence conservation in functionally conserved pleiotropic CREs is owing to within-element compensatory evolution. In summary, our findings suggest that pleiotropy is also a good predictor for the functional conservation of CREs, even though this is not reflected in the sequence conservation of pleiotropic CREs.
Affiliation(s)
- Zane Kliesmete
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
- Peter Orchard
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109-2218, USA
- Victor Yan Kin Lee
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
  - Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, 1350 Copenhagen, Denmark
- Johanna Geuder
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
- Simon M Krauß
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
  - Department of Hematology, Cell Therapy, Hemostaseology and Infectious Diseases, University Leipzig Medical Center, 04103 Leipzig, Germany
- Mari Ohnuki
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
  - Faculty of Medicine, Institute for the Advanced Study of Human Biology (ASHBi), Kyoto University, Kyoto 606-8501, Japan
- Jessica Jocher
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
- Beate Vieth
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
- Wolfgang Enard
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
- Ines Hellmann
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
4
Naqvi S, Kim S, Tabatabaee S, Pampari A, Kundaje A, Pritchard JK, Wysocka J. Transfer learning reveals sequence determinants of the quantitative response to transcription factor dosage. bioRxiv 2024:2024.05.28.596078. [PMID: 38853998 PMCID: PMC11160683 DOI: 10.1101/2024.05.28.596078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Deep learning approaches have made significant advances in predicting cell type-specific chromatin patterns from the identity and arrangement of transcription factor (TF) binding motifs. However, most models have been applied in unperturbed contexts, precluding a predictive understanding of how chromatin state responds to TF perturbation. Here, we used transfer learning to train and interpret deep learning models that use DNA sequence to predict, with accuracy approaching experimental reproducibility, how the concentration of two dosage-sensitive TFs (TWIST1, SOX9) affects regulatory element (RE) chromatin accessibility in facial progenitor cells. High-affinity motifs that allow for heterotypic TF co-binding and are concentrated at the center of REs buffer against quantitative changes in TF dosage and strongly predict unperturbed accessibility. In contrast, motifs with low-affinity or homotypic binding distributed throughout REs lead to sensitive responses with minimal contributions to unperturbed accessibility. Both buffering and sensitizing features show signatures of purifying selection. We validated these predictive sequence features using reporter assays and showed that a biophysical model of TF-nucleosome competition can explain the sensitizing effect of low-affinity motifs. Our approach of combining transfer learning and quantitative measurements of the chromatin response to TF dosage therefore represents a powerful method to reveal additional layers of the cis-regulatory code.
Affiliation(s)
- Sahin Naqvi
  - Departments of Chemical and Systems Biology and Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, California, USA
  - Division of Gastroenterology, Hepatology, and Nutrition, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Lead contact
- Seungsoo Kim
  - Departments of Chemical and Systems Biology and Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
  - Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
  - These authors contributed equally
- Saman Tabatabaee
  - Departments of Chemical and Systems Biology and Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
  - These authors contributed equally
- Anusri Pampari
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University, Stanford, California, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Jonathan K Pritchard
  - Department of Genetics, Stanford University, Stanford, California, USA
  - Department of Biology, Stanford University, Stanford, CA, USA
- Joanna Wysocka
  - Departments of Chemical and Systems Biology and Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
  - Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
5
Zhao S, Chi L, Chen H. CEGA: a method for inferring natural selection by comparative population genomic analysis across species. Genome Biol 2023; 24:219. [PMID: 37789379 PMCID: PMC10548728 DOI: 10.1186/s13059-023-03068-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Accepted: 09/20/2023] [Indexed: 10/05/2023] Open
Abstract
We developed a maximum likelihood method for detecting positive selection or balancing selection using multilocus or genomic polymorphism and divergence data from two species. The method is especially useful for investigating natural selection in noncoding regions. Simulations demonstrate that the method outperforms existing methods in detecting both positive and balancing selection. We apply the method to population genomic data from human and chimpanzee. The list of genes identified under selection in the noncoding regions is prominently enriched in pathways related to the brain and nervous system. Therefore, our method will serve as a useful tool for comparative population genomic analysis.
Affiliation(s)
- Shilei Zhao
  - CAS Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - China National Center for Bioinformation, Beijing, 100101, China
  - School of Future Technology, College of Life Sciences and Sino-Danish College, University of Chinese Academy of Sciences, Beijing, 100049, China
- Lianjiang Chi
  - CAS Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - China National Center for Bioinformation, Beijing, 100101, China
- Hua Chen
  - CAS Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - China National Center for Bioinformation, Beijing, 100101, China
  - School of Future Technology, College of Life Sciences and Sino-Danish College, University of Chinese Academy of Sciences, Beijing, 100049, China
  - CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
6
Tuncay IO, DeVries D, Gogate A, Kaur K, Kumar A, Xing C, Goodspeed K, Seyoum-Tesfa L, Chahrour MH. The genetics of autism spectrum disorder in an East African familial cohort. Cell Genomics 2023; 3:100322. [PMID: 37492102 PMCID: PMC10363748 DOI: 10.1016/j.xgen.2023.100322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 03/09/2023] [Accepted: 04/16/2023] [Indexed: 07/27/2023]
Abstract
Autism spectrum disorder (ASD) is a group of complex neurodevelopmental conditions affecting communication and social interaction in 2.3% of children. Studies that demonstrated its complex genetic architecture have been mainly performed in populations of European ancestry. We investigate the genetics of ASD in an East African cohort (129 individuals) from a population with higher prevalence (5%). Whole-genome sequencing identified 2.13 million private variants in the cohort and potentially pathogenic variants in known ASD genes (including CACNA1C, CHD7, FMR1, and TCF7L2). Admixture analysis demonstrated that the cohort comprises two ancestral populations, African and Eurasian. Admixture mapping discovered 10 regions that confer ASD risk on the African haplotypes, containing several known ASD genes. The increased ASD prevalence in this population suggests decreased heterogeneity in the underlying genetic etiology, enabling risk allele identification. Our approach emphasizes the power of African genetic variation and admixture analysis to inform the architecture of complex disorders.
Affiliation(s)
- Islam Oguz Tuncay
  - Department of Neuroscience, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Darlene DeVries
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Ashlesha Gogate
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Kiran Kaur
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Ashwani Kumar
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Chao Xing
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Kimberly Goodspeed
  - Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Department of Neurology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Maria H Chahrour
  - Department of Neuroscience, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Peter O'Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
7
Zhang X, Fang B, Huang YF. Transcription factor binding sites are frequently under accelerated evolution in primates. Nat Commun 2023; 14:783. [PMID: 36774380 PMCID: PMC9922303 DOI: 10.1038/s41467-023-36421-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 01/31/2023] [Indexed: 02/13/2023] Open
Abstract
Recent comparative genomic studies have identified many human accelerated elements (HARs) with elevated substitution rates in the human lineage. However, it remains unknown to what extent transcription factor binding sites (TFBSs) are under accelerated evolution in humans and other primates. Here, we introduce two pooling-based phylogenetic methods with dramatically enhanced sensitivity to examine accelerated evolution in TFBSs. Using these new methods, we show that more than 6000 TFBSs annotated in the human genome have experienced accelerated evolution in Hominini, apes, and Old World monkeys. Although these TFBSs individually show relatively weak signals of accelerated evolution, they collectively are more abundant than HARs. Also, we show that accelerated evolution in Pol III binding sites may be driven by lineage-specific positive selection, whereas accelerated evolution in other TFBSs might be driven by nonadaptive evolutionary forces. Finally, the accelerated TFBSs are enriched around developmental genes, suggesting that accelerated evolution in TFBSs may drive the divergence of developmental processes between primates.
Affiliation(s)
- Xinru Zhang
  - Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
  - Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, PA, 16802, USA
- Bohao Fang
  - Department of Organismic and Evolutionary Biology and the Museum of Comparative Zoology, Harvard University, Boston, MA, 02135, USA
- Yi-Fei Huang
  - Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
8
Schubach M, Nazaretyan L, Kircher M. The Regulatory Mendelian Mutation score for GRCh38. Gigascience 2022; 12:giad024. [PMID: 37083939 PMCID: PMC10120424 DOI: 10.1093/gigascience/giad024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2022] [Revised: 01/10/2023] [Accepted: 03/21/2023] [Indexed: 04/22/2023] Open
Abstract
BACKGROUND Genome sequencing efforts for individuals with rare Mendelian disease have increased the research focus on the noncoding genome and the clinical need for methods that prioritize potentially disease-causal noncoding variants. Some tools for assessment of variant pathogenicity, as well as annotations, are not available for the current human genome build (GRCh38), whose adoption in databases, software, and pipelines was slow. RESULTS Here, we present an updated version of the Regulatory Mendelian Mutation (ReMM) score, retrained on features and variants derived from the GRCh38 genome build. Like its GRCh37 version, it achieves good performance on its highly imbalanced data. To improve accessibility and provide users with a toolbox to score their variant files and look up scores in the genome, we developed a website and API for easy score lookup. CONCLUSIONS Scores of the GRCh38 genome build are highly correlated with the prior release, with a performance increase due to the better coverage of features. For prioritization of noncoding mutations in imbalanced datasets, the ReMM score performed much better than other variation scores. Prescored whole-genome files for the GRCh37 and GRCh38 genome builds are cited in the article and on the website; UCSC Genome Browser tracks and an API are available at https://remm.bihealth.org.
Affiliation(s)
- Max Schubach
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
- Lusiné Nazaretyan
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
- Martin Kircher
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
  - Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, 23562 Lübeck, Germany
9
Linker SB, Narvaiza I, Hsu JY, Wang M, Qiu F, Mendes APD, Oefner R, Kottilil K, Sharma A, Randolph-Moore L, Mejia E, Santos R, Marchetto MC, Gage FH. Human-specific regulation of neural maturation identified by cross-primate transcriptomics. Curr Biol 2022; 32:4797-4807.e5. [PMID: 36228612 DOI: 10.1016/j.cub.2022.09.028] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/05/2020] [Revised: 07/08/2022] [Accepted: 09/14/2022] [Indexed: 11/06/2022]
Abstract
Unique aspects of human behavior are often attributed to differences in the relative size and organization of the human brain: these structural aspects originate during early development. Recent studies indicate that human neurodevelopment is considerably slower than that in nonhuman primates, a finding that is termed neoteny. One aspect of neoteny is the slow onset of action potentials. However, which molecular mechanisms play a role in this process remains unclear. To examine the evolutionary constraints on the rate of neuronal maturation, we have generated transcriptional data tracking five time points, from the neural progenitor state to 8-week-old neurons, in primates spanning the catarrhine lineage, including Macaca mulatta, Gorilla gorilla, Pan paniscus, Pan troglodytes, and Homo sapiens. Despite finding an overall similarity of many transcriptional signatures, species-specific and clade-specific distinctions were observed. Among the genes that exhibited human-specific regulation, we identified a key pioneer transcription factor, GATA3, that was uniquely upregulated in humans during the neuronal maturation process. We further examined the regulatory nature of GATA3 in human cells and observed that downregulation quickened the speed of developing spontaneous action potentials, thereby modulating the human neotenic phenotype. These results provide evidence for the divergence of gene regulation as a key molecular mechanism underlying human neoteny.
Affiliation(s)
- Sara B Linker
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Iñigo Narvaiza
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Jonathan Y Hsu
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Meiyan Wang
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Fan Qiu
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Ana P D Mendes
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Ruth Oefner
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Kalyani Kottilil
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Amandeep Sharma
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Lynne Randolph-Moore
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Eunice Mejia
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Renata Santos
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
  - Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, Laboratory of Dynamics of Neuronal Structure in Health and Disease, 102 rue de la Santé, 75014 Paris, France
  - Institut des Sciences Biologiques, CNRS, 16 rue Pierre et Marie Curie, 75005 Paris, France
- Maria C Marchetto
  - Department of Anthropology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
  - Center for Academic Research and Training in Anthropogeny (CARTA), University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Fred H Gage
  - Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
10
Exploration of Tools for the Interpretation of Human Non-Coding Variants. Int J Mol Sci 2022; 23:ijms232112977. [PMID: 36361767 PMCID: PMC9654743 DOI: 10.3390/ijms232112977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 10/17/2022] [Accepted: 10/23/2022] [Indexed: 02/01/2023] Open
Abstract
The advent of Whole Genome Sequencing (WGS) broadened the genetic variation detection range, revealing the presence of variants even in non-coding regions of the genome, which would have been missed using targeted approaches. One of the most challenging issues in WGS analysis regards the interpretation of annotated variants. This review focuses on tools suitable for the functional annotation of variants falling into non-coding regions. It couples the description of non-coding genomic areas with the results and performance of existing tools for a functional interpretation of the effect of variants in these regions. Tools were tested in a controlled genomic scenario, representing the ground-truth and allowing us to determine software performance.
11
Lye Z, Choi JY, Purugganan MD. Deleterious mutations and the rare allele burden on rice gene expression. Mol Biol Evol 2022; 39:6693943. [PMID: 36073358 PMCID: PMC9512150 DOI: 10.1093/molbev/msac193] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
Deleterious genetic variation is maintained in populations at low frequencies. Under a model of stabilizing selection, rare (and presumably deleterious) genetic variants are associated with increase or decrease in gene expression from some intermediate optimum. We investigate this phenomenon in a population of largely Oryza sativa ssp. indica rice landraces under normal unstressed wet and stressful drought field conditions. We include single nucleotide polymorphisms, insertion/deletion mutations, and structural variants in our analysis and find a stronger association between rare variants and gene expression outliers under the stress condition. We also show an association of the strength of this rare variant effect with linkage, gene expression levels, network connectivity, local recombination rate, and fitness consequence scores, consistent with the stabilizing selection model of gene expression.
Affiliation(s)
- Zoe Lye
  - Center for Genomics and Systems Biology, New York University, New York, NY 10003
- Jae Young Choi
  - Center for Genomics and Systems Biology, New York University, New York, NY 10003
- Michael D Purugganan
  - Center for Genomics and Systems Biology, New York University, New York, NY 10003
  - Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
12
Ramstein GP, Buckler ES. Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize. Genome Biol 2022; 23:183. [PMID: 36050782 PMCID: PMC9438327 DOI: 10.1186/s13059-022-02747-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/18/2022] [Accepted: 08/15/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for fitness effect of mutations. RESULTS Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants. CONCLUSIONS Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach, Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC), could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (10.25739/hybz-2957).
Affiliation(s)
- Guillaume P Ramstein
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8000, Aarhus, Denmark
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Edward S Buckler
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
  - USDA-ARS, Ithaca, NY, 14853, USA

13
Dukler N, Mughal MR, Ramani R, Huang YF, Siepel A. Extreme purifying selection against point mutations in the human genome. Nat Commun 2022; 13:4312. [PMID: 35879308 PMCID: PMC9314448 DOI: 10.1038/s41467-022-31872-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/10/2021] [Accepted: 07/07/2022] [Indexed: 12/13/2022] Open
Abstract
Large-scale genome sequencing has enabled the measurement of strong purifying selection in protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring such selection in noncoding as well as coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of "ultraselection" by the fractional depletion of rare single-nucleotide variants, after controlling for variation in mutation rates. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find abundant ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. By contrast, we find much less ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest levels in ultraconserved elements. We estimate that ~0.4-0.7% of the human genome is ultraselected, implying ~ 0.26-0.51 strongly deleterious mutations per generation. Overall, our study sheds new light on the genome-wide distribution of fitness effects by combining deep sequencing data and classical theory from population genetics.
Affiliation(s)
- Noah Dukler
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Mehreen R Mughal
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Ritika Ramani
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Yi-Fei Huang
  - Department of Biology and Huck Institute of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Adam Siepel
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA

14
Halldorsson BV, Eggertsson HP, Moore KHS, Hauswedell H, Eiriksson O, Ulfarsson MO, Palsson G, Hardarson MT, Oddsson A, Jensson BO, Kristmundsdottir S, Sigurpalsdottir BD, Stefansson OA, Beyter D, Holley G, Tragante V, Gylfason A, Olason PI, Zink F, Asgeirsdottir M, Sverrisson ST, Sigurdsson B, Gudjonsson SA, Sigurdsson GT, Halldorsson GH, Sveinbjornsson G, Norland K, Styrkarsdottir U, Magnusdottir DN, Snorradottir S, Kristinsson K, Sobech E, Jonsson H, Geirsson AJ, Olafsson I, Jonsson P, Pedersen OB, Erikstrup C, Brunak S, Ostrowski SR, Thorleifsson G, Jonsson F, Melsted P, Jonsdottir I, Rafnar T, Holm H, Stefansson H, Saemundsdottir J, Gudbjartsson DF, Magnusson OT, Masson G, Thorsteinsdottir U, Helgason A, Jonsson H, Sulem P, Stefansson K. The sequences of 150,119 genomes in the UK Biobank. Nature 2022; 607:732-740. [PMID: 35859178 PMCID: PMC9329122 DOI: 10.1038/s41586-022-04965-x] [Citation(s) in RCA: 205] [Impact Index Per Article: 68.3] [Received: 11/05/2021] [Accepted: 06/10/2022] [Indexed: 12/25/2022]
Abstract
Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data [1,2]. Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank [3]. This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.
Affiliation(s)
- Bjarni V Halldorsson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Technology, Reykjavik University, Reykjavik, Iceland
- Magnus O Ulfarsson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Marteinn T Hardarson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Technology, Reykjavik University, Reykjavik, Iceland
- Snaedis Kristmundsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Technology, Reykjavik University, Reykjavik, Iceland
- Brynja D Sigurpalsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Technology, Reykjavik University, Reykjavik, Iceland
- Helgi Jonsson
  - Landspitali-University Hospital, Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Palmi Jonsson
  - Landspitali-University Hospital, Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Ole Birger Pedersen
  - Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
- Christian Erikstrup
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
  - Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark
- Søren Brunak
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Sisse Rye Ostrowski
  - Department of Clinical Immunology, Copenhagen University Hospital (Rigshospitalet), Copenhagen, Denmark
  - Department of Clinical Medicine, Faculty of Health and Clinical Sciences, Copenhagen University, Copenhagen, Denmark
- Pall Melsted
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Ingileif Jonsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Hilma Holm
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Daniel F Gudbjartsson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Unnur Thorsteinsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Agnar Helgason
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Department of Anthropology, University of Iceland, Reykjavik, Iceland

15
Groen SC, Joly-Lopez Z, Platts AE, Natividad M, Fresquez Z, Mauck WM, Quintana MR, Cabral CLU, Torres RO, Satija R, Purugganan MD, Henry A. Evolutionary systems biology reveals patterns of rice adaptation to drought-prone agro-ecosystems. Plant Cell 2022; 34:759-783. [PMID: 34791424 PMCID: PMC8824591 DOI: 10.1093/plcell/koab275] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/01/2021] [Accepted: 11/02/2021] [Indexed: 05/24/2023]
Abstract
Rice (Oryza sativa) was domesticated around 10,000 years ago and has developed into a staple for half of humanity. The crop evolved and is currently grown in stably wet and intermittently dry agro-ecosystems, but patterns of adaptation to differences in water availability remain poorly understood. While previous field studies have evaluated plant developmental adaptations to water deficit, adaptive variation in functional and hydraulic components, particularly in relation to gene expression, has received less attention. Here, we take an evolutionary systems biology approach to characterize adaptive drought resistance traits across roots and shoots. We find that rice harbors heritable variation in molecular, physiological, and morphological traits that is linked to higher fitness under drought. We identify modules of co-expressed genes that are associated with adaptive drought avoidance and tolerance mechanisms. These expression modules showed evidence of polygenic adaptation in rice subgroups harboring accessions that evolved in drought-prone agro-ecosystems. Fitness-linked expression patterns allowed us to identify the drought-adaptive nature of optimizing photosynthesis and interactions with arbuscular mycorrhizal fungi. Taken together, our study provides an unprecedented, integrative view of rice adaptation to water-limited field conditions.
Affiliation(s)
- Simon C Groen
  - Author for correspondence: (S.C.G.), (M.D.P.), (A.H.)
- Mignon Natividad
  - International Rice Research Institute, Los Baños, Laguna, Philippines
- Zoë Fresquez
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, USA
- Carlo Leo U Cabral
  - International Rice Research Institute, Los Baños, Laguna, Philippines
- Rolando O Torres
  - International Rice Research Institute, Los Baños, Laguna, Philippines
- Rahul Satija
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, USA
  - New York Genome Center, New York, USA
- Amelia Henry
  - Author for correspondence: (S.C.G.), (M.D.P.), (A.H.)

16
Yousaf A, Liu J, Ye S, Chen H. Current Progress in Evolutionary Comparative Genomics of Great Apes. Front Genet 2021; 12:657468. [PMID: 34456962 PMCID: PMC8385753 DOI: 10.3389/fgene.2021.657468] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/22/2021] [Accepted: 07/15/2021] [Indexed: 12/04/2022] Open
Abstract
The availability of high-quality genome sequences of great ape species provides unprecedented opportunities for genomic analyses. Herein, we reviewed the recent progress in evolutionary comparative genomic studies of the existing great ape species, including human, chimpanzee, bonobo, gorilla, and orangutan. We elaborate discovery on evolutionary history, natural selection, structural variations, and new genes of these species, which is informative for understanding the origin of human-specific phenotypes.
Affiliation(s)
- Aisha Yousaf
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - China National Center for Bioinformation, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Junfeng Liu
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - China National Center for Bioinformation, Beijing, China
- Sicheng Ye
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - China National Center for Bioinformation, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Hua Chen
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - China National Center for Bioinformation, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
  - CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China

17
Mechanisms of Binding Specificity among bHLH Transcription Factors. Int J Mol Sci 2021; 22:9150. [PMID: 34502060 PMCID: PMC8431614 DOI: 10.3390/ijms22179150] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 07/27/2021] [Revised: 08/14/2021] [Accepted: 08/18/2021] [Indexed: 12/25/2022] Open
Abstract
The transcriptome of every cell is orchestrated by the complex network of interaction between transcription factors (TFs) and their binding sites on DNA. Disruption of this network can result in many forms of organism malfunction but also can be the substrate of positive natural selection. However, understanding the specific determinants of each of these individual TF-DNA interactions is a challenging task as it requires integrating the multiple possible mechanisms by which a given TF ends up interacting with a specific genomic region. These mechanisms include DNA motif preferences, which can be determined by nucleotide sequence but also by DNA’s shape; post-translational modifications of the TF, such as phosphorylation; and dimerization partners and co-factors, which can mediate multiple forms of direct or indirect cooperative binding. Binding can also be affected by epigenetic modifications of putative target regions, including DNA methylation and nucleosome occupancy. In this review, we describe how all these mechanisms have a role and crosstalk in one specific family of TFs, the basic helix-loop-helix (bHLH), with a very conserved DNA binding domain and a similar DNA preferred motif, the E-box. Here, we compile and discuss a rich catalog of strategies used by bHLH to acquire TF-specific genome-wide landscapes of binding sites.

18
Abstract
Interpreting the effects of genetic variants is key to understanding individual susceptibility to disease and designing personalized therapeutic approaches. Modern experimental technologies are enabling the generation of massive compendia of human genome sequence data and associated molecular and phenotypic traits, together with genome-scale expression, epigenomics and other functional genomic data. Integrative computational models can leverage these data to understand variant impact, elucidate the effect of dysregulated genes on biological pathways in specific disease and tissue contexts, and interpret disease risk beyond what is feasible with experiments alone. In this Review, we discuss recent developments in machine learning algorithms for genome interpretation and for integrative molecular-level modelling of cells, tissues and organs relevant to disease. More specifically, we highlight existing methods and key challenges and opportunities in identifying specific disease-causing genetic variants and linking them to molecular pathways and, ultimately, to disease phenotypes.

19
Tan L, Cheng W, Liu F, Wang DO, Wu L, Cao N, Wang J. Positive natural selection of N6-methyladenosine on the RNAs of processed pseudogenes. Genome Biol 2021; 22:180. [PMID: 34120636 PMCID: PMC8201931 DOI: 10.1186/s13059-021-02402-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/01/2020] [Accepted: 06/04/2021] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Canonical nonsense-mediated decay (NMD) is an important splicing-dependent process for mRNA surveillance in mammals. However, processed pseudogenes are not able to trigger NMD due to their lack of introns. It is largely unknown whether they have evolved other surveillance mechanisms. RESULTS Here, we find that the RNAs of pseudogenes, especially processed pseudogenes, have dramatically higher m6A levels than their cognate protein-coding genes, associated with de novo m6A peaks and motifs in human cells. Furthermore, pseudogenes have rapidly accumulated m6A motifs during evolution. The m6A sites of pseudogenes are evolutionarily younger than neutral sites and their m6A levels are increasing, supporting the idea that m6A on the RNAs of pseudogenes is under positive selection. We then find that the m6A RNA modification of processed, rather than unprocessed, pseudogenes promotes cytosolic RNA degradation and attenuates interference with the RNAs of their cognate protein-coding genes. We experimentally validate the m6A RNA modification of two processed pseudogenes, DSTNP2 and NAP1L4P1, which promotes the RNA degradation of both pseudogenes and their cognate protein-coding genes DSTN and NAP1L4. In addition, the m6A of DSTNP2 regulation of DSTN is partially dependent on the miRNA miR-362-5p. CONCLUSIONS Our discovery reveals a novel evolutionary role of m6A RNA modification in cleaning up the unnecessary processed pseudogene transcripts to attenuate their interference with the regulatory network of protein-coding genes.
Affiliation(s)
- Liqiang Tan
  - Department of Medical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
  - Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Sun Yat-sen University, Guangzhou, 510080, China
- Weisheng Cheng
  - Department of Medical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
  - Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Sun Yat-sen University, Guangzhou, 510080, China
- Fang Liu
  - Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Sun Yat-sen University, Guangzhou, 510080, China
- Dan Ohtan Wang
  - Center for Biosystems Dynamics Research, RIKEN, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
  - Wuya College of Innovation, Shenyang Pharmaceutical University, Shenyang, 110016, China
- Linwei Wu
  - Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Sun Yat-sen University, Guangzhou, 510080, China
- Nan Cao
  - Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Sun Yat-sen University, Guangzhou, 510080, China
- Jinkai Wang
  - Department of Medical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
  - Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Sun Yat-sen University, Guangzhou, 510080, China
  - RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, 510120, China

20
Gaither JBS, Lammi GE, Li JL, Gordon DM, Kuck HC, Kelly BJ, Fitch JR, White P. Synonymous variants that disrupt messenger RNA structure are significantly constrained in the human population. Gigascience 2021; 10:giab023. [PMID: 33822938 PMCID: PMC8023685 DOI: 10.1093/gigascience/giab023] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/20/2020] [Revised: 02/10/2021] [Accepted: 03/10/2021] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND The role of synonymous single-nucleotide variants in human health and disease is poorly understood, yet evidence suggests that this class of "silent" genetic variation plays multiple regulatory roles in both transcription and translation. One mechanism by which synonymous codons direct and modulate the translational process is through alteration of the elaborate structure formed by single-stranded mRNA molecules. While tools to computationally predict the effect of non-synonymous variants on protein structure are plentiful, analogous tools to systematically assess how synonymous variants might disrupt mRNA structure are lacking. RESULTS We developed novel software using a parallel processing framework for large-scale generation of secondary RNA structures and folding statistics for the transcriptome of any species. Focusing our analysis on the human transcriptome, we calculated 5 billion RNA-folding statistics for 469 million single-nucleotide variants in 45,800 transcripts. By considering the impact of all possible synonymous variants globally, we discover that synonymous variants predicted to disrupt mRNA structure have significantly lower rates of incidence in the human population. CONCLUSIONS These findings support the hypothesis that synonymous variants may play a role in genetic disorders due to their effects on mRNA structure. To evaluate the potential pathogenic impact of synonymous variants, we provide RNA stability, edge distance, and diversity metrics for every nucleotide in the human transcriptome and introduce a "Structural Predictivity Index" (SPI) to quantify structural constraint operating on any synonymous variant. Because no single RNA-folding metric can capture the diversity of mechanisms by which a variant could alter secondary mRNA structure, we generated a SUmmarized RNA Folding (SURF) metric to provide a single measurement to predict the impact of secondary structure altering variants in human genetic studies.
Affiliation(s)
- Jeffrey B S Gaither
  - Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Grant E Lammi
  - Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- James L Li
  - Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- David M Gordon
  - Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Harkness C Kuck
  - Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Benjamin J Kelly
  - Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- James R Fitch
  - Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Peter White
  - Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
  - Department of Pediatrics, College of Medicine, The Ohio State University, 370 W. 9th Avenue, Columbus, OH 43210, USA

21
Scossa F, Fernie AR. Ancestral sequence reconstruction - An underused approach to understand the evolution of gene function in plants? Comput Struct Biotechnol J 2021; 19:1579-1594. [PMID: 33868595 PMCID: PMC8039532 DOI: 10.1016/j.csbj.2021.03.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/03/2021] [Revised: 03/04/2021] [Accepted: 03/06/2021] [Indexed: 02/06/2023] Open
Abstract
Whilst substantial research effort has been placed on understanding the interactions of plant proteins with their molecular partners, relatively few studies in plants - by contrast to work in other organisms - address how these interactions evolve. It is thought that ancestral proteins were more promiscuous than modern proteins and that specificity often evolved following gene duplication and subsequent functional refining. However, ancestral protein resurrection studies have found that some modern proteins have evolved de novo from ancestors lacking those functions. Intriguingly, the new interactions evolved as a consequence of just a few mutations and, as such, acquisition of new functions appears to be neither difficult nor rare, however, only a few of them are incorporated into biological processes before they are lost to subsequent mutations. Here, we detail the approach of ancestral sequence reconstruction (ASR), providing a primer to reconstruct the sequence of an ancestral gene. We will present case studies from a range of different eukaryotes before discussing the few instances where ancestral reconstructions have been used in plants. As ASR is used to dig into the remote evolutionary past, we will also present some alternative genetic approaches to investigate molecular evolution on shorter timescales. We argue that the study of plant secondary metabolism is particularly well suited for ancestral reconstruction studies. Indeed, its ancient evolutionary roots and highly diverse landscape provide an ideal context in which to address the focal issue around the emergence of evolutionary novelties and how this affects the chemical diversification of plant metabolism.
Key Words
- APR, ancestral protein resurrection
- ASR, ancestral sequence reconstruction
- Ancestral sequence reconstruction
- CDS, coding sequence
- Evolution
- GR, glucocorticoid receptor
- GWAS, genome wide association study
- Genomics
- InDel, insertion/deletion
- MCMC, Markov Chain Monte Carlo
- ML, maximum likelihood
- MP, maximum parsimony
- MR, mineralocorticoid receptor
- MSA, multiple sequence alignment
- Metabolism
- NJ, neighbor-joining
- Phylogenetics
- Plants
- SFS, site frequency spectrum
Affiliation(s)
- Federico Scossa
  - Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
  - Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics (CREA-GB), Rome, Italy
- Alisdair R. Fernie
  - Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
  - Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria

22
Liu J, Robinson-Rechavi M. Robust inference of positive selection on regulatory sequences in the human brain. Sci Adv 2020; 6:eabc9863. [PMID: 33246961 PMCID: PMC7695467 DOI: 10.1126/sciadv.abc9863] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/25/2020] [Accepted: 10/16/2020] [Indexed: 05/07/2023]
Abstract
A longstanding hypothesis is that divergence between humans and chimpanzees might have been driven more by regulatory level adaptations than by protein sequence adaptations. This has especially been suggested for regulatory adaptations in the evolution of the human brain. We present a new method to detect positive selection on transcription factor binding sites on the basis of measuring predicted affinity change with a machine learning model of binding. Unlike other methods, this approach requires neither defining a priori neutral sites nor detecting accelerated evolution, thus removing major sources of bias. We scanned the signals of positive selection for CTCF binding sites in 29 human and 11 mouse tissues or cell types. We found that human brain-related cell types have the highest proportion of positive selection. This result is consistent with the view that adaptive evolution to gene regulation has played an important role in evolution of the human brain.
Affiliation(s)
- Jialin Liu
  - Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Marc Robinson-Rechavi
  - Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland

23
Zhang H, Shi X, Huang T, Zhao X, Chen W, Gu N, Zhang R. Dynamic landscape and evolution of m6A methylation in human. Nucleic Acids Res 2020; 48:6251-6264. [PMID: 32406913 PMCID: PMC7293016 DOI: 10.1093/nar/gkaa347] [Citation(s) in RCA: 202] [Impact Index Per Article: 40.4] [Received: 02/29/2020] [Revised: 04/23/2020] [Accepted: 04/24/2020] [Indexed: 01/03/2023] Open
Abstract
m6A is a prevalent internal modification in mRNAs and has been linked to the diverse effects on mRNA fate. To explore the landscape and evolution of human m6A, we generated 27 m6A methylomes across major adult tissues. These data reveal dynamic m6A methylation across tissue types, uncover both broadly or tissue-specifically methylated sites, and identify an unexpected enrichment of m6A methylation at non-canonical cleavage sites. A comparison of fetal and adult m6A methylomes reveals that m6A preferentially occupies CDS regions in fetal tissues. Moreover, the m6A sub-motifs vary between fetal and adult tissues or across tissue types. From the evolutionary perspective, we uncover that the selection pressure on m6A sites varies and depends on their genic locations. Unexpectedly, we found that ∼40% of the 3′UTR m6A sites are under negative selection, which is higher than the evolutionary constraint on miRNA binding sites, and much higher than that on A-to-I RNA modification. Moreover, the recently gained m6A sites in human populations are clearly under positive selection and associated with traits or diseases. Our work provides a resource of human m6A profile for future studies of m6A functions, and suggests a role of m6A modification in human evolutionary adaptation and disease susceptibility.
Affiliation(s)
- Hui Zhang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
| | - Xinrui Shi
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
| | - Tao Huang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
| | - Xueni Zhao
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
| | - Wanying Chen
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
| | - Nannan Gu
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
| | - Rui Zhang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
- RNA Biomedical Institute, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou 510120, PR China
| |
|
24
|
Huang YF. Unified inference of missense variant effects and gene constraints in the human genome. PLoS Genet 2020; 16:e1008922. [PMID: 32667917 PMCID: PMC7384676 DOI: 10.1371/journal.pgen.1008922] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2019] [Revised: 07/27/2020] [Accepted: 06/09/2020] [Indexed: 01/25/2023] Open
Abstract
A challenge in medical genomics is to identify variants and genes associated with severe genetic disorders. Based on the premise that severe, early-onset disorders often result in a reduction of evolutionary fitness, several statistical methods have been developed to predict pathogenic variants or constrained genes based on the signatures of negative selection in human populations. However, we currently lack a statistical framework to jointly predict deleterious variants and constrained genes from both variant-level features and gene-level selective constraints. Here we present such a unified approach, UNEECON, based on deep learning and population genetics. UNEECON treats the contributions of variant-level features and gene-level constraints as a variant-level fixed effect and a gene-level random effect, respectively. The sum of the fixed and random effects is then combined with an evolutionary model to infer the strength of negative selection at both variant and gene levels. Compared with previously published methods, UNEECON shows improved performance in predicting missense variants and protein-coding genes associated with autosomal dominant disorders, and feature importance analysis suggests that both gene-level selective constraints and variant-level predictors are important for accurate variant prioritization. Furthermore, based on UNEECON, we observe a low correlation between gene-level intolerance to missense mutations and that to loss-of-function mutations, which can be partially explained by the prevalence of disordered protein regions that are highly tolerant to missense mutations. Finally, we show that genes intolerant to both missense and loss-of-function mutations play key roles in the central nervous system and the autism spectrum disorders. Overall, UNEECON is a promising framework for both variant and gene prioritization. 
Numerous statistical methods have been developed to predict deleterious missense variants or constrained genes in the human genome, but unified prioritization methods that utilize both variant- and gene-level information are underdeveloped. Here we present UNEECON, an evolution-based deep learning framework for unified variant and gene prioritization. By integrating variant-level predictors and gene-level selective constraints, UNEECON outperforms existing methods in predicting missense variants and protein-coding genes associated with dominant disorders. Based on UNEECON, we show that disordered proteins are tolerant to missense mutations but not to loss-of-function mutations. In addition, we find that genes under strong selective constraints at both missense and loss-of-function levels are strongly associated with the central nervous system and the autism spectrum disorders, highlighting the need to investigate the function of these highly constrained genes in future studies.
Affiliation(s)
- Yi-Fei Huang
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, United States of America
| |
|
25
|
Takeda JI, Nanatsue K, Yamagishi R, Ito M, Haga N, Hirata H, Ogi T, Ohno K. InMeRF: prediction of pathogenicity of missense variants by individual modeling for each amino acid substitution. NAR Genom Bioinform 2020; 2:lqaa038. [PMID: 33543123 PMCID: PMC7671370 DOI: 10.1093/nargab/lqaa038] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2019] [Revised: 03/03/2020] [Accepted: 05/13/2020] [Indexed: 12/15/2022] Open
Abstract
In predicting the pathogenicity of a nonsynonymous single-nucleotide variant (nsSNV), a radical change in amino acid properties is prone to be classified as being pathogenic. However, not all such nsSNVs are associated with human diseases. We generated random forest (RF) models individually for each amino acid substitution to differentiate pathogenic nsSNVs in the Human Gene Mutation Database and common nsSNVs in dbSNP. We named a set of our models ‘Individual Meta RF’ (InMeRF). Ten-fold cross-validation of InMeRF showed that the areas under the curves (AUCs) of receiver operating characteristic (ROC) and precision–recall curves were on average 0.941 and 0.957, respectively. To compare InMeRF with seven other tools, the eight tools were generated using the same training dataset, and were compared using the same three testing datasets. ROC-AUCs of InMeRF were ranked first in the eight tools. We applied InMeRF to 155 pathogenic and 125 common nsSNVs in seven major genes causing congenital myasthenic syndromes, as well as in VANGL1 causing spina bifida, and found that the sensitivity and specificity of InMeRF were 0.942 and 0.848, respectively. We made the InMeRF web service, and also made genome-wide InMeRF scores available online (https://www.med.nagoya-u.ac.jp/neurogenetics/InMeRF/).
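As a rough illustration of the sensitivity and specificity figures quoted in the abstract above, the following Python sketch computes both from confusion-matrix counts. The counts used here (146 true positives, 106 true negatives) are assumptions back-calculated from the reported rates and the 155 pathogenic / 125 common variant totals; the abstract does not state them.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    # Sensitivity: fraction of pathogenic variants predicted pathogenic.
    sensitivity = tp / (tp + fn)
    # Specificity: fraction of common variants predicted non-pathogenic.
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts (assumption): 155 pathogenic = 146 TP + 9 FN,
# 125 common = 106 TN + 19 FP.
sens, spec = sensitivity_specificity(tp=146, fn=9, tn=106, fp=19)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

With these assumed counts the rounded values agree with the reported 0.942 and 0.848, although other count combinations could round to the same figures.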
Affiliation(s)
- Jun-Ichi Takeda
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
| | - Kentaro Nanatsue
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
| | - Ryosuke Yamagishi
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
| | - Mikako Ito
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
| | - Nobuhiko Haga
- Department of Rehabilitation Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8655, Japan
| | - Hiromi Hirata
- Department of Chemistry and Biological Science, College of Science and Engineering, Aoyama Gakuin University, 5-10-1 Fuchinobe, Chuo-ku, Sagamihara 252-5258, Japan
| | - Tomoo Ogi
- Department of Genetics, Research Institute of Environmental Medicine (RIeM), Nagoya University, Furo, Chikusa-ku, Nagoya 464-8601, Japan
| | - Kinji Ohno
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
| |
|
26
|
Mugal CF, Kutschera VE, Botero-Castro F, Wolf JBW, Kaj I. Polymorphism Data Assist Estimation of the Nonsynonymous over Synonymous Fixation Rate Ratio ω for Closely Related Species. Mol Biol Evol 2020; 37:260-279. [PMID: 31504782 PMCID: PMC6984366 DOI: 10.1093/molbev/msz203] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open
Abstract
The ratio of nonsynonymous over synonymous sequence divergence, dN/dS, is a widely used estimate of the nonsynonymous over synonymous fixation rate ratio ω, which measures the extent to which natural selection modulates protein sequence evolution. Its computation is based on a phylogenetic approach and computes sequence divergence of protein-coding DNA between species, traditionally using a single representative DNA sequence per species. This approach ignores the presence of polymorphisms and relies on the indirect assumption that new mutations fix instantaneously, an assumption which is generally violated and reasonable only for distantly related species. The violation of the underlying assumption leads to a time-dependence of sequence divergence, and biased estimates of ω in particular for closely related species, where the contribution of ancestral and lineage-specific polymorphisms to sequence divergence is substantial. We here use a time-dependent Poisson random field model to derive an analytical expression of dN/dS as a function of divergence time and sample size. We then extend our framework to the estimation of the proportion of adaptive protein evolution α. This mathematical treatment enables us to show that the joint usage of polymorphism and divergence data can assist the inference of selection for closely related species. Moreover, our analytical results provide the basis for a protocol for the estimation of ω and α for closely related species. We illustrate the performance of this protocol by studying a population data set of four corvid species, which involves the estimation of ω and α at different time-scales and for several choices of sample sizes.
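For orientation, the two quantities discussed in the abstract above can be sketched with their classical estimators: ω approximated by dN/dS, and α from a McDonald-Kreitman-style contrast of divergence and polymorphism counts, α = 1 − (Ds·Pn)/(Dn·Ps). This is the textbook estimator, not the time-dependent Poisson random field model the authors derive, and all counts below are hypothetical.

```python
def omega(dn, ds):
    """Classical estimate of omega as the ratio dN/dS of per-site divergences."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    return dn / ds


def alpha_mk(Dn, Ds, Pn, Ps):
    """McDonald-Kreitman-style estimate of the proportion of adaptive substitutions.

    Dn, Ds: nonsynonymous / synonymous fixed differences between species.
    Pn, Ps: nonsynonymous / synonymous polymorphisms within species.
    """
    if Dn <= 0 or Ps <= 0:
        raise ValueError("Dn and Ps must be positive")
    return 1.0 - (Ds * Pn) / (Dn * Ps)


# Hypothetical counts (assumption, not taken from the corvid data set).
w = omega(dn=0.02, ds=0.10)
a = alpha_mk(Dn=40, Ds=100, Pn=20, Ps=100)
print(w, a)
```

The abstract's point is that, for closely related species, segregating polymorphisms inflate the divergence counts fed into estimators like these, which is why a time-dependent correction is needed.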
Affiliation(s)
- Carina F Mugal
- Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
| | - Verena E Kutschera
- Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
| | - Fidel Botero-Castro
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
| | - Jochen B W Wolf
- Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
| | - Ingemar Kaj
- Department of Mathematics, Uppsala University, Uppsala, Sweden
| |
|
27
|
Joly-Lopez Z, Platts AE, Gulko B, Choi JY, Groen SC, Zhong X, Siepel A, Purugganan MD. An inferred fitness consequence map of the rice genome. Nat Plants 2020; 6:119-130. [PMID: 32042156 PMCID: PMC7446671 DOI: 10.1038/s41477-019-0589-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/18/2019] [Accepted: 12/20/2019] [Indexed: 05/04/2023]
Abstract
The extent to which sequence variation impacts plant fitness is poorly understood. High-resolution maps detailing the constraint acting on the genome, especially in regulatory sites, would be beneficial as functional annotation of noncoding sequences remains sparse. Here, we present a fitness consequence (fitCons) map for rice (Oryza sativa). We inferred fitCons scores (ρ) for 246 inferred genome classes derived from nine functional genomic and epigenomic datasets, including chromatin accessibility, messenger RNA/small RNA transcription, DNA methylation, histone modifications and engaged RNA polymerase activity. These were integrated with genome-wide polymorphism and divergence data from 1,477 rice accessions and 11 reference genome sequences in the Oryzeae. We found ρ to be multimodal, with ~9% of the rice genome falling into classes where more than half of the bases would probably have a fitness consequence if mutated. Around 2% of the rice genome showed evidence of weak negative selection, frequently at candidate regulatory sites, including a novel set of 1,000 potentially active enhancer elements. This fitCons map provides perspective on the evolutionary forces associated with genome diversity, aids in genome annotation and can guide crop breeding programs.
Affiliation(s)
- Zoé Joly-Lopez
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
| | - Adrian E Platts
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Brad Gulko
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jae Young Choi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
| | - Simon C Groen
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
| | - Xuehua Zhong
- Laboratory of Genetics and Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
| | - Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Michael D Purugganan
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA.
- Center for Genomics and Systems Biology, NYU Abu Dhabi Research Institute, NYU Abu Dhabi, Abu Dhabi, United Arab Emirates.
| |
|
28
|
Moutinho AF, Bataillon T, Dutheil JY. Variation of the adaptive substitution rate between species and within genomes. Evol Ecol 2019. [DOI: 10.1007/s10682-019-10026-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
The importance of adaptive mutations in molecular evolution is extensively debated. Recent developments in population genomics allow inferring rates of adaptive mutations by fitting a distribution of fitness effects to the observed patterns of polymorphism and divergence at sites under selection and sites assumed to evolve neutrally. Here, we summarize the current state-of-the-art of these methods and review the factors that affect the molecular rate of adaptation. Several studies have reported extensive cross-species variation in the proportion of adaptive amino-acid substitutions (α) and predicted that species with larger effective population sizes undergo less genetic drift and higher rates of adaptation. Disentangling the rates of positive and negative selection, however, revealed that mutations with deleterious effects are the main driver of this population size effect and that adaptive substitution rates vary comparatively little across species. Conversely, rates of adaptive substitution have been documented to vary substantially within genomes. On a genome-wide scale, gene density, recombination and mutation rate were observed to play a role in shaping molecular rates of adaptation, as predicted under models of linked selection. At the gene level, it has been reported that the gene functional category and the macromolecular structure substantially impact the rate of adaptive mutations. Here, we deliver a comprehensive review of methods used to infer the molecular adaptive rate, the potential drivers of adaptive evolution and how positive selection shapes molecular evolution within genes, across genes within species and between species.
|
29
|
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Inf Fusion 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 235] [Impact Index Per Article: 39.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Francis Nguyen
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
| | - Bo Wang
- Hikvision Research Institute, Santa Clara, CA, USA
| | - Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Anna Goldenberg
- Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| | - Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| |
|
30
|
Arneson A, Ernst J. Systematic discovery of conservation states for single-nucleotide annotation of the human genome. Commun Biol 2019; 2:248. [PMID: 31286065 PMCID: PMC6606595 DOI: 10.1038/s42003-019-0488-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2019] [Accepted: 05/30/2019] [Indexed: 12/12/2022] Open
Abstract
Comparative genomics sequence data is an important source of information for interpreting genomes. Genome-wide annotations based on this data have largely focused on univariate scores or binary elements of evolutionary constraint. Here we present a complementary whole genome annotation approach, ConsHMM, which applies a multivariate hidden Markov model to learn de novo 'conservation states' based on the combinatorial and spatial patterns of which species align to and match a reference genome in a multiple species DNA sequence alignment. We applied ConsHMM to a 100-way vertebrate sequence alignment to annotate the human genome at single nucleotide resolution into 100 conservation states. These states have distinct enrichments for other genomic information including gene annotations, chromatin states, repeat families, and bases prioritized by various variant prioritization scores. Constrained elements have distinct heritability partitioning enrichments depending on their conservation state assignment. ConsHMM conservation states are a resource for analyzing genomes and genetic variants.
Affiliation(s)
- Adriana Arneson
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095 USA
| | - Jason Ernst
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, Los Angeles, CA 90095 USA
- Computer Science Department, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095 USA
| |
|
31
|
Huang YF, Siepel A. Estimation of allele-specific fitness effects across human protein-coding sequences and implications for disease. Genome Res 2019; 29:1310-1321. [PMID: 31249063 PMCID: PMC6673719 DOI: 10.1101/gr.245522.118] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2018] [Accepted: 06/20/2019] [Indexed: 12/16/2022]
Abstract
A central challenge in human genomics is to understand the cellular, evolutionary, and clinical significance of genetic variants. Here, we introduce a unified population-genetic and machine-learning model, called Linear Allele-Specific Selection InferencE (LASSIE), for estimating the fitness effects of all observed and potential single-nucleotide variants, based on polymorphism data and predictive genomic features. We applied LASSIE to 51 high-coverage genome sequences annotated with 33 genomic features and constructed a map of allele-specific selection coefficients across all protein-coding sequences in the human genome. This map is generally consistent with previous inferences of the bulk distribution of fitness effects but reveals pervasive weak negative selection against synonymous mutations. In addition, the estimated selection coefficients are highly predictive of inherited pathogenic variants and cancer driver mutations, outperforming state-of-the-art variant prioritization methods. By contrasting our estimated model with ultrahigh coverage ExAC exome-sequencing data, we identified 1118 genes under unusually strong negative selection, which tend to be exclusively expressed in the central nervous system or associated with autism spectrum disorder, as well as 773 genes under unusually weak selection, which tend to be associated with metabolism. This combination of classical population genetic theory with modern machine-learning and large-scale genomic data is a powerful paradigm for the study of both human evolution and disease.
Affiliation(s)
- Yi-Fei Huang
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| |
|
32
|
Walter Costa MB, Höner zu Siederdissen C, Dunjić M, Stadler PF, Nowick K. SSS-test: a novel test for detecting positive selection on RNA secondary structure. BMC Bioinformatics 2019; 20:151. [PMID: 30898084 PMCID: PMC6429701 DOI: 10.1186/s12859-019-2711-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2018] [Accepted: 03/03/2019] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) play an important role in regulating gene expression and are thus important for determining phenotypes. Most attempts to measure selection in lncRNAs have focused on the primary sequence. The majority of small RNAs and at least some parts of lncRNAs must fold into specific structures to perform their biological function. Comprehensive assessments of selection acting on RNAs therefore must also encompass structure. Selection pressures acting on the structure of non-coding genes can be detected within multiple sequence alignments. Approaches of this type, however, have so far focused on negative selection. Thus, a computational method for identifying ncRNAs under positive selection is needed. RESULTS We introduce the SSS-test (test for Selection on Secondary Structure) to identify positive selection and thus adaptive evolution. Benchmarks with biological as well as synthetic controls yield coherent signals for both negative and positive selection, demonstrating the functionality of the test. A survey of a lncRNA collection comprising 15,443 families resulted in 110 candidates that appear to be under positive selection in human. In 26 lncRNAs that have been associated with psychiatric disorders we identified local structures that have signs of positive selection in the human lineage. CONCLUSIONS It is feasible to assay positive selection acting on RNA secondary structures on a genome-wide scale. The detection of human-specific positive selection in lncRNAs associated with cognitive disorder provides a set of candidate genes for further experimental testing and may provide insights into the evolution of cognitive abilities in humans. AVAILABILITY The SSS-test and related software is available at: https://github.com/waltercostamb/SSS-test . The databases used in this work are available at: http://www.bioinf.uni-leipzig.de/Software/SSS-test/ .
Affiliation(s)
- Maria Beatriz Walter Costa
- Embrapa Agroenergia, Parque Estação Biológica (PqEB), Asa Norte, Brasília, DF, 70770-901 Brazil
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, 04107 Germany
| | - Christian Höner zu Siederdissen
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, 04107 Germany
| | - Marko Dunjić
- Human Biology Group, Institute for Biology, Department of Biology, Chemistry, Pharmacy, Freie Universitaet Berlin, Königin-Luise-Straße 1-3, Berlin, 14195 Germany
- Center for Human Molecular Genetics, Faculty of Biology, University of Belgrade, Studentski trg 16, PO box 43, Belgrade, 11000 Serbia
| | - Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, 04107 Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig & Competence Center for Scalable Data Services and Solutions Dresden-Leipzig & Leipzig Research Center for Civilization Diseases, University Leipzig, Leipzig, 04107 Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, 04103 Germany
- Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090 Austria
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
- Faculdad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, Bogotá, D.C., COL-111321 Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM87501 USA
| | - Katja Nowick
- Human Biology Group, Institute for Biology, Department of Biology, Chemistry, Pharmacy, Freie Universitaet Berlin, Königin-Luise-Straße 1-3, Berlin, 14195 Germany
- TFome Research Group, Bioinformatics Group, Interdisciplinary Center of Bioinformatics, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, 04107 Germany
- Paul-Flechsig-Institute for Brain Research, University of Leipzig, Liebigstraße 19. Haus C, Leipzig, 04103 Germany
- Bioinformatics, Faculty of Agricultural Sciences, Institute of Animal Science, University of Hohenheim, Garbenstraße 13, Stuttgart, 70593 Germany
| |
|
33
|
Tataru P, Bataillon T. polyDFEv2.0: testing for invariance of the distribution of fitness effects within and across species. Bioinformatics 2019; 35:2868-2869. [DOI: 10.1093/bioinformatics/bty1060] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2018] [Revised: 12/18/2018] [Accepted: 01/03/2019] [Indexed: 01/01/2023] Open
Abstract
Summary
Distribution of fitness effects (DFE) of mutations can be inferred from site frequency spectrum (SFS) data. There is mounting interest to determine whether distinct genomic regions and/or species share a common DFE, or whether evidence exists for differences among them. polyDFEv2.0 fits multiple SFS datasets at once and provides likelihood ratio tests for DFE invariance across datasets. Simulations show that testing for DFE invariance across genomic regions within a species requires models accounting for distinct sources of heterogeneity (chance and genuine difference in DFE) underlying differences in SFS data in these regions. Not accounting for this will result in the spurious detection of DFE differences.
Availability and Implementation
polyDFEv2.0 is implemented in C and is accompanied by a series of R functions that facilitate post-processing of the output. It is available as source code and compiled binaries under a GNU General Public License v3.0 from https://github.com/paula-tataru/polyDFE.
Supplementary information
Supplementary data are available at Bioinformatics online.
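The likelihood ratio test mentioned in the Summary above can be sketched generically as follows: twice the log-likelihood difference between a model with separate DFE parameters per dataset and a nested model with a shared DFE is compared against a chi-square distribution. This is a minimal standard-library sketch, not polyDFE's own code or output format; the function names and the log-likelihood values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{(df-1)/2} y^(i - 1/2) / Gamma(i + 1/2)
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(
        y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1)
    )
    return tail


def likelihood_ratio_test(lnL_null, lnL_alt, extra_params):
    """Return (statistic, p-value) for nested models differing by extra_params."""
    stat = max(2.0 * (lnL_alt - lnL_null), 0.0)  # clamp tiny negative optimiser noise
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods (assumption): shared DFE vs. dataset-specific DFE.
stat, p = likelihood_ratio_test(lnL_null=-1050.0, lnL_alt=-1045.0, extra_params=2)
print(stat, p)
```

As the abstract cautions, a small p-value from such a test only indicates a genuine DFE difference if the models being compared also account for other sources of heterogeneity between the SFS datasets.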
Affiliation(s)
- Paula Tataru
- Bioinformatics Research Centre, Aarhus University, DK Aarhus, Denmark
| | - Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University, DK Aarhus, Denmark
| |
|
34
|
Gulko B, Siepel A. An evolutionary framework for measuring epigenomic information and estimating cell-type-specific fitness consequences. Nat Genet 2018; 51:335-342. [PMID: 30559490 DOI: 10.1038/s41588-018-0300-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2018] [Accepted: 10/30/2018] [Indexed: 01/22/2023]
Abstract
Here we ask the question "How much information do epigenomic datasets provide about human genomic function?" We consider nine epigenomic features across 115 cell types and measure information about function as a reduction in entropy under a probabilistic evolutionary model fitted to human and nonhuman primate genomes. Several epigenomic features yield more information in combination than they do individually. We find that the entropy in human genetic variation predominantly reflects a balance between mutation and neutral drift. Our cell-type-specific FitCons scores reveal relationships among cell types and suggest that around 8% of nucleotide sites are constrained by natural selection.
Affiliation(s)
- Brad Gulko
  - Graduate Field of Computer Science, Cornell University, Ithaca, NY, USA
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Adam Siepel
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
|
35
|
Zhou Y, Fujikura K, Mkrtchian S, Lauschke VM. Computational Methods for the Pharmacogenetic Interpretation of Next Generation Sequencing Data. Front Pharmacol 2018; 9:1437. [PMID: 30564131 PMCID: PMC6288784 DOI: 10.3389/fphar.2018.01437] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 08/03/2018] [Accepted: 11/20/2018] [Indexed: 12/21/2022] Open
Abstract
Up to half of all patients do not respond to pharmacological treatment as intended. A substantial fraction of these inter-individual differences is due to heritable factors, and a growing number of associations between genetic variations and drug response phenotypes have been identified. Importantly, the rapid progress in Next Generation Sequencing technologies in recent years has unveiled the true complexity of the genetic landscape in pharmacogenes, with tens of thousands of rare genetic variants. As each individual was found to harbor numerous such rare variants, they are anticipated to be important contributors to the genetically encoded inter-individual variability in drug effects. The fundamental challenge, however, is their functional interpretation, because the sheer scale of the problem renders systematic experimental characterization of these variants currently unfeasible. Here, we review concepts and important progress in the development of computational prediction methods that allow the effects of amino acid sequence alterations in drug metabolizing enzymes and transporters to be evaluated. In addition, we discuss recent advances in the interpretation of functional effects of non-coding variants, such as variations in splice sites, regulatory regions and miRNA binding sites. We anticipate that these methodologies will provide a useful toolkit to facilitate the integration of the vast extent of rare genetic variability into drug response predictions in a precision medicine framework.
Affiliation(s)
- Yitian Zhou
  - Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Kohei Fujikura
  - Department of Diagnostic Pathology, Kobe University Graduate School of Medicine, Kobe, Japan
- Souren Mkrtchian
  - Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Volker M. Lauschke
  - Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
|
36
|
Savisaar R, Hurst LD. Exonic splice regulation imposes strong selection at synonymous sites. Genome Res 2018; 28:1442-1454. [PMID: 30143596 PMCID: PMC6169883 DOI: 10.1101/gr.233999.117] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 01/11/2018] [Accepted: 07/31/2018] [Indexed: 01/17/2023]
Abstract
What proportion of coding sequence nucleotides have roles in splicing, and how strong is the selection that maintains them? Despite a large body of research into exonic splice regulatory signals, these questions have not been answered. This is because, to our knowledge, previous investigations have not explicitly disentangled the frequency of splice regulatory elements from the strength of the evolutionary constraint under which they evolve. Current data are consistent both with a scenario of weak and diffuse constraint, enveloping large swaths of sequence, as well as with well-defined pockets of strong purifying selection. In the former case, natural selection on exonic splice enhancers (ESEs) might primarily act as a slight modifier of codon usage bias. In the latter, mutations that disrupt ESEs are likely to have large fitness and, potentially, clinical effects. To distinguish between these scenarios, we used several different methods to determine the distribution of selection coefficients for new mutations within ESEs. The analyses converged to suggest that ∼15%-20% of fourfold degenerate sites are part of functional ESEs. Most of these sites are under strong evolutionary constraint. Therefore, exonic splice regulation does not simply impose a weak bias that gently nudges coding sequence evolution in a particular direction. Rather, the selection to preserve these motifs is a strong force that severely constrains the evolution of a substantial proportion of coding nucleotides. Thus synonymous mutations that disrupt ESEs should be considered as a potentially common cause of single-locus genetic disorders.
Affiliation(s)
- Rosina Savisaar
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
- Laurence D Hurst
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
|
37
|
Lee KS, Chatterjee P, Choi EY, Sung MK, Oh J, Won H, Park SM, Kim YJ, Yi SV, Choi JK. Selection on the regulation of sympathetic nervous activity in humans and chimpanzees. PLoS Genet 2018; 14:e1007311. [PMID: 29672586 PMCID: PMC5908061 DOI: 10.1371/journal.pgen.1007311] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/08/2017] [Accepted: 03/17/2018] [Indexed: 12/31/2022] Open
Abstract
Adrenergic α2C receptor (ADRA2C) is an inhibitory modulator of the sympathetic nervous system. Knockout mice for this gene show physiological and behavioural alterations that are associated with the fight-or-flight response. There is evidence of positive selection on the regulation of this gene during chicken domestication. Here, we find that the neuronal expression of ADRA2C is lower in human and chimpanzee than in other primates. On the basis of three-dimensional chromatin structure, we identified a cis-regulatory region whose DNA sequences have been significantly accelerated in human and chimpanzee. Active histone modification marks this region in rhesus macaque but not in human and chimpanzee; instead, repressive marks are enriched in various human brain samples. This region contains two neuron-restrictive silencer factor (NRSF) binding motifs, each of which harbours a polymorphism. Our genotyping and analysis of population genome data indicate that at both polymorphic sites, the derived allele has reached fixation in humans and chimpanzees but not in bonobos, whereas only the ancestral allele is present among macaques. Our CRISPR/Cas9 genome editing and reporter assays show that both derived nucleotides repress ADRA2C, most likely by increasing NRSF binding. In addition, we detected signatures of recent positive selection for lower neuronal ADRA2C expression in humans. Our findings indicate that there has been selective pressure for enhanced sympathetic nervous activity in the evolution of humans and chimpanzees.
Affiliation(s)
- Kang Seon Lee
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Paramita Chatterjee
  - School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Eun-Young Choi
  - Specific Organs Cancer Branch, Research Institute, National Cancer Center, Ilsan, Gyeonggi, Republic of Korea
- Min Kyung Sung
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Jaeho Oh
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Hyejung Won
  - Department of Neurology, University of California Los Angeles, Los Angeles, California, United States of America
- Seong-Min Park
  - Specific Organs Cancer Branch, Research Institute, National Cancer Center, Ilsan, Gyeonggi, Republic of Korea
- Youn-Jae Kim
  - Specific Organs Cancer Branch, Research Institute, National Cancer Center, Ilsan, Gyeonggi, Republic of Korea
- Soojin V. Yi
  - School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Jung Kyoon Choi
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
|
38
|
Danko CG, Choate LA, Marks BA, Rice EJ, Wang Z, Chu T, Martins AL, Dukler N, Coonrod SA, Tait Wojno ED, Lis JT, Kraus WL, Siepel A. Dynamic evolution of regulatory element ensembles in primate CD4+ T cells. Nat Ecol Evol 2018; 2:537-548. [PMID: 29379187 PMCID: PMC5957490 DOI: 10.1038/s41559-017-0447-5] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 10/31/2017] [Accepted: 12/08/2017] [Indexed: 12/12/2022]
Abstract
How evolutionary changes at enhancers affect the transcription of target genes remains an important open question. Previous comparative studies of gene expression have largely measured the abundance of messenger RNA, which is affected by post-transcriptional regulatory processes, hence limiting inferences about the mechanisms underlying expression differences. Here, we directly measured nascent transcription in primate species, allowing us to separate transcription from post-transcriptional regulation. We used precision run-on and sequencing to map RNA polymerases in resting and activated CD4+ T cells in multiple human, chimpanzee and rhesus macaque individuals, with rodents as outgroups. We observed general conservation in coding and non-coding transcription, punctuated by numerous differences between species, particularly at distal enhancers and non-coding RNAs. Genes regulated by larger numbers of enhancers are more frequently transcribed at evolutionarily stable levels, despite reduced conservation at individual enhancers. Adaptive nucleotide substitutions are associated with lineage-specific transcription and at one locus, SGPP2, we predict and experimentally validate that multiple substitutions contribute to human-specific transcription. Collectively, our findings suggest a pervasive role for evolutionary compensation across ensembles of enhancers that jointly regulate target genes.
Affiliation(s)
- Charles G Danko
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
  - Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Lauren A Choate
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Brooke A Marks
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Edward J Rice
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Zhong Wang
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Tinyi Chu
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
  - Graduate Field of Computational Biology, Cornell University, Ithaca, NY, USA
- Andre L Martins
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
  - Graduate Field of Computational Biology, Cornell University, Ithaca, NY, USA
- Noah Dukler
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
  - Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA
- Scott A Coonrod
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
  - Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Elia D Tait Wojno
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
  - Department of Microbiology and Immunology, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- John T Lis
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- W Lee Kraus
  - Laboratory of Signaling and Gene Regulation, Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Division of Basic Research, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Adam Siepel
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
|
39
|
Sánchez-Gracia A, Guirao-Rico S, Hinojosa-Alvarez S, Rozas J. Computational prediction of the phenotypic effects of genetic variants: basic concepts and some application examples in Drosophila nervous system genes. J Neurogenet 2017; 31:307-319. [DOI: 10.1080/01677063.2017.1398241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Alejandro Sánchez-Gracia
  - Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Sara Guirao-Rico
  - Center for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Bellaterra, Spain
- Silvia Hinojosa-Alvarez
  - Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Julio Rozas
  - Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
|
40
|
Inference of Distribution of Fitness Effects and Proportion of Adaptive Substitutions from Polymorphism Data. Genetics 2017; 207:1103-1119. [PMID: 28951530 PMCID: PMC5676230 DOI: 10.1534/genetics.117.300323] [Citation(s) in RCA: 94] [Impact Index Per Article: 11.8] [Received: 07/05/2017] [Accepted: 09/13/2017] [Indexed: 11/18/2022] Open
Abstract
The distribution of fitness effects (DFE) encompasses the fraction of deleterious, neutral, and beneficial mutations. It conditions the evolutionary trajectory of populations, as well as the rate of adaptive molecular evolution (α). Inferring DFE and α from patterns of polymorphism, as given through the site frequency spectrum (SFS) and divergence data, has been a longstanding goal of evolutionary genetics. A widespread assumption shared by previous inference methods is that beneficial mutations contribute only negligibly to the polymorphism data. Hence, a DFE comprising only deleterious mutations tends to be estimated from SFS data, and α is then predicted by contrasting the SFS with divergence data from an outgroup. We develop a hierarchical probabilistic framework that extends previous methods to infer DFE and α from polymorphism data alone. We use extensive simulations to examine the performance of our method. While an outgroup is still needed to obtain an unfolded SFS, we show that both a DFE comprising deleterious and beneficial mutations and α can be inferred without using divergence data. We also show that not accounting for the contribution of beneficial mutations to polymorphism data leads to substantially biased estimates of the DFE and α. We compare our framework with one of the most widely used inference methods available and apply it to a recently published chimpanzee exome data set.
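The simplest form of the polymorphism-versus-divergence contrast mentioned above is the classical McDonald-Kreitman-style estimator, α = 1 - (Ds × Pn) / (Dn × Ps), where Dn and Ds are nonsynonymous and synonymous divergence counts and Pn and Ps are the corresponding polymorphism counts. The Python sketch below implements only that textbook estimator, not the hierarchical framework of this paper or the DFE-based methods it extends; the function name and the counts in the usage note are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """McDonald-Kreitman-style estimate of the proportion of adaptive substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences to an outgroup.
    pn, ps: nonsynonymous and synonymous polymorphisms within the species.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive for the ratio to be defined")
    return 1.0 - (ds * pn) / (dn * ps)
```

With made-up counts, `mk_alpha(50, 100, 20, 80)` returns 0.5. Negative values are possible when segregating slightly deleterious mutations inflate Pn, which is one reason DFE-aware methods exist.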
|
41
|
Savisaar R, Hurst LD. Estimating the prevalence of functional exonic splice regulatory information. Hum Genet 2017; 136:1059-1078. [PMID: 28405812 PMCID: PMC5602102 DOI: 10.1007/s00439-017-1798-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 01/26/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022]
Abstract
In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1–4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the assumption that constraint only operates on exon ends, the conservation-based methods can be overly conservative.
Affiliation(s)
- Rosina Savisaar
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Laurence D Hurst
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
|
42
|
Madsen T, Hobolth A, Jensen JL, Pedersen JS. Significance evaluation in factor graphs. BMC Bioinformatics 2017; 18:199. [PMID: 28359297 PMCID: PMC5374669 DOI: 10.1186/s12859-017-1614-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/29/2016] [Accepted: 03/24/2017] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Factor graphs provide a flexible and general framework for specifying probability distributions. They can capture a range of popular and recent models for the analysis of genomics data as well as data from other scientific fields. Owing to the ever larger data sets encountered in genomics and the multiple-testing issues accompanying them, accurate significance evaluation is of great importance. We here address the problem of evaluating the statistical significance of observations from factor graph models. RESULTS Two novel numerical approximations for the evaluation of statistical significance are presented: the first uses importance sampling, and the second is based on a saddlepoint approximation. We develop algorithms to efficiently compute the approximations and compare them to naive sampling and the normal approximation. The individual merits of the methods are analysed both from a theoretical viewpoint and with simulations. A guideline for choosing between the normal approximation, saddlepoint approximation and importance sampling is also provided. Finally, the applicability of the methods is demonstrated with examples from cancer genomics, motif analysis and phylogenetics. CONCLUSIONS The applicability of saddlepoint approximation and importance sampling is demonstrated on known models in the factor graph framework. Using the two methods, we can substantially reduce computational cost without compromising accuracy. This contribution allows analyses of large datasets in the general factor graph framework.
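The contrast between naive sampling and importance sampling for small p-values can be shown on a toy problem. The Python sketch below estimates the upper-tail probability P(Z > t) of a standard normal variable in both ways, using a normal proposal shifted to the threshold and the likelihood-ratio weight exp(-t·x + t²/2). It is an illustration of the general technique only, not the factor-graph algorithm of the paper; all function names are hypothetical.

```python
import math
import random


def naive_tail(t: float, n: int, rng: random.Random) -> float:
    """Fraction of n standard-normal draws that exceed t."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > t)
    return hits / n


def importance_tail(t: float, n: int, rng: random.Random) -> float:
    """Importance-sampling estimate of P(Z > t) with proposal N(t, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)
        if x > t:
            # Weight = target density / proposal density at x.
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n


def exact_tail(t: float) -> float:
    """Closed-form P(Z > t), used here only as a reference value."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))
```

For a threshold such as t = 4, most naive runs of a few thousand draws record no exceedances at all, whereas the shifted proposal places about half of its draws in the tail, so the weighted estimate stabilizes with far fewer samples.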
Affiliation(s)
- Tobias Madsen
  - Department of Molecular Medicine, Aarhus University, Palle Juul-Jensens Boulevard 99, Aarhus, Denmark
  - Bioinformatics Research Center, Aarhus University, C.F. Møllers Allé 8, Aarhus, Denmark
- Asger Hobolth
  - Bioinformatics Research Center, Aarhus University, C.F. Møllers Allé 8, Aarhus, Denmark
- Jens Ledet Jensen
  - Department of Mathematics, Aarhus University, Ny Munkegade 118, Aarhus, Denmark
- Jakob Skou Pedersen
  - Department of Molecular Medicine, Aarhus University, Palle Juul-Jensens Boulevard 99, Aarhus, Denmark
  - Bioinformatics Research Center, Aarhus University, C.F. Møllers Allé 8, Aarhus, Denmark
|
43
|
Huang YF, Gulko B, Siepel A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet 2017; 49:618-624. [PMID: 28288115 PMCID: PMC5395419 DOI: 10.1038/ng.3810] [Citation(s) in RCA: 232] [Impact Index Per Article: 29.0] [Received: 08/15/2016] [Accepted: 02/13/2017] [Indexed: 12/17/2022]
Abstract
Many genetic variants that influence phenotypes of interest are located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which, therefore, are likely to be phenotypically important. LINSIGHT combines a generalized linear model for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the 'big data' available in modern genomics. We show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. In addition, we apply LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell type, tissue specificity, and constraints at associated promoters.
Affiliation(s)
- Yi-Fei Huang
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Brad Gulko
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
  - Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Adam Siepel
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
|
44
|
Schor IE, Degner JF, Harnett D, Cannavò E, Casale FP, Shim H, Garfield DA, Birney E, Stephens M, Stegle O, Furlong EEM. Promoter shape varies across populations and affects promoter evolution and expression noise. Nat Genet 2017; 49:550-558. [PMID: 28191888 DOI: 10.1038/ng.3791] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 09/14/2016] [Accepted: 01/20/2017] [Indexed: 12/29/2022]
Abstract
Animal promoters initiate transcription either at precise positions (narrow promoters) or dispersed regions (broad promoters), a distinction referred to as promoter shape. Although highly conserved, the functional properties of promoters with different shapes and the genetic basis of their evolution remain unclear. Here we used natural genetic variation across a panel of 81 Drosophila lines to measure changes in transcriptional start site (TSS) usage, identifying thousands of genetic variants affecting transcript levels (strength) or the distribution of TSSs within a promoter (shape). Our results identify promoter shape as a molecular trait that can evolve independently of promoter strength. Broad promoters typically harbor shape-associated variants, with signatures of adaptive selection. Single-cell measurements demonstrate that variants modulating promoter shape often increase expression noise, whereas heteroallelic interactions with other promoter variants alleviate these effects. These results uncover new functional properties of natural promoters and suggest the minimization of expression noise as an important factor in promoter evolution.
Affiliation(s)
- Ignacio E Schor
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Jacob F Degner
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Dermot Harnett
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Enrico Cannavò
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Francesco P Casale
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Heejung Shim
  - Department of Statistics, Purdue University, West Lafayette, Indiana, USA
- David A Garfield
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Ewan Birney
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Matthew Stephens
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Oliver Stegle
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Eileen E M Furlong
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
|
45
|
Joly-Lopez Z, Flowers JM, Purugganan MD. Developing maps of fitness consequences for plant genomes. Curr Opin Plant Biol 2016; 30:101-7. [PMID: 26950251 DOI: 10.1016/j.pbi.2016.02.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/20/2015] [Revised: 02/02/2016] [Accepted: 02/17/2016] [Indexed: 05/22/2023]
Abstract
Predicting the fitness consequences of mutations, and their concomitant impacts on molecular and cellular function as well as organismal phenotypes, is an important challenge in biology that has new relevance in an era when genomic data is readily available. The ability to construct genomewide maps of fitness consequences in plant genomes is a recent development that has profound implications for our ability to predict the fitness effects of mutations and discover functional elements. Here we highlight approaches to building fitness consequence maps to infer regions under selection. We emphasize computational methods applied primarily to the study of human disease that translate physical maps of within-species genome variation into maps of fitness effects of individual natural mutations. Maps of fitness consequences in plants, combined with traditional genetic approaches, could accelerate discovery of functional elements such as regulatory sequences in non-coding DNA and genetic polymorphisms associated with key traits, including agronomically-important traits such as yield and environmental stress responses.
Affiliation(s)
- Zoé Joly-Lopez
  - Center for Genomics and Systems Biology, Department of Biology, 12 Waverly Place, New York University, New York, NY 10003, United States
- Jonathan M Flowers
  - Center for Genomics and Systems Biology, Department of Biology, 12 Waverly Place, New York University, New York, NY 10003, United States
  - Center for Genomics and Systems Biology, NYU Abu Dhabi Research Institute, NYU Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
- Michael D Purugganan
  - Center for Genomics and Systems Biology, Department of Biology, 12 Waverly Place, New York University, New York, NY 10003, United States
  - Center for Genomics and Systems Biology, NYU Abu Dhabi Research Institute, NYU Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
|
46
|
Bailey SF, Bataillon T. Can the experimental evolution programme help us elucidate the genetic basis of adaptation in nature? Mol Ecol 2016; 25:203-18. [PMID: 26346808 PMCID: PMC5019151 DOI: 10.1111/mec.13378] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 05/12/2015] [Revised: 08/26/2015] [Accepted: 09/04/2015] [Indexed: 02/04/2023]
Abstract
A variety of approaches have been taken to characterize and identify the genetic basis of adaptation in nature, spanning theoretical models, experimental evolution studies and direct tests of natural populations. Theoretical models can provide formalized and detailed hypotheses regarding evolutionary processes and patterns, for which experimental evolution studies can then provide important proofs of concept and characterize what is biologically reasonable. Genetic and genomic data from natural populations then allow for the identification of the particular factors that have played, and continue to play, an important role in shaping adaptive evolution in the natural world. Further to this, experimental evolution studies allow for tests of theories that may be difficult or impossible to test in natural populations for logistical and methodological reasons and can even generate new insights, suggesting further refinement of existing theories. However, as experimental evolution studies often take place in a very particular set of controlled conditions (that is, simple environments, a small range of usually asexual species, and relatively short timescales), the question remains as to how applicable these experimental results are to natural populations. In this review, we discuss important insights coming from experimental evolution, focusing on four key topics tied to the evolutionary genetics of adaptation, and within those topics, we discuss the extent to which the experimental work complements and informs natural population studies. We finish by making suggestions for future work, in particular the need for natural population genomic time series data, as well as the necessity for studies that combine both experimental evolution and natural population approaches.
Affiliation(s)
- Susan F. Bailey
  - Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, DK-8000 Aarhus C, Denmark
- Thomas Bataillon
  - Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, DK-8000 Aarhus C, Denmark
|
47
|
Shadow Enhancers Are Pervasive Features of Developmental Regulatory Networks. Curr Biol 2015; 26:38-51. [PMID: 26687625 PMCID: PMC4712172 DOI: 10.1016/j.cub.2015.11.034] [Citation(s) in RCA: 166] [Impact Index Per Article: 16.6] [Received: 09/20/2015] [Revised: 11/16/2015] [Accepted: 11/17/2015] [Indexed: 11/22/2022]
Abstract
Embryogenesis is remarkably robust to segregating mutations and environmental variation; under a range of conditions, embryos of a given species develop into stereotypically patterned organisms. Such robustness is thought to be conferred, in part, through elements within regulatory networks that perform similar, redundant tasks. Redundant enhancers (or "shadow" enhancers), for example, can confer precision and robustness to gene expression, at least at individual, well-studied loci. However, the extent to which enhancer redundancy exists and can thereby have a major impact on developmental robustness remains unknown. Here, we systematically assessed this, identifying over 1,000 predicted shadow enhancers during Drosophila mesoderm development. The activity of 23 elements, associated with five genes, was examined in transgenic embryos, while natural structural variation among individuals was used to assess their ability to buffer against genetic variation. Our results reveal three clear properties of enhancer redundancy within developmental systems. First, it is much more pervasive than previously anticipated, with 64% of loci examined having shadow enhancers. Their spatial redundancy is often partial in nature, while the non-overlapping function may explain why these enhancers are maintained within a population. Second, over 70% of loci do not follow the simple situation of having only two shadow enhancers; often there are three (rols), four (CadN and ade5), or five (Traf1), at least one of which can be deleted with no obvious phenotypic effects. Third, although shadow enhancers can buffer variation, patterns of segregating variation suggest that they play a more complex role in development than generally considered.
|
48
|
Richardson K, Schnitzler GR, Lai CQ, Ordovas JM. Functional Genomics Analysis of Big Data Identifies Novel Peroxisome Proliferator-Activated Receptor γ Target Single Nucleotide Polymorphisms Showing Association With Cardiometabolic Outcomes. Circ Cardiovasc Genet 2015; 8:842-51. [PMID: 26518621 DOI: 10.1161/circgenetics.115.001174] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/12/2015] [Accepted: 10/22/2015] [Indexed: 11/16/2022]
Abstract
BACKGROUND Cardiovascular disease and type 2 diabetes mellitus represent overlapping diseases where a large portion of the variation attributable to genetics remains unexplained. An important player in their pathogenesis is peroxisome proliferator-activated receptor γ (PPARγ), which is involved in lipid and glucose metabolism and maintenance of metabolic homeostasis. We used a functional genomics methodology to interrogate human chromatin immunoprecipitation-sequencing, genome-wide association studies, and expression quantitative trait locus data to inform selection of candidate functional single nucleotide polymorphisms (SNPs) falling in PPARγ motifs. METHODS AND RESULTS We derived 27 328 chromatin immunoprecipitation-sequencing peaks for PPARγ in human adipocytes through meta-analysis of 3 data sets. The PPARγ consensus motif showed greatest enrichment and mapped to 8637 peaks. We identified 146 SNPs in these motifs. This number was significantly less than would be expected by chance, and Inference of Natural Selection from Interspersed Genomically coHerent elemenTs analysis indicated that these motifs are under weak negative selection. A screen of these SNPs against genome-wide association studies for cardiometabolic traits revealed significant enrichment with 16 SNPs. A screen against the MuTHER expression quantitative trait locus data revealed 8 of these were significantly associated with altered gene expression in human adipose, more than would be expected by chance. Several SNPs fall close to, or are linked by expression quantitative trait locus to, lipid-metabolism loci including CYP26A1. CONCLUSIONS We demonstrated the use of functional genomics to identify SNPs of potential function. Specifically, SNPs within PPARγ motifs that bind PPARγ in adipocytes are significantly associated with cardiometabolic disease and with the regulation of transcription in adipose. This method may be used to uncover functional SNPs that do not reach significance thresholds in the agnostic approach of genome-wide association studies.
Affiliation(s)
- Kris Richardson
- Gavin R Schnitzler
- Chao-Qiang Lai
- Jose M Ordovas
- From the Nutrition and Genomics Laboratory, Jean Mayer United States Department of Agriculture Human Nutrition Research Center on Aging at Tufts University, Boston, MA (K.R., C.-Q.L., J.M.O.); Molecular Cardiology Research Institute, Tufts Medical Center, Boston, MA (G.R.S.); Department of Clinical Investigation, Centro Nacional Investigaciones Cardiovasculares (CNIC), Madrid, Spain (J.M.O.); and Department of Nutritional Genomics, Instituto Madrileno de Estudios Avanzados en Alimentacion, Madrid, Spain (J.M.O.).
|
49
|
Transcriptional enhancers: functional insights and role in human disease. Curr Opin Genet Dev 2015; 33:71-6. [PMID: 26433090 PMCID: PMC4720706 DOI: 10.1016/j.gde.2015.08.009] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 04/05/2015] [Revised: 08/21/2015] [Accepted: 08/26/2015] [Indexed: 01/20/2023]
Abstract
In recent years, studies of cis-regulatory mechanisms have evolved from a predominant focus on promoter regions to the realization that spatial and temporal gene regulation is frequently driven by long-range enhancer clusters that operate within chromosomal compartments. This increased understanding of genome function, together with the emergence of technologies that enable whole-genome sequencing of patients’ DNA, opens the prospect of dissecting the role of cis-regulatory defects in human disease. In this review we discuss how recent epigenomic studies have provided insights into the function of transcriptional enhancers. We then present examples that illustrate how integrative genomics can help uncover enhancer sequence variants underlying Mendelian and common polygenic human disease.
|
50
|
Prescott SL, Srinivasan R, Marchetto MC, Grishina I, Narvaiza I, Selleri L, Gage FH, Swigut T, Wysocka J. Enhancer divergence and cis-regulatory evolution in the human and chimp neural crest. Cell 2015; 163:68-83. [PMID: 26365491 DOI: 10.1016/j.cell.2015.08.036] [Citation(s) in RCA: 244] [Impact Index Per Article: 24.4] [Received: 02/03/2015] [Revised: 05/06/2015] [Accepted: 07/21/2015] [Indexed: 01/23/2023]
Abstract
cis-regulatory changes play a central role in morphological divergence, yet the regulatory principles underlying emergence of human traits remain poorly understood. Here, we use epigenomic profiling from human and chimpanzee cranial neural crest cells to systematically and quantitatively annotate divergence of craniofacial cis-regulatory landscapes. Epigenomic divergence is often attributable to genetic variation within TF motifs at orthologous enhancers, with a novel motif being most predictive of activity biases. We explore properties of this cis-regulatory change, revealing the role of particular retroelements, uncovering broad clusters of species-biased enhancers near genes associated with human facial variation, and demonstrating that cis-regulatory divergence is linked to quantitative expression differences of crucial neural crest regulators. Our work provides a wealth of candidates for future evolutionary studies and demonstrates the value of "cellular anthropology," a strategy of using in-vitro-derived embryonic cell types to elucidate both fundamental and evolving mechanisms underlying morphological variation in higher primates.
Affiliation(s)
- Sara L Prescott
- Department of Chemical and Systems Biology and Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Rajini Srinivasan
- Department of Chemical and Systems Biology and Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Maria Carolina Marchetto
- Laboratory of Genetics, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
- Irina Grishina
- Department of Cell and Developmental Biology, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Iñigo Narvaiza
- Laboratory of Genetics, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
- Licia Selleri
- Department of Cell and Developmental Biology, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Fred H Gage
- Laboratory of Genetics, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA; Center for Academic Research and Training in Anthropogeny (CARTA), University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Tomek Swigut
- Department of Chemical and Systems Biology and Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Joanna Wysocka
- Department of Chemical and Systems Biology and Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA; Institute of Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
|