1
Chen Y, Liu S, Ren Z, Wang F, Liang Q, Jiang Y, Dai R, Duan F, Han C, Ning Z, Xia Y, Li M, Yuan K, Qiu W, Yan XX, Dai J, Kopp RF, Huang J, Xu S, Tang B, Wu L, Gamazon ER, Bigdeli T, Gershon E, Huang H, Ma C, Liu C, Chen C. Cross-ancestry analysis of brain QTLs enhances interpretation of schizophrenia genome-wide association studies. Am J Hum Genet 2024:S0002-9297(24)00336-7. [PMID: 39362218 DOI: 10.1016/j.ajhg.2024.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 09/04/2024] [Accepted: 09/06/2024] [Indexed: 10/05/2024]
Abstract
Research on brain expression quantitative trait loci (eQTLs) has illuminated the genetic underpinnings of schizophrenia (SCZ). Yet most of these studies have been centered on European populations, leading to a constrained understanding of population diversities and disease risks. To address this gap, we examined genotype and RNA-seq data from African Americans (AA, n = 158), Europeans (EUR, n = 408), and East Asians (EAS, n = 217). When comparing eQTLs between EUR and non-EUR populations, we observed concordant patterns of genetic regulatory effect, particularly in terms of the effect sizes of the eQTLs. However, 343,737 cis-eQTLs linked to 1,276 genes and 198,769 SNPs were found to be specific to non-EUR populations. Over 90% of observed population differences in eQTLs could be traced back to differences in allele frequency. Furthermore, 35% of these eQTLs were notably rare in the EUR population. Integrating brain eQTLs with SCZ signals from diverse populations, we observed a higher disease heritability enrichment of brain eQTLs in matched populations compared to mismatched ones. Prioritization analysis identified five risk genes (SFXN2, VPS37B, DENR, FTCDNL1, and NT5DC2) and three potential regulatory variants in known risk genes (CNNM2, MTRFR, and MPHOSPH9) that were missed in the EUR dataset. Our findings underscore that increasing genetic ancestral diversity is more efficient for power improvement than merely increasing the sample size within single-ancestry eQTL datasets. Such a strategy will not only improve our understanding of the biological underpinnings of population structures but also pave the way for the identification of risk genes in SCZ.
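A cis-eQTL is a SNP whose genotype is associated with the expression of a nearby gene. In its simplest form, the association test regresses expression on allele dosage (0, 1, or 2 copies of the alternative allele). The minimal Python sketch below shows only that core step under stated assumptions. It is not the authors' pipeline, which the abstract does not describe. Real eQTL mapping typically also adjusts for covariates and corrects for multiple testing. The function name `eqtl_slope` and the toy data are hypothetical.

```python
import math


def eqtl_slope(dosage, expression):
    """Fit expression ~ intercept + beta * dosage by ordinary least squares.

    dosage: alternative-allele counts (0, 1, 2) per sample (hypothetical input)
    expression: normalized expression values for one gene, same sample order
    Returns (beta, t_stat); no covariates are included in this sketch.
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need paired data for at least 3 samples")
    mean_g = sum(dosage) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosage)
    if sxx == 0:
        raise ValueError("monomorphic SNP: genotype has no variance")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosage, expression))
    beta = sxy / sxx
    intercept = mean_e - beta * mean_g
    # Residual variance uses n - 2 degrees of freedom (intercept + slope).
    rss = sum((e - intercept - beta * g) ** 2 for g, e in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: t is unbounded unless the slope itself is zero.
        t_stat = 0.0 if beta == 0 else math.copysign(math.inf, beta)
    else:
        t_stat = beta / se
    return beta, t_stat


# Toy example: expression rises with each copy of the alternative allele.
beta, t = eqtl_slope([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.1, 3.1, 2.9])
print(round(beta, 2), round(t, 1))
```

The slope is the per-allele effect size, which is the quantity the abstract compares across EUR and non-EUR populations.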
Affiliation(s)
- Yu Chen
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Sihan Liu
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China; Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, China
- Zongyao Ren
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China
- Feiran Wang
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China
- Qiuman Liang
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China
- Yi Jiang
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China
- Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- Fangyuan Duan
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China
- Cong Han
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China
- Zhilin Ning
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yan Xia
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Miao Li
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China
- Kai Yuan
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Wenying Qiu
- Institute of Basic Medical Sciences, Neuroscience Center, National Human Brain Bank for Development and Function, Chinese Academy of Medical Sciences, Department of Human Anatomy, Histology and Embryology, School of Basic Medicine, Peking Union Medical College, Beijing, China
- Xiao-Xin Yan
- Department of Human Anatomy and Neurobiology, Xiangya School of Medicine, Central South University, Changsha, China
- Jiapei Dai
- Wuhan Institute for Neuroscience and Engineering, South-Central University for Nationalities, Wuhan, China
- Richard F Kopp
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- Jufang Huang
- Department of Human Anatomy and Neurobiology, Xiangya School of Medicine, Central South University, Changsha, China
- Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Beisha Tang
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Lingqian Wu
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China
- Eric R Gamazon
- Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- Tim Bigdeli
- Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Elliot Gershon
- Department of Psychiatry and Behavioral Neuroscience, University of Chicago, Chicago, IL, USA
- Chao Ma
- Institute of Basic Medical Sciences, Neuroscience Center, National Human Brain Bank for Development and Function, Chinese Academy of Medical Sciences, Department of Human Anatomy, Histology and Embryology, School of Basic Medicine, Peking Union Medical College, Beijing, China
- Chunyu Liu
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China; Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA.
- Chao Chen
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410000, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China; Hunan Key Laboratory of Animal Models for Human Diseases, Central South University, Changsha, China.
2
Chen Y, Liu S, Ren Z, Wang F, Jiang Y, Dai R, Duan F, Han C, Ning Z, Xia Y, Li M, Yuan K, Qiu W, Yan XX, Dai J, Kopp RF, Huang J, Xu S, Tang B, Gamazon ER, Bigdeli T, Gershon E, Huang H, Ma C, Liu C, Chen C. Brain eQTLs of European, African American, and Asian ancestry improve interpretation of schizophrenia GWAS. medRxiv: The Preprint Server for Health Sciences 2024:2024.02.13.24301833. [PMID: 38405973 PMCID: PMC10888997 DOI: 10.1101/2024.02.13.24301833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Research on brain expression quantitative trait loci (eQTLs) has illuminated the genetic underpinnings of schizophrenia (SCZ). Yet the majority of these studies have been centered on European populations, leading to a constrained understanding of population diversities and disease risks. To address this gap, we examined genotype and RNA-seq data from African Americans (AA, n=158), Europeans (EUR, n=408), and East Asians (EAS, n=217). When comparing eQTLs between EUR and non-EUR populations, we observed concordant patterns of genetic regulatory effect, particularly in terms of the effect sizes of the eQTLs. However, 343,737 cis-eQTLs (representing ∼17% of all eQTL pairs) linked to 1,276 genes (about 10% of all eGenes) and 198,769 SNPs (approximately 16% of all eSNPs) were identified only in the non-EUR populations. Over 90% of observed population differences in eQTLs could be traced back to differences in allele frequency. Furthermore, 35% of these eQTLs were notably rare (MAF < 0.05) in the EUR population. Integrating brain eQTLs with SCZ signals from diverse populations, we observed a higher disease heritability enrichment of brain eQTLs in matched populations compared to mismatched ones. Prioritization analysis identified seven new risk genes (SFXN2, RP11-282018.3, CYP17A1, VPS37B, DENR, FTCDNL1, and NT5DC2) and three potential novel regulatory variants in known risk genes (CNNM2, C12orf65, and MPHOSPH9) that were missed in the EUR dataset. Our findings underscore that increasing genetic ancestral diversity is more efficient for power improvement than merely increasing the sample size within single-ancestry eQTL datasets. Such a strategy will not only improve our understanding of the biological underpinnings of population structures but also pave the way for the identification of novel risk genes in SCZ.
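The preprint defines "notably rare" in EUR as a minor allele frequency (MAF) below 0.05. The sketch below implements that filter. It assumes diploid genotypes coded as alternative-allele dosages, with `None` marking a missing call. The function names and toy genotypes are hypothetical, and this is not the authors' code.

```python
def minor_allele_frequency(dosages):
    """MAF from diploid alternative-allele dosages (0, 1, 2); None marks a missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele may be either the reference or the alternative allele.
    return min(alt_freq, 1 - alt_freq)


def rare_in_eur(eur_dosages, threshold=0.05):
    """True if the SNP's MAF in the EUR samples is below the threshold (0.05 in the preprint)."""
    return minor_allele_frequency(eur_dosages) < threshold


def fraction_rare_in_eur(eur_dosages_by_snp, threshold=0.05):
    """Share of SNPs (SNP ID -> EUR dosages) whose EUR MAF is below the threshold."""
    if not eur_dosages_by_snp:
        raise ValueError("no SNPs supplied")
    n_rare = sum(rare_in_eur(d, threshold) for d in eur_dosages_by_snp.values())
    return n_rare / len(eur_dosages_by_snp)


# Hypothetical EUR genotypes for two eQTL SNPs.
toy = {
    "snp_a": [0] * 20 + [1],       # one heterozygote in 21 samples: MAF ~0.024
    "snp_b": [0, 1, 1, 2, None],   # MAF 0.5; the missing call is ignored
}
print(fraction_rare_in_eur(toy))  # prints 0.5
```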
3
Wang X, Ingvarsson PK. Quantifying adaptive evolution and the effects of natural selection across the Norway spruce genome. Mol Ecol 2023; 32:5288-5304. [PMID: 37622583 DOI: 10.1111/mec.17106] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/22/2021] [Revised: 08/07/2023] [Accepted: 08/09/2023] [Indexed: 08/26/2023]
Abstract
Detecting natural selection is one of the major goals of evolutionary genomics. Here, we sequenced the whole genomes of 25 Picea abies individuals and quantified the amount of selection across the genome. Using an estimate of the distribution of fitness effects, we showed that both negative selection and the rate of positively selected substitutions are very limited in coding regions. We found a positive correlation between the rate of adaptive substitutions and recombination rate and a negative correlation between the rate of adaptive substitutions and gene density, suggesting a widespread influence of Hill-Robertson interference on the efficiency of protein adaptation in P. abies. Finally, the distinct population statistics of genomic regions under either positive or balancing selection, compared with those of neutral regions, indicated the impact of natural selection on the genomic architecture of Norway spruce. Further gene ontology enrichment analysis of genes located in regions identified as undergoing either positive or long-term balancing selection also highlighted the specific molecular functions and biological processes that appear to be targets of selection in Norway spruce.
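The study estimates the rate of adaptive substitutions from the distribution of fitness effects. That DFE-based method is too involved for a short example. As a simpler, swapped-in illustration of the same idea, the classic McDonald-Kreitman-style estimator gives the proportion of adaptive nonsynonymous substitutions as α = 1 − (Ds·Pn)/(Dn·Ps). Here D counts fixed differences between species and P counts polymorphisms within the species. The counts below are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Proportion of adaptive nonsynonymous substitutions, alpha = 1 - (Ds*Pn)/(Dn*Ps).

    dn, ds: nonsynonymous and synonymous fixed differences between species
    pn, ps: nonsynonymous and synonymous polymorphisms within the species
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for a set of coding regions.
alpha = mk_alpha(dn=50, ds=100, pn=20, ps=60)
print(round(alpha, 3))  # prints 0.333
```

Slightly deleterious nonsynonymous polymorphisms inflate Pn and bias this simple estimate downward. This bias is one reason DFE-based approaches, such as the one used in the abstract, are preferred.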
Affiliation(s)
- Xi Wang
- Umeå Plant Science Centre, Department of Ecology and Environmental Science, Umeå University, Umeå, Sweden
- Pär K Ingvarsson
- Linnean Centre for Plant Biology, Department of Plant Biology, Swedish University of Agricultural Sciences, Uppsala, Sweden
4
Liu Z, Samee M. Structural underpinnings of mutation rate variations in the human genome. Nucleic Acids Res 2023; 51:7184-7197. [PMID: 37395403 PMCID: PMC10415140 DOI: 10.1093/nar/gkad551] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/19/2022] [Revised: 06/06/2023] [Accepted: 06/15/2023] [Indexed: 07/04/2023]
Abstract
Single nucleotide mutation rates have critical implications for human evolution and genetic diseases. Importantly, the rates vary substantially across the genome, and the principles underlying such variations remain poorly understood. A recent model explained much of this variation by considering higher-order nucleotide interactions in the 7-mer sequence context around mutated nucleotides. This model's success implies a connection between DNA shape and mutation rates. DNA shape, i.e., structural properties like helical twist and tilt, is known to capture interactions between nucleotides within a local context. Thus, we hypothesized that changes in DNA shape features at and around mutated positions can explain mutation rate variations in the human genome. Indeed, DNA shape-based models of mutation rates showed similar or improved performance over current nucleotide sequence-based models. These models accurately characterized mutation hotspots in the human genome and revealed the shape features whose interactions underlie mutation rate variations. DNA shape also impacts mutation rates within putative functional regions like transcription factor binding sites, where we find a strong association between DNA shape and position-specific mutation rates. This work demonstrates the structural underpinnings of nucleotide mutations in the human genome and lays the groundwork for future models of genetic variations to incorporate DNA shape.
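The 7-mer sequence-context model mentioned above conditions the mutation rate on the three bases on each side of a site. The sketch below estimates per-context rates by simple counting. It assumes a single reference sequence and 0-based mutated positions, and it does not merge reverse-complement contexts. It illustrates sequence-context rates only, not the DNA shape models in the paper. The function name `context_rates` is hypothetical.

```python
from collections import Counter


def context_rates(sequence, mutated_positions, k=7):
    """Per-context mutation rate for k-mers centred on each base.

    rate(context) = (# mutated sites with that context) / (# sites with that context).
    Sites within k // 2 bases of either end, or whose context contains N, are skipped.
    Reverse-complement contexts are NOT merged in this sketch.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that each k-mer has a central base")
    half = k // 2
    seq = sequence.upper()
    mutated = set(mutated_positions)  # 0-based positions in `sequence`
    totals, hits = Counter(), Counter()
    for i in range(half, len(seq) - half):
        ctx = seq[i - half:i + half + 1]
        if "N" in ctx:
            continue
        totals[ctx] += 1
        if i in mutated:
            hits[ctx] += 1
    return {ctx: hits[ctx] / n for ctx, n in totals.items()}


rates = context_rates("ACGTACGTACGT", mutated_positions=[3])
print(rates["ACGTACG"])  # prints 0.5
```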
Affiliation(s)
- Zian Liu
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
- Md Abul Hassan Samee
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
5
Traniello IM, Bukhari SA, Dibaeinia P, Serrano G, Avalos A, Ahmed AC, Sankey AL, Hernaez M, Sinha S, Zhao SD, Catchen J, Robinson GE. Single-cell dissection of aggression in honeybee colonies. Nat Ecol Evol 2023; 7:1232-1244. [PMID: 37264201 DOI: 10.1038/s41559-023-02090-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/10/2022] [Accepted: 05/09/2023] [Indexed: 06/03/2023]
Abstract
Understanding how genotypic variation results in phenotypic variation is especially difficult for collective behaviour because group phenotypes arise from complex interactions among group members. A genome-wide association study identified hundreds of genes associated with colony-level variation in honeybee aggression, many of which also showed strong signals of positive selection, but the influence of these 'colony aggression genes' on brain function was unknown. Here we use single-cell (sc) transcriptomics and gene regulatory network (GRN) analyses to test the hypothesis that genetic variation for colony aggression influences individual differences in brain gene expression and/or gene regulation. We compared soldiers, which respond to territorial intrusion with stinging attacks, and foragers, which do not. Colony environment had a stronger influence on soldier-forager differences in brain gene regulation than on those in brain gene expression. GRN plasticity was strongly associated with colony aggression, with larger differences in GRN dynamics detected between soldiers and foragers from more aggressive relative to less aggressive colonies. The regulatory dynamics of subnetworks composed of colony aggression genes were more strongly correlated with each other across different cell types and brain regions relative to other genes, especially in brain regions involved with olfaction and vision and multimodal sensory integration, which are known to mediate bee aggression. These results show how group genetics can shape a collective phenotype by modulating individual brain gene regulatory network architecture.
Affiliation(s)
- Ian M Traniello
- Neuroscience Program, University of Illinois at Urbana-Champaign (UIUC), Urbana, IL, USA.
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA.
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
- Guillermo Serrano
- Computational Biology Program, CIMA University of Navarra, Pamplona, Spain
- Arian Avalos
- Honey Bee Breeding, Genetics and Physiology Research Laboratory, Agricultural Research Services, United States Department of Agriculture, Baton Rouge, LA, USA
- Amy Cash Ahmed
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA
- Alison L Sankey
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA
- Mikel Hernaez
- Computational Biology Program, CIMA University of Navarra, Pamplona, Spain
- Saurabh Sinha
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA
- Department of Computer Science, UIUC, Urbana, IL, USA
- Sihai Dave Zhao
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA
- Department of Statistics, UIUC, Urbana, IL, USA
- Julian Catchen
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA
- Department of Evolution, Ecology and Behavior, UIUC, Urbana, IL, USA
- Gene E Robinson
- Neuroscience Program, University of Illinois at Urbana-Champaign (UIUC), Urbana, IL, USA.
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA.
- Department of Entomology, UIUC, Urbana, IL, USA.
6
Tuncay IO, DeVries D, Gogate A, Kaur K, Kumar A, Xing C, Goodspeed K, Seyoum-Tesfa L, Chahrour MH. The genetics of autism spectrum disorder in an East African familial cohort. Cell Genomics 2023; 3:100322. [PMID: 37492102 PMCID: PMC10363748 DOI: 10.1016/j.xgen.2023.100322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 03/09/2023] [Accepted: 04/16/2023] [Indexed: 07/27/2023]
Abstract
Autism spectrum disorder (ASD) is a group of complex neurodevelopmental conditions affecting communication and social interaction in 2.3% of children. Studies that demonstrated its complex genetic architecture have been mainly performed in populations of European ancestry. We investigate the genetics of ASD in an East African cohort (129 individuals) from a population with higher prevalence (5%). Whole-genome sequencing identified 2.13 million private variants in the cohort and potentially pathogenic variants in known ASD genes (including CACNA1C, CHD7, FMR1, and TCF7L2). Admixture analysis demonstrated that the cohort comprises two ancestral populations, African and Eurasian. Admixture mapping discovered 10 regions that confer ASD risk on the African haplotypes, containing several known ASD genes. The increased ASD prevalence in this population suggests decreased heterogeneity in the underlying genetic etiology, enabling risk allele identification. Our approach emphasizes the power of African genetic variation and admixture analysis to inform the architecture of complex disorders.
Affiliation(s)
- Islam Oguz Tuncay
- Department of Neuroscience, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Darlene DeVries
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Ashlesha Gogate
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Kiran Kaur
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Ashwani Kumar
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Chao Xing
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Kimberly Goodspeed
- Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Department of Neurology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Maria H Chahrour
- Department of Neuroscience, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Peter O'Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
7
Mehta TK, Man A, Ciezarek A, Ranson K, Penman D, Di-Palma F, Haerty W. Chromatin accessibility in gill tissue identifies candidate genes and loci associated with aquaculture relevant traits in tilapia. Genomics 2023; 115:110633. [PMID: 37121445 DOI: 10.1016/j.ygeno.2023.110633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/02/2023]
Abstract
The Nile tilapia (Oreochromis niloticus) accounts for ∼9% of global freshwater finfish production; however, extreme cold weather and decreasing freshwater resources have created the need to develop resilient strains. By determining the genetic bases of aquaculture relevant traits, we can genotype and breed desirable traits into farmed strains. We generated ATAC-seq and gene expression data from O. niloticus gill tissues and, through the integration of SNPs from 27 tilapia species, identified 1168 highly expressed genes (4% of all Nile tilapia genes) with highly accessible promoter regions containing functional variation at transcription factor binding sites (TFBSs). Regulatory variation at these TFBSs is likely driving gene expression differences associated with tilapia gill adaptations, and it segregates differentially between freshwater and euryhaline tilapia species. The generation of novel integrative data revealed candidate genes (e.g., prolactin receptor 1 and claudin-h), genetic relationships, and loci associated with aquaculture relevant traits like salinity and osmotic stress acclimation.
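Checking whether a SNP falls inside an annotated TFBS is an interval-overlap test. The sketch below assumes BED-style, 0-based, half-open intervals on a single scaffold. It uses a brute-force scan that is adequate only for small inputs. The names and coordinates are hypothetical and do not come from the authors' data.

```python
def snps_in_tfbs(snp_positions, tfbs_intervals):
    """Return {position: [overlapping TFBS]} for SNPs inside any TFBS.

    Intervals are (start, end, name), BED-style 0-based half-open, so a site at
    position p is inside when start <= p < end. Brute-force scan: O(SNPs x sites).
    """
    hits = {}
    for pos in snp_positions:
        overlapping = [(s, e, name) for s, e, name in tfbs_intervals if s <= pos < e]
        if overlapping:
            hits[pos] = overlapping
    return hits


# Hypothetical promoter annotation on one scaffold.
tfbs = [(100, 110, "TF_A"), (105, 120, "TF_B")]
print(sorted(snps_in_tfbs([99, 100, 107, 119, 120], tfbs)))  # prints [100, 107, 119]
```

For genome-scale inputs, a sorted-interval or interval-tree approach would replace the nested scan.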
Affiliation(s)
- Keith Ranson
- Institute of Aquaculture, University of Stirling, Scotland, UK
- David Penman
- Institute of Aquaculture, University of Stirling, Scotland, UK
- Federica Di-Palma
- School of Biological Sciences, University of East Anglia, Norwich, UK; Genome British Columbia, Vancouver, Canada
- Wilfried Haerty
- Earlham Institute (EI), Norwich, UK; School of Biological Sciences, University of East Anglia, Norwich, UK
8
Zhang X, Fang B, Huang YF. Transcription factor binding sites are frequently under accelerated evolution in primates. Nat Commun 2023; 14:783. [PMID: 36774380 PMCID: PMC9922303 DOI: 10.1038/s41467-023-36421-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 01/31/2023] [Indexed: 02/13/2023]
Abstract
Recent comparative genomic studies have identified many human accelerated elements (HARs) with elevated substitution rates in the human lineage. However, it remains unknown to what extent transcription factor binding sites (TFBSs) are under accelerated evolution in humans and other primates. Here, we introduce two pooling-based phylogenetic methods with dramatically enhanced sensitivity to examine accelerated evolution in TFBSs. Using these new methods, we show that more than 6000 TFBSs annotated in the human genome have experienced accelerated evolution in Hominini, apes, and Old World monkeys. Although these TFBSs individually show relatively weak signals of accelerated evolution, they collectively are more abundant than HARs. Also, we show that accelerated evolution in Pol III binding sites may be driven by lineage-specific positive selection, whereas accelerated evolution in other TFBSs might be driven by nonadaptive evolutionary forces. Finally, the accelerated TFBSs are enriched around developmental genes, suggesting that accelerated evolution in TFBSs may drive the divergence of developmental processes between primates.
Affiliation(s)
- Xinru Zhang
- Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA; Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA; Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, PA, 16802, USA
- Bohao Fang
- Department of Organismic and Evolutionary Biology and the Museum of Comparative Zoology, Harvard University, Boston, MA, 02135, USA
- Yi-Fei Huang
- Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA; Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
9
Linker SB, Narvaiza I, Hsu JY, Wang M, Qiu F, Mendes APD, Oefner R, Kottilil K, Sharma A, Randolph-Moore L, Mejia E, Santos R, Marchetto MC, Gage FH. Human-specific regulation of neural maturation identified by cross-primate transcriptomics. Curr Biol 2022; 32:4797-4807.e5. [PMID: 36228612 DOI: 10.1016/j.cub.2022.09.028] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/05/2020] [Revised: 07/08/2022] [Accepted: 09/14/2022] [Indexed: 11/06/2022]
Abstract
Unique aspects of human behavior are often attributed to differences in the relative size and organization of the human brain; these structural aspects originate during early development. Recent studies indicate that human neurodevelopment is considerably slower than that of nonhuman primates, a finding that is termed neoteny. One aspect of neoteny is the slow onset of action potentials. However, the molecular mechanisms that play a role in this process remain unclear. To examine the evolutionary constraints on the rate of neuronal maturation, we have generated transcriptional data tracking five time points, from the neural progenitor state to 8-week-old neurons, in primates spanning the catarrhine lineage, including Macaca mulatta, Gorilla gorilla, Pan paniscus, Pan troglodytes, and Homo sapiens. Despite finding an overall similarity of many transcriptional signatures, we observed species-specific and clade-specific distinctions. Among the genes that exhibited human-specific regulation, we identified a key pioneer transcription factor, GATA3, that was uniquely upregulated in humans during the neuronal maturation process. We further examined the regulatory nature of GATA3 in human cells and observed that its downregulation accelerated the development of spontaneous action potentials, thereby modulating the human neotenic phenotype. These results provide evidence for the divergence of gene regulation as a key molecular mechanism underlying human neoteny.
Affiliation(s)
- Sara B Linker
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Iñigo Narvaiza
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Jonathan Y Hsu
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Meiyan Wang
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Fan Qiu
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Ana P D Mendes
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Ruth Oefner
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Kalyani Kottilil
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Amandeep Sharma
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Lynne Randolph-Moore
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Eunice Mejia
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA
- Renata Santos
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA; Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, Laboratory of Dynamics of Neuronal Structure in Health and Disease, 102 rue de la Santé, 75014 Paris, France; Institut des Sciences Biologiques, CNRS, 16 rue Pierre et Marie Curie, 75005 Paris, France
- Maria C Marchetto
- Department of Anthropology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA; Center for Academic Research and Training in Anthropogeny (CARTA), University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.
- Fred H Gage
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Pines Road, La Jolla, CA 92037, USA.
10
Exploration of Tools for the Interpretation of Human Non-Coding Variants. Int J Mol Sci 2022; 23:ijms232112977. [PMID: 36361767 PMCID: PMC9654743 DOI: 10.3390/ijms232112977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 10/17/2022] [Accepted: 10/23/2022] [Indexed: 02/01/2023]
Abstract
The advent of Whole Genome Sequencing (WGS) broadened the genetic variation detection range, revealing variants even in non-coding regions of the genome that would have been missed using targeted approaches. One of the most challenging issues in WGS analysis is the interpretation of annotated variants. This review focuses on tools suitable for the functional annotation of variants falling into non-coding regions. It couples the description of non-coding genomic areas with the results and performance of existing tools for the functional interpretation of variant effects in these regions. Tools were tested in a controlled genomic scenario that served as the ground truth, allowing us to determine software performance.
11
Ramstein GP, Buckler ES. Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize. Genome Biol 2022; 23:183. [PMID: 36050782 PMCID: PMC9438327 DOI: 10.1186/s13059-022-02747-2] [Received: 01/18/2022] [Accepted: 08/15/2022] [Indexed: 11/10/2022]
Abstract
Background: Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for the fitness effect of mutations.
Results: Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants.
Conclusions: Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach, Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC), could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (10.25739/hybz-2957).
Supplementary information: The online version contains supplementary material available at 10.1186/s13059-022-02747-2.
Affiliation(s)
- Guillaume P Ramstein
- Center for Quantitative Genetics and Genomics, Aarhus University, 8000 Aarhus, Denmark
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- USDA-ARS, Ithaca, NY, 14853, USA
12
Dukler N, Mughal MR, Ramani R, Huang YF, Siepel A. Extreme purifying selection against point mutations in the human genome. Nat Commun 2022; 13:4312. [PMID: 35879308 PMCID: PMC9314448 DOI: 10.1038/s41467-022-31872-6] [Received: 09/10/2021] [Accepted: 07/07/2022] [Indexed: 12/13/2022]
Abstract
Large-scale genome sequencing has enabled the measurement of strong purifying selection in protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring such selection in noncoding as well as coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of "ultraselection" by the fractional depletion of rare single-nucleotide variants, after controlling for variation in mutation rates. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find abundant ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. By contrast, we find much less ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest levels in ultraconserved elements. We estimate that ~0.4-0.7% of the human genome is ultraselected, implying ~0.26-0.51 strongly deleterious mutations per generation. Overall, our study sheds new light on the genome-wide distribution of fitness effects by combining deep sequencing data and classical theory from population genetics.
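The key quantity here, the fractional depletion of rare variants after adjusting for mutation rate, can be illustrated with a simple point estimate. This is a simplified sketch, not ExtRaINSIGHT itself (the published method fits a probabilistic model with site-level mutation rates). It compares the observed rare-variant count at target sites with the count expected from putatively neutral sites, scaled by the relative mutation rate. The function name and all numbers are hypothetical.

```python
def rare_variant_depletion(target_rare: int, target_sites: int, target_mu: float,
                           neutral_rare: int, neutral_sites: int, neutral_mu: float) -> float:
    """Simplified estimate of the fraction of rare variants missing at target sites.

    target_mu and neutral_mu are mean (relative) per-site mutation rates.
    A value near 0 means no depletion; negative values can arise from sampling noise.
    """
    # Rare-variant rate per site at neutral sites, per unit of mutation rate.
    neutral_rate_per_mu = (neutral_rare / neutral_sites) / neutral_mu
    # Expected rare variants at target sites if they evolved neutrally.
    expected_rare = neutral_rate_per_mu * target_mu * target_sites
    if expected_rare <= 0:
        raise ValueError("expected rare-variant count must be positive")
    return 1.0 - target_rare / expected_rare


# Hypothetical counts: 120 rare variants expected at the target sites, 90 observed.
print(rare_variant_depletion(90, 10_000, 1.2, 1_000, 100_000, 1.0))  # approximately 0.25
```

Interpreted loosely, a value of 0.25 would mean that about a quarter of new mutations at the target sites are removed so quickly that they are not seen even as rare variants.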
Affiliation(s)
- Noah Dukler
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Mehreen R Mughal
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Ritika Ramani
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Yi-Fei Huang
- Department of Biology and Huck Institute of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
13
Systems biology analysis of human genomes points to key pathways conferring spina bifida risk. Proc Natl Acad Sci U S A 2021; 118:2106844118. [PMID: 34916285 PMCID: PMC8713748 DOI: 10.1073/pnas.2106844118] [Accepted: 10/20/2021] [Indexed: 12/15/2022]
Abstract
Genetic investigations of most structural birth defects, including spina bifida (SB), congenital heart disease, and craniofacial anomalies, have been underpowered for genome-wide association studies because of their rarity, genetic heterogeneity, incomplete penetrance, and environmental influences. Our systems biology strategy to investigate SB predisposition controls for population stratification and avoids much of the bias inherent in candidate gene searches that are pervasive in the field. We examine both protein-coding and noncoding regions of whole genomes to analyze sequence variants, collapsed by gene or regulatory region, and apply machine learning, gene enrichment, and pathway analyses to elucidate molecular pathways and genes contributing to human SB. Spina bifida (SB) is a debilitating birth defect caused by multiple gene and environment interactions. Though SB shows non-Mendelian inheritance, genetic factors contribute to an estimated 70% of cases. Nevertheless, identifying human mutations conferring SB risk is challenging due to its relative rarity, genetic heterogeneity, incomplete penetrance, and environmental influences that hamper genome-wide association study approaches to untargeted discovery. Thus, SB genetic studies may suffer from population substructure and/or selection bias introduced by typical candidate gene searches. We report a population-based, ancestry-matched whole-genome sequence analysis of SB genetic predisposition using a systems biology strategy to interrogate 298 case-control subject genomes (149 pairs). Genes that were enriched in likely gene-disrupting (LGD), rare protein-coding variants were subjected to machine learning analysis to identify genes in which LGD variants occur with a different frequency in cases versus controls and so discriminate between these groups.
Those genes with high discriminatory potential for SB significantly enriched pathways pertaining to carbon metabolism, inflammation, innate immunity, cytoskeletal regulation, and essential transcriptional regulation consistent with their having impact on the pathogenesis of human SB. Additionally, an interrogation of conserved noncoding sequences identified robust variant enrichment in regulatory regions of several transcription factors critical to embryonic development. This genome-wide perspective offers an effective approach to the interrogation of coding and noncoding sequence variant contributions to rare complex genetic disorders.
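The study used machine learning to find genes whose LGD variants occur at different frequencies in cases and controls. As a simpler, swapped-in illustration of that per-gene comparison (not the authors' method), an exact McNemar test suits a matched design such as the 149 case-control pairs: only discordant pairs, where exactly one member of the pair carries an LGD variant in the gene, are informative. The function name and counts below are hypothetical.

```python
from math import comb


def mcnemar_exact(case_only: int, control_only: int) -> float:
    """Two-sided exact McNemar p-value from discordant matched-pair counts.

    case_only: pairs where only the case carries an LGD variant in the gene.
    control_only: pairs where only the control carries one.
    Under the null, each discordant pair is equally likely to fall either way.
    """
    n = case_only + control_only
    if n == 0:
        return 1.0
    k = min(case_only, control_only)
    # Probability of a split at least this lopsided in one direction, Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical gene: 9 pairs with a case-only carrier, 1 with a control-only carrier.
print(mcnemar_exact(9, 1))
```

With 9 versus 1 discordant pairs the p-value is 22/1024, about 0.021, before any correction for testing many genes.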
14
Joshi M, Kapopoulou A, Laurent S. Impact of Genetic Variation in Gene Regulatory Sequences: A Population Genomics Perspective. Front Genet 2021; 12:660899. [PMID: 34276769 PMCID: PMC8282999 DOI: 10.3389/fgene.2021.660899] [Received: 01/29/2021] [Accepted: 05/31/2021] [Indexed: 01/06/2023]
Abstract
The unprecedented rise of high-throughput sequencing and assay technologies has provided detailed insight into non-coding sequences and their potential role as regulators of gene expression. These regulatory non-coding sequences are also referred to as cis-regulatory elements (CREs). Genetic variants occurring within CREs have been shown to be associated with altered gene expression and phenotypic changes. Such variants arise spontaneously, can become fixed in natural populations through selection and genetic drift, and in some cases pave the way for speciation. Hence, the study of genetic variation at CREs has improved our overall understanding of the processes of local adaptation and evolution. Recent advances in high-throughput sequencing and better annotations of CREs have enabled the evaluation of the impact of such variation on gene expression, phenotypic alteration and fitness. Here, we review recent research on the evolution of CREs and concentrate on studies that have investigated genetic variation occurring in these regulatory sequences within the context of population genetics.
Affiliation(s)
- Manas Joshi
- Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Stefan Laurent
- Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, Germany
15
Zhou Y, Lauschke VM. Computational Tools to Assess the Functional Consequences of Rare and Noncoding Pharmacogenetic Variability. Clin Pharmacol Ther 2021; 110:626-636. [PMID: 33998671 DOI: 10.1002/cpt.2289] [Received: 04/01/2021] [Accepted: 05/07/2021] [Indexed: 12/19/2022]
Abstract
Interindividual differences in drug response are a common concern both in drug development and across all levels of care. While genetics clearly influences the response to, and toxicity of, many drugs, a substantial fraction of the heritable pharmacological and toxicological variability remains unexplained by known genetic polymorphisms. In recent years, population-scale sequencing projects have unveiled tens of thousands of coding and noncoding pharmacogenetic variants with unclear functional effects that might explain at least part of this missing heritability. However, translating these personalized variant signatures into drug response predictions and actionable advice remains challenging and constitutes one of the most important frontiers of contemporary pharmacogenomics. Conventional prediction methods are primarily based on evolutionary conservation, which drastically reduces their predictive accuracy when applied to poorly conserved pharmacogenes. Here, we review the current state of the art in computational variant effect predictors across variant classes and critically discuss their utility for pharmacogenomics. Besides missense variants, we discuss recent progress in the evaluation of synonymous, splice, and noncoding variations. Furthermore, we discuss emerging possibilities to assess haplotypes and structural variations. We advocate for the development of algorithms trained on pharmacogenomic instead of pathogenic data sets to improve predictive accuracy and thereby facilitate the utilization of next-generation sequencing data for personalized clinical decision support and precision pharmacogenomics.
Affiliation(s)
- Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Volker M Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
16
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022]
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
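The sequence-based models this review surveys, shallow or deep, all need a numeric encoding of nucleotide sequence as input. One-hot encoding is a common choice for this (an assumption here, not a detail given in the abstract). A dependency-free sketch of the idea:

```python
from __future__ import annotations

BASES = "ACGT"
INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot(seq: str) -> list[list[float]]:
    """Encode a DNA sequence as a (length x 4) matrix with columns A, C, G, T."""
    matrix = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        if base in INDEX:          # ambiguous bases such as N stay all-zero
            row[INDEX[base]] = 1.0
        matrix.append(row)
    return matrix


print(one_hot("ACGN"))
```

Leaving ambiguous bases as all-zero rows is one convention; another is to spread 0.25 across all four columns.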
Affiliation(s)
- Jan Zrimec
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Mariia Kokina
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Victor Garcia
- School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Aleksej Zelezniak
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Science for Life Laboratory, Stockholm, Sweden
17
Tseng CC, Wong MC, Liao WT, Chen CJ, Lee SC, Yen JH, Chang SJ. Genetic Variants in Transcription Factor Binding Sites in Humans: Triggered by Natural Selection and Triggers of Diseases. Int J Mol Sci 2021; 22:ijms22084187. [PMID: 33919522 PMCID: PMC8073710 DOI: 10.3390/ijms22084187] [Received: 03/27/2021] [Revised: 04/15/2021] [Accepted: 04/16/2021] [Indexed: 12/14/2022]
Abstract
Variants of transcription factor binding sites (TFBSs) constitute an important part of the human genome. Current evidence demonstrates close links between nucleotides within TFBSs and gene expression. There are multiple pathways through which genomic sequences located in TFBSs regulate gene expression, and recent genome-wide association studies have shown the biological significance of TFBS variation in human phenotypes. However, numerous challenges remain in the study of TFBS polymorphisms. This article aims to cover the current state of understanding regarding the genomic features of TFBSs and TFBS variants; the mechanisms through which TFBS variants regulate gene expression; the approaches to studying the effects of nucleotide changes that create or disrupt TFBSs; the challenges faced in studies of TFBS sequence variations; the effects of natural selection on collections of TFBSs; in addition to the insights gained from the study of TFBS alleles related to gout, its associated comorbidities (increased body mass index, chronic kidney disease, diabetes, dyslipidemia, coronary artery disease, ischemic heart disease, hypertension, hyperuricemia, osteoporosis, and prostate cancer), and the treatment responses of patients.
Affiliation(s)
- Chia-Chun Tseng
- Graduate Institute of Clinical Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Division of Rheumatology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung 80756, Taiwan
- Man-Chun Wong
- Department of Biotechnology, College of Life Science, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Wei-Ting Liao
- Department of Biotechnology, College of Life Science, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung 80756, Taiwan
- Correspondence: W.-T.L. (Tel.: +886-7-3121101; Fax: +886-7-3125339); S.-J.C. (Tel.: +886-7-5916679; Fax: +886-7-5919264)
- Chung-Jen Chen
- Department of Internal Medicine, Kaohsiung Municipal Ta-Tung Hospital, Kaohsiung 80145, Taiwan
- Su-Chen Lee
- Laboratory Diagnosis of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Jeng-Hsien Yen
- Graduate Institute of Clinical Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Division of Rheumatology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung 80756, Taiwan
- Institute of Biomedical Sciences, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan
- Department of Biological Science and Technology, National Chiao-Tung University, Hsinchu 30010, Taiwan
- Shun-Jen Chang
- Department of Kinesiology, Health and Leisure Studies, National University of Kaohsiung, Kaohsiung 81148, Taiwan
18
Zrimec J, Börlin CS, Buric F, Muhammad AS, Chen R, Siewers V, Verendel V, Nielsen J, Töpel M, Zelezniak A. Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat Commun 2020; 11:6141. [PMID: 33262328 PMCID: PMC7708451 DOI: 10.1038/s41467-020-19921-4] [Received: 12/18/2019] [Accepted: 11/02/2020] [Indexed: 12/31/2022]
Abstract
Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning on over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to human. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation of transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we discover that motif interactions could explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and specific combination of regulatory elements that define gene expression levels.
Affiliation(s)
- Jan Zrimec
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Christoph S Börlin
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Filip Buric
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Azam Sheikh Muhammad
- Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Rhongzen Chen
- Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Verena Siewers
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Vilhelm Verendel
- Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Mats Töpel
- Department of Marine Sciences, University of Gothenburg, Box 461, SE-405 30, Gothenburg, Sweden
- Gothenburg Global Biodiversity Center (GGBC), Box 461, 40530, Gothenburg, Sweden
- Aleksej Zelezniak
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden.
- Science for Life Laboratory, Tomtebodavägen 23a, SE-171 65, Stockholm, Sweden.
19
Liu J, Robinson-Rechavi M. Robust inference of positive selection on regulatory sequences in the human brain. Sci Adv 2020; 6:eabc9863. [PMID: 33246961 PMCID: PMC7695467 DOI: 10.1126/sciadv.abc9863] [Received: 05/25/2020] [Accepted: 10/16/2020] [Indexed: 05/07/2023]
Abstract
A longstanding hypothesis is that divergence between humans and chimpanzees might have been driven more by regulatory level adaptations than by protein sequence adaptations. This has especially been suggested for regulatory adaptations in the evolution of the human brain. We present a new method to detect positive selection on transcription factor binding sites on the basis of measuring predicted affinity change with a machine learning model of binding. Unlike other methods, this approach requires neither defining a priori neutral sites nor detecting accelerated evolution, thus removing major sources of bias. We scanned for signals of positive selection in CTCF binding sites in 29 human and 11 mouse tissues or cell types. We found that human brain-related cell types have the highest proportion of positive selection. This result is consistent with the view that adaptive evolution of gene regulation has played an important role in the evolution of the human brain.
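The method rests on a per-variant quantity: the predicted change in binding affinity between ancestral and derived alleles. The sketch below illustrates only that quantity, with a position weight matrix (PWM) log-odds score swapped in for the paper's machine learning binding model. The 4-bp motif, its frequencies, and the function names are invented. The selection test that the authors build on these affinity changes is not shown.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif.
# Rows are motif positions; columns are A, C, G, T (each row sums to 1).
PFM = [
    [0.80, 0.10, 0.05, 0.05],
    [0.05, 0.85, 0.05, 0.05],
    [0.10, 0.10, 0.70, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
BACKGROUND = 0.25  # uniform background base frequency
COLUMN = {"A": 0, "C": 1, "G": 2, "T": 3}


def pwm_score(site: str) -> float:
    """Log2-odds score of a site relative to the uniform background."""
    if len(site) != len(PFM):
        raise ValueError("site length must match motif length")
    return sum(math.log2(PFM[i][COLUMN[b]] / BACKGROUND) for i, b in enumerate(site.upper()))


def affinity_change(ancestral: str, derived: str) -> float:
    """Positive values mean the derived allele is predicted to bind more strongly."""
    return pwm_score(derived) - pwm_score(ancestral)


# G -> T at the third motif position weakens the predicted binding.
print(affinity_change("ACGT", "ACTT"))  # log2(0.10 / 0.70), about -2.81
```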
Affiliation(s)
- Jialin Liu
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
20
Zhang H, Shi X, Huang T, Zhao X, Chen W, Gu N, Zhang R. Dynamic landscape and evolution of m6A methylation in human. Nucleic Acids Res 2020; 48:6251-6264. [PMID: 32406913 PMCID: PMC7293016 DOI: 10.1093/nar/gkaa347] [Received: 02/29/2020] [Revised: 04/23/2020] [Accepted: 04/24/2020] [Indexed: 01/03/2023]
Abstract
m6A is a prevalent internal modification in mRNAs and has been linked to diverse effects on mRNA fate. To explore the landscape and evolution of human m6A, we generated 27 m6A methylomes across major adult tissues. These data reveal dynamic m6A methylation across tissue types, uncover both broadly and tissue-specifically methylated sites, and identify an unexpected enrichment of m6A methylation at non-canonical cleavage sites. A comparison of fetal and adult m6A methylomes reveals that m6A preferentially occupies CDS regions in fetal tissues. Moreover, the m6A sub-motifs vary between fetal and adult tissues and across tissue types. From an evolutionary perspective, we find that the selection pressure on m6A sites varies and depends on their genic locations. Unexpectedly, we found that ∼40% of the 3′UTR m6A sites are under negative selection, which is higher than the evolutionary constraint on miRNA binding sites, and much higher than that on A-to-I RNA modification. Moreover, the recently gained m6A sites in human populations are clearly under positive selection and associated with traits or diseases. Our work provides a resource of human m6A profiles for future studies of m6A functions, and suggests a role for m6A modification in human evolutionary adaptation and disease susceptibility.
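The abstract notes that m6A sub-motifs vary across tissues. For reference, the widely used m6A consensus is DRACH (D = A/G/U, R = A/G, H = A/C/U), with the methylated adenosine at the central A. The sketch below only finds candidate motif occurrences, including overlapping ones, in a transcript sequence. This is not the authors' pipeline, and the presence of a motif does not imply methylation. The example sequence is made up.

```python
from __future__ import annotations

import re

# DRACH consensus; the lookahead lets overlapping matches be reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def drach_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every DRACH match; the m6A candidate is at start + 2."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input too
    return [(m.start(), m.group(1)) for m in DRACH.finditer(rna)]


print(drach_sites("GGACUGAACA"))  # [(0, 'GGACU'), (5, 'GAACA')]
```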
Affiliation(s)
- Hui Zhang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
- Xinrui Shi
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
- Tao Huang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
- Xueni Zhao
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
- Wanying Chen
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
- Nannan Gu
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
- Rui Zhang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, PR China
- RNA Biomedical Institute, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou 510120, PR China
21
Selection against archaic hominin genetic variation in regulatory regions. Nat Ecol Evol 2020; 4:1558-1566. [PMID: 32839541 DOI: 10.1038/s41559-020-01284-0] [Received: 07/19/2019] [Accepted: 07/21/2020] [Indexed: 01/20/2023]
Abstract
Traces of Neandertal and Denisovan DNA persist in the modern human gene pool, but have been systematically purged by natural selection from genes and other functionally important regions. This implies that many archaic alleles harmed the fitness of hybrid individuals, but the nature of this harm is poorly understood. Here, we show that enhancers contain less Neandertal and Denisovan variation than expected given the background selection they experience, suggesting that selection acted to purge these regions of archaic alleles that disrupted their gene regulatory functions. We infer that selection acted mainly on young archaic variation that arose in Neandertals or Denisovans shortly before their contact with humans; enhancers are not depleted of older variants found in both archaic species. Some types of enhancer appear to have tolerated introgression better than others; compared with tissue-specific enhancers, pleiotropic enhancers show stronger depletion of archaic single-nucleotide polymorphisms. To some extent, evolutionary constraint is predictive of introgression depletion, but certain tissues' enhancers are more depleted of Neandertal and Denisovan alleles than expected given their comparative tolerance to new mutations. Foetal brain and muscle are the tissues whose enhancers show the strongest depletion of archaic alleles, but only brain enhancers show evidence of unusually stringent purifying selection. We conclude that epistatic incompatibilities between human and archaic alleles are needed to explain the degree of archaic variant depletion from foetal muscle enhancers, perhaps due to divergent selection for higher muscle mass in archaic hominins compared with humans.
22
Huang YF. Unified inference of missense variant effects and gene constraints in the human genome. PLoS Genet 2020; 16:e1008922. [PMID: 32667917 PMCID: PMC7384676 DOI: 10.1371/journal.pgen.1008922] [Received: 10/03/2019] [Revised: 07/27/2020] [Accepted: 06/09/2020] [Indexed: 01/25/2023]
Abstract
A challenge in medical genomics is to identify variants and genes associated with severe genetic disorders. Based on the premise that severe, early-onset disorders often result in a reduction of evolutionary fitness, several statistical methods have been developed to predict pathogenic variants or constrained genes based on the signatures of negative selection in human populations. However, we currently lack a statistical framework to jointly predict deleterious variants and constrained genes from both variant-level features and gene-level selective constraints. Here we present such a unified approach, UNEECON, based on deep learning and population genetics. UNEECON treats the contributions of variant-level features and gene-level constraints as a variant-level fixed effect and a gene-level random effect, respectively. The sum of the fixed and random effects is then combined with an evolutionary model to infer the strength of negative selection at both variant and gene levels. Compared with previously published methods, UNEECON shows improved performance in predicting missense variants and protein-coding genes associated with autosomal dominant disorders, and feature importance analysis suggests that both gene-level selective constraints and variant-level predictors are important for accurate variant prioritization. Furthermore, based on UNEECON, we observe a low correlation between gene-level intolerance to missense mutations and that to loss-of-function mutations, which can be partially explained by the prevalence of disordered protein regions that are highly tolerant to missense mutations. Finally, we show that genes intolerant to both missense and loss-of-function mutations play key roles in the central nervous system and the autism spectrum disorders. Overall, UNEECON is a promising framework for both variant and gene prioritization. 
Numerous statistical methods have been developed to predict deleterious missense variants or constrained genes in the human genome, but unified prioritization methods that utilize both variant- and gene-level information are underdeveloped. Here we present UNEECON, an evolution-based deep learning framework for unified variant and gene prioritization. By integrating variant-level predictors and gene-level selective constraints, UNEECON outperforms existing methods in predicting missense variants and protein-coding genes associated with dominant disorders. Based on UNEECON, we show that disordered proteins are tolerant to missense mutations but not to loss-of-function mutations. In addition, we find that genes under strong selective constraints at both missense and loss-of-function levels are strongly associated with the central nervous system and the autism spectrum disorders, highlighting the need to investigate the function of these highly constrained genes in future studies.
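The abstract describes a score formed by adding a variant-level fixed effect to a gene-level random effect. The sketch below shows that additive structure under loudly stated assumptions. A single linear layer stands in for the deep-learning component the abstract mentions. The hypothetical `deleterious_probability` maps the sum through a logistic function, whereas UNEECON couples it to an evolutionary model of negative selection. All weights and features are invented. In a random-effects model, each gene's effect is treated as drawn from a shared distribution, so estimates for genes with little data are pulled toward zero.

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def deleterious_probability(features: list[float], weights: list[float],
                            bias: float, gene_effect: float) -> float:
    """Combine a variant-level fixed effect with a gene-level random effect (illustrative only)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    # Variant-level fixed effect: a linear stand-in for a learned predictor.
    fixed_effect = bias + sum(w * x for w, x in zip(weights, features))
    # Gene-level random effect is added on the same (log-odds) scale.
    return sigmoid(fixed_effect + gene_effect)


# Same variant features scored within a tolerant gene and a more constrained gene.
tolerant = deleterious_probability([1.0, 0.5], [0.8, -0.4], -1.0, gene_effect=0.6)
constrained = deleterious_probability([1.0, 0.5], [0.8, -0.4], -1.0, gene_effect=1.5)
print(tolerant, constrained)
```

The same variant-level features yield a higher score in the gene with the larger gene-level effect, which is the intuition behind combining both levels of information.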
Affiliation(s)
- Yi-Fei Huang
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, United States of America
23
Dukler N, Huang YF, Siepel A. Phylogenetic Modeling of Regulatory Element Turnover Based on Epigenomic Data. Mol Biol Evol 2020; 37:2137-2152. [PMID: 32176292 PMCID: PMC7306682 DOI: 10.1093/molbev/msaa073]
Abstract
Evolutionary changes in gene expression are often driven by gains and losses of cis-regulatory elements (CREs). The dynamics of CRE evolution can be examined using multispecies epigenomic data, but so far such analyses have generally been descriptive and model-free. Here, we introduce a probabilistic modeling framework for the evolution of CREs that operates directly on raw chromatin immunoprecipitation and sequencing (ChIP-seq) data and fully considers the phylogenetic relationships among species. Our framework includes a phylogenetic hidden Markov model, called epiPhyloHMM, for identifying the locations of multiply aligned CREs, and a combined phylogenetic and generalized linear model, called phyloGLM, for accounting for the influence of a rich set of genomic features in describing their evolutionary dynamics. We apply these methods to previously published ChIP-seq data for the H3K4me3 and H3K27ac histone modifications in liver tissue from nine mammals. We find that enhancers are gained and lost during mammalian evolution at about twice the rate of promoters, and that turnover rates are negatively correlated with DNA sequence conservation, expression level, and tissue breadth, and positively correlated with distance from the transcription start site, consistent with previous findings. In addition, we find that the predicted dosage sensitivity of target genes positively correlates with DNA sequence constraint in CREs but not with turnover rates, perhaps owing to differences in the effect sizes of the relevant mutations. Altogether, our probabilistic modeling framework enables a variety of powerful new analyses.
Affiliation(s)
- Noah Dukler
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Physiology, Biophysics, and Systems Biology, Weill Cornell Medical College, New York, NY
- Yi-Fei Huang
- Department of Biology and Huck Institute of Life Sciences, Pennsylvania State University, University Park, PA
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
24
Takeda JI, Nanatsue K, Yamagishi R, Ito M, Haga N, Hirata H, Ogi T, Ohno K. InMeRF: prediction of pathogenicity of missense variants by individual modeling for each amino acid substitution. NAR Genom Bioinform 2020; 2:lqaa038. [PMID: 33543123 PMCID: PMC7671370 DOI: 10.1093/nargab/lqaa038] [Received: 08/14/2019] [Revised: 03/03/2020] [Accepted: 05/13/2020]
Abstract
In predicting the pathogenicity of a nonsynonymous single-nucleotide variant (nsSNV), a variant that causes a radical change in amino acid properties tends to be classified as pathogenic. However, not all such nsSNVs are associated with human diseases. We generated random forest (RF) models individually for each amino acid substitution to differentiate pathogenic nsSNVs in the Human Gene Mutation Database from common nsSNVs in dbSNP. We named this set of models ‘Individual Meta RF’ (InMeRF). In ten-fold cross-validation, InMeRF achieved average areas under the curve (AUCs) of 0.941 for the receiver operating characteristic (ROC) curve and 0.957 for the precision–recall curve. To compare InMeRF with seven other tools, all eight tools were built using the same training dataset and compared on the same three testing datasets; InMeRF's ROC-AUCs ranked first among the eight tools. We applied InMeRF to 155 pathogenic and 125 common nsSNVs in seven major genes causing congenital myasthenic syndromes, as well as in VANGL1, which causes spina bifida, and found that the sensitivity and specificity of InMeRF were 0.942 and 0.848, respectively. We built an InMeRF web service and also made genome-wide InMeRF scores available online (https://www.med.nagoya-u.ac.jp/neurogenetics/InMeRF/).
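As a worked check of the reported figures, the short Python sketch below converts the sensitivity (0.942) and specificity (0.848) back into approximate confusion-matrix counts. It assumes, since the abstract does not state it explicitly, that both rates were pooled over all 155 pathogenic and 125 common nsSNVs; the resulting counts are back-calculated rather than taken from the paper, and the function names are illustrative only.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly pathogenic variants that are predicted pathogenic."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of truly common (benign) variants that are predicted benign."""
    return tn / (tn + fp)


# Assumption: rates pooled over all 155 pathogenic and 125 common nsSNVs.
n_pathogenic, n_common = 155, 125
tp = round(0.942 * n_pathogenic)  # back-calculated true positives
tn = round(0.848 * n_common)      # back-calculated true negatives

sens = sensitivity(tp, n_pathogenic - tp)
spec = specificity(tn, n_common - tn)
print(f"TP={tp}, TN={tn}, sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Running it prints `TP=146, TN=106, sensitivity=0.942, specificity=0.848`, i.e. under the pooling assumption roughly 9 pathogenic variants were missed and 19 common variants were misclassified as pathogenic.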
Affiliation(s)
- Jun-Ichi Takeda
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
- Kentaro Nanatsue
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
- Ryosuke Yamagishi
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
- Mikako Ito
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
- Nobuhiko Haga
- Department of Rehabilitation Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8655, Japan
- Hiromi Hirata
- Department of Chemistry and Biological Science, College of Science and Engineering, Aoyama Gakuin University, 5-10-1 Fuchinobe, Chuo-ku, Sagamihara 252-5258, Japan
- Tomoo Ogi
- Department of Genetics, Research Institute of Environmental Medicine (RIeM), Nagoya University, Furo, Chikusa-ku, Nagoya 464-8601, Japan
- Kinji Ohno
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
25
Russell LE, Schwarz UI. Variant discovery using next-generation sequencing and its future role in pharmacogenetics. Pharmacogenomics 2020; 21:471-486. [DOI: 10.2217/pgs-2019-0190]
Abstract
Next-generation sequencing (NGS) has enabled the discovery of a multitude of novel and mostly rare variants in pharmacogenes that may alter a patient’s therapeutic response to drugs. In addition to single nucleotide variants, structural variation affecting the number of copies of whole genes or parts of genes can be detected. While current guidelines concerning clinical implementation mostly act upon well-documented, common single nucleotide variants to guide dosing or drug selection, in silico and large-scale functional assessment of rare variant effects on protein function are at the forefront of pharmacogenetic research to facilitate their clinical integration. Here, we discuss the role of NGS in variant discovery, paving the way for more comprehensive genotype-guided pharmacotherapy that can translate to improved clinical care.
Affiliation(s)
- Laura E Russell
- Department of Physiology & Pharmacology, Western University, Medical Sciences Building, London, ON, N6A 5C1, Canada
- Ute I Schwarz
- Department of Physiology & Pharmacology, Western University, Medical Sciences Building, London, ON, N6A 5C1, Canada
- Division of Clinical Pharmacology, Department of Medicine, Western University, London Health Sciences Centre – University Hospital, 339 Windermere Road, London, ON, N6A 5A5, Canada
26
Walker RL, Ramaswami G, Hartl C, Mancuso N, Gandal MJ, de la Torre-Ubieta L, Pasaniuc B, Stein JL, Geschwind DH. Genetic Control of Expression and Splicing in Developing Human Brain Informs Disease Mechanisms. Cell 2019; 179:750-771.e22. [PMID: 31626773 PMCID: PMC8963725 DOI: 10.1016/j.cell.2019.09.021] [Received: 12/17/2018] [Revised: 06/06/2019] [Accepted: 09/20/2019]
Abstract
Tissue-specific regulatory regions harbor substantial genetic risk for disease. Because brain development is a critical epoch for neuropsychiatric disease susceptibility, we characterized the genetic control of the transcriptome in 201 mid-gestational human brains, identifying 7,962 expression quantitative trait loci (eQTL) and 4,635 spliceQTL (sQTL), including several thousand prenatal-specific regulatory regions. We show that significant genetic liability for neuropsychiatric disease lies within prenatal eQTL and sQTL. Integration of eQTL and sQTL with genome-wide association studies (GWAS) via transcriptome-wide association identified dozens of novel candidate risk genes, highlighting shared and stage-specific mechanisms in schizophrenia (SCZ). Gene network analysis revealed that SCZ and autism spectrum disorder (ASD) affect distinct developmental gene co-expression modules. Yet, in each disorder, common and rare genetic variation converges within modules, which in ASD implicates superficial cortical neurons. More broadly, these data, which are available through a web browser, together with our analyses demonstrate the genetic mechanisms by which developmental events have a widespread influence on adult anatomical and behavioral phenotypes.
Affiliation(s)
- Rebecca L Walker
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA; Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Gokul Ramaswami
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA
- Christopher Hartl
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA; Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Nicholas Mancuso
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90024, USA
- Michael J Gandal
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Psychiatry, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA
- Luis de la Torre-Ubieta
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA; Department of Psychiatry, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA
- Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90024, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jason L Stein
- Department of Genetics and UNC Neuroscience Center, University of North Carolina, Chapel Hill, NC 27599, USA
- Daniel H Geschwind
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA; Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Psychiatry, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA.
27
Further Defining the Phenotypic Spectrum of B3GAT3 Mutations and Literature Review on Linkeropathy Syndromes. Genes (Basel) 2019; 10:631. [PMID: 31438591 PMCID: PMC6770791 DOI: 10.3390/genes10090631] [Received: 06/25/2019] [Revised: 08/09/2019] [Accepted: 08/19/2019]
Abstract
The term linkeropathies (LKs) refers to a group of rare heritable connective tissue disorders characterized by a variable degree of short stature, skeletal dysplasia, joint laxity, cutaneous anomalies, dysmorphism, heart malformation, and developmental delay. The LK genes encode enzymes that add glycosaminoglycan chains onto proteoglycans via a common tetrasaccharide linker region. Biallelic variants in XYLT1 and XYLT2, encoding xylosyltransferases, are associated with Desbuquois dysplasia type 2 and spondylo-ocular syndrome, respectively. Defects in B4GALT7 and B3GALT6, encoding galactosyltransferases, lead to spondylodysplastic Ehlers-Danlos syndrome (spEDS). Mutations in B3GAT3, encoding a glucuronyltransferase, have been described in 25 patients from 12 families with variable phenotypes resembling Larsen, Antley-Bixler, Shprintzen-Goldberg, and Geroderma osteodysplastica syndromes. Herein, we report on a 13-year-old girl with a clinical presentation suggestive of spEDS, according to the 2017 EDS nosology, in whom compound heterozygosity for two likely pathogenic B3GAT3 variants was identified. We review the spectrum of B3GAT3-related disorders and compare all LK patients reported to date, highlighting that LKs form a phenotypic continuum bridging EDS and skeletal disorders and thereby offering future nosologic perspectives.
28
Lipan O, Wu E. A stochastic switch with different phases. Chaos 2019; 29:083107. [PMID: 31472510 DOI: 10.1063/1.5096778] [Received: 03/19/2019] [Accepted: 07/16/2019]
Abstract
We describe an analog stochastic switch that exhibits three distinct phases as its parameters change. The phases are classified by the mean and variance of the switch's output; a phase change occurs when the mean or the variance tends to a finite value or to infinity. The switch can be embedded in a large gene regulatory network for which the moment equations close naturally at second order. This switch was used to model the response of a heat-shock system.
Affiliation(s)
- Ovidiu Lipan
- Department of Physics, University of Richmond, Richmond, Virginia 23173, USA
- Emily Wu
- Department of Physics, University of Richmond, Richmond, Virginia 23173, USA
29
Berger MJ, Wenger AM, Guturu H, Bejerano G. Independent erosion of conserved transcription factor binding sites points to shared hindlimb, vision and external testes loss in different mammals. Nucleic Acids Res 2018; 46:9299-9308. [PMID: 30137416 PMCID: PMC6182171 DOI: 10.1093/nar/gky741] [Received: 11/13/2017] [Accepted: 08/21/2018]
Abstract
Genetic variation in cis-regulatory elements is thought to be a major driving force in morphological and physiological changes. However, identifying transcription factor binding events that code for complex traits remains a challenge, motivating novel means of detecting putatively important binding events. Using a curated set of 1154 high-quality transcription factor motifs, we demonstrate that independently eroded binding sites are enriched for independently lost traits in three distinct pairs of placental mammals. We show that these independently eroded events pinpoint the loss of hindlimbs in dolphin and manatee, degradation of vision in naked mole-rat and star-nosed mole, and the loss of external testes in white rhinoceros and Weddell seal. We also show that our method can be applied to more than two species. Our study presents a novel methodology for detecting cis-regulatory mutations that help explain a portion of the molecular mechanism underlying complex trait formation and loss.
Affiliation(s)
- Mark J Berger
- Department of Computer Science, Stanford University, Stanford, CA 94305-5329, USA
- Aaron M Wenger
- Department of Computer Science, Stanford University, Stanford, CA 94305-5329, USA
- Harendra Guturu
- Department of Electrical Engineering, Stanford University, Stanford, CA 94305-5008, USA
- Gill Bejerano
- Department of Computer Science, Stanford University, Stanford, CA 94305-5329, USA
- Department of Developmental Biology, Stanford University, Stanford, CA 94305-5329, USA
- Department of Pediatrics, Stanford University, Stanford, CA 94305-5208, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305-5464, USA
30
Huang YF, Siepel A. Estimation of allele-specific fitness effects across human protein-coding sequences and implications for disease. Genome Res 2019; 29:1310-1321. [PMID: 31249063 PMCID: PMC6673719 DOI: 10.1101/gr.245522.118] [Received: 10/23/2018] [Accepted: 06/20/2019]
Abstract
A central challenge in human genomics is to understand the cellular, evolutionary, and clinical significance of genetic variants. Here, we introduce a unified population-genetic and machine-learning model, called Linear Allele-Specific Selection InferencE (LASSIE), for estimating the fitness effects of all observed and potential single-nucleotide variants, based on polymorphism data and predictive genomic features. We applied LASSIE to 51 high-coverage genome sequences annotated with 33 genomic features and constructed a map of allele-specific selection coefficients across all protein-coding sequences in the human genome. This map is generally consistent with previous inferences of the bulk distribution of fitness effects but reveals pervasive weak negative selection against synonymous mutations. In addition, the estimated selection coefficients are highly predictive of inherited pathogenic variants and cancer driver mutations, outperforming state-of-the-art variant prioritization methods. By contrasting our estimated model with ultrahigh coverage ExAC exome-sequencing data, we identified 1118 genes under unusually strong negative selection, which tend to be exclusively expressed in the central nervous system or associated with autism spectrum disorder, as well as 773 genes under unusually weak selection, which tend to be associated with metabolism. This combination of classical population genetic theory with modern machine-learning and large-scale genomic data is a powerful paradigm for the study of both human evolution and disease.
Affiliation(s)
- Yi-Fei Huang
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
31
Gain of transcription factor binding sites is associated to changes in the expression signature of human brain and testis and is correlated to genes with higher expression breadth. Sci China Life Sci 2019; 62:526-534. [PMID: 30919278 DOI: 10.1007/s11427-018-9454-7] [Received: 09/20/2018] [Accepted: 10/15/2018]
Abstract
The gain of transcription factor binding sites (TFBS) is believed to be one of the major causes of biological innovation. Here we used comparative genomics strategies to identify 21,822 TFBS specific to the human lineage (TFBS-HS) relative to the chimpanzee and gorilla genomes. More than 40% (9,206) of these TFBS-HS are in the vicinity of 1,283 genes. Comparing the expression patterns of these genes with those of their orthologs in chimpanzee and gorilla identified genes differentially expressed in human tissues. These genes show a more divergent expression pattern in the human testis and brain, suggesting a role for positive selection in the fixation of TFBS gains. Genes associated with TFBS-HS were enriched in gene ontology categories related to transcriptional regulation, signaling, differentiation/development and the nervous system. Furthermore, genes associated with TFBS-HS show a higher expression breadth than genes in general. This biased distribution is due to a preferential gain of TFBS in genes with higher expression breadth rather than to a shift in the expression pattern after the gain of TFBS.
32
Walter Costa MB, Höner zu Siederdissen C, Dunjić M, Stadler PF, Nowick K. SSS-test: a novel test for detecting positive selection on RNA secondary structure. BMC Bioinformatics 2019; 20:151. [PMID: 30898084 PMCID: PMC6429701 DOI: 10.1186/s12859-019-2711-y] [Received: 11/19/2018] [Accepted: 03/03/2019]
Abstract
Background: Long non-coding RNAs (lncRNAs) play an important role in regulating gene expression and are thus important for determining phenotypes. Most attempts to measure selection in lncRNAs have focused on the primary sequence. The majority of small RNAs and at least some parts of lncRNAs must fold into specific structures to perform their biological function. Comprehensive assessments of selection acting on RNAs therefore must also encompass structure. Selection pressures acting on the structure of non-coding genes can be detected within multiple sequence alignments. Approaches of this type, however, have so far focused on negative selection. Thus, a computational method for identifying ncRNAs under positive selection is needed.
Results: We introduce the SSS-test (test for Selection on Secondary Structure) to identify positive selection and thus adaptive evolution. Benchmarks with biological as well as synthetic controls yield coherent signals for both negative and positive selection, demonstrating the functionality of the test. A survey of a lncRNA collection comprising 15,443 families resulted in 110 candidates that appear to be under positive selection in humans. In 26 lncRNAs that have been associated with psychiatric disorders, we identified local structures that show signs of positive selection in the human lineage.
Conclusions: It is feasible to assay positive selection acting on RNA secondary structures on a genome-wide scale. The detection of human-specific positive selection in lncRNAs associated with cognitive disorders provides a set of candidate genes for further experimental testing and may provide insights into the evolution of cognitive abilities in humans.
Availability: The SSS-test and related software are available at https://github.com/waltercostamb/SSS-test. The databases used in this work are available at http://www.bioinf.uni-leipzig.de/Software/SSS-test/.
Affiliation(s)
- Maria Beatriz Walter Costa
- Embrapa Agroenergia, Parque Estação Biológica (PqEB), Asa Norte, Brasília, DF, 70770-901 Brazil
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, 04107 Germany
- Christian Höner zu Siederdissen
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, 04107 Germany
- Marko Dunjić
- Human Biology Group, Institute for Biology, Department of Biology, Chemistry, Pharmacy, Freie Universitaet Berlin, Königin-Luise-Straße 1-3, Berlin, 14195 Germany
- Center for Human Molecular Genetics, Faculty of Biology, University of Belgrade, Studentski trg 16, PO box 43, Belgrade, 11000 Serbia
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, 04107 Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig & Competence Center for Scalable Data Services and Solutions Dresden-Leipzig & Leipzig Research Center for Civilization Diseases, University Leipzig, Leipzig, 04107 Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, 04103 Germany
- Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090 Austria
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
- Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, Bogotá, D.C., COL-111321 Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501 USA
- Katja Nowick
- Human Biology Group, Institute for Biology, Department of Biology, Chemistry, Pharmacy, Freie Universitaet Berlin, Königin-Luise-Straße 1-3, Berlin, 14195 Germany
- TFome Research Group, Bioinformatics Group, Interdisciplinary Center of Bioinformatics, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, 04107 Germany
- Paul-Flechsig-Institute for Brain Research, University of Leipzig, Liebigstraße 19, Haus C, Leipzig, 04103 Germany
- Bioinformatics, Faculty of Agricultural Sciences, Institute of Animal Science, University of Hohenheim, Garbenstraße 13, Stuttgart, 70593 Germany
33
Gulko B, Siepel A. An evolutionary framework for measuring epigenomic information and estimating cell-type-specific fitness consequences. Nat Genet 2018; 51:335-342. [PMID: 30559490 DOI: 10.1038/s41588-018-0300-z] [Received: 05/30/2018] [Accepted: 10/30/2018]
Abstract
Here we ask the question "How much information do epigenomic datasets provide about human genomic function?" We consider nine epigenomic features across 115 cell types and measure information about function as a reduction in entropy under a probabilistic evolutionary model fitted to human and nonhuman primate genomes. Several epigenomic features yield more information in combination than they do individually. We find that the entropy in human genetic variation predominantly reflects a balance between mutation and neutral drift. Our cell-type-specific FitCons scores reveal relationships among cell types and suggest that around 8% of nucleotide sites are constrained by natural selection.
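The idea of measuring information as a reduction in entropy can be illustrated with a generic, model-free calculation. The Python sketch below computes the entropy of a per-site label and how much it drops once sites are grouped by an annotation. The labels, feature values and function names are hypothetical, and FitCons performs this calculation under a fitted evolutionary model rather than on raw label counts as done here.

```python
import math
from collections import Counter, defaultdict


def entropy_bits(labels):
    """Shannon entropy, in bits, of the empirical distribution of `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_bits(labels, feature):
    """Entropy reduction from conditioning on `feature`:
    H(labels) - sum over f of P(f) * H(labels | feature == f)."""
    groups = defaultdict(list)
    for label, value in zip(labels, feature):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy_bits(g) for g in groups.values())
    return entropy_bits(labels) - conditional


# Hypothetical sites: "poly" = polymorphic, "mono" = monomorphic.
sites = ["poly", "mono", "poly", "mono"]
informative = ["open", "closed", "open", "closed"]    # separates the labels perfectly
uninformative = ["open", "open", "closed", "closed"]  # each group is still 50/50

print(information_bits(sites, informative))    # 1.0
print(information_bits(sites, uninformative))  # 0.0
```

A feature that perfectly separates the two site classes removes the full 1 bit of uncertainty, while one that leaves each group evenly mixed removes none; combining features, as the abstract notes, can yield more information than any single feature.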
Affiliation(s)
- Brad Gulko
- Graduate Field of Computer Science, Cornell University, Ithaca, NY, USA
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
34
Zhou Y, Fujikura K, Mkrtchian S, Lauschke VM. Computational Methods for the Pharmacogenetic Interpretation of Next Generation Sequencing Data. Front Pharmacol 2018; 9:1437. [PMID: 30564131 PMCID: PMC6288784 DOI: 10.3389/fphar.2018.01437] [Received: 08/03/2018] [Accepted: 11/20/2018]
Abstract
Up to half of all patients do not respond to pharmacological treatment as intended. A substantial fraction of these inter-individual differences is due to heritable factors, and a growing number of associations between genetic variations and drug response phenotypes have been identified. Importantly, the rapid progress in Next Generation Sequencing technologies in recent years has unveiled the true complexity of the genetic landscape in pharmacogenes, with tens of thousands of rare genetic variants. As each individual was found to harbor numerous such rare variants, they are anticipated to be important contributors to the genetically encoded inter-individual variability in drug effects. The fundamental challenge, however, is their functional interpretation: the sheer scale of the problem renders systematic experimental characterization of these variants currently infeasible. Here, we review concepts and important progress in the development of computational prediction methods that allow evaluation of the effect of amino acid sequence alterations in drug metabolizing enzymes and transporters. In addition, we discuss recent advances in the interpretation of functional effects of non-coding variants, such as variations in splice sites, regulatory regions and miRNA binding sites. We anticipate that these methodologies will provide a useful toolkit to facilitate the integration of the vast extent of rare genetic variability into drug response predictions in a precision medicine framework.
Affiliation(s)
- Yitian Zhou
- Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Kohei Fujikura
- Department of Diagnostic Pathology, Kobe University Graduate School of Medicine, Kobe, Japan
- Souren Mkrtchian
- Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Volker M. Lauschke
- Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
35
Reshef YA, Finucane HK, Kelley DR, Gusev A, Kotliar D, Ulirsch JC, Hormozdiari F, Nasser J, O'Connor L, van de Geijn B, Loh PR, Grossman SR, Bhatia G, Gazal S, Palamara PF, Pinello L, Patterson N, Adams RP, Price AL. Detecting genome-wide directional effects of transcription factor binding on polygenic disease risk. Nat Genet 2018; 50:1483-1493. [PMID: 30177862 PMCID: PMC6202062 DOI: 10.1038/s41588-018-0196-7] [Received: 10/19/2017] [Accepted: 07/11/2018]
Abstract
Biological interpretation of genome-wide association study data frequently involves assessing whether SNPs linked to a biological process, for example, binding of a transcription factor, show unsigned enrichment for disease signal. However, signed annotations quantifying whether each SNP allele promotes or hinders the biological process can enable stronger statements about disease mechanism. We introduce a method, signed linkage disequilibrium profile regression, for detecting genome-wide directional effects of signed functional annotations on disease risk. We validate the method via simulations and application to molecular quantitative trait loci in blood, recovering known transcriptional regulators. We apply the method to expression quantitative trait loci in 48 Genotype-Tissue Expression tissues, identifying 651 transcription factor-tissue associations including 30 with robust evidence of tissue specificity. We apply the method to 46 diseases and complex traits (average n = 290 K), identifying 77 annotation-trait associations representing 12 independent transcription factor-trait associations, and characterize the underlying transcriptional programs using gene-set enrichment analyses. Our results implicate new causal disease genes and new disease mechanisms.
Affiliation(s)
- Yakir A Reshef
  - Department of Computer Science, Harvard University, Cambridge, MA, USA
  - Harvard/MIT MD/PhD Program, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David R Kelley
  - California Life Sciences LLC, South San Francisco, CA, USA
- Dylan Kotliar
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jacob C Ulirsch
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Boston Children's Hospital, Boston, MA, USA
- Farhad Hormozdiari
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Joseph Nasser
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Luke O'Connor
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Program in Bioinformatics and Integrative Genomics, Harvard University, Cambridge, MA, USA
- Bryce van de Geijn
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Po-Ru Loh
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Sharon R Grossman
  - Harvard/MIT MD/PhD Program, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gaurav Bhatia
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Steven Gazal
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Pier Francesco Palamara
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Statistics, University of Oxford, Oxford, UK
- Luca Pinello
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Massachusetts General Hospital, Charlestown, MA, USA
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
- Ryan P Adams
  - Google Brain, New York, NY, USA
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
- Alkes L Price
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
36
Davis GE, Lowell WE. Solar energy at birth and human lifespan. J Photochem Photobiol B 2018; 186:59-68. [DOI: 10.1016/j.jphotobiol.2018.07.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/21/2018] [Revised: 06/29/2018] [Accepted: 07/04/2018] [Indexed: 01/03/2023]
37
Niu M, Tabari E, Ni P, Su Z. Towards a map of cis-regulatory sequences in the human genome. Nucleic Acids Res 2018; 46:5395-5409. [PMID: 29733395 PMCID: PMC6009671 DOI: 10.1093/nar/gky338] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/17/2018] [Revised: 04/14/2018] [Accepted: 04/19/2018] [Indexed: 01/10/2023]
Abstract
Accumulating evidence indicates that transcription factor (TF) binding sites, or cis-regulatory elements (CREs), and their clusters, termed cis-regulatory modules (CRMs), play a more important role than gene-coding sequences in specifying complex traits in humans, including susceptibility to common complex diseases. To fully characterize their roles in shaping these complex traits and diseases, it is necessary to annotate all CREs and CRMs encoded in the human genome. However, current annotations of CREs and CRMs in the human genome are still very limited and mostly coarse-grained, as they often lack detailed information about the CREs within CRMs. Here, we integrated 620 TF ChIP-seq datasets produced by the ENCODE project for 168 TFs in 79 different cell/tissue types and predicted an unprecedentedly complete map of CREs in CRMs in the human genome at single-nucleotide resolution. The map includes 305,912 CRMs containing a total of 1,178,913 CREs belonging to 736 unique TF binding motifs. The predicted CREs and CRMs tend to be subject to either purifying or positive selection and thus are likely to be functional. Based on these results, we also examined the status of available ChIP-seq datasets for predicting the entire regulatory genome of humans.
Affiliation(s)
- Meng Niu
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223, USA
- Ehsan Tabari
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223, USA
- Pengyu Ni
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223, USA
- Zhengchang Su
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223, USA
38
Lee KS, Chatterjee P, Choi EY, Sung MK, Oh J, Won H, Park SM, Kim YJ, Yi SV, Choi JK. Selection on the regulation of sympathetic nervous activity in humans and chimpanzees. PLoS Genet 2018; 14:e1007311. [PMID: 29672586 PMCID: PMC5908061 DOI: 10.1371/journal.pgen.1007311] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/08/2017] [Accepted: 03/17/2018] [Indexed: 12/31/2022]
Abstract
Adrenergic α2C receptor (ADRA2C) is an inhibitory modulator of the sympathetic nervous system. Knockout mice for this gene show physiological and behavioural alterations that are associated with the fight-or-flight response. There is evidence of positive selection on the regulation of this gene during chicken domestication. Here, we find that the neuronal expression of ADRA2C is lower in human and chimpanzee than in other primates. On the basis of three-dimensional chromatin structure, we identified a cis-regulatory region whose DNA sequences have been significantly accelerated in human and chimpanzee. Active histone modification marks this region in rhesus macaque but not in human and chimpanzee; instead, repressive marks are enriched in various human brain samples. This region contains two neuron-restrictive silencer factor (NRSF) binding motifs, each of which harbours a polymorphism. Our genotyping and analysis of population genome data indicate that at both polymorphic sites, the derived allele has reached fixation in humans and chimpanzees but not in bonobos, whereas only the ancestral allele is present among macaques. Our CRISPR/Cas9 genome editing and reporter assays show that both derived nucleotides repress ADRA2C, most likely by increasing NRSF binding. In addition, we detected signatures of recent positive selection for lower neuronal ADRA2C expression in humans. Our findings indicate that there has been selective pressure for enhanced sympathetic nervous activity in the evolution of humans and chimpanzees.
Affiliation(s)
- Kang Seon Lee
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Paramita Chatterjee
  - School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Eun-Young Choi
  - Specific Organs Cancer Branch, Research Institute, National Cancer Center, Ilsan, Gyeonggi, Republic of Korea
- Min Kyung Sung
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Jaeho Oh
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Hyejung Won
  - Department of Neurology, University of California Los Angeles, Los Angeles, California, United States of America
- Seong-Min Park
  - Specific Organs Cancer Branch, Research Institute, National Cancer Center, Ilsan, Gyeonggi, Republic of Korea
- Youn-Jae Kim
  - Specific Organs Cancer Branch, Research Institute, National Cancer Center, Ilsan, Gyeonggi, Republic of Korea
- Soojin V. Yi
  - School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Jung Kyoon Choi
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
39
Sheep genome functional annotation reveals proximal regulatory elements contributed to the evolution of modern breeds. Nat Commun 2018; 9:859. [PMID: 29491421 PMCID: PMC5830443 DOI: 10.1038/s41467-017-02809-1] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Received: 02/02/2017] [Accepted: 12/03/2017] [Indexed: 12/30/2022]
Abstract
Domestication fundamentally reshaped animal morphology, physiology and behaviour, offering the opportunity to investigate the molecular processes driving evolutionary change. Here we assess sheep domestication and artificial selection by comparing genome sequences from 43 modern breeds (Ovis aries) and their Asian mouflon ancestor (O. orientalis) to identify selection sweeps. Next, we provide a comparative functional annotation of the sheep genome, validated using experimental ChIP-Seq of sheep tissue. Using these annotations, we evaluate the impact of selection and domestication on regulatory sequences and find that sweeps are significantly enriched for protein-coding genes, proximal regulatory elements of genes and genome features associated with active transcription. Finally, we find that individual sites displaying strong allele frequency divergence are enriched for the same regulatory features. Our data demonstrate that remodelling of gene expression is likely to have been one of the evolutionary forces that drove phenotypic diversification of this common livestock species. The domestication of plants and animals causes genomic changes underlying various morphological, physiological and behavioral changes. Here, Naval-Sanchez et al. provide a ChIP-Seq-validated comparative functional annotation of the sheep genome, and show widespread evolution of proximal regulatory elements.
40
Harakalova M, Asselbergs FW. Systems analysis of dilated cardiomyopathy in the next generation sequencing era. Wiley Interdiscip Rev Syst Biol Med 2018; 10:e1419. [PMID: 29485202 DOI: 10.1002/wsbm.1419] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/12/2017] [Revised: 12/31/2017] [Accepted: 01/17/2018] [Indexed: 12/17/2022]
Abstract
Dilated cardiomyopathy (DCM) is a severe form of cardiac muscle failure caused by a long list of etiologies, ranging from myocardial infarction and DNA mutations in cardiac genes to toxic agents. Systems analysis integrating next-generation sequencing (NGS)-based omics approaches, such as the sequencing of DNA, RNA, and chromatin, provides valuable insights into DCM mechanisms. The outcome and interpretation of NGS methods can be affected by the location of the cardiac biopsy, the level of tissue degradation, and variable ratios of different cell populations, especially in the presence of fibrosis. Heart tissue composition may even differ between sexes, or between siblings carrying the same disease-causing mutation. Therefore, before planning any experiments, it is important to fully appreciate the complexities of DCM, and the selection of samples suitable for a given research question should be an interdisciplinary effort involving clinicians and biologists. The list of NGS omics datasets in DCM to date is short. More studies have to be performed to contribute to public data repositories and facilitate systems analysis. In addition, proper data integration is a difficult task requiring complex computational approaches. Despite these complications, systems analysis in DCM has multiple promising implications. By combining various types of datasets, for example, RNA-seq, ChIP-seq, or 4C, deep insights into cardiac biology, as well as possible biomarkers and treatment targets, can be gained. Systems analysis can also facilitate the annotation of noncoding mutations in cardiac-specific DNA regulatory regions that play a substantial role in maintaining the tissue- and cell-specific transcriptional programs in the heart. This article is categorized under: Physiology > Mammalian Physiology in Health and Disease; Laboratory Methods and Technologies > Genetic/Genomic Methods; Laboratory Methods and Technologies > RNA Methods.
Affiliation(s)
- Magdalena Harakalova
  - Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Folkert W Asselbergs
  - Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
  - Durrer Center for Cardiovascular Research, Netherlands Heart Institute, Utrecht, Netherlands
  - Institute of Cardiovascular Science, University College London, London, UK
41
Redundant regulation. Nat Ecol Evol 2018; 2:418-419. [PMID: 29379186 DOI: 10.1038/s41559-018-0479-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
42
Dynamic evolution of regulatory element ensembles in primate CD4+ T cells. Nat Ecol Evol 2018; 2:537-548. [PMID: 29379187 DOI: 10.1038/s41559-017-0447-5] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 10/31/2017] [Accepted: 12/08/2017] [Indexed: 12/12/2022]
Abstract
How evolutionary changes at enhancers affect the transcription of target genes remains an important open question. Previous comparative studies of gene expression have largely measured the abundance of messenger RNA, which is affected by post-transcriptional regulatory processes, hence limiting inferences about the mechanisms underlying expression differences. Here, we directly measured nascent transcription in primate species, allowing us to separate transcription from post-transcriptional regulation. We used precision run-on and sequencing to map RNA polymerases in resting and activated CD4+ T cells in multiple human, chimpanzee and rhesus macaque individuals, with rodents as outgroups. We observed general conservation in coding and non-coding transcription, punctuated by numerous differences between species, particularly at distal enhancers and non-coding RNAs. Genes regulated by larger numbers of enhancers are more frequently transcribed at evolutionarily stable levels, despite reduced conservation at individual enhancers. Adaptive nucleotide substitutions are associated with lineage-specific transcription and at one locus, SGPP2, we predict and experimentally validate that multiple substitutions contribute to human-specific transcription. Collectively, our findings suggest a pervasive role for evolutionary compensation across ensembles of enhancers that jointly regulate target genes.
43
Li X, Kim Y, Tsang EK, Davis JR, Damani FN, Chiang C, Hess GT, Zappala Z, Strober BJ, Scott AJ, Li A, Ganna A, Bassik MC, Merker JD, Hall IM, Battle A, Montgomery SB. The impact of rare variation on gene expression across tissues. Nature 2017; 550:239-243. [PMID: 29022581 PMCID: PMC5877409 DOI: 10.1038/nature24267] [Citation(s) in RCA: 159] [Impact Index Per Article: 22.7] [Received: 09/08/2016] [Accepted: 09/13/2017] [Indexed: 12/24/2022]
Abstract
Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
Affiliation(s)
- Xin Li
  - Department of Pathology, Stanford University, Stanford, California 94305, USA
- Yungil Kim
  - Department of Computer Science, Johns Hopkins University, Baltimore 21218, Maryland, USA
- Emily K Tsang
  - Department of Pathology, Stanford University, Stanford, California 94305, USA
  - Biomedical Informatics Program, Stanford University, Stanford, California 94305, USA
- Joe R Davis
  - Department of Pathology, Stanford University, Stanford, California 94305, USA
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
- Farhan N Damani
  - Department of Computer Science, Johns Hopkins University, Baltimore 21218, Maryland, USA
- Colby Chiang
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Gaelen T Hess
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
- Zachary Zappala
  - Department of Pathology, Stanford University, Stanford, California 94305, USA
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
- Benjamin J Strober
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Alexandra J Scott
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Amy Li
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
- Andrea Ganna
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Michael C Bassik
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
- Jason D Merker
  - Department of Pathology, Stanford University, Stanford, California 94305, USA
- Ira M Hall
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri 63108, USA
  - Department of Medicine, Washington University School of Medicine, St Louis, Missouri 63110, USA
  - Department of Genetics, Washington University School of Medicine, St Louis, Missouri 63110, USA
- Alexis Battle
  - Department of Computer Science, Johns Hopkins University, Baltimore 21218, Maryland, USA
- Stephen B Montgomery
  - Department of Pathology, Stanford University, Stanford, California 94305, USA
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
44
Gursky VV, Kozlov KN, Kulakovskiy IV, Zubair A, Marjoram P, Lawrie DS, Nuzhdin SV, Samsonova MG. Translating natural genetic variation to gene expression in a computational model of the Drosophila gap gene regulatory network. PLoS One 2017; 12:e0184657. [PMID: 28898266 PMCID: PMC5595321 DOI: 10.1371/journal.pone.0184657] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/05/2017] [Accepted: 08/28/2017] [Indexed: 11/18/2022]
Abstract
Annotating the genotype-phenotype relationship, and developing a proper quantitative description of the relationship, requires understanding the impact of natural genomic variation on gene expression. We apply a sequence-level model of gap gene expression in the early development of Drosophila to analyze single nucleotide polymorphisms (SNPs) in a panel of sequenced natural D. melanogaster lines. Using a thermodynamic modeling framework, we provide both analytical and computational descriptions of how single-nucleotide variants affect gene expression. The analysis reveals that the sequence variants increase (decrease) gene expression if located within binding sites of repressors (activators). We show that the sign of SNP influence (activation or repression) may change in time and space and elucidate the origin of this change in specific examples. The thermodynamic modeling approach predicts non-local and non-linear effects arising from SNPs, and combinations of SNPs, in individual fly genotypes. Simulation of individual fly genotypes using our model reveals that this non-linearity reduces to almost additive inputs from multiple SNPs. Further, we see signatures of the action of purifying selection in the gap gene regulatory regions. To infer the specific targets of purifying selection, we analyze the patterns of polymorphism in the data at two phenotypic levels: the strengths of binding and expression. We find that combinations of SNPs show evidence of being under selective pressure, while individual SNPs do not. The model predicts that SNPs appear to accumulate in the genotypes of the natural population in a way biased towards small increases in activating action on the expression pattern. Taken together, these results provide a systems-level view of how genetic variation translates to the level of gene regulatory networks via combinatorial SNP effects.
Affiliation(s)
- Vitaly V. Gursky
  - Theoretical Department, Ioffe Institute, Saint Petersburg, Russia
  - Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia
- Konstantin N. Kozlov
  - Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia
- Ivan V. Kulakovskiy
  - Engelhardt Institute of Molecular Biology, Moscow, Russia
  - Vavilov Institute of General Genetics, Moscow, Russia
  - Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, Russia
- Asif Zubair
  - Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Paul Marjoram
  - Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- David S. Lawrie
  - Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Sergey V. Nuzhdin
  - Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Maria G. Samsonova
  - Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia
45
Kober KM, Pogson GH. Genome-wide signals of positive selection in strongylocentrotid sea urchins. BMC Genomics 2017; 18:555. [PMID: 28732465 PMCID: PMC5521101 DOI: 10.1186/s12864-017-3944-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/04/2017] [Accepted: 07/13/2017] [Indexed: 12/21/2022]
Abstract
Background: Comparative genomics studies investigating the signals of positive selection among groups of closely related species are still rare and limited in taxonomic breadth. Such studies show great promise in advancing our knowledge about the proportion and the identity of genes experiencing diversifying selection. However, methodological challenges have led to high levels of false positives in past studies. Here, we use the well-annotated genome of the purple sea urchin, Strongylocentrotus purpuratus, as a reference to investigate the signals of positive selection at 6520 single-copy orthologs from nine sea urchin species belonging to the family Strongylocentrotidae, paying careful attention to minimizing false positives.
Results: We identified 1008 (15.5%) candidate positive selection genes (PSGs). Tests for positive selection along the nine terminal branches of the phylogeny identified 824 genes that showed lineage-specific adaptive diversification (1.67% of branch-sites tests performed). Positively selected codons were not enriched at exon borders or near regions containing missing data, suggesting a limited contribution of false positives caused by alignment or annotation errors. Alignments were validated at 10 loci by re-sequencing using Sanger methods. No differences were observed in the rates of synonymous substitution (dS), GC content, and codon bias between the candidate PSGs and those not showing positive selection. However, the candidate PSGs had 68% higher rates of nonsynonymous substitution (dN) and 33% lower levels of heterozygosity, consistent with selective sweeps and opposite to that expected from a relaxation of selective constraint. Although positive selection was identified at reproductive proteins and innate immunity genes, the strongest signals of adaptive diversification were observed at extracellular matrix proteins, cell adhesion molecules, membrane receptors, and ion channels. Many candidate PSGs have been widely implicated as targets of pathogen binding, inactivation, mimicry, or exploitation in other groups (notably mammals).
Conclusions: Our study confirmed the widespread action of positive selection across sea urchin genomes and allowed us to reject the possibility that annotation and alignment errors (including paralogs) were responsible for creating false signals of adaptive molecular divergence. The candidate PSGs identified in our study represent promising targets for future research into the selective agents responsible for their adaptive diversification and their contribution to speciation.
Affiliation(s)
- Kord M Kober
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, USA
  - Institute for Computational Health Sciences, University of California, San Francisco, USA
  - Present address: Department of Physiological Nursing, University of California, San Francisco, USA
- Grant H Pogson
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, USA
46
Li A, Hooli B, Mullin K, Tate RE, Bubnys A, Kirchner R, Chapman B, Hofmann O, Hide W, Tanzi RE. Silencing of the Drosophila ortholog of SOX5 leads to abnormal neuronal development and behavioral impairment. Hum Mol Genet 2017; 26:1472-1482. [PMID: 28186563 DOI: 10.1093/hmg/ddx051] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 12/22/2016] [Accepted: 02/07/2017] [Indexed: 01/27/2023]
Abstract
SOX5 encodes a transcription factor that is expressed in multiple tissues, including heart, lung and brain. Mutations in SOX5 have previously been found in patients with amyotrophic lateral sclerosis (ALS) and with developmental delay, intellectual disability and dysmorphic features. To characterize the neuronal role of SOX5, we silenced the Drosophila ortholog of SOX5, Sox102F, by RNAi in various neuronal subtypes in Drosophila. Silencing of Sox102F led to misoriented and disorganized microchaetes, neurons with shorter dendritic arborization (DA) and reduced complexity, diminished larval peristaltic contractions, loss of neuromuscular junction bouton structures, impaired olfactory perception, and severe neurodegeneration in the brain. Silencing of SOX5 in human SH-SY5Y neuroblastoma cells resulted in significant repression of WNT signaling activity and altered expression of WNT-related genes. Genetic association analyses and meta-analyses of SOX5 variants in several large family-based and case-control late-onset familial Alzheimer's disease (LOAD) samples revealed several variants that show significant association with AD status. In addition, analysis for rare and highly penetrant functional variants revealed four novel variants/mutations in SOX5, which, taken together with functional prediction analysis, suggest a strong role for SOX5 in causing AD in the carrier families. Collectively, these findings indicate that SOX5 is a novel candidate gene for LOAD with an important role in neuronal function. The genetic findings warrant further studies to identify and characterize SOX5 variants that confer risk for AD, ALS and intellectual disability.
Affiliation(s)
- Airong Li
- Genetics and Aging Research Unit, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, MassGeneral Institute for Neurodegenerative Diseases, Charlestown, MA 02129, USA
- Basavaraj Hooli
- Genetics and Aging Research Unit, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, MassGeneral Institute for Neurodegenerative Diseases, Charlestown, MA 02129, USA
- Kristina Mullin
- Genetics and Aging Research Unit, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, MassGeneral Institute for Neurodegenerative Diseases, Charlestown, MA 02129, USA
- Rebecca E Tate
- Genetics and Aging Research Unit, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, MassGeneral Institute for Neurodegenerative Diseases, Charlestown, MA 02129, USA
- Adele Bubnys
- Genetics and Aging Research Unit, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, MassGeneral Institute for Neurodegenerative Diseases, Charlestown, MA 02129, USA
- Rory Kirchner
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Brad Chapman
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Oliver Hofmann
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA; Center for Cancer Research, University of Melbourne, Melbourne 3000, Australia and
- Winston Hide
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA; Department of Neuroscience, Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield S10 2HQ, UK
- Rudolph E Tanzi
- Genetics and Aging Research Unit, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, MassGeneral Institute for Neurodegenerative Diseases, Charlestown, MA 02129, USA
47
Savisaar R, Hurst LD. Estimating the prevalence of functional exonic splice regulatory information. Hum Genet 2017; 136:1059-1078. [PMID: 28405812 PMCID: PMC5602102 DOI: 10.1007/s00439-017-1798-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 01/26/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022]
Abstract
In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1–4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the assumption that constraint only operates on exon ends, the conservation-based methods can be overly conservative.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
48
Huang YF, Gulko B, Siepel A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet 2017; 49:618-624. [PMID: 28288115 PMCID: PMC5395419 DOI: 10.1038/ng.3810] [Citation(s) in RCA: 221] [Impact Index Per Article: 31.6] [Received: 08/15/2016] [Accepted: 02/13/2017] [Indexed: 12/17/2022]
Abstract
Many genetic variants that influence phenotypes of interest are located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which, therefore, are likely to be phenotypically important. LINSIGHT combines a generalized linear model for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the 'big data' available in modern genomics. We show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. In addition, we apply LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell type, tissue specificity, and constraints at associated promoters.
Affiliation(s)
- Yi-Fei Huang
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Brad Gulko
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA; Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
49
Evolution of Brain Active Gene Promoters in Human Lineage Towards the Increased Plasticity of Gene Regulation. Mol Neurobiol 2017; 55:1871-1904. [PMID: 28233272 DOI: 10.1007/s12035-017-0427-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/10/2016] [Accepted: 01/26/2017] [Indexed: 01/31/2023]
Abstract
Adaptability to a variety of environmental conditions is a prominent feature of Homo sapiens. We hypothesize that this feature can be explained by evolutionary changes in gene promoters active in the prefrontal cortex of the brain, leading to a more flexible gene regulatory network. The genotype-dependent range of gene expression can be broader in humans than in other higher primates. We therefore searched for specific signatures of evolutionary change in the promoter architectures of multiple hominid genes, including genes active in human cortical neurons, that may indicate an increase in the variability of gene expression rather than just a change in expression level, such as downregulation or upregulation of the genes. We performed a whole-genome search for genetically based alterations that may affect the "flexibility" of gene regulation in the course of hominid evolution, such as (i) CpG dinucleotide content, (ii) predicted nucleosome-DNA dissociation constant, and (iii) predicted affinity for TATA-binding protein (TBP) in gene promoters. We tested all putative promoter regions across the human genome, and especially gene promoters in an active chromatin state in neurons of the prefrontal cortex, the brain region critical for abstract thinking and for social and behavioral adaptation. Our data imply that the origin of modern humans was associated with an increase in the flexibility of promoter-driven gene regulation in the brain. In contrast, after splitting from the ancestral lineage of H. sapiens, the evolution of ape species was characterized by reduced flexibility of gene promoter function, underlying reduced variability of gene expression.
50
Schor IE, Degner JF, Harnett D, Cannavò E, Casale FP, Shim H, Garfield DA, Birney E, Stephens M, Stegle O, Furlong EEM. Promoter shape varies across populations and affects promoter evolution and expression noise. Nat Genet 2017; 49:550-558. [PMID: 28191888 DOI: 10.1038/ng.3791] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 09/14/2016] [Accepted: 01/20/2017] [Indexed: 12/29/2022]
Abstract
Animal promoters initiate transcription either at precise positions (narrow promoters) or across dispersed regions (broad promoters), a distinction referred to as promoter shape. Although promoter shape is highly conserved, the functional properties of promoters with different shapes and the genetic basis of their evolution remain unclear. Here we used natural genetic variation across a panel of 81 Drosophila lines to measure changes in transcription start site (TSS) usage, identifying thousands of genetic variants affecting transcript levels (strength) or the distribution of TSSs within a promoter (shape). Our results identify promoter shape as a molecular trait that can evolve independently of promoter strength. Broad promoters typically harbor shape-associated variants, with signatures of adaptive selection. Single-cell measurements demonstrate that variants modulating promoter shape often increase expression noise, whereas heteroallelic interactions with other promoter variants alleviate these effects. These results uncover new functional properties of natural promoters and suggest the minimization of expression noise as an important factor in promoter evolution.
Affiliation(s)
- Ignacio E Schor
- European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Jacob F Degner
- European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Dermot Harnett
- European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Enrico Cannavò
- European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Francesco P Casale
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Heejung Shim
- Department of Statistics, Purdue University, West Lafayette, Indiana, USA
- David A Garfield
- European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Eileen E M Furlong
- European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany