1
Iliopoulou E, Papadogiannis V, Tsigenopoulos CS, Manousaki T. Extensive Loss and Gain of Conserved Noncoding Elements During Early Teleost Evolution. Genome Biol Evol 2024; 16:evae061. [PMID: 38648507 PMCID: PMC11034925 DOI: 10.1093/gbe/evae061] [Accepted: 03/19/2024]
Abstract
Conserved noncoding elements in vertebrates are enriched around transcription factor loci associated with development. However, loss and rapid divergence of conserved noncoding elements have been reported in teleost fish, albeit in studies that considered only a few genomes. Taking advantage of the recent increase in high-quality teleost genomes, we study the evolution of teleost conserved noncoding elements, carrying out targeted genomic alignments and comparisons within the teleost phylogeny to detect conserved noncoding elements and reconstruct the ancestral teleost conserved noncoding element repertoire. This teleost-centric approach confirms previous observations of extensive loss of vertebrate conserved noncoding elements early in teleost evolution, but also reveals a massive gain of conserved noncoding elements in the teleost stem group over 300 million years ago. Using synteny-based association to link conserved noncoding elements to their putative target genes, we show that most teleost-gained conserved noncoding elements are found in the vicinity of orthologous loci involved in transcriptional regulation and embryonic development that are also associated with conserved noncoding elements in other vertebrates. Moreover, teleost and vertebrate conserved noncoding elements share a highly similar vocabulary of motifs and transcription factor binding sites. We suggest that early teleost conserved noncoding element gains reflect a restructuring of the ancestral conserved noncoding element repertoire through both extreme divergence and de novo emergence. Finally, we provide evidence that newly identified pan-teleost conserved noncoding elements have the potential to resolve teleost phylogenetic placements accurately, on par with coding sequences, unlike ancestral-only elements shared with spotted gar.
This work provides new insight into conserved noncoding element evolution that will be valuable for follow-up work on phylogenomics, comparative genomics, and the study of gene regulation evolution in teleosts.
Affiliation(s)
- Elisavet Iliopoulou
  - Hellenic Centre for Marine Research (HCMR), Institute of Marine Biology, Biotechnology & Aquaculture (IMBBC), Heraklion, Greece
  - Present Address: Université Paris Cité, CNRS, Institut Jacques Monod, F-75013 Paris, France
- Vasileios Papadogiannis
  - Hellenic Centre for Marine Research (HCMR), Institute of Marine Biology, Biotechnology & Aquaculture (IMBBC), Heraklion, Greece
  - Present Address: Center for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Costas S Tsigenopoulos
  - Hellenic Centre for Marine Research (HCMR), Institute of Marine Biology, Biotechnology & Aquaculture (IMBBC), Heraklion, Greece
- Tereza Manousaki
  - Hellenic Centre for Marine Research (HCMR), Institute of Marine Biology, Biotechnology & Aquaculture (IMBBC), Heraklion, Greece
2
Abstract
We developed dbCNS (http://yamasati.nig.ac.jp/dbcns), a new database for conserved noncoding sequences (CNSs). CNSs exist in many eukaryotes and are assumed to be involved in the control of protein expression. Version 1 of dbCNS, introduced here, includes a powerful and precise CNS identification pipeline for multiple vertebrate genomes. Because mutations in CNSs may induce morphological changes and cause genetic diseases, many vertebrate CNSs have been identified, with special reference to primate genomes. We integrated ∼6.9 million CNSs from many vertebrate genomes into dbCNS, which allows users to extract CNSs near genes of interest using keyword searches. In addition to CNSs, dbCNS contains published genome sequences of 161 species. With purposeful taxonomic sampling of genomes, users can employ CNSs as queries to reconstruct CNS alignments and phylogenetic trees, to evaluate CNS modifications, acquisitions, and losses, and to roughly identify species whose CNSs have accelerated substitution rates. dbCNS also provides links to dbSNP for searching pathogenic single-nucleotide polymorphisms in human CNSs. Thus, dbCNS links morphological changes with genetic diseases. A test analysis using 38 gnathostome genomes was completed within 30 s. dbCNS results can also be used to evaluate CNSs identified by other stand-alone programs using genome-scale data.
Affiliation(s)
- Jun Inoue
  - Population Genetics Laboratory, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Japan
  - Center for Earth Surface System Dynamics, Atmosphere and Ocean Research Institute, University of Tokyo, Kashiwa, Japan
- Naruya Saitou
  - Population Genetics Laboratory, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Japan
  - Department of Okinawa Bioinformation Bank, Faculty of Medicine, University of the Ryukyus, Okinawa, Japan
3
Cruz MAD, Lund D, Szekeres F, Karlsson S, Faresjö M, Larsson D. Cis-regulatory elements in conserved non-coding sequences of nuclear receptor genes indicate for crosstalk between endocrine systems. Open Med (Wars) 2021; 16:640-650. [PMID: 33954257 PMCID: PMC8051167 DOI: 10.1515/med-2021-0264] [Received: 10/01/2020] [Revised: 02/01/2021] [Accepted: 03/09/2021]
Abstract
Nuclear receptors (NRs) are ligand-activated transcription factors that regulate gene expression when bound to specific DNA sequences. Crosstalk between steroid NR systems has been studied to understand the development of hormone-driven cancers, but not extensively at the genetic level. This study aimed to investigate crosstalk between steroid NRs in conserved intron and exon sequences, with a focus on steroid NRs involved in prostate cancer etiology. For this purpose, we evaluated conserved intron and exon sequences among all 49 members of the NR superfamily (NRS) and their relevance as regulatory sequences and NR-binding sequences. Sequence conservation was higher in the first intron (35%) than in downstream introns. Seventy-nine percent of the conserved regions in the NRS contained putative transcription factor binding sites (TFBS), and a large fraction of these sequences contained splicing sites (SS). Analysis of transcription factors binding to putative intronic and exonic TFBS revealed that 5% and 16% of them, respectively, were NRs. The present study suggests crosstalk between steroid NRs, e.g., the vitamin D, estrogen, progesterone, and retinoic acid endocrine systems, through cis-regulatory elements in conserved sequences of introns and exons. This investigation provides evidence for crosstalk between steroid hormones and points to novel targets for steroid NR regulation.
Affiliation(s)
- Maria Araceli Diaz Cruz
  - Research School of Health and Welfare, School of Health and Welfare, Jönköping University, Jönköping, Sweden
- Dan Lund
  - Department of Natural Science and Biomedicine, School of Health and Welfare, Jönköping University, Jönköping, Sweden
- Ferenc Szekeres
  - Department of Biomedicine, School of Health Sciences, University of Skövde, Skövde, Sweden
- Sandra Karlsson
  - Department of Natural Science and Biomedicine, School of Health and Welfare, Jönköping University, Jönköping, Sweden
- Maria Faresjö
  - Department of Natural Science and Biomedicine, School of Health and Welfare, Jönköping University, Jönköping, Sweden
- Dennis Larsson
  - Sahlgrenska University Hospital, Gothia Forum for Clinical Research, Gothenburg, Sweden
4
Babarinde IA, Saitou N. The Dynamics, Causes, and Impacts of Mammalian Evolutionary Rates Revealed by the Analyses of Capybara Draft Genome Sequences. Genome Biol Evol 2020; 12:1444-1458. [PMID: 32835375 DOI: 10.1093/gbe/evaa157] [Accepted: 07/23/2020]
Abstract
Capybara (Hydrochoerus hydrochaeris) is the largest extant rodent species. The draft genome of capybara was sequenced, with an estimated genome size of 2.6 Gb. Although capybara is about 60 times larger than guinea pig, comparative analyses revealed that the neutral evolutionary rates of the two species were not substantially different. However, analyses of 39 mammalian genomes revealed highly heterogeneous evolutionary rates. The highest evolutionary rate, 8.5 times higher than the human rate, was found in the Cricetidae-Muridae common ancestor after the divergence of Spalacidae. Muridae, the family with the highest number of species among mammals, emerged after this rate acceleration. Factors responsible for the evolutionary rate heterogeneity were investigated through correlations between the evolutionary rate and longevity, gestation length, litter frequency, litter size, body weight, generation interval, age at maturity, and taxonomic order. Regression analysis of these factors showed that the model with three factors (taxonomic order, generation interval, and litter size) had the highest predictive power (R² = 0.74). These three factors determine the number of meioses per unit time. We also conducted transcriptome analysis and found that evolutionary rate dynamics affect the evolution of gene expression patterns.
Affiliation(s)
- Isaac Adeyemi Babarinde
  - Department of Biological Sciences, Southern University of Science and Technology, Shenzhen, China
  - Population Genetics Laboratory, National Institute of Genetics, Mishima, Japan
- Naruya Saitou
  - Population Genetics Laboratory, National Institute of Genetics, Mishima, Japan
  - School of Medicine, University of the Ryukyus, Okinawa, Japan
  - Department of Genetics, School of Life Science, Graduate University for Advanced Studies, Mishima, Japan
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan
5
Li L, Barth NKH, Hirth E, Taher L. Pairs of Adjacent Conserved Noncoding Elements Separated by Conserved Genomic Distances Act as Cis-Regulatory Units. Genome Biol Evol 2018; 10:2535-2550. [PMID: 30184074 PMCID: PMC6161761 DOI: 10.1093/gbe/evy196] [Accepted: 09/01/2018]
Abstract
Comparative genomic studies have identified thousands of conserved noncoding elements (CNEs) in the mammalian genome, many of which have been reported to exert cis-regulatory activity. We analyzed ∼5,500 pairs of adjacent CNEs in the human genome and found that, despite divergence at the nucleotide sequence level, the inter-CNE distances of the pairs are under strong evolutionary constraint, with inter-CNE sequences featuring significantly lower transposon densities than expected. Further, we show that different degrees of conservation of the inter-CNE distance are associated with distinct cis-regulatory functions at the CNEs. Specifically, CNEs in pairs with conserved or mildly contracted inter-CNE sequences are the most likely to represent active or poised enhancers. In contrast, CNEs in pairs with extremely contracted or expanded inter-CNE sequences are not associated with cis-regulatory activity. Furthermore, we observed that functional CNEs in a pair have very similar epigenetic profiles, hinting at a functional relationship between them. Taken together, our results support the existence of distance-sensitive epistatic interactions between adjacent CNEs that are disrupted by transposon insertions and deletions. These results contribute to our understanding of the selective forces acting on cis-regulatory elements, which is crucial for elucidating the molecular mechanisms underlying adaptive evolution and human genetic diseases.
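To make the idea of a conserved versus contracted or expanded inter-CNE distance concrete, the following minimal Python sketch compares the gap between the same pair of adjacent CNEs in two genomes. It assumes half-open (start, end) coordinates on a single chromosome and summarizes the change as a log2 ratio of the two gaps; the function names, the log2 metric, and the coordinates are hypothetical illustrations, not the procedure or data of Li et al.

```python
import math
from typing import Tuple

# A CNE as a half-open (start, end) interval on one chromosome (assumption).
Interval = Tuple[int, int]


def inter_cne_distance(first: Interval, second: Interval) -> int:
    """Return the gap in bp between two adjacent, non-overlapping CNEs."""
    upstream, downstream = sorted([first, second])
    gap = downstream[0] - upstream[1]
    if gap < 0:
        raise ValueError("CNEs overlap")
    return gap


def log2_distance_ratio(ref_pair, query_pair) -> float:
    """log2(query gap / reference gap) for the same CNE pair in two genomes.

    Illustrative metric only: 0 means unchanged, < 0 contraction, > 0 expansion.
    """
    ref_gap = inter_cne_distance(*ref_pair)
    query_gap = inter_cne_distance(*query_pair)
    if ref_gap == 0 or query_gap == 0:
        raise ValueError("zero-length gap; ratio undefined")
    return math.log2(query_gap / ref_gap)


# Hypothetical coordinates: a 2,000 bp gap in the reference, 1,000 bp in the query.
human_pair = ((10_000, 10_200), (12_200, 12_350))
mouse_pair = ((50_000, 50_190), (51_190, 51_330))
print(log2_distance_ratio(human_pair, mouse_pair))  # -1.0 (gap halved)
```

A ratio near 0 corresponds to a conserved distance, while large negative or positive values correspond to the contracted or expanded pairs described in the abstract; where to draw those cut-offs is deliberately left open here.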
Affiliation(s)
- Lifei Li
  - Division of Bioinformatics, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Nicolai K H Barth
  - Division of Bioinformatics, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Eva Hirth
  - Division of Bioinformatics, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Leila Taher
  - Division of Bioinformatics, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
6
Han SK, Kim D, Lee H, Kim I, Kim S. Divergence of Noncoding Regulatory Elements Explains Gene–Phenotype Differences between Human and Mouse Orthologous Genes. Mol Biol Evol 2018; 35:1653-1667. [DOI: 10.1093/molbev/msy056]
Affiliation(s)
- Seong Kyu Han
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, Korea
- Donghyo Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, Korea
- Heetak Lee
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, Korea
- Inhae Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, Korea
- Sanguk Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, Korea
7
Conserved noncoding sequences conserve biological networks and influence genome evolution. Heredity (Edinb) 2018; 120:437-451. [PMID: 29396421 DOI: 10.1038/s41437-018-0055-4] [Received: 07/10/2017] [Revised: 12/14/2017] [Accepted: 01/08/2018]
Abstract
Comparative genomics approaches have identified numerous conserved cis-regulatory sequences near genes in plant genomes. Despite the identification of these conserved noncoding sequences (CNSs), our knowledge of their functional importance and of the selection acting on them remains limited. Here, we used a combination of DNA methylome analysis, microarray expression analyses, and functional annotation to study these sequences in the model tree Populus trichocarpa. Methylation in CG and non-CG contexts was lower in CNSs, particularly CNSs in the 5'-upstream regions of genes, than at other sites in the genome. We observed that CNSs are enriched in genes with transcription and binding functions, and that this enrichment is also associated with syntenic genes and genes from whole-genome duplications, suggesting that cis-regulatory sequences play a key role in genome evolution. We detected a significant positive correlation between CNS number and protein interactions, suggesting that CNSs may have roles in the evolution and maintenance of biological networks. The divergence of CNSs indicates that duplication-degeneration-complementation drives the subfunctionalization of a proportion of duplicated genes from whole-genome duplication. Furthermore, population genomics confirmed that most CNSs are under strong purifying selection and that only a small subset shows evidence of adaptive evolution. These findings provide a foundation for future studies of these key genomic features in the maintenance of biological networks, local adaptation, and transcription.
8
Polychronopoulos D, King JWD, Nash AJ, Tan G, Lenhard B. Conserved non-coding elements: developmental gene regulation meets genome organization. Nucleic Acids Res 2017; 45:12611-12624. [PMID: 29121339 PMCID: PMC5728398 DOI: 10.1093/nar/gkx1074] [Received: 07/26/2017] [Accepted: 10/24/2017]
Abstract
Comparative genomics has revealed a class of non-protein-coding genomic sequences that display an extraordinary degree of conservation between two or more organisms, regularly exceeding that found within protein-coding exons. These elements, collectively referred to as conserved non-coding elements (CNEs), are non-randomly distributed across chromosomes and tend to cluster in the vicinity of genes with regulatory roles in multicellular development and differentiation. CNEs are organized into functional ensembles called genomic regulatory blocks: dense clusters of elements that collectively coordinate the expression of shared target genes and whose span in many cases coincides with topologically associated domains. CNEs display sequence properties that set them apart from other sequences under constraint, and have recently been proposed as useful markers for reconstructing the evolutionary history of organisms. Disruption of several of these elements is known to contribute to developmental diseases and to cancer. The emergence, evolutionary dynamics, and functions of CNEs remain poorly understood, and new approaches are required to enable comprehensive CNE identification and characterization. Here, we review current knowledge and identify challenges that need to be tackled to resolve the impasse in understanding extreme non-coding conservation.
Affiliation(s)
- Dimitris Polychronopoulos
  - Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- James W D King
  - Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- Alexander J Nash
  - Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- Ge Tan
  - Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- Boris Lenhard
  - Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Thormøhlensgate 55, N-5008 Bergen, Norway
9
Saitou N. Neutral Evolution. In: Introduction to Evolutionary Genomics. 2018. [PMCID: PMC7121930 DOI: 10.1007/978-3-319-92642-1_5]
Abstract
Neutral evolution is the default process of genomic change. This is because our world is finite, and randomness, which is indispensable for neutral evolution, is important when we consider the history of a finite world. The random nature of DNA propagation is discussed using branching processes, coalescent processes, Markov processes, and diffusion processes. Expected evolutionary patterns under neutrality are then discussed in terms of fixation probability, rate of evolution, and the amount of DNA variation maintained in a population. We then discuss various features of neutral evolution, including evolutionary rates, synonymous and nonsynonymous substitutions, junk DNA, and pseudogenes.
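As a standard illustration of how fixation probability and rate of evolution connect under neutrality (Kimura's classic result, given here as general background rather than quoted from the chapter), consider a diploid population of N individuals, hence 2N gene copies, with a neutral mutation rate μ per site per generation:

```latex
\begin{aligned}
u &= \frac{1}{2N} && \text{fixation probability of a new neutral mutant}\\
k &= \underbrace{2N\mu}_{\text{new mutants per generation}} \times \underbrace{\frac{1}{2N}}_{\text{fixation probability}} = \mu
\end{aligned}
```

The substitution rate k therefore equals the neutral mutation rate and does not depend on population size.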
Affiliation(s)
- Naruya Saitou
  - Division of Population Genetics, National Institute of Genetics (NIG), Mishima, Shizuoka Japan
10
Mahmoudi Saber M, Saitou N. Silencing Effect of Hominoid Highly Conserved Noncoding Sequences on Embryonic Brain Development. Genome Biol Evol 2017; 9:2037-2048. [PMID: 28633494 PMCID: PMC5591954 DOI: 10.1093/gbe/evx105] [Accepted: 06/16/2017]
Abstract
Superfamily Hominoidea, which consists of Hominidae (humans and great apes) and Hylobatidae (gibbons), is well known for sharing human-like characteristics; however, the genomic origins of these shared unique phenotypes have remained largely elusive. To decipher the genomic basis of Hominoidea-restricted phenotypes, we identified and characterized Hominoidea-restricted highly conserved noncoding sequences (HCNSs), a class of potential regulatory elements that may be involved in the evolution of lineage-specific phenotypes. We discovered 679 such HCNSs from human, chimpanzee, gorilla, orangutan, and gibbon genomes. These HCNSs were shown to be under purifying selection but with lineage-restricted characteristics different from those of old CNSs. A significant proportion of their ancestral sequences had accelerated rates of nucleotide substitutions, insertions, and deletions during the evolution of the common ancestor of Hominoidea, suggesting the involvement of positive Darwinian selection in creating these HCNSs. In contrast to enhancer elements and similar to silencer sequences, these Hominoidea-restricted HCNSs are located in close proximity to transcription start sites. Their target genes are enriched in functions related to the nervous system, development, and transcription, and they tend to be located far from the nearest coding gene. ChIP-seq signals and gene expression patterns suggest that Hominoidea-restricted HCNSs are likely functional regulatory elements that impose silencing effects on their target genes in a tissue-restricted manner during fetal brain development. These HCNSs, which emerged through adaptive evolution and were conserved through purifying selection, represent a set of promising targets for future functional studies of the evolution of Hominoidea-restricted phenotypes.
Affiliation(s)
- Morteza Mahmoudi Saber
  - Division of Population Genetics, National Institute of Genetics, Mishima, Japan
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan
- Naruya Saitou
  - Division of Population Genetics, National Institute of Genetics, Mishima, Japan
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan
11
Algama M, Tasker E, Williams C, Parslow AC, Bryson-Richardson RJ, Keith JM. Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach. BMC Genomics 2017; 18:259. [PMID: 28347272 PMCID: PMC5369223 DOI: 10.1186/s12864-017-3645-2] [Received: 11/08/2016] [Accepted: 03/18/2017]
Abstract
Background: Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved among three evolutionarily distant vertebrate species: human, mouse, and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences.
Results: We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focused analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focused elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, confirming that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors.
Conclusions: This study demonstrates a novel, highly effective Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies suggesting that the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at the DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. We attribute the higher sensitivity of the pathway-focused analysis compared with the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.
Affiliation(s)
- Manjula Algama
  - School of Mathematical Sciences, Monash University, Melbourne, VIC, 3800, Australia
- Edward Tasker
  - School of Mathematical Sciences, Monash University, Melbourne, VIC, 3800, Australia
- Caitlin Williams
  - School of Biological Sciences, Monash University, Melbourne, VIC, 3800, Australia
- Adam C Parslow
  - School of Biological Sciences, Monash University, Melbourne, VIC, 3800, Australia
- Jonathan M Keith
  - School of Mathematical Sciences, Monash University, Melbourne, VIC, 3800, Australia
12
Hettiarachchi N, Saitou N. GC Content Heterogeneity Transition of Conserved Noncoding Sequences Occurred at the Emergence of Vertebrates. Genome Biol Evol 2016; 8:3377-3392. [PMID: 28040773 PMCID: PMC5203776 DOI: 10.1093/gbe/evw231]
Abstract
Conserved non-coding sequences (CNSs) of eukaryotes are known to be significantly enriched in regulatory sequences. CNSs of diverse lineages follow different patterns in abundance, sequence composition, and location. Here, we report a thorough analysis of CNSs in diverse groups of eukaryotes with respect to GC content heterogeneity. We examined 24 fungi, 19 invertebrates, and 12 non-mammalian vertebrates to find lineage-specific features of CNSs. We found that fungal and invertebrate CNSs are predominantly GC-rich, as we previously observed in plants, whereas vertebrate CNSs are GC-poor. This result suggests that CNS GC content shifted from the ancestral GC-rich state of eukaryotes to a GC-poor state in the vertebrate lineage, owing to the recruitment of lineage-specific, GC-poor transcription factor binding sites. CNS GC content is closely linked with nucleosome occupancy, which determines the location and structural architecture of DNA.
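The GC-rich versus GC-poor comparison above rests on a simple per-sequence statistic. This minimal Python sketch computes GC content for individual CNS sequences and their mean; ignoring ambiguity codes such as N (rather than counting them in the denominator) is an assumption, and the toy sequences are hypothetical, not data from the study.

```python
from typing import Iterable


def gc_content(seq: str) -> float:
    """Return the fraction of G+C among unambiguous A/C/G/T bases.

    Ambiguity codes such as N are ignored rather than counted (an assumption).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return gc / (gc + at)


def mean_gc(seqs: Iterable[str]) -> float:
    """Average per-element GC content over a set of CNS sequences."""
    values = [gc_content(s) for s in seqs]
    if not values:
        raise ValueError("no sequences supplied")
    return sum(values) / len(values)


# Hypothetical toy CNS sequences, not data from the study.
cnss = ["GGCANN", "ATAT"]
print(gc_content("GGCANN"))  # 0.75
print(mean_gc(cnss))         # 0.375
```

Whether a set of CNSs counts as "GC-rich" or "GC-poor" depends on the genomic background it is compared against, which this sketch leaves to the caller.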
Affiliation(s)
- Nilmini Hettiarachchi
  - Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
  - Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Naruya Saitou
  - Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
  - Division of Population Genetics, National Institute of Genetics, Mishima, Japan
13
Saber MM, Adeyemi Babarinde I, Hettiarachchi N, Saitou N. Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences. Genome Biol Evol 2016; 8:2076-92. [PMID: 27289096 PMCID: PMC4987104 DOI: 10.1093/gbe/evw132]
Abstract
Family Hominidae, which includes humans and great apes, is recognized for its unique complex social behavior and intellectual abilities. Despite the increasing availability of genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the most promising evolutionary candidates for driving clade-specific characters and phenotypes. On this premise, we analyzed whole-genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1 and LTR/ERVL retrotransposition and transversion. Using genomic distance as a neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analyses of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions and in close proximity to genes involved in sensory perception of sound and developmental processes, and they also show a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, after which purifying selection began to operate to maintain these functions.
Affiliation(s)
- Morteza Mahmoudi Saber
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan
  - Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Isaac Adeyemi Babarinde
  - Division of Population Genetics, National Institute of Genetics, Mishima, Japan
  - Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Nilmini Hettiarachchi
  - Division of Population Genetics, National Institute of Genetics, Mishima, Japan
  - Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Naruya Saitou
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan
  - Division of Population Genetics, National Institute of Genetics, Mishima, Japan
  - Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
14
Babarinde IA, Saitou N. Genomic Locations of Conserved Noncoding Sequences and Their Proximal Protein-Coding Genes in Mammalian Expression Dynamics. Mol Biol Evol 2016; 33:1807-17. [PMID: 27017584 DOI: 10.1093/molbev/msw058] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 02/07/2023]
Abstract
Experimental studies have found that certain conserved noncoding sequences (CNSs) are involved in regulating proximal protein-coding genes in mammals. However, reported cases of long-range enhancer activity and inter-chromosomal regulation suggest that the proximity of CNSs to protein-coding genes might not be important for regulation. To test the importance of CNS genomic location, we extracted the CNSs conserved between chicken and four mammalian species (human, mouse, dog, and cattle). These CNSs were confirmed to be under purifying selection. Intergenic CNSs are often found in clusters in gene deserts, where protein-coding genes are sparse. The distribution pattern, ChIP-Seq, and RNA-Seq data suggested that the CNSs are more likely to be regulatory elements than long intergenic noncoding RNAs. Physical distances between CNSs and their nearest protein-coding genes were well conserved between the human and mouse genomes, and CNS-flanking genes were often found in evolutionarily conserved genomic neighborhoods. ChIP-Seq signal and gene expression patterns also suggested that CNSs regulate nearby genes. Interestingly, genes with more CNSs have more evolutionarily conserved expression than those with fewer CNSs. These computationally obtained results suggest that the genomic locations of CNSs are important for their regulatory functions. Indeed, various kinds of evolutionary constraints may act to maintain the genomic locations of CNSs and protein-coding genes in mammals to ensure proper regulation.
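The comparison of CNS-to-nearest-gene distances can be sketched as follows. This is a minimal illustration, assuming CNSs and genes are given as (start, end) intervals on the same chromosome and that distance is measured from the CNS midpoint to the closest gene boundary (zero if the midpoint falls inside a gene); the gene names, coordinates, and helper names are hypothetical, and the study's exact distance definition is not given in this abstract.

```python
from typing import List, Tuple

Gene = Tuple[int, int, str]  # (start, end, name); hypothetical 1-based inclusive coordinates


def distance_to_interval(pos: int, start: int, end: int) -> int:
    """Distance in bp from `pos` to the interval [start, end]; 0 if inside."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def nearest_gene(cns_start: int, cns_end: int, genes: List[Gene]) -> Tuple[str, int]:
    """Return (gene name, distance) for the gene closest to the CNS midpoint."""
    if not genes:
        raise ValueError("gene list is empty")
    mid = (cns_start + cns_end) // 2
    best = min(genes, key=lambda g: distance_to_interval(mid, g[0], g[1]))
    return best[2], distance_to_interval(mid, best[0], best[1])


# Hypothetical genes on one chromosome.
genes = [(1_000, 5_000, "GENE_A"), (20_000, 30_000, "GENE_B")]
print(nearest_gene(11_900, 12_100, genes))  # midpoint 12,000 -> ('GENE_A', 7000)
print(nearest_gene(24_900, 25_100, genes))  # midpoint inside GENE_B -> ('GENE_B', 0)
```

Running this for orthologous CNSs in two genomes and comparing the resulting distance lists (for example with a rank correlation) is one possible way to ask whether CNS-gene spacing is conserved; it is a sketch of the idea, not the authors' pipeline.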
Affiliation(s)
- Isaac Adeyemi Babarinde
- Department of Genetics, Graduate University for Advanced Studies, Mishima, Japan
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Naruya Saitou
- Department of Genetics, Graduate University for Advanced Studies, Mishima, Japan
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
15
Hettiarachchi N, Kryukov K, Sumiyama K, Saitou N. Lineage-specific conserved noncoding sequences of plant genomes: their possible role in nucleosome positioning. Genome Biol Evol 2014; 6:2527-42. [PMID: 25364802 PMCID: PMC4202324 DOI: 10.1093/gbe/evu188] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Accepted: 08/26/2014] [Indexed: 01/01/2023]
Abstract
Many studies on conserved noncoding sequences (CNSs) have found that CNSs are significantly enriched in regulatory sequence elements. We conducted a whole-genome analysis of plant CNSs to identify lineage-specific CNSs in eudicots, monocots, angiosperms, and vascular plants, based on the premise that lineage-specific CNSs define lineage-specific characters and functions in groups of organisms. We identified 27 eudicot, 204 monocot, 6,536 grass, 19 angiosperm, and 2 vascular plant lineage-specific CNSs (lengths ranging from 16 to 1,517 bp) that presumably originated in their respective common ancestors. A stronger constraint was observed on CNSs located in untranslated regions. The CNSs were often flanked by genes involved in transcription regulation. A drop in A+T content was observed near CNS borders, and CNS regions showed a higher nucleosome occupancy probability. These CNSs are candidate regulatory elements that are expected to define lineage-specific features of various plant groups.
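A change in A+T content near CNS borders is the kind of pattern a windowed profile reveals. A minimal sketch, assuming a plain genomic sequence string and a known CNS border coordinate (0-based); the function names, window and flank sizes, and the exaggerated toy sequence are hypothetical, not the study's parameters or data.

```python
import math
from typing import List, Tuple


def at_content(seq: str) -> float:
    """Fraction of A/T among unambiguous bases (A, C, G, T); NaN if none."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return math.nan
    return (seq.count("A") + seq.count("T")) / acgt


def at_profile(seq: str, border: int, flank: int, window: int) -> List[Tuple[int, float]]:
    """A+T content in consecutive, non-overlapping windows spanning
    [border - flank, border + flank). Each entry is (offset of the window
    start relative to the border, A+T fraction)."""
    profile = []
    for start in range(border - flank, border + flank, window):
        # Clip at the sequence start so negative indices never wrap around.
        chunk = seq[max(start, 0):max(start + window, 0)]
        profile.append((start - border, at_content(chunk)))
    return profile


# Toy sequence: 20 bp A/T-rich flank followed by 20 bp G/C-rich "CNS".
toy = "AT" * 10 + "GC" * 10
for offset, frac in at_profile(toy, border=20, flank=20, window=10):
    print(offset, frac)
# prints:
# -20 1.0
# -10 1.0
# 0 0.0
# 10 0.0
```

Real profiles would be averaged over many aligned CNS borders; a single synthetic sequence only shows the mechanics.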
Affiliation(s)
- Nilmini Hettiarachchi
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Kirill Kryukov
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Kenta Sumiyama
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Naruya Saitou
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan
16
Babarinde IA, Saitou N. Heterogeneous tempo and mode of conserved noncoding sequence evolution among four mammalian orders. Genome Biol Evol 2014; 5:2330-43. [PMID: 24259317 PMCID: PMC3879966 DOI: 10.1093/gbe/evt177] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 12/12/2022]
Abstract
Conserved noncoding sequences (CNSs) of vertebrates are considered to be closely linked with protein-coding gene regulatory functions. We examined the abundance and genomic distribution of CNSs in four mammalian orders: primates, rodents, carnivores, and cetartiodactyls. We defined two CNS thresholds based on the conservation level of coding genes: one using all three codon positions and one using only the first and second codon positions. CNS abundance varied among lineages, with primates and rodents having the highest and lowest numbers of CNSs, respectively, whereas carnivores and cetartiodactyls had intermediate values. These CNSs cover 1.3-5.5% of the mammalian genomes and show signatures of selective constraint that are stronger in ancestral CNSs than in more recent ones. Both the evolution of new CNSs and the retention of ancestral CNSs contribute to the differences in abundance. The genomic distribution of CNSs is dynamic, with higher proportions of rodent and primate CNSs located in introns compared with carnivores and cetartiodactyls. In fact, 19% of orthologous single-copy CNSs between human and dog are located in different genomic regions. If CNSs can be considered candidate gene expression regulatory sequences, the heterogeneity of CNSs among the four mammalian orders may have played an important role in creating order-specific phenotypes. Fewer CNSs in rodents suggest that rodent diversity is related to lower regulatory conservation. Given that CNSs cluster around genes involved in the nervous system and that primates have more CNSs, our results suggest that CNSs may be involved in the higher complexity of the primate nervous system.
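Calibrating a CNS threshold against coding-sequence conservation can be sketched with pairwise percent identity computed over all codon positions or over first and second positions only. A minimal illustration, assuming an in-frame pairwise alignment of orthologous coding sequences that starts at a first codon position and whose gaps do not shift the frame; the function names, the "at least as conserved as coding" rule, and the toy sequences are hypothetical, and the study's actual procedure is not described in this abstract.

```python
from typing import Optional, Set


def percent_identity(a: str, b: str, codon_positions: Optional[Set[int]] = None) -> float:
    """Percent identity of two aligned sequences, ignoring gap columns.

    If `codon_positions` is given (e.g. {1, 2}), only alignment columns at
    those codon positions are counted; this assumes the alignment starts at
    a first codon position and that gaps do not shift the reading frame.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    same = total = 0
    for i, (x, y) in enumerate(zip(a.upper(), b.upper())):
        if codon_positions is not None and (i % 3) + 1 not in codon_positions:
            continue
        if x == "-" or y == "-":
            continue
        total += 1
        same += x == y
    return 100.0 * same / total if total else float("nan")


def passes_threshold(noncoding_identity: float, threshold: float) -> bool:
    """Hypothetical rule: call a noncoding segment a CNS candidate if it is
    at least as conserved as the coding reference."""
    return noncoding_identity >= threshold


# Toy orthologous coding sequences (made up for illustration).
human_cds = "ATGGCTAAA"
dog_cds = "ATGGCCAAG"

t_all = percent_identity(human_cds, dog_cds)          # all three positions
t_12 = percent_identity(human_cds, dog_cds, {1, 2})   # 1st + 2nd positions only
print(round(t_all, 1), t_12)                                   # 77.8 100.0
print(passes_threshold(90.0, t_all), passes_threshold(90.0, t_12))  # True False
```

The first-plus-second-position threshold is stricter because third codon positions are often synonymous and diverge faster, which is why the two thresholds can yield different CNS sets.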
Affiliation(s)
- Isaac Adeyemi Babarinde
- Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
17
Harmston N, Baresic A, Lenhard B. The mystery of extreme non-coding conservation. Philos Trans R Soc Lond B Biol Sci 2013; 368:20130021. [PMID: 24218634 PMCID: PMC3826495 DOI: 10.1098/rstb.2013.0021] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 12/12/2022]
Abstract
Regions of extreme conservation, several dozen to several hundred base pairs long, have been found in non-coding regions of all metazoan genomes. The distribution of these elements within and across genomes has suggested that many act as transcriptional regulatory elements in multicellular organization, differentiation and development. Currently, no known mechanism or function accounts for this level of conservation at the observed evolutionary distances. Previous studies have found that, although these regions are under strong purifying selection and are not mutational coldspots, deleting entire regions in mice does not necessarily lead to identifiable phenotypic changes during development. These opposing findings raise several questions about their functional importance and why they are under strong selection in the first place. In this perspective, we discuss the methods and techniques used to identify and dissect these regions and their observed patterns of conservation, and we review current hypotheses on their functional significance.
Affiliation(s)
- Nathan Harmston
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London and MRC Clinical Sciences Centre, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
18
Matsunami M, Saitou N. Vertebrate paralogous conserved noncoding sequences may be related to gene expressions in brain. Genome Biol Evol 2013; 5:140-50. [PMID: 23267051 PMCID: PMC3595034 DOI: 10.1093/gbe/evs128] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 12/11/2022]
Abstract
Vertebrate genomes include gene regulatory elements in protein-noncoding regions. Some gene regulatory elements are expected to be conserved because of their functional importance, so evolutionarily conserved noncoding sequences (CNSs) may be good candidates for such elements. In addition, paralogous CNSs, which are highly conserved among both orthologous and paralogous loci, may control the overlapping expression patterns of their adjacent paralogous protein-coding genes. The two rounds of whole-genome duplication (2R WGDs), which most probably occurred in early vertebrate ancestors, generated large numbers of paralogous protein-coding genes and their regulatory elements. These events could have contributed to the emergence of vertebrate features. However, the evolutionary history and influence of the 2R WGDs remain unclear, especially in noncoding regions. To address this issue, we identified paralogous CNSs. A region-focused Basic Local Alignment Search Tool (BLAST) search of each synteny block revealed 7,924 orthologous CNSs and 309 paralogous CNSs conserved among eight high-quality vertebrate genomes. Of these paralogous CNSs, 115 had been reported previously and 194 were newly detected. Comparisons with the VISTA Enhancer Browser and available ChIP-seq data showed that one-third (103) of the paralogous CNSs detected in this study had gene regulatory activity in the brain at several developmental stages. Their genomic locations are highly enriched near transcription factor-coding regions, which are expressed in the brain and neural systems. These results suggest that paralogous CNSs are conserved mainly because they maintain gene expression in the vertebrate brain.
Affiliation(s)
- Masatoshi Matsunami
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Present address: Laboratory of Ecology and Genetics, Graduate School of Environmental Science, Hokkaido University, Sapporo, Japan
- Naruya Saitou
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, Japan
- *Corresponding author: E-mail:
19
Neutral Evolution. Introduction to Evolutionary Genomics 2013. [PMCID: PMC7122456 DOI: 10.1007/978-1-4471-5304-7_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Neutral evolution is the default process of genome change. This is because our world is finite, and randomness is important when we consider the history of a finite world. The random nature of DNA propagation is discussed using the branching process, coalescent process, Markov process, and diffusion process. Expected evolutionary patterns under neutrality are then discussed in terms of fixation probability, rate of evolution, and the amount of DNA variation maintained in a population. We then discuss various features of neutral evolution, including evolutionary rates, synonymous and nonsynonymous substitutions, junk DNA, and pseudogenes.
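The link between fixation probability and rate of evolution under neutrality can be made concrete with Kimura's standard diffusion approximation (a textbook formula, not reproduced from the chapter itself). Under neutrality a new mutant's fixation probability equals its initial frequency 1/(2N), so the substitution rate 2Nμ × 1/(2N) reduces to the mutation rate μ. A minimal sketch, assuming a diploid population whose effective size equals its census size N and additive selection of strength s; the function names are hypothetical.

```python
import math


def fixation_probability(N: int, s: float) -> float:
    """Kimura's diffusion approximation for the fixation probability of a
    new mutant in a diploid Wright-Fisher population.

    Assumes effective size equals census size N, initial frequency
    p = 1/(2N), and genic (additive) selection with coefficient s.
    """
    p = 1.0 / (2 * N)
    if s == 0.0:
        return p  # neutral case: fixation probability equals initial frequency
    # u(p) = (1 - exp(-4Nsp)) / (1 - exp(-4Ns)); expm1 keeps precision for small s
    return math.expm1(-4 * N * s * p) / math.expm1(-4 * N * s)


def substitution_rate(N: int, mu: float, s: float) -> float:
    """Substitutions per generation: 2N*mu new mutants times fixation probability."""
    return 2 * N * mu * fixation_probability(N, s)


N, mu = 100, 1e-8
print(fixation_probability(N, 0.0))                     # 0.005, i.e. 1/(2N)
print(math.isclose(substitution_rate(N, mu, 0.0), mu))  # True: neutral rate equals mu
print(fixation_probability(N, 0.01) > 1 / (2 * N))      # True: advantageous mutant
print(fixation_probability(N, -0.01) < 1 / (2 * N))     # True: deleterious mutant
```

Because N cancels in the neutral case, the neutral substitution rate does not depend on population size; with selection, the same function shows how quickly fixation probability departs from 1/(2N).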