1
Zhou W, Shi H, Wang Z, Huang Y, Ni L, Chen X, Liu Y, Li H, Li C, Liu Y. Identification of Highly Repetitive Enhancers with Long-range Regulation Potential in Barley via STARR-seq. Genomics, Proteomics & Bioinformatics 2024; 22:qzae012. [PMID: 39167800 DOI: 10.1093/gpbjnl/qzae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 06/02/2023] [Accepted: 06/25/2023] [Indexed: 08/23/2024]
Abstract
Enhancers are DNA sequences that can strengthen transcription initiation. However, the global identification of plant enhancers is complicated due to uncertainty in the distance and orientation of enhancers, especially in species with large genomes. In this study, we performed self-transcribing active regulatory region sequencing (STARR-seq) for the first time to identify enhancers across the barley genome. A total of 7323 enhancers were successfully identified, and among 45 randomly selected enhancers, over 75% were effective as validated by a dual-luciferase reporter assay system in the lower epidermis of tobacco leaves. Interestingly, up to 53.5% of the barley enhancers were repetitive sequences, especially transposable elements (TEs), thus reinforcing the vital role of repetitive enhancers in gene expression. Both the common active mark H3K4me3 and repressive mark H3K27me3 were abundant among the barley STARR-seq enhancers. In addition, the functional range of barley STARR-seq enhancers seemed much broader than that of rice or maize and extended to ±100 kb of the gene body, and this finding was consistent with the high expression levels of genes in the genome. This study specifically depicts the unique features of barley enhancers and provides available barley enhancers for further utilization.
Affiliation(s)
- Wanlin Zhou
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Haoran Shi
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu 611130, China
  - Chengdu Academy of Agricultural and Forestry Sciences, Chengdu 611130, China
- Zhiqiang Wang
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Yuxin Huang
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Lin Ni
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Xudong Chen
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Yan Liu
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Haojie Li
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Caixia Li
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Yaxi Liu
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu 611130, China
2
Chantzi N, Mareboina M, Konnaris MA, Montgomery A, Patsakis M, Mouratidis I, Georgakopoulos-Soares I. The determinants of the rarity of nucleic and peptide short sequences in nature. NAR Genom Bioinform 2024; 6:lqae029. [PMID: 38584871 PMCID: PMC10993293 DOI: 10.1093/nargab/lqae029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 02/21/2024] [Accepted: 03/18/2024] [Indexed: 04/09/2024] Open
Abstract
The prevalence of nucleic and peptide short sequences across organismal genomes and proteomes has not been thoroughly investigated. We examined 45 785 reference genomes and 21 871 reference proteomes, spanning archaea, bacteria, eukaryotes and viruses to calculate the rarity of short sequences in them. To capture this, we developed a metric of the rarity of each sequence in nature, the rarity index. We find that the frequency of certain dipeptides in rare oligopeptide sequences is hundreds of times lower than expected, which is not the case for any dinucleotides. We also generate predictive regression models that infer the rarity of nucleic and proteomic sequences across nature or within each domain of life and viruses separately. When examining each of the three domains of life and viruses separately, the R² performance of the model predicting rarity for 5-mer peptides from mono- and dipeptides ranged between 0.814 and 0.932. A separate model predicting rarity for 10-mer oligonucleotides from mono- and dinucleotides achieved R² performance between 0.408 and 0.606. Our results indicate that the mono- and dinucleotide composition of nucleic sequences and the mono- and dipeptide composition of peptide sequences can explain a significant proportion of the variance in their frequencies in nature.
Affiliation(s)
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Maxwell A Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
  - Department of Statistics, Penn State University, University Park, PA, 16802, USA
  - Huck Institutes of the Life Sciences, Penn State University, University Park, PA, 16802, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Michail Patsakis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
  - Huck Institutes of the Life Sciences, Penn State University, University Park, PA, 16802, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
3
Barbieri M. Overview of the fourth special issue in code biology. Biosystems 2024; 235:105074. [PMID: 37944633 DOI: 10.1016/j.biosystems.2023.105074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2023]
Affiliation(s)
- Marcello Barbieri
  - Dipartimento di Morfologia Ed Embriologia, Via Fossato di Mortara 64a, 44121, Ferrara, Italy
4
Orlov YL, Orlova NG. Bioinformatics tools for the sequence complexity estimates. Biophys Rev 2023; 15:1367-1378. [PMID: 37974990 PMCID: PMC10643780 DOI: 10.1007/s12551-023-01140-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/15/2023] [Accepted: 09/01/2023] [Indexed: 11/19/2023] Open
Abstract
We review current methods and bioinformatics tools for estimating text complexity (information and entropy measures). The search for DNA regions with extreme statistical characteristics, such as low-complexity regions, is important for biophysical models of chromosome function and gene transcription regulation at the genome scale. We discuss complexity profiling for the segmentation and delineation of genome sequences, the search for genome repeats and transposable elements, and applications to next-generation sequencing reads. We review the complexity methods and new application fields: analysis of mutation hotspot loci, analysis of short sequencing reads with quality control, and alignment-free genome comparisons. Algorithms implementing various numerical measures of text complexity, including combinatorial and linguistic measures, were developed before the genome sequencing era. A series of tools for estimating sequence complexity use compression approaches, mainly modifications of Lempel-Ziv compression. Most of the tools are available online and provide large-scale services for whole-genome analysis. Novel machine learning applications for the classification of complete genome sequences also include sequence compression and complexity algorithms. We present a comparison of the complexity methods on different sequence sets and their applications to the analysis of gene transcription regulatory regions. Furthermore, we discuss approaches to, and applications of, sequence complexity for proteins. Complexity measures for amino acid sequences can be calculated with the same entropy- and compression-based algorithms, but the functional and evolutionary roles of low-complexity regions in proteins have specific features that differ from those in DNA. The tools for protein sequence complexity are aimed at protein structural constraints. It has been shown that low-complexity regions in protein sequences are evolutionarily conserved and have important biological and structural functions. Finally, we summarize recent findings in large-scale genome complexity comparison and applications to coronavirus genome analysis.
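The two families of measures named in this abstract (entropy-based and compression-based) can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed tools: the function names, the window and step sizes, and the choice of an LZ78-style parse as the compression proxy are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per symbol, of the symbol frequencies in seq."""
    if not seq:
        return 0.0
    n = len(seq)
    # sum of p * log2(1/p) over the observed symbols
    return sum((c / n) * math.log2(n / c) for c in Counter(seq).values())


def lz78_phrase_count(seq: str) -> int:
    """Number of phrases in an LZ78-style parse (illustrative compression proxy).

    Repetitive sequences reuse earlier phrases and so yield fewer phrases
    than diverse sequences of the same length.
    """
    phrases = set()
    current = ""
    for ch in seq:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    # A trailing fragment that was already seen still counts as one phrase.
    return len(phrases) + (1 if current else 0)


def entropy_profile(seq: str, window: int = 20, step: int = 5):
    """(start, entropy) pairs for sliding windows; low values flag low-complexity regions."""
    return [
        (i, shannon_entropy(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]
```

For example, `entropy_profile("A" * 20 + "ACGT" * 5, window=20, step=20)` scores the homopolymer window at 0 bits and the mixed window at 2 bits, which is the kind of contrast that complexity profiling uses to delineate low-complexity segments.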
Affiliation(s)
- Yuriy L. Orlov
  - The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Russian Ministry of Health (Sechenov University), Moscow, 119991 Russia
  - Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
  - Agrarian and Technological Institute, Peoples’ Friendship University of Russia, 117198 Moscow, Russia
- Nina G. Orlova
  - Department of Mathematics, Financial University under the Government of the Russian Federation, Moscow, 125167 Russia
5
Zuiddam M, Shakiba B, Schiessel H. Multiplexing mechanical and translational cues on genes. Biophys J 2022; 121:4311-4324. [PMID: 36230003 PMCID: PMC9703045 DOI: 10.1016/j.bpj.2022.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 07/06/2022] [Accepted: 10/07/2022] [Indexed: 12/14/2022] Open
Abstract
The genetic code gives precise instructions on how to translate codons into amino acids. Due to the degeneracy of the genetic code (18 out of 20 amino acids are encoded by more than one codon), more information can be stored in a base-pair sequence. Indeed, various types of additional information have been discussed in the literature, e.g., the positioning of nucleosomes along eukaryotic genomes and the modulation of the translation efficiency in ribosomes to influence cotranslational protein folding. The purpose of this study is to show that it is indeed possible to carry more than one additional layer of information on top of a gene. In particular, we show how much translation efficiency and nucleosome positioning can be adjusted simultaneously without changing the encoded protein. We achieve this by mapping genes on weighted graphs that contain all synonymous genes, and then finding shortest paths through these graphs. This enables us, for example, to readjust the disrupted translational efficiency profile after a gene has been introduced from one organism (e.g., human) into another (e.g., yeast) without greatly changing the nucleosome landscape intrinsically encoded by the DNA molecule.
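The graph idea in this abstract can be sketched as a layered graph with one layer per amino acid and one node per synonymous codon, where the cheapest path spells out the preferred synonymous gene. The Python sketch below is only an illustration under stated assumptions: the two cost functions are toy stand-ins (not the translation-efficiency or nucleosome energies used in the paper), all function names are hypothetical, and the shortest path is found by dynamic programming over the layers.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Amino acid -> list of synonymous codons (the nodes of one graph layer).
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def toy_codon_cost(codon):
    """Hypothetical node weight: penalise codons ending in A or T."""
    return 0 if codon[2] in "GC" else 1


def toy_junction_cost(prev, nxt):
    """Hypothetical edge weight: penalise an A/T step across the codon junction."""
    return 1 if prev[2] in "AT" and nxt[0] in "AT" else 0


def best_synonymous_sequence(protein, codon_cost, junction_cost):
    """Return (cost, dna) for the cheapest synonymous encoding of protein."""
    if not protein:
        return (0, "")
    # best[c] = (cost, dna) of the cheapest path that ends in codon c
    best = {c: (codon_cost(c), c) for c in SYNONYMS[protein[0]]}
    for aa in protein[1:]:
        layer = {}
        for c in SYNONYMS[aa]:
            cost, dna = min(
                (prev_cost + junction_cost(p, c), prev_dna)
                for p, (prev_cost, prev_dna) in best.items()
            )
            layer[c] = (cost + codon_cost(c), dna + c)
        best = layer
    return min(best.values())


print(best_synonymous_sequence("MKF", toy_codon_cost, toy_junction_cost))
# (0, 'ATGAAGTTC')
```

Because every path through the layers translates to the same protein, swapping in different node and edge weights changes which synonymous sequence wins without ever changing the encoded amino acids.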
Affiliation(s)
- Martijn Zuiddam
  - Institute Lorentz for Theoretical Physics, Leiden University, Leiden, the Netherlands
- Bahareh Shakiba
  - Institute Lorentz for Theoretical Physics, Leiden University, Leiden, the Netherlands
- Helmut Schiessel
  - Cluster of Excellence Physics of Life, TU Dresden, Dresden, Germany
6
Cytogenomics of Deschampsia P. Beauv. (Poaceae) Species Based on Sequence Analyses and FISH Mapping of CON/COM Satellite DNA Families. Plants 2021; 10:plants10061105. [PMID: 34070920 PMCID: PMC8229069 DOI: 10.3390/plants10061105] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/15/2021] [Revised: 05/19/2021] [Accepted: 05/26/2021] [Indexed: 02/06/2023]
Abstract
The genus Deschampsia P. Beauv. (Poaceae) involves a group of widespread polymorphic species, and many of them are highly tolerant to stressful environmental conditions. Genome diversity and chromosomal phylogeny within the genus are still insufficiently studied. Satellite DNAs, including CON/COM families, are the main components of the plant repeatome, which contribute to chromosome organization. For the first time, using PCR-based (Polymerase Chain Reaction) techniques and sequential BLAST (Basic Local Alignment Search Tool) and MSA (Multiple Sequence Alignment) analyses, we identified and classified CON/COM repeats in genomes of eleven Deschampsia accessions and three accessions from related genera. High homology of CON/COM sequences was revealed in the studied species, though differences in single-nucleotide alteration profiles detected in homologous CON/COM regions indicated that they tended to diverge independently. Chromosome mapping of 45S rDNA, 5S rDNA, and CON/COM repeats in six Deschampsia species demonstrated interspecific variability in the localization of these cytogenetic markers and facilitated the identification of different chromosomal rearrangements. Based on the obtained data, the studied Deschampsia species were divided into karyological groups, and MSA-based schematic trees were built, which could clarify the relationships within the genus. Our findings can be useful for further genetic and phylogenetic studies.
7
Bahiri-Elitzur S, Tuller T. Codon-based indices for modeling gene expression and transcript evolution. Comput Struct Biotechnol J 2021; 19:2646-2663. [PMID: 34025951 PMCID: PMC8122159 DOI: 10.1016/j.csbj.2021.04.042] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 02/14/2021] [Revised: 04/17/2021] [Accepted: 04/18/2021] [Indexed: 11/21/2022] Open
Abstract
Codon usage bias (CUB) refers to the phenomenon that synonymous codons are used at different frequencies in most genes and organisms. The general assumption is that codon biases reflect a balance between mutational biases and natural selection. Today we understand that the codon content is related to, and can affect, all gene expression steps. Starting from the 1980s, codon-based indices have been used for answering different questions in all biomedical fields, including systems biology, agriculture, medicine, and biotechnology. In general, codon usage bias indices weigh each codon or a small set of codons to estimate the fitting of a certain coding sequence to a certain phenomenon (e.g., bias in codons, adaptation to the tRNA pool, frequencies of certain codons, transcription elongation speed, etc.) and are usually easy to implement. Today there are dozens of such indices; thus, this paper aims to review and compare the different codon usage bias indices, their applications, and advantages. In addition, we perform an analysis that demonstrates that most indices tend to correlate even though they aim to capture different aspects. Due to the centrality of codon usage bias to different gene expression steps, it is important to keep developing new indices that can capture additional aspects that are not modeled with the current indices.
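As one concrete instance of a per-codon index, relative synonymous codon usage (RSCU) divides each codon's observed count by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below is a minimal illustration and not code from the review; the function name `rscu` is hypothetical, and it assumes an upper-case, in-frame coding sequence and the standard genetic code.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Amino acid -> list of its synonymous codons.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    FAMILIES.setdefault(aa, []).append(codon)


def rscu(cds):
    """Relative synonymous codon usage for every codon of each observed amino acid.

    RSCU = observed count * (number of synonymous codons) / (total count of the family).
    A value of 1.0 means no bias; stop codons and unobserved amino acids are skipped.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for aa, family in FAMILIES.items():
        if aa == "*":
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values
```

For the toy sequence `"AAAAAAAAG"` (lysine codons AAA, AAA, AAG), `rscu` assigns AAA a value above 1 and AAG a value below 1, reflecting the preference for AAA in that sequence.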
Affiliation(s)
- Tamir Tuller
  - Department of Biomedical Engineering, Tel-Aviv University, Tel Aviv, Israel
  - The Sagol School of Neuroscience, Tel-Aviv University, Tel Aviv, Israel
8
Bernardi G. The "Genomic Code": DNA Pervasively Moulds Chromatin Structures Leaving no Room for "Junk". Life (Basel) 2021; 11:342. [PMID: 33924668 PMCID: PMC8070607 DOI: 10.3390/life11040342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/01/2021] [Revised: 04/06/2021] [Accepted: 04/07/2021] [Indexed: 02/07/2023] Open
Abstract
The chromatin of the human genome was analyzed at three DNA size levels. At the first, compartment level, two "gene spaces" were found many years ago: A GC-rich, gene-rich "genome core" and a GC-poor, gene-poor "genome desert", the former corresponding to open chromatin centrally located in the interphase nucleus, the latter to closed chromatin located peripherally. This bimodality was later confirmed and extended by the discoveries (1) of LADs, the Lamina-Associated Domains, and InterLADs; (2) of two "spatial compartments", A and B, identified on the basis of chromatin interactions; and (3) of "forests and prairies" characterized by high and low CpG islands densities. Chromatin compartments were shown to be associated with the compositionally different, flat and single- or multi-peak DNA structures of the two, GC-poor and GC-rich, "super-families" of isochores. At the second, sub-compartment, level, chromatin corresponds to flat isochores and to isochore loops (due to compositional DNA gradients) that are susceptible to extrusion. Finally, at the short-sequence level, two sets of sequences, GC-poor and GC-rich, define two different nucleosome spacings, a short one and a long one. In conclusion, chromatin structures are moulded according to a "genomic code" by DNA sequences that pervade the genome and leave no room for "junk".
Affiliation(s)
- Giorgio Bernardi
  - Science Department, Roma Tre University, Viale Marconi 446, 00146 Rome, Italy; Tel.: +39-33-540-5892
  - Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
9
The 3D Genome Shapes the Regulatory Code of Developmental Genes. J Mol Biol 2020; 432:712-723. [DOI: 10.1016/j.jmb.2019.10.017] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/28/2019] [Revised: 10/11/2019] [Accepted: 10/24/2019] [Indexed: 02/06/2023]
10
Jabbari K, Chakraborty M, Wiehe T. DNA sequence-dependent chromatin architecture and nuclear hubs formation. Sci Rep 2019; 9:14646. [PMID: 31601866 PMCID: PMC6787200 DOI: 10.1038/s41598-019-51036-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/20/2019] [Accepted: 09/18/2019] [Indexed: 02/08/2023] Open
Abstract
In this study, by exploring chromatin conformation capture data, we show that DNA sequence composition contributes to the nuclear segregation of Topologically Associated Domains (TADs). GC-peaks and valleys of TADs strongly influence interchromosomal interactions and chromatin 3D structure. To gain insight into the compositional and functional constraints associated with chromatin interactions and TAD formation, we analysed intra-TAD and intra-loop GC variations. This led to the identification of clear GC-gradients, along which the density of genes, super-enhancers, transcriptional activity, and CTCF binding site occupancy co-vary non-randomly. Further, the analysis of the DNA base composition of nucleolar aggregates and nuclear speckles showed strong sequence-dependent effects. We conjecture that dynamic DNA binding affinity and flexibility underlie the emergence of chromatin condensates; their growth is likely promoted in mechanically soft (GC-rich) regions with the lowest chromatin and nucleosome densities. As a practical perspective, the strong linear association between sequence composition and interchromosomal contacts can help define consensus chromatin interactions, which in turn may be used to study alternative states of chromatin architecture.
Affiliation(s)
- Kamel Jabbari
  - Institute for Genetics, Biocenter Cologne, University of Cologne, Zülpicher Straße 47a, 50674, Köln, Germany
- Maharshi Chakraborty
  - Institute for Genetics, Biocenter Cologne, University of Cologne, Zülpicher Straße 47a, 50674, Köln, Germany
- Thomas Wiehe
  - Institute for Genetics, Biocenter Cologne, University of Cologne, Zülpicher Straße 47a, 50674, Köln, Germany
11
A general model on the origin of biological codes. Biosystems 2019; 181:11-19. [DOI: 10.1016/j.biosystems.2019.04.010] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 03/22/2019] [Revised: 04/16/2019] [Accepted: 04/16/2019] [Indexed: 01/09/2023]
12
Ciotti BJ, Planes S. Within-generation consequences of postsettlement mortality for trait composition in wild populations: An experimental test. Ecol Evol 2019; 9:2550-2561. [PMID: 30891199 PMCID: PMC6405511 DOI: 10.1002/ece3.4911] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/05/2018] [Revised: 11/29/2018] [Accepted: 12/20/2018] [Indexed: 11/28/2022] Open
Abstract
There is a critical need to understand patterns and causes of intraspecific variation in physiological performance in order to predict the distribution and dynamics of wild populations under natural and human-induced environmental change. However, the usual explanation for trait differences, local adaptation, fails to account for the small-scale phenotypic and genetic divergence observed in fishes and other species with dispersive early life stages. We tested the hypothesis that local-scale variation in the strength of selective mortality in early life mediates the trait composition in later life stages. Through in situ experiments, we manipulated exposure to predators in the coral reef damselfish Dascyllus aruanus and examined consequences for subsequent growth performance under common garden conditions. Groups of 20 recently settled D. aruanus were outplanted to experimental coral colonies in Moorea lagoon and either exposed to natural predation mortality (52% mortality in three days) or protected from predators with cages for three days. After postsettlement mortality, predator-exposed groups were shorter than predator-protected ones, while groups with lower survival were in better condition, suggesting that predators removed the longer, thinner individuals. Growth of both treatment groups was subsequently compared under common conditions. We did not detect consequences of predator exposure for subsequent growth performance: Growth over the following 37 days was not affected by the prior predator treatment or survival. Genotyping at 10 microsatellite loci did indicate, however, that predator exposure significantly influenced the genetic composition of groups. We conclude that postsettlement mortality did not have carryover effects on the subsequent growth performance of cohorts in this instance, despite evidence for directional selection during the initial mortality phase.
Affiliation(s)
- Benjamin J. Ciotti
  - Laboratoire d'excellence "CORAIL", USR 3278 CNRS-EPHE-UPVD CRIOBE, Perpignan, France
  - School of Biological and Marine Sciences, University of Plymouth, Plymouth, UK
- Serge Planes
  - Laboratoire d'excellence "CORAIL", USR 3278 CNRS-EPHE-UPVD CRIOBE, Perpignan, France
14
Barbieri M. What is code biology? Biosystems 2018; 164:1-10. [DOI: 10.1016/j.biosystems.2017.10.005] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 07/27/2017] [Revised: 10/04/2017] [Accepted: 10/05/2017] [Indexed: 01/29/2023]
16
Peters JE, Makarova KS, Shmakov S, Koonin EV. Recruitment of CRISPR-Cas systems by Tn7-like transposons. Proc Natl Acad Sci U S A 2017; 114:E7358-E7366. [PMID: 28811374 PMCID: PMC5584455 DOI: 10.1073/pnas.1709035114] [Citation(s) in RCA: 176] [Impact Index Per Article: 25.1] [Indexed: 12/26/2022] Open
Abstract
A survey of bacterial and archaeal genomes shows that many Tn7-like transposons contain minimal type I-F CRISPR-Cas systems that consist of fused cas8f and cas5f, cas7f, and cas6f genes and a short CRISPR array. Several small groups of Tn7-like transposons encompass similarly truncated type I-B CRISPR-Cas. This minimal gene complement of the transposon-associated CRISPR-Cas systems implies that they are competent for pre-CRISPR RNA (precrRNA) processing yielding mature crRNAs and target binding but not target cleavage that is required for interference. Phylogenetic analysis demonstrates that evolution of the CRISPR-Cas-containing transposons included a single, ancestral capture of a type I-F locus and two independent instances of type I-B loci capture. We show that the transposon-associated CRISPR arrays contain spacers homologous to plasmid and temperate phage sequences and, in some cases, chromosomal sequences adjacent to the transposon. We hypothesize that the transposon-encoded CRISPR-Cas systems generate displacement (R-loops) in the cognate DNA sites, targeting the transposon to these sites and thus facilitating their spread via plasmids and phages. These findings suggest the existence of RNA-guided transposition and fit the guns-for-hire concept whereby mobile genetic elements capture host defense systems and repurpose them for different stages in the life cycle of the element.
Affiliation(s)
- Joseph E Peters
  - Department of Microbiology, Cornell University, Ithaca, NY 14853
- Kira S Makarova
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894
- Sergey Shmakov
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894
  - Skolkovo Institute of Science and Technology, Skolkovo, 143025, Russia
- Eugene V Koonin
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894
17
Abstract
Every ribonucleic acid begins its cellular life as a transcript. If the transcript or its processing product has a function it should be regarded an RNA. Nonfunctional transcripts, by-products from processing, degradation intermediates, even those originating from (functional) RNAs, and non-functional products of transcriptional gene regulation accomplished via the act of transcription, as well as stochastic (co)transcripts could simply be addressed as transcripts (class 0). The copious functional RNAs (class I), often maturing after one or more processing steps, already are systematized into ever expanding sub-classifications ranging from micro RNAs to rRNAs. Established sub-classifications addressing a wide functional diversity remain unaffected. mRNAs (class II) are distinct from any other RNA by virtue of their potential to be translated into (poly)peptide(s) on ribosomes. We are not proposing a novel RNA classification, but wish to add a basic concept with existing terminology (transcript, RNA, and mRNA) that should serve as an additional framework for carefully delineating RNA function from an avalanche of RNA sequencing data. At the same time, this top level hierarchical model should illuminate important principles of RNA evolution and biology thus heightening our awareness that in biology boundaries and categorizations are typically fuzzy.
Affiliation(s)
- Jürgen Brosius
  - Institute of Experimental Pathology, ZMBE, University of Münster, Von-Esmarch-Str. 56, 48149 Münster, Germany
  - Institute of Evolutionary and Medical Genomics, Brandenburg Medical School (MHB), Fehrbelliner Str. 38, 16816, Germany
- Carsten A Raabe
  - Institute of Experimental Pathology, ZMBE, University of Münster, Von-Esmarch-Str. 56, 48149 Münster, Germany
  - Institute of Evolutionary and Medical Genomics, Brandenburg Medical School (MHB), Fehrbelliner Str. 38, 16816, Germany
19
Shimada MK, Sanbonmatsu R, Yamaguchi-Kabata Y, Yamasaki C, Suzuki Y, Chakraborty R, Gojobori T, Imanishi T. Selection pressure on human STR loci and its relevance in repeat expansion disease. Mol Genet Genomics 2016; 291:1851-69. [PMID: 27290643 DOI: 10.1007/s00438-016-1219-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/07/2015] [Accepted: 05/21/2016] [Indexed: 12/30/2022]
Abstract
Short Tandem Repeats (STRs) comprise repeats of one to several base pairs. Because of the high mutability due to strand slippage during DNA synthesis, rapid evolutionary change in the number of repeating units directly shapes the range of repeat-number variation according to selection pressure. However, the remaining questions include: Why are STRs causing repeat expansion diseases maintained in the human population; and why are these limited to neurodegenerative diseases? By evaluating the genome-wide selection pressure on STRs using the database we constructed, we identified two different patterns of relationship in repeat-number polymorphisms between DNA and amino-acid sequences, although both patterns are evolutionary consequences of avoiding the formation of harmful long STRs. First, a mixture of degenerate codons is represented in poly-proline (poly-P) repeats. Second, long poly-glutamine (poly-Q) repeats are favored at the protein level; however, at the DNA level, STRs encoding long poly-Qs are frequently divided by synonymous SNPs. Furthermore, significant enrichments of apoptosis and neurodevelopment were biological processes found specifically in genes encoding poly-Qs with repeat polymorphism. This suggests the existence of a specific molecular function for polymorphic and/or long poly-Q stretches. Given that the poly-Qs causing expansion diseases were longer than other poly-Qs, even in healthy subjects, our results indicate that the evolutionary benefits of long and/or polymorphic poly-Q stretches outweigh the risks of long CAG repeats predisposing to pathological hyper-expansions. Molecular pathways in neurodevelopment requiring long and polymorphic poly-Q stretches may provide a clue to understanding why poly-Q expansion diseases are limited to neurodegenerative diseases.
Affiliation(s)
- Makoto K Shimada
- Institute for Comprehensive Medical Science, Fujita Health University, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake, Aichi, 470-1192, Japan; National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Ryoko Sanbonmatsu
- Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Yumi Yamaguchi-Kabata
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573, Japan
- Chisato Yamasaki
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Yoshiyuki Suzuki
- Graduate School of Natural Sciences, Nagoya City University, 1 Yamanohata, Mizuho-cho, Mizuho-ku, Nagoya, Aichi, 467-8501, Japan
- Ranajit Chakraborty
- Health Science Center, University of North Texas, 3500 Camp Bowie Blvd., Fort Worth, TX, 76107, USA
- Takashi Gojobori
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Ibn Al-Haytham Building (West), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tadashi Imanishi
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Department of Molecular Life Science, Tokai University School of Medicine, 143 Shimokasuya, Isehara, Kanagawa, 259-1193, Japan
|
20
|
Multiplexing Genetic and Nucleosome Positioning Codes: A Computational Approach. PLoS One 2016; 11:e0156905. [PMID: 27272176 PMCID: PMC4896621 DOI: 10.1371/journal.pone.0156905] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 12/22/2015] [Accepted: 05/20/2016] [Indexed: 11/19/2022] Open
Abstract
Eukaryotic DNA is strongly bent inside fundamental packaging units: the nucleosomes. It is known that their positions are strongly influenced by the mechanical properties of the underlying DNA sequence. Here we discuss the possibility that these mechanical properties and the concomitant nucleosome positions are not just a side product of the given DNA sequence, e.g. that of the genes, but that a mechanical evolution of DNA molecules might have taken place. We first demonstrate the possibility of multiplexing classical and mechanical genetic information using a computational nucleosome model. In a second step we give evidence for genome-wide multiplexing in Saccharomyces cerevisiae and Schizosaccharomyces pombe. This suggests that the exact positions of nucleosomes play crucial roles in chromatin function.
|
21
|
Igamberdiev AU, Shklovskiy-Kordi NE. Computational power and generative capacity of genetic systems. Biosystems 2016; 142-143:1-8. [DOI: 10.1016/j.biosystems.2016.01.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 12/08/2015] [Revised: 01/25/2016] [Accepted: 01/27/2016] [Indexed: 01/01/2023]
|
22
|
Barbieri M. A new theory of development: the generation of complexity in ontogenesis. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2016; 374:rsta.2015.0148. [PMID: 26857661 DOI: 10.1098/rsta.2015.0148] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Accepted: 08/01/2015] [Indexed: 06/05/2023]
Abstract
Today there is a very wide consensus on the idea that embryonic development is the result of a genetic programme and of epigenetic processes. Many models have been proposed in this theoretical framework to account for the various aspects of development, and virtually all of them have one thing in common: they do not acknowledge the presence of organic codes (codes between organic molecules) in ontogenesis. Here it is argued instead that embryonic development is a convergent increase in complexity that necessarily requires organic codes and organic memories, and a few examples of such codes are described. This is the code theory of development, a theory that was originally inspired by an algorithm that is capable of reconstructing structures from incomplete information, an algorithm that here is briefly summarized because it makes it intuitively appealing how a convergent increase in complexity can be achieved. The main thesis of the new theory is that the presence of organic codes in ontogenesis is not only a theoretical necessity but, first and foremost, an idea that can be tested and that has already been found to be in agreement with the evidence.
Affiliation(s)
- Marcello Barbieri
- Dipartimento di Morfologia ed Embriologia, via Fossato di Mortara 64a, Ferrara 44121, Italy
|
23
|
Kumar B, Saini S. Analysis of the optimality of the standard genetic code. MOLECULAR BIOSYSTEMS 2016; 12:2642-51. [DOI: 10.1039/c6mb00262e] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/21/2022]
Abstract
Many theories have been proposed attempting to explain the origin of the genetic code. In this work, we compare the performance of the standard genetic code against millions of randomly generated codes. The comparison covers the ability of genetic codes to encode additional information and their robustness to frameshift mutations.
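The idea of scoring the standard code against randomly generated codes can be sketched as follows. This is an illustration only and makes two loud assumptions that are not taken from the paper: random codes are produced by shuffling the 64 codon assignments, and the score is the fraction of synonymous single-nucleotide substitutions (a simple point-mutation measure swapped in for the frameshift and additional-information measures the abstract mentions). All function names are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
STANDARD_CODE = dict(zip(CODONS, STANDARD_AA))


def synonymous_fraction(code):
    """Fraction of single-nucleotide substitutions that keep the encoded symbol."""
    same = total = 0
    for codon in CODONS:
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                same += code[codon] == code[mutant]
    return same / total


def random_code(rng):
    """Assumed randomisation: shuffle the 64 assignments among the codons."""
    labels = list(STANDARD_AA)
    rng.shuffle(labels)
    return dict(zip(CODONS, labels))


def fraction_of_random_codes_beaten(n_codes=1000, seed=0):
    """Share of random codes that score lower than the standard code."""
    rng = random.Random(seed)
    reference = synonymous_fraction(STANDARD_CODE)
    beaten = sum(
        synonymous_fraction(random_code(rng)) < reference for _ in range(n_codes)
    )
    return beaten / n_codes


print(synonymous_fraction(STANDARD_CODE))
print(fraction_of_random_codes_beaten(n_codes=200, seed=1))
```

Other randomisation schemes (for example, permuting amino acids among the existing synonymous blocks) and other scores would give different pictures; the sample size here is kept far below the millions of codes described in the abstract so that the sketch runs quickly.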
Affiliation(s)
- Balaji Kumar
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Mumbai – 400 076, India
- Supreet Saini
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Mumbai – 400 076, India
|
24
|
Raabe CA, Brosius J. Does every transcript originate from a gene? Ann N Y Acad Sci 2015; 1341:136-48. [PMID: 25847549 DOI: 10.1111/nyas.12741] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/15/2014] [Revised: 02/05/2015] [Accepted: 02/11/2015] [Indexed: 12/20/2022]
Abstract
Outdated gene definitions favored regions corresponding to mature messenger RNAs, in particular, the open reading frame. In eukaryotes, the intergenic space was widely regarded as nonfunctional and devoid of RNA transcription. Original concepts were based on the assumption that RNA expression was restricted to known protein-coding genes and a few so-called structural RNA genes, such as ribosomal RNAs or transfer RNAs. With the discovery of introns and, more recently, sensitive techniques for monitoring genome-wide transcription, this view had to be substantially modified. Tiling microarrays and RNA deep sequencing revealed myriads of transcripts, which cover almost entire genomes. The tremendous complexity of non-protein-coding RNA transcription has to be integrated into novel gene definitions. Despite an ever-growing list of functional RNAs, questions concerning the mass of identified transcripts are under dispute. Here, we examined genome-wide transcription from various angles, including evolutionary considerations, and suggest, in analogy to novel alternative splice variants that do not persist, that the vast majority of transcripts represent raw material for potential, albeit rare, exaptation events.
Affiliation(s)
- Carsten A Raabe
- Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
|
25
|
Barbieri M. Semantic Biology and the Mind-Body Problem: The Theory of the Conventional Mind. ACTA ACUST UNITED AC 2015. [DOI: 10.1162/biot.2006.1.4.352] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022]
|
26
|
Variation and constraints in species-specific promoter sequences. J Theor Biol 2014; 363:357-66. [PMID: 25149367 DOI: 10.1016/j.jtbi.2014.08.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/26/2014] [Revised: 07/30/2014] [Accepted: 08/04/2014] [Indexed: 11/24/2022]
Abstract
A vast literature is now devoted to the search for correlations between transcription-related functions and the composition of sequences upstream of the transcription start site. Little is known about the possible functional effects of nucleotide distributions on the conformational landscape of DNA in such regions. We have used suitable statistical indicators for identifying sequences that may play an important role in regulating transcription processes. In particular, we have analyzed base composition, periodicity and information content in sets of aligned promoters clustered according to functional information in order to gain insight into the main structural differences between promoters regulating genes with different functions. Our results show that when we select promoters according to some biological information, in a single species, at least in vertebrates, we observe structurally different classes of sequences. The highly variable and differentiated gene expression patterns may explain the great extent of structural differentiation observed in complex organisms. In fact, although our analysis is focused on Homo sapiens, we also provide a comparison with other species, selected at different positions in the phylogenetic tree.
|
27
|
Brosius J. The persistent contributions of RNA to eukaryotic gen(om)e architecture and cellular function. Cold Spring Harb Perspect Biol 2014; 6:a016089. [PMID: 25081515 DOI: 10.1101/cshperspect.a016089] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023]
Abstract
Currently, the best scenario for earliest forms of life is based on RNA molecules as they have the proven ability to catalyze enzymatic reactions and harbor genetic information. Evolutionary principles valid today become apparent in such models already. Furthermore, many features of eukaryotic genome architecture might have their origins in an RNA or RNA/protein (RNP) world, including the onset of a further transition, when DNA replaced RNA as the genetic bookkeeper of the cell. Chromosome maintenance, splicing, and regulatory function via RNA may be deeply rooted in the RNA/RNP worlds. Mostly in eukaryotes, conversion from RNA to DNA is still ongoing, which greatly impacts the plasticity of extant genomes. Raw material for novel genes encoding protein or RNA, or parts of genes including regulatory elements that selection can act on, continues to enter the evolutionary lottery.
Affiliation(s)
- Jürgen Brosius
- Institute of Experimental Pathology (ZMBE), University of Münster, D-48149 Münster, Germany
|
28
|
Abstract
According to quasispecies theory, high mutation rates limit the amount of information genomes can store (Eigen’s Paradox), whereas genomes with higher degrees of neutrality may be selected even at the expense of higher replication rates (the “survival of the flattest” effect). Introducing a complex genotype to phenotype map, such as RNA folding, epitomizes this effect because of the existence of neutral networks and their exploitation by evolution, affecting both population structure and genome composition. We reexamine these classical results in the light of an RNA-based system that can evolve its own ecology. Contrary to expectations, we find that quasispecies evolving at high mutation rates are steep and characterized by one master sequence. Importantly, the analysis of the system and the characterization of the evolved quasispecies reveal the emergence of functionalities as phenotypes of nonreplicating genotypes, whose presence is crucial for the overall viability and stability of the system. In other words, the master sequence codes for the information of the entire ecosystem, whereas the decoding happens, stochastically, through mutations. We show that this solution quickly outcompetes strategies based on genomes with a high degree of neutrality. In conclusion, individually coded but ecosystem-based diversity evolves and persists indefinitely close to the Information Threshold.
Affiliation(s)
- Paulien Hogeweg
- Theoretical Biology and Bioinformatics Group, Utrecht University, The Netherlands
|
29
|
|
30
|
Frenkel ZM, Barzily Z, Volkovich Z, Trifonov EN. Hidden ancient repeats in DNA: Mapping and quantification. Gene 2013; 528:282-7. [DOI: 10.1016/j.gene.2013.06.059] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/17/2013] [Accepted: 06/21/2013] [Indexed: 01/27/2023]
|
31
|
Donaldson ZR, Young LJ. The relative contribution of proximal 5' flanking sequence and microsatellite variation on brain vasopressin 1a receptor (Avpr1a) gene expression and behavior. PLoS Genet 2013; 9:e1003729. [PMID: 24009523 PMCID: PMC3757045 DOI: 10.1371/journal.pgen.1003729] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 03/11/2013] [Accepted: 07/01/2013] [Indexed: 01/16/2023] Open
Abstract
Certain genes exhibit notable diversity in their expression patterns both within and between species. One such gene is the vasopressin receptor 1a gene (Avpr1a), which exhibits striking differences in neural expression patterns that are responsible for mediating differences in vasopressin-mediated social behaviors. The genomic mechanisms that contribute to these remarkable differences in expression are not well understood. Previous work has suggested that both the proximal 5′ flanking region and a polymorphic microsatellite element within that region of the vole Avpr1a gene are associated with variation in V1a receptor (V1aR) distribution and behavior, but neither has been causally linked. Using homologous recombination in mice, we reveal the modest contribution of proximal 5′ flanking sequences to species differences in V1aR distribution, and confirm that variation in V1aR distribution impacts stress-coping in the forced swim test. We also demonstrate that the vole Avpr1a microsatellite structure contributes to Avpr1a expression in the amygdala, thalamus, and hippocampus, mirroring a subset of the inter- and intra-species differences observed in central V1aR patterns in voles. This is the first direct evidence that polymorphic microsatellite elements near behaviorally relevant genes can contribute to diversity in brain gene expression profiles, providing a mechanism for generating behavioral diversity both at the individual and species level. However, our results suggest that many features of species-specific expression patterns are mediated by elements outside of the immediate 5′ flanking region of the gene.
DNA sequence variation underlies many differences both within and between species. In this paper, we investigate a specific DNA sequence that is thought to influence expression of a gene that modulates behavior, the vasopressin V1a receptor gene (Avpr1a). Specifically, differences in the expression of V1a receptor in the brain have been causally tied to social behavior differences, but the genetic basis of these differences is not understood. Using transgenic mice, we investigate the role of DNA sequences upstream of this gene in generating species-specific and individual variation in Avpr1a expression. We find that, contrary to our expectation, this region has only a modest influence on differences in expression patterns across rodent species. This indicates that DNA elements outside of this region play a larger role in species-level differences in expression. We confirm that variation in Avpr1a expression mediated by this upstream region translates to differences in behavior. We also find that variable DNA sequences associated with repetitive motifs within this region subtly influence gene expression. Together these findings highlight the complexity of genetic mechanisms that influence diversity in brain receptor patterns and support the idea that variable repetitive elements can influence both species and individual differences in gene expression patterns.
Affiliation(s)
- Zoe R Donaldson
- Division of Integrative Neuroscience, Department of Psychiatry, Columbia University, New York, New York, United States of America.
|
32
|
Ma XX, Feng YP, Liu JL, Ma B, Chen L, Zhao YQ, Guo PH, Guo JZ, Ma ZR, Zhang J. The effects of the codon usage and translation speed on protein folding of 3Dpol of foot-and-mouth disease virus. Vet Res Commun 2013; 37:243-50. [DOI: 10.1007/s11259-013-9564-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Accepted: 04/10/2013] [Indexed: 10/26/2022]
|
33
|
Merkulova TI, Ananko EA, Ignatieva EV, Kolchanov NA. Transcription regulatory codes of eukaryotic genomes. RUSS J GENET+ 2013. [DOI: 10.1134/s1022795413010079] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/13/2023]
|
34
|
Trifonov EN, Volkovich Z, Frenkel ZM. Multiple levels of meaning in DNA sequences, and one more. Ann N Y Acad Sci 2012; 1267:35-8. [PMID: 22954214 DOI: 10.1111/j.1749-6632.2012.06589.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]
Abstract
If we define a genetic code as a widespread DNA sequence pattern that carries a message with an impact on biology, then there are multiple genetic codes. Sequences involved in these codes overlap and, thus, both interact with and constrain each other, such as for the triplet code, the intron-splicing code, the code for amphipathic alpha helices, and the chromatin code. Nucleosomes preferentially are located at the ends of exons, thus protecting splice junctions, with the N9 positions of guanines of the GT and AG junctions oriented toward the histones. Analysis of protein-coding sequences reveals numerous traces of tandem repeats, apparently formed by triplet expansion, which in effect is a genome inflation "code". Our data are consistent with the hypothesis that expansion of simple tandem repetition of certain aggressive triplets has been a characteristic of life from its emergence. Such expanding triplets appear to be the major factor underlying observed codon usage biases.
Affiliation(s)
- Edward N Trifonov
- Genome Diversity Center, Institute of Evolution, University of Haifa, Mount Carmel, Haifa, Israel.
|
35
|
Abstract
The ability to survey polymorphism on a genomic scale has enabled genome-wide scans for the targets of natural selection. Theory that connects patterns of genetic variation to evidence of natural selection most often assumes a diallelic locus and no recurrent mutation. Although these assumptions are suitable to selection that targets single nucleotide variants, fundamentally different types of mutation generate abundant polymorphism in genomes. Moreover, recent empirical results suggest that mutationally complex, multiallelic loci including microsatellites and copy number variants are sometimes targeted by natural selection. Given their abundance, the lack of inference methods tailored to the mutational peculiarities of these types of loci represents a notable gap in our ability to interrogate genomes for signatures of natural selection. Previous theoretical investigations of mutation-selection balance at multiallelic loci include assumptions that limit their application to inference from empirical data. Focusing on microsatellites, we assess the dynamics and population-level consequences of selection targeting mutationally complex variants. We develop general models of a multiallelic fitness surface, a realistic model of microsatellite mutation, and an efficient simulation algorithm. Using these tools, we explore mutation-selection-drift equilibrium at microsatellites and investigate the mutational history and selective regime of the microsatellite that causes Friedreich's ataxia. We characterize microsatellite selective events by their duration and cost, note similarities to sweeps from standing point variation, and conclude that it is premature to label microsatellites as ubiquitous agents of efficient adaptive change. Together, our models and simulation algorithm provide a powerful framework for statistical inference, which can be used to test the neutrality of microsatellites and other multiallelic variants.
Affiliation(s)
- Ryan J Haasl
- Laboratory of Genetics, University of Wisconsin, USA.
|
36
|
Trifonov EN. Nucleosome Positioning by Sequence, State of the Art and Apparent Finale. J Biomol Struct Dyn 2012; 27:741-6. [DOI: 10.1080/073911010010524944] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 10/24/2022]
|
37
|
Cole HA, Nagarajavel V, Clark DJ. Perfect and imperfect nucleosome positioning in yeast. BIOCHIMICA ET BIOPHYSICA ACTA 2012; 1819:639-43. [PMID: 22306662 PMCID: PMC3358424 DOI: 10.1016/j.bbagrm.2012.01.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 10/26/2011] [Revised: 01/05/2012] [Accepted: 01/11/2012] [Indexed: 11/17/2022]
Abstract
Numerous studies of nucleosome positioning have shown that nucleosomes almost invariably adopt one of several alternative overlapping positions on a short DNA fragment in vitro. We define such a set of overlapping positions as a "position cluster", and the 5S RNA gene positioning sequence is presented as an example. The notable exception is the synthetic 601-sequence, which can position a nucleosome perfectly in vitro, though not in vivo. Many years ago, we demonstrated that nucleosome position clusters are present on the CUP1 and HIS3 genes in native yeast chromatin. Recently, using genome-wide paired-end sequencing of nucleosomes, we have shown that position clusters are the general rule in yeast chromatin, not the exception. We argue that, within a cell population, one of several alternative nucleosomal arrays is formed on each gene. We show how position clusters and alternative arrays can give rise to typical nucleosome occupancy profiles, and that position clusters are disrupted by transcriptional activation. The centromeric nucleosome is a rare example of perfect positioning in vivo. It is, however, a special case, since it contains the centromeric histone H3 variant instead of normal H3. Perfect positioning might be due to centromeric sequence-specific DNA binding proteins. Finally, we point out that the existence of position clusters implies that the putative nucleosome code is degenerate. We suggest that degeneracy might be a crucial point in the debate concerning the code. This article is part of a Special Issue entitled: Chromatin in time and space.
Affiliation(s)
- Hope A. Cole
- Program in Genomics of Differentiation, Eunice Kennedy Shriver National Institute for Child Health and Human Development, National Institutes of Health, Bethesda MD
- V. Nagarajavel
- Program in Genomics of Differentiation, Eunice Kennedy Shriver National Institute for Child Health and Human Development, National Institutes of Health, Bethesda MD
- David J. Clark
- Program in Genomics of Differentiation, Eunice Kennedy Shriver National Institute for Child Health and Human Development, National Institutes of Health, Bethesda MD
|
38
|
|
39
|
King DG. Evolution of simple sequence repeats as mutable sites. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2012; 769:10-25. [PMID: 23560302 DOI: 10.1007/978-1-4614-5434-2_2] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
Abstract
Because natural selection is commonly presumed to minimize mutation rates, the discovery of mutationally unstable simple sequence repeats (SSRs) in many functional genomic locations came as a surprise to many biologists. Whether such SSRs persist in spite of or because of their intrinsic mutability, that is, whether they constitute a genetic burden or an evolutionary boon, remains uncertain. Two contrasting evolutionary explanations can be offered for SSR abundance. First, suppressing the inherent mutability of repetitive sequences might simply lie beyond the reach of natural selection. Alternatively, natural selection might indirectly favor SSRs at sites where particular repeat-number variants have provided positive contributions to fitness. Indirect selection could thereby shape SSRs into "tuning knobs" that facilitate evolutionary adaptation by implementing an implicit protocol of incremental adjustability. The latter possibility is consistent with deep evolutionary conservation of some SSRs, including several in genes with neurological and neurodevelopmental function.
Affiliation(s)
- David G King
- Department of Anatomy, Southern Illinois University Carbondale, Carbondale, Illinois, USA.
|
40
|
Thirty years of multiple sequence codes. GENOMICS PROTEOMICS & BIOINFORMATICS 2011; 9:1-6. [PMID: 21641556 PMCID: PMC5054146 DOI: 10.1016/s1672-0229(11)60001-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 09/15/2010] [Accepted: 12/09/2010] [Indexed: 11/23/2022]
Abstract
An overview is presented on the status of studies on multiple codes in genetic sequences. Indirectly, the existence of multiple codes is recognized in the form of several rediscoveries of a Second Genetic Code that is different each time. Due credit is given to earlier seminal work related to the codes, often neglected in the literature. The latest developments in the field of the chromatin code are discussed, as well as perspectives of single-base resolution studies of nucleosome positioning, including the rotational setting of DNA on the surface of the histone octamers.
|
41
|
Rapoport AE, Trifonov EN. "Anticipated" nucleosome positioning pattern in prokaryotes. Gene 2011; 488:41-5. [PMID: 21884764 DOI: 10.1016/j.gene.2011.08.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 05/13/2011] [Revised: 07/29/2011] [Accepted: 08/03/2011] [Indexed: 11/19/2022]
Abstract
Linguistic (word count) analysis of prokaryotic genome sequences, by Shannon N-gram extension, reveals that the dominant hidden motifs in A+T rich genomes are T(A)(T)A and G(A)(T)C with uncertain number of repeating A and T. Since prokaryotic sequences are largely protein-coding, the motifs would correspond to amphipathic alpha-helices with alternating lysine and phenylalanine as preferential polar and non-polar residues. The motifs are also known in eukaryotes, as nucleosome positioning patterns. Their existence in prokaryotes as well may serve for binding of histone-like proteins to DNA. In this case the above patterns in prokaryotes may be considered as "anticipated" nucleosome positioning patterns which, quite likely, existed in prokaryotic genomes before the evolutionary separation between eukaryotes and prokaryotes.
Affiliation(s)
- Alexandra E Rapoport
- Genome Diversity Center, Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel
|
42
|
Calistri E, Livi R, Buiatti M. Evolutionary trends of GC/AT distribution patterns in promoters. Mol Phylogenet Evol 2011; 60:228-35. [PMID: 21554969 DOI: 10.1016/j.ympev.2011.04.015] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/19/2010] [Revised: 03/25/2011] [Accepted: 04/17/2011] [Indexed: 11/18/2022]
Abstract
Nucleotide distributions in genomes are known not to be random, showing the presence of specific motifs, long and short range correlations, periodicities, etc. Particularly, motifs are critical for the recognition by specific proteins affecting chromosome organization, transcription and DNA replication, but little is known about the possible functional effects of nucleotide distributions on the conformational landscape of DNA, putatively leading to differential selective pressures throughout evolution. Promoter sequences have a fundamental role in the regulation of gene activity and a vast literature suggests that their conformational landscapes may be a critical factor in gene expression dynamics. On these grounds, with the aim of investigating the putative existence of phylogenetic patterns of promoter base distributions, we analyzed GC/AT ratios along the 1000 nucleotide sequences upstream of TSS in wide sets of promoters belonging to organisms ranging from bacteria to pluricellular eukaryotes. The data obtained showed very clear phylogenetic trends throughout evolution of promoter sequence base distributions. Particularly, in all cases either GC-rich or AT-rich monotone gradients were observed: the former being present in eukaryotes, the latter in bacteria along with strand biases. Moreover, within eukaryotes, GC-rich gradients increased in length from unicellular organisms to plants, to vertebrates and, within them, from ancestral to more recent species. Finally, results were thoroughly discussed with particular attention to the possible correlation between nucleotide distribution patterns, evolution, and the putative existence of differential selection pressures, deriving from structural and/or functional constraints, between and within prokaryotes and eukaryotes.
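A positional GC profile of the kind this abstract describes can be sketched in a few lines. The sketch assumes a set of equal-length upstream sequences that are already aligned on the TSS; the window size, function names, and toy sequences are hypothetical and are not taken from the paper.

```python
def gc_profile(promoters, window=100):
    """Mean GC fraction per non-overlapping window across aligned promoters."""
    length = len(promoters[0])
    if any(len(seq) != length for seq in promoters):
        raise ValueError("all promoter sequences must have the same length")
    profile = []
    for start in range(0, length - window + 1, window):
        gc_count = 0
        for seq in promoters:
            chunk = seq[start:start + window].upper()
            gc_count += chunk.count("G") + chunk.count("C")
        profile.append(gc_count / (window * len(promoters)))
    return profile


def is_monotone_increasing(profile):
    """True when the GC fraction never decreases from one window to the next."""
    return all(a <= b for a, b in zip(profile, profile[1:]))


# Hypothetical toy promoters: AT-rich far from the TSS, GC-rich close to it.
toy_promoters = ["ATAT" + "GCGC", "AAAA" + "GGGG"]
print(gc_profile(toy_promoters, window=4))
```

With real data each sequence would be the 1000 nucleotides upstream of a TSS, and the profile would be inspected for a GC-rich or AT-rich gradient toward the start site.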
Affiliation(s)
- Elisa Calistri
- Dipartimento di Biologia Evoluzionistica, Universita' degli Studi di Firenze, via Romana 19, 50125 Firenze, Italy.
|
43
|
Frenkel ZM, Bettecken T, Trifonov EN. Nucleosome DNA sequence structure of isochores. BMC Genomics 2011; 12:203. [PMID: 21510861 PMCID: PMC3097165 DOI: 10.1186/1471-2164-12-203] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 12/13/2010] [Accepted: 04/21/2011] [Indexed: 12/03/2022] Open
Abstract
Background: Significant differences in G+C content between different isochore types suggest that the nucleosome positioning patterns in DNA of the isochores should be different as well. Results: Extraction of the patterns from the isochore DNA sequences by Shannon N-gram extension reveals that while the general motif YRRRRRYYYYYR is characteristic for all isochore types, the dominant positioning patterns of the isochores vary between TAAAAATTTTTA and CGGGGGCCCCCG due to the large differences in G+C composition. This is observed in human, mouse and chicken isochores, demonstrating that the variations of the positioning patterns are largely G+C dependent rather than species-specific. The species-specificity of nucleosome positioning patterns is revealed by dinucleotide periodicity analyses in isochore sequences. While human sequences show CG periodicity, chicken isochores display AG (CT) periodicity. Mouse isochores show very weak CG periodicity only. Conclusions: Nucleosome positioning pattern as revealed by Shannon N-gram extension is strongly dependent on G+C content and different in different isochores. Species-specificity of the pattern is subtle. It is reflected in the choice of preferentially periodical dinucleotides.
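One simple way to look for dinucleotide periodicity, offered here only as an illustrative sketch and not as the authors' procedure, is to count how often the same dinucleotide recurs at each distance; a periodic signal shows up as peaks at multiples of the period. The function name, the distance cut-off, and the toy sequence are hypothetical.

```python
from collections import Counter


def distance_spectrum(seq, dinucleotide, max_distance=30):
    """Counts of same-dinucleotide pairs separated by 1..max_distance bases."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]
    occupied = set(positions)
    counts = Counter()
    for pos in positions:
        for distance in range(1, max_distance + 1):
            if pos + distance in occupied:
                counts[distance] += 1
    # Index 0 of the returned list corresponds to distance 1.
    return [counts[d] for d in range(1, max_distance + 1)]


# Hypothetical toy sequence with CG placed every 10 bases.
toy_seq = ("CG" + "A" * 8) * 5
spectrum = distance_spectrum(toy_seq, "CG")
peaks = [d for d, n in enumerate(spectrum, start=1) if n > 0]
print(peaks)
```

For the toy sequence the only non-zero distances are 10, 20 and 30; in real isochore sequences the signal would be much noisier and would need normalisation against the background dinucleotide frequency.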
Affiliation(s)
- Zakharia M Frenkel
- Genome Diversity Center, Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel
|
44
|
Rapoport AE, Frenkel ZM, Trifonov EN. Nucleosome positioning pattern derived from oligonucleotide compositions of genomic sequences. J Biomol Struct Dyn 2011; 28:567-74. [PMID: 21142224 DOI: 10.1080/07391102.2011.10531243] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
Abstract
Availability of nucleosome positioning pattern(s) is crucial for chromatin studies. The matrix form of the pattern has been recently derived (I. Gabdank, D. Barash, E. N. Trifonov. J Biomol Struct Dyn 26, 403-412 (2009), and E. N. Trifonov. J Biomol Struct Dyn 27, 741-746 (2010)). In its simplified linear form it is described by the motif CGRAAATTTYCG. Oligonucleotide components of the motif (say, triplets GRA, RAA, AAA, etc.) would be expected to appear more frequently in eukaryotic sequences. In this work we attempted the reconstruction of the bendability patterns for 13 genomes by a novel approach: extension of the highest-frequency trinucleotides. The consensus of the patterns reconstructed on the basis of trinucleotide frequencies in 13 eukaryotic genomes is derived: CRAAAATTTTYG. It conforms to the earlier established sequence motif. The reconstruction thus attests to the universality of the nucleosome DNA bendability pattern.
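The abstract does not spell out how trinucleotides are extended, so the following is only a rough sketch of one way such an extension could work: count all trinucleotides, then grow a seed to the right by repeatedly appending the base that forms the most frequent trinucleotide with the last two bases. The greedy rule, the function names, and the tie-breaking order (A, C, G, T) are assumptions for illustration, not the published procedure.

```python
from collections import Counter


def count_kmers(seq, k=3):
    """Count overlapping k-mers in seq (k=3 gives trinucleotides)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def extend_right(counts, seed, length):
    """Greedily extend seed to `length` bases using trinucleotide counts.

    At each step the base whose trinucleotide (last two bases + candidate)
    is most frequent is appended; ties resolve in A, C, G, T order.
    Extension stops early if no candidate trinucleotide was observed.
    """
    motif = seed.upper()
    while len(motif) < length:
        suffix = motif[-2:]
        best = max("ACGT", key=lambda b: counts.get(suffix + b, 0))
        if counts.get(suffix + best, 0) == 0:
            break
        motif += best
    return motif
```

For example, `extend_right(count_kmers(genome), "AAT", 12)` would grow a 12-base pattern from the seed AAT, where `genome` stands for any sequence string supplied by the reader.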
Affiliation(s)
- Alexandra E Rapoport
- Genome Diversity Center, Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel.
|
45
|
Role of Everlasting Triplet Expansions in Protein Evolution. J Mol Evol 2010; 72:232-9. [DOI: 10.1007/s00239-010-9425-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2010] [Accepted: 12/01/2010] [Indexed: 02/05/2023]
|
46
|
Nair TM. Sequence periodicity in nucleosomal DNA and intrinsic curvature. BMC STRUCTURAL BIOLOGY 2010; 10 Suppl 1:S8. [PMID: 20487515 PMCID: PMC2873831 DOI: 10.1186/1472-6807-10-s1-s8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy, the more rigid the DNA helix; thus it is natural to expect that sequences involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understanding their sequence-dependent structures. RESULTS: Higher order DNA structures and the distribution of molecular bend loci associated with 146-base nucleosome core DNA sequences from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. CONCLUSIONS: The higher order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility that context-dependent curvature of varying degrees is associated with nucleosomal DNA.
Affiliation(s)
- T Murlidharan Nair
- Department of Biological sciences, Indiana University South Bend, 1700 Mishawaka Ave, South Bend, IN-46634, USA.
|
47
|
Base pair stacking in nucleosome DNA and bendability sequence pattern. J Theor Biol 2010; 263:337-9. [DOI: 10.1016/j.jtbi.2009.11.020] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2009] [Revised: 11/23/2009] [Accepted: 11/23/2009] [Indexed: 11/19/2022]
|
48
|
Seaman JD, Sanford JC. Skittle: A 2-Dimensional Genome Visualization Tool. BMC Bioinformatics 2009; 10:452. [PMID: 20042093 PMCID: PMC2817707 DOI: 10.1186/1471-2105-10-452] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2009] [Accepted: 12/30/2009] [Indexed: 11/16/2022] Open
Abstract
Background: It is increasingly evident that there are multiple and overlapping patterns within the genome, and that these patterns contain different types of information regarding both genome function and genome history. In order to discover additional genomic patterns which may have biological significance, novel strategies are required. To partially address this need, we introduce a new data visualization tool entitled Skittle. Results: This program first creates a 2-dimensional nucleotide display by assigning four colors to the four nucleotides, and then text-wraps to a user-adjustable width. This nucleotide display is accompanied by a "repeat map" which comprehensively displays all local repeating units, based upon analysis of all possible local alignments. Skittle includes a smooth-zooming interface which allows the user to analyze genomic patterns at any scale. Skittle is especially useful in identifying and analyzing tandem repeats, including repeats not normally detectable by other methods. However, Skittle is also more generally useful for analysis of any genomic data, allowing users to correlate published annotations and observable visual patterns, and allowing for sequence and construct quality control. Conclusions: Preliminary observations using Skittle reveal intriguing genomic patterns not otherwise obvious, including structured variations inside tandem repeats. The striking visual patterns revealed by Skittle appear to be useful for hypothesis development, and have already led the authors to theorize that imperfect tandem repeats could act as information carriers, and may form tertiary structures within the interphase nucleus.
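The 2-dimensional display described in this abstract (one color per nucleotide, text-wrapped to an adjustable width) can be sketched in a few lines. This is not Skittle's code: the function names and the palette are hypothetical, since the abstract says four colors are used but does not name them.

```python
# Hypothetical palette; the abstract does not specify which colors Skittle uses.
PALETTE = {"A": "black", "C": "red", "G": "green", "T": "blue"}


def wrap_sequence(seq, width):
    """Split seq into rows of `width` bases; the last row may be shorter."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def color_grid(seq, width):
    """Return rows of color names, one per base; unknown symbols become 'grey'."""
    return [[PALETTE.get(base, "grey") for base in row]
            for row in wrap_sequence(seq.upper(), width)]
```

When the wrap width equals the period of a tandem repeat, identical bases line up in columns, which is why wrapping makes such repeats visible as vertical stripes.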
|
49
|
Abstract
While once almost synonymous, there is an increasing gap between the expanding definition of what constitutes a gene and the conservative and narrowly defined terms code or coding, which for a long time, almost exclusively constituted the open reading frame. Much confusion results from this disparity, especially in light of the plethora of noncoding RNAs (more correctly termed "non-protein-coding RNAs") that usually are encoded and transcribed by their own genes. A simple solution would be to adopt Ed Trifonov's less constrained definition of a code as any sequence pattern that can have a biological function. Such consideration favors not only a more complex view of the gene as an entity composed of many more or less conserved subgenic modules, but also a concept of modular evolution of genes and entire genomes.
Affiliation(s)
- Jürgen Brosius
- Institute of Experimental Pathology (ZMBE), University of Münster, Münster, Germany.
|
50
|
Affiliation(s)
- David G. King
- Departments of Anatomy and Zoology, Southern Illinois University, Carbondale, IL 62901, USA. Department of Biotechnology and Food Engineering, The Technion—Israel Institute of Technology, Haifa 32000, Israel
- Yechezkel Kashi
- Departments of Anatomy and Zoology, Southern Illinois University, Carbondale, IL 62901, USA. Department of Biotechnology and Food Engineering, The Technion—Israel Institute of Technology, Haifa 32000, Israel
|