1
Smeds L, Kamali K, Kejnovská I, Kejnovský E, Chiaromonte F, Makova KD. Non-canonical DNA in human and other ape telomere-to-telomere genomes. bioRxiv [Preprint] 2024:2024.09.02.610891. [PMID: 39713403 PMCID: PMC11661062 DOI: 10.1101/2024.09.02.610891] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/24/2024]
Abstract
Non-canonical (non-B) DNA structures, such as bent DNA, hairpins, G-quadruplexes, and Z-DNA, form at certain sequence motifs (e.g., A-phased repeats and inverted repeats) and have emerged as important regulators of cellular processes and drivers of genome evolution. Yet they have been understudied because of their repetitive nature and the potentially inaccurate sequences generated with short-read technologies. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched in the genomic regions newly added to T2T assemblies and occupy 9-15% of autosomes, 9-11% of chromosome X, and 12-38% of chromosome Y. Functional regions (e.g., promoters and enhancers) and repetitive sequences are enriched in non-B DNA motifs. Non-B DNA motifs concentrate on the short arms of acrocentric chromosomes in a pattern reflecting their satellite repeat content and might contribute to satellite dynamics in these regions. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role of non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest novel functions for them in previously inaccessible genomic regions.
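The coverage figures above are the share of each chromosome's length that falls inside at least one predicted motif. A minimal sketch of that calculation, assuming motif calls are available as half-open (start, end) coordinates on a single chromosome; the function name is hypothetical and this is not the authors' pipeline. Overlapping calls are merged so that no base is counted twice.

```python
from typing import Iterable, Tuple


def covered_fraction(intervals: Iterable[Tuple[int, int]], seq_len: int) -> float:
    """Fraction of a sequence of length seq_len covered by half-open
    [start, end) motif intervals, counting overlapping bases only once."""
    covered_bp = 0
    run_start = run_end = None
    for start, end in sorted(intervals):
        if run_end is None or start > run_end:
            # Disjoint from the current run: bank it and open a new run.
            if run_end is not None:
                covered_bp += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlapping or abutting interval: extend the current run.
            run_end = max(run_end, end)
    if run_end is not None:
        covered_bp += run_end - run_start
    return covered_bp / seq_len


# Two overlapping motifs plus one separate motif on a hypothetical 100-bp sequence.
print(covered_fraction([(0, 10), (5, 20), (30, 40)], 100))  # 0.3
```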
Affiliation(s)
- Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Kaivan Kamali
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Iva Kejnovská
- Department of Biophysics of Nucleic Acids, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Eduard Kejnovský
- Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Francesca Chiaromonte
- Department of Statistics, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park, PA 16802, USA
- L'EMbeDS, Sant'Anna School of Advanced Studies, 56127 Pisa, Italy
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park, PA 16802, USA
2
Provatas K, Chantzi N, Patsakis M, Nayak A, Mouratidis I, Georgakopoulos-Soares I. Microsatellites explorer: A database of short tandem repeats across genomes. Comput Struct Biotechnol J 2024; 23:3817-3826. [PMID: 39525087 PMCID: PMC11550718 DOI: 10.1016/j.csbj.2024.10.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Revised: 10/24/2024] [Accepted: 10/24/2024] [Indexed: 11/16/2024]
Abstract
Short tandem repeats (STRs) are widespread repetitive elements with a number of biological functions, and they are among the most rapidly mutating regions in the genome. Their distribution varies significantly between taxonomic groups in the tree of life, and they are highly polymorphic within the human population. Advances in sequencing technologies coupled with decreasing costs have enabled the generation of an ever-growing number of complete genomes. Additionally, the arrival of accurate long reads has facilitated the generation of telomere-to-telomere (T2T) assemblies of complete genomes. Nevertheless, there is no comprehensive database of the STRs found in each genome across different organisms and across human genomes of diverse ancestries. Here we introduce Microsatellites Explorer, a database of STRs found in the genomes of 117,253 organisms across all major taxonomic groups, in 15 T2T genome assemblies of different organisms, and in 94 human haplotypes from the human pangenome. The database currently hosts 406,758,798 STR sequences, serving as a centralized, user-friendly repository to perform searches and interactive visualizations and to download existing STR data for independent analysis. Microsatellites Explorer is implemented as a web portal for browsing, analyzing and downloading STR data and is publicly available at https://www.microsatellitesexplorer.com.
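The abstract does not say which repeat finder or thresholds Microsatellites Explorer uses. As a sketch of what perfect-STR detection involves, the function below scans for a unit of 1-6 bp repeated at least three times in tandem; both cut-offs and the function name are illustrative assumptions, and real STR callers typically also tolerate mismatches and partial copies.

```python
from typing import List, Tuple


def find_perfect_strs(seq: str, max_unit: int = 6,
                      min_copies: int = 3) -> List[Tuple[int, int, str]]:
    """Return (start, end, unit) for perfect tandem repeats, scanning left to right.

    Shorter units are tried first, so 'AAAAAA' is reported with unit 'A', not 'AA'.
    Only whole copies of the unit are counted; a trailing partial copy is ignored.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = False
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:  # not enough sequence left for this unit length
                break
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                end = i + copies * k
                hits.append((i, end, unit))
                i = end  # resume scanning after the repeat
                found = True
                break
        if not found:
            i += 1  # no qualifying repeat starts at this position
    return hits


print(find_perfect_strs("GGCACACACAT"))  # [(2, 10, 'CA')]
```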
Affiliation(s)
- Kimonas Provatas
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Michail Patsakis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Akshatha Nayak
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
3
Wang YR, Chang SM, Lin JJ, Chen HC, Lee LT, Tsai DY, Lee SD, Lan CY, Chang CR, Chen CF, Ng CS. A comprehensive study of Z-DNA density and its evolutionary implications in birds. BMC Genomics 2024; 25:1123. [PMID: 39573987 PMCID: PMC11580473 DOI: 10.1186/s12864-024-11039-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Accepted: 11/13/2024] [Indexed: 11/25/2024]
Abstract
BACKGROUND Z-DNA, a left-handed helical form of DNA, plays a significant role in genomic stability and gene regulation. Its formation, associated with high GC content and repetitive sequences, is linked to genomic instability, potentially leading to large-scale deletions and contributing to phenotypic diversity and evolutionary adaptation. RESULTS In this study, we analyzed the density of Z-DNA-prone motifs of 154 avian genomes using the non-B DNA Motif Search Tool (nBMST). Our findings indicate a higher prevalence of Z-DNA motifs in promoter regions across all avian species compared to other genomic regions. A negative correlation was observed between Z-DNA density and developmental time in birds, suggesting that species with shorter developmental periods tend to have higher Z-DNA densities. This relationship implies that Z-DNA may influence the timing and regulation of development in avian species. Furthermore, Z-DNA density showed associations with traits such as body mass, egg mass, and genome size, highlighting the complex interactions between genome architecture and phenotypic characteristics. Gene Ontology (GO) analysis revealed that Z-DNA motifs are enriched in genes involved in nucleic acid binding, kinase activity, and translation regulation, suggesting a role in fine-tuning gene expression essential for cellular functions and responses to environmental changes. Additionally, the potential of Z-DNA to drive genomic instability and facilitate adaptive evolution underscores its importance in shaping phenotypic diversity. CONCLUSIONS This study emphasizes the role of Z-DNA as a dynamic genomic element contributing to gene regulation, genomic stability, and phenotypic diversity in avian species. Future research should experimentally validate these associations and explore the molecular mechanisms by which Z-DNA influences avian biology.
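The study relies on nBMST, and the abstract does not give its Z-DNA criteria. A simplified sketch of how a Z-DNA-prone motif density could be computed, assuming a motif is any run of at least 10 bp of strictly alternating purines and pyrimidines; the threshold and function names are hypothetical, and some tools additionally weight CG steps above CA/TG steps. Density is expressed per kilobase so that regions of different lengths, such as promoters and gene bodies, can be compared.

```python
from typing import List, Tuple

PURINES = set("AG")
PYRIMIDINES = set("CT")


def alternates(a: str, b: str) -> bool:
    """True if a and b are one purine and one pyrimidine, in either order."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def alternating_runs(seq: str, min_len: int = 10) -> List[Tuple[int, int]]:
    """Maximal half-open [start, end) runs of alternating purine/pyrimidine
    bases, such as (CG)n or (CA)n, that are at least min_len bp long."""
    seq = seq.upper()
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the sequence end or where two neighbours fail to alternate.
        if i == len(seq) or not alternates(seq[i - 1], seq[i]):
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


def motifs_per_kb(seq: str, min_len: int = 10) -> float:
    """Z-DNA-prone motif density, as motifs per kilobase of sequence."""
    if not seq:
        return 0.0
    return 1000 * len(alternating_runs(seq, min_len)) / len(seq)


promoter = "AAAA" + "CG" * 6 + "TTTT"  # hypothetical 20-bp promoter fragment
print(alternating_runs(promoter))  # [(3, 17)]
print(motifs_per_kb(promoter))     # 50.0
```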
Affiliation(s)
- Yu-Ren Wang
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, 300044, Taiwan
- Shao-Ming Chang
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, 300044, Taiwan
- Jinn-Jy Lin
- National Center for High-performance Computing, National Applied Research Laboratories, Hsinchu, 300092, Taiwan
- Hsiao-Chian Chen
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, 300044, Taiwan
- Marine Research Station, Academia Sinica, Yilan, 262204, Taiwan
- Okinawa Institute of Science and Technology, Okinawa, 904-0495, Japan
- Lo-Tung Lee
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, 300044, Taiwan
- Dien-Yu Tsai
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, 300044, Taiwan
- Shih-Da Lee
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, 300044, Taiwan
- Chung-Yu Lan
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, 300044, Taiwan
- Department of Life Science, National Tsing Hua University, Hsinchu, 300044, Taiwan
- Chuang-Rung Chang
- Institute of Biotechnology, National Tsing Hua University, Hsinchu, 300044, Taiwan
- Department of Medical Science, National Tsing Hua University, Hsinchu, 300044, Taiwan
- School of Medicine, National Tsing Hua University, Hsinchu, 300044, Taiwan
- Chih-Feng Chen
- Department of Animal Sciences, National Chung Hsing University, Taichung, 402202, Taiwan
- The iEGG and Animal Biotechnology Center, National Chung Hsing University, Taichung, 402202, Taiwan
- Chen Siang Ng
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, 300044, Taiwan
- Department of Life Science, National Tsing Hua University, Hsinchu, 300044, Taiwan
- The iEGG and Animal Biotechnology Center, National Chung Hsing University, Taichung, 402202, Taiwan
- Bioresource Conservation Research Center, National Tsing Hua University, Hsinchu, 300044, Taiwan
4
Provatas K, Chantzi N, Patsakis M, Nayak A, Mouratidis I, Pavlopoulos GA, Georgakopoulos-Soares I. invertiaDB: A Database of Inverted Repeats Across Organismal Genomes. bioRxiv [Preprint] 2024:2024.11.11.622808. [PMID: 39605716 PMCID: PMC11601276 DOI: 10.1101/2024.11.11.622808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Inverted repeats are repetitive elements that can form hairpin and cruciform structures. They are linked to genomic instability; however, they also have various biological functions. Their distribution differs markedly across taxonomic groups in the tree of life, and they exhibit high polymorphism due to their inherent genomic instability. Advances in sequencing technologies and declining costs have enabled the generation of an ever-growing number of complete genomes for organisms across the tree of life. However, a comprehensive database of inverted repeats across diverse organismal genomes has been lacking. We present invertiaDB, the first comprehensive database of inverted repeats spanning multiple taxa, featuring repeats identified in the genomes of 118,070 organisms across all major taxonomic groups. The database currently hosts 30,067,666 inverted repeat sequences, serving as a centralized, user-friendly repository to perform searches and interactive visualization and to download existing inverted repeat data for independent analysis. invertiaDB is implemented as a web portal for browsing, analyzing and downloading inverted repeat data and is publicly available at https://invertiadb.netlify.app/homepage.html.
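The abstract does not describe how invertiaDB identifies inverted repeats. A brute-force sketch for short sequences, assuming an inverted repeat is a left arm, a spacer, and a right arm equal to the reverse complement of the left arm; the 6-bp minimum arm, the 10-bp maximum spacer and the function names are hypothetical. Each candidate centre is extended outward so hits are maximal, and a spacer whose innermost bases could pair is skipped because a shorter spacer already reports the same repeat.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pairs(a: str, b: str) -> bool:
    """Watson-Crick complementarity; ambiguous bases such as N never pair."""
    return COMPLEMENT.get(a) == b


def find_inverted_repeats(seq: str, min_arm: int = 6,
                          max_spacer: int = 10) -> List[Tuple[int, int, int]]:
    """Return (left_arm_start, arm_len, spacer_len) for perfect inverted repeats."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for e in range(1, n):  # e = end (exclusive) of the left arm
        for s in range(0, max_spacer + 1):
            r = e + s  # start of the right arm
            if r >= n:
                break
            # If the innermost spacer bases pair, a shorter spacer describes this repeat.
            if s >= 2 and pairs(seq[e], seq[r - 1]):
                continue
            k = 0
            while e - 1 - k >= 0 and r + k < n and pairs(seq[e - 1 - k], seq[r + k]):
                k += 1
            if k >= min_arm:
                hits.append((e - k, k, s))
    return hits


# Hypothetical sequence: arm ACGGTA, spacer TTT, arm TACCGT (reverse complement).
print(find_inverted_repeats("GGACGGTATTTTACCGTGG"))  # [(2, 6, 3)]
```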
Affiliation(s)
- Kimonas Provatas
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Michail Patsakis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Akshatha Nayak
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
5
Štefan U, Brázda V, Plavec J, Marušič M. The influence of G-tract and loop length on the topological variability of putative five and six G-quartet DNA structures in the human genome. Int J Biol Macromol 2024; 280:136008. [PMID: 39326605 DOI: 10.1016/j.ijbiomac.2024.136008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 09/23/2024] [Accepted: 09/23/2024] [Indexed: 09/28/2024]
Abstract
Local variation in DNA structure and its dynamic nature play an essential role in the regulation of important biological processes. Among the most prominent noncanonical structures are G-quadruplexes, which form in vivo within guanine-rich regions and have been shown to be involved in the regulation of transcription, translation and telomere maintenance. We analyze G-quadruplex formation in sequences whose G-tracts are five or six guanines long; these sequences emerged from the investigation of the gapless human genome and are associated with genes related to cancer and neurodegenerative diseases. We systematically explored the effect of G-tract and loop elongation by means of NMR and CD spectroscopy and polyacrylamide gel electrophoresis. Although both types of elongation led to structural polymorphism, we successfully determined the topologies of four of the eight examined sequences, one of which adds to the very small set of currently known intramolecular four-G-quartet structures in potassium solutions. We demonstrate that the examined sequences are incompatible with five- or six-G-quartet structures with propeller loops, although compatibility with other loop types cannot be ruled out. Lastly, we propose a novel approach to specific G-quadruplex targeting that could be implemented in structures with more than four G-quartets.
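Sequence scans for candidates like these commonly use a four-tract pattern in which the G-tract length sets the maximum number of G-quartets that could stack. The sketch below, an illustrative assumption rather than the authors' search, generalizes the widely used G3 N1-7 pattern to tracts of a chosen length; the function name and the default loop limit of 7 are hypothetical choices. As the experimental results above stress, a match only flags a candidate: it cannot predict whether a five- or six-quartet fold forms or which loop types are compatible.

```python
import re
from typing import List, Tuple


def putative_g4(seq: str, tract: int = 3, max_loop: int = 7) -> List[Tuple[int, str]]:
    """Non-overlapping matches of the four-tract pattern
    G{tract} N{1,max_loop} G{tract} N{1,max_loop} G{tract} N{1,max_loop} G{tract}.

    `tract` is the largest number of G-quartets the four tracts could stack.
    Loops may themselves contain G, so a match says nothing about topology.
    """
    g_run = f"G{{{tract}}}"             # e.g. "G{5}"
    loop = f"[ACGT]{{1,{max_loop}}}"    # e.g. "[ACGT]{1,7}"
    pattern = re.compile(g_run + loop + g_run + loop + g_run + loop + g_run)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


seq = "GGGGGTGGGGGTAGGGGGTGGGGG"  # hypothetical sequence with four G5 tracts
print(putative_g4(seq, tract=5))  # [(0, 'GGGGGTGGGGGTAGGGGGTGGGGG')]
print(putative_g4(seq, tract=6))  # []
```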
Affiliation(s)
- Urša Štefan
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana, Slovenia
- Václav Brázda
- Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Janez Plavec
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana, Slovenia
- Slovenian NMR Center, National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
- EN-FIST Center of Excellence, SI-1000 Ljubljana, Slovenia
- Maja Marušič
- Slovenian NMR Center, National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
6
Beknazarov N, Konovalov D, Herbert A, Poptsova M. Z-DNA formation in promoters conserved between human and mouse are associated with increased transcription reinitiation rates. Sci Rep 2024; 14:17786. [PMID: 39090226 PMCID: PMC11294368 DOI: 10.1038/s41598-024-68439-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 07/23/2024] [Indexed: 08/04/2024]
Abstract
A long-standing question concerns the role of Z-DNA in transcription. Here we use DeepZ, a deep learning approach that predicts Z-flipons based on DNA sequence, structural properties of nucleotides and omics data. After generating whole-genome Z-flipon maps, we examined Z-flipons that are conserved between the human and mouse genomes and validated them by orthogonal approaches based on high-resolution chemical mapping of Z-DNA and the transformer algorithm Z-DNABERT. For human and mouse, we revealed a similar pattern of transcription factors, chromatin remodelers, and histone marks associated with conserved Z-flipons. We found significant enrichment of Z-flipons in alternative and bidirectional promoters associated with neurogenesis genes. We show that conserved Z-flipons are associated with increased experimentally determined transcription reinitiation rates compared with promoters without Z-flipons, but without affecting elongation or pausing. Our findings support a model in which Z-flipons engage Transcription Factor E and impact phenotype by enabling the reset of preinitiation complexes when active, and by suppressing gene expression when engaged by repressive chromatin complexes.
Affiliation(s)
- Nazar Beknazarov
- Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
- Dmitry Konovalov
- Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
- Alan Herbert
- Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
- InsideOutBio, Charlestown, MA, USA
- Maria Poptsova
- Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
7
Lebherz MK, Fouks B, Schmidt J, Bornberg-Bauer E, Grandchamp A. DNA Transposons Favor De Novo Transcript Emergence Through Enrichment of Transcription Factor Binding Motifs. Genome Biol Evol 2024; 16:evae134. [PMID: 38934893 PMCID: PMC11264136 DOI: 10.1093/gbe/evae134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 06/11/2024] [Accepted: 06/15/2024] [Indexed: 06/28/2024]
Abstract
De novo genes emerge from noncoding regions of genomes via a succession of mutations. Among others, such mutations activate transcription and create a new open reading frame (ORF). Although the mechanisms underlying ORF emergence are well documented, relatively little is known about the mechanisms enabling new transcription events. Yet, in many species, a continuum between absent and very prominent transcription has been reported for essentially all regions of the genome. In this study, we searched for de novo transcripts using newly assembled genomes and transcriptomes of seven inbred lines of Drosophila melanogaster, originating from six European populations and one African population. This setup allowed us to detect sample-specific de novo transcripts and to compare them with their homologous nontranscribed regions in other samples, as well as with genic and intergenic control sequences. We studied the association with transposable elements (TEs) and the enrichment of transcription factor motifs upstream of newly emerged transcripts, and compared them with regulatory elements. We found that de novo transcripts overlap with TEs more often than expected by chance. The emergence of new transcripts correlates with regions of high guanine-cytosine content and with TE expression. Moreover, upstream regions of de novo transcripts are highly enriched in regulatory motifs. These motifs are more enriched in new transcripts overlapping TEs, particularly DNA TEs, and are more conserved upstream of de novo transcripts than upstream of their 'nontranscribed homologs'. Overall, our study demonstrates that TE insertion is important for transcript emergence, partly by introducing new regulatory motifs from DNA TE families.
Affiliation(s)
- Bertrand Fouks
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398, Montpellier, France
- CIRAD, UMR AGAP Institut, F-34398, Montpellier, France
- Julian Schmidt
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Department of Protein Evolution, Max Planck Institute for Biology, Tübingen, Germany
- Anna Grandchamp
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
8
Yi C, Liu Q, Huang Y, Liu C, Guo X, Fan C, Zhang K, Liu Y, Han F. Non-B-form DNA is associated with centromere stability in newly-formed polyploid wheat. Sci China Life Sci 2024; 67:1479-1488. [PMID: 38639838 DOI: 10.1007/s11427-023-2513-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 12/18/2023] [Indexed: 04/20/2024]
Abstract
Non-B-form DNA differs from the classic B-DNA double helix structure and plays a crucial regulatory role in replication and transcription. However, the role of non-B-form DNA in centromeres, especially in polyploid wheat, remains elusive. Here, we systematically analyzed the profiles of seven non-B-form DNA motifs (A-phased DNA repeat, direct repeat, G-quadruplex, inverted repeat, mirror repeat, short tandem repeat, and Z-DNA) in hexaploid wheat. We found that three of these motifs were enriched at centromeric regions, especially at CENH3-binding sites, suggesting that non-B-form DNA may create a favorable loading environment for the CENH3 nucleosome. To investigate the dynamics of centromeric non-B-form DNA during alloploidization, we analyzed DNA secondary structure using CENH3 ChIP-seq data from newly formed allotetraploid wheat and its two diploid ancestors. Newly formed allotetraploid wheat formed more non-B-form DNA in centromeric regions than its parents, suggesting that non-B-form DNA is related to the localization of centromeric regions in newly formed wheat. Furthermore, non-B-form DNA enriched in centromeric regions was found to form preferentially on young LTR retrotransposons, explaining CENH3's tendency to bind younger LTR retrotransposons. Collectively, our study describes the landscape of non-B-form DNA in the wheat genome and sheds light on its potential role in the evolution of polyploid centromeres.
Affiliation(s)
- Congyang Yi
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Qian Liu
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Yuhong Huang
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Chang Liu
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Xianrui Guo
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Chaolan Fan
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Kaibiao Zhang
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Yang Liu
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Fangpu Han
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
9
Fang Y, Bansal K, Mostafavi S, Benoist C, Mathis D. AIRE relies on Z-DNA to flag gene targets for thymic T cell tolerization. Nature 2024; 628:400-407. [PMID: 38480882 PMCID: PMC11091860 DOI: 10.1038/s41586-024-07169-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 02/06/2024] [Indexed: 03/18/2024]
Abstract
AIRE is an unconventional transcription factor that enhances the expression of thousands of genes in medullary thymic epithelial cells and promotes clonal deletion or phenotypic diversion of self-reactive T cells (refs. 1-4). The biological logic of AIRE's target specificity remains largely unclear as, in contrast to many transcription factors, it does not bind to a particular DNA sequence motif. Here we implemented two orthogonal approaches to investigate AIRE's cis-regulatory mechanisms: construction of a convolutional neural network and leveraging natural genetic variation through analysis of F1 hybrid mice (ref. 5). Both approaches nominated Z-DNA and NFE2-MAF as putative positive influences on AIRE's target choices. Genome-wide mapping studies revealed that Z-DNA-forming and NFE2L2-binding motifs were positively associated with the inherent ability of a gene's promoter to generate DNA double-stranded breaks, and promoters showing strong double-stranded break generation were more likely to enter a poised state with accessible chromatin and already-assembled transcriptional machinery. Consequently, AIRE preferentially targets genes with poised promoters. We propose a model in which Z-DNA anchors the AIRE-mediated transcriptional program by enhancing double-stranded break generation and promoter poising. Beyond resolving a long-standing mechanistic conundrum, these findings suggest routes for manipulating T cell tolerance.
Affiliation(s)
- Yuan Fang
- Department of Immunology, Harvard Medical School, Boston, MA, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Kushagra Bansal
- Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, India
- Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada
- Diane Mathis
- Department of Immunology, Harvard Medical School, Boston, MA, USA
10
Qian SH, Shi MW, Xiong YL, Zhang Y, Zhang ZH, Song XM, Deng XY, Chen ZX. EndoQuad: a comprehensive genome-wide experimentally validated endogenous G-quadruplex database. Nucleic Acids Res 2024; 52:D72-D80. [PMID: 37904589 PMCID: PMC10767823 DOI: 10.1093/nar/gkad966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/22/2023] [Accepted: 10/14/2023] [Indexed: 11/01/2023]
Abstract
G-quadruplexes (G4s) are non-canonical four-stranded structures and are emerging as novel genetic regulatory elements. However, a comprehensive genomic annotation of endogenous G4s (eG4s) and a systematic characterization of their regulatory network are still lacking, posing major challenges for eG4 research. Here, we present EndoQuad (https://EndoQuad.chenzxlab.cn/), which addresses these pressing issues by integrating high-throughput experimental data. First, based on high-quality genome-wide eG4 mapping datasets (human: 1181; mouse: 24; chicken: 2) generated by G4 ChIP-seq/CUT&Tag, we generate a reference set of genome-wide eG4s. Our multi-omics analyses show that most eG4s are identified in only one or a few cell types. The eG4s with higher occurrences across samples are more structurally stable, evolutionarily conserved and enriched in promoter regions; they mark highly expressed genes and associate with complex regulatory programs, indicating a higher confidence level for further experiments. Finally, we integrate millions of functional genomic variants and prioritize eG4s with regulatory functions in disease and cancer contexts. These efforts have culminated in a comprehensive and interactive database of experimentally validated DNA eG4s. As such, EndoQuad enables users to easily access, download and repurpose these data for their own research. EndoQuad will become a one-stop resource for eG4 research and lay the foundation for future functional studies.
Affiliation(s)
- Sheng Hu Qian
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Meng-Wei Shi
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Yu-Li Xiong
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Yuan Zhang
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Ze-Hao Zhang
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Xue-Mei Song
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Xin-Yin Deng
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Zhen-Xia Chen
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Shenzhen 518000, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China
11
Zhao J, Baltoumas FA, Konnaris MA, Mouratidis I, Liu Z, Sims J, Agarwal V, Pavlopoulos GA, Georgakopoulos-Soares I, Ahituv N. MPRAbase: A Massively Parallel Reporter Assay Database. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.19.567742. [PMID: 38045264 PMCID: PMC10690217 DOI: 10.1101/2023.11.19.567742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Massively parallel reporter assays (MPRAs) represent a set of high-throughput technologies that measure the functional effects of thousands of sequences/variants on gene regulatory activity. There are several different variations of MPRA technology and they are used for numerous applications, including regulatory element discovery, variant effect measurement, saturation mutagenesis, synthetic regulatory element generation or characterization of evolutionary gene regulatory differences. Despite their many designs and uses, there is no comprehensive database that incorporates the results of these experiments. To address this, we developed MPRAbase, a manually curated database that currently harbors 129 experiments, encompassing 17,718,677 elements tested across 35 cell types and 4 organisms. The MPRAbase web interface (http://www.mprabase.com) serves as a centralized user-friendly repository to download existing MPRA data for independent analysis and is designed with the ability to allow researchers to share their published data for rapid dissemination to the community.
Affiliation(s)
- Jingjing Zhao
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- Maxwell A. Konnaris
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Statistics, Penn State University, State College, PA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Statistics, Penn State University, State College, PA, USA
- Zhe Liu
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Jasmine Sims
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
- Vikram Agarwal
- mRNA Center of Excellence, Sanofi Pasteur Inc., Waltham, MA, USA
- Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Statistics, Penn State University, State College, PA, USA
- Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
12
Dézé O, Ordanoska D, Rossille D, Miglierina E, Laffleur B, Cogné M. Unique repetitive nucleic acid structures mirror switch regions in the human IgH locus. Biochimie 2023; 214:167-175. [PMID: 37678746 DOI: 10.1016/j.biochi.2023.08.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 08/29/2023] [Accepted: 08/31/2023] [Indexed: 09/09/2023]
Abstract
Immunoglobulin (Ig) genes carry the unique ability to be reshaped in peripheral B lymphocytes after these cells encounter a specific antigen. B cells can then further improve their affinity, acquire new functions as memory cells and eventually end up as antibody-secreting cells. Ig class switching is an important change that occurs in this context, thanks to local DNA lesions initiated by the enzyme activation-induced cytidine deaminase (AID). Several cis-acting elements of the Ig heavy (IgH) chain locus make it accessible to the AID-mediated lesions that promote class switch recombination (CSR). DNA repeats, with a non-template strand rich in G-quadruplex (G4) DNA, are prominent cis-targets of AID and define the so-called "switch" (S) regions specifically targeted for CSR. By analyzing the structure of the human IgH locus, we uncover that abundant DNA repeats, some with a putative G4-rich template strand, are additionally present in downstream portions of the IgH coding genes. These like-S (LS) regions stand as 3' mirror images of S regions and also show analogies to some previously reported repeats associated with the IgH locus 3' super-enhancer. A regulatory role of LS repeats is strongly suggested by their specific localization close to exons encoding the membrane form of Ig molecules, and by their conservation during mammalian evolution.
Affiliation(s)
- Ophélie Dézé
- Institut National de La Santé et de La Recherche Médicale, Unité Mixte de Recherche U1236, Université de Rennes, Etablissement Français Du Sang Bretagne, F-35000, Rennes, France
- Delfina Ordanoska
- Institut National de La Santé et de La Recherche Médicale, Unité Mixte de Recherche U1236, Université de Rennes, Etablissement Français Du Sang Bretagne, F-35000, Rennes, France
- Delphine Rossille
- Centre Hospitalier Universitaire de Rennes, SITI, Pôle Biologie, F-35033, Rennes, France
- Emma Miglierina
- Institut National de La Santé et de La Recherche Médicale, Unité Mixte de Recherche U1236, Université de Rennes, Etablissement Français Du Sang Bretagne, F-35000, Rennes, France
- Brice Laffleur
- Institut National de La Santé et de La Recherche Médicale, Unité Mixte de Recherche U1236, Université de Rennes, Etablissement Français Du Sang Bretagne, F-35000, Rennes, France
- Michel Cogné
- Institut National de La Santé et de La Recherche Médicale, Unité Mixte de Recherche U1236, Université de Rennes, Etablissement Français Du Sang Bretagne, F-35000, Rennes, France
- Centre Hospitalier Universitaire de Rennes, SITI, Pôle Biologie, F-35033, Rennes, France
13
Umerenkov D, Herbert A, Konovalov D, Danilova A, Beknazarov N, Kokh V, Fedorov A, Poptsova M. Z-flipon variants reveal the many roles of Z-DNA and Z-RNA in health and disease. Life Sci Alliance 2023; 6:e202301962. [PMID: 37164635 PMCID: PMC10172764 DOI: 10.26508/lsa.202301962] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/31/2023] [Revised: 04/25/2023] [Accepted: 04/28/2023] [Indexed: 05/12/2023]
Abstract
Identifying roles for Z-DNA remains challenging given its dynamic nature. Here, we perform genome-wide interrogation with the DNABERT transformer algorithm trained on experimentally identified Z-DNA-forming sequences (Z-flipons). The algorithm yields large performance enhancements (F1 = 0.83) over existing approaches and implements computational mutagenesis to assess the effects of base substitution on Z-DNA formation. We show that Z-flipons are enriched in promoters and telomeres, overlapping quantitative trait loci for RNA expression, RNA editing, splicing, and disease-associated variants. We cross-validate across a number of orthogonal databases and define BZ junction motifs. Surprisingly, many effects we delineate are likely mediated through Z-RNA formation. A shared Z-RNA motif is identified in SCARF2, SMAD1, and CACNA1 transcripts, whereas other motifs are present in noncoding RNAs. We provide evidence for a Z-RNA fold that promotes adaptive immunity through alternative splicing of KRAB domain zinc finger proteins. An analysis of OMIM and presumptive gnomAD loss-of-function datasets reveals an overlap of Z-flipons with disease-causing variants in 8.6% and 2.9% of Mendelian disease genes, respectively, greatly extending the range of phenotypes mapped to Z-flipons.
Affiliation(s)
- Alan Herbert
- Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
- InsideOutBio, Charlestown, MA, USA
- Dmitrii Konovalov
- Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
- Anna Danilova
- Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
- Nazar Beknazarov
- Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
- Aleksandr Fedorov
- Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
- Maria Poptsova
- Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
14
Hosseini M, Palmer A, Manka W, Grady PGS, Patchigolla V, Bi J, O'Neill RJ, Chi Z, Aguiar D. Deep statistical modelling of nanopore sequencing translocation times reveals latent non-B DNA structures. Bioinformatics 2023; 39:i242-i251. [PMID: 37387144 DOI: 10.1093/bioinformatics/btad220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION Non-canonical (or non-B) DNA are genomic regions whose three-dimensional conformation deviates from the canonical double helix. Non-B DNA play an important role in basic cellular processes and are associated with genomic instability, gene regulation, and oncogenesis. Experimental methods are low-throughput and can detect only a limited set of non-B DNA structures, while computational methods rely on non-B DNA base motifs, which are necessary but not sufficient indicators of non-B structures. Oxford Nanopore sequencing is an efficient and low-cost platform, but it is currently unknown whether nanopore reads can be used for identifying non-B structures. RESULTS We build the first computational pipeline to predict non-B DNA structures from nanopore sequencing. We formalize non-B detection as a novelty detection problem and develop the GoFAE-DND, an autoencoder that uses goodness-of-fit (GoF) tests as a regularizer. A discriminative loss encourages non-B DNA to be poorly reconstructed and optimizing Gaussian GoF tests allows for the computation of P-values that indicate non-B structures. Based on whole genome nanopore sequencing of NA12878, we show that there exist significant differences between the timing of DNA translocation for non-B DNA bases compared with B-DNA. We demonstrate the efficacy of our approach through comparisons with novelty detection methods using experimental data and data synthesized from a new translocation time simulator. Experimental validations suggest that reliable detection of non-B DNA from nanopore sequencing is achievable. AVAILABILITY AND IMPLEMENTATION Source code is available at https://github.com/bayesomicslab/ONT-nonb-GoFAE-DND.
Affiliation(s)
- Marjan Hosseini
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Aaron Palmer
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- William Manka
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Patrick G S Grady
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3003, United States
- Venkata Patchigolla
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Jinbo Bi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Rachel J O'Neill
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3003, United States
- Zhiyi Chi
- Department of Statistics, University of Connecticut, Storrs, CT 06269-4120, United States
- Derek Aguiar
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
15
Li G, Su G, Wang Y, Wang W, Shi J, Li D, Sui G. Integrative genomic analyses of promoter G-quadruplexes reveal their selective constraint and association with gene activation. Commun Biol 2023; 6:625. [PMID: 37301913 PMCID: PMC10257653 DOI: 10.1038/s42003-023-05015-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/25/2023] [Accepted: 06/05/2023] [Indexed: 06/12/2023]
Abstract
G-quadruplexes (G4s) regulate DNA replication and gene transcription, and are enriched in promoters without fully appreciated functional relevance. Here we show high selection pressure on putative G4 (pG4) forming sequences in promoters through investigating genetic and genomic data. Analyses of 76,156 whole-genome sequences reveal that G-tracts and connecting loops in promoter pG4s display lower or higher allele frequencies, respectively, than pG4-flanking regions, and central guanines (Gs) in G-tracts show higher selection pressure than other Gs. Additionally, pG4-promoters produce over 72.4% of transcripts, and promoter G4-containing genes are expressed at relatively high levels. Most genes repressed by TMPyP4, a G4-ligand, regulate epigenetic processes, and promoter G4s are enriched with gene activation histone marks, chromatin remodeler and transcription factor binding sites. Consistently, cis-expression quantitative trait loci (cis-eQTLs) are enriched in promoter pG4s and their G-tracts. Overall, our study demonstrates selective constraint of promoter G4s and reinforces their stimulative role in gene expression.
Affiliation(s)
- Guangyue Li
- College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Gongbo Su
- College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Yunxuan Wang
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, 150081, China
- Wenmeng Wang
- College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Jinming Shi
- College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Dangdang Li
- College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Guangchao Sui
- College of Life Science, Northeast Forestry University, Harbin, 150040, China.
16
Xu Q, Kowalski J. NBBC: a non-B DNA burden explorer in cancer. Nucleic Acids Res 2023:7177884. [PMID: 37224529 DOI: 10.1093/nar/gkad379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 04/16/2023] [Accepted: 05/12/2023] [Indexed: 05/26/2023]
Abstract
Alternative (non-B) DNA-forming structures, such as Z-DNA, G-quadruplexes and triplexes, have shown a potential role in cancer etiology. Non-B DNA-forming sequences have been found to stimulate genetic instability in human cancer genomes, implicating them in the development of cancer and other genetic diseases. While several non-B prediction tools and databases exist, they lack the ability to both analyze and visualize non-B data within a cancer context. Herein, we introduce NBBC, a non-B DNA burden explorer in cancer, which offers analyses and visualizations for non-B DNA-forming motifs. To do so, we introduce 'non-B burden' as a metric to summarize the prevalence of non-B DNA motifs at the gene, signature and genomic-site levels. Using our non-B burden metric, we developed two analysis modules within a cancer context to assist in exploring both gene- and motif-level non-B type heterogeneity among gene signatures. NBBC is designed to serve as a new analysis and visualization platform for the exploration of non-B DNA, guided by non-B burden as a novel marker.
Affiliation(s)
- Qi Xu
- Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Jeanne Kowalski
- Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
17
Moeckel C, Zaravinos A, Georgakopoulos-Soares I. Strand Asymmetries Across Genomic Processes. Comput Struct Biotechnol J 2023; 21:2036-2047. [PMID: 36968020 PMCID: PMC10030826 DOI: 10.1016/j.csbj.2023.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 03/08/2023] [Accepted: 03/08/2023] [Indexed: 03/12/2023]
Abstract
Across biological systems, a number of genomic processes, including transcription, replication, DNA repair, and transcription factor binding, display intrinsic directionalities. These directionalities are reflected in the asymmetric distribution of nucleotides, motifs, genes, transposon integration sites, and other functional elements across the two complementary strands. Strand asymmetries, including GC skews and mutational biases, have shaped the nucleotide composition of diverse organisms. The investigation of strand asymmetries often serves as a method to understand underlying biological mechanisms, including protein binding preferences, transcription factor interactions, retrotransposition, DNA damage and repair preferences, transcription-replication collisions, and mutagenesis mechanisms. Research into this subject also enables the identification of functional genomic sites, such as replication origins and transcription start sites. Improvements in our ability to detect and quantify DNA strand asymmetries will provide insights into diverse functionalities of the genome, the contribution of different mutational mechanisms in germline and somatic mutagenesis, and our knowledge of genome instability and evolution, which all have significant clinical implications in human disease, including cancer. In this review, we describe key developments that have been made across the field of genomic strand asymmetries, as well as the discovery of associated mechanisms.
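GC skew, one of the strand asymmetries named in this review, is conventionally computed per window as (G - C)/(G + C) on one strand, and its running sum (cumulative GC skew) is often plotted along a genome. The Python sketch below is illustrative and not taken from the review: the function names `gc_skew` and `cumulative_gc_skew`, the default of non-overlapping windows, and the choice to score windows containing neither G nor C as 0.0 are all assumptions.

```python
from itertools import accumulate
from typing import Optional


def gc_skew(seq: str, window: int, step: Optional[int] = None) -> list[float]:
    """Per-window GC skew, (G - C) / (G + C), along one strand of seq.

    Hypothetical helper: windows with no G or C are scored 0.0 by assumption.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window  # default: non-overlapping windows
    seq = seq.upper()
    skews = []
    # Only full-length windows are scored; a trailing partial window is ignored.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_gc_skew(seq: str, window: int, step: Optional[int] = None) -> list[float]:
    """Running sum of the per-window GC skew (hypothetical helper)."""
    return list(accumulate(gc_skew(seq, window, step)))
```

For example, `gc_skew("GGGCAAAT", 4)` returns `[0.5, 0.0]`. In many bacterial genomes the cumulative skew reaches its minimum near the replication origin and its maximum near the terminus, which is one way strand asymmetry is used to locate replication origins.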
Affiliation(s)
- Camille Moeckel
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Apostolos Zaravinos
- Department of Life Sciences, European University Cyprus, Diogenis Str., 6, Nicosia 2404, Cyprus
- Cancer Genetics, Genomics and Systems Biology laboratory, Basic and Translational Cancer Research Center (BTCRC), Nicosia 1516, Cyprus
- Corresponding author at: Department of Life Sciences, European University Cyprus, Diogenis Str., 6, Nicosia 2404, Cyprus.
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Corresponding author.
18
Martin-Trujillo A, Garg P, Patel N, Jadhav B, Sharp AJ. Genome-wide evaluation of the effect of short tandem repeat variation on local DNA methylation. Genome Res 2023; 33:184-196. [PMID: 36577521 PMCID: PMC10069470 DOI: 10.1101/gr.277057.122] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/23/2022] [Accepted: 12/19/2022] [Indexed: 12/30/2022]
Abstract
Short tandem repeats (STRs) contribute significantly to genetic diversity in humans, including disease-causing variation. Although the effect of STR variation on gene expression has been extensively assessed, their impact on epigenetics has been poorly studied and limited to specific genomic regions. Here, we investigated the hypothesis that some STRs act as independent regulators of local DNA methylation in the human genome and modify risk of common human traits. To address these questions, we first analyzed two independent data sets comprising PCR-free whole-genome sequencing (WGS) and genome-wide DNA methylation levels derived from whole-blood samples in 245 (discovery cohort) and 484 individuals (replication cohort). Using genotypes for 131,635 polymorphic STRs derived from WGS using HipSTR, we identified 11,870 STRs that were associated with DNA methylation levels (mSTRs) of 11,774 CpGs (Bonferroni P < 0.001) in our discovery cohort, with 90% successfully replicating in our second cohort. Subsequently, through fine-mapping using CAVIAR we defined 585 of these mSTRs as the likely causal variants underlying the observed associations (fm-mSTRs) and linked a fraction of these to previously reported genome-wide association study signals, providing insights into the mechanisms underlying complex human traits. Furthermore, by integrating gene expression data, we observed that 12.5% of the tested fm-mSTRs also modulate expression levels of nearby genes, reinforcing their regulatory potential. Overall, our findings expand the catalog of functional sequence variants that affect genome regulation, highlighting the importance of incorporating STRs in future genetic association analysis and epigenetics data for the interpretation of trait-associated variants.
Affiliation(s)
- Alejandro Martin-Trujillo
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
- Paras Garg
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
- Nihir Patel
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
- Bharati Jadhav
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
- Andrew J Sharp
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
19
Chang T, Li G, Ding Z, Li W, Zhu P, Lei W, Shangguan D. Potential G-quadruplexes within the Promoter Nuclease Hypersensitive Sites of the Heat-responsive Genes in Rice. Chembiochem 2022; 23:e202200405. [PMID: 36006168 DOI: 10.1002/cbic.202200405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 08/21/2022] [Indexed: 11/11/2022]
Abstract
G-quadruplexes (G4s) have been shown to be involved in the regulation of multiple cellular processes. Exploring putative G4-forming sequences (PQSs) in heat-responsive genes of rice and their folding structures under different conditions will help to understand the mechanism of the response to heat stress. In this work, we discovered a prevalence of PQSs in nuclease hypersensitive sites within the promoters of heat-responsive genes. Moreover, 50% of the searched G3 PQSs ((G3+L1-7)3+G3+) are located in heat shock transcription factor genes. Circular dichroism spectroscopy, thermal difference spectroscopy, and UV melting analysis demonstrated that the representative PQSs could adopt stable G4s at physiological temperature and potassium concentration. These PQSs were able to stall Klenow Fragment (KF) DNA polymerase by forming G4s. However, the G4s with Tm values around 50-60 °C could be increasingly unwound by KF as the temperature increased from 25 to 50 °C, implying that these G4s could sense changes in temperature through a structural switch. This work offers fresh clues for understanding the potential G4-involved functions of PQSs and the molecular events in plant responses to heat stress.
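The G3 PQS pattern quoted above, (G3+L1-7)3+G3+, is commonly read as at least four runs of three or more guanines separated by loops of 1 to 7 nucleotides. A minimal regular-expression sketch of that reading follows; it is not the authors' search code. The names `find_pqs`, `G4_PLUS` and `G4_MINUS`, the restriction of loop bases to A/C/G/T/N, and the reporting of non-overlapping matches only are assumptions. A C-rich match on the given strand is reported as a candidate G4 on the opposite strand.

```python
import re

# One reading of (G3+L1-7)3+G3+: at least three "G-run + loop" units followed by a
# final G-run, i.e. four or more runs of >= 3 Gs with 1-7 nt loops (ACGTN assumed).
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
# The same motif on the reverse strand appears as C-runs on the given sequence.
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")


def find_pqs(seq: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, strand) hits, 0-based and half-open.

    Hypothetical helper: re.finditer yields non-overlapping matches, so
    overlapping or nested PQSs collapse into a single hit per strand.
    """
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)
```

For example, `find_pqs("GGGTTAGGGTTAGGGTTAGGG")` returns `[(0, 21, '+')]`, while the same motif with its first loop stretched to eight bases yields no hit. Tools built for genome-wide PQS annotation may treat overlapping candidates differently from this non-overlapping sketch.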
Affiliation(s)
- Tianjun Chang
- Henan Polytechnic University, Institute of Environment and Resources, 2001 Shiji Avenue, 454003, Jiaozuo, CHINA
- Guangping Li
- Henan Polytechnic University, Institute of Resources and Environment, CHINA
- Zhan Ding
- Henan Polytechnic University, Institute of Resources and Environment, CHINA
- Weiguo Li
- Henan Polytechnic University, Institute of Resources and Environment, CHINA
- Panpan Zhu
- Henan Polytechnic University, Institute of Resources and Environment, CHINA
- Wei Lei
- Henan Polytechnic University, Institute of Resources and Environment, CHINA
- Dihua Shangguan
- Institute of Chemistry Chinese Academy of Sciences, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Analytical Chemistry for Living Biosystems, CAS Research/Education Center for Excellence in Molecular Sciences, CHINA
20
High-throughput techniques enable advances in the roles of DNA and RNA secondary structures in transcriptional and post-transcriptional gene regulation. Genome Biol 2022; 23:159. [PMID: 35851062 PMCID: PMC9290270 DOI: 10.1186/s13059-022-02727-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/09/2022] [Accepted: 07/07/2022] [Indexed: 12/27/2022]
Abstract
The most stable structure of DNA is the canonical right-handed double helix termed B DNA. However, certain environments and sequence motifs favor alternative conformations, termed non-canonical secondary structures. The roles of DNA and RNA secondary structures in transcriptional regulation remain incompletely understood. However, advances in high-throughput assays have enabled genome-wide characterization of some secondary structures. Here, we describe their regulatory functions in promoters and 3'UTRs, providing insights into key mechanisms through which they regulate gene expression. We discuss their implication in human disease, and how advances in molecular technologies and emerging high-throughput experimental methods could provide additional insights.
21
Georgakopoulos-Soares I, Parada GE, Hemberg M. Secondary structures in RNA synthesis, splicing and translation. Comput Struct Biotechnol J 2022; 20:2871-2884. [PMID: 35765654 PMCID: PMC9198270 DOI: 10.1016/j.csbj.2022.05.041] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/15/2022] [Revised: 05/19/2022] [Accepted: 05/21/2022] [Indexed: 11/30/2022]
Abstract
Even though the functional role of mRNA molecules is primarily encoded in the nucleotide sequence, several of their properties depend on secondary structure conformations. Examples of secondary structures include long-range interactions, hairpins, R-loops and G-quadruplexes, which are formed through interactions of non-adjacent nucleotides. Here, we discuss advances in our understanding of how secondary structures can impact RNA synthesis, splicing, translation and mRNA half-life. During RNA synthesis, secondary structures determine RNA polymerase II (RNAPII) speed, thereby influencing splicing. Splicing is also determined by RNA binding proteins, whose binding rates are modulated by secondary structures. For the initiation of translation, secondary structures can control the choice of translation start site. We highlight the mechanisms by which secondary structures modulate these processes, discuss advances in technologies to detect and study them systematically, and consider the roles of RNA secondary structures in disease.
Affiliation(s)
- Ilias Georgakopoulos-Soares
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Guillermo E. Parada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5A 1A8, Canada
- Martin Hemberg
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women’s Hospital, Boston, MA, USA
22
Georgakopoulos-Soares I, Parada GE, Wong HY, Medhi R, Furlan G, Munita R, Miska EA, Kwok CK, Hemberg M. Alternative splicing modulation by G-quadruplexes. Nat Commun 2022; 13:2404. [PMID: 35504902 PMCID: PMC9065059 DOI: 10.1038/s41467-022-30071-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 07/23/2019] [Accepted: 03/30/2022] [Indexed: 12/14/2022]
Abstract
Alternative splicing is central to metazoan gene regulation, but the regulatory mechanisms are incompletely understood. Here, we show that G-quadruplex (G4) motifs are enriched ~3-fold near splice junctions. The importance of G4s in RNA is emphasised by a higher enrichment for the non-template strand. RNA-seq data from mouse and human neurons reveal an enrichment of G4s at exons that were skipped following depolarisation induced by potassium chloride. We validate the formation of stable RNA G4s for three candidate splice sites by circular dichroism spectroscopy, UV-melting and fluorescence measurements. Moreover, we find that sQTLs are enriched at G4s, and a minigene experiment provides further support for their role in promoting exon inclusion. Analysis of >1,800 high-throughput experiments reveals multiple RNA binding proteins associated with G4s. Finally, exploration of G4 motifs across eleven species shows strong enrichment at splice sites in mammals and birds, suggesting an evolutionarily conserved splice regulatory mechanism. Here the authors show that G-quadruplexes, non-canonical DNA/RNA structures, can have a direct impact on alternative splicing and that binding of splicing regulators is affected by their presence.
Affiliation(s)
- Ilias Georgakopoulos-Soares
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK; Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, 94158, USA
- Guillermo E Parada
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK; Wellcome Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK; Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK; Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, M5A 1A8, Canada
- Hei Yuen Wong
- Department of Chemistry and State Key Laboratory of Marine Pollution, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Ragini Medhi
- Wellcome Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK; Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
- Giulia Furlan
- Wellcome Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK; Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
- Roberto Munita
- Division of Molecular Hematology, Department of Laboratory Medicine, Lund Stem Cell Center, Faculty of Medicine, Lund University, Lund, Sweden
- Eric A Miska
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK; Wellcome Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK; Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
- Chun Kit Kwok
- Department of Chemistry and State Key Laboratory of Marine Pollution, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China; Shenzhen Research Institute of City University of Hong Kong, Shenzhen, China
- Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK; Wellcome Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK; Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA, 02115, USA
23
Bhardwaj V, Yadav D, Dhankhar M, Saini K. A novel approach for identification of mirror repeats within the Engrailed Homeobox-1 gene of Xenopus tropicalis. BIOMEDICAL AND BIOTECHNOLOGY RESEARCH JOURNAL (BBRJ) 2022. [DOI: 10.4103/bbrj.bbrj_281_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]