1
|
Porubiaková O, Havlík J, Indu, Šedý M, Přepechalová V, Bartas M, Bidula S, Šťastný J, Fojta M, Brázda V. Variability of Inverted Repeats in All Available Genomes of Bacteria. Microbiol Spectr 2023; 11:e0164823. [PMID: 37358458 PMCID: PMC10434271 DOI: 10.1128/spectrum.01648-23] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2023] [Accepted: 06/03/2023] [Indexed: 06/27/2023] Open
Abstract
Noncanonical secondary structures in nucleic acids have been studied intensively in recent years. Important biological roles of cruciform structures formed by inverted repeats (IRs) have been demonstrated in diverse organisms, including humans. Using Palindrome analyser, we analyzed IRs in all accessible bacterial genome sequences to determine their frequencies, lengths, and localizations. IR sequences were identified in all species, but their frequencies differed significantly across various evolutionary groups. We detected 242,373,717 IRs in all 1,565 bacterial genomes. The highest mean IR frequency was detected in the Tenericutes (61.89 IRs/kbp) and the lowest mean frequency was found in the Alphaproteobacteria (27.08 IRs/kbp). IRs were abundant near genes and around regulatory, tRNA, transfer-messenger RNA (tmRNA), and rRNA regions, pointing to the importance of IRs in such basic cellular processes as genome maintenance, DNA replication, and transcription. Moreover, we found that organisms with high IR frequencies were more likely to be endosymbiotic, antibiotic producing, or pathogenic. On the other hand, those with low IR frequencies were far more likely to be thermophilic. This first comprehensive analysis of IRs in all available bacterial genomes demonstrates their genomic ubiquity, nonrandom distribution, and enrichment in genomic regulatory regions. IMPORTANCE Our manuscript reports for the first time a complete analysis of inverted repeats in all fully sequenced bacterial genomes. Thanks to the availability of unique computational resources, we were able to statistically evaluate the presence and localization of these important regulatory sequences in bacterial genomes. This work revealed a strong abundance of these sequences in regulatory regions and provides researchers with a valuable tool for their manipulation.
Collapse
Affiliation(s)
- Otília Porubiaková
- Institute of Biophysics of the Czech Academy of Sciences, Brno, Czech Republic
| | - Jan Havlík
- Mendel University in Brno, Brno, Czech Republic
| | - Indu
- Institute of Biophysics of the Czech Academy of Sciences, Brno, Czech Republic
- Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
| | - Michal Šedý
- Brno University of Technology, Faculty of Chemistry, Brno, Czech Republic
| | - Veronika Přepechalová
- Institute of Biophysics of the Czech Academy of Sciences, Brno, Czech Republic
- Brno University of Technology, Faculty of Chemistry, Brno, Czech Republic
| | - Martin Bartas
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
| | - Stefan Bidula
- School of Pharmacy, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
| | - Jiří Šťastný
- Mendel University in Brno, Brno, Czech Republic
- Brno University of Technology, Faculty of Mechanical Engineering, Brno, Czech Republic
| | - Miroslav Fojta
- Institute of Biophysics of the Czech Academy of Sciences, Brno, Czech Republic
| | - Václav Brázda
- Institute of Biophysics of the Czech Academy of Sciences, Brno, Czech Republic
- Brno University of Technology, Faculty of Chemistry, Brno, Czech Republic
| |
Collapse
|
2
|
Searching for New Z-DNA/Z-RNA Binding Proteins Based on Structural Similarity to Experimentally Validated Zα Domain. Int J Mol Sci 2022; 23:ijms23020768. [PMID: 35054954 PMCID: PMC8775963 DOI: 10.3390/ijms23020768] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Revised: 01/03/2022] [Accepted: 01/05/2022] [Indexed: 11/17/2022] Open
Abstract
Z-DNA and Z-RNA are functionally important left-handed structures of nucleic acids, which play a significant role in several molecular and biological processes including DNA replication, gene expression regulation and viral nucleic acid sensing. Most proteins that have been proven to interact with Z-DNA/Z-RNA contain the so-called Zα domain, which is structurally well conserved. To date, only eight proteins with Zα domain have been described within a few organisms (including human, mouse, Danio rerio, Trypanosoma brucei and some viruses). Therefore, this paper aimed to search for new Z-DNA/Z-RNA binding proteins in the complete PDB structures database and from the AlphaFold2 protein models. A structure-based similarity search found 14 proteins with highly similar Zα domain structure in experimentally-defined proteins and 185 proteins with a putative Zα domain using the AlphaFold2 models. Structure-based alignment and molecular docking confirmed high functional conservation of amino acids involved in Z-DNA/Z-RNA, suggesting that Z-DNA/Z-RNA recognition may play an important role in a variety of cellular processes.
Collapse
|
3
|
Dettori LG, Torrejon D, Chakraborty A, Dutta A, Mohamed M, Papp C, Kuznetsov VA, Sung P, Feng W, Bah A. A Tale of Loops and Tails: The Role of Intrinsically Disordered Protein Regions in R-Loop Recognition and Phase Separation. Front Mol Biosci 2021; 8:691694. [PMID: 34179096 PMCID: PMC8222781 DOI: 10.3389/fmolb.2021.691694] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2021] [Accepted: 05/14/2021] [Indexed: 11/13/2022] Open
Abstract
R-loops are non-canonical, three-stranded nucleic acid structures composed of a DNA:RNA hybrid, a displaced single-stranded (ss)DNA, and a trailing ssRNA overhang. R-loops perform critical biological functions under both normal and disease conditions. To elucidate their cellular functions, we need to understand the mechanisms underlying R-loop formation, recognition, signaling, and resolution. Previous high-throughput screens identified multiple proteins that bind R-loops, with many of these proteins containing folded nucleic acid processing and binding domains that prevent (e.g., topoisomerases), resolve (e.g., helicases, nucleases), or recognize (e.g., KH, RRMs) R-loops. However, a significant number of these R-loop interacting Enzyme and Reader proteins also contain long stretches of intrinsically disordered regions (IDRs). The precise molecular and structural mechanisms by which the folded domains and IDRs synergize to recognize and process R-loops or modulate R-loop-mediated signaling have not been fully explored. While studying one such modular R-loop Reader, the Fragile X Protein (FMRP), we unexpectedly discovered that the C-terminal IDR (C-IDR) of FMRP is the predominant R-loop binding site, with the three N-terminal KH domains recognizing the trailing ssRNA overhang. Interestingly, the C-IDR of FMRP has recently been shown to undergo spontaneous Liquid-Liquid Phase Separation (LLPS) assembly by itself or in complex with another non-canonical nucleic acid structure, RNA G-quadruplex. Furthermore, we have recently shown that FMRP can suppress persistent R-loops that form during transcription, a process that is also enhanced by LLPS via the assembly of membraneless transcription factories. These exciting findings prompted us to explore the role of IDRs in R-loop processing and signaling proteins through a comprehensive bioinformatics and computational biology study. Here, we evaluated IDR prevalence, sequence composition and LLPS propensity for the known R-loop interactome. We observed that, like FMRP, the majority of the R-loop interactome, especially Readers, contains long IDRs that are highly enriched in low complexity sequences with biased amino acid composition, suggesting that these IDRs could directly interact with R-loops, rather than being “mere flexible linkers” connecting the “functional folded enzyme or binding domains”. Furthermore, our analysis shows that several proteins in the R-loop interactome are either predicted to or have been experimentally demonstrated to undergo LLPS or are known to be associated with phase separated membraneless organelles. Thus, our overall results present a thought-provoking hypothesis that IDRs in the R-loop interactome can provide a functional link between R-loop recognition via direct binding and downstream signaling through the assembly of LLPS-mediated membrane-less R-loop foci. The absence or dysregulation of the function of IDR-enriched R-loop interactors can potentially lead to severe genomic defects, such as the widespread R-loop-mediated DNA double strand breaks that we recently observed in Fragile X patient-derived cells.
Collapse
Affiliation(s)
- Leonardo G Dettori
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
| | - Diego Torrejon
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
| | - Arijita Chakraborty
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
| | - Arijit Dutta
- Department of Biochemistry and Structural Biology, University of Texas Health San Antonio, San Antonio, TX, United States
| | - Mohamed Mohamed
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
| | - Csaba Papp
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States.,Department of Urology, SUNY Upstate Medical University, Syracuse, NY, United States
| | - Vladimir A Kuznetsov
- Department of Urology, SUNY Upstate Medical University, Syracuse, NY, United States.,Bioinformatics Institute, ASTAR Biomedical Institutes, Singapore, Singapore
| | - Patrick Sung
- Department of Biochemistry and Structural Biology, University of Texas Health San Antonio, San Antonio, TX, United States
| | - Wenyi Feng
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
| | - Alaji Bah
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
| |
Collapse
|
4
|
Bartas M, Červeň J, Guziurová S, Slychko K, Pečinka P. Amino Acid Composition in Various Types of Nucleic Acid-Binding Proteins. Int J Mol Sci 2021; 22:ijms22020922. [PMID: 33477647 PMCID: PMC7831508 DOI: 10.3390/ijms22020922] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2020] [Revised: 01/15/2021] [Accepted: 01/16/2021] [Indexed: 12/20/2022] Open
Abstract
Nucleic acid-binding proteins are traditionally divided into two categories: With the ability to bind DNA or RNA. In the light of new knowledge, such categorizing should be overcome because a large proportion of proteins can bind both DNA and RNA. Another even more important features of nucleic acid-binding proteins are so-called sequence or structure specificities. Proteins able to bind nucleic acids in a sequence-specific manner usually contain one or more of the well-defined structural motifs (zinc-fingers, leucine zipper, helix-turn-helix, or helix-loop-helix). In contrast, many proteins do not recognize nucleic acid sequence but rather local DNA or RNA structures (G-quadruplexes, i-motifs, triplexes, cruciforms, left-handed DNA/RNA form, and others). Finally, there are also proteins recognizing both sequence and local structural properties of nucleic acids (e.g., famous tumor suppressor p53). In this mini-review, we aim to summarize current knowledge about the amino acid composition of various types of nucleic acid-binding proteins with a special focus on significant enrichment and/or depletion in each category.
Collapse
|
5
|
Brázda V, Červeň J, Bartas M, Mikysková N, Coufal J, Pečinka P. The Amino Acid Composition of Quadruplex Binding Proteins Reveals a Shared Motif and Predicts New Potential Quadruplex Interactors. Molecules 2018; 23:E2341. [PMID: 30216987 PMCID: PMC6225207 DOI: 10.3390/molecules23092341] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2018] [Revised: 09/09/2018] [Accepted: 09/12/2018] [Indexed: 12/27/2022] Open
Abstract
The importance of local DNA structures in the regulation of basic cellular processes is an emerging field of research. Amongst local non-B DNA structures, G-quadruplexes are perhaps the most well-characterized to date, and their presence has been demonstrated in many genomes, including that of humans. G-quadruplexes are selectively bound by many regulatory proteins. In this paper, we have analyzed the amino acid composition of all seventy-seven described G-quadruplex binding proteins of Homo sapiens. Our comparison with amino acid frequencies in all human proteins and specific protein subsets (e.g., all nucleic acid binding) revealed unique features of quadruplex binding proteins, with prominent enrichment for glycine (G) and arginine (R). Cluster analysis with bootstrap resampling shows similarities and differences in amino acid composition of particular quadruplex binding proteins. Interestingly, we found that all characterized G-quadruplex binding proteins share a 20 amino acid long motif/domain (RGRGR GRGGG SGGSG GRGRG) which is similar to the previously described RG-rich domain (RRGDG RRRGG GGRGQ GGRGR GGGFKG) of the FRM1 G-quadruplex binding protein. Based on this protein fingerprint, we have predicted a new set of potential G-quadruplex binding proteins sharing this interesting domain rich in glycine and arginine residues.
Collapse
Affiliation(s)
- Václav Brázda
- Institute of Biophysics, Academy of Sciences of the Czech Republic v.v.i., Královopolská 135, 612 65 Brno, Czech Republic.
| | - Jiří Červeň
- Department of Biology and Ecology/Institute of Environmental Technologies, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic.
| | - Martin Bartas
- Department of Biology and Ecology/Institute of Environmental Technologies, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic.
| | - Nikol Mikysková
- Department of Biology and Ecology/Institute of Environmental Technologies, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic.
| | - Jan Coufal
- Institute of Biophysics, Academy of Sciences of the Czech Republic v.v.i., Královopolská 135, 612 65 Brno, Czech Republic.
| | - Petr Pečinka
- Department of Biology and Ecology/Institute of Environmental Technologies, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic.
| |
Collapse
|