1. Sievers A, Sauer L, Bisch M, Sprengel J, Hausmann M, Hildenbrand G. Moderation of Structural DNA Properties by Coupled Dinucleotide Contents in Eukaryotes. Genes (Basel) 2023; 14:755. [PMID: 36981025 PMCID: PMC10048725 DOI: 10.3390/genes14030755] [Received: 02/23/2023] [Revised: 03/16/2023] [Accepted: 03/18/2023]
Abstract
Dinucleotides are known determinants of various structural and physicochemical properties of DNA and of the binding affinities of proteins to DNA. These properties (e.g., stiffness) and bound proteins (e.g., transcription factors) are known to influence important biological functions, such as transcription regulation and 3D chromatin organization. This raises the question of how eukaryotic chromosomes, despite considerable variation in their dinucleotide contents, could still provide consistent DNA properties resulting in similar functions and 3D conformations. In this work, we investigate the hypothesis that coupled dinucleotide contents influence DNA properties in opposite directions and thereby moderate each other's influences. Analyzing all 2478 chromosomes of 155 eukaryotic species, and considering bias from coding sequences and enhancers, we found sets of correlated and anti-correlated dinucleotide contents. Using computational models, we estimated the changes in DNA properties resulting from this coupling. We found that pure A/T dinucleotides (AA, TT, AT, TA), which are known to influence histone positioning, and AC/GT contents are especially relevant moderators, and that, for example, the Roll property, which is known to influence the histone affinity of DNA, is preferentially moderated. We conclude that dinucleotide contents might indirectly influence transcription and chromatin 3D conformation via regulation of histone occupancy and/or other mechanisms.
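The coupling analysis rests on comparing dinucleotide contents across chromosomes. A minimal sketch of that idea follows, assuming overlapping dinucleotide counts on a single strand and a plain Pearson correlation. It does not reproduce the paper's correction for coding-sequence and enhancer bias or its DNA-property models, and the function names and toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt

# All 16 dinucleotides in lexicographic order (AA, AC, ..., TT).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_contents(seq: str) -> dict[str, float]:
    """Relative frequency of each dinucleotide over overlapping positions."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other ambiguity codes
            counts[pair] += 1
            total += 1
    return {d: (c / total if total else 0.0) for d, c in counts.items()}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / sqrt(vx * vy)


def coupling(chromosomes: list[str], d1: str, d2: str) -> float:
    """Correlation of two dinucleotide contents across a set of chromosomes.

    Positive values suggest the contents co-vary; negative values suggest
    they are anti-correlated.
    """
    contents = [dinucleotide_contents(c) for c in chromosomes]
    return pearson([c[d1] for c in contents], [c[d2] for c in contents])


# Hypothetical toy "chromosomes"; real input would be whole chromosome sequences.
toy = ["AATT", "AAATTT", "GCGC"]
print(round(coupling(toy, "AA", "TT"), 6))  # prints 1.0
```

A correlation computed this way only identifies coupled contents; estimating how the coupling shifts a property such as Roll would additionally require a dinucleotide-based structural model.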
Affiliation(s)
- Aaron Sievers
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Institute for Human Genetics, University Hospital Heidelberg, INF 366, 69117 Heidelberg, Germany
- Liane Sauer
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Institute for Human Genetics, University Hospital Heidelberg, INF 366, 69117 Heidelberg, Germany
- Marc Bisch
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Jan Sprengel
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Michael Hausmann
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Georg Hildenbrand
- Kirchhoff Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany
- Faculty of Engineering, University of Applied Sciences Aschaffenburg, Würzburger Str. 45, 63743 Aschaffenburg, Germany
2. Li Y, Kong F, Cui H, Wang F, Li C, Ma J. SENIES: DNA Shape Enhanced Two-Layer Deep Learning Predictor for the Identification of Enhancers and Their Strength. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:637-645. [PMID: 35015646 DOI: 10.1109/tcbb.2022.3142019]
Abstract
Identifying enhancers is a critical task in bioinformatics because of their primary role in regulating gene expression. For this reason, various computational algorithms for enhancer identification have been proposed over the years, extracting ever more features from DNA sequence alone to boost performance. However, DNA structural information, an essential factor affecting the binding preferences of transcription factors to regulatory elements such as enhancers, has been neglected. Here, we propose SENIES, a DNA shape enhanced deep learning predictor, to identify enhancers and their strength. The predictor consists of two layers: the first distinguishes enhancers from non-enhancers, and the second predicts the strength of enhancers. Apart from two common sequence-derived features (i.e., one-hot and k-mer), DNA shape is introduced to describe the 3D structure of DNA sequences. Performance comparisons with state-of-the-art methods on public datasets demonstrate the effectiveness and robustness of our predictor. The code implementation of SENIES is publicly available at https://github.com/hlju-liye/SENIES.
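SENIES combines one-hot and k-mer encodings with DNA shape features. The two sequence-derived encodings can be sketched as follows. This is a minimal illustration, not the SENIES code: the default k, the A/C/G/T column order and the normalization to relative frequencies are assumptions, and DNA shape features (which need a shape-prediction model or lookup table) are not shown.

```python
from itertools import product

BASES = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA sequence as L rows of 4 indicators (columns A, C, G, T)."""
    rows = []
    for base in seq.upper():
        # Ambiguous bases such as N become an all-zero row.
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Relative frequencies of all 4**k k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    position = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = position.get(seq[i:i + k])
        if j is not None:  # skip windows containing ambiguous bases
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


print(one_hot("ACGN"))  # prints [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
print(len(kmer_frequencies("ACGTACGT", k=3)))  # prints 64
```

The one-hot matrix keeps positional information, which suits convolutional layers, while the k-mer vector summarizes composition in a fixed length regardless of sequence length.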
3. Wang J, Du PF, Xue XY, Li GP, Zhou YK, Zhao W, Lin H, Chen W. VisFeature: a stand-alone program for visualizing and analyzing statistical features of biological sequences. Bioinformatics 2020; 36:1277-1278. [PMID: 31504195 DOI: 10.1093/bioinformatics/btz689] [Received: 07/05/2019] [Revised: 08/12/2019] [Accepted: 08/30/2019]
Abstract
SUMMARY Many efforts have been made to develop bioinformatics algorithms that predict functional attributes of genes and proteins from their primary sequences. One challenge in this process is to analyze and understand intuitively the statistical features that have been selected by heuristic or iterative methods. In this paper, we developed VisFeature, a software tool that allows users to visualize and analyze intuitively the statistical features of all types of biological sequences, including DNA, RNA and proteins. VisFeature also integrates sequence data retrieval, multiple sequence alignment and statistical feature generation functions. AVAILABILITY AND IMPLEMENTATION VisFeature is a desktop application implemented using JavaScript/Electron and R. The source code of VisFeature is freely accessible from the GitHub repository (https://github.com/wangjun1996/VisFeature). The binary release, which includes an example dataset, can be freely downloaded from the same GitHub repository (https://github.com/wangjun1996/VisFeature/releases). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Wang
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Xin-Yu Xue
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Guang-Ping Li
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Yuan-Ke Zhou
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Wei Zhao
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China
4. Daga A, Ansari A, Pandya M, Shah K, Patel S, Rawal R, Umrania V. Significant Role of Segmental Duplications and SIDD Sites in Chromosomal Translocations of Hematological Malignancies: A Multi-parametric Bioinformatic Analysis. Interdiscip Sci 2016; 10:467-475. [PMID: 27896663 DOI: 10.1007/s12539-016-0203-6] [Received: 05/27/2016] [Revised: 11/12/2016] [Accepted: 11/14/2016]
Abstract
Recurrent non-random chromosomal translocations are hallmarks of leukemogenesis; however, the molecular mechanisms underlying these rearrangements are poorly explored. The fundamental question is why and how chromosomes break and rejoin so precisely in the genome. A detailed understanding of the mechanisms leading to chromosomal rearrangement can be achieved by characterizing breakpoints. To address this question, a novel multi-parametric computational approach was applied to characterize major leukemic translocations within and around their breakpoint regions. To the best of our knowledge, this bioinformatic analysis is unique in finding segmental duplications (SDs) flanking the breakpoints of all major leukemic translocations. Breakpoint islands (BpIs) were analyzed for stress-induced duplex destabilization (SIDD) sites along with other complex genomic architecture and physicochemical properties. Our study emphasizes the probable correlative role of SDs, SIDD sites and various genomic features in the occurrence of breakpoints. It also highlights potential features that may play a crucial role in causing double-strand breaks leading to translocation.
Affiliation(s)
- Aditi Daga
- Department of Microbiology, MVM Science College, Saurashtra University, Near Under Bridge, Kalawad Road, Rajkot, Gujarat, 360007, India
- Afzal Ansari
- BIT Virtual Institute of Bioinformatics (GCRI Node), GSBTM, Gandhinagar, Gujarat, India
- BIT Virtual Institute of Bioinformatics (GCRI Node), Division of Medicinal Chemistry and Pharmacogenomics, The Gujarat Cancer and Research Institute, NCH Campus, Asarwa, Ahmedabad, Gujarat, 380016, India
- Medha Pandya
- Department of Bioinformatics, Maharaja Krishnakumarsinhji Bhavnagar University, Bhavnagar, Gujarat, 364022, India
- Department of Physics, Maharaja Krishnakumarsinhji Bhavnagar University, Bhavnagar, Gujarat, 364022, India
- Krupa Shah
- Division of Medicinal Chemistry and Pharmacogenomics, Department of Cancer Biology, The Gujarat Cancer and Research Institute, NCH Campus, Asarwa, Ahmedabad, Gujarat, 380016, India
- Shanaya Patel
- Division of Medicinal Chemistry and Pharmacogenomics, Department of Cancer Biology, The Gujarat Cancer and Research Institute, NCH Campus, Asarwa, Ahmedabad, Gujarat, 380016, India
- Rakesh Rawal
- Division of Medicinal Chemistry and Pharmacogenomics, Department of Cancer Biology, The Gujarat Cancer and Research Institute, NCH Campus, Asarwa, Ahmedabad, Gujarat, 380016, India
- Valentina Umrania
- Department of Microbiology, MVM Science College, Saurashtra University, Near Under Bridge, Kalawad Road, Rajkot, Gujarat, 360007, India
5. Daga A, Ansari A, Rawal R, Umrania V. Characterization of chromosomal translocation breakpoint sequences in solid tumours: "an in silico analysis". Open Med Inform J 2015; 9:1-8. [PMID: 25972994 PMCID: PMC4421838 DOI: 10.2174/1874431101509010001] [Received: 12/15/2014] [Revised: 02/19/2015] [Accepted: 02/28/2015]
Abstract
Chromosomal translocations that result in the formation and activation of fusion oncogenes have been observed in numerous solid malignancies for many years. Expression of fusion kinases in these cancers drives the initiation and progression that ultimately lead to tumour development, which makes these translocations clinically important for the diagnosis and treatment of cancer. Nonetheless, the molecular mechanisms underlying these translocations remain unexplored, which limits our knowledge of carcinogenesis and makes this a field where further research is required. The issue of prime focus is the precision with which chromosomes break and rejoin within the genome. Characterizing the genomic sequences located at breakpoint regions may lead to a thorough understanding of the mechanisms behind chromosomal rearrangement. A unique computational multi-parametric analysis was performed to characterize the genomic sequence within and around breakpoint regions. The study is novel in that it reveals segmental duplications (SDs) flanking the breakpoints of all translocations. Breakpoint islands were also investigated for the presence of other intricate genomic architecture and various physicochemical parameters. Our study particularly highlights the probable role of SDs and specific genomic features in precise chromosomal breakage. Additionally, it pinpoints potential features that may be significant for double-strand breaks leading to chromosomal rearrangements.
Affiliation(s)
- Aditi Daga
- Department of Microbiology, MVM Science College, Saurashtra University, Rajkot, Gujarat, India
- Afzal Ansari
- BIT Virtual Institute of Bioinformatics (GCRI Node), GSBTM, Gandhinagar, Gujarat, India
- Rakesh Rawal
- Department of Cancer Biology, The Gujarat Cancer & Research Institute, Ahmedabad, Gujarat, India
- Valentina Umrania
- Department of Microbiology, MVM Science College, Saurashtra University, Rajkot, Gujarat, India
6. Gupta Y, Witte M, Möller S, Ludwig RJ, Restle T, Zillikens D, Ibrahim SM. ptRNApred: computational identification and classification of post-transcriptional RNA. Nucleic Acids Res 2014; 42:e167. [PMID: 25303994 PMCID: PMC4267668 DOI: 10.1093/nar/gku918]
Abstract
Non-coding RNAs (ncRNAs) are known to play important functional roles in the cell. However, their identification and recognition in genomic sequences remain challenging. In silico methods, such as classification tools, offer a fast and reliable way of screening for them, and multiple classifiers have already been developed to predict well-defined subfamilies of RNA. So far, however, only tRNA, miRNA and snoRNA among all ncRNAs can be predicted with satisfactory sensitivity and specificity. We present ptRNApred, a tool to detect and classify subclasses of non-coding RNA that are involved in the regulation of post-transcriptional modifications or DNA replication, which we call post-transcriptional RNA (ptRNA). It (i) detects RNA sequences coding for post-transcriptional RNA from the genomic sequence with an overall sensitivity of 91% and a specificity of 94% and (ii) predicts the ptRNA subclasses that exist in eukaryotes: snRNA, snoRNA, RNase P, RNase MRP, Y RNA or telomerase RNA. AVAILABILITY The ptRNApred software is open for public use at http://www.ptrnapred.org/.
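The reported sensitivity and specificity follow the standard confusion-matrix definitions. The sketch below states them explicitly; the counts are hypothetical, chosen only so that the two rates match the percentages quoted in the abstract, and are not data from the ptRNApred study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: share of real ptRNA sequences that are detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: share of non-ptRNA sequences correctly rejected."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts (not from the paper).
print(sensitivity(tp=91, fn=9), specificity(tn=94, fp=6))  # prints 0.91 0.94
```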
Affiliation(s)
- Yask Gupta
- Department of Dermatology, University of Lübeck, 23538 Lübeck, Germany
- Mareike Witte
- Department of Dermatology, University of Lübeck, 23538 Lübeck, Germany
- Steffen Möller
- Department of Dermatology, University of Lübeck, 23538 Lübeck, Germany
- Ralf J Ludwig
- Department of Dermatology, University of Lübeck, 23538 Lübeck, Germany
- Tobias Restle
- Institute for Molecular Medicine, University of Lübeck, 23538 Lübeck, Germany
- Detlef Zillikens
- Department of Dermatology, University of Lübeck, 23538 Lübeck, Germany
- Saleh M Ibrahim
- Department of Dermatology, University of Lübeck, 23538 Lübeck, Germany
7. Martínez-Guitarte JL, de la Fuente M, Morcillo G. Telomeric transcriptome from Chironomus riparius (Diptera), a species with noncanonical telomeres. Insect Mol Biol 2014; 23:367-380. [PMID: 24580894 DOI: 10.1111/imb.12087]
Abstract
Although alternative telomere structures exist, most telomeres contain DNA arrays of short repeats (6-26 bp) maintained by telomerase. Like other Diptera, Chironomus riparius has noncanonical telomeres, and three subfamilies of longer sequences (176 bp), TsA, TsB and TsC, are found at its chromosome ends. Reverse transcription PCR was used to show that different RNAs are transcribed from these sequences. Only one strand of the TsA sequences seems to give rise to a noncoding RNA (named CriTER-A); transcripts from both TsB strands were found (CriTER-B and αCriTER-B), but no TsC transcripts were detected. Interestingly, these sequences showed a differential transcriptional response upon heat shock, and they were also differentially affected by inhibitors of RNA polymerase II and RNA polymerase III. A computer search for transcription factor binding sites revealed putative regulatory cis-elements within the transcribed sequence, reinforcing the experimental evidence suggesting that the telomeric repeat might function as a promoter. This work describes the telomeric transcriptome of an insect with non-telomerase telomeres, confirming the evolutionary conservation of telomere transcription. Our data reveal differences in the regulation of telomeric transcripts between control and stressful environmental conditions, supporting the idea that telomeric RNAs could have a relevant role in cellular metabolism in insect cells.
Affiliation(s)
- J L Martínez-Guitarte
- Grupo de Biología y Toxicología Ambiental, Facultad de Ciencias, Universidad Nacional de Educación a Distancia, UNED, Madrid, Spain
8. Muiño JM, Smaczniak C, Angenent GC, Kaufmann K, van Dijk ADJ. Structural determinants of DNA recognition by plant MADS-domain transcription factors. Nucleic Acids Res 2013; 42:2138-46. [PMID: 24275492 PMCID: PMC3936718 DOI: 10.1093/nar/gkt1172]
Abstract
Plant MADS-domain transcription factors act as key regulators of many developmental processes. Despite the wealth of information that exists about these factors, the mechanisms by which they recognize their cognate DNA-binding site, called the CArG-box (consensus CC(W)6GG, where W is A or T), and how different MADS-domain proteins achieve DNA-binding specificity, are still largely unknown. We used information from in vivo ChIP-seq experiments, in vitro DNA-binding data and evolutionary conservation to address these important questions. We found that structural characteristics of the DNA play an important role in the DNA binding of plant MADS-domain proteins. The central region of the CArG-box largely resembles a structural motif called ‘A-tract’, which is characterized by a narrow minor groove and may assist bending of the DNA by MADS-domain proteins. Periodically spaced A-tracts outside the CArG-box suggest additional roles for this structure in the process of DNA binding of these transcription factors. Structural characteristics of the CArG-box not only play an important role in DNA-binding site recognition of MADS-domain proteins, but also partly explain differences in DNA-binding specificity of different members of this transcription factor family and their heteromeric complexes.
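The CArG-box consensus CC(W)6GG translates directly into a regular expression. The sketch below scans for exact consensus matches only; real MADS-domain binding sites often deviate from the consensus, and the A-tract and DNA-shape features described in the abstract are not modeled. Because the reverse complement of CC(W)6GG is again CC(W)6GG, scanning one strand also finds sites on the other. The names `CARG_BOX` and `find_carg_boxes` are hypothetical.

```python
import re

# CArG-box consensus: CC, six A/T (W) bases, GG. The zero-width lookahead
# lets finditer report overlapping candidate sites.
CARG_BOX = re.compile(r"(?=(CC[AT]{6}GG))")


def find_carg_boxes(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, site) for every exact consensus match."""
    return [(m.start(), m.group(1)) for m in CARG_BOX.finditer(seq.upper())]


print(find_carg_boxes("ttCCATATTTGGaa"))  # prints [(2, 'CCATATTTGG')]
```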
Affiliation(s)
- Jose M Muiño
- Bioscience, Plant Research International, Wageningen, PO Box 619, 6700 AP, The Netherlands
- Laboratory of Bioinformatics, Wageningen University, PO Box 569, 6700 AN Wageningen, The Netherlands
- Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin D-14195, Germany
- Laboratory of Molecular Biology, Wageningen University, Wageningen, PO Box 633, 6700 AP, The Netherlands
- Biometris, Wageningen University and Research Centre, Wageningen, PO Box 100, 6700 AC, The Netherlands
9. Hühne R, Thalheim T, Sühnel J. AgeFactDB--the JenAge Ageing Factor Database--towards data integration in ageing research. Nucleic Acids Res 2013; 42:D892-6. [PMID: 24217911 PMCID: PMC3964983 DOI: 10.1093/nar/gkt1073]
Abstract
AgeFactDB (http://agefactdb.jenage.de) is a database aimed at the collection and integration of ageing phenotype data, including lifespan information. Ageing factors are considered to be genes, chemical compounds or other factors, such as dietary restriction, whose action results in a changed lifespan or another ageing phenotype. Any information related to the effects of ageing factors is called an observation and is presented on observation pages. To provide concise access to the complete information for a particular ageing factor, corresponding observations are also summarized on ageing factor pages. In a first step, ageing-related data were primarily taken from existing databases such as the Ageing Gene Database (GenAge), the Lifespan Observations Database and the Dietary Restriction Gene Database (GenDR). In addition, we have started to include new ageing-related information. Based on homology data taken from the HomoloGene Database, AgeFactDB also provides observation and ageing factor pages for genes that are homologous to known ageing-related genes. These homologues are considered candidate or putative ageing-related genes. AgeFactDB offers a variety of search and browse options and also allows the download of ageing factor or observation lists in TSV, CSV and XML formats.
Affiliation(s)
- Rolf Hühne
- Biocomputing Group, Leibniz Institute for Age Research - Fritz Lipmann Institute, Jena Centre for Systems Biology of Ageing - JenAge, Beutenbergstrasse 11, Jena, Germany
10. Porcelli I, Reuter M, Pearson BM, Wilhelm T, van Vliet AHM. Parallel evolution of genome structure and transcriptional landscape in the Epsilonproteobacteria. BMC Genomics 2013; 14:616. [PMID: 24028687 PMCID: PMC3847290 DOI: 10.1186/1471-2164-14-616] [Received: 06/04/2013] [Accepted: 09/03/2013]
Abstract
Background Gene reshuffling, point mutations and horizontal gene transfer contribute to bacterial genome variation, but require the genome to rewire its transcriptional circuitry to ensure that inserted, mutated or reshuffled genes are transcribed at appropriate levels. The genomes of Epsilonproteobacteria display very low synteny, due to high levels of reshuffling and reorganisation of gene order, but still share a significant number of gene orthologs, allowing comparison. Here we present the primary transcriptome of the pathogenic Epsilonproteobacterium Campylobacter jejuni and use it for comparative and predictive transcriptomics in the Epsilonproteobacteria. Results Differential RNA-sequencing using 454 sequencing technology was used to determine the primary transcriptome of C. jejuni NCTC 11168, which consists of 992 transcription start sites (TSS), including 29 putative non-coding and stable RNAs, 266 intragenic (internal) TSS and 206 antisense TSS. Several previously unknown features were identified in the C. jejuni transcriptional landscape, such as leaderless mRNAs and potential leader peptides upstream of amino acid biosynthesis genes. A cross-species comparison of the primary transcriptomes of C. jejuni and the related Epsilonproteobacterium Helicobacter pylori highlighted a lack of conservation of operon organisation, of the position of intragenic and antisense promoters, and of leaderless mRNAs. Predictive comparisons using 40 other Epsilonproteobacterial genomes suggest that this lack of conservation of transcriptional features is common to all Epsilonproteobacterial genomes and is associated with the absence of genome synteny in this subdivision of the Proteobacteria. Conclusions Both the genomes and transcriptomes of Epsilonproteobacteria are highly variable, at the genome level through the combination and division of multicistronic operons, and at the gene level through the generation or deletion of promoter sequences and 5′ untranslated regions. Regulatory features may have evolved after these species split from a common ancestor, with transcriptome rewiring compensating for changes introduced by genomic reshuffling and horizontal gene transfer.
Affiliation(s)
- Ida Porcelli
- Gut Health and Food Safety Programme, Institute of Food Research, Colney Lane, Norwich, NR4 7UA, UK.
11. Maruyama H, Harwood JC, Moore KM, Paszkiewicz K, Durley SC, Fukushima H, Atomi H, Takeyasu K, Kent NA. An alternative beads-on-a-string chromatin architecture in Thermococcus kodakarensis. EMBO Rep 2013; 14:711-7. [PMID: 23835508 DOI: 10.1038/embor.2013.94] [Received: 04/25/2013] [Revised: 06/05/2013] [Accepted: 06/12/2013]
Abstract
We have applied chromatin sequencing technology to the euryarchaeon Thermococcus kodakarensis, which is known to possess histone-like proteins. We detect positioned chromatin particles of variable sizes associated with lengths of DNA differing in multiples of 30 bp (ranging from 30 bp to >450 bp), consistent with formation from dynamic polymers of the archaeal histone dimer. T. kodakarensis chromatin particles have a distinctive underlying DNA sequence, suggesting a genomic particle-positioning code, and are excluded from gene-regulatory DNA, suggesting a functional organization. Beads-on-a-string chromatin is therefore conserved between eukaryotes and archaea but can derive from deployment of histone-fold proteins in a variety of multimeric forms.
Affiliation(s)
- Hugo Maruyama
- Department of Bacteriology, Osaka Dental University, Osaka 573-1121, Japan
12. Oberlin AT, Jurkovic DA, Balish MF, Friedberg I. Biological database of images and genomes: tools for community annotations linking image and genomic information. Database (Oxford) 2013; 2013:bat016. [PMID: 23550062 PMCID: PMC3708683 DOI: 10.1093/database/bat016]
Abstract
Genomic data and biomedical imaging data are undergoing exponential growth. However, our understanding of the phenotype-genotype connection linking the two types of data lags behind. While many software packages enable the manipulation and analysis of image data and genomic data as separate entities, there is no established framework for linking the two. We present BioDIG, a generic set of software tools that links image data to genomic data. BioDIG tools can be applied to a wide range of research problems that require linking images to genomes. BioDIG features the following: rapid construction of web-based workbenches, community-based annotation, user management and web services. By using BioDIG to create websites, researchers and curators can rapidly annotate a large number of images with genomic information. The BioDIG software tools include an image module, a genome module and a user management module. We also introduce a BioDIG-based website, MyDIG, which is being used to annotate images of mycoplasmas.
Affiliation(s)
- Andrew T Oberlin
- Department of Computer Science and Software Engineering, Miami University, Oxford, OH 45056, USA
13. Megremis S, Demetriou P, Makrinioti H, Manoussaki AE, Papadopoulos NG. The genomic signature of human rhinoviruses A, B and C. PLoS One 2012; 7:e44557. [PMID: 23028561 PMCID: PMC3441561 DOI: 10.1371/journal.pone.0044557] [Received: 03/09/2012] [Accepted: 08/07/2012]
Abstract
Human rhinoviruses are single-stranded positive-sense RNA viruses that are present in more than 50% of acute upper respiratory tract infections. Despite extensive studies on the genetic diversity of the virus, little is known about the forces driving it. In order to explain this diversity, many research groups have focused on protein sequence requirements for viable, functional and transmissible virus but have missed an important aspect of viral evolution, namely the genomic ontology of the virus. This study presents for the first time the genomic signature of 111 fully sequenced HRV strains from all three groups, HRV-A, HRV-B and HRV-C. We observed a tendency of the HRV genome to eliminate CpG and UpA dinucleotides, coupled with over-representation of UpG and CpA. We propose a specific mechanism that describes how rapid changes in the HRV genomic sequence can take place under strict conservation of the polypeptide backbone. Moreover, the distribution of the observed under- and over-represented dinucleotides along the HRV genome is presented. Distance matrices based on CpG and UpA odds ratios were constructed and viewed as heatmaps and distance trees. None of the suppressions can be attributed to codon usage or to RNA secondary structure requirements. Since viral recognition is dependent on RNA motifs rich in CpG and UpA, it is possible that the overall genome evolution mechanism described here acts to protect the virus from host recognition.
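Dinucleotide odds ratios such as those for CpG and UpA are commonly defined as rho_XY = f_XY / (f_X f_Y), the observed dinucleotide frequency divided by the product of the two mononucleotide frequencies. A minimal single-strand sketch under that assumption follows; the abstract does not state the exact normalization the authors used, and the function name `odds_ratio` is hypothetical.

```python
from collections import Counter


def odds_ratio(seq: str, x: str, y: str) -> float:
    """Dinucleotide odds ratio rho_XY = f_XY / (f_X * f_Y) on one strand.

    x and y are RNA letters (A, C, G, U). Values below 1 indicate that XY is
    under-represented relative to base composition; values above 1 indicate
    over-representation.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    f_x = mono[x] / len(seq)
    f_y = mono[y] / len(seq)
    if f_x == 0 or f_y == 0:
        return float("nan")  # undefined when a base is absent
    f_xy = di[x + y] / (len(seq) - 1)
    return f_xy / (f_x * f_y)


print(odds_ratio("CGCG", "C", "G"))  # CpG over-represented in this toy string
print(odds_ratio("GGCC", "C", "G"))  # prints 0.0: no CpG at all
```

For a real genome, the same function would be applied to the full sequence, or to sliding windows to map how under- and over-representation varies along it.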
Affiliation(s)
- Spyridon Megremis
- Allergy Department, 2nd Pediatric Clinic, University of Athens, Athens, Greece.
14. Meysman P, Marchal K, Engelen K. DNA structural properties in the classification of genomic transcription regulation elements. Bioinform Biol Insights 2012; 6:155-68. [PMID: 22837642 PMCID: PMC3399529 DOI: 10.4137/bbi.s9426]
Abstract
It has long been known that DNA molecules encode information at various levels. The most basic level comprises the base sequence itself and is primarily important for the encoding of proteins and for direct base recognition by DNA-binding proteins. A more elusive level consists of the local structural properties of the DNA molecule, in which the DNA sequence plays only an indirect, supportive role. These properties are nevertheless an important factor in a large number of biomolecular processes and can be considered informative signals for the presence of a variety of genomic features. Several recent studies have unequivocally shown the benefit of relying on such DNA properties for modeling and predicting genomic features as diverse as transcription start sites, transcription factor binding sites and nucleosome occupancy. This review is meant to provide an overview of the key aspects of these DNA conformational and physicochemical properties. To illustrate their potential added value compared to relying solely on the nucleotide sequence in genomics studies, we discuss their application in research on transcription regulation mechanisms as representative cases.
Affiliation(s)
- Pieter Meysman
- Department of Molecular and Microbial Systems, KULeuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
15
Dai Z, Dai X. Gene expression divergence is coupled to evolution of DNA structure in coding regions. PLoS Comput Biol 2011; 7:e1002275. [PMID: 22125484 PMCID: PMC3219629 DOI: 10.1371/journal.pcbi.1002275] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/18/2011] [Accepted: 10/01/2011] [Indexed: 01/17/2023]
Abstract
Sequence changes in the coding and regulatory regions of a gene itself (cis) determine most of the gene expression divergence between closely related species. However, gene expression divergence between yeast species is not correlated with the evolution of the primary nucleotide sequence, which indicates that other cis factors direct gene expression divergence. Here, we studied the contribution of the evolution of three-dimensional DNA structure, as a cis factor, to gene expression divergence. We found that the evolution of DNA structure in coding regions and gene expression divergence are correlated in yeast. A similar result was observed between Drosophila species. DNA structure is associated with the binding of chromatin remodelers and histone modifiers to DNA sequences in coding regions; these factors influence RNA polymerase II occupancy, which controls gene expression levels. We also found that genes with similar DNA structures are involved in the same biological processes and functions. These results reveal previously unappreciated roles of DNA structure as a cis effect in gene expression.

The unique phenotype of each organism is partly determined by gene expression. Changes in gene expression are an important source of phenotypic variation and can be caused by changes in the regulatory and coding sequences of the gene itself (cis) or by changes in regulatory factors (trans). The contribution of cis regulation to gene expression divergence between closely related species is much greater than that of trans regulation. However, evolution of primary nucleotide sequences is not correlated with gene expression divergence in yeast, suggesting that other cis factors drive gene expression divergence. Here, we found that the evolution of DNA structure in coding regions is coupled to gene expression divergence in yeast. We also found that DNA structure is associated with specific gene characteristics: genes with similar DNA structures are involved in the same biological processes and functions. These results demonstrate the important roles of DNA structure in directing gene expression.
Affiliation(s)
- Zhiming Dai
- School of Information Science and Technology, Sun Yat-Sen University, Guangzhou, China
- Xianhua Dai
- School of Information Science and Technology, Sun Yat-Sen University, Guangzhou, China
16
Masoudi-Nejad A, Movahedi S, Jáuregui R. Genome-scale computational analysis of DNA curvature and repeats in Arabidopsis and rice uncovers plant-specific genomic properties. BMC Genomics 2011; 12:214. [PMID: 21548945 PMCID: PMC3113785 DOI: 10.1186/1471-2164-12-214] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/08/2010] [Accepted: 05/06/2011] [Indexed: 11/16/2022]
Abstract
Background: Because of its overarching role in genome function, sequence-dependent DNA curvature continues to attract great attention. The DNA double helix is not a rigid cylinder but shows curvature and flexibility in different regions, depending on the sequence. More in-depth knowledge of the various orders of complexity of genomic DNA structure has allowed the design of sophisticated bioinformatics tools for its analysis and manipulation, which, in turn, have yielded a better understanding of the genome itself. Curved DNA is involved in many biologically important processes, such as transcription initiation and termination, recombination, DNA replication, and nucleosome positioning. CpG islands and tandem repeats also play significant roles in the dynamics and evolution of genomes.
Results: In this study, we analyzed the relationship between these three features (curvature, CpG islands and tandem repeats) in the rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana) genomes. A genome-scale prediction of curvature distribution in rice and Arabidopsis indicated that most chromosomes of both genomes have maximal DNA curvature adjacent to the centromeric region. By analyzing tandem repeats across the genome, we found that repeat frequencies are higher in regions adjacent to those with high curvature values. Further analysis of CpG islands shows a clear interdependence between curvature value, repeat frequency and CpG islands: each CpG island appears in a region of locally minimal curvature, and CpG islands usually do not appear in the centromere or in regions with high repeat frequency. A statistical evaluation demonstrates the significance and non-randomness of these features.
Conclusions: This study represents the first systematic genome-scale analysis of DNA curvature, CpG islands and tandem repeats at the DNA sequence level in plant genomes. It finds that not all plant chromosomes follow the rules common to other eukaryotic organisms, suggesting that some of these genomic properties might be specific to plants.
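The abstract does not state which CpG island definition was applied. A frequently used rule of thumb (after Gardiner-Garden and Frommer) flags windows of about 200 bp with a GC fraction of at least 0.5 and an observed/expected CpG ratio of at least 0.6, where obs/exp = CpG count × length / (C count × G count). The scan below is a minimal sketch under those assumed thresholds; it merges overlapping passing windows and does not trim island boundaries the way dedicated tools do.

```python
def cpg_islands(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Return merged (start, end) spans of windows that pass the assumed CpG island criteria."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        cpg = w.count("CG")  # "CG" cannot overlap itself, so str.count is exact here
        gc = (c + g) / window
        oe = cpg * window / (c * g) if c and g else 0.0
        if gc >= min_gc and oe >= min_oe:
            if hits and start <= hits[-1][1]:
                hits[-1] = (hits[-1][0], start + window)  # extend the current island
            else:
                hits.append((start, start + window))
    return hits


# Synthetic test sequence: 300 bp of CpG repeats flanked by AT repeats.
demo = "AT" * 300 + "CG" * 150 + "AT" * 300
print(cpg_islands(demo))
```

Because merged spans include flanking windows that only just reach the thresholds, the reported island is wider than the CpG-rich core; this is one reason real annotations post-process such scans.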
Affiliation(s)
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics and COE in Biomathematics, University of Tehran, Iran.
17
Lacroix T, Loux V, Gendrault A, Gibrat JF, Chiapello H. CompaGB: An open framework for genome browsers comparison. BMC Res Notes 2011; 4:133. [PMID: 21542900 PMCID: PMC3096945 DOI: 10.1186/1756-0500-4-133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2010] [Accepted: 05/04/2011] [Indexed: 11/17/2022]
Abstract
Background: Tools to visualize and explore genomes hold a central place in genomics, and the diversity of genome browsers has increased dramatically over the last few years. Comparing and choosing a well-adapted genome browser often turns out to be a daunting task, as it requires multidisciplinary knowledge and the number of tools, functionalities and features is overwhelming.
Findings: To assist in this task, we propose a community-based framework built on two cornerstones: (i) an implementation of an industry-promoted software qualification method (QSOS) adapted for genome browser evaluations, and (ii) a web resource providing numerous facilities for either visualizing comparisons or performing new evaluations. We formulated 60 criteria specifically for genome browsers and incorporated another 65 directly from the generic section of QSOS. These criteria aim to meet diverse needs, ranging from those of a biologist whose interest lies primarily in user-friendly and informative functionalities, to a bioinformatician who wants to integrate the genome browser into a wider framework, to a computer scientist who might choose software according to more technical features. We developed a dedicated web application that enriches the existing QSOS functionalities (weighting of criteria, user profiles) with features of interest to a community-based framework: easy management of evolving data, user comments...
Conclusions: The framework is available at http://genome.jouy.inra.fr/CompaGB. It is open to anyone who wishes to participate in the evaluations. It helps the scientific community to (1) choose the genome browser that best fits their particular project, (2) compare features visually in easily accessible formats, such as tables or radar plots, and (3) perform their own evaluations against the defined criteria. To illustrate the CompaGB functionalities, we evaluated seven genome browsers according to the implemented methodology. A summary of the features of the compared genome browsers is presented and discussed.
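The abstract mentions weighting criteria according to a user profile. A minimal sketch of that idea, assuming each criterion is graded on the 0 to 2 scale used by QSOS and that a profile is a set of non-negative weights, is a weighted mean per browser followed by a ranking. The browser names, criterion names and weights below are hypothetical and do not come from CompaGB.

```python
def weighted_score(scores, weights):
    """Weighted mean of criterion scores (0, 1 or 2); unweighted criteria count as 0."""
    total_weight = sum(weights.get(c, 0) for c in scores)
    if total_weight == 0:
        raise ValueError("no criterion carries weight in this profile")
    return sum(s * weights.get(c, 0) for c, s in scores.items()) / total_weight


def rank_browsers(evaluations, weights):
    """Browser names sorted from best to worst weighted score for one user profile."""
    return sorted(evaluations, key=lambda b: weighted_score(evaluations[b], weights), reverse=True)


# Hypothetical evaluations and profiles, for illustration only.
evaluations = {
    "BrowserA": {"track_upload": 2, "api_access": 0, "documentation": 1},
    "BrowserB": {"track_upload": 1, "api_access": 2, "documentation": 1},
}
biologist = {"track_upload": 3, "api_access": 0, "documentation": 1}
bioinformatician = {"track_upload": 1, "api_access": 3, "documentation": 1}
print(rank_browsers(evaluations, biologist))
print(rank_browsers(evaluations, bioinformatician))
```

The same scores can favor different browsers under different profiles, which is the point of letting users supply their own weights.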
Affiliation(s)
- Thomas Lacroix
- INRA UR1077, Unité Mathématique, Informatique & Génome, Jouy-en-Josas, France.
18
Hollunder J, Friedel M, Kuiper M, Wilhelm T. DASS-GUI: a user interface for identification and analysis of significant patterns in non-sequential data. Bioinformatics 2010; 26:987-9. [PMID: 20172945 PMCID: PMC2844999 DOI: 10.1093/bioinformatics/btq071] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/21/2009] [Revised: 02/12/2010] [Accepted: 02/17/2010] [Indexed: 11/29/2022]
Abstract
SUMMARY: Many large 'omics' datasets have been published, and many more are expected in the near future. New analysis methods are needed to exploit them fully. We have developed a graphical user interface (GUI) for easy data analysis. Our discovery of all significant substructures (DASS) approach elucidates the underlying modularity, a typical feature of complex biological data. It is related to biclustering and other data mining approaches. Importantly, DASS-GUI also allows handling of multi-sets and calculation of statistical significance. DASS-GUI contains tools for further analysis of the identified patterns: analysis of the pattern hierarchy, enrichment analysis, module validation, analysis of additional numerical data, easy handling of synonymous names, clustering, filtering and merging. Several export options allow easy use of additional tools such as Cytoscape. AVAILABILITY: Source code, pre-compiled binaries for different systems, a comprehensive tutorial, case studies and many additional datasets are freely available at http://www.ifr.ac.uk/dass/gui/. DASS-GUI is implemented in Qt.
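DASS-GUI lists enrichment analysis among its tools, but the abstract does not say which test it uses. A common choice for asking whether an identified module over-represents some annotation is a one-sided hypergeometric test. The sketch below assumes that test; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, sample, hits):
    """One-sided p-value P(X >= hits) for X ~ Hypergeometric(population, successes, sample).

    population: total items (e.g. all genes); successes: items carrying the annotation;
    sample: items in the module; hits: annotated items observed in the module.
    """
    if not (0 <= successes <= population and 0 <= sample <= population):
        raise ValueError("inconsistent counts")
    denom = comb(population, sample)
    upper = min(successes, sample)
    return sum(
        comb(successes, k) * comb(population - successes, sample - k)
        for k in range(hits, upper + 1)
    ) / denom


# Hypothetical example: 8 of a 20-gene module annotated, versus 50 of 1000 genes overall.
print(hypergeom_enrichment_p(1000, 50, 20, 8))
```

With many modules and annotations tested at once, such p-values would normally be corrected for multiple testing before any module is called enriched.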
Affiliation(s)
- Jens Hollunder
- Department of Plant Systems Biology, VIB, Department of Molecular Genetics, Ghent University, Technologiepark 927, B-9052 Gent, Belgium.