1
Sharma D, Sharma K, Mishra A, Siwach P, Mittal A, Jayaram B. Molecular dynamics simulation-based trinucleotide and tetranucleotide level structural and energy characterization of the functional units of genomic DNA. Phys Chem Chem Phys 2023; 25:7323-7337. [PMID: 36825435 DOI: 10.1039/d2cp04820e]
Abstract
Genomes of most organisms on earth are written in a universal language of life made up of four units: adenine (A), thymine (T), guanine (G) and cytosine (C). Understanding how these units are put together remains a great challenge. Multiple efforts have been made to annotate this wonderfully engineered string of DNA using different methods, but they lack a universal character. In this article, we have investigated the structural and energetic profiles of both prokaryotes and eukaryotes by considering two essential genomic sites, viz., the transcription start sites (TSS) and exon-intron boundaries. We have characterized these sites by mapping structural and energy features of DNA obtained from molecular dynamics simulations covering all possible trinucleotide and tetranucleotide steps. These physicochemical properties of DNA show distinct signatures at the TSS and at exon-intron boundaries. Our results firmly convey the idea that DNA uses the same dialect in prokaryotes and eukaryotes, and that it is worth going beyond sequence-level analyses into physicochemical space to determine the functional destiny of DNA sequences.
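As a rough illustration of what mapping step-level features onto a sequence involves, the sketch below assigns a value to every overlapping trinucleotide step and smooths the resulting profile with a moving average. It is a minimal sketch under stated assumptions: the lookup table is a hypothetical placeholder (the G/C count of each step), not the MD-derived structural or energy values the article uses, and the function names are illustrative only.

```python
from itertools import product
from typing import Dict, List

# Hypothetical placeholder table: each trinucleotide step mapped to its G/C count.
# A real analysis would substitute MD-derived structural or energy values here.
PLACEHOLDER_TRI: Dict[str, float] = {
    "".join(step): float(sum(base in "GC" for base in step))
    for step in product("ACGT", repeat=3)
}


def step_profile(seq: str, table: Dict[str, float], k: int = 3) -> List[float]:
    """Return one table value per overlapping k-nucleotide step of `seq`."""
    seq = seq.upper()
    return [table[seq[i:i + k]] for i in range(len(seq) - k + 1)]


def moving_average(values: List[float], window: int) -> List[float]:
    """Average each run of `window` consecutive values (no padding at the ends)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


profile = step_profile("ATGCGC", PLACEHOLDER_TRI)
print(profile)                     # [1.0, 2.0, 3.0, 3.0]
print(moving_average(profile, 2))  # [1.5, 2.5, 3.0]
```

Passing `k=4` together with a 256-entry table gives the tetranucleotide analogue of the same profile.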
Affiliation(s)
- Dinesh Sharma
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Kopal Sharma
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Priyanka Siwach
- Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
- Aditya Mittal
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India; Department of Chemistry, Indian Institute of Technology, Delhi, India
2
Mishra A, Siwach P, Misra P, Dhiman S, Pandey AK, Srivastava P, Jayaram B. Intron exon boundary junctions in human genome have in-built unique structural and energetic signals. Nucleic Acids Res 2021; 49:2674-2683. [PMID: 33621338 PMCID: PMC7969029 DOI: 10.1093/nar/gkab098] [Received: 10/06/2019] [Revised: 01/21/2021] [Accepted: 02/22/2021]
Abstract
Precise identification of correct exon–intron boundaries is a prerequisite for analyzing the location and structure of genes. The existing framework of genomic signals delineating exons and introns in a genomic segment seems insufficient, predominantly because of poor sequence consensus and the limitations of training on available experimental data sets. We present here a novel concept for characterizing exon–intron boundaries in genomic segments on the basis of structural and energetic properties. We analyzed the boundary junctions on both sides of all the exons (328 368) of protein-coding genes from the human genome (GENCODE database) using 28 structural and three energy parameters. A study of sequence conservation at these sites shows very poor consensus. DNA is observed to adopt a unique structural and energy state at the boundary junctions. The signals also differ somewhat between housekeeping and tissue-specific genes. Clustering the 31 parameters into four derived vectors gives some additional insight into the physical mechanisms involved in this biological process. The sites of structural and energy signals correlate well with the positions that play important roles in pre-mRNA splicing.
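The abstract reports very poor sequence consensus at the boundary junctions but does not say, in this summary, how conservation was scored. One common way to quantify positional consensus is the information content of each column of aligned sequences; the sketch below implements that standard measure and is offered as an assumption, not as the authors' procedure.

```python
import math
from collections import Counter
from typing import List


def information_content(aligned: List[str]) -> List[float]:
    """Per-column information content, in bits, of equal-length DNA sequences.

    IC = 2 - H, with H = -sum(p * log2 p) over the bases seen in that column.
    2 bits means a perfectly conserved column; 0 bits means all four bases
    are equally frequent. No small-sample correction is applied.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    n = len(aligned)
    result = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in aligned)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        result.append(2.0 - entropy)
    return result


# Four toy sequences aligned on a hypothetical junction: column 0 is invariant,
# column 1 shows each of the four bases once.
print(information_content(["AG", "AC", "AT", "AA"]))  # [2.0, 0.0]
```

Low values across the columns flanking a junction are what "poor consensus" would look like under this measure.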
Affiliation(s)
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India; Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Priyanka Siwach
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India; Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
- Pallavi Misra
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
- Simran Dhiman
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
- Parul Srivastava
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
- B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India; Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India; Department of Chemistry, Indian Institute of Technology, Delhi, India
3
Mishra A, Dhanda S, Siwach P, Aggarwal S, Jayaram B. A novel method SEProm for prokaryotic promoter prediction based on DNA structure and energetics. Bioinformatics 2020; 36:2375-2384. [PMID: 31909789 DOI: 10.1093/bioinformatics/btz941] [Received: 05/24/2019] [Revised: 11/08/2019] [Accepted: 01/02/2020]
Abstract
MOTIVATION Despite conservation in the general architecture of promoters and in the protein-DNA interaction interface of RNA polymerases among various prokaryotes, identification of promoter regions in whole genome sequences remains a daunting challenge. The available tools for promoter prediction do not seem to address the problem satisfactorily, apparently because the biochemical nature of promoter signals is not yet fully understood. Using 28 structural and 3 energetic parameters, we found that prokaryotic promoter regions have a unique structural and energy state, quite distinct from that of coding regions, and that the information for this signature state is in-built in their sequences. We developed a novel promoter prediction tool from these 31 parameters using various statistical techniques. RESULTS Here, we introduce SEProm, a novel tool developed by studying and utilizing the in-built structural and energy information of DNA sequences, which is applicable to all prokaryotes including archaea. Compared with five of the most recent, diverse and best currently available tools, SEProm performs much better, predicting promoters with an 'F-value' of 82.04 and a 'Precision' of 81.08. The next best 'F-value' was obtained with PromPredict (72.14), followed by BProm (68.37). The next best 'Precision' was observed for Pepper (75.39), followed by PromPredict (72.01). SEProm maintained the lead even when the comparison was done on two test organisms not involved in training SEProm. AVAILABILITY AND IMPLEMENTATION The software is freely available with easy-to-follow instructions (www.scfbio-iitd.res.in/software/TSS_Predict.jsp). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
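The abstract reports 'F-value' and 'Precision' without defining them. Assuming the F-value is the usual F1 score (the harmonic mean of precision and recall), the sketch below shows how the metrics relate and how a recall value follows from a reported precision/F pair. The confusion counts in the example are hypothetical, not taken from the paper.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and their harmonic mean (F1), each as a percentage."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f1


def implied_recall(precision: float, f1: float) -> float:
    """Recall implied by a given precision and F1 (all in the same units).

    Rearranges F = 2PR / (P + R) to R = F * P / (2P - F).
    """
    return f1 * precision / (2 * precision - f1)


# Hypothetical confusion counts, not taken from the paper.
print(precision_recall_f(tp=3, fp=1, fn=1))  # (75.0, 75.0, 75.0)
# If the reported 'F-value' is F1, SEProm's figures imply a recall of about 83.02.
print(round(implied_recall(81.08, 82.04), 2))
```

If the authors used a different F-measure (for example a weighted F-beta), the implied recall would differ.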
Affiliation(s)
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India
- Sahil Dhanda
- Supercomputing Facility for Bioinformatics & Computational Biology
- Priyanka Siwach
- Supercomputing Facility for Bioinformatics & Computational Biology; Department of Biotechnology, Chaudhary Devi Lal University, Sirsa 125055, India
- Shruti Aggarwal
- Supercomputing Facility for Bioinformatics & Computational Biology
- B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India; Department of Chemistry, Indian Institute of Technology, New Delhi 110016, India
4
Bhat R, Kaushik R, Singh A, DasGupta D, Jayaraj A, Soni A, Shandilya A, Shekhar V, Shekhar S, Jayaram B. A comprehensive automated computer-aided discovery pipeline from genomes to hit molecules. Chem Eng Sci 2020. [DOI: 10.1016/j.ces.2020.115711]
5
Abstract
Gene prediction, also known as gene identification, gene finding, gene recognition or gene discovery, is one of the important problems of molecular biology and is receiving increasing attention with the advent of large-scale genome sequencing projects. We designed an ab initio model (called ChemGenome) for gene prediction in prokaryotic genomes based on the physicochemical characteristics of codons. In this chapter, we present the methodology of the latest version of this model, ChemGenome2.1 (CG2.1). The first module of the protocol builds a three-dimensional vector from three calculated quantities for each codon: the double-helical trinucleotide base pairing energy, the base pair stacking energy and an index of the propensity of a codon for protein-nucleic acid interactions. As this three-dimensional vector moves along a genome, the net orientation of the resultant vector should differ significantly between genic and non-genic regions, making a distinction feasible. The putative protein-coding genes predicted from these parameters are passed through a second module of the protocol, which reduces the number of false positives using a filter based on stereochemical properties of protein sequences. The chemical properties of amino acid side chains taken into consideration are the presence of an sp3-hybridized γ carbon atom, hydrogen bond donor ability, short/absence of δ carbon, and linearity of the side chains/non-occurrence of bidentate forks with terminal hydrogen atoms in the side chain. The final prediction of the potential protein-coding genes is based on the frequency of occurrence of amino acids in the predicted protein sequences and its deviation from the frequency values of Swissprot protein sequences, at both the monomer and tripeptide levels. The final screening is based on a Z-score.
Though CG2.1 is a gene finding tool for prokaryotes, considering the underlying similarity in the chemical and physical properties of DNA among prokaryotes and eukaryotes, we attempted to evaluate its applicability to gene finding in lower eukaryotes. The results give hope that the concept of gene finding based on a physicochemical model of codons is viable for eukaryotes as well, though improvements are undoubtedly needed.
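The first CG2.1 module treats each codon as a three-dimensional vector and follows the direction of the running sum along the sequence. The sketch below illustrates that geometry, plus a generic Z-score of the kind the final screening step mentions. It is a minimal sketch under stated assumptions: the codon table is a hypothetical placeholder, the reference direction is arbitrary, the reading frame is fixed at frame 0, and nothing here reproduces CG2.1's actual parameters, windows or cut-offs.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def resultant_vector(seq: str, codon_vectors: Dict[str, Vec3]) -> Vec3:
    """Sum the 3-D vectors of successive codons in reading frame 0.

    Each codon vector stands for (pairing energy, stacking energy,
    protein-nucleic acid interaction propensity); trailing bases that do
    not complete a codon are ignored.
    """
    seq = seq.upper()
    x = y = z = 0.0
    for i in range(0, len(seq) - 2, 3):
        dx, dy, dz = codon_vectors[seq[i:i + 3]]
        x, y, z = x + dx, y + dy, z + dz
    return (x, y, z)


def angle_degrees(a: Vec3, b: Vec3) -> float:
    """Angle between two non-zero 3-D vectors, in degrees."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def z_score(value: float, reference: Sequence[float]) -> float:
    """Standard score of `value` against a reference sample (population SD)."""
    return (value - statistics.mean(reference)) / statistics.pstdev(reference)


# Hypothetical two-codon table, for illustration only.
toy = {"AAA": (1.0, 0.0, 0.0), "CCC": (0.0, 1.0, 0.0)}
r = resultant_vector("AAACCC", toy)
print(r)  # (1.0, 1.0, 0.0)
print(angle_degrees(r, (1.0, 0.0, 0.0)))
```

In a gene-finding setting, one would compare the orientation of windowed resultants between genic and non-genic stretches. The hypothetical `z_score` helper only shows the generic standardization step, not how CG2.1 builds its amino acid frequency reference.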
Affiliation(s)
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
- Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, India
- Priyanka Siwach
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
- Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
- Poonam Singhal
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
- B Jayaram
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
- Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, India
- Department of Chemistry, Indian Institute of Technology Delhi, New Delhi, India
6
Jaiswal AK, Krishnamachari A. Physicochemical property based computational scheme for classifying DNA sequence elements of Saccharomyces cerevisiae. Comput Biol Chem 2018; 79:193-201. [PMID: 30711426 DOI: 10.1016/j.compbiolchem.2018.12.014] [Received: 11/12/2018] [Accepted: 12/25/2018]
Abstract
Generation of huge "omics" data necessitates the development and application of computational methods to annotate the data in terms of biological features. In the context of DNA sequence, it is important to unravel the hidden physicochemical signatures. For this purpose, we have considered various sequence elements, such as the promoter, ACS, LTRs, telomere and retrotransposon, of the model organism Saccharomyces cerevisiae. Contributions from di-nucleotides play a major role in studying the DNA conformation profile. The physicochemical parameters used are hydrogen bonding energy, stacking energy and solvation energy per base pair. Our computational study shows that all the sequence elements studied have distinctive physicochemical signatures, which can be exploited for prediction experiments. The order we see in a DNA sequence is dictated by biological regions, so there is a dependency in the sequence makeup. Keeping this in mind, we propose two computational schemes: (a) a windowing block size procedure and (b) di-nucleotide transitions. We obtained a better discriminating profile when we analyzed the sequence data in a windowing manner. In the second, novel approach, we introduced the di-nucleotide transition probability matrix (DTPM) to study the hidden layer of information embedded in the sequences. DTPM is used as weights for scanning and prediction. This computational scheme incorporates a memory property, which is a more realistic way to study the physicochemical properties embedded in DNA sequences. Our analysis shows that the DTPM scheme performs better than the existing method for these elements. Characterization of these elements will be key to genome editing applications, and advanced machine learning approaches may also require such distinctive profiles as input features.
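The two schemes can be sketched as follows. The abstract does not spell out the exact DTPM definition, so this sketch assumes a first-order reading: a 4×4 matrix of P(next base | current base) estimated from overlapping dinucleotides. The block average assumes non-overlapping windows, which is also an assumption about the windowing procedure. The parameter table is a hypothetical placeholder, not the published hydrogen bonding, stacking or solvation values.

```python
from itertools import product
from typing import Dict, List

BASES = "ACGT"

# Hypothetical placeholder dinucleotide parameter (G/C count per step); the
# paper's hydrogen bonding, stacking or solvation values would go here.
PLACEHOLDER_DI: Dict[str, float] = {
    a + b: float((a in "GC") + (b in "GC")) for a, b in product(BASES, repeat=2)
}


def transition_matrix(
    seqs: List[str], pseudocount: float = 0.0
) -> Dict[str, Dict[str, float]]:
    """Estimate P(next base | current base) from overlapping dinucleotide counts.

    Dinucleotides containing characters other than A, C, G, T are skipped.
    Rows with no observations are left as all zeros.
    """
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for s in seqs:
        s = s.upper()
        for a, b in zip(s, s[1:]):
            if a in counts and b in counts:
                counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: (c / total if total else 0.0) for b, c in row.items()}
    return matrix


def block_average(seq: str, table: Dict[str, float], block: int) -> List[float]:
    """Average a dinucleotide parameter over consecutive, non-overlapping blocks
    of `block` steps; the last block may be shorter. Assumes only A/C/G/T."""
    seq = seq.upper()
    steps = [table[seq[i:i + 2]] for i in range(len(seq) - 1)]
    return [
        sum(steps[i:i + block]) / len(steps[i:i + block])
        for i in range(0, len(steps), block)
    ]


m = transition_matrix(["AACG"])
print(m["A"]["A"], m["A"]["C"], m["C"]["G"])           # 0.5 0.5 1.0
print(block_average("AACGT", PLACEHOLDER_DI, block=2))  # [0.5, 1.5]
```

Such a matrix could then weight per-step parameter values during a scan. The exact weighting the authors apply is not stated in the abstract.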
Affiliation(s)
- Atul Kumar Jaiswal
- School of Computational and Integrative Sciences, JNU, New Delhi, 110067, India
7
Mishra A, Siwach P, Misra P, Jayaram B, Bansal M, Olson WK, Thayer KM, Beveridge DL. Toward a Universal Structural and Energetic Model for Prokaryotic Promoters. Biophys J 2018; 115:1180-1189. [PMID: 30172386 DOI: 10.1016/j.bpj.2018.08.002] [Received: 06/15/2018] [Revised: 07/28/2018] [Accepted: 08/02/2018]
Abstract
With almost no consensus promoter sequence in prokaryotes, the recruitment of RNA polymerase (RNAP) to precise transcriptional start sites (TSSs) has remained an unsolved puzzle. Uncovering the underlying mechanism is critical for understanding the principles of gene regulation. We searched for a hidden code in the structure and energetics of ∼16,500 promoters of 12 prokaryotes representing two kingdoms. Twenty-eight fundamental parameters of DNA structure, including backbone angles, basepair axis, and interbasepair and intrabasepair parameters, were used, with the information extracted from x-ray crystallography data. Three parameters (solvation energy, hydrogen-bond energy and stacking energy) were selected for creating energetics profiles using in-house programs. The DNA of promoter regions was found to be inherently designed to undergo a change in every parameter studied, in all prokaryotes. The change starts some distance upstream of the TSS and continues some distance past it, giving promoter regions a signature state. These signature states might be the universal hidden codes recognized by RNAP. This observation was reiterated when randomly selected promoter sequences (with little sequence conservation) were subjected to structure generation; all developed into very similar three-dimensional structures quite distinct from those of conventional B-DNA and coding sequences. Fine structural details at important promoter motifs (viz. the -11, -35 and -75 positions relative to the TSS) reveal novel and pointed insights into RNAP interaction at these locations; these could be correlated with how particular structural changes at the -11 region may allow the insertion of RNAP amino acids into the interbasepair space as well as facilitate the flipping out of bases from the DNA duplex.
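A signature state that starts upstream of the TSS and continues past it is typically visualized by aligning many promoters on their TSS and averaging a parameter at each relative position. The sketch below does that for a dinucleotide-level parameter. The table is a hypothetical placeholder and the offset convention is an assumption; the sketch does not reproduce the authors' in-house programs.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical placeholder: G/C count per dinucleotide step, standing in for a
# structural or energy parameter.
PLACEHOLDER_DI: Dict[str, float] = {
    a + b: float((a in "GC") + (b in "GC")) for a, b in product("ACGT", repeat=2)
}


def tss_aligned_profile(
    records: List[Tuple[str, int]],
    table: Dict[str, float],
    upstream: int,
    downstream: int,
) -> Dict[int, float]:
    """Mean parameter value at each offset relative to the TSS.

    `records` holds (sequence, tss_index) pairs, where tss_index is the
    0-based index of the TSS base. Offset p refers to the dinucleotide step
    starting at tss_index + p; offsets falling outside a sequence are skipped.
    These offsets are plain integers, unlike promoter coordinates, which skip 0.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for seq, tss in records:
        seq = seq.upper()
        for p in range(-upstream, downstream + 1):
            i = tss + p
            if 0 <= i and i + 2 <= len(seq):
                sums[p] += table[seq[i:i + 2]]
                counts[p] += 1
    return {p: sums[p] / counts[p] for p in sorted(counts)}


print(tss_aligned_profile([("AAGC", 2), ("GGCC", 2)], PLACEHOLDER_DI, 1, 0))
# {-1: 1.5, 0: 2.0}
```

Comparing such a profile against the same average taken over coding sequences is one straightforward way to expose a promoter-specific shift.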
Affiliation(s)
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Priyanka Siwach
- Supercomputing Facility for Bioinformatics & Computational Biology; Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
- Pallavi Misra
- Supercomputing Facility for Bioinformatics & Computational Biology
- Bhyravabhotla Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India; Department of Chemistry, Indian Institute of Technology, Delhi, India
- Manju Bansal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Wilma K Olson
- Department of Chemistry & Chemical Biology and BioMaPS Institute for Quantitative Biology, Rutgers, Piscataway, New Jersey
- Kelly M Thayer
- Department of Chemistry, Vassar College, Poughkeepsie, New York
- David L Beveridge
- Departments of Chemistry, Molecular Biology, and Biochemistry and Molecular Biophysics Program, Wesleyan University, Middletown, Connecticut
8
Singh A, Mishra A, Khosravi A, Khandelwal G, Jayaram B. Physico-chemical fingerprinting of RNA genes. Nucleic Acids Res 2017; 45:e47. [PMID: 27932456 PMCID: PMC5397174 DOI: 10.1093/nar/gkw1236] [Received: 11/09/2016] [Accepted: 11/29/2016]
Abstract
We advance here a novel concept for characterizing different classes of RNA genes on the basis of the physico-chemical properties of DNA sequences. As knowledge-based approaches can yield unsatisfactory outcomes owing to the limitations of training on available experimental data sets, alternative approaches that utilize properties intrinsic to DNA are needed to supplement training-based methods and, eventually, to provide molecular insights into genome organization. Based on a comprehensive series of molecular dynamics simulations by the Ascona B-DNA Consortium, we extracted the hydrogen bonding, stacking and solvation energies of all combinations of DNA sequences at the dinucleotide level and calculated these properties for different types of RNA genes. Considering ∼7.3 million mRNA, 255 524 tRNA, 40 649 rRNA (different subunits), 5250 miRNA and 3747 snRNA gene sequences from 9282 complete genome chromosomes of all prokaryotes and eukaryotes available at NCBI, we observed that the physico-chemical properties of different functional units on genomic DNA differ in their signatures.
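A per-gene physico-chemical fingerprint can be sketched as the average of the three dinucleotide energies (hydrogen bonding, stacking, solvation) over a gene sequence. The table below is a hypothetical placeholder, and the nearest-centroid classifier is a swapped-in illustration of how such fingerprints might be compared. It is not the method the authors report, and the class labels are made up.

```python
import math
from itertools import product
from typing import Dict, Tuple

Triple = Tuple[float, float, float]

# Hypothetical placeholder (hydrogen bonding, stacking, solvation) per step,
# derived from the G/C count purely for illustration; not the published values.
PLACEHOLDER_ENERGY: Dict[str, Triple] = {}
for a, b in product("ACGT", repeat=2):
    gc = float((a in "GC") + (b in "GC"))
    PLACEHOLDER_ENERGY[a + b] = (gc, 2.0 * gc, -gc)


def fingerprint(seq: str, params: Dict[str, Triple]) -> Triple:
    """Mean of each energy term over all overlapping dinucleotide steps."""
    seq = seq.upper()
    steps = [params[seq[i:i + 2]] for i in range(len(seq) - 1)]
    if not steps:
        raise ValueError("sequence must be at least two bases long")
    n = len(steps)
    return (
        sum(s[0] for s in steps) / n,
        sum(s[1] for s in steps) / n,
        sum(s[2] for s in steps) / n,
    )


def nearest_class(fp: Triple, centroids: Dict[str, Triple]) -> str:
    """Label of the centroid closest to `fp` in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(fp, centroids[label]))


fp = fingerprint("AGC", PLACEHOLDER_ENERGY)
print(fp)  # (1.5, 3.0, -1.5)
print(nearest_class(fp, {"class_1": (2.0, 4.0, -2.0), "class_2": (0.0, 0.0, 0.0)}))  # class_1
```

Because the three terms have different scales in practice, standardizing each term before computing distances would usually be advisable.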
Affiliation(s)
- Ankita Singh
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India; Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
- Ali Khosravi
- Ale-Taha Institute of Higher Education, Tehran, Iran
- Garima Khandelwal
- Cancer Research UK Manchester Institute, The University of Manchester, Wilmslow Road, Manchester M20 4BX, UK
- B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India; Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India; Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
9
Kumar A, Bansal M. Unveiling DNA structural features of promoters associated with various types of TSSs in prokaryotic transcriptomes and their role in gene expression. DNA Res 2017; 24:25-35. [PMID: 27803028 PMCID: PMC5381344 DOI: 10.1093/dnares/dsw045] [Received: 05/12/2016] [Accepted: 09/23/2016]
Abstract
Next-generation sequencing studies have revealed that a variety of transcripts are present in the prokaryotic transcriptome and that a significant fraction of them are functional, being involved in various regulatory activities apart from coding for proteins. Identifying the promoters associated with different transcripts is necessary for characterizing the transcriptome. Promoter regions have been shown to have unique structural features compared with their flanking regions in organisms covering all domains of life. Here we report an in silico analysis of DNA sequence-dependent structural properties, namely stability, bendability and curvature, in the promoter regions of six different prokaryotic transcriptomes. Using these structural features, we predicted the promoters associated with the different categories of transcripts (mRNA, internal, antisense and non-coding) that constitute the transcriptome. Promoter annotation using structural features is fairly accurate and reliable: about 50% of the primary promoters are characterized by all three structural properties, while at least one property identifies 95%. We also studied how these structural features differ with gene expression and found that lower stability, lesser bendability and higher curvature are more prominent in promoter regions associated with high gene expression than in those of low-expression genes. Hence, promoters associated with higher gene expression are annotated better using DNA structural features than those linked to lower gene expression.
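The "all three properties" versus "at least one property" figures suggest a per-property flag for each candidate promoter. The sketch below flags a property when the shift between the candidate region and its flanks, in an expected direction, reaches a threshold. The thresholds are hypothetical. The directions (lower stability, lower bendability, higher curvature) come from the abstract's description of high-expression promoters, and applying them to a promoter-versus-flank comparison is an assumption, not the authors' stated criterion.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Assumed sign of (region mean - flank mean) for each property.
EXPECTED_SIGN = {"stability": -1, "bendability": -1, "curvature": 1}


def property_flags(
    region: Dict[str, List[float]],
    flank: Dict[str, List[float]],
    thresholds: Dict[str, float],
) -> Dict[str, bool]:
    """Flag each property whose region-vs-flank shift, in the expected
    direction, is at least the (hypothetical) threshold."""
    return {
        prop: sign * (mean(region[prop]) - mean(flank[prop])) >= thresholds[prop]
        for prop, sign in EXPECTED_SIGN.items()
    }


def coverage(all_flags: List[Dict[str, bool]]) -> Tuple[float, float]:
    """Fractions of candidates flagged by all properties and by at least one."""
    n = len(all_flags)
    all_props = sum(all(f.values()) for f in all_flags) / n
    any_prop = sum(any(f.values()) for f in all_flags) / n
    return all_props, any_prop


flags = property_flags(
    region={"stability": [1.0, 1.0], "bendability": [1.0], "curvature": [3.0]},
    flank={"stability": [3.0], "bendability": [1.0], "curvature": [1.0]},
    thresholds={"stability": 1.0, "bendability": 1.0, "curvature": 1.0},
)
print(flags)  # {'stability': True, 'bendability': False, 'curvature': True}
print(coverage([flags, {"stability": True, "bendability": True, "curvature": True}]))  # (0.5, 1.0)
```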
Affiliation(s)
- Manju Bansal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012 Karnataka, India
10
Gerogiokas G, Southey MWY, Mazanetz MP, Heifetz A, Bodkin M, Law RJ, Henchman RH, Michel J. Assessment of Hydration Thermodynamics at Protein Interfaces with Grid Cell Theory. J Phys Chem B 2016; 120:10442-10452. [PMID: 27645529 DOI: 10.1021/acs.jpcb.6b07993]
Abstract
Molecular dynamics simulations have been analyzed with the Grid Cell Theory (GCT) method to spatially resolve the binding enthalpies and entropies of water molecules at the interfaces of 17 structurally diverse proteins. Correlations between computed energetics and structural descriptors have been sought to facilitate the development of simple models of protein hydration. Little correlation was found between GCT-computed binding enthalpies and continuum electrostatics calculations. A simple count of contacts with functional groups in charged amino acids correlates well with enhanced water stabilization, but the stability of water near hydrophobic and polar residues depends markedly on its coordination environment. The positions of X-ray-resolved water molecules correlate with computed high-density hydration sites, but many unresolved waters are significantly stabilized at the protein surfaces. A defining characteristic of ligand-binding pockets compared with nonbinding pockets was a greater solvent-accessible volume, but their average water thermodynamic properties were not distinctive from those of other interfacial regions. Interfacial water molecules are frequently stabilized by enthalpy and destabilized by entropy with respect to bulk, but counter-examples occasionally occur. Overall, detailed inspection of the local coordinating environment appears necessary to gauge the thermodynamic stability of water in protein structures.
Affiliation(s)
- Georgios Gerogiokas
- EaStCHEM School of Chemistry, Joseph Black Building, The King's Buildings, Edinburgh EH9 3JJ, United Kingdom
- Michelle W Y Southey
- Evotec (U.K.) Limited, 114 Innovation Drive, Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
- Michael P Mazanetz
- Evotec (U.K.) Limited, 114 Innovation Drive, Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
- Alexander Heifetz
- Evotec (U.K.) Limited, 114 Innovation Drive, Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
- Michael Bodkin
- Evotec (U.K.) Limited, 114 Innovation Drive, Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
- Richard J Law
- Evotec (U.K.) Limited, 114 Innovation Drive, Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
- Richard H Henchman
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, United Kingdom; School of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
- J Michel
- EaStCHEM School of Chemistry, Joseph Black Building, The King's Buildings, Edinburgh EH9 3JJ, United Kingdom
11
A statistical thermodynamic model for investigating the stability of DNA sequences from oligonucleotides to genomes. Biophys J 2015; 106:2465-73. [PMID: 24896126 DOI: 10.1016/j.bpj.2014.04.029] [Received: 10/01/2013] [Revised: 03/20/2014] [Accepted: 04/17/2014]
Abstract
We describe the development and testing of a simple statistical mechanics methodology for duplex DNA applicable to sequences of any composition and extensible to genomes. The microstates of a DNA sequence are modeled in terms of blocks of basepairs that are assumed to be fully closed (paired) or open. This approach generates an ensemble of bubblelike microstates that are used to calculate the corresponding partition function. The energies of the microstates are calculated as additive contributions from hydrogen bonding, basepair stacking, and solvation terms parameterized from a comprehensive series of molecular dynamics simulations including solvent and ions. Thermodynamic properties and nucleotide stability constants for DNA sequences follow directly from the partition function. The methodology was tested by comparing computed free energies per basepair with the experimental melting temperatures of 60 oligonucleotides, yielding a correlation coefficient of -0.96. The thermodynamic stability of genic/nongenic regions was tested in terms of nucleotide stability constants versus sequence for the Escherichia coli K-12 genome. It showed clear differentiation of the genes from promoters and captures genic regions with a sensitivity of 0.94. The statistical thermodynamic model presented here provides a seemingly new handle on the challenging problem of interpreting genomic sequences.
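The block-based ensemble can be illustrated with a brute-force sketch. Each block of base pairs is either closed or open. The energy of a pattern is the sum of the terms for its closed blocks plus a stacking-like term for each pair of adjacent closed blocks, and the partition function is summed over all 2^n patterns. The energies and this particular decomposition are hypothetical; the published model's MD-derived parameterization is not reproduced here. Exhaustive enumeration is also only feasible for a handful of blocks.

```python
import itertools
import math
from typing import List, Sequence, Tuple

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def microstate_energy(
    state: Sequence[bool],
    block_energy: Sequence[float],
    junction_energy: Sequence[float],
) -> float:
    """Energy of one open/closed pattern of blocks.

    Each closed block contributes its own energy (e.g. pairing + solvation);
    each pair of adjacent closed blocks adds a junction (stacking-like) term.
    """
    e = sum(be for closed, be in zip(state, block_energy) if closed)
    e += sum(
        je
        for left, right, je in zip(state, state[1:], junction_energy)
        if left and right
    )
    return e


def closed_probabilities(
    block_energy: Sequence[float],
    junction_energy: Sequence[float],
    temperature: float = 298.15,
) -> Tuple[float, List[float]]:
    """Partition function and probability that each block is closed,
    by brute-force enumeration of all 2**n microstates."""
    n = len(block_energy)
    if len(junction_energy) != max(n - 1, 0):
        raise ValueError("need one junction energy per adjacent block pair")
    kt = R_KCAL * temperature
    z = 0.0
    closed_weight = [0.0] * n
    for state in itertools.product((False, True), repeat=n):
        w = math.exp(-microstate_energy(state, block_energy, junction_energy) / kt)
        z += w
        for i, closed in enumerate(state):
            if closed:
                closed_weight[i] += w
    return z, [cw / z for cw in closed_weight]


# Hypothetical energies in kcal/mol for a three-block toy duplex.
z, p_closed = closed_probabilities([-2.0, -0.5, -2.0], [-1.0, -1.0])
print([round(p, 3) for p in p_closed])
```

The per-block closed probabilities play the role of the stability profile along the sequence. A genome-scale version would need a transfer-matrix or similar recursion in place of the 2^n enumeration.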
12
Soni A, Pandey KM, Ray P, Jayaram B. Genomes to hits in silico - a country path today, a highway tomorrow: a case study of chikungunya. Curr Pharm Des 2013; 19:4687-700. [PMID: 23260020 PMCID: PMC3831887 DOI: 10.2174/13816128113199990379] [Received: 10/21/2012] [Accepted: 12/17/2012]
Abstract
These are exciting times for bioinformaticians, computational biologists and drug designers, with genome and proteome sequences and related structural databases growing at an accelerated pace. The post-genomic era has triggered high expectations for the rapid and successful treatment of diseases. However, in this scenario, rich in biological information but poor in functional knowledge, the challenges are indeed grand, no less than assembling the genome of a whole organism. They include the functional annotation of genes, identification of druggable targets, prediction of the three-dimensional structures of protein targets from their amino acid sequences, and arriving at lead compounds for these targets, followed by the transition from bench to bedside. We propose here a "Genome to Hits In Silico" strategy (called Dhanvantari) and illustrate it on Chikungunya virus (CHIKV). "Genome to hits" is a novel pathway incorporating a series of steps, such as gene prediction, protein tertiary structure determination, active site identification, hit molecule generation, and docking and scoring of hits, to arrive at lead compounds. The current state of the art for each step in the pathway is highlighted, and the feasibility of creating an automated genome-to-hits assembly line is discussed.
Affiliation(s)
- Anjali Soni
- Department of Chemistry, Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
13
Dixit SB, Mezei M, Beveridge DL. Studies of base pair sequence effects on DNA solvation based on all-atom molecular dynamics simulations. J Biosci 2012; 37:399-421. [PMID: 22750979 DOI: 10.1007/s12038-012-9223-5]
Abstract
Detailed analyses of the sequence-dependent solvation and ion atmosphere of DNA are presented, based on molecular dynamics (MD) simulations of all 136 unique tetranucleotide steps obtained by the ABC consortium using the AMBER suite of programs. Significant sequence effects on solvation and ion localization were observed in these simulations. The results were compared with essentially all known experimental data on the subject. Proximity analysis was employed to highlight the sequence-dependent differences in solvation and ion localization in the grooves of DNA. Comparison of the MD-calculated DNA structures with canonical A- and B-forms supports the idea that G/C-rich sequences are closer to canonical A-form than B-form structures, while the reverse is true for the poly A sequences, with the exception of the alternating ATAT sequence. Analysis of hydration density maps reveals that the flexibility of the solute molecule has a significant effect on the nature of the observed hydration. Energetic analysis of solute-solvent interactions based on proximity analysis of the solvent reveals that GC or CG base pairs interact more strongly with water molecules in the minor groove of DNA than AT or TA base pairs, while the interactions of AT or TA pairs in the major groove are stronger than those of GC or CG pairs. The solvent-accessible surface areas of the nucleotide units computed from the simulated trajectories agree excellently with results derived from analysis of a database of crystallographic structures. The MD trajectories tend to follow Manning's counterion condensation theory, presenting a region of condensed counterions within a radius of about 17 Å from the DNA surface, independent of sequence. GC and CG pairs tend to associate with cations in the major groove of the DNA structure to a greater extent than AT and TA pairs. Cation association is more frequent in the minor groove of AT pairs than of GC pairs.
In general, the water and ion atmosphere observed around the DNA sequences in the MD simulations is in good agreement with experimental observations.
Affiliation(s)
- Surjit B Dixit
- Chemistry Department and Molecular Biophysics Program, Wesleyan University, Middletown, CT 06457, USA