1
Fuchs S, Engelmann S. Small proteins in bacteria - Big challenges in prediction and identification. Proteomics 2023; 23:e2200421. [PMID: 37609810 DOI: 10.1002/pmic.202200421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/24/2023]
Abstract
Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins in recent years. This has revealed their crucial role in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
Affiliation(s)
- Stephan Fuchs
  - Genome Competence Center (MF1), Department MFI, Robert-Koch-Institut, Berlin, Germany
- Susanne Engelmann
  - Institute for Microbiology, Technische Universität Braunschweig, Braunschweig, Germany
  - Microbial Proteomics, Helmholtzzentrum für Infektionsforschung GmbH, Braunschweig, Germany
2
Meier-Credo J, Heiniger B, Schori C, Rupprecht F, Michel H, Ahrens CH, Langer JD. Detection of Known and Novel Small Proteins in Pseudomonas stutzeri Using a Combination of Bottom-Up and Digest-Free Proteomics and Proteogenomics. Anal Chem 2023; 95:11892-11900. [PMID: 37535005 PMCID: PMC10433244 DOI: 10.1021/acs.analchem.3c00676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 07/24/2023] [Indexed: 08/04/2023]
Abstract
Small proteins of around 50 aa in length have been largely overlooked in genetic and biochemical assays due to the inherent challenges with detecting and characterizing them. Recent discoveries of their critical roles in many biological processes have led to an increased recognition of the importance of small proteins for basic research and as potential new drug targets. One example is CcoM, a 36 aa subunit of the cbb3-type oxidase that plays an essential role in adaptation to oxygen-limited conditions in Pseudomonas stutzeri (P. stutzeri), a model for the clinically relevant, opportunistic pathogen Pseudomonas aeruginosa. However, as no comprehensive data were available in P. stutzeri, we devised an integrated, generic approach to study small proteins more systematically. Using the first complete genome as basis, we conducted bottom-up proteomics analyses and established a digest-free, direct-sequencing proteomics approach to study cells grown under aerobic and oxygen-limiting conditions. Finally, we also applied a proteogenomics pipeline to identify missed protein-coding genes. Overall, we identified 2921 known and 29 novel proteins, many of which were differentially regulated. Among 176 small proteins 16 were novel. Direct sequencing, featuring a specialized precursor acquisition scheme, exhibited advantages in the detection of small proteins with higher (up to 100%) sequence coverage and more spectral counts, including sequences with high proline content. Three novel small proteins, uniquely identified by direct sequencing and not conserved beyond P. stutzeri, were predicted to form an operon with a conserved protein and may represent de novo genes. These data demonstrate the power of this combined approach to study small proteins in P. stutzeri and show its potential for other prokaryotes.
Affiliation(s)
- Jakob Meier-Credo
  - Proteomics, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany
- Benjamin Heiniger
  - Molecular Ecology, Agroscope & SIB Swiss Institute of Bioinformatics, 8046 Zürich, Switzerland
- Christian Schori
  - Molecular Ecology, Agroscope & SIB Swiss Institute of Bioinformatics, 8046 Zürich, Switzerland
- Fiona Rupprecht
  - Proteomics, Max Planck Institute for Brain Research, 60438 Frankfurt am Main, Germany
- Hartmut Michel
  - Department of Molecular Membrane Biology, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany
- Christian H. Ahrens
  - Molecular Ecology, Agroscope & SIB Swiss Institute of Bioinformatics, 8046 Zürich, Switzerland
- Julian D. Langer
  - Proteomics, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany
  - Proteomics, Max Planck Institute for Brain Research, 60438 Frankfurt am Main, Germany
3
Sharma D, Sharma K, Mishra A, Siwach P, Mittal A, Jayaram B. Molecular dynamics simulation-based trinucleotide and tetranucleotide level structural and energy characterization of the functional units of genomic DNA. Phys Chem Chem Phys 2023; 25:7323-7337. [PMID: 36825435 DOI: 10.1039/d2cp04820e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Genomes of most organisms on earth are written in a universal language of life, made up of four units - adenine (A), thymine (T), guanine (G), and cytosine (C), and understanding the way they are put together has been a great challenge to date. Multiple efforts have been made to annotate this wonderfully engineered string of DNA using different methods but they lack a universal character. In this article, we have investigated the structural and energetic profiles of both prokaryotes and eukaryotes by considering two essential genomic sites, viz., the transcription start sites (TSS) and exon-intron boundaries. We have characterized these sites by mapping the structural and energy features of DNA obtained from molecular dynamics simulations, which considers all possible trinucleotide and tetranucleotide steps. For DNA, these physicochemical properties show distinct signatures at the TSS and intron-exon boundaries. Our results firmly convey the idea that DNA uses the same dialect for prokaryotes and eukaryotes and that it is worth going beyond sequence-level analyses to physicochemical space to determine the functional destiny of DNA sequences.
Affiliation(s)
- Dinesh Sharma
  - Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Kopal Sharma
  - Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Akhilesh Mishra
  - Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Priyanka Siwach
  - Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
- Aditya Mittal
  - Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- B Jayaram
  - Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
  - Department of Chemistry, Indian Institute of Technology, Delhi, India
4
Mishra A, Siwach P, Misra P, Dhiman S, Pandey AK, Srivastava P, Jayaram B. Intron exon boundary junctions in human genome have in-built unique structural and energetic signals. Nucleic Acids Res 2021; 49:2674-2683. [PMID: 33621338 PMCID: PMC7969029 DOI: 10.1093/nar/gkab098] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/06/2019] [Revised: 01/21/2021] [Accepted: 02/22/2021] [Indexed: 11/13/2022]
Abstract
Precise identification of correct exon–intron boundaries is a prerequisite to analyze the location and structure of genes. The existing framework for genomic signals, delineating exons and introns in a genomic segment, seems insufficient, predominantly due to poor sequence consensus as well as limitations of training on available experimental data sets. We present here a novel concept for characterizing exon–intron boundaries in genomic segments on the basis of structural and energetic properties. We analyzed boundary junctions on both sides of all the exons (328,368) of protein coding genes from the human genome (GENCODE database) using 28 structural and three energy parameters. Study of sequence conservation at these sites shows very poor consensus. It is observed that DNA adopts a unique structural and energy state at the boundary junctions. Also, signals are somewhat different for housekeeping and tissue specific genes. Clustering of 31 parameters into four derived vectors gives some additional insights into the physical mechanisms involved in this biological process. Sites of structural and energy signals correlate well to the positions playing important roles in pre-mRNA splicing.
Affiliation(s)
- Akhilesh Mishra
  - Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
  - Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Priyanka Siwach
  - Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
  - Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
- Pallavi Misra
  - Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
- Simran Dhiman
  - Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
- Parul Srivastava
  - Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
- B Jayaram
  - Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, India
  - Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
  - Department of Chemistry, Indian Institute of Technology, Delhi, India
5
Petruschke H, Schori C, Canzler S, Riesbeck S, Poehlein A, Daniel R, Frei D, Segessemann T, Zimmerman J, Marinos G, Kaleta C, Jehmlich N, Ahrens CH, von Bergen M. Discovery of novel community-relevant small proteins in a simplified human intestinal microbiome. Microbiome 2021; 9:55. [PMID: 33622394 PMCID: PMC7903761 DOI: 10.1186/s40168-020-00981-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/25/2020] [Accepted: 12/16/2020] [Indexed: 05/13/2023]
Abstract
BACKGROUND The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, modulating immunity and regulating metabolic processes. We studied the simplified human intestinal microbiota (SIHUMIx) consisting of eight bacterial species with a particular focus on the discovery of novel small proteins with less than 100 amino acids (= sProteins), some of which may contribute to shape the simplified human intestinal microbiota. Although sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities. RESULTS We created a multi-species integrated proteogenomics search database (iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were sequenced and de novo assembled. Several proteomics approaches including two earlier optimized sProtein enrichment strategies were applied to specifically increase the chances for novel sProtein discovery. The search of tandem mass spectrometry (MS/MS) data against the multi-species iPtgxDB enabled the identification of 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. The comparison of sProtein expression in each single strain versus a multi-species community cultivation showed that six of these sProteins were only identified in the SIHUMIx community indicating a potentially important role of sProteins in the organization of microbial communities. Two of these novel sProteins have a potential antimicrobial function. Metabolic modelling revealed that a third sProtein is located in a genomic region encoding several enzymes relevant for the community metabolism within SIHUMIx. 
CONCLUSIONS We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in a simplified intestinal model system that can be generically applied to other microbial communities. The further analysis of novel sProteins uniquely expressed in the SIHUMIx multi-species community is expected to enable new insights into the role of sProteins on the functionality of bacterial communities such as those of the human intestinal tract.
Affiliation(s)
- Hannes Petruschke
  - Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Christian Schori
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Sebastian Canzler
  - Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Sarah Riesbeck
  - Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Anja Poehlein
  - Institute of Microbiology and Genetics, Department of Genomic and Applied Microbiology, Georg-August University of Göttingen, Göttingen, Germany
- Rolf Daniel
  - Institute of Microbiology and Genetics, Department of Genomic and Applied Microbiology, Georg-August University of Göttingen, Göttingen, Germany
- Daniel Frei
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Tina Segessemann
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Johannes Zimmerman
  - Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
- Georgios Marinos
  - Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
- Christoph Kaleta
  - Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
- Nico Jehmlich
  - Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Christian H Ahrens
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Martin von Bergen
  - Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
  - Institute of Biochemistry, Faculty of Biosciences, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany
6
Varadarajan AR, Allan RN, Valentin JDP, Castañeda Ocampo OE, Somerville V, Pietsch F, Buhmann MT, West J, Skipp PJ, van der Mei HC, Ren Q, Schreiber F, Webb JS, Ahrens CH. An integrated model system to gain mechanistic insights into biofilm-associated antimicrobial resistance in Pseudomonas aeruginosa MPAO1. NPJ Biofilms Microbiomes 2020; 6:46. [PMID: 33127897 PMCID: PMC7603352 DOI: 10.1038/s41522-020-00154-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/11/2020] [Accepted: 10/07/2020] [Indexed: 12/11/2022]
Abstract
Pseudomonas aeruginosa MPAO1 is the parental strain of the widely utilized transposon mutant collection for this important clinical pathogen. Here, we validate a model system to identify genes involved in biofilm growth and biofilm-associated antibiotic resistance. Our model employs a genomics-driven workflow to assemble the complete MPAO1 genome, identify unique and conserved genes by comparative genomics with the PAO1 reference strain and genes missed within existing assemblies by proteogenomics. Among over 200 unique MPAO1 genes, we identified six general essential genes that were overlooked when mapping public Tn-seq data sets against PAO1, including an antitoxin. Genomic data were integrated with phenotypic data from an experimental workflow using a user-friendly, soft lithography-based microfluidic flow chamber for biofilm growth and a screen with the Tn-mutant library in microtiter plates. The screen identified hitherto unknown genes involved in biofilm growth and antibiotic resistance. Experiments conducted with the flow chamber across three laboratories delivered reproducible data on P. aeruginosa biofilms and validated the function of both known genes and genes identified in the Tn-mutant screens. Differential protein abundance data from planktonic cells versus biofilm confirmed the upregulation of candidates known to affect biofilm formation, of structural and secreted proteins of type VI secretion systems, and provided proteogenomic evidence for some missed MPAO1 genes. This integrated, broadly applicable model promises to improve the mechanistic understanding of biofilm formation, antimicrobial tolerance, and resistance evolution in biofilms.
Affiliation(s)
- Adithi R Varadarajan
  - Research Group Molecular Diagnostics Genomics & Bioinformatics, Agroscope and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Raymond N Allan
  - School of Biological Sciences and Institute for Life Sciences, University of Southampton, Southampton, SO17 1BJ, UK
  - National Biofilms Innovation Centre, University of Southampton, Southampton, SO17 1BJ, UK
  - School of Pharmacy, Faculty of Health and Life Sciences, De Montfort University, Leicester, LE1 9BH, UK
- Jules D P Valentin
  - Laboratory for Biointerfaces, Empa, Swiss Federal Laboratories for Materials Science and Technology, St. Gallen, Switzerland
  - Department of BioMedical Engineering, University of Groningen and University Medical Center Groningen, Groningen, Netherlands
- Olga E Castañeda Ocampo
  - Department of BioMedical Engineering, University of Groningen and University Medical Center Groningen, Groningen, Netherlands
- Vincent Somerville
  - Research Group Molecular Diagnostics Genomics & Bioinformatics, Agroscope and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Franziska Pietsch
  - Division of Biodeterioration and Reference Organisms, Federal Institute for Materials Research and Testing (BAM), Berlin, Germany
- Matthias T Buhmann
  - Laboratory for Biointerfaces, Empa, Swiss Federal Laboratories for Materials Science and Technology, St. Gallen, Switzerland
- Jonathan West
  - Faculty of Medicine, University of Southampton, Southampton, SO17 1BJ, UK
  - Centre for Hybrid Biodevices, University of Southampton, Southampton, SO17 1BJ, UK
- Paul J Skipp
  - Centre for Proteomics Research, University of Southampton, Southampton, SO17 1BJ, UK
- Henny C van der Mei
  - Department of BioMedical Engineering, University of Groningen and University Medical Center Groningen, Groningen, Netherlands
- Qun Ren
  - Laboratory for Biointerfaces, Empa, Swiss Federal Laboratories for Materials Science and Technology, St. Gallen, Switzerland
- Frank Schreiber
  - Division of Biodeterioration and Reference Organisms, Federal Institute for Materials Research and Testing (BAM), Berlin, Germany
- Jeremy S Webb
  - School of Biological Sciences and Institute for Life Sciences, University of Southampton, Southampton, SO17 1BJ, UK
  - National Biofilms Innovation Centre, University of Southampton, Southampton, SO17 1BJ, UK
- Christian H Ahrens
  - Research Group Molecular Diagnostics Genomics & Bioinformatics, Agroscope and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
7
Mishra A, Dhanda S, Siwach P, Aggarwal S, Jayaram B. A novel method SEProm for prokaryotic promoter prediction based on DNA structure and energetics. Bioinformatics 2020; 36:2375-2384. [PMID: 31909789 DOI: 10.1093/bioinformatics/btz941] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/24/2019] [Revised: 11/08/2019] [Accepted: 01/02/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION Despite conservation in general architecture of promoters and protein-DNA interaction interface of RNA polymerases among various prokaryotes, identification of promoter regions in the whole genome sequences remains a daunting challenge. The available tools for promoter prediction do not seem to address the problem satisfactorily, apparently because the biochemical nature of promoter signals is yet to be understood fully. Using 28 structural and 3 energetic parameters, we found that prokaryotic promoter regions have a unique structural and energy state, quite distinct from that of coding regions and the information for this signature state is in-built in their sequences. We developed a novel promoter prediction tool from these 31 parameters using various statistical techniques. RESULTS Here, we introduce SEProm, a novel tool that is developed by studying and utilizing the in-built structural and energy information of DNA sequences, which is applicable to all prokaryotes including archaea. Compared to five most recent, diverged and current best available tools, SEProm performs much better, predicting promoters with an 'F-value' of 82.04 and 'Precision' of 81.08. The next best 'F-value' was obtained with PromPredict (72.14) followed by BProm (68.37). On the basis of 'Precision' value, the next best 'Precision' was observed for Pepper (75.39) followed by PromPredict (72.01). SEProm maintained the lead even when comparison was done on two test organisms (not involved in training for SEProm). AVAILABILITY AND IMPLEMENTATION The software is freely available with easy to follow instructions (www.scfbio-iitd.res.in/software/TSS_Predict.jsp). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
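The abstract above reports an 'F-value' and a 'Precision' but no recall. As a minimal sketch, assuming the 'F-value' is the standard F1 score (the harmonic mean of precision and recall, which the abstract does not state), the implied recall can be back-computed. The function names are illustrative and do not come from SEProm.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


def recall_from_f(f, precision):
    """Invert F1 for recall: R = F*P / (2P - F). Only valid when 2P > F."""
    denom = 2 * precision - f
    if denom <= 0:
        raise ValueError("no finite recall is consistent with these values")
    return f * precision / denom


# Values quoted in the abstract (percent scale); the F1 reading is an assumption.
implied_recall = recall_from_f(82.04, 81.08)
print(round(implied_recall, 2))
```

Under that assumption the implied recall comes out at roughly 83, slightly above the reported precision. If the authors used a different F definition, this figure does not apply.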
Affiliation(s)
- Akhilesh Mishra
  - Supercomputing Facility for Bioinformatics & Computational Biology
  - Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India
- Sahil Dhanda
  - Supercomputing Facility for Bioinformatics & Computational Biology
- Priyanka Siwach
  - Supercomputing Facility for Bioinformatics & Computational Biology
  - Department of Biotechnology, Chaudhary Devi Lal University, Sirsa 125055, India
- Shruti Aggarwal
  - Supercomputing Facility for Bioinformatics & Computational Biology
- B Jayaram
  - Supercomputing Facility for Bioinformatics & Computational Biology
  - Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India
  - Department of Chemistry, Indian Institute of Technology, New Delhi 110016, India
8
A disclosure of hidden secrets in human cytomegalovirus: An in-silico study of identification of novel genes and their analysis for vaccine development. Meta Gene 2020. [DOI: 10.1016/j.mgene.2020.100754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
9
Bartel J, Varadarajan AR, Sura T, Ahrens CH, Maaß S, Becher D. Optimized Proteomics Workflow for the Detection of Small Proteins. J Proteome Res 2020; 19:4004-4018. [DOI: 10.1021/acs.jproteome.0c00286] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/22/2022]
Affiliation(s)
- Jürgen Bartel
  - Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
- Adithi R. Varadarajan
  - Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Thomas Sura
  - Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
- Christian H. Ahrens
  - Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Sandra Maaß
  - Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
- Dörte Becher
  - Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
10
Bhat R, Kaushik R, Singh A, DasGupta D, Jayaraj A, Soni A, Shandilya A, Shekhar V, Shekhar S, Jayaram B. A comprehensive automated computer-aided discovery pipeline from genomes to hit molecules. Chem Eng Sci 2020. [DOI: 10.1016/j.ces.2020.115711] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/03/2023]
11
Melior H, Maaß S, Li S, Förstner KU, Azarderakhsh S, Varadarajan AR, Stötzel M, Elhossary M, Barth-Weber S, Ahrens CH, Becher D, Evguenieva-Hackenberg E. The Leader Peptide peTrpL Forms Antibiotic-Containing Ribonucleoprotein Complexes for Posttranscriptional Regulation of Multiresistance Genes. mBio 2020; 11:e01027-20. [PMID: 32546623 PMCID: PMC7298713 DOI: 10.1128/mbio.01027-20] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/22/2020] [Accepted: 05/07/2020] [Indexed: 11/20/2022]
Abstract
Bacterial ribosome-dependent attenuators are widespread posttranscriptional regulators. They harbor small upstream open reading frames (uORFs) encoding leader peptides, for which no functions in trans are known yet. In the plant symbiont Sinorhizobium meliloti, the tryptophan biosynthesis gene trpE(G) is preceded by the uORF trpL and is regulated by transcription attenuation according to tryptophan availability. However, trpLE(G) transcription is initiated independently of the tryptophan level in S. meliloti, thereby ensuring a largely tryptophan-independent production of the leader peptide peTrpL. Here, we provide evidence for a tryptophan-independent role of peTrpL in trans. We found that peTrpL increases the resistance toward tetracycline, erythromycin, chloramphenicol, and the flavonoid genistein, which are substrates of the major multidrug efflux pump SmeAB. Coimmunoprecipitation with a FLAG-peTrpL suggested smeR mRNA, which encodes the transcription repressor of smeABR, as a peptide target. Indeed, upon antibiotic exposure, smeR mRNA was destabilized and smeA stabilized in a peTrpL-dependent manner, showing that peTrpL acts in the differential regulation of smeABR. Furthermore, smeR mRNA was coimmunoprecipitated with peTrpL in antibiotic-dependent ribonucleoprotein (ARNP) complexes, which, in addition, contained an antibiotic-induced antisense RNA complementary to smeR. In vitro ARNP reconstitution revealed that the above-mentioned antibiotics and genistein directly support complex formation. A specific region of the antisense RNA was identified as a seed region for ARNP assembly in vitro. Altogether, our data show that peTrpL is involved in a mechanism for direct utilization of antimicrobial compounds in posttranscriptional regulation of multiresistance genes. Importantly, this role of peTrpL in resistance is conserved in other Alphaproteobacteria.
IMPORTANCE Leader peptides encoded by transcription attenuators are widespread small proteins that are considered nonfunctional in trans. We found that the leader peptide peTrpL of the soil-dwelling plant symbiont Sinorhizobium meliloti is required for differential, posttranscriptional regulation of a multidrug resistance operon upon antibiotic exposure. Multiresistance achieved by efflux of different antimicrobial compounds ensures survival and competitiveness in nature and is important from both evolutionary and medical points of view. We show that the leader peptide forms antibiotic- and flavonoid-dependent ribonucleoprotein complexes (ARNPs) for destabilization of smeR mRNA encoding the transcription repressor of the major multidrug resistance operon. The seed region for ARNP assembly was localized in an antisense RNA, whose transcription is induced by antimicrobial compounds. The discovery of ARNP complexes as new players in multiresistance regulation opens new perspectives in understanding bacterial physiology and evolution and potentially provides new targets for antibacterial control.
Affiliation(s)
- Hendrik Melior
  - Institute of Microbiology and Molecular Biology, University of Giessen, Giessen, Germany
- Sandra Maaß
  - Institute of Microbiology, University of Greifswald, Greifswald, Germany
- Siqi Li
  - Institute of Microbiology and Molecular Biology, University of Giessen, Giessen, Germany
- Konrad U Förstner
  - ZB MED-Information Centre for Life Sciences, University of Cologne, Cologne, Germany
- Saina Azarderakhsh
  - Institute of Microbiology and Molecular Biology, University of Giessen, Giessen, Germany
- Maximilian Stötzel
  - Institute of Microbiology and Molecular Biology, University of Giessen, Giessen, Germany
- Muhammad Elhossary
  - ZB MED-Information Centre for Life Sciences, University of Cologne, Cologne, Germany
- Susanne Barth-Weber
  - Institute of Microbiology and Molecular Biology, University of Giessen, Giessen, Germany
- Christian H Ahrens
  - Agroscope & SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Dörte Becher
  - Institute of Microbiology, University of Greifswald, Greifswald, Germany
12
Pant P, Pathak A, Jayaram B. Symmetrization of the backbone of nucleic acids: a molecular dynamics study. J Biomol Struct Dyn 2019; 38:673-681. [DOI: 10.1080/07391102.2019.1585292] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/16/2023]
Affiliation(s)
- Pradeep Pant
- Department of Chemistry, Indian Institute of Technology Delhi, New Delhi, India
- Supercomputing Facility for Bioinformatics & Computational Biology, New Delhi, India
| | - Amita Pathak
- Department of Chemistry, Indian Institute of Technology Delhi, New Delhi, India
- Supercomputing Facility for Bioinformatics & Computational Biology, New Delhi, India
| | - B. Jayaram
- Department of Chemistry, Indian Institute of Technology Delhi, New Delhi, India
- Supercomputing Facility for Bioinformatics & Computational Biology, New Delhi, India
- Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, India
|
13
|
Abstract
Gene prediction, also known as gene identification, gene finding, gene recognition, or gene discovery, is one of the important problems of molecular biology and is receiving increasing attention due to the advent of large-scale genome sequencing projects. We designed an ab initio model (called ChemGenome) for gene prediction in prokaryotic genomes based on physicochemical characteristics of codons. In this chapter, we present the methodology of the latest version of this model, ChemGenome2.1 (CG2.1). The first module of the protocol builds a three-dimensional vector from three calculated quantities for each codon: the double-helical trinucleotide base pairing energy, the base pair stacking energy, and an index of the propensity of a codon for protein-nucleic acid interactions. As this three-dimensional vector moves along any genome, the net orientation of the resultant vector should differ significantly for gene and non-genic regions to make a distinction feasible. The putative protein-coding genes predicted from the above parameters are passed through a second module of the protocol, which reduces the number of false positives by utilizing a filter based on stereochemical properties of protein sequences. The chemical properties of amino acid side chains taken into consideration are the presence of sp3 hybridized γ carbon atom, hydrogen bond donor ability, short/absence of δ carbon and linearity of the side chains/non-occurrence of bi-dentate forks with terminal hydrogen atoms in the side chain. The final prediction of the potential protein-coding genes is based on the frequency of occurrence of amino acids in the predicted protein sequences and their deviation from the frequency values of Swissprot protein sequences, both at monomer and tripeptide levels. The final screening is based on Z-score.
Though CG2.1 is a gene finding tool for prokaryotes, considering the underlying similarity in the chemical and physical properties of DNA among prokaryotes and eukaryotes, we attempted to evaluate its applicability for gene finding in the lower eukaryotes. The results give hope that the concept of gene finding based on a physicochemical model of codons is a viable idea for eukaryotes as well, though, undoubtedly, improvements are needed.
Affiliation(s)
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
- Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, India
| | - Priyanka Siwach
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
- Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
| | - Poonam Singhal
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
| | - B Jayaram
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India.
- Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, India.
- Department of Chemistry, Indian Institute of Technology Delhi, New Delhi, India.
|
14
|
Jaiswal AK, Krishnamachari A. Physicochemical property based computational scheme for classifying DNA sequence elements of Saccharomyces cerevisiae. Comput Biol Chem 2018; 79:193-201. [PMID: 30711426 DOI: 10.1016/j.compbiolchem.2018.12.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/12/2018] [Accepted: 12/25/2018] [Indexed: 10/27/2022]
Abstract
Generation of huge "omics" data necessitates the development and application of computational methods to annotate the data in terms of biological features. In the context of DNA sequence, it is important to unravel the hidden physicochemical signatures. For this purpose, we have considered various sequence elements such as promoter, ACS, LTRs, telomere, and retrotransposon of the model organism Saccharomyces cerevisiae. Contributions due to di-nucleotides play a major role in studying the DNA conformation profile. The physicochemical parameters used are hydrogen bonding energy, stacking energy and solvation energy per base pair. Our computational study shows that all sequence elements in this study have distinctive physicochemical signatures and the same can be exploited for prediction experiments. The order that we see in a DNA sequence is dictated by biological regions and hence there exists a role of dependency in the sequence makeup. Keeping this in mind, we propose two computational schemes: (a) using a windowing block size procedure and (b) using di-nucleotide transitions. We obtained a better discriminating profile when we analyzed the sequence data in a windowing manner. In the second novel approach, we introduced the di-nucleotide transition probability matrix (DTPM) to study the hidden layer of information embedded in the sequences. DTPM has been used as weights for scanning and predictions. This proposed computational scheme incorporates the memory property, which is more realistic for studying the physicochemical properties embedded in DNA sequences. Our analysis shows that the DTPM scheme performs better than the existing method in this applied region. Characterization of these elements will be a key to genome editing applications, and advanced machine learning approaches may also require such distinctive profiles as useful input features.
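As a rough illustration of what a di-nucleotide transition probability matrix could look like, the sketch below estimates P(next dinucleotide | current dinucleotide) from a single sequence. The abstract does not define the DTPM in detail, so the stride-one reading of dinucleotides, the row normalisation, and the function name `transition_matrix` are all assumptions made for this example rather than the authors' scheme.

```python
from collections import defaultdict
from itertools import product

# All 16 dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def transition_matrix(sequence):
    """Estimate P(next dinucleotide | current dinucleotide) from one sequence.

    Assumption: dinucleotides are read with a stride of one base,
    so 'ACG' contributes the single transition AC -> CG.
    """
    seq = sequence.upper()
    counts = {d: defaultdict(int) for d in DINUCLEOTIDES}
    for i in range(len(seq) - 2):
        current, following = seq[i:i + 2], seq[i + 1:i + 3]
        # Skip windows containing ambiguous bases such as N.
        if current in counts and following in counts:
            counts[current][following] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        # Rows with no observations are left as all zeros.
        matrix[current] = {
            d: (row[d] / total if total else 0.0) for d in DINUCLEOTIDES
        }
    return matrix


if __name__ == "__main__":
    m = transition_matrix("ACGACGACG")
    print(m["AC"]["CG"])
```

A matrix estimated this way from a set of known elements could then serve as weights when scanning new sequences, which is the use the abstract describes; how the weights are combined into a score is not specified there and is left out of the sketch.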
Affiliation(s)
- Atul Kumar Jaiswal
- School of Computational and Integrative Sciences, JNU, New Delhi, 110067, India
|
15
|
Mishra A, Siwach P, Misra P, Jayaram B, Bansal M, Olson WK, Thayer KM, Beveridge DL. Toward a Universal Structural and Energetic Model for Prokaryotic Promoters. Biophys J 2018; 115:1180-1189. [PMID: 30172386 DOI: 10.1016/j.bpj.2018.08.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/15/2018] [Revised: 07/28/2018] [Accepted: 08/02/2018] [Indexed: 01/04/2023] Open
Abstract
With almost no consensus promoter sequence in prokaryotes, recruitment of RNA polymerase (RNAP) to precise transcriptional start sites (TSSs) has remained an unsolved puzzle. Uncovering the underlying mechanism is critical for understanding the principle of gene regulation. We attempted to search the hidden code in ∼16,500 promoters of 12 prokaryotes representing two kingdoms in their structure and energetics. Twenty-eight fundamental parameters of DNA structure including backbone angles, basepair axis, and interbasepair and intrabasepair parameters were used, and information was extracted from x-ray crystallography data. Three parameters (solvation energy, hydrogen-bond energy, and stacking energy) were selected for creating energetics profiles using in-house programs. DNA of promoter regions was found to be inherently designed to undergo a change in every parameter undertaken for the study, in all prokaryotes. The change starts from some distance upstream of TSSs and continues past some distance from TSS, hence giving a signature state to promoter regions. These signature states might be the universal hidden codes recognized by RNAP. This observation was reiterated when randomly selected promoter sequences (with little sequence conservation) were subjected to structure generation; all developed into very similar three-dimensional structures quite distinct from those of conventional B-DNA and coding sequences. Fine structural details at important motifs (viz. -11, -35, and -75 positions relative to TSS) of promoters reveal novel to our knowledge and pointed insights for RNAP interaction at these locations; it could be correlated with how some particular structural changes at the -11 region may allow insertion of RNAP amino acids in interbasepair space as well as facilitate the flipping out of bases from the DNA duplex.
Affiliation(s)
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
| | - Priyanka Siwach
- Supercomputing Facility for Bioinformatics & Computational Biology; Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
| | - Pallavi Misra
- Supercomputing Facility for Bioinformatics & Computational Biology
| | - Bhyravabhotla Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India; Department of Chemistry, Indian Institute of Technology, Delhi, India.
| | - Manju Bansal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
| | - Wilma K Olson
- Department of Chemistry & Chemical Biology and BioMaPS Institute for Quantitative Biology, Rutgers, Piscataway, New Jersey
| | - Kelly M Thayer
- Department of Chemistry, Vassar College, Poughkeepsie, New York
| | - David L Beveridge
- Departments of Chemistry, Molecular Biology, and Biochemistry and Molecular Biophysics Program, Wesleyan University, Middletown, Connecticut
|
16
|
Omasits U, Varadarajan AR, Schmid M, Goetze S, Melidis D, Bourqui M, Nikolayeva O, Québatte M, Patrignani A, Dehio C, Frey JE, Robinson MD, Wollscheid B, Ahrens CH. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Res 2017; 27:2083-2095. [PMID: 29141959 PMCID: PMC5741054 DOI: 10.1101/gr.218255.116] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 11/11/2016] [Accepted: 10/25/2017] [Indexed: 12/18/2022]
Abstract
Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.
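The in silico ORF component mentioned above (a six-frame translation that considers alternative start codons) can be sketched in a few lines. This is only an illustration, not the iPtgxDB software: the choice of ATG, GTG and TTG as start codons, the rule of keeping the first start codon after each stop, the `min_aa` cutoff, and the function names are assumptions for this example.

```python
# Standard genetic code, built in the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO)
)
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial starts
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_orfs(dna, min_aa=10):
    """Return (strand, frame, start, protein) for ORFs in all six frames.

    Each ORF runs from the first start codon seen after the previous stop
    to the next in-frame stop; ORFs without a stop codon are ignored.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif codon in STOP_CODONS:
                    if start is not None and (i - start) // 3 >= min_aa:
                        # The initiator codon is read as Met whatever its sequence.
                        body = "".join(
                            CODON_TABLE.get(seq[j:j + 3], "X")
                            for j in range(start + 3, i, 3)
                        )
                        orfs.append((strand, frame, start, "M" + body))
                    start = None
    return orfs


if __name__ == "__main__":
    print(six_frame_orfs("ATGAAATTTTAA", min_aa=1))
```

Coordinates on the "-" strand refer to the reverse-complemented sequence. Lowering `min_aa` is what lets short ORFs through, at the cost of many more spurious candidates, which is the trade-off the abstract addresses with peptide evidence.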
Affiliation(s)
- Ulrich Omasits
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Adithi R Varadarajan
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland.,Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
| | - Michael Schmid
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Sandra Goetze
- Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
| | - Damianos Melidis
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Marc Bourqui
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Olga Nikolayeva
- Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
| | | | - Andrea Patrignani
- Functional Genomics Center Zurich, ETH & UZH Zurich, CH-8057 Zurich, Switzerland
| | | | - Juerg E Frey
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Mark D Robinson
- Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
| | - Bernd Wollscheid
- Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
| | - Christian H Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
|
17
|
Singh A, Mishra A, Khosravi A, Khandelwal G, Jayaram B. Physico-chemical fingerprinting of RNA genes. Nucleic Acids Res 2017; 45:e47. [PMID: 27932456 PMCID: PMC5397174 DOI: 10.1093/nar/gkw1236] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/09/2016] [Accepted: 11/29/2016] [Indexed: 12/13/2022] Open
Abstract
We advance here a novel concept for characterizing different classes of RNA genes on the basis of physico-chemical properties of DNA sequences. As knowledge-based approaches could yield unsatisfactory outcomes due to limitations of training on available experimental data sets, alternative approaches that utilize properties intrinsic to DNA are needed to supplement training-based methods and to eventually provide molecular insights into genome organization. Based on a comprehensive series of molecular dynamics simulations of the Ascona B-DNA consortium, we extracted hydrogen bonding, stacking and solvation energies of all combinations of DNA sequences at the dinucleotide level and calculated these properties for different types of RNA genes. Considering ∼7.3 million mRNA, 255 524 tRNA, 40 649 rRNA (different subunits), 5250 miRNA and 3747 snRNA gene sequences from 9282 complete genome chromosomes of all prokaryotes and eukaryotes available at NCBI, we observed that physico-chemical properties of different functional units on genomic DNA differ in their signatures.
Affiliation(s)
- Ankita Singh
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
| | - Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India.,Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
| | - Ali Khosravi
- Ale-Taha Institute of Higher Education, Tehran, Iran
| | - Garima Khandelwal
- Cancer Research UK Manchester Institute, The University of Manchester, Wilmslow Road, Manchester M20 4BX, UK
| | - B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India.,Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India.,Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
|
18
|
A statistical thermodynamic model for investigating the stability of DNA sequences from oligonucleotides to genomes. Biophys J 2015; 106:2465-73. [PMID: 24896126 DOI: 10.1016/j.bpj.2014.04.029] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 10/01/2013] [Revised: 03/20/2014] [Accepted: 04/17/2014] [Indexed: 12/12/2022] Open
Abstract
We describe the development and testing of a simple statistical mechanics methodology for duplex DNA applicable to sequences of any composition and extensible to genomes. The microstates of a DNA sequence are modeled in terms of blocks of basepairs that are assumed to be fully closed (paired) or open. This approach generates an ensemble of bubblelike microstates that are used to calculate the corresponding partition function. The energies of the microstates are calculated as additive contributions from hydrogen bonding, basepair stacking, and solvation terms parameterized from a comprehensive series of molecular dynamics simulations including solvent and ions. Thermodynamic properties and nucleotide stability constants for DNA sequences follow directly from the partition function. The methodology was tested by comparing computed free energies per basepair with the experimental melting temperatures of 60 oligonucleotides, yielding a correlation coefficient of -0.96. The thermodynamic stability of genic/nongenic regions was tested in terms of nucleotide stability constants versus sequence for the Escherichia coli K-12 genome. It showed clear differentiation of the genes from promoters and captures genic regions with a sensitivity of 0.94. The statistical thermodynamic model presented here provides a seemingly new handle on the challenging problem of interpreting genomic sequences.
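The general idea of building a partition function from open and closed blocks of basepairs can be illustrated with a toy calculation. The sketch below is not the published model: it assumes that each microstate has at most one contiguous open block, that a microstate's energy is simply the sum of hypothetical per-basepair energies (in kcal/mol) over the closed pairs, and that open pairs contribute zero. The function name and the numbers are invented for illustration.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def closed_probabilities(pair_energies, temperature=298.15):
    """Return (Z, per-basepair probability of being closed).

    Microstates: the fully closed duplex plus every contiguous open
    block [i, j). This single-bubble restriction is an assumption.
    """
    n = len(pair_energies)
    states = [(0, 0)] + [(i, j) for i in range(n) for j in range(i + 1, n + 1)]
    weights = []
    for i, j in states:
        # Only the closed pairs outside the open block contribute energy.
        energy = sum(pair_energies[:i]) + sum(pair_energies[j:])
        weights.append(math.exp(-energy / (R * temperature)))
    z = sum(weights)
    probabilities = []
    for k in range(n):
        closed = sum(
            w for (i, j), w in zip(states, weights) if not (i <= k < j)
        )
        probabilities.append(closed / z)
    return z, probabilities


if __name__ == "__main__":
    # Hypothetical stabilisation energies for a three-basepair duplex.
    z, probs = closed_probabilities([-1.0, -2.0, -1.0])
    print(z, probs)
```

In this toy setting, basepairs with more negative energies come out with a higher probability of being closed, which is the qualitative behaviour a stability profile along a sequence would need; the actual parameterisation from molecular dynamics and the definition of the nucleotide stability constants are described only in the paper itself.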
|
19
|
Gupta S, Chavan S, Deobagkar DN, Deobagkar DD. Bio/chemoinformatics in India: an outlook. Brief Bioinform 2014; 16:710-31. [PMID: 25159593 DOI: 10.1093/bib/bbu028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/21/2014] [Accepted: 07/28/2014] [Indexed: 12/25/2022] Open
Abstract
With the advent of significant establishment and development of Internet facilities and computational infrastructure, an overview on bio/chemoinformatics is presented along with its multidisciplinary facts, promises and challenges. The Government of India has paved the way for more profound research in the biological field with the use of computational facilities and schemes/projects to collaborate with scientists from different disciplines. Simultaneously, the growth of available biomedical data has provided fresh insight into the nature of redundant and compensatory data. Today, bioinformatics research in India is characterized by powerful grid computing systems, a great variety of biological questions addressed and close collaborations between scientists and clinicians, with a full spectrum of focuses ranging from database building and methods development to biological discoveries. In fact, this outlook provides a resourceful platform highlighting the funding agencies, institutes and industries working in this direction, which would certainly be of great help to students seeking a career in bioinformatics. Thus, in short, this review highlights the current bio/chemoinformatics trend, education, status, diverse applicability and demands for further development.
|
20
|
Soni A, Pandey KM, Ray P, Jayaram B. Genomes to hits in silico - a country path today, a highway tomorrow: a case study of chikungunya. Curr Pharm Des 2013; 19:4687-700. [PMID: 23260020 PMCID: PMC3831887 DOI: 10.2174/13816128113199990379] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 10/21/2012] [Accepted: 12/17/2012] [Indexed: 12/11/2022]
Abstract
These are exciting times for bioinformaticians, computational biologists and drug designers with the genome and proteome sequences and related structural databases growing at an accelerated pace. The post-genomic era has triggered high expectations for a rapid and successful treatment of diseases. However, in this biological information rich and functional knowledge poor scenario, the challenges are indeed grand, no less than the assembly of the genome of the whole organism. These include functional annotation of genes, identification of druggable targets, prediction of three-dimensional structures of protein targets from their amino acid sequences, arriving at lead compounds for these targets followed by a transition from bench to bedside. We propose here a "Genome to Hits In Silico" strategy (called Dhanvantari) and illustrate it on Chikungunya virus (CHIKV). "Genome to hits" is a novel pathway incorporating a series of steps such as gene prediction, protein tertiary structure determination, active site identification, hit molecule generation, docking and scoring of hits to arrive at lead compounds. The current state of the art for each of the steps in the pathway is highlighted and the feasibility of creating an automated genome to hits assembly line is discussed.
Affiliation(s)
- Anjali Soni
- Department of Chemistry, Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India.
|
21
|
Abstract
We present here a novel methodology for predicting new genes in prokaryotic genomes on the basis of inherent energetics of DNA. Regions of higher thermodynamic stability were identified, which were filtered based on already known annotations to yield a set of potentially new genes. These were then processed for their compatibility with the stereo-chemical properties of proteins and tripeptide frequencies of proteins in Swissprot data, which results in a reliable set of new genes in a genome. Quite surprisingly, the methodology identifies new genes even in well-annotated genomes. Also, the methodology can handle genomes of any GC-content, size and number of annotated genes.
|
22
|
Abstract
This article provides a retrospective on the ABC initiative in the area of all-atom molecular dynamics (MD) simulations including explicit solvent on all tetranucleotide steps of duplex B-form DNA, ca. 2012. The ABC consortium has completed two phases of simulations, the most current being a set of 50-100 trajectories based on the AMBER ff99 force field together with the parmbsc0 modification. Some general perspectives on the field of MD on DNA and sequence effects on DNA structure are provided, followed by an overview of our MD results, including a detailed comparison of the ff99/parmbsc0 results with crystal and NMR structures available for d(CGCGAATTCGCG). Some projects inspired by or related to the ABC initiative and database are also reviewed, including methods for the trajectory analyses, informatics of dealing with the large database of results, compressions of trajectories for efficacy of distribution, DNA solvation by water and ions, parameterization of coarse-grained models with applications and gene finding and genome annotation.
Affiliation(s)
- David L Beveridge
- Department of Chemistry and Molecular Biophysics Program, Wesleyan University Middletown, CT 06459, USA.
|
23
|
Khandelwal G, Jayaram B. DNA-water interactions distinguish messenger RNA genes from transfer RNA genes. J Am Chem Soc 2012; 134:8814-6. [PMID: 22551381 DOI: 10.1021/ja3020956] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/28/2022]
Abstract
The use of physicochemical properties of DNA sequences as a guide to developing insights into genome organization has received little attention. Here, we utilize the energetics of DNA to further advance the knowledge on its language at a molecular level. Specifically, we ask the question whether physicochemical properties of different functional units on genomes differ. We extract intramolecular and solvation energies of different DNA base pair steps from a comprehensive set of molecular dynamics simulations. We then investigate the solvation behavior of DNA sequences coding for mRNAs and tRNAs. Distinguishing mRNA genes from tRNA genes is a tricky problem in genome annotation without assumptions on length of DNA and secondary structure of the product of transcription. We find that solvation energetics of DNA behaves as an extremely efficient property in discriminating 2,063,537 genes coding for mRNAs from 56,251 genes coding for tRNAs in all (~1500) completely sequenced prokaryotic genomes.
Affiliation(s)
- Garima Khandelwal
- Department of Chemistry, Indian Institute of Technology Delhi, Hauz Khas, New Delhi-110016, India
|
24
|
Abstract
It has been known for decades that DNA is extremely flexible and polymorphic, but our knowledge of its accessible conformational space remains limited. Structural data, primarily from X-ray diffraction studies, is sparse in comparison to the manifold configurations possible, and direct experimental examinations of DNA's flexibility still suffer from many limitations. In the face of these shortcomings, molecular dynamics (MD) is now an essential tool in the study of DNA. It affords detailed structural and dynamical insights, which explains its recent transition from a small number of highly specialized laboratories to a large variety of groups dealing with challenging biological problems. MD is now making an irreversible journey to the mainstream of research in biology, with the attendant opportunities and challenges. But given the speed with which MD studies of DNA have spread, the roots remain somewhat shallow: in many cases, there is a lack of deep knowledge about the foundations, strengths, and limits of the technique. In this Account, we discuss how MD has become the most important source of structural and flexibility data on DNA, focusing on advances since 2007 of atomistic MD in the description of DNA under near-physiological conditions and highlighting the possibilities and shortcomings of the technique. The evolution in the field over the past four years is a prelude to the ongoing revolution. The technique has gained in robustness and predictive power, which when coupled with the spectacular improvements in software and hardware has enabled the tackling of systems of increasing complexity. Simulation times of microseconds have now been achieved, with even longer times when specialized hardware is used. As a result, we have seen the first real-time simulation of large conformational transitions, including folding and unfolding of short DNA duplexes. 
Noteworthy advances have also been made in the study of DNA-ligand interactions, and we predict that a global thermodynamic and kinetic picture of the binding landscape of DNA will become available in a few years. MD will become a crucial tool in areas such as biomolecular engineering and synthetic biology. MD has also been shown to be an excellent source of parameters for mesoscopic models of DNA flexibility. Such models can be refined through atomistic MD simulations on small duplexes and then applied to the study of entire chromosomes. Recent evidence suggests that MD-derived elastic models can successfully predict the position of regulatory regions in DNA and can help advance our understanding of nucleosome positioning and chromatin plasticity. If these results are confirmed, MD simulations can become the ultimate tool to decipher a physical code that can contribute to gene regulation. We are entering the golden age of MD simulations of DNA. Undoubtedly, the expectations are high, but the challenges are also enormous. These include the need for more accurate potential energy functionals and for longer and more complex simulations in more realistic systems. The joint research effort of several groups will be crucial for adapting the technique to the requirements of the coming decade.
Affiliation(s)
- Alberto Pérez
- Joint IRB-BSC Program in Computational Biology, Institute of Research in Biomedicine Barcelona, Baldiri i Reixac 10, Barcelona 08028, Spain
| | - F. Javier Luque
- Department de Fisicoquímica and Institut de Biomedicina (IBUB), Facultat de Farmàcia, Universitat de Barcelona, Avgda Diagonal 643, Barcelona 08028, Spain
| | - Modesto Orozco
- Joint IRB-BSC Program in Computational Biology, Institute of Research in Biomedicine Barcelona, Baldiri i Reixac 10, Barcelona 08028, Spain
- Departament de Bioquímica, Universitat de Barcelona, Avgda Diagonal 647, Barcelona 08028, Spain, and Instituto Nacional de Bioinformàtica, Parc Científic de Barcelona, Baldiri i Reixac 10, Barcelona 08028, Spain
|
25
|
Torella R, Moroni E, Caselle M, Morra G, Colombo G. Investigating dynamic and energetic determinants of protein nucleic acid recognition: analysis of the zinc finger zif268-DNA complexes. BMC Struct Biol 2010; 10:42. [PMID: 21106075 PMCID: PMC3002361 DOI: 10.1186/1472-6807-10-42] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 06/14/2010] [Accepted: 11/24/2010] [Indexed: 01/08/2023]
Abstract
BACKGROUND Protein-DNA recognition underlies fundamental biological processes ranging from transcription to replication and modification. Herein, we present a computational study of the sequence modulation of internal dynamic properties and of intraprotein networks of aminoacid interactions that determine the stability and specificity of protein-DNA complexes. RESULTS To this aim, we apply novel theoretical approaches to analyze the dynamics and energetics of biological systems starting from MD trajectories. As model system, we chose different sequences of Zinc Fingers (ZF) of the Zif268 family bound with different sequences of DNA. The complexes differ for their experimental stability properties, but share the same overall 3 D structure and do not undergo structural modifications during the simulations. The results of our analysis suggest that the energy landscape for DNA binding may be populated by dynamically different states, even in the absence of major conformational changes. Energetic couplings between residues change in response to protein and/or DNA sequence variations thus modulating the selectivity of recognition and the relative importance of different regions for binding. CONCLUSIONS The results show differences in the organization of the intra-protein energy-networks responsible for the stabilization of the protein conformations recognizing and binding DNA. These, in turn, are reflected into different modulation of the ZF's internal dynamics. The results also show a correlation between energetic and dynamic properties of the different proteins and their specificity/selectivity for DNA sequences. Finally, a dynamic and energetic model for the recognition of DNA by Zinc Fingers is proposed.
Affiliation(s)
- Rubben Torella
- Istituto di Chimica del Riconoscimento Molecolare, CNR, Via Mario Bianco 9, 20131 Milano, Italy
|
26
|
Khandelwal G, Bhyravabhotla J. A phenomenological model for predicting melting temperatures of DNA sequences. PLoS One 2010; 5:e12433. [PMID: 20865157 PMCID: PMC2928768 DOI: 10.1371/journal.pone.0012433] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Received: 12/09/2009] [Accepted: 08/02/2010] [Indexed: 11/29/2022]
Abstract
We report here a novel method for predicting melting temperatures of DNA sequences based on a molecular-level hypothesis on the phenomena underlying the thermal denaturation of DNA. The model presented here attempts to quantify the energetic components stabilizing the structure of DNA such as base pairing, stacking, and ionic environment which are partially disrupted during the process of thermal denaturation. The model gives a Pearson product-moment correlation coefficient (r) of approximately 0.98 between experimental and predicted melting temperatures for over 300 sequences of varying lengths ranging from 15-mers to genomic level and at different salt concentrations. The approach is implemented as a web tool (www.scfbio-iitd.res.in/chemgenome/Tm_predictor.jsp) for the prediction of melting temperatures of DNA sequences.
Affiliation(s)
- Garima Khandelwal
- Department of Chemistry, Indian Institute of Technology Delhi, New Delhi, India
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
- Jayaram Bhyravabhotla
- Department of Chemistry, Indian Institute of Technology Delhi, New Delhi, India
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
- School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, India
|
27
|
Rangannan V, Bansal M. Relative stability of DNA as a generic criterion for promoter prediction: whole genome annotation of microbial genomes with varying nucleotide base composition. Mol Biosyst 2009; 5:1758-69. [DOI: 10.1039/b906535k] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/21/2022]
|
28
|
Goñi JR, Fenollosa C, Pérez A, Torrents D, Orozco M. DNAlive: a tool for the physical analysis of DNA at the genomic scale. Bioinformatics 2008; 24:1731-2. [PMID: 18544548 DOI: 10.1093/bioinformatics/btn259] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
SUMMARY: DNAlive is a tool for the analysis and graphical display of structural and physical characteristics of genomic DNA. The web server implements a wide repertoire of metrics to derive physical information from DNA sequences with a powerful interface to derive 3D information on large sequences of both naked and protein-bound DNAs. Furthermore, it implements a mesoscopic Metropolis code which allows the inexpensive study of the dynamic properties of chromatin fibers. In addition, our server also surveys other protein and genomic databases allowing the user to combine and explore the physical properties of selected DNA in the context of functional features annotated on those regions.
AVAILABILITY: http://mmb.pcb.ub.es/DNAlive/ ; http://www.inab.org/
Affiliation(s)
- J Ramon Goñi
- Joint IRB-BSC Program on Computational Biology, Institute of Research in Biomedicine, Parc Científic de Barcelona, Josep Samitier 1-5, Barcelona 08028, Spain
|