1
Chakraborty D, Mondal B, Thirumalai D. Brewing COFFEE: A Sequence-Specific Coarse-Grained Energy Function for Simulations of DNA-Protein Complexes. J Chem Theory Comput 2024; 20:1398-1413. [PMID: 38241144 DOI: 10.1021/acs.jctc.3c00833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2024]
Abstract
DNA-protein interactions are pervasive in a number of biophysical processes ranging from transcription and gene expression to chromosome folding. To describe the structural and dynamic properties underlying these processes accurately, it is important to create transferable computational models. Toward this end, we introduce Coarse-grained Force Field for Energy Estimation, COFFEE, a robust framework for simulating DNA-protein complexes. To brew COFFEE, we integrated the energy function in the self-organized polymer model with side-chains for proteins and the three interaction site model for DNA in a modular fashion, without recalibrating any of the parameters in the original force-fields. A unique feature of COFFEE is that it describes sequence-specific DNA-protein interactions using a statistical potential (SP) derived from a data set of high-resolution crystal structures. The only parameter in COFFEE is the strength (λDNAPRO) of the DNA-protein contact potential. For an optimal choice of λDNAPRO, the crystallographic B-factors for DNA-protein complexes with varying sizes and topologies are quantitatively reproduced. Without any further readjustments to the force-field parameters, COFFEE predicts scattering profiles that are in quantitative agreement with small-angle X-ray scattering experiments, as well as chemical shifts that are consistent with NMR. We also show that COFFEE accurately describes the salt-induced unraveling of nucleosomes. Strikingly, our nucleosome simulations explain the destabilization effect of ARG to LYS mutations, which do not alter the balance of electrostatic interactions but affect chemical interactions in subtle ways. The range of applications attests to the transferability of COFFEE, and we anticipate that it would be a promising framework for simulating DNA-protein complexes at the molecular length-scale.
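The abstract does not spell out how the statistical potential is constructed. One common construction for knowledge-based contact potentials is the inverse-Boltzmann form, shown here only as an illustrative assumption and not as COFFEE's actual definition. The contact energy between amino-acid type $i$ and nucleotide type $j$ is estimated from observed versus expected contact counts in the structural data set, and the total DNA-protein term is scaled by the single adjustable strength. The symbols $N_{ij}^{\mathrm{obs}}$, $N_{ij}^{\mathrm{exp}}$ and the distance-dependent contact function $f(r_{ij})$ are hypothetical names introduced for this sketch.

```latex
\epsilon_{ij} = -k_{B}T \,\ln\!\left(\frac{N_{ij}^{\mathrm{obs}}}{N_{ij}^{\mathrm{exp}}}\right),
\qquad
U_{\mathrm{DNA\text{-}PRO}} = \lambda_{\mathrm{DNAPRO}} \sum_{(i,j)\,\in\,\mathrm{contacts}} \epsilon_{ij}\, f(r_{ij})
```

Under this form, pairs seen more often than expected receive a negative (favorable) energy, and tuning $\lambda_{\mathrm{DNAPRO}}$ rescales all sequence-specific contacts at once.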
Affiliation(s)
- Debayan Chakraborty
  - Department of Chemistry, The University of Texas at Austin, 105 E 24th Street, Stop A5300, Austin 78712, Texas, United States
- Balaka Mondal
  - Department of Chemistry, The University of Texas at Austin, 105 E 24th Street, Stop A5300, Austin 78712, Texas, United States
- D Thirumalai
  - Department of Chemistry, The University of Texas at Austin, 105 E 24th Street, Stop A5300, Austin 78712, Texas, United States
  - Department of Physics, The University of Texas at Austin, 2515 Speedway, Austin 78712, Texas, United States
2
Chakraborty D, Mondal B, Thirumalai D. Brewing COFFEE: A sequence-specific coarse-grained energy function for simulations of DNA-protein complexes. bioRxiv 2023:2023.06.07.544064. [PMID: 37333386 PMCID: PMC10274755 DOI: 10.1101/2023.06.07.544064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
DNA-protein interactions are pervasive in a number of biophysical processes ranging from transcription, gene expression, to chromosome folding. To describe the structural and dynamic properties underlying these processes accurately, it is important to create transferable computational models. Toward this end, we introduce Coarse grained force field for energy estimation, COFFEE, a robust framework for simulating DNA-protein complexes. To brew COFFEE, we integrated the energy function in the Self-Organized Polymer model with Side Chains for proteins and the Three Interaction Site model for DNA in a modular fashion, without re-calibrating any of the parameters in the original force-fields. A unique feature of COFFEE is that it describes sequence-specific DNA-protein interactions using a statistical potential (SP) derived from a dataset of high-resolution crystal structures. The only parameter in COFFEE is the strength (λDNAPRO) of the DNA-protein contact potential. For an optimal choice of λDNAPRO, the crystallographic B-factors for DNA-protein complexes, with varying sizes and topologies, are quantitatively reproduced. Without any further readjustments to the force-field parameters, COFFEE predicts the scattering profiles that are in quantitative agreement with SAXS experiments as well as chemical shifts that are consistent with NMR. We also show that COFFEE accurately describes the salt-induced unraveling of nucleosomes. Strikingly, our nucleosome simulations explain the destabilization effect of ARG to LYS mutations, which does not alter the balance of electrostatic interactions, but affects chemical interactions in subtle ways. The range of applications attests to the transferability of COFFEE, and we anticipate that it would be a promising framework for simulating DNA-protein complexes at the molecular length-scale.
Affiliation(s)
- Debayan Chakraborty
  - Department of Chemistry, The University of Texas at Austin, 105 E 24th St, Stop A5300, Austin TX 78712, USA
- Balaka Mondal
  - Department of Chemistry, The University of Texas at Austin, 105 E 24th St, Stop A5300, Austin TX 78712, USA
- D Thirumalai
  - Department of Chemistry, The University of Texas at Austin, 105 E 24th St, Stop A5300, Austin TX 78712, USA
  - Department of Physics, The University of Texas at Austin, 2515 Speedway, Austin TX 78712, USA
3
Yang S, Gong W, Zhou T, Sun X, Chen L, Zhou W, Li C. emPDBA: protein-DNA binding affinity prediction by combining features from binding partners and interface learned with ensemble regression model. Brief Bioinform 2023:7165253. [PMID: 37193676 DOI: 10.1093/bib/bbad192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 04/26/2023] [Accepted: 04/29/2023] [Indexed: 05/18/2023]
Abstract
Protein-deoxyribonucleic acid (DNA) interactions are important in a variety of biological processes. Accurately predicting protein-DNA binding affinity has been one of the most attractive and challenging issues in computational biology. However, the existing approaches still have much room for improvement. In this work, we propose an ensemble model for Protein-DNA Binding Affinity prediction (emPDBA), which combines six base models with one meta-model. The complexes are classified into four types based on the DNA structure (double-stranded or other forms) and the percentage of interface residues. For each type, emPDBA is trained with the sequence-based, structure-based and energy features from binding partners and complex structures. Through feature selection by the sequential forward selection method, it is found that there do exist considerable differences in the key factors contributing to intermolecular binding affinity. The complex classification is beneficial for the important feature extraction for binding affinity prediction. The performance comparison of our method with other peer ones on the independent testing dataset shows that emPDBA outperforms the state-of-the-art methods with the Pearson correlation coefficient of 0.53 and the mean absolute error of 1.11 kcal/mol. The comprehensive results demonstrate that our method has a good performance for protein-DNA binding affinity prediction. Availability and implementation: The source code is available at https://github.com/ChunhuaLiLab/emPDBA/.
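The two headline metrics quoted above, the Pearson correlation coefficient and the mean absolute error, can be computed from paired predicted and experimental affinities as in the following minimal sketch. The function names and the example values are hypothetical and are not taken from the emPDBA code base.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def mean_absolute_error(predicted, observed):
    """Average absolute deviation, in the units of the inputs (e.g. kcal/mol)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


if __name__ == "__main__":
    # Made-up binding free energies in kcal/mol, for illustration only.
    experimental = [-9.5, -11.2, -8.1, -12.4, -10.0]
    model_output = [-10.1, -10.3, -9.0, -11.5, -10.6]
    print("r   =", round(pearson_r(model_output, experimental), 3))
    print("MAE =", round(mean_absolute_error(model_output, experimental), 3))
```

Reporting both metrics is useful because correlation measures ranking quality while MAE measures absolute error; a model can score well on one and poorly on the other.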
Affiliation(s)
- Shuang Yang
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Weikang Gong
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Tong Zhou
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Xiaohan Sun
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Lei Chen
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Wenxue Zhou
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
4
Rodríguez-Lumbreras LA, Jiménez-García B, Giménez-Santamarina S, Fernández-Recio J. pyDockDNA: A new web server for energy-based protein-DNA docking and scoring. Front Mol Biosci 2022; 9:988996. [PMID: 36275623 PMCID: PMC9582769 DOI: 10.3389/fmolb.2022.988996] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/07/2022] [Accepted: 09/20/2022] [Indexed: 11/16/2022]
Abstract
Proteins and nucleic acids are essential biological macromolecules for cell life. Indeed, interactions between proteins and DNA regulate many biological processes such as protein synthesis, signal transduction, DNA storage, or DNA replication and repair. Despite their importance, less than 4% of total structures deposited in the Protein Data Bank (PDB) correspond to protein-DNA complexes, and very few computational methods are available to model their structure. We present here the pyDockDNA web server, which can successfully model a protein-DNA complex with a reasonable predictive success rate (as benchmarked on a standard dataset of protein-DNA complex structures, where DNA is in B-DNA conformation). The server implements the pyDockDNA program, as a module of pyDock suite, thus including third-party programs, modules, and previously developed tools, as well as new modules and parameters to handle the DNA properly. The user is asked to enter Protein Data Bank files for protein and DNA input structures (or suitable models) and select the chains to be docked. The server calculations are mainly divided into three steps: sampling by FTDOCK, scoring with new energy-based parameters and the possibility of applying external restraints. The user can select different options for these steps. The final output screen shows a 3D representation of the top 10 models and a table sorting the model according to the scoring function selected previously. All these output files can be downloaded, including the top 100 models predicted by pyDockDNA. The server can be freely accessed for academic use (https://model3dbio.csic.es/pydockdna).
Affiliation(s)
- Brian Jiménez-García
  - Barcelona Supercomputing Center, Barcelona, Spain
  - Zymvol Biomodeling SL, Barcelona, Spain
- Juan Fernández-Recio
  - Barcelona Supercomputing Center, Barcelona, Spain
  - Instituto de Ciencias de la Vid y del Vino (ICVV), Logroño, Spain
  - *Correspondence: Juan Fernández-Recio,
5
Pal A, Chakrabarti P, Dey S. ProDFace: A web-tool for the dissection of protein-DNA interfaces. Front Mol Biosci 2022; 9:978310. [PMID: 36148013 PMCID: PMC9486321 DOI: 10.3389/fmolb.2022.978310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Accepted: 08/09/2022] [Indexed: 11/30/2022]
Abstract
Protein-DNA interactions play a crucial role in gene expression and regulation. Identifying the DNA binding surface of proteins has long been a challenge; in comparison to protein-protein interactions, limited progress has been made in the development of efficient DNA binding site prediction and protein-DNA docking methods. Here we present ProDFace, a web tool that characterizes the binding region of a protein-DNA complex based on amino acid propensity, hydrogen bond (HB) donor capacity (number of solvent accessible HB donor groups), sequence conservation at the interface core and rim region, and geometry. The program takes as input the structure of a protein-DNA complex in PDB (Protein Data Bank) format, and outputs various physicochemical and geometric parameters of the interface, as well as conservation of the interface residues in the protein component. Values are provided for the whole interface, and after dissecting it into core and rim regions. Details of water mediated HBs between protein and DNA, potential HB donor groups present at the binding surface of protein, and conserved interface residues are also provided as downloadable text files. These parameters can be useful in evaluating and validating protein-DNA docking solutions, structures derived from simulation as well as solutions from the available prediction tools, and facilitate the development of more efficient prediction methods. The web-tool is freely available at structbioinfo.iitj.ac.in/resources/bioinfo/pd_interface.
Affiliation(s)
- Arumay Pal
  - School of Bioengineering, Vellore Institute of Technology, Bhopal, India
- Sucharita Dey
  - Department of Bioscience and Bioengineering, Indian Institute of Technology Jodhpur, Karwar, India
  - *Correspondence: Sucharita Dey,
6
Littmann M, Heinzinger M, Dallago C, Weissenow K, Rost B. Protein embeddings and deep learning predict binding residues for various ligand classes. Sci Rep 2021; 11:23916. [PMID: 34903827 PMCID: PMC8668950 DOI: 10.1038/s41598-021-03431-4] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 09/06/2021] [Accepted: 12/02/2021] [Indexed: 01/27/2023]
Abstract
One important aspect of protein function is the binding of proteins to ligands, including small molecules, metal ions, and macromolecules such as DNA or RNA. Despite decades of experimental progress many binding sites remain obscure. Here, we proposed bindEmbed21, a method predicting whether a protein residue binds to metal ions, nucleic acids, or small molecules. The Artificial Intelligence (AI)-based method exclusively uses embeddings from the Transformer-based protein Language Model (pLM) ProtT5 as input. Using only single sequences without creating multiple sequence alignments (MSAs), bindEmbed21DL outperformed MSA-based predictions. Combination with homology-based inference increased performance to F1 = 48 ± 3% (95% CI) and MCC = 0.46 ± 0.04 when merging all three ligand classes into one. All results were confirmed by three independent data sets. Focusing on very reliably predicted residues could complement experimental evidence: For the 25% most strongly predicted binding residues, at least 73% were correctly predicted even when ignoring the problem of missing experimental annotations. The new method bindEmbed21 is fast, simple, and broadly applicable, using neither structure nor MSAs. Thereby, it found binding residues in over 42% of all human proteins not otherwise implied in binding and predicted about 6% of all residues as binding to metal ions, nucleic acids, or small molecules.
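The F1 and MCC values reported above are per-residue classification metrics. As a minimal sketch of how they follow from binary binding/non-binding labels, assuming a simple 0/1 encoding per residue (the function names are hypothetical and not part of bindEmbed21):

```python
import math


def confusion_counts(predicted, observed):
    """Return (tp, tn, fp, fn) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for p, o in zip(predicted, observed) if p == 1 and o == 1)
    tn = sum(1 for p, o in zip(predicted, observed) if p == 0 and o == 0)
    fp = sum(1 for p, o in zip(predicted, observed) if p == 1 and o == 0)
    fn = sum(1 for p, o in zip(predicted, observed) if p == 0 and o == 1)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 if nothing is predicted or observed."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy per-residue labels: 1 = binding residue, 0 = non-binding residue.
    observed = [1, 0, 0, 1, 0, 0, 0, 1]
    predicted = [1, 0, 1, 1, 0, 0, 0, 0]
    tp, tn, fp, fn = confusion_counts(predicted, observed)
    print("F1  =", f1_score(tp, fp, fn))
    print("MCC =", mcc(tp, tn, fp, fn))
```

MCC uses all four confusion-matrix cells, which makes it informative for strongly imbalanced data such as binding residues, where most residues are non-binding.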
Affiliation(s)
- Maria Littmann
  - Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- Michael Heinzinger
  - Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Christian Dallago
  - Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Konstantin Weissenow
  - Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Burkhard Rost
  - Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany
  - TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
  - Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
7
Garcia DR, Souza FR, Guimarães AP, Valis M, Pavelek Z, Kuca K, Ramalho TC, França TCC. In Silico Studies of Potential Selective Inhibitors of Thymidylate Kinase from Variola virus. Pharmaceuticals (Basel) 2021; 14:ph14101027. [PMID: 34681251 PMCID: PMC8537287 DOI: 10.3390/ph14101027] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/11/2021] [Revised: 09/22/2021] [Accepted: 09/30/2021] [Indexed: 11/17/2022]
Abstract
Continuing the work developed by our research group, in the present manuscript, we performed a theoretical study of 10 new structures derived from the antivirals cidofovir and ribavirin, as inhibitor prototypes for the enzyme thymidylate kinase from Variola virus (VarTMPK). The proposed structures were subjected to docking calculations, molecular dynamics simulations, and free energy calculations, using the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method, inside the active sites of VarTMPK and human TMPK (HssTMPK). The docking and molecular dynamic studies pointed to structures 2, 3, 4, 6, and 9 as more selective towards VarTMPK. In addition, the free energy data calculated through the MM-PBSA method, corroborated these results. This suggests that these compounds are potential selective inhibitors of VarTMPK and, thus, can be considered as template molecules to be synthesized and experimentally evaluated against smallpox.
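For orientation, the MM-PBSA binding free energy is conventionally written as the difference between the ensemble-averaged free energies of the complex and of its separated partners. The expressions below are the standard textbook form, not equations taken from this paper; the abstract does not say whether the entropic term was included, and the selectivity measure $\Delta\Delta G_{\mathrm{sel}}$ is illustrative notation introduced here.

```latex
\Delta G_{\mathrm{bind}} = \langle G_{\mathrm{complex}} \rangle - \langle G_{\mathrm{receptor}} \rangle - \langle G_{\mathrm{ligand}} \rangle,
\qquad
G = E_{\mathrm{MM}} + G_{\mathrm{PB}} + G_{\mathrm{SA}} - TS

\Delta\Delta G_{\mathrm{sel}} = \Delta G_{\mathrm{bind}}^{\mathrm{VarTMPK}} - \Delta G_{\mathrm{bind}}^{\mathrm{HssTMPK}}
```

Here $E_{\mathrm{MM}}$ is the molecular mechanics energy, $G_{\mathrm{PB}}$ the polar solvation term from the Poisson-Boltzmann equation, and $G_{\mathrm{SA}}$ the nonpolar term proportional to surface area. With this sign convention, a negative $\Delta\Delta G_{\mathrm{sel}}$ would indicate a compound that binds the viral enzyme more favorably than the human one.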
Affiliation(s)
- Danielle R. Garcia
  - Laboratory of Molecular Modeling Applied to Chemical and Biological Defense, Military Institute of Engineering, Praça General Tiburcio 80, Urca, Rio de Janeiro 22290-270, Brazil
- Felipe R. Souza
  - Department of Chemistry, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro 22541-041, Brazil
- Ana P. Guimarães
  - Department of Chemistry, Federal University of Viçosa, Avenida P. H. Rolfs, s/n, Centro, Viçosa 36570-000, MG, Brazil
- Martin Valis
  - Department of Neurology of the Medical Faculty of Charles University and University Hospital in Hradec Kralove, Sokolska 581, 50005 Hradec Kralove, Czech Republic; (M.V.); (Z.P.)
- Zbyšek Pavelek
  - Department of Neurology of the Medical Faculty of Charles University and University Hospital in Hradec Kralove, Sokolska 581, 50005 Hradec Kralove, Czech Republic; (M.V.); (Z.P.)
- Kamil Kuca
  - Department of Chemistry, Faculty of Science, University of Hradec Kralove, Rokitanskeho 62, 50003 Hradec Kralove, Czech Republic
  - Biomedical Research Center, University Hospital in Hradec Kralove, Sokolska 581, 50005 Hradec Kralove, Czech Republic
  - Correspondence: (K.K.); (T.C.C.F.)
- Teodorico C. Ramalho
  - Department of Chemistry, Faculty of Science, University of Hradec Kralove, Rokitanskeho 62, 50003 Hradec Kralove, Czech Republic
  - Laboratory of Computational Chemistry, Department of Chemistry, UFLA, Lavras 37200-000, MG, Brazil
- Tanos C. C. França
  - Laboratory of Molecular Modeling Applied to Chemical and Biological Defense, Military Institute of Engineering, Praça General Tiburcio 80, Urca, Rio de Janeiro 22290-270, Brazil
  - Department of Chemistry, Faculty of Science, University of Hradec Kralove, Rokitanskeho 62, 50003 Hradec Kralove, Czech Republic
  - Correspondence: (K.K.); (T.C.C.F.)
8
Bernhofer M, Dallago C, Karl T, Satagopam V, Heinzinger M, Littmann M, Olenyi T, Qiu J, Schütze K, Yachdav G, Ashkenazy H, Ben-Tal N, Bromberg Y, Goldberg T, Kajan L, O’Donoghue S, Sander C, Schafferhans A, Schlessinger A, Vriend G, Mirdita M, Gawron P, Gu W, Jarosz Y, Trefois C, Steinegger M, Schneider R, Rost B. PredictProtein - Predicting Protein Structure and Function for 29 Years. Nucleic Acids Res 2021; 49:W535-W540. [PMID: 33999203 PMCID: PMC8265159 DOI: 10.1093/nar/gkab354] [Citation(s) in RCA: 129] [Impact Index Per Article: 43.0] [Received: 02/23/2021] [Revised: 04/06/2021] [Accepted: 05/10/2021] [Indexed: 12/12/2022]
Abstract
Since 1992 PredictProtein (https://predictprotein.org) is a one-stop online resource for protein sequence analysis with its main site hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and queried monthly by over 3,000 users in 2020. PredictProtein was the first Internet server for protein predictions. It pioneered combining evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges) and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA binding). PredictProtein's infrastructure has moved to the LCSB increasing throughput; the use of MMseqs2 sequence search reduced runtime five-fold (apparently without lowering performance of prediction methods); user interface elements improved usability, and new prediction methods were added. PredictProtein recently included predictions from deep learning embeddings (GO and secondary structure) and a method for the prediction of proteins and residues binding DNA, RNA, or other proteins. PredictProtein.org aspires to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings.
Affiliation(s)
- Michael Bernhofer
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
- Christian Dallago
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
- Tim Karl
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Venkata Satagopam
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
  - ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Michael Heinzinger
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
- Maria Littmann
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
- Tobias Olenyi
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Jiajun Qiu
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - Department of Otolaryngology Head & Neck Surgery, The Ninth People's Hospital & Ear Institute, School of Medicine & Shanghai Key Laboratory of Translational Medicine on Ear and Nose Diseases, Shanghai Jiao Tong University, Shanghai, China
- Konstantin Schütze
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Guy Yachdav
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Haim Ashkenazy
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
- Nir Ben-Tal
  - Department of Biochemistry & Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08901, USA
- Tatyana Goldberg
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Laszlo Kajan
  - Roche Polska Sp. z o.o., Domaniewska 39B, 02–672 Warsaw, Poland
- Chris Sander
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02215, USA
  - Broad Institute of MIT and Harvard, Boston, MA 02142, USA
- Andrea Schafferhans
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - HSWT (Hochschule Weihenstephan Triesdorf | University of Applied Sciences), Department of Bioengineering Sciences, Am Hofgarten 10, 85354 Freising, Germany
- Avner Schlessinger
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Milot Mirdita
  - Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Piotr Gawron
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Wei Gu
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
  - ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Yohan Jarosz
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
  - ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Christophe Trefois
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
  - ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, South Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Reinhard Schneider
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
  - ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Burkhard Rost
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
9
Xu L, Jiang S, Wu J, Zou Q. An in silico approach to identification, categorization and prediction of nucleic acid binding proteins. Brief Bioinform 2021. [PMID: 32793956 DOI: 10.1101/2020.05.05.078741] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/12/2023]
Abstract
The interaction between proteins and nucleic acid plays an important role in many processes, such as transcription, translation and DNA repair. The mechanisms of related biological events can be understood by exploring the function of proteins in these interactions. The number of known protein sequences has increased rapidly in recent years, but the databases for describing the structure and function of protein have unfortunately grown quite slowly. Thus, improving such databases is meaningful for predicting protein-nucleic acid interactions. Furthermore, the mechanism of related biological events, such as viral infection or designing novel drug targets, can be further understood by understanding the function of proteins in these interactions. The information for each sequence, including its function and interaction sites, were collected and identified, and a database called PNIDB was built. The proteins in PNIDB were grouped into 27 classes, such as transcription, immune system, and structural protein, etc. The function of each protein was then predicted using a machine learning method. Using our method, the predictor was trained on labeled sequences, and then the function of a protein was predicted based on the trained classifier. The prediction accuracy achieved a score of 77.43% by 10-fold cross validation.
Affiliation(s)
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic
- Jin Wu
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
- Quan Zou
  - School of Management, Shenzhen Polytechnic
10
Xu L, Jiang S, Wu J, Zou Q. An in silico approach to identification, categorization and prediction of nucleic acid binding proteins. Brief Bioinform 2020; 22:5892348. [PMID: 32793956 DOI: 10.1093/bib/bbaa171] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 06/03/2020] [Revised: 06/22/2020] [Accepted: 07/01/2020] [Indexed: 01/29/2023]
Abstract
The interaction between proteins and nucleic acid plays an important role in many processes, such as transcription, translation and DNA repair. The mechanisms of related biological events can be understood by exploring the function of proteins in these interactions. The number of known protein sequences has increased rapidly in recent years, but the databases for describing the structure and function of protein have unfortunately grown quite slowly. Thus, improving such databases is meaningful for predicting protein-nucleic acid interactions. Furthermore, the mechanism of related biological events, such as viral infection or designing novel drug targets, can be further understood by understanding the function of proteins in these interactions. The information for each sequence, including its function and interaction sites, were collected and identified, and a database called PNIDB was built. The proteins in PNIDB were grouped into 27 classes, such as transcription, immune system, and structural protein, etc. The function of each protein was then predicted using a machine learning method. Using our method, the predictor was trained on labeled sequences, and then the function of a protein was predicted based on the trained classifier. The prediction accuracy achieved a score of 77.43% by 10-fold cross validation.
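The reported accuracy comes from 10-fold cross validation. A minimal sketch of that protocol follows, assuming pooled accuracy over all held-out samples; the abstract does not say whether accuracy was pooled or averaged per fold. All function names are hypothetical, and the majority-class "classifier" is a stand-in for whatever machine learning method the authors used.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(samples, labels, train_fn, predict_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the correct counts."""
    correct = 0
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train_x = [samples[i] for i in range(len(samples)) if i not in held]
        train_y = [labels[i] for i in range(len(samples)) if i not in held]
        model = train_fn(train_x, train_y)
        correct += sum(predict_fn(model, samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)


def train_majority(train_x, train_y):
    """Placeholder learner: remember the most frequent training label."""
    return max(set(train_y), key=train_y.count)


def predict_majority(model, sample):
    """Placeholder predictor: always return the remembered label."""
    return model


if __name__ == "__main__":
    # Toy data: sequence identifiers with made-up functional class labels.
    samples = ["seq%d" % i for i in range(10)]
    labels = ["transcription"] * 7 + ["immune system"] * 3
    print(cross_validated_accuracy(samples, labels, train_majority, predict_majority))
```

Every sample is held out exactly once, so the estimate reflects performance on data the model did not see during training.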
Affiliation(s)
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic
- Jin Wu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
- Quan Zou
- School of Management, Shenzhen Polytechnic
|
11
|
Ribeiro J, Ríos-Vera C, Melo F, Schüller A. Calculation of accurate interatomic contact surface areas for the quantitative analysis of non-bonded molecular interactions. Bioinformatics 2019; 35:3499-3501. [PMID: 30698657 PMCID: PMC6748739 DOI: 10.1093/bioinformatics/btz062] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 08/09/2018] [Revised: 12/24/2018] [Accepted: 01/24/2019] [Indexed: 12/02/2022] Open
Abstract
Summary: Intra- and intermolecular contact surfaces are routinely calculated for a large array of applications in bioinformatics but are typically approximated from differential solvent accessible surface area calculations and not calculated directly. These approximations do not properly take the effects of neighboring atoms into account and tend to deviate considerably from the true contact surface. We implemented an extension of the original Shrake-Rupley algorithm to accurately estimate interatomic contact surface areas of molecular structures and complexes. Our extended algorithm is able to calculate the contact area of an atom to all nearby atoms by directly calculating overlapping surface patches, taking into account the possible shielding effects of neighboring atoms. Here, we present a versatile software tool and web server for the calculation of contact surface areas, as well as buried surface areas and solvent accessible surface areas (SASA) for different types of biomolecules, such as proteins, nucleic acids and small organic molecules. Detailed results are provided in tab-separated values format for analysis and Protein Databank files for visualization. Direct contact surface area calculation resulted in improved accuracy in a benchmark with a non-redundant set of 245 protein–DNA complexes. SASA-based approximations underestimated protein–DNA contact surfaces on average by 40%. This software tool may be useful for surface-based intra- and intermolecular interaction analyses and scoring function development. Availability and implementation: A web server, stand-alone binaries for Linux, MacOS and Windows and C++ source code are freely available from http://schuellerlab.org/dr_sasa/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Judemir Ribeiro
- Department of Molecular Genetics and Microbiology, School of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- Carlos Ríos-Vera
- Department of Molecular Genetics and Microbiology, School of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- Francisco Melo
- Department of Molecular Genetics and Microbiology, School of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- Andreas Schüller
- Department of Molecular Genetics and Microbiology, School of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
|
12
|
Sagendorf JM, Markarian N, Berman HM, Rohs R. DNAproDB: an expanded database and web-based tool for structural analysis of DNA-protein complexes. Nucleic Acids Res 2020; 48:D277-D287. [PMID: 31612957 PMCID: PMC7145614 DOI: 10.1093/nar/gkz889] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 08/15/2019] [Revised: 09/22/2019] [Accepted: 10/01/2019] [Indexed: 11/24/2022] Open
Abstract
DNAproDB (https://dnaprodb.usc.edu) is a web-based database and structural analysis tool that offers a combination of data visualization, data processing and search functionality that improves the speed and ease with which researchers can analyze, access and visualize structural data of DNA–protein complexes. In this paper, we report significant improvements made to DNAproDB since its initial release. DNAproDB now supports any DNA secondary structure from typical B-form DNA to single-stranded DNA to G-quadruplexes. We have updated the structure of our data files to support complex DNA conformations, multiple DNA–protein complexes within a DNAproDB entry and model indexing for analysis of ensemble data. Support for chemically modified residues and nucleotides has been significantly improved along with the addition of new structural features, improved structural moiety assignment and use of more sequence-based annotations. We have redesigned our report pages and search forms to support these enhancements, and the DNAproDB website has been improved to be more responsive and user-friendly. DNAproDB is now integrated with the Nucleic Acid Database, and we have increased our coverage of available Protein Data Bank entries. Our database now contains 95% of all available DNA–protein complexes, making our tools for analysis of these structures accessible to a broad community.
Affiliation(s)
- Jared M Sagendorf
- Quantitative and Computational Biology, Departments of Biological Sciences, Chemistry, Physics and Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Nicholas Markarian
- Quantitative and Computational Biology, Departments of Biological Sciences, Chemistry, Physics and Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Helen M Berman
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Remo Rohs
- Quantitative and Computational Biology, Departments of Biological Sciences, Chemistry, Physics and Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
|
13
|
Qiu J, Bernhofer M, Heinzinger M, Kemper S, Norambuena T, Melo F, Rost B. ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence. J Mol Biol 2020; 432:2428-2443. [PMID: 32142788 DOI: 10.1016/j.jmb.2020.02.026] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 11/28/2019] [Revised: 02/17/2020] [Accepted: 02/23/2020] [Indexed: 11/29/2022]
Abstract
The intricate details of how proteins bind to proteins, DNA, and RNA are crucial for the understanding of almost all biological processes. Disease-causing sequence variants often affect binding residues. Here, we described a new, comprehensive system of in silico methods that take only protein sequence as input to predict binding of protein to DNA, RNA, and other proteins. Firstly, we needed to develop several new methods to predict whether or not proteins bind (per-protein prediction). Secondly, we developed independent methods that predict which residues bind (per-residue). Not requiring three-dimensional information, the system can predict the actual binding residue. The system combined homology-based inference with machine learning and motif-based profile-kernel approaches with word-based (ProtVec) solutions to machine learning protein level predictions. This achieved an overall non-exclusive three-state accuracy of 77% ± 1% (±one standard error) corresponding to a 1.8 fold improvement over random (best classification for protein-protein with F1 = 91 ± 0.8%). Standard neural networks for per-residue binding residue predictions appeared best for DNA-binding (Q2 = 81 ± 0.9%) followed by RNA-binding (Q2 = 80 ± 1%) and worst for protein-protein binding (Q2 = 69 ± 0.8%). The new method, dubbed ProNA2020, is available as code through github (https://github.com/Rostlab/ProNA2020.git) and through PredictProtein (www.predictprotein.org).
Affiliation(s)
- Jiajun Qiu
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
- Michael Bernhofer
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
- Michael Heinzinger
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
- Sofie Kemper
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany
- Tomas Norambuena
- Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Francisco Melo
- Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile; Institute of Biological and Medical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
- Burkhard Rost
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; Columbia University, Department of Biochemistry and Molecular Biophysics, 701 West, 168th Street, New York, NY, 10032, USA; Institute of Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany; Institute for Food and Plant Sciences (WZW) Weihenstephan, Alte Akademie 8, 85354 Freising, Germany
|
14
|
Machado MR, Pantano S. Split the Charge Difference in Two! A Rule of Thumb for Adding Proper Amounts of Ions in MD Simulations. J Chem Theory Comput 2020; 16:1367-1372. [PMID: 31999456 DOI: 10.1021/acs.jctc.9b00953] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Indexed: 11/28/2022]
Abstract
Despite the relevance of properly setting ionic concentrations in Molecular Dynamics (MD) simulations, methods or practical rules to set ionic strength are scarce and rarely documented. Based on a recently proposed thermodynamics method we provide an accurate rule of thumb to define the electrolytic content in simulation boxes. Extending the use of good practices in setting up MD systems is promptly needed to ensure reproducibility and consistency in molecular simulations.
Affiliation(s)
- Matías R Machado
- Biomolecular Simulations Group, Institut Pasteur de Montevideo, Mataojo 2020, Montevideo CP 11400, Uruguay
- Sergio Pantano
- Biomolecular Simulations Group, Institut Pasteur de Montevideo, Mataojo 2020, Montevideo CP 11400, Uruguay
|
15
|
Rodrigues Garcia D, Rodrigues de Souza F, Paula Guimarães A, Castro Ramalho T, Palermo de Aguiar A, Celmar Costa França T. Design of inhibitors of thymidylate kinase from Variola virus as new selective drugs against smallpox: part II. J Biomol Struct Dyn 2019; 37:4569-4579. [PMID: 30488769 PMCID: PMC9491145 DOI: 10.1080/07391102.2018.1554510] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/22/2018] [Revised: 11/26/2018] [Accepted: 11/26/2018] [Indexed: 01/21/2023]
Abstract
Acknowledging the importance of studies toward the development of measures against terrorism and bioterrorism, this study aims to contribute to the design of new prototypes of potential drugs against smallpox. Based on a former study, nine synthetically feasible prototypes of selective inhibitors for thymidylate kinase from Variola virus (VarTMPK) were designed and submitted to molecular docking, molecular dynamics simulations and binding energy calculations. The compounds are simplifications of two more complex scaffolds, with a guanine connected to an amide or alcohol through a spacer containing ether and/or amide groups, formerly suggested as promising for the design of selective inhibitors of VarTMPK. Our study showed that, despite the structural simplifications, the compounds presented effective energy values in interactions with VarTMPK and HssTMPK and that the guanine could be replaced by a simpler imidazole ring linked to a -NH2 group, without compromising the affinity for VarTMPK. It was also observed that a positive charge in the imidazole ring is important for the selectivity toward VarTMPK and that an amide group in the spacer does not contribute to selectivity. Finally, prototype 3 was identified as the most promising to be synthesized and experimentally evaluated. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Danielle Rodrigues Garcia
- Laboratory of Molecular Modeling Applied to Chemical and Biological Defense, Military Institute of Engineering, Rio de Janeiro, RJ, Brazil
- Felipe Rodrigues de Souza
- Laboratory of Molecular Modeling Applied to Chemical and Biological Defense, Military Institute of Engineering, Rio de Janeiro, RJ, Brazil
- Teodorico Castro Ramalho
- Laboratory of Computational Chemistry, Department of Chemistry, UFLA, Lavras, MG, Brazil
- Faculty of Informatics and Management, Center for Basic and Applied Research, University of Hradec Králové, Hradec Králové, Czech Republic
- Tanos Celmar Costa França
- Laboratory of Molecular Modeling Applied to Chemical and Biological Defense, Military Institute of Engineering, Rio de Janeiro, RJ, Brazil
- Faculty of Informatics and Management, Center for Basic and Applied Research, University of Hradec Králové, Hradec Králové, Czech Republic
|
16
|
De Las Rivas J, Bonavides-Martínez C, Campos-Laborie FJ. Bioinformatics in Latin America and SoIBio impact, a tale of spin-off and expansion around genomes and protein structures. Brief Bioinform 2019; 20:390-397. [PMID: 28981567 PMCID: PMC6433739 DOI: 10.1093/bib/bbx064] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/20/2017] [Revised: 04/18/2017] [Indexed: 11/30/2022] Open
Abstract
Owing to the emerging impact of bioinformatics and computational biology, in this article we present an overview of the history and current state of research in this field in Latin America (LA). It is difficult to cover evenly all the efforts, initiatives and works of the past two decades in this vast region (which includes >19 million km² and >600 million people). Despite the difficulty, we carried out an analytical search for publications in the field by researchers from 19 LA countries over the past 25 years. We find that bioinformatics research in this region would need to double to approach the average world scientific production in the field. We also identified some of the pioneering scientists who initiated and led bioinformatics in the region and were promoters of this new scientific field. Our analysis also reveals that spin-off began around some specific areas within the biomolecular sciences: studies on genomes (anchored in the new generation of deep sequencing technologies, followed by developments in proteomics) and studies on protein structures (supported by three-dimensional structural determination technologies and their computational advancement). Finally, we show that the contribution to this endeavour of the Iberoamerican Society for Bioinformatics, founded in Mexico in 2009, has been significant, as it is a leading forum for joining the efforts of many scientists from LA interested in promoting research, training and education in bioinformatics.
Affiliation(s)
- Javier De Las Rivas
- CSIC and Universidad de Salamanca, Bioinformatics and Functional Genomics Group, Cancer Research Center (IMBCC, CSIC/USAL/IBSAL), Salamanca, Spain
- Corresponding author. Javier De Las Rivas, Bioinformatics and Functional Genomics Group, Cancer Research Center (IMBCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Científicas (CSIC) and Universidad de Salamanca (USAL), Campus Miguel de Unamuno s/n, Salamanca 37007, Spain. Tel.: +34 923294819; Fax: +34923294743; E-mail:
- Cesar Bonavides-Martínez
- Universidad Nacional Autonoma de Mexico, Computational Genomics, Centro de Ciencias Genómicas, Cuernavaca, Morelos, Mexico
- Francisco Jose Campos-Laborie
- CSIC and Universidad de Salamanca, Bioinformatics and Functional Genomics Group, Cancer Research Center (IMBCC, CSIC/USAL/IBSAL), Salamanca, Spain
|
17
|
Sagendorf JM, Berman HM, Rohs R. DNAproDB: an interactive tool for structural analysis of DNA-protein complexes. Nucleic Acids Res 2017; 45:W89-W97. [PMID: 28431131 PMCID: PMC5570235 DOI: 10.1093/nar/gkx272] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 02/01/2017] [Accepted: 04/06/2017] [Indexed: 02/06/2023] Open
Abstract
Many biological processes are mediated by complex interactions between DNA and proteins. Transcription factors, various polymerases, nucleases and histones recognize and bind DNA with different levels of binding specificity. To understand the physical mechanisms that allow proteins to recognize DNA and achieve their biological functions, it is important to analyze structures of DNA–protein complexes in detail. DNAproDB is a web-based interactive tool designed to help researchers study these complexes. DNAproDB provides an automated structure-processing pipeline that extracts structural features from DNA–protein complexes. The extracted features are organized in structured data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface. Additionally, users can upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction that can be exported as high quality figures. All functionality is documented and freely accessible at http://dnaprodb.usc.edu.
Affiliation(s)
- Jared M Sagendorf
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Helen M Berman
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Remo Rohs
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
|
18
|
Emamjomeh A, Choobineh D, Hajieghrari B, MahdiNezhad N, Khodavirdipour A. DNA-protein interaction: identification, prediction and data analysis. Mol Biol Rep 2019; 46:3571-3596. [PMID: 30915687 DOI: 10.1007/s11033-019-04763-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/22/2018] [Accepted: 03/14/2019] [Indexed: 12/30/2022]
Abstract
Life depends on specific, purposeful interactions between molecules. Such interactions make possible the various processes inside the cells and bodies of living organisms. Among all types of molecular interactions, DNA-protein interactions are of considerable importance. With the development of numerous experimental techniques, diverse methods are now available for recognizing and investigating such interactions. The traditional experimental techniques for identifying DNA-protein complexes are time-consuming and unsuitable for genome-scale studies; current high-throughput approaches determine such interactions more efficiently at a large scale, but they are too costly to be practical for daily applications. Hence, the availability of extensive information on different biological sequences, a clearer picture of the conditions under which such interactions form, and developments in computing, mathematics, and statistics have motivated scientists to develop bioinformatics tools for predicting the interaction site(s). Much progress has been made in this field. In this review, the factors and conditions governing the interaction and the laboratory techniques for examining such interactions are addressed. In addition, the bioinformatics tools developed for this purpose are introduced and compared and, in the end, several suggestions are offered for improving the precision of such tools.
Affiliation(s)
- Abbasali Emamjomeh
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, 98615-538, Iran
- Darush Choobineh
- Agricultural Biotechnology, Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
- Behzad Hajieghrari
- Department of Agricultural Biotechnology, College of Agriculture, Jahrom University, Jahrom, 74135-111, Iran
- Nafiseh MahdiNezhad
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, 98615-538, Iran
- Amir Khodavirdipour
- Division of Human Genetics, Department of Anatomy, St. John's hospital, Bangalore, India
|
19
|
Exploring DNA dynamics within oligonucleosomes with coarse-grained simulations: SIRAH force field extension for protein-DNA complexes. Biochem Biophys Res Commun 2018; 498:319-326. [DOI: 10.1016/j.bbrc.2017.09.086] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/07/2017] [Revised: 09/06/2017] [Accepted: 09/15/2017] [Indexed: 12/22/2022]
|
20
|
Wilson KA, Wetmore SD. Combining crystallographic and quantum chemical data to understand DNA-protein π-interactions in nature. Struct Chem 2017. [DOI: 10.1007/s11224-017-0954-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/21/2022]
|
21
|
Gardini S, Furini S, Santucci A, Niccolai N. A structural bioinformatics investigation on protein–DNA complexes delineates their modes of interaction. Mol Biosyst 2017; 13:1010-1017. [DOI: 10.1039/c7mb00071e] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 02/01/2023]
Abstract
A non-redundant dataset of 629 protein–DNA complexes has been used to investigate the amino acid composition of protein–DNA interfaces. Structural proteins, transcription factors and DNA-related enzymes show specific patterns that account for their different modes of interaction with DNA.
Affiliation(s)
- Simone Gardini
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Italy
- Simone Furini
- Department of Medical Biotechnologies, University of Siena, Siena, Italy
- Annalisa Santucci
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Italy
- Neri Niccolai
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Italy
|
22
|
High-resolution biophysical analysis of the dynamics of nucleosome formation. Sci Rep 2016; 6:27337. [PMID: 27263658 PMCID: PMC4897087 DOI: 10.1038/srep27337] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 02/25/2016] [Indexed: 12/14/2022] Open
Abstract
We describe a biophysical approach that enables changes in the structure of DNA to be followed during nucleosome formation in in vitro reconstitution with either the canonical "Widom" sequence or a judiciously mutated sequence. The rapid non-perturbing photochemical analysis presented here provides 'snapshots' of the DNA configuration at any given moment in time during nucleosome formation under a very broad range of reaction conditions. Changes in DNA photochemical reactivity upon protein binding are interpreted as being mainly induced by alterations in individual base pair roll angles. The results strengthen the importance of the role of an initial (H3/H4)2 histone tetramer-DNA interaction and highlight the modulation of this early event by the DNA sequence. (H3/H4)2 binding precedes and dictates subsequent H2A/H2B-DNA interactions, which are less affected by the DNA sequence, leading to the final octameric nucleosome. Overall, our results provide a novel, exciting way to investigate those biophysical properties of DNA that constitute a crucial component in nucleosome formation and stabilization.
|
23
|
Ribeiro J, Melo F, Schüller A. PDIviz: analysis and visualization of protein-DNA binding interfaces. Bioinformatics 2015; 31:2751-3. [PMID: 25886981 PMCID: PMC4528634 DOI: 10.1093/bioinformatics/btv203] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/01/2015] [Accepted: 04/03/2015] [Indexed: 11/23/2022] Open
Abstract
Summary: Specific recognition of DNA by proteins is a crucial step of many biological processes. PDIviz is a plugin for the PyMOL molecular visualization system that analyzes protein–DNA binding interfaces by comparing the solvent accessible surface area of the complex against the free protein and free DNA. The plugin provides three distinct three-dimensional visualization modes to highlight interactions with DNA bases and backbone, major and minor groove, and with atoms of different pharmacophoric type (hydrogen bond donors/acceptors, hydrophobic and thymine methyl). Each mode comes in three styles to focus the visual analysis on the protein or DNA side of the interface, or on the nucleotide sequence. PDIviz allows for the generation of publication quality images, all calculated data can be written to disk, and a command line interface is provided for automating tasks. The plugin may be helpful for the detailed identification of regions involved in DNA base and shape readout, and can be particularly useful in rapidly pinpointing the overall mode of interaction. Availability and implementation: Freely available at http://melolab.org/pdiviz/ as a PyMOL plugin. Tested with incentive, educational, and open source versions of PyMOL on Windows, Mac and Linux systems. Contact: aschueller@bio.puc.cl Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Judemir Ribeiro
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Francisco Melo
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Andreas Schüller
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
|
24
|
An overview of the prediction of protein DNA-binding sites. Int J Mol Sci 2015; 16:5194-215. [PMID: 25756377 PMCID: PMC4394471 DOI: 10.3390/ijms16035194] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 12/31/2014] [Revised: 02/21/2015] [Accepted: 02/27/2015] [Indexed: 02/06/2023] Open
Abstract
Interactions between proteins and DNA play an important role in many essential biological processes such as DNA replication, transcription, splicing, and repair. The identification of amino acid residues involved in DNA-binding sites is critical for understanding the mechanism of these biological activities. In the last decade, numerous computational approaches have been developed to predict protein DNA-binding sites based on protein sequence and/or structural information, which play an important role in complementing experimental strategies. At this time, approaches can be divided into three categories: sequence-based DNA-binding site prediction, structure-based DNA-binding site prediction, and homology modeling and threading. In this article, we review existing research on computational methods to predict protein DNA-binding sites, which includes data sets, various residue sequence/structural features, machine learning methods for comparison and selection, evaluation methods, performance comparison of different tools, and future directions in protein DNA-binding site prediction. In particular, we detail the meta-analysis of protein DNA-binding sites. We also propose specific implications that are likely to result in novel prediction methods, increased performance, or practical applications.
|
25
|
Wilson KA, Wetmore SD. A Survey of DNA–Protein π–Interactions: A Comparison of Natural Occurrences and Structures, and Computationally Predicted Structures and Strengths. Challenges and Advances in Computational Chemistry and Physics 2015. [DOI: 10.1007/978-3-319-14163-3_17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 02/03/2023]
|
26
|
Park B, Kim H, Han K. DBBP: database of binding pairs in protein-nucleic acid interactions. BMC Bioinformatics 2014; 15 Suppl 15:S5. [PMID: 25474259 PMCID: PMC4271565 DOI: 10.1186/1471-2105-15-s15-s5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 01/10/2023] Open
Abstract
Background: Interaction of proteins with other molecules plays an important role in many biological activities. As many structures of protein-DNA complexes and protein-RNA complexes have been determined in the past years, several databases have been constructed to provide structure data of the complexes. However, the information on the binding sites between proteins and nucleic acids is not readily available from the structure data since the data consists mostly of the three-dimensional coordinates of the atoms in the complexes. Results: We analyzed the huge amount of structure data for the hydrogen bonding interactions between proteins and nucleic acids and developed a database called DBBP (DataBase of Binding Pairs in protein-nucleic acid interactions, http://bclab.inha.ac.kr/dbbp). DBBP contains 44,955 hydrogen bonds (H-bonds) of protein-DNA interactions and 77,947 H-bonds of protein-RNA interactions. Conclusions: Analysis of the huge amount of structure data of protein-nucleic acid complexes is labor-intensive, yet provides useful information for studying protein-nucleic acid interactions. DBBP provides the detailed information of hydrogen-bonding interactions between proteins and nucleic acids at various levels from the atomic level to the residue level. The binding information can be used as a valuable resource for developing a computational method aiming at predicting new binding sites in proteins or nucleic acids.
|
27
|
Abstract
Biomolecules are the prime information processing elements of living matter. Most of these inanimate systems are polymers that compute their own structures and dynamics using as input seemingly random character strings of their sequence, following which they coalesce and perform integrated cellular functions. In large computational systems with finite interaction-codes, the appearance of conflicting goals is inevitable. Simple conflicting forces can lead to quite complex structures and behaviors, leading to the concept of frustration in condensed matter. We present here some basic ideas about frustration in biomolecules and how the frustration concept leads to a better appreciation of many aspects of the architecture of biomolecules, and especially how biomolecular structure connects to function by means of localized frustration. These ideas are simultaneously both seductively simple and perilously subtle to grasp completely. The energy landscape theory of protein folding provides a framework for quantifying frustration in large systems and has been implemented at many levels of description. We first review the notion of frustration from the areas of abstract logic and its uses in simple condensed matter systems. We discuss then how the frustration concept applies specifically to heteropolymers, testing folding landscape theory in computer simulations of protein models and in experimentally accessible systems. Studying the aspects of frustration averaged over many proteins provides ways to infer energy functions useful for reliable structure prediction. We discuss how frustration affects folding mechanisms. We review here how the biological functions of proteins are related to subtle local physical frustration effects and how frustration influences the appearance of metastable states, the nature of binding processes, catalysis and allosteric transitions. 
In this review, we also emphasize that frustration, far from being always a bad thing, is an essential feature of biomolecules that allows dynamics to be harnessed for function. In this way, we hope to illustrate how Frustration is a fundamental concept in molecular biology.
|
28
|
Disturbance of Arabidopsis thaliana microRNA-regulated pathways by Xcc bacterial effector proteins. Amino Acids 2014; 46:953-61. [PMID: 24385242 DOI: 10.1007/s00726-013-1646-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/15/2013] [Accepted: 12/11/2013] [Indexed: 01/10/2023]
Abstract
Plants are continuously subjected to infection by pathogens, including bacteria and viruses. Bacteria can inject a variety of effector proteins into the host to reprogram host defense mechanism. It is known that microRNAs participate in plant disease resistance to bacterial pathogens and previous studies have suggested that some bacterial effectors have evolved to disturb the host's microRNA-regulated pathways; and so enabling infection. In this study, the inter-species interaction between an Xanthomonas campestris pv campestris (Xcc) pathogen effector and Arabidopsis thaliana microRNA transcription promoter was investigated using three methods: (1) interolog, (2) alignment based on using transcription factor binding site profile matrix, and (3) the web-based binding site prediction tool, PATSER. Furthermore, we integrated another two data sets from our previous study into the present web-based system. These are (1) microRNA target genes and their downstream effects mediated by protein-protein interaction (PPI), and (2) the Xcc-Arabidopsis PPI information. This present work is probably the first comprehensive study of constructing pathways that comprises effector, microRNA, target genes and PPI for the study of pathogen-host interactions. It is expected that this study may help to elucidate the role of pathogen-host interplay in a plant's immune system. The database is freely accessible at: http://ppi.bioinfo.asia.edu.tw/EDMRP .
|
29
|
Yan Z, Wang J. Optimizing scoring function of protein-nucleic acid interactions with both affinity and specificity. PLoS One 2013; 8:e74443. [PMID: 24098651 PMCID: PMC3787031 DOI: 10.1371/journal.pone.0074443] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 06/01/2013] [Accepted: 08/02/2013] [Indexed: 12/14/2022]
Abstract
Protein-nucleic acid (protein-DNA and protein-RNA) recognition is fundamental to the regulation of gene expression. Determination of the structures of the protein-nucleic acid recognition and insight into their interactions at molecular level are vital to understanding the regulation function. Recently, quantitative computational approach has been becoming an alternative of experimental technique for predicting the structures and interactions of biomolecular recognition. However, the progress of protein-nucleic acid structure prediction, especially protein-RNA, is far behind that of the protein-ligand and protein-protein structure predictions due to the lack of reliable and accurate scoring function for quantifying the protein-nucleic acid interactions. In this work, we developed an accurate scoring function (named as SPA-PN, SPecificity and Affinity of the Protein-Nucleic acid interactions) for protein-nucleic acid interactions by incorporating both the specificity and affinity into the optimization strategy. Specificity and affinity are two requirements of highly efficient and specific biomolecular recognition. Previous quantitative descriptions of the biomolecular interactions considered the affinity, but often ignored the specificity owing to the challenge of specificity quantification. We applied our concept of intrinsic specificity to connect the conventional specificity, which circumvents the challenge of specificity quantification. In addition to the affinity optimization, we incorporated the quantified intrinsic specificity into the optimization strategy of SPA-PN. The testing results and comparisons with other scoring functions validated that SPA-PN performs well on both the prediction of binding affinity and identification of native conformation. In terms of its performance, SPA-PN can be widely used to predict the protein-nucleic acid structures and quantify their interactions.
Affiliation(s)
- Zhiqiang Yan
- Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, New York, United States of America
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin, China
- Jin Wang
- Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, New York, United States of America
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin, China
|
30
|
Nagarajan R, Ahmad S, Gromiha MM. Novel approach for selecting the best predictor for identifying the binding sites in DNA binding proteins. Nucleic Acids Res 2013; 41:7606-14. [PMID: 23788679 PMCID: PMC3763535 DOI: 10.1093/nar/gkt544] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 12/24/2022]
Abstract
Protein-DNA complexes play vital roles in many cellular processes by the interactions of amino acids with DNA. Several computational methods have been developed for predicting the interacting residues in DNA-binding proteins using sequence and/or structural information. These methods showed different levels of accuracies, which may depend on the choice of data sets used in training, the feature sets selected for developing a predictive model, the ability of the models to capture information useful for prediction or a combination of these factors. In many cases, different methods are likely to produce similar results, whereas in others, the predictors may return contradictory predictions. In this situation, a priori estimates of prediction performance applicable to the system being investigated would be helpful for biologists to choose the best method for designing their experiments. In this work, we have constructed unbiased, stringent and diverse data sets for DNA-binding proteins based on various biologically relevant considerations: (i) seven structural classes, (ii) 86 folds, (iii) 106 superfamilies, (iv) 194 families, (v) 15 binding motifs, (vi) single/double-stranded DNA, (vii) DNA conformation (A, B, Z, etc.), (viii) three functions and (ix) disordered regions. These data sets were culled as non-redundant with sequence identities of 25 and 40% and used to evaluate the performance of 11 different methods in which online services or standalone programs are available. We observed that the best performing methods for each of the data sets showed significant biases toward the data sets selected for their benchmark. Our analysis revealed important data set features, which could be used to estimate these context-specific biases and hence suggest the best method to be used for a given problem. We have developed a web server, which considers these features on demand and displays the best method that the investigator should use. 
The web server is freely available at http://www.biotech.iitm.ac.in/DNA-protein/. Further, we have grouped the methods based on their complexity and analyzed the performance. The information gained in this work could be effectively used to select the best method for designing experiments.
Affiliation(s)
- R Nagarajan
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai 600036, India and National Institute of Biomedical Innovation, Osaka, Japan
|
31
|
Gromiha MM, Nagarajan R. Computational approaches for predicting the binding sites and understanding the recognition mechanism of protein-DNA complexes. Advances in Protein Chemistry and Structural Biology 2013; 91:65-99. [PMID: 23790211 DOI: 10.1016/b978-0-12-411637-5.00003-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022]
Abstract
Protein-DNA recognition plays an important role in the regulation of gene expression. Understanding the influence of specific residues for protein-DNA interactions and the recognition mechanism of protein-DNA complexes is a challenging task in molecular and computational biology. Several computational approaches have been put forward to tackle these problems from different perspectives: (i) development of databases for the interactions between protein and DNA and binding specificity of protein-DNA complexes, (ii) structural analysis of protein-DNA complexes, (iii) discriminating DNA-binding proteins from amino acid sequence, (iv) prediction of DNA-binding sites and protein-DNA binding specificity using sequence and/or structural information, and (v) understanding the recognition mechanism of protein-DNA complexes. In this review, we focus on all these issues and extensively discuss the advancements on the development of comprehensive bioinformatics databases for protein-DNA interactions, efficient tools for identifying the binding sites, and plausible mechanisms for understanding the recognition of protein-DNA complexes. Further, the available online resources for understanding protein-DNA interactions are collectively listed, which will serve as ready-to-use information for the research community.
Affiliation(s)
- M Michael Gromiha
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India.
|
32
|
Dans PD, Darré L, Machado MR, Zeida A, Brandner AF, Pantano S. Assessing the Accuracy of the SIRAH Force Field to Model DNA at Coarse Grain Level. ACTA ACUST UNITED AC 2013. [DOI: 10.1007/978-3-319-02624-4_7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 03/12/2023]
|
33
|
Kirsanov DD, Zanegina ON, Aksianov EA, Spirin SA, Karyagina AS, Alexeevski AV. NPIDB: Nucleic acid-Protein Interaction DataBase. Nucleic Acids Res 2012. [PMID: 23193292 PMCID: PMC3531207 DOI: 10.1093/nar/gks1199] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022]
Abstract
The Nucleic acid-Protein Interaction DataBase (http://npidb.belozersky.msu.ru/) contains information derived from structures of DNA-protein and RNA-protein complexes extracted from the Protein Data Bank (3846 complexes in October 2012). It provides a web interface and a set of tools for extracting biologically meaningful characteristics of nucleoprotein complexes. The content of the database is updated weekly. The current version of the Nucleic acid-Protein Interaction DataBase is an upgrade of the version published in 2007. The improvements include a new web interface, new tools for calculation of intermolecular interactions, a classification of SCOP families that contains DNA-binding protein domains and data on conserved water molecules on the DNA-protein interface.
Affiliation(s)
- Dmitry D Kirsanov
- Department of Mathematical Methods in Biology, Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
|
34
|
Dans PD, Pérez A, Faustino I, Lavery R, Orozco M. Exploring polymorphisms in B-DNA helical conformations. Nucleic Acids Res 2012; 40:10668-78. [PMID: 23012264 PMCID: PMC3510489 DOI: 10.1093/nar/gks884] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Indexed: 11/13/2022]
Abstract
The traditional mesoscopic paradigm represents DNA as a series of base-pair steps whose energy response to equilibrium perturbations is elastic, with harmonic oscillations (defining local stiffness) around a single equilibrium conformation. In addition, base sequence effects are often analysed as a succession of independent XpY base-pair steps (i.e. a nearest-neighbour (NN) model with only 10 unique cases). Unfortunately, recent massive simulations carried out by the ABC consortium suggest that the real picture of DNA flexibility may be much more complex. The paradigm of DNA flexibility therefore needs to be revisited. In this article, we explore in detail one of the most obvious violations of the elastic NN model of flexibility: the bimodal distributions of some helical parameters. We perform here an in-depth statistical analysis of a very large set of MD trajectories and also of experimental structures, which lead to very solid evidence of bimodality. We then suggest ways to improve mesoscopic models to account for this deviation from the elastic regime.
Affiliation(s)
- Pablo D Dans
- Joint IRB-BSC Program on Computational Biology, Institute for Research in Biomedicine, Parc Cientific de Barcelona, Josep Samitier 1-5, Barcelona 08028, Spain
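The abstract of this entry notes that a nearest-neighbour model has only 10 unique base-pair steps. As a minimal sketch of where that number comes from (the function names below are illustrative and not taken from the cited paper), one can enumerate all 16 dinucleotide steps and merge each step with its reverse complement, since both describe the same double-stranded step read from opposite strands:

```python
from itertools import product

# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(step: str) -> str:
    """Return the same step as read 5'->3' on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(step))


def unique_steps() -> list[str]:
    """Collapse the 16 XpY steps into strand-independent classes."""
    seen = set()
    for x, y in product("ACGT", repeat=2):
        step = x + y
        # Keep one canonical label per pair (the alphabetically smaller one).
        seen.add(min(step, reverse_complement(step)))
    return sorted(seen)


steps = unique_steps()
print(len(steps), steps)
```

Four steps (AT, TA, CG, GC) are their own reverse complement, and the remaining twelve pair up into six classes, which gives the ten cases.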
|
35
|
Turner D, Kim R, Guo JT. TFinDit: transcription factor-DNA interaction data depository. BMC Bioinformatics 2012; 13:220. [PMID: 22943312 PMCID: PMC3483241 DOI: 10.1186/1471-2105-13-220] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/18/2012] [Accepted: 08/23/2012] [Indexed: 11/28/2022]
Abstract
Background: One of the crucial steps in regulation of gene expression is the binding of transcription factor(s) to specific DNA sequences. Knowledge of the binding affinity and specificity at a structural level between transcription factors and their target sites has important implications in our understanding of the mechanism of gene regulation. Due to their unique functions and binding specificity, there is a need for a transcription factor-specific, structure-based database and corresponding web service to facilitate structural bioinformatics studies of transcription factor-DNA interactions, such as development of knowledge-based interaction potential, transcription factor-DNA docking, binding induced conformational changes, and the thermodynamics of protein-DNA interactions. Description: TFinDit is a relational database and a web search tool for studying transcription factor-DNA interactions. The database contains annotated transcription factor-DNA complex structures and related data, such as unbound protein structures, thermodynamic data, and binding sequences for the corresponding transcription factors in the complex structures. TFinDit also provides a user-friendly interface and allows users to either query individual entries or generate datasets through culling the database based on one or more search criteria. Conclusions: TFinDit is a specialized structural database with annotated transcription factor-DNA complex structures and other preprocessed data. We believe that this database/web service can facilitate the development and testing of TF-DNA interaction potentials and TF-DNA docking algorithms, and the study of protein-DNA recognition mechanisms.
Affiliation(s)
- Daniel Turner
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
|
36
|
Zeida A, Machado MR, Dans PD, Pantano S. Breathing, bubbling, and bending: DNA flexibility from multimicrosecond simulations. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 2012; 86:021903. [PMID: 23005781 DOI: 10.1103/physreve.86.021903] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 07/04/2011] [Revised: 06/01/2012] [Indexed: 06/01/2023]
Abstract
Bending of the seemingly stiff DNA double helix is a fundamental physical process for any living organism. Specialized proteins recognize DNA inducing and stabilizing sharp curvatures of the double helix. However, experimental evidence suggests a high protein-independent flexibility of DNA. On the basis of coarse-grained simulations, we propose that DNA experiences thermally induced kinks associated with the spontaneous formation of internal bubbles. Comparison of the protein-induced DNA curvature calculated from the Protein Data Bank with that sampled by our simulations suggests that thermally induced distortions can account for ~80% of the DNA curvature present in experimentally solved structures.
Affiliation(s)
- Ari Zeida
- Institut Pasteur de Montevideo, Calle Mataojo 2020, Montevideo, Codigo Postal 11400, Uruguay
|
37
|
Liu LA, Bradley P. Atomistic modeling of protein-DNA interaction specificity: progress and applications. Curr Opin Struct Biol 2012; 22:397-405. [PMID: 22796087 DOI: 10.1016/j.sbi.2012.06.002] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 03/19/2012] [Accepted: 06/20/2012] [Indexed: 12/22/2022]
Abstract
An accurate, predictive understanding of protein-DNA binding specificity is crucial for the successful design and engineering of novel protein-DNA binding complexes. In this review, we summarize recent studies that use atomistic representations of interfaces to predict protein-DNA binding specificity computationally. Although methods with limited structural flexibility have proven successful at recapitulating consensus binding sequences from wild-type complex structures, conformational flexibility is likely important for design and template-based modeling, where non-native conformations need to be sampled and accurately scored. A successful application of such computational modeling techniques in the construction of the TAL-DNA complex structure is discussed. With continued improvements in energy functions, solvation models, and conformational sampling, we are optimistic that reliable and large-scale protein-DNA binding prediction and engineering is a goal within reach.
|
38
|
Computer-based annotation of putative AraC/XylS-family transcription factors of known structure but unknown function. J Biomed Biotechnol 2012; 2012:103132. [PMID: 22505803 PMCID: PMC3312330 DOI: 10.1155/2012/103132] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 09/28/2011] [Revised: 12/09/2011] [Accepted: 12/13/2011] [Indexed: 12/12/2022]
Abstract
Currently, about 20 crystal structures per day are released and deposited in the Protein Data Bank. A significant fraction of these structures is produced by research groups associated with the structural genomics consortium. The biological function of many of these proteins is generally unknown or not validated by experiment. Therefore, a growing need for functional prediction of protein structures has emerged. Here we present an integrated bioinformatics method that combines sequence-based relationships and three-dimensional (3D) structural similarity of transcriptional regulators with computer prediction of their cognate DNA binding sequences. We applied this method to the AraC/XylS family of transcription factors, which is a large family of transcriptional regulators found in many bacteria controlling the expression of genes involved in diverse biological functions. Three putative new members of this family with known 3D structure but unknown function were identified for which a probable functional classification is provided. Our bioinformatics analyses suggest that they could be involved in plant cell wall degradation (Lin2118 protein from Listeria innocua, PDB code 3oou), symbiotic nitrogen fixation (protein from Chromobacterium violaceum, PDB code 3oio), and either metabolism of plant-derived biomass or nitrogen fixation (protein from Rhodopseudomonas palustris, PDB code 3mn2).
|
39
|
Suhrer SJ, Gruber M, Wiederstein M, Sippl MJ. Effective techniques for protein structure mining. Methods Mol Biol 2012; 857:33-54. [PMID: 22323216 DOI: 10.1007/978-1-61779-588-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Retrieval and characterization of protein structure relationships are instrumental in a wide range of tasks in structural biology. The classification of protein structures (COPS) is a web service that provides efficient access to structure and sequence similarities for all currently available protein structures. Here, we focus on the application of COPS to the problem of template selection in homology modeling.
Affiliation(s)
- Stefan J Suhrer
- Center of Applied Molecular Engineering, Division of Bioinformatics, University of Salzburg, Salzburg, Austria.
|
40
|
Singh H, Chauhan JS, Gromiha MM, Raghava GPS. ccPDB: compilation and creation of data sets from Protein Data Bank. Nucleic Acids Res 2011; 40:D486-9. [PMID: 22139939 PMCID: PMC3245168 DOI: 10.1093/nar/gkr1150] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
ccPDB (http://crdd.osdd.net/raghava/ccpdb/) is a database of data sets compiled from the literature and Protein Data Bank (PDB). First, we collected and compiled data sets from the literature used for developing bioinformatics methods to annotate the structure and function of proteins. Second, data sets were derived from the latest release of PDB using standard protocols. Third, we developed a powerful module for creating a wide range of customized data sets from the current release of PDB. This is a flexible module that allows users to create data sets using a simple six step procedure. In addition, a number of web services have been integrated in ccPDB, which include submission of jobs on PDB-based servers, annotation of protein structures and generation of patterns. This database maintains >30 types of data sets such as secondary structure, tight-turns, nucleotide interacting residues, metals interacting residues, DNA/RNA binding residues and so on.
Affiliation(s)
- Harinder Singh
- Bioinformatics Centre, Institute of Microbial Technology, Chandigarh, India
|
41
|
Benchmarks for flexible and rigid transcription factor-DNA docking. BMC Structural Biology 2011; 11:45. [PMID: 22044637 PMCID: PMC3262759 DOI: 10.1186/1472-6807-11-45] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/09/2011] [Accepted: 11/01/2011] [Indexed: 12/27/2022]
Abstract
Background: Structural insight from transcription factor-DNA (TF-DNA) complexes is of paramount importance to our understanding of the affinity and specificity of TF-DNA interaction, and to the development of structure-based prediction of TF binding sites. Yet the majority of the TF-DNA complexes remain unsolved despite the considerable experimental efforts being made. Computational docking represents a promising alternative to bridge the gap. To facilitate the study of TF-DNA docking, carefully designed benchmarks are needed for performance evaluation and identification of the strengths and weaknesses of docking algorithms. Results: We constructed two benchmarks for flexible and rigid TF-DNA docking respectively using a unified non-redundant set of 38 test cases. The test cases encompass diverse fold families and are classified into easy and hard groups with respect to the degrees of difficulty in TF-DNA docking. The major parameters used to classify expected docking difficulty in flexible docking are the conformational differences between bound and unbound TFs and the interaction strength between TFs and DNA. For rigid docking in which the starting structure is a bound TF conformation, only interaction strength is considered. Conclusions: We believe these benchmarks are important for the development of better interaction potentials and TF-DNA docking algorithms, which bears important implications to structure-based prediction of transcription factor binding sites and drug design.
|
42
|
Lewis BA, Walia RR, Terribilini M, Ferguson J, Zheng C, Honavar V, Dobbs D. PRIDB: a Protein-RNA interface database. Nucleic Acids Res 2010; 39:D277-82. [PMID: 21071426 PMCID: PMC3013700 DOI: 10.1093/nar/gkq1108] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.8] [Indexed: 11/25/2022]
Abstract
The Protein–RNA Interface Database (PRIDB) is a comprehensive database of protein–RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein–RNA complexes and their interfaces, in addition to automated generation of user-defined data sets of protein–RNA interfaces for statistical analyses and machine learning applications. For any chosen PDB complex or list of complexes, PRIDB rapidly displays interfacial amino acids and ribonucleotides within the primary sequences of the interacting protein and RNA chains. PRIDB also identifies ProSite motifs in protein chains and FR3D motifs in RNA chains and provides links to these external databases, as well as to structure files in the PDB. An integrated JMol applet is provided for visualization of interacting atoms and residues in the context of the 3D complex structures. The current version of PRIDB contains structural information regarding 926 protein–RNA complexes available in the PDB (as of 10 October 2010). Atomic- and residue-level contact information for the entire data set can be downloaded in a simple machine-readable format. Also, several non-redundant benchmark data sets of protein–RNA complexes are provided. The PRIDB database is freely available online at http://bindr.gdcb.iastate.edu/PRIDB.
Affiliation(s)
- Benjamin A Lewis
- Bioinformatics and Computational Biology Program, Iowa State University, Iowa, USA.
|
43
|
Churchill CDM, Rutledge LR, Wetmore SD. Effects of the biological backbone on stacking interactions at DNA-protein interfaces: the interplay between the backbone···π and π···π components. Phys Chem Chem Phys 2010; 12:14515-26. [PMID: 20927465 DOI: 10.1039/c0cp00550a] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 11/21/2022]
Abstract
The (gas-phase) MP2/6-31G*(0.25) π···π stacking interactions between the five natural bases and the aromatic amino acids calculated using (truncated) monomers composed of conjugated rings and/or (extended) monomers containing the biological backbone (either the protein backbone or deoxyribose sugar) were previously compared. Although preliminary energetic results indicated that the protein backbone strengthens, while the deoxyribose sugar either strengthens or weakens, the interaction calculated using truncated models, the reasons for these effects were unknown. The present work explains these observations by dissecting the interaction energy of the extended complexes into individual backbone···π and π···π components. Our calculations reveal that the total interaction energy of the extended complex can be predicted as a sum of the backbone···π and π···π components, which indicates that the biological backbone does not significantly affect the ring system through π-polarization. Instead, we find that the backbone can indirectly affect the magnitude of the π···π contribution by changing the relative ring orientations in extended dimers compared with truncated dimers. Furthermore, the strengths of the individual backbone···π contributions are determined to be significant (up to 18 kJ mol⁻¹). Therefore, the origin of the energetic change upon model extension is found to result from a balance between an additional (attractive) backbone···π component and differences in the strength of the π···π interaction. In addition, to understand the effects of the biological backbone on the stacking interactions at DNA-protein interfaces in nature, we analyzed the stacking interactions found in select DNA-protein crystal structures, and verified that an additive approach can be used to examine the strength of these interactions in biological complexes. 
Interestingly, although the presence of attractive backbone···π contacts is qualitatively confirmed using the quantum theory of atoms in molecules (QTAIM), QTAIM electron density analysis is unable to quantitatively predict the additive relationship of these interactions. Most importantly, this work reveals that both the backbone···π and π···π components must be carefully considered to accurately determine the overall stability of DNA-protein assemblies.
Affiliation(s)
- Cassandra D M Churchill
- Department of Chemistry and Biochemistry, University of Lethbridge, 4401 University Drive, Lethbridge, Alberta, Canada T1K 3M4
|