1
Sagendorf JM, Berman HM, Rohs R. DNAproDB: an interactive tool for structural analysis of DNA-protein complexes. Nucleic Acids Res 2017; 45:W89-W97. [PMID: 28431131 PMCID: PMC5570235 DOI: 10.1093/nar/gkx272] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 02/01/2017] [Accepted: 04/06/2017] [Indexed: 02/06/2023]
Abstract
Many biological processes are mediated by complex interactions between DNA and proteins. Transcription factors, various polymerases, nucleases and histones recognize and bind DNA with different levels of binding specificity. To understand the physical mechanisms that allow proteins to recognize DNA and achieve their biological functions, it is important to analyze structures of DNA–protein complexes in detail. DNAproDB is a web-based interactive tool designed to help researchers study these complexes. DNAproDB provides an automated structure-processing pipeline that extracts structural features from DNA–protein complexes. The extracted features are organized in structured data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface. Additionally, users can upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction that can be exported as high quality figures. All functionality is documented and freely accessible at http://dnaprodb.usc.edu.
Affiliation(s)
- Jared M Sagendorf
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Helen M Berman
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Remo Rohs
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
2
Kamagata K, Murata A, Itoh Y, Takahashi S. Characterization of facilitated diffusion of tumor suppressor p53 along DNA using single-molecule fluorescence imaging. Journal of Photochemistry and Photobiology C: Photochemistry Reviews 2017. [DOI: 10.1016/j.jphotochemrev.2017.01.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/24/2023]
3
Ghosh P, Sowdhamini R. Genome-wide survey of putative RNA-binding proteins encoded in the human proteome. Molecular BioSystems 2016; 12:532-40. [PMID: 26675803 DOI: 10.1039/c5mb00638d] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 01/11/2023]
Abstract
RNA-binding proteins (RBPs) are involved in various post-transcriptional gene regulatory processes and are also functionally important members of the ribosome and the spliceosome. However, RBPs and their interactions with RNA are less well-studied in comparison to DNA-binding proteins. We have classified the existing RBP structures, available in complexes with RNA and RNA/DNA hybrids, into different structural families and created Hidden Markov Models (HMMs). These structure-centric family HMMs, along with the sequence-centric family HMMs, were used as a primary database to systematically search the human proteome for the presence of putative RBPs. We have found more than 2600 gene products with RBP signatures in humans, of which around 28% are likely to bind to RNA but not DNA, whereas 9% might bind to both RNA and DNA. 11% of them do not contain an explicit functional annotation yet. Nearly 30% of the putative RBPs are exclusively nuclear, 15% have known disease associations and around 30% are enzymes. Around 40% of the proteins identified in this study are novel and have not been reported by recent large-scale studies on human RBPs.
Affiliation(s)
- Pritha Ghosh
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka 560 065, India.
- R Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka 560 065, India.
4
Zanegina O, Kirsanov D, Baulin E, Karyagina A, Alexeevski A, Spirin S. An updated version of NPIDB includes new classifications of DNA-protein complexes and their families. Nucleic Acids Res 2016; 44:D144-53. [PMID: 26656949 PMCID: PMC4702928 DOI: 10.1093/nar/gkv1339] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/11/2015] [Revised: 11/13/2015] [Accepted: 11/16/2015] [Indexed: 11/13/2022]
Abstract
The recent upgrade of nucleic acid-protein interaction database (NPIDB, http://npidb.belozersky.msu.ru/) includes a newly elaborated classification of complexes of protein domains with double-stranded DNA and a classification of families of related complexes. Our classifications are based on contacting structural elements of both DNA: the major groove, the minor groove and the backbone; and protein: helices, beta-strands and unstructured segments. We took into account both hydrogen bonds and hydrophobic interaction. The analyzed material contains 1942 structures of protein domains from 748 PDB entries. We have identified 97 interaction modes of individual protein domain-DNA complexes and 17 DNA-protein interaction classes of protein domain families. We analyzed the sources of diversity of DNA-protein interaction modes in different complexes of one protein domain family. The observed interaction mode is sometimes influenced by artifacts of crystallization or diversity in secondary structure assignment. The interaction classes of domain families are more stable and thus possess more biological sense than a classification of single complexes. Integration of the classification into NPIDB allows the user to browse the database according to the interacting structural elements of DNA and protein molecules. For each family, we present average DNA shape parameters in contact zones with domains of the family.
Affiliation(s)
- Olga Zanegina
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow 119992, Russia
- Eugene Baulin
- Laboratory of Applied Mathematics, Institute of Mathematical Problems in Biology, Puschino 142290, Russia
- Anna Karyagina
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow 119992, Russia; Laboratory of Biologically Active Nanostructures, Gamaleya Center of Epidemiology and Microbiology, Moscow 123098, Russia; Laboratory of Genome Analysis, Institute of Agricultural Biotechnology, Moscow 127550, Russia
- Andrei Alexeevski
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow 119992, Russia; Sector of Applied Informatics, Research Institute for System Studies, Moscow 117218, Russia
- Sergey Spirin
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow 119992, Russia; Sector of Applied Informatics, Research Institute for System Studies, Moscow 117218, Russia
5
Malhotra S, Sowdhamini R. Collation and analyses of DNA-binding protein domain families from sequence and structural databanks. Molecular BioSystems 2015; 11:1110-8. [PMID: 25656606 DOI: 10.1039/c4mb00629a] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
DNA-protein interactions govern several high fidelity cellular processes like DNA-replication, transcription, DNA repair, etc. Proteins that have the ability to recognise and bind DNA sequences can be classified either according to their DNA-binding motif or based on the sequence of the target nucleotides. We have collated the DNA-binding families by integrating information from both protein sequence family and structural databases. This resulted in a dataset of 1057 DNA-binding protein domain families. Their family properties (the number of members, percent identity distribution and length of members) and domain architectures were examined. Further, sequence domain families were mapped to structures in the protein databank (PDB) and the protein domain structure classification database (SCOP). The DNA-binding families, with no structural information, were clustered together into potential superfamilies based on sequence associations. On the basis of functions attributed to DNA-binding protein folds, we observe that a majority of the DNA-binding proteins follow divergent evolution. This study can serve as a basis for annotation and distribution of DNA-binding proteins in genome(s) of interest. The entire collated set of DNA-binding protein domains is available for download as Hidden Markov Models.
Affiliation(s)
- Sony Malhotra
- National Centre for Biological Sciences, Bellary Road, GKVK Campus, Bangalore, India.
6
Malhotra S, Mathew OK, Sowdhamini R. DOCKSCORE: a webserver for ranking protein-protein docked poses. BMC Bioinformatics 2015; 16:127. [PMID: 25902779 PMCID: PMC4414291 DOI: 10.1186/s12859-015-0572-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 07/22/2014] [Accepted: 04/13/2015] [Indexed: 11/28/2022]
Abstract
Background: Proteins interact with a variety of other molecules such as nucleic acids, small molecules and other proteins inside the cell. Structure determination of protein-protein complexes is challenging for several reasons, such as the large molecular weights of these macromolecular complexes, their dynamic nature, and difficulty in purification and sample preparation. Computational docking permits an early understanding of the feasibility and mode of protein-protein interactions. However, docking algorithms propose a number of solutions and it is a challenging task to select the native or near-native pose(s) from this pool. DockScore is an objective scoring scheme that can be used to rank protein-protein docked poses. It considers several interface parameters, namely, surface area, evolutionary conservation, hydrophobicity, short contacts and spatial clustering at the interface for scoring. Results: We have implemented DockScore in the form of a webserver for use by the scientific community. The DockScore webserver can be employed, subsequent to docking, to score the docked solutions, starting from multiple poses as inputs. The scores and ranks for all the poses can be downloaded as a csv file, and a graphical view of the interface of the best-ranking poses is available. Conclusions: The webserver for DockScore is made freely available to the scientific community at: http://caps.ncbs.res.in/dockscore/. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0572-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sony Malhotra
- National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore, 560 065, India.
- Oommen K Mathew
- National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore, 560 065, India; SASTRA University, Tirumalaisamudram, Thanjavur, 613 401, Tamil Nadu, India.
- Ramanathan Sowdhamini
- National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore, 560 065, India.
7
Malhotra S, Sowdhamini R. Genome-wide survey of DNA-binding proteins in Arabidopsis thaliana: analysis of distribution and functions. Nucleic Acids Res 2013; 41:7212-9. [PMID: 23775796 PMCID: PMC3753632 DOI: 10.1093/nar/gkt505] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/04/2022]
Abstract
The interaction of proteins with their respective DNA targets is known to control many high-fidelity cellular processes. Performing a comprehensive survey of the sequenced genomes for DNA-binding proteins (DBPs) will help in understanding their distribution and the associated functions in a particular genome. Availability of fully sequenced genome of Arabidopsis thaliana enables the review of distribution of DBPs in this model plant genome. We used profiles of both structure and sequence-based DNA-binding families, derived from PDB and PFam databases, to perform the survey. This resulted in 4471 proteins, identified as DNA-binding in Arabidopsis genome, which are distributed across 300 different PFam families. Apart from several plant-specific DNA-binding families, certain RING fingers and leucine zippers also had high representation. Our search protocol helped to assign DNA-binding property to several proteins that were previously marked as unknown, putative or hypothetical in function. The distribution of Arabidopsis genes having a role in plant DNA repair were particularly studied and noted for their functional mapping. The functions observed to be overrepresented in the plant genome harbour DNA-3-methyladenine glycosylase activity, alkylbase DNA N-glycosylase activity and DNA-(apurinic or apyrimidinic site) lyase activity, suggesting their role in specialized functions such as gene regulation and DNA repair.
Affiliation(s)
- Sony Malhotra
- National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore 560 065, India
8
Gromiha MM, Nagarajan R. Computational approaches for predicting the binding sites and understanding the recognition mechanism of protein-DNA complexes. Advances in Protein Chemistry and Structural Biology 2013; 91:65-99. [PMID: 23790211 DOI: 10.1016/b978-0-12-411637-5.00003-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022]
Abstract
Protein-DNA recognition plays an important role in the regulation of gene expression. Understanding the influence of specific residues for protein-DNA interactions and the recognition mechanism of protein-DNA complexes is a challenging task in molecular and computational biology. Several computational approaches have been put forward to tackle these problems from different perspectives: (i) development of databases for the interactions between protein and DNA and binding specificity of protein-DNA complexes, (ii) structural analysis of protein-DNA complexes, (iii) discriminating DNA-binding proteins from amino acid sequence, (iv) prediction of DNA-binding sites and protein-DNA binding specificity using sequence and/or structural information, and (v) understanding the recognition mechanism of protein-DNA complexes. In this review, we focus on all these issues and extensively discuss the advancements on the development of comprehensive bioinformatics databases for protein-DNA interactions, efficient tools for identifying the binding sites, and plausible mechanisms for understanding the recognition of protein-DNA complexes. Further, the available online resources for understanding protein-DNA interactions are collectively listed, which will serve as ready-to-use information for the research community.
Affiliation(s)
- M Michael Gromiha
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India.