1
Zhou J, Xu R, He Y, Lu Q, Wang H, Kong B. PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context. Sci Rep 2016; 6:27653. [PMID: 27282833 PMCID: PMC4901350 DOI: 10.1038/srep27653] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/02/2015] [Accepted: 05/18/2016] [Indexed: 02/01/2023] Open
Abstract
Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches use only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove redundancies in the feature space. Finally, a predictor (PDNAsite) was developed by integrating a support vector machine (SVM) classifier with ensemble learning. Results on the PDNA-62 and PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that combining the two yields a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that interactions between neighboring binding sites are important for protein-DNA recognition and binding ability. A comparison between PDNAsite and existing methods indicates that PDNAsite outperforms most of them and is a useful tool for DNA-binding site identification. A web server for the predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely available to the biological research community.
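The difference between the two kinds of context can be illustrated with a minimal sketch. The abstract does not say how either context is computed, so everything here is an assumption for illustration only: the function names, the use of one 3D coordinate per residue, and the definition of spatial context as the `k` nearest residues by Euclidean distance are hypothetical and are not taken from PDNAsite.

```python
import math

def sequence_context(index, n_residues, flank=2):
    """Indices of residues within `flank` positions of `index` along the chain."""
    lo = max(0, index - flank)
    hi = min(n_residues, index + flank + 1)
    return [i for i in range(lo, hi) if i != index]

def spatial_context(index, coords, k=2):
    """Indices of the k residues closest in 3D space to residue `index`.

    Assumption: `coords` holds one (x, y, z) tuple per residue, e.g. a
    representative atom; the actual method may define neighbors differently.
    """
    dists = [
        (math.dist(coords[index], c), i)
        for i, c in enumerate(coords)
        if i != index
    ]
    return [i for _, i in sorted(dists)[:k]]

# Toy hairpin: residues 0 and 5 are far apart in sequence but adjacent in space.
hairpin = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (1, 1, 0), (0, 1, 0)]
print(sequence_context(0, len(hairpin)))  # sequence neighbors of residue 0
print(spatial_context(0, hairpin))        # spatial neighbors of residue 0
```

On this toy hairpin, residue 5 appears in the spatial context of residue 0 but not in its sequence context, which is the kind of information a purely sequence-based window cannot capture.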
Affiliation(s)
- Jiyun Zhou
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - Department of Computing, the Hong Kong Polytechnic University, Hong Kong
- Ruifeng Xu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - Shenzhen Engineering Laboratory of Performance Robots at Digital Stage, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
- Yulan He
  - School of Engineering and Applied Science, Aston University, UK
- Qin Lu
  - Department of Computing, the Hong Kong Polytechnic University, Hong Kong
- Hongpeng Wang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Bing Kong
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
2
Didovyk A, Verdine GL. Structural origins of DNA target selection and nucleobase extrusion by a DNA cytosine methyltransferase. J Biol Chem 2012; 287:40099-105. [PMID: 23012373 DOI: 10.1074/jbc.m112.413054] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND: How DNA 5-cytosine methyltransferases (DCMTases) select their substrate nucleobase for extrusion from the DNA duplex is poorly understood. RESULTS: The crystal structure of a pre-extrusion M.HaeIII DCMTase-substrate DNA complex is reported here. CONCLUSION: M.HaeIII selects its substrate cytosine for extrusion by selectively interfering with its stacking and hydrogen bonding interactions within the DNA duplex. SIGNIFICANCE: This is the first structural elucidation of target cytosine selection by a DCMTase.
Epigenetic methylation of cytosine residues in DNA is an essential element of genome maintenance and function in organisms ranging from bacteria to humans. DNA 5-cytosine methyltransferase enzymes (DCMTases) catalyze cytosine methylation via reaction intermediates in which the DNA is drastically remodeled, with the target cytosine residue extruded from the DNA helix and plunged into the active site pocket of the enzyme. We have determined a crystal structure of M.HaeIII DCMTase in complex with its DNA substrate in a previously unobserved state, prior to extrusion of the target cytosine and frameshifting of the DNA recognition sequence. The structure reveals that M.HaeIII selects the target cytosine and destabilizes its base-pairing through a precise, focused, and coordinated assault on the duplex DNA, which isolates the target cytosine from its nearest neighbors and thereby facilitates its extrusion from DNA.
Affiliation(s)
- Andriy Didovyk
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
3
Darii MV, Cherepanova NA, Subach OM, Kirsanova OV, Raskó T, Ślaska-Kiss K, Kiss A, Deville-Bonne D, Reboud-Ravaux M, Gromova ES. Mutational analysis of the CG recognizing DNA methyltransferase SssI: Insight into enzyme–DNA interactions. Biochimica et Biophysica Acta - Proteins and Proteomics 2009; 1794:1654-62. [DOI: 10.1016/j.bbapap.2009.07.016] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 04/16/2009] [Revised: 07/09/2009] [Accepted: 07/24/2009] [Indexed: 10/20/2022]
4
Yan C, Terribilini M, Wu F, Jernigan RL, Dobbs D, Honavar V. Predicting DNA-binding sites of proteins from amino acid sequence. BMC Bioinformatics 2006; 7:262. [PMID: 16712732 PMCID: PMC1534068 DOI: 10.1186/1471-2105-7-262] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.6] [Received: 11/28/2005] [Accepted: 05/19/2006] [Indexed: 11/20/2022] Open
Abstract
Background: Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions.
Results: We start with a Naïve Bayes classifier trained to predict whether a given amino acid residue is a DNA-binding residue based on its identity and the identities of its sequence neighbors. The input to the classifier consists of the identities of the target residue and 4 sequence neighbors on each side of the target residue. The classifier is trained and evaluated (using leave-one-out cross-validation) on a non-redundant set of 171 proteins. Our results indicate the feasibility of identifying interface residues based on local sequence information. The classifier achieves 71% overall accuracy with a correlation coefficient of 0.24, 35% specificity and 53% sensitivity in identifying interface residues as evaluated by leave-one-out cross-validation. We show that the performance of the classifier is improved by using sequence entropy of the target residue (the entropy of the corresponding column in multiple alignment obtained by aligning the target sequence with its sequence homologs) as additional input. The classifier achieves 78% overall accuracy with a correlation coefficient of 0.28, 44% specificity and 41% sensitivity in identifying interface residues. Examination of the predictions in the context of 3-dimensional structures of proteins demonstrates the effectiveness of this method in identifying DNA-binding sites from sequence information. In 33% (56 out of 171) of the proteins, the classifier identifies the interaction sites by correctly recognizing at least half of the interface residues. In 87% (149 out of 171) of the proteins, the classifier correctly identifies at least 20% of the interface residues. This suggests the possibility of using such classifiers to identify potential DNA-binding motifs and to gain potentially useful insights into sequence correlates of protein-DNA interactions.
Conclusion: Naïve Bayes classifiers trained to identify DNA-binding residues using sequence information offer a computationally efficient approach to identifying putative DNA-binding sites in DNA-binding proteins and recognizing potential DNA-binding motifs.
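The windowed Naïve Bayes setup described in this abstract (target residue plus 4 neighbors on each side) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the padding symbol, the Laplace smoothing constant, the class and function names, and the toy training data are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict

PAD = "-"  # hypothetical padding symbol for positions past the sequence ends

def windows(seq, flank=4):
    """Return one window of length 2*flank + 1 per residue, padded at the ends."""
    padded = PAD * flank + seq + PAD * flank
    return [padded[i:i + 2 * flank + 1] for i in range(len(seq))]

class WindowNaiveBayes:
    """Categorical Naive Bayes over window positions with Laplace smoothing."""

    def __init__(self, alphabet_size=21, alpha=1.0):
        self.alphabet_size = alphabet_size  # assumed: 20 amino acids + padding
        self.alpha = alpha                  # assumed smoothing constant
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, position) -> residue counts

    def fit(self, window_list, labels):
        for window, label in zip(window_list, labels):
            self.class_counts[label] += 1
            for pos, residue in enumerate(window):
                self.feature_counts[(label, pos)][residue] += 1
        return self

    def log_posterior(self, window, label):
        """Unnormalized log P(label | window) under the independence assumption."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        denom = self.class_counts[label] + self.alpha * self.alphabet_size
        for pos, residue in enumerate(window):
            count = self.feature_counts[(label, pos)][residue]
            score += math.log((count + self.alpha) / denom)
        return score

    def predict(self, window):
        return max(self.class_counts, key=lambda lab: self.log_posterior(window, lab))

# Toy data (made up): 1 marks a DNA-binding residue, 0 a non-binding residue.
toy_seq = "KKKAAA"
toy_labels = [1, 1, 1, 0, 0, 0]
model = WindowNaiveBayes().fit(windows(toy_seq, flank=0), toy_labels)
print(model.predict("K"), model.predict("A"))
```

With `flank=0` the window reduces to the target residue alone, which makes the toy example easy to check by hand; passing `flank=4` gives the 9-residue input described in the abstract.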
Affiliation(s)
- Changhui Yan
  - Department of Computer Science, Utah State University, Logan, Utah, 84341, USA
- Michael Terribilini
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, 50010, USA
  - Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, Iowa, 50010, USA
- Feihong Wu
  - Artificial Intelligence Research Laboratory, Iowa State University, Ames, Iowa, 50010, USA
  - Department of Computer Science, Iowa State University, Ames, Iowa, 50010, USA
  - Center for Computational Intelligence, Learning, and Discovery, Iowa State University, Ames, Iowa, 50010, USA
- Robert L Jernigan
  - Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, Iowa, 50010, USA
  - Center for Computational Intelligence, Learning, and Discovery, Iowa State University, Ames, Iowa, 50010, USA
  - Laurence H Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, Iowa, 50010, USA
  - Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa, 50010, USA
- Drena Dobbs
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, 50010, USA
  - Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, Iowa, 50010, USA
  - Artificial Intelligence Research Laboratory, Iowa State University, Ames, Iowa, 50010, USA
  - Center for Computational Intelligence, Learning, and Discovery, Iowa State University, Ames, Iowa, 50010, USA
  - Laurence H Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, Iowa, 50010, USA
- Vasant Honavar
  - Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, Iowa, 50010, USA
  - Artificial Intelligence Research Laboratory, Iowa State University, Ames, Iowa, 50010, USA
  - Department of Computer Science, Iowa State University, Ames, Iowa, 50010, USA
  - Center for Computational Intelligence, Learning, and Discovery, Iowa State University, Ames, Iowa, 50010, USA
  - Laurence H Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, Iowa, 50010, USA
5
Gowher H, Loutchanwoot P, Vorobjeva O, Handa V, Jurkowska RZ, Jurkowski TP, Jeltsch A. Mutational Analysis of the Catalytic Domain of the Murine Dnmt3a DNA-(cytosine C5)-methyltransferase. J Mol Biol 2006; 357:928-41. [PMID: 16472822 DOI: 10.1016/j.jmb.2006.01.035] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.2] [Received: 11/02/2005] [Revised: 12/22/2005] [Accepted: 01/08/2006] [Indexed: 11/15/2022]
Abstract
On the basis of amino acid sequence alignments and structural data of related enzymes, we have performed a mutational analysis of 14 amino acid residues in the catalytic domain of the murine Dnmt3a DNA-(cytosine C5)-methyltransferase. The target residues are located within the ten conserved amino acid sequence motifs characteristic of cytosine-C5 methyltransferases and in the putative DNA recognition domain of the enzyme (TRD). Mutant proteins were purified and tested for their catalytic properties and their abilities to bind DNA and S-adenosyl-L-methionine (AdoMet). We prepared a structural model of Dnmt3a to interpret our results. We demonstrate that Phe50 (motif I) and Glu74 (motif II) are important for AdoMet binding and catalysis. D96A (motif III) showed reduced AdoMet binding but increased activity under conditions of saturation with AdoMet, indicating that the contact of Asp96 to AdoMet is not required for catalysis. R130A (following motif IV), R241A and R246A (in the TRD), and R292A and R297A (both located in front of motif X) showed reduced DNA binding. R130A displayed a strong reduction in catalytic activity and a complete change in flanking sequence preferences, indicating that Arg130 has an important role in the DNA interaction of Dnmt3a. R292A also displayed reduced activity and changes in the flanking sequence preferences, indicating a potential role in DNA contacts farther away from the CG target site. N167A (motif VI) and R202A (motif VIII) have normal AdoMet and DNA binding but reduced catalytic activity. While Asn167 might contribute to the positioning of residues from motif VI, structural data indicate that Arg202 has a role in catalysis of cytosine-C5 methyltransferases. The R295A variant was catalytically inactive, most likely because of destabilization of the hinge sub-domain of the protein.
Affiliation(s)
- Humaira Gowher
  - International University Bremen, Biochemistry, School of Engineering and Science, Campus Ring 1, 28759 Bremen, Germany