1
Inan T, Flinko R, Lewis GK, MacKerell AD, Kurkcuoglu O. Identifying and Assessing Putative Allosteric Sites and Modulators for CXCR4 Predicted through Network Modeling and Site Identification by Ligand Competitive Saturation. J Phys Chem B 2024; 128:5157-5174. [PMID: 38647430 PMCID: PMC11139592 DOI: 10.1021/acs.jpcb.4c00925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 04/04/2024] [Accepted: 04/08/2024] [Indexed: 04/25/2024]
Abstract
The chemokine receptor CXCR4 is a critical target for the treatment of several cancer types and HIV-1 infections. While orthosteric and allosteric modulators have been developed targeting its extracellular or transmembrane regions, the intramembrane region of CXCR4 may also include allosteric binding sites suitable for the development of allosteric drugs. To investigate this, we apply the Gaussian Network Model (GNM) to the monomeric and dimeric forms of CXCR4 to identify residues essential for its local and global motions located in the hinge regions of the protein. Residue interaction network (RIN) analysis suggests hub residues that participate in allosteric communication throughout the receptor. Mutual residues from the network models reside in regions with a high capacity to alter receptor dynamics upon ligand binding. We then investigate the druggability of these potential allosteric regions using the site identification by ligand competitive saturation (SILCS) approach, revealing two putative allosteric sites on the monomer and three on the homodimer. Two screening campaigns with Glide and SILCS-Monte Carlo docking using FDA-approved drugs suggest 20 putative hit compounds including antifungal drugs, anticancer agents, HIV protease inhibitors, and antimalarial drugs. In vitro assays considering mAB 12G5 and CXCL12 demonstrate both positive and negative allosteric activities of these compounds, supporting our computational approach. However, in vivo functional assays based on the recruitment of β-arrestin to CXCR4 do not show significant agonism and antagonism at a single compound concentration. The present computational pipeline brings a new perspective to computer-aided drug design by combining conformational dynamics based on network analysis and cosolvent analysis based on the SILCS technology to identify putative allosteric binding sites using CXCR4 as a showcase.
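The Gaussian Network Model step mentioned in this abstract can be illustrated with a small sketch. It is not the authors' code: the 7.3 Å contact cutoff, the function names, and the toy coordinates are assumptions, and a plain power iteration stands in for the eigen-solver a real analysis would use. The sketch builds the Kirchhoff (connectivity) matrix from Cα-like coordinates, extracts the slowest non-trivial mode, and reports where that mode changes sign, which is how hinge positions are commonly read off in GNM analyses.

```python
import math


def kirchhoff(coords, cutoff=7.3):
    """Connectivity matrix: -1 for node pairs within the cutoff, contact count on the diagonal."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:  # cutoff is an assumed value
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def slowest_mode(gamma, iterations=2000):
    """Eigenvector of the smallest non-zero eigenvalue, assuming a connected network."""
    n = len(gamma)
    shift = 2.0 * max(gamma[i][i] for i in range(n))  # Gershgorin bound on the largest eigenvalue
    v = [i - (n - 1) / 2 for i in range(n)]           # deterministic, zero-mean start vector
    for _ in range(iterations):
        # Power iteration on (shift * I - gamma) turns the slowest mode into the dominant one.
        w = [shift * v[i] - sum(gamma[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]                     # remove the trivial uniform mode
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def hinge_sites(mode):
    """Indices i where the mode changes sign between node i and node i + 1."""
    return [i for i in range(len(mode) - 1) if mode[i] * mode[i + 1] < 0]


# Toy input: ten nodes on a straight line, 3.8 Å apart (hypothetical, not CXCR4 data).
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]
print(hinge_sites(slowest_mode(kirchhoff(toy_coords))))
```

For a real structure the coordinates would come from the Cα atoms of a PDB entry, and several slow modes would normally be inspected rather than only the first.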
Affiliation(s)
- Tugce Inan
  - Department of Chemical Engineering, Istanbul Technical University, Istanbul 34469, Turkey
- Robin Flinko
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland 21201, United States
- George K. Lewis
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland 21201, United States
- Alexander D. MacKerell
  - University of Maryland Computer-Aided Drug Design Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, Maryland 21201, United States
- Ozge Kurkcuoglu
  - Department of Chemical Engineering, Istanbul Technical University, Istanbul 34469, Turkey
2
Ropii B, Bethasari M, Anshori I, Koesoema AP, Shalannanda W, Satriawan A, Setianingsih C, Akbar MR, Aditama R, Fahmi F, Sutanto E, Yazid M, Aziz M. The molecular interaction of six single-stranded DNA aptamers to cardiac troponin I revealed by docking and molecular dynamics simulation. PLoS One 2024; 19:e0302475. [PMID: 38748685 PMCID: PMC11095691 DOI: 10.1371/journal.pone.0302475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 04/04/2024] [Indexed: 05/19/2024]
Abstract
Cardiac troponin I (cTnI) is a cardiac biomarker for diagnosing ischemic heart disease and acute myocardial infarction. Current biochemical assays use antibodies (Abs) due to their high specificity and sensitivity. However, antibodies have limitations: high production cost owing to complex instruments, reagents, and steps; batch-to-batch variability in quality; low stability at high temperatures; and difficulty of chemical modification. Aptamers overcome these limitations, offering relatively lower cost, high reproducibility, high stability, and ease of chemical modification. Aptamers are three-dimensional architectures of single-stranded RNA or DNA that bind to targets such as proteins. Six aptamers (Tro1-Tro6) with higher binding affinity than an antibody have been identified, but their molecular interactions have not been studied. In this study, six DNA aptamers were modeled and docked to the cTnI protein. Molecular docking revealed that all aptamers interacted with a similar region of cTnI. The interactions involved hydrophobic contacts, hydrogen bonds, π-cation interactions, π-stacking interactions, and salt-bridge formation. The calculated binding energy of all complexes was negative, which means that complex formation was thermodynamically favorable. The electrostatic energy term was the main driving force of the interaction between each aptamer and cTnI. This study could be used to predict the behavior of further modified aptamers to improve aptamer performance.
Affiliation(s)
- Bejo Ropii
  - School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, West Java, Indonesia
- Maulidwina Bethasari
  - Department of Pharmacy, Universitas Muhammadiyah Bandung, Bandung, West Java, Indonesia
- Isa Anshori
  - School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, West Java, Indonesia
  - Center for Health and Sports Technology, Bandung Institute of Technology, Bandung, West Java, Indonesia
- Allya Paramita Koesoema
  - School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, West Java, Indonesia
- Wervyan Shalannanda
  - School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, West Java, Indonesia
- Ardianto Satriawan
  - School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, West Java, Indonesia
- Casi Setianingsih
  - Department of Computer Engineering, School of Electrical Engineering, Telkom University, Bandung Regency, West Java, Indonesia
- Mohammad Rizki Akbar
  - Department of Cardiology and Vascular Medicine, Faculty of Medicine, Universitas Padjadjaran and Dr. Hasan Sadikin General Hospital, Bandung, Indonesia
- Reza Aditama
  - Biochemistry and Biomolecular Engineering Research Division, Faculty of Mathematics and Natural Sciences, Bandung Institute of Technology, Bandung, West Java, Indonesia
- Fahmi Fahmi
  - Department of Electrical Engineering, Faculty of Engineering, Universitas Sumatera Utara, Medan, North Sumatera, Indonesia
- Erwin Sutanto
  - Department of Physics, Faculty of Science and Technology, Universitas Airlangga, Kampus C Unair Mulyorejo, Surabaya, East Java, Indonesia
- Muhammad Yazid
  - Biomedical Engineering Department, Institut Teknologi Sepuluh Nopember, Surabaya, East Java, Indonesia
- Muhammad Aziz
  - Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
3
Mursalim MKN, Mengko TLER, Hertadi R, Purwarianti A, Susanty M. BiCaps-DBP: Predicting DNA-binding proteins from protein sequences using Bi-LSTM and a 1D-capsule network. Comput Biol Med 2023; 163:107241. [PMID: 37437362 DOI: 10.1016/j.compbiomed.2023.107241] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/03/2023] [Revised: 06/23/2023] [Accepted: 07/07/2023] [Indexed: 07/14/2023]
Abstract
Predicting DNA-binding proteins (DBPs) based solely on primary sequences is one of the most challenging problems in genome annotation. DBPs play a crucial role in various biological processes, including DNA replication, transcription, repair, and splicing. Some DBPs are essential in pharmaceutical research on various human cancers and autoimmune diseases. Existing experimental methods for identifying DBPs are time-consuming and costly. Therefore, developing a rapid and accurate computational technique is necessary to address the issue. This study introduces BiCaps-DBP, a deep learning-based method that improves DBP prediction performance by combining bidirectional long short-term memory with a 1D-capsule network. This study uses three training and independent datasets to evaluate the proposed model's generalizability and robustness. Based on three independent datasets, BiCaps-DBP achieved 1.05%, 5.79% and 0.40% higher accuracies than an existing predictor for PDB2272, PDB186 and PDB20000, respectively. These outcomes indicate that the proposed method is a promising DBP predictor.
Affiliation(s)
- Muhammad K N Mursalim
  - School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, 40132, Indonesia; Department of Informatics Engineering, Universal University, Batam, Indonesia
- Tati L E R Mengko
  - School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, 40132, Indonesia
- Rukman Hertadi
  - Faculty of Mathematics and Natural Sciences, Bandung Institute of Technology, Bandung, 40132, Indonesia
- Ayu Purwarianti
  - School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, 40132, Indonesia; Center for Artificial Intelligence (U-CoE AI-VLB), Bandung Institute of Technology, Bandung, 40132, Indonesia
- Meredita Susanty
  - School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, 40132, Indonesia; Department of Computer Science, Pertamina University, Jakarta, Indonesia
4
Guan S, Zou Q, Wu H, Ding Y. Protein-DNA Binding Residues Prediction Using a Deep Learning Model With Hierarchical Feature Extraction. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2619-2628. [PMID: 35834447 DOI: 10.1109/tcbb.2022.3190933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Biologically important effects occur when proteins bind to other substances, of which binding to DNA is a crucial one. Therefore, accurate identification of protein-DNA binding residues is important for further understanding of the protein-DNA interaction mechanism. Although wet-lab methods can accurately obtain the location of bound residues, they require significant human, financial and time costs. There is thus an urgent need to develop efficient computational methods. Most current state-of-the-art methods are two-step approaches: the first step uses a sliding window technique to extract residue features; the second step uses each residue as an input to the model for prediction. This has a negative impact on the efficiency of prediction and ease of use. In this study, we propose a sequence-to-sequence (seq2seq) model that takes the entire protein sequence of variable length as input and uses two modules, Transformer Encoder Block and Feature Extracting Block, for hierarchical feature extraction: Transformer Encoder Block extracts global features, and Feature Extracting Block then extracts local features to further improve the recognition capability of the model. The comparison results on two benchmark datasets, namely PDNA-543 and PDNA-41, demonstrate the effectiveness of our method in identifying protein-DNA binding residues.
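The sliding-window step that this abstract attributes to earlier two-step predictors can be sketched as below. This is a minimal illustration rather than any published predictor's code: the function name `sliding_windows`, the window half-width, and zero-padding at the sequence ends are all assumptions.

```python
def sliding_windows(features, half_width=2, pad=0.0):
    """Return one flattened, fixed-length feature window per residue.

    features   : list of per-residue feature vectors (lists of floats)
    half_width : residues taken on each side of the centre (assumed value)
    pad        : filler used beyond the sequence ends (assumed zero-padding)
    """
    dim = len(features[0])
    padding = [[pad] * dim for _ in range(half_width)]
    padded = padding + [list(f) for f in features] + padding
    windows = []
    for centre in range(len(features)):
        # The slice covers residues centre - half_width .. centre + half_width.
        rows = padded[centre:centre + 2 * half_width + 1]
        # Flatten the rows into a single vector for a per-residue classifier.
        windows.append([value for row in rows for value in row])
    return windows


# Toy sequence of five residues with one feature each (hypothetical values).
toy = [[1.0], [2.0], [3.0], [4.0], [5.0]]
for residue_index, window in enumerate(sliding_windows(toy, half_width=1)):
    print(residue_index, window)
```

Each residue yields its own input vector, so a protein of length N needs N separate classifier calls. That per-residue cost is what a whole-sequence model avoids.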
5
Hu J, Bai YS, Zheng LL, Jia NX, Yu DJ, Zhang GJ. Protein-DNA Binding Residue Prediction via Bagging Strategy and Sequence-Based Cube-Format Feature. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3635-3645. [PMID: 34714748 DOI: 10.1109/tcbb.2021.3123828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Protein-DNA interactions play an important role in diverse biological processes. Accurately identifying protein-DNA binding residues is a critical but challenging task for protein function annotations and drug design. Although wet-lab experimental methods are the most accurate way to identify protein-DNA binding residues, they are time consuming and labor intensive. There is an urgent need to develop computational methods to rapidly and accurately predict protein-DNA binding residues. In this study, we propose a novel sequence-based method, named PredDBR, for predicting DNA-binding residues. In PredDBR, for each query protein, its position-specific frequency matrix (PSFM), predicted secondary structure (PSS), and predicted probabilities of ligand-binding residues (PPLBR) are first generated as three feature sources. Secondly, for each feature source, the sliding window technique is employed to extract the matrix-format feature of each residue. Then, we design two strategies, i.e., square root (SR) and average (AVE), to separately transform the PSFM-based matrix-format features and the two predicted-source (PSS-based and PPLBR-based) matrix-format features of each residue into three corresponding cube-format features. Finally, after serially combining the three cube-format features, the ensemble classifier is generated by applying a bagging strategy to multiple base classifiers built on a 2D convolutional neural network framework. The computational experimental results demonstrate that the proposed PredDBR achieves an average overall accuracy of 93.7% and a Matthews correlation coefficient of 0.405 on two independent validation datasets and outperforms several state-of-the-art sequence-based protein-DNA binding residue predictors. The PredDBR web-server is available at https://jun-csbio.github.io/PredDBR/.
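The pairing of a high overall accuracy with a moderate Matthews correlation coefficient is typical of imbalanced residue-level data, where non-binding residues dominate. The sketch below shows how both metrics follow from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the PredDBR evaluation.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 70 binding residues among 1000, most residues non-binding.
tp, fn, fp, tn = 30, 40, 23, 907
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_cc(tp, tn, fp, fn):.3f}")
```

Because true negatives dominate the accuracy, MCC is usually the more informative of the two figures when comparing binding-residue predictors.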
6
Marques-Pereira C, Pires M, Moreira IS. Discovery of Virus-Host interactions using bioinformatic tools. Methods Cell Biol 2022; 169:169-198. [DOI: 10.1016/bs.mcb.2022.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
7
Yuce M, Cicek E, Inan T, Dag AB, Kurkcuoglu O, Sungur FA. Repurposing of FDA-approved drugs against active site and potential allosteric drug-binding sites of COVID-19 main protease. Proteins 2021; 89:1425-1441. [PMID: 34169568 PMCID: PMC8441840 DOI: 10.1002/prot.26164] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 04/05/2021] [Revised: 06/02/2021] [Accepted: 06/06/2021] [Indexed: 02/06/2023]
Abstract
The novel coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) still has serious negative effects on health, social life, and economics. Recently, vaccines from various companies have been urgently approved to control SARS-CoV-2 infections. However, no specific antiviral drug has been confirmed so far for regular treatment. An important target is the main protease (Mpro), which plays a major role in replication of the virus. In this study, Gaussian and residue network models are employed to reveal two distinct potential allosteric sites on Mpro that can be evaluated as drug targets besides the active site. Then, Food and Drug Administration (FDA)-approved drugs are docked to three distinct sites with flexible docking using AutoDock Vina to identify potential drug candidates. Fourteen best molecule hits for the active site of Mpro are determined. Six of these also exhibit high docking scores for the potential allosteric regions. Full-atom molecular dynamics simulations with the MM-GBSA method indicate that compounds docked to active and potential allosteric sites form stable interactions with high binding free energy (∆Gbind) values. ∆Gbind values reach -52.06 kcal/mol for the active site, -51.08 kcal/mol for the potential allosteric site 1, and -42.93 kcal/mol for the potential allosteric site 2. Energy decomposition calculations per residue elucidate key binding residues stabilizing the ligands that can further serve to design pharmacophores. This systematic and efficient computational analysis successfully determines ivermectin, diosmin, and selinexor, currently subjected to clinical trials, and further proposes bromocriptine and elbasvir as Mpro inhibitor candidates to be evaluated against SARS-CoV-2 infections.
Affiliation(s)
- Merve Yuce
  - Department of Chemical Engineering, Istanbul Technical University, Istanbul, Turkey
- Erdem Cicek
  - Computational Science and Engineering Division, Informatics Institute, Istanbul Technical University, Istanbul, Turkey
- Tugce Inan
  - Department of Chemical Engineering, Istanbul Technical University, Istanbul, Turkey
- Aslihan Basak Dag
  - Department of Molecular Biology and Genetics, Istanbul Technical University, Istanbul, Turkey
- Ozge Kurkcuoglu
  - Department of Chemical Engineering, Istanbul Technical University, Istanbul, Turkey
- Fethiye Aylin Sungur
  - Computational Science and Engineering Division, Informatics Institute, Istanbul Technical University, Istanbul, Turkey
8
Hendrix SG, Chang KY, Ryu Z, Xie ZR. DeepDISE: DNA Binding Site Prediction Using a Deep Learning Method. Int J Mol Sci 2021; 22:5510. [PMID: 34073705 PMCID: PMC8197219 DOI: 10.3390/ijms22115510] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/01/2021] [Revised: 04/30/2021] [Accepted: 05/19/2021] [Indexed: 11/18/2022]
Abstract
It is essential for future research to develop a new, reliable prediction method of DNA binding sites because DNA binding sites on DNA-binding proteins provide critical clues about protein function and drug discovery. However, the current prediction methods of DNA binding sites have relatively poor accuracy. Using 3D coordinates and the atom-type of surface protein atom as the input, we trained and tested a deep learning model to predict how likely a voxel on the protein surface is to be a DNA-binding site. Based on three different evaluation datasets, the results show that our model not only outperforms several previous methods on two commonly used datasets, but also demonstrates its robust performance to be consistent among the three datasets. The visualized prediction outcomes show that the binding sites are also mostly located in correct regions. We successfully built a deep learning model to predict the DNA binding sites on target proteins. It demonstrates that 3D protein structures plus atom-type information on protein surfaces can be used to predict the potential binding sites on a protein. This approach should be further extended to develop the binding sites of other important biological molecules.
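The input representation described here, surface-atom coordinates plus atom types mapped onto voxels, can be sketched as follows. The abstract does not state the grid resolution, the atom-type vocabulary, or the exact encoding, so the 1 Å voxel size, the function name `voxelize`, and the per-type counts are assumptions made only for illustration.

```python
import math
from collections import defaultdict


def voxelize(atoms, voxel_size=1.0):
    """Map (x, y, z, atom_type) records to per-voxel atom-type counts.

    voxel_size is an assumed grid spacing in Å; each voxel is addressed by
    the integer index of the grid cell that contains the atom centre.
    """
    grid = defaultdict(lambda: defaultdict(int))
    for x, y, z, atom_type in atoms:
        index = (math.floor(x / voxel_size),
                 math.floor(y / voxel_size),
                 math.floor(z / voxel_size))
        grid[index][atom_type] += 1
    # Convert to plain dicts so the result is easy to inspect or serialise.
    return {index: dict(counts) for index, counts in grid.items()}


# Hypothetical surface atoms (coordinates in Å, element symbol as the atom type).
surface_atoms = [
    (0.2, 0.3, 0.1, "C"),
    (0.8, 0.9, 0.4, "N"),
    (1.5, 0.0, 0.0, "C"),
    (-0.1, 0.0, 0.0, "O"),
]
for voxel_index, counts in sorted(voxelize(surface_atoms).items()):
    print(voxel_index, counts)
```

A model of the kind described would then assign each occupied voxel a probability of belonging to a DNA-binding site.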
Affiliation(s)
- Samuel Godfrey Hendrix
  - Computational Drug Discovery Laboratory, School of Electrical and Computer Engineering, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Kuan Y. Chang
  - Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung 202, Taiwan
- Zeezoo Ryu
  - Computational Drug Discovery Laboratory, School of Electrical and Computer Engineering, College of Engineering, University of Georgia, Athens, GA 30602, USA
  - Department of Computer Science, Franklin College of Arts and Sciences, University of Georgia, Athens, GA 30602, USA
- Zhong-Ru Xie
  - Computational Drug Discovery Laboratory, School of Electrical and Computer Engineering, College of Engineering, University of Georgia, Athens, GA 30602, USA
9
Abstract
Biological processes are often mediated by complexes formed between proteins and various biomolecules. The 3D structures of such protein-biomolecule complexes provide insights into the molecular mechanism of their action. The structure of these complexes can be predicted by various computational methods. Choosing an appropriate method for modelling depends on the category of biomolecule that a protein interacts with and the availability of structural information about the protein and its interacting partner. We intend for the contents of this chapter to serve as a guide as to what software would be the most appropriate for the type of data at hand and the kind of 3D complex structure required. Particularly, we have dealt with protein-small molecule ligand, protein-peptide, protein-protein, and protein-nucleic acid interactions.
Most, if not all, model building protocols perform some sampling and scoring. Typically, several alternate conformations and configurations of the interactors are sampled. Each such sample is then scored for optimization. To boost the confidence in these predicted models, their assessment using other independent scoring schemes besides the inbuilt/default ones would prove to be helpful. This chapter also lists such software and serves as a guide to gauge the fidelity of modelled structures of biomolecular complexes.
10
Kurkcuoglu O, Gunes MU, Haliloglu T. Local and Global Motions Underlying Antibiotic Binding in Bacterial Ribosome. J Chem Inf Model 2020; 60:6447-6461. [PMID: 33231066 DOI: 10.1021/acs.jcim.0c00967] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
The bacterial ribosome is one of the most important targets in the treatment of infectious diseases. As antibiotic resistance in bacteria poses a growing threat, a significant amount of effort is concentrated on exploring new drug-binding sites where testable predictions are of significance. Here, we study the dynamics of a ribosomal complex and 67 small and large subunits of the ribosomal crystal structures (64 antibiotic-bound, 3 antibiotic-free) from Deinococcus radiodurans, Escherichia coli, Haloarcula marismortui, and Thermus thermophilus by the Gaussian network model. Interestingly, a network of nucleotides coupled in high-frequency fluctuations reveals known antibiotic-binding sites. These sites are seen to locate at the interface of dynamic domains that have an intrinsic dynamic capacity to interfere with functional globular motions. The nucleotides and the residues fluctuating in the fast and slow modes of motion thus have promise for plausible antibiotic-binding and allosteric sites that can alter antibiotic binding and resistance. Overall, the present analysis brings a new dynamic perspective to the long-discussed link between small-molecule binding and large conformational changes of the supramolecule.
Affiliation(s)
- Ozge Kurkcuoglu
  - Department of Chemical Engineering, Istanbul Technical University, Istanbul 34469, Turkey
- M Unal Gunes
  - Polymer Research Center, Bogazici University, Istanbul 34342, Turkey
- Turkan Haliloglu
  - Polymer Research Center, Bogazici University, Istanbul 34342, Turkey
11
Multiple protein-DNA interfaces unravelled by evolutionary information, physico-chemical and geometrical properties. PLoS Comput Biol 2020; 16:e1007624. [PMID: 32012150 PMCID: PMC7018136 DOI: 10.1371/journal.pcbi.1007624] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/13/2019] [Revised: 02/13/2020] [Accepted: 12/20/2019] [Indexed: 02/06/2023]
Abstract
Interactions between proteins and nucleic acids are at the heart of many essential biological processes. Despite increasing structural information about how these interactions may take place, our understanding of the usage made of protein surfaces by nucleic acids is still very limited. This is in part due to the inherent complexity associated with protein surface deformability and evolution. In this work, we present a method that helps decipher such complexity by predicting protein-DNA interfaces and characterizing their properties. It relies on three biologically and physically meaningful descriptors, namely evolutionary conservation, physico-chemical properties and surface geometry. We carefully assessed its performance on several hundreds of protein structures and compared it to several machine-learning state-of-the-art methods. Our approach achieves a higher sensitivity compared to the other methods, with a similar precision. Importantly, we show that it is able to unravel ‘hidden’ binding sites by applying it to unbound protein structures and to proteins binding to DNA via multiple sites and in different conformations. It is also applicable to the detection of RNA-binding sites, without significant loss of performance. This confirms that DNA and RNA-binding sites share similar properties. Our method is implemented as a fully automated tool, JET2DNA, freely accessible at: http://www.lcqb.upmc.fr/JET2DNA. We also provide a new dataset of 187 protein-DNA complex structures, along with a subset of 82 associated unbound structures. The set represents the largest body of high-resolution crystallographic structures of protein-DNA complexes, uses biological protein assemblies as DNA-binding units, and covers all major types of protein-DNA interactions. It is available at: http://www.lcqb.upmc.fr/PDNAbenchmarks.
Protein-DNA interactions are essential to living organisms and their impairment is associated with many diseases. For these reasons, they have become increasingly important therapeutic targets. Experimental structure determination has revealed different binding motifs and modes, associated with different functions. Yet, the available structural data gives us only a glimpse of the multiplicity and complexity of protein surface usage by DNA. In this work, we use a three-layer model to describe and predict DNA-binding sites at protein surfaces. Given a protein, we consider the way its residues are conserved through evolution, their physico-chemical properties and geometrical shapes to decrypt its surface. We are able to detect a large portion of interacting residues with good precision, even when they are ‘hidden’ by conformational changes. We highlight cases where one protein binds DNA via distinct regions to perform different functions. We are able to uncover the alternative binding sites and relate their properties with their specific roles. Our work can help guide mutagenesis experiments and the development of new drugs specifically targeting one site while limiting possible side effects.
12
Zhou J, Lu Q, Xu R, Gui L, Wang H. EL_LSTM: Prediction of DNA-Binding Residue from Protein Sequence by Combining Long Short-Term Memory and Ensemble Learning. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:124-135. [PMID: 30040656 DOI: 10.1109/tcbb.2018.2858806] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 06/08/2023]
Abstract
Most past works for DNA-binding residue prediction did not consider the relationships between residues. In this paper, we propose a novel approach for DNA-binding residue prediction, referred to as EL_LSTM, which includes two main components. The first component is the Long Short-Term Memory (LSTM), which learns pairwise relationships between residues through a bi-gram model and then learns feature vectors for all residues. The second component is an ensemble learning based classifier introduced to tackle the data imbalance problem in binding residue predictions. We use a variant of the bagging strategy in ensemble learning to achieve balanced samples. Evaluations on PDNA-224 and DBP-123 show that adding feature relationships performs better than classifiers without feature relationships by at least 0.028 on MCC, 1.18 percent on ST and 0.012 on AUC. This indicates the usefulness of feature relationships for DNA-binding residue predictions. Evaluation on using ensemble learning indicates that the improvement can reach at least 0.021 on MCC, 1.32 percent on ST, and 0.018 on AUC compared to the use of a single LSTM classifier. Comparisons with the state-of-the-art predictors show that our proposed EL_LSTM outperforms them significantly. Further feature analysis validates the effectiveness of LSTM for the prediction of DNA-binding residues.
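The idea of using bagging to obtain balanced training samples can be sketched as below. The abstract only says that a variant of bagging is used, so this is one common undersampling-based option rather than the EL_LSTM procedure itself: the function names, the number of bags, and the rule that ties vote negative are assumptions, and the sketch presumes that binding residues (label 1) are the minority class.

```python
import random


def balanced_bags(labels, n_bags=5, seed=0):
    """Build index bags holding every positive and an equal-sized random draw of negatives."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    bags = []
    for _ in range(n_bags):
        # Undersample the majority class without replacement for each bag.
        sampled_negatives = rng.sample(negatives, len(positives))
        bags.append(positives + sampled_negatives)
    return bags


def majority_vote(predictions):
    """Combine 0/1 predictions from the base classifiers; ties count as 0."""
    return int(2 * sum(predictions) > len(predictions))


# Hypothetical residue labels: 2 binding residues among 10.
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
for bag in balanced_bags(labels, n_bags=3, seed=1):
    print(sorted(bag))
print(majority_vote([1, 1, 0]))
```

Each bag would train one base classifier (an LSTM in the paper's setting), and the per-residue predictions would then be combined by a vote such as the one shown.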
13
Bravo-Bautista N, Hoang H, Joshi A, Travis J, Wooten M, Wymer NJ. Investigating the Deoxyribonuclease Activity of CRM197 with Site-Directed Mutagenesis. ACS Omega 2019; 4:11987-11992. [PMID: 31460310 PMCID: PMC6682014 DOI: 10.1021/acsomega.9b00418] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/13/2019] [Accepted: 05/13/2019] [Indexed: 05/10/2023]
Abstract
The protein cross-reactive material 197 (CRM197) is known to catalyze the hydrolytic cleavage of DNA (DNase activity). A suspected metal-binding site (S109, T111, and E112) and suspected DNA-binding motif (T89, K90, and V91) were predicted within the CRM197 protein X-ray crystal structure (4AE0) using METSITE and DNABindProt, respectively. Between these two predicted sites is a groove (K103, E116, T120, E122, F123, and R126) that may assist in DNase activity. Alanine scanning was performed at these sites to determine which amino acids might be important for DNase activity. These mutations individually or in combination either maintained or increased the overall DNase activity compared to the unmodified CRM197. Mutation at the suspected metal-binding site showed similar fluctuations to the overall DNase activity whether the DNase assays were run with Mg2+ and Ca2+ or Mn2+. However, many of the mutations within the suspected DNA-binding motif saw significant differences depending on which metal was used. Only some of the improvements in DNase activity could be attributed to improved folding of the mutants compared to the unmodified CRM197. This study should provide a basis for further mutagenesis studies to remove the DNase activity of CRM197.
Collapse
Affiliation(s)
- Nathalie Bravo-Bautista
- Department of Chemistry and Biochemistry, Department of Biological and Biomedical Sciences, and Department of Pharmaceutical Sciences, North Carolina Central University, Durham, North Carolina 27707, United States
| | - Hieu Hoang
- Department of Chemistry and Biochemistry, Department of Biological and Biomedical Sciences, and Department of Pharmaceutical Sciences, North Carolina Central University, Durham, North Carolina 27707, United States
| | - Anusha Joshi
- Department of Chemistry and Biochemistry, Department of Biological and Biomedical Sciences, and Department of Pharmaceutical Sciences, North Carolina Central University, Durham, North Carolina 27707, United States
| | - Jennifer Travis
- Department of Chemistry and Biochemistry, Department of Biological and Biomedical Sciences, and Department of Pharmaceutical Sciences, North Carolina Central University, Durham, North Carolina 27707, United States
| | - Melissa Wooten
- Department of Chemistry and Biochemistry, Department of Biological and Biomedical Sciences, and Department of Pharmaceutical Sciences, North Carolina Central University, Durham, North Carolina 27707, United States
| | - Nathan J. Wymer
- Department of Chemistry and Biochemistry, Department of Biological and Biomedical Sciences, and Department of Pharmaceutical Sciences, North Carolina Central University, Durham, North Carolina 27707, United States
- E-mail:
| |
|
14
|
Emamjomeh A, Choobineh D, Hajieghrari B, MahdiNezhad N, Khodavirdipour A. DNA-protein interaction: identification, prediction and data analysis. Mol Biol Rep 2019; 46:3571-3596. [PMID: 30915687 DOI: 10.1007/s11033-019-04763-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/22/2018] [Accepted: 03/14/2019] [Indexed: 12/30/2022]
Abstract
Life depends on specific and purposeful interactions between molecules. Such purposeful interactions make the various processes inside the cells and the bodies of living organisms possible. DNA-protein interactions, among all the types of interactions between different molecules, are of considerable importance. Currently, with the development of numerous experimental techniques, diverse methods are available for recognizing and investigating such interactions. While the traditional experimental techniques to identify DNA-protein complexes are time-consuming and are unsuitable for genome-scale studies, the current high-throughput approaches are more efficient in determining such interactions at a large scale, but they are clearly too costly to be practical for daily applications. Hence, the availability of extensive biological sequence data, a clearer picture of the conditions under which such interactions form, and developments in computing, mathematics, and statistics have motivated scientists to develop bioinformatics tools for predicting the interaction site(s). Until now, there has been much progress in this field. In this review, the factors and conditions governing the interaction and the laboratory techniques for examining such interactions are addressed. In addition, bioinformatics tools developed for this purpose are introduced and compared and, in the end, several suggestions are offered for improving the precision of such tools.
Affiliation(s)
- Abbasali Emamjomeh
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, 98615-538, Iran.
| | - Darush Choobineh
- Agricultural Biotechnology, Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
| | - Behzad Hajieghrari
- Department of Agricultural Biotechnology, College of Agriculture, Jahrom University, Jahrom, 74135-111, Iran.
| | - Nafiseh MahdiNezhad
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, 98615-538, Iran
| | - Amir Khodavirdipour
- Division of Human Genetics, Department of Anatomy, St. John's hospital, Bangalore, India
| |
|
15
|
Ghosh S, Bagchi A. Structural study to analyze the DNA-binding properties of DsrC protein from the dsr operon of sulfur-oxidizing bacterium Allochromatium vinosum. J Mol Model 2019; 25:74. [PMID: 30798412 DOI: 10.1007/s00894-019-3945-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/22/2018] [Accepted: 01/29/2019] [Indexed: 01/11/2023]
Abstract
Our environment is densely populated with various beneficial sulfur-oxidizing prokaryotes (SOPs). These organisms are responsible for the proper maintenance of biogeochemical sulfur cycles to regulate the turnover of biological sulfur substrates in the environment. Allochromatium vinosum strain DSM 180T is a gamma-proteobacterium and is a member of SOP. The organism codes for the sulfur-oxidizing dsr operon, which is comprised of dsrABEFHCMKLJOPNRS genes. The Dsr proteins formed from dsr operon are responsible for formation of sulfur globules. However, the molecular mechanism of the regulation of the dsr operon is not yet fully established. Among the proteins encoded by dsr genes, DsrC is known to have some regulatory functions. DsrC possesses a helix-turn-helix (HTH) DNA-binding motif. Interestingly, the structural details of this interaction have not yet been fully established. Therefore, we tried to analyze the binding interactions of the DsrC protein with the promoter DNA structure of the dsr operon as well as a random DNA as the control. We also performed molecular dynamics simulations of the DsrC-DNA complexes. This structure-function relationship investigation revealed the most probable binding interactions of the DsrC protein with the promoter region present upstream of the dsrA gene in the dsr operon. As expected, the random DNA structure could not properly interact with DsrC. Our analysis will therefore help researchers to predict a plausible biochemical mechanism for the sulfur oxidation process. Graphical Abstract Interaction of Allochromatium vinosum DsrC protein with the promoter region present upstream of the dsrA gene.
Affiliation(s)
- Semanti Ghosh
- Department of Biochemistry and Biophysics, University of Kalyani, Kalyani, Nadia, 741235, India.,Crystallography and Molecular Biology Division, Saha Institute of Nuclear Physics, 1/AF Bidhannagar, Kolkata, 700064, India
| | - Angshuman Bagchi
- Department of Biochemistry and Biophysics, University of Kalyani, Kalyani, Nadia, 741235, India.
| |
|
16
|
Deng L, Pan J, Xu X, Yang W, Liu C, Liu H. PDRLGB: precise DNA-binding residue prediction using a light gradient boosting machine. BMC Bioinformatics 2018; 19:522. [PMID: 30598073 PMCID: PMC6311926 DOI: 10.1186/s12859-018-2527-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 11/17/2022] Open
Abstract
Background: Identifying specific residues for protein-DNA interactions are of considerable importance to better recognize the binding mechanism of protein-DNA complexes. Despite the fact that many computational DNA-binding residue prediction approaches have been developed, there is still significant room for improvement concerning overall performance and availability. Results: Here, we present an efficient approach termed PDRLGB that uses a light gradient boosting machine (LightGBM) to predict binding residues in protein-DNA complexes. Initially, we extract a wide variety of 913 sequence and structure features with a sliding window of 11. Then, we apply the random forest algorithm to sort the features in descending order of importance and obtain the optimal subset of features using incremental feature selection. Based on the selected feature set, we use a light gradient boosting machine to build the prediction model for DNA-binding residues. Our PDRLGB method shows better overall predictive accuracy and relatively less training time than other widely used machine learning (ML) methods such as random forest (RF), Adaboost and support vector machine (SVM). We further compare PDRLGB with various existing approaches on the independent test datasets and show improvement in results over the existing state-of-the-art approaches. Conclusions: PDRLGB is an efficient approach to predict specific residues for protein-DNA interactions.
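The sliding-window step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the PDRLGB implementation: the function name `window_features`, the zero-padding at the chain ends, and the toy feature values are all assumptions. As a side note, 913 = 11 × 83, which would be consistent with 83 per-residue features per window position, although the abstract does not say so.

```python
from typing import List


def window_features(per_residue: List[List[float]], window: int = 11) -> List[List[float]]:
    """Concatenate each residue's features with those of its sequence neighbours.

    Window positions that fall off either end of the chain are zero-padded.
    The padding scheme is an assumption of this sketch.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    n_feat = len(per_residue[0]) if per_residue else 0
    pad = [0.0] * n_feat
    rows = []
    for i in range(len(per_residue)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            row.extend(per_residue[j] if 0 <= j < len(per_residue) else pad)
        rows.append(row)
    return rows


# Toy chain: 15 residues with 2 hypothetical features each.
toy = [[float(i), float(i % 3)] for i in range(15)]
X = window_features(toy)
print(len(X), len(X[0]))  # 15 rows, 11 * 2 = 22 columns
```

Each row of `X` could then be passed to a feature-ranking step and a classifier; those stages are not reproduced here.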
Affiliation(s)
- Lei Deng
- School of Software, Central South University, Changsha, 410075, China
| | - Juan Pan
- School of Software, Central South University, Changsha, 410075, China
| | - Xiaojie Xu
- School of Software, Central South University, Changsha, 410075, China
| | - Wenyi Yang
- School of Software, Central South University, Changsha, 410075, China
| | - Chuyao Liu
- School of Software, Central South University, Changsha, 410075, China
| | - Hui Liu
- Lab of Information Management, Changzhou University, Changzhou, 213164, China.
| |
|
17
|
Yan W, Hu G, Liang Z, Zhou J, Yang Y, Chen J, Shen B. Node-Weighted Amino Acid Network Strategy for Characterization and Identification of Protein Functional Residues. J Chem Inf Model 2018; 58:2024-2032. [PMID: 30107728 DOI: 10.1021/acs.jcim.8b00146] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 12/15/2022]
Abstract
The study of functional residues (FRs) is essential for understanding protein functions and biological processes. The amino acid network (AAN) has become an emerging paradigm for studying FRs during the past decade. Current AAN models ignore the heterogeneity of nodes and treat amino acids in the AAN as the same. However, the properties of each amino acid node are of fundamental importance. We here proposed a node-weighted AAN strategy termed the node-weighted amino acid contact energy network (NACEN) to characterize and predict three types of FRs, namely, hot spots, catalytic residues, and allosteric residues. We first constructed NACENs with their nodes weighted based on structural, sequence, physicochemical, and dynamical properties of the amino acids and then characterized the FRs with the NACEN parameters. We finally built machine learning predictors to identify each type of FR. The results revealed that residues characterized with NACEN parameters are more distinguishable between FRs and non-FRs than those with unweighted network ones. With few features for classification, NACEN yields comparable performance for FR identification and provides residue level prediction for allosteric regulation. The proposed strategy can be easily implemented to other functional residue identification. An R package is also provided for NACEN construction and analysis at http://sysbio.suda.edu.cn/NACEN/index.html .
Affiliation(s)
- Wenying Yan
- Center for systems biology , Soochow University , Suzhou 215006 , China
| | - Guang Hu
- Center for systems biology , Soochow University , Suzhou 215006 , China
| | - Zhongjie Liang
- Center for systems biology , Soochow University , Suzhou 215006 , China
| | - Jianhong Zhou
- Center for systems biology , Soochow University , Suzhou 215006 , China
| | - Yang Yang
- School of computer science and technology , Soochow University , Suzhou 215006 , China
| | - Jiajia Chen
- School of Chemistry, Biology and Material Engineering , Suzhou University of Science and Technology , Suzhou 215011 , China
| | - Bairong Shen
- Center for systems biology , Soochow University , Suzhou 215006 , China
| |
|
18
|
Niazi S, Purohit M, Sonawani A, Niazi JH. Revealing the molecular interactions of aptamers that specifically bind to the extracellular domain of HER2 cancer biomarker protein: An in silico assessment. J Mol Graph Model 2018; 83:112-121. [DOI: 10.1016/j.jmgm.2018.06.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/01/2018] [Revised: 06/03/2018] [Accepted: 06/04/2018] [Indexed: 12/16/2022]
|
19
|
In silico prediction of active site and in vitro DNase and RNase activities of Helicoverpa-inducible pathogenesis related-4 protein from Cicer arietinum. Int J Biol Macromol 2018. [DOI: 10.1016/j.ijbiomac.2018.03.027] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 02/04/2023]
|
20
|
Zhou HX, Pang X. Electrostatic Interactions in Protein Structure, Folding, Binding, and Condensation. Chem Rev 2018; 118:1691-1741. [PMID: 29319301 DOI: 10.1021/acs.chemrev.7b00305] [Citation(s) in RCA: 501] [Impact Index Per Article: 83.5] [Indexed: 02/07/2023]
Abstract
Charged and polar groups, through forming ion pairs, hydrogen bonds, and other less specific electrostatic interactions, impart important properties to proteins. Modulation of the charges on the amino acids, e.g., by pH and by phosphorylation and dephosphorylation, have significant effects such as protein denaturation and switch-like response of signal transduction networks. This review aims to present a unifying theme among the various effects of protein charges and polar groups. Simple models will be used to illustrate basic ideas about electrostatic interactions in proteins, and these ideas in turn will be used to elucidate the roles of electrostatic interactions in protein structure, folding, binding, condensation, and related biological functions. In particular, we will examine how charged side chains are spatially distributed in various types of proteins and how electrostatic interactions affect thermodynamic and kinetic properties of proteins. Our hope is to capture both important historical developments and recent experimental and theoretical advances in quantifying electrostatic contributions of proteins.
Affiliation(s)
- Huan-Xiang Zhou
- Department of Chemistry and Department of Physics, University of Illinois at Chicago , Chicago, Illinois 60607, United States.,Department of Physics and Institute of Molecular Biophysics, Florida State University , Tallahassee, Florida 32306, United States
| | - Xiaodong Pang
- Department of Physics and Institute of Molecular Biophysics, Florida State University , Tallahassee, Florida 32306, United States
| |
|
21
|
Abstract
The increasing number of protein structures with uncharacterized function necessitates the development of in silico prediction methods for functional annotations on proteins. In this chapter, different kinds of computational approaches are briefly introduced to predict DNA-binding residues on surface of DNA-binding proteins, and the merits and limitations of these methods are mainly discussed. This chapter focuses on the structure-based approaches and mainly discusses the framework of machine learning methods in application to DNA-binding prediction task.
|
22
|
Identification of Hot Spots in Protein Structures Using Gaussian Network Model and Gaussian Naive Bayes. BIOMED RESEARCH INTERNATIONAL 2016; 2016:4354901. [PMID: 27882325 PMCID: PMC5110947 DOI: 10.1155/2016/4354901] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/21/2016] [Revised: 10/02/2016] [Accepted: 10/11/2016] [Indexed: 01/21/2023]
Abstract
Residue fluctuations in protein structures have been shown to be highly associated with various protein functions. Gaussian network model (GNM), a simple representative coarse-grained model, was widely adopted to reveal function-related protein dynamics. We directly utilized the high frequency modes generated by GNM and further performed Gaussian Naive Bayes (GNB) to identify hot spot residues. Two coding schemes about the feature vectors were implemented with varying distance cutoffs for GNM and sliding window sizes for GNB based on tenfold cross validations: one by using only a single high mode and the other by combining multiple modes with the highest frequency. Our proposed methods outperformed the previous work that did not directly utilize the high frequency modes generated by GNM, with regard to overall performance evaluated using F1 measure. Moreover, we found that inclusion of more high frequency modes for a GNB classifier can significantly improve the sensitivity. The present study provided additional valuable insights into the relation between the hot spots and the residue fluctuations.
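To make the GNM step in this abstract concrete, the sketch below builds a Kirchhoff (connectivity) matrix from Cα coordinates and extracts the highest-frequency mode by power iteration. It is an illustration under stated assumptions, not the authors' code: the 7 Å cutoff, the idealized helix coordinates, and the function names `kirchhoff` and `fastest_mode` are hypothetical choices, and the Gaussian Naive Bayes classification stage is not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def kirchhoff(coords: Sequence[Sequence[float]], cutoff: float = 7.0) -> List[List[float]]:
    """GNM Kirchhoff matrix: -1 for residue pairs within the cutoff, contact count on the diagonal."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def fastest_mode(gamma: List[List[float]], iters: int = 500) -> Tuple[float, List[float]]:
    """Largest eigenvalue and unit eigenvector of gamma, approximated by power iteration."""
    n = len(gamma)
    v = [math.sin(i + 1.0) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(gamma[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    gv = [sum(gamma[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(a * b for a, b in zip(v, gv))  # Rayleigh quotient
    return lam, v


# Toy input: 12 C-alpha positions on an idealized helix (radius 2.3 A, rise 1.5 A, 100 degrees per residue).
helix = [
    (2.3 * math.cos(math.radians(100 * i)), 2.3 * math.sin(math.radians(100 * i)), 1.5 * i)
    for i in range(12)
]
gamma = kirchhoff(helix)
lam, mode = fastest_mode(gamma)

# Residues with the largest squared components dominate the fastest mode.
ranked = sorted(range(len(mode)), key=lambda i: mode[i] ** 2, reverse=True)
print(round(lam, 3), ranked[:3])
```

For real structures a full eigendecomposition from a linear-algebra library would normally replace the power iteration, since several high-frequency modes are typically combined.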
|
23
|
Chandrasekaran A, Chan J, Lim C, Yang LW. Protein Dynamics and Contact Topology Reveal Protein–DNA Binding Orientation. J Chem Theory Comput 2016; 12:5269-5277. [DOI: 10.1021/acs.jctc.6b00688] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/02/2023]
Affiliation(s)
| | | | | | - Lee-Wei Yang
- Physics Division, National Center for Theoretical Sciences, Hsinchu 30013, Taiwan
| |
|
24
|
Zhou J, Xu R, He Y, Lu Q, Wang H, Kong B. PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context. Sci Rep 2016; 6:27653. [PMID: 27282833 PMCID: PMC4901350 DOI: 10.1038/srep27653] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/02/2015] [Accepted: 05/18/2016] [Indexed: 02/01/2023] Open
Abstract
Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most of the existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machines (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from spatial context provide more information than those from sequence context and the combination of them gives more performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and the existing methods indicate that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web-server of our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is made available for free public accessible to the biological research community.
Affiliation(s)
- Jiyun Zhou
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.,Department of Computing, the Hong Kong Polytechnic University, Hong Kong
| | - Ruifeng Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.,Shenzhen Engineering Laboratory of Performance Robots at Digital Stage, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
| | - Yulan He
- School of Engineering and Applied Science, Aston University, UK
| | - Qin Lu
- Department of Computing, the Hong Kong Polytechnic University, Hong Kong
| | - Hongpeng Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Bing Kong
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| |
|
25
|
Miao Z, Westhof E. A Large-Scale Assessment of Nucleic Acids Binding Site Prediction Programs. PLoS Comput Biol 2015; 11:e1004639. [PMID: 26681179 PMCID: PMC4683125 DOI: 10.1371/journal.pcbi.1004639] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 08/25/2015] [Accepted: 10/30/2015] [Indexed: 11/18/2022] Open
Abstract
Computational prediction of nucleic acid binding sites in proteins are necessary to disentangle functional mechanisms in most biological processes and to explore the binding mechanisms. Several strategies have been proposed, but the state-of-the-art approaches display a great diversity in i) the definition of nucleic acid binding sites; ii) the training and test datasets; iii) the algorithmic methods for the prediction strategies; iv) the performance measures and v) the distribution and availability of the prediction programs. Here we report a large-scale assessment of 19 web servers and 3 stand-alone programs on 41 datasets including more than 5000 proteins derived from 3D structures of protein-nucleic acid complexes. Well-defined binary assessment criteria (specificity, sensitivity, precision, accuracy…) are applied. We found that i) the tools have been greatly improved over the years; ii) some of the approaches suffer from theoretical defects and there is still room for sorting out the essential mechanisms of binding; iii) RNA binding and DNA binding appear to follow similar driving forces and iv) dataset bias may exist in some methods.
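The binary assessment criteria named in this abstract follow directly from per-residue confusion counts. The sketch below shows the standard definitions. The function name `binary_metrics` and the toy label vectors are hypothetical and are not taken from the assessment itself.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, precision and accuracy from 0/1 labels (1 = binding residue)."""
    if len(truth) != len(predicted):
        raise ValueError("label lists must have equal length")
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # An undefined ratio (empty denominator) is reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical 10-residue protein: 3 true binding residues, 3 predicted.
truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = binary_metrics(truth, predicted)
print(scores)
```

Because binding residues are usually a small minority, accuracy alone can look high for a predictor that labels almost nothing as binding, which is one reason several criteria are reported together.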
Affiliation(s)
- Zhichao Miao
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de Biologie Moléculaire et Cellulaire du CNRS, Strasbourg, France
| | - Eric Westhof
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de Biologie Moléculaire et Cellulaire du CNRS, Strasbourg, France
| |
|
26
|
Wong KC, Li Y, Peng C, Moses AM, Zhang Z. Computational learning on specificity-determining residue-nucleotide interactions. Nucleic Acids Res 2015; 43:10180-9. [PMID: 26527718 PMCID: PMC4666365 DOI: 10.1093/nar/gkv1134] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/25/2015] [Accepted: 10/18/2015] [Indexed: 01/02/2023] Open
Abstract
The protein–DNA interactions between transcription factors and transcription factor binding sites are essential activities in gene regulation. To decipher the binding codes, it is a long-standing challenge to understand the binding mechanism across different transcription factor DNA binding families. Past computational learning studies usually focus on learning and predicting the DNA binding residues on protein side. Taking into account both sides (protein and DNA), we propose and describe a computational study for learning the specificity-determining residue-nucleotide interactions of different known DNA-binding domain families. The proposed learning models are compared to state-of-the-art models comprehensively, demonstrating its competitive learning performance. In addition, we describe and propose two applications which demonstrate how the learnt models can provide meaningful insights into protein–DNA interactions across different DNA binding families.
Affiliation(s)
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
| | - Yue Li
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada CSAIL, Massachusetts Institute of Technology, Cambridge, MA 02139-4307, USA
| | - Chengbin Peng
- CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, Saudi Arabia
| | - Alan M Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
| | - Zhaolei Zhang
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| |
|
27
|
Miao Z, Westhof E. Prediction of nucleic acid binding probability in proteins: a neighboring residue network based score. Nucleic Acids Res 2015; 43:5340-51. [PMID: 25940624 PMCID: PMC4477668 DOI: 10.1093/nar/gkv446] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 03/16/2015] [Revised: 04/23/2015] [Accepted: 04/24/2015] [Indexed: 11/13/2022] Open
Abstract
We describe a general binding score for predicting the nucleic acid binding probability in proteins. The score is directly derived from physicochemical and evolutionary features and integrates a residue neighboring network approach. Our process achieves stable and high accuracies on both DNA- and RNA-binding proteins and illustrates how the main driving forces for nucleic acid binding are common. Because of the effective integration of the synergetic effects of the network of neighboring residues and the fact that the prediction yields a hierarchical scoring on the protein surface, energy funnels for nucleic acid binding appear on protein surfaces, pointing to the dynamic process occurring in the binding of nucleic acids to proteins.
Affiliation(s)
- Zhichao Miao
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 15 Rue Descartes, 67000 Strasbourg, France
| | - Eric Westhof
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 15 Rue Descartes, 67000 Strasbourg, France
| |
|
28
|
An overview of the prediction of protein DNA-binding sites. Int J Mol Sci 2015; 16:5194-215. [PMID: 25756377 PMCID: PMC4394471 DOI: 10.3390/ijms16035194] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 12/31/2014] [Revised: 02/21/2015] [Accepted: 02/27/2015] [Indexed: 02/06/2023] Open
Abstract
Interactions between proteins and DNA play an important role in many essential biological processes such as DNA replication, transcription, splicing, and repair. The identification of amino acid residues involved in DNA-binding sites is critical for understanding the mechanism of these biological activities. In the last decade, numerous computational approaches have been developed to predict protein DNA-binding sites based on protein sequence and/or structural information, which play an important role in complementing experimental strategies. At this time, approaches can be divided into three categories: sequence-based DNA-binding site prediction, structure-based DNA-binding site prediction, and homology modeling and threading. In this article, we review existing research on computational methods to predict protein DNA-binding sites, which includes data sets, various residue sequence/structural features, machine learning methods for comparison and selection, evaluation methods, performance comparison of different tools, and future directions in protein DNA-binding site prediction. In particular, we detail the meta-analysis of protein DNA-binding sites. We also propose specific implications that are likely to result in novel prediction methods, increased performance, or practical applications.
|
29
|
Sahillioglu AC, Sumbul F, Ozoren N, Haliloglu T. Structural and dynamics aspects of ASC speck assembly. Structure 2014; 22:1722-1734. [PMID: 25458835 DOI: 10.1016/j.str.2014.09.011] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 04/23/2014] [Revised: 09/17/2014] [Accepted: 09/17/2014] [Indexed: 10/24/2022]
Abstract
Activation of the inflammasome is accompanied by rapid formation of a micrometer-sized perinuclear structure called the ASC speck, a platform for caspase-1 activity. The ASC speck is often referred to as an aggregate and shares certain features with aggresomes. It is thus an open question whether the ASC speck formation takes place via nonspecific aggregation of hydrophobic patches or specific interactions of its domains; PYD and CARD, which belong to the death fold superfamily. Bringing together structure and dynamics studies using the Gaussian network model of PYD and CARD, and molecular dynamics simulations of the wild-type and in silico mutated PYD, with the mutational analysis on the ASC structure and its separate domains in human cells, we show that the ASC speck is an organized structure with at least two levels of distinct compaction mechanisms based on the specific interactions of PYD and CARD.
Affiliation(s)
- Ali Can Sahillioglu
- Department of Molecular Biology and Genetics, Apoptosis and Cancer Immunology Laboratory (AKIL), Bogazici University, 34470 Istanbul, Turkey
| | - Fidan Sumbul
- Department of Chemical Engineering and Polymer Research Center, Bogazici University, 34470 Istanbul, Turkey
| | - Nesrin Ozoren
- Department of Molecular Biology and Genetics, Apoptosis and Cancer Immunology Laboratory (AKIL), Bogazici University, 34470 Istanbul, Turkey; Center for Life Sciences and Technologies, Bogazici University, 34470 Istanbul, Turkey.
| | - Turkan Haliloglu
- Department of Chemical Engineering and Polymer Research Center, Bogazici University, 34470 Istanbul, Turkey; Center for Life Sciences and Technologies, Bogazici University, 34470 Istanbul, Turkey.
| |
|
30
|
Liu B, Xu J, Fan S, Xu R, Zhou J, Wang X. PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou’s PseAAC and Physicochemical Distance Transformation. Mol Inform 2014; 34:8-17. [DOI: 10.1002/minf.201400025] [Citation(s) in RCA: 135] [Impact Index Per Article: 13.5] [Received: 02/19/2014] [Accepted: 05/27/2014] [Indexed: 11/06/2022]
|
31
|
Zhao H, Wang J, Zhou Y, Yang Y. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome. PLoS One 2014; 9:e96694. [PMID: 24792350 PMCID: PMC4008587 DOI: 10.1371/journal.pone.0096694] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 01/21/2014] [Accepted: 04/10/2014] [Indexed: 12/25/2022] Open
Abstract
As more and more protein sequences are uncovered from increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than low-resolution two-state prediction of DNA-binding as most existing techniques do. The method first predicts protein-DNA complex structure by utilizing the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (Distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross validation of the method based on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in APO forms. The method [SPOT-Seq (DNA)] is available as an on-line server at http://sparks-lab.org.
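The cross-validation figures quoted in this abstract can be checked for internal consistency. The sketch below reconstructs approximate confusion counts from the reported 179 binding and 3797 non-binding domains, 65% sensitivity, and 94% precision, then computes the Matthews correlation coefficient. The rounded counts are an assumption of this sketch, since the abstract does not report the confusion matrix.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion counts (0.0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


positives, negatives = 179, 3797      # domain counts from the abstract
sensitivity, precision = 0.65, 0.94   # reported values

# Reconstructed counts (assumed, rounded to whole domains).
tp = round(sensitivity * positives)            # 116
fn = positives - tp                            # 63
fp = round(tp * (1 - precision) / precision)   # 7
tn = negatives - fp                            # 3790

print(round(mcc(tp, fp, tn, fn), 2))  # prints 0.77
```

The reconstructed value agrees with the reported MCC of 0.77, so the three quoted statistics are mutually consistent under these rounding assumptions.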
Affiliation(s)
- Huiying Zhao
  - School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
  - QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Jihua Wang
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
  - Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Dezhou University, Dezhou, Shandong, China
- Yaoqi Zhou
  - School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
  - Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Dezhou University, Dezhou, Shandong, China
  - Institute for Glycomics and School of Information and Communication Technique, Griffith University, Southport, Queensland, Australia
  - * E-mail: (YZ); (YY)
- Yuedong Yang
  - School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
  - Institute for Glycomics and School of Information and Communication Technique, Griffith University, Southport, Queensland, Australia
  - * E-mail: (YZ); (YY)
|
32
|
On the use of knowledge-based potentials for the evaluation of models of protein-protein, protein-DNA, and protein-RNA interactions. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014; 94:77-120. [PMID: 24629186 DOI: 10.1016/b978-0-12-800168-4.00004-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 12/11/2022]
Abstract
Proteins are the bricks and mortar of cells, playing structural and functional roles. In order to perform their function, they interact with each other as well as with other biomolecules such as DNA or RNA. Therefore, to fathom the function of a protein, we need to know its partners and the atomic details of its interactions (i.e., the structure of the complex). However, the number of protein interactions with an experimentally determined three-dimensional structure is small. Therefore, computational techniques such as homology modeling are essential for filling this gap. Protein interactions can be modeled using as templates the interactions of homologous proteins, if the structure of the complex is known, or using docking methods. In both approaches, the estimation of the quality of models is essential. There are several ways to address this problem. In this review, we focus on the use of knowledge-based potentials for the analysis of protein interactions. We describe the procedure to derive statistical potentials and split them into different energetic terms that can be used for different purposes. We extensively discuss the fields where knowledge-based potentials have been successfully applied to (1) model protein-protein, protein-DNA, and protein-RNA interactions and (2) predict binding sites (in the protein and in the DNA). Moreover, we provide ready-to-use resources for docking and benchmarking protein interactions.
|
33
|
Ozbek P, Soner S, Haliloglu T. Hot spots in a network of functional sites. PLoS One 2013; 8:e74320. [PMID: 24023934 PMCID: PMC3759471 DOI: 10.1371/journal.pone.0074320] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 12/11/2012] [Accepted: 08/02/2013] [Indexed: 12/05/2022]
Abstract
It is of significant interest to understand how proteins interact, a key phenomenon underlying biological function. Using dynamic fluctuations in high frequency modes, we show that the Gaussian Network Model (GNM) predicts hot spot residues with success rates ranging between S 8–58%, C 84–95%, P 5–19% and A 81–92% on unbound structures and S 8–51%, C 97–99%, P 14–50%, A 94–97% on complex structures for sensitivity, specificity, precision and accuracy, respectively. High specificity and accuracy rates with a single property on unbound protein structures suggest that hot spots are predefined in the dynamics of unbound structures and form the binding core of interfaces, whereas the prediction of other functional residues with similar dynamic behavior explains the lower precision values. The latter is demonstrated with case studies of ubiquitin, hen egg-white lysozyme and the M2 proton channel. The dynamic fluctuations suggest a pseudo network of residues with high frequency fluctuations, which could be plausible for the mechanism of biological interactions and allosteric regulation.
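The idea of reading residue-level signals off the high-frequency GNM modes can be illustrated with a minimal sketch. This is a generic GNM calculation, not the authors' exact procedure: the 7.3 Å cutoff, the number of fast modes, the function name and the random toy chain standing in for Cα coordinates are all assumptions made for illustration.

```python
import numpy as np


def gnm_fast_mode_fluctuations(coords, cutoff=7.3, n_modes=5):
    """Mean-square fluctuations contributed by the n_modes fastest GNM modes.

    coords: (N, 3) array of C-alpha positions in angstroms.
    cutoff and n_modes are illustrative defaults, not values from the paper.
    """
    coords = np.asarray(coords, dtype=float)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Kirchhoff (connectivity) matrix: -1 for contacts, node degree on the diagonal.
    kirchhoff = -(dists <= cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # eigh returns eigenvalues in ascending order; the last ones are the fastest modes.
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)
    fast_vals = eigvals[-n_modes:]
    fast_vecs = eigvecs[:, -n_modes:]
    # Contribution of mode k to residue i is u_ik^2 / lambda_k.
    msf = (fast_vecs ** 2 / fast_vals).sum(axis=1)
    return eigvals, msf


# Toy chain: a random walk with 3.8 A steps standing in for a C-alpha trace.
rng = np.random.default_rng(0)
steps = rng.normal(size=(40, 3))
steps *= 3.8 / np.linalg.norm(steps, axis=1, keepdims=True)
coords = np.cumsum(steps, axis=0)

eigvals, msf = gnm_fast_mode_fluctuations(coords)
candidates = np.argsort(msf)[-5:]  # residues with the largest fast-mode peaks
print(sorted(candidates.tolist()))
```

Residues that peak in the fastest modes sit in densely packed regions of the contact network, which is the property the abstract associates with hot spots.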
Affiliation(s)
- Pemra Ozbek
  - Department of Bioengineering, Marmara University, Goztepe, Istanbul, Turkey
- Seren Soner
  - Department of Chemical Engineering and Polymer Research Center, Bogazici University, Bebek, Turkey
- Turkan Haliloglu
  - Department of Chemical Engineering and Polymer Research Center, Bogazici University, Bebek, Turkey
  - * E-mail:
|
34
|
Liu R, Hu J. DNABind: A hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning- and template-based approaches. Proteins 2013; 81:1885-99. [DOI: 10.1002/prot.24330] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 01/15/2013] [Revised: 05/02/2013] [Accepted: 05/12/2013] [Indexed: 01/10/2023]
Affiliation(s)
- Rong Liu
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29208
  - Center for Bioinformatics, College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, People's Republic of China
- Jianjun Hu
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29208
|
35
|
Abstract
In this study, we present the DNA-Binding Site Identifier (DBSI), a new structure-based method for predicting protein interaction sites for DNA binding. DBSI was trained and validated on a data set of 263 proteins (TRAIN-263), tested on an independent set of protein-DNA complexes (TEST-206) and data sets of 29 unbound (APO-29) and 30 bound (HOLO-30) protein structures distinct from the training data. We computed 480 candidate features for identifying protein residues that bind DNA, including new features that capture the electrostatic microenvironment within shells near the protein surface. Our iterative feature selection process identified features important in other models, as well as features unique to the DBSI model, such as a banded electrostatic feature with spatial separation comparable with the canonical width of the DNA minor groove. Validations and comparisons with established methods using a range of performance metrics clearly demonstrate the predictive advantage of DBSI, and its comparable performance on unbound (APO-29) and bound (HOLO-30) conformations demonstrates robustness to binding-induced protein conformational changes. Finally, we offer our feature data table to others for integration into their own models or for testing improved feature selection and model training strategies based on DBSI.
Affiliation(s)
- Xiaolei Zhu
  - BACTER Institute, University of Wisconsin-Madison, Madison, WI, USA; Departments of Mathematics and Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
|
36
|
Abstract
Predicting binding sites of a transcription factor in the genome is an important, but challenging, issue in studying gene regulation. In the past decade, a large number of protein–DNA co-crystallized structures available in the Protein Data Bank have facilitated the understanding of interacting mechanisms between transcription factors and their binding sites. Recent studies have shown that both physics-based and knowledge-based potential functions can be applied to protein–DNA complex structures to deliver position weight matrices (PWMs) that are consistent with the experimental data. To further use the available structural models, the proposed Web server, PiDNA, aims at first constructing reliable PWMs by applying an atomic-level knowledge-based scoring function on numerous in silico mutated complex structures, and then using the PWM constructed by the structure models with small energy changes to predict the interaction between proteins and DNA sequences. With PiDNA, the users can easily predict the relative preference of all the DNA sequences with limited mutations from the native sequence co-crystallized in the model in a single run. More predictions on sequences with unlimited mutations can be realized by additional requests or file uploading. Three types of information can be downloaded after prediction: (i) the ranked list of mutated sequences, (ii) the PWM constructed by the favourable mutated structures, and (iii) any mutated protein–DNA complex structure models specified by the user. This study first shows that the constructed PWMs are similar to the annotated PWMs collected from databases or literature. Second, the prediction accuracy of PiDNA in detecting relatively high-specificity sites is evaluated by comparing the ranked lists against in vitro experiments from protein-binding microarrays. Finally, PiDNA is shown to be able to select the experimentally validated binding sites from 10 000 random sites with high accuracy. 
With PiDNA, the users can design biological experiments based on the predicted sequence specificity and/or request mutated structure models for further protein design. As well, it is expected that PiDNA can be incorporated with chromatin immunoprecipitation data to refine large-scale inference of in vivo protein–DNA interactions. PiDNA is available at: http://dna.bime.ntu.edu.tw/pidna.
Affiliation(s)
- Chih-Kang Lin
  - Center for Systems Biology, National Taiwan University, Taipei 106, Taiwan
|
37
|
Gromiha MM, Nagarajan R. Computational approaches for predicting the binding sites and understanding the recognition mechanism of protein-DNA complexes. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2013; 91:65-99. [PMID: 23790211 DOI: 10.1016/b978-0-12-411637-5.00003-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022]
Abstract
Protein-DNA recognition plays an important role in the regulation of gene expression. Understanding the influence of specific residues for protein-DNA interactions and the recognition mechanism of protein-DNA complexes is a challenging task in molecular and computational biology. Several computational approaches have been put forward to tackle these problems from different perspectives: (i) development of databases for the interactions between protein and DNA and binding specificity of protein-DNA complexes, (ii) structural analysis of protein-DNA complexes, (iii) discriminating DNA-binding proteins from amino acid sequence, (iv) prediction of DNA-binding sites and protein-DNA binding specificity using sequence and/or structural information, and (v) understanding the recognition mechanism of protein-DNA complexes. In this review, we focus on all these issues and extensively discuss the advancements on the development of comprehensive bioinformatics databases for protein-DNA interactions, efficient tools for identifying the binding sites, and plausible mechanisms for understanding the recognition of protein-DNA complexes. Further, the available online resources for understanding protein-DNA interactions are collectively listed, which will serve as ready-to-use information for the research community.
Affiliation(s)
- M Michael Gromiha
  - Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India.
|
38
|
Qin S, Zhou HX. PI2PE: A Suite of Web Servers for Predictions Ranging From Protein Structure to Binding Kinetics. Biophys Rev 2012; 5:41-46. [PMID: 23526172 DOI: 10.1007/s12551-012-0086-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/26/2023]
Abstract
PI2PE (http://pipe.sc.fsu.edu) is a suite of four web servers for predicting a variety of folding- and binding-related properties of proteins. These include the solvent accessibility of amino acids upon protein folding, the amino acids forming the interfaces of protein-protein and protein-nucleic acid complexes, and the binding rate constants of these complexes. Three of the servers debuted in 2007, and have garnered ~2,500 unique users and finished over 30,000 jobs. The functionalities of these servers are now enhanced, and a new server, for predicting the binding rate constants, is added. Together, these web servers form a pipeline from protein sequence to tertiary structure, then to quaternary structure, and finally to binding kinetics.
Affiliation(s)
- Sanbo Qin
  - Department of Physics and Institute of Molecular Biophysics, Florida State University, Tallahassee, Florida 32306, USA
|
39
|
Chen YC, Wright JD, Lim C. DR_bind: a web server for predicting DNA-binding residues from the protein structure based on electrostatics, evolution and geometry. Nucleic Acids Res 2012; 40:W249-56. [PMID: 22661576 PMCID: PMC3394278 DOI: 10.1093/nar/gks481] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 02/06/2023]
Abstract
DR_bind is a web server that automatically predicts DNA-binding residues, given the respective protein structure based on (i) electrostatics, (ii) evolution and (iii) geometry. In contrast to machine-learning methods, DR_bind does not require a training data set or any parameters. It predicts DNA-binding residues by detecting a cluster of conserved, solvent-accessible residues that are electrostatically stabilized upon mutation to Asp−/Glu−. The server requires as input the DNA-binding protein structure in PDB format and outputs a downloadable text file of the predicted DNA-binding residues, a 3D visualization of the predicted residues highlighted in the given protein structure, and a downloadable PyMol script for visualization of the results. Calibration on 83 and 55 non-redundant DNA-bound and DNA-free protein structures yielded a DNA-binding residue prediction accuracy/precision of 90/47% and 88/42%, respectively. Since DR_bind does not require any training using protein–DNA complex structures, it may predict DNA-binding residues in novel structures of DNA-binding proteins resulting from structural genomics projects with no conservation data. The DR_bind server is freely available with no login requirement at http://dnasite.limlab.ibms.sinica.edu.tw.
Affiliation(s)
- Yao Chi Chen
  - Institute of Biomedical Sciences, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
|
40
|
Xiong Y, Xia J, Zhang W, Liu J. Exploiting a reduced set of weighted average features to improve prediction of DNA-binding residues from 3D structures. PLoS One 2011; 6:e28440. [PMID: 22174808 PMCID: PMC3234263 DOI: 10.1371/journal.pone.0028440] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 09/22/2011] [Accepted: 11/08/2011] [Indexed: 01/29/2023]
Abstract
Predicting DNA-binding residues from a protein three-dimensional structure is a key task of computational structural proteomics. In the present study, based on machine learning technology, we aim to explore a reduced set of weighted average features for improving prediction of DNA-binding residues on protein surfaces. By constructing the spatial environment around a DNA-binding residue, a novel weighting factor is first proposed to quantify the distance-dependent contribution of each neighboring residue in determining the location of a binding residue. Then, a weighted average scheme is introduced to represent the surface patch of the residue under consideration. Finally, the classifier is trained on the reduced set of these weighted average features, consisting of evolutionary profile, interface propensity, betweenness centrality and solvent surface area of side chain. Experimental results on 5-fold cross validation and independent tests indicate that the new feature set is effective for describing DNA-binding residues and that our approach has significantly better performance than two previous methods. Furthermore, a brief case study suggests that the weighted average features are powerful for identifying DNA-binding residues and are promising for further study of protein structure-function relationships. The source code and datasets are available upon request.
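The weighted-average scheme described in this abstract can be sketched as follows. The abstract does not give the form of the weighting factor, so the Gaussian distance weight, the 10 Å patch radius, the `sigma` value and the function name below are all assumptions chosen only to show the general shape of a distance-weighted patch feature.

```python
import numpy as np


def weighted_patch_features(coords, features, radius=10.0, sigma=4.0):
    """Distance-weighted average of per-residue features over each residue's patch.

    coords:   (N, 3) residue positions in angstroms.
    features: (N, F) per-residue feature matrix.
    The Gaussian weight and both parameters are illustrative assumptions.
    """
    coords = np.asarray(coords, dtype=float)
    features = np.asarray(features, dtype=float)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    weights = np.exp(-dists ** 2 / (2.0 * sigma ** 2))
    weights[dists > radius] = 0.0          # neighbors outside the patch contribute nothing
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1 (self weight is always > 0)
    return weights @ features


# Four toy residues on a line; column 0 is constant, column 1 varies.
coords = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [8.0, 0.0, 0.0], [30.0, 0.0, 0.0]]
features = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 9.0]]

patch = weighted_patch_features(coords, features)
print(patch.round(3))
```

In this toy input the last residue has no neighbors within the radius, so its patch value equals its own feature, while the clustered residues are smoothed toward their neighbors.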
Affiliation(s)
- Yi Xiong
  - School of Computer, Wuhan University, Wuhan, China
- Junfeng Xia
  - Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
- Wen Zhang
  - School of Computer, Wuhan University, Wuhan, China
- Juan Liu
  - School of Computer, Wuhan University, Wuhan, China
  - * E-mail:
|
41
|
Identification of key residues for protein conformational transition using elastic network model. J Chem Phys 2011; 135:174101. [DOI: 10.1063/1.3651480] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 01/18/2023]
|
42
|
Xiong Y, Liu J, Wei DQ. An accurate feature-based method for identifying DNA-binding residues on protein surfaces. Proteins 2011; 79:509-17. [PMID: 21069866 DOI: 10.1002/prot.22898] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Indexed: 01/29/2023]
Abstract
Proteins that interact with DNA play vital roles in all mechanisms of gene expression and regulation. In order to understand these activities, it is crucial to analyze and identify DNA-binding residues on DNA-binding protein surfaces. Here, we proposed two novel features, B-factor and packing density, in combination with several conventional features to characterize the DNA-binding residues in a well-constructed representative dataset of 119 protein-DNA complexes from the Protein Data Bank (PDB). Based on the selected features, a prediction model for DNA-binding residues was constructed using a support vector machine (SVM). The predictor was evaluated using 5-fold cross validation on the above dataset of 123 DNA-binding proteins. Moreover, two independent datasets of 83 DNA-bound protein structures and their corresponding DNA-free forms were compiled. The B-factor and packing density features were statistically analyzed on these 83 pairs of holo-apo protein structures. Finally, we developed the SVM model to accurately predict DNA-binding residues on the protein surface, given the DNA-free structure of a protein. The results shown here indicate that our method represents a significant improvement over existing approaches such as DISPLAR. This observation suggests that our method will be useful in studying protein-DNA interactions to guide subsequent work such as site-directed mutagenesis and protein-DNA docking.
Affiliation(s)
- Yi Xiong
  - School of Computer, Wuhan University, Wuhan 430072, People's Republic of China
|