1
Zheng F, Jiang X, Wen Y, Yang Y, Li M. Systematic investigation of machine learning on limited data: A study on predicting protein-protein binding strength. Comput Struct Biotechnol J 2024; 23:460-472. [PMID: 38235359 PMCID: PMC10792694 DOI: 10.1016/j.csbj.2023.12.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/14/2023] [Accepted: 12/16/2023] [Indexed: 01/19/2024] Open
Abstract
The application of machine learning techniques in biological research, especially when dealing with limited data availability, poses significant challenges. In this study, we leveraged advancements in method development for predicting protein-protein binding strength to conduct a systematic investigation into the application of machine learning on limited data. The binding strength, quantitatively measured as binding affinity, is vital for understanding the processes of recognition, association, and dysfunction that occur within protein complexes. By incorporating transfer learning, integrating domain knowledge, and employing both deep learning and traditional machine learning algorithms, we mitigated the impact of data limitations and made significant advancements in predicting protein-protein binding affinity. In particular, we developed over 20 models, ultimately selecting three representative best-performing ones that belong to distinct categories. The first model is structure-based, consisting of a random forest regression and thirteen handcrafted features. The second model is sequence-based, employing an architecture that combines transferred embedding features with a multilayer perceptron. Finally, we created an ensemble model by averaging the predictions of the two aforementioned models. The comparison with other predictors on three independent datasets confirms the significant improvements achieved by our models in predicting protein-protein binding affinity. The programs for running these three models are available at https://github.com/minghuilab/BindPPI.
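The ensemble step described in this abstract, averaging the predictions of the structure-based and sequence-based models, can be sketched as follows. This is a minimal illustration and not the BindPPI code: the function name `ensemble_average` and the sample values are hypothetical, and the two input lists merely stand in for the outputs of the random forest and the multilayer perceptron.

```python
from typing import List, Sequence


def ensemble_average(structure_preds: Sequence[float],
                     sequence_preds: Sequence[float]) -> List[float]:
    """Average two models' predictions complex by complex."""
    if len(structure_preds) != len(sequence_preds):
        raise ValueError("both models must predict the same set of complexes")
    return [(a + b) / 2.0 for a, b in zip(structure_preds, sequence_preds)]


if __name__ == "__main__":
    # Hypothetical predicted binding affinities (kcal/mol) for three complexes.
    rf_preds = [-10.0, -8.5, -12.0]    # stand-in for the structure-based model
    mlp_preds = [-9.0, -9.5, -11.0]    # stand-in for the sequence-based model
    print(ensemble_average(rf_preds, mlp_preds))
```

An unweighted mean is the simplest way to combine two regressors; it needs no extra fitting, which matters when data are as limited as the abstract describes.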
Affiliation(s)
- Feifan Zheng
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
- Xin Jiang
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
- Yuhao Wen
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
- Yan Yang
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
- Minghui Li
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
2
Sun X, Wu Z, Su J, Li C. A deep attention model for wide-genome protein-peptide binding affinity prediction at a sequence level. Int J Biol Macromol 2024; 276:133811. [PMID: 38996881 DOI: 10.1016/j.ijbiomac.2024.133811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 07/09/2024] [Accepted: 07/09/2024] [Indexed: 07/14/2024]
Abstract
Peptides are pivotal in numerous biological activities, engaging in up to 40% of protein-protein interactions in many cellular processes. Owing to their exceptional specificity and effectiveness, peptides have emerged as promising candidates for drug design. However, accurately predicting protein-peptide binding affinity remains challenging. To address this problem, we develop a prediction model, PepPAP, based on a convolutional neural network and multi-head attention, which relies solely on sequence features. These features include physicochemical properties, intrinsic disorder, sequence encoding, and especially interface propensity, which is extracted from 16,689 non-redundant protein-peptide complexes. Notably, the regression stratification cross-validation scheme proposed in our previous work helps improve the prediction for cases with extreme binding affinity values. On three benchmark test datasets (T100 and two series of peptides targeting the PDZ domain and CXCR4, respectively), PepPAP shows excellent performance, outperforming the existing methods and demonstrating good generalization ability. Furthermore, PepPAP performs well in binary interaction prediction, and visualization of the feature space distribution highlights PepPAP's effectiveness. To the best of our knowledge, PepPAP is the first sequence-based deep attention model for wide-genome protein-peptide binding affinity prediction, and it holds the potential to offer valuable insights for peptide-based drug design.
Affiliation(s)
- Xiaohan Sun
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Jingjie Su
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
3
Grassmann G, Miotto M, Desantis F, Di Rienzo L, Tartaglia GG, Pastore A, Ruocco G, Monti M, Milanetti E. Computational Approaches to Predict Protein-Protein Interactions in Crowded Cellular Environments. Chem Rev 2024; 124:3932-3977. [PMID: 38535831 PMCID: PMC11009965 DOI: 10.1021/acs.chemrev.3c00550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 04/11/2024]
Abstract
Investigating protein-protein interactions is crucial for understanding cellular biological processes because proteins often function within molecular complexes rather than in isolation. While experimental and computational methods have provided valuable insights into these interactions, they often overlook a critical factor: the crowded cellular environment. This environment significantly impacts protein behavior, including structural stability, diffusion, and ultimately the nature of binding. In this review, we discuss theoretical and computational approaches that allow the modeling of biological systems to guide and complement experiments and can thus significantly advance the investigation, and possibly the predictions, of protein-protein interactions in the crowded environment of cell cytoplasm. We explore topics such as statistical mechanics for lattice simulations, hydrodynamic interactions, diffusion processes in high-viscosity environments, and several methods based on molecular dynamics simulations. By synergistically leveraging methods from biophysics and computational biology, we review the state of the art of computational methods to study the impact of molecular crowding on protein-protein interactions and discuss its potential revolutionizing effects on the characterization of the human interactome.
Affiliation(s)
- Greta Grassmann
  - Department of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, Rome 00185, Italy
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Mattia Miotto
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Fausta Desantis
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - The Open University Affiliated Research Centre at Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Lorenzo Di Rienzo
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Gian Gaetano Tartaglia
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
  - Center for Human Technologies, Genoa 16152, Italy
- Annalisa Pastore
  - Experiment Division, European Synchrotron Radiation Facility, Grenoble 38043, France
- Giancarlo Ruocco
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy
- Michele Monti
  - RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Edoardo Milanetti
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy
4
Yi C, Taylor ML, Ziebarth J, Wang Y. Predictive Models and Impact of Interfacial Contacts and Amino Acids on Protein-Protein Binding Affinity. ACS OMEGA 2024; 9:3454-3468. [PMID: 38284090 PMCID: PMC10809705 DOI: 10.1021/acsomega.3c06996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 12/11/2023] [Accepted: 12/14/2023] [Indexed: 01/30/2024]
Abstract
Protein-protein interactions (PPIs) play a central role in nearly all cellular processes. The strength of the binding in a PPI is characterized by the binding affinity (BA) and is a key factor in controlling protein-protein complex formation and defining the structure-function relationship. Despite advancements in understanding protein-protein binding, much remains unknown about the interfacial region and its association with BA. New models are needed to predict BA with improved accuracy for therapeutic design. Here, we use machine learning approaches to examine how well different types of interfacial contacts can be used to predict experimentally determined BA and to reveal the impact of the specific amino acids at the binding interface on BA. We create a series of multivariate linear regression models incorporating different contact features at both residue and atomic levels and examine how different methods of identifying and characterizing these properties impact the performance of these models. Particularly, we introduce a new and simple approach to predict BA based on the quantities of specific amino acids at the protein-protein interface. We found that the numbers of specific amino acids at the protein-protein interface were correlated with BA. We show that the interfacial numbers of amino acids can be used to produce models with consistently good performance across different data sets, indicating the importance of the identities of interfacial amino acids in underlying BA. When trained on a diverse set of complexes from two benchmark data sets, the best performing BA model was generated with an explicit linear equation involving six amino acids. Tyrosine, in particular, was identified as the key amino acid in controlling BA, as it had the strongest correlation with BA and was consistently identified as the most important amino acid in feature importance studies. Glycine and serine were identified as the next two most important amino acids in predicting BA. 
The results from this study further our understanding of PPIs and can be used to make improved predictions of BA, giving them implications for drug design and screening in the pharmaceutical industry.
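The idea of predicting BA from the numbers of specific amino acids at the interface can be sketched as a counting step followed by an explicit linear equation. Every number below is a placeholder: the coefficients, the intercept, and the example interface string are hypothetical and are not the fitted values from the paper. Only the choice of tyrosine, glycine, and serine as features follows the abstract.

```python
from collections import Counter
from typing import Dict

# Hypothetical coefficients (kcal/mol per residue); NOT the paper's fitted values.
HYPOTHETICAL_COEFFS: Dict[str, float] = {"Y": -0.30, "G": -0.20, "S": 0.10}
HYPOTHETICAL_INTERCEPT = -8.0  # kcal/mol, also a placeholder


def count_interface_residues(interface: str) -> Counter:
    """Count one-letter amino acid codes in an interface residue string."""
    return Counter(interface.upper())


def predict_ba(interface: str,
               coeffs: Dict[str, float] = HYPOTHETICAL_COEFFS,
               intercept: float = HYPOTHETICAL_INTERCEPT) -> float:
    """Evaluate BA = intercept + sum(coefficient * count) over the chosen residues."""
    counts = count_interface_residues(interface)
    return intercept + sum(c * counts[aa] for aa, c in coeffs.items())


if __name__ == "__main__":
    # Hypothetical interface: two tyrosines, one glycine, one serine, plus others.
    print(round(predict_ba("YYGSAL"), 2))
```

In practice the coefficients would come from fitting a multivariate linear regression to experimental affinities; the sketch only shows how such an equation is evaluated once fitted.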
Affiliation(s)
- Carey Huang Yi
  - Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
- Mitchell Lee Taylor
  - Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
- Jesse Ziebarth
  - Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
- Yongmei Wang
  - Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
5
Zhang Y, Wang X, Zhang Z, Huang Y, Kihara D. Assessment of Protein-Protein Docking Models Using Deep Learning. Methods Mol Biol 2024; 2780:149-162. [PMID: 38987469 DOI: 10.1007/978-1-0716-3985-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Protein-protein interactions are involved in almost all processes in a living cell and determine the biological functions of proteins. To obtain mechanistic understandings of protein-protein interactions, the tertiary structures of protein complexes have been determined by biophysical experimental methods, such as X-ray crystallography and cryogenic electron microscopy. However, as experimental methods are costly in resources, many computational methods have been developed that model protein complex structures. One of the difficulties in computational protein complex modeling (protein docking) is to select the most accurate models among many models that are usually generated by a docking method. This article reviews advances in protein docking model assessment methods, focusing on recent developments that apply deep learning to several network architectures.
Affiliation(s)
- Yuanyuan Zhang
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Xiao Wang
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Zicong Zhang
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Yunhan Huang
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
6
Cho Y, Ryu H, Lim G, Nam S, Lee J. Improving Geometric Validation Metrics and Ensuring Consistency with Experimental Data through TrioSA: An NMR Refinement Protocol. Int J Mol Sci 2023; 24:13337. [PMID: 37686144 PMCID: PMC10487420 DOI: 10.3390/ijms241713337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 08/21/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023] Open
Abstract
Protein model refinement is a crucial step in improving the quality of a predicted protein model. This study presents an NMR refinement protocol called TrioSA (torsion-angle and implicit-solvation-optimized simulated annealing) that improves the accuracy of backbone/side-chain conformations and the overall structural quality of proteins. TrioSA was applied to a subset of 3752 solution NMR protein structures accompanied by experimental NMR data: distance and dihedral angle restraints. We compared the initial NMR structures with the TrioSA-refined structures and found significant improvements in structural quality. In particular, we observed a reduction in both the maximum and number of NOE (nuclear Overhauser effect) violations, indicating better agreement with experimental NMR data. TrioSA improved geometric validation metrics of NMR protein structure, including backbone accuracy and the secondary structure ratio. We evaluated the contribution of each refinement element and found that the torsional angle potential played a significant role in improving the geometric validation metrics. In addition, we investigated protein-ligand docking to determine if TrioSA can improve biological outcomes. TrioSA structures exhibited better binding prediction compared to the initial NMR structures. This study suggests that further development and research in computational refinement methods could improve biomolecular NMR structural determination.
Affiliation(s)
- Youngbeom Cho
  - Department of Bioinformatics, KRIBB School of Bioscience, University of Science and Technology (UST), Daejeon 34141, Republic of Korea
  - Disease Target Structure Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea
- Hyojung Ryu
  - Disease Target Structure Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea
- Gyutae Lim
  - Disease Target Structure Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea
- Seungyoon Nam
  - Department of Genome Medicine and Science, AI Convergence Center for Medical Science, Gachon Institute of Genome Medicine and Science, Gachon University Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Republic of Korea
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology, Gachon University, Incheon 21999, Republic of Korea
- Jinhyuk Lee
  - Department of Bioinformatics, KRIBB School of Bioscience, University of Science and Technology (UST), Daejeon 34141, Republic of Korea
  - Disease Target Structure Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea
7
Yang S, Gong W, Zhou T, Sun X, Chen L, Zhou W, Li C. emPDBA: protein-DNA binding affinity prediction by combining features from binding partners and interface learned with ensemble regression model. Brief Bioinform 2023:7165253. [PMID: 37193676 DOI: 10.1093/bib/bbad192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 04/26/2023] [Accepted: 04/29/2023] [Indexed: 05/18/2023] Open
Abstract
Protein-deoxyribonucleic acid (DNA) interactions are important in a variety of biological processes. Accurately predicting protein-DNA binding affinity has been one of the most attractive and challenging issues in computational biology. However, the existing approaches still have much room for improvement. In this work, we propose an ensemble model for Protein-DNA Binding Affinity prediction (emPDBA), which combines six base models with one meta-model. The complexes are classified into four types based on the DNA structure (double-stranded or other forms) and the percentage of interface residues. For each type, emPDBA is trained with the sequence-based, structure-based and energy features from binding partners and complex structures. Through feature selection by the sequential forward selection method, it is found that there do exist considerable differences in the key factors contributing to intermolecular binding affinity. The complex classification is beneficial for the important feature extraction for binding affinity prediction. The performance comparison of our method with other peer ones on the independent testing dataset shows that emPDBA outperforms the state-of-the-art methods with the Pearson correlation coefficient of 0.53 and the mean absolute error of 1.11 kcal/mol. The comprehensive results demonstrate that our method has a good performance for protein-DNA binding affinity prediction. Availability and implementation: The source code is available at https://github.com/ChunhuaLiLab/emPDBA/.
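The two evaluation metrics quoted in this abstract, the Pearson correlation coefficient and the mean absolute error, follow standard definitions and can be computed as in the sketch below. The function names and the sample values are illustrative assumptions and are not taken from the emPDBA code.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between experimental and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical experimental vs. predicted affinities (kcal/mol).
    experimental = [-9.0, -10.5, -8.2, -11.3]
    predicted = [-9.4, -9.8, -8.9, -10.6]
    print(pearson_r(experimental, predicted))
    print(mean_absolute_error(experimental, predicted))
```

Reporting both metrics is useful because correlation measures how well the ranking and trend are reproduced, while the mean absolute error measures how far individual predictions are from experiment in energy units.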
Affiliation(s)
- Shuang Yang
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Weikang Gong
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Tong Zhou
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Xiaohan Sun
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Lei Chen
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Wenxue Zhou
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
8
Chitosan and HPMCAS double-coating as protective systems for alginate microparticles loaded with Ctx(Ile 21)-Ha antimicrobial peptide to prevent intestinal infections. Biomaterials 2023; 293:121978. [PMID: 36580719 DOI: 10.1016/j.biomaterials.2022.121978] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/24/2022] [Revised: 11/03/2022] [Accepted: 12/20/2022] [Indexed: 12/24/2022]
Abstract
The incorrect use of conventional drugs for both prevention and control of intestinal infections has contributed to a significant spread of bacterial resistance. In this way, studies that promote their replacement are a priority. In the last decade, the use of antimicrobial peptides (AMP), especially Ctx(Ile21)-Ha AMP, has gained strength, demonstrating efficient antimicrobial activity (AA) against pathogens, including multidrug-resistant bacteria. However, gastrointestinal degradation does not allow its direct oral application. In this research, double-coating systems using alginate microparticles loaded with Ctx(Ile21)-Ha peptide were designed, and in vitro release assays simulating the gastrointestinal tract were evaluated. Also, the AA against Salmonella spp. and Escherichia coli was examined. The results showed the physicochemical stability of Ctx(Ile21)-Ha peptide in the system and its potent antimicrobial activity. In addition, the combination of HPMCAS and chitosan as a gastric protection system can be promising for peptide carriers or other low pH-sensitive molecules, adequately released in the intestine. In conclusion, the coated systems employed in this study can improve the formulation of new foods or biopharmaceutical products for specific application against intestinal pathogens in animal production or, possibly, in the near future, in human health.
9
Govind Kumar V, Polasa A, Agrawal S, Kumar TKS, Moradi M. Binding affinity estimation from restrained umbrella sampling simulations. NATURE COMPUTATIONAL SCIENCE 2023; 3:59-70. [PMID: 38177953 PMCID: PMC10766565 DOI: 10.1038/s43588-022-00389-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/11/2022] [Accepted: 12/05/2022] [Indexed: 01/06/2024]
Abstract
The protein-ligand binding affinity quantifies the binding strength between a protein and its ligand. Computer modeling and simulations can be used to estimate the binding affinity or binding free energy using data- or physics-driven methods or a combination thereof. Here we discuss a purely physics-based sampling approach based on biased molecular dynamics simulations. Our proposed method generalizes and simplifies previously suggested stratification strategies that use umbrella sampling or other enhanced sampling simulations with additional collective-variable-based restraints. The approach presented here uses a flexible scheme that can be easily tailored for any system of interest. We estimate the binding affinity of human fibroblast growth factor 1 to heparin hexasaccharide based on the available crystal structure of the complex as the initial model and four different variations of the proposed method to compare against the experimentally determined binding affinity obtained from isothermal titration calorimetry experiments.
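The binding affinity and binding free energy mentioned in this abstract are commonly linked through the dissociation constant, using the standard thermodynamic relation ΔG° = RT ln(Kd / c°) with a standard concentration c° of 1 M. The sketch below applies that textbook relation only; it is not the authors' umbrella sampling method, and the function name and the example Kd are assumptions chosen for illustration.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal/(mol*K)


def kd_to_delta_g(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant in M.

    Uses dG = R * T * ln(Kd / 1 M); tighter binding gives a more negative value.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    # Hypothetical nanomolar binder at 25 degrees Celsius.
    print(round(kd_to_delta_g(1e-9), 2))
```

Converting an experimental Kd in this way puts calorimetry results and simulated free energies on the same scale, so the two can be compared directly.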
Affiliation(s)
- Vivek Govind Kumar
  - Department of Chemistry and Biochemistry, University of Arkansas, Fayetteville, AR, USA
- Adithya Polasa
  - Department of Chemistry and Biochemistry, University of Arkansas, Fayetteville, AR, USA
- Shilpi Agrawal
  - Department of Chemistry and Biochemistry, University of Arkansas, Fayetteville, AR, USA
- Mahmoud Moradi
  - Department of Chemistry and Biochemistry, University of Arkansas, Fayetteville, AR, USA
10
Guo Z, Yamaguchi R. Machine learning methods for protein-protein binding affinity prediction in protein design. FRONTIERS IN BIOINFORMATICS 2022; 2:1065703. [PMID: 36591334 PMCID: PMC9800603 DOI: 10.3389/fbinf.2022.1065703] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/10/2022] [Accepted: 12/01/2022] [Indexed: 12/23/2022] Open
Abstract
Protein-protein interactions govern a wide range of biological activity. A proper estimation of the protein-protein binding affinity is vital to design proteins with high specificity and binding affinity toward a target protein, which has a variety of applications including antibody design in immunotherapy, enzyme engineering for reaction optimization, and construction of biosensors. However, experimental and theoretical modelling methods are time-consuming, hinder the exploration of the entire protein space, and deter the identification of optimal proteins that meet the requirements of practical applications. In recent years, the rapid development in machine learning methods for protein-protein binding affinity prediction has revealed the potential of a paradigm shift in protein design. Here, we review the prediction methods and associated datasets and discuss the requirements and construction methods of binding affinity prediction models for protein design.
Affiliation(s)
- Zhongliang Guo
  - Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan
- Rui Yamaguchi
  - Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan
  - Division of Cancer Informatics, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
  - Correspondence: Rui Yamaguchi
11
Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:biom12091246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
Machine learning (ML) has for decades been an important part of the computational biology arsenal used to elucidate protein function. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein–ligand binding, including allosteric effects, protein–protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
12
Zhou Y, Jiang Y, Chen SJ. RNA-ligand molecular docking: advances and challenges. WILEY INTERDISCIPLINARY REVIEWS. COMPUTATIONAL MOLECULAR SCIENCE 2022; 12:e1571. [PMID: 37293430 PMCID: PMC10250017 DOI: 10.1002/wcms.1571] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 03/26/2021] [Accepted: 07/20/2021] [Indexed: 12/16/2022]
Abstract
With rapid advances in computer algorithms and hardware, fast and accurate virtual screening has led to a drastic acceleration in selecting potent small molecules as drug candidates. Computational modeling of RNA-small molecule interactions has become an indispensable tool for RNA-targeted drug discovery. The current models for RNA-ligand binding have mainly focused on the docking-and-scoring method. Accurate docking and scoring should tackle four crucial problems: (1) conformational flexibility of ligand, (2) conformational flexibility of RNA, (3) efficient sampling of binding sites and binding poses, and (4) accurate scoring of different binding modes. Moreover, compared with the problem of protein-ligand docking, predicting ligand binding to RNA, a negatively charged polymer, is further complicated by additional effects such as metal ion effects. Thermodynamic models based on physics-based and knowledge-based scoring functions have shown highly encouraging success in predicting ligand binding poses and binding affinities. Recently, kinetic models for ligand binding have further suggested that including dissociation kinetics (residence time) in ligand docking would result in improved performance in estimating in vivo drug efficacy. More recently, the rise of deep-learning approaches has led to new tools for predicting RNA-small molecule binding. In this review, we present an overview of the recently developed computational methods for RNA-ligand docking and their advantages and disadvantages.
Affiliation(s)
- Yuanzhe Zhou
  - Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
- Yangwei Jiang
  - Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
- Shi-Jie Chen
  - Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
13
Zhou P, Wen L, Lin J, Mei L, Liu Q, Shang S, Li J, Shu J. Integrated unsupervised-supervised modeling and prediction of protein-peptide affinities at structural level. Brief Bioinform 2022; 23:6555404. [PMID: 35352094 DOI: 10.1093/bib/bbac097] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 01/04/2022] [Revised: 02/15/2022] [Accepted: 02/23/2022] [Indexed: 12/24/2022] Open
Abstract
Cell signal networks are orchestrated directly or indirectly by various peptide-mediated protein-protein interactions, which are normally weak and transient and thus ideal for biological regulation and medicinal intervention. Here, we develop a general-purpose method for modeling and predicting the binding affinities of protein-peptide interactions (PpIs) at the structural level. The method is a hybrid strategy that employs an unsupervised approach to derive a layered PpI atom-residue interaction (ulPpI[a-r]) potential between different protein atom types and peptide residue types from thousands of solved PpI complex structures and then statistically correlates the potential descriptors with experimental affinities (KD values) over hundreds of known PpI samples in a supervised manner to create an integrated unsupervised-supervised PpI affinity (usPpIA) predictor. Although both the ulPpI[a-r] potential and the usPpIA predictor can be used to calculate PpI affinities from their complex structures, the latter seems to perform much better than the former, suggesting that the unsupervised potential can be improved substantially with a further correction by supervised statistical learning. We examine the robustness and fault tolerance of the usPpIA predictor when applied to the coarse-grained PpI complex structures modeled computationally by sophisticated peptide docking and dynamics simulation. It is revealed that, despite being developed solely from solved structures, the integrated unsupervised-supervised method is also applicable to locally docked structures, where it reaches a quantitative prediction, but can give only a qualitative prediction on globally docked structures. The dynamics refinement does not seem to change (or improve) the predictive results essentially, although it is computationally expensive and time-consuming relative to peptide docking.
We also perform extrapolation of usPpIA predictor to the indirect affinity quantities of HLA-A*0201 binding epitope peptides and NHERF PDZ binding scaffold peptides, consequently resulting in a good and moderate correlation of the predicted KD with experimental IC50 and BLU on the two peptide sets, with Pearson's correlation coefficients Rp = 0.635 and 0.406, respectively.
Affiliation(s)
- Peng Zhou
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
- Li Wen
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
- Jing Lin
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
- Li Mei
- Institute of Culinary, Sichuan Tourism University, Chengdu 610100, China
- Qian Liu
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
- Shuyong Shang
- of Ecological Environment Protection, Chengdu Normal University, Chengdu 611130, China
- Juelin Li
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
- Jianping Shu
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
|
14
|
Yang YX, Wang P, Zhu BT. Relative importance of interface and surface areas in protein-protein binding affinity prediction: A machine learning analysis based on linear regression and artificial neural network. Biophys Chem 2022; 283:106762. [DOI: 10.1016/j.bpc.2022.106762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Revised: 01/11/2022] [Accepted: 01/14/2022] [Indexed: 11/02/2022]
|
15
|
rsRNASP: A residue-separation-based statistical potential for RNA 3D structure evaluation. Biophys J 2022; 121:142-156. [PMID: 34798137 PMCID: PMC8758408 DOI: 10.1016/j.bpj.2021.11.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/20/2021] [Revised: 10/23/2021] [Accepted: 11/10/2021] [Indexed: 01/07/2023]
Abstract
Knowledge-based statistical potentials have been shown to be rather effective in protein 3-dimensional (3D) structure evaluation and prediction. Recently, several statistical potentials have been developed for RNA 3D structure evaluation, while their performances are either still at a low level for the test datasets from structure prediction models or dependent on the "black-box" process through neural networks. In this work, we have developed an all-atom distance-dependent statistical potential based on residue separation for RNA 3D structure evaluation, namely rsRNASP, which is composed of short- and long-ranged potentials distinguished by residue separation. The extensive examinations against available RNA test datasets show that rsRNASP has apparently higher performance than the existing statistical potentials for the realistic test datasets with large RNAs from structure prediction models, including the newly released RNA-Puzzles dataset, and is comparable to the existing top statistical potentials for the test datasets with small RNAs or near-native decoys. In addition, rsRNASP is superior to RNA3DCNN, a recently developed scoring function through 3D convolutional neural networks. rsRNASP and the relevant databases are available to the public.
|
16
|
Dhusia K, Madrid C, Su Z, Wu Y. EXCESP: A Structure-Based Online Database for Extracellular Interactome of Cell Surface Proteins in Humans. J Proteome Res 2022; 21:349-359. [PMID: 34978816 DOI: 10.1021/acs.jproteome.1c00612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
The interactions between ectodomains of cell surface proteins are vital players in many important cellular processes, such as regulating immune responses, coordinating cell differentiation, and shaping neural plasticity. However, while the construction of a large-scale protein interactome has been greatly facilitated by the development of high-throughput experimental techniques, little progress has been made to support the discovery of extracellular interactome for cell surface proteins. Harnessed by the recent advances in computational modeling of protein-protein interactions, here we present a structure-based online database for the extracellular interactome of cell surface proteins in humans, called EXCESP. The database contains both experimentally determined and computationally predicted interactions among all type-I transmembrane proteins in humans. All structural models for these interactions and their binding affinities were further computationally modeled. Moreover, information such as expression levels of each protein in different cell types and its relation to various signaling pathways from other online resources has also been integrated into the database. In summary, the database serves as a valuable addition to the existing online resources for the study of cell surface proteins. It can contribute to the understanding of the functions of cell surface proteins in the era of systems biology.
Affiliation(s)
- Kalyani Dhusia
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461, United States
- Carlos Madrid
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461, United States; Laboratory for Macromolecular Analysis and Proteomics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461, United States
- Zhaoqian Su
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461, United States
- Yinghao Wu
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461, United States
|
17
|
Crampon K, Giorkallos A, Deldossi M, Baud S, Steffenel LA. Machine-learning methods for ligand-protein molecular docking. Drug Discov Today 2021; 27:151-164. [PMID: 34560276 DOI: 10.1016/j.drudis.2021.09.007] [Citation(s) in RCA: 84] [Impact Index Per Article: 28.0] [Received: 04/01/2021] [Revised: 07/14/2021] [Accepted: 09/15/2021] [Indexed: 12/22/2022]
Abstract
Artificial intelligence (AI) is often presented as a new Industrial Revolution. Many domains use AI, including molecular simulation for drug discovery. In this review, we provide an overview of ligand-protein molecular docking and how machine learning (ML), especially deep learning (DL), a subset of ML, is transforming the field by tackling the associated challenges.
Affiliation(s)
- Kevin Crampon
- Université de Reims Champagne Ardenne, CNRS, MEDyC UMR 7369, 51097 Reims, France; Université de Reims Champagne Ardenne, LICIIS - LRC CEA DIGIT, 51100 Reims, France; Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
- Alexis Giorkallos
- Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
- Myrtille Deldossi
- Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
- Stéphanie Baud
- Université de Reims Champagne Ardenne, CNRS, MEDyC UMR 7369, 51097 Reims, France
|
18
|
Eberhardt J, Santos-Martins D, Tillack AF, Forli S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J Chem Inf Model 2021; 61:3891-3898. [PMID: 34278794 PMCID: PMC10683950 DOI: 10.1021/acs.jcim.1c00203] [Citation(s) in RCA: 1418] [Impact Index Per Article: 472.7] [Indexed: 12/11/2022]
Abstract
AutoDock Vina is arguably one of the fastest and most widely used open-source programs for molecular docking. However, compared to other programs in the AutoDock Suite, it lacks support for modeling specific features such as macrocycles or explicit water molecules. Here, we describe the implementation of this functionality in AutoDock Vina 1.2.0. Additionally, AutoDock Vina 1.2.0 supports the AutoDock4.2 scoring function, simultaneous docking of multiple ligands, and a batch mode for docking a large number of ligands. Furthermore, we implemented Python bindings to facilitate scripting and the development of docking workflows. This work is an effort toward the unification of the features of the AutoDock4 and AutoDock Vina programs. The source code is available at https://github.com/ccsb-scripps/AutoDock-Vina.
Affiliation(s)
- Jerome Eberhardt
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, 92037 California, United States
- Diogo Santos-Martins
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, 92037 California, United States
- Andreas F Tillack
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, 92037 California, United States
- Stefano Forli
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, 92037 California, United States
|
19
|
Wang B, Su Z, Wu Y. Computational Assessment of Protein-Protein Binding Affinity by Reverse Engineering the Energetics in Protein Complexes. Genomics Proteomics Bioinformatics 2021; 19:1012-1022. [PMID: 33838354 PMCID: PMC9403033 DOI: 10.1016/j.gpb.2021.03.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/10/2018] [Revised: 03/07/2019] [Accepted: 05/17/2019] [Indexed: 11/29/2022]
Abstract
The cellular functions of proteins are maintained by forming diverse complexes. The stability of these complexes is quantified by the measurement of binding affinity, and mutations that alter the binding affinity can cause various diseases such as cancer and diabetes. As a result, accurate estimation of the binding stability and the effects of mutations on changes of binding affinity is a crucial step to understanding the biological functions of proteins and their dysfunctional consequences. It has been hypothesized that the stability of a protein complex is dependent not only on the residues at its binding interface by pairwise interactions but also on all other remaining residues that do not appear at the binding interface. Here, we computationally reconstruct the binding affinity by decomposing it into the contributions of interfacial residues and other non-interfacial residues in a protein complex. We further assume that the contributions of both interfacial and non-interfacial residues to the binding affinity depend on their local structural environments such as solvent-accessible surfaces and secondary structural types. The weights of all corresponding parameters are optimized by Monte-Carlo simulations. After cross-validation against a large-scale dataset, we show that the model not only shows a strong correlation between the absolute values of the experimental and calculated binding affinities, but can also be an effective approach to predict the relative changes of binding affinity from mutations. Moreover, we have found that the optimized weights of many parameters can capture the first-principle chemical and physical features of molecular recognition, therefore reversely engineering the energetics of protein complexes. These results suggest that our method can serve as a useful addition to current computational approaches for predicting binding affinity and understanding the molecular mechanism of protein–protein interactions.
Affiliation(s)
- Bo Wang
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Zhaoqian Su
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Yinghao Wu
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
|
20
|
Wang Q, Chen F, Liu P, Mu Y, Sun S, Yuan X, Shang P, Ji B. Scaffold-based analysis of nonpeptide oncogenic FTase inhibitors using multiple similarity matching, binding affinity scoring and enzyme inhibition assay. J Mol Graph Model 2021; 105:107898. [PMID: 33784524 DOI: 10.1016/j.jmgm.2021.107898] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/23/2020] [Revised: 02/25/2021] [Accepted: 03/05/2021] [Indexed: 10/21/2022]
Abstract
Oncogenic protein farnesyltransferase (FTase) is a key enzyme responsible for the lipid modification of a large and important number of proteins including Ras, which has been recognized as a druggable target of diverse cancers. Here, we report a systematic scaffold-based analysis to investigate the affinity, selectivity and cross-reactivity of nonpeptide inhibitors across ontology-enriched, disease-associated FTase mutants, by integrating multiple similarity matching, binding affinity scoring and enzyme inhibition assay. It is revealed that nonpeptide inhibitors are generally insensitive to FTase mutations; many of them cannot definitely select for wild-type target over mutant enzymes. Therefore, off-target is observed as a common phenomenon for the untargeted consequence of targeted therapies with FTase inhibition. This is not unexpected if considering that the enzyme active site is highly conserved in composition, configuration and function. The off-target, on the one hand, causes nonpeptide inhibitors with adverse drug reactions and, on the other hand, makes the inhibitors as promising candidates for the new use of old drugs. To practice the latter, a number of unexpected mutant-inhibitor interactions involved in cancer signaling pathways are uncovered in the created profile, from which several nonpeptide inhibitors are identified as insensitive to a drug-resistant mutation. Structural analysis suggests that the inhibitor ligands can bind to the mutant active site in a similar manner with wild-type target, although their nonbonded interactions appear to be impaired moderately upon the mutation.
Affiliation(s)
- Qifei Wang
- Department of Chest Surgery, The Second Affiliated Hospital of Shandong First Medical University, Taian, 271000, China
- Fei Chen
- Department of Gastroenterology, The Second Affiliated Hospital of Shandong First Medical University, Taian, 271000, China
- Peng Liu
- Department of Chest Surgery, Ningyang First People's Hospital, Taian, 271400, China
- Yushu Mu
- Department of Chest Surgery, The Second Affiliated Hospital of Shandong First Medical University, Taian, 271000, China
- Shibin Sun
- Department of Chest Surgery, The Second Affiliated Hospital of Shandong First Medical University, Taian, 271000, China
- Xulong Yuan
- Department of Chest Surgery, The Second Affiliated Hospital of Shandong First Medical University, Taian, 271000, China
- Pan Shang
- Department of Chest Surgery, The Second Affiliated Hospital of Shandong First Medical University, Taian, 271000, China
- Bo Ji
- Department of Chest Surgery, The Second Affiliated Hospital of Shandong First Medical University, Taian, 271000, China
|
21
|
Xu C, Liu X, Shen J, Sun Q, Guo X, Yang M, Leng J. Integrative identification of human serpin PAI-1 inhibitors from Dracaena dragon blood and molecular implications for inhibitor-induced PAI-1 allosterism. Biotechnol Appl Biochem 2021; 69:221-229. [PMID: 33433923 DOI: 10.1002/bab.2100] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/12/2020] [Accepted: 01/06/2021] [Indexed: 11/08/2022]
Abstract
Human plasminogen activator inhibitor-1 (PAI-1) is an important component of the coagulation system and has been recognized as a potential therapeutic target of diverse cardiovascular disorders. Previously, it was found that the extracts from the Chinese medicine Dracaena dragon blood have potent inhibitory activity against PAI-1, but it is unclear which constituents directly participate in the inhibition and how do they regulate PAI-1 at molecular level. Here, we describe an integrated strategy to identify the dragon blood's chemical constituents that can directly target PAI-1. With the strategy, five compounds 1-5 are hit as promising PAI-1 inhibitor candidates, from which three are measured to have high or moderate activity against PAI-1. In particular, the compound 3 is determined to exhibit the highest potency; this value is roughly comparable with the widely used PAI-1 inhibitor Tiplaxtinin. We further examine the molecular effect of compound 3 on PAI-1 conformation at structural level. It is supposed that small-molecule inhibitor regulates the reactive center loop (RCL) of PAI-1 through an allosterism, that is, binding of compound 3 to PAI-1 can allosterically stabilize RCL in latent form, thus promoting PAI-1 conformational conversion from metastable active form to the inactive latent form. Long-term atomistic simulations also demonstrate that removal of compound 3 can destabilize the structured β-stranded conformation of RCL in latent form, although the current simulations are still not sufficient to characterize the full conversion dynamics trajectory.
Affiliation(s)
- Chong Xu
- Chongqing Academy of Traditional Chinese Medicine, Chongqing, People's Republic of China; Chongqing Traditional Chinese Medicine Hospital, Chongqing, People's Republic of China
- Xia Liu
- Chongqing Academy of Traditional Chinese Medicine, Chongqing, People's Republic of China; Chongqing Traditional Chinese Medicine Hospital, Chongqing, People's Republic of China
- Jie Shen
- Chongqing Academy of Traditional Chinese Medicine, Chongqing, People's Republic of China; Chongqing Traditional Chinese Medicine Hospital, Chongqing, People's Republic of China
- Quan Sun
- Chongqing Academy of Traditional Chinese Medicine, Chongqing, People's Republic of China; Chongqing Traditional Chinese Medicine Hospital, Chongqing, People's Republic of China
- Xiaohong Guo
- Chongqing Academy of Traditional Chinese Medicine, Chongqing, People's Republic of China; Chongqing Traditional Chinese Medicine Hospital, Chongqing, People's Republic of China
- Min Yang
- Chongqing Academy of Traditional Chinese Medicine, Chongqing, People's Republic of China; Chongqing Traditional Chinese Medicine Hospital, Chongqing, People's Republic of China
- Jing Leng
- Chongqing Academy of Traditional Chinese Medicine, Chongqing, People's Republic of China; Chongqing Traditional Chinese Medicine Hospital, Chongqing, People's Republic of China
|
22
|
Kwon Y, Shin WH, Ko J, Lee J. AK-Score: Accurate Protein-Ligand Binding Affinity Prediction Using an Ensemble of 3D-Convolutional Neural Networks. Int J Mol Sci 2020; 21:E8424. [PMID: 33182567 PMCID: PMC7697539 DOI: 10.3390/ijms21228424] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 09/27/2020] [Revised: 10/24/2020] [Accepted: 11/07/2020] [Indexed: 02/04/2023]
Abstract
Accurate prediction of the binding affinity of a protein-ligand complex is essential for efficient and successful rational drug design. Therefore, many binding affinity prediction methods have been developed. In recent years, since deep learning technology has become powerful, it is also implemented to predict affinity. In this work, a new neural network model that predicts the binding affinity of a protein-ligand complex structure is developed. Our model predicts the binding affinity of a complex using the ensemble of multiple independently trained networks that consist of multiple channels of 3-D convolutional neural network layers. Our model was trained using the 3772 protein-ligand complexes from the refined set of the PDBbind-2016 database and tested using the core set of 285 complexes. The benchmark results show that the Pearson correlation coefficient between the predicted binding affinities by our model and the experimental data is 0.827, which is higher than the state-of-the-art binding affinity prediction scoring functions. Additionally, our method ranks the relative binding affinities of possible multiple binders of a protein quite accurately, comparable to the other scoring functions. Last, we measured which structural information is critical for predicting binding affinity and found that the complementarity between the protein and ligand is most important.
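The AK-Score abstract above describes two steps that recur across this list: combining several independently trained models into one prediction, and scoring the result by the Pearson correlation coefficient against experimental affinities. A minimal Python sketch of both follows. It assumes the ensemble is combined by a plain arithmetic mean, which the abstract does not state. The function names `ensemble_predict` and `pearson_r` are hypothetical, and the numbers are made up for illustration; none of them come from the paper.

```python
import numpy as np


def ensemble_predict(member_predictions):
    """Average per-complex predictions over models (rows = models).

    Assumption: a simple mean is used as the combination rule.
    """
    return np.mean(np.asarray(member_predictions, dtype=float), axis=0)


def pearson_r(predicted, experimental):
    """Pearson correlation coefficient between two equal-length vectors."""
    p = np.asarray(predicted, dtype=float)
    e = np.asarray(experimental, dtype=float)
    p_centered = p - p.mean()
    e_centered = e - e.mean()
    denominator = np.sqrt(np.sum(p_centered**2) * np.sum(e_centered**2))
    return float(np.sum(p_centered * e_centered) / denominator)


# Illustrative, invented affinities for four complexes and three models.
member_predictions = [
    [6.1, 7.9, 5.2, 8.8],
    [5.7, 8.3, 4.8, 9.4],
    [6.2, 7.8, 5.0, 8.8],
]
experimental = [6.0, 8.0, 5.0, 9.0]

consensus = ensemble_predict(member_predictions)
score = pearson_r(consensus, experimental)
print(consensus, score)
```

In this toy case the individual models err in different directions, so the averaged prediction tracks the experimental values more closely than any single model does; real ensembles rarely cancel errors this cleanly.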
Affiliation(s)
- Yongbeom Kwon
- Department of Chemistry, Kangwon National University, Gangwon-do, Chuncheon 24341, Korea
- Woong-Hee Shin
- Department of Chemical Science Education, Sunchon National University, Jeollanam-do, Suncheon 57922, Korea
- Junsu Ko
- Arontier, 241 Gangnam-daero, Seocho-gu, Seoul 06735, Korea
- Juyong Lee
- Department of Chemistry, Kangwon National University, Gangwon-do, Chuncheon 24341, Korea
|
23
|
Aderinwale T, Christoffer CW, Sarkar D, Alnabati E, Kihara D. Computational structure modeling for diverse categories of macromolecular interactions. Curr Opin Struct Biol 2020; 64:1-8. [PMID: 32599506 PMCID: PMC7665979 DOI: 10.1016/j.sbi.2020.05.017] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 01/30/2020] [Revised: 05/06/2020] [Accepted: 05/21/2020] [Indexed: 01/23/2023]
Abstract
Computational protein-protein docking is one of the most intensively studied topics in structural bioinformatics. The field has made substantial progress through over three decades of development. The development began with methods for rigid-body docking of two proteins, which have now been extended in different directions to cover the various macromolecular interactions observed in a cell. Here, we overview the recent developments of the variations of docking methods, including multiple protein docking, peptide-protein docking, and disordered protein docking methods.
Affiliation(s)
- Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Daipayan Sarkar
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Eman Alnabati
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA; Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
|
24
|
Adeshina YO, Deeds EJ, Karanicolas J. Machine learning classification can reduce false positives in structure-based virtual screening. Proc Natl Acad Sci U S A 2020; 117:18477-18488. [PMID: 32669436 PMCID: PMC7414157 DOI: 10.1073/pnas.2000585117] [Citation(s) in RCA: 95] [Impact Index Per Article: 23.8] [Indexed: 12/19/2022]
Abstract
With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery's search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient thoughtfulness into the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC50 better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has IC50 280 nM, corresponding to Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and for vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.
Affiliation(s)
- Yusuf O Adeshina
- Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111
- Center for Computational Biology, University of Kansas, Lawrence, KS 66045
- Eric J Deeds
- Center for Computational Biology, University of Kansas, Lawrence, KS 66045
- Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66045
- John Karanicolas
- Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111
|
25
|
Saikia S, Bordoloi M. Molecular Docking: Challenges, Advances and its Use in Drug Discovery Perspective. Curr Drug Targets 2020; 20:501-521. [PMID: 30360733 DOI: 10.2174/1389450119666181022153016] [Citation(s) in RCA: 203] [Impact Index Per Article: 50.8] [Received: 05/25/2018] [Revised: 06/08/2018] [Accepted: 08/28/2018] [Indexed: 01/21/2023]
Abstract
Molecular docking is a process through which small molecules are docked into the macromolecular structures for scoring its complementary values at the binding sites. It is a vibrant research area with dynamic utility in structure-based drug-designing, lead optimization, biochemical pathway and for drug designing being the most attractive tools. Two pillars for a successful docking experiment are correct pose and affinity prediction. Each program has its own advantages and drawbacks with respect to their docking accuracy, ranking accuracy and time consumption so a general conclusion cannot be drawn. Moreover, users don't always consider sufficient diversity in their test sets which results in certain programs to outperform others. In this review, the prime focus has been laid on the challenges of docking and troubleshooters in existing programs, underlying algorithmic background of docking, preferences regarding the use of docking programs for best results illustrated with examples, comparison of performance for existing tools and algorithms, state of art in docking, recent trends of diseases and current drug industries, evidence from clinical trials and post-marketing surveillance are discussed. These aspects of the molecular drug designing paradigm are quite controversial and challenging and this review would be an asset to the bioinformatics and drug designing communities.
Affiliation(s)
- Surovi Saikia
- Natural Products Chemistry Group, CSIR North East Institute of Science & Technology, Jorhat-785006, Assam, India
- Manobjyoti Bordoloi
- Natural Products Chemistry Group, CSIR North East Institute of Science & Technology, Jorhat-785006, Assam, India
|
26
|
Li Y, Gao Y, Holloway MK, Wang R. Prediction of the Favorable Hydration Sites in a Protein Binding Pocket and Its Application to Scoring Function Formulation. J Chem Inf Model 2020; 60:4359-4375. [DOI: 10.1021/acs.jcim.9b00619] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/24/2022]
Affiliation(s)
- Yan Li
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People’s Republic of China
- Yingduo Gao
- Merck Research Laboratories, 2000 Galloping Hill Road, Kenilworth, New Jersey 07033, United States
- Merck Research Laboratories, 770 Sumneytown Pike, West Point, Pennsylvania 19486, United States
- Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People’s Republic of China
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Center for Excellence in Molecular Synthesis, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- Shanxi Key Laboratory of Innovative Drugs for the Treatment of Serious Diseases Basing on Chronic Inflammation, College of Traditional Chinese Medicines, Shanxi University of Chinese Medicine, Taiyuan, Shanxi 030619, People’s Republic of China
|
27
|
Sen Gupta PS, Islam RNUI, Banerjee S, Nayek A, Rana MK, Bandyopadhyay AK. Screening and molecular characterization of lethal mutations of human homogentisate 1, 2 dioxigenase. J Biomol Struct Dyn 2020; 39:1661-1671. [PMID: 32107984 DOI: 10.1080/07391102.2020.1736158] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/29/2023]
Affiliation(s)
- Parth Sarthi Sen Gupta
- Department of Biotechnology, The University of Burdwan, Bardhaman, West Bengal, India
- Department of Chemical Sciences, Indian Institute of Science Education and Research (IISER) Berhampur, Ganjam, Odisha, India
- Rifat Nawaz UI Islam
- Department of Biotechnology, The University of Burdwan, Bardhaman, West Bengal, India
- Sahini Banerjee
- Department of Biological Sciences, Indian Statistical Institute, Kolkata, West Bengal, India
- Arnab Nayek
- Department of Biotechnology, The University of Burdwan, Bardhaman, West Bengal, India
- Malay Kumar Rana
- Department of Chemical Sciences, Indian Institute of Science Education and Research (IISER) Berhampur, Ganjam, Odisha, India
|
28
|
Dos Santos Maia M, Soares Rodrigues GC, Silva Cavalcanti AB, Scotti L, Scotti MT. Consensus Analyses in Molecular Docking Studies Applied to Medicinal Chemistry. Mini Rev Med Chem 2020; 20:1322-1340. [PMID: 32013847 DOI: 10.2174/1389557520666200204121129] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/10/2019] [Revised: 10/31/2019] [Accepted: 11/04/2019] [Indexed: 02/08/2023]
Abstract
The increasing number of computational studies in medicinal chemistry involving molecular docking has put the technique forward as promising in Computer-Aided Drug Design. Considering the main method in the virtual screening based on the structure, consensus analysis of docking has been applied in several studies to overcome limitations of algorithms of different programs and mainly to increase the reliability of the results and reduce the number of false positives. However, some consensus scoring strategies are difficult to apply and, in some cases, are not reliable due to the small number of datasets tested. Thus, for such a methodology to be successful, it is necessary to understand why, when and how to use consensus docking. Therefore, the present study aims to present different approaches to docking consensus, applications, and several scoring strategies that have been successful and can be applied in future studies.
Affiliation(s)
- Mayara Dos Santos Maia
- Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraiba, Joao Pessoa-PB, Brazil
| | - Gabriela Cristina Soares Rodrigues
- Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraiba, Joao Pessoa-PB, Brazil
| | - Andreza Barbosa Silva Cavalcanti
- Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraiba, Joao Pessoa-PB, Brazil
| | - Luciana Scotti
- Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraiba, Joao Pessoa-PB, Brazil
| | - Marcus Tullius Scotti
- Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraiba, Joao Pessoa-PB, Brazil
|
29
|
PreDBA: A heterogeneous ensemble approach for predicting protein-DNA binding affinity. Sci Rep 2020; 10:1278. [PMID: 31992738 PMCID: PMC6987227 DOI: 10.1038/s41598-020-57778-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2019] [Accepted: 01/06/2020] [Indexed: 11/17/2022] Open
Abstract
The interaction between protein and DNA plays an essential function in various critical natural processes, like DNA replication, transcription, splicing, and repair. Studying the binding affinity of proteins to DNA helps to understand the recognition mechanism of protein-DNA complexes. Since there are still many limitations on the protein-DNA binding affinity data measured by experiments, accurate and reliable calculation methods are necessarily required. So we put forward a computational approach in this paper, called PreDBA, that can forecast protein-DNA binding affinity effectively by using heterogeneous ensemble models. One hundred protein-DNA complexes are manually collected from the related literature as a data set for protein-DNA binding affinity. Then, 52 sequence and structural features are obtained. Based on this, the correlation between these 52 characteristics and protein-DNA binding affinity is calculated. Furthermore, we found that the protein-DNA binding affinity is affected by the DNA molecule structure of the compound. We classify all protein-DNA compounds into five classifications based on the DNA structure related to the proteins that make up the protein-DNA complexes. In each group, a stacked heterogeneous ensemble model is constructed based on the obtained features. In the end, based on the binding affinity data set, we used the leave-one-out cross-validation to evaluate the proposed method comprehensively. In the five categories, the Pearson correlation coefficient values of our recommended method range from 0.735 to 0.926. We have demonstrated the advantages of the proposed method compared to other machine learning methods and currently existing protein-DNA binding affinity prediction approach.
|
30
|
Qiu L, Zou X. Scoring Functions for Protein-RNA Complex Structure Prediction: Advances, Applications, and Future Directions. COMMUNICATIONS IN INFORMATION AND SYSTEMS 2020; 20:1-22. [PMID: 33867869 DOI: 10.4310/cis.2020.v20.n1.a1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Protein-RNA interaction is among the most essential of biological events in living cells, being involved in protein synthesis, RNA processing and transport, DNA transcription, regulation of gene expression, and many other critical bio-molecular activities. A thorough understanding of this interaction is of paramount importance in fundamental study of a variety of vital cellular processes and therapeutic application for remedy of a broad range of diseases. Experimental high-resolution 3D structure determination is the primary source of knowledge for protein-RNA complexes. However, due to technical limitations, the existing techniques for experimental structure determination cannot match the demand arising from fast-growing interest in academia and industry. This problem necessitates alternative high-throughput computational methods for protein-RNA complex structure prediction. Similar to the in silico methods used for protein-protein and protein-DNA interactions, a reliable prediction of protein-RNA complex structure requires a scoring function with commensurate discriminatory power. Derived from determined structures and purposed to predict the to-be-determined structures, the scoring function is not only a predictive tool but also a gauge of our knowledge of protein-RNA interaction. In this review, we present an overview of the status of existing scoring functions and the scientific principle behind their constructions as well as their strengths and limitations. Finally, we will discuss future directions of the scoring function development for protein-RNA structure prediction.
Affiliation(s)
- Liming Qiu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri 65211
| | - Xiaoqin Zou
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri 65211; Department of Physics & Astronomy, University of Missouri, Columbia, Missouri 65211; Department of Biochemistry, University of Missouri, Columbia, Missouri 65211; Informatics Institute, University of Missouri, Columbia, Missouri 65211
|
31
|
Nithin C, Mukherjee S, Bahadur RP. A structure-based model for the prediction of protein-RNA binding affinity. RNA (NEW YORK, N.Y.) 2019; 25:1628-1645. [PMID: 31395671 PMCID: PMC6859855 DOI: 10.1261/rna.071779.119] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/04/2019] [Accepted: 08/05/2019] [Indexed: 05/28/2023]
Abstract
Protein-RNA recognition is highly affinity-driven and regulates a wide array of cellular functions. In this study, we have curated a binding affinity data set of 40 protein-RNA complexes, for which at least one unbound partner is available in the docking benchmark. The data set covers a wide affinity range of eight orders of magnitude as well as four different structural classes. On average, we find the complexes with single-stranded RNA have the highest affinity, whereas the complexes with the duplex RNA have the lowest. Nevertheless, free energy gain upon binding is the highest for the complexes with ribosomal proteins and the lowest for the complexes with tRNA with an average of -5.7 cal/mol/Å² in the entire data set. We train regression models to predict the binding affinity from the structural and physicochemical parameters of protein-RNA interfaces. The best fit model with the lowest maximum error is provided with three interface parameters: relative hydrophobicity, conformational change upon binding and relative hydration pattern. This model has been used for predicting the binding affinity on a test data set, generated using mutated structures of yeast aspartyl-tRNA synthetase, for which experimentally determined ΔG values of 40 mutations are available. The predicted ΔG_empirical values highly correlate with the experimental observations. The data set provided in this study should be useful for further development of the binding affinity prediction methods. Moreover, the model developed in this study enhances our understanding of the structural basis of protein-RNA binding affinity and provides a platform to engineer protein-RNA interfaces with desired affinity.
Affiliation(s)
- Chandran Nithin
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| | - Sunandan Mukherjee
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| | - Ranjit Prasad Bahadur
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
|
32
|
Su Z, Wu Y. Multiscale simulation unravel the kinetic mechanisms of inflammasome assembly. BIOCHIMICA ET BIOPHYSICA ACTA-MOLECULAR CELL RESEARCH 2019; 1867:118612. [PMID: 31758956 DOI: 10.1016/j.bbamcr.2019.118612] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/13/2019] [Revised: 11/11/2019] [Accepted: 11/18/2019] [Indexed: 01/16/2023]
Abstract
In the innate immune system, the host defense from the invasion of external pathogens triggers the inflammatory responses. Proteins involved in the inflammatory pathways were often found to aggregate into supramolecular oligomers, called 'inflammasome', mostly through the homotypic interaction between their domains that belong to the death domain superfamily. Although much has been known about the formation of these helical molecular machineries, the detailed correlation between the dynamics of their assembly and the structure of each domain is still not well understood. Using the filament formed by the PYD domains of adaptor molecule ASC as a test system, we constructed a new multiscale simulation framework to study the kinetics of inflammasome assembly. We found that the filament assembly is a multi-step, but highly cooperative process. Moreover, there are three types of binding interfaces between domain subunits in the ASC-PYD filament. The multiscale simulation results suggest that dynamics of domain assembly are rooted in the primary protein sequence which defines the energetics of molecular recognition through three binding interfaces. Interface I plays a more regulatory role than the other two in mediating both the kinetics and the thermodynamics of assembly. Finally, the efficiency of our computational framework allows us to design mutants on a systematic scale and predict their impacts on filament assembly. In summary, this is, to the best of our knowledge, the first simulation method to model the spatial-temporal process of inflammasome assembly. Our work is a useful addition to a suite of existing experimental techniques to study the functions of inflammasome in innate immune system.
Affiliation(s)
- Zhaoqian Su
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, United States of America
| | - Yinghao Wu
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, United States of America.
|
33
|
AlQuraishi M. AlphaFold at CASP13. Bioinformatics 2019; 35:4862-4865. [PMID: 31116374 PMCID: PMC6907002 DOI: 10.1093/bioinformatics/btz422] [Citation(s) in RCA: 154] [Impact Index Per Article: 30.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2019] [Revised: 03/26/2019] [Accepted: 05/15/2019] [Indexed: 11/13/2022] Open
Abstract
SUMMARY: Computational prediction of protein structure from sequence is broadly viewed as a foundational problem of biochemistry and one of the most difficult challenges in bioinformatics. Once every two years the Critical Assessment of protein Structure Prediction (CASP) experiments are held to assess the state of the art in the field in a blind fashion, by presenting predictor groups with protein sequences whose structures have been solved but have not yet been made publicly available. The first CASP was organized in 1994, and the latest, CASP13, took place last December, when for the first time the industrial laboratory DeepMind entered the competition. DeepMind's entry, AlphaFold, placed first in the Free Modeling (FM) category, which assesses methods on their ability to predict novel protein folds (the Zhang group placed first in the Template-Based Modeling (TBM) category, which assesses methods on predicting proteins whose folds are related to ones already in the Protein Data Bank). DeepMind's success generated significant public interest. Their approach builds on two ideas developed in the academic community during the preceding decade: (i) the use of co-evolutionary analysis to map residue co-variation in protein sequence to physical contact in protein structure, and (ii) the application of deep neural networks to robustly identify patterns in protein sequence and co-evolutionary couplings and convert them into contact maps. In this Letter, we contextualize the significance of DeepMind's entry within the broader history of CASP, relate AlphaFold's methodological advances to prior work, and speculate on the future of this important problem.
Affiliation(s)
- Mohammed AlQuraishi
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Lab of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
|
34
|
Siebenmorgen T, Zacharias M. Computational prediction of protein–protein binding affinities. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2019. [DOI: 10.1002/wcms.1448] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Affiliation(s)
- Till Siebenmorgen
- Physics Department T38 Technical University of Munich Garching Germany
| | - Martin Zacharias
- Physics Department T38 Technical University of Munich Garching Germany
|
35
|
Zhang T, Hu G, Yang Y, Wang J, Zhou Y. All-Atom Knowledge-Based Potential for RNA Structure Discrimination Based on the Distance-Scaled Finite Ideal-Gas Reference State. J Comput Biol 2019; 27:856-867. [PMID: 31638408 DOI: 10.1089/cmb.2019.0251] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
Noncoding RNAs are increasingly found to play a wide variety of roles in living organisms. Yet, their functional mechanisms are poorly understood because their structures are difficult to determine experimentally. As a result, developing more effective computational techniques to predict RNA structures becomes increasingly an urgent task. One key challenge in RNA structure prediction is the lack of an accurate free energy function to guide RNA folding and discriminate native and near-native structures from decoy conformations. In this study, we developed an all-atom distance-dependent knowledge-based energy function for RNA that is based on a reference state (distance-scaled finite ideal-gas reference state, DFIRE) proven successful for protein structure discrimination. Using four separate benchmarks including RNA puzzles, we found that this DFIRE-based RNA statistical energy function is able to discriminate native and near-native structures against decoys with performance comparable with or better than several existing scoring functions compared. The energy function is expected to be useful for improving the detection of RNA near-native structures.
Affiliation(s)
- Tongchuan Zhang
- Institute for Glycomics, School of Informatics and Communication Technology, Griffith University, Southport, Australia
| | - Guodong Hu
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China
| | - Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
| | - Yaoqi Zhou
- Institute for Glycomics, School of Informatics and Communication Technology, Griffith University, Southport, Australia; Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
|
36
|
Thaljeh LF, Rothschild JA, Naderi M, Coghill LM, Brown JM, Brylinski M. Hinge Region in DNA Packaging Terminase pUL15 of Herpes Simplex Virus: A Potential Allosteric Target for Antiviral Drugs. Biomolecules 2019; 9:biom9100603. [PMID: 31614784 PMCID: PMC6843332 DOI: 10.3390/biom9100603] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2019] [Revised: 09/30/2019] [Accepted: 10/08/2019] [Indexed: 12/23/2022] Open
Abstract
Approximately 80% of adults are infected with a member of the herpesviridae family. Herpesviruses establish life-long latent infections within neurons, which may reactivate into lytic infections due to stress or immune suppression. There are nine human herpesviruses (HHV) posing health concerns from benign conditions to life threatening encephalitis, including cancers associated with viral infections. The current treatment options for most HHV conditions mainly include several nucleoside and nucleotide analogs targeting viral DNA polymerase. Although these drugs help manage infections, their common mechanism of action may lead to the development of drug resistance, which is particularly devastating in immunocompromised patients. Therefore, new classes of drugs directed against novel targets in HHVs are necessary to alleviate this issue. We analyzed the conservation rates of all proteins in herpes simplex virus 1 (HHV-1), a representative of the HHV family and one of the most common viruses infecting the human population. Furthermore, we generated a full-length structure model of the most conserved HHV-1 protein, the DNA packaging terminase pUL15. A series of computational analyses were performed on the model to identify ATP and DNA binding sites and characterize the dynamics of the protein. Our study indicates that proteins involved in HHV-1 DNA packaging and cleavage are amongst the most conserved gene products of HHVs. Since the packaging protein pUL15 is the most conserved among all HHV-1 gene products, the virus will have a lower chance of developing resistance to small molecules targeting pUL15. A subsequent analysis of the structure of pUL15 revealed distinct ATP and DNA binding domains and the elastic network model identifies a functionally important hinge region between the two domains of pUL15. 
The atomic information on the active and allosteric sites in the ATP- and DNA-bound model of pUL15 presented in this study can inform the structure-based drug discovery of a new class of drugs to treat a wide range of HHVs.
Affiliation(s)
- Lana F Thaljeh
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
| | - J Ainsley Rothschild
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
| | - Misagh Naderi
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
| | - Lyndon M Coghill
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
- Center for Computation & Technology, Louisiana State University, Baton Rouge, LA 70803, USA.
| | - Jeremy M Brown
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
| | - Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
- Center for Computation & Technology, Louisiana State University, Baton Rouge, LA 70803, USA.
|
37
|
DLIGAND2: an improved knowledge-based energy function for protein-ligand interactions using the distance-scaled, finite, ideal-gas reference state. J Cheminform 2019; 11:52. [PMID: 31392430 PMCID: PMC6686496 DOI: 10.1186/s13321-019-0373-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2019] [Accepted: 07/27/2019] [Indexed: 12/14/2022] Open
Abstract
Performance of structure-based molecular docking largely depends on the accuracy of scoring functions. One important type of scoring functions are knowledge-based potentials derived from known three-dimensional structures of proteins and/or protein–ligand complex structures. This study seeks to improve a knowledge-based protein–ligand potential based on a distance-scale finite ideal-gas reference (DFIRE) state (DLIGAND) by expanding the representation of protein atoms from 13 mol2 atom types to 167 residue-specific atom types, and employing a recently updated dataset containing 12,450 monomer protein chains for training. We found that the updated version DLIGAND2 has a consistent improvement over DLIGAND in predicting binding affinities for either native complex structures or docking-generated poses. More importantly, DLIGAND2 has a 52% increase over DLIGAND in enrichment factors in top 1% predictions based on the DUD-E decoy set, and consistently improves over Autodock Vina and other statistical energy functions in all three benchmark tests. We further found that DLIGAND2 outperforms empirical and machine-learning methods compared for virtual screening on new targets that are not homologous to the DUD-E training set. Given the best performance as a parameter-free statistical potential and among the best in all performance measures, DLIGAND2 should be useful for re-assessing the poses generated by docking software, or acting as one term in other scoring functions. The program is available at https://github.com/sysu-yanglab/DLIGAND2.
|
38
|
Potunuru UR, Priya KV, Varsha MS, Mehta N, Chandel S, Manoj N, Raman T, Ramar M, Gromiha MM, Dixit M. Amarogentin, a secoiridoid glycoside, activates AMP- activated protein kinase (AMPK) to exert beneficial vasculo-metabolic effects. Biochim Biophys Acta Gen Subj 2019; 1863:1270-1282. [DOI: 10.1016/j.bbagen.2019.05.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2018] [Revised: 05/07/2019] [Accepted: 05/14/2019] [Indexed: 12/12/2022]
|
39
|
Wang D, Geng L, Zhao YJ, Yang Y, Huang Y, Zhang Y, Shen HB. Artificial intelligence-based multi-objective optimization protocol for protein structure refinement. Bioinformatics 2019; 36:437-448. [PMID: 31274151 PMCID: PMC7999140 DOI: 10.1093/bioinformatics/btz544] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2019] [Revised: 06/06/2019] [Accepted: 07/04/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION: Protein structure refinement is an important step of protein structure prediction. Existing approaches have generally used a single scoring function combined with Monte Carlo method or Molecular Dynamics algorithm. The one-dimension optimization of a single energy function may take the structure too far away without a constraint. The basic motivation of our study is to reduce the bias problem caused by minimizing only a single energy function, given the great diversity of protein structures. RESULTS: We report a new Artificial Intelligence-based protein structure Refinement method called AIR. Its fundamental idea is to use multiple energy functions as multi-objectives in an effort to correct the potential inaccuracy from a single function. A multi-objective particle swarm optimization algorithm-based structure refinement is designed, where each structure is considered as a particle in the protocol. With the refinement iterations, the particles move around. The quality of particles in each iteration is evaluated by three energy functions, and the non-dominated particles are put into a set called Pareto set. After enough iterations, particles from the Pareto set are screened and a subset of the top solutions is output as the final refined structures. The multi-objective energy function optimization strategy designed in the AIR protocol provides a different constraint view of the structure, by extending the one-dimension optimization to a new three-dimension space optimization driven by the multi-objective particle swarm optimization engine. Experimental results on CASP11, CASP12 refinement targets and blind tests in CASP13 are promising. AVAILABILITY AND IMPLEMENTATION: The AIR is available online at: www.csbio.sjtu.edu.cn/bioinf/AIR/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | - Yu-Jun Zhao
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Yang Yang
- Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yan Huang
- State Key Laboratory of Infrared Physics, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
|
40
|
Lai X, Stigliani A, Vachon G, Carles C, Smaczniak C, Zubieta C, Kaufmann K, Parcy F. Building Transcription Factor Binding Site Models to Understand Gene Regulation in Plants. MOLECULAR PLANT 2019; 12:743-763. [PMID: 30447332 DOI: 10.1016/j.molp.2018.10.010] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/29/2018] [Revised: 09/20/2018] [Accepted: 10/30/2018] [Indexed: 06/09/2023]
Abstract
Transcription factors (TFs) are key cellular components that control gene expression. They recognize specific DNA sequences, the TF binding sites (TFBSs), and thus are targeted to specific regions of the genome where they can recruit transcriptional co-factors and/or chromatin regulators to fine-tune spatiotemporal gene regulation. Therefore, the identification of TFBSs in genomic sequences and their subsequent quantitative modeling is of crucial importance for understanding and predicting gene expression. Here, we review how TFBSs can be determined experimentally, how the TFBS models can be constructed in silico, and how they can be optimized by taking into account features such as position interdependence within TFBSs, DNA shape, and/or by introducing state-of-the-art computational algorithms such as deep learning methods. In addition, we discuss the integration of context variables into the TFBS modeling, including nucleosome positioning, chromatin states, methylation patterns, 3D genome architectures, and TF cooperative binding, in order to better predict TF binding under cellular contexts. Finally, we explore the possibilities of combining the optimized TFBS model with technological advances, such as targeted TFBS perturbation by CRISPR, to better understand gene regulation, evolution, and plant diversity.
Affiliation(s)
- Xuelei Lai
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France.
| | - Arnaud Stigliani
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
| | - Gilles Vachon
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
| | - Cristel Carles
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
| | - Cezary Smaczniak
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Chloe Zubieta
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
| | - Kerstin Kaufmann
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - François Parcy
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France.
|
41
|
Li Z, Miao Q, Yan F, Meng Y, Zhou P. Machine Learning in Quantitative Protein–peptide Affinity Prediction: Implications for Therapeutic Peptide Design. Curr Drug Metab 2019; 20:170-176. [DOI: 10.2174/1389200219666181012151944] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2017] [Revised: 11/07/2017] [Accepted: 08/20/2018] [Indexed: 01/03/2023]
Abstract
Background: Protein–peptide recognition plays an essential role in the orchestration and regulation of cell signaling networks, which is estimated to be responsible for up to 40% of biological interaction events in the human interactome and has recently been recognized as a new and attractive druggable target for drug development and disease intervention. Methods: We present a systematic review on the application of machine learning techniques in the quantitative modeling and prediction of protein–peptide binding affinity, particularly focusing on its implications for therapeutic peptide design. We also briefly introduce the physical quantities used to characterize protein–peptide affinity and attempt to extend the content of generalized machine learning methods. Results: Existing issues and future perspective on the statistical modeling and regression prediction of protein–peptide binding affinity are discussed. Conclusion: There is still a long way to go before establishment of general, reliable and efficient machine learning-based protein–peptide affinity predictors.
Affiliation(s)
- Zhongyan Li
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China
| | - Qingqing Miao
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China
| | - Fugang Yan
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China
| | - Yang Meng
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China
| | - Peng Zhou
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China
|
42
|
Methods for the Refinement of Protein Structure 3D Models. Int J Mol Sci 2019; 20:ijms20092301. [PMID: 31075942 PMCID: PMC6539982 DOI: 10.3390/ijms20092301] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2019] [Revised: 04/24/2019] [Accepted: 05/07/2019] [Indexed: 12/25/2022] Open
Abstract
The refinement of predicted 3D protein models is crucial in bringing them closer to experimental accuracy for further computational studies. Refinement approaches can be divided into two main stages: the sampling and scoring stages. Sampling strategies, such as the popular Molecular Dynamics (MD)-based protocols, aim to generate improved 3D models. However, generating 3D models that are closer to the native structure than the initial model remains challenging, as structural deviations from the native basin can be encountered due to force-field inaccuracies. Therefore, different restraint strategies have been applied in order to avoid deviations away from the native structure. For example, the accurate prediction of local errors and/or contacts in the initial models can be used to guide restraints. MD-based protocols, using physics-based force fields and smart restraints, have made significant progress towards a more consistent refinement of 3D models. In the scoring stage, energy functions and Model Quality Assessment Programs (MQAPs) are used to discriminate near-native conformations from non-native conformations. Nevertheless, there are often very small differences among the 3D models generated in refinement pipelines, which makes model discrimination and selection problematic. For this reason, the identification of the most native-like conformations remains a major challenge.
|
43
|
da Silva Pinto L, Cardoso G, Kremer FS, dos Santos Woloski RD, Dellagostin OA, Campos VF. Heterologous expression and characterization of a new galactose-binding lectin from Bauhinia forficata with antiproliferative activity. Int J Biol Macromol 2019; 128:877-884. [DOI: 10.1016/j.ijbiomac.2019.01.090] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/29/2018] [Revised: 01/07/2019] [Accepted: 01/18/2019] [Indexed: 02/06/2023]
|
44
|
Pu L, Govindaraj RG, Lemoine JM, Wu HC, Brylinski M. DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network. PLoS Comput Biol 2019; 15:e1006718. [PMID: 30716081 PMCID: PMC6375647 DOI: 10.1371/journal.pcbi.1006718] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 08/13/2018] [Revised: 02/14/2019] [Accepted: 12/16/2018] [Indexed: 01/19/2023]
Abstract
Comprehensive characterization of ligand-binding sites is invaluable for inferring molecular functions of hypothetical proteins, tracing evolutionary relationships between proteins, engineering enzymes to achieve a desired substrate specificity, and developing drugs with improved selectivity profiles. These research efforts pose significant challenges because similar pockets are commonly observed across different folds, leading to a high degree of promiscuity of ligand-protein interactions at the system level. On that account, novel algorithms to accurately classify binding sites are needed. Deep learning is attracting significant attention due to its successful applications in a wide range of disciplines. In this communication, we present DeepDrug3D, a new approach to characterize and classify binding pockets in proteins with deep learning. It employs a state-of-the-art convolutional neural network in which biomolecular structures are represented as voxels assigned interaction energy-based attributes. The current implementation of DeepDrug3D, trained to detect and classify nucleotide- and heme-binding sites, not only achieves a high accuracy of 95%, but also has the ability to generalize to unseen data, as demonstrated for steroid-binding proteins and peptidase enzymes. Interestingly, the analysis of strongly discriminative regions of binding pockets reveals that this high classification accuracy arises from learning the patterns of specific molecular interactions, such as hydrogen bonds and aromatic and hydrophobic contacts. DeepDrug3D is available as an open-source program at https://github.com/pulimeng/DeepDrug3D with the accompanying TOUGH-C1 benchmarking dataset accessible from https://osf.io/enz69/.
Affiliation(s)
- Limeng Pu
  - Division of Electrical & Computer Engineering, Louisiana State University, Baton Rouge, LA, United States of America
- Rajiv Gandhi Govindaraj
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, United States of America
- Jeffrey Mitchell Lemoine
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, United States of America
  - Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA, United States of America
- Hsiao-Chun Wu
  - Division of Electrical & Computer Engineering, Louisiana State University, Baton Rouge, LA, United States of America
- Michal Brylinski
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, United States of America
  - Center for Computation & Technology, Louisiana State University, Baton Rouge, LA, United States of America
|
45
|
Geng C, Xue LC, Roel-Touris J, Bonvin AMJJ. Finding the ΔΔG spot: Are predictors of binding affinity changes upon mutations in protein–protein interactions ready for it? Wiley Interdisciplinary Reviews: Computational Molecular Science 2019. [DOI: 10.1002/wcms.1410] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Indexed: 01/01/2023]
Affiliation(s)
- Cunliang Geng
  - Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
- Li C. Xue
  - Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
- Jorge Roel-Touris
  - Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
- Alexandre M. J. J. Bonvin
  - Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
|
46
|
Corona RI, Sudarshan S, Aluru S, Guo JT. An SVM-based method for assessment of transcription factor-DNA complex models. BMC Bioinformatics 2018; 19:506. [PMID: 30577740 PMCID: PMC6302363 DOI: 10.1186/s12859-018-2538-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/06/2023]
Abstract
Background: Atomic details of protein-DNA complexes can provide insightful information for better understanding of the function and binding specificity of DNA binding proteins. In addition to experimental methods for solving protein-DNA complex structures, protein-DNA docking can be used to predict native or near-native complex models. A docking program typically generates a large number of complex conformations and predicts the complex model(s) based on interaction energies between protein and DNA. However, the prediction accuracy is hampered by current approaches to model assessment, especially when docking simulations fail to produce any near-native models. Results: We present here a Support Vector Machine (SVM)-based approach for quality assessment of the predicted transcription factor (TF)-DNA complex models. Besides a knowledge-based protein-DNA interaction potential DDNA3, we applied several structural features that have been shown to play important roles in binding specificity between transcription factors and DNA molecules to quality assessment of complex models. To address the issue of unbalanced positive and negative cases in the training dataset, we applied hard-negative mining, an iterative training process that selects an initial training dataset by combining all of the positive cases and a random sample from the negative cases. Results show that the SVM model greatly improves prediction accuracy (84.2%) over two knowledge-based protein-DNA interaction potentials, orientation potential (60.8%) and DDNA3 (68.4%). The improvement is achieved through reducing the number of false positive predictions, especially for the hard docking cases, in which a docking algorithm fails to produce any near-native complex models.
Conclusions: A learning-based SVM scoring model with structural features for specific protein-DNA binding and an atomic-level protein-DNA interaction potential DDNA3 significantly improves prediction accuracy of complex models by successfully identifying cases without near-native structural models.
Affiliation(s)
- Rosario I Corona
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
- Sanjana Sudarshan
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
- Srinivas Aluru
  - School of Computational Science and Engineering, Georgia Institute of Technology, 266 Ferst Drive, Atlanta, GA, 30332, USA
- Jun-Tao Guo
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
|
47
|
Peng Y, Sun L, Jia Z, Li L, Alexov E. Predicting protein-DNA binding free energy change upon missense mutations using modified MM/PBSA approach: SAMPDI webserver. Bioinformatics 2018; 34:779-786. [PMID: 29091991 DOI: 10.1093/bioinformatics/btx698] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 07/14/2017] [Accepted: 10/27/2017] [Indexed: 12/28/2022]
Abstract
Motivation: Protein-DNA interactions are essential for regulating many cellular processes, such as transcription, replication, recombination and translation. Amino acid mutations occurring in DNA-binding proteins have profound effects on protein-DNA binding and are linked with many diseases. Hence, accurate and fast predictions of the effects of mutations on protein-DNA binding affinity are essential for understanding disease-causing mechanisms and guiding plausible treatments. Results: Here we report a new method, Single Amino acid Mutation binding free energy change of Protein-DNA Interaction (SAMPDI). The method utilizes a modified Molecular Mechanics Poisson-Boltzmann Surface Area (MM/PBSA) approach along with an additional set of knowledge-based terms derived from investigations of the physicochemical properties of protein-DNA complexes. The method is benchmarked against experimentally determined binding free energy changes caused by 105 mutations in 13 proteins (compiled from the ProNIT database and data from recent references), and yields a correlation coefficient of 0.72. Availability and implementation: http://compbio.clemson.edu/SAMPDI. Contact: ealexov@clemson.edu.
Affiliation(s)
- Yunhui Peng
  - Department of Physics and Astronomy, Clemson University, Clemson SC 29634, USA
- Lexuan Sun
  - Department of Physics and Astronomy, Clemson University, Clemson SC 29634, USA
- Zhe Jia
  - Department of Physics and Astronomy, Clemson University, Clemson SC 29634, USA
- Lin Li
  - Department of Physics and Astronomy, Clemson University, Clemson SC 29634, USA
- Emil Alexov
  - Department of Physics and Astronomy, Clemson University, Clemson SC 29634, USA
|
48
|
Zheng Z, Pei J, Bansal N, Liu H, Song LF, Merz KM. Generation of Pairwise Potentials Using Multidimensional Data Mining. J Chem Theory Comput 2018; 14:5045-5067. [PMID: 30183299 DOI: 10.1021/acs.jctc.8b00516] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
The rapid development of molecular structural databases provides the chemistry community with access to an enormous array of experimental data that can be used to build and validate computational models. Using radial distribution functions collected from experimentally available X-ray and NMR structures, a number of so-called statistical potentials have been developed over the years using the structural data mining strategy. These potentials have been developed within the context of the two-particle Kirkwood equation by extending its original use for isotropic monatomic systems to anisotropic biomolecular systems. However, the accuracy and the unclear physical meaning of statistical potentials have long formed the central arguments against such methods. In this work, we present a new approach to generate molecular energy functions using structural data mining. Instead of employing the Kirkwood equation and introducing the "reference state" approximation, we model the multidimensional probability distributions of the molecular system using graphical models and generate the target pairwise Boltzmann probabilities using Bayesian field theory. Unlike current statistical potentials, which mimic the "knowledge-based" PMF based on the two-particle Kirkwood equation, the graphical-model-based structure-derived potential developed in this study focuses on the generation of lower-dimensional Boltzmann distributions of atoms through reduction of dimensionality. We have named this new scoring function GARF, and in this work we focus on the mathematical derivation of our novel approach followed by validation studies on its ability to predict protein-ligand interactions.
Affiliation(s)
- Zheng Zheng
  - Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Jun Pei
  - Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Nupur Bansal
  - Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Hao Liu
  - Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Lin Frank Song
  - Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Kenneth M Merz
  - Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
|
49
|
Identification of Effective Dimeric Gramicidin-D Peptide as Antimicrobial Therapeutics over Drug Resistance: In-Silico Approach. Interdiscip Sci 2018; 11:575-583. [PMID: 30182355 DOI: 10.1007/s12539-018-0304-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/08/2018] [Revised: 07/25/2018] [Accepted: 08/28/2018] [Indexed: 10/28/2022]
Abstract
The discovery and development of antimicrobial peptides has recently become a focus of the pharmaceutical industry, since these peptides complement antibiotics in overcoming drug resistance by disrupting the microbial membrane. Still, many challenges remain in producing structurally stable and functionally efficient antimicrobial peptides. Gramicidin D is a prominent antimicrobial peptide that exists as g-AB, g-BC, and g-AC. This study analyzes the structural stability and functional activity of hetero-dimeric double-stranded gramicidin-D peptides, thereby demonstrating their potent antimicrobial activity against antibiotic-resistant micro-organisms. To investigate the structural stability and functionality of gramicidin D, we performed static and dynamic analyses. Initially, we observed the largest number of intermolecular interactions and the greatest membrane penetration in g-AB, compared with g-BC and g-AC. To substantiate this further, the geometrical and thermodynamic parameters showed that g-AB retained greater stability than g-AC and g-BC. The conformational free energy and the binding free energy likewise varied among the gramicidin-D peptides, supporting the prediction of increased stability and functionality. In conclusion, the g-AB peptide demonstrated adequate structural stability and functionality, and these findings merit consideration in peptide-based drug discovery.
|
50
|
Zang T, Ma T, Wang Q, Ma J. Improving low-accuracy protein structures using enhanced sampling techniques. J Chem Phys 2018; 149:072319. [PMID: 30134714 PMCID: PMC5995690 DOI: 10.1063/1.5027243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/28/2018] [Accepted: 05/23/2018] [Indexed: 11/14/2022]
Abstract
In this paper, we report the results of using enhanced sampling and blind selection techniques for high-accuracy protein structural refinement. By combining a parallel continuous simulated tempering (PCST) method, previously developed by Zang et al. [J. Chem. Phys. 141, 044113 (2014)], with the structure-based model (SBM) as restraints, we refined 23 targets (18 from the refinement category of CASP10 and 5 from that of CASP12). We also designed a novel model selection method to blindly select high-quality models from very long simulation trajectories. The combined use of PCST-SBM with the blind selection method yielded final models that are better than the initial models. For the Top-1 group, 7 out of 23 targets had better models (greater global distance test total scores) than those of the Critical Assessment of Structure Prediction (CASP) participants. For the Top-5 group, 10 out of 23 were better. Our results justify the crucial position of enhanced sampling in protein structure prediction and refinement and demonstrate that a considerable improvement of low-accuracy structures is achievable with current force fields.
Affiliation(s)
- Tianwu Zang
  - Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005, USA
- Tianqi Ma
  - Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005, USA
- Qinghua Wang
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, USA
- Jianpeng Ma
  - Author to whom correspondence should be addressed. Telephone: 713-798-8187. Fax: 713-796-9438
|