1
Ray Chaudhuri N, Ghosh Dastidar S. Adaptive Workflows of Machine Learning Illuminate the Sequential Operation Mechanism of the TAK1's Allosteric Network. Biochemistry 2024; 63:1474-1492. [PMID: 38743619 DOI: 10.1021/acs.biochem.3c00643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Allostery is a fundamental mechanism driving biomolecular processes and is of significant therapeutic interest. Our study rigorously investigates how two distinct machine-learning algorithms uniquely classify two already close-to-active DFG-in states of TAK1, differing only in the presence or absence of its allosteric activator TAB1, from an ensemble mixture of conformations (obtained from 2.4 μs molecular dynamics (MD) simulations). The novelty, however, lies in understanding the deeper algorithmic potential to systematically derive a diverse set of differential residue connectivity features that reconstruct the essential mechanistic architecture for TAK1-TAB1 allostery in such a close-to-active biochemical scenario. While the recursive, random forest-based workflow displays the potential for discretized, hierarchical derivation of allosteric features, a multilayer perceptron-based approach gains considerable efficacy in revealing fluid connected patterns of features when hybridized with mutual information scoring. Interestingly, both pipelines identify similar directions of functional conformational changes for TAK1's activation. The findings significantly advance the depth of mechanistic understanding by highlighting crucial activation signatures along a directed C-lobe → activation loop → ATP pocket channel of information flow, including (1) the αF-αE biterminal alignments and (2) the "catalytic" drift of the activation loop toward the kinase active site. In addition, several novel allosteric hotspots (K253, Y206, N189, etc.) are recognized as TAB1 sensors, transducers, and responders, including a benchmark E70 mutation site, precisely mapping the important structural segments for sequential allosteric execution. Hence, our work demonstrates how to navigate through greater structural depths and dimensions of dynamic allosteric machineries just by leveraging standard ML methods in suitable streamlined workflows adaptive to the specific system and objectives.
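As a rough illustration of the mutual information scoring mentioned in this abstract, the sketch below scores one discrete feature against a class label. The abstract does not describe the authors' features, binning, or estimator, so everything here is an assumption: the function name `mutual_information`, the binary "contact formed" feature, and the "bound"/"free" labels are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences.

    Hypothetical helper for illustration only; not the estimator used in the paper.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)            # marginal counts of feature values
    count_y = Counter(ys)            # marginal counts of class labels
    count_xy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Assumed toy data: one residue-residue contact (1 = formed) per MD frame,
# and a label saying whether the frame comes from the TAB1-bound ensemble.
contact_formed = [1, 1, 0, 0]
state = ["bound", "bound", "free", "free"]
score = mutual_information(contact_formed, state)
```

A feature that perfectly separates two equally populated classes scores 1 bit, and a feature independent of the label scores 0, so ranking features by this score is one plausible way to pair a classifier with an information-theoretic filter.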
Affiliation(s)
- Nibedita Ray Chaudhuri
  - Biological Sciences, Bose Institute, EN 80, Sector V, Bidhan Nagar, Kolkata 700091, India
- Shubhra Ghosh Dastidar
  - Biological Sciences, Bose Institute, EN 80, Sector V, Bidhan Nagar, Kolkata 700091, India
2
Basciu A, Athar M, Kurt H, Neville C, Malloci G, Muredda FC, Bosin A, Ruggerone P, Bonvin AMJJ, Vargiu AV. Predicting binding events in very flexible, allosteric, multi-domain proteins. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.02.597018. [PMID: 38895346 PMCID: PMC11185556 DOI: 10.1101/2024.06.02.597018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Knowledge of the structures formed by proteins and small ligands is of fundamental importance for understanding molecular principles of chemotherapy and for designing new and more effective drugs. Due to the still high costs and the several limitations of experimental techniques, it is most often desirable to predict these ligand-protein complexes in silico, particularly when screening for new putative drugs from databases of millions of compounds. While virtual screening based on molecular docking is widely used for this purpose, it generally fails to mimic binding events associated with large conformational changes in the protein, particularly when the latter involve multiple domains. In this work, we describe a new methodology aimed at generating bound-like conformations of very flexible and allosteric proteins bearing multiple binding sites. Validation was performed on the enzyme adenylate kinase (ADK), a paradigmatic example of proteins that undergo very large conformational changes upon ligand binding. By only exploiting the unbound structure and the putative binding sites of the protein, we generated a significant fraction of bound-like structures which, when employed in ensemble-docking calculations, allowed us to find native-like poses of substrates, inhibitors, and catalytically incompetent binders. Our protocol provides a general framework for the generation of bound-like conformations of flexible proteins that are suitable to host different ligands, demonstrating high sensitivity to the fine chemical details that regulate the protein's activity. We foresee applications in virtual screening for difficult targets, prediction of the impact of amino acid mutations on structure and dynamics, and protein engineering.
Affiliation(s)
- Andrea Basciu
  - Physics Department, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato (CA), Italy
- Mohd Athar
  - Physics Department, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato (CA), Italy
- Han Kurt
  - Physics Department, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato (CA), Italy
- Christine Neville
  - Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Giuliano Malloci
  - Physics Department, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato (CA), Italy
- Fabrizio C. Muredda
  - Physics Department, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato (CA), Italy
- Andrea Bosin
  - Physics Department, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato (CA), Italy
- Paolo Ruggerone
  - Physics Department, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato (CA), Italy
- Alexandre M. J. J. Bonvin
  - Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Attilio V. Vargiu
  - Physics Department, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato (CA), Italy
3
Lensink MF, Brysbaert G, Raouraoua N, Bates PA, Giulini M, Honorato RV, van Noort C, Teixeira JMC, Bonvin AMJJ, Kong R, Shi H, Lu X, Chang S, Liu J, Guo Z, Chen X, Morehead A, Roy RS, Wu T, Giri N, Quadir F, Chen C, Cheng J, Del Carpio CA, Ichiishi E, Rodriguez‐Lumbreras LA, Fernandez‐Recio J, Harmalkar A, Chu L, Canner S, Smanta R, Gray JJ, Li H, Lin P, He J, Tao H, Huang S, Roel‐Touris J, Jimenez‐Garcia B, Christoffer CW, Jain AJ, Kagaya Y, Kannan H, Nakamura T, Terashi G, Verburgt JC, Zhang Y, Zhang Z, Fujuta H, Sekijima M, Kihara D, Khan O, Kotelnikov S, Ghani U, Padhorny D, Beglov D, Vajda S, Kozakov D, Negi SS, Ricciardelli T, Barradas‐Bautista D, Cao Z, Chawla M, Cavallo L, Oliva R, Yin R, Cheung M, Guest JD, Lee J, Pierce BG, Shor B, Cohen T, Halfon M, Schneidman‐Duhovny D, Zhu S, Yin R, Sun Y, Shen Y, Maszota‐Zieleniak M, Bojarski KK, Lubecka EA, Marcisz M, Danielsson A, Dziadek L, Gaardlos M, Gieldon A, Liwo A, Samsonov SA, Slusarz R, Zieba K, Sieradzan AK, Czaplewski C, Kobayashi S, Miyakawa Y, Kiyota Y, Takeda‐Shitaka M, Olechnovic K, Valancauskas L, Dapkunas J, Venclovas C, Wallner B, Yang L, Hou C, He X, Guo S, Jiang S, Ma X, Duan R, Qui L, Xu X, Zou X, Velankar S, Wodak SJ. Impact of AlphaFold on structure prediction of protein complexes: The CASP15-CAPRI experiment. Proteins 2023; 91:1658-1683. [PMID: 37905971 PMCID: PMC10841881 DOI: 10.1002/prot.26609] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/07/2023] [Revised: 09/22/2023] [Accepted: 09/28/2023] [Indexed: 11/02/2023]
Abstract
We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homo-trimers, 13 heterodimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2-Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integrating advanced modeling tools enabled top performing groups to exceed the performance of a standard AlphaFold2-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.
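For readers unfamiliar with how DockQ consolidates the CAPRI quality measures, the sketch below follows the commonly cited DockQ definition: the mean of the fraction of native contacts (Fnat) and two scaled terms for interface RMSD and ligand RMSD, with scaling constants 1.5 Å and 8.5 Å. The function name `dockq` and the example inputs are illustrative assumptions; the abstract does not give the exact settings used in this assessment.

```python
def dockq(fnat, irmsd, lrmsd, d_irmsd=1.5, d_lrmsd=8.5):
    """DockQ-style score in [0, 1] from three CAPRI quality measures.

    fnat  : fraction of native interface contacts reproduced (0..1)
    irmsd : interface RMSD in angstroms
    lrmsd : ligand RMSD in angstroms
    The scaling constants default to the commonly cited values (assumed here).
    """
    if not 0.0 <= fnat <= 1.0:
        raise ValueError("fnat must lie in [0, 1]")
    if irmsd < 0 or lrmsd < 0:
        raise ValueError("RMSD values must be non-negative")
    scaled_irmsd = 1.0 / (1.0 + (irmsd / d_irmsd) ** 2)  # 1 at 0 A, 0.5 at d_irmsd
    scaled_lrmsd = 1.0 / (1.0 + (lrmsd / d_lrmsd) ** 2)  # 1 at 0 A, 0.5 at d_lrmsd
    return (fnat + scaled_irmsd + scaled_lrmsd) / 3.0


# Hypothetical model: half of the native contacts, moderate RMSD values.
example_score = dockq(fnat=0.5, irmsd=1.5, lrmsd=8.5)
```

Because each term is bounded between 0 and 1, a perfect model scores 1 and the score decays smoothly as either RMSD grows, which is what lets a single number stand in for the separate CAPRI criteria.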
Affiliation(s)
- Marc F. Lensink
  - Univ. Lille, CNRS, UMR8576 – UGSF – Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Guillaume Brysbaert
  - Univ. Lille, CNRS, UMR8576 – UGSF – Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Nessim Raouraoua
  - Univ. Lille, CNRS, UMR8576 – UGSF – Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Paul A. Bates
  - Biomolecular Modeling Laboratory, The Francis Crick Institute, London, UK
- Marco Giulini
  - Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
- Rodrigo V. Honorato
  - Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
- Charlotte van Noort
  - Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
- Joao M. C. Teixeira
  - Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
- Alexandre M. J. J. Bonvin
  - Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
- Ren Kong
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Hang Shi
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Xufeng Lu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Shan Chang
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Jian Liu
  - Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Zhiye Guo
  - Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Xiao Chen
  - Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Alex Morehead
  - Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Raj S. Roy
  - Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Tianqi Wu
  - Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Nabin Giri
  - Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Farhan Quadir
  - Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Chen Chen
  - Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jianlin Cheng
  - Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Eichiro Ichiishi
  - International University of Health and Welfare (IUHV Hospital), Nasushiobara‐City, Japan
- Luis A. Rodriguez‐Lumbreras
  - Instituto de Ciencias de la Vida y del Vino (ICVV), CSIC ‐ Universidad de La Rioja ‐ Gobierno de La Rioja, Logrono, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Juan Fernandez‐Recio
  - Instituto de Ciencias de la Vida y del Vino (ICVV), CSIC ‐ Universidad de La Rioja ‐ Gobierno de La Rioja, Logrono, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Ameya Harmalkar
  - Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Lee‐Shin Chu
  - Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Sam Canner
  - Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Rituparna Smanta
  - Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Jeffrey J. Gray
  - Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
  - Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
- Hao Li
  - School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Peicong Lin
  - School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Jiahua He
  - School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Huanyu Tao
  - School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Sheng‐You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Jorge Roel‐Touris
  - Protein Design and Modeling Lab, Dept. of Structural Biology, Molecular Biology Institute of Barcelona (IBMB‐CSIC), Barcelona, Spain
- Anika J. Jain
  - Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Yuki Kagaya
  - Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Harini Kannan
  - Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
  - Dept. of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- Tsukasa Nakamura
  - Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Genki Terashi
  - Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Jacob C. Verburgt
  - Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Yuanyuan Zhang
  - Dept. of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Zicong Zhang
  - Dept. of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Hayato Fujuta
  - Dept. of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- Daisuke Kihara
  - Dept. of Computer Science, Purdue University, West Lafayette, Indiana, USA
  - Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Surendra S. Negi
  - Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA
- Zhen Cao
  - King Abdullah University of Science and Technology (KAUST), Saudi Arabia
- Mohit Chawla
  - King Abdullah University of Science and Technology (KAUST), Saudi Arabia
- Luigi Cavallo
  - King Abdullah University of Science and Technology (KAUST), Saudi Arabia
  - Department of Chemistry and Biology, University of Salerno, Fisciano, Italy
- Rui Yin
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Melyssa Cheung
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Dept. of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA
- Johnathan D. Guest
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Jessica Lee
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Brian G. Pierce
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Ben Shor
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Tomer Cohen
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Matan Halfon
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Shaowen Zhu
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
- Rujie Yin
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
- Yuanfei Sun
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
  - Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, USA
  - Institute of Biosciences and Technology and Department of Translational Medical Sciences, Texas A&M University, Houston, Texas, USA
- Yuta Miyakawa
  - School of Pharmacy, Kitasato University, Minato‐ku, Tokyo, Japan
- Yasuomi Kiyota
  - School of Pharmacy, Kitasato University, Minato‐ku, Tokyo, Japan
- Kliment Olechnovic
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Lukas Valancauskas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Justas Dapkunas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Ceslovas Venclovas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Bjorn Wallner
  - Bioinformatics Division, Department of Physics, Chemistry, and Biology, Linkoping University, Linköping, Sweden
- Lin Yang
  - National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
  - School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, New South Wales, Australia
- Chengyu Hou
  - School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin, China
- Xiaodong He
  - National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
  - Shenzhen STRONG Advanced Materials Research Institute Col, Ltd, Shenzhen, People's Republic of China
- Shuai Guo
  - National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
- Shenda Jiang
  - National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
- Xiaoliang Ma
  - National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
- Rui Duan
  - Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Liming Qui
  - Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Xianjin Xu
  - Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Xiaoqin Zou
  - Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
  - Dept. of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
  - Dept. of Biochemistry, University of Missouri, Columbia, Missouri, USA
  - Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Hinxton, Cambridge, UK
4
López-Correa JM, König C, Vellido A. GPCR molecular dynamics forecasting using recurrent neural networks. Sci Rep 2023; 13:20995. [PMID: 38017062 PMCID: PMC10684758 DOI: 10.1038/s41598-023-48346-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 11/25/2023] [Indexed: 11/30/2023] Open
Abstract
G protein-coupled receptors (GPCRs) are a large superfamily of cell membrane proteins that play an important physiological role as transmitters of extracellular signals. Signal transmission through the cell membrane depends on conformational changes in the transmembrane region of the receptor, which makes the investigation of the dynamics in these regions particularly relevant. Molecular dynamics (MD) simulations provide a wealth of data about the structure, dynamics, and physiological function of biological macromolecules by modelling the interactions between their atomic constituents. In this study, a Recurrent and Convolutional Neural Network (RNN) model, namely Long Short-Term Memory (LSTM), is used to predict the dynamics of two GPCR states and three specific simulations of each one, through their activation path and focussing on specific receptor regions. Active and inactive states of the GPCRs are analysed in six scenarios involving APO, Full Agonist (BI 167107) and Partial Inverse Agonist (carazolol) of the receptor. Four Machine Learning models with increasing complexity in terms of neural network architecture are evaluated, and their results discussed. The best method achieves an overall RMSD lower than 0.139 Å and the transmembrane helices are the regions showing the minimum prediction errors and minimum relative movements of the protein.
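The RMSD figure quoted in this abstract measures how far predicted atomic positions lie from the reference MD coordinates. The sketch below computes a plain coordinate RMSD for one frame. It assumes the two coordinate sets are already in the same reference frame (the abstract does not say whether structures were superposed first), and the function name `rmsd` and the toy coordinates are illustrative only.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation between two lists of (x, y, z) coordinates.

    No superposition is performed; both sets are assumed to share a frame.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for atom_pred, atom_ref in zip(predicted, reference):
        # Squared Euclidean distance between matching atoms
        squared_sum += sum((p - r) ** 2 for p, r in zip(atom_pred, atom_ref))
    return math.sqrt(squared_sum / len(predicted))


# Hypothetical two-atom frame: each predicted atom is shifted 0.1 A along x.
reference_frame = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
predicted_frame = [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0)]
frame_error = rmsd(predicted_frame, reference_frame)
```

Averaging this per-frame value over a forecast horizon, or over atoms in a chosen receptor region, would give region-level errors of the kind the abstract compares across transmembrane helices.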
Affiliation(s)
- Caroline König
  - Universitat Politècnica de Catalunya, Barcelona, Spain
  - IDEAI-UPC - Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
- Alfredo Vellido
  - Universitat Politècnica de Catalunya, Barcelona, Spain
  - IDEAI-UPC - Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
5
Zhu J, Li Z, Tong H, Lu Z, Zhang N, Wei T, Chen HF. Phanto-IDP: compact model for precise intrinsically disordered protein backbone generation and enhanced sampling. Brief Bioinform 2023; 25:bbad429. [PMID: 38018910 PMCID: PMC10783862 DOI: 10.1093/bib/bbad429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/21/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023] Open
Abstract
The biological function of proteins is determined not only by their static structures but also by the dynamic properties of their conformational ensembles. Numerous high-accuracy static structure prediction tools have been recently developed based on deep learning; however, there remains a lack of efficient and accurate methods for exploring protein dynamic conformations. Traditionally, studies concerning protein dynamics have relied on molecular dynamics (MD) simulations, which incur significant computational costs for all-atom precision and struggle to adequately sample conformational spaces with high energy barriers. To overcome these limitations, various enhanced sampling techniques have been developed to accelerate sampling in MD. Traditional enhanced sampling approaches like replica exchange molecular dynamics (REMD) and frontier expansion sampling (FEXS) often follow the MD simulation approach and still consume substantial computational resources and time. Variational autoencoders (VAEs), as a classic deep generative model, are not restricted by potential energy landscapes and can explore conformational spaces more efficiently than traditional methods. However, VAEs often face challenges in generating reasonable conformations for complex proteins, especially intrinsically disordered proteins (IDPs), which limits their application as an enhanced sampling method. In this study, we present a novel deep learning model (named Phanto-IDP) that utilizes a graph-based encoder to extract protein features and a transformer-based decoder combined with variational sampling to generate highly accurate protein backbones. Ten IDPs and four structured proteins were used to evaluate the sampling ability of Phanto-IDP. The results demonstrate that Phanto-IDP has high fidelity and diversity in the generated conformation ensembles, making it a suitable tool for enhancing the efficiency of MD simulation, generating broader protein conformational space and a continuous protein transition path.
Affiliation(s)
- Junjie Zhu
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhengxin Li
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Haowei Tong
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhouyu Lu
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Ningjie Zhang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Ting Wei
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Hai-Feng Chen
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
6
Chen SH, Weiss KL, Stanley C, Bhowmik D. Structural characterization of an intrinsically disordered protein complex using integrated small-angle neutron scattering and computing. Protein Sci 2023; 32:e4772. [PMID: 37646172 PMCID: PMC10503416 DOI: 10.1002/pro.4772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 08/22/2023] [Accepted: 08/27/2023] [Indexed: 09/01/2023]
Abstract
Characterizing structural ensembles of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) of proteins is essential for studying structure-function relationships. Due to the different neutron scattering lengths of hydrogen and deuterium, selective labeling and contrast matching in small-angle neutron scattering (SANS) becomes an effective tool to study dynamic structures of disordered systems. However, experimental timescales typically capture measurements averaged over multiple conformations, leaving complex SANS data for disentanglement. We hereby demonstrate an integrated method to elucidate the structural ensemble of a complex formed by two IDRs. We use data from both full contrast and contrast matching with residue-specific deuterium labeling SANS experiments, microsecond all-atom molecular dynamics (MD) simulations with four molecular mechanics force fields, and an autoencoder-based deep learning (DL) algorithm. From our combined approach, we show that selective deuteration provides additional information that helps characterize structural ensembles. We find that among the four force fields, a99SB-disp and CHARMM36m show the strongest agreement with SANS and NMR experiments. In addition, our DL algorithm not only complements conventional structural analysis methods but also successfully differentiates NMR and MD structures which are indistinguishable on the free energy surface. Lastly, we present an ensemble that describes experimental SANS and NMR data better than MD ensembles generated by one single force field and reveal three clusters of distinct conformations. Our results demonstrate a new integrated approach for characterizing structural ensembles of IDPs.
Affiliation(s)
- Serena H. Chen
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Kevin L. Weiss
  - Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Christopher Stanley
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Debsindhu Bhowmik
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
7
Zhang L, Wang S, Hou J, Si D, Zhu J, Cao R. ComplexQA: a deep graph learning approach for protein complex structure assessment. Brief Bioinform 2023; 24:bbad287. [PMID: 37930021 DOI: 10.1093/bib/bbad287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 05/09/2023] [Accepted: 07/24/2023] [Indexed: 11/07/2023] Open
Abstract
MOTIVATION In recent years, the end-to-end deep learning method for single-chain protein structure prediction has achieved high accuracy. For example, the state-of-the-art method AlphaFold, developed by Google, has largely increased the accuracy of protein structure predictions to near experimental accuracy in some of the cases. At the same time, there are few methods that can evaluate the quality of protein complexes at the residue level. In particular, evaluating the quality of residues at the interface of protein complexes can lead to a wide range of applications, such as protein function analysis and drug design. In this paper, we introduce a new deep graph neural network-based method, ComplexQA, to evaluate the local quality of interfaces for protein complexes by utilizing the residue-level structural information in 3D space and the sequence-level constraints. RESULTS We benchmark our method against other state-of-the-art quality assessment approaches on the HAF2 and DBM55-AF2 datasets (high-quality structural models predicted by AlphaFold-Multimer), and the BM5 docking dataset. The experimental results show that our proposed method achieves better or similar performance compared with other state-of-the-art methods, especially on difficult targets which only contain a few acceptable models. Our method is able to suggest a score for each interface residue, providing a powerful assessment tool for the ever-increasing number of protein complexes. AVAILABILITY https://github.com/Cao-Labs/ComplexQA.git. Contact: caora@plu.edu.
Affiliation(s)
- Lei Zhang
- Department of Computer Science and Technology, AnHui University, Hefei, 230601, Anhui, China
| | - Sheng Wang
- Department of Computer Science and Technology, AnHui University, Hefei, 230601, Anhui, China
| | - Jie Hou
- Department of Computer Science, Saint Louis University, Saint Louis, 63103, MO, USA
| | - Dong Si
- Division of Computing and Software Systems, University of Washington Bothell, Bothell, 98011, WA, USA
| | - Junyong Zhu
- Department of Computer Science and Technology, AnHui University, Hefei, 230601, Anhui, China
| | - Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma, 98447, WA, USA
|
8
|
Chen Z, Liu N, Huang Y, Min X, Zeng X, Ge S, Zhang J, Xia N. PointDE: Protein Docking Evaluation Using 3D Point Cloud Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3128-3138. [PMID: 37220029 DOI: 10.1109/tcbb.2023.3279019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Protein-protein interactions (PPIs) play essential roles in many vital movements and the determination of protein complex structure is helpful to discover the mechanism of PPI. Protein-protein docking is being developed to model the structure of the protein. However, there is still a challenge to selecting the near-native decoys generated by protein-protein docking. Here, we propose a docking evaluation method using 3D point cloud neural network named PointDE. PointDE transforms protein structure to the point cloud. Using the state-of-the-art point cloud network architecture and a novel grouping mechanism, PointDE can capture the geometries of the point cloud and learn the interaction information from the protein interface. On public datasets, PointDE surpasses the state-of-the-art method using deep learning. To further explore the ability of our method in different types of protein structures, we developed a new dataset generated by high-quality antibody-antigen complexes. The result in this antibody-antigen dataset shows the strong performance of PointDE, which will be helpful for the understanding of PPI mechanisms.
|
9
|
Xiao S, Song Z, Tian H, Tao P. Assessments of Variational Autoencoder in Protein Conformation Exploration. Journal of Computational Biophysics and Chemistry 2023; 22:489-501. [PMID: 38826699 PMCID: PMC11138204 DOI: 10.1142/s2737416523500217] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/04/2024]
Abstract
Molecular dynamics (MD) simulations have been extensively used to study protein dynamics and subsequently functions. However, MD simulations are often insufficient to explore adequate conformational space for protein functions within reachable timescales. Accordingly, many enhanced sampling methods, including variational autoencoder (VAE) based methods, have been developed to address this issue. The purpose of this study is to evaluate the feasibility of using VAE to assist in the exploration of protein conformational landscapes. Using three modeling systems, we showed that VAE could capture high-level hidden information which distinguishes protein conformations. These models could also be used to generate new physically plausible protein conformations for direct sampling in favorable conformational spaces. We also found that VAE worked better in interpolation than extrapolation and increasing latent space dimension could lead to a trade-off between performances and complexities.
Affiliation(s)
- Sian Xiao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
| | - Zilin Song
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
| | - Hao Tian
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
| | - Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
|
10
|
Kubečka J, Knattrup Y, Engsvang M, Jensen AB, Ayoubi D, Wu H, Christiansen O, Elm J. Current and future machine learning approaches for modeling atmospheric cluster formation. Nature Computational Science 2023; 3:495-503. [PMID: 38177415 DOI: 10.1038/s43588-023-00435-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/21/2022] [Accepted: 03/16/2023] [Indexed: 01/06/2024]
Abstract
The formation of strongly bound atmospheric molecular clusters is the first step towards forming new aerosol particles. Recent advances in the application of machine learning models open an enormous opportunity for complementing expensive quantum chemical calculations with efficient machine learning predictions. In this Perspective, we present how data-driven approaches can be applied to accelerate cluster configurational sampling, thereby greatly increasing the number of chemically relevant systems that can be covered.
Affiliation(s)
- Jakub Kubečka
- Department of Chemistry, Aarhus University, Aarhus, Denmark
| | - Yosef Knattrup
- Department of Chemistry, Aarhus University, Aarhus, Denmark
| | | | | | - Daniel Ayoubi
- Department of Chemistry, Aarhus University, Aarhus, Denmark
| | - Haide Wu
- Department of Chemistry, Aarhus University, Aarhus, Denmark
| | | | - Jonas Elm
- Department of Chemistry, Aarhus University, Aarhus, Denmark.
- iCLIMATE Aarhus University Interdisciplinary Centre for Climate Change, Aarhus, Denmark.
|
11
|
Zheng LE, Barethiya S, Nordquist E, Chen J. Machine Learning Generation of Dynamic Protein Conformational Ensembles. Molecules 2023; 28:4047. [PMID: 37241789 PMCID: PMC10220786 DOI: 10.3390/molecules28104047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Revised: 05/04/2023] [Accepted: 05/09/2023] [Indexed: 05/28/2023]
Abstract
Machine learning has achieved remarkable success across a broad range of scientific and engineering disciplines, particularly its use for predicting native protein structures from sequence information alone. However, biomolecules are inherently dynamic, and there is a pressing need for accurate predictions of dynamic structural ensembles across multiple functional levels. These problems range from the relatively well-defined task of predicting conformational dynamics around the native state of a protein, which traditional molecular dynamics (MD) simulations are particularly adept at handling, to generating large-scale conformational transitions connecting distinct functional states of structured proteins or numerous marginally stable states within the dynamic ensembles of intrinsically disordered proteins. Machine learning has been increasingly applied to learn low-dimensional representations of protein conformational spaces, which can then be used to drive additional MD sampling or directly generate novel conformations. These methods promise to greatly reduce the computational cost of generating dynamic protein ensembles, compared to traditional MD simulations. In this review, we examine recent progress in machine learning approaches towards generative modeling of dynamic protein ensembles and emphasize the crucial importance of integrating advances in machine learning, structural data, and physical principles to achieve these ambitious goals.
Affiliation(s)
- Li-E Zheng
- Department of Gynecology, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China;
| | - Shrishti Barethiya
- Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA; (S.B.); (E.N.)
| | - Erik Nordquist
- Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA; (S.B.); (E.N.)
| | - Jianhan Chen
- Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA; (S.B.); (E.N.)
|
12
|
Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical Assessment of Methods for Predicting the 3D Structure of Proteins and Protein Complexes. Annu Rev Biophys 2023; 52:183-206. [PMID: 36626764 PMCID: PMC10885158 DOI: 10.1146/annurev-biophys-102622-084607] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 01/11/2023]
Abstract
Advances in a scientific discipline are often measured by small, incremental steps. In this review, we report on two intertwined disciplines in the protein structure prediction field, modeling of single chains and modeling of complexes, that have over decades emulated this pattern, as monitored by the community-wide blind prediction experiments CASP and CAPRI. However, over the past few years, dramatic advances were observed for the accurate prediction of single protein chains, driven by a surge of deep learning methodologies entering the prediction field. We review the main scientific developments that enabled these recent breakthroughs and feature the important role of blind prediction experiments in building up and nurturing the structure prediction field. We discuss how the new wave of artificial intelligence-based methods is impacting the fields of computational and experimental structural biology and highlight areas in which deep learning methods are likely to lead to future developments, provided that major challenges are overcome.
Affiliation(s)
- Shoshana J Wodak
- VIB-VUB Center for Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium;
| | - Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA;
- Department of Chemistry, Boston University, Boston, Massachusetts, USA
| | - Marc F Lensink
- Univ. Lille, CNRS, UMR 8576-UGSF-Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France;
| | - Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA;
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom;
|
13
|
Zhang O, Haghighatlari M, Li J, Liu ZH, Namini A, Teixeira JMC, Forman-Kay JD, Head-Gordon T. Learning to evolve structural ensembles of unfolded and disordered proteins using experimental solution data. J Chem Phys 2023; 158:174113. [PMID: 37144719 PMCID: PMC10163956 DOI: 10.1063/5.0141474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/11/2023] [Indexed: 05/06/2023]
Abstract
The structural characterization of proteins with a disorder requires a computational approach backed by experiments to model their diverse and dynamic structural ensembles. The selection of conformational ensembles consistent with solution experiments of disordered proteins highly depends on the initial pool of conformers, with currently available tools limited by conformational sampling. We have developed a Generative Recurrent Neural Network (GRNN) that uses supervised learning to bias the probability distributions of torsions to take advantage of experimental data types such as nuclear magnetic resonance J-couplings, nuclear Overhauser effects, and paramagnetic resonance enhancements. We show that updating the generative model parameters according to the reward feedback on the basis of the agreement between experimental data and probabilistic selection of torsions from learned distributions provides an alternative to existing approaches that simply reweight conformers of a static structural pool for disordered proteins. Instead, the biased GRNN, DynamICE, learns to physically change the conformations of the underlying pool of the disordered protein to those that better agree with experiments.
Affiliation(s)
- Oufan Zhang
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
| | - Mojtaba Haghighatlari
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
| | - Jie Li
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
| | | | - Ashley Namini
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5S 1A8, Canada
|
14
|
Sun F, Kadupitiya J, Jadhao V. Probing Accuracy-Speedup Tradeoff in Machine Learning Surrogates for Molecular Dynamics Simulations. J Chem Theory Comput 2023. [PMID: 37094180 DOI: 10.1021/acs.jctc.2c01282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2023]
Abstract
The performance promise of machine learning surrogates of molecular dynamics simulations of soft materials is significant but generally comes at the cost of acquiring large training datasets to learn the complex relationships between input soft material attributes and output properties. Under the constraint of limited high-performance computing resources, optimizing the size of the training datasets becomes paramount. Using an artificial neural network based surrogate for molecular dynamics simulations of confined electrolytes, we explore the tradeoff between surrogate accuracy and computational gains. Accuracy is assessed by computing the root-mean-square errors between the surrogate predictions and the ground truth results obtained via molecular dynamics simulations. The computational performance is judged by evaluating the speedup which incorporates the training dataset creation time. Improvement in accuracy occurs with a loss of speedup, which scales as the inverse of the training dataset size. The link between surrogate generalizability and the accuracy-speedup tradeoff is assessed by examining the errors incurred in surrogate predictions on unseen, interpolated input variables and developing a net speedup metric to capture the associated gains.
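The two quantities this abstract relies on, the root-mean-square error and a speedup that accounts for training-set creation, can be illustrated with a minimal sketch. The speedup expression below is an assumption made for illustration only, since the abstract gives no explicit formula; the names `rmse` and `net_speedup` and all parameters are hypothetical. The sketch charges the surrogate for the MD runs needed to build its training set, so that with negligible training and inference time the speedup reduces to n_predict / n_train, in line with the inverse scaling the abstract reports.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between surrogate predictions and MD reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def net_speedup(n_train, n_predict, t_md, t_train=0.0, t_infer=0.0):
    """Hypothetical speedup: cost of running MD for every query divided by
    the surrogate's total cost, including the MD runs used as training data."""
    md_only_cost = n_predict * t_md
    surrogate_cost = n_train * t_md + t_train + n_predict * t_infer
    return md_only_cost / surrogate_cost


# Doubling the training set halves the speedup when other costs are negligible.
print(net_speedup(n_train=100, n_predict=10_000, t_md=1.0))
print(net_speedup(n_train=200, n_predict=10_000, t_md=1.0))
```

Under these assumptions, a larger training set would typically lower the RMSE while reducing the speedup, which is the tradeoff the abstract describes.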
Affiliation(s)
- Fanbo Sun
- Intelligent Systems Engineering, Indiana University, 700 N. Woodlawn Avenue, Bloomington, Indiana 47408, United States
| | - Jcs Kadupitiya
- Intelligent Systems Engineering, Indiana University, 700 N. Woodlawn Avenue, Bloomington, Indiana 47408, United States
| | - Vikram Jadhao
- Intelligent Systems Engineering, Indiana University, 700 N. Woodlawn Avenue, Bloomington, Indiana 47408, United States
|
15
|
Zhu JJ, Zhang NJ, Wei T, Chen HF. Enhancing Conformational Sampling for Intrinsically Disordered and Ordered Proteins by Variational Autoencoder. Int J Mol Sci 2023; 24:ijms24086896. [PMID: 37108059 PMCID: PMC10138423 DOI: 10.3390/ijms24086896] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/03/2023] [Revised: 03/26/2023] [Accepted: 03/27/2023] [Indexed: 04/29/2023]
Abstract
Intrinsically disordered proteins (IDPs), which have no fixed three-dimensional structure under physiological conditions, account for more than 50% of the human proteome and are closely associated with tumors, cardiovascular diseases, and neurodegeneration. Because of their conformational diversity, conventional experimental methods of structural biology, such as NMR, X-ray diffraction, and CryoEM, are unable to capture conformational ensembles. Molecular dynamics (MD) simulation can sample dynamic conformations at the atomic level and has become an effective method for studying the structure and function of IDPs. However, the high computational cost prevents MD simulations from being widely used for IDP conformational sampling. In recent years, significant progress in artificial intelligence has made it possible to solve the conformational reconstruction problem of IDPs with fewer computational resources. Here, based on short MD simulations of different IDP systems, we use variational autoencoders (VAEs) to achieve the generative reconstruction of IDP structures and include a wider range of sampled conformations from longer simulations. Compared with generative autoencoders (AEs), VAEs add an inference layer between the encoder and decoder in the latent space, which can cover the conformational landscape of IDPs more comprehensively and achieve the effect of enhanced sampling. Through experimental verification, the Cα RMSD between VAE-generated and MD-sampled conformations in the 5 IDP test systems was significantly lower than that of the AE, and the Spearman correlation coefficient on the structure was higher. VAEs can also achieve excellent performance on structured proteins. In summary, VAEs can be used to effectively sample protein structures.
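The "inference layer" that separates a VAE from a plain autoencoder can be sketched in a few lines. This is a generic illustration of the standard reparameterized Gaussian latent and its KL penalty, not the architecture used in the cited study, which the abstract does not specify; the function names and the two-dimensional latent are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence between a diagonal Gaussian N(mu, sigma^2) and N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -2.0]   # hypothetical encoder outputs
z = sample_latent(mu, log_var, rng)      # latent point passed to the decoder
penalty = kl_to_standard_normal(mu, log_var)
```

A plain autoencoder would pass a deterministic code straight to the decoder; sampling around `mu` and penalizing the KL term is what encourages a smoother latent space from which new conformations can be drawn.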
Affiliation(s)
- Jun-Jie Zhu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Ning-Jie Zhang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Ting Wei
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Hai-Feng Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Center for Bioinformation Technology, Shanghai 200240, China
|
16
|
Ramírez-Palacios C, Marrink SJ. Super High-Throughput Screening of Enzyme Variants by Spectral Graph Convolutional Neural Networks. J Chem Theory Comput 2023. [PMID: 36961994 PMCID: PMC10373491 DOI: 10.1021/acs.jctc.2c01227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2023]
Abstract
Finding new enzyme variants with the desired substrate scope requires screening through a large number of potential variants. In a typical in silico enzyme engineering workflow, it is possible to scan a few thousand variants and gather several candidates for further screening or experimental verification. In this work, we show that a Graph Convolutional Neural Network (GCN) can be trained to predict the binding energy of combinatorial libraries of enzyme complexes using only sequence information. The GCN model uses a stack of message-passing and graph pooling layers to extract information from the protein input graph and yield a prediction. The GCN model is agnostic to the identity of the ligand, which is kept constant within the mutant libraries. Using a minuscule subset of the total combinatorial space (20^4 to 20^8 mutants) as training data, the proposed GCN model achieves a high accuracy in predicting the binding energy of unseen variants. The network's accuracy was further improved by injecting feature embeddings obtained from a language module pretrained on 10 million protein sequences. Since no structural information is needed to evaluate new variants, the deep learning algorithm is capable of scoring an enzyme variant in under 1 ms, allowing the search of billions of candidates on a single GPU.
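The "message-passing and graph pooling" idea mentioned in this abstract can be sketched without any learned weights. The example below uses simple mean aggregation over each node and its neighbors, which is a simplification and not the spectral convolution of the cited work; the three-node graph and the function names are hypothetical.

```python
def message_pass(features, neighbors):
    """One round of mean-aggregation message passing (no learned weights).
    Each node's new feature vector is the mean over itself and its neighbors."""
    updated = []
    for i, own in enumerate(features):
        group = [own] + [features[j] for j in neighbors[i]]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


def mean_pool(features):
    """Graph pooling: average node features into one graph-level vector."""
    return [sum(col) / len(features) for col in zip(*features)]


# Hypothetical 3-residue chain: node 1 is bonded to nodes 0 and 2.
node_features = [[1.0], [3.0], [5.0]]
adjacency = [[1], [0, 2], [1]]

hidden = message_pass(node_features, adjacency)
graph_vector = mean_pool(hidden)
```

In a trained network each round would also apply a learned linear map and a nonlinearity, and the pooled vector would feed a regression head that outputs the predicted binding energy.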
Affiliation(s)
- Carlos Ramírez-Palacios
- Molecular Dynamics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
| | - Siewert J Marrink
- Molecular Dynamics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
|
17
|
Dutagaci B, Duan B, Qiu C, Kaplan CD, Feig M. Characterization of RNA polymerase II trigger loop mutations using molecular dynamics simulations and machine learning. PLoS Comput Biol 2023; 19:e1010999. [PMID: 36947548 PMCID: PMC10069792 DOI: 10.1371/journal.pcbi.1010999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 04/03/2023] [Accepted: 03/06/2023] [Indexed: 03/23/2023]
Abstract
Catalysis and fidelity of multisubunit RNA polymerases rely on a highly conserved active site domain called the trigger loop (TL), which achieves roles in transcription through conformational changes and interaction with NTP substrates. The mutations of TL residues cause distinct effects on catalysis including hypo- and hyperactivity and altered fidelity. We applied molecular dynamics simulation (MD) and machine learning (ML) techniques to characterize TL mutations in the Saccharomyces cerevisiae RNA Polymerase II (Pol II) system. We did so to determine relationships between individual mutations and phenotypes and to associate phenotypes with MD simulated structural alterations. Using fitness values of mutants under various stress conditions, we modeled phenotypes along a spectrum of continual values. We found that ML could predict the phenotypes with an R² correlation of 0.68 from amino acid sequences alone. It was more difficult to incorporate MD data to improve predictions from machine learning, presumably because MD data is too noisy and possibly incomplete to directly infer functional phenotypes. However, a variational auto-encoder model based on the MD data allowed the clustering of mutants with different phenotypes based on structural details. Overall, we found that a subset of loss-of-function (LOF) and lethal mutations tended to increase distances of TL residues to the NTP substrate, while another subset of LOF and lethal substitutions tended to confer an increase in distances between TL and bridge helix (BH). In contrast, some of the gain-of-function (GOF) mutants appear to cause disruption of hydrophobic contacts among TL and nearby helices.
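For readers unfamiliar with the R² figure quoted in this abstract, a minimal sketch of the coefficient of determination follows. It assumes the reported value is the usual 1 - SS_res / SS_tot; the abstract does not say whether a squared Pearson correlation was used instead. The function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot
```

A value of 1.0 means the predicted phenotype scores match the observed ones exactly, while 0.0 means the model does no better than always predicting the mean observed score.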
Affiliation(s)
- Bercem Dutagaci
- Department of Molecular and Cell Biology, University of California Merced, Merced, California, United States of America
| | - Bingbing Duan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Chenxi Qiu
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Craig D. Kaplan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
|
18
|
Banerjee A, Saha S, Tvedt NC, Yang LW, Bahar I. Mutually beneficial confluence of structure-based modeling of protein dynamics and machine learning methods. Curr Opin Struct Biol 2023; 78:102517. [PMID: 36587424 PMCID: PMC10038760 DOI: 10.1016/j.sbi.2022.102517] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/18/2022] [Revised: 11/19/2022] [Accepted: 11/22/2022] [Indexed: 12/31/2022]
Abstract
Proteins sample an ensemble of conformers under physiological conditions, having access to a spectrum of modes of motions, also called intrinsic dynamics. These motions ensure the adaptation to various interactions in the cell, and largely assist in, if not determine, viable mechanisms of biological function. In recent years, machine learning frameworks have proven uniquely useful in structural biology, and recent studies further provide evidence to the utility and/or necessity of considering intrinsic dynamics for increasing their predictive ability. Efficient quantification of dynamics-based attributes by recently developed physics-based theories and models such as elastic network models provides a unique opportunity to generate data on dynamics for training ML models towards inferring mechanisms of protein function, assessing pathogenicity, or estimating binding affinities.
Affiliation(s)
- Anupam Banerjee
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA
| | - Satyaki Saha
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA
| | - Nathan C Tvedt
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA; Computational and Applied Mathematics and Statistics, The College of William and Mary, Williamsburg, VA 23185, USA
| | - Lee-Wei Yang
- Institute of Bioinformatics and Structural Biology, and PhD Program in Biomedical Artificial Intelligence, National Tsing Hua University, Hsinchu 300044, Taiwan; Physics Division, National Center for Theoretical Sciences, Taipei 106319, Taiwan
| | - Ivet Bahar
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA.
|
19
|
Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:biom12091246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022]
Abstract
Machine learning (ML) has been an important arsenal in computational biology used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein–ligand binding, including allosteric effects, protein–protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
|
20
|
Ton AT, Pandey M, Smith JR, Ban F, Fernandez M, Cherkasov A. Targeting SARS-CoV-2 Papain-Like Protease in the Post-Vaccine Era. Trends Pharmacol Sci 2022; 43:906-919. [PMID: 36114026 PMCID: PMC9399131 DOI: 10.1016/j.tips.2022.08.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 05/17/2022] [Revised: 08/10/2022] [Accepted: 08/19/2022] [Indexed: 11/29/2022]
Abstract
While vaccines remain at the forefront of global healthcare responses, pioneering therapeutics against SARS-CoV-2 are expected to fill the gaps for waning immunity. Rapid development and approval of orally available direct-acting antivirals targeting crucial SARS-CoV-2 proteins marked the beginning of the era of small-molecule drugs for COVID-19. In that regard, the papain-like protease (PLpro) can be considered a major SARS-CoV-2 therapeutic target due to its dual biological role in suppressing host innate immune responses and in ensuring viral replication. Here, we summarize the challenges of targeting PLpro and innovative early-stage PLpro-specific small molecules. We propose that state-of-the-art computer-aided drug design (CADD) methodologies will play a critical role in the discovery of PLpro compounds as a novel class of COVID-19 drugs.
Affiliation(s)
- Anh-Tien Ton
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
| | - Mohit Pandey
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
| | - Jason R Smith
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada; Department of Chemistry, Simon Fraser University, Burnaby, Canada
| | - Fuqiang Ban
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
| | - Michael Fernandez
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
| | - Artem Cherkasov
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada.
|
21
|
Fu X, Bates PA. Application of deep learning methods: From molecular modelling to patient classification. Exp Cell Res 2022; 418:113278. [PMID: 35810775 DOI: 10.1016/j.yexcr.2022.113278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 06/16/2022] [Accepted: 07/05/2022] [Indexed: 11/28/2022]
Abstract
We are now well into the information-driven age, with complex, heterogeneous datasets in the biological sciences continuing to grow at a rapid pace. Moreover, the distilling of such datasets to find new governing principles is underway. Leading the surge are new and exciting algorithmic developments in computer simulation and machine learning, most notably for the latter, those centred on deep learning. However, practical applications of cell-centric computations within the biological sciences, even when carefully benchmarked against existing experimental datasets, remain challenging. Here we discuss the application of deep learning methodologies to support our understanding of cell functionality and as an aid to patient classification. Whilst comprehensive end-to-end deep learning approaches that utilise knowledge of the cell and its molecular components to aid human disease classification, important for opening the door to more effective molecular and cell-based therapies, are yet to be implemented, we illustrate that many deep learning applications have been developed to tackle components of such an ambitious pipeline. We end our discussion on what the future may hold, especially how an integrated framework of computer simulations and deep learning, in conjunction with wet-bench experimentation, could help reveal the governing principles underlying cell functionalities within the tissue environments in which cells operate.
Affiliation(s)
- Xiao Fu
- Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Rd, London, NW1 1AT, UK.
| | - Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Rd, London, NW1 1AT, UK.
| |
|
22
|
Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022; 10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open
Abstract
Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit–explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring “the state of the art” in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI–PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI–PS? How do protein folding pathways evolve? What are the rules that dictate the folds? 
What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI–PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. 
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the “state of the art” on research in the AI–PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.
Affiliation(s)
- Jalil Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Luis Ochoa-Toledo
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Mario Javier Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Atocha Aliseda
- Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Fernando Pérez-Escamirosa
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | | | - Francine Ochoa-Fernández
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Ricardo Zamora-Solís
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Sebastián Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Cristina Revilla-Monsalve
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Nicolás Kemper-Valverde
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Myriam M. Altamirano-Bustamante
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- *Correspondence: Myriam M. Altamirano-Bustamante,
| |
|
23
|
Li Y, Guo Y, Cheng H, Zeng X, Zhang X, Sang P, Chen B, Yang L. Deciphering gp120 sequence variation and structural dynamics in HIV neutralization phenotype by molecular dynamics simulations and graph machine learning. Proteins 2022; 90:1413-1424. [DOI: 10.1002/prot.26322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/03/2021] [Revised: 01/21/2022] [Accepted: 02/10/2022] [Indexed: 02/04/2023]
Affiliation(s)
- Yi Li
- College of Mathematics and Computer Science Dali University Dali Yunnan China
| | - Yu‐Chen Guo
- College of Mathematics and Computer Science Dali University Dali Yunnan China
| | - Hong‐Han Cheng
- College of Mathematics and Computer Science Dali University Dali Yunnan China
| | - Xin Zeng
- College of Mathematics and Computer Science Dali University Dali Yunnan China
| | - Xiao‐Ling Zhang
- College of Mathematics and Computer Science Dali University Dali Yunnan China
| | - Peng Sang
- College of Agriculture and Biological Science Dali University Dali Yunnan China
| | - Ben‐Hui Chen
- College of Mathematics and Computer Science Dali University Dali Yunnan China
| | - Li‐Quan Yang
- College of Agriculture and Biological Science Dali University Dali Yunnan China
| |
|
24
|
Gupta A, Dey S, Hicks A, Zhou HX. Artificial intelligence guided conformational mining of intrinsically disordered proteins. Commun Biol 2022; 5:610. [PMID: 35725761 PMCID: PMC9209487 DOI: 10.1038/s42003-022-03562-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/14/2021] [Accepted: 06/07/2022] [Indexed: 12/29/2022] Open
Abstract
Artificial intelligence recently achieved the breakthrough of predicting the three-dimensional structures of proteins. The next frontier is presented by intrinsically disordered proteins (IDPs), which, representing 30% to 50% of proteomes, readily access vast conformational space. Molecular dynamics (MD) simulations are promising in sampling IDP conformations, but only at extremely high computational cost. Here, we developed generative autoencoders that learn from short MD simulations and generate full conformational ensembles. An encoder represents IDP conformations as vectors in a reduced-dimensional latent space. The mean vector and covariance matrix of the training dataset are calculated to define a multivariate Gaussian distribution, from which vectors are sampled and fed to a decoder to generate new conformations. The ensembles of generated conformations cover those sampled by long MD simulations and are validated by small-angle X-ray scattering profile and NMR chemical shifts. This work illustrates the vast potential of artificial intelligence in conformational mining of IDPs.
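The sampling step summarized in this abstract (fit a multivariate Gaussian to the latent vectors of the training set, then draw new vectors to feed a decoder) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy two-dimensional latent vectors, and the omission of the encoder and decoder networks are all hypothetical, and only the Python standard library is used.

```python
import math
import random


def fit_gaussian(latents):
    """Return the mean vector and sample covariance matrix of latent vectors."""
    n, d = len(latents), len(latents[0])
    mean = [sum(z[i] for z in latents) / n for i in range(d)]
    cov = [[sum((z[i] - mean[i]) * (z[j] - mean[j]) for z in latents) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(cov, jitter=1e-9):
    """Lower-triangular L with L * L^T equal to cov (plus a small diagonal jitter)."""
    d = len(cov)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] + jitter - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_latents(mean, cov, n_samples, rng):
    """Draw latent vectors z = mean + L * eps, with eps ~ N(0, I)."""
    L = cholesky(cov)
    d = len(mean)
    samples = []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append([mean[i] + sum(L[i][k] * eps[k] for k in range(i + 1))
                        for i in range(d)])
    return samples


# Hypothetical stand-in for encoder outputs: 500 two-dimensional latent vectors.
rng = random.Random(0)
training_latents = [[rng.gauss(1.0, 0.5), rng.gauss(-2.0, 1.0)] for _ in range(500)]

mean, cov = fit_gaussian(training_latents)
new_latents = sample_latents(mean, cov, 1000, rng)  # these would go to a decoder
```

In a real workflow the latent vectors would come from a trained encoder and each sampled vector would be decoded back into a conformation; both networks are outside the scope of this sketch.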
Affiliation(s)
- Aayush Gupta
- Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Souvik Dey
- Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Alan Hicks
- Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Huan-Xiang Zhou
- Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60607, USA; Department of Physics, University of Illinois at Chicago, Chicago, IL 60607, USA
| |
|
25
|
Taneishi K, Tsuchiya Y. Structure-based analyses of gut microbiome-related proteins by neural networks and molecular dynamics simulations. Curr Opin Struct Biol 2022; 73:102336. [DOI: 10.1016/j.sbi.2022.102336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 11/18/2021] [Accepted: 01/14/2022] [Indexed: 11/03/2022]
|
26
|
Li C, Liu J, Chen J, Yuan Y, Yu J, Gou Q, Guo Y, Pu X. An Interpretable Convolutional Neural Network Framework for Analyzing Molecular Dynamics Trajectories: a Case Study on Functional States for G-Protein-Coupled Receptors. J Chem Inf Model 2022; 62:1399-1410. [PMID: 35257580 DOI: 10.1021/acs.jcim.2c00085] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/22/2022]
Abstract
Molecular dynamics (MD) simulations have made great contributions to revealing structural and functional mechanisms for many biomolecular systems. However, identifying functional states and important residues from the vast conformational space generated by MD remains challenging; thus, intelligent navigation is highly desired. Despite the advantages deep learning has shown in analyzing MD trajectories, its black-box nature limits its application. To address this problem, we explore an interpretable convolutional neural network (CNN)-based deep learning framework, named the ICNNMD model, to automatically identify diverse active states from the MD trajectory for G-protein-coupled receptors (GPCRs). To avoid information loss in representing the conformational structure, a pixel representation is introduced, and a CNN module is then constructed to efficiently extract features, followed by a fully connected neural network to carry out the classification task. More importantly, we design a local interpretable model-agnostic explanation (LIME) interpreter for the classification result by local approximation with a linear model, through which important residues underlying distinct active states can be quickly identified. Our model achieves higher than 99% classification accuracy for three important GPCR systems with diverse active states. Notably, some important residues in regulating different biased activities are successfully identified, which helps to elucidate diverse activation mechanisms for GPCRs. Our model can also serve as a general tool to analyze MD trajectories for other biomolecular systems. All source codes are freely available at https://github.com/Jane-Liu97/ICNNMD for aiding MD studies.
Affiliation(s)
- Chuan Li
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Jiangting Liu
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Jianfang Chen
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Yuan Yuan
- College of Management, Southwest University for Nationalities, Chengdu 610041, China
| | - Jin Yu
- Department of Physics and Astronomy, University of California, Irvine, California 92697, United States
| | - Qiaolin Gou
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, China
| |
|
27
|
Ketkaew R, Creazzo F, Luber S. Machine Learning-Assisted Discovery of Hidden States in Expanded Free Energy Space. J Phys Chem Lett 2022; 13:1797-1805. [PMID: 35171614 DOI: 10.1021/acs.jpclett.1c04004] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/14/2023]
Abstract
Collective variables (CVs) are crucial parameters in enhanced sampling calculations and strongly impact the quality of the obtained free energy surface. However, many existing CVs are unique to and dependent on the system they are constructed with, making the developed CVs non-transferable to other systems. Herein, we develop a non-instructor-led deep autoencoder neural network (DAENN) for discovering general-purpose CVs. The DAENN is used to train a model by learning molecular representations from unbiased trajectories that contain only the reactant conformers. Only the prior knowledge of nonconstraint reactants, coupled with the topology variable and loss-like penalty function introduced here, is required for the biasing method to expand its configurational (phase) space to unexplored energy basins. Our developed autoencoder is efficient and relatively inexpensive to use in terms of a priori knowledge, enabling one to automatically search for hidden CVs of the reaction of interest.
Affiliation(s)
- Rangsiman Ketkaew
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| | - Fabrizio Creazzo
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| | - Sandra Luber
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| |
|
28
|
Ruban A, Saccon F. Chlorophyll a De-Excitation Pathways in the LHCII antenna. J Chem Phys 2022; 156:070902. [DOI: 10.1063/5.0073825] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Affiliation(s)
- Alexander Ruban
- SBBS, Queen Mary University of London - Mile End Campus, United Kingdom
| | - Francesco Saccon
- School of Biological and Chemical Sciences, Queen Mary University of London - Mile End Campus, United Kingdom
| |
|
29
|
Basciu A, Callea L, Motta S, Bonvin AM, Bonati L, Vargiu AV. No dance, no partner! A tale of receptor flexibility in docking and virtual screening. VIRTUAL SCREENING AND DRUG DOCKING 2022. [DOI: 10.1016/bs.armc.2022.08.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
30
|
Abstract
The biological significance of proteins has drawn the scientific community to explore their characteristics. These studies shed light on the interaction patterns and functions of proteins in a living body. The practical difficulties of reliable experimental techniques paved the way for computational methods in interaction prediction. Automated methods have reduced these difficulties but cannot yet replace experimental studies, as the field is still evolving. Because interaction prediction is critical, it needs highly accurate results, but none of the existing methods yet offers reliable performance on a par with experimental results. This article aims to assess the existing computational docking algorithms, their challenges, and future scope. Blind docking techniques are quite helpful when no information other than the individual structures is available. As more and more complex structures are added to different databases, information-driven approaches can be a good alternative. Artificial intelligence, which now dominates many major fields, is expected to take over this domain very shortly.
|
31
|
Tian H, Jiang X, Trozzi F, Xiao S, Larson EC, Tao P. Explore Protein Conformational Space With Variational Autoencoder. Front Mol Biosci 2021; 8:781635. [PMID: 34869602 PMCID: PMC8633506 DOI: 10.3389/fmolb.2021.781635] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/23/2021] [Accepted: 10/28/2021] [Indexed: 12/02/2022] Open
Abstract
Molecular dynamics (MD) simulations have been actively used in the study of protein structure and function. However, extensive sampling in the protein conformational space requires large computational resources and takes a prohibitive amount of time. In this study, we demonstrated that variational autoencoders (VAEs), a type of deep learning model, can be employed to explore the conformational space of a protein through MD simulations. VAEs are shown to be superior to autoencoders (AEs) through a benchmark study, with low deviation between the training and decoded conformations. Moreover, we show that the learned latent space in the VAE can be used to generate unsampled protein conformations. Additional simulations starting from these generated conformations accelerated the sampling process and explored hidden spaces in the conformational landscape.
Affiliation(s)
- Hao Tian
- Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
| | - Xi Jiang
- Department of Statistical Science, Southern Methodist University, Dallas, TX, United States
| | - Francesco Trozzi
- Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
| | - Sian Xiao
- Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
| | - Eric C. Larson
- Department of Computer Science, Southern Methodist University, Dallas, TX, United States
| | - Peng Tao
- Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
| |
|
32
|
Tam C, Kumar A, Zhang KYJ. NbX: Machine Learning-Guided Re-Ranking of Nanobody-Antigen Binding Poses. Pharmaceuticals (Basel) 2021; 14:ph14100968. [PMID: 34681192 PMCID: PMC8537642 DOI: 10.3390/ph14100968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2021] [Revised: 09/17/2021] [Accepted: 09/21/2021] [Indexed: 12/02/2022] Open
Abstract
Modeling the binding pose of an antibody is a prerequisite to structure-based affinity maturation and design. Without a reliable binding pose, the subsequent structural simulation is largely futile. In this study, we have developed a method of machine learning-guided re-ranking of antigen binding poses of nanobodies, the single-domain antibodies that have recently drawn much interest in antibody drug development. We performed a large-scale self-docking experiment of nanobody–antigen complexes. By training a decision tree classifier that maps a feature set consisting of energy, contact and interface property descriptors to a measure of the docking quality of the refined poses, an eightfold improvement in the median ranking of native-like nanobody poses was achieved compared with ClusPro and an established deep 3D CNN classifier of native protein–protein interaction. We further interpreted our model by identifying features that showed relatively important contributions to the prediction performance. This study demonstrated a useful method for improving our current ability in pose prediction of nanobodies.
Affiliation(s)
- Chunlai Tam
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan; (C.T.); (A.K.)
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
| | - Ashutosh Kumar
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan; (C.T.); (A.K.)
| | - Kam Y. J. Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan; (C.T.); (A.K.)
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
- Correspondence:
| |
|
33
|
Dongre AV, Das S, Bellur A, Kumar S, Chandrashekarmath A, Karmakar T, Balaram P, Balasubramanian S, Balaram H. Structural basis for the hyperthermostability of an archaeal enzyme induced by succinimide formation. Biophys J 2021; 120:3732-3746. [PMID: 34302792 DOI: 10.1016/j.bpj.2021.07.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/01/2021] [Revised: 06/18/2021] [Accepted: 07/19/2021] [Indexed: 10/20/2022] Open
Abstract
Stability of proteins from hyperthermophiles (organisms existing under boiling water conditions) enabled by a reduction of conformational flexibility is realized through various mechanisms. A succinimide (SNN) arising from the post-translational cyclization of the side chains of aspartyl/asparaginyl residues with the backbone amide -NH of the succeeding residue would restrain the torsion angle Ψ and can serve as a new route for hyperthermostability. However, such a succinimide is typically prone to hydrolysis, transforming to either an aspartyl or β-isoaspartyl residue. Here, we present the crystal structure of Methanocaldococcus jannaschii glutamine amidotransferase and, using enhanced sampling molecular dynamics simulations, address the mechanism of its increased thermostability, up to 100°C, imparted by an unexpectedly stable succinimidyl residue at position 109. The stability of SNN109 to hydrolysis is seen to arise from its electrostatic shielding by the side-chain carboxylate group of its succeeding residue Asp110, as well as through n → π∗ interactions between SNN109 and its preceding residue Glu108, both of which prevent water access to SNN. The stable succinimidyl residue induces the formation of an α-turn structure involving 13-atom hydrogen bonding, which locks the local conformation, reducing protein flexibility. The destabilization of the protein upon replacement of SNN with a Φ-restricted prolyl residue highlights the specificity of the succinimidyl residue in imparting hyperthermostability to the enzyme. The conservation of the succinimide-forming tripeptide sequence (E(N/D)(E/D)) in several archaeal GATases strongly suggests an adaptation of this otherwise detrimental post-translational modification as a harbinger of thermostability.
Affiliation(s)
- Aparna Vilas Dongre
- Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore, India
| | - Sudip Das
- Chemistry and Physics of Materials Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore, India
| | - Asutosh Bellur
- Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore, India
| | - Sanjeev Kumar
- Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore, India; National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India
| | - Anusha Chandrashekarmath
- Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore, India
| | - Tarak Karmakar
- Chemistry and Physics of Materials Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore, India; Department of Chemistry and Applied Biosciences, ETH Zurich, Lugano, Ticino, Switzerland; Facoltà di Informatica, Istituto di Scienze Computationali, Università della Svizzera Italiana, Lugano, Ticino, Switzerland
| | - Padmanabhan Balaram
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India; Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Sundaram Balasubramanian
- Chemistry and Physics of Materials Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore, India.
| | - Hemalatha Balaram
- Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore, India.
| |
|
34
|
Kulichenko M, Smith JS, Nebgen B, Li YW, Fedik N, Boldyrev AI, Lubbers N, Barros K, Tretiak S. The Rise of Neural Networks for Materials and Chemical Dynamics. J Phys Chem Lett 2021; 12:6227-6243. [PMID: 34196559 DOI: 10.1021/acs.jpclett.1c01357] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Indexed: 06/13/2023]
Abstract
Machine learning (ML) is quickly becoming a premier tool for modeling chemical processes and materials. ML-based force fields, trained on large data sets of high-quality electronic structure calculations, are particularly attractive due to their unique combination of computational efficiency and physical accuracy. This Perspective summarizes some recent advances in the development of neural network-based interatomic potentials. Designing high-quality training data sets is crucial to overall model accuracy. One strategy is active learning, in which new data are automatically collected for atomic configurations that produce large ML uncertainties. Another strategy is to use the highest levels of quantum theory possible. Transfer learning allows training to a data set of mixed fidelity. A model initially trained to a large data set of density functional theory calculations can be significantly improved by retraining to a relatively small data set of expensive coupled cluster theory calculations. These advances are exemplified by applications to molecules and materials.
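The active-learning selection step mentioned in this abstract (collect new data for configurations with large ML uncertainty) might look something like the sketch below. It assumes, purely for illustration, that uncertainty is measured as the standard deviation across an ensemble of models; the abstract does not say which uncertainty measure is used, and the function name, toy models, and threshold are all hypothetical.

```python
import statistics


def select_uncertain(configs, ensemble, threshold):
    """Return the configurations whose ensemble predictions disagree by more than threshold."""
    selected = []
    for x in configs:
        predictions = [model(x) for model in ensemble]
        # Population standard deviation across models, used here as the uncertainty.
        if statistics.pstdev(predictions) > threshold:
            selected.append(x)
    return selected


# Hypothetical toy ensemble: three 1-D "energy models" that agree near x = 0
# and diverge as |x| grows, standing in for independently trained networks.
ensemble = [lambda x: x ** 2, lambda x: 1.1 * x ** 2, lambda x: 0.9 * x ** 2]
configs = [0.0, 0.5, 1.0, 2.0, 3.0]

# Configurations flagged here would be sent for new reference calculations.
to_label = select_uncertain(configs, ensemble, threshold=0.2)
print(to_label)
```

With these toy models only the two configurations farthest from the origin exceed the threshold, so only they would be labeled and added to the training set.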
Affiliation(s)
- Maksim Kulichenko
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Justin S Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Ying Wai Li
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Nikita Fedik
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Alexander I Boldyrev
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| |
|
35
|
Nunes-Alves A, Ormersbach F, Wade RC. Prediction of the Drug-Target Binding Kinetics for Flexible Proteins by Comparative Binding Energy Analysis. J Chem Inf Model 2021; 61:3708-3721. [PMID: 34197096 DOI: 10.1021/acs.jcim.1c00639] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
There is growing consensus that the optimization of the kinetic parameters for drug-protein binding leads to improved drug efficacy. Therefore, computational methods have been developed to predict kinetic rates and to derive quantitative structure-kinetic relationships (QSKRs). Many of these methods are based on crystal structures of ligand-protein complexes. However, a drawback is that each ligand-protein complex is usually treated as having a single structure. Here, we present a modification of COMparative BINding Energy (COMBINE) analysis, which uses the structures of ligand-protein complexes to predict binding parameters. We introduce the option of using multiple structures to describe each ligand-protein complex in COMBINE analysis and apply this to study the effects of protein flexibility on the derivation of dissociation rate constants (koff) for inhibitors of p38 mitogen-activated protein (MAP) kinase, which has a flexible binding site. Multiple structures were obtained for each ligand-protein complex by performing docking to an ensemble of protein configurations obtained from molecular dynamics simulations. Coefficients to scale ligand-protein interaction energies determined from energy-minimized structures of ligand-protein complexes were obtained by partial least squares regression, and they allowed for the computation of koff values. The QSKR model obtained using single, energy-minimized crystal structures for each ligand-protein complex had higher predictive power than the QSKR model obtained with multiple structures from ensemble docking. However, incorporation of ligand-protein flexibility helped to highlight additional ligand-protein interactions that lead to longer residence times, such as interactions with residues Arg67 and Asp168, which are close to the ligand in many crystal structures. These results show that COMBINE analysis is a promising method to guide the design of compounds that bind to flexible proteins with improved binding kinetics.
Affiliation(s)
- Ariane Nunes-Alves
- Heidelberg Institute for Theoretical Studies (HITS), Schloß-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany.,Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany
| | - Fabian Ormersbach
- Heidelberg Institute for Theoretical Studies (HITS), Schloß-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
| | - Rebecca C Wade
- Heidelberg Institute for Theoretical Studies (HITS), Schloß-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany.,Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany.,Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Im Neuenheimer Feld 205, 69120 Heidelberg, Germany
|
36
|
Kingdon ADH, Alderwick LJ. Structure-based in silico approaches for drug discovery against Mycobacterium tuberculosis. Comput Struct Biotechnol J 2021; 19:3708-3719. [PMID: 34285773 PMCID: PMC8258792 DOI: 10.1016/j.csbj.2021.06.034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/23/2021] [Revised: 06/22/2021] [Accepted: 06/22/2021] [Indexed: 12/12/2022] Open
Abstract
Mycobacterium tuberculosis is the causative agent of TB and was estimated to cause 1.4 million deaths in 2019, alongside 10 million new infections. Drug resistance is a growing issue, with multi-drug resistant infections representing 3.3% of all new infections; hence, novel antimycobacterial drugs are urgently required to combat this growing health emergency. Alongside this, increased knowledge of gene essentiality in the pathogenic organism and larger compound databases can aid in the discovery of new drug compounds. The number of protein structures, X-ray based and modelled, is increasing and now accounts for more than 80% of all predicted M. tuberculosis proteins, allowing novel targets to be investigated. This review will focus on structure-based in silico approaches for drug discovery, covering a range of complexities and computational demands, with associated antimycobacterial examples. This includes molecular docking, molecular dynamics simulations, ensemble docking and free energy calculations. Applications of machine learning to each of these approaches will be discussed. Experimental validation of computational hits is an essential component, which is unfortunately missing from many current studies. The future outlook for these approaches will also be discussed.
Key Words
- CV, collective variable
- Docking
- Drug discovery
- In silico
- LIE, Linear Interaction Energy
- MD, Molecular Dynamics
- MDR, multi-drug resistant
- MMPB(GB)SA, Molecular Mechanics with Poisson Boltzmann (or generalised Born) and Surface Area solvation
- Machine learning
- Mt, Mycobacterium tuberculosis
- Mycobacterium tuberculosis
- PTC, peptidyl transferase centre
- RMSD, root-mean-square deviation
- Tuberculosis, TB
- cMD, Classical Molecular Dynamics
- cryo-EM, cryogenic electron microscopy
- ns, nanosecond
Affiliation(s)
- Alexander D H Kingdon
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
| | - Luke J Alderwick
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
|
37
|
Wang X, Flannery ST, Kihara D. Protein Docking Model Evaluation by Graph Neural Networks. Front Mol Biosci 2021; 8:647915. [PMID: 34113650 PMCID: PMC8185212 DOI: 10.3389/fmolb.2021.647915] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 12/30/2020] [Accepted: 04/26/2021] [Indexed: 12/03/2022] Open
Abstract
Physical interactions of proteins play key functional roles in many important cellular processes. To understand molecular mechanisms of such functions, it is crucial to determine the structure of protein complexes. To complement experimental approaches, which usually take a considerable amount of time and resources, various computational methods have been developed for predicting the structures of protein complexes. In computational modeling, one of the challenges is to identify near-native structures from a large pool of generated models. Here, we developed a deep learning-based approach named Graph Neural Network-based DOcking decoy eValuation scorE (GNN-DOVE). To evaluate a protein docking model, GNN-DOVE extracts the interface area and represents it as a graph. The chemical properties of atoms and the inter-atom distances are used as features of nodes and edges in the graph, respectively. GNN-DOVE was trained, validated, and tested on docking models in the Dockground database and further tested on a combined dataset of Dockground and ZDOCK benchmark as well as a CAPRI scoring dataset. GNN-DOVE performed better than existing methods, including DOVE, which is our previous development that uses a convolutional neural network on voxelized structure models.
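The graph representation mentioned in this abstract (interface atoms as nodes, inter-atom distances as edge features) can be illustrated with a small sketch. It is not the GNN-DOVE code: the function name `interface_graph`, the 10 Å cutoff and the toy coordinates are assumptions chosen for the example, and the chemical node features are left out.

```python
import numpy as np

def interface_graph(receptor_xyz, ligand_xyz, cutoff=10.0):
    """Return (nodes, edges) for atoms lying near the receptor-ligand interface.

    nodes: list of (chain_label, atom_index) tuples
    edges: list of (node_a, node_b, distance) with distance below `cutoff`
    The cutoff value is a hypothetical choice, not taken from the source.
    """
    # Distances between every receptor atom and every ligand atom.
    cross = np.linalg.norm(
        receptor_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1
    )
    rec_idx = np.where((cross < cutoff).any(axis=1))[0]  # receptor atoms at interface
    lig_idx = np.where((cross < cutoff).any(axis=0))[0]  # ligand atoms at interface

    coords = np.vstack([receptor_xyz[rec_idx], ligand_xyz[lig_idx]])
    nodes = [("R", int(i)) for i in rec_idx] + [("L", int(j)) for j in lig_idx]

    # Edge feature: distance between each pair of interface atoms.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    edges = [
        (a, b, float(dist[a, b]))
        for a in range(len(nodes))
        for b in range(a + 1, len(nodes))
        if dist[a, b] < cutoff
    ]
    return nodes, edges

# Toy coordinates in angstroms; the second receptor atom is far from the interface.
receptor = np.array([[0.0, 0.0, 0.0], [50.0, 0.0, 0.0]])
ligand = np.array([[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
nodes, edges = interface_graph(receptor, ligand)
```

A graph neural network would then consume these nodes and edges together with per-atom feature vectors; that part is outside the scope of this sketch.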
Affiliation(s)
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
| | - Sean T. Flannery
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
|
38
|
Ward MD, Zimmerman MI, Meller A, Chung M, Swamidass SJ, Bowman GR. Deep learning the structural determinants of protein biochemical properties by comparing structural ensembles with DiffNets. Nat Commun 2021; 12:3023. [PMID: 34021153 PMCID: PMC8140102 DOI: 10.1038/s41467-021-23246-1] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 09/29/2020] [Accepted: 04/16/2021] [Indexed: 12/05/2022] Open
Abstract
Understanding the structural determinants of a protein's biochemical properties, such as activity and stability, is a major challenge in biology and medicine. Comparing computer simulations of protein variants with different biochemical properties is an increasingly powerful means to drive progress. However, success often hinges on dimensionality reduction algorithms for simplifying the complex ensemble of structures each variant adopts. Unfortunately, common algorithms rely on potentially misleading assumptions about what structural features are important, such as emphasizing larger geometric changes over smaller ones. Here we present DiffNets, self-supervised autoencoders that avoid such assumptions, and automatically identify the relevant features, by requiring that the low-dimensional representations they learn are sufficient to predict the biochemical differences between protein variants. For example, DiffNets automatically identify subtle structural signatures that predict the relative stabilities of β-lactamase variants and duty ratios of myosin isoforms. DiffNets should also be applicable to understanding other perturbations, such as ligand binding.
Affiliation(s)
- Michael D Ward
- Department of Biochemistry & Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for the Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO, USA
| | - Maxwell I Zimmerman
- Department of Biochemistry & Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for the Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO, USA
| | - Artur Meller
- Department of Biochemistry & Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for the Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO, USA
| | - Moses Chung
- Department of Biochemistry & Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for the Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO, USA
| | - S J Swamidass
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, USA
| | - Gregory R Bowman
- Department of Biochemistry & Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA.
- Center for the Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO, USA.
|
39
|
Plante A, Weinstein H. Ligand-Dependent Conformational Transitions in Molecular Dynamics Trajectories of GPCRs Revealed by a New Machine Learning Rare Event Detection Protocol. Molecules 2021; 26:3059. [PMID: 34065494 PMCID: PMC8161244 DOI: 10.3390/molecules26103059] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/17/2021] [Revised: 05/11/2021] [Accepted: 05/11/2021] [Indexed: 01/14/2023] Open
Abstract
Central among the tools and approaches used for ligand discovery and design are Molecular Dynamics (MD) simulations, which follow the dynamic changes in molecular structure in response to the environmental condition, interactions with other proteins, and the effects of ligand binding. The need for, and successes of, MD simulations in providing this type of essential information are well documented, but so are the challenges presented by the size of the resulting datasets encoding the desired information. The difficulty of extracting information on mechanistically important state-to-state transitions in response to ligand binding and other interactions is compounded by these being rare events in the MD trajectories of complex molecular machines, such as G-protein-coupled receptors (GPCRs). To address this problem, we have developed a protocol for the efficient detection of such events. We show that the novel Rare Event Detection (RED) protocol reveals functionally relevant and pharmacologically discriminating responses to the binding of different ligands to the 5-HT2AR orthosteric site in terms of clearly defined, structurally coherent, and temporally ordered conformational transitions. This information from the RED protocol offers new insights into specific ligand-determined functional mechanisms encoded in the MD trajectories, which opens a new and rigorously reproducible path to understanding drug activity with application in drug discovery.
Affiliation(s)
- Ambrose Plante
- Department of Physiology and Biophysics, Weill Cornell Medical College of Cornell University, New York, NY 10065, USA
| | - Harel Weinstein
- Department of Physiology and Biophysics, Weill Cornell Medical College of Cornell University, New York, NY 10065, USA
- Institute for Computational Biomedicine, Weill Cornell Medical College of Cornell University, New York, NY 10065, USA
- Correspondence: Tel.: +1-212-746-6358
|
40
|
Dechant PP, He YH. Machine-learning a virus assembly fitness landscape. PLoS One 2021; 16:e0250227. [PMID: 33951035 PMCID: PMC8099058 DOI: 10.1371/journal.pone.0250227] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/13/2019] [Accepted: 04/01/2021] [Indexed: 02/05/2023] Open
Abstract
Realistic evolutionary fitness landscapes are notoriously difficult to construct. A recent cutting-edge model of virus assembly consists of a dodecahedral capsid with 12 corresponding packaging signals in three affinity bands. This whole genome/phenotype space consisting of 3^12 genomes has been explored via computationally expensive stochastic assembly models, giving a fitness landscape in terms of the assembly efficiency. Using the latest machine-learning techniques by establishing a neural network, we show that the intensive computation can be short-circuited in a matter of minutes to astounding accuracy.
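The size of the genome space follows directly from the model described in this abstract: 12 packaging signals, each falling into one of three affinity bands. A short sketch of that count is given below; the band labels are placeholders introduced for illustration, not the authors' terminology.

```python
from itertools import product

N_SIGNALS = 12                               # packaging signals per genome
AFFINITY_BANDS = ("low", "medium", "high")   # placeholder labels for the three bands

# Each signal independently takes one of three bands, giving 3**12 genomes.
genome_count = len(AFFINITY_BANDS) ** N_SIGNALS

def genomes():
    """Lazily enumerate every genome as a 12-tuple of affinity bands."""
    return product(AFFINITY_BANDS, repeat=N_SIGNALS)

first = next(genomes())                      # one example genome
enumerated = sum(1 for _ in genomes())       # brute-force check of the count
```

Here `genome_count` evaluates to 531441, which is why an exhaustive stochastic simulation of every genome is expensive and a fast surrogate model is attractive.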
Affiliation(s)
- Pierre-Philippe Dechant
- School of Science, Technology & Health, York St John University, York, United Kingdom
- York Cross-disciplinary Centre for Systems Analysis, University of York, Heslington, United Kingdom
- Department of Mathematics, University of York, Heslington, United Kingdom
| | - Yang-Hui He
- Department of Mathematics, City, University of London, London, United Kingdom
- Merton College, University of Oxford, Oxford, United Kingdom
- School of Physics, NanKai University, Tianjin, P.R. China
|
41
|
Jin Y, Johannissen LO, Hay S. Predicting new protein conformations from molecular dynamics simulation conformational landscapes and machine learning. Proteins 2021; 89:915-921. [PMID: 33629765 DOI: 10.1002/prot.26068] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/05/2020] [Revised: 01/21/2021] [Accepted: 02/23/2021] [Indexed: 11/06/2022]
Abstract
Molecular dynamics (MD) simulations are a popular method of studying protein structure and function, but are unable to reliably sample all relevant conformational space in reasonable computational timescales. A range of enhanced sampling methods are available that can improve conformational sampling, but these do not offer a complete solution. We present here a proof-of-principle method of combining MD simulation with machine learning to explore protein conformational space. An autoencoder is used to map snapshots from MD simulations onto a user-defined conformational landscape defined by principal components analysis or specific structural features, and we show that we can predict, with useful accuracy, conformations that are not present in the training data. This method offers a new approach to the prediction of new low energy/physically realistic structures of conformationally dynamic proteins and allows an alternative approach to enhanced sampling of MD simulations.
Affiliation(s)
- Yiming Jin
- Manchester Institute of Biotechnology and Department of Chemistry, The University of Manchester, Manchester, UK
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Linus O Johannissen
- Manchester Institute of Biotechnology and Department of Chemistry, The University of Manchester, Manchester, UK
| | - Sam Hay
- Manchester Institute of Biotechnology and Department of Chemistry, The University of Manchester, Manchester, UK
|
42
|
Wang B, Su Z, Wu Y. Characterizing the function of domain linkers in regulating the dynamics of multi-domain fusion proteins by microsecond molecular dynamics simulations and artificial intelligence. Proteins 2021; 89:884-895. [PMID: 33620752 DOI: 10.1002/prot.26066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/21/2020] [Revised: 01/20/2021] [Accepted: 02/20/2021] [Indexed: 11/12/2022]
Abstract
Multi-domain proteins are not only formed through natural evolution but can also be generated by recombinant DNA technology. Because many fusion proteins can enhance the selectivity of cell targeting, these artificially produced molecules, called multi-specific biologics, are promising drug candidates, especially for immunotherapy. Moreover, the rational design of domain linkers in fusion proteins is becoming an essential step toward a quantitative understanding of the dynamics in these biopharmaceutics. We developed a computational framework to characterize the impacts of peptide linkers on the dynamics of multi-specific biologics. Specifically, we first constructed a benchmark containing six types of linkers that represent various lengths and degrees of flexibility and used them to connect two natural proteins as a test system. We then projected the microsecond dynamics of these proteins generated from Anton onto a coarse-grained conformational space. We further analyzed the similarity of dynamics among different proteins in this low-dimensional space by a neural-network-based classification model. Finally, we applied hierarchical clustering to place linkers into different subgroups based on the classification results. The clustering results suggest that the length of linkers, which is used to spatially separate different functional modules, plays the most important role in regulating the dynamics of this fusion protein. Given the same number of amino acids, linker flexibility functions as a regulator of protein dynamics. In summary, we illustrated that a new computational strategy can be used to study the dynamics of multi-domain fusion proteins by a combination of long timescale molecular dynamics simulation, coarse-grained feature extraction, and artificial intelligence.
Affiliation(s)
- Bo Wang
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
| | - Zhaoqian Su
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
| | - Yinghao Wu
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
|
43
|
Rahman T, Du Y, Zhao L, Shehu A. Generative Adversarial Learning of Protein Tertiary Structures. Molecules 2021; 26:1209. [PMID: 33668217 PMCID: PMC7956369 DOI: 10.3390/molecules26051209] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/18/2020] [Revised: 02/13/2021] [Accepted: 02/16/2021] [Indexed: 12/15/2022] Open
Abstract
Protein molecules are inherently dynamic and modulate their interactions with different molecular partners by accessing different tertiary structures under physiological conditions. Elucidating such structures remains challenging. Current momentum in deep learning and the powerful performance of generative adversarial networks (GANs) in complex domains, such as computer vision, inspires us to investigate GANs on their ability to generate physically-realistic protein tertiary structures. The analysis presented here shows that several GAN models fail to capture complex, distal structural patterns present in protein tertiary structures. The study additionally reveals that mechanisms touted as effective in stabilizing the training of a GAN model are not all effective, and that performance based on loss alone may be orthogonal to performance based on the quality of generated datasets. A novel contribution in this study is the demonstration that Wasserstein GAN strikes a good balance and manages to capture both local and distal patterns, thus presenting a first step towards more powerful deep generative models for exploring a possibly very diverse set of structures supporting diverse activities of a protein molecule in the cell.
Affiliation(s)
- Taseef Rahman
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
| | - Yuanqi Du
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
| | - Liang Zhao
- Department of Computer Science, Emory University, Atlanta, GA 30322, USA
| | - Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Center for Advancing Human-Machine Partnerships, George Mason University, Fairfax, VA 22030, USA
- Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA
- School of Systems Biology, George Mason University, Manassas, VA 20110, USA
|
44
|
Rognan D. Modeling Protein-Ligand Interactions: Are We Ready for Deep Learning? SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11521-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
|
45
|
Harmalkar A, Gray JJ. Advances to tackle backbone flexibility in protein docking. Curr Opin Struct Biol 2020; 67:178-186. [PMID: 33360497 DOI: 10.1016/j.sbi.2020.11.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 10/07/2020] [Revised: 11/18/2020] [Accepted: 11/25/2020] [Indexed: 12/11/2022]
Abstract
Computational docking methods can provide structural models of protein-protein complexes, but protein backbone flexibility upon association often thwarts accurate predictions. In recent blind challenges, medium or high accuracy models were submitted in less than 20% of the 'difficult' targets (with significant backbone change or uncertainty). Here, we describe recent developments in protein-protein docking and highlight advances that tackle backbone flexibility. In molecular dynamics and Monte Carlo approaches, enhanced sampling techniques have reduced time-scale limitations. Internal coordinate formulations can now capture realistic motions of monomers and complexes using harmonic dynamics. And machine learning approaches adaptively guide docking trajectories or generate novel binding site predictions from deep neural networks trained on protein interfaces. These tools poise the field to break through the longstanding challenge of correctly predicting complex structures with significant conformational change.
Affiliation(s)
- Ameya Harmalkar
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Jeffrey J Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA; Program in Molecular Biophysics, Institute for Nanobiotechnology, and Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
|
46
|
Meng F, Liang Z, Zhao K, Luo C. Drug design targeting active posttranslational modification protein isoforms. Med Res Rev 2020; 41:1701-1750. [PMID: 33355944 DOI: 10.1002/med.21774] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 09/24/2020] [Revised: 11/29/2020] [Accepted: 12/03/2020] [Indexed: 12/11/2022]
Abstract
Modern drug design aims to discover novel lead compounds with attractive chemical profiles to enable further exploration of the intersection of chemical space and biological space. Identification of small molecules with good ligand efficiency, high activity, and selectivity is crucial toward developing effective and safe drugs. However, finding this intersection is one of the most challenging tasks in the pharmaceutical industry, as chemical space is almost infinite and continuous, whereas the biological space is very limited and discrete. This bottleneck potentially limits the discovery of molecules with desirable properties for lead optimization. Herein, we present a new direction leveraging posttranslational modification (PTM) protein isoforms target space to inspire drug design, termed "Post-translational Modification Inspired Drug Design (PTMI-DD)." PTMI-DD aims to extend the intersections of chemical space and biological space. We further rationalized and highlighted the importance of PTM protein isoforms and their roles in various diseases and biological functions. We then laid out a few directions to elaborate the PTMI-DD in drug design including discovering covalent binding inhibitors mimicking PTMs, targeting PTM protein isoforms with binding sites distinct from those of the wild-type counterpart, targeting protein-protein interactions involving PTMs, and hijacking protein degeneration by ubiquitination for PTM protein isoforms. These directions will lead to a significant expansion of the biological space and/or increase the tractability of compounds, primarily due to precisely targeting PTM protein isoforms or complexes which are highly relevant to biological functions. Importantly, this new avenue will further enrich the personalized treatment opportunity through precision medicine targeting PTM isoforms.
Affiliation(s)
- Fanwang Meng
- Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.,Department of Chemistry and Chemical Biology, McMaster University, Hamilton, Ontario, Canada
| | - Zhongjie Liang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Kehao Zhao
- School of Pharmacy, Key Laboratory of Molecular Pharmacology and Drug Evaluation (Yantai University), Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, Yantai University, Yantai, China
| | - Cheng Luo
- Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
|
47
|
Instantaneous generation of protein hydration properties from static structures. Commun Chem 2020; 3:188. [PMID: 36703451 PMCID: PMC9814540 DOI: 10.1038/s42004-020-00435-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/05/2020] [Accepted: 11/10/2020] [Indexed: 01/29/2023] Open
Abstract
Complex molecular simulation methods are typically required to calculate the thermodynamic properties of biochemical systems. One example thereof is the thermodynamic profiling of (de)solvation of proteins, which is an essential driving force for protein-ligand and protein-protein binding. The thermodynamic state of water molecules depends on its enthalpic and entropic components; the latter is governed by dynamic properties of the molecule. Here, we developed, to the best of our knowledge, two novel machine learning methods based on deep neural networks that are able to generate the converged thermodynamic state of dynamic water molecules in the heterogeneous protein environment based solely on the information of the static protein structure. The applicability of our machine learning methods to predict the hydration information is demonstrated in two different studies, the qualitative analysis and quantitative prediction of structure-activity relationships, and the prediction of protein-ligand binding modes.
|
48
|
Bartocci A, Gillet N, Jiang T, Szczepaniak F, Dumont E. Molecular Dynamics Approach for Capturing Calixarene-Protein Interactions: The Case of Cytochrome C. J Phys Chem B 2020; 124:11371-11378. [PMID: 33270456 DOI: 10.1021/acs.jpcb.0c08482] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022]
Abstract
Functionalized supramolecular cages are of growing importance in biology and biochemistry. They have recently been proposed as efficient auxiliaries to obtain high-resolution cocrystallized proteins. Here, we propose a molecular dynamics investigation of the supramolecular association of sulfonated calix-[8]-arenes to cytochrome c starting from initially distant proteins and ligands. We characterize two main binding sites for the sulfonated calixarene on the cytochrome c surface which are in perfect agreement with the previous experiments with regard to the structure (comparison with the X-ray structure PDB 6GD8) and the binding free energies [comparison between the molecular mechanics Poisson-Boltzmann surface area analysis and the isothermal titration calorimetry measurements]. The per-residue decomposition of the interaction energies reveals the detailed picture of this electrostatically driven association and notably the role of arginine R13 as a bridging residue between the two main anchoring sites. In addition, the analysis of the residue behavior by means of a supervised machine learning protocol unveils the formation of a hydrogen bond network far from the binding sites, increasing the rigidity of the protein. This study paves the way toward an automated procedure to predict the supramolecular protein-cage association, with the possibility of a computational screening of new promising derivatives for controlled protein assembly and protein surface recognition processes.
Affiliation(s)
- Alessio Bartocci
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Claude Bernard Lyon 1, Laboratoire de Chimie, F-69342 Lyon, France
| | - Natacha Gillet
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Claude Bernard Lyon 1, Laboratoire de Chimie, F-69342 Lyon, France
| | - Tao Jiang
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Claude Bernard Lyon 1, Laboratoire de Chimie, F-69342 Lyon, France
| | - Florence Szczepaniak
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Claude Bernard Lyon 1, Laboratoire de Chimie, F-69342 Lyon, France
| | - Elise Dumont
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Claude Bernard Lyon 1, Laboratoire de Chimie, F-69342 Lyon, France.,Institut Universitaire de France, 5 Rue Descartes, 75005 Paris, France
|
49
|
Gkeka P, Stoltz G, Barati Farimani A, Belkacemi Z, Ceriotti M, Chodera JD, Dinner AR, Ferguson AL, Maillet JB, Minoux H, Peter C, Pietrucci F, Silveira A, Tkatchenko A, Trstanova Z, Wiewiora R, Lelièvre T. Machine Learning Force Fields and Coarse-Grained Variables in Molecular Dynamics: Application to Materials and Biological Systems. J Chem Theory Comput 2020; 16:4757-4775. [PMID: 32559068 PMCID: PMC8312194 DOI: 10.1021/acs.jctc.0c00355] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Indexed: 12/21/2022]
Abstract
Machine learning encompasses tools and algorithms that are now becoming popular in almost all scientific and technological fields. This is true for molecular dynamics as well, where machine learning offers promises of extracting valuable information from the enormous amounts of data generated by simulation of complex systems. We provide here a review of our current understanding of goals, benefits, and limitations of machine learning techniques for computational studies on atomistic systems, focusing on the construction of empirical force fields from ab initio databases and the determination of reaction coordinates for free energy computation and enhanced sampling.
Affiliation(s)
- Paraskevi Gkeka
  - Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
- Gabriel Stoltz
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
  - Matherials Project-Team, Inria Paris, 75012 Paris, France
- Zineb Belkacemi
  - Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
- Michele Ceriotti
  - Laboratory of Computational Science and Modelling, Institute of Materials, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- John D Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Aaron R Dinner
  - Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Hervé Minoux
  - Integrated Drug Discovery, Sanofi R&D, 94403 Vitry-sur-Seine, France
- Fabio Pietrucci
  - UMR CNRS 7590, MNHN, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, Sorbonne Université, 75005 Paris, France
- Ana Silveira
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Zofia Trstanova
  - School of Mathematics, The University of Edinburgh, Edinburgh EH9 3FD, U.K.
- Rafal Wiewiora
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Tony Lelièvre
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
  - Matherials Project-Team, Inria Paris, 75012 Paris, France
|
50
|
Verkhivker GM, Agajanian S, Hu G, Tao P. Allosteric Regulation at the Crossroads of New Technologies: Multiscale Modeling, Networks, and Machine Learning. Front Mol Biosci 2020; 7:136. [PMID: 32733918 PMCID: PMC7363947 DOI: 10.3389/fmolb.2020.00136] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 05/01/2020] [Accepted: 06/08/2020] [Indexed: 12/12/2022] Open
Abstract
Allosteric regulation is a common mechanism employed by complex biomolecular systems for regulation of activity and adaptability in the cellular environment, serving as an effective molecular tool for cellular communication. As an intrinsic but elusive property, allostery is a ubiquitous phenomenon where binding or disturbing of a distal site in a protein can functionally control its activity and is considered as the "second secret of life." The fundamental biological importance and complexity of these processes require a multi-faceted platform of synergistically integrated approaches for prediction and characterization of allosteric functional states, atomistic reconstruction of allosteric regulatory mechanisms and discovery of allosteric modulators. The unifying theme and overarching goal of allosteric regulation studies in recent years have been integration between emerging experiment and computational approaches and technologies to advance quantitative characterization of allosteric mechanisms in proteins. Despite significant advances, the quantitative characterization and reliable prediction of functional allosteric states, interactions, and mechanisms continue to present highly challenging problems in the field. In this review, we discuss simulation-based multiscale approaches, experiment-informed Markovian models, and network modeling of allostery and information-theoretical approaches that can describe the thermodynamics and hierarchy of allosteric states and the molecular basis of allosteric mechanisms. The wealth of structural and functional information along with diversity and complexity of allosteric mechanisms in therapeutically important protein families have provided a well-suited platform for development of data-driven research strategies. Data-centric integration of chemistry, biology and computer science using artificial intelligence technologies has gained significant momentum and is at the forefront of many cross-disciplinary efforts.
We discuss new developments in the machine learning field and the emergence of deep learning and deep reinforcement learning applications in modeling of molecular mechanisms and allosteric proteins. The experiment-guided integrated approaches empowered by recent advances in multiscale modeling, network science, and machine learning can lead to more reliable prediction of allosteric regulatory mechanisms and discovery of allosteric modulators for therapeutically important protein targets.
Affiliation(s)
- Gennady M. Verkhivker
  - Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
  - Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, United States
- Steve Agajanian
  - Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
- Guang Hu
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Peng Tao
  - Department of Chemistry, Center for Drug Discovery, Design, and Delivery (CD4), Center for Scientific Computation, Southern Methodist University, Dallas, TX, United States
|