1
Rosenberg AA, Marx A, Bronstein AM. A dataset of alternately located segments in protein crystal structures. Sci Data 2024; 11:783. [PMID: 39019896 PMCID: PMC11255211 DOI: 10.1038/s41597-024-03595-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 07/01/2024] [Indexed: 07/19/2024] Open
Abstract
Protein Data Bank (PDB) files list the relative spatial location of atoms in a protein structure as the final output of the process of fitting and refining to experimentally determined electron density measurements. Where experimental evidence exists for multiple conformations, atoms are modelled in alternate locations. Programs reading PDB files commonly ignore these alternate conformations by default, leaving users oblivious to their presence in the structures they analyze. This has led to underappreciation of their prevalence, undercharacterisation of their features, and limited accessibility of this high-resolution data representing structural ensembles. We have trawled PDB files to extract structural features of residues with alternately located atoms. The output includes the distance between alternate conformations and identifies the location of these segments within the protein chain and in proximity of all other atoms within a defined radius. This dataset should be of use in efforts to predict multiple structures from a single sequence and support studies investigating protein flexibility and the association with protein function.
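As a rough illustration of what the abstract describes, the sketch below reads the alternate location indicator from fixed-width PDB ATOM/HETATM records (column 17) and reports the distance between the first two alternate positions of each atom. It is not the authors' pipeline: the function names `altloc_distances` and `atom_line` and the three-atom sample are hypothetical, and the code assumes a single-model file with well-formed records.

```python
import math
from collections import defaultdict


def altloc_distances(pdb_text):
    """Return {(chain, resseq, icode, atom_name): distance} between the first
    two alternate locations of every atom that has more than one."""
    coords = defaultdict(dict)
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        altloc = line[16]  # column 17: alternate location indicator
        if altloc == " ":
            continue  # atom modelled in a single position
        key = (line[21], int(line[22:26]), line[26], line[12:16].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords[key][altloc] = xyz
    distances = {}
    for key, by_label in coords.items():
        labels = sorted(by_label)
        if len(labels) >= 2:
            distances[key] = math.dist(by_label[labels[0]], by_label[labels[1]])
    return distances


def atom_line(serial, name, altloc, resname, chain, resseq, x, y, z):
    """Format one fixed-width ATOM record (hypothetical helper for the demo)."""
    return (f"ATOM  {serial:5d} {name:<4s}{altloc}{resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{20.0:6.2f}")


# Made-up residue: CA has one position, CB has alternate locations A and B.
sample = "\n".join([
    atom_line(1, " CA ", " ", "SER", "A", 10, 0.0, 0.0, 0.0),
    atom_line(2, " CB ", "A", "SER", "A", 10, 1.0, 2.0, 3.0),
    atom_line(3, " CB ", "B", "SER", "A", 10, 1.0, 2.0, 5.0),
])
result = altloc_distances(sample)
print(result)
```

A reader that keeps only the first alternate location would silently drop the second CB position here, which is the default behaviour the abstract warns about.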
Affiliation(s)
- Aviv A Rosenberg
  - Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel
- Ailie Marx
  - Department of Molecular and Computational Biosciences and Biotechnology, Migal - Galilee Research Institute, Qiryat, Israel
- Alexander M Bronstein
  - Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel
2
Garrido-Rodríguez P, Carmena-Bargueño M, de la Morena-Barrio ME, Bravo-Pérez C, de la Morena-Barrio B, Cifuentes-Riquelme R, Lozano ML, Pérez-Sánchez H, Corral J. Analysis of AlphaFold and molecular dynamics structure predictions of mutations in serpins. PLoS One 2024; 19:e0304451. [PMID: 38968282 PMCID: PMC11226102 DOI: 10.1371/journal.pone.0304451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 05/13/2024] [Indexed: 07/07/2024] Open
Abstract
Serine protease inhibitors (serpins) include thousands of structurally conserved proteins playing key roles in many organisms. Mutations affecting serpins may disturb their conformation, leading to inactive forms. Unfortunately, conformational consequences of serpin mutations are difficult to predict. In this study, we integrate experimental data of patients with mutations affecting one serpin with the predictions obtained by AlphaFold and molecular dynamics. Five SERPINC1 mutations causing antithrombin deficiency, the strongest congenital thrombophilia, were selected from a cohort of 350 unrelated patients based on functional, biochemical, and crystallographic evidence supporting a folding defect. AlphaFold gave an accurate prediction for the wild-type structure. However, it also produced native structures for all variants, regardless of complexity or conformational consequences in vivo. Similarly, molecular dynamics of up to 1000 ns at temperatures causing conformational transitions did not show significant changes in the native structure of wild-type and variants. In conclusion, AlphaFold and molecular dynamics force predictions into the native conformation at conditions with experimental evidence supporting a conformational change to other structures. It is necessary to improve predictive strategies for serpins that consider the conformational sensitivity of these molecules.
Affiliation(s)
- Pedro Garrido-Rodríguez
  - Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Arrixaca, CIBERER-ISCIII, Murcia, Spain
- Miguel Carmena-Bargueño
  - Structural Bioinformatics & High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
- María Eugenia de la Morena-Barrio
  - Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Arrixaca, CIBERER-ISCIII, Murcia, Spain
- Carlos Bravo-Pérez
  - Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Arrixaca, CIBERER-ISCIII, Murcia, Spain
- Belén de la Morena-Barrio
  - Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Arrixaca, CIBERER-ISCIII, Murcia, Spain
- Rosa Cifuentes-Riquelme
  - Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Arrixaca, CIBERER-ISCIII, Murcia, Spain
- María Luisa Lozano
  - Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Arrixaca, CIBERER-ISCIII, Murcia, Spain
- Horacio Pérez-Sánchez
  - Structural Bioinformatics & High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
- Javier Corral
  - Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Arrixaca, CIBERER-ISCIII, Murcia, Spain
3
Ruzmetov T, Hung TI, Jonnalagedda SP, Chen SH, Fasihianifard P, Guo Z, Bhanu B, Chang CEA. Sampling Conformational Ensembles of Highly Dynamic Proteins via Generative Deep Learning. bioRxiv 2024:2024.05.05.592587. [PMID: 38979147 PMCID: PMC11230202 DOI: 10.1101/2024.05.05.592587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Proteins are inherently dynamic, and their conformational ensembles are functionally important in biology. Large-scale motions may govern protein structure-function relationship, and numerous transient but stable conformations of intrinsically disordered proteins (IDPs) can play a crucial role in biological function. Investigating conformational ensembles to understand regulations and disease-related aggregations of IDPs is challenging both experimentally and computationally. In this paper we first introduced an unsupervised deep learning-based model, termed Internal Coordinate Net (ICoN), which learns the physical principles of conformational changes from molecular dynamics (MD) simulation data. Second, we selected interpolating data points in the learned latent space that rapidly identify novel synthetic conformations with sophisticated and large-scale sidechains and backbone arrangements. Third, with the highly dynamic amyloid-β 1-42 (Aβ42) monomer, our deep learning model provided a comprehensive sampling of Aβ42's conformational landscape. Analysis of these synthetic conformations revealed conformational clusters that can be used to rationalize experimental findings. Additionally, the method can identify novel conformations with important interactions in atomistic details that are not included in the training data. New synthetic conformations showed distinct sidechain rearrangements that are probed by our EPR and amino acid substitution studies. This approach is highly transferable and can be used for any available data for training. The work also demonstrated the ability for deep learning to utilize learned natural atomistic motions in protein conformation sampling.
4
Ruzmetov T, Hung TI, Jonnalagedda SP, Chen SH, Fasihianifard P, Guo Z, Bhanu B, Chang CEA. Sampling Conformational Ensembles of Highly Dynamic Proteins via Generative Deep Learning. Research Square 2024:rs.3.rs-4301803. [PMID: 38978607 PMCID: PMC11230488 DOI: 10.21203/rs.3.rs-4301803/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Proteins are inherently dynamic, and their conformational ensembles are functionally important in biology. Large-scale motions may govern protein structure-function relationship, and numerous transient but stable conformations of intrinsically disordered proteins (IDPs) can play a crucial role in biological function. Investigating conformational ensembles to understand regulations and disease-related aggregations of IDPs is challenging both experimentally and computationally. In this paper first an unsupervised deep learning-based model, termed Internal Coordinate Net (ICoN), is developed that learns the physical principles of conformational changes from molecular dynamics (MD) simulation data. Second, interpolating data points in the learned latent space are selected that rapidly identify novel synthetic conformations with sophisticated and large-scale sidechains and backbone arrangements. Third, with the highly dynamic amyloid-β1-42 (Aβ42) monomer, our deep learning model provided a comprehensive sampling of Aβ42's conformational landscape. Analysis of these synthetic conformations revealed conformational clusters that can be used to rationalize experimental findings. Additionally, the method can identify novel conformations with important interactions in atomistic details that are not included in the training data. New synthetic conformations showed distinct sidechain rearrangements that are probed by our EPR and amino acid substitution studies. The proposed approach is highly transferable and can be used for any available data for training. The work also demonstrated the ability for deep learning to utilize learned natural atomistic motions in protein conformation sampling.
Affiliation(s)
- Talant Ruzmetov
  - Department of Chemistry, University of California, Riverside, CA 92521
- Ta I Hung
  - Department of Chemistry, University of California, Riverside, CA 92521
  - Department of Bioengineering, University of California, Riverside, CA 92521
- Si-Han Chen
  - Department of Chemistry, University of California, Riverside, CA 92521
- Zhefeng Guo
  - Department of Neurology, Brain Research Institute, University of California, Los Angeles, CA 90095
- Bir Bhanu
  - Department of Bioengineering, University of California, Riverside, CA 92521
  - Department of Electrical and Computer Engineering, University of California, Riverside, CA 92521
- Chia-En A Chang
  - Department of Chemistry, University of California, Riverside, CA 92521
  - Department of Bioengineering, University of California, Riverside, CA 92521
5
Venanzi NE, Basciu A, Vargiu AV, Kiparissides A, Dalby PA, Dikicioglu D. Machine Learning Integrating Protein Structure, Sequence, and Dynamics to Predict the Enzyme Activity of Bovine Enterokinase Variants. J Chem Inf Model 2024; 64:2681-2694. [PMID: 38386417 PMCID: PMC11005043 DOI: 10.1021/acs.jcim.3c00999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 02/12/2024] [Accepted: 02/13/2024] [Indexed: 02/24/2024]
Abstract
Despite recent advances in computational protein science, the dynamic behavior of proteins, which directly governs their biological activity, cannot be gleaned from sequence information alone. To overcome this challenge, we propose a framework that integrates the peptide sequence, protein structure, and protein dynamics descriptors into machine learning algorithms to enhance their predictive capabilities and achieve improved prediction of the protein variant function. The resulting machine learning pipeline integrates traditional sequence and structure information with molecular dynamics simulation data to predict the effects of multiple point mutations on the fold improvement of the activity of bovine enterokinase variants. This study highlights how the combination of structural and dynamic data can provide predictive insights into protein functionality and address protein engineering challenges in industrial contexts.
Affiliation(s)
- Andrea Basciu
  - Department of Physics, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato, Cagliari, Italy
- Attilio Vittorio Vargiu
  - Department of Physics, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato, Cagliari, Italy
- Alexandros Kiparissides
  - Department of Biochemical Engineering, University College London, Gower Street, WC1E 6BT London, U.K.
  - Department of Chemical Engineering, Aristotle University of Thessaloniki, 54 124 Thessaloniki, Greece
- Paul A. Dalby
  - Department of Biochemical Engineering, University College London, Gower Street, WC1E 6BT London, U.K.
- Duygu Dikicioglu
  - Department of Biochemical Engineering, University College London, Gower Street, WC1E 6BT London, U.K.
6
AlRawashdeh S, Barakat KH. Applications of Molecular Dynamics Simulations in Drug Discovery. Methods Mol Biol 2024; 2714:127-141. [PMID: 37676596 DOI: 10.1007/978-1-0716-3441-7_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
In the current drug development process, molecular dynamics (MD) simulations have proven to be very useful. This chapter provides an overview of the current applications of MD simulations in drug discovery, from detecting protein druggable sites and validating drug docking outcomes to exploring protein conformations and investigating the influence of mutations on protein structure and function. In addition, this chapter emphasizes various strategies to improve the conformational sampling efficiency in molecular dynamics simulations. With growing computing power and developments in the production of force fields and MD techniques, the importance of MD simulations in helping the drug development process is projected to rise significantly in the future.
Affiliation(s)
- Sara AlRawashdeh
  - Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB, Canada
- Khaled H Barakat
  - Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB, Canada
7
Acheson K, Kirrander A. Automatic Clustering of Excited-State Trajectories: Application to Photoexcited Dynamics. J Chem Theory Comput 2023; 19:6126-6138. [PMID: 37703098 PMCID: PMC10536988 DOI: 10.1021/acs.jctc.3c00776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Indexed: 09/14/2023]
Abstract
We introduce automatic clustering as a computationally efficient tool for classifying and interpreting trajectories from simulations of photo-excited dynamics. Trajectories are treated as time-series data, with the features for clustering selected by variance mapping of normalized data. The L2-norm and dynamic time warping are proposed as suitable similarity measures for calculating the distance matrices, and these are clustered using the unsupervised density-based DBSCAN algorithm. The silhouette coefficient and the number of trajectories classified as noise are used as quality measures for the clustering. The ability of clustering to provide a rapid overview of large and complex trajectory data sets, and its utility for extracting chemical and physical insight, are demonstrated on trajectories corresponding to the photochemical ring-opening reaction of 1,3-cyclohexadiene, noting that the clustering can be used to generate reduced dimensionality representations in an unbiased manner.
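The two similarity measures named in the abstract can be sketched in a few lines. The example below is an illustrative assumption rather than the authors' code: it treats each trajectory as a single univariate time series, uses textbook dynamic time warping with an absolute-difference step cost, and the names `l2_distance`, `dtw_distance`, `distance_matrix`, and the toy `trajectories` are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("L2 distance needs series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Classic dynamic time warping cost with |x - y| as the step cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def distance_matrix(series, metric):
    """Symmetric pairwise distance matrix as a list of lists."""
    n = len(series)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = metric(series[i], series[j])
    return dist


# Toy data: the second series is the first one delayed by one step.
trajectories = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 1.0],
    [2.0, 2.0, 2.0, 2.0, 2.0],
]
l2 = distance_matrix(trajectories, l2_distance)
dtw = distance_matrix(trajectories, dtw_distance)
print(l2[0][1], dtw[0][1])
```

In this toy case the delayed copy is closer to the original under DTW than under the L2-norm, because warping absorbs the time shift. A matrix built this way could then be handed to a density-based clusterer that accepts precomputed distances, for example scikit-learn's `DBSCAN(metric="precomputed")`.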
Affiliation(s)
- Kyle Acheson
  - EaStCHEM, School of Chemistry and Centre for Science at Extreme Conditions, University of Edinburgh, David Brewster Road, Edinburgh EH9 3FJ, U.K.
  - Department of Chemistry, University of Warwick, Coventry CV4 7AL, U.K.
- Adam Kirrander
  - Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, South Parks Road, Oxford OX1 3QZ, U.K.
8
Arnittali M, Rissanou AN, Kefala A, Kokkinidis M, Harmandaris V. Structure of amino acid sequence-reversed wtRop protein: insights from atomistic molecular dynamics simulations. J Biomol Struct Dyn 2023:1-15. [PMID: 37671833 DOI: 10.1080/07391102.2023.2252903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 08/23/2023] [Indexed: 09/07/2023]
Abstract
This study investigates the advantages of designing new proteins based on a 'bias' sequence of amino acids derived from the reversed sequence of parent proteins, such as retro proteins. The structural simplicity of wtRop offers a very attractive model system to study these aspects. The current work is based on all-atom Molecular Dynamics (MD) simulations and corresponding experimental evidence on two different types of reversed wtRop protein, one with a fully reversed sequence of amino acids (rRop) and another with a partially reversed sequence (prRop), where only the five residues of the loop region (30ASP-34GLN) were not reversed. The structure of the two retro proteins is explored using various measures, highlighting the similarities to and differences from their parent protein. Two models have been studied for both reversed proteins, a dimeric and a monomeric one, with the former found to be more stable than the latter. Preferable equilibrium structures that the protein molecule can attain are explored, indicating the equilibration pathway. Simulation findings indicate a disruption of the α-helical structure and the appearance of additional secondary structures for both retro proteins. Reduced structural stability compared to their parent protein (wtRop) is also found. A corruption of the hydrophobic core is observed in the dimeric models. Furthermore, the simulation findings are consistent with the experimental characterization of prRop by circular dichroism spectroscopy (CD), which highlights an unstable, highly α-helical protein. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Maria Arnittali
  - Computation-Based Science and Technology Research Center, The Cyprus Institute, Nicosia, Cyprus
  - Institute of Applied and Computational Mathematics, Foundation for Research and Technology Hellas (FORTH), Heraklion, Greece
  - Department of Mathematics and Applied Mathematics, University of Crete, Heraklion, Crete, Greece
- Anastassia N Rissanou
  - National Hellenic Research Foundation, Theoretical and Physical Chemistry Institute, Athens, Greece
- Aikaterini Kefala
  - Institute of Molecular Biology and Biotechnology, Foundation of Research and Technology (FORTH), Heraklion, Greece
  - Department of Biology, University of Crete, Heraklion, Crete, Greece
- Michael Kokkinidis
  - Institute of Molecular Biology and Biotechnology, Foundation of Research and Technology (FORTH), Heraklion, Greece
  - Department of Biology, University of Crete, Heraklion, Crete, Greece
- Vagelis Harmandaris
  - Computation-Based Science and Technology Research Center, The Cyprus Institute, Nicosia, Cyprus
  - Institute of Applied and Computational Mathematics, Foundation for Research and Technology Hellas (FORTH), Heraklion, Greece
  - Department of Mathematics and Applied Mathematics, University of Crete, Heraklion, Crete, Greece
9
Verkhivker G, Alshahrani M, Gupta G, Xiao S, Tao P. From Deep Mutational Mapping of Allosteric Protein Landscapes to Deep Learning of Allostery and Hidden Allosteric Sites: Zooming in on "Allosteric Intersection" of Biochemical and Big Data Approaches. Int J Mol Sci 2023; 24:7747. [PMID: 37175454 PMCID: PMC10178073 DOI: 10.3390/ijms24097747] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/06/2023] [Revised: 04/22/2023] [Accepted: 04/23/2023] [Indexed: 05/15/2023] Open
Abstract
The recent advances in artificial intelligence (AI) and machine learning have driven the design of new expert systems and automated workflows that are able to model complex chemical and biological phenomena. In recent years, machine learning approaches have been developed and actively deployed to facilitate computational and experimental studies of protein dynamics and allosteric mechanisms. In this review, we discuss in detail new developments along two major directions of allosteric research through the lens of data-intensive biochemical approaches and AI-based computational methods. Despite considerable progress in applications of AI methods for protein structure and dynamics studies, the intersection between allosteric regulation, the emerging structural biology technologies and AI approaches remains largely unexplored, calling for the development of AI-augmented integrative structural biology. In this review, we focus on the latest remarkable progress in deep high-throughput mining and comprehensive mapping of allosteric protein landscapes and allosteric regulatory mechanisms as well as on the new developments in AI methods for prediction and characterization of allosteric binding sites on the proteome level. We also discuss new AI-augmented structural biology approaches that expand our knowledge of the universe of protein dynamics and allostery. We conclude with an outlook and highlight the importance of developing an open science infrastructure for machine learning studies of allosteric regulation and validation of computational approaches using integrative studies of allosteric mechanisms. The development of community-accessible tools that uniquely leverage the existing experimental and simulation knowledgebase to enable interrogation of the allosteric functions can provide a much-needed boost to further innovation and integration of experimental and computational technologies empowered by the booming AI field.
Affiliation(s)
- Gennady Verkhivker
  - Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
  - Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, USA
- Mohammed Alshahrani
  - Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Grace Gupta
  - Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Sian Xiao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75275, USA
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75275, USA
10
Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:biom12091246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
Machine learning (ML) has been an important arsenal in computational biology used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein–ligand binding, including allosteric effects, protein–protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
11
Basciu A, Callea L, Motta S, Bonvin AM, Bonati L, Vargiu AV. No dance, no partner! A tale of receptor flexibility in docking and virtual screening. Virtual Screening and Drug Docking 2022. [DOI: 10.1016/bs.armc.2022.08.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]