1
Elisée E, Ducrot L, Méheust R, Bastard K, Fossey-Jouenne A, Grogan G, Pelletier E, Petit JL, Stam M, de Berardinis V, Zaparucha A, Vallenet D, Vergne-Vaxelaire C. A refined picture of the native amine dehydrogenase family revealed by extensive biodiversity screening. Nat Commun 2024; 15:4933. [PMID: 38858403; PMCID: PMC11164908; DOI: 10.1038/s41467-024-49009-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 05/20/2024] [Indexed: 06/12/2024] Open
Abstract
Native amine dehydrogenases offer sustainable access to chiral amines, so scaffolds capable of converting more diverse carbonyl compounds must be found to reach the full potential of this alternative to conventional synthetic reductive aminations. Here we report a multidisciplinary strategy combining bioinformatics, chemoinformatics and biocatalysis to extensively screen billions of sequences in silico and to efficiently find native amine dehydrogenase features using computational approaches. In this way, we achieve a comprehensive overview of the initial native amine dehydrogenase family, extending it from 2,011 to 17,959 sequences, and identify native amine dehydrogenases with previously unreported substrate spectra, including hindered carbonyls and ethyl ketones, and accepting methylamine and cyclopropylamine as amine donors. We also present preliminary model-based structural information to inform the design of potential (R)-selective amine dehydrogenases, as native amine dehydrogenases are mostly (S)-selective. This integrated strategy paves the way for expanding the resource of other enzyme families and for highlighting enzymes with original features.
Affiliation(s)
- Eddy Elisée
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Laurine Ducrot
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Raphaël Méheust
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Karine Bastard
  - School of Pharmacy, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, 2006, Australia
- Aurélie Fossey-Jouenne
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Gideon Grogan
  - York Structural Biology Laboratory, Department of Chemistry, University of York, Heslington, York, YO10 5DD, UK
- Eric Pelletier
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Jean-Louis Petit
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Mark Stam
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Véronique de Berardinis
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Anne Zaparucha
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- David Vallenet
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Carine Vergne-Vaxelaire
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
2
Hetmann M, Parigger L, Sirelkhatim H, Stern A, Krassnigg A, Gruber K, Steinkellner G, Ruau D, Gruber CC. Folding the human proteome using BioNeMo: A fused dataset of structural models for machine learning purposes. Sci Data 2024; 11:591. [PMID: 38844754; PMCID: PMC11156891; DOI: 10.1038/s41597-024-03403-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 05/22/2024] [Indexed: 06/09/2024] Open
Abstract
Human proteins are crucial players in both health and disease. Understanding their molecular landscape is a central topic in biological research. Here, we present an extensive dataset of predicted protein structures for 42,042 distinct human proteins, including splicing variants, derived from the UniProt reference proteome UP000005640. To ensure high quality and comparability, the dataset was generated by combining the state-of-the-art modeling tools AlphaFold 2, OpenFold, and ESMFold, provided within NVIDIA's BioNeMo platform, with homology modeling using Innophore's CavitomiX platform. Our dataset is offered in both unedited and edited formats for diverse research requirements. The unedited version contains structures as generated by the different prediction methods, whereas the edited version contains refinements, including a dataset of structures without low prediction-confidence regions and structures in complex with predicted ligands based on homologs in the PDB. We are confident that this dataset represents the most comprehensive collection of human protein structures available today, facilitating diverse applications such as structure-based drug design and the prediction of protein function and interactions.
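The abstract mentions an edited dataset with low prediction-confidence regions removed but does not say how this was done. The following is only a minimal sketch of one plausible approach, assuming the per-residue confidence (pLDDT) is stored in the B-factor column of each PDB ATOM record, as AlphaFold output files do. The function names and the cutoff of 70 are hypothetical choices for illustration, not values taken from the paper.

```python
def make_atom_line(serial, atom, res, chain, resseq, plddt):
    """Build a fixed-width PDB ATOM record with pLDDT in the B-factor field.

    Hypothetical helper used only to create demo input; coordinates are zero.
    """
    return (f"ATOM  {serial:5d} {atom:<4s} {res:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


def drop_low_confidence(pdb_lines, min_plddt=70.0):
    """Return the PDB lines, omitting atoms whose pLDDT is below min_plddt.

    Assumption: the B-factor field (columns 61-66) holds pLDDT on a 0-100 scale.
    The default cutoff of 70 is an illustrative choice, not the dataset's value.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Columns 61-66 of the record are indices 60..65 in Python.
            if float(line[60:66]) < min_plddt:
                continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    demo = [
        make_atom_line(1, "CA", "ALA", "A", 1, 91.5),
        make_atom_line(2, "CA", "GLY", "A", 2, 42.0),
        "END",
    ]
    for line in drop_low_confidence(demo):
        print(line)
```

Filtering atom by atom keeps the sketch short; a real pipeline would more likely decide per residue or per contiguous segment so that structures are not fragmented.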
3
Parigger L, Krassnigg A, Grabuschnig S, Gruber K, Steinkellner G, Gruber CC. AI-assisted structural consensus-proteome prediction of human monkeypox viruses isolated within a year after the 2022 multi-country outbreak. Microbiol Spectr 2023; 11:e0231523. [PMID: 37874150; PMCID: PMC10714838; DOI: 10.1128/spectrum.02315-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 09/09/2023] [Indexed: 10/25/2023] Open
Abstract
IMPORTANCE By April 2023, the 2022 outbreak of the monkeypox virus had involved 110 countries, with 86,956 confirmed cases and 119 deaths. Understanding an emerging disease on a molecular level is essential to study infection processes and eventually guide drug discovery at an early stage. To support this, we provide the most comprehensive structural proteome of the monkeypox virus to date, which includes 210 structural models, each computed with three state-of-the-art structure prediction methods. Instead of building on a single-genome sequence, we generated our models from a consensus of 3,713 high-quality genome sequences sampled from patients within 1 year of the outbreak. Therefore, we present an average structural proteome of the currently isolated viruses, including mutational analyses with a special focus on drug-binding sites. Continued dynamic mutation monitoring within the structural proteome presented here is essential for the timely prediction of possible physiological changes in the evolving virus.
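The abstract does not describe how the consensus was derived from the 3,713 genomes. As a rough illustration of the general idea only, the sketch below builds a consensus by a column-wise majority vote over sequences that are assumed to be already aligned and of equal length. The function name and the tie-breaking behaviour (first symbol seen wins) are assumptions made for this example, not the authors' procedure.

```python
from collections import Counter


def majority_consensus(aligned_seqs):
    """Return the column-wise majority consensus of pre-aligned sequences.

    Assumptions: all sequences have the same length (already aligned), and
    ties are resolved in favour of the symbol encountered first in a column.
    """
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    consensus = []
    for column in zip(*aligned_seqs):
        # most_common(1) gives the most frequent symbol in this column.
        symbol, _count = Counter(column).most_common(1)[0]
        consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    print(majority_consensus(["ACGT", "ACGA", "TCGA"]))
```

The per-column counts computed along the way are also the natural starting point for the kind of mutational analysis the abstract mentions, since positions where the majority is weak mark variable sites.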
Affiliation(s)
- Lena Parigger
  - Innophore, Graz, Austria
  - Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Karl Gruber
  - Innophore, Graz, Austria
  - Institute of Molecular Biosciences, University of Graz, Graz, Austria
  - Austrian Centre of Industrial Biotechnology, Graz, Austria
  - Field of Excellence BioHealth, University of Graz, Graz, Austria
- Georg Steinkellner
  - Institute of Molecular Biosciences, University of Graz, Graz, Austria
  - Field of Excellence BioHealth, University of Graz, Graz, Austria
  - Innophore, San Francisco, California, USA
- Christian C. Gruber
  - Institute of Molecular Biosciences, University of Graz, Graz, Austria
  - Austrian Centre of Industrial Biotechnology, Graz, Austria
  - Field of Excellence BioHealth, University of Graz, Graz, Austria
  - Innophore, San Francisco, California, USA
4
Fang X, Bogdanov V, Davis JP, Kekenes-Huskey PM. Molecular Insights into the MLCK Activation by CaM. J Chem Inf Model 2023; 63:7487-7498. [PMID: 38016288; PMCID: PMC11070109; DOI: 10.1021/acs.jcim.3c00954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
Calmodulin (CaM) is a universal regulatory protein that modulates numerous cellular processes by using calcium (Ca²⁺) as the signal. In smooth muscle cells (SMC), one major target of CaM is myosin light chain kinase (MLCK), a kinase that phosphorylates the myosin regulatory light chain and thereby regulates cell contraction. In the absence of CaM, MLCK remains inhibited by its autoinhibitory domain (AID). While it is well established that CaM activates MLCK, the molecular interactions between these two proteins remain elusive due to the lack of structural data. In this work, we constructed a molecular model of mammalian CaM (mCaM) in complex with MLCK leveraging AlphaFold, published biochemical data, and protein-protein docking. The model, along with a strategic set of CaM mutants including an inhibitory variant, soybean CaM isoform 4 (sCaM-4), was subjected to molecular dynamics (MD) simulations. Using principal component analysis (PCA), we mapped out the transition path for the removal of the AID from the MLCK kinase domain to provide a molecular basis of MLCK activation. Additionally, we established MLCK conformations that correspond to the active and inactive states of the kinase. We showed that mCaM and sCaM-4 cause MLCK to undergo the transition to the active and inactive states, respectively. Using two structural metrics, we computed the probabilities of MLCK activation by different CaM variants, which were in good agreement with the experimental data. Distributions along these metrics revealed that different inhibitory CaM variants impair MLCK activation through unique mechanisms. We finally identified molecular contacts that contribute to the MLCK activation by CaM. Overall, we report a de novo molecular model of CaM-MLCK that provides insights into the molecular mechanism of MLCK activation by CaM. The mechanism requires effective removal of the AID while preserving an active configuration of the kinase domain. This mechanism may be shared by other MLCK isoforms and potentially other structurally similar kinases with CaM-mediated regulatory domains.
Affiliation(s)
- Xuan Fang
  - Department of Cell and Molecular Physiology, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois 60153, United States
- Vladimir Bogdanov
  - Department of Physiology and Cell Biology, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
- Jonathan P Davis
  - Department of Physiology and Cell Biology, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
- Peter M Kekenes-Huskey
  - Department of Cell and Molecular Physiology, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois 60153, United States
5
Affiliation(s)
- Maria Cristina De Rosa
  - Institute of Chemical Sciences and Technologies "Giulio Natta" (SCITEC) - CNR, L.go F. Vito 1, 00168, Rome, Italy.
- Rituraj Purohit
  - Structural Bioinformatics Lab, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, HP, 176061, India.
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India.
- Alfonso T García-Sosa
  - Department of Molecular Technology, Institute of Chemistry, University of Tartu, Ravila 14a, 50411, Tartu, Estonia.
6
Ghorbanali Z, Zare-Mirakabad F, Salehi N, Akbari M, Masoudi-Nejad A. DrugRep-HeSiaGraph: when heterogenous siamese neural network meets knowledge graphs for drug repurposing. BMC Bioinformatics 2023; 24:374. [PMID: 37789314; PMCID: PMC10548718; DOI: 10.1186/s12859-023-05479-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 09/12/2023] [Indexed: 10/05/2023] Open
Abstract
BACKGROUND Drug repurposing is an approach that holds promise for identifying new therapeutic uses for existing drugs. Recently, knowledge graphs have emerged as significant tools for addressing the challenges of drug repurposing. However, there are still major issues with constructing and embedding knowledge graphs. RESULTS This study proposes a two-step method called DrugRep-HeSiaGraph to address these challenges. The method integrates the drug-disease knowledge graph with the application of a heterogeneous siamese neural network. In the first step, a drug-disease knowledge graph named DDKG-V1 is constructed by defining new relationship types, and then numerical vector representations for the nodes are created using the distributional learning method. In the second step, a heterogeneous siamese neural network called HeSiaNet is applied to enrich the embedding of drugs and diseases by bringing them closer in a new unified latent space. Then, it predicts potential drug candidates for diseases. DrugRep-HeSiaGraph achieves impressive performance metrics, including an AUC-ROC of 91.16%, an AUC-PR of 90.32%, an accuracy of 84.63%, a BS of 0.119, and an MCC of 69.31%. CONCLUSION We demonstrate the effectiveness of the proposed method in identifying potential drugs for COVID-19 as a case study. In addition, this study shows the role of dipeptidyl peptidase 4 (DPP-4) as a potential receptor for SARS-CoV-2 and the effectiveness of DPP-4 inhibitors in facing COVID-19. This highlights the practical application of the model in addressing real-world challenges in the field of drug repurposing. The code and data for DrugRep-HeSiaGraph are publicly available at https://github.com/CBRC-lab/DrugRep-HeSiaGraph .
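Two of the reported metrics can be made concrete with a small worked example. The sketch below computes the Matthews correlation coefficient (MCC) and the Brier score from their standard definitions, assuming that "BS" in the abstract denotes the Brier score. The function names, the 0.5 decision threshold, and the toy labels are illustrative assumptions and are unrelated to the authors' code or data.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def matthews_corrcoef(y_true, y_prob, threshold=0.5):
    """MCC of the binary predictions obtained by thresholding y_prob.

    The 0.5 threshold is an assumption for this example. Returns 0.0 when the
    denominator is zero (a common convention for the degenerate case).
    """
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    labels = [1, 1, 0, 0]            # toy drug-disease pairs: 1 = known link
    scores = [0.9, 0.6, 0.4, 0.1]    # toy predicted probabilities
    print("Brier score:", brier_score(labels, scores))
    print("MCC:", matthews_corrcoef(labels, scores))
```

The MCC ranges from -1 to 1 and the Brier score from 0 (best) to 1, so the abstract's "MCC of 69.31%" presumably corresponds to a coefficient of about 0.69 expressed as a percentage.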
Affiliation(s)
- Zahra Ghorbanali
  - Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Fatemeh Zare-Mirakabad
  - Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Najmeh Salehi
  - School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Mohammad Akbari
  - Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Ali Masoudi-Nejad
  - Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran