1
Alhumaid NK, Tawfik EA. Reliability of AlphaFold2 Models in Virtual Drug Screening: A Focus on Selected Class A GPCRs. Int J Mol Sci 2024; 25:10139. [PMID: 39337622 PMCID: PMC11432040 DOI: 10.3390/ijms251810139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Revised: 09/19/2024] [Accepted: 09/19/2024] [Indexed: 09/30/2024] Open
Abstract
Protein three-dimensional (3D) structure prediction is one of the most challenging issues in the field of computational biochemistry, which has overwhelmed scientists for almost half a century. A significant breakthrough in structural biology has been established by developing the artificial intelligence (AI) system AlphaFold2 (AF2). The AF2 system provides a state-of-the-art prediction of protein structures from nearly all known protein sequences with high accuracy. This study examined the reliability of AF2 models compared to the experimental structures in drug discovery, focusing on one of the most common protein drug-targeted classes known as G protein-coupled receptors (GPCRs) class A. A total of 32 representative protein targets were selected, including experimental structures of X-ray crystallographic and Cryo-EM structures and their corresponding AF2 models. The quality of AF2 models was assessed using different structure validation tools, including the pLDDT score, RMSD value, MolProbity score, percentage of Ramachandran favored, QMEAN Z-score, and QMEANDisCo Global. The molecular docking was performed using the Genetic Optimization for Ligand Docking (GOLD) software. The AF2 models' reliability in virtual drug screening was determined by their ability to predict the ligand binding poses closest to the native binding pose by assessing the Root Mean Square Deviation (RMSD) metric and docking scoring function. The quality of the docking and scoring function was evaluated using the enrichment factor (EF). Furthermore, the capability of using AF2 models in molecular docking to identify hits with key protein-ligand interactions was analyzed. The posing power results showed that the AF2 models successfully predicted ligand binding poses (RMSD < 2 Å). However, they exhibited lower screening power, with average EF values of 2.24, 2.42, and 1.82 for X-ray, Cryo-EM, and AF2 structures, respectively. 
Moreover, our study revealed that molecular docking using AF2 models can identify competitive inhibitors. In conclusion, this study found that AF2 models provided docking results comparable to experimental structures, particularly for certain GPCR targets, and could potentially significantly impact drug discovery.
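The two metrics named in the abstract above, pose RMSD and the enrichment factor (EF), can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the plain-tuple coordinate format, and the `fraction` cutoff are assumptions made for illustration, and the RMSD is computed without any superposition step. Only the 2 Å success threshold is taken from the abstract.

```python
import math

# Success threshold for a docked pose, as quoted in the abstract (RMSD < 2 Å).
POSE_RMSD_CUTOFF = 2.0


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates.

    Assumes both poses are already in the same reference frame
    (no superposition is performed here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords_a))


def enrichment_factor(ranked_is_active, fraction):
    """EF at the top `fraction` of a score-ranked compound list.

    ranked_is_active: booleans ordered from best to worst docking score,
    True where the compound is a known active.
    EF = (actives in top subset / subset size) / (all actives / library size).
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, int(round(fraction * n_total)))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# Hypothetical toy data: two atoms shifted by 1 Å along z, and a ranked
# library of ten compounds whose two actives are both ranked first.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
ranking = [True, True] + [False] * 8

pose_rmsd = rmsd(native, docked)
pose_ok = pose_rmsd < POSE_RMSD_CUTOFF
ef_top20 = enrichment_factor(ranking, 0.2)
```

An EF of 1 corresponds to random selection, so the averages reported in the abstract (2.24, 2.42, and 1.82) indicate modest enrichment over random for all three structure types.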
Affiliation(s)
- Nada K Alhumaid
  - Advanced Diagnostics and Therapeutics Institute, Health Sector, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia
- Essam A Tawfik
  - Advanced Diagnostics and Therapeutics Institute, Health Sector, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia
2
Correa Marrero M, Jänes J, Baptista D, Beltrao P. Integrating Large-Scale Protein Structure Prediction into Human Genetics Research. Annu Rev Genomics Hum Genet 2024; 25:123-140. [PMID: 38621234 DOI: 10.1146/annurev-genom-120622-020615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024]
Abstract
The last five years have seen impressive progress in deep learning models applied to protein research. Most notably, sequence-based structure predictions have seen transformative gains in the form of AlphaFold2 and related approaches. Millions of missense protein variants in the human population lack annotations, and these computational methods are a valuable means to prioritize variants for further analysis. Here, we review the recent progress in deep learning models applied to the prediction of protein structure and protein variants, with particular emphasis on their implications for human genetics and health. Improved prediction of protein structures facilitates annotations of the impact of variants on protein stability, protein-protein interaction interfaces, and small-molecule binding pockets. Moreover, it contributes to the study of host-pathogen interactions and the characterization of protein function. As genome sequencing in large cohorts becomes increasingly prevalent, we believe that better integration of state-of-the-art protein informatics technologies into human genetics research is of paramount importance.
Affiliation(s)
- Miguel Correa Marrero
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Jürgen Jänes
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Pedro Beltrao
  - Instituto Gulbenkian de Ciência, Oeiras, Portugal
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
3
Agarwal V, McShan AC. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat Chem Biol 2024; 20:950-959. [PMID: 38907110 DOI: 10.1038/s41589-024-01638-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 04/29/2024] [Indexed: 06/23/2024]
Abstract
Artificial intelligence-driven advances in protein structure prediction in recent years have raised the question: has the protein structure-prediction problem been solved? Here, with a focus on nonglobular proteins, we highlight the many strengths and potential weaknesses of DeepMind's AlphaFold2 in the context of its biological and therapeutic applications. We summarize the subtleties associated with evaluation of AlphaFold2 model quality and reliability using the predicted local distance difference test (pLDDT) and predicted aligned error (PAE) values. We highlight various classes of proteins that AlphaFold2 can be applied to and the caveats involved. Concrete examples of how AlphaFold2 models can be integrated with experimental data in the form of small-angle X-ray scattering (SAXS), solution NMR, cryo-electron microscopy (cryo-EM) and X-ray diffraction are discussed. Finally, we highlight the need to move beyond structure prediction of rigid, static structural snapshots toward conformational ensembles and alternate biologically relevant states. The overarching theme is that careful consideration is due when using AlphaFold2-generated models to generate testable hypotheses and structural models, rather than treating predicted models as de facto ground truth structures.
Affiliation(s)
- Vinayak Agarwal
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Andrew C McShan
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
4
Urvas L, Chiesa L, Bret G, Jacquemard C, Kellenberger E. Benchmarking AlphaFold-Generated Structures of Chemokine-Chemokine Receptor Complexes. J Chem Inf Model 2024; 64:4587-4600. [PMID: 38809680 DOI: 10.1021/acs.jcim.3c01835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024]
Abstract
AlphaFold and AlphaFold-Multimer have become two essential tools for the modeling of unknown structures of proteins and protein complexes. In this work, we extensively benchmarked the quality of chemokine-chemokine receptor structures generated by AlphaFold-Multimer against experimentally determined structures. Our analysis considered both the global quality of the model, as well as key structural features for chemokine recognition. To study the effects of template and multiple sequence alignment parameters on the results, a new prediction pipeline called LIT-AlphaFold (https://github.com/LIT-CCM-lab/LIT-AlphaFold) was developed, allowing extensive input customization. AlphaFold-Multimer correctly predicted differences in chemokine binding orientation and accurately reproduced the unique binding orientation of the CXCL12-ACKR3 complex. Further, the predictions of the full receptor N-terminus provided insights into a putative chemokine recognition site 0.5. The accuracy of chemokine N-terminus binding mode prediction varied between complexes, but the confidence score permitted the distinguishing of residues that were very likely well positioned. Finally, we generated a high-confidence model of the unsolved CXCL12-CXCR4 complex, which agreed with experimental mutagenesis and cross-linking data.
Affiliation(s)
- Lauri Urvas
  - Laboratoire d'Innovation Thérapeutique, UMR 7200 CNRS, Université de Strasbourg, 67400 Illkirch, France
- Luca Chiesa
  - Laboratoire d'Innovation Thérapeutique, UMR 7200 CNRS, Université de Strasbourg, 67400 Illkirch, France
- Guillaume Bret
  - Laboratoire d'Innovation Thérapeutique, UMR 7200 CNRS, Université de Strasbourg, 67400 Illkirch, France
- Célien Jacquemard
  - Laboratoire d'Innovation Thérapeutique, UMR 7200 CNRS, Université de Strasbourg, 67400 Illkirch, France
- Esther Kellenberger
  - Laboratoire d'Innovation Thérapeutique, UMR 7200 CNRS, Université de Strasbourg, 67400 Illkirch, France
5
Duignan TT. The Potential of Neural Network Potentials. ACS Physical Chemistry Au 2024; 4:232-241. [PMID: 38800721 PMCID: PMC11117678 DOI: 10.1021/acsphyschemau.4c00004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/04/2024] [Accepted: 03/05/2024] [Indexed: 05/29/2024]
Abstract
In the next half-century, physical chemistry will likely undergo a profound transformation, driven predominantly by the combination of recent advances in quantum chemistry and machine learning (ML). Specifically, equivariant neural network potentials (NNPs) are a breakthrough new tool that are already enabling us to simulate systems at the molecular scale with unprecedented accuracy and speed, relying on nothing but fundamental physical laws. The continued development of this approach will realize Paul Dirac's 80-year-old vision of using quantum mechanics to unify physics with chemistry and providing invaluable tools for understanding materials science, biology, earth sciences, and beyond. The era of highly accurate and efficient first-principles molecular simulations will provide a wealth of training data that can be used to build automated computational methodologies, using tools such as diffusion models, for the design and optimization of systems at the molecular scale. Large language models (LLMs) will also evolve into increasingly indispensable tools for literature review, coding, idea generation, and scientific writing.
6
Pun MN, Ivanov A, Bellamy Q, Montague Z, LaMont C, Bradley P, Otwinowski J, Nourmohammad A. Learning the shape of protein microenvironments with a holographic convolutional neural network. Proc Natl Acad Sci U S A 2024; 121:e2300838121. [PMID: 38300863 PMCID: PMC10861886 DOI: 10.1073/pnas.2300838121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 11/29/2023] [Indexed: 02/03/2024] Open
Abstract
Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from its sequence or structure remains a major challenge. Here, we introduce holographic convolutional neural network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide design of novel proteins with desired function.
Affiliation(s)
- Michael N. Pun
  - Department of Physics, University of Washington, Seattle, WA 98195
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Andrew Ivanov
  - Department of Physics, University of Washington, Seattle, WA 98195
- Quinn Bellamy
  - Department of Physics, University of Washington, Seattle, WA 98195
- Zachary Montague
  - Department of Physics, University of Washington, Seattle, WA 98195
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Colin LaMont
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Philip Bradley
  - Fred Hutchinson Cancer Center, Seattle, WA 98102
  - Department of Biochemistry, University of Washington, Seattle, WA 98195
  - Institute for Protein Design, University of Washington, Seattle, WA 98195
- Jakub Otwinowski
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
  - Dyno Therapeutics, Watertown, MA 02472
- Armita Nourmohammad
  - Department of Physics, University of Washington, Seattle, WA 98195
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
  - Fred Hutchinson Cancer Center, Seattle, WA 98102
  - Department of Applied Mathematics, University of Washington, Seattle, WA 98105
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195
7
Versini R, Sritharan S, Aykac Fas B, Tubiana T, Aimeur SZ, Henri J, Erard M, Nüsse O, Andreani J, Baaden M, Fuchs P, Galochkina T, Chatzigoulas A, Cournia Z, Santuz H, Sacquin-Mora S, Taly A. A Perspective on the Prospective Use of AI in Protein Structure Prediction. J Chem Inf Model 2024; 64:26-41. [PMID: 38124369 DOI: 10.1021/acs.jcim.3c01361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
AlphaFold2 (AF2) and RoseTTaFold (RF) have revolutionized structural biology, serving as highly reliable and effective methods for predicting protein structures. This article explores their impact and limitations, focusing on their integration into experimental pipelines and their application in diverse protein classes, including membrane proteins, intrinsically disordered proteins (IDPs), and oligomers. In experimental pipelines, AF2 models help X-ray crystallography in resolving the phase problem, while complementarity with mass spectrometry and NMR data enhances structure determination and protein flexibility prediction. Predicting the structure of membrane proteins remains challenging for both AF2 and RF due to difficulties in capturing conformational ensembles and interactions with the membrane. Improvements in incorporating membrane-specific features and predicting the structural effect of mutations are crucial. For intrinsically disordered proteins, AF2's confidence score (pLDDT) serves as a competitive disorder predictor, but integrative approaches including molecular dynamics (MD) simulations or hydrophobic cluster analyses are advocated for accurate dynamics representation. AF2 and RF show promising results for oligomeric models, outperforming traditional docking methods, with AlphaFold-Multimer showing improved performance. However, some caveats remain in particular for membrane proteins. Real-life examples demonstrate AF2's predictive capabilities in unknown protein structures, but models should be evaluated for their agreement with experimental data. Furthermore, AF2 models can be used complementarily with MD simulations. In this Perspective, we propose a "wish list" for improving deep-learning-based protein folding prediction models, including using experimental data as constraints and modifying models with binding partners or post-translational modifications. 
Additionally, a meta-tool for ranking and suggesting composite models is suggested, driving future advancements in this rapidly evolving field.
Affiliation(s)
- Raphaelle Versini
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Sujith Sritharan
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Burcu Aykac Fas
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Thibault Tubiana
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Sana Zineb Aimeur
  - Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Julien Henri
  - Sorbonne Université, CNRS, Laboratoire de Biologie Computationnelle et Quantitative UMR 7238, Institut de Biologie Paris-Seine, 4 Place Jussieu, F-75005 Paris, France
- Marie Erard
  - Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Oliver Nüsse
  - Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Jessica Andreani
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Marc Baaden
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Patrick Fuchs
  - Sorbonne Université, École Normale Supérieure, PSL University, CNRS, Laboratoire des Biomolécules, LBM, 75005 Paris, France
  - Université de Paris, UFR Sciences du Vivant, 75013 Paris, France
- Tatiana Galochkina
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Alexios Chatzigoulas
  - Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece
- Zoe Cournia
  - Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece
- Hubert Santuz
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Sophie Sacquin-Mora
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Antoine Taly
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
8
Suskiewicz MJ, Munnur D, Strømland Ø, Yang JC, Easton L, Chatrin C, Zhu K, Baretić D, Goffinont S, Schuller M, Wu WF, Elkins J, Ahel D, Sanyal S, Neuhaus D, Ahel I. Updated protein domain annotation of the PARP protein family sheds new light on biological function. Nucleic Acids Res 2023; 51:8217-8236. [PMID: 37326024 PMCID: PMC10450202 DOI: 10.1093/nar/gkad514] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 02/10/2023] [Revised: 05/09/2023] [Accepted: 06/03/2023] [Indexed: 06/17/2023] Open
Abstract
AlphaFold2 and related computational tools have greatly aided studies of structural biology through their ability to accurately predict protein structures. In the present work, we explored AF2 structural models of the 17 canonical members of the human PARP protein family and supplemented this analysis with new experiments and an overview of recent published data. PARP proteins are typically involved in the modification of proteins and nucleic acids through mono or poly(ADP-ribosyl)ation, but this function can be modulated by the presence of various auxiliary protein domains. Our analysis provides a comprehensive view of the structured domains and long intrinsically disordered regions within human PARPs, offering a revised basis for understanding the function of these proteins. Among other functional insights, the study provides a model of PARP1 domain dynamics in the DNA-free and DNA-bound states and enhances the connection between ADP-ribosylation and RNA biology and between ADP-ribosylation and ubiquitin-like modifications by predicting putative RNA-binding domains and E2-related RWD domains in certain PARPs. In line with the bioinformatic analysis, we demonstrate for the first time PARP14's RNA-binding capability and RNA ADP-ribosylation activity in vitro. While our insights align with existing experimental data and are probably accurate, they need further validation through experiments.
Affiliation(s)
- Deeksha Munnur
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Øyvind Strømland
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
  - Department of Biomedicine, University of Bergen, Bergen, Norway
- Ji-Chun Yang
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Laura E Easton
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Chatrin Chatrin
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Kang Zhu
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Domagoj Baretić
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Marion Schuller
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Wing-Fung Wu
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Jonathan M Elkins
  - Centre for Medicines Discovery, University of Oxford, Oxford OX3 7DQ, UK
- Dragana Ahel
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Sumana Sanyal
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- David Neuhaus
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Ivan Ahel
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
9
Adhav V, Saikrishnan K. The Realm of Unconventional Noncovalent Interactions in Proteins: Their Significance in Structure and Function. ACS Omega 2023; 8:22268-22284. [PMID: 37396257 PMCID: PMC10308531 DOI: 10.1021/acsomega.3c00205] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 01/11/2023] [Accepted: 05/22/2023] [Indexed: 07/04/2023]
Abstract
Proteins and their assemblies are fundamental for living cells to function. Their complex three-dimensional architecture and its stability are attributed to the combined effect of various noncovalent interactions. It is critical to scrutinize these noncovalent interactions to understand their role in the energy landscape in folding, catalysis, and molecular recognition. This Review presents a comprehensive summary of unconventional noncovalent interactions, beyond conventional hydrogen bonds and hydrophobic interactions, which have gained prominence over the past decade. The noncovalent interactions discussed include low-barrier hydrogen bonds, C5 hydrogen bonds, C-H···π interactions, sulfur-mediated hydrogen bonds, n → π* interactions, London dispersion interactions, halogen bonds, chalcogen bonds, and tetrel bonds. This Review focuses on their chemical nature, interaction strength, and geometrical parameters obtained from X-ray crystallography, spectroscopy, bioinformatics, and computational chemistry. Also highlighted are their occurrence in proteins or their complexes and recent advances made toward understanding their role in biomolecular structure and function. Probing the chemical diversity of these interactions, we determined that the variable frequency of occurrence in proteins and the ability to synergize with one another are important not only for ab initio structure prediction but also to design proteins with new functionalities. A better understanding of these interactions will promote their utilization in designing and engineering ligands with potential therapeutic value.
Affiliation(s)
- Vishal Annasaheb Adhav
  - Department of Biology, Indian Institute of Science Education and Research, Pune 411008, India
- Kayarat Saikrishnan
  - Department of Biology, Indian Institute of Science Education and Research, Pune 411008, India
10
Bhatia H, Aydin F, Carpenter TS, Lightstone FC, Bremer PT, Ingólfsson HI, Nissley DV, Streitz FH. The confluence of machine learning and multiscale simulations. Curr Opin Struct Biol 2023; 80:102569. [PMID: 36966691 DOI: 10.1016/j.sbi.2023.102569] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/14/2022] [Revised: 01/31/2023] [Accepted: 02/08/2023] [Indexed: 06/04/2023]
Abstract
Multiscale modeling has a long history of use in structural biology, as computational biologists strive to overcome the time- and length-scale limits of atomistic molecular dynamics. Contemporary machine learning techniques, such as deep learning, have promoted advances in virtually every field of science and engineering and are revitalizing the traditional notions of multiscale modeling. Deep learning has found success in various approaches for distilling information from fine-scale models, such as building surrogate models and guiding the development of coarse-grained potentials. However, perhaps its most powerful use in multiscale modeling is in defining latent spaces that enable efficient exploration of conformational space. This confluence of machine learning and multiscale simulation with modern high-performance computing promises a new era of discovery and innovation in structural biology.
Affiliation(s)
- Harsh Bhatia
  - Computing Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA. https://twitter.com/@harshbhatia85
- Fikret Aydin
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
- Timothy S Carpenter
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
- Felice C Lightstone
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
- Peer-Timo Bremer
  - Computing Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
- Helgi I Ingólfsson
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
- Dwight V Nissley
  - RAS Initiative, The Cancer Research Technology Program, Frederick National Laboratory, Frederick, MD, 21701, USA
- Frederick H Streitz
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
11
Veličković P. Everything is connected: Graph neural networks. Curr Opin Struct Biol 2023; 79:102538. [PMID: 36764042 DOI: 10.1016/j.sbi.2023.102538] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 10/30/2022] [Revised: 12/28/2022] [Accepted: 01/03/2023] [Indexed: 02/11/2023]
Abstract
In many ways, graphs are the main modality of data we receive from nature. This is due to the fact that most of the patterns we see, both in natural and artificial systems, are elegantly representable using the language of graph structures. Prominent examples include molecules (represented as graphs of atoms and bonds), social networks and transportation networks. This potential has already been seen by key scientific and industrial groups, with already-impacted application areas including traffic forecasting, drug discovery, social network analysis and recommender systems. Further, some of the most successful domains of application for machine learning in previous years-images, text and speech processing-can be seen as special cases of graph representation learning, and consequently there has been significant exchange of information between these areas. The main aim of this short survey is to enable the reader to assimilate the key concepts in the area, and position graph representation learning in a proper context with related fields.
Affiliation(s)
- Petar Veličković
  - DeepMind, 6 Pancras Square, London, N1C 4AG, Greater London, UK
  - Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, Cambridgeshire, UK
12
Bertoline LMF, Lima AN, Krieger JE, Teixeira SK. Before and after AlphaFold2: An overview of protein structure prediction. Frontiers in Bioinformatics 2023; 3:1120370. [PMID: 36926275 PMCID: PMC10011655 DOI: 10.3389/fbinf.2023.1120370] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Received: 12/09/2022] [Accepted: 02/17/2023] [Indexed: 03/08/2023] Open
Abstract
Three-dimensional protein structure is directly correlated with its function and its determination is critical to understanding biological processes and addressing human health and life science problems in general. Although new protein structures are experimentally obtained over time, there is still a large difference between the number of protein sequences placed in Uniprot and those with resolved tertiary structure. In this context, studies have emerged to predict protein structures by methods based on a template or free modeling. In the last years, different methods have been combined to overcome their individual limitations, until the emergence of AlphaFold2, which demonstrated that predicting protein structure with high accuracy at unprecedented scale is possible. Despite its current impact in the field, AlphaFold2 has limitations. Recently, new methods based on protein language models have promised to revolutionize the protein structural biology allowing the discovery of protein structure and function only from evolutionary patterns present on protein sequence. Even though these methods do not reach AlphaFold2 accuracy, they already covered some of its limitations, being able to predict with high accuracy more than 200 million proteins from metagenomic databases. In this mini-review, we provide an overview of the breakthroughs in protein structure prediction before and after AlphaFold2 emergence.
|
13
|
Soleymani F, Paquet E, Viktor HL, Michalowski W, Spinello D. ProtInteract: A deep learning framework for predicting protein-protein interactions. Comput Struct Biotechnol J 2023; 21:1324-1348. [PMID: 36817951 PMCID: PMC9929211 DOI: 10.1016/j.csbj.2023.01.028] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/05/2022] [Revised: 01/20/2023] [Accepted: 01/20/2023] [Indexed: 01/26/2023] Open
Abstract
Proteins mainly perform their functions by interacting with other proteins. Protein-protein interactions underpin various biological activities such as metabolic cycles, signal transduction, and immune response. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. We therefore developed the ProtInteract framework to predict protein-protein interaction. ProtInteract comprises two components: first, a novel autoencoder architecture that encodes each protein's primary structure to a lower-dimensional vector while preserving its underlying sequence attributes. This leads to faster training of the second network, a deep convolutional neural network (CNN) that receives encoded proteins and predicts their interaction under three different scenarios. In each scenario, the deep CNN predicts the class of a given encoded protein pair. Each class indicates different ranges of confidence scores corresponding to the probability of whether a predicted interaction occurs or not. The proposed framework features significantly low computational complexity and relatively fast response. The contributions of this work are twofold. First, ProtInteract assimilates the protein's primary structure into a pseudo-time series. Therefore, we leverage the nature of the time series of proteins and their physicochemical properties to encode a protein's amino acid sequence into a lower-dimensional vector space. This approach enables extracting highly informative sequence attributes while reducing computational complexity. Second, the ProtInteract framework utilises this information to identify protein interactions with other proteins based on its amino acid configuration. Our results suggest that the proposed framework performs with high accuracy and efficiency in predicting protein-protein interactions.
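As a loose illustration of the "pseudo-time series" idea described in this abstract, the Python sketch below maps each residue of a sequence to one physicochemical value and smooths the result with a sliding window. The Kyte-Doolittle hydropathy scale, the function names, the window size, and the toy sequence are all assumptions made for illustration; the abstract does not state which properties or which encoding ProtInteract actually uses, and this is not its autoencoder.

```python
# Illustrative sketch only: NOT the ProtInteract encoder.
# Assumption: hydropathy (Kyte-Doolittle scale) stands in for the
# unspecified "physicochemical properties" mentioned in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_property_series(sequence):
    """Map each residue to its hydropathy value, preserving sequence order."""
    try:
        return [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None


def sliding_mean(values, window=3):
    """Average over consecutive windows, treating position as 'time'."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical toy sequence, not taken from the paper.
series = to_property_series("MKTAYIAK")
smoothed = sliding_mean(series, window=3)
print(series)
print(smoothed)
```

A learned encoder such as the one the abstract describes would replace the fixed lookup table and the moving average; the sketch only shows how an amino acid sequence can be treated as an ordered numeric signal.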
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada. Corresponding author.
- Herna Lydia Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON K1N 6N5, Canada
- Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
|
14
|
Boonyakida J, Khoris IM, Nasrin F, Park EY. Improvement of Modular Protein Display Efficiency in SpyTag-Implemented Norovirus-like Particles. Biomacromolecules 2023; 24:308-318. [PMID: 36475654 DOI: 10.1021/acs.biomac.2c01150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Abstract
Genetic fusion and chemical conjugation are the most common approaches for displaying a foreign protein on the surface of virus-like particles (VLPs); however, these methods may negatively affect the formation and stability of VLPs. Here, we aimed to develop a modular display platform for protein decoration on norovirus-like particles (NoV-LPs) by combining the NoV-LP scaffold with the SpyTag/SpyCatcher bioconjugation system, as the NoV-LP is an attractive protein nanoparticle to carry foreign proteins for various applications. The SpyTagged-NoV-LPs were prepared by introducing SpyTag peptide into the C-terminus of the norovirus VP1 protein. To increase surface exposure of the SpyTag peptide on the NoV-LPs, two or three repeated extension linkers (EAAAK) were inserted between the SpyTag peptide and VP1 protein. Fluorescence proteins, EGFP and mCherry, were fused to SpyCatcher and employed as SpyTag conjugation partners. These VP1-SpyTag variants and SpyCatcher-fused EGFP and mCherry were separately expressed in silkworm fat bodies and purified. This study reveals that adding an extension linker did not disrupt the VLP formation; instead, it increased the particle size by 4-6 nm. The conjugation efficiency of the VP1-SpyTag variants with the extended linker improved from ∼15-35 to ∼50-63% based on the densitometric analysis, while it was up to 77% based on an optical quantification of EGFP and mCherry. Results indicate that the linker causes the SpyTag peptides to be positioned further away from the C-termini of VP1 and potentially increases the exposure of the SpyTag to the outer surface of the NoV-LPs, allowing more SpyTag/SpyCatcher complex formation on the VLP surface. Our study provides a strategy for enhancing the conjugation efficiency of NoV-LP and demonstrates the platform's utility for developing vaccines or functional nanoparticles.
Affiliation(s)
- Jirayu Boonyakida
- Research Institute of Green Science and Technology, Shizuoka University, 836 Ohya, Suruga ward, Shizuoka 422-8529, Japan
- Indra Memdi Khoris
- Research Institute of Green Science and Technology, Shizuoka University, 836 Ohya, Suruga ward, Shizuoka 422-8529, Japan
- Fahmida Nasrin
- Research Institute of Green Science and Technology, Shizuoka University, 836 Ohya, Suruga ward, Shizuoka 422-8529, Japan
- Enoch Y Park
- Research Institute of Green Science and Technology, Shizuoka University, 836 Ohya, Suruga ward, Shizuoka 422-8529, Japan
|
15
|
Li J, Wang H, Zhu J, Yang Q, Luan Y, Shi L, Molina-Mora JA, Zheng Y. De novo assembly of a chromosome-level reference genome of the ornamental butterfly Sericinus montelus based on nanopore sequencing and Hi-C analysis. Front Genet 2023; 14:1107353. [PMID: 36968580 PMCID: PMC10030965 DOI: 10.3389/fgene.2023.1107353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 02/27/2023] [Indexed: 03/29/2023] Open
Abstract
Sericinus montelus (Lepidoptera, Papilionidae, Parnassiinae) is a high-value ornamental swallowtail butterfly species widely distributed in Northern and Central China, Japan, Korea, and Russia. The larval stage of this species feeds exclusively on Aristolochia plants. The Aristolochia species is well known for its high levels of aristolochic acids (AAs), which have been found to be carcinogenic for numerous animals. The swallowtail butterfly is among the few that can feed on these toxic host plants. However, the genetic adaptation of S. montelus to confer new abilities for AA tolerance has not yet been well explored, largely due to the limited genomic resources of this species. This study aimed to present a chromosome-level reference genome for S. montelus using the Oxford Nanopore long-read sequencing, Illumina short-read sequencing, and Hi-C technology. The final assembly was composed of 581.44 Mb with an expected genome size of 619.27 Mb. Further, 99.98% of the bases could be anchored onto 30 chromosomes. The N50 of contigs and scaffolds was 5.74 and 19.12 Mb, respectively. Approximately 48.86% of the assembled genome was suggested to be repeat elements, and 13,720 protein-coding genes were predicted in the current assembly. The phylogenetic analysis indicated that S. montelus diverged from the common ancestor of swallowtails about 58.57-80.46 million years ago. Compared with related species, S. montelus showed a significant expansion of P450 gene family members, and positive selections on eloa, heatr1, and aph1a resulted in the AA tolerance for S. montelus larva. The de novo assembly of a high-quality reference genome for S. montelus provided a fundamental genomic tool for future research on evolution, genome genetics, and toxicology of the swallowtail butterflies.
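The contig and scaffold N50 values quoted in this abstract follow the standard definition: the length L such that sequences of length L or longer together cover at least half of the total assembly. A minimal Python sketch of that calculation is given here; the function name and the toy contig lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not values from the paper.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))  # total is 20; 6 + 5 = 11 >= 10, so this prints 5
```

Because the statistic depends only on the length distribution, a high N50 indicates contiguity but says nothing about whether the sequences are assembled correctly.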
Affiliation(s)
- Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Grandomics Biosciences Institute, Wuhan, China
- Haiyan Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Qi Yang
- Grandomics Biosciences Institute, Wuhan, China
- Yang Luan
- Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Cancer Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- José Arturo Molina-Mora
- Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
- *Correspondence: José Arturo Molina-Mora; Yuanting Zheng
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
|
16
|
Nallasamy V, Seshiah M. Energy Profile Bayes and Thompson Optimized Convolutional Neural Network protein structure prediction. Neural Comput Appl 2023; 35:1983-2006. [PMID: 36245797 PMCID: PMC9542649 DOI: 10.1007/s00521-022-07868-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/21/2022] [Indexed: 01/12/2023]
Abstract
In living organisms, proteins are considered the executants of biological functions. Owing to the pivotal role it plays in protein folding patterns, comprehension of protein structure is a challenging issue. Moreover, owing to the large number of protein sequences in protein data banks and the complexity of protein structures, experimental methods are found to be inadequate for protein structural class prediction. Hence, it is advantageous to design a reliable computational method to predict protein structural classes from protein sequences. In recent years there has been growing interest in using deep learning to assist protein structure prediction, as protein structure prediction models can be used to screen a large number of novel sequences. In this regard, we propose a model employing Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function, called the Energy Profile Legion-Class Bayes Protein Structure Identification model. Following this, we use a Thompson Optimized convolutional neural network to extract features between amino acids, and then the Thompson Optimized SoftMax function is employed to extract associations between protein sequences for predicting secondary protein structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method was tested on distinct protein data and was compared to the state-of-the-art methods, the Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation sites prediction tool called a Computational Framework, the Deep Learning and a distance-based protein structure prediction using deep learning. Results obtained with the Biopython tool were measured with respect to protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision. The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.
Affiliation(s)
- Varanavasi Nallasamy
- Cognizant Technology Solutions Pvt. Ltd, CHIL SEZ IT Park, Keeranatham, Saravanam Patti, Coimbatore, Tamil Nadu 641035, India
- Malarvizhi Seshiah
- Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Namakkal, Tamil Nadu, India
|
17
|
Dephospho-Coenzyme A Kinase Is an Exploitable Drug Target against Plasmodium falciparum: Identification of Selective Inhibitors by High-Throughput Screening of a Large Chemical Compound Library. Antimicrob Agents Chemother 2022; 66:e0042022. [DOI: 10.1128/aac.00420-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
Malaria is a mosquito-borne fatal infectious disease that affects humans and is caused by Plasmodium parasites, primarily Plasmodium falciparum. Widespread drug resistance compels us to discover novel compounds and alternative drug discovery targets.
|
18
|
Nussinov R, Zhang M, Liu Y, Jang H. AlphaFold, Artificial Intelligence (AI), and Allostery. J Phys Chem B 2022; 126:6372-6383. [PMID: 35976160 PMCID: PMC9442638 DOI: 10.1021/acs.jpcb.2c04346] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 06/22/2022] [Revised: 08/03/2022] [Indexed: 02/08/2023]
Abstract
AlphaFold has burst into our lives. A powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly overview AI in structural biology, including in molecular dynamics simulations and prediction of microbiota-human protein-protein interactions. We highlight the advancements accomplished by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery, which are rooted in ensembles, and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities. Since AlphaFold generates single ranked structures, rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations nor of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Mingzhen Zhang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, Maryland 21702, United States
- Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
|
19
|
Computation-Aided Design of Albumin Affibody-Inserted Antibody Fragment for the Prolonged Serum Half-Life. Pharmaceutics 2022; 14:1769. [PMID: 36145517 PMCID: PMC9500697 DOI: 10.3390/pharmaceutics14091769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 08/17/2022] [Accepted: 08/23/2022] [Indexed: 11/16/2022] Open
Abstract
Single-chain variable fragments (scFvs) have been recognized as promising agents in cancer therapy. However, the short serum half-life of scFvs often limits their clinical application. Fusion to an albumin affibody (ABD) is an effective and convenient half-life extension strategy. Although one terminus of an scFv is available for fusion of ABD, it is also frequently used for fusion of useful moieties such as small functional proteins, cytokines, or antibodies. Herein, we investigated the internal linker region, rather than the terminal region, for ABD fusion, which has rarely been explored. We constructed two internally ABD-inserted anti-HER2 4D5scFv (4D5-ABD) variants, which have short (4D5-S-ABD) and long (4D5-L-ABD) linker lengths, respectively. The model structures of these 4D5scFv and 4D5-ABD variants predicted using the deep learning-based protein structure prediction program AlphaFold2 revealed high similarity to either the original 4D5scFv or the ABD structure, implying that the functionality would be retained. The designed 4D5-ABD variants were expressed in a bacterial expression system and characterized. Both 4D5-ABD variants showed anti-HER2 binding affinity comparable with that of 4D5scFv. The binding affinity of both 4D5-ABD variants against albumin was also comparable. In a pharmacokinetic study in mice, the 4D5-ABD variants showed a significantly prolonged half-life of 34 h, 114 times longer than that of 4D5scFv. In conclusion, we have developed a versatile scFv platform with enhanced pharmacokinetic profiles with the aid of deep learning-based structure prediction.
|
20
|
Ma Q, Lei H, Cao Y. Intramolecular covalent bonds in Gram-positive bacterial surface proteins. Chembiochem 2022; 23:e202200316. [PMID: 35801833 DOI: 10.1002/cbic.202200316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 07/07/2022] [Indexed: 11/09/2022]
Abstract
Gram-positive bacteria experience considerable mechanical perturbation when adhering to host surfaces during colonization and infection. They have evolved various adhesion proteins that are mechanically robust to ensure strong surface adhesion. Recently, it was discovered that these adhesion proteins contain rare, extra intramolecular covalent bonds that stabilize protein structures and participate in surface bonding. These intramolecular covalent bonds include isopeptides, thioesters, and ester bonds, which often form spontaneously without the need for additional enzymes. With the development of single-molecule force spectroscopy techniques, the detailed mechanical roles of these intramolecular covalent bonds have been revealed. In this review, we summarize the recent advances in this area of research, focusing on the link between the mechanical stability and function of these covalent bonds in Gram-positive bacterial surface proteins. We also highlight the potential impact of these discoveries on the development of novel antibiotics and chemical biology tools.
Affiliation(s)
- Quan Ma
- Nanjing University, Department of Physics, China
- Hai Lei
- Nanjing University, Department of Physics, China
- Yi Cao
- Nanjing University, Department of Physics, 22 Hankou Road, 210093, Nanjing, China
|
21
|
Pelosi B. Developing a bioinformatics pipeline for comparative protein classification analysis. BMC Genom Data 2022; 23:43. [PMID: 35668373 PMCID: PMC9172112 DOI: 10.1186/s12863-022-01045-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Accepted: 03/11/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND: Protein classification is a task of paramount importance in various fields of biology. Despite the great momentum of modern implementation of protein classification, machine learning techniques such as Random Forest and Neural Network could not always be used for several reasons: data collection, unbalanced classification or labelling of the data. As an alternative, I propose the use of a bioinformatics pipeline to search for and classify information from protein databases. Hence, to evaluate the efficiency and accuracy of the pipeline, I focused on the carotenoid biosynthetic genes and developed a filtering approach to retrieve ortholog clusters in two well-studied plants that belong to the Brassicaceae family: Arabidopsis thaliana and Brassica rapa Pekinensis group. The result obtained has been compared with previous studies on carotenoid biosynthetic genes in B. rapa where phylogenetic analysis was conducted. RESULTS: The developed bioinformatics pipeline relies on commercial software and multiple databases including the use of phylogeny, Gene Ontology terms (GOs) and Protein Families (Pfams) at a protein level. Furthermore, the phylogeny is coupled with "population analysis" to evaluate the potential orthologs. All the steps taken together give a final table of potential orthologs. The phylogenetic tree gives a result of 43 putative orthologs conserved in B. rapa Pekinensis group. Different A. thaliana proteins have more than one syntenic ortholog as also shown in a previous finding (Li et al., BMC Genomics 16(1):1-11, 2015). CONCLUSIONS: This study demonstrates that, when the biological features of proteins of interest are not specific, I can rely on a computational approach in filtering steps for classification purposes. The comparison of the results obtained here for the carotenoid biosynthetic genes with previous research confirmed the accuracy of the developed pipeline, which can therefore be applied for filtering different types of datasets.
Affiliation(s)
- Benedetta Pelosi
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden.
|
22
|
Nishihara A, Morimoto N, Sumiyoshi T, Yasumoto S, Kondo M, Kono T, Sakai M, Hikima JI. Inhibition of lysozyme lytic activity by Ivy derived from Photobacterium damselae subsp. piscicida. Fish Shellfish Immunol 2022; 124:280-288. [PMID: 35421575 DOI: 10.1016/j.fsi.2022.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 04/08/2022] [Accepted: 04/09/2022] [Indexed: 06/14/2023]
Abstract
A pseudotuberculosis pathogen, Photobacterium damselae subsp. piscicida (Pdp), has caused enormous economic damage to yellowtail aquaculture in Japan. The Ivy gene has been discovered in a plasmid of Pdp, and it has been proposed that it may help bacteria evade lysozyme-mediated lysis during interaction with an animal host. However, the lysozyme-inhibiting activity of Pdp-derived Ivy (Ivy-Pdp) is unknown, and it is unclear whether it acts as a virulence factor for host biophylaxis. In this study, the inhibitory effect of Ivy-Pdp on lysozyme was evaluated by expressing and purifying the recombinant Ivy-Pdp protein (rIvy-Pdp). The rIvy-Pdp protein inhibited hen egg white lysozyme activity in an rIvy-Pdp-concentration-dependent manner, and its inhibitory effect was similar under different temperature and pH conditions. The serum and skin mucus of the yellowtail (which is the host species of Pdp), Japanese flounder, and Nile tilapia showed bacteriolytic activity. In contrast, the addition of rIvy-Pdp inhibited the lytic activity in the serum of these fish species. In particular, it significantly inhibited lytic activity in the serum and skin mucus of Nile tilapia. On the basis of these results, we suggest that Ivy-Pdp is a temperature- and pH-stable lysozyme inhibitor. Additionally, Ivy-Pdp inhibited the lytic activity of lysozyme, which is involved in host biophylaxis. In summary, we inferred that Ivy-Pdp is an important factor that diminishes the sterilization ability of C-type lysozyme when Pdp infects the host.
Affiliation(s)
- Aki Nishihara
- Interdisciplinary Graduate School of Agriculture and Engineering, University of Miyazaki, Miyazaki, 889-2192, Japan
- Natsuki Morimoto
- Department of Biochemistry and Applied Biosciences, Faculty of Agriculture, University of Miyazaki, Miyazaki, 889-2192, Japan
- Takechiyo Sumiyoshi
- Department of Biochemistry and Applied Biosciences, Faculty of Agriculture, University of Miyazaki, Miyazaki, 889-2192, Japan
- Shinya Yasumoto
- Department of Applied Aquabiology, National Fisheries University, Japan Fisheries Research and Education Agency, Yamaguchi 759-6595, Japan
- Masakazu Kondo
- Department of Applied Aquabiology, National Fisheries University, Japan Fisheries Research and Education Agency, Yamaguchi 759-6595, Japan
- Tomoya Kono
- Department of Biochemistry and Applied Biosciences, Faculty of Agriculture, University of Miyazaki, Miyazaki, 889-2192, Japan
- Masahiro Sakai
- Department of Biochemistry and Applied Biosciences, Faculty of Agriculture, University of Miyazaki, Miyazaki, 889-2192, Japan
- Jun-Ichi Hikima
- Department of Biochemistry and Applied Biosciences, Faculty of Agriculture, University of Miyazaki, Miyazaki, 889-2192, Japan.
|
23
|
Vacuolar Protein-Sorting Receptor MoVps13 Regulates Conidiation and Pathogenicity in Rice Blast Fungus Magnaporthe oryzae. J Fungi (Basel) 2021; 7:1084. [PMID: 34947066 PMCID: PMC8708568 DOI: 10.3390/jof7121084] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/17/2021] [Revised: 12/04/2021] [Accepted: 12/16/2021] [Indexed: 01/18/2023] Open
Abstract
Magnaporthe oryzae (synonym Pyricularia oryzae) is a filamentous fungal pathogen that causes major yield losses in cultivated rice worldwide. However, the mechanisms of infection of M. oryzae are not well characterized. The VPS13 proteins play vital roles in various biological processes in many eukaryotic organisms, including in the organization of actin cytoskeleton, vesicle trafficking, mitochondrial fusion, and phagocytosis. Nevertheless, the function of the Vps13 protein in plant pathogenic fungi has not been explored. Here, we analysed the biological functions of the Vps13 protein in the development and pathogenicity of M. oryzae. Deletion mutants of MoVps13 significantly reduced the conidiation and decreased the rate of fungal infection on hosts. Moreover, the loss of MoVps13 resulted in defective cell wall integrity (CWI) and plasma membrane (PM) homeostasis when treated with chemicals for inducing cell wall stress (200 mg/mL Congo Red or 0.005% SDS) and sphingolipid synthesis inhibitors (2 μM myriocin or 2 μM amphotericin B). This indicated that MoVps13 is also involved in cell wall synthesis and sphingolipid synthesis. Through immunoblotting, autophagic flux detection, co-localization, and chemical drug sensitivity assays, we confirmed the involvement of MoVps13 in ER-phagy and the response to ER stress. Additionally, we generated the C-terminal structure of MoVps13 with high accuracy using the AlphaFold2 database. Our experimental evidence indicates that MoVps13 is an important virulence factor that regulates the pathogenicity of M. oryzae by controlling CWI, lipid metabolism and the ER-phagy pathway. These results have expanded our knowledge about pathogenic fungi and will help exploration for novel therapeutic strategies against the rice blast fungus.
|
24
|
Perrakis A, Sixma TK. AI revolutions in biology: The joys and perils of AlphaFold. EMBO Rep 2021; 22:e54046. [PMID: 34668287 PMCID: PMC8567224 DOI: 10.15252/embr.202154046] [Citation(s) in RCA: 82] [Impact Index Per Article: 27.3] [Received: 09/23/2021] [Accepted: 10/05/2021] [Indexed: 11/30/2022] Open
Affiliation(s)
- Anastassis Perrakis
- Oncode Institute and Division of Biochemistry, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Titia K Sixma
- Oncode Institute and Division of Biochemistry, The Netherlands Cancer Institute, Amsterdam, The Netherlands
|
25
|
David A, Islam S, Tankhilevich E, Sternberg MJE. The AlphaFold Database of Protein Structures: A Biologist's Guide. J Mol Biol 2021; 434:167336. [PMID: 34757056 PMCID: PMC8783046 DOI: 10.1016/j.jmb.2021.167336] [Citation(s) in RCA: 118] [Impact Index Per Article: 39.3] [Received: 09/20/2021] [Revised: 10/25/2021] [Accepted: 10/26/2021] [Indexed: 01/06/2023]
Abstract
AlphaFold, the deep learning algorithm developed by DeepMind, recently released the three-dimensional models of the whole human proteome to the scientific community. Here we discuss the advantages, limitations and the still unsolved challenges of the AlphaFold models from the perspective of a biologist, who may not be an expert in structural biology.
Affiliation(s)
- Alessia David
- Centre for Integrative System Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK.
- Suhail Islam
- Centre for Integrative System Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Evgeny Tankhilevich
- Centre for Integrative System Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Michael J E Sternberg
- Centre for Integrative System Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
|
26
|
AlQuraishi M, Sorger PK. Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms. Nat Methods 2021; 18:1169-1180. [PMID: 34608321 PMCID: PMC8793939 DOI: 10.1038/s41592-021-01283-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/13/2021] [Accepted: 08/27/2021] [Indexed: 02/08/2023]
Abstract
Deep learning using neural networks relies on a class of machine-learnable models constructed using 'differentiable programs'. These programs can combine mathematical equations specific to a particular domain of natural science with general-purpose, machine-learnable components trained on experimental data. Such programs are having a growing impact on molecular and cellular biology. In this Perspective, we describe an emerging 'differentiable biology' in which phenomena ranging from the small and specific (for example, one experimental assay) to the broad and complex (for example, protein folding) can be modeled effectively and efficiently, often by exploiting knowledge about basic natural phenomena to overcome the limitations of sparse, incomplete and noisy data. By distilling differentiable biology into a small set of conceptual primitives and illustrative vignettes, we show how it can help to address long-standing challenges in integrating multimodal data from diverse experiments across biological scales. This promises to benefit fields as diverse as biophysics and functional genomics.
Affiliation(s)
- Mohammed AlQuraishi
- Department of Systems Biology, Columbia University, New York, NY, USA.
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
- Peter K Sorger
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
|
27
|
Kell DB. The Transporter-Mediated Cellular Uptake and Efflux of Pharmaceutical Drugs and Biotechnology Products: How and Why Phospholipid Bilayer Transport Is Negligible in Real Biomembranes. Molecules 2021; 26:5629. [PMID: 34577099 PMCID: PMC8470029 DOI: 10.3390/molecules26185629] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/17/2021] [Revised: 09/03/2021] [Accepted: 09/14/2021] [Indexed: 12/12/2022] Open
Abstract
Over the years, my colleagues and I have come to realise that the likelihood of pharmaceutical drugs being able to diffuse through whatever unhindered phospholipid bilayer may exist in intact biological membranes in vivo is vanishingly low. This is because (i) most real biomembranes are mostly protein, not lipid, (ii) unlike purely lipid bilayers that can form transient aqueous channels, the high concentrations of proteins serve to stop such activity, (iii) natural evolution long ago selected against transport methods that just let any undesirable products enter a cell, (iv) transporters have now been identified for all kinds of molecules (even water) that were once thought not to require them, (v) many experiments show a massive variation in the uptake of drugs between different cells, tissues, and organisms, that cannot be explained if lipid bilayer transport is significant or if efflux were the only differentiator, and (vi) many experiments that manipulate the expression level of individual transporters as an independent variable demonstrate their role in drug and nutrient uptake (including in cytotoxicity or adverse drug reactions). This makes such transporters valuable both as a means of targeting drugs (not least anti-infectives) to selected cells or tissues and also as drug targets. The same considerations apply to the exploitation of substrate uptake and product efflux transporters in biotechnology. We are also beginning to recognise that transporters are more promiscuous, and antiporter activity is much more widespread, than had been realised, and that such processes are adaptive (i.e., were selected by natural evolution). The purpose of the present review is to summarise the above, and to rehearse and update readers on recent developments. These developments lead us to retain and indeed to strengthen our contention that for transmembrane pharmaceutical drug transport "phospholipid bilayer transport is negligible".
Affiliation(s)
- Douglas B. Kell
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs Lyngby, Denmark
- Mellizyme Biotechnology Ltd., IC1, Liverpool Science Park, Mount Pleasant, Liverpool L3 5TF, UK
|