1
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024]
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins lack experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its function, the functioning of the cell, and its possible interactions with other proteins. Fast, reliable, and accurate predictors provide platforms to harness the abundance of sequence data to predict subcellular locations. During the last decade, considerable research effort has been aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories, followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. The review is supported by taxonomies of prediction tools that highlight their relevant areas and examples, for easy categorization and understanding. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools and can serve as a quick guide for researchers to identify and explore recent advances in the literature.
Affiliation(s)
- Maryam Gillani
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
2
Ghafarollahi A, Buehler MJ. ProtAgents: protein discovery via large language model multi-agent collaborations combining physics and machine learning. Digital Discovery 2024; 3:1389-1409. [PMID: 38993729 PMCID: PMC11235180 DOI: 10.1039/d4dd00013g] [Received: 01/19/2024] [Accepted: 05/13/2024] [Indexed: 07/13/2024]
Abstract
Designing de novo proteins beyond those found in nature holds significant promise for advancements in both scientific and engineering applications. Current methodologies for protein design often rely on AI-based models, such as surrogate models that address end-to-end problems by linking protein structure to material properties or vice versa. However, these models frequently focus on specific material objectives or structural properties, limiting their flexibility when out-of-domain knowledge must be incorporated into the design process or when comprehensive data analysis is required. In this study, we introduce ProtAgents, a platform for de novo protein design based on Large Language Models (LLMs), where multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment. The versatility in agent development allows for expertise in diverse domains, including knowledge retrieval, protein structure analysis, physics-based simulations, and results analysis. The dynamic collaboration between agents, empowered by LLMs, provides a versatile approach to tackling protein design and analysis problems, as demonstrated through diverse examples in this study. The problems of interest encompass designing new proteins, analyzing protein structures, and obtaining new first-principles data (natural vibrational frequencies) via physics simulations. The concerted effort of the system allows for powerful automated and synergistic design of de novo proteins with targeted mechanical properties. The flexibility in designing the agents, on the one hand, and their capacity for autonomous collaboration through the dynamic LLM-based multi-agent environment, on the other, unleash the great potential of LLMs in addressing multi-objective materials problems and open up new avenues for autonomous materials discovery and design.
Affiliation(s)
- Alireza Ghafarollahi
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- Markus J Buehler
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
  - Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
3
Rahimzadeh F, Mohammad Khanli L, Salehpoor P, Golabi F, PourBahrami S. Unveiling the evolution of policies for enhancing protein structure predictions: A comprehensive analysis. Comput Biol Med 2024; 179:108815. [PMID: 38986287 DOI: 10.1016/j.compbiomed.2024.108815] [Received: 03/04/2024] [Revised: 06/09/2024] [Accepted: 06/24/2024] [Indexed: 07/12/2024]
Abstract
Predicting protein structure is both fascinating and formidable, playing a crucial role in structure-based drug discovery and unraveling diseases with elusive origins. The Critical Assessment of Protein Structure Prediction (CASP) serves as a biennial battleground where global scientists converge to untangle the intricate relationships within amino acid chains. Two primary methods, Template-Based Modeling (TBM) and Template-Free (TF) strategies, dominate protein structure prediction. The trend has shifted towards Template-Free predictions due to their broader sequence coverage with fewer templates. The predictive process can be broadly classified into contact map, binned-distance, and real-valued distance predictions, each with distinctive strengths and limitations manifested through tailored loss functions. We have also introduced revolutionary end-to-end and all-atom diffusion-based techniques that have transformed protein structure prediction. Recent advancements in deep learning techniques have significantly improved prediction accuracy, although their effectiveness is contingent upon the quality of input features derived from natural bio-physicochemical attributes and Multiple Sequence Alignments (MSA). Hence, the generation of high-quality MSA data is of paramount importance in harnessing informative input features for enhanced prediction outcomes. Remarkable successes have been achieved in protein structure prediction accuracy; however, they are not yet sufficient for the purposes structural knowledge was intended to serve, which implies a need for development in other aspects of prediction. In this regard, scientists have opened other frontiers for protein structure prediction. The utilization of subsampling in multiple sequence alignment (MSA) and protein language modeling appears to be particularly promising in enhancing the accuracy and efficiency of predictions, ultimately aiding in drug discovery efforts.
The exploration of predicting protein complex structure also opens up exciting opportunities to deepen our knowledge of molecular interactions and to design more effective therapeutics. In this article, we have discussed the vicissitudes that scientists have gone through to improve prediction accuracy, and examined effective policies for prediction from different aspects, including the construction of high-quality MSAs, the provision of informative input features, and progress in deep learning approaches. We have also briefly touched upon the transition from predicting single-chain protein structures to predicting protein complex structures. Our findings point towards promoting open research environments to support the objectives of protein structure prediction.
Affiliation(s)
- Faezeh Rahimzadeh
  - Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Pedram Salehpoor
  - Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Faegheh Golabi
  - Department of Biomedical Engineering, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Shahin PourBahrami
  - Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran
4
Wang J, Watson JL, Lisanza SL. Protein Design Using Structure-Prediction Networks: AlphaFold and RoseTTAFold as Protein Structure Foundation Models. Cold Spring Harb Perspect Biol 2024; 16:a041472. [PMID: 38438190 PMCID: PMC11216169 DOI: 10.1101/cshperspect.a041472] [Indexed: 03/06/2024]
Abstract
Designing proteins with tailored structures and functions is a long-standing goal in bioengineering. Recently, deep learning advances have enabled protein structure prediction at near-experimental accuracy, which has catalyzed progress in protein design as well. We review recent studies that use structure-prediction neural networks to design proteins, via approaches such as activation maximization, inpainting, or denoising diffusion. These methods have led to major improvements over previous methods in wet-lab success rates for designing protein binders, metalloproteins, enzymes, and oligomeric assemblies. These results show that structure-prediction models are a powerful foundation for developing protein-design tools and suggest that continued improvement of their accuracy and generality will be key to unlocking the full potential of protein design.
Affiliation(s)
- Jue Wang
  - Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
  - Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, Washington 98195, USA
  - DeepMind, London EC4A 3BF, United Kingdom
- Joseph L Watson
  - Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
- Sidney L Lisanza
  - Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
  - Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, Washington 98195, USA
5
Tang X, Dai H, Knight E, Wu F, Li Y, Li T, Gerstein M. A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation. Brief Bioinform 2024; 25:bbae338. [PMID: 39007594 PMCID: PMC11247410 DOI: 10.1093/bib/bbae338] [Received: 03/10/2024] [Revised: 05/21/2024] [Accepted: 06/27/2024] [Indexed: 07/16/2024]
Abstract
Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo drug design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. Rapid development in the field, combined with the inherent complexity of the drug design process, creates a difficult landscape for new researchers to enter. In this survey, we organize de novo drug design into two overarching themes: small molecule and protein generation. Within each theme, we identify a variety of subtasks and applications, highlighting important datasets, benchmarks, and model architectures and comparing the performance of top models. We take a broad approach to AI-driven drug design, allowing for both micro-level comparisons of various methods within each subtask and macro-level observations across different fields. We discuss parallel challenges and approaches between the two applications and highlight future directions for AI-driven de novo drug design as a whole. An organized repository of all covered sources is available at https://github.com/gersteinlab/GenAI4Drug.
Affiliation(s)
- Xiangru Tang
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Howard Dai
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Elizabeth Knight
  - School of Medicine, Yale University, New Haven, CT 06520, United States
- Fang Wu
  - Computer Science Department, Stanford University, CA 94305, United States
- Yunyang Li
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Tianxiao Li
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Mark Gerstein
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
  - Department of Statistics & Data Science, Yale University, New Haven, CT 06520, United States
  - Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
  - Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, United States
6
Zheng T, Zhang C. Engineering strategies and challenges of endolysin as an antibacterial agent against Gram-negative bacteria. Microb Biotechnol 2024; 17:e14465. [PMID: 38593316 PMCID: PMC11003714 DOI: 10.1111/1751-7915.14465] [Received: 01/22/2024] [Revised: 03/09/2024] [Accepted: 03/21/2024] [Indexed: 04/11/2024]
Abstract
Bacteriophage endolysin is a novel antibacterial agent that has attracted much attention in the prevention and control of drug-resistant bacteria due to its unique mechanism of hydrolysing peptidoglycans. Although endolysin exhibits excellent bactericidal effects on Gram-positive bacteria, the presence of the outer membrane of Gram-negative bacteria makes it difficult to lyse them extracellularly, thus limiting their application field. To enhance the extracellular activity of endolysin and facilitate its crossing through the outer membrane of Gram-negative bacteria, researchers have adopted physical, chemical, and molecular methods. This review summarizes the characterization of endolysin targeting Gram-negative bacteria, strategies for endolysin modification, and the challenges and future of engineering endolysin against Gram-negative bacteria in clinical applications, to promote the application of endolysin in the prevention and control of Gram-negative bacteria.
Affiliation(s)
- Tianyu Zheng
  - Bathurst Future Agri‐Tech Institute, Qingdao Agricultural University, Qingdao, China
- Can Zhang
  - College of Veterinary Medicine, Qingdao Agricultural University, Qingdao, China
7
Reveguk I, Simonson T. Classifying protein kinase conformations with machine learning. Protein Sci 2024; 33:e4918. [PMID: 38501429 PMCID: PMC10962494 DOI: 10.1002/pro.4918] [Received: 07/26/2023] [Revised: 01/02/2024] [Accepted: 01/22/2024] [Indexed: 03/20/2024]
Abstract
Protein kinases are key actors of signaling networks and important drug targets. They cycle between active and inactive conformations, distinguished by a few elements within the catalytic domain. One is the activation loop, whose conserved DFG motif can occupy DFG-in, DFG-out, and some rarer conformations. Annotation and classification of the structural kinome are important, as different conformations can be targeted by different inhibitors and activators. Valuable resources exist; however, large-scale applications will benefit from increased automation and interpretability of structural annotation. Interpretable machine learning models are described for this purpose, based on ensembles of decision trees. To train them, a set of catalytic domain sequences and structures was collected, somewhat larger and more diverse than existing resources. The structures were clustered based on the DFG conformation and manually annotated. They were then used as training input. Two main models were constructed, which distinguished active/inactive and in/out/other DFG conformations. They considered initially 1692 structural variables, spanning the whole catalytic domain, then identified ("learned") a small subset that sufficed for accurate classification. The first model correctly labeled all but 3 of 3289 structures as active or inactive, while the second assigned the correct DFG label to all but 17 of 8826 structures. The most potent classifying variables were all related to well-known structural elements in or near the activation loop and their ranking gives insights into the conformational preferences. The models were used to automatically annotate 3850 kinase structures predicted recently with the Alphafold2 tool, showing that Alphafold2 reproduced the active/inactive but not the DFG-in proportions seen in the Protein Data Bank. We expect the models will be useful for understanding and engineering kinases.
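The label counts quoted in this abstract translate directly into accuracies. The following is only an arithmetic check on those quoted numbers, not part of the authors' method; the helper name `accuracy` is hypothetical, and the abstract does not state whether the counts refer to training or held-out structures.

```python
def accuracy(total: int, errors: int) -> float:
    """Fraction of structures that received the correct label."""
    return (total - errors) / total


# Counts quoted in the abstract above.
active_inactive = accuracy(total=3289, errors=3)   # active/inactive model
dfg_label = accuracy(total=8826, errors=17)        # DFG in/out/other model

print(f"active/inactive model: {active_inactive:.2%}")   # 99.91%
print(f"DFG in/out/other model: {dfg_label:.2%}")        # 99.81%
```

Both models therefore mislabel fewer than 0.2% of the structures they were evaluated on.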
Affiliation(s)
- Ivan Reveguk
  - Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
  - Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
8
Wei J, Xiao J, Chen S, Zong L, Gao X, Li Y. ProNet DB: a proteome-wise database for protein surface property representations and RNA-binding profiles. Database (Oxford) 2024; 2024:baae012. [PMID: 38557634 PMCID: PMC10984565 DOI: 10.1093/database/baae012] [Received: 09/11/2023] [Revised: 01/08/2024] [Accepted: 02/17/2024] [Indexed: 04/04/2024]
Abstract
The rapid growth in the number of experimental and predicted protein structures and more complicated protein structures poses a significant challenge for computational biology in leveraging structural information and accurate representation of protein surface properties. Recently, AlphaFold2 released the comprehensive proteomes of various species, and protein surface property representation plays a crucial role in protein-molecule interaction predictions, including those involving proteins, nucleic acids and compounds. Here, we proposed the first extensive database, namely ProNet DB, that integrates multiple protein surface representations and RNA-binding landscape for 326 175 protein structures. This collection encompasses the 16 model organism proteomes from the AlphaFold Protein Structure Database and experimentally validated structures from the Protein Data Bank. For each protein, ProNet DB provides access to the original protein structures along with the detailed surface property representations encompassing hydrophobicity, charge distribution and hydrogen bonding potential as well as interactive features such as the interacting face and RNA-binding sites and preferences. To facilitate an intuitive interpretation of these properties and the RNA-binding landscape, ProNet DB incorporates visualization tools like Mol* and an Online 3D Viewer, allowing for the direct observation and analysis of these representations on protein surfaces. The availability of pre-computed features enables instantaneous access for users, significantly advancing computational biology research in areas such as molecular mechanism elucidation, geometry-based drug discovery and the development of novel therapeutic approaches. Database URL: https://proj.cse.cuhk.edu.hk/aihlab/pronet/.
Affiliation(s)
- Junkang Wei
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
- Jin Xiao
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
- Siyuan Chen
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Kingdom of Saudi Arabia
  - KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal 23955, Kingdom of Saudi Arabia
- Licheng Zong
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
- Xin Gao
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Kingdom of Saudi Arabia
  - KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal 23955, Kingdom of Saudi Arabia
- Yu Li
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
  - The CUHK Shenzhen Research Institute, 4 Gaoxin Ave Nanshan, Shenzhen 518057, China
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 45 Carleton Street, Cambridge, MA 02142, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, 201 Brookline Avenue, Boston, MA 02215, USA
  - Broad Institute of MIT and Harvard, Merkin Building, 415 Main Street, Cambridge, MA 02142, USA
9
Maiti S, Singh A, Maji T, Saibo NV, De S. Experimental methods to study the structure and dynamics of intrinsically disordered regions in proteins. Curr Res Struct Biol 2024; 7:100138. [PMID: 38707546 PMCID: PMC11068507 DOI: 10.1016/j.crstbi.2024.100138] [Received: 11/07/2023] [Revised: 03/12/2024] [Accepted: 03/15/2024] [Indexed: 05/07/2024]
Abstract
Eukaryotic proteins often feature long stretches of amino acids that lack a well-defined three-dimensional structure and are referred to as intrinsically disordered proteins (IDPs) or regions (IDRs). Although these proteins challenge conventional structure-function paradigms, they play vital roles in cellular processes. Recent progress in experimental techniques, such as NMR spectroscopy, single-molecule FRET, high-speed AFM, and SAXS, has provided valuable insights into the biophysical basis of IDP function. This review discusses the advancements made in these techniques, particularly for the study of disordered regions in proteins. In NMR spectroscopy, new strategies such as 13C detection, non-uniform sampling, segmental isotope labeling, and rapid data acquisition methods address the challenges posed by spectral overcrowding and the low stability of IDPs. The importance of various NMR parameters, including chemical shifts, hydrogen exchange rates, and relaxation measurements, in revealing transient secondary structures within IDRs and IDPs is presented. Given the high flexibility of IDPs, the review outlines NMR methods for assessing their dynamics at both fast (ps-ns) and slow (μs-ms) timescales. IDPs exert their functions through interactions with other molecules such as proteins, DNA, or RNA. NMR-based titration experiments yield insights into the thermodynamics and kinetics of these interactions. Detailed study of IDPs requires multiple experimental techniques, and thus several methods are described for studying disordered proteins, highlighting their respective advantages and limitations. The potential for integrating these complementary techniques, each offering unique perspectives, is explored to achieve a comprehensive understanding of IDPs.
Affiliation(s)
- Aakanksha Singh
  - School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
- Tanisha Maji
  - School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
- Nikita V. Saibo
  - School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
- Soumya De
  - School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
10
Corum MR, Venkannagari H, Hryc CF, Baker ML. Predictive modeling and cryo-EM: A synergistic approach to modeling macromolecular structure. Biophys J 2024; 123:435-450. [PMID: 38268190 PMCID: PMC10912932 DOI: 10.1016/j.bpj.2024.01.021] [Received: 10/19/2023] [Revised: 01/09/2024] [Accepted: 01/18/2024] [Indexed: 01/26/2024]
Abstract
Over the last 15 years, structural biology has seen unprecedented development and improvement in two areas: electron cryo-microscopy (cryo-EM) and predictive modeling. Once relegated to low resolutions, single-particle cryo-EM is now capable of achieving near-atomic resolutions of a wide variety of macromolecular complexes. Ushered in by AlphaFold, machine learning has powered the current generation of predictive modeling tools, which can accurately and reliably predict models for proteins and some complexes directly from the sequence alone. Although they offer new opportunities individually, there is an inherent synergy between these techniques, allowing for the construction of large, complex macromolecular models. Here, we give a brief overview of these approaches in addition to illustrating works that combine these techniques for model building. These examples provide insight into model building, assessment, and limitations when integrating predictive modeling with cryo-EM density maps. Together, these approaches offer the potential to greatly accelerate the generation of macromolecular structural insights, particularly when coupled with experimental data.
Affiliation(s)
- Michael R Corum
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
- Harikanth Venkannagari
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
- Corey F Hryc
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
- Matthew L Baker
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
11
Pun MN, Ivanov A, Bellamy Q, Montague Z, LaMont C, Bradley P, Otwinowski J, Nourmohammad A. Learning the shape of protein microenvironments with a holographic convolutional neural network. Proc Natl Acad Sci U S A 2024; 121:e2300838121. [PMID: 38300863 PMCID: PMC10861886 DOI: 10.1073/pnas.2300838121] [Received: 01/18/2023] [Accepted: 11/29/2023] [Indexed: 02/03/2024]
Abstract
Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from its sequence or structure remains a major challenge. Here, we introduce holographic convolutional neural network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide design of novel proteins with desired function.
Affiliation(s)
- Michael N. Pun
  - Department of Physics, University of Washington, Seattle, WA 98195
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Andrew Ivanov
  - Department of Physics, University of Washington, Seattle, WA 98195
- Quinn Bellamy
  - Department of Physics, University of Washington, Seattle, WA 98195
- Zachary Montague
  - Department of Physics, University of Washington, Seattle, WA 98195
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Colin LaMont
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Philip Bradley
  - Fred Hutchinson Cancer Center, Seattle, WA 98102
  - Department of Biochemistry, University of Washington, Seattle, WA 98195
  - Institute for Protein Design, University of Washington, Seattle, WA 98195
- Jakub Otwinowski
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
  - Dyno Therapeutics, Watertown, MA 02472
- Armita Nourmohammad
  - Department of Physics, University of Washington, Seattle, WA 98195
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
  - Fred Hutchinson Cancer Center, Seattle, WA 98102
  - Department of Applied Mathematics, University of Washington, Seattle, WA 98105
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195
12
Xu B, Chen Y, Xue W. Computational Protein Design - Where it goes? Curr Med Chem 2024; 31:2841-2854. [PMID: 37272467 DOI: 10.2174/0929867330666230602143700] [Received: 11/19/2022] [Revised: 02/18/2023] [Accepted: 03/15/2023] [Indexed: 06/06/2023]
Abstract
Proteins play a critical role in the regulation of diverse biological processes related to human life. Despite increasing demand, functional proteins are sparse in the immense space of possible sequences. Therefore, protein design has become an important task in various fields, including medicine, food, energy, and materials. Directed evolution has recently led to significant achievements: molecular modification of proteins through directed evolution technology has significantly advanced the fields of enzyme engineering, metabolic engineering, medicine, and beyond. However, it is impossible to identify desirable sequences from a large number of synthetic sequences alone. As a result, computational methods, including data-driven machine learning and physics-based molecular modeling, have been introduced to protein engineering to produce more functional proteins. This review focuses on recent advances in computational protein design, highlighting the applicability of different approaches as well as their limitations.
Affiliation(s)
- Binbin Xu
- Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Yingjun Chen
- Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Weiwei Xue
- Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
|
13
|
Biswas A, Kumari A, Gaikwad DS, Pandey DK. Revolutionizing Biological Science: The Synergy of Genomics in Health, Bioinformatics, Agriculture, and Artificial Intelligence. OMICS: A Journal of Integrative Biology 2023; 27:550-569. [PMID: 38100404 DOI: 10.1089/omi.2023.0197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2023]
Abstract
With the climate emergency, COVID-19, and the rise of planetary health scholarship, the binary of human and ecosystem health has been deeply challenged. The interdependence of human and nonhuman animal health is increasingly acknowledged and is paving the way for new frontiers in integrative biology. The convergence of genomics in health, bioinformatics, agriculture, and artificial intelligence (AI) has ushered in a new era of possibilities and applications. However, the sheer volume of genomic/multiomics big data generated also presents formidable sociotechnical challenges in extracting meaningful biological, planetary health and ecological insights. Over the past few years, AI-guided bioinformatics has emerged as a powerful tool for managing, analyzing, and interpreting complex biological datasets. The advances in AI, particularly in machine learning and deep learning, have been transforming the fields of genomics, planetary health, and agriculture. This article aims to unpack and explore the formidable range of possibilities and challenges that result from such transdisciplinary integration, and emphasizes its radically transformative potential for human and ecosystem health. The integration of these disciplines is also driving significant advancements in precision medicine and personalized health care. This presents an unprecedented opportunity to deepen our understanding of complex biological systems and advance the well-being of all life in planetary ecosystems. Notwithstanding its sociotechnical, ethical, and critical policy challenges, the integration of genomics, multiomics, planetary health, and agriculture with AI-guided bioinformatics opens up vast opportunities for transnational collaborative efforts, data sharing, analysis, valorization, and interdisciplinary innovations in life sciences and integrative biology.
Affiliation(s)
- Aakanksha Biswas
- Amity Institute of Biotechnology, Amity University Jharkhand, Ranchi, India
- Aditi Kumari
- Amity Institute of Biotechnology, Amity University Jharkhand, Ranchi, India
- D S Gaikwad
- Amity Institute of Organic Agriculture, Amity University, Noida, India
- Dhananjay K Pandey
- Amity Institute of Biotechnology, Amity University Jharkhand, Ranchi, India
|
14
|
Ochoa R, Fox T. Assessing the fast prediction of peptide conformers and the impact of non-natural modifications. J Mol Graph Model 2023; 125:108608. [PMID: 37659134 DOI: 10.1016/j.jmgm.2023.108608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 08/17/2023] [Accepted: 08/18/2023] [Indexed: 09/04/2023]
Abstract
We present an assessment of different approaches to predict peptide structures using modeling tools. Several small molecule, protein, and peptide-focused methodologies were used for the fast prediction of conformers for peptides shorter than 30 amino acids. We assessed the effect of including restraints based on annotated or predicted secondary structure motifs. A number of peptides in bound conformations and in solution were collected to compare the tools. In addition, we studied the impact of changing single amino acids to non-natural residues using molecular dynamics simulations. Deep learning methods such as AlphaFold2, or the combination of physics-based approaches with secondary structure information, produce the most accurate results for natural sequences. In the case of peptides with non-natural modifications, modeling the peptide containing natural amino acids first and then modifying and simulating the peptide using benchmarked force fields is a recommended pipeline. The results can guide the modeling of oligopeptides for drug discovery projects.
Affiliation(s)
- Rodrigo Ochoa
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Thomas Fox
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
|
15
|
López-Luis MA, Soriano-Pérez EE, Parada-Fabián JC, Torres J, Maldonado-Rodríguez R, Méndez-Tenorio A. A Proposal for a Consolidated Structural Model of the CagY Protein of Helicobacter pylori. Int J Mol Sci 2023; 24:16781. [PMID: 38069104 PMCID: PMC10706595 DOI: 10.3390/ijms242316781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 12/18/2023] Open
Abstract
CagY is the largest and most complex protein from Helicobacter pylori's (Hp) type IV secretion system (T4SS), playing a critical role in the modulation of gastric inflammation and risk for gastric cancer. CagY spans from the inner to the outer membrane, forming a channel through which Hp molecules are injected into human gastric cells. Yet, a tridimensional structure has been reported for only short segments of the protein. This intricate protein was modeled using different approaches, including homology modeling, ab initio, and deep learning techniques. The challengingly long middle repeat region (MRR) was modeled using deep learning and optimized using equilibrium molecular dynamics. The previously modeled segments were assembled into a 1595 aa chain and a 14-chain CagY multimer structure was assembled by structural alignment. The final structure correlated with published structures and allowed to show how the multimer may form the T4SS channel through which CagA and other molecules are translocated to gastric cells. The model confirmed that MRR, the most polymorphic and complex region of CagY, presents numerous cysteine residues forming disulfide bonds that stabilize the protein and suggest this domain may function as a contractile region playing an essential role in the modulating activity of CagY on tissue inflammation.
Affiliation(s)
- Mario Angel López-Luis
- Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Campus Lázaro Cárdenas, Mexico City 11340, Mexico
- Eva Elda Soriano-Pérez
- Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Campus Lázaro Cárdenas, Mexico City 11340, Mexico
- José Carlos Parada-Fabián
- Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Campus Lázaro Cárdenas, Mexico City 11340, Mexico
- Javier Torres
- Unidad de Investigación en Enfermedades Infecciosas, UMAE Pediatría, Instituto Mexicano del Seguro Social, Mexico City 06720, Mexico
- Rogelio Maldonado-Rodríguez
- Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Campus Lázaro Cárdenas, Mexico City 11340, Mexico
- Alfonso Méndez-Tenorio
- Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Campus Lázaro Cárdenas, Mexico City 11340, Mexico
|
16
|
Zhou X, Chen G, Ye J, Wang E, Zhang J, Mao C, Li Z, Hao J, Huang X, Tang J, Heng PA. ProRefiner: an entropy-based refining strategy for inverse protein folding with global graph attention. Nat Commun 2023; 14:7434. [PMID: 37973874 PMCID: PMC10654420 DOI: 10.1038/s41467-023-43166-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 11/02/2023] [Indexed: 11/19/2023] Open
Abstract
Inverse Protein Folding (IPF) is an important task of protein design, which aims to design sequences compatible with a given backbone structure. Despite the prosperous development of algorithms for this task, existing methods tend to rely on noisy predicted residues located in the local neighborhood when generating sequences. To address this limitation, we propose an entropy-based residue selection method to remove noise in the input residue context. Additionally, we introduce ProRefiner, a memory-efficient global graph attention model to fully utilize the denoised context. Our proposed method achieves state-of-the-art performance on multiple sequence design benchmarks in different design settings. Furthermore, we demonstrate the applicability of ProRefiner in redesigning Transposon-associated transposase B, where six out of the 20 variants we propose exhibit improved gene editing activity.
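The idea of entropy-based residue selection mentioned in this abstract can be illustrated with a minimal sketch. It is not ProRefiner's actual implementation: the function names, the toy 4-letter alphabet, and the `max_entropy` threshold are all assumptions made for illustration. The sketch computes the Shannon entropy of each residue's predicted amino-acid distribution and keeps only low-entropy (confident) positions as context.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one residue's predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_confident_residues(
    pred_probs: Sequence[Sequence[float]], max_entropy: float
) -> List[bool]:
    """Return a mask that is True where a prediction is confident enough to keep.

    `max_entropy` is a hypothetical cutoff chosen for this example only.
    """
    return [shannon_entropy(p) <= max_entropy for p in pred_probs]


# Toy predictions over a 4-letter alphabet (a real model would use 20 amino acids).
preds = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform prediction, maximal entropy
    [0.70, 0.10, 0.10, 0.10],  # moderately confident prediction
]
mask = select_confident_residues(preds, max_entropy=1.0)
print(mask)
```

Positions whose mask entry is False would be treated as noisy and excluded from the context passed to a downstream model.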
Affiliation(s)
- Xinyi Zhou
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Central Ave, Hong Kong, China
- Junjie Ye
- Noah's Ark Lab, Huawei, Shenzhen, China
- Ercheng Wang
- Zhejiang Lab, Kechuang Avenue, Hangzhou, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Jun Zhang
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
- Cong Mao
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
- Zhanwei Li
- Zhejiang Lab, Kechuang Avenue, Hangzhou, China
- Jin Tang
- Zhejiang Lab, Kechuang Avenue, Hangzhou, China
- Pheng Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Central Ave, Hong Kong, China
- Zhejiang Lab, Kechuang Avenue, Hangzhou, China
|
17
|
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and design of protease enzyme specificity using a structure-aware graph convolutional network. Proc Natl Acad Sci U S A 2023; 120:e2303590120. [PMID: 37729196 PMCID: PMC10523478 DOI: 10.1073/pnas.2303590120] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/05/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023] Open
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key posttranslational modification involved in physiology and disease. The ability to robustly and rapidly predict protease-substrate specificity would also enable targeted proteolytic cleavage by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pretrained PGCN model to guide the design of protease libraries for cleaving two noncanonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Joseph H. Lubin
- Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Guanyang Wang
- Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
|
18
|
Pandey A, Liu E, Graham J, Chen W, Keten S. B-factor prediction in proteins using a sequence-based deep learning model. Patterns (N Y) 2023; 4:100805. [PMID: 37720331 PMCID: PMC10499862 DOI: 10.1016/j.patter.2023.100805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/23/2023] [Accepted: 07/07/2023] [Indexed: 09/19/2023]
Abstract
B factors provide critical insight into protein dynamics. Predicting B factors of an atom in new proteins remains challenging as it is impacted by their neighbors in Euclidean space. Previous learning methods developed have resulted in low Pearson correlation coefficients beyond the training set due to their limited ability to capture the effect of neighboring atoms. With the advances in deep learning methods, we develop a sequence-based model that is tested on 2,442 proteins and outperforms the state-of-the-art models by 30%. We find that the model learns that the B factor of a site is prominently affected by atoms within a 12-15 Å radius, which is in excellent agreement with cutoffs from protein network models. The ablation study revealed that the B factor can largely be predicted from the primary sequence alone. Based on the abovementioned points, our model lays a foundation for predicting other properties that are correlated with the B factor.
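The abstract reports model quality as a Pearson correlation coefficient between predicted and experimental B factors. As a hedged illustration of that metric only (not the authors' evaluation code), the sketch below computes it in plain Python. The `observed` and `predicted` values are made-up numbers, and `pearson_r` is a helper name introduced here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-residue B factors (in Å^2); illustrative values only.
observed = [12.0, 15.5, 20.1, 34.8, 28.3]
predicted = [13.1, 14.9, 22.0, 30.2, 27.5]
r = pearson_r(observed, predicted)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 means the predicted profile tracks the experimental one; the coefficient is insensitive to a constant offset or scale in the predictions.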
Affiliation(s)
- Akash Pandey
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Elaine Liu
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Jacob Graham
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Wei Chen
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Sinan Keten
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL, USA
|
19
|
Bauer J, Rajagopal N, Gupta P, Gupta P, Nixon AE, Kumar S. How can we discover developable antibody-based biotherapeutics? Front Mol Biosci 2023; 10:1221626. [PMID: 37609373 PMCID: PMC10441133 DOI: 10.3389/fmolb.2023.1221626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 07/10/2023] [Indexed: 08/24/2023] Open
Abstract
Antibody-based biotherapeutics have emerged as a successful class of pharmaceuticals despite significant challenges and risks to their discovery and development. This review discusses the most frequently encountered hurdles in the research and development (R&D) of antibody-based biotherapeutics and proposes a conceptual framework called biopharmaceutical informatics. Our vision advocates for the syncretic use of computation and experimentation at every stage of biologic drug discovery, considering developability (manufacturability, safety, efficacy, and pharmacology) of potential drug candidates from the earliest stages of the drug discovery phase. The computational advances in recent years allow for more precise formulation of disease concepts, rapid identification, and validation of targets suitable for therapeutic intervention and discovery of potential biotherapeutics that can agonize or antagonize them. Furthermore, computational methods for de novo and epitope-specific antibody design are increasingly being developed, opening novel computationally driven opportunities for biologic drug discovery. Here, we review the opportunities and limitations of emerging computational approaches for optimizing antigens to generate robust immune responses, in silico generation of antibody sequences, discovery of potential antibody binders through virtual screening, assessment of hits, identification of lead drug candidates and their affinity maturation, and optimization for developability. The adoption of biopharmaceutical informatics across all aspects of drug discovery and development cycles should help bring affordable and effective biotherapeutics to patients more quickly.
Affiliation(s)
- Joschka Bauer
- Early Stage Pharmaceutical Development Biologicals, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach/Riss, Germany
- In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Nandhini Rajagopal
- In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Priyanka Gupta
- In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Pankaj Gupta
- In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Andrew E. Nixon
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Sandeep Kumar
- In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
|
20
|
Casadevall G, Duran C, Osuna S. AlphaFold2 and Deep Learning for Elucidating Enzyme Conformational Flexibility and Its Application for Design. JACS Au 2023; 3:1554-1562. [PMID: 37388680 PMCID: PMC10302747 DOI: 10.1021/jacsau.3c00188] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/14/2023] [Revised: 05/22/2023] [Accepted: 05/22/2023] [Indexed: 07/01/2023]
Abstract
The recent success of AlphaFold2 (AF2) and other deep learning (DL) tools in accurately predicting the folded three-dimensional (3D) structure of proteins and enzymes has revolutionized the structural biology and protein design fields. The 3D structure indeed reveals key information on the arrangement of the catalytic machinery of enzymes and which structural elements gate the active site pocket. However, comprehending enzymatic activity requires a detailed knowledge of the chemical steps involved along the catalytic cycle and the exploration of the multiple thermally accessible conformations that enzymes adopt when in solution. In this Perspective, some of the recent studies showing the potential of AF2 in elucidating the conformational landscape of enzymes are provided. Selected examples of the key developments of AF2-based and DL methods for protein design are discussed, as well as a few enzyme design cases. These studies show the potential of AF2 and DL for allowing the routine computational design of efficient enzymes.
Affiliation(s)
- Guillem Casadevall
- Institut de Química Computacional i Catàlisi (IQCC) and Departament de Química, Universitat de Girona, Maria Aurèlia Capmany 69, 17003 Girona, Spain
- Cristina Duran
- Institut de Química Computacional i Catàlisi (IQCC) and Departament de Química, Universitat de Girona, Maria Aurèlia Capmany 69, 17003 Girona, Spain
- Sílvia Osuna
- Institut de Química Computacional i Catàlisi (IQCC) and Departament de Química, Universitat de Girona, Maria Aurèlia Capmany 69, 17003 Girona, Spain
- ICREA, Passeig Lluís Companys 23, 08010 Barcelona, Spain
|
21
|
Ayub S, Malak N, Cossío-Bayúgar R, Nasreen N, Khan A, Niaz S, Khan A, Alanazi AD, Ben Said M. In Vitro and In Silico Protocols for the Assessment of Anti-Tick Compounds from Pinus roxburghii against Rhipicephalus (Boophilus) microplus Ticks. Animals (Basel) 2023; 13:1388. [PMID: 37106951 PMCID: PMC10135231 DOI: 10.3390/ani13081388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2022] [Revised: 04/10/2023] [Accepted: 04/12/2023] [Indexed: 04/29/2023] Open
Abstract
Pinus roxburghii, also known by the name "Himalayan chir pine," belongs to the Pinaceae family. Rhipicephalus (Boophilus) microplus tick is one of the most significant bovine ectoparasites, making it a major vector of economically important tick-borne diseases. The researchers conducted adult immersion tests (AIT) and larval packet tests (LPT) to investigate the acaricidal effect of P. roxburghii plant extract on R. (B.) microplus and its potential modulatory function when used with cypermethrin. Eggs were also assessed for their weight, egg-laying index (IE), hatchability rate, and control rate. After exposure to essential extract concentrations ranging from 2.5 to 40 mg/mL for 48 h, adult female ticks' oviposition inhibition and unfed R. (B.) microplus larvae's mortality rates were analyzed. Engorged females exposed to P. roxburghii at 40 mg/mL had reduced biological activity (oviposition, IE) compared to positive and negative controls. A concentration of 40 mg/mL of P. roxburghii caused 90% mortality in R. (B.) microplus larvae, whereas cypermethrin (the positive control) caused 98.3% mortality in LPT. In AIT, cypermethrin inhibited 81% of oviposition, compared to the 40 mg/mL concentration of P. roxburghii, which inhibited 40% of the ticks' oviposition. Moreover, this study assessed the binding capacity of selected phytocompounds with the targeted protein. Three servers (SWISS-MODEL, RoseTTAFold, and TrRosetta) recreated the target protein RmGABACl's 3D structure. The modeled 3D structure was validated using the online servers PROCHECK, ERRAT, and ProSA. Molecular docking using AutoDock Vina predicted the binding mechanisms of 20 drug-like compounds against the target protein. Catechin and myricetin showed significant interactions with active site residues of the target protein, with docking scores of -7.7 kcal/mol and -7.6 kcal/mol, respectively. In conclusion, this study demonstrated the acaricidal activity of P. roxburghii extract, suggesting its potential as an alternative natural acaricide for controlling R. (B.) microplus.
Affiliation(s)
- Sana Ayub
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
- Nosheen Malak
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
- Raquel Cossío-Bayúgar
- Centro Nacional de Investigaciones Disciplinarias en Salud Animal e Inocuidad, Departamento de Artropodología, Instituto Nacional de Investigaciones Forestales Agrícolas y Pecuarias (INIFAP), Boulevard Cuauhnahuac No. 8534, Jiutepec 62574, Mexico
- Nasreen Nasreen
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
- Afshan Khan
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
- Sadaf Niaz
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
- Adil Khan
- Department of Zoology, Bacha Khan University Charsadda, Charsadda 24420, Pakistan
- Abdallah D Alanazi
- Department of Biological Sciences, Faculty of Science and Humanities, Shaqra University, Ad-Dawadimi 11911, Saudi Arabia
- Mourad Ben Said
- Department of Basic Sciences, Higher Institute of Biotechnology of Sidi Thabet, University of Manouba, Manouba 2010, Tunisia
- Laboratory of Microbiology, National School of Veterinary Medicine, Sidi Thabet, University of Manouba, Manouba 2010, Tunisia
|
22
|
Chen Y, Wang Z, Wang L, Wang J, Li P, Cao D, Zeng X, Ye X, Sakurai T. Deep generative model for drug design from protein target sequence. J Cheminform 2023; 15:38. [PMID: 36978179 PMCID: PMC10052801 DOI: 10.1186/s13321-023-00702-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/31/2022] [Accepted: 02/18/2023] [Indexed: 03/30/2023] Open
Abstract
Drug discovery for a protein target is a laborious and costly process. Deep learning (DL) methods have been applied to drug discovery and successfully generated novel molecular structures, and they can substantially reduce development time and costs. However, most of them rely on prior knowledge, either by drawing on the structure and properties of known molecules to generate similar candidate molecules or extracting information on the binding sites of protein pockets to obtain molecules that can bind to them. In this paper, DeepTarget, an end-to-end DL model, was proposed to generate novel molecules solely relying on the amino acid sequence of the target protein to reduce the heavy reliance on prior knowledge. DeepTarget includes three modules: Amino Acid Sequence Embedding (AASE), Structural Feature Inference (SFI), and Molecule Generation (MG). AASE generates embeddings from the amino acid sequence of the target protein. SFI inferences the potential structural features of the synthesized molecule, and MG seeks to construct the eventual molecule. The validity of the generated molecules was demonstrated by a benchmark platform of molecular generation models. The interaction between the generated molecules and the target proteins was also verified on the basis of two metrics, drug-target affinity and molecular docking. The results of the experiments indicated the efficacy of the model for direct molecule generation solely conditioned on amino acid sequence.
Affiliation(s)
- Yangyang Chen
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Zixu Wang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Lei Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China
- Jianmin Wang
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon, 21983, Republic of Korea
- Bioinformatics and Molecular Design Research Center (BMDRC), Incheon, 21983, Republic of Korea
- Pengyong Li
- School of Computer Science and Technology, Xidian University, Xian, 710071, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, People's Republic of China
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
|
23
|
Bougueroua S, Bricage M, Aboulfath Y, Barth D, Gaigeot MP. Algorithmic Graph Theory, Reinforcement Learning and Game Theory in MD Simulations: From 3D Structures to Topological 2D-Molecular Graphs (2D-MolGraphs) and Vice Versa. Molecules 2023; 28:2892. [PMID: 37049654 PMCID: PMC10096312 DOI: 10.3390/molecules28072892] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/24/2023] [Revised: 03/17/2023] [Accepted: 03/18/2023] [Indexed: 04/14/2023] Open
Abstract
This paper reviews graph-theory-based methods that were recently developed in our group for post-processing molecular dynamics trajectories. We show that the use of algorithmic graph theory not only provides a direct and fast methodology to identify conformers sampled over time but also allows to follow the interconversions between the conformers through graphs of transitions in time. Examples of gas phase molecules and inhomogeneous aqueous solid interfaces are presented to demonstrate the power of topological 2D graphs and their versatility for post-processing molecular dynamics trajectories. An even more complex challenge is to predict 3D structures from topological 2D graphs. Our first attempts to tackle such a challenge are presented with the development of game theory and reinforcement learning methods for predicting the 3D structure of a gas-phase peptide.
Affiliation(s)
- Sana Bougueroua
- Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
- Marie Bricage
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Ylène Aboulfath
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Dominique Barth
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Marie-Pierre Gaigeot
- Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
|
24
|
Zhang H, Li X, Li Z, Huang D, Zhang L. Estimation of Particle Location in Granular Materials Based on Graph Neural Networks. Micromachines 2023; 14:714. [PMID: 37420946 DOI: 10.3390/mi14040714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 03/20/2023] [Accepted: 03/21/2023] [Indexed: 07/09/2023]
Abstract
Particle locations determine the whole structure of a granular system, which is crucial to understanding various anomalous behaviors in glasses and amorphous solids. How to accurately determine the coordinates of each particle in such materials within a short time has always been a challenge. In this paper, we use an improved graph convolutional neural network to estimate the particle locations in two-dimensional photoelastic granular materials purely from the knowledge of the distances for each particle, which can be estimated in advance via a distance estimation algorithm. The robustness and effectiveness of our model are verified by testing other granular systems with different disorder degrees, as well as systems with different configurations. In this study, we attempt to provide a new route to the structural information of granular systems irrelevant to dimensionality, compositions, or other material properties.
Affiliation(s)
- Hang Zhang
- School of Automation, Central South University, Changsha 410083, China
| | - Xingqiao Li
- School of Automation, Central South University, Changsha 410083, China
| | - Zirui Li
- School of Automation, Central South University, Changsha 410083, China
| | - Duan Huang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Ling Zhang
- School of Automation, Central South University, Changsha 410083, China
| |
|
25
|
Sicard J, Barbe S, Boutrou R, Bouvier L, Delaplace G, Lashermes G, Théron L, Vitrac O, Tonda A. A primer on predictive techniques for food and bioresources transformation processes. J FOOD PROCESS ENG 2023. [DOI: 10.1111/jfpe.14325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023]
Affiliation(s)
| | | | | | - Laurent Bouvier
- UMET Université de Lille, CNRS, Centrale Lille, INRAE Villeneuve‐D'Ascq France
| | - Guillaume Delaplace
- UMET Université de Lille, CNRS, Centrale Lille, INRAE Villeneuve‐D'Ascq France
| | | | | | - Olivier Vitrac
- SayFood, INRAE, AgroParisTech Université Paris Saclay Massy France
| | - Alberto Tonda
- MIA‐Paris, AgroParisTech, INRAE Université Paris Saclay Paris France
| |
|
26
|
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 60] [Impact Index Per Article: 60.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023] Open
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the science community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and research areas that need protein structure information, such as drug discovery, protein design, and prediction of protein function. Although AF2 was developed only recently, there are already quite a few application studies of AF2 in the fields of biology and medicine, many of which have preliminarily demonstrated its potential. To better understand AF2 and promote its applications, this article summarizes the principle and system architecture of AF2 as well as the recipe for its success, and focuses particularly on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 predictions are also discussed.
Affiliation(s)
- Zhenyu Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
| | - Xiaoxi Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
| | - Yi Zhao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
| | - Runsheng Chen
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China.
| |
|
27
|
Wang F, Sangfuang N, McCoubrey LE, Yadav V, Elbadawi M, Orlu M, Gaisford S, Basit AW. Advancing oral delivery of biologics: Machine learning predicts peptide stability in the gastrointestinal tract. Int J Pharm 2023; 634:122643. [PMID: 36709014 DOI: 10.1016/j.ijpharm.2023.122643] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/05/2022] [Revised: 01/18/2023] [Accepted: 01/20/2023] [Indexed: 01/26/2023]
Abstract
The oral delivery of peptide therapeutics could facilitate precision treatment of numerous gastrointestinal (GI) and systemic diseases with simple administration for patients. However, the vast majority of licensed peptide drugs are currently administered parenterally due to prohibitive peptide instability in the GI tract. As such, the development of GI-stable peptides is receiving considerable investment. This study provides researchers with the first tool to predict the GI stability of peptide therapeutics based solely on the amino acid sequence. Both unsupervised and supervised machine learning techniques were trained on literature-extracted data describing peptide stability in simulated gastric and small intestinal fluid (SGF and SIF). Based on 109 peptide incubations, classification models for SGF and SIF were developed. The best models utilized k-Nearest Neighbor (for SGF) and XGBoost (for SIF) algorithms, with accuracies of 75.1% (SGF) and 69.3% (SIF), and f1 scores of 84.5% (SGF) and 73.4% (SIF) under 5-fold cross-validation. Feature importance analysis demonstrated that peptides' lipophilicity, rigidity, and size were key determinants of stability. These models are now available to those working on the development of oral peptide therapeutics.
Affiliation(s)
- Fanjin Wang
- Intract Pharma Ltd. London Bioscience Innovation Centre, 2 Royal College St, London NW1 0NH, UK
| | | | | | - Vipul Yadav
- Intract Pharma Ltd. London Bioscience Innovation Centre, 2 Royal College St, London NW1 0NH, UK
| | - Moe Elbadawi
- UCL School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, UK
| | - Mine Orlu
- UCL School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, UK
| | - Simon Gaisford
- UCL School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, UK
| | - Abdul W Basit
- UCL School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, UK.
| |
|
28
|
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.16.528728. [PMID: 36824945 PMCID: PMC9949123 DOI: 10.1101/2023.02.16.528728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key post-translational modification involved in physiology and disease. The ability to robustly and rapidly predict protease substrate specificity would also enable targeted proteolytic cleavage - editing - of a target protein by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally-derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the three-dimensional structure and energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically-grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases: the NS3/4 protease from the Hepatitis C virus (HCV) and the Tobacco Etch Virus (TEV) proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pre-trained PGCN model to guide the design of TEV protease libraries for cleaving two non-canonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. 
The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Joseph H. Lubin
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
| | | | - Guanyang Wang
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
| |
|
29
|
Li AJ, Lu M, Desta I, Sundar V, Grigoryan G, Keating AE. Neural network-derived Potts models for structure-based protein design using backbone atomic coordinates and tertiary motifs. Protein Sci 2023; 32:e4554. [PMID: 36564857 PMCID: PMC9854172 DOI: 10.1002/pro.4554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/15/2022] [Accepted: 12/20/2022] [Indexed: 12/25/2022]
Abstract
Designing novel proteins to perform desired functions, such as binding or catalysis, is a major goal in synthetic biology. A variety of computational approaches can aid in this task. An energy-based framework rooted in the sequence-structure statistics of tertiary motifs (TERMs) can be used for sequence design on predefined backbones. Neural network models that use backbone coordinate-derived features provide another way to design new proteins. In this work, we combine the two methods to make neural structure-based models more suitable for protein design. Specifically, we supplement backbone-coordinate features with TERM-derived data, as inputs, and we generate energy functions as outputs. We present two architectures that generate Potts models over the sequence space: TERMinator, which uses both TERM-based and coordinate-based information, and COORDinator, which uses only coordinate-based information. Using these two models, we demonstrate that TERMs can be utilized to improve native sequence recovery performance of neural models. Furthermore, we demonstrate that sequences designed by TERMinator are predicted to fold to their target structures by AlphaFold. Finally, we show that both TERMinator and COORDinator learn notions of energetics, and these methods can be fine-tuned on experimental data to improve predictions. Our results suggest that using TERM-based and coordinate-based features together may be beneficial for protein design and that structure-based neural models that produce Potts energy tables have utility for flexible applications in protein science.
Affiliation(s)
- Alex J. Li
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Mindren Lu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Israel Desta
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Vikram Sundar
- Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA
| | - Amy E. Keating
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
|
30
|
Soleymani F, Paquet E, Viktor HL, Michalowski W, Spinello D. ProtInteract: A deep learning framework for predicting protein-protein interactions. Comput Struct Biotechnol J 2023; 21:1324-1348. [PMID: 36817951 PMCID: PMC9929211 DOI: 10.1016/j.csbj.2023.01.028] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/05/2022] [Revised: 01/20/2023] [Accepted: 01/20/2023] [Indexed: 01/26/2023] Open
Abstract
Proteins mainly perform their functions by interacting with other proteins. Protein-protein interactions underpin various biological activities such as metabolic cycles, signal transduction, and immune response. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. We therefore developed the ProtInteract framework to predict protein-protein interaction. ProtInteract comprises two components: first, a novel autoencoder architecture that encodes each protein's primary structure to a lower-dimensional vector while preserving its underlying sequence attributes. This leads to faster training of the second network, a deep convolutional neural network (CNN) that receives encoded proteins and predicts their interaction under three different scenarios. In each scenario, the deep CNN predicts the class of a given encoded protein pair. Each class indicates different ranges of confidence scores corresponding to the probability of whether a predicted interaction occurs or not. The proposed framework features significantly low computational complexity and relatively fast response. The contributions of this work are twofold. First, ProtInteract assimilates the protein's primary structure into a pseudo-time series. Therefore, we leverage the nature of the time series of proteins and their physicochemical properties to encode a protein's amino acid sequence into a lower-dimensional vector space. This approach enables extracting highly informative sequence attributes while reducing computational complexity. Second, the ProtInteract framework utilises this information to identify protein interactions with other proteins based on its amino acid configuration. Our results suggest that the proposed framework performs with high accuracy and efficiency in predicting protein-protein interactions.
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
| | - Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada. Corresponding author.
| | - Herna Lydia Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON K1N 6N5, Canada
| | | | - Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
| |
|
31
|
Sharifi F, Sharifi I, Babaei Z, Alahdin S, Afgar A. Bioinformatics evaluation of anticancer properties of GP63 protein-derived peptides on MMP2 protein of melanoma cancer. J Pathol Inform 2023; 14:100190. [PMID: 36700237 PMCID: PMC9867975 DOI: 10.1016/j.jpi.2023.100190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 01/09/2023] [Accepted: 01/09/2023] [Indexed: 01/13/2023] Open
Abstract
Background: GP63, also known as Leishmanolysin, is a multifunctional virulence factor abundant on the surface of Leishmania spp. Small peptides with anticancer capabilities that are selective and toxic to cancer cells are known as anticancer peptides. We aimed to demonstrate the activity of GP63 and its anticancer properties on melanoma using a range of in silico tools and screening methods to identify predicted and designed anticancer peptides. Methods: Various in silico modeling methodologies were used to establish the three-dimensional (3D) structure of GP63. The modeled structures were refined and re-evaluated and the quality of the built models was assessed; docking was then used to find the interacting amino acids between MMP2 and GP63 and its anticancer peptides. AntiCP 2.0 was used for screening anticancer peptides. 2D interaction plots of protein-ligand complexes were evaluated with the Protein-Ligand Interaction Profiler server. This is the first time that anticancer peptides of GP63, together with the predicted and designed peptides, have been used. Results: We used 3 peptides of GP63 based on the AntiCP 2.0 server with scores of 0.63, 0.53, and 0.49, and common peptides of GP63/MMP2 (continuous peptide: meaning the completely selected peptide after docking, with non-anticancer effect, predicted with a 0.58 score, and designed peptides with 0.47 and 0.45 scores by the AntiCP 2.0 server). Conclusions: The antileishmanial and anticancer peptide research topics exemplify the multidisciplinary nature of peptide research. The advancement of therapeutics targeting cancer and/or Leishmania requires the interconnected research strategy shown in this work.
Key Words
- ACPs, anticancer peptides
- Anticancer
- CASTp, Computed Atlas of Surface Topography of proteins
- CL, cutaneous leishmaniasis
- GP63, Glycoprotein 63
- In silico
- Leishmania
- Leishmanolysin
- MD, molecular dynamics
- MMPs, matrix metalloproteases
- MSP, major surface protease
- Matrix metalloproteases
- PDB, Protein Data Bank
- PLIP, Protein–Ligand Interaction Profiler
- Peptide
- Protein–Ligand Interaction Profiler
- ROS, reactive oxygen species formation
- SVM, Support Vector Machine
- VL, visceral leishmaniasis
- kNN, k-Nearest Neighbors
Affiliation(s)
- Fatemeh Sharifi
- Research Center of Tropical and Infectious Diseases, Kerman University of Medical Sciences, Kerman, Iran
| | - Iraj Sharifi
- Leishmaniasis Research Center, Kerman University of Medical Sciences, Kerman, Iran
| | - Zahra Babaei
- Leishmaniasis Research Center, Kerman University of Medical Sciences, Kerman, Iran
| | - Sodabeh Alahdin
- Leishmaniasis Research Center, Kerman University of Medical Sciences, Kerman, Iran; Student Research Committee, Kerman University of Medical Sciences, Kerman, Iran
| | - Ali Afgar
- Research Center for Hydatid Disease in Iran, Kerman University of Medical Sciences, Kerman, Iran. Corresponding author.
| |
|
32
|
Nallasamy V, Seshiah M. Energy Profile Bayes and Thompson Optimized Convolutional Neural Network protein structure prediction. Neural Comput Appl 2023; 35:1983-2006. [PMID: 36245797 PMCID: PMC9542649 DOI: 10.1007/s00521-022-07868-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/21/2022] [Indexed: 01/12/2023]
Abstract
In living organisms, proteins are considered as the executants of biological functions. Owing to its pivotal role played in protein folding patterns, comprehension of protein structure is a challenging issue. Moreover, owing to numerous protein sequence exploration in protein data banks and complication of protein structures, experimental methods are found to be inadequate for protein structural class prediction. Hence, it is very much advantageous to design a reliable computational method to predict protein structural classes from protein sequences. In the recent few years there has been an elevated interest in using deep learning to assist protein structure prediction as protein structure prediction models can be utilized to screen a large number of novel sequences. In this regard, we propose a model employing Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function called Energy Profile Legion-Class Bayes Protein Structure Identification model. Followed by this, we use a Thompson Optimized convolutional neural network to extract features between amino acids and then the Thompson Optimized SoftMax function is employed to extract associations between protein sequences for predicting secondary protein structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method tested distinct unique protein data and was compared to the state-of-the-art methods, the Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation sites prediction tool called a Computational Framework, the Deep Learning and a distance-based protein structure prediction using deep learning. The results obtained when applied with the Biopython tool with respect to protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision, respectively, are measured. 
The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.
Affiliation(s)
- Varanavasi Nallasamy
- Cognizant Technology Solutions Pvt. Ltd, CHIL SEZ IT Park, Keeranatham, Saravanam Patti, Coimbatore, Tamil Nadu 641035 India
| | - Malarvizhi Seshiah
- Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Namakkal, Tamil Nadu India
| |
|
33
|
Syrlybaeva R, Strauch EM. Deep learning of protein sequence design of protein-protein interactions. Bioinformatics 2023; 39:6827796. [PMID: 36377772 PMCID: PMC9947925 DOI: 10.1093/bioinformatics/btac733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Revised: 09/16/2022] [Accepted: 11/14/2022] [Indexed: 11/16/2022] Open
Abstract
MOTIVATION: As more data of experimentally determined protein structures are becoming available, data-driven models to describe protein sequence-structure relationships become more feasible. Within this space, the amino acid sequence design of protein-protein interactions is still a rather challenging subproblem with very low success rates, yet it is central to most biological processes. RESULTS: We developed an attention-based deep learning model inspired by algorithms used for image-caption assignments to design peptides or protein fragment sequences. Our trained model can be applied for the redesign of natural protein interfaces or the designed protein interaction fragments. Here, we validate the potential by recapitulating naturally occurring protein-protein interactions including antibody-antigen complexes. The designed interfaces accurately capture essential native interactions and have comparable native-like binding affinities in silico. Furthermore, our model does not need a precise backbone location, making it an attractive tool for working with de novo design of protein-protein interactions. AVAILABILITY AND IMPLEMENTATION: The source code of the method is available at https://github.com/strauchlab/iNNterfaceDesign. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Raulia Syrlybaeva
- Department of Pharmaceutical and Biomedical Sciences, University of Georgia, Athens, GA 30602, USA
| | - Eva-Maria Strauch
- Department of Pharmaceutical and Biomedical Sciences, University of Georgia, Athens, GA 30602, USA; Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
| |
|
34
|
Bansia H, Ramakumar S. Homology Modeling of Antibody Variable Regions: Methods and Applications. Methods Mol Biol 2023; 2627:301-319. [PMID: 36959454 DOI: 10.1007/978-1-0716-2974-1_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/25/2023]
Abstract
Adaptive immunity specifically protects us from antigenic challenges. Antibodies are key effector proteins of adaptive immunity, and they are remarkable in their ability to recognize a virtually limitless number of antigens. Fragment variable (FV), the antigen-binding region of antibodies, can be split into two main components, namely, framework and complementarity determining regions. The framework (FR) consists of light-chain framework (FRL) and heavy-chain framework (FRH). Similarly, the complementarity determining regions (CDRs) comprise light-chain CDRs 1-3 (CDRs L1-3) and heavy-chain CDRs 1-3 (CDRs H1-3). While FRs are relatively constant in sequence and structure across diverse antibodies, sequence variation in CDRs leading to differential conformations of CDR loops accounts for the distinct antigenic specificities of diverse antibodies. The conserved structural features in FRs and conformity of CDRs to a limited set of standard conformations allow for the accurate prediction of FV models using homology modeling techniques. Antibody structure prediction from its amino acid sequence has numerous important applications including prediction of antibody-antigen interaction interfaces and redesign of therapeutically and biotechnologically useful antibodies with improved affinity. This chapter summarizes the current practices employed in the successful homology modeling of antibody variable regions and the potential applications of the generated homology models.
Affiliation(s)
- Harsh Bansia
- Department of Physics, Indian Institute of Science, Bengaluru, India.
- Advanced Science Research Center at The Graduate Center of the City University of New York, New York, NY, USA.
| | | |
|
35
|
Chakraborty C, Bhattacharya M, Chatterjee S, Sharma AR, Saha RP, Dhama K, Agoramoorthy G. Integrative Bioinformatics Approaches Indicate a Particular Pattern of Some SARS-CoV-2 and Non-SARS-CoV-2 Proteins. Vaccines (Basel) 2022; 11:vaccines11010038. [PMID: 36679883 PMCID: PMC9864461 DOI: 10.3390/vaccines11010038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 12/12/2022] [Accepted: 12/20/2022] [Indexed: 12/28/2022] Open
Abstract
Pattern recognition plays a critical role in integrative bioinformatics to determine the structural patterns of proteins of viruses such as SARS-CoV-2. This study identifies the pattern of SARS-CoV-2 proteins to depict the structure-function relationships of the protein alphabets of SARS-CoV-2 and COVID-19. The assembly enumeration algorithm, Anisotropic Network Model, Gaussian Network Model, Markovian Stochastic Model, and image comparison protein-like alphabets were used. The distance score was the lowest with 22 for "I" and highest with 40 for "9". For post-processing and decision, two protein alphabets "C" (PDB ID: 6XC3) and "S" (PDB ID: 7OYG) were evaluated to understand the structural, functional, and evolutionary relationships, and we found uniqueness in the functionality of proteins. Here, models were constructed using "SARS-CoV-2 proteins" (12 numbers) and "non-SARS-CoV-2 proteins" (14 numbers) to create two words, "SARS-CoV-2" and "COVID-19". Similarly, we developed two slogans: "Vaccinate the world against COVID-19" and "Say no to SARS-CoV-2", which were made with the proteins structure. It might generate vaccine-related interest to broad reader categories. Finally, the evolutionary process appears to enhance the protein structure smoothly to provide suitable functionality shaped by natural selection.
Affiliation(s)
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata 700126, West Bengal, India
- Correspondence:
| | - Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore 756020, Odisha, India
| | - Srijan Chatterjee
- Institute for Skeletal Aging and Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-si 24252, Gangwon-do, Republic of Korea
| | - Ashish Ranjan Sharma
- Institute for Skeletal Aging and Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-si 24252, Gangwon-do, Republic of Korea
| | - Rudra P. Saha
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata 700126, West Bengal, India
| | - Kuldeep Dhama
- Division of Pathology, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, Uttar Pradesh, India
| | | |
|
36
|
Lansford JL, Barnes BC, Rice BM, Jensen KF. Building Chemical Property Models for Energetic Materials from Small Datasets Using a Transfer Learning Approach. J Chem Inf Model 2022; 62:5397-5410. [PMID: 36240441 DOI: 10.1021/acs.jcim.2c00841] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
Abstract
For many experimentally measured chemical properties that cannot be directly computed from first-principles, the existing physics-based models do not extrapolate well to out-of-sample molecules, and experimental datasets themselves are too small for traditional machine learning (ML) approaches. To overcome these limitations, we apply a transfer learning approach, whereby we simultaneously train a multi-target regression model on a small number of molecules with experimentally measured values and a large number of molecules with related computed properties. We demonstrate this methodology on predicting the experimentally measured impact sensitivity of energetic crystals, finding that both characteristics of the computed dataset and model architecture are important to prediction accuracy of the small experimental dataset. Our directed-message passing neural network (D-MPNN) ML model using transfer learning outperforms direct-ML and physics-based models on a diverse test set, and the new methods described here are widely applicable to modeling many other structure-property relationships.
Affiliation(s)
- Joshua L Lansford
- U.S. Army Combat Capabilities Development Command (DEVCOM) Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States; Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
| | - Brian C Barnes
- U.S. Army Combat Capabilities Development Command (DEVCOM) Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
| | - Betsy M Rice
- U.S. Army Combat Capabilities Development Command (DEVCOM) Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
| | - Klavs F Jensen
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
|
37
|
Scott O, Gu J, Chan AE. Classification of Protein-Binding Sites Using a Spherical Convolutional Neural Network. J Chem Inf Model 2022; 62:5383-5396. [PMID: 36341715 PMCID: PMC9709917 DOI: 10.1021/acs.jcim.2c00832] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/09/2022]
Abstract
The analysis and comparison of protein-binding sites aid various applications in the drug discovery process, e.g., hit finding, drug repurposing, and polypharmacology. Classification of binding sites has been a hot topic for the past 30 years, and many different methods have been published. The rapid development of machine learning computational algorithms, coupled with the large volume of publicly available protein-ligand 3D structures, makes it possible to apply deep learning techniques in binding site comparison. Our method uses a cutting-edge spherical convolutional neural network based on the DeepSphere architecture to learn global representations of protein-binding sites. The model was trained on TOUGH-C1 and TOUGH-M1 data and validated with the ProSPECCTs datasets. Our results show that our model can (1) perform well in protein-binding site similarity and classification tasks and (2) learn and separate the physicochemical properties of binding sites. Lastly, we tested the model on a set of kinases, where the results show that it is able to cluster the different kinase subfamilies effectively. This example demonstrates the method's promise for lead hopping within or outside a protein target, directly based on binding site information.
|
38
|
Mahajan SP, Ruffolo JA, Frick R, Gray JJ. Hallucinating structure-conditioned antibody libraries for target-specific binders. Front Immunol 2022; 13:999034. [PMID: 36341416 PMCID: PMC9635398 DOI: 10.3389/fimmu.2022.999034] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/20/2022] [Accepted: 09/22/2022] [Indexed: 11/29/2022] Open
Abstract
Antibodies are widely developed and used as therapeutics to treat cancer, infectious disease, and inflammation. During development, initial leads routinely undergo additional engineering to increase their target affinity. Experimental methods for affinity maturation are expensive, laborious, and time-consuming and rarely allow the efficient exploration of the relevant design space. Deep learning (DL) models are transforming the field of protein engineering and design. While several DL-based protein design methods have shown promise, the antibody design problem is distinct, and specialized models for antibody design are desirable. Inspired by hallucination frameworks that leverage accurate structure prediction DL models, we propose the FvHallucinator for designing antibody sequences, especially the CDR loops, conditioned on an antibody structure. Such a strategy generates targeted CDR libraries that retain the conformation of the binder and thereby the mode of binding to the epitope on the antigen. On a benchmark set of 60 antibodies, FvHallucinator generates sequences resembling natural CDRs and recapitulates perplexity of canonical CDR clusters. Furthermore, the FvHallucinator designs amino acid substitutions at the VH-VL interface that are enriched in human antibody repertoires and therapeutic antibodies. We propose a pipeline that screens FvHallucinator designs to obtain a library enriched in binders for an antigen of interest. We apply this pipeline to the CDR H3 of the Trastuzumab-HER2 complex to generate in silico designs predicted to improve upon the binding affinity and interfacial properties of the original antibody. Thus, the FvHallucinator pipeline enables generation of inexpensive, diverse, and targeted antibody libraries enriched in binders for antibody affinity maturation.
Affiliation(s)
- Sai Pooja Mahajan
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States
| | - Jeffrey A. Ruffolo
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD, United States
| | - Rahel Frick
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States
| | - Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD, United States
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, United States
- *Correspondence: Jeffrey J. Gray
|
39
|
Mukherjee S, Cassini TA, Hu N, Yang T, Li B, Shen W, Moth CW, Rinker DC, Sheehan JH, Cogan JD, Newman JH, Hamid R, Macdonald RL, Roden DM, Meiler J, Kuenze G, Phillips JA, Capra JA. Personalized structural biology reveals the molecular mechanisms underlying heterogeneous epileptic phenotypes caused by de novo KCNC2 variants. HGG ADVANCES 2022; 3:100131. [PMID: 36035247 PMCID: PMC9399384 DOI: 10.1016/j.xhgg.2022.100131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 07/11/2022] [Indexed: 11/28/2022] Open
Abstract
Whole-exome sequencing (WES) in the clinic has identified several rare monogenic developmental and epileptic encephalopathies (DEE) caused by ion channel variants. However, WES often fails to provide actionable insight for rare diseases, such as DEEs, due to the challenges of interpreting variants of unknown significance (VUS). Here, we describe a "personalized structural biology" (PSB) approach that leverages recent innovations in the analysis of protein 3D structures to address this challenge. We illustrate this approach in an Undiagnosed Diseases Network (UDN) individual with DEE symptoms and a de novo VUS in KCNC2 (p.V469L), the Kv3.2 voltage-gated potassium channel. A nearby KCNC2 variant (p.V471L) was recently suggested to cause DEE-like phenotypes. Computational structural modeling suggests that both affect protein function. However, despite their proximity, the p.V469L variant is likely to sterically block the channel pore, while the p.V471L variant is likely to stabilize the open state. Biochemical and electrophysiological analyses demonstrate heterogeneous loss-of-function and gain-of-function effects, as well as differential response to 4-aminopyridine treatment. Molecular dynamics simulations illustrate that the pore of the p.V469L variant is more constricted, increasing the energetic barrier for K+ permeation, whereas the p.V471L variant stabilizes the open conformation. Our results implicate variants in KCNC2 as causative for DEE and guide the interpretation of a UDN individual. They further delineate the molecular basis for the heterogeneous clinical phenotypes resulting from two proximal pathogenic variants. This demonstrates how the PSB approach can provide an analytical framework for individualized hypothesis-driven interpretation of protein-coding VUS.
Affiliation(s)
- Souhrid Mukherjee
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
| | - Thomas A. Cassini
- Department of Internal Medicine, National Institutes of Health Clinical Center, Bethesda, MD 20814, USA
| | - Ningning Hu
- Department of Neurology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Department of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Tao Yang
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Bian Li
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
| | - Wangzhen Shen
- Department of Neurology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Christopher W. Moth
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
| | - David C. Rinker
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
| | - Jonathan H. Sheehan
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- John T. Milliken Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
| | - Joy D. Cogan
- Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - Undiagnosed Diseases Network
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Pulmonary Hypertension Center, Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Department of Neurology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Department of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- John T. Milliken Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Internal Medicine, National Institutes of Health Clinical Center, Bethesda, MD 20814, USA
- Institute for Drug Discovery, Leipzig University Medical School, Leipzig, SAC 04103, Germany
- Department of Chemistry, Leipzig University, Leipzig, SAC 04109, Germany
- Department of Computer Science, Leipzig University, Leipzig, SAC 04109, Germany
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94143, USA
| | - John H. Newman
- Pulmonary Hypertension Center, Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Rizwan Hamid
- Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - Robert L. Macdonald
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Department of Neurology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Dan M. Roden
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Institute for Drug Discovery, Leipzig University Medical School, Leipzig, SAC 04103, Germany
- Department of Chemistry, Leipzig University, Leipzig, SAC 04109, Germany
- Department of Computer Science, Leipzig University, Leipzig, SAC 04109, Germany
| | - Georg Kuenze
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Institute for Drug Discovery, Leipzig University Medical School, Leipzig, SAC 04103, Germany
| | - John A. Phillips
- Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - John A. Capra
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94143, USA
|
40
|
Hirschi S, Ward TR, Meier WP, Müller DJ, Fotiadis D. Synthetic Biology: Bottom-Up Assembly of Molecular Systems. Chem Rev 2022; 122:16294-16328. [PMID: 36179355 DOI: 10.1021/acs.chemrev.2c00339] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 11/28/2022]
Abstract
The bottom-up assembly of biological and chemical components opens exciting opportunities to engineer artificial vesicular systems for applications with previously unmet requirements. The modular combination of scaffolds and functional building blocks enables the engineering of complex systems with biomimetic or new-to-nature functionalities. Inspired by the compartmentalized organization of cells and organelles, lipid or polymer vesicles are widely used as model membrane systems to investigate the translocation of solutes and the transduction of signals by membrane proteins. The bottom-up assembly and functionalization of such artificial compartments enables full control over their composition and can thus provide specifically optimized environments for synthetic biological processes. This review aims to inspire future endeavors by providing a diverse toolbox of molecular modules, engineering methodologies, and different approaches to assemble artificial vesicular systems. Important technical and practical aspects are addressed and selected applications are presented, highlighting particular achievements and limitations of the bottom-up approach. Complementing the cutting-edge technological achievements, fundamental aspects are also discussed to cater to the inherently diverse background of the target audience, which results from the interdisciplinary nature of synthetic biology. The engineering of proteins as functional modules and the use of lipids and block copolymers as scaffold modules for the assembly of functionalized vesicular systems are explored in detail. Particular emphasis is placed on ensuring the controlled assembly of these components into increasingly complex vesicular systems. Finally, all descriptions are presented in the greater context of engineering valuable synthetic biological systems for applications in biocatalysis, biosensing, bioremediation, or targeted drug delivery.
Affiliation(s)
- Stephan Hirschi
- Institute of Biochemistry and Molecular Medicine, University of Bern, Bühlstrasse 28, 3012 Bern, Switzerland
- Molecular Systems Engineering, National Centre of Competence in Research (NCCR), 4002 Basel, Switzerland
| | - Thomas R Ward
- Department of Chemistry, University of Basel, St. Johanns-Ring 19, 4056 Basel, Switzerland
- Molecular Systems Engineering, National Centre of Competence in Research (NCCR), 4002 Basel, Switzerland
| | - Wolfgang P Meier
- Department of Chemistry, University of Basel, St. Johanns-Ring 19, 4056 Basel, Switzerland
- Molecular Systems Engineering, National Centre of Competence in Research (NCCR), 4002 Basel, Switzerland
| | - Daniel J Müller
- Department of Biosystems Science and Engineering, ETH Zürich, Mattenstrasse 26, 4058 Basel, Switzerland
- Molecular Systems Engineering, National Centre of Competence in Research (NCCR), 4002 Basel, Switzerland
| | - Dimitrios Fotiadis
- Institute of Biochemistry and Molecular Medicine, University of Bern, Bühlstrasse 28, 3012 Bern, Switzerland
- Molecular Systems Engineering, National Centre of Competence in Research (NCCR), 4002 Basel, Switzerland
|
41
|
Soleymani F, Paquet E, Viktor H, Michalowski W, Spinello D. Protein-protein interaction prediction with deep learning: A comprehensive review. Comput Struct Biotechnol J 2022; 20:5316-5341. [PMID: 36212542 PMCID: PMC9520216 DOI: 10.1016/j.csbj.2022.08.070] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 06/22/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/15/2022] Open
Abstract
Most proteins perform their biological function by interacting with themselves or other molecules. Thus, one may obtain biological insights into protein functions, disease prevalence, and therapy development by identifying protein-protein interactions (PPI). However, finding the interacting and non-interacting protein pairs through experimental approaches is labour-intensive and time-consuming, owing to the variety of proteins. Hence, protein-protein interaction and protein-ligand binding problems have drawn attention in the fields of bioinformatics and computer-aided drug discovery. Deep learning methods paved the way for scientists to predict the 3-D structure of proteins from genomes, predict the functions and attributes of a protein, and modify and design new proteins to provide desired functions. This review focuses on recent deep learning methods applied to problems including predicting protein functions, protein-protein interaction and their sites, protein-ligand binding, and protein design.
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
| | - Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada
| | - Herna Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, Canada
| | - Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
|
42
|
Abstract
AlphaFold has burst into our lives. A powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly overview AI in structural biology, including in molecular dynamics simulations and prediction of microbiota–human protein–protein interactions. We highlight the advancements accomplished by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery, which are rooted in ensembles, and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities. Since AlphaFold generates single ranked structures, rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations nor of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Mingzhen Zhang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
| | - Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, Maryland 21702, United States
| | - Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
|
43
|
Khan A, Sohaib M, Ullah R, Hussain I, Niaz S, Malak N, de la Fuente J, Khan A, Aguilar-Marcelino L, Alanazi AD, Ben Said M. Structure-based in silico design and in vitro acaricidal activity assessment of Acacia nilotica and Psidium guajava extracts against Sarcoptes scabiei var. cuniculi. Parasitol Res 2022; 121:2901-2915. [PMID: 35972548 DOI: 10.1007/s00436-022-07615-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 07/31/2022] [Indexed: 12/01/2022]
Abstract
Infestation by Sarcoptes scabiei var. cuniculi mite causes scabies in humans and mange in animals. Alternative methods for developing environmentally friendly and effective plant-based acaricides are now a priority. The purpose of this research was the in silico design and in vitro evaluation of the efficacy of ethanol extracts of Acacia nilotica and Psidium guajava plant leaves against S. scabiei. Chem-Draw ultra-software (v. 12.0.2.1076.2010) was used to draw 36 distinct compounds from these plants that were employed as ligands in docking tests against S. scabiei Aspartic protease (SsAP). With docking scores of -6.50993 and -6.16359, respectively, clionasterol (PubChem CID 457801) and mangiferin (PubChem CID 5281647) from A. nilotica inhibited the targeted protein SsAP, while only beta-sitosterol (PubChem CID 222284) from P. guajava interacted with the SsAP active site with a docking score of -6.20532. Mortality in contact bioassay at concentrations of 0.25, 0.5, 1.0, and 2.0 g/ml was determined to calculate median lethal time (LT50) and median lethal concentration (LC50) values. Acacia nilotica extract had an LC50 value of 0.218 g/ml compared to P. guajava extract, which had an LC50 value of 0.829 g/ml at 6 h. These results suggest that A. nilotica extract is more effective in killing mites, and these plants may have novel acaricidal properties against S. scabiei. Further research should focus on A. nilotica as a potential substitute for clinically available acaricides against resistant mites.
Affiliation(s)
- Afshan Khan
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan, Khyber Pakhtunkhwa, Pakistan
| | - Muhammad Sohaib
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan, Khyber Pakhtunkhwa, Pakistan
| | - Rooh Ullah
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan, Khyber Pakhtunkhwa, Pakistan
| | - Imdad Hussain
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan, Khyber Pakhtunkhwa, Pakistan
| | - Sadaf Niaz
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan, Khyber Pakhtunkhwa, Pakistan
| | - Nosheen Malak
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan, Khyber Pakhtunkhwa, Pakistan
| | - José de la Fuente
- SaBio. Instituto de Investigación en Recursos Cinegéticos IREC-CSIC-UCLM-JCCM, Ronda de Toledo s/n, 13005, Ciudad Real, Spain
- Department of Veterinary Pathobiology, Center for Veterinary Health Sciences, Oklahoma State University, Stillwater, OK, 74078, USA
| | - Adil Khan
- Department of Zoology, Bacha Khan University Charsadda, Charsadda, Khyber Pakhtunkhwa, Pakistan
| | - Liliana Aguilar-Marcelino
- National Center for Disciplinary Research in Animal Health and Safety (INIFAP), Km 11 Federal Road Cuernavaca-Cuautla, 62550, Jiutepec, Morelos, México
| | - Abdullah D Alanazi
- Department of Biological Sciences, Faculty of Science and Humanities, Shaqra University, 1040 Ad-Dawadimi, 11911, Shaqra, Saudi Arabia
| | - Mourad Ben Said
- Department of Basic Sciences, Higher Institute of Biotechnology of Sidi Thabet, University of Manouba, 2010, Manouba, Tunisia
- Laboratory of Microbiology, National School of Veterinary Medicine of Sidi Thabet, University of Manouba, 2010, Manouba, Tunisia
|
44
|
Yang H, Xiong Z, Zonta F. Construction of a Deep Neural Network Energy Function for Protein Physics. J Chem Theory Comput 2022; 18:5649-5658. [PMID: 35939398 PMCID: PMC9476656 DOI: 10.1021/acs.jctc.2c00069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The traditional approach of computational biology consists of calculating molecule properties by using approximate classical potentials. Interactions between atoms are described by an energy function derived from physical principles or fitted to experimental data. Their functional form is usually limited to pairwise interactions between atoms and does not consider complex multibody effects. More recently, neural networks have emerged as an alternative way of describing the interactions between biomolecules. In this approach, the energy function does not have an explicit functional form and is learned bottom-up from simulations at the atomistic or quantum level. In this study, we attempt a top-down approach and use deep learning methods to obtain an energy function by exploiting the large amount of experimental data acquired over the years in the field of structural biology. The energy function is represented by a probability density model learned from a large repertoire of building blocks representing local clusters of amino acids paired with their sequence signature. We demonstrated the feasibility of this approach by generating a neural network energy function and testing its validity on several applications such as discriminating decoys, assessing qualities of structural models, sampling structural conformations, and designing new protein sequences. We foresee that, in the future, our methodology could exploit the continuously increasing availability of experimental data and simulations and provide a new method for the parametrization of protein energy functions.
Affiliation(s)
- Huan Yang
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
| | - Zhaoping Xiong
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
| | - Francesco Zonta
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
|
45
|
BIGDML-Towards accurate quantum machine learning force fields for materials. Nat Commun 2022; 13:3733. [PMID: 35768400 PMCID: PMC9243122 DOI: 10.1038/s41467-022-31093-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/17/2021] [Accepted: 06/01/2022] [Indexed: 12/16/2022] Open
Abstract
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof. Currently, MLFFs often introduce tradeoffs that restrict their practical applicability to small subsets of chemical space or require exhaustive datasets for training. Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning (BIGDML) approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 geometries for materials including pristine and defect-containing 2D and 3D semiconductors and metals, as well as chemisorbed and physisorbed atomic and molecular adsorbates on surfaces. The BIGDML model employs the full relevant symmetry group for a given material, does not assume artificial atom types or localization of atomic interactions and exhibits high data efficiency and state-of-the-art energy accuracies (errors substantially below 1 meV per atom) for an extended set of materials. Extensive path-integral molecular dynamics carried out with BIGDML models demonstrate the counterintuitive localization of benzene-graphene dynamics induced by nuclear quantum effects and their strong contributions to the hydrogen diffusion coefficient in a Pd crystal for a wide range of temperatures.
|
46
|
Shi Z, Liu P, Liao X, Mao Z, Zhang J, Wang Q, Sun J, Ma H, Ma Y. Data-Driven Synthetic Cell Factories Development for Industrial Biomanufacturing. BIODESIGN RESEARCH 2022; 2022:9898461. [PMID: 37850146 PMCID: PMC10521697 DOI: 10.34133/2022/9898461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 05/26/2022] [Indexed: 10/19/2023] Open
Abstract
Revolutionary breakthroughs in artificial intelligence (AI) and machine learning (ML) have had a profound impact on a wide range of scientific disciplines, including the development of artificial cell factories for biomanufacturing. In this paper, we review the latest studies on the application of data-driven methods for the design of new proteins, pathways, and strains. We first briefly introduce the various types of data and databases relevant to industrial biomanufacturing, which are the basis for data-driven research. Different types of algorithms, including traditional ML and more recent deep learning methods, are also presented. We then demonstrate how these data-based approaches can be applied to address various issues in cell factory development using examples from recent studies, including the prediction of protein function, improvement of metabolic models, estimation of missing kinetic parameters, design of non-natural biosynthesis pathways, and pathway optimization. In the last section, we discuss the current limitations of these data-driven approaches and propose that data-driven methods should be integrated with mechanistic models to complement each other and facilitate the development of synthetic strains for industrial biomanufacturing.
Affiliation(s)
- Zhenkun Shi
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Pi Liu
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Xiaoping Liao
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Zhitao Mao
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Jianqi Zhang
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Qinhong Wang
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Jibin Sun
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Hongwu Ma
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| | - Yanhe Ma
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
| |
|
47
|
Mattiello L, Rütgers M, Sua-Rojas MF, Tavares R, Soares JS, Begcy K, Menossi M. Molecular and Computational Strategies to Increase the Efficiency of CRISPR-Based Techniques. Frontiers in Plant Science 2022; 13:868027. [PMID: 35712599 PMCID: PMC9194676 DOI: 10.3389/fpls.2022.868027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 04/27/2022] [Indexed: 06/15/2023]
Abstract
The prokaryote-derived Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas-mediated gene editing tools have revolutionized our ability to precisely manipulate specific genome sequences in plants and animals. The simplicity, precision, affordability, and robustness of this technology have allowed a myriad of genomes from a diverse group of plant species to be successfully edited. Even though CRISPR/Cas, base editing, and prime editing technologies have been rapidly adopted and implemented in plants, their editing efficiency and specificity vary greatly. In this review, we provide a critical overview of the recent advances in CRISPR/Cas9-derived technologies and their implications for enhancing editing efficiency. We highlight the major efforts to engineer Cas9, Cas12a, Cas12b, and Cas12f proteins to improve their efficiencies. We also provide a perspective on the global future of agriculturally based products using DNA-free CRISPR/Cas techniques. Improving the efficiency of CRISPR-based technologies will enable the implementation of genome editing tools in a variety of crop plants, as well as accelerate progress in basic research and molecular breeding.
Affiliation(s)
- Lucia Mattiello
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
| | - Mark Rütgers
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
| | - Maria Fernanda Sua-Rojas
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
| | - Rafael Tavares
- Cell and Developmental Biology, John Innes Centre, Norwich, United Kingdom
| | - José Sérgio Soares
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
| | - Kevin Begcy
- Environmental Horticulture Department, University of Florida, Gainesville, FL, United States
| | - Marcelo Menossi
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
| |
|
48
|
Talluri S. Algorithms for protein design. Advances in Protein Chemistry and Structural Biology 2022; 130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Computational Protein Design has the potential to contribute to major advances in enzyme technology, vaccine design, receptor-ligand engineering, biomaterials, nanosensors, and synthetic biology. Although Protein Design is a challenging problem, proteins can be designed by experts in Protein Design, as well as by non-experts whose primary interests are in the applications of Protein Design. The increased accessibility of Protein Design technology is attributable to the accumulated knowledge and experience with Protein Design as well as to the availability of software and online resources. The objective of this review is to serve as a guide to the relevant literature with a focus on the novel methods and algorithms that have been developed or applied for Protein Design, and to assist in the selection of algorithms for Protein Design. Novel algorithms and models that have been introduced to utilize the enormous amount of experimental data and novel computational hardware have the potential for producing substantial increases in the accuracy, reliability and range of applications of designed proteins.
Affiliation(s)
- Sekhar Talluri
- Department of Biotechnology, GITAM, Visakhapatnam, India.
| |
|
49
|
Zhou Y, Jiang Y, Chen SJ. RNA-ligand molecular docking: advances and challenges. Wiley Interdisciplinary Reviews. Computational Molecular Science 2022; 12:e1571. [PMID: 37293430 PMCID: PMC10250017 DOI: 10.1002/wcms.1571] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 03/26/2021] [Accepted: 07/20/2021] [Indexed: 12/16/2022]
Abstract
With rapid advances in computer algorithms and hardware, fast and accurate virtual screening has led to a drastic acceleration in selecting potent small molecules as drug candidates. Computational modeling of RNA-small molecule interactions has become an indispensable tool for RNA-targeted drug discovery. The current models for RNA-ligand binding have mainly focused on the docking-and-scoring method. Accurate docking and scoring should tackle four crucial problems: (1) conformational flexibility of ligand, (2) conformational flexibility of RNA, (3) efficient sampling of binding sites and binding poses, and (4) accurate scoring of different binding modes. Moreover, compared with the problem of protein-ligand docking, predicting ligand binding to RNA, a negatively charged polymer, is further complicated by additional effects such as metal ion effects. Thermodynamic models based on physics-based and knowledge-based scoring functions have shown highly encouraging success in predicting ligand binding poses and binding affinities. Recently, kinetic models for ligand binding have further suggested that including dissociation kinetics (residence time) in ligand docking would result in improved performance in estimating in vivo drug efficacy. More recently, the rise of deep-learning approaches has led to new tools for predicting RNA-small molecule binding. In this review, we present an overview of the recently developed computational methods for RNA-ligand docking and their advantages and disadvantages.
Affiliation(s)
- Yuanzhe Zhou
- Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
| | - Yangwei Jiang
- Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
| | - Shi-Jie Chen
- Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
| |
|
50
|
Thean DGL, Chu HY, Fong JHC, Chan BKC, Zhou P, Kwok CCS, Chan YM, Mak SYL, Choi GCG, Ho JWK, Zheng Z, Wong ASL. Machine learning-coupled combinatorial mutagenesis enables resource-efficient engineering of CRISPR-Cas9 genome editor activities. Nat Commun 2022; 13:2219. [PMID: 35468907 PMCID: PMC9039034 DOI: 10.1038/s41467-022-29874-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/08/2021] [Accepted: 04/04/2022] [Indexed: 12/12/2022] Open
Abstract
The genome-editing Cas9 protein uses multiple amino-acid residues to bind the target DNA. Even when only the residues in proximity to the target DNA are considered as potential sites to optimise Cas9’s activity, the number of combinatorial variants to screen is too large for a wet-lab experiment. Here we generate and cross-validate ten in silico and experimental datasets of multi-domain combinatorial mutagenesis libraries for Cas9 engineering, and demonstrate that a machine learning-coupled engineering approach reduces the experimental screening burden by as much as 95% while enriching top-performing variants by ∼7.5-fold in comparison to the null model. Using this approach followed by structure-guided engineering, we identify the N888R/A889Q variant conferring increased editing activity on the protospacer adjacent motif-relaxed KKH variant of Cas9 nuclease from Staphylococcus aureus (KKH-SaCas9) and its derived base editor in human cells. Our work validates a readily applicable workflow to enable resource-efficient high-throughput engineering of genome editor activity. Screening combinatorial mutants is too large a task for wet-lab experiments alone. Here the authors present a machine learning-coupled combinatorial mutagenesis approach to vastly reduce the experimental burden of engineering Cas9 genome editing enzymes.
Affiliation(s)
- Dawn G L Thean
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China
| | - Hoi Yee Chu
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Centre for Oncology and Immunology Limited, Hong Kong Science Park, Hong Kong, SAR, China
| | - John H C Fong
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China
| | - Becky K C Chan
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Centre for Oncology and Immunology Limited, Hong Kong Science Park, Hong Kong, SAR, China
| | - Peng Zhou
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, SAR, China
| | - Cynthia C S Kwok
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China
| | - Yee Man Chan
- Ming Wai Lau Centre for Reparative Medicine, Karolinska Institutet, Hong Kong, SAR, China
| | - Silvia Y L Mak
- Ming Wai Lau Centre for Reparative Medicine, Karolinska Institutet, Hong Kong, SAR, China
| | - Gigi C G Choi
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Centre for Oncology and Immunology Limited, Hong Kong Science Park, Hong Kong, SAR, China
| | - Joshua W K Ho
- School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Park, Hong Kong, SAR, China
| | - Zongli Zheng
- Ming Wai Lau Centre for Reparative Medicine, Karolinska Institutet, Hong Kong, SAR, China; Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, SAR, China; Biotechnology and Health Centre, City University of Hong Kong Shenzhen Research Institute, Shenzhen, China
| | - Alan S L Wong
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, SAR, China; Centre for Oncology and Immunology Limited, Hong Kong Science Park, Hong Kong, SAR, China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, SAR, China
| |
|