1
|
Pillay K, Chiliza TE, Senzani S, Pillay B, Pillay M. In silico design of Mycobacterium tuberculosis multi-epitope adhesin protein vaccines. Heliyon 2024; 10:e37536. [PMID: 39323805 PMCID: PMC11422057 DOI: 10.1016/j.heliyon.2024.e37536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2024] [Accepted: 09/04/2024] [Indexed: 09/27/2024] Open
Abstract
Mycobacterium tuberculosis (Mtb) adhesin proteins are promising candidates for subunit vaccine design. Multi-epitope Mtb vaccine and diagnostic candidates were designed using immunoinformatic tools. The antigenic potential of 26 adhesin proteins were determined using VaxiJen 2.0. The truncated heat shock protein 70 (tnHSP70), 19 kDa antigen lipoprotein (lpqH), Mtb curli pili (MTP), and Phosphate transport protein S1 (PstS1) were selected based on the number of known epitopes on the Immune Epitope Database (IEDB). B- and T-cell epitopes were identified using BepiPred2.0, ABCpred, SVMTriP, and IEDB, respectively. Population coverage was analysed using prominent South African specific alleles on the IEDB. The allergenicity, physicochemical characteristics and tertiary structure of the tri-fusion proteins were determined. The in silico immune simulation was performed using C-ImmSim. Three truncated sequences, with predicted B and T cell epitopes, and without allergenicity or signal peptides were linked by three glycine-serine residues, resulting in the stable, hydrophilic molecules, tnlpqH-tnPstS1-tnHSP70 (64,86 kDa) and tnMTP-tnPstS1-tnHSP70 (63,96 kDa). Restriction endonuclease recognition sequences incorporated at the N- and C-terminal ends of each construct, facilitated virtual cloning using Snapgene, into pGEX6P-1, resulting in novel, highly immunogenic vaccine candidates (0,912-0,985). Future studies will involve the cloning, recombinant protein expression and purification of these constructs for downstream applications.
Collapse
Affiliation(s)
- Koobashnee Pillay
- Discipline of Medical Microbiology, School of Laboratory Medicine and Medical Sciences, College of Health Science, University of KwaZulu-Natal, South Africa
| | - Thamsanqa E. Chiliza
- Discipline of Microbiology, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal, South Africa
| | - Sibusiso Senzani
- Discipline of Medical Microbiology, School of Laboratory Medicine and Medical Sciences, College of Health Science, University of KwaZulu-Natal, South Africa
| | - Balakrishna Pillay
- Discipline of Microbiology, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal, South Africa
| | - Manormoney Pillay
- Discipline of Medical Microbiology, School of Laboratory Medicine and Medical Sciences, College of Health Science, University of KwaZulu-Natal, South Africa
| |
Collapse
|
2
|
Jeje O, Otun S, Aloke C, Achilonu I. Exploring NAD + metabolism and NNAT: Insights from structure, function, and computational modeling. Biochimie 2024; 220:84-98. [PMID: 38182101 DOI: 10.1016/j.biochi.2024.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2023] [Revised: 12/18/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024]
Abstract
Nicotinamide Adenine Dinucleotide (NAD+), a coenzyme, is ubiquitously distributed and serves crucial functions in diverse biological processes, encompassing redox reactions, energy metabolism, and cellular signalling. This review article explores the intricate realm of NAD + metabolism, with a particular emphasis on the complex relationship between its structure, function, and the pivotal enzyme, Nicotinate Nucleotide Adenylyltransferase (NNAT), also known as nicotinate mononucleotide adenylyltransferase (NaMNAT), in the process of its biosynthesis. Our findings indicate that NAD + biosynthesis in humans and bacteria occurs via the same de novo synthesis route and the pyridine ring salvage pathway. Maintaining NAD homeostasis in bacteria is imperative, as most bacterial species cannot get NAD+ from their surroundings. However, due to lower sequence identity and structurally distant relationship of bacteria, including E. faecium and K. pneumonia, to its human counterpart, inhibiting NNAT, an indispensable enzyme implicated in NAD + biosynthesis, is a viable alternative in curtailing infections orchestrated by E. faecium and K. pneumonia. By merging empirical and computational discoveries and connecting the intricate NAD + metabolism network with NNAT's crucial role, it becomes clear that the synergistic effect of these insights may lead to a more profound understanding of the coenzyme's function and its potential applications in the fields of therapeutics and biotechnology.
Collapse
Affiliation(s)
- Olamide Jeje
- Protein Structure-Function and Research Unit, School of Molecular and Cell Biology, Faculty of Science, University of the Witwatersrand, Braamfontein, Johannesburg, 2050, South Africa
| | - Sarah Otun
- Protein Structure-Function and Research Unit, School of Molecular and Cell Biology, Faculty of Science, University of the Witwatersrand, Braamfontein, Johannesburg, 2050, South Africa.
| | - Chinyere Aloke
- Protein Structure-Function and Research Unit, School of Molecular and Cell Biology, Faculty of Science, University of the Witwatersrand, Braamfontein, Johannesburg, 2050, South Africa; Department of Medical Biochemistry, Alex Ekwueme Federal University Ndufu-Alike, Ebonyi State, Nigeria
| | - Ikechukwu Achilonu
- Protein Structure-Function and Research Unit, School of Molecular and Cell Biology, Faculty of Science, University of the Witwatersrand, Braamfontein, Johannesburg, 2050, South Africa
| |
Collapse
|
3
|
Grassmann G, Miotto M, Desantis F, Di Rienzo L, Tartaglia GG, Pastore A, Ruocco G, Monti M, Milanetti E. Computational Approaches to Predict Protein-Protein Interactions in Crowded Cellular Environments. Chem Rev 2024; 124:3932-3977. [PMID: 38535831 PMCID: PMC11009965 DOI: 10.1021/acs.chemrev.3c00550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 04/11/2024]
Abstract
Investigating protein-protein interactions is crucial for understanding cellular biological processes because proteins often function within molecular complexes rather than in isolation. While experimental and computational methods have provided valuable insights into these interactions, they often overlook a critical factor: the crowded cellular environment. This environment significantly impacts protein behavior, including structural stability, diffusion, and ultimately the nature of binding. In this review, we discuss theoretical and computational approaches that allow the modeling of biological systems to guide and complement experiments and can thus significantly advance the investigation, and possibly the predictions, of protein-protein interactions in the crowded environment of cell cytoplasm. We explore topics such as statistical mechanics for lattice simulations, hydrodynamic interactions, diffusion processes in high-viscosity environments, and several methods based on molecular dynamics simulations. By synergistically leveraging methods from biophysics and computational biology, we review the state of the art of computational methods to study the impact of molecular crowding on protein-protein interactions and discuss its potential revolutionizing effects on the characterization of the human interactome.
Collapse
Affiliation(s)
- Greta Grassmann
- Department
of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, Rome 00185, Italy
- Center
for Life Nano & Neuro Science, Istituto
Italiano di Tecnologia, Rome 00161, Italy
| | - Mattia Miotto
- Center
for Life Nano & Neuro Science, Istituto
Italiano di Tecnologia, Rome 00161, Italy
| | - Fausta Desantis
- Center
for Life Nano & Neuro Science, Istituto
Italiano di Tecnologia, Rome 00161, Italy
- The
Open University Affiliated Research Centre at Istituto Italiano di
Tecnologia, Genoa 16163, Italy
| | - Lorenzo Di Rienzo
- Center
for Life Nano & Neuro Science, Istituto
Italiano di Tecnologia, Rome 00161, Italy
| | - Gian Gaetano Tartaglia
- Center
for Life Nano & Neuro Science, Istituto
Italiano di Tecnologia, Rome 00161, Italy
- Department
of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Center
for Human Technologies, Genoa 16152, Italy
| | - Annalisa Pastore
- Experiment
Division, European Synchrotron Radiation
Facility, Grenoble 38043, France
| | - Giancarlo Ruocco
- Center
for Life Nano & Neuro Science, Istituto
Italiano di Tecnologia, Rome 00161, Italy
- Department
of Physics, Sapienza University, Rome 00185, Italy
| | - Michele Monti
- RNA
System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
| | - Edoardo Milanetti
- Center
for Life Nano & Neuro Science, Istituto
Italiano di Tecnologia, Rome 00161, Italy
- Department
of Physics, Sapienza University, Rome 00185, Italy
| |
Collapse
|
4
|
Drogalin A, Monteiro LS, Alves MJ, Castro TG. Golgi α-mannosidase: opposing structures of Drosophila melanogaster and novel human model using molecular dynamics simulations and docking at different pHs. J Biomol Struct Dyn 2024; 42:2714-2725. [PMID: 37158092 DOI: 10.1080/07391102.2023.2209184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2023] [Accepted: 04/19/2023] [Indexed: 05/10/2023]
Abstract
The search for Golgi α-mannosidase II (GMII) potent and specific inhibitors has been a focus of many studies for the past three decades since this enzyme is a key target for cancer treatment. α-Mannosidases, such as those from Drosophila melanogaster or Jack bean, have been used as functional models of the human Golgi α-mannosidase II (hGMII) because mammalian mannosidases are difficult to purify and characterize experimentally. Meanwhile, computational studies have been seen as privileged tools able to explore assertive solutions to specific enzymes, providing molecular details of these macromolecules, their protonation states and their interactions. Thus, modelling techniques can successfully predict hGMII 3D structure with high confidence, speeding up the development of new hits. In this study, Drosophila melanogaster Golgi mannosidase II (dGMII) and a novel human model, developed in silico and equilibrated via molecular dynamics simulations, were both opposed for docking. Our findings highlight that the design of novel inhibitors should be carried out considering the human model's characteristics and the enzyme operating pH. A reliable model is evidenced, showing a good correlation between Ki/IC50 experimental data and theoretical ΔGbinding estimations in GMII, opening the possibility of optimizing the rational drug design of new derivatives.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Artem Drogalin
- Chemistry Centre, School of Sciences, University of Minho, Braga, Portugal
| | - Luís S Monteiro
- Chemistry Centre, School of Sciences, University of Minho, Braga, Portugal
| | - Maria José Alves
- Chemistry Centre, School of Sciences, University of Minho, Braga, Portugal
| | - Tarsila G Castro
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- LABBELS -Associate Laboratory, Braga/Guimarães, Portugal
| |
Collapse
|
5
|
Wuyun Q, Chen Y, Shen Y, Cao Y, Hu G, Cui W, Gao J, Zheng W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024; 29:832. [PMID: 38398585 PMCID: PMC10893003 DOI: 10.3390/molecules29040832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2023] [Revised: 02/06/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024] Open
Abstract
The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially expedited advancements in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 has facilitated the rise of structure prediction performance to new heights, regularly competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide a comprehensive understanding and guide future research in the field of protein structure prediction for researchers, this review describes various methodologies, assessments, and databases in protein structure prediction, including traditionally used protein structure prediction methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to provide researchers with insights through which to understand the limitations, contexts, and effective selections of protein structure prediction methods in protein-related fields.
Collapse
Affiliation(s)
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Yihan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China;
| | - Yifeng Shen
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-0882, Kanagawa, Japan;
| | - Yang Cao
- College of Life Sciences, Sichuan University, Chengdu 610065, China
| | - Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
| | - Wei Cui
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China;
| | - Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China;
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| |
Collapse
|
6
|
Roy BG, Choi J, Fuchs MF. Predictive Modeling of Proteins Encoded by a Plant Virus Sheds a New Light on Their Structure and Inherent Multifunctionality. Biomolecules 2024; 14:62. [PMID: 38254661 PMCID: PMC10813169 DOI: 10.3390/biom14010062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2023] [Revised: 12/29/2023] [Accepted: 12/30/2023] [Indexed: 01/24/2024] Open
Abstract
Plant virus genomes encode proteins that are involved in replication, encapsidation, cell-to-cell, and long-distance movement, avoidance of host detection, counter-defense, and transmission from host to host, among other functions. Even though the multifunctionality of plant viral proteins is well documented, contemporary functional repertoires of individual proteins are incomplete. However, these can be enhanced by modeling tools. Here, predictive modeling of proteins encoded by the two genomic RNAs, i.e., RNA1 and RNA2, of grapevine fanleaf virus (GFLV) and their satellite RNAs by a suite of protein prediction software confirmed not only previously validated functions (suppressor of RNA silencing [VSR], viral genome-linked protein [VPg], protease [Pro], symptom determinant [Sd], homing protein [HP], movement protein [MP], coat protein [CP], and transmission determinant [Td]) and previously identified putative functions (helicase [Hel] and RNA-dependent RNA polymerase [Pol]), but also predicted novel functions with varying levels of confidence. These include a T3/T7-like RNA polymerase domain for protein 1AVSR, a short-chain reductase for protein 1BHel/VSR, a parathyroid hormone family domain for protein 1EPol/Sd, overlapping domains of unknown function and an ABC transporter domain for protein 2BMP, and DNA topoisomerase domains, transcription factor FBXO25 domain, or DNA Pol subunit cdc27 domain for the satellite RNA protein. Structural predictions for proteins 2AHP/Sd, 2BMP, and 3A? had low confidence, while predictions for proteins 1AVSR, 1BHel*/VSR, 1CVPg, 1DPro, 1EPol*/Sd, and 2CCP/Td retained higher confidence in at least one prediction. This research provided new insights into the structure and functions of GFLV proteins and their satellite protein. Future work is needed to validate these findings.
Collapse
Affiliation(s)
- Brandon G. Roy
- Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, 15 Castle Creek Drive, Geneva, NY 14456, USA; (J.C.); (M.F.F.)
| | | | | |
Collapse
|
7
|
Yin R, Pierce BG. Evaluation of AlphaFold antibody-antigen modeling with implications for improving predictive accuracy. Protein Sci 2024; 33:e4865. [PMID: 38073135 PMCID: PMC10751731 DOI: 10.1002/pro.4865] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 12/26/2023]
Abstract
High resolution antibody-antigen structures provide critical insights into immune recognition and can inform therapeutic design. The challenges of experimental structural determination and the diversity of the immune repertoire underscore the necessity of accurate computational tools for modeling antibody-antigen complexes. Initial benchmarking showed that despite overall success in modeling protein-protein complexes, AlphaFold and AlphaFold-Multimer have limited success in modeling antibody-antigen interactions. In this study, we performed a thorough analysis of AlphaFold's antibody-antigen modeling performance on 427 nonredundant antibody-antigen complex structures, identifying useful confidence metrics for predicting model quality, and features of complexes associated with improved modeling success. Notably, we found that the latest version of AlphaFold improves near-native modeling success to over 30%, versus approximately 20% for a previous version, while increased AlphaFold sampling gives approximately 50% success. With this improved success, AlphaFold can generate accurate antibody-antigen models in many cases, while additional training or other optimization may further improve performance.
Collapse
Affiliation(s)
- Rui Yin
- University of Maryland Institute for Bioscience and Biotechnology ResearchRockvilleMarylandUSA
- Department of Cell Biology and Molecular GeneticsUniversity of MarylandCollege ParkMarylandUSA
| | - Brian G. Pierce
- University of Maryland Institute for Bioscience and Biotechnology ResearchRockvilleMarylandUSA
- Department of Cell Biology and Molecular GeneticsUniversity of MarylandCollege ParkMarylandUSA
| |
Collapse
|
8
|
Zheng W, Wuyun Q, Freddolino PL, Zhang Y. Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15. Proteins 2023; 91:1684-1703. [PMID: 37650367 PMCID: PMC10840719 DOI: 10.1002/prot.26585] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2023] [Revised: 08/04/2023] [Accepted: 08/14/2023] [Indexed: 09/01/2023]
Abstract
We report the results of the "UM-TBM" and "Zheng" groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D-I-TASSER and DMFold-Multimer algorithms, respectively. For monomer structure prediction, D-I-TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi-source MSA searching and a structural modeling-based MSA ranker; (ii) attention-network based spatial restraints; (iii) a multi-domain module containing domain partition and arrangement for domain-level templates and spatial restraints; (iv) an optimized I-TASSER-based folding simulation system for full-length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge-based potentials. For 47 free modeling targets in CASP15, the final models predicted by D-I-TASSER showed average TM-score 19% higher than the standard AlphaFold2 program. We thus showed that traditional Monte Carlo-based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end-to-end deep learning methods alone. For protein complex structure prediction, DMFold-Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end-to-end modeling module from AlphaFold2-Multimer. For the 38 complex targets, DMFold-Multimer generated models with an average TM-score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generating, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking.
Collapse
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Peter L Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Computer Science, School of Computing, National University of Singapore, 117417 Singapore
- Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore
| |
Collapse
|
9
|
Hoppe-Elsholz G, Piña-Iturbe A, Vallejos OP, Suazo ID, Sepúlveda-Alfaro J, Pereira-Sánchez P, Martínez-Balboa Y, Catalán EA, Reyes P, Scaff V, Bassi F, Campos-Gajardo S, Avilés A, Santiviago CA, Kalergis AM, Bueno SM. SEN1990 is a predicted winged helix-turn-helix protein involved in the pathogenicity of Salmonella enterica serovar Enteritidis and the expression of the gene oafB in the SPI-17. Front Microbiol 2023; 14:1236458. [PMID: 38029095 PMCID: PMC10655114 DOI: 10.3389/fmicb.2023.1236458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2023] [Accepted: 09/27/2023] [Indexed: 12/01/2023] Open
Abstract
Excisable genomic islands (EGIs) are horizontally acquired genetic elements that harbor an array of genes with diverse functions. ROD21 is an EGI found integrated in the chromosome of Salmonella enterica serovar Enteritidis (Salmonella ser. Enteritidis). While this island is known to be involved in the capacity of Salmonella ser. Enteritidis to cross the epithelial barrier and colonize sterile organs, the role of most ROD21 genes remains unknown, and thus, the identification of their function is fundamental to understanding the impact of this EGI on bacterium pathogenicity. Therefore, in this study, we used a bioinformatical approach to evaluate the function of ROD21-encoded genes and delve into the characterization of SEN1990, a gene encoding a putative DNA-binding protein. We characterized the predicted structure of SEN1990, finding that this protein contains a three-stranded winged helix-turn-helix (wHTH) DNA-binding domain. Additionally, we identified homologs of SEN1990 among other members of the EARL EGIs. Furthermore, we deleted SEN1990 in Salmonella ser. Enteritidis, finding no differences in the replication or maintenance of the excised ROD21, contrary to what the previous Refseq annotation of the protein suggests. High-throughput RNA sequencing was carried out to evaluate the effect of the absence of SEN1990 on the bacterium's global transcription. We found a downregulated expression of oafB, an SPI-17-encoded acetyltransferase involved in O-antigen modification, which was restored when the deletion mutant was complemented ectopically. Additionally, we found that strains lacking SEN1990 had a reduced capacity to colonize sterile organs in mice. Our findings suggest that SEN1990 encodes a wHTH domain-containing protein that modulates the transcription of oafB from the SPI-17, implying a crosstalk between these pathogenicity islands and a possible new role of ROD21 in the pathogenesis of Salmonella ser. Enteritidis.
Collapse
Affiliation(s)
- Guillermo Hoppe-Elsholz
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Alejandro Piña-Iturbe
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Omar P. Vallejos
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Isidora D. Suazo
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Javiera Sepúlveda-Alfaro
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Patricia Pereira-Sánchez
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Yohana Martínez-Balboa
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Eduardo A. Catalán
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Pablo Reyes
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Valentina Scaff
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Franco Bassi
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Sofia Campos-Gajardo
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Andrea Avilés
- Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
| | - Carlos A. Santiviago
- Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
| | - Alexis M. Kalergis
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Departamento de Endocrinología, Facultad de Medicina, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Susan M. Bueno
- Millennium Institute on Immunology and Immunotherapy, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| |
Collapse
|
10
|
Han R, Yoon H, Kim G, Lee H, Lee Y. Revolutionizing Medicinal Chemistry: The Application of Artificial Intelligence (AI) in Early Drug Discovery. Pharmaceuticals (Basel) 2023; 16:1259. [PMID: 37765069 PMCID: PMC10537003 DOI: 10.3390/ph16091259] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Revised: 08/24/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] Open
Abstract
Artificial intelligence (AI) has permeated various sectors, including the pharmaceutical industry and research, where it has been utilized to efficiently identify new chemical entities with desirable properties. The application of AI algorithms to drug discovery presents both remarkable opportunities and challenges. This review article focuses on the transformative role of AI in medicinal chemistry. We delve into the applications of machine learning and deep learning techniques in drug screening and design, discussing their potential to expedite the early drug discovery process. In particular, we provide a comprehensive overview of the use of AI algorithms in predicting protein structures, drug-target interactions, and molecular properties such as drug toxicity. While AI has accelerated the drug discovery process, data quality issues and technological constraints remain challenges. Nonetheless, new relationships and methods have been unveiled, demonstrating AI's expanding potential in predicting and understanding drug interactions and properties. For its full potential to be realized, interdisciplinary collaboration is essential. This review underscores AI's growing influence on the future trajectory of medicinal chemistry and stresses the importance of ongoing synergies between computational and domain experts.
Collapse
Affiliation(s)
| | | | | | | | - Yoonji Lee
- College of Pharmacy, Chung-Ang University, Seoul 06974, Republic of Korea
| |
Collapse
|
11
|
Camponeschi C, Righino B, Pirolli D, Semeraro A, Ria F, De Rosa MC. Prediction of CD44 Structure by Deep Learning-Based Protein Modeling. Biomolecules 2023; 13:1047. [PMID: 37509083 PMCID: PMC10376988 DOI: 10.3390/biom13071047] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2023] [Revised: 06/19/2023] [Accepted: 06/24/2023] [Indexed: 07/30/2023] Open
Abstract
CD44 is a cell surface glycoprotein transmembrane receptor that is involved in cell-cell and cell-matrix interactions. It crucially associates with several molecules composing the extracellular matrix, the main one of which is hyaluronic acid. It is ubiquitously expressed in various types of cells and is involved in the regulation of important signaling pathways, thus playing a key role in several physiological and pathological processes. Structural information about CD44 is, therefore, fundamental for understanding the mechanism of action of this receptor and developing effective treatments against its aberrant expression and dysregulation frequently associated with pathological conditions. To date, only the structure of the hyaluronan-binding domain (HABD) of CD44 has been experimentally determined. To elucidate the nature of CD44s, the most frequently expressed isoform, we employed the recently developed deep-learning-based tools D-I-TASSER, AlphaFold2, and RoseTTAFold for an initial structural prediction of the full-length receptor, accompanied by molecular dynamics simulations on the most promising model. All three approaches correctly predicted the HABD, with AlphaFold2 outperforming D-I-TASSER and RoseTTAFold in the structural comparison with the crystallographic HABD structure and confidence in predicting the transmembrane helix. Low confidence regions were also predicted, which largely corresponded to the disordered regions of CD44s. These regions allow the receptor to perform its unconventional activity.
Collapse
Affiliation(s)
- Chiara Camponeschi
- Institute of Chemical Sciences and Technologies ''Giulio Natta'' (SCITEC)-CNR, 00168 Rome, Italy
| | - Benedetta Righino
- Institute of Chemical Sciences and Technologies ''Giulio Natta'' (SCITEC)-CNR, 00168 Rome, Italy
| | - Davide Pirolli
- Institute of Chemical Sciences and Technologies ''Giulio Natta'' (SCITEC)-CNR, 00168 Rome, Italy
| | - Alessandro Semeraro
- Department of Chemistry and Technology of Drugs, Sapienza University of Rome, 00185 Rome, Italy
| | - Francesco Ria
- Department of Translational Medicine and Surgery, Section of General Pathology, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
| | - Maria Cristina De Rosa
- Institute of Chemical Sciences and Technologies ''Giulio Natta'' (SCITEC)-CNR, 00168 Rome, Italy
| |
Collapse
|
12
|
Adiyaman R, Edmunds NS, Genc AG, Alharbi SMA, McGuffin LJ. Improvement of protein tertiary and quaternary structure predictions using the ReFOLD refinement method and the AlphaFold2 recycling process. BIOINFORMATICS ADVANCES 2023; 3:vbad078. [PMID: 37359722 PMCID: PMC10290552 DOI: 10.1093/bioadv/vbad078] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/24/2023] [Revised: 05/09/2023] [Accepted: 06/13/2023] [Indexed: 06/28/2023]
Abstract
Motivation The accuracy gap between predicted and experimental structures has been significantly reduced following the development of AlphaFold2 (AF2). However, for many targets, AF2 models still have room for improvement. In previous CASP experiments, highly computationally intensive MD simulation-based methods have been widely used to improve the accuracy of single 3D models. Here, our ReFOLD pipeline was adapted to refine AF2 predictions while maintaining high model accuracy at a modest computational cost. Furthermore, the AF2 recycling process was utilized to improve 3D models by using them as custom template inputs for tertiary and quaternary structure predictions. Results According to the Molprobity score, 94% of the generated 3D models by ReFOLD were improved. AF2 recycling showed an improvement rate of 87.5% (using MSAs) and 81.25% (using single sequences) for monomeric AF2 models and 100% (MSA) and 97.8% (single sequence) for monomeric non-AF2 models, as measured by the average change in lDDT. By the same measure, the recycling of multimeric models showed an improvement rate of as much as 80% for AF2-Multimer (AF2M) models and 94% for non-AF2M models. Availability and implementation Refinement using AlphaFold2-Multimer recycling is available as part of the MultiFOLD docker package (https://hub.docker.com/r/mcguffin/multifold). The ReFOLD server is available at https://www.reading.ac.uk/bioinf/ReFOLD/ and the modified scripts can be downloaded from https://www.reading.ac.uk/bioinf/downloads/. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Collapse
Affiliation(s)
- Recep Adiyaman
- School of Biological Sciences, University of Reading, Reading RG6 6EX, UK
| | - Nicholas S Edmunds
- School of Biological Sciences, University of Reading, Reading RG6 6EX, UK
| | - Ahmet G Genc
- School of Biological Sciences, University of Reading, Reading RG6 6EX, UK
| | - Shuaa M A Alharbi
- School of Biological Sciences, University of Reading, Reading RG6 6EX, UK
| | | |
Collapse
|
13
|
Ye J, Li A, Zheng H, Yang B, Lu Y. Machine Learning Advances in Predicting Peptide/Protein-Protein Interactions Based on Sequence Information for Lead Peptides Discovery. Adv Biol (Weinh) 2023; 7:e2200232. [PMID: 36775876 DOI: 10.1002/adbi.202200232] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Revised: 12/30/2022] [Indexed: 02/14/2023]
Abstract
Peptides have shown increasing advantages and significant clinical value in drug discovery and development. With the development of high-throughput technologies and artificial intelligence (AI), machine learning (ML) methods for discovering new lead peptides have been expanded and incorporated into rational drug design. Predictions of peptide-protein interactions (PepPIs) and protein-protein interactions (PPIs) are both opportunities and challenges in computational biology, which will help to better understand the mechanisms of disease and provide the impetus for the discovery of lead peptides. This paper comprehensively reviews computational models for PepPI and PPI predictions. It begins with an introduction of various databases of peptide ligands and target proteins. Then it discusses data formats and feature representations for proteins and peptides. Furthermore, classical ML methods and emerging deep learning (DL) methods that can be used to train prediction models of PepPI and PPI are classified into four categories, and their advantages and disadvantages are analyzed. To assess the relative performance of different models, different validation protocols and evaluation indexes are discussed. The goal of this review is to help researchers quickly get started to develop computational frameworks using these integrated resources and eventually promote the discovery of lead peptides.
Collapse
Affiliation(s)
- Jiahao Ye
- School of Medicine, Shanghai University, Shanghai, 200444, China
| | - An Li
- Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Department of Biochemical Pharmacy, School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
| | - Hao Zheng
- School of Medicine, Shanghai University, Shanghai, 200444, China
| | - Banghua Yang
- School of Medicine, Shanghai University, Shanghai, 200444, China
| | - Yiming Lu
- School of Medicine, Shanghai University, Shanghai, 200444, China
- Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Department of Biochemical Pharmacy, School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
| |
Collapse
|
14
|
Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical Assessment of Methods for Predicting the 3D Structure of Proteins and Protein Complexes. Annu Rev Biophys 2023; 52:183-206. [PMID: 36626764 PMCID: PMC10885158 DOI: 10.1146/annurev-biophys-102622-084607] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
Abstract
Advances in a scientific discipline are often measured by small, incremental steps. In this review, we report on two intertwined disciplines in the protein structure prediction field, modeling of single chains and modeling of complexes, that have over decades emulated this pattern, as monitored by the community-wide blind prediction experiments CASP and CAPRI. However, over the past few years, dramatic advances were observed for the accurate prediction of single protein chains, driven by a surge of deep learning methodologies entering the prediction field. We review the mainscientific developments that enabled these recent breakthroughs and feature the important role of blind prediction experiments in building up and nurturing the structure prediction field. We discuss how the new wave of artificial intelligence-based methods is impacting the fields of computational and experimental structural biology and highlight areas in which deep learning methods are likely to lead to future developments, provided that major challenges are overcome.
Collapse
Affiliation(s)
- Shoshana J Wodak
- VIB-VUB Center for Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium;
| | - Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA;
- Department of Chemistry, Boston University, Boston, Massachusetts, USA
| | - Marc F Lensink
- Univ. Lille, CNRS, UMR 8576-UGSF-Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France;
| | - Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA;
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom;
| |
Collapse
|
15
|
Durham J, Zhang J, Humphreys IR, Pei J, Cong Q. Recent advances in predicting and modeling protein-protein interactions. Trends Biochem Sci 2023; 48:527-538. [PMID: 37061423 DOI: 10.1016/j.tibs.2023.03.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Revised: 03/03/2023] [Accepted: 03/17/2023] [Indexed: 04/17/2023]
Abstract
Protein-protein interactions (PPIs) drive biological processes, and disruption of PPIs can cause disease. With recent breakthroughs in structure prediction and a deluge of genomic sequence data, computational methods to predict PPIs and model spatial structures of protein complexes are now approaching the accuracy of experimental approaches for permanent interactions and show promise for elucidating transient interactions. As we describe here, the key to this success is rich evolutionary information deciphered from thousands of homologous sequences that coevolve in interacting partners. This covariation signal, revealed by sophisticated statistical and machine learning (ML) algorithms, predicts physiological interactions. Accurate artificial intelligence (AI)-based modeling of protein structures promises to provide accurate 3D models of PPIs at a proteome-wide scale.
Collapse
Affiliation(s)
- Jesse Durham
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Ian R Humphreys
- Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Jimin Pei
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
| |
Collapse
|
16
|
Elofsson A. Progress at protein structure prediction, as seen in CASP15. Curr Opin Struct Biol 2023; 80:102594. [PMID: 37060758 DOI: 10.1016/j.sbi.2023.102594] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2022] [Revised: 03/12/2023] [Accepted: 03/17/2023] [Indexed: 04/17/2023]
Abstract
In Dec 2020, the results of AlphaFold version 2 were presented at CASP14, sparking a revolution in the field of protein structure predictions. For the first time, a purely computational method could challenge experimental accuracy for structure prediction of single protein domains. The code of AlphaFold v2 was released in the summer of 2021, and since then, it has been shown that it can be used to accurately predict the structure of most ordered proteins and many protein-protein interactions. It has also sparked an explosion of development in the field, improving AI-based methods to predict protein complexes, disordered regions, and protein design. Here I will review some of the inventions sparked by the release of AlphaFold.
Collapse
Affiliation(s)
- Arne Elofsson
- Science for Life Laboratory and Dep. of Biochemistry and Biophysics, Stockholm University, Sweden.
| |
Collapse
|
17
|
Zhang C, Pyle AM. PDC: a highly compact file format to store protein 3D coordinates. Database (Oxford) 2023; 2023:baad018. [PMID: 37010520 PMCID: PMC10069377 DOI: 10.1093/database/baad018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2022] [Revised: 03/01/2023] [Accepted: 03/07/2023] [Indexed: 04/04/2023]
Abstract
Recent improvements in computational and experimental techniques for obtaining protein structures have resulted in an explosion of 3D coordinate data. To cope with the ever-increasing sizes of structure databases, this work proposes the Protein Data Compression (PDC) format, which compresses coordinates and temperature factors of full-atomic and Cα-only protein structures. Without loss of precision, PDC results in 69% to 78% smaller file sizes than Protein Data Bank (PDB) and macromolecular Crystallographic Information File (mmCIF) files with standard GZIP compression. It uses ∼60% less space than existing compression algorithms specific to macromolecular structures. PDC optionally performs lossy compression with minimal sacrifice of precision, which allows reduction of file sizes by another 79%. Conversion between PDC, mmCIF and PDB formats is typically achieved within 0.02 s. The compactness and fast reading/writing speed of PDC make it valuable for storage and analysis of large quantity of tertiary structural data. Database URL https://github.com/kad-ecoli/pdc.
Collapse
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Av, Ann Arbor, MI 48109, USA
- Howard Hughes Medical Institute, 4000 Jones Bridge Rd, Chevy Chase, MD 20815, USA
- Department of Molecular, Cellular, and Developmental Biology, Yale University, 266 Whitney Av, New Haven, CT 06511, USA
| | - Anna Marie Pyle
- Howard Hughes Medical Institute, 4000 Jones Bridge Rd, Chevy Chase, MD 20815, USA
- Department of Molecular, Cellular, and Developmental Biology, Yale University, 266 Whitney Av, New Haven, CT 06511, USA
- Department of Chemistry, Yale University, 225 Prospect St, New Haven, CT 06511, USA
| |
Collapse
|
18
|
Dey D, Tanaka R, Ito H. Structural Characterization of the Chlorophyllide a Oxygenase (CAO) Enzyme Through an In Silico Approach. J Mol Evol 2023; 91:225-235. [PMID: 36869271 DOI: 10.1007/s00239-023-10100-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2022] [Accepted: 02/16/2023] [Indexed: 03/05/2023]
Abstract
Chlorophyllide a oxygenase (CAO) is responsible for converting chlorophyll a to chlorophyll b in a two-step oxygenation reaction. CAO belongs to the family of Rieske-mononuclear iron oxygenases. Although the structure and reaction mechanism of other Rieske monooxygenases have been described, a member of plant Rieske non-heme iron-dependent monooxygenase has not been structurally characterized. The enzymes in this family usually form a trimeric structure and electrons are transferred between the non-heme iron site and the Rieske center of the adjoining subunits. CAO is supposed to form a similar structural arrangement. However, in Mamiellales such as Micromonas and Ostreococcus, CAO is encoded by two genes where non-heme iron site and Rieske cluster localize on the distinct polypeptides. It is not clear if they can form a similar structural organization to achieve the enzymatic activity. In this study, the tertiary structures of CAO from the model plant Arabidopsis thaliana and the Prasinophyte Micromonas pusilla were predicted by deep learning-based methods, followed by energy minimization and subsequent stereochemical quality assessment of the predicted models. Furthermore, the chlorophyll a binding cavity and the interaction of ferredoxin, which is the electron donor, on the surface of Micromonas CAO were predicted. The electron transfer pathway was predicted in Micromonas CAO and the overall structure of the CAO active site was conserved even though it forms a heterodimeric complex. The structures presented in this study will serve as a basis for understanding the reaction mechanism and regulation of the plant monooxygenase family to which CAO belongs.
Collapse
Affiliation(s)
- Debayan Dey
- Graduate School of Life Science, Hokkaido University, N10 W8, Sapporo, 060-0810, Japan
- Institute of Low Temperature Science, Hokkaido University, N19 W8, Sapporo, 060-0819, Japan
| | - Ryouichi Tanaka
- Institute of Low Temperature Science, Hokkaido University, N19 W8, Sapporo, 060-0819, Japan
| | - Hisashi Ito
- Institute of Low Temperature Science, Hokkaido University, N19 W8, Sapporo, 060-0819, Japan.
| |
Collapse
|
19
|
Pearce R, Huang X, Omenn GS, Zhang Y. De novo protein fold design through sequence-independent fragment assembly simulations. Proc Natl Acad Sci U S A 2023; 120:e2208275120. [PMID: 36656852 PMCID: PMC9942881 DOI: 10.1073/pnas.2208275120] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Accepted: 12/22/2022] [Indexed: 01/20/2023] Open
Abstract
De novo protein design generally consists of two steps, including structure and sequence design. Many protein design studies have focused on sequence design with scaffolds adapted from native structures in the PDB, which renders novel areas of protein structure and function space unexplored. We developed FoldDesign to create novel protein folds from specific secondary structure (SS) assignments through sequence-independent replica-exchange Monte Carlo (REMC) simulations. The method was tested on 354 non-redundant topologies, where FoldDesign consistently created stable structural folds, while recapitulating on average 87.7% of the SS elements. Meanwhile, the FoldDesign scaffolds had well-formed structures with buried residues and solvent-exposed areas closely matching their native counterparts. Despite the high fidelity to the input SS restraints and local structural characteristics of native proteins, a large portion of the designed scaffolds possessed global folds completely different from natural proteins in the PDB, highlighting the ability of FoldDesign to explore novel areas of protein fold space. Detailed data analyses revealed that the major contributions to the successful structure design lay in the optimal energy force field, which contains a balanced set of SS packing terms, and REMC simulations, which were coupled with multiple auxiliary movements to efficiently search the conformational space. Additionally, the ability to recognize and assemble uncommon super-SS geometries, rather than the unique arrangement of common SS motifs, was the key to generating novel folds. These results demonstrate a strong potential to explore both structural and functional spaces through computational design simulations that natural proteins have not reached through evolution.
Collapse
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI48109
| | - Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI48109
| | - Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI48109
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI48109
- Department of Human Genetics, University of Michigan, Ann Arbor, MI48109
- School of Public Health, University of Michigan, Ann Arbor, MI48109
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI48109
- Department of Computer Science, School of Computing, National University of Singapore117417, Singapore
- Cancer Science Institute of Singapore, National University of Singapore117599, Singapore
| |
Collapse
|
20
|
Liu J, Zhao K, Zhang G. Improved model quality assessment using sequence and structural information by enhanced deep neural networks. Brief Bioinform 2023; 24:6865134. [PMID: 36460624 DOI: 10.1093/bib/bbac507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2022] [Revised: 10/02/2022] [Accepted: 10/24/2022] [Indexed: 12/04/2022] Open
Abstract
Protein model quality assessment plays an important role in protein structure prediction, protein design and drug discovery. In this work, DeepUMQA2, a substantially improved version of DeepUMQA for protein model quality assessment, is proposed. First, sequence features containing protein co-evolution information and structural features reflecting family information are extracted to complement model-dependent features. Second, a novel backbone network based on triangular multiplication update and axial attention mechanism is designed to enhance information exchange between inter-residue pairs. On CASP13 and CASP14 datasets, the performance of DeepUMQA2 increases by 20.5 and 20.4% compared with DeepUMQA, respectively (measured by top 1 loss). Moreover, on the three-month CAMEO dataset (11 March to 04 June 2022), DeepUMQA2 outperforms DeepUMQA by 15.5% (measured by local AUC0,0.2) and ranks first among all competing server methods in CAMEO blind test. Experimental results show that DeepUMQA2 outperforms state-of-the-art model quality assessment methods, such as ProQ3D-LDDT, ModFOLD8, and DeepAccNet and DeepUMQA2 can select more suitable best models than state-of-the-art protein structure methods, such as AlphaFold2, RoseTTAFold and I-TASSER, provided themselves.
Collapse
Affiliation(s)
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology
| | - Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology
| |
Collapse
|
21
|
Chang Y, Hawkins BA, Du JJ, Groundwater PW, Hibbs DE, Lai F. A Guide to In Silico Drug Design. Pharmaceutics 2022; 15:pharmaceutics15010049. [PMID: 36678678 PMCID: PMC9867171 DOI: 10.3390/pharmaceutics15010049] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Revised: 12/16/2022] [Accepted: 12/17/2022] [Indexed: 12/28/2022] Open
Abstract
The drug discovery process is a rocky path that is full of challenges, with the result that very few candidates progress from hit compound to a commercially available product, often due to factors, such as poor binding affinity, off-target effects, or physicochemical properties, such as solubility or stability. This process is further complicated by high research and development costs and time requirements. It is thus important to optimise every step of the process in order to maximise the chances of success. As a result of the recent advancements in computer power and technology, computer-aided drug design (CADD) has become an integral part of modern drug discovery to guide and accelerate the process. In this review, we present an overview of the important CADD methods and applications, such as in silico structure prediction, refinement, modelling and target validation, that are commonly used in this area.
Collapse
Affiliation(s)
- Yiqun Chang
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Bryson A. Hawkins
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Jonathan J. Du
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Paul W. Groundwater
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
| | - David E. Hibbs
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Felcia Lai
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
- Correspondence:
| |
Collapse
|
22
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
|
23
|
rMSA: a sequence search and alignment algorithm to improve RNA structure modeling. J Mol Biol 2022. [DOI: 10.1016/j.jmb.2022.167904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/04/2022]
|
24
|
Protein structure prediction in the deep learning era. Curr Opin Struct Biol 2022; 77:102495. [PMID: 36371845 DOI: 10.1016/j.sbi.2022.102495] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2022] [Revised: 10/03/2022] [Accepted: 10/04/2022] [Indexed: 11/11/2022]
Abstract
Significant advances have been achieved in protein structure prediction, especially with the recent development of the AlphaFold2 and the RoseTTAFold systems. This article reviews the progress in deep learning-based protein structure prediction methods in the past two years. First, we divide the representative methods into two categories: the two-step approach and the end-to-end approach. Then, we show that the two-step approach is possible to achieve similar accuracy to the state-of-the-art end-to-end approach AlphaFold2. Compared to the end-to-end approach, the two-step approach requires fewer computing resources. We conclude that it is valuable to keep developing both approaches. Finally, a few outstanding challenges in function-orientated protein structure prediction are pointed out for future development.
Collapse
|
25
|
Peng CX, Zhou XG, Xia YH, Liu J, Hou MH, Zhang GJ. Structural analogue-based protein structure domain assembly assisted by deep learning. Bioinformatics 2022; 38:4513-4521. [PMID: 35962986 DOI: 10.1093/bioinformatics/btac553] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2022] [Revised: 07/27/2022] [Accepted: 08/08/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION With the breakthrough of AlphaFold2, the protein structure prediction problem has made remarkable progress through deep learning end-to-end techniques, in which correct folds could be built for nearly all single-domain proteins. However, the full-chain modelling appears to be lower on average accuracy than that for the constituent domains and requires higher demand on computing hardware, indicating the performance of full-chain modelling still needs to be improved. In this study, we investigate whether the predicted accuracy of the full-chain model can be further improved by domain assembly assisted by deep learning. RESULTS In this article, we developed a structural analogue-based protein structure domain assembly method assisted by deep learning, named SADA. In SADA, a multi-domain protein structure database was constructed for the full-chain analogue detection using individual domain models. Starting from the initial model constructed from the analogue, the domain assembly simulation was performed to generate the full-chain model through a two-stage differential evolution algorithm guided by the energy function with an inter-residue distance potential predicted by deep learning. SADA was compared with the state-of-the-art domain assembly methods on 356 benchmark proteins, and the average TM-score of SADA models is 8.1% and 27.0% higher than that of DEMO and AIDA, respectively. We also assembled 293 human multi-domain proteins, where the average TM-score of the full-chain model after the assembly by SADA is 1.1% higher than that of the model by AlphaFold2. To conclude, we find that the domains often interact in the similar way in the quaternary orientations if the domains have similar tertiary structures. Furthermore, homologous templates and structural analogues are complementary for multi-domain protein full-chain modelling. AVAILABILITY AND IMPLEMENTATION http://zhanglab-bioinf.com/SADA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Chun-Xiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Xiao-Gen Zhou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Yu-Hao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Ming-Hua Hou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| |
Collapse
|
26
|
Avery C, Patterson J, Grear T, Frater T, Jacobs DJ. Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:1246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
Machine learning (ML) has been an important arsenal in computational biology used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein-ligand binding, including allosteric effects, protein-protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
Collapse
Affiliation(s)
- Chris Avery
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - John Patterson
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Tyler Grear
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Theodore Frater
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Donald J. Jacobs
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| |
Collapse
|
27
|
Zhang H, Shang R, Kim K, Zheng W, Johnson CJ, Sun L, Niu X, Liu L, Zhou J, Liu L, Zhang Z, Uyeno TA, Pei J, Fissette SD, Green SA, Samudra SP, Wen J, Zhang J, Eggenschwiler JT, Menke DB, Bronner ME, Grishin NV, Li W, Ye K, Zhang Y, Stolfi A, Bi P. Evolution of a chordate-specific mechanism for myoblast fusion. SCIENCE ADVANCES 2022; 8:eadd2696. [PMID: 36054355 PMCID: PMC10848958 DOI: 10.1126/sciadv.add2696] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Accepted: 07/15/2022] [Indexed: 06/15/2023]
Abstract
Vertebrate myoblast fusion allows for multinucleated muscle fibers to compound the size and strength of mononucleated cells, but the evolution of this important process is unknown. We investigated the evolutionary origins and function of membrane-coalescing agents Myomaker and Myomixer in various groups of chordates. Here, we report that Myomaker likely arose through gene duplication in the last common ancestor of tunicates and vertebrates, while Myomixer appears to have evolved de novo in early vertebrates. Functional tests revealed a complex evolutionary history of myoblast fusion. A prevertebrate phase of muscle multinucleation driven by Myomaker was followed by the later emergence of Myomixer that enables the highly efficient fusion system of vertebrates. Evolutionary comparisons between vertebrate and nonvertebrate Myomaker revealed key structural and mechanistic insights into myoblast fusion. Thus, our findings suggest an evolutionary model of chordate fusogens and illustrate how new genes shape the emergence of novel morphogenetic traits and mechanisms.
Collapse
Affiliation(s)
- Haifeng Zhang
- Center for Molecular Medicine, University of Georgia, Athens, GA, USA
| | - Renjie Shang
- Center for Molecular Medicine, University of Georgia, Athens, GA, USA
- Department of Genetics, University of Georgia, Athens, GA, USA
| | - Kwantae Kim
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
| | | | - Lei Sun
- The Fifth People’s Hospital of Shanghai, and Shanghai Key Laboratory of Medical Epigenetics, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
| | - Xiang Niu
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, USA
| | - Liang Liu
- Department of Statistics, University of Georgia, Athens, GA, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
| | - Jingqi Zhou
- Department of Genetics, University of Georgia, Athens, GA, USA
| | - Lingshu Liu
- Department of Genetics, University of Georgia, Athens, GA, USA
| | - Zheng Zhang
- Center for Molecular Medicine, University of Georgia, Athens, GA, USA
| | | | - Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Skye D. Fissette
- Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, USA
| | - Stephen A. Green
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | | | - Junfei Wen
- Center for Molecular Medicine, University of Georgia, Athens, GA, USA
| | - Jianli Zhang
- College of Engineering, University of Georgia, Athens, GA, USA
| | | | | | - Marianne E. Bronner
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Weiming Li
- Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, USA
| | - Kaixiong Ye
- Department of Genetics, University of Georgia, Athens, GA, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Alberto Stolfi
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
| | - Pengpeng Bi
- Center for Molecular Medicine, University of Georgia, Athens, GA, USA
- Department of Genetics, University of Georgia, Athens, GA, USA
| |
Collapse
|
28
|
I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 157] [Impact Index Per Article: 78.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
Collapse
|
29
|
Zhou X, Peng C, Zheng W, Li Y, Zhang G, Zhang Y. DEMO2: Assemble multi-domain protein structures by coupling analogous template alignments with deep-learning inter-domain restraint prediction. Nucleic Acids Res 2022; 50:W235-W245. [PMID: 35536281 PMCID: PMC9252800 DOI: 10.1093/nar/gkac340] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2022] [Revised: 04/13/2022] [Accepted: 04/22/2022] [Indexed: 01/19/2023] Open
Abstract
Most proteins in nature contain multiple folding units (or domains). The revolutionary success of AlphaFold2 in single-domain structure prediction showed potential to extend deep-learning techniques for multi-domain structure modeling. This work presents a significantly improved method, DEMO2, which integrates analogous template structural alignments with deep-learning techniques for high-accuracy domain structure assembly. Starting from individual domain models, inter-domain spatial restraints are first predicted with deep residual convolutional networks, where full-length structure models are assembled using L-BFGS simulations under the guidance of a hybrid energy function combining deep-learning restraints and analogous multi-domain template alignments searched from the PDB. The output of DEMO2 contains deep-learning inter-domain restraints, top-ranked multi-domain structure templates, and up to five full-length structure models. DEMO2 was tested on a large-scale benchmark and the blind CASP14 experiment, where DEMO2 was shown to significantly outperform its predecessor and the state-of-the-art protein structure prediction methods. By integrating with new deep-learning techniques, DEMO2 should help fill the rapidly increasing gap between the improved ability of tertiary structure determination and the high demand for the high-quality multi-domain protein structures. The DEMO2 server is available at https://zhanggroup.org/DEMO/.
Collapse
Affiliation(s)
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Chunxiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| |
Collapse
|
30
|
Li Y, Zhang C, Yu DJ, Zhang Y. Deep learning geometrical potential for high-accuracy ab initio protein structure prediction. iScience 2022; 25:104425. [PMID: 35663033 PMCID: PMC9160776 DOI: 10.1016/j.isci.2022.104425] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Revised: 05/02/2022] [Accepted: 05/11/2022] [Indexed: 11/22/2022] Open
Abstract
Ab initio protein structure prediction has been vastly boosted by the modeling of inter-residue contact/distance maps in recent years. We developed a new deep learning model, DeepPotential, which accurately predicts the distribution of a complementary set of geometric descriptors including a novel hydrogen-bonding potential defined by C-alpha atom coordinates. On 154 Free-Modeling/Hard targets from the CASP and CAMEO experiments, DeepPotential demonstrated significant advantage on both geometrical feature prediction and full-length structure construction, with Top-L/5 contact accuracy and TM-score of full-length models 4.1% and 6.7% higher than the best of other deep-learning restraint prediction approaches. Detail analyses showed that the major contributions to the TM-score/contact-map improvements come from the employment of multi-tasking network architecture and metagenome-based MSA collection assisted with confidence-based MSA selection, where hydrogen-bonding and inter-residue orientation predictions help improve hydrogen-bonding network and secondary structure packing. These results demonstrated new progress in the deep-learning restraint-guided ab initio protein structure prediction. Multi-tasking network architecture for multiple inter-residue geometries Novel deep learning model for improved hydrogen-bonding modeling Rapid and high-accuracy Ab initio protein structure prediction
Collapse
Affiliation(s)
- Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 21000, China.,Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 21000, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| |
Collapse
|
31
|
Zheng W, Wuyun Q, Zhou X, Li Y, Freddolino PL, Zhang Y. LOMETS3: integrating deep learning and profile alignment for advanced protein template recognition and function annotation. Nucleic Acids Res 2022; 50:W454-W464. [PMID: 35420129 PMCID: PMC9252734 DOI: 10.1093/nar/gkac248] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2022] [Revised: 03/29/2022] [Accepted: 03/31/2022] [Indexed: 11/25/2022] Open
Abstract
Deep learning techniques have significantly advanced the field of protein structure prediction. LOMETS3 (https://zhanglab.ccmb.med.umich.edu/LOMETS/) is a new generation meta-server approach to template-based protein structure prediction and function annotation, which integrates newly developed deep learning threading methods. For the first time, we have extended LOMETS3 to handle multi-domain proteins and to construct full-length models with gradient-based optimizations. Starting from a FASTA-formatted sequence, LOMETS3 performs four steps of domain boundary prediction, domain-level template identification, full-length template/model assembly and structure-based function prediction. The output of LOMETS3 contains (i) top-ranked templates from LOMETS3 and its component threading programs, (ii) up to 5 full-length structure models constructed by L-BFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm) optimization, (iii) the 10 closest Protein Data Bank (PDB) structures to the target, (iv) structure-based functional predictions, (v) domain partition and assembly results, and (vi) the domain-level threading results, including items (i)–(iii) for each identified domain. LOMETS3 was tested in large-scale benchmarks and the blind CASP14 (14th Critical Assessment of Structure Prediction) experiment, where the overall template recognition and function prediction accuracy is significantly beyond its predecessors and other state-of-the-art threading approaches, especially for hard targets without homologous templates in the PDB. Based on the improved developments, LOMETS3 should help significantly advance the capability of broader biomedical community for template-based protein structure and function modelling.
Collapse
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Peter L Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| |
Collapse
|
32
|
Shrestha B, Adhikari B. Scoring protein sequence alignments using deep Learning. Bioinformatics 2022; 38:2988-2995. [PMID: 35385080 DOI: 10.1093/bioinformatics/btac210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2021] [Revised: 04/01/2022] [Accepted: 04/05/2022] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND A high-quality sequence alignment (SA) is the most important input feature for accurate protein structure prediction. For a protein sequence, there are many methods to generate a SA. However, when given a choice of more than one SA for a protein sequence, there are no methods to predict which SA may lead to more accurate models without actually building the models. In this work, we describe a method to predict the quality of a protein's SA. METHODS We created our own dataset by generating a variety of SAs for a set of 1,351 representative proteins and investigated various deep learning architectures to predict the local distance difference test (lDDT) scores of distance maps predicted with SAs as the input. These lDDT scores serve as indicators of the quality of the SAs. RESULTS Using two independent test datasets consisting of CASP13 and CASP14 targets, we show that our method is effective for scoring and ranking SAs when a pool of SAs is available for a protein sequence. With an example, we further discuss that SA selection using our method can lead to improved structure prediction. AVAILABILITY Code and datasets are available at https://github.com/ba-lab/Alignment-Score/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Bikash Shrestha
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63132, USA
| | - Badri Adhikari
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63132, USA
| |
Collapse
|
33
|
Aderinwale T, Bharadwaj V, Christoffer C, Terashi G, Zhang Z, Jahandideh R, Kagaya Y, Kihara D. Real-time structure search and structure classification for AlphaFold protein models. Commun Biol 2022; 5:316. [PMID: 35383281 PMCID: PMC8983703 DOI: 10.1038/s42003-022-03261-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2021] [Accepted: 03/11/2022] [Indexed: 11/17/2022] Open
Abstract
Last year saw a breakthrough in protein structure prediction, where the AlphaFold2 method showed a substantial improvement in the modeling accuracy. Following the software release of AlphaFold2, predicted structures by AlphaFold2 for proteins in 21 species were made publicly available via the AlphaFold Database. Here, to facilitate structural analysis and application of AlphaFold2 models, we provide the infrastructure, 3D-AF-Surfer, which allows real-time structure-based search for the AlphaFold2 models. In 3D-AF-Surfer, structures are represented with 3D Zernike descriptors (3DZD), which is a rotationally invariant, mathematical representation of 3D shapes. We developed a neural network that takes 3DZDs of proteins as input and retrieves proteins of the same fold more accurately than direct comparison of 3DZDs. Using 3D-AF-Surfer, we report structure classifications of AlphaFold2 models and discuss the correlation between confidence levels of AlphaFold2 models and intrinsic disordered regions.
Collapse
Affiliation(s)
- Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Vijay Bharadwaj
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Charles Christoffer
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
| | - Zicong Zhang
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | | | - Yuki Kagaya
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA.
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA.
| |
Collapse
|
34
|
Zhou X, Li Y, Zhang C, Zheng W, Zhang G, Zhang Y. Progressive assembly of multi-domain protein structures from cryo-EM density maps. NATURE COMPUTATIONAL SCIENCE 2022; 2:265-275. [PMID: 35844960 PMCID: PMC9281201 DOI: 10.1038/s43588-022-00232-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/19/2021] [Accepted: 03/21/2022] [Indexed: 05/20/2023]
Abstract
Progress in cryo-electron microscopy has provided the potential for large-size protein structure determination. However, the success rate for solving multi-domain proteins remains low because of the difficulty in modelling inter-domain orientations. Here we developed domain enhanced modeling using cryo-electron microscopy (DEMO-EM), an automatic method to assemble multi-domain structures from cryo-electron microscopy maps through a progressive structural refinement procedure combining rigid-body domain fitting and flexible assembly simulations with deep-neural-network inter-domain distance profiles. The method was tested on a large-scale benchmark set of proteins containing up to 12 continuous and discontinuous domains with medium- to low-resolution density maps, where DEMO-EM produced models with correct inter-domain orientations (template modeling score (TM-score) >0.5) for 97% of cases and outperformed state-of-the-art methods. DEMO-EM was applied to the severe acute respiratory syndrome coronavirus 2 genome and generated models with average TM-score and root-mean-square deviation of 0.97 and 1.3 Å, respectively, with respect to the deposited structures. These results demonstrate an efficient pipeline that enables automated and reliable large-scale multi-domain protein structure modelling from cryo-electron microscopy maps.
Collapse
Affiliation(s)
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
| |
Collapse
|
35
|
Hong Y, Lee J, Ko J. A-Prot: protein structure modeling using MSA transformer. BMC Bioinformatics 2022; 23:93. [PMID: 35296230 PMCID: PMC8925138 DOI: 10.1186/s12859-022-04628-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2021] [Accepted: 03/03/2022] [Indexed: 11/18/2022] Open
Abstract
Background The accuracy of protein 3D structure prediction has been dramatically improved with the help of advances in deep learning. In the recent CASP14, Deepmind demonstrated that their new version of AlphaFold (AF) produces highly accurate 3D models almost close to experimental structures. The success of AF shows that the multiple sequence alignment of a sequence contains rich evolutionary information, leading to accurate 3D models. Despite the success of AF, only the prediction code is open, and training a similar model requires a vast amount of computational resources. Thus, developing a lighter prediction model is still necessary. Results In this study, we propose a new protein 3D structure modeling method, A-Prot, using MSA Transformer, one of the state-of-the-art protein language models. An MSA feature tensor and row attention maps are extracted and converted into 2D residue-residue distance and dihedral angle predictions for a given MSA. We demonstrated that A-Prot predicts long-range contacts better than the existing methods. Additionally, we modeled the 3D structures of the free modeling and hard template-based modeling targets of CASP14. The assessment shows that the A-Prot models are more accurate than most top server groups of CASP14. Conclusion These results imply that A-Prot accurately captures the evolutionary and structural information of proteins with relatively low computational cost. Thus, A-Prot can provide a clue for the development of other protein property prediction methods. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04628-8.
Collapse
Affiliation(s)
- Yiyu Hong
- Arontier Co, Seoul, Republic of Korea
| | - Juyong Lee
- Arontier Co, Seoul, Republic of Korea. .,Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon, Republic of Korea.
| | - Junsu Ko
- Arontier Co, Seoul, Republic of Korea
| |
Collapse
|
36
|
Hou Q, Pucci F, Pan F, Xue F, Rooman M, Feng Q. Using metagenomic data to boost protein structure prediction and discovery. Comput Struct Biotechnol J 2022; 20:434-442. [PMID: 35070166 PMCID: PMC8760478 DOI: 10.1016/j.csbj.2021.12.030] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2021] [Revised: 12/17/2021] [Accepted: 12/21/2021] [Indexed: 11/19/2022] Open
Abstract
Over the past decade, metagenomic sequencing approaches have been providing an ever-increasing amount of protein sequence data at an astonishing rate. These constitute an invaluable source of information which has been exploited in various research fields such as the study of the role of the gut microbiota in human diseases and aging. However, only a small fraction of all metagenomic sequences collected have been functionally or structurally characterized, leaving much of them completely unexplored. Here, we review how this information has been used in protein structure prediction and protein discovery. We begin by presenting some widely used metagenomic databases and analyze in detail how metagenomic data has contributed to the impressive improvement in the accuracy of structure prediction methods in recent years. We then examine how metagenomic information can be exploited to annotate protein sequences. More specifically, we focus on the role of metagenomes in the discovery of enzymes and new CRISPR-Cas systems, and in the identification of antibiotic resistance genes. With this review, we provide an overview of how metagenomic data is currently revolutionizing our understanding of protein science.
Collapse
Affiliation(s)
- Qingzhen Hou
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250012, China
- National Institute of Health Data Science of China, Shandong University, Shandong 250002, China
| | - Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
| | - Fengming Pan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250012, China
- National Institute of Health Data Science of China, Shandong University, Shandong 250002, China
| | - Fuzhong Xue
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250012, China
- National Institute of Health Data Science of China, Shandong University, Shandong 250002, China
| | - Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
| | - Qiang Feng
- Shandong Provincial Key Laboratory of Oral Tissue Regeneration & Shandong Engineering Laboratory for Dental Materials and Oral Tissue Regeneration, Department of Human Microbiome, School of Stomatology, Shandong University, Jinan, Shandong Province 250012, China
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao, Shandong Province 266237, China
| |
Collapse
|
37
|
Su H, Wang W, Du Z, Peng Z, Gao S, Cheng M, Yang J. Improved Protein Structure Prediction Using a New Multi-Scale Network and Homologous Templates. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2021; 8:e2102592. [PMID: 34719864 PMCID: PMC8693034 DOI: 10.1002/advs.202102592] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/19/2021] [Revised: 09/12/2021] [Indexed: 06/04/2023]
Abstract
The accuracy of de novo protein structure prediction has been improved considerably in recent years, mostly due to the introduction of deep learning techniques. In this work, trRosettaX, an improved version of trRosetta for protein structure prediction is presented. The major improvement over trRosetta consists of two folds. The first is the application of a new multi-scale network, i.e., Res2Net, for improved prediction of inter-residue geometries, including distance and orientations. The second is an attention-based module to exploit multiple homologous templates to increase the accuracy further. Compared with trRosetta, trRosettaX improves the contact precision by 6% and 8% on the free modeling targets of CASP13 and CASP14, respectively. A preliminary version of trRosettaX is ranked as one of the top server groups in CASP14's blind test. Additional benchmark test on 161 targets from CAMEO (between Jun and Sep 2020) shows that trRosettaX achieves an average TM-score ≈0.8, outperforming the top groups in CAMEO. These data suggest the effectiveness of using the multi-scale network and the benefit of incorporating homologous templates into the network. The trRosettaX algorithm is incorporated into the trRosetta server since Nov 2020. The web server, the training and inference codes are available at: https://yanglab.nankai.edu.cn/trRosetta/.
Collapse
Affiliation(s)
- Hong Su
- School of Mathematical SciencesNankai UniversityTianjin300071China
| | - Wenkai Wang
- School of Mathematical SciencesNankai UniversityTianjin300071China
| | - Zongyang Du
- School of Mathematical SciencesNankai UniversityTianjin300071China
| | - Zhenling Peng
- Research Center for Mathematics and Interdisciplinary SciencesShandong UniversityQingdao266237China
| | - Shang‐Hua Gao
- College of Computer ScienceNankai UniversityTianjin300071China
| | - Ming‐Ming Cheng
- College of Computer ScienceNankai UniversityTianjin300071China
| | - Jianyi Yang
- Research Center for Mathematics and Interdisciplinary SciencesShandong UniversityQingdao266237China
| |
Collapse
|
38
|
|
39
|
Defresne M, Barbe S, Schiex T. Protein Design with Deep Learning. Int J Mol Sci 2021; 22:11741. [PMID: 34769173 PMCID: PMC8584038 DOI: 10.3390/ijms222111741] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2021] [Revised: 10/23/2021] [Accepted: 10/26/2021] [Indexed: 12/21/2022] Open
Abstract
Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks.
Collapse
Affiliation(s)
- Marianne Defresne
- Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France; (M.D.); (S.B.)
- Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
| | - Sophie Barbe
- Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France; (M.D.); (S.B.)
| | - Thomas Schiex
- Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
| |
Collapse
|