151
Ovek D, Keskin O, Gursoy A. ProInterVal: Validation of Protein-Protein Interfaces through Learned Interface Representations. J Chem Inf Model 2024; 64:2979-2987. [PMID: 38526504 PMCID: PMC11040718 DOI: 10.1021/acs.jcim.3c01788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/26/2024]
Abstract
Proteins are vital components of the biological world and serve a multitude of functions. They interact with other molecules through their interfaces and participate in crucial cellular processes. Disruption of these interactions can have negative effects on organisms, highlighting the importance of studying protein-protein interfaces for developing targeted therapies for diseases. Therefore, the development of a reliable method for investigating protein-protein interactions is of paramount importance. In this work, we present an approach for validating protein-protein interfaces using learned interface representations. The approach involves using a graph-based contrastive autoencoder architecture and a transformer to learn representations of protein-protein interaction interfaces from unlabeled data and then validating them through learned representations with a graph neural network. Our method achieves an accuracy of 0.91 for the test set, outperforming existing GNN-based methods. We demonstrate the effectiveness of our approach on a benchmark data set and show that it provides a promising solution for validating protein-protein interfaces.
Affiliation(s)
- Damla Ovek
- KUIS AI Center, Koç University, Istanbul 34450, Turkey
- Computer Engineering, Koç University, Istanbul 34450, Turkey
| | - Ozlem Keskin
- Chemical and Biological Engineering, Koç University, Istanbul 34450, Turkey
| | - Attila Gursoy
- Computer Engineering, Koç University, Istanbul 34450, Turkey
152
Krishna R, Wang J, Ahern W, Sturmfels P, Venkatesh P, Kalvet I, Lee GR, Morey-Burrows FS, Anishchenko I, Humphreys IR, McHugh R, Vafeados D, Li X, Sutherland GA, Hitchcock A, Hunter CN, Kang A, Brackenbrough E, Bera AK, Baek M, DiMaio F, Baker D. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 2024; 384:eadl2528. [PMID: 38452047 DOI: 10.1126/science.adl2528] [Citation(s) in RCA: 168] [Impact Index Per Article: 168.0] [Received: 10/09/2023] [Accepted: 02/27/2024] [Indexed: 03/09/2024]
Abstract
Deep-learning methods have revolutionized protein structure prediction and design but are presently limited to protein-only systems. We describe RoseTTAFold All-Atom (RFAA), which combines a residue-based representation of amino acids and DNA bases with an atomic representation of all other groups to model assemblies that contain proteins, nucleic acids, small molecules, metals, and covalent modifications, given their sequences and chemical structures. By fine-tuning on denoising tasks, we developed RFdiffusion All-Atom (RFdiffusionAA), which builds protein structures around small molecules. Starting from random distributions of amino acid residues surrounding target small molecules, we designed and experimentally validated, through crystallography and binding measurements, proteins that bind the cardiac disease therapeutic digoxigenin, the enzymatic cofactor heme, and the light-harvesting molecule bilin.
Affiliation(s)
- Rohith Krishna
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Jue Wang
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Woody Ahern
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98105, USA
| | - Pascal Sturmfels
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98105, USA
| | - Preetham Venkatesh
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA 98105, USA
| | - Indrek Kalvet
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105, USA
| | - Gyu Rie Lee
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105, USA
| | | | - Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Ian R Humphreys
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Ryan McHugh
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA 98105, USA
| | - Dionne Vafeados
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Xinting Li
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | | | - Andrew Hitchcock
- School of Biosciences, University of Sheffield, Sheffield S10 2TN, UK
| | - C Neil Hunter
- School of Biosciences, University of Sheffield, Sheffield S10 2TN, UK
| | - Alex Kang
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Evans Brackenbrough
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Asim K Bera
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Minkyung Baek
- School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
| | - Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105, USA
153
Chen J, Wu H, Wang N. KEGG orthology prediction of bacterial proteins using natural language processing. BMC Bioinformatics 2024; 25:146. [PMID: 38600441 PMCID: PMC11007918 DOI: 10.1186/s12859-024-05766-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 04/03/2024] [Indexed: 04/12/2024] Open
Abstract
BACKGROUND: The advent of high-throughput technologies has led to an exponential increase in uncharacterized bacterial protein sequences, surpassing the capacity of manual curation. A large number of bacterial protein sequences remain unannotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology, making it necessary to use auto annotation tools. These tools are now indispensable in the biological research landscape, bridging the gap between the vastness of unannotated sequences and meaningful biological insights.
RESULTS: In this work, we propose a novel pipeline for KEGG orthology annotation of bacterial protein sequences that uses natural language processing and deep learning. To assess the effectiveness of our pipeline, we conducted evaluations using the genomes of two randomly selected species from the KEGG database. In our evaluation, we obtain competitive results on precision, recall, and F1 score, with values of 0.948, 0.947, and 0.947, respectively.
CONCLUSIONS: Our experimental results suggest that our pipeline demonstrates performance comparable to traditional methods and excels in identifying distant relatives with low sequence identity. This demonstrates the potential of our pipeline to significantly improve the accuracy and comprehensiveness of KEGG orthology annotation, thereby advancing our understanding of functional relationships within biological systems.
Affiliation(s)
- Jing Chen
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
- Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computing Intelligence, Jiangnan University, Wuxi, China
| | - Haoyu Wu
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Ning Wang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China.
154
Si Y, Yan C. Protein language model-embedded geometric graphs power inter-protein contact prediction. eLife 2024; 12:RP92184. [PMID: 38564241 PMCID: PMC10987090 DOI: 10.7554/elife.92184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024] Open
Abstract
Accurate prediction of contacting residue pairs between interacting proteins is very useful for structural characterization of protein-protein interactions. Although significant improvement has been made in inter-protein contact prediction recently, there is still considerable room for improving the prediction accuracy. Here we present a new deep learning method referred to as PLMGraph-Inter for inter-protein contact prediction. Specifically, we employ rotationally and translationally invariant geometric graphs obtained from structures of interacting proteins to integrate multiple protein language models, which are successively transformed by graph encoders formed by geometric vector perceptrons and residual networks formed by dimensional hybrid residual blocks to predict inter-protein contacts. Extensive evaluation on multiple test sets illustrates that PLMGraph-Inter outperforms five top inter-protein contact prediction methods, including DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter, by large margins. In addition, we also show that the prediction of PLMGraph-Inter can complement the result of AlphaFold-Multimer. Finally, we show that leveraging the contacts predicted by PLMGraph-Inter as constraints for protein-protein docking can dramatically improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
155
Liu Q, Fu Q, Yan Y, Jiang Q, Mao L, Wang L, Yu F, Zheng H. Curation, nomenclature, and topological classification of receptor-like kinases from 528 plant species for novel domain discovery and functional inference. Mol Plant 2024; 17:658-671. [PMID: 38384130 DOI: 10.1016/j.molp.2024.02.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Revised: 01/25/2024] [Accepted: 02/19/2024] [Indexed: 02/23/2024]
Abstract
Receptor-like kinases (RLKs) are the most numerous signal transduction components in plants and play important roles in determining how different plants adapt to their ecological environments. Research on RLKs has focused mainly on a small number of typical RLK members in a few model plants. There is an urgent need to study the composition, distribution, and evolution of RLKs at the holistic level to increase our understanding of how RLKs assist in the ecological adaptations of different plant species. In this study, we collected the genome assemblies of 528 plant species and constructed an RLK dataset. Using this dataset, we identified and characterized 524 948 RLK family members. Each member underwent systematic topological classification and was assigned a gene ID based on a unified nomenclature system. Furthermore, we identified two novel extracellular domains in some RLKs, designated Xiao and Xiang. Evolutionary analysis of the RLK family revealed that the RLCK-XVII and RLCK-XII-2 classes were present exclusively in dicots, suggesting that diversification of RLKs between monocots and dicots may have led to differences in downstream cytoplasmic responses. We also used an interaction proteome to help empower data mining for inference of new RLK functions from a global perspective, with the ultimate goal of understanding how RLKs shape the adaptation of different plants to the environments/ecosystems. The assembled RLK dataset, together with annotations and analytical tools, forms an integrated foundation of multiomics data that is publicly accessible via the metaRLK web portal (http://metaRLK.biocloud.top).
Affiliation(s)
- Qian Liu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Qiong Fu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Yujie Yan
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Qian Jiang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Longfei Mao
- Bioinformatics Center, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Long Wang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China
| | - Feng Yu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China.
| | - Heping Zheng
- State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University College of Biology, Changsha, Hunan 410082, China; Bioinformatics Center, Hunan University College of Biology, Changsha, Hunan 410082, China.
156
Liu W, Wang Z, You R, Xie C, Wei H, Xiong Y, Yang J, Zhu S. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 2024; 15:2775. [PMID: 38555371 PMCID: PMC10981738 DOI: 10.1038/s41467-024-46808-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 03/08/2024] [Indexed: 04/02/2024] Open
Abstract
Homologous protein search is one of the most commonly used methods for protein annotation and analysis. Compared to structure search, detecting distant evolutionary relationships from sequences alone remains challenging. Here we propose PLMSearch (Protein Language Model), a homologous protein search method with only sequences as input. PLMSearch uses deep representations from a pre-trained protein language model and trains the similarity prediction model on a large number of real structure similarities. This enables PLMSearch to capture the remote homology information concealed behind the sequences. Extensive experimental results show that PLMSearch can search millions of query-target protein pairs in seconds like MMseqs2 while increasing the sensitivity by more than threefold, and is comparable to state-of-the-art structure search methods. In particular, unlike traditional sequence search methods, PLMSearch can recall most remote homology pairs with dissimilar sequences but similar structures. PLMSearch is freely available at https://dmiip.sjtu.edu.cn/PLMSearch.
Affiliation(s)
- Wei Liu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Ziye Wang
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Ronghui You
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Chenghan Xie
- School of Mathematical Sciences, Fudan University, 200433, Shanghai, China
| | - Hong Wei
- School of Mathematical Sciences, Nankai University, 300071, Tianjin, China
| | - Yi Xiong
- Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, 200240, Shanghai, China
| | - Jianyi Yang
- Ministry of Education Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Science, Shandong University, 266237, Qingdao, China.
| | - Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China.
- Shanghai Qi Zhi Institute, Shanghai, China.
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China.
- Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai, China.
- Zhangjiang Fudan International Innovation Center, Shanghai, China.
157
Zhang C, Zhang C, Shang T, Zhu N, Wu X, Duan H. HighFold: accurately predicting structures of cyclic peptides and complexes with head-to-tail and disulfide bridge constraints. Brief Bioinform 2024; 25:bbae215. [PMID: 38706323 PMCID: PMC11070728 DOI: 10.1093/bib/bbae215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 04/12/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024] Open
Abstract
In recent years, cyclic peptides have emerged as a promising therapeutic modality due to their diverse biological activities. Understanding the structures of these cyclic peptides and their complexes is crucial for unlocking invaluable insights about protein target-cyclic peptide interaction, which can facilitate the development of novel-related drugs. However, conducting experimental observations is time-consuming and expensive. Computer-aided drug design methods are not practical enough in real-world applications. To tackle this challenge, we introduce HighFold, an AlphaFold-derived model. By integrating specific details about the head-to-tail circle and disulfide bridge structures, the HighFold model can accurately predict the structures of cyclic peptides and their complexes. Our model demonstrates superior predictive performance compared to other existing approaches, representing a significant advancement in structure-activity research. The HighFold model is openly accessible at https://github.com/hongliangduan/HighFold.
Affiliation(s)
- Chenhao Zhang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
| | - Chengyun Zhang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- AI department, Shanghai Highslab Therapeutics. Inc, Shanghai, 201203, China
| | - Tianfeng Shang
- AI department, Shanghai Highslab Therapeutics. Inc, Shanghai, 201203, China
| | - Ning Zhu
- China Pharmaceutical University, Nanjing, Jiangsu, 211198, China
| | - Xinyi Wu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
| | - Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao, 999078, China
158
Jing X, Wu F, Luo X, Xu J. Single-sequence protein structure prediction by integrating protein language models. Proc Natl Acad Sci U S A 2024; 121:e2308788121. [PMID: 38507445 PMCID: PMC10990103 DOI: 10.1073/pnas.2308788121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 02/05/2024] [Indexed: 03/22/2024] Open
Abstract
Protein structure prediction has been greatly improved by deep learning in the past few years. However, the most successful methods rely on multiple sequence alignment (MSA) of the sequence homologs of the protein under prediction. In nature, a protein folds in the absence of its sequence homologs and thus, an MSA-free structure prediction method is desired. Here, we develop a single-sequence-based protein structure prediction method RaptorX-Single by integrating several protein language models and a structure generation module and then study its advantage over MSA-based methods. Our experimental results indicate that in addition to running much faster than MSA-based methods such as AlphaFold2, RaptorX-Single outperforms AlphaFold2 and other MSA-free methods in predicting the structure of antibodies (after fine-tuning on antibody data), proteins with very few sequence homologs, and single mutation effects. By comparing different protein language models, our results show that not only the scale but also the training data of protein language models will impact the performance. RaptorX-Single also compares favorably to MSA-based AlphaFold2 when the protein under prediction has a large number of sequence homologs.
Affiliation(s)
| | - Fandi Wu
- MoleculeMind Ltd., Beijing 100084, China
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Xiao Luo
- Toyota Technological Institute at Chicago, Chicago, IL 60637
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
| | - Jinbo Xu
- MoleculeMind Ltd., Beijing 100084, China
- Toyota Technological Institute at Chicago, Chicago, IL 60637
159
Zimmerman L, Alon N, Levin I, Koganitsky A, Shpigel N, Brestel C, Lapidoth GD. Context-dependent design of induced-fit enzymes using deep learning generates well-expressed, thermally stable and active enzymes. Proc Natl Acad Sci U S A 2024; 121:e2313809121. [PMID: 38437538 PMCID: PMC10945820 DOI: 10.1073/pnas.2313809121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 02/09/2024] [Indexed: 03/06/2024] Open
Abstract
The potential of engineered enzymes in industrial applications is often limited by their expression levels, thermal stability, and catalytic diversity. De novo enzyme design faces challenges due to the complexity of enzymatic catalysis. An alternative approach involves expanding natural enzyme capabilities for new substrates and parameters. Here, we introduce CoSaNN (Conformation Sampling using Neural Network), an enzyme design strategy using deep learning for structure prediction and sequence optimization. CoSaNN controls enzyme conformations to expand chemical space beyond simple mutagenesis. It employs a context-dependent approach for generating enzyme designs, considering non-linear relationships in sequence and structure space. We also developed SolvIT, a graph NN predicting protein solubility in Escherichia coli, optimizing enzyme expression selection from larger design sets. Using this method, we engineered enzymes with superior expression levels, with 54% expressed in E. coli, and increased thermal stability, with over 30% having higher Tm than the template, with no high-throughput screening. Our research underscores AI's transformative role in protein design, capturing high-order interactions and preserving allosteric mechanisms in extensively modified enzymes, and notably enhancing expression success rates. This method's ease of use and efficiency streamlines enzyme design, opening broad avenues for biotechnological applications and broadening field accessibility.
Affiliation(s)
| | - Noga Alon
- Enzymit Ltd., Ness-Ziona 7403626, Israel
160
Shi YZ, Wu H, Li SS, Li HZ, Zhang BG, Tan YL. ABC2A: A Straightforward and Fast Method for the Accurate Backmapping of RNA Coarse-Grained Models to All-Atom Structures. Molecules 2024; 29:1244. [PMID: 38542881 PMCID: PMC10974898 DOI: 10.3390/molecules29061244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 03/05/2024] [Accepted: 03/08/2024] [Indexed: 11/12/2024] Open
Abstract
RNAs play crucial roles in various essential biological functions, including catalysis and gene regulation. Despite the widespread use of coarse-grained (CG) models/simulations to study RNA 3D structures and dynamics, their direct application is challenging due to the lack of atomic detail. Therefore, the reconstruction of full atomic structures is desirable. In this study, we introduced a straightforward method called ABC2A for reconstructing all-atom structures from RNA CG models. ABC2A utilizes diverse nucleotide fragments from known structures to assemble full atomic structures based on the CG atoms. The diversification of assembly fragments beyond standard A-form ones, commonly used in other programs, combined with a highly simplified structure refinement process, ensures that ABC2A achieves both high accuracy and rapid speed. Tests on a recent large dataset of 361 RNA experimental structures (30-692 nt) indicate that ABC2A can reconstruct full atomic structures from three-bead CG models with a mean RMSD of ~0.34 Å from experimental structures and an average runtime of ~0.5 s (maximum runtime < 2.5 s). Compared to the state-of-the-art Arena, ABC2A achieves a ~25% improvement in accuracy and is five times faster in speed.
Affiliation(s)
- Ya-Zhou Shi
- Research Center of Nonlinear Science, School of Mathematical & Physical Sciences, Wuhan Textile University, Wuhan 430200, China; (Y.-Z.S.); (H.W.); (S.-S.L.); (H.-Z.L.)
| | - Hao Wu
- Research Center of Nonlinear Science, School of Mathematical & Physical Sciences, Wuhan Textile University, Wuhan 430200, China; (Y.-Z.S.); (H.W.); (S.-S.L.); (H.-Z.L.)
| | - Sha-Sha Li
- Research Center of Nonlinear Science, School of Mathematical & Physical Sciences, Wuhan Textile University, Wuhan 430200, China; (Y.-Z.S.); (H.W.); (S.-S.L.); (H.-Z.L.)
| | - Hui-Zhen Li
- Research Center of Nonlinear Science, School of Mathematical & Physical Sciences, Wuhan Textile University, Wuhan 430200, China; (Y.-Z.S.); (H.W.); (S.-S.L.); (H.-Z.L.)
| | - Ben-Gong Zhang
- Research Center of Nonlinear Science, School of Mathematical & Physical Sciences, Wuhan Textile University, Wuhan 430200, China; (Y.-Z.S.); (H.W.); (S.-S.L.); (H.-Z.L.)
| | - Ya-Lan Tan
- Research Center of Nonlinear Science, School of Mathematical & Physical Sciences, Wuhan Textile University, Wuhan 430200, China; (Y.-Z.S.); (H.W.); (S.-S.L.); (H.-Z.L.)
- School of Bioengineering and Health, Wuhan Textile University, Wuhan 430200, China
161
Tavis S, Hettich RL. Multi-Omics integration can be used to rescue metabolic information for some of the dark region of the Pseudomonas putida proteome. BMC Genomics 2024; 25:267. [PMID: 38468234 PMCID: PMC10926591 DOI: 10.1186/s12864-024-10082-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 02/02/2024] [Indexed: 03/13/2024] Open
Abstract
In every omics experiment, genes or their products are identified for which even state-of-the-art tools are unable to assign a function. In the biotechnology chassis organism Pseudomonas putida, these proteins of unknown function make up 14% of the proteome. This missing information can bias analyses since these proteins can carry out functions which impact the engineering of organisms. As a consequence of predicting protein function across all organisms, function prediction tools generally fail to use all of the types of data available for any specific organism, including protein and transcript expression information. Additionally, the release of AlphaFold predictions for all UniProt proteins provides a novel opportunity for leveraging structural information. We constructed a bespoke machine learning model to predict the function of recalcitrant proteins of unknown function in Pseudomonas putida based on these sources of data, which annotated 1079 terms to 213 proteins. Among the predicted functions supplied by the model, we found evidence for a significant overrepresentation of nitrogen metabolism and macromolecule processing proteins. These findings were corroborated by manual analyses of selected proteins which identified, among others, a functionally unannotated operon that likely encodes a branch of the shikimate pathway.
Affiliation(s)
- Steven Tavis
- Genome Science and Technology Graduate Program, University of Tennessee Knoxville, Knoxville, USA
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Robert L Hettich
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
162
Shor B, Schneidman-Duhovny D. CombFold: predicting structures of large protein assemblies using a combinatorial assembly algorithm and AlphaFold2. Nat Methods 2024; 21:477-487. [PMID: 38326495 PMCID: PMC10927564 DOI: 10.1038/s41592-024-02174-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 01/09/2024] [Indexed: 02/09/2024]
Abstract
Deep learning models, such as AlphaFold2 and RosettaFold, enable high-accuracy protein structure prediction. However, large protein complexes are still challenging to predict due to their size and the complexity of interactions between multiple subunits. Here we present CombFold, a combinatorial and hierarchical assembly algorithm for predicting structures of large protein complexes utilizing pairwise interactions between subunits predicted by AlphaFold2. CombFold accurately predicted (TM-score >0.7) 72% of the complexes among the top-10 predictions in two datasets of 60 large, asymmetric assemblies. Moreover, the structural coverage of predicted complexes was 20% higher compared to corresponding Protein Data Bank entries. We applied the method on complexes from Complex Portal with known stoichiometry but without known structure and obtained high-confidence predictions. CombFold supports the integration of distance restraints based on crosslinking mass spectrometry and fast enumeration of possible complex stoichiometries. CombFold's high accuracy makes it a promising tool for expanding structural coverage beyond monomeric proteins.
Affiliation(s)
- Ben Shor
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Dina Schneidman-Duhovny
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
|
163
|
Manalastas-Cantos K, Adoni KR, Pfeifer M, Märtens B, Grünewald K, Thalassinos K, Topf M. Modeling Flexible Protein Structure With AlphaFold2 and Crosslinking Mass Spectrometry. Mol Cell Proteomics 2024; 23:100724. [PMID: 38266916 PMCID: PMC10884514 DOI: 10.1016/j.mcpro.2024.100724] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/13/2023] [Revised: 12/23/2023] [Accepted: 12/27/2023] [Indexed: 01/26/2024] Open
Abstract
We propose a pipeline that combines AlphaFold2 (AF2) and crosslinking mass spectrometry (XL-MS) to model the structure of proteins with multiple conformations. The pipeline consists of two main steps: ensemble generation using AF2 and conformer selection using XL-MS data. For conformer selection, we developed two scores, the monolink probability score (MP) and the crosslink probability score (XLP), both of which are based on residue depth from the protein surface. We benchmarked MP and XLP on a large dataset of decoy protein structures and showed that our scores outperform previously developed scores. We then tested our methodology on three proteins having an open and closed conformation in the Protein Data Bank: Complement component 3 (C3), luciferase, and glutamine-binding periplasmic protein, first generating ensembles using AF2, which were then screened for the open and closed conformations using experimental XL-MS data. In five out of six cases, the most accurate model within the AF2 ensembles (or a conformation within 1 Å of this model) was identified using crosslinks, as assessed through the XLP score. In the remaining case, only the monolinks (assessed through the MP score) successfully identified the open conformation of glutamine-binding periplasmic protein, and these results were further improved by including the "occupancy" of the monolinks. This serves as a compelling proof-of-concept for the effectiveness of monolinks. In contrast, the AF2 assessment score was only able to identify the most accurate conformation in two out of six cases. Our results highlight the complementarity of AF2 with experimental methods like XL-MS, with the MP and XLP scores providing reliable metrics to assess the quality of the predicted models. The MP and XLP scoring functions mentioned above are available at https://gitlab.com/topf-lab/xlms-tools.
Affiliation(s)
- Karen Manalastas-Cantos
- Center for Data and Computing in Natural Sciences, Universität Hamburg, Hamburg, Germany; Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany
| | - Kish R Adoni
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK; Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
| | - Matthias Pfeifer
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany
| | - Birgit Märtens
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany
| | - Kay Grünewald
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Department of Chemistry, Universität Hamburg, Hamburg, Germany
| | - Konstantinos Thalassinos
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK; Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
| | - Maya Topf
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany.
|
164
|
Banayan NE, Loughlin BJ, Singh S, Forouhar F, Lu G, Wong K, Neky M, Hunt HS, Bateman LB, Tamez A, Handelman SK, Price WN, Hunt JF. Systematic enhancement of protein crystallization efficiency by bulk lysine-to-arginine (KR) substitution. Protein Sci 2024; 33:e4898. [PMID: 38358135 PMCID: PMC10868448 DOI: 10.1002/pro.4898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 01/01/2024] [Accepted: 01/02/2024] [Indexed: 02/16/2024]
Abstract
Structural genomics consortia established that protein crystallization is the primary obstacle to structure determination using x-ray crystallography. We previously demonstrated that crystallization propensity is systematically related to primary sequence, and we subsequently performed computational analyses showing that arginine is the most overrepresented amino acid in crystal-packing interfaces in the Protein Data Bank. Given the similar physicochemical characteristics of arginine and lysine, we hypothesized that multiple lysine-to-arginine (KR) substitutions should improve crystallization. To test this hypothesis, we developed software that ranks lysine sites in a target protein based on the redundancy-corrected KR substitution frequency in homologs. This software can be run interactively on the worldwide web at https://www.pxengineering.org/. We demonstrate that three unrelated single-domain proteins can tolerate 5-11 KR substitutions with at most minor destabilization, and, for two of these three proteins, the construct with the largest number of KR substitutions exhibits significantly enhanced crystallization propensity. This approach rapidly produced a 1.9 Å crystal structure of a human protein domain refractory to crystallization with its native sequence. Structures from Bulk KR-substituted domains show the engineered arginine residues frequently make hydrogen-bonds across crystal-packing interfaces. We thus demonstrate that Bulk KR substitution represents a rational and efficient method for probabilistic engineering of protein surface properties to improve crystallization.
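The site-ranking idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the 80% identity cutoff used for redundancy weighting, and the toy alignment are all assumptions made for illustration. The sketch assumes the homologs are already aligned to a gap-free target sequence, weights each homolog by the inverse of its number of near-identical neighbours, and ranks the target's lysine positions by the weighted fraction of homologs that carry arginine in the same column.

```python
from typing import List, Tuple


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def sequence_weights(homologs: List[str], identity_cutoff: float = 0.8) -> List[float]:
    """Hypothetical redundancy correction: weight = 1 / (number of homologs,
    including the sequence itself, at or above the identity cutoff)."""
    weights = []
    for a in homologs:
        neighbours = sum(pairwise_identity(a, b) >= identity_cutoff for b in homologs)
        weights.append(1.0 / neighbours)
    return weights


def rank_lysine_sites(target: str, homologs: List[str]) -> List[Tuple[int, float]]:
    """Return (1-based position, weighted K-to-R frequency) for each lysine in the
    target, sorted from the most to the least frequently substituted site."""
    weights = sequence_weights(homologs)
    ranked = []
    for col, residue in enumerate(target):
        if residue != "K":
            continue
        # Only homologs with a residue (not a gap) in this column contribute.
        total = sum(w for w, h in zip(weights, homologs) if h[col] != "-")
        arginine = sum(w for w, h in zip(weights, homologs) if h[col] == "R")
        frequency = arginine / total if total > 0 else 0.0
        ranked.append((col + 1, frequency))
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Toy alignment (made up): the first two homologs are identical, so they
    # share their weight instead of counting twice.
    target = "MKAKLK"
    homologs = ["MRAKLK", "MRAKLK", "MKARLR", "MRAKIR"]
    for position, frequency in rank_lysine_sites(target, homologs):
        print(position, round(frequency, 3))
```

In this toy alignment the duplicated homolog is down-weighted, so a site that is arginine only in redundant sequences does not outrank a site that is arginine in several distinct homologs.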
Affiliation(s)
- Nooriel E. Banayan
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
| | - Blaine J. Loughlin
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
| | - Shikha Singh
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
| | - Farhad Forouhar
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
| | - Guanqi Lu
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
| | - Kam‐Ho Wong
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: Vaccine Research and Development, Pfizer Inc., Pearl River, New York, USA
| | - Matthew Neky
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: Columbia University, New York, New York, USA
| | - Henry S. Hunt
- Department of Physics, Stanford University, Stanford, California, USA
| | | | | | - Samuel K. Handelman
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: Department of Pain & Neuronal Health, Eli Lilly & Co., 893 Delaware St, Indianapolis, Indiana, USA
| | - W. Nicholson Price
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Present address: University of Michigan Law School, Ann Arbor, Michigan, USA
| | - John F. Hunt
- Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
|
165
|
Shams MH, Sohrabi SM, Jafari R, Sheikhian A, Motedayyen H, Baharvand PA, Hasanvand A, Fouladvand A, Assarehzadegan MA. Designing a T-cell epitope-based vaccine using in silico approaches against the Sal k 1 allergen of Salsola kali plant. Sci Rep 2024; 14:5040. [PMID: 38424208 PMCID: PMC10904830 DOI: 10.1038/s41598-024-55788-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Accepted: 02/27/2024] [Indexed: 03/02/2024] Open
Abstract
Allergens originating from Salsola kali (Russian thistle) pollen grains are one of the most important sources of aeroallergens causing pollinosis in desert and semi-desert regions. T-cell epitope-based vaccines (TEVs) are among the more effective of the therapeutic approaches developed to alleviate allergic diseases. The physicochemical properties and the B- and T-cell epitopes of Sal k 1 (a major allergen of S. kali) were predicted using immunoinformatic tools. A TEV was constructed using the linkers EAAAK and GPGPG and the most suitable CD4+ T-cell epitopes. RS04 adjuvant was added as a TLR4 agonist to the amino (N) and carboxyl (C) termini of the TEV protein. The secondary and tertiary structures, solubility, allergenicity, toxicity, stability, physicochemical properties, docking with immune receptors, BLASTp against the human and microbiota proteomes, and in silico cloning of the designed TEV were assessed using immunoinformatic analyses. Two CD4+ T-cell epitopes of Sal k 1 that had high affinity for different MHC-II alleles were selected and used in the TEV. Molecular docking of the TEV with HLA-DRB1 and TLR4 showed strong interactions and stable binding poses with these receptors. Moreover, the codon-optimized TEV sequence was cloned between the NcoI and XhoI restriction sites of the pET-28a(+) expression plasmid. The designed TEV is a promising candidate for allergen-specific immunotherapy against S. kali. Nonetheless, the effectiveness of this vaccine should be validated through immunological bioassays.
Affiliation(s)
- Mohammad Hossein Shams
- Hepatitis Research Center and Department of Medical Immunology, School of Medicine, Lorestan University of Medical Sciences, Khorramabad, Iran.
| | - Seyyed Mohsen Sohrabi
- Department of Production Engineering and Plant Genetic, Faculty of Agriculture, Shahid Chamran University of Ahvaz, Box 6814993165, Ahvaz, Iran
| | - Reza Jafari
- School of Allied Medical Sciences, Shahroud University of Medical Sciences, Shahroud, Iran
| | - Ali Sheikhian
- Hepatitis Research Center and Department of Medical Immunology, School of Medicine, Lorestan University of Medical Sciences, Khorramabad, Iran
| | - Hossein Motedayyen
- Autoimmune Diseases Research Center, Kashan University of Medical Sciences, Kashan, Iran
| | - Peyman Amanolahi Baharvand
- Hepatitis Research Center and Department of Medical Immunology, School of Medicine, Lorestan University of Medical Sciences, Khorramabad, Iran
| | - Amin Hasanvand
- Department of Physiology and Pharmacology, School of Medicine, Lorestan University of Medical Sciences, Khorramabad, Iran
| | - Ali Fouladvand
- Hepatitis Research Center and Department of Medical Immunology, School of Medicine, Lorestan University of Medical Sciences, Khorramabad, Iran
| | - Mohammad-Ali Assarehzadegan
- Immunology Research Center, Department of Immunology, School of Medicine, Iran University of Medical Sciences, Tehran, Iran.
|
166
|
Wang X, Zhu H, Terashi G, Taluja M, Kihara D. DiffModeler: Large Macromolecular Structure Modeling in Low-Resolution Cryo-EM Maps Using Diffusion Model. bioRxiv 2024:2024.01.20.576370. [PMID: 38328203 PMCID: PMC10849514 DOI: 10.1101/2024.01.20.576370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Cryogenic electron microscopy (cryo-EM) has now been widely used for determining multi-chain protein complexes. However, modeling a complex structure is challenging, particularly when the map resolution is low, typically in the intermediate resolution range of 5 to 10 Å. Within this resolution range, even accurate structure fitting is difficult, let alone de novo modeling. To address this challenge, here we present DiffModeler, a fully automated method for modeling protein complex structures. DiffModeler employs a diffusion model for backbone tracing and integrates AlphaFold2-predicted single-chain structures for structure fitting. Extensive testing on cryo-EM maps at intermediate resolutions demonstrates the exceptional accuracy of DiffModeler in structure modeling, achieving an average TM-Score of 0.92, surpassing existing methodologies significantly. Notably, DiffModeler successfully modeled a protein complex composed of 47 chains and 13,462 residues, achieving a high TM-Score of 0.94. Further benchmarking at low resolutions (10-20 Å) confirms its versatility, demonstrating plausible performance. Moreover, when coupled with CryoREAD, DiffModeler excels in constructing protein-DNA/RNA complex structures for near-atomic resolution maps (0-5 Å), showcasing state-of-the-art performance with average TM-Scores of 0.88 and 0.91 across two datasets.
Affiliation(s)
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
| | - Han Zhu
- Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
| | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
| | - Manav Taluja
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
- School of Computer Science and Engineering, Vellore Institute of Technology, Tamil Nadu 642014, India
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
|
167
|
Leśniewski M, Pyrka M, Czaplewski C, Co NT, Jiang Y, Gong Z, Tang C, Liwo A. Assessment of Two Restraint Potentials for Coarse-Grained Chemical-Cross-Link-Assisted Modeling of Protein Structures. J Chem Inf Model 2024; 64:1377-1393. [PMID: 38345917 PMCID: PMC10900291 DOI: 10.1021/acs.jcim.3c01890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 01/20/2024] [Accepted: 01/22/2024] [Indexed: 02/27/2024]
Abstract
The influence of distance restraints from chemical cross-link mass spectroscopy (XL-MS) on the quality of protein structures modeled with the coarse-grained UNRES force field was assessed by using a protocol based on multiplexed replica exchange molecular dynamics, in which both simulated and experimental cross-link restraints were employed, for 23 small proteins. Six cross-links with upper distance boundaries from 4 Å to 12 Å (azido benzoic acid succinimide (ABAS), triazidotriazine (TATA), succinimidyldiazirine (SDA), disuccinimidyl adipate (DSA), disuccinimidyl glutarate (DSG), and disuccinimidyl suberate (BS3)) and two types of restraining potentials ((i) simple flat-bottom Lorentz-like potentials dependent on side chain distance (all cross-links) and (ii) distance- and orientation-dependent potentials determined based on molecular dynamics simulations of model systems (DSA, DSG, BS3, and SDA)) were considered. The Lorentz-like potentials with properly set parameters were found to produce a greater number of higher-quality models compared to unrestrained simulations than the MD-based potentials, because the latter can force too long distances between side chains. Therefore, the flat-bottom Lorentz-like potentials are recommended to represent cross-link restraints. It was also found that significant improvement of model quality upon the introduction of cross-link restraints is obtained when the sum of differences of indices of cross-linked residues exceeds 150.
Affiliation(s)
- Mateusz Leśniewski
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
| | - Maciej Pyrka
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Department of Physics and Biophysics, University of Warmia and Mazury, ul. Oczapowskiego 4, 10-719 Olsztyn, Poland
| | - Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
| | - Nguyen Truong Co
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
| | - Yida Jiang
- College of Chemistry and Molecular Engineering & Center for Quantitative Biology & PKU-Tsinghua Center for Life Sciences & Beijing National Laboratory for Molecular Sciences, Peking University, Beijing 100871, China
| | - Zhou Gong
- Innovation Academy of Precision Measurement Science and Technology, Chinese Academy of Sciences, 30 W. Xiao Hong Shan, Wuhan 430071, China
| | - Chun Tang
- College of Chemistry and Molecular Engineering & Center for Quantitative Biology & PKU-Tsinghua Center for Life Sciences & Beijing National Laboratory for Molecular Sciences, Peking University, Beijing 100871, China
| | - Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
|
168
|
Ali MA, Caetano-Anollés G. AlphaFold2 Reveals Structural Patterns of Seasonal Haplotype Diversification in SARS-CoV-2 Spike Protein Variants. Biology 2024; 13:134. [PMID: 38534404 PMCID: PMC10968544 DOI: 10.3390/biology13030134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 02/07/2024] [Accepted: 02/16/2024] [Indexed: 03/28/2024]
Abstract
The slow experimental acquisition of high-quality atomic structures of the rapidly changing proteins of the COVID-19 virus challenges vaccine and therapeutic drug development efforts. Fortunately, deep learning tools such as AlphaFold2 can quickly generate reliable models of atomic structure at experimental resolution. Current modeling studies have focused solely on definitions of mutant constellations of Variants of Concern (VOCs), leaving out the impact of haplotypes on protein structure. Here, we conduct a thorough comparative structural analysis of S-proteins belonging to major VOCs and corresponding latitude-delimited haplotypes that affect viral seasonal behavior. Our approach identified molecular regions of importance as well as patterns of structural recruitment. The S1 subunit hosted the majority of structural changes, especially those involving the N-terminal domain (NTD) and the receptor-binding domain (RBD). In particular, structural changes in the NTD were much greater than just translations in three-dimensional space, altering the sub-structures to greater extents. We also revealed a notable pattern of structural recruitment with the early VOCs Alpha and Delta behaving antagonistically by suppressing regions of structural change introduced by their corresponding haplotypes, and the current VOC Omicron behaving synergistically by amplifying or collecting structural change. Remarkably, haplotypes altering the galectin-like structure of the NTD were major contributors to seasonal behavior, supporting its putative environmental-sensing role. Our results provide an extensive view of the evolutionary landscape of the S-protein across the COVID-19 pandemic. This view will help predict important regions of structural change in future variants and haplotypes for more efficient vaccine and drug development.
Affiliation(s)
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
169
|
Corum MR, Venkannagari H, Hryc CF, Baker ML. Predictive modeling and cryo-EM: A synergistic approach to modeling macromolecular structure. Biophys J 2024; 123:435-450. [PMID: 38268190 PMCID: PMC10912932 DOI: 10.1016/j.bpj.2024.01.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 01/09/2024] [Accepted: 01/18/2024] [Indexed: 01/26/2024] Open
Abstract
Over the last 15 years, structural biology has seen unprecedented development and improvement in two areas: electron cryo-microscopy (cryo-EM) and predictive modeling. Once relegated to low resolutions, single-particle cryo-EM is now capable of achieving near-atomic resolutions of a wide variety of macromolecular complexes. Ushered in by AlphaFold, machine learning has powered the current generation of predictive modeling tools, which can accurately and reliably predict models for proteins and some complexes directly from the sequence alone. Although they offer new opportunities individually, there is an inherent synergy between these techniques, allowing for the construction of large, complex macromolecular models. Here, we give a brief overview of these approaches in addition to illustrating works that combine these techniques for model building. These examples provide insight into model building, assessment, and limitations when integrating predictive modeling with cryo-EM density maps. Together, these approaches offer the potential to greatly accelerate the generation of macromolecular structural insights, particularly when coupled with experimental data.
Affiliation(s)
- Michael R Corum
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
| | - Harikanth Venkannagari
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
| | - Corey F Hryc
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
| | - Matthew L Baker
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas.
|
170
|
Bahena-Ceron R, Teixeira C, Ponce JRJ, Wolff P, Couzon F, François P, Klaholz BP, Vandenesch F, Romby P, Moreau K, Marzi S. RlmQ: a newly discovered rRNA modification enzyme bridging RNA modification and virulence traits in Staphylococcus aureus. RNA 2024; 30:200-212. [PMID: 38164596 PMCID: PMC10870370 DOI: 10.1261/rna.079850.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 11/29/2023] [Indexed: 01/03/2024]
Abstract
rRNA modifications play crucial roles in fine-tuning the delicate balance between translation speed and accuracy, yet the underlying mechanisms remain elusive. Comparative analyses of the rRNA modifications in taxonomically distant bacteria could help define their general, as well as species-specific, roles. In this study, we identified a new methyltransferase, RlmQ, in Staphylococcus aureus responsible for the Gram-positive specific m7G2601, which is not modified in Escherichia coli (G2574). We also demonstrate the absence of methylation on C1989, equivalent to E. coli C1962, which is methylated at position 5 by the Gram-negative specific RlmI methyltransferase, a paralog of RlmQ. Both modifications (S. aureus m7G2601 and E. coli m5C1962) are situated within the same tRNA accommodation corridor, hinting at a potential shared function in translation. Inactivation of S. aureus rlmQ causes the loss of methylation at G2601 and significantly impacts growth, cytotoxicity, and biofilm formation. These findings unravel the intricate connections between rRNA modifications, translation, and virulence in pathogenic Gram-positive bacteria.
Affiliation(s)
- Roberto Bahena-Ceron
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
| | - Chloé Teixeira
- CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
| | - Jose R Jaramillo Ponce
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
| | - Philippe Wolff
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
| | - Florence Couzon
- CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
| | - Pauline François
- CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
| | - Bruno P Klaholz
- Centre for Integrative Biology, Department of Integrated Structural Biology, IGBMC, 67400 Illkirch, France
- CNRS UMR 7104, 67400 Illkirch, France
- Inserm U964, 67400 Illkirch, France
- Université de Strasbourg, 67000 Strasbourg, France
| | - François Vandenesch
- CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
- Institut des agents infectieux, Hospices Civils de Lyon, 69004 Lyon, France
- Centre National de Référence des Staphylocoques, Hospices Civils de Lyon, 69317 Lyon, France
| | - Pascale Romby
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
| | - Karen Moreau
- CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
| | - Stefano Marzi
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
|
171
|
Schweke H, Pacesa M, Levin T, Goverde CA, Kumar P, Duhoo Y, Dornfeld LJ, Dubreuil B, Georgeon S, Ovchinnikov S, Woolfson DN, Correia BE, Dey S, Levy ED. An atlas of protein homo-oligomerization across domains of life. Cell 2024; 187:999-1010.e15. [PMID: 38325366 DOI: 10.1016/j.cell.2024.01.022] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 06/08/2023] [Revised: 11/03/2023] [Accepted: 01/15/2024] [Indexed: 02/09/2024]
Abstract
Protein structures are essential to understanding cellular processes in molecular detail. While advances in artificial intelligence revealed the tertiary structure of proteins at scale, their quaternary structure remains mostly unknown. We devise a scalable strategy based on AlphaFold2 to predict homo-oligomeric assemblies across four proteomes spanning the tree of life. Our results suggest that approximately 45% of an archaeal proteome and a bacterial proteome and 20% of two eukaryotic proteomes form homomers. Our predictions accurately capture protein homo-oligomerization, recapitulate megadalton complexes, and unveil hundreds of homo-oligomer types, including three confirmed experimentally by structure determination. Integrating these datasets with omics information suggests that a majority of known protein complexes are symmetric. Finally, these datasets provide a structural context for interpreting disease mutations and reveal coiled-coil regions as major enablers of quaternary structure evolution in human. Our strategy is applicable to any organism and provides a comprehensive view of homo-oligomerization in proteomes.
Affiliation(s)
- Hugo Schweke
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Martin Pacesa
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Tal Levin
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Casper A Goverde
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Prasun Kumar
- School of Chemistry, University of Bristol, Bristol BS8 1TS, UK; School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK; Bristol BioDesign Institute, University of Bristol, Life Sciences Building, Bristol BS8 1TQ, UK; Max Planck-Bristol Centre for Minimal Biology, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK
| | - Yoan Duhoo
- Protein Production and Structure Characterization Core Facility (PTPSP), School of Life Sciences, École polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Lars J Dornfeld
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Benjamin Dubreuil
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Sandrine Georgeon
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, USA
| | - Derek N Woolfson
- School of Chemistry, University of Bristol, Bristol BS8 1TS, UK; School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK; Bristol BioDesign Institute, University of Bristol, Life Sciences Building, Bristol BS8 1TQ, UK; Max Planck-Bristol Centre for Minimal Biology, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK.
| | - Bruno E Correia
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland.
| | - Sucharita Dey
- Department of Bioscience and Bioengineering, Indian Institute of Technology Jodhpur, Rajasthan, India.
| | - Emmanuel D Levy
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel.
|
172
|
Wuyun Q, Chen Y, Shen Y, Cao Y, Hu G, Cui W, Gao J, Zheng W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024; 29:832. [PMID: 38398585 PMCID: PMC10893003 DOI: 10.3390/molecules29040832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 02/06/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024] Open
Abstract
The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially expedited advancements in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 has facilitated the rise of structure prediction performance to new heights, regularly competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide a comprehensive understanding and guide future research in the field of protein structure prediction for researchers, this review describes various methodologies, assessments, and databases in protein structure prediction, including traditionally used protein structure prediction methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to provide researchers with insights through which to understand the limitations, contexts, and effective selections of protein structure prediction methods in protein-related fields.
Affiliation(s)
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Yihan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China;
| | - Yifeng Shen
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-0882, Kanagawa, Japan;
| | - Yang Cao
- College of Life Sciences, Sichuan University, Chengdu 610065, China
| | - Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
| | - Wei Cui
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China;
| | - Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China;
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| |
|
173
|
Liu Z, Zhang C, Zhang Q, Zhang Y, Yu DJ. TM-search: An Efficient and Effective Tool for Protein Structure Database Search. J Chem Inf Model 2024; 64:1043-1049. [PMID: 38270339 DOI: 10.1021/acs.jcim.3c01455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
The quickly increasing size of the Protein Data Bank is challenging biologists to develop a more scalable protein structure alignment tool for fast structure database search. Although many protein structure search algorithms and programs have been designed and implemented for this purpose, most require a large amount of computational time. We propose a novel protein structure search approach, TM-search, which is based on the pairwise structure alignment program TM-align and a new iterative clustering algorithm. Benchmark tests demonstrate that TM-search is 27 times faster than a TM-align full database search while still being able to identify ∼90% of all high TM-score hits, which is 2-10 times more than other existing programs such as Foldseek, Dali, and PSI-BLAST.
Affiliation(s)
- Zi Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, Michigan 48109-2218, United States
| | - Qidi Zhang
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, Michigan 48109-2218, United States
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| |
|
174
|
Wu KE, Yang KK, van den Berg R, Alamdari S, Zou JY, Lu AX, Amini AP. Protein structure generation via folding diffusion. Nat Commun 2024; 15:1059. [PMID: 38316764 PMCID: PMC10844308 DOI: 10.1038/s41467-024-45051-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 07/18/2023] [Accepted: 01/12/2024] [Indexed: 02/07/2024] Open
Abstract
The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process. We describe a protein backbone structure as a sequence of angles capturing the relative orientation of the constituent backbone atoms, and generate structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins natively twist into energetically favorable conformations, the inherent shift and rotational invariance of this representation crucially alleviates the need for more complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release an open-source codebase and trained models for protein structure diffusion.
Affiliation(s)
- Kevin E Wu
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
| | | | | | | | - James Y Zou
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
| | - Alex X Lu
- Microsoft Research, Cambridge, MA, USA
| | | |
|
175
|
Jones RN, Miyauchi S, Roy S, Boutros N, Mayadev JS, Mell LK, Califano JA, Venuti A, Sharabi AB. Computational and AI-driven 3D structural analysis of human papillomavirus (HPV) oncoproteins E5, E6, and E7 reveal significant divergence of HPV E5 between low-risk and high-risk genotypes. Virology 2024; 590:109946. [PMID: 38147693 DOI: 10.1016/j.virol.2023.109946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 11/01/2023] [Accepted: 11/20/2023] [Indexed: 12/28/2023]
Abstract
There are over 220 identified genotypes of human papillomavirus (HPV), and the HPV genome encodes 3 major oncogenes, E5, E6, and E7. Conservation and divergence in protein sequence and function between low-risk and high-risk oncogenic HPV genotypes have not been fully characterized. Here, we used modern computational and structural folding algorithms to perform a comparative analysis of HPV E5, E6, and E7 between multiple low-risk and high-risk genotypes. We first identified significantly greater sequence divergence in E5 between low- and high-risk genotypes compared to E6 and E7. Next, we used AlphaFold to model the structure of papillomavirus proteins and complexes with high confidence, including some with no established consensus structure. We observed that HPV E5, but not E6 or E7, had a dramatically different 3D structure between low-risk and high-risk genotypes. To our knowledge, this is the first comparative analysis of HPV proteins using the AlphaFold artificial intelligence (AI) system. The marked differences in E5 sequence and structure in high-risk HPVs may contribute in important and underappreciated ways to the development of HPV-associated cancers.
Affiliation(s)
- Riley N Jones
- Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Sayuri Miyauchi
- Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Souvick Roy
- Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Nathalie Boutros
- Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Jyoti S Mayadev
- Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Loren K Mell
- Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
| | - Joseph A Califano
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA; Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, University of California, San Diego, La Jolla, CA, USA
| | - Aldo Venuti
- HPV-UNIT-UOSD Tumor Immunology and Immunotherapy, IRCCS Regina Elena National Cancer Institute, Rome, Italy
| | - Andrew B Sharabi
- Department of Radiation Medicine and Applied Sciences, University of California, San Diego, La Jolla, CA, 92037, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA.
| |
|
176
|
Zheng W, Wuyun Q, Li Y, Zhang C, Freddolino L, Zhang Y. Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. Nat Methods 2024; 21:279-289. [PMID: 38167654 PMCID: PMC10864179 DOI: 10.1038/s41592-023-02130-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 03/04/2023] [Accepted: 11/13/2023] [Indexed: 01/05/2024]
Abstract
Leveraging iterative alignment search through genomic and metagenome sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.2.2.0). Detailed data analyses show that the major advantage of DeepMSA2 lies in its balanced alignment search and effective model selection, and in the power of integrating huge metagenomics databases. These results demonstrate a new avenue to improve deep learning protein structure prediction through advanced MSA construction and provide additional evidence that optimization of input information to deep learning-based structure prediction methods must be considered with as much care as the design of the predictor itself.
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Lydia Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA.
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore.
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA.
- Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore.
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
| |
|
177
|
Giancotti R, Lomoio U, Puccio B, Tradigo G, Vizza P, Torti C, Veltri P, Guzzi PH. The Omicron XBB.1 Variant and Its Descendants: Genomic Mutations, Rapid Dissemination and Notable Characteristics. BIOLOGY 2024; 13:90. [PMID: 38392308 PMCID: PMC10886209 DOI: 10.3390/biology13020090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 01/26/2024] [Accepted: 01/30/2024] [Indexed: 02/24/2024]
Abstract
The SARS-CoV-2 virus, which is a major threat to human health, has undergone many mutations during the replication process due to errors in the replication steps and modifications in the structure of viral proteins. The XBB variant was identified for the first time in Singapore in the fall of 2022. It was then detected in other countries, including the United States, Canada, and the United Kingdom. We study the impact of sequence changes on spike protein structure on the subvariants of XBB, with particular attention to the velocity of variant diffusion and virus activity with respect to its diffusion. We examine the structural and functional distinctions of the variants in three different conformations: (i) spike glycoprotein in complex with ACE2 (1-up state), (ii) spike glycoprotein (closed-1 state), and (iii) S protein (open-1 state). We also estimate the binding affinity between the spike protein and ACE2. The marked binding affinity observed in specific variants raises questions about the efficacy of current vaccines in preparing the immune system for virus variant recognition. This work may be useful in devising strategies to manage the ongoing COVID-19 pandemic. To stay ahead of the virus evolution, further research and surveillance should be carried out to adjust public health measures accordingly.
Affiliation(s)
- Raffaele Giancotti
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
| | - Ugo Lomoio
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
| | - Barbara Puccio
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
| | | | - Patrizia Vizza
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
| | - Carlo Torti
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
| | - Pierangelo Veltri
- Department of Computer Engineering, Modelling, Electronics and System, University of Calabria, 87036 Rende, Italy
| | - Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
| |
|
178
|
Liu Y, Liu H. Protein sequence design on given backbones with deep learning. Protein Eng Des Sel 2024; 37:gzad024. [PMID: 38157313 DOI: 10.1093/protein/gzad024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 12/08/2023] [Accepted: 12/18/2023] [Indexed: 01/03/2024] Open
Abstract
Deep learning methods for protein sequence design focus on modeling and sampling the many-dimensional distribution of amino acid sequences conditioned on the backbone structure. To produce physically foldable sequences, inter-residue couplings need to be considered properly. These couplings are treated explicitly in iterative methods or autoregressive methods. Non-autoregressive models treating these couplings implicitly are computationally more efficient, but still await tests by wet experiment. Currently, sequence design methods are evaluated mainly using native sequence recovery rate and native sequence perplexity. These metrics can be complemented by sequence-structure compatibility metrics obtained from energy calculation or structure prediction. However, existing computational metrics have important limitations that may render the generalization of computational test results to performance in real applications unwarranted. Validation of design methods by wet experiments should be encouraged.
Affiliation(s)
- Yufeng Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
| | - Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China
- School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215004, China
| |
|
179
|
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. Nat Commun 2024; 15:810. [PMID: 38280868 PMCID: PMC10821953 DOI: 10.1038/s41467-024-45028-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 02/23/2023] [Accepted: 01/09/2024] [Indexed: 01/29/2024] Open
Abstract
Recent studies reveal that de novo gene origination from previously non-genic sequences is a common mechanism for gene innovation. These young genes provide an opportunity to study the structural and functional origins of proteins. Here, we combine high-quality base-level whole-genome alignments and computational structural modeling to study the origination, evolution, and protein structures of lineage-specific de novo genes. We identify 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. Sequence composition, evolutionary rates, and expression patterns indicate possible gradual functional or adaptive shifts with their gene ages. Surprisingly, we find little overall protein structural changes in candidates from the Drosophilinae lineage. We identify several candidates with potentially well-folded protein structures. Ancestral sequence reconstruction analysis reveals that most potentially well-folded candidates are often born well-folded. Single-cell RNA-seq analysis in testis shows that although most de novo gene candidates are enriched in spermatocytes, several young candidates are biased towards the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and protein structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
| | - Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA.
| |
|
180
|
Ko S, Kim J, Lim J, Lee SM, Park JY, Woo J, Scott-Nevros ZK, Kim JR, Yoon H, Kim D. Blanket antimicrobial resistance gene database with structural information, BOARDS, provides insights on historical landscape of resistance prevalence and effects of mutations in enzyme structure. mSystems 2024; 9:e0094323. [PMID: 38085058 PMCID: PMC10871167 DOI: 10.1128/msystems.00943-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 11/02/2023] [Indexed: 01/24/2024] Open
Abstract
Antimicrobial resistance (AMR) in pathogenic bacteria poses a significant threat to public health, yet there is still a need for development in the tools to deeply understand AMR genes based on genetic or structural information. In this study, we present an interactive web database named Blanket Overarching Antimicrobial-Resistance gene Database with Structural information (BOARDS, sbml.unist.ac.kr), a database that comprehensively includes 3,943 reported AMR gene information for 1,997 extended spectrum beta-lactamase (ESBL) and 1,946 other genes as well as a total of 27,395 predicted protein structures. These structures, which include both wild-type AMR genes and their mutants, were derived from 80,094 publicly available whole-genome sequences. In addition, we developed the rapid analysis and detection tool of antimicrobial-resistance (RADAR), a one-stop analysis pipeline to detect AMR genes across whole-genome sequencing (WGSs). By integrating BOARDS and RADAR, the AMR prevalence landscape for eight multi-drug resistant pathogens was reconstructed, leading to unexpected findings such as the pre-existence of the MCR genes before their official reports. Enzymatic structure prediction-based analysis revealed that the occurrence of mutations found in some ESBL genes was closely related to the binding affinities with their antibiotic substrates. Overall, BOARDS can play a significant role in performing in-depth analysis on AMR.
IMPORTANCE: While the increasing antibiotic resistance (AMR) in pathogens has been a burden on public health, effective tools for deep understanding of AMR based on genetic or structural information remain limited. In this study, a blanket overarching antimicrobial-resistance gene database with structure information (BOARDS), a web-based database that comprehensively collected AMR gene data with predictive protein structural information, was constructed. Additionally, we report the development of a RADAR pipeline that can analyze whole-genome sequences as well. BOARDS, which includes sequence and structural information, has shown the historical landscape and prevalence of the AMR genes and can provide insight into single-nucleotide polymorphism effects on antibiotic degrading enzymes within protein structures.
Affiliation(s)
- Seyoung Ko
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
| | - Jaehyung Kim
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
| | - Jaewon Lim
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
| | - Sang-Mok Lee
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
| | - Joon Young Park
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
| | - Jihoon Woo
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
| | - Zoe K. Scott-Nevros
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
| | - Jong R. Kim
- School of Engineering and Digital Sciences, Nazarbayev University, Astana, Kazakhstan
| | - Hyunjin Yoon
- Department of Molecular Science and Technology, Ajou University, Suwon, South Korea
| | - Donghyuk Kim
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
| |
|
181
|
Desai A, Mahajan V, Ramabhadran RO, Mukherjee R. Binding order of substrate and cofactor in sulfonamide monooxygenase during sulfa drug degradation: in silico studies. J Biomol Struct Dyn 2024:1-15. [PMID: 38263732 DOI: 10.1080/07391102.2024.2306495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 01/10/2024] [Indexed: 01/25/2024]
Abstract
For decades, sulfonamide antibiotics have been used across industries such as agriculture and animal husbandry. However, the use and inadvertent misuse of these antibiotics have resulted in the advent of sulfonamide-drug-resistant strains due to antibiotic pollution. Enzymatic bioremediation of antibiotics remains a potential emerging solution to combat antibiotic pollution. Here, we propose an enzymatic model for the degradation of sulfonamides by Microbacterium sp. We have employed a multi-pronged computational strategy involving protein structure modelling, ligand docking and molecular dynamics simulations to decipher a plausible binding order for the enzymatic degradation of sulfonamides by the bacterial sulfonamide monooxygenase, SulX. Our results enable us to predict that this degradation is achieved through the sequential binding of the antibiotic sulfonamide followed by the reduced flavin cofactor FMNH2, thereby laying the computational foundation for further advancements in enzyme-mediated degradation of the antibiotic. We also provide a list of experiments which may be performed to verify and follow up on our in silico studies.
Affiliation(s)
- Amogh Desai
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati, India
| | - Ved Mahajan
- Department of Chemistry, Indian Institute of Science Education and Research Tirupati, Tirupati, India
| | - Raghunath O Ramabhadran
- Department of Chemistry, Indian Institute of Science Education and Research Tirupati, Tirupati, India
| | - Raju Mukherjee
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati, India
| |
|
182
|
Zhang Z, Cai Y, Zhang B, Zheng W, Freddolino L, Zhang G, Zhou X. DEMO-EM2: assembling protein complex structures from cryo-EM maps through intertwined chain and domain fitting. Brief Bioinform 2024; 25:bbae113. [PMID: 38517699 PMCID: PMC10959074 DOI: 10.1093/bib/bbae113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 02/10/2024] [Accepted: 02/25/2024] [Indexed: 03/24/2024] Open
Abstract
The breakthrough in cryo-electron microscopy (cryo-EM) technology has led to an increasing number of density maps of biological macromolecules. However, constructing accurate protein complex atomic structures from cryo-EM maps remains a challenge. In this study, we extend our previously developed DEMO-EM to present DEMO-EM2, an automated method for constructing protein complex models from cryo-EM maps through an iterative assembly procedure intertwining chain- and domain-level matching and fitting for predicted chain models. The method was carefully evaluated on 27 cryo-electron tomography (cryo-ET) maps and 16 single-particle EM maps, where DEMO-EM2 models achieved an average TM-score of 0.92, outperforming those of state-of-the-art methods. The results demonstrate an efficient method that enables the rapid and reliable solution of challenging cryo-EM structure modeling problems.
Affiliation(s)
- Ziying Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Yaxian Cai
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Biao Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Lydia Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Xiaogen Zhou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| |
|
183
|
Bernard C, Postic G, Ghannay S, Tahi F. RNAdvisor: a comprehensive benchmarking tool for the measure and prediction of RNA structural model quality. Brief Bioinform 2024; 25:bbae064. [PMID: 38436560 PMCID: PMC10939302 DOI: 10.1093/bib/bbae064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 01/30/2024] [Accepted: 02/02/2024] [Indexed: 03/05/2024] Open
Abstract
RNA is a complex macromolecule that plays central roles in the cell. While it is well known that its structure is directly related to its functions, understanding and predicting RNA structures is challenging. Assessing the real or predictive quality of a structure is also at stake with the complex 3D possible conformations of RNAs. Metrics have been developed to measure model quality while scoring functions aim at assigning quality to guide the discrimination of structures without a known and solved reference. Throughout the years, many metrics and scoring functions have been developed, and no unique assessment is used nowadays. Each developed assessment method has its specificity and might be complementary to understanding structure quality. Therefore, to evaluate RNA 3D structure predictions, it would be important to calculate different metrics and/or scoring functions. For this purpose, we developed RNAdvisor, a comprehensive automated software that integrates and enhances the accessibility of existing metrics and scoring functions. In this paper, we present our RNAdvisor tool, as well as state-of-the-art existing metrics, scoring functions and a set of benchmarks we conducted for evaluating them. Source code is freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr.
Affiliation(s)
- Clement Bernard
- Université Paris Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
| | - Guillaume Postic
- Université Paris Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
| | - Sahar Ghannay
- LISN - CNRS/Université Paris-Saclay, France, 91400 Orsay, France
| | - Fariza Tahi
- Université Paris Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
| |
|
184
|
Thayyil Menambath D, Adiga U, Rai T, Adiga S, Shetty V. Identification of the SIRT1 gene's most harmful non-synonymous SNPs and their effects on functional and structural features-an in silico analysis. F1000Res 2024; 12:66. [PMID: 38283900 PMCID: PMC10822041 DOI: 10.12688/f1000research.128706.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/16/2024] [Indexed: 01/30/2024] Open
Abstract
Introduction: The sirtuin (Silent mating type information regulation 2 homolog) 1 (SIRT1) protein plays a vital role in many disorders such as diabetes, cancer, obesity, inflammation, and neurodegenerative and cardiovascular diseases. The objective of this in silico analysis of SIRT1's functional single nucleotide polymorphisms (SNPs) was to gain valuable insight into the harmful effects of non-synonymous SNPs (nsSNPs) on the protein. The objective of the study was to use bioinformatics methods to investigate the genetic variations and modifications that may have an impact on the SIRT1 gene's expression and function.
Methods: nsSNPs of SIRT1 protein were collected from the dbSNP site, from its three (3) different protein accession IDs. These were then fed to various bioinformatic tools such as SIFT, Provean, and I-Mutant to find the most deleterious ones. Functional and structural effects were examined using the HOPE server and I-Tasser. Gene interactions were predicted by STRING software. The SIFT, Provean, and I-Mutant tools detected the most deleterious three nsSNPs (rs769519031, rs778184510, and rs199983221).
Results: Out of 252 nsSNPs, SIFT analysis showed that 94 were deleterious, Provean listed 67 dangerous, and I-Mutant found 58 nsSNPs resulting in lowered stability of proteins. HOPE modelling of rs199983221 and rs769519031 suggested reduced hydrophobicity due to Ile4Thr and Ile223Ser resulting in decreased hydrophobic interactions. In contrast, on modelling rs778184510, the mutant protein had a higher hydrophobicity than the wild type.
Conclusions: Our study reports that three nsSNPs (D357A, I223S, I4T) are the most damaging mutations of the SIRT1 gene. Mutations may result in altered protein structure and functions. Such altered protein may be the basis for various disorders. Our findings may be a crucial guide in establishing the pathogenesis of various disorders.
Affiliation(s)
- Usha Adiga
  - Biochemistry, KS Hegde Medical Academy, NITTE (DU), Mangalore, Karnataka, 575018, India
- Tirthal Rai
  - Biochemistry, KS Hegde Medical Academy, NITTE (DU), Mangalore, Karnataka, 575018, India
- Sachidananda Adiga
  - Pharmacology, KS Hegde Medical Academy, NITTE (DU), Mangalore, Karnataka, 575018, India
- Vijith Shetty
  - Oncology, KS Hegde Medical Academy, NITTE (DU), Mangalore, Karnataka, 575018, India
|
185
|
Li J, Wang L, Zhu Z, Song C. Exploring the Alternative Conformation of a Known Protein Structure Based on Contact Map Prediction. J Chem Inf Model 2024; 64:301-315. [PMID: 38117138 PMCID: PMC10777399 DOI: 10.1021/acs.jcim.3c01381] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/29/2023] [Revised: 12/03/2023] [Accepted: 12/05/2023] [Indexed: 12/21/2023]
Abstract
The rapid development of deep learning-based methods has considerably advanced the field of protein structure prediction. The accuracy of predicting the 3D structures of simple proteins is comparable to that of experimentally determined structures, providing broad possibilities for structure-based biological studies. Another critical question is whether and how multistate structures can be predicted from a given protein sequence. In this study, analysis of tens of two-state proteins demonstrated that deep learning-based contact map predictions contain structural information on both states, which suggests that it is probably appropriate to change the target of deep learning-based protein structure prediction from one specific structure to multiple likely structures. Furthermore, by combining deep learning- and physics-based computational methods, we developed a protocol for exploring alternative conformations from a known structure of a given protein, by which we successfully approached the holo-state conformations of multiple representative proteins from their apo-state structures.
Affiliation(s)
- Jiaxuan Li
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Lei Wang
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Zefeng Zhu
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Chen Song
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
|
186
|
Zhang C, Zhang X, Freddolino L, Zhang Y. BioLiP2: an updated structure database for biologically relevant ligand-protein interactions. Nucleic Acids Res 2024; 52:D404-D412. [PMID: 37522378 PMCID: PMC10767969 DOI: 10.1093/nar/gkad630] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 06/01/2023] [Revised: 07/03/2023] [Accepted: 07/17/2023] [Indexed: 08/01/2023] Open
Abstract
With the progress of structural biology, the Protein Data Bank (PDB) has witnessed rapid accumulation of experimentally solved protein structures. Since many structures are determined with purification and crystallization additives that are unrelated to a protein's in vivo function, it is nontrivial to identify the subset of protein-ligand interactions that are biologically relevant. We developed the BioLiP2 database (https://zhanggroup.org/BioLiP) to extract biologically relevant protein-ligand interactions from the PDB database. BioLiP2 assesses the functional relevance of the ligands by geometric rules and experimental literature validations. The ligand binding information is further enriched with other function annotations, including Enzyme Commission numbers, Gene Ontology terms, catalytic sites, and binding affinities collected from other databases and a manual literature survey. Compared to its predecessor BioLiP, BioLiP2 offers significantly greater coverage of nucleic acid-protein interactions, and interactions involving large complexes that are unavailable in PDB format. BioLiP2 also integrates cutting-edge structural alignment algorithms with state-of-the-art structure prediction techniques, which for the first time enables composite protein structure and sequence-based searching and significantly enhances the usefulness of the database in structure-based function annotations. With these new developments, BioLiP2 will continue to be an important and comprehensive database for docking, virtual screening, and structure-based protein function analyses.
Affiliation(s)
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Xi Zhang
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Lydia Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Computer Science, School of Computing, National University of Singapore, 117417, Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore
|
187
|
Roy BG, Choi J, Fuchs MF. Predictive Modeling of Proteins Encoded by a Plant Virus Sheds a New Light on Their Structure and Inherent Multifunctionality. Biomolecules 2024; 14:62. [PMID: 38254661 PMCID: PMC10813169 DOI: 10.3390/biom14010062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 12/29/2023] [Accepted: 12/30/2023] [Indexed: 01/24/2024] Open
Abstract
Plant virus genomes encode proteins that are involved in replication, encapsidation, cell-to-cell, and long-distance movement, avoidance of host detection, counter-defense, and transmission from host to host, among other functions. Even though the multifunctionality of plant viral proteins is well documented, contemporary functional repertoires of individual proteins are incomplete. However, these can be enhanced by modeling tools. Here, predictive modeling of proteins encoded by the two genomic RNAs, i.e., RNA1 and RNA2, of grapevine fanleaf virus (GFLV) and their satellite RNAs by a suite of protein prediction software confirmed not only previously validated functions (suppressor of RNA silencing [VSR], viral genome-linked protein [VPg], protease [Pro], symptom determinant [Sd], homing protein [HP], movement protein [MP], coat protein [CP], and transmission determinant [Td]) and previously identified putative functions (helicase [Hel] and RNA-dependent RNA polymerase [Pol]), but also predicted novel functions with varying levels of confidence. These include a T3/T7-like RNA polymerase domain for protein 1AVSR, a short-chain reductase for protein 1BHel/VSR, a parathyroid hormone family domain for protein 1EPol/Sd, overlapping domains of unknown function and an ABC transporter domain for protein 2BMP, and DNA topoisomerase domains, transcription factor FBXO25 domain, or DNA Pol subunit cdc27 domain for the satellite RNA protein. Structural predictions for proteins 2AHP/Sd, 2BMP, and 3A? had low confidence, while predictions for proteins 1AVSR, 1BHel*/VSR, 1CVPg, 1DPro, 1EPol*/Sd, and 2CCP/Td retained higher confidence in at least one prediction. This research provided new insights into the structure and functions of GFLV proteins and their satellite protein. Future work is needed to validate these findings.
Affiliation(s)
- Brandon G. Roy
  - Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, 15 Castle Creek Drive, Geneva, NY 14456, USA; (J.C.); (M.F.F.)
|
188
|
Pantolini L, Studer G, Pereira J, Durairaj J, Tauriello G, Schwede T. Embedding-based alignment: combining protein language models with dynamic programming alignment to detect structural similarities in the twilight-zone. Bioinformatics 2024; 40:btad786. [PMID: 38175775 PMCID: PMC10792726 DOI: 10.1093/bioinformatics/btad786] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 07/10/2023] [Revised: 10/27/2023] [Accepted: 12/29/2023] [Indexed: 01/06/2024] Open
Abstract
MOTIVATION Language models are routinely used for text classification and generative tasks. Recently, the same architectures were applied to protein sequences, unlocking powerful new approaches in the bioinformatics field. Protein language models (pLMs) generate high-dimensional embeddings on a per-residue level and encode a "semantic meaning" of each individual amino acid in the context of the full protein sequence. These representations have been used as a starting point for downstream learning tasks and, more recently, for identifying distant homologous relationships between proteins. RESULTS In this work, we introduce a new method that generates embedding-based protein sequence alignments (EBA) and show how these capture structural similarities even in the twilight zone, outperforming both classical methods as well as other approaches based on pLMs. The method shows excellent accuracy despite the absence of training and parameter optimization. We demonstrate that the combination of pLMs with alignment methods is a valuable approach for the detection of relationships between proteins in the twilight-zone. AVAILABILITY AND IMPLEMENTATION The code to run EBA and reproduce the analysis described in this article is available at: https://git.scicore.unibas.ch/schwede/EBA and https://git.scicore.unibas.ch/schwede/eba_benchmark.
Affiliation(s)
- Lorenzo Pantolini
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Gabriel Studer
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Joana Pereira
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Janani Durairaj
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Gerardo Tauriello
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
|
189
|
Qu X, Du G, Hu J, Cai Y. Graph-DTI: A New Model for Drug-target Interaction Prediction Based on Heterogenous Network Graph Embedding. Curr Comput Aided Drug Des 2024; 20:1013-1024. [PMID: 37448360 DOI: 10.2174/1573409919666230713142255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 05/04/2023] [Accepted: 05/26/2023] [Indexed: 07/15/2023]
Abstract
BACKGROUND In this study, we aimed to develop a new end-to-end learning model called Graph-Drug-Target Interaction (DTI), which integrates various types of information in the heterogeneous network data, and to explore automatic learning of the topology-maintaining representations of drugs and targets, thereby effectively contributing to the prediction of DTI. Precise predictions of DTI can guide drug discovery and development. Most machine learning algorithms integrate multiple data sources and combine them with common embedding methods. However, the relationship between the drugs and target proteins is not well reported. Although some existing studies have used heterogeneous network graphs for DTI prediction, there are many limitations in the neighborhood information between the nodes in the heterogeneous network graphs. We studied the drug-drug interaction (DDI) and DTI from DrugBank Version 3.0, protein-protein interaction (PPI) from the human protein reference database Release 9, drug structure similarity from Morgan fingerprints of radius 2 and calculated by RDKit, and protein sequence similarity from Smith-Waterman score. METHODS Our study consists of three major components. First, various drugs and target proteins were integrated, and a heterogeneous network was established based on a series of data sets. Second, the graph neural networks-inspired graph auto-encoding method was used to extract high-order structural information from the heterogeneous networks, thereby revealing the description of nodes (drugs and proteins) and their topological neighbors. Finally, potential DTI prediction was made, and the obtained samples were sent to the classifier for secondary classification. RESULTS The performance of Graph-DTI and all baseline methods was evaluated using the sums of the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC). The results indicated that Graph-DTI outperformed the baseline methods in both performance results. CONCLUSION Compared with other baseline DTI prediction methods, the results showed that Graph-DTI had better prediction performance. Additionally, in this study, we effectively classified drugs corresponding to different targets and vice versa. The above findings showed that Graph-DTI provided a powerful tool for drug research, development, and repositioning. Graph-DTI can serve as a drug development and repositioning tool more effectively than previous studies that did not use heterogeneous network graph embedding.
Affiliation(s)
- Xiaohan Qu
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Guoxia Du
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Jing Hu
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Yongming Cai
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
  - Guangdong Provincial Traditional Chinese Medicine Precision Medicine Big Data Engineering Technology Research Center, Guangzhou, China
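The Graph-DTI abstract above derives protein sequence similarity from the Smith-Waterman score. As a minimal sketch of that local-alignment score, assuming a simple match/mismatch scheme with a linear gap penalty (the abstract does not state the substitution matrix or gap model actually used, and the function names here are hypothetical), the score and a commonly used self-score normalization might look like this:

```python
import math


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between sequences a and b.

    Scoring values are illustrative assumptions; real pipelines usually
    use a substitution matrix (e.g. BLOSUM) and affine gap penalties.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below 0.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def normalized_similarity(a, b):
    """Score scaled by the geometric mean of the two self-alignment scores."""
    denom = math.sqrt(smith_waterman_score(a, a) * smith_waterman_score(b, b))
    return smith_waterman_score(a, b) / denom if denom else 0.0


print(smith_waterman_score("XXACDEYY", "ZZACDEWW"))
print(normalized_similarity("ACDE", "ACDE"))
```

The normalization maps identical sequences to 1.0, which makes scores comparable across proteins of different lengths; whether Graph-DTI normalizes this way is not stated in the abstract.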
|
190
|
Zhuang L, Zhao Y, Yang L, Li L, Ye Z, Ali A, An Y, Ni R, Ali SL, Gong W. Harnessing bioinformatics for the development of a promising multi-epitope vaccine against tuberculosis: The ZL9810L vaccine. DECODING INFECTION AND TRANSMISSION 2024; 2:100026. [DOI: 10.1016/j.dcit.2024.100026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2025]
|
191
|
Sudarev VV, Gette MS, Bazhenov SV, Tilinova OM, Zinovev EV, Manukhov IV, Kuklin AI, Ryzhykau YL, Vlasov AV. Ferritin-based fusion protein shows octameric deadlock state of self-assembly. Biochem Biophys Res Commun 2024; 690:149276. [PMID: 38007906 DOI: 10.1016/j.bbrc.2023.149276] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/13/2023] [Accepted: 11/15/2023] [Indexed: 11/28/2023]
Abstract
Ferritin is a universal protein complex responsible for iron perception in almost all living organisms and has applications from fundamental biophysics to drug delivery and structure-based immunogen design. Different platforms based on ferritin share similar technological challenges limiting their development - control of self-assembling processes of ferritin itself as well as ferritin-based chimeric recombinant protein complexes. In our research, we studied self-assembly processes of ferritin-based protein complexes under different expression conditions. We fused a ferritin subunit with a SMT3 protein tag, a homolog of human Small Ubiquitin-like Modifier (SUMO-tag), which was taken to destabilize ferritin 3-fold channel contacts and increase ferritin-SUMO subunits solubility. We first obtained the octameric protein complex of ferritin-SUMO (8xFer-SUMO) and studied its structural organization by small-angle X-ray scattering (SAXS). Obtained SAXS data correspond well with the high-resolution models predicted by AlphaFold and CORAL software of an octameric assembly around the 4-fold channel of ferritin without formation of 3-fold channels. Interestingly, three copies of 8xFer-SUMO do not assemble into 24-meric globules. Thus, we first obtained and structurally characterized ferritin-based self-assembling oligomers in a deadlock state. Deadlock oligomeric states of ferritin extend the known scheme of its self-assembly process, being new potential tools for a number of applications. Finally, our results might open new directions for various biotechnological platforms utilizing ferritin-based tools.
Affiliation(s)
- V V Sudarev
  - Research Center for Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
- M S Gette
  - Research Center for Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
- S V Bazhenov
  - Research Center for Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
- O M Tilinova
  - Research Center for Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
- E V Zinovev
  - Research Center for Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
- I V Manukhov
  - Research Center for Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
- A I Kuklin
  - Research Center for Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
  - Frank Laboratory of Neutron Physics, Joint Institute for Nuclear Research, Dubna, 141980, Russian Federation
- Yu L Ryzhykau
  - Research Center for Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
  - Frank Laboratory of Neutron Physics, Joint Institute for Nuclear Research, Dubna, 141980, Russian Federation
- A V Vlasov
  - Research Center for Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
  - Frank Laboratory of Neutron Physics, Joint Institute for Nuclear Research, Dubna, 141980, Russian Federation
|
192
|
Chailyan A, Marcatili P. Structural Characterization of Peptide Antibodies. Methods Mol Biol 2024; 2821:195-204. [PMID: 38997490 DOI: 10.1007/978-1-0716-3914-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/14/2024]
Abstract
The role of proteins as very effective immunogens for the generation of antibodies is indisputable. Nevertheless, cases in which protein usage for antibody production is not feasible or convenient compelled the creation of a powerful alternative consisting of synthetic peptides. Synthetic peptides can be modified to obtain desired properties or conformation, tagged for purification, isotopically labeled for protein quantitation or conjugated to immunogens for antibody production. The antibodies that bind to these peptides represent an invaluable tool for biological research and discovery. To better understand the underlying mechanisms of antibody-antigen interaction, here, we present a pipeline developed by us to structurally classify immunoglobulin antigen binding sites and to infer key sequence residues and other variables that have a prominent role in each structural class.
Affiliation(s)
- Anna Chailyan
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Paolo Marcatili
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
|
193
|
Saeed A, Alharazi T, Alshaghdali K, Rezgui R, Elnaem I, Alreshidi BAT, Tasleem M, Saeed M. Targeting GluR3 in Depression and Alzheimer's Disease: Novel Compounds and Therapeutic Prospects. J Alzheimers Dis 2024; 97:1299-1312. [PMID: 38277291 DOI: 10.3233/jad-230821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2024]
Abstract
BACKGROUND The present study investigates the interrelated pathophysiology of depression and Alzheimer's disease (AD), with the objective of elucidating common underlying mechanisms. OBJECTIVE Our objective is to identify previously undiscovered biogenic compounds from the NuBBE database that specifically interact with GluR3. This study examines the bidirectional association between depression and AD, specifically focusing on the role of depression as a risk factor in the onset and progression of the disease. METHODS In this study, we utilize pharmacokinetics, homology modeling, and molecular docking-based virtual screening techniques to examine the GluR3 AMPA receptor subunit. RESULTS The compounds, namely ZINC000002558953, ZINC000001228056, ZINC000000187911, ZINC000003954487, and ZINC000002040988, exhibited favorable pharmacokinetic profiles and drug-like characteristics, displaying high binding affinities to the GluR3 binding pocket. CONCLUSIONS These findings suggest that targeting GluR3 could hold promise for the development of therapies for depression and AD. Further validation through in vitro, in vivo, and clinical studies is necessary to explore the potential of these compounds as lead candidates for potent and selective GluR3 inhibitors. The shared molecular mechanisms between depression and AD provide an opportunity for novel treatment approaches that address both conditions simultaneously.
Affiliation(s)
- Amir Saeed
  - Department of Medical Laboratory Sciences, College of Applied Medical Sciences, University of Hail, Hail, Saudi Arabia
  - Department of Medical Microbiology, Faculty of Medical Laboratory Sciences, University of Medical Sciences & Technology, Khartoum, Sudan
- Talal Alharazi
  - Department of Medical Laboratory Sciences, College of Applied Medical Sciences, University of Hail, Hail, Saudi Arabia
- Khalid Alshaghdali
  - Department of Medical Laboratory Sciences, College of Applied Medical Sciences, University of Hail, Hail, Saudi Arabia
- Raja Rezgui
  - Department of Medical Laboratory Sciences, College of Applied Medical Sciences, University of Hail, Hail, Saudi Arabia
- Ibtihag Elnaem
  - Department of Oral and Maxillofacial Surgery and Diagnostic Science, College of Dentistry, University of Hail, Hail, Saudi Arabia
- Munazzah Tasleem
  - School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Mohd Saeed
  - Department of Biology, College of Science, University of Hail, Hail, Saudi Arabia
|
194
|
Zhou H, Skolnick J. FRAGSITE2: A structure and fragment-based approach for virtual ligand screening. Protein Sci 2024; 33:e4869. [PMID: 38100293 PMCID: PMC10751727 DOI: 10.1002/pro.4869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 12/06/2023] [Accepted: 12/09/2023] [Indexed: 12/17/2023]
Abstract
Protein function annotation and drug discovery often involve finding small molecule binders. In the early stages of drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing. While our recent ligand homology modeling (LHM)-machine learning VLS method FRAGSITE outperformed approaches that combined traditional docking to generate protein-ligand poses and deep learning scoring functions to rank ligands, a more robust approach that could identify a more diverse set of binding ligands is needed. Here, we describe FRAGSITE2, which shows significant improvement on protein targets that lack known small molecule binders and have no confident LHM-identified template ligands when benchmarked on two commonly used VLS datasets: for both the DUD-E set and the DEKOIS2.0 set, and for ligands having a Tanimoto coefficient (TC) < 0.7 to the template ligands, the 1% enrichment factor (EF1%) of FRAGSITE2 is significantly better than that of FINDSITEcomb2.0, an earlier LHM algorithm. For the DUD-E set, FRAGSITE2 also shows a better ROC enrichment factor and AUPR (area under the precision-recall curve) than the deep learning DenseFS scoring function. Compared with RF-score-VS on the 76-target subset of DEKOIS2.0, with a TC < 0.99 to training DUD-E ligands, FRAGSITE2 has double the EF1%. Its boosted tree regression method provides more robust performance than a deep learning multiple layer perceptron method. When compared with the pretrained language model for protein target features, FRAGSITE2 also shows much better performance. Thus, FRAGSITE2 is a promising approach that can discover novel hits for protein targets. FRAGSITE2's web service is freely available to academic users at http://sites.gatech.edu/cssb/FRAGSITE2.
Affiliation(s)
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
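The FRAGSITE2 abstract above reports the 1% enrichment factor (EF1%) and applies Tanimoto coefficient (TC) cutoffs without defining either quantity. As a minimal sketch, assuming the common definitions (EF as the hit rate among the top-ranked fraction divided by the overall hit rate, and TC as intersection over union of fingerprint bit sets), they could be computed as below. The function names and the rounding used for the size of the top fraction are illustrative assumptions, not details taken from FRAGSITE2.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at the given top fraction of a ranked screen.

    scores: predicted scores, higher means more likely to bind.
    labels: 1 for a known active, 0 for a decoy, aligned with scores.
    """
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active is required")
    n_top = max(1, round(fraction * n))  # assumed rounding convention
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_actives = sum(labels[i] for i in order[:n_top])
    # (top_actives / n_top) / (total_actives / n), rearranged to one division
    return top_actives * n / (n_top * total_actives)


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two collections of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy screen: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
scores = [1000 - i for i in range(1000)]
labels = [0] * 1000
for i in list(range(5)) + list(range(500, 505)):
    labels[i] = 1
print(enrichment_factor(scores, labels, 0.01))
print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

In the toy screen, half of the top 1% are actives against a 1% base rate, so random ranking would give an EF of about 1 and a perfect ranking would give 100.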
|
195
|
Liuu S, Nepelska M, Pfister H, Gamelas Magalhaes J, Chevalier G, Strozzi F, Billerey C, Maresca M, Nicoletti C, Di Pasquale E, Pechard C, Bardouillet L, Girardin SE, Boneca IG, Doré J, Blottière HM, Bonny C, Chene L, Cultrone A. Identification of a muropeptide precursor transporter from gut microbiota and its role in preventing intestinal inflammation. Proc Natl Acad Sci U S A 2023; 120:e2306863120. [PMID: 38127978 PMCID: PMC10756304 DOI: 10.1073/pnas.2306863120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 10/31/2023] [Indexed: 12/23/2023] Open
Abstract
The gut microbiota is a considerable source of biologically active compounds that can promote intestinal homeostasis and improve immune responses. Here, we used large expression libraries of cloned metagenomic DNA to identify compounds able to sustain an anti-inflammatory reaction on host cells. Starting with a screen for NF-κB activation, we have identified overlapping clones harbouring a heterodimeric ATP-binding cassette (ABC)-transporter from a Firmicutes. Extensive purification of the clone's supernatant demonstrates that the ABC-transporter allows for the efficient extracellular accumulation of three muropeptide precursors with anti-inflammatory properties. They induce IL-10 secretion from human monocyte-derived dendritic cells and proved effective in reducing AIEC LF82 epithelial damage and IL-8 secretion in human intestinal resections. In addition, treatment with supernatants containing the muropeptide precursors reduces body weight loss and improves histological parameters in Dextran Sulfate Sodium (DSS)-treated mice. Until now, the source of peptidoglycan fragments was shown to come from the natural turnover of the peptidoglycan layer by endogenous peptidoglycan hydrolases. This is a report showing an ABC-transporter as a natural source of secreted muropeptide precursors and as an indirect player in epithelial barrier strengthening. The mechanism described here might represent an important component of the host immune homeostasis.
Affiliation(s)
- Malgorzata Nepelska
  - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), AgroParisTech, Food Microbial Ecology lab (Micalis), Université Paris-Saclay, Jouy-en-Josas 78350, France
- Marc Maresca
  - CNRS, Centrale Marseille, Institut des Sciences Moléculaires (iSm2) UMR7313, Aix Marseille Université, Marseille 13013, France
- Cendrine Nicoletti
  - CNRS, Centrale Marseille, Institut des Sciences Moléculaires (iSm2) UMR7313, Aix Marseille Université, Marseille 13013, France
- Eric Di Pasquale
  - Institut de NeuroPhysioPathologie (INP), Aix Marseille Université, UMR 7051, Marseille 13005, France
- Stephen E. Girardin
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON M5S 1A8, Canada
- Ivo Gomperts Boneca
  - Institut Pasteur, Université Paris Cité, CNRS Unité Mixte de Recherche 6047, INSERM U1306, Unité de Biologie et génétique de la paroi bactérienne, Paris 75015, France
- Joel Doré
  - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), AgroParisTech, Food Microbial Ecology lab (Micalis), Université Paris-Saclay, Jouy-en-Josas 78350, France
  - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), MetaGenoPolis, Université Paris-Saclay, Jouy-en-Josas 78350, France
- Hervé M. Blottière
  - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), AgroParisTech, Food Microbial Ecology lab (Micalis), Université Paris-Saclay, Jouy-en-Josas 78350, France
  - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), MetaGenoPolis, Université Paris-Saclay, Jouy-en-Josas 78350, France
|
196
|
Park J, Champion JA. Development of Self-Assembled Protein Nanocage Spatially Functionalized with HA Stalk as a Broadly Cross-Reactive Influenza Vaccine Platform. ACS NANO 2023; 17:25045-25060. [PMID: 38084728 PMCID: PMC10753887 DOI: 10.1021/acsnano.3c07669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 11/29/2023] [Accepted: 12/01/2023] [Indexed: 12/27/2023]
Abstract
There remains a need for the development of a universal influenza vaccine, as current seasonal influenza vaccines exhibit limited protection against mismatched, mutated, or pandemic influenza viruses. A desirable approach to developing an effective universal influenza vaccine is the incorporation of highly conserved antigens in a multivalent scaffold that enhances their immunogenicity. Here, we develop a broadly cross-reactive influenza vaccine by functionalizing self-assembled protein nanocages (SAPNs) with multiple copies of the hemagglutinin stalk on the outer surface and matrix protein 2 ectodomain on the inner surface. SAPNs were generated by engineering short coiled coils, and the design was simulated by MD GROMACS. Due to the short sequences, off-target immune responses against empty SAPN scaffolds were not seen in immunized mice. Vaccination with the multivalent SAPNs induces high levels of broadly cross-reactive antibodies of only external antigens, demonstrating tight spatial control over the designed antigen placement. This work demonstrates the use of SAPNs as a potential influenza vaccine.
Affiliation(s)
- Jaeyoung Park
  - School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, 950 Atlantic Dr. NW, Atlanta, Georgia 30332-2000, United States
- Julie A. Champion
  - School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, 950 Atlantic Dr. NW, Atlanta, Georgia 30332-2000, United States
|
197
|
Hakkennes MA, Buda F, Bonnet S. MetalDock: An Open Access Docking Tool for Easy and Reproducible Docking of Metal Complexes. J Chem Inf Model 2023; 63:7816-7825. [PMID: 38048559 PMCID: PMC10751784 DOI: 10.1021/acs.jcim.3c01582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 11/13/2023] [Accepted: 11/14/2023] [Indexed: 12/06/2023]
Abstract
Despite the proven potential of metal complexes as therapeutics, the lack of computational tools available for the high-throughput screening of their interactions with proteins is a limiting factor toward clinical developments. To address this challenge, we introduce MetalDock, an easy-to-use, open access docking software for docking metal complexes to proteins. Our tool integrates the AutoDock docking engine with three well-known quantum software packages to automate the docking of metal-organic complexes to proteins. We used a Monte Carlo sampling scheme to obtain the missing Lennard-Jones parameters for 12 metal atom types and demonstrated that these parameters generalize exceptionally well. Our results show that the poses obtained by MetalDock are highly accurate, as they predict the binding geometries experimentally determined by crystal structures with high spatial reproducibility. Three different case studies are presented that demonstrate the versatility of MetalDock for the docking of diverse metal-organic compounds to different biomacromolecules, including nucleic acids.
Affiliation(s)
- Matthijs L. A. Hakkennes
  - Leiden Institute of Chemistry, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
- Francesco Buda
  - Leiden Institute of Chemistry, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
- Sylvestre Bonnet
  - Leiden Institute of Chemistry, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
|
198
|
Ng TK, Ji J, Liu Q, Yao Y, Wang WY, Cao Y, Chen CB, Lin JW, Dong G, Cen LP, Huang C, Zhang M. Evaluation of Myocilin Variant Protein Structures Modeled by AlphaFold2. Biomolecules 2023; 14:14. [PMID: 38275755 PMCID: PMC10813463 DOI: 10.3390/biom14010014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 12/12/2023] [Accepted: 12/15/2023] [Indexed: 01/27/2024] Open
Abstract
Deep neural network-based programs can be applied to protein structure modeling by inputting amino acid sequences. Here, we aimed to evaluate the AlphaFold2-modeled myocilin wild-type and variant protein structures and compare to the experimentally determined protein structures. Molecular dynamic and ligand binding properties of the experimentally determined and AlphaFold2-modeled protein structures were also analyzed. AlphaFold2-modeled myocilin variant protein structures showed high similarities in overall structure to the experimentally determined mutant protein structures, but the orientations and geometries of amino acid side chains were slightly different. The olfactomedin-like domain of the modeled missense variant protein structures showed fewer folding changes than the nonsense variant when compared to the predicted wild-type protein structure. Differences were also observed in molecular dynamics and ligand binding sites between the AlphaFold2-modeled and experimentally determined structures as well as between the wild-type and variant structures. In summary, the folding of the AlphaFold2-modeled MYOC variant protein structures could be similar to that determined by the experiments but with differences in amino acid side chain orientations and geometries. Careful comparisons with experimentally determined structures are needed before the applications of the in silico modeled variant protein structures.
Affiliation(s)
- Tsz Kin Ng
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
  - Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, China
- Jie Ji
  - Network & Information Centre, Shantou University, Shantou 515041, China
- Qingping Liu
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
  - Key Laboratory of Carbohydrate and Lipid Metabolism Research, College of Life Science and Technology, Dalian University, Dalian 116622, China
- Yao Yao
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
  - Shantou University Medical College, Shantou 515041, China
- Wen-Ying Wang
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
  - Shantou University Medical College, Shantou 515041, China
- Yingjie Cao
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
- Chong-Bo Chen
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
- Jian-Wei Lin
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
- Geng Dong
  - Shantou University Medical College, Shantou 515041, China
- Ling-Ping Cen
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
- Chukai Huang
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
- Mingzhi Zhang
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
|
199
|
Danneskiold-Samsøe NB, Kavi D, Jude KM, Nissen SB, Wat LW, Coassolo L, Zhao M, Santana-Oikawa GA, Broido BB, Garcia KC, Svensson KJ. AlphaFold2 enables accurate deorphanization of ligands to single-pass receptors. bioRxiv 2023:2023.03.16.531341. [PMID: 36993313 PMCID: PMC10055078 DOI: 10.1101/2023.03.16.531341] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 06/19/2023]
Abstract
Secreted proteins play crucial roles in paracrine and endocrine signaling; however, identifying novel ligand-receptor interactions remains challenging. Here, we benchmarked AlphaFold as a screening approach to identify extracellular ligand-binding pairs using a structural library of single-pass transmembrane receptors. Key to the approach is the optimization of AlphaFold input and output for screening ligands against receptors to predict the most probable ligand-receptor interactions. Importantly, the predictions were performed on ligand-receptor pairs not used for AlphaFold training. We demonstrate high discriminatory power and a success rate of close to 90 % for known ligand-receptor pairs and 50 % for a diverse set of experimentally validated interactions. These results demonstrate proof-of-concept of a rapid and accurate screening platform to predict high-confidence cell-surface receptors for a diverse set of ligands by structural binding prediction, with potentially wide applicability for the understanding of cell-cell communication.
Affiliation(s)
- Niels Banhos Danneskiold-Samsøe
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Biology, University of Copenhagen, Denmark
- Deniz Kavi
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Kevin M. Jude
  - Department of Molecular and Cellular Physiology, Department of Structural Biology, and Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
- Silas Boye Nissen
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
  - The Novo Nordisk Foundation Center for Stem Cell Medicine (reNEW), University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen N, Denmark
- Lianna W. Wat
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Diabetes Research Center, Stanford University School of Medicine, Stanford, CA, USA
- Laetitia Coassolo
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Diabetes Research Center, Stanford University School of Medicine, Stanford, CA, USA
- Meng Zhao
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Diabetes Research Center, Stanford University School of Medicine, Stanford, CA, USA
- K. Christopher Garcia
  - Department of Molecular and Cellular Physiology, Department of Structural Biology, and Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
- Katrin J. Svensson
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Diabetes Research Center, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Cardiovascular Institute, Stanford University School of Medicine, CA, USA
|
200
|
Jeppesen M, André I. Accurate prediction of protein assembly structure by combining AlphaFold and symmetrical docking. Nat Commun 2023; 14:8283. [PMID: 38092742 PMCID: PMC10719378 DOI: 10.1038/s41467-023-43681-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/08/2023] [Accepted: 11/16/2023] [Indexed: 12/17/2023] Open
Abstract
AlphaFold can predict the structures of monomeric and multimeric proteins with high accuracy but has a limit on the number of chains and residues it can fold. Here we show that a combination of AlphaFold and all-atom symmetric docking simulations enables highly accurate prediction of the structure of complex symmetrical assemblies. We present a method to predict the structure of complexes with cubic (tetrahedral, octahedral and icosahedral) symmetry from sequence. Focusing on proteins where AlphaFold can make confident predictions on the subunit structure, 27 cubic systems were assembled with a median TM-score of 0.99 and a DockQ score of 0.72. Of these, 21 had TM-scores above 0.9 and were categorized as acceptable- to high-quality according to DockQ. The resulting models are energetically optimized and can be used for detailed studies of intermolecular interactions in higher-order symmetrical assemblies. The results demonstrate how explicit treatment of structural symmetry can significantly expand the size and complexity of AlphaFold predictions.
Affiliation(s)
- Mads Jeppesen
  - Department of Biochemistry and Structural Biology, Lund University, Lund, Sweden
- Ingemar André
  - Department of Biochemistry and Structural Biology, Lund University, Lund, Sweden
|