1251
Homma F, Huang J, van der Hoorn RAL. AlphaFold-Multimer predicts cross-kingdom interactions at the plant-pathogen interface. Nat Commun 2023; 14:6040. [PMID: 37758696 PMCID: PMC10533508 DOI: 10.1038/s41467-023-41721-9] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 04/04/2023] [Accepted: 09/14/2023] [Indexed: 09/29/2023]
Abstract
Adapted plant pathogens from various microbial kingdoms produce hundreds of unrelated small secreted proteins (SSPs) with elusive roles. Here, we used AlphaFold-Multimer (AFM) to screen 1879 SSPs of seven tomato pathogens for interacting with six defence-related hydrolases of tomato. This screen of 11,274 protein pairs identified 15 non-annotated SSPs that are predicted to obstruct the active site of chitinases and proteases with an intrinsic fold. Four SSPs were experimentally verified to be inhibitors of pathogenesis-related subtilase P69B, including extracellular protein-36 (Ecp36) and secreted-into-xylem-15 (Six15) of the fungal pathogens Cladosporium fulvum and Fusarium oxysporum, respectively. Together with a P69B inhibitor from the bacterial pathogen Xanthomonas perforans and Kazal-like inhibitors of the oomycete pathogen Phytophthora infestans, P69B emerges as an effector hub targeted by different microbial kingdoms, consistent with a diversification of P69B orthologs and paralogs. This study demonstrates the power of artificial intelligence to predict cross-kingdom interactions at the plant-pathogen interface.
Affiliation(s)
- Felix Homma
  - The Plant Chemetics Laboratory, Department of Biology, University of Oxford, OX1 3RB, Oxford, UK
- Jie Huang
  - The Plant Chemetics Laboratory, Department of Biology, University of Oxford, OX1 3RB, Oxford, UK
- Renier A L van der Hoorn
  - The Plant Chemetics Laboratory, Department of Biology, University of Oxford, OX1 3RB, Oxford, UK
1252
Vitali V, Ackermann K, Hagelueken G, Bode BE. Spectroscopically Orthogonal Labelling to Disentangle Site-Specific Nitroxide Label Distributions. Applied Magnetic Resonance 2023; 55:187-205. [PMID: 38357007 PMCID: PMC10861635 DOI: 10.1007/s00723-023-01611-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/30/2023] [Revised: 08/23/2023] [Accepted: 08/25/2023] [Indexed: 02/16/2024]
Abstract
Biomolecular applications of pulse dipolar electron paramagnetic resonance spectroscopy (PDS) are becoming increasingly valuable in structural biology. Site-directed spin labelling of proteins is routinely performed using nitroxides, with paramagnetic metal ions and other organic radicals gaining popularity as alternative spin centres. Spectroscopically orthogonal spin labelling using different types of labels potentially increases the information content available from a single sample. When analysing experimental distance distributions between two nitroxide spin labels, the site-specific rotamer information has been projected into the distance and is not readily available, and the contributions of individual labelling sites to the width of the distance distribution are not obvious from the PDS data. Here, we exploit the exquisite precision of labelling double-histidine (dHis) motifs with Cu(II) chelate complexes. The contribution of this label to the distance distribution widths in model protein GB1 has been shown to be negligible. By combining a dHis Cu(II) labelling site with cysteine-specific nitroxide labelling, we gather insights on the label rotamers at two distinct sites, comparing their contributions to distance distributions based on different in silico modelling approaches and structural models. From this study, it seems advisable to consider discrepancies between different in silico modelling approaches when selecting labelling sites for PDS studies. Supplementary Information: The online version contains supplementary material available at 10.1007/s00723-023-01611-1.
Affiliation(s)
- Valentina Vitali
  - EaStCHEM School of Chemistry, Biomedical Sciences Research Complex, and Centre of Magnetic Resonance, University of St Andrews, North Haugh, St Andrews, KY16 9ST Scotland
  - Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
  - Department of Chemistry “Ugo Schiff”, University of Florence, Via Della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Katrin Ackermann
  - EaStCHEM School of Chemistry, Biomedical Sciences Research Complex, and Centre of Magnetic Resonance, University of St Andrews, North Haugh, St Andrews, KY16 9ST Scotland
- Gregor Hagelueken
  - Institute of Structural Biology, Biomedical Center, University of Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Bela E. Bode
  - EaStCHEM School of Chemistry, Biomedical Sciences Research Complex, and Centre of Magnetic Resonance, University of St Andrews, North Haugh, St Andrews, KY16 9ST Scotland
1253
Thongchol J, Lill Z, Hoover Z, Zhang J. Recent Advances in Structural Studies of Single-Stranded RNA Bacteriophages. Viruses 2023; 15:1985. [PMID: 37896763 PMCID: PMC10610835 DOI: 10.3390/v15101985] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/07/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 10/29/2023]
Abstract
Positive-sense single-stranded RNA (ssRNA) bacteriophages (phages) were first isolated six decades ago. Since then, extensive research has been conducted on these ssRNA phages, particularly those infecting E. coli. With small genomes of typically 3-4 kb that usually encode four essential proteins, ssRNA phages employ a straightforward infectious cycle involving host adsorption, genome entry, genome replication, phage assembly, and host lysis. Recent advancements in metagenomics and transcriptomics have led to the identification of ~65,000 sequences from ssRNA phages, expanding our understanding of their prevalence and potential hosts. This review article illuminates significant investigations into ssRNA phages, with a focal point on their structural aspects, providing insights into the various stages of their infectious cycle.
Affiliation(s)
- Junjie Zhang
  - Center for Phage Technology, Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX 77843, USA; (J.T.); (Z.L.); (Z.H.)
1254
Song Y, Yuan Q, Zhao H, Yang Y. Accurately identifying nucleic-acid-binding sites through geometric graph learning on language model predicted structures. Brief Bioinform 2023; 24:bbad360. [PMID: 37824738 DOI: 10.1093/bib/bbad360] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/09/2023] [Revised: 09/18/2023] [Accepted: 09/18/2023] [Indexed: 10/14/2023]
Abstract
The interactions between nucleic acids and proteins are important in diverse biological processes. The high-quality prediction of nucleic-acid-binding sites continues to pose a significant challenge. Presently, the predictive efficacy of sequence-based methods is constrained by their exclusive consideration of sequence context information, whereas structure-based methods are unsuitable for proteins lacking known tertiary structures. Though protein structures predicted by AlphaFold2 could be used, the extensive computing requirement of AlphaFold2 hinders its use for genome-wide applications. Based on the recent breakthrough of ESMFold for fast prediction of protein structures, we have developed GLMSite, which accurately identifies DNA- and RNA-binding sites using geometric graph learning on ESMFold predicted structures. Here, the predicted protein structures are employed to construct protein structural graph with residues as nodes and spatially neighboring residue pairs for edges. The node representations are further enhanced through the pre-trained language model ProtTrans. The network was trained using a geometric vector perceptron, and the geometric embeddings were subsequently fed into a common network to acquire common binding characteristics. Finally, these characteristics were input into two fully connected layers to predict binding sites with DNA and RNA, respectively. Through comprehensive tests on DNA/RNA benchmark datasets, GLMSite was shown to surpass the latest sequence-based methods and be comparable with structure-based methods. Moreover, the prediction was shown useful for inferring nucleic-acid-binding proteins, demonstrating its potential for protein function discovery. The datasets, codes, and trained models are available at https://github.com/biomed-AI/nucleic-acid-binding.
Affiliation(s)
- Yidong Song
  - Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Qianmu Yuan
  - Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Huiying Zhao
  - Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yuedong Yang
  - Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
1255
Cheng J, Novati G, Pan J, Bycroft C, Žemgulytė A, Applebaum T, Pritzel A, Wong LH, Zielinski M, Sargeant T, Schneider RG, Senior AW, Jumper J, Hassabis D, Kohli P, Avsec Ž. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 2023; 381:eadg7492. [PMID: 37733863 DOI: 10.1126/science.adg7492] [Citation(s) in RCA: 694] [Impact Index Per Article: 347.0] [Received: 01/19/2023] [Accepted: 08/23/2023] [Indexed: 09/23/2023]
Abstract
The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. The average pathogenicity score of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing statistical approaches are underpowered to detect. As a resource to the community, we provide a database of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as either likely benign or likely pathogenic.
1256
Mardikoraem M, Wang Z, Pascual N, Woldring D. Generative models for protein sequence modeling: recent advances and future directions. Brief Bioinform 2023; 24:bbad358. [PMID: 37864295 PMCID: PMC10589401 DOI: 10.1093/bib/bbad358] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/13/2023] [Revised: 09/08/2023] [Accepted: 09/12/2023] [Indexed: 10/22/2023]
Abstract
The widespread adoption of high-throughput omics technologies has exponentially increased the amount of protein sequence data involved in many salient disease pathways and their respective therapeutics and diagnostics. Despite the availability of large-scale sequence data, the lack of experimental fitness annotations underpins the need for self-supervised and unsupervised machine learning (ML) methods. These techniques leverage the meaningful features encoded in abundant unlabeled sequences to accomplish complex protein engineering tasks. Proficiency in the rapidly evolving fields of protein engineering and generative AI is required to realize the full potential of ML models as a tool for protein fitness landscape navigation. Here, we support this work by (i) providing an overview of the architecture and mathematical details of the most successful ML models applicable to sequence data (e.g. variational autoencoders, autoregressive models, generative adversarial neural networks, and diffusion models), (ii) guiding how to effectively implement these models on protein sequence data to predict fitness or generate high-fitness sequences and (iii) highlighting several successful studies that implement these techniques in protein engineering (from paratope regions and subcellular localization prediction to high-fitness sequences and protein design rules generation). By providing a comprehensive survey of model details, novel architecture developments, comparisons of model applications, and current challenges, this study intends to provide structured guidance and robust framework for delivering a prospective outlook in the ML-driven protein engineering field.
Affiliation(s)
- Mehrsa Mardikoraem
  - Michigan State University (MSU)’s Department of Chemical Engineering and Materials Science
- Zirui Wang
  - Regeneron Pharmaceuticals, Inc. Having received his B.S. in Chemical Engineering from MSU, he is currently pursuing a M.S. in Computer Science from Syracuse University
- Daniel Woldring
  - MSU’s Department of Chemical Engineering and Materials Science and a member of MSU’s Institute for Quantitative Health Sciences and Engineering
1257
Dai X, Wu L, Yoo S, Liu Q. Integrating AlphaFold and deep learning for atomistic interpretation of cryo-EM maps. Brief Bioinform 2023; 24:bbad405. [PMID: 37982712 DOI: 10.1093/bib/bbad405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 10/09/2023] [Accepted: 10/23/2023] [Indexed: 11/21/2023]
Abstract
Interpretation of cryo-electron microscopy (cryo-EM) maps requires building and fitting 3D atomic models of biological molecules. AlphaFold-predicted models generate initial 3D coordinates; however, model inaccuracy and conformational heterogeneity often necessitate labor-intensive manual model building and fitting into cryo-EM maps. In this work, we designed a protein model-building workflow, which combines a deep-learning cryo-EM map feature enhancement tool, CryoFEM (Cryo-EM Feature Enhancement Model) and AlphaFold. A benchmark test using 36 cryo-EM maps shows that CryoFEM achieves state-of-the-art performance in optimizing the Fourier Shell Correlations between the maps and the ground truth models. Furthermore, in a subset of 17 datasets where the initial AlphaFold predictions are less accurate, the workflow significantly improves their model accuracy. Our work demonstrates that the integration of modern deep learning image enhancement and AlphaFold may lead to automated model building and fitting for the atomistic interpretation of cryo-EM maps.
Affiliation(s)
- Xin Dai
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, USA
- Longlong Wu
  - Condensed Matter Physics and Materials Science Department, Brookhaven National Laboratory, Upton, NY, USA
- Shinjae Yoo
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, USA
- Qun Liu
  - Biology Department, Brookhaven National Laboratory, Upton, NY, USA
1258
Lemcke S, Appeldorn JH, Wand M, Speck T. Toward a structural identification of metastable molecular conformations. J Chem Phys 2023; 159:114105. [PMID: 37712784 DOI: 10.1063/5.0164145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 08/21/2023] [Indexed: 09/16/2023]
Abstract
Interpreting high-dimensional data from molecular dynamics simulations is a persistent challenge. In this paper, we show that for a small peptide, deca-alanine, metastable states can be identified through a neural net based on structural information alone. While processing molecular dynamics data, dimensionality reduction is a necessary step that projects high-dimensional data onto a low-dimensional representation that, ideally, captures the conformational changes in the underlying data. Conventional methods make use of the temporal information contained in trajectories generated through integrating the equations of motion, which forgoes more efficient sampling schemes. We demonstrate that EncoderMap, an autoencoder architecture with an additional distance metric, can find a suitable low-dimensional representation to identify long-lived molecular conformations using exclusively structural information. For deca-alanine, which exhibits several helix-forming pathways, we show that this approach allows us to combine simulations with different biasing forces and yields representations comparable in quality to other established methods. Our results contribute to computational strategies for the rapid automatic exploration of the configuration space of peptides and proteins.
Affiliation(s)
- Simon Lemcke
  - Institut für Physik, Johannes Gutenberg-Universität Mainz, Staudingerweg 7-9, 55128 Mainz, Germany
- Jörn H Appeldorn
  - Institut für Physik, Johannes Gutenberg-Universität Mainz, Staudingerweg 7-9, 55128 Mainz, Germany
- Michael Wand
  - Institut für Informatik, Johannes Gutenberg-Universität Mainz, Staudingerweg 9, 55128 Mainz, Germany
- Thomas Speck
  - Institut für Theoretische Physik IV, Universität Stuttgart, Heisenbergstr. 3, 70569 Stuttgart, Germany
1259
Qiu Y, Wei GW. Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models. Brief Bioinform 2023; 24:bbad289. [PMID: 37580175 PMCID: PMC10516362 DOI: 10.1093/bib/bbad289] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 05/08/2023] [Revised: 07/14/2023] [Accepted: 07/26/2023] [Indexed: 08/16/2023]
Abstract
Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.
Affiliation(s)
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, 48824 MI, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, 48824 MI, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, 48824 MI, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, 48824 MI, USA
1260
Hallee L, Rafailidis N, Gleghorn JP. cdsBERT - Extending Protein Language Models with Codon Awareness. bioRxiv 2023:2023.09.15.558027. [PMID: 37745387 PMCID: PMC10516008 DOI: 10.1101/2023.09.15.558027] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 09/26/2023]
Abstract
Recent advancements in Protein Language Models (pLMs) have enabled high-throughput analysis of proteins through primary sequence alone. At the same time, newfound evidence illustrates that codon usage bias is remarkably predictive and can even change the final structure of a protein. Here, we explore these findings by extending the traditional vocabulary of pLMs from amino acids to codons to encapsulate more information inside CoDing Sequences (CDS). We build upon traditional transfer learning techniques with a novel pipeline of token embedding matrix seeding, masked language modeling, and student-teacher knowledge distillation, called MELD. This transformed the pretrained ProtBERT into cdsBERT, a pLM with a codon vocabulary trained on a massive corpus of CDS. Interestingly, cdsBERT variants produced a highly biochemically relevant latent space, outperforming their amino acid-based counterparts on enzyme commission number prediction. Further analysis revealed that synonymous codon token embeddings moved distinctly in the embedding space, showcasing unique additions of information across broad phylogeny inside these traditionally "silent" mutations. This embedding movement correlated significantly with average usage bias across phylogeny. Future fine-tuned organism-specific codon pLMs may potentially have a more significant increase in codon usage fidelity. This work enables an exciting potential in using the codon vocabulary to improve current state-of-the-art structure and function prediction that necessitates the creation of a codon pLM foundation model alongside the addition of high-quality CDS to large-scale protein databases.
Affiliation(s)
- Logan Hallee
  - Center for Bioinformatics and Computational Biology, University of Delaware
1261
Roche R, Moussad B, Shuvo MH, Tarafder S, Bhattacharya D. EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks. bioRxiv 2023:2023.09.14.557719. [PMID: 37745556 PMCID: PMC10515942 DOI: 10.1101/2023.09.14.557719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein-nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein-nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein-DNA and protein-RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.
Affiliation(s)
- Rahmatullah Roche
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Bernard Moussad
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Md Hossain Shuvo
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Sumit Tarafder
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Debswapna Bhattacharya
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
1262
Wang Y, Lv H, Lei R, Yeung YH, Shen IR, Choi D, Teo QW, Tan TJ, Gopal AB, Chen X, Graham CS, Wu NC. An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies. bioRxiv 2023:2023.09.11.557288. [PMID: 37745338 PMCID: PMC10515799 DOI: 10.1101/2023.09.11.557288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Despite decades of antibody research, it remains challenging to predict the specificity of an antibody solely based on its sequence. Two major obstacles are the lack of appropriate models and inaccessibility of datasets for model training. In this study, we curated a dataset of >5,000 influenza hemagglutinin (HA) antibodies by mining research publications and patents, which revealed many distinct sequence features between antibodies to HA head and stem domains. We then leveraged this dataset to develop a lightweight memory B cell language model (mBLM) for sequence-based antibody specificity prediction. Model explainability analysis showed that mBLM captured key sequence motifs of HA stem antibodies. Additionally, by applying mBLM to HA antibodies with unknown epitopes, we discovered and experimentally validated many HA stem antibodies. Overall, this study not only advances our molecular understanding of antibody response to influenza virus, but also provides an invaluable resource for applying deep learning to antibody research.
Affiliation(s)
- Yiquan Wang
  - Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Huibin Lv
  - Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Ruipeng Lei
  - Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Yuen-Hei Yeung
  - Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Ivana R. Shen
  - Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Danbi Choi
  - Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Qi Wen Teo
  - Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Timothy J.C. Tan
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Akshita B. Gopal
  - Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Xin Chen
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Claire S. Graham
  - Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Nicholas C. Wu
  - Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
  - Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
1263
Patsch D, Eichenberger M, Voss M, Bornscheuer UT, Buller RM. LibGENiE - A bioinformatic pipeline for the design of information-enriched enzyme libraries. Comput Struct Biotechnol J 2023; 21:4488-4496. [PMID: 37736300 PMCID: PMC10510078 DOI: 10.1016/j.csbj.2023.09.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/02/2023] [Revised: 09/13/2023] [Accepted: 09/13/2023] [Indexed: 09/23/2023]
Abstract
Enzymes are potent catalysts with high specificity and selectivity. To leverage nature's synthetic potential for industrial applications, various protein engineering techniques have emerged that allow the catalytic, biophysical, and molecular recognition properties of enzymes to be tailored. However, the many possible ways a protein can be altered force researchers to carefully balance the exhaustiveness of an enzyme screening campaign against the required resources. Consequently, the optimal engineering strategy is often defined on a case-by-case basis. Strikingly, while predicting mutations that lead to an improved target function is challenging, here we show that the prediction and exclusion of deleterious mutations is a much more straightforward task, as analyzed for an engineered carbonic acid anhydrase, a transaminase, a squalene-hopene cyclase and a Kemp eliminase. Combining such a pre-selection of allowed residues with advanced gene synthesis methods opens a path toward an efficient and generalizable library construction approach for protein engineering. To give researchers easy access to this methodology, we provide the website LibGENiE containing the bioinformatic tools for the library design workflow.
Affiliation(s)
- David Patsch
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, Institute of Chemistry and Biotechnology, Einsiedlerstrasse 31, 8820 Wädenswil, Switzerland
  - Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Felix-Hausdorff-Strasse 4, 17487 Greifswald, Germany
- Michael Eichenberger
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, Institute of Chemistry and Biotechnology, Einsiedlerstrasse 31, 8820 Wädenswil, Switzerland
- Moritz Voss
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, Institute of Chemistry and Biotechnology, Einsiedlerstrasse 31, 8820 Wädenswil, Switzerland
- Uwe T. Bornscheuer
  - Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Felix-Hausdorff-Strasse 4, 17487 Greifswald, Germany
- Rebecca M. Buller
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, Institute of Chemistry and Biotechnology, Einsiedlerstrasse 31, 8820 Wädenswil, Switzerland
1264
Huang Y, Huang HY, Chen Y, Lin YCD, Yao L, Lin T, Leng J, Chang Y, Zhang Y, Zhu Z, Ma K, Cheng YN, Lee TY, Huang HD. A Robust Drug-Target Interaction Prediction Framework with Capsule Network and Transfer Learning. Int J Mol Sci 2023; 24:14061. [PMID: 37762364 PMCID: PMC10531393 DOI: 10.3390/ijms241814061] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/27/2023] [Revised: 08/27/2023] [Accepted: 08/28/2023] [Indexed: 09/29/2023]
Abstract
Drug-target interactions (DTIs) are considered a crucial component of drug design and drug discovery. To date, many computational methods have been developed for DTI prediction, but they remain insufficiently informative for accurate prediction owing to the lack of experimentally verified negative datasets, inaccurate molecular feature representation, and ineffective DTI classifiers. We therefore address the limitations of randomly selecting negative DTI data from unknown drug-target pairs by establishing two experimentally validated datasets, and we propose a capsule network-based framework called CapBM-DTI to capture hierarchical relationships of drugs and targets. To identify drug-target interactions accurately and robustly, CapBM-DTI adopts pre-trained bidirectional encoder representations from transformers (BERT) for contextual sequence feature extraction from target proteins through transfer learning, and a message-passing neural network (MPNN) for 2-D graph feature extraction from compounds. We compared the performance of CapBM-DTI with state-of-the-art methods using four experimentally validated DTI datasets of different sizes, including human (Homo sapiens) and worm (Caenorhabditis elegans) species datasets, as well as three subsets (new compounds, new proteins, and new pairs). Our results demonstrate that the proposed model achieved robust performance and powerful generalization ability in all experiments. The case study on treating COVID-19 demonstrates the applicability of the model in virtual screening.
Affiliation(s)
- Yixian Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| | - Hsi-Yuan Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| | - Yigang Chen
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| | - Yang-Chi-Dung Lin
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| | - Lantian Yao
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| | - Tianxiu Lin
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| | - Junlin Leng
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| | - Yuan Chang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| | - Yuntian Zhang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| | - Zihao Zhu
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| | - Kun Ma
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| | - Yeong-Nan Cheng
- Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (Y.-N.C.)
| | - Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (Y.-N.C.)
| | - Hsien-Da Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.)
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
| |
|
1265
|
Arikawa K, Hosokawa M. Uncultured prokaryotic genomes in the spotlight: An examination of publicly available data from metagenomics and single-cell genomics. Comput Struct Biotechnol J 2023; 21:4508-4518. [PMID: 37771751 PMCID: PMC10523443 DOI: 10.1016/j.csbj.2023.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 09/10/2023] [Accepted: 09/10/2023] [Indexed: 09/30/2023] Open
Abstract
Owing to the ineffectiveness of traditional culture techniques for the vast majority of microbial species, culture-independent analyses utilizing next-generation sequencing and bioinformatics have become essential for gaining insight into microbial ecology and function. This mini-review focuses on two essential methods for obtaining genetic information from uncultured prokaryotes: metagenomics and single-cell genomics. We analyzed the registration status of uncultured prokaryotic genome data from major public databases and assessed the advantages and limitations of both methods. Metagenomics generates a significant quantity of sequence data and multiple prokaryotic genomes using straightforward experimental procedures. However, in ecosystems with high microbial diversity, such as soil, most genes are recovered as short, disconnected contigs, and highly conserved genes and mobile genetic elements cannot be associated with individual species genomes. Although technically more challenging, single-cell genomics offers valuable insights into complex ecosystems by providing strain-resolved genomes, addressing these issues in metagenomics. Recent technological advancements, such as long-read sequencing, machine learning algorithms, and in silico protein structure prediction, in combination with vast genomic data, have the potential to overcome the current technical challenges and facilitate a deeper understanding of uncultured microbial ecosystems and microbial dark matter genes and proteins. In light of this, it is imperative that continued innovation in both methods and technologies take place to create high-quality reference genome databases that will support future microbial research and industrial applications.
Affiliation(s)
- Koji Arikawa
- Department of Life Science and Medical Bioscience, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- bitBiome, Inc., 513 Wasedatsurumaki-cho, Shinjuku-ku, Tokyo 162-0041, Japan
| | - Masahito Hosokawa
- Department of Life Science and Medical Bioscience, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- bitBiome, Inc., 513 Wasedatsurumaki-cho, Shinjuku-ku, Tokyo 162-0041, Japan
- Research Organization for Nano and Life Innovation, Waseda University, 513 Wasedatsurumaki-cho, Shinjuku-ku, Tokyo 162-0041, Japan
- Institute for Advanced Research of Biosystem Dynamics, Waseda Research Institute for Science and Engineering, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
| |
|
1266
|
Han R, Yoon H, Kim G, Lee H, Lee Y. Revolutionizing Medicinal Chemistry: The Application of Artificial Intelligence (AI) in Early Drug Discovery. Pharmaceuticals (Basel) 2023; 16:1259. [PMID: 37765069 PMCID: PMC10537003 DOI: 10.3390/ph16091259] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 07/27/2023] [Revised: 08/24/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] Open
Abstract
Artificial intelligence (AI) has permeated various sectors, including the pharmaceutical industry and research, where it has been utilized to efficiently identify new chemical entities with desirable properties. The application of AI algorithms to drug discovery presents both remarkable opportunities and challenges. This review article focuses on the transformative role of AI in medicinal chemistry. We delve into the applications of machine learning and deep learning techniques in drug screening and design, discussing their potential to expedite the early drug discovery process. In particular, we provide a comprehensive overview of the use of AI algorithms in predicting protein structures, drug-target interactions, and molecular properties such as drug toxicity. While AI has accelerated the drug discovery process, data quality issues and technological constraints remain challenges. Nonetheless, new relationships and methods have been unveiled, demonstrating AI's expanding potential in predicting and understanding drug interactions and properties. For its full potential to be realized, interdisciplinary collaboration is essential. This review underscores AI's growing influence on the future trajectory of medicinal chemistry and stresses the importance of ongoing synergies between computational and domain experts.
Affiliation(s)
- Yoonji Lee
- College of Pharmacy, Chung-Ang University, Seoul 06974, Republic of Korea
| |
|
1267
|
Schafer JW, Porter LL. Evolutionary selection of proteins with two folds. Nat Commun 2023; 14:5478. [PMID: 37673981 PMCID: PMC10482954 DOI: 10.1038/s41467-023-41237-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 02/14/2023] [Accepted: 08/24/2023] [Indexed: 09/08/2023] Open
Abstract
Although most globular proteins fold into a single stable structure, an increasing number have been shown to remodel their secondary and tertiary structures in response to cellular stimuli. State-of-the-art algorithms predict that these fold-switching proteins adopt only one stable structure, missing their functionally critical alternative folds. Why these algorithms predict a single fold is unclear, but all of them infer protein structure from coevolved amino acid pairs. Here, we hypothesize that coevolutionary signatures are being missed. Suspecting that single-fold variants could be masking these signatures, we developed an approach, called Alternative Contact Enhancement (ACE), to search both highly diverse protein superfamilies (composed of single-fold and fold-switching variants) and protein subfamilies with more fold-switching variants. ACE successfully revealed coevolution of amino acid pairs uniquely corresponding to both conformations of 56/56 fold-switching proteins from distinct families. Then, we used ACE-derived contacts to (1) predict two experimentally consistent conformations of a candidate protein with unsolved structure and (2) develop a blind prediction pipeline for fold-switching proteins. The discovery of widespread dual-fold coevolution indicates that fold-switching sequences have been preserved by natural selection, implying that their functionalities provide evolutionary advantage and paving the way for predictions of diverse protein structures from single sequences.
Affiliation(s)
- Joseph W Schafer
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Lauren L Porter
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA.
- National Heart, Lung, and Blood Institute, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD, 20892, USA.
| |
|
1268
|
Nie M, Li H. Innovation in Cross-Linking Mass Spectrometry Workflows: Toward a Comprehensive, Flexible, and Customizable Data Analysis Platform. J Am Soc Mass Spectrom 2023; 34:1949-1956. [PMID: 37537999 DOI: 10.1021/jasms.3c00123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Cross-linking mass spectrometry (XL-MS) is widely used in the analysis of protein structure and protein-protein interactions (PPIs). Throughout the entire workflow, the use of cross-linkers and the interpretation of cross-linking data are the core steps. In recent years, the development of cross-linkers and analytical software has mostly followed the classical models of non-cleavable cross-linkers such as BS3/DSS and MS-cleavable cross-linkers such as DSSO. Although such a paradigm promotes the maturity and robustness of the XL-MS field, it confines the innovation and flexibility of new cross-linkers and analytical software. This Critical Insight discusses the classification, advantages, and disadvantages of existing data analysis search engines. Taking a new platinum-based metal cross-linker as an example, we discuss potential pitfalls in the characterization of cross-linked peptides using existing software. Finally, ideas on developing more flexible, comprehensive, and user-friendly cross-linkers and software tools are proposed.
Affiliation(s)
- Minhan Nie
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
| | - Huilin Li
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
- Guangdong Key Laboratory of Chiral Molecule and Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
| |
|
1269
|
Cotet TS, Agrafiotis A, Kreiner V, Kuhn R, Shlesinger D, Manero-Carranza M, Khodaverdi K, Kladis E, Desideri Perea A, Maassen-Veeters D, Glänzer W, Massery S, Guerci L, Hong KL, Han J, Stiklioraitis K, D’Arcy VM, Dizerens R, Kilchenmann S, Stalder L, Nissen L, Vogelsanger B, Anzböck S, Laslo D, Bakker S, Kondorosy M, Venerito M, Sanz García A, Feller I, Oxenius A, Reddy ST, Yermanos A. ePlatypus: an ecosystem for computational analysis of immunogenomics data. Bioinformatics 2023; 39:btad553. [PMID: 37682115 PMCID: PMC10518073 DOI: 10.1093/bioinformatics/btad553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Revised: 08/08/2023] [Accepted: 09/06/2023] [Indexed: 09/09/2023] Open
Abstract
MOTIVATION: The maturation of systems immunology methodologies requires novel and transparent computational frameworks capable of integrating diverse data modalities in a reproducible manner. RESULTS: Here, we present the ePlatypus computational immunology ecosystem for immunogenomics data analysis, with a focus on adaptive immune repertoires and single-cell sequencing. ePlatypus is an open-source web-based platform that provides programming tutorials and an integrative database that helps elucidate signatures of B and T cell clonal selection. Furthermore, the ecosystem links novel and established bioinformatics pipelines relevant for single-cell immune repertoires and other aspects of computational immunology such as predicting ligand-receptor interactions, structural modeling, simulations, machine learning, graph theory, pseudotime, spatial transcriptomics, and phylogenetics. The ePlatypus ecosystem helps extract deeper insight in computational immunology and immunogenomics and promotes open science. AVAILABILITY AND IMPLEMENTATION: Platypus code used in this manuscript can be found at github.com/alexyermanos/Platypus.
Affiliation(s)
- Tudor-Stefan Cotet
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Andreas Agrafiotis
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
- Institute of Microbiology, ETH Zurich, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
| | - Victor Kreiner
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Raphael Kuhn
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Danielle Shlesinger
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Marcos Manero-Carranza
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Keywan Khodaverdi
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Evgenios Kladis
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Aurora Desideri Perea
- Center for Translational Immunology, University Medical Center Utrecht, Lundlaan 6, Utrecht 3584 EA, The Netherlands
| | - Dylan Maassen-Veeters
- Center for Translational Immunology, University Medical Center Utrecht, Lundlaan 6, Utrecht 3584 EA, The Netherlands
| | - Wiona Glänzer
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Solène Massery
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Lorenzo Guerci
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Kai-Lin Hong
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Jiami Han
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Kostas Stiklioraitis
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Raphael Dizerens
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Samuel Kilchenmann
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Lucas Stalder
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Leon Nissen
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Basil Vogelsanger
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Stine Anzböck
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Daria Laslo
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Sophie Bakker
- Center for Translational Immunology, University Medical Center Utrecht, Lundlaan 6, Utrecht 3584 EA, The Netherlands
| | - Melinda Kondorosy
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Marco Venerito
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Alejandro Sanz García
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Isabelle Feller
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Annette Oxenius
- Institute of Microbiology, ETH Zurich, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
| | - Sai T Reddy
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
| | - Alexander Yermanos
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
- Institute of Microbiology, ETH Zurich, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
- Center for Translational Immunology, University Medical Center Utrecht, Lundlaan 6, Utrecht 3584 EA, The Netherlands
- Department of Pathology and Immunology, University of Geneva, 24 rue du Général-Dufour, Geneva 1211, Switzerland
| |
|
1270
|
Herrington NB, Stein D, Li YC, Pandey G, Schlessinger A. Exploring the Druggable Conformational Space of Protein Kinases Using AI-Generated Structures. bioRxiv 2023:2023.08.31.555779. [PMID: 37693436 PMCID: PMC10491245 DOI: 10.1101/2023.08.31.555779] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 09/12/2023]
Abstract
Protein kinase function and interactions with drugs are controlled in part by the movement of the DFG and αC-helix motifs, which enable kinases to adopt various conformational states. Small molecule ligands elicit therapeutic effects with distinct selectivity profiles and residence times that often depend on the kinase conformation(s) they bind. However, the limited availability of experimentally determined structural data for kinases in inactive states restricts drug discovery efforts for this major protein family. Modern AI-based structural modeling methods hold potential for exploring the previously experimentally uncharted druggable conformational space for kinases. Here, we first evaluated the currently explored conformational space of kinases in the PDB and models generated by AlphaFold2 (AF2) (1) and ESMFold (2), two prominent AI-based structure prediction methods. We then investigated AF2's ability to predict kinase structures in different conformations at various multiple sequence alignment (MSA) depths, a parameter that influences the conformational diversity explored. Our results showed a bias within the PDB and predicted structural models generated by AF2 and ESMFold toward structures of kinases in the active state over alternative conformations, particularly those conformations controlled by the DFG motif. Finally, we demonstrate that predicting kinase structures using AF2 at lower MSA depths allows the exploration of the space of these alternative conformations, including identifying previously unobserved conformations for 398 kinases. The results of our analysis of structural modeling by AF2 create a new avenue for the pursuit of new therapeutic agents against a notoriously difficult-to-target family of proteins.
Significance Statement: A greater abundance of kinase structural data in inactive conformations, currently lacking in structural databases, would improve our understanding of how protein kinases function and expand drug discovery and development for this family of therapeutic targets. Modern approaches utilizing artificial intelligence and machine learning have potential for efficiently capturing novel protein conformations. We provide evidence for a bias within AlphaFold2 and ESMFold to predict structures of kinases in their active states, similar to their overrepresentation in the PDB. We show that lowering the AlphaFold2 algorithm's multiple sequence alignment depth can help explore kinase conformational space more broadly. It can also enable the prediction of hundreds of kinase structures in novel conformations, many of whose models are likely viable for drug discovery.
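What "lowering the MSA depth" means can be illustrated by randomly subsampling an alignment while keeping the query sequence in the first row. The Python sketch below is only an illustration under that assumption; it is not the authors' procedure and does not call AlphaFold2, and the function name `subsample_msa` and the toy sequences are hypothetical.

```python
import random


def subsample_msa(sequences, depth, seed=0):
    """Return a shallower alignment with at most `depth` rows.

    sequences: list of aligned sequences; the first entry is the query.
    The query is always kept; the remaining rows are sampled at random
    without replacement.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    query, rest = sequences[0], sequences[1:]
    rng = random.Random(seed)
    n_extra = min(depth - 1, len(rest))
    return [query] + rng.sample(rest, n_extra)


# Toy alignment (made-up sequences), reduced from 5 rows to 3.
msa = ["MKVLA", "MKILA", "MRVLA", "MKVLG", "MQVLA"]
shallow = subsample_msa(msa, depth=3, seed=1)
print(shallow)
```

Each shallow alignment would then be passed to the structure predictor in place of the full one; repeating the sampling with different seeds gives the predictor different evolutionary evidence on each run.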
|
1271
|
Cohen S, Schneidman-Duhovny D. A deep learning model for predicting optimal distance range in crosslinking mass spectrometry data. Proteomics 2023; 23:e2200341. [PMID: 37070547 DOI: 10.1002/pmic.202200341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 04/02/2023] [Accepted: 04/03/2023] [Indexed: 04/19/2023]
Abstract
Macromolecular assemblies play an important role in all cellular processes. While there has recently been significant progress in protein structure prediction based on deep learning, large protein complexes cannot be predicted with these approaches. The integrative structure modeling approach characterizes multi-subunit complexes by computational integration of data from fast and accessible experimental techniques. Crosslinking mass spectrometry is one such technique that provides spatial information about the proximity of crosslinked residues. One of the challenges in interpreting crosslinking datasets is designing a scoring function that, given a structure, can quantify how well it fits the data. Most approaches set an upper bound on the distance between Cα atoms of crosslinked residues and calculate the fraction of satisfied crosslinks. However, the distance spanned by the crosslinker greatly depends on the neighborhood of the crosslinked residues. Here, we design a deep learning model for predicting the optimal distance range for a crosslinked residue pair based on the structures of their neighborhoods. We find that our model can predict the distance range with an area under the receiver operating characteristic curve of 0.86 and 0.7 for intra- and inter-protein crosslinks, respectively. Our deep scoring function can be used in a range of structure modeling applications.
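The conventional scoring scheme mentioned in this abstract, a fixed upper bound on the Cα-Cα distance followed by the fraction of satisfied crosslinks, can be sketched in a few lines of Python. This is the baseline scoring, not the deep learning model of the paper. The 30 Å default, the coordinate format, and the function name `fraction_satisfied` are assumptions chosen for illustration.

```python
import math


def fraction_satisfied(ca_coords, crosslinks, max_distance=30.0):
    """Fraction of crosslinks whose C-alpha distance is within the bound.

    ca_coords: {residue_id: (x, y, z)} C-alpha coordinates in angstroms.
    crosslinks: list of (residue_id, residue_id) pairs.
    max_distance: assumed upper bound in angstroms; the appropriate value
                  depends on the crosslinker used.
    """
    if not crosslinks:
        return 0.0
    satisfied = sum(
        1
        for a, b in crosslinks
        if math.dist(ca_coords[a], ca_coords[b]) <= max_distance
    )
    return satisfied / len(crosslinks)


# Made-up coordinates: the three pairs are 10, 40 and 30 angstroms apart.
coords = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (40.0, 0.0, 0.0)}
links = [(1, 2), (1, 3), (2, 3)]
score = fraction_satisfied(coords, links)
print(round(score, 3))
```

A single global cutoff treats every residue pair identically, which is the limitation the abstract raises when it notes that the spanned distance depends on the local neighborhood of the crosslinked residues.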
Affiliation(s)
- Shon Cohen
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Dina Schneidman-Duhovny
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| |
|
1272
|
Livesey BJ, Marsh JA. Advancing variant effect prediction using protein language models. Nat Genet 2023; 55:1426-1427. [PMID: 37563330 DOI: 10.1038/s41588-023-01470-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Affiliation(s)
- Benjamin J Livesey
- MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, UK
| | - Joseph A Marsh
- MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, UK.
| |
|
1273
|
Simpkin AJ, Caballero I, McNicholas S, Stevenson K, Jiménez E, Sánchez Rodríguez F, Fando M, Uski V, Ballard C, Chojnowski G, Lebedev A, Krissinel E, Usón I, Rigden DJ, Keegan RM. Predicted models and CCP4. Acta Crystallogr D Struct Biol 2023; 79:806-819. [PMID: 37594303 PMCID: PMC10478639 DOI: 10.1107/s2059798323006289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/12/2023] [Accepted: 07/19/2023] [Indexed: 08/19/2023] Open
Abstract
In late 2020, the results of CASP14, the 14th event in a series of competitions to assess the latest developments in computational protein structure-prediction methodology, revealed the giant leap forward that had been made by Google's DeepMind in tackling the prediction problem. The level of accuracy in their predictions was the first instance of a competitor achieving a global distance test score of better than 90 across all categories of difficulty. This achievement represents both a challenge and an opportunity for the field of experimental structural biology. For structure determination by macromolecular X-ray crystallography, access to highly accurate structure predictions is of great benefit, particularly when it comes to solving the phase problem. Here, details of new utilities and enhanced applications in the CCP4 suite, designed to allow users to exploit predicted models in determining macromolecular structures from X-ray diffraction data, are presented. The focus is mainly on applications that can be used to solve the phase problem through molecular replacement.
Affiliation(s)
- Adam J. Simpkin
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
| | - Iracema Caballero
- Crystallographic Methods, Institute of Molecular Biology of Barcelona (IBMB–CSIC), Barcelona, Spain
| | - Stuart McNicholas
- York Structural Biology Laboratory, Department of Chemistry, The University of York, York YO10 5DD, United Kingdom
| | - Kyle Stevenson
- UKRI–STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
| | - Elisabet Jiménez
- Crystallographic Methods, Institute of Molecular Biology of Barcelona (IBMB–CSIC), Barcelona, Spain
| | - Filomeno Sánchez Rodríguez
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- York Structural Biology Laboratory, Department of Chemistry, The University of York, York YO10 5DD, United Kingdom
| | - Maria Fando
- UKRI–STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
| | - Ville Uski
- UKRI–STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
| | - Charles Ballard
- UKRI–STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
| | - Grzegorz Chojnowski
- European Molecular Biology Laboratory, Hamburg Unit, Notkestrasse 85, 22607 Hamburg, Germany
| | - Andrey Lebedev
- UKRI–STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
| | - Eugene Krissinel
- UKRI–STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
| | - Isabel Usón
- Crystallographic Methods, Institute of Molecular Biology of Barcelona (IBMB–CSIC), Barcelona, Spain
- ICREA, Institució Catalana de Recerca i Estudis Avançats, Passeig Lluís Companys 23, 08003 Barcelona, Spain
| | - Daniel J. Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
| | - Ronan M. Keegan
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- UKRI–STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
| |
|
1274
|
Nordquist E, Zhang G, Barethiya S, Ji N, White KM, Han L, Jia Z, Shi J, Cui J, Chen J. Incorporating physics to overcome data scarcity in predictive modeling of protein function: A case study of BK channels. PLoS Comput Biol 2023; 19:e1011460. [PMID: 37713443 PMCID: PMC10529646 DOI: 10.1371/journal.pcbi.1011460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/24/2023] [Revised: 09/27/2023] [Accepted: 08/24/2023] [Indexed: 09/17/2023] Open
Abstract
Machine learning has played transformative roles in numerous chemical and biophysical problems such as protein folding where large amounts of data exist. Nonetheless, many important problems remain challenging for data-driven machine learning approaches due to the limitation of data scarcity. One approach to overcome data scarcity is to incorporate physical principles such as through molecular modeling and simulation. Here, we focus on the big potassium (BK) channels that play important roles in cardiovascular and neural systems. Many mutants of BK channels are associated with various neurological and cardiovascular diseases, but the molecular effects are unknown. The voltage gating properties of BK channels have been characterized for 473 site-specific mutations experimentally over the last three decades; yet, these functional data by themselves remain far too sparse to derive a predictive model of BK channel voltage gating. Using physics-based modeling, we quantify the energetic effects of all single mutations on both open and closed states of the channel. Together with dynamic properties derived from atomistic simulations, these physical descriptors allow the training of random forest models that could reproduce unseen experimentally measured shifts in gating voltage, ∆V1/2, with an RMSE ~ 32 mV and correlation coefficient of R ~ 0.7. Importantly, the model appears capable of uncovering nontrivial physical principles underlying the gating of the channel, including a central role of hydrophobic gating. The model was further evaluated using four novel mutations of L235 and V236 on the S5 helix, mutations of which are predicted to have opposing effects on V1/2 and suggest a key role of S5 in mediating voltage sensor-pore coupling. The measured ∆V1/2 agree quantitatively with prediction for all four mutations, with a high correlation of R = 0.92 and RMSE = 18 mV. Therefore, the model can capture nontrivial voltage gating properties in regions where few mutations are known. The success of predictive modeling of BK voltage gating demonstrates the potential of combining physics and statistical learning for overcoming data scarcity in nontrivial protein function prediction.
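The two agreement metrics quoted in this abstract (RMSE and the correlation coefficient R) can be illustrated with a minimal Python sketch. The voltage values below are invented placeholders rather than data from the study, the function names `rmse` and `pearson_r` are illustrative, and R is assumed here to be the Pearson correlation coefficient.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gating-voltage shifts in mV (placeholders, not study data).
measured_shift = [-40.0, -10.0, 25.0, 60.0]
predicted_shift = [-30.0, -20.0, 35.0, 50.0]

print(f"RMSE = {rmse(predicted_shift, measured_shift):.1f} mV")
print(f"R    = {pearson_r(predicted_shift, measured_shift):.2f}")
```

RMSE carries the units of the measured quantity (here mV), whereas R is dimensionless, which is why the abstract reports both.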
Affiliation(s)
- Erik Nordquist
- Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
| | - Guohui Zhang
- Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, United States of America
| | - Shrishti Barethiya
- Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
| | - Nathan Ji
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
| | - Kelli M. White
- Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, United States of America
| | - Lu Han
- Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, United States of America
| | - Zhiguang Jia
- Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
| | - Jingyi Shi
- Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, United States of America
| | - Jianmin Cui
- Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, United States of America
| | - Jianhan Chen
- Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
| |
|
1275
|
Mesdaghi S, Price RM, Madine J, Rigden DJ. Deep Learning-based structure modelling illuminates structure and function in uncharted regions of β-solenoid fold space. J Struct Biol 2023; 215:108010. [PMID: 37544372 DOI: 10.1016/j.jsb.2023.108010] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/26/2023] [Revised: 07/19/2023] [Accepted: 08/03/2023] [Indexed: 08/08/2023]
Abstract
Repeat proteins are common in all domains of life and exhibit a wide range of functions. One class of repeat protein contains solenoid folds where the repeating unit consists of β-strands separated by tight turns. β-solenoids have distinguishing structural features such as handedness, twist, oligomerisation state, coil shape and size which give rise to their diversity. Characterised β-solenoid repeat proteins are known to form regions in bacterial and viral virulence factors, antifreeze proteins and functional amyloids. For many of these proteins, the experimental structure has not been solved, as they are difficult to crystallise or model. Here we use various deep learning-based structure-modelling methods to discover novel predicted β-solenoids, perform structural database searches to mine further structural neighbours and relate their predicted structure to possible functions. We find both eukaryotic and prokaryotic adhesins, confirming a known functional linkage between adhesin function and the β-solenoid fold. We further identify exceptionally long, flat β-solenoid folds as possible structures of mucin tandem repeat regions and unprecedentedly small β-solenoid structures. Additionally, we characterise a novel β-solenoid coil shape, the FapC Greek key β-solenoid as well as plausible complexes between it and other proteins involved in Pseudomonas functional amyloid fibres.
Affiliation(s)
- Shahram Mesdaghi
- The University of Liverpool, Institute of Systems, Molecular & Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom; Computational Biology Facility, MerseyBio, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
| | - Rebecca M Price
- The University of Liverpool, Institute of Systems, Molecular & Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom
| | - Jillian Madine
- The University of Liverpool, Institute of Systems, Molecular & Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom.
| | - Daniel J Rigden
- The University of Liverpool, Institute of Systems, Molecular & Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom.
| |
|
1276
|
Porter LL. Fluid protein fold space and its implications. Bioessays 2023; 45:e2300057. [PMID: 37431685 PMCID: PMC10529699 DOI: 10.1002/bies.202300057] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/30/2023] [Revised: 06/21/2023] [Accepted: 06/23/2023] [Indexed: 07/12/2023]
Abstract
Fold-switching proteins, which remodel their secondary and tertiary structures in response to cellular stimuli, suggest a new view of protein fold space. For decades, experimental evidence has indicated that protein fold space is discrete: dissimilar folds are encoded by dissimilar amino acid sequences. Challenging this assumption, fold-switching proteins interconnect discrete groups of dissimilar protein folds, making protein fold space fluid. Three recent observations support the concept of fluid fold space: (1) some amino acid sequences interconvert between folds with distinct secondary structures, (2) some naturally occurring sequences have switched folds by stepwise mutation, and (3) fold switching is evolutionarily selected and likely confers advantage. These observations indicate that minor amino acid sequence modifications can transform protein structure and function. Consequently, proteomic structural and functional diversity may be expanded by alternative splicing, small nucleotide polymorphisms, post-translational modifications, and modified translation rates.
Affiliation(s)
- Lauren L. Porter
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
| |
|
1277
|
Ojala T, Häkkinen AE, Kankuri E, Kankainen M. Current concepts, advances, and challenges in deciphering the human microbiota with metatranscriptomics. Trends Genet 2023; 39:686-702. [PMID: 37365103 DOI: 10.1016/j.tig.2023.05.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 12/02/2022] [Revised: 05/24/2023] [Accepted: 05/25/2023] [Indexed: 06/28/2023]
Abstract
Metatranscriptomics refers to the analysis of the collective microbial transcriptome of a sample. Its increased utilization for the characterization of human-associated microbial communities has enabled the discovery of many disease-state related microbial activities. Here, we review the principles of metatranscriptomics-based analysis of human-associated microbial samples. We describe strengths and weaknesses of popular sample preparation, sequencing, and bioinformatics approaches and summarize strategies for their use. We then discuss how human-associated microbial communities have recently been examined and how their characterization may change. We conclude that metatranscriptomics insights into human microbiotas under health and disease have not only expanded our knowledge on human health, but also opened avenues for rational antimicrobial drug use and disease management.
Affiliation(s)
- Teija Ojala
- Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | | | - Esko Kankuri
- Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Matti Kankainen
- Hematology Research Unit, University of Helsinki, Helsinki, Finland; Laboratory of Genetics, HUS Diagnostic Center, Hospital District of Helsinki and Uusimaa (HUS), Helsinki, Finland.
| |
|
1278
|
Xing J, Gumerov VM, Zhulin IB. Origin and functional diversification of PAS domain, a ubiquitous intracellular sensor. Sci Adv 2023; 9:eadi4517. [PMID: 37647406 PMCID: PMC10468136 DOI: 10.1126/sciadv.adi4517] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 04/26/2023] [Accepted: 07/28/2023] [Indexed: 09/01/2023]
Abstract
Signal perception is a key function in regulating biological activities and adapting to changing environments. Per-Arnt-Sim (PAS) domains are ubiquitous sensors found in diverse receptors in bacteria, archaea, and eukaryotes, but their origins, distribution across the tree of life, and extent of their functional diversity are not fully characterized. Here, we show that using sequence conservation and structural information, it is possible to propose specific and potential functions for a large portion of nearly 3 million PAS domains. Our analysis suggests that PAS domains originated in bacteria and were horizontally transferred to archaea and eukaryotes. We reveal that gas sensing via a heme cofactor evolved independently in several lineages, whereas redox and light sensing via flavin adenine dinucleotide and flavin mononucleotide cofactors have the same origin. The close relatedness of human PAS domains to those in bacteria provides an opportunity for drug design by exploring potential natural ligands and cofactors for bacterial homologs.
Affiliation(s)
- Jiawei Xing
- Department of Microbiology, The Ohio State University, Columbus, OH, USA
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA
| | - Vadim M. Gumerov
- Department of Microbiology, The Ohio State University, Columbus, OH, USA
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA
| | - Igor B. Zhulin
- Department of Microbiology, The Ohio State University, Columbus, OH, USA
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA
| |
|
1279
|
Wu F, Wu L, Radev D, Xu J, Li SZ. Integration of pre-trained protein language models into geometric deep learning networks. Commun Biol 2023; 6:876. [PMID: 37626165 PMCID: PMC10457366 DOI: 10.1038/s42003-023-05133-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/23/2023] [Accepted: 07/11/2023] [Indexed: 08/27/2023] Open
Abstract
Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained due to the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequences have shown burgeoning capabilities with scale in a broad range of applications. Several preceding studies consider combining these different protein modalities to promote the representation power of geometric neural networks but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin and can be generalized to complex tasks.
Affiliation(s)
- Fang Wu
- AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
| | - Lirong Wu
- AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
| | - Dragomir Radev
- Department of Computer Science, Yale University, New Haven, CT, 06511, USA
| | - Jinbo Xu
- Institute of AI Industry Research, Tsinghua University, Haidian Street, 100084, Beijing, China
- Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
| | - Stan Z Li
- AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China.
| |
|
1280
|
Cohen Y, Valdés-Mas R, Elinav E. The Role of Artificial Intelligence in Deciphering Diet-Disease Relationships: Case Studies. Annu Rev Nutr 2023; 43:225-250. [PMID: 37207358 DOI: 10.1146/annurev-nutr-061121-090535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Modernization of society from a rural, hunter-gatherer setting into an urban and industrial habitat, with the associated dietary changes, has led to an increased prevalence of cardiometabolic and additional noncommunicable diseases, such as cancer, inflammatory bowel disease, and neurodegenerative and autoimmune disorders. However, while dietary sciences have been rapidly evolving to meet these challenges, validation and translation of experimental results into clinical practice remain limited for multiple reasons, including inherent ethnic, gender, and cultural interindividual variability, among other methodological, dietary reporting-related, and analytical issues. Recently, large clinical cohorts with artificial intelligence analytics have introduced new precision and personalized nutrition concepts that enable one to successfully bridge these gaps in a real-life setting. In this review, we highlight selected examples of case studies at the intersection between diet-disease research and artificial intelligence. We discuss their potential and challenges and offer an outlook toward the transformation of dietary sciences into individualized clinical translation.
Affiliation(s)
- Yotam Cohen
- Systems Immunology Department, Weizmann Institute of Science, Rehovot, Israel
| | - Rafael Valdés-Mas
- Systems Immunology Department, Weizmann Institute of Science, Rehovot, Israel
| | - Eran Elinav
- Systems Immunology Department, Weizmann Institute of Science, Rehovot, Israel
- Division of Microbiome & Cancer, National German Cancer Research Center (DKFZ), Heidelberg, Germany
| |
|
1281
|
Huang W, Yin C, Briley KP, Dalzell WAB, Fallon JT. Dynamic Evolution of SARS-CoV-2 in a Patient on Chemotherapy. Viruses 2023; 15:1759. [PMID: 37632101 PMCID: PMC10458003 DOI: 10.3390/v15081759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 08/10/2023] [Accepted: 08/16/2023] [Indexed: 08/27/2023] Open
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has evolved significantly during the pandemic and resulted in daunting numbers of genomic sequences. Tracking SARS-CoV-2 evolution during persistent cases could provide insight into the origins and dynamics of new variants. We report here a case of B-cell acute lymphocytic leukemia on chemotherapy with infection of SARS-CoV-2 for more than two months. Genomic surveillance of his serial SARS-CoV-2-positive specimens revealed two unprecedented large deletions, Δ15-26 and Δ138-145, in the viral spike protein N-terminal domain (NTD) and demonstrated their dynamic shifts in generating these new variants. Located at antigenic supersites, these large deletions are anticipated to dramatically change the spike protein NTD in three-dimensional protein structure prediction, which may lead to immune escape but reduce their viral transmissibility. In summary, we present here a new viral evolutionary trajectory in a patient on chemotherapy.
Affiliation(s)
- Weihua Huang
- Department of Pathology and Laboratory Medicine, Brody School of Medicine, East Carolina University, Greenville, NC 27834, USA
| | - Changhong Yin
- Department of Pathology and Laboratory Medicine, Brody School of Medicine, East Carolina University, Greenville, NC 27834, USA
| | - Kimberly P. Briley
- Department of Pathology and Laboratory Medicine, Brody School of Medicine, East Carolina University, Greenville, NC 27834, USA
| | - William A. B. Dalzell
- Department of Pediatrics, Brody School of Medicine, East Carolina University, Greenville, NC 27834, USA
| | - John T. Fallon
- Department of Pathology and Laboratory Medicine, Brody School of Medicine, East Carolina University, Greenville, NC 27834, USA
| |
|
1282
|
Laroussi H, Juarez-Martinez AB, Le Roy A, Boeri Erba E, Gabel F, de Massy B, Kadlec J. Characterization of the REC114-MEI4-IHO1 complex regulating meiotic DNA double-strand break formation. EMBO J 2023; 42:e113866. [PMID: 37431931 PMCID: PMC10425845 DOI: 10.15252/embj.2023113866] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/23/2023] [Revised: 06/16/2023] [Accepted: 06/23/2023] [Indexed: 07/12/2023] Open
Abstract
Meiotic recombination is initiated by the formation of DNA double-strand breaks (DSBs), essential for fertility and genetic diversity. In the mouse, DSBs are formed by the catalytic TOPOVIL complex consisting of SPO11 and TOPOVIBL. To preserve genome integrity, the activity of the TOPOVIL complex is finely controlled by several meiotic factors including REC114, MEI4, and IHO1, but the underlying mechanism is poorly understood. Here, we report that mouse REC114 forms homodimers, that it associates with MEI4 as a 2:1 heterotrimer that further dimerizes, and that IHO1 forms coiled-coil-based tetramers. Using AlphaFold2 modeling combined with biochemical characterization, we uncovered the molecular details of these assemblies. Finally, we show that IHO1 directly interacts with the PH domain of REC114 by recognizing the same surface as TOPOVIBL and another meiotic factor ANKRD31. These results provide strong evidence for the existence of a ternary IHO1-REC114-MEI4 complex and suggest that REC114 could act as a potential regulatory platform mediating mutually exclusive interactions with several partners.
Affiliation(s)
| | | | - Aline Le Roy
- Université Grenoble Alpes, CNRS, CEA, IBS, Grenoble, France
| | | | - Frank Gabel
- Université Grenoble Alpes, CNRS, CEA, IBS, Grenoble, France
| | - Bernard de Massy
- Institut de Génétique Humaine (IGH), Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France
| | - Jan Kadlec
- Université Grenoble Alpes, CNRS, CEA, IBS, Grenoble, France
| |
|
1283
|
Jagota M, Ye C, Albors C, Rastogi R, Koehl A, Ioannidis N, Song YS. Cross-protein transfer learning substantially improves disease variant prediction. Genome Biol 2023; 24:182. [PMID: 37550700 PMCID: PMC10408151 DOI: 10.1186/s13059-023-03024-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 12/06/2022] [Accepted: 07/27/2023] [Indexed: 08/09/2023] Open
Abstract
BACKGROUND Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity. RESULTS We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on clinical variant interpretation for unseen proteins across the human proteome. We also improve predictive accuracy on DMS data from held-out proteins. High sensitivity is crucial for clinical applications and our model CPT-1 particularly excels in this regime. For instance, at 95% sensitivity of detecting human disease variants annotated in ClinVar, CPT-1 improves specificity to 68%, from 27% for ESM-1v and 55% for EVE. Furthermore, for genes not used to train REVEL, a supervised method widely used by clinicians, we show that CPT-1 compares favorably with REVEL. Our framework combines predictive features derived from general protein sequence models, vertebrate sequence alignments, and AlphaFold structures, and it is adaptable to the future inclusion of other sources of information. We find that vertebrate alignments, albeit rather shallow with only 100 genomes, provide a strong signal for variant pathogenicity prediction that is complementary to recent deep learning-based models trained on massive amounts of protein sequence data. We release predictions for all possible missense variants in 90% of human genes. CONCLUSIONS Our results demonstrate the utility of mutational scanning data for learning properties of variants that transfer to unseen proteins.
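The "specificity at 95% sensitivity" figure quoted in this abstract can be made concrete with a short Python sketch. It assumes a predictor that outputs a pathogenicity score per variant, with higher scores meaning more likely pathogenic. The scores, labels, and the function name `specificity_at_sensitivity` are hypothetical and are not taken from the paper.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Return (threshold, specificity) at the strictest score threshold
    whose sensitivity is at least `target_sensitivity`.

    scores: predicted pathogenicity scores (higher = more pathogenic).
    labels: 1 for pathogenic variants, 0 for benign variants.
    A variant is called pathogenic when its score >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one pathogenic and one benign variant")

    # Walk thresholds from strictest to most lenient; the first one that
    # reaches the target sensitivity gives the highest specificity.
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in negatives) / len(negatives)
            return threshold, specificity
    raise ValueError("target sensitivity cannot be reached")


# Hypothetical toy data: four pathogenic and four benign variants.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

threshold, specificity = specificity_at_sensitivity(toy_scores, toy_labels)
print(threshold, specificity)
```

Fixing sensitivity first and then comparing specificity puts different predictors on a common footing: each must catch the same share of pathogenic variants, and the comparison is on how many benign variants it avoids flagging.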
Affiliation(s)
- Milind Jagota
- Computer Science Division, University of California, Berkeley, 94720, CA, USA
| | - Chengzhong Ye
- Department of Statistics, University of California, Berkeley, 94720, CA, USA
| | - Carlos Albors
- Computer Science Division, University of California, Berkeley, 94720, CA, USA
| | - Ruchir Rastogi
- Computer Science Division, University of California, Berkeley, 94720, CA, USA
| | - Antoine Koehl
- Department of Statistics, University of California, Berkeley, 94720, CA, USA
| | - Nilah Ioannidis
- Computer Science Division, University of California, Berkeley, 94720, CA, USA
- Chan Zuckerberg Biohub, San Francisco, 94158, CA, USA
- Center for Computational Biology, University of California, Berkeley, 94720, CA, USA
| | - Yun S Song
- Computer Science Division, University of California, Berkeley, 94720, CA, USA.
- Department of Statistics, University of California, Berkeley, 94720, CA, USA.
- Center for Computational Biology, University of California, Berkeley, 94720, CA, USA.
| |
|
1284
|
de Haas RJ, Brunette N, Goodson A, Dauparas J, Yi SY, Yang EC, Dowling Q, Nguyen H, Kang A, Bera AK, Sankaran B, de Vries R, Baker D, King NP. Rapid and automated design of two-component protein nanomaterials using ProteinMPNN. bioRxiv 2023:2023.08.04.551935. [PMID: 37577478 PMCID: PMC10418170 DOI: 10.1101/2023.08.04.551935] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 08/15/2023]
Abstract
The design of novel protein-protein interfaces using physics-based design methods such as Rosetta requires substantial computational resources and manual refinement by expert structural biologists. A new generation of deep learning methods promises to simplify protein-protein interface design and enable its application to a wide variety of problems by researchers from various scientific disciplines. Here we test the ability of a deep learning method for protein sequence design, ProteinMPNN, to design two-component tetrahedral protein nanomaterials and benchmark its performance against Rosetta. ProteinMPNN had a similar success rate to Rosetta, yielding 13 new experimentally confirmed assemblies, but required orders of magnitude less computation and no manual refinement. The interfaces designed by ProteinMPNN were substantially more polar than those designed by Rosetta, which facilitated in vitro assembly of the designed nanomaterials from independently purified components. Crystal structures of several of the assemblies confirmed the accuracy of the design method at high resolution. Our results showcase the potential of deep learning-based methods to unlock the widespread application of designed protein-protein interfaces and self-assembling protein nanomaterials in biotechnology.
|
1285
|
Retamal-Farfán I, González-Higueras J, Galaz-Davison P, Rivera M, Ramírez-Sarmiento CA. Exploring the structural acrobatics of fold-switching proteins using simplified structure-based models. Biophys Rev 2023; 15:787-799. [PMID: 37681096 PMCID: PMC10480104 DOI: 10.1007/s12551-023-01087-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/27/2023] [Accepted: 06/22/2023] [Indexed: 09/09/2023] Open
Abstract
Metamorphic proteins are a paradigm of the protein folding process, by encoding two or more native states, highly dissimilar in terms of their secondary, tertiary, and even quaternary structure, on a single amino acid sequence. Moreover, these proteins structurally interconvert between these native states in a reversible manner at biologically relevant timescales as a result of different environmental cues. The large-scale rearrangements experienced by these proteins, and their sometimes high-mass interacting partners that trigger their metamorphosis, make the computational and experimental study of their structural interconversion challenging. Here, we present our efforts in studying the refolding landscapes of two quintessential metamorphic proteins, RfaH and KaiB, using simplified dual-basin structure-based models (SBMs), rigorously footed on the energy landscape theory of protein folding and the principle of minimal frustration. By using coarse-grained models in which the native contacts and bonded interactions extracted from the available experimental structures of the two native states of RfaH and KaiB are merged into a single Hamiltonian, dual-basin SBM models can be generated and savvily calibrated to explore their fold-switch in a reversible manner in molecular dynamics simulations. We also describe how some of the insights offered by these simulations have driven the design of experiments and the validation of the conformational ensembles and refolding routes observed using these simple and computationally efficient models.
Affiliation(s)
- Ignacio Retamal-Farfán
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, 7820436 Santiago, Chile
- ANID — Millennium Science Initiative Program — Millennium Institute for Integrative Biology (iBio), Santiago, Chile
| | - Jorge González-Higueras
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, 7820436 Santiago, Chile
- ANID — Millennium Science Initiative Program — Millennium Institute for Integrative Biology (iBio), Santiago, Chile
| | - Pablo Galaz-Davison
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, 7820436 Santiago, Chile
- ANID — Millennium Science Initiative Program — Millennium Institute for Integrative Biology (iBio), Santiago, Chile
| | - Maira Rivera
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, 7820436 Santiago, Chile
- Department of Chemistry, Faculty of Science, McGill University, Montreal, Quebec H3A 0B8 Canada
| | - César A. Ramírez-Sarmiento
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, 7820436 Santiago, Chile
- ANID — Millennium Science Initiative Program — Millennium Institute for Integrative Biology (iBio), Santiago, Chile
| |
|
1286
|
Madeo G, Savojardo C, Manfredi M, Martelli PL, Casadio R. CoCoNat: a novel method based on deep learning for coiled-coil prediction. Bioinformatics 2023; 39:btad495. [PMID: 37540220 PMCID: PMC10425188 DOI: 10.1093/bioinformatics/btad495] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/11/2023] [Revised: 07/31/2023] [Accepted: 08/03/2023] [Indexed: 08/05/2023] Open
Abstract
MOTIVATION: Coiled-coil domains (CCD) are widespread in all organisms and perform several crucial functions. Given their relevance, the computational detection of CCD is very important for protein functional annotation. State-of-the-art prediction methods include the precise identification of CCD boundaries, the annotation of the typical heptad repeat pattern along the coiled-coil helices as well as the prediction of the oligomerization state. RESULTS: In this article, we describe CoCoNat, a novel method for predicting coiled-coil helix boundaries, residue-level register annotation, and oligomerization state. Our method encodes sequences with the combination of two state-of-the-art protein language models and implements a three-step deep learning procedure concatenated with a Grammatical-Restrained Hidden Conditional Random Field for CCD identification and refinement. A final neural network predicts the oligomerization state. When tested on a blind test set routinely adopted, CoCoNat obtains a performance superior to the current state-of-the-art both for residue-level and segment-level CCD. CoCoNat significantly outperforms the most recent state-of-the-art methods on register annotation and prediction of oligomerization states. AVAILABILITY AND IMPLEMENTATION: CoCoNat web server is available at https://coconat.biocomp.unibo.it. Standalone version is available on GitHub at https://github.com/BolognaBiocomp/coconat.
Affiliation(s)
- Giovanni Madeo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Matteo Manfredi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
|
1287
|
Sala D, Engelberger F, Mchaourab HS, Meiler J. Modeling conformational states of proteins with AlphaFold. Curr Opin Struct Biol 2023; 81:102645. [PMID: 37392556 DOI: 10.1016/j.sbi.2023.102645] [Citation(s) in RCA: 83] [Impact Index Per Article: 41.5] [Received: 03/21/2023] [Revised: 05/16/2023] [Accepted: 06/01/2023] [Indexed: 07/03/2023]
Abstract
Many proteins exert their function by switching among different structures. Knowing the conformational ensembles affiliated with these states is critical to elucidate key mechanistic aspects that govern protein function. While experimental determination efforts are still bottlenecked by cost, time, and technical challenges, the machine-learning technology AlphaFold showed near experimental accuracy in predicting the three-dimensional structure of monomeric proteins. However, an AlphaFold ensemble of models usually represents a single conformational state with minimal structural heterogeneity. Consequently, several pipelines have been proposed to either expand the structural breadth of an ensemble or bias the prediction toward a desired conformational state. Here, we analyze how those pipelines work, what they can and cannot predict, and future directions.
Affiliation(s)
- D Sala
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany. https://twitter.com/sala_davide
| | - F Engelberger
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany. https://twitter.com/fengel97
| | - H S Mchaourab
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA. https://twitter.com/Mchaourablab
| | - J Meiler
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany; Center for Structural Biology, Vanderbilt University, Nashville, TN 37240, USA; Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany.
|
1288
|
Roche R, Moussad B, Shuvo MH, Bhattacharya D. E(3) equivariant graph neural networks for robust and accurate protein-protein interaction site prediction. PLoS Comput Biol 2023; 19:e1011435. [PMID: 37651442 PMCID: PMC10499216 DOI: 10.1371/journal.pcbi.1011435] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/10/2023] [Revised: 09/13/2023] [Accepted: 08/15/2023] [Indexed: 09/02/2023] Open
Abstract
Artificial intelligence-powered protein structure prediction methods have led to a paradigm-shift in computational structural biology, yet contemporary approaches for predicting the interfacial residues (i.e., sites) of protein-protein interaction (PPI) still rely on experimental structures. Recent studies have demonstrated benefits of employing graph convolution for PPI site prediction, but ignore symmetries naturally occurring in 3-dimensional space and act only on experimental coordinates. Here we present EquiPPIS, an E(3) equivariant graph neural network approach for PPI site prediction. EquiPPIS employs symmetry-aware graph convolutions that transform equivariantly with translation, rotation, and reflection in 3D space, providing richer representations for molecular data compared to invariant convolutions. EquiPPIS substantially outperforms state-of-the-art approaches based on the same experimental input, and exhibits remarkable robustness by attaining better accuracy with predicted structural models from AlphaFold2 than what existing methods can achieve even with experimental structures. Freely available at https://github.com/Bhattacharya-Lab/EquiPPIS, EquiPPIS enables accurate PPI site prediction at scale.
Affiliation(s)
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Debswapna Bhattacharya
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
|
1289
|
Watson JL, Juergens D, Bennett NR, Trippe BL, Yim J, Eisenach HE, Ahern W, Borst AJ, Ragotte RJ, Milles LF, Wicky BIM, Hanikel N, Pellock SJ, Courbet A, Sheffler W, Wang J, Venkatesh P, Sappington I, Torres SV, Lauko A, De Bortoli V, Mathieu E, Ovchinnikov S, Barzilay R, Jaakkola TS, DiMaio F, Baek M, Baker D. De novo design of protein structure and function with RFdiffusion. Nature 2023; 620:1089-1100. [PMID: 37433327 PMCID: PMC10468394 DOI: 10.1038/s41586-023-06415-8] [Citation(s) in RCA: 526] [Impact Index Per Article: 263.0] [Received: 12/14/2022] [Accepted: 07/07/2023] [Indexed: 07/13/2023]
Abstract
There has been considerable recent progress in designing new proteins using deep-learning methods [1-9]. Despite this progress, a general deep-learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher-order symmetric architectures, has yet to be described. Diffusion models [10,11] have had considerable success in image and language generative modelling but limited success when applied to protein modelling, probably due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine-tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of designed symmetric assemblies, metal-binding proteins and protein binders. The accuracy of RFdiffusion is confirmed by the cryogenic electron microscopy structure of a designed binder in complex with influenza haemagglutinin that is nearly identical to the design model. In a manner analogous to networks that produce images from user-specified inputs, RFdiffusion enables the design of diverse functional proteins from simple molecular specifications.
Affiliation(s)
- Joseph L Watson
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - David Juergens
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Graduate Program in Molecular Engineering, University of Washington, Seattle, WA, USA
| | - Nathaniel R Bennett
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Graduate Program in Molecular Engineering, University of Washington, Seattle, WA, USA
| | - Brian L Trippe
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Columbia University, Department of Statistics, New York, NY, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
| | - Jason Yim
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Helen E Eisenach
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Woody Ahern
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Andrew J Borst
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Robert J Ragotte
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Lukas F Milles
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Basile I M Wicky
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Nikita Hanikel
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Samuel J Pellock
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Alexis Courbet
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- National Centre for Scientific Research, École Normale Supérieure rue d'Ulm, Paris, France
| | - William Sheffler
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Jue Wang
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Preetham Venkatesh
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA, USA
| | - Isaac Sappington
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA, USA
| | - Susana Vázquez Torres
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA, USA
| | - Anna Lauko
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA, USA
| | - Valentin De Bortoli
- National Centre for Scientific Research, École Normale Supérieure rue d'Ulm, Paris, France
| | - Emile Mathieu
- Department of Engineering, University of Cambridge, Cambridge, UK
| | - Sergey Ovchinnikov
- Faculty of Applied Sciences, Harvard University, Cambridge, MA, USA
- John Harvard Distinguished Science Fellowship, Harvard University, Cambridge, MA, USA
| | | | | | - Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Minkyung Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA.
- Institute for Protein Design, University of Washington, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
|
1290
|
Tam C, Iwasaki W. AlphaCutter: Efficient removal of non-globular regions from predicted protein structures. Proteomics 2023; 23:e2300176. [PMID: 37309722 DOI: 10.1002/pmic.202300176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 05/24/2023] [Accepted: 05/26/2023] [Indexed: 06/14/2023]
Abstract
A huge number of high-quality predicted protein structures are now publicly available. However, many of these structures contain non-globular regions, which diminish the performance of downstream structural bioinformatic applications. In this study, we develop AlphaCutter for the removal of non-globular regions from predicted protein structures. A large-scale cleaning of 542,380 predicted SwissProt structures highlights that AlphaCutter is able to (1) remove non-globular regions that are undetectable using pLDDT scores and (2) preserve high integrity of the cleaned domain regions. As useful applications, AlphaCutter improved the folding energy scores and sequence recovery rates in the re-design of domain regions. On average, AlphaCutter takes less than 3 s to clean a protein structure, enabling efficient cleaning of the exploding number of predicted protein structures. AlphaCutter is available at https://github.com/johnnytam100/AlphaCutter. AlphaCutter-cleaned SwissProt structures are available for download at https://doi.org/10.5281/zenodo.7944483.
Affiliation(s)
- Chunlai Tam
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Chiba, Japan
| | - Wataru Iwasaki
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Chiba, Japan
|
1291
|
Penzar D, Nogina D, Noskova E, Zinkevich A, Meshcheryakov G, Lando A, Rafi AM, de Boer C, Kulakovskiy IV. LegNet: a best-in-class deep learning model for short DNA regulatory regions. Bioinformatics 2023; 39:btad457. [PMID: 37490428 PMCID: PMC10400376 DOI: 10.1093/bioinformatics/btad457] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/22/2022] [Revised: 05/28/2023] [Accepted: 07/24/2023] [Indexed: 07/27/2023] Open
Abstract
MOTIVATION: The increasing volume of data from high-throughput experiments including parallel reporter assays facilitates the development of complex deep-learning approaches for modeling DNA regulatory grammar. RESULTS: Here, we introduce LegNet, an EfficientNetV2-inspired convolutional network for modeling short gene regulatory regions. By approaching the sequence-to-expression regression problem as a soft classification task, LegNet secured first place for the autosome.org team in the DREAM 2022 challenge of predicting gene expression from gigantic parallel reporter assays. Using published data, here, we demonstrate that LegNet outperforms existing models and accurately predicts gene expression per se as well as the effects of single-nucleotide variants. Furthermore, we show how LegNet can be used in a diffusion network manner for the rational design of promoter sequences yielding the desired expression level. AVAILABILITY AND IMPLEMENTATION: https://github.com/autosome-ru/LegNet. The GitHub repository includes Jupyter Notebook tutorials and Python scripts under the MIT license to reproduce the results presented in the study.
Affiliation(s)
- Dmitry Penzar
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Protein Research, Pushchino 142290, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
| | - Daria Nogina
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow 119991, Russia
| | - Elizaveta Noskova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow 119991, Russia
| | - Arsenii Zinkevich
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow 119991, Russia
| | | | | | - Abdul Muntakim Rafi
- School of Biomedical Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Carl de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Protein Research, Pushchino 142290, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan 420008, Russia
|
1292
|
Guzmán-Vega FJ, González-Álvarez AC, Peña-Guerra KA, Cardona-Londoño KJ, Arold ST. Leveraging AI Advances and Online Tools for Structure-Based Variant Analysis. Curr Protoc 2023; 3:e857. [PMID: 37540795 DOI: 10.1002/cpz1.857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/06/2023]
Abstract
Understanding how a gene variant affects protein function is important in life science, as it helps explain traits or dysfunctions in organisms. In a clinical setting, this understanding makes it possible to improve and personalize patient care. Bioinformatic tools often only assign a pathogenicity score, rather than providing information about the molecular basis for phenotypes. Experimental testing can furnish this information, but this is slow and costly and requires expertise and equipment not available in a clinical setting. Conversely, mapping a gene variant onto the three-dimensional (3D) protein structure provides a fast molecular assessment free of charge. Before 2021, this type of analysis was severely limited by the availability of experimentally determined 3D protein structures. Advances in artificial intelligence algorithms now allow confident prediction of protein structural features from sequence alone. The aim of the protocols presented here is to enable non-experts to use databases and online tools to investigate the molecular effect of a genetic variant. The Basic Protocol relies only on the online resources AlphaFold Protein Structure Database and UniProt. Alternate Protocols document the usage of the Protein Data Bank, SWISS-MODEL, ColabFold, and PyMOL for structure-based variant analysis. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol: 3D Mapping based on UniProt and AlphaFold; Alternate Protocol 1: Using experimental models from the PDB; Alternate Protocol 2: Using information from homology modeling with SWISS-MODEL; Alternate Protocol 3: Predicting 3D structures with ColabFold; Alternate Protocol 4: Structure visualization and analysis with PyMOL.
Affiliation(s)
- Francisco J Guzmán-Vega
- Bioscience Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, KAUST, Thuwal, Saudi Arabia
| | - Ana C González-Álvarez
- Bioengineering Program, Biological and Environmental Science and Engineering Division, KAUST, Thuwal, Saudi Arabia
- Computational Bioscience Research Center, KAUST, Thuwal, Saudi Arabia
| | - Karla A Peña-Guerra
- Bioengineering Program, Biological and Environmental Science and Engineering Division, KAUST, Thuwal, Saudi Arabia
- Computational Bioscience Research Center, KAUST, Thuwal, Saudi Arabia
| | - Kelly J Cardona-Londoño
- Bioengineering Program, Biological and Environmental Science and Engineering Division, KAUST, Thuwal, Saudi Arabia
- Computational Bioscience Research Center, KAUST, Thuwal, Saudi Arabia
| | - Stefan T Arold
- Bioscience Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Bioengineering Program, Biological and Environmental Science and Engineering Division, KAUST, Thuwal, Saudi Arabia
- Computational Bioscience Research Center, KAUST, Thuwal, Saudi Arabia
- Centre de Biologie Structurale (CBS), INSERM, CNRS, Université de Montpellier, Montpellier, France
|
1293
|
Yang A, Jude KM, Lai B, Minot M, Kocyla AM, Glassman CR, Nishimiya D, Kim YS, Reddy ST, Khan AA, Garcia KC. Deploying synthetic coevolution and machine learning to engineer protein-protein interactions. Science 2023; 381:eadh1720. [PMID: 37499032 PMCID: PMC10403280 DOI: 10.1126/science.adh1720] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 02/14/2023] [Accepted: 06/16/2023] [Indexed: 07/29/2023]
Abstract
Fine-tuning of protein-protein interactions occurs naturally through coevolution, but this process is difficult to recapitulate in the laboratory. We describe a platform for synthetic protein-protein coevolution that can isolate matched pairs of interacting muteins from complex libraries. This large dataset of coevolved complexes drove a systems-level analysis of molecular recognition between Z domain-affibody pairs spanning a wide range of structures, affinities, cross-reactivities, and orthogonalities, and captured a broad spectrum of coevolutionary networks. Furthermore, we harnessed pretrained protein language models to expand, in silico, the amino acid diversity of our coevolution screen, predicting remodeled interfaces beyond the reach of the experimental library. The integration of these approaches provides a means of simulating protein coevolution and generating protein complexes with diverse molecular recognition properties for biotechnology and synthetic biology.
Affiliation(s)
- Aerin Yang
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Kevin M. Jude
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Ben Lai
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - Mason Minot
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
| | - Anna M. Kocyla
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Caleb R. Glassman
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Daisuke Nishimiya
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Yoon Seok Kim
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Sai T. Reddy
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
| | - Aly A. Khan
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Departments of Pathology, and Family Medicine, University of Chicago, Chicago, IL 60637, USA
| | - K. Christopher Garcia
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
|
1294
|
Stiefel J, Zimmer J, Schloßhauer JL, Vosen A, Kilz S, Balakin S. Just Keep Rolling?-An Encompassing Review towards Accelerated Vaccine Product Life Cycles. Vaccines (Basel) 2023; 11:1287. [PMID: 37631855 PMCID: PMC10459022 DOI: 10.3390/vaccines11081287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 07/20/2023] [Accepted: 07/24/2023] [Indexed: 08/27/2023] Open
Abstract
In light of the recent pandemic, several COVID-19 vaccines were developed, tested and approved in a very short time, a process that otherwise takes many years. Above all, these efforts have also unmistakably revealed the capacity limits and potential for improvement in vaccine production. This review aims to emphasize recent approaches for the targeted rapid adaptation and production of vaccines from an interdisciplinary, multifaceted perspective. Using research from the literature, stakeholder analysis and a value proposition canvas, we reviewed technological innovations on the pharmacological level, formulation, validation and resilient vaccine production to supply bottlenecks and logistic networks. We identified four main drivers to accelerate the vaccine product life cycle: computerized candidate screening, modular production, digitized quality management and a resilient business model with corresponding transparent supply chains. In summary, the results presented here can serve as a guide and implementation tool for flexible, scalable vaccine production to swiftly respond to pandemic situations in the future.
Affiliation(s)
- Janis Stiefel
- Fraunhofer Institute for Microengineering and Microsystems IMM, Carl-Zeiss-Straße 18-20, 55129 Mainz, Germany
| | - Jan Zimmer
- Fraunhofer Institute for Microengineering and Microsystems IMM, Carl-Zeiss-Straße 18-20, 55129 Mainz, Germany
| | - Jeffrey L. Schloßhauer
- Fraunhofer Institute for Cell Therapy and Immunology, Branch Bioanalytics and Bioprocesses IZI-BB, Am Mühlenberg 13, 14476 Potsdam, Germany
| | - Agnes Vosen
- Fraunhofer Center for International Management and Knowledge Economy IMW, Neumarkt 20, 04109 Leipzig, Germany
| | - Sarah Kilz
- Fraunhofer Center for International Management and Knowledge Economy IMW, Neumarkt 20, 04109 Leipzig, Germany
| | - Sascha Balakin
- Fraunhofer Institute for Ceramic Technologies and Systems IKTS Material Diagnostics, Bio- and Nanotechnology, Maria-Reiche-Straße 2, 01109 Dresden, Germany
- Max Bergmann Center of Biomaterials (MBC), Technical University of Dresden, Budapester Strasse 27, 01069 Dresden, Germany
|
1295
|
Liu ZH, Teixeira JM, Zhang O, Tsangaris TE, Li J, Gradinaru CC, Head-Gordon T, Forman-Kay JD. Local Disordered Region Sampling (LDRS) for Ensemble Modeling of Proteins with Experimentally Undetermined or Low Confidence Prediction Segments. bioRxiv 2023:2023.07.25.550520. [PMID: 37546943 PMCID: PMC10402175 DOI: 10.1101/2023.07.25.550520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
The Local Disordered Region Sampling (LDRS, pronounced "loaders") tool, developed for the IDPConformerGenerator platform (Teixeira et al. 2022), provides a method for generating all-atom conformations of intrinsically disordered regions (IDRs) at the N- and C-termini of, and in loops or linkers between, folded regions of an existing protein structure. These disordered elements often lead to missing coordinates in experimental structures or low confidence in predicted structures. Requiring only a pre-existing PDB structure of the protein with missing coordinates or with predicted confidence scores and its full-length primary sequence, LDRS will automatically generate physically meaningful conformational ensembles of the missing flexible regions to complete the full-length protein. The capabilities of the LDRS tool of IDPConformerGenerator include modeling phosphorylation sites using enhanced Monte Carlo Side Chain Entropy (MC-SCE) (Bhowmick and Head-Gordon 2015), transmembrane proteins within an all-atom bilayer, and multi-chain complexes. The modeling capacity of LDRS capitalizes on the modularity, ability to be used as a library and via command-line, and computational speed of the IDPConformerGenerator platform.
Affiliation(s)
- Zi Hao Liu
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - João M.C. Teixeira
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
| | - Oufan Zhang
- Pitzer Center for Theoretical Chemistry, University of California, Berkeley, California 94720, United States of America
- Department of Chemistry, University of California, Berkeley, California 94720-1460 United States of America
| | - Thomas E. Tsangaris
- Department of Physics, University of Toronto, Toronto, Ontario M5S 1A7, Canada
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6, Canada
| | - Jie Li
- Pitzer Center for Theoretical Chemistry, University of California, Berkeley, California 94720, United States of America
- Department of Chemistry, University of California, Berkeley, California 94720-1460 United States of America
| | - Claudiu C. Gradinaru
- Department of Physics, University of Toronto, Toronto, Ontario M5S 1A7, Canada
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6, Canada
| | - Teresa Head-Gordon
- Pitzer Center for Theoretical Chemistry, University of California, Berkeley, California 94720, United States of America
- Department of Chemistry, University of California, Berkeley, California 94720-1460 United States of America
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720-1462, United States of America
- Department of Bioengineering, University of California, Berkeley, California 94720-1762, United States of America
| | - Julie D. Forman-Kay
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
|
1296
|
Habeck M. Bayesian methods in integrative structure modeling. Biol Chem 2023; 404:741-754. [PMID: 37505205 DOI: 10.1515/hsz-2023-0145] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/27/2023] [Accepted: 07/07/2023] [Indexed: 07/29/2023]
Abstract
There is a growing interest in characterizing the structure and dynamics of large biomolecular assemblies and their interactions within the cellular environment. A diverse array of experimental techniques allows us to study biomolecular systems on a variety of length and time scales. These techniques range from imaging with light, X-rays or electrons, to spectroscopic methods, cross-linking mass spectrometry and functional genomics approaches, and are complemented by AI-assisted protein structure prediction methods. A challenge is to integrate all of these data into a model of the system and its functional dynamics. This review focuses on Bayesian approaches to integrative structure modeling. We sketch the principles of Bayesian inference, highlight recent applications to integrative modeling and conclude with a discussion of current challenges and future perspectives.
Affiliation(s)
- Michael Habeck
- Microscopic Image Analysis Group, Jena University Hospital, D-07743 Jena, Germany
- Max Planck Institute for Multidisciplinary Sciences, D-37077 Göttingen, Germany
| |
|
1297
|
Xu J, Li F, Li C, Guo X, Landersdorfer C, Shen HH, Peleg AY, Li J, Imoto S, Yao J, Akutsu T, Song J. iAMPCN: a deep-learning approach for identifying antimicrobial peptides and their functional activities. Brief Bioinform 2023; 24:bbad240. [PMID: 37369638 PMCID: PMC10359087 DOI: 10.1093/bib/bbad240] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 10/26/2022] [Revised: 05/30/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023] Open
Abstract
Antimicrobial peptides (AMPs) are short peptides that play crucial roles in diverse biological processes and have various functional activities against target organisms. Due to the abuse of chemical antibiotics and microbial pathogens' increasing resistance to antibiotics, AMPs have the potential to be alternatives to antibiotics. As such, the identification of AMPs has become a widely discussed topic. A variety of computational approaches have been developed to identify AMPs based on machine learning algorithms. However, most of them are not capable of predicting the functional activities of AMPs, and those predictors that can specify activities only focus on a few of them. In this study, we first surveyed 10 predictors that can identify AMPs and their functional activities in terms of the features they employed and the algorithms they utilized. Then, we constructed comprehensive AMP datasets and proposed a new deep learning-based framework, iAMPCN (identification of AMPs based on CNNs), to identify AMPs and 22 related functional activities. Our experiments demonstrate that iAMPCN significantly improved the prediction performance of AMPs and their corresponding functional activities based on four types of sequence features. Benchmarking experiments on the independent test datasets showed that iAMPCN outperformed a number of state-of-the-art approaches for predicting AMPs and their functional activities. Furthermore, we analyzed the amino acid preferences of different AMP activities and evaluated the model on datasets of varying sequence redundancy thresholds. To facilitate the community-wide identification of AMPs and their corresponding functional types, we have made the source code of iAMPCN publicly available at https://github.com/joy50706/iAMPCN/tree/master. We anticipate that iAMPCN can be explored as a valuable tool for identifying potential AMPs with specific functional activities for further experimental validation.
Affiliation(s)
- Jing Xu
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Fuyi Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC 3800, Australia
| | - Chen Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Xudong Guo
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
| | - Cornelia Landersdorfer
- Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, VIC 3800, Australia
| | - Hsin-Hui Shen
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Department of Materials Science and Engineering, Faculty of Engineering, Monash University, Clayton, VIC, 3800, Australia
| | - Anton Y Peleg
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Department of Infectious Diseases, Alfred Hospital, Alfred Health, Melbourne, Victoria, Australia
| | - Jian Li
- Monash Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
| | - Seiya Imoto
- Division of Health Medical Intelligence, Human Genome Center, Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| | | | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan
| |
|
1298
|
Kakoulidis P, Vlachos IS, Thanos D, Blatch GL, Emiris IZ, Anastasiadou E. Identifying and profiling structural similarities between Spike of SARS-CoV-2 and other viral or host proteins with Machaon. Commun Biol 2023; 6:752. [PMID: 37468602 PMCID: PMC10356814 DOI: 10.1038/s42003-023-05076-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 06/26/2023] [Indexed: 07/21/2023] Open
Abstract
Using protein structure to predict function, interactions, and evolutionary history is still an open challenge, with existing approaches relying extensively on protein homology and families. Here, we present Machaon, a data-driven method combining orientation-invariant metrics on phi-psi angles, inter-residue contacts and surface complexity. It can be readily applied to whole structures or to segments, such as domains and binding sites. Machaon was applied to SARS-CoV-2 Spike monomers of native, Delta and Omicron variants and identified correlations with a wide range of viral proteins from close to distant taxonomy ranks, as well as host proteins, such as the ACE2 receptor. Machaon's meta-analysis of the results highlights structural, chemical and transcriptional similarities between the Spike monomer and human proteins, indicating a multi-level viral mimicry. This extended analysis also revealed relationships of the Spike protein with biological processes such as ubiquitination and angiogenesis and highlighted different patterns in virus attachment among the studied variants. Available at: https://machaonweb.com.
Affiliation(s)
- Panos Kakoulidis
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Ilisia, 157 84, Athens, Greece
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece
| | - Ioannis S Vlachos
- Broad Institute of MIT and Harvard, Merkin Building, 415 Main St., Cambridge, MA, 02142, USA
- Cancer Research Institute, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
- Department of Pathology, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
- Harvard Medical School, 25 Shattuck Street, Boston, MA, 02115, USA
- Spatial Technologies Unit, Harvard Medical School Initiative for RNA Medicine, Dana Building, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
| | - Dimitris Thanos
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece
| | - Gregory L Blatch
- Biomedical Biotechnology Research Unit, Department of Biochemistry and Microbiology, Rhodes University, PO Box 94, Makhanda (Grahamstown) 6140, Eastern Cape, South Africa
- Biomedical and Drug Discovery Research Group, Faculty of Health Sciences, Higher Colleges of Technology, PO 25026, Sharjah, UAE
- Institute for Health and Sport, Victoria University, Melbourne, PO Box 14428, VIC 8001, Melbourne, Australia
- The Vice Chancellery, The University of Notre Dame Australia, PO Box 1225, WA 6959, Fremantle, Australia
| | - Ioannis Z Emiris
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Ilisia, 157 84, Athens, Greece
- ATHENA Research and Innovation Center, Artemidos 6 & Epidavrou 15125, Marousi, Greece
| | - Ema Anastasiadou
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece.
| |
|
1299
|
Grigorjew A, Gynter A, Dias FHC, Buchfink B, Drost HG, Tomescu AI. Sensitive inference of alignment-safe intervals from biodiverse protein sequence clusters using EMERALD. Genome Biol 2023; 24:168. [PMID: 37461051 DOI: 10.1186/s13059-023-03008-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 07/05/2023] [Indexed: 07/20/2023] Open
Abstract
Sequence alignments are the foundations of life science research, but most innovation so far focuses on optimal alignments, while information derived from suboptimal solutions is ignored. We argue that one optimal alignment per pairwise sequence comparison is a reasonable approximation when dealing with very similar sequences but is insufficient when exploring the biodiversity of the protein universe at tree-of-life scale. To overcome this limitation, we introduce pairwise alignment-safety to uncover the amino acid positions robustly shared across all suboptimal solutions. We implement EMERALD, a software library for alignment-safety inference, and apply it to 400k sequences from the SwissProt database.
Affiliation(s)
- Andreas Grigorjew
- Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Artur Gynter
- Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Fernando H C Dias
- Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Benjamin Buchfink
- Computational Biology Group, Max Planck Institute for Biology, Tübingen, Germany
| | - Hajk-Georg Drost
- Computational Biology Group, Max Planck Institute for Biology, Tübingen, Germany.
| | | |
|
1300
|
Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023; 47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/09/2023] Open
Abstract
When selecting microbial strains for the production of fermented foods, various microbial phenotypes need to be taken into account to achieve target product characteristics, such as biosafety, flavor, texture, and health-promoting effects. Through continuous advances in sequencing technologies, microbial whole-genome sequences of increasing quality can now be obtained both cheaper and faster, which increases the relevance of genome-based characterization of microbial phenotypes. Prediction of microbial phenotypes from genome sequences makes it possible to quickly screen large strain collections in silico to identify candidates with desirable traits. Several microbial phenotypes relevant to the production of fermented foods can be predicted using knowledge-based approaches, leveraging our existing understanding of the genetic and molecular mechanisms underlying those phenotypes. In the absence of this knowledge, data-driven approaches can be applied to estimate genotype-phenotype relationships based on large experimental datasets. Here, we review computational methods that implement knowledge- and data-driven approaches for phenotype prediction, as well as methods that combine elements from both approaches. Furthermore, we provide examples of how these methods have been applied in industrial biotechnology, with special focus on the fermented food industry.
Affiliation(s)
- Signe T Karlsen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
| | - Martin H Rau
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
| | - Benjamín J Sánchez
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
| | - Kristian Jensen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
| | - Ahmad A Zeidan
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
| |
|