1. Rosignoli S, Pacelli M, Manganiello F, Paiardini A. An outlook on structural biology after AlphaFold: tools, limits and perspectives. FEBS Open Bio 2025; 15:202-222. [PMID: 39313455 PMCID: PMC11788754 DOI: 10.1002/2211-5463.13902] [Received: 03/13/2024] [Revised: 08/19/2024] [Accepted: 09/13/2024] [Indexed: 09/25/2024]
Abstract
AlphaFold and similar groundbreaking AI-based tools have revolutionized the field of structural bioinformatics with their remarkable accuracy in ab initio protein structure prediction. This success has catalyzed the development of new software and pipelines aimed at incorporating AlphaFold's predictions, often focusing on addressing the algorithm's remaining challenges. Here, we present the current landscape of structural bioinformatics shaped by AlphaFold, and discuss how the field is dynamically responding to this revolution with new software, methods, and pipelines. While the excitement around AI-based tools has led to their widespread application, it is essential to acknowledge that their practical success hinges on their integration into established protocols within structural bioinformatics, which are often neglected in the context of AI-driven advancements. Indeed, user-driven intervention is still as pivotal in the structure prediction process as in complementing state-of-the-art algorithms with functional and biological knowledge.
Affiliation(s)
- Serena Rosignoli
  - Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza Università di Roma, Italy
- Maddalena Pacelli
  - Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza Università di Roma, Italy
- Francesca Manganiello
  - Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza Università di Roma, Italy
- Alessandro Paiardini
  - Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza Università di Roma, Italy
2. Greenshields-Watson A, Vavourakis O, Spoendlin FC, Cagiada M, Deane CM. Challenges and compromises: Predicting unbound antibody structures with deep learning. Curr Opin Struct Biol 2025; 90:102983. [PMID: 39862761 DOI: 10.1016/j.sbi.2025.102983] [Received: 09/17/2024] [Revised: 12/31/2024] [Accepted: 01/02/2025] [Indexed: 01/27/2025]
Abstract
Therapeutic antibodies are manufactured, stored and administered in the free state; this makes understanding the unbound form key to designing and improving development pipelines. Prediction of unbound antibodies is challenging, specifically modelling of the CDRH3 loop, where inaccuracies are potentially worse due to a bias in structural data towards antibody-antigen complexes. This class imbalance provides a challenge for deep learning models trained on this data, potentially limiting generalisation to unbound forms. Here we discuss the importance of unbound structures in antibody development pipelines. We explore how the latest generation of structure predictors can provide new insights and assess how conformational heterogeneity may influence binding kinetics. We hypothesise that generative models may address some of these issues. While prediction of antibodies in complex is essential, we should not ignore the need for progress in modelling the unbound form.
Affiliation(s)
- Alexander Greenshields-Watson
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
- Odysseas Vavourakis
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
- Fabian C Spoendlin
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
- Matteo Cagiada
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200, Copenhagen, Denmark
- Charlotte M Deane
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
3. Schuette G, Lao Z, Zhang B. ChromoGen: Diffusion model predicts single-cell chromatin conformations. Science Advances 2025; 11:eadr8265. [PMID: 39888999 PMCID: PMC11784829 DOI: 10.1126/sciadv.adr8265] [Received: 07/17/2024] [Accepted: 01/02/2025] [Indexed: 02/02/2025]
Abstract
Breakthroughs in high-throughput sequencing and microscopic imaging technologies have revealed that chromatin structures vary considerably between cells of the same type. However, a thorough characterization of this heterogeneity remains elusive due to the labor-intensive and time-consuming nature of these experiments. To address these challenges, we introduce ChromoGen, a generative model based on state-of-the-art artificial intelligence techniques that efficiently predicts three-dimensional, single-cell chromatin conformations de novo with both region and cell type specificity. These generated conformations accurately reproduce experimental results at both the single-cell and population levels. Moreover, ChromoGen successfully transfers to cell types excluded from the training data using just DNA sequence and widely available DNase-seq data, thus providing access to chromatin structures in myriad cell types. These achievements come at a remarkably low computational cost. Therefore, ChromoGen enables the systematic investigation of single-cell chromatin organization, its heterogeneity, and its relationship to sequencing data, all while remaining economical.
Affiliation(s)
- Bin Zhang
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
4. Jones MS, Khanna S, Ferguson AL. FlowBack: A Generalized Flow-Matching Approach for Biomolecular Backmapping. J Chem Inf Model 2025; 65:672-692. [PMID: 39772562 DOI: 10.1021/acs.jcim.4c02046] [Indexed: 01/11/2025]
Abstract
Coarse-grained models have become ubiquitous in biomolecular modeling tasks aimed at studying slow dynamical processes such as protein folding and DNA hybridization. These models can considerably accelerate sampling but it remains challenging to accurately and efficiently restore all-atom detail to the coarse-grained trajectory, which can be vital for detailed understanding of molecular mechanisms and calculation of observables contingent on all-atom coordinates. In this work, we introduce FlowBack as a deep generative model employing a flow-matching objective to map samples from a coarse-grained prior distribution to an all-atom data distribution. We construct our prior distribution to be agnostic to the coarse-grained map and molecular type. A protein-specific model trained on ∼65k structures from the Protein Data Bank achieves state-of-the-art performance on structural metrics compared to previous generative and rules-based approaches in applications to static PDB structures, all-atom simulations of fast-folding proteins, and coarse-grained trajectories generated by a machine-learned force field. A DNA-protein model trained on ∼1.5k DNA-protein complexes achieves excellent reconstruction and generative capabilities on static DNA-protein complexes from the Protein Data Bank as well as on out-of-distribution coarse-grained dynamical simulations of DNA-protein complexation. FlowBack offers an accurate, efficient, and easy-to-use tool to recover all-atom structures from coarse-grained molecular simulations with higher robustness and fewer steric clashes than previous approaches. We make FlowBack freely available to the community as an open source Python package.
Affiliation(s)
- Michael S Jones
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Smayan Khanna
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
  - Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
5. Lu T, Liu M, Chen Y, Kim J, Huang PS. Assessing Generative Model Coverage of Protein Structures with SHAPES. bioRxiv 2025:2025.01.09.632260. [PMID: 39868321 PMCID: PMC11761634 DOI: 10.1101/2025.01.09.632260] [Indexed: 01/28/2025]
Abstract
Recent advances in generative modeling enable efficient sampling of protein structures, but their tendency to optimize for designability imposes a bias toward idealized structures at the expense of loops and other complex structural motifs critical for function. We introduce SHAPES (Structural and Hierarchical Assessment of Proteins with Embedding Similarity) to evaluate five state-of-the-art generative models of protein structures. Using structural embeddings across multiple structural hierarchies, ranging from local geometries to global protein architectures, we reveal substantial undersampling of the observed protein structure space by these models. We use Fréchet Protein Distance (FPD) to quantify distributional coverage. Different models are distinct in their coverage behavior across different sampling noise scales and temperatures; the frequency of TERtiary Motifs (TERMs) further supports the observations. More robust sequence design and structure prediction methods are likely crucial in guiding the development of models with improved coverage of the designable protein space.
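The abstract names the Fréchet Protein Distance but does not define it. The sketch below assumes it has the same form as the Fréchet distance between two Gaussians fitted to sets of embeddings (the construction behind the Fréchet inception distance for images), and simplifies further to diagonal covariances so that plain Python suffices. The function names and toy embeddings are hypothetical and are not taken from SHAPES.

```python
import math
from statistics import fmean, pvariance


def gaussian_stats(embeddings):
    """Per-dimension mean and population variance of a set of embedding vectors."""
    columns = list(zip(*embeddings))
    means = [fmean(c) for c in columns]
    variances = [pvariance(c) for c in columns]
    return means, variances


def frechet_distance_diag(emb_a, emb_b):
    """Squared Frechet distance between diagonal Gaussians fitted to two sets.

    Assumption: covariances are diagonal, so the trace term reduces to the
    squared difference of per-dimension standard deviations.
    """
    mu_a, var_a = gaussian_stats(emb_a)
    mu_b, var_b = gaussian_stats(emb_b)
    mean_term = sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))
    cov_term = sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(var_a, var_b))
    return mean_term + cov_term


if __name__ == "__main__":
    reference = [[0.0, 0.0], [2.0, 2.0]]   # toy "observed" embeddings
    shifted = [[1.0, 0.0], [3.0, 2.0]]     # same spread, shifted mean
    collapsed = [[1.0, 1.0], [1.0, 1.0]]   # same mean, no diversity
    print(frechet_distance_diag(reference, shifted))
    print(frechet_distance_diag(reference, collapsed))
```

Under this assumed form, a generated set that matches the mean of the reference but has collapsed diversity still receives a nonzero distance, which is the kind of undersampling a distributional coverage metric is meant to expose.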
Affiliation(s)
- Tianyu Lu
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
  - Equal contribution
- Melissa Liu
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
  - Equal contribution
- Yilin Chen
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Jinho Kim
  - Department of Physics, Stanford University, Stanford, CA, USA
- Po-Ssu Huang
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
6. Hayes T, Rao R, Akin H, Sofroniew NJ, Oktay D, Lin Z, Verkuil R, Tran VQ, Deaton J, Wiggert M, Badkundri R, Shafkat I, Gong J, Derry A, Molina RS, Thomas N, Khan YA, Mishra C, Kim C, Bartie LJ, Nemeth M, Hsu PD, Sercu T, Candido S, Rives A. Simulating 500 million years of evolution with a language model. Science 2025:eads0018. [PMID: 39818825 DOI: 10.1126/science.ads0018] [Received: 07/24/2024] [Accepted: 01/07/2025] [Indexed: 01/19/2025]
Abstract
More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained at scale on evolutionary data can generate functional proteins that are far away from known proteins. We present ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins. ESM3 can follow complex prompts combining its modalities and is highly responsive to alignment to improve its fidelity. We have prompted ESM3 to generate fluorescent proteins. Among the generations that we synthesized, we found a bright fluorescent protein at a far distance (58% sequence identity) from known fluorescent proteins, which we estimate is equivalent to simulating five hundred million years of evolution.
Affiliation(s)
- Roshan Rao
  - EvolutionaryScale, PBC, New York, NY, USA
- Halil Akin
  - EvolutionaryScale, PBC, New York, NY, USA
- Zeming Lin
  - EvolutionaryScale, PBC, New York, NY, USA
- Vincent Q Tran
  - Arc Institute, Palo Alto, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Jun Gong
  - EvolutionaryScale, PBC, New York, NY, USA
- Patrick D Hsu
  - Arc Institute, Palo Alto, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Tom Sercu
  - EvolutionaryScale, PBC, New York, NY, USA
7. Cheng AH, Ser CT, Skreta M, Guzmán-Cordero A, Thiede L, Burger A, Aldossary A, Leong SX, Pablo-García S, Strieth-Kalthoff F, Aspuru-Guzik A. Spiers Memorial Lecture: How to do impactful research in artificial intelligence for chemistry and materials science. Faraday Discuss 2025; 256:10-60. [PMID: 39400305 DOI: 10.1039/d4fd00153b] [Indexed: 10/15/2024]
Abstract
Machine learning has been pervasively touching many fields of science. Chemistry and materials science are no exception. While machine learning has been making a great impact, it is still not reaching its full potential or maturity. In this perspective, we first outline current applications across a diversity of problems in chemistry. Then, we discuss how machine learning researchers view and approach problems in the field. Finally, we provide our considerations for maximizing impact when researching machine learning for chemistry.
Affiliation(s)
- Austin H Cheng
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Cher Tian Ser
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Marta Skreta
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Andrés Guzmán-Cordero
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
  - Tinbergen Institute, University of Amsterdam, Amsterdam, Netherlands
- Luca Thiede
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Andreas Burger
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Shi Xuan Leong
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, Singapore 63737, Singapore
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
  - Acceleration Consortium, Toronto, Ontario M5G 1X6, Canada
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Canada
  - Department of Materials Science and Engineering, University of Toronto, Canada
  - Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), Canada
8. Li Y, Duan Z, Li Z, Xue W. Data and AI-driven synthetic binding protein discovery. Trends Pharmacol Sci 2025:S0165-6147(24)00268-2. [PMID: 39755458 DOI: 10.1016/j.tips.2024.12.002] [Received: 10/12/2024] [Revised: 12/02/2024] [Accepted: 12/06/2024] [Indexed: 01/06/2025]
Abstract
Synthetic binding proteins (SBPs) are a class of protein binders that are artificially created and do not exist naturally. Their broad applications in tackling challenges of research, diagnostics, and therapeutics have garnered significant interest. Traditional protein engineering is pivotal to the discovery of SBPs. Recently, this discovery has been significantly accelerated by computational approaches, such as molecular modeling and artificial intelligence (AI). Furthermore, while numerous bioinformatics databases offer a wealth of resources that fuel SBP discovery, the potential of these data has not yet been fully exploited. In this review, we present a comprehensive overview of the SBP data ecosystem and of methodologies in SBP discovery, highlighting the critical role of high-quality data and AI technologies in accelerating the discovery of innovative SBPs with promising applications in pharmacological sciences.
Affiliation(s)
- Yanlin Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Zixin Duan
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Zhenwen Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
  - Western (Chongqing) Collaborative Innovation Center for Intelligent Diagnostics and Digital Medicine, Chongqing National Biomedicine Industry Park, Chongqing 401329, China
9. Lopez-Mateos D, Harris BJ, Hernández-González A, Narang K, Yarov-Yarovoy V. Harnessing Deep Learning Methods for Voltage-Gated Ion Channel Drug Discovery. Physiology (Bethesda) 2025; 40:0. [PMID: 39189871 DOI: 10.1152/physiol.00029.2024] [Received: 06/11/2024] [Revised: 08/16/2024] [Accepted: 08/18/2024] [Indexed: 08/28/2024]
Abstract
Voltage-gated ion channels (VGICs) are pivotal in regulating electrical activity in excitable cells and are critical pharmaceutical targets for treating many diseases including cardiac arrhythmia and neuropathic pain. Despite their significance, challenges such as achieving target selectivity persist in VGIC drug development. Recent progress in deep learning, particularly diffusion models, has enabled the computational design of protein binders for any clinically relevant protein based solely on its structure. These developments coincide with a surge in experimental structural data for VGICs, providing a rich foundation for computational design efforts. This review explores the recent advancements in computational protein design using deep learning and diffusion methods, focusing on their application in designing protein binders to modulate VGIC activity. We discuss the potential use of these methods to computationally design protein binders targeting different regions of VGICs, including the pore domain, voltage-sensing domains, and interface with auxiliary subunits. We provide a comprehensive overview of the different design scenarios, discuss key structural considerations, and address the practical challenges in developing VGIC-targeting protein binders. By exploring these innovative computational methods, we aim to provide a framework for developing novel strategies that could significantly advance VGIC pharmacology and lead to the discovery of effective and safe therapeutics.
Affiliation(s)
- Diego Lopez-Mateos
  - Department of Physiology and Membrane Biology, University of California School of Medicine, Davis, California, United States
  - Biophysics Graduate Group, University of California School of Medicine, Davis, California, United States
- Brandon John Harris
  - Department of Physiology and Membrane Biology, University of California School of Medicine, Davis, California, United States
  - Biophysics Graduate Group, University of California School of Medicine, Davis, California, United States
- Adriana Hernández-González
  - Department of Physiology and Membrane Biology, University of California School of Medicine, Davis, California, United States
  - Biophysics Graduate Group, University of California School of Medicine, Davis, California, United States
- Kush Narang
  - Department of Physiology and Membrane Biology, University of California School of Medicine, Davis, California, United States
- Vladimir Yarov-Yarovoy
  - Department of Physiology and Membrane Biology, University of California School of Medicine, Davis, California, United States
  - Biophysics Graduate Group, University of California School of Medicine, Davis, California, United States
  - Department of Anesthesiology and Pain Medicine, University of California School of Medicine, Davis, California, United States
10. Cagiada M, Ovchinnikov S, Lindorff‐Larsen K. Predicting absolute protein folding stability using generative models. Protein Sci 2025; 34:e5233. [PMID: 39673466 PMCID: PMC11645669 DOI: 10.1002/pro.5233] [Received: 08/10/2024] [Revised: 10/30/2024] [Accepted: 11/11/2024] [Indexed: 12/16/2024]
Abstract
While there has been substantial progress in our ability to predict changes in protein stability due to amino acid substitutions, progress has been slower in methods to predict the absolute stability of a protein. Here, we show how a generative model for protein sequence can be leveraged to predict absolute protein stability. We benchmark our predictions across a broad set of proteins and find a mean error of 1.5 kcal/mol and a correlation coefficient of 0.7 for the absolute stability across a range of natural, small- to medium-sized proteins up to ca. 150 amino acid residues. We analyze current limitations and future directions, including how such a model may be useful for predicting conformational free energies. Our approach is simple to use and freely available as an online implementation via https://github.com/KULL-Centre/_2024_cagiada_stability.
Affiliation(s)
- Matteo Cagiada
  - Linderstrøm‐Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Sergey Ovchinnikov
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Kresten Lindorff‐Larsen
  - Linderstrøm‐Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
11. Dhoriyani J, Bergman MT, Hall CK, You F. Integrating biophysical modeling, quantum computing, and AI to discover plastic-binding peptides that combat microplastic pollution. PNAS Nexus 2025; 4:pgae572. [PMID: 39871828 PMCID: PMC11770337 DOI: 10.1093/pnasnexus/pgae572] [Received: 08/08/2024] [Accepted: 12/16/2024] [Indexed: 01/29/2025]
Abstract
Methods are needed to mitigate microplastic (MP) pollution to minimize their harm to the environment and human health. Given the ability of polypeptides to adsorb strongly to materials of micro- or nanometer size, plastic-binding peptides (PBPs) could help create bio-based tools for detecting, filtering, or degrading MNP pollution. However, the development of such tools is prevented by the lack of PBPs. In this work, we discover and evaluate PBPs for several common plastics by combining biophysical modeling, molecular dynamics (MD), quantum computing, and reinforcement learning. We frame peptide affinity for a given plastic through a Potts model that is a function of the amino acid sequence and then search for the amino acid sequences with the greatest predicted affinity using quantum annealing. We also use proximal policy optimization to find PBPs with a broader range of physicochemical properties, such as isoelectric point or solubility. Evaluation of the discovered PBPs in MD simulations demonstrates that the peptides have high affinity for two of the plastics: polyethylene and polypropylene. We conclude by describing how our computational approach could be paired with experimental approaches to create a nexus for designing and optimizing peptide-based tools that aid the detection, capture, or biodegradation of MPs. We thus hope that this study will aid in the fight against MP pollution.
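The Potts-model framing described in this abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the three-letter alphabet, the field and coupling values, the restriction to nearest-neighbor couplings, and the sign convention (lower energy taken to mean stronger predicted affinity). The exhaustive search is a classical stand-in for the quantum annealing step, which this sketch does not attempt to reproduce.

```python
from itertools import product

# Hypothetical three-letter alphabet and parameters; the paper's actual
# alphabet, fields, and couplings are not given in the abstract.
ALPHABET = "AKW"
FIELD = {"A": -0.2, "K": 0.5, "W": -1.0}          # per-residue term h(a)
COUPLING = {("W", "W"): -0.5, ("K", "K"): 0.8}     # pair term J(a, b)


def pair_energy(a, b):
    """Symmetric lookup of the coupling between two residue types."""
    return COUPLING.get((a, b), COUPLING.get((b, a), 0.0))


def potts_energy(seq):
    """Potts energy: sum of site fields plus nearest-neighbor couplings."""
    site = sum(FIELD[a] for a in seq)
    pairs = sum(pair_energy(a, b) for a, b in zip(seq, seq[1:]))
    return site + pairs


def best_sequence(length):
    """Exhaustive search over all sequences of the given length."""
    candidates = ("".join(p) for p in product(ALPHABET, repeat=length))
    return min(candidates, key=potts_energy)


if __name__ == "__main__":
    best = best_sequence(4)
    print(best, potts_energy(best))
```

Exhaustive enumeration scales as the alphabet size raised to the peptide length, so it is feasible only for toy cases such as this one; that scaling is why a dedicated optimizer is needed for realistic sequence spaces.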
Affiliation(s)
- Jeet Dhoriyani
  - Systems Engineering, College of Engineering, Cornell University, Ithaca, NY 14853, USA
- Michael T Bergman
  - Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27606, USA
- Carol K Hall
  - Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27606, USA
- Fengqi You
  - Systems Engineering, College of Engineering, Cornell University, Ithaca, NY 14853, USA
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853, USA
  - Cornell University AI for Science Institute, Cornell University, Ithaca, NY 14853, USA
12. Wang F, Wang Y, Feng L, Zhang C, Lai L. Target-Specific De Novo Peptide Binder Design with DiffPepBuilder. J Chem Inf Model 2024; 64:9135-9149. [PMID: 39266056 DOI: 10.1021/acs.jcim.4c00975] [Indexed: 09/14/2024]
Abstract
Despite the exciting progress in target-specific de novo protein binder design, peptide binder design remains challenging due to the flexibility of peptide structures and the scarcity of protein-peptide complex structure data. In this study, we curated a large synthetic data set, referred to as PepPC-F, from the abundant protein-protein interface data and developed DiffPepBuilder, a de novo target-specific peptide binder generation method that utilizes an SE(3)-equivariant diffusion model trained on PepPC-F to codesign peptide sequences and structures. DiffPepBuilder also introduces disulfide bonds to stabilize the generated peptide structures. We tested DiffPepBuilder on 30 experimentally verified strong peptide binders with available protein-peptide complex structures. DiffPepBuilder was able to effectively recall the native structures and sequences of the peptide ligands and to generate novel peptide binders with improved binding free energy. We subsequently conducted de novo generation case studies on three targets. In both the regeneration test and case studies, DiffPepBuilder outperformed AfDesign and RFdiffusion coupled with ProteinMPNN, in terms of sequence and structure recall, interface quality, and structural diversity. Molecular dynamics simulations confirmed that the introduction of disulfide bonds enhanced the structural rigidity and binding performance of the generated peptides. As a general peptide binder de novo design tool, DiffPepBuilder can be used to design peptide binders for given protein targets with three-dimensional and binding site information.
Affiliation(s)
- Fanhao Wang
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Yuzhe Wang
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Laiyi Feng
  - Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Changsheng Zhang
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Luhua Lai
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
13. Xu J, Wang Y. Generating Multistate Conformations of P-type ATPases with a Conditional Diffusion Model. J Chem Inf Model 2024; 64:9227-9239. [PMID: 39480276 DOI: 10.1021/acs.jcim.4c01519] [Indexed: 12/11/2024]
Abstract
Understanding and predicting the diverse conformational states of membrane proteins is essential for elucidating their biological functions. Despite advancements in computational methods, accurately capturing these complex structural changes remains a significant challenge. Here, we introduce a computational approach to generate diverse and biologically relevant conformations of membrane proteins using a conditional diffusion model. Our approach integrates forward and backward diffusion processes, incorporating state classifiers and additional conditioners to control the generation gradient of conformational states. We specifically targeted the P-type ATPases, a critical family of membrane transporters, and constructed a comprehensive data set through a combination of experimental structures and molecular dynamics simulations. Our model, incorporating a graph neural network with specialized membrane constraints, demonstrates exceptional accuracy in generating a wide range of P-type ATPase conformations associated with different functional states. This approach represents a meaningful step forward in the computational generation of membrane protein conformations using AI and holds promise for studying the dynamics of other membrane proteins.
Affiliation(s)
- Jingtian Xu
  - College of Life Sciences, Zhejiang University, Hangzhou 310027, China
- Yong Wang
  - College of Life Sciences, Zhejiang University, Hangzhou 310027, China
14. Mo W, Vaiana CA, Myers CJ. The need for adaptability in detection, characterization, and attribution of biosecurity threats. Nat Commun 2024; 15:10699. [PMID: 39702312 DOI: 10.1038/s41467-024-55436-y] [Received: 08/15/2024] [Accepted: 12/12/2024] [Indexed: 12/21/2024]
Abstract
Modern biotechnology necessitates robust biosecurity protocols to address the risk of engineered biological threats. Current efforts focus on screening DNA and rejecting the synthesis of dangerous elements but face technical and logistical barriers. Screening should integrate into a broader strategy that addresses threats at multiple stages of development and deployment. The success of this approach hinges upon reliable detection, characterization, and attribution of engineered DNA. Recent advances notably aid the potential to both develop threats and analyze them. However, further work is needed to translate developments into biosecurity applications. This work reviews cutting-edge methods for DNA analysis and recommends avenues to improve biosecurity in an adaptable manner.
Affiliation(s)
- William Mo
- Draper Scholar, The Charles Stark Draper Laboratory, Inc., 555 Technology Square, Cambridge, MA, USA
- Department of Electrical, Computer, and Energy Engineering, University of Colorado Boulder, 1111 Engineering Dr, Boulder, CO, USA
| | - Christopher A Vaiana
- The Charles Stark Draper Laboratory, Inc., 555 Technology Square, Cambridge, MA, USA
| | - Chris J Myers
- Department of Electrical, Computer, and Energy Engineering, University of Colorado Boulder, 1111 Engineering Dr, Boulder, CO, USA.
|
15
|
Yin M, Zhou H, Zhu Y, Lin M, Wu Y, Wu J, Xu H, Hsieh CY, Hou T, Chen J, Wu J. Multi-Modal CLIP-Informed Protein Editing. HEALTH DATA SCIENCE 2024; 4:0211. [PMID: 39703565 PMCID: PMC11658819 DOI: 10.34133/hds.0211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Revised: 10/17/2024] [Accepted: 11/12/2024] [Indexed: 12/21/2024]
Abstract
Background: Proteins govern most biological functions essential for life, and achieving controllable protein editing has made great advances in probing natural systems, creating therapeutic conjugates, and generating novel protein constructs. Recently, machine learning-assisted protein editing (MLPE) has shown promise in accelerating optimization cycles and reducing experimental workloads. However, current methods struggle with the vast combinatorial space of potential protein edits and cannot explicitly conduct protein editing using biotext instructions, limiting their interactivity with human feedback. Methods: To fill these gaps, we propose a novel method called ProtET for efficient CLIP-informed protein editing through multi-modality learning. Our approach comprises 2 stages: In the pretraining stage, contrastive learning aligns protein-biotext representations encoded by 2 large language models (LLMs). Subsequently, during the protein editing stage, the fused features from editing instruction texts and original protein sequences serve as the final editing condition for generating target protein sequences. Results: Comprehensive experiments demonstrated the superiority of ProtET in editing proteins to enhance human-expected functionality across multiple attribute domains, including enzyme catalytic activity, protein stability, and antibody-specific binding ability. ProtET improves the state-of-the-art results by a large margin, leading to substantial stability improvements of 16.67% and 16.90%. Conclusions: This capability positions ProtET to advance real-world artificial protein editing, potentially addressing unmet academic, industrial, and clinical needs.
Affiliation(s)
- Mingze Yin
- School of Medicine, Zhejiang University, Hangzhou, China
| | - Hanjing Zhou
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
| | - Yiheng Zhu
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
| | - Miao Lin
- Medical Big Data Center, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
| | - Yixuan Wu
- School of Medicine, Zhejiang University, Hangzhou, China
| | - Jialu Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Hongxia Xu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Jintai Chen
- AI Thrust, Information Hub, HKUST (Guangzhou), Guangzhou, China
| | - Jian Wu
- Second Affiliated Hospital School of Medicine, Hangzhou, China
- School of Public Health, Zhejiang University, Hangzhou, China
- Institute of Wenzhou, Wenzhou, China
|
16
|
O'Donnell TJ, Kanduri C, Isacchini G, Limenitakis JP, Brachman RA, Alvarez RA, Haff IH, Sandve GK, Greiff V. Reading the repertoire: Progress in adaptive immune receptor analysis using machine learning. Cell Syst 2024; 15:1168-1189. [PMID: 39701034 DOI: 10.1016/j.cels.2024.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 08/16/2024] [Accepted: 11/14/2024] [Indexed: 12/21/2024]
Abstract
The adaptive immune system holds invaluable information on past and present immune responses in the form of B and T cell receptor sequences, but we are limited in our ability to decode this information. Machine learning approaches are under active investigation for a range of tasks relevant to understanding and manipulating the adaptive immune receptor repertoire, including matching receptors to the antigens they bind, generating antibodies or T cell receptors for use as therapeutics, and diagnosing disease based on patient repertoires. Progress on these tasks has the potential to substantially improve the development of vaccines, therapeutics, and diagnostics, as well as advance our understanding of fundamental immunological principles. We outline key challenges for the field, highlighting the need for software benchmarking, targeted large-scale data generation, and coordinated research efforts.
Affiliation(s)
| | - Chakravarthi Kanduri
- Department of Informatics, University of Oslo, Oslo, Norway; UiO:RealArt Convergence Environment, University of Oslo, Oslo, Norway
| | | | | | - Rebecca A Brachman
- Imprint Labs, LLC, New York, NY, USA; Cornell Tech, Cornell University, New York, NY, USA
| | | | - Ingrid H Haff
- Department of Mathematics, University of Oslo, 0371 Oslo, Norway
| | - Geir K Sandve
- Department of Informatics, University of Oslo, Oslo, Norway; UiO:RealArt Convergence Environment, University of Oslo, Oslo, Norway
| | - Victor Greiff
- Imprint Labs, LLC, New York, NY, USA; Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway.
|
17
|
Samanta R, Harmalkar A, Prathima P, Gray JJ. Advancing Membrane-Associated Protein Docking with Improved Sampling and Scoring in Rosetta. J Chem Theory Comput 2024; 20:10740-10749. [PMID: 39574325 DOI: 10.1021/acs.jctc.4c00927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2024]
Abstract
The oligomerization of protein macromolecules on cell membranes plays a fundamental role in regulating cellular function. From modulating signal transduction to directing immune response, membrane proteins (MPs) play a crucial role in biological processes and are often the target of pharmaceutical drugs. Despite their biological relevance, the challenges in experimental determination have hampered the structural availability of membrane proteins and their complexes. Computational docking provides a promising alternative to model membrane protein complex structures. Here, we present Rosetta-MPDock, a flexible transmembrane (TM) protein docking protocol that captures binding-induced conformational changes. Rosetta-MPDock samples large conformational ensembles of flexible monomers and docks them within an implicit membrane environment. We benchmarked this method on 29 TM-protein complexes of variable backbone flexibility. These complexes are classified based on the root-mean-square deviation between the unbound and bound states (RMSD_UB) as rigid (RMSD_UB < 1.2 Å), moderately flexible (RMSD_UB ∈ [1.2, 2.2] Å), and flexible targets (RMSD_UB > 2.2 Å). In a local docking scenario, i.e. with membrane protein partners starting ≈10 Å apart embedded in the membrane in their unbound conformations, Rosetta-MPDock successfully predicts the correct interface (success defined as achieving 3 near-native structures in the 5 top-ranked models) for 67% of the moderately flexible targets and 60% of the highly flexible targets, a substantial improvement over existing membrane protein docking methods. Further, by integrating AlphaFold2-multimer for structure determination and using Rosetta-MPDock for docking and refinement, we demonstrate improved success rates over the benchmark targets from 64% to 73%. Rosetta-MPDock advances the capabilities for membrane protein complex structure prediction and modeling to tackle key biological questions and elucidate functional mechanisms in the membrane environment. The benchmark set and the code are available for public use at github.com/Graylab/MPDock.
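The flexibility classes and the success criterion quoted in this abstract can be written down as a short illustrative sketch. The function names below are hypothetical and are not taken from the Rosetta-MPDock code base; the sketch also assumes that each ranked model has already been labelled near-native or not by some external criterion, which the abstract does not specify.

```python
from typing import Sequence


def classify_flexibility(rmsd_ub: float) -> str:
    """Bin a target by its unbound-to-bound RMSD (in Å), using the abstract's thresholds."""
    if rmsd_ub < 1.2:
        return "rigid"
    if rmsd_ub <= 2.2:
        return "moderately flexible"
    return "flexible"


def is_docking_success(near_native_flags: Sequence[bool],
                       top_n: int = 5,
                       min_hits: int = 3) -> bool:
    """Return True if at least `min_hits` of the `top_n` best-ranked models are near-native.

    `near_native_flags` must be ordered by model rank, best first. How a model
    is judged near-native is an input to this sketch, not something it computes.
    """
    hits = sum(1 for flag in near_native_flags[:top_n] if flag)
    return hits >= min_hits
```

For example, `classify_flexibility(1.8)` falls in the moderately flexible bin, and a ranked list whose first five models contain three near-native structures counts as a success under the stated definition.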
Affiliation(s)
- Rituparna Samanta
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Ameya Harmalkar
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Priyamvada Prathima
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Jeffrey J Gray
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, United States
|
18
|
Ramos MC, Collison CJ, White AD. A review of large language models and autonomous agents in chemistry. Chem Sci 2024:d4sc03921a. [PMID: 39829984 PMCID: PMC11739813 DOI: 10.1039/d4sc03921a] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/15/2024] [Accepted: 12/03/2024] [Indexed: 01/22/2025] Open
Abstract
Large language models (LLMs) have emerged as powerful tools in chemistry, significantly impacting molecule design, property prediction, and synthesis optimization. This review highlights LLM capabilities in these domains and their potential to accelerate scientific discovery through automation. We also review LLM-based autonomous agents: LLMs with a broader set of tools to interact with their surrounding environment. These agents perform diverse tasks such as paper scraping, interfacing with automated laboratories, and synthesis planning. As agents are an emerging topic, we extend the scope of our review of agents beyond chemistry and discuss work across scientific domains. This review covers the recent history, current capabilities, and design of LLMs and autonomous agents, addressing specific challenges, opportunities, and future directions in chemistry. Key challenges include data quality and integration, model interpretability, and the need for standard benchmarks, while future directions point towards more sophisticated multi-modal agents and enhanced collaboration between agents and experimental methods. Due to the quick pace of this field, a repository has been built to keep track of the latest studies: https://github.com/ur-whitelab/LLMs-in-science.
Affiliation(s)
- Mayk Caldas Ramos
- FutureHouse Inc. San Francisco CA USA
- Department of Chemical Engineering, University of Rochester Rochester NY USA
| | - Christopher J Collison
- School of Chemistry and Materials Science, Rochester Institute of Technology Rochester NY USA
| | - Andrew D White
- FutureHouse Inc. San Francisco CA USA
- Department of Chemical Engineering, University of Rochester Rochester NY USA
|
19
|
Soleymani F, Paquet E, Viktor HL, Michalowski W. Structure-based protein and small molecule generation using EGNN and diffusion models: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2779-2797. [PMID: 39050782 PMCID: PMC11268121 DOI: 10.1016/j.csbj.2024.06.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 06/13/2024] [Accepted: 06/18/2024] [Indexed: 07/27/2024] Open
Abstract
Recent breakthroughs in deep learning have revolutionized protein sequence and structure prediction. These advancements are built on decades of protein design efforts, and are overcoming traditional time and cost limitations. Diffusion models, at the forefront of these innovations, significantly enhance design efficiency by automating knowledge acquisition. In the field of de novo protein design, the goal is to create entirely novel proteins with predetermined structures. Given the arbitrary positions of proteins in 3-D space, graph representations and their properties are widely used in protein generation studies. A critical requirement in protein modelling is maintaining spatial relationships under transformations (rotations, translations, and reflections). This property, known as equivariance, ensures that predicted protein characteristics adapt seamlessly to changes in orientation or position. Equivariant graph neural networks offer a solution to this challenge. By incorporating equivariant graph neural networks to learn the score of the probability density function in diffusion models, one can generate proteins with robust 3-D structural representations. This review examines the latest deep learning advancements, specifically focusing on frameworks that combine diffusion models with equivariant graph neural networks for protein generation.
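The equivariance property described in this abstract can be checked numerically on a toy example. The sketch below is an illustration only and is not a method from the review: the update rule moves each point along relative vectors weighted by a function of squared distance, which is the general form of coordinate update used in equivariant graph neural networks, and the fixed weight `1 / (1 + d2)` is a hypothetical stand-in for a learned function. Because the weights depend only on distances, applying a rotation and translation before the update gives the same result as applying it afterwards.

```python
import math


def egnn_style_update(coords):
    """Shift each point along distance-weighted relative vectors to all other points."""
    updated = []
    for i, xi in enumerate(coords):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            d2 = sum(c * c for c in diff)
            weight = 1.0 / (1.0 + d2)  # hypothetical stand-in for a learned function
            shift = [s + weight * c for s, c in zip(shift, diff)]
        updated.append([a + s for a, s in zip(xi, shift)])
    return updated


def rotate_z_then_translate(coords, angle, offset):
    """Rotate every point about the z-axis by `angle` radians, then add `offset`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        [c * x - s * y + offset[0], s * x + c * y + offset[1], z + offset[2]]
        for x, y, z in coords
    ]


def max_abs_difference(a, b):
    """Largest coordinate-wise absolute difference between two point sets."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


points = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 2.0, 1.0], [-1.0, 0.5, 0.3]]
angle, offset = 0.7, [3.0, -2.0, 1.0]

# Equivariance: transforming the input and then updating should match
# updating first and then transforming the output.
transform_then_update = egnn_style_update(rotate_z_then_translate(points, angle, offset))
update_then_transform = rotate_z_then_translate(egnn_style_update(points), angle, offset)
equivariance_error = max_abs_difference(transform_then_update, update_then_transform)
```

The two orders of operation should agree up to floating-point rounding, so `equivariance_error` is expected to be close to zero.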
Affiliation(s)
- Farzan Soleymani
- Telfer School of Management, University of Ottawa, ON, K1N 6N5, Canada
| | - Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
| | - Herna Lydia Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
|
20
|
Kuang Z, Yan X, Yuan Y, Wang R, Zhu H, Wang Y, Li J, Ye J, Yue H, Yang X. Advances in stress-tolerance elements for microbial cell factories. Synth Syst Biotechnol 2024; 9:793-808. [PMID: 39072145 PMCID: PMC11277822 DOI: 10.1016/j.synbio.2024.06.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 06/10/2024] [Accepted: 06/27/2024] [Indexed: 07/30/2024] Open
Abstract
Microorganisms, particularly extremophiles, have evolved multiple adaptation mechanisms to address diverse stress conditions during survival in unique environments. Their responses to environmental stress not only decide survival in severe conditions but are also an essential factor determining bioproduction performance. The design of robust cell factories should take the balance between growth and bioproduction into account. Thus, mining and redesigning stress-tolerance elements to optimize the performance of cell factories under various extreme conditions is necessary. Here, we reviewed several stress-tolerance elements, including acid-tolerant elements, saline-alkali-resistant elements, thermotolerant elements, antioxidant elements, and so on, providing potential materials for the construction of cell factories and the development of synthetic biology. Strategies for mining and redesigning stress-tolerance elements were also discussed. Moreover, several applications of stress-tolerance elements were provided, and perspectives on potential strategies for screening stress-tolerance elements were discussed.
Affiliation(s)
- Zheyi Kuang
- School of Intelligence Science and Technology, Xinjiang University, Urumqi, 830017, China
| | - Xiaofang Yan
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
| | - Yanfei Yuan
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
| | - Ruiqi Wang
- School of Intelligence Science and Technology, Xinjiang University, Urumqi, 830017, China
| | - Haifan Zhu
- School of Intelligence Science and Technology, Xinjiang University, Urumqi, 830017, China
| | - Youyang Wang
- School of Intelligence Science and Technology, Xinjiang University, Urumqi, 830017, China
| | - Jianfeng Li
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
| | - Jianwen Ye
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
| | - Haitao Yue
- School of Intelligence Science and Technology, Xinjiang University, Urumqi, 830017, China
- Laboratory of Synthetic Biology, School of Life Science and Technology, Xinjiang University, Urumqi, 830017, China
| | - Xiaofeng Yang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
|
21
|
Stohr AM, Ma D, Chen W, Blenner M. Engineering conditional protein-protein interactions for dynamic cellular control. Biotechnol Adv 2024; 77:108457. [PMID: 39343083 DOI: 10.1016/j.biotechadv.2024.108457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Revised: 08/28/2024] [Accepted: 09/26/2024] [Indexed: 10/01/2024]
Abstract
Conditional protein-protein interactions enable dynamic regulation of cellular activity and are an attractive approach to probe native protein interactions, improve metabolic engineering of microbial factories, and develop smart therapeutics. Conditional protein-protein interactions have been engineered to respond to various chemical, light, and nucleic acid-based stimuli. These interactions have been applied to assemble protein fragments, build protein scaffolds, and spatially organize proteins in many microbial and higher-order hosts. To foster the development of novel conditional protein-protein interactions that respond to new inputs or can be utilized in alternative settings, we provide an overview of the process of designing new engineered protein interactions while showcasing many recently developed computational tools that may accelerate protein engineering in this space.
Affiliation(s)
- Anthony M Stohr
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE 19716, USA
| | - Derron Ma
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE 19716, USA
| | - Wilfred Chen
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE 19716, USA.
| | - Mark Blenner
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE 19716, USA.
|
22
|
Meng F, Zhou N, Hu G, Liu R, Zhang Y, Jing M, Hou Q. A comprehensive overview of recent advances in generative models for antibodies. Comput Struct Biotechnol J 2024; 23:2648-2660. [PMID: 39027650 PMCID: PMC11254834 DOI: 10.1016/j.csbj.2024.06.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 06/15/2024] [Accepted: 06/18/2024] [Indexed: 07/20/2024] Open
Abstract
Therapeutic antibodies are an important class of biopharmaceuticals. With the rapid development of deep learning methods and the increasing amount of antibody data, antibody generative models have made great progress recently. They aim to solve the problem of searching antibody space and are widely incorporated into the antibody development process. Therefore, a comprehensive introduction to the development methods in this field is imperative. Here, we collected 34 representative antibody generative models published recently, which can be divided into three categories based on their principles and algorithms: sequence-generating models, structure-generating models, and hybrid models. We further studied their performance and contributions to antibody sequence prediction, structure optimization, and affinity enhancement. Our manuscript will provide a comprehensive overview of the status of antibody generative models and also offer guidance for selecting different approaches.
Affiliation(s)
- Fanxu Meng
- College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
| | - Na Zhou
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
- National Institute of Health Data Science of China, Shandong University, Jinan 250100, China
| | - Guangchun Hu
- School of Information Science and Engineering, University of Jinan, Jinan 250022, China
| | - Ruotong Liu
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
- National Institute of Health Data Science of China, Shandong University, Jinan 250100, China
| | - Yuanyuan Zhang
- College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
| | - Ming Jing
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan 250000, China
| | - Qingzhen Hou
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
- National Institute of Health Data Science of China, Shandong University, Jinan 250100, China
|
23
|
Fan J, Li Z, Alcaide E, Ke G, Huang H, E W. Accurate Conformation Sampling via Protein Structural Diffusion. J Chem Inf Model 2024; 64:8414-8426. [PMID: 39340358 DOI: 10.1021/acs.jcim.4c00928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/30/2024]
Abstract
Accurate sampling of protein conformations is pivotal for advances in biology and medicine. Although there has been tremendous progress in protein structure prediction in recent years due to deep learning, models that can predict the different stable conformations of proteins with high accuracy and structural validity are still lacking. Here, we introduce UFConf, a cutting-edge approach designed for robust sampling of diverse protein conformations based solely on amino acid sequences. This method transforms AlphaFold2 into a diffusion model by implementing a conformation-based diffusion process and adapting the architecture to process diffused inputs effectively. To counteract the inherent conformational bias in the Protein Data Bank, we developed a novel hierarchical reweighting protocol based on structural clustering. Our evaluations demonstrate that UFConf outperforms existing methods in terms of successful sampling and structural validity. The comparisons with long-time molecular dynamics show that UFConf can overcome the energy barriers encountered in molecular dynamics simulations and perform more efficient sampling. Furthermore, we showcase UFConf's utility in drug discovery through its application in neural protein-ligand docking. In a blind test, it accurately predicted a novel protein-ligand complex, underscoring its potential to impact real-world biological research. Additionally, we present other modes of sampling using UFConf, including partial sampling with fixed motif, Langevin dynamics, and structural interpolation.
Affiliation(s)
- Jiahao Fan
- School of Physics, Peking University, Beijing 100871, China
- DP Technology, Beijing 100080, China
| | - Ziyao Li
- DP Technology, Beijing 100080, China
- Center for Data Science, Peking University, Beijing 100871, China
| | - Eric Alcaide
- DP Technology, Beijing 100080, China
- University of Barcelona, Barcelona 08007, Spain
| | - Guolin Ke
- DP Technology, Beijing 100080, China
| | - Huaqing Huang
- School of Physics, Peking University, Beijing 100871, China
| | - Weinan E
- School of Mathematical Sciences, Peking University, Beijing 100871, China
|
24
|
Rajagopal N, Choudhary U, Tsang K, Martin KP, Karadag M, Chen HT, Kwon NY, Mozdzierz J, Horspool AM, Li L, Tessier PM, Marlow MS, Nixon AE, Kumar S. Deep learning-based design and experimental validation of a medicine-like human antibody library. Brief Bioinform 2024; 26:bbaf023. [PMID: 39851074 PMCID: PMC11757908 DOI: 10.1093/bib/bbaf023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Revised: 12/31/2024] [Accepted: 01/09/2025] [Indexed: 01/25/2025] Open
Abstract
Antibody generation requires the use of one or more time-consuming methods, namely animal immunization and in vitro display technologies. However, the recent availability of large amounts of antibody sequence and structural data in the public domain, along with the advent of generative deep learning algorithms, raises the possibility of computationally generating novel antibody sequences with desirable developability attributes. Here, we describe a deep learning model for computationally generating libraries of highly human antibody variable regions whose intrinsic physicochemical properties resemble those of the variable regions of the marketed antibody-based biotherapeutics (medicine-likeness). We generated 100,000 variable region sequences of antigen-agnostic human antibodies belonging to the IGHV3-IGKV1 germline pair using a training dataset of 31,416 human antibodies that satisfied our computational developability criteria. The in-silico generated antibodies recapitulate intrinsic sequence, structural, and physicochemical properties of the training antibodies, and compare favorably with the experimentally measured biophysical attributes of 100 variable regions of marketed and clinical stage antibody-based biotherapeutics. A sample of 51 highly diverse in-silico generated antibodies with >90th percentile medicine-likeness and >90% humanness was evaluated by two independent experimental laboratories. Our data show the in-silico generated sequences exhibit high expression, monomer content, and thermal stability along with low hydrophobicity, self-association, and non-specific binding when produced as full-length monoclonal antibodies. The ability to computationally generate developable human antibody libraries is a first step towards enabling in-silico discovery of antibody-based biotherapeutics. These findings are expected to accelerate in-silico discovery of antibody-based biotherapeutics and expand the druggable antigen space to include targets refractory to conventional antibody discovery methods requiring in vitro antigen production.
Affiliation(s)
- Nandhini Rajagopal
- Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
| | - Udit Choudhary
- Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
| | - Kenny Tsang
- Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
| | - Kyle P Martin
- Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
| | - Murat Karadag
- Departments of Chemical Engineering, Pharmaceutical Sciences and Biomedical Engineering, Biointerfaces Institute, University of Michigan, 2800 Plymouth Road, Ann Arbor, MI 48105, United States
| | - Hsin-Ting Chen
- Departments of Chemical Engineering, Pharmaceutical Sciences and Biomedical Engineering, Biointerfaces Institute, University of Michigan, 2800 Plymouth Road, Ann Arbor, MI 48105, United States
| | - Na-Young Kwon
- Departments of Chemical Engineering, Pharmaceutical Sciences and Biomedical Engineering, Biointerfaces Institute, University of Michigan, 2800 Plymouth Road, Ann Arbor, MI 48105, United States
| | - Joseph Mozdzierz
- Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
| | - Alexander M Horspool
- Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
| | - Li Li
- Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
| | - Peter M Tessier
- Departments of Chemical Engineering, Pharmaceutical Sciences and Biomedical Engineering, Biointerfaces Institute, University of Michigan, 2800 Plymouth Road, Ann Arbor, MI 48105, United States
| | - Michael S Marlow
- Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
| | - Andrew E Nixon
- Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
| | - Sandeep Kumar
- Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
|
25
|
Nguyen E, Poli M, Durrant MG, Kang B, Katrekar D, Li DB, Bartie LJ, Thomas AW, King SH, Brixi G, Sullivan J, Ng MY, Lewis A, Lou A, Ermon S, Baccus SA, Hernandez-Boussard T, Ré C, Hsu PD, Hie BL. Sequence modeling and design from molecular to genome scale with Evo. Science 2024; 386:eado9336. [PMID: 39541441 DOI: 10.1126/science.ado9336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 09/09/2024] [Indexed: 11/16/2024]
Abstract
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism's function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision. Evo generalizes across DNA, RNA, and proteins, enabling zero-shot function prediction competitive with domain-specific language models and the generation of functional CRISPR-Cas and transposon systems, representing the first examples of protein-RNA and protein-DNA codesign with a language model. Evo also learns how small mutations affect whole-organism fitness and generates megabase-scale sequences with plausible genomic architecture. These prediction and generation capabilities span molecular to genomic scales of complexity, advancing our understanding and control of biology.
Affiliation(s)
- Eric Nguyen
- Arc Institute, Palo Alto, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Michael Poli
- Department of Computer Science, Stanford University, Stanford, CA, USA
- TogetherAI, San Francisco, CA, USA
| | | | - Brian Kang
- Arc Institute, Palo Alto, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | | | - David B Li
- Arc Institute, Palo Alto, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | | | - Armin W Thomas
- Stanford Data Science, Stanford University, Stanford, CA, USA
| | - Samuel H King
- Arc Institute, Palo Alto, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Garyk Brixi
- Arc Institute, Palo Alto, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | | | - Madelena Y Ng
- Stanford Center for Biomedical Informatics Research, Stanford, CA, USA
| | - Ashley Lewis
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Aaron Lou
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Stefano Ermon
- Department of Computer Science, Stanford University, Stanford, CA, USA
- CZ Biohub, San Francisco, CA, USA
| | - Stephen A Baccus
- Department of Neurobiology, Stanford University, Stanford, CA, USA
| | | | - Christopher Ré
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Patrick D Hsu
- Arc Institute, Palo Alto, CA, USA
- Department of Bioengineering and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Brian L Hie
- Arc Institute, Palo Alto, CA, USA
- Stanford Data Science, Stanford University, Stanford, CA, USA
- Department of Chemical Engineering, Stanford University, Stanford, CA, USA
|
26
|
Toyooka R, Nishimoto S, Tendo T, Horiyama T, Tachi T, Matsunaga Y. Explicit description of viral capsid subunit shapes by unfolding dihedrons. Commun Biol 2024; 7:1509. [PMID: 39543373 PMCID: PMC11564659 DOI: 10.1038/s42003-024-07218-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2024] [Accepted: 11/05/2024] [Indexed: 11/17/2024] Open
Abstract
Viral capsid assembly and the design of capsid-based nanocontainers critically depend on understanding the shapes and interfaces of constituent protein subunits. However, a comprehensive framework for characterizing these features is still lacking. Here, we introduce a novel approach based on spherical tiling theory that explicitly describes the 2D shapes and interfaces of subunits in icosahedral capsids. Our method unfolds spherical dihedrons defined by icosahedral symmetry axes, enabling systematic characterization of all possible subunit geometries. Applying this framework to real T = 1 capsid structures reveals distinct interface groups within this single classification, with variations in interaction patterns around 3-fold and 5-fold symmetry axes. We validate our classification through molecular docking simulations, demonstrating its consistency with physical subunit interactions. This analysis suggests different assembly pathways for capsid nucleation. Our general framework is applicable to other triangular numbers, paving the way for broader studies in structural virology and nanomaterial design.
Affiliation(s)
- Ryuya Toyooka
- Department of General Systems Studies, The University of Tokyo, Tokyo, Japan
| | - Seri Nishimoto
- Department of General Systems Studies, The University of Tokyo, Tokyo, Japan
| | - Tomoya Tendo
- Department of General Systems Studies, The University of Tokyo, Tokyo, Japan
| | - Takashi Horiyama
- Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan.
| | - Tomohiro Tachi
- Department of General Systems Studies, The University of Tokyo, Tokyo, Japan.
| | - Yasuhiro Matsunaga
- Graduate School of Science and Engineering, Saitama University, Saitama, Japan.
|
27
|
Zhang Z, Jin R, Fu K, Cong L, Zitnik M, Wang M. FoldMark: Protecting Protein Generative Models with Watermarking. bioRxiv 2024:2024.10.23.619960. [PMID: 39554012 PMCID: PMC11565776 DOI: 10.1101/2024.10.23.619960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024]
Abstract
Protein structure is key to understanding protein function and is essential for progress in bioengineering, drug discovery, and molecular biology. Recently, with the incorporation of generative AI, the power and accuracy of computational protein structure prediction and design have improved significantly. However, ethical concerns such as copyright protection and harmful content generation (biosecurity) pose challenges to the wide implementation of protein generative models. Here, we investigate whether it is possible to embed watermarks into protein generative models and their outputs for copyright authentication and the tracking of generated structures. As a proof of concept, we propose a two-stage method, FoldMark, as a generalized watermarking strategy for protein generative models. FoldMark first pretrains a watermark encoder and decoder, which can slightly adjust protein structures to embed user-specific information and faithfully recover that information from the encoded structure. In the second step, protein generative models are fine-tuned with Low-Rank Adaptation modules, with the watermark as a condition, to preserve generation quality while learning to generate watermarked structures with high recovery rates. Extensive experiments are conducted on open-source protein structure prediction models (e.g., ESMFold and MultiFlow) and de novo structure design models (e.g., FrameDiff and FoldFlow), and we demonstrate that our method is effective across all these generative models. Meanwhile, our watermarking framework has only a negligible impact on the original protein structure quality and is robust under potential post-processing and adaptive attacks.
Affiliation(s)
| | | | - Kaidi Fu
- Tsinghua University, Beijing, China
| | - Le Cong
- Stanford University, CA, USA
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
|
28
|
Praljak N, Yeh H, Moore M, Socolich M, Ranganathan R, Ferguson AL. Natural Language Prompts Guide the Design of Novel Functional Protein Sequences. bioRxiv 2024:2024.11.11.622734. [PMID: 39605414 PMCID: PMC11601239 DOI: 10.1101/2024.11.11.622734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
The advent of natural language interaction with machines has ushered in new innovations in text-guided generation of images, audio, video, and more. In this arena, we introduce Biological Multi-Modal Model (BioM3), as a novel framework for designing functional proteins via natural language prompts. This framework integrates natural language with protein design through a three-stage process: aligning protein and text representations in a joint embedding space learned using contrastive learning, refinement of the text embeddings, and conditional generation of protein sequences via a discrete autoregressive diffusion model. BioM3 synthesizes protein sequences with detailed descriptions of the protein structure, lineage, and function from text annotations to enable the conditional generation of novel sequences with desired attributes through natural language prompts. We present in silico validation of the model predictions for subcellular localization prediction, reaction classification, remote homology detection, scaffold in-painting, and structural plausibility, and in vivo and in vitro experimental tests of natural language prompt-designed synthetic analogs of Src-homology 3 (SH3) domain proteins that mediate signaling in the Sho1 osmotic stress response pathway in baker's yeast. BioM3 possesses state-of-the-art performance in zero-shot prediction and homology detection tasks, and generates proteins with native-like tertiary folds and wild-type levels of experimentally assayed function.
|
29
|
He S, Taher NM, Simard AR, Hvorecny KL, Ragusa MJ, Bahl CD, Hickman AB, Dyda F, Madden DR. Molecular basis for the transcriptional regulation of an epoxide-based virulence circuit in Pseudomonas aeruginosa. Nucleic Acids Res 2024; 52:12727-12747. [PMID: 39413156 DOI: 10.1093/nar/gkae889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 08/30/2024] [Accepted: 10/03/2024] [Indexed: 10/18/2024] Open
Abstract
The opportunistic pathogen Pseudomonas aeruginosa infects the airways of people with cystic fibrosis (CF) and produces a virulence factor, Cif, that is associated with worse outcomes. Cif is an epoxide hydrolase that reduces cell-surface abundance of the cystic fibrosis transmembrane conductance regulator (CFTR) and sabotages pro-resolving signals. Its expression is regulated by a divergently transcribed TetR family transcriptional repressor, CifR. CifR represents the first reported epoxide-sensing bacterial transcriptional regulator, but neither its interaction with cognate operator sequences nor the mechanism of activation has been investigated. Using biochemical and structural approaches, we uncovered the molecular mechanisms controlling this complex virulence operon. We present here the first molecular structures of CifR alone and in complex with operator DNA, resolved in a single crystal lattice. Significant conformational changes between these two structures suggest how CifR regulates the expression of the virulence gene cif. Interactions between the N-terminal extension of CifR and the DNA minor groove of the operator play a significant role in operator recognition by CifR. We also determined that cysteine residue Cys107 is critical for epoxide sensing and DNA release. These results offer new insights into the stereochemical regulation of an epoxide-based virulence circuit in a critically important clinical pathogen.
Affiliation(s)
- Susu He
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Noor M Taher
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Adam R Simard
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Kelli L Hvorecny
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Michael J Ragusa
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Department of Chemistry, Dartmouth, Hanover, NH 03755, USA
| | - Christopher D Bahl
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Alison B Hickman
- Laboratory of Molecular Biology, NIDDK, National Institutes of Health, Bethesda, MD 20892, USA
| | - Fred Dyda
- Laboratory of Molecular Biology, NIDDK, National Institutes of Health, Bethesda, MD 20892, USA
| | - Dean R Madden
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Department of Chemistry, Dartmouth, Hanover, NH 03755, USA
|
30
|
James JS, Dai J, Chew WL, Cai Y. The design and engineering of synthetic genomes. Nat Rev Genet 2024:10.1038/s41576-024-00786-y. [PMID: 39506144 DOI: 10.1038/s41576-024-00786-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/23/2024] [Indexed: 11/08/2024]
Abstract
Synthetic genomics seeks to design and construct entire genomes to mechanistically dissect fundamental questions of genome function and to engineer organisms for diverse applications, including bioproduction of high-value chemicals and biologics, advanced cell therapies, and stress-tolerant crops. Recent progress has been fuelled by advancements in DNA synthesis, assembly, delivery and editing. Computational innovations, such as the use of artificial intelligence to provide prediction of function, also provide increasing capabilities to guide synthetic genome design and construction. However, translating synthetic genome-scale projects from idea to implementation remains highly complex. Here, we aim to streamline this implementation process by comprehensively reviewing the strategies for design, construction, delivery, debugging and tailoring of synthetic genomes as well as their potential applications.
Affiliation(s)
- Joshua S James
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Junbiao Dai
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen Key Laboratory of Agricultural Synthetic Biology, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Wei Leong Chew
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Yizhi Cai
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK.
|
31
|
Chen Y, Zhang H, Ma J, Cui TJ, del Hougne P, Li L. Semantic-Electromagnetic Inversion With Pretrained Multimodal Generative Model. Adv Sci (Weinh) 2024; 11:e2406793. [PMID: 39246254 PMCID: PMC11558082 DOI: 10.1002/advs.202406793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2024] [Revised: 07/28/2024] [Indexed: 09/10/2024]
Abstract
Across diverse domains of science and technology, electromagnetic (EM) inversion problems benefit from the ability to account for multimodal prior information to regularize their inherent ill-posedness. Indeed, besides priors that are formulated mathematically or learned from quantitative data, valuable prior information may be available in the form of text or images. Besides handling semantic multimodality, it is furthermore important to minimize the cost of adapting to a new physical measurement operator and to limit the requirements for costly labeled data. Here, these challenges are tackled with a frugal and multimodal semantic-EM inversion technique. The key ingredient is a multimodal generator of reconstruction results that can be pretrained, being agnostic to the physical measurement operator. The generator is fed by a multimodal foundation model encoding the multimodal semantic prior and a physical adapter encoding the measured data. For a new physical setting, only the lightweight physical adapter is retrained. The authors' architecture also enables a flexible iterative step-by-step solution to the inverse problem where each step can be semantically controlled. The feasibility and benefits of this methodology are demonstrated for three EM inverse problems: a canonical two-dimensional inverse-scattering problem in numerics, as well as three-dimensional and four-dimensional compressive microwave meta-imaging experiments.
Affiliation(s)
- Yanjin Chen
- State Key Laboratory of Advanced Optical Communication Systems and Networks, School of Electronics, Peking University, Beijing 100871, China
| | - Hongrui Zhang
- State Key Laboratory of Advanced Optical Communication Systems and Networks, School of Electronics, Peking University, Beijing 100871, China
| | - Jie Ma
- State Key Laboratory of Advanced Optical Communication Systems and Networks, School of Electronics, Peking University, Beijing 100871, China
| | - Tie Jun Cui
- State Key Laboratory of Millimeter Waves, Southeast University, Nanjing 210096, China
- Pazhou Laboratory (Huangpu), Guangzhou 510555, China
| | | | - Lianlin Li
- State Key Laboratory of Advanced Optical Communication Systems and Networks, School of Electronics, Peking University, Beijing 100871, China
- Pazhou Laboratory (Huangpu), Guangzhou 510555, China
|
32
|
Berardi AJ, Francisco SD, Chang A, Zelaya JC, Raymond JE, Lahann J. Synthetic Protein Nanoparticles via Photoreactive Electrohydrodynamic Jetting. Macromol Rapid Commun 2024; 45:e2400349. [PMID: 39171381 DOI: 10.1002/marc.202400349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 06/20/2024] [Indexed: 08/23/2024]
Abstract
Protein nanoparticles are an attractive class of materials for nanomedicine applications due to the intrinsic biocompatibility, biodegradability, and intrinsic functionality of their constituent proteins. Despite the clinical success of select protein nanoparticles, this class of nanocarriers remains understudied and underdeveloped compared to lipid and polymer nanoparticles due to challenges related to formulation optimization, large design space, and their structural complexity. In this work, a modular strategy for protein nanoparticle preparation based on the concept of photoreactive jetting is introduced. The process relies on continuous ultraviolet irradiation during electrohydrodynamic (EHD) jetting of protein solutions that contain a homobifunctional photocrosslinker. Protein nanoparticles exhibit nanogel-like architectures comprised of proteins that are linked via synthetic moieties. Compared to conventional protein nanoparticles, this method reduces nanoparticle processing times to minutes, rather than hours to days. The inclusion of an emissive structural motif as the molecular scaffold of the photocrosslinker is used to study the supramolecular architecture of the stable nanoparticles via time-resolved fluorescence spectroscopy.
Affiliation(s)
- Anthony J Berardi
- Macromolecular Science and Engineering Program, Ann Arbor, 48109, USA
- Biointerfaces Institute, Ann Arbor, 48109, USA
| | - Sonja D Francisco
- Biointerfaces Institute, Ann Arbor, 48109, USA
- Department of Chemistry, Ann Arbor, 48109, USA
| | - Albert Chang
- Biointerfaces Institute, Ann Arbor, 48109, USA
- Department of Materials Science and Engineering, Ann Arbor, 48109, USA
| | - Julio C Zelaya
- Macromolecular Science and Engineering Program, Ann Arbor, 48109, USA
- Biointerfaces Institute, Ann Arbor, 48109, USA
| | - Jeffery E Raymond
- Biointerfaces Institute, Ann Arbor, 48109, USA
- Department of Chemical Engineering, Ann Arbor, 48109, USA
- Center for Complex Particle Systems, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Joerg Lahann
- Macromolecular Science and Engineering Program, Ann Arbor, 48109, USA
- Biointerfaces Institute, Ann Arbor, 48109, USA
- Department of Materials Science and Engineering, Ann Arbor, 48109, USA
- Department of Chemical Engineering, Ann Arbor, 48109, USA
- Center for Complex Particle Systems, University of Michigan, Ann Arbor, MI, 48109, USA
|
33
|
SCUBA-D: a freshly trained diffusion model generates high-quality protein structures. Nat Methods 2024; 21:1990-1991. [PMID: 39468213 DOI: 10.1038/s41592-024-02465-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/30/2024]
|
34
|
Koga N, Tatsumi-Koga R. Inventing Novel Protein Folds. J Mol Biol 2024; 436:168791. [PMID: 39260686 DOI: 10.1016/j.jmb.2024.168791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 09/04/2024] [Accepted: 09/05/2024] [Indexed: 09/13/2024]
Abstract
The vastness of the unexplored protein fold universe remains a significant question. Through systematic de novo design of proteins with novel αβ-folds, we demonstrated that nature has explored only a tiny portion of the possible folds. Numerous possible protein folds are still untouched by nature. This review outlines this study and discusses the prospects for the design of functional proteins with novel folds.
Affiliation(s)
- Nobuyasu Koga
- Laboratory for Protein Design, Institute for Protein Research (IPR), Osaka University, Suita, Osaka 565-0871, Japan; Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Aichi 444-8585, Japan.
| | - Rie Tatsumi-Koga
- Laboratory for Protein Design, Institute for Protein Research (IPR), Osaka University, Suita, Osaka 565-0871, Japan
|
35
|
Zhang P, Wei L, Li J, Wang X. Artificial intelligence-guided strategies for next-generation biological sequence design. Natl Sci Rev 2024; 11:nwae343. [PMID: 39606146 PMCID: PMC11601974 DOI: 10.1093/nsr/nwae343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Revised: 09/20/2024] [Accepted: 09/25/2024] [Indexed: 11/29/2024] Open
Affiliation(s)
- Pengcheng Zhang
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, China
| | - Lei Wei
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, China
| | - Jiaqi Li
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, China
| | - Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, China
|
36
|
Liu Y, Wang S, Dong J, Chen L, Wang X, Wang L, Li F, Wang C, Zhang J, Wang Y, Wei S, Chen Q, Liu H. De novo protein design with a denoising diffusion network independent of pretrained structure prediction models. Nat Methods 2024; 21:2107-2116. [PMID: 39384986 DOI: 10.1038/s41592-024-02437-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 08/30/2024] [Indexed: 10/11/2024]
Abstract
The recent success of RFdiffusion, a method for protein structure design with a denoising diffusion probabilistic model, has relied on fine-tuning the RoseTTAFold structure prediction network for protein backbone denoising. Here, we introduce SCUBA-diffusion (SCUBA-D), a protein backbone denoising diffusion probabilistic model freshly trained by considering co-diffusion of sequence representation to enhance model regularization and adversarial losses to minimize data-out-of-distribution errors. While matching the performance of the pretrained RoseTTAFold-based RFdiffusion in generating experimentally realizable protein structures, SCUBA-D readily generates protein structures with not-yet-observed overall folds that are different from those predictable with RoseTTAFold. The accuracy of SCUBA-D was confirmed by the X-ray structures of 16 designed proteins and a protein complex, and by experiments validating designed heme-binding proteins and Ras-binding proteins. Our work shows that deep generative models of images or texts can be fruitfully extended to complex physical objects like protein structures by addressing outstanding issues such as the data-out-of-distribution errors.
Affiliation(s)
- Yufeng Liu
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, Hefei National Research Center for Physical Sciences at the Microscale, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, University of Science and Technology of China, Hefei, China
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Sheng Wang
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, Hefei National Research Center for Physical Sciences at the Microscale, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, University of Science and Technology of China, Hefei, China
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Jixin Dong
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, Hefei National Research Center for Physical Sciences at the Microscale, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, University of Science and Technology of China, Hefei, China
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | | | - Xinyu Wang
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, Hefei National Research Center for Physical Sciences at the Microscale, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, University of Science and Technology of China, Hefei, China
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Lei Wang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Fudong Li
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Biomedical Sciences and Health Laboratory of Anhui Province, Anhui Basic Discipline Research Center of Artificial Intelligence Biotechnology and Synthetic Biology, University of Science and Technology of China, Hefei, China
| | - Chenchen Wang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Jiahai Zhang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Biomedical Sciences and Health Laboratory of Anhui Province, Anhui Basic Discipline Research Center of Artificial Intelligence Biotechnology and Synthetic Biology, University of Science and Technology of China, Hefei, China
| | - Yuzhu Wang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Si Wei
- iFLYTEK Research, Hefei, China
| | - Quan Chen
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, Hefei National Research Center for Physical Sciences at the Microscale, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, University of Science and Technology of China, Hefei, China.
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China.
- Oristruct Biotech Co. Ltd, Hefei, China.
- Biomedical Sciences and Health Laboratory of Anhui Province, Anhui Basic Discipline Research Center of Artificial Intelligence Biotechnology and Synthetic Biology, University of Science and Technology of China, Hefei, China.
| | - Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China.
- Oristruct Biotech Co. Ltd, Hefei, China.
- Biomedical Sciences and Health Laboratory of Anhui Province, Anhui Basic Discipline Research Center of Artificial Intelligence Biotechnology and Synthetic Biology, University of Science and Technology of China, Hefei, China.
- School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Hefei, China.
|
37
|
Abriata LA. The Nobel Prize in Chemistry: past, present, and future of AI in biology. Commun Biol 2024; 7:1409. [PMID: 39472680 PMCID: PMC11522274 DOI: 10.1038/s42003-024-07113-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Accepted: 10/21/2024] [Indexed: 11/02/2024] Open
Abstract
A Comment on the transformative progress of artificial intelligence for structural and protein biology, referencing the 2024 Nobel Prize in Chemistry.
Affiliation(s)
- Luciano A Abriata
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, CH-1015, Lausanne, Switzerland.
|
38
|
Chen X, Xu S, Chu B, Guo J, Zhang H, Sun S, Song L, Feng XQ. Applying Spatiotemporal Modeling of Cell Dynamics to Accelerate Drug Development. ACS Nano 2024; 18:29311-29336. [PMID: 39420743 DOI: 10.1021/acsnano.4c12599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2024]
Abstract
Cells act as physical computational programs that utilize input signals to orchestrate molecule-level protein-protein interactions (PPIs), generating and responding to forces, ultimately shaping all of the physiological and pathophysiological behaviors. Genome editing and molecule drugs targeting PPIs hold great promise for the treatments of diseases. Linking genes and molecular drugs with protein-performed cellular behaviors is a key yet challenging issue due to the wide range of spatial and temporal scales involved. Building predictive spatiotemporal modeling systems that can describe the dynamic behaviors of cells intervened by genome editing and molecular drugs at the intersection of biology, chemistry, physics, and computer science will greatly accelerate pharmaceutical advances. Here, we review the mechanical roles of cytoskeletal proteins in orchestrating cellular behaviors alongside significant advancements in biophysical modeling while also addressing the limitations in these models. Then, by integrating generative artificial intelligence (AI) with spatiotemporal multiscale biophysical modeling, we propose a computational pipeline for developing virtual cells, which can simulate and evaluate the therapeutic effects of drugs and genome editing technologies on various cell dynamic behaviors and could have broad biomedical applications. Such virtual cell modeling systems might revolutionize modern biomedical engineering by moving most of the painstaking wet-laboratory effort to computer simulations, substantially saving time and alleviating the financial burden for pharmaceutical industries.
Affiliation(s)
- Xindong Chen
  - Institute of Biomechanics and Medical Engineering, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China
  - BioMap, Beijing 100144, China
- Shihao Xu
  - Institute of Biomechanics and Medical Engineering, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China
- Bizhu Chu
  - School of Pharmacy, Shenzhen University, Shenzhen 518055, China
  - Medical School, Shenzhen University, Shenzhen 518055, China
- Jing Guo
  - Department of Medical Oncology, Xiamen Key Laboratory of Antitumor Drug Transformation Research, The First Affiliated Hospital of Xiamen University, Xiamen 361000, China
- Huikai Zhang
  - Institute of Biomechanics and Medical Engineering, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China
- Shuyi Sun
  - Institute of Biomechanics and Medical Engineering, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China
- Le Song
  - BioMap, Beijing 100144, China
- Xi-Qiao Feng
  - Institute of Biomechanics and Medical Engineering, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China

39
Frank C, Khoshouei A, Fuß L, Schiwietz D, Putz D, Weber L, Zhao Z, Hattori M, Feng S, de Stigter Y, Ovchinnikov S, Dietz H. Scalable protein design using optimization in a relaxed sequence space. Science 2024; 386:439-445. [PMID: 39446959 PMCID: PMC11734486 DOI: 10.1126/science.adq1741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Accepted: 09/13/2024] [Indexed: 10/26/2024]
Abstract
Machine learning (ML)-based design approaches have advanced the field of de novo protein design, with diffusion-based generative methods increasingly dominating protein design pipelines. Here, we report a "hallucination"-based protein design approach that functions in relaxed sequence space, enabling the efficient design of high-quality protein backbones over multiple scales and with broad scope of application without the need for any form of retraining. We experimentally produced and characterized more than 100 proteins. Three high-resolution crystal structures and two cryo-electron microscopy density maps of designed single-chain proteins comprising up to 1000 amino acids validate the accuracy of the method. Our pipeline can also be used to design synthetic protein-protein interactions, as validated experimentally by a set of protein heterodimers. Relaxed sequence optimization offers attractive performance with respect to designability, scope of applicability for different design problems, and scalability across protein sizes.
Affiliation(s)
- Christopher Frank
  - Laboratory for Biomolecular Nanotechnology, Department of Biosciences, School of Natural Sciences, Technical University of Munich, Am Coulombwall 4a, 85748 Garching, Germany
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Boltzmannstraße 11, 85748 Garching, Germany
- Ali Khoshouei
  - Laboratory for Biomolecular Nanotechnology, Department of Biosciences, School of Natural Sciences, Technical University of Munich, Am Coulombwall 4a, 85748 Garching, Germany
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Boltzmannstraße 11, 85748 Garching, Germany
- Lara Fuß
  - Laboratory for Biomolecular Nanotechnology, Department of Biosciences, School of Natural Sciences, Technical University of Munich, Am Coulombwall 4a, 85748 Garching, Germany
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Boltzmannstraße 11, 85748 Garching, Germany
- Dominik Schiwietz
  - Laboratory for Biomolecular Nanotechnology, Department of Biosciences, School of Natural Sciences, Technical University of Munich, Am Coulombwall 4a, 85748 Garching, Germany
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Boltzmannstraße 11, 85748 Garching, Germany
- Dominik Putz
  - Laboratory for Biomolecular Nanotechnology, Department of Biosciences, School of Natural Sciences, Technical University of Munich, Am Coulombwall 4a, 85748 Garching, Germany
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Boltzmannstraße 11, 85748 Garching, Germany
- Lara Weber
  - Laboratory for Biomolecular Nanotechnology, Department of Biosciences, School of Natural Sciences, Technical University of Munich, Am Coulombwall 4a, 85748 Garching, Germany
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Boltzmannstraße 11, 85748 Garching, Germany
- Zhixuan Zhao
  - State Key Laboratory of Genetic Engineering, Shanghai Key Laboratory of Bioactive Small Molecules, Collaborative Innovation Center of Genetics and Development, Department of Physiology and Neurobiology, School of Life Sciences, Fudan University, 2005 Songhu Road, Yangpu District, Shanghai 200438, China
- Motoyuki Hattori
  - State Key Laboratory of Genetic Engineering, Shanghai Key Laboratory of Bioactive Small Molecules, Collaborative Innovation Center of Genetics and Development, Department of Physiology and Neurobiology, School of Life Sciences, Fudan University, 2005 Songhu Road, Yangpu District, Shanghai 200438, China
- Yosta de Stigter
  - Laboratory for Biomolecular Nanotechnology, Department of Biosciences, School of Natural Sciences, Technical University of Munich, Am Coulombwall 4a, 85748 Garching, Germany
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Boltzmannstraße 11, 85748 Garching, Germany
- Sergey Ovchinnikov
  - Faculty of Applied Sciences, Harvard University, Cambridge MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Hendrik Dietz
  - Laboratory for Biomolecular Nanotechnology, Department of Biosciences, School of Natural Sciences, Technical University of Munich, Am Coulombwall 4a, 85748 Garching, Germany
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Boltzmannstraße 11, 85748 Garching, Germany

40
Tripp A, Braun M, Wieser F, Oberdorfer G, Lechner H. Click, Compute, Create: A Review of Web-based Tools for Enzyme Engineering. Chembiochem 2024; 25:e202400092. [PMID: 38634409 DOI: 10.1002/cbic.202400092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 04/19/2024]
Abstract
Enzyme engineering, though pivotal across various biotechnological domains, is often plagued by its time-consuming and labor-intensive nature. This review aims to offer an overview of supportive in silico methodologies for this demanding endeavor. Starting from methods to predict protein structures, to classify their activity, and even to discover new enzymes, we continue by describing tools used to increase the thermostability and production yields of selected targets. Subsequently, we discuss computational methods to modulate both the activity and the selectivity of enzymes. Last, we present recent approaches based on cutting-edge machine learning methods to redesign enzymes. With the exception of the last chapter, there is a strong focus on methods easily accessible via web interfaces or simple Python scripts, and therefore readily usable by a diverse and broad community.
Affiliation(s)
- Adrian Tripp
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Markus Braun
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Florian Wieser
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Gustav Oberdorfer
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
  - BioTechMed, Graz, Austria
- Horst Lechner
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
  - BioTechMed, Graz, Austria

41
Harteveld Z, Van Hall-Beauvais A, Morozova I, Southern J, Goverde C, Georgeon S, Rosset S, Defferrard M, Loukas A, Vandergheynst P, Bronstein MM, Correia BE. Exploring "dark-matter" protein folds using deep learning. Cell Syst 2024; 15:898-910.e5. [PMID: 39383860 DOI: 10.1016/j.cels.2024.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 06/13/2024] [Accepted: 09/16/2024] [Indexed: 10/11/2024]
Abstract
De novo protein design explores uncharted sequence and structure space to generate novel proteins not sampled by evolution. A main challenge in de novo design involves crafting "designable" structural templates to guide the sequence searches toward adopting target structures. We present a convolutional variational autoencoder that learns patterns of protein structure, dubbed Genesis. We coupled Genesis with trRosetta to design sequences for a set of protein folds and found that Genesis is capable of reconstructing native-like distance and angle distributions for five native folds and three novel, so-called "dark-matter" folds, as a demonstration of generalizability. We used a high-throughput assay to characterize the stability of the designs through protease resistance, obtaining encouraging success rates for folded proteins. Genesis enables exploration of the protein fold space within minutes, unrestricted by protein topologies. Our approach addresses the backbone designability problem, showing that small neural networks can efficiently learn structural patterns in proteins. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Zander Harteveld
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Alexandra Van Hall-Beauvais
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Irina Morozova
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Casper Goverde
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Stéphane Rosset
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Andreas Loukas
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Prescient Design, gRED, Roche, Basel, Switzerland
- Bruno E Correia
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland

42
Wang S, Favor A, Kibler R, Lubner J, Borst AJ, Coudray N, Redler RL, Chiang HT, Sheffler W, Hsia Y, Li Z, Ekiert DC, Bhabha G, Pozzo LD, Baker D. Bond-centric modular design of protein assemblies. bioRxiv 2024:2024.10.11.617872. [PMID: 39416012 PMCID: PMC11483063 DOI: 10.1101/2024.10.11.617872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2024]
Abstract
We describe a modular bond-centric approach to protein nanomaterial design inspired by the rich diversity of chemical structures that can be generated from the small number of atomic valencies and bonding interactions. We design protein building blocks with regular coordination geometries and bonding interactions that enable the assembly of a wide variety of closed and open nanomaterials using simple geometrical principles. Experimental characterization confirms successful formation of more than twenty multi-component polyhedral protein cages, 2D arrays, and 3D protein lattices, with a high (10-50%) success rate and electron microscopy data closely matching the corresponding design models. Because of the modularity, individual building blocks can assemble with different partners to generate distinct regular assemblies, resulting in an economy of parts and enabling the construction of reconfigurable systems.
Affiliation(s)
- Shunzhi Wang
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Andrew Favor
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA
- Ryan Kibler
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Joshua Lubner
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Andrew J. Borst
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Nicolas Coudray
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Medicine, Division of Precision Medicine, NYU Grossman School of Medicine, New York, USA
- Rachel L. Redler
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Huat Thart Chiang
  - Department of Chemical Engineering, University of Washington, Seattle, WA, USA
- William Sheffler
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Yang Hsia
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Zhe Li
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Damian C. Ekiert
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Gira Bhabha
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Lilo D Pozzo
  - Department of Chemical Engineering, University of Washington, Seattle, WA, USA
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA

43
Norton T, Bhattacharya D. Sifting through the noise: A survey of diffusion probabilistic models and their applications to biomolecules. J Mol Biol 2024:168818. [PMID: 39389290 DOI: 10.1016/j.jmb.2024.168818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 09/20/2024] [Accepted: 10/03/2024] [Indexed: 10/12/2024]
Abstract
Diffusion probabilistic models have made their way into a number of high-profile applications since their inception. In particular, there has been a wave of research into using diffusion models in the prediction and design of biomolecular structures and sequences. Their growing ubiquity makes it imperative for researchers in these fields to understand them. This paper serves as a general overview for the theory behind these models and the current state of research. We first introduce diffusion models and discuss common motifs used when applying them to biomolecules. We then present the significant outcomes achieved through the application of these models in generative and predictive tasks. This survey aims to provide readers with a comprehensive understanding of the increasingly critical role of diffusion models.
Affiliation(s)
- Trevor Norton
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States

44
Joshi CK, Jamasb AR, Viñas R, Harris C, Mathis S, Morehead A, Anand R, Liò P. gRNAde: Geometric Deep Learning for 3D RNA inverse design. bioRxiv 2024:2024.03.31.587283. [PMID: 38826198 PMCID: PMC11142113 DOI: 10.1101/2024.03.31.587283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D geometry and conformational diversity. We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. gRNAde uses a multi-state Graph Neural Network and autoregressive decoding to generate candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown. On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. (2010), gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent ribozyme. Open source code: github.com/chaitjo/geometric-rna-design.
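For readers unfamiliar with the recovery figures quoted in this abstract, native sequence recovery is commonly taken to be the fraction of positions at which a designed sequence matches the native one, averaged over designs. The sketch below illustrates that calculation only; the function names and toy sequences are hypothetical and are not taken from the gRNAde code base or its benchmark.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions at which the designed sequence matches the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)


def mean_recovery(native: str, designs: list[str]) -> float:
    """Average recovery over several designs for the same backbone."""
    if not designs:
        raise ValueError("at least one design is required")
    return sum(sequence_recovery(native, d) for d in designs) / len(designs)


# Toy RNA sequences (hypothetical, for illustration only).
native = "GGCAUCGA"
designs = ["GGCAUCGA", "GGCUUCGU", "AGCAACGA"]
print(f"mean recovery: {mean_recovery(native, designs):.3f}")
```

A benchmark-level number such as the averages cited above would, under this reading, be a further mean over all target structures.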
Affiliation(s)
- Arian R Jamasb
  - University of Cambridge, UK
  - Prescient Design, Genentech, Roche

45
Min X, Liao Y, Chen X, Yang Q, Ying J, Zou J, Yang C, Zhang J, Ge S, Xia N. PB-GPT: An innovative GPT-based model for protein backbone generation. Structure 2024; 32:1820-1833.e5. [PMID: 39173620 DOI: 10.1016/j.str.2024.07.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 06/02/2024] [Accepted: 07/28/2024] [Indexed: 08/24/2024]
Abstract
With advanced computational methods, it is now feasible to modify or design proteins for specific functions, a process with significant implications for disease treatment and other medical applications. Protein structures and functions are intrinsically linked to their backbones, making the design of these backbones a pivotal aspect of protein engineering. In this study, we focus on the task of unconditionally generating protein backbones. By means of codebook quantization and compression dictionaries, we convert protein backbone structures into a distinctive coded language and propose a GPT-based protein backbone generation model, PB-GPT. To validate the generalization performance of the model, we trained and evaluated the model on both public datasets and small protein datasets. The results demonstrate that our model has the capability to unconditionally generate elaborate, highly realistic protein backbones with structural patterns resembling those of natural proteins, thus showcasing the significant potential of large language models in protein structure design.
Affiliation(s)
- Xiaoping Min
  - School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Yiyang Liao
  - School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Xiao Chen
  - School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Qianli Yang
  - Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Junjie Ying
  - Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Jiajun Zou
  - School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Chongzhou Yang
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Jun Zhang
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Shengxiang Ge
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Ningshao Xia
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China

46
Chu LS, Sarma S, Gray JJ. Unified Sampling and Ranking for Protein Docking with DFMDock. bioRxiv 2024:2024.09.27.615401. [PMID: 39386449 PMCID: PMC11463455 DOI: 10.1101/2024.09.27.615401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Diffusion models have shown promise in addressing the protein docking problem. Traditionally, these models are used solely for sampling docked poses, with a separate confidence model for ranking. We introduce DFMDock (Denoising Force Matching Dock), a diffusion model that unifies sampling and ranking within a single framework. DFMDock features two output heads: one for predicting forces and the other for predicting energies. The forces are trained using a denoising force matching objective, while the energy gradients are trained to align with the forces. This design enables our model to sample using the predicted forces and rank poses using the predicted energies, thereby eliminating the need for an additional confidence model. Our approach outperforms the previous diffusion model for protein docking, DiffDock-PP, with a sampling success rate of 44% compared to its 8%, and a Top-1 ranking success rate of 16% compared to 0% on the Docking Benchmark 5.5 test set. In successful decoy cases, the DFMDock Energy forms a binding funnel similar to the physics-based Rosetta Energy, suggesting that DFMDock can capture the underlying energy landscape.
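The relationship between the two output heads described in this abstract can be illustrated with a toy example: if forces are the negative gradient of an energy, the same energy can be used to rank candidate poses. The sketch below is not DFMDock and involves no learned model or denoising objective; the harmonic energy, the three-component "pose" vectors, and all function names are hypothetical stand-ins chosen only to show the consistency condition and the ranking step.

```python
TARGET = (1.0, -2.0, 0.5)  # hypothetical "bound" pose (translation only)
K = 2.0                    # hypothetical spring constant


def energy(pose):
    """Toy harmonic energy; stands in for a learned energy head."""
    return 0.5 * K * sum((p - t) ** 2 for p, t in zip(pose, TARGET))


def force(pose):
    """Toy force; stands in for a learned force head (here exactly -dE/dx)."""
    return tuple(-K * (p - t) for p, t in zip(pose, TARGET))


def negative_gradient(f, pose, h=1e-5):
    """Central finite-difference estimate of -grad f at the given pose."""
    grad = []
    for i in range(len(pose)):
        plus, minus = list(pose), list(pose)
        plus[i] += h
        minus[i] -= h
        grad.append(-(f(plus) - f(minus)) / (2 * h))
    return tuple(grad)


def alignment_loss(pose):
    """Squared mismatch between the force and the negative energy gradient."""
    predicted = force(pose)
    from_energy = negative_gradient(energy, pose)
    return sum((a - b) ** 2 for a, b in zip(predicted, from_energy))


# Rank candidate poses by energy, lowest first.
poses = [(0.0, 0.0, 0.0), (1.1, -1.9, 0.4), (3.0, 1.0, -2.0)]
ranked = sorted(poses, key=energy)
print("best pose:", ranked[0])
print("alignment loss at origin:", alignment_loss((0.0, 0.0, 0.0)))
```

In a trained model, a loss of this general shape would presumably be minimized over the energy head's parameters rather than merely evaluated, which is an assumption about the training setup rather than something stated in the abstract.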
Affiliation(s)
- Lee-Shin Chu
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Sudeep Sarma
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Jeffrey J Gray
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA

47
Lisanza SL, Gershon JM, Tipps SWK, Sims JN, Arnoldt L, Hendel SJ, Simma MK, Liu G, Yase M, Wu H, Tharp CD, Li X, Kang A, Brackenbrough E, Bera AK, Gerben S, Wittmann BJ, McShan AC, Baker D. Multistate and functional protein design using RoseTTAFold sequence space diffusion. Nat Biotechnol 2024:10.1038/s41587-024-02395-w. [PMID: 39322764 DOI: 10.1038/s41587-024-02395-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 08/21/2024] [Indexed: 09/27/2024]
Abstract
Protein denoising diffusion probabilistic models are used for the de novo generation of protein backbones but are limited in their ability to guide generation of proteins with sequence-specific attributes and functional properties. To overcome this limitation, we developed ProteinGenerator (PG), a sequence space diffusion model based on RoseTTAFold that simultaneously generates protein sequences and structures. Beginning from a noised sequence representation, PG generates sequence and structure pairs by iterative denoising, guided by desired sequence and structural protein attributes. We designed thermostable proteins with varying amino acid compositions and internal sequence repeats and cage bioactive peptides, such as melittin. By averaging sequence logits between diffusion trajectories with distinct structural constraints, we designed multistate parent-child protein triples in which the same sequence folds to different supersecondary structures when intact in the parent versus split into two child domains. PG design trajectories can be guided by experimental sequence-activity data, providing a general approach for integrated computational and experimental optimization of protein function.
Affiliation(s)
- Sidney Lyayuga Lisanza
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA, USA
- Jacob Merle Gershon
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Department of Molecular Engineering, University of Washington, Seattle, WA, USA
- Samuel W K Tipps
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Jeremiah Nelson Sims
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Molecular & Cellular Biology, Medical Scientist Training Program, University of Washington, Seattle, WA, USA
- Lucas Arnoldt
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Faculty of Engineering Sciences, Heidelberg University, Heidelberg, Germany
- Samuel J Hendel
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Miriam K Simma
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
- Ge Liu
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Muna Yase
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Department of Molecular Engineering, University of Washington, Seattle, WA, USA
- Hongwei Wu
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
- Claire D Tharp
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
- Xinting Li
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Alex Kang
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Asim K Bera
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Stacey Gerben
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Bruce J Wittmann
  - Office of the Chief Scientific Officer, Microsoft, Redmond, WA, USA
- Andrew C McShan
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA

48
Liu J, Guo Z, You H, Zhang C, Lai L. All-Atom Protein Sequence Design Based on Geometric Deep Learning. Angew Chem Int Ed Engl 2024:e202411461. [PMID: 39295564 DOI: 10.1002/anie.202411461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Revised: 09/09/2024] [Accepted: 09/18/2024] [Indexed: 09/21/2024]
Abstract
Designing sequences for specific protein backbones is a key step in creating new functional proteins. Here, we introduce GeoSeqBuilder, a deep learning framework that integrates protein sequence generation with side chain conformation prediction to produce complete all-atom structures for designed sequences. GeoSeqBuilder uses spatial geometric features from protein backbones and explicitly includes three-body interactions of neighboring residues. GeoSeqBuilder achieves a native residue type recovery rate of 51.6 %, comparable to ProteinMPNN and other leading methods, while accurately predicting side chain conformations. We first used GeoSeqBuilder to design sequences for thioredoxin and a hallucinated three-helical bundle protein. All 15 tested sequences expressed as soluble monomeric proteins with high thermal stability, and the 2 high-resolution crystal structures solved closely match the designed models. The generated protein sequences exhibit low similarity (minimum 23 %) to the original sequences, with significantly altered hydrophobic cores. We further redesigned the hydrophobic core of glutathione peroxidase 4, and 3 of the 5 designs showed improved enzyme activity. Although further testing is needed, the high experimental success rate in our testing demonstrates that GeoSeqBuilder is a powerful tool for designing novel sequences for predefined protein structures with atomic details. GeoSeqBuilder is available at https://github.com/PKUliujl/GeoSeqBuilder.
Collapse
Affiliation(s)
- Jiale Liu
- Center for Life Sciences Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
| | - Zheng Guo
- Center for Life Sciences Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
| | - Hantian You
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
| | - Changsheng Zhang
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
| | - Luhua Lai
- Center for Life Sciences Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Peking University, Chengdu, 510100, Sichuan, China
| |
|
49
|
McCoy KM, Ackerman ME, Grigoryan G. A comparison of antibody-antigen complex sequence-to-structure prediction methods and their systematic biases. Protein Sci 2024; 33:e5127. [PMID: 39167052 PMCID: PMC11337930 DOI: 10.1002/pro.5127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 06/24/2024] [Accepted: 07/14/2024] [Indexed: 08/23/2024]
Abstract
The ability to accurately predict antibody-antigen complex structures from their sequences could greatly advance our understanding of the immune system and would aid in the development of novel antibody therapeutics. There have been considerable recent advancements in predicting protein-protein interactions (PPIs) fueled by progress in machine learning (ML). To understand the current state of the field, we compare six representative methods for predicting antibody-antigen complexes from sequence, including two deep learning approaches trained to predict PPIs in general (AlphaFold-Multimer and RoseTTAFold), two composite methods that initially predict antibody and antigen structures separately and dock them (using antibody-mode ClusPro), local refinement in Rosetta (SnugDock) of globally docked poses from ClusPro, and a pipeline combining homology modeling with rigid-body docking informed by ML-based epitope and paratope prediction (AbAdapt). We find that AlphaFold-Multimer outperformed other methods, although the absolute performance leaves considerable room for improvement. AlphaFold-Multimer models of lower quality display significant structural biases at the level of tertiary motifs (TERMs) toward having fewer structural matches in non-antibody-containing structures from the Protein Data Bank (PDB). Specifically, better models exhibit more common PDB-like TERMs at the antibody-antigen interface than worse ones. Importantly, the clear relationship between performance and the commonness of interfacial TERMs suggests that the scarcity of interfacial geometry data in the structural database may currently limit the application of ML to the prediction of antibody-antigen interactions.
Affiliation(s)
- Katherine Maia McCoy
- Molecular and Cell Biology Graduate Program, Dartmouth College, Hanover, New Hampshire, USA
| | - Margaret E. Ackerman
- Molecular and Cell Biology Graduate Program, Dartmouth College, Hanover, New Hampshire, USA
- Thayer School of Engineering, Dartmouth College, Hanover, New Hampshire, USA
| | - Gevorg Grigoryan
- Molecular and Cell Biology Graduate Program, Dartmouth College, Hanover, New Hampshire, USA
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA
| |
|
50
|
Giraldo-Castaño MC, Littlejohn KA, Avecilla ARC, Barrera-Villamizar N, Quiroz FG. Programmability and biomedical utility of intrinsically-disordered protein polymers. Adv Drug Deliv Rev 2024; 212:115418. [PMID: 39094909 PMCID: PMC11389844 DOI: 10.1016/j.addr.2024.115418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Revised: 07/03/2024] [Accepted: 07/29/2024] [Indexed: 08/04/2024]
Abstract
Intrinsically disordered proteins (IDPs) exhibit molecular-level conformational dynamics that are functionally harnessed across a wide range of fascinating biological phenomena. The low sequence complexity of IDPs has led to the design and development of intrinsically-disordered protein polymers (IDPPs), a class of engineered repeat IDPs with stimuli-responsive properties. The perfect repetitive architecture of IDPPs allows for repeat-level encoding of tunable protein functionality. Designer IDPPs can be modeled on endogenous IDPs or engineered de novo as protein polymers with dual biophysical and biological functionality. Their properties can be rationally tailored to access enigmatic IDP biology and to create programmable smart biomaterials. With the goal of inspiring the bioengineering of multifunctional IDP-based materials, here we synthesize recent multidisciplinary progress in programming and exploiting the bio-functionality of IDPPs and IDPP-containing proteins. Collectively, expanding beyond the traditional sequence space of extracellular IDPs, emergent sequence-level control of IDPP functionality is fueling the bioengineering of self-assembling biomaterials, advanced drug delivery systems, tissue scaffolds, and biomolecular condensates (genetically encoded organelle-like structures). Looking forward, we emphasize open challenges and emerging opportunities, arguing that the intracellular behaviors of IDPPs represent a rich space for biomedical discovery and innovation. Combined with the intense focus on IDP biology, the growing landscape of IDPPs and their biomedical applications set the stage for the accelerated engineering of high-value biotechnologies and biomaterials.
Affiliation(s)
- Maria Camila Giraldo-Castaño
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Kai A Littlejohn
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Alexa Regina Chua Avecilla
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Natalia Barrera-Villamizar
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Felipe Garcia Quiroz
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| |
|