1
Tripp A, Braun M, Wieser F, Oberdorfer G, Lechner H. Click, Compute, Create: A Review of Web-based Tools for Enzyme Engineering. Chembiochem 2024; 25:e202400092. [PMID: 38634409 DOI: 10.1002/cbic.202400092] [Received: 01/31/2024] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 04/19/2024]
Abstract
Enzyme engineering, though pivotal across various biotechnological domains, is often plagued by its time-consuming and labor-intensive nature. This review aims to offer an overview of supportive in silico methodologies for this demanding endeavor. We start with methods to predict protein structures, classify their activity, and discover new enzymes, and continue with tools used to increase the thermostability and production yields of selected targets. Subsequently, we discuss computational methods to modulate both the activity and the selectivity of enzymes. Last, we present recent approaches based on cutting-edge machine learning methods to redesign enzymes. With the exception of the last chapter, there is a strong focus on methods easily accessible via web interfaces or simple Python scripts, and therefore readily usable by a diverse and broad community.
Affiliation(s)
- Adrian Tripp
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Markus Braun
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Florian Wieser
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Gustav Oberdorfer
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
  - BioTechMed, Graz, Austria
- Horst Lechner
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
  - BioTechMed, Graz, Austria
2
Min X, Liao Y, Chen X, Yang Q, Ying J, Zou J, Yang C, Zhang J, Ge S, Xia N. PB-GPT: An innovative GPT-based model for protein backbone generation. Structure 2024; 32:1820-1833.e5. [PMID: 39173620 DOI: 10.1016/j.str.2024.07.016] [Received: 02/21/2024] [Revised: 06/02/2024] [Accepted: 07/28/2024] [Indexed: 08/24/2024]
Abstract
With advanced computational methods, it is now feasible to modify or design proteins for specific functions, a process with significant implications for disease treatment and other medical applications. Protein structures and functions are intrinsically linked to their backbones, making the design of these backbones a pivotal aspect of protein engineering. In this study, we focus on the task of unconditionally generating protein backbones. By means of codebook quantization and compression dictionaries, we convert protein backbone structures into a distinctive coded language and propose a GPT-based protein backbone generation model, PB-GPT. To validate the generalization performance of the model, we trained and evaluated the model on both public datasets and small protein datasets. The results demonstrate that our model has the capability to unconditionally generate elaborate, highly realistic protein backbones with structural patterns resembling those of natural proteins, thus showcasing the significant potential of large language models in protein structure design.
Affiliation(s)
- Xiaoping Min
  - School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Yiyang Liao
  - School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Xiao Chen
  - School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Qianli Yang
  - Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Junjie Ying
  - Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Jiajun Zou
  - School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Chongzhou Yang
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Jun Zhang
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Shengxiang Ge
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Ningshao Xia
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
3
Leone L, De Fenza M, Esposito A, Maglio O, Nastri F, Lombardi A. Peptides and metal ions: A successful marriage for developing artificial metalloproteins. J Pept Sci 2024; 30:e3606. [PMID: 38719781 DOI: 10.1002/psc.3606] [Received: 02/19/2024] [Revised: 03/27/2024] [Accepted: 03/28/2024] [Indexed: 10/12/2024]
Abstract
The mutual relationship between peptides and metal ions enables metalloproteins to have crucial roles in biological systems, including structural, sensing, electron transport, and catalytic functions. The effort to reproduce or/and enhance these roles, or even to create unprecedented functions, is the focus of protein design, the first step toward the comprehension of the complex machinery of nature. Nowadays, protein design allows the building of sophisticated scaffolds, with novel functions and exceptional stability. Recent progress in metalloprotein design has led to the building of peptides/proteins capable of orchestrating the desired functions of different metal cofactors. The structural diversity of peptides allows proper selection of first- and second-shell ligands, as well as long-range electrostatic and hydrophobic interactions, which represent precious tools for tuning metal properties. The scope of this review is to discuss the construction of metal sites in de novo designed and miniaturized scaffolds. Selected examples of mono-, di-, and multi-nuclear binding sites, from the last 20 years will be described in an effort to highlight key artificial models of catalytic or electron-transfer metalloproteins. The authors' goal is to make readers feel like guests at the marriage between peptides and metal ions while offering sources of inspiration for future architects of innovative, artificial metalloproteins.
Affiliation(s)
- Linda Leone
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
- Maria De Fenza
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
- Alessandra Esposito
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
- Ornella Maglio
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
  - Institute of Biostructures and Bioimaging, National Research Council, Naples, Italy
- Flavia Nastri
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
- Angela Lombardi
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
4
Son A, Park J, Kim W, Yoon Y, Lee S, Park Y, Kim H. Revolutionizing Molecular Design for Innovative Therapeutic Applications through Artificial Intelligence. Molecules 2024; 29:4626. [PMID: 39407556 PMCID: PMC11477718 DOI: 10.3390/molecules29194626] [Received: 08/31/2024] [Revised: 09/19/2024] [Accepted: 09/27/2024] [Indexed: 10/20/2024]
Abstract
The field of computational protein engineering has been transformed by recent advancements in machine learning, artificial intelligence, and molecular modeling, enabling the design of proteins with unprecedented precision and functionality. Computational methods now play a crucial role in enhancing the stability, activity, and specificity of proteins for diverse applications in biotechnology and medicine. Techniques such as deep learning, reinforcement learning, and transfer learning have dramatically improved protein structure prediction, optimization of binding affinities, and enzyme design. These innovations have streamlined the process of protein engineering by allowing the rapid generation of targeted libraries, reducing experimental sampling, and enabling the rational design of proteins with tailored properties. Furthermore, the integration of computational approaches with high-throughput experimental techniques has facilitated the development of multifunctional proteins and novel therapeutics. However, challenges remain in bridging the gap between computational predictions and experimental validation and in addressing ethical concerns related to AI-driven protein design. This review provides a comprehensive overview of the current state and future directions of computational methods in protein engineering, emphasizing their transformative potential in creating next-generation biologics and advancing synthetic biology.
Affiliation(s)
- Ahrum Son
  - Department of Molecular Medicine, Scripps Research, La Jolla, CA 92037, USA
- Jongham Park
  - Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Woojin Kim
  - Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Yoonki Yoon
  - Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Sangwoon Lee
  - Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Yongho Park
  - Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Hyunsoo Kim
  - Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
  - Department of Convergent Bioscience and Informatics, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
  - Protein AI Design Institute, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
  - SCICS, Prove beyond AI, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
5
Liu J, Guo Z, You H, Zhang C, Lai L. All-Atom Protein Sequence Design Based on Geometric Deep Learning. Angew Chem Int Ed Engl 2024:e202411461. [PMID: 39295564 DOI: 10.1002/anie.202411461] [Received: 06/18/2024] [Revised: 09/09/2024] [Accepted: 09/18/2024] [Indexed: 09/21/2024]
Abstract
Designing sequences for specific protein backbones is a key step in creating new functional proteins. Here, we introduce GeoSeqBuilder, a deep learning framework that integrates protein sequence generation with side chain conformation prediction to produce the complete all-atom structures for designed sequences. GeoSeqBuilder uses spatial geometric features from protein backbones and explicitly includes three-body interactions of neighboring residues. GeoSeqBuilder achieves native residue type recovery rate of 51.6 %, comparable to ProteinMPNN and other leading methods, while accurately predicting side chain conformations. We first used GeoSeqBuilder to design sequences for thioredoxin and a hallucinated three-helical bundle protein. All the 15 tested sequences expressed as soluble monomeric proteins with high thermal stability, and the 2 high-resolution crystal structures solved closely match the designed models. The generated protein sequences exhibit low similarity (minimum 23 %) to the original sequences, with significantly altered hydrophobic cores. We further redesigned the hydrophobic core of glutathione peroxidase 4, and 3 of the 5 designs showed improved enzyme activity. Although further testing is needed, the high experimental success rate in our testing demonstrates that GeoSeqBuilder is a powerful tool for designing novel sequences for predefined protein structures with atomic details. GeoSeqBuilder is available at https://github.com/PKUliujl/GeoSeqBuilder.
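The native residue recovery rate quoted in this abstract is, in its usual form, the fraction of positions at which a designed sequence carries the same residue as the native sequence on the same backbone. The sketch below illustrates that calculation only; it is not taken from GeoSeqBuilder, and the function name and the two toy sequences are hypothetical.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue matches the native one.

    Assumes both sequences describe the same fixed backbone, so they are
    compared position by position without any alignment step.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


# Hypothetical toy sequences, not data from the paper.
native = "MKTAYIAKQR"
designed = "MKSAYLAKQK"
print(f"recovery = {sequence_recovery(native, designed):.1%}")  # recovery = 70.0%
```

The same position-wise comparison also gives a simple identity figure between a designed sequence and the original, which is one plausible reading of the "similarity" values reported above; the paper may define similarity differently.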
Affiliation(s)
- Jiale Liu
  - Center for Life Sciences Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Zheng Guo
  - Center for Life Sciences Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Hantian You
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
- Changsheng Zhang
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
- Luhua Lai
  - Center for Life Sciences Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
  - Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Peking University, Chengdu, 510100, Sichuan, China
6
Hardy BJ, Curnow P. Computational design of de novo bioenergetic membrane proteins. Biochem Soc Trans 2024; 52:1737-1745. [PMID: 38958574 DOI: 10.1042/bst20231347] [Received: 03/14/2024] [Revised: 06/11/2024] [Accepted: 06/17/2024] [Indexed: 07/04/2024]
Abstract
The major energy-producing reactions of biochemistry occur at biological membranes. Computational protein design now provides the opportunity to elucidate the underlying principles of these processes and to construct bioenergetic pathways on our own terms. Here, we review recent achievements in this endeavour of 'synthetic bioenergetics', with a particular focus on new enabling tools that facilitate the computational design of biocompatible de novo integral membrane proteins. We use recent examples to showcase some of the key computational approaches in current use and highlight that the overall philosophy of 'surface-swapping' - the replacement of solvent-facing residues with amino acids bearing lipid-soluble hydrophobic sidechains - is a promising avenue in membrane protein design. We conclude by highlighting outstanding design challenges and the emerging role of AI in sequence design and structure ideation.
Affiliation(s)
- Paul Curnow
  - School of Biochemistry, University of Bristol, Bristol, U.K.
7
Krapp LF, Meireles FA, Abriata LA, Devillard J, Vacle S, Marcaida MJ, Dal Peraro M. Context-aware geometric deep learning for protein sequence design. Nat Commun 2024; 15:6273. [PMID: 39054322 PMCID: PMC11272779 DOI: 10.1038/s41467-024-50571-y] [Received: 12/03/2023] [Accepted: 07/15/2024] [Indexed: 07/27/2024]
Abstract
Protein design and engineering are evolving at an unprecedented pace leveraging the advances in deep learning. Current models nonetheless cannot natively consider non-protein entities within the design process. Here, we introduce a deep learning approach based solely on a geometric transformer of atomic coordinates and element names that predicts protein sequences from backbone scaffolds aware of the restraints imposed by diverse molecular environments. To validate the method, we show that it can produce highly thermostable, catalytically active enzymes with high success rates. This concept is anticipated to improve the versatility of protein design pipelines for crafting desired functions.
Affiliation(s)
- Lucien F Krapp
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Fernando A Meireles
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Luciano A Abriata
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Jean Devillard
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Sarah Vacle
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Maria J Marcaida
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Matteo Dal Peraro
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
8
Xia Y, Du X, Liu B, Guo S, Huo YX. Species-specific design of artificial promoters by transfer-learning based generative deep-learning model. Nucleic Acids Res 2024; 52:6145-6157. [PMID: 38783063 PMCID: PMC11194083 DOI: 10.1093/nar/gkae429] [Received: 01/03/2024] [Revised: 04/04/2024] [Accepted: 05/08/2024] [Indexed: 05/25/2024]
Abstract
Native prokaryotic promoters share common sequence patterns, but are species dependent. For understudied species with limited data, it is challenging to predict the strength of existing promoters and generate novel promoters. Here, we developed PromoGen, a collection of nucleotide language models to generate species-specific functional promoters across dozens of species in a data- and parameter-efficient way. Twenty-seven species-specific models in this collection were fine-tuned from a pretrained model that was trained on multi-species promoters. When systematically compared with native promoters, the Escherichia coli- and Bacillus subtilis-specific artificial PromoGen-generated promoters (PGPs) were demonstrated to hold all distribution patterns of native promoters. A regression model was developed to score promoters generated either by PromoGen or by another competitive neural network, and the overall score of PGPs was higher. Encouraged by the in silico analysis, we further experimentally characterized twenty-two B. subtilis PGPs; the results showed that four of the tested PGPs reached the strong promoter level, while all were active. Furthermore, we developed a user-friendly website to generate species-specific promoters for 27 different species with PromoGen. This work presents an efficient deep-learning strategy for de novo species-specific promoter generation even with limited datasets, providing valuable promoter toolboxes, especially for the metabolic engineering of understudied microorganisms.
Affiliation(s)
- Yan Xia
  - Key Laboratory of Molecular Medicine and Biotherapy, School of Life Science, Beijing Institute of Technology, Beijing 100081, China
- Xiaowen Du
  - Key Laboratory of Molecular Medicine and Biotherapy, School of Life Science, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Shuyuan Guo
  - Key Laboratory of Molecular Medicine and Biotherapy, School of Life Science, Beijing Institute of Technology, Beijing 100081, China
- Yi-Xin Huo
  - Key Laboratory of Molecular Medicine and Biotherapy, School of Life Science, Beijing Institute of Technology, Beijing 100081, China
  - Tangshan Research Institute, Beijing Institute of Technology, Hebei 063611, China
9
Schmitz M, Ballestin JB, Liang J, Tomas F, Freist L, Voigt K, Di Ventura B, Öztürk MA. Int&in: A machine learning-based web server for active split site identification in inteins. Protein Sci 2024; 33:e4985. [PMID: 38717278 PMCID: PMC11078102 DOI: 10.1002/pro.4985] [Received: 10/12/2023] [Revised: 03/06/2024] [Accepted: 03/24/2024] [Indexed: 05/12/2024]
Abstract
Inteins are proteins that excise themselves out of host proteins and ligate the flanking polypeptides in an auto-catalytic process called protein splicing. In nature, inteins are either contiguous or split. In the case of split inteins, the two fragments must first form a complex for the splicing to occur. Contiguous inteins have previously been artificially split into two fragments because split inteins allow for different applications than contiguous ones. Even naturally split inteins have been split at unnatural split sites to obtain fragments with reduced affinity for one another, which are useful to create conditional inteins or to study protein-protein interactions. So far, split sites in inteins have been heuristically identified. We developed Int&in, a web server freely available for academic research (https://intein.biologie.uni-freiburg.de) that runs a machine learning model using logistic regression to predict active and inactive split sites in inteins with high accuracy. The model was trained on a dataset of 126 split sites generated using the gp41-1, Npu DnaE and CL inteins and validated using 97 split sites extracted from the literature. Despite the limited data size, the model, which uses various protein structural features as well as sequence conservation information, achieves an accuracy of 0.79 and 0.78 for the training and testing sets, respectively. We envision Int&in will facilitate the engineering of novel split inteins for applications in synthetic and cell biology.
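To make the two quantities in this abstract concrete, the sketch below shows how a logistic regression classifier turns a feature vector into an active/inactive call, and how accuracy is computed from such calls. It is a generic illustration, not the Int&in model: the two features, the weights, the bias and the toy labels are all hypothetical.

```python
import math


def predict_active(features, weights, bias, threshold=0.5):
    """Logistic regression decision: sigmoid of a weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability >= threshold


def accuracy(y_true, y_pred):
    """Fraction of predictions that agree with the known labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical weights and candidate split sites (two made-up features each).
weights, bias = [1.5, -2.0], 0.0
sites = [[1.0, 0.2], [0.1, 0.9], [0.8, 0.5], [0.3, 0.4]]
known_labels = [True, False, False, False]  # hypothetical experimental outcomes

predictions = [predict_active(site, weights, bias) for site in sites]
print(predictions, accuracy(known_labels, predictions))
```

In this toy case three of the four calls match the labels, so the accuracy is 0.75; the 0.79 and 0.78 figures above are the same ratio evaluated on the authors' training and testing sets.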
Affiliation(s)
- Mirko Schmitz
  - BIOSS and CIBSS Research Signalling Centers, University of Freiburg, Freiburg, Germany
  - Institute of Biology II, University of Freiburg, Freiburg, Germany
  - 4HF Biotec GmbH, Freiburg, Germany
- Jara Ballestin Ballestin
  - BIOSS and CIBSS Research Signalling Centers, University of Freiburg, Freiburg, Germany
  - Institute of Biology II, University of Freiburg, Freiburg, Germany
  - Bioprocess Innovation Unit, ViraTherapeutics GmbH, Rum, Austria
- Junsheng Liang
  - BIOSS and CIBSS Research Signalling Centers, University of Freiburg, Freiburg, Germany
  - Institute of Biology II, University of Freiburg, Freiburg, Germany
- Franziska Tomas
  - BIOSS and CIBSS Research Signalling Centers, University of Freiburg, Freiburg, Germany
  - Institute of Biology II, University of Freiburg, Freiburg, Germany
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Leon Freist
  - Institute of Biology III, University of Freiburg, Freiburg, Germany
- Karsten Voigt
  - Institute of Biology III, University of Freiburg, Freiburg, Germany
- Barbara Di Ventura
  - BIOSS and CIBSS Research Signalling Centers, University of Freiburg, Freiburg, Germany
  - Institute of Biology II, University of Freiburg, Freiburg, Germany
- Mehmet Ali Öztürk
  - BIOSS and CIBSS Research Signalling Centers, University of Freiburg, Freiburg, Germany
  - Institute of Biology II, University of Freiburg, Freiburg, Germany
10
Janson G, Feig M. Transferable deep generative modeling of intrinsically disordered protein conformations. PLoS Comput Biol 2024; 20:e1012144. [PMID: 38781245 PMCID: PMC11152266 DOI: 10.1371/journal.pcbi.1012144] [Received: 02/08/2024] [Revised: 06/05/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024]
Abstract
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent in the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity to sequences in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
Affiliation(s)
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
11
Harihar B, Saravanan KM, Gromiha MM, Selvaraj S. Importance of Inter-residue Contacts for Understanding Protein Folding and Unfolding Rates, Remote Homology, and Drug Design. Mol Biotechnol 2024:10.1007/s12033-024-01119-4. [PMID: 38498284 DOI: 10.1007/s12033-024-01119-4] [Received: 12/16/2023] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
Inter-residue interactions in protein structures provide valuable insights into protein folding and stability. Understanding these interactions can be helpful in many crucial applications, including rational design of therapeutic small molecules and biologics, locating functional protein sites, and predicting protein-protein and protein-ligand interactions. The process of developing machine learning models incorporating inter-residue interactions has been improved recently. This review highlights the theoretical models incorporating inter-residue interactions in predicting folding and unfolding rates of proteins. Utilizing contact maps to depict inter-residue interactions aids researchers in developing computer models for detecting remote homologs and interface residues within protein-protein complexes which, in turn, enhances our knowledge of the relationship between sequence and structure of proteins. Further, the application of contact maps derived from inter-residue interactions is highlighted in the field of drug discovery. Overall, this review presents an extensive assessment of the significant models that use inter-residue interactions to investigate folding rates, unfolding rates, remote homology, and drug development, providing potential future advancements in constructing efficient computational models in structural biology.
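A contact map of the kind this abstract refers to is commonly built by marking residue pairs whose representative atoms lie within a distance cutoff. The sketch below assumes one C-alpha coordinate per residue and an 8 Å cutoff, which is a widely used convention rather than a value stated in the review; the function name and the toy coordinates are hypothetical.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix of residue pairs within `cutoff` angstroms.

    coords: one (x, y, z) tuple per residue (assumed to be C-alpha positions).
    min_separation: smallest sequence separation |i - j| that is considered.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Hypothetical straight chain with 3.8 angstrom spacing between residues.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for row in contact_map(toy_coords):
    print(row)
```

Raising `min_separation` (for example to 3 or more) drops trivial near-neighbour contacts, which is a common choice when the focus is on non-local interactions.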
Collapse
Affiliation(s)
- Balasubramanian Harihar
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
| | - Konda Mani Saravanan
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, 600073, India
| | - Michael M Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
| | - Samuel Selvaraj
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India.
|
12
|
Pacesa M, Pelea O, Jinek M. Past, present, and future of CRISPR genome editing technologies. Cell 2024; 187:1076-1100. [PMID: 38428389 DOI: 10.1016/j.cell.2024.01.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/23/2024] [Accepted: 01/26/2024] [Indexed: 03/03/2024]
Abstract
Genome editing has been a transformative force in the life sciences and human medicine, offering unprecedented opportunities to dissect complex biological processes and treat the underlying causes of many genetic diseases. CRISPR-based technologies, with their remarkable efficiency and easy programmability, stand at the forefront of this revolution. In this Review, we discuss the current state of CRISPR gene editing technologies in both research and therapy, highlighting limitations that constrain them and the technological innovations that have been developed in recent years to address them. Additionally, we examine and summarize the current landscape of gene editing applications in the context of human health and therapeutics. Finally, we outline potential future developments that could shape gene editing technologies and their applications in the coming years.
Affiliation(s)
- Martin Pacesa
  - Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Station 19, CH-1015 Lausanne, Switzerland
- Oana Pelea
  - Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Martin Jinek
  - Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
|
13
|
Janson G, Feig M. Transferable deep generative modeling of intrinsically disordered protein conformations. bioRxiv 2024:2024.02.08.579522. [PMID: 38370653 PMCID: PMC10871340 DOI: 10.1101/2024.02.08.579522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent in the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
Affiliation(s)
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
|
14
|
Chu AE, Lu T, Huang PS. Sparks of function by de novo protein design. Nat Biotechnol 2024; 42:203-215. [PMID: 38361073 PMCID: PMC11366440 DOI: 10.1038/s41587-024-02133-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/05/2023] [Accepted: 01/09/2024] [Indexed: 02/17/2024]
Abstract
Information in proteins flows from sequence to structure to function, with each step causally driven by the preceding one. Protein design is founded on inverting this process: specify a desired function, design a structure executing this function, and find a sequence that folds into this structure. This 'central dogma' underlies nearly all de novo protein-design efforts. Our ability to accomplish these tasks depends on our understanding of protein folding and function and our ability to capture this understanding in computational methods. In recent years, deep learning-derived approaches for efficient and accurate structure modeling and enrichment of successful designs have enabled progression beyond the design of protein structures and towards the design of functional proteins. We examine these advances in the broader context of classical de novo protein design and consider implications for future challenges to come, including fundamental capabilities such as sequence and structure co-design and conformational control considering flexibility, and functional objectives such as antibody and enzyme design.
Affiliation(s)
- Alexander E Chu
  - Biophysics Program, Stanford University, Palo Alto, CA, USA
  - Department of Bioengineering, Stanford University, Palo Alto, CA, USA
  - Google DeepMind, London, UK
- Tianyu Lu
  - Department of Bioengineering, Stanford University, Palo Alto, CA, USA
- Po-Ssu Huang
  - Biophysics Program, Stanford University, Palo Alto, CA, USA
  - Department of Bioengineering, Stanford University, Palo Alto, CA, USA
|