1
Mu J, Li Z, Zhang B, Zhang Q, Iqbal J, Wadood A, Wei T, Feng Y, Chen HF. Graphormer supervised de novo protein design method and function validation. Brief Bioinform 2024; 25:bbae135. [PMID: 38557677 PMCID: PMC10982952 DOI: 10.1093/bib/bbae135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 01/31/2024] [Accepted: 03/12/2024] [Indexed: 04/04/2024] Open
Abstract
Protein design is central to nearly all protein engineering problems, as it can enable the creation of proteins with new or improved biological functions, such as enzymes with higher catalytic efficiency. One key facet of protein design, fixed-backbone protein sequence design, seeks to design new sequences that will conform to a prescribed protein backbone structure. Nonetheless, existing sequence design methods present limitations, such as low sequence diversity and shortcomings in experimental validation of the designed functional proteins. These inadequacies obstruct the goal of functional protein design. To address these limitations, we developed the Graphormer-based Protein Design (GPD) model. This model applies the Transformer to a graph-based representation of three-dimensional protein structures and adds Gaussian noise and random sequence masks to the node features, thereby enhancing sequence recovery and diversity. The GPD model performed significantly better than the state-of-the-art ProteinMPNN model on multiple independent tests, especially in sequence diversity. We employed GPD to design CalB hydrolase and generated nine artificially designed CalB proteins. The results show a 1.7-fold increase in catalytic activity compared with wild-type CalB and strong substrate selectivity on p-nitrophenyl acetate with different carbon chain lengths (C2-C16). Thus, the GPD method could be used for the de novo design of industrial enzymes and protein drugs. The code is available at https://github.com/decodermu/GPD.
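The augmentation idea mentioned in this abstract (Gaussian noise plus random sequence masks on node features) can be illustrated with a small sketch. This is not the GPD implementation: the function name `perturb_nodes`, the mask token `X`, and the default noise and mask rates are all assumptions made for illustration only.

```python
import random

MASK = "X"  # hypothetical mask token; the abstract does not name one


def perturb_nodes(features, sequence, noise_std=0.1, mask_rate=0.15, seed=0):
    """Return noisy per-residue features and a randomly masked sequence.

    features: list of per-residue feature vectors (lists of floats)
    sequence: amino-acid string with one letter per residue
    noise_std and mask_rate are illustrative values, not taken from the paper.
    """
    rng = random.Random(seed)
    # Add zero-mean Gaussian noise to every numeric node feature.
    noisy = [[x + rng.gauss(0.0, noise_std) for x in row] for row in features]
    # Independently replace each residue identity with the mask token.
    masked = [MASK if rng.random() < mask_rate else aa for aa in sequence]
    return noisy, "".join(masked)
```

Calling `perturb_nodes(features, sequence)` with a different `seed` yields a different perturbed view of the same backbone, which is one plausible way such augmentation could encourage more diverse outputs.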
Affiliation(s)
- Junxi Mu
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
  - Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, No.5 Yiheyuan Road, Beijing, 100871, China
- Zhengxin Li
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Bo Zhang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Qi Zhang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Jamshed Iqbal
  - Centre for Advanced Drug Research, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, Pakistan
- Abdul Wadood
  - Department of Biochemistry, Abdul Wali Khan University Mardan, Mardan, 23200, Pakistan
- Ting Wei
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Yan Feng
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Hai-Feng Chen
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
2
Wang H, Liu D, Zhao K, Wang Y, Zhang G. SPDesign: protein sequence designer based on structural sequence profile using ultrafast shape recognition. Brief Bioinform 2024; 25:bbae146. [PMID: 38600663 PMCID: PMC11006797 DOI: 10.1093/bib/bbae146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 03/02/2024] [Accepted: 03/15/2024] [Indexed: 04/12/2024] Open
Abstract
Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most deep-learning-based protein sequence design methods focus on optimizing network architecture while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in protein structure prediction, we explored whether a structural sequence profile representation can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profile using ultrafast shape recognition. Given an input backbone structure, SPDesign uses ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. The profile, together with structural pre-trained knowledge and geometric features, is then fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms state-of-the-art methods such as ProteinMPNN, PiFold and LM-Design, with gains of 21.89%, 15.54% and 11.4%, respectively, in sequence recovery rate on the CATH 4.2 benchmark. Encouraging results were also achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis with the PDBench tool suggests that SPDesign performs well in subdivided structures. More interestingly, we found that SPDesign can accurately reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, a structural modeling verification experiment indicates that the sequences designed by SPDesign fold into the native structures more accurately.
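The sequence recovery rate cited in this abstract is commonly understood as the fraction of positions at which a designed sequence matches the native one. A minimal sketch under that assumption follows; the function name `sequence_recovery` is illustrative, and the exact evaluation protocol used by the authors is not given in the abstract.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Count position-wise identities between the two sequences.
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)
```

For example, `sequence_recovery("ACDE", "ACDF")` compares four positions, three of which agree.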
Affiliation(s)
- Yajun Wang
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China (corresponding author)
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China (corresponding author)
3
Judge A, Sankaran B, Hu L, Palaniappan M, Birgy A, Prasad BVV, Palzkill T. Network of epistatic interactions in an enzyme active site revealed by large-scale deep mutational scanning. Proc Natl Acad Sci U S A 2024; 121:e2313513121. [PMID: 38483989 PMCID: PMC10962969 DOI: 10.1073/pnas.2313513121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 02/14/2024] [Indexed: 03/19/2024] Open
Abstract
Cooperative interactions between amino acids are critical for protein function. A genetic reflection of cooperativity is epistasis, which is when a change in the amino acid at one position changes the sequence requirements at another position. To assess epistasis within an enzyme active site, we utilized CTX-M β-lactamase as a model system. CTX-M hydrolyzes β-lactam antibiotics to provide antibiotic resistance, allowing a simple functional selection for rapid sorting of modified enzymes. We created all pairwise mutations across 17 active site positions in the β-lactamase enzyme and quantitated the function of variants against two β-lactam antibiotics using next-generation sequencing. Context-dependent sequence requirements were determined by comparing the antibiotic resistance function of double mutations across the CTX-M active site to their predicted function based on the constituent single mutations, revealing both positive epistasis (synergistic interactions) and negative epistasis (antagonistic interactions) between amino acid substitutions. The resulting trends demonstrate that positive epistasis is present throughout the active site, that epistasis between residues is mediated through substrate interactions, and that residues more tolerant to substitutions serve as generic compensators which are responsible for many cases of positive epistasis. Additionally, we show that a key catalytic residue (Glu166) is amenable to compensatory mutations, and we characterize one such double mutant (E166Y/N170G) that acts by an altered catalytic mechanism. These findings shed light on the unique biochemical factors that drive epistasis within an enzyme active site and will inform enzyme engineering efforts by bridging the gap between amino acid sequence and catalytic function.
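The comparison described in this abstract, observed double-mutant function versus the function predicted from the constituent single mutants, can be sketched as follows. The abstract does not say which null model the authors used; the sketch assumes a multiplicative (log-additive) model, and the function name `epistasis` and its arguments are hypothetical.

```python
import math


def epistasis(w_ab: float, w_a: float, w_b: float, w_wt: float = 1.0) -> float:
    """Log-scale epistasis of a double mutant under an assumed multiplicative null model.

    w_ab, w_a, w_b, w_wt: positive fitness (or function) scores of the double
    mutant, the two single mutants, and the wild type.
    Positive values indicate positive epistasis, negative values negative epistasis.
    """
    # Expected double-mutant effect if the single-mutant effects combine independently.
    expected = math.log(w_a / w_wt) + math.log(w_b / w_wt)
    observed = math.log(w_ab / w_wt)
    return observed - expected
```

Under this model, two single mutants that each halve function are expected to give a double mutant at one quarter of wild-type function; a double mutant that performs better than that scores as positive epistasis.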
Affiliation(s)
- Allison Judge
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Banumathi Sankaran
  - Department of Molecular Biophysics and Integrated Bioimaging, Berkeley Center for Structural Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Liya Hu
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Murugesan Palaniappan
  - Department of Pathology and Immunology, Center for Drug Discovery, Baylor College of Medicine, Houston, TX 77030
- André Birgy
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
  - Infections, Antimicrobials, Modelling, Evolution, UMR 1137, French Institute for Medical Research (INSERM), Faculty of Health, Université Paris Cité, Paris 75006, France
- B. V. Venkataram Prasad
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Timothy Palzkill
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
4
Kim DN, McNaughton AD, Kumar N. Leveraging Artificial Intelligence to Expedite Antibody Design and Enhance Antibody-Antigen Interactions. Bioengineering (Basel) 2024; 11:185. [PMID: 38391671 PMCID: PMC10886287 DOI: 10.3390/bioengineering11020185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 01/30/2024] [Accepted: 02/06/2024] [Indexed: 02/24/2024] Open
Abstract
This perspective sheds light on the transformative impact of recent computational advancements in the field of protein therapeutics, with a particular focus on the design and development of antibodies. Cutting-edge computational methods have revolutionized our understanding of protein-protein interactions (PPIs), enhancing the efficacy of protein therapeutics in preclinical and clinical settings. Central to these advancements is the application of machine learning and deep learning, which offers unprecedented insights into the intricate mechanisms of PPIs and facilitates precise control over protein functions. Despite these advancements, the complex structural nuances of antibodies pose ongoing challenges in their design and optimization. Our review provides a comprehensive exploration of the latest deep learning approaches, including language models and diffusion techniques, and their role in surmounting these challenges. We also present a critical analysis of these methods, offering insights to drive further progress in this rapidly evolving field. The paper includes practical recommendations for the application of these computational techniques, supplemented with independent benchmark studies. These studies focus on key performance metrics such as accuracy and the ease of program execution, providing a valuable resource for researchers engaged in antibody design and development. Through this detailed perspective, we aim to contribute to the advancement of antibody design, equipping researchers with the tools and knowledge to navigate the complexities of this field.
Affiliation(s)
- Doo Nam Kim
  - Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, WA 99352, USA
- Andrew D McNaughton
  - Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, WA 99352, USA
- Neeraj Kumar
  - Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, WA 99352, USA
5
Dolorfino M, Samanta R, Vorobieva A. ProteinMPNN Recovers Complex Sequence Properties of Transmembrane β-barrels. bioRxiv 2024:2024.01.16.575764. [PMID: 38352434 PMCID: PMC10862708 DOI: 10.1101/2024.01.16.575764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024]
Abstract
Recent deep-learning (DL) protein design methods have been successfully applied to a range of protein design problems, including the de novo design of novel folds, protein binders, and enzymes. However, DL methods have yet to meet the challenge of de novo membrane protein (MP) design and the design of complex β-sheet folds. We performed a comprehensive benchmark of one DL protein sequence design method, ProteinMPNN, using transmembrane and water-soluble β-barrel folds as a model, and compared its performance to that of the new membrane-specific Rosetta Franklin2023 energy function. We tested the effect of input backbone refinement on ProteinMPNN performance and found that, given refined and well-defined inputs, ProteinMPNN more accurately captures global sequence properties despite complex folding biophysics. It generates more diverse transmembrane β-barrel (TMB) sequences than Franklin2023 in pore-facing positions. In addition, ProteinMPNN generated TMB sequences that passed state-of-the-art in silico filters for experimental validation, suggesting that the model could be used in de novo design tasks of diverse nanopores for single-molecule sensing and sequencing. Lastly, our results indicate that the low success rate of ProteinMPNN for the design of β-sheet proteins stems from backbone input accuracy rather than software limitations.
Affiliation(s)
- Marissa Dolorfino
  - Structural Biology Brussel, Vrije Universiteit Brussel, Brussels, Belgium
  - VUB-VIB Center for Structural Biology, Brussels, Belgium
- Anastassia Vorobieva
  - Structural Biology Brussel, Vrije Universiteit Brussel, Brussels, Belgium
  - VUB-VIB Center for Structural Biology, Brussels, Belgium
  - VIB Center for AI and Computational Biology, Belgium
6
Fu X, Suo H, Zhang J, Chen D. Machine-learning-guided Directed Evolution for AAV Capsid Engineering. Curr Pharm Des 2024; 30:811-824. [PMID: 38445704 DOI: 10.2174/0113816128286593240226060318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/07/2024] [Accepted: 02/13/2024] [Indexed: 03/07/2024]
Abstract
Target gene delivery is crucial to gene therapy. Adeno-associated virus (AAV) has emerged as a primary gene therapy vector due to its broad host range, long-term expression, and low pathogenicity. However, AAV vectors have some limitations, such as immunogenicity and insufficient targeting. Designing or modifying capsids is a potential method of improving the efficacy of gene delivery, but it is hindered by the limited understanding of AAV biology, the complexity of the capsids, and the limitations of current screening methods. Artificial intelligence (AI), especially machine learning (ML), has great potential to accelerate and improve the optimization of capsid properties as well as to decrease their development time and manufacturing costs. This review introduces the traditional methods of designing AAV capsids and the general steps of building a sequence-function ML model, highlights the applications of ML in the development workflow, and summarizes its advantages and challenges.
Affiliation(s)
- Xianrong Fu
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
- Hairui Suo
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
- Jiachen Zhang
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
- Dongmei Chen
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
7
Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure and function prediction and in de novo protein design has significantly accelerated the process of intelligent protein design and led to many noteworthy achievements. This advancement in intelligent protein design holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is the key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors and of sequence-based and structure-based protein characterization. It discusses their advantages, disadvantages, and scope of application. It is hoped that this will help researchers better understand the limitations and application scenarios of these methods and provide a valuable reference for choosing appropriate protein characterization techniques in related research.
Affiliation(s)
- Junjie Ding
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Liangliang Wang
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Hui Jiang
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
8
Komp E, Alanzi HN, Francis R, Vuong C, Roberts L, Mosallanejad A, Beck DAC. Homologous Pairs of Low and High Temperature Originating Proteins Spanning the Known Prokaryotic Universe. Sci Data 2023; 10:682. [PMID: 37805601 PMCID: PMC10560248 DOI: 10.1038/s41597-023-02553-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 09/08/2023] [Indexed: 10/09/2023] Open
Abstract
Stability of proteins at high temperature has been a topic of interest for many years, as this attribute is favourable for applications ranging from therapeutics to industrial chemical manufacturing. Our current understanding and methods for designing high-temperature stability into target proteins are inadequate. To drive innovation in this space, we have curated a large dataset, learn2thermDB, of protein-temperature examples, totalling 24 million instances, and paired proteins across temperatures based on homology, yielding 69 million protein pairs - orders of magnitude larger than the current largest. This important step of pairing allows for study of high-temperature stability in a sequence-dependent manner in the big data era. The data pipeline is parameterized and open, allowing it to be tuned by downstream users. We further show that the data contains signal for deep learning. This data offers a new doorway towards thermal stability design models.
Affiliation(s)
- Evan Komp
  - Department of Chemical Engineering, University of Washington, Seattle, USA
- Humood N Alanzi
  - Department of Chemical Engineering, University of Washington, Seattle, USA
- Ryan Francis
  - Department of Chemical Engineering, University of Washington, Seattle, USA
- Chau Vuong
  - Department of Biochemistry, University of Washington, Seattle, USA
- Logan Roberts
  - Department of Chemical Engineering, University of Washington, Seattle, USA
- Amin Mosallanejad
  - Department of Chemical Engineering, University of Washington, Seattle, USA
- David A C Beck
  - Department of Chemical Engineering, University of Washington, Seattle, USA
  - eScience Institute, University of Washington, Seattle, USA
  - Paul G. Allen School of Computer Science, University of Washington, Seattle, USA
9
Stern JA, Free TJ, Stern KL, Gardiner S, Dalley NA, Bundy BC, Price JL, Wingate D, Della Corte D. A probabilistic view of protein stability, conformational specificity, and design. Sci Rep 2023; 13:15493. [PMID: 37726313 PMCID: PMC10509192 DOI: 10.1038/s41598-023-42032-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023] Open
Abstract
Various approaches have used neural networks as probabilistic models for the design of protein sequences. These "inverse folding" models employ different objective functions, which come with trade-offs that have not been assessed in detail before. This study introduces probabilistic definitions of protein stability and conformational specificity and demonstrates the relationship between these chemical properties and the [Formula: see text] Boltzmann probability objective. This links the Boltzmann probability objective function to experimentally verifiable outcomes. We propose a novel sequence decoding algorithm, referred to as "BayesDesign", that leverages Bayes' Rule to maximize the [Formula: see text] objective instead of the [Formula: see text] objective common in inverse folding models. The efficacy of BayesDesign is evaluated in the context of two protein model systems, the NanoLuc enzyme and the WW structural motif. Both BayesDesign and the baseline ProteinMPNN algorithm increase the thermostability of NanoLuc and increase the conformational specificity of WW. The possible sources of error in the model are analyzed.
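The formulas in this abstract are elided in the listing ("[Formula: see text]"). As a reading aid only, and assuming the two objectives are the probability of the structure given the sequence and the probability of the sequence given the structure (the latter being what inverse folding models usually maximize), Bayes' rule relates them as follows. For a fixed target structure, the structure prior is constant with respect to the sequence and drops out of the maximization.

```latex
p(\text{structure}\mid\text{sequence})
  = \frac{p(\text{sequence}\mid\text{structure})\; p(\text{structure})}{p(\text{sequence})}
\quad\Longrightarrow\quad
\arg\max_{\text{sequence}} \log p(\text{structure}\mid\text{sequence})
  = \arg\max_{\text{sequence}} \big[\log p(\text{sequence}\mid\text{structure}) - \log p(\text{sequence})\big]
```

Under this reading, the extra term penalizes sequences that are probable regardless of structure, which is consistent with the abstract's emphasis on conformational specificity.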
Affiliation(s)
- Jacob A Stern
  - Department of Computer Science, Brigham Young University, Provo, UT, USA
- Tyler J Free
  - Department of Chemical Engineering, Brigham Young University, Provo, UT, USA
- Kimberlee L Stern
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT, USA
- Spencer Gardiner
  - Department of Physics and Astronomy, Brigham Young University, Provo, UT, USA
- Nicholas A Dalley
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT, USA
- Bradley C Bundy
  - Department of Chemical Engineering, Brigham Young University, Provo, UT, USA
- Joshua L Price
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT, USA
- David Wingate
  - Department of Computer Science, Brigham Young University, Provo, UT, USA
- Dennis Della Corte
  - Department of Physics and Astronomy, Brigham Young University, Provo, UT, USA
10
Sun Y, Huang X, Osawa Y, Chen YE, Zhang H. The Versatile Biocatalyst of Cytochrome P450 CYP102A1: Structure, Function, and Engineering. Molecules 2023; 28:5353. [PMID: 37513226 PMCID: PMC10383305 DOI: 10.3390/molecules28145353] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/20/2023] [Revised: 07/07/2023] [Accepted: 07/10/2023] [Indexed: 07/30/2023] Open
Abstract
Wild-type cytochrome P450 CYP102A1 from Bacillus megaterium is a highly efficient monooxygenase for the oxidation of long-chain fatty acids. The unique features of CYP102A1, such as high catalytic activity, expression yield, regio- and stereoselectivity, and self-sufficiency in electron transfer as a fusion protein, meet the requirements for an ideal biocatalyst. In the past three decades, remarkable progress has been made in engineering CYP102A1 for applications in drug discovery, biosynthesis, and biotechnology. The repertoire of engineered CYP102A1 variants has grown tremendously, while the substrate repertoire has expanded to encompass alkanes, alkenes, aromatics, organic solvents, pharmaceuticals, drugs, and many more. In this article, we highlight the major advances in the past five years in our understanding of the structure and function of CYP102A1 and the methodologies used to engineer CYP102A1 for novel applications. The objective is to provide a succinct review of the latest developments with reference to the body of CYP102A1-related literature.
Affiliation(s)
- Yudong Sun
  - Department of Pharmacology, University of Michigan, Ann Arbor, MI 48109, USA
- Xiaoqiang Huang
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Yoichi Osawa
  - Department of Pharmacology, University of Michigan, Ann Arbor, MI 48109, USA
- Yuqing Eugene Chen
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Haoming Zhang
  - Department of Pharmacology, University of Michigan, Ann Arbor, MI 48109, USA
11
Rescifina A. Progress of the "Molecular Informatics" Section in 2022. Int J Mol Sci 2023; 24:9442. [PMID: 37298393 DOI: 10.3390/ijms24119442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 05/19/2023] [Indexed: 06/12/2023] Open
Abstract
This is the first Editorial of the "Molecular Informatics" Section (MIS) of the International Journal of Molecular Sciences (IJMS), which was created towards the end of 2018 (the first article was submitted on 27 September 2018) and has experienced significant growth from 2018 to now [...].
Affiliation(s)
- Antonio Rescifina
  - Department of Drug and Health Sciences, University of Catania, Viale Andrea Doria 6, 95125 Catania, Italy
12
Rennella E, Sahtoe DD, Baker D, Kay LE. Exploiting conformational dynamics to modulate the function of designed proteins. Proc Natl Acad Sci U S A 2023; 120:e2303149120. [PMID: 37094170 PMCID: PMC10161014 DOI: 10.1073/pnas.2303149120] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/23/2023] [Accepted: 03/22/2023] [Indexed: 04/26/2023] Open
Abstract
With the recent success in calculating protein structures from amino acid sequences using artificial intelligence-based algorithms, an important next step is to decipher how dynamics is encoded by the primary protein sequence so as to better predict function. Such dynamics information is critical for protein design, where strategies could then focus not only on sequences that fold into particular structures that perform a given task, but would also include low-lying excited protein states that could influence the function of the designed protein. Herein, we illustrate the importance of dynamics in modulating the function of C34, a designed α/β protein that captures β-strands of target ligands and is a member of a family of proteins designed to sequester β-strands and β hairpins of aggregation-prone molecules that lead to a variety of pathologies. Using a strategy to "see" regions of apo C34 that are invisible to NMR spectroscopy as a result of pervasive conformational exchange, as well as a mutagenesis approach whereby C34 molecules are stabilized into a single conformer, we determine the structures of the predominant conformations that are sampled by C34 and show that these attenuate the affinity for cognate peptide. Subsequently, the observed motion is exploited to develop an allosterically regulated peptide binder whose binding affinity can be controlled through the addition of a second molecule. Our study emphasizes the unique role that NMR can play in directing the design process and in the construction of new molecules with more complex functionality.
Affiliation(s)
- Enrico Rennella
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada
- Danny D. Sahtoe
  - Department of Biochemistry, University of Washington, Seattle, WA 98195
  - Institute for Protein Design, University of Washington, Seattle, WA 98195
  - HHMI, University of Washington, Seattle, WA 98195
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA 98195
  - Institute for Protein Design, University of Washington, Seattle, WA 98195
  - HHMI, University of Washington, Seattle, WA 98195
- Lewis E. Kay
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON M5G 0A4, Canada
13
Bougueroua S, Bricage M, Aboulfath Y, Barth D, Gaigeot MP. Algorithmic Graph Theory, Reinforcement Learning and Game Theory in MD Simulations: From 3D Structures to Topological 2D-Molecular Graphs (2D-MolGraphs) and Vice Versa. Molecules 2023; 28:2892. [PMID: 37049654 PMCID: PMC10096312 DOI: 10.3390/molecules28072892] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/24/2023] [Revised: 03/17/2023] [Accepted: 03/18/2023] [Indexed: 04/14/2023] Open
Abstract
This paper reviews graph-theory-based methods that were recently developed in our group for post-processing molecular dynamics trajectories. We show that the use of algorithmic graph theory not only provides a direct and fast methodology to identify conformers sampled over time but also allows one to follow the interconversions between the conformers through graphs of transitions in time. Examples of gas-phase molecules and inhomogeneous aqueous solid interfaces are presented to demonstrate the power of topological 2D graphs and their versatility for post-processing molecular dynamics trajectories. An even more complex challenge is to predict 3D structures from topological 2D graphs. Our first attempts to tackle such a challenge are presented with the development of game theory and reinforcement learning methods for predicting the 3D structure of a gas-phase peptide.
Affiliation(s)
- Sana Bougueroua
  - Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
- Marie Bricage
  - Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Ylène Aboulfath
  - Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Dominique Barth
  - Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Marie-Pierre Gaigeot
  - Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
|
14
|
Li AJ, Lu M, Desta I, Sundar V, Grigoryan G, Keating AE. Neural network-derived Potts models for structure-based protein design using backbone atomic coordinates and tertiary motifs. Protein Sci 2023; 32:e4554. [PMID: 36564857 PMCID: PMC9854172 DOI: 10.1002/pro.4554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/15/2022] [Accepted: 12/20/2022] [Indexed: 12/25/2022]
Abstract
Designing novel proteins to perform desired functions, such as binding or catalysis, is a major goal in synthetic biology. A variety of computational approaches can aid in this task. An energy-based framework rooted in the sequence-structure statistics of tertiary motifs (TERMs) can be used for sequence design on predefined backbones. Neural network models that use backbone coordinate-derived features provide another way to design new proteins. In this work, we combine the two methods to make neural structure-based models more suitable for protein design. Specifically, we supplement backbone-coordinate features with TERM-derived data, as inputs, and we generate energy functions as outputs. We present two architectures that generate Potts models over the sequence space: TERMinator, which uses both TERM-based and coordinate-based information, and COORDinator, which uses only coordinate-based information. Using these two models, we demonstrate that TERMs can be utilized to improve native sequence recovery performance of neural models. Furthermore, we demonstrate that sequences designed by TERMinator are predicted to fold to their target structures by AlphaFold. Finally, we show that both TERMinator and COORDinator learn notions of energetics, and these methods can be fine-tuned on experimental data to improve predictions. Our results suggest that using TERM-based and coordinate-based features together may be beneficial for protein design and that structure-based neural models that produce Potts energy tables have utility for flexible applications in protein science.
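For readers unfamiliar with the term, a Potts model over sequence space is commonly written as E(s) = Σ_i h_i(s_i) + Σ_{i<j} J_ij(s_i, s_j), with per-position "self" energies and pairwise couplings. The sketch below only illustrates that general form. The function names and toy energy tables are hypothetical and are not taken from TERMinator or COORDinator, and the greedy search is a simple stand-in for whatever optimizer a real design pipeline would use.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def potts_energy(seq, h, J):
    """Return E(s) = sum_i h[i][s_i] + sum_{(i,j)} J[(i, j)][s_i][s_j].

    h: list of per-position tables, h[i][a] is the self energy of residue a at i.
    J: dict mapping a position pair (i, j) with i < j to a 20x20 coupling table.
    """
    idx = [AMINO_ACIDS.index(a) for a in seq]
    energy = sum(h[i][a] for i, a in enumerate(idx))
    for (i, j), table in J.items():
        energy += table[idx[i]][idx[j]]
    return energy


def greedy_design(seq, h, J, sweeps=3):
    """Lower the Potts energy by single-site substitutions (illustrative only)."""
    best = list(seq)
    best_e = potts_energy(best, h, J)
    for _ in range(sweeps):
        for pos in range(len(best)):
            for aa in AMINO_ACIDS:
                trial = best[:pos] + [aa] + best[pos + 1:]
                e = potts_energy(trial, h, J)
                if e < best_e:  # accept only strict improvements
                    best, best_e = trial, e
    return "".join(best), best_e


if __name__ == "__main__":
    q = len(AMINO_ACIDS)
    # Toy two-position model: all energies zero except two hand-set entries.
    h = [[0.0] * q for _ in range(2)]
    J = {(0, 1): [[0.0] * q for _ in range(q)]}
    h[0][AMINO_ACIDS.index("A")] = -1.0
    J[(0, 1)][AMINO_ACIDS.index("A")][AMINO_ACIDS.index("C")] = -2.0
    print(greedy_design("CC", h, J))
```

Because the energy is a sum of table lookups, the same tables can be reused for scoring, sampling or optimization, which is one plausible reading of why the abstract describes Potts energy tables as useful for flexible applications.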
Affiliation(s)
- Alex J. Li
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Mindren Lu
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Israel Desta
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Vikram Sundar
  - Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Gevorg Grigoryan
  - Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA
- Amy E. Keating
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
15
|
Ferruz N, Heinzinger M, Akdel M, Goncearenco A, Naef L, Dallago C. From sequence to function through structure: Deep learning for protein design. Comput Struct Biotechnol J 2022; 21:238-250. [PMID: 36544476 PMCID: PMC9755234 DOI: 10.1016/j.csbj.2022.11.014] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 08/31/2022] [Revised: 11/05/2022] [Accepted: 11/05/2022] [Indexed: 11/20/2022] Open
Abstract
The process of designing biomolecules, in particular proteins, is witnessing a rapid change in available tooling and approaches, moving from design through physicochemical force fields, to producing plausible, complex sequences fast via end-to-end differentiable statistical models. To achieve conditional and controllable protein design, researchers at the interface of artificial intelligence and biology leverage advances in natural language processing (NLP) and computer vision techniques, coupled with advances in computing hardware, to learn patterns from growing biological databases, curated annotations thereof, or both. Once learned, these patterns can be used to provide novel insights into mechanistic biology and the design of biomolecules. However, navigating and understanding the practical applications for the many recent protein design tools is complex. To facilitate this, we 1) document recent advances in deep learning (DL) assisted protein design from the last three years, 2) present a practical pipeline that allows one to go from de novo-generated sequences to their predicted properties and web-powered visualization within minutes, and 3) leverage it to suggest a generated protein sequence which might be used to engineer a biosynthetic gene cluster to produce a molecular glue-like compound. Lastly, we discuss challenges and highlight opportunities for the protein design field.
Key Words
- ADMM, Alternating Direction Method of Multipliers
- CNN, Convolutional Neural Network
- DL, Deep learning
- Deep learning
- Drug discovery
- FNN, fully-connected neural network
- GAN, Generative Adversarial Network
- GCN, Graph Convolutional Network
- GNN, Graph Neural Network
- GO, Gene Ontology
- GVP, Geometric Vector Perceptron
- LSTM, Long-Short Term Memory
- MLP, Multilayer Perceptron
- MSA, Multiple Sequence Alignment
- NLP, Natural Language Processing
- NSR, Natural Sequence Recovery
- Protein design
- Protein language models
- Protein prediction
- VAE, Variational Autoencoder
- pLM, protein Language Model
Affiliation(s)
- Noelia Ferruz
  - Institute of Informatics and Applications, University of Girona, Girona, Spain
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Michael Heinzinger
  - Department of Informatics, Bioinformatics & Computational Biology, Technische Universität München, 85748 Garching, Germany
- Mehmet Akdel
  - VantAI, 151 W 42nd Street, New York, NY 10036, United States
- Luca Naef
  - VantAI, 151 W 42nd Street, New York, NY 10036, United States
- Christian Dallago
  - Department of Informatics, Bioinformatics & Computational Biology, Technische Universität München, 85748 Garching, Germany
  - VantAI, 151 W 42nd Street, New York, NY 10036, United States
  - NVIDIA DE GmbH, Einsteinstraße 172, 81677 München, Germany
|
16
|
Sun L, Ma X, Zhang B, Qin Y, Ma J, Du Y, Chen T. From polymerase engineering to semi-synthetic life: artificial expansion of the central dogma. RSC Chem Biol 2022; 3:1173-1197. [PMID: 36320892 PMCID: PMC9533422 DOI: 10.1039/d2cb00116k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Accepted: 08/08/2022] [Indexed: 11/21/2022] Open
Abstract
Nucleic acids have been extensively modified in different moieties to expand the scope of genetic materials in the past few decades. While the development of unnatural base pairs (UBPs) has expanded the genetic information capacity of nucleic acids, the production of synthetic alternatives of DNA and RNA has increased the types of genetic information carriers and introduced novel properties and functionalities into nucleic acids. Moreover, the efforts of tailoring DNA polymerases (DNAPs) and RNA polymerases (RNAPs) to be efficient unnatural nucleic acid polymerases have enabled broad application of these unnatural nucleic acids, ranging from production of stable aptamers to evolution of novel catalysts. The introduction of unnatural nucleic acids into living organisms has also started expanding the central dogma in vivo. In this article, we first summarize the development of unnatural nucleic acids with modifications or alterations in different moieties. The strategies for engineering DNAPs and RNAPs are then extensively reviewed, followed by summarization of predominant polymerase mutants with good activities for synthesizing, reverse transcribing, or even amplifying unnatural nucleic acids. Some recent application examples of unnatural nucleic acids with their polymerases are then introduced. At the end, the approaches of introducing UBPs and synthetic genetic polymers into living organisms for the creation of semi-synthetic organisms are reviewed and discussed.
Affiliation(s)
- Leping Sun
  - MOE International Joint Research Laboratory on Synthetic Biology and Medicines, School of Biology and Biological Engineering, South China University of Technology, 510006 Guangzhou, China
- Xingyun Ma
  - MOE International Joint Research Laboratory on Synthetic Biology and Medicines, School of Biology and Biological Engineering, South China University of Technology, 510006 Guangzhou, China
- Binliang Zhang
  - MOE International Joint Research Laboratory on Synthetic Biology and Medicines, School of Biology and Biological Engineering, South China University of Technology, 510006 Guangzhou, China
- Yanjia Qin
  - MOE International Joint Research Laboratory on Synthetic Biology and Medicines, School of Biology and Biological Engineering, South China University of Technology, 510006 Guangzhou, China
- Jiezhao Ma
  - MOE International Joint Research Laboratory on Synthetic Biology and Medicines, School of Biology and Biological Engineering, South China University of Technology, 510006 Guangzhou, China
- Yuhui Du
  - MOE International Joint Research Laboratory on Synthetic Biology and Medicines, School of Biology and Biological Engineering, South China University of Technology, 510006 Guangzhou, China
- Tingjian Chen
  - MOE International Joint Research Laboratory on Synthetic Biology and Medicines, School of Biology and Biological Engineering, South China University of Technology, 510006 Guangzhou, China
|
17
|
Kang M, Oh JH. Editorial of Special Issue "Deep Learning and Machine Learning in Bioinformatics". Int J Mol Sci 2022; 23:ijms23126610. [PMID: 35743052 PMCID: PMC9224509 DOI: 10.3390/ijms23126610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/07/2022] [Accepted: 06/10/2022] [Indexed: 02/04/2023] Open
Abstract
In recent years, deep learning has emerged as a highly active research field, achieving great success in various machine learning areas, including image processing, speech recognition, and natural language processing, and now rapidly becoming a dominant tool in biomedicine [...].
Affiliation(s)
- Mingon Kang
  - Department of Computer Science, University of Nevada, Las Vegas, NV 89154, USA
- Jung Hun Oh
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
  - Correspondence:
|
18
|
Rudden LSP, Hijazi M, Barth P. Deep learning approaches for conformational flexibility and switching properties in protein design. Front Mol Biosci 2022; 9:928534. [PMID: 36032687 PMCID: PMC9399439 DOI: 10.3389/fmolb.2022.928534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 07/15/2022] [Indexed: 11/30/2022] Open
Abstract
Following the hugely successful application of deep learning methods to protein structure prediction, an increasing number of design methods seek to leverage generative models to design proteins with improved functionality over native proteins or novel structure and function. The inherent flexibility of proteins, from side-chain motion to larger conformational reshuffling, poses a challenge to design methods, where the ideal approach must consider both the spatial and temporal evolution of proteins in the context of their functional capacity. In this review, we highlight existing methods for protein design before discussing how methods at the forefront of deep learning-based design accommodate flexibility and where the field could evolve in the future.
Affiliation(s)
- Patrick Barth
  - *Correspondence: Lucas S. P. Rudden; Patrick Barth
|