1
|
Hong L, Kortemme T. An integrative approach to protein sequence design through multiobjective optimization. PLoS Comput Biol 2024; 20:e1011953. [PMID: 38991035 PMCID: PMC11265717 DOI: 10.1371/journal.pcbi.1011953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2024] [Revised: 07/23/2024] [Accepted: 06/25/2024] [Indexed: 07/13/2024] Open
Abstract
With recent methodological advances in the field of computational protein design, in particular those based on deep learning, there is an increasing need for frameworks that allow for coherent, direct integration of different models and objective functions into the generative design process. Here we demonstrate how evolutionary multiobjective optimization techniques can be adapted to provide such an approach. With the established Non-dominated Sorting Genetic Algorithm II (NSGA-II) as the optimization framework, we use AlphaFold2 and ProteinMPNN confidence metrics to define the objective space, and a mutation operator composed of ESM-1v and ProteinMPNN to rank and then redesign the least favorable positions. Using the two-state design problem of the foldswitching protein RfaH as an in-depth case study, and PapD and calmodulin as examples of higher-dimensional design problems, we show that the evolutionary multiobjective optimization approach leads to significant reduction in the bias and variance in RfaH native sequence recovery, compared to a direct application of ProteinMPNN. We suggest that this improvement is due to three factors: (i) the use of an informative mutation operator that accelerates the sequence space exploration, (ii) the parallel, iterative design process inherent to the genetic algorithm that improves upon the ProteinMPNN autoregressive sequence decoding scheme, and (iii) the explicit approximation of the Pareto front that leads to optimal design candidates representing diverse tradeoff conditions. We anticipate this approach to be readily adaptable to different models and broadly relevant for protein design tasks with complex specifications.
Collapse
Affiliation(s)
- Lu Hong
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, United States of America
| | - Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, United States of America
- Quantitative Biosciences Institute, University of California, San Francisco, California, United States of America
- Chan Zuckerberg Biohub, San Francisco, California, United States of America
| |
Collapse
|
2
|
Hong L, Kortemme T. An integrative approach to protein sequence design through multiobjective optimization. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.01.582670. [PMID: 38496480 PMCID: PMC10942313 DOI: 10.1101/2024.03.01.582670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/19/2024]
Abstract
With recent methodological advances in the field of computational protein design, in particular those based on deep learning, there is an increasing need for frameworks that allow for coherent, direct integration of different models and objective functions into the generative design process. Here we demonstrate how evolutionary multiobjective optimization techniques can be adapted to provide such an approach. With the established Non-dominated Sorting Genetic Algorithm II (NSGA-II) as the optimization framework, we use AlphaFold2 and ProteinMPNN confidence metrics to define the objective space, and a mutation operator composed of ESM-1v and ProteinMPNN to rank and then redesign the least favorable positions. Using the multistate design problem of the foldswitching protein RfaH as an in-depth case study, we show that the evolutionary multiobjective optimization approach leads to significant reduction in the bias and variance in RfaH native sequence recovery, compared to a direct application of ProteinMPNN. We suggest that this improvement is due to three factors: (i) the use of an informative mutation operator that accelerates the sequence space exploration, (ii) the parallel, iterative design process inherent to the genetic algorithm that improves upon the ProteinMPNN autoregressive sequence decoding scheme, and (iii) the explicit approximation of the Pareto front that leads to optimal design candidates representing diverse tradeoff conditions. We anticipate this approach to be readily adaptable to different models and broadly relevant for protein design tasks with complex specifications.
Collapse
Affiliation(s)
- Lu Hong
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| |
Collapse
|
3
|
Castorina LV, Ünal SM, Subr K, Wood CW. TIMED-Design: flexible and accessible protein sequence design with convolutional neural networks. Protein Eng Des Sel 2024; 37:gzae002. [PMID: 38288671 PMCID: PMC10939383 DOI: 10.1093/protein/gzae002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2023] [Revised: 12/12/2023] [Accepted: 01/12/2024] [Indexed: 02/18/2024] Open
Abstract
Sequence design is a crucial step in the process of designing or engineering proteins. Traditionally, physics-based methods have been used to solve for optimal sequences, with the main disadvantages being that they are computationally intensive for the end user. Deep learning-based methods offer an attractive alternative, outperforming physics-based methods at a significantly lower computational cost. In this paper, we explore the application of Convolutional Neural Networks (CNNs) for sequence design. We describe the development and benchmarking of a range of networks, as well as reimplementations of previously described CNNs. We demonstrate the flexibility of representing proteins in a three-dimensional voxel grid by encoding additional design constraints into the input data. Finally, we describe TIMED-Design, a web application and command line tool for exploring and applying the models described in this paper. The user interface will be available at the URL: https://pragmaticproteindesign.bio.ed.ac.uk/timed. The source code for TIMED-Design is available at https://github.com/wells-wood-research/timed-design.
Collapse
Affiliation(s)
- Leonardo V Castorina
- School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB United Kingdom
| | - Suleyman Mert Ünal
- School of Biological Sciences, University of Edinburgh, Roger Land Building, Edinburgh EH9 3FF, United Kingdom
| | - Kartic Subr
- School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB United Kingdom
| | - Christopher W Wood
- School of Biological Sciences, University of Edinburgh, Roger Land Building, Edinburgh EH9 3FF, United Kingdom
| |
Collapse
|
4
|
Liang S, Zhang C, Zhu M. Ab Initio Prediction of 3-D Conformations for Protein Long Loops with High Accuracy and Applications to Antibody CDRH3 Modeling. J Chem Inf Model 2023; 63:7568-7577. [PMID: 38018130 DOI: 10.1021/acs.jcim.3c01051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2023]
Abstract
Residue-level potentials of mean force were widely used for protein backbone refinements to avoid simultaneous sampling of side-chain conformations. The interaction energy between the reduced side chains and backbone atoms was not considered explicitly. In this study, we developed novel methods to calculate the residue-atom interaction energy in combination with atomic and residue-level terms. The parameters were optimized step by step to remove the overcounting or overlap problem between different energy terms. The mixing energy functions were then used to evaluate the generated backbone conformations at the initial sampling stage of protein loop modeling (OSCAR-loop), including the interaction energy between the reduced loop residues and full atoms of the protein framework. The accuracies of top-ranked decoys were 1.18 and 2.81 Å for 8-residue and 12-residue loops, respectively. We then selected diverse decoys for side-chain modeling, backbone refinement, and energy minimization. The procedure was repeated multiple times to select one prediction with the lowest energy. Consequently, we obtained an accuracy of 0.74 Å for a prevailing test set of 12-residue loops, compared with >1.4 Å reported by other researchers. The OSCAR-loop was also effective for modeling the H3 loops of antibody complementary determining regions (CDRs) in the crystal environment. The prediction accuracy of OSCAR-loop (1.74 Å) was better than the accuracy of the Rosetta NGK method (3.11 Å) or those achieved by deep learning methods (>2.2 Å) for the CDRH3 loops of 49 targets in the Rosetta antibody benchmark. The performance of OSCAR-loop in a model environment was also discussed.
Collapse
Affiliation(s)
- Shide Liang
- Department of Computational Biology, 20n Bio Limited, Hangzhou 310018, P. R. China
- Department of Research and Development, Bio-Thera Solutions, Guangzhou 510530, P. R. China
| | - Chi Zhang
- School of Biological Sciences, University of Nebraska, Lincoln, Nebraska 68588, United States
| | - Mingfu Zhu
- Department of Computational Biology, 20n Bio Limited, Hangzhou 310018, P. R. China
| |
Collapse
|
5
|
Bonadio A, Wenig BL, Hockla A, Radisky ES, Shifman JM. Designed Loop Extension Followed by Combinatorial Screening Confers High Specificity to a Broad Matrix MetalloproteinaseInhibitor. J Mol Biol 2023; 435:168095. [PMID: 37068580 PMCID: PMC10312305 DOI: 10.1016/j.jmb.2023.168095] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2023] [Revised: 04/03/2023] [Accepted: 04/10/2023] [Indexed: 04/19/2023]
Abstract
Matrix metalloproteinases (MMPs) are key drivers of various diseases, including cancer. Development of probes and drugs capable of selectively inhibiting the individual members of the large MMP family remains a persistent challenge. The inhibitory N-terminal domain of tissue inhibitor of metalloproteinases-2 (N-TIMP2), a natural broad MMP inhibitor, can provide a scaffold for protein engineering to create more selective MMP inhibitors. Here, we pursued a unique approach harnessing both computational design and combinatorial screening to confer high binding specificity toward a target MMP in preference to an anti-target MMP. We designed a loop extension of N-TIMP2 to allow new interactions with the non-conserved MMP surface and generated an efficient focused library for yeast surface display, which was then screened for high binding to the target MMP-14 and low binding to anti-target MMP-3. Deep sequencing analysis identified the most promising variants, which were expressed, purified, and tested for selectivity of inhibition. Our best N-TIMP2 variant exhibited 29 pM binding affinity to MMP-14 and 2.4 µM affinity to MMP-3, revealing 7500-fold greater specificity than WT N-TIMP2. High-confidence structural models were obtained by including NGS data in the AlphaFold multiple sequence alignment. The modeling together with experimental mutagenesis validated our design predictions, demonstrating that the loop extension packs tightly against non-conserved residues on MMP-14 and clashes with MMP-3. This study demonstrates how introduction of loop extensions in a manner guided by target protein conservation data and loop design can offer an attractive strategy to achieve specificity in design of protein ligands.
Collapse
Affiliation(s)
- Alessandro Bonadio
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Israel
| | - Bernhard L Wenig
- Department of Cancer Biology, Mayo Clinic Comprehensive Cancer Center, Jacksonville, Florida, USA; Paracelsus Medical University, Salzburg, Austria
| | - Alexandra Hockla
- Department of Cancer Biology, Mayo Clinic Comprehensive Cancer Center, Jacksonville, Florida, USA
| | - Evette S Radisky
- Department of Cancer Biology, Mayo Clinic Comprehensive Cancer Center, Jacksonville, Florida, USA.
| | - Julia M Shifman
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Israel.
| |
Collapse
|
6
|
Wang X, Xu K, Tan Y, Liu S, Zhou J. Possibilities of Using De Novo Design for Generating Diverse Functional Food Enzymes. Int J Mol Sci 2023; 24:3827. [PMID: 36835238 PMCID: PMC9964944 DOI: 10.3390/ijms24043827] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2023] [Revised: 02/03/2023] [Accepted: 02/03/2023] [Indexed: 02/17/2023] Open
Abstract
Food enzymes have an important role in the improvement of certain food characteristics, such as texture improvement, elimination of toxins and allergens, production of carbohydrates, enhancing flavor/appearance characteristics. Recently, along with the development of artificial meats, food enzymes have been employed to achieve more diverse functions, especially in converting non-edible biomass to delicious foods. Reported food enzyme modifications for specific applications have highlighted the significance of enzyme engineering. However, using direct evolution or rational design showed inherent limitations due to the mutation rates, which made it difficult to satisfy the stability or specific activity needs for certain applications. Generating functional enzymes using de novo design, which highly assembles naturally existing enzymes, provides potential solutions for screening desired enzymes. Here, we describe the functions and applications of food enzymes to introduce the need for food enzymes engineering. To illustrate the possibilities of using de novo design for generating diverse functional proteins, we reviewed protein modelling and de novo design methods and their implementations. The future directions for adding structural data for de novo design model training, acquiring diversified training data, and investigating the relationship between enzyme-substrate binding and activity were highlighted as challenges to overcome for the de novo design of food enzymes.
Collapse
Affiliation(s)
- Xinglong Wang
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
| | - Kangjie Xu
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
| | - Yameng Tan
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
| | - Song Liu
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
| | - Jingwen Zhou
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
| |
Collapse
|