1
Clifton BE, Kozome D, Laurino P. Efficient Exploration of Sequence Space by Sequence-Guided Protein Engineering and Design. Biochemistry 2023; 62:210-220. [PMID: 35245020 DOI: 10.1021/acs.biochem.1c00757] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/02/2023]
Abstract
The rapid growth of sequence databases over the past two decades means that protein engineers faced with optimizing a protein for any given task will often have immediate access to a vast number of related protein sequences. These sequences encode information about the evolutionary history of the protein and the underlying sequence requirements to produce folded, stable, and functional protein variants. Methods that can take advantage of this information are an increasingly important part of the protein engineering tool kit. In this Perspective, we discuss the utility of sequence data in protein engineering and design, focusing on recent advances in three main areas: the use of ancestral sequence reconstruction as an engineering tool to generate thermostable and multifunctional proteins, the use of sequence data to guide engineering of multipoint mutants by structure-based computational protein design, and the use of unlabeled sequence data for unsupervised and semisupervised machine learning, allowing the generation of diverse and functional protein sequences in unexplored regions of sequence space. Altogether, these methods enable the rapid exploration of sequence space within regions enriched with functional proteins and therefore have great potential for accelerating the engineering of stable, functional, and diverse proteins for industrial and biomedical applications.
Affiliation(s)
- Ben E Clifton
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan
- Dan Kozome
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan
- Paola Laurino
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan
2
Li RX, Zhang NN, Wu B, OuYang B, Shen HB. Multiobjective heuristic algorithm for de novo protein design in a quantified continuous sequence space. Comput Struct Biotechnol J 2021; 19:2575-2587. [PMID: 34025944 PMCID: PMC8114120 DOI: 10.1016/j.csbj.2021.04.046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2020] [Revised: 04/19/2021] [Accepted: 04/22/2021] [Indexed: 11/12/2022] Open
Abstract
Protein design usually involves a sequence search process and evaluation criteria. Commonly used methods primarily implement the Monte Carlo or simulated annealing algorithm with a single energy function to obtain ideal solutions, which is often highly time-consuming and limited by the accuracy of the energy function. In this report, we introduce a multiobjective algorithm named Hydra for protein design, which employs two different energy functions to optimize solutions simultaneously and makes use of the latent quantitative relationship between different amino acid types to facilitate the search process. The framework uses two kinds of prior information to transform the original disordered discrete sequence space into a relatively ordered space, and decoy sequences are searched in this ordered space through a multiobjective swarm intelligence algorithm. This algorithm features high accuracy and a high-speed search process. Our method was tested on 40 targets covering different fold classes, which were computationally verified to be well folded, and it experimentally solved the 1UBQ fold by NMR in excellent agreement with the native structure, with a backbone RMSD of 1.074 Å. The Hydra software package can be downloaded from: http://www.csbio.sjtu.edu.cn/bioinf/HYDRA/ for academic use.
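To illustrate what optimizing two energy functions simultaneously can mean in practice, the sketch below filters candidate sequences down to the non-dominated (Pareto-optimal) set under two scores. This is only a generic illustration, not the Hydra algorithm: the function names, the toy sequence labels, and the score values are all hypothetical, and lower energy is assumed to be better for both objectives.

```python
def dominates(a, b):
    """True if score tuple a is no worse than b on every objective
    and strictly better on at least one (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the sorted names of candidates not dominated by any other.

    candidates: dict mapping a sequence label to (energy_1, energy_2).
    """
    return sorted(
        name
        for name, score in candidates.items()
        if not any(dominates(other, score) for other in candidates.values())
    )


# Hypothetical scores from two different energy functions (made-up values).
toy_scores = {
    "SEQ_A": (-10.0, -3.0),
    "SEQ_B": (-8.0, -5.0),
    "SEQ_C": (-7.0, -2.0),   # worse than SEQ_A and SEQ_B on both objectives
    "SEQ_D": (-12.0, -1.0),
}

front = pareto_front(toy_scores)
print(front)
```

A single-energy search would return one best sequence; a multiobjective search instead keeps every candidate that represents a distinct trade-off between the two scores.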
Affiliation(s)
- Rui-Xiang Li
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Ning-Ning Zhang
- State Key Laboratory of Molecular Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 201203, China
- Bin Wu
- National Facility for Protein Science in Shanghai, ZhangJiang Lab, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- Bo OuYang
- State Key Laboratory of Molecular Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 201203, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China; Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China
3
Pham PN, Huličiak M, Biedermannová L, Černý J, Charnavets T, Fuertes G, Herynek Š, Kolářová L, Kolenko P, Pavlíček J, Zahradník J, Mikulecky P, Schneider B. Protein Binder (ProBi) as a New Class of Structurally Robust Non-Antibody Protein Scaffold for Directed Evolution. Viruses 2021; 13:v13020190. [PMID: 33514045 PMCID: PMC7911045 DOI: 10.3390/v13020190] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/19/2020] [Revised: 01/15/2021] [Accepted: 01/23/2021] [Indexed: 12/13/2022] Open
Abstract
Engineered small non-antibody protein scaffolds are a promising alternative to antibodies and are especially attractive for use in protein therapeutics and diagnostics. The advantages include smaller size and a more robust, single-domain structural framework with a defined binding surface amenable to mutation. This calls for a more systematic approach in designing new scaffolds suitable for use in one or more methods of directed evolution. We hereby describe a process based on an analysis of protein structures from the Protein Data Bank and their experimental examination. The candidate protein scaffolds were subjected to a thorough screening including computational evaluation of the mutability, and experimental determination of their expression yield in E. coli, solubility, and thermostability. In the next step, we examined several variants of the candidate scaffolds including their wild types and alanine mutants. We proved the applicability of this systematic procedure by selecting a monomeric single-domain human protein with a fold different from previously known scaffolds. The newly developed scaffold, called ProBi (Protein Binder), contains two independently mutable surface patches. We demonstrated its functionality by training it as a binder against human interleukin-10, a medically important cytokine. The procedure yielded scaffold-related variants with nanomolar affinity.
4
Tu Z, Huang X, Fu J, Hu N, Zheng W, Li Y, Zhang Y. Landscape of variable domain of heavy-chain-only antibody repertoire from alpaca. Immunology 2020; 161:53-65. [PMID: 32506493 DOI: 10.1111/imm.13224] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/03/2020] [Revised: 05/18/2020] [Accepted: 05/19/2020] [Indexed: 01/05/2023] Open
Abstract
Heavy-chain-only antibodies (HCAbs), which are devoid of light chains, have been found naturally occurring in various species including camelids and cartilaginous fish. Because of their high thermostability, refoldability and capacity for cell permeation, the variable regions of the heavy chain of HCAbs (VHHs) have been widely used in diagnosis, bio-imaging, food safety and therapeutics. Most immunogenetic and functional studies of HCAbs are based on case studies or a limited number of low-throughput sequencing data. A complete picture derived from more abundant high-throughput sequencing (HTS) data can help us gain deeper insights. We cloned and sequenced the full-length coding region of VHHs in Alpaca (Vicugna pacos) via HTS in this study. A new pipeline was developed to conduct an in-depth analysis of the HCAb repertoires. Various critical features, including the length distribution of complementarity-determining region 3 (CDR3), V(D)J usage, VJ pairing, germline-specific mutation rate and germline-specific scoring profiles (GSSPs), were systematically characterized. The quantitative data show that V(D)J usage and VHH recombination are highly biased. Interestingly, we found that the average CDR3 length of classical VHHs is longer than that of non-classical ones, whereas the mutation rates are similar in both kinds of VHHs. Finally, GSSPs were built to quantitatively describe and compare sequences that originate from each VJ pair. Overall, this study presents a comprehensive landscape of the HCAb repertoire, which can provide useful guidance for the modeling of somatic hypermutation and the design of novel functional VHHs or VHH repertoires via evolutionary profiles.
Affiliation(s)
- Zhui Tu
- State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA; Jiangxi Province Key Laboratory of Modern Analytical Science, Nanchang University, Nanchang, China
- Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Jinheng Fu
- State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, China; Jiangxi-OAI Joint Research Institution, Nanchang University, Nanchang, China
- Na Hu
- State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, China; Jiangxi Province Key Laboratory of Modern Analytical Science, Nanchang University, Nanchang, China; Maternal and Child Medical Research Institute, Shenzhen Maternity and Child Healthcare Hospital, Southern Medical University, Shenzhen, China
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Yanping Li
- State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, China; Jiangxi Province Key Laboratory of Modern Analytical Science, Nanchang University, Nanchang, China; Jiangxi-OAI Joint Research Institution, Nanchang University, Nanchang, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
5
Small design from big alignment: engineering proteins with multiple sequence alignment as the starting point. Biotechnol Lett 2020; 42:1305-1315. [PMID: 32430802 DOI: 10.1007/s10529-020-02914-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/31/2020] [Accepted: 05/14/2020] [Indexed: 02/08/2023]
Abstract
Multiple sequence alignment (MSA) is a fundamental way to gain information that cannot be obtained from the analysis of any individual sequence included in the alignment. It provides ways to investigate the relationship between sequence and function from an evolutionary perspective. Thus, the MSA of proteins can be employed as a reference for protein engineering. In this paper, we review recent advances to highlight how protein engineering has benefited from the MSA of proteins. These methods include (1) engineering the thermostability or solubility of proteins by introducing site mutations that bring them closer to the consensus sequence of the alignment; (2) structure-based engineering of proteins with comparative modeling; (3) creating paleoenzymes with high thermostability and promiscuity by constructing ancestral sequences derived from the multiple sequence alignment; and (4) incorporating site mutations targeting the evolutionarily coupled sites identified from the multiple sequence alignment.
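Method (1) above, moving a protein toward the consensus sequence of its alignment, can be sketched in a few lines. The example below is a minimal illustration assuming a pre-aligned set of equal-length sequences and a simple majority vote per column; the toy alignment and the function names are hypothetical, and practical consensus design usually also weights sequences and filters poorly conserved columns.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most common non-gap residue at each alignment column."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        # A column made only of gaps stays a gap in the consensus.
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(consensus)


def consensus_mutations(target, consensus, gap="-"):
    """List substitutions (e.g. 'R2K') that move the aligned target toward
    the consensus, numbered by position in the ungapped target."""
    mutations = []
    position = 0
    for t, c in zip(target, consensus):
        if t == gap:
            continue  # no residue in the target at this column
        position += 1
        if c != gap and t != c:
            mutations.append(f"{t}{position}{c}")
    return mutations


# Made-up toy alignment for illustration only.
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKVLA"]
consensus = consensus_sequence(toy_alignment)
mutations = consensus_mutations("MRVLG", consensus)
print(consensus, mutations)
```

Each suggested substitution is only a candidate; whether it stabilizes the protein still has to be checked experimentally or with a structure-based model.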
6
Sauer MF, Sevy AM, Crowe JE, Meiler J. Multi-state design of flexible proteins predicts sequences optimal for conformational change. PLoS Comput Biol 2020; 16:e1007339. [PMID: 32032348 PMCID: PMC7032724 DOI: 10.1371/journal.pcbi.1007339] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/14/2019] [Revised: 02/20/2020] [Accepted: 12/23/2019] [Indexed: 12/11/2022] Open
Abstract
Computational protein design of an ensemble of conformations for one protein–i.e., multi-state design–determines the side chain identity by optimizing the energetic contributions of that side chain in each of the backbone conformations. Sampling the resulting large sequence-structure search space limits the number of conformations and the size of proteins in multi-state design algorithms. Here, we demonstrated that the REstrained CONvergence (RECON) algorithm can simultaneously evaluate the sequence of large proteins that undergo substantial conformational changes. Simultaneous optimization of side chain conformations across all conformations increased sequence conservation when compared to single-state designs in all cases. More importantly, the sequence space sampled by RECON MSD resembled the evolutionary sequence space of flexible proteins, particularly when confined to predicting the mutational preferences of limited common ancestral descent, such as in the case of influenza type A hemagglutinin. Additionally, we found that sequence positions which require substantial changes in their local environment across an ensemble of conformations are more likely to be conserved. These increased conservation rates are better captured by RECON MSD over multiple conformations and thus multiple local residue environments during design. To quantify this rewiring of contacts at a certain position in sequence and structure, we introduced a new metric designated ‘contact proximity deviation’ that enumerates contact map changes. This measure allows mapping of global conformational changes into local side chain proximity adjustments, a property not captured by traditional global similarity metrics such as RMSD or local similarity metrics such as changes in φ and ψ angles. Multi-state design can be used to engineer proteins that need to exist in multiple conformations or that bind to multiple partner molecules. 
In essence, multi-state design selects a compromise of protein sequences that allow for an ensemble of protein conformations, or states, associated with a particular biological function. In this paper, we used the REstrained CONvergence (RECON) algorithm with Rosetta to show that multi-state design of flexible proteins predicts sequences optimal for conformational change, mimicking mutation preferences sampled in evolution. Modeling optimal local side chain physicochemical environments within an ensemble selected significantly more native-like sequences than selections performed when all conformational states are designed independently. This outcome was particularly true for amino acids whose local side chain environments change between conformations. To quantify such contact map changes, we introduced a novel metric to show that sequence conservation is dependent on protein flexibility, i.e., changes in local side chain environments between states limit the space of tolerated mutations. Additionally, such positions in sequence and structure are more likely to be energetically frustrated, at least in some states. Importantly, we showed that multi-state design over an ensemble of conformations (space) can explore evolutionarily tolerated sequence space (time), thus enabling RECON not only to design proteins that require multiple states for function but also to predict mutations that might be tolerated in native proteins but have not yet been explored by evolution. The latter aspect can be important to anticipate escape mutations, for example in pathogens or oncoproteins.
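The idea of mapping a global conformational change onto per-residue contact changes can be illustrated with a simple count of contacts gained or lost between two states. The abstract does not give the formula for contact proximity deviation, so the sketch below is a simplified stand-in and not the published metric; the 8 Å cutoff, the one-point-per-residue coordinates, and the function names are assumptions made for illustration.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Boolean matrix: True where two different residues lie within cutoff.

    coords: list of (x, y, z) tuples, one representative point per residue.
    """
    n = len(coords)
    return [
        [i != j and math.dist(coords[i], coords[j]) <= cutoff for j in range(n)]
        for i in range(n)
    ]


def contact_changes(coords_a, coords_b, cutoff=8.0):
    """Per-residue count of contacts gained or lost between two conformations."""
    map_a = contact_map(coords_a, cutoff)
    map_b = contact_map(coords_b, cutoff)
    return [
        sum(a != b for a, b in zip(row_a, row_b))
        for row_a, row_b in zip(map_a, map_b)
    ]


# Made-up coordinates: the last residue swings back toward the first two.
state_a = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (15, 0, 0)]
state_b = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (6, 4, 0)]

changes = contact_changes(state_a, state_b)
print(changes)
```

Residues with large counts are those whose local environment is rewired between states, which is the kind of position the abstract reports as more likely to be conserved.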
Affiliation(s)
- Marion F Sauer
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America; Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Alexander M Sevy
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America; Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- James E Crowe
- Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America; Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Jens Meiler
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America; Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
7
Zhou J, Panaitiu AE, Grigoryan G. A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures. Proc Natl Acad Sci U S A 2020; 117:1059-1068. [PMID: 31892539 PMCID: PMC6969538 DOI: 10.1073/pnas.1908723117] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Indexed: 12/23/2022] Open
Abstract
Current state-of-the-art approaches to computational protein design (CPD) aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a reliable general solution to CPD has yet to be found. Here, we propose a design framework: one based on identifying and applying patterns of sequence-structure compatibility found in known proteins, rather than approximating them from models of interatomic interactions. We carry out extensive computational analyses and an experimental validation for our method. Our results strongly argue that the Protein Data Bank is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. Because our method is likely to have orthogonal strengths relative to existing techniques, it could represent an important step toward removing remaining barriers to robust CPD.
Affiliation(s)
- Jianfu Zhou
- Department of Computer Science, Dartmouth College, Hanover, NH 03755
- Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, Hanover, NH 03755; Department of Biological Sciences, Dartmouth College, Hanover, NH 03755
8
The state-of-the-art strategies of protein engineering for enzyme stabilization. Biotechnol Adv 2018; 37:530-537. [PMID: 31138425 DOI: 10.1016/j.biotechadv.2018.10.011] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Received: 11/30/2017] [Revised: 10/12/2018] [Accepted: 10/25/2018] [Indexed: 12/11/2022]
Abstract
Enzymes generated by natural recruitment and protein engineering have contributed greatly to a wide range of applications. However, their insufficient stability is a bottleneck that limits the rapid development of biocatalysis. Novel approaches based on precise and global structural dissection, advanced gene manipulation, and combination with multidisciplinary techniques open a new horizon for generating stable enzymes efficiently. Here, we comprehensively introduce emerging advances in protein engineering strategies for enzyme stabilization. Then, we highlight practical cases to show the importance of enzyme stabilization in pharmaceutical and industrial applications. Combining computational enzyme design with molecular evolution holds considerable promise in this field.
9
Setiawan D, Brender J, Zhang Y. Recent advances in automated protein design and its future challenges. Expert Opin Drug Discov 2018; 13:587-604. [PMID: 29695210 DOI: 10.1080/17460441.2018.1465922] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/07/2023]
Abstract
Introduction: Protein function is determined by protein structure, which is in turn determined by the corresponding protein sequence. If the rules that cause a protein to adopt a particular structure are understood, it should be possible to refine or even redefine the function of a protein by working backwards from the desired structure to the sequence. Automated protein design attempts to calculate the effects of mutations computationally with the goal of more radical or complex transformations than are accessible by experimental techniques. Areas covered: The authors give a brief overview of the recent methodological advances in computer-aided protein design, showing how methodological choices affect final design and how automated protein design can be used to address problems considered beyond traditional protein engineering, including the creation of novel protein scaffolds for drug development. Also, the authors address specifically the future challenges in the development of automated protein design. Expert opinion: Automated protein design holds potential as a protein engineering technique, particularly in cases where screening by combinatorial mutagenesis is problematic. Considering solubility and immunogenicity issues, automated protein design is initially more likely to make an impact as a research tool for exploring basic biology in drug discovery than in the design of protein biologics.
Affiliation(s)
- Dani Setiawan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Jeffrey Brender
- Radiation Biology Branch, Center for Cancer Research, National Cancer Institute - NIH, Bethesda, MD, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA