1. Hu X, Xu Y, Yi J, Wang C, Zhu Z, Yue T, Zhang H, Wang X, Wu F, Xue L, Bai L, Liu H, Chen Q. Using Protein Design and Directed Evolution to Monomerize a Bright Near-Infrared Fluorescent Protein. ACS Synth Biol 2024; 13:1177-1190. [PMID: 38552148 DOI: 10.1021/acssynbio.3c00643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/20/2024]
Abstract
The small ultrared fluorescent protein (smURFP) is a bright near-infrared (NIR) fluorescent protein (FP) that forms a dimer and binds its fluorescence chromophore, biliverdin, at its dimer interface. To engineer a monomeric NIR FP based on smURFP potentially more suitable for bioimaging, we employed protein design to extend the protein backbone with a new segment of two helices that shield the original dimer interface while covering the biliverdin binding pocket in place of the second chain in the original dimer. We experimentally characterized 13 designs and obtained a monomeric protein with a weak fluorescence. We enhanced the fluorescence of this designed protein through two rounds of directed evolution and obtained designed monomeric smURFP (DMsmURFP), a bright, stable, and monomeric NIR FP with a molecular weight of 19.6 kDa. We determined the crystal structures of DMsmURFP both in the apo state and in complex with biliverdin, which confirmed the designed structure. The use of DMsmURFP in in vivo imaging of mammalian systems was demonstrated. The backbone design-based strategy used here can also be applied to monomerize other naturally multimeric proteins with intersubunit functional sites.
Affiliation(s)
- Xiuhong Hu
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Yang Xu
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Junxi Yi
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Chenchen Wang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Zhongliang Zhu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Ting Yue
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Haiyan Zhang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Xinyu Wang
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Fan Wu
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Lin Xue
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China
- Li Bai
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China
- Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China
- School of Data Science, University of Science and Technology of China, Hefei, Anhui 230027, China
- Quan Chen
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China
2. Xu Y, Hu X, Wang C, Liu Y, Chen Q, Liu H. De novo design of cavity-containing proteins with a backbone-centered neural network energy function. Structure 2024; 32:424-432.e4. [PMID: 38325370 DOI: 10.1016/j.str.2024.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 10/04/2023] [Accepted: 01/11/2024] [Indexed: 02/09/2024]
Abstract
The design of small-molecule-binding proteins requires protein backbones that contain cavities. Previous design efforts were based on naturally occurring cavity-containing backbone architectures. Here, we designed diverse cavity-containing backbones without predefined architectures by introducing tailored restraints into the backbone sampling driven by SCUBA (Side Chain-Unknown Backbone Arrangement), a neural network statistical energy function. For 521 out of 5816 designs, the root-mean-square deviations (RMSDs) of the Cα atoms for the AlphaFold2-predicted structures and our designed structures are within 2.0 Å. We experimentally tested 10 designed proteins and determined the crystal structures of two of them. One closely agrees with the designed model, while the other forms a domain-swapped dimer, where the partial structures are in agreement with the designed structures. Our results indicate that data-driven methods such as SCUBA hold great potential for designing de novo proteins with tailored small-molecule-binding function.
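The abstract above filters designs by the Cα RMSD between each designed model and its AlphaFold2 prediction, with 521 of 5816 designs (roughly 9%) falling within 2.0 Å. A minimal sketch of such a check follows. It assumes the two coordinate sets are already superimposed and residue-matched, performs no structural fitting, and uses hypothetical function names and toy coordinates that do not come from the paper.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumption: the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def within_cutoff(designed, predicted, cutoff=2.0):
    """True when the C-alpha RMSD (in angstroms) does not exceed the cutoff."""
    return ca_rmsd(designed, predicted) <= cutoff


# Toy coordinates (hypothetical): the prediction is shifted by 1 angstrom in y.
designed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
rmsd = ca_rmsd(designed, predicted)
```

In practice the two models would first be optimally superimposed (for example with the Kabsch algorithm) before the deviation is measured.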
Affiliation(s)
- Yang Xu
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Centre for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230001, China; MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Xiuhong Hu
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Centre for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230001, China; MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Chenchen Wang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Yongrui Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Quan Chen
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Centre for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230001, China; MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China; Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China.
- Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China; Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China; School of Data Science, University of Science and Technology of China, Hefei, Anhui 230027, China.
3. Chu AE, Lu T, Huang PS. Sparks of function by de novo protein design. Nat Biotechnol 2024; 42:203-215. [PMID: 38361073 DOI: 10.1038/s41587-024-02133-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/05/2023] [Accepted: 01/09/2024] [Indexed: 02/17/2024]
Abstract
Information in proteins flows from sequence to structure to function, with each step causally driven by the preceding one. Protein design is founded on inverting this process: specify a desired function, design a structure executing this function, and find a sequence that folds into this structure. This 'central dogma' underlies nearly all de novo protein-design efforts. Our ability to accomplish these tasks depends on our understanding of protein folding and function and our ability to capture this understanding in computational methods. In recent years, deep learning-derived approaches for efficient and accurate structure modeling and enrichment of successful designs have enabled progression beyond the design of protein structures and towards the design of functional proteins. We examine these advances in the broader context of classical de novo protein design and consider implications for future challenges, including fundamental capabilities such as sequence and structure co-design and conformational control considering flexibility, and functional objectives such as antibody and enzyme design.
Affiliation(s)
- Alexander E Chu
- Biophysics Program, Stanford University, Palo Alto, CA, USA
- Department of Bioengineering, Stanford University, Palo Alto, CA, USA
- Google DeepMind, London, UK
- Tianyu Lu
- Department of Bioengineering, Stanford University, Palo Alto, CA, USA
- Po-Ssu Huang
- Biophysics Program, Stanford University, Palo Alto, CA, USA.
- Department of Bioengineering, Stanford University, Palo Alto, CA, USA.
4. Liu Y, Liu H. Protein sequence design on given backbones with deep learning. Protein Eng Des Sel 2024; 37:gzad024. [PMID: 38157313 DOI: 10.1093/protein/gzad024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 12/08/2023] [Accepted: 12/18/2023] [Indexed: 01/03/2024] Open
Abstract
Deep learning methods for protein sequence design focus on modeling and sampling the many-dimensional distribution of amino acid sequences conditioned on the backbone structure. To produce physically foldable sequences, inter-residue couplings need to be considered properly. These couplings are treated explicitly in iterative methods or autoregressive methods. Non-autoregressive models treating these couplings implicitly are computationally more efficient, but still await tests by wet experiment. Currently, sequence design methods are evaluated mainly using native sequence recovery rate and native sequence perplexity. These metrics can be complemented by sequence-structure compatibility metrics obtained from energy calculation or structure prediction. However, existing computational metrics have important limitations that may render the generalization of computational test results to performance in real applications unwarranted. Validation of design methods by wet experiments should be encouraged.
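The two evaluation metrics named in the abstract above, native sequence recovery rate and native sequence perplexity, can be illustrated with a small sketch. It uses the common definitions (fraction of positions where the designed residue matches the native one, and the exponential of the mean negative log-probability assigned to native residues). The function names, the toy sequences, and the probability tables are hypothetical, and the sketch assumes every native residue has a non-zero predicted probability.

```python
import math


def sequence_recovery(native, designed):
    """Fraction of positions where the designed residue equals the native one."""
    if len(native) != len(designed) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for n, d in zip(native, designed) if n == d)
    return matches / len(native)


def native_perplexity(native, per_position_probs):
    """exp(mean negative log-probability) of the native residues.

    per_position_probs[i] maps residue letters to the model's predicted
    probability at position i (assumed non-zero for the native residue).
    """
    if len(native) != len(per_position_probs) or not native:
        raise ValueError("need one probability table per native position")
    log_sum = sum(
        math.log(probs[residue])
        for residue, probs in zip(native, per_position_probs)
    )
    return math.exp(-log_sum / len(native))


# Toy example (hypothetical data): three of four residues are recovered, and
# the model assigns probability 0.5 to each native residue.
native = "ACDE"
designed = "ACDF"
probs = [{"A": 0.5}, {"C": 0.5}, {"D": 0.5}, {"E": 0.5}]
recovery = sequence_recovery(native, designed)
ppl = native_perplexity(native, probs)
```

A uniform guess over the 20 standard amino acids would give a perplexity of 20, so lower values indicate that the model concentrates probability on the native residues.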
Affiliation(s)
- Yufeng Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China
- School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215004, China
5. Zhang X, Yin H, Ling F, Zhan J, Zhou Y. SPIN-CGNN: Improved fixed backbone protein design with contact map-based graph construction and contact graph neural network. PLoS Comput Biol 2023; 19:e1011330. [PMID: 38060617 PMCID: PMC10729952 DOI: 10.1371/journal.pcbi.1011330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 12/19/2023] [Accepted: 11/27/2023] [Indexed: 12/20/2023] Open
Abstract
Recent advances in deep learning have significantly improved the ability to infer protein sequences directly from protein structures for fixed-backbone design. The methods have evolved from the early use of multi-layer perceptrons to convolutional neural networks, transformers, and graph neural networks (GNN). However, the conventional approach of constructing a K-nearest-neighbors (KNN) graph for GNN has limited the utilization of edge information, which plays a critical role in network performance. Here we introduced SPIN-CGNN based on protein contact maps for nearest neighbors. Together with auxiliary edge updates and selective kernels, we found that SPIN-CGNN provided a comparable performance in refolding ability by AlphaFold2 to the current state-of-the-art techniques but a significant improvement over them in terms of sequence recovery, perplexity, deviation from amino-acid compositions of native sequences, conservation of hydrophobic positions, and low complexity regions, according to tests on unseen structures, "hallucinated" structures, and structures generated by diffusion models. Results suggest that low complexity regions in the sequences designed by deep learning, for generated structures in particular, remain to be improved when compared to the native sequences.
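The contrast drawn in the abstract above, between a K-nearest-neighbors graph and a graph built from a contact map, can be sketched as follows. This is only an illustration of the two generic constructions: the function names, the one-point-per-residue representation, and the 10 Å cutoff are assumptions and are not taken from SPIN-CGNN.

```python
import math


def knn_edges(coords, k):
    """Directed edges from each node to its k nearest neighbors."""
    edges = set()
    for i, ci in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        for j in others[:k]:
            edges.add((i, j))
    return edges


def contact_edges(coords, cutoff):
    """Directed edges between every pair of nodes closer than the cutoff."""
    edges = set()
    for i, ci in enumerate(coords):
        for j, cj in enumerate(coords):
            if i != j and math.dist(ci, cj) <= cutoff:
                edges.add((i, j))
    return edges


# Toy residue positions (hypothetical): three residues close together and one
# isolated residue far away.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
knn = knn_edges(coords, k=2)
contacts = contact_edges(coords, cutoff=10.0)
```

In this toy case the KNN graph always gives the isolated residue two neighbors, however distant, while the contact graph gives it none. The node degree in a contact graph therefore follows local packing density instead of a fixed k.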
Affiliation(s)
- Xing Zhang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, People’s Republic of China
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, People’s Republic of China
- Hongmei Yin
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, People’s Republic of China
- Fei Ling
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, People’s Republic of China
- Jian Zhan
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, People’s Republic of China
- Yaoqi Zhou
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, People’s Republic of China
6. Wu C, Yu X, Zheng P, Chen P, Wu D. Rational Redesign of Chitosanase to Enhance Thermostability and Catalytic Activity to Produce Chitooligosaccharides with a Relatively High Degree of Polymerization. J Agric Food Chem 2023; 71:15213-15223. [PMID: 37793074 DOI: 10.1021/acs.jafc.3c04542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/06/2023]
Abstract
Chitooligosaccharides (hdpCOS) with a high degree of polymerization (hdp, DP 4-10) generally have greater biological activities than those of low-DP (ldp, DP 2-3) COS. Chitosanase from Bacillus amyloliquefaciens KCP2 (Csn46) can degrade chitosan to more hdpCOS at high temperature (70 °C), but low thermal stability at this temperature makes it unsuitable for industrial application; the wild-type enzyme can only produce COS (DP 2-4) at lower temperatures. Several thermostable mutants were obtained by modifying chitosanase using a comprehensive strategy based on a computer-aided mutant design. A combination of four beneficial single-point mutations (A129L/T175V/K70T/D34G) to Csn46 was selected to obtain a markedly improved mutant, Mut4, with a half-life at 60 °C extended from 34.31 to 690.80 min, and the specific activity increased from 1671.73 to 3528.77 U/mg. Mut4 produced COS with DPs of 2-4 and 2-7 at 60 and 70 °C, respectively. Therefore, Mut4 has the potential to be applied to the industrial-scale preparation of hdpCOS with high biological activity.
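To put the reported improvements on a common scale, the fold changes can be computed directly from the values given in the abstract above: the half-life at 60 °C increases roughly 20-fold and the specific activity roughly 2.1-fold. The short calculation below uses only those four reported numbers; the variable names are illustrative.

```python
# Values reported in the abstract for wild-type Csn46 and the Mut4 variant.
half_life_wt_min = 34.31      # half-life at 60 °C, minutes
half_life_mut4_min = 690.80
activity_wt = 1671.73         # specific activity, U/mg
activity_mut4 = 3528.77

# Fold change = mutant value divided by wild-type value.
half_life_fold = half_life_mut4_min / half_life_wt_min
activity_fold = activity_mut4 / activity_wt

print(f"half-life fold change: {half_life_fold:.1f}")
print(f"specific activity fold change: {activity_fold:.2f}")
```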
Affiliation(s)
- Changyun Wu
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, China
- Xiaowei Yu
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, China
- Pu Zheng
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, China
- Pengcheng Chen
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, China
- Dan Wu
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, China
7. Zhang L, Liu H. Exploring binding positions and backbone conformations of peptide ligands of proteins with a backbone-centred statistical energy function. J Comput Aided Mol Des 2023; 37:463-478. [PMID: 37498491 DOI: 10.1007/s10822-023-00518-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 07/05/2023] [Indexed: 07/28/2023]
Abstract
When designing peptide ligands based on the structure of a protein receptor, it can be very useful to narrow down the possible binding positions and bound conformations of the ligand without the need to choose its amino acid sequence in advance. Here, we construct and benchmark a tool for this purpose based on a recently reported statistical energy model named SCUBA (Sidechain-Unknown Backbone Arrangement) for designing protein backbones without considering specific amino acid sequences. With this tool, backbone fragments of different local conformation types are generated and optimized with SCUBA-driven stochastic simulations and simulated annealing, and then ranked and clustered to obtain representative backbone fragment poses of strong SCUBA interaction energies with the receptor. We computationally benchmarked the tool on 111 known protein-peptide complex structures. When the bound ligands are in the strand conformation, the method is able to generate backbone fragments of both low SCUBA energies and low root mean square deviations from experimental structures of peptide ligands. When the bound ligands are helices or coils, low-energy backbone fragments with binding poses similar to experimental structures have been generated for approximately 50% of benchmark cases. We have examined a number of predicted ligand-receptor complexes by atomistic molecular dynamics simulations, in which the peptide ligands have been found to stay at the predicted binding sites and to maintain their local conformations. These results suggest that promising backbone structures of peptides bound to protein receptors can be designed by identifying outstanding minima on the SCUBA-modeled backbone energy landscape.
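The abstract above describes optimizing backbone fragments with stochastic simulations and simulated annealing before ranking them by energy. The sketch below shows generic simulated annealing with a Metropolis acceptance rule on a toy one-dimensional energy function. It does not use SCUBA or any protein representation: the energy function, the geometric cooling schedule, and all parameter values are hypothetical stand-ins chosen only to illustrate the technique.

```python
import math
import random


def toy_energy(x):
    """Hypothetical double-well landscape standing in for a real energy model."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def simulated_annealing(energy, x0, steps=5000, t_start=2.0, t_end=0.01,
                        step_size=0.5, seed=0):
    """Minimize `energy` from x0, returning the best (x, energy) visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = simulated_annealing(toy_energy, x0=2.0)
```

Running several independent trajectories from different starting points and keeping the lowest-energy results mirrors, in a very loose sense, the generate-then-rank workflow the abstract describes.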
Affiliation(s)
- Lu Zhang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230027, Anhui, China
- Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230027, Anhui, China.
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, Anhui, China.
- School of Data Science, University of Science and Technology of China, Hefei, 230027, Anhui, China.
8. Yan J, Li S, Zhang Y, Hao A, Zhao Q. ZetaDesign: an end-to-end deep learning method for protein sequence design and side-chain packing. Brief Bioinform 2023; 24:bbad257. [PMID: 37429578 DOI: 10.1093/bib/bbad257] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/28/2023] [Revised: 06/05/2023] [Accepted: 06/21/2023] [Indexed: 07/12/2023] Open
Abstract
In the last few years, computational protein design has been demonstrated to be the most powerful tool for protein sequence design and side-chain repacking tasks. In practice, these two tasks are strongly related but often treated separately. Moreover, state-of-the-art deep-learning-based methods cannot provide interpretability from an energy perspective, which affects the accuracy of the design. Here we propose a new systematic approach, comprising a posterior probability part and a joint probability part, to address both tasks at once. This approach takes the physicochemical properties of amino acids into consideration and uses the joint probability model to ensure the convergence between structure and amino acid type. Our results demonstrate that this method can generate feasible, high-confidence sequences with low-energy side-chain conformations. The designed sequences can fold into target structures with high confidence and maintain relatively stable biochemical properties. The side-chain conformations have significantly lower energies without relying on a rotamer library or performing expensive conformational searches. Overall, we propose an end-to-end method that combines the advantages of both deep learning and energy-based methods. The design results of this model demonstrate high efficiency and precision, as well as a low energy state and good interpretability.
Affiliation(s)
- Junyu Yan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Shuai Li
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Ying Zhang
- The Key Laboratory of Cell Proliferation and Regulation Biology, Ministry of Education, College of Life Sciences, Beijing Normal University, Beijing, China
- Aimin Hao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Qinping Zhao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
9. Huang J, Xie X, Zheng Z, Ye L, Wang P, Xu L, Wu Y, Yan J, Yang M, Yan Y. De Novo Computational Design of a Lipase with Hydrolysis Activity towards Middle-Chained Fatty Acid Esters. Int J Mol Sci 2023; 24:ijms24108581. [PMID: 37239928 DOI: 10.3390/ijms24108581] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/20/2023] [Revised: 05/08/2023] [Accepted: 05/09/2023] [Indexed: 05/28/2023] Open
Abstract
Innovations in biocatalysts provide great prospects for intolerant environments or novel reactions. Due to the limited catalytic capacity and the long-term and labor-intensive characteristics of mining enzymes with the desired functions, de novo enzyme design was developed to obtain industrial application candidates in a rapid and convenient way. Here, based on the catalytic mechanisms and the known structures of proteins, we proposed a computational protein design strategy combining de novo enzyme design and laboratory-directed evolution. Starting with the theozyme constructed using a quantum-mechanical approach, the theoretical enzyme-skeleton combinations were assembled and optimized via the Rosetta "inside-out" protocol. A small number of designed sequences were experimentally screened using SDS-PAGE, mass spectrometry and a qualitative activity assay in which the designed enzyme 1a8uD1 exhibited a measurable hydrolysis activity of 24.25 ± 0.57 U/g towards p-nitrophenyl octanoate. To improve the activity of the designed enzyme, molecular dynamics simulations and the RosettaDesign application were utilized to further optimize the substrate binding mode and amino acid sequence, thus keeping the residues of theozyme intact. The redesigned lipase 1a8uD1-M8 displayed enhanced hydrolysis activity towards p-nitrophenyl octanoate, 3.34 times higher than that of 1a8uD1. Meanwhile, the natural skeleton protein (PDB entry 1a8u) did not display any hydrolysis activity, confirming that the hydrolysis abilities of the designed 1a8uD1 and the redesigned 1a8uD1-M8 were devised from scratch. More importantly, the designed 1a8uD1-M8 was also able to hydrolyze the natural middle-chained substrate (glycerol trioctanoate), for which the activity was 27.67 ± 0.69 U/g. This study indicates that the strategy employed here has great potential to generate novel enzymes exhibiting the desired reactions.
Affiliation(s)
- Jinsha Huang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Xiaoman Xie
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Zhen Zheng
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Luona Ye
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Pengbo Wang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Li Xu
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Ying Wu
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Jinyong Yan
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Min Yang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yunjun Yan
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
10. Liu H, Chen Q. Computational protein design with data‐driven approaches: Recent developments and perspectives. WIREs Comput Mol Sci 2022. [DOI: 10.1002/wcms.1646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Affiliation(s)
- Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China
- School of Data Science, University of Science and Technology of China, Hefei, Anhui, China
| | - Quan Chen
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine University of Science and Technology of China Hefei Anhui China
- Biomedical Sciences and Health Laboratory of Anhui Province University of Science and Technology of China Hefei Anhui China
|
11
|
Dicks L, Wales DJ. Exploiting Sequence-Dependent Rotamer Information in Global Optimization of Proteins. J Phys Chem B 2022; 126:8381-8390. [PMID: 36257022 PMCID: PMC9623586 DOI: 10.1021/acs.jpcb.2c04647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
Rotamers, namely amino acid side chain conformations common to many different peptides, can be compiled into libraries. These rotamer libraries are used in protein modeling, where the limited conformational space occupied by amino acid side chains is exploited. Here, we construct a sequence-dependent rotamer library from simulations of all possible tripeptides, which provides rotameric states dependent on adjacent amino acids. We observe significant sensitivity of rotamer populations to sequence and find that the library is successful in locating side chain conformations present in crystal structures. The library is designed for applications with basin-hopping global optimization, where we use it to propose moves in conformational space. The addition of rotamer moves significantly increases the efficiency of protein structure prediction within this framework, and we determine parameters to optimize efficiency.
Affiliation(s)
- L. Dicks
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- IBM Research, The Hartree Centre STFC Laboratory, Sci-Tech Daresbury, Warrington WA4 4AD, United Kingdom
| | - D. J. Wales
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
|
12
|
Yuan B, Ru X, Lin Z. Analysis of the sidechain structures of amino acids and peptides and a deduced method for the efficient search of peptide conformations. Comput Theor Chem 2022. [DOI: 10.1016/j.comptc.2022.113815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
|
13
|
Liu Y, Zhang L, Wang W, Zhu M, Wang C, Li F, Zhang J, Li H, Chen Q, Liu H. Rotamer-free protein sequence design based on deep learning and self-consistency. Nat Comput Sci 2022; 2:451-462. [PMID: 38177863 DOI: 10.1038/s43588-022-00273-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 12/27/2021] [Accepted: 06/07/2022] [Indexed: 01/06/2024]
Abstract
Several previously proposed deep learning methods to design amino acid sequences that autonomously fold into a given protein backbone yielded promising results in computational tests but did not outperform conventional energy function-based methods in wet experiments. Here we present the ABACUS-R method, which uses an encoder-decoder network trained using a multitask learning strategy to predict the sidechain type of a central residue from its three-dimensional local environment, which includes, besides other features, the types but not the conformations of the surrounding sidechains. This eliminates the need to reconstruct and optimize sidechain structures, and drastically simplifies the sequence design process. Thus iteratively applying the encoder-decoder to different central residues is able to produce self-consistent overall sequences for a target backbone. Results of wet experiments, including five structures solved by X-ray crystallography, show that ABACUS-R outperforms state-of-the-art energy function-based methods in success rate and design precision.
Affiliation(s)
- Yufeng Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
| | - Lu Zhang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
| | - Weilun Wang
- CAS Key Laboratory of GIPAS, School of Information Science and Technology, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui, China
| | - Min Zhu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
| | - Chenchen Wang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
| | - Fudong Li
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China
| | - Jiahai Zhang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China
| | - Houqiang Li
- CAS Key Laboratory of GIPAS, School of Information Science and Technology, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui, China
| | - Quan Chen
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China
| | - Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China
- School of Data Science, University of Science and Technology of China, Hefei, Anhui, China
|
14
|
Sun J, Wu B. Protein design with a machine-learned potential about backbone designability. Trends Biochem Sci 2022; 47:638-640. [DOI: 10.1016/j.tibs.2022.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2022] [Revised: 04/07/2022] [Accepted: 04/07/2022] [Indexed: 10/18/2022]
|
15
|
Chen Y, Chen Q, Liu H. DEPACT and PACMatch: A Workflow of Designing De Novo Protein Pockets to Bind Small Molecules. J Chem Inf Model 2022; 62:971-985. [PMID: 35171604 DOI: 10.1021/acs.jcim.1c01398] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
Engineering of new functional proteins such as enzymes and biosensors involves the design of new protein pockets for the specific binding of small molecules. Here, we report a workflow composed of two new computational methods to execute this task. The DEPACT (Design Pocket as a Cluster based on Templates) method is a data-driven approach to design and evaluate small-molecule-binding pockets as isolated clusters, while the PACMatch method is a computational approach to match pocket residues in a cluster model to positions on given protein scaffolds. Using DEPACT and its scoring function, pocket clusters of natural-pocket-like chemical compositions and protein-ligand interaction strength can be designed. DEPACT can design pocket clusters containing water- or metal-ion-mediated protein-ligand interactions. While being able to efficiently treat relatively large pocket cluster models (e.g., of around 10 pocket residues), PACMatch outperforms previous methods in test cases of recovering the native positions of pocket residues in natural enzyme-substrate complexes.
Affiliation(s)
- Yaoxi Chen
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
| | - Quan Chen
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science & Technology of China, Hefei, Anhui 230027, China
| | - Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science & Technology of China, Hefei, Anhui 230027, China
- School of Data Science, University of Science and Technology of China, Hefei, Anhui 230027, China
|
16
|
A backbone-centred energy function of neural networks for protein design. Nature 2022; 602:523-528. [PMID: 35140398 DOI: 10.1038/s41586-021-04383-5] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 03/05/2021] [Accepted: 12/23/2021] [Indexed: 12/29/2022]
Abstract
A protein backbone structure is designable if a substantial number of amino acid sequences exist that autonomously fold into it. It has been suggested that the designability of backbones is governed mainly by side chain-independent or side chain type-insensitive molecular interactions, indicating an approach for designing new backbones (ready for amino acid selection) based on continuous sampling and optimization of the backbone-centred energy surface. However, a sufficiently comprehensive and precise energy function has yet to be established for this purpose. Here we show that this goal is met by a statistical model named SCUBA (for Side Chain-Unknown Backbone Arrangement) that uses neural network-form energy terms. These terms are learned with a two-step approach that comprises kernel density estimation followed by neural network training and can analytically represent multidimensional, high-order correlations in known protein structures. We report the crystal structures of nine de novo proteins whose backbones were designed to high precision using SCUBA, four of which have novel, non-natural overall architectures. By eschewing use of fragments from existing protein structures, SCUBA-driven structure design facilitates far-reaching exploration of the designable backbone space, thus extending the novelty and diversity of the proteins amenable to de novo design.
|
17
|
Liang S, Li Z, Zhan J, Zhou Y. De novo protein design by an energy function based on series expansion in distance and orientation dependence. Bioinformatics 2021; 38:86-93. [PMID: 34406339 DOI: 10.1093/bioinformatics/btab598] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/05/2021] [Revised: 08/11/2021] [Accepted: 08/16/2021] [Indexed: 02/03/2023]
Abstract
Motivation: Despite many successes, de novo protein design is not yet a solved problem as its success rate remains low. The low success rate is largely because we do not yet have an accurate energy function for describing the solvent-mediated interaction between amino acid residues in a protein chain. Previous studies showed that an energy function based on series expansions with its parameters optimized for side-chain and loop conformations can lead to one of the most accurate methods for side chain (OSCAR) and loop prediction (LEAP). Following the same strategy, we developed an energy function based on series expansions with the parameters optimized in four separate stages (recovering single-residue types without and with orientation dependence, selecting loop decoys and maintaining the composition of amino acids). We tested the energy function for de novo design by using Monte Carlo simulated annealing. Results: The method for protein design (OSCAR-Design) is found to be as accurate as OSCAR and LEAP for side-chain and loop prediction, respectively. In de novo design, it can recover native residue types ranging from 38% to 43% depending on test sets, conserve hydrophobic/hydrophilic residues at ∼75%, and yield the overall similarity in amino acid compositions at more than 90%. These performance measures are all statistically significantly better than several protein design programs compared. Moreover, the largest hydrophobic patch areas in designed proteins are near or smaller than those in native proteins. Thus, an energy function based on series expansion can be made useful for protein design. Availability and implementation: The Linux executable version is freely available for academic users at http://zhouyq-lab.szbl.ac.cn/resources/.
Affiliation(s)
- Shide Liang
- Department of R & D, Bio-Thera Solutions, Guangzhou 510530, China
| | - Zhixiu Li
- Institute of Health and Biomedical Innovation, Queensland University of Technology at Translational Research Institute, Woolloongabba, QLD 3001, Australia
| | - Jian Zhan
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast Campus, Southport, QLD 4222, Australia
- Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
| | - Yaoqi Zhou
- Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
- Peking University Shenzhen Graduate School, Shenzhen 518055, China
|
18
|
Liu R, Wang J, Xiong P, Chen Q, Liu H. De novo sequence redesign of a functional Ras-binding domain globally inverted the surface charge distribution and led to extreme thermostability. Biotechnol Bioeng 2021; 118:2031-2042. [PMID: 33590881 DOI: 10.1002/bit.27716] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/23/2020] [Revised: 02/05/2021] [Accepted: 02/14/2021] [Indexed: 11/05/2022]
Abstract
To acquire extremely thermostable proteins of given functions is challenging for conventional protein engineering. Here we applied ABACUS, a statistical energy function we developed for de novo amino acid sequence design, to globally redesign a Ras-binding domain (RBD), and obtained an extremely thermostable RBD that unfolds reversibly at above 110°C. The redesigned RBD was experimentally confirmed to have the expected structure and Ras-binding interface. Directed evolution of the redesigned RBD improved its Ras-binding affinity to the native protein level without excessive loss of thermostability. The designed amino acid substitutions were mostly at the protein surface. For many substitutions, strong epistasis or significantly differentiated effects on thermostability in the native sequence context relative to the redesigned sequence context were observed, suggesting the globally redesigned sequence to be unreachable through combining beneficial mutations of the native sequence. Further analyses revealed that by replacing 38 of a total of 48 non-interfacial surface residues at once, ABACUS redesign was able to globally "invert" the protein's charge distribution pattern in an optimized way. Our study demonstrates that computational protein design provides powerful new tools to solve challenging protein engineering problems.
Affiliation(s)
- Ruicun Liu
- School of Life Sciences, University of Science and Technology of China, Hefei, Anhui, China
| | - Jichao Wang
- School of Life Sciences, University of Science and Technology of China, Hefei, Anhui, China
| | - Peng Xiong
- School of Life Sciences, University of Science and Technology of China, Hefei, Anhui, China
| | - Quan Chen
- School of Life Sciences, University of Science and Technology of China, Hefei, Anhui, China
- Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, Anhui, China
| | - Haiyan Liu
- School of Life Sciences, University of Science and Technology of China, Hefei, Anhui, China
- Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, Anhui, China
- School of Data Science, University of Science and Technology of China, Hefei, Anhui, China
|
19
|
Huang X, Pearce R, Zhang Y. FASPR: an open-source tool for fast and accurate protein side-chain packing. Bioinformatics 2020; 36:3758-3765. [PMID: 32259206 DOI: 10.1093/bioinformatics/btaa234] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 01/11/2020] [Revised: 03/30/2020] [Accepted: 04/01/2020] [Indexed: 01/04/2023]
Abstract
Motivation: Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory. Results: We present FASPR for fast and accurate PSCP by using an optimized scoring function in combination with a deterministic searching algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. For the assessment on native backbones, FASPR achieved a good performance by correctly predicting 69.1% of all the side-chain dihedral angles using a stringent tolerance criterion of 20°, compared favorably with SCWRL4, CISRR, RASP and SCATD which successfully predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR achieved the highest speed for packing the 379 test protein structures in only 34.3 s, which was significantly faster than the control methods. For the assessment on non-native backbones, FASPR showed an equivalent or better performance on I-TASSER predicted backbones and the backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of the dead-end elimination and tree decomposition with a well optimized scoring function, which makes FASPR of practical use for both protein structure modeling and protein design studies. Availability and implementation: The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Robin Pearce
- Department of Computational Medicine and Bioinformatics
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
20
|
Qi Y, Zhang JZH. DenseCPD: Improving the Accuracy of Neural-Network-Based Computational Protein Sequence Design with DenseNet. J Chem Inf Model 2020; 60:1245-1252. [DOI: 10.1021/acs.jcim.0c00043] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/03/2023]
Affiliation(s)
- Yifei Qi
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - John Z. H. Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Department of Chemistry, New York University, New York, New York 10003, United States
|
21
|
Chen S, Sun Z, Lin L, Liu Z, Liu X, Chong Y, Lu Y, Zhao H, Yang Y. To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map. J Chem Inf Model 2019; 60:391-399. [PMID: 31800243 DOI: 10.1021/acs.jcim.9b00438] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/30/2022]
Abstract
Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. Protein sequence profiles can be computationally predicted by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, has achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties that are not sufficient to represent three-dimensional (3D) structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image captioning learning frame. To the best of our knowledge, this is the first method to employ a 2D distance map for predicting protein properties. SPROF achieved 39.8% in sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found the sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such network architecture using a 2D distance map is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction. The online server and the source code are available at http://biomed.nscc-gz.cn and https://github.com/biomed-AI/SPROF, respectively.
Affiliation(s)
- Sheng Chen
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
| | - Zhe Sun
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
| | - Lihua Lin
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
| | - Zifeng Liu
- Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
| | - Xun Liu
- Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
| | - Yutian Chong
- Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
| | - Yutong Lu
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
| | - Huiying Zhao
- Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510000, China
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University) of the Ministry of Education, Guangzhou 510000, China
|