1
Yu H, Luo X. ThermoFinder: A sequence-based thermophilic proteins prediction framework. Int J Biol Macromol 2024; 270:132469. [PMID: 38761901 DOI: 10.1016/j.ijbiomac.2024.132469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 05/20/2024]
Abstract
Thermophilic proteins are important for academic research and industrial processes, and various computational methods have been developed to identify and screen them. However, the performance of these methods has been limited by the lack of high-quality labeled data and of efficient models for representing proteins. Here, we proposed a novel sequence-based framework for predicting thermophilic proteins, called ThermoFinder. The results demonstrated that ThermoFinder outperforms previous state-of-the-art tools on two benchmark datasets, and feature ablation experiments confirmed the effectiveness of our approach. Additionally, ThermoFinder exhibited exceptional performance and consistency across two newly constructed datasets, one of which was specifically constructed for the regression-based prediction of optimum temperature values directly from protein sequences. Feature importance analysis using Shapley additive explanations further validated the advantages of ThermoFinder. We believe that ThermoFinder will be a valuable and comprehensive framework for predicting thermophilic proteins, and we have made our model open source and available on GitHub at https://github.com/Luo-SynBioLab/ThermoFinder.
Affiliation(s)
- Han Yu
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Xiaozhou Luo
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
2
Yu H, Deng H, He J, Keasling JD, Luo X. UniKP: a unified framework for the prediction of enzyme kinetic parameters. Nat Commun 2023; 14:8211. [PMID: 38081905 PMCID: PMC10713628 DOI: 10.1038/s41467-023-44113-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/10/2023] [Accepted: 11/30/2023] [Indexed: 12/18/2023] Open
Abstract
Prediction of enzyme kinetic parameters is essential for designing and optimizing enzymes for various biotechnological and industrial applications, but the limited performance of current prediction tools on diverse tasks hinders their practical use. Here, we introduce UniKP, a unified framework based on pretrained language models for the prediction of enzyme kinetic parameters, including enzyme turnover number (kcat), Michaelis constant (Km), and catalytic efficiency (kcat / Km), from protein sequences and substrate structures. A two-layer framework derived from UniKP (EF-UniKP) has also been proposed to allow robust kcat prediction that accounts for environmental factors, including pH and temperature. In addition, four representative re-weighting methods are systematically explored and shown to reduce the prediction error in high-value prediction tasks. We have demonstrated the application of UniKP and EF-UniKP in several enzyme discovery and directed evolution tasks, leading to the identification of new enzymes and enzyme mutants with higher activity. UniKP is a valuable tool for deciphering the mechanisms of enzyme kinetics and enables novel insights into enzyme engineering and its industrial applications.
Affiliation(s)
- Han Yu
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Huaxiang Deng
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Jiahui He
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Jay D Keasling
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Joint BioEnergy Institute, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Department of Chemical and Biomolecular Engineering & Department of Bioengineering, University of California, Berkeley, CA, 94720, USA
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Xiaozhou Luo
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
3
Jia W, Peng J, Zhang Y, Zhu J, Qiang X, Zhang R, Shi L. Exploring novel ANGICon-EIPs through ameliorated peptidomics techniques: Can deep learning strategies as a core breakthrough in peptide structure and function prediction? Food Res Int 2023; 174:113640. [PMID: 37986483 DOI: 10.1016/j.foodres.2023.113640] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/05/2023] [Revised: 10/23/2023] [Accepted: 10/24/2023] [Indexed: 11/22/2023]
Abstract
Dairy-derived angiotensin-I-converting enzyme inhibitory peptides (ANGICon-EIPs) have been regarded as a relatively safe supplementary diet-therapy strategy for individuals with hypertension, and short-chain peptides may have more relevant antihypertensive benefits due to their direct intestinal absorption. Our previous explorations have confirmed that endogenous goat milk short-chain peptides are also an essential source of ANGICon-EIPs. Nonetheless, endogenous ANGICon-EIPs remain little explored because current methods for extracting and enriching endogenous peptides are limited. This review outlines ameliorated pre-treatment strategies, data acquisition methods, and tools for the prediction of peptide structure and function, aiming to provide creative ideas for discovering novel ANGICon-EIPs. Deep learning-based peptide structure and function prediction algorithms have recently achieved significant advancements. The convolutional neural network (CNN) and peptide sequence-based multi-label deep learning approach for determining the multi-functionalities of bioactive peptides (MLBP) can predict multiple peptide functions with absolute true value and accuracy of 0.699 and 0.708, respectively. Trained on peptide sequence input, torsion angles, and inter-residue distances, the neural networks of APPTEST predicted peptide (5-40 aa) structures with an average backbone root mean square deviation (RMSD) as low as 1.96 Å. Overall, with the exploration of more neural network architectures, deep learning could be considered a critical research tool to reduce the cost and improve the efficiency of identifying novel endogenous ANGICon-EIPs.
Affiliation(s)
- Wei Jia
- School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China; Inspection and Testing Center of Fuping County (Shaanxi goat milk product quality supervision and Inspection Center), Weinan 711700, China; Shaanxi Research Institute of Agricultural Products Processing Technology, Xi'an 710021, China.
- Jian Peng
- School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
- Yan Zhang
- Inspection and Testing Center of Fuping County (Shaanxi goat milk product quality supervision and Inspection Center), Weinan 711700, China
- Jiying Zhu
- School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
- Xin Qiang
- Inspection and Testing Center of Fuping County (Shaanxi goat milk product quality supervision and Inspection Center), Weinan 711700, China
- Rong Zhang
- School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
- Lin Shi
- School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China