1
Dou Z, He J, Han C, Wu X, Wan L, Yang J, Zheng Y, Gong B, Wang L. qProtein: Exploring Physical Features of Protein Thermostability Based on Structural Proteomics. J Chem Inf Model 2024; 64:7885-7894. [PMID: 39375829 DOI: 10.1021/acs.jcim.4c01303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/09/2024]
Abstract
Thermostability, which is essential for the functional performance of enzymes, is largely determined by intramolecular physical interactions. Although many tools have been developed, existing computational methods have struggled to find the universal principles of protein thermostability. Recent advances in structural proteomics have been driven by deep neural networks such as AlphaFold2 and ESMFold, which have enabled the characterization of protein structures with unprecedented speed and accuracy. Here, we introduce qProtein, a Python-implemented workflow designed for the quantitative analysis of physical interactions on the scale of structural proteomics. This platform accepts protein sequences as input and produces four structural features: hydrophobic clusters, hydrogen bonds, electrostatic interactions, and disulfide bonds. To demonstrate the use of qProtein, we investigate the structural features related to protein thermostability in six glycoside hydrolase (GH) families, comprising a total of 3,811 protein structures. Our results indicate that in five enzyme families (GH11, GH12, GH5_2, GH10, and GH48), the thermophilic enzymes have a larger average area of hydrophobic clusters than the nonthermophilic enzymes within each family. Furthermore, our analysis of local-structure regions reveals that the hydrophobic clusters are predominantly distributed in the distal regions of the GH11 enzymes. In these distal regions, the average hydrophobic cluster area of the thermophilic enzymes is significantly higher than that of the nonthermophilic enzymes. Therefore, qProtein is well suited for analyzing the structural features of thermal stability at the level of structural proteomics. We provide the source code for qProtein at https://github.com/bj600800/qProtein, and the web server is available at http://qProtein.sdu.edu.cn:8888.
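As a rough illustration of what a hydrophobic cluster can mean computationally, the sketch below groups hydrophobic residues lying within a distance cutoff of one another into connected components. It is not qProtein's implementation: the residue set, the 6.5 Å cutoff, and the use of one representative coordinate per residue are all assumptions, and qProtein reports cluster area, which this sketch does not compute.

```python
import math
from typing import Dict, List, Tuple

# Assumed definitions for illustration only; qProtein's own choices may differ.
HYDROPHOBIC = {"VAL", "LEU", "ILE", "MET", "PHE", "TRP"}
CUTOFF = 6.5  # contact distance in angstroms (hypothetical)

# (three-letter residue name, x, y, z) of one representative atom per residue
Residue = Tuple[str, float, float, float]


def hydrophobic_clusters(residues: List[Residue], cutoff: float = CUTOFF) -> List[List[int]]:
    """Return clusters of hydrophobic residue indices, largest first.

    Two hydrophobic residues are linked if their coordinates are within `cutoff`;
    clusters are the connected components, found with union-find.
    """
    idx = [i for i, r in enumerate(residues) if r[0] in HYDROPHOBIC]
    parent: Dict[int, int] = {i: i for i in idx}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for pos, a in enumerate(idx):
        for b in idx[pos + 1:]:
            if math.dist(residues[a][1:], residues[b][1:]) <= cutoff:
                parent[find(a)] = find(b)  # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in idx:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical toy structure: LEU and ILE are in contact, SER is not hydrophobic, PHE is isolated.
toy = [("LEU", 0.0, 0.0, 0.0), ("ILE", 4.0, 0.0, 0.0), ("SER", 2.0, 0.0, 0.0), ("PHE", 20.0, 0.0, 0.0)]
print(hydrophobic_clusters(toy))  # [[0, 1], [3]]
```

A per-cluster area would additionally need solvent-accessible surface calculations on full atomic coordinates, which is outside this sketch.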
Affiliation(s)
- Zhixin Dou
- State Key Laboratory of Microbial Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
- Jiaxin He
- School of Computer Science and Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
- Chao Han
- Shandong Key Laboratory of Agricultural Microbiology, Shandong Agricultural University, Tai'an 271018, China
- Xiuyun Wu
- State Key Laboratory of Microbial Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
- Lin Wan
- School of Software, Shandong University, Shunhua Road, Jinan 250101, P.R. China
- Jian Yang
- School of Computer Science and Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
- Yanwei Zheng
- School of Computer Science and Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
- Bin Gong
- School of Software, Shandong University, Shunhua Road, Jinan 250101, P.R. China
- Lushan Wang
- State Key Laboratory of Microbial Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
2
Han J, Ullah M, Andoh V, Khan MN, Feng Y, Guo Z, Chen H. Engineering Bacterial Chitinases for Industrial Application: From Protein Engineering to Bacterial Strains Mutation! A Comprehensive Review of Physical, Molecular, and Computational Approaches. J Agric Food Chem 2024; 72:23082-23096. [PMID: 39388625 DOI: 10.1021/acs.jafc.4c06856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Bacterial chitinases are integral to breaking down chitin, the natural polymer in crustacean and insect exoskeletons. Their increasing use across sectors such as agriculture, waste management, biotechnology, food processing, and the pharmaceutical industry highlights their significance as biocatalysts. This review examines scientific strategies to maximize the efficiency and production of bacterial chitinases for industrial use. Our goal is to optimize the heterologous production process using physical, molecular, and computational tools. Physical methods focus on isolating, purifying, and characterizing chitinases from various sources to establish the optimal conditions for maximum enzyme activity. Molecular techniques, including gene cloning, site-directed mutagenesis, and CRISPR-Cas9 gene editing, are used to create chitinases with improved catalytic activity, substrate specificity, and stability. Computational approaches use molecular modeling, docking, and simulation to predict enzyme-substrate interactions and guide the design of chitinase variants. Integrating these multidisciplinary strategies enables the development of highly efficient chitinases tailored for specific industrial applications. This review summarizes current knowledge and advances in chitinase engineering as a guide for researchers and industrialists seeking to optimize chitinase production for various uses.
Affiliation(s)
- Jianda Han
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212000, P. R. China
- Mati Ullah
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212000, P. R. China
- Vivian Andoh
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212000, P. R. China
- Muhammad Nadeem Khan
- Department of Cell Biology and Genetics, Shantou University Medical College, Shantou 515041, P. R. China
- Yong Feng
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212000, P. R. China
- Zhongjian Guo
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212000, P. R. China
- Huayou Chen
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212000, P. R. China
3
Su Z, El Hage M, Linnebacher M. Mutation patterns in colorectal cancer and their relationship with prognosis. Heliyon 2024; 10:e36550. [PMID: 39263143 PMCID: PMC11387246 DOI: 10.1016/j.heliyon.2024.e36550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 08/12/2024] [Accepted: 08/19/2024] [Indexed: 09/13/2024] [Open Access]
Abstract
BACKGROUND Colorectal cancer (CRC) is a prevalent malignancy and a leading cause of cancer-related mortality. Extensive research into the aetiology of CRC has revealed that somatic mutations in certain genes play a crucial role in CRC development. AIM In this study, we used data from public databases to investigate prevalent mutation patterns in CRC and developed a prognostic predictive model for CRC patients based on mutant genetic characteristics and other relevant clinical features. METHODS We first gathered mutation information from CRC patients by analysing data from 15 datasets to identify genes with a mutation frequency of ≥10%. Next, log-rank analyses were used to determine the relationship between prognosis and the mutational status of the most commonly mutated genes, and the SIGnaling database was used to generate a protein-protein interaction network. We consolidated and classified the gene mutation patterns of CRC patients in the database based on frequently mutated genes related to prognosis. A predictive nomogram including age, sex, TNM stage, and mutation partner was constructed from the available clinical, mutational, and prognostic information for CRC patients at our institution. Finally, the reliability of the model was verified using time-dependent ROC curve analysis. RESULTS The top 7 genes somatically mutated in ≥10% of 4477 samples from 4255 patients were TP53 (67%), APC (66%), KRAS (43%), PIK3CA (18%), FBXW7 (14%), SMAD4 (14%), and BRAF (10%). Log-rank analysis demonstrated that the mutation status of 5 genes, namely TP53, APC, PIK3CA, SMAD4, and BRAF, correlated significantly with prognosis. Protein-protein interaction analysis confirmed functional interactions between these 5 genes, implicating them in tumorigenesis. We exhaustively enumerated the mutation patterns involving these five genes in 4255 patients, identifying 32 mutational patterns.
After consolidation and classification, these patterns were divided into 3 grades based on patient prognosis. Next, a predictive nomogram based on the clinical, mutational, and prognostic information of 107 CRC patients treated at University Medical Center Rostock was constructed. The area under the curve (AUC) values of the model for predicting 1-, 3-, and 5-year overall survival were 0.779, 0.721, and 0.815, respectively. CONCLUSION Common mutational patterns based on frequently mutated genes are associated with prognosis in CRC patients. Our study provides a valuable and concise prognostic predictor for determining outcomes in patients with CRC.
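The figure of 32 patterns follows directly from treating each of the five prognosis-related genes as either mutated or wild type, giving 2^5 = 32 combinations. A minimal sketch of that enumeration, and of mapping one patient's mutated genes onto a pattern, is shown below; the helper names are hypothetical, and the subsequent grouping into three prognostic grades is not reproduced.

```python
from itertools import product

# The five prognosis-related genes named in the abstract.
GENES = ["TP53", "APC", "PIK3CA", "SMAD4", "BRAF"]


def mutation_patterns(genes):
    """Every combination of wild type (0) / mutated (1) across the genes."""
    return [dict(zip(genes, states)) for states in product((0, 1), repeat=len(genes))]


def pattern_of(mutated_genes, genes=GENES):
    """Map one patient's set of mutated genes onto a 0/1 pattern; genes outside `genes` are ignored."""
    return tuple(int(g in mutated_genes) for g in genes)


patterns = mutation_patterns(GENES)
print(len(patterns))  # 32
print(pattern_of({"TP53", "APC", "KRAS"}))  # (1, 1, 0, 0, 0)
```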
Affiliation(s)
- Zhaoran Su
- Department of Gastrointestinal Surgery, People's Hospital of Tongling City, China
- College of Mathematics and Computer Science, Tongling University, Tongling 244000, China
- Molecular Oncology and Immunotherapy, Clinic of General Surgery, University Medical Center Rostock, Rostock 18057, Germany
- Maria El Hage
- Molecular Oncology and Immunotherapy, Clinic of General Surgery, University Medical Center Rostock, Rostock 18057, Germany
- Michael Linnebacher
- Molecular Oncology and Immunotherapy, Clinic of General Surgery, University Medical Center Rostock, Rostock 18057, Germany
4
Li SS, Liu ZM, Li J, Ma YB, Dong ZY, Hou JW, Shen FJ, Wang WB, Li QM, Su JG. Prediction of mutation-induced protein stability changes based on the geometric representations learned by a self-supervised method. BMC Bioinformatics 2024; 25:282. [PMID: 39198740 PMCID: PMC11360314 DOI: 10.1186/s12859-024-05876-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 07/19/2024] [Indexed: 09/01/2024] [Open Access]
Abstract
BACKGROUND Thermostability is a fundamental property that allows proteins to maintain their biological functions. Predicting protein stability changes upon mutation is important for understanding protein structure-function relationships and is also of great interest in protein engineering and pharmaceutical design. RESULTS Here we present mutDDG-SSM, a deep learning-based framework that uses the geometric representations encoded in protein structures to predict mutation-induced protein stability changes. mutDDG-SSM consists of two parts: a graph attention network-based protein structural feature extractor, trained with a self-supervised learning scheme on large-scale high-resolution protein structures, and an eXtreme Gradient Boosting (XGBoost)-based stability change predictor, which helps alleviate overfitting. The performance of mutDDG-SSM was tested on several widely used independent datasets. Then, myoglobin and p53 were used as case studies to illustrate the effectiveness of the model in predicting protein stability changes upon mutations. Our results show that mutDDG-SSM achieved high performance in estimating the effects of mutations on protein stability. In addition, mutDDG-SSM showed little bias: its prediction accuracy on inverse mutations is as good as that on direct mutations. CONCLUSION Meaningful features can be extracted from our pre-trained model to build downstream tasks, and our model may serve as a valuable tool for protein engineering and drug design.
Affiliation(s)
- Shan Shan Li
- High Performance Computing Center, National Vaccine and Serum Institute (NVSI), Beijing, China
- National Engineering Center for New Vaccine Research, Beijing, China
- Zhao Ming Liu
- National Engineering Center for New Vaccine Research, Beijing, China
- The Sixth Laboratory, National Vaccine and Serum Institute (NVSI), Beijing, China
- Jiao Li
- High Performance Computing Center, National Vaccine and Serum Institute (NVSI), Beijing, China
- National Engineering Center for New Vaccine Research, Beijing, China
- Yi Bo Ma
- High Performance Computing Center, National Vaccine and Serum Institute (NVSI), Beijing, China
- National Engineering Center for New Vaccine Research, Beijing, China
- Ze Yuan Dong
- High Performance Computing Center, National Vaccine and Serum Institute (NVSI), Beijing, China
- National Engineering Center for New Vaccine Research, Beijing, China
- Jun Wei Hou
- National Engineering Center for New Vaccine Research, Beijing, China
- The Sixth Laboratory, National Vaccine and Serum Institute (NVSI), Beijing, China
- Fu Jie Shen
- National Engineering Center for New Vaccine Research, Beijing, China
- The Sixth Laboratory, National Vaccine and Serum Institute (NVSI), Beijing, China
- Wei Bu Wang
- High Performance Computing Center, National Vaccine and Serum Institute (NVSI), Beijing, China
- National Engineering Center for New Vaccine Research, Beijing, China
- Qi Ming Li
- National Engineering Center for New Vaccine Research, Beijing, China
- The Sixth Laboratory, National Vaccine and Serum Institute (NVSI), Beijing, China
- Ji Guo Su
- High Performance Computing Center, National Vaccine and Serum Institute (NVSI), Beijing, China
- National Engineering Center for New Vaccine Research, Beijing, China
5
Bernett J, Blumenthal DB, Grimm DG, Haselbeck F, Joeres R, Kalinina OV, List M. Guiding questions to avoid data leakage in biological machine learning applications. Nat Methods 2024; 21:1444-1453. [PMID: 39122953 DOI: 10.1038/s41592-024-02362-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/10/2024] [Accepted: 06/26/2024] [Indexed: 08/12/2024]
Abstract
Machine learning methods for extracting patterns from high-dimensional data are very important in the biological sciences. However, in certain cases, real-world applications cannot confirm the reported prediction performance. One of the main reasons for this is data leakage, which can be seen as the illicit sharing of information between the training data and the test data, resulting in performance estimates that are far better than the performance observed in the intended application scenario. Data leakage can be difficult to detect in biological datasets due to their complex dependencies. With this in mind, we present seven questions that should be asked to prevent data leakage when constructing machine learning models in biological domains. We illustrate the usefulness of our questions by applying them to nontrivial examples. Our goal is to raise awareness of potential data leakage problems and to promote robust and reproducible machine learning-based research in biology.
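One concrete leakage source in mutation or protein datasets is letting records from the same protein fall on both sides of a train/test split. A minimal sketch of a group-aware split, assuming each record carries a protein identifier, is shown below; it illustrates only one of the concerns the authors raise, and the function name is hypothetical. scikit-learn's GroupShuffleSplit and GroupKFold implement the same idea.

```python
import random
from collections import defaultdict


def group_split(samples, group_of, test_fraction=0.2, seed=0):
    """Split samples so that each group (e.g. a protein or protein family) lies wholly in train or in test."""
    groups = defaultdict(list)
    for s in samples:
        groups[group_of(s)].append(s)
    keys = sorted(groups)  # deterministic order before shuffling
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))  # at least one group held out
    test_keys = set(keys[:n_test])
    train = [s for k in keys if k not in test_keys for s in groups[k]]
    test = [s for k in keys if k in test_keys for s in groups[k]]
    return train, test


# Hypothetical mutation records keyed by protein ID.
samples = [("P1", "A10G"), ("P1", "L20K"), ("P2", "V5A"), ("P3", "I7T"), ("P3", "F9S")]
train, test = group_split(samples, group_of=lambda s: s[0])
print(train, test)
```

A plain per-record random split would often place A10G and L20K from the same protein P1 on opposite sides, letting the model exploit protein identity rather than learn the mutational effect.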
Affiliation(s)
- Judith Bernett
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- David B Blumenthal
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Dominik G Grimm
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Florian Haselbeck
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
- Smart Farming, Weihenstephan-Triesdorf University of Applied Sciences, Freising, Germany
- Roman Joeres
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Medical Faculty, Saarland University, Homburg, Germany
- Markus List
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Munich Data Science Institute (MDSI), Technical University of Munich, Garching, Germany
6
Cuturello F, Celoria M, Ansuini A, Cazzaniga A. Enhancing predictions of protein stability changes induced by single mutations using MSA-based Language Models. Bioinformatics 2024; 40:btae447. [PMID: 39012369 PMCID: PMC11269464 DOI: 10.1093/bioinformatics/btae447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 06/19/2024] [Accepted: 07/10/2024] [Indexed: 07/17/2024] [Open Access]
Abstract
MOTIVATION Protein Language Models offer a new perspective for addressing challenges in structural biology while relying solely on sequence information. Recent studies have investigated their effectiveness in predicting shifts in thermodynamic stability caused by single amino acid mutations, a task known to be difficult because data are sparse and constrained by experimental limitations. To tackle this problem, we introduce two key novelties: leveraging a Protein Language Model that incorporates Multiple Sequence Alignments (MSAs) to capture evolutionary information, and using a recently released mega-scale dataset with rigorous data pre-processing to mitigate overfitting. RESULTS We ensure comprehensive comparisons by fine-tuning various pre-trained models, supported by ablation studies and baseline evaluations. Our methodology introduces a stringent policy to reduce the widespread issue of data leakage, removing sequences from the training set when they exhibit significant similarity to the test set. The MSA Transformer emerges as the most accurate of the models investigated, owing to its ability to leverage co-evolution signals encoded in aligned homologous sequences. Moreover, the optimized MSA Transformer outperforms existing methods and exhibits enhanced generalization power, leading to a notable improvement in predicting changes in protein stability resulting from point mutations. AVAILABILITY AND IMPLEMENTATION Code and data are available at https://github.com/RitAreaSciencePark/PLM4Muts. SUPPLEMENTARY INFORMATION Supplementary information is available at Bioinformatics online.
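The similarity-based filtering step described above can be sketched as removing any training sequence that is too similar to some test sequence. In the sketch, difflib's SequenceMatcher ratio stands in as a crude similarity proxy and the 0.3 threshold is a hypothetical choice; real pipelines usually compute identity with alignment or clustering tools such as MMseqs2 or BLAST, and the paper's exact criterion is not reproduced here.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for alignment-based sequence identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_training(train_seqs, test_seqs, threshold=0.3):
    """Keep only training sequences whose similarity to every test sequence is <= threshold."""
    return [s for s in train_seqs
            if all(similarity(s, t) <= threshold for t in test_seqs)]


# Hypothetical toy sequences: the first training sequence nearly matches the test sequence.
train = ["MKTAYIAKQR", "GGGGSSSSPP"]
test = ["MKTAYIAKQK"]
print(filter_training(train, test))  # ['GGGGSSSSPP']
```

The all-pairs loop is quadratic in the number of sequences, which is one reason dedicated clustering tools are preferred at the scale of mega-scale datasets.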
Affiliation(s)
- Francesca Cuturello
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
- Marco Celoria
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
- HPC Department, CINECA National Supercomputing Center, Bologna 40033, Italy
- Alessio Ansuini
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
- Alberto Cazzaniga
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
7
Narayanan KK, Weigle AT, Xu L, Mi X, Zhang C, Chen LQ, Procko E, Shukla D. Deep mutational scanning reveals sequence to function constraints for SWEET family transporters. bioRxiv 2024:2024.06.28.601307. [PMID: 39005363 PMCID: PMC11244857 DOI: 10.1101/2024.06.28.601307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Protein science is entering a transformative phase enabled by deep mutational scans (DMS), which provide an unbiased view of the residue-level interactions that mediate function. However, DMS has yet to be used extensively to characterize the mutational and evolutionary landscapes of plant proteins. Here, we apply the method to explore sequence-function relationships within the sugar transporter AtSWEET13. DMS results describe how mutations throughout different regions of the protein affect AtSWEET13 abundance and transport function. Our results identify novel transport-enhancing mutations that are validated using FRET sensor assays. Extending DMS results to phylogenetic analyses reveals the role of transmembrane helix 4 (TM4), which makes the SWEET family transporters distinct from prokaryotic SemiSWEETs. We show that TM4 is intolerant to motif swapping with other clade-specific SWEET TM4 compositions, despite accommodating single point mutations towards aromatic and charged polar amino acids. We further show that physics-based and machine learning (ML)-based in silico variant prediction tools, including transfer learning approaches, have limited utility for engineering plant proteins, as they were unable to reproduce our experimental results. We conclude that DMS can produce datasets that, when combined with the right predictive computational frameworks, can direct plant engineering efforts through derivative phenotype selection and evolutionary insights.
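DMS experiments of this kind are usually scored by how strongly each variant is enriched in a selected population relative to a reference library, normalized to wild type. The sketch below shows one common form, a log2 enrichment ratio with a pseudocount; it is a generic assumption about DMS scoring, not the specific pipeline used for AtSWEET13, and the parameter names are hypothetical.

```python
import math


def enrichment(sel_var, ref_var, sel_wt, ref_wt, pseudocount=0.5):
    """log2 enrichment of a variant relative to wild type.

    sel_*: read counts in the selected (e.g. sorted) population
    ref_*: read counts in the reference (unselected) library
    The pseudocount avoids division by zero for unobserved variants.
    """
    variant_ratio = (sel_var + pseudocount) / (ref_var + pseudocount)
    wt_ratio = (sel_wt + pseudocount) / (ref_wt + pseudocount)
    return math.log2(variant_ratio / wt_ratio)


print(enrichment(400, 100, 100, 100, pseudocount=0))  # 2.0: variant enriched fourfold relative to wild type
print(enrichment(0, 50, 100, 100))  # negative: variant depleted after selection
```

Positive scores would flag candidate function-enhancing variants for follow-up validation, such as the FRET assays mentioned above.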
Affiliation(s)
- Krishna K. Narayanan
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
- Austin T. Weigle
- Department of Chemistry, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
- Lingyun Xu
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
- Xuenan Mi
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
- Chen Zhang
- Department of Plant Biology, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
- Li-Qing Chen
- Department of Plant Biology, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
- Erik Procko
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
- Cyrus Biotechnology, Inc., Seattle, Washington 98121, United States
- Diwakar Shukla
- Department of Chemical & Biomolecular Engineering; Department of Plant Biology; Department of Bioengineering; Department of Chemistry, Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
8
Sun X, Yang S, Wu Z, Su J, Hu F, Chang F, Li C. PMSPcnn: Predicting protein stability changes upon single point mutations with convolutional neural network. Structure 2024; 32:838-848.e3. [PMID: 38508191 DOI: 10.1016/j.str.2024.02.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 12/19/2023] [Accepted: 02/22/2024] [Indexed: 03/22/2024]
Abstract
Protein missense mutations and the resulting protein stability changes are important causes of many human genetic diseases. However, accurately predicting stability changes due to mutations remains a challenging problem. To address this problem, we have developed PMSPcnn, an unbiased and effective model based on a convolutional neural network. We use the anti-symmetry property (a reverse mutation has the same ΔΔG magnitude with the opposite sign) to build a balanced training dataset, which improves prediction, particularly for stabilizing mutations. Persistent homology, an effective approach for characterizing protein structures, is used to obtain topological features. Additionally, a regression stratification cross-validation scheme is proposed to improve prediction for mutations with extreme ΔΔG. On three test datasets (Ssym, p53, and myoglobin), PMSPcnn performs better than existing predictors. PMSPcnn also outperforms currently available methods for membrane proteins. Overall, PMSPcnn is a promising method for predicting protein stability changes caused by single point mutations.
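The anti-symmetry augmentation described above can be sketched as adding, for each measured mutation, its reverse with the opposite ΔΔG sign, which balances stabilizing and destabilizing examples. The record type and field names are hypothetical; note also that features for a reverse mutation should ideally be computed from the mutant structure, which this data-level sketch does not handle.

```python
from typing import List, NamedTuple


class Mutation(NamedTuple):
    protein: str
    position: int
    wild_type: str
    mutant: str
    ddg: float  # stability change; the sign convention must be consistent across the dataset


def add_reverse_mutations(data: List[Mutation]) -> List[Mutation]:
    """Return the data plus one reverse mutation (mutant -> wild type, ddG negated) per record."""
    reverse = [m._replace(wild_type=m.mutant, mutant=m.wild_type, ddg=-m.ddg) for m in data]
    return data + reverse


data = [Mutation("1ABC", 10, "L", "A", 1.5)]  # hypothetical record
for m in add_reverse_mutations(data):
    print(m)
```

After augmentation the ΔΔG values are symmetric around zero, so a model cannot reduce its loss simply by predicting that most mutations are destabilizing.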
Affiliation(s)
- Xiaohan Sun
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Shuang Yang
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Jingjie Su
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fangrui Hu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fubin Chang
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
9
Qiu Y, Huang T, Cai YD. Review of predicting protein stability changes upon variations. Proteomics 2024; 24:e2300371. [PMID: 38643379 DOI: 10.1002/pmic.202300371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 04/07/2024] [Accepted: 04/08/2024] [Indexed: 04/22/2024]
Abstract
Predicting changes in protein stability caused by variants is of great importance, and improving the thermal stability of proteins matters for biomedical and industrial applications. This review discusses the latest methods for predicting the effects of mutations on protein stability, databases containing protein mutations and thermodynamic parameters, and experimental techniques for efficiently assessing protein stability in high-throughput settings. Various publicly available databases for protein stability prediction are introduced. Furthermore, state-of-the-art computational approaches for predicting protein stability changes due to variants are reviewed, and the feature types, base algorithm, and prediction results of each method are detailed. Additionally, some experimental approaches for verifying the predictions of computational methods are introduced. Finally, the review summarizes the progress and challenges of protein stability prediction and discusses potential models for future research.
Affiliation(s)
- Yiling Qiu
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
10
Zhang Y, Wu K, Li Y, Wu S, Warshel A, Bai C. Predicting Mutational Effects on Ca2+-Activated Chloride Conduction of TMEM16A Based on a Simulation Study. J Am Chem Soc 2024; 146:4665-4679. [PMID: 38319142 DOI: 10.1021/jacs.3c11940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
Dysfunction and defects of ion channels are associated with many human diseases, especially loss-of-function mutations in ion channels, such as mutations of the cystic fibrosis transmembrane conductance regulator in cystic fibrosis. Understanding ion channels is of great current importance for both medical and fundamental purposes. Such an understanding should include the ability to predict mutational effects and to describe their functional and mechanistic consequences. In this work, we introduce an approach to predict mutational effects based on kinetic information (including reaction barriers and transition state locations) obtained by studying the working mechanism of target proteins. Specifically, we take the Ca2+-activated chloride channel TMEM16A as an example and use a computational biology model to predict the mutational effects of key residues. We verified our predictions through electrophysiological experiments, which showed 94% prediction accuracy for the direction of mutational effects. For mutational strength, the Pearson correlation coefficient between our calculations and the experimental results is -0.80. These findings suggest that the proposed methodology is reliable and can provide valuable guidance for revealing functional mechanisms and identifying key residues of the TMEM16A channel. The proposed approach can be extended to a broad range of biophysical systems.
Affiliation(s)
- Yue Zhang
- Warshel Institute for Computational Biology, School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong (Shenzhen), Shenzhen 518172, China
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Kang Wu
- South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
- Yuqing Li
- South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
- Song Wu
- South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
- Arieh Warshel
- Department of Chemistry, University of Southern California, Los Angeles, California 90089-1062, United States
- Chen Bai
- Warshel Institute for Computational Biology, School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong (Shenzhen), Shenzhen 518172, China
- Chenzhu Biotechnology Co., Ltd., Hangzhou 310005, China
11
Cheon H, Kim JH, Kim JS, Park JB. Valorization of single-carbon chemicals by using carboligases as key enzymes. Curr Opin Biotechnol 2024; 85:103047. [PMID: 38128199 DOI: 10.1016/j.copbio.2023.103047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/23/2023] [Accepted: 11/24/2023] [Indexed: 12/23/2023]
Abstract
Single-carbon (C1) biorefinery plays a key role in consuming greenhouse gases and in a circular carbon economy. Accordingly, we focus on the valorization of C1 compounds (e.g. methanol, formaldehyde, and formate) into multicarbon products, including bioplastic monomers, glycolate, and ethylene glycol. For instance, methanol, derived from the oxidation of CH4, can be converted into glycolate, ethylene glycol, or erythrulose via formaldehyde and glycolaldehyde, using C1 and/or C2 carboligases as essential enzymes. Escherichia coli has been engineered to convert formate, produced either from CO via CO2 or directly from CO2, into glycolate. Recent progress in the design of biotransformation pathways, enzyme discovery and engineering, and whole-cell biocatalyst engineering for C1 biorefinery is addressed in this review.
Affiliation(s)
- Huijin Cheon
- Department of Food Science and Biotechnology, Ewha Womans University, Seoul 03760, Republic of Korea
- Jun-Hong Kim
- Department of Chemistry, Chonnam National University, Gwangju 61186, Republic of Korea
- Jeong-Sun Kim
- Department of Chemistry, Chonnam National University, Gwangju 61186, Republic of Korea
- Jin-Byung Park
- Department of Food Science and Biotechnology, Ewha Womans University, Seoul 03760, Republic of Korea

12
Zheng F, Liu Y, Yang Y, Wen Y, Li M. Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset. Protein Sci 2024; 33:e4861. [PMID: 38084013 PMCID: PMC10751734 DOI: 10.1002/pro.4861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 11/14/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. Nevertheless, comparing these methods poses challenges due to variations in their training data. Moreover, it is observed that they tend to perform better at predicting destabilizing mutations than stabilizing ones. Here, we meticulously compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations using 27 computational methods, including the latest ones utilizing mega-scale stability datasets and transfer learning. We excluded entries with overlap or similarity to training datasets to ensure fairness. Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53 on unseen data, and none of the methods could accurately predict stabilizing mutations, even those performing well in anti-symmetric property analysis. While most methods present consistent trends for predicting destabilizing mutations across various properties such as solvent exposure and secondary conformation, stabilizing mutations do not exhibit a clear pattern. Our study also suggests that solely addressing training dataset bias may not significantly enhance accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.
Affiliation(s)
- Feifan Zheng
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yang Liu
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yan Yang
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yuhao Wen
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Minghui Li
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China

13
Liu Z, Bao Y, Wang W, Pan L, Wang H, Lin GN. Emden: A novel method integrating graph and transformer representations for predicting the effect of mutations on clinical drug response. Comput Biol Med 2023; 167:107678. [PMID: 37976823 DOI: 10.1016/j.compbiomed.2023.107678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/22/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023]
Abstract
Precision medicine based on personalized genomics provides promising strategies to enhance the efficacy of molecular-targeted therapies. However, the clinical effectiveness of drugs has been severely limited due to genetic variations that lead to drug resistance. Predicting the impact of missense mutations on clinical drug response is an essential way to reduce the cost of clinical trials and understand genetic diseases. Here, we present Emden, a novel method integrating graph and transformer representations that predicts the effect of missense mutations on drug response through binary classification with interpretability. Emden utilized protein sequences-based features and drug structures as inputs for rapid prediction, employing competitive representation learning and demonstrating strong generalization capabilities and robustness. Our study showed promising potential for clinical drug guidance and deep insight into computer-assisted precision medicine. Emden is freely available as a web server at https://www.psymukb.net/Emden.
Affiliation(s)
- Zhe Liu
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Yihang Bao
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Weidi Wang
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Liangwei Pan
- Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, Shanghai, China
- Han Wang
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
- Guan Ning Lin
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai, China

14
Wang S, Tang H, Shan P, Wu Z, Zuo L. ProS-GNN: Predicting effects of mutations on protein stability using graph neural networks. Comput Biol Chem 2023; 107:107952. [PMID: 37643501 DOI: 10.1016/j.compbiolchem.2023.107952] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/13/2022] [Revised: 08/18/2023] [Accepted: 08/25/2023] [Indexed: 08/31/2023]
Abstract
Predicting protein stability change upon variation through a computational approach is a valuable tool to unveil the mechanisms of mutation-induced drug failure and develop immunotherapy strategies. Some previous machine learning-based techniques exhibit anti-symmetric bias toward destabilizing situations, whereas others struggle with generalization to unseen examples. To address these issues, we propose a gated graph neural network-based approach to predict changes in protein stability upon mutation. The model uses message passing to encode the links between the molecular structure and property after eliminating the non-mutant structure and creating input feature vectors. While doing so, it also incorporates the coordinates of the raw atoms to provide spatial insights into the chemical systems. We test the model on the Ssym, Myoglobin, Broom, and p53 datasets to demonstrate the generalization performance. Compared to existing approaches, our proposed method achieves improved linearity with symmetry in less time. The code for this study is available at: https://github.com/HongzhouTang/Pros-GNN.
Affiliation(s)
- Shuyu Wang
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China
- Hongzhou Tang
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China
- Peng Shan
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China
- Zhaoxia Wu
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China
- Lei Zuo
- Department of Marine Engineering, University of Michigan, Ann Arbor 48109, USA

15
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic

16
Umerenkov D, Nikolaev F, Shashkova TI, Strashnov PV, Sindeeva M, Shevtsov A, Ivanisenko NV, Kardymon OL. PROSTATA: a framework for protein stability assessment using transformers. Bioinformatics 2023; 39:btad671. [PMID: 37935419 PMCID: PMC10651431 DOI: 10.1093/bioinformatics/btad671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/25/2023] [Accepted: 11/02/2023] [Indexed: 11/09/2023]
Abstract
MOTIVATION Accurate prediction of change in protein stability due to point mutations is an attractive goal that remains unachieved. Despite the high interest in this area, little consideration has been given to the transformer architecture, which is dominant in many fields of machine learning. RESULTS In this work, we introduce PROSTATA, a predictive model built in a knowledge-transfer fashion on a new curated dataset. PROSTATA demonstrates advantage over existing solutions based on neural networks. We show that the large improvement margin is due to both the architecture of the model and the quality of the new training dataset. This work opens up opportunities to develop new lightweight and accurate models for protein stability assessment. AVAILABILITY AND IMPLEMENTATION PROSTATA is available at https://github.com/AIRI-Institute/PROSTATA and https://prostata.airi.net.
Affiliation(s)
- Pavel V Strashnov
- Bioinformatics Group, AIRI, Moscow 121170, Russia
- Department of Computer Design and Technology, Bauman Moscow State Technical University, Moscow 105005, Russia
- Andrey Shevtsov
- Bioinformatics Group, AIRI, Moscow 121170, Russia
- Regulatory Transcriptomics and Epigenomics Group, Institute of Bioengineering, Research Center of Biotechnology RAS, Moscow 117036, Russia
- Nikita V Ivanisenko
- Bioinformatics Group, AIRI, Moscow 121170, Russia
- Laboratory of Computational Proteomics, Institute of Cytology and Genetics SB RAS, Novosibirsk 630090, Russia

17
Dhiman A, Purohit R. Identification of potential mutational hotspots in serratiopeptidase to address its poor pH tolerance issue. J Biomol Struct Dyn 2023; 41:8831-8843. [PMID: 36307910 DOI: 10.1080/07391102.2022.2137699] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 07/03/2022] [Accepted: 10/14/2022] [Indexed: 10/31/2022]
Abstract
Serratiopeptidase is the multifunctionality metalloendopeptidase extensively employed in biopharmaceutical and industrial biotechnology. Despite its poor pH tolerance, serratiopeptidase must withstand the highly acidic environment of the gastrointestinal tract to be used as a potent anti-inflammatory and analgesic medication. In earlier studies, post-translational deamination related mutations showed alteration in the net charge of protein's surface. Therefore, the current study aimed to enhance the acid resistance of serratiopeptidase via implementing computational interventions to screen out the most stable mutational hotspot. The methodology used in this study is as follows: (a) Higher accessibility to surface (b) 4 Å away from active site region to avoid interference with its proteolytic activity, and (c) By converting non-conserved amide residues to acidic residues. A docking study has been conducted to establish the substrate specificity and binding affinity to native and mutant proteins. The docking outcomes were then validated using molecular dynamic simulations to clarify each mutant's molecular stability and conformation while preserving their activity. The results showed that N412D is the best-screened mutant with negative electrostatic potential that can alter the overall charge on the protein's surface with increased H+ ions. Alteration in overall charge leads to protein surface more acidic that causes a common ion effect in stomach pH and act as a buffer which could stabilize the serratiopeptidase amid extreme pH. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ankita Dhiman
- Structural Bioinformatics Lab, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh, India
- Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, India
- Rituraj Purohit
- Structural Bioinformatics Lab, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh, India
- Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, India

18
Zhang G, Luo Y, Dai X, Dai Z. Benchmarking deep learning methods for predicting CRISPR/Cas9 sgRNA on- and off-target activities. Brief Bioinform 2023; 24:bbad333. [PMID: 37775147 DOI: 10.1093/bib/bbad333] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/27/2023] [Revised: 08/31/2023] [Accepted: 09/04/2023] [Indexed: 10/01/2023]
Abstract
In silico design of single guide RNA (sgRNA) plays a critical role in clustered regularly interspaced, short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system. Continuous efforts are aimed at improving sgRNA design with efficient on-target activity and reduced off-target mutations. In the last 5 years, an increasing number of deep learning-based methods have achieved breakthrough performance in predicting sgRNA on- and off-target activities. Nevertheless, it is worthwhile to systematically evaluate these methods for their predictive abilities. In this review, we conducted a systematic survey on the progress in prediction of on- and off-target editing. We investigated the performances of 10 mainstream deep learning-based on-target predictors using nine public datasets with different sample sizes. We found that in most scenarios, these methods showed superior predictive power on large- and medium-scale datasets than on small-scale datasets. In addition, we performed unbiased experiments to provide in-depth comparison of eight representative approaches for off-target prediction on 12 publicly available datasets with various imbalanced ratios of positive/negative samples. Most methods showed excellent performance on balanced datasets but have much room for improvement on moderate- and severe-imbalanced datasets. This study provides comprehensive perspectives on CRISPR/Cas9 sgRNA on- and off-target activity prediction and improvement for method development.
Affiliation(s)
- Guishan Zhang
- College of Engineering, Shantou University, Shantou 515063, China
- Ye Luo
- College of Engineering, Shantou University, Shantou 515063, China
- Xianhua Dai
- School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen 518107, China
- Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai 519000, China
- Zhiming Dai
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou 510006, China

19
Sieg J, Rarey M. Searching similar local 3D micro-environments in protein structure databases with MicroMiner. Brief Bioinform 2023; 24:bbad357. [PMID: 37833838 DOI: 10.1093/bib/bbad357] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/09/2023] [Revised: 08/28/2023] [Accepted: 09/18/2023] [Indexed: 10/15/2023]
Abstract
The available protein structure data are rapidly increasing. Within these structures, numerous local structural sites depict the details characterizing structure and function. However, searching and analyzing these sites extensively and at scale poses a challenge. We present a new method to search local sites in protein structure databases using residue-defined local 3D micro-environments. We implemented the method in a new tool called MicroMiner and demonstrate the capabilities of residue micro-environment search on the example of structural mutation analysis. Usually, experimental structures for both the wild-type and the mutant are unavailable for comparison. With MicroMiner, we extracted >255 × 10⁶ amino acid pairs in protein structures from the PDB, exemplifying single mutations' local structural changes for single chains and >45 × 10⁶ pairs for protein-protein interfaces. We further annotate existing data sets of experimentally measured mutation effects, like ΔΔG measurements, with the extracted structure pairs to combine the mutation effect measurement with the structural change upon mutation. In addition, we show how MicroMiner can bridge the gap between mutation analysis and structure-based drug design tools. MicroMiner is available as a command line tool and interactively on the https://proteins.plus/ webserver.
Affiliation(s)
- Jochen Sieg
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Matthias Rarey
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany

20
Wang J, Zhang H, Chen N, Zeng T, Ai X, Wu K. PorcineAI-Enhancer: Prediction of Pig Enhancer Sequences Using Convolutional Neural Networks. Animals (Basel) 2023; 13:2935. [PMID: 37760334 PMCID: PMC10526013 DOI: 10.3390/ani13182935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 08/21/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023]
Abstract
Understanding the mechanisms of gene expression regulation is crucial in animal breeding. Cis-regulatory DNA sequences, such as enhancers, play a key role in regulating gene expression. Identifying enhancers is challenging, despite the use of experimental techniques and computational methods. Enhancer prediction in the pig genome is particularly significant due to the costliness of high-throughput experimental techniques. The study constructed a high-quality database of pig enhancers by integrating information from multiple sources. A deep learning prediction framework called PorcineAI-enhancer was developed for the prediction of pig enhancers. This framework employs convolutional neural networks for feature extraction and classification. PorcineAI-enhancer showed excellent performance in predicting pig enhancers, validated on an independent test dataset. The model demonstrated reliable prediction capability for unknown enhancer sequences and performed remarkably well on tissue-specific enhancer sequences. The study developed a deep learning prediction framework, PorcineAI-enhancer, for predicting pig enhancers. The model demonstrated significant predictive performance and potential for tissue-specific enhancers. This research provides valuable resources for future studies on gene expression regulation in pigs.
Affiliation(s)
- Ji Wang
- College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Han Zhang
- College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Nanzhu Chen
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Tong Zeng
- College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Xiaohua Ai
- College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Keliang Wu
- College of Animal Science and Technology, China Agricultural University, Beijing 100193, China

21
Liu Z, Qian W, Cai W, Song W, Wang W, Maharjan DT, Cheng W, Chen J, Wang H, Xu D, Lin GN. Inferring the Effects of Protein Variants on Protein-Protein Interactions with Interpretable Transformer Representations. RESEARCH (WASHINGTON, D.C.) 2023; 6:0219. [PMID: 37701056 PMCID: PMC10494974 DOI: 10.34133/research.0219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 08/20/2023] [Indexed: 09/14/2023]
Abstract
Identifying pathogenetic variants and inferring their impact on protein-protein interactions sheds light on their functional consequences on diseases. Limited by the availability of experimental data on the consequences of protein interaction, most existing methods focus on building models to predict changes in protein binding affinity. Here, we introduced MIPPI, an end-to-end, interpretable transformer-based deep learning model that learns features directly from sequences by leveraging the interaction data from IMEx. MIPPI was specifically trained to determine the types of variant impact (increasing, decreasing, disrupting, and no effect) on protein-protein interactions. We demonstrate the accuracy of MIPPI and provide interpretation through the analysis of learned attention weights, which exhibit correlations with the amino acids interacting with the variant. Moreover, we showed the practicality of MIPPI in prioritizing de novo mutations associated with complex neurodevelopmental disorders and the potential to determine the pathogenic and driving mutations. Finally, we experimentally validated the functional impact of several variants identified in patients with such disorders. Overall, MIPPI emerges as a versatile, robust, and interpretable model, capable of effectively predicting mutation impacts on protein-protein interactions and facilitating the discovery of clinically actionable variants.
Affiliation(s)
- Zhe Liu
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Wei Qian
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Wenxiang Cai
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Weichen Song
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Weidi Wang
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai, China
- Dhruba Tara Maharjan
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Wenhong Cheng
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Jue Chen
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Han Wang
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
- Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Guan Ning Lin
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai, China

22
Pandey P, Panday SK, Rimal P, Ancona N, Alexov E. Predicting the Effect of Single Mutations on Protein Stability and Binding with Respect to Types of Mutations. Int J Mol Sci 2023; 24:12073. [PMID: 37569449 PMCID: PMC10418460 DOI: 10.3390/ijms241512073] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/11/2023] [Revised: 07/24/2023] [Accepted: 07/26/2023] [Indexed: 08/13/2023]
Abstract
The development of methods and algorithms to predict the effect of mutations on protein stability, protein-protein interaction, and protein-DNA/RNA binding is necessitated by the needs of protein engineering and for understanding the molecular mechanism of disease-causing variants. The vast majority of the leading methods require a database of experimentally measured folding and binding free energy changes for training. These databases are collections of experimental data taken from scientific investigations typically aimed at probing the role of particular residues on the above-mentioned thermodynamic characteristics, i.e., the mutations are not introduced at random and do not necessarily represent mutations originating from single nucleotide variants (SNV). Thus, the reported performance of the leading algorithms assessed on these databases or other limited cases may not be applicable for predicting the effect of SNVs seen in the human population. Indeed, we demonstrate that the SNVs and non-SNVs are not equally presented in the corresponding databases, and the distribution of the free energy changes is not the same. It is shown that the Pearson correlation coefficients (PCCs) of folding and binding free energy changes obtained in cases involving SNVs are smaller than for non-SNVs, indicating that caution should be used in applying them to reveal the effect of human SNVs. Furthermore, it is demonstrated that some methods are sensitive to the chemical nature of the mutations, resulting in PCCs that differ by a factor of four across chemically different mutations. All methods are found to underestimate the energy changes by roughly a factor of 2.
Affiliation(s)
- Preeti Pandey
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA; (P.P.); (S.K.P.); (P.R.)
| | - Shailesh Kumar Panday
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA; (P.P.); (S.K.P.); (P.R.)
| | - Prawin Rimal
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA; (P.P.); (S.K.P.); (P.R.)
| | - Nicolas Ancona
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA;
| | - Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA; (P.P.); (S.K.P.); (P.R.)
| |
23
Blaabjerg LM, Kassem MM, Good LL, Jonsson N, Cagiada M, Johansson KE, Boomsma W, Stein A, Lindorff-Larsen K. Rapid protein stability prediction using deep learning representations. eLife 2023; 12:e82593. [PMID: 37184062 PMCID: PMC10266766 DOI: 10.7554/elife.82593] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 08/10/2022] [Accepted: 05/12/2023] [Indexed: 05/16/2023]
Abstract
Predicting the thermodynamic stability of proteins is a common and widely used step in protein engineering and in elucidating the molecular mechanisms behind evolution and disease. Here, we present RaSP, a method for making rapid and accurate predictions of changes in protein stability by leveraging deep learning representations. RaSP performs on par with biophysics-based methods and enables saturation mutagenesis stability predictions in less than a second per residue. We use RaSP to calculate ~230 million stability changes for nearly all single amino acid changes in the human proteome, and examine variants observed in the human population. We find that variants that are common in the population are substantially depleted for severe destabilization, and that there are substantial differences between benign and pathogenic variants, highlighting the role of protein stability in genetic diseases. RaSP is freely available (including via a Web interface) and enables large-scale analyses of stability in experimental and predicted protein structures.
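Saturation mutagenesis, in the sense used above, means scoring every possible single amino acid substitution at every position. The minimal sketch below enumerates those variants for one sequence (19 per residue over the 20 standard amino acids). The `predict_ddg` scorer is a hypothetical placeholder and does not represent RaSP's interface.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def saturation_variants(sequence):
    """Yield every single substitution as e.g. 'M1A' (1-based positions)."""
    for pos, wt in enumerate(sequence, start=1):
        for mut in STANDARD_AA:
            if mut != wt:
                yield f"{wt}{pos}{mut}"


def predict_ddg(variant):
    """Hypothetical placeholder scorer; a real stability predictor goes here."""
    return 0.0


seq = "MKV"
variants = list(saturation_variants(seq))
print(len(variants))  # 57 (3 residues x 19 substitutions)
scores = {v: predict_ddg(v) for v in variants}
```

Because the count grows as 19 times the number of residues, per-residue speed is what makes proteome-scale runs on the order of the ~230 million changes reported above feasible.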
Affiliation(s)
- Lasse M Blaabjerg
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Maher M Kassem
- Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Lydia L Good
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Nicolas Jonsson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Matteo Cagiada
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristoffer E Johansson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Amelie Stein
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
24
Dou Z, Sun Y, Jiang X, Wu X, Li Y, Gong B, Wang L. Data-driven strategies for the computational design of enzyme thermal stability: trends, perspectives, and prospects. Acta Biochim Biophys Sin (Shanghai) 2023; 55:343-355. [PMID: 37143326 PMCID: PMC10160227 DOI: 10.3724/abbs.2023033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Accepted: 11/23/2022] [Indexed: 03/05/2023]
Abstract
Thermal stability is one of the most important properties of enzymes: it sustains life and determines the potential of biocatalysts for industrial application. Although traditional methods such as directed evolution and classical rational design have contributed greatly to this field, the enormous sequence space of proteins makes experiments costly and arduous. Driven by breakthroughs in high-throughput DNA sequencing and machine learning models, enzyme engineering is increasingly focused on automated and efficient strategies. In this review, we propose a data-driven architecture for enzyme thermostability engineering and summarize some widely adopted datasets, as well as machine learning-driven approaches for designing the thermal stability of enzymes. In addition, we present a series of existing challenges in applying machine learning to enzyme thermostability design, such as the data dilemma, model training, and use of the proposed models. A few promising directions for enhancing the performance of the models are also discussed. We anticipate that the efficient incorporation of machine learning can provide more insights and solutions for the design of enzyme thermostability in the coming years.
Affiliation(s)
- Zhixin Dou
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- Yuqing Sun
- School of Software, Shandong University, Jinan 250101, China
- Xukai Jiang
- National Glycoengineering Research Center, Shandong University, Qingdao 266237, China
- Xiuyun Wu
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- Yingjie Li
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- Bin Gong
- School of Software, Shandong University, Jinan 250101, China
- Lushan Wang
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
25
Patterson A, Elbasir A, Tian B, Auslander N. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications. Cancers (Basel) 2023; 15:1958. [PMID: 37046619 PMCID: PMC10093138 DOI: 10.3390/cancers15071958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/24/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023]
Abstract
Since the rise of next-generation sequencing technologies, the catalogue of mutations in cancer has been continuously expanding. To address the complexity of the cancer-genomic landscape and extract meaningful insights, numerous computational approaches have been developed over the last two decades. In this review, we survey the current leading computational methods to derive intricate mutational patterns in the context of clinical relevance. We begin with mutation signatures, explaining first how mutation signatures were developed and then examining the utility of studies using mutation signatures to correlate environmental effects on the cancer genome. Next, we examine current clinical research that employs mutation signatures and discuss the potential use cases and challenges of mutation signatures in clinical decision-making. We then examine computational studies developing tools to investigate complex patterns of mutations beyond the context of mutational signatures. We survey methods to identify cancer-driver genes, from single-driver studies to pathway and network analyses. In addition, we review methods inferring complex combinations of mutations for clinical tasks and using mutations integrated with multi-omics data to better predict cancer phenotypes. We examine the use of these tools for either discovery or prediction, including prediction of tumor origin, treatment outcomes, prognosis, and cancer typing. We further discuss the main limitations preventing widespread clinical integration of computational tools for the diagnosis and treatment of cancer. We end by proposing solutions to address these challenges using recent advances in machine learning.
Affiliation(s)
- Andrew Patterson
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- The Wistar Institute, Philadelphia, PA 19104, USA
- Bin Tian
- The Wistar Institute, Philadelphia, PA 19104, USA
- Noam Auslander
- The Wistar Institute, Philadelphia, PA 19104, USA
- Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
26
Pennica C, Hanna G, Islam SA, Sternberg MJE, David A. Missense3D-PPI: a web resource to predict the impact of missense variants at protein interfaces using 3D structural data. J Mol Biol 2023. [DOI: 10.1016/j.jmb.2023.168060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
27
Lihan M, Lupyan D, Oehme D. Target-template relationships in protein structure prediction and their effect on the accuracy of thermostability calculations. Protein Sci 2023; 32:e4557. [PMID: 36573828 PMCID: PMC9878467 DOI: 10.1002/pro.4557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 12/28/2022]
Abstract
Improving protein thermostability has been a labor- and time-consuming process in industrial applications of protein engineering. Advances in computational approaches have facilitated the development of more efficient strategies to allow the prioritization of stabilizing mutants. Among these is FEP+, a free energy perturbation implementation that uses a thoroughly tested physics-based method to achieve unparalleled accuracy in predicting changes in protein thermostability. To gauge the applicability of FEP+ to situations where crystal structures are unavailable, here we have applied the FEP+ approach to homology models of 12 different proteins covering 316 mutations. By comparing predictions obtained with homology models to those obtained using crystal structures, we have identified that local rather than global sequence conservation between target and template sequence is a determining factor in the accuracy of predictions. By excluding mutation sites with low local sequence identity (<40%) to a template structure, we have obtained predictions with comparable performance to crystal structures (R² of 0.67 and 0.63 and an RMSE of 1.20 and 1.16 kcal/mol for crystal structure and homology model predictions, respectively) for identifying stabilizing mutations when incorporating residue scanning into a cascade screening strategy. Additionally, we identify and discuss inherent limitations in sequence alignments and homology modeling protocols that translate into the poor FEP+ performance of a few select examples. Overall, our retrospective study provides detailed guidelines for the application of the FEP+ approach using homology models for protein thermostability predictions, which will greatly extend this approach to studies that were previously limited by structure availability.
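The site filter described in this abstract (discarding mutation sites whose local sequence identity to the template is below 40%) can be sketched as follows. The abstract does not say how "local" identity is computed, so the symmetric window of ±5 alignment columns, the gap handling and the gapped-alignment input format are all assumptions made for illustration.

```python
def local_identity(target_aln, template_aln, site, half_window=5):
    """Identity between target and template in a window of alignment columns.

    target_aln, template_aln: equal-length aligned sequences, '-' marks a gap.
    site: 0-based alignment column of the mutation.
    Columns where the target has a gap are skipped; a gap in the template
    counts as a mismatch, so poorly covered regions score low.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    start = max(0, site - half_window)
    end = min(len(target_aln), site + half_window + 1)
    matches = covered = 0
    for t, m in zip(target_aln[start:end], template_aln[start:end]):
        if t == "-":
            continue
        covered += 1
        matches += t == m
    return matches / covered if covered else 0.0


def keep_reliable_sites(target_aln, template_aln, sites, threshold=0.40, half_window=5):
    """Keep mutation sites whose local identity is at least `threshold`."""
    return [
        s for s in sites
        if local_identity(target_aln, template_aln, s, half_window) >= threshold
    ]


# Toy alignment: identical N-terminal half, unrelated C-terminal half.
target = "ACDEFGHIKLMNPQRSTVWY"
template = "ACDEFGHIKLAAAAAAAAAA"
print(keep_reliable_sites(target, template, [2, 17]))  # [2]
```

A window-based score captures the abstract's point that global identity can look acceptable while the region around a particular mutation is poorly templated.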
Affiliation(s)
- Muyun Lihan
- NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, and Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Schrödinger Inc., Cambridge, Massachusetts, USA
28
Alteration of Chain-Length Selectivity and Thermostability of Rhizopus oryzae Lipase via Virtual Saturation Mutagenesis Coupled with Disulfide Bond Design. Appl Environ Microbiol 2023; 89:e0187822. [PMID: 36602359 PMCID: PMC9888275 DOI: 10.1128/aem.01878-22] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 01/06/2023]
Abstract
Rhizopus oryzae lipase (ROL) is one of the most important enzymes used in the food, biofuel, and pharmaceutical industries. However, the highly demanding conditions of industrial processes can reduce its stability and activity. To seek a feasible method to improve both the catalytic activity and the thermostability of this lipase, first, the structure of ROL was divided into catalytic and noncatalytic regions by identifying critical amino acids in the crevice-like binding pocket. Second, a mutant screening library aimed at improvement of ROL catalytic performance by virtual saturation mutagenesis of residues in the catalytic region was constructed based on Rosetta's Cartesian_ddg protocol. A double mutant, E265V/S267W (with an E-to-V change at residue 265 and an S-to-W change at residue 267), with markedly improved catalytic activity toward diverse chain-length fatty acid esters was identified. Then, computational design of disulfide bonds was conducted for the noncatalytic amino acids of E265V/S267W, and two potential disulfide bonds, S61C-S115C and E190C-E238C, were identified as candidates. Experimental data validated that the variant E265V/S267W/S61C-S115C/E190C-E238C had superior stability, with an increase of 8.5°C in the melting temperature and a half-life of 31.7 min at 60°C, 4.2-fold longer than that of the wild-type enzyme. Moreover, the variant improved the lipase activity toward five 4-nitrophenyl esters by 1.5 to 3.8 times, exhibiting a potential to modify the catalytic efficiency. IMPORTANCE Rhizopus oryzae lipase (ROL) is very attractive in biotechnology and industry as a safe and environmentally friendly biocatalyst. Functional expression of ROL in Escherichia coli facilitates effective high-throughput screening for positive variants. This work highlights a method to improve both selectivity and thermostability based on a combination of virtual saturation mutagenesis in the substrate pocket and disulfide bond prediction in the noncatalytic region. 
Using this method, ROL thermostability and activity toward diverse 4-nitrophenyl esters were substantially improved. The strategy of rationally introducing multiple mutations into different functional domains of an enzyme holds great promise for the modification of biocatalysts.
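The half-life figures above imply the wild-type value directly. If inactivation at 60 °C is additionally assumed to be first order (an assumption, not stated in the abstract), they also give approximate inactivation rate constants:

$$
t_{1/2}^{\mathrm{WT}} = \frac{31.7\ \mathrm{min}}{4.2} \approx 7.5\ \mathrm{min},
\qquad
k = \frac{\ln 2}{t_{1/2}}
\;\Rightarrow\;
k_{\mathrm{variant}} \approx \frac{0.693}{31.7\ \mathrm{min}} \approx 0.022\ \mathrm{min}^{-1},
\quad
k_{\mathrm{WT}} \approx \frac{0.693}{7.5\ \mathrm{min}} \approx 0.092\ \mathrm{min}^{-1}.
$$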
29
Hu R, Fu L, Chen Y, Chen J, Qiao Y, Si T. Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments. Brief Bioinform 2023; 24:6958505. [PMID: 36562723 DOI: 10.1093/bib/bbac570] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/30/2022] [Revised: 11/14/2022] [Accepted: 11/22/2022] [Indexed: 12/24/2022]
Abstract
Directed protein evolution applies repeated rounds of genetic mutagenesis and phenotypic screening and is often limited by experimental throughput. Through in silico prioritization of mutant sequences, machine learning has been applied to reduce wet lab burden to a level practical for human researchers. On the other hand, robotics permits large batches and rapid iterations for protein engineering cycles, but such capacities have not been well exploited in existing machine learning-assisted directed evolution approaches. Here, we report a scalable and batched method, Bayesian Optimization-guided EVOlutionary (BO-EVO) algorithm, to guide multiple rounds of robotic experiments to explore protein fitness landscapes of combinatorial mutagenesis libraries. We first examined various design specifications based on an empirical landscape of protein G domain B1. Then, BO-EVO was successfully generalized to another empirical landscape of an Escherichia coli kinase PhoQ, as well as simulated NK landscapes with up to moderate epistasis. This approach was then applied to guide robotic library creation and screening to engineer enzyme specificity of RhlA, a key biosynthetic enzyme for rhamnolipid biosurfactants. A 4.8-fold improvement in producing a target rhamnolipid congener was achieved after examining less than 1% of all possible mutants after four iterations. Overall, BO-EVO proves to be an efficient and general approach to guide combinatorial protein engineering without prior knowledge.
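The simulated NK landscapes mentioned above follow Kauffman's tunable-epistasis model. Each of N sites contributes a fitness term that depends on its own state and on the states of K other sites, so raising K makes the landscape more epistatic and rugged. The sketch below generates such a landscape over a small alphabet. The random choice of interacting sites, the uniform contributions and the mean aggregation follow a common textbook formulation; they are assumptions here, not the settings used in the BO-EVO paper.

```python
import itertools
import random


def make_nk_landscape(n, k, alphabet, seed=0):
    """Return a fitness function for an NK landscape over `alphabet`.

    Site i interacts with itself plus k other randomly chosen sites; each
    (site, local-state) pair gets an independent U(0, 1) contribution, and
    fitness is the mean contribution over all n sites.
    """
    if not 0 <= k < n:
        raise ValueError("require 0 <= k < n")
    rng = random.Random(seed)
    neighbours = [
        [i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)
    ]
    tables = [
        {state: rng.random()
         for state in itertools.product(alphabet, repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(seq):
        if len(seq) != n:
            raise ValueError("sequence length must equal n")
        total = 0.0
        for i in range(n):
            local_state = tuple(seq[j] for j in neighbours[i])
            total += tables[i][local_state]
        return total / n

    return fitness


f = make_nk_landscape(n=4, k=1, alphabet="AB", seed=42)
all_seqs = ["".join(s) for s in itertools.product("AB", repeat=4)]
best = max(all_seqs, key=f)
print(best, round(f(best), 3))
```

The lookup tables grow as N·|alphabet|^(K+1), so with the 20 amino acids this naive version is only practical for small K.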
Affiliation(s)
- Ruyun Hu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Lihao Fu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yongcan Chen
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen 518055, China
- Junyu Chen
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yu Qiao
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Tong Si
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
30
Chandra A, Tünnermann L, Löfstedt T, Gratz R. Transformer-based deep learning for predicting protein properties in the life sciences. eLife 2023; 12:e82819. [PMID: 36651724 PMCID: PMC9848389 DOI: 10.7554/elife.82819] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 08/22/2022] [Accepted: 01/06/2023] [Indexed: 01/19/2023]
Abstract
Recent developments in deep learning, coupled with an increasing number of sequenced proteins, have led to a breakthrough in life science applications, in particular in protein property prediction. There is hope that deep learning can close the gap between the number of sequenced proteins and the number of proteins whose properties are known from lab experiments. Language models from the field of natural language processing have gained popularity for protein property predictions and have led to a new computational revolution in biology, where old prediction results are being improved regularly. Such models can learn useful multipurpose representations of proteins from large open repositories of protein sequences and can be used, for instance, to predict protein properties. The field of natural language processing is growing quickly because of developments in a class of models based on a particular architecture, the Transformer. We review recent developments and the use of large-scale Transformer models in applications for predicting protein characteristics and how such models can be used to predict, for example, post-translational modifications. We review shortcomings of other deep learning models and explain how the Transformer models have quickly proven to be a very promising way to unravel information hidden in the sequences of amino acids.
Affiliation(s)
- Abel Chandra
- Department of Computing Science, Umeå University, Umeå, Sweden
- Laura Tünnermann
- Umeå Plant Science Centre (UPSC), Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
- Tommy Löfstedt
- Department of Computing Science, Umeå University, Umeå, Sweden
- Regina Gratz
- Umeå Plant Science Centre (UPSC), Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
- Department of Forest Ecology and Management, Swedish University of Agricultural Sciences, Umeå, Sweden
31
Hernández IM, Dehouck Y, Bastolla U, López-Blanco JR, Chacón P. Predicting protein stability changes upon mutation using a simple orientational potential. Bioinformatics 2023; 39:6984713. [PMID: 36629451 PMCID: PMC9850275 DOI: 10.1093/bioinformatics/btad011] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/26/2022] [Revised: 11/17/2022] [Accepted: 01/10/2023] [Indexed: 01/12/2023]
Abstract
MOTIVATION Structure-based stability prediction upon mutation is crucial for protein engineering and design, and for understanding genetic diseases or drug resistance events. For this task, we adopted a simple residue-based orientational potential that considers only three backbone atoms, previously applied in protein modeling. Its application to stability prediction only requires parametrizing 12 amino acid-dependent weights using cross-validation strategies on a curated dataset in which we tried to reduce the number of mutations located at protein-protein or protein-ligand interfaces or measured under extreme conditions, as well as the over-representation of alanine mutations. RESULTS Our method, called KORPM, accurately predicts mutational effects on an independent benchmark dataset, whether the wild-type or mutated structure is used as the starting point. Compared with state-of-the-art methods on this balanced dataset, our approach obtained the lowest root mean square error (RMSE) and the highest correlation between predicted and experimental ΔΔG measures, as well as better receiver operating characteristics and precision-recall curves. Our method is almost anti-symmetric by construction, and thus performs similarly for the direct and reverse mutations with the corresponding wild-type and mutated structures. Despite the strong limitations of the available experimental mutation data in terms of size, variability, and heterogeneity, we show competitive results with a simple sum of energy terms, which is more efficient and less prone to overfitting. AVAILABILITY AND IMPLEMENTATION https://github.com/chaconlab/korpm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
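One way to see why a simple sum of energy terms can be anti-symmetric by construction: if ΔΔG is scored as the difference of the same weighted energy function evaluated on the mutant and wild-type residues, swapping the two flips the sign exactly. The toy per-residue energies and weights below are hypothetical stand-ins for KORPM's orientational potential; the sketch ignores structure entirely, which is why it is exactly rather than almost anti-symmetric. The sign convention (positive means the mutant scores worse) is also an assumption.

```python
# Hypothetical per-residue energy terms (arbitrary units); lower is more favourable.
TOY_ENERGY = {"A": -0.4, "G": 0.1, "L": -1.2, "V": -0.9}


def energy(residue, weights):
    """Weighted energy of one residue; `weights` are hypothetical per-residue factors."""
    return weights.get(residue, 1.0) * TOY_ENERGY[residue]


def ddg(wild_type, mutant, weights):
    """Score a substitution as E(mutant) - E(wild type).

    Positive values mean the mutant scores worse under this toy convention.
    """
    return energy(mutant, weights) - energy(wild_type, weights)


w = {"L": 1.1, "A": 0.9}
direct = ddg("L", "A", w)
reverse = ddg("A", "L", w)
print(direct, reverse)
```

In a structure-based method the reverse mutation is scored on a different (mutated) structure, so the two energy evaluations no longer cancel exactly; that is one plausible reason the abstract says "almost".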
Affiliation(s)
- Iván Martín Hernández
- Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry, CSIC, 28006 Madrid, Spain
- Yves Dehouck
- Bioinformatic Unit, Centro de Biología Molecular “Severo Ochoa,” CSIC-UAM Cantoblanco, Madrid 28049, Spain
- Ugo Bastolla
- Bioinformatic Unit, Centro de Biología Molecular “Severo Ochoa,” CSIC-UAM Cantoblanco, Madrid 28049, Spain
- José Ramón López-Blanco
- Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry, CSIC, 28006 Madrid, Spain
32
Muacevic A, Adler JR. Comparison of In Vitro and In Silico Assessments of Human Galactose-1-Phosphate Uridylyltransferase Coding Variants. Cureus 2023; 15:e33592. [PMID: 36788839 PMCID: PMC9910814 DOI: 10.7759/cureus.33592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 01/10/2023] [Indexed: 01/12/2023]
Abstract
Introduction: Human pathogenic coding variations of the galactose-1-phosphate uridylyltransferase (GALT) gene cause classic galactosemia, a recessive disease of galactose metabolism. Unfortunately, there are many variants of uncertain significance (VUS) that need to be characterized in order to predict the likelihood of classic galactosemia for all possible genotypes. Many bioinformatic resources attempt to predict the pathogenicity of a human variant, but it is unclear whether these methods realistically predict the consequences of these variants. To determine the clinical applicability of these resources, we compared the results of in vitro enzymatic assays with in silico predictive models.
Methods: In all assays, we compared the activity of three human GALT VUS (Alanine81Threonine, Histidine47Aspartate, Glutamate58Lysine) to native GALT (nGALT) and to a variant of known pathogenic clinical significance (Glutamine188Arginine). The enzymatic activities of the VUS recombinant proteins were compared to the results of in silico analytical methods. The in silico methods included the comparison of molecular dynamics simulation root-mean-square deviation (RMSD) results and the results from the predictive programs PredictSNP, evolutionary model of variant effect (EVE), ConSurf, and sorting intolerant from tolerant (SIFT).
Results: The enzymatic assays showed that the variants tested had diminished Vmax relative to the native protein. The VUS RMSD data for both the whole protein and individual residues in the molecular dynamics simulations were not significantly different from those of nGALT. The other predictive programs had mixed results for each VUS and were not consistent with the enzyme activity or simulation results.
Conclusions: Our experiments indicated a statistically significant decrease in the enzymatic activity of the VUS when compared to nGALT.
These experiments also demonstrated significant differences between in silico predictions and in vitro results. These results suggest that the in silico tools used may not be beneficial in determining the pathogenicity of GALT VUS.
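For reference, the RMSD compared in the molecular dynamics analysis above is the root-mean-square of atomic displacements between two conformations, RMSD = sqrt((1/N) Σ |r_i − r′_i|²). The sketch assumes the coordinates have already been superimposed and are given as matching lists of (x, y, z) tuples. It omits the fitting step that MD analysis tools normally perform before computing RMSD.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """RMSD between two pre-aligned conformations (same atoms, same order)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom shifted 1 Å along z
print(rmsd(ref, frame))  # 1.0
```

Without superposition, a pure rigid-body translation like the one in this toy frame registers as deviation, which is why the fitting step matters in real trajectory analysis.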
33
Wang S, Tang H, Zhao Y, Zuo L. BayeStab: Predicting effects of mutations on protein stability with uncertainty quantification. Protein Sci 2022; 31:e4467. [PMID: 36217239 PMCID: PMC9601791 DOI: 10.1002/pro.4467] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/01/2022] [Revised: 10/06/2022] [Accepted: 10/06/2022] [Indexed: 11/11/2022]
Abstract
Predicting protein thermostability change upon mutation is crucial for understanding diseases and designing therapeutics. However, accurately estimating the Gibbs free energy change of the protein remains a challenge. Some methods struggle to generalize to examples with no homology and produce uncalibrated predictions. Here we leverage advances in graph neural networks for protein feature extraction to tackle this structure-property prediction task. Our method, BayeStab, is then tested on four test datasets (S669, S611, S350, and Myoglobin), showing high generalization and symmetry performance. Meanwhile, we apply concrete-dropout-enabled Bayesian neural networks to infer plausible models and estimate uncertainty. By decomposing the uncertainty into parts induced by data noise and by the model, we demonstrate that the probabilistic method allows insights into the inherent noise of the training datasets, which is closely related to the upper bound of the task. Finally, a BayeStab web server is available at http://www.bayestab.com, and the code for this work is available at https://github.com/HongzhouTang/BayeStab.
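A common way to split predictive uncertainty from Monte Carlo (for example, concrete-dropout) samples is the law of total variance. The mean of the per-pass predicted variances estimates the data-noise (aleatoric) part, and the variance of the per-pass predicted means estimates the model (epistemic) part. The sketch below assumes each stochastic forward pass returns a (mean, variance) pair. It illustrates that standard decomposition and is not taken from BayeStab's code.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(samples):
    """Split predictive variance from T stochastic forward passes.

    samples: list of (predicted_mean, predicted_variance) pairs, one per pass.
    Returns (aleatoric, epistemic, total) with total = aleatoric + epistemic.
    """
    means = [m for m, _ in samples]
    variances = [v for _, v in samples]
    aleatoric = fmean(variances)  # average predicted noise variance
    epistemic = pvariance(means)  # spread of the predicted means across passes
    return aleatoric, epistemic, aleatoric + epistemic


# Hypothetical outputs of 4 dropout passes for one mutation (ddG in kcal/mol).
passes = [(1.2, 0.30), (0.8, 0.26), (1.0, 0.34), (1.0, 0.30)]
print(decompose_uncertainty(passes))
```

In this toy case, most of the variance is aleatoric, which would suggest noisy labels rather than an under-trained model; more passes would be needed in practice for stable estimates.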
Affiliation(s)
- Shuyu Wang
- Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, China
- Hongzhou Tang
- Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, China
- Yuliang Zhao
- Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, China
- Lei Zuo
- Department of Naval Architecture and Marine Engineering, University of Michigan, Ann Arbor, Michigan, USA
34
Velecký J, Hamsikova M, Stourac J, Musil M, Damborsky J, Bednar D, Mazurenko S. SoluProtMutDB: a manually curated database of protein solubility changes upon mutations. Comput Struct Biotechnol J 2022; 20:6339-6347. [DOI: 10.1016/j.csbj.2022.11.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 11/04/2022] [Accepted: 11/04/2022] [Indexed: 11/11/2022]
35
Oliveira MLG, Castelli EC, Veiga‐Castelli LC, Pereira ALE, Marcorin L, Carratto TMT, Souza AS, Andrade HS, Simões AL, Donadi EA, Courtin D, Sabbagh A, Giuliatti S, Mendes‐Junior CT. Genetic diversity of the LILRB1 and LILRB2 coding regions in an admixed Brazilian population sample. HLA 2022; 100:325-348. [DOI: 10.1111/tan.14725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Revised: 06/02/2022] [Accepted: 06/24/2022] [Indexed: 11/27/2022]
Affiliation(s)
- Erick C. Castelli
- Pathology Department, School of Medicine, São Paulo State University (UNESP), Botucatu, State of São Paulo, Brazil
- Molecular Genetics and Bioinformatics Laboratory, School of Medicine, São Paulo State University (UNESP), Botucatu, State of São Paulo, Brazil
- Luciana C. Veiga‐Castelli
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Alison Luis E. Pereira
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Letícia Marcorin
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Thássia M. T. Carratto
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Andreia S. Souza
- Molecular Genetics and Bioinformatics Laboratory, School of Medicine, São Paulo State University (UNESP), Botucatu, State of São Paulo, Brazil
- Heloisa S. Andrade
- Molecular Genetics and Bioinformatics Laboratory, School of Medicine, São Paulo State University (UNESP), Botucatu, State of São Paulo, Brazil
- Aguinaldo L. Simões
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Eduardo A. Donadi
- Departamento de Clínica Médica, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Silvana Giuliatti
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Celso Teixeira Mendes‐Junior
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
36
Gao Q, Deng H, Yang Z, Yang Q, Zhang Y, Yuan X, Zeng M, Guo M, Zeng W, Jiang X, Yu B. Sodium danshensu attenuates cerebral ischemia–reperfusion injury by targeting AKT1. Front Pharmacol 2022; 13:946668. [PMID: 36188542 PMCID: PMC9520076 DOI: 10.3389/fphar.2022.946668] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/01/2022] [Accepted: 08/22/2022] [Indexed: 12/02/2022]
Abstract
The beneficial properties of Sodium Danshensu (SDSS) for controlling cerebral ischemia and reperfusion injury (CIRI) are elucidated here both in vivo and in vitro. SDSS administration significantly improved the viability of PC12 cells, reduced lactate dehydrogenase (LDH) leakage, and decreased the apoptosis rate following exposure to an oxygen-glucose deprivation/reoxygenation (OGD) environment. In addition, the results of a HuProt™ human protein microarray and network pharmacology indicated that AKT1 is one of the main targets of SDSS. Moreover, functional experiments showed that SDSS intervention markedly increased the phosphorylation level of AKT1 and its downstream regulator, mTOR. The binding sites of SDSS on the AKT1 protein were confirmed by AutoDock software and a surface plasmon resonance experiment, the results of which imply that SDSS targets the PH domain of AKT1 at residues ASN-53, ARG-86, and LYS-14. Furthermore, knockdown of AKT1 significantly abolished the role of SDSS in protecting cells from apoptosis and necrosis. Finally, we investigated the curative effect of SDSS in a rat model of CIRI. The results suggest that administration of SDSS significantly reduces CIRI-induced necrosis and apoptosis in brain samples by activating AKT1 protein. In conclusion, SDSS exerts its positive role in alleviating CIRI by binding to the PH domain of AKT1 protein, further resulting in AKT1 activation.
Affiliation(s)
- Qing Gao
- School of Integrative Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Hao Deng
- Tianjin Key Laboratory of Translational Research of TCM Prescription and Syndrome, First Teaching Hospital of Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Zhengfei Yang
- College of Traditional Chinese Medicine, Ningxia Medical University, Yinchuan, China
- Qiuyue Yang
- School of Integrative Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Yilin Zhang
- School of Integrative Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Xiaopeng Yuan
- Shenzhen Traditional Chinese Medicine Hospital, Shenzhen, China
- Miao Zeng
- School of Integrative Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Maojuan Guo
- School of Integrative Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Wenyun Zeng
- School of Integrative Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Xijuan Jiang
- School of Integrative Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, China
- *Correspondence: Xijuan Jiang; Bin Yu
- Bin Yu
- International Exchanges Department and International Education College, Tianjin University of Traditional Chinese Medicine, Tianjin, China
- *Correspondence: Xijuan Jiang; Bin Yu
37
Iqbal S, Ge F, Li F, Akutsu T, Zheng Y, Gasser RB, Yu DJ, Webb GI, Song J. PROST: AlphaFold2-aware Sequence-Based Predictor to Estimate Protein Stability Changes upon Missense Mutations. J Chem Inf Model 2022; 62:4270-4282. [PMID: 35973091 DOI: 10.1021/acs.jcim.2c00799] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 01/17/2023]
Abstract
An essential step in engineering proteins and understanding disease-causing missense mutations is to accurately model the protein stability changes caused by such mutations. Here, we developed a new sequence-based predictor of protein stability (PROST) change (Gibbs free energy change, ΔΔG) upon a single-point missense mutation. PROST extracts multiple descriptors from the most promising sequence-based predictors, such as BoostDDG, SAAFEC-SEQ, and DDGun. PROST also extracts descriptors from iFeature and AlphaFold2. The extracted descriptors include sequence-based features, physicochemical properties, evolutionary information, evolutionary-based physicochemical properties, and predicted structural features. The PROST predictor is a weighted average ensemble model based on extreme gradient boosting (XGBoost) decision trees and an extra-trees regressor; PROST is trained on both direct and hypothetical reverse mutations using the S5294 data set (S2647 direct mutations + S2647 inverse mutations). The parameters of the PROST model are optimized using grid search with 5-fold cross-validation, and feature importance analysis reveals the most relevant features. The performance of PROST is evaluated in a blinded manner on nine distinct data sets and against existing state-of-the-art sequence-based and structure-based predictors. In blind tests, PROST consistently performs well on the frataxin, S217, S349, Ssym, S669, Myoglobin, and CAGI5 data sets, and performs similarly to the state-of-the-art predictors on the p53 and S276 data sets. PROST outperforms the latest predictors, such as BoostDDG, SAAFEC-SEQ, ACDC-NN-seq, and DDGun. A case study of mutation scanning of nine wild-type residues of the frataxin protein demonstrates the utility of PROST. Taken together, these findings indicate that PROST is a well-suited predictor when no protein structural information is available. 
The source code of PROST, data sets, examples, and pretrained models along with how to use PROST are available at https://github.com/ShahidIqb/PROST and https://prost.erc.monash.edu/seq.
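PROST is described as being trained on direct mutations together with their hypothetical reverse mutations (S2647 + S2647 = S5294). The sketch below illustrates that kind of augmentation under the common assumption that the reverse mutation B → A carries the negated ΔΔG of A → B. The `Mutation` record and the function names are hypothetical and are not taken from the PROST code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    """One single-point mutation with its measured stability change (hypothetical record)."""
    protein: str    # protein identifier
    position: int   # residue number in the sequence
    wild_type: str  # one-letter code of the original residue
    mutant: str     # one-letter code of the substituted residue
    ddg: float      # ΔΔG in kcal/mol


def reverse(m: Mutation) -> Mutation:
    """Build the hypothetical reverse mutation: swap the residues and negate ΔΔG."""
    return Mutation(m.protein, m.position, m.mutant, m.wild_type, -m.ddg)


def augment_with_reverse(direct: list[Mutation]) -> list[Mutation]:
    """Return the direct mutations followed by their hypothetical reverses."""
    return direct + [reverse(m) for m in direct]
```

Applied to a 2,647-entry direct set, this yields 5,294 entries, the size of S5294. Keeping the reverse entries alongside the originals, rather than replacing them, is one way to expose a model to both signs of ΔΔG during training.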
Affiliation(s)
- Shahid Iqbal
- Department of Data Science and AI, Faculty of IT, Monash University, Clayton, Victoria 3800, Australia; Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia; Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
- Fang Ge
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia; Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Yuanting Zheng
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Geoffrey I Webb
- Department of Data Science and AI, Faculty of IT, Monash University, Clayton, Victoria 3800, Australia; Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
- Jiangning Song
- Department of Data Science and AI, Faculty of IT, Monash University, Clayton, Victoria 3800, Australia; Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia; Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
38
Casadio R, Savojardo C, Fariselli P, Capriotti E, Martelli PL. Turning Failures into Applications: The Problem of Protein ΔΔG Prediction. Methods Mol Biol 2022; 2449:169-185. [PMID: 35507262 DOI: 10.1007/978-1-0716-2095-3_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
Abstract
After nearly two decades of research on computational methods based on machine learning and knowledge-based potentials for ΔG and ΔΔG prediction upon variation, we now realize that all the approaches perform poorly when tested on specific cases and that there is ample room for improvement. Why is this so? Is the underlying assumption, that experimental protein thermodynamics in solution reflects the thermodynamics of a single protein, wrong? Both machine learning and knowledge-based computational methods are rigorous, and the theory behind them is solid. We are now in a critical situation, which suggests that predictions of protein instability upon variation should be considered with care. In the following, we show how to cope with the problem of understanding which protein positions may be of interest for biotechnological and biomedical purposes. By applying a consensus procedure, we indicate possible strategies for interpreting the results.
Affiliation(s)
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Turin, Italy
- Emidio Capriotti
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
39
Montanucci L, Capriotti E, Birolo G, Benevenuta S, Pancotti C, Lal D, Fariselli P. DDGun: an untrained predictor of protein stability changes upon amino acid variants. Nucleic Acids Res 2022; 50:W222-W227. [PMID: 35524565 PMCID: PMC9252764 DOI: 10.1093/nar/gkac325] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 03/24/2022] [Revised: 04/15/2022] [Accepted: 05/04/2022] [Indexed: 01/22/2023]
Abstract
Estimating the functional effect of single amino acid variants in proteins is fundamental for predicting the change in thermodynamic stability, measured as the difference in the Gibbs free energy of unfolding between the wild-type and the variant protein (ΔΔG). Here, we present the web server of the DDGun method, which was previously developed for ΔΔG prediction upon amino acid variants. DDGun is an untrained method based on basic features derived from evolutionary information. It is antisymmetric, as it predicts opposite ΔΔG values for direct (A → B) and reverse (B → A) single and multiple site variants. DDGun is available in two versions, one based only on sequence information and the other based on sequence and structure information. Despite being untrained, DDGun reaches prediction performance comparable to that of trained methods. For the web server version, we updated the protein sequence database used to compute the evolutionary features, and we compiled two new data sets of protein variants to blind-test its performance. On these blind data sets of single and multiple site variants, DDGun confirms its prediction performance, reaching an average correlation coefficient between experimental and predicted ΔΔG of 0.45 and 0.49 for the sequence-based and structure-based versions, respectively. Besides its use for ΔΔG prediction, we suggest that DDGun be adopted as a benchmark method to assess the predictive capabilities of newly developed methods. Releasing DDGun as a web server, stand-alone program, and Docker image will facilitate the method comparisons needed to improve ΔΔG prediction.
Affiliation(s)
- Ludovica Montanucci
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH 44195, USA
- Emidio Capriotti
- BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
- Giovanni Birolo
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126, Torino, Italy
- Silvia Benevenuta
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126, Torino, Italy
- Corrado Pancotti
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126, Torino, Italy
- Dennis Lal
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH 44195, USA
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126, Torino, Italy
40
Kebabci N, Timucin AC, Timucin E. Toward Compilation of Balanced Protein Stability Data Sets: Flattening the ΔΔG Curve through Systematic Enrichment. J Chem Inf Model 2022; 62:1345-1355. [PMID: 35201762 DOI: 10.1021/acs.jcim.2c00054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Studies analyzing stability data sets and/or predictors often ignore neutral mutations and use a binary classification scheme that labels only destabilizing and stabilizing mutations. Recognizing that highly concentrated neutral mutations interfere with data set quality, we explored three protein stability data sets that differ in size and quality: S2648, PON-tstab, and the symmetric Ssym. The ΔΔG distributions of all three data sets, including the curated and symmetric ones, showed a characteristic leptokurtic shape due to concentrated neutral mutations. To further investigate the impact of neutral mutations on ΔΔG predictions, we comprehensively assessed the performance of 11 predictors on the PON-tstab data set. Correlation and error analyses showed that all of the predictors performed best on the neutral mutations, while their performance became gradually worse as the ΔΔG of the mutations departed further from the neutral zone in either direction, implying a bias toward the densely populated region. Having unraveled the role of concentrated neutral mutations in the biases of stability data sets, we described a systematic enrichment approach to balance the ΔΔG distributions. Before enrichment, mutations were clustered based on their biochemical and/or structural features, and then three mutations were selected from every 2 kcal/mol interval of each cluster. By implementing this approach with distinct clustering schemes, we generated five subsets varying in size and ΔΔG distribution. All subsets showed improved ΔΔG and frequency distributions. We ultimately found that the prediction errors on the enriched subsets were higher than those on the parent data sets, confirming the enrichment of difficult-to-predict mutations in the subsets. In summary, we elaborated on the prediction bias toward a concentrated neutral zone and implemented a rational strategy to tackle this and other forms of bias.
Ultimately, by providing an extended view of the shortcomings of stability data sets, this study is a step toward the development of an unbiased predictor.
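The enrichment step (cluster first, then keep three mutations from every 2 kcal/mol interval of each cluster) can be sketched as follows. Several details are assumptions not stated in the abstract: cluster labels are already assigned, intervals are anchored at 0 kcal/mol, and the first three mutations encountered in each cell are kept; the study's actual selection rule may differ. The function `enrich` is hypothetical.

```python
import math
from collections import defaultdict


def enrich(mutations, per_bin=3, bin_width=2.0):
    """Keep at most `per_bin` mutations per (cluster, ΔΔG interval) cell.

    `mutations` is an iterable of (mutation_id, cluster_label, ddg) tuples;
    clustering by biochemical/structural features is assumed to be done already.
    """
    cells = defaultdict(list)
    for mut_id, cluster, ddg in mutations:
        # Intervals are [k*bin_width, (k+1)*bin_width): 0.5 -> k=0, -0.5 -> k=-1.
        k = math.floor(ddg / bin_width)
        cells[(cluster, k)].append((mut_id, cluster, ddg))
    selected = []
    for members in cells.values():
        # Assumed selection rule: the first `per_bin` mutations in input order.
        selected.extend(members[:per_bin])
    return selected
```

Because every populated interval contributes at most three mutations, a dense neutral peak is cut down to the same size as the sparse tails, which flattens the ΔΔG distribution.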
Affiliation(s)
- Narod Kebabci
- Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem University, Istanbul 34752, Turkey
- Ahmet Can Timucin
- Department of Molecular Biology and Genetics, Faculty of Arts and Sciences, Acibadem University, Istanbul 34752, Turkey
- Emel Timucin
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem University, Istanbul 34752, Turkey
41
Baek KT, Kepp KP. Data set and fitting dependencies when estimating protein mutant stability: Toward simple, balanced, and interpretable models. J Comput Chem 2022; 43:504-518. [PMID: 35040492 DOI: 10.1002/jcc.26810] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/06/2021] [Revised: 12/13/2021] [Accepted: 01/03/2022] [Indexed: 12/27/2022]
Abstract
Accurate prediction of protein stability changes upon mutation (ΔΔG) is increasingly important for evolution studies, protein engineering, and screening of disease-causing gene variants, but it is challenged by biases in training data. We investigated 45 linear regression models trained on data sets that systematically account for destabilization bias and mutation-type bias (B_M). The models were externally validated on three test data sets probing different pathologies, and were checked for internal consistency (symmetry and neutrality). Model structure and performance depended substantially on the training data and even on the fitting method. We developed two final models: SimBa-IB for typical natural mutations and SimBa-SYM for situations where stabilizing and destabilizing mutations occur to a similar extent. SimBa-SYM, despite its simplicity, is essentially unbiased (vs. the Ssym data set) while still performing well on all data sets (R ~ 0.46-0.54, MAE = 1.16-1.24 kcal/mol). The simple models offer advantages in interpretability, use, and future improvement, and are freely available on GitHub.
Affiliation(s)
- Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Lyngby, Denmark
42
Pancotti C, Benevenuta S, Birolo G, Alberini V, Repetto V, Sanavia T, Capriotti E, Fariselli P. Predicting protein stability changes upon single-point mutation: a thorough comparison of the available tools on a new dataset. Brief Bioinform 2022; 23:6502552. [PMID: 35021190 PMCID: PMC8921618 DOI: 10.1093/bib/bbab555] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 07/28/2021] [Revised: 11/29/2021] [Accepted: 12/05/2021] [Indexed: 12/13/2022]
Abstract
Predicting the difference in thermodynamic stability between protein variants is crucial for protein design and for understanding genotype-phenotype relationships. So far, several computational tools have been created to address this task. Nevertheless, most of them have been trained or optimized on the same data, namely ‘all’ the available data, making a fair comparison unfeasible. Here, we introduce a novel dataset, collected and manually cleaned from the latest version of the ThermoMutDB database, consisting of 669 variants not included in the most widely used training datasets. The prediction performance and the ability to satisfy the antisymmetry property, assessed by considering both direct and reverse variants, were evaluated across 21 different tools. The Pearson correlations of the tested tools were in the ranges of 0.21–0.5 and 0–0.45 for the direct and reverse variants, respectively. When both direct and reverse variants are considered, the antisymmetric methods perform better, achieving Pearson correlations in the range of 0.51–0.62. The tested methods seem relatively insensitive to physiological conditions, performing well also on variants measured at more extreme pH and temperature values. A common issue with all the tested methods is the compression of the ΔΔG predictions toward zero. Furthermore, the thermodynamic stability of the most strongly stabilizing variants was found to be more challenging to predict. This study is the most extensive comparison of prediction methods using an entirely novel set of variants never tested before.
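One way to quantify the antisymmetry property evaluated in this study is to compare predictions for direct and reverse variants. The sketch assumes two commonly used measures: the Pearson correlation between direct and reverse predictions (ideally -1) and the mean bias δ = Σ(ΔΔG_dir + ΔΔG_rev)/(2n) (ideally 0), plus a pooled Pearson correlation in which each reverse variant's experimental value is taken as the negated direct value. The function names are hypothetical, and the study's exact protocol may differ.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def antisymmetry_report(pred_direct, pred_reverse, exp_direct):
    """Summarize how closely predictions follow ΔΔG(B→A) = -ΔΔG(A→B)."""
    n = len(pred_direct)
    # A perfectly antisymmetric predictor gives r = -1 here ...
    r_dr = pearson(pred_direct, pred_reverse)
    # ... and a mean bias of exactly 0 here.
    delta = sum(d + r for d, r in zip(pred_direct, pred_reverse)) / (2 * n)
    # Pool both directions; reverse experimental values are the negated direct ones.
    pooled_pred = list(pred_direct) + list(pred_reverse)
    pooled_exp = list(exp_direct) + [-e for e in exp_direct]
    return {
        "r_direct_reverse": r_dr,
        "delta": delta,
        "r_pooled": pearson(pooled_pred, pooled_exp),
    }
```

A predictor that ignores direction and returns the same value for A → B and B → A scores r = +1 and a nonzero δ, which is the failure mode the antisymmetry check is meant to expose.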
Affiliation(s)
- Corrado Pancotti
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Silvia Benevenuta
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Giovanni Birolo
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Virginia Alberini
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Valeria Repetto
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Tiziana Sanavia
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
43
Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr Opin Struct Biol 2021; 72:161-168. [PMID: 34922207 DOI: 10.1016/j.sbi.2021.11.001] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 06/30/2021] [Revised: 09/15/2021] [Accepted: 11/08/2021] [Indexed: 01/17/2023]
Abstract
Stability is a key ingredient of protein fitness, and its modification through targeted mutations has applications in various fields, such as protein engineering, drug design, and deleterious variant interpretation. Over the past decades, many studies have been devoted to building new, more effective methods for predicting the impact of mutations on protein stability based on the latest developments in artificial intelligence. We discuss their features, algorithms, computational efficiency, and accuracy estimated on an independent test set. We focus on a critical analysis of their limitations, their recurrent biases toward the training set, their generalizability, and their interpretability. We found that the accuracy of the predictors has stagnated at around 1 kcal/mol for over 15 years. We conclude by discussing the challenges that need to be addressed to reach improved performance.
44
Narykov O, Johnson NT, Korkin D. Predicting protein interaction network perturbation by alternative splicing with semi-supervised learning. Cell Rep 2021; 37:110045. [PMID: 34818539 DOI: 10.1016/j.celrep.2021.110045] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2020] [Revised: 07/21/2021] [Accepted: 11/02/2021] [Indexed: 10/19/2022]
Abstract
Alternative splicing introduces an additional layer of protein diversity and complexity in regulating cellular functions that can be specific to the tissue and cell type, physiological state of a cell, or disease phenotype. Recent high-throughput experimental studies have illuminated the functional role of splicing events in rewiring protein-protein interactions (PPIs); however, the extent to which macromolecular interactions are affected by alternative splicing is not yet fully understood. In silico methods provide a fast and cheap way to interrogate the functional characteristics of thousands of alternatively spliced isoforms. Here, we develop an accurate feature-based machine learning approach that predicts whether a protein-protein interaction carried out by a reference isoform is perturbed by an alternatively spliced isoform. Our method, the alternatively spliced interactions prediction (ALT-IN) tool, is compared with state-of-the-art PPI prediction tools and shows superior performance, achieving precision and recall values of 0.92.
Affiliation(s)
- Oleksandr Narykov
- Department of Computer Science, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Nathan T Johnson
- Department of Computer Science, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA; Harvard Program in Therapeutic Sciences, Harvard Medical School, and Breast Tumor Immunology Laboratory, Dana-Farber Cancer Institute, Boston, MA, USA
- Dmitry Korkin
- Department of Computer Science, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
45
Rostami N, Choupani E, Hernandez Y, Arab SS, Jazayeri SM, Gomari MM. SARS-CoV-2 spike evolutionary behaviors; simulation of N501Y mutation outcomes in terms of immunogenicity and structural characteristic. J Cell Biochem 2021; 123:417-430. [PMID: 34783057 PMCID: PMC8657535 DOI: 10.1002/jcb.30181] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/22/2021] [Revised: 11/02/2021] [Accepted: 11/05/2021] [Indexed: 12/20/2022]
Abstract
Since the emergence of severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), a large number of mutations in its genome have been reported. Some mutations occur in noncoding regions without affecting the pathobiology of the virus, while mutations in coding regions are significant. One region where a mutation can affect the function of the virus is the receptor‐binding domain (RBD) of the spike protein. The RBD interacts with angiotensin‐converting enzyme 2 (ACE2) and facilitates the entry of the virus into host cells. Much attention has focused on RBD mutations, especially the N501Y substitution, which is observed in the UK/Kent, South African, and Brazilian lineages of SARS‐CoV‐2. Our group used computational biology approaches such as immunoinformatics, protein–protein interaction analysis, molecular dynamics, free energy computation, and tertiary structure analysis to reveal the consequences of the N501Y mutation at the molecular level. Surprisingly, we discovered that this mutation reduces the immunogenicity of the spike protein; in addition, the substitution of Asn by Tyr reduces protein compactness and significantly increases the stability of the spike protein and its affinity for ACE2. Moreover, following the N501Y mutation, the secondary structure and folding of the spike protein changed dramatically.
Affiliation(s)
- Neda Rostami
- Department of Chemical Engineering, Faculty of Engineering, Arak University, Arak, Iran
- Edris Choupani
- Department of Medical Biotechnology, Faculty of Allied Medicine, Iran University of Medical Sciences, Tehran, Iran
- Yaeren Hernandez
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, Arizona, USA
- Seyed S Arab
- Department of Biophysics, School of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Seyed M Jazayeri
- Department of Virology, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Mohammad M Gomari
- Department of Medical Biotechnology, Faculty of Allied Medicine, Iran University of Medical Sciences, Tehran, Iran
46
Rogers AW, Vega-Ramon F, Yan J, Del Río-Chanona EA, Jing K, Zhang D. A transfer learning approach for predictive modeling of bioprocesses using small data. Biotechnol Bioeng 2021; 119:411-422. [PMID: 34716712 DOI: 10.1002/bit.27980] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/15/2021] [Accepted: 10/28/2021] [Indexed: 11/06/2022]
Abstract
Predictive modeling of new biochemical systems with small data is a great challenge. Transfer learning, a subdomain of machine learning that transfers knowledge from a generalized model to a more domain-specific model, offers a promising solution. While transfer learning has been used in natural language processing, image analysis, and chemical engineering fault detection, its application within biochemical engineering has not been systematically explored. In this study, we demonstrated the benefits of transfer learning for predicting the dynamic behaviors of new biochemical processes. Two case studies were presented to investigate the accuracy, reliability, and advantages of this innovative modeling approach. We thoroughly discussed different transfer learning strategies and the effects of topology on transfer learning, comparing the performance of the transfer learning models against benchmark kinetic and data-driven models. Furthermore, strong connections between the underlying process mechanism and the optimal structure of the transfer learning model were highlighted, suggesting that transfer learning is interpretable and can enable more accurate prediction than a naive data-driven modeling approach. Therefore, this study presents a novel approach to effectively combining data from different sources for bioprocess simulation.
Affiliation(s)
- Alexander W Rogers
- Department of Chemical Engineering and Analytical Science, The University of Manchester, Manchester, UK
- Fernando Vega-Ramon
- Department of Chemical Engineering and Analytical Science, The University of Manchester, Manchester, UK
- Jiangtao Yan
- Department of Chemical and Biochemical Engineering, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China
- Keju Jing
- Department of Chemical Engineering and Analytical Science, The University of Manchester, Manchester, UK
- Dongda Zhang
- Department of Chemical Engineering and Analytical Science, The University of Manchester, Manchester, UK
47
Samaga YBL, Raghunathan S, Priyakumar UD. SCONES: Self-Consistent Neural Network for Protein Stability Prediction Upon Mutation. J Phys Chem B 2021; 125:10657-10671. [PMID: 34546056 DOI: 10.1021/acs.jpcb.1c04913] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/17/2023]
Abstract
Engineering proteins to have desired properties by mutating amino acids at specific sites is commonplace. Such engineered proteins must be stable to function. Experimental methods for determining stability at the throughput required to scan protein sequence space thoroughly are laborious. To this end, many machine learning based methods have been developed to predict thermodynamic stability changes upon mutation. These methods have been evaluated for symmetric consistency by testing with hypothetical reverse mutations. In this work, we propose transitive data augmentation, a new Stransitive data set for evaluating transitive consistency, and a new machine learning based method, the first of its kind, that incorporates both symmetric and transitive properties into its architecture. Our method, called SCONES, is an interpretable neural network that predicts small relative protein stability changes for missense mutations that do not significantly alter the structure. It estimates a residue's contribution toward protein stability (ΔG) in its local structural environment, and the difference between the independently predicted contributions of the reference and mutant residues is reported as ΔΔG. We show that this self-consistent machine learning architecture is immune to many common biases in data sets, relies less on data than existing methods, is robust to overfitting, and can explain a substantial portion of the variance in experimental data.
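Reporting ΔΔG as a difference of independently predicted per-residue contributions makes a predictor self-consistent by construction: if ΔΔG(a → b) = g(b) - g(a), then ΔΔG(b → a) = -ΔΔG(a → b) (symmetry) and ΔΔG(a → b) + ΔΔG(b → c) = ΔΔG(a → c) (transitivity). A toy sketch of that structure, in which a lookup table stands in for the network's contribution estimate at one fixed site; the table, the function name `make_ddg`, and the sign convention are assumptions, not SCONES internals.

```python
def make_ddg(contribution):
    """Return a ΔΔG function built from per-residue contributions.

    `contribution` maps a residue's one-letter code to its estimated
    contribution to ΔG at one fixed site (a stand-in for a network output).
    Assumed sign convention: ΔΔG(ref -> mut) = g(mut) - g(ref).
    """
    def ddg(ref, mut):
        # Each residue's contribution is estimated independently of the other,
        # so the difference is automatically antisymmetric and transitive.
        return contribution[mut] - contribution[ref]
    return ddg
```

Any model with this structure passes symmetry and transitivity checks exactly, whatever values the contributions take; what remains to be learned is the contribution estimate itself.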
Affiliation(s)
- Yashas B L Samaga
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
- Shampa Raghunathan
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
- U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
48
Marabotti A, Del Prete E, Scafuri B, Facchiano A. Performance of Web tools for predicting changes in protein stability caused by mutations. BMC Bioinformatics 2021; 22:345. [PMID: 34225665 PMCID: PMC8256537 DOI: 10.1186/s12859-021-04238-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/03/2020] [Accepted: 05/18/2021] [Indexed: 01/17/2023]
Abstract
BACKGROUND Despite decades of developing dedicated Web tools, it is still difficult to correctly predict the changes in the thermodynamic stability of proteins caused by mutations. Here, we assessed the reliability of five recently developed Web tools to evaluate progress in the field. RESULTS The results show that, although the field has improved, the assessed predictors are still far from ideal. Prevailing problems include a bias toward destabilizing mutations, and, in general, the results are unreliable when the mutation causes a ΔΔG within the interval ± 0.5 kcal/mol. We found that using several predictors and combining their results into a consensus is a rough but effective way to increase the reliability of the predictions. CONCLUSIONS We suggest that developers consider using balanced data sets to train future predictors, and that users combine the results of multiple tools to increase the chances of obtaining correct predictions of the effect of mutations on the thermodynamic stability of a protein.
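The abstract reports that combining several predictors into a consensus improves reliability but does not specify the combination rule. A minimal sketch, assuming a simple majority vote over three classes with the ± 0.5 kcal/mol neutral band mentioned above, and assuming that positive ΔΔG means stabilizing (conventions differ between tools, so some outputs may need their sign flipped first). The functions and tool names are hypothetical.

```python
from collections import Counter


def classify(ddg: float, neutral: float = 0.5) -> str:
    """Map a predicted ΔΔG (kcal/mol) to a class; |ΔΔG| <= neutral is 'neutral'."""
    if ddg > neutral:
        return "stabilizing"  # assumed convention: positive = stabilizing
    if ddg < -neutral:
        return "destabilizing"
    return "neutral"


def consensus(predictions: dict[str, float], neutral: float = 0.5) -> str:
    """Majority vote over tools; a tie for first place is reported as 'uncertain'."""
    if not predictions:
        raise ValueError("need at least one prediction")
    votes = Counter(classify(v, neutral) for v in predictions.values())
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "uncertain"
    return ranked[0][0]
```

For example, predictions of -1.2, -0.8, and 0.3 kcal/mol from three tools yield "destabilizing", while one stabilizing and one destabilizing vote yield "uncertain".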
Affiliation(s)
- Anna Marabotti
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, Fisciano, SA, Italy.
- Eugenio Del Prete
- CNR-IAC, National Research Council, Institute for Applied Mathematics "Mauro Picone", Naples, Italy
- Bernardina Scafuri
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, Fisciano, SA, Italy
- Angelo Facchiano
- CNR-ISA, National Research Council, Institute of Food Science, Avellino, Italy.
49
Wu L, Qin L, Nie Y, Xu Y, Zhao YL. Computer-aided understanding and engineering of enzymatic selectivity. Biotechnol Adv 2021; 54:107793. [PMID: 34217814 DOI: 10.1016/j.biotechadv.2021.107793] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/03/2021] [Revised: 04/26/2021] [Accepted: 06/28/2021] [Indexed: 12/26/2022]
Abstract
Enzymes offering chemo-, regio-, and stereoselectivity enable the asymmetric synthesis of high-value chiral molecules. Unfortunately, naturally occurring enzymes are often inefficient or show undesired selectivity toward non-native substrates, which hinders broader biocatalytic application. To meet the demand for specific selectivity in asymmetric synthesis, biochemists have applied various computer-aided strategies to understand and engineer enzymatic selectivity, diversifying the available repository of artificial enzymes. Here, given that the entire asymmetric catalytic cycle, including precise interactions within the active pocket and substrate transport through the enzyme channel, can affect enzymatic efficiency and selectivity, we present a comprehensive overview of the computer-aided workflow for enzymatic selectivity. This review covers the mechanistic understanding of enzymatic selectivity based on quantum mechanical calculations, the rational design of enzymatic selectivity guided by enzyme-substrate interactions, and the regulation of enzymatic selectivity via enzyme channel engineering. Finally, we discuss the computational paradigm for designing enzyme selectivity in silico to facilitate the advancement of asymmetric biosynthesis.
Affiliation(s)
- Lunjie Wu
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Lei Qin
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yao Nie
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China; Suqian Industrial Technology Research Institute of Jiangnan University, Suqian 223814, China.
- Yan Xu
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China; State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China.
- Yi-Lei Zhao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, MOE-LSB & MOE-LSC, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
50
Scherer M, Fleishman SJ, Jones PR, Dandekar T, Bencurova E. Computational Enzyme Engineering Pipelines for Optimized Production of Renewable Chemicals. Front Bioeng Biotechnol 2021; 9:673005. [PMID: 34211966 PMCID: PMC8239229 DOI: 10.3389/fbioe.2021.673005] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/26/2021] [Accepted: 05/06/2021] [Indexed: 11/13/2022]
Abstract
To enable a sustainable supply of chemicals, novel biotechnological solutions are required to replace the reliance on fossil resources. One potential solution is for microorganisms to use tailored biosynthetic modules to convert CO2 or organic waste metabolically into chemicals and fuels. Currently, commercializing biotechnological processes for renewable chemical biomanufacturing is challenging because highly active and specific biocatalysts are lacking. As experimental methods to engineer biocatalysts are time- and cost-intensive, it is important to establish efficient and reliable computational tools that can speed up the identification or optimization of selective, highly active, and stable enzyme variants for use in the biotechnological industry. Here, we review, and suggest combinations of, effective state-of-the-art software and online tools for computational enzyme engineering pipelines that optimize metabolic pathways for the biosynthesis of renewable chemicals. Using examples relevant to biotechnology, we explain the underlying principles of enzyme engineering and design and outline future directions for the automated optimization of biocatalysts for the assembly of synthetic metabolic pathways.
Affiliation(s)
- Marc Scherer
- Department of Bioinformatics, Julius-Maximilians University of Würzburg, Würzburg, Germany
- Sarel J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Patrik R Jones
- Department of Life Sciences, Imperial College London, London, United Kingdom
- Thomas Dandekar
- Department of Bioinformatics, Julius-Maximilians University of Würzburg, Würzburg, Germany
- Elena Bencurova
- Department of Bioinformatics, Julius-Maximilians University of Würzburg, Würzburg, Germany