1
Tao J, Luo J, Li K, Yang R, Lin Y, Ge J. Comprehensive genetic analysis uncovers the mutational spectrum of MFRP and its genotype-phenotype correlation in a large cohort of Chinese microphthalmia patients. Gene 2024; 926:148647. [PMID: 38848879 DOI: 10.1016/j.gene.2024.148647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 05/21/2024] [Accepted: 06/03/2024] [Indexed: 06/09/2024]
Abstract
PURPOSE Microphthalmia is a severe congenital ocular disease characterized by abnormal ocular development. The aim of this study was to detail the genetic and clinical characteristics of a large cohort of Chinese patients with microphthalmia related to MFRP variants, focusing on uncovering genotype-phenotype correlations. METHODS Fifty microphthalmia patients from 44 unrelated Chinese families were recruited. Whole-exome sequencing (WES) was conducted to analyze the coding regions and adjacent intronic regions of MFRP. Axial lengths (AL) were measured for all probands and available family members. Protein structures were predicted for the mutations with high frequency in the cohort. The genotype-phenotype correlations were explored by statistical analysis. RESULTS Sixteen MFRP variants were detected in 17 families, accounting for 38.64 % of all microphthalmia families. There were 9 novel mutations (c.427+1G>C, c.428-2A>C, c.561_575del:p.A188_E192del, c.836G>A:p.C279Y, c.1010_1021del:p.H337_E340del:p.Y479*, c.1516_1517del:p.S506Pfs*66, c.1561T>G:p.C521G, c.1616G>A:p.R539H, and c.1735C>T:p.P579S) and seven previously reported variants in MFRP, with p.E496K and p.H337_E340del being highly frequent, found in eight (47.06 %) and two families (11.76 %), respectively. Seven variants (7/16, 43.75 %) were located in the C-terminal cysteine-rich frizzled-related domain (CRD). Protein structure prediction suggested that the p.E496K and p.H337_E340del mutations might destabilize the MFRP protein. The average AL of all 42 eyes was 16.02 ± 1.05 mm, and 78.36 % of eyes with AL < 16 mm harbored the p.E496K variant. The 26 eyes with the p.E496K variant had a shorter AL than the other 16 eyes without this variant (p = 0.006), highlighting a novel genotype-phenotype correlation.
CONCLUSIONS In this largest cohort of Chinese patients with microphthalmia, the 9 novel variants, the high frequency of p.E496K, and the mutation hotspots in the CRD reveal unique insights into the MFRP mutation spectrum among Chinese patients, indicating ethnic variability. A new genotype-phenotype correlation is unveiled: the p.E496K variant is associated with a shorter AL. Our findings enhance the current knowledge of MFRP-associated microphthalmia and provide valuable information for prenatal diagnosis as well as future therapy.
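The percentages quoted in the abstract can be re-derived from the counts it states. The short check below is only an illustration of that arithmetic; the variable and function names are ours, not the authors'.

```python
# Counts taken directly from the abstract above.
total_families = 44    # unrelated families recruited
mfrp_families = 17     # families in which MFRP variants were detected
e496k_families = 8     # families carrying p.E496K
h337_families = 2      # families carrying p.H337_E340del
total_variants = 16    # distinct MFRP variants detected
crd_variants = 7       # variants located in the CRD


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


shares = {
    "MFRP-positive families": pct(mfrp_families, total_families),
    "p.E496K families": pct(e496k_families, mfrp_families),
    "p.H337_E340del families": pct(h337_families, mfrp_families),
    "variants in CRD": pct(crd_variants, total_variants),
}

for label, value in shares.items():
    print(f"{label}: {value:.2f} %")
```

Each computed share matches the figure reported in the abstract (38.64 %, 47.06 %, 11.76 %, and 43.75 %).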
Affiliation(s)
- Jing Tao
  - Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Beijing Key Laboratory of Ophthalmology and Visual Sciences, Beijing 100730, China
- Jingyi Luo
  - State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Kaijing Li
  - State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Runcai Yang
  - State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Yixiu Lin
  - State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Jian Ge
  - State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, Guangdong 510000, China.
2
Chu H, Tian Z, Hu L, Zhang H, Chang H, Bai J, Liu D, Lu L, Cheng J, Jiang H. High-Temperature Tolerance Protein Engineering through Deep Evolution. BIODESIGN RESEARCH 2024; 6:0031. [PMID: 38572349 PMCID: PMC10988389 DOI: 10.34133/bdr.0031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 03/12/2024] [Indexed: 04/05/2024] Open
Abstract
Protein engineering aimed at increasing temperature tolerance through iterative mutagenesis and high-throughput screening is often labor-intensive. Here, we developed a deep evolution (DeepEvo) strategy to engineer protein high-temperature tolerance by generating and selecting functional sequences using deep learning models. Drawing inspiration from the concept of evolution, we constructed a high-temperature tolerance selector based on a protein language model, acting as selective pressure in the high-dimensional latent spaces of protein sequences to enrich those with high-temperature tolerance. Simultaneously, we developed a variant generator using a generative adversarial network to produce protein sequence variants containing the desired function. Afterward, the iterative process involving the generator and selector was executed to accumulate high-temperature tolerance traits. We experimentally tested this approach on the model protein glyceraldehyde 3-phosphate dehydrogenase, obtaining 8 variants with high-temperature tolerance from just 30 generated sequences, achieving a success rate of over 26%, demonstrating the high efficiency of DeepEvo in engineering protein high-temperature tolerance.
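The iterative generator/selector loop described in the abstract can be sketched in a few lines. This is a toy illustration only: `toy_generator` (random point substitutions) and `toy_selector` (fraction of residues in an arbitrary set) are hypothetical stand-ins for the generative adversarial network and the protein-language-model selector, and the seed sequence is made up.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
FAVOURED = set("AILVKRE")  # arbitrary toy choice, NOT a real thermostability rule


def toy_selector(seq: str) -> float:
    """Hypothetical selector: score a sequence (higher is 'better')."""
    return sum(res in FAVOURED for res in seq) / len(seq)


def toy_generator(parent: str, rng: random.Random) -> str:
    """Hypothetical generator: propose a variant with one random substitution."""
    pos = rng.randrange(len(parent))
    return parent[:pos] + rng.choice(AMINO_ACIDS) + parent[pos + 1:]


def evolve(seed_seq, rounds=10, pool_size=30, keep=5, rng_seed=0):
    """Alternate generation and selection so favoured traits accumulate."""
    rng = random.Random(rng_seed)
    survivors = [seed_seq]
    for _ in range(rounds):
        # Generation step: propose variants of the current survivors.
        pool = [toy_generator(rng.choice(survivors), rng) for _ in range(pool_size)]
        # Selection step: rank unique candidates and keep the best few.
        # Survivors stay in the candidate list, so the top score never decreases.
        candidates = list(dict.fromkeys(survivors + pool))
        candidates.sort(key=toy_selector, reverse=True)
        survivors = candidates[:keep]
    return survivors


seed = "MGSDKIHHTPQ"  # made-up sequence for illustration
best = evolve(seed)
print(best[0], toy_selector(best[0]))
```

Keeping the previous survivors among the candidates is a design choice of this sketch (elitism); the abstract does not say whether DeepEvo does the same.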
Affiliation(s)
- Huanyu Chu
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, P. R. China
- Zhenyang Tian
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, P. R. China
  - Tianjin Zhonghe Gene Technology Co., LTD, Tianjin 300308, P. R. China
- Lingling Hu
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, P. R. China
  - College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Hejian Zhang
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, P. R. China
  - College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Hong Chang
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, P. R. China
  - College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Jie Bai
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, P. R. China
  - College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Dingyu Liu
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, P. R. China
- Lina Lu
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, P. R. China
- Jian Cheng
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, P. R. China
- Huifeng Jiang
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, P. R. China
3
Wang JM, Cui RK, Qian ZK, Yang ZZ, Li Y. Mining channel-regulated peptides from animal venom by integrating sequence semantics and structural information. Comput Biol Chem 2024; 109:108027. [PMID: 38340414 DOI: 10.1016/j.compbiolchem.2024.108027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 01/24/2024] [Accepted: 02/04/2024] [Indexed: 02/12/2024]
Abstract
Channel-regulated peptides (CRPs) derived from animal venom hold great promise as potential drug candidates for numerous diseases associated with channel proteins. However, discovering and identifying CRPs using traditional bio-experimental methods is a time-consuming and laborious process. While there have been a few computational studies on CRPs, they were limited to specific channel proteins, relied heavily on complex feature engineering, and did not incorporate multi-source information. To address these problems, we proposed a novel deep learning model, called DeepCRPs, based on graph neural networks for systematically mining CRPs from animal venom. By combining sequence semantic and structural information, the classification performance for four types of CRPs was significantly enhanced, reaching an accuracy of 0.92. This performance surpassed baseline models, whose accuracies ranged from 0.77 to 0.89. Furthermore, we employed advanced interpretability techniques to explore sequence and structural determinants relevant to the classification of CRPs, yielding potentially valuable bio-function interpretations. Comprehensive experimental results demonstrated the precision and interpretive capability of DeepCRPs, making it an accurate and bio-explainable suite for the identification and categorization of CRPs. Our research will contribute to the discovery and development of toxin peptides targeting channel proteins. The source data and code are freely available at https://github.com/liyigerry/DeepCRPs.
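As a rough picture of how per-residue sequence features and a structural contact graph can be combined in a graph neural network, the NumPy sketch below applies one standard graph-convolution layer followed by mean pooling and a softmax over four classes. It is not the DeepCRPs architecture: the random features, the chain-shaped contact map, the layer sizes, and the untrained weights are all placeholder assumptions.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalisation with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_classify(node_feats, adj, w_hidden, w_out):
    """One graph-convolution layer, mean pooling, then a softmax classifier."""
    a_norm = normalized_adjacency(adj)
    # Each residue mixes its own features with those of its structural neighbours.
    hidden = np.maximum(a_norm @ node_feats @ w_hidden, 0.0)
    pooled = hidden.mean(axis=0)            # one vector for the whole peptide
    logits = pooled @ w_out
    exp = np.exp(logits - logits.max())     # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
n_res, d_seq, d_hidden, n_classes = 6, 8, 16, 4

# Placeholder for per-residue sequence embeddings (e.g. from a language model).
node_feats = rng.normal(size=(n_res, d_seq))

# Placeholder contact map: a simple chain in which consecutive residues touch.
adj = np.zeros((n_res, n_res))
idx = np.arange(n_res - 1)
adj[idx, idx + 1] = 1.0
adj[idx + 1, idx] = 1.0

# Untrained random weights; a real model would learn these from labelled CRPs.
w_hidden = rng.normal(size=(d_seq, d_hidden)) * 0.1
w_out = rng.normal(size=(d_hidden, n_classes)) * 0.1

probs = gcn_classify(node_feats, adj, w_hidden, w_out)
print(probs)
```

The normalised adjacency matrix is what lets structural neighbourhood information flow into the sequence-derived features.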
Affiliation(s)
- Jian-Ming Wang
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Rong-Kai Cui
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Zheng-Kun Qian
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Zi-Zhong Yang
  - Yunnan Provincial Key Laboratory of Entomological Biopharmaceutical R&D, College of Pharmacy, Dali University, Dali, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, China.
4
Gelman S, Johnson B, Freschlin C, D'Costa S, Gitter A, Romero PA. Biophysics-based protein language models for protein engineering. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.15.585128. [PMID: 38559182 PMCID: PMC10980077 DOI: 10.1101/2024.03.15.585128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.
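The pretrain-then-finetune idea can be illustrated with a deliberately simple linear stand-in: fit a model on plentiful simulated data, then fit the small experimental set while shrinking towards the pretrained weights. METL itself uses transformer networks; the linear model, the synthetic data, and the penalty strength `lam` below are assumptions made purely for illustration, with the 64-example training set size borrowed from the abstract.

```python
import numpy as np


def fit_least_squares(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares; stands in for pretraining on simulated data."""
    w, *_ = np.linalg.lstsq(x, y, rcond=None)
    return w


def finetune_towards_prior(x, y, w_prior, lam):
    """Ridge regression shrunk towards w_prior instead of towards zero.

    Minimises ||x w - y||^2 + lam * ||w - w_prior||^2 in closed form.
    """
    d = x.shape[1]
    lhs = x.T @ x + lam * np.eye(d)
    rhs = x.T @ y + lam * w_prior
    return np.linalg.solve(lhs, rhs)


rng = np.random.default_rng(0)
d = 20
w_sim = rng.normal(size=d)                 # synthetic "biophysical" relationship
w_exp = w_sim + 0.3 * rng.normal(size=d)   # related but not identical "experimental" one

# Large simulated dataset for pretraining.
x_sim = rng.normal(size=(5000, d))
y_sim = x_sim @ w_sim + 0.1 * rng.normal(size=5000)
w_pre = fit_least_squares(x_sim, y_sim)

# Small experimental dataset (64 examples) for finetuning.
x_small = rng.normal(size=(64, d))
y_small = x_small @ w_exp + 0.5 * rng.normal(size=64)
w_fine = finetune_towards_prior(x_small, y_small, w_pre, lam=10.0)
print(w_fine.shape)
```

With `lam = 0` the finetuned weights reduce to an ordinary fit on the small dataset, and as `lam` grows they approach the pretrained weights; the useful regime lies in between.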
Affiliation(s)
- Sam Gelman
  - Department of Computer Sciences, University of Wisconsin-Madison
  - Morgridge Institute for Research
- Bryce Johnson
  - Department of Computer Sciences, University of Wisconsin-Madison
  - Morgridge Institute for Research
- Sameer D'Costa
  - Department of Biochemistry, University of Wisconsin-Madison
- Anthony Gitter
  - Department of Computer Sciences, University of Wisconsin-Madison
  - Morgridge Institute for Research
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
5
Dieckhaus H, Brocidiacono M, Randolph NZ, Kuhlman B. Transfer learning to leverage larger datasets for improved prediction of protein stability changes. Proc Natl Acad Sci U S A 2024; 121:e2314853121. [PMID: 38285937 PMCID: PMC10861915 DOI: 10.1073/pnas.2314853121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Accepted: 12/26/2023] [Indexed: 01/31/2024] Open
Abstract
Amino acid mutations that lower a protein's thermodynamic stability are implicated in numerous diseases, and engineered proteins with enhanced stability can be important in research and medicine. Computational methods for predicting how mutations perturb protein stability are, therefore, of great interest. Despite recent advancements in protein design using deep learning, in silico prediction of stability changes has remained challenging, in part due to a lack of large, high-quality training datasets for model development. Here, we describe ThermoMPNN, a deep neural network trained to predict stability changes for protein point mutations given an initial structure. In doing so, we demonstrate the utility of a recently released megascale stability dataset for training a robust stability model. We also employ transfer learning to leverage a second, larger dataset by using learned features extracted from ProteinMPNN, a deep neural network trained to predict a protein's amino acid sequence given its three-dimensional structure. We show that our method achieves state-of-the-art performance on established benchmark datasets using a lightweight model architecture that allows for rapid, scalable predictions. Finally, we make ThermoMPNN readily available as a tool for stability prediction and design.
Affiliation(s)
- Henry Dieckhaus
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, NC 27599
- Michael Brocidiacono
  - Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, NC 27599
- Nicholas Z. Randolph
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, NC 27599
- Brian Kuhlman
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, NC 27599
6
Liu B, Jiang Y, Yang Y, Chen JX. OmeDDG: Improved Protein Mutation Stability Prediction Based on Predicted 3D Structures. J Phys Chem B 2024; 128:67-76. [PMID: 38130113 DOI: 10.1021/acs.jpcb.3c05601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Determining changes in the protein's thermal stability following mutations is critical in protein engineering and understanding pathogenic missense mutations. Despite the development of various computational methods to predict the effects of single-point mutations, their accuracy remains limited. In this study, we propose a new computational method, OmeDDG, that more accurately predicts mutation-induced Gibbs free energy changes in protein folding (ΔΔG). OmeDDG takes the sequences of wild-type and mutant proteins as input, utilizes OmegaFold to obtain the 3D structure, employs a convolutional neural network to extract structural features, and combines them with protein mutation features and pretraining features to predict the stability of single-point mutations in proteins. We performed a comprehensive comparison between OmeDDG and other available prediction methods on four blind test datasets, confirming that OmeDDG can effectively enhance protein mutation prediction performance. Notably, on the antisymmetric dataset Ssym, OmeDDG achieves the best performance, demonstrating favorable antisymmetry with PCC = 0.79 and RMSE = 0.96 for forward mutations and PCC = 0.77 and RMSE = 0.97 for reverse mutant types.
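The evaluation quantities named in the abstract (PCC, RMSE, and antisymmetry between forward and reverse mutations) can be computed as follows. The antisymmetry summary used here, a forward/reverse correlation plus a mean bias, is one commonly used convention and is assumed rather than taken from the paper; the ΔΔG values are invented for illustration.

```python
import numpy as np


def pearson(a, b) -> float:
    """Pearson correlation coefficient (PCC) between two 1-D arrays."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])


def rmse(pred, true) -> float:
    """Root-mean-square error between predictions and reference values."""
    diff = np.asarray(pred, float) - np.asarray(true, float)
    return float(np.sqrt(np.mean(diff ** 2)))


def antisymmetry(pred_forward, pred_reverse):
    """Return (r_fr, bias) for paired forward/reverse predictions.

    A perfectly antisymmetric predictor gives ddG(reverse) = -ddG(forward),
    i.e. r_fr = -1 and bias = 0.
    """
    fwd = np.asarray(pred_forward, float)
    rev = np.asarray(pred_reverse, float)
    r_fr = pearson(fwd, rev)
    bias = float(np.mean(fwd + rev) / 2.0)
    return r_fr, bias


# Invented example values in kcal/mol, for illustration only.
true_fwd = np.array([1.2, -0.5, 2.0, 0.3])
pred_fwd = np.array([1.0, -0.2, 1.7, 0.6])
pred_rev = np.array([-0.9, 0.4, -1.5, -0.4])

print("PCC :", pearson(pred_fwd, true_fwd))
print("RMSE:", rmse(pred_fwd, true_fwd))
print("antisymmetry (r_fr, bias):", antisymmetry(pred_fwd, pred_rev))
```

Reporting forward and reverse performance separately, as the abstract does, complements these paired measures.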
Affiliation(s)
- Baoying Liu
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
- Yongquan Jiang
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
  - Artificial Intelligence Research Institute, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
- Yan Yang
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
  - Artificial Intelligence Research Institute, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
- Jim X Chen
  - Department of Computer Science, George Mason University, Fairfax, Virginia 22030-4444, United States
7
Zheng F, Liu Y, Yang Y, Wen Y, Li M. Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset. Protein Sci 2024; 33:e4861. [PMID: 38084013 PMCID: PMC10751734 DOI: 10.1002/pro.4861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 11/14/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. Nevertheless, comparing these methods poses challenges due to variations in their training data. Moreover, it is observed that they tend to perform better at predicting destabilizing mutations than stabilizing ones. Here, we meticulously compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations using 27 computational methods, including the latest ones utilizing mega-scale stability datasets and transfer learning. We excluded entries with overlap or similarity to training datasets to ensure fairness. Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53 on unseen data, and none of the methods could accurately predict stabilizing mutations, even those performing well in anti-symmetric property analysis. While most methods present consistent trends for predicting destabilizing mutations across various properties such as solvent exposure and secondary conformation, stabilizing mutations do not exhibit a clear pattern. Our study also suggests that solely addressing training dataset bias may not significantly enhance accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.
Affiliation(s)
- Feifan Zheng
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yang Liu
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yan Yang
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yuhao Wen
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Minghui Li
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
8
Rahban M, Ahmad F, Piatyszek MA, Haertlé T, Saso L, Saboury AA. Stabilization challenges and aggregation in protein-based therapeutics in the pharmaceutical industry. RSC Adv 2023; 13:35947-35963. [PMID: 38090079 PMCID: PMC10711991 DOI: 10.1039/d3ra06476j] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/22/2023] [Accepted: 11/30/2023] [Indexed: 04/26/2024] Open
Abstract
Protein-based therapeutics have revolutionized the pharmaceutical industry and become vital components in the development of future therapeutics. They offer several advantages over traditional small molecule drugs, including high affinity, potency and specificity, while demonstrating low toxicity and minimal adverse effects. However, the development and manufacturing processes of protein-based therapeutics present challenges related to protein folding, purification, stability and immunogenicity that should be addressed. These proteins, like other biological molecules, are prone to chemical and physical instabilities. The stability of protein-based drugs throughout the entire manufacturing, storage and delivery process is essential. Structural instability resulting from misfolding, unfolding, and modifications, as well as aggregation, poses a significant risk to the efficacy of these drugs, overshadowing their promising attributes. Gaining insight into structural alterations caused by aggregation and their impact on immunogenicity is vital for the advancement and refinement of protein therapeutics. Hence, in this review, we discuss some features of protein aggregation during production, formulation and storage, as well as stabilization strategies in protein engineering and computational methods to prevent aggregation.
Affiliation(s)
- Mahdie Rahban
  - Neuroscience Research Center, Institute of Neuropharmacology, Kerman University of Medical Sciences, Kerman, Iran
- Faizan Ahmad
  - Department of Biochemistry, School of Chemical & Life Sciences, Jamia Hamdard, New Delhi-110062, India
- Luciano Saso
  - Department of Physiology and Pharmacology "Vittorio Erspamer", Sapienza University, Rome, Italy
- Ali Akbar Saboury
  - Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614335, Iran; +9821 66404680, +9821 66956984
9
Kurniawan J, Ishida T. Comparing Supervised Learning and Rigorous Approach for Predicting Protein Stability upon Point Mutations in Difficult Targets. J Chem Inf Model 2023; 63:6778-6788. [PMID: 37897811 DOI: 10.1021/acs.jcim.3c00750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/30/2023]
Abstract
Accurate prediction of protein stability upon a point mutation has important applications in drug discovery and personalized medicine. It remains a challenging issue in computational biology. Existing computational prediction methods, which range from mechanistic to supervised learning approaches, have experienced limited progress over the last few decades. This stagnation is largely due to their heavy reliance on both the quantity and quality of the training data. This is evident in recent state-of-the-art methods that continue to yield substantial errors on two challenging blind test sets: frataxin and p53, with average root-mean-square errors exceeding 3 and 1.5 kcal/mol, respectively, which is still above the theoretical 1 kcal/mol prediction barrier. Rigorous approaches, on the other hand, offer greater potential for accuracy without relying on training data but are computationally demanding and require both wild-type and mutant structure information. Although they showed high accuracy for conserving mutations, their performance is still limited for charge-changing mutation cases. This might be due to the lack of an available mutant structure, often represented by a simplified capped peptide. The recent advances in protein structure prediction methods now make it possible to obtain structures comparable to experimental ones, including complete mutant structure information. In this work, we compare the performance of supervised learning-based methods and rigorous approaches for predicting protein stability on point mutations in difficult targets: frataxin and p53. The rigorous alchemical method significantly surpasses state-of-the-art techniques in terms of both the root-mean-squared error and Pearson correlation coefficient in these two challenging blind test sets. 
Additionally, we propose an improved alchemical method that employs the pmx double-system/single-box approach to accurately predict the folding free energy change upon both conserving and charge-changing mutations. The enhanced protocol can accurately predict both types of mutations, thereby outperforming existing state-of-the-art methods in overall performance.
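For readers unfamiliar with alchemical approaches, the quantity being predicted follows from a standard thermodynamic cycle. The notation below is ours, not the authors': $\Delta G_{\mathrm{fold}} = G_{\mathrm{folded}} - G_{\mathrm{unfolded}}$, and $\Delta G_{\mathrm{alch}}$ denotes the free energy of alchemically transforming the wild-type residue into the mutant residue in a given state.

```latex
% Thermodynamic cycle: folding (vertical legs) and alchemical mutation (horizontal legs)
%
%   unfolded WT  --- \Delta G_alch^unfolded --->  unfolded mutant
%       |                                             |
%   \Delta G_fold^WT                            \Delta G_fold^mut
%       v                                             v
%   folded WT    --- \Delta G_alch^folded   --->  folded mutant
%
% Free energy is a state function, so both paths from unfolded WT
% to folded mutant must give the same total:
\begin{align}
  \Delta G_{\mathrm{fold}}^{\mathrm{WT}} + \Delta G_{\mathrm{alch}}^{\mathrm{folded}}
    &= \Delta G_{\mathrm{alch}}^{\mathrm{unfolded}} + \Delta G_{\mathrm{fold}}^{\mathrm{mut}} \\
  \Delta\Delta G_{\mathrm{fold}}
    \equiv \Delta G_{\mathrm{fold}}^{\mathrm{mut}} - \Delta G_{\mathrm{fold}}^{\mathrm{WT}}
    &= \Delta G_{\mathrm{alch}}^{\mathrm{folded}} - \Delta G_{\mathrm{alch}}^{\mathrm{unfolded}}
\end{align}
```

Under this sign convention a positive $\Delta\Delta G_{\mathrm{fold}}$ means the mutation destabilizes the folded state. Only the two alchemical legs need to be simulated, which is why such methods require a structural model for both end states.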
Affiliation(s)
- Jason Kurniawan
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Takashi Ishida
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan
10
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Collapse
Affiliation(s)
- Petr Kouba
- Loschmidt
Laboratories, Department of Experimental Biology and RECETOX, Faculty
of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech
Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
11. Umerenkov D, Nikolaev F, Shashkova TI, Strashnov PV, Sindeeva M, Shevtsov A, Ivanisenko NV, Kardymon OL. PROSTATA: a framework for protein stability assessment using transformers. Bioinformatics 2023; 39:btad671. [PMID: 37935419 PMCID: PMC10651431 DOI: 10.1093/bioinformatics/btad671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/25/2023] [Accepted: 11/02/2023] [Indexed: 11/09/2023]
Abstract
MOTIVATION: Accurate prediction of the change in protein stability due to point mutations is an attractive goal that remains unachieved. Despite the high interest in this area, little consideration has been given to the transformer architecture, which is dominant in many fields of machine learning. RESULTS: In this work, we introduce PROSTATA, a predictive model built in a knowledge-transfer fashion on a new curated dataset. PROSTATA demonstrates an advantage over existing solutions based on neural networks. We show that the large improvement margin is due to both the architecture of the model and the quality of the new training dataset. This work opens up opportunities to develop new lightweight and accurate models for protein stability assessment. AVAILABILITY AND IMPLEMENTATION: PROSTATA is available at https://github.com/AIRI-Institute/PROSTATA and https://prostata.airi.net.
Affiliation(s)
- Pavel V Strashnov
  - Bioinformatics Group, AIRI, Moscow 121170, Russia
  - Department of Computer Design and Technology, Bauman Moscow State Technical University, Moscow 105005, Russia
- Andrey Shevtsov
  - Bioinformatics Group, AIRI, Moscow 121170, Russia
  - Regulatory Transcriptomics and Epigenomics Group, Institute of Bioengineering, Research Center of Biotechnology RAS, Moscow 117036, Russia
- Nikita V Ivanisenko
  - Bioinformatics Group, AIRI, Moscow 121170, Russia
  - Laboratory of Computational Proteomics, Institute of Cytology and Genetics SB RAS, Novosibirsk 630090, Russia
12. Chen Z, Wang X, Chen X, Huang J, Wang C, Wang J, Wang Z. Accelerating therapeutic protein design with computational approaches toward the clinical stage. Comput Struct Biotechnol J 2023; 21:2909-2926. [PMID: 38213894 PMCID: PMC10781723 DOI: 10.1016/j.csbj.2023.04.027] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/15/2023] [Revised: 04/11/2023] [Accepted: 04/27/2023] [Indexed: 01/13/2024]
Abstract
Therapeutic proteins, represented by antibodies, are of increasing interest in human medicine. However, clinical translation of therapeutic proteins is still largely hindered by different aspects of developability, including affinity and selectivity, stability and aggregation prevention, solubility and viscosity reduction, and deimmunization. Conventional optimization of developability with widely used methods, such as display technologies and library screening approaches, is a time- and cost-intensive endeavor, and its efficiency in finding suitable solutions is still not enough to meet clinical needs. In recent years, computational methodologies have advanced rapidly and begun to transform the field of therapeutic protein design. Owing to their capabilities in feature extraction and modeling, integrating computational strategies with conventional techniques is a promising way to accelerate the progression of therapeutic protein design and optimization toward clinical implementation. Here, we compared therapeutic proteins and small molecules in terms of developability and provided an overview of the computational approaches applicable to the design or optimization of therapeutic proteins for several developability issues.
Affiliation(s)
- Zhidong Chen
  - Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xinpei Wang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xu Chen
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Juyang Huang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Chenglin Wang
  - Shenzhen Qiyu Biotechnology Co., Ltd, Shenzhen 518107, China
- Junqing Wang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Zhe Wang
  - Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
13. Enhancing the Thermal Stability of Glutathione Bifunctional Synthase by B-Factor Strategy and Un/Folding Free Energy Calculation. Catalysts 2022. [DOI: 10.3390/catal12121649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
Glutathione is of great significance in the pharmaceutical and health fields, and one-step synthesis of reduced glutathione by glutathione bifunctional synthase has become a focus of research. The stability of glutathione bifunctional synthase is generally poor and needs to be improved. The B-factor strategy and un/folding free energy calculation were both applied to enhance the thermal stability of glutathione bifunctional synthase from Streptococcus agalactiae (GshFSA). Following the B-factor strategy, we calculated B-factors by molecular dynamics simulation to find flexible residues, then performed site-saturation mutagenesis and high-throughput screening. In parallel, we calculated the un/folding free energy of GshFSA and introduced point mutations. The optimal mutant from the B-factor strategy was R270S, which had a 2.62-fold increase in half-life compared to the wild type, and Q406M was the optimal mutant from the un/folding free energy calculation, with a 3.02-fold increase in half-life. A mechanistic explanation is provided for both mutants.
14. Wang S, Tang H, Zhao Y, Zuo L. BayeStab: Predicting effects of mutations on protein stability with uncertainty quantification. Protein Sci 2022; 31:e4467. [PMID: 36217239 PMCID: PMC9601791 DOI: 10.1002/pro.4467] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/01/2022] [Revised: 10/06/2022] [Accepted: 10/06/2022] [Indexed: 11/11/2022]
Abstract
Predicting protein thermostability change upon mutation is crucial for understanding diseases and designing therapeutics. However, accurately estimating the Gibbs free energy change of a protein remains a challenge. Some methods struggle to generalize to examples with no homology and produce uncalibrated predictions. Here we leverage advances in graph neural networks for protein feature extraction to tackle this structure-property prediction task. Our method, BayeStab, is tested on four test datasets, S669, S611, S350, and Myoglobin, showing high generalization and symmetry performance. We also apply concrete-dropout-enabled Bayesian neural networks to infer plausible models and estimate uncertainty. By decomposing the uncertainty into parts induced by data noise and by the model, we demonstrate that the probabilistic method allows insights into the inherent noise of the training datasets, which is closely related to the upper bound of the task. The BayeStab web server can be found at http://www.bayestab.com. The code for this work is available at https://github.com/HongzhouTang/BayeStab.
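The split into data-noise and model uncertainty mentioned in the abstract can be illustrated with a minimal sketch. It assumes a common formulation for dropout-based Bayesian networks, in which each stochastic forward pass returns a predicted mean and a predicted variance. It is not taken from the BayeStab code, and the function name `decompose_uncertainty` and the sample values are hypothetical.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(samples):
    """Split predictive variance into data-noise and model parts.

    `samples` is a list of (predicted_mean, predicted_variance) pairs,
    one pair per stochastic forward pass (hypothetical input format).
    """
    means = [m for m, _ in samples]
    variances = [v for _, v in samples]

    prediction = fmean(means)      # averaged point prediction
    aleatoric = fmean(variances)   # data noise: mean of predicted variances
    epistemic = pvariance(means)   # model uncertainty: spread of predicted means
    return prediction, aleatoric, epistemic, aleatoric + epistemic


# Hypothetical values for one mutation, three forward passes.
passes = [(1.0, 0.5), (2.0, 0.5), (3.0, 0.5)]
prediction, aleatoric, epistemic, total = decompose_uncertainty(passes)
print(prediction, aleatoric, epistemic, total)
```

Under this formulation, a large data-noise term points to noisy labels that more training cannot remove, while a large model term suggests the input lies far from the training data.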
Affiliation(s)
- Shuyu Wang
  - Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, China
- Hongzhou Tang
  - Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, China
- Yuliang Zhao
  - Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, China
- Lei Zuo
  - Department of Naval Architecture and Marine Engineering, University of Michigan, Ann Arbor, Michigan, USA