1
Zheng F, Jiang X, Wen Y, Yang Y, Li M. Systematic investigation of machine learning on limited data: A study on predicting protein-protein binding strength. Comput Struct Biotechnol J 2024; 23:460-472. [PMID: 38235359 PMCID: PMC10792694 DOI: 10.1016/j.csbj.2023.12.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/14/2023] [Accepted: 12/16/2023] [Indexed: 01/19/2024] Open
Abstract
The application of machine learning techniques in biological research, especially when dealing with limited data availability, poses significant challenges. In this study, we leveraged advancements in method development for predicting protein-protein binding strength to conduct a systematic investigation into the application of machine learning on limited data. The binding strength, quantitatively measured as binding affinity, is vital for understanding the processes of recognition, association, and dysfunction that occur within protein complexes. By incorporating transfer learning, integrating domain knowledge, and employing both deep learning and traditional machine learning algorithms, we mitigated the impact of data limitations and made significant advancements in predicting protein-protein binding affinity. In particular, we developed over 20 models, ultimately selecting three representative best-performing ones that belong to distinct categories. The first model is structure-based, consisting of a random forest regression and thirteen handcrafted features. The second model is sequence-based, employing an architecture that combines transferred embedding features with a multilayer perceptron. Finally, we created an ensemble model by averaging the predictions of the two aforementioned models. The comparison with other predictors on three independent datasets confirms the significant improvements achieved by our models in predicting protein-protein binding affinity. The programs for running these three models are available at https://github.com/minghuilab/BindPPI.
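The ensemble step described in the abstract (averaging the predictions of the structure-based and sequence-based models) can be illustrated with a minimal sketch. The function name and the numeric values below are hypothetical and are not taken from the BindPPI repository; the sketch only shows unweighted element-wise averaging of two prediction lists.

```python
from statistics import fmean


def ensemble_average(structure_preds, sequence_preds):
    """Average paired predictions from two models, complex by complex."""
    if len(structure_preds) != len(sequence_preds):
        raise ValueError("both models must score the same set of complexes")
    return [fmean(pair) for pair in zip(structure_preds, sequence_preds)]


# Hypothetical binding-affinity predictions for three complexes
# (illustrative numbers only, not results from the paper).
rf_preds = [-10.0, -8.5, -12.0]   # e.g. a structure-based model
mlp_preds = [-11.0, -9.5, -11.0]  # e.g. a sequence-based model
print(ensemble_average(rf_preds, mlp_preds))
```

Unweighted averaging is the simplest way to combine two regressors; whether the published ensemble applies any weighting is not stated in the abstract.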
Affiliation(s)
- Feifan Zheng
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
- Xin Jiang
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
- Yuhao Wen
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
- Yan Yang
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
- Minghui Li
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
2
Cárdenas-Guerra RE, Montes-Flores O, Nava-Pintor EE, Reséndiz-Cardiel G, Flores-Pucheta CI, Rodríguez-Gavaldón YI, Arroyo R, Bottazzi ME, Hotez PJ, Ortega-López J. Chagasin from Trypanosoma cruzi as a molecular scaffold to express epitopes of TSA-1 as soluble recombinant chimeras. Protein Expr Purif 2024; 218:106458. [PMID: 38423156 DOI: 10.1016/j.pep.2024.106458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 02/13/2024] [Accepted: 02/21/2024] [Indexed: 03/02/2024]
Abstract
Trypanosoma cruzi is the causative agent of Chagas disease, a global public health problem. New therapeutic drugs and biologics are needed. The TSA-1 recombinant protein of T. cruzi is one such promising antigen for developing a therapeutic vaccine. However, it is overexpressed in E. coli as inclusion bodies, requiring an additional refolding step. As an alternative, in this study, we propose the endogenous cysteine protease inhibitor chagasin as a molecular scaffold to generate chimeric proteins. These proteins will contain combinations of two of the five conserved epitopes (E1 to E5) of TSA-1 in the L4 and L6 chagasin loops. Twenty chimeras (Q1-Q20) were designed, and their solubility was predicted using bioinformatics tools. Nine chimeras with different degrees of solubility were selected and expressed in E. coli BL21 (DE3). Western blot assays with anti-6x-His and anti-chagasin antibodies confirmed the expression of soluble recombinant chimeras. Both theoretically and experimentally, the Q12 (E5-E3) chimera was the most soluble, and the Q20 (E4-E5) the most insoluble protein. Q4 (E5-E1) and Q8 (E5-E2) chimeras were classified as proteins with medium solubility that exhibited the highest yield in the soluble fraction. Notably, Q4 has a yield of 239 mg/L, well above the yield of recombinant chagasin (16.5 mg/L) expressed in a soluble form. The expression of the Q4 chimera was scaled up to a 7 L fermenter obtaining a yield of 490 mg/L. These data show that chagasin can serve as a molecular scaffold for the expression of TSA-1 epitopes in the form of soluble chimeras.
Affiliation(s)
- Rosa Elena Cárdenas-Guerra
  - Departamento de Biotecnología y Bioingeniería, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Av. IPN # 2508, Col. San Pedro Zacatenco, Gustavo A. Madero, CP 07360, Mexico City, Mexico
- Octavio Montes-Flores
  - Departamento de Biotecnología y Bioingeniería, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Av. IPN # 2508, Col. San Pedro Zacatenco, Gustavo A. Madero, CP 07360, Mexico City, Mexico
- Edgar Ezequiel Nava-Pintor
  - Departamento de Biotecnología y Bioingeniería, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Av. IPN # 2508, Col. San Pedro Zacatenco, Gustavo A. Madero, CP 07360, Mexico City, Mexico
- Gerardo Reséndiz-Cardiel
  - Departamento de Biotecnología y Bioingeniería, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Av. IPN # 2508, Col. San Pedro Zacatenco, Gustavo A. Madero, CP 07360, Mexico City, Mexico
- Claudia Ivonne Flores-Pucheta
  - Departamento de Biotecnología y Bioingeniería, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Av. IPN # 2508, Col. San Pedro Zacatenco, Gustavo A. Madero, CP 07360, Mexico City, Mexico
- Yasmín Irene Rodríguez-Gavaldón
  - Departamento de Biotecnología y Bioingeniería, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Av. IPN # 2508, Col. San Pedro Zacatenco, Gustavo A. Madero, CP 07360, Mexico City, Mexico
- Rossana Arroyo
  - Departamento de Infectómica y Patogénesis Molecular, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Av. IPN # 2508, Col. San Pedro Zacatenco, Gustavo A. Madero, CP 07360, Mexico City, Mexico
- Maria Elena Bottazzi
  - Texas Children's Hospital Center for Vaccine Development, Department of Pediatrics and Molecular Virology and Microbiology, National School of Tropical Medicine, Baylor College of Medicine, Houston, TX, USA
- Peter J Hotez
  - Texas Children's Hospital Center for Vaccine Development, Department of Pediatrics and Molecular Virology and Microbiology, National School of Tropical Medicine, Baylor College of Medicine, Houston, TX, USA
- Jaime Ortega-López
  - Departamento de Biotecnología y Bioingeniería, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Av. IPN # 2508, Col. San Pedro Zacatenco, Gustavo A. Madero, CP 07360, Mexico City, Mexico.
3
Li B, Ming D. GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling. BMC Bioinformatics 2024; 25:204. [PMID: 38824535 DOI: 10.1186/s12859-024-05820-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 05/29/2024] [Indexed: 06/03/2024] Open
Abstract
BACKGROUND: Protein solubility is a critically important physicochemical property closely related to protein expression. For example, it is one of the main factors to be considered in the design and production of antibody drugs and a prerequisite for realizing various protein functions. Although several solubility prediction models have emerged in recent years, many of these models are limited to capturing information embedded in one-dimensional amino acid sequences, resulting in unsatisfactory predictive performance.
RESULTS: In this study, we introduce a novel Graph Attention network-based protein Solubility model, GATSol, which represents the 3D structure of proteins as a protein graph. In addition to the node features of amino acids extracted by a state-of-the-art protein large language model, GATSol utilizes amino acid distance maps generated using the latest AlphaFold technology. Rigorous testing on the independent eSOL and Saccharomyces cerevisiae test datasets has shown that GATSol outperforms most recently introduced models, especially with respect to the coefficient of determination R², which reaches 0.517 and 0.424, respectively. It outperforms the current state-of-the-art GraphSol by 18.4% on the S. cerevisiae_test set.
CONCLUSIONS: GATSol captures 3D features of proteins by building protein graphs, which significantly improves the accuracy of protein solubility prediction. Recent advances in protein structure modeling allow our method to incorporate spatial structure features extracted from predicted structures into the model by relying only on the input of protein sequences, which simplifies the entire graph neural network prediction process, making it more user-friendly and efficient. As a result, GATSol may help prioritize highly soluble proteins, ultimately reducing the cost and effort of experimental work. The source code and data of the GATSol model are freely available at https://github.com/binbinbinv/GATSol.
Affiliation(s)
- Bin Li
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing, 211816, Jiangsu, People's Republic of China
- Dengming Ming
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing, 211816, Jiangsu, People's Republic of China.
4
Li W, Lin H, Huang Z, Xie S, Zhou Y, Gong R, Jiang Q, Xiang C, Huang J. DOTAD: A Database of Therapeutic Antibody Developability. Interdiscip Sci 2024:10.1007/s12539-024-00613-2. [PMID: 38530613 DOI: 10.1007/s12539-024-00613-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 01/25/2024] [Accepted: 01/27/2024] [Indexed: 03/28/2024]
Abstract
The development of therapeutic antibodies is an important aspect of new drug discovery pipelines. The assessment of an antibody's developability, that is, its suitability for large-scale production and therapeutic use, is a particularly important step in this process. Given that experimental assays to assess antibody developability at large scale are expensive and time-consuming, computational methods have been a more efficient alternative. However, the antibody research community faces significant challenges due to the scarcity of readily accessible data on antibody developability, which is essential for training and validating computational models. To address this gap, DOTAD (Database Of Therapeutic Antibody Developability) has been built as the first database dedicated exclusively to the curation of therapeutic antibody developability information. DOTAD aggregates all available therapeutic antibody sequence data along with various developability metrics from the scientific literature, offering researchers a robust platform for data storage, retrieval, exploration, and downloading. In addition to serving as a comprehensive repository, DOTAD enhances its utility by integrating a web-based interface that features state-of-the-art tools for the assessment of antibody developability. This ensures that users not only have access to critical data but also have the convenience of analyzing and interpreting this information. The DOTAD database represents a valuable resource for the scientific community, facilitating the advancement of therapeutic antibody research. It is freely accessible at http://i.uestc.edu.cn/DOTAD/, providing an open data platform that supports the continuous growth and evolution of computational methods in the field of antibody development.
Affiliation(s)
- Wenzhen Li
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Hongyan Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Ziru Huang
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Shiyang Xie
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Yuwei Zhou
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Rong Gong
  - School of Computer Science and Technology, Aba Teachers University, Aba, 623002, China
- Qianhu Jiang
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China
- ChangCheng Xiang
  - School of Computer Science and Technology, Aba Teachers University, Aba, 623002, China.
- Jian Huang
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China.
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, 611844, China.
5
Rahbar MR, Nezafat N, Morowvat MH, Savardashtaki A, Ghoshoon MB, Mehrabani-Zeinabad K, Ghasemi Y. Targeting Efficient Features of Urate Oxidase to Increase Its Solubility. Appl Biochem Biotechnol 2024:10.1007/s12010-023-04819-w. [PMID: 38308671 DOI: 10.1007/s12010-023-04819-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/19/2023] [Indexed: 02/05/2024]
Abstract
With the demand for mass production of protein drugs, solubility has become a serious issue. Extrinsic and intrinsic factors both affect this property. A homotetrameric cofactor-free urate oxidase (UOX) is not sufficiently soluble. To engineer UOX for optimum solubility, it is important to identify the most effective factor that influences solubility. The most effective feature to target for protein engineering was determined by measuring various solubility-related factors of UOX. A large library of homologous sequences was obtained from the databases. The data were reduced to six enzymes from different organisms. On the basis of various sequence- and structure-derived elements, the most and the least soluble enzymes were defined. To determine the best protein engineering target for modification, features of the most and least soluble enzymes were compared. Metabacillus fastidiosus UOX was the most soluble enzyme, while Agrobacterium globiformis UOX was the least soluble. According to the comparison-constant method, positive surface patches caused by arginine residue distribution are appropriate targets for modification. Two Arg to Ala mutations were introduced to the least soluble enzyme to test this hypothesis. These mutations significantly enhanced the mutant's solubility. Because different algorithms produced conflicting results, it was difficult to determine which proteins were most and least soluble. Given these discrepancies, solubility prediction should draw on multiple algorithms. Protein surfaces should be investigated regionally rather than globally, and both sequence and structural data should be considered. Several other biotechnological products could be engineered using the data reduction and comparison-constant methods used in this study.
Affiliation(s)
- Mohammad Reza Rahbar
  - Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Navid Nezafat
  - Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
  - Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, P.O. Box 71345-1583, Shiraz, Iran
- Mohammad Hossein Morowvat
  - Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
  - Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, P.O. Box 71345-1583, Shiraz, Iran
- Amir Savardashtaki
  - Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Mohammad Bagher Ghoshoon
  - Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
  - Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, P.O. Box 71345-1583, Shiraz, Iran
- Kamran Mehrabani-Zeinabad
  - Department of Biostatistics, Faculty of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Younes Ghasemi
  - Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran.
  - Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, P.O. Box 71345-1583, Shiraz, Iran.
6
Zheng F, Liu Y, Yang Y, Wen Y, Li M. Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset. Protein Sci 2024; 33:e4861. [PMID: 38084013 PMCID: PMC10751734 DOI: 10.1002/pro.4861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 11/14/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. Nevertheless, comparing these methods poses challenges due to variations in their training data. Moreover, it is observed that they tend to perform better at predicting destabilizing mutations than stabilizing ones. Here, we meticulously compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations using 27 computational methods, including the latest ones utilizing mega-scale stability datasets and transfer learning. We excluded entries with overlap or similarity to training datasets to ensure fairness. Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53 on unseen data, and none of the methods could accurately predict stabilizing mutations, even those performing well in anti-symmetric property analysis. While most methods present consistent trends for predicting destabilizing mutations across various properties such as solvent exposure and secondary conformation, stabilizing mutations do not exhibit a clear pattern. Our study also suggests that solely addressing training dataset bias may not significantly enhance accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.
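The evaluation described in the abstract reports Pearson correlation coefficients and contrasts stabilizing with destabilizing mutations. A minimal sketch of that kind of per-class comparison follows. The function names, the sign convention (positive experimental values treated as destabilizing) and all numbers are assumptions made for illustration; they are not the authors' code or data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def pearson_by_class(experimental, predicted):
    """Correlate predictions with experiment separately for each class.

    Assumed convention: experimental value > 0 means destabilizing,
    experimental value < 0 means stabilizing.
    """
    pairs = list(zip(experimental, predicted))
    destab = [(e, p) for e, p in pairs if e > 0]
    stab = [(e, p) for e, p in pairs if e < 0]
    return {
        "destabilizing": pearson(*zip(*destab)),
        "stabilizing": pearson(*zip(*stab)),
    }


# Hypothetical stability changes for six mutations (illustrative only).
experimental = [1.0, 2.0, 3.0, -1.0, -2.0, -3.0]
predicted = [0.5, 1.0, 1.5, 0.1, 0.3, 0.2]
print(pearson_by_class(experimental, predicted))
```

Reporting the two classes separately makes it visible when a tool tracks destabilizing mutations well but fails on stabilizing ones, which a single pooled coefficient can hide.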
Affiliation(s)
- Feifan Zheng
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yang Liu
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yan Yang
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yuhao Wen
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Minghui Li
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
7
Helmick H, Tonner T, Hauersperger D, Okos M, Kokini JL. Comparison of the specific mechanical energy, specific thermal energy, and functional properties of cold and hot extruded pea protein isolate. Food Res Int 2023; 174:113603. [PMID: 37986466 DOI: 10.1016/j.foodres.2023.113603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 10/13/2023] [Accepted: 10/14/2023] [Indexed: 11/22/2023]
Abstract
Pea protein is a popular source of plant-based protein, though its application in meat and dairy analog products is still lacking. This is particularly true in the development of products with fatty and creamy textures. Cold denaturation may be a way to induce these types of textures in food, since it is a universal phenomenon in proteins that occurs due to a weakening of hydrophobic interactions at cold temperatures. This work utilizes a single screw extruder to systematically study the impacts of moisture content (50-65%) and pH (2, 4.5, 8) on the outlet temperatures, specific mechanical energy, specific thermal energy, and texture of cold-extruded pea protein. It was found that at pH 2 and moisture contents of 60% and greater, the temperature of the product exiting the extruder was <5.5 °C and 13.7-36.5% more specific thermal energy was produced, indicating the occurrence of cold denaturation in these products. Based on these findings, a comparison of hot and cold extrusion was conducted as a function of pH and oil content. It was found that cold extrusion imparts 43.0-56.2% more mechanical energy into the protein than hot extrusion, and the cold extruded protein had higher values of Young's modulus and breaking stress. The protein extruded at low temperatures was also able to bind 32.93% more oil than hot extruded proteins when extruded with 10% added oil, which may aid in the formation of protein-based fat mimetics for the food industry.
Affiliation(s)
- Harrison Helmick
  - Department of Food Science, Purdue University, 745 Agriculture Mall Dr, West Lafayette, IN 47907, United States
- Troy Tonner
  - Department of Agricultural and Biological Engineering, Purdue University, 225 South University Street, West Lafayette, IN 47907, United States
- Daniel Hauersperger
  - Department of Agricultural and Biological Engineering, Purdue University, 225 South University Street, West Lafayette, IN 47907, United States
- Martin Okos
  - Department of Agricultural and Biological Engineering, Purdue University, 225 South University Street, West Lafayette, IN 47907, United States
- Jozef L Kokini
  - Department of Food Science, Purdue University, 745 Agriculture Mall Dr, West Lafayette, IN 47907, United States.
8
Chen Z, Wang X, Chen X, Huang J, Wang C, Wang J, Wang Z. Accelerating therapeutic protein design with computational approaches toward the clinical stage. Comput Struct Biotechnol J 2023; 21:2909-2926. [PMID: 38213894 PMCID: PMC10781723 DOI: 10.1016/j.csbj.2023.04.027] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/15/2023] [Revised: 04/11/2023] [Accepted: 04/27/2023] [Indexed: 01/13/2024] Open
Abstract
Therapeutic proteins, represented by antibodies, are of increasing interest in human medicine. However, clinical translation of therapeutic proteins is still largely hindered by different aspects of developability, including affinity and selectivity, stability and aggregation prevention, solubility and viscosity reduction, and deimmunization. Conventional optimization of developability with widely used methods, like display technologies and library screening approaches, is a time- and cost-intensive endeavor, and the efficiency in finding suitable solutions is still not enough to meet clinical needs. In recent years, the accelerated advancement of computational methodologies has ushered in a transformative era in the field of therapeutic protein design. Owing to their remarkable capabilities in feature extraction and modeling, the integration of cutting-edge computational strategies with conventional techniques presents a promising avenue to accelerate the progression of therapeutic protein design and optimization toward clinical implementation. Here, we compare therapeutic proteins and small molecules in terms of developability and provide an overview of the computational approaches applicable to the design or optimization of therapeutic proteins for several developability issues.
Affiliation(s)
- Zhidong Chen
  - Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xinpei Wang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xu Chen
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Juyang Huang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Chenglin Wang
  - Shenzhen Qiyu Biotechnology Co., Ltd, Shenzhen 518107, China
- Junqing Wang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Zhe Wang
  - Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
9
Wittmund M, Cadet F, Davari MD. Learning Epistasis and Residue Coevolution Patterns: Current Trends and Future Perspectives for Advancing Enzyme Engineering. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Marcel Wittmund
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany
- Frederic Cadet
  - Laboratory of Excellence LABEX GR, DSIMB, Inserm UMR S1134, University of Paris city & University of Reunion, Paris 75014, France
- Mehdi D. Davari
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany
10
Qing R, Hao S, Smorodina E, Jin D, Zalevsky A, Zhang S. Protein Design: From the Aspect of Water Solubility and Stability. Chem Rev 2022; 122:14085-14179. [PMID: 35921495 PMCID: PMC9523718 DOI: 10.1021/acs.chemrev.1c00757] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/31/2021] [Indexed: 12/13/2022]
Abstract
Water solubility and structural stability are key merits for proteins defined by the primary sequence and 3D-conformation. Their manipulation represents important aspects of the protein design field that relies on the accurate placement of amino acids and molecular interactions, guided by underlying physicochemical principles. Emulated designer proteins with well-defined properties both fuel the knowledge-base for more precise computational design models and are used in various biomedical and nanotechnological applications. The continuous developments in protein science, increasing computing power, new algorithms, and characterization techniques provide sophisticated toolkits for solubility design beyond guesswork. In this review, we summarize recent advances in the protein design field with respect to water solubility and structural stability. After introducing fundamental design rules, we discuss the transmembrane protein solubilization and de novo transmembrane protein design. Traditional strategies to enhance protein solubility and structural stability are introduced. The designs of stable protein complexes and high-order assemblies are covered. Computational methodologies behind these endeavors, including structure prediction programs, machine learning algorithms, and specialty software dedicated to the evaluation of protein solubility and aggregation, are discussed. The findings and opportunities for Cryo-EM are presented. This review provides an overview of significant progress and prospects in accurate protein design for solubility and stability.
Affiliation(s)
- Rui Qing
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
  - Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
  - The David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Shilei Hao
  - Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400030, China
- Eva Smorodina
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo 0424, Norway
- David Jin
  - Avalon GloboCare Corp., Freehold, New Jersey 07728, United States
- Arthur Zalevsky
  - Laboratory of Bioinformatics Approaches in Combinatorial Chemistry and Biology, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow 117997, Russia
- Shuguang Zhang
  - Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
11
Ferreira CS, Martins YC, Souza RC, Vasconcelos ATR. EpiCurator: an immunoinformatic workflow to predict and prioritize SARS-CoV-2 epitopes. PeerJ 2021; 9:e12548. [PMID: 34909278 PMCID: PMC8641484 DOI: 10.7717/peerj.12548] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/03/2021] [Accepted: 11/04/2021] [Indexed: 12/12/2022] Open
Abstract
The ongoing coronavirus 2019 (COVID-19) pandemic, triggered by the emerging SARS-CoV-2 virus, represents a global public health challenge. Therefore, the development of effective vaccines is an urgent need to prevent and control virus spread. One of the vaccine production strategies uses in silico epitope prediction from the virus genome by immunoinformatic approaches, which assist in selecting candidate epitopes for in vitro and clinical trial research. This study introduces the EpiCurator workflow to predict and prioritize epitopes from SARS-CoV-2 genomes by combining a series of computational filtering tools. To validate the workflow effectiveness, SARS-CoV-2 genomes retrieved from the GISAID database were analyzed. We identified 11 epitopes in the receptor-binding domain (RBD) of Spike glycoprotein, an important antigenic determinant, not previously described in the literature or published on the Immune Epitope Database (IEDB). Interestingly, these epitopes combine several important properties: they are recognized in sequences of the current variants of concern and show high antigenicity, conservancy, and broad population coverage. The RBD epitopes were the source for a multi-epitope design used for in silico validation of their immunogenic potential. The multi-epitope overall quality was computationally validated, endorsing its efficiency to trigger an effective immune response since it has stability, high antigenicity and strong interactions with Toll-Like Receptors (TLR). Taken together, the findings in the current study demonstrated the efficacy of the workflow for epitope discovery, providing target candidates for immunogen development.
Affiliation(s)
- Cristina S. Ferreira
- Bioinformatics Laboratory, National Laboratory of Scientific Computation, Petrópolis, Rio de Janeiro, Brazil
| | - Yasmmin C. Martins
- Bioinformatics Laboratory, National Laboratory of Scientific Computation, Petrópolis, Rio de Janeiro, Brazil
| | - Rangel Celso Souza
- Bioinformatics Laboratory, National Laboratory of Scientific Computation, Petrópolis, Rio de Janeiro, Brazil
| | - Ana Tereza R. Vasconcelos
- Bioinformatics Laboratory, National Laboratory of Scientific Computation, Petrópolis, Rio de Janeiro, Brazil
| |
|
12
|
Sun T, Chen Y, Wen Y, Zhu Z, Li M. PremPLI: a machine learning model for predicting the effects of missense mutations on protein-ligand interactions. Commun Biol 2021; 4:1311. [PMID: 34799678 PMCID: PMC8604987 DOI: 10.1038/s42003-021-02826-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/12/2021] [Accepted: 10/26/2021] [Indexed: 02/07/2023] Open
Abstract
Resistance to small-molecule drugs is the main cause of the failure of therapeutic drugs in clinical practice. Missense mutations altering the binding of ligands to proteins are one of the critical mechanisms that result in genetic disease and drug resistance. Computational methods have made considerable progress in predicting binding affinity changes and identifying resistance mutations, but their prediction accuracy and speed are still unsatisfactory and need further improvement. To address these issues, we introduce a structure-based machine learning method, named PremPLI, for quantitatively estimating the effects of single mutations on ligand binding affinity. A comprehensive comparison of the predictive performance of PremPLI with other available methods on two benchmark datasets confirms that our approach performs robustly and achieves similar or even higher predictive accuracy than approaches relying on first-principle statistical mechanics and mixed physics- and knowledge-based potentials, while requiring far fewer computational resources. PremPLI can be used for guiding the design of ligand-binding proteins, identifying and understanding disease driver mutations, and finding potential resistance mutations for different drugs. PremPLI is freely available at https://lilab.jysw.suda.edu.cn/research/PremPLI/ and supports large-scale mutational scanning. Sun et al. present PremPLI, a machine learning approach and web tool to predict how missense mutations in a drug’s target protein will affect the drug’s binding affinity. PremPLI can be applied to identify drug resistance mechanisms in cancer cells and microorganisms and develop drugs to combat drug resistance.
Affiliation(s)
- Tingting Sun
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, 215123, Suzhou, China
| | - Yuting Chen
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, 215123, Suzhou, China
| | - Yuhao Wen
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, 215123, Suzhou, China
| | - Zefeng Zhu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, 215123, Suzhou, China
| | - Minghui Li
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, 215123, Suzhou, China.
| |
|
13
|
Melo MCR, Maasch JRMA, de la Fuente-Nunez C. Accelerating antibiotic discovery through artificial intelligence. Commun Biol 2021; 4:1050. [PMID: 34504303 PMCID: PMC8429579 DOI: 10.1038/s42003-021-02586-0] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 02/18/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023] Open
Abstract
By targeting invasive organisms, antibiotics insert themselves into the ancient struggle of the host-pathogen evolutionary arms race. As pathogens evolve tactics for evading antibiotics, therapies decline in efficacy and must be replaced, distinguishing antibiotics from most other forms of drug development. Together with a slow and expensive antibiotic development pipeline, the proliferation of drug-resistant pathogens drives urgent interest in computational methods that promise to expedite candidate discovery. Strides in artificial intelligence (AI) have encouraged its application to multiple dimensions of computer-aided drug design, with increasing application to antibiotic discovery. This review describes AI-facilitated advances in the discovery of both small molecule antibiotics and antimicrobial peptides. Beyond the essential prediction of antimicrobial activity, emphasis is also given to antimicrobial compound representation, determination of drug-likeness traits, antimicrobial resistance, and de novo molecular design. Given the urgency of the antimicrobial resistance crisis, we analyze uptake of open science best practices in AI-driven antibiotic discovery and argue for openness and reproducibility as a means of accelerating preclinical research. Finally, trends in the literature and areas for future inquiry are discussed, as artificially intelligent enhancements to drug discovery at large offer many opportunities for future applications in antibiotic development.
Affiliation(s)
- Marcelo C R Melo
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Jacqueline R M A Maasch
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA.
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
14
|
Revolutionizing enzyme engineering through artificial intelligence and machine learning. Emerg Top Life Sci 2021; 5:113-125. [PMID: 33835131 DOI: 10.1042/etls20200257] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/17/2020] [Revised: 03/17/2021] [Accepted: 03/22/2021] [Indexed: 12/20/2022]
Abstract
The combinatorial space of an enzyme sequence has astronomical possibilities and exploring it with contemporary experimental techniques is arduous and often ineffective. Multi-target objectives such as concomitantly achieving improved selectivity, solubility and activity of an enzyme have narrow plausibility under approaches of restricted mutagenesis and combinatorial search. Traditional enzyme engineering approaches have a limited scope for complex optimization due to the requirement of a priori knowledge or the experimental burden of screening huge protein libraries. The recent surge in high-throughput experimental methods including Next Generation Sequencing and automated screening has flooded the field of molecular biology with big data, which requires us to rethink our current approaches towards enzyme engineering. Artificial Intelligence (AI) and Machine Learning (ML) have great potential to revolutionize smart enzyme engineering without the explicit need for a complete understanding of the underlying molecular system. Here, we portray the role and position of AI techniques in the field of enzyme engineering along with their scope and limitations. In addition, we explain how the traditional approaches of directed evolution and rational design can be extended through AI tools. Recent successful examples of AI-assisted enzyme engineering projects and their deviation from traditional approaches are highlighted. A comprehensive picture of current challenges and future avenues for AI in enzyme engineering is also discussed.
|
15
|
Ptak-Kaczor M, Banach M, Stapor K, Fabian P, Konieczny L, Roterman I. Solubility and Aggregation of Selected Proteins Interpreted on the Basis of Hydrophobicity Distribution. Int J Mol Sci 2021; 22:ijms22095002. [PMID: 34066830 PMCID: PMC8125953 DOI: 10.3390/ijms22095002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/19/2021] [Revised: 05/03/2021] [Accepted: 05/06/2021] [Indexed: 11/30/2022] Open
Abstract
Protein solubility is based on the compatibility of the specific protein surface with the polar aqueous environment. The exposure of polar residues on the protein surface promotes the protein’s solubility in the polar environment. The aqueous environment also influences the folding process by favoring the centralization of hydrophobic residues with the simultaneous exposure of polar residues. The degree to which the residue distribution matches this model, with hydrophobic residues concentrated in the center of the molecule and polar residues exposed, is determined by the sequence of amino acids in the chain. The fuzzy oil drop model enables the quantification of the degree of compatibility of the hydrophobicity distribution observed in the protein with a form fully consistent with the Gaussian 3D function, which expresses an idealized distribution that meets the preferences of the polar water environment. The varied degrees of compatibility of the observed distribution with the idealized one allow the prediction of preferences for interactions with molecules of different polarity, including water molecules in particular. This paper analyzes a set of proteins with different levels of hydrophobicity distribution in the context of the solubility of a given protein and the possibility of complex formation.
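As an illustrative sketch of the idea described in the abstract above (not the authors' implementation), the idealized distribution can be modeled as a 3D Gaussian centered on the molecule and normalized over all residues, then compared with an observed per-residue profile. The function names, the per-axis widths `sigmas`, and the use of Kullback-Leibler divergence as the discrepancy measure are assumptions made here for illustration.

```python
import math


def idealized_profile(coords, sigmas):
    """Normalized 3D Gaussian evaluated at each residue position.

    coords: list of (x, y, z) residue positions (hypothetical input format).
    sigmas: (sx, sy, sz) Gaussian widths along each axis (assumed given).
    """
    n = len(coords)
    # Center the Gaussian on the geometric center of the positions.
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    raw = []
    for p in coords:
        exponent = sum(((p[k] - center[k]) / sigmas[k]) ** 2 for k in range(3))
        raw.append(math.exp(-0.5 * exponent))
    total = sum(raw)
    # Normalize so the values sum to 1 and can be compared as a distribution.
    return [v / total for v in raw]


def divergence(observed, idealized):
    """Kullback-Leibler divergence (in bits) of observed from idealized."""
    return sum(
        o * math.log2(o / t) for o, t in zip(observed, idealized) if o > 0
    )
```

A divergence of zero means the observed profile coincides with the idealized Gaussian; larger values indicate a distribution that departs from the centralized hydrophobic core assumed by the model.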
Affiliation(s)
- Magdalena Ptak-Kaczor
- Department of Bioinformatics and Telemedicine, Jagiellonian University—Medical College, Medyczna 7, 30-688 Kraków, Poland
- Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, Łojasiewicza 11, 30-348 Kraków, Poland
| | - Mateusz Banach
- Department of Bioinformatics and Telemedicine, Jagiellonian University—Medical College, Medyczna 7, 30-688 Kraków, Poland
| | - Katarzyna Stapor
- Institute of Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Piotr Fabian
- Institute of Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Leszek Konieczny
- Chair of Medical Biochemistry—Jagiellonian University—Medical College, Kopernika 7, 31-034 Kraków, Poland
| | - Irena Roterman
- Department of Bioinformatics and Telemedicine, Jagiellonian University—Medical College, Medyczna 7, 30-688 Kraków, Poland
- Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, Łojasiewicza 11, 30-348 Kraków, Poland
| |
|
16
|
Postic G, Janel N, Moroy G. Representations of protein structure for exploring the conformational space: A speed-accuracy trade-off. Comput Struct Biotechnol J 2021; 19:2618-2625. [PMID: 34025948 PMCID: PMC8120936 DOI: 10.1016/j.csbj.2021.04.049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/03/2021] [Revised: 04/19/2021] [Accepted: 04/20/2021] [Indexed: 11/25/2022] Open
Abstract
We compare ten structural representations, either atomistic or coarse-grained. Thus, ten distance-dependent statistical potentials of mean force (PMF) were built. The Cβ-only and Cα + Cβ representations provide the best speed–accuracy trade-off. Including glycines through Cα, in a Cβ-only representation, yields a higher accuracy. We generalize the conclusions to the total information gain (TIG) scoring function.
The recent breakthrough in the field of protein structure prediction shows the relevance of using knowledge-based scoring functions in combination with a low-resolution 3D representation of protein macromolecules. The choice of not using all atoms is barely supported by any data in the literature, and is mostly motivated by empirical and practical reasons, such as the computational cost of assessing the numerous folds of the protein conformational space. Here, we present a comprehensive study, carried out on a large and balanced benchmark of predicted protein structures, to see how different types of structural representations rank in either accuracy or calculation speed, and which ones offer the best compromise between these two criteria. We tested ten representations, including low-resolution, high-resolution, and coarse-grained approaches. We also investigated the generalization of the findings to formalisms other than the widely-used “potential of mean force” (PMF) method. Thus, we observed that representing protein structures by their β carbons, combined or not with Cα, provides the best speed–accuracy trade-off when using a “total information gain” scoring function. For statistical PMFs, using MARTINI backbone and side-chain beads is the best option. Finally, we also demonstrated the necessity of training the reference state on all atom types, and of including the Cα atoms of glycine residues, in a Cβ-based representation.
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
- Corresponding author.
| | - Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
| | - Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
| |
|
17
|
Bhandari BK, Gardner PP, Lim CS. Solubility-Weighted Index: fast and accurate prediction of protein solubility. Bioinformatics 2021; 36:4691-4698. [PMID: 32559287 PMCID: PMC7750957 DOI: 10.1093/bioinformatics/btaa578] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 12/16/2019] [Revised: 05/05/2020] [Accepted: 06/12/2020] [Indexed: 12/14/2022] Open
Abstract
Motivation: Recombinant protein production is a widely used technique in the biotechnology and biomedical industries, yet only a quarter of target proteins are soluble and can therefore be purified.
Results: We have discovered that global structural flexibility, which can be modeled by normalized B-factors, accurately predicts the solubility of 12 216 recombinant proteins expressed in Escherichia coli. We have optimized these B-factors, and derived a new set of values for solubility scoring that further improves prediction accuracy. We call this new predictor the ‘Solubility-Weighted Index’ (SWI). Importantly, SWI outperforms many existing protein solubility prediction tools. Furthermore, we have developed ‘SoDoPE’ (Soluble Domain for Protein Expression), a web interface that allows users to choose a protein region of interest for predicting and maximizing both protein expression and solubility.
Availability and implementation: The SoDoPE web server and source code are freely available at https://tisigner.com/sodope and https://github.com/Gardner-BinfLab/TISIGNER-ReactJS, respectively. The code and data for reproducing our analysis can be found at https://github.com/Gardner-BinfLab/SoDoPE_paper_2020.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bikash K Bhandari
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Paul P Gardner
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
| | - Chun Shen Lim
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| |
|
18
|
Chen J, Zheng S, Zhao H, Yang Y. Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map. J Cheminform 2021; 13:7. [PMID: 33557952 PMCID: PMC7869490 DOI: 10.1186/s13321-021-00488-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 10/21/2020] [Accepted: 01/20/2021] [Indexed: 11/26/2022] Open
Abstract
Protein solubility is significant in producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. Therefore, a computational model is highly desired to accurately predict protein solubility from the amino acid sequence. Many methods have been developed, but they are mostly based on the one-dimensional embedding of amino acids, which is limited in capturing spatial structural information. In this study, we have developed a new structure-aware method, GraphSol, to predict protein solubility with an attentive graph convolutional network (GCN), where the protein topology attribute graph was constructed from contact maps predicted from the sequence alone. GraphSol was shown to substantially outperform other sequence-based methods. The model was shown to be stable, with a consistent R² of 0.48 in both the cross-validation and independent test of the eSOL dataset. To the best of our knowledge, this is the first study to utilize the GCN for sequence-based protein solubility predictions. More importantly, this architecture could be easily extended to other protein prediction tasks requiring a raw protein sequence.
Affiliation(s)
- Jianwen Chen
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China
| | - Shuangjia Zheng
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China
| | - Huiying Zhao
- Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China.
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-Sen University), Guangzhou, 510000, China.
| |
|
19
|
Chen Y, Lu H, Zhang N, Zhu Z, Wang S, Li M. PremPS: Predicting the impact of missense mutations on protein stability. PLoS Comput Biol 2020; 16:e1008543. [PMID: 33378330 PMCID: PMC7802934 DOI: 10.1371/journal.pcbi.1008543] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 07/23/2020] [Revised: 01/12/2021] [Accepted: 11/16/2020] [Indexed: 12/12/2022] Open
Abstract
Computational methods that predict protein stability changes induced by missense mutations have made considerable progress over the past decades. Most of the available methods, however, have very limited accuracy in predicting stabilizing mutations because existing experimental sets are dominated by mutations reducing protein stability. Moreover, few approaches perform consistently well across different test cases. To address these issues, we developed a new computational method, PremPS, to more accurately evaluate the effects of missense mutations on protein stability. The PremPS method is composed of only ten evolutionary- and structure-based features and parameterized on a balanced dataset with an equal number of stabilizing and destabilizing mutations. A comprehensive comparison of the predictive performance of PremPS with other available methods on nine benchmark datasets confirms that our approach consistently outperforms other methods and shows considerable improvement in estimating the impacts of stabilizing mutations. A protein could have multiple structures available, and if another structure of the same protein is used, the predicted change in stability for structure-based methods might be different. Thus, we further estimated the impact of using different structures on prediction accuracy, and demonstrate that our method performs well across different types of structures except for low-resolution structures and models built based on templates with low sequence identity. PremPS can be used for finding functionally important variants, revealing the molecular mechanisms of functional influences and protein design. PremPS is freely available at https://lilab.jysw.suda.edu.cn/research/PremPS/ and supports large-scale mutational scanning; it takes about four minutes to perform calculations for a single mutation in a protein with ~300 residues and requires ~0.4 seconds for each additional mutation.
The development of computational methods to accurately predict the impacts of amino acid substitutions on protein stability is of paramount importance for the field of protein design and for understanding the roles of missense mutations in disease. However, most of the available methods have very limited predictive accuracy for mutations increasing stability, and few perform consistently well across different test cases. Here we present a new computational approach, PremPS, which is capable of predicting the effects of single point mutations on protein stability. PremPS employs only ten evolutionary- and structure-based features and is trained on a symmetrical dataset consisting of the same number of cases of stabilizing and destabilizing mutations. Our method was tested against numerous blind datasets and shows a considerable improvement, especially in evaluating the effects of stabilizing mutations, outperforming previously developed methods. PremPS is freely available as a user-friendly web server at http://lilab.jysw.suda.edu.cn/research/PremPS/, which is fast enough to handle a large number of cases.
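The timing figures quoted in the abstract can be turned into a rough back-of-the-envelope estimate for a mutational scan. The sketch below assumes a simple linear cost model (a fixed cost for the first mutation plus a constant cost per additional mutation on the same protein); the function name and defaults are hypothetical, and the numbers apply only to a protein of roughly 300 residues as stated in the abstract.

```python
def estimated_runtime_seconds(n_mutations, first_s=240.0, additional_s=0.4):
    """Rough runtime estimate for scanning n_mutations on one protein.

    first_s: ~4 minutes for the first mutation (figure from the abstract).
    additional_s: ~0.4 s per additional mutation (figure from the abstract).
    The linear form is an assumption, not a published formula.
    """
    if n_mutations < 1:
        raise ValueError("n_mutations must be at least 1")
    return first_s + additional_s * (n_mutations - 1)
```

Under these assumptions, scanning all 19 substitutions at each of 300 positions (5,700 mutations) would take roughly 42 minutes, most of it spent on the per-mutation increments rather than the initial setup.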
Affiliation(s)
- Yuting Chen
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Haoyu Lu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Ning Zhang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Zefeng Zhu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Shuqin Wang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Minghui Li
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| |
|
20
|
Ebo JS, Guthertz N, Radford SE, Brockwell DJ. Using protein engineering to understand and modulate aggregation. Curr Opin Struct Biol 2020; 60:157-166. [PMID: 32087409 PMCID: PMC7132541 DOI: 10.1016/j.sbi.2020.01.005] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/01/2019] [Revised: 01/08/2020] [Accepted: 01/09/2020] [Indexed: 02/07/2023]
Abstract
Protein aggregation occurs through a variety of mechanisms, initiated by the unfolded, non-native, or even the native state itself. Understanding the molecular mechanisms of protein aggregation is challenging, given the array of competing interactions that control solubility, stability, cooperativity and aggregation propensity. An array of methods has been developed to interrogate protein aggregation, ranging from computational algorithms able to identify aggregation-prone regions to deep mutational scanning that defines the entire mutational landscape of a protein's sequence. Here, we review recent advances in this exciting and emerging field, focussing on protein engineering approaches that, together with improved computational methods, hold promise to predict and control protein aggregation linked to human disease, as well as to facilitate the manufacture of protein-based therapeutics.
Affiliation(s)
- Jessica S Ebo
- Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK; School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
| | - Nicolas Guthertz
- Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK; School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
| | - Sheena E Radford
- Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK; School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
| | - David J Brockwell
- Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK; School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK.
| |
|
21
|
Affiliation(s)
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
| | - Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Centre for Clinical Research, St. Ann’s Hospital, 602 00 Brno, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Centre for Clinical Research, St. Ann’s Hospital, 602 00 Brno, Czech Republic
| |
|