1
Huang F, Gao Q, Zhou X, Guo W, Feng K, Zhu L, Huang T, Cai YD. Prediction of Solubility of Proteins in Escherichia coli Based on Functional and Structural Features Using Machine Learning Methods. Protein J 2024; 43:983-996. [PMID: 39243320 DOI: 10.1007/s10930-024-10230-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/21/2024] [Indexed: 09/09/2024]
Abstract
Protein solubility is a critical parameter that determines the stability, activity, and functionality of proteins, with broad implications in biotechnology and biochemistry. Accurate prediction and control of protein solubility are essential for successful protein expression and purification in research and industrial settings. This study gathered information on soluble and insoluble proteins. The proteins were mapped to STRING and characterized by functional and structural features. All functional/structural features were integrated to create a 5768-dimensional binary vector to encode each protein. Seven feature-ranking algorithms were employed to analyze the functional/structural features, yielding seven feature lists. Each list was then subjected to incremental feature selection with each of four classification algorithms in turn, to build effective classification models and identify the functional/structural features most important for classification. Some essential functional/structural features used to differentiate between soluble and insoluble proteins were identified, including GO:0009987 (intercellular communication) and GO:0022613 (ribonucleoprotein complex biogenesis). The best classification model, which used a support vector machine as the classification algorithm and 295 optimized functional/structural features, achieved an F1 score of 0.825 and can be a powerful tool to differentiate soluble proteins from insoluble proteins.
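The incremental feature selection step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the toy data, the `majority_vote_f1` scorer (a simple stand-in for training and cross-validating a real classifier such as a support vector machine), and all function names are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1), computed from TP/FP/FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def incremental_feature_selection(
    ranked_features: Sequence[int],
    evaluate: Callable[[Sequence[int]], float],
) -> Tuple[List[int], float]:
    """Score the top-k ranked features for k = 1..n and keep the best prefix.

    Ties are resolved in favour of the smaller feature set.
    """
    best_k, best_score = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        if score > best_score:
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), best_score


# Hypothetical toy data: 4 proteins x 3 binary features, 1 = soluble.
X = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
y = [1, 1, 0, 0]
ranked = [2, 0, 1]  # assumed output of some feature-ranking algorithm


def majority_vote_f1(features: Sequence[int]) -> float:
    """Stand-in scorer: predict 'soluble' when at least half of the
    selected binary features are set. A real study would train and
    cross-validate a classifier here instead."""
    preds = [
        1 if 2 * sum(row[f] for f in features) >= len(features) else 0
        for row in X
    ]
    return f1_score(y, preds)


selected, best_f1 = incremental_feature_selection(ranked, majority_vote_f1)
print(selected, best_f1)
```

In this toy case the score peaks at the first two ranked features and drops again when the third is added, which is the pattern incremental feature selection is meant to expose.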
Affiliation(s)
- Feiming Huang
- School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China
- Qian Gao
- Department of Pharmacy, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- XianChao Zhou
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Institutes for Biological Sciences (SIBS), Shanghai Jiao Tong University School of Medicine (SJTUSM), Chinese Academy of Sciences (CAS), Shanghai, 200030, China
- KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, 510507, China
- Lin Zhu
- School of Information Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Tao Huang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Bio-Med Big Data Center, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China.
2
Watthanasakphuban N, Ninchan B, Pinmanee P, Rattanaporn K, Keawsompong S. In Silico Analysis and Development of the Secretory Expression of D-Psicose-3-Epimerase in Escherichia coli. Microorganisms 2024; 12:1574. [PMID: 39203416 PMCID: PMC11356227 DOI: 10.3390/microorganisms12081574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 07/27/2024] [Accepted: 07/28/2024] [Indexed: 09/03/2024] Open
Abstract
D-psicose-3-epimerase (DPEase), a key enzyme for D-psicose production, has been successfully expressed in Escherichia coli with high yield. However, intracellular expression results in high downstream processing costs and greater risk of lipopolysaccharide (LPS) contamination during cell disruption. The secretory expression of DPEase could minimize the number of purification steps and prevent LPS contamination, but achieving secretory expression of DPEase in E. coli is challenging and has not been reported due to certain limitations. This study addresses these challenges by enhancing the secretion of DPEase in E. coli through computational predictions and structural analyses. Signal peptide prediction identified PelB as the most effective signal peptide for DPEase localization and enhanced solubility. Supplementary strategies included the addition of 0.1% (v/v) Triton X-100 to promote protein secretion, resulting in higher extracellular DPEase (0.5 unit/mL). Low-temperature expression (20 °C) mitigated the formation of inclusion bodies, thus enhancing DPEase solubility. Our findings highlight the pivotal role of signal peptide selection in modulating DPEase solubility and activity, offering valuable insights for protein expression and secretion studies, especially for rare sugar production. Ongoing exploration of alternative signal peptides and refinement of secretion strategies promise further enhancement in enzyme secretion efficiency and process safety, paving the way for broader applications in biotechnology.
Affiliation(s)
- Nisit Watthanasakphuban
- Department of Biotechnology, Faculty of Agro-Industry, Kasetsart University, Chatuchak, Bangkok 10900, Thailand
- Boontiwa Ninchan
- Department of Biotechnology, Faculty of Agro-Industry, Kasetsart University, Chatuchak, Bangkok 10900, Thailand
- Phitsanu Pinmanee
- Enzyme Technology Research Team, National Center of Genetic Engineering and Biotechnology (BIOTEC), Pathum Thani 12120, Thailand
- Kittipong Rattanaporn
- Department of Biotechnology, Faculty of Agro-Industry, Kasetsart University, Chatuchak, Bangkok 10900, Thailand
- Suttipun Keawsompong
- Department of Biotechnology, Faculty of Agro-Industry, Kasetsart University, Chatuchak, Bangkok 10900, Thailand
3
Wei H, Lunin VV, Alahuhta M, Himmel ME, Huang S, Bomble YJ, Zhang M. Streamlining heterologous expression of top carbonic anhydrases in Escherichia coli: bioinformatic and experimental approaches. Microb Cell Fact 2024; 23:190. [PMID: 38956607 PMCID: PMC11218372 DOI: 10.1186/s12934-024-02463-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 06/18/2024] [Indexed: 07/04/2024] Open
Abstract
BACKGROUND: Carbonic anhydrase (CA) enzymes facilitate the reversible hydration of CO2 to bicarbonate ions and protons. Identifying efficient and robust CAs and expressing them in model host cells, such as Escherichia coli, enables more efficient engineering of these enzymes for industrial CO2 capture. However, expression of CAs in E. coli is challenging due to the possible formation of insoluble protein aggregates, or inclusion bodies. This makes the production of soluble and active CA protein a prerequisite for downstream applications. RESULTS: In this study, we streamlined the process of CA expression by selecting seven top CA candidates and using two bioinformatic tools to predict their solubility for expression in E. coli. The prediction results place these enzymes in two categories: low and high solubility. Our expression of the CAs with high solubility scores (namely CA5-SspCA, CA6-SazCAtrunc, CA7-PabCA and CA8-PhoCA) led to significantly higher protein yields (5 to 75 mg purified protein per liter) in flask cultures, indicating a strong correlation between the solubility prediction score and protein expression yields. Furthermore, phylogenetic tree analysis demonstrated CA class-specific clustering patterns for protein solubility and production yields. Unexpectedly, we also found that the unique N-terminal 11-amino-acid segment found after the signal sequence (not present in its homologs) was essential for CA6-SazCA activity. CONCLUSIONS: Overall, this work demonstrated that protein solubility prediction, phylogenetic tree analysis, and experimental validation are potent tools for identifying top CA candidates and then producing soluble, active forms of these enzymes in E. coli. The comprehensive approaches we report here should be extendable to the expression of other heterologous proteins in E. coli.
Affiliation(s)
- Hui Wei
- Biosciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA.
- Vladimir V Lunin
- Biosciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA
- Markus Alahuhta
- Biosciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA
- Michael E Himmel
- Biosciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA
- National Bioenergy Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA
- Shu Huang
- Biosciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA
- Thayer School of Engineering, Dartmouth College, Hanover, New Hampshire, USA
- Yannick J Bomble
- Biosciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA
- Min Zhang
- Biosciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA.
4
Li B, Ming D. GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling. BMC Bioinformatics 2024; 25:204. [PMID: 38824535 DOI: 10.1186/s12859-024-05820-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 05/29/2024] [Indexed: 06/03/2024] Open
Abstract
BACKGROUND: Protein solubility is a critically important physicochemical property closely related to protein expression. For example, it is one of the main factors to be considered in the design and production of antibody drugs and a prerequisite for realizing various protein functions. Although several solubility prediction models have emerged in recent years, many of these models are limited to capturing information embedded in one-dimensional amino acid sequences, resulting in unsatisfactory predictive performance. RESULTS: In this study, we introduce a novel Graph Attention network-based protein Solubility model, GATSol, which represents the 3D structure of proteins as a protein graph. In addition to the node features of amino acids extracted by the state-of-the-art protein large language model, GATSol utilizes amino acid distance maps generated using the latest AlphaFold technology. Rigorous testing on the independent eSOL and Saccharomyces cerevisiae test datasets has shown that GATSol outperforms most recently introduced models, especially with respect to the coefficient of determination R², which reaches 0.517 and 0.424, respectively. It outperforms the current state-of-the-art GraphSol by 18.4% on the S. cerevisiae test set. CONCLUSIONS: GATSol captures 3D features of proteins by building protein graphs, which significantly improves the accuracy of protein solubility prediction. Recent advances in protein structure modeling allow our method to incorporate spatial structure features extracted from predicted structures into the model by relying only on the input of protein sequences, which simplifies the entire graph neural network prediction process, making it more user-friendly and efficient. As a result, GATSol may help prioritize highly soluble proteins, ultimately reducing the cost and effort of experimental work. The source code and data of the GATSol model are freely available at https://github.com/binbinbinv/GATSol .
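As a rough illustration of the two ideas this abstract relies on, the sketch below turns residue coordinates into a distance map and a contact graph, and computes the coefficient of determination R². It is not taken from the GATSol code: the 10 Å cutoff, the made-up coordinates, and every function name are assumptions chosen only for the example.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_map(coords: Sequence[Coord]) -> List[List[float]]:
    """Pairwise Euclidean distances between residue positions."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_edges(dmap: Sequence[Sequence[float]], cutoff: float = 10.0) -> List[Tuple[int, int]]:
    """Undirected edges (i < j) between residues closer than `cutoff`.

    The 10.0 default is an assumed threshold, not a value from the paper.
    """
    n = len(dmap)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if dmap[i][j] <= cutoff]


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical coordinates for four residues laid out on a line (in angstroms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
edges = contact_edges(distance_map(coords))
print(edges)  # the distant fourth residue has no contacts

# Made-up solubility values, used only to exercise r_squared.
observed = [0.2, 0.4, 0.6, 0.8]
print(r_squared(observed, observed))
```

A graph neural network would then attach per-residue feature vectors to the nodes of such a graph; that part is omitted here because it depends on the specific language model and framework used.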
Affiliation(s)
- Bin Li
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing, 211816, Jiangsu, People's Republic of China
- Dengming Ming
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing, 211816, Jiangsu, People's Republic of China.
5
Gonçalves AAM, Ribeiro AJ, Resende CAA, Couto CAP, Gandra IB, Dos Santos Barcelos IC, da Silva JO, Machado JM, Silva KA, Silva LS, Dos Santos M, da Silva Lopes L, de Faria MT, Pereira SP, Xavier SR, Aragão MM, Candida-Puma MA, de Oliveira ICM, Souza AA, Nogueira LM, da Paz MC, Coelho EAF, Giunchetti RC, de Freitas SM, Chávez-Fumagalli MA, Nagem RAP, Galdino AS. Recombinant multiepitope proteins expressed in Escherichia coli cells and their potential for immunodiagnosis. Microb Cell Fact 2024; 23:145. [PMID: 38778337 PMCID: PMC11110257 DOI: 10.1186/s12934-024-02418-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024] Open
Abstract
Recombinant multiepitope proteins (RMPs) are a promising alternative for application in diagnostic tests and, given their wide application in the most diverse diseases, this review article aims to survey the use of these antigens for diagnosis, as well as discuss the main points surrounding these antigens. RMPs, usually consisting of linear, immunodominant, and phylogenetically conserved epitopes, have been applied in the experimental diagnosis of various human and animal diseases, such as leishmaniasis, brucellosis, cysticercosis, Chagas disease, hepatitis, leptospirosis, leprosy, filariasis, schistosomiasis, dengue, and COVID-19. The synthetic genes for these epitopes are joined to code a single RMP, either with spacers or fused, with different biochemical properties. The epitopes' high density within the RMPs contributes to a high degree of sensitivity and specificity. The RMPs can also sidestep the need for multiple peptide synthesis or multiple recombinant proteins, reducing costs and enhancing the standardization conditions for immunoassays. Methods such as bioinformatics and circular dichroism have been widely applied in the development of new RMPs, helping to guide their construction and better understand their structure. Several RMPs have been expressed, mainly using the Escherichia coli expression system, highlighting the importance of these cells in the biotechnological field. In fact, technological advances in this area, offering a wide range of different strains to be used, make these cells the most widely used expression platform. RMPs have been experimentally used to diagnose a broad range of illnesses in the laboratory, suggesting they could also be useful for accurate diagnoses commercially. On this point, the RMP method offers a tempting substitute for the production of promising antigens used to assemble commercial diagnostic kits.
Affiliation(s)
- Ana Alice Maia Gonçalves
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Anna Julia Ribeiro
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Carlos Ananias Aparecido Resende
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Carolina Alves Petit Couto
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Isadora Braga Gandra
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Isabelle Caroline Dos Santos Barcelos
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Jonatas Oliveira da Silva
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Juliana Martins Machado
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Kamila Alves Silva
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Líria Souza Silva
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Michelli Dos Santos
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Lucas da Silva Lopes
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Mariana Teixeira de Faria
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Sabrina Paula Pereira
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Sandra Rodrigues Xavier
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Matheus Motta Aragão
- Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Mayron Antonio Candida-Puma
- Computational Biology and Chemistry Research Group, Vicerrectorado de Investigación, Universidad Católica de Santa María, Arequipa, 04000, Peru
- Amanda Araujo Souza
- Biophysics Laboratory, Institute of Biological Sciences, Department of Cell Biology, University of Brasilia, Brasília, 70910-900, Brazil
- Lais Moreira Nogueira
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Mariana Campos da Paz
- Bioactives and Nanobiotechnology Laboratory, Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Eduardo Antônio Ferraz Coelho
- Postgraduate Program in Health Sciences, Infectious Diseases and Tropical Medicine, Faculty of Medicine, Federal University of Minas Gerais, Belo Horizonte, 30130-100, Brazil
- Rodolfo Cordeiro Giunchetti
- Laboratory of Biology of Cell Interactions, National Institute of Science and Technology on Tropical Diseases (INCT-DT), Department of Morphology, Federal University of Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Sonia Maria de Freitas
- Biophysics Laboratory, Institute of Biological Sciences, Department of Cell Biology, University of Brasilia, Brasília, 70910-900, Brazil
- Miguel Angel Chávez-Fumagalli
- Computational Biology and Chemistry Research Group, Vicerrectorado de Investigación, Universidad Católica de Santa María, Arequipa, 04000, Peru
- Ronaldo Alves Pinto Nagem
- Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Alexsandro Sobreira Galdino
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil.
6
Taheri-Anganeh M, Nezafat N, Gharibi S, Khatami SH, Vahedi F, Shabaninejad Z, Asadi M, Savardashtaki A, Movahedpour A, Ghasemi H. Designing a Secretory form of RTX-A as an Anticancer Toxin: An In Silico Approach. Recent Pat Biotechnol 2024; 18:332-343. [PMID: 38817010 DOI: 10.2174/0118722083267796231210060150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/29/2023] [Accepted: 11/17/2023] [Indexed: 06/01/2024]
Abstract
BACKGROUND: Cancer is a leading cause of death and a significant public health issue worldwide. Standard treatment methods such as chemotherapy, radiotherapy, and surgery are only sometimes effective. Therefore, new therapeutic approaches are needed for cancer treatment. Sea anemone actinoporins are pore-forming toxins (PFTs) with membranolytic activities. RTX-A is a type of PFT that interacts with membrane phospholipids, resulting in pore formation. The synthesis of recombinant proteins in a secretory form has several advantages, including protein solubility and easy purification. In this study, we aimed to discover suitable signal peptides for producing RTX-A in Bacillus subtilis in a secretory form. METHODS: Signal peptides were selected from the Signal Peptide Web Server. The probability and secretion pathways of the selected signal peptides were evaluated using the SignalP server. ProtParam and Protein-sol were used to predict the physico-chemical properties and solubility. AlgPred was used to predict the allergenicity of RTX-A linked to suitable signal peptides. Non-allergenic, stable, and soluble signal peptides fused to proteins were chosen, and their secondary and tertiary structures were predicted using GOR IV and I-TASSER, respectively. The 3D structures were validated using the PROCHECK server. RESULTS: According to bioinformatics analysis, the fusion forms of OSMY_ECOLI and MALE_ECOLI linked to RTX-A were identified as suitable signal peptides. The final proteins with signal peptides were stable, soluble, and non-allergenic for the human body. Moreover, they had appropriate secondary and tertiary structures. CONCLUSION: The above signal peptides appear ideal for producing secretory and soluble RTX-A. Therefore, the signal peptides found in this study should be further investigated through experimental research and patents.
Affiliation(s)
- Mortaza Taheri-Anganeh
- Cellular and Molecular Research Center, Cellular and Molecular Medicine Research Institute, Urmia University of Medical Sciences, Urmia, Iran
- Navid Nezafat
- Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
- Saba Gharibi
- School of Exercise and Nutrition Sciences, Faculty of Health, Deakin University, Melbourne, Australia
- Seyyed Hossein Khatami
- Department of Clinical Biochemistry, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farzaneh Vahedi
- Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Zahra Shabaninejad
- Department of Nanobiotechnology, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Marzieh Asadi
- Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Amir Savardashtaki
- Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
7
Chen Z, Wang X, Chen X, Huang J, Wang C, Wang J, Wang Z. Accelerating therapeutic protein design with computational approaches toward the clinical stage. Comput Struct Biotechnol J 2023; 21:2909-2926. [PMID: 38213894 PMCID: PMC10781723 DOI: 10.1016/j.csbj.2023.04.027] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/15/2023] [Revised: 04/11/2023] [Accepted: 04/27/2023] [Indexed: 01/13/2024] Open
Abstract
Therapeutic proteins, represented by antibodies, are of increasing interest in human medicine. However, clinical translation of therapeutic proteins is still largely hindered by different aspects of developability, including affinity and selectivity, stability and aggregation prevention, solubility and viscosity reduction, and deimmunization. Conventional optimization of developability with widely used methods, like display technologies and library screening approaches, is a time- and cost-intensive endeavor, and the efficiency in finding suitable solutions is still not enough to meet clinical needs. In recent years, the accelerated advancement of computational methodologies has ushered in a transformative era in the field of therapeutic protein design. Owing to their remarkable capabilities in feature extraction and modeling, the integration of cutting-edge computational strategies with conventional techniques presents a promising avenue to accelerate the progression of therapeutic protein design and optimization toward clinical implementation. Here, we compared the differences between therapeutic proteins and small molecules in developability and provided an overview of the computational approaches applicable to the design or optimization of therapeutic proteins in several developability issues.
Affiliation(s)
- Zhidong Chen
- Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xinpei Wang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xu Chen
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Juyang Huang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Chenglin Wang
- Shenzhen Qiyu Biotechnology Co., Ltd, Shenzhen 518107, China
- Junqing Wang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Zhe Wang
- Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
8
Hashemzaei M, Nezafat N, Ghoshoon MB, Negahdaripour M. In-silico selection of appropriate signal peptides for romiplostim secretory production in Escherichia coli. Informatics in Medicine Unlocked 2022. [DOI: 10.1016/j.imu.2022.101146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022] Open
9
Tamehri M, Rasooli I, Pishgahi M, Jahangiri A, Ramezanalizadeh F, Banisaeed Langroodi SR. Combination of BauA and OmpA elicit immunoprotection against Acinetobacter baumannii in a murine sepsis model. Microb Pathog 2022; 173:105874. [DOI: 10.1016/j.micpath.2022.105874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 09/18/2022] [Accepted: 11/05/2022] [Indexed: 11/09/2022]
10
Rahmatabadi SS, Mobini K, Askari S, Najafian J, Karami K, Soleymani B, Mostafaie A. In silico characterization of fructosyl peptide oxidase properties from Eupenicillium terrenum. J Mol Recognit 2022; 35:e2980. [PMID: 35657361 DOI: 10.1002/jmr.2980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 05/23/2022] [Accepted: 06/01/2022] [Indexed: 12/24/2022]
Abstract
The fructosyl peptide oxidase (FPOX) enzyme from Eupenicillium terrenum has high potential as a diagnostic enzyme. The aim of the present study is the characterization of FPOX from E. terrenum using different bioinformatics tools. Computational predictions of the RNA and protein secondary structures of FPOX, its solubility profile in Escherichia coli, stability, domains, and functional properties were performed. In the FPOX protein, six motifs were detected. The D-amino acid oxidase motif, a FAD-dependent oxidoreductase motif, was found to be the most important. Cysteines 97, 154, 234, 280, and 360 scored below -10, indicating a low likelihood of participating in disulfide (SS) bond formation. Nonpolar residues make up 56.52% of FPOX amino acids. Random coils are dominant in the FPOX sequence, followed by alpha-helix and extended strand. The fpox gene is capable of generating a stable RNA secondary structure (-423.90 kcal/mol) in E. coli. FPOX has a large number of hydrophobic amino acids. FPOX showed low solubility in E. coli and has several aggregation-prone sites in its 3-D structure. According to the scores, the best mutation candidate for increasing solubility was the conversion of methionine 302 to arginine. The melting temperature of FPOX predicted from its amino acid sequence was 55°C to 65°C. The thermodynamic parameters for the FPOX enzyme were -137.4 kcal/mol, -3.59 kcal/(mol K), and -6.8 kcal/mol for standard folding enthalpy, heat capacity, and folding free energy, respectively. In conclusion, the in silico study of proteins can provide a valuable method for better understanding protein properties and functions for applied purposes.
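Two of the sequence-level quantities mentioned in this abstract, the share of nonpolar residues and a single point mutation such as methionine-to-arginine, are simple to compute from a sequence string. The sketch below is only illustrative: the toy sequence is not FPOX, the set of residues counted as nonpolar is one common convention rather than the one the authors necessarily used, and the function names are hypothetical.

```python
# One common convention for nonpolar (hydrophobic) residues; tools differ
# on borderline cases such as G, P, and C, so treat this set as an assumption.
NONPOLAR = set("AVLIMFWPG")


def nonpolar_percent(sequence: str) -> float:
    """Percentage of residues in `sequence` that fall in the NONPOLAR set."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    count = sum(1 for residue in sequence.upper() if residue in NONPOLAR)
    return 100.0 * count / len(sequence)


def point_mutation(sequence: str, position: int, expected: str, replacement: str) -> str:
    """Replace the residue at a 1-based `position`, checking the original first.

    For example, a methionine-302-to-arginine change would be
    point_mutation(seq, 302, "M", "R").
    """
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


# Hypothetical 8-residue peptide used only to exercise the functions.
toy = "MAVLKRDE"
print(nonpolar_percent(toy))
mutant = point_mutation(toy, 1, "M", "R")
print(mutant, nonpolar_percent(mutant))
```

Replacing a nonpolar residue with a charged one lowers the nonpolar share, which is the intuition behind proposing such substitutions to improve solubility; actual solubility predictors use considerably richer models.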
Affiliation(s)
- Keivan Mobini
- Department of Hematology, Faculty of Allied Medical Science, Bushehr University of Medical Sciences, Bushehr, Iran
- Soudabeh Askari
- Department of Biotechnology, Applied Razi Biotechnology, Kermanshah, Iran
- Javad Najafian
- Department of Biology, Faculty of Basic Science, University of Mazandaran, Baboulsar, Iran
- Keyvan Karami
- Medical Biology Research Center, Health Technology Institute, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Bijan Soleymani
- Medical Biology Research Center, Health Technology Institute, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Ali Mostafaie
- Medical Biology Research Center, Health Technology Institute, Kermanshah University of Medical Sciences, Kermanshah, Iran
11
Yi W, Sun A, Liu M, Liu X, Zhang W, Dai Q. Comparative Study on Feature Selection in Protein Structure and Function Prediction. Computational and Mathematical Methods in Medicine 2022; 2022:1650693. [PMID: 36267316 PMCID: PMC9578875 DOI: 10.1155/2022/1650693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 09/14/2022] [Indexed: 11/18/2022]
Abstract
Many effective methods extract and fuse different protein features to study the relationship between protein sequence, structure, and function, but different methods favor different problems in protein structure and function research, so valuable and contributing features must be selected to design more effective prediction methods. This work focused on feature selection methods in the study of protein structure and function, and systematically compared and analyzed the efficiency of different feature selection methods in the prediction of protein structures, protein disorders, protein molecular chaperones, and protein solubility. The results show that the feature selection method based on nonlinear SVM performs best in protein structure prediction, protein disorder prediction, protein molecular chaperone prediction, and protein solubility prediction. After selection, the accuracy of features is improved by 13.16% to 71%, especially for the Kmer features and PSSM features of proteins.
Affiliation(s)
- Wenjing Yi
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Ao Sun
- College of Informatics Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Manman Liu
- College of Informatics Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Xiaoqing Liu
- College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Wei Zhang
- College of Informatics Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
|
12
|
Novel multi epitope-based vaccine against monkeypox virus: vaccinomic approach. Sci Rep 2022; 12:15983. [PMID: 36156077 PMCID: PMC9510130 DOI: 10.1038/s41598-022-20397-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2022] [Accepted: 09/13/2022] [Indexed: 11/30/2022] Open
Abstract
While mankind is still dealing with the COVID-19 pandemic, a case of monkeypox virus (MPXV) was reported to the WHO on May 7, 2022. Monkeypox is a viral zoonotic disease that has been a public health threat, particularly in Africa. However, it has recently expanded to other parts of the world, so it may soon become a global issue. Thus, the current work designed a multi-epitope vaccine against MPXV, utilizing the cell surface-binding protein as a target, in order to develop a novel and safe vaccine that can evoke the desirable immunological response. The proposed MHC-I, MHC-II, and B-cell epitopes were selected to design multi-epitope vaccine constructs, joined with suitable linkers and combined with different adjuvants to enhance the immune responses to the vaccine constructs. The proposed vaccine was composed of 275 amino acids and was shown to be antigenic in the Vaxijen server (0.5311) and non-allergenic in the AllerTop server. The 3D structure of the designed vaccine was predicted, refined, and validated by various in silico tools to assess the stability of the vaccine. Moreover, the solubility of the vaccine construct was found to be greater than the average solubility provided by the protein-Sol server, indicating that the vaccine construct is soluble. Additionally, the most promising epitopes bound to MHC I and MHC II alleles were found to have good binding affinities, with low energies ranging between −7.0 and −8.6 kcal/mol. According to the immunological simulation research, the vaccine was found to elicit a particular immune reaction against the monkeypox virus. Finally, the molecular dynamic study shows that the designed vaccine is stable with minimum RMSF against the MHC I allele. We conclude from our research that the cell surface-binding protein is one of the primary proteins involved in MPXV pathogenesis. As a result, our study will aid in the development of appropriate therapeutics and prompt the development of future vaccines against MPXV.
|
13
|
Abuei H, Pirouzfar M, Mojiri A, Behzad-Behbahani A, Kalantari T, Bemani P, Farhadi A. Maximizing the recovery of the native p28 bacterial peptide with improved activity and maintained solubility and stability in Escherichia coli BL21 (DE3). METHODS IN MICROBIOLOGY 2022; 200:106560. [PMID: 36031157 DOI: 10.1016/j.mimet.2022.106560] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Revised: 08/10/2022] [Accepted: 08/20/2022] [Indexed: 02/06/2023]
Abstract
p28 is a natural bacterial product, which recently has attracted much attention as an efficient cell penetrating peptide (CPP) and a promising anticancer agent. Considering the interesting biological qualities of p28, maximizing its expression appears to be a prominent priority. The optimization of such bioprocesses might be facilitated by utilizing statistical approaches such as Design of Experiment (DoE). In this study, we aimed to maximize the expression of "biologically active" p28 in Escherichia coli BL21 (DE3) host by harnessing statistical tools and experimental methods. Using Minitab, Plackett-Burman and Box-Behnken Response Surface Methodology (RSM) designs were generated to optimize the conditions for the expression of p28. Each condition was experimentally investigated by assessing the biological activity of the purified p28 in the MCF-7 breast cancer cell line. Seven independent variables were investigated, and three of them including ethanol concentration, OD600 of the culture at the time of induction, and the post-induction temperature were demonstrated to significantly affect the p28 expression in E. coli. The cytotoxicity, penetration efficiency, and total process time were measured as dependent variables. The optimized expression conditions were validated experimentally, and the final products were investigated in terms of expression yield, solubility, and stability in vitro. Following the optimization, an 8-fold increase of the concentration of p28 expression was observed. In this study, we suggest an optimized combination of effective factors to produce soluble p28 in the E. coli host, a protocol that results in the production of a significantly high amount of the biologically active peptide with retained solubility and stability.
Affiliation(s)
- Haniyeh Abuei
- Division of Medical Biotechnology, Department of Medical Laboratory Sciences, School of Paramedical Sciences, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Mohammad Pirouzfar
- Human and Animal Cell Bank, Iranian Biological Resource Center (IBRC), ACECR, Tehran, Iran
| | - Anahita Mojiri
- Center for Cardiovascular Regeneration, Department of Cardiovascular Sciences, Houston Methodist Research Institute, Houston 77030, TX, USA
| | - Abbas Behzad-Behbahani
- Diagnostic Laboratory Sciences and Technology Research Center, School of Paramedical Sciences, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Tahereh Kalantari
- Division of Medical Biotechnology, Department of Medical Laboratory Sciences, School of Paramedical Sciences, Shiraz University of Medical Sciences, Shiraz, Iran; Diagnostic Laboratory Sciences and Technology Research Center, School of Paramedical Sciences, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Peyman Bemani
- Department of Immunology, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Ali Farhadi
- Division of Medical Biotechnology, Department of Medical Laboratory Sciences, School of Paramedical Sciences, Shiraz University of Medical Sciences, Shiraz, Iran; Diagnostic Laboratory Sciences and Technology Research Center, School of Paramedical Sciences, Shiraz University of Medical Sciences, Shiraz, Iran.
|
14
|
Wang H, Kwong CF, Liu Q, Liu Z, Chen Z. A Novel Artificial Intelligence System in Formulation Dissolution Prediction. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:8640115. [PMID: 35978897 PMCID: PMC9377879 DOI: 10.1155/2022/8640115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/01/2022] [Revised: 06/20/2022] [Accepted: 06/24/2022] [Indexed: 11/29/2022]
Abstract
Artificial neural network (ANN) techniques are widely used to screen data and predict experimental results in pharmaceutical studies. In this study, a novel dissolution result prediction and screening system with a backpropagation network and regression methods was modeled. For this purpose, 21 groups of dissolution data were used to train and verify the ANN model. Based on the design of the input data, the related data were still available to train the ANN model when the formulation composition was changed. Two regression methods, the effective data regression method (EDRM) and the reference line regression method (RLRM), allow this system to predict dissolution results with a high accuracy rate while using a smaller database than the orthogonal experiment. Based on the decision tree, a data screening function is also realized in this system. This ANN model provides a novel drug prediction system that reduces time and cost and facilitates the design of new formulations.
Affiliation(s)
- Haoyu Wang
- Department of Electrical and Electronic Engineering, University of Nottingham Ningbo China, Ningbo, China
| | - Chiew Foong Kwong
- Department of Electrical and Electronic Engineering, University of Nottingham Ningbo China, Ningbo, China
| | - Qianyu Liu
- International Doctoral Innovation Centre, NingboTech University, Ningbo, China
| | - Zhixin Liu
- Department of Outpatient, Liaoning Thrombus Treatment Center of Integrated Chinese and Western Medicine, Shenyang, China
| | - Zhiyuan Chen
- Department of Mechanical, Materials and Manufacture, University of Nottingham Ningbo China, Ningbo, China
|
15
|
Karaiyan P, Chang CCH, Chan ES, Tey BT, Ramanan RN, Ooi CW. In silico screening and heterologous expression of soluble dimethyl sulfide monooxygenases of microbial origin in Escherichia coli. Appl Microbiol Biotechnol 2022; 106:4523-4537. [PMID: 35713659 PMCID: PMC9259527 DOI: 10.1007/s00253-022-12008-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2022] [Revised: 05/30/2022] [Accepted: 06/01/2022] [Indexed: 11/28/2022]
Abstract
Sequence-based screening has been widely applied in the discovery of novel microbial enzymes. However, the majority of the sequences in the genomic databases were annotated using computational approaches and lack experimental characterization. Hence, success in obtaining functional biocatalysts with improved characteristics requires an efficient screening method that considers a wide array of factors. Recombinant expression of microbial enzymes is often hampered by the undesirable formation of inclusion bodies. Here, we present a systematic in silico screening method to identify proteins that are expressible in soluble form and have the desired biological properties. The screening approach was adopted in the recombinant expression of dimethyl sulfide (DMS) monooxygenase in Escherichia coli. DMS monooxygenase, a two-component enzyme consisting of DmoA and DmoB subunits, was used as a model protein. The success rate of producing soluble and active DmoA is 71% (5 out of 7 genes). Interestingly, the soluble recombinant DmoA enzymes exhibited NADH:FMN oxidoreductase activity in the absence of DmoB (the second subunit) and the cofactor FMN, suggesting that DmoA is also an oxidoreductase. DmoA originating from Janthinobacterium sp. AD80 showed the maximum NADH oxidation activity (maximum reaction rate: 6.6 µM/min; specific activity: 133 µM/min/mg). This novel finding may allow DmoA to be used as an oxidoreductase biocatalyst for various industrial applications. The in silico gene screening methodology established in this study can increase the success rate of producing soluble and functional enzymes while avoiding the laborious trial and error involved in screening a large pool of available genes.
Key points
• A systematic gene screening method was demonstrated.
• DmoA is also an oxidoreductase capable of oxidizing NADH and reducing FMN.
• DmoA oxidizes NADH in the absence of external FMN.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00253-022-12008-8.
Affiliation(s)
- Prasanth Karaiyan
- Chemical Engineering Discipline, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
| | - Catherine Ching Han Chang
- Arkema Thiochemicals Sdn. Bhd., Jalan PJU 1A/7A OASIS Ara Damansara, 47301, Petaling Jaya, Selangor Darul Ehsan, Malaysia
| | - Eng-Seng Chan
- Chemical Engineering Discipline, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
| | - Beng Ti Tey
- Chemical Engineering Discipline, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia; Advanced Engineering Platform, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
| | - Ramakrishnan Nagasundara Ramanan
- Chemical Engineering Discipline, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia; Arkema Thiochemicals Sdn. Bhd., Jalan PJU 1A/7A OASIS Ara Damansara, 47301, Petaling Jaya, Selangor Darul Ehsan, Malaysia
| | - Chien Wei Ooi
- Chemical Engineering Discipline, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia; Advanced Engineering Platform, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
|
16
|
Santos-Junior MN, Neves WS, Santos RS, Almeida PP, Fernandes JM, Guimarães BCDB, Barbosa MS, da Silva LSC, Gomes CP, Sampaio BA, Rezende IDS, Correia TML, Neres NSDM, Campos GB, Bastos BL, Timenetsky J, Marques LM. Heterologous Expression, Purification, and Immunomodulatory Effects of Recombinant Lipoprotein GUDIV-103 Isolated from Ureaplasma diversum. Microorganisms 2022; 10:microorganisms10051032. [PMID: 35630474 PMCID: PMC9147684 DOI: 10.3390/microorganisms10051032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2022] [Revised: 05/05/2022] [Accepted: 05/07/2022] [Indexed: 02/06/2023] Open
Abstract
Ureaplasma diversum is a bacterial pathogen that infects cattle and can cause severe inflammation of the genital and reproductive systems. Lipid-associated membrane proteins (LAMPs), including GUDIV-103, are the main virulence factors in this bacterium. In this study, we heterologously expressed recombinant GUDIV-103 (rGUDIV-103) in Escherichia coli, purified it, and evaluated its immunological reactivity and immunomodulatory effects in bovine peripheral blood mononuclear cells (PBMCs). Samples from rabbits inoculated with purified rGUDIV-103 were analysed using indirect enzyme-linked immunosorbent assay and dot blotting to confirm polyclonal antibody production and assess kinetics, respectively. The expression of this lipoprotein in field isolates was confirmed via Western blotting with anti-rGUDIV-103 serum and hydrophobic or hydrophilic proteins from 42 U. diversum strains. Moreover, the antibodies produced against the U. diversum ATCC 49783 strain recognised rGUDIV-103. The mitogenic potential of rGUDIV-103 was evaluated using a lymphoproliferation assay in 5(6)-carboxyfluorescein diacetate succinimidyl ester−labelled bovine PBMCs, where it induced lymphocyte proliferation. Quantitative polymerase chain reaction analysis revealed that the expression of interleukin-1β, toll-like receptor (TLR)-α, TLR2, TLR4, inducible nitric oxide synthase, and caspase-3−encoding genes increased more in rGUDIV-103−treated PBMCs than in untreated cells (p < 0.05). Treating PBMCs with rGUDIV-103 increased nitric oxide and hydrogen peroxide levels. The antigenic and immunogenic properties of rGUDIV-103 suggested its suitability for immunobiological application.
Affiliation(s)
- Manoel Neres Santos-Junior
- Department of Biointeraction, Multidisciplinary Institute of Health, Federal University of Bahia, Vitória da Conquista 40170-110, Brazil; (M.N.S.-J.); (W.S.N.); (R.S.S.); (J.M.F.); (T.M.L.C.); (N.S.d.M.N.)
- Department of Biology, and Biotechnology of Microorganisms, State University of Santa Cruz (UESC), Ilhéus 45662-900, Brazil; (B.C.d.B.G.); (L.S.C.d.S.); (C.P.G.); (B.A.S.); (G.B.C.); (B.L.B.)
| | - Wanderson Souza Neves
- Department of Biointeraction, Multidisciplinary Institute of Health, Federal University of Bahia, Vitória da Conquista 40170-110, Brazil; (M.N.S.-J.); (W.S.N.); (R.S.S.); (J.M.F.); (T.M.L.C.); (N.S.d.M.N.)
| | - Ronaldo Silva Santos
- Department of Biointeraction, Multidisciplinary Institute of Health, Federal University of Bahia, Vitória da Conquista 40170-110, Brazil; (M.N.S.-J.); (W.S.N.); (R.S.S.); (J.M.F.); (T.M.L.C.); (N.S.d.M.N.)
| | - Palloma Porto Almeida
- Bioinformatics and Computational Biology Lab, Division of Experimental and Translational Research, Brazilian National Cancer Institute (INCA), Rio de Janeiro 20231-050, Brazil
| | - Janaina Marinho Fernandes
- Department of Biointeraction, Multidisciplinary Institute of Health, Federal University of Bahia, Vitória da Conquista 40170-110, Brazil; (M.N.S.-J.); (W.S.N.); (R.S.S.); (J.M.F.); (T.M.L.C.); (N.S.d.M.N.)
| | - Bruna Carolina de Brito Guimarães
- Department of Biology, and Biotechnology of Microorganisms, State University of Santa Cruz (UESC), Ilhéus 45662-900, Brazil; (B.C.d.B.G.); (L.S.C.d.S.); (C.P.G.); (B.A.S.); (G.B.C.); (B.L.B.)
| | - Maysa Santos Barbosa
- Department of Microbiology, Institute of Biomedical Science, University of São Paulo, São Paulo 05508-000, Brazil; (M.S.B.); (I.d.S.R.); (J.T.)
| | - Lucas Santana Coelho da Silva
- Department of Biology, and Biotechnology of Microorganisms, State University of Santa Cruz (UESC), Ilhéus 45662-900, Brazil; (B.C.d.B.G.); (L.S.C.d.S.); (C.P.G.); (B.A.S.); (G.B.C.); (B.L.B.)
| | - Camila Pacheco Gomes
- Department of Biology, and Biotechnology of Microorganisms, State University of Santa Cruz (UESC), Ilhéus 45662-900, Brazil; (B.C.d.B.G.); (L.S.C.d.S.); (C.P.G.); (B.A.S.); (G.B.C.); (B.L.B.)
| | - Beatriz Almeida Sampaio
- Department of Biology, and Biotechnology of Microorganisms, State University of Santa Cruz (UESC), Ilhéus 45662-900, Brazil; (B.C.d.B.G.); (L.S.C.d.S.); (C.P.G.); (B.A.S.); (G.B.C.); (B.L.B.)
| | - Izadora de Souza Rezende
- Department of Microbiology, Institute of Biomedical Science, University of São Paulo, São Paulo 05508-000, Brazil; (M.S.B.); (I.d.S.R.); (J.T.)
| | - Thiago Macedo Lopes Correia
- Department of Biointeraction, Multidisciplinary Institute of Health, Federal University of Bahia, Vitória da Conquista 40170-110, Brazil; (M.N.S.-J.); (W.S.N.); (R.S.S.); (J.M.F.); (T.M.L.C.); (N.S.d.M.N.)
| | - Nayara Silva de Macedo Neres
- Department of Biointeraction, Multidisciplinary Institute of Health, Federal University of Bahia, Vitória da Conquista 40170-110, Brazil; (M.N.S.-J.); (W.S.N.); (R.S.S.); (J.M.F.); (T.M.L.C.); (N.S.d.M.N.)
| | - Guilherme Barreto Campos
- Department of Biology, and Biotechnology of Microorganisms, State University of Santa Cruz (UESC), Ilhéus 45662-900, Brazil; (B.C.d.B.G.); (L.S.C.d.S.); (C.P.G.); (B.A.S.); (G.B.C.); (B.L.B.)
| | - Bruno Lopes Bastos
- Department of Biology, and Biotechnology of Microorganisms, State University of Santa Cruz (UESC), Ilhéus 45662-900, Brazil; (B.C.d.B.G.); (L.S.C.d.S.); (C.P.G.); (B.A.S.); (G.B.C.); (B.L.B.)
| | - Jorge Timenetsky
- Department of Microbiology, Institute of Biomedical Science, University of São Paulo, São Paulo 05508-000, Brazil; (M.S.B.); (I.d.S.R.); (J.T.)
| | - Lucas Miranda Marques
- Department of Biointeraction, Multidisciplinary Institute of Health, Federal University of Bahia, Vitória da Conquista 40170-110, Brazil; (M.N.S.-J.); (W.S.N.); (R.S.S.); (J.M.F.); (T.M.L.C.); (N.S.d.M.N.)
- Department of Biology, and Biotechnology of Microorganisms, State University of Santa Cruz (UESC), Ilhéus 45662-900, Brazil; (B.C.d.B.G.); (L.S.C.d.S.); (C.P.G.); (B.A.S.); (G.B.C.); (B.L.B.)
- Department of Microbiology, Institute of Biomedical Science, University of São Paulo, São Paulo 05508-000, Brazil; (M.S.B.); (I.d.S.R.); (J.T.)
|
17
|
Thumuluri V, Martiny HM, Almagro Armenteros JJ, Salomon J, Nielsen H, Johansen AR. NetSolP: predicting protein solubility in Escherichia coli using language models. Bioinformatics 2022; 38:941-946. [PMID: 35088833 DOI: 10.1093/bioinformatics/btab801] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2021] [Revised: 10/13/2021] [Accepted: 11/23/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Solubility and expression levels of proteins can be a limiting factor for large-scale studies and industrial production. By determining the solubility and expression directly from the protein sequence, the success rate of wet-lab experiments can be increased.
RESULTS: In this study, we focus on predicting the solubility and usability for purification of proteins expressed in Escherichia coli directly from the sequence. Our model NetSolP is based on deep learning protein language models called transformers and we show that it achieves state-of-the-art performance and improves extrapolation across datasets. As we find current methods are built on biased datasets, we curate existing datasets by using strict sequence-identity partitioning and ensure that there is minimal bias in the sequences.
AVAILABILITY AND IMPLEMENTATION: The predictor and data are available at https://services.healthtech.dtu.dk/service.php?NetSolP and the open-sourced code is available at https://github.com/tvinet/NetSolP-1.0.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Hannah-Marie Martiny
- Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Lyngby 2800, Denmark
| | - Jose J Almagro Armenteros
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
| | | | - Henrik Nielsen
- Department of Health Technology, Technical University of Denmark, Lyngby 2800, Denmark
| | - Alexander Rosenberg Johansen
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
|
18
|
Caetano BDL, Domingos MDO, da Silva MA, da Silva JCA, Polatto JM, Montoni F, Iwai LK, Pimenta DC, Vigerelli H, Vieira PCG, Ruiz RDC, Patané JS, Piazza RMF. In Silico Prediction and Design of Uropathogenic Escherichia coli Alpha-Hemolysin Generate a Soluble and Hemolytic Recombinant Toxin. Microorganisms 2022; 10:microorganisms10010172. [PMID: 35056621 PMCID: PMC8778037 DOI: 10.3390/microorganisms10010172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2021] [Revised: 01/01/2022] [Accepted: 01/08/2022] [Indexed: 01/27/2023] Open
Abstract
The secretion of α-hemolysin by uropathogenic Escherichia coli (UPEC) is commonly associated with the severity of urinary tract infections, which makes it a predictor of poor prognosis among patients. Accordingly, this toxin has become a target for diagnostic tests and therapeutic interventions. However, there are several obstacles associated with the process of α-hemolysin purification, which limit its utilization in scientific investigations. In order to overcome the problems associated with α-hemolysin expression, a 20.48 kDa soluble α-hemolysin recombinant denoted rHlyA was constructed after in silico prediction. This recombinant is composed of a 182 amino acid sequence localized in the aa542–723 region of the toxin molecule. The antigenic determinants of rHlyA were estimated by bioinformatics analysis, taking into consideration the tertiary form of the toxin, epitope analysis tools, and solubility inference. The results indicated that rHlyA has three antigenic domains localized in the aa555–565, aa600–610, and aa674–717 regions. Functional investigation of rHlyA demonstrated that it has hemolytic activity against sheep red cells, but no cytotoxic effect against epithelial bladder cells. In summary, the results obtained in this study indicate that rHlyA is a soluble recombinant protein that can be used as a tool in studies that aim to understand the mechanisms involved in the hemolytic and cytotoxic activities of α-hemolysin produced by UPEC. In addition, rHlyA can be applied to generate monoclonal and/or polyclonal antibodies that can be utilized in the development of diagnostic tests and therapeutic interventions.
Affiliation(s)
- Bruna De Lucca Caetano
- Laboratório de Bacteriologia, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (B.D.L.C.); (M.d.O.D.); (M.A.d.S.); (J.C.A.d.S.); (J.M.P.); (P.C.G.V.); (R.d.C.R.)
| | - Marta de Oliveira Domingos
- Laboratório de Bacteriologia, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (B.D.L.C.); (M.d.O.D.); (M.A.d.S.); (J.C.A.d.S.); (J.M.P.); (P.C.G.V.); (R.d.C.R.)
| | - Miriam Aparecida da Silva
- Laboratório de Bacteriologia, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (B.D.L.C.); (M.d.O.D.); (M.A.d.S.); (J.C.A.d.S.); (J.M.P.); (P.C.G.V.); (R.d.C.R.)
| | - Jessika Cristina Alves da Silva
- Laboratório de Bacteriologia, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (B.D.L.C.); (M.d.O.D.); (M.A.d.S.); (J.C.A.d.S.); (J.M.P.); (P.C.G.V.); (R.d.C.R.)
| | - Juliana Moutinho Polatto
- Laboratório de Bacteriologia, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (B.D.L.C.); (M.d.O.D.); (M.A.d.S.); (J.C.A.d.S.); (J.M.P.); (P.C.G.V.); (R.d.C.R.)
| | - Fabio Montoni
- Laboratório de Toxinologia Aplicada, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (F.M.); (L.K.I.)
| | - Leo Kei Iwai
- Laboratório de Toxinologia Aplicada, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (F.M.); (L.K.I.)
| | - Daniel Carvalho Pimenta
- Laboratório de Biofísica e Bioquímica, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (D.C.P.); (H.V.)
| | - Hugo Vigerelli
- Laboratório de Biofísica e Bioquímica, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (D.C.P.); (H.V.)
| | - Paulo Cesar Gomes Vieira
- Laboratório de Bacteriologia, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (B.D.L.C.); (M.d.O.D.); (M.A.d.S.); (J.C.A.d.S.); (J.M.P.); (P.C.G.V.); (R.d.C.R.)
| | - Rita de Cassia Ruiz
- Laboratório de Bacteriologia, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (B.D.L.C.); (M.d.O.D.); (M.A.d.S.); (J.C.A.d.S.); (J.M.P.); (P.C.G.V.); (R.d.C.R.)
| | - José Salvatore Patané
- Laboratório de Ciclo Celular, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil
- Correspondence: (J.S.P.); (R.M.F.P.)
| | - Roxane Maria Fontes Piazza
- Laboratório de Bacteriologia, Instituto Butantan, Av. Vital Brazil, São Paulo 1500-05503-900, SP, Brazil; (B.D.L.C.); (M.d.O.D.); (M.A.d.S.); (J.C.A.d.S.); (J.M.P.); (P.C.G.V.); (R.d.C.R.)
- Correspondence: (J.S.P.); (R.M.F.P.)
|
19
|
Mustafa MI, Shantier SW. Next generation multi epitope based peptide vaccine against Marburg Virus disease combined with molecular docking studies. INFORMATICS IN MEDICINE UNLOCKED 2022. [DOI: 10.1016/j.imu.2022.101087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open
|
20
|
Madani M, Lin K, Tarakanova A. DSResSol: A Sequence-Based Solubility Predictor Created with Dilated Squeeze Excitation Residual Networks. Int J Mol Sci 2021; 22:13555. [PMID: 34948354 PMCID: PMC8704505 DOI: 10.3390/ijms222413555] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2021] [Revised: 12/13/2021] [Accepted: 12/14/2021] [Indexed: 11/16/2022] Open
Abstract
Protein solubility is an important thermodynamic parameter that is critical for the characterization of a protein's function, and a key determinant for the production yield of a protein in both the research setting and within industrial (e.g., pharmaceutical) applications. Experimental approaches to predict protein solubility are costly, time-consuming, and frequently offer only low success rates. To reduce cost and expedite the development of therapeutic and industrially relevant proteins, a highly accurate computational tool for predicting protein solubility from protein sequence is sought. While a number of in silico prediction tools exist, they suffer from relatively low prediction accuracy, bias toward the soluble proteins, and limited applicability for various classes of proteins. In this study, we developed a novel deep learning sequence-based solubility predictor, DSResSol, that takes advantage of the integration of squeeze excitation residual networks with dilated convolutional neural networks and outperforms all existing protein solubility prediction models. This model captures the frequently occurring amino acid k-mers and their local and global interactions and highlights the importance of identifying long-range interaction information between amino acid k-mers to achieve improved accuracy, using only protein sequence as input. DSResSol outperforms all available sequence-based solubility predictors by at least 5% in terms of accuracy when evaluated by two different independent test sets. Compared to existing predictors, DSResSol not only reduces prediction bias for insoluble proteins but also predicts soluble proteins within the test sets with an accuracy that is at least 13% higher than existing models. We derive the key amino acids, dipeptides, and tripeptides contributing to protein solubility, identifying glutamic acid and serine as critical amino acids for protein solubility prediction. 
Overall, DSResSol can be used for the fast, reliable, and inexpensive prediction of a protein's solubility to guide experimental design.
Affiliation(s)
- Mohammad Madani
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Kaixiang Lin
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Anna Tarakanova
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Department of Biomedical Engineering, University of Connecticut, Storrs, CT 06269, USA
| |
Collapse
|
21
|
Mital S, Christie G, Dikicioglu D. Recombinant expression of insoluble enzymes in Escherichia coli: a systematic review of experimental design and its manufacturing implications. Microb Cell Fact 2021; 20:208. [PMID: 34717620 PMCID: PMC8557517 DOI: 10.1186/s12934-021-01698-w] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/19/2021] [Accepted: 10/22/2021] [Indexed: 02/06/2023]
Abstract
Recombinant enzyme expression in Escherichia coli is one of the most popular methods to produce bulk concentrations of protein product. However, this method is often limited by the inadvertent formation of inclusion bodies. Our analysis systematically reviews literature from 2010 to 2021 and details the methods and strategies researchers have utilized for expression of difficult to express (DtE), industrially relevant recombinant enzymes in E. coli expression strains. Our review identifies an absence of a coherent strategy with disparate practices being used to promote solubility. We discuss the potential to approach recombinant expression systematically, with the aid of modern bioinformatics, modelling, and ‘omics’ based systems-level analysis techniques to provide a structured, holistic approach. Our analysis also identifies potential gaps in the methods used to report metadata in publications and the impact on the reproducibility and growth of the research in this field.
Affiliation(s)
- Suraj Mital: Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, CB3 0AS, UK
- Graham Christie: Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, CB3 0AS, UK
- Duygu Dikicioglu: Department of Biochemical Engineering, University College London, London, WC1E 6BT, UK
|
22
|
Vakili O, Khatami SH, Maleksabet A, Movahedpour A, Fana SE, Sadegh R, Salmanzadeh AH, Razeghifam H, Nourdideh S, Tehrani SS, Taheri-Anganeh M. Finding Appropriate Signal Peptides for Secretory Production of Recombinant Glucarpidase: An In Silico Method. Recent Pat Biotechnol 2021; 15:302-315. [PMID: 34547999 DOI: 10.2174/1872208315666210921095420] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/18/2021] [Revised: 06/16/2021] [Accepted: 08/02/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Methotrexate (MTX) is a general chemotherapeutic agent used to treat a variety of malignancies; however, high doses can cause nephrotoxicity and a subsequent defect in MTX excretion. The recombinant form of glucarpidase is produced by engineered E. coli and is a confirmed choice for overcoming this problem. OBJECTIVE In the present study, in silico analyses were performed to select suitable signal peptides (SPs) for the secretion of recombinant glucarpidase in E. coli. METHODS The signal peptide website and UniProt database were used to collect the SPs and protein sequences. Next, SignalP-5.0 was used to predict the SPs and the positions of cleavage sites. Physicochemical properties and solubility were evaluated using the ProtParam and Protein-sol online software, and finally, ProtCompB was used to predict the final subcellular localization. RESULTS All SPs could form soluble fusion proteins. PPB and TIBA were found to translocate glucarpidase into the extracellular compartment. CONCLUSION This study showed that only two SPs are applicable for the extracellular translocation of glucarpidase. Although the findings were remarkable, with high degrees of accuracy and precision based on bioinformatics analyses, additional experimental assessments are required to confirm and validate them. Recent patents revealed several inventions related to the clinical aspects of vaccine peptides against human disorders.
Affiliation(s)
- Omid Vakili: Department of Clinical Biochemistry, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, Iran
- Seyyed Hossein Khatami: Department of Clinical Biochemistry, School of Medicine, Shahid Beheshti University of Medical Science, Tehran, Iran
- Amir Maleksabet: Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Mazandaran University of Medical Sciences, Sari, Iran
- Ahmad Movahedpour: Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Saeed Ebrahimi Fana: Department of Clinical Biochemistry, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Sadra Samavarchi Tehrani: Department of Clinical Biochemistry, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Mortaza Taheri-Anganeh: Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
|
23
|
Prediction of Protein Solubility Based on Sequence Feature Fusion and DDcCNN. Interdiscip Sci 2021; 13:703-716. [PMID: 34236625 DOI: 10.1007/s12539-021-00456-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2021] [Revised: 06/21/2021] [Accepted: 06/23/2021] [Indexed: 10/20/2022]
Abstract
BACKGROUND Prediction of protein solubility is an indispensable prerequisite for pharmaceutical research and production. The objective of this work is to design a new model for predicting protein solubility by using protein sequence feature fusion and deep dual-channel convolutional neural networks (DDcCNN) to improve on the performance of existing prediction models. METHODS The redundancy of the raw protein data is reduced by CD-HIT. Four subsequences are built from each protein sequence: one global and three local. The global subsequence is the entire protein sequence, and the local subsequences are obtained by moving a sliding window according to a set of rules. G-gap is used to extract the features of these four subsequences, and a mixed matrix is constructed as the input of one channel, which is composed of three convolutional layers. Additional features are extracted by the SCRATCH tool as input to the other channel, which consists of a single convolution, in order to find hidden relationships and improve the accuracy of the predictor. The outputs of the two parallel channels are concatenated as the input of the hidden layer, and the prediction of protein solubility is obtained in the output layer. The best protein solubility prediction model is obtained through comparative experiments on different frameworks. RESULTS The performance indicators of the DDcCNN model are as follows: accuracy of 77.82%, Matthews correlation coefficient of 0.57, sensitivity of 76.13% and specificity of 79.32%. The comparative experiments show that the overall performance of the DDcCNN model is better than that of existing models (GCNN, LCNN and PCNN). The related models and data are publicly deposited at http://www.ddccnn.wang.
CONCLUSION The satisfactory performance of the DDcCNN model shows that these features and flexible computational methodologies can reinforce existing prediction models. Better prediction of protein solubility could be applied in several ways, such as preselecting initial targets that are soluble or altering the solubility of target proteins, and can thus help to reduce production cost.
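The METHODS text above mentions G-gap feature extraction without defining it. One common reading is g-gap dipeptide composition: the normalized frequency of residue pairs separated by g intervening positions. The Python sketch below implements that reading only; the exact definition, gap values, and windowing used by DDcCNN are not given in the abstract, and the name `g_gap_features` is hypothetical.

```python
from itertools import product

# The 20 standard amino acids; pairs are ordered AA, AC, ..., YY (400 total).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_features(sequence: str, g: int) -> list[float]:
    """Return the 400-dimensional g-gap dipeptide composition of a sequence.

    Each entry is the fraction of residue pairs (s[i], s[i + g + 1]) equal to
    the corresponding entry of PAIRS. Pairs containing non-standard residues
    are skipped (an assumption of this sketch).
    """
    if g < 0:
        raise ValueError("g must be non-negative")
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


if __name__ == "__main__":
    features = g_gap_features("ACAC", 1)
    print(len(features), features[PAIRS.index("AA")])
```

With g = 0 this reduces to ordinary dipeptide composition; stacking the vectors for several gap values and several subsequences would give a matrix input of the kind the abstract describes.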
|
24
|
Wu X, Yu L. EPSOL: sequence-based protein solubility prediction using multidimensional embedding. Bioinformatics 2021; 37:4314-4320. [PMID: 34145885 DOI: 10.1093/bioinformatics/btab463] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 03/16/2021] [Revised: 05/18/2021] [Accepted: 06/17/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION The heterologous expression of recombinant protein requires host cells, such as Escherichia coli, and the solubility of protein greatly affects the protein yield. A novel and highly accurate solubility predictor that concurrently improves the production yield and minimizes production cost, and that forecasts protein solubility in an E. coli expression system before the actual experimental work, is highly sought. RESULTS In this paper, EPSOL, a novel deep learning architecture for the prediction of protein solubility in an E. coli expression system, which automatically obtains comprehensive protein feature representations using multidimensional embedding, is presented. EPSOL outperformed all existing sequence-based solubility predictors and achieved 0.79 in accuracy and 0.58 in Matthews correlation coefficient. The higher performance of EPSOL permits large-scale screening for sequence variants with enhanced manufacturability and predicts the solubility of new recombinant proteins in an E. coli expression system with greater reliability. AVAILABILITY AND IMPLEMENTATION EPSOL's best model and results can be downloaded from GitHub (https://github.com/LiangYu-Xidian/EPSOL). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
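Several abstracts in this list report accuracy and the Matthews correlation coefficient (MCC) for binary soluble/insoluble classification. As a minimal sketch of how these figures are derived from a confusion matrix, the Python below computes accuracy, sensitivity, specificity and MCC. The function name `binary_metrics` and the example counts are hypothetical and are not taken from any of the cited studies.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute common binary-classification metrics from confusion counts.

    tp/fn count soluble proteins predicted soluble/insoluble;
    tn/fp count insoluble proteins predicted insoluble/soluble.
    Ratios with a zero denominator are reported as 0.0 in this sketch.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for a balanced test set of 100 proteins.
    print(binary_metrics(tp=40, tn=40, fp=10, fn=10))
```

Unlike accuracy, MCC accounts for all four cells of the confusion matrix, which is why it is often reported alongside accuracy when predictors may be biased toward one class.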
Affiliation(s)
- Xiang Wu: School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Liang Yu: School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
|
25
|
Barrett R, White AD. Investigating Active Learning and Meta-Learning for Iterative Peptide Design. J Chem Inf Model 2021; 61:95-105. [PMID: 33350829 PMCID: PMC7842147 DOI: 10.1021/acs.jcim.0c00946] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/13/2020] [Indexed: 01/14/2023]
Abstract
Often the development of novel functional peptides is not amenable to high throughput or purely computational screening methods. Peptides must be synthesized one at a time in a process that does not generate large amounts of data. One way this method can be improved is by ensuring that each experiment provides the best improvement in both peptide properties and predictive modeling accuracy. Here, we study the effectiveness of active learning, optimizing experiment order, and meta-learning, transferring knowledge between contexts, to reduce the number of experiments necessary to build a predictive model. We present a multitask benchmark database of peptides designed to advance these methods for experimental design. Each task is a binary classification of peptides represented as a sequence string. We find neither active learning method tested to be better than random choice. The meta-learning method Reptile was found to improve the average accuracy across data sets. Combining meta-learning with active learning offers inconsistent benefits.
Affiliation(s)
- Rainier Barrett: Department of Chemical Engineering, University of Rochester, Rochester, New York 14627, United States
- Andrew D. White: Department of Chemical Engineering, University of Rochester, Rochester, New York 14627, United States
|
26
|
Santos Junior MN, Santos RS, Neves WS, Fernandes JM, de Brito Guimarães BC, Barbosa MS, Silva LSC, Gomes CP, Rezende IS, Oliveira CNT, de Macêdo Neres NS, Campos GB, Bastos BL, Timenetsky J, Marques LM. Immunoinformatics and analysis of antigen distribution of Ureaplasma diversum strains isolated from different Brazilian states. BMC Vet Res 2020; 16:379. [PMID: 33028315 PMCID: PMC7542862 DOI: 10.1186/s12917-020-02602-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/08/2020] [Accepted: 09/30/2020] [Indexed: 01/29/2023]
Abstract
BACKGROUND Ureaplasma diversum has numerous virulence factors that contribute to pathogenesis in cattle, including lipid-associated membrane proteins (LAMPs). Therefore, the objectives of this study were to evaluate in silico the characteristics important for immunobiological applications and for heterologous expression of 36 LAMPs of U. diversum (UdLAMPs) and also to verify by conventional PCR the distribution of these antigens in strains from Brazilian states (Bahia, Minas Gerais, São Paulo, and Mato Grosso do Sul). The Manatee database was used to obtain the gene and peptide sequences of the antigens. Similarity and identity studies were performed using BLASTp, and direct antigenicity was evaluated by the VaxiJen v2.0 server. Epitope prediction for B lymphocytes was performed on the BepiPred v2.0 and CBTOPE v1.0 servers. NetBoLApan v1.0 was used to predict CD8+ T lymphocyte epitopes. Subcellular location and presence of transmembrane regions were verified by the software PSORTb v3.0.2 and TMHMM v2.2, respectively. SignalP v5.0, SecretomeP v2.0, and DOLOP servers were used to predict the extracellular excretion signal. Physico-chemical properties were evaluated by the web software ProtParam, Solpro, and Protein-sol. RESULTS In silico analysis revealed that many UdLAMPs have desirable properties for immunobiological applications and heterologous expression. The proteins gudiv_61, gudiv_103, gudiv_517, and gudiv_681 were the most promising. Strains from the 4 states were PCR positive for antigens predicted to be immunogenic and/or to have good characteristics for expression in a heterologous system. CONCLUSION This work contributes to a better understanding of the immunobiological properties of the UdLAMPs and provides a profile of the distribution of these antigens in different Brazilian states.
Affiliation(s)
- Manoel Neres Santos Junior: Department of Biointeraction, Multidisciplinary Institute of Health, Universidade Federal da Bahia, Rua Hormindo Barros, 58 - Quadra 17 - Lote 58, Bairro Candeias - CEP: 45.029-094, Vitória da Conquista, BA, Brazil; Department of Microbiology, State University of Santa Cruz (UESC), Ilhéus, Brazil
- Ronaldo Silva Santos: Department of Biointeraction, Multidisciplinary Institute of Health, Universidade Federal da Bahia, Rua Hormindo Barros, 58 - Quadra 17 - Lote 58, Bairro Candeias - CEP: 45.029-094, Vitória da Conquista, BA, Brazil
- Wanderson Souza Neves: Department of Biointeraction, Multidisciplinary Institute of Health, Universidade Federal da Bahia, Rua Hormindo Barros, 58 - Quadra 17 - Lote 58, Bairro Candeias - CEP: 45.029-094, Vitória da Conquista, BA, Brazil
- Janaina Marinho Fernandes: Department of Biointeraction, Multidisciplinary Institute of Health, Universidade Federal da Bahia, Rua Hormindo Barros, 58 - Quadra 17 - Lote 58, Bairro Candeias - CEP: 45.029-094, Vitória da Conquista, BA, Brazil
- Maysa Santos Barbosa: Department of Microbiology, Institute of Biomedical Science, University of São Paulo, São Paulo, Brazil
- Camila Pacheco Gomes: Department of Microbiology, State University of Santa Cruz (UESC), Ilhéus, Brazil
- Izadora Souza Rezende: Department of Microbiology, Institute of Biomedical Science, University of São Paulo, São Paulo, Brazil
- Caline Novaes Teixeira Oliveira: Department of Biointeraction, Multidisciplinary Institute of Health, Universidade Federal da Bahia, Rua Hormindo Barros, 58 - Quadra 17 - Lote 58, Bairro Candeias - CEP: 45.029-094, Vitória da Conquista, BA, Brazil; Department of Microbiology, State University of Santa Cruz (UESC), Ilhéus, Brazil
- Nayara Silva de Macêdo Neres: Department of Biointeraction, Multidisciplinary Institute of Health, Universidade Federal da Bahia, Rua Hormindo Barros, 58 - Quadra 17 - Lote 58, Bairro Candeias - CEP: 45.029-094, Vitória da Conquista, BA, Brazil
- Guilherme Barreto Campos: Department of Biointeraction, Multidisciplinary Institute of Health, Universidade Federal da Bahia, Rua Hormindo Barros, 58 - Quadra 17 - Lote 58, Bairro Candeias - CEP: 45.029-094, Vitória da Conquista, BA, Brazil
- Bruno Lopes Bastos: Department of Biointeraction, Multidisciplinary Institute of Health, Universidade Federal da Bahia, Rua Hormindo Barros, 58 - Quadra 17 - Lote 58, Bairro Candeias - CEP: 45.029-094, Vitória da Conquista, BA, Brazil
- Jorge Timenetsky: Department of Microbiology, Institute of Biomedical Science, University of São Paulo, São Paulo, Brazil
- Lucas Miranda Marques: Department of Biointeraction, Multidisciplinary Institute of Health, Universidade Federal da Bahia, Rua Hormindo Barros, 58 - Quadra 17 - Lote 58, Bairro Candeias - CEP: 45.029-094, Vitória da Conquista, BA, Brazil; Department of Microbiology, State University of Santa Cruz (UESC), Ilhéus, Brazil; Department of Microbiology, Institute of Biomedical Science, University of São Paulo, São Paulo, Brazil
|
27
|
Raimondi D, Orlando G, Fariselli P, Moreau Y. Insight into the protein solubility driving forces with neural attention. PLoS Comput Biol 2020; 16:e1007722. [PMID: 32352965 PMCID: PMC7217484 DOI: 10.1371/journal.pcbi.1007722] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/03/2019] [Revised: 05/12/2020] [Accepted: 02/10/2020] [Indexed: 12/29/2022]
Abstract
Protein solubility is a key aspect for many biotechnological, biomedical and industrial processes, such as the production of active proteins and antibodies. In addition, understanding the molecular determinants of the solubility of proteins may be crucial to shed light on the molecular mechanisms of diseases caused by aggregation processes such as amyloidosis. Here we present SKADE, a novel Neural Network protein solubility predictor, and we show how it can provide novel insight into the protein solubility mechanisms, thanks to its neural attention architecture. First, we show that SKADE compares favorably with state-of-the-art tools while using just the protein sequence as input. Then, thanks to the neural attention mechanism, we use SKADE to investigate the patterns learned during training and we analyse its decision process. We use this peculiarity to show that, while the attention profiles do not correlate with obvious sequence aspects such as biophysical properties of the amino acids, they suggest that N- and C-termini are the most relevant regions for solubility prediction and are predictive for complex emergent properties such as aggregation-prone regions involved in beta-amyloidosis and contact density. Moreover, SKADE is able to identify mutations that increase or decrease the overall solubility of the protein, allowing it to be used to perform large-scale in silico mutagenesis of proteins in order to maximize their solubility. The solubility of proteins is a crucial biophysical aspect when it comes to understanding many human diseases and to improving the industrial processes for protein production. Due to its relevance, computational methods have been devised in order to study and possibly optimize the solubility of proteins. In this work we apply a deep-learning technique called neural attention to predict protein solubility while “opening” the model itself to interpretability, even though Machine Learning models are usually considered black boxes.
Thanks to the attention mechanism, we show that i) our model implicitly learns complex patterns related to emergent, protein folding-related aspects, such as recognizing β-amyloidosis regions, and that ii) the N- and C-termini are the regions with the highest signal for solubility prediction. When it comes to enhancing the solubility of proteins, we, for the first time, propose to investigate the synergistic effects of tandem mutations instead of “single” mutations, suggesting that this could minimize the number of required proposed mutations.
Affiliation(s)
- Yves Moreau: ESAT-STADIUS, KU Leuven, Leuven, Belgium
|
28
|
Stepwise optimization of recombinant protein production in Escherichia coli utilizing computational and experimental approaches. Appl Microbiol Biotechnol 2020; 104:3253-3266. [DOI: 10.1007/s00253-020-10454-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/30/2019] [Revised: 01/28/2020] [Accepted: 02/07/2020] [Indexed: 12/14/2022]
|
29
|
In Silico Study of Different Signal Peptides to Express Recombinant Glutamate Decarboxylase in the Outer Membrane of Escherichia coli. Int J Pept Res Ther 2019. [DOI: 10.1007/s10989-019-09986-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
|
30
|
Han X, Zhang L, Zhou K, Wang X. ProGAN: Protein solubility generative adversarial nets for data augmentation in DNN framework. Comput Chem Eng 2019. [DOI: 10.1016/j.compchemeng.2019.106533] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 02/04/2023]
|
31
|
Sadeghian-Rizi T, Ebrahimi A, Moazzen F, Yousefian H, Jahanian-Najafabadi A. Improvement of solubility and yield of recombinant protein expression in E. coli using a two-step system. Res Pharm Sci 2019; 14:400-407. [PMID: 31798656 PMCID: PMC6827196 DOI: 10.4103/1735-5362.268200] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/09/2023]
Abstract
Overexpression of recombinant proteins in Escherichia coli results in inclusion body formation, and consequently decreased production yield and increased production cost. Co-expression of chaperone systems along with the recombinant protein is a general method to increase the production yield. However, it has not been successful enough because of the intense stress imposed on the host cells. The aim of this study was to balance the rate of protein production and the imposed cellular stresses using a two-step expression system. For this purpose, in the first step, green fluorescent protein (GFP) was expressed as a model recombinant protein under control of the T7-TetO artificial promoter-operator, accompanied by the DnaK/J/GrpE chaperone system. In the next step, the TetR repressor was activated automatically under the control of the stress promoter ibpAB and suppressed GFP production after accumulation of inclusion bodies. In this step, incorrectly folded proteins and inclusion bodies are refolded, increasing the yield and solubility of the recombinant protein, after which GFP expression restarts. Total GFP and the soluble and insoluble GFP fractions were measured with a Synergy H1 multiple reader. Results showed that the expression yield and the soluble/insoluble ratio of GFP increased 5 and 2.5 times, respectively, using this system in comparison with the single-step process. The efficiency of this system in increasing the solubility and production yield of recombinant proteins was confirmed. The two-step system must be evaluated for expression of various proteins to further confirm its applicability in the field of recombinant protein production.
Affiliation(s)
- Tahereh Sadeghian-Rizi: Department of Pharmaceutical Biotechnology, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, I.R. Iran; Student Research Committee, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, I.R. Iran
- Azade Ebrahimi: Department of Pharmaceutical Biotechnology, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, I.R. Iran; Student Research Committee, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, I.R. Iran
- Fatemeh Moazzen: Department of Pharmaceutical Biotechnology, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, I.R. Iran
- Hesam Yousefian: Student Research Committee, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, I.R. Iran
- Ali Jahanian-Najafabadi: Department of Pharmaceutical Biotechnology, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, I.R. Iran
|
32
|
Khurana S, Rawi R, Kunji K, Chuang GY, Bensmail H, Mall R. DeepSol: a deep learning framework for sequence-based protein solubility prediction. Bioinformatics 2019; 34:2605-2613. [PMID: 29554211 DOI: 10.1093/bioinformatics/bty166] [Citation(s) in RCA: 97] [Impact Index Per Article: 19.4] [Received: 11/14/2017] [Accepted: 03/13/2018] [Indexed: 01/09/2023]
Abstract
Motivation Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work we propose DeepSol, a novel deep learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence. Results DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins. Availability and implementation DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018). Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sameer Khurana: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Reda Rawi: Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Khalid Kunji: Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Gwo-Yu Chuang: Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Halima Bensmail: Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Raghvendra Mall: Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
|
33
|
Ghaheh HS, Ganjalikhany MR, Yaghmaei P, Pourfarzam M, Mir Mohammad Sadeghi H. Improving the solubility, activity, and stability of reteplase using in silico design of new variants. Res Pharm Sci 2019; 14:359-368. [PMID: 31516513 PMCID: PMC6714118 DOI: 10.4103/1735-5362.263560] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/04/2022]
Abstract
Reteplase (recombinant plasminogen activator, r-PA) is a thrombolytic agent recombined from tissue-type plasminogen activator (t-PA), which has several prominent features such as strong thrombolytic ability and E. coli expressibility. Despite these outstanding features, it demonstrates reduced fibrin binding affinity, reduced stimulation of protease activity, and lower solubility, hence higher aggregation propensity, compared to t-PA. The present study aimed to design r-PA variants with comparable structural stability, enhanced biological activity, and high solubility. For this purpose, computational molecular modeling techniques were utilized. The supercharging technique was applied to r-PA to design new species of the protein. Based on the results from in silico evaluation of selected mutations in comparison to the wild-type r-PA, the designed supercharged mutant (S7 variant) exhibited augmented stability, decreased solvation energy, and enhanced binding affinity to fibrin. The data also implied increased plasminogen cleavage activity of the new variant. These findings have implications for therapies that involve removal of intravascular blood clots, including the treatment of acute myocardial infarction.
Affiliation(s)
- Parichehreh Yaghmaei: Department of Biology, Science and Research Branch, Islamic Azad University, Tehran, I.R. Iran
- Morteza Pourfarzam: Department of Clinical Biochemistry, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, I.R. Iran
- Hamid Mir Mohammad Sadeghi: Department of Pharmaceutical Biotechnology, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, I.R. Iran
|
34
|
In silico analysis of different signal peptides for the secretory production of recombinant human keratinocyte growth factor in Escherichia coli. Comput Biol Chem 2019; 80:225-233. [DOI: 10.1016/j.compbiolchem.2019.03.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/21/2018] [Revised: 01/23/2019] [Accepted: 03/11/2019] [Indexed: 12/31/2022]
|
35
|
Amin SA, Endalur Gopinarayanan V, Nair NU, Hassoun S. Establishing synthesis pathway-host compatibility via enzyme solubility. Biotechnol Bioeng 2019; 116:1405-1416. [PMID: 30802311 DOI: 10.1002/bit.26959] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/20/2018] [Revised: 12/18/2018] [Accepted: 02/21/2019] [Indexed: 12/12/2022]
Abstract
Current pathway synthesis tools identify possible pathways that can be added to a host to produce the desired target molecule through the exploration of abstract metabolic and reaction network space. However, not many of these tools explore gene-level information required to physically realize the identified synthesis pathways, and none explore enzyme-host compatibility. Developing tools that address this disconnect between abstract reactions/metabolic design space and physical genetic sequence design space will enable expedited experimental efforts that avoid exploring unprofitable synthesis pathways. This work describes a workflow, termed Probabilistic Pathway Assembly with Solubility Confidence Scores (ProPASS), which links synthesis pathway construction with the exploration of the physical design space as imposed by the availability of enzymes with predicted characterized activities within the host. Predicted protein solubility propensity scores are used as a confidence level to quantify the compatibility of each pathway enzyme with the host Escherichia coli (E. coli). This study also presents a database, termed Protein Solubility Database (ProSol DB), which provides solubility confidence scores in E. coli for 240,016 characterized enzymes obtained from UniProtKB/Swiss-Prot. The utility of ProPASS is demonstrated by generating genetic implementations of heterologous synthesis pathways in E. coli that target several commercially useful biomolecules.
Affiliation(s)
- Sara A Amin
- Department of Computer Science, Tufts University, Medford, Massachusetts
- Nikhil U Nair
- Department of Chemical and Biological Engineering, Tufts University, Medford, Massachusetts
- Soha Hassoun
- Department of Computer Science, Tufts University, Medford, Massachusetts.,Department of Chemical and Biological Engineering, Tufts University, Medford, Massachusetts
36
de Marco A, Ferrer-Miralles N, Garcia-Fruitós E, Mitraki A, Peternel S, Rinas U, Trujillo-Roldán MA, Valdez-Cruz NA, Vázquez E, Villaverde A. Bacterial inclusion bodies are industrially exploitable amyloids. FEMS Microbiol Rev 2019; 43:53-72. [PMID: 30357330 DOI: 10.1093/femsre/fuy038] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 06/27/2018] [Accepted: 10/23/2018] [Indexed: 12/13/2022]
Abstract
Understanding the structure, functionalities and biology of functional amyloids is an issue of emerging interest. Inclusion bodies, namely protein clusters formed in recombinant bacteria during protein production processes, have emerged as unanticipated, highly tunable models for the scrutiny of the physiology and architecture of functional amyloids. Based on an amyloidal skeleton combined with varying amounts of native or native-like protein forms, bacterial inclusion bodies exhibit an unusual arrangement that confers mechanical stability, biological activity and conditional protein release, being thus exploitable as versatile biomaterials. The applicability of inclusion bodies in biotechnology as enriched sources of protein and reusable catalysts, and in biomedicine as biocompatible topographies, nanopills or mimetics of endocrine secretory granules has been largely validated. Beyond these uses, the dissection of how recombinant bacteria manage the aggregation of functional protein species into structures of highly variable complexity offers insights about unsuspected connections between protein quality (conformational status compatible with functionality) and cell physiology.
Affiliation(s)
- Ario de Marco
- Laboratory for Environmental and Life Sciences, University of Nova Gorica, Vipavska Cesta 13, 5000 Nova Gorica, Slovenia
- Neus Ferrer-Miralles
- Institut de Biotecnologia i de Biomedicina (IBB), Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,Departament de Genètica i de Microbiologia, Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Carrer de la Vall Moronta s/n, 08193 Cerdanyola del Vallès, Spain
- Elena Garcia-Fruitós
- Department of Ruminant Production, Institut de Recerca i Tecnologia Agroalimentàries (IRTA), Torre Marimon, 08140 Caldes de Montbui, Barcelona, Spain
- Anna Mitraki
- Department of Materials Science and Technology, University of Crete, Vassilika Vouton, 70013 Heraklion, Crete, Greece.,Institute of Electronic Structure and Laser (IESL), Foundation for Research and Technology Hellas (FORTH), N. Plastira 100, Vassilika Vouton, 70013 Heraklion, Crete, Greece
- Ursula Rinas
- Leibniz University of Hannover, Technical Chemistry and Life Science, 30167 Hannover, Germany.,Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany
- Mauricio A Trujillo-Roldán
- Programa de Investigación de Producción de Biomoléculas, Unidad de Bioprocesos, Departamento de Biología Molecular y Biotecnología, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, 04510 Ciudad de México, México
- Norma A Valdez-Cruz
- Programa de Investigación de Producción de Biomoléculas, Departamento de Biología Molecular y Biotecnología, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, 04510 Ciudad de México, México
- Esther Vázquez
- Institut de Biotecnologia i de Biomedicina (IBB), Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,Departament de Genètica i de Microbiologia, Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Carrer de la Vall Moronta s/n, 08193 Cerdanyola del Vallès, Spain
- Antonio Villaverde
- Institut de Biotecnologia i de Biomedicina (IBB), Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,Departament de Genètica i de Microbiologia, Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Carrer de la Vall Moronta s/n, 08193 Cerdanyola del Vallès, Spain
37
Rawi R, Mall R, Kunji K, Shen CH, Kwong PD, Chuang GY. PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine. Bioinformatics 2019; 34:1092-1098. [PMID: 29069295 DOI: 10.1093/bioinformatics/btx662] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 08/01/2017] [Accepted: 10/17/2017] [Indexed: 11/13/2022]
Abstract
Motivation: Protein solubility can be a decisive factor in both research and production efficiency, and in silico sequence-based predictors that can accurately estimate solubility outcomes are highly sought. Results: In this study, we present a novel approach termed PRotein SolubIlity Predictor (PaRSnIP), which uses a gradient boosting machine algorithm as well as an approximation of sequence and structural features of the protein of interest. Based on an independent test set, PaRSnIP outperformed other state-of-the-art sequence-based methods by more than 9% in accuracy and 0.17 in Matthews correlation coefficient, with an overall accuracy of 74% and a Matthews correlation coefficient of 0.48. Additionally, PaRSnIP provides importance scores for all features used in training. We observed higher fractions of exposed residues to associate positively with protein solubility and tripeptide stretches with multiple histidines to associate negatively with solubility. The improved prediction accuracy of PaRSnIP should enable it to predict protein solubility with greater reliability and to screen for sequence variants with enhanced manufacturability. Availability and implementation: PaRSnIP software is available for download from GitHub (https://github.com/RedaRawi/PaRSnIP). Contact: gwo-yu.chuang@nih.gov. Supplementary information: Supplementary data are available at Bioinformatics online.
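The two metrics quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), can be computed from a binary confusion matrix. The following is a minimal sketch, not code from PaRSnIP; the function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a soluble (positive) vs insoluble (negative) test set.
acc = accuracy(tp=40, tn=35, fp=15, fn=10)
mcc = matthews_corrcoef(tp=40, tn=35, fp=15, fn=10)
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the two classes are unbalanced, which is presumably why solubility predictors tend to report both.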
Affiliation(s)
- Reda Rawi
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Raghvendra Mall
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, 34110, Qatar
- Khalid Kunji
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, 34110, Qatar
- Chen-Hsiang Shen
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Peter D Kwong
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Gwo-Yu Chuang
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
38
Owji H, Hemmati S. A comprehensive in silico characterization of bacterial signal peptides for the excretory production of Anabaena variabilis phenylalanine ammonia lyase in Escherichia coli. 3 Biotech 2018; 8:488. [PMID: 30498661 DOI: 10.1007/s13205-018-1517-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/03/2018] [Accepted: 11/13/2018] [Indexed: 12/30/2022]
Abstract
Anabaena variabilis double mutant (C503S/C565S) phenylalanine ammonia-lyase (PAL) is an appealing enzyme for the enzyme-replacement therapy of phenylketonuria. Yet abundant production of this enzyme has been a concern at industrial scale. In this study, we have characterized 1175 bacterial signal peptides (SPs) and identified the most efficient ones for the excretory production of mutant AvPAL. Analysis by SignalP 4.1 revealed that more than 61% of SPs had a D-score greater than 0.7, indicating high efficiency. The optimum length of a bacterial SP was 25-30 residues. The preferable net positive charge and the second residue of the N-region were +2 and Lys/Arg, respectively. Highly efficient SPs possessed 3-5 Leu residues in their H-region and an A/L/VXA-FF cleavage site. Highly efficient SPs were from Escherichia coli, corroborating the necessity of a match between SPs and the host. Physicochemical characterization of mutant AvPAL conjugates via ProtParam and PROSO II revealed that ~99.5% of proteins would not be entrapped in inclusion bodies. Secretory pathways were identified by EffectiveDB, and more than 98% of SPs were cleavable. Chimeras were modeled using the I-TASSER program and evaluated by Ramachandran plots. The mRNA secondary structure of mutant AvPAL upon linkage to SPs was assessed using the mfold program. It was shown that the linkage of an SP does not affect mutant AvPAL's stability at the protein or mRNA level. The AllergenFP tool indicated that the chimeras were not allergenic. SPs including FMF4_ECOLX, E2BB_ECOLX, and LPTA_ECOLI exhibited the highest propensity for secretion and appropriate features in all analyses.
Affiliation(s)
- Hajar Owji
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
- Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Shiva Hemmati
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
- Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
39
In silico Analysis of Different Signal Peptides for the Excretory Production of Recombinant NS3-GP96 Fusion Protein in Escherichia coli. Int J Pept Res Ther 2018. [DOI: 10.1007/s10989-018-9775-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/07/2023]
40
Pellizza L, Smal C, Rodrigo G, Arán M. Codon usage clusters correlation: towards protein solubility prediction in heterologous expression systems in E. coli. Sci Rep 2018; 8:10618. [PMID: 30006617 PMCID: PMC6045634 DOI: 10.1038/s41598-018-29035-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/20/2018] [Accepted: 06/21/2018] [Indexed: 12/15/2022]
Abstract
Production of soluble recombinant proteins is crucial to the development of industry and basic research. However, aggregation due to the incorrect folding of nascent polypeptides is still a major bottleneck. Understanding the factors governing protein solubility is important to grasp the underlying mechanisms and improve the design of recombinant proteins. Here we show a quantitative study of the expression and solubility of a set of proteins from Bizionia argentinensis. Through the analysis of different features known to modulate protein production, we defined two parameters based on the %MinMax algorithm to compare codon usage clusters between the host and the target genes. We demonstrate that the absolute difference between all %MinMax frequencies of the host and the target gene is significantly negatively correlated with protein expression levels. Most importantly, a strong positive correlation between solubility and the degree of conservation of codon usage clusters is observed for two independent datasets. Moreover, we show that this correlation is higher in codon usage clusters involved in less compact protein secondary structure regions. Our results provide important tools for protein design and support the notion that codon usage may dictate translation rate and modulate co-translational folding.
Affiliation(s)
- Leonardo Pellizza
- Laboratory of Nuclear Magnetic Resonance, Fundación Instituto Leloir, IIBBA-CONICET, Av. Patricias Argentinas 435, C1405BWE, CABA, Argentina
- Clara Smal
- Laboratory of Nuclear Magnetic Resonance, Fundación Instituto Leloir, IIBBA-CONICET, Av. Patricias Argentinas 435, C1405BWE, CABA, Argentina
- Guido Rodrigo
- Laboratory of Nuclear Magnetic Resonance, Fundación Instituto Leloir, IIBBA-CONICET, Av. Patricias Argentinas 435, C1405BWE, CABA, Argentina
- Martín Arán
- Laboratory of Nuclear Magnetic Resonance, Fundación Instituto Leloir, IIBBA-CONICET, Av. Patricias Argentinas 435, C1405BWE, CABA, Argentina.
41
Yang Y, Liu G, Liu M, Bai Z, Liu X, Dai X, Guo W. Correlation Between Protein Primary Structure and Soluble Expression Level of HSA dAb in Escherichia coli. Food Technol Biotechnol 2018; 56:101-109. [PMID: 29796003 DOI: 10.17113/ftb.56.01.18.5445] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
It is widely accepted that features such as pI, length, molecular mass and amino acid (AA) sequence have a significant influence on protein solubility. Here, we mainly focused on AA composition and explored those that most affected the soluble expression level of human serum albumin (HSA) domain antibody (dAb). The soluble expression and sequence of 65 dAb variants were analysed using clustering and linear modelling. Certain AAs significantly affected the soluble expression level of dAb, with the specific AA combinations being (S, R, N, D, Q), (G, R, C, N, S) and (R, S, G); these combinations respectively affected the dAb expression level in the broth supernatant, the level in the pellet lysate and total soluble dAb. Among the 20 AAs, R displayed a negative influence on the soluble expression level, whereas G and S showed positive effects. A linear model was built to predict the soluble expression level from the sequence; this model had a prediction accuracy of 80%. In summary, increasing the content of polar AAs, especially G and S, and decreasing the content of R, was helpful to improve the soluble expression level of HSA dAb.
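The amino acid composition that this abstract relates to soluble expression is a per-residue fraction of the sequence. The following is a minimal sketch of that feature only, not the authors' clustering or linear model; the function name and the toy sequence are hypothetical.

```python
from collections import Counter
from typing import Dict


def aa_fraction(sequence: str, residues: str = "ACDEFGHIKLMNPQRSTVWY") -> Dict[str, float]:
    """Return the fraction of each one-letter residue code in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in residues}


# Toy peptide (hypothetical), restricted to the residues highlighted in the abstract.
fractions = aa_fraction("GSRGS", residues="GSR")
```

Fractions like these could serve as the explanatory variables of a linear model of expression level; the coefficients reported by the authors are not reproduced here.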
Affiliation(s)
- Yankun Yang
- The Key Laboratory of Carbohydrate Chemistry and Biotechnology, School of Biotechnology, Jiangnan University, Ministry of Education, 1800 Lihu Avenue, 214122 Wuxi, PR China.,National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Avenue, 214122 Wuxi, PR China
- Guoqiang Liu
- The Key Laboratory of Carbohydrate Chemistry and Biotechnology, School of Biotechnology, Jiangnan University, Ministry of Education, 1800 Lihu Avenue, 214122 Wuxi, PR China.,National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Avenue, 214122 Wuxi, PR China
- Meng Liu
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Avenue, 214122 Wuxi, PR China
- Zhonghu Bai
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Avenue, 214122 Wuxi, PR China.,Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Avenue, 214122 Wuxi, PR China
- Xiuxia Liu
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Avenue, 214122 Wuxi, PR China.,Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Avenue, 214122 Wuxi, PR China
- Xiaofeng Dai
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Avenue, 214122 Wuxi, PR China.,Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Avenue, 214122 Wuxi, PR China
- Wenwen Guo
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Avenue, 214122 Wuxi, PR China.,The Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 1800 Lihu Avenue, 214122 Wuxi, PR China
42
Assessment of Prokaryotic Signal Peptides for Secretion of Tumor Necrosis Factor Related Apoptosis Inducing Ligand (TRAIL) in E. coli: An in silico Approach. Journal of Pure and Applied Microbiology 2016. [DOI: 10.22207/jpam.10.4.22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
43
Yang Y, Wu X, Xuan H, Gao Z. Functional analysis of plant NB-LRR gene L3 by using E. coli. Biochem Biophys Res Commun 2016; 478:1569-74. [PMID: 27586278 DOI: 10.1016/j.bbrc.2016.08.154] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/16/2016] [Accepted: 08/27/2016] [Indexed: 11/19/2022]
Abstract
Plant NB-LRR genes mediate plant innate immunity and cause the programmed cell death of plant cells. Very little, however, is known about these processes. Taking advantage of the easy manipulation of bacteria, genetic analysis was performed to understand the mechanism of lethality of NB-LRR proteins to bacteria and to relate this information back to how NB-LRR proteins cause cell death in plants. It was found that only L3, encoded by the NB-LRR gene L3 (At1g15890), specifically caused significant death of BL21(DE3), while other NBS-LRR proteins did not, and that 760-851, the truncated form of L3, was essential to the lethality of L3. Genes yedZ (EG14048) and nupG (EG10664) were identified by genome re-sequencing of E. coli, both of which mediate the toxicity of L3 in E. coli. Furthermore, NupG can affect the activity of peroxidase and significantly suppress the plant cell death induced by the NB-LRR protein RPM1(D505V), encoded by RPM1 (At3g07040), in N. benthamiana. These findings provide evidence that functional analysis of plant NB-LRR genes in microorganisms might be a feasible and rapid method.
Affiliation(s)
- Yin Yang
- State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, 430072, People's Republic of China
- Xiaoqiu Wu
- State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, 430072, People's Republic of China
- Hua Xuan
- State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, 430072, People's Republic of China
- Zhiyong Gao
- State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, 430072, People's Republic of China.
44
Cong C, Yu X, He Y, Dai Y, Zhang Y, Wang M, He M. Cell-free ribosome display and selection of antibodies on arrayed antigens. Proteomics 2016; 16:1291-6. [PMID: 26899874 DOI: 10.1002/pmic.201500412] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 10/27/2015] [Revised: 01/05/2016] [Accepted: 02/16/2016] [Indexed: 11/09/2022]
Abstract
In vitro display technology is a powerful tool for discovery and optimisation of novel antibodies. With increasing demands on various binding molecules in proteomics studies, techniques for a large-scale generation of antibodies or antibody fragments are needed. Here, we describe a novel method for parallel generation of different antibody fragments (scFv) by integrating cell-free ribosome display with array technology. We have demonstrated the procedure by successfully isolating scFv antibodies specific to 16 different cancer biomarkers via a single process. Our results provide proof of principle for multiple production of various scFv antibodies simultaneously.
Affiliation(s)
- Cong Cong
- Lab of Recombinant Protein Therapeutics, Chengdu Institute of Biological Products, SinoPharm, Chengdu, Sichuan, P. R. China
- Xin Yu
- Lab of Recombinant Protein Therapeutics, Chengdu Institute of Biological Products, SinoPharm, Chengdu, Sichuan, P. R. China
- Yongzhi He
- Lab of Recombinant Protein Therapeutics, Chengdu Institute of Biological Products, SinoPharm, Chengdu, Sichuan, P. R. China
- Yujian Dai
- Lab of Recombinant Protein Therapeutics, Chengdu Institute of Biological Products, SinoPharm, Chengdu, Sichuan, P. R. China
- Yongxia Zhang
- Lab of Recombinant Protein Therapeutics, Chengdu Institute of Biological Products, SinoPharm, Chengdu, Sichuan, P. R. China
- Mingrong Wang
- Lab of Recombinant Protein Therapeutics, Chengdu Institute of Biological Products, SinoPharm, Chengdu, Sichuan, P. R. China
- Mingyue He
- Lab of Recombinant Protein Therapeutics, Chengdu Institute of Biological Products, SinoPharm, Chengdu, Sichuan, P. R. China
45
Chang CCH, Li C, Webb GI, Tey B, Song J, Ramanan RN. Periscope: quantitative prediction of soluble protein expression in the periplasm of Escherichia coli. Sci Rep 2016; 6:21844. [PMID: 26931649 PMCID: PMC4773868 DOI: 10.1038/srep21844] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 11/02/2015] [Accepted: 01/28/2016] [Indexed: 12/20/2022]
Abstract
Periplasmic expression of soluble proteins in Escherichia coli not only offers a much-simplified downstream purification process, but also enhances the probability of obtaining correctly folded and biologically active proteins. Different combinations of signal peptides and target proteins lead to different soluble protein expression levels, ranging from negligible to several grams per litre. Accurate algorithms for rational selection of promising candidates can serve as a powerful tool to complement current trial-and-error approaches. Accordingly, proteomics studies can be conducted with greater efficiency and cost-effectiveness. Here, we developed a predictor with a two-stage architecture to predict the real-valued expression level of the target protein in the periplasm. The output of the first-stage support vector machine (SVM) classifier determines which second-stage support vector regression (SVR) model is used. When tested on an independent test dataset, the predictor achieved an overall prediction accuracy of 78% and a Pearson's correlation coefficient (PCC) of 0.77. We further illustrate the relative importance of various features with respect to different models. The results indicate that the occurrence of the dipeptide glutamine-aspartic acid is the most important feature for the classification model. Finally, we provide access to the implemented predictor through the Periscope webserver, freely accessible at http://lightning.med.monash.edu/periscope/.
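The two-stage architecture described in this abstract, where a classifier picks which regressor produces the final value, can be sketched as a simple dispatch. This is a minimal illustration and not the Periscope implementation: plain Python callables stand in for the trained SVM and SVR models, and the class labels, threshold, and toy functions are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]


def two_stage_predict(
    x: Features,
    classifier: Callable[[Features], str],
    regressors: Dict[str, Callable[[Features], float]],
) -> Tuple[str, float]:
    """Run the stage-one classifier, then the stage-two regressor it selects."""
    label = classifier(x)
    return label, regressors[label](x)


# Toy stand-ins (hypothetical): a threshold rule and two linear functions.
def toy_classifier(x: Features) -> str:
    return "high" if x[0] > 0.5 else "low"


toy_regressors: Dict[str, Callable[[Features], float]] = {
    "low": lambda x: 0.1 * x[0],
    "high": lambda x: 1.0 + x[0],
}

label, level = two_stage_predict([0.8], toy_classifier, toy_regressors)
```

Training one regressor per expression class lets each model fit a narrower range of values, at the cost of propagating any stage-one misclassification into the final estimate.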
Affiliation(s)
- Catherine Ching Han Chang
- Chemical Engineering Discipline, School of Engineering, Monash University, Jalan Lagoon Selatan 46150, Bandar Sunway, Selangor, Malaysia
- Department of Biochemistry and Molecular Biology, Monash University, Melbourne VIC 3800, Australia
- Chen Li
- Department of Biochemistry and Molecular Biology, Monash University, Melbourne VIC 3800, Australia
- Geoffrey I. Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne VIC 3800, Australia
- BengTi Tey
- Chemical Engineering Discipline, School of Engineering, Monash University, Jalan Lagoon Selatan 46150, Bandar Sunway, Selangor, Malaysia
- Advanced Engineering Platform, School of Engineering, Monash University, Jalan Lagoon Selatan 46150, Bandar Sunway, Selangor, Malaysia
- Jiangning Song
- Department of Biochemistry and Molecular Biology, Monash University, Melbourne VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne VIC 3800, Australia
- National Engineering Laboratory for Industrial Enzymes, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- Ramakrishnan Nagasundara Ramanan
- Chemical Engineering Discipline, School of Engineering, Monash University, Jalan Lagoon Selatan 46150, Bandar Sunway, Selangor, Malaysia
- Advanced Engineering Platform, School of Engineering, Monash University, Jalan Lagoon Selatan 46150, Bandar Sunway, Selangor, Malaysia
- School of Chemistry, Monash University, Melbourne VIC 3800, Australia
46
Li C, Ching Han Chang C, Nagel J, Porebski BT, Hayashida M, Akutsu T, Song J, Buckle AM. Critical evaluation of in silico methods for prediction of coiled-coil domains in proteins. Brief Bioinform 2016; 17:270-82. [PMID: 26177815 PMCID: PMC6078162 DOI: 10.1093/bib/bbv047] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/10/2015] [Revised: 05/29/2015] [Indexed: 12/19/2022]
Abstract
Coiled-coils refer to a bundle of helices coiled together like strands of a rope. It has been estimated that nearly 3% of protein-encoding regions of genes harbour coiled-coil domains (CCDs). Experimental studies have confirmed that CCDs play a fundamental role in subcellular infrastructure and controlling trafficking of eukaryotic cells. Given the importance of coiled-coils, multiple bioinformatics tools have been developed to facilitate the systematic and high-throughput prediction of CCDs in proteins. In this article, we review and compare 12 sequence-based bioinformatics approaches and tools for coiled-coil prediction. These approaches can be categorized into two classes: coiled-coil detection and coiled-coil oligomeric state prediction. We evaluated and compared these methods in terms of their input/output, algorithm, prediction performance, validation methods and software utility. All the independent testing data sets are available at http://lightning.med.monash.edu/coiledcoil/. In addition, we conducted a case study of nine human polyglutamine (PolyQ) disease-related proteins and predicted CCDs and oligomeric states using various predictors. Prediction results for CCDs were highly variable among different predictors. Only two peptides from two proteins were confirmed to be CCDs by majority voting. Both domains were predicted to form dimeric coiled-coils using oligomeric state prediction. We anticipate that this comprehensive analysis will be an insightful resource for structural biologists with limited prior experience in bioinformatics tools, and for bioinformaticians who are interested in designing novel approaches for coiled-coil and its oligomeric state prediction.
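The majority voting mentioned in this abstract can be illustrated with a per-residue consensus across predictors. This is a minimal sketch under the assumption that each predictor returns one boolean call per residue; the abstract does not spell out the exact voting rule, and the function name and toy calls are hypothetical.

```python
from typing import List, Sequence


def majority_vote(calls: Sequence[Sequence[bool]]) -> List[bool]:
    """Mark a position True when more than half of the predictors call it True.

    calls holds one equal-length sequence of per-residue calls per predictor.
    """
    if not calls:
        raise ValueError("need at least one predictor")
    if len({len(c) for c in calls}) != 1:
        raise ValueError("all predictors must cover the same residues")
    n_predictors = len(calls)
    return [2 * sum(column) > n_predictors for column in zip(*calls)]


# Three hypothetical predictors scoring the same three residues.
consensus = majority_vote([
    [True, True, False],
    [True, False, False],
    [False, True, False],
])
```

With an even number of predictors, a tie counts as no consensus under this strict-majority rule.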
47
Soluble expression and stability enhancement of transcription factors using 30Kc19 cell-penetrating protein. Appl Microbiol Biotechnol 2015; 100:3523-32. [DOI: 10.1007/s00253-015-7199-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/30/2015] [Revised: 11/15/2015] [Accepted: 11/23/2015] [Indexed: 12/20/2022]
48
Habibi N, Norouzi A, Mohd Hashim SZ, Shamsir MS, Samian R. Prediction of recombinant protein overexpression in Escherichia coli using a machine learning based model (RPOLP). Comput Biol Med 2015; 66:330-6. [DOI: 10.1016/j.compbiomed.2015.09.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/02/2015] [Revised: 09/18/2015] [Accepted: 09/19/2015] [Indexed: 01/28/2023]
49
Zamani M, Nezafat N, Negahdaripour M, Dabbagh F, Ghasemi Y. In Silico Evaluation of Different Signal Peptides for the Secretory Production of Human Growth Hormone in E. coli. Int J Pept Res Ther 2015. [DOI: 10.1007/s10989-015-9454-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022]
50
Wang H, Wang M, Tan H, Li Y, Zhang Z, Song J. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection. PLoS One 2014; 9:e105902. [PMID: 25148528 PMCID: PMC4141844 DOI: 10.1371/journal.pone.0105902] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 04/23/2014] [Accepted: 07/25/2014] [Indexed: 01/14/2023]
Abstract
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.
Affiliation(s)
- Huilin Wang
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Mingjun Wang
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Hao Tan
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Yuan Li
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- * E-mail: (JS); (ZZ)
- Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- ARC Centre of Excellence in Structural and Functional Microbial Genomics, Monash University, Melbourne, Victoria, Australia
- * E-mail: (JS); (ZZ)