1
Taguchi H, Niwa T. Reconstituted cell-free translation systems for exploring protein folding and aggregation. J Mol Biol 2024:168726. [PMID: 39074633 DOI: 10.1016/j.jmb.2024.168726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 07/18/2024] [Accepted: 07/24/2024] [Indexed: 07/31/2024]
Abstract
Protein folding is crucial for achieving functional three-dimensional structures. However, the process is often hampered by aggregate formation, necessitating the presence of chaperones and quality control systems within the cell to maintain protein homeostasis. Despite a long history of folding studies involving the denaturation and subsequent refolding of translation-completed purified proteins, numerous facets of cotranslational folding, wherein nascent polypeptides are synthesized by ribosomes and folded during translation, remain unexplored. Cell-free protein synthesis (CFPS) systems are invaluable tools for studying cotranslational folding, offering a platform not only for elucidating mechanisms but also for large-scale analyses to identify aggregation-prone proteins. This review provides an overview of the extensive use of CFPS in folding studies to date. In particular, we discuss a comprehensive aggregation formation assay of thousands of Escherichia coli proteins conducted under chaperone-free conditions using a reconstituted translation system, along with its derived studies.
Affiliation(s)
- Hideki Taguchi
- Cell Biology Center, Institute of Innovative Research, Tokyo Institute of Technology, S2-19, 4259 Nagatsuta, Midori-ku, Yokohama 226-8501, Japan.
- Tatsuya Niwa
- Cell Biology Center, Institute of Innovative Research, Tokyo Institute of Technology, S2-19, 4259 Nagatsuta, Midori-ku, Yokohama 226-8501, Japan
2
Kim H, Seo J. A Novel Strategy to Identify Endolysins with Lytic Activity against Methicillin-Resistant Staphylococcus aureus. Int J Mol Sci 2023; 24:ijms24065772. [PMID: 36982851 PMCID: PMC10059956 DOI: 10.3390/ijms24065772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 03/10/2023] [Accepted: 03/15/2023] [Indexed: 03/22/2023] Open
Abstract
The increasing prevalence of methicillin-resistant Staphylococcus aureus (MRSA) in the dairy industry has become a fundamental concern. Endolysins are bacteriophage-derived peptidoglycan hydrolases that induce the rapid lysis of host bacteria. Herein, we evaluated the lytic activity of endolysin candidates against S. aureus and MRSA. To identify endolysins, we used a bioinformatical strategy with the following steps: (1) retrieval of genetic information, (2) annotation, (3) selection of MRSA, (4) selection of endolysin candidates, and (5) evaluation of protein solubility. We then characterized the endolysin candidates under various conditions. Approximately 67% of S. aureus was detected as MRSA, and 114 putative endolysins were found. These 114 putative endolysins were divided into three groups based on their combinations of conserved domains. Considering protein solubility, we selected putative endolysins 117 and 177. Putative endolysin 117 was the only successfully overexpressed endolysin, and it was renamed LyJH1892. LyJH1892 showed potent lytic activity against both methicillin-susceptible S. aureus and MRSA and showed broad lytic activity against coagulase-negative staphylococci. In conclusion, this study demonstrates a rapid strategy for the development of endolysin against MRSA. This strategy could also be used to combat other antibiotic-resistant bacteria.
3
Enhancement of the solubility of recombinant proteins by fusion with a short-disordered peptide. J Microbiol 2022; 60:960-967. [DOI: 10.1007/s12275-022-2122-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Revised: 06/07/2022] [Accepted: 06/13/2022] [Indexed: 10/17/2022]
4
Chen J, Zheng S, Zhao H, Yang Y. Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map. J Cheminform 2021; 13:7. [PMID: 33557952 PMCID: PMC7869490 DOI: 10.1186/s13321-021-00488-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 10/21/2020] [Accepted: 01/20/2021] [Indexed: 11/26/2022] Open
Abstract
Protein solubility is significant in producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. Therefore, a computational model that accurately predicts protein solubility from the amino acid sequence is highly desired. Many methods have been developed, but they are mostly based on one-dimensional embeddings of amino acids, which are limited in capturing spatial structural information. In this study, we have developed a new structure-aware method, GraphSol, to predict protein solubility with an attentive graph convolutional network (GCN), where the protein topology attribute graph was constructed from contact maps predicted from the sequence alone. GraphSol was shown to substantially outperform other sequence-based methods. The model was proven to be stable by a consistent [Formula: see text] of 0.48 in both the cross-validation and the independent test of the eSOL dataset. To the best of our knowledge, this is the first study to utilize the GCN for sequence-based protein solubility predictions. More importantly, this architecture could be easily extended to other protein prediction tasks requiring a raw protein sequence.
Affiliation(s)
- Jianwen Chen
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China
- Shuangjia Zheng
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China
- Huiying Zhao
- Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China
- Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China.
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-Sen University), Guangzhou, 510000, China.
5
Han X, Zhang L, Zhou K, Wang X. ProGAN: Protein solubility generative adversarial nets for data augmentation in DNN framework. Comput Chem Eng 2019. [DOI: 10.1016/j.compchemeng.2019.106533] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 02/04/2023]
6
Han X, Wang X, Zhou K. Develop machine learning-based regression predictive models for engineering protein solubility. Bioinformatics 2019; 35:4640-4646. [DOI: 10.1093/bioinformatics/btz294] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/12/2018] [Revised: 03/09/2019] [Accepted: 04/17/2019] [Indexed: 11/14/2022] Open
Abstract
Motivation
Protein activity is a significant characteristic for recombinant proteins which can be used as biocatalysts. High activity of proteins reduces the cost of biocatalysts. A model that can predict protein activity from amino acid sequence is highly desired, as it aids experimental improvement of proteins. However, only limited data for protein activity are currently available, which prevents the development of such models. Since protein activity and solubility are correlated for some proteins, the publicly available solubility dataset may be adopted to develop models that can predict protein solubility from sequence. The models could serve as a tool to indirectly predict protein activity from sequence. In literature, predicting protein solubility from sequence has been intensively explored, but the predicted solubility represented in binary values from all the developed models was not suitable for guiding experimental designs to improve protein solubility. Here we propose new machine learning (ML) models for improving protein solubility in vivo.
Results
We first implemented a novel approach that predicted protein solubility as continuous numerical values instead of binary ones. After combining it with various ML algorithms, we achieved an R² of 0.4115 when the support vector machine algorithm was used. Continuous values of solubility are more meaningful in protein engineering, as they enable researchers to choose proteins with higher predicted solubility for experimental validation, whereas binary values cannot distinguish between proteins assigned the same value: with only two possible values, many proteins share one.
Availability and implementation
We present the ML workflow as a series of IPython notebooks hosted on GitHub (https://github.com/xiaomizhou616/protein_solubility). The workflow can be used as a template for analysis of other expression and solubility datasets.
Supplementary information
Supplementary data are available at Bioinformatics online.
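As a reading aid for the R² value quoted in the Results above, the following is a minimal sketch of how a coefficient of determination is computed for continuous predictions. It assumes the reported figure is the usual coefficient of determination; the function name r_squared and the sample values are invented for illustration and are not taken from the paper or its notebooks.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical solubility fractions (0 to 1), invented for illustration only.
observed = [0.10, 0.40, 0.60, 0.90]
predicted = [0.20, 0.35, 0.55, 0.80]
print(round(r_squared(observed, predicted), 4))
```

A value of 1 means the predictions match the observations exactly, while a value near 0 means the model does no better than always predicting the mean observed solubility.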
Affiliation(s)
- Xi Han
- Department of Chemical and Biomolecular Engineering, National University of Singapore, 117585 Singapore
- Xiaonan Wang
- Department of Chemical and Biomolecular Engineering, National University of Singapore, 117585 Singapore
- Kang Zhou
- Department of Chemical and Biomolecular Engineering, National University of Singapore, 117585 Singapore
7
Pellizza L, Smal C, Rodrigo G, Arán M. Codon usage clusters correlation: towards protein solubility prediction in heterologous expression systems in E. coli. Sci Rep 2018; 8:10618. [PMID: 30006617 PMCID: PMC6045634 DOI: 10.1038/s41598-018-29035-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/20/2018] [Accepted: 06/21/2018] [Indexed: 12/15/2022] Open
Abstract
Production of soluble recombinant proteins is crucial to the development of industry and basic research. However, aggregation due to the incorrect folding of nascent polypeptides is still a major bottleneck. Understanding the factors governing protein solubility is important to grasp the underlying mechanisms and improve the design of recombinant proteins. Here we show a quantitative study of the expression and solubility of a set of proteins from Bizionia argentinensis. Through the analysis of different features known to modulate protein production, we defined two parameters based on the %MinMax algorithm to compare codon usage clusters between the host and the target genes. We demonstrate that the absolute difference between all %MinMax frequencies of the host and the target gene is significantly negatively correlated with protein expression levels. Most importantly, a strong positive correlation between solubility and the degree of conservation of codon usage clusters is observed for two independent datasets. Moreover, we show that this correlation is higher in codon usage clusters involved in less compact protein secondary structure regions. Our results provide important tools for protein design and support the notion that codon usage may dictate translation rate and modulate co-translational folding.
Affiliation(s)
- Leonardo Pellizza
- Laboratory of Nuclear Magnetic Resonance, Fundación Instituto Leloir, IIBBA-CONICET, Av. Patricias Argentinas 435, C1405BWE, CABA, Argentina
- Clara Smal
- Laboratory of Nuclear Magnetic Resonance, Fundación Instituto Leloir, IIBBA-CONICET, Av. Patricias Argentinas 435, C1405BWE, CABA, Argentina
- Guido Rodrigo
- Laboratory of Nuclear Magnetic Resonance, Fundación Instituto Leloir, IIBBA-CONICET, Av. Patricias Argentinas 435, C1405BWE, CABA, Argentina
- Martín Arán
- Laboratory of Nuclear Magnetic Resonance, Fundación Instituto Leloir, IIBBA-CONICET, Av. Patricias Argentinas 435, C1405BWE, CABA, Argentina.
8
Gao YN, Hao QH, Zhang HL, Zhou B, Yu XM, Wang XL. Reduction of soy isoflavones by use of Escherichia coli whole-cell biocatalyst expressing isoflavone reductase under aerobic conditions. Lett Appl Microbiol 2016; 63:111-6. [DOI: 10.1111/lam.12594] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/13/2015] [Revised: 05/22/2016] [Accepted: 05/23/2016] [Indexed: 11/30/2022]
Affiliation(s)
- Y.-N. Gao
- College of Life Sciences; Agricultural University of Hebei; Baoding China
- Q.-H. Hao
- College of Life Sciences; Agricultural University of Hebei; Baoding China
- H.-L. Zhang
- College of Life Sciences; Agricultural University of Hebei; Baoding China
- B. Zhou
- College of Life Sciences; Agricultural University of Hebei; Baoding China
- X.-M. Yu
- College of Life Sciences; Agricultural University of Hebei; Baoding China
- X.-L. Wang
- College of Life Sciences; Agricultural University of Hebei; Baoding China
9
Marczak M, Okoniewska K, Grabowski T. Classification model of amino acid sequences prone to aggregation of therapeutic proteins. In Silico Pharmacol 2016; 4:6. [PMID: 27388622 PMCID: PMC4937009 DOI: 10.1186/s40203-016-0019-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 02/17/2016] [Accepted: 06/15/2016] [Indexed: 11/28/2022] Open
Abstract
Background
Total body clearance of biological drugs depends for the most part on receptor mechanisms (receptor-mediated clearance) and on the concentration of antibodies directed against the administered drug, known as anti-drug antibodies (ADA). One of the significant factors that could increase the ADA level after drug administration is the presence of aggregates, either in the finished product or formed in the organism. Numerous attempts have been made to identify the sequence fragments that could be responsible for forming the aggregates, known as aggregate prone regions (APR).
Purpose
The aim of this study was to find physicochemical parameters specific to APR that would differentiate APR from other sequences present in therapeutic proteins.
Methods
Two groups of amino acid sequences were used in the study. The first one was represented by the sequences separated from the therapeutic proteins (n = 84) able to form APR. A control set (CS) consisted of peptides that were chosen based on 22 tregitope sequences.
Results
A classification model and four classes (A, B, C, D) of sequences were finally presented. For model validation, Cooper statistics were presented.
Conclusions
The study proposes a classification model of APR. It distinguishes APR from sequences that do not form aggregates based on differences in the values of physicochemical parameters. A significant share of electrostatic parameters in the classification model was indicated.
Affiliation(s)
- Krystyna Okoniewska
- P.F.O. Vetos-Farma sp. z o. o., ul. Dzierżoniowska 21, 58-260, Bielawa, Poland.
10
Das Roy R, Bhardwaj M, Bhatnagar V, Chakraborty K, Dash D. How do eubacterial organisms manage aggregation-prone proteome? F1000Res 2014; 3:137. [PMID: 25339987 PMCID: PMC4193397 DOI: 10.12688/f1000research.4307.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/24/2014] [Indexed: 11/20/2022] Open
Abstract
Eubacterial genomes vary considerably in their nucleotide composition. The percentage of genetic material constituted by guanosine and cytosine (GC) nucleotides ranges from 20% to 70%. It has been posited that GC-poor organisms are more dependent on protein folding machinery. Previous studies have ascribed this to the accumulation of mildly deleterious mutations in these organisms due to population bottlenecks. This phenomenon has been supported by protein folding simulations, which showed that proteins encoded by GC-poor organisms are more prone to aggregation than proteins encoded by GC-rich organisms. To test this proposition using a genome-wide approach, we classified different eubacterial proteomes in terms of their aggregation propensity and chaperone-dependence using multiple machine learning models. In contrast to the expected decrease in protein aggregation with an increase in GC richness, we found that the aggregation propensity of proteomes increases with GC content. A similar and even more significant correlation was obtained with the GroEL-dependence of proteomes: GC-poor proteomes have evolved to be less dependent on GroEL than GC-rich proteomes. We thus propose that a decrease in eubacterial GC content may have been selected in organisms facing proteostasis problems.
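The GC percentage that this abstract uses to compare genomes is a simple ratio. A minimal sketch of the calculation follows; the function name gc_content and the toy sequences are illustrative assumptions, and ignoring ambiguous bases such as N is a simplification chosen here, not something stated in the abstract.

```python
def gc_content(sequence):
    """Return the percentage of G and C among the A/C/G/T bases of a sequence."""
    # Keep only unambiguous bases; symbols such as N are skipped (an assumption).
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc_count = sum(1 for b in bases if b in "GC")
    return 100.0 * gc_count / len(bases)


# Toy sequences invented for illustration; real genomes are millions of bases long.
print(gc_content("ATGCGCTA"))
print(gc_content("ATATATAT"))
```

On a whole genome the same ratio would be computed over all chromosomes and plasmids, which places an organism within the 20% to 70% range mentioned above.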
Affiliation(s)
- Rishi Das Roy
- GNR Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Council of Scientific and Industrial Research, Delhi, 110007, India; Department of Biotechnology, University of Pune, Pune, 411007, India
- Manju Bhardwaj
- Department of Computer Science, Maitreyi College, Chanakyapuri, Delhi, 110021, India
- Vasudha Bhatnagar
- Department of Computer Science, Faculty of Mathematical Sciences, University of Delhi, Delhi, 110007, India
- Kausik Chakraborty
- GNR Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Council of Scientific and Industrial Research, Delhi, 110007, India
- Debasis Dash
- GNR Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Council of Scientific and Industrial Research, Delhi, 110007, India; Department of Biotechnology, University of Pune, Pune, 411007, India
11
A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli. BMC Bioinformatics 2014; 15:134. [PMID: 24885721 PMCID: PMC4098780 DOI: 10.1186/1471-2105-15-134] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 09/04/2013] [Accepted: 03/25/2014] [Indexed: 12/14/2022] Open
Abstract
Background
Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both the biopharmaceutical and research arenas in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low production levels and frequent failure for opaque reasons. The problem is accentuated because no generic solution is available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. Under given experimental circumstances, including temperature, expression host, etc., protein solubility is a feature ultimately defined by its sequence. To date, numerous machine learning methods have been proposed to predict the solubility of a protein merely from its amino acid sequence. Despite 20 years of research on the matter, no comprehensive review of the published methods is available.
Results
This paper presents an extensive review of the existing models to predict protein solubility in the Escherichia coli recombinant protein overexpression system. The models are investigated and compared regarding the datasets used, features, feature selection methods, machine learning techniques and accuracy of prediction. A discussion on the models is provided at the end.
Conclusions
This study aims to investigate extensively the machine learning-based methods to predict recombinant protein solubility, so as to offer a general as well as a detailed understanding for researchers in the field. Some of the models present acceptable prediction performances and convenient user interfaces. These models can be considered valuable tools for predicting recombinant protein overexpression results before performing real laboratory experiments, thus saving labour, time and cost.
12
Li ZC, Lai YH, Chen LL, Xie Y, Dai Z, Zou XY. Identifying functions of protein complexes based on topology similarity with random forest. Mol Biosyst 2014; 10:514-25. [PMID: 24389559 DOI: 10.1039/c3mb70401g] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022]
Abstract
Elucidating the functions of protein complexes is critical for understanding disease mechanisms, diagnosis and therapy. In this study, based on the concept that protein complexes with similar topology may have similar functions, we first model protein complexes as weighted graphs, with nodes representing the proteins and edges indicating interactions between proteins. Second, we use topology features derived from the graphs to characterize protein complexes based on graph theory. Finally, we construct a predictor using random forest and the topology features to identify the functions of protein complexes. The effectiveness of the method is evaluated by identifying the functions of mammalian protein complexes. The predictor is then also used to identify the functions of protein complexes retrieved from human protein-protein interaction networks. We identify some protein complexes with significant roles in the occurrence of tumors, vesicles and retinoblastoma. It is anticipated that the current research will have an important impact on pathogenesis and the pharmaceutical industry. The Matlab source code and the dataset are freely available on request from the authors.
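To make the first two steps of this abstract concrete, the sketch below represents one complex as a weighted graph and derives a few generic topology descriptors from it. The abstract does not say which features the authors used, so the descriptors here (density, mean degree, mean weighted degree), the function name topology_features, and the protein labels and weights are all assumptions for illustration. The authors' own code is in Matlab; Python is used here only for brevity.

```python
def topology_features(edges):
    """Compute simple descriptors of an undirected weighted graph.

    edges maps a (protein_a, protein_b) pair to an interaction weight.
    Assumes no self-loops and at most one entry per pair of proteins.
    """
    if not edges:
        raise ValueError("graph has no edges")
    degree = {}    # number of interaction partners per protein
    strength = {}  # sum of interaction weights per protein
    for (a, b), weight in edges.items():
        for node in (a, b):
            degree[node] = degree.get(node, 0) + 1
            strength[node] = strength.get(node, 0.0) + weight
    n_nodes = len(degree)
    n_edges = len(edges)
    return {
        "num_nodes": n_nodes,
        "num_edges": n_edges,
        # Fraction of all possible protein pairs that actually interact.
        "density": 2 * n_edges / (n_nodes * (n_nodes - 1)),
        "mean_degree": sum(degree.values()) / n_nodes,
        "mean_strength": sum(strength.values()) / n_nodes,
    }


# Hypothetical four-protein complex; labels and weights are invented.
example_edges = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.5,
    ("A", "C"): 0.2,
    ("C", "D"): 0.7,
}
features = topology_features(example_edges)
print(features)
```

A feature vector of this kind, computed for every complex, is the sort of input a random forest classifier could be trained on in the final step the abstract describes.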
Affiliation(s)
- Zhan-Chao Li
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China.