Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Fang Y, Fang J. Discrimination of soluble and aggregation-prone proteins based on sequence information. Mol Biosyst 2013;9:806-11. [PMID: 23440081 PMCID: PMC3627541 DOI: 10.1039/c3mb70033j] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]

For:	Fang Y, Fang J. Discrimination of soluble and aggregation-prone proteins based on sequence information. Mol Biosyst 2013;9:806-11. [PMID: 23440081 PMCID: PMC3627541 DOI: 10.1039/c3mb70033j] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]

Number

Cited by Other Article(s)

Taguchi H, Niwa T. Reconstituted cell-free translation systems for exploring protein folding and aggregation. J Mol Biol 2024:168726. [PMID: 39074633 DOI: 10.1016/j.jmb.2024.168726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2024] [Revised: 07/18/2024] [Accepted: 07/24/2024] [Indexed: 07/31/2024]

Kim H, Seo J. A Novel Strategy to Identify Endolysins with Lytic Activity against Methicillin-Resistant Staphylococcus aureus. Int J Mol Sci 2023;24:ijms24065772. [PMID: 36982851 PMCID: PMC10059956 DOI: 10.3390/ijms24065772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2023] [Revised: 03/10/2023] [Accepted: 03/15/2023] [Indexed: 03/22/2023] Open

Enhancement of the solubility of recombinant proteins by fusion with a short-disordered peptide. J Microbiol 2022;60:960-967. [DOI: 10.1007/s12275-022-2122-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2022] [Revised: 06/07/2022] [Accepted: 06/13/2022] [Indexed: 10/17/2022]

Chen J, Zheng S, Zhao H, Yang Y. Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map. J Cheminform 2021;13:7. [PMID: 33557952 PMCID: PMC7869490 DOI: 10.1186/s13321-021-00488-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2020] [Accepted: 01/20/2021] [Indexed: 11/26/2022] Open

Han X, Zhang L, Zhou K, Wang X. ProGAN: Protein solubility generative adversarial nets for data augmentation in DNN framework. Comput Chem Eng 2019. [DOI: 10.1016/j.compchemeng.2019.106533] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]

Han X, Wang X, Zhou K. Develop machine learning-based regression predictive models for engineering protein solubility. Bioinformatics 2019;35:4640-4646. [DOI: 10.1093/bioinformatics/btz294] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2018] [Revised: 03/09/2019] [Accepted: 04/17/2019] [Indexed: 11/14/2022] Open

Abstract Abstract Motivation Protein activity is a significant characteristic for recombinant proteins which can be used as biocatalysts. High activity of proteins reduces the cost of biocatalysts. A model that can predict protein activity from amino acid sequence is highly desired, as it aids experimental improvement of proteins. However, only limited data for protein activity are currently available, which prevents the development of such models. Since protein activity and solubility are correlated for some proteins, the publicly available solubility dataset may be adopted to develop models that can predict protein solubility from sequence. The models could serve as a tool to indirectly predict protein activity from sequence. In literature, predicting protein solubility from sequence has been intensively explored, but the predicted solubility represented in binary values from all the developed models was not suitable for guiding experimental designs to improve protein solubility. Here we propose new machine learning (ML) models for improving protein solubility in vivo. Results We first implemented a novel approach that predicted protein solubility in continuous numerical values instead of binary ones. After combining it with various ML algorithms, we achieved a R2 of 0.4115 when support vector machine algorithm was used. Continuous values of solubility are more meaningful in protein engineering, as they enable researchers to choose proteins with higher predicted solubility for experimental validation, while binary values fail to distinguish proteins with the same value—there are only two possible values so many proteins have the same one. Availability and implementation We present the ML workflow as a series of IPython notebooks hosted on GitHub (https://github.com/xiaomizhou616/protein_solubility). The workflow can be used as a template for analysis of other expression and solubility datasets. Supplementary information Supplementary data are available at Bioinformatics online. Collapse

Pellizza L, Smal C, Rodrigo G, Arán M. Codon usage clusters correlation: towards protein solubility prediction in heterologous expression systems in E. coli. Sci Rep 2018;8:10618. [PMID: 30006617 PMCID: PMC6045634 DOI: 10.1038/s41598-018-29035-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2018] [Accepted: 06/21/2018] [Indexed: 12/15/2022] Open

Gao YN, Hao QH, Zhang HL, Zhou B, Yu XM, Wang XL. Reduction of soy isoflavones by use of Escherichia coli whole-cell biocatalyst expressing isoflavone reductase under aerobic conditions. Lett Appl Microbiol 2016;63:111-6. [DOI: 10.1111/lam.12594] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2015] [Revised: 05/22/2016] [Accepted: 05/23/2016] [Indexed: 11/30/2022]

Marczak M, Okoniewska K, Grabowski T. Classification model of amino acid sequences prone to aggregation of therapeutic proteins. In Silico Pharmacol 2016;4:6. [PMID: 27388622 PMCID: PMC4937009 DOI: 10.1186/s40203-016-0019-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2016] [Accepted: 06/15/2016] [Indexed: 11/28/2022] Open

Das Roy R, Bhardwaj M, Bhatnagar V, Chakraborty K, Dash D. How do eubacterial organisms manage aggregation-prone proteome? F1000Res 2014;3:137. [PMID: 25339987 PMCID: PMC4193397 DOI: 10.12688/f1000research.4307.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 06/24/2014] [Indexed: 11/20/2022] Open

A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli. BMC Bioinformatics 2014;15:134. [PMID: 24885721 PMCID: PMC4098780 DOI: 10.1186/1471-2105-15-134] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2013] [Accepted: 03/25/2014] [Indexed: 12/14/2022] Open

Abstract

Background

Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both biopharmaceutical and research arena in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low level of production and frequent fail due to opaque reasons. The problem is accentuated because there is no generic solution available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. In certain experimental circumstances, including temperature, expression host, etc., protein solubility is a feature eventually defined by its sequence. Until now, numerous methods based on machine learning are proposed to predict the solubility of protein merely from its amino acid sequence. In spite of the 20 years of research on the matter, no comprehensive review is available on the published methods.

Results

This paper presents an extensive review of the existing models to predict protein solubility in Escherichia coli recombinant protein overexpression system. The models are investigated and compared regarding the datasets used, features, feature selection methods, machine learning techniques and accuracy of prediction. A discussion on the models is provided at the end.

Conclusions

This study aims to investigate extensively the machine learning based methods to predict recombinant protein solubility, so as to offer a general as well as a detailed understanding for researches in the field. Some of the models present acceptable prediction performances and convenient user interfaces. These models can be considered as valuable tools to predict recombinant protein overexpression results before performing real laboratory experiments, thus saving labour, time and cost.

Collapse

Li ZC, Lai YH, Chen LL, Xie Y, Dai Z, Zou XY. Identifying functions of protein complexes based on topology similarity with random forest. MOLECULAR BIOSYSTEMS 2014;10:514-25. [PMID: 24389559 DOI: 10.1039/c3mb70401g] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]