1
Hennigan JN, Menacho-Melgar R, Sarkar P, Golovsky M, Lynch MD. Scalable, robust, high-throughput expression & purification of nanobodies enabled by 2-stage dynamic control. Metab Eng 2024; 85:116-130. [PMID: 39059674; PMCID: PMC11408108; DOI: 10.1016/j.ymben.2024.07.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 05/16/2024] [Accepted: 07/24/2024] [Indexed: 07/28/2024]
Abstract
Nanobodies are single-domain antibody fragments that have garnered considerable use as diagnostic and therapeutic agents as well as research tools. However, obtaining pure VHHs, like many proteins, can be laborious and inconsistent. High level cytoplasmic expression in E. coli can be challenging due to improper folding and insoluble aggregation caused by reduction of the conserved disulfide bond. We report a systems engineering approach leveraging engineered strains of E. coli, in combination with a two-stage process and simplified downstream purification, enabling improved, robust, soluble cytoplasmic nanobody expression, as well as rapid cell autolysis and purification. This approach relies on the dynamic control over the reduction potential of the cytoplasm, incorporates lysis enzymes for purification, and can also integrate dynamic expression of protein folding catalysts. Collectively, the engineered system results in more robust growth and protein expression, enabling efficient scalable nanobody production, and purification from high throughput microtiter plates, to routine shake flask cultures and larger instrumented bioreactors. We expect this system will expedite VHH development.
Affiliation(s)
- Payel Sarkar
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Michael D Lynch
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
2
Turina P, Fariselli P, Capriotti E. K-Pro: Kinetics Data on Proteins and Mutants. J Mol Biol 2023; 435:168245. [PMID: 37625584; DOI: 10.1016/j.jmb.2023.168245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 08/16/2023] [Accepted: 08/17/2023] [Indexed: 08/27/2023]
Abstract
The study of protein folding plays a crucial role in improving our understanding of protein function and of the relationship between genetics and phenotypes. In particular, understanding the thermodynamics and kinetics of the folding process is important for uncovering the mechanisms behind human disorders caused by protein misfolding. To address this issue, it is essential to collect and curate experimental kinetic and thermodynamic data on protein folding. K-Pro is a new database designed for collecting and storing experimental kinetic data on monomeric proteins, with a two-state folding mechanism. With 1,529 records from 62 proteins corresponding to 65 structures, K-Pro contains various kinetic parameters such as the logarithm of the folding and unfolding rates, Tanford's β and the ϕ values. When available, the database also includes thermodynamic parameters associated with the kinetic data. K-Pro features a user-friendly interface that allows browsing and downloading kinetic data of interest. The graphical interface provides a visual representation of the protein and mutants, and it is cross-linked to key databases such as PDB, UniProt, and PubMed. K-Pro is open and freely accessible through https://folding.biofold.org/k-pro and supports the latest versions of popular browsers.
Affiliation(s)
- Paola Turina
  - Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
3
Yang Y, Chong Z, Vihinen M. PON-Fold: Prediction of Substitutions Affecting Protein Folding Rate. Int J Mol Sci 2023; 24:13023. [PMID: 37629203; PMCID: PMC10455311; DOI: 10.3390/ijms241613023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/08/2023] [Accepted: 08/09/2023] [Indexed: 08/27/2023] Open
Abstract
Most proteins fold into characteristic three-dimensional structures. The rates of folding and unfolding vary widely and can be affected by variations in proteins. We developed a novel machine-learning-based method for the prediction of the folding rate effects of amino acid substitutions in two-state folding proteins. We collected a data set of experimentally defined folding rates for variants and used them to train a gradient boosting algorithm starting with 1161 features. Two predictors were designed. The three-class classifier had, in blind tests, specificity and sensitivity ranging from 0.324 to 0.419 and from 0.256 to 0.451, respectively. The other tool was a regression predictor that showed a Pearson correlation coefficient of 0.525. The error measures, mean absolute error and mean squared error, were 0.581 and 0.603, respectively. One of the previously presented tools could be used for comparison with the blind test data set; our method, called PON-Fold, showed superior performance on all measures used. The applicability of the tool was tested by predicting all possible substitutions in a protein domain. Predictions for different conformations of proteins, open and closed forms of a protein kinase, and apo and holo forms of an enzyme indicated that the choice of the structure had a large impact on the outcome. PON-Fold is freely available.
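The performance measures quoted in this abstract can be illustrated with a minimal Python sketch. It assumes the common one-vs-rest convention for per-class sensitivity and specificity; the abstract does not state the exact definitions used for PON-Fold, so they may differ. The function names and the confusion matrix below are hypothetical and are not taken from the paper.

```python
def per_class_sensitivity_specificity(confusion):
    """Return (sensitivity, specificity) per class, one-vs-rest.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp   # predicted k, true otherwise
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        results.append((sensitivity, specificity))
    return results


def mae_mse(y_true, y_pred):
    """Mean absolute error and mean squared error of a regression predictor."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse


# Hypothetical three-class confusion matrix (rows: true class, columns: predicted).
example_confusion = [
    [5, 3, 2],
    [2, 6, 2],
    [1, 4, 5],
]
for label, (sens, spec) in enumerate(per_class_sensitivity_specificity(example_confusion)):
    print(f"class {label}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting a range across classes, as the abstract does, corresponds to taking the minimum and maximum of these per-class values.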
Affiliation(s)
- Yang Yang
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Zhang Chong
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, BMC B13, SE-221 84 Lund, Sweden
4
Das AP, Saini S, Tyagi S, Chaudhary N, Agarwal SM. Elucidation of Increased Cervical Cancer Risk Due to Polymorphisms in XRCC1 (R399Q and R194W), ERCC5 (D1104H), and NQO1 (P187S). Reprod Sci 2023; 30:1118-1132. [PMID: 36195778; DOI: 10.1007/s43032-022-01096-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 09/22/2022] [Indexed: 10/10/2022]
Abstract
Genetic variations like single nucleotide polymorphisms (SNPs) are associated with cervical carcinogenesis. In this study, SNPs have been identified that contribute toward changes in the function and stability of the proteins and show association with cervical cancer. Initially, literature mining identified 114 protein-coding polymorphisms with population-based evidence in cervical cancer. Subsequently, the functional assessment was performed using sequence-dependent tools, and thereafter, protein stability was analyzed using sequence and structural data. Twenty-three non-synonymous SNPs (nsSNPs) found to be damaging and destabilizing were then analyzed to check their risk association at the population level. The meta-analysis indicated that polymorphisms in DNA damage repair genes XRCC1 (rs25487 and rs1799782), ERCC5 (rs17655), and oxidative stress-related gene NQO1 (rs1800566) are significantly associated with increased cervical cancer risk. The XRCC1 rs25487 and rs1799782 polymorphisms showed the highest risk of cervical cancer in the homozygous model (odds ratio (OR) = 1.85, 95% confidence interval (CI) = 1.17-2.92, p = 0.01) and the recessive model (OR = 1.81, 95% CI = 1.01-3.24, p = 0.04), respectively. Similarly, the rs17655 polymorphism of ERCC5 and the rs1800566 polymorphism of NQO1 showed the highest pooled OR in the homozygous (OR = 1.70, 95% CI = 1.32-2.19, p = 0.00004) and heterozygous (OR = 1.3, 95% CI = 1.06-1.58, p = 0.01) models, respectively. Thus, in this study, a comprehensive collection of nsSNPs was collated and assessed, leading to the identification of polymorphisms in DNA damage repair and oxidative stress-related genes that destabilize the protein and show an increased risk of cervical cancer.
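For readers unfamiliar with the statistics reported above, the following Python sketch shows how an odds ratio and its 95% confidence interval are commonly obtained from a single 2x2 case-control table, using the standard Wald interval on the log scale. It is an illustration only: the counts are hypothetical, the function name is an assumption, and the sketch does not perform the pooling across studies (fixed- or random-effects weighting) that a meta-analysis such as this one requires.

```python
import math


def odds_ratio_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Odds ratio with a Wald confidence interval (z=1.96 gives about 95%).

    For a homozygous genetic model, "exposed" would be carriers of two
    variant alleles and "unexposed" carriers of two reference alleles.
    All four counts must be non-zero.
    """
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / case_exposed + 1 / case_unexposed + 1 / ctrl_exposed + 1 / ctrl_unexposed
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(40, 60, 25, 75)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval whose lower bound stays above 1, as in the values quoted in the abstract, is what supports the statement of increased risk.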
Affiliation(s)
- Agneesh Pratim Das
  - Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida, 201301, Uttar Pradesh, India
- Sandeep Saini
  - Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida, 201301, Uttar Pradesh, India
- Shrishty Tyagi
  - Multanimal Modi College, CCS University, Modinagar, 201204, India
- Nisha Chaudhary
  - Multanimal Modi College, CCS University, Modinagar, 201204, India
- Subhash Mohan Agarwal
  - Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida, 201301, Uttar Pradesh, India
5
Nithiyanandam S, Sangaraju VK, Manavalan B, Lee G. Computational prediction of protein folding rate using structural parameters and network centrality measures. Comput Biol Med 2023; 155:106436. [PMID: 36848800; DOI: 10.1016/j.compbiomed.2022.106436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 11/28/2022] [Accepted: 12/13/2022] [Indexed: 02/17/2023]
Abstract
Protein folding is a complex physicochemical process whereby a polymer of amino acids samples numerous conformations in its unfolded state before settling on an essentially unique native three-dimensional (3D) structure. To understand this process, several theoretical studies have used a set of 3D structures, identified different structural parameters, and analyzed their relationships using the natural logarithmic protein folding rate (ln(kf)). Unfortunately, these structural parameters are specific to a small set of proteins that are not capable of accurately predicting ln(kf) for both two-state (TS) and non-two-state (NTS) proteins. To overcome the limitations of the statistical approach, a few machine learning (ML)-based models have been proposed using limited training data. However, none of these methods can explain plausible folding mechanisms. In this study, we evaluated the predictive capabilities of ten different ML algorithms using eight different structural parameters and five different network centrality measures based on newly constructed datasets. In comparison to the other nine regressors, support vector machine was found to be the most appropriate for predicting ln(kf) with mean absolute differences of 1.856, 1.55, and 1.745 for the TS, NTS, and combined datasets, respectively. Furthermore, combining structural parameters and network centrality measures improves the prediction performance compared to individual parameters, indicating that multiple factors are involved in the folding process.
Affiliation(s)
- Saraswathy Nithiyanandam
  - Department of Molecular Science and Technology, Ajou University, 206 World Cup-ro, Suwon, 16499, South Korea
- Vinoth Kumar Sangaraju
  - Department of Physiology, Ajou University School of Medicine, 206 World Cup-ro, Suwon, 16499, South Korea
- Balachandran Manavalan
  - Department of Physiology, Ajou University School of Medicine, 206 World Cup-ro, Suwon, 16499, South Korea
- Gwang Lee
  - Department of Molecular Science and Technology, Ajou University, 206 World Cup-ro, Suwon, 16499, South Korea
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, South Korea
6
FRTpred: A novel approach for accurate prediction of protein folding rate and type. Comput Biol Med 2022; 149:105911. [DOI: 10.1016/j.compbiomed.2022.105911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Revised: 07/08/2022] [Accepted: 07/23/2022] [Indexed: 11/20/2022]
7
Karaiyan P, Chang CCH, Chan ES, Tey BT, Ramanan RN, Ooi CW. In silico screening and heterologous expression of soluble dimethyl sulfide monooxygenases of microbial origin in Escherichia coli. Appl Microbiol Biotechnol 2022; 106:4523-4537. [PMID: 35713659; PMCID: PMC9259527; DOI: 10.1007/s00253-022-12008-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Revised: 05/30/2022] [Accepted: 06/01/2022] [Indexed: 11/28/2022]
Abstract
Sequence-based screening has been widely applied in the discovery of novel microbial enzymes. However, the majority of the sequences in genomic databases were annotated using computational approaches and lack experimental characterization. Hence, success in obtaining functional biocatalysts with improved characteristics requires an efficient screening method that considers a wide array of factors. Recombinant expression of microbial enzymes is often hampered by the undesirable formation of inclusion bodies. Here, we present a systematic in silico screening method to identify proteins that are expressible in soluble form and have the desired biological properties. The screening approach was adopted in the recombinant expression of dimethyl sulfide (DMS) monooxygenase in Escherichia coli. DMS monooxygenase, a two-component enzyme consisting of DmoA and DmoB subunits, was used as a model protein. The success rate of producing soluble and active DmoA was 71% (5 out of 7 genes). Interestingly, the soluble recombinant DmoA enzymes exhibited NADH:FMN oxidoreductase activity in the absence of DmoB (the second subunit) and of the cofactor FMN, suggesting that DmoA is also an oxidoreductase. DmoA originating from Janthinobacterium sp. AD80 showed the maximum NADH oxidation activity (maximum reaction rate: 6.6 µM/min; specific activity: 133 µM/min/mg). This novel finding may allow DmoA to be used as an oxidoreductase biocatalyst for various industrial applications. The in silico gene screening methodology established in this study can increase the success rate of producing soluble and functional enzymes while avoiding the laborious trial and error involved in screening a large pool of available genes.
Key points
- A systematic gene screening method was demonstrated.
- DmoA is also an oxidoreductase capable of oxidizing NADH and reducing FMN.
- DmoA oxidizes NADH in the absence of external FMN.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00253-022-12008-8.
Affiliation(s)
- Prasanth Karaiyan
  - Chemical Engineering Discipline, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
- Catherine Ching Han Chang
  - Arkema Thiochemicals Sdn. Bhd., Jalan PJU 1A/7A OASIS Ara Damansara, 47301, Petaling Jaya, Selangor Darul Ehsan, Malaysia
- Eng-Seng Chan
  - Chemical Engineering Discipline, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
- Beng Ti Tey
  - Chemical Engineering Discipline, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
  - Advanced Engineering Platform, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
- Ramakrishnan Nagasundara Ramanan
  - Chemical Engineering Discipline, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
  - Arkema Thiochemicals Sdn. Bhd., Jalan PJU 1A/7A OASIS Ara Damansara, 47301, Petaling Jaya, Selangor Darul Ehsan, Malaysia
- Chien Wei Ooi
  - Chemical Engineering Discipline, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
  - Advanced Engineering Platform, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor, Malaysia
8
Das AP, Saini S, Agarwal SM. A comprehensive meta-analysis of non-coding polymorphisms associated with precancerous lesions and cervical cancer. Genomics 2022; 114:110323. [PMID: 35227837; DOI: 10.1016/j.ygeno.2022.110323] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/11/2021] [Revised: 02/15/2022] [Accepted: 02/22/2022] [Indexed: 01/14/2023]
Abstract
OBJECTIVES: To study the risk associated with polymorphisms present in the non-coding regions of genes related to cervical cancer.
METHODS: The PubMed database was extensively searched using text-mining techniques to identify literature reporting associations between single nucleotide polymorphisms and cervical cancer. Case-control studies published up to June 2020 were considered for the meta-analysis if they fulfilled the selection criteria. The polymorphisms within each case-control study were checked for the presence of genotype data and then divided into groups based on the precancerous and cancerous conditions of the cervix. Odds ratios and 95% confidence intervals (CI) were used to study the effects of polymorphisms under different genetic models (allele, dominant, recessive, heterozygous and homozygous). Heterogeneity, publication bias, and statistical significance (p-value) were also checked.
RESULTS: 120 papers covering 48 unique non-coding SNPs, with data on 37,123 cases and 39,641 controls, were considered for the meta-analysis. The genotype data were categorised into Cancer, Precancer and "Cancer + Precancer" groups, for 43, 8 and 11 SNPs respectively. The meta-analysis identified 21 and 1 SNPs as significant in the Cancer and "Cancer + Precancer" groups, respectively. Among all the polymorphisms, rs1143627 (IL1B), rs1800795 (IL6), rs1800871 (IL10), rs568408 (IL12A), rs3312227 (IL12B), rs2275913 (IL17A), rs5742909 (CTLA4), rs1800629 (TNFα), and rs4646903 (CYP1A1) were found to increase the risk of cervical cancer in at least three of the five genetic models.
CONCLUSION: We identified potential non-coding SNPs corresponding to various cytokines, such as interleukins (ILs), tumor necrosis factor (TNF) and interferon (IFN), and to other immune-related genes, such as toll-like receptor (TLR), cytotoxic T-lymphocyte associated protein (CTLA) and matrix metalloproteinase (MMP), as significant with increased pooled OR in this meta-analysis, pointing to a risk association of immune-related genes in cervical carcinogenesis.
Affiliation(s)
- Agneesh Pratim Das
  - Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida 201301, India
- Sandeep Saini
  - Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida 201301, India
- Subhash M Agarwal
  - Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida 201301, India
9
Merlotti A, Menichetti G, Fariselli P, Capriotti E, Remondini D. Network-based strategies for protein characterization. Adv Protein Chem Struct Biol 2021; 127:217-248. [PMID: 34340768; DOI: 10.1016/bs.apcsb.2021.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Protein structure characterization is fundamental to understanding protein properties, such as the folding process and protein resistance to thermal stress, and to unveiling organism pathologies (e.g., prion disease). In this chapter, we provide an overview of how the spectral properties of the networks reconstructed from the Protein Contact Map (PCM) can be used to generate informative observables. As a specific case study, we apply two different network approaches to an example protein dataset, with the aims of discriminating the protein folding state and reconstructing the protein 3D structure.
Affiliation(s)
- Giulia Menichetti
  - Center for Complex Network Research, Department of Physics, Northeastern University, Boston, MA, United States
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Turin, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Daniel Remondini
  - Department of Physics and Astronomy, University of Bologna, Bologna, Italy
10
Del Amparo R, Branco C, Arenas J, Vicens A, Arenas M. Analysis of selection in protein-coding sequences accounting for common biases. Brief Bioinform 2021; 22:6105943. [PMID: 33479739; DOI: 10.1093/bib/bbaa431] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/16/2020] [Revised: 12/17/2020] [Accepted: 12/22/2020] [Indexed: 12/16/2022] Open
Abstract
The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others, optimizing the subsequent protein stability and activity. The analysis of selection in this type of genetic data is broadly performed with the metric nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most of the well-established methodologies to estimate this metric make crucial assumptions, such as lack of recombination or invariable codon frequencies along genes, which can bias the estimation. Here, we review the most relevant biases in the dN/dS estimation and provide a detailed guide to estimate this metric using state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS emphasizing the importance of considering complementary biological information such as the role of the observed substitutions on the stability and function of proteins. This review is oriented to help evolutionary biologists that aim to accurately estimate selection in protein-coding sequences.
Affiliation(s)
- Roberto Del Amparo
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Catarina Branco
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Jesús Arenas
  - Unit of Microbiology and Immunology, University of Zaragoza, 50013 Zaragoza, Spain
- Alberto Vicens
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
11
Li Y, Zhang Y, Lv J. An Effective Cumulative Torsion Angles Model for Prediction of Protein Folding Rates. Protein Pept Lett 2020; 27:321-328. [PMID: 31612815; DOI: 10.2174/0929866526666191014152207] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/24/2019] [Revised: 06/07/2019] [Accepted: 06/29/2019] [Indexed: 02/05/2023]
Abstract
BACKGROUND: Protein folding rate is mainly determined by the size of the conformational space to search, which in turn is dictated by factors such as the size, structure and amino-acid sequence of a protein. It is important to integrate these factors effectively to form a more precise description of the conformational space, but there is no general paradigm for doing so beyond some intuitions and empirical rules. Therefore, at the present stage, predictions of the folding rate can be improved by finding new factors, which also gives some insight into this question.
OBJECTIVE: To propose a new parameter that describes the size of the conformational space in order to improve the prediction accuracy of protein folding rate.
METHODS: Based on the optimal set of amino acids in a protein, an effective cumulative backbone torsion angle (CBTAeff) was proposed to describe the size of the conformational space. A linear regression model was used to predict protein folding rate with CBTAeff as a parameter. The degree of correlation was described by the coefficient of determination and the mean absolute error (MAE) between the predicted folding rates and experimental observations.
RESULTS: The model achieved a high correlation (coefficient of determination of 0.70 and MAE of 1.88) between the logarithm of the experimental folding rates and (CBTAeff)^0.5 over 112 two- and multi-state folding proteins.
CONCLUSION: The remarkable performance of our simplistic model demonstrates that CBTA based on the optimal set is a major determinant of the conformational space of natural proteins.
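The regression set-up described in this abstract can be sketched in a few lines of Python: an ordinary least-squares line is fitted between the square root of a size descriptor and the logarithm of the folding rate, and the fit is scored with the coefficient of determination and the MAE. This is a sketch under stated assumptions: the CBTAeff and ln(kf) values below are hypothetical, the helper names are invented for illustration, and the way CBTAeff itself is computed from the optimal amino-acid set is not given in the abstract and is not reproduced here.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared_and_mae(ys, preds):
    """Coefficient of determination and mean absolute error."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    mae = sum(abs(y - p) for y, p in zip(ys, preds)) / len(ys)
    return 1 - ss_res / ss_tot, mae


# Hypothetical descriptor values and log folding rates (not from the paper).
cbta_eff = [4.0, 9.0, 16.0, 25.0]
ln_kf = [8.0, 6.0, 4.5, 1.5]

x = [math.sqrt(c) for c in cbta_eff]          # the (CBTAeff)^0.5 predictor
slope, intercept = fit_line(x, ln_kf)
predicted = [slope * xi + intercept for xi in x]
r2, mae = r_squared_and_mae(ln_kf, predicted)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}, MAE={mae:.3f}")
```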
Affiliation(s)
- Yanru Li
  - Department of Physics, College of Science, Inner Mongolia University of Technology, Hohhot, China
- Ying Zhang
  - Department of Physics, College of Science, Inner Mongolia University of Technology, Hohhot, China
- Jun Lv
  - Department of Physics, College of Science, Inner Mongolia University of Technology, Hohhot, China
12
Stepwise optimization of recombinant protein production in Escherichia coli utilizing computational and experimental approaches. Appl Microbiol Biotechnol 2020; 104:3253-3266. [DOI: 10.1007/s00253-020-10454-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/30/2019] [Revised: 01/28/2020] [Accepted: 02/07/2020] [Indexed: 12/14/2022]
13
Benevenuta S, Fariselli P. On the Upper Bounds of the Real-Valued Predictions. Bioinform Biol Insights 2019; 13:1177932219871263. [PMID: 31488948; PMCID: PMC6710671; DOI: 10.1177/1177932219871263] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/23/2019] [Accepted: 07/31/2019] [Indexed: 11/16/2022] Open
Abstract
Predictions are fundamental in science because they allow theories to be tested and falsified. Predictions are ubiquitous in bioinformatics and also help when no first principles are available. Predictions can be divided into classification (when a label is associated with a given input) and regression (when a real value is assigned). Different scores are used to assess the performance of regression predictors; the most widely adopted include the mean square error, the Pearson correlation (ρ), and the coefficient of determination (R²). The common conception related to the last 2 indices is that the theoretical upper bound is 1; however, their upper bounds depend both on the experimental uncertainty and the distribution of target variables. A narrow distribution of the target variable may induce a low upper bound. The knowledge of the theoretical upper bounds also has 2 practical implications: (1) comparing different predictors tested on different data sets may lead to a wrong ranking and (2) performances higher than the theoretical upper bounds indicate overtraining and improper usage of the learning data sets. Here, we derive the upper bound for the coefficient of determination, showing that it is lower than that of the square of the Pearson correlation. We provide analytical equations for both indices that can be used to evaluate the upper bound of the predictions when the experimental uncertainty and the target distribution are available. Our considerations are general and applicable to all regression predictors.
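The three regression scores named in this abstract can be computed with a short Python sketch. It only illustrates the definitions and one well-known relationship between them, namely that the coefficient of determination of a set of predictions never exceeds the squared Pearson correlation, because R² also penalises errors of scale and offset. It does not reproduce the analytical upper bounds derived in the paper, which depend on experimental uncertainty and the target distribution. The data and function names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def mse(y_true, y_pred):
    """Mean square error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    my, mp = mean(y_true), mean(y_pred)
    cov = sum((y - my) * (p - mp) for y, p in zip(y_true, y_pred))
    var_y = sum((y - my) ** 2 for y in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_y * var_p)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - my) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets and predictions: perfectly correlated but mis-scaled.
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = [2.0, 4.0, 6.0, 8.0, 10.0]

rho = pearson(targets, predictions)
print(f"MSE={mse(targets, predictions):.1f}, rho={rho:.2f}, "
      f"rho^2={rho ** 2:.2f}, R^2={r2(targets, predictions):.2f}")
```

In this toy case the correlation is perfect while R² is strongly negative, which is one reason the two indices should not be treated as interchangeable when ranking predictors.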
Affiliation(s)
- Piero Fariselli
  - Department of Medical Sciences, University of Turin, Turin, Italy
14
Prediction of change in protein unfolding rates upon point mutations in two state proteins. Biochim Biophys Acta Proteins Proteom 2016; 1864:1104-1109. [DOI: 10.1016/j.bbapap.2016.06.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/26/2016] [Revised: 05/05/2016] [Accepted: 06/01/2016] [Indexed: 11/23/2022]
15
Network measures for protein folding state discrimination. Sci Rep 2016; 6:30367. [PMID: 27464796; PMCID: PMC4964642; DOI: 10.1038/srep30367] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/06/2016] [Accepted: 06/24/2016] [Indexed: 11/09/2022] Open
Abstract
Proteins fold via two-state or multi-state kinetic mechanisms, but to date there is no first-principles model that explains this difference in behavior. We exploit the network properties of protein structures by introducing novel observables to address the problem of classifying the different types of folding kinetics. These observables have a plain physical meaning, in terms of vibrational modes, possible configurations compatible with the native protein structure, and folding cooperativity. The relevance of these observables is supported by a classification performance of up to 90%, even with simple classifiers such as discriminant analysis.
16
Chang CCH, Li C, Webb GI, Tey B, Song J, Ramanan RN. Periscope: quantitative prediction of soluble protein expression in the periplasm of Escherichia coli. Sci Rep 2016; 6:21844. [PMID: 26931649; PMCID: PMC4773868; DOI: 10.1038/srep21844] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 11/02/2015] [Accepted: 01/28/2016] [Indexed: 12/20/2022] Open
Abstract
Periplasmic expression of soluble proteins in Escherichia coli not only offers a much-simplified downstream purification process, but also enhances the probability of obtaining correctly folded and biologically active proteins. Different combinations of signal peptides and target proteins lead to different soluble protein expression levels, ranging from negligible to several grams per litre. Accurate algorithms for rational selection of promising candidates can serve as a powerful tool to complement current trial-and-error approaches. Accordingly, proteomics studies can be conducted with greater efficiency and cost-effectiveness. Here, we developed a predictor with a two-stage architecture to predict the real-valued expression level of a target protein in the periplasm. The output of the first-stage support vector machine (SVM) classifier determines which second-stage support vector regression (SVR) model is used. When tested on an independent test dataset, the predictor achieved an overall prediction accuracy of 78% and a Pearson's correlation coefficient (PCC) of 0.77. We further illustrate the relative importance of various features with respect to different models. The results indicate that the occurrence of the dipeptide glutamine and aspartic acid is the most important feature for the classification model. Finally, we provide access to the implemented predictor through the Periscope webserver, freely accessible at http://lightning.med.monash.edu/periscope/.
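The two-stage architecture described in this abstract, in which a classifier's output selects the regressor that produces the final value, can be sketched generically in Python. The sketch below is not the Periscope implementation: the class labels ("low", "high"), the toy threshold classifier and the toy linear regressors are all hypothetical stand-ins for the trained SVM and SVR models, whose features and parameters are not given in the abstract.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]


def two_stage_predict(
    features: Features,
    classifier: Callable[[Features], str],
    regressors: Dict[str, Callable[[Features], float]],
) -> float:
    """Stage 1 assigns a class label; stage 2 uses the regressor for that label."""
    label = classifier(features)
    return regressors[label](features)


# Hypothetical stand-ins for the trained first- and second-stage models.
def toy_classifier(features: Features) -> str:
    return "high" if sum(features) > 1.0 else "low"


toy_regressors = {
    "low": lambda f: 0.1 * sum(f),          # regressor used for the "low" class
    "high": lambda f: 1.0 + 0.5 * sum(f),   # regressor used for the "high" class
}

print(two_stage_predict([0.2, 0.3], toy_classifier, toy_regressors))
print(two_stage_predict([1.0, 1.0], toy_classifier, toy_regressors))
```

The design choice illustrated here is that each second-stage model only has to fit one expression regime, at the cost of depending on the first stage routing each input correctly.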
Affiliation(s)
- Catherine Ching Han Chang
  - Chemical Engineering Discipline, School of Engineering, Monash University, Jalan Lagoon Selatan 46150, Bandar Sunway, Selangor, Malaysia
  - Department of Biochemistry and Molecular Biology, Monash University, Melbourne VIC 3800, Australia
- Chen Li
  - Department of Biochemistry and Molecular Biology, Monash University, Melbourne VIC 3800, Australia
- Geoffrey I. Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne VIC 3800, Australia
- BengTi Tey
  - Chemical Engineering Discipline, School of Engineering, Monash University, Jalan Lagoon Selatan 46150, Bandar Sunway, Selangor, Malaysia
  - Advanced Engineering Platform, School of Engineering, Monash University, Jalan Lagoon Selatan 46150, Bandar Sunway, Selangor, Malaysia
- Jiangning Song
  - Department of Biochemistry and Molecular Biology, Monash University, Melbourne VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne VIC 3800, Australia
  - National Engineering Laboratory for Industrial Enzymes, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- Ramakrishnan Nagasundara Ramanan
  - Chemical Engineering Discipline, School of Engineering, Monash University, Jalan Lagoon Selatan 46150, Bandar Sunway, Selangor, Malaysia
  - Advanced Engineering Platform, School of Engineering, Monash University, Jalan Lagoon Selatan 46150, Bandar Sunway, Selangor, Malaysia
  - School of Chemistry, Monash University, Melbourne VIC 3800, Australia
|
17
|
Li C, Ching Han Chang C, Nagel J, Porebski BT, Hayashida M, Akutsu T, Song J, Buckle AM. Critical evaluation of in silico methods for prediction of coiled-coil domains in proteins. Brief Bioinform 2016; 17:270-82. [PMID: 26177815 PMCID: PMC6078162 DOI: 10.1093/bib/bbv047] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/10/2015] [Revised: 05/29/2015] [Indexed: 12/19/2022]
Abstract
Coiled-coils are bundles of helices coiled together like strands of a rope. It has been estimated that nearly 3% of protein-encoding regions of genes harbour coiled-coil domains (CCDs). Experimental studies have confirmed that CCDs play a fundamental role in subcellular infrastructure and in controlling trafficking in eukaryotic cells. Given the importance of coiled-coils, multiple bioinformatics tools have been developed to facilitate the systematic and high-throughput prediction of CCDs in proteins. In this article, we review and compare 12 sequence-based bioinformatics approaches and tools for coiled-coil prediction. These approaches can be categorized into two classes: coiled-coil detection and coiled-coil oligomeric state prediction. We evaluated and compared these methods in terms of their input/output, algorithm, prediction performance, validation methods and software utility. All the independent testing data sets are available at http://lightning.med.monash.edu/coiledcoil/. In addition, we conducted a case study of nine human polyglutamine (PolyQ) disease-related proteins and predicted CCDs and oligomeric states using various predictors. Prediction results for CCDs were highly variable among different predictors. Only two peptides from two proteins were confirmed to be CCDs by majority voting. Both domains were predicted to form dimeric coiled-coils using oligomeric state prediction. We anticipate that this comprehensive analysis will be an insightful resource for structural biologists with limited prior experience in bioinformatics tools, and for bioinformaticians who are interested in designing novel approaches for predicting coiled-coils and their oligomeric states.
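The majority-voting step mentioned in the abstract can be sketched as follows. This is an illustrative assumption about how such a vote might be tallied, not the procedure used in the article: the function name `majority_vote`, the predictor names, and the rule that a strict majority (more than half) is required are all hypothetical.

```python
from typing import Dict


def majority_vote(calls: Dict[str, bool]) -> bool:
    """Return True when strictly more than half of the predictors call a coiled-coil."""
    if not calls:
        raise ValueError("need at least one predictor call")
    positives = sum(calls.values())    # True counts as 1, False as 0
    return positives * 2 > len(calls)  # strict majority; a tie is rejected


# Hypothetical calls from three predictors for one candidate peptide.
example_calls = {"predictor_a": True, "predictor_b": True, "predictor_c": False}
```

With these example calls, two of three predictors agree, so the peptide would be accepted under the strict-majority rule. A tie, such as one positive call out of two, would be rejected under the same rule.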
|
18
|
Computational and experimental approaches to reveal the effects of single nucleotide polymorphisms with respect to disease diagnostics. Int J Mol Sci 2014; 15:9670-717. [PMID: 24886813 PMCID: PMC4100115 DOI: 10.3390/ijms15069670] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 04/08/2014] [Revised: 05/15/2014] [Accepted: 05/16/2014] [Indexed: 12/25/2022]
Abstract
DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules.
|