1
Xu R, Pan Q, Zhu G, Ye Y, Xin M, Wang Z, Wang S, Li W, Wei Y, Guo J, Zheng L. ThermoLink: Bridging disulfide bonds and enzyme thermostability through database construction and machine learning prediction. Protein Sci 2024; 33:e5097. [PMID: 39145402 PMCID: PMC11325166 DOI: 10.1002/pro.5097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Revised: 05/27/2024] [Accepted: 06/15/2024] [Indexed: 08/16/2024]
Abstract
Disulfide bonds, covalently formed by sulfur atoms in cysteine residues, play a crucial role in protein folding and structure stability. Considering their significance, artificial disulfide bonds are often introduced to enhance protein thermostability. Although an increasing number of tools can assist with this task, significant amounts of time and resources are often wasted owing to inadequate consideration. To enhance the accuracy and efficiency of designing disulfide bonds for protein thermostability improvement, we initially collected disulfide bond and protein thermostability data from extensive literature sources. Thereafter, we extracted various sequence- and structure-based features and constructed machine-learning models to predict whether disulfide bonds can improve protein thermostability. Among all models, the neighborhood context model based on the Adaboost-DT algorithm performed the best, yielding "area under the receiver operating characteristic curve" and accuracy scores of 0.773 and 0.714, respectively. Furthermore, we also found AlphaFold2 to exhibit high superiority in predicting disulfide bonds, and to some extent, the coevolutionary relationship between residue pairs potentially guided artificial disulfide bond design. Moreover, several mutants of imine reductase 89 (IR89) with artificially designed thermostable disulfide bonds were experimentally proven to be considerably efficient for substrate catalysis. The SS-bond data have been integrated into an online server, namely, ThermoLink, available at guolab.mpu.edu.mo/thermoLink.
Affiliation(s)
- Ran Xu
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Qican Pan
  - Zelixir Biotech Company Ltd, Shanghai, China
- Yilin Ye
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Minghui Xin
  - School of Physics, Shandong University, Jinan, China
- Zechen Wang
  - School of Physics, Shandong University, Jinan, China
- Sheng Wang
  - Zelixir Biotech Company Ltd, Shanghai, China
- Weifeng Li
  - School of Physics, Shandong University, Jinan, China
- Yanjie Wei
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Jingjing Guo
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Liangzhen Zheng
  - Zelixir Biotech Company Ltd, Shanghai, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
2
Kabir MWU, Alawad DM, Pokhrel P, Hoque MT. DRBpred: A sequence-based machine learning method to effectively predict DNA- and RNA-binding residues. Comput Biol Med 2024; 170:108081. [PMID: 38295475 PMCID: PMC10922697 DOI: 10.1016/j.compbiomed.2024.108081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 01/12/2024] [Accepted: 01/27/2024] [Indexed: 02/02/2024]
Abstract
DNA-binding and RNA-binding proteins are essential to an organism's normal life cycle. These proteins have diverse functions in various biological processes. DNA-binding proteins are crucial for DNA replication, transcription, repair, packaging, and gene expression. Likewise, RNA-binding proteins are essential for the post-transcriptional control of RNAs and RNA metabolism. Identifying DNA- and RNA-binding residues is essential for biological research and for understanding the pathogenesis of many diseases. However, most DNA-binding and RNA-binding proteins still need to be discovered. This research explored various properties of protein sequences, such as amino acid composition type, Position-Specific Scoring Matrix (PSSM) values of amino acids, Hidden Markov model (HMM) profiles, physicochemical properties, structural properties, torsion angles, and disorder regions. We utilized a sliding window technique to extract more information from a target residue's neighbors. We proposed an optimized Light Gradient Boosting Machine (LightGBM) method, named DRBpred, to predict DNA-binding and RNA-binding residues from the protein sequence. Compared to the state-of-the-art method, DRBpred improves Sensitivity, Matthews Correlation Coefficient (MCC), and AUC by 112.00%, 33.33%, and 6.49%, respectively, on the DNA-binding test set, and by 112.50%, 16.67%, and 7.46%, respectively, on the RNA-binding test set.
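The sliding window technique mentioned in the abstract can be illustrated with a minimal sketch. This is not the DRBpred implementation: the function name `sliding_windows`, the window half-width of 2, and the padding symbol `X` are all assumptions chosen for illustration. The sketch only shows how each target residue can be paired with its sequence neighbors before per-residue features are computed.

```python
from typing import List


def sliding_windows(sequence: str, half_width: int = 2, pad: str = "X") -> List[str]:
    """Return one window of length 2*half_width + 1 centred on each residue.

    Hypothetical helper: positions beyond the sequence ends are filled with
    the padding symbol so that every residue gets a full-length window.
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


# Toy five-residue sequence; windows[2] is centred on the third residue.
windows = sliding_windows("MKTAY", half_width=2)
```

In a real pipeline each window would be mapped to numeric features (for example PSSM rows of the neighboring residues) rather than kept as a string.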
Affiliation(s)
- Md Wasi Ul Kabir
  - Department of Computer Science, University of New Orleans, New Orleans, LA, USA.
- Duaa Mohammad Alawad
  - Department of Computer Science, University of New Orleans, New Orleans, LA, USA.
- Pujan Pokhrel
  - Department of Computer Science, University of New Orleans, New Orleans, LA, USA.
- Md Tamjidul Hoque
  - Department of Computer Science, University of New Orleans, New Orleans, LA, USA.
3
Zhang X, Chen H, Zhang X, Wang H, Tao L, He W, Li Q, Cheng O, Luo J, Man Y, Xiao Z, Fang W. Identification of essential tremor based on resting-state functional connectivity. Hum Brain Mapp 2023; 44:1407-1416. [PMID: 36326578 PMCID: PMC9921216 DOI: 10.1002/hbm.26124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 09/21/2022] [Accepted: 10/02/2022] [Indexed: 11/06/2022]
Abstract
Machine-learning algorithms are currently considered the most promising approach to reach a clinical diagnosis at the individual level. This study aimed to investigate whether whole-brain resting-state functional connectivity (RSFC) metrics combined with machine-learning algorithms could be used to distinguish essential tremor (ET) patients from healthy controls (HCs), and to further reveal ET-related brain network pathogenesis in order to establish potential diagnostic biomarkers. The RSFC metrics obtained from 127 ET patients and 120 HCs were used as input features; the Mann-Whitney U test and the least absolute shrinkage and selection operator (LASSO) were then applied to reduce feature dimensionality. Four machine-learning algorithms were adopted to distinguish ET patients from HCs. Accuracy, sensitivity, specificity and the area under the curve (AUC) were used to evaluate classification performance. The support vector machine, gradient boosting decision tree, random forest and Gaussian naïve Bayes algorithms achieved good classification performance, with accuracies of 82.8%, 79.4%, 78.9% and 72.4%, respectively. The most discriminative features were primarily located in the cerebello-thalamo-motor and non-motor circuits. Correlation analysis showed that two RSFC features were positively correlated with tremor frequency and four RSFC features were negatively correlated with tremor severity. The present study demonstrated that combining the RSFC matrices with multiple machine-learning algorithms could not only achieve high classification accuracy for discriminating ET patients from HCs but also help to reveal the potential brain network pathogenesis in ET.
Affiliation(s)
- Xueyan Zhang
  - Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Huiyue Chen
  - Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Xiaoyu Zhang
  - Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Hansheng Wang
  - Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Li Tao
  - Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Wanlin He
  - Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Qin Li
  - Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Oumei Cheng
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Jing Luo
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Yun Man
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Zheng Xiao
  - Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Weidong Fang
  - Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
4
Wang Y, Zhu X, Yang L, Hu X, He K, Yu C, Jiao S, Chen J, Guo R, Yang S. IDDLncLoc: Subcellular Localization of LncRNAs Based on a Framework for Imbalanced Data Distributions. Interdiscip Sci 2022; 14:409-420. [PMID: 35192174 DOI: 10.1007/s12539-021-00497-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/21/2021] [Revised: 12/16/2021] [Accepted: 12/20/2021] [Indexed: 06/14/2023]
Abstract
Long non-coding RNAs (lncRNAs) play a crucial role in many life processes of the cell, such as genetic markers, RNA splicing, signaling, and protein regulation. Considering that identifying an lncRNA's localization in the cell through experimental methods is complicated, hard to reproduce, and expensive, we propose a novel method named IDDLncLoc, which adopts an ensemble model to solve the problem of subcellular localization. In the proposed model, dinucleotide-based auto-cross covariance features, k-mer nucleotide composition features, and composition, transition, and distribution features are introduced to encode a raw RNA sequence as a vector. To screen out reliable features, feature selection through binomial distribution and recursive feature elimination is employed. Furthermore, strategies of oversampling in mini-batch, random sampling, and stacking ensemble are customized to overcome the problem of data imbalance on the benchmark dataset. Finally, compared with the latest methods, IDDLncLoc achieves an accuracy of 94.96% on the benchmark dataset, which is 2.59% higher than the best method, and the results further demonstrate that IDDLncLoc performs well on the subcellular localization of lncRNAs. In addition, a user-friendly web server is available at http://lncloc.club.
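As a rough illustration of one of the encodings named in the abstract, the sketch below computes k-mer nucleotide composition for an RNA sequence. It is not taken from IDDLncLoc: the function name `kmer_composition`, the default `k = 2`, the `ACGU` alphabet, and the choice to normalize counts to frequencies are assumptions made for this example.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_composition(rna: str, k: int = 2) -> Dict[str, float]:
    """Return the relative frequency of every possible k-mer over A, C, G, U.

    Hypothetical helper: DNA-style 'T' is converted to 'U', and k-mers that
    never occur get a frequency of 0.0 so the vector length is always 4**k.
    """
    alphabet = "ACGU"
    rna = rna.upper().replace("T", "U")
    total = len(rna) - k + 1  # number of overlapping k-mers in the sequence
    counts = Counter(rna[i:i + k] for i in range(max(total, 0)))
    return {
        "".join(p): (counts["".join(p)] / total if total > 0 else 0.0)
        for p in product(alphabet, repeat=k)
    }


# Toy sequence with six overlapping 2-mers: AU, UG, GG, GC, CU, UA.
features = kmer_composition("AUGGCUA", k=2)
```

The resulting dictionary has a fixed set of keys regardless of sequence length, which is what makes it usable as a fixed-size input vector for a classifier.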
Affiliation(s)
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Xiaopeng Zhu
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Lili Yang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
  - Department of Obstetrics, The First Hospital of Jilin University, Changchun, China
- Xuemei Hu
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Kai He
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Cuinan Yu
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Shaoqing Jiao
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Jiali Chen
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Rui Guo
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Sen Yang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
5
Mapes NJ, Rodriguez C, Chowriappa P, Dua S. Local Similarity Matrix for Cysteine Disulfide Connectivity Prediction from Protein Sequences. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1276-1289. [PMID: 30640622 DOI: 10.1109/tcbb.2019.2892441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
Accurately predicting three dimensional protein structures from sequences would present us with targets for drugs via molecular dynamics that would treat cancer, viral infections, and neurological diseases. These treatments would have a far reaching impact to our economy, quality of life, and society. The goal of this research was to build a data mining framework to predict cysteine connectivity in proteins from the sequence and oxidation state of cysteines. Accurately predicting the cysteine bonding configuration improves the TM-Score, a quantitative measurement of protein structure prediction accuracy. We provided state of the art Qp and Qc on the PDBCYS and IVD-54 Datasets. Furthermore, we have produced a Local Similarity Matrix that compares favorably to the default PSSMs generated from PSI-Blast in a statistically significant way. Our Qp for SP39, PDBCYS, and IVD-54 were 90.6, 80.6, and 68.5, respectively.
6
Dehzangi A, López Y, Taherzadeh G, Sharma A, Tsunoda T. SumSec: Accurate Prediction of Sumoylation Sites Using Predicted Secondary Structure. Molecules 2018; 23:E3260. [PMID: 30544729 PMCID: PMC6320791 DOI: 10.3390/molecules23123260] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/28/2018] [Revised: 11/30/2018] [Accepted: 12/05/2018] [Indexed: 12/13/2022]
Abstract
Post-translational modification (PTM) is defined as the modification of amino acids along protein sequences after the translation process. These modifications significantly impact the functioning of proteins. Therefore, a comprehensive understanding of the underlying mechanism of PTMs is critical in studying the biological roles of proteins. Among a wide range of PTMs, sumoylation is one of the most important modifications due to its known cellular functions, which include transcriptional regulation, protein stability, and protein subcellular localization. Despite its importance, determining sumoylation sites via experimental methods is time-consuming and costly. This has led to a great demand for the development of fast computational methods able to accurately determine sumoylation sites in proteins. In this study, we present a new machine learning-based method for predicting sumoylation sites called SumSec. To do this, we employed the predicted secondary structure of amino acids to extract two types of structural features from neighboring amino acids along the protein sequence, which have never been used for this task. As a result, our proposed method is able to enhance the sumoylation site prediction task, outperforming previously proposed methods in the literature. SumSec demonstrated high sensitivity (0.91), accuracy (0.94) and MCC (0.88). The prediction accuracy achieved in this study is 21% better than those reported in previous studies. The script and extracted features are publicly available at: https://github.com/YosvanyLopez/SumSec.
Affiliation(s)
- Abdollah Dehzangi
  - Department of Computer Science, Morgan State University, Baltimore, MD 21251, USA.
- Yosvany López
  - Genesis Institute of Genetic Research, Genesis Healthcare Co., Tokyo 150-6015, Japan.
- Ghazaleh Taherzadeh
  - School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia.
- Alok Sharma
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane 4111, Australia.
  - School of Engineering & Physics, University of the South Pacific, Suva, Fiji.
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.
  - CREST, JST, Tokyo 102-0076, Japan.
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan.
- Tatsuhiko Tsunoda
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.
  - CREST, JST, Tokyo 102-0076, Japan.
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan.
7
Zhang YZ, Shen HB. Signal-3L 2.0: A Hierarchical Mixture Model for Enhancing Protein Signal Peptide Prediction by Incorporating Residue-Domain Cross-Level Features. J Chem Inf Model 2017; 57:988-999. [PMID: 28298081 DOI: 10.1021/acs.jcim.6b00484] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/04/2023]
Abstract
Signal peptides play key roles in targeting and translocation of integral membrane proteins and secretory proteins. However, signal peptides present several challenges for automatic prediction methods. One challenge is that it is difficult to discriminate signal peptides from transmembrane helices, as both the H-region of the peptides and the transmembrane helices are hydrophobic. Another is that it is difficult to identify the cleavage site between signal peptides and mature proteins, as cleavage motifs or patterns are still unclear for most proteins. To solve these problems and further enhance automatic signal peptide recognition, we report a new Signal-3L 2.0 predictor. Our new model is constructed with a hierarchical protocol, where it first determines the existence of a signal peptide. For this, we propose a new residue-domain cross-level feature-driven approach, and we demonstrate that protein functional domain information is particularly useful for discriminating between the transmembrane helices and signal peptides as they perform different functions. Next, in order to accurately identify the unique signal peptide cleavage sites along the sequence, we designed a top-down approach where a subset of potential cleavage sites are screened using statistical learning rules, and then a final unique site is selected according to its evolution conservation score. Because this mixed approach utilizes both statistical learning and evolution analysis, it shows a strong capacity for recognizing cleavage sites. Signal-3L 2.0 has been benchmarked on multiple data sets, and the experimental results have demonstrated its accuracy. The online server is available at www.csbio.sjtu.edu.cn/bioinf/Signal-3L/ .
Affiliation(s)
- Yi-Ze Zhang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
8
Mining Chemical Activity Status from High-Throughput Screening Assays. PLoS One 2015; 10:e0144426. [PMID: 26658480 PMCID: PMC4682830 DOI: 10.1371/journal.pone.0144426] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/07/2015] [Accepted: 11/18/2015] [Indexed: 01/20/2023]
Abstract
High-throughput screening (HTS) experiments provide a valuable resource that reports biological activity of numerous chemical compounds relative to their molecular targets. Building computational models that accurately predict such activity status (active vs. inactive) in specific assays is a challenging task given the large volume of data and frequently small proportion of active compounds relative to the inactive ones. We developed a method, DRAMOTE, to predict activity status of chemical compounds in HTP activity assays. For a class of HTP assays, our method achieves considerably better results than the current state-of-the-art-solutions. We achieved this by modification of a minority oversampling technique. To demonstrate that DRAMOTE is performing better than the other methods, we performed a comprehensive comparison analysis with several other methods and evaluated them on data from 11 PubChem assays through 1,350 experiments that involved approximately 500,000 interactions between chemicals and their target proteins. As an example of potential use, we applied DRAMOTE to develop robust models for predicting FDA approved drugs that have high probability to interact with the thyroid stimulating hormone receptor (TSHR) in humans. Our findings are further partially and indirectly supported by 3D docking results and literature information. The results based on approximately 500,000 interactions suggest that DRAMOTE has performed the best and that it can be used for developing robust virtual screening models. The datasets and implementation of all solutions are available as a MATLAB toolbox online at www.cbrc.kaust.edu.sa/dramote and can be found on Figshare.
9
Márquez-Chamorro AE, Aguilar-Ruiz JS. Soft Computing Methods for Disulfide Connectivity Prediction. Evol Bioinform Online 2015; 11:223-9. [PMID: 26523116 PMCID: PMC4620934 DOI: 10.4137/ebo.s25349] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/02/2015] [Revised: 08/24/2015] [Accepted: 08/31/2015] [Indexed: 11/26/2022]
Abstract
The problem of protein structure prediction (PSP) is one of the main challenges in structural bioinformatics. To tackle this problem, PSP can be divided into several subproblems. One of these subproblems is the prediction of disulfide bonds. The disulfide connectivity prediction problem consists in identifying which nonadjacent cysteines would be cross-linked from all possible candidates. Determining the disulfide bond connectivity between the cysteines of a protein is desirable as a previous step of the 3D PSP, as the protein conformational search space is highly reduced. The most representative soft computing approaches for the disulfide bonds connectivity prediction problem of the last decade are summarized in this paper. Certain aspects, such as the different methodologies based on soft computing approaches (artificial neural network or support vector machine) or features of the algorithms, are used for the classification of these methods.
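To make the size of the disulfide connectivity problem concrete, the following sketch enumerates every way of pairing up a set of bonded cysteines. It is an illustration only, not a method from the reviewed literature: the function name `connectivity_patterns` and the example residue positions are hypothetical, and the sketch assumes that all listed cysteines are bonded and ignores any constraint on which pairs are allowed.

```python
from typing import Iterator, List, Tuple

Bond = Tuple[int, int]


def connectivity_patterns(cysteines: List[int]) -> Iterator[List[Bond]]:
    """Yield every perfect pairing of the given cysteine positions.

    Assumes an even number of cysteines, all of them bonded; an odd-length
    input yields no patterns because one cysteine is always left unpaired.
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        # Pair the first cysteine with this partner, then pair up the rest.
        for tail in connectivity_patterns(remaining):
            yield [(first, partner)] + tail


# Four bonded cysteines at hypothetical positions give three candidate patterns.
patterns = list(connectivity_patterns([5, 14, 30, 38]))
```

The number of candidates grows as (2n)! / (2^n n!) for n bonds (3, 15, 105, 945, ...), which is why a predictor that ranks or prunes candidate pairs is useful before 3D structure modeling.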
10
Xiao F, Shen HB. Prediction Enhancement of Residue Real-Value Relative Accessible Surface Area in Transmembrane Helical Proteins by Solving the Output Preference Problem of Machine Learning-Based Predictors. J Chem Inf Model 2015; 55:2464-74. [DOI: 10.1021/acs.jcim.5b00246] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023]
Affiliation(s)
- Feng Xiao
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
11
Yang J, He BJ, Jang R, Zhang Y, Shen HB. Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins. Bioinformatics 2015; 31:3773-81. [PMID: 26254435 DOI: 10.1093/bioinformatics/btv459] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 02/11/2015] [Accepted: 08/02/2015] [Indexed: 01/19/2023]
Abstract
MOTIVATION: Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g., >3 bonds, is too low to effectively assist structure assembly simulations.
RESULTS: We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins.
AVAILABILITY AND IMPLEMENTATION: http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/
CONTACT: zhng@umich.edu or hbshen@sjtu.edu.cn.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jing Yang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Bao-Ji He
  - State Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China, Department of Computational Medicine and Bioinformatics and
- Richard Jang
  - Department of Computational Medicine and Bioinformatics and
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics and Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China, Department of Computational Medicine and Bioinformatics and
12
Briefing in application of machine learning methods in ion channel prediction. ScientificWorldJournal 2015; 2015:945927. [PMID: 25961077 PMCID: PMC4415473 DOI: 10.1155/2015/945927] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/31/2014] [Accepted: 09/11/2014] [Indexed: 01/09/2023]
Abstract
In cells, ion channels are one of the most important classes of membrane proteins which allow inorganic ions to move across the membrane. A wide range of biological processes are involved and regulated by the opening and closing of ion channels. Ion channels can be classified into numerous classes and different types of ion channels exhibit different functions. Thus, the correct identification of ion channels and their types using computational methods will provide in-depth insights into their function in various biological processes. In this review, we will briefly introduce and discuss the recent progress in ion channel prediction using machine learning methods.
13
Yu DJ, Li Y, Hu J, Yang X, Yang JY, Shen HB. Disulfide Connectivity Prediction Based on Modelled Protein 3D Structural Information and Random Forest Regression. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:611-621. [PMID: 26357272 DOI: 10.1109/tcbb.2014.2359451] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 06/05/2023]
Abstract
Disulfide connectivity is an important protein structural characteristic. Accurately predicting disulfide connectivity solely from protein sequence helps to improve the intrinsic understanding of protein structure and function, especially in the post-genome era, in which a large volume of sequenced proteins without functional annotation is quickly accumulating. In this study, a new feature extracted from predicted protein 3D structural information is proposed and integrated with traditional features to form discriminative features. Based on the extracted features, a random forest regression model is trained to predict protein disulfide connectivity. We compare the proposed method with popular existing predictors by performing both cross-validation and independent validation tests on benchmark datasets. The experimental results demonstrate the superiority of the proposed method over existing predictors. We believe the superiority of the proposed method benefits from both the good discriminative capability of the newly developed features and the powerful modelling capability of the random forest. The web server implementation, called TargetDisulfide, and the benchmark datasets are freely available for academic use at: http://csbio.njust.edu.cn/bioinf/TargetDisulfide.
14
Improved local ternary patterns for automatic target recognition in infrared imagery. Sensors 2015; 15:6399-418. [PMID: 25785311 PMCID: PMC4435149 DOI: 10.3390/s150306399] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 11/12/2014] [Revised: 12/25/2014] [Accepted: 02/16/2015] [Indexed: 11/17/2022]
Abstract
This paper presents an improved local ternary pattern (LTP) for automatic target recognition (ATR) in infrared imagery. Firstly, a robust LTP (RLTP) scheme is proposed to overcome the limitation of the original LTP in achieving invariance with respect to the illumination transformation. Then, a soft concave-convex partition (SCCP) is introduced to add some flexibility to the original concave-convex partition (CCP) scheme. Referring to the orthogonal combination of local binary patterns (OC_LBP), the orthogonal combination of LTP (OC_LTP) is adopted to reduce the dimensionality of the LTP histogram. Further, a novel operator, called the soft concave-convex orthogonal combination of robust LTP (SCC_OC_RLTP), is proposed by combining RLTP, SCCP and OC_LTP. Finally, the new operator is used for ATR along with a blocking schedule to improve its discriminability and a feature selection technique to enhance its efficiency. Experimental results on infrared imagery show that the proposed features can achieve competitive ATR results compared with the state-of-the-art methods.
|
15
|
An overview of the prediction of protein DNA-binding sites. Int J Mol Sci 2015; 16:5194-215. [PMID: 25756377 PMCID: PMC4394471 DOI: 10.3390/ijms16035194] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 12/31/2014] [Revised: 02/21/2015] [Accepted: 02/27/2015] [Indexed: 02/06/2023]
Abstract
Interactions between proteins and DNA play an important role in many essential biological processes such as DNA replication, transcription, splicing, and repair. The identification of amino acid residues involved in DNA-binding sites is critical for understanding the mechanism of these biological activities. In the last decade, numerous computational approaches have been developed to predict protein DNA-binding sites based on protein sequence and/or structural information, which play an important role in complementing experimental strategies. At this time, approaches can be divided into three categories: sequence-based DNA-binding site prediction, structure-based DNA-binding site prediction, and homology modeling and threading. In this article, we review existing research on computational methods to predict protein DNA-binding sites, which includes data sets, various residue sequence/structural features, machine learning methods for comparison and selection, evaluation methods, performance comparison of different tools, and future directions in protein DNA-binding site prediction. In particular, we detail the meta-analysis of protein DNA-binding sites. We also propose specific implications that are likely to result in novel prediction methods, increased performance, or practical applications.
|
16
|
Yu DJ, Hu J, Yan H, Yang XB, Yang JY, Shen HB. Enhancing protein-vitamin binding residues prediction by multiple heterogeneous subspace SVMs ensemble. BMC Bioinformatics 2014; 15:297. [PMID: 25189131 PMCID: PMC4261549 DOI: 10.1186/1471-2105-15-297] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 03/23/2014] [Accepted: 08/18/2014] [Indexed: 11/10/2022]
Abstract
Background: Vitamins are typical ligands that play critical roles in various metabolic processes. The accurate identification of the vitamin-binding residues solely based on a protein sequence is of significant importance for the functional annotation of proteins, especially in the post-genomic era, when large volumes of protein sequences are accumulating quickly without being functionally annotated. Results: In this paper, a new predictor called TargetVita is designed and implemented for predicting protein-vitamin binding residues using protein sequences. In TargetVita, features derived from the position-specific scoring matrix (PSSM), predicted protein secondary structure, and vitamin binding propensity are combined to form the original feature space; then, several feature subspaces are selected by performing different feature selection methods. Finally, based on the selected feature subspaces, heterogeneous SVMs are trained and then ensembled for performing prediction. Conclusions: The experimental results obtained with four separate vitamin-binding benchmark datasets demonstrate that the proposed TargetVita is superior to the state-of-the-art vitamin-specific predictor, and an average improvement of 10% in terms of the Matthews correlation coefficient (MCC) was achieved over independent validation tests. The TargetVita web server and the datasets used are freely available for academic use at http://csbio.njust.edu.cn/bioinf/TargetVita or http://www.csbio.sjtu.edu.cn/bioinf/TargetVita.
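The Matthews correlation coefficient used as the comparison metric above is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula only; the function name `mcc_from_counts` and the convention of returning 0.0 when the denominator is zero are assumptions for illustration and are not taken from TargetVita.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal sum is zero (a common convention,
    assumed here rather than taken from the paper).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

MCC ranges from -1 to 1 and stays informative when binding residues are much rarer than non-binding residues, which is the usual situation in residue-level binding-site prediction.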
Affiliation(s)
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, China.
|
17
|
Wang H, Wang M, Tan H, Li Y, Zhang Z, Song J. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection. PLoS One 2014; 9:e105902. [PMID: 25148528 PMCID: PMC4141844 DOI: 10.1371/journal.pone.0105902] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 04/23/2014] [Accepted: 07/25/2014] [Indexed: 01/14/2023]
Abstract
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.
Affiliation(s)
- Huilin Wang
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Mingjun Wang
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Hao Tan
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Yuan Li
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- * E-mail: (JS); (ZZ)
- Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- ARC Centre of Excellence in Structural and Functional Microbial Genomics, Monash University, Melbourne, Victoria, Australia
- * E-mail: (JS); (ZZ)
|
18
|
Yang F, Xu YY, Shen HB. Many local pattern texture features: which is better for image-based multilabel human protein subcellular localization classification? ScientificWorldJournal 2014; 2014:429049. [PMID: 25050396 PMCID: PMC4094881 DOI: 10.1155/2014/429049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/03/2014] [Accepted: 05/22/2014] [Indexed: 01/14/2023]
Abstract
Human protein subcellular location prediction can provide critical knowledge for understanding a protein's function. Since significant progress has been made in digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be effectively used for dealing with multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including the completed local binary pattern, the local tetra pattern, and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern and the local tetra pattern are more discriminative for describing IHC images than the traditional local binary pattern descriptor. The combination of these two novel local pattern features and the conventional global texture features is also studied. The enhanced performance of the final binary relevance classification model trained on the combined feature space demonstrates that the different features are complementary to each other and thus capable of improving classification accuracy.
Affiliation(s)
- Fan Yang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
- Key Laboratory of Optic-Electronic and Communication, Jiangxi Science & Technology Normal University, Nanchang 330013, China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Ying-Ying Xu
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
|
19
|
Guilloux A, Caudron B, Jestin JL. A method to predict edge strands in beta-sheets from protein sequences. Comput Struct Biotechnol J 2013; 7:e201305001. [PMID: 24688737 PMCID: PMC3962219 DOI: 10.5936/csbj.201305001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/22/2013] [Revised: 05/27/2013] [Accepted: 05/30/2013] [Indexed: 12/15/2022]
Abstract
There is a need for rules allowing three-dimensional structure information to be derived from protein sequences. In this work, consideration of an elementary protein folding step allows protein sub-sequences which optimize folding to be derived for any given protein sequence. Classical mechanics applied to this system and the energy conservation law during the elementary folding step yields an equation whose solutions are taken over the field of rational numbers. This formalism is applied to beta-sheets containing two edge strands and at least two central strands. The number of protein sub-sequences optimized for folding per amino acid in beta-strands is shown in particular to predict edge strands from protein sequences. Topological information on beta-strands and loops connecting them is derived for protein sequences with a prediction accuracy of 75%. The statistical significance of the finding is given. Applications in protein structure prediction are envisioned such as for the quality assessment of protein structure models.
Affiliation(s)
- Antonin Guilloux
- Analyse algébrique, Institut de Mathématiques de Jussieu, Université Pierre et Marie Curie, Paris VI, France
- Bernard Caudron
- Centre d'Informatique pour la Biologie, Institut Pasteur, Paris, France
|
20
|
PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites. PLoS One 2012; 7:e50300. [PMID: 23209700 PMCID: PMC3510211 DOI: 10.1371/journal.pone.0050300] [Citation(s) in RCA: 222] [Impact Index Per Article: 18.5] [Received: 07/16/2012] [Accepted: 10/18/2012] [Indexed: 12/04/2022]
Abstract
The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict its target protein substrates, but this information must be utilized in an effective manner in order to efficiently identify protein substrates by in silico approaches. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER utilizes established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach to predict protease cleavage sites by using different, but complementary sequence and structure characteristics. Features used by PROSPER include local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, pre-prepared tool for use in identifying protein substrates for the enzymes. Systematic prediction analysis for the twenty-four proteases thus far included in the database revealed that the features we have included in the tool strongly improve performance in terms of cleavage site prediction, as evidenced by their contribution to performance improvement in terms of identifying known cleavage sites in substrates for these enzymes. In comparison with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage. 
To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using machine learning techniques. It is freely available at http://lightning.med.monash.edu.au/PROSPER/.
|
21
|
An integrative computational framework based on a two-step random forest algorithm improves prediction of zinc-binding sites in proteins. PLoS One 2012; 7:e49716. [PMID: 23166753 PMCID: PMC3499040 DOI: 10.1371/journal.pone.0049716] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 05/22/2012] [Accepted: 10/12/2012] [Indexed: 11/30/2022]
Abstract
Zinc-binding proteins are the most abundant metalloproteins in the Protein Data Bank where the zinc ions usually have catalytic, regulatory or structural roles critical for the function of the protein. Accurate prediction of zinc-binding sites is not only useful for the inference of protein function but also important for the prediction of 3D structure. Here, we present a new integrative framework that combines multiple sequence and structural properties and graph-theoretic network features, followed by an efficient feature selection to improve prediction of zinc-binding sites. We investigate what information can be retrieved from the sequence, structure and network levels that is relevant to zinc-binding site prediction. We perform a two-step feature selection using random forest to remove redundant features and quantify the relative importance of the retrieved features. Benchmarking on a high-quality structural dataset containing 1,103 protein chains and 484 zinc-binding residues, our method achieved >80% recall at a precision of 75% for the zinc-binding residues Cys, His, Glu and Asp on 5-fold cross-validation tests, which is a 10%-28% higher recall at the 75% equal precision compared to SitePredict and zincfinder at residue level using the same dataset. The independent test also indicates that our method has achieved recall of 0.790 and 0.759 at residue and protein levels, respectively, which is a performance better than the other two methods. Moreover, AUC (the Area Under the Curve) and AURPC (the Area Under the Recall-Precision Curve) by our method are also respectively better than those of the other two methods. Our method can not only be applied to large-scale identification of zinc-binding sites when structural information of the target is available, but also give valuable insights into important features arising from different levels that collectively characterize the zinc-binding sites. 
The scripts and datasets are available at http://protein.cau.edu.cn/zincidentifier/.
|
22
|
A new avenue for classification and prediction of olive cultivars using supervised and unsupervised algorithms. PLoS One 2012; 7:e44164. [PMID: 22957050 PMCID: PMC3434224 DOI: 10.1371/journal.pone.0044164] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 06/10/2012] [Accepted: 07/30/2012] [Indexed: 11/19/2022]
Abstract
Various methods have been used to identify cultivars of olive trees; herein we used different bioinformatics algorithms to propose new tools to classify 10 olive cultivars based on RAPD and ISSR genetic marker datasets generated from PCR reactions. Five RAPD markers (OPA0a21, OPD16a, OP01a1, OPD16a1 and OPA0a8) and five ISSR markers (UBC841a4, UBC868a7, UBC841a14, U12BC807a and UBC810a13) were selected as the most important markers by all attribute weighting models. K-Medoids unsupervised clustering run on the SVM dataset was fully able to assign each olive cultivar to the right class. All trees (176) induced by decision tree models were meaningful, and the UBC841a4 attribute clearly distinguished between foreign and domestic olive cultivars with 100% accuracy. Predictive machine learning algorithms (SVM and Naïve Bayes) were also able to predict the right class of olive cultivars with 100% accuracy. For the first time, our results showed that data mining techniques can be effectively used to distinguish between plant cultivars, and the machine learning based systems proposed in this study can predict new olive cultivars with the best possible accuracy.
|
23
|
Zhao X, Li J, Huang Y, Ma Z, Yin M. Prediction of bioluminescent proteins using auto covariance transformation of evolutional profiles. Int J Mol Sci 2012; 13:3650-3660. [PMID: 22489173 PMCID: PMC3317733 DOI: 10.3390/ijms13033650] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 01/10/2012] [Revised: 02/21/2012] [Accepted: 03/05/2012] [Indexed: 12/21/2022]
Abstract
Bioluminescent proteins are important for various cellular processes, such as gene expression analysis, drug discovery, bioluminescent imaging, toxicity determination, and DNA sequencing studies. Hence, the correct identification of bioluminescent proteins is of great importance both for helping genome annotation and providing a supplementary role to experimental research to obtain insight into bioluminescent proteins' functions. However, few computational methods are available for identifying bioluminescent proteins. Therefore, in this paper we develop a new method to predict bioluminescent proteins using a model based on position specific scoring matrix and auto covariance. Tested by 10-fold cross-validation and independent test, the accuracy of the proposed model reaches 85.17% for the training dataset and 90.71% for the testing dataset respectively. These results indicate that our predictor is a useful tool to predict bioluminescent proteins. This is the first study in which evolutionary information and local sequence environment information have been successfully integrated for predicting bioluminescent proteins. A web server (BLPre) that implements the proposed predictor is freely available.
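The auto covariance transformation mentioned above turns a variable-length PSSM (one row per residue, one column per amino-acid type) into a fixed-length feature vector. The sketch below follows the definition commonly used in this literature, AC(j, lag) = 1/(L - lag) * sum over i of (P[i][j] - mean_j)(P[i+lag][j] - mean_j); the function name `auto_covariance` and the choice of maximum lag are assumptions for illustration, and the abstract does not give the exact settings the authors used.

```python
def auto_covariance(pssm, max_lag):
    """Auto covariance features from a PSSM given as a list of rows.

    pssm: L rows (residues) x n_cols columns (e.g. 20 amino-acid scores).
    Returns a flat list of length max_lag * n_cols, ordered by lag and
    then by column. Requires max_lag < L.
    """
    length = len(pssm)
    if not 0 < max_lag < length:
        raise ValueError("max_lag must satisfy 0 < max_lag < sequence length")
    n_cols = len(pssm[0])
    # Column means over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy 4-residue, 2-column matrix (hypothetical values, not real PSSM scores).
toy_pssm = [[1, 0], [2, 0], [3, 0], [4, 0]]
toy_features = auto_covariance(toy_pssm, 2)
```

Because the output length depends only on the maximum lag and the number of columns, proteins of different lengths map to vectors of the same size, which is what a downstream classifier needs.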
Affiliation(s)
- Xiaowei Zhao
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China; E-Mails: (X.Z.); (J.L.)
- School of Life Sciences, Northeast Normal University, Changchun 130024, China
- Jiakui Li
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China; E-Mails: (X.Z.); (J.L.)
- Yanxin Huang
- National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun 130024, China
- Zhiqiang Ma
- School of Life Sciences, Northeast Normal University, Changchun 130024, China
- Minghao Yin
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China; E-Mails: (X.Z.); (J.L.)
|
24
|
Kolossov VL, Spring BQ, Clegg RM, Henry JJ, Sokolowski A, Kenis PJA, Gaskins HR. Development of a high-dynamic range, GFP-based FRET probe sensitive to oxidative microenvironments. Exp Biol Med (Maywood) 2011; 236:681-91. [PMID: 21606117 DOI: 10.1258/ebm.2011.011009] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 01/27/2023]
Abstract
We report the optimization of a novel redox-sensitive probe with enhanced dynamic range and an exceptionally well-positioned oxidative midpoint redox potential. The present work characterizes factors that contribute to the improved Förster resonance energy transfer (FRET) performance of this green fluorescent protein (GFP)-based redox sensor. The α-helical linker, which separates the FRET donor and acceptor, has been extended in the new probe and leads to a decreased FRET efficiency in the linker's reduced, 'FRET-off' state. Unexpectedly, the FRET efficiency is increased in the new linker's oxidized, 'FRET-on' state compared with the parent probe, in spite of the longer linker sequence. The combination of a lowered baseline 'FRET-off' and an increased 'FRET-on' signal significantly improves the dynamic range of the probe for a more robust discrimination of its reduced and oxidized linker states. Mutagenesis of the cysteine residues within the α-helix linker reveals the importance of the fourth, C-terminal cysteine and the relative insignificance of the second cysteine in forming the disulfide bridge to clamp the linker into the high-FRET, oxidized state. To further optimize the performance of the redox probe, various cyan fluorescent protein (CFP)/yellow fluorescent protein (YFP) FRET pairs, placed at opposite ends of the improved redox linker (RL7), were quantitatively compared and exchanged. We found that the CyPet/YPet and ECFP/YPet FRET pairs when attached to RL7 do not function well as sensitive redox probes due to a strong tendency to form heterodimers, which disrupt the α-helix. However, monomeric versions of CyPet and YPet (mCyPet and mYPet) eliminate dimerization and restore redox sensitivity of the probe. The best performing probe, ECFP-RL7-EYFP, exhibits an approximately six-fold increase in FRET efficiency in vitro when passing from the reduced to the oxidized state. 
We determined the midpoint redox potential of the probe to be -143 ± 6 mV, which is ideal for measuring glutathione (GSH/GSSG) redox potentials in oxidative compartments of mammalian cells (e.g. the endoplasmic reticulum).
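To make the role of the midpoint potential concrete, the Nernst equation can be used to estimate what fraction of a probe is oxidized at a given ambient redox potential. The sketch below is a generic illustration and not the authors' analysis: the two-electron transfer (`n_electrons=2`, typical for a dithiol/disulfide couple), the temperature of 298.15 K and the function name `fraction_oxidized` are all assumptions; only the -143 mV midpoint comes from the abstract.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def fraction_oxidized(e_ambient_mv, e_mid_mv=-143.0, n_electrons=2, temp_k=298.15):
    """Fraction of probe in the oxidized state from the Nernst equation.

    E = E_mid + (RT / nF) * ln([ox] / [red]), so
    [ox] / ([ox] + [red]) = 1 / (1 + exp(-(E - E_mid) * nF / RT)).
    Potentials are in millivolts; n_electrons and temp_k are assumed values.
    """
    slope_mv = 1000.0 * R * temp_k / (n_electrons * F)
    return 1.0 / (1.0 + math.exp(-(e_ambient_mv - e_mid_mv) / slope_mv))
```

Under these assumptions the probe is half oxidized at the midpoint and responds most steeply to potentials within a few tens of millivolts of it, which is why a midpoint near the potential of the compartment being measured matters.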
Affiliation(s)
- Vladimir L Kolossov
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
|