1
|
Identifying DNase I hypersensitive sites using multi-features fusion and F-score features selection via Chou's 5-steps rule. Biophys Chem 2019; 253:106227. [DOI: 10.1016/j.bpc.2019.106227] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 04/19/2019] [Revised: 07/04/2019] [Accepted: 07/10/2019] [Indexed: 01/12/2023]
|
2
|
Analysis of Protein-Protein Functional Associations by Using Gene Ontology and KEGG Pathway. BIOMED RESEARCH INTERNATIONAL 2019; 2019:4963289. [PMID: 31396531 PMCID: PMC6668538 DOI: 10.1155/2019/4963289] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/04/2019] [Revised: 06/04/2019] [Accepted: 06/26/2019] [Indexed: 12/19/2022]
Abstract
Protein–protein interaction (PPI) plays an essential role in the growth, reproduction, and metabolism of all living organisms. A thorough investigation of PPI can uncover the mechanisms by which proteins perform their functions. In this study, we used gene ontology (GO) terms and biological pathways to study an extended version of PPI (protein–protein functional associations) and subsequently identify some essential GO terms and pathways that can indicate the difference between protein pairs with and without functional associations. The protein–protein functional associations validated by experiments were retrieved from STRING, a well-known database of associations between proteins collected from multiple sources, and were termed positive samples. The negative samples were constructed by randomly pairing two proteins. Each sample was represented by several features based on the GO and KEGG pathway information of its two proteins. Then, mutual information was adopted to evaluate the importance of all features and to select the most important ones, from which a number of essential GO terms or KEGG pathways were identified. The final analysis of some important GO terms and one KEGG pathway can partly uncover the difference between proteins with and without functional associations.
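The feature-ranking step described in this abstract can be illustrated with a minimal sketch of mutual information between a discrete feature and the class label. The feature names and toy values below are hypothetical and are not taken from the study; the paper's actual feature encoding is not given in the abstract.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (count_x * count_y)
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: each position is one protein pair,
# label 1 = functionally associated, 0 = randomly paired.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "GO_term_A": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly informative
    "GO_term_B": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
    "pathway_C": [1, 1, 1, 0, 0, 0, 0, 1],  # partly informative
}

# Rank features from most to least informative.
ranking = sorted(
    features,
    key=lambda name: mutual_information(features[name], labels),
    reverse=True,
)
```

Features at the top of such a ranking would be the candidates from which essential GO terms or pathways are read off.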
|
3
|
Messerli MA, Sarkar A. Advances in Electrochemistry for Monitoring Cellular Chemical Flux. Curr Med Chem 2019; 26:4984-5002. [PMID: 31057100 DOI: 10.2174/0929867326666190506111629] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/28/2018] [Revised: 03/06/2019] [Accepted: 03/12/2019] [Indexed: 11/22/2022]
Abstract
The transport of organic and inorganic molecules, along with inorganic ions across the plasma membrane results in chemical fluxes that reflect the cellular function in healthy and diseased states. Measurement of these chemical fluxes enables the characterization of protein function and transporter stoichiometry, characterization of a single cell and embryo viability prior to implantation, and screening of pharmaceutical agents. Electrochemical sensors emerge as sensitive and non-invasive tools for measuring chemical fluxes immediately outside the cells in the boundary layer, that are capable of monitoring a diverse range of transported analytes including inorganic ions, gases, neurotransmitters, hormones, and pharmaceutical agents. Used on their own or in combination with other methods, these sensors continue to expand our understanding of the function of rare cells and small tissues. Advances in sensor construction and detection strategies continue to improve sensitivity under physiological conditions, diversify analyte detection, and increase throughput. These advances will be discussed in the context of addressing technical challenges to measuring chemical flux in the boundary layer of cells and measuring the resultant changes to the chemical concentration in the bulk media.
Affiliation(s)
- Mark A Messerli
  - Department of Biology and Microbiology, South Dakota State University, Brookings, SD, United States
- Anyesha Sarkar
  - Department of Biology and Microbiology, South Dakota State University, Brookings, SD, United States
|
4
|
Wang S, Li J, Sun X, Zhang YH, Huang T, Cai Y. Computational Method for Identifying Malonylation Sites by Using Random Forest Algorithm. Comb Chem High Throughput Screen 2018; 23:304-312. [PMID: 30588879 DOI: 10.2174/1386207322666181227144318] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/28/2018] [Revised: 09/03/2018] [Accepted: 12/04/2018] [Indexed: 12/12/2022]
Abstract
BACKGROUND As a newly uncovered post-translational modification on the ε-amino group of lysine residue, protein malonylation was found to be involved in metabolic pathways and certain diseases. Apart from experimental approaches, several computational methods based on machine learning algorithms were recently proposed to predict malonylation sites. However, previous methods failed to address imbalanced data sizes between positive and negative samples. OBJECTIVE In this study, we identified the significant features of malonylation sites in a novel computational method which applied machine learning algorithms and balanced data sizes by applying synthetic minority over-sampling technique. METHOD Four types of features, namely, amino acid (AA) composition, position-specific scoring matrix (PSSM), AA factor, and disorder were used to encode residues in protein segments. Then, a two-step feature selection procedure including maximum relevance minimum redundancy and incremental feature selection, together with random forest algorithm, was performed on the constructed hybrid feature vector. RESULTS An optimal classifier was built from the optimal feature subset, which featured an F1-measure of 0.356. Feature analysis was performed on several selected important features. CONCLUSION Results showed that certain types of PSSM and disorder features may be closely associated with malonylation of lysine residues. Our study contributes to the development of computational approaches for predicting malonyllysine and provides insights into molecular mechanism of malonylation.
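The class-balancing idea named in this abstract, the synthetic minority over-sampling technique (SMOTE), can be sketched as follows: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration only; the function name, parameters, and toy points are assumptions, and the study's actual implementation and settings are not described in the abstract.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric tuples (at least two points).
    k: number of nearest minority neighbours considered for each base point.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical 2-D feature vectors for the minority (positive) class.
minority_points = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0), (2.0, 2.0)]
new_points = smote(minority_points, n_new=6)
```

Because every synthetic point lies between two real minority samples, the new points stay inside the region already occupied by the minority class.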
Affiliation(s)
- ShaoPeng Wang
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- JiaRui Li
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Xijun Sun
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Yu-Hang Zhang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Tao Huang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yudong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
|
5
|
Sankar K, Hoi KH, Yin Y, Ramachandran P, Andersen N, Hilderbrand A, McDonald P, Spiess C, Zhang Q. Prediction of methionine oxidation risk in monoclonal antibodies using a machine learning method. MAbs 2018; 10:1281-1290. [PMID: 30252602 PMCID: PMC6284603 DOI: 10.1080/19420862.2018.1518887] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/05/2018] [Revised: 08/15/2018] [Accepted: 08/28/2018] [Indexed: 12/22/2022] Open
Abstract
Monoclonal antibodies (mAbs) have become a major class of protein therapeutics that target a spectrum of diseases ranging from cancers to infectious diseases. Similar to any protein molecule, mAbs are susceptible to chemical modifications during the manufacturing process, long-term storage, and in vivo circulation that can impair their potency. One such modification is the oxidation of methionine residues. Chemical modifications that occur in the complementarity-determining regions (CDRs) of mAbs can lead to the abrogation of antigen binding and reduce the drug's potency and efficacy. Thus, it is highly desirable to identify and eliminate any chemically unstable residues in the CDRs during the therapeutic antibody discovery process. To provide increased throughput over experimental methods, we extracted features from the mAbs' sequences, structures, and dynamics, used random forests to identify important features and develop a quantitative and highly predictive in silico methionine oxidation model.
Affiliation(s)
- Kannan Sankar
  - Department of Antibody Engineering, Genentech, South San Francisco, CA, USA
- Kam Hon Hoi
  - Department of Antibody Engineering, Genentech, South San Francisco, CA, USA
  - Department of Bioinformatics and Computational Biology, Genentech, South San Francisco, CA, USA
- Yizhou Yin
  - Department of Antibody Engineering, Genentech, South San Francisco, CA, USA
  - Institute for Bioscience and Biotechnology Research, Biological Sciences Graduate Program, University of Maryland, Rockville, MD, USA
- Prasanna Ramachandran
  - Department of Analytical Development and Quality Control, Genentech, South San Francisco, CA, USA
- Nisana Andersen
  - Department of Analytical Development and Quality Control, Genentech, South San Francisco, CA, USA
- Amy Hilderbrand
  - Department of Analytical Development and Quality Control, Genentech, South San Francisco, CA, USA
- Paul McDonald
  - Department of Purification Development and Bioprocess Development, Genentech, South San Francisco, CA, USA
- Christoph Spiess
  - Department of Antibody Engineering, Genentech, South San Francisco, CA, USA
- Qing Zhang
  - Department of Antibody Engineering, Genentech, South San Francisco, CA, USA
  - Department of Bioinformatics and Computational Biology, Genentech, South San Francisco, CA, USA
|
6
|
Chen L, Zhang YH, Pan X, Liu M, Wang S, Huang T, Cai YD. Tissue Expression Difference between mRNAs and lncRNAs. Int J Mol Sci 2018; 19:ijms19113416. [PMID: 30384456 PMCID: PMC6274976 DOI: 10.3390/ijms19113416] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 09/12/2018] [Revised: 10/26/2018] [Accepted: 10/28/2018] [Indexed: 12/15/2022] Open
Abstract
Messenger RNA (mRNA) and long noncoding RNA (lncRNA) are two main subgroups of RNAs participating in transcription regulation. With the development of next generation sequencing, increasing lncRNAs are identified. Many hidden functions of lncRNAs are also revealed. However, the differences in lncRNAs and mRNAs are still unclear. For example, we need to determine whether lncRNAs have stronger tissue specificity than mRNAs and which tissues have more lncRNAs expressed. To investigate such tissue expression difference between mRNAs and lncRNAs, we encoded 9339 lncRNAs and 14,294 mRNAs with 71 expression features, including 69 maximum expression features for 69 types of cells, one feature for the maximum expression in all cells, and one expression specificity feature that was measured as Chao-Shen-corrected Shannon's entropy. With advanced feature selection methods, such as maximum relevance minimum redundancy, incremental feature selection methods, and random forest algorithm, 13 features presented the dissimilarity of lncRNAs and mRNAs. The 11 cell subtype features indicated which cell types of the lncRNAs and mRNAs had the largest expression difference. Such cell subtypes may be the potential cell models for lncRNA identification and function investigation. The expression specificity feature suggested that the cell types to express mRNAs and lncRNAs were different. The maximum expression feature suggested that the maximum expression levels of mRNAs and lncRNAs were different. In addition, the rule learning algorithm, repeated incremental pruning to produce error reduction algorithm, was also employed to produce effective classification rules for classifying lncRNAs and mRNAs, which gave competitive results compared with random forest and could give a clearer picture of different expression patterns between lncRNAs and mRNAs. 
Results not only revealed the heterogeneous expression pattern of lncRNA and mRNA, but also gave rise to the development of a new tool to identify the potential biological functions of such RNA subgroups.
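The expression specificity feature mentioned in this abstract is based on Shannon's entropy of a transcript's expression across cell types. The sketch below computes the plain (uncorrected) Shannon entropy; the Chao-Shen correction used in the study is deliberately not implemented here, and the function name and toy expression vectors are assumptions for illustration.

```python
import math


def expression_entropy(levels):
    """Plain Shannon entropy (bits) of an expression profile across cell types.

    Low entropy means expression is concentrated in few cell types (high
    specificity); high entropy means expression is spread evenly.
    """
    total = sum(levels)
    if total <= 0:
        raise ValueError("expression profile must have a positive total")
    probs = [x / total for x in levels if x > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical profiles over four cell types (the study used 69).
broad = expression_entropy([5.0, 5.0, 5.0, 5.0])      # evenly expressed
specific = expression_entropy([10.0, 0.0, 0.0, 0.0])  # one cell type only
```

Under this measure a transcript expressed in a single cell type scores 0 bits, and one expressed evenly across n cell types scores log2(n) bits.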
Affiliation(s)
- Lei Chen
  - School of Life Sciences, Shanghai University, Shanghai 200444, China.
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
  - Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, China.
- Yu-Hang Zhang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Xiaoyong Pan
  - Department of Medical Informatics, Erasmus MC, 3000 CA Rotterdam, The Netherlands.
- Min Liu
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
- Shaopeng Wang
  - School of Life Sciences, Shanghai University, Shanghai 200444, China.
- Tao Huang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, China.
|
7
|
Liang Y, Zhang S. Identify Gram-negative bacterial secreted protein types by incorporating different modes of PSSM into Chou’s general PseAAC via Kullback–Leibler divergence. J Theor Biol 2018; 454:22-29. [DOI: 10.1016/j.jtbi.2018.05.035] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 04/10/2018] [Revised: 05/19/2018] [Accepted: 05/29/2018] [Indexed: 12/14/2022]
|
8
|
Liang Y, Zhang S, Ding S. Accurate prediction of Gram-negative bacterial secreted protein types by fusing multiple statistical features from PSI-BLAST profile. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2018; 29:469-481. [PMID: 29688029 DOI: 10.1080/1062936x.2018.1459835] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/22/2018] [Accepted: 03/27/2018] [Indexed: 06/08/2023]
Abstract
Gram-negative bacterial secreted proteins play different roles in invaded eukaryotic cells and cause various diseases. Prediction of Gram-negative bacterial secreted protein types is a meaningful and challenging task. In this paper, we develop a multiple statistical features extraction model based on the dipeptide composition (DPC) descriptor and the detrended moving-average auto-cross-correlation analysis (DMACA) descriptor by PSI-BLAST profile. A 610-dimensional feature vector was constructed on the training set, and the feature extraction model was denoted DPC-DMACA-PSSM. A support vector machine was then selected as a classifier, and the bias-free jackknife test method was used for evaluating the accuracy. Our predictor achieves favourable performance for overall accuracy on the test set and also outperforms the other published approaches. The results show that our approach offers a reliable tool for the identification of Gram-negative bacterial secreted protein types.
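As an illustration of the dipeptide composition descriptor computed from a PSI-BLAST profile, the sketch below uses one common formulation of PSSM-based DPC: for every ordered pair of profile columns (a, b), average the product of column a at position i and column b at position i+1. The abstract does not give the formula, so this formulation, the function name, and the toy matrix are assumptions. With 20 columns it yields 400 values, which would leave 210 of the 610 dimensions for the DMACA part, although the abstract does not state this breakdown.

```python
def dpc_pssm(pssm):
    """PSSM-based dipeptide composition (assumed formulation).

    pssm: list of rows, one per residue position; every row has the same
    number of columns (20 for a standard amino-acid profile).
    Returns a flat list of width * width averaged column-pair products.
    """
    length = len(pssm)
    if length < 2:
        raise ValueError("need at least two positions")
    width = len(pssm[0])
    features = []
    for a in range(width):
        for b in range(width):
            total = sum(pssm[i][a] * pssm[i + 1][b] for i in range(length - 1))
            features.append(total / (length - 1))
    return features


# Hypothetical 3-position, 3-column toy profile to keep the output readable.
toy_profile = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_features = dpc_pssm(toy_profile)  # 9 values, nonzero for pairs (0,1) and (1,2)
```

The resulting fixed-length vector is independent of sequence length, which is what allows proteins of different lengths to be fed to a single classifier such as a support vector machine.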
Affiliation(s)
- Y Liang
  - School of Science, Xi'an Polytechnic University, Xi'an 710048, PR China
- S Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, PR China
- S Ding
  - Department of Sciences, Dalian Nationalities University, Dalian 116600, PR China
|
9
|
A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique. BIOMED RESEARCH INTERNATIONAL 2018; 2018:9364182. [PMID: 29568772 PMCID: PMC5820548 DOI: 10.1155/2018/9364182] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/11/2017] [Revised: 12/25/2017] [Accepted: 12/26/2017] [Indexed: 11/18/2022]
Abstract
Cancerlectins have an inhibitory effect on the growth of cancer cells and are currently being employed as therapeutic agents. The accurate identification of the cancerlectins should provide insight into the molecular mechanisms of cancers. In this study, a new computational method based on the RF (Random Forest) algorithm is proposed for further improving the performance of identifying cancerlectins. Hybrid feature space before feature selection is developed by combining different individual feature spaces, CTD (Composition, Transition, and Distribution), PseAAC (Pseudo Amino Acid Composition), PSSM (Position-Specific Scoring Matrix), and disorder. The SMOTE (Synthetic Minority Oversampling Technique) is applied to solve the imbalanced data problem. To reduce feature redundancy and computation complexity, we propose a two-step feature selection process to select informative features. A 5-fold cross-validation technique is used for the evaluation of various prediction strategies. The proposed method achieves a sensitivity of 0.779, a specificity of 0.717, an accuracy of 0.748, and an MCC (Matthew's Correlation Coefficient) of 0.497. The prediction results are also compared with other existing methods on the same dataset using 5-fold cross-validation. The comparison results demonstrate the high effectiveness of our method for predicting cancerlectins.
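The four metrics reported in this abstract all derive from one confusion matrix. The sketch below computes them and checks that the reported values are mutually consistent under the assumption of equal numbers of positives and negatives (plausible after SMOTE balancing, but not stated in the abstract). The counts of 1000 per class are hypothetical and chosen only to reproduce the reported rates.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical balanced test set: 1000 positives and 1000 negatives.
metrics = binary_metrics(tp=779, fn=221, tn=717, fp=283)
```

With these counts the sensitivity is 0.779 and the specificity 0.717 by construction, and the accuracy and MCC round to the reported 0.748 and 0.497.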
|
10
|
Wang S, Wang D, Li J, Huang T, Cai YD. Identification and analysis of the cleavage site in a signal peptide using SMOTE, dagging, and feature selection methods. Mol Omics 2018; 14:64-73. [DOI: 10.1039/c7mo00030h] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 02/06/2023]
Abstract
Several machine learning algorithms were adopted to investigate cleavage sites in a signal peptide. An optimal dagging-based classifier was constructed, and 870 features were deemed important for this classifier.
Affiliation(s)
- ShaoPeng Wang
  - School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
- Deling Wang
  - Department of Medical Imaging, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou
- JiaRui Li
  - School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
- Tao Huang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
|
11
|
Zhang L, Zhang C, Gao R, Yang R, Song Q. Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy. PLoS One 2016; 11:e0163274. [PMID: 27662651 PMCID: PMC5035026 DOI: 10.1371/journal.pone.0163274] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 04/22/2016] [Accepted: 09/05/2016] [Indexed: 11/28/2022] Open
Abstract
Antioxidant proteins perform significant functions in maintaining oxidation/antioxidation balance and have potential therapies for some diseases. Accurate identification of antioxidant proteins could contribute to revealing physiological processes of oxidation/antioxidation balance and developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features, incorporating SSI (Secondary Structure Information), PSSM (Position Specific Scoring Matrix), RSA (Relative Solvent Accessibility), and CTD (Composition, Transition, Distribution). The prediction results of the ensemble predictor are determined by an average of prediction results of multiple base classifiers. Based on a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest), SMO (Sequential Minimal Optimization), NNA (Nearest Neighbor Algorithm), and J48 with an accuracy of 0.925. A Relief combined with IFS (Incremental Feature Selection) method is adopted to obtain optimal features from hybrid features. With the optimal features, the ensemble method achieves improved performance with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthew’s Correlation Coefficient) of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we develop a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc.
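The abstract states that the ensemble's prediction is an average of the predictions of several base classifiers. A minimal sketch of that averaging step is shown below. The probability values are invented for illustration, the 0.5 decision threshold is an assumption, and the base classifiers themselves (RF, SMO, nearest neighbour, J48) are represented only by their hypothetical outputs.

```python
def ensemble_predict(prob_lists, threshold=0.5):
    """Average per-sample positive-class probabilities across base classifiers.

    prob_lists: one list of probabilities per base classifier, all the same length.
    Returns (labels, averaged_probabilities).
    """
    n_classifiers = len(prob_lists)
    averaged = [sum(ps) / n_classifiers for ps in zip(*prob_lists)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return labels, averaged


# Hypothetical positive-class probabilities for three proteins.
base_outputs = [
    [0.9, 0.4, 0.2],  # stands in for RF
    [0.8, 0.6, 0.1],  # stands in for SMO
    [1.0, 0.0, 0.0],  # stands in for nearest neighbour
    [0.7, 0.7, 0.3],  # stands in for J48
]
predicted, averaged = ensemble_predict(base_outputs)
```

Averaging lets a confident majority outvote a single dissenting classifier, as with the second protein here, where two base outputs exceed 0.5 but the mean does not.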
Affiliation(s)
- Lina Zhang
  - School of Control Science and Engineering, Shandong University, Jinan, China
- Chengjin Zhang
  - School of Control Science and Engineering, Shandong University, Jinan, China
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University, Jinan, China
- Runtao Yang
  - School of Control Science and Engineering, Shandong University, Jinan, China
- Qing Song
  - School of Electrical Engineering, University of Jinan, Jinan, China
|
12
|
Prediction of aptamer-protein interacting pairs using an ensemble classifier in combination with various protein sequence attributes. BMC Bioinformatics 2016; 17:225. [PMID: 27245069 PMCID: PMC4888498 DOI: 10.1186/s12859-016-1087-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 01/07/2016] [Accepted: 05/17/2016] [Indexed: 02/05/2023] Open
Abstract
Background Aptamer-protein interacting pairs play a variety of physiological functions and therapeutic potentials in organisms. Rapidly and effectively predicting aptamer-protein interacting pairs is significant to design aptamers binding to certain interested proteins, which will give insight into understanding mechanisms of aptamer-protein interacting pairs and developing aptamer-based therapies. Results In this study, an ensemble method is presented to predict aptamer-protein interacting pairs with hybrid features. The features for aptamers are extracted from Pseudo K-tuple Nucleotide Composition (PseKNC) while the features for proteins incorporate Discrete Cosine Transformation (DCT), disorder information, and bi-gram Position Specific Scoring Matrix (PSSM). We investigate predictive capabilities of various feature spaces. The proposed ensemble method obtains the best performance with Youden’s Index of 0.380, using the hybrid feature space of PseKNC, DCT, bi-gram PSSM, and disorder information by 10-fold cross validation. The Relief-Incremental Feature Selection (IFS) method is adopted to obtain the optimal feature set. Based on the optimal feature set, the proposed method achieves a balanced performance with a sensitivity of 0.753 and a specificity of 0.725 on the training dataset, which indicates that this method can solve the imbalanced data problem effectively. To evaluate the prediction performance objectively, an independent testing dataset is used to evaluate the proposed method. Encouragingly, our proposed method performs better than previous study with a sensitivity of 0.738 and a Youden’s Index of 0.451. Conclusions These results suggest that the proposed method can be a potential candidate for aptamer-protein interacting pair prediction, which may contribute to finding novel aptamer-protein interacting pairs and understanding the relationship between aptamers and proteins. 
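Youden's Index, used in this abstract as the headline metric, is sensitivity plus specificity minus one. The short sketch below applies that definition to the reported figures. The training-set index and the test-set specificity computed here are derived from the reported numbers by arithmetic and are not values quoted in the abstract.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: 0 for a chance-level classifier, 1 for a perfect one."""
    return sensitivity + specificity - 1.0


def specificity_from_youden(sensitivity, youden):
    """Invert the definition to recover specificity from sensitivity and J."""
    return youden - sensitivity + 1.0


# Training set: sensitivity 0.753 and specificity 0.725 are reported.
train_j = youden_index(0.753, 0.725)

# Independent test set: sensitivity 0.738 and Youden's Index 0.451 are reported.
test_specificity = specificity_from_youden(0.738, 0.451)
```

Under these figures the training-set index works out to about 0.478, and the implied test-set specificity to about 0.713.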
|
13
|
Gianazza E, Parravicini C, Primi R, Miller I, Eberini I. In silico prediction and characterization of protein post-translational modifications. J Proteomics 2015; 134:65-75. [PMID: 26436211 DOI: 10.1016/j.jprot.2015.09.026] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/02/2015] [Revised: 07/17/2015] [Accepted: 09/23/2015] [Indexed: 01/06/2023]
Abstract
This review outlines the computational approaches and procedures for predicting post translational modification (PTM)-induced changes in protein conformation and their influence on protein function(s), the latter being assessed as differential affinity in interaction with either low (ligands for receptors or transporters, substrates for enzymes) or high molecular mass molecules (proteins or nucleic acids in supramolecular assemblies). The scope for an in silico approach is discussed against a summary of the in vitro evidence on the structural and functional outcome of protein PTM.
Affiliation(s)
- Elisabetta Gianazza
  - Dipartimento di Scienze Farmacologiche e Biomolecolari, Università degli Studi di Milano, Gruppo di Studio per la Proteomica e la Struttura delle Proteine, Sezione di Scienze Farmacologiche, Via Balzaretti 9, I-20133 Milan, Italy.
- Chiara Parravicini
  - Dipartimento di Scienze Farmacologiche e Biomolecolari, Università degli Studi di Milano, Laboratorio di Biochimica e Biofisica Computazionale, Sezione di Biochimica, Biofisica, Fisiologia ed Immunopatologia, Via Trentacoste, 2, I-20134 Milan, Italy
- Roberto Primi
  - Dipartimento di Scienze Farmacologiche e Biomolecolari, Università degli Studi di Milano, Laboratorio di Biochimica e Biofisica Computazionale, Sezione di Biochimica, Biofisica, Fisiologia ed Immunopatologia, Via Trentacoste, 2, I-20134 Milan, Italy
- Ingrid Miller
  - Institut für Medizinische Biochemie, Veterinärmedizinische Universität Wien, Veterinärplatz 1, A-1210 Vienna, Austria
- Ivano Eberini
  - Dipartimento di Scienze Farmacologiche e Biomolecolari, Università degli Studi di Milano, Laboratorio di Biochimica e Biofisica Computazionale, Sezione di Biochimica, Biofisica, Fisiologia ed Immunopatologia, Via Trentacoste, 2, I-20134 Milan, Italy
|
14
|
Zhang L, Zhang C, Gao R, Yang R. An Ensemble Method to Distinguish Bacteriophage Virion from Non-Virion Proteins Based on Protein Sequence Characteristics. Int J Mol Sci 2015; 16:21734-58. [PMID: 26370987 PMCID: PMC4613277 DOI: 10.3390/ijms160921734] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 06/29/2015] [Revised: 08/16/2015] [Accepted: 08/25/2015] [Indexed: 11/16/2022] Open
Abstract
Bacteriophage virion proteins and non-virion proteins have distinct functions in biological processes, such as specificity determination for host bacteria, bacteriophage replication and transcription. Accurate identification of bacteriophage virion proteins from bacteriophage protein sequences is significant to understand the complex virulence mechanism in host bacteria and the influence of bacteriophages on the development of antibacterial drugs. In this study, an ensemble method for bacteriophage virion protein prediction from bacteriophage protein sequences is put forward with hybrid feature spaces incorporating CTD (composition, transition and distribution), bi-profile Bayes, PseAAC (pseudo-amino acid composition) and PSSM (position-specific scoring matrix). In 10-fold cross-validation on the training dataset, the presented method achieves a satisfactory prediction result with a sensitivity of 0.870, a specificity of 0.830, an accuracy of 0.850 and a Matthew's correlation coefficient (MCC) of 0.701. To evaluate the prediction performance objectively, an independent testing dataset is used to evaluate the proposed method. Encouragingly, our proposed method performs better than previous studies with a sensitivity of 0.853, a specificity of 0.815, an accuracy of 0.831 and MCC of 0.662 on the independent testing dataset. These results suggest that the proposed method can be a potential candidate for bacteriophage virion protein prediction, which may provide a useful tool to find novel antibacterial drugs and to understand the relationship between bacteriophage and host bacteria. For the convenience of the vast majority of experimental scientists, a user-friendly and publicly-accessible web-server for the proposed ensemble method is established.
Affiliation(s)
- Lina Zhang
  - School of Control Science and Engineering, Shandong University, Jinan 250061, China.
- Chengjin Zhang
  - School of Control Science and Engineering, Shandong University, Jinan 250061, China.
  - School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai 264209, China.
- Rui Gao
  - School of Control Science and Engineering, Shandong University, Jinan 250061, China.
- Runtao Yang
  - School of Control Science and Engineering, Shandong University, Jinan 250061, China.
|
15
|
Artemenko K, Mi J, Bergquist J. Mass-spectrometry-based characterization of oxidations in proteins. Free Radic Res 2015; 49:477-93. [DOI: 10.3109/10715762.2015.1023795] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/18/2023]
|
16
|
Qiu WR, Xiao X, Lin WZ, Chou KC. iUbiq-Lys: prediction of lysine ubiquitination sites in proteins by extracting sequence evolution information via a gray system model. J Biomol Struct Dyn 2014; 33:1731-42. [PMID: 25248923 DOI: 10.1080/07391102.2014.968875] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Indexed: 10/24/2022]
Abstract
As one of the most important posttranslational modifications (PTMs), ubiquitination plays an important role in regulating varieties of biological processes, such as signal transduction, cell division, apoptosis, and immune response. Ubiquitination is also named "lysine ubiquitination" because it occurs when an ubiquitin is covalently attached to lysine (K) residues of targeting proteins. Given an uncharacterized protein sequence that contains many lysine residues, which one of them is the ubiquitination site, and which one is of non-ubiquitination site? With the avalanche of protein sequences generated in the postgenomic age, it is highly desired for both basic research and drug development to develop an automated method for rapidly and accurately annotating the ubiquitination sites in proteins. In view of this, a new predictor called "iUbiq-Lys" was developed based on the evolutionary information, gray system model, as well as the general form of pseudo-amino acid composition. It was demonstrated via the rigorous cross-validations that the new predictor remarkably outperformed all its counterparts. As a web-server, iUbiq-Lys is accessible to the public at http://www.jci-bioinfo.cn/iUbiq-Lys . For the convenience of most experimental scientists, we have further provided a protocol of step-by-step guide, by which users can easily get their desired results without the need to follow the complicated mathematics that were presented in this paper just for the integrity of its development process.
Affiliation(s)
- Wang-Ren Qiu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
|
17
|
Xu R, Zhou J, Liu B, He Y, Zou Q, Wang X, Chou KC. Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach. J Biomol Struct Dyn 2014; 33:1720-30. [PMID: 25252709 DOI: 10.1080/07391102.2014.968624] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Indexed: 10/24/2022]
Abstract
DNA-binding proteins are crucial for various cellular processes and hence have become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to establish an automated method for rapidly and accurately identifying DNA-binding proteins based on their sequence information alone. Because all biological species have evolved from a very limited number of ancestral species, it is important to take evolutionary information into account in developing such a high-throughput tool. In view of this, a new predictor was proposed by incorporating evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. Comparison with existing methods via both the jackknife test and an independent data-set test showed that the new predictor outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that the novel approach of incorporating evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.
Affiliation(s)
- Ruifeng Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen 518055, Guangdong, China
|
18
|
Ma X, Sun X. Sequence-based predictor of ATP-binding residues using random forest and mRMR-IFS feature selection. J Theor Biol 2014; 360:59-66. [PMID: 25014477 DOI: 10.1016/j.jtbi.2014.06.037] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 04/02/2014] [Revised: 06/17/2014] [Accepted: 06/28/2014] [Indexed: 01/05/2023]
Abstract
We develop a computational and statistical approach (ATPBR) for predicting ATP-binding residues in proteins from amino acid sequences by using random forests with a novel hybrid feature. The hybrid feature incorporates a new feature called PSSMPP, the predicted secondary structure, and orthogonal binary vectors. The mRMR-IFS feature selection method is used to construct the best prediction model. ATPBR achieves significantly improved performance over existing methods, with 87.53% accuracy and a Matthews correlation coefficient of 0.554. In addition, our further analysis demonstrates that PSSMPP distinguishes more effectively between ATP-binding and non-binding residues. The optimal features selected by the mRMR-IFS method also improve the prediction performance and may provide useful insights into the mechanisms of ATP-protein interactions.
Affiliation(s)
- Xin Ma
- Golden Audit College, Nanjing Audit University, Nanjing 210029, China.
- Xiao Sun
- State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China.
|
19
|
Silva AMN, Vitorino R, Domingues MRM, Spickett CM, Domingues P. Post-translational modifications and mass spectrometry detection. Free Radic Biol Med 2013; 65:925-941. [PMID: 24002012 DOI: 10.1016/j.freeradbiomed.2013.08.184] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 04/08/2013] [Revised: 08/22/2013] [Accepted: 08/24/2013] [Indexed: 12/14/2022]
Abstract
In this review, we provide a comprehensive bibliographic overview of the role of mass spectrometry and the recent technical developments in the detection of post-translational modifications (PTMs). We briefly describe the principles of mass spectrometry for detecting PTMs and the protein and peptide enrichment strategies for PTM analysis, including phosphorylation, acetylation and oxidation. This review also presents a bibliographic overview of the scientific achievements and the recent technical developments in the detection of PTMs. In order to ascertain the state of the art in mass spectrometry and proteomics methodologies for the study of PTMs, we analyzed all the PTM data introduced in the Universal Protein Resource (UniProt) and the literature published in the last three years. The evolution of curated data in UniProt for proteins annotated as being post-translationally modified is also analyzed. Additionally, we have undertaken a careful analysis of the research articles published in the years 2010 to 2012 reporting the detection of PTMs in biological samples by mass spectrometry.
Affiliation(s)
- André M N Silva
- Mass Spectrometry Centre, QOPNA, Department of Chemistry, University of Aveiro, 3810-193 Aveiro, Portugal
- Rui Vitorino
- Mass Spectrometry Centre, QOPNA, Department of Chemistry, University of Aveiro, 3810-193 Aveiro, Portugal
- M Rosário M Domingues
- Mass Spectrometry Centre, QOPNA, Department of Chemistry, University of Aveiro, 3810-193 Aveiro, Portugal
- Corinne M Spickett
- School of Life and Health Sciences, Aston University, Aston Triangle, Birmingham B4 7ET, United Kingdom
- Pedro Domingues
- Mass Spectrometry Centre, QOPNA, Department of Chemistry, University of Aveiro, 3810-193 Aveiro, Portugal
|