Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Masso M, Vaisman II. Knowledge-based computational mutagenesis for predicting the disease potential of human non-synonymous single nucleotide polymorphisms. J Theor Biol 2010;266:560-8. [DOI: 10.1016/j.jtbi.2010.07.026] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2010] [Revised: 04/25/2010] [Accepted: 07/21/2010] [Indexed: 10/19/2022]

For:	Masso M, Vaisman II. Knowledge-based computational mutagenesis for predicting the disease potential of human non-synonymous single nucleotide polymorphisms. J Theor Biol 2010;266:560-8. [DOI: 10.1016/j.jtbi.2010.07.026] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2010] [Revised: 04/25/2010] [Accepted: 07/21/2010] [Indexed: 10/19/2022]

Number

Cited by Other Article(s)

Khan YD, Amin N, Hussain W, Rasool N, Khan SA, Chou KC. iProtease-PseAAC(2L): A two-layer predictor for identifying proteases and their types using Chou's 5-step-rule and general PseAAC. Anal Biochem 2019;588:113477. [PMID: 31654612 DOI: 10.1016/j.ab.2019.113477] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2019] [Revised: 10/02/2019] [Accepted: 10/18/2019] [Indexed: 12/16/2022]

Masso M, Bansal A, Prem P, Gajjala A, Vaisman II. Fitness of unregulated human Ras mutants modeled by implementing computational mutagenesis and machine learning techniques. Heliyon 2019;5:e01884. [PMID: 31211262 PMCID: PMC6562371 DOI: 10.1016/j.heliyon.2019.e01884] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2019] [Revised: 04/23/2019] [Accepted: 05/30/2019] [Indexed: 10/26/2022] Open

Masso M. All-atom four-body knowledge-based statistical potential to distinguish native tertiary RNA structures from nonnative folds. J Theor Biol 2018;453:58-67. [PMID: 29782930 DOI: 10.1016/j.jtbi.2018.05.022] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2017] [Revised: 04/05/2018] [Accepted: 05/17/2018] [Indexed: 11/16/2022]

Khan S, Naseem I, Togneri R, Bennamoun M. RAFP-Pred: Robust Prediction of Antifreeze Proteins Using Localized Analysis of n-Peptide Compositions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018;15:244-250. [PMID: 28113406 DOI: 10.1109/tcbb.2016.2617337] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

Cardoso JGR, Andersen MR, Herrgård MJ, Sonnenschein N. Analysis of genetic variation and potential applications in genome-scale metabolic modeling. Front Bioeng Biotechnol 2015;3:13. [PMID: 25763369 PMCID: PMC4329917 DOI: 10.3389/fbioe.2015.00013] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2014] [Accepted: 01/22/2015] [Indexed: 11/13/2022] Open

Masso M. Modeling functional changes to Escherichia coli thymidylate synthase upon single residue replacements: a structure-based approach. PeerJ 2015;3:e721. [PMID: 25648456 PMCID: PMC4304848 DOI: 10.7717/peerj.721] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2014] [Accepted: 12/18/2014] [Indexed: 11/30/2022] Open

AUTO-MUTE 2.0: A Portable Framework with Enhanced Capabilities for Predicting Protein Functional Consequences upon Mutation. Adv Bioinformatics 2014;2014:278385. [PMID: 25197272 PMCID: PMC4150472 DOI: 10.1155/2014/278385] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2014] [Revised: 07/29/2014] [Accepted: 07/29/2014] [Indexed: 11/18/2022] Open

Mao R, Raj Kumar PK, Guo C, Zhang Y, Liang C. Comparative analyses between retained introns and constitutively spliced introns in Arabidopsis thaliana using random forest and support vector machine. PLoS One 2014;9:e104049. [PMID: 25110928 PMCID: PMC4128822 DOI: 10.1371/journal.pone.0104049] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2014] [Accepted: 07/06/2014] [Indexed: 01/04/2023] Open

Abstract

One of the important modes of pre-mRNA post-transcriptional modification is alternative splicing. Alternative splicing allows creation of many distinct mature mRNA transcripts from a single gene by utilizing different splice sites. In plants like Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many studies in the past focus on positional distribution of retained introns (RIs) among different genic regions and their expression regulations, while little systematic classification of RIs from constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and support vector machine (SVM) with radial basis kernel function (RBF) to differentiate these two types of introns in Arabidopsis. By comparing coordinates of introns of all annotated mRNAs from TAIR10, we obtained our high-quality experimental data. To distinguish RIs from CSIs, We investigated the unique characteristics of RIs in comparison with CSIs and finally extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach was more accurate in effectively classifying RIs from CSIs in comparison with other four approaches. The optimal penalty parameter C and the RBF kernel parameter in SVM were set based on particle swarm optimization algorithm (PSOSVM). Our classification performance showed F-Measure of 80.8% (random forest) and 77.4% (PSOSVM). Not only the basic sequence features and positional distribution characteristics of RIs were obtained, but also putative regulatory motifs in intron splicing were predicted based on our feature extraction approach. Clearly, our study will facilitate a better understanding of underlying mechanisms involved in intron retention.

Collapse

Masso M, Chuang G, Hao K, Jain S, Vaisman II. Structure-based predictors of resistance to the HIV-1 integrase inhibitor Elvitegravir. Antiviral Res 2014;106:5-12. [DOI: 10.1016/j.antiviral.2014.03.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2014] [Revised: 03/14/2014] [Accepted: 03/17/2014] [Indexed: 12/15/2022]

Zhao N, Han JG, Shyu CR, Korkin D. Determining effects of non-synonymous SNPs on protein-protein interactions using supervised and semi-supervised learning. PLoS Comput Biol 2014;10:e1003592. [PMID: 24784581 PMCID: PMC4006705 DOI: 10.1371/journal.pcbi.1003592] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2013] [Accepted: 03/13/2014] [Indexed: 12/31/2022] Open

Abstract

Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of large-scale protein-protein interaction networks, and can be useful for functional annotation of disease-associated SNPs. SNIP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/.

Collapse

Computational Approaches and Resources in Single Amino Acid Substitutions Analysis Toward Clinical Research. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014;94:365-423. [DOI: 10.1016/b978-0-12-800168-4.00010-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]

Predicting DNA binding proteins using support vector machine with hybrid fractal features. J Theor Biol 2013;343:186-92. [PMID: 24189096 DOI: 10.1016/j.jtbi.2013.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2013] [Revised: 08/12/2013] [Accepted: 10/17/2013] [Indexed: 11/20/2022]

Using protein granularity to extract the protein sequence features. J Theor Biol 2013;331:48-53. [DOI: 10.1016/j.jtbi.2013.04.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2012] [Revised: 04/16/2013] [Accepted: 04/18/2013] [Indexed: 11/21/2022]

Jahandideh S, Zhi D. Systematic investigation of predicted effect of nonsynonymous SNPs in human prion protein gene: a molecular modeling and molecular dynamics study. J Biomol Struct Dyn 2013;32:289-300. [PMID: 23527686 DOI: 10.1080/07391102.2012.763216] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Chiechi A, Novello C, Magagnoli G, Petricoin EF, Deng J, Benassi MS, Picci P, Vaisman I, Espina V, Liotta LA. Elevated TNFR1 and serotonin in bone metastasis are correlated with poor survival following bone metastasis diagnosis for both carcinoma and sarcoma primary tumors. Clin Cancer Res 2013;19:2473-85. [PMID: 23493346 DOI: 10.1158/1078-0432.ccr-12-3416] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Qiu Z, Qin C, Jiu M, Wang X. A simple iterative method to optimize protein–ligand-binding residue prediction. J Theor Biol 2013;317:219-23. [DOI: 10.1016/j.jtbi.2012.10.028] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2012] [Revised: 10/19/2012] [Accepted: 10/22/2012] [Indexed: 11/15/2022]

Analyzing effects of naturally occurring missense mutations. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2012;2012:805827. [PMID: 22577471 PMCID: PMC3346971 DOI: 10.1155/2012/805827] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/21/2011] [Revised: 02/01/2012] [Accepted: 02/01/2012] [Indexed: 11/17/2022]

Chou KC, Wu ZC, Xiao X. iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites. ACTA ACUST UNITED AC 2012;8:629-41. [PMID: 22134333 DOI: 10.1039/c1mb05420a] [Citation(s) in RCA: 270] [Impact Index Per Article: 22.5] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]

Sequence-dependent prediction of recombination hotspots in Saccharomyces cerevisiae. J Theor Biol 2012;293:49-54. [DOI: 10.1016/j.jtbi.2011.10.004] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2011] [Revised: 10/04/2011] [Accepted: 10/04/2011] [Indexed: 11/18/2022]

Henriksen SB, Mortensen RJ, Geertz-Hansen HM, Neves-Petersen MT, Arnason O, Söring J, Petersen SB. Hyperdimensional analysis of amino acid pair distributions in proteins. PLoS One 2011;6:e25638. [PMID: 22174733 PMCID: PMC3235099 DOI: 10.1371/journal.pone.0025638] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2011] [Accepted: 09/08/2011] [Indexed: 01/06/2023] Open

Abstract

Our manuscript presents a novel approach to protein structure analyses. We have organized an 8-dimensional data cube with protein 3D-structural information from 8706 high-resolution non-redundant protein-chains with the aim of identifying packing rules at the amino acid pair level. The cube contains information about amino acid type, solvent accessibility, spatial and sequence distance, secondary structure and sequence length. We are able to pose structural queries to the data cube using program ProPack. The response is a 1, 2 or 3D graph. Whereas the response is of a statistical nature, the user can obtain an instant list of all PDB-structures where such pair is found. The user may select a particular structure, which is displayed highlighting the pair in question. The user may pose millions of different queries and for each one he will receive the answer in a few seconds. In order to demonstrate the capabilities of the data cube as well as the programs, we have selected well known structural features, disulphide bridges and salt bridges, where we illustrate how the queries are posed, and how answers are given. Motifs involving cysteines such as disulphide bridges, zinc-fingers and iron-sulfur clusters are clearly identified and differentiated. ProPack also reveals that whereas pairs of Lys residues virtually never appear in close spatial proximity, pairs of Arg are abundant and appear at close spatial distance, contrasting the belief that electrostatic repulsion would prevent this juxtaposition and that Arg-Lys is perceived as a conservative mutation. The presented programs can find and visualize novel packing preferences in proteins structures allowing the user to unravel correlations between pairs of amino acids. The new tools allow the user to view statistical information and visualize instantly the structures that underpin the statistical information, which is far from trivial with most other SW tools for protein structure analysis.

Collapse

Wavelet images and Chou’s pseudo amino acid composition for protein classification. Amino Acids 2011;43:657-65. [DOI: 10.1007/s00726-011-1114-9] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2010] [Accepted: 09/28/2011] [Indexed: 10/16/2022]

Lu JL, Hu XH, Hu DG. A new hybrid fractal algorithm for predicting thermophilic nucleotide sequences. J Theor Biol 2011;293:74-81. [PMID: 22001320 DOI: 10.1016/j.jtbi.2011.09.028] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2011] [Revised: 09/23/2011] [Accepted: 09/26/2011] [Indexed: 01/20/2023]

OligoPred: A web-server for predicting homo-oligomeric proteins by incorporating discrete wavelet transform into Chou's pseudo amino acid composition. J Mol Graph Model 2011;30:129-34. [DOI: 10.1016/j.jmgm.2011.06.014] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2011] [Revised: 06/18/2011] [Accepted: 06/30/2011] [Indexed: 01/13/2023]

Jingbo X, Silan Z, Feng S, Huijuan X, Xuehai H, Xiaohui N, Zhi L. Using the concept of pseudo amino acid composition to predict resistance gene against Xanthomonas oryzae pv. oryzae in rice: An approach from chaos games representation. J Theor Biol 2011;284:16-23. [DOI: 10.1016/j.jtbi.2011.06.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2010] [Revised: 06/02/2011] [Accepted: 06/03/2011] [Indexed: 10/18/2022]

Cheng F, Theodorescu D, Schulman IG, Lee JK. In vitro transcriptomic prediction of hepatotoxicity for early drug discovery. J Theor Biol 2011;290:27-36. [PMID: 21884709 DOI: 10.1016/j.jtbi.2011.08.009] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2011] [Revised: 07/27/2011] [Accepted: 08/11/2011] [Indexed: 01/08/2023]

Nanni L, Lumini A, Gupta D, Garg A. Identifying bacterial virulent proteins by fusing a set of classifiers based on variants of Chou's pseudo amino acid composition and on evolutionary information. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011;9:467-475. [PMID: 21860064 DOI: 10.1109/tcbb.2011.117] [Citation(s) in RCA: 113] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]

Wang P, Xiao X, Chou KC. NR-2L: a two-level predictor for identifying nuclear receptor subfamilies based on sequence-derived features. PLoS One 2011;6:e23505. [PMID: 21858146 PMCID: PMC3156231 DOI: 10.1371/journal.pone.0023505] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2011] [Accepted: 07/19/2011] [Indexed: 11/18/2022] Open

Abstract

Nuclear receptors (NRs) are one of the most abundant classes of transcriptional regulators in animals. They regulate diverse functions, such as homeostasis, reproduction, development and metabolism. Therefore, NRs are a very important target for drug development. Nuclear receptors form a superfamily of phylogenetically related proteins and have been subdivided into different subfamilies due to their domain diversity. In this study, a two-level predictor, called NR-2L, was developed that can be used to identify a query protein as a nuclear receptor or not based on its sequence information alone; if it is, the prediction will be automatically continued to further identify it among the following seven subfamilies: (1) thyroid hormone like (NR1), (2) HNF4-like (NR2), (3) estrogen like, (4) nerve growth factor IB-like (NR4), (5) fushi tarazu-F1 like (NR5), (6) germ cell nuclear factor like (NR6), and (7) knirps like (NR0). The identification was made by the Fuzzy K nearest neighbor (FK-NN) classifier based on the pseudo amino acid composition formed by incorporating various physicochemical and statistical features derived from the protein sequences, such as amino acid composition, dipeptide composition, complexity factor, and low-frequency Fourier spectrum components. As a demonstration, it was shown through some benchmark datasets derived from the NucleaRDB and UniProt with low redundancy that the overall success rates achieved by the jackknife test were about 93% and 89% in the first and second level, respectively. The high success rates indicate that the novel two-level predictor can be a useful vehicle for identifying NRs and their subfamilies. As a user-friendly web server, NR-2L is freely accessible at either http://icpr.jci.edu.cn/bioinfo/NR2L or http://www.jci-bioinfo.cn/NR2L. Each job submitted to NR-2L can contain up to 500 query protein sequences and be finished in less than 2 minutes. The less the number of query proteins is, the shorter the time will usually be. All the program codes for NR-2L are available for non-commercial purpose upon request.

Collapse

Hu LL, Huang T, Cai YD, Chou KC. Prediction of body fluids where proteins are secreted into based on protein interaction network. PLoS One 2011;6:e22989. [PMID: 21829572 PMCID: PMC3146524 DOI: 10.1371/journal.pone.0022989] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2011] [Accepted: 07/08/2011] [Indexed: 12/27/2022] Open

A multi-label classifier for predicting the subcellular localization of gram-negative bacterial proteins with both single and multiple sites. PLoS One 2011;6:e20592. [PMID: 21698097 PMCID: PMC3117797 DOI: 10.1371/journal.pone.0020592] [Citation(s) in RCA: 176] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2011] [Accepted: 05/04/2011] [Indexed: 11/21/2022] Open

Abstract

Prediction of protein subcellular localization is a challenging problem, particularly when the system concerned contains both singleplex and multiplex proteins. In this paper, by introducing the “multi-label scale” and hybridizing the information of gene ontology with the sequential evolution information, a novel predictor called iLoc-Gneg is developed for predicting the subcellular localization of Gram-positive bacterial proteins with both single-location and multiple-location sites. For facilitating comparison, the same stringent benchmark dataset used to estimate the accuracy of Gneg-mPLoc was adopted to demonstrate the power of iLoc-Gneg. The dataset contains 1,392 Gram-negative bacterial proteins classified into the following eight locations: (1) cytoplasm, (2) extracellular, (3) fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane, and (8) periplasm. Of the 1,392 proteins, 1,328 are each with only one subcellular location and the other 64 are each with two subcellular locations, but none of the proteins included has pairwise sequence identity to any other in a same subset (subcellular location). It was observed that the overall success rate by jackknife test on such a stringent benchmark dataset by iLoc-Gneg was over 91%, which is about 6% higher than that by Gneg-mPLoc. As a user-friendly web-server, iLoc-Gneg is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Gneg. Meanwhile, a step-by-step guide is provided on how to use the web-server to get the desired results. Furthermore, for the user's convenience, the iLoc-Gneg web-server also has the function to accept the batch job submission, which is not available in the existing version of Gneg-mPLoc web-server. It is anticipated that iLoc-Gneg may become a useful high throughput tool for Molecular Cell Biology, Proteomics, System Biology, and Drug Development.

Collapse

Roterman I, Konieczny L, Jurkowski W, Prymula K, Banach M. Two-intermediate model to characterize the structure of fast-folding proteins. J Theor Biol 2011;283:60-70. [PMID: 21635900 DOI: 10.1016/j.jtbi.2011.05.027] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2010] [Revised: 05/17/2011] [Accepted: 05/18/2011] [Indexed: 01/15/2023]

Feature importance analysis in guide strand identification of microRNAs. Comput Biol Chem 2011;35:131-6. [PMID: 21704258 DOI: 10.1016/j.compbiolchem.2011.04.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2011] [Revised: 03/22/2011] [Accepted: 04/23/2011] [Indexed: 11/22/2022]

iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins. PLoS One 2011;6:e18258. [PMID: 21483473 PMCID: PMC3068162 DOI: 10.1371/journal.pone.0018258] [Citation(s) in RCA: 241] [Impact Index Per Article: 18.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2010] [Accepted: 02/24/2011] [Indexed: 12/26/2022] Open

Abstract

Predicting protein subcellular localization is an important and difficult problem, particularly when query proteins may have the multiplex character, i.e., simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing protein subcellular location predictor can only be used to deal with the single-location or “singleplex” proteins. Actually, multiple-location or “multiplex” proteins should not be ignored because they usually posses some unique biological functions worthy of our special notice. By introducing the “multi-labeled learning” and “accumulation-layer scale”, a new predictor, called iLoc-Euk, has been developed that can be used to deal with the systems containing both singleplex and multiplex proteins. As a demonstration, the jackknife cross-validation was performed with iLoc-Euk on a benchmark dataset of eukaryotic proteins classified into the following 22 location sites: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centriole, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole, where none of proteins included has pairwise sequence identity to any other in a same subset. The overall success rate thus obtained by iLoc-Euk was 79%, which is significantly higher than that by any of the existing predictors that also have the capacity to deal with such a complicated and stringent system. As a user-friendly web-server, iLoc-Euk is freely accessible to the public at the web-site http://icpr.jci.edu.cn/bioinfo/iLoc-Euk. It is anticipated that iLoc-Euk may become a useful bioinformatics tool for Molecular Cell Biology, Proteomics, System Biology, and Drug Development Also, its novel approach will further stimulate the development of predicting other protein attributes.

Collapse

Wu ZC, Xiao X, Chou KC. iLoc-Plant: a multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites. MOLECULAR BIOSYSTEMS 2011;7:3287-97. [PMID: 21984117 DOI: 10.1039/c1mb05232b] [Citation(s) in RCA: 181] [Impact Index Per Article: 13.9] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

AFP-Pred: A random forest approach for predicting antifreeze proteins from sequence-derived properties. J Theor Biol 2010;270:56-62. [PMID: 21056045 DOI: 10.1016/j.jtbi.2010.10.037] [Citation(s) in RCA: 191] [Impact Index Per Article: 13.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2010] [Revised: 10/29/2010] [Accepted: 10/29/2010] [Indexed: 12/11/2022]