Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Lee K, Kim DW, Na D, Lee KH, Lee D. PLPD: reliable protein localization prediction from imbalanced and overlapped datasets. Nucleic Acids Res 2006;34:4655-66. [PMID: 16966337 PMCID: PMC1636404 DOI: 10.1093/nar/gkl638] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

For:	Lee K, Kim DW, Na D, Lee KH, Lee D. PLPD: reliable protein localization prediction from imbalanced and overlapped datasets. Nucleic Acids Res 2006;34:4655-66. [PMID: 16966337 PMCID: PMC1636404 DOI: 10.1093/nar/gkl638] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Number

Cited by Other Article(s)

Jiang Y, Wang D, Wang W, Xu D. Computational methods for protein localization prediction. Comput Struct Biotechnol J 2021;19:5834-5844. [PMID: 34765098 PMCID: PMC8564054 DOI: 10.1016/j.csbj.2021.10.023] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Revised: 10/12/2021] [Accepted: 10/13/2021] [Indexed: 12/16/2022] Open

Lu AX, Chong YT, Hsu IS, Strome B, Handfield LF, Kraus O, Andrews BJ, Moses AM. Integrating images from multiple microscopy screens reveals diverse patterns of change in the subcellular localization of proteins. eLife 2018;7:e31872. [PMID: 29620521 PMCID: PMC5935485 DOI: 10.7554/elife.31872] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2017] [Accepted: 03/30/2018] [Indexed: 01/29/2023] Open

Liu Z, Hu J. Mislocalization-related disease gene discovery using gene expression based computational protein localization prediction. Methods 2015;93:119-27. [PMID: 26416496 DOI: 10.1016/j.ymeth.2015.09.022] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2015] [Revised: 09/17/2015] [Accepted: 09/21/2015] [Indexed: 01/09/2023] Open

Wang X, Zhang J, Li GZ. Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble. BMC Bioinformatics 2015;16 Suppl 12:S1. [PMID: 26329681 PMCID: PMC4705491 DOI: 10.1186/1471-2105-16-s12-s1] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023] Open

Abstract

Background

It has become a very important and full of challenge task to predict bacterial protein subcellular locations using computational methods. Although there exist a lot of prediction methods for bacterial proteins, the majority of these methods can only deal with single-location proteins. But unfortunately many multi-location proteins are located in the bacterial cells. Moreover, multi-location proteins have special biological functions capable of helping the development of new drugs. So it is necessary to develop new computational methods for accurately predicting subcellular locations of multi-location bacterial proteins.

Results

In this article, two efficient multi-label predictors, Gpos-ECC-mPLoc and Gneg-ECC-mPLoc, are developed to predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. The two multi-label predictors construct the GO vectors by using the GO terms of homologous proteins of query proteins and then adopt a powerful multi-label ensemble classifier to make the final multi-label prediction. The two multi-label predictors have the following advantages: (1) they improve the prediction performance of multi-label proteins by taking the correlations among different labels into account; (2) they ensemble multiple CC classifiers and further generate better prediction results by ensemble learning; and (3) they construct the GO vectors by using the frequency of occurrences of GO terms in the typical homologous set instead of using 0/1 values. Experimental results show that Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively.

Conclusions

Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently improve prediction accuracy of subcellular localization of multi-location gram-positive and gram-negative bacterial proteins respectively. The online web servers for Gpos-ECC-mPLoc and Gneg-ECC-mPLoc predictors are freely accessible at http://biomed.zzuli.edu.cn/bioinfo/gpos-ecc-mploc/ and http://biomed.zzuli.edu.cn/bioinfo/gneg-ecc-mploc/ respectively.

Collapse

Proteome-wide remodeling of protein location and function by stress. Proc Natl Acad Sci U S A 2014;111:E3157-66. [PMID: 25028499 DOI: 10.1073/pnas.1318881111] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open

Zhang SW, Liu YF, Yu Y, Zhang TH, Fan XN. MSLoc-DT: A new method for predicting the protein subcellular location of multispecies based on decision templates. Anal Biochem 2014;449:164-71. [DOI: 10.1016/j.ab.2013.12.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2013] [Revised: 11/08/2013] [Accepted: 12/12/2013] [Indexed: 12/12/2022]

Lee K, Byun K, Hong W, Chuang HY, Pack CG, Bayarsaikhan E, Paek SH, Kim H, Shin HY, Ideker T, Lee B. Proteome-wide discovery of mislocated proteins in cancer. Genome Res 2013;23:1283-94. [PMID: 23674306 PMCID: PMC3730102 DOI: 10.1101/gr.155499.113] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Wan S, Mak MW, Kung SY. GOASVM: a subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou's pseudo-amino acid composition. J Theor Biol 2013;323:40-8. [PMID: 23376577 DOI: 10.1016/j.jtbi.2013.01.012] [Citation(s) in RCA: 82] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2012] [Revised: 01/16/2013] [Accepted: 01/16/2013] [Indexed: 01/03/2023]

Wan S, Mak MW, Kung SY. mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines. BMC Bioinformatics 2012;13:290. [PMID: 23130999 PMCID: PMC3582598 DOI: 10.1186/1471-2105-13-290] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2012] [Accepted: 10/24/2012] [Indexed: 12/21/2022] Open

Abstract

Background

Although many computational methods have been developed to predict protein subcellular localization, most of the methods are limited to the prediction of single-location proteins. Multi-location proteins are either not considered or assumed not existing. However, proteins with multiple locations are particularly interesting because they may have special biological functions, which are essential to both basic research and drug discovery.

Results

This paper proposes an efficient multi-label predictor, namely mGOASVM, for predicting the subcellular localization of multi-location proteins. Given a protein, the accession numbers of its homologs are obtained via BLAST search. Then, the original accession number and the homologous accession numbers of the protein are used as keys to search against the Gene Ontology (GO) annotation database to obtain a set of GO terms. Given a set of training proteins, a set of T relevant GO terms is obtained by finding all of the GO terms in the GO annotation database that are relevant to the training proteins. These relevant GO terms then form the basis of a T-dimensional Euclidean space on which the GO vectors lie. A support vector machine (SVM) classifier with a new decision scheme is proposed to classify the multi-label GO vectors. The mGOASVM predictor has the following advantages: (1) it uses the frequency of occurrences of GO terms for feature representation; (2) it selects the relevant GO subspace which can substantially speed up the prediction without compromising performance; and (3) it adopts an efficient multi-label SVM classifier which significantly outperforms other predictors. Briefly, on two recently published virus and plant datasets, mGOASVM achieves an actual accuracy of 88.9% and 87.4%, respectively, which are significantly higher than those achieved by the state-of-the-art predictors such as iLoc-Virus (74.8%) and iLoc-Plant (68.1%).

Conclusions

mGOASVM can efficiently predict the subcellular locations of multi-label proteins. The mGOASVM predictor is available online at http://bioinfo.eie.polyu.edu.hk/mGoaSvmServer/mGOASVM.html.

Collapse

He J, Gu H, Liu W. Imbalanced multi-modal multi-label learning for subcellular localization prediction of human proteins with both single and multiple sites. PLoS One 2012;7:e37155. [PMID: 22715364 PMCID: PMC3371015 DOI: 10.1371/journal.pone.0037155] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2011] [Accepted: 04/14/2012] [Indexed: 12/20/2022] Open

Wu ZC, Xiao X, Chou KC. iLoc-Plant: a multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites. MOLECULAR BIOSYSTEMS 2011;7:3287-97. [PMID: 21984117 DOI: 10.1039/c1mb05232b] [Citation(s) in RCA: 181] [Impact Index Per Article: 13.9] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Jung J, Ryu T, Hwang Y, Lee E, Lee D. Prediction of extracellular matrix proteins based on distinctive sequence and domain characteristics. J Comput Biol 2010;17:97-105. [PMID: 20078400 DOI: 10.1089/cmb.2008.0236] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Lin HN, Chen CT, Sung TY, Ho SY, Hsu WL. Protein subcellular localization prediction of eukaryotes using a knowledge-based approach. BMC Bioinformatics 2009;10 Suppl 15:S8. [PMID: 19958518 PMCID: PMC2788359 DOI: 10.1186/1471-2105-10-s15-s8] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The study of protein subcellular localization (PSL) is important for elucidating protein functions involved in various cellular processes. However, determining the localization sites of a protein through wet-lab experiments can be time-consuming and labor-intensive. Thus, computational approaches become highly desirable. Most of the PSL prediction systems are established for single-localized proteins. However, a significant number of eukaryotic proteins are known to be localized into multiple subcellular organelles. Many studies have shown that proteins may simultaneously locate or move between different cellular compartments and be involved in different biological processes with different roles.

RESULTS

In this study, we propose a knowledge based method, called KnowPredsite, to predict the localization site(s) of both single-localized and multi-localized proteins. Based on the local similarity, we can identify the "related sequences" for prediction. We construct a knowledge base to record the possible sequence variations for protein sequences. When predicting the localization annotation of a query protein, we search against the knowledge base and used a scoring mechanism to determine the predicted sites. We downloaded the dataset from ngLOC, which consisted of ten distinct subcellular organelles from 1923 species, and performed ten-fold cross validation experiments to evaluate KnowPred site's performance. The experiment results show that KnowPred site achieves higher prediction accuracy than ngLOC and Blast-hit method. For single-localized proteins, the overall accuracy of KnowPred site is 91.7%. For multi-localized proteins, the overall accuracy of KnowPred site is 72.1%, which is significantly higher than that of ngLOC by 12.4%. Notably, half of the proteins in the dataset that cannot find any Blast hit sequence above a specified threshold can still be correctly predicted by KnowPred site.

CONCLUSION

KnowPred site demonstrates the power of identifying related sequences in the knowledge base. The experiment results show that even though the sequence similarity is low, the local similarity is effective for prediction. Experiment results show that KnowPred site is a highly accurate prediction method for both single- and multi-localized proteins. It is worth-mentioning the prediction process of KnowPred site is transparent and biologically interpretable and it shows a set of template sequences to generate the prediction result. The KnowPred site prediction server is available at http://bio-cluster.iis.sinica.edu.tw/kbloc/.

Collapse

A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0. Anal Biochem 2009;394:269-74. [DOI: 10.1016/j.ab.2009.07.046] [Citation(s) in RCA: 135] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2009] [Revised: 07/28/2009] [Accepted: 07/28/2009] [Indexed: 12/12/2022]

Tung TQ, Lee D. A method to improve protein subcellular localization prediction by integrating various biological data sources. BMC Bioinformatics 2009;10 Suppl 1:S43. [PMID: 19208145 PMCID: PMC2648781 DOI: 10.1186/1471-2105-10-s1-s43] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open

Lee K, Chuang HY, Beyer A, Sung MK, Huh WK, Lee B, Ideker T. Protein networks markedly improve prediction of subcellular localization in multiple eukaryotic species. Nucleic Acids Res 2008;36:e136. [PMID: 18836191 PMCID: PMC2582614 DOI: 10.1093/nar/gkn619] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

An ensemble of support vector machines for predicting the membrane protein type directly from the amino acid sequence. Amino Acids 2008;35:573-80. [DOI: 10.1007/s00726-008-0083-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2008] [Accepted: 02/26/2008] [Indexed: 11/26/2022]

Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms. Nat Protoc 2008;3:153-62. [PMID: 18274516 DOI: 10.1038/nprot.2007.494] [Citation(s) in RCA: 690] [Impact Index Per Article: 43.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]

Nanni L, Lumini A. Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization. Amino Acids 2008;34:653-60. [PMID: 18175047 DOI: 10.1007/s00726-007-0018-1] [Citation(s) in RCA: 124] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2007] [Accepted: 12/11/2007] [Indexed: 01/25/2023]

Tantoso E, Li KB. AAIndexLoc: predicting subcellular localization of proteins based on a new representation of sequences using amino acid indices. Amino Acids 2007;35:345-53. [DOI: 10.1007/s00726-007-0616-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2007] [Accepted: 10/04/2007] [Indexed: 10/22/2022]

Chou KC, Shen HB. Recent progress in protein subcellular location prediction. Anal Biochem 2007;370:1-16. [PMID: 17698024 DOI: 10.1016/j.ab.2007.07.006] [Citation(s) in RCA: 603] [Impact Index Per Article: 35.5] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2007] [Revised: 07/02/2007] [Accepted: 07/04/2007] [Indexed: 01/16/2023]

Shen HB, Yang J, Chou KC. Methodology development for predicting subcellular localization and other attributes of proteins. Expert Rev Proteomics 2007;4:453-63. [PMID: 17705704 DOI: 10.1586/14789450.4.4.453] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Mundra P, Kumar M, Kumar KK, Jayaraman VK, Kulkarni BD. Using pseudo amino acid composition to predict protein subnuclear localization: Approached with PSSM. Pattern Recognit Lett 2007. [DOI: 10.1016/j.patrec.2007.04.001] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Su ECY, Chiu HS, Lo A, Hwang JK, Sung TY, Hsu WL. Protein subcellular localization prediction based on compartment-specific features and structure conservation. BMC Bioinformatics 2007;8:330. [PMID: 17825110 PMCID: PMC2040162 DOI: 10.1186/1471-2105-8-330] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2007] [Accepted: 09/08/2007] [Indexed: 01/17/2023] Open

Abstract

Background

Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction have led to the development of several methods including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from the problem of low coverage in high-throughput proteomic analyses due to the lack of information to characterize unknown proteins.

Results

We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machines (SVM) model and a structural homology approach. The SVM model comprises a number of binary classifiers, in which biological features derived from Gram-negative bacteria translocation pathways are incorporated. In the structural homology approach, we employ secondary structure alignment for structural similarity comparison and assign the known localization of the top-ranked protein as the predicted localization of a query protein. The hybrid method achieves overall accuracy of 93.7% and 93.2% using ten-fold cross-validation on the benchmark data sets. In the assessment of the evaluation data sets, our method also attains accurate prediction accuracy of 84.0%, especially when testing on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets of sequence identity less than 30%.

Conclusion

Our results demonstrate that biological features derived from Gram-negative bacteria translocation pathways yield a significant improvement. The biological features are interpretable and can be applied in advanced analyses and experimental designs. Moreover, the overall accuracy of combining the structural homology approach is further improved, which suggests that structural conservation could be a useful indicator for inferring localization in addition to sequence homology. The proposed method can be used in large-scale analyses of proteomes.

Collapse

Emanuelsson O, Brunak S, von Heijne G, Nielsen H. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2007;2:953-71. [PMID: 17446895 DOI: 10.1038/nprot.2007.131] [Citation(s) in RCA: 2458] [Impact Index Per Article: 144.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Shen HB, Chou KC. Hum-mPLoc: An ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. Biochem Biophys Res Commun 2007;355:1006-11. [PMID: 17346678 DOI: 10.1016/j.bbrc.2007.02.071] [Citation(s) in RCA: 147] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2007] [Accepted: 02/09/2007] [Indexed: 10/23/2022]

Abstract

Proteins may simultaneously exist at, or move between, two or more different subcellular locations. Proteins with multiple locations or dynamic feature of this kind are particularly interesting because they may have some very special biological functions intriguing to investigators in both basic research and drug discovery. For instance, among the 6408 human protein entries that have experimentally observed subcellular location annotations in the Swiss-Prot database (version 50.7, released 19-Sept-2006), 973 ( approximately 15%) have multiple location sites. The number of total human protein entries (except those annotated with "fragment" or those with less than 50 amino acids) in the same database is 14,370, meaning a gap of (14,370-6408)=7962 entries for which no knowledge is available about their subcellular locations. Although one can use the computational approach to predict the desired information for the gap, so far all the existing methods for predicting human protein subcellular localization are limited in the case of single location site only. To overcome such a barrier, a new ensemble classifier, named Hum-mPLoc, was developed that can be used to deal with the case of multiple location sites as well. Hum-mPLoc is freely accessible to the public as a web server at http://202.120.37.186/bioinf/hum-multi. Meanwhile, for the convenience of people working in the relevant areas, Hum-mPLoc has been used to identify all human protein entries in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results thus obtained have been deposited in a downloadable file prepared with Microsoft Excel and named "Tab_Hum-mPLoc.xls". This file is available at the same website and will be updated twice a year to include new entries of human proteins and reflect the continuous development of Hum-mPLoc.

Collapse