Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: van Dijk ADJ, Bosch D, ter Braak CJF, van der Krol AR, van Ham RCHJ. Predicting sub-Golgi localization of type II membrane proteins. Bioinformatics 2008;24:1779-86. [PMID: 18562268 PMCID: PMC7110242 DOI: 10.1093/bioinformatics/btn309] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 11/17/2022]



Cited by Other Article(s)

Bao W, Gu Y, Chen B, Yu H. Golgi_DF: Golgi proteins classification with deep forest. Front Neurosci 2023;17:1197824. [PMID: 37250391 PMCID: PMC10213405 DOI: 10.3389/fnins.2023.1197824] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 03/31/2023] [Accepted: 04/19/2023] [Indexed: 05/31/2023]

Gu X, Ding Y, Xiao P, He T. A GHKNN model based on the physicochemical property extraction method to identify SNARE proteins. Front Genet 2022;13:935717. [PMID: 36506312 PMCID: PMC9727185 DOI: 10.3389/fgene.2022.935717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Accepted: 11/02/2022] [Indexed: 11/24/2022]

Le NQK, Huynh TT. Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation. Front Physiol 2019;10:1501. [PMID: 31920706 PMCID: PMC6914855 DOI: 10.3389/fphys.2019.01501] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 03/28/2019] [Accepted: 11/26/2019] [Indexed: 12/12/2022]

Lv Z, Jin S, Ding H, Zou Q. A Random Forest Sub-Golgi Protein Classifier Optimized via Dipeptide and Amino Acid Composition Features. Front Bioeng Biotechnol 2019;7:215. [PMID: 31552241 PMCID: PMC6737778 DOI: 10.3389/fbioe.2019.00215] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 07/23/2019] [Accepted: 08/22/2019] [Indexed: 02/01/2023]

Abstract

To gain insight into the malfunction of the Golgi apparatus and its relationship to various genetic and neurodegenerative diseases, the identification of sub-Golgi proteins, both cis-Golgi and trans-Golgi proteins, is of great significance. In this study, a state-of-the-art random forest sub-Golgi protein classifier, rfGPT, was developed. The rfGPT used 2-gap dipeptide and split amino acid composition for the feature vectors and was combined with the synthetic minority over-sampling technique (SMOTE) and an analysis of variance (ANOVA) feature selection method. The rfGPT was trained on a sub-Golgi protein sequence data set (137 sequences), with sequence identity less than 25%. For the optimal rfGPT classifier with 93 features, the accuracy (ACC) was 90.5%; the Matthews correlation coefficient (MCC) was 0.811; the sensitivity (Sn) was 92.6%; and the specificity (Sp) was 88.4%. The independent testing scores for the rfGPT were ACC = 90.6%; MCC = 0.696; Sn = 96.1%; and Sp = 69.2%. Although the independent testing accuracy was 4.4% lower than that for the best reported sub-Golgi classifier trained on a data set with 40% sequence identity (304 sequences), the rfGPT is currently the top sub-Golgi protein predictor utilizing feature vectors without any position-specific scoring matrix and its derivative features. Therefore, the rfGPT is a more practical tool, because no sequence alignment is required with tens of millions of protein sequences. To date, the rfGPT is the Golgi classifier with the best independent testing scores, optimized by training on smaller benchmark data sets. Feature importance analysis proves that the non-polar and aliphatic residue composition, the (aromatic residues) + (non-polar, aliphatic residues) dipeptide, and the aromatic residue composition between the NH2-terminal and COOH-terminal of protein sequences are the three top biological features for distinguishing the sub-Golgi proteins.
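As a rough consistency check on the cross-validation figures quoted in this abstract, the four metrics can be recomputed from a confusion matrix. The sketch below is not taken from the rfGPT paper: the counts (1,000 positives and 1,000 negatives, that is, perfectly balanced classes of the kind SMOTE aims for) are an assumption chosen for illustration, and the function name `binary_metrics` is hypothetical.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return (ACC, MCC, Sn, Sp) computed from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc, sn, sp

# Assumed counts: 1000 positives and 1000 negatives (balanced classes),
# chosen so that Sn = 92.6% and Sp = 88.4% as quoted in the abstract.
acc, mcc, sn, sp = binary_metrics(tp=926, fn=74, tn=884, fp=116)
print(f"ACC={acc:.3f} MCC={mcc:.3f} Sn={sn:.3f} Sp={sp:.3f}")
# prints: ACC=0.905 MCC=0.811 Sn=0.926 Sp=0.884
```

Under the balanced-class assumption, the quoted Sn and Sp imply ACC of about 0.905 and MCC of about 0.811, which agrees with the cross-validation values in the abstract. The independent-test figures cannot be checked this way without knowing the actual class counts.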


Zhao W, Li GP, Wang J, Zhou YK, Gao Y, Du PF. Predicting protein sub-Golgi locations by combining functional domain enrichment scores with pseudo-amino acid compositions. J Theor Biol 2019;473:38-43. [DOI: 10.1016/j.jtbi.2019.04.025] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/01/2019] [Revised: 04/22/2019] [Accepted: 04/29/2019] [Indexed: 12/11/2022]

Le NQK, Nguyen VN. SNARE-CNN: a 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data. PeerJ Comput Sci 2019;5:e177. [PMID: 33816830 PMCID: PMC7924420 DOI: 10.7717/peerj-cs.177] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 11/20/2018] [Accepted: 02/06/2019] [Indexed: 05/04/2023]

Rahman MS, Rahman MK, Kaykobad M, Rahman MS. isGPT: An optimized model to identify sub-Golgi protein types using SVM and Random Forest based feature selection. Artif Intell Med 2017;84:90-100. [PMID: 29183738 DOI: 10.1016/j.artmed.2017.11.003] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 10/05/2017] [Revised: 11/13/2017] [Accepted: 11/17/2017] [Indexed: 10/18/2022]

Abstract

The Golgi Apparatus (GA) is a key organelle for protein synthesis within the eukaryotic cell. The main task of the GA is to modify and sort proteins for transport throughout the cell. Proteins permeate through the GA on the ER (Endoplasmic Reticulum) facing side (cis side) and depart on the other side (trans side). Based on this phenomenon, we get two types of GA proteins, namely, cis-Golgi proteins and trans-Golgi proteins. Any dysfunction of GA proteins can result in congenital glycosylation disorders and some other forms of difficulties that may lead to neurodegenerative and inherited diseases like diabetes, cancer and cystic fibrosis. So, the exact classification of GA proteins may contribute to drug development, which will further help in medication. In this paper, we focus on building a new computational model that not only introduces easy ways to extract features from protein sequences but also optimizes classification of trans-Golgi and cis-Golgi proteins. After feature extraction, we have employed a Random Forest (RF) model to rank the features based on the importance score obtained from it. After selecting the top ranked features, we have applied a Support Vector Machine (SVM) to classify the sub-Golgi proteins. We have trained a regression model as well as a classification model and found the former to be superior. The model shows improved performance over all previous methods. As the benchmark dataset is significantly imbalanced, we have applied the Synthetic Minority Over-sampling Technique (SMOTE) to the dataset to make it balanced and have conducted experiments on both versions. Our method, namely, identification of sub-Golgi Protein Types (isGPT), achieves accuracy values of 95.4%, 95.9% and 95.3% for the 10-fold cross-validation test, jackknife test and independent test, respectively. According to different performance metrics, isGPT performs better than state-of-the-art techniques.
The source code of isGPT, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/isGPT.
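The balancing step mentioned in this abstract can be illustrated with a minimal SMOTE-style sketch. This is not the isGPT code (which is at the repository linked above) and not a library implementation; the function name `smote`, the toy data, and the parameter defaults are all assumptions. It shows only the core idea of SMOTE: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# Toy minority class: the four corners of the unit square (made-up data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
print(new_points)
```

With `k=2`, each corner's nearest neighbours are the two adjacent corners, so every synthetic point falls on an edge of the square. In practice the oversampling is applied to the training folds only, so that synthetic samples do not leak into the evaluation data.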


Ahmad J, Javed F, Hayat M. Intelligent computational model for classification of sub-Golgi protein using oversampling and fisher feature selection methods. Artif Intell Med 2017;78:14-22. [DOI: 10.1016/j.artmed.2017.05.001] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 03/26/2017] [Revised: 04/19/2017] [Accepted: 05/02/2017] [Indexed: 10/19/2022]

Prediction of Golgi-resident protein types using general form of Chou's pseudo-amino acid compositions: Approaches with minimal redundancy maximal relevance feature selection. J Theor Biol 2016;402:38-44. [PMID: 27155042 DOI: 10.1016/j.jtbi.2016.04.032] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 03/19/2016] [Revised: 04/19/2016] [Accepted: 04/26/2016] [Indexed: 11/20/2022]

Jiao YS, Du PF. Predicting Golgi-resident protein types using pseudo amino acid compositions: Approaches with positional specific physicochemical properties. J Theor Biol 2015;391:35-42. [PMID: 26702543 DOI: 10.1016/j.jtbi.2015.11.009] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 08/11/2015] [Revised: 11/17/2015] [Accepted: 11/19/2015] [Indexed: 11/24/2022]

Schoberer J, Liebminger E, Vavra U, Veit C, Castilho A, Dicker M, Maresch D, Altmann F, Hawes C, Botchway SW, Strasser R. The transmembrane domain of N-acetylglucosaminyltransferase I is the key determinant for its Golgi subcompartmentation. Plant J 2014;80:809-22. [PMID: 25230686 PMCID: PMC4282539 DOI: 10.1111/tpj.12671] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/07/2014] [Revised: 08/28/2014] [Accepted: 09/11/2014] [Indexed: 05/18/2023]

Li X, Wu X, Wu G. Robust feature generation for protein subchloroplast location prediction with a weighted GO transfer model. J Theor Biol 2014;347:84-94. [PMID: 24423409 DOI: 10.1016/j.jtbi.2014.01.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/12/2013] [Revised: 10/17/2013] [Accepted: 01/03/2014] [Indexed: 10/25/2022]

Mei S. SVM ensemble based transfer learning for large-scale membrane proteins discrimination. J Theor Biol 2013;340:105-10. [PMID: 24050851 DOI: 10.1016/j.jtbi.2013.09.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 03/28/2013] [Revised: 09/04/2013] [Accepted: 09/06/2013] [Indexed: 11/16/2022]

Using over-represented tetrapeptides to predict protein submitochondria locations. Acta Biotheor 2013;61:259-68. [PMID: 23475502 DOI: 10.1007/s10441-013-9181-9] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Received: 07/03/2012] [Accepted: 02/23/2013] [Indexed: 01/25/2023]

Mei S. Multi-label multi-kernel transfer learning for human protein subcellular localization. PLoS One 2012;7:e37716. [PMID: 22719847 PMCID: PMC3374840 DOI: 10.1371/journal.pone.0037716] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 09/29/2011] [Accepted: 04/28/2012] [Indexed: 11/19/2022]

Abstract

Recent years have witnessed much progress in computational modelling for protein subcellular localization. However, the existing sequence-based predictive models demonstrate moderate or unsatisfactory performance, and the gene ontology (GO) based models risk overestimating performance for novel proteins. Furthermore, many human proteins have multiple subcellular locations, which makes the computational modelling more complicated. To date, far too few studies have specialized in predicting the subcellular localization of human proteins that may reside in multiple cellular compartments. In this paper, we propose a multi-label multi-kernel transfer learning model for human protein subcellular localization (MLMK-TLM). MLMK-TLM proposes a multi-label confusion matrix, formally formulates three multi-labelling performance measures and adapts one-against-all multi-class probabilistic outputs to the multi-label learning scenario, on which basis it further extends our published work GO-TLM (gene ontology based transfer learning model for protein subcellular localization) and MK-TLM (multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization) for multiplex human protein subcellular localization. With the advantages of proper homolog knowledge transfer, comprehensive survey of model performance for novel proteins and multi-labelling capability, MLMK-TLM will gain more practical applicability. The experiments on a human protein benchmark dataset show that MLMK-TLM significantly outperforms the baseline model and demonstrates good multi-labelling ability for novel human proteins. Some findings (predictions) are validated by the latest Swiss-Prot database. The software can be freely downloaded at http://soft.synu.edu.cn/upload/msy.rar.


Driouich A, Follet-Gueye ML, Bernard S, Kousar S, Chevalier L, Vicré-Gibouin M, Lerouxel O. Golgi-mediated synthesis and secretion of matrix polysaccharides of the primary cell wall of higher plants. Front Plant Sci 2012;3:79. [PMID: 22639665 PMCID: PMC3355623 DOI: 10.3389/fpls.2012.00079] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 02/10/2012] [Accepted: 04/09/2012] [Indexed: 05/17/2023]

Affiliation(s)

Azeddine Driouich: Laboratoire “Glycobiologie et Matrice Extracellulaire Végétale”, UPRES EA 4358, Institut Federatif de Recherche Multidisciplinaire sur les Peptides, Plate-forme de Recherche en Imagerie Cellulaire de Haute Normandie, Université de Rouen, Mont Saint Aignan, France. *Correspondence: Azeddine Driouich, Laboratoire “Glycobiologie et Matrice Extracellulaire Végétale”, UPRES EA 4358, Institut Federatif de Recherche Multidisciplinaire sur les Peptides, Plate-forme de Recherche en Imagerie Cellulaire de Haute Normandie, Université de Rouen, Rue Tesnière, Bâtiment Henri Gadeau de Kerville, 76821 Mont Saint Aignan Cedex, France.
Marie-Laure Follet-Gueye: Laboratoire “Glycobiologie et Matrice Extracellulaire Végétale”, UPRES EA 4358, Institut Federatif de Recherche Multidisciplinaire sur les Peptides, Plate-forme de Recherche en Imagerie Cellulaire de Haute Normandie, Université de Rouen, Mont Saint Aignan, France
Sophie Bernard: Laboratoire “Glycobiologie et Matrice Extracellulaire Végétale”, UPRES EA 4358, Institut Federatif de Recherche Multidisciplinaire sur les Peptides, Plate-forme de Recherche en Imagerie Cellulaire de Haute Normandie, Université de Rouen, Mont Saint Aignan, France
Sumaira Kousar: Centre de Recherches sur les Macromolécules végétales–CNRS, Université Joseph Fourier, Grenoble, France
Laurence Chevalier: Institut des Matériaux/UMR6634/CNRS, Faculté des Sciences et Techniques, Université de Rouen, St. Etienne du Rouvray Cedex, France
Maïté Vicré-Gibouin: Laboratoire “Glycobiologie et Matrice Extracellulaire Végétale”, UPRES EA 4358, Institut Federatif de Recherche Multidisciplinaire sur les Peptides, Plate-forme de Recherche en Imagerie Cellulaire de Haute Normandie, Université de Rouen, Mont Saint Aignan, France
Olivier Lerouxel: Centre de Recherches sur les Macromolécules végétales–CNRS, Université Joseph Fourier, Grenoble, France


Du P, Li T, Wang X. Recent progress in predicting protein sub-subcellular locations. Expert Rev Proteomics 2011;8:391-404. [PMID: 21679119 DOI: 10.1586/epr.11.20] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 11/08/2022]

Mei S, Fei W, Zhou S. Gene ontology based transfer learning for protein subcellular localization. BMC Bioinformatics 2011;12:44. [PMID: 21284890 PMCID: PMC3039576 DOI: 10.1186/1471-2105-12-44] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 12/30/2010] [Accepted: 02/02/2011] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the true story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources for exploiting multi-aspect protein feature information. Gene ontology, hereinafter referred to as GO, uses a controlled vocabulary to depict biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, gene ontology has become a general protein feature that can be used to construct predictive models in computational biology. Existing models generally either concatenated the GO terms into a flat binary vector or applied majority-vote based ensemble learning for protein subcellular localization, neither of which can estimate the individual discriminative abilities of the three aspects of gene ontology.

RESULTS

In this paper, we propose a Gene Ontology Based Transfer Learning Model (GO-TLM) for large-scale protein subcellular localization. The model transfers the signature-based homologous GO terms to the target proteins, and further constructs a reliable learning system to reduce the adverse effect of the potential false GO terms that result from evolutionary divergence. We derive three GO kernels from the three aspects of gene ontology to measure the GO similarity of two proteins, and derive two other spectrum kernels to measure the similarity of two protein sequences. We use simple non-parametric cross validation to explicitly weigh the discriminative abilities of the five kernels, such that the time and space computational complexities are greatly reduced when compared to the complicated semi-definite programming and semi-indefinite linear programming. The five kernels are then linearly merged into one single kernel for protein subcellular localization. We evaluate GO-TLM performance against three baseline models: MultiLoc, MultiLoc-GO and Euk-mPLoc on the benchmark datasets the baseline models adopted. 5-fold cross validation experiments show that GO-TLM achieves substantial accuracy improvement against the baseline models: 80.38% against model Euk-mPLoc 67.40%, a 12.98% increase; 96.65% and 96.27% against model MultiLoc-GO 89.60% and 89.60%, with 7.05% and 6.67% accuracy increase on dataset MultiLoc plant and dataset MultiLoc animal, respectively; 97.14%, 95.90% and 96.85% against model MultiLoc-GO 83.70%, 90.10% and 85.70%, with accuracy increase 13.44%, 5.8% and 11.15% on dataset BaCelLoc plant, dataset BaCelLoc fungi and dataset BaCelLoc animal, respectively.
For BaCelLoc independent sets, GO-TLM achieves 81.25%, 80.45% and 79.46% on dataset BaCelLoc plant holdout, dataset BaCelLoc fungi holdout and dataset BaCelLoc animal holdout, respectively, as compared against baseline model MultiLoc-GO 76%, 60.00% and 73.00%, with accuracy increase 5.25%, 20.45% and 6.46%, respectively.
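The linear merging of several kernels into one, as described in these results, can be sketched as a weighted sum of kernel matrices. The abstract does not give the exact weighting rule, so the sketch below assumes weights proportional to each kernel's cross-validation score; the function name `combine_kernels`, the two toy matrices, and the scores are all hypothetical.

```python
def combine_kernels(kernels, cv_scores):
    """Linearly merge kernel matrices, weighting each by its share of the
    total cross-validation score (this weighting rule is an assumption)."""
    total = sum(cv_scores)
    weights = [s / total for s in cv_scores]
    n = len(kernels[0])
    merged = [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]
    return merged, weights

# Two made-up 2x2 kernel matrices with made-up CV accuracies.
k_go = [[1.0, 0.2], [0.2, 1.0]]
k_seq = [[1.0, 0.6], [0.6, 1.0]]
merged, weights = combine_kernels([k_go, k_seq], cv_scores=[0.75, 0.25])
print(weights)
print(merged)
```

Because a non-negative weighted sum of positive semi-definite matrices is itself positive semi-definite, the merged matrix can be passed to any kernel method that accepts a precomputed kernel. Estimating one score per kernel is far cheaper than solving an optimization problem over the weights, which is the complexity argument the abstract makes.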

CONCLUSIONS

Since direct homology-based GO term transfer may be prone to introducing noise and outliers to the target protein, we design an explicitly weighted kernel learning system (called Gene Ontology Based Transfer Learning Model, GO-TLM) to transfer to the target protein the known knowledge about related homologous proteins, which can reduce the risk of outliers and share knowledge between homologous proteins, and thus achieve better predictive performance for protein subcellular localization. Cross validation and independent test experimental results show that the homology-based GO term transfer and explicitly weighing the GO kernels substantially improve the prediction performance.


Oikawa A, Joshi HJ, Rennie EA, Ebert B, Manisseri C, Heazlewood JL, Scheller HV. An integrative approach to the identification of Arabidopsis and rice genes involved in xylan and secondary wall development. PLoS One 2010;5:e15481. [PMID: 21124849 PMCID: PMC2990762 DOI: 10.1371/journal.pone.0015481] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 08/20/2010] [Accepted: 09/24/2010] [Indexed: 11/19/2022]

Sun W, Jin J, Xu R, Hu W, Szulc ZM, Bielawski J, Obeid LM, Mao C. Substrate specificity, membrane topology, and activity regulation of human alkaline ceramidase 2 (ACER2). J Biol Chem 2010;285:8995-9007. [PMID: 20089856 PMCID: PMC2838321 DOI: 10.1074/jbc.m109.069203] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 09/22/2009] [Revised: 01/14/2010] [Indexed: 11/06/2022]

Abstract

Human alkaline ceramidase 2 (ACER2) plays an important role in cellular responses by regulating the hydrolysis of ceramides in cells. Here we report its biochemical characterization, membrane topology, and activity regulation. Recombinant ACER2 was expressed in yeast mutant cells (Δypc1Δydc1) that lack endogenous ceramidase activity, and microsomes from ACER2-expressing yeast cells were used to biochemically characterize ACER2. ACER2 catalyzed the hydrolysis of various ceramides and followed Michaelis-Menten kinetics. ACER2 required Ca2+ for both its in vitro and cellular activities. ACER2 has 7 putative transmembrane domains, and its amino (N) and carboxyl (C) termini were found to be oriented in the lumen of the Golgi complex and cytosol, respectively. ACER2 mutant (ACER2ΔN36) lacking the N-terminal tail (the first 36 amino acid residues) exhibited undetectable activity and was mislocalized to the endoplasmic reticulum, suggesting that the N-terminal tail is necessary for both ACER2 activity and Golgi localization. ACER2 mutant (ACER2ΔN13) lacking the first 13 residues was also mislocalized to the endoplasmic reticulum although it retained ceramidase activity. Overexpression of ACER2 or ACER2ΔN13, but not ACER2ΔN36, increased the release of sphingosine 1-phosphate from cells, suggesting that its mislocalization does not affect the ability of ACER2 to regulate sphingosine 1-phosphate secretion. However, overexpression of ACER2 but not ACER2ΔN13 or ACER2ΔN36 inhibited the glycosylation of integrin β1 subunit and Lamp1, suggesting that its mistargeting abolishes the ability of ACER2 to regulate protein glycosylation. These data suggest that ACER2 has broad substrate specificity and requires Ca2+ for its activity and that ACER2 has the cytosolic C terminus and luminal N terminus, which are essential for its activity, correct cellular localization, and regulation of protein glycosylation.


Mei S, Fei W. Amino acid classification based spectrum kernel fusion for protein subnuclear localization. BMC Bioinformatics 2010;11 Suppl 1:S17. [PMID: 20122188 PMCID: PMC3009488 DOI: 10.1186/1471-2105-11-s1-s17] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 12/04/2022]

Abstract

BACKGROUND

Prediction of protein localization in subnuclear organelles is more challenging than general protein subcellular localization. There are only three computational models for protein subnuclear localization thus far, to the best of our knowledge. Two models were based on protein primary sequence only. The first model assumed a homogeneous amino acid substitution pattern across all protein sequence residue sites and used BLOSUM62 to encode k-mers of the protein sequence. An ensemble of SVMs based on different k-mers drew the final conclusion, achieving 50% overall accuracy. The simplified assumption did not exploit the protein sequence profile and ignored the fact of heterogeneous amino acid substitution patterns across sites. The second model derived the PsePSSM feature representation from the protein sequence by simply averaging the profile PSSM and combined the PseAA feature representation to construct a kNN ensemble classifier Nuc-PLoc, achieving 67.4% overall accuracy. The two models based on protein primary sequence only both achieved relatively poor predictive performance. The third model required that GO annotations be available, thus restricting the model's applicability.

METHODS

In this paper, we use only the amino acid information of the protein sequence, without any other information, to design a widely applicable model for protein subnuclear localization. We use the K-spectrum kernel to exploit the contextual information around an amino acid and the conserved motif information. Besides expanding the window size, we adopt various amino acid classification approaches to capture diverse aspects of amino acid physicochemical properties. Each amino acid classification generates a series of spectrum kernels based on different window sizes. Thus, (I) window expansion can capture more contextual information and cover size-varying motifs; (II) various amino acid classifications can exploit multi-aspect biological information from the protein sequence. Finally, we combine all the spectrum kernels by simple addition into one single kernel called SpectrumKernel+ for protein subnuclear localization.
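The construction described in this paragraph (spectrum kernels over several window sizes and several amino acid classifications, combined by simple addition) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the two-class hydrophobic/polar grouping `HP`, the window sizes, and all function names are assumptions.

```python
from collections import Counter

def spectrum_kernel(s, t, k):
    """k-spectrum kernel: inner product of the k-mer count vectors of s and t."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[m] * ct[m] for m in cs)

def reduce_alphabet(seq, groups):
    """Replace each residue by its group label; unknown residues become 'x'."""
    lookup = {aa: label for label, members in groups.items() for aa in members}
    return "".join(lookup.get(aa, "x") for aa in seq)

# Hypothetical two-class grouping: 'h' = hydrophobic, 'p' = polar/other.
HP = {"h": "AVLIMFWC", "p": "GSTYNQDEKRHP"}

def summed_kernel(s, t, ks=(1, 2, 3), groupings=(None, HP)):
    """Add spectrum kernels over all window sizes and all groupings
    (None means the raw 20-letter alphabet)."""
    total = 0
    for g in groupings:
        a, b = (s, t) if g is None else (reduce_alphabet(s, g), reduce_alphabet(t, g))
        total += sum(spectrum_kernel(a, b, k) for k in ks)
    return total

print(summed_kernel("AVG", "AVK"))
```

For the toy pair `AVG` and `AVK`, the raw alphabet contributes only the shared k-mers `A`, `V` and `AV`, while the reduced alphabet maps both sequences to `hhp` and so matches at every window size. This is the sense in which a residue classification can expose similarity that exact k-mer matching misses.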

RESULTS

We conduct the performance evaluation experiments on two benchmark datasets: Lei and Nuc-PLoc. Experimental results show that SpectrumKernel+ achieves substantial performance improvement over the previous model Nuc-PLoc, with overall accuracy of 83.47% against 67.4%, and achieves 71.23% against 50% for Lei SVM Ensemble and 66.50% for Lei GO SVM Ensemble.

CONCLUSION

The method SpectrumKernel+ can exploit rich amino acid information of the protein sequence by embedding into implicit size-varying motifs the multi-aspect amino acid physicochemical properties captured by amino acid classification approaches. The kernels derived from diverse amino acid classification approaches and different sizes of k-mer are summed together for data integration. Experiments show that the method SpectrumKernel+ significantly outperforms the existing models for protein subnuclear localization.
