51
Yang Z, Wang J, Zheng Z, Bai X. A New Method for Recognizing Cytokines Based on Feature Combination and a Support Vector Machine Classifier. Molecules 2018; 23:E2008. [PMID: 30103521 PMCID: PMC6222536 DOI: 10.3390/molecules23082008] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/19/2018] [Revised: 07/31/2018] [Accepted: 08/07/2018] [Indexed: 12/14/2022]
Abstract
Research on cytokine recognition is of great significance in the medical field because cytokines aid in the diagnosis and treatment of diseases, but current methods for cytokine recognition have shortcomings such as low sensitivity and low F-score. This paper therefore proposes a new method based on feature combination. The features are extracted from amino acid composition, physicochemical properties, secondary structure, and evolutionary information, and a support vector machine (SVM) is used as the classifier. Experiments show that our method outperforms other methods in terms of accuracy, sensitivity, specificity, F-score, and Matthews correlation coefficient.
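The five evaluation measures named in this abstract are standard confusion-matrix statistics for binary classification. As a hedged illustration (the helper names and the counts below are hypothetical, not taken from the paper), they can be computed from true/false positive and negative counts like this:

```python
import math


def _ratio(num: float, den: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)            # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)            # true negative rate
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    precision = _ratio(tp, tp + fp)
    # F-score: harmonic mean of precision and sensitivity.
    f_score = _ratio(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient, in [-1, 1]; 0.0 if a marginal is empty.
    mcc = _ratio(tp * tn - fp * fn,
                 math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy,
            "F": f_score, "MCC": mcc}


# Hypothetical confusion-matrix counts, for illustration only.
print(binary_metrics(tp=80, tn=90, fp=10, fn=20))
```

MCC is often preferred over accuracy when positive and negative classes are imbalanced, because it uses all four cells of the confusion matrix.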
Affiliation(s)
- Zhe Yang
- School of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China.
- Juan Wang
- School of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China.
- Zhida Zheng
- School of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China.
- Xin Bai
- School of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China.

52
Arabidopsis Heat Stress-Induced Proteins Are Enriched in Electrostatically Charged Amino Acids and Intrinsically Disordered Regions. Int J Mol Sci 2018; 19:ijms19082276. [PMID: 30081447 PMCID: PMC6121531 DOI: 10.3390/ijms19082276] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/09/2018] [Revised: 07/24/2018] [Accepted: 07/31/2018] [Indexed: 01/06/2023]
Abstract
Comparison of the proteins of thermophilic, mesophilic, and psychrophilic prokaryotes has revealed several features characteristic of proteins adapted to high temperatures that increase their thermostability. These characteristics include a profusion of disulfide bonds, salt bridges, hydrogen bonds, and hydrophobic interactions, and a depletion of intrinsically disordered regions. It is unclear, however, whether such differences can also be observed in eukaryotic proteins, or when comparing proteins adapted to more subtly different temperatures. When an organism is exposed to high temperatures, a subset of its proteins is overexpressed (heat-induced proteins), whereas others are either repressed (heat-repressed proteins) or remain unaffected. Here, we determine the expression levels of all genes in the eukaryotic model system Arabidopsis thaliana at 22 and 37 °C, and compare both the amino acid composition and the level of intrinsic disorder of heat-induced and heat-repressed proteins. We show that, compared to heat-repressed proteins, heat-induced proteins are enriched in electrostatically charged amino acids and depleted in polar amino acids, mirroring thermophile proteins. However, in contrast with thermophile proteins, heat-induced proteins are enriched in intrinsically disordered regions and depleted in hydrophobic amino acids. Our results indicate that temperature adaptation at the level of amino acid composition and intrinsic disorder can be observed not only in proteins of thermophilic organisms but also in eukaryotic heat-induced proteins; the underlying adaptation pathways, however, are similar but not identical.
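Comparing heat-induced and heat-repressed proteins by amino acid composition reduces to computing, for each protein, the fraction of residues in each physicochemical class and then averaging over each protein set. A minimal sketch follows; the residue groupings are an assumption for illustration (the paper's exact class definitions may differ), and the sequences are hypothetical:

```python
from collections import Counter

# Illustrative residue groups (an assumption; studies differ, e.g. on
# whether His, Cys, Tyr, Gly or Pro belong to any of these classes).
GROUPS = {
    "charged": set("DEKR"),
    "polar": set("STNQ"),
    "hydrophobic": set("AVILMFW"),
}


def group_fractions(seq: str) -> dict:
    """Return the fraction of residues in `seq` belonging to each group."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {name: sum(counts[aa] for aa in members) / len(seq)
            for name, members in GROUPS.items()}


def mean_fraction(seqs, group: str) -> float:
    """Average group fraction over a set of proteins (e.g. heat-induced)."""
    return sum(group_fractions(s)[group] for s in seqs) / len(seqs)


# Hypothetical sequence: 6 of 13 residues are charged under this grouping.
print(group_fractions("MKDEESTLLAVKR"))
```

Enrichment would then be assessed by comparing `mean_fraction` (or the full distributions) between the two protein sets with an appropriate statistical test.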

53
Classifying the molecular functions of Rab GTPases in membrane trafficking using deep convolutional neural networks. Anal Biochem 2018; 555:33-41. [DOI: 10.1016/j.ab.2018.06.011] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 04/16/2018] [Revised: 06/07/2018] [Accepted: 06/12/2018] [Indexed: 01/26/2023]

54
Predicting protein submitochondrial locations by incorporating the pseudo-position specific scoring matrix into the general Chou's pseudo-amino acid composition. J Theor Biol 2018; 450:86-103. [DOI: 10.1016/j.jtbi.2018.04.026] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 02/08/2018] [Revised: 04/10/2018] [Accepted: 04/16/2018] [Indexed: 01/16/2023]

55
da Costa WLO, Araújo CLDA, Dias LM, Pereira LCDS, Alves JTC, Araújo FA, Folador EL, Henriques I, Silva A, Folador ARC. Functional annotation of hypothetical proteins from the Exiguobacterium antarcticum strain B7 reveals proteins involved in adaptation to extreme environments, including high arsenic resistance. PLoS One 2018; 13:e0198965. [PMID: 29940001 PMCID: PMC6016940 DOI: 10.1371/journal.pone.0198965] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 01/12/2018] [Accepted: 05/28/2018] [Indexed: 02/07/2023]
Abstract
Exiguobacterium antarcticum strain B7 is a psychrophilic Gram-positive bacterium that possesses enzymes that can be used in several biotechnological applications. However, many proteins encoded in its genome are considered hypothetical proteins (HPs). These functionally uncharacterized proteins may have important roles in the biology of this bacterium, and bioinformatics tools can aid understanding of this organism through functional annotation. Thus, our study aimed to assign functions to proteins previously described as HPs in the genome of E. antarcticum B7. We used an extensive in silico workflow combining several bioinformatics tools for functional annotation, subcellular localization and physicochemical characterization, three-dimensional structure determination, and protein-protein interaction analysis. This genome contains 2772 genes, of which 765 CDS were annotated as HPs. The amino acid sequences of all HPs were submitted to our workflow, and we successfully assigned functions to 132 HPs. We identified 11 proteins that play important roles in mechanisms of adaptation to adverse environments, such as flagellar biosynthesis, biofilm formation, and carotenoid biosynthesis, among others. In addition, three predicted HPs are possibly related to arsenic tolerance. Through an in vitro assay, we verified that E. antarcticum B7 can grow at high concentrations of this metal. The approach was important for precisely assigning functions to proteins from diverse classes and for inferring relationships with proteins whose functions are already described in the literature. This approach aims to provide a better understanding of the mechanisms by which this bacterium adapts to extreme environments and to identify targets of biotechnological interest.
Affiliation(s)
- Wana Lailan Oliveira da Costa
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Pará, Belém, Pará, Brazil
- Carlos Leonardo de Aragão Araújo
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Pará, Belém, Pará, Brazil
- Larissa Maranhão Dias
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Pará, Belém, Pará, Brazil
- Lino César de Sousa Pereira
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Pará, Belém, Pará, Brazil
- Jorianne Thyeska Castro Alves
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Pará, Belém, Pará, Brazil
- Fabrício Almeida Araújo
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Pará, Belém, Pará, Brazil
- Edson Luiz Folador
- Biotechnology Center, Federal University of Paraíba, João Pessoa, Paraíba, Brazil
- Isabel Henriques
- Biology Department & CESAM, University of Aveiro, Aveiro, Portugal
- Artur Silva
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Pará, Belém, Pará, Brazil
- Adriana Ribeiro Carneiro Folador
- Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Pará, Belém, Pará, Brazil

56
Yu B, Li S, Qiu W, Wang M, Du J, Zhang Y, Chen X. Prediction of subcellular location of apoptosis proteins by incorporating PsePSSM and DCCA coefficient based on LFDA dimensionality reduction. BMC Genomics 2018; 19:478. [PMID: 29914358 PMCID: PMC6006758 DOI: 10.1186/s12864-018-4849-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 12/15/2017] [Accepted: 06/01/2018] [Indexed: 01/05/2023]
Abstract
BACKGROUND Apoptosis is associated with several human diseases, including cancer, autoimmune disease, neurodegenerative disease, and ischemic damage. Information on the subcellular localization of apoptosis proteins is very important for understanding the mechanism of programmed cell death and for drug development. However, predicting the subcellular localization of apoptosis proteins remains a challenging task. RESULTS In this paper, we propose a novel method for predicting apoptosis protein subcellular localization, called PsePSSM-DCCA-LFDA. First, features are extracted from the protein sequences by combining the pseudo-position specific scoring matrix (PsePSSM) and the detrended cross-correlation analysis coefficient (DCCA coefficient); the dimensionality of the extracted features is then reduced by local Fisher discriminant analysis (LFDA). Finally, the optimal feature vectors are input to an SVM classifier to predict the subcellular location of the apoptosis proteins. Overall prediction accuracies of 99.7%, 99.6%, and 100% are achieved on the three benchmark datasets, respectively, under the stringent jackknife test, which is better than other state-of-the-art methods. CONCLUSION The experimental results indicate that our method can significantly improve the prediction accuracy of subcellular localization of apoptosis proteins and could become a promising tool for further proteomics studies. The source code and all datasets are available at https://github.com/QUST-BSBRC/PsePSSM-DCCA-LFDA/ .
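PsePSSM turns a variable-length L x 20 PSSM into a fixed-length vector: the 20 column means, plus, for each lag 1..ξ, the 20 mean squared differences between rows that are lag positions apart (capturing sequence-order information). The sketch below follows this common formulation; the logistic squashing of raw PSSM scores is an assumption (one frequently used normalization; the paper may standardize differently), and the DCCA and LFDA steps are not reproduced:

```python
import math
from typing import List


def pse_pssm(pssm: List[List[float]], max_lag: int) -> List[float]:
    """PsePSSM-style feature vector from an L x 20 PSSM.

    Returns 20 column means followed, for each lag 1..max_lag, by
    20 mean squared differences between rows `lag` positions apart,
    i.e. 20 * (1 + max_lag) values in total.
    """
    L = len(pssm)
    if L <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    # Assumption: map raw log-odds scores into (0, 1) with a logistic function.
    e = [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]
    n_cols = len(e[0])
    # Column means: the composition-like part of the vector.
    features = [sum(e[i][j] for i in range(L)) / L for j in range(n_cols)]
    # Sequence-order correlation factors for each lag.
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            s = sum((e[i][j] - e[i + lag][j]) ** 2 for i in range(L - lag))
            features.append(s / (L - lag))
    return features


# Hypothetical 5-residue PSSM of all-zero scores: means 0.5, correlations 0.
print(len(pse_pssm([[0.0] * 20 for _ in range(5)], max_lag=2)))  # → 60
```

Because the output length depends only on `max_lag`, proteins of different lengths map to vectors of the same size, which is what a downstream classifier such as an SVM requires.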
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Life Sciences, University of Science and Technology of China, Hefei, 230027, China.
- Shan Li
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Wenying Qiu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Minghui Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Junwei Du
- College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, 266061, China
- Yusen Zhang
- School of Mathematics and Statistics, Shandong University at Weihai, Weihai, 264209, China
- Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 21116, China

57
Wang S, Yue Y. Protein subnuclear localization based on a new effective representation and intelligent kernel linear discriminant analysis by dichotomous greedy genetic algorithm. PLoS One 2018; 13:e0195636. [PMID: 29649330 PMCID: PMC5896989 DOI: 10.1371/journal.pone.0195636] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/15/2017] [Accepted: 03/26/2018] [Indexed: 01/03/2023]
Abstract
A wide variety of methods have been proposed to improve the accuracy of protein subnuclear localization prediction. However, an important trend among these methods is to fuse multiple feature representations, and the fusion process is time-consuming. In view of this, this paper proposes a method that combines a new single feature representation with a new algorithm to obtain a good recognition rate. Specifically, based on the position-specific scoring matrix (PSSM), we propose a new representation, the correlation position-specific scoring matrix (CoPSSM), as the protein feature representation. Building on the classic nonlinear dimensionality reduction algorithm kernel linear discriminant analysis (KLDA), we add a new discriminant criterion and propose a dichotomous greedy genetic algorithm (DGGA) to intelligently select its kernel bandwidth parameter. Two public datasets were used for the numerical experiments, with the jackknife test and a KNN classifier. The results show that the overall success rate (OSR) with the single representation CoPSSM is higher than that with many related representations. The OSR of the proposed method reaches 87.444% and 90.3361% on the two datasets, respectively, outperforming many current methods. To show the generalizability of the proposed algorithm, two additional standard protein subcellular localization datasets were used in an extended experiment, and the prediction accuracy under both the jackknife test and an independent test remains considerable.
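The jackknife test used here is leave-one-out cross-validation: each protein is held out in turn, classified using all the others, and the overall success rate (OSR) is the fraction classified correctly. A minimal sketch with a plain Euclidean k-nearest-neighbour classifier, assuming feature vectors have already been computed (CoPSSM and the KLDA projection are not reproduced; the function names and data are hypothetical):

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(train_x: List[Sequence[float]], train_y: List[Hashable],
                query: Sequence[float], k: int) -> Hashable:
    """Majority label among the k training points nearest to `query`."""
    # Sort by Euclidean distance only, so labels never need to be comparable.
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def jackknife_osr(xs: List[Sequence[float]], ys: List[Hashable],
                  k: int = 1) -> float:
    """Leave-one-out (jackknife) overall success rate of a KNN classifier."""
    correct = 0
    for i in range(len(xs)):
        # Hold sample i out and classify it with all remaining samples.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical 2-D feature vectors for two subnuclear classes.
xs = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
ys = ["nucleolus"] * 3 + ["chromatin"] * 3
print(jackknife_osr(xs, ys, k=1))  # → 1.0
```

The jackknife is deterministic (there is no random fold assignment), which is one reason it is popular for reporting results on these benchmark datasets; its cost is one model evaluation per sample.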
Affiliation(s)
- Shunfang Wang
- School of Information Science and Engineering, Yunnan University, Kunming, PR China
- Yaoting Yue
- School of Information Science and Engineering, Yunnan University, Kunming, PR China

58
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global-property-based, and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
- Technical University of Denmark, Kemitorvet, Building 208, DK-2800, Kgs. Lyngby, Denmark.

59
Shou W, Kang F, Lu J. Nature and Value of Freely Dissolved EPS Ecosystem Services: Insight into Molecular Coupling Mechanisms for Regulating Metal Toxicity. Environ Sci Technol 2018; 52:457-466. [PMID: 29258301 DOI: 10.1021/acs.est.7b04834] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Indexed: 05/19/2023]
Abstract
Extracellular polymeric substances (EPSs) dispersed in natural waters play a significant role in mitigating the impacts of heavy metal release on microbial survival, yet little is known about how the ecosystem services of freely dissolved EPSs relate to metal transformation in natural waters. Here, we demonstrate that dispersive EPSs mitigate metal toxicity to microbial cells through an associative coordination reaction. Microtitrimetry coupled with fluorescence spectroscopy attributes the binding of freely dissolved EPSs from Escherichia coli (E. coli) with Cu²⁺/Cd²⁺ to a coordination reaction associated with chemical static quenching. Fourier transform infrared spectroscopy (FTIR), X-ray photoelectron spectroscopy (XPS), and computational chemistry confirm that carboxyl residues in protein-like substances of the EPSs are responsible for the coordination. Frontier molecular orbitals (MOs) of a deprotonated carboxyl group combine with the occupied d orbitals of Cu²⁺ and/or the d and s orbitals of Cd²⁺ to form metal-EPS complexes. Microcosm experiments show that, because the metal-EPS complexes decrease cellular absorption of metals, E. coli survival increases 4.3-fold for Cu²⁺ and 1.6-fold for Cd²⁺. Based on the bonding energies of six metal-EPS coordination complexes, an associative toxic effect further confirms that higher bonding energies facilitate retardation of metals in the EPS matrix, protecting E. coli against apoptosis.
Affiliation(s)
- Weijun Shou
- College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Fuxing Kang
- College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Jiahao Lu
- College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China

60
Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC. J Theor Biol 2018; 437:239-250. [DOI: 10.1016/j.jtbi.2017.10.030] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 08/05/2017] [Revised: 09/29/2017] [Accepted: 10/27/2017] [Indexed: 12/27/2022]

61
Wang L, Zhao Y, Chen Y, Wang D. The effect of three novel feature extraction methods on the prediction of the subcellular localization of multi-site virus proteins. Bioengineered 2018; 9:196-202. [PMID: 28886267 PMCID: PMC5972939 DOI: 10.1080/21655979.2017.1373536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2017] [Accepted: 07/05/2017] [Indexed: 11/08/2022]
Abstract
Experimental methods play a crucial role in identifying the subcellular localization of proteins and building high-quality databases. However, more efficient, automated computational methods are required to predict the subcellular localization of proteins on a large scale. Various efficient feature extraction methods have been proposed to predict subcellular localization, but challenges remain. In this paper, three novel feature extraction methods are established to improve multi-site prediction. The first uses repetitive information captured via moving windows, based on the dipeptide pseudo amino acid composition (R-Dipeptide). The second uses the impact of each amino acid residue on its following residues, based on pseudo amino acid composition (I-PseAAC). The third provides local information about protein sequences that reflects the strength of the physicochemical properties of residues (PseAAC2). The multi-label k-nearest neighbor algorithm (MLKNN) is used to predict the subcellular localization of multi-site virus proteins. The best overall accuracies of R-Dipeptide, I-PseAAC, and PseAAC2 on dataset S from Virus-mPloc are 59.92%, 59.13%, and 57.94%, respectively.
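R-Dipeptide builds on ordinary dipeptide composition, the 400-dimensional vector of relative frequencies of each ordered pair of adjacent residues. The sketch below shows only that plain baseline under the usual definition; the paper's moving-window extension for repetitive information is not reproduced, and the function name is hypothetical:

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq: str) -> Dict[str, float]:
    """Relative frequency of each adjacent residue pair in `seq`."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    pairs = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            pairs += 1
    return {dp: (c / pairs if pairs else 0.0) for dp, c in counts.items()}


# "ACAC" contains the pairs AC, CA, AC: AC = 2/3, CA = 1/3, all others 0.
comp = dipeptide_composition("ACAC")
print(len(comp), comp["AC"], comp["CA"])
```

For multi-label prediction with MLKNN, each protein's 400 values (in the fixed `DIPEPTIDES` order) would form one input vector.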
Affiliation(s)
- Lei Wang
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Yaou Zhao
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Dong Wang
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China

62
Wan S, Duan Y, Zou Q. HPSLPred: An Ensemble Multi-Label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source. Proteomics 2017; 17. [PMID: 28776938 DOI: 10.1002/pmic.201700262] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Received: 06/29/2017] [Revised: 07/19/2017] [Indexed: 11/11/2022]
Abstract
Predicting the subcellular localization of proteins is an important and challenging problem. Traditional experimental approaches are often expensive and time-consuming. Consequently, a growing number of research efforts employ machine learning approaches to predict the subcellular location of proteins. State-of-the-art prediction methods face two main challenges. First, most existing techniques are designed for multi-class rather than multi-label classification, ignoring connections between multiple labels. In reality, the multiple locations of particular proteins reflect vital and unique biological significance that deserves special focus and cannot be ignored. Second, techniques for handling imbalanced data in multi-label classification problems are necessary but have never been employed. To address these two issues, we have developed an ensemble multi-label classifier called HPSLPred, which can be applied to multi-label classification with an imbalanced protein source. For convenience, a user-friendly webserver has been established at http://server.malab.cn/HPSLPred.
Affiliation(s)
- Shixiang Wan
- School of Computer Science and Technology, Tianjin University, Tianjin, P. R. China
- Yucong Duan
- State Key Laboratory of Marine Resource Utilization in the South China Sea, College of Information and Technology, Hainan University, Haikou, Hainan, P. R. China
- Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, P. R. China

63
Accurate prediction of subcellular location of apoptosis proteins combining Chou's PseAAC and PsePSSM based on wavelet denoising. Oncotarget 2017; 8:107640-107665. [PMID: 29296195 PMCID: PMC5746097 DOI: 10.18632/oncotarget.22585] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Received: 09/04/2017] [Accepted: 10/30/2017] [Indexed: 02/05/2023]
Abstract
Information on the subcellular localization of apoptosis proteins is very important for understanding the mechanism of programmed cell death and for drug development. Predicting the subcellular localization of an apoptosis protein remains a challenging task, and such predictions can help to understand the protein's function and its role in metabolic processes. In this paper, we propose a novel method for protein subcellular localization prediction. First, features are extracted from the protein sequence by combining Chou's pseudo amino acid composition (PseAAC) and the pseudo-position specific scoring matrix (PsePSSM); the extracted feature information is then denoised by two-dimensional (2-D) wavelet denoising. Finally, the optimal feature vectors are input to an SVM classifier to predict the subcellular location of apoptosis proteins. Quite promising predictions are obtained under the jackknife test on three widely used datasets and are compared with those of other state-of-the-art methods. The results indicate that the proposed method can remarkably improve the prediction accuracy of apoptosis protein subcellular localization and could serve as a supplementary tool for future proteomics research.

64
Kameshwar AKS, Barber R, Qin W. Comparative modeling and molecular docking analysis of white, brown and soft rot fungal laccases using lignin model compounds for understanding the structural and functional properties of laccases. J Mol Graph Model 2017; 79:15-26. [PMID: 29127854 DOI: 10.1016/j.jmgm.2017.10.019] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/14/2017] [Revised: 10/25/2017] [Accepted: 10/25/2017] [Indexed: 11/19/2022]
Abstract
The extrinsic catalytic properties of laccase enable it to oxidize a wide range of aromatic (phenolic and non-phenolic) compounds, making it a commercially important enzyme. In this study, we have extensively compared and analyzed the physicochemical, structural, and functional properties of white, brown, and soft rot fungal laccases using standard protein analysis software. We computationally predicted three-dimensional comparative models of these laccases and then performed molecular docking studies using lignin model compounds. We also report a customizable, rapid, and reliable protein modelling and docking pipeline for developing structurally and functionally stable protein structures. We observed that soft rot fungal laccases exhibited comparatively higher structural variation (more random coil) than brown and white rot fungal laccases. White and brown rot fungal laccase sequences exhibited higher similarity to the conserved domains of Trametes versicolor laccase, whereas soft rot fungal laccases shared higher similarity with the conserved domains of Melanocarpus albomyces laccase. Molecular docking showed that the amino acids Pro, Phe, Leu, Lys, and Gln commonly interacted with the ligands. We also observed that white and brown rot fungal laccases showed similar docking patterns (topologically, the monomer, dimer, and trimer bind at the same pocket location and the tetramer binds at another pocket location), unlike soft rot fungal laccases. Finally, the binding efficiencies of white and brown rot fungal laccases with lignin model compounds were higher than those of soft rot fungal laccases. These findings can be further applied in developing genetically efficient laccases for the growing biofuel and bioremediation industries.
Affiliation(s)
- Richard Barber
- Department of Biology, Lakehead University, 955 Oliver Road, Thunder Bay, Ontario, P7B 5E1, Canada
- Wensheng Qin
- Department of Biology, Lakehead University, 955 Oliver Road, Thunder Bay, Ontario, P7B 5E1, Canada.

65
Leinisch F, Mariotti M, Rykaer M, Lopez-Alarcon C, Hägglund P, Davies MJ. Peroxyl radical- and photo-oxidation of glucose 6-phosphate dehydrogenase generates cross-links and functional changes via oxidation of tyrosine and tryptophan residues. Free Radic Biol Med 2017; 112:240-252. [PMID: 28756310 DOI: 10.1016/j.freeradbiomed.2017.07.025] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 05/31/2017] [Revised: 07/11/2017] [Accepted: 07/25/2017] [Indexed: 02/05/2023]
Abstract
Protein oxidation is a frequent event as a result of the high abundance of proteins in biological samples and the multiple processes that generate oxidants. The reactions that occur are complex and poorly understood, but can generate major structural and functional changes in proteins. Current data indicate that pathophysiological processes and multiple human diseases are associated with the accumulation of damaged proteins. In this study we investigated the mechanisms and consequences of exposing the key metabolic enzyme glucose-6-phosphate dehydrogenase (G6PDH) to peroxyl radicals (ROO•) and singlet oxygen (¹O₂), with particular emphasis on the role of Trp and Tyr residues in protein cross-linking and fragmentation. Cross-links and high molecular mass aggregates were detected by SDS-PAGE and Western blotting using specific antibodies. Amino acid analysis provided evidence for Trp and Tyr consumption and for the formation of oxygenated products (diols, peroxides, N-formylkynurenine, kynurenine) from Trp and of di-tyrosine from Tyr. Mass spectrometric data obtained after trypsin digestion in the presence of H₂¹⁶O and H₂¹⁸O allowed the mapping of specific cross-linked residues and their locations. These data indicate that specific Tyr-Trp and di-Tyr cross-links are formed from residues that are proximal and surface-accessible, and that the extent of Trp oxidation varies markedly between sites. Limited modification at other residues is also detected. Overall, Trp and Tyr residues are readily modified by ROO• and ¹O₂, giving products that significantly affect protein structure and function. The formation of such cross-links may help rationalize the accumulation of damaged proteins in vivo.
Affiliation(s)
- Fabian Leinisch
- Dept. of Biomedical Sciences, Panum Institute, University of Copenhagen, Copenhagen, Denmark
- Michele Mariotti
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Kongens Lyngby, Denmark
- Martin Rykaer
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Kongens Lyngby, Denmark
- Camilo Lopez-Alarcon
- Departamento de Química Física, Facultad de Química, Pontificia Universidad Catolica de Chile, Avda. Vicuña Mackenna 4860, Santiago, Chile
- Per Hägglund
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Kongens Lyngby, Denmark
- Michael J Davies
- Dept. of Biomedical Sciences, Panum Institute, University of Copenhagen, Copenhagen, Denmark.

66
Ju Z, Sun J, Li Y, Wang L. Predicting lysine glycation sites using bi-profile bayes feature extraction. Comput Biol Chem 2017; 71:98-103. [PMID: 29040908 DOI: 10.1016/j.compbiolchem.2017.10.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/31/2017] [Revised: 09/14/2017] [Accepted: 10/07/2017] [Indexed: 12/21/2022]
Abstract
Glycation is a nonenzymatic post-translational modification that has been found to be involved in various biological processes and is closely associated with many metabolic diseases. Accurate identification of glycation sites is important for understanding the underlying molecular mechanisms of glycation. Because traditional experimental methods are often labor-intensive and time-consuming, computational methods for predicting glycation sites are desirable. In this study, a novel predictor named BPB_GlySite is proposed to predict lysine glycation sites using bi-profile Bayes feature extraction and a support vector machine algorithm. In 10-fold cross-validation, BPB_GlySite achieves satisfactory performance, with a sensitivity of 63.68%, a specificity of 72.60%, an accuracy of 69.63%, and a Matthews correlation coefficient of 0.3499. Experimental results also indicate that BPB_GlySite significantly outperforms three existing glycation site predictors: NetGlycate, PreGly and Gly-PseAAC. Therefore, BPB_GlySite can be a useful bioinformatics tool for the prediction of glycation sites. A user-friendly web server for BPB_GlySite is established at 123.206.31.171/BPB_GlySite/.
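Bi-profile Bayes feature extraction is commonly described as encoding each candidate window by position-specific residue frequencies estimated separately from positive (modified) and negative (unmodified) training windows, giving a 2n-dimensional vector for a window of length n. A minimal sketch under that description; the window length, the absence of pseudocounts, the function names and the toy windows are all assumptions, not details from the paper:

```python
from collections import Counter
from typing import Dict, List, Sequence


def position_profile(windows: Sequence[str]) -> List[Dict[str, float]]:
    """Per-position residue frequencies over equal-length sequence windows."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        profile.append({aa: c / len(windows) for aa, c in counts.items()})
    return profile


def bpb_features(window: str, pos_profile, neg_profile) -> List[float]:
    """2n-dimensional bi-profile vector: positive-profile then negative-profile."""
    pos = [pos_profile[i].get(aa, 0.0) for i, aa in enumerate(window)]
    neg = [neg_profile[i].get(aa, 0.0) for i, aa in enumerate(window)]
    return pos + neg


# Hypothetical 3-residue windows centred on a lysine (K).
pos_prof = position_profile(["AKC", "AKD"])   # known glycation sites
neg_prof = position_profile(["GKC", "AKC"])   # known non-sites
print(bpb_features("AKC", pos_prof, neg_prof))  # → [1.0, 1.0, 0.5, 0.5, 1.0, 1.0]
```

In cross-validation, the two profiles should be estimated from the training folds only; otherwise each test window contributes to its own features and the reported performance is optimistic.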
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, 110136, People's Republic of China.
- Juhe Sun
- College of Science, Shenyang Aerospace University, 110136, People's Republic of China
- Yanjie Li
- College of Science, Shenyang Aerospace University, 110136, People's Republic of China
- Li Wang
- College of Science, Shenyang Aerospace University, 110136, People's Republic of China
67
Prediction of lysine crotonylation sites by incorporating the composition of k-spaced amino acid pairs into Chou’s general PseAAC. J Mol Graph Model 2017; 77:200-204. [DOI: 10.1016/j.jmgm.2017.08.020] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Received: 07/25/2017] [Revised: 08/21/2017] [Accepted: 08/21/2017] [Indexed: 12/11/2022]
68
Nielsen H. Predicting Subcellular Localization of Proteins by Bioinformatic Algorithms. Curr Top Microbiol Immunol 2017; 404:129-158. [PMID: 26728066 DOI: 10.1007/82_2015_5006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 02/08/2023]
Abstract
When predicting the subcellular localization of proteins from their amino acid sequences, there are basically three approaches: signal-based, global property-based, and homology-based. Each of these has its advantages and drawbacks, and it is important when comparing methods to know which approach was used. Various statistical and machine learning algorithms are used with all three approaches, and various measures and standards are employed when reporting the performances of the developed methods. This chapter presents a number of available methods for prediction of sorting signals and subcellular localization, but rather than providing a checklist of which predictors to use, it aims to function as a guide for critical assessment of prediction methods.
Affiliation(s)
- Henrik Nielsen
- Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Kemitorvet building 208, 2800, Lyngby, Denmark.
69
Ju Z, He JJ. Prediction of lysine propionylation sites using biased SVM and incorporating four different sequence features into Chou's PseAAC. J Mol Graph Model 2017; 76:356-363. [PMID: 28763688 DOI: 10.1016/j.jmgm.2017.07.022] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 06/05/2017] [Revised: 07/20/2017] [Accepted: 07/21/2017] [Indexed: 12/21/2022]
Abstract
Lysine propionylation is an important and common protein acylation modification in both prokaryotes and eukaryotes. To better understand the molecular mechanism of propionylation, it is important to identify propionylated substrates and their corresponding propionylation sites accurately. In this study, a novel bioinformatics tool named PropPred is developed to predict propionylation sites using multiple feature extraction and a biased support vector machine. On the one hand, various features are incorporated, including amino acid composition, amino acid factors, binary encoding, and the composition of k-spaced amino acid pairs, and the F-score feature method and the incremental feature selection algorithm are adopted to remove redundant features. On the other hand, the biased support vector machine algorithm is used to handle the class imbalance in the propionylation site training dataset. In 10-fold cross-validation, PropPred achieves satisfactory performance, with a Sensitivity of 70.03%, a Specificity of 75.61%, an Accuracy of 75.02% and a Matthews correlation coefficient of 0.3085. Feature analysis shows that some amino acid factors play the most important roles in the prediction of propionylation sites. These analyses and prediction results might provide some clues for understanding the molecular mechanisms of propionylation. A user-friendly web server for PropPred is available at 123.206.31.171/PropPred/.
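The composition of k-spaced amino acid pairs (CKSAAP), listed among PropPred's features, can be sketched as below. For each gap k from 0 to k_max, the sketch counts residue pairs separated by k positions. It then divides by the number of such pairs in the sequence, which gives 400 × (k_max + 1) features. The function name and the default k_max = 2 are hypothetical choices for illustration. The abstract does not state the gap range PropPred uses.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(seq, k_max=2):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    Pairs containing non-standard residues are skipped but still count
    towards the per-gap denominator.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(seq) - k - 1  # positions i with i + k + 1 < len(seq)
        for i in range(max(n_pairs, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in PAIRS)
    return features


vec = cksaap("AKAK", k_max=1)
print(len(vec))                      # 800
print(vec[PAIRS.index("AK")])        # fraction of adjacent AK pairs (2 of 3)
print(vec[400 + PAIRS.index("AA")])  # 0.5: AA is one of the two pairs with gap 1
```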
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, 110136, People's Republic of China.
- Jian-Jun He
- College of Information and Communication Engineering, Dalian Minzu University, 116600, People's Republic of China.
70
Orfanoudaki G, Markaki M, Chatzi K, Tsamardinos I, Economou A. MatureP: prediction of secreted proteins with exclusive information from their mature regions. Sci Rep 2017; 7:3263. [PMID: 28607462 PMCID: PMC5468347 DOI: 10.1038/s41598-017-03557-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 01/23/2017] [Accepted: 04/28/2017] [Indexed: 11/09/2022]
Abstract
More than a third of the cellular proteome is non-cytoplasmic. Most secretory proteins use the Sec system for export and are targeted to membranes using signal peptides and mature domains. To specifically analyze bacterial mature domain features, we developed MatureP, a classifier that predicts secretory sequences through features computed exclusively from their mature domains. MatureP was trained using Just Add Data Bio, an automated machine learning tool. Mature domains are predicted efficiently with ~92% success, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC). Predictions were validated using experimental datasets of mutated secretory proteins. The features selected by MatureP reveal prominent differences in amino acid content between secreted and cytoplasmic proteins. Amino-terminal mature domain sequences have enhanced disorder, more hydroxyl and polar residues, and fewer hydrophobic residues. Cytoplasmic proteins have prominent amino-terminal hydrophobic stretches and charged regions downstream. Presumably, secretory mature domains comprise a distinct protein class. They balance properties that promote the flexibility required to maintain non-folded states during targeting and secretion with the ability to fold after secretion. These findings provide novel insight into protein trafficking, sorting and folding mechanisms and may benefit protein secretion biotechnology.
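MatureP's performance is summarized by the area under the ROC curve (AUC). The sketch below illustrates what that number measures. It computes AUC as the probability that a randomly chosen positive (secretory) sequence receives a higher classifier score than a randomly chosen negative (cytoplasmic) one, counting ties as one half. This pairwise form is equivalent to the normalized Mann-Whitney U statistic and runs in O(P·N) time. It is a generic illustration, not code from Just Add Data Bio, and the scores are made up.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison: P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 1 = secretory, 0 = cytoplasmic.
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```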
Affiliation(s)
- Georgia Orfanoudaki
- Institute of Molecular Biology and Biotechnology-FORTH and Department of Biology-University of Crete, PO Box 1385, Heraklion, Crete, Greece
- Maria Markaki
- Computer Science Department, University of Crete, Heraklion, Greece
- Katerina Chatzi
- KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, B-3000, Leuven, Belgium
- Ioannis Tsamardinos
- Computer Science Department, University of Crete, Heraklion, Greece; Gnosis Data Analysis PC, Heraklion, Greece
- Anastassios Economou
- Institute of Molecular Biology and Biotechnology-FORTH and Department of Biology-University of Crete, PO Box 1385, Heraklion, Crete, Greece; KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, B-3000, Leuven, Belgium.
71
Xiang Q, Liao B, Li X, Xu H, Chen J, Shi Z, Dai Q, Yao Y. Subcellular localization prediction of apoptosis proteins based on evolutionary information and support vector machine. Artif Intell Med 2017; 78:41-46. [PMID: 28764871 DOI: 10.1016/j.artmed.2017.05.007] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 11/21/2016] [Revised: 05/08/2017] [Accepted: 05/11/2017] [Indexed: 01/06/2023]
Abstract
OBJECTIVES In this paper, a high-quality sequence encoding scheme is proposed for predicting the subcellular location of apoptosis proteins. METHODS In the proposed methodology, novel evolutionary-conservative information is introduced to represent protein sequences. Based on the golden section proportion, the position-specific scoring matrix (PSSM) is divided into several blocks. These features are then classified by a support vector machine (SVM), and the predictive capability of the proposed method is evaluated by the jackknife test. RESULTS The results show that the golden section method is better than no segmentation. The overall accuracies for ZD98 and CL317 are 98.98% and 91.11%, respectively, which indicates that our method can play a complementary role to existing methods in the relevant areas. CONCLUSIONS The proposed feature representation is powerful and greatly improves prediction accuracy, indicating that our method provides state-of-the-art performance for predicting the subcellular location of apoptosis proteins.
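The abstract says the PSSM is split into blocks according to the golden-section proportion, but it does not give the exact rule. The sketch below assumes the simplest version. It makes one cut at round(L × 0.618) rows and then averages each PSSM column within each block, so a 20-column PSSM yields 40 features. The cut rule, the averaging step and the function names are assumptions for illustration. The published method may use more blocks or different statistics.

```python
GOLDEN = (5 ** 0.5 - 1) / 2  # golden-section proportion, about 0.618


def golden_section_blocks(pssm):
    """Split PSSM rows (one row per residue) into two blocks at the golden-section point."""
    n = len(pssm)
    if n < 2:
        raise ValueError("need at least two residues to form two blocks")
    cut = min(max(round(n * GOLDEN), 1), n - 1)  # keep both blocks non-empty
    return [pssm[:cut], pssm[cut:]]


def block_mean_features(pssm):
    """Column-wise mean of every block, concatenated block by block."""
    features = []
    for block in golden_section_blocks(pssm):
        for j in range(len(block[0])):
            features.append(sum(row[j] for row in block) / len(block))
    return features


# Toy 5 x 2 "PSSM" (a real PSSM has 20 columns).
toy = [[1, 0], [3, 0], [5, 0], [0, 2], [0, 4]]
print(block_mean_features(toy))  # [3.0, 0.0, 0.0, 3.0]
```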
Affiliation(s)
- Qilin Xiang
- School of Information Science and Engineering, Hunan University, Changsha 410082, China
- Bo Liao
- School of Information Science and Engineering, Hunan University, Changsha 410082, China
- Xianhong Li
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Huimin Xu
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Jing Chen
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Zhuoxing Shi
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yuhua Yao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China; School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China.
72
Wang P, Ge R, Liu L, Xiao X, Li Y, Cai Y. Multi-label Learning for Predicting the Activities of Antimicrobial Peptides. Sci Rep 2017; 7:2202. [PMID: 28526820 PMCID: PMC5438384 DOI: 10.1038/s41598-017-01986-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/06/2016] [Accepted: 04/05/2017] [Indexed: 01/06/2023]
Abstract
Antimicrobial peptides (AMPs) are peptide antibiotics with a broad spectrum of antimicrobial activities. Predicting the activities of AMPs from their amino acid sequences is of great therapeutic importance, but label interactions make it challenging for prediction methods. In this paper we propose a novel multi-label learning model to address this problem. A weighted K-nearest neighbor classifier is adopted for efficient representation learning of the sequence data. A multiple linear regression model is then employed to learn a mapping from the classifier score vectors to the target labels, with label correlations considered. Several popular multi-label learning algorithms and feature extraction methods were tested on a comprehensive, up-to-date AMP dataset covering twelve biological activities and on its filtered version covering five activities. The experimental results show that our proposed method performs competitively with previous work and could be used as a powerful engine for predicting AMP activities.
Affiliation(s)
- Pu Wang
- Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, 518055, China; Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, 333403, China
- Ruiquan Ge
- Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, 518055, China; School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, China
- Liming Liu
- Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, China; College of Mathematics and Statistics, Shenzhen University, Shenzhen, 518060, China
- Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, 333403, China
- Ye Li
- Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, China.
- Yunpeng Cai
- Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, China.
73
Beuve A. Thiol-Based Redox Modulation of Soluble Guanylyl Cyclase, the Nitric Oxide Receptor. Antioxid Redox Signal 2017; 26:137-149. [PMID: 26906466 PMCID: PMC5240013 DOI: 10.1089/ars.2015.6591] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 11/27/2015] [Revised: 01/27/2016] [Accepted: 02/21/2016] [Indexed: 02/06/2023]
Abstract
SIGNIFICANCE Soluble guanylyl cyclase (sGC), which produces the second messenger cyclic guanosine 3',5'-monophosphate (cGMP), is at the crossroads of nitric oxide (NO) signaling: sGC catalytic activity is both stimulated by NO binding to the heme and inhibited by NO modification of its cysteine (Cys) thiols (S-nitrosation). Modulation of sGC activity by thiol oxidation makes sGC a therapeutic target for pathologies originating from oxidative or nitrosative stress. sGC has an unusually high percentage of Cys for a cytosolic protein, and the majority of these are solvent exposed and therefore accessible as modulatory targets for biological and pathophysiological signaling. RECENT ADVANCES Thiol oxidation of sGC contributes to the development of cardiovascular diseases by decreasing NO-dependent cGMP production and thereby vascular reactivity. This thiol-based resistance to NO (e.g., increase in peripheral resistance) is observed in hypertension and hyperaldosteronism. CRITICAL ISSUES Some roles of specific Cys thiols have been identified in vitro. So far, it has not been possible to pinpoint the roles of specific Cys of sGC in vivo or to investigate the molecular mechanisms in an animal model. FUTURE DIRECTIONS The role of Cys as redox sensors, intermediates of activation, and mediators of change in sGC conformation, activity, and dimerization remains largely unexplored. To understand modulation of sGC activity, it is critical to investigate the roles of specific oxidative thiol modifications that are formed during these processes. Where the redox state of sGC thiols contributes to pathologies (vascular resistance and sGC desensitization by NO donors), it becomes crucial to design therapeutic strategies to restore sGC to its normal, physiological thiol redox state. Antioxid. Redox Signal. 26, 137-149.
Affiliation(s)
- Annie Beuve
- Department of Pharmacology, Physiology and Neuroscience, New Jersey Medical School-Rutgers, Newark, New Jersey
74
Kang F, Wang Q, Shou W, Collins CD, Gao Y. Alkali-earth metal bridges formed in biofilm matrices regulate the uptake of fluoroquinolone antibiotics and protect against bacterial apoptosis. Environ Pollut 2017; 220:112-123. [PMID: 27638458 DOI: 10.1016/j.envpol.2016.09.029] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/06/2016] [Revised: 09/06/2016] [Accepted: 09/11/2016] [Indexed: 06/06/2023]
Abstract
Bacterial extracellular biofilms play a critical role in relieving the toxicity of fluoroquinolone antibiotic (FQA) pollutants, yet it is unclear whether antibiotic attack may be defused by a bacterial one-two punch strategy associated with metal-reinforced detoxification efficiency. Our findings help to assign functions to specific structural features of biofilms, as they strongly imply a molecularly regulated mechanism by which freely accessible alkali-earth metals in natural waters affect the cellular uptake of FQAs at the water-biofilm interface. Specifically, formation of an alkali-earth-metal (Ca²⁺ or Mg²⁺) bridge between the model FQA ciprofloxacin (CIP) and biofilms of Escherichia coli regulates the trans-biofilm transport rate of FQAs towards cells (135-nm-thick biofilm). With the addition of Ca²⁺ and Mg²⁺ (0-3.5 mmol/L, CIP: 1.25 μmol/L), the transport rates were reduced to 52.4% and 63.0%, respectively. Computational chemistry analysis further demonstrated that a deprotonated carboxyl in the tryptophan residues of biofilms acted as a major bridge site, with a metal on one side and, on the other, a metal girder jointly connected to the carboxyl and carbonyl of an FQA. The bacterial growth rate depends on the bridging energy at the anchoring site, which underlines the environmental importance of metal bridges formed in biofilm matrices for bacterial antibiotic resistance.
Affiliation(s)
- Fuxing Kang
- Institute of Organic Contaminant Control and Soil Remediation, College of Resources and Environmental Sciences, Nanjing Agricultural University, Jiangsu 210095, China
- Qian Wang
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Jiangsu 210008, China
- Weijun Shou
- Institute of Organic Contaminant Control and Soil Remediation, College of Resources and Environmental Sciences, Nanjing Agricultural University, Jiangsu 210095, China
- Chris D Collins
- Soil Research Centre, University of Reading, Whiteknights, Reading RG6 6DW, UK
- Yanzheng Gao
- Institute of Organic Contaminant Control and Soil Remediation, College of Resources and Environmental Sciences, Nanjing Agricultural University, Jiangsu 210095, China.
75
Hasan MAM, Ahmad S, Molla MKI. Protein subcellular localization prediction using multiple kernel learning based support vector machine. Mol Biosyst 2017; 13:785-795. [DOI: 10.1039/c6mb00860g] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 01/05/2023]
Abstract
An efficient multi-label protein subcellular localization prediction system was developed by introducing multiple kernel learning (MKL) based support vector machine (SVM).
Affiliation(s)
- Md. Al Mehedi Hasan
- Department of Computer Science & Engineering, University of Rajshahi, Rajshahi, Bangladesh
- Shamim Ahmad
- Department of Computer Science & Engineering, University of Rajshahi, Rajshahi, Bangladesh
76
Construction of Multilevel Structure for Avian Influenza Virus System Based on Granular Computing. Biomed Res Int 2017; 2017:5404180. [PMID: 28191464 PMCID: PMC5278516 DOI: 10.1155/2017/5404180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2016] [Revised: 12/01/2016] [Accepted: 12/14/2016] [Indexed: 12/03/2022]
Abstract
Exploring the genetic structure of influenza viruses, whose epidemics cause morbidity and mortality worldwide, has attracted attention in the fields of molecular ecology and medical genetics. Rapid variation in the RNA strands and changes in the protein structure of the virus result in low-accuracy subtype identification and make it difficult to develop effective drugs and vaccines. This paper constructs the evolutionary structure of the avian influenza virus system considering both hemagglutinin and neuraminidase protein fragments. An optimization model based on a fuzzy hierarchical evaluation index was established to determine the rational granularity of the virus system and to explore the intrinsic relationships among the subtypes. An algorithm was then presented to extract the rational structure. Furthermore, to reduce systematic and computational complexity, the granular signatures of the virus system were identified based on the coarse-grained idea, and their performance was evaluated through a designed classifier. The results showed that the obtained virus signatures could approximate and reflect the whole avian influenza virus system, indicating that the proposed method could identify effective virus signatures. Once a new molecular virus is detected, homologous viruses can be identified efficiently and hierarchically.
77
Predicting protein subcellular localization based on information content of gene ontology terms. Comput Biol Chem 2016; 65:1-7. [PMID: 27665466 DOI: 10.1016/j.compbiolchem.2016.09.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 04/11/2016] [Revised: 07/10/2016] [Accepted: 09/11/2016] [Indexed: 01/11/2023]
Abstract
Predicting where a protein resides within a cell is important in cell biology, and computational approaches to this problem have attracted increasing attention from the biomedical community. Among the protein features used to predict subcellular localization, features derived from Gene Ontology (GO) have been shown to be superior to others. However, most work in this field focuses on the presence or absence of predefined GO terms. We proposed a method that derives information from the intrinsic structure of the GO graph. Each element of the feature vector represents the information content of a GO term annotating the protein under investigation, and support vector machines were used as the classifier to test the extracted features. Evaluation experiments were conducted on three protein datasets, and the results show that our method improves eukaryotic and human subcellular location prediction accuracy by up to 1.1% over previous studies that also used GO-based features. In particular, when cellular component annotation is absent, our method achieves satisfactory results, with an overall accuracy of more than 87%.
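The information content of a GO term is commonly defined as IC(t) = -log p(t), where p(t) is the fraction of annotated proteins carrying t or any of its descendants. The sketch below computes IC from a toy annotation set, propagating each annotation to its ancestors. It then builds a feature vector whose element for term t is IC(t) when the protein is annotated with t and 0 otherwise. Several choices here are assumptions: the dictionary layouts, the normalization by the number of annotated proteins and the function names. The paper may estimate p(t) from a different corpus.

```python
import math


def ancestors(term, parents):
    """Return term plus all of its ancestors in the GO DAG (child -> list of parents)."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def information_content(annotations, parents):
    """IC(t) = -log(fraction of proteins annotated with t or a descendant of t)."""
    counts = {}
    for terms in annotations.values():
        closure = set()
        for t in terms:
            closure |= ancestors(t, parents)
        for t in closure:  # each protein counts at most once per term
            counts[t] = counts.get(t, 0) + 1
    total = len(annotations)
    return {t: -math.log(c / total) for t, c in counts.items()}


def ic_feature_vector(protein_terms, vocabulary, ic):
    """One element per vocabulary term: IC if the protein carries the term, else 0."""
    return [ic.get(t, 0.0) if t in protein_terms else 0.0 for t in vocabulary]


# Toy DAG with placeholder term names: B and C are children of A.
parents = {"B": ["A"], "C": ["A"]}
annotations = {"p1": {"B"}, "p2": {"C"}, "p3": {"B"}}
ic = information_content(annotations, parents)
print(ic_feature_vector({"C"}, ["A", "B", "C"], ic))  # [0.0, 0.0, 1.0986...], i.e. log(3)
```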
78
Wang X, Li H, Zhang Q, Wang R. Predicting Subcellular Localization of Apoptosis Proteins Combining GO Features of Homologous Proteins and Distance Weighted KNN Classifier. Biomed Res Int 2016; 2016:1793272. [PMID: 27213149 PMCID: PMC4860209 DOI: 10.1155/2016/1793272] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/05/2016] [Revised: 03/30/2016] [Accepted: 03/31/2016] [Indexed: 02/06/2023]
Abstract
Apoptosis proteins play a key role in maintaining the stability of organisms, and their functions are related to their subcellular locations, which help in understanding the mechanism of programmed cell death. In this paper, we use GO annotation information of apoptosis proteins and their homologous proteins retrieved from the GOA database to formulate feature vectors. We then combine these with the distance-weighted KNN classification algorithm to address the data imbalance in the CL317 data set and predict the subcellular locations of apoptosis proteins. We found that the number of homologous proteins affects the overall prediction accuracy. With the optimal number of homologous proteins, the overall prediction accuracy of our method on the CL317 data set reaches 96.8% in the jackknife test. Comparison with other existing methods shows that our proposed method is very effective and outperforms them in predicting the subcellular localization of apoptosis proteins.
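A distance-weighted KNN classifier lets closer neighbors cast larger votes. As a result, a minority-class protein lying very near the query is not simply outvoted by more numerous but more distant majority-class proteins. The sketch below uses inverse-distance weights 1/(d + eps) and Euclidean distance on toy 2-D vectors. The weighting formula, the distance metric and k are assumptions, since the abstract does not specify them. Real inputs would be the GO-based feature vectors, and the location labels here are placeholders.

```python
import math
from collections import defaultdict


def weighted_knn_predict(query, train_x, train_y, k=3, eps=1e-9):
    """Predict a label by inverse-distance-weighted voting among the k nearest samples."""
    neighbors = sorted(
        ((math.dist(query, x), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / (dist + eps)  # eps avoids division by zero on exact matches
    return max(votes, key=votes.get)


train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = ["cytoplasm", "cytoplasm", "mitochondrion"]
# With k = 3 a plain majority vote would say "cytoplasm"; the weighted vote
# follows the single very close mitochondrial neighbor.
print(weighted_knn_predict((4.9, 4.9), train_x, train_y))  # mitochondrion
```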
Affiliation(s)
- Xiao Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
- Hui Li
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
- Qiuwen Zhang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
- Rong Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
79
Wan S, Mak MW, Kung SY. Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins. BMC Bioinformatics 2016; 17:97. [PMID: 26911432 PMCID: PMC4765148 DOI: 10.1186/s12859-016-0940-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 04/17/2015] [Accepted: 01/27/2016] [Indexed: 11/10/2022]
Abstract
Background Predicting protein subcellular localization is indispensable for inferring protein functions. Recent studies have focused on predicting not only single-location proteins but also multi-location proteins. Almost all of the high-performing predictors proposed recently use gene ontology (GO) terms to construct feature vectors for classification. Despite their high performance, their prediction decisions are difficult to interpret because of the large number of GO terms involved. Results This paper proposes using sparse regressions to exploit GO information for both predicting and interpreting the subcellular localization of single- and multi-location proteins. Specifically, we compared two multi-label sparse regression algorithms, namely multi-label LASSO (mLASSO) and multi-label elastic net (mEN), for large-scale predictions of protein subcellular localization. Both algorithms can yield sparse and interpretable solutions. Using the one-vs-rest strategy, mLASSO and mEN identified, respectively, 87 and 429 of more than 8,000 GO terms as playing essential roles in determining subcellular localization. More interestingly, many of the GO terms selected by mEN are from the biological process and molecular function categories, suggesting that GO terms in these categories also play vital roles in the prediction. These essential GO terms reveal not only where a protein is located but also why it resides there. Conclusions Experimental results show that the outputs of both mEN and mLASSO are interpretable and that both perform significantly better than existing state-of-the-art predictors. Moreover, mEN selects more features and performs better than mLASSO on a stringent human benchmark dataset. For readers’ convenience, an online server called SpaPredictor for both mLASSO and mEN is available at http://bioinfo.eie.polyu.edu.hk/SpaPredictorServer/.
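mLASSO and mEN fit one sparse linear model per location in a one-vs-rest fashion. The sketch below shows a generic coordinate-descent solver for a single label. It minimizes (1/2n)||y - Xw||^2 + lam*||w||_1 + (l2/2)*||w||^2, so l2 = 0 gives LASSO and l2 > 0 gives an elastic net. This is a textbook solver written for illustration, not the SpaPredictor implementation. It assumes centred data with no intercept, and the parameter names `lam` and `l2` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_regression(X, y, lam, l2=0.0, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xw||^2 + lam*||w||_1 + (l2/2)*||w||^2.

    l2 = 0 gives LASSO; l2 > 0 gives an elastic net. Assumes centred X and y (no intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0 and l2 == 0.0:
                continue  # all-zero column: coefficient stays 0
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            w[j] = soft_threshold(rho, lam) / (col_sq[j] + l2)
    return w


X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [2.0, 0.5, -2.0, -0.5]
print(sparse_regression(X, y, lam=0.3))  # second coefficient is driven exactly to 0.0
```

The zeroed coefficient is what makes the model interpretable. Only features (GO terms, in the paper's setting) with non-zero weights take part in the decision for that label.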
Affiliation(s)
- Shibiao Wan
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, SAR, China.
- Man-Wai Mak
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, SAR, China.
- Sun-Yuan Kung
- Department of Electrical Engineering, Princeton University, New Jersey, USA.
80
Chen J, Xu H, He PA, Dai Q, Yao Y. A multiple information fusion method for predicting subcellular locations of two different types of bacterial protein simultaneously. Biosystems 2016; 139:37-45. [DOI: 10.1016/j.biosystems.2015.12.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 05/26/2015] [Revised: 10/08/2015] [Accepted: 12/10/2015] [Indexed: 12/14/2022]
81
Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA. Int J Mol Sci 2015; 16:30343-61. [PMID: 26703574 PMCID: PMC4691178 DOI: 10.3390/ijms161226237] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 10/09/2015] [Revised: 12/07/2015] [Accepted: 12/11/2015] [Indexed: 01/01/2023]
Abstract
An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. Existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and the position-specific scoring matrix (PSSM), are insufficient to represent protein sequences because each captures only a single perspective. This paper therefore proposes two fusion feature representations, DipPSSM and PseAAPSSM, which integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce balance factors to weight the importance of its components, and the optimal values of the balance factors are found by a genetic algorithm. Because of the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find their important low-dimensional structure, which is essential for classification and location prediction. Numerical experiments on two public datasets with a KNN classifier and cross-validation tests showed that, in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusion representations outperform the traditional representations in protein sub-nuclear localization, and the LDA-treated representation outperforms the untreated one.
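Dipeptide composition (DipC) and the balance-factor fusion described in the abstract can be sketched as follows. DipC gives the frequency of each of the 400 ordered adjacent residue pairs. The fusion step scales each component vector by its balance factor and concatenates the results. The balance-factor values below are arbitrary placeholders, since the paper tunes them with a genetic algorithm. The second component stands in for PSSM-derived features, and the subsequent LDA projection is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def dipeptide_composition(seq):
    """Frequency of each ordered adjacent residue pair (non-standard pairs are skipped)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    n_pairs = len(seq) - 1
    for i in range(max(n_pairs, 0)):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] / n_pairs if n_pairs > 0 else 0.0 for d in DIPEPTIDES]


def weighted_fusion(parts, balance_factors):
    """Scale each feature vector by its balance factor and concatenate them."""
    if len(parts) != len(balance_factors):
        raise ValueError("one balance factor is needed per component")
    fused = []
    for vec, weight in zip(parts, balance_factors):
        fused.extend(weight * v for v in vec)
    return fused


dipc = dipeptide_composition("ACA")  # AC and CA each have frequency 0.5
pssm_features = [1.0, 2.0]           # placeholder for PSSM-derived features
fused = weighted_fusion([dipc, pssm_features], [0.3, 0.7])
print(len(fused))                    # 402
```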
82
Predicting subcellular localization of multi-location proteins by improving support vector machines with an adaptive-decision scheme. Int J Mach Learn Cybern 2015. [DOI: 10.1007/s13042-015-0460-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/11/2023]
83
Wan S, Mak MW, Kung SY. mLASSO-Hum: A LASSO-based interpretable human-protein subcellular localization predictor. J Theor Biol 2015; 382:223-34. [DOI: 10.1016/j.jtbi.2015.06.042] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/24/2015] [Revised: 06/25/2015] [Accepted: 06/26/2015] [Indexed: 02/03/2023]
84
Wang X, Zhang J, Li GZ. Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble. BMC Bioinformatics 2015; 16 Suppl 12:S1. [PMID: 26329681 PMCID: PMC4705491 DOI: 10.1186/1471-2105-16-s12-s1] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 01/06/2023]
Abstract
Background: Predicting bacterial protein subcellular locations using computational methods has become a very important and challenging task. Although many prediction methods exist for bacterial proteins, most of them can only deal with single-location proteins. Unfortunately, many proteins in bacterial cells have multiple locations. Moreover, multi-location proteins have special biological functions that can help the development of new drugs. It is therefore necessary to develop new computational methods for accurately predicting the subcellular locations of multi-location bacterial proteins.
Results: In this article, two efficient multi-label predictors, Gpos-ECC-mPLoc and Gneg-ECC-mPLoc, are developed to predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins, respectively. The two predictors construct GO vectors from the GO terms of the homologous proteins of query proteins. They then adopt a powerful multi-label ensemble classifier to make the final multi-label prediction. They have the following advantages: (1) they improve prediction performance on multi-label proteins by taking the correlations among different labels into account; (2) they ensemble multiple CC classifiers and generate better prediction results through ensemble learning; and (3) they construct the GO vectors from the frequency of occurrence of GO terms in the typical homologous set instead of from 0/1 values. Experimental results show that Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins, respectively.
Conclusions: Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently improve the prediction accuracy of subcellular localization of multi-location gram-positive and gram-negative bacterial proteins, respectively. The online web servers for the Gpos-ECC-mPLoc and Gneg-ECC-mPLoc predictors are freely accessible at http://biomed.zzuli.edu.cn/bioinfo/gpos-ecc-mploc/ and http://biomed.zzuli.edu.cn/bioinfo/gneg-ecc-mploc/, respectively.
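Point (3) contrasts frequency-based GO vectors with 0/1 vectors. The difference can be sketched as follows. This is a minimal illustration, not the Gpos-ECC-mPLoc/Gneg-ECC-mPLoc code. The function names, the GO vocabulary and the homolog annotations are hypothetical, and the sketch assumes the homology search that retrieves the GO terms has already been run.

```python
from collections import Counter


def go_frequency_vector(homolog_go_terms, go_vocabulary):
    """Count, for each GO term in a fixed vocabulary, how many homologs
    of the query protein carry it (frequency rather than 0/1 presence)."""
    counts = Counter()
    for terms in homolog_go_terms:
        # set() so a term listed twice for the same homolog counts once
        counts.update(set(terms))
    return [counts[term] for term in go_vocabulary]


def go_binary_vector(homolog_go_terms, go_vocabulary):
    """The 0/1 alternative: 1 if any homolog carries the term, else 0."""
    present = set().union(*homolog_go_terms)
    return [1 if term in present else 0 for term in go_vocabulary]


# Hypothetical vocabulary and annotations for three homologs of one query.
vocab = ["GO:0005886", "GO:0005737", "GO:0009279"]
homologs = [["GO:0005886", "GO:0005737"], ["GO:0005886"], ["GO:0005886", "GO:0009279"]]
print(go_frequency_vector(homologs, vocab))  # [3, 1, 1]
print(go_binary_vector(homologs, vocab))     # [1, 1, 1]
```

The binary vector cannot distinguish a term shared by every homolog from one carried by a single homolog. The frequency vector keeps that distinction.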
85
Tamboli AS, Rane NR, Patil SM, Biradar SP, Pawar PK, Govindwar SP. Physicochemical characterization, structural analysis and homology modeling of bacterial and fungal laccases using in silico methods. ACTA ACUST UNITED AC 2015. [DOI: 10.1007/s13721-015-0089-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
86
Li W, Freudenberg J, Oswald M. Principles for the organization of gene-sets. Comput Biol Chem 2015; 59 Pt B:139-49. [PMID: 26188561 DOI: 10.1016/j.compbiolchem.2015.04.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 02/24/2015] [Accepted: 04/08/2015] [Indexed: 12/23/2022]
Abstract
A gene-set, an important concept in microarray expression analysis and systems biology, is a collection of genes and/or their products (i.e. proteins) that have some features in common. There are many different ways to construct gene-sets, but a systematic organization of these ways is lacking. Gene-sets are mainly organized ad hoc in current public-domain databases, with group header names often chosen for practical reasons (such as the type of technology used to obtain the gene-sets, or a balanced number of gene-sets under a header). Here we aim to provide a gene-set organization principle according to the level at which genes are connected: homology, physical map proximity, chemical interaction, biological, and phenotypic-medical levels. We also distinguish two types of connections between genes: actual connection versus sharing of a label. Actual connections denote direct biological interactions, whereas shared-label connections denote shared membership in a group. Some extensions of the framework are also addressed, such as overlapping gene-sets, modules, and the incorporation of other non-protein-coding entities such as microRNAs.
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA.
- Jan Freudenberg
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA.
- Michaela Oswald
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA.
87
Wang X, Zhang W, Zhang Q, Li GZ. MultiP-SChlo: multi-label protein subchloroplast localization prediction with Chou's pseudo amino acid composition and a novel multi-label classifier. Bioinformatics 2015; 31:2639-45. [PMID: 25900916 DOI: 10.1093/bioinformatics/btv212] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Received: 12/19/2014] [Accepted: 04/13/2015] [Indexed: 01/11/2023] Open
Abstract
MOTIVATION: Identifying protein subchloroplast localization in the chloroplast organelle is very helpful for understanding the function of chloroplast proteins. A few computational methods exist for predicting protein subchloroplast localization. However, these existing works ignored proteins with multiple subchloroplast locations when constructing prediction models. As a result, they can predict only one of the subchloroplast locations of such multi-label proteins.
RESULTS: To address this problem, a novel multi-label classifier that uses label-specific features and label correlations simultaneously was developed for predicting the subchloroplast location(s) of proteins with either single or multiple location sites. As an initial study, the overall accuracy of our proposed algorithm reaches 55.52%, which is high enough to make it a promising tool for further studies.
AVAILABILITY AND IMPLEMENTATION: An online web server for our proposed algorithm, named MultiP-SChlo, was developed and is freely accessible at http://biomed.zzuli.edu.cn/bioinfo/multip-schlo/.
CONTACT: pandaxiaoxi@gmail.com or gzli@tongji.edu.cn
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiao Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
- Weiwei Zhang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
- Qiuwen Zhang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
- Guo-Zheng Li
- Department of Control Science and Engineering, Tongji University, Shanghai 201804, China
88
Gu Q, Ding YS, Zhang TL. An ensemble classifier based prediction of G-protein-coupled receptor classes in low homology. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.12.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 01/04/2023]
89
mPLR-Loc: An adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction. Anal Biochem 2015; 473:14-27. [DOI: 10.1016/j.ab.2014.10.014] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 03/06/2014] [Revised: 09/29/2014] [Accepted: 10/21/2014] [Indexed: 01/16/2023]
90
Arango-Argoty GA, Jaramillo-Garzón JA, Castellanos-Domínguez G. Feature extraction by statistical contact potentials and wavelet transform for predicting subcellular localizations in gram negative bacterial proteins. J Theor Biol 2015; 364:121-30. [PMID: 25219623 DOI: 10.1016/j.jtbi.2014.08.051] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/08/2013] [Revised: 08/27/2014] [Accepted: 08/28/2014] [Indexed: 11/16/2022]
Abstract
Predicting the localization of a protein has become a useful practice for inferring its function. Most of the reported methods for predicting subcellular localizations of Gram-negative bacterial proteins use standard protein representations. These representations generally do not take into account the distribution of the amino acids or the structural information of the proteins. Here, we propose a protein representation based on the structural information contained in pairwise statistical contact potentials. The wavelet transform decodes the information contained in the primary structure of the proteins, allowing the identification of patterns along the proteins, which are used to characterize the subcellular localizations. A support vector machine classifier is then trained to categorize them. Cellular compartments such as the periplasm and the extracellular medium are difficult to predict and have high false negative rates. The wavelet-based method achieves high overall performance while maintaining a low false negative rate, particularly on "periplasm" and "extracellular medium". Our results suggest that the proposed protein characterization is a useful alternative to classical and cutting-edge protein representations for representing and predicting protein sequences.
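The wavelet step can be made concrete with a minimal sketch built on two assumptions. First, each residue maps to a single number through a per-residue scale. `TOY_SCALE` below is hypothetical; the paper derives its values from pairwise statistical contact potentials, which are not reproduced here. Second, the sketch uses the simple Haar wavelet, which may differ from the wavelet the authors used. Each detail band is summarized by its energy, and the resulting vector would then be passed to a classifier such as an SVM.

```python
import math

# Hypothetical per-residue scale standing in for contact-potential-derived
# values; the numbers are illustrative only.
TOY_SCALE = {"A": 0.5, "L": 1.0, "K": -1.0, "E": -0.8, "G": 0.0}


def haar_level(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd-length input by repeating the last value
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_features(sequence, scale=TOY_SCALE, levels=2):
    """Map residues to numbers, decompose repeatedly, and return the
    energy (sum of squares) of the detail coefficients at each level."""
    signal = [scale[res] for res in sequence]
    features = []
    for _ in range(levels):
        if len(signal) < 2:
            break
        signal, detail = haar_level(signal)
        features.append(sum(d * d for d in detail))
    return features


print(wavelet_features("ALKEG"))
```

Because the Haar transform is orthonormal, the approximation and detail energies at each level add up to the energy of the input (for even-length input). The detail energies therefore describe how much of the profile's variation sits at each scale along the chain.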
Affiliation(s)
- G A Arango-Argoty
- Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 via al Magdalena, Manizales, Colombia; Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, 3501 Fifth Ave, Pittsburgh, PA 15260, USA.
- J A Jaramillo-Garzón
- Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 via al Magdalena, Manizales, Colombia; Research Center of the Instituto Tecnologico Metropolitano, Calle 73 No 76A-354, Medellín, Colombia
- G Castellanos-Domínguez
- Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 via al Magdalena, Manizales, Colombia
91
Chen J, Tang YY, Chen CLP, Fang B, Lin Y, Shang Z. Multi-Label Learning With Fuzzy Hypergraph Regularization for Protein Subcellular Location Prediction. IEEE Trans Nanobioscience 2014; 13:438-47. [DOI: 10.1109/tnb.2014.2341111] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
92
Xu R, Zhou J, Liu B, He Y, Zou Q, Wang X, Chou KC. Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach. J Biomol Struct Dyn 2014; 33:1720-30. [PMID: 25252709 DOI: 10.1080/07391102.2014.968624] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Indexed: 10/24/2022]
Abstract
DNA-binding proteins are crucial for various cellular processes and hence have become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to establish an automated method for rapidly and accurately identifying DNA-binding proteins from their sequence information alone. Because all biological species have evolved from a very limited number of ancestral species, it is important to take evolutionary information into account when developing such a high-throughput tool. In view of this, a new predictor was proposed that incorporates evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. The new predictor was compared with existing methods using both the jackknife test and an independent data-set test, and it outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that the novel approach of incorporating evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.
Affiliation(s)
- Ruifeng Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen 518055, Guangdong, China
93
Pacharawongsakda E, Theeramunkong T. Predict subcellular locations of singleplex and multiplex proteins by semi-supervised learning and dimension-reducing general mode of Chou's PseAAC. IEEE Trans Nanobioscience 2014; 12:311-20. [PMID: 23864226 DOI: 10.1109/tnb.2013.2272014] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Indexed: 11/06/2022]
Abstract
Predicting protein subcellular location is one of the major challenges in bioinformatics, since such knowledge helps us understand protein functions and enables us to select target proteins during the drug discovery process. While many computational techniques have been proposed to improve predictive performance for protein subcellular location, they have several shortcomings. In this work, we propose a method to solve three main issues in such techniques: (i) handling multiplex proteins, which may exist in or move between multiple cellular compartments; (ii) handling high dimensionality in the input and output spaces; and (iii) the requirement of sufficient labeled data for model training. To address these issues, this work presents a new computational method for predicting proteins that have either single or multiple locations. The proposed technique, namely iFLAST-CORE, combines dimensionality reduction in the feature and label spaces with the co-training paradigm for semi-supervised multi-label classification. For this purpose, Singular Value Decomposition (SVD) is applied to transform the high-dimensional feature and label spaces into lower-dimensional spaces. Next, to compensate for the limited labeled data, co-training regression makes use of unlabeled data by predicting their target values in the lower-dimensional spaces. In the last step, the SVD components are used to project labels in the lower-dimensional space back to the original space. An adaptive threshold then maps each numeric value to a binary value for label determination. A set of experiments on viral proteins and gram-negative bacterial proteins shows that our proposed method improves classification performance over the traditional method that uses only labeled data. The improvement holds across various evaluation metrics, such as Aiming (or Precision), Coverage (or Recall), and macro F-measure.
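Aiming and Coverage are example-based counterparts of precision and recall for multi-label predictions. Under one common definition, Aiming averages the fraction of predicted locations that are correct for each protein. Coverage averages the fraction of true locations that were predicted. Below is a minimal sketch under two assumptions: every protein has at least one true label, and an empty prediction scores 0 for Aiming. The location names are hypothetical, and this is not the evaluation code used in the paper.

```python
def aiming(true_sets, pred_sets):
    """Example-based precision: mean over proteins of |true ∩ pred| / |pred|."""
    scores = []
    for true, pred in zip(true_sets, pred_sets):
        # convention assumed here: an empty prediction scores 0
        scores.append(len(true & pred) / len(pred) if pred else 0.0)
    return sum(scores) / len(scores)


def coverage(true_sets, pred_sets):
    """Example-based recall: mean over proteins of |true ∩ pred| / |true|."""
    return sum(len(t & p) / len(t) for t, p in zip(true_sets, pred_sets)) / len(true_sets)


# Two hypothetical proteins; the second is truly multiplex.
true_labels = [{"cytoplasm"}, {"inner membrane", "periplasm"}]
predicted = [{"cytoplasm"}, {"periplasm"}]
print(aiming(true_labels, predicted), coverage(true_labels, predicted))  # 1.0 0.75
```

Predicting only one location of the multiplex protein costs nothing in Aiming but lowers Coverage. This is why both measures are reported for multi-label predictors.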
94
Dehzangi A, Heffernan R, Sharma A, Lyons J, Paliwal K, Sattar A. Gram-positive and Gram-negative protein subcellular localization by incorporating evolutionary-based descriptors into Chou's general PseAAC. J Theor Biol 2014; 364:284-94. [PMID: 25264267 DOI: 10.1016/j.jtbi.2014.09.029] [Citation(s) in RCA: 178] [Impact Index Per Article: 17.8] [Received: 05/15/2014] [Revised: 08/11/2014] [Accepted: 09/17/2014] [Indexed: 11/17/2022]
Abstract
Protein subcellular localization prediction is defined as predicting the functioning location of a given protein in the cell. It is considered an important step towards protein function prediction and drug design. Recent studies have shown that relying on Gene Ontology (GO) for feature extraction can improve protein subcellular localization prediction performance. However, this problem remains unsolved when relying solely on GO. At the same time, the impact of other sources of features, especially evolutionary-based features, has not been explored adequately for this task. In this study, we aim to extract discriminative evolutionary features to tackle this problem. To do this, we propose two segmentation-based feature extraction methods to explore potential local evolutionary-based information for Gram-positive and Gram-negative subcellular localizations. We show that by applying a Support Vector Machine (SVM) classifier to our extracted features, we are able to improve Gram-positive and Gram-negative subcellular localization prediction accuracies by up to 6.4% over previous studies, including studies that used GO for feature extraction.
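The abstract does not spell out the two segmentation schemes. The sketch below shows one generic way to turn local evolutionary information into a fixed-length vector. It assumes a PSSM given as L rows of 20 scores (as produced, for example, by PSI-BLAST) and splits the rows into equal contiguous segments. `segment_pssm_features` and `n_segments` are hypothetical names, and this is not either of the authors' methods.

```python
def segment_pssm_features(pssm, n_segments=4):
    """Split the L rows of a PSSM into n_segments contiguous blocks of
    (nearly) equal length and average each column within each block,
    giving n_segments * n_columns features regardless of protein length."""
    L = len(pssm)
    if L < n_segments:
        raise ValueError("sequence shorter than the number of segments")
    n_cols = len(pssm[0])
    features = []
    for k in range(n_segments):
        start = k * L // n_segments
        end = (k + 1) * L // n_segments  # every block is non-empty when L >= n_segments
        block = pssm[start:end]
        for col in range(n_cols):
            features.append(sum(row[col] for row in block) / len(block))
    return features


# Toy 8 x 20 "PSSM" whose row i is filled with the value i.
toy_pssm = [[i] * 20 for i in range(8)]
print(len(segment_pssm_features(toy_pssm)))  # 80
```

Averaging within segments, rather than over the whole profile, keeps some positional information (for example, N-terminal versus C-terminal conservation). Signals of that kind can matter for localization.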
Affiliation(s)
- Abdollah Dehzangi
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; National ICT Australia (NICTA), Brisbane, Australia.
- Rhys Heffernan
- School of Engineering, Griffith University, Brisbane, Australia
- Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; School of Engineering and Physics, University of the South Pacific, Fiji
- James Lyons
- School of Engineering, Griffith University, Brisbane, Australia
- Kuldip Paliwal
- School of Engineering, Griffith University, Brisbane, Australia
- Abdul Sattar
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; National ICT Australia (NICTA), Brisbane, Australia
95
Sharma AR, Chakraborty C, Lee SS, Sharma G, Yoon JK, George Priya Doss C, Song DK, Nam JS. Computational biophysical, biochemical, and evolutionary signature of human R-spondin family proteins, the member of canonical Wnt/β-catenin signaling pathway. Biomed Res Int 2014; 2014:974316. [PMID: 25276837 PMCID: PMC4172882 DOI: 10.1155/2014/974316] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/04/2014] [Revised: 07/12/2014] [Accepted: 07/12/2014] [Indexed: 12/27/2022]
Abstract
In humans, the Wnt/β-catenin signaling pathway plays a significant role in cell growth, cell development, and disease pathogenesis. Four human R-spondins (Rspos) are known to activate the canonical Wnt/β-catenin signaling pathway. Rspos currently serve as therapeutic targets for several human diseases. Hence, a basic understanding of the molecular properties of Rspos is essential. We approached this issue by interpreting the biochemical and biophysical properties, along with the molecular evolution, of Rspos through computational methods. Our analysis shows that signal peptide length is roughly similar across the Rspo family, as is the amino acid distribution pattern. In Rspo3, four N-glycosylation sites were noted. All members are hydrophilic and show approximately similar GRAVY values. Conversely, Rspo3 contains the most positively charged residues, while Rspo4 contains the fewest. Four highly aligned blocks were recorded with Gblocks. Phylogenetic analysis shows that Rspo4 is rooted with Rspo2 and, similarly, that Rspo3 and Rspo1 share a common point of origin. Through a phylogenomic study, we constructed a phylogenetic tree of sixty proteins (n = 60) from ortholog and paralog seed sequences. A protein-protein network was also illustrated. The results of our study may help future researchers uncover significant physiological and therapeutic properties of Rspos in various disease models.
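The sequence-level quantities mentioned above can be computed roughly as follows: GRAVY, the count of positively charged residues, and candidate N-glycosylation sites. This is a minimal sketch, not the authors' pipeline. GRAVY is the mean Kyte-Doolittle hydropathy, and a negative value indicates a hydrophilic protein. The positive-charge count follows the Arg + Lys convention. The glycosylation check only scans for the N-X-S/T sequon (X not Pro), so it finds candidate sites rather than confirmed ones.

```python
import re

# Kyte-Doolittle hydropathy values, the scale used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def positive_residues(seq):
    """Number of Arg + Lys residues, counted as positively charged."""
    return seq.count("R") + seq.count("K")


def n_glyc_sequons(seq):
    """0-based start positions of N-X-[ST] motifs with X != P.
    The zero-width lookahead lets overlapping motifs all be reported."""
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", seq)]


print(n_glyc_sequons("ANGSNPSNLT"))  # [1, 7]
```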
Affiliation(s)
- Ashish Ranjan Sharma
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University Hospital, College of Medicine, Chuncheon-si, Gangwon-do 200-704, Republic of Korea
- Chiranjib Chakraborty
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Department of Bioinformatics, School of Computer Sciences, Galgotias University, Greater Noida 203201, India
- Sang-Soo Lee
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Garima Sharma
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Jeong Kyo Yoon
- Center for Molecular Medicine, Maine Medical Center Research Institute, 81 Research Drive, Scarborough, ME 04074, USA
- C. George Priya Doss
- Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore, Tamil Nadu 632014, India
- Dong-Keun Song
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Ju-Suk Nam
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
96
Wan S, Mak MW, Kung SY. R3P-Loc: a compact multi-label predictor using ridge regression and random projection for protein subcellular localization. J Theor Biol 2014; 360:34-45. [PMID: 24997236 DOI: 10.1016/j.jtbi.2014.06.031] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 04/12/2014] [Revised: 06/24/2014] [Accepted: 06/25/2014] [Indexed: 12/21/2022]
Abstract
Locating proteins within cellular contexts is of paramount significance in elucidating their biological functions. Computational methods based on knowledge databases (such as the gene ontology annotation (GOA) database) are known to be more efficient than sequence-based methods. However, knowledge-based methods predominantly face three issues: (1) knowledge databases typically have enormous size and are growing exponentially, (2) knowledge databases contain redundant information, and (3) the number of features extracted from knowledge databases is much larger than the number of data samples with ground-truth labels. These properties make the extracted features liable to contain redundant or irrelevant information, causing the prediction systems to suffer from overfitting. To address these problems, this paper proposes an efficient multi-label predictor, namely R3P-Loc, which uses two compact databases for feature extraction. It applies random projection (RP) to reduce the feature dimensions of an ensemble ridge regression (RR) classifier. Two new compact databases are created from the Swiss-Prot and GOA databases. These databases contain almost the same amount of information as their full-size counterparts but are much smaller. Experimental results on two recent datasets (eukaryote and plant) suggest that R3P-Loc can reduce the dimensions seven-fold and significantly outperforms state-of-the-art predictors. This paper also demonstrates that the compact databases reduce memory consumption by a factor of 39 without degrading prediction accuracy. For readers' convenience, the R3P-Loc server is available online at http://bioinfo.eie.polyu.edu.hk/R3PLocServer/.
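Random projection reduces a d-dimensional feature vector to k dimensions by multiplying it with a random d x k matrix. An ensemble can draw a different matrix for each member. The sketch below assumes a dense Gaussian matrix with N(0, 1/k) entries, a common Johnson-Lindenstrauss-style choice; sparse random matrices are another option, and the matrix used in R3P-Loc may differ. The ridge regression classifiers that would be trained on each projected view are not shown. All function names are hypothetical.

```python
import math
import random


def gaussian_projection_matrix(d, k, seed=0):
    """d x k matrix with i.i.d. N(0, 1/k) entries; seeded for reproducibility."""
    rng = random.Random(seed)
    sd = 1 / math.sqrt(k)
    return [[rng.gauss(0.0, sd) for _ in range(k)] for _ in range(d)]


def project(x, R):
    """Map a d-dimensional feature vector x to k dimensions (x^T R)."""
    k = len(R[0])
    return [sum(x[i] * R[i][j] for i in range(len(x))) for j in range(k)]


def ensemble_views(x, k, n_members=5):
    """One independent random projection per ensemble member (seeds 0..n_members-1)."""
    return [project(x, gaussian_projection_matrix(len(x), k, seed=s)) for s in range(n_members)]


x = [1.0] * 10                  # a toy 10-dimensional feature vector
views = ensemble_views(x, k=3, n_members=4)
print([len(v) for v in views])  # [3, 3, 3, 3]
```

The 1/k variance makes squared lengths approximately preserved on average after projection. Training one classifier per independent projection and combining their outputs offsets the information any single projection loses.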
Affiliation(s)
- Shibiao Wan
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China.
- Man-Wai Mak
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China.
- Sun-Yuan Kung
- Department of Electrical Engineering, Princeton University, NJ, USA.
97
A word of caution about biological inference - Revisiting cysteine covalent state predictions. FEBS Open Bio 2014; 4:310-4. [PMID: 24918043 PMCID: PMC4048844 DOI: 10.1016/j.fob.2014.03.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/25/2013] [Revised: 03/05/2014] [Accepted: 03/07/2014] [Indexed: 11/20/2022] Open
Abstract
- High prediction accuracy is often believed to validate implicit biological assumptions.
- Cys redox state predictions assume a local sequence environmental effect.
- Removing local sequence signals did not reduce prediction accuracy.
- Cys redox state predictions apparently correlate with subcellular localization.
- Subcellular localization depends on global sequence composition.
The success of methods for predicting the redox state of cysteine residues from the sequence environment seemed to validate the basic assumption that this state is mainly determined locally. However, the accuracy of predictions on randomized sequences or of non-cysteine residues remained high, suggesting that these predictions rather capture global features of proteins such as subcellular localization, which depends on composition. This illustrates that even high prediction accuracy is insufficient to validate implicit assumptions about a biological phenomenon. Correctly identifying the relevant underlying biochemical reasons for the success of a method is essential to gain proper biological insights and develop more accurate and novel bioinformatics tools.
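The kind of control described above destroys local sequence order while keeping global composition. One way to set it up is a composition-preserving shuffle, optionally keeping the residue under study in place. This is a sketch of how such a control might look, not the procedure from the paper. The function name, `keep_pos` and the example sequence are hypothetical.

```python
import random


def composition_preserving_shuffle(seq, keep_pos=None, seed=None):
    """Randomly permute seq while keeping its amino-acid composition.
    If keep_pos is given, the residue at that index stays in place and
    only the remaining residues are shuffled around it."""
    rng = random.Random(seed)
    if keep_pos is None:
        residues = list(seq)
        rng.shuffle(residues)
        return "".join(residues)
    others = [aa for i, aa in enumerate(seq) if i != keep_pos]
    rng.shuffle(others)
    others.insert(keep_pos, seq[keep_pos])  # put the target residue back at its index
    return "".join(others)


seq = "MKTCAVLLGCSRE"
print(composition_preserving_shuffle(seq, keep_pos=3, seed=7))
```

If a predictor scores shuffled sequences about as well as real ones, its accuracy cannot come from the local sequence environment. Composition-level signals, such as those tied to subcellular localization, are the more likely explanation.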
98
iMethyl-PseAAC: identification of protein methylation sites via a pseudo amino acid composition approach. Biomed Res Int 2014; 2014:947416. [PMID: 24977164 PMCID: PMC4054830 DOI: 10.1155/2014/947416] [Citation(s) in RCA: 122] [Impact Index Per Article: 12.2] [Received: 02/15/2014] [Revised: 04/26/2014] [Accepted: 04/29/2014] [Indexed: 11/18/2022]
Abstract
Before becoming native proteins during biosynthesis, the polypeptide chains created by ribosomes translating mRNA undergo a series of “product-forming” steps, such as cutting, folding, and posttranslational modification (PTM). Knowledge of PTMs in proteins is crucial for dynamic proteome analysis of various human diseases and epigenetic inheritance. One of the most important PTMs is Arg- or Lys-methylation, which occurs on arginine or lysine, respectively. Given a protein, which of its Arg (or Lys) sites can be methylated, and which cannot? This is the first important problem for an in-depth understanding of the methylation mechanism and for drug development. With the avalanche of protein sequences generated in the postgenomic age, its urgency has become self-evident. To address this problem, we proposed a new predictor, called iMethyl-PseAAC. In the prediction system, a peptide sample was formulated as a 346-dimensional vector by incorporating its physicochemical, sequence evolution, biochemical, and structural disorder information into the general form of pseudo amino acid composition. Rigorous jackknife and independent dataset tests showed that iMethyl-PseAAC was superior to all existing predictors in this area.
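As background for "pseudo amino acid composition", the classical type-I (parallel-correlation) PseAAC is sketched below. Its first 20 components are the amino-acid frequencies and its next λ components are sequence-order correlation factors θ_j. All components are normalized by the sum of frequencies plus w·Σθ_j. This illustrates the general idea only. The 346-dimensional iMethyl-PseAAC vector combines other information types that are not reproduced here. The sketch uses a single standardized property (Kyte-Doolittle hydropathy) and the weight w = 0.05 purely as illustrative assumptions, and it only accepts the 20 standard residues.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy, used here as the single illustrative property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Rescale a property to zero mean and unit SD over the 20 amino acids."""
    vals = [prop[a] for a in AMINO_ACIDS]
    mean = sum(vals) / len(vals)
    sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return {a: (prop[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac(seq, props, lam=3, w=0.05):
    """Type-I PseAAC: 20 composition terms followed by lam sequence-order terms."""
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    std = [standardize(p) for p in props]
    L = len(seq)
    freqs = [seq.count(a) / L for a in AMINO_ACIDS]

    def theta(r1, r2):
        # correlation function: mean squared property difference of two residues
        return sum((p[r2] - p[r1]) ** 2 for p in std) / len(std)

    # tier-j correlation factor, averaged over all residue pairs j positions apart
    thetas = [sum(theta(seq[i], seq[i + j]) for i in range(L - j)) / (L - j)
              for j in range(1, lam + 1)]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [HYDROPATHY], lam=3)
print(len(vec))  # 23
```

The normalization makes all 20 + λ components sum to 1. The weight w therefore sets how much of that total goes to sequence-order information rather than plain composition.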
99
HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins. PLoS One 2014; 9:e89545. [PMID: 24647341 PMCID: PMC3960097 DOI: 10.1371/journal.pone.0089545] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 11/11/2013] [Accepted: 01/23/2014] [Indexed: 12/23/2022] Open
Abstract
Protein subcellular localization prediction, as an essential step in elucidating the in vivo functions of proteins and identifying drug targets, has been extensively studied in previous decades. Instead of only determining the subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only GO term occurrences but also inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrence and the semantic similarity between GO terms. Given a protein, a set of GO terms is retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequencies of GO occurrences and the semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively. These are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which predicts the localization of virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.
100
Jeffrey, LSH, and HH. Impact of media composition and growth condition of antifungal production by Streptomyces ambofaciens S2. ACTA ACUST UNITED AC 2014. [DOI: 10.5897/ajmr11.1401] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/31/2022]