Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Yu C, Deng M, Cheng SY, Yau SC, He RL, Yau SST. Protein space: a natural method for realizing the nature of protein universe. J Theor Biol 2012;318:197-204. [PMID: 23154188 DOI: 10.1016/j.jtbi.2012.11.005] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 07/07/2012] [Revised: 11/01/2012] [Accepted: 11/02/2012] [Indexed: 10/27/2022]

Cited by Other Article(s)

Ghosh S, Pal J, Cattani C, Maji B, Bhattacharya DK. Protein sequence comparison based on representation on a finite dimensional unit hypercube. J Biomol Struct Dyn 2024;42:6425-6439. [PMID: 37837426 DOI: 10.1080/07391102.2023.2268719] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/16/2023] [Accepted: 07/01/2023] [Indexed: 10/16/2023]

Guan M, Sun N, Yau SST. Geometric analysis of SARS-CoV-2 variants. Gene 2024;909:148291. [PMID: 38417688 DOI: 10.1016/j.gene.2024.148291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/23/2024] [Accepted: 02/14/2024] [Indexed: 03/01/2024]

Cahuantzi R, Lythgoe KA, Hall I, Pellis L, House T. Unsupervised identification of significant lineages of SARS-CoV-2 through scalable machine learning methods. Proc Natl Acad Sci U S A 2024;121:e2317284121. [PMID: 38478692 PMCID: PMC10962941 DOI: 10.1073/pnas.2317284121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 02/05/2024] [Indexed: 03/21/2024] Open

Sykes J, Holland BR, Charleston MA. A review of visualisations of protein fold networks and their relationship with sequence and function. Biol Rev Camb Philos Soc 2023;98:243-262. [PMID: 36210328 PMCID: PMC10092621 DOI: 10.1111/brv.12905] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/30/2021] [Revised: 09/08/2022] [Accepted: 09/09/2022] [Indexed: 01/12/2023]

Apache Spark-based scalable feature extraction approaches for protein sequence and their clustering performance analysis. International Journal of Data Science and Analytics 2023. [DOI: 10.1007/s41060-022-00381-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]

Kumar N, Acharya V. Machine intelligence-driven framework for optimized hit selection in virtual screening. J Cheminform 2022;14:48. [PMID: 35869511 PMCID: PMC9306080 DOI: 10.1186/s13321-022-00630-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/19/2021] [Accepted: 07/05/2022] [Indexed: 11/10/2022] Open

Abstract

Virtual screening (VS) aids in prioritizing unknown bio-interactions between compounds and protein targets for empirical drug discovery. In standard VS exercise, roughly 10% of top-ranked molecules exhibit activity when examined in biochemical assays, which accounts for many false positive hits, making it an arduous task. Attempts for conquering false-hit rates were developed through either ligand-based or structure-based VS separately; however, nonetheless performed remarkably well. Here, we present an advanced VS framework—automated hit identification and optimization tool (A-HIOT)—comprises chemical space-driven stacked ensemble for identification and protein space-driven deep learning architectures for optimization of an array of specific hits for fixed protein receptors. A-HIOT implements numerous open-source algorithms intending to integrate chemical and protein space leading to a high-quality prediction. The optimized hits are the selective molecules which we retrieve after extreme refinement implying chemical space and protein space modules of A-HIOT. Using CXC chemokine receptor 4, we demonstrated the superior performance of A-HIOT for hit molecule identification and optimization with tenfold cross-validation accuracies of 94.8% and 81.9%, respectively. In comparison with other machine learning algorithms, A-HIOT achieved higher accuracies of 96.2% for hit identification and 89.9% for hit optimization on independent benchmark datasets for CXCR4 and 86.8% for hit identification and 90.2% for hit optimization on independent test dataset for androgen receptor (AR), thus, shows its generalizability and robustness. In conclusion, advantageous features impeded in A-HIOT is making a reliable approach for bridging the long-standing gap between ligand-based and structure-based VS in finding the optimized hits for the desired receptor. The complete resource (framework) code is available at https://gitlab.com/neeraj-24/A-HIOT.

Sun N, Pei S, He L, Yin C, He RL, Yau SST. Geometric construction of viral genome space and its applications. Comput Struct Biotechnol J 2021;19:4226-4234. [PMID: 34429843 PMCID: PMC8353408 DOI: 10.1016/j.csbj.2021.07.028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/25/2021] [Revised: 07/24/2021] [Accepted: 07/24/2021] [Indexed: 11/25/2022] Open

Mu Z, Yu T, Liu X, Zheng H, Wei L, Liu J. FEGS: a novel feature extraction model for protein sequences and its applications. BMC Bioinformatics 2021;22:297. [PMID: 34078264 PMCID: PMC8172329 DOI: 10.1186/s12859-021-04223-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/25/2021] [Accepted: 05/28/2021] [Indexed: 11/10/2022] Open

Wan X, Tan X. A protein structural study based on the centrality analysis of protein sequence feature networks. PLoS One 2021;16:e0248861. [PMID: 33780482 PMCID: PMC8006989 DOI: 10.1371/journal.pone.0248861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2020] [Accepted: 03/05/2021] [Indexed: 11/19/2022] Open

Wan X, Tan X. A Simple Protein Evolutionary Classification Method Based on the Mutual Relations Between Protein Sequences. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200305090055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]

Multivariate Chemometrics as a Strategy to Predict the Allergenic Nature of Food Proteins. Symmetry (Basel) 2020. [DOI: 10.3390/sym12101616] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022] Open

Ensemble Learning Prediction of Drug-Target Interactions Using GIST Descriptor Extracted from PSSM-Based Evolutionary Information. Biomed Research International 2020;2020:4516250. [PMID: 32908888 PMCID: PMC7463380 DOI: 10.1155/2020/4516250] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/24/2020] [Revised: 08/02/2020] [Accepted: 08/10/2020] [Indexed: 12/02/2022]

Sun Z, Pei S, He RL, Yau SST. A novel numerical representation for proteins: Three-dimensional Chaos Game Representation and its Extended Natural Vector. Comput Struct Biotechnol J 2020;18:1904-1913. [PMID: 32774785 PMCID: PMC7390779 DOI: 10.1016/j.csbj.2020.07.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/28/2020] [Revised: 07/04/2020] [Accepted: 07/05/2020] [Indexed: 12/16/2022] Open

Wan X, Tan X. A study on separation of the protein structural types in amino acid sequence feature spaces. PLoS One 2019;14:e0226768. [PMID: 31869390 PMCID: PMC6927603 DOI: 10.1371/journal.pone.0226768] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/16/2019] [Accepted: 12/03/2019] [Indexed: 11/23/2022] Open

Zhao X, Tian K, He RL, Yau SST. Convex hull principle for classification and phylogeny of eukaryotic proteins. Genomics 2019;111:1777-1784. [DOI: 10.1016/j.ygeno.2018.11.033] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/02/2018] [Revised: 11/25/2018] [Accepted: 11/30/2018] [Indexed: 12/11/2022]

Li Y, Huang YA, You ZH, Li LP, Wang Z. Drug-Target Interaction Prediction Based on Drug Fingerprint Information and Protein Sequence. Molecules 2019;24:molecules24162999. [PMID: 31430892 PMCID: PMC6719962 DOI: 10.3390/molecules24162999] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/11/2019] [Revised: 08/13/2019] [Accepted: 08/14/2019] [Indexed: 01/09/2023] Open

Abstract

The identification of drug-target interactions (DTIs) is a critical step in drug development. Experimental methods that are based on clinical trials to discover DTIs are time-consuming, expensive, and challenging. Therefore, as complementary to it, developing new computational methods for predicting novel DTI is of great significance with regards to saving cost and shortening the development period. In this paper, we present a novel computational model for predicting DTIs, which uses the sequence information of proteins and a rotation forest classifier. Specifically, all of the target protein sequences are first converted to a position-specific scoring matrix (PSSM) to retain evolutionary information. We then use local phase quantization (LPQ) descriptors to extract evolutionary information in the PSSM. On the other hand, substructure fingerprint information is utilized to extract the features of the drug. We finally combine the features of drugs and protein together to represent features of each drug-target pair and use a rotation forest classifier to calculate the scores of interaction possibility, for a global DTI prediction. The experimental results indicate that the proposed model is effective, achieving average accuracies of 89.15%, 86.01%, 82.20%, and 71.67% on four datasets (i.e., enzyme, ion channel, G protein-coupled receptors (GPCR), and nuclear receptor), respectively. In addition, we compared the prediction performance of the rotation forest classifier with another popular classifier, support vector machine, on the same dataset. Several types of methods previously proposed are also implemented on the same datasets for performance comparison. The comparison results demonstrate the superiority of the proposed method to the others. We anticipate that the proposed method can be used as an effective tool for predicting drug-target interactions on a large scale, given the information of protein sequences and drug fingerprints.

Tian K, Zhao X, Zhang Y, Yau S. Comparing protein structures and inferring functions with a novel three-dimensional Yau-Hausdorff method. J Biomol Struct Dyn 2018;37:4151-4160. [PMID: 30518311 DOI: 10.1080/07391102.2018.1540359] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/14/2023]

Tian K, Zhao X, Yau SST. Convex hull analysis of evolutionary and phylogenetic relationships between biological groups. J Theor Biol 2018;456:34-40. [DOI: 10.1016/j.jtbi.2018.07.035] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 06/26/2018] [Revised: 07/23/2018] [Accepted: 07/25/2018] [Indexed: 11/28/2022]

Phylogenetic analysis of protein sequences based on a novel k-mer natural vector method. Genomics 2018;111:1298-1305. [PMID: 30195069 DOI: 10.1016/j.ygeno.2018.08.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/25/2018] [Revised: 08/19/2018] [Accepted: 08/27/2018] [Indexed: 11/22/2022]

Dong R, Zhu Z, Yin C, He RL, Yau SST. A new method to cluster genomes based on cumulative Fourier power spectrum. Gene 2018;673:239-250. [PMID: 29935353 DOI: 10.1016/j.gene.2018.06.042] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 03/07/2018] [Revised: 06/12/2018] [Accepted: 06/14/2018] [Indexed: 11/27/2022]

Huang G, Li J, Zhao C. Computational Prediction and Analysis of Associations between Small Molecules and Binding-Associated S-Nitrosylation Sites. Molecules 2018;23:molecules23040954. [PMID: 29671802 PMCID: PMC6017196 DOI: 10.3390/molecules23040954] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/24/2018] [Revised: 03/30/2018] [Accepted: 04/09/2018] [Indexed: 01/12/2023] Open

Dong R, Zheng H, Tian K, Yau SC, Mao W, Yu W, Yin C, Yu C, He RL, Yang J, Yau SS. Virus Database and Online Inquiry System Based on Natural Vectors. Evol Bioinform Online 2017;13:1176934317746667. [PMID: 29308007 PMCID: PMC5751915 DOI: 10.1177/1176934317746667] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/30/2017] [Accepted: 10/05/2017] [Indexed: 01/09/2023] Open

Zhao X, Tian K, He RL, Yau SST. Establishing the phylogeny of Prochlorococcus with a new alignment-free method. Ecol Evol 2017;7:11057-11065. [PMID: 29299281 PMCID: PMC5743538 DOI: 10.1002/ece3.3535] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/07/2017] [Revised: 09/04/2017] [Accepted: 09/14/2017] [Indexed: 11/11/2022] Open

Yu C, Arcos-Burgos M, Licinio J, Wong ML. A latent genetic subtype of major depression identified by whole-exome genotyping data in a Mexican-American cohort. Transl Psychiatry 2017;7:e1134. [PMID: 28509902 PMCID: PMC5534938 DOI: 10.1038/tp.2017.102] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 11/09/2016] [Revised: 04/04/2017] [Accepted: 04/10/2017] [Indexed: 02/07/2023] Open

Xiong D, Zeng J, Gong H. A deep learning framework for improving long-range residue–residue contact prediction using a hierarchical strategy. Bioinformatics 2017;33:2675-2683. [DOI: 10.1093/bioinformatics/btx296] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 10/18/2016] [Accepted: 05/02/2017] [Indexed: 12/31/2022] Open

Wan X, Zhao X, Yau SST. An information-based network approach for protein classification. PLoS One 2017;12:e0174386. [PMID: 28350835 PMCID: PMC5370107 DOI: 10.1371/journal.pone.0174386] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/23/2016] [Accepted: 03/08/2017] [Indexed: 11/25/2022] Open

Yu C, Baune BT, Licinio J, Wong ML. A novel strategy for clustering major depression individuals using whole-genome sequencing variant data. Sci Rep 2017;7:44389. [PMID: 28287625 PMCID: PMC5347377 DOI: 10.1038/srep44389] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 11/02/2016] [Accepted: 02/07/2017] [Indexed: 12/01/2022] Open

Hou W, Pan Q, Peng Q, He M. A new method to analyze protein sequence similarity using Dynamic Time Warping. Genomics 2016;109:123-130. [PMID: 27974244 PMCID: PMC7125777 DOI: 10.1016/j.ygeno.2016.12.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 08/12/2016] [Revised: 12/06/2016] [Accepted: 12/10/2016] [Indexed: 12/05/2022]

Zhao X, Wan X, He RL, Yau SST. A new method for studying the evolutionary origin of the SAR11 clade marine bacteria. Mol Phylogenet Evol 2016;98:271-9. [PMID: 26926946 DOI: 10.1016/j.ympev.2016.02.015] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 10/03/2015] [Revised: 02/18/2016] [Accepted: 02/18/2016] [Indexed: 12/14/2022]

Tian K, Yang X, Kong Q, Yin C, He RL, Yau SST. Two Dimensional Yau-Hausdorff Distance with Applications on Comparison of DNA and Protein Sequences. PLoS One 2015;10:e0136577. [PMID: 26384293 PMCID: PMC4575136 DOI: 10.1371/journal.pone.0136577] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 03/18/2015] [Accepted: 08/05/2015] [Indexed: 11/20/2022] Open

Yau SST, Mao WG, Benson M, He RL. Distinguishing proteins from arbitrary amino acid sequences. Sci Rep 2015;5:7972. [PMID: 25609314 PMCID: PMC4302309 DOI: 10.1038/srep07972] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/21/2014] [Accepted: 12/09/2014] [Indexed: 01/26/2023] Open

Li J, Koehl P. 3D representations of amino acids-applications to protein sequence comparison and classification. Comput Struct Biotechnol J 2014;11:47-58. [PMID: 25379143 PMCID: PMC4212284 DOI: 10.1016/j.csbj.2014.09.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/26/2022] Open

DFA7, a new method to distinguish between intron-containing and intronless genes. PLoS One 2014;9:e101363. [PMID: 25036549 PMCID: PMC4103774 DOI: 10.1371/journal.pone.0101363] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 02/02/2014] [Accepted: 06/05/2014] [Indexed: 11/23/2022] Open

K-mer natural vector and its application to the phylogenetic analysis of genetic sequences. Gene 2014;546:25-34. [PMID: 24858075 DOI: 10.1016/j.gene.2014.05.043] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 02/08/2014] [Revised: 05/04/2014] [Accepted: 05/20/2014] [Indexed: 11/21/2022]

APSLAP: an adaptive boosting technique for predicting subcellular localization of apoptosis protein. Acta Biotheor 2013;61:481-97. [PMID: 23982307 DOI: 10.1007/s10441-013-9197-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 01/30/2013] [Accepted: 08/16/2013] [Indexed: 01/09/2023]

Abstract

Apoptotic proteins play key roles in understanding the mechanism of programmed cell death. Knowledge about the subcellular localization of apoptotic protein is constructive in understanding the mechanism of programmed cell death, determining the functional characterization of the protein, screening candidates in drug design, and selecting protein for relevant studies. It is also proclaimed that the information required for determining the subcellular localization of protein resides in their corresponding amino acid sequence. In this work, a new biological feature, class pattern frequency of physiochemical descriptor, was effectively used in accordance with the amino acid composition, protein similarity measure, CTD (composition, translation, and distribution) of physiochemical descriptors, and sequence similarity to predict the subcellular localization of apoptosis protein. AdaBoost with the weak learner as Random-Forest was designed for the five modules and prediction is made based on the weighted voting system. Bench mark dataset of 317 apoptosis proteins were subjected to prediction by our system and the accuracy was found to be 100.0 and 92.4 %, and 90.1 % for self-consistency test, jack-knife test, and tenfold cross validation test respectively, which is 0.9 % higher than that of other existing methods. Beside this, the independent data (N151 and ZW98) set prediction resulted in the accuracy of 90.7 and 87.7 %, respectively. These results show that the protein feature represented by a combined feature vector along with AdaBoost algorithm holds well in effective prediction of subcellular localization of apoptosis proteins. The user friendly web interface "APSLAP" has been constructed, which is freely available at http://apslap.bicpu.edu.in and it is anticipated that this tool will play a significant role in determining the specific role of apoptosis proteins with reliability.

Yu C, He RL, Yau SST. Protein sequence comparison based on K-string dictionary. Gene 2013;529:250-6. [DOI: 10.1016/j.gene.2013.07.092] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 12/08/2012] [Revised: 06/14/2013] [Accepted: 07/25/2013] [Indexed: 11/30/2022]