26. Ray S, Alberuni S, Maulik U. Computational Prediction of HCV-Human Protein-Protein Interaction via Topological Analysis of HCV Infected PPI Modules. IEEE Trans Nanobioscience 2019; 17:55-61. [PMID: 29570075 DOI: 10.1109/tnb.2018.2797696] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
In this paper, we develop a framework for detecting protein-protein interactions (PPIs) between hepatitis C virus (HCV) and human proteins, based on the PPI and Gene Ontology information of the HCV-infected proteins. First, a bipartite interaction network is formed between HCV proteins and human host proteins. Next, we analyze different topological properties of the interaction network and observe that the degree of HCV-interacting proteins is significantly higher than that of non-interacting host proteins. We also observe that HCV-interacting protein pairs are more functionally similar to each other than non-interacting pairs. Following these observations, we apply an inference mechanism to predict novel interactions between HCV and human proteins. The inference mechanism partitions the network formed by HCV-interacting human proteins and their first neighbors into dense, functionally similar groups using a PPI network clustering algorithm. The groups are then analyzed to predict PPIs. The predicted interaction pairs are validated by a literature search in PubMed, which finds experimental evidence for over 50% of them. A Gene Ontology and pathway based analysis is also carried out to validate the identified modules biologically.
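The degree comparison described above (HCV-interacting versus non-interacting human proteins) can be illustrated with a minimal Python sketch. It assumes the human PPI network is a plain adjacency dict and that the set of HCV-targeted proteins is already known; the protein names are hypothetical, and the significance test used in the paper is not reproduced.

```python
from statistics import mean

def degree_split(ppi, hcv_targets):
    """Return (mean degree of HCV-interacting proteins, mean degree of the rest).

    ppi: dict mapping each human protein to the set of its human PPI partners.
    hcv_targets: set of human proteins with at least one HCV interaction.
    """
    degree = {p: len(nbrs) for p, nbrs in ppi.items()}
    hit = [d for p, d in degree.items() if p in hcv_targets]
    rest = [d for p, d in degree.items() if p not in hcv_targets]
    return mean(hit), mean(rest)

# Hypothetical toy network in which the HCV targets P1 and P2 are hubs.
ppi = {
    "P1": {"P2", "P3", "P4", "P5"},
    "P2": {"P1", "P3", "P4"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2"},
    "P5": {"P1"},
}
print(degree_split(ppi, {"P1", "P2"}))
```

On real data, the two degree distributions would then be compared with a statistical test rather than by their means alone.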
27. Sen S, Dey A, Chowdhury S, Maulik U, Chattopadhyay K. Understanding the evolutionary trend of intrinsically structural disorders in cancer relevant proteins as probed by Shannon entropy scoring and structure network analysis. BMC Bioinformatics 2019; 19:549. [PMID: 30717651 PMCID: PMC7394331 DOI: 10.1186/s12859-018-2552-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/29/2018] [Accepted: 11/30/2018] [Indexed: 12/11/2022] [Open access]
Abstract
BACKGROUND Malignant diseases have become a threat to health care systems. A panoply of biological processes is involved in causing these diseases. To unveil the mechanistic details of these disease states, we analyzed protein families relevant to them. RESULTS Our study centers on four apparently unrelated cancer types: two commonly occurring ones, prostate cancer and breast cancer, and two less frequent ones, acute lymphoblastic leukemia and lymphoma. Eight protein families were found to have implications for these cancer types. Strikingly, some of the proteins implicated in cancerous cellular states showed a structural organization that departs from the signature of the family they belong to. The sequences were further mapped onto their respective structures and compared with the entropic profile. The structures show that entropic scores captured the inherent structural bias of these proteins with quantitative precision, which other analyses did not reveal. Subsequently, the betweenness centrality score of each residue in the structure network models was used to explore how residue dependencies change owing to structural disorder. CONCLUSION These observations help explain the mechanistic changes resulting from the structural orchestration of proteins. Finally, hydropathy indexes were obtained to validate the sequence-space observations made with Shannon entropy, in turn establishing their compatibility.
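Shannon entropy scoring of an alignment, as used above, gives each alignment column the score H = sum over residues a of p_a * log2(1 / p_a), where p_a is the frequency of residue a in that column. The sketch below is a minimal illustration that assumes a gap-free alignment of equal-length strings and log base 2; the authors' exact alphabet handling and normalization are not stated in the abstract, and the sequences are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column."""
    n = len(column)
    # Sum of p * log2(1/p); a fully conserved column gives exactly 0.0.
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())

def entropy_profile(alignment):
    """Per-position entropy for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]

# Hypothetical 4-sequence alignment: conserved first column, variable later ones.
aln = ["MKAL", "MKSV", "MRAI", "MKGL"]
print(entropy_profile(aln))
```

Low-entropy positions are conserved; high-entropy positions are the variable ones whose profile can then be compared against structure.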
28. Maulik U. Meet Our Editorial Board Member. Protein Pept Lett 2018. [DOI: 10.2174/092986652511181221115346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
29. Maulik U, Uversky VN, Sen S. A Statistical Approach to Detect Intrinsically Disordered Proteins Associated with Uterine Leiomyoma. Protein Pept Lett 2018; 25:483-491. [PMID: 29577850 DOI: 10.2174/0929866525666180326114325] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/09/2017] [Revised: 03/05/2018] [Accepted: 03/08/2018] [Indexed: 11/22/2022]
Abstract
BACKGROUND Uterine leiomyoma is a widespread non-malignant tumor. More than 80% of women have this tumor, yet only about 30% of cases are detected. Integrin-β1 is one of the biomarkers up-regulated during tumorigenesis, and it is also associated with structural disorder. Intrinsically disordered proteins (IDPs) lack a well-defined structure, especially in their tertiary structural orchestration. Around 30% of human proteins contain intrinsically disordered regions. Under the structure-function paradigm, IDPs would be expected to show significant changes in functional activity. IDPs are mostly associated with malignancies, neurodegenerative diseases and heart diseases. DNA methylation is a Post Transcriptional Modification (PTM) technique in which methyl groups are added to nucleotide bases. It is responsible for controlling the functionality of Transcription Factors (TFs), and structural orchestration is also affected by PTM. Very few disease-related studies focus on structural disorder together with methylation. OBJECTIVE In this article, our aim is to establish a relation between differential methylation rates in uterine leiomyoma and tissue-specific disordered proteins. METHOD We propose a framework to achieve this objective. We start with two data sets: the set of genes specifically related to uterine leiomyoma (GUL) and the set of tissue-specific proteins from UniProt (Puterine). Next, a two-sample t-test is applied to GUL to find the genes differentially methylated in uterine leiomyoma (DGUL). Comparing the gene transcripts of DGUL with Puterine, the common biomarkers are selected (DPuterine). The selected proteins are then analyzed with D2P2 to find the percentage disorder rate, the number of SCOP entries, the number of protein families and the PTM rate. Proteins with a structural disorder rate above 10% are considered structurally disordered (PUL disordered). Finally, to validate the listed proteins, we perform KEGG pathway and Gene Ontology analysis. RESULTS Following the proposed framework, we start with 2246 proteins from UniProt, which form Puterine. DGUL contains 6555 differentially methylated genes (p-value < 0.05). Only 434 proteins lie in the intersection of DGUL and Puterine. Of these, only 210 proteins fall into PUL disordered, with more than 10% structural disorder. The top ten proteins, with disorder rates ranging from 100% to 74.2%, are shown in the article. The KEGG pathway and Gene Ontology analyses show that Q969W3 has no connection with any KEGG or GO term. CONCLUSION Applying the framework yields verified groups of proteins at different stages of the proposed method. The group of 210 disordered proteins is verified by the KEGG and GO analysis. As the results are verified at a satisfactory level, the framework can be said to successfully analyze intrinsically disordered proteins connected with differential methylation levels in a specific disease.
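The selection steps in the METHOD section reduce to a significance filter, a set intersection and a disorder threshold. The sketch below assumes the two-sample t-test p-values and the D2P2 disorder percentages have already been computed (neither the test nor D2P2 is reimplemented), and it assumes, for illustration, that DGUL genes are already mapped to the same accessions as Puterine. The function name and all identifiers and numbers are hypothetical.

```python
def select_disordered(gene_pvalues, tissue_proteins, disorder_pct,
                      alpha=0.05, min_disorder=10.0):
    """Proteins that are differentially methylated, tissue specific and disordered.

    gene_pvalues: dict accession -> precomputed two-sample t-test p-value.
    tissue_proteins: set of tissue-specific accessions (the Puterine set).
    disorder_pct: dict accession -> percentage of disordered residues.
    """
    dgul = {acc for acc, p in gene_pvalues.items() if p < alpha}   # DGUL
    common = dgul & tissue_proteins                                # DPuterine
    # Keep proteins above the disorder threshold (PUL disordered).
    return sorted(acc for acc in common if disorder_pct.get(acc, 0.0) > min_disorder)

# Hypothetical inputs.
pvals = {"A1": 0.01, "A2": 0.20, "A3": 0.03, "A4": 0.001}
tissue = {"A1", "A2", "A3"}
disorder = {"A1": 35.0, "A2": 50.0, "A3": 8.0, "A4": 90.0}
print(select_disordered(pvals, tissue, disorder))
```

Here A2 fails the p-value filter, A4 is not tissue specific and A3 is below the 10% disorder threshold, so only A1 survives.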
30. Sen S, Maulik U. Recent advancement toward significant association between disordered transcripts and virus-infected diseases: a survey. Brief Funct Genomics 2018; 17:458-470. [DOI: 10.1093/bfgp/ely021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] [Open access]
31. Ray S, Maulik U. Discovering Perturbation of Modular Structure in HIV Progression by Integrating Multiple Data Sources Through Non-Negative Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:869-877. [PMID: 28029629 DOI: 10.1109/tcbb.2016.2642184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Detecting perturbation in modular structure during HIV-1 disease progression is an important step toward understanding the stage-specific infection pattern of HIV-1 in human cells. In this article, we propose a novel methodology that integrates multiple sources of biological information to identify such disruptions in human gene modules during different stages of HIV-1 infection. We integrate three types of biological information (gene expression, protein-protein interaction (PPI) and Gene Ontology) into single gene meta-modules through non-negative matrix factorization (NMF). Because the identified meta-modules inherit this information, detecting their perturbation reflects changes in expression pattern, in PPI structure and in the functional similarity of genes as the infection progresses. NMF-based clustering is used to integrate modules from the different data sources into strong meta-modules. Perturbation in meta-modular structure is identified by investigating topological and intramodular properties and ranking the meta-modules with a rank aggregation algorithm. We have also analyzed the preservation structure of significant GO terms in which the human proteins of the meta-modules participate. Moreover, we have analyzed how the coregulation pattern of identified transcription factors (TFs) changes over the HIV progression stages.
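NMF-based clustering, as mentioned above, factorizes a non-negative matrix V into non-negative factors W and H (V ≈ WH) and assigns each column to the factor with the largest coefficient. The following is a generic sketch using the standard Lee-Seung multiplicative updates on a single matrix. It does not reproduce the authors' integration of expression, PPI and GO modules into meta-modules; the toy matrix and function names are hypothetical.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2 for i in range(len(V)) for j in range(len(V[0])))

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimizing ||V - WH||^2."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)       # H <- H * (W^T V) / (W^T W H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))       # W <- W * (V H^T) / (W H H^T)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)] for i in range(n)]
    return W, H

def cluster_labels(H):
    """Assign each column (e.g. a gene) to the factor with the largest coefficient."""
    return [max(range(len(H)), key=lambda a: H[a][j]) for j in range(len(H[0]))]

# Hypothetical block-structured matrix with two obvious column groups.
V = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
W, H = nmf(V, rank=2)
print(cluster_labels(H))
```

Multiplicative updates keep every entry non-negative automatically, which is why they are a common default for small NMF sketches.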
32. Maulik U, Sen S, Mallik S, Bandyopadhyay S. Detecting TF-miRNA-gene network based modules for 5hmC and 5mC brain samples: a intra- and inter-species case-study between human and rhesus. BMC Genet 2018; 19:9. [PMID: 29357837 PMCID: PMC5776763 DOI: 10.1186/s12863-017-0574-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/23/2017] [Accepted: 11/29/2017] [Indexed: 01/09/2023] [Open access]
Abstract
Background The study of epigenetics is currently a high-impact research topic, and multi-stage methylation is a promising area for high-dimensional analysis. In this article, we present a new intra- and inter-species study of human and rhesus brain tissue, based on data profiles of two methylated cytosine variants (5-hydroxymethylcytosine (5hmC) and 5-methylcytosine (5mC)), through TF-miRNA-gene network based module detection. Results First, we determine differentially 5hmC-methylated genes in human and in rhesus for the intra-species analysis, and differentially multi-stage methylated genes for the inter-species analysis. We then apply the weighted topological overlap matrix (TOM) measure followed by average linkage clustering to these gene sets. We identify co-methylated and multi-stage co-methylated gene modules using dynamic tree cut, for the intra- and inter-species cases respectively. Each module is represented by an individual color in the dendrogram. Gene Ontology and KEGG pathway analyses are then performed to identify the biological functions of the identified modules. Finally, the top ten regulator TFs and targeter miRNAs associated with the maximum number of gene modules are determined for both the intra- and inter-species analyses. Conclusions The novel TFs and miRNAs obtained from the analysis are: MYST3 and ZNF771 as TFs (human intra-species analysis); BAZ2B, RCOR3 and ATF1 as TFs (rhesus intra-species analysis); mml-miR-768-3p and mml-miR-561 as miRs (rhesus intra-species analysis); and MYST3 and ZNF771 as miRs (inter-species study). Furthermore, the genes, TFs and miRNAs already known to be implicated in several serious brain-related diseases as well as rare neglected diseases (e.g., Wolf-Hirschhorn syndrome, Joubert syndrome, Huntington's disease, Simian Immunodeficiency Virus (SIV) mediated encephalitis, Parkinson's disease, bipolar disorder and schizophrenia) are reported. Electronic supplementary material The online version of this article (doi:10.1186/s12863-017-0574-7) contains supplementary material, which is available to authorized users.
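The weighted topological overlap measure mentioned above is commonly defined (as in WGCNA) as TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = sum over u of a_iu * a_uj counts shared neighbours and k_i is the connectivity of node i. The sketch below computes it from an adjacency matrix that is assumed to be symmetric, with weights in [0, 1] and a zero diagonal; how the authors built their adjacency, and the average linkage and dynamic tree cut steps, are not shown.

```python
def topological_overlap(adj):
    """Topological overlap matrix (TOM) for a symmetric weighted adjacency matrix.

    Assumes weights in [0, 1] and a zero diagonal; the TOM diagonal is set to 1.
    """
    n = len(adj)
    k = [sum(row) for row in adj]                       # node connectivity
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))   # l_ij
            # Denominator is at least 1 because min(k_i, k_j) >= a_ij.
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

# Path graph 0 - 1 - 2: nodes 0 and 2 are not linked but share neighbour 1.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
tom = topological_overlap(path)
dissimilarity = [[1 - t for t in row] for row in tom]   # input for hierarchical clustering
print(tom)
```

The dissimilarity 1 - TOM is what is typically handed to average linkage clustering before cutting the dendrogram into modules.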
33. Ray S, Maulik U, Mukhopadhyay A. A review of computational approaches for analysis of hepatitis C virus-mediated liver diseases. Brief Funct Genomics 2017; 17:428-440. [DOI: 10.1093/bfgp/elx040] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022] [Open access]
34. Mitra R, Chen X, Greenawalt EJ, Maulik U, Jiang W, Zhao Z, Eischen CM. Decoding critical long non-coding RNA in ovarian cancer epithelial-to-mesenchymal transition. Nat Commun 2017; 8:1604. [PMID: 29150601 PMCID: PMC5693921 DOI: 10.1038/s41467-017-01781-0] [Citation(s) in RCA: 134] [Impact Index Per Article: 19.1] [Received: 04/26/2017] [Accepted: 10/16/2017] [Indexed: 12/17/2022] [Open access]
Abstract
Long non-coding RNA (lncRNA) are emerging as contributors to malignancies. Little is understood about the contribution of lncRNA to epithelial-to-mesenchymal transition (EMT), which correlates with metastasis. Ovarian cancer is usually diagnosed after metastasis. Here we report an integrated analysis of >700 ovarian cancer molecular profiles, including genomic data sets, from four patient cohorts, identifying overexpression of the lncRNA DNM3OS, MEG3 and MIAT and their reproducible gene regulation in ovarian cancer EMT. Genome-wide mapping shows that 73% of MEG3-regulated EMT-linked pathway genes contain MEG3 binding sites. DNM3OS overexpression, but not MEG3 or MIAT overexpression, significantly correlates with worse overall patient survival. DNM3OS knockdown results in altered EMT-linked genes/pathways, mesenchymal-to-epithelial transition, and reduced cell migration and invasion. Proteotranscriptomic characterization further supports the connection between DNM3OS and ovarian cancer EMT. TWIST1 overexpression and DNM3OS amplification provide an explanation for increased DNM3OS levels. Therefore, our results elucidate lncRNA that regulate EMT and demonstrate that DNM3OS specifically contributes to EMT in ovarian cancer.
35. Dey S, Bhattacharyya S, Maulik U. Efficient quantum inspired meta-heuristics for multi-level true colour image thresholding. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2016.04.024] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 10/21/2022]
36. Ray S, Maulik U. Identifying differentially coexpressed module during HIV disease progression: A multiobjective approach. Sci Rep 2017; 7:86. [PMID: 28273892 PMCID: PMC5428367 DOI: 10.1038/s41598-017-00090-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/20/2016] [Accepted: 01/31/2017] [Indexed: 11/13/2022] [Open access]
Abstract
Microarray analysis based on gene coexpression is widely used to investigate the coregulation pattern of a group (or cluster) of genes in a specific phenotype condition. Recent approaches go one step further and look for differential coexpression patterns, where the coexpression pattern differs significantly between two phenotype conditions. These changes in coexpression patterns generally arise from significant changes in regulatory mechanisms across conditions, governed by the natural progression of diseases. Here we develop a novel multiobjective framework, DiffCoMO, to identify differentially coexpressed modules that capture altered coexpression in gene modules across different stages of HIV-1 progression. The objectives are built to emphasize the distance between the coexpression patterns of two phenotype stages. The proposed method is assessed by comparison with several state-of-the-art techniques, and we show that DiffCoMO outperforms them in detecting differentially coexpressed modules. Moreover, we have compared the performance of all the methods on simulated data. The biological significance of the discovered modules is also investigated using GO and pathway enrichment analysis. Additionally, miRNA enrichment analysis is carried out to identify TF-to-miRNA and miRNA-to-TF connections. The gene modules discovered by DiffCoMO show regulation by the miRNA-28, miRNA-29 and miRNA-125 families.
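One simple way to quantify the "distance between coexpression patterns of two phenotype stages" for a single module is to compare its mean absolute pairwise Pearson correlation in each stage. The sketch below shows that score only; it is an assumed stand-in for illustration, not the actual DiffCoMO objectives, which the abstract does not spell out, and the gene names and values are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def mean_abs_coexpression(module, expr):
    """Average |correlation| over all gene pairs of a module in one stage."""
    pairs = list(combinations(module, 2))
    return sum(abs(pearson(expr[g], expr[h])) for g, h in pairs) / len(pairs)

def differential_coexpression(module, expr_a, expr_b):
    """Absolute change in module coexpression between two stages (hypothetical score)."""
    return abs(mean_abs_coexpression(module, expr_a) - mean_abs_coexpression(module, expr_b))

# Hypothetical expression profiles: tightly coexpressed in stage A, loosely in stage B.
stage_a = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8]}
stage_b = {"g1": [1, 2, 3, 4], "g2": [1, -1, 1, -1]}
print(differential_coexpression(["g1", "g2"], stage_a, stage_b))
```

A multiobjective search would combine a score like this with other criteria (for example module size or within-stage coherence) rather than optimizing it alone.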
37. Mallik S, Bhadra T, Maulik U. Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data. IEEE Trans Nanobioscience 2017; 16:3-10. [PMID: 28092570 DOI: 10.1109/tnb.2017.2650217] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 11/08/2022]
Abstract
Epigenetic biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework for identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics datasets. First, we determine the genes that have both expression and methylation values and follow a normal distribution. Similarly, we identify the genes that have both expression and methylation values but do not follow a normal distribution. For each case, we use a gene-selection method that returns maximally relevant but variable-weighted minimally redundant genes as the top-ranked genes. For statistical validation, we apply a t-test to both the expression and methylation data of the normally distributed top-ranked genes to determine how many of them are both differentially expressed and methylated. Similarly, we use the Limma package to perform a non-parametric Empirical Bayes test on both the expression and methylation data of the non-normally distributed top-ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene markers with biological validation. Moreover, our framework improves the positive predictive rate and reduces the false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method and other methods based on classification performance obtained with several well-known classifiers.
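The maximal-relevance, minimal-redundancy idea can be illustrated with the classic greedy mRMR in its difference form: at each step, pick the feature whose relevance to the target minus its mean redundancy with the already selected features is largest. This is a minimal sketch with assumptions stated plainly: it uses absolute Pearson correlation as a swapped-in relevance and redundancy measure, and it does not implement the paper's variable-weighted redundancy term, whose weights are not given in the abstract. The data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def mrmr_select(features, target, k):
    """Greedy mRMR (difference form) with |Pearson| as relevance and redundancy."""
    relevance = [abs(pearson(f, target)) for f in features]
    selected = []
    while len(selected) < min(k, len(features)):
        candidates = [i for i in range(len(features)) if i not in selected]

        def score(i):
            if not selected:
                return relevance[i]
            redundancy = sum(abs(pearson(features[i], features[j])) for j in selected)
            return relevance[i] - redundancy / len(selected)

        selected.append(max(candidates, key=score))   # ties go to the lower index
    return selected

# Hypothetical data: feature 1 duplicates feature 0, feature 2 carries different signal.
y = [1, 2, 3, 4, 5, 6]
f0 = [1, 3, 2, 4, 6, 5]
features = [f0, list(f0), [2, 1, 2, 1, 2, 1]]
print(mrmr_select(features, y, 2))
```

The duplicated feature is as relevant as the first pick but fully redundant with it, so the less relevant yet non-redundant feature is chosen second.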
38. Bhattacharyya S, Dutta P, Maulik U. Preface. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2016.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
39. Sarkar JP, Saha I, Maulik U. Rough Possibilistic Type-2 Fuzzy C-Means clustering for MR brain image segmentation. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2016.01.040] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 10/22/2022]
40. Dey S, Bhattacharyya S, Maulik U. New quantum inspired meta-heuristic techniques for multi-level colour image thresholding. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2015.09.042] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 10/22/2022]
41. Mallik S, Sen S, Maulik U. IDPT: Insights into potential intrinsically disordered proteins through transcriptomic analysis of genes for prostate carcinoma epigenetic data. Gene 2016; 586:87-96. [DOI: 10.1016/j.gene.2016.03.056] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 12/02/2015] [Revised: 02/22/2016] [Accepted: 03/30/2016] [Indexed: 12/13/2022]
42. Sriwastava BK, Basu S, Maulik U. Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:1394-1404. [PMID: 26684462 DOI: 10.1109/tcbb.2015.2401018] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
Predicting the residues that participate in protein-protein interactions (PPI) helps identify which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can be further improved with a custom-designed fuzzy membership function for the partner-specific PPI interface prediction problem. We evaluated the performance of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three model proteomes (Homo sapiens, Escherichia coli and Saccharomyces cerevisiae) and calculated the statistical significance of the improvement of F-SVM over the classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, the local composition of amino acids together with their physico-chemical characteristics is used, and the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under the ROC curve) on the test samples in a 10-fold cross-validation experiment is 77.07, 78.39 and 74.91 percent for the three organisms, respectively. Performance on independent test sets is 72.09, 73.24 and 82.74 percent, respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.
43. Basu S, Maulik U. Community detection based on strong Nash stable graph partition. Soc Netw Anal Min 2015. [DOI: 10.1007/s13278-015-0299-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 10/23/2022]
44. Sarkar A, Maulik U. Gene microarray data analysis using parallel point-symmetry-based clustering. Int J Data Min Bioinform 2015; 11:277-300. [PMID: 26333263 DOI: 10.1504/ijdmb.2015.067320] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
Abstract
Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed time-efficient scalable approach for point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data using symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. This new parallel implementation also satisfies linear speedup in timing without sacrificing the quality of clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and existing parallel K-Means over eight artificial and benchmark microarray data sets, to demonstrate its superiority, in both timing and validity. The statistical analysis is also performed to establish the significance of this message-passing-interface based point-symmetry K-Means implementation. We also analysed the biological relevance of clustering solutions.
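The core of point-symmetry-based K-Means is a distance that is small when the mirror image of a point about a cluster centre is present in the data. The sketch below uses the Su-Chou form, d_s(x, c) = min over other points p of ||(x - c) + (p - c)|| / (||x - c|| + ||p - c||), and assigns each point to the centre with the smallest such distance. It is a serial illustration only: the paper's exact distance variant and its MPI-based parallelization are not reproduced, and the points are hypothetical.

```python
import math

def point_symmetry_distance(x, center, points):
    """Su-Chou point-symmetry distance of x with respect to center.

    0.0 means the mirror image of x about center exists in points.
    The point x itself (and any exact duplicate of it) is skipped.
    """
    dx = [a - c for a, c in zip(x, center)]
    best = math.inf
    for p in points:
        if p == x:
            continue
        dp = [a - c for a, c in zip(p, center)]
        denom = math.hypot(*dx) + math.hypot(*dp)
        if denom == 0:
            continue
        best = min(best, math.hypot(*[a + b for a, b in zip(dx, dp)]) / denom)
    return best

def assign(points, centers):
    """Assign each point to the centre with the smallest point-symmetry distance."""
    return [min(range(len(centers)),
                key=lambda k: point_symmetry_distance(x, centers[k], points))
            for x in points]

# Hypothetical 2-D points: (1, 0) and (-1, 0) are mirror images about the origin.
pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0)]
print(point_symmetry_distance((1.0, 0.0), (0.0, 0.0), pts))
```

A full point-symmetry K-Means would alternate this assignment step with recomputing each centre as the mean of its assigned points.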
45. Maulik U. Meet Our Editorial Board Member. Protein Pept Lett 2015. [DOI: 10.2174/092986652210150821170543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
46. Mallik S, Maulik U. MiRNA-TF-gene network analysis through ranking of biomolecules for multi-informative uterine leiomyoma dataset. J Biomed Inform 2015; 57:308-19. [PMID: 26297985 DOI: 10.1016/j.jbi.2015.08.014] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 12/11/2014] [Revised: 06/26/2015] [Accepted: 08/11/2015] [Indexed: 12/12/2022]
Abstract
Gene ranking is an important problem in bioinformatics. Here, we propose a new framework for ranking biomolecules (viz., miRNAs, transcription factors (TFs) and genes) in a multi-informative uterine leiomyoma dataset with both gene expression and methylation data, using a (statistical) eigenvector centrality based approach. First, genes that are both differentially expressed and differentially methylated are identified using the Limma statistical test. A network comprising these genes, the corresponding TFs from the TRANSFAC and ITFP databases, and targeter miRNAs from the miRWalk database is then built. The biomolecules are then ranked by eigenvector centrality. Our proposed method achieves better average accuracy in hub-gene and non-hub-gene classification than other methods. Furthermore, pre-ranked gene set enrichment analysis is applied to the pathway and GO-term databases of the Molecular Signatures Database, using pre-ranked gene lists based on different centrality values, to compare the ranking methods. Finally, top novel potential gene markers for uterine leiomyoma are provided.
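Ranking nodes by eigenvector centrality, as described above, can be sketched with power iteration. This minimal version works on an undirected, unweighted adjacency dict and iterates with (A + I) instead of A, so it also converges on bipartite graphs such as a star, where plain power iteration oscillates; the shift does not change the leading eigenvector of a connected graph. Building the real miRNA-TF-gene network from TRANSFAC, ITFP and miRWalk is not shown, and the toy network is hypothetical.

```python
import math

def eigenvector_centrality(adj, iters=1000, tol=1e-12):
    """Eigenvector centrality by power iteration on (A + I), L2-normalized.

    adj: dict node -> iterable of neighbours (undirected, unweighted, connected).
    """
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}   # (A + I) x
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    return x

def rank_nodes(adj):
    """Nodes sorted from most to least central."""
    scores = eigenvector_centrality(adj)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical star-shaped network: one regulator linked to three genes.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(rank_nodes(star))
```

For a star with three leaves, the leading eigenvector gives the hub a score sqrt(3) times that of each leaf, so the hub is ranked first.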
47. Bandyopadhyay S, Ray S, Mukhopadhyay A, Maulik U. A multiobjective approach for identifying protein complexes and studying their association in multiple disorders. Algorithms Mol Biol 2015; 10:24. [PMID: 26257820 PMCID: PMC4529733 DOI: 10.1186/s13015-015-0056-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/03/2014] [Accepted: 07/28/2015] [Indexed: 11/17/2022] [Open access]
Abstract
Background Detecting protein complexes within protein–protein interaction (PPI) networks is a major step toward the analysis of biological processes and pathways. Identifying and characterizing protein complexes in PPI networks is an ongoing challenge. Several high-throughput experimental techniques provide a substantial number of PPIs, which are widely used to compile the PPI network of a species. Results Here we focus on detecting human protein complexes by developing a multiobjective framework. To this end, the large human PPI network is partitioned into modules that serve as protein complexes. To build the objective functions, we use topological properties of the PPI network and biological properties based on Gene Ontology semantic similarity. The proposed method is compared with several state-of-the-art algorithms using different performance metrics. For biological validation of the predicted complexes, we also employ a Gene Ontology and pathway based analysis. Additionally, we perform an analysis to associate the resulting protein complexes with 22 key disease classes. Two bipartite networks are created to visualize the association of the identified protein complexes with the disorder classes. Conclusions Here, we present the task of identifying protein complexes as a multiobjective optimization problem. The identified protein complexes are found to be associated with several disorder classes, such as 'Cancer', 'Endocrine' and 'Multiple'. This analysis uncovers some new relationships between disorders and predicted complexes that may play a role in the prediction of multi-target drugs. Electronic supplementary material The online version of this article (doi:10.1186/s13015-015-0056-2) contains supplementary material, which is available to authorized users.
48. Saha I, Rak B, Bhowmick SS, Maulik U, Bhattacharjee D, Koch U, Lazniewski M, Plewczynski D. Binding Activity Prediction of Cyclin-Dependent Inhibitors. J Chem Inf Model 2015; 55:1469-82. [PMID: 26079845 DOI: 10.1021/ci500633c] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022]
Abstract
The Cyclin-Dependent Kinases (CDKs) are the core components coordinating the eukaryotic cell division cycle. The crystal structures of CDKs generally provide information on possible molecular mechanisms of ligand binding. However, reliable and robust estimation of ligand binding activity has been a challenging task in drug design. In this regard, various machine learning techniques, such as Support Vector Machine, Naive Bayesian classifier, Decision Tree and K-Nearest Neighbor classifier, have been used. The performance of these heterogeneous classification techniques depends on proper selection of features from the data set. This motivated us to propose an integrated classification technique using a Genetic Algorithm (GA), a Rotational Feature Selection (RFS) scheme and an Ensemble of Machine Learning methods, named the Genetic Algorithm integrated Rotational Ensemble based classification technique, for predicting the ligand binding activity of CDKs. This technique can automatically find the important features and the ensemble size. For this purpose, the GA encodes the features and the ensemble size in a chromosome as a binary string. The encoded features are then used to create diverse sets of training points using RFS in order to train the machine learning method multiple times. The RFS scheme uses Principal Component Analysis (PCA) to preserve the variability information of rotational non-overlapping subsets of the original data. Thereafter, the testing points are fed to the different instances of the trained machine learning method to produce the ensemble result. Accuracy is computed as the final result after 10-fold cross-validation, and it is also used as the objective function that the GA maximizes. The effectiveness of the proposed classification technique has been demonstrated quantitatively and visually in comparison with different machine learning methods on 16 ligand binding CDK docking and rescoring data sets. In addition, the best possible features have been reported for the CDK docking and rescoring data sets separately. Finally, the Friedman test has been conducted to judge the statistical significance of the results produced by the proposed technique. The results indicate that the integrated classification technique is highly relevant for predicting protein-ligand binding activity.
49. Bandyopadhyay S, Chakraborty R, Maulik U. Priority based ε dominance: A new measure in multiobjective optimization. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2015.01.018] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
50. Sarkar A, Maulik U. Rough Based Symmetrical Clustering for Gene Expression Profile Analysis. IEEE Trans Nanobioscience 2015; 14:360-367. [DOI: 10.1109/tnb.2015.2421323] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]