51. Bandyopadhyay S, Chakraborty R, Maulik U. Priority based ε dominance: A new measure in multiobjective optimization. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2015.01.018] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
52. Sarkar A, Maulik U. Rough Based Symmetrical Clustering for Gene Expression Profile Analysis. IEEE Trans Nanobioscience 2015; 14:360-367. [DOI: 10.1109/tnb.2015.2421323] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
53. Maulik U, Mallik S, Mukhopadhyay A, Bandyopadhyay S. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining. PLoS One 2015; 10:e0119448. [PMID: 25830807 PMCID: PMC4382191 DOI: 10.1371/journal.pone.0119448] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 05/02/2014] [Accepted: 01/22/2015] [Indexed: 11/18/2022]
Abstract
Microarray and BeadChip technologies are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (statistical biclustering-based rule mining), to identify a special type of rules and potential biomarkers from biological datasets using an integrated approach of statistical and binary inclusion-maximal biclustering techniques. First, a novel statistical strategy is used to eliminate insignificant, low-significance, or redundant genes in such a way that the significance level satisfies the data distribution property (i.e., normal or non-normal distribution). The data are then discretized and post-discretized, in that order. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets, from which the corresponding special type of rules is extracted. Our proposed rule mining method performs better than other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets; it therefore saves elapsed time and can work on large datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. We also classify the data to measure how accurately the evolved rules describe the remaining test (unknown) data, and compare the average classification accuracy and other related factors with those of other rule-based classifiers. Statistical significance tests are performed to verify the statistical relevance of the comparative results. Each of the other rule mining methods and rule-based classifiers also starts from the same post-discretized data matrix. Finally, we include an integrated analysis of gene expression and methylation to determine the epigenetic effect (i.e., the effect of methylation) on gene expression level.
54. Mandal M, Mukhopadhyay A, Maulik U. Prediction of protein subcellular localization by incorporating multiobjective PSO-based feature subset selection into the general form of Chou’s PseAAC. Med Biol Eng Comput 2015; 53:331-44. [DOI: 10.1007/s11517-014-1238-7] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 03/03/2014] [Accepted: 12/22/2014] [Indexed: 01/01/2023]
55. Mallik S, Mukhopadhyay A, Maulik U. RANWAR: Rank-Based Weighted Association Rule Mining From Gene Expression and Methylation Data. IEEE Trans Nanobioscience 2015; 14:59-66. [DOI: 10.1109/tnb.2014.2359494] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 11/09/2022]
56. Bandyopadhyay S, Ray S, Mukhopadhyay A, Maulik U. A review of in silico approaches for analysis and prediction of HIV-1-human protein-protein interactions. Brief Bioinform 2014; 16:830-51. [PMID: 25479794 DOI: 10.1093/bib/bbu041] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 06/22/2014] [Indexed: 12/19/2022]
Abstract
The computational or in silico approaches for analysing the HIV-1-human protein-protein interaction (PPI) network, predicting different host cellular factors and PPIs and discovering several pathways are gaining popularity in the field of HIV research. Although there exist quite a few studies in this regard, no previous effort has been made to review these works in a comprehensive manner. Here we review the computational approaches that are devoted to the analysis and prediction of HIV-1-human PPIs. We have broadly categorized these studies into two fields: computational analysis of HIV-1-human PPI network and prediction of novel PPIs. We have also presented a comparative assessment of these studies and proposed some methodologies for discussing the implication of their results. We have also reviewed different computational techniques for predicting HIV-1-human PPIs and provided a comparative study of their applicability. We believe that our effort will provide helpful insights to the HIV research community.
57. Chakraborty D, Maulik U. Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning. IEEE J Transl Eng Health Med 2014; 2:4300211. [PMID: 27170887 PMCID: PMC4848046 DOI: 10.1109/jtehm.2014.2375820] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 06/08/2014] [Revised: 09/20/2014] [Accepted: 11/22/2014] [Indexed: 11/07/2022]
Abstract
Microarrays have now gone from obscurity to being almost ubiquitous in biological research. At the same time, the statistical methodology for microarray analysis has progressed from simple visual assessments of results to novel algorithms for analyzing changes in expression profiles. In a microRNA (miRNA) or gene-expression profiling experiment, the expression levels of thousands of genes/miRNAs are simultaneously monitored to study the effects of certain treatments, diseases, and developmental stages on their expression. Microarray-based gene expression profiling can be used to identify genes whose expression changes in response to pathogens or other organisms, by comparing gene expression in infected and uninfected cells or tissues. Recent studies have revealed that patterns of altered microarray expression profiles in cancer can serve as molecular biomarkers for tumor diagnosis, prognosis of disease-specific outcomes, and prediction of therapeutic responses. Microarray data sets containing expression profiles of a number of miRNAs or genes are used to identify biomarkers that are dysregulated between normal and malignant tissues. However, small sample size remains a bottleneck in designing successful classification methods. On the other hand, an adequate amount of microarray data lacking clinical information can be employed as an additional source of information. In this paper, a combination of kernelized fuzzy rough set (KFRS) and semisupervised support vector machine (S3VM) is proposed for predicting cancer biomarkers from one miRNA and three gene expression data sets. Biomarkers are discovered employing three feature selection methods, including KFRS. The effectiveness of the proposed KFRS and S3VM combination on the microarray data sets is demonstrated, and the cancer biomarkers identified from the miRNA data are reported. Furthermore, biological significance tests are conducted for the miRNA cancer biomarkers.
58. Dey S, Saha I, Bhattacharyya S, Maulik U. Multi-level thresholding using quantum inspired meta-heuristics. Knowl Based Syst 2014. [DOI: 10.1016/j.knosys.2014.04.006] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Indexed: 11/16/2022]
59. Mukhopadhyay A, Maulik U. Network-based study reveals potential infection pathways of hepatitis-C leading to various diseases. PLoS One 2014; 9:e94029. [PMID: 24743187 PMCID: PMC3990553 DOI: 10.1371/journal.pone.0094029] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 11/04/2013] [Accepted: 03/11/2014] [Indexed: 12/17/2022]
Abstract
Protein-protein interaction network-based study of viral pathogenesis has been gaining popularity among computational biologists in recent years. In the present study we investigate the possible pathways of hepatitis-C virus (HCV) infection by integrating the HCV-human interaction network, the human protein interactome, and the human genetic disease association network. We propose quasi-biclique and quasi-clique mining algorithms to integrate these three networks in order to identify infection gateway host proteins and possible pathways of HCV pathogenesis leading to various diseases. The integrated study of the three networks, namely the HCV-human interaction network, the human protein interaction network, and the human protein-disease association network, reveals potential pathways by which HCV infection leads to various diseases, including cancers. The gateway proteins have been found to be biologically coherent and to have high degrees in the human interactome compared with other virus-targeted proteins. The analyses in this study provide possible targets for more effective anti-hepatitis-C therapeutic intervention.
60. Aqil M, Naqvi AR, Mallik S, Bandyopadhyay S, Maulik U, Jameel S. The HIV Nef protein modulates cellular and exosomal miRNA profiles in human monocytic cells. J Extracell Vesicles 2014; 3:23129. [PMID: 24678387 PMCID: PMC3967016 DOI: 10.3402/jev.v3.23129] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Received: 10/21/2013] [Revised: 01/28/2014] [Accepted: 02/15/2014] [Indexed: 12/15/2022]
Abstract
Introduction The HIV Nef protein is a multifunctional virulence factor that perturbs intracellular membranes and signalling and is secreted into exosomes. While Nef-containing exosomes have a distinct proteomic profile, no comprehensive analysis of their miRNA cargo has been carried out. Since Nef functions as a viral suppressor of RNA interference and disturbs the distribution of RNA-induced silencing complex proteins between cells and exosomes, we hypothesized that it might also affect the export of miRNAs into exosomes. Method Exosomes were purified from human monocytic U937 cells that stably expressed HIV-1 Nef. The RNA from cells and exosomes was profiled for 667 miRNAs using a Taqman Low Density Array. Selected miRNAs and their mRNA targets were validated by quantitative RT-PCR. Bioinformatics analyses were used to identify targets and predict pathways. Results Nef expression affected a significant fraction of miRNAs in U937 cells. Our analysis showed 47 miRNAs to be selectively secreted into Nef exosomes and 2 miRNAs to be selectively retained in Nef-expressing cells. The exosomal miRNAs were predicted to target several cellular genes in inflammatory cytokine and other pathways important for HIV pathogenesis, and an overwhelming majority had targets within the HIV genome. Conclusions This is the first study to report a miRnome analysis of HIV Nef-expressing monocytes and exosomes. Our results demonstrate that Nef causes large-scale dysregulation of cellular miRNAs, including their secretion through exosomes. We suggest this to be a novel viral strategy to affect pathogenesis and to limit the effects of RNA interference on viral replication and persistence.
61. Saha I, Zubek J, Klingström T, Forsberg S, Wikander J, Kierczak M, Maulik U, Plewczynski D. Ensemble learning prediction of protein-protein interactions using proteins functional annotations. Mol Biosyst 2014; 10:820-30. [PMID: 24469380 DOI: 10.1039/c3mb70486f] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 01/17/2023]
Abstract
Protein-protein interactions are important for the majority of biological processes. A significant number of computational methods have been developed to predict protein-protein interactions using protein sequence, structural and genomic data. A vast amount of experimental data is publicly available on the Internet, but it is scattered across numerous databases. This fact motivated us to create and evaluate new high-throughput datasets of interacting proteins. We extracted interaction data from the DIP, MINT, BioGRID and IntAct databases. We then constructed descriptive features for machine learning based on data from Gene Ontology and DOMINE. Thereafter, four well-established machine learning methods (Support Vector Machine, Random Forest, Decision Tree and Naïve Bayes) were applied to these datasets to build an Ensemble Learning method based on majority voting. In a cross-validation experiment, sensitivity exceeded 80% and classification/prediction accuracy reached 90% for the Ensemble Learning method. We extended the experiment to a bigger and more realistic dataset, maintaining sensitivity above 70%. These results confirmed that our datasets are suitable for PPI prediction and that the Ensemble Learning method is well suited to this task. Both the processed PPI datasets and the software are available at .
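The ensemble step above combines four classifiers by majority voting. A minimal Python sketch of that voting rule follows, assuming binary labels (1 = interacting pair, 0 = non-interacting) and a tie-breaking convention chosen here only for illustration (a 2-2 split among four voters goes to 0); the abstract does not say how ties are resolved, and the function name `majority_vote` is hypothetical.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier 0/1 label vectors by majority voting.

    predictions[c][i] is classifier c's label for protein pair i.
    A pair is called interacting only if strictly more than half of the
    classifiers say so; ties fall to 0 (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_pairs = len(predictions[0])
    if any(len(p) != n_pairs for p in predictions):
        raise ValueError("all classifiers must label the same pairs")
    combined = []
    for i in range(n_pairs):
        positives = sum(p[i] for p in predictions)
        combined.append(1 if 2 * positives > len(predictions) else 0)
    return combined


# Hypothetical outputs of SVM, Random Forest, Decision Tree and Naive Bayes.
svm, rf, dt, nb = [1, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0]
print(majority_vote([svm, rf, dt, nb]))  # [1, 0, 0, 1]
```

With four voters, a positive call therefore needs at least three votes; a different tie rule would change only the comparison inside the loop.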
62. Mukhopadhyay A, Ray S, Maulik U. Incorporating the type and direction information in predicting novel regulatory interactions between HIV-1 and human proteins using a biclustering approach. BMC Bioinformatics 2014; 15:26. [PMID: 24460683 PMCID: PMC3922888 DOI: 10.1186/1471-2105-15-26] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 08/13/2013] [Accepted: 01/08/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Discovering novel interactions between HIV-1 and human proteins would greatly contribute to different areas of HIV research. Identification of such interactions leads to greater insight into drug target prediction. Some recent studies have addressed the computational prediction of new interactions based on the experimentally validated information stored in an HIV-1-human protein-protein interaction database. However, these techniques do not predict any regulatory mechanism between HIV-1 and human proteins, because they do not consider interaction types or the direction of regulation of interactions. RESULTS Here we present an association rule mining technique based on biclustering for discovering a set of rules among human and HIV-1 proteins using the publicly available HIV-1-human PPI database. These rules are subsequently used to predict novel interactions between HIV-1 and human proteins. For prediction, both the interaction types and the direction of regulation of interactions (i.e., virus-to-host or host-to-virus) are considered, providing important additional information about the regulation pattern of interactions. We have also studied the biclusters and analyzed the significant GO terms and KEGG pathways in which the human proteins of the biclusters participate. Moreover, the predicted rules have been analyzed to discover regulatory relationships among some human proteins in the course of HIV-1 infection. Experimental evidence for some of our predicted interactions has been found by searching recent literature in PubMed. We have also highlighted some human proteins that are likely to act against HIV-1 attack. CONCLUSIONS We pose the problem of identifying new regulatory interactions between HIV-1 and human proteins, based on the existing PPI database, as an association rule mining problem based on a biclustering algorithm. We discover some novel regulatory interactions between HIV-1 and human proteins, a significant number of which have been found to be supported by recent literature.
63. Sriwastava BK, Basu S, Maulik U, Plewczynski D. PPIcons: identification of protein-protein interaction sites in selected organisms. J Mol Model 2013; 19:4059-70. [PMID: 23729008 PMCID: PMC3744667 DOI: 10.1007/s00894-013-1886-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/28/2013] [Accepted: 05/06/2013] [Indexed: 01/08/2023]
Abstract
The physico-chemical properties of interaction interfaces play a crucial role in characterizing protein-protein interactions (PPI). In silico prediction of participating amino acids helps to identify interface residues for further experimental verification using mutational analysis, or inhibition studies by screening a library of ligands against a given protein. Given the unbound structure of a protein and the fact that it forms a complex with another known protein, the objective of this work is to identify the residues that are involved in the interaction. We attempt to predict interaction sites in protein complexes using the local composition of amino acids together with their physico-chemical characteristics. The local sequence segments (LSS) are dissected from the protein sequences using a sliding window of 21 amino acids. The list of LSSs is passed to the support vector machine (SVM) predictor, which identifies interacting residue pairs considering their inter-atom distances. We have analyzed three model organisms, Escherichia coli, Saccharomyces cerevisiae and Homo sapiens, with 40, 123 and 33 hetero-complexes considered, respectively. In addition, a unified multi-organism PPI meta-predictor is developed in the current work by combining the training databases of the above organisms. The PPIcons interface residue prediction method achieves an area under the ROC curve (AUC) of 0.82, 0.75 and 0.72 for the three organisms, respectively, and 0.76 for the meta-predictor.
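The abstract dissects local sequence segments (LSS) with a sliding window of 21 residues. The Python sketch below shows one way to do this, assuming the window is centred on each residue and that positions past either end of the sequence are padded with a placeholder character `X`; the centring, the padding character, and the function name `local_segments` are assumptions for illustration, not details given in the abstract.

```python
from typing import List


def local_segments(sequence: str, window: int = 21, pad: str = "X") -> List[str]:
    """Return one `window`-length segment centred on every residue.

    Residues near either terminus are padded with `pad` (a hypothetical
    convention) so that every segment has exactly `window` characters.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    padded = pad * half + sequence + pad * half
    # Segment i is padded[i : i + window]; its middle character
    # padded[i + half] is residue i of the original sequence.
    return [padded[i:i + window] for i in range(len(sequence))]


print(local_segments("ACD", window=3))  # ['XAC', 'ACD', 'CDX']
```

Each segment would then be encoded (for example by amino-acid composition and physico-chemical descriptors) before being passed to the SVM; that encoding step is not shown.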
64. Bandyopadhyay S, Sengupta D, Maulik U. GRF: A Greedy Rank Fusion Algorithm for Combining MicroRNA Target Orderings. Mol Inform 2013; 32:685-91. [PMID: 27480061 DOI: 10.1002/minf.201200165] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/18/2012] [Accepted: 06/14/2013] [Indexed: 11/09/2022]
65. Chakraborty C, Bandyopadhyay S, Maulik U, Agoramoorthy G. Topology Mapping of Insulin-Regulated Glucose Transporter GLUT4 Using Computational Biology. Cell Biochem Biophys 2013; 67:1261-74. [DOI: 10.1007/s12013-013-9644-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/30/2022]
66. Sengupta D, Pyne A, Maulik U, Bandyopadhyay S. Reformulated Kemeny optimal aggregation with application in consensus ranking of microRNA targets. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:742-751. [PMID: 24091406 DOI: 10.1109/tcbb.2013.74] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/02/2023]
Abstract
MicroRNAs are recently discovered small noncoding RNAs responsible for negative regulation of gene expression. Members of this endogenous family of small RNA molecules have been implicated in many genetic disorders. Each microRNA targets tens to hundreds of genes. Experimental validation of target genes is a time- and cost-intensive procedure, so prediction of microRNA targets is a very important problem in computational biology. Although dozens of target prediction algorithms have been reported in the past decade, they disagree significantly in their target gene rankings (based on predicted scores). Rank aggregation is often used to combine multiple target orderings suggested by different algorithms. This technique has been used in diverse fields including social choice theory, web meta search and, most recently, bioinformatics. Kemeny optimal aggregation (KOA) is considered a more principled objective for rank aggregation: the consensus ordering obtained through KOA incurs the minimum pairwise disagreement with the input orderings. Because of its computational intractability, heuristics are often formulated to obtain a near-optimal consensus ranking. Unlike meta search, which needs results in real time, there are a number of scenarios in bioinformatics (e.g., combining microRNA target rankings, or combining disease-related gene rankings obtained from microarray experiments) where evolutionary approaches can be afforded in the hope of better optimization. We conjecture that an ideal consensus ordering should have its total disagreement shared as equally as possible among the input orderings. This also helps keep the evolutionary process from getting stuck in local extremes. In the current work, we reformulate Kemeny optimal aggregation by introducing a trade-off between the total pairwise disagreement and its distribution. A simulated annealing-based implementation of the proposed objective has been found effective in the context of microRNA target ranking. Supplementary data and a source code link are available at: http://www.isical.ac.in/bioinfo_miu/ieee_tcbb_kemeny.rar.
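The Kemeny objective mentioned above counts pairwise disagreements: for two complete rankings, the Kendall tau distance is the number of item pairs they order differently, and a Kemeny optimal consensus minimises the sum of these distances to all input orderings. The brute-force Python sketch below only illustrates that classical definition on tiny inputs (it enumerates all n! permutations, which is why heuristics are needed in practice). It does not implement the paper's reformulated objective or its simulated annealing search, and it assumes every input ranks the same set of items; the function names are hypothetical.

```python
from itertools import combinations, permutations
from typing import Dict, List, Sequence, Tuple


def kendall_tau_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of item pairs that rankings a and b order differently."""
    pos_b: Dict[str, int] = {item: i for i, item in enumerate(b)}
    # combinations(a, 2) yields (x, y) with x ranked before y in a.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def kemeny_brute_force(rankings: List[Sequence[str]]) -> Tuple[Tuple[str, ...], int]:
    """Exhaustively find an ordering with minimum total Kendall tau distance."""
    best, best_cost = None, None
    for candidate in permutations(rankings[0]):
        cost = sum(kendall_tau_distance(candidate, r) for r in rankings)
        if best_cost is None or cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


rankings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(kemeny_brute_force(rankings))  # (('a', 'b', 'c'), 2)
```

Here the consensus `a, b, c` disagrees once with each of the second and third inputs, so its total disagreement is 2; the paper's reformulation would additionally weigh how evenly such disagreements are spread across the inputs.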
67. Bandyopadhyay S, Maulik U, Chakraborty R. Incorporating ϵ-dominance in AMOSA: Application to multiobjective 0/1 knapsack problem and clustering gene expression data. Appl Soft Comput 2013. [DOI: 10.1016/j.asoc.2012.11.050] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
68. Sarkar A, Maulik U. Cancer Gene Expression Data Analysis Using Rough Based Symmetrical Clustering. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Identification of cancer subtypes is the central goal in cancer gene expression data analysis. Modified symmetry-based clustering is an unsupervised learning technique for detecting symmetrical convex or non-convex clusters. To enable fast automatic clustering of cancer tissues (samples), in this chapter the authors propose a rough set-based hybrid approach for the modified symmetry-based clustering algorithm. A natural basis for analyzing gene expression data using the symmetry-based algorithm is to group together genes with similar symmetrical patterns of microarray expression. Rough set theory helps achieve faster convergence and an initial automatic optimal classification, thereby solving the problem of not knowing the number of clusters in gene expression measurement data. For rough set-theoretic decision rule generation, each cluster is classified using heuristically searched optimal reducts to overcome the overlapping-cluster problem. The rough modified symmetry-based clustering algorithm is compared with another newly implemented rough improved symmetry-based clustering algorithm and the existing K-means algorithm over five benchmark cancer gene expression data sets, to demonstrate its superiority in terms of validity. Statistical analyses are also performed to establish the significance of this rough modified symmetry-based clustering approach.
69. Maulik U, Mukhopadhyay A, Chakraborty D. Gene-Expression-Based Cancer Subtypes Prediction Through Feature Selection and Transductive SVM. IEEE Trans Biomed Eng 2013; 60:1111-7. [DOI: 10.1109/tbme.2012.2225622] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Indexed: 11/07/2022]
70. Bhar A, Haubrock M, Mukhopadhyay A, Maulik U, Bandyopadhyay S, Wingender E. Coexpression and coregulation analysis of time-series gene expression data in estrogen-induced breast cancer cell. Algorithms Mol Biol 2013; 8:9. [PMID: 23521829 PMCID: PMC3827943 DOI: 10.1186/1748-7188-8-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 12/21/2012] [Accepted: 02/07/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Estrogen is a chemical messenger that influences many breast cancers, as it helps cells to grow and divide. These cancers are often known as estrogen-responsive cancers, in which estrogen receptors occupy the surface of the cells. Successful treatment of breast cancers requires understanding gene expression, identifying tumor markers, acquiring knowledge of cellular pathways, etc. In this paper we introduce our proposed triclustering algorithm δ-TRIMAX, which aims to find genes that are coexpressed over a subset of samples across a subset of time points. Here we introduce a novel mean-squared residue for such a 3D dataset. Our proposed algorithm yields triclusters that have a mean-squared residue score below a threshold δ. RESULTS We have applied our algorithm to one simulated dataset and one real-life dataset. The real-life dataset is a time-series dataset from an estrogen-induced breast cancer cell line. To establish the biological significance of the genes belonging to the resulting triclusters, we have performed gene ontology, KEGG pathway and transcription factor binding site (TFBS) enrichment analyses. Additionally, we represent each resulting tricluster by computing its eigengene and verify whether the eigengene is also differentially expressed at early, middle and late estrogen-responsive stages. We also identified hub-genes for each resulting tricluster and verified whether they are associated with breast cancer. Through our analysis, CCL2, CD47, NFIB, BRD4, HPGD, CSNK1E, NPC1L1, PTEN, PTPN2 and ADAM9 are identified as hub-genes that are already known to be associated with breast cancer. The other genes identified as hub-genes might be associated with breast cancer or estrogen-responsive elements. The TFBS enrichment analysis also reveals that transcription factor POU2F1 binds to the promoter region of ESR1, which encodes estrogen receptor α. Transcription factor E2F1 binds to the promoter regions of the coexpressed genes MCM7, ANAPC1 and WEE1. CONCLUSIONS Thus our integrative approach provides insights into breast cancer prognosis.
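δ-TRIMAX scores a tricluster (a subset of genes, samples and time points) by a mean-squared residue and keeps triclusters whose score is below δ. The abstract does not give the formula, so the Python sketch below uses one natural 3D generalisation of Cheng and Church's biclustering residue, r_ijk = a_ijk - a_iJK - a_IjK - a_IJk + 2 a_IJK, where a_iJK is the mean of gene i over the selected samples and time points, and so on. This residue is zero for any purely additive tricluster (a_ijk = mu + alpha_i + beta_j + gamma_k). Treat the formula and the function names as assumptions, not as the paper's exact definition.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as data[gene][sample][time]


def mean_squared_residue_3d(data: Tensor3) -> float:
    """Mean-squared residue of a tricluster under the assumed 3D residue."""
    g, s, t = len(data), len(data[0]), len(data[0][0])
    gene_mean = [sum(data[i][j][k] for j in range(s) for k in range(t)) / (s * t)
                 for i in range(g)]                                   # a_iJK
    sample_mean = [sum(data[i][j][k] for i in range(g) for k in range(t)) / (g * t)
                   for j in range(s)]                                 # a_IjK
    time_mean = [sum(data[i][j][k] for i in range(g) for j in range(s)) / (g * s)
                 for k in range(t)]                                   # a_IJk
    overall = sum(gene_mean) / g                                      # a_IJK
    total = 0.0
    for i in range(g):
        for j in range(s):
            for k in range(t):
                r = (data[i][j][k] - gene_mean[i] - sample_mean[j]
                     - time_mean[k] + 2 * overall)
                total += r * r
    return total / (g * s * t)


def is_delta_tricluster(data: Tensor3, delta: float) -> bool:
    """True if the tricluster's mean-squared residue is below the threshold delta."""
    return mean_squared_residue_3d(data) < delta
```

A perfectly coherent (additive) tricluster scores essentially 0 and passes any positive δ, while a single outlying value in an otherwise flat block raises the score.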
71. Maulik U, Mukhopadhyay A, Bhattacharyya M, Kaderali L, Brors B, Bandyopadhyay S, Eils R. Mining quasi-bicliques from HIV-1-human protein interaction network: a multiobjective biclustering approach. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:423-435. [PMID: 23929866 DOI: 10.1109/tcbb.2012.139] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 06/02/2023]
Abstract
In this work, we model the problem of mining quasi-bicliques from a weighted viral-host protein-protein interaction network as a biclustering problem for identifying strong interaction modules. To this end, a multiobjective genetic algorithm-based biclustering technique is proposed that simultaneously optimizes three objective functions to obtain dense biclusters with high mean interaction strengths. The performance of the proposed technique has been compared with that of other existing biclustering methods on artificial data. Subsequently, the proposed biclustering method is applied to the records of biologically validated and predicted interactions between a set of HIV-1 proteins and a set of human proteins to identify strong interaction modules. For this, the entire interaction information is represented as a bipartite graph. We have further investigated the biological significance of the obtained biclusters. The human proteins involved in the strong interaction modules have been found to share common biological properties, and they are identified as gateways of viral infection leading to various diseases. These human proteins can be potential drug targets for developing anti-HIV drugs.
72. Maulik U, Sarkar A. Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels. PLoS One 2013; 8:e46468. [PMID: 23457439 PMCID: PMC3574063 DOI: 10.1371/journal.pone.0046468] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/11/2011] [Accepted: 09/04/2012] [Indexed: 11/18/2022]
Abstract
Remote homology detection among proteins utilizing only unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles, and the Markov clustering algorithms, are currently the most popular methods for protein family recognition. The deviation from random walks with inflation, or the dependency on a hard threshold in the similarity measure, in those methods calls for an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity to enhance sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that deemphasizes outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows superior performance scores for the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection, even in multi-domain and promiscuous-domain proteins from Genolevures database families, with better biological relevance. Source code available upon request. CONTACT sarkar@labri.fr.
73. Mukhopadhyay A, Maulik U, Bandyopadhyay S. An Interactive Approach to Multiobjective Clustering of Gene Expression Patterns. IEEE Trans Biomed Eng 2013; 60:35-41. [DOI: 10.1109/tbme.2012.2220765] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 11/09/2022]
74. Bhattacharyya S, Maulik U, Dutta P. A parallel bi-directional self-organizing neural network (PBDSONN) architecture for color image extraction and segmentation. Neurocomputing 2012. [DOI: 10.1016/j.neucom.2011.11.025] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
75. Sengupta D, Maulik U, Bandyopadhyay S. Weighted Markov Chain Based Aggregation of Bio-molecule Orderings. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:924-933. [PMID: 22331863 DOI: 10.1109/tcbb.2012.28] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/31/2023]
Abstract
The scope and effectiveness of rank aggregation have already been established in contemporary bioinformatics research. Rank aggregation helps in meta-analysis of putative results collected from different analytic or experimental sources. For example, we often receive considerably differing ranked lists of genes or microRNAs from various target prediction algorithms or microarray studies, and combining them can yield a more effective ordering of the set of objects. Assigning a certain level of confidence to each source of ranking is also a natural requirement of aggregation; weights can be assigned to the sources of orderings by experts. Several rank aggregation approaches, such as those based on Markov chains (MC) and evolutionary algorithms, exist in the literature. Markov chains are, in general, faster than evolutionary approaches. Unlike evolutionary computing approaches, however, Markov chains have not been used for weighted aggregation, because of the absence of a formal framework for weighted Markov chains. In this article we propose the use of a modified version of MC4 (one of the Markov chains proposed by Dwork et al., 2001), followed by the weighted analog of local Kemenization, for performing rank aggregation where the sources of rankings can be prioritized by an expert.
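For context, MC4 in Dwork et al. (2001) works as follows: from the current item P, pick an item Q uniformly at random from all items and move to Q if a majority of the input rankings place Q above P; otherwise stay at P. Items are then ordered by the chain's stationary probabilities. The Python sketch below replaces the simple majority with a weighted majority (Q wins if the rankings preferring Q carry more than half of the total weight) and mixes in a small uniform teleportation term so the chain is ergodic. Both the weighting rule and the teleportation parameter `alpha` are assumptions for illustration; this is not the paper's modified MC4, and it omits the weighted local Kemenization step. The function name `weighted_mc4` is hypothetical.

```python
from typing import List, Sequence


def weighted_mc4(rankings: List[Sequence[str]], weights: Sequence[float],
                 alpha: float = 0.05, iterations: int = 500) -> List[str]:
    """Order items by the stationary distribution of a weighted MC4-style chain.

    rankings: complete orderings of the same items, best first.
    weights:  non-negative importance of each ranking (e.g. set by an expert).
    alpha:    uniform teleportation probability (an assumption of this sketch).
    """
    items = list(rankings[0])
    n = len(items)
    total = float(sum(weights))
    positions = [{item: r for r, item in enumerate(rk)} for rk in rankings]

    def support(q: str, p: str) -> float:
        # Total weight of the input rankings that place q above p.
        return sum(w for pos, w in zip(positions, weights) if pos[q] < pos[p])

    # Row-stochastic transition matrix: from p, propose q with probability 1/n
    # and accept only if a weighted majority prefers q to p.
    T = [[0.0] * n for _ in range(n)]
    for a, p in enumerate(items):
        for b, q in enumerate(items):
            if a != b and support(q, p) > total / 2:
                T[a][b] = 1.0 / n
        T[a][a] = 1.0 - sum(T[a])  # rejected proposals leave the chain at p
    T = [[(1 - alpha) * x + alpha / n for x in row] for row in T]

    # Power iteration toward the stationary distribution.
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[a] * T[a][b] for a in range(n)) for b in range(n)]
    order = sorted(range(n), key=lambda b: -pi[b])  # higher mass ranks first
    return [items[b] for b in order]


print(weighted_mc4([["a", "b", "c"], ["c", "b", "a"]], weights=[1, 3]))  # ['c', 'b', 'a']
```

With equal weights the two rankings above would tie on every pair and no transition would ever be accepted, which illustrates why the choice of weights (or a tie rule) matters for weighted aggregation.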