Reference Citation Analysis

For: Hartuv E, Shamir R. A clustering algorithm based on graph connectivity. INFORM PROCESS LETT 2000;76:175-81. [DOI: 10.1016/s0020-0190(00)00142-3] [Citation(s) in RCA: 274] [Impact Index Per Article: 11.4] [Indexed: 11/19/2022]

Cited by Other Article(s)

Zhao J, Hu X, He T, Li P, Zhang M, Shen X. An edge-based protein complex identification algorithm with gene co-expression data (PCIA-GeCo). IEEE Trans Nanobioscience 2014;13:80-8. [PMID: 24803023 DOI: 10.1109/tnb.2014.2317519] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 02/02/2023]

Paul S, Maji P. City block distance and rough-fuzzy clustering for identification of co-expressed microRNAs. MOLECULAR BIOSYSTEMS 2014;10:1509-23. [PMID: 24682049 DOI: 10.1039/c4mb00101j] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 01/07/2023]

Menéndez HD, Barrero DF, Camacho D. A genetic graph-based approach for partitional clustering. Int J Neural Syst 2014;24:1430008. [DOI: 10.1142/s0129065714300083] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Indexed: 11/18/2022]

Wang S, Wu F. Detecting overlapping protein complexes in PPI networks based on robustness. Proteome Sci 2013;11:S18. [PMID: 24565162 PMCID: PMC3908676 DOI: 10.1186/1477-5956-11-s1-s18] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/31/2023] Open

Moschopoulos C, Beligiannis G, Likothanassis S, Kossida S. Using a Genetic Algorithm and Markov Clustering on Protein–Protein Interaction Graphs. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open

Wang J, Peng X, Xiao Q, Li M, Pan Y. An effective method for refining predicted protein complexes based on protein activity and the mechanism of protein complex formation. BMC SYSTEMS BIOLOGY 2013;7:28. [PMID: 23537347 PMCID: PMC3648373 DOI: 10.1186/1752-0509-7-28] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 10/09/2012] [Accepted: 03/14/2013] [Indexed: 11/10/2022]

Maji P, Paul S. Rough-fuzzy clustering for grouping functionally similar genes from microarray data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013;10:286-299. [PMID: 22848138 DOI: 10.1109/tcbb.2012.103] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 06/01/2023]

Hayes W, Sun K, Pržulj N. Graphlet-based measures are suitable for biological network comparison. Bioinformatics 2013;29:483-91. [PMID: 23349212 DOI: 10.1093/bioinformatics/bts729] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 02/06/2023] Open

Moschopoulos C, Fytros M, Alatsathianos S, Likothanassis S, Kossida S. GAPPI: identifying important protein modules through protein-protein interaction graphs. INT J ARTIF INTELL T 2013. [DOI: 10.1142/s0218213012500273] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]

Cai B, Wang H, Zheng H, Wang H. Detection of protein complexes from affinity purification/mass spectrometry data. BMC SYSTEMS BIOLOGY 2012;6 Suppl 3:S4. [PMID: 23282282 PMCID: PMC3524315 DOI: 10.1186/1752-0509-6-s3-s4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/05/2022]

Abstract

Background

Recent advances in molecular biology have led to the accumulation of large amounts of data on protein-protein interaction networks in different species. An important challenge for the analysis of these data is to extract functional modules such as protein complexes and biological processes from networks which are characterised by the presence of a significant number of false positives. Various computational techniques have been applied in recent years. However, most of them treat protein interaction as binary. Co-complex relations derived from affinity purification/mass spectrometry (AP-MS) experiments have been largely ignored.

Methods

This paper presents a new algorithm for detecting protein complexes from AP-MS data. The algorithm aims to detect groups of prey proteins that are significantly co-associated with the same set of bait proteins. We first represent the AP-MS data as a bipartite network, where one set of nodes consists of bait proteins and the other set is composed of prey proteins. We then calculate pair-wise similarities of bait proteins based on the number of their commonly shared neighbours. A hierarchical clustering algorithm is employed to cluster bait proteins based on these similarities, and thus a set of 'seed' clusters is obtained. Starting from these 'seed' clusters, an expansion process is developed to identify prey proteins which are significantly associated with the same set of bait proteins. Then, a set of complete protein complexes is derived. In application to two real AP-MS datasets, we validate the biological significance of the predicted protein complexes by using curated protein complexes and well-characterized cellular component annotation from Gene Ontology (GO). Several statistical metrics have been applied for evaluation.
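The similarity step described above can be illustrated with a minimal sketch. It assumes the bipartite network is stored as a mapping from each bait to the set of preys it pulled down, and it uses the raw count of shared preys as the similarity; the function name, the toy protein names, and the absence of any normalisation are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations


def shared_prey_counts(bait_to_preys):
    """Return {(bait_a, bait_b): number of preys shared by both baits}."""
    counts = {}
    for a, b in combinations(sorted(bait_to_preys), 2):
        counts[(a, b)] = len(bait_to_preys[a] & bait_to_preys[b])
    return counts


# Toy bipartite AP-MS network with hypothetical protein names.
toy_network = {
    "B1": {"P1", "P2", "P3"},
    "B2": {"P2", "P3", "P4"},
    "B3": {"P5"},
}
similarities = shared_prey_counts(toy_network)
```

A hierarchical clustering routine could then consume these pairwise values to form the 'seed' clusters; that step and the expansion process are not sketched here.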

Results

Experimental results show that the proposed algorithm achieves significant improvement in detecting protein complexes from AP-MS data. In comparison to the well-known MCL algorithm, our algorithm improves the accuracy rate by about 20% in detecting protein complexes in both networks and increases the F-Measure value by about 50% in the Krogan_2006 network. Greater precision and better accuracy have been achieved and the identified complexes are demonstrated to match well with existing curated protein complexes.
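For readers unfamiliar with the metric, the F-Measure is commonly defined as the harmonic mean of precision and recall. The sketch below assumes that standard definition; how the paper computes precision and recall from matched complexes is not stated in the abstract, so the inputs here are plain numbers.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values, for illustration only.
balanced = f_measure(0.5, 0.5)
skewed = f_measure(1.0, 0.5)
```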

Conclusions

Our study highlights the significance of taking co-complex relations into account when extracting protein complexes from AP-MS data. The algorithm proposed in this paper can be easily extended to the analysis of other biological networks which can be conveniently represented by bipartite graphs such as drug-target networks.

Wang J, Chen G, Liu B, Li M, Pan Y. Identifying Protein Complexes From Interactome Based on Essential Proteins and Local Fitness Method. IEEE Trans Nanobioscience 2012;11:324-35. [DOI: 10.1109/tnb.2012.2197863] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/09/2022]

Osoba O, Kosko B. Noise-enhanced clustering and competitive learning algorithms. Neural Netw 2012;37:132-40. [PMID: 23137615 DOI: 10.1016/j.neunet.2012.09.012] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 06/11/2012] [Revised: 09/20/2012] [Accepted: 09/20/2012] [Indexed: 11/30/2022]

Bello-Orgaz G, Menéndez HD, Camacho D. Adaptive k-means algorithm for overlapped graph clustering. Int J Neural Syst 2012;22:1250018. [DOI: 10.1142/s0129065712500189] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 11/18/2022]

Le T, Tran D, Nguyen P, Ma W, Sharma D. Proximity multi-sphere support vector clustering. Neural Comput Appl 2012. [DOI: 10.1007/s00521-012-1001-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]

Kuhl C, Tautenhahn R, Böttcher C, Larson TR, Neumann S. CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Anal Chem 2011;84:283-9. [PMID: 22111785 DOI: 10.1021/ac202450g] [Citation(s) in RCA: 752] [Impact Index Per Article: 57.8] [Indexed: 02/08/2023]

Parker BJ, Moltke I, Roth A, Washietl S, Wen J, Kellis M, Breaker R, Pedersen JS. New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes. Genome Res 2011;21:1929-43. [PMID: 21994249 DOI: 10.1101/gr.112516.110] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.2] [Indexed: 01/22/2023]

Abstract

Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.

Gaussian kernel width exploration and cone cluster labeling for support vector clustering. Pattern Anal Appl 2011. [DOI: 10.1007/s10044-011-0244-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]

Yu L, Gao L, Sun PG. Research on Algorithms for Complexes and Functional Modules Prediction in Protein-Protein Interaction Networks. ACTA ACUST UNITED AC 2011. [DOI: 10.3724/sp.j.1016.2011.01239] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]

Wang J, Li M, Chen J, Pan Y. A fast hierarchical clustering algorithm for functional modules discovery in protein interaction networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011;8:607-620. [PMID: 20733244 DOI: 10.1109/tcbb.2010.75] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.0] [Indexed: 05/29/2023]

A graph model for mutual information based clustering. J Intell Inf Syst 2010. [DOI: 10.1007/s10844-010-0132-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]

Khor S. Application of graph colouring to biological networks. IET Syst Biol 2010;4:185-92. [PMID: 20499999 DOI: 10.1049/iet-syb.2009.0038] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022] Open

Koyutürk M. Algorithmic and analytical methods in network biology. WILEY INTERDISCIPLINARY REVIEWS. SYSTEMS BIOLOGY AND MEDICINE 2010;2:277-292. [PMID: 20836029 PMCID: PMC3087298 DOI: 10.1002/wsbm.61] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 12/21/2022]

Abstract

During the genomic revolution, algorithmic and analytical methods for organizing, integrating, analyzing, and querying biological sequence data proved invaluable. Today, increasing availability of high-throughput data pertaining to functional states of biomolecules, as well as their interactions, enables genome-scale studies of the cell from a systems perspective. The past decade witnessed significant efforts on the development of computational infrastructure for large-scale modeling and analysis of biological systems, commonly using network models. Such efforts lead to novel insights into the complexity of living systems, through development of sophisticated abstractions, algorithms, and analytical techniques that address a broad range of problems, including the following: (1) inference and reconstruction of complex cellular networks; (2) identification of common and coherent patterns in cellular networks, with a view to understanding the organizing principles and building blocks of cellular signaling, regulation, and metabolism; and (3) characterization of cellular mechanisms that underlie the differences between living systems, in terms of evolutionary diversity, development and differentiation, and complex phenotypes, including human disease. These problems pose significant algorithmic and analytical challenges because of the inherent complexity of the systems being studied; limitations of data in terms of availability, scope, and scale; intractability of resulting computational problems; and limitations of reference models for reliable statistical inference. This article provides a broad overview of existing algorithmic and analytical approaches to these problems, highlights key biological insights provided by these approaches, and outlines emerging opportunities and challenges in computational systems biology.

A Survey of Algorithms for Dense Subgraph Discovery. MANAGING AND MINING GRAPH DATA 2010. [DOI: 10.1007/978-1-4419-6045-0_10] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.1] [Indexed: 12/12/2022]

A Survey of Graph Mining Techniques for Biological Datasets. MANAGING AND MINING GRAPH DATA 2010. [DOI: 10.1007/978-1-4419-6045-0_18] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/13/2022]

Moschopoulos CN, Pavlopoulos GA, Schneider R, Likothanassis SD, Kossida S. GIBA: a clustering tool for detecting protein complexes. BMC Bioinformatics 2009;10 Suppl 6:S11. [PMID: 19534736 PMCID: PMC2697634 DOI: 10.1186/1471-2105-10-s6-s11] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 03/24/2023] Open

Gao L, Sun PG, Song J. Clustering algorithms for detecting functional modules in protein interaction networks. J Bioinform Comput Biol 2009;7:217-42. [PMID: 19226668 DOI: 10.1142/s0219720009004023] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 09/12/2008] [Revised: 10/21/2008] [Accepted: 10/21/2008] [Indexed: 01/21/2023]

Multiobjective evolutionary clustering of Web user sessions: a case study in Web page recommendation. Soft comput 2009. [DOI: 10.1007/s00500-009-0428-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 10/20/2022]

Maulik U, Mukhopadhyay A, Bandyopadhyay S. Finding multiple coherent biclusters in microarray data using variable string length multiobjective genetic algorithm. IEEE Trans Inf Technol Biomed 2009;13:969-75. [PMID: 19304489 DOI: 10.1109/titb.2009.2017527] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]

Húsek D, Pokorný J, Řezanková H, Snášel V. Web Data Clustering. ACTA ACUST UNITED AC 2009. [DOI: 10.1007/978-3-642-01088-0_14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 03/15/2023]

Effective Pruning Techniques for Mining Quasi-Cliques. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES 2008. [DOI: 10.1007/978-3-540-87481-2_3] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Indexed: 12/12/2022]

SkyGraph: an algorithm for important subgraph discovery in relational graphs. Data Min Knowl Discov 2008. [DOI: 10.1007/s10618-008-0109-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022]

Madi A, Friedman Y, Roth D, Regev T, Bransburg-Zabary S, Jacob EB. Genome holography: deciphering function-form motifs from gene expression data. PLoS One 2008;3:e2708. [PMID: 18628959 PMCID: PMC2444029 DOI: 10.1371/journal.pone.0002708] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 03/06/2008] [Accepted: 06/19/2008] [Indexed: 12/28/2022] Open

Abstract

Background

DNA chips allow simultaneous measurements of the genome-wide response of thousands of genes, i.e. system-level monitoring of gene-network activity. Advanced analysis methods have been developed to extract meaningful information from the vast amount of raw gene-expression data obtained from microarray measurements. These methods usually aim to distinguish between groups of subjects (e.g., cancer patients vs. healthy subjects) or to identify marker genes that help to distinguish between those groups. We assumed that motifs related to the internal structure of operons and gene-network regulation are also embedded in microarray data and can be deciphered by using proper analysis.

Methodology/Principal Findings

The analysis presented here is based on investigating gene-gene correlations. We analyze a database of gene expression of Bacillus subtilis exposed to sub-lethal levels of 37 different antibiotics. Using unsupervised analysis (dendrogram) of the matrix of normalized gene-gene correlations, we identified the operons, as they form distinct clusters of genes in the sorted correlation matrix. Applying a dimension-reduction algorithm (Principal Component Analysis, PCA) to the matrices of normalized correlations reveals functional motifs. The genes are placed in a reduced 3-dimensional space of the three leading PCA eigen-vectors according to their corresponding eigen-values. We found that the organization of the genes in the reduced PCA space recovers motifs of the operon internal structure, such as the order of the genes along the genome, gene separation by non-coding segments, and translational start and end regions. In addition to the intra-operon structure, it is also possible to predict inter-operon relationships, operons sharing functional regulation factors, and more. In particular, we demonstrate the above in the context of the competence and sporulation pathways.
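As an illustration of the first step, the sketch below builds a gene-gene correlation matrix from expression profiles. It assumes plain Pearson correlation across conditions; the abstract speaks of "normalized" correlations without giving the normalisation, so that step is omitted, and the gene names and values are made up. The clustering and PCA steps are not sketched.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(expression):
    """Return (gene order, matrix) for a {gene: [value per condition]} mapping."""
    genes = sorted(expression)
    matrix = [[pearson(expression[g], expression[h]) for h in genes] for g in genes]
    return genes, matrix


# Hypothetical expression profiles over four conditions.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
genes, corr = correlation_matrix(toy_expression)
```

In this toy input, geneA and geneB rise together while geneC falls, so the matrix separates the first two from the third.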

Conclusions/Significance

We demonstrated that by analyzing gene-gene correlation from gene-expression data it is possible to identify operons and to predict unknown internal structure of operons and gene-networks regulation.

Zhao XM, Chen L, Aihara K. Protein function prediction with high-throughput data. Amino Acids 2008;35:517-30. [PMID: 18427717 DOI: 10.1007/s00726-008-0077-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 01/31/2008] [Accepted: 03/13/2008] [Indexed: 12/12/2022]

Milenković T, Pržulj N. Uncovering Biological Network Function via Graphlet Degree Signatures. Cancer Inform 2008. [DOI: 10.4137/cin.s680] [Citation(s) in RCA: 166] [Impact Index Per Article: 10.4] [Indexed: 11/05/2022] Open

Hwang W, Cho YR, Zhang A, Ramanathan M. CASCADE: a novel quasi all paths-based network analysis algorithm for clustering biological interactions. BMC Bioinformatics 2008;9:64. [PMID: 18230159 PMCID: PMC2253513 DOI: 10.1186/1471-2105-9-64] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 02/09/2007] [Accepted: 01/29/2008] [Indexed: 11/10/2022] Open

Santos JM, Marques de Sa J, Alexandre LA. LEGClust- a clustering algorithm based on layered entropic subgraphs. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2008;30:62-75. [PMID: 18000325 DOI: 10.1109/tpami.2007.1142] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/25/2023]

Yan X, Mehan MR, Huang Y, Waterman MS, Yu PS, Zhou XJ. A graph-based approach to systematically reconstruct human transcriptional regulatory modules. Bioinformatics 2007;23:i577-86. [PMID: 17646346 DOI: 10.1093/bioinformatics/btm227] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Indexed: 12/13/2022]

Abstract

MOTIVATION

A major challenge in studying gene regulation is to systematically reconstruct transcription regulatory modules, which are defined as sets of genes that are regulated by a common set of transcription factors. A commonly used approach for transcription module reconstruction is to derive coexpression clusters from a microarray dataset. However, such results often contain false positives because genes from many transcription modules may be simultaneously perturbed under a given type of condition. In this study, we propose and validate that genes which form a coexpression cluster in multiple microarray datasets across diverse conditions are more likely to form a transcription module. However, identifying genes coexpressed in a subset of many microarray datasets is not a trivial computational problem.

RESULTS

We propose a graph-based data-mining approach to efficiently and systematically identify frequent coexpression clusters. Given m microarray datasets, we model each microarray dataset as a coexpression graph, and search for vertex sets which are frequently densely connected across ⌈θm⌉ datasets (0 ≤ θ ≤ 1). For this novel graph-mining problem, we designed two techniques to narrow down the search space: (1) partition the input graphs into (overlapping) groups sharing common properties; (2) summarize the vertex neighbor information from the partitioned datasets onto 'Neighbor Association Summary Graphs' for effective mining. We applied our method to 105 human microarray datasets, and identified a large number of potential transcription modules, activated under different subsets of conditions. Validation by ChIP-chip data demonstrated that the likelihood of a coexpression cluster being a transcription module increases significantly with its recurrence. Our method opens a new way to exploit the vast amount of existing microarray data accumulation for gene regulation study. Furthermore, the algorithm is applicable to other biological networks for approximate network module mining.
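The recurrence criterion can be illustrated with a small check for one candidate vertex set. The sketch assumes each coexpression graph is an undirected edge set, that "densely connected" means an edge density at or above a chosen cutoff, and that the required support is ⌈θm⌉ graphs; the function names, the cutoff, and the toy graphs are illustrative assumptions. It does not implement the paper's partitioning or summary-graph techniques for searching the space of vertex sets.

```python
from itertools import combinations
from math import ceil


def density(vertices, edges):
    """Fraction of vertex pairs in `vertices` joined by an edge in `edges`."""
    pairs = list(combinations(sorted(vertices), 2))
    if not pairs:
        return 0.0
    present = sum(1 for u, v in pairs if (u, v) in edges or (v, u) in edges)
    return present / len(pairs)


def is_frequent_dense(vertices, graphs, theta, min_density):
    """True if `vertices` is dense in at least ceil(theta * m) of the m graphs."""
    support = sum(1 for edges in graphs if density(vertices, edges) >= min_density)
    return support >= ceil(theta * len(graphs))


# Three toy coexpression graphs over hypothetical genes a, b, c, d.
graphs = [
    {("a", "b"), ("b", "c"), ("a", "c")},
    {("a", "b"), ("b", "c")},
    {("a", "d")},
]
recurrent = is_frequent_dense({"a", "b", "c"}, graphs, theta=0.6, min_density=0.6)
strict = is_frequent_dense({"a", "b", "c"}, graphs, theta=1.0, min_density=0.6)
```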

AVAILABILITY

http://zhoulab.usc.edu/NeMo/.

Koyutürk M, Szpankowski W, Grama A. Assessing Significance of Connectivity and Conservation in Protein Interaction Networks. J Comput Biol 2007;14:747-64. [PMID: 17691892 DOI: 10.1089/cmb.2007.r014] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022] Open

Abstract

Comparative analyses of cellular interaction networks enable understanding of the cell's modular organization through identification of functional modules and complexes. These techniques often rely on topological features such as connectedness and density, based on the premise that functionally related proteins are likely to interact densely and that these interactions follow similar evolutionary trajectories. Significant recent work has focused on efficient algorithms for identification of such functional modules and their conservation. In spite of algorithmic advances, development of a comprehensive infrastructure for interaction databases is in relative infancy compared to corresponding sequence analysis tools. One critical and as yet unresolved aspect of this infrastructure is a measure of the statistical significance of a match, or a dense subcomponent. In the absence of analytical measures, conventional methods rely on computationally expensive simulations based on ad-hoc models for quantifying significance. In this paper, we present techniques for analytically quantifying statistical significance of dense components in reference model graphs. We consider two reference models: a G(n, p) model in which each pair of nodes in a graph has an identical likelihood, p, of sharing an edge, and a two-level G(n, p) model, which accounts for high-degree hub nodes generally observed in interaction networks. Experiments performed on a rich collection of protein interaction (PPI) networks show that the proposed model provides a reliable means of evaluating statistical significance of dense patterns in these networks. We also adapt existing state-of-the-art network clustering algorithms by using our statistical significance measure as an optimization criterion. Comparison of the resulting module identification algorithm, SIDES, with existing methods shows that SIDES outperforms existing algorithms in terms of sensitivity and specificity of identified clusters with respect to available GO annotations.
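To give a feel for the G(n, p) reference model, the sketch below computes a binomial tail probability for one fixed set of k nodes: the chance that at least a given number of its possible edges are present when every pair is connected independently with probability p. This is a deliberately simplified illustration and an assumption of this note; the paper's analysis concerns the densest component found anywhere in the graph, which requires a correction for the search that this sketch does not make.

```python
from math import comb


def dense_subgraph_p_value(k, observed_edges, p):
    """P(at least `observed_edges` edges among k fixed nodes) under G(n, p)."""
    pairs = comb(k, 2)
    return sum(
        comb(pairs, e) * p**e * (1 - p) ** (pairs - e)
        for e in range(observed_edges, pairs + 1)
    )


# A triangle on three fixed nodes when each edge appears with probability 0.5.
triangle = dense_subgraph_p_value(3, 3, 0.5)
```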

Li W, Liu Y, Huang HC, Peng Y, Lin Y, Ng WK, Ong KL. Dynamical systems for discovering protein complexes and functional modules from biological networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2007;4:233-50. [PMID: 17473317 DOI: 10.1109/tcbb.2007.070210] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 05/15/2023]

Belacel N, Wang Q, Cuperlovic-Culf M. Clustering methods for microarray gene expression data. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2007;10:507-31. [PMID: 17233561 DOI: 10.1089/omi.2006.10.507] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022]

Sharan R, Ulitsky I, Shamir R. Network-based prediction of protein function. Mol Syst Biol 2007;3:88. [PMID: 17353930 PMCID: PMC1847944 DOI: 10.1038/msb4100129] [Citation(s) in RCA: 620] [Impact Index Per Article: 36.5] [Received: 09/20/2006] [Accepted: 01/09/2007] [Indexed: 12/22/2022] Open

Di Giacomo E, Didimo W, Grilli L, Liotta G. Graph visualization techniques for web clustering engines. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2007;13:294-304. [PMID: 17218746 DOI: 10.1109/tvcg.2007.40] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 05/13/2023]

Hwang W, Cho YR, Zhang A, Ramanathan M. A novel functional module detection algorithm for protein-protein interaction networks. Algorithms Mol Biol 2006;1:24. [PMID: 17147822 PMCID: PMC1764415 DOI: 10.1186/1748-7188-1-24] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Received: 07/24/2006] [Accepted: 12/05/2006] [Indexed: 11/29/2022] Open

Chen X, Chen M, Ning K. BNArray: an R package for constructing gene regulatory networks from microarray data by using Bayesian network. Bioinformatics 2006;22:2952-4. [PMID: 17005537 DOI: 10.1093/bioinformatics/btl491] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022] Open

Jiang D, Pei J, Ramanathan M, Lin C, Tang C, Zhang A. Mining gene–sample–time microarray data: a coherent gene cluster discovery approach. Knowl Inf Syst 2006. [DOI: 10.1007/s10115-006-0031-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]

Huang X, Lai W. Clustering graphs for visualization via node similarities. J Vis Lang Comput 2006. [DOI: 10.1016/j.jvlc.2005.10.003] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 10/25/2022]

Haynes T, Knisley D, Seier E, Zou Y. A quantitative analysis of secondary RNA structure using domination based parameters on trees. BMC Bioinformatics 2006;7:108. [PMID: 16515683 PMCID: PMC1420337 DOI: 10.1186/1471-2105-7-108] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Received: 10/20/2005] [Accepted: 03/03/2006] [Indexed: 11/30/2022] Open

Abstract

Background

It has become increasingly apparent that a comprehensive database of RNA motifs is essential in order to achieve new goals in genomic and proteomic research. Secondary RNA structures have frequently been represented by various modeling methods as graph-theoretic trees. Using graph theory as a modeling tool allows the vast resources of graphical invariants to be utilized to numerically identify secondary RNA motifs. The domination number of a graph is a graphical invariant that is sensitive to even a slight change in the structure of a tree. The invariants selected in this study are variations of the domination number of a graph. These graphical invariants are partitioned into two classes, and we define two parameters based on each of these classes. These parameters are calculated for all small order trees and a statistical analysis of the resulting data is conducted to determine if the values of these parameters can be utilized to identify which trees of orders seven and eight are RNA-like in structure.
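The domination number mentioned above is the size of the smallest vertex set such that every vertex is either in the set or adjacent to a member of it. The sketch below computes it by exhaustive search, which is only practical for small trees such as those of order seven and eight; the function name and the example trees are illustrative, and the paper's two derived parameters are not reproduced here.

```python
from itertools import combinations


def domination_number(adjacency):
    """Smallest dominating-set size for a graph given as {vertex: [neighbours]}."""
    vertices = list(adjacency)
    for size in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, size):
            dominated = set(candidate)
            for v in candidate:
                dominated.update(adjacency[v])
            if len(dominated) == len(vertices):
                return size
    return 0  # only reached for the empty graph


# Two small trees: a star on four vertices and a path on five vertices.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
star_gamma = domination_number(star)
path_gamma = domination_number(path)
```

The star is dominated by its centre alone, while the path needs two vertices, which shows how the invariant responds to the shape of a tree.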

Results

The statistical analysis shows that the domination-based parameters correctly distinguish between the trees that represent native structures and those that are not likely candidates to represent RNA. Some of the trees previously identified as candidate structures are found to be "very" RNA-like, while others are not, thereby refining the space of structures likely to be found as representing secondary RNA structure.

Conclusion

Search algorithms are available that mine nucleotide sequence databases. However, the number of motifs identified can be quite large, making a further search for similar motifs computationally difficult. Much of the work in the bioinformatics arena is directed toward the development of better algorithms to address the computational problem. This work, on the other hand, uses mathematical descriptors to more clearly characterize the RNA motifs and thereby reduce the corresponding search space. These preliminary findings demonstrate that graph-theoretic quantifiers utilized in fields such as computer network design hold significant promise as an added tool for genomics and proteomics.

Sauleau EA, Paumier JP, Buemi A. Medical record linkage in health information systems by approximate string matching and clustering. BMC Med Inform Decis Mak 2005;5:32. [PMID: 16219102 PMCID: PMC1274322 DOI: 10.1186/1472-6947-5-32] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Received: 05/06/2005] [Accepted: 10/11/2005] [Indexed: 11/21/2022] Open

Xu R, Wunsch D. Survey of clustering algorithms. IEEE Trans Neural Netw 2005;16:645-78. [PMID: 15940994 DOI: 10.1109/tnn.2005.845141] [Citation(s) in RCA: 1004] [Impact Index Per Article: 52.8] [Indexed: 11/07/2022]

Pan HY, Zhu J, Han DF. Clustering gene expression data based on predicted differential effects of GV interaction. GENOMICS, PROTEOMICS & BIOINFORMATICS 2005;3:36-41. [PMID: 16144520 PMCID: PMC5172465 DOI: 10.1016/s1672-0229(05)03005-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/04/2022]