Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Hartuv E, Schmitt AO, Lange J, Meier-Ewert S, Lehrach H, Shamir R. An algorithm for clustering cDNA fingerprints. Genomics 2000;66:249-56. [PMID: 10873379 DOI: 10.1006/geno.2000.6187] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.5] [Indexed: 11/22/2022]

Cited by Other Article(s)

Grbic M, Kartelj A, Jankovic S, Matic D, Filipovic V. Variable Neighborhood Search for Partitioning Sparse Biological Networks into the Maximum Edge-Weighted k-Plexes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020;17:1822-1831. [PMID: 30736005 DOI: 10.1109/tcbb.2019.2898189] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]

A Transformative Concept: From Data Being Passive Objects to Data Being Active Subjects. DATA 2019. [DOI: 10.3390/data4040135] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/28/2023] Open

Thomas J, Seo D, Sael L. Review on Graph Clustering and Subgraph Similarity Based Analysis of Neurological Disorders. Int J Mol Sci 2016;17:ijms17060862. [PMID: 27258269 PMCID: PMC4926396 DOI: 10.3390/ijms17060862] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/18/2016] [Revised: 05/10/2016] [Accepted: 05/24/2016] [Indexed: 01/03/2023] Open

Ong P, Zainuddin Z. Calibrating wavelet neural networks by distance orientation similarity fuzzy C-means for approximation problems. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2016.01.042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/26/2022]

Fernandez M, Riveros JD, Campos M, Mathee K, Narasimhan G. Microbial "social networks". BMC Genomics 2015;16 Suppl 11:S6. [PMID: 26576770 PMCID: PMC4652466 DOI: 10.1186/1471-2164-16-s11-s6] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Indexed: 02/06/2023] Open

Abstract

BACKGROUND

It is well understood that distinct communities of bacteria are present at different sites of the body, and that changes in the structure of these communities have strong implications for human health. Yet, challenges remain in understanding the complex interconnections between the bacterial taxa within these microbial communities and how they change during the progression of diseases. Many recent studies attempt to analyze the human microbiome using traditional ecological measures and cataloging differences in bacterial community membership. In this paper, we show how to push metagenomic analyses beyond mundane questions related to the bacterial taxonomic profiles that differentiate one sample from another.

METHODS

We develop tools and techniques that help us to investigate the nature of social interactions in microbial communities, and demonstrate ways of compactly capturing extensive information about these networks and visually conveying them in an effective manner. We define the concept of bacterial "social clubs", which are groups of taxa that tend to appear together in many samples. More importantly, we define the concept of "rival clubs", entire groups that tend to avoid occurring together in many samples. We show how to efficiently compute social clubs and rival clubs and demonstrate their utility with the help of examples including a smokers' dataset and a dataset from the Human Microbiome Project (HMP).

RESULTS

The tools developed provide a framework for analyzing relationships between bacterial taxa modeled as bacterial co-occurrence networks. The computational techniques also provide a framework for identifying clubs and rival clubs and for studying differences in the microbiomes (and their interactions) of two or more collections of samples.

CONCLUSIONS

Microbial relationships are similar to those found in social networks. In this work, we assume that strong (positive or negative) tendencies to co-occur or co-infect are likely to have biological, physiological, or ecological significance, possibly as a result of cooperation or competition. As a consequence of the analysis, a variety of biological interpretations are conjectured. In the human microbiome context, the pattern of strength of interactions between bacterial taxa is unique to body site.

Zainuddin Z, Pauline O. An effective fuzzy C-means algorithm based on symmetry similarity approach. Appl Soft Comput 2015. [DOI: 10.1016/j.asoc.2015.06.021] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 10/23/2022]

Wen J, Mohammed J, Bortolamiol-Becet D, Tsai H, Robine N, Westholm JO, Ladewig E, Dai Q, Okamura K, Flynt AS, Zhang D, Andrews J, Cherbas L, Kaufman TC, Cherbas P, Siepel A, Lai EC. Diversity of miRNAs, siRNAs, and piRNAs across 25 Drosophila cell lines. Genome Res 2015;24:1236-50. [PMID: 24985917 PMCID: PMC4079977 DOI: 10.1101/gr.161554.113] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Indexed: 12/19/2022]

Affiliation(s)

Jiayu Wen Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
Jaaved Mohammed Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA; Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA; Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York 10065, USA
Diane Bortolamiol-Becet Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
Harrison Tsai Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
Nicolas Robine Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA; New York Genome Center, New York, New York 10022, USA
Jakub O Westholm Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
Erik Ladewig Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
Qi Dai Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
Katsutomo Okamura Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA; Temasek Life Sciences, Temasek Lifesciences Laboratory, National University of Singapore, 117604 Singapore
Alex S Flynt Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
Dayu Zhang Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
Justen Andrews Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
Lucy Cherbas Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
Thomas C Kaufman Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
Peter Cherbas Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
Adam Siepel Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
Eric C Lai Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA

Hüffner F, Komusiewicz C, Liebtrau A, Niedermeier R. Partitioning Biological Networks into Highly Connected Clusters with Maximum Edge Coverage. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014;11:455-467. [PMID: 26356014 DOI: 10.1109/tcbb.2013.177] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]

Pizzuti C, Rombo SE. Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods. Bioinformatics 2014;30:1343-52. [PMID: 24458952 DOI: 10.1093/bioinformatics/btu034] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 11/12/2022]

Design of wavelet neural networks based on symmetry fuzzy C-means for function approximation. Neural Comput Appl 2013. [DOI: 10.1007/s00521-013-1350-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 10/27/2022]

Divina F, Pontes B, Giráldez R, Aguilar-Ruiz JS. An effective measure for assessing the quality of biclusters. Comput Biol Med 2011;42:245-56. [PMID: 22196882 DOI: 10.1016/j.compbiomed.2011.11.015] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 03/17/2011] [Revised: 09/26/2011] [Accepted: 11/26/2011] [Indexed: 10/14/2022]

Gusev A, Kenny EE, Lowe JK, Salit J, Saxena R, Kathiresan S, Altshuler DM, Friedman JM, Breslow JL, Pe'er I. DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation. Am J Hum Genet 2011;88:706-717. [PMID: 21620352 DOI: 10.1016/j.ajhg.2011.04.023] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 02/21/2011] [Revised: 04/13/2011] [Accepted: 04/26/2011] [Indexed: 02/01/2023] Open

Distance Functions, Clustering Algorithms and Microarray Data Analysis. LECTURE NOTES IN COMPUTER SCIENCE 2010. [DOI: 10.1007/978-3-642-13800-3_10] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 12/24/2022]

Madi A, Hecht I, Bransburg-Zabary S, Merbl Y, Pick A, Zucker-Toledano M, Quintana FJ, Tauber AI, Cohen IR, Ben-Jacob E. Organization of the autoantibody repertoire in healthy newborns and adults revealed by system level informatics of antigen microarray data. Proc Natl Acad Sci U S A 2009;106:14484-9. [PMID: 19667184 PMCID: PMC2732819 DOI: 10.1073/pnas.0901528106] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.7] [Received: 02/18/2009] [Indexed: 11/18/2022] Open

Wang K, Zheng J, Zhang J, Dong J. Estimating the number of clusters via system evolution for cluster analysis of gene expression data. IEEE Trans Inf Technol Biomed 2009;13:848-53. [PMID: 19527960 DOI: 10.1109/titb.2009.2025119] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]

Giancarlo R, Scaturro D, Utro F. Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer. BMC Bioinformatics 2008;9:462. [PMID: 18959783 PMCID: PMC2657801 DOI: 10.1186/1471-2105-9-462] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 09/23/2008] [Accepted: 10/29/2008] [Indexed: 12/04/2022] Open

Abstract

Background

Inferring cluster structure in microarray datasets is a fundamental task for the so-called -omic sciences. It is also a fundamental question in Statistics, Data Analysis and Classification, in particular with regard to the prediction of the number of clusters in a dataset, usually established via internal validation measures. Despite the wealth of internal measures available in the literature, new ones have been recently proposed, some of them specifically for microarray data.

Results

We consider five such measures: Clest, Consensus (Consensus Clustering), FOM (Figure of Merit), Gap (Gap Statistics) and ME (Model Explorer), in addition to the classic WCSS (Within Cluster Sum-of-Squares) and KL (Krzanowski and Lai index). We perform extensive experiments on six benchmark microarray datasets, using both Hierarchical and K-means clustering algorithms, and we provide an analysis assessing both the intrinsic ability of a measure to predict the correct number of clusters in a dataset and its merit relative to the other measures. We pay particular attention both to precision and speed. Moreover, we also provide various fast approximation algorithms for the computation of Gap, FOM and WCSS. The main result is a hierarchy of those measures in terms of precision and speed, highlighting some of their merits and limitations not reported before in the literature.

Conclusion

Based on our analysis, we draw several conclusions for the use of those internal measures on microarray data. We report the main ones. Consensus is by far the best performer in terms of predictive power and remarkably algorithm-independent. Unfortunately, on large datasets, it may be of no use because of its non-trivial computer time demand (weeks on a state-of-the-art PC). FOM is the second best performer although, quite surprisingly, it may not be competitive in this scenario: it has essentially the same predictive power as WCSS but it is from 6 to 100 times slower, depending on the dataset. The approximation algorithms for the computation of FOM, Gap and WCSS perform very well, i.e., they are faster while still granting a very close approximation of FOM and WCSS. The approximation algorithm for the computation of Gap deserves to be singled out since it has a predictive power far better than Gap, it is competitive with the other measures, and it is at least two orders of magnitude faster than Gap. Another important novel conclusion that can be drawn from our analysis is that all the measures we have considered show severe limitations on large datasets, either due to computational demand (Consensus, as already mentioned, Clest and Gap) or to lack of precision (all of the other measures, including their approximations). The software and datasets are available under the GNU GPL on the supplementary material web page.
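
As a rough illustration of the WCSS (Within Cluster Sum-of-Squares) measure named in this abstract, the sketch below computes WCSS for a fixed partition of a few points. It is a minimal example and not the software distributed with the paper; the function name `wcss` and the toy data are assumptions made for illustration only.

```python
def wcss(points, labels):
    """Within-cluster sum of squares: for each cluster, sum the squared
    Euclidean distances of its members to the cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid is the coordinate-wise mean of the cluster members.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = wcss(points, [0, 0, 0, 0])
two_clusters = wcss(points, [0, 0, 1, 1])
print(one_cluster, two_clusters)
```

On this toy data the value falls sharply when moving from one cluster to two, which is the kind of drop one looks for when reading a WCSS curve to guess the number of clusters.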

Discovery and expansion of gene modules by seeking isolated groups in a random graph process. PLoS One 2008;3:e3358. [PMID: 18843375 PMCID: PMC2559867 DOI: 10.1371/journal.pone.0003358] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/21/2008] [Accepted: 09/08/2008] [Indexed: 12/03/2022] Open

Yang CS, Chuang LY, Ke CH, Yang CH. A Combination of Shuffled Frog-Leaping Algorithm and Genetic Algorithm for Gene Selection. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS 2008. [DOI: 10.20965/jaciii.2008.p0218] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]

Kerr G, Ruskin H, Crane M, Doolan P. Techniques for clustering gene expression data. Comput Biol Med 2008;38:283-93. [DOI: 10.1016/j.compbiomed.2007.11.001] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.0] [Received: 07/11/2006] [Revised: 10/26/2007] [Accepted: 11/05/2007] [Indexed: 10/22/2022]

Lalonde S, Ehrhardt DW, Loqué D, Chen J, Rhee SY, Frommer WB. Molecular and cellular approaches for the detection of protein-protein interactions: latest techniques and current limitations. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2008;53:610-635. [PMID: 18269572 DOI: 10.1111/j.1365-313x.2007.03332.x] [Citation(s) in RCA: 109] [Impact Index Per Article: 6.8] [Indexed: 05/25/2023]

Abstract

Homotypic and heterotypic protein interactions are crucial for all levels of cellular function, including architecture, regulation, metabolism, and signaling. Therefore, protein interaction maps represent essential components of post-genomic toolkits needed for understanding biological processes at a systems level. Over the past decade, a wide variety of methods have been developed to detect, analyze, and quantify protein interactions, including surface plasmon resonance spectroscopy, NMR, yeast two-hybrid screens, peptide tagging combined with mass spectrometry and fluorescence-based technologies. Fluorescence techniques range from co-localization of tags, which may be limited by the optical resolution of the microscope, to fluorescence resonance energy transfer-based methods that have molecular resolution and can also report on the dynamics and localization of the interactions within a cell. Proteins interact via highly evolved complementary surfaces with affinities that can vary over many orders of magnitude. Some of the techniques described in this review, such as surface plasmon resonance, provide detailed information on physical properties of these interactions, while others, such as two-hybrid techniques and mass spectrometry, are amenable to high-throughput analysis using robotics. In addition to providing an overview of these methods, this review emphasizes techniques that can be applied to determine interactions involving membrane proteins, including the split ubiquitin system and fluorescence-based technologies for characterizing hits obtained with high-throughput approaches. Mass spectrometry-based methods are covered by a review by Miernyk and Thelen (2008; this issue, pp. 597-609). In addition, we discuss the use of interaction data to construct interaction networks and as the basis for the exciting possibility of using them to predict interaction surfaces.

Liang F, Wang N. Dynamic agglomerative clustering of gene expression profiles. Pattern Recognit Lett 2007. [DOI: 10.1016/j.patrec.2007.01.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]

FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinformatics 2007;8:3. [PMID: 17204155 PMCID: PMC1774579 DOI: 10.1186/1471-2105-8-3] [Citation(s) in RCA: 156] [Impact Index Per Article: 9.2] [Received: 06/12/2006] [Accepted: 01/04/2007] [Indexed: 11/16/2022] Open

Abstract

Background

Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this aim, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process.

Results

The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Distinctive elements of FLAME are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with "archetypal" features named Cluster Supporting Objects, around which to construct the clusters; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, by an iterative converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that data partitions generated by FLAME are not superimposable to those of other methods and, although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance. FLAME is implemented, together with all the above-mentioned algorithms, in a C++ software with graphical interface for Linux and Windows, capable of handling very large datasets, named Gene Expression Data Analysis Studio (GEDAS), freely available under GNU General Public License.

Conclusion

The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the identification of cluster outliers, i.e. genes that are not assigned to any cluster. As a result, clusters are more internally homogeneous and more diverse from each other, and provide better partitioning of biological functions. The clustering algorithm can be easily extended to applications different from gene expression analysis.

Kim KJ, Cho SB. Ensemble classifiers based on correlation analysis for DNA microarray classification. Neurocomputing 2006. [DOI: 10.1016/j.neucom.2006.03.002] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Indexed: 11/27/2022]

Cancer classification using ensemble of neural networks with multiple significant gene subsets. APPL INTELL 2006. [DOI: 10.1007/s10489-006-0020-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 10/23/2022]

Shah S, Kusiak A. Cancer gene search with data-mining and genetic algorithms. Comput Biol Med 2006;37:251-61. [PMID: 16616736 DOI: 10.1016/j.compbiomed.2006.01.007] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Received: 01/07/2005] [Revised: 11/20/2005] [Accepted: 01/24/2006] [Indexed: 12/13/2022]

Di Gesú V, Giancarlo R, Lo Bosco G, Raimondi A, Scaturro D. GenClust: a genetic algorithm for clustering gene expression data. BMC Bioinformatics 2005;6:289. [PMID: 16336639 PMCID: PMC1343581 DOI: 10.1186/1471-2105-6-289] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Received: 05/18/2005] [Accepted: 12/07/2005] [Indexed: 11/24/2022] Open

Abstract

BACKGROUND

Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering.

RESULTS

GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means.

CONCLUSION

Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.

Figueroa A, Borneman J, Jiang T. Clustering binary fingerprint vectors with missing values for DNA array data analysis. J Comput Biol 2005;11:887-901. [PMID: 15700408 DOI: 10.1089/cmb.2004.11.887] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 10/26/2022] Open

Abstract

Oligonucleotide fingerprinting is a powerful DNA array-based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because there may exist unresolved (or missing) values in this binarization process, we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and a natural parameterized version and present an efficient greedy algorithm based on MINIMUM CLIQUE PARTITION on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our preliminary experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes.
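
To make the idea of clustering binary fingerprints with missing values concrete, the sketch below treats two fingerprints as compatible when they agree at every position where both are resolved, and then groups fingerprints so that each group is pairwise compatible (a clique in the compatibility graph). The first-fit grouping shown here is a simple stand-in chosen for illustration, not the greedy algorithm of the paper; the names `compatible` and `first_fit_cliques`, the use of `None` for a missing value, and the sample fingerprints are all assumptions.

```python
def compatible(a, b):
    """Two fingerprints are compatible if they never disagree at a
    position where both values are resolved (None marks a missing value)."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def first_fit_cliques(fingerprints):
    """Place each fingerprint in the first group whose members are all
    compatible with it, so every group stays a clique; otherwise start
    a new group. Returns lists of fingerprint indices."""
    groups = []
    for idx, fp in enumerate(fingerprints):
        for group in groups:
            if all(compatible(fp, fingerprints[j]) for j in group):
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Hypothetical fingerprints over three probes.
fps = [(1, 0, None), (1, None, 1), (0, 0, 1), (None, 0, 1)]
groups = first_fit_cliques(fps)
print(groups)
```

Compatibility is not transitive (fingerprint 3 is compatible with both 0 and 2, which conflict with each other), which is why each candidate is checked against every member of a group rather than a single representative.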

Pancoska P, Moravek Z, Moll UM. Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs. Nucleic Acids Res 2004;32:4630-45. [PMID: 15333695 PMCID: PMC516071 DOI: 10.1093/nar/gkh802] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 12/27/2022] Open

Shah SC, Kusiak A. Data mining and genetic algorithm based gene/SNP selection. Artif Intell Med 2004;31:183-96. [PMID: 15302085 DOI: 10.1016/j.artmed.2004.04.002] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.7] [Received: 08/31/2003] [Revised: 02/07/2004] [Accepted: 04/03/2004] [Indexed: 11/19/2022]

Kim YH, Lee SY, Moon BR. A Genetic Approach for Gene Selection on Microarray Expression Data. GENETIC AND EVOLUTIONARY COMPUTATION – GECCO 2004 2004. [DOI: 10.1007/978-3-540-24854-5_36] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023]

Das M, Harvey I, Chu LL, Sinha M, Pelletier J. Full-length cDNAs: more than just reaching the ends. Physiol Genomics 2001;6:57-80. [PMID: 11459922 DOI: 10.1152/physiolgenomics.2001.6.2.57] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022] Open

A clustering algorithm based on graph connectivity. INFORM PROCESS LETT 2000. [DOI: 10.1016/s0020-0190(00)00142-3] [Citation(s) in RCA: 274] [Impact Index Per Article: 11.4] [Indexed: 11/19/2022]