Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Herwig R, Poustka AJ, Müller C, Bull C, Lehrach H, O'Brien J. Large-scale clustering of cDNA-fingerprinting data. Genome Res 1999;9:1093-105. [PMID: 10568749 PMCID: PMC310829 DOI: 10.1101/gr.9.11.1093] [Citation(s) in RCA: 92] [Impact Index Per Article: 3.7] [Indexed: 11/24/2022]

Cited by Other Article(s)

van Delft JHM, van Agen E, van Breda SGJ, Herwijnen MH, Staal YCM, Kleinjans JCS. Comparison of supervised clustering methods to discriminate genotoxic from non-genotoxic carcinogens by gene expression profiling. Mutat Res 2005;575:17-33. [PMID: 15924884 DOI: 10.1016/j.mrfmmm.2005.02.006] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 10/04/2004] [Revised: 02/17/2005] [Accepted: 02/23/2005] [Indexed: 05/02/2023]

Abstract

Prediction of the toxic properties of chemicals based on modulation of gene expression profiles in exposed cells or animals is one of the major applications of toxicogenomics. Previously, we demonstrated that by Pearson correlation analysis of gene expression profiles from treated HepG2 cells it is possible to correctly discriminate and predict genotoxic from non-genotoxic carcinogens. Since to date many different supervised clustering methods for discrimination and prediction tests are available, we investigated whether application of the methods provided by the Whitehead Institute and Stanford University improved our initial prediction. Four different supervised clustering methods were applied for this comparison, namely Pearson correlation analysis (Pearson), nearest shrunken centroids analysis (NSC), K-nearest neighbour analysis (KNN) and Weighted voting (WV). For each supervised clustering method, three different approaches were followed: (1) using all the data points for all treatments, (2) exclusion of the samples with marginally affected gene expression profiles and (3) filtering out the gene expression signals that were hardly altered. On the complete data set, NSC, KNN and WV outperformed the Pearson test, but on the reduced data sets no clear difference was observed. Exclusion of samples with marginally affected profiles improved the prediction by all methods. For the various prediction models, gene sets of different compositions were selected; in these 27 genes appeared three times or more. These 27 genes are involved in many different biological processes and molecular functions, such as apoptosis, cell cycle control, regulation of transcription, and transporter activity, many of them related to the carcinogenic process. One gene, BAX, was selected in all 10 models, while ZFP36 was selected in 9, and AHR, MT1E and TTR in 8. 
Summarising, this study demonstrates that several supervised clustering methods can be used to discriminate certain genotoxic from non-genotoxic carcinogens by gene expression profiling in vitro in HepG2 cells. None of the methods clearly outperforms the others.
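
As an illustration of the correlation-based prediction mentioned in this abstract, the following is a minimal sketch that assigns a test expression profile to the class whose reference profile it correlates with most strongly. It is not the authors' procedure: the function names, the use of a single reference profile per class, and the toy numbers are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def classify(profile, class_profiles):
    """Return the class label whose reference profile correlates best with `profile`.

    `class_profiles` is a hypothetical mapping from class label to one
    reference expression profile (for example, a per-class mean).
    """
    return max(class_profiles, key=lambda label: pearson(profile, class_profiles[label]))


# Toy reference profiles over four genes (made-up values, not data from the study).
reference = {
    "genotoxic": [1.0, 2.0, 3.0, 4.0],
    "non-genotoxic": [4.0, 3.0, 2.0, 1.0],
}
predicted = classify([0.5, 1.1, 1.4, 2.2], reference)
```

With these toy values the test profile rises across the four genes, so it is assigned to the class whose reference profile also rises.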

Xu R, Wunsch D. Survey of clustering algorithms. IEEE Trans Neural Netw 2005;16:645-78. [PMID: 15940994 DOI: 10.1109/tnn.2005.845141] [Citation(s) in RCA: 1004] [Impact Index Per Article: 52.8] [Indexed: 11/07/2022]

Au WH, Chan KCC, Wong AKC, Wang Y. Attribute clustering for grouping, selection, and classification of gene expression data. IEEE/ACM Trans Comput Biol Bioinform 2005;2:83-101. [PMID: 17044174 DOI: 10.1109/tcbb.2005.17] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Indexed: 05/12/2023]

Abstract

This paper presents an attribute clustering method which is able to group genes based on their interdependence so as to mine meaningful patterns from the gene expression data. It can be used for gene grouping, selection, and classification. The partitioning of a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis. By clustering attributes, the search dimension of a data mining algorithm is reduced. The reduction of search dimension is especially important to data mining in gene expression data because such data typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are typically developed and optimized to scale to the number of tuples instead of the number of attributes. The situation becomes even worse when the number of attributes overwhelms the number of tuples, in which case, the likelihood of reporting patterns that are actually irrelevant due to chances becomes rather high. It is for the aforementioned reasons that gene grouping and selection are important preprocessing steps for many data mining algorithms to be effective when applied to gene expression data. This paper defines the problem of attribute clustering and introduces a methodology to solving it. Our proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. By applying our algorithm to gene expression data, meaningful clusters of genes are discovered. The grouping of genes based on attribute interdependence within group helps to capture different aspects of gene association patterns in each group. Significant genes selected from each group then contain useful information for gene expression classification and identification. 
To evaluate the performance of the proposed approach, we applied it to two well-known gene expression data sets and compared our results with those obtained by other methods. Our experiments show that the proposed method is able to find the meaningful clusters of genes. By selecting a subset of genes which have high multiple-interdependence with others within clusters, significant classification information can be obtained. Thus, a small pool of selected genes can be used to build classifiers with very high classification rate. From the pool, gene expressions of different categories can be identified.

Figueroa A, Borneman J, Jiang T. Clustering binary fingerprint vectors with missing values for DNA array data analysis. J Comput Biol 2005;11:887-901. [PMID: 15700408 DOI: 10.1089/cmb.2004.11.887] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 10/26/2022] Open

Abstract

Oligonucleotide fingerprinting is a powerful DNA array-based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because there may exist unresolved (or missing) values in this binarization process, we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and a natural parameterized version and present an efficient greedy algorithm based on MINIMUM CLIQUE PARTITION on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our preliminary experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes.

Mocellin S, Provenzano M, Rossi CR, Pilati P, Nitti D, Lise M. DNA array-based gene profiling: from surgical specimen to the molecular portrait of cancer. Ann Surg 2005;241:16-26. [PMID: 15621987 PMCID: PMC1356842 DOI: 10.1097/01.sla.0000150157.83537.53] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 11/26/2022]

Ferrazzi F, Magni P, Bellazzi R. Random Walk Models for Bayesian Clustering of Gene Expression Profiles. Appl Bioinformatics 2005;4:263-76. [PMID: 16309344 DOI: 10.2165/00822942-200504040-00006] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/02/2022]

Clustering Gene Expression Series with Prior Knowledge. Lecture Notes in Computer Science 2005. [DOI: 10.1007/11557067_3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 01/07/2023]

Adjaye J. Whole-genome approaches for large-scale gene identification and expression analysis in mammalian preimplantation embryos. Reprod Fertil Dev 2005;17:37-45. [PMID: 15745630 DOI: 10.1071/rd04075] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 08/01/2004] [Accepted: 10/01/2004] [Indexed: 11/23/2022] Open

Liu Y, Navathe SB, Civera J, Dasigi V, Ram A, Ciliax BJ, Dingledine R. Text mining biomedical literature for discovering gene-to-gene relationships: a comparative study of algorithms. IEEE/ACM Trans Comput Biol Bioinform 2005;2:62-76. [PMID: 17044165 DOI: 10.1109/tcbb.2005.14] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 05/12/2023]

Abstract

Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as a basis for clustering in a new approach called BEA-PARTITION in this paper. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and hierarchical clustering algorithm outperformed k-means clustering and self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and self-organizing map. Whereas BEA-PARTITION and the hierarchical clustering produced similar quality of clusters, BEA-PARTITION provides clear cluster boundaries compared to the hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations.
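
The cluster-quality measures named in this abstract (purity and entropy) can be sketched as follows. The abstract does not give the exact formulas the authors used, so this sketch assumes the common textbook definitions: purity is the fraction of items that fall in the majority class of their cluster, and entropy is the size-weighted mean of the per-cluster class entropy. The function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def _group_by_cluster(labels, clusters):
    """Collect the true class labels of the items assigned to each cluster."""
    groups = {}
    for label, cluster in zip(labels, clusters):
        groups.setdefault(cluster, []).append(label)
    return groups


def purity(labels, clusters):
    """Fraction of items belonging to the majority class of their cluster (higher is better)."""
    groups = _group_by_cluster(labels, clusters)
    majority_total = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return majority_total / len(labels)


def entropy(labels, clusters):
    """Size-weighted mean class entropy per cluster, in bits (lower is better)."""
    groups = _group_by_cluster(labels, clusters)
    n = len(labels)
    total = 0.0
    for members in groups.values():
        size = len(members)
        # Shannon entropy of the class distribution inside this cluster.
        h = -sum((k / size) * math.log2(k / size) for k in Counter(members).values())
        total += (size / n) * h
    return total


# Toy example: four genes from two known groups, two candidate clusterings.
true_groups = ["a", "a", "b", "b"]
perfect = [0, 0, 1, 1]   # each cluster holds exactly one group
mixed = [0, 1, 0, 1]     # each cluster holds one gene from each group
```

Under these definitions the perfect clustering has purity 1 and entropy 0, while the fully mixed clustering has purity 0.5 and entropy 1 bit.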

Wang Y, Makedon FS, Ford JC, Pearlman J. HykGene: a hybrid approach for selecting marker genes for phenotype classification using microarray gene expression data. Bioinformatics 2004;21:1530-7. [PMID: 15585531 DOI: 10.1093/bioinformatics/bti192] [Citation(s) in RCA: 132] [Impact Index Per Article: 6.6] [Indexed: 11/14/2022] Open

Illiger J, Herwig R, Steinfath M, Przewieslik T, Elge T, Bull C, Radelof U, Lehrach H, Janitz M. Establishment of T cell-specific and natural killer cell-specific unigene sets: towards high-throughput genomics of leukaemia. Eur J Immunogenet 2004;31:253-7. [PMID: 15548262 DOI: 10.1111/j.1365-2370.2004.00483.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]

Yoo C, Cooper GF. An evaluation of a system that recommends microarray experiments to perform to discover gene-regulation pathways. Artif Intell Med 2004;31:169-82. [PMID: 15219293 DOI: 10.1016/j.artmed.2004.01.018] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 03/10/2003] [Revised: 04/14/2003] [Accepted: 01/16/2004] [Indexed: 11/23/2022]

Abstract

The main topic of this paper is modeling the expected value of experimentation (EVE) for discovering causal pathways in gene expression data. By experimentation we mean both interventions (e.g., a gene knockout experiment) and observations (e.g., passively observing the expression level of a "wild-type" gene). We introduce a system called GEEVE (causal discovery in Gene Expression data using Expected Value of Experimentation), which implements expected value of experimentation in discovering causal pathways using gene expression data. GEEVE provides the following assistance, which is intended to help biologists in their quest to discover gene-regulation pathways: (a) recommending which experiments to perform (with a focus on "knockout" experiments) using an expected value of experimentation method; (b) recommending the number of measurements (observational and experimental) to include in the experimental design, again using an EVE method; and (c) providing a Bayesian analysis that combines prior knowledge with the results of recent microarray experiments to derive posterior probabilities of gene regulation relationships. In recommending which experiments to perform (and how many times to repeat them) the EVE approach considers the biologist's preferences for which genes to focus the discovery process on. Also, since exact EVE calculations are exponential in time, GEEVE incorporates approximation methods. GEEVE is able to combine data from knockout experiments with data from wild-type experiments to suggest additional experiments to perform and then to analyze the results of those experiments. It models the possibility that unmeasured (latent) variables may be responsible for some of the statistical associations among the expression levels of the genes under study. To evaluate the GEEVE system, we used a gene expression simulator to generate data from specified models of gene regulation. The results show that the GEEVE system gives better results than two recently published approaches (1) in learning the generating models of gene regulation and (2) in recommending experiments to perform.

Balasubramaniyan R, Hüllermeier E, Weskamp N, Kämper J. Clustering of gene expression data using a local shape-based similarity measure. Bioinformatics 2004;21:1069-77. [PMID: 15513997 DOI: 10.1093/bioinformatics/bti095] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.1] [Indexed: 11/15/2022] Open

Desper R, Khan J, Schäffer AA. Tumor classification using phylogenetic methods on expression data. J Theor Biol 2004;228:477-96. [PMID: 15178197 DOI: 10.1016/j.jtbi.2004.02.021] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 03/19/2003] [Revised: 02/03/2004] [Accepted: 02/20/2004] [Indexed: 10/26/2022]

Abstract

Tumor classification is a well-studied problem in the field of bioinformatics. Developments in the field of DNA chip design have now made it possible to measure the expression levels of thousands of genes in sample tissue from healthy cell lines or tumors. A number of studies have examined the problems of tumor classification: class discovery, the problem of defining a number of classes of tumors using the data from a DNA chip, and class prediction, the problem of accurately classifying an unknown tumor, given expression data from the unknown tumor and from a learning set. The current work has applied phylogenetic methods to both problems. To solve the class discovery problem, we impose a metric on a set of tumors as a function of their gene expression levels, and impose a tree structure on this metric, using standard tree fitting methods borrowed from the field of phylogenetics. Phylogenetic methods provide a simple way of imposing a clear hierarchical relationship on the data, with branch lengths in the classification tree representing the degree of separation witnessed. We tested our method for class discovery on two data sets: a data set of 87 tissues, comprised mostly of small, round, blue-cell tumors (SRBCTs), and a data set of 22 breast tumors. We fit the 87 samples of the first set to a classification tree, which neatly separated into four major clusters corresponding exactly to the four groups of tumors, namely neuroblastomas, rhabdomyosarcomas, Burkitt's lymphomas, and the Ewing's family of tumors. The classification tree built using the breast cancer data separated tumors with BRCA1 mutations from those with BRCA2 mutations, with sporadic tumors separated from both groups and from each other. We also demonstrate the flexibility of the class discovery method with regard to standard resampling methodology such as jackknifing and noise perturbation. 
To solve the class prediction problem, we built a classification tree on the learning set, and then sought the optimal placement of each test sample within the classification tree. We tested this method on the SRBCT data set, and classified each tumor successfully.

Daub CO, Steuer R, Selbig J, Kloska S. Estimating mutual information using B-spline functions--an improved similarity measure for analysing gene expression data. BMC Bioinformatics 2004;5:118. [PMID: 15339346 PMCID: PMC516800 DOI: 10.1186/1471-2105-5-118] [Citation(s) in RCA: 194] [Impact Index Per Article: 9.7] [Received: 12/15/2003] [Accepted: 08/31/2004] [Indexed: 11/10/2022] Open

Büssow K, Quedenau C, Sievert V, Tischer J, Scheich C, Seitz H, Hieke B, Niesen FH, Götz F, Harttig U, Lehrach H. A catalog of human cDNA expression clones and its application to structural genomics. Genome Biol 2004;5:R71. [PMID: 15345055 PMCID: PMC522878 DOI: 10.1186/gb-2004-5-9-r71] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 04/16/2004] [Revised: 07/21/2004] [Accepted: 07/23/2004] [Indexed: 11/10/2022] Open

Barra V. Analysis of gene expression data using functional principal components. Comput Methods Programs Biomed 2004;75:1-9. [PMID: 15158042 DOI: 10.1016/j.cmpb.2003.08.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/23/2002] [Revised: 08/28/2003] [Accepted: 08/28/2003] [Indexed: 05/24/2023]

Tsai HK, Yang JM, Tsai YF, Kao CY. An evolutionary approach for gene expression patterns. IEEE Trans Inf Technol Biomed 2004;8:69-78. [PMID: 15217251 DOI: 10.1109/titb.2004.826713] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]

Poustka AJ, Groth D, Hennig S, Thamm S, Cameron A, Beck A, Reinhardt R, Herwig R, Panopoulou G, Lehrach H. Generation, annotation, evolutionary analysis, and database integration of 20,000 unique sea urchin EST clusters. Genome Res 2004;13:2736-46. [PMID: 14656975 PMCID: PMC403816 DOI: 10.1101/gr.1674103] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Indexed: 11/24/2022]

Jiang D, Pei J, Zhang A. Towards interactive exploration of gene expression patterns. ACM SIGKDD Explor Newsl 2003. [DOI: 10.1145/980972.980983] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]

Xu D, Olman V, Wang L, Xu Y. EXCAVATOR: a computer program for efficiently mining gene expression data. Nucleic Acids Res 2003;31:5582-9. [PMID: 14500821 PMCID: PMC206478 DOI: 10.1093/nar/gkg783] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Received: 06/22/2003] [Revised: 08/01/2003] [Accepted: 08/18/2003] [Indexed: 11/14/2022] Open

Katagiri F, Glazebrook J. Local Context Finder (LCF) reveals multidimensional relationships among mRNA expression profiles of Arabidopsis responding to pathogen infection. Proc Natl Acad Sci U S A 2003;100:10842-7. [PMID: 12960373 PMCID: PMC196890 DOI: 10.1073/pnas.1934349100] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022] Open

Kato N, Kobayashi T, Honda H. Screening of stress enhancer based on analysis of gene expression profiles: enhancement of hyperthermia-induced tumor necrosis by an MMP-3 inhibitor. Cancer Sci 2003;94:644-9. [PMID: 12841876 PMCID: PMC11160297 DOI: 10.1111/j.1349-7006.2003.tb01497.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.9] [Received: 03/11/2003] [Revised: 05/01/2003] [Accepted: 05/06/2003] [Indexed: 11/29/2022] Open

Panopoulou G, Hennig S, Groth D, Krause A, Poustka AJ, Herwig R, Vingron M, Lehrach H. New evidence for genome-wide duplications at the origin of vertebrates using an amphioxus gene set and completed animal genomes. Genome Res 2003;13:1056-66. [PMID: 12799346 PMCID: PMC403660 DOI: 10.1101/gr.874803] [Citation(s) in RCA: 129] [Impact Index Per Article: 6.1] [Indexed: 01/28/2023]

Bicciato S, Pandin M, Didonè G, Di Bello C. Pattern identification and classification in gene expression data using an autoassociative neural network model. Biotechnol Bioeng 2003;81:594-606. [PMID: 12514809 DOI: 10.1002/bit.10505] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]

Abstract

The application of DNA microarray technology for analysis of gene expression creates enormous opportunities to accelerate the pace in understanding living systems and identification of target genes and pathways for drug development and therapeutic intervention. Parallel monitoring of the expression profiles of thousands of genes seems particularly promising for a deeper understanding of cancer biology and the identification of molecular signatures supporting the histological classification schemes of neoplastic specimens. However, the increasing volume of data generated by microarray experiments poses the challenge of developing equally efficient methods and analysis procedures to extract, interpret, and upgrade the information content of these databases. Herein, a computational procedure for pattern identification, feature extraction, and classification of gene expression data through the analysis of an autoassociative neural network model is described. The identified patterns and features contain critical information about gene-phenotype relationships observed during changes in cell physiology. They represent a rational and dimensionally reduced base for understanding the basic biology of the onset of diseases, defining targets of therapeutic intervention, and developing diagnostic tools for the identification and classification of pathological states. The proposed method has been tested on two different microarray datasets-Golub's analysis of acute human leukemia [Golub et al. (1999) Science 286:531-537], and the human colon adenocarcinoma study presented by Alon et al. [1999; Proc Natl Acad Sci USA 97:10101-10106]. The analysis of the neural network internal structure allows the identification of specific phenotype markers and the extraction of peculiar associations among genes and physiological states. 
At the same time, the neural network outputs provide assignment to multiple classes, such as different pathological conditions or tissue samples, for previously unseen instances.

Peterson LE. Partitioning large-sample microarray-based gene expression profiles using principal components analysis. Comput Methods Programs Biomed 2003;70:107-19. [PMID: 12507787 DOI: 10.1016/s0169-2607(02)00009-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.4] [Indexed: 05/24/2023]

Sawa T, Ohno-Machado L. A neural network-based similarity index for clustering DNA microarray data. Comput Biol Med 2003;33:1-15. [PMID: 12485626 DOI: 10.1016/s0010-4825(02)00032-x] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.9] [Indexed: 11/17/2022]

Sharan R, Elkon R, Shamir R. Cluster analysis and its applications to gene expression data. Ernst Schering Res Found Workshop 2002:83-108. [PMID: 12061008 DOI: 10.1007/978-3-662-04747-7_5] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]

Herwig R, Schulz B, Weisshaar B, Hennig S, Steinfath M, Drungowski M, Stahl D, Wruck W, Menze A, O'Brien J, Lehrach H, Radelof U. Construction of a 'unigene' cDNA clone set by oligonucleotide fingerprinting allows access to 25 000 potential sugar beet genes. Plant J 2002;32:845-57. [PMID: 12472698 DOI: 10.1046/j.1365-313x.2002.01457.x] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 05/23/2023]

Fuchs T, Malecova B, Linhart C, Sharan R, Khen M, Herwig R, Shmulevich D, Elkon R, Steinfath M, O'Brien JK, Radelof U, Lehrach H, Lancet D, Shamir R. DEFOG: a practical scheme for deciphering families of genes. Genomics 2002;80:295-302. [PMID: 12213199 DOI: 10.1006/geno.2002.6830] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]

Herrero J, Dopazo J. Combining hierarchical clustering and self-organizing maps for exploratory analysis of gene expression patterns. J Proteome Res 2002;1:467-70. [PMID: 12645919 DOI: 10.1021/pr025521v] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.0] [Indexed: 02/08/2023]

Ramaswamy S, Nakamura N, Sansal I, Bergeron L, Sellers WR. A novel mechanism of gene regulation and tumor suppression by the transcription factor FKHR. Cancer Cell 2002;2:81-91. [PMID: 12150827 DOI: 10.1016/s1535-6108(02)00086-7] [Citation(s) in RCA: 333] [Impact Index Per Article: 15.1] [Indexed: 12/14/2022]

Kettman JR, Coleclough C, Frey JR, Lefkovits I. Clonal proteomics: one gene - family of proteins. Proteomics 2002;2:624-31. [PMID: 12112841 DOI: 10.1002/1615-9861(200206)2:6<624::aid-prot624>3.0.co;2-i] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]

Li MD, Konu O, Kane JK, Becker KG. Microarray technology and its application on nicotine research. Mol Neurobiol 2002;25:265-85. [PMID: 12109875 DOI: 10.1385/mn:25:3:265] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]

Fraley C, Raftery AE. Model-Based Clustering, Discriminant Analysis, and Density Estimation. J Am Stat Assoc 2002. [DOI: 10.1198/016214502760047131] [Citation(s) in RCA: 2601] [Impact Index Per Article: 118.2] [Indexed: 11/21/2022]

Hess KR, Zhang W, Baggerly KA, Stivers DN, Coombes KR. Microarrays: handling the deluge of data and extracting reliable information. Trends Biotechnol 2001;19:463-8. [PMID: 11602311 DOI: 10.1016/s0167-7799(01)01792-9] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.3] [Indexed: 02/04/2023]

Wu TD. Analysing gene expression data from DNA microarrays to identify candidate genes. J Pathol 2001;195:53-65. [PMID: 11568891 DOI: 10.1002/1096-9896(200109)195:1<53::aid-path891>3.0.co;2-h] [Citation(s) in RCA: 89] [Impact Index Per Article: 3.9] [Indexed: 11/05/2022]

Clark MD, Hennig S, Herwig R, Clifton SW, Marra MA, Lehrach H, Johnson SL. An oligonucleotide fingerprint normalized and expressed sequence tag characterized zebrafish cDNA library. Genome Res 2001;11:1594-602. [PMID: 11544204 PMCID: PMC311136 DOI: 10.1101/gr.186901] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.7] [Indexed: 01/23/2023]

Konu Ö, Kane JK, Barrett T, Vawter MP, Chang R, Ma JZ, Donovan DM, Sharp B, Becker KG, Li MD. Region-specific transcriptional response to chronic nicotine in rat brain. Brain Res 2001;909:194-203. [PMID: 11478936 PMCID: PMC3098570 DOI: 10.1016/s0006-8993(01)02685-3] [Citation(s) in RCA: 81] [Impact Index Per Article: 3.5] [Indexed: 11/22/2022]

Cahill DJ. Protein and antibody arrays and their medical applications. J Immunol Methods 2001;250:81-91. [PMID: 11251223 DOI: 10.1016/s0022-1759(01)00325-8] [Citation(s) in RCA: 216] [Impact Index Per Article: 9.4] [Indexed: 11/18/2022]

Dopazo J, Zanders E, Dragoni I, Amphlett G, Falciani F. Methods and approaches in the analysis of gene expression data. J Immunol Methods 2001;250:93-112. [PMID: 11251224 DOI: 10.1016/s0022-1759(01)00307-6] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.5] [Indexed: 01/04/2023]

Cahill DJ. Protein arrays: a high-throughput solution for proteomics research? Trends Biotechnol 2000. [DOI: 10.1016/s0167-7799(00)00006-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]