1
Han L, Yang G, Dai H, Yang H, Xu B, Li H, Long H, Li Z, Yang X, Zhao C. Combining self-organizing maps and biplot analysis to preselect maize phenotypic components based on UAV high-throughput phenotyping platform. Plant Methods 2019; 15:57. [PMID: 31149023 PMCID: PMC6537385 DOI: 10.1186/s13007-019-0444-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/29/2019] [Accepted: 05/22/2019] [Indexed: 05/31/2023]
Abstract
BACKGROUND With environmental deterioration, natural resource scarcity, and rapid population growth, mankind is facing severe global food security problems. To meet future needs, it is necessary to accelerate progress in breeding new varieties with high yield and strong resistance. However, traditional phenotypic screening methods are destructive, inefficient, low-dimensional, labor-intensive, and cumbersome, which seriously hinders the development of field breeding. Breeders urgently need a high-throughput technique for acquiring and evaluating phenotypic data that can efficiently screen out excellent phenotypic traits from large-scale genotype populations. RESULTS In the present study, we used an unmanned aerial vehicle (UAV) high-throughput phenotyping (HTP) platform to collect RGB and multispectral images for a breeding program and acquired multiple phenotypic components (or traits), such as plant height, normalized difference vegetation index, biomass accumulation, plant-height growth rate, lodging, and leaf color. By implementing self-organizing maps and principal component analysis biplots to establish a phenotypic map and similarity, we proposed a UAV-assisted HTP framework for preselecting maize (Zea mays L.) phenotypic components (or traits). CONCLUSIONS This framework gives breeders additional information to allow them to quickly identify and preselect plants that have genotypes conferring desirable phenotypic components out of thousands of field plots. The present study also demonstrates that remote sensing is a powerful tool with which to acquire abundant phenotypic components. By using these rich phenotypic components, breeders should be able to more effectively identify and select superior genotypes.
Affiliation(s)
- Liang Han
- Key Laboratory of Quantitative Remote Sensing in Agriculture of Ministry of Agriculture, Beijing Research Center for Information Technology in Agriculture, Beijing, 100097 China
- College of Architecture and Geomatics Engineering, Shanxi Datong University, Datong, 037003 China
- College of Geoscience and Surveying Engineering, China University of Mining and Technology (Beijing), Beijing, 100083 China

- Guijun Yang
- Key Laboratory of Quantitative Remote Sensing in Agriculture of Ministry of Agriculture, Beijing Research Center for Information Technology in Agriculture, Beijing, 100097 China

- Huayang Dai
- College of Geoscience and Surveying Engineering, China University of Mining and Technology (Beijing), Beijing, 100083 China

- Hao Yang
- Key Laboratory of Quantitative Remote Sensing in Agriculture of Ministry of Agriculture, Beijing Research Center for Information Technology in Agriculture, Beijing, 100097 China
- National Engineering Research Center for Information Technology in Agriculture, Beijing, 100097 China

- Bo Xu
- Key Laboratory of Quantitative Remote Sensing in Agriculture of Ministry of Agriculture, Beijing Research Center for Information Technology in Agriculture, Beijing, 100097 China

- Heli Li
- National Engineering Research Center for Information Technology in Agriculture, Beijing, 100097 China

- Huiling Long
- Key Laboratory of Quantitative Remote Sensing in Agriculture of Ministry of Agriculture, Beijing Research Center for Information Technology in Agriculture, Beijing, 100097 China
- National Engineering Research Center for Information Technology in Agriculture, Beijing, 100097 China

- Zhenhai Li
- National Engineering Research Center for Information Technology in Agriculture, Beijing, 100097 China

- Xiaodong Yang
- Key Laboratory of Quantitative Remote Sensing in Agriculture of Ministry of Agriculture, Beijing Research Center for Information Technology in Agriculture, Beijing, 100097 China
- National Engineering Research Center for Information Technology in Agriculture, Beijing, 100097 China

- Chunjiang Zhao
- Key Laboratory of Quantitative Remote Sensing in Agriculture of Ministry of Agriculture, Beijing Research Center for Information Technology in Agriculture, Beijing, 100097 China
- National Engineering Research Center for Information Technology in Agriculture, Beijing, 100097 China
2
Ribas L, Robledo D, Gómez-Tato A, Viñas A, Martínez P, Piferrer F. Comprehensive transcriptomic analysis of the process of gonadal sex differentiation in the turbot (Scophthalmus maximus). Mol Cell Endocrinol 2016; 422:132-149. [PMID: 26586209 DOI: 10.1016/j.mce.2015.11.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 07/09/2015] [Revised: 11/03/2015] [Accepted: 11/03/2015] [Indexed: 10/22/2022]
Abstract
The turbot is a flatfish with a ZW/ZZ sex determination system but with a still unknown sex determining gene(s), and with a marked sexual growth dimorphism in favor of females. To better understand sexual development in turbot we sampled young turbot encompassing the whole process of gonadal differentiation and conducted a comprehensive transcriptomic study on its sex differentiation using a validated custom oligomicroarray. Also, the expression profiles of 18 canonical reproduction-related genes were studied along gonad development. The expression levels of gonadal aromatase cyp19a1a alone at three months of age allowed the accurate and early identification of sex before the first signs of histological differentiation. A total of 56 differentially expressed genes (DEG) that had not previously been related to sex differentiation in fish were identified within the first three months of age, of which 44 were associated with ovarian differentiation (e.g., cd98, gpd1 and cry2), and 12 with testicular differentiation (e.g., ace, capn8 and nxph1). To identify putative sex determining genes, ~4,000 DEG in juvenile gonads were mapped and their positions compared with those of previously identified sex- and growth-related quantitative trait loci (QTL). Although no genes mapped to the previously identified sex-related QTLs, two of the canonical reproduction-related genes (foxl2 and 17βhsd) mapped to growth-QTLs in linkage group (LG) 15 and LG6, respectively, suggesting that these genes are related to the growth dimorphism in this species.
Affiliation(s)
- L Ribas
- Institut de Ciències del Mar, Consejo Superior de Investigaciones Científicas (CSIC), 08003, Barcelona, Spain

- D Robledo
- Departamento de Genética, Facultad de Veterinaria, Universidad de Santiago de Compostela, 27002, Lugo, Spain

- A Gómez-Tato
- Departamento de Matemática Aplicada, Facultad de Matemáticas, Universidad de Santiago de Compostela, 15781, Santiago de Compostela, Spain

- A Viñas
- Departamento de Genética, Facultad de Veterinaria, Universidad de Santiago de Compostela, 27002, Lugo, Spain

- P Martínez
- Departamento de Genética, Facultad de Veterinaria, Universidad de Santiago de Compostela, 27002, Lugo, Spain

- F Piferrer
- Institut de Ciències del Mar, Consejo Superior de Investigaciones Científicas (CSIC), 08003, Barcelona, Spain.
3
Yang R, Du Z, Han Y, Zhou L, Song Y, Zhou D, Cui Y. Omics strategies for revealing Yersinia pestis virulence. Front Cell Infect Microbiol 2012; 2:157. [PMID: 23248778 PMCID: PMC3521224 DOI: 10.3389/fcimb.2012.00157] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 08/29/2012] [Accepted: 11/27/2012] [Indexed: 01/12/2023]
Abstract
Omics has remarkably changed the way we investigate and understand life. Omics differs from traditional hypothesis-driven research because it is a discovery-driven approach. Mass datasets produced from omics-based studies require experts from different fields to reveal the salient features behind these data. In this review, we summarize omics-driven studies that reveal the virulence features of Yersinia pestis through genomics, transcriptomics, proteomics, interactomics, etc. These studies serve as foundations for further hypothesis-driven research and help us gain insight into Y. pestis pathogenesis.
Affiliation(s)
- Ruifu Yang
- Beijing Institute of Microbiology and Epidemiology, Beijing, China.
4
Linghu C, Zheng H, Zhang L, Zhang J. Discovering common combinatorial histone modification patterns in the human genome. Gene 2012; 518:171-8. [PMID: 23235118 DOI: 10.1016/j.gene.2012.11.038] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/30/2012] [Accepted: 11/27/2012] [Indexed: 01/08/2023]
Abstract
Histone modifications play a crucial role in regulating gene expression and cell lineage determination and maintenance at the epigenetic level. To systematically investigate this phenomenon, this paper presents a statistical hybrid clustering algorithm to identify common combinatorial histone modification patterns. We applied the algorithm to 39 histone modification marks in human CD4+ T cells and detected 854 common combinatorial histone modification patterns. Our results cover 211 (76.17%) of the 277 patterns identified by tandem mass spectrometry experiments. Based on frequency statistical analysis, we found that the co-occurrence frequencies of 20 backbone modifications are greater than or close to 0.2 in the 854 patterns. We also found that 15 modifications (H2BK120ac, H4K91ac, H2BK20ac, etc.), three histone acetylations (H2AK9ac, H4K16ac, and H4K12ac) and five histone methylations (H3K79me1, H3K79me2, H3K79me3, H4K20me1, and H2BK5me1) were most likely to coexist, respectively, in these patterns. In addition, we found that DNA methylation tends to combine with histone acetylation rather than histone methylation.
Affiliation(s)
- Changgui Linghu
- School of Life Science, Beijing Institute of Technology, Beijing, China
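
The abstract above reports co-occurrence frequencies of modification sets across the detected patterns but does not spell out the calculation. A minimal sketch, assuming the frequency is simply the fraction of patterns that contain every mark in a given set, might look like this. The function name `cooccurrence_frequency` and the toy patterns are hypothetical and are not data from the paper.

```python
def cooccurrence_frequency(patterns, marks):
    """Return the fraction of patterns that contain every mark in `marks`.

    `patterns` is a list of sets of modification names (assumed layout).
    """
    wanted = set(marks)
    hits = sum(1 for pattern in patterns if wanted <= set(pattern))
    return hits / len(patterns)


# Toy patterns for illustration only (hypothetical, not from the paper).
patterns = [
    {"H3K79me1", "H3K79me2", "H4K20me1"},
    {"H3K79me1", "H3K79me2"},
    {"H2AK9ac", "H4K16ac"},
    {"H3K79me1", "H4K16ac"},
]

# Two of the four toy patterns contain both methylation marks.
freq = cooccurrence_frequency(patterns, ["H3K79me1", "H3K79me2"])
```

Under this assumed definition, a set of marks would count as a frequent combination when `freq` reaches a chosen cutoff such as the 0.2 mentioned in the abstract.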
5
Lima-Silva V, Rosado A, Amorim-Silva V, Muñoz-Mérida A, Pons C, Bombarely A, Trelles O, Fernández-Muñoz R, Granell A, Valpuesta V, Botella MÁ. Genetic and genome-wide transcriptomic analyses identify co-regulation of oxidative response and hormone transcript abundance with vitamin C content in tomato fruit. BMC Genomics 2012; 13:187. [PMID: 22583865 PMCID: PMC3462723 DOI: 10.1186/1471-2164-13-187] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 12/07/2011] [Accepted: 04/25/2012] [Indexed: 12/02/2022]
Abstract
Background L-ascorbic acid (AsA; vitamin C) is essential for all living plants where it functions as the main hydrosoluble antioxidant. It has diverse roles in the regulation of plant cell growth and expansion, photosynthesis, and hormone-regulated processes. AsA is also an essential component of the human diet, tomato fruit being one of the main sources of this vitamin. To identify genes responsible for AsA content in tomato fruit, transcriptomic studies followed by clustering analysis were applied to two groups of fruits with contrasting AsA content. These fruits were identified after AsA profiling of an F8 Recombinant Inbred Line (RIL) population generated from a cross between the domesticated species Solanum lycopersicum and the wild relative Solanum pimpinellifolium. Results We found large variability in AsA content within the RIL population, with individual RILs showing up to a 4-fold difference in AsA content. Transcriptomic analysis identified genes whose expression correlated either positively (PVC genes) or negatively (NVC genes) with the AsA content of the fruits. Cluster analysis using SOTA allowed the identification of subsets of co-regulated genes mainly involved in hormone signaling, such as ethylene, ABA, gibberellin and auxin, rather than any of the known AsA biosynthetic genes. Data mining of the corresponding PVC and NVC orthologs in Arabidopsis databases identified flagellin and other ROS-producing processes as cues resulting in differential regulation of a high percentage of the genes from both groups of co-regulated genes; more specifically, 26.6% of the orthologous PVC genes and 15.5% of the orthologous NVC genes were induced and repressed, respectively, under flagellin22 treatment in Arabidopsis thaliana. Conclusion The results reported here indicate that the content of AsA in red tomato fruit from our selected RILs is not correlated with the expression of genes involved in its biosynthesis. On the contrary, the data presented here support the view that AsA content in tomato fruit is co-regulated with genes involved in hormone signaling and depends on the oxidative status of the fruit.
Affiliation(s)
- Viviana Lima-Silva
- Departamento Biología Molecular y Bioquímica, Instituto de Hortofruticultura Subtropical y Mediterránea, Universidad de Málaga-Consejo Superior de Investigaciones Científicas, Universidad de Málaga, 29071, Málaga, Spain
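
The abstract above groups genes by whether their expression correlates positively (PVC) or negatively (NVC) with fruit AsA content. It does not state which correlation measure or cutoff was used, so the following is only a sketch that assumes Pearson correlation and a hypothetical threshold of 0.8. The gene names and values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def classify_genes(expression, asa_content, threshold=0.8):
    """Split genes into PVC (r >= threshold) and NVC (r <= -threshold).

    `threshold` is an assumed cutoff, not a value taken from the paper.
    """
    groups = {"PVC": [], "NVC": []}
    for gene, values in expression.items():
        r = pearson(values, asa_content)
        if r >= threshold:
            groups["PVC"].append(gene)
        elif r <= -threshold:
            groups["NVC"].append(gene)
    return groups


# Invented toy data: AsA content per line and expression per gene.
asa_content = [1.0, 2.0, 3.0, 4.0]
expression = {
    "geneA": [2.0, 4.0, 6.0, 8.0],  # rises with AsA content
    "geneB": [8.0, 6.0, 4.0, 2.0],  # falls with AsA content
    "geneC": [1.0, 3.0, 1.0, 3.0],  # weak relationship, left unclassified
}
groups = classify_genes(expression, asa_content)
```

Genes that fall between the two cutoffs are left out of both groups, which mirrors the idea of keeping only clearly correlated genes for the later clustering step.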
6
Mendoza-Parra MA, Sankar M, Walia M, Gronemeyer H. POLYPHEMUS: R package for comparative analysis of RNA polymerase II ChIP-seq profiles by non-linear normalization. Nucleic Acids Res 2011; 40:e30. [PMID: 22156059 PMCID: PMC3287170 DOI: 10.1093/nar/gkr1205] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
Abstract
Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) is increasingly used to map protein–chromatin interactions at global scale. The comparison of ChIP-seq profiles for RNA polymerase II (PolII) established in different biological contexts, such as specific developmental stages or specific time-points during cell differentiation, provides not only information about the presence/accumulation of PolII at transcription start sites (TSSs) but also about functional features of transcription, including PolII stalling, pausing and transcript elongation. However, annotation and normalization tools for comparative studies of multiple samples are currently missing. Here, we describe the R-package POLYPHEMUS, which integrates TSS annotation with PolII enrichment over TSSs and coding regions, and normalizes signal intensity profiles. POLYPHEMUS thereby facilitates the extraction of information about global PolII action to reveal changes in the functional state of genes. We validated POLYPHEMUS using a kinetic study on retinoic acid-induced differentiation and a publicly available data set from a comparative PolII ChIP-seq profiling in Caenorhabditis elegans. We demonstrate that POLYPHEMUS corrects the data sets by normalizing for technical variation between samples and reveal the potential of the algorithm in comparing multiple data sets to infer features of transcription regulation from dynamic PolII binding profiles.
Affiliation(s)
- Marco A Mendoza-Parra
- Department of Cancer Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire/CNRS/INSERM/Université de Strasbourg, BP 10142, 67404 Illkirch Cedex, France.
7
Senf A, Chen XW. Identification of genes involved in the same pathways using a Hidden Markov Model-based approach. Bioinformatics 2009; 25:2945-54. [DOI: 10.1093/bioinformatics/btp521] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
8
Tsigelny I, Kouznetsova V, Sweeney DE, Wu W, Bush KT, Nigam SK. Analysis of metagene portraits reveals distinct transitions during kidney organogenesis. Sci Signal 2008; 1:ra16. [PMID: 19066399 PMCID: PMC3016920 DOI: 10.1126/scisignal.1163630] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022]
Abstract
Organogenesis is a multistage process, but it has been difficult, by conventional analysis, to separate stages and identify points of transition in developmentally complex organs or define genetic pathways that regulate pattern formation. We performed a detailed time-series examination of global gene expression during kidney development and then represented the resulting data as self-organizing maps (SOMs), which reduced more than 30,000 genes to 650 metagenes. Further clustering of these maps identified potential stages of development and suggested points of stability and transition during kidney organogenesis that are not obvious from either standard morphological analyses or conventional microarray clustering algorithms. We also performed entropy calculations of SOMs generated for each day of development and found correlations with morphometric parameters and expression of candidate genes that may help in orchestrating the transitions between stages of kidney development, as well as macro- and micropatterning of the organ.
Affiliation(s)
- Igor Tsigelny
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093–0505, USA
- San Diego Supercomputer Center, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0505, USA

- Valentina Kouznetsova
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA

- Derina E. Sweeney
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA

- Wei Wu
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA

- Kevin T. Bush
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA

- Sanjay K. Nigam
- Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
- Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
- John and Rebecca Moores UCSD Cancer Center, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
9
Rubins KH, Hensley LE, Bell GW, Wang C, Lefkowitz EJ, Brown PO, Relman DA. Comparative analysis of viral gene expression programs during poxvirus infection: a transcriptional map of the vaccinia and monkeypox genomes. PLoS One 2008; 3:e2628. [PMID: 18612436 PMCID: PMC2440811 DOI: 10.1371/journal.pone.0002628] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Received: 09/24/2007] [Accepted: 05/02/2008] [Indexed: 11/19/2022]
Abstract
BACKGROUND Poxviruses engage in a complex and intricate dialogue with host cells as part of their strategy for replication. However, relatively little molecular detail is available with which to understand the mechanisms behind this dialogue. METHODOLOGY/PRINCIPAL FINDINGS We designed a specialized microarray that contains probes specific to all predicted ORFs in the Monkeypox Zaire (MPXV) and Vaccinia Western Reserve (VACV) genomes, as well as >18,000 human genes, and used this tool to characterize MPXV and VACV gene expression responses in vitro during the course of primary infection of human monocytes, primary human fibroblasts and HeLa cells. The two viral transcriptomes show distinct features of temporal regulation and species-specific gene expression, and provide an early foundation for understanding global gene expression responses during poxvirus infection. CONCLUSIONS/SIGNIFICANCE The results provide a temporal map of the transcriptome of each virus during infection, enabling us to compare viral gene expression across species, and classify expression patterns of previously uncharacterized ORFs.
Affiliation(s)
- Kathleen H Rubins
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, United States of America.
10
Dogini DB, Ribeiro PAO, Rocha C, Pereira TC, Lopes-Cendes I. MicroRNA expression profile in murine central nervous system development. J Mol Neurosci 2008; 35:331-7. [PMID: 18452032 DOI: 10.1007/s12031-008-9068-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 02/20/2008] [Accepted: 03/13/2008] [Indexed: 11/25/2022]
Abstract
MicroRNAs (miRNAs) regulate gene expression in a post-transcriptional sequence-specific manner. In order to better understand the possible roles of miRNAs in central nervous system (CNS) development, we examined the expression profile of 104 miRNAs during murine brain development. We obtained brain samples from animals at embryonic days (E) E15, E17, and postnatal days (P) P1 and P7. Total RNA was isolated from tissue and used to obtain mature miRNAs by reverse transcription. Our results indicate that there is a group of 12 miRNAs that show a distinct expression profile, with the highest expression during embryonic stages and decreasing significantly during development. This profile suggests key roles in processes occurring during early CNS development.
Affiliation(s)
- Danyella B Dogini
- Department of Medical Genetics, Faculty of Medical Sciences, University of Campinas-UNICAMP, Tessália Vieira de Camargo 126, Campinas, Sao Paulo, Brazil
11
Wang H, Zheng H, Azuaje F. Poisson-based self-organizing feature maps and hierarchical clustering for serial analysis of gene expression data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2007; 4:163-75. [PMID: 17473311 DOI: 10.1109/tcbb.2007.070204] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 05/15/2023]
Abstract
Serial analysis of gene expression (SAGE) is a powerful technique for global gene expression profiling, allowing simultaneous analysis of thousands of transcripts without prior structural and functional knowledge. Pattern discovery and visualization have become fundamental approaches to analyzing such large-scale gene expression data. From the pattern discovery perspective, clustering techniques have received great attention. However, due to the statistical nature of SAGE data (i.e., underlying distribution), traditional clustering techniques may not be suitable for SAGE data analysis. Based on the adaptation and improvement of Self-Organizing Maps and hierarchical clustering techniques, this paper presents two new clustering algorithms, namely, PoissonS and PoissonHC, for SAGE data analysis. Tested on synthetic and experimental SAGE data, these algorithms demonstrate several advantages over traditional pattern discovery techniques. The results indicate that, by incorporating statistical properties of SAGE data, PoissonS and PoissonHC, as well as a hybrid approach (neuro-hierarchical approach) based on the combination of PoissonS and PoissonHC, offer significant improvements in pattern discovery and visualization for SAGE data. Moreover, a user-friendly platform, which may improve and accelerate SAGE data mining, was implemented. The system is freely available on request from the authors for nonprofit use.
Affiliation(s)
- Haiying Wang
- School of Computing and Mathematics, University of Ulster, Jordanstown, Northern Ireland, UK.
12
Belacel N, Wang Q, Cuperlovic-Culf M. Clustering methods for microarray gene expression data. OMICS: A Journal of Integrative Biology 2007; 10:507-31. [PMID: 17233561 DOI: 10.1089/omi.2006.10.507] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022]
Abstract
Within the field of genomics, microarray technologies have become a powerful technique for simultaneously monitoring the expression patterns of thousands of genes under different sets of conditions. A main task now is to propose analytical methods to identify groups of genes that manifest similar expression patterns and are activated by similar conditions. The corresponding analysis problem is to cluster multi-condition gene expression data. The purpose of this paper is to present a general view of clustering techniques used in microarray gene expression data analysis.
Affiliation(s)
- Nabil Belacel
- National Research Council Canada, Institute for Information Technology, Scientific Park, Moncton, New Brunswick, Canada.
13
Meunier B, Dumas E, Piec I, Béchet D, Hébraud M, Hocquette JF. Assessment of Hierarchical Clustering Methodologies for Proteomic Data Mining. J Proteome Res 2006; 6:358-66. [PMID: 17203979 DOI: 10.1021/pr060343h] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Indexed: 11/28/2022]
Abstract
Hierarchical clustering methodology is a powerful data mining approach for a first exploration of proteomic data. It enables samples or proteins to be grouped blindly according to their expression profiles. Nevertheless, the clustering results depend on parameters such as data preprocessing, between-profile similarity measurement, and the dendrogram construction procedure. We assessed several clustering strategies by calculating the F-measure, a widely used quality metric. The combination, on a log-transformed matrix, of Pearson correlation and Ward's method for data aggregation is among the best clustering strategies, at least with the data sets we studied. This study was carried out using PermutMatrix, freely available software derived from transcriptomics.
Affiliation(s)
- Bruno Meunier
- UR 1213, Unité de Recherches sur les Herbivores, Equipe Croissance et Métabolisme du Muscle, INRA de Clermont-Ferrand/Theix, F-63122 [corrected] Saint-Genès Champanelle, France.
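
The abstract above scores clustering strategies with the F-measure but does not give the formula. One common definition, assumed here, takes for each reference class the best harmonic mean of precision and recall over all clusters, then averages these values weighted by class size. The sketch below follows that assumed definition. The function name `clustering_f_measure` and the toy labels are hypothetical.

```python
from collections import Counter


def clustering_f_measure(classes, clusters):
    """Class-size-weighted best-match F-measure of a clustering.

    `classes` holds the reference label of each item and `clusters`
    holds the cluster assigned to the same item (assumed layout).
    """
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(classes, clusters))

    total = 0.0
    for label, class_size in class_sizes.items():
        best = 0.0
        for cluster, cluster_size in cluster_sizes.items():
            shared = overlap[(label, cluster)]
            if shared == 0:
                continue
            precision = shared / cluster_size
            recall = shared / class_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (class_size / n) * best
    return total


# Toy labels for illustration only.
reference = ["a", "a", "b", "b"]
perfect = clustering_f_measure(reference, [0, 0, 1, 1])
imperfect = clustering_f_measure(reference, [0, 0, 0, 1])
```

A clustering that reproduces the reference classes exactly scores 1.0, and merging or splitting classes lowers the score, which is what makes the measure usable for ranking strategies such as different distance and linkage combinations.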
14
Abstract
BACKGROUND DNA microarray technology is an innovative methodology in experimental molecular biology, which has produced huge amounts of valuable gene expression profile data. Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. The evaluation of feasible and applicable clustering algorithms is becoming an important issue in today's bioinformatics research. RESULTS In this paper we first experimentally study three major clustering algorithms, Hierarchical Clustering (HC), Self-Organizing Map (SOM), and Self Organizing Tree Algorithm (SOTA), using yeast (Saccharomyces cerevisiae) gene expression data, and compare their performance. We then introduce Cluster Diff, a new data mining tool, to conduct the similarity analysis of clusters generated by different algorithms. The performance study shows that SOTA is more efficient than SOM while HC is the least efficient. The results of similarity analysis show that when given a target cluster, Cluster Diff can efficiently determine the closest match from a set of clusters. Therefore, it is an effective approach for evaluating different clustering algorithms. CONCLUSION HC methods allow a visual, convenient representation of genes. However, they are neither robust nor efficient. The SOM is more robust against noise. A disadvantage of SOM is that the number of clusters has to be fixed beforehand. The SOTA combines the advantages of both hierarchical and SOM clustering. It allows a visual representation of the clusters and their structure and is not sensitive to noise. The SOTA is also more flexible than the other two clustering methods. By using our data mining tool, Cluster Diff, it is possible to analyze the similarity of clusters generated by different algorithms and thereby enable comparisons of different clustering methods.
Affiliation(s)
- Longde Yin
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA

- Chun-Hsi Huang
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA

- Jun Ni
- Department of Computer Science, the University of Iowa, Iowa City, IA 52242, USA
15
Montaner D, Tárraga J, Huerta-Cepas J, Burguet J, Vaquerizas JM, Conde L, Minguez P, Vera J, Mukherjee S, Valls J, Pujana MAG, Alloza E, Herrero J, Al-Shahrour F, Dopazo J. Next station in microarray data analysis: GEPAS. Nucleic Acids Res 2006; 34:W486-91. [PMID: 16845056 PMCID: PMC1538867 DOI: 10.1093/nar/gkl197] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.2] [Received: 02/14/2006] [Revised: 03/21/2006] [Accepted: 03/21/2006] [Indexed: 11/15/2022]
Abstract
The Gene Expression Profile Analysis Suite (GEPAS) has been running for more than four years. During this time it has evolved to keep pace with the new interests and trends in the still changing world of microarray data analysis. GEPAS has been designed to provide an intuitive yet powerful web-based interface that offers diverse analysis options from the early step of preprocessing (normalization of Affymetrix and two-colour microarray experiments and other preprocessing options) to the final step of the functional annotation of the experiment (using Gene Ontology, pathways, PubMed abstracts etc.), and includes different possibilities for clustering, gene selection, class prediction and array-comparative genomic hybridization management. GEPAS is extensively used by researchers in many countries and its records indicate an average usage rate of 400 experiments per day. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://www.gepas.org.
Affiliation(s)
- David Montaner
- Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013, Valencia, Spain
- Functional Genomics Node, INB, CIPF, Autopista del Saler 16, E46013, Valencia, Spain

- Joaquín Tárraga
- Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013, Valencia, Spain
- Functional Genomics Node, INB, CIPF, Autopista del Saler 16, E46013, Valencia, Spain

- Jaime Huerta-Cepas
- Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013, Valencia, Spain
- Functional Genomics Node, INB, CIPF, Autopista del Saler 16, E46013, Valencia, Spain

- Jordi Burguet
- Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013, Valencia, Spain

- Juan M. Vaquerizas
- Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013, Valencia, Spain

- Lucía Conde
- Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013, Valencia, Spain

- Pablo Minguez
- Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013, Valencia, Spain

- Javier Vera
- INB-BSC, Jordi Girona 29, Edifici Nexus II, E-08034 Barcelona, Spain

- Sach Mukherjee
- Pattern Analysis and Machine Learning Group, Department of Engineering Science, University of Oxford, Oxford OX1 2JD, UK

- Joan Valls
- Translational Research Laboratory, Catalan Institute of Oncology, Institut d'Investigació Biomèdica de Bellvitge, L'Hospitalet, 08907 Barcelona, Spain

- Miguel A. G. Pujana
- Translational Research Laboratory, Catalan Institute of Oncology, Institut d'Investigació Biomèdica de Bellvitge, L'Hospitalet, 08907 Barcelona, Spain

- Eva Alloza
- Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013, Valencia, Spain

- Fátima Al-Shahrour
- Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013, Valencia, Spain

- Joaquín Dopazo
- Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013, Valencia, Spain
- Functional Genomics Node, INB, CIPF, Autopista del Saler 16, E46013, Valencia, Spain
16
|
Vaquerizas JM, Conde L, Yankilevich P, Cabezón A, Minguez P, Díaz-Uriarte R, Al-Shahrour F, Herrero J, Dopazo J. GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data. Nucleic Acids Res 2005; 33:W616-20. [PMID: 15980548 PMCID: PMC1160260 DOI: 10.1093/nar/gki500] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.2] [Received: 02/14/2005] [Revised: 04/09/2005] [Accepted: 05/03/2005] [Indexed: 02/02/2023] Open
Abstract
The Gene Expression Profile Analysis Suite, GEPAS, has been running for more than three years. With >76,000 experiments analysed during the last year and a daily average of almost 300 analyses, GEPAS can be considered a well-established and widely used platform for gene expression microarray data analysis. GEPAS is oriented to the analysis of whole series of experiments. Its design and development have been driven by the demands of the biomedical community, probably the most active collective in the field of microarray users. Although clustering methods have obviously been implemented in GEPAS, our interest has focused more on methods for finding genes differentially expressed among distinct classes of experiments or correlated to diverse clinical outcomes, as well as on building predictors. There is also a great interest in CGH-arrays which fostered the development of the corresponding tool in GEPAS: InSilicoCGH. Much effort has been invested in GEPAS for developing and implementing efficient methods for functional annotation of experiments in the proper statistical framework. Thus, the popular FatiGO has expanded to a suite of programs for functional annotation of experiments, including information on transcription factor binding sites, chromosomal location and tissues. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://www.gepas.org.
Affiliation(s)
- Juan M. Vaquerizas
- Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Lucía Conde
- Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Patricio Yankilevich
- Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Amaya Cabezón
- Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Pablo Minguez
- Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Ramón Díaz-Uriarte
- Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Fátima Al-Shahrour
- Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Javier Herrero
- Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Ensembl Team, EMBL-EBI, Hinxton, Cambridge, UK
- Joaquín Dopazo
- Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Functional Genomics Node, INB, Centro de Investigación Príncipe Felipe, Autopista del Saler 16, 46013 Valencia, Spain
|
17
|
Katsel PL, Davis KL, Haroutunian V. Large-Scale Microarray Studies of Gene Expression in Multiple Regions of the Brain in Schizophrenia and Alzheimer's Disease. INTERNATIONAL REVIEW OF NEUROBIOLOGY 2005; 63:41-82. [PMID: 15797465 DOI: 10.1016/s0074-7742(05)63003-6] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Indexed: 12/24/2022]
Affiliation(s)
- Pavel L Katsel
- Department of Psychiatry, The Mount Sinai School of Medicine New York, New York 10029 USA
|
18
|
de Brevern AG, Hazout S, Malpertuy A. Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering. BMC Bioinformatics 2004; 5:114. [PMID: 15324460 PMCID: PMC514701 DOI: 10.1186/1471-2105-5-114] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.3] [Received: 05/04/2004] [Accepted: 08/23/2004] [Indexed: 02/06/2023] Open
Abstract
Background Microarray technologies produce large amounts of data. Hierarchical clustering is commonly used to identify clusters of co-expressed genes. However, microarray datasets often contain missing values (MVs), which are a major drawback for the use of clustering methods. Usually the MVs are left untreated, replaced by zero, or estimated by the k-Nearest Neighbor (kNN) approach. This paper studies the stability of gene clusters, defined by various hierarchical clustering algorithms, in microarray experiments with and without MVs. Results In this study, we show that MVs have important effects on the stability of the gene clusters. Moreover, the magnitude of the gene misallocations depends on the aggregation algorithm. The most appropriate aggregation methods (e.g. complete-linkage and Ward) are highly sensitive to MVs, surprisingly even for a very small proportion of MVs (e.g. 1%). In most cases, the MVs must be replaced by expected values. Replacing MVs by the kNN approach clearly improves the identification of co-expressed gene clusters. Nevertheless, we observe that the kNN approach is less suitable for extreme values of gene expression. Conclusion The presence of MVs (even at a low rate) is a major factor in gene cluster instability. In addition, the impact depends on the hierarchical clustering algorithm used. Some methods should be used carefully. Nevertheless, the kNN approach constitutes an efficient method for restoring missing gene expression values, with a low error level. Our study highlights the need for statistical treatment of microarray data to avoid misinterpretation.
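The kNN replacement of missing values mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_impute`, the use of `None` for a missing value, the distance (root mean squared difference over conditions observed in both profiles) and the unweighted mean of the k neighbours are all assumptions made for illustration.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries in a genes x conditions matrix from the k nearest genes.

    Hypothetical helper for illustration; missing values are encoded as None.
    """

    def distance(a, b):
        # Root mean squared difference over conditions observed in both profiles.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [row[:] for row in matrix]  # copy, so the input is left untouched
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other genes with an observed value in column j.
            candidates = [
                (distance(row, other), other[j])
                for n, other in enumerate(matrix)
                if n != i and other[j] is not None
            ]
            candidates = [c for c in candidates if math.isfinite(c[0])]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                # Unweighted mean of the neighbours' values in the missing column.
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy expression matrix (made-up numbers): gene 1 lacks a value in condition 2.
expression = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [1.0, 2.0, 3.2],
    [5.0, 6.0, 9.0],
]
filled = knn_impute(expression, k=2)
```

In this toy matrix the two profiles closest to gene 1 are genes 0 and 2, so the missing value is filled with the mean of their values in that condition, while the distant gene 3 is ignored.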
Affiliation(s)
- Alexandre G de Brevern
- Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM E0346, Université Denis DIDEROT-Paris 7, case 7113, 2, place Jussieu, 75251 Paris Cedex 05, France
- Serge Hazout
- Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM E0346, Université Denis DIDEROT-Paris 7, case 7113, 2, place Jussieu, 75251 Paris Cedex 05, France
- Alain Malpertuy
- Atragene Bioinformatics, 4 Rue Pierre Fontaine, 91000 Evry, France
|
19
|
Schlicht M, Matysiak B, Brodzeller T, Wen X, Liu H, Zhou G, Dhir R, Hessner MJ, Tonellato P, Suckow M, Pollard M, Datta MW. Cross-species global and subset gene expression profiling identifies genes involved in prostate cancer response to selenium. BMC Genomics 2004; 5:58. [PMID: 15318950 PMCID: PMC516028 DOI: 10.1186/1471-2164-5-58] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 11/30/2003] [Accepted: 08/20/2004] [Indexed: 11/25/2022] Open
Abstract
Background Gene expression technologies can generate vast amounts of data, yet there are often only limited resources for subsequent validation studies. This necessitates the ability to sort and prioritize the output data. Previously described methodologies have used functional pathways or transcriptional regulatory grouping to sort genes for further study. In this paper we demonstrate a comparative genomics-based method that leverages data from animal models to prioritize genes for validation. This approach allows one to develop a disease-based focus for the prioritization of gene data, a process that is essential for systems that lack significant functional pathway data yet have defined animal models. This method is made possible through the use of highly controlled spotted cDNA slide production and the use of comparative bioinformatics databases without the use of cross-species slide hybridizations. Results Using gene expression profiling, we have demonstrated similar whole-transcriptome gene expression patterns in human and rat prostate cancer cell lines, both at baseline expression levels and after treatment with physiologic concentrations of the proposed chemopreventive agent Selenium. Using both the human PC3 and rat PAII prostate cancer cell lines, we have gone on to identify a subset of one hundred and fifty-four genes that demonstrate a similar level of differential expression in response to Selenium treatment in both species. Further analysis and data mining for two genes, Insulin-like Growth Factor Binding Protein 3 and Retinoic X Receptor alpha, demonstrates an association with prostate cancer, functional pathway links, and protein-protein interactions that make these genes prime candidates for explaining the mechanism of Selenium's chemopreventive effect in prostate cancer. These genes are subsequently validated by western blots showing Selenium-based induction and by tissue microarrays demonstrating a significant association between downregulated protein expression and tumorigenesis, a process that is the reverse of what is seen in the presence of Selenium. Conclusions Thus the outlined process demonstrates similar baseline and Selenium-induced gene expression profiles between rat and human prostate cancers, and provides a method for identifying testable functional pathways for the action of Selenium's chemopreventive properties in prostate cancer.
Affiliation(s)
- Michael Schlicht
- Department of Pathology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Department of Pathology, Winship Cancer Institute, Emory University School of Medicine, 1365-B Clifton Road NE, Atlanta, GA, 30322, USA
- Brian Matysiak
- Department of Pathology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Tracy Brodzeller
- Department of Pathology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Xinyu Wen
- Department of Pathology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Bioinformatics Program and Human and Molecular Genetics Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Hang Liu
- Department of Pathology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Bioinformatics Program and Human and Molecular Genetics Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Guohui Zhou
- Department of Pathology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Department of Pathology, Winship Cancer Institute, Emory University School of Medicine, 1365-B Clifton Road NE, Atlanta, GA, 30322, USA
- Rajiv Dhir
- Department of Pathology, University of Pittsburgh Medical Center, 200 Lothrop Street, Pittsburgh, PA, 15242, USA
- Martin J Hessner
- Department of Pediatrics and Human and Molecular Genetics Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Department of Physiology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Peter Tonellato
- Bioinformatics Program and Human and Molecular Genetics Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Department of Physiology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Mark Suckow
- Walther Cancer Center, Lobund Laboratories, 400 Freiman Life Science Center, Notre Dame University, Notre Dame, IN, 46556, USA
- Morris Pollard
- Walther Cancer Center, Lobund Laboratories, 400 Freiman Life Science Center, Notre Dame University, Notre Dame, IN, 46556, USA
- Milton W Datta
- Department of Pathology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Department of Pathology, Winship Cancer Institute, Emory University School of Medicine, 1365-B Clifton Road NE, Atlanta, GA, 30322, USA
|
20
|
Herrero J, Vaquerizas JM, Al-Shahrour F, Conde L, Mateos A, Díaz-Uriarte JSR, Dopazo J. New challenges in gene expression data analysis and the extended GEPAS. Nucleic Acids Res 2004; 32:W485-91. [PMID: 15215434 PMCID: PMC441559 DOI: 10.1093/nar/gkh421] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Received: 02/13/2004] [Revised: 04/07/2004] [Accepted: 04/07/2004] [Indexed: 01/30/2023] Open
Abstract
Since the first papers published in the late nineties, which included, for the first time, a comprehensive analysis of microarray data, the number of questions addressed through this technique has both increased and diversified. Initially, interest focussed on genes coexpressing across sets of experimental conditions, which essentially implied the use of clustering techniques. Recently, however, interest has focussed more on finding genes differentially expressed among distinct classes of experiments, or correlated to diverse clinical outcomes, as well as on building predictors. In addition, the availability of accurate genomic data and the recent implementation of CGH arrays have made it possible to map expression and genomic data onto the chromosomes. There is also a clear demand for methods that allow the automatic transfer of biological information to the results of microarray experiments. Different initiatives, such as the Gene Ontology (GO) consortium, pathways databases, protein functional motifs, etc., provide curated annotations for genes. Whereas many resources on the web focus mainly on clustering methods, GEPAS has evolved to cope with the aforementioned new challenges that have recently arisen in the field of microarray data analysis. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://gepas.bioinfo.cnio.es.
Affiliation(s)
- Javier Herrero
- Bioinformatics Unit, Biotechnology Programme, Centro Nacional de Investigaciones Oncológicas, Melchor Fernández Almagro, 3, E-28029 Madrid, Spain
|
21
|
Oakley BA, Hanna DM. A Review of Nanobioscience and Bioinformatics Initiatives in North America. IEEE Trans Nanobioscience 2004; 3:74-84. [PMID: 15382648 DOI: 10.1109/tnb.2003.820259] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Affiliation(s)
- Barbara A Oakley
- School of Engineering and Computer Science, Oakland University, Rochester, MI 48309, USA.
|
22
|
Zhang XHF, Heller KA, Hefter I, Leslie CS, Chasin LA. Sequence information for the splicing of human pre-mRNA identified by support vector machine classification. Genome Res 2003; 13:2637-50. [PMID: 14656968 PMCID: PMC403805 DOI: 10.1101/gr.1679003] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.5] [Received: 08/18/2003] [Accepted: 09/10/2003] [Indexed: 12/23/2022]
Abstract
Vertebrate pre-mRNA transcripts contain many sequences that resemble splice sites on the basis of agreement to the consensus, yet these more numerous false splice sites are usually completely ignored by the cellular splicing machinery. Even at the level of exon definition, pseudo exons defined by such false splice sites outnumber real exons by an order of magnitude. We used a support vector machine to discover sequence information that could be used to distinguish real exons from pseudo exons. This machine learning tool led to the definition of potential branch points, an extended polypyrimidine tract, and C-rich and TG-rich motifs in a region limited to 50 nt upstream of constitutively spliced exons. C-rich sequences were also found in a region extending to 80 nt downstream of exons, along with G-triplet motifs. In addition, it was shown that combinations of three bases within the splice donor consensus sequence were more effective than consensus values in distinguishing real from pseudo splice sites; two-way base combinations were optimal for distinguishing 3' splice sites. These data also suggest that interactions between two or more of these elements may contribute to exon recognition, and provide candidate sequences for assessment as intronic splicing enhancers.
Affiliation(s)
- Xiang H-F Zhang
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA
|
23
|
Herrero J, Al-Shahrour F, Díaz-Uriarte R, Mateos A, Vaquerizas JM, Santoyo J, Dopazo J. GEPAS: A web-based resource for microarray gene expression data analysis. Nucleic Acids Res 2003; 31:3461-7. [PMID: 12824345 PMCID: PMC168997 DOI: 10.1093/nar/gkg591] [Citation(s) in RCA: 142] [Impact Index Per Article: 6.8] [Indexed: 11/13/2022] Open
Abstract
We present a web-based pipeline for microarray gene expression profile analysis, GEPAS, which stands for Gene Expression Profile Analysis Suite (http://gepas.bioinfo.cnio.es). GEPAS is composed of different interconnected modules which include tools for data pre-processing, two-conditions comparison, unsupervised and supervised clustering (which include some of the most popular methods as well as home made algorithms) and several tests for differential gene expression among different classes, continuous variables or survival analysis. A multiple purpose tool for data mining, based on Gene Ontology, is also linked to the tools, which constitutes a very convenient way of analysing clustering results. On-line tutorials are available from our main web server (http://bioinfo.cnio.es).
Affiliation(s)
- Javier Herrero
- Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas, c/Melchor Fernández Almagro 3, 28029, Madrid, Spain
|
24
|
Abstract
This paper presents a novel clustering technique known as adaptive double self-organizing map (ADSOM). ADSOM has a flexible topology and performs clustering and cluster visualization simultaneously, thereby requiring no a priori knowledge about the number of clusters. ADSOM is developed based on a recently introduced technique known as double self-organizing map (DSOM). DSOM combines features of the popular self-organizing map (SOM) with two-dimensional position vectors, which serve as a visualization tool to decide how many clusters are needed. Although DSOM addresses the problem of identifying unknown number of clusters, its free parameters are difficult to control to guarantee correct results and convergence. ADSOM updates its free parameters during training, and it allows convergence of its position vectors to a fairly consistent number of clusters provided that its initial number of nodes is greater than the expected number of clusters. The number of clusters can be identified by visually counting the clusters formed by the position vectors after training. A novel index is introduced based on hierarchical clustering of the final locations of position vectors. The index allows automated detection of the number of clusters, thereby reducing human error that could be incurred from counting clusters visually. The reliance of ADSOM in identifying the number of clusters is proven by applying it to publicly available gene expression data from multiple biological systems such as yeast, human, and mouse. ADSOM's performance in detecting number of clusters is compared with a model-based clustering method.
|
25
|
Ressom H, Wang D, Natarajan P. Adaptive double self-organizing maps for clustering gene expression profiles. Neural Netw 2003; 16:633-40. [PMID: 12850017 DOI: 10.1016/s0893-6080(03)00102-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Indexed: 10/27/2022]
Abstract
This paper introduces a new model of self-organizing map (SOM) known as adaptive double self-organizing map (ADSOM). ADSOM has a flexible topology and performs data partitioning and cluster visualization simultaneously without requiring a priori knowledge about the number of clusters. It combines features of the popular SOM with two-dimensional position vectors, which serve as a visualization tool to detect the number of clusters present in the data. ADSOM updates its free parameters and allows convergence of its position vectors to a fairly consistent number of clusters provided its initial number of nodes is greater than the expected number of clusters. A novel index is introduced based on hierarchical clustering of the final locations of position vectors. The index allows automated detection of the number of clusters, thereby reducing human error that could be incurred from counting clusters visually. To test ADSOM's consistency in data partitioning, we examine the number of common profiles found in the clusters that were obtained by varying the initial number of nodes. This provides a confidence measure for the clusters formed by ADSOM and illustrates the effect of different initial number of nodes on data partitioning. The reliance of ADSOM in identifying number of clusters is demonstrated by applying it to publicly available yeast gene expression data.
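The idea of counting clusters automatically from the final locations of the two-dimensional position vectors can be sketched as follows. The abstract does not define the index itself, so this is only a stand-in under stated assumptions: single-linkage agglomerative clustering, with the cluster count taken at the largest jump between successive merge distances. The function names and the toy coordinates are hypothetical and do not come from the paper.

```python
import math


def single_linkage_merge_distances(points):
    """Return the sequence of merge distances of single-linkage clustering."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        merges.append(d)
    return merges


def estimate_cluster_count(points):
    """Estimate the number of clusters from the largest gap in merge distances.

    Assumed heuristic for illustration, not the index defined in the paper.
    """
    merges = single_linkage_merge_distances(points)
    if len(merges) < 2:
        return 1
    gaps = [merges[i + 1] - merges[i] for i in range(len(merges) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    # Merges 0..cut are kept; each merge reduces the cluster count by one.
    return len(points) - (cut + 1)


# Toy final positions of seven position vectors (made-up coordinates).
positions = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0),
    (10.0, 0.0), (10.0, 0.1),
]
n_clusters = estimate_cluster_count(positions)
```

With these toy positions the first four merges join points about 0.1 apart, and the next merge jumps to a distance of roughly 6.9, so cutting at that jump leaves three groups. A brute-force pairwise search like this is adequate for a handful of map nodes but not for large inputs.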
Affiliation(s)
- H Ressom
- Department of Electrical and Computer Engineering, University of Maine, 201 Barrows Hall, Orono, ME 04469-5708, USA
|
26
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2003; 4:277-84. [PMID: 18629117 PMCID: PMC2447404 DOI: 10.1002/cfg.227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
|