1
Thomson CE, Winney IS, Salles OC, Pujol B. A guide to using a multiple-matrix animal model to disentangle genetic and nongenetic causes of phenotypic variance. PLoS One 2018; 13:e0197720. [PMID: 30312317 PMCID: PMC6193571 DOI: 10.1371/journal.pone.0197720] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 05/07/2018] [Accepted: 09/19/2018] [Indexed: 11/19/2022] Open
Abstract
Non-genetic influences on phenotypic traits can affect our interpretation of genetic variance and the evolutionary potential of populations to respond to selection, with consequences for our ability to predict the outcomes of selection. Long-term population surveys and experiments have shown that quantitative genetic estimates are influenced by nongenetic effects, including shared environmental effects, epigenetic effects, and social interactions. Recent developments to the "animal model" of quantitative genetics can now allow us to calculate precise individual-based measures of non-genetic phenotypic variance. These models can be applied to a much broader range of contexts and data types than used previously, with the potential to greatly expand our understanding of nongenetic effects on evolutionary potential. Here, we provide the first practical guide for researchers interested in distinguishing between genetic and nongenetic causes of phenotypic variation in the animal model. The methods use matrices describing individual similarity in nongenetic effects, analogous to the additive genetic relatedness matrix. In a simulation of various phenotypic traits, accounting for environmental, epigenetic, or cultural resemblance between individuals reduced estimates of additive genetic variance, changing the interpretation of evolutionary potential. These variances were estimable for both direct and parental nongenetic variances. Our tutorial outlines an easy way to account for these effects in both wild and experimental populations. These models have the potential to add to our understanding of the effects of genetic and nongenetic effects on evolutionary potential. This should be of interest both to those studying heritability, and those who wish to understand nongenetic variance.
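As a rough sketch of the idea in this abstract, a multiple-matrix animal model can be written as a mixed model in which each random effect carries its own similarity matrix. The notation below is an assumption made for illustration and is not taken from the paper: A is the additive genetic relatedness matrix, S is a matrix of nongenetic (for example environmental, epigenetic, or cultural) similarity between individuals, and Z links observations to individuals.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{a} + \mathbf{Z}\mathbf{s} + \mathbf{e},\\
\mathbf{a} &\sim N(\mathbf{0},\, \sigma^2_A \mathbf{A}),\qquad
\mathbf{s} \sim N(\mathbf{0},\, \sigma^2_S \mathbf{S}),\qquad
\mathbf{e} \sim N(\mathbf{0},\, \sigma^2_E \mathbf{I}),\\
\operatorname{Var}(\mathbf{y}) &= \sigma^2_A\, \mathbf{Z}\mathbf{A}\mathbf{Z}^{\top}
 + \sigma^2_S\, \mathbf{Z}\mathbf{S}\mathbf{Z}^{\top}
 + \sigma^2_E\, \mathbf{I}.
\end{aligned}
```

Under this formulation, if S is left out of the model and resembles A, variance that belongs to s can be absorbed into the estimate of the additive genetic variance. That is consistent with the abstract's report that accounting for nongenetic resemblance reduced estimates of additive genetic variance.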
Affiliation(s)
- Caroline E. Thomson
- Laboratoire Evolution & Diversité Biologique (EDB UMR 5174), Université Fédérale Toulouse, Midi-Pyrénées, CNRS, ENSFEA, IRD, UPS, France
- Isabel S. Winney
- Laboratoire Evolution & Diversité Biologique (EDB UMR 5174), Université Fédérale Toulouse, Midi-Pyrénées, CNRS, ENSFEA, IRD, UPS, France
- Océane C. Salles
- Laboratoire Evolution & Diversité Biologique (EDB UMR 5174), Université Fédérale Toulouse, Midi-Pyrénées, CNRS, ENSFEA, IRD, UPS, France
- Benoit Pujol
- Laboratoire Evolution & Diversité Biologique (EDB UMR 5174), Université Fédérale Toulouse, Midi-Pyrénées, CNRS, ENSFEA, IRD, UPS, France
- Laboratoire d’Excellence “CORAIL”, Perpignan, France
2
Jia X, Liu Y, Han Q, Lu Z. Multiple-cumulative probabilities used to cluster and visualize transcriptomes. FEBS Open Bio 2017; 7:2008-2020. [PMID: 29226087 PMCID: PMC5715267 DOI: 10.1002/2211-5463.12327] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/16/2017] [Accepted: 09/20/2017] [Indexed: 11/10/2022] Open
Abstract
Analysis of gene expression data by clustering and visualization has played a central role in obtaining biological knowledge. Here, we used Pearson's correlation coefficient of multiple-cumulative probabilities (PCC-MCP) of genes to define the similarity of gene expression behaviors. To answer the challenge of the high-dimensional MCPs, we used icc-cluster, a clustering algorithm that obtained solutions by iterating clustering centers, with PCC-MCP to group genes. We then used t-distributed stochastic neighbor embedding (t-SNE) of KC-data to generate optimal maps for clusters of MCP (t-SNE-MCP-O maps). From the analysis of several transcriptome data sets, we demonstrated clear advantages for using icc-cluster with PCC-MCP over commonly used clustering methods. t-SNE-MCP-O was also shown to give clearly projecting boundaries for clusters of PCC-MCP, which made the relationships between clusters easy to visualize and understand.
Affiliation(s)
- Xingang Jia
- School of Mathematics, Southeast University, Nanjing, China
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Yisu Liu
- Linyi No. 1 High School of Shandong Province, Linyi, China
- Qiuhong Han
- Department of Mathematics, Nanjing Forestry University, China
- Zuhong Lu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
3
Jia X, Zhu G, Han Q, Lu Z. The biological knowledge discovery by PCCF measure and PCA-F projection. PLoS One 2017; 12:e0175104. [PMID: 28399180 PMCID: PMC5388332 DOI: 10.1371/journal.pone.0175104] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/05/2016] [Accepted: 03/21/2017] [Indexed: 11/19/2022] Open
Abstract
In the process of biological knowledge discovery, PCA is commonly used to complement clustering analysis, but PCA typically gives poor visualizations for most gene expression data sets. Here, we propose a PCCF measure, and use PCA-F to display clusters of PCCF, where PCCF and PCA-F are modeled from the modified cumulative probabilities of genes. From the analysis of simulated and experimental data sets, we demonstrate that PCCF is more appropriate and reliable for analyzing gene expression data than other commonly used distances or similarity measures, and that PCA-F is a good visualization technique for identifying clusters of PCCF. We focus on data sets in which the expression values of genes are collected at different time points.
Affiliation(s)
- Xingang Jia
- Department of Mathematics, Southeast University, Nanjing 210096, PR China
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, PR China
- Guanqun Zhu
- Department of Chemistry, Nanjing Agricultural University, Nanjing 210000, PR China
- Qiuhong Han
- Department of Mathematics, Nanjing Forestry University, Nanjing 210037, PR China
- Zuhong Lu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, PR China
4
González J, Muñoz A, Martos G. Asymmetric latent semantic indexing for gene expression experiments visualization. J Bioinform Comput Biol 2016; 14:1650023. [PMID: 27427382 DOI: 10.1142/s0219720016500232] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
We propose a new method to visualize gene expression experiments inspired by the latent semantic indexing technique originally proposed in the textual analysis context. Using the correspondences word-gene and document-experiment, we define an asymmetric similarity measure of association for genes that accounts for potential hierarchies in the data, which is key to obtaining meaningful gene mappings. We use the polar decomposition to obtain the sources of asymmetry of the similarity matrix, which are later combined with previous knowledge. Genetic classes of genes are identified by means of a mixture model applied in the gene latent space. We describe the steps of the procedure and show its utility on the Human Cancer dataset.
Affiliation(s)
- Javier González
- * Department of Computer Science, Sheffield Institute for Translational Neuroscience, University of Sheffield, Glossop Road S10 2HQ, Sheffield, UK
- Alberto Muñoz
- † Department of Statistics, University Carlos III of Madrid, Spain. C/Madrid, 126-28903, Getafe (Madrid), Spain
- Gabriel Martos
- † Department of Statistics, University Carlos III of Madrid, Spain. C/Madrid, 126-28903, Getafe (Madrid), Spain
5
Vavoulis DV, Francescatto M, Heutink P, Gough J. DGEclust: differential expression analysis of clustered count data. Genome Biol 2015; 16:39. [PMID: 25853652 PMCID: PMC4365804 DOI: 10.1186/s13059-015-0604-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/11/2014] [Accepted: 02/03/2015] [Indexed: 11/10/2022] Open
Abstract
We present a statistical methodology, DGEclust, for differential expression analysis of digital expression data. Our method treats differential expression as a form of clustering, thus unifying these two concepts. Furthermore, it simultaneously addresses the problem of how many clusters are supported by the data and uncertainty in parameter estimation. DGEclust successfully identifies differentially expressed genes under a number of different scenarios, maintaining a low error rate and an excellent control of its false discovery rate with reasonable computational requirements. It is formulated to perform particularly well on low-replicated data and be applicable to multi-group data. DGEclust is available at http://dvav.github.io/dgeclust/.
Affiliation(s)
- Margherita Francescatto
- Genome Biology of Neurodegenerative Diseases, Deutsches Zentrum für Neurodegenerative Erkrankungen, Tübingen, Germany
- Peter Heutink
- Genome Biology of Neurodegenerative Diseases, Deutsches Zentrum für Neurodegenerative Erkrankungen, Tübingen, Germany
- Julian Gough
- Department of Computer Science, University of Bristol, Bristol, UK
6
Wang YXR, Huang H. Review on statistical methods for gene network reconstruction using expression data. J Theor Biol 2014; 362:53-61. [PMID: 24726980 DOI: 10.1016/j.jtbi.2014.03.040] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 01/26/2014] [Revised: 03/29/2014] [Accepted: 03/31/2014] [Indexed: 12/16/2022]
Abstract
Network modeling has proven to be a fundamental tool in analyzing the inner workings of a cell. It has revolutionized our understanding of biological processes and made significant contributions to the discovery of disease biomarkers. Much effort has been devoted to reconstructing various types of biochemical networks using functional genomic datasets generated by high-throughput technologies. This paper discusses statistical methods used to reconstruct gene regulatory networks using gene expression data. In particular, we highlight progress made and challenges yet to be met in estimating gene interactions, inferring causality and modeling temporal changes of regulation behaviors. As rapid advances in technologies have made available diverse, large-scale genomic data, we also survey methods of incorporating all these additional data to achieve better, more accurate inference of gene networks.
Affiliation(s)
- Y X Rachel Wang
- Department of Statistics, University of California, Berkeley, CA 94720, USA.
- Haiyan Huang
- Department of Statistics, University of California, Berkeley, CA 94720, USA.
7
Bischoff SR, Tsai SQ, Hardison NE, Motsinger-Reif AA, Freking BA, Nonneman DJ, Rohrer GA, Piedrahita JA. Differences in X-chromosome transcriptional activity and cholesterol metabolism between placentae from swine breeds from Asian and Western origins. PLoS One 2013; 8:e55345. [PMID: 23383161 PMCID: PMC3561265 DOI: 10.1371/journal.pone.0055345] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/09/2012] [Accepted: 12/21/2012] [Indexed: 12/19/2022] Open
Abstract
To gain insight into differences in placental physiology between two swine breeds noted for their dissimilar reproductive performance, that is, the Chinese Meishan and white composite (WC), we examined gene expression profiles of placental tissues collected at 25, 45, 65, 85, and 105 days of gestation by microarrays. Using a linear mixed model, a total of 1,595 differentially expressed genes were identified between the two pig breeds using a false-discovery rate q-value ≤0.05. Among these genes, we identified breed-specific isoforms of XIST, a long non-coding RNA responsible for X-chromosome dosage compensation in females. Additionally, we explored the interaction of placental gene expression and chromosomal location by DIGMAP and identified three Sus scrofa X chromosomal bands (Xq13, Xq21, Xp11) that represent transcriptionally active clusters that differ between Meishan and WC during placental development. Also, pathway analysis identified fundamental breed differences in placental cholesterol trafficking and its synthesis. Direct measurement of cholesterol confirmed that the cholesterol content was significantly higher in the Meishan versus WC placentae. Taken together, this work identifies key metabolic pathways that differ in the placentae of two swine breeds noted for differences in reproductive prolificacy.
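To make the "false-discovery rate q-value ≤0.05" criterion concrete, here is a minimal sketch of one common way to turn per-gene p-values into FDR-adjusted values, the Benjamini-Hochberg procedure. Which q-value estimator the study actually used is not stated in the abstract, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, and the example p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: this is an illustrative stand-in for the q-values
    mentioned in the abstract, not the study's actual procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
# Genes called differentially expressed at an FDR threshold of 0.05.
significant = [i for i, value in enumerate(q) if value <= 0.05]
```

Note that a raw p-value below 0.05 does not guarantee an adjusted value below 0.05, which is why the threshold is applied after adjustment.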
Affiliation(s)
- Steve R. Bischoff
- Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, United States of America
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, Nebraska, United States of America
- Center for Comparative Medicine and Translational Research, North Carolina State University, Raleigh, North Carolina, United States of America
- Shengdar Q. Tsai
- Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, United States of America
- Center for Comparative Medicine and Translational Research, North Carolina State University, Raleigh, North Carolina, United States of America
- Nicholas E. Hardison
- Program in Statistical Genetics, Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Alison A. Motsinger-Reif
- Program in Statistical Genetics, Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Bradley A. Freking
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, Nebraska, United States of America
- Dan J. Nonneman
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, Nebraska, United States of America
- Gary A. Rohrer
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, Nebraska, United States of America
- Jorge A. Piedrahita
- Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, United States of America
- Center for Comparative Medicine and Translational Research, North Carolina State University, Raleigh, North Carolina, United States of America
8
Synnergren J, Améen C, Jansson A, Sartipy P. Global transcriptional profiling reveals similarities and differences between human stem cell-derived cardiomyocyte clusters and heart tissue. Physiol Genomics 2011; 44:245-58. [PMID: 22166955 DOI: 10.1152/physiolgenomics.00118.2011] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Indexed: 12/17/2022] Open
Abstract
It is now well documented that human embryonic stem cells (hESCs) can differentiate into functional cardiomyocytes. These cells constitute a promising source of material for use in drug development, toxicity testing, and regenerative medicine. To assess their utility as replacement or complement to existing models, extensive phenotypic characterization of the cells is required. In the present study, we used microarrays and analyzed the global transcription of hESC-derived cardiomyocyte clusters (CMCs) and determined similarities as well as differences compared with reference samples from fetal and adult heart tissue. In addition, we performed a focused analysis of the expression of cardiac ion channels and genes involved in the Ca2+-handling machinery, which in previous studies have been shown to be immature in stem cell-derived cardiomyocytes. Our results show that hESC-derived CMCs, on a global level, have a highly similar gene expression profile compared with human heart tissue, and their transcriptional phenotype was more similar to fetal than to adult heart. Despite the high similarity to heart tissue, a number of significantly differentially expressed genes were identified, providing some clues toward understanding the molecular difference between in vivo sourced tissue and stem cell derivatives generated in vitro. Interestingly, some of the cardiac-related ion channels and Ca2+-handling genes showed differential expression between the CMCs and heart tissues. These genes may represent candidates for future genetic engineering to create hESC-derived CMCs that better mimic the phenotype of the cardiomyocytes present in the adult human heart.
Affiliation(s)
- Jane Synnergren
- Systems Biology Research Center, University of Skövde, Skövde, Sweden.
9
Improving pattern discovery and visualisation with self-adaptive neural networks through data transformations. Int J Mach Learn Cyb 2011. [DOI: 10.1007/s13042-011-0050-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 10/17/2022]
10
Pinto PI, Matsumura H, Thorne MA, Power DM, Terauchi R, Reinhardt R, Canário AV. Gill transcriptome response to changes in environmental calcium in the green spotted puffer fish. BMC Genomics 2010; 11:476. [PMID: 20716350 PMCID: PMC3091672 DOI: 10.1186/1471-2164-11-476] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 02/04/2010] [Accepted: 08/17/2010] [Indexed: 12/13/2022] Open
Abstract
Background: Calcium ion is tightly regulated in body fluids, and for euryhaline fish, which are exposed to rapid changes in environmental [Ca2+], homeostasis is especially challenging. The gill is the main organ of active calcium uptake and therefore plays a crucial role in the maintenance of calcium ion homeostasis. To study the molecular basis of the short-term responses to changing calcium availability, the whole gill transcriptome obtained by Super Serial Analysis of Gene Expression (SuperSAGE) of the euryhaline teleost green spotted puffer fish, Tetraodon nigroviridis, exposed to water with altered [Ca2+] was analysed. Results: Transfer of T. nigroviridis from 10 ppt water salinity containing 2.9 mM Ca2+ to high (10 mM Ca2+) and low (0.01 mM Ca2+) calcium water of similar salinity for 2-12 h resulted in 1,339 differentially expressed SuperSAGE tags (26-bp transcript identifiers) in gills. Of these, 869 tags (65%) were mapped to T. nigroviridis cDNAs or genomic DNA and 497 (57%) were assigned to known proteins. Thirteen percent of the genes matched multiple tags, indicating alternative RNA transcripts. The main enriched gene ontology groups belong to Ca2+ signaling/homeostasis but also muscle contraction, cytoskeleton, energy production/homeostasis and tissue remodeling. K-means clustering identified co-expressed transcripts with distinct patterns in response to water [Ca2+] and exposure time. Conclusions: The generated transcript expression patterns provide a framework of novel water calcium-responsive genes in the gill during the initial response after transfer to different [Ca2+]. This molecular response entails initial perception of alterations, activation of signaling networks and effectors and suggests active remodeling of cytoskeletal proteins during the initial acclimation process. Genes related to energy production and energy homeostasis are also up-regulated, probably reflecting the increased energetic needs of the acclimation response. This study is the first genome-wide transcriptome analysis of fish gills and is an important resource for future research on the short-term mechanisms involved in the gill acclimation responses to environmental Ca2+ changes and osmoregulation.
Affiliation(s)
- Patrícia Is Pinto
- Centro de Ciências do Mar, CIMAR-Laboratório Associado, University of Algarve, Campus de Gambelas, 8005-139 Faro, Portugal.
11
Kulka J. [Pathological aspects of in situ carcinoma/intraepithelial neoplasia of the breast]. Orv Hetil 2010; 151:45-53. [PMID: 20061232 DOI: 10.1556/oh.2010.28779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
Classical and molecular breast pathology have developed dramatically over the last three decades. The introduction of systematic screening programs advanced our knowledge of classical surgical pathology, while the revolution in molecular techniques greatly improved our understanding of the molecular pathology of breast tumors and precancerous lesions. This continuous increase in knowledge is changing our concepts, classifications and approach. In this review, I would like to share the new and recently adapted views regarding intraepithelial neoplastic lesions of the breast.
Affiliation(s)
- Janina Kulka
- Semmelweis Egyetem, Általános Orvostudományi Kar II. Patológiai Intézet Budapest Üllői út 93. 1091.
12
Jiang K, Zhu T, Diao Z, Huang H, Feldman LJ. The maize root stem cell niche: a partnership between two sister cell populations. Planta 2010; 231:411-24. [PMID: 20041334 PMCID: PMC2799627 DOI: 10.1007/s00425-009-1059-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 08/21/2009] [Accepted: 11/05/2009] [Indexed: 05/19/2023]
Abstract
Using transcript profile analysis, we explored the nature of the stem cell niche in roots of maize (Zea mays). Toward assessing a role for specific genes in the establishment and maintenance of the niche, we perturbed the niche and simultaneously monitored the spatial expression patterns of genes hypothesized as essential. Our results allow us to quantify and localize gene activities to specific portions of the niche: to the quiescent center (QC) or the proximal meristem (PM), or to both. The data point to molecular, biochemical and physiological processes associated with the specification and maintenance of the niche, and include reduced expression of metabolism-, redox- and certain cell cycle-associated transcripts in the QC, enrichment of auxin-associated transcripts within the entire niche, controls for the state of differentiation of QC cells, a role for cytokinins specifically in the PM portion of the niche, processes (repair machinery) for maintaining DNA integrity and a role for gene silencing in niche stabilization. To provide additional support for the hypothesized roles of the above-mentioned and other transcripts in niche specification, we overexpressed, in Arabidopsis, homologs of representative genes (eight) identified as highly enriched or reduced in the maize root QC. We conclude that the coordinated changes in expression of auxin-, redox-, cell cycle- and metabolism-associated genes suggest the linkage of gene networks at the level of transcription, thereby providing additional insights into events likely associated with root stem cell niche establishment and maintenance.
Affiliation(s)
- Keni Jiang
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720 USA
- Tong Zhu
- Syngenta Biotechnology, Inc., 3054 Cornwallis Road, Research Triangle Park, NC 27709 USA
- Zhaoyan Diao
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720 USA
- Haiyan Huang
- Department of Statistics, University of California, Berkeley, CA 94720 USA
- Lewis J. Feldman
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720 USA
13
Li GG, Wang ZZ. Evaluation of similarity measures for gene expression data and their correspondent combined measures. Interdiscip Sci 2009; 1:72-80. [DOI: 10.1007/s12539-008-0005-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/03/2008] [Revised: 08/10/2008] [Accepted: 08/10/2008] [Indexed: 11/30/2022]
14
Hestilow TJ, Huang Y. Clustering of gene expression data based on shape similarity. EURASIP Journal on Bioinformatics & Systems Biology 2009; 2009:195712. [PMID: 19404484 PMCID: PMC3171421 DOI: 10.1155/2009/195712] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 08/04/2008] [Revised: 01/08/2009] [Accepted: 01/27/2009] [Indexed: 11/18/2022]
Abstract
A method for gene clustering from expression profiles using shape information is presented. Conventional clustering approaches such as K-means assume that genes with similar functions have similar expression levels and hence allocate genes with similar expression levels to the same cluster. However, genes with similar function often exhibit similarity in signal shape even though their expression magnitudes can be far apart. Therefore, this investigation studies clustering according to signal shape similarity. This shape information is captured in the form of normalized and time-scaled forward first differences, which are then subjected to variational Bayes clustering plus a non-Bayesian (Silhouette) cluster statistic. The statistic shows an improved ability to identify the correct number of clusters and assign the components of each cluster. Based on initial results for both generated test data and Escherichia coli microarray expression data, and on initial validation of the Escherichia coli results, the method shows promise for better clustering of time-series microarray data according to shape similarity.
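The shape representation named in this abstract, normalized and time-scaled forward first differences, can be sketched as follows. The abstract does not give the exact normalization, so scaling each difference vector to unit Euclidean norm is an assumption made here, and the function name `shape_vector` and the example profiles are hypothetical. The variational Bayes clustering step is not shown.

```python
import math


def shape_vector(values, times):
    """Time-scaled forward first differences, scaled to unit Euclidean norm.

    Assumption: unit-norm scaling stands in for the normalization used
    in the paper, which the abstract does not specify.
    """
    # Forward difference between consecutive samples, per unit time.
    slopes = [
        (v1 - v0) / (t1 - t0)
        for v0, v1, t0, t1 in zip(values, values[1:], times, times[1:])
    ]
    norm = math.sqrt(sum(s * s for s in slopes))
    # A flat profile has no shape to normalize; return the zero slopes as is.
    return [s / norm for s in slopes] if norm > 0 else slopes


# Two hypothetical genes with the same shape but a tenfold magnitude gap,
# sampled at unevenly spaced time points.
times = [0, 1, 2, 4]
low = [1.0, 2.0, 4.0, 3.0]
high = [10.0, 20.0, 40.0, 30.0]

low_shape = shape_vector(low, times)
high_shape = shape_vector(high, times)
```

Because magnitude is divided out, `low_shape` and `high_shape` coincide, whereas a distance computed on the raw expression levels would place these two genes far apart.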
Affiliation(s)
- Travis J Hestilow
- Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX 78249, USA
- Yufei Huang
- Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX 78249, USA
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, TX 78229, USA
15
Wijaya E, Yiu SM, Son NT, Kanagasabai R, Sung WK. MotifVoter: a novel ensemble method for fine-grained integration of generic motif finders. Bioinformatics 2008; 24:2288-95. [PMID: 18697768 DOI: 10.1093/bioinformatics/btn420] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 11/15/2022]
Abstract
Motivation: Locating transcription factor binding sites (motifs) is a key step in understanding gene regulation. Based on Tompa's benchmark study, the performance of current de novo motif finders is far from satisfactory (with sensitivity ≤0.222 and precision ≤0.307). The same study also shows that no motif finder performs consistently well over all datasets. Hence, it is not clear which finder one should use for a given dataset. To address this issue, a class of algorithms called ensemble methods has been proposed. Though the existing ensemble methods overall perform better than stand-alone motif finders, the improvement gained is not substantial. Our study reveals that these methods do not fully exploit the information obtained from the results of individual finders, resulting in minor improvement in sensitivity and poor precision. Results: In this article, we identify several key observations on how to utilize the results from individual finders and design a novel ensemble method, MotifVoter, to predict the motifs and binding sites. Evaluations on 186 datasets show that MotifVoter can locate more than 95% of the binding sites found by its component motif finders. In terms of sensitivity and precision, MotifVoter outperforms stand-alone motif finders and ensemble methods significantly on Tompa's benchmark, Escherichia coli, and ChIP-Chip datasets. MotifVoter is available online via a web server with several biologist-friendly features.
Affiliation(s)
- Edward Wijaya
- School of Computing, National University of Singapore, Singapore
16
Wang H, Zheng H, Azuaje F. Clustering-based approaches to SAGE data mining. BioData Min 2008; 1:5. [PMID: 18822151 PMCID: PMC2553774 DOI: 10.1186/1756-0381-1-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 01/25/2008] [Accepted: 07/17/2008] [Indexed: 11/12/2022] Open
Abstract
Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically-meaningful data mining and visualisation.
Affiliation(s)
- Haiying Wang
- School of Computing and Mathematics, University of Ulster, Newtownabbey, BT37 0QB, Co, Antrim, Northern Ireland, UK.