1
Stevens JR, Herrick JS, Wolff RK, Slattery ML. Power in pairs: assessing the statistical value of paired samples in tests for differential expression. BMC Genomics 2018; 19:953. [PMID: 30572829 PMCID: PMC6302489 DOI: 10.1186/s12864-018-5236-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/21/2018] [Accepted: 11/09/2018] [Indexed: 12/19/2022] Open
Abstract
Background When genomics researchers design a high-throughput study to test for differential expression, some biological systems and research questions provide opportunities to use paired samples from subjects, and researchers can plan for a certain proportion of subjects to have paired samples. We consider the effect of this paired samples proportion on the statistical power of the study, using characteristics of both count (RNA-Seq) and continuous (microarray) expression data from a colorectal cancer study. Results We demonstrate that a higher proportion of subjects with paired samples yields higher statistical power, for various total numbers of samples, and for various strengths of subject-level confounding factors. In the design scenarios considered, the statistical power in a fully-paired design is substantially (and in many cases several times) greater than in an unpaired design. Conclusions For the many biological systems and research questions where paired samples are feasible and relevant, substantial statistical power gains can be achieved at the study design stage when genomics researchers plan on using paired samples from the largest possible proportion of subjects. Any cost savings in a study design with unpaired samples are likely accompanied by underpowered and possibly biased results. Electronic supplementary material The online version of this article (10.1186/s12864-018-5236-2) contains supplementary material, which is available to authorized users.
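The power gain from pairing described in this abstract can be illustrated with a rough normal-approximation sketch for a single gene. This is not the authors' simulation: the variance model (a subject-level effect plus independent noise), the function name `approx_power`, and all parameter names are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sd_subject, sd_noise, n_per_group, paired, alpha=0.05):
    """Normal-approximation power of a two-sided test for one gene.

    Assumed model: each sample = condition mean + subject effect + noise.
    Both designs use 2 * n_per_group samples in total.
    """
    if paired:
        # Differencing two samples from one subject cancels the subject effect.
        var_diff = 2 * sd_noise ** 2 / n_per_group
    else:
        # Different subjects in each group: subject variance stays in the error.
        var_diff = 2 * (sd_subject ** 2 + sd_noise ** 2) / n_per_group
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / sqrt(var_diff)
    # Probability that the test statistic falls in either rejection region.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


if __name__ == "__main__":
    for design in (True, False):
        label = "paired" if design else "unpaired"
        print(label, round(approx_power(1.0, 2.0, 1.0, 10, paired=design), 3))
```

Under these assumptions the two designs coincide when the subject-level standard deviation is zero, and the paired design pulls ahead as subject-level variation grows.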
Affiliation(s)
- John R Stevens
- Department of Mathematics and Statistics, Utah State University, Logan, UT, USA.
- Jennifer S Herrick
- Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA
- Roger K Wolff
- Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA
- Martha L Slattery
- Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA
2
Abstract
In this article we propose two practical types of designs for large time-course, dual-channel microarray experiments. One type consists of several interwoven loops, and the other type combines reference and loop designs. By representing the experiment as a graph, where the timepoints are nodes and the arrays are edges, we demonstrate how the time contrasts between any two timepoints can be estimated, provided that there is a path of edges linking them. In addition, we give a general formula for the variance of such contrasts. The efficiency of the proposed designs is evaluated by estimating the variances of the log-ratios of the comparisons of interest.
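The graph view in this abstract (timepoints as nodes, arrays as edges) can be sketched as follows. The code only checks whether a path links two timepoints and reports the variance of a naive estimate that sums log-ratios along one shortest path, assuming independent arrays with equal variance. It does not reproduce the general variance formula the abstract mentions, which would also combine information from alternative paths. All function names are hypothetical.

```python
from collections import deque


def shortest_path(arrays, start, goal):
    """Breadth-first search over timepoints; each array is an undirected edge."""
    neighbours = {}
    for a, b in arrays:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no path: the contrast cannot be estimated from these arrays


def single_path_variance(arrays, start, goal, array_variance=1.0):
    """Variance of a contrast estimated by summing log-ratios along one path.

    Assumption: arrays are independent with a common log-ratio variance.
    """
    path = shortest_path(arrays, start, goal)
    if path is None:
        return None
    return (len(path) - 1) * array_variance


if __name__ == "__main__":
    loop = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a four-timepoint loop design
    print(shortest_path(loop, 0, 2), single_path_variance(loop, 0, 2))
```

In a loop there are two paths between any pair of timepoints, so an estimator that pools both would have lower variance than this single-path figure.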
Affiliation(s)
- Raya Khanin
- Department of Statistics, University of Glasgow, Glasgow, UK
3
Sugden LA, Tackett MR, Savva YA, Thompson WA, Lawrence CE. Assessing the validity and reproducibility of genome-scale predictions. Bioinformatics 2013; 29:2844-51. [PMID: 24048353 DOI: 10.1093/bioinformatics/btt508] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 02/02/2023]
Abstract
MOTIVATION Validation and reproducibility of results is a central and pressing issue in genomics. Several recent embarrassing incidents involving the irreproducibility of high-profile studies have illustrated the importance of this issue and the need for rigorous methods for the assessment of reproducibility. RESULTS Here, we describe an existing statistical model that is very well suited to this problem. We explain its utility for assessing the reproducibility of validation experiments, and apply it to a genome-scale study of adenosine deaminase acting on RNA (ADAR)-mediated RNA editing in Drosophila. We also introduce a statistical method for planning validation experiments that will obtain the tightest reproducibility confidence limits, which, for a fixed total number of experiments, returns the optimal number of replicates for the study. AVAILABILITY Downloadable software and a web service for both the analysis of data from a reproducibility study and for the optimal design of these studies is provided at http://ccmbweb.ccv.brown.edu/reproducibility.html .
Affiliation(s)
- Lauren A Sugden
- Center for Computational Molecular Biology and the Division of Applied Mathematics, Brown University, Providence, RI 02912, USA, St. Laurent Institute, 317 New Boston St, Woburn, MA 01801, USA and Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI 02912, USA
4
Abstract
Microarray is a technology to screen a large number of genes to discover those differentially expressed between clinical subtypes or different conditions of human diseases. Gene discovery using microarray data requires adjustment for the large-scale multiplicity of candidate genes. The family-wise error rate (FWER) has been widely chosen as a global type I error rate adjusting for the multiplicity. Typically in microarray data, the expression levels of different genes are correlated because of coexpressing genes and the common experimental conditions shared by the genes on each array. To accurately control the FWER, the statistical testing procedure should appropriately reflect the dependency among the genes. Permutation methods have been used for accurate control of the FWER in analyzing microarray data. It is important to calculate the required sample size at the design stage of a new (confirmatory) microarray study. Because of the high dimensionality and complexity of the correlation structure in microarray data, however, there have been no sample size calculation methods accurately reflecting the true correlation structure of real microarray data. We propose sample size and power calculation methods that are useful when pilot data are available to design a confirmatory experiment. If no pilot data are available, we recommend a two-stage sample size recalculation based on our proposed method using the first stage data as pilot data. The calculated sample sizes are shown to accurately maintain the power through simulations. A real data example is taken to illustrate the proposed method.
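To make concrete how a permutation method can respect the dependency among genes while controlling the FWER, here is a minimal sketch of the single-step max-T adjustment (in the style of Westfall and Young). It is not the sample size method proposed in this abstract. It enumerates every relabelling, so it only suits very small examples, and the function names and data layout are assumptions.

```python
from itertools import combinations
from math import inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two groups of expression values."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else (inf if diff > 0 else -inf)
    return diff / se


def max_t_adjusted_pvalues(expr, n_first):
    """FWER-adjusted p-values; expr[g][s] is gene g in sample s.

    The first n_first samples are group 1. Whole samples are relabelled,
    so the correlation between genes is preserved in every permutation.
    """
    n_samples = len(expr[0])

    def abs_stats(group1):
        chosen = set(group1)
        stats = []
        for row in expr:
            a = [row[s] for s in range(n_samples) if s in chosen]
            b = [row[s] for s in range(n_samples) if s not in chosen]
            stats.append(abs(welch_t(a, b)))
        return stats

    observed = abs_stats(range(n_first))
    exceed = [0] * len(expr)
    n_perm = 0
    for group1 in combinations(range(n_samples), n_first):
        n_perm += 1
        max_stat = max(abs_stats(group1))
        for g, obs in enumerate(observed):
            # Small tolerance guards against floating-point ties.
            if max_stat >= obs - 1e-12:
                exceed[g] += 1
    return [count / n_perm for count in exceed]


if __name__ == "__main__":
    toy = [
        [10, 11, 12, 13, 0, 1, 2, 3],         # clearly separated gene
        [1, 2, 3, 4, 1.5, 2.5, 3.5, 4.5],     # essentially unchanged gene
    ]
    print(max_t_adjusted_pvalues(toy, n_first=4))
```

For realistic sample sizes the full enumeration would be replaced by a random sample of relabellings.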
Affiliation(s)
- Sin-Ho Jung
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27710, USA
5
Aittokallio T, Kurki M, Nevalainen O, Nikula T, West A, Lahesmaa R. Computational Strategies for Analyzing Data in Gene Expression Microarray Experiments. J Bioinform Comput Biol 2012; 1:541-86. [PMID: 15290769 DOI: 10.1142/s0219720003000319] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 02/13/2003] [Revised: 07/02/2003] [Indexed: 11/18/2022]
Abstract
Microarray analysis has become a widely used method for generating gene expression data on a genomic scale. Microarrays have been enthusiastically applied in many fields of biological research, even though several open questions remain about the analysis of such data. A wide range of approaches are available for computational analysis, but no general consensus exists on a standard protocol for microarray data analysis. Consequently, the choice of data analysis technique is a crucial element depending both on the data and on the goals of the experiment. Therefore, a basic understanding of bioinformatics is required for optimal experimental design and meaningful interpretation of the results. This review summarizes some of the common themes in DNA microarray data analysis, including data normalization and detection of differential expression. Algorithms are demonstrated by analyzing cDNA microarray data from an experiment monitoring gene expression in T helper cells. Several computational biology strategies, along with their relative merits, are overviewed and potential areas for additional research are discussed. The goal of the review is to provide a computational framework for applying and evaluating such bioinformatics strategies. Solid knowledge of microarray informatics contributes to the implementation of more efficient computational protocols for the given data obtained through microarray experiments.
Affiliation(s)
- Tero Aittokallio
- Department of Computational Biology, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-Shi, Chiba 277-8562, Japan.
6
Owzar K, Barry WT, Jung SH. Statistical considerations for analysis of microarray experiments. Clin Transl Sci 2011; 4:466-77. [PMID: 22212230 DOI: 10.1111/j.1752-8062.2011.00309.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/30/2022] Open
Abstract
Microarray technologies enable the simultaneous interrogation of expressions from thousands of genes from a biospecimen sample taken from a patient. This large set of expressions generates a genetic profile of the patient that may be used to identify potential prognostic or predictive genes or genetic models for clinical outcomes. The aim of this article is to provide a broad overview of some of the major statistical considerations for the design and analysis of microarray experiments conducted as correlative science studies to clinical trials. An emphasis will be placed on how the lack of understanding and improper use of statistical concepts and methods will lead to noise discovery and misinterpretation of experimental results.
Affiliation(s)
- Kouros Owzar
- Department of Biostatistics and Bioinformatics, Duke University CALGB Statistical Center, Duke University, Durham, North Carolina, USA
7
Rosa GJM, Steibel JP, Tempelman RJ. Reassessing design and analysis of two-colour microarray experiments using mixed effects models. Comp Funct Genomics 2010; 6:123-31. [PMID: 18629220 PMCID: PMC2447516 DOI: 10.1002/cfg.464] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 01/19/2005] [Accepted: 02/03/2005] [Indexed: 11/08/2022] Open
Abstract
Gene expression microarray studies have led to interesting experimental design and statistical analysis challenges. The comparison of expression profiles across populations is one of the most common objectives of microarray experiments. In this manuscript we review some issues regarding design and statistical analysis for two-colour microarray platforms using mixed linear models, with special attention directed towards the different hierarchical levels of replication and the consequent effect on the use of appropriate error terms for comparing experimental groups. We examine the traditional analysis of variance (ANOVA) models proposed for microarray data and their extensions to hierarchically replicated experiments. In addition, we discuss a mixed model methodology for power and efficiency calculations of different microarray experimental designs.
Affiliation(s)
- Guilherme J M Rosa
- Department of Animal Science, Michigan State University, East Lansing, MI 48824-1225, USA.
8
Zou F, Huang H, Ibrahim JG. A Semiparametric Bayesian Approach for Estimating the Gene Expression Distribution. J Biopharm Stat 2010; 20:267-80. [DOI: 10.1080/10543400903572746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Hanwen Huang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Joseph G. Ibrahim
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
9
Abstract
Microarray technology has been used widely in gynecology. Numerous studies have used this method to address biological questions related to human endometrium. The cyclic changes of endometrium confer special characteristics that should be considered before genomic analysis. The present study reviews these considerations and the principles of transcriptomic analysis through an example of a comparison of three different phases of the menstrual cycle.
10
Abstract
Sample size calculation is a critical procedure when designing a new biological study. In this chapter, we consider molecular biology studies generating high-dimensional data. Microarray studies are typical examples, so we frame this chapter in terms of gene microarray data, but the discussed methods can be used for the design and analysis of any molecular biology study involving high-dimensional data. We discuss sample size calculation methods for molecular biology studies when the discovery of prognostic molecular markers is performed by accurately controlling the false discovery rate (FDR) or family-wise error rate (FWER) in the final data analysis. We limit our discussion to the two-sample case.
Affiliation(s)
- Sin-Ho Jung
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
11
Hauser NC, Dukalska M, Fellenberg K, Rupp S. From experimental setup to data analysis in transcriptomics: copper metabolism in the human pathogen Candida albicans. Journal of Biophotonics 2009; 2:262-268. [PMID: 19367594 DOI: 10.1002/jbio.200910004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/27/2023]
Abstract
Transcript profiling by microarray analysis offers a great opportunity to reveal unknown effects in a comprehensive context. To be able to interpret the data, some basic issues in experimental setting and design including type and number of replications have to be considered and are discussed in this work. In order to facilitate and automate data interpretation, the experimental data were projected and clustered by Correspondence Analysis, subsequently associated with gene ontology (GO) terms for functional classification. We applied the technology to investigate copper metabolism in the human pathogen Candida albicans. The presented dataset gives an example of how different fluorescent labeling, biological and technical replicas and data analysis strategies for microarray experiments may influence the final outcome of the results.
Affiliation(s)
- Nicole C Hauser
- Fraunhofer-Institut für Grenzflächen- und Bioverfahrenstechnik IGB, Department of Molecular Biotechnology, Stuttgart, Germany.
12
13
Seawater-regulated genes for two-component systems and outer membrane proteins in Myxococcus. J Bacteriol 2009; 191:2102-11. [PMID: 19151139 DOI: 10.1128/jb.01556-08] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
When salt-tolerant Myxococcus cells are moved to a seawater environment, they change their growth, morphology, and developmental behavior. Outer membrane proteins and signal transduction pathways may play important roles in this shift. Chip hybridization targeting the genes predicted to encode 226 two-component signal transduction pathways and 74 outer membrane proteins of M. xanthus DK1622 revealed that the expression of 55 corresponding genes in the salt-tolerant strain M. fulvus HW-1 was significantly modified (most were downregulated) by the presence of seawater. Sequencing revealed that these seawater-regulated genes are highly homologous in both strains, suggesting that they have similar roles in the lifestyle of Myxococcus. Seven of the genes that had been reported in M. xanthus DK1622 are involved in different cellular processes, such as fruiting body development, sporulation, or motility. The outer membrane (Om) gene Om031 had the most significant change in expression (downregulated) in response to seawater, while the two-component system (Tc) gene Tc105 had the greatest increase in expression. Their homologues MXAN3106 and MXAN4042 were knocked out in DK1622 to analyze their functions in response to changes in salinity. In addition to having increased salt tolerance, sporulation of the MXAN3106 mutant was enhanced compared to that of DK1622, whereas mutating gene MXAN4042 produced contrary results. The results indicated that the genes that are involved in the cellular processes that are significantly changed in response to salinity may also be involved in the salt tolerance of Myxococcus cells. Regulating the expression levels of these multifunctional genes may allow cells to quickly and efficiently respond to changing conditions in coastal environments.
14
Feten G, Aastveit AH, Snipen L, Almøy T. A Discussion concerning the Inclusion of Variety Effect when Analysis of Variance is Used to Detect Differentially Expressed Genes. Gene Regulation and Systems Biology 2007. [DOI: 10.1177/117762500700100005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]
Abstract
In microarray studies, several statistical methods have been proposed for identifying genes that are differentially expressed between two varieties. A commonly used method is an analysis of variance model where only the effect of interaction between variety and gene is tested. In this paper we argue that, in addition to the interaction effects, the main effect of variety should also be taken into account when posing the hypothesis.
Affiliation(s)
- Guri Feten
- Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, P.O. Box 5003, N-1432 Ås, Norway
- Are Halvor Aastveit
- Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, P.O. Box 5003, N-1432 Ås, Norway
- Lars Snipen
- Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, P.O. Box 5003, N-1432 Ås, Norway
- Trygve Almøy
- Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, P.O. Box 5003, N-1432 Ås, Norway
15
Tan YD, Yan HM. Powers of multiple-testing procedures for identification of genes significantly differentially expressed in microarray experiments. Yi Chuan Xue Bao = Acta Genetica Sinica 2006; 33:1132-40. [PMID: 17185174 DOI: 10.1016/s0379-4172(06)60152-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/02/2006] [Accepted: 07/07/2006] [Indexed: 11/17/2022]
Abstract
Because of the high operation costs involved in microarray experiments, the determination of the number of replicates required to detect a gene significantly differentially expressed in a given multiple-testing procedure is of considerable significance. Calculation of power/replicate numbers required in multiple-testing procedures provides design guidance for microarray experiments. Based on this model and by choice of a multiple-testing procedure, expression noises based on permutation resampling can be considerably minimized. The mixture distribution model method is suitable for various microarray data types obtained from single or multiple noise sources. Given the number of biological replicates required for a target power, or the power to detect a significantly differentially expressed gene at a given sample size, the best multiple-testing method can be chosen. As an example, a single-distribution model of the t-statistic was fitted to an observed microarray dataset of 3,000 genes responsive to stroke in rat, and then used to calculate the powers of four popular multiple-testing procedures to detect a gene with an expression change D. The results show that the B-procedure had the lowest power to detect a gene of small change among the multiple-testing procedures, whereas the BH-procedure had the highest power. However, all multiple-testing procedures had the same power to identify a gene having the largest change. Similar to a single test, the power of the BH-procedure to detect a small change does not vary as the number of genes increases, but the powers of the other three multiple-testing procedures decline as the number of genes increases.
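The contrast between a conservative FWER procedure and the BH-procedure can be seen in a short sketch. It assumes that "B-procedure" refers to the Bonferroni correction, which the abstract does not spell out, and it takes "BH-procedure" to mean the Benjamini-Hochberg step-up rule. The p-values in the usage example are made up for illustration and are not from the rat stroke dataset.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject a hypothesis when its p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up rule for false discovery rate control."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its own threshold.
        if pvalues[i] <= rank * alpha / m:
            cutoff_rank = rank
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for ten genes.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("Bonferroni rejections:", sum(bonferroni(pvals)))
    print("BH rejections:", sum(benjamini_hochberg(pvals)))
```

Because the BH threshold for the k-th smallest p-value is k times the Bonferroni threshold, BH never rejects fewer hypotheses than Bonferroni at the same alpha.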
Affiliation(s)
- Yuan-De Tan
- College of Life Science, Hunan Normal University, Changsha 410081, China
16
Raab RM. Incorporating genome-scale tools for studying energy homeostasis. Nutr Metab (Lond) 2006; 3:40. [PMID: 17081308 PMCID: PMC1636640 DOI: 10.1186/1743-7075-3-40] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/14/2006] [Accepted: 11/03/2006] [Indexed: 11/16/2022] Open
Abstract
Mammals have evolved complex regulatory systems that enable them to maintain energy homeostasis despite constant environmental challenges that limit the availability of energy inputs and their composition. Biological control relies upon intricate systems composed of multiple organs and specialized cell types that regulate energy uptake, storage, and expenditure. Because these systems simultaneously perform diverse functions and are highly integrated, they are extremely difficult to understand in terms of their individual component contributions to energy homeostasis. In order to provide improved treatments and clinical options, it is important to identify the principal genetic and molecular components, as well as the systemic features of regulation. To begin, many of these features can be discovered by integrating experimental technologies with advanced methods of analysis. This review focuses on the analysis of transcriptional data derived from microarrays and how it can complement other experimental techniques to study energy homeostasis.
17
Dragin N, Smani M, Arnaud-Dabernat S, Dubost C, Moranvillier I, Costet P, Daniel JY, Peuchant E. Acute oxidative stress is associated with cell proliferation in the mouse liver. FEBS Lett 2006; 580:3845-52. [PMID: 16797015 DOI: 10.1016/j.febslet.2006.06.006] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Received: 05/12/2006] [Accepted: 06/01/2006] [Indexed: 12/18/2022]
Abstract
Oxidative stress is known to produce tissue injury and to activate various signaling pathways. To investigate the molecular events linked to acute oxidative stress in mouse liver, we injected a toxic dose of paraquat. Liver necrosis was first observed, followed by histological marks of cell proliferation. Concomitantly, activation of the MAP kinase pathway and increased levels of the anti-apoptotic protein Bcl-XL were observed. Gene expression profiles revealed that the differentially expressed genes were potentially involved in cell proliferation. These data suggest that paraquat-induced acute oxidative stress triggers the activation of regeneration-related events in the liver.
Affiliation(s)
- Nadine Dragin
- EA 3674 - Laboratoire de Biologie de la Différenciation et du Développement, Université de Bordeaux 2, 146 Rue Léo-Saignat, 33076 Bordeaux Cedex, France
18
Kendziorski C, Wang P. A review of statistical methods for expression quantitative trait loci mapping. Mamm Genome 2006; 17:509-17. [PMID: 16783633 DOI: 10.1007/s00335-005-0189-6] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 12/23/2005] [Accepted: 02/23/2006] [Indexed: 10/24/2022]
Abstract
With high-throughput technologies now widely available, investigators can easily measure thousands of phenotypes for quantitative trait loci (QTL) mapping. Microarray measurements are particularly amenable to QTL mapping, as evidenced by a number of recent studies demonstrating utility across a broad range of biological endeavors. The early success stories have impelled a rapid increase in both the number and complexity of expression QTL (eQTL) experiments. Consequently, there is a need to consider the statistical principles involved in the design and analysis of these experiments and the methods currently being used. In this article we review these principles and methods and discuss the open questions most likely to yield significant progress toward increasing the amount of meaningful information obtained from eQTL mapping experiments.
Affiliation(s)
- Christina Kendziorski
- Department of Biostatistics and Medical Informatics, University of Wisconsin, 1300 University Avenue (6729 MSC), Madison, WI 53706, USA.
19
Abstract
Studies that include high-throughput data, such as gene expression data, raise unique issues with respect to study design and analysis. At the same time, they should be viewed through the lens (albeit a modified one) of standard scientific approach that involves such issues as specifying objectives (even if the study is mainly hypothesis generating or exploratory), a careful consideration of design, including sample size and replication, deciding whether to include technical replication in addition to biological replication, and ensuring that the methods of analysis are appropriate for the objective.
Affiliation(s)
- Jennifer Shoemaker
- Duke University, Department of Biostatistics and Bioinformatics, 2424 Erwin Road, Hock Plaza, Suite 802, Durham, NC 27705 USA.
20
Abstract
Large-scale genomic studies promise to advance our understanding of the biology of human cancers and to improve their diagnosis, prognostication, and treatment. The analysis and interpretation of genomics studies have faced challenges. The retrospective and observational design of many studies has rendered them susceptible to confounding and bias. Technological variations and advances have impacted on reproducibility. Statistical hurdles in relating a large number of variables to a small number of observations have added further constraints. This review considers the promise and challenge associated with the large-scale clinically oriented genomic analysis of human cancer and attempts to emphasize potential solutions.
Affiliation(s)
- Anna V Tinker
- Ian Potter Centre for Cancer Genomics and Predictive Medicine, Peter MacCallum Cancer Centre, St. Andrew's Place, East Melbourne 3002, Victoria, Australia
21
Thomassen M, Skov V, Eiriksdottir F, Tan Q, Jochumsen K, Fritzner N, Brusgaard K, Dahlgaard J, Kruse TA. Spotting and validation of a genome wide oligonucleotide chip with duplicate measurement of each gene. Biochem Biophys Res Commun 2006; 344:1111-20. [PMID: 16647037 DOI: 10.1016/j.bbrc.2006.03.227] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 03/19/2006] [Accepted: 03/26/2006] [Indexed: 10/24/2022]
Abstract
The quality of DNA microarray based gene expression data relies on the reproducibility of several steps in a microarray experiment. We have developed a spotted genome wide microarray chip with oligonucleotides printed in duplicate in order to minimise undesirable biases, thereby optimising detection of true differential expression. The validation study design consisted of an assessment of the microarray chip performance using the MessageAmp and FairPlay labelling kits. Intraclass correlation coefficient (ICC) was used to demonstrate that MessageAmp was significantly more reproducible than FairPlay. Further examinations with MessageAmp revealed the applicability of the system. The linear range of the chips was three orders of magnitude, the precision was high, as 95% of measurements deviated less than 1.24-fold from the expected value, and the coefficient of variation for relative expression was 13.6%. Relative quantitation was more reproducible than absolute quantitation and substantial reduction of variance was attained with duplicate spotting. An analysis of variance (ANOVA) demonstrated no significant day-to-day variation.
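As a reminder of what an intraclass correlation coefficient measures for duplicate spots, the following sketch computes the one-way random-effects form, ICC(1,1), from replicate measurements. The abstract does not state which ICC variant the authors used, so this choice, the function name, and the toy values are assumptions.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC(1,1).

    measurements[i] holds the k replicate values for target i (for example,
    the duplicate spots of one gene). Requires at least two targets, at least
    two replicates each, and some variation in the data.
    """
    n = len(measurements)
    k = len(measurements[0])
    grand = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]
    # Between-target and within-target mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    duplicates = [[7.9, 8.1], [5.2, 5.0], [9.4, 9.7], [6.1, 6.0]]  # toy log-intensities
    print(round(icc_oneway(duplicates), 3))
```

Values near 1 indicate that most of the variation lies between genes rather than between replicate spots of the same gene.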
Affiliation(s)
- Mads Thomassen
- Department of Biochemistry, Pharmacology, and Genetics, Odense University Hospital and Human Microarray Centre, University of Southern Denmark, Odense, Denmark.
22
Lee JJ, Hassan OSS, Gao W, Wei NE, Kohel RJ, Chen XY, Payton P, Sze SH, Stelly DM, Chen ZJ. Developmental and gene expression analyses of a cotton naked seed mutant. Planta 2006; 223:418-32. [PMID: 16254724 DOI: 10.1007/s00425-005-0098-7] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.3] [Received: 04/13/2005] [Accepted: 07/25/2005] [Indexed: 05/05/2023]
Abstract
Cotton fiber development is a fundamental biological phenomenon, yet the molecular basis of fiber cell initiation is poorly understood. We examined molecular and cellular events of fiber cell development in the naked seed mutant (N1N1) and its isogenic line of cotton (Gossypium hirsutum L. cv. Texas Marker-1, TM-1). The dominant mutation not only delayed the process of fiber cell formation and elongation but also reduced the total number of fiber cells, resulting in sparsely distributed short fibers. Gene expression changes in TM-1 and N1N1 mutant lines among four tissues were analyzed using spotted cotton oligo-gene microarrays. Using the Arabidopsis genes, we selected and designed approximately 1,334 70-mer oligos from a subset of cotton fiber ESTs. Statistical analysis of the microarray data indicates that the number of significantly differentially expressed genes was 856 in the leaves compared to the ovules (3 days post-anthesis, DPA), 632 in the petals relative to the ovules (3 DPA), and 91 in the ovules at 0 DPA compared to 3 DPA, all in TM-1. Moreover, 117 and 30 genes were significantly differentially expressed in the ovules at 3 and 0 DPA, respectively, between TM-1 and N1N1. Quantitative RT-PCR analysis of 23 fiber-associated genes in seven tissues including ovules, fiber-bearing ovules, fibers, and non-fiber tissues in TM-1 and N1N1 indicates a mode of temporal regulation of the genes involved in transcriptional and translational regulation, signal transduction, and cell differentiation during early stages of fiber development. Suppression of the fiber-associated genes in the mutant may suggest that the N1N1 mutation disrupts temporal regulation of gene expression, leading to a defective process of fiber cell elongation and development.
Affiliation(s)
- Jinsuk J Lee
- Department of Soil and Crop Sciences and Intercollegiate Program in Genetics, Texas A&M University, MS 2474/Molecular Genetics, College Station, TX 77843, USA
|
23
|
Tjaden B. An approach for clustering gene expression data with error information. BMC Bioinformatics 2006; 7:17. [PMID: 16409635 PMCID: PMC1360687 DOI: 10.1186/1471-2105-7-17] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 04/12/2005] [Accepted: 01/12/2006] [Indexed: 11/22/2022] Open
Abstract
Background Clustering of gene expression patterns is a well-studied technique for elucidating trends across large numbers of transcripts and for identifying likely co-regulated genes. Even the best clustering methods, however, are unlikely to provide meaningful results if too much of the data is unreliable. With the maturation of microarray technology, a wealth of research on statistical analysis of gene expression data has encouraged researchers to consider error and uncertainty in their microarray experiments, so that experiments are being performed increasingly with repeat spots per gene per chip and with repeat experiments. One of the challenges is to incorporate the measurement error information into downstream analyses of gene expression data, such as traditional clustering techniques. Results In this study, a clustering approach is presented which incorporates both gene expression values and error information about the expression measurements. Using repeat expression measurements, the error of each gene expression measurement in each experiment condition is estimated, and this measurement error information is incorporated directly into the clustering algorithm. The algorithm, CORE (Clustering Of Repeat Expression data), is presented and its performance is validated using statistical measures. By using error information about gene expression measurements, the clustering approach is less sensitive to noise in the underlying data and it is able to achieve more accurate clusterings. Results are described for both synthetic expression data as well as real gene expression data from Escherichia coli and Saccharomyces cerevisiae. Conclusion The additional information provided by replicate gene expression measurements is a valuable asset in effective clustering. 
Gene expression profiles with high errors, as determined from repeat measurements, may be unreliable and may associate with different clusters, whereas gene expression profiles with low errors can be clustered with higher specificity. Results indicate that including error information from repeat gene expression measurements can lead to significant improvements in clustering accuracy.
Affiliation(s)
- Brian Tjaden
- Computer Science Department, Wellesley College, Wellesley, MA 02481, USA.
|
24
|
Liang Y, Kelemen A. Associating phenotypes with molecular events: recent statistical advances and challenges underpinning microarray experiments. Funct Integr Genomics 2005; 6:1-13. [PMID: 16292543 DOI: 10.1007/s10142-005-0006-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 04/18/2005] [Revised: 06/22/2005] [Accepted: 08/16/2005] [Indexed: 10/25/2022]
Abstract
Progress in mapping the genome and developments in array technologies have provided large amounts of information for delineating the roles of genes involved in complex diseases and quantitative traits. Since complex phenotypes are determined by a network of interrelated biological traits typically involving multiple inter-correlated genetic and environmental factors that interact in a hierarchical fashion, microarrays hold tremendous latent information. The analysis of microarray data is, however, still a bottleneck. In this paper, we review the recent advances in statistical analyses for associating phenotypes with molecular events underpinning microarray experiments. Classical statistical procedures to analyze phenotypes in genetics are reviewed first, followed by descriptions of the statistical procedures for linking molecular events to measured gene expression phenotypes (microarray-based gene expression) and observed phenotypes such as diseases status. These statistical procedures include (1) prior analysis, such as data quality controls, and normalization analyses for minimizing the effects of experimental artifacts and random noise; (2) gene selections and differentiation procedures based on inferential statistics for the class comparisons; (3) dynamic temporal patterns analysis through exploratory statistics such as unsupervised clustering and supervised classification and predictions; (4) assessing the reliability of microarray studies using real-time PCR and the reproducibility issues from many studies and multiple platforms. In addition, the post analysis to associate the discovered patterns of gene expression to pathway and functional analysis for selected genes are also considered in order to increase our understanding of interconnected gene processes.
Affiliation(s)
- Yulan Liang
- Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY 14214, USA.
|
25
|
Boorman GA, Irwin RD, Vallant MK, Gerken DK, Lobenhofer EK, Hejtmancik MR, Hurban P, Brys AM, Travlos GS, Parker JS, Portier CJ. Variation in the hepatic gene expression in individual male Fischer rats. Toxicol Pathol 2005; 33:102-10. [PMID: 15805061 DOI: 10.1080/01926230590522211] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
Abstract
A new tool beginning to have wider application in toxicology studies is transcript profiling using microarrays. Microarrays provide an opportunity to directly compare transcript populations in the tissues of chemical-exposed and unexposed animals. While several studies have addressed variation between microarray platforms and between different laboratories, much less effort has been directed toward individual animal differences, especially among control animals where RNA samples are usually pooled. Estimation of the variation in gene expression in tissues from untreated animals is essential for the recognition and interpretation of subtle changes associated with chemical exposure. In this study, hepatic gene expression as well as standard toxicological parameters were evaluated in 24 rats receiving vehicle only in 2 independent experiments. Unsupervised clustering demonstrated some individual variation but supervised clustering suggested that differentially expressed genes were generally random. The variation in hepatic gene expression under carefully controlled study conditions is less than 1.5-fold for most genes. The impact of individual animal variability on microarray data can be minimized through experimental design.
Affiliation(s)
- Gary A Boorman
- Environmental Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA.
|
26
|
Abstract
MOTIVATION Owing to the experimental cost and difficulty in obtaining biological materials, it is essential to consider appropriate sample sizes in microarray studies. With the growing use of the False Discovery Rate (FDR) in microarray analysis, an FDR-based sample size calculation is essential. METHOD We describe an approach to explicitly connect the sample size to the FDR and the number of differentially expressed genes to be detected. The method fits parametric models for degree of differential expression using the Expectation-Maximization algorithm. RESULTS The applicability of the method is illustrated with simulations and studies of a lung microarray dataset. We propose to use a small training set or published data from relevant biological settings to calculate the sample size of an experiment. AVAILABILITY Code to implement the method in the statistical package R is available from the authors.
Affiliation(s)
- Jianhua Hu
- Department of Biostatistics and Applied Mathematics, University of Texas M.D. Anderson Cancer Center, TX 77030-4009, USA.
|
27
|
Coffman CJ, Wayne ML, Nuzhdin SV, Higgins LA, McIntyre LM. Identification of co-regulated transcripts affecting male body size in Drosophila. Genome Biol 2005; 6:R53. [PMID: 15960805 PMCID: PMC1175973 DOI: 10.1186/gb-2005-6-6-r53] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 01/20/2005] [Revised: 02/21/2005] [Accepted: 05/09/2005] [Indexed: 11/17/2022] Open
Abstract
Factor analysis is applied to microarray data in order to relate gene networks to complex traits and identifies a factor associated with body size in Drosophila simulans. Factor analysis is an analytic approach that describes the covariation among a set of genes through the estimation of 'factors', which may be, for example, transcription factors, microRNAs (miRNAs), and so on, by which the genes are co-regulated. Factor analysis gives a direct mechanism by which to relate gene networks to complex traits. Using simulated data, we found that factor analysis clearly identifies the number and structure of factors and outperforms hierarchical cluster analysis. Noise genes, genes that are not correlated with any factor, can be distinguished even when factor structure is complex. Applied to body size in Drosophila simulans, an evolutionarily important complex trait, a factor was directly associated with body size.
Affiliation(s)
- Cynthia J Coffman
- Health Services Research and Development Biostatistics Unit, Durham VA Medical Center (152), Durham, NC 27705, USA
- Duke University Medical Center, Department of Biostatistics and Bioinformatics, Durham, NC 27710, USA
- Marta L Wayne
- Department of Zoology, University of Florida, Gainesville, FL 32611, USA
- Sergey V Nuzhdin
- Department Ecology and Evolution, University of California at Davis, Davis, CA 95616, USA
- Laura A Higgins
- Department of Zoology, University of Florida, Gainesville, FL 32611, USA
- Lauren M McIntyre
- Duke University Medical Center, Department of Biostatistics and Bioinformatics, Durham, NC 27710, USA
- Department of Agronomy, Purdue University, West Lafayette, IN 47907, USA
|
28
|
Meunier B, Bouley J, Piec I, Bernard C, Picard B, Hocquette JF. Data analysis methods for detection of differential protein expression in two-dimensional gel electrophoresis. Anal Biochem 2005; 340:226-30. [PMID: 15840495 DOI: 10.1016/j.ab.2005.02.028] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.8] [Received: 10/07/2004] [Indexed: 10/25/2022]
Abstract
The recent development of microarray technology has led statisticians and bioinformaticians to develop new statistical methodologies for comparing different biological samples. The objective is to identify a small number of differentially expressed genes from among thousands. In quantitative proteomics, analysis of protein expression using two-dimensional gel electrophoresis shows some similarities with transcriptomic studies. Thus, the goal of this study was to evaluate different data analysis methodologies widely used in array analysis using different proteomic data sets of hundreds of proteins. Even with few replications, the significance analysis of microarrays method appeared to be more powerful than the Student's t test in truly declaring differentially expressed proteins. This procedure will avoid wasting time due to false positives and losing information with false negatives.
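The difference between the two statistics compared in this abstract can be illustrated with a short sketch. It assumes the commonly published form of the significance analysis of microarrays (SAM) relative difference, d = (mean difference) / (s + s0), where s is the pooled standard error and s0 is a small positive constant. The function names and the fixed value of s0 are illustrative assumptions, not taken from the paper, and the permutation step SAM uses to estimate false discovery rates is omitted.

```python
from math import sqrt
from statistics import mean


def pooled_se(x, y):
    """Pooled standard error of the difference in means (equal-variance form)."""
    n1, n2 = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    return sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))


def t_stat(x, y):
    """Classical two-sample Student's t statistic."""
    return (mean(y) - mean(x)) / pooled_se(x, y)


def sam_d(x, y, s0):
    """SAM-style relative difference; s0 > 0 is an assumed, user-chosen constant."""
    return (mean(y) - mean(x)) / (pooled_se(x, y) + s0)


# A spot with a tiny difference and an even tinier variance: the t statistic
# is large, while the s0 term keeps the SAM-style score modest.
control = [1.00, 1.01, 0.99]
treated = [1.05, 1.06, 1.04]
t_value = t_stat(control, treated)
d_value = sam_d(control, treated, s0=0.1)
```

With few replicates, variance estimates are unstable, so adding s0 to the denominator keeps features with accidentally small variances from dominating the ranking. In practice s0 is chosen from the data rather than fixed by hand.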
Affiliation(s)
- Bruno Meunier
- INRA, Clermont-Ferrand Research Center, Herbivore Research Unit, Muscle Growth and Metabolism Group, 63122 St-Genès-Champanelle, France
|
29
|
Tempelman RJ. Assessing statistical precision, power, and robustness of alternative experimental designs for two color microarray platforms based on mixed effects models. Vet Immunol Immunopathol 2005; 105:175-86. [PMID: 15808299 DOI: 10.1016/j.vetimm.2005.02.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
Recommendations on experimental designs for two color microarray systems have been generally conflicting as they pertain to the general choice between reference and non-reference loop designs. This conflict may currently exist because many previously published assessments may not have effectively connected design layout with the level of biological relative to technical replication. We reassess various reference and non-reference designs for statistical efficiency in terms of standard errors of mean differences, power of test, and robustness using recently developed mixed model software tools. In minimally replicated cases (n = 2), it appears that the reference design outperforms the classical loop design whereby a sample from each animal is used for only one particular array hybridization. Alternatively, the reference design was consistently inferior to those connected loop designs in which a sample from each animal is used in two different hybridizations. Nevertheless, the gap in power between these two designs diminished as the biological to residual variance ratio increased. The statistical efficiency of a single large classical loop design for the comparison of many treatments was demonstrated to be highly sensitive to missing arrays relative to a common reference design (n = 2). However, the use of two loops within an interwoven loop design was shown to be substantially more robust to missing arrays and statistically more efficient relative to a common reference design. Furthermore, the use of more than one loop leads to less disparity in precision and power comparisons between any two treatments.
Affiliation(s)
- Robert J Tempelman
- Department of Animal Science, Michigan State University, 1205 Anthony Hall, East Lansing, MI 48824-1225, USA.
|
30
|
Abstract
We consider identifying differentially expressing genes between two patient groups using a microarray experiment. We propose a sample size calculation method for a specified number of true rejections while controlling the false discovery rate at a desired level. Input parameters for the sample size calculation include the allocation proportion in each group, the number of genes in each array, the number of differentially expressing genes and the effect sizes among the differentially expressing genes. We have a closed-form sample size formula if the projected effect sizes are equal among differentially expressing genes. Otherwise, our method requires a numerical method to solve an equation. Simulation studies are conducted to show that the calculated sample sizes are accurate in practical settings. The proposed method is demonstrated with a real study.
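The abstract does not state the closed-form formula, so the following is only a first-principles sketch of how such a calculation can work in the equal-effect-size case. It assumes one-sided two-sample tests, a normal approximation, independent genes, and a standardized effect size `delta`. The function name and all parameter names are hypothetical. The idea is that if r1 true rejections are expected among m1 truly differential genes, the false discovery rate is roughly m0·alpha / (m0·alpha + r1), which fixes the per-gene level alpha, while the per-gene power must be r1/m1.

```python
from math import ceil
from statistics import NormalDist


def fdr_sample_size(m, m1, r1, f, delta, a1=0.5):
    """Total sample size for r1 expected true rejections at FDR level f.

    m     : number of genes on the array
    m1    : number of truly differentially expressed genes (assumed known)
    r1    : desired expected number of true rejections (0 < r1 < m1)
    f     : target false discovery rate
    delta : common standardized effect size (assumption: equal across genes)
    a1    : allocation proportion of group 1
    """
    m0 = m - m1
    a2 = 1.0 - a1
    # Per-gene one-sided level implied by m0*alpha / (m0*alpha + r1) = f.
    alpha = r1 * f / (m0 * (1.0 - f))
    # Per-gene type II error implied by expected true rejections r1 = m1 * power.
    beta = 1.0 - r1 / m1
    z = NormalDist().inv_cdf
    n = (z(1.0 - alpha) + z(1.0 - beta)) ** 2 / (a1 * a2 * delta ** 2)
    return ceil(n)


# Illustrative inputs (not taken from the abstract): 4000 genes, 40 of them
# differential, 24 expected true rejections, FDR 1%, unit effect size.
n_total = fdr_sample_size(m=4000, m1=40, r1=24, f=0.01, delta=1.0)
```

When effect sizes differ across genes, no single `delta` applies, which is consistent with the abstract's remark that the general case requires solving an equation numerically.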
Affiliation(s)
- Sin-Ho Jung
- Department of Biostatistics and Bioinformatics, CALGB Statistical Center Hock Plaza, Suite 802,2424 Erwin Road Duke University Durham, NC 27705, USA.
|
31
|
Aivado M, Spentzos D, Alterovitz G, Otu HH, Grall F, Giagounidis AAN, Wells M, Cho JY, Germing U, Czibere A, Prall WC, Porter C, Ramoni MF, Libermann TA. Optimization and evaluation of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) with reversed-phase protein arrays for protein profiling. Clin Chem Lab Med 2005; 43:133-40. [PMID: 15843205 DOI: 10.1515/cclm.2005.022] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Indexed: 11/15/2022]
Abstract
Surface-enhanced laser desorption/ionization (SELDI) time-of-flight mass spectrometry with protein arrays has facilitated the discovery of disease-specific protein profiles in serum. Such results raise hopes that protein profiles may become a powerful diagnostic tool. To this end, reliable and reproducible protein profiles need to be generated from many samples, accurate mass peak heights are necessary, and the experimental variation of the profiles must be known. We adapted the entire processing of protein arrays to a robotics system, thus improving the intra-assay coefficients of variation (CVs) from 45.1% to 27.8% (p<0.001). In addition, we assessed up to 16 technical replicates, and demonstrated that analysis of 2–4 replicates significantly increases the reliability of the protein profiles. A recent report on limited long-term reproducibility seemed to concord with our initial inter-assay CVs, which varied widely and reached up to 56.7%. However, we discovered that the inter-assay CV is strongly dependent on the drying time before application of the matrix molecule. Therefore, we devised a standardized drying process and demonstrated that our optimized SELDI procedure generates reliable and long-term reproducible protein profiles with CVs ranging from 25.7% to 32.6%, depending on the signal-to-noise ratio threshold used.
Affiliation(s)
- Manuel Aivado
- BIDMC Genomics Center and Bioinformatics Core, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02115, USA
|
32
|
Wang J, Lee JJ, Tian L, Lee HS, Chen M, Rao S, Wei EN, Doerge RW, Comai L, Chen ZJ. Methods for genome-wide analysis of gene expression changes in polyploids. Methods Enzymol 2005; 395:570-96. [PMID: 15865985 PMCID: PMC1986650 DOI: 10.1016/s0076-6879(05)95030-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
Abstract
Polyploidy is an evolutionary innovation, providing extra sets of genetic material for phenotypic variation and adaptation. It is predicted that changes of gene expression by genetic and epigenetic mechanisms are responsible for novel variation in nascent and established polyploids (Liu and Wendel, 2002; Osborn et al., 2003; Pikaard, 2001). Studying gene expression changes in allopolyploids is more complicated than in autopolyploids, because allopolyploids contain more than two sets of genomes originating from divergent, but related, species. Here we describe two methods that are applicable to the genome-wide analysis of gene expression differences resulting from genome duplication in autopolyploids or interactions between homoeologous genomes in allopolyploids. First, we describe an amplified fragment length polymorphism (AFLP)--complementary DNA (cDNA) display method that allows the discrimination of homoeologous loci based on restriction polymorphisms between the progenitors. Second, we describe microarray analyses that can be used to compare gene expression differences between the allopolyploids and respective progenitors using appropriate experimental design and statistical analysis. We demonstrate the utility of these two complementary methods and discuss the pros and cons of using the methods to analyze gene expression changes in autopolyploids and allopolyploids. Furthermore, we describe these methods in general terms to be of wider applicability for comparative gene expression in a variety of evolutionary, genetic, biological, and physiological contexts.
Affiliation(s)
- Jianlin Wang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas 77843-2474, USA
|
33
|
Abstract
INTRODUCTION Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and this information can then be used to plan effective studies. METHODS A very detailed statistical model for microarray data is introduced, to show the possible sources of variation that are present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost effective. Normalisation by centreing on a small number of spots may reduce array effects, but can introduce considerable variation in the results. Centreing using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression methods can introduce variability. Except for normalisation methods based on spiking controls, all normalisation requires that most genes do not differentially express. Methods based on spatial location and/or intensity also require that the nondifferentially expressing genes are at random with respect to location and intensity. Spotting designs should be carefully done so that spot replicates are widely spaced on the array, and genes with similar expression patterns are not clustered. 
DISCUSSION The tools for statistical design of experiments can be applied to microarray experiments to improve both efficiency and validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.
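The trade-off between pooling and technical replication described in the results can be illustrated with a deliberately simplified two-component variance model. This is not the detailed model of the article. It assumes independent biological and technical (array-level) variance components, pools made from distinct individuals, and pooling that averages expression on the analysis scale. The function name, parameter names, and variance values are hypothetical.

```python
def var_treatment_mean(sigma2_bio, sigma2_tech, n_pools, pool_size=1, arrays_per_pool=1):
    """Variance of an estimated treatment mean under a simple two-component model.

    Each of n_pools samples is a pool of pool_size individuals, hybridized on
    arrays_per_pool arrays. Biological variance is averaged over all
    individuals; technical variance is averaged over all arrays.
    """
    n_individuals = n_pools * pool_size
    n_arrays = n_pools * arrays_per_pool
    return sigma2_bio / n_individuals + sigma2_tech / n_arrays


# Three ways to spend 8 arrays when biological variance dominates
# (illustrative variance components, not estimates from the article).
SIGMA2_BIO, SIGMA2_TECH = 1.0, 0.25
individual = var_treatment_mean(SIGMA2_BIO, SIGMA2_TECH, n_pools=8)
pooled = var_treatment_mean(SIGMA2_BIO, SIGMA2_TECH, n_pools=8, pool_size=3)
tech_rep = var_treatment_mean(SIGMA2_BIO, SIGMA2_TECH, n_pools=4, arrays_per_pool=2)
```

With the array budget fixed, pooling lowers the biological term without using more arrays, whereas technical replication of whole arrays halves the number of biological units and leaves the technical term unchanged. That ordering matches the abstract's conclusion for the case where arrays are expensive relative to samples.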
Affiliation(s)
- Naomi Altman
- Department of Statistics, Pennsylvania State University, State College, Pennsylvania 16802-2111, USA.
|
34
|
Wayne ML, Pan YJ, Nuzhdin SV, McIntyre LM. Additivity and trans-acting effects on gene expression in male Drosophila simulans. Genetics 2004; 168:1413-20. [PMID: 15579694 PMCID: PMC1448806 DOI: 10.1534/genetics.104.030973] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.8] [Received: 05/06/2004] [Accepted: 07/28/2004] [Indexed: 11/18/2022] Open
Abstract
Understanding how genetic variation is maintained begins with a comprehensive description of what types of genetic variation exist, the extent and magnitude of the variation, and patterns discernable in that variation. However, such studies have focused primarily on DNA sequence data and have ignored genetic variation at other hierarchical levels of genetic information. Microarray technology permits an examination of genetic variation at the level of mRNA abundance. Utilizing a round-robin design, we present a quantitative description of variation in mRNA abundance in terms of GCA (general combining ability or additive variance). We test whether genes significant for GCA are randomly distributed across chromosomes and use a nonparametric approach to demonstrate that the magnitude of the variation is not random for GCA. We find that there is a paucity of genes significant for GCA on the X relative to the autosomes. The overall magnitude of the effects for GCA on the X tends to be lower than that on the autosomes and is contributed by rare alleles of larger effect. Due to male hemizygosity, GCA for X-linked phenotypes must be due to trans-acting factors, while GCA for autosomal phenotypes may be due to cis- or trans-acting factors. The contrast in the amount of variation between the X and the autosomes suggests that both cis and trans factors contribute to variation for expression in D. simulans with the preponderance of effects being trans. This nonrandom patterning of genetic variation in gene expression data with respect to chromosomal context may be due to hemizygosity in the male.
Affiliation(s)
- M L Wayne
- Department of Zoology, University of Florida, Gainesville, Florida 32611, USA.
|
35
|
Pfister-Genskow M, Myers C, Childs LA, Lacson JC, Patterson T, Betthauser JM, Goueleke PJ, Koppang RW, Lange G, Fisher P, Watt SR, Forsberg EJ, Zheng Y, Leno GH, Schultz RM, Liu B, Chetia C, Yang X, Hoeschele I, Eilertsen KJ. Identification of differentially expressed genes in individual bovine preimplantation embryos produced by nuclear transfer: improper reprogramming of genes required for development. Biol Reprod 2004; 72:546-55. [PMID: 15483223 DOI: 10.1095/biolreprod.104.031799] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.6] [Indexed: 01/07/2023] Open
Abstract
Using an interwoven-loop experimental design in conjunction with highly conservative linear mixed model methodology using estimated variance components, 18 genes differentially expressed between nuclear transfer (NT)- and in vitro fertilization (IVF)-produced embryos were identified. The set is comprised of three intermediate-filament protein genes (cytokeratin 8, cytokeratin 19, and vimentin), three metabolic genes (phosphoribosyl pyrophosphate synthetase 1, mitochondrial acetoacetyl-coenzyme A thiolase, and alpha-glucosidase), two lysosomal-related genes (prosaposin and lysosomal-associated membrane protein 2), and a gene associated with stress responses (heat shock protein 27) along with major histocompatibility complex class I, nidogen 2, a putative transport protein, heterogeneous nuclear ribonuclear protein K, mitochondrial 16S rRNA, and ES1 (a zebrafish orthologue of unknown function). The three remaining genes are novel. To our knowledge, this is the first report comparing individual embryos produced by NT and IVF using cDNA microarray technology for any species, and it uses a rigorous experimental design that emphasizes statistical significance to identify differentially expressed genes between NT and IVF embryos in cattle.
|
36
|
Lyons-Weiler J, Patel S, Becich MJ, Godfrey TE. Tests for finding complex patterns of differential expression in cancers: towards individualized medicine. BMC Bioinformatics 2004; 5:110. [PMID: 15307894 PMCID: PMC514539 DOI: 10.1186/1471-2105-5-110] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 02/11/2004] [Accepted: 08/12/2004] [Indexed: 11/10/2022] Open
Abstract
Background Microarray studies in cancer compare expression levels between two or more sample groups on thousands of genes. Data analysis follows a population-level approach (e.g., comparison of sample means) to identify differentially expressed genes. This leads to the discovery of 'population-level' markers, i.e., genes with the expression patterns A > B and B > A. We introduce the PPST test that identifies genes where a significantly large subset of cases exhibit expression values beyond upper and lower thresholds observed in the control samples. Results Interestingly, the test identifies A > B and B < A pattern genes that are missed by population-level approaches, such as the t-test, and many genes that exhibit both significant overexpression and significant underexpression in statistically significantly large subsets of cancer patients (ABA pattern genes). These patterns tend to show distributions that are unique to individual genes, and are aptly visualized in a 'gene expression pattern grid'. The low degree of among-gene correlations in these genes suggests unique underlying genomic pathologies and high degree of unique tumor-specific differential expression. We compare the PPST and the ABA test to the parametric and non-parametric t-test by analyzing two independently published data sets from studies of progression in astrocytoma. Conclusions The PPST test produced findings similar to the nonparametric t-test with higher self-consistency. These tests and the gene expression pattern grid may be useful for the identification of therapeutic targets and diagnostic or prognostic markers that are present only in subsets of cancer patients, and provide a more complete portrait of differential expression in cancer.
Affiliation(s)
- James Lyons-Weiler
- Department of Pathology, Center for Biomedical Informatics, and Interdisciplinary Biomedical Graduate Program, University of Pittsburgh, PA 15232 USA
- Clinical Genomics Facility, Center for Pathology Informatics, Benedum Center for Oncology Informatics, University of Pittsburgh Cancer Institute, Pittsburgh, PA 15232 USA
- Satish Patel
- Department of Pathology, Center for Biomedical Informatics, and Interdisciplinary Biomedical Graduate Program, University of Pittsburgh, PA 15232 USA
- Clinical Genomics Facility, Center for Pathology Informatics, Benedum Center for Oncology Informatics, University of Pittsburgh Cancer Institute, Pittsburgh, PA 15232 USA
- Michael J Becich
- Department of Pathology, Center for Biomedical Informatics, and Interdisciplinary Biomedical Graduate Program, University of Pittsburgh, PA 15232 USA
- Clinical Genomics Facility, Center for Pathology Informatics, Benedum Center for Oncology Informatics, University of Pittsburgh Cancer Institute, Pittsburgh, PA 15232 USA
- Tony E Godfrey
- Departments of Surgery and Human Genetics, University of Pittsburgh Medical School, Pittsburgh, PA 15232 USA
- Mount Sinai School of Medicine, One Gustave Levy Place, Box 1668, East Building, Room 1070C, New York, NY 10029 USA
|
37
|
Jeffrey Chen Z, Wang J, Tian L, Lee HS, Wang JJ, Chen M, Lee JJ, Josefsson C, Madlung A, Watson B, Lippman Z, Vaughn M, Chris Pires J, Colot V, Doerge RW, Martienssen RA, Comai L, Osborn TC. The development of an Arabidopsis model system for genome-wide analysis of polyploidy effects. Biol J Linn Soc Lond 2004; 82:689-700. [PMID: 18079994 DOI: 10.1111/j.1095-8312.2004.00351.x] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]
Abstract
Arabidopsis is a model system not only for studying numerous aspects of plant biology, but also for understanding mechanisms of the rapid evolutionary process associated with genome duplication and polyploidization. Although in animals interspecific hybrids are often sterile and aneuploids are related to disease syndromes, both Arabidopsis autopolyploids and allopolyploids occur in nature and can be readily formed in the laboratory, providing an attractive system for comparing changes in gene expression and genome structure among relatively 'young' and 'established' or 'ancient' polyploids. Powerful reverse and forward genetics in Arabidopsis offer an exceptional means by which regulatory mechanisms of gene and genome duplication may be revealed. Moreover, the Arabidopsis genome is completely sequenced; both coding and non-coding sequences are available. We have developed spotted oligo-gene and chromosome microarrays using the complete Arabidopsis genome sequence. The oligo-gene microarray consists of ~26 000 70-mer oligonucleotides that are designed from all annotated genes in Arabidopsis, and the chromosome microarray contains 1 kb genomic tiling fragments amplified from a chromosomal region or the complete sequence of chromosome 4. We have demonstrated the utility of microarrays for genome-wide analysis of changes in gene expression, genome organization and chromatin structure in Arabidopsis polyploids and related species.
Affiliation(s)
- Z Jeffrey Chen
- Intercollegiate Program in Genetics and Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843-2474, USA
38
Brodsky L, Leontovich A, Shtutman M, Feinstein E. Identification and handling of artifactual gene expression profiles emerging in microarray hybridization experiments. Nucleic Acids Res 2004; 32:e46. [PMID: 14999086 PMCID: PMC390318 DOI: 10.1093/nar/gnh043] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
Mathematical methods of analysis of microarray hybridizations deal with gene expression profiles as elementary units. However, some of these profiles do not reflect a biologically relevant transcriptional response, but rather stem from technical artifacts. Here, we describe two technically independent but rationally interconnected methods for identification of such artifactual profiles. Our diagnostics are based on detection of deviations from uniformity, which is assumed as the main underlying principle of microarray design. Method 1 is based on detection of non-uniformity of microarray distribution of printed genes that are clustered based on the similarity of their expression profiles. Method 2 is based on evaluation of the presence of gene-specific microarray spots within the slides' areas characterized by an abnormal concentration of low/high differential expression values, which we define as 'patterns of differentials'. Applying two novel algorithms, for nested clustering (method 1) and for pattern detection (method 2), we can make a dual estimation of the profile's quality for almost every printed gene. Genes with artifactual profiles detected by method 1 may then be removed from further analysis. Suspicious differential expression values detected by method 2 may be either removed or weighted according to the probabilities of patterns that cover them, thus diminishing their input in any further data analysis.
Affiliation(s)
- Leonid Brodsky
- Quark Biotech Inc./QBI Enterprises Ltd, Weizmann Science Park, POB 4071, Ness Ziona 70400 Israel.
39
Konu O, Xu X, Ma JZ, Kane J, Wang J, Shi SJ, Li MD. Application of a customized pathway-focused microarray for gene expression profiling of cellular homeostasis upon exposure to nicotine in PC12 cells. Brain Res Mol Brain Res 2004; 121:102-13. [PMID: 14969741 DOI: 10.1016/j.molbrainres.2003.11.012] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Accepted: 11/14/2003] [Indexed: 11/19/2022]
Abstract
Maintenance of cellular homeostasis is integral to appropriate regulation of cellular signaling and cell growth and division. In this study, we report the development and quality assessment of a pathway-focused microarray comprising genes involved in cellular homeostasis. Because nicotine is known to strongly modulate intracellular calcium homeostasis, we tested the applicability of the homeostatic pathway-focused microarray to gene expression in PC-12 cells treated with 1 mM nicotine for 48 h relative to untreated control cells. We first provided a detailed description of the focused array with respect to its gene and pathway content and then assessed the array quality using a robust regression procedure that allows for the exclusion of unreliable measurements while decreasing the number of false positives. As a result, the mean correlation coefficient between duplicate measurements of the arrays used in this study (control vs. nicotine treatment, three samples each) increased from 0.974 ± 0.017 to 0.995 ± 0.002. Furthermore, we found that nicotine affected various structural and signaling components of the AKT/PKB signaling pathway and protein synthesis and degradation processes in PC-12 cells. Since modulation of intracellular calcium concentrations ([Ca2+]i) and phosphatidylinositol signaling are important in various biological processes such as neurotransmitter release and tissue pathogenesis including tumor formation, we expect that the homeostatic pathway-focused microarray can potentially be used for the identification of unique gene expression profiles in comparative studies of drugs of abuse and diverse environmental stimuli, such as starvation and oxidative stress.
Affiliation(s)
- Ozlen Konu
- Program in Genomics and Bioinformatics on Drug Addiction, Department of Psychiatry, The University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, Mail Code 7792, San Antonio, TX 78229-3900, USA
40
Lee HS, Wang J, Tian L, Jiang H, Black MA, Madlung A, Watson B, Lukens L, Pires JC, Wang JJ, Comai L, Osborn TC, Doerge RW, Chen ZJ. Sensitivity of 70-mer oligonucleotides and cDNAs for microarray analysis of gene expression in Arabidopsis and its related species. Plant Biotechnol J 2004; 2:45-57. [PMID: 17166142 PMCID: PMC2034503 DOI: 10.1046/j.1467-7652.2003.00048.x] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Indexed: 05/09/2023]
Abstract
Synthetic oligonucleotides (oligos) represent an attractive alternative to cDNA amplicons for spotted microarray analysis in a number of model organisms, including Arabidopsis, C. elegans, Drosophila, human, mouse and yeast. However, little is known about the relative effectiveness of 60-70-mer oligos and cDNAs for detecting gene expression changes. Using 192 pairs of Arabidopsis thaliana cDNAs and corresponding 70-mer oligos, we performed three sets of dye-swap experiments and used analysis of variance (ANOVA) to compare sources of variation and sensitivities for detecting gene expression changes in A. thaliana, A. arenosa and Brassica oleracea. Our major findings were: (1) variation among different RNA preparations from the same tissue was small, but large variation among dye-labellings and slides indicates the need to replicate these factors; (2) sources of variation were similar for experiments with all three species, suggesting these feature types are effective for analysing gene expression in related species; (3) oligo and cDNA features had similar sensitivities for detecting expression changes and they identified a common subset of significant genes, but results from quantitative RT-PCR did not support the use of one over the other. These findings indicate that spotted oligos are at least as effective as cDNAs for microarray analyses of gene expression. We are using oligos designed from approximately 26,000 annotated genes of A. thaliana to study gene expression changes in Arabidopsis and Brassica polyploids.
Affiliation(s)
- Hyeon-Se Lee
- Department of Soil and Crop Sciences and Intercollegiate Program in Genetics, Texas A&M University, College Station, TX 77843-2474, USA
- Jianlin Wang
- Department of Soil and Crop Sciences and Intercollegiate Program in Genetics, Texas A&M University, College Station, TX 77843-2474, USA
- Lu Tian
- Department of Soil and Crop Sciences and Intercollegiate Program in Genetics, Texas A&M University, College Station, TX 77843-2474, USA
- Hongmei Jiang
- Department of Statistics, 1399 Math Building, Purdue University, West Lafayette, IN 47906, USA
- Computational Genomics, 206 Whistler Hall, Purdue University, West Lafayette, IN 47906, USA
- Michael A. Black
- Department of Statistics, 1399 Math Building, Purdue University, West Lafayette, IN 47906, USA
- Computational Genomics, 206 Whistler Hall, Purdue University, West Lafayette, IN 47906, USA
- Andreas Madlung
- Department of Biology, Box 355325, University of Washington, Seattle, WA 98195-5325, USA
- Brian Watson
- Department of Biology, Box 355325, University of Washington, Seattle, WA 98195-5325, USA
- Lewis Lukens
- Department of Agronomy, 1575 Linden Drive, University of Wisconsin, Madison, WI 53706, USA
- J. Chris Pires
- Department of Agronomy, 1575 Linden Drive, University of Wisconsin, Madison, WI 53706, USA
- Jiyuan J. Wang
- Department of Soil and Crop Sciences and Intercollegiate Program in Genetics, Texas A&M University, College Station, TX 77843-2474, USA
- Luca Comai
- Department of Biology, Box 355325, University of Washington, Seattle, WA 98195-5325, USA
- Thomas C. Osborn
- Department of Agronomy, 1575 Linden Drive, University of Wisconsin, Madison, WI 53706, USA
- R. W. Doerge
- Department of Statistics, 1399 Math Building, Purdue University, West Lafayette, IN 47906, USA
- Computational Genomics, 206 Whistler Hall, Purdue University, West Lafayette, IN 47906, USA
- Department of Agronomy, 1150 Lilly Hall, Purdue University, West Lafayette, IN 47906, USA
- Z. Jeffrey Chen
- Department of Soil and Crop Sciences and Intercollegiate Program in Genetics, Texas A&M University, College Station, TX 77843-2474, USA
- * Correspondence: Department of Soil and Crop Sciences and Intercollegiate Program in Genetics, Texas A&M University, College Station, TX 77843-2474, USA (fax: +1 979 845 0456; e-mail: )
41
Yang MCK, Yang JJ, McIndoe RA, She JX. Microarray experimental design: power and sample size considerations. Physiol Genomics 2003; 16:24-8. [PMID: 14532333 DOI: 10.1152/physiolgenomics.00037.2003] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022] Open
Abstract
Gene expression analysis using high-throughput microarray technology has become a powerful approach to study systems biology. The exponential growth in microarray experiments has spawned a number of investigations into the reliability and reproducibility of this type of data. However, the sample size requirements necessary to obtain statistically significant results have not received as much attention. We report here statistical methods for determining the number of subjects sufficient to minimize the false discovery rate while maintaining high power to detect differentially expressed genes. Two experimental designs were considered: 1) a comparison between two groups at a single time point, and 2) a comparison of two experimental groups with sequential time points. Computer programs are available for the methods discussed in this paper and are adaptable to more complicated situations.
Affiliation(s)
- M C K Yang
- Department of Statistics, University of Florida, Gainesville, Florida 32611, USA.
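The trade-off described in the abstract above, between the false discovery rate and the power to detect differentially expressed genes, can be sketched for the simplest design (two groups, one time point). This is not the authors' program: it assumes a two-sided normal-approximation test per gene, a common effect size and standard deviation for all truly changed genes, and the rough approximation FDR ≈ π0·α / (π0·α + (1 − π0)·power), where π0 is the assumed fraction of unchanged genes. All function names and parameter values are hypothetical.

```python
from statistics import NormalDist


def power_two_group(n_per_group, effect, sd, alpha):
    """Normal-approximation power of a two-sided, two-group comparison."""
    z = NormalDist()
    shift = effect / (sd * (2.0 / n_per_group) ** 0.5)
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def approx_fdr(alpha, power, null_fraction):
    """Rough expected FDR when a fraction `null_fraction` of genes is unchanged."""
    false_pos = null_fraction * alpha
    true_pos = (1.0 - null_fraction) * power
    return false_pos / (false_pos + true_pos)


def smallest_n(effect, sd, alpha, null_fraction,
               target_power, target_fdr, max_n=1000):
    """Smallest per-group sample size meeting both targets, or None."""
    for n in range(2, max_n + 1):
        power = power_two_group(n, effect, sd, alpha)
        if power >= target_power and approx_fdr(alpha, power, null_fraction) <= target_fdr:
            return n
    return None


if __name__ == "__main__":
    # Assumed scenario: effect of one SD, per-gene alpha 0.001, 95% null genes.
    n = smallest_n(effect=1.0, sd=1.0, alpha=0.001, null_fraction=0.95,
                   target_power=0.8, target_fdr=0.05)
    print("subjects per group:", n)
```

Under these assumptions the per-gene significance level, not the group size alone, controls how many false discoveries accompany a given power, which is why both targets are checked together.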
42
Morrison DA, Ellis JT. The design and analysis of microarray experiments: applications in parasitology. DNA Cell Biol 2003; 22:357-94. [PMID: 12906732 DOI: 10.1089/104454903767650658] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022] Open
Abstract
Microarray experiments can generate enormous amounts of data, but large datasets are usually inherently complex, and the relevant information they contain can be difficult to extract. For the practicing biologist, we provide an overview of what we believe to be the most important issues that need to be addressed when dealing with microarray data. In a microarray experiment we are simply trying to identify which genes are the most "interesting" in terms of our experimental question, and these will usually be those that are either overexpressed or underexpressed (upregulated or downregulated) under the experimental conditions. Analysis of the data to find these genes involves first preprocessing of the raw data for quality control, including filtering of the data (e.g., detection of outlying values) followed by standardization of the data (i.e., making the data uniformly comparable throughout the dataset). This is followed by the formal quantitative analysis of the data, which will involve either statistical hypothesis testing or multivariate pattern recognition. Statistical hypothesis testing is the usual approach to "class comparison," where several experimental groups are being directly compared. The best approach to this problem is to use analysis of variance, although issues related to multiple hypothesis testing and probability estimation still need to be evaluated. Pattern recognition can involve "class prediction," for which a range of supervised multivariate techniques are available, or "class discovery," for which an even broader range of unsupervised multivariate techniques have been developed. Each technique has its own limitations, which need to be kept in mind when making a choice from among them. To put these ideas in context, we provide a detailed examination of two specific examples of the analysis of microarray data, both from parasitology, covering many of the most important points raised.
Affiliation(s)
- David A Morrison
- Department of Parasitology (SWEPAR), National Veterinary Institute and Swedish University of Agricultural Sciences, Uppsala, Sweden
43
Tsai CA, Chen YJ, Chen JJ. Testing for differentially expressed genes with microarray data. Nucleic Acids Res 2003; 31:e52. [PMID: 12711697 PMCID: PMC154240 DOI: 10.1093/nar/gng052] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022] Open
Abstract
This paper compares the type I error and power of the one- and two-sample t-tests, and the one- and two-sample permutation tests for detecting differences in gene expression between two microarray samples with replicates using Monte Carlo simulations. When data are generated from a normal distribution, type I errors and powers of the one-sample parametric t-test and one-sample permutation test are very close, as are the two-sample t-test and two-sample permutation test, provided that the number of replicates is adequate. When data are generated from a t-distribution, the permutation tests outperform the corresponding parametric tests if the number of replicates is at least five. For data from a two-color dye swap experiment, the one-sample test appears to perform better than the two-sample test since expression measurements for control and treatment samples from the same spot are correlated. For data from independent samples, such as the one-channel array or two-channel array experiment using reference design, the two-sample t-tests appear more powerful than the one-sample t-tests.
Affiliation(s)
- Chen-An Tsai
- Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR 72079, USA
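The dye-swap comparison in the abstract above can be illustrated with a small Monte Carlo sketch. It is not the authors' simulation: the function names, the shared spot-effect model, and all parameter values (six spots, effect 1.5, spot SD 2.0, noise SD 0.5) are assumptions chosen for illustration. Control and treatment values share a per-spot effect and are therefore correlated; an exact sign-flip (one-sample) permutation test on the within-spot differences is compared with an exact two-sample permutation test that ignores the pairing.

```python
import itertools
import random


def sign_flip_pvalue(diffs):
    """Exact one-sample permutation p-value: flip the sign of each paired difference."""
    observed = abs(sum(diffs))
    hits = total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total


def two_sample_pvalue(control, treatment):
    """Exact two-sample permutation p-value on the absolute difference in means."""
    pooled = control + treatment
    n_treat, n_ctrl = len(treatment), len(control)
    grand_total = sum(pooled)

    def abs_mean_diff(treat_sum):
        return abs(treat_sum / n_treat - (grand_total - treat_sum) / n_ctrl)

    observed = abs_mean_diff(sum(treatment))
    hits = total = 0
    for idx in itertools.combinations(range(len(pooled)), n_treat):
        total += 1
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            hits += 1
    return hits / total


def simulate_power(n_spots=6, effect=1.5, spot_sd=2.0, noise_sd=0.5,
                   n_sim=200, alpha=0.05, seed=0):
    """Estimate power of both tests on correlated (same-spot) measurements."""
    rng = random.Random(seed)
    paired_rejections = two_sample_rejections = 0
    for _ in range(n_sim):
        # Assumed model: a shared spot effect induces the correlation.
        spot = [rng.gauss(0.0, spot_sd) for _ in range(n_spots)]
        control = [s + rng.gauss(0.0, noise_sd) for s in spot]
        treatment = [s + effect + rng.gauss(0.0, noise_sd) for s in spot]
        diffs = [t - c for t, c in zip(treatment, control)]
        paired_rejections += sign_flip_pvalue(diffs) <= alpha
        two_sample_rejections += two_sample_pvalue(control, treatment) <= alpha
    return paired_rejections / n_sim, two_sample_rejections / n_sim


if __name__ == "__main__":
    paired, unpaired = simulate_power()
    print(f"one-sample power={paired:.2f}, two-sample power={unpaired:.2f}")
```

Setting `spot_sd=0.0` removes the correlation and gives a rough analogue of the independent-samples case, where the abstract reports the two-sample test to be the more powerful one.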
44
Osborn TC, Pires JC, Birchler JA, Auger DL, Chen ZJ, Lee HS, Comai L, Madlung A, Doerge RW, Colot V, Martienssen RA. Understanding mechanisms of novel gene expression in polyploids. Trends Genet 2003; 19:141-7. [PMID: 12615008 DOI: 10.1016/s0168-9525(03)00015-5] [Citation(s) in RCA: 519] [Impact Index Per Article: 24.7] [Indexed: 12/11/2022]
Abstract
Polyploidy has long been recognized as a prominent force shaping the evolution of eukaryotes, especially flowering plants. New phenotypes often arise with polyploid formation and can contribute to the success of polyploids in nature or their selection for use in agriculture. Although the causes of novel variation in polyploids are not well understood, they could involve changes in gene expression through increased variation in dosage-regulated gene expression, altered regulatory interactions, and rapid genetic and epigenetic changes. New research approaches are being used to study these mechanisms and the results should provide a more complete understanding of polyploidy.
Affiliation(s)
- Thomas C Osborn
- Dept of Agronomy, University of Wisconsin, Madison, WI 53706, USA.
45
Craig BA, Black MA, Doerge RW. Gene expression data: The technology and statistical analysis. J Agric Biol Environ Stat 2003. [DOI: 10.1198/1085711031256] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
46
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2003. [PMCID: PMC2448450 DOI: 10.1002/cfg.228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
47
Pan W, Lin J, Le CT. How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol 2002; 3:research0022. [PMID: 12049663 PMCID: PMC115224 DOI: 10.1186/gb-2002-3-5-research0022] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.8] [Received: 12/27/2001] [Revised: 02/15/2002] [Accepted: 03/11/2002] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND It has been recognized that replicates of arrays (or spots) may be necessary for reliably detecting differentially expressed genes in microarray experiments. However, the often-asked question of how many replicates are required has barely been addressed in the literature. In general, the answer depends on several factors: a given magnitude of expression change, a desired statistical power (that is, probability) to detect it, a specified Type I error rate, and the statistical method being used to detect the change. Here, we discuss how to calculate the number of replicates in the context of applying a nonparametric statistical method, the normal mixture model approach, to detect changes in gene expression. RESULTS The methodology is applied to a data set containing expression levels of 1,176 genes in rats with and without pneumococcal middle-ear infection. We illustrate how to calculate the power functions for 2, 4, 6 and 8 replicates. CONCLUSIONS The proposed method is potentially useful in designing microarray experiments to discover differentially expressed genes. The same idea can be applied to other statistical methods.
Affiliation(s)
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, 420 Delaware Street, Minneapolis, MN 55455-0378, USA.
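The abstract above names the factors that determine the required number of replicates: the magnitude of the expression change, the desired power, the Type I error rate, and the test itself. A minimal sketch of such a power function, evaluated at 2, 4, 6 and 8 replicates, is given below. It substitutes a simple two-sided normal-approximation test for the normal mixture model approach of the paper, so the numbers it produces are illustrative only; the function name and the example effect sizes are assumptions.

```python
from statistics import NormalDist


def power_function(n_reps, effect, sd=1.0, alpha=0.05):
    """Normal-approximation power to detect `effect` with `n_reps` replicates per condition."""
    z = NormalDist()
    standard_error = sd * (2.0 / n_reps) ** 0.5
    shift = effect / standard_error
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


if __name__ == "__main__":
    effects = (0.5, 1.0, 2.0)  # assumed changes, in units of the per-array SD
    for n_reps in (2, 4, 6, 8):
        row = ", ".join(f"effect {e}: {power_function(n_reps, e):.2f}" for e in effects)
        print(f"{n_reps} replicates -> {row}")
```

With an effect of zero the function returns the Type I error rate itself, which is a convenient sanity check on any power calculation of this kind.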