1
Wu CT, Shen M, Du D, Cheng Z, Parker SJ, Lu Y, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. Cosbin: cosine score-based iterative normalization of biologically diverse samples. Bioinformatics Advances 2022; 2:vbac076. [PMID: 36330358 PMCID: PMC9614059 DOI: 10.1093/bioadv/vbac076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 10/02/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022]
Abstract
Motivation: Data normalization is essential to ensure accurate inference and comparability of gene expression measures across samples or conditions. Ideally, gene expression data should be rescaled based on consistently expressed reference genes. However, to normalize biologically diverse samples, the most commonly used reference genes exhibit striking expression variability, and size-factor or distribution-based normalization methods can be problematic when the amount of asymmetry in differential expression is significant. Results: We report an efficient and accurate data-driven method, Cosine score-based iterative normalization (Cosbin), to normalize biologically diverse samples. Based on the Cosine scores of cross-condition expression patterns, the Cosbin pipeline iteratively eliminates asymmetric differentially expressed genes, identifies consistently expressed genes, and calculates sample-wise normalization factors. We demonstrate the superior performance and enhanced utility of Cosbin compared with six representative peer methods using both simulation and real multi-omics expression datasets. Implemented in open-source R scripts and specifically designed to address normalization bias due to significant asymmetry in differential expression across multiple conditions, the Cosbin tool complements rather than replaces the existing methods and will allow biologists to more accurately detect true molecular signals among diverse phenotypic groups. Availability and implementation: The R scripts of Cosbin pipeline are freely available at https://github.com/MinjieSh/Cosbin. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
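The loop described in this abstract (score genes by cosine, drop the most asymmetric ones, recompute factors) can be illustrated with a small sketch. This is not the authors' R implementation: the function names, the `drop_fraction` and `batch` parameters, the use of an all-ones reference vector, and the total-sum rescaling step are all assumptions made for illustration.

```python
import numpy as np

def cosine_to_reference(X):
    """Cosine between each gene's cross-condition profile (row of X)
    and the all-ones vector, i.e. a perfectly constant profile."""
    k = X.shape[1]
    return X.sum(axis=1) / (np.sqrt(k) * np.linalg.norm(X, axis=1))

def cosbin_like_factors(X, drop_fraction=0.3, batch=10):
    """Hypothetical simplification of a cosine-based iterative normalization.

    X is a genes x conditions matrix of positive mean expression values.
    Returns (per-condition scale factors, indices of retained genes).
    """
    X = np.asarray(X, dtype=float)
    keep = np.arange(X.shape[0])
    n_drop = int(round(drop_fraction * X.shape[0]))
    factors = np.ones(X.shape[1])
    dropped = 0
    while dropped < n_drop:
        # Rescale so every condition has the same total over retained genes.
        totals = (X[keep] * factors).sum(axis=0)
        factors = factors * totals.mean() / totals
        # Genes least similar to a constant profile are treated as
        # asymmetric differentially expressed genes and removed.
        cos = cosine_to_reference(X[keep] * factors)
        b = min(batch, n_drop - dropped)
        keep = np.delete(keep, np.argsort(cos)[:b])
        dropped += b
    # Final factors come from the retained (assumed consistent) genes only.
    totals = (X[keep] * factors).sum(axis=0)
    factors = factors * totals.mean() / totals
    return factors, keep
```

Multiplying each column of the input by the returned factor gives the normalized matrix. The fixed `drop_fraction` stopping rule is a stand-in; the published pipeline may use a different criterion.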
Affiliation(s)
- Dongping Du
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Zuolin Cheng
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Sarah J Parker
  - Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Yingzhou Lu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jennifer E Van Eyk
  - Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Guoqiang Yu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Robert Clarke
  - The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
- David M Herrington
  - Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Yue Wang
  - To whom correspondence should be addressed.
2
Data-driven detection of subtype-specific differentially expressed genes. Sci Rep 2021; 11:332. [PMID: 33432005 PMCID: PMC7801594 DOI: 10.1038/s41598-020-79704-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/17/2020] [Accepted: 12/11/2020] [Indexed: 11/08/2022] Open
Abstract
Among multiple subtypes of tissue or cell, subtype-specific differentially-expressed genes (SDEGs) are defined as being most-upregulated in only one subtype but not in any other. Detecting SDEGs plays a critical role in the molecular characterization and deconvolution of multicellular complex tissues. Classic differential analysis assumes a null hypothesis whose test statistic is not subtype-specific, thus can produce a high false positive rate and/or lower detection power. Here we first introduce a One-Versus-Everyone Fold Change (OVE-FC) test for detecting SDEGs. We then propose a scaled test statistic (OVE-sFC) for assessing the statistical significance of SDEGs that applies a mixture null distribution model and a tailored permutation test. The OVE-FC/sFC test was validated on both type 1 error rate and detection power using extensive simulation data sets generated from real gene expression profiles of purified subtype samples. The OVE-FC/sFC test was then applied to two benchmark gene expression data sets of purified subtype samples and detected many known or previously unknown SDEGs. Subsequent supervised deconvolution results on synthesized bulk expression data, obtained using the SDEGs detected from the independent purified expression data by the OVE-FC/sFC test, showed superior performance in deconvolution accuracy when compared with popular peer methods.
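The "most-upregulated in only one subtype" definition suggests a simple statistic, sketched below. It assumes the one-versus-everyone fold change is the ratio of the highest subtype mean to the second-highest, which is one plausible reading of the abstract and not a confirmed formula from the paper. The function name `ove_fc` and the input layout are hypothetical, and the scaled statistic (OVE-sFC) and permutation test are not covered.

```python
import numpy as np

def ove_fc(subtype_means):
    """Illustrative one-versus-everyone fold change.

    subtype_means: genes x subtypes matrix of positive mean expression.
    Returns (index of the top subtype per gene, top mean / second-highest mean).
    A ratio near 1 means no single subtype stands out for that gene.
    """
    m = np.asarray(subtype_means, dtype=float)
    ordered = np.sort(m, axis=1)          # ascending within each gene
    top, second = ordered[:, -1], ordered[:, -2]
    return m.argmax(axis=1), top / second
```

Comparing the top subtype against the runner-up, not against the pooled remainder, is what makes the statistic specific to a single subtype under this reading.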
3
Fu Y, Yu G, Levine DA, Wang N, Shih IM, Zhang Z, Clarke R, Wang Y. BACOM2.0 facilitates absolute normalization and quantification of somatic copy number alterations in heterogeneous tumor. Sci Rep 2015; 5:13955. [PMID: 26350498 PMCID: PMC4563570 DOI: 10.1038/srep13955] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/30/2015] [Accepted: 08/07/2015] [Indexed: 11/18/2022] Open
Abstract
Most published copy number datasets on solid tumors were obtained from specimens comprised of mixed cell populations, for which the varying tumor-stroma proportions are unknown or unreported. The inability to correct for signal mixing represents a major limitation on the use of these datasets for subsequent analyses, such as discerning deletion types or detecting driver aberrations. We describe the BACOM2.0 method with enhanced accuracy and functionality to normalize copy number signals, detect deletion types, estimate tumor purity, quantify true copy numbers, and calculate average-ploidy value. While BACOM has been validated and used with promising results, subsequent BACOM analysis of the TCGA ovarian cancer dataset found that the estimated average tumor purity was lower than expected. In this report, we first show that this lowered estimate of tumor purity is the combined result of imprecise signal normalization and parameter estimation. Then, we describe effective allele-specific absolute normalization and quantification methods that can enhance BACOM applications in many biological contexts while in the presence of various confounders. Finally, we discuss the advantages of BACOM in relation to alternative approaches. Here we detail this revised computational approach, BACOM2.0, and validate its performance in real and simulated datasets.
Affiliation(s)
- Yi Fu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Guoqiang Yu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Douglas A Levine
  - Department of Surgery, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA
- Niya Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Ie-Ming Shih
  - Departments of Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21231, USA
- Zhen Zhang
  - Departments of Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21231, USA
- Robert Clarke
  - Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057, USA
- Yue Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
4
Riker AI, Enkemann SA, Fodstad O, Liu S, Ren S, Morris C, Xi Y, Howell P, Metge B, Samant RS, Shevde LA, Li W, Eschrich S, Daud A, Ju J, Matta J. The gene expression profiles of primary and metastatic melanoma yields a transition point of tumor progression and metastasis. BMC Med Genomics 2008; 1:13. [PMID: 18442402 PMCID: PMC2408576 DOI: 10.1186/1755-8794-1-13] [Citation(s) in RCA: 402] [Impact Index Per Article: 25.1] [Received: 12/15/2007] [Accepted: 04/28/2008] [Indexed: 12/16/2022] Open
Abstract
Background: The process of malignant transformation, progression and metastasis of melanoma is poorly understood. Gene expression profiling of human cancer has allowed for a unique insight into the genes that are involved in these processes. Thus, we have attempted to utilize this approach through the analysis of a series of primary, non-metastatic cutaneous tumors and metastatic melanoma samples. Methods: We have utilized gene microarray analysis and a variety of molecular techniques to compare 40 metastatic melanoma (MM) samples, composed of 22 bulky, macroscopic (replaced) lymph node metastases, 16 subcutaneous and 2 distant metastases (adrenal and brain), to 42 primary cutaneous cancers, comprised of 16 melanoma, 11 squamous cell, and 15 basal cell skin cancers. A Human Genome U133 Plus 2.0 array from Affymetrix, Inc. was utilized for each sample. A variety of statistical software, including the Affymetrix MAS 5.0 analysis software, was utilized to compare primary cancers to metastatic melanomas. Separate analyses were performed to directly compare only primary melanoma to metastatic melanoma samples. The expression levels of putative oncogenes and tumor suppressor genes were analyzed by semi- and real-time quantitative RT-PCR (qPCR), and Western blot analysis was performed on select genes. Results: We find that primary basal cell carcinomas, squamous cell carcinomas and thin melanomas express dramatically higher levels of many genes, including SPRR1A/B, KRT16/17, CD24, LOR, GATA3, MUC15, and TMPRSS4, than metastatic melanoma. In contrast, the metastatic melanomas express higher levels of genes such as MAGE, GPR19, BCL2A1, MMP14, SOX5, BUB1, RGS20, and more. The transition from non-metastatic expression levels to metastatic expression levels occurs as melanoma tumors thicken. We further evaluated primary melanomas of varying Breslow's tumor thickness to determine that the transition in expression occurs at different thicknesses for different genes, suggesting that the "transition zone" represents a critical time for the emergence of the metastatic phenotype. Several putative tumor oncogenes (SPP-1, MITF, CITED-1, GDF-15, c-Met, HOX loci) and suppressor genes (PITX-1, CST-6, PDGFRL, DSC-3, POU2F3, CLCA2, ST7L) were identified and validated by quantitative PCR as changing expression during this transition period. These are strong candidates for genes involved in the progression or suppression of the metastatic phenotype. Conclusion: The gene expression profiling of primary, non-metastatic cutaneous tumors and metastatic melanoma has resulted in the identification of several genes that may be centrally involved in the progression and metastatic potential of melanoma. This has very important implications as we continue to develop an improved understanding of the metastatic process, allowing us to identify specific genes for prognostic markers and possibly for targeted therapeutic approaches.
Affiliation(s)
- Adam I Riker
  - Mitchell Cancer Institute-University of South Alabama, 315 North University Boulevard, MSB 2015, Mobile, Alabama 36688, USA.
5
Clarke R, Ressom HW, Wang A, Xuan J, Liu MC, Gehan EA, Wang Y. The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nat Rev Cancer 2008; 8:37-49. [PMID: 18097463 PMCID: PMC2238676 DOI: 10.1038/nrc2294] [Citation(s) in RCA: 317] [Impact Index Per Article: 19.8] [Indexed: 02/07/2023]
Abstract
High-throughput genomic and proteomic technologies are widely used in cancer research to build better predictive models of diagnosis, prognosis and therapy, to identify and characterize key signalling networks and to find new targets for drug development. These technologies present investigators with the task of extracting meaningful statistical and biological information from high-dimensional data spaces, wherein each sample is defined by hundreds or thousands of measurements, usually concurrently obtained. The properties of high dimensionality are often poorly understood or overlooked in data modelling and analysis. From the perspective of translational science, this Review discusses the properties of high-dimensional data spaces that arise in genomic and proteomic studies and the challenges they can pose for data analysis and interpretation.
Affiliation(s)
- Robert Clarke
  - Department of Oncology and Lombardi Comprehensive Cancer Center, Georgetown University School of Medicine, 3970 Reservoir Road NW, Washington, DC 20057, USA
6
Wang D, Zhang CH, Soares MB, Huang J. Systematic approaches for incorporating control spots and data quality information to improve normalization of cDNA microarray data. J Biopharm Stat 2007; 17:415-31. [PMID: 17479391 DOI: 10.1080/10543400701199544] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
Abstract
BACKGROUND: Normalization and data quality control are two important aspects in microarray data analysis. Proper normalization and data quality control ensure that intensity ratios provide meaningful and accurate measurement of relative gene expression values. Control spots such as spikes and housekeeping genes with known concentrations in two channels are often used for calibrating experimental parameters. They provide valuable information about experimental variation which can be utilized for better normalization. They are also needed for proper normalization in cases where most of the spots tend to change in one direction. In addition, it is desirable to include information on spot quality. Such information is available in a typical microarray data set, but is not fully utilized by existing normalization methods. RESULTS: We propose two extensions of the two-way semi-linear model (TW-SLM) for appropriately combining control genes and spot quality information in normalization. The first extension (TW-SLMC) is designed to systematically incorporate control spots in a semi-parametric model to calibrate estimated normalization curves so that the relative fold changes of gene expressions are accurately estimated. Extrapolation is not required in this approach. The second extension (TW-SLMQ) is proposed to incorporate spot quality measure into normalization. This approach down-weights spots with lower quality scores in normalization. These two extensions can be used simultaneously for normalizing a data set. Two microarray data sets are used to demonstrate the proposed methods. AVAILABILITY: An R-based computing package has been developed for the proposed methods and is available from the corresponding authors.
Affiliation(s)
- D Wang
  - Biostatistics and Bioinformatics Unit, Comprehensive Cancer Center, The University of Alabama, Birmingham, AL 35294, USA.
7
Abstract
The study of gene expression profiling of cells and tissue has become a major tool for discovery in medicine. Microarray experiments allow description of genome-wide expression changes in health and disease. The results of such experiments are expected to change the methods employed in the diagnosis and prognosis of disease in obstetrics and gynecology. Moreover, an unbiased and systematic study of gene expression profiling should allow the establishment of a new taxonomy of disease for obstetric and gynecologic syndromes. Thus, a new era is emerging in which reproductive processes and disorders could be characterized using molecular tools and fingerprinting. The design, analysis, and interpretation of microarray experiments require specialized knowledge that is not part of the standard curriculum of our discipline. This article describes the types of studies that can be conducted with microarray experiments (class comparison, class prediction, class discovery). We discuss key issues pertaining to experimental design, data preprocessing, and gene selection methods. Common types of data representation are illustrated. Potential pitfalls in the interpretation of microarray experiments, as well as the strengths and limitations of this technology, are highlighted. This article is intended to assist clinicians in appraising the quality of the scientific evidence now reported in the obstetric and gynecologic literature.
Affiliation(s)
- Adi L. Tarca
  - Perinatology Research Branch, National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI
  - Department of Computer Science, Wayne State University
- Roberto Romero
  - Perinatology Research Branch, National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI
  - Center for Molecular Medicine and Genetics, Wayne State University
- Sorin Draghici
  - Department of Computer Science, Wayne State University
  - Karmanos Cancer Institute, Detroit, MI
8
Zhang L, Yoder SJ, Enkemann SA. Identical probes on different high-density oligonucleotide microarrays can produce different measurements of gene expression. BMC Genomics 2006; 7:153. [PMID: 16776839 PMCID: PMC1525186 DOI: 10.1186/1471-2164-7-153] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 03/30/2006] [Accepted: 06/15/2006] [Indexed: 11/23/2022] Open
Abstract
Background: There are many potential sources of variability in a microarray experiment. Variation can arise from many aspects of the collection and processing of samples for gene expression analysis. Oligonucleotide-based arrays are thought to minimize one source of variability as identical oligonucleotides are expected to recognize the same transcripts during hybridization. Results: We demonstrate that although the probes on the U133A GeneChip arrays are identical in sequence to probes designed for the U133 Plus 2.0 arrays, the values obtained from an experimental hybridization can be quite different. Nearly half of the probesets in common between the two array types can produce slightly different values from the same sample. Nearly 70% of the individual probes in these probesets produced array specific differences. Conclusion: The context of the probe may also contribute some bias to the final measured value of gene expression. At a minimum, this should add an extra level of caution when considering the direct comparison of experiments performed in two microarray formats. More importantly, this suggests that it may not be possible to know which value is the most accurate representation of a biological sample when comparing two formats.
Affiliation(s)
- LanMin Zhang
  - Microarray Core Laboratory, H. Lee Moffitt Cancer Center and Research Institute, SRB2, 12902 Magnolia Drive, Tampa, Florida 33612, USA
- Sean J Yoder
  - Microarray Core Laboratory, H. Lee Moffitt Cancer Center and Research Institute, SRB2, 12902 Magnolia Drive, Tampa, Florida 33612, USA
- Steven A Enkemann
  - Microarray Core Laboratory, H. Lee Moffitt Cancer Center and Research Institute, SRB2, 12902 Magnolia Drive, Tampa, Florida 33612, USA
9
Bayesian Clustering of Prostate Cancer Patients by Using a Latent Class Poisson Model. Korean Journal of Applied Statistics 2005. [DOI: 10.5351/kjas.2005.18.1.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
10
Print-tip Normalization for DNA Microarray Data. Korean Journal of Applied Statistics 2005. [DOI: 10.5351/kjas.2005.18.1.115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
11
Zhao Y, Li MC, Simon R. An adaptive method for cDNA microarray normalization. BMC Bioinformatics 2005; 6:28. [PMID: 15707486 PMCID: PMC552315 DOI: 10.1186/1471-2105-6-28] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 10/21/2004] [Accepted: 02/11/2005] [Indexed: 12/03/2022] Open
Abstract
Background: Normalization is a critical step in analysis of gene expression profiles. For dual-labeled arrays, global normalization assumes that the majority of the genes on the array are non-differentially expressed between the two channels and that the number of over-expressed genes approximately equals the number of under-expressed genes. These assumptions can be inappropriate for custom arrays or arrays in which the reference RNA is very different from the experimental samples. Results: We propose a mixture model based normalization method that adaptively identifies non-differentially expressed genes and thereby substantially improves normalization for dual-labeled arrays in settings where the assumptions of global normalization are problematic. The new method is evaluated using both simulated and real data. Conclusions: The new normalization method is effective for general microarray platforms when samples with very different expression profiles are co-hybridized and for custom arrays where the majority of genes are likely to be differentially expressed.
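The two assumptions named in the Background can be made concrete with a toy version of global normalization. The sketch below uses plain median centring of log ratios as the "global" method, which is an assumption chosen for illustration and not the authors' mixture-model method. The function name is hypothetical.

```python
import numpy as np

def global_median_normalize(log_ratios):
    """Centre log2(channel 1 / channel 2) ratios on their median.

    This is only valid when most genes are unchanged and up- and
    down-regulated genes roughly balance; otherwise the median itself
    is shifted by the differentially expressed genes.
    """
    log_ratios = np.asarray(log_ratios, dtype=float)
    return log_ratios - np.median(log_ratios)
```

With a constant dye bias and a balanced minority of changed genes, the unchanged genes return to zero after centring. If most genes move in one direction, the same call pushes the unchanged genes away from zero by roughly the size of that shift, which is the failure mode an adaptive method is meant to address.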
Affiliation(s)
- Yingdong Zhao
  - Biometric Research Branch, National Cancer Institute, National Institutes of Health, Rockville, Maryland, USA
- Richard Simon
  - Biometric Research Branch, National Cancer Institute, National Institutes of Health, Rockville, Maryland, USA
12
Wang D, Huang J, Xie H, Manzella L, Soares MB. A robust two-way semi-linear model for normalization of cDNA microarray data. BMC Bioinformatics 2005; 6:14. [PMID: 15663789 PMCID: PMC549200 DOI: 10.1186/1471-2105-6-14] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 08/18/2004] [Accepted: 01/21/2005] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND: Normalization is a basic step in microarray data analysis. A proper normalization procedure ensures that the intensity ratios provide meaningful measures of relative expression values. METHODS: We propose a robust semiparametric method in a two-way semi-linear model (TW-SLM) for normalization of cDNA microarray data. This method does not make the usual assumptions underlying some of the existing methods. For example, it does not assume that: (i) the percentage of differentially expressed genes is small; or (ii) the numbers of up- and down-regulated genes are about the same, as required in the LOWESS normalization method. We conduct simulation studies to evaluate the proposed method and use a real data set from a specially designed microarray experiment to compare the performance of the proposed method with that of the LOWESS normalization approach. RESULTS: The simulation results show that the proposed method performs better than the LOWESS normalization method in terms of mean square errors for estimated gene effects. The results of analysis of the real data set also show that the proposed method yields more consistent results between the direct and the indirect comparisons and also can detect more differentially expressed genes than the LOWESS method. CONCLUSIONS: Our simulation studies and the real data example indicate that the proposed robust TW-SLM method works at least as well as the LOWESS method and works better when the underlying assumptions for the LOWESS method are not satisfied. Therefore, it is a powerful alternative to the existing normalization methods.
Affiliation(s)
- Deli Wang
  - Biostatistics and Bioinformatics Unit, Comprehensive Cancer Center, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Jian Huang
  - Department of Statistics and Actuarial Science, and Program in Public Health Genetics, the University of Iowa, Iowa City, IA 52242, USA
- Hehuang Xie
  - Department of Pediatrics, the University of Iowa, Iowa City, IA 52242, USA
- Liliana Manzella
  - Department of Pediatrics, the University of Iowa, Iowa City, IA 52242, USA
- Marcelo Bento Soares
  - Department of Pediatrics, the University of Iowa, Iowa City, IA 52242, USA
  - Departments of Biochemistry, Orthopaedics, Physiology and Biophysics, the University of Iowa, Iowa City, IA 52242, USA
13
Lukac R, Plataniotis KN, Smolka B, Venetsanopoulos AN. A Multichannel Order-Statistic Technique for cDNA Microarray Image Processing. IEEE Trans Nanobioscience 2004; 3:272-85. [PMID: 15631139 DOI: 10.1109/tnb.2004.837907] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.1] [Indexed: 11/08/2022]
Abstract
This paper introduces an automated image processing procedure capable of processing complementary deoxyribonucleic acid (cDNA) microarray images. Microarray data is contaminated by noise and suffers from broken edges and visual artifacts. Without the utilization of a filter, subsequent tasks such as spot identification and gene expression determination cannot be completed. By employing, in a unique cascade processing cycle, nonlinear filtering solutions based on robust order statistics, the procedure: 1) removes both background and high-frequency corrupting noise and 2) correctly identifies edges and spots in cDNA microarray data. The proposed solution operates directly on the microarray data, does not rely on explicit data normalization or spot separation preprocessing, and operates in a robust manner without using heuristically determined design parameters. Other routine microarray processing operations such as shape manipulations and grid adjustments can be used in conjunction with the developed solution in the processing pipeline. Experimentation reported in this paper indicates that the proposed solution yields excellent performance by removing noise and enhancing spot location determination.
Affiliation(s)
- Rastislav Lukac
  - Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada.
14
Chiorino G, Acquadro F, Mello Grand M, Viscomi S, Segir R, Gasparini M, Dotto P. Interpretation of expression-profiling results obtained from different platforms and tissue sources: examples using prostate cancer data. Eur J Cancer 2004; 40:2592-603. [PMID: 15541960 DOI: 10.1016/j.ejca.2004.07.029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 02/06/2004] [Revised: 06/16/2004] [Accepted: 07/07/2004] [Indexed: 11/18/2022]
Abstract
The analysis of expression signatures is a powerful tool for the classification of cancer and other tissue samples. Several protocols and platforms are available on the market, and these lead to both confirmatory and complementary results. We review the main processing techniques for cross-platform comparisons and the different tissue sources for cancer profiling. Some examples and the cross-interpretation of bibliographic data related to prostate cancer are also presented.
Affiliation(s)
- G Chiorino
  - Laboratory of Cancer Pharmacogenomics, Fondo Edo Tempia, via Malta 3, Biella 13900, Italy.
15
Thomson JM, Parker J, Perou CM, Hammond SM. A custom microarray platform for analysis of microRNA gene expression. Nat Methods 2004; 1:47-53. [PMID: 15782152 DOI: 10.1038/nmeth704] [Citation(s) in RCA: 581] [Impact Index Per Article: 29.1] [Received: 06/06/2004] [Accepted: 08/10/2004] [Indexed: 12/30/2022]
Abstract
MicroRNAs are short, noncoding RNA transcripts that post-transcriptionally regulate gene expression. Several hundred microRNA genes have been identified in Caenorhabditis elegans, Drosophila, plants and mammals. MicroRNAs have been linked to developmental processes in C. elegans, plants and humans and to cell growth and apoptosis in Drosophila. A major impediment in the study of microRNA function is the lack of quantitative expression profiling methods. To close this technological gap, we have designed dual-channel microarrays that monitor expression levels of 124 mammalian microRNAs. Using these tools, we observed distinct patterns of expression among adult mouse tissues and embryonic stem cells. Expression profiles of staged embryos demonstrate temporal regulation of a large class of microRNAs, including members of the let-7 family. This microarray technology enables comprehensive investigation of microRNA expression, and furthers our understanding of this class of recently discovered noncoding RNAs.
Affiliation(s)
- J Michael Thomson
  - Department of Cell and Developmental Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
16
Yoon D, Yi SG, Kim JH, Park T. Two-stage normalization using background intensities in cDNA microarray data. BMC Bioinformatics 2004; 5:97. [PMID: 15268767 PMCID: PMC509428 DOI: 10.1186/1471-2105-5-97] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 04/07/2004] [Accepted: 07/21/2004] [Indexed: 11/10/2022] Open
Abstract
Background In microarray experiments, many undesirable systematic variations are commonly observed. Normalization is the process of removing such variation that affects the measured gene expression levels. Normalization plays an important role in the early stage of microarray data analysis, and the results of subsequent analyses depend heavily on it. One major source of variation is the background intensities. Recently, some methods have been employed for correcting the background intensities. However, all these methods focus on defining signal intensities appropriately from foreground and background intensities in the image analysis. Although a number of normalization methods have been proposed, none systematically uses the background intensities in the normalization process. Results In this paper, we propose a two-stage method that adjusts for the effect of background intensities in the normalization process. The first stage fits a regression model to adjust for the effect of background intensities, and the second stage applies a usual normalization method, such as nonlinear LOWESS, to the background-adjusted intensities. To carry out the two-stage normalization, we consider nine different background measures and investigate their performance in normalization. The performance of two-stage normalization is compared with that of global median normalization and of intensity-dependent nonlinear LOWESS normalization. We use the variability among replicated slides to compare the performance of the normalization methods. Conclusions For the selected background measures, the proposed two-stage normalization method performs better than global or intensity-dependent nonlinear LOWESS normalization. It performs much better when there is a strong relationship between the background intensity and the signal intensity. Regardless of the background correction method used in the image analysis, the proposed two-stage normalization method is applicable as long as both signal intensity and background intensity are available.
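The two stages described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `m` (per-spot log ratio), `a` (per-spot mean log intensity) and `b` (a single per-spot background measure) are hypothetical inputs, the abstract does not say which of the nine background measures to use, and a plain tricube-weighted local linear fit without robustness iterations stands in for LOWESS.

```python
from statistics import mean


def regress_out(y, x):
    """Stage 1: least-squares fit of y on x; return the residuals."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # A constant covariate carries no information, so fall back to slope 0.
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def local_linear_trend(a, m, frac=0.5):
    """Stage 2 stand-in for LOWESS (an assumption of this sketch):
    tricube-weighted local linear fit of m on a, evaluated at each a."""
    k = min(len(a), max(2, round(frac * len(a))))  # neighbours per window
    trend = []
    for a0 in a:
        # Window half-width: distance to the k-th nearest point.
        h = sorted(abs(ai - a0) for ai in a)[k - 1] or 1e-12
        w = [(1 - min(abs(ai - a0) / h, 1.0) ** 3) ** 3 for ai in a]
        sw = sum(w)
        wa = sum(wi * ai for wi, ai in zip(w, a)) / sw
        wm = sum(wi * mi for wi, mi in zip(w, m)) / sw
        saa = sum(wi * (ai - wa) ** 2 for wi, ai in zip(w, a))
        sam = sum(wi * (ai - wa) * (mi - wm) for wi, ai, mi in zip(w, a, m))
        slope = sam / saa if saa > 0 else 0.0
        trend.append(wm + slope * (a0 - wa))
    return trend


def two_stage_normalize(m, a, b, frac=0.5):
    """Regress m on the background measure b, then remove the
    intensity-dependent trend of the residuals along a."""
    adjusted = regress_out(m, b)
    trend = local_linear_trend(a, adjusted, frac)
    return [r - t for r, t in zip(adjusted, trend)]
```

If the log ratios depend only on the background measure, stage 1 alone removes the effect and stage 2 has nothing left to fit; in real data the two stages would be expected to pick up different parts of the systematic variation.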
Affiliation(s)
- Dankyu Yoon
- Program in Bioinformatics, Seoul National University, San56-1, Shin Lim-Dong, Kwan Ak-Ku, Seoul 151-747, Republic of Korea
- Sung-Gon Yi
- Department of Statistics, College of Natural Science, Seoul National University, San56-1, Shin Lim-Dong, Kwan Ak-Ku, Seoul 151-747, Republic of Korea
- Ju-Han Kim
- SNUBI: Seoul National University Biomedical Informatics, Seoul National University School of Medicine, 28 Yongon-dong Chongno-gu, Seoul 110-799, Republic of Korea
- Taesung Park
- Department of Statistics, College of Natural Science, Seoul National University, San56-1, Shin Lim-Dong, Kwan Ak-Ku, Seoul 151-747, Republic of Korea
|
17
|
Zhang Q, Ushijima R, Kawai T, Tanaka H. Which to use? - microarray data analysis in input and output data processing. CHEM-BIO INFORMATICS JOURNAL 2004. [DOI: 10.1273/cbij.4.56] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
Affiliation(s)
- Qingwei Zhang
- Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University
- Pharmaceutical Research Laboratory, AJINOMOTO Co., Inc. [present address]
- Rie Ushijima
- Laboratory of Seeds Finding Technology, Eisai Co., Ltd
- Hiroshi Tanaka
- Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University
|
18
|
McQuain MK, Seale K, Peek J, Levy S, Haselton FR. Effects of relative humidity and buffer additives on the contact printing of microarrays by quill pins. Anal Biochem 2003; 320:281-91. [PMID: 12927835 DOI: 10.1016/s0003-2697(03)00348-8] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.2] [Indexed: 11/30/2022]
Abstract
DNA microarrays printed with quill pins exhibit significant variation in probe DNA spots. Interspot variations and nonuniform distribution of probe within spots are major sources of experimental uncertainty in microarray analysis. To gain better insight into the sources of variation, we analyzed 450 consecutive depositions printed at relative humidities between 40 and 80% using three print buffers. Increasing relative humidity improved printing performance by delaying pin failure but did not reduce the variability in spot characteristics. Adding either betaine or dimethyl sulfoxide (DMSO) to the print buffer also improved quill pin performance. The least interspot variation was observed with the DMSO additive printed at 80% relative humidity, but this additive also resulted in the greatest intraspot variation. The least intraspot variation was observed with 1.5 M betaine printed at 60% relative humidity, but these conditions produced microarrays with high interspot variability. Evaporation of printing solution from the quill reservoir appeared to be the primary cause of interspot and intraspot variations. Our studies indicate that relative humidity and printing solution additives reduce evaporation. Based on the spot variability requirements for a particular application, humidity and additives may be chosen to optimize either inter- or intraspot variability.
Affiliation(s)
- Mark K McQuain
- Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
|
19
|
Park T, Yi SG, Kang SH, Lee S, Lee YS, Simon R. Evaluation of normalization methods for microarray data. BMC Bioinformatics 2003; 4:33. [PMID: 12950995 PMCID: PMC200968 DOI: 10.1186/1471-2105-4-33] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.2] [Received: 05/13/2003] [Accepted: 09/02/2003] [Indexed: 11/10/2022]
Abstract
BACKGROUND Microarray technology allows the monitoring of expression levels for thousands of genes simultaneously. This novel technique helps us to understand gene regulation as well as gene-by-gene interactions more systematically. In microarray experiments, however, many undesirable systematic variations are observed. Even in replicated experiments, some variations are commonly observed. Normalization is the process of removing some sources of variation which affect the measured gene expression levels. Although a number of normalization methods have been proposed, it has been difficult to decide which methods perform best. Normalization plays an important role in the early stage of microarray data analysis, and the results of subsequent analyses depend heavily on it. RESULTS In this paper, we use the variability among the replicated slides to compare the performance of normalization methods. We also compare normalization methods with regard to bias and mean square error using simulated data. CONCLUSIONS Our results show that intensity-dependent normalization often performs better than global normalization methods, and that linear and nonlinear normalization methods perform similarly. These conclusions are based on analysis of 36 cDNA microarrays of 3,840 genes obtained in an experiment to search for changes in gene expression profiles during neuronal differentiation of cortical stem cells. Simulation studies confirm our findings.
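The comparison criterion mentioned in this abstract, variability among replicated slides, can be illustrated with a small sketch. It assumes each slide is a list of per-gene log ratios in the same gene order; the function names and the choice of the mean per-gene sample variance as the variability score are illustrative assumptions, since the abstract does not give the exact statistic used.

```python
from statistics import mean, median, variance


def global_median_normalize(log_ratios):
    """Global normalization: shift one slide so its median log ratio is zero."""
    med = median(log_ratios)
    return [m - med for m in log_ratios]


def replicate_variability(slides):
    """Average, over genes, of the sample variance of each gene's
    log ratio across replicate slides (lower is better)."""
    per_gene = zip(*slides)  # regroup values gene by gene
    return mean(variance(values) for values in per_gene)


# Three replicate slides that differ only by a constant per-slide offset.
raw = [
    [0.0, 1.0, 2.0],
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
]
normalized = [global_median_normalize(slide) for slide in raw]

before = replicate_variability(raw)
after = replicate_variability(normalized)
```

In this toy case the slides differ only by an offset, so a global shift removes all between-replicate variability. An intensity-dependent method would be scored in the same way by swapping in a different normalization function.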
Affiliation(s)
- Taesung Park
- Department of Statistics, Seoul National University, Seoul, Korea
- Sung-Gon Yi
- Department of Statistics, Seoul National University, Seoul, Korea
- Sung-Hyun Kang
- Department of Statistics, Seoul National University, Seoul, Korea
- SeungYeoun Lee
- Department of Applied Mathematics, Sejong University, Seoul, Korea
- Yong-Sung Lee
- Department of Biochemistry, Hanyang University College of Medicine, Seoul, Korea
- Richard Simon
- Biometric Research Branch, Division of Cancer Treatment & Diagnosis, National Cancer Institute, Bethesda, MD, USA
|