1
|
Joseph SM, Sathidevi PS. An Automated cDNA Microarray Image Analysis for the Determination of Gene Expression Ratios. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:136-150. [PMID: 34910637 DOI: 10.1109/tcbb.2021.3135650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
This paper proposes a fully automated technique for cDNA microarray image analysis. Initially, an effective preprocessing stage combined with gridding is built to get the individual spot regions of images. Current work begins with the proposal of a new rule to get the foreground (spot) and background regions in the spot blocks, which uses TV-L1 image denoising, spot block binarization, and finds the most accurate spot label by measuring the centroid differences of labelled regions in the block with that of the spot block centroid. The credibility of the segmentation rule on real images is evaluated by metrics: mean absolute error (MAE) and coefficient of variation (CV) and on synthetic images by metrics: probability of error (PE) and discrepancy distance (DD). The performance values on real and synthetic datasets reveal better results than the competitive methods. After the segmentation, prior to the spot intensity extraction, background intensity correction and flagging of noisy spots are executed. Using the lowess method, intensities are normalized, and gene expression ratios are determined. To comprehend the linearities of red and green intensities and to discern up and down-regulated genes (abnormal), fold-change factor, scatter and box plots are also used to represent the gene expression levels.
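The last steps of the pipeline described above (log ratios, normalization, fold-change calls) can be sketched as follows. This is a minimal illustration, not the authors' implementation: global median centering is used here as a much simpler stand-in for lowess normalization, and the function names, the example intensities, and the 2-fold threshold are all assumptions made for the example.

```python
import math
from statistics import median

def log_ratios(red, green):
    """Per-spot log2(R/G) for background-corrected intensities."""
    return [math.log2(r / g) for r, g in zip(red, green)]

def median_center(m_values):
    """Global median centering (assumed stand-in for lowess normalization)."""
    shift = median(m_values)
    return [m - shift for m in m_values]

def call_regulation(m_values, fold=2.0):
    """Label each spot by fold change; fold=2.0 is an illustrative threshold."""
    cut = math.log2(fold)
    labels = []
    for m in m_values:
        if m >= cut:
            labels.append("up")
        elif m <= -cut:
            labels.append("down")
        else:
            labels.append("unchanged")
    return labels

# Hypothetical red/green spot intensities after background correction.
red = [800.0, 200.0, 400.0, 100.0, 1600.0]
green = [200.0, 200.0, 400.0, 400.0, 1600.0]

m = median_center(log_ratios(red, green))
labels = call_regulation(m)
```

A lowess fit of the log ratio against average log intensity would replace `median_center` when the dye bias depends on intensity.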
|
2
|
Ma S, Ren B, Mallick H, Moon YS, Schwager E, Maharjan S, Tickle TL, Lu Y, Carmody RN, Franzosa EA, Janson L, Huttenhower C. A statistical model for describing and simulating microbial community profiles. PLoS Comput Biol 2021; 17:e1008913. [PMID: 34516542 PMCID: PMC8491899 DOI: 10.1371/journal.pcbi.1008913] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/23/2021] [Revised: 10/05/2021] [Accepted: 08/19/2021] [Indexed: 12/26/2022] Open
Abstract
Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or evaluate such methods within a single systematic framework. To address this challenge, we developed SparseDOSSA (Sparse Data Observations for the Simulation of Synthetic Abundances): a statistical model of microbial ecological population structure, which can be used to parameterize real-world microbial community profiles and to simulate new, realistic profiles of known structure for methods evaluation. Specifically, SparseDOSSA's model captures marginal microbial feature abundances as a zero-inflated log-normal distribution, with additional model components for absolute cell counts and the sequence read generation process, microbe-microbe, and microbe-environment interactions. Together, these allow fully known covariance structure between synthetic features (i.e. "taxa") or between features and "phenotypes" to be simulated for method benchmarking. Here, we demonstrate SparseDOSSA's performance for 1) accurately modeling human-associated microbial population profiles; 2) generating synthetic communities with controlled population and ecological structures; 3) spiking-in true positive synthetic associations to benchmark analysis methods; and 4) recapitulating an end-to-end mouse microbiome feeding experiment. Together, these represent the most common analysis types in assessment of real microbial community environmental and epidemiological statistics, thus demonstrating SparseDOSSA's utility as a general-purpose aid for modeling communities and evaluating quantitative methods. An open-source implementation is available at http://huttenhower.sph.harvard.edu/sparsedossa2.
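The marginal model named above, a zero-inflated log-normal distribution, can be illustrated for a single feature with a short sampler. This is a sketch under stated assumptions rather than SparseDOSSA's code or API: the function name, the parameter names (`pi_zero`, `mu`, `sigma`) and their values are hypothetical, and the additional model components (read depth, feature-feature and feature-environment interactions) are not represented.

```python
import random

def sample_zero_inflated_lognormal(n, pi_zero, mu, sigma, seed=0):
    """Draw n abundances for one feature.

    With probability pi_zero the feature is absent (exactly 0);
    otherwise its abundance is log-normal with log-scale mean mu
    and log-scale standard deviation sigma.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < pi_zero:
            draws.append(0.0)
        else:
            draws.append(rng.lognormvariate(mu, sigma))
    return draws

# Hypothetical parameters: 60% structural zeros across 1000 samples.
abundances = sample_zero_inflated_lognormal(1000, pi_zero=0.6, mu=0.0, sigma=1.0)
zero_fraction = sum(a == 0.0 for a in abundances) / len(abundances)
```

Because the true `pi_zero`, `mu` and `sigma` are known for every simulated feature, estimates from an analysis method can be compared directly against them.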
Affiliation(s)
- Siyuan Ma
  - Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Broad Institute, Cambridge, Massachusetts, United States of America
- Boyu Ren
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Broad Institute, Cambridge, Massachusetts, United States of America
- Himel Mallick
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Broad Institute, Cambridge, Massachusetts, United States of America
- Yo Sup Moon
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Emma Schwager
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Sagun Maharjan
  - Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Broad Institute, Cambridge, Massachusetts, United States of America
- Timothy L. Tickle
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Broad Institute, Cambridge, Massachusetts, United States of America
- Yiren Lu
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Rachel N. Carmody
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Eric A. Franzosa
  - Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Broad Institute, Cambridge, Massachusetts, United States of America
- Lucas Janson
  - Department of Statistics, Harvard University, Cambridge, Massachusetts, United States of America
- Curtis Huttenhower
  - Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Broad Institute, Cambridge, Massachusetts, United States of America
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
|
3
|
Larriba Y, Rueda C, Fernández MA, Peddada SD. Microarray Data Normalization and Robust Detection of Rhythmic Features. Methods Mol Biol 2019; 1986:207-225. [PMID: 31115890 DOI: 10.1007/978-1-4939-9442-7_9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/09/2023]
Abstract
Data derived from microarray technologies are generally subject to various sources of noise and accordingly the raw data are pre-processed before being formally analysed. Data normalization is a key pre-processing step when dealing with microarray experiments, such as circadian gene-expressions, since it removes systematic variations across arrays. A wide variety of normalization methods are available in the literature. However, from our experience in the study of rhythmic expression patterns in oscillatory systems (e.g. cell-cycle, circadian clock), the choice of the normalization method may substantially impair the identification of rhythmic genes. Hence, the identification of a gene as rhythmic could be just an artefact of how the data were normalized. Yet, gene rhythmicity detection is crucial in modern toxicological and pharmacological studies, thus a procedure to truly identify rhythmic genes that are robust to the choice of a normalization method is required. To perform the task of detecting rhythmic features, we propose a rhythmicity measure based on bootstrap methodology to robustly identify rhythmic genes in oscillatory systems. Although our methodology can be extended to any high-throughput experiment, in this chapter, we illustrate how to apply it to publicly available circadian clock microarray gene-expression data and give full details (both statistical and computational) so that the methodology can be used in an easy way. We will show that the choice of normalization method has very little effect on the proposed methodology since the results derived from the bootstrap-based rhythmicity measure are highly rank correlated for any pair of normalization methods considered. This suggests, on the one hand, that the rhythmicity measure proposed is robust to the choice of the normalization method, and on the other hand, that gene rhythmicity detected using this measure is potentially not a mere artefact of the normalization method used. In this way the researcher using this methodology will be protected against the possible effect of different normalizations, as the conclusions obtained will not depend so strongly on them. Additionally, the described bootstrap methodology can also be employed as a tool to simulate gene-expression participating in an oscillatory system from a reference data set.
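The robustness check described above rests on rank correlation between the rhythmicity scores obtained under two normalization methods. A minimal sketch of that comparison is given below. The scores and the normalization labels are invented for illustration, the rhythmicity measure itself is not reproduced, and the closed-form Spearman formula used here assumes there are no tied scores.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical per-gene rhythmicity scores under two normalizations.
scores_norm_a = [0.10, 0.50, 0.90, 0.30]
scores_norm_b = [1.00, 2.00, 5.00, 1.50]

rho = spearman(scores_norm_a, scores_norm_b)
```

A value of `rho` close to 1 means both normalizations order the genes in nearly the same way, which is the sense of robustness the abstract reports.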
Affiliation(s)
- Yolanda Larriba
  - Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
- Cristina Rueda
  - Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
- Miguel A Fernández
  - Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
- Shyamal D Peddada
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
|
4
|
Larriba Y, Rueda C, Fernández MA, Peddada SD. A Bootstrap Based Measure Robust to the Choice of Normalization Methods for Detecting Rhythmic Features in High Dimensional Data. Front Genet 2018; 9:24. [PMID: 29456555 PMCID: PMC5801422 DOI: 10.3389/fgene.2018.00024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/03/2017] [Accepted: 01/17/2018] [Indexed: 01/01/2023] Open
Abstract
Motivation: Gene-expression data obtained from high throughput technologies are subject to various sources of noise and accordingly the raw data are pre-processed before formally analyzed. Normalization of the data is a key pre-processing step, since it removes systematic variations across arrays. There are numerous normalization methods available in the literature. Based on our experience, in the context of oscillatory systems, such as cell-cycle, circadian clock, etc., the choice of the normalization method may substantially impact the determination of a gene to be rhythmic. Thus rhythmicity of a gene can purely be an artifact of how the data were normalized. Since the determination of rhythmic genes is an important component of modern toxicological and pharmacological studies, it is important to determine truly rhythmic genes that are robust to the choice of a normalization method. Results: In this paper we introduce a rhythmicity measure and a bootstrap methodology to detect rhythmic genes in an oscillatory system. Although the proposed methodology can be used for any high-throughput gene expression data, in this paper we illustrate the proposed methodology using several publicly available circadian clock microarray gene-expression datasets. We demonstrate that the choice of normalization method has very little effect on the proposed methodology. Specifically, for any pair of normalization methods considered in this paper, the resulting values of the rhythmicity measure are highly correlated. Thus it suggests that the proposed measure is robust to the choice of a normalization method. Consequently, the rhythmicity of a gene is potentially not a mere artifact of the normalization method used. Lastly, as demonstrated in the paper, the proposed bootstrap methodology can also be used for simulating data for genes participating in an oscillatory system using a reference dataset. 
Availability: A user friendly code implemented in R language can be downloaded from http://www.eio.uva.es/~miguel/robustdetectionprocedure.html
Affiliation(s)
- Yolanda Larriba
  - Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
- Cristina Rueda
  - Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
- Miguel A Fernández
  - Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
- Shyamal D Peddada
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, United States
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
|
5
|
Kang S, Song J. Robust gene selection methods using weighting schemes for microarray data analysis. BMC Bioinformatics 2017; 18:389. [PMID: 28865426 PMCID: PMC5581932 DOI: 10.1186/s12859-017-1810-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/23/2017] [Accepted: 08/27/2017] [Indexed: 11/10/2022] Open
Abstract
Background: A common task in microarray data analysis is to identify informative genes that are differentially expressed between two different states. Owing to the high-dimensional nature of microarray data, identification of significant genes has been essential in analyzing the data. However, the performances of many gene selection techniques are highly dependent on the experimental conditions, such as the presence of measurement error or a limited number of sample replicates. Results: We have proposed new filter-based gene selection techniques, by applying a simple modification to significance analysis of microarrays (SAM). To prove the effectiveness of the proposed method, we considered a series of synthetic datasets with different noise levels and sample sizes along with two real datasets. The following findings were made. First, our proposed methods outperform conventional methods for all simulation set-ups. In particular, our methods are much better when the given data are noisy and sample size is small. They showed relatively robust performance regardless of noise level and sample size, whereas the performance of SAM became significantly worse as the noise level became high or sample size decreased. When sufficient sample replicates were available, SAM and our methods showed similar performance. Finally, our proposed methods are competitive with traditional methods in classification tasks for microarrays. Conclusions: The results of simulation study and real data analysis have demonstrated that our proposed methods are effective for detecting significant genes and classification tasks, especially when the given data are noisy or have few sample replicates. By employing weighting schemes, we can obtain robust and reliable results for microarray data analysis. Electronic supplementary material: The online version of this article (10.1186/s12859-017-1810-x) contains supplementary material, which is available to authorized users.
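For context, the baseline that the abstract modifies, the SAM relative-difference statistic, can be sketched for one gene as below. This shows only standard SAM, d = (mean1 - mean2) / (s + s0) with a pooled standard error s; the weighting schemes proposed in the paper are not described in the abstract and are not reproduced. The function name, the example expression values, and the value of the fudge factor `s0` are assumptions for illustration.

```python
import math

def sam_statistic(group1, group2, s0=0.1):
    """SAM relative difference for one gene across two conditions.

    s is the pooled standard error of the mean difference; s0 is a small
    positive constant that damps genes with very low variance.
    """
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)

# Hypothetical log-expression values for one gene in two states.
treated = [5.0, 6.0, 7.0]
control = [1.0, 2.0, 3.0]

d_plain = sam_statistic(treated, control, s0=0.0)   # ordinary t-like statistic
d_damped = sam_statistic(treated, control, s0=0.1)  # SAM with fudge factor
```

With few replicates the estimate of `s` is itself noisy, which is the regime where the abstract reports that plain SAM degrades.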
Affiliation(s)
- Suyeon Kang
  - Department of Statistics, Ewha Womans University, Seoul, South Korea
- Jongwoo Song
  - Department of Statistics, Ewha Womans University, Seoul, South Korea
|
6
|
Katsigiannis S, Zacharia E, Maroulis D. MIGS-GPU: Microarray Image Gridding and Segmentation on the GPU. IEEE J Biomed Health Inform 2016; 21:867-874. [PMID: 26960232 DOI: 10.1109/jbhi.2016.2537922] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/09/2022]
Abstract
Complementary DNA (cDNA) microarray is a powerful tool for simultaneously studying the expression level of thousands of genes. Nevertheless, the analysis of microarray images remains an arduous and challenging task due to the poor quality of the images that often suffer from noise, artifacts, and uneven background. In this study, the MIGS-GPU [Microarray Image Gridding and Segmentation on Graphics Processing Unit (GPU)] software for gridding and segmenting microarray images is presented. MIGS-GPU's computations are performed on the GPU by means of the compute unified device architecture (CUDA) in order to achieve fast performance and increase the utilization of available system resources. Evaluation on both real and synthetic cDNA microarray images showed that MIGS-GPU provides better performance than state-of-the-art alternatives, while the proposed GPU implementation achieves significantly lower computational times compared to the respective CPU approaches. Consequently, MIGS-GPU can be an advantageous and useful tool for biomedical laboratories, offering a user-friendly interface that requires minimum input in order to run.
|
7
|
A Synthetic Kinome Microarray Data Generator. MICROARRAYS 2015; 4:432-53. [PMID: 27600233 PMCID: PMC4996406 DOI: 10.3390/microarrays4040432] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/09/2015] [Revised: 08/25/2015] [Accepted: 10/10/2015] [Indexed: 02/02/2023]
Abstract
Cellular pathways involve the phosphorylation and dephosphorylation of proteins. Peptide microarrays called kinome arrays facilitate the measurement of the phosphorylation activity of hundreds of proteins in a single experiment. Analyzing the data from kinome microarrays is a multi-step process. Typically, various techniques are possible for a particular step, and it is necessary to compare and evaluate them. Such evaluations require data for which correct analysis results are known. Unfortunately, such kinome data is not readily available in the community. Further, there are no established techniques for creating artificial kinome datasets with known results and with the same characteristics as real kinome datasets. In this paper, a methodology for generating synthetic kinome array data is proposed. The methodology relies on actual intensity measurements from kinome microarray experiments and preserves their subtle characteristics. The utility of the methodology is demonstrated by evaluating methods for eliminating heterogeneous variance in kinome microarray data. Phosphorylation intensities from kinome microarrays often exhibit such heterogeneous variance and its presence can negatively impact downstream statistical techniques that rely on homogeneity of variance. It is shown that using the output from the proposed synthetic data generator, it is possible to critically compare two variance stabilization methods.
|
8
|
Mizeranschi A, Zheng H, Thompson P, Dubitzky W. Evaluating a common semi-mechanistic mathematical model of gene-regulatory networks. BMC SYSTEMS BIOLOGY 2015; 9 Suppl 5:S2. [PMID: 26356485 PMCID: PMC4565562 DOI: 10.1186/1752-0509-9-s5-s2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/15/2023]
Abstract
Modeling and simulation of gene-regulatory networks (GRNs) has become an important aspect of modern systems biology investigations into mechanisms underlying gene regulation. A key challenge in this area is the automated inference (reverse-engineering) of dynamic, mechanistic GRN models from gene expression time-course data. Common mathematical formalisms for representing such models capture two aspects simultaneously within a single parameter: (1) whether or not a gene is regulated, and if so, the type of regulator (activator or repressor), and (2) the strength of influence of the regulator (if any) on the target or effector gene. To accommodate both roles, "generous" boundaries or limits for possible values of this parameter are commonly allowed in the reverse-engineering process. This approach has several important drawbacks. First, in the absence of good guidelines, there is no consensus on what limits are reasonable. Second, because the limits may vary greatly among different reverse-engineering experiments, the concrete values obtained for the models may differ considerably, and thus it is difficult to compare models. Third, if high values are chosen as limits, the search space of the model inference process becomes very large, adding unnecessary computational load to the already complex reverse-engineering process. In this study, we demonstrate that restricting the limits to the [−1, +1] interval is sufficient to represent the essential features of GRN systems and offers a reduction of the search space without loss of quality in the resulting models. To show this, we have carried out reverse-engineering studies on data generated from artificial GRN systems and on data experimentally determined from real GRN systems.
|
9
|
Hendrickx DM, Jennen DGJ, Briedé JJ, Cavill R, de Kok TM, Kleinjans JCS. Pattern recognition methods to relate time profiles of gene expression with phenotypic data: a comparative study. Bioinformatics 2015; 31:2115-22. [DOI: 10.1093/bioinformatics/btv108] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/11/2014] [Accepted: 02/16/2015] [Indexed: 12/13/2022] Open
|
10
|
Katsigiannis S, Zacharia E, Maroulis D. Grow-cut based automatic cDNA microarray image segmentation. IEEE Trans Nanobioscience 2014; 14:138-45. [PMID: 25438323 DOI: 10.1109/tnb.2014.2369961] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
Complementary DNA (cDNA) microarray is a well-established tool for simultaneously studying the expression level of thousands of genes. Segmentation of microarray images is one of the main stages in a microarray experiment. However, it remains an arduous and challenging task due to the poor quality of images. Images suffer from noise, artifacts, and uneven background, while spots depicted on images can be poorly contrasted and deformed. In this paper, an original approach for the segmentation of cDNA microarray images is proposed. First, a preprocessing stage is applied in order to reduce the noise levels of the microarray image. Then, the grow-cut algorithm is applied separately to each spot location, employing an automated seed selection procedure, in order to locate the pixels belonging to spots. Application on datasets containing synthetic and real microarray images shows that the proposed algorithm performs better than other previously proposed methods. Moreover, in order to exploit the independence of the segmentation task for each separate spot location, both a multithreaded CPU and a graphics processing unit (GPU) implementation were evaluated.
|
11
|
SITDEM: a simulation tool for disease/endpoint models of association studies based on single nucleotide polymorphism genotypes. Comput Biol Med 2013; 45:136-42. [PMID: 24480173 DOI: 10.1016/j.compbiomed.2013.11.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/08/2013] [Revised: 11/24/2013] [Accepted: 11/26/2013] [Indexed: 01/29/2023]
Abstract
The association analysis between single nucleotide polymorphisms (SNPs) and disease or endpoint in genome-wide association studies (GWAS) has been considered as a powerful strategy for investigating genetic susceptibility and for identifying significant biomarkers. The statistical analysis approaches with simulated data have been widely used to review experimental designs and performance measurements. In recent years, a number of authors have proposed methods for the simulation of biological data in the genomic field. However, these methods use large-scale genomic data as a reference to simulate experiments, which may limit the use of the methods in the case where the data in specific studies are not available. Few methods use experimental results or observed parameters for simulation. The goal of this study is to develop a Web application called SITDEM to simulate disease/endpoint models in three different approaches based on only parameters observed in GWAS. In our simulation, a key task is to compute the probability of genotypes. Based on that, we randomly sample simulation data. Simulation results are shown as a function of p-value against odds ratio or relative risk of a SNP in dominant and recessive models. Our simulation results show the potential of SITDEM for simulating genotype data. SITDEM could be particularly useful for investigating the relationship among observed parameters for target SNPs and for estimating the number of variables (SNPs) required to result in significant p-values in multiple comparisons. The proposed simulation tool is freely available at http://www.snpmodel.com.
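The two steps the abstract singles out, computing genotype probabilities and then sampling from them, can be illustrated in a few lines. This is not SITDEM's procedure: the sketch assumes Hardy-Weinberg equilibrium and a single biallelic SNP, neither of which the abstract states, and the function names and the minor allele frequency of 0.2 are hypothetical.

```python
import random

def hwe_genotype_probs(maf):
    """Genotype probabilities under Hardy-Weinberg equilibrium (assumed).

    maf is the frequency of the minor allele 'a'.
    """
    p = 1.0 - maf
    return {"AA": p * p, "Aa": 2.0 * p * maf, "aa": maf * maf}

def sample_genotypes(n, maf, seed=0):
    """Randomly draw n genotypes according to the computed probabilities."""
    rng = random.Random(seed)
    probs = hwe_genotype_probs(maf)
    return rng.choices(list(probs), weights=list(probs.values()), k=n)

# Hypothetical SNP with minor allele frequency 0.2, 500 simulated subjects.
probs = hwe_genotype_probs(0.2)
genotypes = sample_genotypes(500, 0.2)
```

A disease model would add a second step that assigns case or control status with a genotype-dependent risk, for example through an odds ratio under a dominant or recessive model.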
|
12
|
Flores JL, Inza I, Larrañaga P, Calvo B. A new measure for gene expression biclustering based on non-parametric correlation. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2013; 112:367-397. [PMID: 24079964 DOI: 10.1016/j.cmpb.2013.07.025] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 09/16/2012] [Revised: 06/14/2013] [Accepted: 07/26/2013] [Indexed: 06/02/2023]
Abstract
BACKGROUND: One of the emerging techniques for performing the analysis of DNA microarray data, known as biclustering, is the search for subsets of genes and conditions which are coherently expressed. These subgroups provide clues about the main biological processes. Until now, different approaches to this problem have been proposed. Most of them use the mean squared residue as a quality measure, but relevant and interesting patterns, such as shifting or scaling patterns, cannot be detected. Furthermore, recent papers show that there exist new coherence patterns involved in different kinds of cancer and tumors, such as inverse relationships between genes, which cannot be captured. RESULTS: The proposed measure is called Spearman's biclustering measure (SBM), which estimates the quality of a bicluster based on the non-linear correlation among genes and conditions simultaneously. The search for biclusters is performed by using an evolutionary technique called estimation of distribution algorithms, which uses the SBM measure as fitness function. This approach has been examined from different points of view by using artificial and real microarrays. The assessment process has involved the use of quality indexes, a set of bicluster patterns of reference including new patterns, and a set of statistical tests. The performance has also been examined using real microarrays and compared with different algorithmic approaches such as Bimax, CC, OPSM, Plaid and xMotifs. CONCLUSIONS: SBM shows several advantages, such as the ability to recognize more complex coherence patterns such as shifting, scaling and inversion, and the capability to selectively marginalize genes and conditions depending on the statistical significance.
Affiliation(s)
- Jose L Flores
  - Intelligent Systems Group, Department of Computer Sciences and Artificial Intelligence, University of the Basque Country, P.O. Box 649, 20080 Donostia - San Sebastian, Spain
|
13
|
Hedjazi L, Le Lann MV, Kempowsky T, Dalenc F, Aguilar-Martin J, Favre G. Symbolic data analysis to defy low signal-to-noise ratio in microarray data for breast cancer prognosis. J Comput Biol 2013; 20:610-20. [PMID: 23899014 DOI: 10.1089/cmb.2012.0249] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022] Open
Abstract
Microarray profiling has recently generated the hope to gain new insights into breast cancer biology and thereby improve the performance of current prognostic tools. However, it also poses several serious challenges to classical data analysis techniques related to the characteristics of resulting data, mainly high dimensionality and low signal-to-noise ratio. Despite the tremendous research work performed to handle the first challenge in the feature selection framework, very little attention has been directed to address the second one. We propose in this article to address both issues simultaneously based on symbolic data analysis capabilities in order to derive more accurate genetic marker-based prognostic models. In particular, interval data representation is employed to model various uncertainties in microarray measurements. A recent feature selection algorithm that handles symbolic interval data is used then to derive a genetic signature. The predictive value of the derived signature is then assessed by following a rigorous experimental setup and compared with existing prognostic approaches in terms of predictive performance and estimated survival probability. It is shown that the derived signature (GenSym) performs significantly better than other prognostic models, including the 70-gene signature, St. Gallen, and National Institutes of Health criteria.
|
14
|
Giannakeas N, Karvelis PS, Exarchos TP, Kalatzis FG, Fotiadis DI. Segmentation of microarray images using pixel classification—Comparison with clustering-based methods. Comput Biol Med 2013; 43:705-16. [DOI: 10.1016/j.compbiomed.2013.03.003] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2012] [Revised: 07/26/2012] [Accepted: 03/14/2013] [Indexed: 11/16/2022]
|
15
|
Dembélé D. A Flexible Microarray Data Simulation Model. MICROARRAYS 2013; 2:115-30. [PMID: 27605184 PMCID: PMC5003477 DOI: 10.3390/microarrays2020115] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 03/01/2013] [Revised: 04/07/2013] [Accepted: 04/15/2013] [Indexed: 11/16/2022]
Abstract
Microarray technology allows monitoring of gene expression profiling at the genome level. This is useful in order to search for genes involved in a disease. The performances of the methods used to select interesting genes are most often judged after other analyses (qPCR validation, search in databases...), which are also subject to error. A good evaluation of gene selection methods is possible with data whose characteristics are known, that is to say, synthetic data. We propose a model to simulate microarray data with similar characteristics to the data commonly produced by current platforms. The parameters used in this model are described to allow the user to generate data with varying characteristics. In order to show the flexibility of the proposed model, a commented example is given and illustrated. An R package is available for immediate use.
Affiliation(s)
- Doulaye Dembélé
- Microarray Platform, IGBMC, CNRS-INSERM-UdS, 1 rue Laurent Fries, Parc d'Innovation, 67400 Illkirch, France.
|
16
|
Yoo C, Brilz EM, Wilcox M, Pershouse MA, Putnam EA. Gene Pathways Discovery in Asbestos-Related Diseases using Local Causal Discovery Algorithm. COMMUN STAT-SIMUL C 2012. [DOI: 10.1080/03610918.2011.621573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
17
|
Zhang J, Coombes KR. Sources of variation in false discovery rate estimation include sample size, correlation, and inherent differences between groups. BMC Bioinformatics 2012; 13 Suppl 13:S1. [PMID: 23320794 PMCID: PMC3426804 DOI: 10.1186/1471-2105-13-s13-s1] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Background High-throughput technologies enable the testing of tens of thousands of measurements simultaneously. Identification of genes that are differentially expressed or associated with clinical outcomes invokes the multiple testing problem. False Discovery Rate (FDR) control is a statistical method used to correct for multiple comparisons for independent or weakly dependent test statistics. Although FDR control is frequently applied to microarray data analysis, gene expression is usually correlated, which might lead to inaccurate estimates. In this paper, we evaluate the accuracy of FDR estimation. Methods Using two real data sets, we resampled subgroups of patients and recalculated statistics of interest to illustrate the imprecision of FDR estimation. Next, we generated many simulated data sets with block correlation structures and realistic noise parameters, using the Ultimate Microarray Prediction, Inference, and Reality Engine (UMPIRE) R package. We estimated FDR using a beta-uniform mixture (BUM) model, and examined the variation in FDR estimation. Results The three major sources of variation in FDR estimation are the sample size, correlations among genes, and the true proportion of differentially expressed genes (DEGs). The sample size and proportion of DEGs affect both magnitude and precision of FDR estimation, while the correlation structure mainly affects the variation of the estimated parameters. Conclusions We have decomposed various factors that affect FDR estimation, and illustrated the direction and extent of the impact. We found that the proportion of DEGs has a significant impact on FDR; this factor might have been overlooked in previous studies and deserves more thought when controlling FDR.
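Illustrative note (not part of the abstract): the multiple-testing correction mentioned above can be sketched with the Benjamini-Hochberg procedure. This is a swapped-in, simpler technique, not the beta-uniform mixture (BUM) estimate used in the paper; the function name `benjamini_hochberg` is hypothetical, and the procedure assumes independent or weakly dependent p-values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (illustrative sketch only).

    Assumption: p-values are independent or weakly dependent. This is NOT
    the BUM-based FDR estimate described in the abstract.
    """
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Genes whose adjusted p-value falls below a chosen level (for example 0.05) would be called significant at that nominal FDR; as the abstract stresses, correlation among genes can make such estimates imprecise.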
Affiliation(s)
- Jiexin Zhang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
|
18
|
Zacharia E, Maroulis DE. 3-D Spot Modeling for Automatic Segmentation of cDNA Microarray Images. IEEE Trans Nanobioscience 2010; 9:181-92. [DOI: 10.1109/tnb.2010.2050900] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
19
|
Zeisel A, Amir A, Köstler WJ, Domany E. Intensity dependent estimation of noise in microarrays improves detection of differentially expressed genes. BMC Bioinformatics 2010; 11:400. [PMID: 20663218 PMCID: PMC2920277 DOI: 10.1186/1471-2105-11-400] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2010] [Accepted: 07/27/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In many microarray experiments, analysis is severely hindered by a major difficulty: the small number of samples for which expression data has been measured. When one searches for differentially expressed genes, the small number of samples gives rise to an inaccurate estimation of the experimental noise. This, in turn, leads to loss of statistical power. RESULTS We show that the measurement noise of genes with similar expression levels (intensity) is identically and independently distributed, and that this (intensity dependent) distribution is approximately normal. Our method can be easily adapted and used to test whether these statements hold for data from any particular microarray experiment. We propose a method that provides an accurate estimation of the intensity-dependent variance of the noise distribution, and demonstrate that using this estimation we can detect differential expression with much better statistical power than that of the standard t-test, and can compare the noise levels of different experiments and platforms. CONCLUSIONS When the number of samples is small, the simple method we propose significantly improves the statistical power in identifying differentially expressed genes.
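Illustrative note (not part of the abstract): one way to picture an intensity-dependent noise estimate is to pool replicate differences over genes of similar intensity. The sketch below is an assumption-laden simplification, not the authors' estimator: the function name `intensity_dependent_sd`, the sliding-window pooling, and the zero-mean noise assumption are all hypothetical choices made for illustration.

```python
import math

def intensity_dependent_sd(mean_intensity, diff, window=50):
    """Estimate a per-gene noise SD from genes of similar intensity.

    mean_intensity: per-gene average log intensity across two replicates.
    diff: per-gene difference between the two replicates (log scale).
    Assumption (hypothetical): noise is zero-mean, so the SD is the root
    mean square of the differences among the `window` nearest genes by rank.
    """
    n = len(mean_intensity)
    order = sorted(range(n), key=lambda i: mean_intensity[i])
    sd = [0.0] * n
    half = window // 2
    for pos, idx in enumerate(order):
        # Clamp the window to the valid range while keeping its size.
        lo = max(0, pos - half)
        hi = min(n, lo + window)
        lo = max(0, hi - window)
        neighbours = [diff[order[j]] for j in range(lo, hi)]
        sd[idx] = math.sqrt(sum(d * d for d in neighbours) / len(neighbours))
    return sd
```

Dividing each gene's observed difference between conditions by such a pooled SD gives a z-like score that borrows strength from many genes, which is the general idea behind gaining power when samples are few.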
Affiliation(s)
- Amit Zeisel
- Department of Physics of Complex Systems, The Weizmann Institute of Science, Rehovot, Israel
|
20
|
Daskalakis A, Glotsos D, Kostopoulos S, Cavouras D, Nikiforidis G. A comparative study of individual and ensemble majority vote cDNA microarray image segmentation schemes, originating from a spot-adjustable based restoration framework. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2009; 95:72-88. [PMID: 19278747 DOI: 10.1016/j.cmpb.2009.01.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/19/2008] [Revised: 09/23/2008] [Accepted: 01/12/2009] [Indexed: 05/27/2023]
Abstract
The aim of this study was to comparatively evaluate the performances of various segmentation algorithms, in conjunction with a noise reduction step, for gene expression levels intensity extraction in cDNA microarray images. Different segmentation algorithms, based on histogram and unsupervised classification methods, which have never been previously employed in microarray image analysis, were employed either individually or in ensemble majority vote structures for separating spot-images from background pixels. The performances of segmentation algorithms or ensemble structures were evaluated by assessing the validity and reproducibility of gene expression levels extraction in simulated and real cDNA microarray images. By processing high quality simulated images, the highest segmentation accuracy was achieved by an ensemble structure (Histogram Concavity, Gaussian Kernelized Fuzzy-C-Means, Seeded Region Growing). Optimum performance in terms of processing time and segmentation precision for low quality simulated and replicated real cDNA microarray images was attained by the Histogram Concavity algorithm.
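Illustrative note (not part of the abstract): an ensemble majority vote over segmentation results can be sketched as a pixel-wise vote across binary masks. The function name `majority_vote` and the list-of-lists mask representation are assumptions made for this example; the individual segmentation algorithms named in the abstract are not implemented here.

```python
def majority_vote(masks):
    """Combine binary segmentation masks by pixel-wise majority vote.

    masks: list of equally sized 2-D lists with entries 0 (background)
    or 1 (spot). A pixel is labelled spot when more than half of the
    masks agree. Illustrative sketch only.
    """
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]
```

Using an odd number of member algorithms, as in the three-member ensemble the abstract reports, avoids ties.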
Affiliation(s)
- Antonis Daskalakis
- Department of Medical Physics, Medical Image Processing and Analysis Laboratory, School of Medicine, University of Patras, Rio, Patras, Greece.
|
21
|
Shan WJ, Tong CF, Shi JS. [Comparison of statistical methods for detecting differential expression in microarray data]. YI CHUAN = HEREDITAS 2009; 30:1640-6. [PMID: 19073583 DOI: 10.3724/sp.j.1005.2008.01640] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
The DNA microarray is a new tool in biotechnology that allows the expression of thousands of genes in cells to be monitored simultaneously. The goal of differential gene expression analysis is to detect genes whose expression levels change significantly in response to experimental conditions. Although various statistical methods have been suggested to confirm differential gene expression, only a few studies have compared the performance of these methods. This paper presents a comparison of statistical methods for finding differentially expressed genes (DEGs) from microarray data. Using simulated and real datasets (Populus cDNA microarray data), we compared eight methods of identifying differential gene expression. The simulated datasets included four different distributions (normal distribution, uniform distribution, χ² distribution, and exponential distribution). The analysis of the simulated datasets showed that the eight methods performed better on microarray data with a uniform distribution than with a normal distribution, and that they performed poorly with the χ² and exponential distributions. Of these eight methods, SAM (Significance Analysis of Microarrays) and the Wilcoxon rank sum test performed well in most cases. The results on the real Populus cDNA microarray data showed that SAM, Samroc, and the regression modeling approach gave very similar results, whereas the Wilcoxon rank sum test differed from them. Among the eight methods, Samroc and the regression modeling approach were similar to each other. For both simulated and real datasets, SAM, Samroc, and the regression modeling approach performed better than the other methods.
Affiliation(s)
- Wen-Juan Shan
- The Key Laboratory of Forest Genetics and Gene Engineering of the State Administration and Jiangsu Province, Nanjing Forestry University, Nanjing 210037, China.
|
22
|
|
23
|
Marbach D, Schaffter T, Mattiussi C, Floreano D. Generating Realistic In Silico Gene Networks for Performance Assessment of Reverse Engineering Methods. J Comput Biol 2009; 16:229-39. [PMID: 19183003 DOI: 10.1089/cmb.2008.09tt] [Citation(s) in RCA: 300] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Affiliation(s)
- Daniel Marbach
- Ecole Polytechnique Fédérale de Lausanne (EPFL), Laboratory of Intelligent Systems, Lausanne, Switzerland
- Thomas Schaffter
- Ecole Polytechnique Fédérale de Lausanne (EPFL), Laboratory of Intelligent Systems, Lausanne, Switzerland
- Claudio Mattiussi
- Ecole Polytechnique Fédérale de Lausanne (EPFL), Laboratory of Intelligent Systems, Lausanne, Switzerland
- Dario Floreano
- Ecole Polytechnique Fédérale de Lausanne (EPFL), Laboratory of Intelligent Systems, Lausanne, Switzerland
|
24
|
Siegmund K, Ahlborn C, Richert C. ChipCheckII - predicting binding curves for multiple analyte strands on small DNA microarrays. NUCLEOSIDES NUCLEOTIDES & NUCLEIC ACIDS 2008; 27:376-88. [PMID: 18404572 DOI: 10.1080/15257770801944147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Incomplete binding, saturation, and cross-hybridization between partially complementary strands complicate the parallel detection of nucleic acids via DNA microarrays. Treating the competing equilibria governing binding to microarrays requires computational tools. We have developed the web-based program ChipCheckII that calculates total hybridization matrices for target strands interacting with probes on small DNA microarrays. The program can be used to compute the extent of cross-hybridization and other phenomena affecting fidelity of detection based on sequences, quantities of strands, and hybridization conditions as inputs. Enthalpy and entropy of duplex formation are generated locally with UNAfold, including those for complexes that are partially matched. Simulated binding versus temperature curves for portions of a commercial genome chip demonstrate the extent to which cross-hybridization can complicate DNA detection. ChipCheckII is expected to aid nucleic acid chemists in developing high fidelity DNA microarrays.
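Illustrative note (not part of the abstract): a binding-versus-temperature curve for a single probe and a single target in excess can be sketched with a two-state model, K = exp(-(ΔH - TΔS)/(RT)) and fraction bound θ = Kc/(1 + Kc). This is a deliberately simplified assumption; it ignores the competing equilibria and cross-hybridization that ChipCheckII is described as treating, and the function name and parameter values are hypothetical.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol*K)

def fraction_bound(temp_k, delta_h, delta_s, target_conc):
    """Two-state fraction of probe bound at temperature temp_k (kelvin).

    delta_h in cal/mol, delta_s in cal/(mol*K), target_conc in mol/L.
    Assumption (hypothetical simplification): a single target in large
    excess over the probe, no competing strands.
    """
    delta_g = delta_h - temp_k * delta_s
    k_assoc = math.exp(-delta_g / (R_CAL * temp_k))
    kc = k_assoc * target_conc
    return kc / (1.0 + kc)

# Example thermodynamic values chosen only for illustration.
curve = [(t, fraction_bound(t, -80000.0, -220.0, 1e-6)) for t in range(290, 351, 10)]
```

In this model the half-bound temperature is ΔH / (ΔS + R ln c), and the fraction bound falls as temperature rises whenever ΔH is negative.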
Affiliation(s)
- Karsten Siegmund
- Institute for Organic Chemistry, University of Karlsruhe, Karlsruhe, Germany
|
25
|
Xiong H, Zhang D, Martyniuk CJ, Trudeau VL, Xia X. Using generalized procrustes analysis (GPA) for normalization of cDNA microarray data. BMC Bioinformatics 2008; 9:25. [PMID: 18199333 PMCID: PMC2275243 DOI: 10.1186/1471-2105-9-25] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2007] [Accepted: 01/16/2008] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND Normalization is essential in dual-labelled microarray data analysis to remove non-biological variations and systematic biases. Many normalization methods have been used to remove such biases within slides (Global, Lowess) and across slides (Scale, Quantile and VSN). However, all these popular approaches make critical assumptions about data distribution, which are often not valid in practice. RESULTS In this study, we propose a novel assumption-free normalization method based on the Generalized Procrustes Analysis (GPA) algorithm. Using experimental and simulated normal microarray data and boutique array data, we systematically evaluate the ability of the GPA method in normalization compared with six other popular normalization methods including Global, Lowess, Scale, Quantile, VSN, and one boutique array-specific housekeeping gene method. The assessment of these methods is based on three different empirical criteria: across-slide variability, the Kolmogorov-Smirnov (K-S) statistic and the mean square error (MSE). Compared with other methods, the GPA method performs effectively and consistently better in reducing across-slide variability and removing systematic bias. CONCLUSION The GPA method is an effective normalization approach for microarray data analysis. In particular, it is free from the statistical and biological assumptions inherent in other normalization methods that are often difficult to validate. Therefore, the GPA method has a major advantage in that it can be applied to diverse types of array sets, especially to the boutique array where the majority of genes may be differentially expressed.
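Illustrative note (not part of the abstract): of the within-slide methods listed above, global normalization is the simplest to show. The sketch below centres the log2 red/green ratios on their median; it is a baseline example, not the GPA method proposed in the paper, and the function name `global_median_normalize` is hypothetical.

```python
import math
import statistics

def global_median_normalize(red, green):
    """Return log2(red/green) ratios shifted so their median is zero.

    Assumptions (for illustration): all intensities are positive, and
    most genes are not differentially expressed, so the median log ratio
    reflects dye bias rather than biology.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    centre = statistics.median(log_ratios)
    return [m - centre for m in log_ratios]
```

The second assumption is exactly the kind the abstract flags as fragile: on a boutique array where most genes change, centring on the median would remove real signal.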
Affiliation(s)
- Huiling Xiong
- Centre for Advanced Research in Environmental Genomics, Department of Biology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada.
|
26
|
Giannakeas N, Karvelis PS, Fotiadis DI. A classification-based segmentation of cDNA microarray images using Support Vector Machines. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2008; 2008:875-878. [PMID: 19162796 DOI: 10.1109/iembs.2008.4649293] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
Microarray technology provides a tool for the simultaneous analysis of the expression levels of a large number of genes. Microarray studies have shown that image processing techniques can significantly influence microarray data precision. In this paper we propose a supervised method for the segmentation of microarray images based on classification techniques. A Support Vector Machine is employed to classify each pixel of the image into signal, background or artefacts. In addition, a preprocessing step is applied in order to filter the initial image. The proposed method is applied both to real and simulated images. The pixels of the image are classified into two classes for the real images and three classes for the simulated ones. For this task, an informative set of features is used from both green and red channels. The results obtained indicate high accuracy (approximately 99%).
Affiliation(s)
- Nikolaos Giannakeas
- Laboratory of Biological Chemistry, Medical School, University of Ioannina, Ioannina GR 45110, Greece.
|
27
|
Kim HY, Lee SE, Kim MJ, Han JI, Kim BK, Lee YS, Lee YS, Kim JH. Characterization and simulation of cDNA microarray spots using a novel mathematical model. BMC Bioinformatics 2007; 8:485. [PMID: 18096047 PMCID: PMC2267720 DOI: 10.1186/1471-2105-8-485] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2007] [Accepted: 12/20/2007] [Indexed: 11/30/2022] Open
Abstract
Background The quality of cDNA microarray data is crucial for expanding its application to other research areas, such as the study of gene regulatory networks. Despite the fact that a number of algorithms have been suggested to increase the accuracy of microarray gene expression data, it is necessary to obtain reliable microarray images by improving wet-lab experiments. As the first step of a cDNA microarray experiment, spotting cDNA probes is critical to determining the quality of spot images. Results We developed a governing equation of cDNA deposition during evaporation of a drop in the microarray spotting process. The governing equation included four parameters: the surface site density on the support, the extrapolated equilibrium constant for the binding of cDNA molecules with surface sites on glass slides, the macromolecular interaction factor, and the volume constant of a drop of cDNA solution. We simulated cDNA deposition from the single model equation by varying the value of the parameters. The morphology of the resulting cDNA deposit can be classified into three types: a doughnut shape, a peak shape, and a volcano shape. The spot morphology can be changed into a flat shape by varying the experimental conditions while considering the parameters of the governing equation of cDNA deposition. The four parameters were estimated by fitting the governing equation to the real microarray images. With the results of the simulation and the parameter estimation, the phenomenon of the formation of cDNA deposits in each type was investigated. Conclusion This study explains how various spot shapes can exist and suggests which parameters are to be adjusted for obtaining a good spot. This system is able to explore the cDNA microarray spotting process in a predictable, manageable and descriptive manner. We hope it can provide a way to predict the incidents that can occur during a real cDNA microarray experiment, and produce useful data for several research applications involving cDNA microarrays.
Affiliation(s)
- Hye Young Kim
- Department of Physiology, College of Medicine, Hanyang University, Seoul, 133-791, Korea.
|
28
|
Daskalakis A, Cavouras D, Bougioukos P, Kostopoulos S, Georgiadis P, Kalatzis I, Kagadis G, Nikiforidis G. Genes expression level quantification using a spot-based algorithmic pipeline. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY 2007; 2007:1148-51. [PMID: 18002165 DOI: 10.1109/iembs.2007.4352499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
An efficient spot-based (SB) algorithmic pipeline of clustering, enhancement, and segmentation techniques was developed to quantify gene expression levels in microarray images. The SB-pipeline employed (i) a gridding procedure to locate spot regions, (ii) a clustering algorithm (enhanced fuzzy c-means or EnFCM) to roughly segment spots from background and estimate the background noise and the spot's center, (iii) an adaptive histogram modification technique to accentuate the spot's boundaries, and (iv) a segmentation algorithm (Seeded Region Growing or SRG) to extract the microarray spots' intensities. Extracted intensities were comparatively evaluated in terms of Mean Absolute Error (MAE) against the MAGIC TOOL's SRG employing a dataset of 7 replicated microarray images (6400 spots each). MAE box-plot mean values were 0.254 and 0.630 for the SB-pipeline and the MAGIC TOOL respectively. Total processing times for the dataset evaluated (7 images) were 2100 seconds and 3410 seconds for the SB-pipeline and MAGIC TOOL respectively.
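Illustrative note (not part of the abstract): the region-growing step of such a pipeline can be sketched as follows. This is a simplified single-seed variant with a fixed intensity tolerance, not the full Seeded Region Growing algorithm and not the implementation used in the SB-pipeline or MAGIC TOOL; the function name and the `tolerance` parameter are hypothetical.

```python
from collections import deque

def grow_region(image, seed, tolerance):
    """Grow a 4-connected region from `seed` on a 2-D list of intensities.

    A neighbouring pixel joins the region when its intensity is within
    `tolerance` of the current region mean. Simplified sketch: one seed,
    breadth-first order, fixed tolerance (all assumptions).
    """
    rows, cols = len(image), len(image[0])
    region = {seed}
    total = image[seed[0]][seed[1]]
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                mean = total / len(region)
                if abs(image[nr][nc] - mean) <= tolerance:
                    region.add((nr, nc))
                    total += image[nr][nc]
                    queue.append((nr, nc))
    return region
```

In a pipeline like the one described, the seed would plausibly come from the spot center estimated by the clustering step, and the spot intensity would then be summarized over the returned pixels.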
Affiliation(s)
- Antonis Daskalakis
- Medical Image Processing and Analysis Group, Department of Medical Physics, School of Medicine, University of Patras, Rio, GR-26503, Greece.
|
29
|
Daskalakis A, Cavouras D, Bougioukos P, Kostopoulos S, Glotsos D, Kalatzis I, Kagadis GC, Argyropoulos C, Nikiforidis G. Improving gene quantification by adjustable spot-image restoration. Bioinformatics 2007; 23:2265-72. [PMID: 17599935 DOI: 10.1093/bioinformatics/btm337] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION One of the major factors that complicate the task of microarray image analysis is that microarray images are distorted by various types of noise. In this study a robust framework is proposed, designed to take into account the effect of noise in microarray images in order to assist the demanding task of microarray image analysis. The proposed framework incorporates in the microarray image processing pipeline a novel combination of spot adjustable image analysis and processing techniques and consists of the following stages: (1) gridding for facilitating spot identification, (2) clustering (unsupervised discrimination between spot and background pixels) applied to spot image for automatic local noise assessment, (3) modeling of local image restoration process for spot image conditioning (adjustable Wiener restoration using an empirically determined degradation function), (4) automatic spot segmentation employing seeded-region-growing, (5) intensity extraction and (6) assessment of the reproducibility (real data) and the validity (simulated data) of the extracted gene expression levels. RESULTS Both simulated and real microarray images were employed in order to assess the performance of the proposed framework against well-established methods implemented in publicly available software packages (Scanalyze and SPOT). Regarding simulated images, the novel combination of techniques, introduced in the proposed framework, rendered the detection of spot areas and the extraction of spot intensities more accurate. Furthermore, on real images the proposed framework proved of better stability across replicates. Results indicate that the proposed framework improves spots' segmentation and, consequently, quantification of gene expression levels. AVAILABILITY All algorithms were implemented in Matlab (The Mathworks, Inc., Natick, MA, USA) environment. The codes that implement microarray gridding, adaptive spot restoration and segmentation/intensity extraction are available upon request. Supplementary results and the simulated microarray images used in this study are available for download from: ftp://users:bioinformatics@mipa.med.upatras.gr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Antonis Daskalakis
- Medical Image Processing and Analysis Group, Laboratory of Medical Physics, School of Medicine, University of Patras, 265 04 Rio, Greece.
|
30
|
Aho T, Smolander OP, Niemi J, Yli-Harja O. RMBNToolbox: random models for biochemical networks. BMC SYSTEMS BIOLOGY 2007; 1:22. [PMID: 17524136 PMCID: PMC1896132 DOI: 10.1186/1752-0509-1-22] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/22/2007] [Accepted: 05/24/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND There is increasing interest in modeling biochemical and cell biological networks, as well as in the computational analysis of these models. The development of analysis methodologies and related software is rapid in the field. However, the number of available models is still relatively small and the model sizes remain limited. The lack of kinetic information is usually the limiting factor for the construction of detailed simulation models. RESULTS We present a computational toolbox for generating random biochemical network models which mimic real biochemical networks. The toolbox is called Random Models for Biochemical Networks. The toolbox works in the Matlab environment, and it makes it possible to generate various network structures, stoichiometries, kinetic laws for reactions, and parameters therein. The generation can be based on statistical rules and distributions, and more detailed information on real biochemical networks can be used in situations where it is known. The toolbox can be easily extended. The resulting network models can be exported in the format of Systems Biology Markup Language. CONCLUSION While more information is accumulating on biochemical networks, random networks can be used as an intermediate step towards their better understanding. Random networks make it possible to study the effects of various network characteristics on the overall behavior of the network. Moreover, the construction of artificial network models provides the ground truth data needed in the validation of various computational methods in the fields of parameter estimation and data analysis.
Affiliation(s)
- Tommi Aho
- Department of Information Technology, Institute of Signal Processing, Tampere University of Technology, Tampere, Finland
- Olli-Pekka Smolander
- Department of Information Technology, Institute of Signal Processing, Tampere University of Technology, Tampere, Finland
- Jari Niemi
- Department of Information Technology, Institute of Signal Processing, Tampere University of Technology, Tampere, Finland
- Department of Information Technology, Institute of Mathematics, Tampere University of Technology, Tampere, Finland
- Olli Yli-Harja
- Department of Information Technology, Institute of Signal Processing, Tampere University of Technology, Tampere, Finland
|