1
Yousefian-Jazi A, Choi J. Sequential Integration of Fuzzy Clustering and Expectation Maximization for Transcription Factor Binding Site Identification. J Comput Biol 2018; 25:1247-1256. [PMID: 30133315 DOI: 10.1089/cmb.2017.0230] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/25/2023]
Abstract
The identification of transcription factor binding sites (TFBSs) is a problem for which computational methods offer great hope. Thus far, the expectation maximization (EM) technique has been successfully utilized in finding TFBSs in DNA sequences, but inappropriate initialization of EM has yielded poor performance or running time scalability under a given data set. In this study, we used a sequential integration approach that defined the final solution as the set of solutions acquired from solving objectives in a cascade manner to integrate the fuzzy C-means and the EM approaches to DNA motif discovery. The new method is explained in detail and tested on the chromatin immunoprecipitation sequencing (ChIP-seq) data sets for different transcription factors (TFs) with various motif patterns. The proposed algorithm also suggests an efficient process for analyzing motif similarity to known motifs as well as finding a target motif. A comparison of results with those of the well-known motif-finding tool, MEME-ChIP, shows the advantages of our proposed framework over this existing tool. Experimental results show that we were able to find the true motifs for all TFs, and that the motifs found by our proposed algorithm were more similar to JASPAR-known motifs for the STAT1, GATA1, and JUN TFs than those found by MEME-ChIP.
Affiliation(s)
- Ali Yousefian-Jazi
- 1 Interdisciplinary Program in Bioengineering, Graduate School, Seoul National University, Seoul, Korea
- Jinwook Choi
- 2 Department of Biomedical Engineering, College of Medicine, Seoul National University, Seoul, Korea
2
Pham TD. Spatial uncertainty modeling of fuzzy information in images for pattern classification. PLoS One 2014; 9:e105075. [PMID: 25157744 PMCID: PMC4144883 DOI: 10.1371/journal.pone.0105075] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/03/2014] [Accepted: 07/20/2014] [Indexed: 11/18/2022]
Abstract
The modeling of the spatial distribution of image properties is important for many pattern recognition problems in science and engineering. Mathematical methods are needed to quantify the variability of this spatial distribution based on which a decision of classification can be made in an optimal sense. However, image properties are often subject to uncertainty due to both incomplete and imprecise information. This paper presents an integrated approach for estimating the spatial uncertainty of vagueness in images using the theory of geostatistics and the calculus of probability measures of fuzzy events. Such a model for the quantification of spatial uncertainty is utilized as a new image feature extraction method, based on which classifiers can be trained to perform the task of pattern recognition. Applications of the proposed algorithm to the classification of various types of image data suggest the usefulness of the proposed uncertainty modeling technique for texture feature extraction.
Affiliation(s)
- Tuan D. Pham
- Aizu Research Cluster for Medical Engineering and Informatics, Center for Advanced Information Science and Technology, The University of Aizu, Aizu-Wakamatsu, Fukushima, Japan
3
ISL1 regulates peroxisome proliferator-activated receptor γ activation and early adipogenesis via bone morphogenetic protein 4-dependent and -independent mechanisms. Mol Cell Biol 2014; 34:3607-17. [PMID: 25047837 DOI: 10.1128/mcb.00583-14] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
While adipogenesis is controlled by a cascade of transcription factors, the global gene expression profiles in the early phase of adipogenesis are not well defined. Using microarray analysis of gene expression in 3T3-L1 cells, we have identified evidence for the activity of 2,568 genes during the early phase of adipocyte differentiation. One of these, the ISL1 gene, was of interest since its expression was markedly upregulated 1 h after initiation of differentiation, with a subsequent rapid decline. Overexpression of ISL1 at early times during adipocyte differentiation but not at later times was found to profoundly inhibit differentiation. This was accompanied by moderate downregulation of peroxisome proliferator-activated receptor γ (PPARγ) levels, substantial downregulation of PPARγ downstream genes, and downregulation of bone morphogenetic protein 4 (BMP4) levels in preadipocytes. Readdition of BMP4 overcame the inhibitory effect of ISL1 on the expression of PPARγ but not aP2, a gene downstream of PPARγ, and BMP4 also partially rescued ISL1 inhibition of adipogenesis, an effect which is additive with rosiglitazone. These results suggest that ISL1 is intimately involved in early regulation of adipogenesis, modulating PPARγ expression and activity via BMP4-dependent and -independent mechanisms. Our time course gene expression survey sets the stage for further studies to explore other early and immediate regulators.
4
Kazi JU, Rönnstrand L. Src-Like adaptor protein (SLAP) binds to the receptor tyrosine kinase Flt3 and modulates receptor stability and downstream signaling. PLoS One 2012; 7:e53509. [PMID: 23300935 PMCID: PMC3534055 DOI: 10.1371/journal.pone.0053509] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 09/17/2012] [Accepted: 11/29/2012] [Indexed: 12/23/2022]
Abstract
Fms-like tyrosine kinase 3 (Flt3) is an important growth factor receptor in hematopoiesis. Gain-of-function mutations of the receptor contribute to the transformation of acute myeloid leukemia (AML). Src-like adaptor protein (SLAP) is an interaction partner of the E3 ubiquitin ligase Cbl that can regulate receptor tyrosine kinases-mediated signal transduction. In this study, we analyzed the role of SLAP in signal transduction downstream of the type III receptor tyrosine kinase Flt3. The results show that upon ligand stimulation SLAP stably associates with Flt3 through multiple phosphotyrosine residues in Flt3. SLAP constitutively interacts with oncogenic Flt3-ITD and co-localizes with Flt3 near the cell membrane. This association initiates Cbl-dependent receptor ubiquitination and degradation. Depletion of SLAP expression by shRNA in Flt3-transfected Ba/F3 cells resulted in a weaker activation of FL-induced PI3K-Akt and MAPK signaling. Meta-analysis of microarray data from patient samples suggests that SLAP mRNA is differentially expressed in different cancers and its expression was significantly increased in patients carrying the Flt3-ITD mutation. Thus, our data suggest a novel role of SLAP in different cancers and in modulation of receptor tyrosine kinase signaling apart from its conventional role in regulation of receptor stability.
Affiliation(s)
- Julhash U. Kazi
- Experimental Clinical Chemistry, Wallenberg Laboratory, Department of Laboratory Medicine, Lund University, Skåne University Hospital, Malmö, Sweden
- Lars Rönnstrand
- Experimental Clinical Chemistry, Wallenberg Laboratory, Department of Laboratory Medicine, Lund University, Skåne University Hospital, Malmö, Sweden
5
Sánchez R, Argáez M, Guillén P. Sparse representation via ℓ1-minimization for underdetermined systems in classification of tumors with gene expression data. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2012; 2011:3362-6. [PMID: 22255060 DOI: 10.1109/iembs.2011.6090911] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
The development of cancer diagnosis models and cancer discovery from DNA microarray data are of great interest in bioinformatics and medicine. In pattern recognition and machine learning, a classification problem refers to finding an algorithm for assigning given input data to one of several categories. Many natural signals are sparse or compressible in the sense that they have short representations when expressed in a suitable basis. Motivated by the recent successful algorithm developments for sparse signal recovery, we apply the selective nature of sparse representation to perform the above-mentioned classification. To find such a sparse representation, we implement an ℓ1-minimization algorithm. This methodology overcomes the lack of robustness with respect to outliers. In contrast to other classification algorithms, no model selection dependency is involved. The minimization algorithm is a convex-relaxation-like method that has been proven to efficiently recover sparse signals. To study its performance, the proposed method is applied to six tumor gene expression datasets and numerically compared with various support vector machine (SVM) methods. The numerical results show that the proposed ℓ1-minimization algorithm performs at least comparably to, and often better than, SVMs.
Affiliation(s)
- R Sánchez
- University of Texas at El Paso, TX 79968, USA.
6
Identification of the role of C/EBP in neurite regeneration following microarray analysis of a L. stagnalis CNS injury model. BMC Neurosci 2012; 13:2. [PMID: 22217148 PMCID: PMC3315421 DOI: 10.1186/1471-2202-13-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/23/2010] [Accepted: 01/04/2012] [Indexed: 12/02/2022]
Abstract
Background: Neuronal regeneration in the adult mammalian central nervous system (CNS) is severely compromised due to the presence of extrinsic inhibitory signals and a reduced intrinsic regenerative capacity. In contrast, the CNS of adult Lymnaea stagnalis (L. stagnalis), a freshwater pond snail, is capable of spontaneous regeneration following neuronal injury. Thus, L. stagnalis has served as an animal model to study the cellular mechanisms underlying neuronal regeneration. However, the usage of this model has been limited due to insufficient molecular tools. We have recently conducted a partial neuronal transcriptome sequencing project and reported over 10,000 EST sequences which allowed us to develop and perform a large-scale high throughput microarray analysis.
Results: To identify genes that are involved in the robust regenerative capacity observed in L. stagnalis, we designed the first gene chip covering ~15,000 L. stagnalis CNS EST sequences. We conducted microarray analysis to compare the gene expression profiles of sham-operated (control) and crush-operated (regenerative model) central ganglia of adult L. stagnalis. The expression levels of 348 genes were found to be significantly altered (p < 0.05) following nerve injury. From this pool, 67 sequences showed a greater than 2-fold change: 42 of which were up-regulated and 25 down-regulated. Our qPCR analysis confirmed that CCAAT enhancer binding protein (C/EBP) was up-regulated following nerve injury in a time-dependent manner. In order to test the role of C/EBP in regeneration, C/EBP siRNA was applied following axotomy of cultured Lymnaea PeA neurons. Knockdown of C/EBP following axotomy prevented extension of the distal, proximal and intact neurites. In vivo knockdown of C/EBP postponed recovery of locomotory activity following nerve crush. Taken together, our data suggest both somatic and local effects of C/EBP are involved in neuronal regeneration.
Conclusions: This is the first high-throughput microarray study in L. stagnalis, a model of axonal regeneration following CNS injury. We reported that 348 genes were regulated following central nerve injury in adult L. stagnalis and provided the first evidence for the involvement of local C/EBP in neuronal regeneration. Our study demonstrates the usefulness of the large-scale gene profiling approach in this invertebrate model to study the molecular mechanisms underlying the intrinsic regenerative capacity of adult CNS neurons.
7
Zhao XM, Cheung YM, Huang DS. Analysis of gene expression data using RPEM algorithm in normal mixture model with dynamic adjustment of learning rate. Int J Pattern Recogn 2011. [DOI: 10.1142/s0218001410008056] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
Microarray technology is a useful tool for monitoring the expression levels of thousands of genes simultaneously. Recently, mixture modeling has been used to extract expression signatures from gene expression profiles. In general, two separate steps are utilized to estimate the number of classes and the model parameters, respectively. However, such a method is often time-consuming and leads to suboptimal solutions. In this paper, we therefore apply a one-step approach, namely the Rival Penalized Expectation-Maximization (RPEM) algorithm, to analyze gene expression data. The RPEM algorithm is capable of estimating the parameters of a normal mixture model while automatically determining the number of classes at the same time. Furthermore, we speed up the learning procedure of RPEM by proposing a new mechanism to adjust the learning rate dynamically. The numerical results on real gene expression data demonstrate that our proposed method is indeed effective and efficient.
Affiliation(s)
- Xing-Ming Zhao
- Institute of Systems Biology, Shanghai University, Shanghai 200444, P. R. China
- Yiu-Ming Cheung
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, P. R. China
- De-Shuang Huang
- Intelligent Computing Lab, Institute of Intelligent Machines, Chinese Academy of Sciences, P. O. Box 1130, Hefei, Anhui 230031, P. R. China
8
Kobrina Y, Turunen MJ, Saarakkala S, Jurvelin JS, Hauta-Kasari M, Isaksson H. Cluster analysis of infrared spectra of rabbit cortical bone samples during maturation and growth. Analyst 2010; 135:3147-55. [PMID: 21038039 DOI: 10.1039/c0an00500b] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]
Abstract
Bone consists of an organic and an inorganic matrix. During development, bone undergoes changes in its composition and structure. In this study we apply three different cluster analysis algorithms [K-means (KM), fuzzy C-means (FCM) and hierarchical clustering (HCA)], and discriminant analysis (DA) on infrared spectroscopic data from developing cortical bone with the aim of comparing their ability to correctly classify the samples into different age groups. Cortical bone samples from the mid-diaphysis of the humerus of New Zealand white rabbits from three different maturation stages [newborn (NB), immature (11 days-1 month old), mature (3-6 months old)] were used. Three clusters were obtained by KM, FCM and HCA methods on different spectral regions (amide I, phosphate and carbonate). The newborn samples were well separated (71-100% correct classifications) from the other age groups by all bone components. The mature samples (3-6 months old) were well separated (100%) from those of other age groups by the carbonate spectral region, while by the phosphate and amide I regions some samples were assigned to another group (43-71% correct classifications). The greatest variance in the results for all algorithms was observed in the amide I region. In general, FCM clustering performed better than the other methods, and the overall error was lower. The discriminant analysis results showed that by combining the clustering results from all three spectral regions, the ability to predict the correct age group for all samples increased (from 29-86% to 77-91%). This study is the first to compare several clustering methods on infrared spectra of bone. Fuzzy C-means clustering performed best, and its ability to study the degree of memberships of samples to each cluster might be beneficial in future studies of medical diagnostics.
Affiliation(s)
- Yevgeniya Kobrina
- Department of Physics and Mathematics, University of Eastern Finland, PO Box 1627, 70211 Kuopio, Finland
9
Howard BE, Sick B, Heber S. Unsupervised assessment of microarray data quality using a Gaussian mixture model. BMC Bioinformatics 2009; 10:191. [PMID: 19545436 PMCID: PMC2717951 DOI: 10.1186/1471-2105-10-191] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/02/2008] [Accepted: 06/22/2009] [Indexed: 12/22/2022]
Abstract
Background: Quality assessment of microarray data is an important and often challenging aspect of gene expression analysis. This task frequently involves the examination of a variety of summary statistics and diagnostic plots. The interpretation of these diagnostics is often subjective, and generally requires careful expert scrutiny.
Results: We show how an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and the naïve Bayes model can be used to automate microarray quality assessment. The method is flexible and can be easily adapted to accommodate alternate quality statistics and platforms. We evaluate our approach using Affymetrix 3' gene expression and exon arrays and compare the performance of this method to a similar supervised approach.
Conclusion: This research illustrates the efficacy of an unsupervised classification approach for the purpose of automated microarray data quality assessment. Since our approach requires only unannotated training data, it is easy to customize and to keep up-to-date as technology evolves. In contrast to other "black box" classification systems, this method also allows for intuitive explanations.
Affiliation(s)
- Brian E Howard
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA.
10
Sopharak A, Uyyanonvara B, Barman S. Automatic Exudate Detection from Non-dilated Diabetic Retinopathy Retinal Images Using Fuzzy C-means Clustering. SENSORS 2009; 9:2148-61. [PMID: 22574005 PMCID: PMC3332251 DOI: 10.3390/s90302148] [Citation(s) in RCA: 129] [Impact Index Per Article: 8.6] [Received: 01/29/2009] [Revised: 03/19/2009] [Accepted: 03/20/2009] [Indexed: 11/16/2022]
Abstract
Exudates are the primary sign of Diabetic Retinopathy. Early detection can potentially reduce the risk of blindness. An automatic method to detect exudates from low-contrast digital images of retinopathy patients with non-dilated pupils using Fuzzy C-Means (FCM) clustering is proposed. Contrast enhancement preprocessing is applied before four features, namely intensity, standard deviation of intensity, hue and the number of edge pixels, are extracted and supplied as input parameters to coarse segmentation using the FCM clustering method. The first result is then fine-tuned with morphological techniques. The detection results are validated by comparing with expert ophthalmologists' hand-drawn ground-truths. Sensitivity, specificity, positive predictive value (PPV), positive likelihood ratio (PLR) and accuracy are used to evaluate overall performance. It is found that the proposed method detects exudates successfully with sensitivity, specificity, PPV, PLR and accuracy of 87.28%, 99.24%, 42.77%, 224.26 and 99.11%, respectively.
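The evaluation measures named in this abstract can be computed from confusion-matrix counts. The sketch below is illustrative only: the function name `detection_metrics` and the counts are hypothetical rather than taken from the paper, and it assumes each measure is computed from pooled counts (the paper may instead average per image, so its reported figures are not reproduced here).

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard detection metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positive/negative counts. Assumes every
    denominator is non-zero (for example, fp > 0 so that PLR is finite).
    """
    sensitivity = tp / (tp + fn)             # fraction of true exudate pixels found
    specificity = tn / (tn + fp)             # fraction of background pixels rejected
    ppv = tp / (tp + fp)                     # fraction of detections that are correct
    plr = sensitivity / (1.0 - specificity)  # positive likelihood ratio
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "plr": plr,
        "accuracy": accuracy,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = detection_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

When the target class is rare, as exudate pixels are in a retinal image, PPV can be low even though specificity is high, because a small false-positive rate over a large background still yields many false detections.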
Affiliation(s)
- Akara Sopharak
- Department of Information Technology, Sirindhorn International Institute of Technology, Thammasat University 131 Moo 5, Tiwanont Road, Bangkadi, Muang, Pathumthani 12000, Thailand
- Author to whom correspondence should be addressed; Phone: +66-2501-3505 Ext. 2021, Fax: +66-2501-3524
- Bunyarit Uyyanonvara
- Department of Information Technology, Sirindhorn International Institute of Technology, Thammasat University 131 Moo 5, Tiwanont Road, Bangkadi, Muang, Pathumthani 12000, Thailand
- Sarah Barman
- Faculty of Computing, Information Systems and Mathematics, Kingston University Penrhyn Road, Kingston upon Thames, Surrey, KT1 2EE, UK
11
Muir WM, Rosa GJM, Pittendrigh BR, Xu S, Rider SD, Fountain M, Ogas J. A mixture model approach for the analysis of small exploratory microarray experiments. Comput Stat Data Anal 2009; 53:1566-1576. [PMID: 20160862 DOI: 10.1016/j.csda.2008.06.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/21/2022]
Abstract
The microarray is an important and powerful tool for prescreening of genes for further research. However, alternative solutions are needed to increase power in small microarray experiments. Traditional parametric and even non-parametric tests lack power and have distributional problems in such small experiments. A mixture model is described that is applied directly to expression differences, assuming that genes in alternative treatments are expressed or not in all combinations: (i) not expressed in either condition, (ii) expressed only under the first condition, (iii) expressed only under the second condition, and (iv) expressed under both conditions, giving rise to 4 possible clusters with two treatments. The approach is termed the Mean-Difference-Mixture-Model (MD-MM) method. The accuracy and power of the MD-MM were compared to those of other commonly used methods using simulations, microarray data, and quantitative real-time PCR (qRT-PCR). The MD-MM was found to be generally superior to other methods in most situations. The advantage was greatest in situations where there were few replicates, poor signal-to-noise ratios, or non-homogeneous variances.
Affiliation(s)
- W M Muir
- Dept. Animal Sciences, Purdue University, W. Lafayette IN 47907
12
Abstract
Fuzzy clustering is a useful tool for identifying relevant subsets of microarray data. This paper proposes a fuzzy clustering method for microarray data analysis. An advantage of the method is that it uses a combination of fuzzy c-means and principal component analysis to identify groups of genes that show similar expression patterns. It allows a gene to belong to more than one gene expression pattern with different membership grades. The method is suitable for the analysis of large amounts of noisy microarray data.
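The membership grades mentioned in this abstract come from the fuzzy c-means update, in which each point receives a degree of membership in every cluster rather than a single label. The following is a minimal sketch of that update on 1-D data, assuming the textbook FCM formulas with fuzzifier m = 2. It does not reproduce the paper's method (the principal component analysis step is omitted), and the function names and toy values are hypothetical.

```python
def fcm_memberships(xs, centers, m=2.0, eps=1e-12):
    """Return u[i][k], the membership of point i in cluster k (rows sum to 1)."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in xs:
        # Small eps avoids division by zero when a point sits on a center.
        d = [abs(x - c) + eps for c in centers]
        u.append([1.0 / sum((dk / dj) ** power for dj in d) for dk in d])
    return u


def fuzzy_c_means_1d(xs, centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = list(centers)
    for _ in range(n_iter):
        u = fcm_memberships(xs, centers, m)
        # Each center is the mean of the data weighted by membership ** m.
        centers = [
            sum((u[i][k] ** m) * xs[i] for i in range(len(xs)))
            / sum(u[i][k] ** m for i in range(len(xs)))
            for k in range(len(centers))
        ]
    return centers, fcm_memberships(xs, centers, m)


# Toy expression values: two tight groups plus one point halfway between them.
values = [0.0, 0.2, 0.4, 2.5, 4.6, 4.8, 5.0]
centers, u = fuzzy_c_means_1d(values, centers=[0.0, 5.0])
print("centers:", [round(c, 3) for c in centers])
print("memberships of the middle point:", [round(g, 3) for g in u[3]])
```

The middle point ends up with roughly equal grades in both clusters, which is the behaviour a hard-labeling method such as k-means cannot express.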
Affiliation(s)
- Lixin Han
- Department of Computer Science and Engineering, Hohai University, Nanjing, People's Republic of China
- Xiaoqin Zeng
- Department of Computer Science and Engineering, Hohai University, Nanjing, People's Republic of China
- Hong Yan
- Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong, People's Republic of China
13
[Association between ion channel subtype and its gene co-expression]. YI CHUAN = HEREDITAS 2008; 30:1157-62. [PMID: 18779173 DOI: 10.3724/sp.j.1005.2008.01157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
The association between an ion channel functional subtype and the expression of its genes is important for exploring ion channel function, annotating the function of an unknown subtype, and probing the molecular mechanisms of ion channel diseases. In this study, we began with noise reduction by standardizing the original microarray data, which consisted of human and mouse gene expression profiles, and then employed principal component analysis (PCA) together with the fuzzy C-means clustering algorithm to analyze the pre-processed gene expression profiles. PCA was applied to rebuild the feature space of human genes in 21 dimensions and the feature space of mouse genes in 26 dimensions. Using this method, we largely reduced computational complexity without losing much of the information in the original data. Subsequently, fuzzy C-means clustering was used to classify the human and mouse ion channel genes in their reduced feature spaces. In the end, four ion channel functional subtypes, namely potassium ion channels, calcium ion channels, chloride ion channels, and receptor-mediated ion channels, were clustered in both the human and mouse gene feature spaces. We applied two statistical approaches to test the significance of the findings. In the first, we randomly sampled the data for each functional subtype of the ion channel genes and recorded the true positive rate. As a result, in both human and mouse gene feature spaces, genes that belong to one functional subtype were more likely to be clustered together than expected by chance. In the second, we performed a Kappa test using the functional subtypes as the gold standard. The result showed that consistency between the ion channel gene clusters and the ion channel gene subtypes was significantly high for both human and mouse. These results indicate that ion channel genes within the same functional subtype tend to be co-expressed, at least at the mRNA level.
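Agreement between cluster assignments and a gold-standard labeling, as in the Kappa test described in this abstract, is commonly summarized with Cohen's kappa. The sketch below assumes that statistic (the abstract does not say which kappa variant was used) and assumes cluster indices have already been mapped to subtype names. The labels are invented for illustration, and the significance test itself is not shown.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two labelings."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two labelings were independent.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical gold-standard subtypes and cluster-derived labels for six genes.
gold = ["K", "K", "K", "Ca", "Ca", "Cl"]
clustered = ["K", "K", "Ca", "Ca", "Ca", "Cl"]
print(round(cohens_kappa(gold, clustered), 4))
```

A kappa of 1 means perfect agreement and a kappa near 0 means agreement no better than chance; in this toy example five of six genes agree, giving a value of 17/23.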
14
Wang YP, Gunampally M, Chen J, Bittel D, Butler MG, Cai WW. A Comparison of Fuzzy Clustering Approaches for Quantification of Microarray Gene Expression. JOURNAL OF SIGNAL PROCESSING SYSTEMS 2008; 50:305-320. [PMID: 28163819 PMCID: PMC5286559 DOI: 10.1007/s11265-007-0123-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Despite the widespread application of microarray imaging in biomedical imaging research, barriers still exist regarding its reliability for clinical use. A major problem lies in accurate spot segmentation and the quantification of gene expression level (mRNA) from the microarray images. A variety of commercial and research freeware packages are available, but most cannot handle array spots with complex shapes such as donuts and scratches. Clustering approaches such as k-means and mixture models, which use hard labeling of each pixel, were introduced to overcome this difficulty. In this paper, we apply fuzzy clustering approaches for spot segmentation, which provide soft labeling of each pixel. We compare several fuzzy clustering approaches for microarray analysis and provide a comprehensive study of these approaches for spot segmentation. We show that possibilistic c-means clustering (PCM) provides the best performance in terms of the stability criterion when tested on a variety of both simulated and real microarray images. In addition, we compared three statistical criteria for measuring gene expression levels and show that a new asymptotically unbiased statistic is able to quantify the gene expression level more accurately.
Affiliation(s)
- Yu-Ping Wang
- School of Computing and Engineering, University of Missouri, Kansas City, MO 64110, USA
- Maheswar Gunampally
- School of Computing and Engineering, University of Missouri, Kansas City, MO 64110, USA
- Jie Chen
- Department of Mathematics and Statistics, Kansas City, MO 64110, USA
- Douglas Bittel
- Children's Mercy Hospital and Clinics, School of Medicine, University of Missouri, Kansas City, MO 64108, USA
- Merlin G Butler
- Children's Mercy Hospital and Clinics, School of Medicine, University of Missouri, Kansas City, MO 64108, USA
- Wei-Wen Cai
- Department of Human Molecular Genetics, Baylor College of Medicine, Houston, TX 77005, USA
15
Sathyan P, Golden HB, Miranda RC. Competing interactions between micro-RNAs determine neural progenitor survival and proliferation after ethanol exposure: evidence from an ex vivo model of the fetal cerebral cortical neuroepithelium. J Neurosci 2007; 27:8546-57. [PMID: 17687032 PMCID: PMC2915840 DOI: 10.1523/jneurosci.1269-07.2007] [Citation(s) in RCA: 218] [Impact Index Per Article: 12.8] [Received: 09/05/2006] [Revised: 06/17/2007] [Accepted: 06/26/2007] [Indexed: 12/28/2022]
Abstract
The fetal brain is sensitive to a variety of teratogens, including ethanol. We showed previously that ethanol induced mitosis and stem cell maturation, but not death, in fetal cerebral cortex-derived progenitors. We tested the hypothesis that micro-RNAs (miRNAs) could mediate the teratogenic effects of ethanol in a fetal mouse cerebral cortex-derived neurosphere culture model. Ethanol, at a level attained by alcoholics, significantly suppressed the expression of four miRNAs, miR-21, -335, -9, and -153, whereas a lower ethanol concentration, attainable during social drinking, induced miR-335 expression. A GABA(A) receptor-dependent mechanism mediated miR-21, but not miR-335 suppression, suggesting that divergent mechanisms regulate ethanol-sensitive miRNAs. Antisense-mediated suppression of miR-21 expression resulted in apoptosis, suggesting that miR-21 is an antiapoptotic factor. miR-335 knockdown promoted cell proliferation and prevented death induced by concurrently suppressing miR-21, indicating that miR-335 is a proapoptotic, antimitogenic factor whose actions are antagonistic to miR-21. Computational analyses identified two genes, Jagged-1, a Notch-receptor ligand, and embryonic-lethal abnormal vision, Drosophila-like 2 (ELAVL2), a brain-specific regulator of RNA stability, as presumptive targets of three of four ethanol-sensitive micro-RNAs. Combined knockdown of miR-335, -21, and -153 significantly increased Jagged-1 mRNA. Furthermore, ethanol induced both Jagged-1 and ELAVL2 mRNA. The collective suppression of micro-RNAs is consistent with ethanol induction of cell cycle and neuroepithelial maturation in the absence of apoptosis. These data identify a role for micro-RNAs as epigenetic intermediaries, which permit teratogens to shape complex, divergent developmental processes, and additionally demonstrate that coordinately regulated miRNAs exhibit both functional synergy and antagonism toward each other.
Affiliation(s)
- Pratheesh Sathyan
- Department Neuroscience and Experimental Therapeutics, College of Medicine, Texas A&M Health Science Center, Texas 77843-1114
- Honey B. Golden
- Department Neuroscience and Experimental Therapeutics, College of Medicine, Texas A&M Health Science Center, Texas 77843-1114
- Rajesh C. Miranda
- Department Neuroscience and Experimental Therapeutics, College of Medicine, Texas A&M Health Science Center, Texas 77843-1114
16
Sjahputera O, Keller JM, Davis JW, Taylor KH, Rahmatpanah F, Shi H, Anderson DT, Blisard SN, Luke RH, Popescu M, Arthur GC, Caldwell CW. Relational analysis of CpG islands methylation and gene expression in human lymphomas using possibilistic C-means clustering and modified cluster fuzzy density. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2007; 4:176-89. [PMID: 17473312 DOI: 10.1109/tcbb.2007.070205] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/15/2023]
Abstract
Heterogeneous genetic and epigenetic alterations are commonly found in human non-Hodgkin's lymphomas (NHL). One such epigenetic alteration is aberrant methylation of gene promoter-related CpG islands, where hypermethylation frequently results in transcriptional inactivation of target genes, while a decrease or loss of promoter methylation (hypomethylation) is frequently associated with transcriptional activation. Discovering genes with these relationships in NHL or other types of cancers could lead to a better understanding of the pathobiology of these diseases. The simultaneous analysis of promoter methylation using Differential Methylation Hybridization (DMH) and its associated gene expression using Expressed CpG Island Sequence Tag (ECIST) microarrays generates a large volume of methylation-expression relational data. To analyze this data, we propose a set of algorithms based on fuzzy sets theory, in particular Possibilistic c-Means (PCM) and cluster fuzzy density. For each gene, these algorithms calculate measures of confidence of various methylation-expression relationships in each NHL subclass. Thus, these tools can be used as a means of high volume data exploration to better guide biological confirmation using independent molecular biology methods.
Affiliation(s)
- Ozy Sjahputera
- Ellis Fischel Cancer Research Lab, Department of Pathology and Anatomical Sciences, University of Missouri School of Medicine, Columbia, MO 65212, USA.
|
17
|
Siddiqui AS, Delaney AD, Schnerch A, Griffith OL, Jones SJM, Marra MA. Sequence biases in large scale gene expression profiling data. Nucleic Acids Res 2006; 34:e83. [PMID: 16840527 PMCID: PMC1524917 DOI: 10.1093/nar/gkl404] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Indexed: 12/13/2022] Open
Abstract
We present the results of a simple, statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, 'Classic' Massively Parallel Signature Sequencing (MPSS) and 'Signature' MPSS. We demonstrate the methods have systematic and random errors leading to a different G+C content sensitivity. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C rich tags and Affymetrix data show different bias depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily impacts genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).
Affiliation(s)
- Marco A. Marra
- To whom correspondence should be addressed at Genome Sciences Centre, Suite 100, 570 West 7th Avenue, Vancouver BC, Canada V5Z 4S6. Tel: 604 877 6082; Fax: 604 877 6085
|
18
|
Verducci JS, Melfi VF, Lin S, Wang Z, Roy S, Sen CK. Microarray analysis of gene expression: considerations in data mining and statistical treatment. Physiol Genomics 2006; 25:355-63. [PMID: 16554544 DOI: 10.1152/physiolgenomics.00314.2004] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022] Open
Abstract
DNA microarray represents a powerful tool in biomedical discoveries. Harnessing the potential of this technology depends on the development and appropriate use of data mining and statistical tools. Significant current advances have made microarray data mining more versatile. Researchers are no longer limited to default choices that generate suboptimal results. Conflicting results in repeated experiments can be resolved through attention to the statistical details. In the current dynamic environment, there are many choices and potential pitfalls for researchers who intend to incorporate microarrays as a research tool. This review is intended to provide a simple framework to understand the choices and identify the pitfalls. Specifically, this review article discusses the choice of microarray platform, preprocessing raw data, differential expression and validation, clustering, annotation and functional characterization of genes, and pathway construction in light of emergent concepts and tools.
Affiliation(s)
- Joseph S Verducci
- Davis Heart and Lung Research Institute, Department of Surgery, The Ohio State University, Columbus, Ohio, USA
|
19
|
Berrar D, Bradbury I, Dubitzky W. Instance-based concept learning from multiclass DNA microarray data. BMC Bioinformatics 2006; 7:73. [PMID: 16483361 PMCID: PMC1402330 DOI: 10.1186/1471-2105-7-73] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 04/08/2005] [Accepted: 02/16/2006] [Indexed: 12/01/2022] Open
Abstract
Background: Various statistical and machine learning methods have been successfully applied to the classification of DNA microarray data. Simple instance-based classifiers such as nearest neighbor (NN) approaches perform remarkably well in comparison to more complex models, and are currently experiencing a renaissance in the analysis of data sets from biology and biotechnology. While binary classification of microarray data has been extensively investigated, studies involving multiclass data are rare. The question remains open whether there exists a significant difference in performance between NN approaches and more complex multiclass methods. Comparative studies in this field commonly assess different models based on their classification accuracy only; however, this approach lacks the rigor needed to draw reliable conclusions and is inadequate for testing the null hypothesis of equal performance. Comparing novel classification models to existing approaches requires focusing on the significance of differences in performance.
Results: We investigated the performance of instance-based classifiers, including a NN classifier able to assign a degree of class membership to each sample. This model alleviates a major problem of conventional instance-based learners, namely the lack of confidence values for predictions. The model translates the distances to the nearest neighbors into 'confidence scores'; the higher the confidence score, the closer is the considered instance to a pre-defined class. We applied the models to three real gene expression data sets and compared them with state-of-the-art methods for classifying microarray data of multiple classes, assessing performance using a statistical significance test that took into account the data resampling strategy. Simple NN classifiers performed as well as, or significantly better than, their more intricate competitors.
Conclusion: Given its highly intuitive underlying principles – simplicity, ease-of-use, and robustness – the k-NN classifier complemented by a suitable distance-weighting regime constitutes an excellent alternative to more complex models for multiclass microarray data sets. Instance-based classifiers using weighted distances are not limited to microarray data sets, but are likely to perform competitively in classifications of high-dimensional biological data sets such as those generated by high-throughput mass spectrometry.
Affiliation(s)
- Daniel Berrar
- School of Biomedical Sciences, University of Ulster at Coleraine, Cromore Road, Northern Ireland, UK
- Ian Bradbury
- School of Biomedical Sciences, University of Ulster at Coleraine, Cromore Road, Northern Ireland, UK
- Werner Dubitzky
- School of Biomedical Sciences, University of Ulster at Coleraine, Cromore Road, Northern Ireland, UK
|
20
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447491 DOI: 10.1002/cfg.425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
|