1
|
Sriwastava BK, Halder AK, Basu S, Chakraborti T. RUBic: rapid unsupervised biclustering. BMC Bioinformatics 2023; 24:435. [PMID: 37974081 PMCID: PMC10655409 DOI: 10.1186/s12859-023-05534-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 10/16/2023] [Indexed: 11/19/2023] Open
Abstract
Biclustering of biologically meaningful binary information is essential in many applications related to drug discovery, such as protein-protein interactions and gene expression. However, for robust performance on recently emerging large health datasets, new biclustering algorithms must be scalable and fast. We present a rapid unsupervised biclustering (RUBic) algorithm that achieves this objective with a novel encoding and search strategy. RUBic significantly reduces the computational overhead on both synthetic and experimental datasets with respect to several state-of-the-art biclustering algorithms. In 100 synthetic binary datasets, our method took [Formula: see text] s to extract 494,872 biclusters. In the human PPI database of size [Formula: see text], our method generates 1840 biclusters in [Formula: see text] s. On a central nervous system embryonic tumor gene expression dataset of size 712,940, our algorithm takes 101 min to produce 747,069 biclusters, while recent competing algorithms take significantly more time to produce the same result. RUBic is also evaluated on five different gene expression datasets and shows a significant speed-up in execution time over existing approaches when extracting significant KEGG-enriched biclusters. RUBic can operate in two modes, base and flex: base mode generates maximal biclusters, whereas flex mode generates fewer clusters, and does so faster, based on their biological significance with respect to KEGG pathways. The code is available at https://github.com/CMATERJU-BIOINFO/RUBic for academic use only.
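To make the notion of a maximal bicluster in a binary matrix concrete, the following is a minimal brute-force sketch. It is not the RUBic encoding or search strategy, which the abstract does not describe; the function name `maximal_biclusters` and its parameters are hypothetical, and the approach is only practical for very small matrices.

```python
from itertools import combinations


def maximal_biclusters(matrix, min_rows=2, min_cols=2):
    """Enumerate maximal all-ones biclusters of a small binary matrix.

    Illustrative brute force only (exponential in the number of rows);
    this is an assumption-laden sketch, not the RUBic algorithm.
    min_rows must be at least 1.
    """
    n_rows = len(matrix)
    # Represent each row by the set of columns in which it holds a 1.
    row_sets = [frozenset(j for j, v in enumerate(row) if v) for row in matrix]
    found = set()
    for k in range(min_rows, n_rows + 1):
        for rows in combinations(range(n_rows), k):
            # Columns shared by every selected row.
            cols = frozenset.intersection(*(row_sets[i] for i in rows))
            if len(cols) < min_cols:
                continue
            # Close the row set: add every row that contains all shared columns,
            # so the (rows, cols) pair cannot be extended any further.
            closed_rows = frozenset(i for i in range(n_rows) if cols <= row_sets[i])
            found.add((closed_rows, cols))
    return found


example = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
for rows, cols in sorted(maximal_biclusters(example), key=lambda b: sorted(b[0])):
    print(sorted(rows), sorted(cols))
```

Each reported pair is maximal in the sense that no row or column can be added without introducing a 0. Collecting the pairs in a set removes the duplicates that arise when different row subsets close to the same bicluster.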
Affiliation(s)
- Brijesh K Sriwastava
  - Computer Science and Engineering Department, Government College of Engineering and Leather Technology, Kolkata, India
- Anup Kumar Halder
  - Faculty of Mathematics and Information Sciences, Warsaw University of Technology, Warsaw, Poland
  - CeNT, University of Warsaw, Warsaw, Poland
- Subhadip Basu
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
|
2
|
Chu HM, Kong XZ, Liu JX, Zheng CH, Zhang H. A New Binary Biclustering Algorithm Based on Weight Adjacency Difference Matrix for Analyzing Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2802-2809. [PMID: 37285246 DOI: 10.1109/tcbb.2023.3283801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Biclustering algorithms are essential for processing gene expression data. However, most biclustering algorithms require the data matrix to be preprocessed into a binary matrix. This type of preprocessing may introduce noise or cause information loss in the binary matrix, which reduces the biclustering algorithm's ability to obtain the optimal biclusters. In this paper, we propose a new preprocessing method named Mean-Standard Deviation (MSD) to resolve this problem. Additionally, we introduce a new biclustering algorithm called Weight Adjacency Difference Matrix Binary Biclustering (W-AMBB) to effectively process datasets containing overlapping biclusters. The basic idea is to create a weighted adjacency difference matrix by applying weights to a binary matrix that is derived from the data matrix. This allows us to identify genes with significant associations in sample data by efficiently identifying similar genes that respond to specific conditions. Furthermore, the performance of the W-AMBB algorithm was tested on both synthetic and real datasets and compared with other classical biclustering methods. The experimental results demonstrate that the W-AMBB algorithm is significantly more robust than the compared biclustering methods on the synthetic dataset. Additionally, the results of the GO enrichment analysis show that the W-AMBB method possesses biological significance on real datasets.
|
3
|
Chu HM, Liu JX, Zhang K, Zheng CH, Wang J, Kong XZ. A binary biclustering algorithm based on the adjacency difference matrix for gene expression data analysis. BMC Bioinformatics 2022; 23:381. [PMID: 36123637 PMCID: PMC9484244 DOI: 10.1186/s12859-022-04842-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/07/2022] [Accepted: 07/14/2022] [Indexed: 11/20/2022] Open
Abstract
Biclustering algorithms are effective tools for processing gene expression datasets. Biclustering methods process two kinds of data matrices: binary and non-binary. A binary matrix is usually converted from pre-processed gene expression data, which can effectively reduce the interference from noise and abnormal data, and is then processed using a biclustering algorithm. However, biclustering algorithms for binary data strike a poor balance between running time and performance. In this paper, we propose a new biclustering algorithm for binary data, called the Adjacency Difference Matrix Binary Biclustering algorithm (AMBB), to address this drawback. The AMBB algorithm constructs the adjacency matrix based on the adjacency difference values, and the submatrix obtained by continuously updating the adjacency difference matrix is called a bicluster. The adjacency matrix allows genes that undergo similar reactions under different conditions to be grouped into clusters, which is important for subsequent gene analysis. Meanwhile, experiments on synthetic and real datasets visually demonstrate that the AMBB algorithm has high practicability.
Affiliation(s)
- He-Ming Chu
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Jin-Xing Liu
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Ke Zhang
  - Department of Oncology, Rizhao People's Hospital, Rizhao, 276826, China
- Chun-Hou Zheng
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Juan Wang
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Xiang-Zhen Kong
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
|
4
|
Jia Y, Liu Z, Cheng X, Liu R, Li P, Kong D, Liang W, Liu B, Wang H, Bu X, Gao Y. DRAXIN as a Novel Diagnostic Marker to Predict the Poor Prognosis of Glioma Patients. J Mol Neurosci 2022; 72:2136-2149. [PMID: 36040678 PMCID: PMC9596576 DOI: 10.1007/s12031-022-02054-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2021] [Accepted: 07/20/2022] [Indexed: 11/24/2022]
Abstract
A growing body of evidence shows that the carcinogenic effect of DRAXIN plays an important role in the malignant progression of tumors, but the mechanism of its involvement in glioma has not yet been revealed. The main aim of this study is to explore the relationship between DRAXIN and the prognosis and pathogenesis of glioma through analysis of a large quantity of data. First, thousands of tissue samples with clinical information were collected from various public databases. Then, a series of bioinformatics analyses were performed on the glioma samples extracted from these databases to reveal the key role of DRAXIN in glioma development and progression, with confirmation by basic experiments. Our results showed that high expression of the oncogene DRAXIN in tumor tissue and cells could be used as an independent risk factor for poor prognosis in glioma patients and was strongly associated with clinical risk features. Reverse transcription-quantitative PCR was then used to validate the DRAXIN expression results we obtained. In addition, co-expression analysis identified the top 10 genes most closely associated with DRAXIN positively and the top 10 associated negatively. Finally, in vitro experiments demonstrated that knockdown of DRAXIN significantly inhibited the proliferation and invasion of glioma cells. In summary, this is the first report of DRAXIN being highly expressed in gliomas and leading to poor prognosis of glioma patients. DRAXIN may not only help in exploring the pathogenesis of gliomas but also serve as a novel biological target for the treatment of glioma.
Affiliation(s)
- Yulong Jia
  - Department of Neurosurgery, School of Clinical Medicine, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, Henan University, Zhengzhou, China
- Zhendong Liu
  - Department of Orthopedics, School of Clinical Medicine, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, Henan University, Zhengzhou, Henan, China
  - Department of Microbiome Laboratory, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, Zhengzhou University, Zhengzhou, Henan, China
- Xingbo Cheng
  - Department of Microbiome Laboratory, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, Zhengzhou University, Zhengzhou, Henan, China
  - People's Hospital of Zhengzhou University, Henan Provincial People's Hospital, Zhengzhou, China
- Runze Liu
  - Department of Microbiome Laboratory, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, Zhengzhou University, Zhengzhou, Henan, China
  - People's Hospital of Henan University, Henan Provincial People's Hospital, Henan Province, 450003, China
- Pengxu Li
  - Department of Orthopedics, School of Clinical Medicine, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, Henan University, Zhengzhou, Henan, China
  - Department of Microbiome Laboratory, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, Zhengzhou University, Zhengzhou, Henan, China
- Defu Kong
  - Department of Orthopedics, Henan Provincial People's Hospital, Zhengzhou, Henan, China
  - School of Basic Medical Sciences, Xinxiang Medical University, Xinxiang, 453003, China
- Wenjia Liang
  - People's Hospital of Henan University, Henan Provincial People's Hospital, Henan Province, 450003, China
- Binfeng Liu
  - People's Hospital of Zhengzhou University, Henan Provincial People's Hospital, Zhengzhou, China
- Hongbo Wang
  - People's Hospital of Henan University, Henan Provincial People's Hospital, Henan Province, 450003, China
- Xingyao Bu
  - Department of Neurosurgery, Zhengzhou University People's Hospital, Henan Provincial People's Hospital, Zhengzhou, Henan, China
- Yanzheng Gao
  - Department of Surgery of Spine and Spinal Cord, Henan International Joint Laboratory of Intelligentized Orthopedics Innovation and Transformation, Henan Key Laboratory for Intelligent Precision Orthopedics, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, People's Hospital of Henan University, Henan, Zhengzhou, 453003, China
|
5
|
Jiang X, Chen M, Song W, Lin GN. Label propagation-based semi-supervised feature selection on decoding clinical phenotypes with RNA-seq data. BMC Med Genomics 2021; 14:141. [PMID: 34465339 PMCID: PMC8406783 DOI: 10.1186/s12920-021-00985-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/15/2020] [Accepted: 05/14/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Clinically, behavioral, cognitive, and mental functions are affected during neurodegenerative disease progression. To date, the molecular pathogenesis of these complex diseases is still unclear. With the rapid development of sequencing technologies, it is now possible to decode, at fine resolution, the molecular mechanisms corresponding to different clinical phenotypes at the genome-wide transcriptomic level using computational methods. Our previous studies have shown that it is difficult to distinguish disease genes from non-disease genes. Therefore, to precisely explore the molecular pathogenesis underlying complex clinical phenotypes, it is better to identify biomarkers corresponding to different disease stages or clinical phenotypes. In this study, we designed a label propagation-based semi-supervised feature selection approach (LPFS) to prioritize disease-associated genes corresponding to different disease stages or clinical phenotypes. METHODS We are the first to combine label propagation clustering and feature selection in one framework, proposing a label propagation-based semi-supervised feature selection approach. LPFS prioritizes disease genes related to different disease stages or phenotypes by alternating between label propagation clustering based on a sample network and feature selection with gene expression profiles. GO and KEGG pathway enrichment analyses were then carried out, together with gene functional analysis, to explore the molecular mechanisms of specific disease phenotypes and thus decode the changes in individual behavioral and mental characteristics during neurodegenerative disease progression. RESULTS A large number of experiments were conducted to verify the performance of LPFS with Huntington's disease gene expression data. Experimental results showed that LPFS performs better than state-of-the-art methods. GO and KEGG enrichment analysis of key gene sets showed that the TGF-beta signaling pathway, cytokine-cytokine receptor interaction, immune response, and inflammatory response were gradually affected during Huntington's disease progression. In addition, we found that the expression of SLC4A11, ZFP474, AMBP, TOP2A, PBK, CCDC33, APSL, DLGAP5, and Al662270 changed markedly as the disease developed. CONCLUSIONS In this study, we designed a label propagation-based semi-supervised feature selection model to precisely select key genes of different disease phenotypes. We conducted experiments using the model with Huntington's disease mouse gene expression data to decode the disease mechanisms. We found that many cell types, including astrocytes, microglia, and GABAergic neurons, could be involved in the pathological process.
Affiliation(s)
- Xue Jiang
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
- Miao Chen
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
- Weichen Song
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
- Guan Ning Lin
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
  - Shanghai Key Laboratory of Psychotic Disorders, Shanghai, 200030 China
|
6
|
Maind A, Raut S. COSCEB: Comprehensive search for column-coherent evolution biclusters and its application to hub gene identification. J Biosci 2019; 44:48. [PMID: 31180061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Biclustering is an increasingly used data mining technique for finding groups of co-expressed genes across a subset of experimental conditions in gene-expression data. Such groups of co-expressed genes appear in the form of various patterns called biclusters. A bicluster provides significant insights into the functionality of genes and plays an important role in various clinical applications such as drug discovery, biomarker discovery, gene network analysis, gene identification, disease diagnosis, and pathway analysis. This paper presents a novel unsupervised approach, 'COmprehensive Search for Column-Coherent Evolution Biclusters (COSCEB)', for a comprehensive search of biologically significant column-coherent evolution biclusters. Column subspace extraction from each gene pair and the Longest Common Contiguous Subsequence (LCCS) are employed to identify significant biclusters. The experiments have been performed on both synthetic and real datasets. The performance of COSCEB is evaluated against the following key issues: comprehensive search, Deep OPSM bicluster, bicluster types, bicluster accuracy, bicluster size, noise, overlapping, output nature, computational complexity, and biologically significant biclusters. COSCEB is compared with six well-known biclustering algorithms: SAMBA, OPSM, xMotif, Bimax, Deep OPSM, and UniBic. The results show that the proposed approach performs effectively on most of the issues and extracts all possible biologically significant column-coherent evolution biclusters, far more than the other biclustering algorithms. Along with the proposed approach, we also present a case study that shows the application of significant biclusters to hub gene identification.
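The LCCS primitive named in this abstract can be illustrated with a standard dynamic-programming routine. The sketch below only shows how a longest common contiguous subsequence of two sequences is found; how COSCEB derives its sequences from gene pairs is not described in the abstract, so the two column orderings in the example are hypothetical.

```python
def longest_common_contiguous_subsequence(a, b):
    """Return the longest run of items that appears contiguously in both a and b.

    Standard O(len(a) * len(b)) dynamic programme; this is a generic sketch,
    not the COSCEB implementation.
    """
    best_len, best_end = 0, 0
    # prev[j] = length of the common run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of positions.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return list(a[best_end - best_len:best_end])


# Hypothetical column orderings for two genes (columns sorted by expression).
order_gene_1 = [3, 1, 4, 2, 0]
order_gene_2 = [1, 4, 2, 3, 0]
print(longest_common_contiguous_subsequence(order_gene_1, order_gene_2))  # [1, 4, 2]
```

Only two rows of the table are kept at a time, so memory use is linear in the length of the second sequence.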
Affiliation(s)
- Ankush Maind
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, Maharashtra 440 010, India
|
7
|
COSCEB: Comprehensive search for column-coherent evolution biclusters and its application to hub gene identification. J Biosci 2019. [DOI: 10.1007/s12038-019-9862-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
8
|
Cheng KO, Law NF, Siu WC. Clustering-Based Compression for Population DNA Sequences. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:208-221. [PMID: 29028207 DOI: 10.1109/tcbb.2017.2762302] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/07/2023]
Abstract
Owing to advances in DNA sequencing techniques, the number of sequenced individual genomes has grown exponentially. Effective compression of such sequences is therefore highly desirable. In this work, we present a novel compression algorithm called Reference-based Compression algorithm using the concept of Clustering (RCC). The rationale behind RCC is the observation that substructures exist within population sequences. To exploit these substructures, k-means clustering is employed to partition sequences into clusters for better compression. A reference sequence is then constructed for each cluster so that sequences in that cluster can be compressed by referring to this reference sequence. The reference sequence of each cluster is also compressed with reference to a sequence derived from all the reference sequences. Experiments show that RCC can further reduce the compressed size by up to 91.0 percent when compared with state-of-the-art compression approaches. There is a trade-off between compressed size and processing time. The current implementation in Matlab has a processing time thousands of times higher than that of existing algorithms implemented in C/C++. Further investigation is required to improve processing time.
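The reference-based step described in this abstract can be sketched on a toy cluster. The example below assumes the sequences are already aligned, of equal length, and already assigned to one cluster (the k-means step is omitted). A per-position majority vote stands in for the reference construction, which the abstract does not specify; the names `build_reference`, `encode`, and `decode` are hypothetical.

```python
from collections import Counter


def build_reference(cluster):
    """Majority base at each position of equal-length, aligned sequences.

    Assumption: a consensus reference; RCC's actual construction may differ.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def encode(sequence, reference):
    """Store only the positions where the sequence differs from the reference."""
    return [(i, base) for i, (base, ref) in enumerate(zip(sequence, reference)) if base != ref]


def decode(differences, reference):
    """Rebuild the original sequence from the reference and the stored differences."""
    bases = list(reference)
    for i, base in differences:
        bases[i] = base
    return "".join(bases)


cluster = ["ACGTAC", "ACGTTC", "ACCTAC"]
reference = build_reference(cluster)          # "ACGTAC"
encoded = [encode(seq, reference) for seq in cluster]
print(reference, encoded)                     # ACGTAC [[], [(4, 'T')], [(2, 'C')]]
```

The more similar the sequences in a cluster are, the shorter the difference lists become, which is the motivation the abstract gives for clustering before compression.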
|
9
|
Houari A, Ayadi W, Ben Yahia S. A new FCA-based method for identifying biclusters in gene expression data. Int J Mach Learn Cyb 2018. [DOI: 10.1007/s13042-018-0794-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/27/2022]
|
10
|
Golchin M, Liew AWC. Parallel biclustering detection using strength Pareto front evolutionary algorithm. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.06.031] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/30/2023]
|
11
|
Zhang J, Liu Y, Bu Y, Zhang X, Yao Y. Factor Analysis of MYB Gene Expression and Flavonoid Affecting Petal Color in Three Crabapple Cultivars. Frontiers in Plant Science 2017; 8:137. [PMID: 28223999 PMCID: PMC5293739 DOI: 10.3389/fpls.2017.00137] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/19/2016] [Accepted: 01/23/2017] [Indexed: 05/11/2023]
Abstract
Flavonoid biosynthesis has received much attention concerning the structural genes and expression of the associated transcription factors (TFs). In the present study, we examined the gene expression patterns for petals of three colors using a statistical method. Factor analysis was successfully used to examine the expression patterns most prevalent during regulation. The first expression patterns in the white and red petals were clearly demonstrated and revealed different mechanisms of producing the proper components, whereas that in the pink petals was more complex, requiring factor analysis to supplement the other results. Combining the results of the correlation analysis between TFs and structural genes, the effects of each TF on the main expression pattern in each cultivar were determined. Moreover, McMYB10 was implicated in the regulation of the gene expression pattern in red petals, and McMYB5 was implicated in the maintenance of the balance of the pigment components and proanthocyanin (PA) production in cooperation with McMYB4 to generate pigmentation in the pink petals.
Affiliation(s)
- Jie Zhang
  - Department of Plant Science and Technology, Beijing University of Agriculture, Beijing, China
  - Key Laboratory of New Technology in Agricultural Application of Beijing, Beijing University of Agriculture, Beijing, China
  - Beijing Collaborative Innovation Center for Eco-Environmental Improvement with Forestry and Fruit Trees, Beijing, China
- Yingying Liu
  - Department of Plant Science and Technology, Beijing University of Agriculture, Beijing, China
  - Key Laboratory of New Technology in Agricultural Application of Beijing, Beijing University of Agriculture, Beijing, China
  - Beijing Collaborative Innovation Center for Eco-Environmental Improvement with Forestry and Fruit Trees, Beijing, China
- YuFen Bu
  - Department of Plant Science and Technology, Beijing University of Agriculture, Beijing, China
  - Key Laboratory of New Technology in Agricultural Application of Beijing, Beijing University of Agriculture, Beijing, China
  - Beijing Collaborative Innovation Center for Eco-Environmental Improvement with Forestry and Fruit Trees, Beijing, China
- Xi Zhang
  - Department of Plant Science and Technology, Beijing University of Agriculture, Beijing, China
  - Key Laboratory of New Technology in Agricultural Application of Beijing, Beijing University of Agriculture, Beijing, China
  - Beijing Collaborative Innovation Center for Eco-Environmental Improvement with Forestry and Fruit Trees, Beijing, China
- Yuncong Yao
  - Department of Plant Science and Technology, Beijing University of Agriculture, Beijing, China
  - Key Laboratory of New Technology in Agricultural Application of Beijing, Beijing University of Agriculture, Beijing, China
  - Beijing Collaborative Innovation Center for Eco-Environmental Improvement with Forestry and Fruit Trees, Beijing, China
  - *Correspondence: Yuncong Yao
|
12
|
Zhao H, Wang DD, Chen L, Liu X, Yan H. Identifying Multi-Dimensional Co-Clusters in Tensors Based on Hyperplane Detection in Singular Vector Spaces. PLoS One 2016; 11:e0162293. [PMID: 27598575 PMCID: PMC5012624 DOI: 10.1371/journal.pone.0162293] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/28/2016] [Accepted: 08/19/2016] [Indexed: 11/18/2022] Open
Abstract
Co-clustering, often called biclustering for two-dimensional data, has found many applications, such as gene expression data analysis and text mining. Nowadays, a variety of multi-dimensional arrays (tensors) frequently occur in data analysis tasks, and co-clustering techniques play a key role in dealing with such datasets. Co-clusters represent coherent patterns and exhibit important properties along all the modes. Development of robust co-clustering techniques is important for the detection and analysis of these patterns. In this paper, a co-clustering method based on hyperplane detection in singular vector spaces (HDSVS) is proposed. Specifically, higher-order singular value decomposition (HOSVD) transforms a tensor into a core part and a singular vector matrix along each mode, whose row vectors can be clustered by a linear grouping algorithm (LGA). Hyperplanar patterns are then extracted and used to identify multi-dimensional co-clusters. To validate HDSVS, a number of synthetic and biological tensors were used. The synthetic tensors demonstrated the favorable performance of the algorithm on noisy or overlapped data. Experiments with gene expression data and lineage data of embryonic cells further verified the reliability of HDSVS on practical problems. Moreover, the detected co-clusters are consistent with important genetic pathways and gene ontology annotations. Finally, a series of comparisons between HDSVS and state-of-the-art methods on synthetic tensors and a yeast gene expression tensor verified the robust and stable performance of our method.
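The HOSVD step mentioned in this abstract can be sketched with NumPy: each mode is unfolded into a matrix, its left singular vectors form that mode's singular vector matrix, and the core is obtained by multiplying the tensor by the transposed factors. This is a generic HOSVD under those standard definitions, not the authors' implementation; the subsequent LGA clustering of the row vectors is not shown, and the helper names are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor: move `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd(tensor):
    """Return (core, factors) such that tensor = core x_0 U_0 x_1 U_1 ..."""
    # One matrix of left singular vectors per mode.
    factors = [
        np.linalg.svd(unfold(tensor, mode), full_matrices=False)[0]
        for mode in range(tensor.ndim)
    ]
    core = tensor
    for mode, u in enumerate(factors):
        # Mode product with U^T: contract axis `mode` with the rows of U.
        core = np.moveaxis(np.tensordot(core, u, axes=(mode, 0)), -1, mode)
    return core, factors


def reconstruct(core, factors):
    """Invert the decomposition by multiplying the core with each factor."""
    out = core
    for mode, u in enumerate(factors):
        # Mode product with U: contract axis `mode` with the columns of U.
        out = np.moveaxis(np.tensordot(out, u, axes=(mode, 1)), -1, mode)
    return out


rng = np.random.default_rng(0)
tensor = rng.normal(size=(4, 5, 3))   # hypothetical genes x samples x time points
core, factors = hosvd(tensor)
print(core.shape, [u.shape for u in factors])
print(np.allclose(reconstruct(core, factors), tensor))
```

The rows of each factor matrix are the per-mode row vectors that, according to the abstract, are subsequently grouped by LGA.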
Affiliation(s)
- Hongya Zhao
  - Industrial Center, Shenzhen Polytechnic, Shenzhen, China
- Debby D. Wang
  - Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
  - Caritas Institute of Higher Education, New Territories, Hong Kong
- Long Chen
  - Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Xinyu Liu
  - Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Hong Yan
  - Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
|
13
|
Saber HB, Elloumi M. A novel biclustering algorithm of binary microarray data: BiBinCons and BiBinAlter. BioData Min 2015; 8:38. [PMID: 26628919 PMCID: PMC4666179 DOI: 10.1186/s13040-015-0070-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/04/2015] [Accepted: 11/08/2015] [Indexed: 11/10/2022] Open
Abstract
The biclustering of microarray data has been the subject of extensive research. None of the existing biclustering algorithms is perfect. The construction of biologically significant groups of biclusters for large microarray data is still a problem that requires continued work. Biological validation of biclusters of microarray data is one of the most important open issues. So far, there are no general guidelines in the literature on how to biologically validate extracted biclusters. In this paper, we develop two biclustering algorithms for binary microarray data, called BiBinCons and BiBinAlter, which adopt the Iterative Row and Column Clustering Combination (IRCCC) approach. BiBinAlter is an improvement on BiBinCons: it uses the EvalStab and IndHomog evaluation functions in addition to the CroBin one (Bioinformatics 20:1993-2003, 2004). BiBinAlter can extract biclusters of good quality with better p-values.
Affiliation(s)
- Haifa Ben Saber
  - Latice laboratory, ENSIT, Tunis Time université, Tunis, Tunisia
- Mourad Elloumi
  - Latice laboratory, ENSIT, Tunis Time université, Tunis, Tunisia
  - Latice laboratory, Ensit, Tunis Université tunis el manar, Tunis, Tunisia
|
14
|
|
15
|
Horta D, Campello RJGB. Similarity Measures for Comparing Biclusterings. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:942-954. [PMID: 26356865 DOI: 10.1109/tcbb.2014.2325016] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 06/05/2023]
Abstract
The comparison of ordinary partitions of a set of objects is well established in the clustering literature, which includes several studies on the properties of similarity measures for comparing partitions. However, similarity measures for clusterings are not readily applicable to biclusterings, since each bicluster is a tuple of two sets (of rows and columns), whereas a cluster is only a single set (of rows). Some biclustering similarity measures have been defined as minor contributions in papers which primarily report on proposals and evaluation of biclustering algorithms or comparative analyses of biclustering algorithms. The consequence is that some desirable properties of such measures have been overlooked in the literature. We review 14 biclustering similarity measures. We define eight desirable properties of a biclustering measure, discuss their importance, and prove which properties each of the reviewed measures has. We show examples drawn from and inspired by important studies in which several biclustering measures convey misleading evaluations due to the absence of one or more of the discussed properties. We also advocate the use of a more general comparison approach that is based on the idea of transforming the original problem of comparing biclusterings into an equivalent problem of comparing clustering partitions with overlapping clusters.
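Because a bicluster is a pair of a row set and a column set, one simple way to compare two biclusters is to compare the matrix cells they cover. The sketch below uses the Jaccard index on those cells; it is offered only as an illustration of the row-and-column structure described above and is not claimed to be one of the 14 measures reviewed in the paper. The function names are hypothetical.

```python
def bicluster_cells(rows, cols):
    """All (row, column) cells covered by a bicluster."""
    return {(r, c) for r in rows for c in cols}


def jaccard(bicluster_a, bicluster_b):
    """Jaccard index of the cells covered by two biclusters (rows, cols)."""
    cells_a = bicluster_cells(*bicluster_a)
    cells_b = bicluster_cells(*bicluster_b)
    union = cells_a | cells_b
    # Two empty biclusters are treated as identical by convention.
    return len(cells_a & cells_b) / len(union) if union else 1.0


a = ({0, 1, 2}, {0, 1})   # 3 rows x 2 columns = 6 cells
b = ({1, 2, 3}, {1, 2})   # 3 rows x 2 columns = 6 cells
print(jaccard(a, b))      # 2 shared cells / 10 covered cells = 0.2
```

Comparing row sets alone would score these two biclusters at 0.5, which overstates their agreement; taking the columns into account is what distinguishes bicluster comparison from ordinary cluster comparison.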
|
16
|
Streit M, Gratzl S, Gillhofer M, Mayr A, Mitterecker A, Hochreiter S. Furby: fuzzy force-directed bicluster visualization. BMC Bioinformatics 2014; 15 Suppl 6:S4. [PMID: 25078951 PMCID: PMC4159731 DOI: 10.1186/1471-2105-15-s6-s4] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Cluster analysis is widely used to discover patterns in multi-dimensional data. Clustered heatmaps are the standard technique for visualizing one-way and two-way clustering results. In clustered heatmaps, rows and/or columns are reordered, resulting in a representation that shows the clusters as contiguous blocks. However, for biclustering results, where clusters can overlap, it is not possible to reorder the matrix in this way without duplicating rows and/or columns. RESULTS We present Furby, an interactive visualization technique for analyzing biclustering results. Our contribution is twofold. First, the technique provides an overview of a biclustering result, showing the actual data that forms the individual clusters together with the information which rows and columns they share. Second, for fuzzy clustering results, the proposed technique additionally enables analysts to interactively set the thresholds that transform the fuzzy (soft) clustering into hard clusters that can then be investigated using heatmaps or bar charts. Changes in the membership value thresholds are immediately reflected in the visualization. We demonstrate the value of Furby by loading biclustering results applied to a multi-tissue dataset into the visualization. CONCLUSIONS The proposed tool allows analysts to assess the overall quality of a biclustering result. Based on this high-level overview, analysts can then interactively explore the individual biclusters in detail. This novel way of handling fuzzy clustering results also supports analysts in finding the optimal thresholds that lead to the best clusters.
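The thresholding step described in the RESULTS, which turns a fuzzy (soft) bicluster into a hard one, can be sketched in a few lines. The example assumes membership values in [0, 1] stored per row and per column; the data layout and the names `harden` and `hard_bicluster` are hypothetical and do not reflect Furby's actual data model.

```python
def harden(memberships, threshold):
    """Keep the elements whose fuzzy membership reaches the threshold."""
    return {name for name, value in memberships.items() if value >= threshold}


def hard_bicluster(row_memberships, col_memberships, row_threshold, col_threshold):
    """Turn one fuzzy bicluster into a hard (rows, columns) pair."""
    return harden(row_memberships, row_threshold), harden(col_memberships, col_threshold)


# Hypothetical membership values for one fuzzy bicluster.
rows = {"gene_a": 0.9, "gene_b": 0.4, "gene_c": 0.05}
cols = {"sample_1": 0.7, "sample_2": 0.2}

# Raising the threshold shrinks the hard bicluster.
print(hard_bicluster(rows, cols, 0.3, 0.3))
print(hard_bicluster(rows, cols, 0.5, 0.5))
```

Re-running the function with a new threshold corresponds to the interactive adjustment described in the abstract, where a change in threshold is immediately reflected in the displayed clusters.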
Affiliation(s)
- Marc Streit
  - Institute of Computer Graphics, Johannes Kepler University Linz, Austria
- Samuel Gratzl
  - Institute of Computer Graphics, Johannes Kepler University Linz, Austria
- Michael Gillhofer
  - Institute of Computer Graphics, Johannes Kepler University Linz, Austria
- Andreas Mayr
  - Institute of Bioinformatics, Johannes Kepler University Linz, Austria
- Sepp Hochreiter
  - Institute of Bioinformatics, Johannes Kepler University Linz, Austria
|
17
|
Chen HC, Zou W, Tien YJ, Chen JJ. Identification of bicluster regions in a binary matrix and its applications. PLoS One 2013; 8:e71680. [PMID: 23940779 PMCID: PMC3733970 DOI: 10.1371/journal.pone.0071680] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 10/09/2012] [Accepted: 07/09/2013] [Indexed: 11/18/2022] Open
Abstract
Biclustering has emerged as an important approach to the analysis of large-scale datasets. A biclustering technique identifies a subset of rows that exhibit similar patterns on a subset of columns in a data matrix. Many biclustering methods have been proposed, and most, if not all, algorithms are developed to detect regions of "coherence" patterns. These methods perform unsatisfactorily if the purpose is to identify biclusters of a constant level. This paper presents a two-step biclustering method to identify constant level biclusters for binary or quantitative data. This algorithm identifies the maximal dimensional submatrix such that the proportion of non-signals is less than a pre-specified tolerance δ. In the analysis of two synthetic datasets, the proposed method has much higher sensitivity and slightly lower specificity than several prominent biclustering methods. It was further compared with the Bimax method for two real datasets. The proposed method was shown to be the most robust in terms of sensitivity, number of biclusters and number of serotype-specific biclusters identified. However, dichotomization using different signal level thresholds usually leads to different sets of biclusters; this also occurs in the present analysis.
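As a rough illustration of the tolerance criterion described in this abstract, the following minimal sketch checks whether a candidate submatrix of a binary matrix keeps its proportion of non-signals (zeros) below δ. It covers only the acceptance test, not the paper's two-step search for the maximal submatrix, and the names `non_signal_proportion`, `within_tolerance`, and `delta` are hypothetical.

```python
def non_signal_proportion(matrix, rows, cols):
    """Fraction of cells in the selected submatrix that are 0 (non-signals)."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return cells.count(0) / len(cells)


def within_tolerance(matrix, rows, cols, delta):
    """True if the submatrix has a non-signal proportion strictly below delta."""
    return non_signal_proportion(matrix, rows, cols) < delta


# Toy binary matrix (hypothetical data, not from the paper).
matrix = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

print(within_tolerance(matrix, [0, 1], [0, 1], delta=0.1))        # True: no zeros
print(within_tolerance(matrix, [0, 1, 2], [0, 1, 2], delta=0.2))  # False: 2/9 zeros
```

A search procedure would call such a test repeatedly while growing or shrinking the row and column sets; how that search is organized is specific to the paper and is not reproduced here.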
Affiliation(s)
- Hung-Chia Chen
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
  - Graduate Institute of Biostatistics and Biostatistics Center, China Medical University, Taichung, Taiwan
- Wen Zou
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Yin-Jing Tien
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- James J. Chen
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
  - Graduate Institute of Biostatistics and Biostatistics Center, China Medical University, Taichung, Taiwan
|
18
|
An J, Liew AWC, Nelson CC. Seed-based biclustering of gene expression data. PLoS One 2012; 7:e42431. [PMID: 22879981 PMCID: PMC3411756 DOI: 10.1371/journal.pone.0042431] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/20/2012] [Accepted: 07/09/2012] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Accumulated biological research outcomes show that biological functions do not depend on individual genes, but on complex gene networks. Microarray data are widely used to cluster genes according to their expression levels across experimental conditions. However, functionally related genes generally do not show coherent expression across all conditions since any given cellular process is active only under a subset of conditions. Biclustering finds gene clusters that have similar expression levels across a subset of conditions. This paper proposes a seed-based algorithm that identifies coherent genes in an exhaustive, but efficient manner. METHODS In order to find the biclusters in a gene expression dataset, we exhaustively select combinations of genes and conditions as seeds to create candidate bicluster tables. The tables have two columns: (a) a gene set, and (b) the conditions on which the gene set has dissimilar expression levels to the seed. First, the genes with less than the maximum number of dissimilar conditions are identified and a table of these genes is created. Second, the rows that have the same dissimilar conditions are grouped together. Third, the table is sorted in ascending order based on the number of dissimilar conditions. Finally, beginning with the first row of the table, a test is run repeatedly to determine whether the cardinality of the gene set in the row is greater than the minimum threshold number of genes in a bicluster. If so, a bicluster is output and the corresponding row is removed from the table. Repeating this process, all biclusters in the table are systematically identified until the table becomes empty. CONCLUSIONS This paper presents a novel biclustering algorithm for the identification of additive biclusters. Since it involves exhaustively testing combinations of genes and conditions, the additive biclusters can be found more readily.
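The table-building steps in the METHODS paragraph can be sketched for a single seed as follows. This is a simplified reading and not the authors' implementation: the additive similarity test (comparing each gene's offset from the seed condition with the seed gene's offset, within a tolerance `tol`) is an assumption, and `seed_biclusters`, `tol`, `max_dissimilar`, and `min_genes` are hypothetical names.

```python
from collections import defaultdict


def seed_biclusters(data, seed_gene, seed_cond, tol, max_dissimilar, min_genes):
    """Build one candidate table for a (gene, condition) seed and read off biclusters."""
    seed = data[seed_gene]
    n_conds = len(seed)

    # Steps 1 and 2: keep genes with fewer than max_dissimilar dissimilar
    # conditions and group genes that share the same dissimilar-condition set.
    table = defaultdict(list)
    for gene, row in enumerate(data):
        dissimilar = frozenset(
            c for c in range(n_conds)
            # Assumed additive test: offsets from the seed condition should match.
            if abs((row[c] - row[seed_cond]) - (seed[c] - seed[seed_cond])) > tol
        )
        if len(dissimilar) < max_dissimilar:
            table[dissimilar].append(gene)

    # Step 3: sort rows by the number of dissimilar conditions, ascending.
    ordered = sorted(table.items(), key=lambda item: (len(item[0]), sorted(item[0])))

    # Step 4: output rows whose gene set is larger than the minimum threshold.
    biclusters = []
    for dissimilar, genes in ordered:
        if len(genes) > min_genes:
            conds = [c for c in range(n_conds) if c not in dissimilar]
            biclusters.append((genes, conds))
    return biclusters


# Toy expression matrix (hypothetical): rows are genes, columns are conditions.
data = [
    [1, 2, 3, 4],
    [11, 12, 13, 14],
    [5, 6, 7, 0],
    [2, 3, 4, 9],
    [0, 9, 9, 9],
]
print(seed_biclusters(data, seed_gene=0, seed_cond=0, tol=0.5,
                      max_dissimilar=2, min_genes=1))
```

The full method repeats this over all seed combinations; that outer loop and any merging of overlapping results are omitted here.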
Affiliation(s)
- Jiyuan An
  - Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Australia.
|
19
|
Ayadi W, Elloumi M, Hao JK. Pattern-driven neighborhood search for biclustering of microarray data. BMC Bioinformatics 2012; 13 Suppl 7:S11. [PMID: 22594997 PMCID: PMC3348021 DOI: 10.1186/1471-2105-13-s7-s11] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biclustering aims at finding subgroups of genes that show highly correlated behaviors across a subgroup of conditions. Biclustering is a very useful tool for mining microarray data and has various practical applications. From a computational point of view, biclustering is a highly combinatorial search problem and can be solved with optimization methods. RESULTS We describe a stochastic pattern-driven neighborhood search algorithm for the biclustering problem. Starting from an initial bicluster, the proposed method progressively improves the quality of the bicluster by adjusting some genes and conditions. The adjustments are based on the quality of each gene and condition with respect to the bicluster and the initial data matrix. The performance of the method was evaluated on two well-known microarray datasets (Yeast cell cycle and Saccharomyces cerevisiae), showing that it is able to obtain statistically and biologically significant biclusters. The proposed method was also compared with six reference methods from the literature. CONCLUSIONS The proposed method is computationally fast and can be applied to discover significant biclusters. It can also be used to effectively improve the quality of existing biclusters provided by other biclustering methods.
Affiliation(s)
- Wassim Ayadi
  - LERIA, Université d'Angers, 2 Boulevard Lavoisier, 49045 Angers Cedex 01, France
  - LaTICE, Higher School of Sciences and Technologies of Tunis, 5 Avenue Taha Hussein, B. P. : 56, Bab Menara, 1008 Tunis, University of Tunis, Tunisia
- Mourad Elloumi
  - LaTICE, Higher School of Sciences and Technologies of Tunis, 5 Avenue Taha Hussein, B. P. : 56, Bab Menara, 1008 Tunis, University of Tunis, Tunisia
- Jin-Kao Hao
  - LERIA, Université d'Angers, 2 Boulevard Lavoisier, 49045 Angers Cedex 01, France
|
20
|
Mishra D. Discovery of Overlapping Pattern Biclusters from Gene Expression Data using Hash based PSO. Procedia Technology 2012. [DOI: 10.1016/j.protcy.2012.05.060] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
|
23
|
Turcan S, Vetter DE, Maron JL, Wei X, Slonim DK. Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data. Pacific Symposium on Biocomputing 2011:50-61. [PMID: 21121032 PMCID: PMC3201790 DOI: 10.1142/9789814335058_0006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 04/16/2023]
Abstract
Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.
Affiliation(s)
- Sevin Turcan
  - Department of Biomedical Engineering, Tufts University, 4 Colby St., Medford, MA 02155, USA.
|
24
|
Liew AWC, Law NF, Yan H. Missing value imputation for gene expression data: computational techniques to recover missing data from available information. Brief Bioinform 2010; 12:498-513. [PMID: 21156727 DOI: 10.1093/bib/bbq080] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.6] [Indexed: 12/19/2022] Open
Affiliation(s)
- Alan Wee-Chung Liew
  - School of Information and Communication Technology, Gold Coast Campus, Griffith University, QLD4222, Australia.
|
25
|
Reddy CK, Aziz MS. Modeling local nonlinear correlations using subspace principal curves. Stat Anal Data Min 2010. [DOI: 10.1002/sam.10086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
26
|
Erten C, Sözdinler M. Improving performances of suboptimal greedy iterative biclustering heuristics via localization. Bioinformatics 2010; 26:2594-600. [PMID: 20733064 DOI: 10.1093/bioinformatics/btq473] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION Biclustering gene expression data is the problem of extracting submatrices of genes and conditions exhibiting significant correlation across both the rows and the columns of a data matrix of expression values. Even the simplest versions of the problem are computationally hard. Most of the proposed solutions therefore employ greedy iterative heuristics that locally optimize a suitably assigned scoring function. METHODS We provide a fast and simple pre-processing algorithm called localization that reorders the rows and columns of the input data matrix in such a way as to group correlated entries in small local neighborhoods within the matrix. The proposed localization algorithm takes its roots from effective use of graph-theoretical methods applied to problems exhibiting a similar structure to that of biclustering. In order to evaluate the effectiveness of the localization pre-processing algorithm, we focus on three representative greedy iterative heuristic methods. We show how the localization pre-processing can be incorporated into each representative algorithm to improve biclustering performance. Furthermore, we propose a simple biclustering algorithm, Random Extraction After Localization (REAL), that randomly extracts submatrices from the localization pre-processed data matrix, eliminates those with low similarity scores, and provides the rest as correlated structures representing biclusters. RESULTS We compare the proposed localization pre-processing with another pre-processing alternative, non-negative matrix factorization. We show that our fast and simple localization procedure provides similar or even better results than the computationally heavy matrix factorization pre-processing with regard to H-value tests. We next demonstrate that the performances of the three representative greedy iterative heuristic methods improve with localization pre-processing when biological correlations in the form of functional enrichment and PPI verification constitute the main performance criteria. The fact that the random extraction method based on localization, REAL, performs better than the representative greedy heuristic methods under the same criteria also confirms the effectiveness of the suggested pre-processing method. AVAILABILITY Supplementary material including code implementations in LEDA C++ library, experimental data, and the results are available at http://code.google.com/p/biclustering/ CONTACTS cesim@khas.edu.tr; melihsozdinler@boun.edu.tr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cesim Erten
  - Department of Computer Engineering, Kadir Has University, Cibali, Istanbul 34083, Turkey.
|
28
|
Ayadi W, Elloumi M, Hao JK. A biclustering algorithm based on a bicluster enumeration tree: application to DNA microarray data. BioData Min 2009; 2:9. [PMID: 20015398 PMCID: PMC2804695 DOI: 10.1186/1756-0381-2-9] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Received: 07/20/2009] [Accepted: 12/16/2009] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of rows coherent with groups of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. METHODS We introduce BiMine, a new enumeration algorithm for biclustering of DNA microarray data. The proposed algorithm is based on three original features. First, BiMine relies on a new evaluation function called Average Spearman's rho (ASR). Second, BiMine uses a new tree structure, called Bicluster Enumeration Tree (BET), to represent the different biclusters discovered during the enumeration process. Third, to avoid the combinatorial explosion of the search tree, BiMine introduces a parametric rule that allows the enumeration process to cut tree branches that cannot lead to good biclusters. RESULTS The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data. The experimental results show that BiMine competes well with several other biclustering methods. Moreover, we test the biological significance using a gene annotation web-tool to show that our proposed method is able to produce biologically relevant biclusters. The software is available upon request from the authors to academic users.
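To give a feel for a rank-correlation score of the kind named in this abstract, the sketch below averages Spearman's rho over all pairs of rows in a candidate bicluster. This is only one plausible reading: the exact ASR definition used by BiMine is not given here and may differ (for example, by also considering columns), and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # constant row: correlation undefined, treated as 0 here
    return cov / (sx * sy)


def average_row_spearman(rows):
    """Mean Spearman's rho over all pairs of rows (needs at least two rows)."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("at least two rows are required")
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


# Three rows with identical rank order give a score close to 1.
print(average_row_spearman([[1, 2, 3, 4], [2, 4, 6, 8], [10, 20, 30, 40]]))
```

A score near 1 indicates rows that rise and fall together across the selected conditions, which is the kind of coherence an enumeration method could use to prune weak branches.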
Affiliation(s)
- Wassim Ayadi
  - UTIC, Higher School of Sciences and Technologies of Tunis, 1008 Tunis, Tunisia
  - LERIA, Université d'Angers, 2 Boulevard Lavoisier, 49045 Angers, France
- Mourad Elloumi
  - UTIC, Higher School of Sciences and Technologies of Tunis, 1008 Tunis, Tunisia
- Jin-Kao Hao
  - LERIA, Université d'Angers, 2 Boulevard Lavoisier, 49045 Angers, France
|
29
|
Zeng T, Li J. Maximization of negative correlations in time-course gene expression data for enhancing understanding of molecular pathways. Nucleic Acids Res 2009; 38:e1. [PMID: 19854949 PMCID: PMC2800212 DOI: 10.1093/nar/gkp822] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 01/13/2023] Open
Abstract
Positive correlation can be diversely instantiated as shifting, scaling or geometric patterns, and it has been extensively explored for time-course gene expression data and pathway analysis. Recently, a trend has emerged in biological studies that focuses on the notion of negative correlations such as opposite expression patterns, complementary patterns and self-negative regulation of transcription factors (TFs). These biological ideas and primitive observations motivate us to formulate and investigate the problem of maximizing negative correlations. The objective is to discover all maximal negative correlations of statistical and biological significance from time-course gene expression data for enhancing our understanding of molecular pathways. Given a gene expression matrix, a maximal negative correlation is defined as an activation–inhibition two-way expression pattern (AIE pattern). We propose a parameter-free algorithm to enumerate the complete set of AIE patterns from a data set. This algorithm can identify significant negative correlations that cannot be identified by the traditional clustering/biclustering methods. To demonstrate the biological usefulness of AIE patterns in the analysis of molecular pathways, we conducted deep case studies for AIE patterns identified from Yeast cell cycle data sets. In particular, in the analysis of the Lysine biosynthesis pathway, new regulation modules and pathway components were inferred according to a significant negative correlation which is likely caused by a co-regulation of the TFs at the higher layer of the biological network. We conjecture that maximal negative correlations between genes are actually a common characteristic in molecular pathways, which can provide insights into the cell stress response study, drug response evaluation, etc.
Affiliation(s)
- Tao Zeng
  - School of Computer Engineering & Bioinformatics Research Center, Nanyang Technological University, Singapore
|
30
|
Chou JW, Bushel PR. Discernment of possible mechanisms of hepatotoxicity via biological processes over-represented by co-expressed genes. BMC Genomics 2009; 10:272. [PMID: 19538742 PMCID: PMC2706894 DOI: 10.1186/1471-2164-10-272] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 11/13/2008] [Accepted: 06/18/2009] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Hepatotoxicity is a form of liver injury caused by exposure to stressors. Genomic-based approaches have been used to detect changes in transcription in response to hepatotoxicants. However, there are no straightforward ways of using co-expressed genes anchored to a phenotype or constrained by the experimental design for discerning mechanisms of a biological response. RESULTS Through the analysis of a gene expression dataset containing 318 liver samples from rats exposed to hepatotoxicants and leveraging alanine aminotransferase (ALT), a serum enzyme indicative of liver injury, as the phenotypic marker, we identified biological processes and molecular pathways that may be associated with mechanisms of hepatotoxicity. Our analysis used an approach called Coherent Co-expression Biclustering (cc-Biclustering) for clustering a subset of genes through a coherent (consistency) measure within each group of samples representing a subset of experimental conditions. Supervised biclustering identified 87 genes co-expressed and correlated with ALT in all the samples exposed to the chemicals. None of the over-represented pathways related to liver injury. However, biclusters with subsets of samples exposed to one of the 7 hepatotoxicants, but not to a non-toxic isomer, contained co-expressed genes that represented pathways related to a stress response. Unsupervised biclustering of the data resulted in 1) four to five times more genes within the bicluster containing all the samples exposed to the chemicals, 2) biclusters with co-expression of genes that discerned 1,4-dichlorobenzene (a non-toxic isomer at low and mid doses) from the other chemicals, pathways and biological processes that underlie liver injury and 3) a bicluster with genes up-regulated in an early response to toxic exposure. CONCLUSION We obtained clusters of co-expressed genes that over-represented biological processes and molecular pathways related to hepatotoxicity in the rat. The mechanisms involved in the response of the liver to the exposure to 1,4-dichlorobenzene suggest non-genotoxicity whereas the exposure to the hepatotoxicants could be DNA damaging leading to overall genomic instability and activation of cell cycle check point signaling. In addition, key pathways and biological processes representative of an inflammatory response, energy production and apoptosis were impacted by the hepatotoxicant exposures that manifested liver injury in the rat.
Affiliation(s)
- Jeff W Chou
  - Biostatistics Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC, USA.
|