1
Sabah A, Tiun S, Sani NS, Ayob M, Taha AY. Enhancing web search result clustering model based on multiview multirepresentation consensus cluster ensemble (mmcc) approach. PLoS One 2021; 16:e0245264. [PMID: 33449949 PMCID: PMC7810326 DOI: 10.1371/journal.pone.0245264] [Received: 09/23/2020] [Accepted: 12/26/2020]
Abstract
Existing text clustering methods use only one representation at a time (single view), whereas documents can be represented by multiple views. The multiview multirepresentation method enhances clustering quality. Moreover, existing clustering methods that use more than one representation at a time (multiview) rely on representations of the same nature. Hence, it is reasonable to apply clustering methods to multiple views that represent the data in different ways in order to create a diverse set of candidate clustering solutions. On this basis, an effective dynamic clustering method must combine multiple views of the data, including a semantic view, a lexical view (word weighting), and a topic view, and must also consider the number of clusters. The main goal of this study is to develop a new method that improves the performance of web search result clustering (WSRC). An enhanced multiview multirepresentation consensus clustering ensemble (MMCC) method is proposed to create a set of diverse candidate solutions and select a high-quality overlapping cluster. The overlapping clusters are obtained from the candidate solutions created by different clustering methods. The framework for developing the proposed MMCC comprises five stages: (1) acquiring the standard datasets (MORESQUE and Open Directory Project-239), which are used to validate search result clustering algorithms, (2) preprocessing the datasets, (3) applying multiview multirepresentation clustering models, (4) using the radius-based cluster number estimation algorithm, and (5) employing the consensus clustering ensemble method. Results show that clustering methods improve when multiview multirepresentation is used. More importantly, the proposed MMCC model improves the overall performance of WSRC compared with all single-view clustering models.
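The consensus step can be illustrated with a generic co-association (evidence-accumulation) sketch in Python. It is not the authors' MMCC implementation: it assumes each view has already produced one hard label per document, counts how often each pair of documents shares a cluster across views, and links pairs above a hypothetical threshold of 0.5. Unlike MMCC, which selects overlapping clusters and estimates the cluster number with a radius-based algorithm, this sketch returns a hard partition.

```python
from itertools import combinations


def co_association(labelings):
    """Return an n x n matrix giving, for each pair of items, the fraction of
    labelings (one per view/representation) that put both in the same cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Link items whose co-association exceeds `threshold` and return the
    connected components as sorted lists of item indices (a hard partition)."""
    matrix = co_association(labelings)
    parent = list(range(len(matrix)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(matrix)), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(matrix)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three hypothetical views of six documents; label names need not agree across views.
views = [
    [0, 0, 0, 1, 1, 1],  # e.g. a lexical (word-weighting) view
    [0, 0, 1, 1, 2, 2],  # e.g. a semantic view
    [1, 1, 1, 0, 0, 0],  # e.g. a topic view
]
print(consensus_clusters(views))  # [[0, 1, 2], [3, 4, 5]]
```

Raising the threshold makes the consensus stricter: at 0.7, only pairs that every view agrees on stay linked.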
Affiliation(s)
- Ali Sabah
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
- Sabrina Tiun
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
- Nor Samsiah Sani
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
- Masri Ayob
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
- Adil Yaseen Taha
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
2
Mahini R, Li Y, Ding W, Fu R, Ristaniemi T, Nandi AK, Chen G, Cong F. Determination of the Time Window of Event-Related Potential Using Multiple-Set Consensus Clustering. Front Neurosci 2020; 14:521595. [PMID: 33192239 PMCID: PMC7610058 DOI: 10.3389/fnins.2020.521595] [Received: 12/19/2019] [Accepted: 09/09/2020]
Abstract
Clustering is a promising tool for grouping sequences of similar time points in order to identify the attention blocks in spatiotemporal event-related potential (ERP) analysis. If a suitable clustering method is applied to spatiotemporal ERPs, it is likely to reveal the appropriate time window for the ERP of interest. However, reliably estimating a proper time window from the data of all individual subjects remains challenging. In this study, we developed a novel multiset consensus clustering method in which the clustering results of multiple subjects were combined to retrieve the best-fitting clustering for all subjects within a group. The obtained clustering was then processed by a newly proposed time-window detection method to determine the most suitable time window for identifying the ERP of interest in each condition/group. Applying the proposed method to simulated and real ERP data indicated that the brain responses of individual subjects can be pooled to determine a reliable time window for different conditions/groups. Our results revealed more precise time windows for identifying the N2 and P3 components in the simulated data compared with state-of-the-art methods. Additionally, our proposed method achieved more robust performance and outperformed statistical analysis results in the real data for the N300 and prospective positivity components. In conclusion, the proposed method successfully estimates the time window for the ERP of interest by processing individual data, offering new avenues for spatiotemporal ERP processing.
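The step from a clustering of time points to a time window can be sketched as follows. This is an illustration under stated assumptions, not the authors' time-window detection method: it assumes one consensus label per time sample, takes the longest contiguous run of a chosen cluster as the window, and converts sample indices to milliseconds using a hypothetical sampling rate and epoch start.

```python
def longest_run(labels, target):
    """Return (first, last) sample indices, inclusive, of the longest contiguous
    run of `target` in `labels`, or None if `target` never occurs.
    Ties keep the earliest run."""
    best = None
    start = None
    for i, label in enumerate(list(labels) + [None]):  # sentinel ends a trailing run
        if label == target and start is None:
            start = i
        elif label != target and start is not None:
            if best is None or (i - 1 - start) > (best[1] - best[0]):
                best = (start, i - 1)
            start = None
    return best


def run_to_ms(run, fs_hz, epoch_start_ms):
    """Convert an inclusive sample-index run to a (start, end) window in ms."""
    first, last = run
    step_ms = 1000.0 / fs_hz
    return (epoch_start_ms + first * step_ms, epoch_start_ms + last * step_ms)


# Hypothetical consensus labels, one per time sample; cluster 1 is the one of interest.
labels = [0, 0, 1, 1, 1, 0, 1, 1, 2]
run = longest_run(labels, target=1)
print(run, run_to_ms(run, fs_hz=250, epoch_start_ms=0.0))  # (2, 4) (8.0, 16.0)
```

Picking the longest run is only one possible rule; a real detector would likely also constrain the window to the expected latency range of the component.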
Affiliation(s)
- Reza Mahini
- School of Biomedical Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Faculty of Information Technology, University of Jyvaskyla, Jyvaskyla, Finland
- Yansong Li
- Reward, Competition and Social Neuroscience Lab, Department of Psychology, School of Social and Behavioral Sciences, Nanjing University, Nanjing, China
- Institute for Brain Sciences, Nanjing University, Nanjing, China
- Weiyan Ding
- Department of Psychiatry, Chinese PLA 967th Hospital, Dalian, China
- Rao Fu
- School of Biomedical Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Tapani Ristaniemi
- Faculty of Information Technology, University of Jyvaskyla, Jyvaskyla, Finland
- Asoke K. Nandi
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, United Kingdom
- Guoliang Chen
- Department of Psychiatry, Chinese PLA 967th Hospital, Dalian, China
- Fengyu Cong
- School of Biomedical Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Faculty of Information Technology, University of Jyvaskyla, Jyvaskyla, Finland
- School of Artificial Intelligence, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Key Laboratory of Integrated Circuit and Biomedical Electronic System, Liaoning Province, Dalian University of Technology, Dalian, China
3
Li Q, Shi R, Liang F. Drug sensitivity prediction with high-dimensional mixture regression. PLoS One 2019; 14:e0212108. [PMID: 30811440 PMCID: PMC6392252 DOI: 10.1371/journal.pone.0212108] [Received: 05/12/2018] [Accepted: 01/27/2019]
Abstract
This paper proposes a mixture regression model-based method for drug sensitivity prediction. The proposed method explicitly addresses two fundamental issues in drug sensitivity prediction, namely, population heterogeneity and feature selection pertaining to each of the subpopulations. The mixture regression model is estimated using the imputation-conditional consistency algorithm, and the resulting estimator is consistent. This paper also proposes an average-BIC criterion for determining the number of components of the mixture regression model. The proposed method is applied to the CCLE dataset, and the numerical results indicate that it can make a drastic improvement over existing methods, such as random forest, support vector regression, and regularized linear regression, in both drug sensitivity prediction and feature selection. The p-values for the comparisons in drug sensitivity prediction can reach the order of O(10^-8) or lower for drugs with heterogeneous populations.
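The abstract does not define the average-BIC criterion, so the sketch below shows only the standard BIC, BIC = p ln n - 2 ln L, applied to choosing the number of components K of a Gaussian mixture of linear regressions. The parameter count assumes a full model without feature selection (the paper's method selects features per subpopulation, which would reduce p), and the log-likelihoods in the usage line are hypothetical values rather than results from the paper.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian information criterion; smaller is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def mixture_regression_params(n_components, n_features):
    """Free parameters of a K-component Gaussian mixture of linear regressions
    with no feature selection: per component, n_features slopes, one intercept
    and one noise variance, plus K - 1 free mixing weights."""
    return n_components * (n_features + 2) + (n_components - 1)


def select_n_components(log_likelihoods, n_features, n_obs):
    """`log_likelihoods` maps K -> maximized log-likelihood of a fitted K-component
    model. Return the K with the smallest BIC together with all scores."""
    scores = {
        k: bic(ll, mixture_regression_params(k, n_features), n_obs)
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical log-likelihoods for K = 1, 2, 3 (not results from the paper).
best_k, _ = select_n_components({1: -500.0, 2: -400.0, 3: -395.0}, n_features=3, n_obs=100)
print(best_k)  # 2
```

Here the third component raises the log-likelihood by only 5, which does not pay for its extra 6 parameters at n = 100, so BIC prefers K = 2.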
Affiliation(s)
- Qianyun Li
- Department of Biostatistics, University of Florida, Gainesville, FL 32611, United States of America
- Runmin Shi
- Department of Statistics, University of Florida, Gainesville, FL 32611, United States of America
- Faming Liang
- Department of Statistics, Purdue University, West Lafayette, IN 47906, United States of America
4
Abu-Jamous B, Kelly S. Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data. Genome Biol 2018; 19:172. [PMID: 30359297 PMCID: PMC6203272 DOI: 10.1186/s13059-018-1536-8] [Received: 03/06/2018] [Accepted: 09/11/2018]
Abstract
Identifying co-expressed gene clusters can provide evidence for genetic or physical interactions. Thus, co-expression clustering is a routine step in large-scale analyses of gene expression data. We show that commonly used clustering methods produce results that substantially disagree and that do not match the biological expectations of co-expressed gene clusters. We present clust, a method that solves these problems by extracting clusters matching the biological expectations of co-expressed genes and outperforms widely used methods. Additionally, clust can simultaneously cluster multiple datasets, enabling users to leverage the large quantity of public expression data for novel comparative analysis. Clust is available at https://github.com/BaselAbujamous/clust.
Affiliation(s)
- Basel Abu-Jamous
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Steven Kelly
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
5
Reybrouck M, Vuust P, Brattico E. Brain Connectivity Networks and the Aesthetic Experience of Music. Brain Sci 2018; 8:brainsci8060107. [PMID: 29895737 PMCID: PMC6025331 DOI: 10.3390/brainsci8060107] [Received: 04/22/2018] [Revised: 05/28/2018] [Accepted: 06/05/2018]
Abstract
Listening to music is above all a human experience, which becomes an aesthetic experience when individuals immerse themselves in the music, dedicating attention to perceptual-cognitive-affective interpretation and evaluation. The study of these processes, in which the individual perceives, understands, enjoys and evaluates a set of auditory stimuli, has mainly focused on the effect of music on specific brain structures, as measured with neurophysiology and neuroimaging techniques. The very recent application of network science algorithms to brain research provides insight into the functional connectivity between brain regions. These studies in network neuroscience have identified distinct circuits that function during goal-directed tasks and resting states. We review recent neuroimaging findings indicating that music listening is traceable in terms of network connectivity and activations of target regions in the brain, in particular between the auditory cortex, the reward brain system and brain regions active during mind wandering.
Affiliation(s)
- Mark Reybrouck
- Faculty of Arts, University of Leuven, 3000 Leuven, Belgium
- Department of Art History, Musicology and Theater Studies, IPEM Institute for Psychoacoustics and Electronic Music, 9000 Ghent, Belgium
- Peter Vuust
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University & The Royal Academy of Music Aarhus/Aalborg, 8000 Aarhus, Denmark
- Elvira Brattico
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University & The Royal Academy of Music Aarhus/Aalborg, 8000 Aarhus, Denmark
6
7
Liu C, Brattico E, Abu-Jamous B, Pereira CS, Jacobsen T, Nandi AK. Effect of Explicit Evaluation on Neural Connectivity Related to Listening to Unfamiliar Music. Front Hum Neurosci 2017; 11:611. [PMID: 29311874 PMCID: PMC5742221 DOI: 10.3389/fnhum.2017.00611] [Received: 08/21/2017] [Accepted: 11/30/2017]
Abstract
People can experience different emotions when listening to music. A growing number of studies have investigated the brain structures and neural connectivity associated with perceived emotions. However, very little is known about the effect of an explicit act of judgment on the neural processing of emotionally valenced music. In this study, we adopted the novel consensus clustering paradigm called binarisation of consensus partition matrices (Bi-CoPaM) to study whether and how the conscious aesthetic evaluation of music modulates brain connectivity networks related to emotion and reward processing. Participants listened to music under three conditions: one involving a non-evaluative judgment, one involving an explicit evaluative aesthetic judgment, and one involving no judgment at all (passive listening only). During non-evaluative attentive listening, we obtained auditory-limbic connectivity, whereas when participants were asked to decide explicitly whether they liked or disliked the music excerpt, only two clusters of intercommunicating brain regions were found: one including areas related to auditory processing and action observation, and the other comprising higher-order structures involved in visual processing. Results indicate that explicit evaluative judgment has an impact on neural auditory-limbic connectivity during affective processing of music.
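The core idea behind Bi-CoPaM, averaging the partition matrices of several clusterings into a fuzzy consensus matrix and then binarising it with a tunable parameter, can be sketched as below. This is a simplified illustration, not the published algorithm: it assumes the cluster labels are already aligned across clusterings (a real pipeline would need a relabelling step first) and uses a single value-thresholding rule with a hypothetical parameter `delta`.

```python
def partition_matrix(labels, n_clusters):
    """One-hot K x N matrix: entry [k][i] is 1.0 when item i is in cluster k."""
    return [[1.0 if label == k else 0.0 for label in labels] for k in range(n_clusters)]


def consensus_partition_matrix(labelings, n_clusters):
    """Average the partition matrices of several clusterings.
    Assumes cluster k means the same group in every labeling (labels already aligned)."""
    mats = [partition_matrix(labels, n_clusters) for labels in labelings]
    n_items = len(labelings[0])
    return [
        [sum(m[k][i] for m in mats) / len(mats) for i in range(n_items)]
        for k in range(n_clusters)
    ]


def binarise(copam, delta):
    """Value thresholding: item i joins cluster k when its membership is >= delta.
    High delta keeps only items every clustering agrees on (some may be left
    unassigned); low delta lets an item join several clusters."""
    return [[1 if value >= delta else 0 for value in row] for row in copam]


# Hypothetical, already-aligned labels from three clusterings of four genes.
copam = consensus_partition_matrix([[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]], 2)
print(binarise(copam, 1.0))  # [[1, 0, 0, 0], [0, 0, 1, 0]]
print(binarise(copam, 0.3))  # [[1, 1, 0, 1], [0, 1, 1, 1]]
```

The single `delta` is what makes the result tunable: at 1.0 the two ambiguous genes are left out entirely, while at 0.3 each of them is placed in both clusters.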
Affiliation(s)
- Chao Liu
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, United Kingdom
- Elvira Brattico
- Department of Clinical Medicine, Center for Music in the Brain, Aarhus University & Royal Academy of Music Aarhus/Aalborg, Aarhus, Denmark
- AMI Centre, School of Science, Aalto University, Espoo, Finland
- Basel Abu-Jamous
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, United Kingdom
- Thomas Jacobsen
- Experimental Psychology Unit, Helmut Schmidt University, University of Federal Armed Forces, Hamburg, Germany
- Asoke K Nandi
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, United Kingdom
- The Key Laboratory of Embedded Systems and Service Computing, College of Electronic and Information Engineering, Tongji University, Shanghai, China
8
Paul AK, Shill PC. Incorporating gene ontology into fuzzy relational clustering of microarray gene expression data. Biosystems 2017; 163:1-10. [PMID: 29113811 DOI: 10.1016/j.biosystems.2017.09.017] [Received: 04/13/2017] [Revised: 09/26/2017] [Accepted: 09/27/2017]
Abstract
The products of gene expression work together in the cells of every living organism to achieve different biological processes. Many proteins play different roles in the functioning of the cell, depending on the organism's environment. In this paper, we propose a semi-supervised clustering algorithm based on gene ontology (GO) annotations, called GO fuzzy relational clustering (GO-FRC), in which a gene may be assigned to multiple clusters, which best reflects the biological behavior of genes. In the clustering process, GO-FRC uses biological knowledge available in the form of a gene ontology as prior knowledge alongside the gene expression data. This prior knowledge helps to improve the coherence of the groups with respect to the knowledge field. The proposed GO-FRC was tested on two yeast (Saccharomyces cerevisiae) expression profile datasets (the Eisen and Dream5 yeast datasets) and compared with other state-of-the-art clustering algorithms. Experimental results imply that GO-FRC is able to produce more biologically relevant clusters using only a small amount of GO annotations.
Affiliation(s)
- Animesh Kumar Paul
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
- Pintu Chandra Shill
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
9
10
Abu-Jamous B, Buffa FM, Harris AL, Nandi AK. In vitro downregulated hypoxia transcriptome is associated with poor prognosis in breast cancer. Mol Cancer 2017; 16:105. [PMID: 28619028 PMCID: PMC5472949 DOI: 10.1186/s12943-017-0673-0] [Received: 11/18/2016] [Accepted: 06/02/2017]
Abstract
BACKGROUND Hypoxia is a characteristic of breast tumours indicating poor prognosis. Based on the assumption that genes up-regulated under hypoxia in cell lines are expected to be predictors of poor prognosis in clinical data, many signatures of poor prognosis were identified. However, it was observed that cell-line data do not always concur with clinical data, and therefore conclusions from cell-line analysis should be considered with caution. As many transcriptomic cell-line datasets from hypoxia-related contexts are available, integrative approaches that investigate these datasets collectively, while not ignoring clinical data, are required. RESULTS We analyse sixteen heterogeneous breast cancer cell-line transcriptomic datasets in hypoxia-related conditions collectively by employing the unique capabilities of the method UNCLES, which integrates clustering results from multiple datasets and can address questions that cannot be answered by existing methods. This is demonstrated by comparison with the state-of-the-art iCluster method. From this collection of genome-wide datasets, which include 15,588 genes, UNCLES identified a relatively high number of genes (>1000 overall) that are consistently co-regulated over all of the datasets, some of which are still poorly understood and represent new potential HIF targets, such as RSBN1 and KIAA0195. Two main, anti-correlated clusters were identified: the first is enriched with MYC targets participating in growth and proliferation, while the other is enriched with HIF targets directly participating in the hypoxia response. Surprisingly, in six clinical datasets, some sub-clusters of growth genes are consistently positively correlated with hypoxia response genes, unlike the observation in cell lines.
Moreover, the ability of a combined signature of one sub-cluster of growth genes and one sub-cluster of hypoxia-induced genes to predict poor prognosis appears to be comparable to, and perhaps greater than, that of known hypoxia signatures. CONCLUSIONS We present a clustering approach suitable for integrating data from diverse experimental set-ups. Its application to breast cancer cell-line datasets reveals new hypoxia-regulated gene signatures that behave differently when in vitro (cell-line) data are compared with in vivo (clinical) data, and whose prognostic value is comparable to or exceeds that of state-of-the-art hypoxia signatures.
Affiliation(s)
- Basel Abu-Jamous
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, Middlesex, UB8 3PH UK
- Department of Plant Sciences, University of Oxford, Oxford, OX1 3RB UK
- Francesca M. Buffa
- Cancer Research UK, Department of Oncology, Weatherall Institute of Molecular Medicine, Oxford, OX3 9DS UK
- Adrian L. Harris
- Cancer Research UK, Department of Oncology, Weatherall Institute of Molecular Medicine, Oxford, OX3 9DS UK
- Asoke K. Nandi
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, Middlesex, UB8 3PH UK
- The Key Laboratory of Embedded Systems and Service Computing, College of Electronic and Information Engineering, Tongji University, Shanghai, People's Republic of China
11
Liu C, Abu-Jamous B, Brattico E, Nandi AK. Towards Tunable Consensus Clustering for Studying Functional Brain Connectivity During Affective Processing. Int J Neural Syst 2016; 27:1650042. [DOI: 10.1142/s0129065716500428]
Abstract
In the past decades, neuroimaging of humans has gained a prominent position within neuroscience, and data-driven approaches and functional connectivity analyses of functional magnetic resonance imaging (fMRI) data are increasingly favored to depict the complex architecture of human brains. However, the reliability of these findings is jeopardized by the large number of available analysis methods and sometimes by small sample sizes, which leads to discord among researchers. We propose a tunable consensus clustering paradigm that aims to overcome the problem of selecting a clustering method, as well as reliability issues in neuroimaging, by first applying several analysis methods (three in this study) to multiple datasets and then integrating the clustering results. To validate the method, we applied it to a complex fMRI experiment involving affective processing of hundreds of music clips. We found that brain structures related to visual, reward, and auditory processing have intrinsic spatial patterns of coherent neuroactivity during affective processing. Comparisons between the results obtained from our method and those from each individual clustering algorithm demonstrate that our paradigm has notable advantages over traditional single clustering algorithms in revealing robust connectivity patterns, even with complex neuroimaging data involving a variety of stimuli and affective evaluations of them. The consensus clustering method is implemented in the R package “UNCLES”, available at http://cran.r-project.org/web/packages/UNCLES/index.html.
Affiliation(s)
- Chao Liu
- Department of Electronic and Computer Engineering, Brunel University London, London, UK
- Basel Abu-Jamous
- Department of Electronic and Computer Engineering, Brunel University London, London, UK
- Elvira Brattico
- Center for Music in the Brain (MIB), Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- The Royal Academy of Music Aarhus/Aalborg, Aarhus, Denmark
- Asoke K Nandi
- Department of Electronic and Computer Engineering, Brunel University London, London, UK
- The Key Laboratory of Embedded Systems and Service Computing, College of Electronic and Information Engineering, Tongji University, Shanghai, P. R. China
12
Merryweather-Clarke AT, Tipping AJ, Lamikanra AA, Fa R, Abu-Jamous B, Tsang HP, Carpenter L, Robson KJH, Nandi AK, Roberts DJ. Distinct gene expression program dynamics during erythropoiesis from human induced pluripotent stem cells compared with adult and cord blood progenitors. BMC Genomics 2016; 17:817. [PMID: 27769165 PMCID: PMC5073849 DOI: 10.1186/s12864-016-3134-z] [Received: 11/10/2015] [Accepted: 09/27/2016]
Abstract
BACKGROUND Human induced pluripotent stem cells (hiPSCs) are a potentially invaluable resource for regenerative medicine, including the in vitro manufacture of blood products. HiPSC-derived red blood cells are an attractive therapeutic option in hematology, yet exhibit unexplained proliferation and enucleation defects that presently preclude such applications. We hypothesised that substantial differential regulation of gene expression during erythroid development accounts for these important differences between hiPSC-derived cells and those from adult or cord-blood progenitors. We thus cultured erythroblasts from each source for transcriptomic analysis to investigate differential gene expression underlying these functional defects. RESULTS Our high-resolution transcriptional view of definitive erythropoiesis captures the regulation of genes relevant to cell-cycle control and confers statistical power to deploy novel bioinformatics methods. Whilst the dynamics of erythroid program elaboration from adult and cord blood progenitors were very similar, the emerging erythroid transcriptome in hiPSCs revealed radically different program elaboration compared with adult and cord blood cells. We explored the function of differentially expressed genes in hiPSC-specific clusters defined by our novel tunable clustering algorithms (SMART and Bi-CoPaM). HiPSCs show reduced expression of c-KIT and key erythroid transcription factors SOX6, MYB and BCL11A, strong HBZ induction, and aberrant expression of genes involved in protein degradation, lysosomal clearance and cell-cycle regulation. CONCLUSIONS Together, these data suggest that hiPSC-derived cells may be specified to a primitive erythroid fate and imply that definitive specification may more accurately reflect adult development.
We have therefore identified, for the first time, distinct gene expression dynamics during erythroblast differentiation from hiPSCs which may cause reduced proliferation and enucleation of hiPSC-derived erythroid cells. The data suggest several mechanistic defects which may partially explain the observed aberrant erythroid differentiation from hiPSCs.
Affiliation(s)
- Alison T Merryweather-Clarke
- Radcliffe Department of Medicine, University of Oxford, Headington, Oxford, OX3 9DU, UK
- National Health Service Blood and Transplant, John Radcliffe Hospital, Headington, Oxford, OX3 9BQ, UK
- Alex J Tipping
- Radcliffe Department of Medicine, University of Oxford, Headington, Oxford, OX3 9DU, UK
- National Health Service Blood and Transplant, John Radcliffe Hospital, Headington, Oxford, OX3 9BQ, UK
- Abigail A Lamikanra
- Radcliffe Department of Medicine, University of Oxford, Headington, Oxford, OX3 9DU, UK
- National Health Service Blood and Transplant, John Radcliffe Hospital, Headington, Oxford, OX3 9BQ, UK
- Rui Fa
- Department of Electronic and Computer Engineering, Brunel University London, Middlesex, UB8 3PH, UK
- Basel Abu-Jamous
- Department of Electronic and Computer Engineering, Brunel University London, Middlesex, UB8 3PH, UK
- Hoi Pat Tsang
- Radcliffe Department of Medicine, University of Oxford, Headington, Oxford, OX3 9DU, UK
- National Health Service Blood and Transplant, John Radcliffe Hospital, Headington, Oxford, OX3 9BQ, UK
- Lee Carpenter
- Radcliffe Department of Medicine, University of Oxford, Headington, Oxford, OX3 9DU, UK
- National Health Service Blood and Transplant, John Radcliffe Hospital, Headington, Oxford, OX3 9BQ, UK
- Kathryn J H Robson
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, Headington, OX3 9DU, Oxford, UK
- Asoke K Nandi
- Department of Electronic and Computer Engineering, Brunel University London, Middlesex, UB8 3PH, UK
- Distinguished Visiting Professor, The Key Laboratory of Embedded Systems and Service Computing, College of Electronic and Information Engineering, Tongji University, Shanghai, People's Republic of China
- David J Roberts
- Radcliffe Department of Medicine, University of Oxford, Headington, Oxford, OX3 9DU, UK
- National Health Service Blood and Transplant, John Radcliffe Hospital, Headington, Oxford, OX3 9BQ, UK
13
Fiori A, Mignone A, Rospo G. DeCoClu: Density consensus clustering approach for public transport data. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2015.08.054]
14
Pirayre A, Couprie C, Bidard F, Duval L, Pesquet JC. BRANE Cut: biologically-related a priori network enhancement with graph cuts for gene regulatory network inference. BMC Bioinformatics 2015; 16:368. [PMID: 26537179 PMCID: PMC4634801 DOI: 10.1186/s12859-015-0754-2] [Received: 05/07/2015] [Accepted: 09/29/2015]
Abstract
Background Inferring gene networks from high-throughput data constitutes an important step in the discovery of relevant regulatory relationships in organism cells. Despite the large number of available Gene Regulatory Network inference methods, the problem remains challenging: the underdetermination in the space of possible solutions requires additional constraints that incorporate a priori information on gene interactions. Methods Weighting all possible pairwise gene relationships by a probability of edge presence, we formulate regulatory network inference as a discrete variational problem on graphs. We enforce biologically plausible coupling between groups and types of genes by minimizing an edge labeling functional coding for a priori structures. The optimization is carried out with graph cuts, an approach popular in image processing and computer vision. We compare the inferred regulatory networks to results achieved by the mutual-information-based Context Likelihood of Relatedness (CLR) method and by the state-of-the-art GENIE3, winner of the DREAM4 multifactorial challenge. Results Our BRANE Cut approach infers the five DREAM4 in silico networks more accurately (with improvements from 6% to 11%). On a real Escherichia coli compendium, an improvement of 11.8% compared with CLR and 3% compared with GENIE3 is obtained in terms of area under the precision-recall curve. Up to 48 additional verified interactions are obtained over GENIE3 for a given precision. On this dataset involving 4345 genes, our method achieves a performance similar to that of GENIE3, while being more than seven times faster. The BRANE Cut code is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-cut.html. Conclusions BRANE Cut is a weighted graph thresholding method. Using biologically sound penalties and data-driven parameters, it improves three state-of-the-art GRN inference methods.
Owing to its computational efficiency, it is applicable as a generic post-processing step for network inference.
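The abstract describes BRANE Cut as a weighted graph thresholding method. For contrast, the simplest baseline, a single global threshold on edge weights, is sketched below together with a toy a priori rule that keeps only edges touching a regulator. This is not the BRANE Cut functional or its graph-cut optimisation; the gene names, scores, and the `regulators` rule are hypothetical.

```python
def threshold_edges(weights, threshold, regulators=None):
    """Keep undirected edges whose weight exceeds `threshold`.

    `weights` maps (gene_a, gene_b) -> edge score. If `regulators` is given,
    also drop edges in which neither endpoint is a regulator (a toy a priori rule).
    """
    kept = []
    for (a, b), weight in weights.items():
        if weight <= threshold:
            continue
        if regulators is not None and a not in regulators and b not in regulators:
            continue
        kept.append((a, b))
    return sorted(kept)


# Hypothetical edge scores, e.g. as produced by an upstream inference method.
scores = {("tf1", "g1"): 0.9, ("g1", "g2"): 0.8, ("tf1", "g2"): 0.4, ("tf2", "g3"): 0.7}
print(threshold_edges(scores, 0.5))  # [('g1', 'g2'), ('tf1', 'g1'), ('tf2', 'g3')]
print(threshold_edges(scores, 0.5, regulators={"tf1", "tf2"}))  # [('tf1', 'g1'), ('tf2', 'g3')]
```

A global threshold treats every edge independently; the point of a graph-cut formulation such as the one described above is to make edge decisions depend jointly on gene groups and types rather than on each weight alone.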
Affiliation(s)
- Aurélie Pirayre
- IFP Energies Nouvelles, 1-4 avenue de Bois-Préau, Rueil-Malmaison, 92852, France.
- Université Paris-Est, Laboratoire d'Informatique Gaspard-Monge, 5 boulevard Descartes - Champs-sur-Marne, Marne-la-Vallée, 77454, France.
- Camille Couprie
- IFP Energies Nouvelles, 1-4 avenue de Bois-Préau, Rueil-Malmaison, 92852, France.
- Facebook AI Research, Paris, France.
- Frédérique Bidard
- IFP Energies Nouvelles, 1-4 avenue de Bois-Préau, Rueil-Malmaison, 92852, France.
- Laurent Duval
- IFP Energies Nouvelles, 1-4 avenue de Bois-Préau, Rueil-Malmaison, 92852, France.
- Jean-Christophe Pesquet
- Université Paris-Est, Laboratoire d'Informatique Gaspard-Monge, 5 boulevard Descartes - Champs-sur-Marne, Marne-la-Vallée, 77454, France.
15
Abu-Jamous B, Fa R, Roberts DJ, Nandi AK. UNCLES: method for the identification of genes differentially consistently co-expressed in a specific subset of datasets. BMC Bioinformatics 2015; 16:184. [PMID: 26040489 PMCID: PMC4453228 DOI: 10.1186/s12859-015-0614-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/16/2015] [Accepted: 05/16/2015] [Indexed: 12/13/2022]
Abstract
Background Collective analysis of the growing number of gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify, in a tuneable manner, the subsets of genes that are consistently co-expressed in all of the provided datasets. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is common practice to test methods on synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations that may not always be sufficiently representative of real datasets. Results Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method can identify the subsets of genes consistently co-expressed in one subset of datasets while being poorly co-expressed in another subset of datasets, as well as the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for synthesising gene expression datasets from real data profiles that combines the ground-truth knowledge of synthetic data with the realistic expression values of real data, and therefore overcomes the problem of faithfulness in synthetic expression data modelling. By applying UNCLES to those datasets, we validate it while comparing it with other conventional clustering methods and, of particular relevance, biclustering methods. We further validate UNCLES by applying it to a set of 14 real genome-wide yeast datasets, where it produces focused clusters that conform well to known biological facts. Furthermore, in silico hypotheses are drawn regarding the function of a few previously uncharacterised genes in those focused clusters. Conclusions The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of multiple complex biological datasets from genomic and other sources. Moreover, the derived in silico biological hypotheses represent subjects for future functional studies. Electronic supplementary material: the online version of this article (doi:10.1186/s12859-015-0614-0) contains supplementary material, which is available to authorized users.
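The abstract names the M-N scatter plot technique without describing it. The sketch below shows one plausible reading, and every specific in it is an assumption rather than the published definition: each candidate cluster becomes a point whose M coordinate is its within-cluster mean squared error and whose N coordinate is the log10 of its size; both axes are min-max normalised, candidates are ranked by their distance to the ideal corner (M = 0, N = 1, i.e. tight and large), and a candidate is dropped if it shares genes with a cluster already selected. The helper names and the single-dataset input are hypothetical simplifications.

```python
import math


def mse(profiles):
    """Mean squared deviation of a cluster's profiles from their centroid."""
    dims = len(profiles[0])
    centroid = [sum(p[d] for p in profiles) / len(profiles) for d in range(dims)]
    total = sum((p[d] - centroid[d]) ** 2 for p in profiles for d in range(dims))
    return total / (len(profiles) * dims)


def min_max(values):
    """Rescale values to [0, 1]; constant inputs map to 0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def mn_select(clusters, expression):
    """Greedy M-N style selection (hypothetical reading of the technique).

    clusters: candidate gene sets, possibly overlapping
    expression: dict gene -> list of expression values (one dataset, for brevity)
    Returns the selected clusters, best first.
    """
    m = min_max([mse([expression[g] for g in c]) for c in clusters])  # M axis
    n = min_max([math.log10(len(c)) for c in clusters])               # N axis
    # Distance to the ideal corner: no error (M = 0) and largest size (N = 1).
    dist = [math.hypot(mi, 1.0 - ni) for mi, ni in zip(m, n)]
    selected = []
    for i in sorted(range(len(clusters)), key=lambda i: dist[i]):
        if all(clusters[i].isdisjoint(s) for s in selected):
            selected.append(clusters[i])
    return selected
```

This sketch keeps every non-overlapping candidate; a real procedure would also need a stopping rule (for example, a maximum distance), which the abstract does not specify.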
Affiliation(s)
- Basel Abu-Jamous
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, Middlesex, UB8 3PH, UK.
- Rui Fa
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, Middlesex, UB8 3PH, UK.
- David J Roberts
- National Health Service Blood and Transplant, Oxford, OX3 9BQ, UK.
- Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DU, UK.
- Asoke K Nandi
- Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, Middlesex, UB8 3PH, UK.
- Department of Mathematical Information Technology, University of Jyväskylä, Jyväskylä, Finland.
16
Abu-Jamous B, Fa R, Roberts DJ, Nandi AK. Comprehensive analysis of forty yeast microarray datasets reveals a novel subset of genes (APha-RiB) consistently negatively associated with ribosome biogenesis. BMC Bioinformatics 2014; 15:322. [PMID: 25267386 PMCID: PMC4262117 DOI: 10.1186/1471-2105-15-322] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 06/10/2014] [Accepted: 09/22/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND The scale and complexity of genomic data lend themselves to analysis with sophisticated mathematical techniques that yield information capable of generating new hypotheses and so guiding further experimental investigations. An ensemble clustering method can perform consensus clustering over the same set of genes from different microarray datasets by combining results from different clustering methods into a single consensus result. RESULTS In this paper, we performed a comprehensive analysis of forty yeast microarray datasets. The recently described Bi-CoPaM method can analyse expressions of the same set of genes from various microarray datasets using different clustering methods, and then combine these results into a single consensus result whose cluster tightness is tunable from tight, specific clusters to wide, overlapping clusters. We adopted it in a novel way over genome-wide data from forty yeast microarray datasets to discover two clusters of genes that are consistently co-expressed across all of these datasets, which come from different biological contexts and various experimental conditions. Most strikingly, the average expression profiles of the two clusters are consistently negatively correlated in all forty datasets, while neither profile leads or lags the other. CONCLUSIONS The first cluster is enriched with ribosome biogenesis genes. The biological processes of most genes in the second cluster are either unknown or apparently unrelated, although these genes show high connectivity in protein-protein and genetic interaction networks. Therefore, it is possible that this mostly uncharacterised cluster and the ribosome biogenesis cluster are transcriptionally regulated in opposite directions by some common machinery. Moreover, we anticipate that the genes in this previously unknown cluster participate in generic, rather than specific, stress response processes. These novel findings illuminate coordinated gene expression in yeast and suggest several hypotheses for future experimental functional work. Additionally, we have demonstrated the usefulness of the Bi-CoPaM-based approach, which may help in analysing other groups of (microarray) datasets from other species and systems to explore global genetic co-expression.
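The tunable tightness described above comes from binarising a fuzzy consensus partition matrix. The sketch below follows a simplified version of the Bi-CoPaM idea as we understand it, not the published implementation: each clustering of the same genes is turned into a binary k × G partition matrix, its clusters are relabelled to best match the first clustering, the aligned matrices are averaged into a consensus partition matrix (CoPaM), and a gene joins cluster c when its consensus membership is at least a threshold δ (a value-threshold binarisation). The brute-force permutation alignment and all function names are illustrative choices.

```python
from itertools import permutations


def partition_matrix(labels, k):
    """Binary k x G matrix: entry [c][g] is 1.0 when gene g is in cluster c."""
    return [[1.0 if lab == c else 0.0 for lab in labels] for c in range(k)]


def align(matrix, reference):
    """Reorder the rows (clusters) of `matrix` to best overlap `reference`.

    Brute force over all k! permutations, so only suitable for small k.
    """
    k = len(reference)

    def overlap(perm):
        return sum(r * m for c in range(k) for r, m in zip(reference[c], matrix[perm[c]]))

    best = max(permutations(range(k)), key=overlap)
    return [matrix[best[c]] for c in range(k)]


def consensus(labelings, k):
    """Fuzzy consensus partition matrix: mean of the aligned binary matrices."""
    reference = partition_matrix(labelings[0], k)
    aligned = [reference] + [align(partition_matrix(lab, k), reference)
                             for lab in labelings[1:]]
    n_genes = len(labelings[0])
    return [[sum(mat[c][g] for mat in aligned) / len(aligned) for g in range(n_genes)]
            for c in range(k)]


def binarise(copam, delta):
    """Value-threshold binarisation: gene g joins cluster c if CoPaM[c][g] >= delta."""
    return [[g for g, v in enumerate(row) if v >= delta] for row in copam]
```

With δ = 1 a gene is kept only where every clustering agrees, giving tight clusters and leaving some genes unassigned; lowering δ widens the clusters, and at or below 0.5 a gene may belong to several clusters at once. For the labelings `[[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]]`, gene 2 is dropped at δ = 1 and appears in both clusters at δ = 0.3.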
Affiliation(s)
- Basel Abu-Jamous
- Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK
- Rui Fa
- Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK
- David J Roberts
- National Health Service Blood and Transplant, Oxford, UK
- Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
- Asoke K Nandi
- Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK
- Department of Mathematical Information Technology, University of Jyväskylä, Jyväskylä, Finland
17
Fa R, Nandi AK. Noise Resistant Generalized Parametric Validity Index of Clustering for Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:741-752. [PMID: 26356344 DOI: 10.1109/tcbb.2014.2312006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Validity indices have been investigated for decades. However, because the noise resistance of these indices has not been studied in the literature, there is no guideline for determining the best clustering of noisy data sets, especially microarray data sets. In this paper, we propose a generalized parametric validity (GPV) index that employs two tunable parameters, α and β, to control the proportions of objects considered when calculating the dissimilarities. The greatest advantage of the proposed GPV index is its noise resistance, which results from the flexibility in tuning these parameters. Several rules are given to guide the selection of parameter values. To illustrate the noise-resistance performance of the proposed index, we use the GPV index to assess five clustering algorithms on data from two gene expression simulation models with different noise levels, and compare its ability to determine the number of clusters with that of eight existing indices. We also test the GPV index on three groups of real gene expression data sets. The experimental results suggest that the proposed GPV index has superior noise resistance and provides fairly accurate judgements.
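The abstract states only that α and β set the proportions of objects used when computing dissimilarities. One hypothetical way to realise that idea, sketched below on top of the silhouette index, is to average an object's distances over only the nearest α fraction of its own cluster (within-cluster term) and the nearest β fraction of each other cluster (between-cluster term, keeping the smallest such average), so that distant noisy objects carry less weight. This is an illustrative assumption, not the published GPV formula; with α = β = 1 it reduces to the ordinary average silhouette width.

```python
import math


def trimmed_mean(values, proportion):
    """Mean of the smallest ceil(proportion * len(values)) values (at least one)."""
    values = sorted(values)
    m = max(1, math.ceil(proportion * len(values)))
    return sum(values[:m]) / m


def trimmed_silhouette(points, labels, alpha=1.0, beta=1.0):
    """Silhouette-style index whose dissimilarities use only the nearest
    alpha (within-cluster) and beta (between-cluster) fractions of objects.
    Requires at least two clusters."""
    cluster_ids = sorted(set(labels))
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        own = [math.dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
               if lj == li and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: score 0 by convention
            continue
        a = trimmed_mean(own, alpha)
        b = min(trimmed_mean([math.dist(p, q) for q, lj in zip(points, labels) if lj == c], beta)
                for c in cluster_ids if c != li)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

For the one-dimensional points `[(0.0,), (1.0,), (4.0,), (5.0,)]` with labels `[0, 0, 1, 1]`, the default call gives the standard silhouette 47/63, while `alpha = beta = 0.5` gives 17/24 because each between-cluster term now uses only the nearest object of the other cluster.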
18
Fa R, Roberts DJ, Nandi AK. SMART: unique splitting-while-merging framework for gene clustering. PLoS One 2014; 9:e94141. [PMID: 24714159 PMCID: PMC3979766 DOI: 10.1371/journal.pone.0094141] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/06/2013] [Accepted: 03/14/2014] [Indexed: 11/18/2022]
Abstract
Successful clustering algorithms are highly dependent on parameter settings. Clustering performance degrades significantly unless parameters are set properly, yet it is difficult to set these parameters a priori. To address this issue, in this paper we propose a unique splitting-while-merging clustering framework, named "splitting merging awareness tactics" (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset into a large number of clusters and then merge some similar clusters, our framework can split and merge clusters automatically during the process and produces the most reliable clustering results by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented as two algorithms based on two distinct clustering paradigms: competitive learning and finite mixture models. Furthermore, many other algorithms for different clustering paradigms can be derived within the proposed SMART framework. The minimum message length (MML) criterion is integrated into the framework to select among candidate clusterings. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Across many performance metrics, all numerical results show that SMART is superior to the existing self-splitting and traditional algorithms with which it is compared. The three main properties of the proposed SMART framework are: (1) it needs no dataset-dependent parameters or a priori knowledge about the datasets, (2) it is extendible to many different applications, and (3) it offers superior performance compared with counterpart algorithms.
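SMART's published algorithms use competitive learning or finite mixture models with an MML selection criterion. The toy loop below only illustrates the splitting-while-merging control flow described in the abstract, under loudly simplified assumptions: one-dimensional data, hard-assignment Gaussian clusters, minimum-SSE contiguous two-way splits, merges of adjacent clusters, and the Bayesian information criterion (BIC) swapped in for MML because it is shorter to state. Splits and merges are interleaved in every pass, and each accepted move must lower the score, so the loop terminates. All names are hypothetical.

```python
import math


def bic(clusters, n):
    """BIC of a hard-assignment 1-D Gaussian mixture; lower is better."""
    loglik = 0.0
    for c in clusters:
        m = len(c)
        mu = sum(c) / m
        var = max(sum((x - mu) ** 2 for x in c) / m, 1e-6)  # floor avoids log(0)
        # Maximised log-likelihood of the points in c under N(mu, var),
        # weighted by the mixing proportion m / n.
        loglik += m * math.log(m / n) - 0.5 * m * math.log(2 * math.pi * var) - 0.5 * m
    n_params = 3 * len(clusters) - 1  # means, variances, free mixing weights
    return -2.0 * loglik + n_params * math.log(n)


def sse(part):
    """Sum of squared deviations from the mean."""
    mu = sum(part) / len(part)
    return sum((x - mu) ** 2 for x in part)


def best_split(c, min_size=3):
    """Minimum-SSE contiguous 2-way split of a sorted cluster, or None."""
    best = None
    for i in range(min_size, len(c) - min_size + 1):
        score = sse(c[:i]) + sse(c[i:])
        if best is None or score < best[0]:
            best = (score, i)
    return None if best is None else (c[:best[1]], c[best[1]:])


def split_while_merge(data, tol=1e-9):
    """Toy splitting-while-merging loop with BIC standing in for MML."""
    n = len(data)
    clusters = [sorted(data)]  # clusters stay contiguous and ordered
    changed = True
    while changed:
        changed = False
        # Splitting step: accept the first split that lowers the score.
        for k, c in enumerate(clusters):
            halves = best_split(c)
            if halves is None:
                continue
            candidate = clusters[:k] + list(halves) + clusters[k + 1:]
            if bic(candidate, n) < bic(clusters, n) - tol:
                clusters, changed = candidate, True
                break
        # Merging step: accept the first adjacent merge that lowers the score.
        for k in range(len(clusters) - 1):
            candidate = clusters[:k] + [clusters[k] + clusters[k + 1]] + clusters[k + 2:]
            if bic(candidate, n) < bic(clusters, n) - tol:
                clusters, changed = candidate, True
                break
    return clusters
```

On three well-separated groups of ten evenly spaced values (around 0, 5 and 10), the loop starts from a single cluster and ends with the three groups, without being told the number of clusters in advance.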
Affiliation(s)
- Rui Fa
- Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, United Kingdom
- David J. Roberts
- National Health Service Blood and Transplant, Oxford, United Kingdom
- The University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
- Asoke K. Nandi
- Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, United Kingdom
- Department of Mathematical Information Technology, University of Jyväskylä, Jyväskylä, Finland