1. Fratello M, Cattelani L, Federico A, Pavel A, Scala G, Serra A, Greco D. Unsupervised Algorithms for Microarray Sample Stratification. Methods Mol Biol 2022; 2401:121-146. [PMID: 34902126 DOI: 10.1007/978-1-0716-1839-4_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interactions. In order to be able to discover hidden patterns, the most disparate analytical techniques have been proposed. Here, we describe the basic methodologies to approach the analysis of microarray datasets that focus on the task of (sub)group discovery.
Affiliation(s)
- Michele Fratello
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Luca Cattelani
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Antonio Federico
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Alisa Pavel
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Giovanni Scala
  - Department of Biology, University of Naples Federico II, Naples, Italy
- Angela Serra
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Dario Greco
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
2. Maâtouk O, Ayadi W, Bouziri H, Duval B. Evolutionary biclustering algorithms: an experimental study on microarray data. Soft Comput 2019. [DOI: 10.1007/s00500-018-3394-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
3. Pontes B, Giráldez R, Aguilar-Ruiz JS. Biclustering on expression data: A review. J Biomed Inform 2015; 57:163-80. [PMID: 26160444 DOI: 10.1016/j.jbi.2015.06.028] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 01/22/2015] [Revised: 06/22/2015] [Accepted: 06/30/2015] [Indexed: 11/28/2022]
Abstract
Biclustering has become a popular technique for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. Most biclustering approaches use a measure or cost function that determines the quality of biclusters. In such cases, the development of both a suitable heuristic and a good measure for guiding the search is essential for discovering interesting biclusters in an expression matrix. Nevertheless, not all existing biclustering approaches base their search on evaluation measures for biclusters. There exists a diverse set of biclustering tools that follow different strategies and algorithmic concepts which guide the search towards meaningful results. In this paper we present an extensive survey of biclustering approaches, classifying them into two categories according to whether or not they use evaluation metrics within the search method: biclustering algorithms based on evaluation measures and non-metric-based biclustering algorithms. In both cases, they have been classified according to the type of meta-heuristics on which they are based.
Affiliation(s)
- Beatriz Pontes
  - Department of Languages and Computer Systems, University of Seville, Seville, Spain.
- Raúl Giráldez
  - School of Engineering, Pablo de Olavide University, Seville, Spain.
5. Abu-Jamous B, Fa R, Roberts DJ, Nandi AK. Comprehensive analysis of forty yeast microarray datasets reveals a novel subset of genes (APha-RiB) consistently negatively associated with ribosome biogenesis. BMC Bioinformatics 2014; 15:322. [PMID: 25267386 PMCID: PMC4262117 DOI: 10.1186/1471-2105-15-322] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 06/10/2014] [Accepted: 09/22/2014] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: The scale and complexity of genomic data lend themselves to analysis using sophisticated mathematical techniques to yield information that can generate new hypotheses and so guide further experimental investigations. An ensemble clustering method has the ability to perform consensus clustering over the same set of genes from different microarray datasets by combining results from different clustering methods into a single consensus result.
RESULTS: In this paper we have performed a comprehensive analysis of forty yeast microarray datasets. The recently described Bi-CoPaM method can analyse expressions of the same set of genes from various microarray datasets while using different clustering methods, and then combine these results into a single consensus result whose clusters' tightness is tunable from tight, specific clusters to wide, overlapping clusters. This has been adopted in a novel way over genome-wide data from forty yeast microarray datasets to discover two clusters of genes that are consistently co-expressed over all of these datasets from different biological contexts and various experimental conditions. Most strikingly, average expression profiles of those clusters are consistently negatively correlated in all of the forty datasets while neither profile leads nor lags the other.
CONCLUSIONS: The first cluster is enriched with ribosomal biogenesis genes. The biological processes of most of the genes in the second cluster are either unknown or apparently unrelated although they show high connectivity in protein-protein and genetic interaction networks. Therefore, it is possible that this mostly uncharacterised cluster and the ribosomal biogenesis cluster are transcriptionally oppositely regulated by some common machinery. Moreover, we anticipate that the genes included in this previously unknown cluster participate in generic, in contrast to specific, stress response processes. These novel findings illuminate coordinated gene expression in yeast and suggest several hypotheses for future experimental functional work. Additionally, we have demonstrated the usefulness of the Bi-CoPaM-based approach, which may be helpful for the analysis of other groups of (microarray) datasets from other species and systems for the exploration of global genetic co-expression.
Affiliation(s)
- Basel Abu-Jamous
  - Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH UK
- Rui Fa
  - Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH UK
- David J Roberts
  - National Health Service Blood and Transplant, Oxford, UK
  - Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
- Asoke K Nandi
  - Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH UK
  - Department of Mathematical Information Technology, University of Jyväskylä, Jyväskylä, Finland
6. Yun T, Yi GS. Biclustering for the comprehensive search of correlated gene expression patterns using clustered seed expansion. BMC Genomics 2013; 14:144. [PMID: 23496895 PMCID: PMC3618306 DOI: 10.1186/1471-2164-14-144] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 10/27/2012] [Accepted: 02/21/2013] [Indexed: 11/25/2022] [Open Access]
Abstract
BACKGROUND: In a functional analysis of gene expression data, biclustering methods can give crucial information by showing correlated gene expression patterns under a subset of conditions. However, conventional biclustering algorithms still have limitations in producing comprehensive and stable outputs.
RESULTS: We propose a novel biclustering approach called "BIclustering by Correlated and Large number of Individual Clustered seeds (BICLIC)" to find comprehensive sets of correlated expression patterns in biclusters using clustered seeds and their expansion with correlation of gene expression. BICLIC outperformed competing biclustering algorithms by completely recovering implanted biclusters in simulated datasets with various types of correlated patterns: shifting, scaling, and shifting-scaling. Furthermore, in a real yeast microarray dataset and a lung cancer microarray dataset, BICLIC found more comprehensive sets of biclusters that are significantly enriched for more diverse sets of biological terms than those of other competing biclustering algorithms.
CONCLUSIONS: BICLIC provides significant benefits in finding comprehensive sets of correlated patterns and their functional implications from a gene expression dataset.
Affiliation(s)
- Taegyun Yun
  - Department of Information and Communications Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea
- Gwan-Su Yi
  - Department of Information and Communications Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea
  - Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea