1. Epigenetic regulation of fetal brain development in pig. Gene 2022; 844:146823. [PMID: 35988784; DOI: 10.1016/j.gene.2022.146823] [Citations in RCA: 3; Impact Index Per Article: 1.5] [Received: 03/22/2022; Revised: 07/27/2022; Accepted: 08/15/2022; Indexed: 02/01/2023]
Abstract
How fetal brain development is regulated at the molecular level is not well understood. Because of the ethical challenges of research on the human fetus, large animals, particularly pigs, are increasingly used to study fetal brain development and disorders. The pig fetal brain grows rapidly during the last ∼50 days before birth, a period that begins around day 60 (d60) of gestation. However, what triggers the onset of this accelerated growth is unknown. The current study tests the hypothesis that epigenetic alterations around d60 are involved in the onset of rapid fetal brain growth in the pig. To test this hypothesis, genome-wide DNA methylation changes in the fetal brain were assessed by Enzymatic Methyl-seq (EM-seq) over two gestational periods (GP): d45 vs. d60 (GP1) and d60 vs. d90 (GP2). The cytosine-guanine (CpG) methylation data were analyzed integratively with RNA-seq data generated from the same brain samples in our earlier study. A neural-network-based modeling approach was used to learn changes in the methylation patterns of the differentially expressed genes and then to predict genome-wide methylation in the brain during rapid growth. This approach identified specific methylations that changed in a mutually informative manner during rapid growth of the fetal brain. These methylations were significantly overrepresented in specific genic and intergenic features, including CpG islands, introns, and untranslated regions. In addition, sex-biased methylation at known single-nucleotide polymorphism sites was identified in the fetal brain during rapid growth.
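For readers unfamiliar with how methylation changes between two time points are usually summarized, the sketch below computes a per-CpG methylation level (methylated reads divided by total reads) and flags sites whose level changes by at least a threshold. The site names, read counts, the 10-read coverage cutoff and the 0.2 threshold are hypothetical, and this simple difference test is not the neural-network method used in the study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CpGCounts:
    methylated: int    # reads calling the CpG cytosine methylated
    unmethylated: int  # reads calling it unmethylated (converted)


def methylation_level(c: CpGCounts, min_coverage: int = 10) -> float | None:
    """Fraction of methylated reads at one CpG, or None if coverage is too low."""
    total = c.methylated + c.unmethylated
    if total < min_coverage:
        return None
    return c.methylated / total


def changed_sites(early: dict[str, CpGCounts], late: dict[str, CpGCounts],
                  threshold: float = 0.2) -> dict[str, float]:
    """Map site -> (late level - early level) for sites whose |change| >= threshold."""
    result: dict[str, float] = {}
    for site in sorted(early.keys() & late.keys()):
        a, b = methylation_level(early[site]), methylation_level(late[site])
        if a is None or b is None:
            continue  # skip sites lacking coverage at either time point
        if abs(b - a) >= threshold:
            result[site] = b - a
    return result


# Hypothetical read counts for three CpG sites at two gestational days
d45 = {"chr1:1000": CpGCounts(18, 2), "chr1:2000": CpGCounts(5, 15), "chr1:3000": CpGCounts(1, 3)}
d60 = {"chr1:1000": CpGCounts(8, 12), "chr1:2000": CpGCounts(6, 14), "chr1:3000": CpGCounts(4, 0)}
print(changed_sites(d45, d60))
```

Only chr1:1000 is reported here (0.90 to 0.40); chr1:2000 changes by just 0.05 and chr1:3000 has too few reads to be scored.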
2. Maâtouk O, Ayadi W, Bouziri H, Duval B. Evolutionary Local Search Algorithm for the biclustering of gene expression data based on biological knowledge. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107177] [Citations in RCA: 2; Impact Index Per Article: 0.7] [Indexed: 10/22/2022]
3. Nepomuceno-Chamorro IA, Nepomuceno JA, Galván-Rojas JL, Vega-Márquez B, Rubio-Escudero C. Using prior knowledge in the inference of gene association networks. Appl Intell 2020. [DOI: 10.1007/s10489-020-01705-4] [Citations in RCA: 0; Impact Index Per Article: 0] [Indexed: 10/23/2022]
4. Experimental correlation analysis of bicluster coherence measures and gene ontology information. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105688] [Citations in RCA: 1; Impact Index Per Article: 0.2] [Indexed: 11/24/2022]
5. Maâtouk O, Ayadi W, Bouziri H, Duval B. Evolutionary biclustering algorithms: an experimental study on microarray data. Soft Comput 2019. [DOI: 10.1007/s00500-018-3394-4] [Citations in RCA: 4; Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
6. Nepomuceno JA, Troncoso A, Nepomuceno-Chamorro IA, Aguilar-Ruiz JS. Pairwise gene GO-based measures for biclustering of high-dimensional expression data. BioData Min 2018; 11:4. [PMID: 29610579; PMCID: PMC5872503; DOI: 10.1186/s13040-018-0165-9] [Citations in RCA: 4; Impact Index Per Article: 0.7] [Received: 10/16/2017; Accepted: 03/01/2018; Indexed: 11/15/2022] [Open access]
Abstract
Background: Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. The biological knowledge now available in public repositories can be used to drive these algorithms toward biclusters composed of functionally coherent groups of genes. In addition, a distance between genes can be defined from the information stored about them in the Gene Ontology (GO): pairwise GO semantic similarity measures assign each pair of genes a value that quantifies their functional similarity. This paper studies a scatter search-based algorithm that optimizes a merit function integrating GO information through a GO-measure term.
Results: The effect of two different pairwise GO measures on the performance of the algorithm is analyzed. First, three well-known yeast datasets with approximately one thousand genes are studied. Second, the algorithm explores a group of human datasets related to clinical cancer data; most of these are high-dimensional datasets containing a very large number of genes. When the search is driven by one of the proposed GO measures, the resulting biclusters reveal groups of genes linked by a common function. Furthermore, a qualitative biological study of a group of biclusters shows their relevance from a cancer perspective.
Conclusions: Integrating biological information improves the performance of the biclustering process. Both GO measures improve the results obtained on the yeast datasets. However, for datasets composed of a very large number of genes, only one of them substantially improves the algorithm's performance. This second case constitutes a clear option for exploring datasets of clinical interest.
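The idea of a pairwise GO measure can be illustrated with a deliberately simple stand-in: the Jaccard overlap between two genes' GO annotation sets, averaged over all gene pairs in a bicluster. The gene names and GO identifiers are placeholders, and Jaccard overlap is an assumption chosen for brevity; neither of the paper's two measures is reproduced here.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bicluster_go_similarity(genes: list[str], annotations: dict[str, set[str]]) -> float:
    """Mean pairwise Jaccard similarity of GO annotations over the bicluster's genes."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard(annotations.get(g, set()), annotations.get(h, set()))
                for g, h in pairs)
    return total / len(pairs)


# Placeholder annotations: geneA and geneB share all terms, geneC shares none
annotations = {
    "geneA": {"GO:0000001", "GO:0000002"},
    "geneB": {"GO:0000001", "GO:0000002"},
    "geneC": {"GO:0000003"},
}
print(bicluster_go_similarity(["geneA", "geneB", "geneC"], annotations))
```

Of the three pairs, only geneA/geneB overlap (similarity 1.0), so the bicluster scores 1/3; a term like this can be added to a merit function to reward functionally coherent biclusters.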
Affiliation(s)
- Juan A Nepomuceno: Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, Seville, 41012 Spain
- Alicia Troncoso: Área de Informática, Universidad Pablo de Olavide, Ctra. Utrera km. 1, Seville, 41013 Spain
- Isabel A Nepomuceno-Chamorro: Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, Seville, 41012 Spain
- Jesús S Aguilar-Ruiz: Área de Informática, Universidad Pablo de Olavide, Ctra. Utrera km. 1, Seville, 41013 Spain
7. Mandal K, Sarmah R, Bhattacharyya DK. Biomarker Identification for Cancer Disease Using Biclustering Approach: An Empirical Study. IEEE/ACM Trans Comput Biol Bioinform 2018; 16:490-509. [PMID: 29993834; DOI: 10.1109/tcbb.2018.2820695] [Citations in RCA: 3; Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
This paper presents an exhaustive empirical study that identifies biomarkers with two approaches, frequency-based and network-based, applied to seventeen biclustering algorithms and six cancer expression datasets. To analyze the biclustering algorithms systematically, we perform enrichment analysis, subtype identification and biomarker identification. Biclustering algorithms such as C&C, SAMBA and Plaid are useful for detecting biomarkers with both approaches on all datasets except the prostate cancer dataset. Using the frequency-based method, we detect a total of 102 gene biomarkers: 19 for blood cancer, 36 for lung cancer, 25 for colon cancer, 13 for multi-tissue cancer and 9 for prostate cancer. Using the network-based approach, we detect a total of 41 gene biomarkers: 15 from the blood cancer, 12 from the lung cancer, 6 from the colon cancer, 7 from the multi-tissue cancer and 1 from the prostate cancer dataset. We further extend our network analysis to selected biclusters and detect gene biomarkers not found earlier by either the frequency-based or the network-based approach. We also extend our work to breast cancer miRNA expression data to evaluate the performance of the biclustering algorithms, detecting 19 breast cancer biomarkers with the frequency-based method and 5 with the network-based method.
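The abstract does not spell out the frequency-based criterion, so the following is only a minimal sketch of the general idea under stated assumptions: pool the biclusters found by several algorithms, count how many contain each gene, and keep genes present in at least a chosen fraction of them. The gene names and the 50% cutoff are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def frequent_genes(biclusters: list[set[str]], min_fraction: float = 0.5) -> list[str]:
    """Genes found in at least min_fraction of the biclusters, most frequent first."""
    if not biclusters:
        return []
    counts = Counter(gene for bic in biclusters for gene in bic)
    cutoff = min_fraction * len(biclusters)
    selected = [gene for gene, n in counts.items() if n >= cutoff]
    # Sort by descending frequency, then by name, so the order is deterministic
    return sorted(selected, key=lambda gene: (-counts[gene], gene))


# Hypothetical biclusters pooled from several algorithms on one dataset
biclusters = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g4"}, {"g5"}]
print(frequent_genes(biclusters))
```

With four biclusters the cutoff is two appearances, so g1 (three) and g2 (two) are kept as candidate biomarkers.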
8. Biclustering by sparse canonical correlation analysis. Quant Biol 2018. [DOI: 10.1007/s40484-017-0127-0] [Citations in RCA: 0; Impact Index Per Article: 0] [Indexed: 10/18/2022]
9. Yu X, Yu G, Wang J. Clustering cancer gene expression data by projective clustering ensemble. PLoS One 2017; 12:e0171429. [PMID: 28234920; PMCID: PMC5325197; DOI: 10.1371/journal.pone.0171429] [Citations in RCA: 28; Impact Index Per Article: 4.0] [Received: 06/06/2016; Accepted: 01/20/2017; Indexed: 11/19/2022] [Open access]
Abstract
Gene expression data analysis has important implications for gene-based treatments, cancer diagnosis and other domains. Clustering is an important and promising tool for analyzing gene expression data. Gene expression data typically contain a large number of genes but few samples, so various projective clustering and ensemble clustering techniques have been proposed to address these challenges. However, combining these two kinds of techniques to avoid the curse of dimensionality and to improve gene expression clustering is challenging. In this paper, we employ a projective clustering ensemble (PCE) that integrates the advantages of projective clustering and ensemble clustering while avoiding the dilemma of combining multiple projective clusterings. Our experiments on publicly available cancer gene expression data show that PCE improves clustering quality by at least 4.5% on average compared with related techniques, including single clustering based on dimensionality reduction and ensemble approaches. The empirical study shows that, to further improve the clustering of cancer gene expression data, it is necessary and promising to combine projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data.
Affiliation(s)
- Xianxue Yu: College of Computer and Information Science, Southwest University, Beibei, Chongqing, China
- Guoxian Yu: College of Computer and Information Science, Southwest University, Beibei, Chongqing, China
- Jun Wang: College of Computer and Information Science, Southwest University, Beibei, Chongqing, China
10.
11.
Abstract
Mining microarray data to uncover interesting expression patterns and to discover biological knowledge in silico is an emerging research area in computational biology. A group of functionally related genes may have similar expression patterns under a set of conditions or at certain time points. Biclustering is an important data mining tool that has been used successfully to discover biologically significant clusters in gene expression data. This chapter introduces interesting patterns that may be observed in expression data and discusses the role of biclustering techniques in detecting functional gene groups with similar expression patterns.
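Two patterns that come up repeatedly in the biclustering literature are shifting (each gene's profile is the same shape plus a per-gene constant) and scaling (the same shape times a per-gene constant). The sketch below treats a bicluster as the submatrix selected by a row subset and a column subset, then checks each property. The matrix values are made up; column 2 deliberately breaks both patterns, so the coherence only holds on the chosen column subset.

```python
from __future__ import annotations


def submatrix(data: list[list[float]], rows: list[int], cols: list[int]) -> list[list[float]]:
    """A bicluster is the submatrix picked out by a subset of rows (genes) and columns."""
    return [[data[i][j] for j in cols] for i in rows]


def _constant(values: list[float], tol: float) -> bool:
    return all(abs(v - values[0]) <= tol for v in values)


def is_shifting(m: list[list[float]], tol: float = 1e-9) -> bool:
    """Each row equals the first row plus a per-row constant."""
    return all(_constant([x - y for x, y in zip(row, m[0])], tol) for row in m[1:])


def is_scaling(m: list[list[float]], tol: float = 1e-9) -> bool:
    """Each row equals the first row times a per-row constant (first row nonzero)."""
    return all(_constant([x / y for x, y in zip(row, m[0])], tol) for row in m[1:])


data = [
    [1.0, 3.0, 9.0, 2.0, 5.0],
    [3.0, 5.0, 0.0, 4.0, 7.0],   # first row + 2 on columns 0, 1, 3, 4
    [2.0, 6.0, 1.0, 4.0, 10.0],  # first row * 2 on columns 0, 1, 3, 4
]
cols = [0, 1, 3, 4]
shift_bic = submatrix(data, [0, 1], cols)
scale_bic = submatrix(data, [0, 2], cols)
print(is_shifting(shift_bic), is_scaling(shift_bic))  # True False
print(is_shifting(scale_bic), is_scaling(scale_bic))  # False True
```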
12. Nepomuceno JA, Troncoso A, Aguilar-Ruiz JS. Scatter search-based identification of local patterns with positive and negative correlations in gene expression data. Appl Soft Comput 2015. [DOI: 10.1016/j.asoc.2015.06.019] [Citations in RCA: 10; Impact Index Per Article: 1.1] [Indexed: 01/08/2023]
13. Xue Y, Liao Z, Li M, Luo J, Kuang Q, Hu X, Li T. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences. Comput Math Methods Med 2015; 2015:680434. [PMID: 26161131; PMCID: PMC4464847; DOI: 10.1155/2015/680434] [Citations in RCA: 4; Impact Index Per Article: 0.4] [Received: 10/04/2014; Revised: 12/31/2014; Accepted: 01/24/2015; Indexed: 11/18/2022]
Abstract
Order-preserving submatrices (OPSMs) are an important unsupervised learning model and have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems and target marketing systems. Unfortunately, because the problem is NP-complete, most existing methods are heuristic algorithms that cannot reveal all OPSMs. In particular, deep OPSMs, which correspond to long patterns supported by few sequences, incur explosive computational costs and are pruned entirely by most popular methods. In this paper, we propose an exact method that discovers all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adapted to find all common subsequences (ACS) between every pair of row sequences, so that no deep OPSM is missed. Then, an improved prefix-tree data structure was used to store and traverse the ACS, and the Apriori principle was used to mine the frequent sequential patterns efficiently. Finally, experiments on gene expression and synthetic datasets demonstrated the effectiveness and efficiency of the method.
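The notion of support behind OPSM mining can be sketched without the paper's ACS, prefix-tree and Apriori machinery: each row is turned into the sequence of its column indices sorted by value, and a candidate column order is supported by every row whose sequence contains it as a subsequence. This brute-force check, the tie-breaking by column index, and the example matrix are illustrative assumptions, not the exact method described above.

```python
from __future__ import annotations


def row_order(row: list[float]) -> list[int]:
    """Column indices sorted by increasing value (ties broken by column index)."""
    return sorted(range(len(row)), key=lambda j: (row[j], j))


def is_subsequence(pattern: list[int], seq: list[int]) -> bool:
    """True if pattern appears in seq in the same order, not necessarily contiguously."""
    it = iter(seq)
    return all(p in it for p in pattern)  # each 'in' consumes the iterator up to p


def supporting_rows(matrix: list[list[float]], pattern: list[int]) -> list[int]:
    """Rows whose induced column ordering contains the candidate pattern."""
    return [i for i, row in enumerate(matrix) if is_subsequence(pattern, row_order(row))]


matrix = [
    [0.1, 0.5, 0.3, 0.9],  # order 0, 2, 1, 3
    [1.0, 4.0, 2.0, 3.0],  # order 0, 2, 3, 1
    [7.0, 1.0, 2.0, 3.0],  # order 1, 2, 3, 0
]
print(supporting_rows(matrix, [0, 2, 1]))
```

Rows 0 and 1 both rise across columns 0, 2, 1, so together with those columns they form an OPSM; row 2 does not, because its column 0 has the largest value.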
Affiliation(s)
- Yun Xue: Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
- Zhengling Liao: Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
- Meihang Li: Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
- Jie Luo: Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
- Qiuhua Kuang: Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
- Xiaohui Hu: Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
- Tiechen Li: Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
14. Nepomuceno JA, Troncoso A, Nepomuceno-Chamorro IA, Aguilar-Ruiz JS. Integrating biological knowledge based on functional annotations for biclustering of gene expression data. Comput Methods Programs Biomed 2015; 119:163-80. [PMID: 25843807; DOI: 10.1016/j.cmpb.2015.02.010] [Citations in RCA: 13; Impact Index Per Article: 1.4] [Received: 07/22/2014; Revised: 02/17/2015; Accepted: 02/27/2015; Indexed: 05/06/2023]
Abstract
Gene expression data analysis rests on the assumption that co-expressed genes are co-regulated. This assumption is being revised, because the co-expression of a group of genes may result from independent activation under the same experimental condition rather than from a shared regulatory program. For this reason, traditional techniques have recently been improved by combining gene expression data with prior biological knowledge from open-access repositories. Biclustering is an unsupervised machine learning technique that searches for patterns in gene expression data matrices. This paper proposes a scatter search-based biclustering algorithm that integrates biological information. Besides the gene expression data matrix, the only input to the algorithm is a direct annotation file that relates each gene to a set of terms from a biological repository in which genes are annotated. Two biological measures, FracGO and SimNTO, are proposed to integrate this information by adding it to the fitness function optimized in the scatter search scheme. FracGO is based on biological enrichment, and SimNTO is based on the overlap between the GO annotations of pairs of genes. Experiments on two datasets show that the algorithm performs better when biological knowledge is integrated. The two biological measures are also analyzed and compared, and the differences are found to depend on both the data source and, when GO is used, how the annotation file was built. The proposed algorithm also obtains more enriched biclusters than classical biclustering algorithms commonly used as benchmarks, and an overlap analysis shows that the biclusters obtained overlap little.
The proposed methodology is a general-purpose algorithm that allows biological information from several sources to be integrated, and it can be extended to other biclustering algorithms based on optimizing a merit function.
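Because the algorithm's only extra input is a direct annotation file relating each gene to a set of terms, a small sketch of reading such a file and scoring a bicluster against it may help. The two-column tab-separated format, the term identifiers, and the score (fraction of bicluster genes carrying the most common term) are assumptions for illustration; this is not the definition of FracGO or SimNTO.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict


def read_annotations(text: str) -> dict[str, set[str]]:
    """Parse tab-separated 'gene<TAB>term' lines into {gene: {terms}}."""
    annotations: defaultdict[str, set[str]] = defaultdict(set)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) >= 2:  # ignore blank or malformed lines
            annotations[row[0]].add(row[1])
    return dict(annotations)


def top_term_fraction(genes: list[str],
                      annotations: dict[str, set[str]]) -> tuple[str | None, float]:
    """Most common term among the genes and the fraction of genes carrying it."""
    counts: dict[str, int] = {}
    for gene in genes:
        for term in annotations.get(gene, set()):
            counts[term] = counts.get(term, 0) + 1
    if not counts:
        return None, 0.0
    # Highest count wins; ties are broken alphabetically so the result is deterministic
    term = min(counts, key=lambda t: (-counts[t], t))
    return term, counts[term] / len(genes)


# Hypothetical annotation file contents
ANNOTATION_FILE = "g1\tGO:0000001\ng1\tGO:0000002\ng2\tGO:0000001\ng3\tGO:0000003\n"
annotations = read_annotations(ANNOTATION_FILE)
print(top_term_fraction(["g1", "g2", "g3"], annotations))
```

Two of the three genes share GO:0000001, giving a score of 2/3; in a scatter search scheme a score like this would be one term of the fitness function alongside an expression-coherence term.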
Affiliation(s)
- Juan A Nepomuceno: Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain
- Alicia Troncoso: Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
- Isabel A Nepomuceno-Chamorro: Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain
- Jesús S Aguilar-Ruiz: Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
15. Pontes B, Giráldez R, Aguilar-Ruiz JS. Quality measures for gene expression biclusters. PLoS One 2015; 10:e0115497. [PMID: 25763839; PMCID: PMC4357449; DOI: 10.1371/journal.pone.0115497] [Citations in RCA: 18; Impact Index Per Article: 2.0] [Received: 09/14/2014; Accepted: 11/24/2014; Indexed: 11/22/2022] [Open access]
Abstract
A considerable number of biclustering approaches have been proposed for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. In this context, one of the main objectives is to recognize groups of co-expressed or co-regulated genes, that is, genes that follow a similar expression pattern. Because of the complexity of the problem, heuristic searches are usually used instead of exhaustive algorithms. Furthermore, most biclustering approaches use a measure or cost function to determine the quality of biclusters. A suitable bicluster quality metric is critical, not only for guiding the search but also for establishing comparison criteria among the results of different biclustering techniques. In this paper, we analyse a large number of existing quality measures for gene expression biclusters and present a comparative study of them based on their ability to recognize different expression patterns in biclusters.
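The best-known of these quality measures is Cheng and Church's mean squared residue (MSR): each entry's residue is its value minus its row mean and column mean plus the overall mean, and MSR averages the squared residues. The sketch below computes MSR for two made-up biclusters built from the same profile, showing that the measure is zero for a purely shifting pattern but clearly positive for a scaling one, which is the kind of behavior a comparative study of measures has to account for.

```python
from __future__ import annotations


def mean_squared_residue(m: list[list[float]]) -> float:
    """Cheng and Church MSR: mean of (a_ij - row_mean_i - col_mean_j + overall_mean)^2."""
    n_rows, n_cols = len(m), len(m[0])
    row_means = [sum(row) / n_cols for row in m]
    col_means = [sum(m[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = m[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


shifting = [[1, 3, 2, 5], [3, 5, 4, 7], [0, 2, 1, 4]]               # rows differ by constants
scaling = [[1, 3, 2, 5], [2, 6, 4, 10], [0.5, 1.5, 1, 2.5]]         # rows differ by factors
print(mean_squared_residue(shifting), mean_squared_residue(scaling))
```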
Affiliation(s)
- Beatriz Pontes: Department of Computer Languages, University of Seville, Seville, Spain
- Raúl Giráldez: School of Engineering, Pablo de Olavide University, Seville, Spain
16. Deveci M, Küçüktunç O, Eren K, Bozdağ D, Kaya K, Çatalyürek ÜV. Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering. Methods Mol Biol 2015; 1375:55-74. [PMID: 26626937; DOI: 10.1007/7651_2015_246] [Citations in RCA: 1; Impact Index Per Article: 0.1] [Indexed: 01/21/2023]
Abstract
The rapid development and increasing popularity of gene expression microarrays have led to many studies on discovering co-regulated genes. One important way to discover such co-regulation is query-based search, since gene co-expression may indicate a shared role in a biological process. Although promising query-driven search methods based on clustering exist, they fail to capture many genes that function in the same biological pathway, because microarray datasets are fraught with spurious samples or samples of diverse origin, or because the pathways may be regulated under only a subset of samples. Biclustering algorithms, a class of clustering algorithms that simultaneously cluster both items and their features, are useful for analyzing gene expression data, or any data in which items are related in only a subset of their samples: genes need not be related in all samples to be clustered together. Because many genes interact only under specific circumstances, biclustering may recover relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature on using biclustering to query co-regulated genes. We then present a novel biclustering approach and evaluate its performance through a thorough experimental analysis.
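Why restricting attention to a subset of samples matters can be shown with a simple stand-in for query-driven search: rank other genes by Pearson correlation with a query gene, computed only over a chosen subset of samples. The expression values, gene names and sample subset are hypothetical, and Pearson ranking is an assumption for illustration, not the chapter's biclustering approach.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def rank_by_query(expr: dict[str, list[float]], query: str,
                  samples: list[int]) -> list[tuple[str, float]]:
    """Other genes ranked by correlation with the query gene on the given samples."""
    q = [expr[query][s] for s in samples]
    scores = [(gene, pearson(q, [values[s] for s in samples]))
              for gene, values in expr.items() if gene != query]
    return sorted(scores, key=lambda t: (-t[1], t[0]))


expr = {
    "query": [1.0, 2.0, 3.0, 9.0, 0.0],
    "geneA": [2.0, 4.0, 6.0, 0.0, 5.0],  # tracks the query only on samples 0-2
    "geneB": [3.0, 1.0, 2.0, 9.0, 0.0],
}
print(rank_by_query(expr, "query", [0, 1, 2]))
```

On samples 0 to 2, geneA correlates perfectly with the query and ranks first; over all five samples its correlation would be negative, which is the local relationship that whole-sample clustering tends to miss.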
Affiliation(s)
- Mehmet Deveci: Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
- Onur Küçüktunç: Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
- Kemal Eren: Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
- Doruk Bozdağ: Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Kamer Kaya: Computer Science and Engineering, Sabancı University, Istanbul, Turkey
- Ümit V Çatalyürek: Biomedical Informatics, Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA
17. Pei Y, Gao Q, Li J, Zhao X. Identifying local co-regulation relationships in gene expression data. J Theor Biol 2014; 360:200-207. [PMID: 25042175; DOI: 10.1016/j.jtbi.2014.06.032] [Citations in RCA: 5; Impact Index Per Article: 0.5] [Received: 12/15/2013; Accepted: 06/26/2014; Indexed: 11/24/2022]
Abstract
Identifying interesting relationships between pairs of genes that hold over a subset of the experimental conditions in a gene expression dataset is useful for discovering novel functional gene interactions. In this paper, we introduce a new method for identifying Local Co-regulation Relationships (IdLCR). These local relationships describe the behavior of gene pairs that are either up- or down-regulated throughout the identified condition subset. IdLCR first detects functional-form pairwise gene-gene relationships and their condition subsets using a regression spline model. It then measures the relationships with a penalized Pearson correlation and ranks the corresponding gene pairs by their scores. In this way, relationships without a clear biological interpretation can be filtered out and the local co-regulation relationships obtained. Ten different functional relationships are embedded in the simulated datasets; applied to these datasets, IdLCR identifies both the functional relationships and the condition subsets. On microarray and RNA-seq gene expression data, IdLCR identifies novel biological relationships that differ from those uncovered by IFGR and MINE.
Affiliation(s)
- Yonggang Pei: College of Mathematics and Information Science, Henan Normal University, Xinxiang 453007, China
- Qinghui Gao: College of Mathematics and Information Science, Henan Normal University, Xinxiang 453007, China
- Juntao Li: College of Mathematics and Information Science, Henan Normal University, Xinxiang 453007, China
- Xiting Zhao: College of Life Science, Henan Normal University, Xinxiang 453007, China
18. Flores JL, Inza I, Larrañaga P, Calvo B. A new measure for gene expression biclustering based on non-parametric correlation. Comput Methods Programs Biomed 2013; 112:367-397. [PMID: 24079964; DOI: 10.1016/j.cmpb.2013.07.025] [Citations in RCA: 17; Impact Index Per Article: 1.5] [Received: 09/16/2012; Revised: 06/14/2013; Accepted: 07/26/2013; Indexed: 06/02/2023]
Abstract
Background: Biclustering, one of the emerging techniques for analyzing DNA microarray data, searches for subsets of genes and conditions that are coherently expressed. These subgroups provide clues about the main biological processes. Different approaches to this problem have been proposed. Most of them use the mean squared residue as the quality measure, but relevant and interesting patterns, such as shifting or scaling patterns, cannot be detected with it. Furthermore, recent papers show that there are new coherence patterns involved in different kinds of cancer and tumors, such as inverse relationships between genes, that cannot be captured either.
Results: The proposed measure, called Spearman's biclustering measure (SBM), estimates the quality of a bicluster based on the non-linear correlation among genes and conditions simultaneously. Biclusters are searched for with an evolutionary technique, estimation of distribution algorithms, which uses SBM as the fitness function. The approach has been examined from different points of view using artificial and real microarrays. The assessment involved quality indexes, a set of reference bicluster patterns including new patterns, and a set of statistical tests. Performance was also examined on real microarrays and compared with different algorithmic approaches such as Bimax, CC, OPSM, Plaid and xMotifs.
Conclusions: SBM shows several advantages, such as the ability to recognize more complex coherence patterns (shifting, scaling and inversion) and the capability to selectively marginalize genes and conditions depending on their statistical significance.
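The intuition that a rank-based correlation captures shifting, scaling and inverted patterns alike can be checked with a simplified score: the mean absolute Spearman correlation over all pairs of genes in a bicluster. This is an assumption-laden stand-in, not SBM itself; the abstract describes SBM as considering genes and conditions simultaneously, whereas this sketch correlates genes (rows) only, and the example profiles are made up.

```python
from __future__ import annotations

import math
from itertools import combinations


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_abs_spearman(rows: list[list[float]]) -> float:
    """Average |rho| over all pairs of rows (genes) in a bicluster."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    return sum(abs(spearman(a, b)) for a, b in pairs) / len(pairs)


base = [1.0, 3.0, 2.0, 5.0]
# Shifted, scaled and inverted copies of the same profile
coherent = [base, [v + 2 for v in base], [v * 3 for v in base], [10 - v for v in base]]
unrelated = [base, [4.0, 1.0, 5.0, 2.0]]
print(mean_abs_spearman(coherent), mean_abs_spearman(unrelated))
```

The coherent bicluster scores 1.0 even though it mixes additive, multiplicative and inverted rows, while the unrelated pair scores 0.6.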
Affiliation(s)
- Jose L Flores: Intelligent Systems Group, Department of Computer Sciences and Artificial Intelligence, University of the Basque Country, P.O. Box 649, 20080 Donostia - San Sebastian, Spain
19. Roy S, Bhattacharyya DK, Kalita JK. CoBi: Pattern Based Co-Regulated Biclustering of Gene Expression Data. Pattern Recognit Lett 2013. [DOI: 10.1016/j.patrec.2013.03.018] [Citations in RCA: 20; Impact Index Per Article: 1.8] [Indexed: 11/24/2022]
20. Hassanien AE, Al-Shammari ET, Ghali NI. Computational intelligence techniques in bioinformatics. Comput Biol Chem 2013; 47:37-47. [PMID: 23891719; DOI: 10.1016/j.compbiolchem.2013.04.007] [Citations in RCA: 23; Impact Index Per Article: 2.1] [Received: 02/07/2013; Revised: 04/06/2013; Accepted: 04/24/2013; Indexed: 10/26/2022]
Abstract
Computational intelligence (CI) is a well-established paradigm: current CI systems have many characteristics of biological computers and can perform a variety of tasks that are difficult to accomplish with conventional techniques. CI is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. This article presents an overview of CI techniques in bioinformatics, aiming to introduce the CI and bioinformatics research communities to some state-of-the-art CI applications in bioinformatics and to motivate research in new directions. We show how CI techniques, including neural networks, restricted Boltzmann machines, deep belief networks, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, can be employed successfully to tackle problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function and structure prediction. We discuss representative methods as examples of how CI can address these problems and how bioinformatics data can be characterized by CI. Open challenges and future research directions are also presented, and an extensive bibliography is included.
Affiliation(s)
- Aboul Ella Hassanien: Faculty of Computers and Information, Cairo University, 5 Ahmed Zewal Street, Orman, Giza, Egypt; Scientific Research Group in Egypt (SRGE), Egypt
21. Gao Q, Ho C, Jia Y, Li JJ, Huang H. Biclustering of linear patterns in gene expression data. J Comput Biol 2012; 19:619-31. [PMID: 22697238; DOI: 10.1089/cmb.2012.0032] [Citations in RCA: 8; Impact Index Per Article: 0.7] [Indexed: 11/12/2022] [Open access]
Abstract
Identifying a bicluster, a submatrix of a gene expression dataset in which the genes show similar behavior across the columns, is useful for discovering novel functional gene interactions. In this article, we introduce a new algorithm for finding biClusters with Linear Patterns (CLiP). Instead of solely maximizing Pearson correlation, we introduce a fitness function that also considers the correlation of complementary genes and conditions, which eliminates the need to determine the bicluster size a priori. We use both greedy search and a genetic algorithm for optimization, incorporating resampling for more robust discovery. On both real and simulated datasets, CLiP outperforms existing methods. Analyzing RNA-seq fly and worm time-course data from modENCODE, we uncover a set of similarly expressed genes that suggests maternal dependence. Supplementary material is available online (at www.liebertonline.com/cmb).
Affiliation(s)
- Qinghui Gao: Seventh Research Division and Department of Systems and Control, Beihang University, Beijing, China
22. Rathipriya R, Thangavel K. A Discrete Artificial Bees Colony Inspired Biclustering Algorithm. Int J Swarm Intell Res 2012. [DOI: 10.4018/jsir.2012010102] [Citations in RCA: 4; Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
Biclustering is a promising data mining technique that has been suggested for identifying local patterns in data. Biclustering algorithms can be used to mine web usage data, where they determine groups of users who are correlated under a subset of the pages of a website. Recently, many biclustering methods based on metaheuristics have been proposed. Most use the mean squared residue as the merit function, but interesting and relevant patterns such as shifting and scaling patterns may not be detected with this measure. Discovering this type of pattern is nonetheless important, because web users commonly show similar behavior even though their interest levels vary across different ranges or magnitudes. In this paper, a new correlation-based fitness function is designed to extract shifting and scaling browsing patterns. The proposed work uses a discrete version of the Artificial Bee Colony optimization algorithm to bicluster web usage data and produce optimal (i.e., highly correlated) biclusters. Demonstrated on a real dataset, the approach finds significant, high-quality biclusters and converges better than Binary Particle Swarm Optimization (BPSO).