Reference Citation Analysis

For: Kim EY, Kim SY, Ashlock D, Nam D. MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering. BMC Bioinformatics 2009;10:260. [PMID: 19698124 PMCID: PMC2743671 DOI: 10.1186/1471-2105-10-260] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Received: 02/14/2009] [Accepted: 08/22/2009] [Indexed: 11/10/2022] Open

Cited by Other Article(s)

Kumar N, Srivastava R. Deep learning in structural bioinformatics: current applications and future perspectives. Brief Bioinform 2024;25:bbae042. [PMID: 38701422 PMCID: PMC11066934 DOI: 10.1093/bib/bbae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 01/05/2024] [Accepted: 01/18/2024] [Indexed: 05/05/2024] Open

An iterative approach to unsupervised outlier detection using ensemble method and distance-based data filtering. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00674-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]

Band-based similarity indices for gene expression classification and clustering. Sci Rep 2021;11:21609. [PMID: 34732744 PMCID: PMC8566472 DOI: 10.1038/s41598-021-00678-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2021] [Accepted: 10/11/2021] [Indexed: 11/16/2022] Open

Cao D, Chen Y, Chen J, Zhang H, Yuan Z. An improved algorithm for the maximal information coefficient and its application. Royal Society Open Science 2021;8:201424. [PMID: 33972855 PMCID: PMC8074658 DOI: 10.1098/rsos.201424] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/10/2020] [Accepted: 01/18/2021] [Indexed: 06/12/2023]

Clustering data with the presence of attribute noise: a study of noise completely at random and ensemble of multiple k-means clusterings. Int J Mach Learn Cyb 2019. [DOI: 10.1007/s13042-019-00989-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/25/2022]

Yu J, Kim SB. Consensus rate-based label propagation for semi-supervised classification. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.06.074] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]

Alhusain L, Hafez AM. Cluster ensemble based on Random Forests for genetic data. BioData Min 2017;10:37. [PMID: 29270227 PMCID: PMC5732374 DOI: 10.1186/s13040-017-0156-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/27/2017] [Accepted: 11/21/2017] [Indexed: 11/25/2022] Open

Abstract

Background

Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have made exceptionally large genetic datasets available. Such data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, so an efficient means of handling them is desirable.

Results

Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, an RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on a high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, a comparison of RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance.

Conclusions

This paper proposes RFcluE, a cluster ensemble approach based on RF clustering, to address the problem of population structure analysis and demonstrates the effectiveness of the approach. The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.

Huo Z, Tseng G. Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery. Ann Appl Stat 2017;11:1011-1039. [PMID: 28959370 PMCID: PMC5613668 DOI: 10.1214/17-aoas1033] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/19/2022]

Iam-On N, Boongoen T. Generating descriptive model for student dropout: a review of clustering approach. Human-centric Computing and Information Sciences 2017. [DOI: 10.1186/s13673-016-0083-0] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 11/10/2022]

Ronan T, Qi Z, Naegle KM. Avoiding common pitfalls when clustering biological data. Sci Signal 2016;9:re6. [PMID: 27303057 DOI: 10.1126/scisignal.aad1932] [Citation(s) in RCA: 86] [Impact Index Per Article: 10.8] [Indexed: 12/16/2022]

Huo Z, Ding Y, Liu S, Oesterreich S, Tseng G. Meta-analytic framework for sparse K-means to identify disease subtypes in multiple transcriptomic studies. J Am Stat Assoc 2016;111:27-42. [PMID: 27330233 PMCID: PMC4908837 DOI: 10.1080/01621459.2015.1086354] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 08/01/2014] [Revised: 08/01/2015] [Indexed: 12/15/2022]

Rodenas-Cuadrado P, Chen XS, Wiegrebe L, Firzlaff U, Vernes SC. A novel approach identifies the first transcriptome networks in bats: a new genetic model for vocal communication. BMC Genomics 2015;16:836. [PMID: 26490347 PMCID: PMC4618519 DOI: 10.1186/s12864-015-2068-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/25/2015] [Accepted: 10/13/2015] [Indexed: 12/15/2022] Open

Wang J, Zhong J, Chen G, Li M, Wu FX, Pan Y. ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015;12:815-822. [PMID: 26357321 DOI: 10.1109/tcbb.2014.2361348] [Citation(s) in RCA: 74] [Impact Index Per Article: 8.2] [Indexed: 06/05/2023]

Improved student dropout prediction in Thai University using ensemble of mixed-type data clusterings. Int J Mach Learn Cyb 2015. [DOI: 10.1007/s13042-015-0341-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 10/24/2022]

Șenbabaoğlu Y, Michailidis G, Li JZ. Critical limitations of consensus clustering in class discovery. Sci Rep 2014;4:6207. [PMID: 25158761 PMCID: PMC4145288 DOI: 10.1038/srep06207] [Citation(s) in RCA: 187] [Impact Index Per Article: 18.7] [Received: 05/19/2014] [Accepted: 08/08/2014] [Indexed: 11/09/2022] Open

Wang X, Laird PW, Hinoue T, Groshen S, Siegmund KD. Non-specific filtering of beta-distributed data. BMC Bioinformatics 2014;15:199. [PMID: 24943962 PMCID: PMC4230495 DOI: 10.1186/1471-2105-15-199] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 11/03/2013] [Accepted: 06/12/2014] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Non-specific feature selection is a dimension reduction procedure performed prior to cluster analysis of high dimensional molecular data. Not all measured features are expected to show biological variation, so only the most varying are selected for analysis. In DNA methylation studies, DNA methylation is measured as a proportion, bounded between 0 and 1, with variance a function of the mean. Filtering on standard deviation biases the selection of probes to those with mean values near 0.5. We explore the effect this has on clustering, and develop alternate filter methods that utilize a variance stabilizing transformation for Beta distributed data and do not share this bias.

RESULTS

We compared results for 11 different non-specific filters on eight Infinium HumanMethylation data sets, selected to span a variety of biological conditions. We found that for data sets having a small fraction of samples showing abnormal methylation of a subset of normally unmethylated CpGs, a characteristic of the CpG island methylator phenotype in cancer, a novel filter statistic that utilized a variance-stabilizing transformation for Beta distributed data outperformed the common filter of using standard deviation of the DNA methylation proportion, or its log-transformed M-value, in its ability to detect the cancer subtype in a cluster analysis. However, the standard deviation filter always performed among the best for distinguishing subgroups of normal tissue. The novel filter and standard deviation filter tended to favour features in different genome contexts; for the same data set, the novel filter always selected more features from CpG island promoters and the standard deviation filter always selected more features from non-CpG island intergenic regions. Interestingly, despite selecting largely non-overlapping sets of features, the two filters did find sample subsets that overlapped for some real data sets.

CONCLUSIONS

We found two different filter statistics that tended to prioritize features with different characteristics; each performed well for identifying clusters of cancer and non-cancer tissue and for identifying a cancer CpG island hypermethylation phenotype. Since cluster analysis is for discovery, we suggest trying both filters on any new data set and evaluating the overlap of features selected and clusters discovered.

Wang Y, Pan Y. Semi-supervised consensus clustering for gene expression data analysis. BioData Min 2014;7:7. [PMID: 24920961 PMCID: PMC4036113 DOI: 10.1186/1756-0381-7-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 10/18/2013] [Accepted: 04/05/2014] [Indexed: 01/08/2023] Open

Comparative study of matrix refinement approaches for ensemble clustering. Mach Learn 2013. [DOI: 10.1007/s10994-013-5342-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/25/2022]

Liseron-Monfils C, Lewis T, Ashlock D, McNicholas PD, Fauteux F, Strömvik M, Raizada MN. Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas. BMC Plant Biology 2013;13:42. [PMID: 23497159 PMCID: PMC3658923 DOI: 10.1186/1471-2229-13-42] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 04/01/2012] [Accepted: 03/08/2013] [Indexed: 05/05/2023]

Abstract

BACKGROUND

The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize.

RESULTS

A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize.

CONCLUSIONS

An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.

Kim EY, Hwang DU, Ko TW. Multiscale ensemble clustering for finding modules in complex networks. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 2012;85:026119. [PMID: 22463291 DOI: 10.1103/physreve.85.026119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2011] [Indexed: 05/31/2023]

Mimaroglu S, Aksehirli E. DICLENS: divisive clustering ensemble with automatic cluster number. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011;9:408-420. [PMID: 21968960 DOI: 10.1109/tcbb.2011.129] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]

Cancer classification based on microarray gene expression data using a principal component accumulation method. Sci China Chem 2011. [DOI: 10.1007/s11426-011-4263-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]

New possibilistic method for discovering linear local behavior using hyper-Gaussian distributed membership function. Knowl Inf Syst 2011. [DOI: 10.1007/s10115-011-0385-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]

Bayá AE, Granitto PM. Clustering gene expression data with a penalized graph-based metric. BMC Bioinformatics 2011;12:2. [PMID: 21205299 PMCID: PMC3023695 DOI: 10.1186/1471-2105-12-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 07/10/2010] [Accepted: 01/04/2011] [Indexed: 12/05/2022] Open

Sung MK, Bae YJ. Linking obesity to colorectal cancer: application of nutrigenomics. Biotechnol J 2010;5:930-41. [PMID: 20715079 DOI: 10.1002/biot.201000165] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]

Iam-on N, Boongoen T, Garrett S. LCE: a link-based cluster ensemble method for improved gene expression data analysis. Bioinformatics 2010;26:1513-9. [PMID: 20444838 DOI: 10.1093/bioinformatics/btq226] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.1] [Indexed: 11/13/2022] Open

Newman AM, Cooper JB. AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number. BMC Bioinformatics 2010;11:117. [PMID: 20202218 PMCID: PMC2846907 DOI: 10.1186/1471-2105-11-117] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.9] [Received: 11/27/2009] [Accepted: 03/04/2010] [Indexed: 12/25/2022] Open