For: Teng SL, Huang H. A Statistical Framework to Infer Functional Gene Relationships From Biologically Interrelated Microarray Experiments. J Am Stat Assoc 2009. [DOI: 10.1198/jasa.2009.0037] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]

Cited by Other Article(s)

Wang Y, Sun Z, Song D, Hero A. Kronecker-structured covariance models for multiway data. Statistics Surveys 2022. [DOI: 10.1214/22-ss139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]

Niu L, Liu X, Zhao J. Robust estimator of the correlation matrix with sparse Kronecker structure for a high-dimensional matrix-variate. J Multivariate Anal 2020. [DOI: 10.1016/j.jmva.2020.104598] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]

Chen X, Liu W. Testing independence with high-dimensional correlated samples. Ann Stat 2018. [DOI: 10.1214/17-aos1571] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]

Data Wisdom in Computational Genomics Research. Statistics in Biosciences 2017. [DOI: 10.1007/s12561-016-9173-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]

Yang H, Liu X. Studies on the Clustering Algorithm for Analyzing Gene Expression Data with a Bidirectional Penalty. J Comput Biol 2017;24:689-698. [PMID: 28489418 DOI: 10.1089/cmb.2017.0051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]

Kim S, Lin CW, Tseng GC. MetaKTSP: a meta-analytic top scoring pair method for robust cross-study validation of omics prediction analysis. Bioinformatics 2016;32:1966-73. [PMID: 27153719 DOI: 10.1093/bioinformatics/btw115] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 02/18/2015] [Accepted: 02/19/2016] [Indexed: 01/08/2023]

Abstract

MOTIVATION

Supervised machine learning is widely applied to transcriptomic data to predict disease diagnosis, prognosis or survival. Robust and interpretable classifiers with high accuracy are usually favored for their clinical and translational potential. The top scoring pair (TSP) algorithm is one example: it applies a simple rank-based procedure to identify rank-altered gene pairs for classifier construction. Although many classification methods perform well in cross-validation within a single expression profile data set, performance usually drops sharply in cross-study validation (i.e. the prediction model is established in the training study and applied to an independent test study) for all machine learning methods, including TSP. The failure of cross-study validation has largely diminished the potential translational and clinical value of the models. The purpose of this article is to develop a meta-analytic top scoring pair (MetaKTSP) framework that combines multiple transcriptomic studies and generates a robust prediction model applicable to independent test studies.

RESULTS

We proposed two frameworks, one averaging TSP scores and one combining P-values from individual studies, to select the top gene pairs for model construction. We applied the proposed methods to simulated data sets and to three large-scale real applications in breast cancer, idiopathic pulmonary fibrosis and pan-cancer methylation. The results showed superior cross-study validation accuracy and biomarker selection for the new meta-analytic framework. In conclusion, combining multiple omics data sets in the public domain increases the robustness and accuracy of the classification model, which will ultimately improve disease understanding and clinical treatment decisions to benefit patients.

AVAILABILITY AND IMPLEMENTATION

An R package, MetaKTSP, is available online (http://tsenglab.biostat.pitt.edu/software.htm).

CONTACT

ctseng@pitt.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Freytag S, Gagnon-Bartsch J, Speed TP, Bahlo M. Systematic noise degrades gene co-expression signals but can be corrected. BMC Bioinformatics 2015;16:309. [PMID: 26403471 PMCID: PMC4583191 DOI: 10.1186/s12859-015-0745-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 05/11/2015] [Accepted: 09/16/2015] [Indexed: 12/31/2022]

Abstract

Background

In the past decade, the identification of gene co-expression has become a routine part of the analysis of high-dimensional microarray data. Gene co-expression, which is mostly detected via the Pearson correlation coefficient, has played an important role in the discovery of molecular pathways and networks. Unfortunately, the presence of systematic noise in high-dimensional microarray datasets corrupts estimates of gene co-expression. Removing systematic noise from microarray data is therefore crucial. Many cleaning approaches for microarray data exist; however, these methods are aimed at improving differential expression analysis, and their performance has been tested primarily for this application. To our knowledge, the performance of these approaches has never been systematically compared in the context of gene co-expression estimation.

Results

Using simulations, we demonstrate that standard cleaning procedures, such as background correction and quantile normalization, fail to adequately remove systematic noise that affects gene co-expression and at times further degrade true gene co-expression. Instead, we show that a global version of removal of unwanted variation (RUV), a data-driven approach, not only removes systematic noise but also allows the estimation of the true underlying gene-gene correlations. We compare the performance of all noise removal methods when applied to five large published datasets on gene expression in the human brain. RUV retrieves the highest gene co-expression values for sets of genes known to interact and also provides the greatest consistency across all five datasets. We apply the method to prioritize epileptic encephalopathy candidate genes.

Conclusions

Our work raises serious concerns about the quality of many published gene co-expression analyses. RUV provides an efficient and flexible way to remove systematic noise from high-dimensional microarray datasets when the objective is gene co-expression analysis. The RUV method, as applicable in the context of gene-gene correlation estimation, is available as a Bioconductor package: RUVcorr.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0745-3) contains supplementary material, which is available to authorized users.

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H. Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 2015. [DOI: 10.1214/14-aoas792] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 11/19/2022]

Touloumis A, Tavaré S, Marioni JC. Testing the mean matrix in high-dimensional transposable data. Biometrics 2015;71:157-166. [DOI: 10.1111/biom.12257] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/01/2013] [Revised: 09/01/2014] [Accepted: 09/01/2014] [Indexed: 11/29/2022]

Wang YXR, Huang H. Review on statistical methods for gene network reconstruction using expression data. J Theor Biol 2014;362:53-61. [PMID: 24726980 DOI: 10.1016/j.jtbi.2014.03.040] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 01/26/2014] [Revised: 03/29/2014] [Accepted: 03/31/2014] [Indexed: 12/16/2022]

Ning Y, Liu H. High-dimensional semiparametric bigraphical models. Biometrika 2013. [DOI: 10.1093/biomet/ast009] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]

Yin J, Li H. Model Selection and Estimation in the Matrix Normal Graphical Model. J Multivariate Anal 2012;107:119-140. [PMID: 22368309 PMCID: PMC3285238 DOI: 10.1016/j.jmva.2012.01.005] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Indexed: 11/16/2022]

Allen GI, Tibshirani R. Inference with transposable data: modelling the effects of row and column correlations. J R Stat Soc Series B Stat Methodol 2012;74:721-743. [DOI: 10.1111/j.1467-9868.2011.01027.x] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 12/01/2022]

Kim K, Jiang K, Teng SL, Feldman LJ, Huang H. Using biologically interrelated experiments to identify pathway genes in Arabidopsis. Bioinformatics 2012;28:815-22. [PMID: 22271267 DOI: 10.1093/bioinformatics/bts038] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]