Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Cited by Other Article(s)

Lu M, Yin R, Chen XS. Ensemble methods of rank-based trees for single sample classification with gene expression profiles. J Transl Med 2024;22:140. [PMID: 38321494 PMCID: PMC10848444 DOI: 10.1186/s12967-024-04940-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 01/27/2024] [Indexed: 02/08/2024] Open

Kwan B, Fuhrer T, Montemayor D, Fink JC, He J, Hsu CY, Messer K, Nelson RG, Pu M, Ricardo AC, Rincon-Choles H, Shah VO, Ye H, Zhang J, Sharma K, Natarajan L. A generalized covariate-adjusted top-scoring pair algorithm with applications to diabetic kidney disease stage classification in the Chronic Renal Insufficiency Cohort (CRIC) Study. BMC Bioinformatics 2023;24:57. [PMID: 36803209 PMCID: PMC9942303 DOI: 10.1186/s12859-023-05171-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 02/02/2023] [Indexed: 02/22/2023] Open

Abstract

BACKGROUND

The growing amount of high-dimensional biomolecular data has spawned new statistical and computational models for risk prediction and disease classification. Yet many of these methods do not yield biologically interpretable models, despite offering high classification accuracy. An exception, the top-scoring pair (TSP) algorithm, derives parameter-free, biologically interpretable single-pair decision rules that are accurate and robust in disease classification. However, standard TSP methods do not accommodate covariates that could heavily influence feature selection for the top-scoring pair. Herein, we propose a covariate-adjusted TSP method, which uses residuals from a regression of features on the covariates for identifying top-scoring pairs. We conduct simulations and a data application to investigate our method, and compare it to two existing classifiers, LASSO and random forests.

RESULTS

Our simulations found that features that were highly correlated with clinical variables had a high likelihood of being selected as top-scoring pairs in the standard TSP setting. However, through residualization, our covariate-adjusted TSP was able to identify new top-scoring pairs that were largely uncorrelated with clinical variables. In the data application, using patients with diabetes (n = 977) selected for metabolomic profiling in the Chronic Renal Insufficiency Cohort (CRIC) study, the standard TSP algorithm identified (valine-betaine, dimethyl-arg) as the top-scoring metabolite pair for classifying diabetic kidney disease (DKD) severity, whereas the covariate-adjusted TSP method identified the pair (pipazethate, octaethylene glycol) as top-scoring. Valine-betaine and dimethyl-arg had, respectively, ≥ 0.4 absolute correlation with urine albumin and serum creatinine, known prognosticators of DKD. Thus, without covariate adjustment, the top-scoring pair largely reflected known markers of disease severity, whereas covariate-adjusted TSP uncovered features liberated from confounding and identified independent prognostic markers of DKD severity. Furthermore, TSP-based methods achieved classification accuracy in DKD competitive with that of LASSO and random forests, while providing more parsimonious models.

CONCLUSIONS

We extended TSP-based methods to account for covariates via a simple, easy-to-implement residualizing process. Our covariate-adjusted TSP method identified metabolite features, uncorrelated with clinical covariates, that discriminate DKD severity stage based on the relative ordering between two features, and thus provide insights for future studies on the order reversals in early vs advanced disease states.
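The residualize-then-score idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a single numeric covariate, an ordinary least-squares fit, binary class labels 0 and 1 with both classes present, and the usual TSP score |P(a < b | class 0) - P(a < b | class 1)|. All function names are hypothetical.

```python
from itertools import combinations


def residualize(feature, covariate):
    """Residuals from a simple least-squares fit of one feature on one covariate."""
    n = len(feature)
    mean_x = sum(covariate) / n
    mean_y = sum(feature) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, feature))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, feature)]


def tsp_score(a, b, labels):
    """|P(a < b | class 0) - P(a < b | class 1)| for two feature vectors."""
    props = []
    for cls in (0, 1):
        idx = [k for k, lab in enumerate(labels) if lab == cls]
        props.append(sum(a[k] < b[k] for k in idx) / len(idx))
    return abs(props[0] - props[1])


def top_scoring_pair(features, labels):
    """features maps a feature name to its list of values; returns (pair, score)."""
    best_pair, best_score = None, -1.0
    for i, j in combinations(sorted(features), 2):
        score = tsp_score(features[i], features[j], labels)
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair, best_score


def covariate_adjusted_tsp(features, covariate, labels):
    """Replace every feature by its residuals, then run the standard pair search."""
    adjusted = {name: residualize(vals, covariate) for name, vals in features.items()}
    return top_scoring_pair(adjusted, labels)
```

With several covariates, the simple regression in `residualize` would be replaced by a multiple regression; the pair search itself is unchanged because it only sees the residuals.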

Affiliation(s)

Brian Kwan Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
Tobias Fuhrer Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
Daniel Montemayor Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
Jeffery C Fink Department of Medicine, University of Maryland, Baltimore School of Medicine, Baltimore, MD, USA
Jiang He Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine and Tulane University Translational Science Institute, New Orleans, LA, USA
Chi-Yuan Hsu Division of Nephrology, University of California, San Francisco School of Medicine, San Francisco, CA, USA
Karen Messer Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
Robert G Nelson Chronic Kidney Disease Section, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, AZ, USA
Minya Pu Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
Ana C Ricardo Department of Medicine, University of Illinois, Chicago, IL, USA
Hernan Rincon-Choles Department of Nephrology, Glickman Urological and Kidney Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
Vallabh O Shah University of New Mexico Health Sciences Center, Albuquerque, NM, USA
Hongping Ye Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
Jing Zhang Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
Kumar Sharma Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
Loki Natarajan Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA. Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA.

Kim DM, Feilotter HE, Davey SK. BRCA1 Variant Assessment Using a Simple Analytic Assay. J Appl Lab Med 2022;7:674-688. [PMID: 35021209 DOI: 10.1093/jalm/jfab163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2021] [Accepted: 10/04/2021] [Indexed: 11/14/2022]

Eriksson P, Marzouka NAD, Sjödahl G, Bernardo C, Liedberg F, Höglund M. A comparison of rule-based and centroid single-sample multiclass predictors for transcriptomic classification. Bioinformatics 2021;38:1022-1029. [PMID: 34788787 PMCID: PMC8796360 DOI: 10.1093/bioinformatics/btab763] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/21/2021] [Revised: 10/24/2021] [Accepted: 11/02/2021] [Indexed: 02/03/2023] Open

Abstract

MOTIVATION

Gene expression-based multiclass prediction, such as tumor subtyping, is a non-trivial bioinformatic problem. Most classifier methods operate by comparing expression levels relative to other samples. Methods that base predictions on the expression pattern within a sample have been proposed as an alternative. As these methods are invariant to the cohort composition and can be applied to a sample in isolation, they can collectively be termed single sample predictors (SSP). Such predictors could potentially be used for preprocessing-free classification of new samples and be built to function across different expression platforms where proper batch and dataset normalization is challenging. Here, we evaluate the behavior of several multiclass SSPs based on binary gene-pair rules (k-Top Scoring Pairs, Absolute Intrinsic Molecular Subtyping and a new Random Forest approach) and compare them to centroids built with centered or raw expression values, with the criteria that an optimal predictor should have high accuracy, overcome differences in tumor purity, be robust across expression platforms and provide an informative prediction output score.

RESULTS

We found that gene-pair-based SSPs showed excellent performance on many expression-based classification tasks. The three methods differed in prediction score output, handling of tied scores and behavior in low purity samples. The k-Top Scoring Pairs and Random Forest approach both achieved high classification accuracy while providing an informative prediction score. Although gene-pair-based SSPs have been touted as being cross-platform compatible (through training on mixed platform data), out-of-the-box compatibility with a new dataset remains a potential issue that warrants cohort-to-cohort verification.
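The claim that gene-pair rules can be applied to one sample in isolation can be illustrated with a small Python sketch. It is a simplified binary illustration, not the multiclass methods evaluated in the article. The rule direction (gene_a above gene_b votes for class 1), the explicit tie default, and the gene names and values are all assumptions made for the example.

```python
def pair_votes(sample, rules):
    """One vote per rule: 1 if gene_a is above gene_b within this sample, else 0."""
    return [1 if sample[a] > sample[b] else 0 for a, b in rules]


def predict_single_sample(sample, rules, tie_class=0):
    """Return (label, score); score is the fraction of rules voting for class 1."""
    votes = pair_votes(sample, rules)
    score = sum(votes) / len(votes)
    if score > 0.5:
        return 1, score
    if score < 0.5:
        return 0, score
    return tie_class, score  # tied vote: fall back to an explicit default


# Hypothetical gene names and expression values, for illustration only.
rules = [("G1", "G2"), ("G3", "G4"), ("G1", "G4")]
sample = {"G1": 5.0, "G2": 2.0, "G3": 1.0, "G4": 4.0}

# An increasing rescaling of the whole sample leaves every within-sample
# ordering, and therefore the prediction, unchanged.
rescaled = {gene: 10.0 * value + 3.0 for gene, value in sample.items()}

label, score = predict_single_sample(sample, rules)
label_rescaled, score_rescaled = predict_single_sample(rescaled, rules)
```

Because only within-sample orderings are used, no cohort-level centering or normalization enters the prediction. This invariance covers monotone rescaling of a sample only; it does not by itself guarantee that orderings are preserved across platforms, which is the caveat the abstract raises.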

AVAILABILITY AND IMPLEMENTATION

Our R package 'multiclassPairs' (https://cran.r-project.org/package=multiclassPairs) (https://doi.org/10.1093/bioinformatics/btab088) is freely available and enables easy training, prediction, and visualization using the gene-pair rule-based Random Forest SSP method and provides additional multiclass functionalities to the switchBox k-Top-Scoring Pairs package.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Chen A, Laeyendecker O, Eshleman SH, Monaco DR, Kammers K, Larman HB, Ruczinski I. A top scoring pairs classifier for recent HIV infections. Stat Med 2021;40:2604-2612. [PMID: 33660319 DOI: 10.1002/sim.8920] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/13/2020] [Revised: 01/07/2021] [Accepted: 02/03/2021] [Indexed: 11/11/2022]

Marzouka NAD, Eriksson P. multiclassPairs: an R package to train multiclass pair-based classifier. Bioinformatics 2021;37:3043-3044. [PMID: 33543757 PMCID: PMC8479681 DOI: 10.1093/bioinformatics/btab088] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/17/2020] [Revised: 01/27/2021] [Accepted: 02/02/2021] [Indexed: 02/02/2023] Open

Li X, Huang H, Zhang J, Jiang F, Guo Y, Shi Y, Guo Z, Ao L. A qualitative transcriptional signature for predicting the biochemical recurrence risk of prostate cancer patients after radical prostatectomy. Prostate 2020;80:376-387. [PMID: 31961962 PMCID: PMC7065139 DOI: 10.1002/pros.23952] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/20/2019] [Accepted: 01/02/2020] [Indexed: 12/27/2022]

Abstract

BACKGROUND

The qualitative transcriptional characteristics, the within-sample relative expression orderings (REOs) of genes, are highly robust against batch effects and sample quality variations. Hence, we develop a qualitative transcriptional signature based on REOs to predict the biochemical recurrence risk of prostate cancer (PCa) patients after radical prostatectomy.

METHODS

Gene pairs with REOs significantly correlated with the biochemical recurrence-free survival (BFS) were identified from 131 PCa samples in the training data set. From these gene pairs, we selected a qualitative transcriptional signature based on the within-sample REOs of gene pairs which could predict the recurrence risk of PCa patients after radical prostatectomy.

RESULTS

A signature consisting of 74 gene pairs, named 74-GPS, was developed for predicting the recurrence risk of PCa patients after radical prostatectomy based on the majority voting rule that a sample was assigned as high risk when at least 37 gene pairs of the 74-GPS voted for high risk; otherwise, low risk. The signature was validated in six independent datasets produced by different platforms. In each of the validation datasets, the Kaplan-Meier survival analysis showed that the average BFS of the low-risk group was significantly better than that of the high-risk group. Analyses of multiomics data of PCa samples from TCGA suggested that both the epigenomic and genomic alterations could cause the reproducible transcriptional differences between the two different prognostic groups.
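The voting threshold described here (high risk when at least 37 of the 74 pairs vote for high risk) can be written out as a short Python sketch. The gene pairs below are placeholders, not the published signature, and the convention that a pair votes for high risk when its first gene is expressed above its second is an assumption of the example.

```python
def count_high_risk_votes(sample, gene_pairs):
    """Number of pairs whose within-sample ordering votes for high risk."""
    return sum(1 for a, b in gene_pairs if sample[a] > sample[b])


def risk_group(sample, gene_pairs, min_high_votes):
    """'high' when at least min_high_votes pairs vote for high risk, else 'low'."""
    votes = count_high_risk_votes(sample, gene_pairs)
    return "high" if votes >= min_high_votes else "low"


# 74 placeholder pairs standing in for the signature (hypothetical names).
gene_pairs = [(f"A{i}", f"B{i}") for i in range(74)]


def make_sample(n_high):
    """Toy sample in which exactly the first n_high pairs vote for high risk."""
    sample = {}
    for i, (a, b) in enumerate(gene_pairs):
        sample[a], sample[b] = (2.0, 1.0) if i < n_high else (1.0, 2.0)
    return sample
```

With these definitions, `risk_group(make_sample(37), gene_pairs, 37)` falls on the high-risk side of the threshold and `make_sample(36)` on the low-risk side, which mirrors the "at least 37" wording of the rule.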

CONCLUSIONS

The proposed qualitative transcriptional signature can robustly stratify PCa patients after radical prostatectomy into two groups with different recurrence risk and distinct multiomics characteristics. Hence, 74-GPS may serve as a helpful tool for guiding the management of PCa patients with radical prostatectomy at the individual level.

Affiliation(s)

Xiang Li Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Medical Bioinformatics, Fujian Medical University, Fuzhou, China; Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, China
Haiyan Huang Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
Jiahui Zhang Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
Fengle Jiang Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
Yating Guo Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
Yidan Shi Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
Zheng Guo Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Medical Bioinformatics, Fujian Medical University, Fuzhou, China; Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, China
Lu Ao Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Medical Bioinformatics, Fujian Medical University, Fuzhou, China; Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, China

Krzhizhanovskaya VV, Závodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, Teixeira J. Tree Based Advanced Relative Expression Analysis. LECTURE NOTES IN COMPUTER SCIENCE 2020. [PMCID: PMC7304016 DOI: 10.1007/978-3-030-50420-5_37] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]

Rashid NU, Peng XL, Jin C, Moffitt RA, Volmar KE, Belt BA, Panni RZ, Nywening TM, Herrera SG, Moore KJ, Hennessey SG, Morrison AB, Kawalerski R, Nayyar A, Chang AE, Schmidt B, Kim HJ, Linehan DC, Yeh JJ. Purity Independent Subtyping of Tumors (PurIST), A Clinically Robust, Single-sample Classifier for Tumor Subtyping in Pancreatic Cancer. Clin Cancer Res 2019;26:82-92. [PMID: 31754050 DOI: 10.1158/1078-0432.ccr-19-1467] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Received: 05/06/2019] [Revised: 07/10/2019] [Accepted: 10/01/2019] [Indexed: 12/20/2022]

Affiliation(s)

Naim U Rashid Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Xianlu L Peng Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Chong Jin Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Richard A Moffitt Department of Biomedical Informatics and Pathology, Stony Brook University, Stony Brook, New York; Department of Pharmacological Sciences, Stony Brook Cancer Center, Stony Brook University, Stony Brook, New York
Keith E Volmar University of North Carolina-Rex Healthcare, Raleigh, North Carolina
Brian A Belt Department of Surgery, University of Rochester, Rochester, New York
Roheena Z Panni Department of Surgery, Washington University, Saint Louis, St. Louis, Missouri
Timothy M Nywening Department of Surgery, Washington University, Saint Louis, St. Louis, Missouri
Silvia G Herrera Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Kristin J Moore Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Sarah G Hennessey Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Ashley B Morrison Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Ryan Kawalerski Department of Biomedical Informatics and Pathology, Stony Brook University, Stony Brook, New York
Apoorve Nayyar Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Audrey E Chang Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Benjamin Schmidt Department of Surgery, Washington University, Saint Louis, St. Louis, Missouri
Hong Jin Kim Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
David C Linehan Department of Surgery, University of Rochester, Rochester, New York
Jen Jen Yeh Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina

Rashid NU, Li Q, Yeh JJ, Ibrahim JG. Modeling Between-Study Heterogeneity for Improved Replicability in Gene Signature Selection and Clinical Prediction. J Am Stat Assoc 2019;115:1125-1138. [PMID: 33012902 DOI: 10.1080/01621459.2019.1671197] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 02/08/2023]

Afsari B, Guo T, Considine M, Florea L, Kagohara LT, Stein-O'Brien GL, Kelley D, Flam E, Zambo KD, Ha PK, Geman D, Ochs MF, Califano JA, Gaykalova DA, Favorov AV, Fertig EJ. Splice Expression Variation Analysis (SEVA) for inter-tumor heterogeneity of gene isoform usage in cancer. Bioinformatics 2019;34:1859-1867. [PMID: 29342249 DOI: 10.1093/bioinformatics/bty004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/09/2017] [Accepted: 01/10/2018] [Indexed: 12/22/2022] Open

Abstract

Motivation

Current bioinformatics methods to detect changes in gene isoform usage in distinct phenotypes compare the relative expected isoform usage in phenotypes. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can have broken regulation of splicing that increases the heterogeneity of the expression of splice variants. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches.

Results

We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage, and we benchmark SEVA's performance against EBSeq, DiffSplice and rMATS, which model differential isoform usage instead of heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes and anticipated increased heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data.

Availability and implementation

SEVA is implemented in the R/Bioconductor package GSReg.

Contact

bahman@jhu.edu or favorov@sensi.org or ejfertig@jhmi.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

Johnson KW, Glicksberg BS, Shameer K, Vengrenyuk Y, Krittanawong C, Russak AJ, Sharma SK, Narula JN, Dudley JT, Kini AS. A transcriptomic model to predict increase in fibrous cap thickness in response to high-dose statin treatment: Validation by serial intracoronary OCT imaging. EBioMedicine 2019;44:41-49. [PMID: 31126891 PMCID: PMC6607084 DOI: 10.1016/j.ebiom.2019.05.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/15/2019] [Revised: 04/15/2019] [Accepted: 05/03/2019] [Indexed: 02/04/2023] Open

A new data analysis method based on feature linear combination. J Biomed Inform 2019;94:103173. [PMID: 30965135 DOI: 10.1016/j.jbi.2019.103173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/13/2018] [Revised: 04/02/2019] [Accepted: 04/06/2019] [Indexed: 01/15/2023]

Abstract

In biological data, feature relationships are complex and diverse, and they can reflect physiological and pathological changes. Defining simple and efficient classification rules based on feature relationships is helpful for discriminating different conditions and studying disease mechanisms. The popular data analysis method k top scoring pairs (k-TSP) explores feature relationships by focusing on the difference in the relative levels of two features between groups, and classifies samples on that basis. To define more efficient classification rules, we propose a new data analysis method based on the linear combination of k > 0 top scoring pairs (LC-k-TSP). LC-k-TSP applies a support vector machine (SVM) to define the best linear relationship for each feature pair, scores feature pairs by the discriminative ability of the corresponding linear combinations, and selects k disjoint top scoring pairs to construct an ensemble classifier. Experiments on twelve public datasets showed the superiority of LC-k-TSP over k-TSP, which evaluates the relationship of every two features in the same way. The experiments also showed that LC-k-TSP performed similarly to SVM and random forest (RF) in accuracy. Because LC-k-TSP learns a unique linear combination for each feature pair and defines simple classification rules, its results are easy to interpret biomedically. Finally, we applied LC-k-TSP to hepatocellular carcinoma (HCC) metabolomics data to define simple classification rules for discriminating different liver diseases. It obtained accuracy rates of 89.76% and 89.13% in distinguishing between small HCC and hepatic cirrhosis (CIR) groups and between HCC and CIR groups, respectively, superior to the 87.99% and 80.35% obtained by k-TSP. Hence, defining classification rules based on feature relationships is an effective way to analyze biological data. LC-k-TSP, which checks each feature pair by its own best linear relationship, is superior to k-TSP, which checks every pair by the same linear relationship. Availability and implementation: http://www.402.dicp.ac.cn/download_ok_4.htm.

Sjöström M, Staaf J, Edén P, Wärnberg F, Bergh J, Malmström P, Fernö M, Niméus E, Fredriksson I. Identification and validation of single-sample breast cancer radiosensitivity gene expression predictors. Breast Cancer Res 2018;20:64. [PMID: 29973242 PMCID: PMC6033283 DOI: 10.1186/s13058-018-0978-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 01/15/2018] [Accepted: 05/08/2018] [Indexed: 02/12/2023] Open

Abstract

BACKGROUND

Adjuvant radiotherapy is the standard of care after breast-conserving surgery for primary breast cancer, despite a majority of patients being over- or under-treated. In contrast to adjuvant endocrine therapy and chemotherapy, no diagnostic tests are in clinical use that can stratify patients for adjuvant radiotherapy. This study presents the development and validation of a targeted gene expression assay to predict the risk of ipsilateral breast tumor recurrence and response to adjuvant radiotherapy after breast-conserving surgery in primary breast cancer.

METHODS

Fresh-frozen primary tumors from 336 patients radically (clear margins) operated on with breast-conserving surgery with or without radiotherapy were collected. Patients were split into a discovery cohort (N = 172) and a validation cohort (N = 164). Genes predicting ipsilateral breast tumor recurrence in an Illumina HT12 v4 whole transcriptome analysis were combined with genes identified in the literature (248 genes in total) to develop a targeted radiosensitivity assay on the Nanostring nCounter platform. Single-sample predictors for ipsilateral breast tumor recurrence based on a k-top scoring pairs algorithm were trained, stratified for estrogen receptor (ER) status and radiotherapy. Two previously published profiles, the radiosensitivity signature of Speers et al., and the 10-gene signature of Eschrich et al., were also included in the targeted panel.

RESULTS

Derived single-sample predictors were prognostic for ipsilateral breast tumor recurrence in radiotherapy-treated ER+ patients (AUC = 0.67, p = 0.01), ER+ patients without radiotherapy (AUC = 0.89, p = 0.02), and radiotherapy-treated ER- patients (AUC = 0.78, p < 0.001). Among ER+ patients, radiotherapy had an excellent effect on tumors classified as radiosensitive (p < 0.001), while radiotherapy had no effect on tumors classified as radioresistant (p = 0.36) and there was a high risk of ipsilateral breast tumor recurrence (55% at 10 years). Our single-sample predictors developed in ER+ tumors and the radiosensitivity signature correlated with proliferation, while single-sample predictors developed in ER- tumors correlated with immune response. The 10-gene signature negatively correlated with both proliferation and immune response.

CONCLUSIONS

Our targeted single-sample predictors were prognostic for ipsilateral breast tumor recurrence and have the potential to stratify patients for adjuvant radiotherapy. The correlation of models with biology may explain the different performance in subgroups of breast cancer.

Kim S, Lin CW, Tseng GC. MetaKTSP: a meta-analytic top scoring pair method for robust cross-study validation of omics prediction analysis. Bioinformatics 2016;32:1966-73. [PMID: 27153719 DOI: 10.1093/bioinformatics/btw115] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 02/18/2015] [Accepted: 02/19/2016] [Indexed: 01/08/2023] Open

Abstract

MOTIVATION

Supervised machine learning is widely applied to transcriptomic data to predict disease diagnosis, prognosis or survival. Robust and interpretable classifiers with high accuracy are usually favored for their clinical and translational potential. The top scoring pair (TSP) algorithm is an example that applies a simple rank-based algorithm to identify rank-altered gene pairs for classifier construction. Although many classification methods perform well in cross-validation of a single expression profile, performance usually drops greatly in cross-study validation (i.e. the prediction model is established in the training study and applied to an independent test study) for all machine learning methods, including TSP. The failure of cross-study validation has largely diminished the potential translational and clinical value of the models. The purpose of this article is to develop a meta-analytic top scoring pair (MetaKTSP) framework that combines multiple transcriptomic studies and generates a robust prediction model applicable to independent test studies.

RESULTS

We proposed two frameworks, by averaging TSP scores or by combining P-values from individual studies, to select the top gene pairs for model construction. We applied the proposed methods in simulated data sets and three large-scale real applications in breast cancer, idiopathic pulmonary fibrosis and pan-cancer methylation. The result showed superior performance of cross-study validation accuracy and biomarker selection for the new meta-analytic framework. In conclusion, combining multiple omics data sets in the public domain increases robustness and accuracy of the classification model that will ultimately improve disease understanding and clinical treatment decisions to benefit patients.
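The first of the two frameworks, averaging TSP scores across studies, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the MetaKTSP package's algorithm: it uses a signed score, P(a < b | class 1) - P(a < b | class 0), so that a pair whose ordering flips direction between studies averages toward zero, and it ranks pairs by the absolute value of the mean. Binary labels 0 and 1 with both classes present in every study are assumed, and all function names are hypothetical.

```python
from itertools import combinations


def signed_tsp_score(a, b, labels):
    """P(a < b | class 1) - P(a < b | class 0); the sign records the direction."""
    def prop(cls):
        idx = [k for k, lab in enumerate(labels) if lab == cls]
        return sum(a[k] < b[k] for k in idx) / len(idx)
    return prop(1) - prop(0)


def mean_score_top_pairs(studies, k=1):
    """studies is a list of (features, labels); features maps gene -> values.

    Ranks the gene pairs shared by all studies by the absolute value of
    their study-averaged signed score and returns the top k as (pair, score).
    """
    genes = sorted(set.intersection(*(set(f) for f, _ in studies)))
    ranked = []
    for i, j in combinations(genes, 2):
        scores = [signed_tsp_score(f[i], f[j], y) for f, y in studies]
        ranked.append((abs(sum(scores) / len(scores)), (i, j)))
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return [(pair, score) for score, pair in ranked[:k]]
```

Restricting the search to genes present in every study is a simplification chosen for the sketch; the second framework in the abstract, combining P-values from individual studies, is not shown.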

AVAILABILITY AND IMPLEMENTATION

An R package, MetaKTSP, is available online (http://tsenglab.biostat.pitt.edu/software.htm).

CONTACT

ctseng@pitt.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Afsari B, Geman D, Fertig EJ. Learning dysregulated pathways in cancers from differential variability analysis. Cancer Inform 2014;13:61-7. [PMID: 25392694 PMCID: PMC4218688 DOI: 10.4137/cin.s14066] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 06/19/2014] [Revised: 08/13/2014] [Accepted: 08/14/2014] [Indexed: 12/16/2022] Open

Afsari B, Fertig EJ, Geman D, Marchionni L. switchBox: an R package for k-Top Scoring Pairs classifier development. Bioinformatics 2014;31:273-4. [PMID: 25262153 DOI: 10.1093/bioinformatics/btu622] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 11/13/2022]