Reference Citation Analysis

For: Hornung R, Causeur D, Bernau C, Boulesteix AL. Improving cross-study prediction through addon batch effect adjustment or addon normalization. Bioinformatics 2017;33:397-404. [PMID: 27797760 DOI: 10.1093/bioinformatics/btw650] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/03/2016] [Accepted: 10/11/2016] [Indexed: 12/22/2022] Open

Cited by Other Article(s)

Van R, Alvarez D, Mize T, Gannavarapu S, Chintham Reddy L, Nasoz F, Han MV. A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies. BMC Bioinformatics 2024;25:181. [PMID: 38720247 PMCID: PMC11080237 DOI: 10.1186/s12859-024-05801-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 05/02/2024] [Indexed: 05/12/2024] Open

Abstract

BACKGROUND

RNA sequencing combined with machine learning techniques has provided a modern approach to the molecular classification of cancer. Class predictors, reflecting the disease class, can be constructed for known tissue types using the gene expression measurements extracted from cancer patients. One challenge of current cancer predictors is that they often show suboptimal performance when integrating molecular datasets generated by different labs. The data often vary in quality, are procured differently, and contain unwanted noise that hampers the ability of a predictive model to extract useful information. Data preprocessing methods can be applied in an attempt to reduce these systematic variations and harmonize the datasets before they are used to build a machine learning model for resolving tissue of origin.

RESULTS

We aimed to investigate the impact of data preprocessing steps, focusing on normalization, batch effect correction, and data scaling, through trial and comparison. Our goal was to improve the cross-study predictions of tissue of origin for common cancers on large-scale RNA-Seq datasets derived from thousands of patients and over a dozen tumor types. The results showed that the choice of data preprocessing operations affected the performance of the associated classifier models constructed for tissue of origin predictions in cancer.

CONCLUSION

By using TCGA as a training set and applying data preprocessing methods, we demonstrated that batch effect correction improved performance measured by weighted F1-score in resolving tissue of origin against an independent GTEx test dataset. On the other hand, the use of data preprocessing operations worsened classification performance when the independent test dataset was aggregated from separate studies in ICGC and GEO. Therefore, based on our findings with these publicly available large-scale RNA-Seq datasets, the application of data preprocessing techniques to a machine learning pipeline is not always appropriate.

Rabaglino MB, Sánchez JM, McDonald M, O’Callaghan E, Lonergan P. Maternal blood transcriptome as a sensor of fetal organ maturation at the end of organogenesis in cattle. Biol Reprod 2023;109:749-758. [PMID: 37658765 PMCID: PMC10651065 DOI: 10.1093/biolre/ioad103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 07/25/2023] [Accepted: 08/31/2023] [Indexed: 09/05/2023] Open

Pogosova-Agadjanyan EL, Hua X, Othus M, Appelbaum FR, Chauncey TR, Erba HP, Fitzgibbon MP, Jenkins IC, Fang M, Lee SC, Moseley A, Naru J, Radich JP, Smith JL, Willborg BE, Willman CL, Wu F, Meshinchi S, Stirewalt DL. Verification of prognostic expression biomarkers is improved by examining enriched leukemic blasts rather than mononuclear cells from acute myeloid leukemia patients. Biomark Res 2023;11:31. [PMID: 36927800 PMCID: PMC10022072 DOI: 10.1186/s40364-023-00461-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 01/30/2023] [Indexed: 03/18/2023] Open

Abstract

BACKGROUND

Studies have not systematically compared the ability to verify the performance of prognostic transcripts in paired bulk mononuclear cells versus viable CD34-expressing leukemic blasts from patients with acute myeloid leukemia. We hypothesized that examining the homogeneous leukemic blasts would yield different biological information and might improve the prognostic performance of expression biomarkers.

METHODS

To assess the impact of cellular heterogeneity on expression biomarkers in acute myeloid leukemia, we systematically examined paired mononuclear cells and viable CD34-expressing leukemic blasts from SWOG diagnostic specimens. After enrichment, patients were assigned into discovery and validation cohorts based on availability of extracted RNA. Analyses of RNA sequencing data examined how enrichment impacted differentially expressed genes associated with pre-analytic variables, patient characteristics, and clinical outcomes.

RESULTS

Blast enrichment yielded significantly different expression profiles and biological pathways associated with clinical characteristics (e.g., cytogenetics). Although numerous differentially expressed genes were associated with clinical outcomes, most lost their prognostic significance in the mononuclear cells and blasts after adjusting for age and ELN risk, with only 11 genes remaining significant for overall survival in both cell populations (CEP70, COMMD7, DNMT3B, ECE1, LNX2, NEGR1, PIK3C2B, SEMA4D, SMAD2, TAF8, ZNF444). To examine the impact of enrichment on biomarker verification, these 11 candidate biomarkers were examined by quantitative RT-PCR in the validation cohort. After adjusting for ELN risk and age, expression of 4 genes (CEP70, DNMT3B, ECE1, and PIK3C2B) remained significantly associated with overall survival in the blasts, while none met statistical significance in mononuclear cells.

CONCLUSIONS

This study provides insights into biological information gained/lost by examining viable CD34-expressing leukemic blasts versus mononuclear cells from the same patient and shows an improved verification rate for expression biomarkers in blasts.

Affiliation(s)

Era L Pogosova-Agadjanyan: Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, D5-112, Seattle, WA, 98109, USA
Xing Hua: SWOG Statistical Center, Fred Hutchinson Cancer Center, Seattle, WA, USA
Megan Othus: SWOG Statistical Center, Fred Hutchinson Cancer Center, Seattle, WA, USA
Frederick R Appelbaum: Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, D5-112, Seattle, WA, 98109, USA; Departments of Oncology and Hematology, University of Washington, Seattle, WA, USA
Thomas R Chauncey: Departments of Oncology and Hematology, University of Washington, Seattle, WA, USA; VA Puget Sound Health Care System, Seattle, WA, USA
Harry P Erba: Duke Cancer Institute, Durham, NC, USA
Matthew P Fitzgibbon: Bioinformatics Shared Resource, Fred Hutchinson Cancer Center, Seattle, WA, USA
Isaac C Jenkins: Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, D5-112, Seattle, WA, 98109, USA; Clinical Biostatistics, Fred Hutchinson Cancer Center, Seattle, WA, USA
Min Fang: Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, D5-112, Seattle, WA, 98109, USA
Stanley C Lee: Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, D5-112, Seattle, WA, 98109, USA
Anna Moseley: SWOG Statistical Center, Fred Hutchinson Cancer Center, Seattle, WA, USA
Jasmine Naru: Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, D5-112, Seattle, WA, 98109, USA
Jerald P Radich: Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, D5-112, Seattle, WA, 98109, USA; Departments of Oncology and Hematology, University of Washington, Seattle, WA, USA
Jenny L Smith: Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, D5-112, Seattle, WA, 98109, USA; Department of Pediatrics, University of Washington, Seattle, WA, USA
Brooke E Willborg: Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, D5-112, Seattle, WA, 98109, USA
Cheryl L Willman: Department of Laboratory Medicine and Pathology, Mayo Clinic Comprehensive Cancer Center, Rochester, MN, USA
Feinan Wu: Bioinformatics Shared Resource, Fred Hutchinson Cancer Center, Seattle, WA, USA
Soheil Meshinchi: Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, D5-112, Seattle, WA, 98109, USA; Department of Pediatrics, University of Washington, Seattle, WA, USA
Derek L Stirewalt: Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, D5-112, Seattle, WA, 98109, USA; Departments of Oncology and Hematology, University of Washington, Seattle, WA, USA

Rabaglino MB, Salilew-Wondim D, Zolini A, Tesfaye D, Hoelker M, Lonergan P, Hansen PJ. Machine-learning methods applied to integrated transcriptomic data from bovine blastocysts and elongating conceptuses to identify genes predictive of embryonic competence. FASEB J 2023;37:e22809. [PMID: 36753406 DOI: 10.1096/fj.202201977r] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/27/2022] [Revised: 01/13/2023] [Accepted: 01/26/2023] [Indexed: 02/09/2023]

Abstract

Early pregnancy loss markedly impacts reproductive efficiency in cattle. The objectives were to model a biologically relevant gene signature predicting embryonic competence for survival after integrating transcriptomic data from blastocysts and elongating conceptuses with different developmental capacities and to validate the potential biomarkers with independent embryonic data sets through the application of machine-learning algorithms. First, two data sets from in vivo-produced blastocysts competent or not to sustain a pregnancy were integrated with a data set from long and short day-15 conceptuses. A statistical contrast determined differentially expressed genes (DEG) increasing in expression from a competent blastocyst to a long conceptus and vice versa; these were enriched for KEGG pathways related to glycolysis/gluconeogenesis and RNA processing, respectively. Next, the most discriminative DEG between blastocysts that resulted or did not in pregnancy were selected by linear discriminant analysis. These eight putative biomarker genes were validated by modeling their expression in competent or noncompetent blastocysts through Bayesian logistic regression or neural networks and predicting embryo developmental fate in four external data sets consisting of in vitro-produced blastocysts (i) competent or not, or (ii) exposed or not to detrimental conditions during culture, and elongated conceptuses (iii) of different length, or (iv) developed in the uteri of high- or subfertile heifers. Predictions for each data set were more than 85% accurate, suggesting that these genes play a key role in embryo development and pregnancy establishment. In conclusion, this study integrated transcriptomic data from seven independent experiments to identify a small set of genes capable of predicting embryonic competence for survival.

Rabaglino MB, O’Doherty A, Bojsen-Møller Secher J, Lonergan P, Hyttel P, Fair T, Kadarmideen HN. Application of multi-omics data integration and machine learning approaches to identify epigenetic and transcriptomic differences between in vitro and in vivo produced bovine embryos. PLoS One 2021;16:e0252096. [PMID: 34029343 PMCID: PMC8143403 DOI: 10.1371/journal.pone.0252096] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/26/2021] [Accepted: 05/09/2021] [Indexed: 01/16/2023] Open

Abstract

Pregnancy rates for in vitro produced (IVP) embryos are usually lower than for embryos produced in vivo after ovarian superovulation (MOET). This is potentially due to alterations in their trophectoderm (TE), the outermost layer in physical contact with the maternal endometrium. The main objective was to apply a multi-omics data integration approach to identify both temporally differentially expressed and differentially methylated genes (DEG and DMG) between IVP and MOET embryos that could impact TE function. First, four published transcriptomic and five published epigenomic datasets were processed for data integration. Second, DEG from day 7 to days 13 and 16 and DMG from day 7 to day 17 were determined in the TE from IVP vs. MOET embryos. Third, genes that were both DE and DM were subjected to hierarchical clustering and functional enrichment analysis. Finally, findings were validated through a machine learning approach with two additional datasets from day 15 embryos. There were 1535 DEG and 6360 DMG, with 490 overlapping genes, whose expression profiles at days 13 and 16 resulted in three main clusters. Cluster 1 (188) and Cluster 2 (191) genes were down-regulated at day 13 or day 16, respectively, while Cluster 3 genes (111) were up-regulated at both days, in IVP embryos compared to MOET embryos. The top enriched terms were the KEGG pathway "focal adhesion" in Cluster 1 (FDR = 0.003) and the cellular component "extracellular exosome" in Cluster 2 (FDR < 0.0001), also enriched in Cluster 1 (FDR = 0.04). According to the machine learning approach, genes in Cluster 1 showed a similar expression pattern between IVP and less developed (short) MOET conceptuses, and between MOET and DKK1-treated (advanced) IVP conceptuses. In conclusion, these results suggest that early conceptuses derived from IVP embryos exhibit epigenomic and transcriptomic changes that later affect their elongation and focal adhesion, impairing post-transfer survival.

Machine learning approach to integrated endometrial transcriptomic datasets reveals biomarkers predicting uterine receptivity in cattle at seven days after estrous. Sci Rep 2020;10:16981. [PMID: 33046742 PMCID: PMC7550564 DOI: 10.1038/s41598-020-72988-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/17/2020] [Accepted: 09/07/2020] [Indexed: 12/12/2022] Open

Mazzoni G, Pedersen HS, Rabaglino MB, Hyttel P, Callesen H, Kadarmideen HN. Characterization of the endometrial transcriptome in early diestrus influencing pregnancy status in dairy cattle after transfer of in vitro-produced embryos. Physiol Genomics 2020;52:269-279. [PMID: 32508252 DOI: 10.1152/physiolgenomics.00027.2020] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/21/2022] Open

Samaga D, Hornung R, Braselmann H, Hess J, Zitzelsberger H, Belka C, Boulesteix AL, Unger K. Single-center versus multi-center data sets for molecular prognostic modeling: a simulation study. Radiat Oncol 2020;15:109. [PMID: 32410693 PMCID: PMC7227093 DOI: 10.1186/s13014-020-01543-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/19/2019] [Accepted: 04/22/2020] [Indexed: 02/07/2023] Open

Abstract

Background

Prognostic models based on high-dimensional omics data generated from clinical patient samples, such as tumor tissues or biopsies, are increasingly used for prognosis of radio-therapeutic success. The model development process requires two independent data sets, one for discovery and one for validation. Each of them may contain samples collected in a single center or a collection of samples from multiple centers. Multi-center data tend to be more heterogeneous than single-center data but are less affected by potential site-specific biases. Optimal use of limited data resources for discovery and validation with respect to the expected success of a study requires dispassionate, objective decision-making. In this work, we addressed the impact of the choice of single-center and multi-center data as discovery and validation data sets, and assessed how this impact depends on three data characteristics: signal strength, number of informative features, and sample size.

Methods

We set up a simulation study to quantify the predictive performance of a model trained and validated on different combinations of in silico single-center and multi-center data. The standard bioinformatical analysis workflow of batch correction, feature selection and parameter estimation was emulated. For the determination of model quality, four measures were used: false discovery rate, prediction error, chance of successful validation (significant correlation of predicted and true validation data outcome) and model calibration.
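
To make one of these quality measures concrete, the sketch below computes a false discovery rate for feature selection in a simulation where the truly informative features are known. It assumes the rate is the share of selected features that are not informative; the study's exact definition may differ, and the function and feature names are hypothetical.

```python
def false_discovery_rate(selected, informative):
    """Share of selected features that are not truly informative.

    Assumed definition for illustration; returns 0.0 when nothing is selected.
    """
    selected = set(selected)
    if not selected:
        return 0.0
    false_hits = selected - set(informative)
    return len(false_hits) / len(selected)


# Hypothetical simulated features: g1-g3 carry signal, the rest are noise.
informative = ["g1", "g2", "g3"]
selected = ["g1", "g2", "g7", "g9"]
print(false_discovery_rate(selected, informative))  # 0.5
```

In a simulation the informative set is known by construction, which is what makes this measure computable; on real patient data it can only be estimated.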

Results

In agreement with the literature on the generalizability of signatures, prognostic models fitted to multi-center data consistently outperformed their single-center counterparts when the prediction error was the quality criterion of interest. However, for low signal strengths and small sample sizes, single-center discovery sets showed superior performance with respect to false discovery rate and chance of successful validation.

Conclusions

With regard to decision making, this simulation study underlines the importance of study aims being defined precisely a priori. Minimization of the prediction error requires multi-center discovery data, whereas single-center data are preferable with respect to false discovery rate and chance of successful validation when the expected signal or sample size is low. In contrast, the choice of validation data solely affects the quality of the estimator of the prediction error, which was more precise on multi-center validation data.

Affiliation(s)

Daniel Samaga: Helmholtz Zentrum, München, Ingolstädter Landstr. 1, Neuherberg, 85764, Germany
Roman Hornung: Department of Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, Munich, 81377, Germany
Herbert Braselmann: Helmholtz Zentrum, München, Ingolstädter Landstr. 1, Neuherberg, 85764, Germany
Julia Hess: Helmholtz Zentrum, München, Ingolstädter Landstr. 1, Neuherberg, 85764, Germany; Clinical Cooperation Group Personalized Radiotherapy in Head and Neck Cancer, Helmholtz Zentrum München, Research Center for Environmental Health (GmbH), Munich, Ingolstädter Landstr. 1, Munich, 85764, Germany; Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany
Horst Zitzelsberger: Helmholtz Zentrum, München, Ingolstädter Landstr. 1, Neuherberg, 85764, Germany; Clinical Cooperation Group Personalized Radiotherapy in Head and Neck Cancer, Helmholtz Zentrum München, Research Center for Environmental Health (GmbH), Munich, Ingolstädter Landstr. 1, Munich, 85764, Germany; Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany
Claus Belka: Clinical Cooperation Group Personalized Radiotherapy in Head and Neck Cancer, Helmholtz Zentrum München, Research Center for Environmental Health (GmbH), Munich, Ingolstädter Landstr. 1, Munich, 85764, Germany; Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany
Anne-Laure Boulesteix: Department of Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, Munich, 81377, Germany
Kristian Unger: Helmholtz Zentrum, München, Ingolstädter Landstr. 1, Neuherberg, 85764, Germany; Clinical Cooperation Group Personalized Radiotherapy in Head and Neck Cancer, Helmholtz Zentrum München, Research Center for Environmental Health (GmbH), Munich, Ingolstädter Landstr. 1, Munich, 85764, Germany; Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany

Scalable Prediction of Acute Myeloid Leukemia Using High-Dimensional Machine Learning and Blood Transcriptomics. iScience 2019;23:100780. [PMID: 31918046 PMCID: PMC6992905 DOI: 10.1016/j.isci.2019.100780] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 08/26/2019] [Revised: 12/03/2019] [Accepted: 12/12/2019] [Indexed: 01/16/2023] Open

Gradin R, Lindstedt M, Johansson H. Batch adjustment by reference alignment (BARA): Improved prediction performance in biological test sets with batch effects. PLoS One 2019;14:e0212669. [PMID: 30794641 PMCID: PMC6386283 DOI: 10.1371/journal.pone.0212669] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/05/2018] [Accepted: 02/07/2019] [Indexed: 12/15/2022] Open

Yi H, Raman AT, Zhang H, Allen GI, Liu Z. Detecting hidden batch factors through data-adaptive adjustment for biological effects. Bioinformatics 2018;34:1141-1147. [PMID: 29617963 PMCID: PMC6454417 DOI: 10.1093/bioinformatics/btx635] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/05/2017] [Revised: 09/05/2017] [Accepted: 10/06/2017] [Indexed: 11/13/2022] Open