1
Lopez-Rincon A, Mendoza-Maldonado L, Martinez-Archundia M, Schönhuth A, Kraneveld AD, Garssen J, Tonda A. Machine Learning-Based Ensemble Recursive Feature Selection of Circulating miRNAs for Cancer Tumor Classification. Cancers (Basel) 2020; 12:cancers12071785. [PMID: 32635415 PMCID: PMC7407482 DOI: 10.3390/cancers12071785] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 06/03/2020] [Revised: 06/25/2020] [Accepted: 06/29/2020] [Indexed: 02/07/2023]
Abstract
Circulating microRNAs (miRNAs) are small noncoding RNA molecules that can be detected in bodily fluids without the need for major invasive procedures on patients. miRNAs have shown great promise as biomarkers for tumors, both to assess their presence and to predict their type and subtype. Recently, thanks to the availability of miRNA datasets, machine learning techniques have been successfully applied to tumor classification. The results, however, are difficult for medical experts to assess and interpret because the algorithms exploit information from thousands of miRNAs. In this work, we propose a novel technique that aims at reducing the necessary information to the smallest possible set of circulating miRNAs. The dimensionality reduction achieved reflects a very important first step in a potential, clinically actionable, circulating miRNA-based precision medicine pipeline. While it is currently under discussion whether this first step can be taken, we demonstrate here that it is possible to perform classification tasks by exploiting a recursive feature elimination procedure that integrates a heterogeneous ensemble of high-quality, state-of-the-art classifiers on circulating miRNAs. Heterogeneous ensembles can compensate for the inherent biases of classifiers by using different classification algorithms. Selecting features then further eliminates biases emerging from using data from different studies or batches, yielding more robust and reliable outcomes. The proposed approach is first tested on a tumor classification problem in order to separate 10 different types of cancer, with samples collected over 10 different clinical trials, and is later assessed on a cancer subtype classification task, with the aim of distinguishing triple negative breast cancer from other subtypes of breast cancer. Overall, the presented methodology proves to be effective and compares favorably to other state-of-the-art feature selection methods.
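The recursive elimination idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the two scoring functions below are hypothetical stand-ins for the trained classifiers the paper uses, the data are made up, and the rank-sum voting rule is an assumption chosen for simplicity.

```python
from statistics import mean, pstdev


def mean_gap_score(column, labels):
    """Stand-in scorer 1 (hypothetical): absolute gap between the class means."""
    pos = [v for v, y in zip(column, labels) if y == 1]
    neg = [v for v, y in zip(column, labels) if y == 0]
    return abs(mean(pos) - mean(neg))


def correlation_score(column, labels):
    """Stand-in scorer 2 (hypothetical): absolute Pearson correlation with the label."""
    sx, sy = pstdev(column), pstdev(labels)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(column), mean(labels)
    cov = mean((v - mx) * (y - my) for v, y in zip(column, labels))
    return abs(cov / (sx * sy))


def ensemble_rfe(samples, labels, keep, scorers=(mean_gap_score, correlation_score)):
    """Drop the feature with the lowest summed rank until `keep` features remain."""
    active = list(range(len(samples[0])))
    while len(active) > keep:
        rank_sum = {f: 0 for f in active}
        for scorer in scorers:
            scores = {f: scorer([row[f] for row in samples], labels) for f in active}
            # Rank 0 is the least useful feature according to this scorer.
            for rank, f in enumerate(sorted(active, key=scores.get)):
                rank_sum[f] += rank
        active.remove(min(active, key=rank_sum.get))
    return active


# Made-up toy data: 4 samples, 3 features, binary labels.
samples = [[0.1, 5.0, 1.0], [0.2, 5.1, 3.0], [0.9, 5.0, 2.0], [1.0, 5.1, 2.2]]
labels = [0, 0, 1, 1]
selected = ensemble_rfe(samples, labels, keep=1)
```

Summing ranks rather than raw scores keeps scorers with different scales from dominating the vote, which is one simple way to combine heterogeneous members of an ensemble.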
Affiliation(s)
- Alejandro Lopez-Rincon
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
- Lucero Mendoza-Maldonado
  - Nuevo Hospital Civil de Guadalajara “Dr. Juan I. Menchaca”, Salvador Quevedo y Zubieta 750, Independencia Oriente, Guadalajara C.P. 44340, Jalisco, Mexico
- Marlet Martinez-Archundia
  - Laboratorio de Modelado Molecular, Bioinformática y Diseño de Fármacos, Sección de Estudios de Posgrado e Investigación, Escuela Superior de Medicina, Instituto Politécnico Nacional, Mexico City 11340, Mexico
- Alexander Schönhuth
  - Life Sciences and Health, Centrum Wiskunde & Informatica, Science Park 123, 1098 XG Amsterdam, The Netherlands
  - Genome Data Science, Faculty of Technology, Bielefeld University, Universitätsstraße 25, 33615 Bielefeld, Germany
- Aletta D. Kraneveld
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
- Johan Garssen
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
  - Global Centre of Excellence Immunology Danone Nutricia Research, Uppsalaan 12, 3584 CT Utrecht, The Netherlands
- Alberto Tonda
  - UMR 518 MIA-Paris, INRAE, Université Paris-Saclay, 75013 Paris, France
2
Talla SB, Rempel E, Endris V, Jenzer M, Allgäuer M, Schwab C, Kazdal D, Stögbauer F, Volckmar AL, Kocsmar I, Neumann O, Schirmacher P, Zschäbitz S, Duensing S, Budczies J, Stenzinger A, Kirchner M. Immuno-oncology gene expression profiling of formalin-fixed and paraffin-embedded clear cell renal cell carcinoma: Performance comparison of the NanoString nCounter technology with targeted RNA sequencing. Genes Chromosomes Cancer 2020; 59:406-416. [PMID: 32212351 DOI: 10.1002/gcc.22843] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/02/2020] [Accepted: 03/03/2020] [Indexed: 01/05/2023]
Abstract
Inflammatory gene signatures are currently being explored as predictive biomarkers for immune checkpoint blockade, and particularly for the treatment of renal cell cancers. From a diagnostic point of view, the nCounter analysis platform and targeted RNA sequencing are emerging alternatives to microarrays and comprehensive transcriptome sequencing in assessing formalin-fixed and paraffin-embedded (FFPE) cancer samples. So far, no systematic study has analyzed and compared the technical performance metrics of these two approaches. Filling this gap, we performed a head-to-head comparison of two commercially available immune gene expression assays, using clear cell renal cell cancer FFPE specimens. We compared the nCounter system that utilizes a direct hybridization technology without amplification with an NGS assay that is based on targeted RNA-sequencing with preamplification. We found that both platforms displayed high technical reproducibility and accuracy (Pearson coefficient: ≥0.96, concordance correlation coefficient [CCC]: ≥0.93). A density plot for normalized expression of shared genes on both platforms showed a comparable bi-modal distribution and dynamic range. RNA-Seq demonstrated relatively larger signaling intensity whereas the nCounter system displayed higher inter-sample variability. Estimated fold changes for all shared genes showed high correlation (Spearman coefficient: 0.73). This agreement is even better when only significantly differentially expressed genes were compared. Composite gene expression profiles, such as an interferon gamma (IFNg) signature, can be reliably inferred by both assays. In summary, our study demonstrates that focused transcript read-outs can reliably be achieved by both technologies and that both approaches achieve comparable results despite their intrinsic technical differences.
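The two agreement measures quoted in this abstract behave differently, which a short sketch can make concrete. The functions below are a minimal illustration using the standard definitions of the Pearson coefficient and Lin's concordance correlation coefficient; the expression values are invented and are not data from the study.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation: measures linear association only."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def concordance_ccc(x, y):
    """Lin's CCC: also penalises shifts and scale differences between x and y."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Invented example: platform B reads every gene exactly one unit higher.
platform_a = [1.0, 2.0, 3.0, 4.0]
platform_b = [2.0, 3.0, 4.0, 5.0]
r = pearson(platform_a, platform_b)            # perfectly linear relationship
ccc = concordance_ccc(platform_a, platform_b)  # lower, because of the offset
```

A constant offset between two platforms leaves the Pearson coefficient untouched but lowers the CCC, which is why reporting both gives a fuller picture of reproducibility.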
Affiliation(s)
- Suranand B Talla
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Eugen Rempel
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Heidelberg Partner Site, Heidelberg, Germany
- Volker Endris
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Maximilian Jenzer
  - Department of Medical Oncology, National Center for Tumor Diseases (NCT) Heidelberg, Heidelberg, Germany
- Michael Allgäuer
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Constantin Schwab
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Daniel Kazdal
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Fabian Stögbauer
  - Institute of Pathology, Technical University of Munich, Munich, Germany
- Anna-Lena Volckmar
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Ildiko Kocsmar
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Olaf Neumann
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Peter Schirmacher
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Heidelberg Partner Site, Heidelberg, Germany
- Stefanie Zschäbitz
  - Department of Medical Oncology, National Center for Tumor Diseases (NCT) Heidelberg, Heidelberg, Germany
- Stefan Duensing
  - Molecular Urooncology, Department of Urology, University Hospital Heidelberg, Heidelberg, Germany
- Jan Budczies
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Heidelberg Partner Site, Heidelberg, Germany
- Albrecht Stenzinger
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Heidelberg Partner Site, Heidelberg, Germany
- Martina Kirchner
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Heidelberg Partner Site, Heidelberg, Germany
3
Nersisyan S, Shkurnikov M, Poloznikov A, Turchinovich A, Burwinkel B, Anisimov N, Tonevitsky A. A Post-Processing Algorithm for miRNA Microarray Data. Int J Mol Sci 2020; 21:ijms21041228. [PMID: 32059403 PMCID: PMC7072892 DOI: 10.3390/ijms21041228] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/30/2019] [Revised: 02/04/2020] [Accepted: 02/10/2020] [Indexed: 11/16/2022]
Abstract
One of the main disadvantages of using DNA microarrays for miRNA expression profiling is that expression values cannot be adequately compared across different miRNAs. This leads to a large number of miRNAs with high scores that are actually not expressed in the examined samples, i.e., false positives. We propose a post-processing algorithm that scores miRNAs in the results of microarray analysis based on expression values, the time of discovery of the miRNA, and the level of correlation between the expression of the miRNA and that of the corresponding pre-miRNA in the considered samples. The algorithm was successfully validated by comparing the results of its application to miRNA microarray breast tumor samples with publicly available miRNA-seq breast tumor data. Additionally, using paired miRNA sequencing and array data, we identified possible reasons why a miRNA can appear as a false positive in a microarray study. The use of DNA microarrays for estimating miRNA expression profiles is limited by several factors. One of them is the difficulty of comparing expression values of different miRNAs. In this work, we show that the situation can be significantly improved if some additional information is taken into consideration in a comparison.
Affiliation(s)
- Stepan Nersisyan
  - Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, Leninskie Gory 1, 119991 Moscow, Russia
- Maxim Shkurnikov
  - P.A. Hertsen Moscow Oncology Research Center, Branch of National Medical Research Radiological Center, Ministry of Health of the Russian Federation, Second Botkinsky lane 3, 125284 Moscow, Russia
- Andrey Poloznikov
  - National Medical Research Radiological Center, Ministry of Health of the Russian Federation, 249036 Obninsk, Russia
  - School of Biomedicine, Far Eastern Federal University, 690922 Vladivostok, Russia
- Andrey Turchinovich
  - Molecular Epidemiology C080, German Cancer Research Center, 69120 Heidelberg, Germany
  - SciBerg e.Kfm, 68309 Mannheim, Germany
- Barbara Burwinkel
  - Molecular Epidemiology C080, German Cancer Research Center, 69120 Heidelberg, Germany
  - University Hospital Heidelberg, 69120 Heidelberg, Germany
- Nikita Anisimov
  - School of Biomedicine, Far Eastern Federal University, 690922 Vladivostok, Russia
- Alexander Tonevitsky
  - Faculty of Biology and Biotechnologies, Higher School of Economics, 117312 Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, 117997 Moscow, Russia
4
Lopez-Rincon A, Martinez-Archundia M, Martinez-Ruiz GU, Schoenhuth A, Tonda A. Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection. BMC Bioinformatics 2019; 20:480. [PMID: 31533612 PMCID: PMC6751684 DOI: 10.1186/s12859-019-3050-8] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 03/15/2019] [Accepted: 08/22/2019] [Indexed: 12/16/2022]
Abstract
Background: MicroRNAs (miRNAs) are noncoding RNA molecules heavily involved in human tumors, of which a few circulate in the human body. Finding a tumor-associated signature of miRNA, that is, the minimum miRNA entities to be measured for discriminating both different types of cancer and normal tissues, is of utmost importance. Feature selection techniques applied in machine learning can help; however, they often provide naive or biased results. Results: An ensemble feature selection strategy for miRNA signatures is proposed. miRNAs are chosen based on consensus on feature relevance from high-accuracy classifiers of different typologies. This methodology aims to identify signatures that are considerably more robust and reliable when used in clinically relevant prediction tasks. Using the proposed method, a 100-miRNA signature is identified in a dataset of 8023 samples, extracted from TCGA. When running eight state-of-the-art classifiers with the 100-miRNA signature against the original 1046 features, global accuracy differs by only 1.4%. Importantly, this 100-miRNA signature is sufficient to distinguish between tumor and normal tissues. The approach is then compared against other feature selection methods, such as UFS, RFE, EN, LASSO, Genetic Algorithms, and EFS-CLA. The proposed approach provides better accuracy when tested with 10-fold cross-validation using different classifiers, and it is applied to several GEO datasets across different platforms, with some classifiers showing more than 90% classification accuracy, which supports its cross-platform applicability. Conclusions: The 100-miRNA signature is sufficiently stable to provide almost the same classification accuracy as the complete TCGA dataset, and it is further validated on several GEO datasets, across different types of cancer and platforms. Furthermore, a bibliographic analysis confirms that 77 out of the 100 miRNAs in the signature appear in lists of circulating miRNAs used in cancer studies, in stem-loop or mature-sequence form. The remaining 23 miRNAs offer potentially promising avenues for future research.
Affiliation(s)
- Alejandro Lopez-Rincon
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, David de Wied building, Universiteitsweg 99, Utrecht, 3584 CG, The Netherlands
- Marlet Martinez-Archundia
  - Laboratorio de Modelado Molecular, Bioinformática y diseño de fármacos, Departamento de Posgrado, Escuela Superior de Medicina del Instituto Politécnico Nacional (IPN), Mexico City, Mexico
- Gustavo U Martinez-Ruiz
  - Faculty of Medicine, National Autonomous University of Mexico; Federico Gomez Children's Hospital of Mexico, Mexico City, Mexico
- Alberto Tonda
  - UMR 782 GMPA, Université Paris-Saclay, INRA, AgroParisTech, Thiverval-Grignon, France
5
Prahm KP, Høgdall C, Karlsen MA, Christensen IJ, Novotny GW, Høgdall E. Identification and validation of potential prognostic and predictive miRNAs of epithelial ovarian cancer. PLoS One 2018; 13:e0207319. [PMID: 30475821 PMCID: PMC6261038 DOI: 10.1371/journal.pone.0207319] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 08/18/2018] [Accepted: 10/29/2018] [Indexed: 12/17/2022]
Abstract
Background: Ovarian cancer is the leading cause of death by gynecologic cancers in the Western world. The aim of the study was to identify microRNAs (miRNAs) associated with prognosis and/or resistance to chemotherapy among patients with epithelial ovarian cancer. Methods: Using information from the Pelvic Mass Study we identified a cohort of women with epithelial ovarian cancer. Tumor tissues were then collected and analyzed by global miRNA microarrays. MiRNA profiling was then linked to survival and time to progression using Cox proportional-hazards regression models. Logistic regression models were used for the analysis of resistance to chemotherapy. Our results were validated using external datasets retrieved from the NCBI Gene Expression Omnibus database. Results: A total of 197 patients with epithelial ovarian cancer were included for miRNA microarray analysis. In multivariate analyses we identified a number of miRNAs significantly correlated with overall survival (miR-1183 (HR: 1.42, 95% CI: 1.17–1.74, p = 0.0005), miR-126-3p (HR: 1.38, 95% CI: 1.11–1.71, p = 0.0036)), time to progression (miR-139-3p (HR: 1.48, 95% CI: 1.13–1.94, p = 0.0047), miR-802 (HR: 0.48, 95% CI: 0.29–0.78, p = 0.0035)), progression free survival (miR-23a-5p (HR: 1.32, 95% CI: 1.09–1.61, p = 0.004), miR-23a-3p (HR: 1.70, 95% CI: 1.15–2.51, p = 0.0074), miR-802 (HR: 0.48, 95% CI: 0.29–0.80, p = 0.0048)), and resistance to chemotherapy (miR-1234 (HR: 0.26, 95% CI: 0.11–0.64, p = 0.003)). A few miRNAs identified in our training cohort were validated in external cohorts with similar results. Conclusion: Eight miRNAs were identified as significant predictors of overall survival, progression free survival, time to progression, and chemotherapy resistance. A number of these miRNAs were significantly validated using external datasets. Inter-platform and inter-laboratory variations may influence the ability to compare and reproduce miRNA results. The use of miRNAs as potential markers of relapse and survival in ovarian cancer warrants further investigation.
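The hazard ratios, confidence intervals and p-values reported above are linked on the log scale, and a short worked sketch shows how. It assumes the intervals are standard Wald intervals (symmetric around the log hazard ratio with a 1.96 critical value), which the abstract does not state; the function name is hypothetical, and rounding in the reported figures means the recovered p-value is only approximate.

```python
import math


def wald_from_hr(hr, ci_low, ci_high, z_crit=1.96):
    """Recover the log-scale standard error, z statistic and two-sided p-value.

    Assumes a Wald interval: log(HR) +/- z_crit * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# miR-1183 for overall survival, values as reported in the abstract.
se, z, p = wald_from_hr(1.42, 1.17, 1.74)
```

For miR-1183 the recovered p-value is of the same order as the reported p = 0.0005, which is a quick consistency check a reader can apply to any of the listed estimates.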
Affiliation(s)
- Kira Philipsen Prahm
  - Department of Pathology, Molecular unit, Danish CancerBiobank, Herlev University Hospital, Herlev, Denmark
  - Gynecological Clinic, The Juliane Marie Center, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Claus Høgdall
  - Gynecological Clinic, The Juliane Marie Center, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Mona Aarenstrup Karlsen
  - Department of Pathology, Molecular unit, Danish CancerBiobank, Herlev University Hospital, Herlev, Denmark
  - Gynecological Clinic, The Juliane Marie Center, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Ib Jarle Christensen
  - Department of Pathology, Molecular unit, Danish CancerBiobank, Herlev University Hospital, Herlev, Denmark
- Guy Wayne Novotny
  - Department of Pathology, Molecular unit, Danish CancerBiobank, Herlev University Hospital, Herlev, Denmark
- Estrid Høgdall
  - Department of Pathology, Molecular unit, Danish CancerBiobank, Herlev University Hospital, Herlev, Denmark
6
MicroRNA expression profiling for the prediction of resistance to neoadjuvant radiochemotherapy in squamous cell carcinoma of the esophagus. J Transl Med 2018; 16:109. [PMID: 29695253 PMCID: PMC5918871 DOI: 10.1186/s12967-018-1492-9] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 08/24/2017] [Accepted: 04/20/2018] [Indexed: 01/03/2023]
Abstract
Background: MicroRNAs (miRNAs) play an important role in cancer biology. Neoadjuvant radiochemotherapy followed by surgery is a standard treatment for locally advanced esophageal squamous cell carcinoma (ESCC). However, a subset of patients do not respond. We evaluated whether miRNA profiles can predict resistance to radiochemotherapy. Methods: Formalin-fixed, paraffin-embedded pretherapeutic biopsies of patients treated by radiochemotherapy followed by esophagectomy were analyzed. The response was determined by histopathological tumor regression grading. miRNA profiling was performed by microarray analysis (Agilent platform) in 16 non-responders and 15 responders. Differentially expressed miRNAs were confirmed by real-time quantitative PCR (qRT-PCR) in an expanded cohort of 53 cases. Results: The miRNA profiles within and between non-responders and responders were highly similar (r = 0.96, 0.94 and 0.95). However, 12 miRNAs were differentially expressed (> twofold; p ≤ 0.025): non-responders showed upregulation of hsa-miR-1323, hsa-miR-3678-3p, hsv2-miR-H7-3p, hsa-miR-194*, hsa-miR-3152, kshv-miR-K12-4-3p, hsa-miR-665 and hsa-miR-3659 and downregulation of hsa-miR-126*, hsa-miR-484, hsa-miR-330-3p and hsa-miR-3653. qRT-PCR analysis confirmed the microarray findings for hsa-miR-194* and hsa-miR-665 (p < 0.001 each) with AUC values of 0.811 (95% CI 0.694–0.927) and 0.817 (95% CI 0.704–0.930), respectively, in ROC analysis. Conclusions: Our results indicate that miRNAs are involved in the therapeutic response in ESCC and suggest that miRNA profiles could facilitate pretherapeutic patient selection.
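The AUC values reported in this abstract can be read as the probability that a randomly chosen non-responder has a higher marker level than a randomly chosen responder. The sketch below computes AUC through that pairwise definition; the expression values, the label coding (1 for non-responder, 0 for responder) and the function name are illustrative assumptions, not data or code from the study.

```python
def roc_auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented marker levels for four patients; 1 = non-responder, 0 = responder.
expression = [0.9, 0.8, 0.4, 0.3]
non_responder = [1, 0, 1, 0]
auc = roc_auc(expression, non_responder)
```

The pairwise form is quadratic in the number of cases, which is unproblematic for a cohort of a few dozen patients such as the 53 cases described here.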