1
Li C, Gao Z, Su B, Xu G, Lin X. Data analysis methods for defining biomarkers from omics data. Anal Bioanal Chem 2021; 414:235-250. [PMID: 34951658 DOI: 10.1007/s00216-021-03813-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/15/2021] [Revised: 11/26/2021] [Accepted: 11/29/2021] [Indexed: 02/01/2023]
Abstract
Omics mainly includes genomics, epigenomics, transcriptomics, proteomics and metabolomics. The rapid development of omics technology has opened up new ways to study disease diagnosis and prognosis and to define prospective information of complex diseases. Since omics data are usually large and complex, the method used to analyze the data and to define important information is crucial in omics study. In this review, we focus on advances in biomarker discovery methods based on omics data in the last decade, and categorize them as individual feature analysis, combinatorial feature analysis and network analysis. We also discuss the challenges and perspectives in this field.
Affiliation(s)
- Chao Li
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
- Zhenbo Gao
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Benzhe Su
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Guowang Xu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
- Xiaohui Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
2
Ning Z, Yu S, Zhao Y, Sun X, Wu H, Yu X. Identification of miRNA-Mediated Subpathways as Prostate Cancer Biomarkers Based on Topological Inference in a Machine Learning Process Using Integrated Gene and miRNA Expression Data. Front Genet 2021; 12:656526. [PMID: 33841512 PMCID: PMC8024646 DOI: 10.3389/fgene.2021.656526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2021] [Accepted: 03/02/2021] [Indexed: 11/25/2022]
Abstract
Accurately identifying classification biomarkers for distinguishing between normal and cancer samples is challenging. Additionally, the reproducibility of single-molecule biomarkers is limited by the existence of heterogeneous patient subgroups and differences in the sequencing techniques used to collect patient data. In this study, we developed a method to identify robust biomarkers (i.e., miRNA-mediated subpathways) associated with prostate cancer based on normal prostate samples and cancer samples from a dataset from The Cancer Genome Atlas (TCGA; n = 546) and datasets from the Gene Expression Omnibus (GEO) database (n = 139 and n = 90, with the latter being a cell line dataset). We also obtained 10 other cancer datasets to evaluate the performance of the method. We propose a multi-omics data integration strategy for identifying classification biomarkers using a machine learning method that involves reassigning topological weights to the genes using a directed random walk (DRW)-based method. A global directed pathway network (GDPN) was constructed based on the significantly differentially expressed target genes of the significantly differentially expressed miRNAs, which allowed us to identify the robust biomarkers in the form of miRNA-mediated subpathways (miRNAs). The activity value of each miRNA-mediated subpathway was calculated by integrating multiple types of data, which included the expression of the miRNA and the miRNAs’ target genes and GDPN topological information. Finally, we identified the high-frequency miRNA-mediated subpathways involved in prostate cancer using a support vector machine (SVM) model. The results demonstrated that we obtained robust biomarkers of prostate cancer, which could classify prostate cancer and normal samples. Our method outperformed seven other methods, and many of the identified biomarkers were associated with known clinical treatments.
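The directed random walk step mentioned in the abstract can be illustrated with a generic random walk with restart on a directed graph. The following is a minimal sketch, not the authors' implementation: the function name `directed_random_walk`, the restart probability of 0.7, the toy edge list, and the handling of sink nodes (their mass is returned to the restart distribution) are all assumptions made for illustration.

```python
def directed_random_walk(edges, initial_weights, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on a directed graph (illustrative sketch).

    edges: list of (source, target) pairs, e.g. miRNA -> gene or gene -> gene.
    initial_weights: dict node -> non-negative starting score (hypothetical
        input, e.g. a differential-expression statistic).
    Returns a dict node -> topological weight; the weights sum to 1.
    """
    nodes = sorted(set(initial_weights) | {n for edge in edges for n in edge})
    out_neighbors = {n: [] for n in nodes}
    for src, dst in edges:
        out_neighbors[src].append(dst)

    total = sum(initial_weights.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("initial weights must have a positive sum")
    # Restart distribution: the normalized initial scores.
    w0 = {n: initial_weights.get(n, 0.0) / total for n in nodes}
    w = dict(w0)

    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to w0.
        nxt = {n: restart * w0[n] for n in nodes}
        dangling = 0.0
        for src in nodes:
            moving = (1.0 - restart) * w[src]
            targets = out_neighbors[src]
            if targets:
                # Otherwise it follows an outgoing edge chosen uniformly.
                for dst in targets:
                    nxt[dst] += moving / len(targets)
            else:
                # Assumption of this sketch: sinks hand their mass to w0.
                dangling += moving
        for n in nodes:
            nxt[n] += dangling * w0[n]
        delta = sum(abs(nxt[n] - w[n]) for n in nodes)
        w = nxt
        if delta < tol:
            break
    return w


# Toy network: one miRNA regulating gene A, which acts on genes B and C.
edges = [("miR", "A"), ("A", "B"), ("A", "C")]
scores = {"miR": 1.0, "A": 1.0, "B": 1.0, "C": 1.0}
weights = directed_random_walk(edges, scores)
```

In this toy network, gene A ends up with a larger weight than the miRNA because it receives flow along an incoming edge in addition to its restart share. Setting `restart=1.0` disables propagation and returns the normalized initial scores unchanged.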
Affiliation(s)
- Ziyu Ning
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Shuang Yu
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
- Yanqiao Zhao
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
- Xiaoming Sun
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
- Haibin Wu
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
- Xiaoyang Yu
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
3
Liu Y, Cui Y, Bai X, Feng C, Li M, Han X, Ai B, Zhang J, Li X, Han J, Zhu J, Jiang Y, Pan Q, Wang F, Xu M, Li C, Wang Q. MiRNA-Mediated Subpathway Identification and Network Module Analysis to Reveal Prognostic Markers in Human Pancreatic Cancer. Front Genet 2020; 11:606940. [PMID: 33362865 PMCID: PMC7756031 DOI: 10.3389/fgene.2020.606940] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2020] [Accepted: 11/13/2020] [Indexed: 12/16/2022]
Abstract
Background: Pancreatic cancer (PC) remains one of the most lethal cancers. In contrast to the steady increase in survival for most cancers, the 5-year survival remains low for PC patients. Methods: We describe a new pipeline that can be used to identify prognostic molecular biomarkers by identifying miRNA-mediated subpathways associated with PC. These modules were then further extracted from a comprehensive miRNA-gene network (CMGN). An exhaustive survival analysis was performed to estimate the prognostic value of these modules. Results: We identified 105 miRNA-mediated subpathways associated with PC. Two subpathways within the MAPK signaling and cell cycle pathways were found to be highly related to PC. Of the miRNA-mRNA modules extracted from CMGN, six modules showed good prognostic performance in both independent validated datasets. Conclusions: Our study provides novel insight into the mechanisms of PC. We inferred that six miRNA-mRNA modules could serve as potential prognostic molecular biomarkers in PC based on the pipeline we proposed.
Affiliation(s)
- Yuejuan Liu
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Yuxia Cui
  - School of Nursing, Harbin Medical University, Daqing, China
- Xuefeng Bai
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Chenchen Feng
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Meng Li
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Xiaole Han
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Bo Ai
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Jian Zhang
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Xuecang Li
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Junwei Han
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jiang Zhu
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Yong Jiang
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Qi Pan
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Fan Wang
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Mingcong Xu
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Chunquan Li
  - School of Medical Informatics, Harbin Medical University, Daqing, China
- Qiuyu Wang
  - School of Medical Informatics, Harbin Medical University, Daqing, China
4
Lopez-Rincon A, Mendoza-Maldonado L, Martinez-Archundia M, Schönhuth A, Kraneveld AD, Garssen J, Tonda A. Machine Learning-Based Ensemble Recursive Feature Selection of Circulating miRNAs for Cancer Tumor Classification. Cancers (Basel) 2020; 12:1785. [PMID: 32635415 PMCID: PMC7407482 DOI: 10.3390/cancers12071785] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 06/03/2020] [Revised: 06/25/2020] [Accepted: 06/29/2020] [Indexed: 02/07/2023]
Abstract
Circulating microRNAs (miRNAs) are small noncoding RNA molecules that can be detected in bodily fluids without the need for major invasive procedures on patients. miRNAs have shown great promise as biomarkers for tumors, both to assess their presence and to predict their type and subtype. Recently, thanks to the availability of miRNA datasets, machine learning techniques have been successfully applied to tumor classification. The results, however, are difficult for medical experts to assess and interpret because the algorithms exploit information from thousands of miRNAs. In this work, we propose a novel technique that aims at reducing the necessary information to the smallest possible set of circulating miRNAs. The dimensionality reduction achieved reflects a very important first step in a potential, clinically actionable, circulating miRNA-based precision medicine pipeline. While it is currently under discussion whether this first step can be taken, we demonstrate here that it is possible to perform classification tasks by exploiting a recursive feature elimination procedure that integrates a heterogeneous ensemble of high-quality, state-of-the-art classifiers on circulating miRNAs. Heterogeneous ensembles can compensate for inherent biases of classifiers by using different classification algorithms. Selecting features then further eliminates biases emerging from using data from different studies or batches, yielding more robust and reliable outcomes. The proposed approach is first tested on a tumor classification problem in order to separate 10 different types of cancer, with samples collected over 10 different clinical trials, and later is assessed on a cancer subtype classification task, with the aim of distinguishing triple negative breast cancer from other subtypes of breast cancer. Overall, the presented methodology proves to be effective and compares favorably to other state-of-the-art feature selection methods.
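The recursive feature elimination with a heterogeneous ensemble described in the abstract can be sketched as follows. This is a hedged illustration only and not the authors' pipeline: the two scorers (`centroid_importance` and `perceptron_importance`) are deliberately simple stand-ins for the state-of-the-art classifiers the paper refers to, the rank-sum aggregation rule is an assumption, and the data are a made-up toy matrix.

```python
from statistics import mean, pstdev


def centroid_importance(X, y, features):
    """Score each feature by the standardized gap between the class means."""
    scores = {}
    for f in features:
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        spread = pstdev([row[f] for row in X]) or 1.0  # guard constant features
        scores[f] = abs(mean(pos) - mean(neg)) / spread
    return scores


def perceptron_importance(X, y, features, epochs=20):
    """Score each feature by the absolute weight of a trained perceptron."""
    w = {f: 0.0 for f in features}
    b = 0.0
    for _ in range(epochs):
        for row, label in zip(X, y):
            activation = sum(w[f] * row[f] for f in features) + b
            err = label - (1 if activation > 0 else 0)
            if err:
                for f in features:
                    w[f] += err * row[f]
                b += err
    return {f: abs(v) for f, v in w.items()}


def ensemble_rfe(X, y, n_keep, scorers=(centroid_importance, perceptron_importance)):
    """Drop one feature per round until only n_keep feature indices remain."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        rank_sum = {f: 0 for f in features}
        for scorer in scorers:
            scores = scorer(X, y, features)
            # Rank 0 is the least important feature according to this model.
            for rank, f in enumerate(sorted(features, key=lambda f: scores[f])):
                rank_sum[f] += rank
        # Eliminate the feature that the ensemble as a whole ranks lowest.
        features.remove(min(features, key=lambda f: rank_sum[f]))
    return features


# Toy data: column 0 separates the two classes; columns 1 and 2 do not.
X = [
    [5.0, 1.0, 0.2],
    [4.8, 0.0, 0.1],
    [5.2, 1.0, 0.3],
    [1.0, 0.0, 0.2],
    [0.8, 1.0, 0.1],
    [1.2, 0.0, 0.3],
]
y = [1, 1, 1, 0, 0, 0]
selected = ensemble_rfe(X, y, n_keep=1)
```

Aggregating ranks rather than raw scores keeps models with differently scaled importance values comparable, which is one simple way to combine heterogeneous learners. In practice the scorers would be replaced by real classifiers that expose feature importances or coefficients.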
Affiliation(s)
- Alejandro Lopez-Rincon
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
- Lucero Mendoza-Maldonado
  - Nuevo Hospital Civil de Guadalajara “Dr. Juan I. Menchaca”, Salvador Quevedo y Zubieta 750, Independencia Oriente, Guadalajara C.P. 44340, Jalisco, Mexico
- Marlet Martinez-Archundia
  - Laboratorio de Modelado Molecular, Bioinformática y Diseno de farmacos, Seccion de Estudios de Posgrado e Investigación, Escuela Superior de Medicina, Instituto Politécnico Nacional, Mexico City 11340, Mexico
- Alexander Schönhuth
  - Life Sciences and Health, Centrum Wiskunde & Informatica, Science Park 123, 1098 XG Amsterdam, The Netherlands
  - Genome Data Science, Faculty of Technology, Bielefeld University, Universitätsstraße 25, 33615 Bielefeld, Germany
- Aletta D. Kraneveld
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
- Johan Garssen
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
  - Global Centre of Excellence Immunology Danone Nutricia Research, Uppsalaan 12, 3584 CT Utrecht, The Netherlands
- Alberto Tonda
  - UMR 518 MIA-Paris, INRAE, Université Paris-Saclay, 75013 Paris, France
5
Corrigendum to: Topologically inferring active miRNA-mediated subpathways toward precise cancer classification by directed random walk. Mol Oncol 2019; 13:2512. [PMID: 31670488 PMCID: PMC6822244 DOI: 10.1002/1878-0261.12590] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]