1
|
Patrício A, Costa RS, Henriques R. Pattern-centric transformation of omics data grounded on discriminative gene associations aids predictive tasks in TCGA while ensuring interpretability. Biotechnol Bioeng 2024. [PMID: 38859573 DOI: 10.1002/bit.28758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2023] [Revised: 02/07/2024] [Accepted: 05/18/2024] [Indexed: 06/12/2024]
Abstract
The increasing prevalence of omics data sources is pushing the study of regulatory mechanisms underlying complex diseases such as cancer. However, the vast quantities of molecular features produced and the inherent interplay between them lead to a level of complexity that hampers both descriptive and predictive tasks, requiring custom-built algorithms that can extract relevant information from these sources of data. We propose a transformation that moves data centered on molecules (e.g., transcripts and proteins) to a new data space focused on putative regulatory modules given by statistically relevant co-expression patterns. To this end, the proposed transformation extracts patterns from the data through biclustering and uses them to create new variables with guarantees of interpretability and discriminative power. The transformation is shown to achieve dimensionality reductions of up to 99% and increase predictive performance of various classifiers across multiple omics layers. Results suggest that omics data transformations from gene-centric to pattern-centric data supports both prediction tasks and human interpretation, notably contributing to precision medicine applications.
Collapse
Affiliation(s)
- André Patrício
- INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, NOVA University Lisbon, Caparica, Portugal
| | - Rafael S Costa
- LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, NOVA University Lisbon, Caparica, Portugal
| | - Rui Henriques
- INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
| |
Collapse
|
2
|
Zhang Z, Liu ZP. Robust biomarker discovery for hepatocellular carcinoma from high-throughput data by multiple feature selection methods. BMC Med Genomics 2021; 14:112. [PMID: 34433487 PMCID: PMC8386074 DOI: 10.1186/s12920-021-00957-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Accepted: 04/08/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Hepatocellular carcinoma (HCC) is one of the most common cancers. The discovery of specific genes severing as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by the cancer genome atlas (TCGA) consortium provides a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated the robustness of identification with different feature selection techniques. METHODS We use six different recursive feature elimination methods to select the gene signiatures of HCC from TCGA liver cancer data. The genes shared in the six selected subsets are proposed as robust biomarkers. Akaike information criterion (AIC) is employed to explain the optimization process of feature selection, which provides a statistical interpretation for the feature selection in machine learning methods. And we use several methods to validate the screened biomarkers. RESULTS In this paper, we propose a robust method for discovering biomarker genes for HCC from gene expression data. Specifically, we implement recursive feature elimination cross-validation (RFE-CV) methods based on six different classication algorithms. The overlaps in the discovered gene sets via different methods are referred as the identified biomarkers. We give an interpretation of the feature selection process based on machine learning using AIC in statistics. Furthermore, the features selected by the backward logistic stepwise regression via AIC minimum theory are completely contained in the identified biomarkers. Through the classification results, the superiority of interpretable robust biomarker discovery method is verified. CONCLUSIONS It is found that overlaps among gene subsets contain different quantitative features selected by the RFE-CV of 6 classifiers. The AIC values in the model selection provide a theoretical foundation for the feature selection process of biomarker discovery via machine learning. What's more, genes containing in more optimally selected subsets make better biological sense and implication. The quality of feature selection is improved by the intersections of biomarkers selected from different classifiers. This is a general method suitable for screening biomarkers of complex diseases from high-throughput data.
Collapse
Affiliation(s)
- Zishuang Zhang
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China. .,Center for Intelligent Medicine, Shandong University, Jinan, 250061, Shandong, China.
| |
Collapse
|
3
|
Zhi D, Zhao Z, Li F, Wu Z, Liu X, Wang K. The International Conference on Intelligent Biology and Medicine (ICIBM) 2018: genomics meets medicine. BMC Med Genomics 2019; 12:20. [PMID: 30704510 PMCID: PMC6357345 DOI: 10.1186/s12920-018-0448-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
During June 10–12, 2018, the International Conference on Intelligent Biology and Medicine (ICIBM 2018) was held in Los Angeles, California, USA. The conference included 11 scientific sessions, four tutorials, one poster session, four keynote talks and four eminent scholar talks that covered a wide range of topics ranging from 3D genome structure analysis and visualization, next generation sequencing analysis, computational drug discovery, medical informatics, cancer genomics to systems biology. While medical genomics has always been a main theme in ICIBM, this year we for the first time organized the BMC Medical Genomics Supplement for ICIBM. Here, we describe 15 ICIBM papers selected for publishing in BMC Medical Genomics.
Collapse
Affiliation(s)
- Degui Zhi
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
| | - Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Fuhai Li
- Department of Biomedical Informatics, Ohio State University, Columbus, OH, 43210, USA
| | - Zhijin Wu
- Department of Biostatistics, Brown University, Providence, RI, 02912, USA
| | - Xiaoming Liu
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.,Present address: College of Public Health, University of South Florida, Tampa, FL, 33612, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA. .,Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
Collapse
|