1. Vagliano I, Kingma MY, Dongelmans DA, de Lange DW, de Keizer NF, Schut MC. Automated identification of patient subgroups: A case-study on mortality of COVID-19 patients admitted to the ICU. Comput Biol Med 2023; 163:107146. [PMID: 37356293 PMCID: PMC10266884 DOI: 10.1016/j.compbiomed.2023.107146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/27/2023]
Abstract
BACKGROUND - Subgroup discovery (SGD) is the automated splitting of the data into complex subgroups. Various SGD methods have been applied to the medical domain, but none have been extensively evaluated. We assess the numerical and clinical quality of SGD methods. METHOD - We applied the improved Subgroup Set Discovery (SSD++), Patient Rule Induction Method (PRIM) and APRIORI - Subgroup Discovery (APRIORI-SD) algorithms to obtain patient subgroups on observational data of 14,548 COVID-19 patients admitted to 73 Dutch intensive care units. Hospital mortality was the clinical outcome. Numerical significance of the subgroups was assessed with information-theoretic measures. Clinical significance of the subgroups was assessed by comparing variable importance on population and subgroup levels and by expert evaluation. RESULTS - The tested algorithms varied widely in the total number of discovered subgroups (5-62), the number of selected variables, and the predictive value of the subgroups. Qualitative assessment showed that the found subgroups make clinical sense. SSD++ found most subgroups (n = 62), which added predictive value and generally showed high potential for clinical use. APRIORI-SD and PRIM found fewer subgroups (n = 5 and 6), which did not add predictive value and were clinically less relevant. CONCLUSION - Automated SGD methods find clinical subgroups that are relevant when assessed quantitatively (yield added predictive value) and qualitatively (intensivists consider the subgroups significant). Different methods yield different subgroups with varying degrees of predictive performance and clinical quality. External validation is needed to generalize the results to other populations and future research should explore which algorithm performs best in other settings.
Affiliation(s)
- I Vagliano
  - Dept. of Medical Informatics, Amsterdam UMC, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, the Netherlands; Amsterdam Public Health (APH), Postbus 7057, 1007 MB, Amsterdam, the Netherlands
- M Y Kingma
  - Dept. of Medical Informatics, Amsterdam UMC, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, the Netherlands
- D A Dongelmans
  - Amsterdam Public Health (APH), Postbus 7057, 1007 MB, Amsterdam, the Netherlands; Dept. of Intensive Care Medicine, Amsterdam UMC, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, the Netherlands; National Intensive Care Evaluation (NICE) Foundation, Postbus 23640, 1100 EC, Amsterdam, the Netherlands
- D W de Lange
  - National Intensive Care Evaluation (NICE) Foundation, Postbus 23640, 1100 EC, Amsterdam, the Netherlands; Dept. of Intensive Care, University Medical Center Utrecht, University Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, the Netherlands
- N F de Keizer
  - Dept. of Medical Informatics, Amsterdam UMC, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, the Netherlands; Amsterdam Public Health (APH), Postbus 7057, 1007 MB, Amsterdam, the Netherlands; National Intensive Care Evaluation (NICE) Foundation, Postbus 23640, 1100 EC, Amsterdam, the Netherlands
- M C Schut
  - Dept. of Medical Informatics, Amsterdam UMC, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, the Netherlands; Amsterdam Public Health (APH), Postbus 7057, 1007 MB, Amsterdam, the Netherlands; Dept. of Clinical Chemistry, Amsterdam UMC, Vrije Universiteit Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, the Netherlands
2. Varela-Martínez E, Bilbao-Arribas M, Abendaño N, Asín J, Pérez M, de Andrés D, Luján L, Jugo BM. Whole transcriptome approach to evaluate the effect of aluminium hydroxide in ovine encephalon. Sci Rep 2020; 10:15240. [PMID: 32943671 PMCID: PMC7498608 DOI: 10.1038/s41598-020-71905-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/12/2019] [Accepted: 08/10/2020] [Indexed: 12/18/2022]
Abstract
Aluminium hydroxide adjuvants are crucial for livestock and human vaccines. Few studies have analysed their effect on the central nervous system in vivo. In this work, lambs received three different treatments of parallel subcutaneous inoculations over 16 months with aluminium-containing commercial vaccines, an equivalent dose of aluminium hydroxide or mock injections. Brain samples were sequenced by RNA-seq and miRNA-seq for the expression analysis of mRNAs, long non-coding RNAs and microRNAs, and three expression comparisons were made. Although few differentially expressed genes were identified, some genes dysregulated by aluminium hydroxide alone were linked to neurological functions, the lncRNA TUNA among them, or were enriched in functions related to mitochondrial energy metabolism. In the same way, miRNA expression was mainly disrupted by the adjuvant-alone treatment. Some differentially expressed miRNAs had been previously linked to neurological diseases, oxidative stress and apoptosis. In brief, in this study aluminium hydroxide alone altered the transcriptome of the encephalon to a higher degree than commercial vaccines, which had a milder effect. The expression changes in the animals inoculated with aluminium hydroxide suggest mitochondrial dysfunction. Further research is needed to elucidate to what extent these changes could have pathological consequences.
Affiliation(s)
- Endika Varela-Martínez
  - Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Leioa, Spain
- Martin Bilbao-Arribas
  - Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Leioa, Spain
- Naiara Abendaño
  - Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Leioa, Spain
- Javier Asín
  - Department of Animal Pathology, University of Zaragoza, Zaragoza, Spain
- Marta Pérez
  - Department of Animal Pathology, University of Zaragoza, Zaragoza, Spain
- Damián de Andrés
  - Institute of Agrobiotechnology (CSIC-UPNA-Gov. Navarra), Navarra, Spain
- Lluís Luján
  - Department of Animal Pathology, University of Zaragoza, Zaragoza, Spain
- Begoña M Jugo
  - Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Leioa, Spain
3. Mallik S, Zhao Z. Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data. Brief Bioinform 2020; 21:368-394. [PMID: 30649169 PMCID: PMC7373185 DOI: 10.1093/bib/bby120] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/27/2018] [Revised: 10/26/2018] [Accepted: 11/21/2018] [Indexed: 12/20/2022]
Abstract
Cancer is well recognized as a complex disease with dysregulated molecular networks or modules. Graph- and rule-based analytics have been applied extensively for cancer classification as well as prognosis using large genomic and other data over the past decade. This article provides a comprehensive review of various graph- and rule-based machine learning algorithms that have been applied to numerous genomics data to determine the cancer-specific gene modules, identify gene signature-based classifiers and carry out other related objectives of potential therapeutic value. This review focuses mainly on the methodological design and features of these algorithms to facilitate the application of these graph- and rule-based analytical approaches for cancer classification and prognosis. Based on the type of data integration, we divided all the algorithms into three categories: model-based integration, pre-processing integration and post-processing integration. Each category is further divided into four sub-categories (supervised, unsupervised, semi-supervised and survival-driven learning analyses) based on learning style. Therefore, a total of 11 categories of methods are summarized with their inputs, objectives and description, advantages and potential limitations. Next, we briefly demonstrate well-known and most recently developed algorithms for each sub-category along with salient information, such as data profiles, statistical or feature selection methods and outputs. Finally, we summarize the appropriate use and efficiency of all categories of graph- and rule mining-based learning methods when input data and specific objective are given. This review aims to help readers to select and use the appropriate algorithms for cancer classification and prognosis study.
Affiliation(s)
- Saurav Mallik
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
4. Meysman P, Saeys Y, Sabaghian E, Bittremieux W, Van de Peer Y, Goethals B, Laukens K. Mining the Enriched Subgraphs for Specific Vertices in a Biological Graph. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1496-1507. [PMID: 27295680 DOI: 10.1109/tcbb.2016.2576440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
In this paper, we present a subgroup discovery method to find subgraphs in a graph that are associated with a given set of vertices. The association between a subgraph pattern and a set of vertices is defined by its significant enrichment based on a Bonferroni-corrected hypergeometric probability value. This interestingness measure requires a dedicated pruning procedure to limit the number of subgraph matches that must be calculated. The presented mining algorithm to find associated subgraph patterns in large graphs is therefore designed to efficiently traverse the search space. We demonstrate the operation of this method by applying it to three biological graph data sets and show that we can find associated subgraphs for a biologically relevant set of vertices and that the found subgraphs themselves are biologically interesting.
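The enrichment measure named in this abstract can be illustrated with a minimal sketch. This is not the authors' mining algorithm: it only computes a hypergeometric tail probability and applies a Bonferroni correction, under the assumption that the test asks how surprising it is for a subgraph pattern to match `k` vertices of interest. The function names, parameter names, and the mapping of graph quantities onto them are hypothetical.

```python
from math import comb


def hypergeom_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    Assumed mapping (illustrative, not taken from the paper):
      N = vertices in the whole graph
      K = vertices in the given set of interest
      n = vertices matched by the subgraph pattern
      k = matched vertices that are also in the set of interest
    """
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def bonferroni_corrected(p_value: float, n_tests: int) -> float:
    """Bonferroni correction: scale by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Toy numbers: 10 vertices, 3 of interest, a pattern matching 4 vertices,
    # all 3 vertices of interest among them, and 5 patterns tested in total.
    p = hypergeom_tail(k=3, N=10, K=3, n=4)
    print(p, bonferroni_corrected(p, n_tests=5))
```

The correction factor `n_tests` would be the number of candidate patterns evaluated, which is why the abstract notes that a pruning procedure is needed to limit how many subgraph matches are calculated.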
5. Mrzic A, Meysman P, Bittremieux W, Moris P, Cule B, Goethals B, Laukens K. Grasping frequent subgraph mining for bioinformatics applications. BioData Min 2018; 11:20. [PMID: 30202444 PMCID: PMC6122726 DOI: 10.1186/s13040-018-0181-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 12/19/2017] [Accepted: 08/13/2018] [Indexed: 11/18/2022]
Abstract
Searching for interesting common subgraphs in graph data is a well-studied problem in data mining. Subgraph mining techniques focus on the discovery of patterns in graphs that exhibit a specific network structure that is deemed interesting within these data sets. The definition of which subgraphs are interesting and which are not is highly dependent on the application. These techniques have seen numerous applications and are able to tackle a range of biological research questions, spanning from the detection of common substructures in sets of biomolecular compounds, to the discovery of network motifs in large-scale molecular interaction networks. Thus far, information about the bioinformatics application of subgraph mining remains scattered over heterogeneous literature. In this review, we provide an introduction to subgraph mining for life scientists. We give an overview of various subgraph mining algorithms from a bioinformatics perspective and present several of their potential biomedical applications.
Affiliation(s)
- Aida Mrzic
  - Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Antwerp, Belgium
- Pieter Meysman
  - Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Antwerp, Belgium
- Wout Bittremieux
  - Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Antwerp, Belgium
- Pieter Moris
  - Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Antwerp, Belgium
- Boris Cule
  - Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Bart Goethals
  - Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Antwerp, Belgium
6. Jeng KS, Chang CF, Jeng WJ, Sheen IS, Jeng CJ. Heterogeneity of hepatocellular carcinoma contributes to cancer progression. Crit Rev Oncol Hematol 2015; 94:337-47. [PMID: 25680939 DOI: 10.1016/j.critrevonc.2015.01.009] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 06/27/2014] [Revised: 10/24/2014] [Accepted: 01/21/2015] [Indexed: 01/10/2023]
Abstract
Hepatocellular carcinoma (HCC) is a highly heterogeneous disease displaying differences in angiogenesis, extracellular matrix proteins, the immune microenvironment and tumor cell populations. Additionally, genetic variations and epigenetic changes of HCC cells could lead to aberrant signaling pathways, induce cancer stem cells and enhance tumor progression. Thus, the heterogeneity in HCC contributes to disease progression, and a better understanding of its heterogeneity will greatly aid in the development of strategies for HCC treatment.
Affiliation(s)
- Kuo-Shyang Jeng
  - Department of Surgery, Far Eastern Memorial Hospital, New Taipei City, Taiwan; Department of Medical Research, Far Eastern Memorial Hospital, New Taipei City, Taiwan
- Chiung-Fang Chang
  - Department of Medical Research, Far Eastern Memorial Hospital, New Taipei City, Taiwan
- Wen-Juei Jeng
  - Department of Hepato-Gastroenterology, Chang-Gung Memorial Hospital, LinKou Medical Center, Chang Gung University, Taiwan
- I-Shyan Sheen
  - Department of Hepato-Gastroenterology, Chang-Gung Memorial Hospital, LinKou Medical Center, Chang Gung University, Taiwan
- Chi-Juei Jeng
  - Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
7. Wang XW, Thorgeirsson SS. The biological and clinical challenge of liver cancer heterogeneity. Hepat Oncol 2014; 1:349-353. [PMID: 30190968 DOI: 10.2217/hep.14.18] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]
Affiliation(s)
- Xin Wei Wang
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Snorri S Thorgeirsson
  - Laboratory of Experimental Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
8. Roessler S, Budhu A, Wang XW. Deciphering cancer heterogeneity: the biological space. Front Cell Dev Biol 2014; 2:12. [PMID: 25364720 PMCID: PMC4207029 DOI: 10.3389/fcell.2014.00012] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 11/12/2013] [Accepted: 03/17/2014] [Indexed: 01/15/2023]
Abstract
Most lethal solid tumors including hepatocellular carcinoma (HCC) are considered incurable due to extensive heterogeneity in clinical presentation and tumor biology. Tumor heterogeneity may result from different cells of origin, patient ethnicity, etiology, underlying disease, and diversity of genomic and epigenomic changes which drive tumor development. Cancer genomic heterogeneity thereby impedes treatment options and poses a significant challenge to cancer management. Studies of the HCC genome have revealed that although various genomic signatures identified in different HCC subgroups share a common prognosis, each carries unique molecular changes which are linked to different sets of cancer hallmarks whose misregulation has been proposed by Hanahan and Weinberg to be essential for tumorigenesis. We hypothesize that these specific sets of cancer hallmarks collectively occupy different tumor biological space representing the misregulation of different biological processes. In principle, a combination of different cancer hallmarks can result in new convergent molecular networks that are unique to each tumor subgroup and represent ideal druggable targets. Due to the ability of the tumor to adapt to external factors such as treatment or changes in the tumor microenvironment, the tumor biological space is elastic. Our ability to identify distinct groups of cancer patients with similar tumor biology who are most likely to respond to a specific therapy would have a significant impact on improving patient outcome. It is currently a challenge to identify a particular hallmark or a newly emerged convergent molecular network for a particular tumor. Thus, it is anticipated that the integration of multiple levels of data such as genomic mutations, somatic copy number aberration, gene expression, proteomics, and metabolomics, may help us grasp the tumor biological space occupied by each individual, leading to improved therapeutic intervention and outcome.
Affiliation(s)
- Stephanie Roessler
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Anuradha Budhu
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Xin W Wang
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA