1
|
Mishra S, Singh G, Bhattacharya M. Tissue specific tumor-gene link prediction through sampling based GNN using a heterogeneous network. Med Biol Eng Comput 2024:10.1007/s11517-024-03087-y. [PMID: 38635004 DOI: 10.1007/s11517-024-03087-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Accepted: 03/31/2024] [Indexed: 04/19/2024]
Abstract
A tissue sample is a valuable resource for understanding a patient's symptoms and health status in relation to tumor growth. Recent research seeks to establish a connection between tissue-specific tumor samples and genetic markers (genes). This breakthrough has paved the way for personalized cancer therapies. With this motivation, the proposed model constructs a heterogeneous network based on tumor sample-gene relation data and gene-gene interaction data. This network also incorporates tissue-specific gene expression and primary site-based gene counts as features, enabling tissue-specific predictions. Graph neural networks (GNNs) have proven effective in modeling complex interactions and predicting links within this network. The proposed model has successfully predicted tumor-gene associations by leveraging sampling-based GNNs and link layer embedding. The model's performance metrics, such as AUC-ROC scores, reached approximately 94%, demonstrating the potential of this heterogeneous network in predicting tissue-specific tumor sample-gene links. This paper's findings highlight the importance of tissue-specific associations in cancer research.
Affiliation(s)
- Surabhi Mishra
- Department of Information Technology, ABV- Indian Institute of Information Technology and Management, Morena Road, Gwalior, 474015, Madhya Pradesh, India.
| | - Gurjot Singh
- Department of Information Technology, ABV- Indian Institute of Information Technology and Management, Morena Road, Gwalior, 474015, Madhya Pradesh, India
| | - Mahua Bhattacharya
- Department of Information Technology, ABV- Indian Institute of Information Technology and Management, Morena Road, Gwalior, 474015, Madhya Pradesh, India
| |
|
2
|
Timilsina M, Fey D, Buosi S, Janik A, Costabello L, Carcereny E, Abreu DR, Cobo M, Castro RL, Bernabé R, Minervini P, Torrente M, Provencio M, Nováček V. Synergy between imputed genetic pathway and clinical information for predicting recurrence in early stage non-small cell lung cancer. J Biomed Inform 2023; 144:104424. [PMID: 37352900 DOI: 10.1016/j.jbi.2023.104424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 06/06/2023] [Accepted: 06/11/2023] [Indexed: 06/25/2023]
Abstract
OBJECTIVE Lung cancer exhibits unpredictable recurrence in low-stage tumors and variable responses to different therapeutic interventions. Predicting relapse in early-stage lung cancer can facilitate precision medicine and improve patient survivability. While existing machine learning models rely on clinical data, incorporating genomic information could enhance their efficiency. This study aims to impute and integrate specific types of genomic data with clinical data to improve the accuracy of machine learning models for predicting relapse in early-stage, non-small cell lung cancer patients. METHODS The study utilized a publicly available TCGA lung cancer cohort and imputed genetic pathway scores into the Spanish Lung Cancer Group (SLCG) data, specifically in 1348 early-stage patients. Initially, tumor recurrence was predicted without imputed pathway scores. Subsequently, the SLCG data were augmented with pathway scores imputed from TCGA. The integrative approach aimed to enhance relapse risk prediction performance. RESULTS The integrative approach achieved improved relapse risk prediction with the following evaluation metrics: an area under the precision-recall curve (PR-AUC) score of 0.75, an area under the ROC (ROC-AUC) score of 0.80, an F1 score of 0.61, and a Precision of 0.80. The prediction explanation model SHAP (SHapley Additive exPlanations) was employed to explain the machine learning model's predictions. CONCLUSION We conclude that our explainable predictive model is a promising tool for oncologists that addresses an unmet clinical need of post-treatment patient stratification based on the relapse risk while also improving the predictive power by incorporating proxy genomic data not available for specific patients.
Affiliation(s)
- Mohan Timilsina
- Data Science Institute, Insight Centre for Data Analytics, University of Galway, Ireland.
| | - Dirk Fey
- Systems Biology Ireland, University College Dublin, Ireland.
| | - Samuele Buosi
- Data Science Institute, Insight Centre for Data Analytics, University of Galway, Ireland.
| | | | | | - Enric Carcereny
- Catalan Institute of Oncology, Hospital Universitari Germans Trias i Pujol, B-ARGO, IGTP, Badalona, Spain.
| | | | - Manuel Cobo
- Medical Oncology Intercenter Unit. Regional and Virgen de la Victoria University Hospitals. IBIMA. Málaga., Spain.
| | | | - Reyes Bernabé
- Hospital Universitario Virgen del Rocio, Sevilla, Spain.
| | | | - Maria Torrente
- Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain.
| | - Mariano Provencio
- Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain.
| | - Vít Nováček
- Data Science Institute, Insight Centre for Data Analytics, University of Galway, Ireland; Faculty of Informatics, Masaryk University Brno, Czech Republic; Masaryk Memorial Cancer Institute, Brno, Czech Republic.
| |
|
3
|
Timilsina M, Nováček V, d’Aquin M, Yang H. Boundary heat diffusion classifier for a semi-supervised learning in a multilayer network embedding. Neural Netw 2022; 156:205-217. [DOI: 10.1016/j.neunet.2022.10.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2021] [Revised: 09/16/2022] [Accepted: 10/06/2022] [Indexed: 11/06/2022]
|
4
|
Timilsina M, Tandan M, Nováček V. Machine learning approaches for predicting the onset time of the adverse drug events in oncology. MACHINE LEARNING WITH APPLICATIONS 2022. [DOI: 10.1016/j.mlwa.2022.100367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
|
5
|
Allegri SA, McCoy K, Mitchell CS. CompositeView: A Network-Based Visualization Tool. BIG DATA AND COGNITIVE COMPUTING 2022; 6. [PMID: 35847767 PMCID: PMC9281616 DOI: 10.3390/bdcc6020066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
Large networks are quintessential to bioinformatics, knowledge graphs, social network analysis, and graph-based learning. CompositeView is a Python-based open-source application that improves interactive complex network visualization and extraction of actionable insight. CompositeView utilizes specifically formatted input data to calculate composite scores and display them using the Cytoscape component of Dash. Composite scores are defined representations of smaller sets of conceptually similar data that, when combined, generate a single score to reduce information overload. Visualized interactive results are user-refined via filtering elements such as node value and edge weight sliders and graph manipulation options (e.g., node color and layout spread). The primary difference between CompositeView and other network visualization tools is its ability to auto-calculate and auto-update composite scores as the user interactively filters or aggregates data. CompositeView was developed to visualize network relevance rankings, but it performs well with non-network data. Three disparate CompositeView use cases are shown: relevance rankings from SemNet 2.0, an open-source knowledge graph relationship ranking software for biomedical literature-based discovery; Human Development Index (HDI) data; and the Framingham cardiovascular study. CompositeView was stress tested to construct reference benchmarks that define breadth and size of data effectively visualized. Finally, CompositeView is compared to Excel, Tableau, Cytoscape, neo4j, NodeXL, and Gephi.
Affiliation(s)
- Stephen A. Allegri
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Kevin McCoy
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Cassie S. Mitchell
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Machine Learning Center at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Correspondence:
| |
|
6
|
Timilsina M, Kernan DPM, Yang H, d'Aquin M. Synergy Between Embedding and Protein Functional Association Networks for Drug Label Prediction Using Harmonic Function. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1203-1213. [PMID: 33064647 DOI: 10.1109/tcbb.2020.3031696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Semi-Supervised Learning (SSL) is an approach to machine learning that makes use of unlabeled data for training with a small amount of labeled data. In the context of molecular biology and pharmacology, one can take advantage of unlabeled data, for instance, to identify drugs and targets where a few genes are known to be associated with a specific drug target and are treated as labeled data. Labeling the genes requires laboratory verification and validation. This process is usually very time consuming and expensive. Thus, it is useful to estimate the functional role of drugs from unlabeled data using computational methods. To develop such a model, we used openly available data resources to create two bipartite graphs: (i) drugs and genes, and (ii) genes and disease. We constructed the genetic embedding graph from the two bipartite graphs using Tensor Factorization methods. We integrated the genetic embedding graph with the publicly available protein functional association network. Our results show the usefulness of the integration by effectively predicting drug labels.
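The harmonic-function idea named in the title can be illustrated with a minimal sketch. This is not the authors' implementation: the graph, the node names (`drug_a`, `gene_1`, and so on), and the function `harmonic_labels` are hypothetical, and the sketch assumes every unlabeled node has at least one neighbor. Labeled nodes are held fixed, and each unlabeled node is repeatedly set to the weighted mean of its neighbors until the scores settle.

```python
def harmonic_labels(adj, labels, iters=200):
    """Propagate known labels over a weighted graph.

    adj    : dict mapping node -> {neighbor: edge weight} (assumed symmetric)
    labels : dict mapping labeled node -> fixed score in [0, 1]
    Returns a dict node -> score; labeled nodes keep their given score.
    """
    # Unlabeled nodes start at a neutral 0.5; labeled nodes at their label.
    f = {n: labels.get(n, 0.5) for n in adj}
    for _ in range(iters):
        for n, nbrs in adj.items():
            if n in labels:
                continue  # labeled nodes stay clamped
            total = sum(nbrs.values())
            # Harmonic property: a node's score is the weighted mean of its neighbors.
            f[n] = sum(w * f[m] for m, w in nbrs.items()) / total
    return f


# Hypothetical toy chain: drug_a - gene_1 - gene_2 - drug_b
adj = {
    "drug_a": {"gene_1": 1.0},
    "gene_1": {"drug_a": 1.0, "gene_2": 1.0},
    "gene_2": {"gene_1": 1.0, "drug_b": 1.0},
    "drug_b": {"gene_2": 1.0},
}
scores = harmonic_labels(adj, {"drug_a": 1.0, "drug_b": 0.0})
```

On this chain the two unlabeled nodes interpolate between the clamped endpoints, so the node nearer to the positively labeled end receives the higher score.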
|
7
|
Venkatraman DL, Pulimamidi D, Shukla HG, Hegde SR. Tumor relevant protein functional interactions identified using bipartite graph analyses. Sci Rep 2021; 11:21530. [PMID: 34728699 PMCID: PMC8563864 DOI: 10.1038/s41598-021-00879-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Accepted: 09/30/2021] [Indexed: 12/02/2022] Open
Abstract
The surge of -omics data for diseases such as cancer allows insights to be derived into the associated protein interactions. We used bipartite network principles to build protein functional associations of the differentially regulated genes in 18 cancer types. This approach allowed us to combine expression data with functional associations in many cancers simultaneously. Further, graph centrality measures suggested the importance of upregulated genes such as BIRC5, UBE2C, BUB1B, KIF20A and PTH1R in cancer. Pathway analysis of the high centrality network nodes suggested the importance of the upregulation of cell cycle and replication associated proteins in cancer. Some of the downregulated high centrality proteins include actins, myosins and ATPase subunits. Among the transcription factors, mini-chromosome maintenance proteins (MCMs) and E2F family proteins appeared prominently in regulating many differentially regulated genes. The projected unipartite networks of the up and downregulated genes comprised 37,411 and 41,756 interactions, respectively. The conclusions obtained by collating these interactions revealed pan-cancer as well as subtype specific protein complexes and clusters. Therefore, we demonstrate that incorporating expression data from multiple cancers into bipartite graphs validates existing cancer associated mechanisms and points to novel interactions and pathways.
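The projection from a bipartite graph to a unipartite gene network can be sketched as follows. This is an illustrative assumption about the general technique, not the construction used in the paper: the cancer-type names, the gene assignments, and the helpers `project` and `degree` are hypothetical. Two genes are linked when they are attached to the same cancer type, and the edge weight counts how many cancer types they share.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bipartite edges: cancer type -> differentially regulated genes.
bipartite = {
    "cancer_A": {"BIRC5", "UBE2C", "BUB1B"},
    "cancer_B": {"BIRC5", "UBE2C"},
    "cancer_C": {"KIF20A"},
}


def project(bipartite):
    """Unipartite projection: (gene, gene) -> number of shared cancer types."""
    weights = defaultdict(int)
    for genes in bipartite.values():
        # Sorting gives each unordered pair a single canonical key.
        for g1, g2 in combinations(sorted(genes), 2):
            weights[(g1, g2)] += 1
    return dict(weights)


def degree(edges):
    """Degree centrality: number of distinct partners per gene."""
    deg = defaultdict(int)
    for g1, g2 in edges:
        deg[g1] += 1
        deg[g2] += 1
    return dict(deg)


edges = project(bipartite)
deg = degree(edges)
```

A gene that never shares a cancer type with another gene (here `KIF20A`) has no edges in the projection, so it does not appear in the degree table.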
Affiliation(s)
| | - Deepshika Pulimamidi
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India
| | - Harsh G Shukla
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India
| | - Shubhada R Hegde
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India.
| |
|
8
|
Sandini C, Zöller D, Schneider M, Tarun A, Armondo M, Nelson B, Amminger PG, Yuen HP, Markulev C, Schäffer MR, Mossaheb N, Schlögelhofer M, Smesny S, Hickie IB, Berger GE, Chen EY, de Haan L, Nieman DH, Nordentoft M, Riecher-Rössler A, Verma S, Thompson A, Yung AR, McGorry PD, Van De Ville D, Eliez S. Characterization and prediction of clinical pathways of vulnerability to psychosis through graph signal processing. eLife 2021; 10:59811. [PMID: 34569937 PMCID: PMC8476129 DOI: 10.7554/elife.59811] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/09/2020] [Accepted: 09/09/2021] [Indexed: 11/21/2022] Open
Abstract
Causal interactions between specific psychiatric symptoms could contribute to the heterogeneous clinical trajectories observed in early psychopathology. Current diagnostic approaches merge clinical manifestations that co-occur across subjects and could significantly hinder our understanding of clinical pathways connecting individual symptoms. Network analysis techniques have emerged as alternative approaches that could help shed light on the complex dynamics of early psychopathology. The present study attempts to address the two main limitations that have, in our opinion, hindered the application of network approaches in the clinical setting. Firstly, we show that a multi-layer network analysis approach can move beyond a static view of psychopathology by providing an intuitive characterization of the role of specific symptoms in contributing to clinical trajectories over time. Secondly, we show that a Graph-Signal-Processing approach can exploit knowledge of longitudinal interactions between symptoms to predict clinical trajectories at the level of the individual. We test our approaches in two independent samples of individuals with genetic and clinical vulnerability for developing psychosis. Novel network approaches can allow us to embrace the dynamic complexity of early psychopathology and help pave the way towards a more personalized approach to clinical care.
Affiliation(s)
- Corrado Sandini
- Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland
| | - Daniela Zöller
- Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland.,Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Maude Schneider
- Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland.,Center for Contextual Psychiatry, Research Group Psychiatry, Department of Neuroscience, KU Leuven, Leuven, Belgium
| | - Anjali Tarun
- Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Marco Armondo
- Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland
| | - Barnaby Nelson
- Orygen, Parkville, Australia.,The Centre for Youth Mental Health, The University of Melbourne, Melbourne, Australia
| | - Paul G Amminger
- Orygen, Parkville, Australia.,The Centre for Youth Mental Health, The University of Melbourne, Melbourne, Australia.,Department of Psychiatry and Psychotherapy, Clinical Division of Social Psychiatry, Medical University Vienna, Vienna, Austria
| | - Hok Pan Yuen
- Orygen, Parkville, Australia.,The Centre for Youth Mental Health, The University of Melbourne, Melbourne, Australia
| | - Connie Markulev
- Orygen, Parkville, Australia.,The Centre for Youth Mental Health, The University of Melbourne, Melbourne, Australia
| | - Monica R Schäffer
- The Centre for Youth Mental Health, The University of Melbourne, Melbourne, Australia.,Department of Psychiatry and Psychotherapy, Clinical Division of Social Psychiatry, Medical University Vienna, Vienna, Austria
| | - Nilufar Mossaheb
- Department of Psychiatry and Psychotherapy, Clinical Division of Social Psychiatry, Medical University Vienna, Vienna, Austria
| | - Monika Schlögelhofer
- Department of Psychiatry and Psychotherapy, Clinical Division of Social Psychiatry, Medical University Vienna, Vienna, Austria
| | - Stefan Smesny
- Department of Psychiatry and Psychotherapy, Clinical Division of Social Psychiatry, Medical University Vienna, Vienna, Austria
| | - Ian B Hickie
- Department of Psychiatry, University Hospital Jena, Jena, Germany
| | | | - Eric Yh Chen
- Child and Adolescent Psychiatric Service of the Canton of Zurich, Zurich, Switzerland
| | - Lieuwe de Haan
- Department of Psychiatry, University of Hong Kong, Hong Kong, China
| | - Dorien H Nieman
- Department of Psychiatry, Amsterdam University Medical Centers, Amsterdam, Netherlands
| | | | | | - Swapna Verma
- Institute of Mental Health, Singapore, Singapore
| | - Andrew Thompson
- Orygen, Parkville, Australia.,The Centre for Youth Mental Health, The University of Melbourne, Melbourne, Australia.,Division of Mental Health and Wellbeing, Warwick Medical School, University of Warwick, Coventry, United Kingdom.,North Warwickshire Early Intervention in Psychosis Service, Coventry and Warwickshire National Health Service Partnership Trust, Coventry, United Kingdom
| | - Alison Ruth Yung
- Orygen, Parkville, Australia.,The Centre for Youth Mental Health, The University of Melbourne, Melbourne, Australia.,Division of Psychology and Mental Health, University of Manchester, Manchester, United Kingdom.,Greater Manchester Mental Health NHS Foundation Trust, Manchester, United Kingdom
| | - Patrick D McGorry
- Orygen, Parkville, Australia.,The Centre for Youth Mental Health, The University of Melbourne, Melbourne, Australia
| | - Dimitri Van De Ville
- Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.,Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
| | - Stephan Eliez
- Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland.,Department of Genetic Medicine and Development, University of Geneva School of Medicine, Geneva, Switzerland
| |
|
9
|
|
10
|
Tandan M, Acharya Y, Pokharel S, Timilsina M. Discovering symptom patterns of COVID-19 patients using association rule mining. Comput Biol Med 2021; 131:104249. [PMID: 33561673 PMCID: PMC7966840 DOI: 10.1016/j.compbiomed.2021.104249] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 12/21/2020] [Revised: 01/25/2021] [Accepted: 01/25/2021] [Indexed: 12/16/2022]
Abstract
BACKGROUND The COVID-19 pandemic is a significant public health crisis that is hitting hard on people's health, well-being, and freedom of movement, and affecting the global economy. Scientists worldwide are competing to develop therapeutics and vaccines; currently, three drugs and two vaccine candidates have been given emergency use authorization. However, there are still questions of efficacy with regard to specific subgroups of patients and the vaccines' scalability to the general public. Under such circumstances, understanding COVID-19 symptoms is vital in initial triage; it is crucial to distinguish the severity of cases for effective management and treatment. This study aimed to discover symptom patterns and overall symptom rules, including rules disaggregated by age, sex, chronic condition, and mortality status, among COVID-19 patients. METHODS This study was a retrospective analysis of COVID-19 patient data made available online by the Wolfram Data Repository through May 27, 2020. We applied a widely used rule-based machine learning technique called association rule mining to identify frequent symptoms and define patterns in the rules discovered. RESULT In total, 1,560 patients with COVID-19 were included in the study, with a median age of 52 years. The most frequently occurring symptom was fever (67%), followed by cough (37%), malaise/body soreness (11%), pneumonia (11%), and sore throat (8%). Myocardial infarction, heart failure, and renal disease were present in less than 1% of patients. The top ten significant symptom rules (out of 71 generated) showed cough, septic shock, and respiratory distress syndrome as frequent consequents. If a patient had a breathing problem and sputum production, then there was a higher confidence of that patient having a cough; if cardiac disease, renal disease, or pneumonia was present, then there was a higher confidence of septic shock or respiratory distress syndrome.
Symptom rules differed between younger and older patients and between male and female patients. Patients who had chronic conditions or died of COVID-19 had more severe symptom rules than patients who did not have chronic conditions or who survived COVID-19. Concerning chronic condition rules among 147 patients, if a patient had diabetes, prerenal azotemia, and coronary bypass surgery, there was a certainty of hypertension. CONCLUSION The most frequently reported symptoms in patients with COVID-19 were fever, cough, pneumonia, and sore throat, while 1% had severe symptoms, such as septic shock, respiratory distress syndrome, and respiratory failure. Symptom rules differed by age and sex. Patients with chronic disease and patients who died of COVID-19 had severe symptom rules, more specifically cardiovascular-related symptoms accompanied by pneumonia, fever, and cough as consequents.
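The two quantities behind an association rule such as "breathing problem and sputum production, then cough" are support and confidence. The sketch below shows how they are computed on a hypothetical toy dataset; the records and the helper names `support` and `confidence` are illustrative assumptions and do not reproduce the study's data or software. Confidence is undefined when the antecedent never occurs, a case this sketch does not handle.

```python
# Hypothetical toy records: each set lists the symptoms of one patient.
records = [
    {"fever", "cough"},
    {"fever", "cough", "sputum"},
    {"fever"},
    {"cough", "sputum"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)


def confidence(antecedent, consequent, records):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    a = set(antecedent)
    return support(a | set(consequent), records) / support(a, records)


sup_rule = support({"sputum", "cough"}, records)
conf_rule = confidence({"sputum"}, {"cough"}, records)
```

A rule is usually kept only when both values exceed chosen thresholds; mining algorithms such as Apriori enumerate candidate itemsets efficiently rather than testing every combination.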
Affiliation(s)
- Meera Tandan
- Cecil G Sheps Center for Health Service Research, University of North Carolina, Chapel Hill, USA (corresponding author)
| | - Yogesh Acharya
- Western Vascular Institute, Galway University Hospital, Galway, Ireland
| | - Suresh Pokharel
- The University of Queensland, St Lucia, Queensland, Australia
| | - Mohan Timilsina
- Data Science Institute, Insight Centre for Data Analytics, National University of Ireland Galway, Ireland
| |
|