1
Koh HYK, Lam UTF, Ban KHK, Chen ES. Machine learning optimized DriverDetect software for high precision prediction of deleterious mutations in human cancers. Sci Rep 2024; 14:22618. [PMID: 39349509 PMCID: PMC11442673 DOI: 10.1038/s41598-024-71422-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Accepted: 08/28/2024] [Indexed: 10/02/2024] Open
Abstract
The detection of cancer-driving mutations is important for understanding cancer pathology and therapeutics development. Prediction tools have been created to streamline the computation process. However, most tools available have heterogeneous sensitivity or specificity. We built a machine learning-derived algorithm, DriverDetect, that combines the outputs of seven pre-existing tools to improve the prediction of candidate driver cancer mutations. The algorithm was trained with cancer gene-specific mutation datasets of cancer patients to identify cancer drivers. DriverDetect performed better than the individual tools or their combinations in the validation test. It has the potential to incorporate future novel prediction algorithms and can be retrained with new datasets, offering an expanded application to pan-cancer analysis for cross-cancer study.
Affiliation(s)
- Herrick Yu Kan Koh
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Ulysses Tsz Fung Lam
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Kenneth Hon-Kim Ban
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - National University Health System (NUHS), Singapore, Singapore
  - NUS Center for Cancer Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Ee Sin Chen
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - National University Health System (NUHS), Singapore, Singapore
  - NUS Center for Cancer Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Integrative Sciences and Engineering Programme, National University of Singapore, Singapore, Singapore
2
Jamal SB, Ismail S, Yousaf R, Qazi AS, Iftkhar S, Abbasi SW. Exploring Novel 1-Hydroxynaphthalene-2-Carboxanilides Based Inhibitors Against C-Jun N-Terminal Kinases Through Molecular Dynamic Simulation and WaterSwap Analysis. Appl Biochem Biotechnol 2024; 196:1803-1819. [PMID: 37436549 DOI: 10.1007/s12010-023-04638-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/01/2023] [Indexed: 07/13/2023]
Abstract
Cancer is a disease of mutation and lifestyle modifications. A large number of normal genes can transform normal cells into cancer cells when they are deregulated, whether through overexpression or loss of expression. Signal transduction is a complex process that involves multiple interactions and different functions. C-Jun N-terminal kinases (JNKs) are important proteins involved in signaling. JNK-mediated pathways can detect, integrate, and amplify various external signals that may cause alterations in gene expression, enzyme activities, and cellular functions that affect cellular behavior such as metabolism, proliferation, differentiation, and cell survival. In this study, we performed a molecular docking protocol (MOE) to predict the binding interactions of some known anticancer 1-hydroxynaphthalene-2-carboxanilide candidates. A set of 10 active compounds was retrieved after initial screening on the basis of docking scores, binding energies, and number of interactions, and was re-docked in the active site of the JNK protein. The results were further validated through molecular dynamics simulation and MMPB/GBSA calculations. The active compounds 4p and 5k were ranked on top. After computationally exploring the interactions of 1-hydroxynaphthalene-2-carboxanilides with the JNK protein, we believe compounds 4p and 5k can serve as potential inhibitors of the JNK protein. We believe the results of the current research will help to develop novel and structurally diverse anticancer compounds that will be useful not only for treating cancer but also for medicating other diseases caused by protein deregulation.
Affiliation(s)
- Syed Babar Jamal
  - Department of Biological Sciences, National University of Medical Sciences, Rawalpindi, Pakistan
- Saba Ismail
  - Department of Biological Sciences, National University of Medical Sciences, Rawalpindi, Pakistan
- Rimsha Yousaf
  - Department of Biological Sciences, International Islamic University, Islamabad, Pakistan
- Asma Saleem Qazi
  - Department of Biological Sciences, National University of Medical Sciences, Rawalpindi, Pakistan
- Saba Iftkhar
  - Department of Biological Sciences, National University of Medical Sciences, Rawalpindi, Pakistan
- Sumra Wajid Abbasi
  - Department of Biological Sciences, National University of Medical Sciences, Rawalpindi, Pakistan
3
Taheri G, Habibi M. Uncovering driver genes in breast cancer through an innovative machine learning mutational analysis method. Comput Biol Med 2024; 171:108234. [PMID: 38430742 DOI: 10.1016/j.compbiomed.2024.108234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 01/25/2024] [Accepted: 02/25/2024] [Indexed: 03/05/2024]
Abstract
Breast cancer has become a severe public health concern and one of the leading causes of cancer-related death in women worldwide. Several genes and mutations in these genes linked to breast cancer have been identified using sophisticated techniques, despite the fact that the exact cause of breast cancer is still unknown. A commonly used feature for identifying driver mutations is the recurrence of a mutation in patients. Nevertheless, some mutations are more likely to occur than others for various reasons. Sequencing analysis has shown that cancer-driving genes operate across complex networks, often with mutations appearing in a modular pattern. In this work, as a retrospective study, we used TCGA data, which is gathered from breast cancer patients. We introduced a new machine-learning approach to examine gene functionality in networks derived from mutation associations, gene-gene interactions, and graph clustering for breast cancer analysis. These networks have uncovered crucial biological components in critical pathways, particularly those that exhibit low-frequency mutations. The statistical strength of the clinical study is significantly boosted by evaluating the network as a whole instead of just single gene effects. Our method successfully identified essential driver genes with diverse mutation frequencies. We then explored the functions of these potential driver genes and their related pathways. By uncovering low-frequency genes, we shed light on understudied pathways associated with breast cancer. Additionally, we present a novel Monte Carlo-based algorithm to identify driver modules in breast cancer. Our findings highlight the significance and role of these modules in critical signaling pathways in breast cancer, providing a comprehensive understanding of breast cancer development. Materials and implementations are available at: [https://github.com/MahnazHabibi/BreastCancer].
Affiliation(s)
- Golnaz Taheri
  - Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
  - Science for Life Laboratory, Stockholm, Sweden
- Mahnaz Habibi
  - Department of Mathematics, Qazvin Branch, Islamic Azad University, Qazvin, Iran
4
Visonà G, Bouzigon E, Demenais F, Schweikert G. Network propagation for GWAS analysis: a practical guide to leveraging molecular networks for disease gene discovery. Brief Bioinform 2024; 25:bbae014. [PMID: 38340090 PMCID: PMC10858647 DOI: 10.1093/bib/bbae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/28/2023] [Accepted: 01/08/2024] [Indexed: 02/12/2024] Open
Abstract
MOTIVATION Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remains challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights on disease genes. RESULTS We present an overview of the design choices and pitfalls that prove crucial in the application of network propagation methods to GWAS summary statistics. We highlight general trends from the literature, and present benchmark experiments to expand on these insights selecting as case study three diseases and five molecular networks. We verify that the use of gene-level scores based on GWAS P-values offers advantages over the selection of a set of 'seed' disease genes not weighted by the associated P-values if the GWAS summary statistics are of sufficient quality. Beyond that, the size and the density of the networks prove to be important factors for consideration. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.
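As an illustration of the kind of network propagation this abstract refers to, the following minimal Python sketch converts GWAS P-values into gene-level scores and smooths them over a network with a random walk with restart. The toy network, the P-values, the -log10 transform, and the `propagate` function with its `restart` parameter are all assumptions made for this example; they are not the benchmark setup of the paper. The sketch also assumes every node has at least one neighbour.

```python
import math


def propagate(adjacency, seed_scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph (illustrative sketch).

    adjacency   : dict mapping node -> set of neighbours (must be symmetric)
    seed_scores : dict mapping node -> non-negative initial score
    """
    nodes = sorted(adjacency)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one node needs a positive seed score")
    # Normalise the seeds so that the scores form a probability distribution.
    p0 = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbour m spreads its current mass evenly over its edges.
            inflow = sum(p[m] / len(adjacency[m]) for m in adjacency[n])
            nxt[n] = (1 - restart) * inflow + restart * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical gene-level P-values and a hypothetical four-gene network.
gwas_p = {"A": 1e-8, "B": 0.04, "C": 0.5, "D": 0.9}
network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
# Weighted seeds: stronger associations receive larger starting scores.
seeds = {gene: -math.log10(p) for gene, p in gwas_p.items()}
scores = propagate(network, seeds)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

Replacing `seeds` with equal weights for a chosen gene set would give the unweighted 'seed gene' variant that the abstract contrasts with P-value-based scores.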
Affiliation(s)
- Giovanni Visonà
  - Empirical Inference, Max-Planck Institute for Intelligent Systems, Tübingen 72076, Germany
5
Zhang X, Pan C, Wei X, Yu M, Liu S, An J, Yang J, Wei B, Hao W, Yao Y, Zhu Y, Zhang W. Cancer-keeper genes as therapeutic targets. iScience 2023; 26:107296. [PMID: 37520717 PMCID: PMC10382876 DOI: 10.1016/j.isci.2023.107296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 05/18/2023] [Accepted: 07/03/2023] [Indexed: 08/01/2023] Open
Abstract
Finding cancer-driver genes has been a central theme of cancer research. We took a different perspective; instead of considering normal cells, we focused on cancerous cells and genes that maintained abnormal cell growth, which we named cancer-keeper genes (CKGs). Intervening in CKGs may rectify aberrant cell growth, making them potential cancer therapeutic targets. We introduced control-hub genes and developed an efficient algorithm by extending network controllability theory. Control hubs are essential for maintaining cancerous states and thus can be taken as CKGs. We applied our CKG-based approach to bladder cancer (BLCA). All genes on the cell-cycle and p53 pathways in BLCA were identified as CKGs, showing their importance in cancer. We discovered that sensitive CKGs (genes easily altered by structural perturbation) were particularly suitable therapeutic targets. Experiments on cell lines and a mouse model confirmed that six sensitive CKGs effectively suppressed cancer cell growth, demonstrating the immense therapeutic potential of CKGs.
Affiliation(s)
- Xizhe Zhang
  - Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Chunyu Pan
  - School of Computer Science and Engineering, Northeastern University, Shenyang, China
- Xinru Wei
  - Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Meng Yu
  - Department of Laboratory Animal Science, China Medical University, Shenyang, China
  - Key Laboratory of Transgenetic Animal Research, China Medical University, Shenyang, China
- Shuangjie Liu
  - Department of Urology, First Affiliated Hospital of China Medical University, Shenyang, China
- Jun An
  - Department of Urology, First Affiliated Hospital of China Medical University, Shenyang, China
- Jieping Yang
  - Department of Urology, First Affiliated Hospital of China Medical University, Shenyang, China
- Baojun Wei
  - Department of Urology, First Affiliated Hospital of China Medical University, Shenyang, China
- Wenjun Hao
  - Department of Urology, First Affiliated Hospital of China Medical University, Shenyang, China
- Yang Yao
  - Department of Physiology, Shenyang Medical College, Shenyang, China
- Yuyan Zhu
  - Department of Urology, First Affiliated Hospital of China Medical University, Shenyang, China
- Weixiong Zhang
  - Department of Health Technology and Informatics, Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
  - Department of Computer Science and Engineering, Department of Genetics, Washington University in St. Louis, St. Louis, MO, USA
6
Quan C, Liu F, Qi L, Tie Y. LRT-CLUSTER: A New Clustering Algorithm Based on Likelihood Ratio Test to Identify Driving Genes. Interdiscip Sci 2023; 15:217-230. [PMID: 36848004 DOI: 10.1007/s12539-023-00554-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 01/31/2023] [Accepted: 02/01/2023] [Indexed: 03/01/2023]
Abstract
Somatic mutations often occur at highly recurrent sites in protein sequences, which indicates that the positional clustering of somatic missense mutations can be used to identify driver genes. However, traditional clustering algorithms over-fit the background signal, are poorly suited to mutation data, and leave room for improvement in identifying genes with low-frequency mutations. In this paper, we propose a linear clustering algorithm based on the likelihood ratio test to identify driver genes. First, the polynucleotide mutation rate is calculated based on the prior knowledge of the likelihood ratio test. Then, the simulation data set is obtained through the background mutation rate model. Finally, the unsupervised peak clustering algorithm is used to evaluate the somatic mutation data and the simulation data separately to identify the driver genes. The experimental results show that our method achieves a better balance of precision and sensitivity. It can also identify driver genes missed by other methods, making it an effective supplement to them. We also discover some potential linkages between genes, and between genes and mutation sites, which are of great value to targeted drug therapy research.
Method framework: Our proposed model framework is as follows.
a. Counting mutation sites and the number of mutations in tumor gene elements.
b. The nucleotide context mutation frequency is counted based on the likelihood ratio test, and the background mutation rate model is obtained.
c. Based on the Monte Carlo simulation method, data sets with the same number of mutations as the gene elements are randomly sampled to obtain simulated mutation data; the sampling frequency of each mutation site is related to the polynucleotide mutation rate.
d. The original mutation data and the simulated mutation data after random reconstruction are each clustered by peak density, and the corresponding clustering scores are obtained.
e. We can obtain the clustering information statistics and the score of each gene segment from the original single nucleotide mutation data through step d.
f. According to the observed score and the simulated clustering score, the p-value of the corresponding gene fragment is calculated.
g. We can obtain the clustering information statistics and the score of each gene segment from the simulated single nucleotide mutation data through step d.
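The simulate-score-compare logic of steps c, d, and f can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the LRT-CLUSTER implementation: the peak-density clustering is swapped for a toy sliding-window count (`cluster_score`), the background model is a hypothetical list of per-site weights (`site_rates`), and the add-one empirical p-value is a common Monte Carlo convention rather than a formula given in the abstract.

```python
import random


def cluster_score(positions, window=3):
    """Toy clustering score: the largest number of mutations that fall
    within any `window` consecutive sites (stand-in for peak-density clustering)."""
    if not positions:
        return 0
    pts = sorted(positions)
    best, left = 0, 0
    for right, pos in enumerate(pts):
        # Shrink the window from the left until it spans fewer than `window` sites.
        while pos - pts[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


def simulate_scores(n_mutations, site_rates, n_sim=1000, seed=0):
    """Step c/d: sample the same number of mutations from the background
    model and score every simulated data set."""
    rng = random.Random(seed)
    sites = list(range(len(site_rates)))
    return [
        cluster_score(rng.choices(sites, weights=site_rates, k=n_mutations))
        for _ in range(n_sim)
    ]


def empirical_p_value(observed, simulated):
    """Step f: fraction of simulated scores at least as large as the observed
    one, with an add-one correction so the estimate is never exactly zero."""
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


# Hypothetical gene segment of 100 sites with a uniform background rate.
observed_positions = [10, 10, 11, 11, 12, 40, 75]
observed = cluster_score(observed_positions)
simulated = simulate_scores(len(observed_positions), [1.0] * 100)
p_value = empirical_p_value(observed, simulated)
print(observed, p_value)
```

A context-dependent background model would only change the `site_rates` weights; the comparison of observed and simulated scores stays the same.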
Affiliation(s)
- Chenxu Quan
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China
  - Department of Respiratory and Sleep Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Fenghui Liu
  - Department of Respiratory and Sleep Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Lin Qi
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China
- Yun Tie
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China
7
Chitra U, Park TY, Raphael BJ. NetMix2: A Principled Network Propagation Algorithm for Identifying Altered Subnetworks. J Comput Biol 2022; 29:1305-1323. [PMID: 36525308 PMCID: PMC9917315 DOI: 10.1089/cmb.2022.0336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022] Open
Abstract
A standard paradigm in computational biology is to leverage interaction networks as prior knowledge in analyzing high-throughput biological data, where the data give a score for each vertex in the network. One classical approach is the identification of altered subnetworks, or subnetworks of the interaction network that have both outlier vertex scores and a defined network topology. One class of algorithms for identifying altered subnetworks searches for high-scoring subnetworks in subnetwork families with simple topological constraints, such as connected subnetworks, and has sound statistical guarantees. A second class of algorithms employs network propagation (the smoothing of vertex scores over the network using a random walk or diffusion process) and utilizes the global structure of the network. However, network propagation algorithms often rely on ad hoc heuristics that lack a rigorous statistical foundation. In this work, we unify the subnetwork family and network propagation approaches by deriving the propagation family, a subnetwork family that approximates the sets of vertices ranked highly by network propagation approaches. We introduce NetMix2, a principled algorithm for identifying altered subnetworks from a wide range of subnetwork families. When using the propagation family, NetMix2 combines the advantages of the subnetwork family and network propagation approaches. NetMix2 outperforms other methods, including network propagation, on simulated data, pan-cancer somatic mutation data, and genome-wide association data from multiple human diseases.
Affiliation(s)
- Uthsav Chitra
  - Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Tae Yoon Park
  - Department of Computer Science, Princeton University, Princeton, New Jersey, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Benjamin J. Raphael
  - Department of Computer Science, Princeton University, Princeton, New Jersey, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
8
Multi-omics peripheral and core regions of cancer. NPJ Syst Biol Appl 2022; 8:47. [PMID: 36446819 PMCID: PMC9707100 DOI: 10.1038/s41540-022-00258-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 11/07/2022] [Indexed: 11/30/2022] Open
Abstract
Thousands of genes are perturbed by cancer, and these disturbances can be seen in transcriptome, methylation, somatic mutation, and copy number variation omics studies. Understanding their connectivity patterns as an omnigenic neighbourhood in a molecular interaction network (interactome) is a key step towards advancing knowledge of the molecular mechanisms underlying cancers. Here, we introduce a unified connectivity line (CLine) to pinpoint omics-specific omnigenic patterns across 15 curated cancers. Taking advantage of the universality of CLine, we distinguish the peripheral and core genes for each omics aspect. We propose a network-based framework, multi-omics periphery and core (MOPC), to combine peripheral and core genes from different omics into a button-like structure. On the basis of network proximity, we provide evidence that core genes tend to be specifically perturbed in one omics, but the peripheral genes are diversely perturbed in multiple omics. And the core of one omics is regulated by multiple omics peripheries. Finally, we take the MOPC as an omnigenic neighbourhood, describe its characteristics, and explore its relative contribution to network-based mechanisms of cancer. We were able to present how multi-omics perturbations percolate through the human interactome and contribute to an integrated periphery and core.
9
Habibi M, Taheri G. A new machine learning method for cancer mutation analysis. PLoS Comput Biol 2022; 18:e1010332. [PMID: 36251702 PMCID: PMC9612828 DOI: 10.1371/journal.pcbi.1010332] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/28/2022] [Revised: 10/27/2022] [Accepted: 10/05/2022] [Indexed: 11/23/2022] Open
Abstract
It is complicated to identify cancer-causing mutations. The recurrence of a mutation in patients remains one of the most reliable features of mutation driver status. However, some mutations are more likely to happen than others for various reasons. Different sequencing analysis has revealed that cancer driver genes operate across complex pathways and networks, with mutations often arising in a mutually exclusive pattern. Genes with low-frequency mutations are understudied as cancer-related genes, especially in the context of networks. Here we propose a machine learning method to study the functionality of mutually exclusive genes in the networks derived from mutation associations, gene-gene interactions, and graph clustering. These networks have indicated critical biological components in the essential pathways, especially those mutated at low frequency. Studying the network and not just the impact of a single gene significantly increases the statistical power of clinical analysis. The proposed method identified important driver genes with different frequencies. We studied the function and the associated pathways in which the candidate driver genes participate. By introducing lower-frequency genes, we recognized less studied cancer-related pathways. We also proposed a novel clustering method to specify driver modules. We evaluated each driver module with different criteria, including the terms of biological processes and the number of simultaneous mutations in each cancer. Materials and implementations are available at: https://github.com/MahnazHabibi/MutationAnalysis.
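To make the notion of a mutually exclusive mutation pattern concrete, the sketch below tests whether two genes are co-mutated in fewer patients than independence would predict. The abstract does not say which statistic the authors use; the one-sided hypergeometric (Fisher-type) test here is a widely used choice and is an assumption of this example, as are the function name `mutual_exclusivity_p` and the patient sets.

```python
from math import comb


def mutual_exclusivity_p(mutated_a, mutated_b, n_patients):
    """One-sided hypergeometric p-value: the probability of observing this few
    (or fewer) co-mutated patients if the two genes mutated independently."""
    a, b = set(mutated_a), set(mutated_b)
    both = len(a & b)
    na, nb = len(a), len(b)
    # Smallest overlap that is possible given the two group sizes.
    lowest = max(0, na + nb - n_patients)
    favourable = sum(
        comb(na, k) * comb(n_patients - na, nb - k)
        for k in range(lowest, both + 1)
    )
    return favourable / comb(n_patients, nb)


# Hypothetical cohort of 20 patients: gene X is mutated in patients 0-7,
# gene Y in patients 8-15, and no patient carries both mutations.
gene_x = range(0, 8)
gene_y = range(8, 16)
p_exclusive = mutual_exclusivity_p(gene_x, gene_y, n_patients=20)
print(p_exclusive)
```

A small p-value suggests the two genes are mutated in separate patient groups more often than chance alone would explain; gene pairs passing such a test could then serve as edges in a mutation-association network of the kind the abstract describes.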
Affiliation(s)
- Mahnaz Habibi
  - Department of Mathematics, Qazvin Branch, Islamic Azad University, Qazvin, Iran
- Golnaz Taheri
  - Department of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
  - Science for Life Laboratory, Stockholm, Sweden
10
Abstract
Three-dimensional protein structural data at the molecular level are pivotal for successful precision medicine. Such data are crucial not only for discovering drugs that act to block the active site of the target mutant protein but also for clarifying to the patient and the clinician how the mutations harbored by the patient work. The relative paucity of structural data reflects their cost, challenges in their interpretation, and lack of clinical guidelines for their utilization. Rapid technological advancements in experimental high-resolution structural determination increasingly generate structures. Computationally, modeling algorithms, including molecular dynamics simulations, are becoming more powerful, as are compute-intensive hardware, particularly graphics processing units, overlapping with the inception of the exascale era. Accessible, freely available, and detailed structural and dynamical data can be merged with big data to powerfully transform personalized pharmacology. Here we review protein and emerging genome high-resolution data, along with means, applications, and examples underscoring their usefulness in precision medicine. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Guy Nir
  - Department of Biochemistry and Molecular Biology, Department of Neuroscience, Cell Biology and Anatomy, and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA
- Chung-Jung Tsai
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
11
Andrades R, Recamonde-Mendoza M. Machine learning methods for prediction of cancer driver genes: a survey paper. Brief Bioinform 2022; 23:6551145. [PMID: 35323900 DOI: 10.1093/bib/bbac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/06/2021] [Revised: 02/06/2022] [Accepted: 02/08/2022] [Indexed: 12/21/2022] Open
Abstract
Identifying the genes and mutations that drive the emergence of tumors is a critical step to improving our understanding of cancer and identifying new directions for disease diagnosis and treatment. Despite the large volume of genomics data, the precise detection of driver mutations and their carrying genes, known as cancer driver genes, from the millions of possible somatic mutations remains a challenge. Computational methods play an increasingly important role in discovering genomic patterns associated with cancer drivers and developing predictive models to identify these elements. Machine learning (ML), including deep learning, has been the engine behind many of these efforts and provides excellent opportunities for tackling remaining gaps in the field. Thus, this survey aims to perform a comprehensive analysis of ML-based computational approaches to identify cancer driver mutations and genes, providing an integrated, panoramic view of the broad data and algorithmic landscape within this scientific problem. We discuss how the interactions among data types and ML algorithms have been explored in previous solutions and outline current analytical limitations that deserve further attention from the scientific community. We hope that by helping readers become more familiar with significant developments in the field brought by ML, we may inspire new researchers to address open problems and advance our knowledge towards cancer driver discovery.
Affiliation(s)
- Renan Andrades
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil
  - Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
- Mariana Recamonde-Mendoza
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil
  - Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
12
Strybol PP, Larmuseau M, de Schaetzen van Brienen L, Van den Bulcke T, Marchal K. Extracting functional insights from loss-of-function screens using deep link prediction. Cell Reports Methods 2022; 2:100171. [PMID: 35474966 PMCID: PMC9017186 DOI: 10.1016/j.crmeth.2022.100171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2021] [Revised: 12/09/2021] [Accepted: 01/25/2022] [Indexed: 11/10/2022]
Abstract
We present deep link prediction (DLP), a method for the interpretation of loss-of-function screens. Our approach uses representation-based link prediction to reprioritize phenotypic readouts by integrating screening experiments with gene-gene interaction networks. We validate on 2 different loss-of-function technologies, RNAi and CRISPR, using datasets obtained from DepMap. Extensive benchmarking shows that DLP-DeepWalk outperforms other methods in recovering cell-specific dependencies, achieving an average precision well above 90% across 7 different cancer types and on both RNAi and CRISPR data. We show that the genes ranked highest by DLP-DeepWalk are appreciably more enriched in drug targets compared to the ranking based on original screening scores. Interestingly, this enrichment is more pronounced on RNAi data compared to CRISPR data, consistent with the greater inherent noise of RNAi screens. Finally, we demonstrate how DLP-DeepWalk can infer the molecular mechanism through which putative targets trigger cell line mortality.
Affiliation(s)
- Pieter-Paul Strybol
  - Department of Plant Biotechnology and Bioinformatics, Department of Information Technology, IDLab, imec, iGent Toren, 9000 Gent, Belgium
- Maarten Larmuseau
  - Department of Plant Biotechnology and Bioinformatics, Department of Information Technology, IDLab, imec, iGent Toren, 9000 Gent, Belgium
- Louise de Schaetzen van Brienen
  - Department of Plant Biotechnology and Bioinformatics, Department of Information Technology, IDLab, imec, iGent Toren, 9000 Gent, Belgium
- Kathleen Marchal
  - Department of Plant Biotechnology and Bioinformatics, Department of Information Technology, IDLab, imec, iGent Toren, 9000 Gent, Belgium
13
Lung cancer prediction using multi-gene genetic programming by selecting automatic features from amino acid sequences. Comput Biol Chem 2022; 98:107638. [DOI: 10.1016/j.compbiolchem.2022.107638] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/03/2021] [Revised: 12/22/2021] [Accepted: 02/01/2022] [Indexed: 02/07/2023]
14
|
gcMECM: graph clustering of mutual exclusivity of cancer mutations. BMC Bioinformatics 2021; 22:592. [PMID: 34906079 PMCID: PMC8670134 DOI: 10.1186/s12859-021-04505-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2021] [Accepted: 11/30/2021] [Indexed: 11/29/2022] Open
Abstract
Background Next-generation sequencing platforms allow us to sequence millions of small fragments of DNA simultaneously, revolutionizing cancer research. Sequence analysis has revealed that cancer driver genes operate across multiple intricate pathways and networks with mutations often occurring in a mutually exclusive pattern. Currently, low-frequency mutations are understudied as cancer-relevant genes, especially in the context of networks. Results Here we describe a tool, gcMECM, that enables us to visualize the functionality of mutually exclusive genes in the subnetworks derived from mutation associations, gene–gene interactions, and graph clustering. These subnetworks have revealed crucial biological components in the canonical pathway, especially those mutated at low frequency. Examining the subnetwork, and not just the impact of a single gene, significantly increases the statistical power of clinical analysis and enables us to build models to better predict how and why cancer develops. Conclusions gcMECM uses a computationally efficient and scalable algorithm to identify subnetworks in a canonical pathway with mutually exclusive mutation patterns and distinct biological functions.
|
15
|
Ahmed R, Erten C, Houdjedj A, Kazan H, Yalcin C. A Network-Centric Framework for the Evaluation of Mutual Exclusivity Tests on Cancer Drivers. Front Genet 2021; 12:746495. [PMID: 34899838 PMCID: PMC8664367 DOI: 10.3389/fgene.2021.746495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Accepted: 10/27/2021] [Indexed: 12/03/2022] Open
Abstract
One of the key concepts employed in cancer driver gene identification is that of mutual exclusivity (ME): a driver mutation is less likely to occur in case of an earlier mutation that has common functionality in the same molecular pathway. Several ME tests have been proposed recently; however, the current protocols to evaluate ME tests have two main limitations. First, the evaluations are mostly with respect to simulated data, and second, the evaluation metrics lack a network-centric view. The latter is especially crucial as the notion of common functionality can be achieved through searching for interaction patterns in relevant networks. We propose a network-centric framework to evaluate the pairwise significances found by statistical ME tests. It has three main components. The first component consists of metrics employed in the network-centric ME evaluations. Such metrics are designed so that network knowledge and the reference set of known cancer genes are incorporated in ME evaluations under a careful definition of proper control groups. The other two components are designed as further mechanisms to avoid confounders inherent in ME detection on top of the network-centric view. To this end, our second objective is to dissect the side effects caused by mutation load artifacts where mutations driving tumor subtypes with low mutation load might be incorrectly diagnosed as mutually exclusive. Finally, as part of the third main component, the confounding issue stemming from the use of nonspecific interaction networks generated as combinations of interactions from different tissues is resolved through the creation and use of tissue-specific networks in the proposed framework. The data, the source code and useful scripts are available at: https://github.com/abu-compbio/NetCentric.
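As a rough illustration of the kind of pairwise significance a statistical ME test produces, the sketch below computes a one-sided hypergeometric p-value for under-co-occurrence of mutations in two genes. It is not the NetCentric framework, nor any specific test evaluated in the paper; the function name `mutual_exclusivity_pvalue` and the toy sample sets are hypothetical.

```python
from math import comb


def mutual_exclusivity_pvalue(mutated_a, mutated_b, n_samples):
    """One-sided hypergeometric p-value for under-co-occurrence.

    mutated_a, mutated_b: sets of sample IDs in which each gene is mutated.
    n_samples: total number of samples in the cohort.
    Returns P(overlap <= observed overlap) under the null hypothesis that
    the two genes are mutated independently, with marginals held fixed.
    """
    a, b = len(mutated_a), len(mutated_b)
    observed = len(mutated_a & mutated_b)
    smallest_overlap = max(0, a + b - n_samples)
    favourable = sum(
        comb(a, k) * comb(n_samples - a, b - k)
        for k in range(smallest_overlap, observed + 1)
    )
    return favourable / comb(n_samples, b)


# Toy cohort of 10 samples: the two genes never co-occur.
gene_a = {0, 1, 2, 3, 4}
gene_b = {5, 6, 7, 8, 9}
p_value = mutual_exclusivity_pvalue(gene_a, gene_b, n_samples=10)
```

A small p-value only says that the observed overlap is lower than expected by chance; as the abstract notes, mutation load and network context can confound that reading.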
Affiliation(s)
- Rafsan Ahmed
- Electrical and Computer Engineering Graduate Program, Antalya Bilim University, Antalya, Turkey
| | - Cesim Erten
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
| | - Aissa Houdjedj
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
| | - Hilal Kazan
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
| | - Cansu Yalcin
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
|
16
|
Cutigi JF, Evangelista AF, Reis RM, Simao A. A computational approach for the discovery of significant cancer genes by weighted mutation and asymmetric spreading strength in networks. Sci Rep 2021; 11:23551. [PMID: 34876593 PMCID: PMC8651746 DOI: 10.1038/s41598-021-02671-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2021] [Accepted: 10/26/2021] [Indexed: 11/25/2022] Open
Abstract
Identifying significantly mutated genes in cancer is essential for understanding the mechanisms of tumor initiation and progression. This task is a key challenge since large-scale genomic studies have reported a very large number of genes mutated at low frequency. Towards uncovering infrequently mutated genes, gene interaction networks combined with mutation data have been explored. This work proposes Discovering Significant Cancer Genes (DiSCaGe), a computational method for discovering significant genes for cancer. DiSCaGe computes a mutation score for the genes based on the type of mutations they have. The influence received from their neighbors in the network is also considered and obtained through an asymmetric spreading strength applied to a consensus gene network. DiSCaGe produces a ranking of prioritized possible cancer genes. An experimental evaluation with six types of cancer revealed the potential of DiSCaGe for discovering known and possible novel significant cancer genes.
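The two ingredients named in the abstract, a mutation-type-based score and an asymmetric influence from network neighbors, can be sketched as follows. This is only a toy under stated assumptions, not the DiSCaGe formulas: the mutation-type weights in `TYPE_WEIGHTS` and the choice of 1/degree(u) as the strength with which gene u influences each neighbor are hypothetical stand-ins.

```python
# Hypothetical weights per mutation type (assumption, not from the paper).
TYPE_WEIGHTS = {"nonsense": 2.0, "missense": 1.0, "silent": 0.1}


def mutation_score(mutation_types):
    """Sum the (assumed) weights of all mutations observed in one gene."""
    return sum(TYPE_WEIGHTS.get(t, 0.0) for t in mutation_types)


def rank_genes(edges, mutations):
    """Rank genes by own mutation score plus influence received from neighbors.

    edges: iterable of undirected (gene, gene) pairs.
    mutations: dict mapping gene -> list of mutation type strings.
    The influence u exerts on each neighbor is own_score(u) / degree(u), so
    the strength from u to v generally differs from that from v to u.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    own = {g: mutation_score(mutations.get(g, [])) for g in neighbors}
    final = {
        g: own[g] + sum(own[u] / len(neighbors[u]) for u in neighbors[g])
        for g in neighbors
    }
    return sorted(final.items(), key=lambda item: item[1], reverse=True)


ranking = rank_genes(
    edges=[("A", "B"), ("A", "C")],
    mutations={"A": ["nonsense"], "B": ["missense", "missense"], "C": []},
)
```

In this toy network, gene C has no mutations of its own but still receives a non-zero score from its mutated neighbor, which is the intuition behind using networks to surface infrequently mutated genes.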
Affiliation(s)
- Jorge Francisco Cutigi
- Federal Institute of Sao Paulo, Sao Carlos, SP, Brazil.
- University of Sao Paulo, Sao Carlos, SP, Brazil.
| | | | - Rui Manuel Reis
- Molecular Oncology Research Center, Barretos Cancer Hospital, Barretos, SP, Brazil
|
17
|
Pham VVH, Liu L, Bracken C, Goodall G, Li J, Le TD. Computational methods for cancer driver discovery: A survey. Theranostics 2021; 11:5553-5568. [PMID: 33859763 PMCID: PMC8039954 DOI: 10.7150/thno.52670] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/01/2020] [Accepted: 01/20/2021] [Indexed: 12/21/2022] Open
Abstract
Identifying the genes responsible for driving cancer is of critical importance for directing treatment. Accordingly, multiple computational tools have been developed to facilitate this task. Due to the different methods employed by these tools, different data considered by the tools, and the rapidly evolving nature of the field, the selection of an appropriate tool for cancer driver discovery is not straightforward. This survey seeks to provide a comprehensive review of the different computational methods for discovering cancer drivers. We categorise the methods into three groups; methods for single driver identification, methods for driver module identification, and methods for identifying personalised cancer drivers. In addition to providing a “one-stop” reference of these methods, by evaluating and comparing their performance, we also provide readers the information about the different capabilities of the methods in identifying biologically significant cancer drivers. The biologically relevant information identified by these tools can be seen through the enrichment of discovered cancer drivers in GO biological processes and KEGG pathways and through our identification of a small cancer-driver cohort that is capable of stratifying patient survival.
|
18
|
Erten C, Houdjedj A, Kazan H. Ranking cancer drivers via betweenness-based outlier detection and random walks. BMC Bioinformatics 2021; 22:62. [PMID: 33568049 PMCID: PMC7877041 DOI: 10.1186/s12859-021-03989-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/10/2020] [Accepted: 01/31/2021] [Indexed: 12/04/2022] Open
Abstract
Background Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes. Results We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet utilizes a measure based on betweenness centrality on patient-specific networks to identify the so-called outlier genes that correspond to dysregulated genes for each patient. Setting up the relationship between the mutated genes and the outliers through a bipartite graph, it employs a random-walk process on the graph, which provides the final prioritization of the mutated genes. We compare BetweenNet against state-of-the-art cancer gene prioritization methods on lung, breast, and pan-cancer datasets. Conclusions Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and the reference pathways enriched in BetweenNet ranked genes and those that are enriched in known cancer genes overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods.
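The prioritization step described above, a random walk on a bipartite graph linking mutated genes to outlier genes, can be illustrated with a generic random walk with restart. This is a minimal sketch and not BetweenNet's actual procedure: the restart probability, the choice to restart uniformly on the outlier nodes, and all node names are assumptions made for the example.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, n_iter=100):
    """Power-iteration random walk with restart on an undirected graph.

    edges: iterable of (mutated_gene, outlier_gene) pairs.
    seeds: dict mapping node -> restart weight (normalised internally).
    Returns a dict mapping every node to its visiting probability.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    total_seed = sum(seeds.get(n, 0.0) for n in neighbors)
    start = {n: seeds.get(n, 0.0) / total_seed for n in neighbors}
    prob = dict(start)
    for _ in range(n_iter):
        # Restart mass goes back to the seeds; the rest moves along edges.
        nxt = {n: restart * start[n] for n in neighbors}
        for node, mass in prob.items():
            share = (1.0 - restart) * mass / len(neighbors[node])
            for nb in neighbors[node]:
                nxt[nb] += share
        prob = nxt
    return prob


# Hypothetical bipartite graph: G1 is linked to three outliers, G2 to one.
edges = [("G1", "o1"), ("G1", "o2"), ("G1", "o3"), ("G2", "o3")]
visit = random_walk_with_restart(edges, seeds={"o1": 1, "o2": 1, "o3": 1})
mutated_ranking = sorted(["G1", "G2"], key=lambda g: visit[g], reverse=True)
```

Restarting on the outlier side means a mutated gene scores highly when it is connected to many dysregulated genes, which matches the intuition in the abstract even though the weighting used in the paper may differ.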
Affiliation(s)
- Cesim Erten
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
| | - Aissa Houdjedj
- Electrical and Computer Engineering Graduate Program, Antalya Bilim University, Antalya, Turkey
| | - Hilal Kazan
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey.
|
19
|
Yu L, Wang M, Yang Y, Xu F, Zhang X, Xie F, Gao L, Li X. Predicting therapeutic drugs for hepatocellular carcinoma based on tissue-specific pathways. PLoS Comput Biol 2021; 17:e1008696. [PMID: 33561121 PMCID: PMC7920387 DOI: 10.1371/journal.pcbi.1008696] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 05/12/2020] [Revised: 03/01/2021] [Accepted: 01/12/2021] [Indexed: 02/06/2023] Open
Abstract
Hepatocellular carcinoma (HCC) is a significant health problem worldwide with poor prognosis. Drug repositioning represents a profitable strategy to accelerate drug discovery in the treatment of HCC. In this study, we developed a new approach for predicting therapeutic drugs for HCC based on tissue-specific pathways and identified three newly predicted drugs that are likely to be therapeutic drugs for the treatment of HCC. We validated these predicted drugs by analyzing their overlapping drug indications reported in PubMed literature. By using the cancer cell line data in the database, we constructed a Connectivity Map (CMap) profile similarity analysis and KEGG enrichment analysis on their related genes. By experimental validation, we found that securinine and ajmaline significantly inhibited cell viability of HCC cells and induced apoptosis. Of the two, securinine has lower toxicity to a normal liver cell line and is worthy of further research. Our results suggested that the proposed approach was effective and accurate for discovering novel therapeutic options for HCC. This method could also be used to indicate unmarked drug-disease associations in the Comparative Toxicogenomics Database. Meanwhile, our method could also be applied to predict the potential drugs for other types of tumors by changing the database.
Affiliation(s)
- Liang Yu
- School of Computer Science and Technology, Xidian University, Shaanxi, China
| | - Meng Wang
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Advanced Medical Research Institute, Shandong University, 72, Jimo District, Qingdao, Shandong, China
| | - Yang Yang
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Advanced Medical Research Institute, Shandong University, 72, Jimo District, Qingdao, Shandong, China
| | - Fengdan Xu
- School of Computer Science and Technology, Xidian University, Shaanxi, China
| | - Xu Zhang
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Advanced Medical Research Institute, Shandong University, 72, Jimo District, Qingdao, Shandong, China
| | - Fei Xie
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Advanced Medical Research Institute, Shandong University, 72, Jimo District, Qingdao, Shandong, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Shaanxi, China
| | - Xiangzhi Li
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Advanced Medical Research Institute, Shandong University, 72, Jimo District, Qingdao, Shandong, China
|
20
|
Nussinov R, Jang H, Nir G, Tsai CJ, Cheng F. A new precision medicine initiative at the dawn of exascale computing. Signal Transduct Target Ther 2021; 6:3. [PMID: 33402669 PMCID: PMC7785737 DOI: 10.1038/s41392-020-00420-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 08/14/2020] [Revised: 10/27/2020] [Accepted: 10/30/2020] [Indexed: 12/14/2022] Open
Abstract
Which signaling pathway and protein to select to mitigate the patient's expected drug resistance? The number of possibilities facing the physician is massive, and the drug combination should fit the patient status. Here, we briefly review current approaches and data and map an innovative patient-specific strategy to forecast drug resistance targets that centers on parallel (or redundant) proliferation pathways in specialized cells. It considers the availability of each protein in each pathway in the specific cell, its activating mutations, and the chromatin accessibility of its encoding gene. The construction of the resulting Proliferation Pathway Network Atlas will harness the emerging exascale computing and advanced artificial intelligence (AI) methods for therapeutic development. Merging the resulting set of targets, pathways, and proteins, with current strategies will augment the choice for the attending physicians to thwart resistance.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD, 21702, USA.
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, 69978, Israel.
| | - Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD, 21702, USA
| | - Guy Nir
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA
- Department of Biochemistry & Molecular Biology, Department of Neuroscience, Cell Biology and Anatomy, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, TX, 77555, USA
| | - Chung-Jung Tsai
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD, 21702, USA
| | - Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44106, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, 44195, USA
|
21
|
Reyna MA, Chitra U, Elyanow R, Raphael BJ. NetMix: A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks. J Comput Biol 2021; 28:469-484. [PMID: 33400606 DOI: 10.1089/cmb.2020.0435] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022] Open
Abstract
A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared with other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions that we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, explaining the large subnetworks output by jActiveModules. Based on these insights, we introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.
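To make the mixture-model idea concrete, the sketch below fits a two-component Gaussian mixture to per-gene scores by expectation-maximization and reports the estimated fraction of altered genes. It is a simplified illustration rather than the NetMix algorithm: it ignores the network entirely, and the unit-variance components, the fixed background mean of 0, and the function names are assumptions made here.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of a unit-variance normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def fit_altered_fraction(scores, n_iter=200):
    """EM for the mixture alpha * N(mu, 1) + (1 - alpha) * N(0, 1).

    Returns (alpha, mu): the estimated fraction of altered genes and the
    estimated mean score of the altered component.
    """
    alpha, mu = 0.5, max(scores)  # crude initialisation
    for _ in range(n_iter):
        # E-step: posterior probability that each score is "altered".
        resp = []
        for x in scores:
            altered = alpha * normal_pdf(x, mu)
            background = (1.0 - alpha) * normal_pdf(x, 0.0)
            resp.append(altered / (altered + background))
        # M-step: update the mixing weight and the altered mean.
        total = sum(resp)
        alpha = total / len(scores)
        mu = sum(r * x for r, x in zip(resp, scores)) / total
    return alpha, mu


def simulate_scores(n_genes, n_altered, mu, seed):
    """Synthetic scores: n_altered genes from N(mu, 1), the rest from N(0, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(mu if i < n_altered else 0.0, 1.0) for i in range(n_genes)]


scores = simulate_scores(n_genes=1000, n_altered=100, mu=3.0, seed=0)
alpha_hat, mu_hat = fit_altered_fraction(scores)
estimated_size = round(alpha_hat * len(scores))
```

Estimating the number of altered genes from the mixture weight first, and only then searching for a subnetwork of about that size, is one way to read the abstract's point about avoiding the overly large subnetworks produced by a biased estimator.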
Affiliation(s)
- Matthew A Reyna
- Department of Biomedical Informatics, Emory University, Atlanta, Georgia, USA
| | - Uthsav Chitra
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
| | - Rebecca Elyanow
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Department of Computer Science, Brown University, Providence, Rhode Island, USA
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
|
22
|
Fadaka AO, Sibuyi NRS, Madiehe AM, Meyer M. MicroRNA-based regulation of Aurora A kinase in breast cancer. Oncotarget 2020; 11:4306-4324. [PMID: 33245732 PMCID: PMC7679040 DOI: 10.18632/oncotarget.27811] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/18/2020] [Accepted: 10/27/2020] [Indexed: 02/07/2023] Open
Abstract
The involvement of non-coding RNAs (ncRNAs) in cellular physiology and disease pathogenesis has become increasingly relevant in recent years, specifically in cancer research. Breast cancer (BC) has become a health concern and accounts for most of the cancer-related incidences and mortalities reported amongst females. In spite of the presence of promising tools for BC therapy, the mortality rate of metastatic BC cases is still high. Therefore, the genomic exploration of the BC subtype and the use of ncRNAs for possible regulation is pivotal. The expression and prognostic values of the AURKA gene were assessed by Oncomine, GEPIA, KM-plotter, and bc-GenExMiner v4.4. Associated proteins and functional enrichment were evaluated by Cytoscape and DAVID databases. Additionally, a molecular docking approach was employed to investigate the regulatory role of hsa-miR-32-3p assisted argonaute (AGO) protein of the AURKA gene in BC. The AURKA gene was highly expressed in patients with BC relative to normal counterparts and significantly correlated with poor survival. The docking result suggested that AURKA could be regulated by hsa-miR-32-3p as confirmed by the reported binding energy and specific interactions. The study gives some insights into the role of AURKA and its regulation by microRNAs through AGO protein. It also provides exciting opportunities for cancer therapeutic intervention.
Affiliation(s)
- Adewale Oluwaseun Fadaka
- Department of Science and Innovation/Mintek Nanotechnology Innovation Centre, Biolabels Node, Department of Biotechnology, Faculty of Natural Sciences, University of the Western Cape, Bellville, South Africa
| | - Nicole Remaliah Samantha Sibuyi
- Department of Science and Innovation/Mintek Nanotechnology Innovation Centre, Biolabels Node, Department of Biotechnology, Faculty of Natural Sciences, University of the Western Cape, Bellville, South Africa
| | - Abram Madimabe Madiehe
- Department of Science and Innovation/Mintek Nanotechnology Innovation Centre, Biolabels Node, Department of Biotechnology, Faculty of Natural Sciences, University of the Western Cape, Bellville, South Africa.
- Nanobiotechnology Research Group, Department of Biotechnology, Faculty of Natural Sciences, University of the Western Cape, Bellville, South Africa
| | - Mervin Meyer
- Department of Science and Innovation/Mintek Nanotechnology Innovation Centre, Biolabels Node, Department of Biotechnology, Faculty of Natural Sciences, University of the Western Cape, Bellville, South Africa
|
23
|
Ahmed R, Baali I, Erten C, Hoxha E, Kazan H. MEXCOwalk: mutual exclusion and coverage based random walk to identify cancer modules. Bioinformatics 2020; 36:872-879. [PMID: 31432076 DOI: 10.1093/bioinformatics/btz655] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/25/2019] [Revised: 07/03/2019] [Accepted: 08/18/2019] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Genomic analyses from large cancer cohorts have revealed the mutational heterogeneity problem which hinders the identification of driver genes based only on mutation profiles. One way to tackle this problem is to incorporate the fact that genes act together in functional modules. The connectivity knowledge present in existing protein-protein interaction (PPI) networks together with mutation frequencies of genes and the mutual exclusivity of cancer mutations can be utilized to increase the accuracy of identifying cancer driver modules. RESULTS We present a novel edge-weighted random walk-based approach that incorporates connectivity information in the form of protein-protein interactions (PPIs), mutual exclusivity and coverage to identify cancer driver modules. MEXCOwalk outperforms several state-of-the-art computational methods on TCGA pan-cancer data in terms of recovering known cancer genes, providing modules that are capable of classifying normal and tumor samples and that are enriched for mutations in specific cancer types. Furthermore, the risk scores determined with output modules can stratify patients into low-risk and high-risk groups in multiple cancer types. MEXCOwalk identifies modules containing both well-known cancer genes and putative cancer genes that are rarely mutated in the pan-cancer data. The data, the source code and useful scripts are available at: https://github.com/abu-compbio/MEXCOwalk. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
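The two quantities combined in the abstract, mutual exclusivity and coverage, can be computed for a candidate gene module from a gene-to-samples mapping. The definitions below (coverage as the fraction of samples hit by at least one gene of the module, exclusivity as covered samples divided by total mutation count) are one common formulation and are assumed here rather than taken from the MEXCOwalk source code; gene and sample names are hypothetical.

```python
def coverage(module, gene_samples, n_samples):
    """Fraction of all samples with a mutation in at least one module gene."""
    covered = set().union(*(gene_samples[g] for g in module))
    return len(covered) / n_samples


def mutual_exclusivity(module, gene_samples):
    """Covered samples divided by the total number of gene-sample mutations.

    Equals 1.0 when no sample carries mutations in two genes of the module
    and decreases as mutations overlap.
    """
    covered = set().union(*(gene_samples[g] for g in module))
    total = sum(len(gene_samples[g]) for g in module)
    return len(covered) / total


# Toy cohort of 10 samples.
gene_samples = {"A": {1, 2}, "B": {3, 4}, "C": {4, 5}}
exclusive_module = ["A", "B"]    # no shared samples
overlapping_module = ["B", "C"]  # sample 4 is hit twice

me_exclusive = mutual_exclusivity(exclusive_module, gene_samples)
me_overlap = mutual_exclusivity(overlapping_module, gene_samples)
cov_exclusive = coverage(exclusive_module, gene_samples, n_samples=10)
```

Scores of this kind could then serve as edge weights for a random walk over a PPI network, which is the role the abstract assigns to mutual exclusivity and coverage.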
Affiliation(s)
- Rafsan Ahmed
- Electrical and Computer Engineering Graduate Program, Department of Computer Engineering, Antalya Bilim University, Antalya 07190, Turkey
| | - Ilyes Baali
- Electrical and Computer Engineering Graduate Program, Department of Computer Engineering, Antalya Bilim University, Antalya 07190, Turkey
| | - Cesim Erten
- Department of Computer Engineering, Antalya Bilim University, Antalya 07190, Turkey
| | - Evis Hoxha
- Department of Computer Engineering, Antalya Bilim University, Antalya 07190, Turkey
| | - Hilal Kazan
- Department of Computer Engineering, Antalya Bilim University, Antalya 07190, Turkey
|
24
|
Abstract
Breast cancer is one of the most common cancers worldwide, which makes it a very impactful malignancy in society. Breast cancers can be classified through different systems based on the main tumor features and gene, protein, and cell receptor expression, which will determine the most advisable therapeutic course and expected outcomes. Multiple therapeutic options have already been proposed and implemented for breast cancer treatment. Nonetheless, their use and efficacy still greatly depend on the tumor classification, and treatments are commonly associated with invasiveness, pain, discomfort, severe side effects, and poor specificity. This has demanded an investment in the research of the mechanisms behind the disease progression, evolution, and associated risk factors, and on novel diagnostic and therapeutic techniques. However, advances in the understanding and assessment of breast cancer are dependent on the ability to mimic the properties and microenvironment of tumors in vivo, which can be achieved through experimentation on animal models. This review covers an overview of the main animal models used in breast cancer research, namely in vitro models, in vivo models, in silico models, and other models. For each model, the main characteristics, advantages, and challenges associated with their use are highlighted.
|
25
|
Cutigi JF, Evangelista AF, Simao A. Approaches for the identification of driver mutations in cancer: A tutorial from a computational perspective. J Bioinform Comput Biol 2020; 18:2050016. [PMID: 32698724 DOI: 10.1142/s021972002050016x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/31/2022]
Abstract
Cancer is a complex disease caused by the accumulation of genetic alterations during the individual's life. Such alterations are called genetic mutations and can be divided into two groups: (1) Passenger mutations, which are not responsible for cancer and (2) Driver mutations, which are significant for cancer and responsible for its initiation and progression. Cancer cells undergo a large number of mutations, of which most are passengers, and few are drivers. The identification of driver mutations is a key point and one of the biggest challenges in Cancer Genomics. Many computational methods for such a purpose have been developed in Cancer Bioinformatics. Such computational methods are complex and are usually described in a high level of abstraction. This tutorial details some classical computational methods, from a computational perspective, with the transcription in an algorithmic format towards an easy access by researchers.
Affiliation(s)
- Jorge Francisco Cutigi
- Federal Institute of São Paulo (IFSP), São Carlos, SP, Brazil.
- University of São Paulo (USP), São Carlos, SP, Brazil
|
26
|
Fernández-Martínez JL, Álvarez-Machancoses Ó, deAndrés-Galiana EJ, Bea G, Kloczkowski A. Robust Sampling of Defective Pathways in Alzheimer's Disease. Implications in Drug Repositioning. Int J Mol Sci 2020; 21:ijms21103594. [PMID: 32438758 PMCID: PMC7279419 DOI: 10.3390/ijms21103594] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/27/2020] [Revised: 05/09/2020] [Accepted: 05/13/2020] [Indexed: 12/21/2022] Open
Abstract
We present the analysis of the defective genetic pathways of the Late-Onset Alzheimer’s Disease (LOAD) compared to the Mild Cognitive Impairment (MCI) and Healthy Controls (HC) using different sampling methodologies. These algorithms sample the uncertainty space that is intrinsic to any kind of highly underdetermined phenotype prediction problem, by looking for the minimum-scale signatures (header genes) corresponding to different random holdouts. The biological pathways can be identified by performing posterior analysis of these signatures established via cross-validation holdouts and plugging the set of most frequently sampled genes into different ontological platforms. That way, the effect of helper genes, whose presence might be due to the high degree of underdeterminacy of these experiments and data noise, is reduced. Our results suggest that common pathways for Alzheimer’s disease and MCI are mainly related to viral mRNA translation, influenza viral RNA transcription and replication, gene expression, mitochondrial translation, and metabolism, with these results being highly consistent regardless of the comparative methods. The cross-validated predictive accuracies achieved for the LOAD and MCI discriminations were 84% and 81.5%, respectively. The difference between LOAD and MCI could not be clearly established (74% accuracy). The most discriminatory genes of the LOAD-MCI discrimination are associated with proteasome mediated degradation and G-protein signaling. Based on these findings we have also performed drug repositioning using Dr. Insight package, proposing the following different typologies of drugs: isoquinoline alkaloids, antitumor antibiotics, phosphoinositide 3-kinase PI3K, autophagy inhibitors, antagonists of the muscarinic acetylcholine receptor and histone deacetylase inhibitors. We believe that the potential clinical relevance of these findings should be further investigated and confirmed with other independent studies.
Affiliation(s)
- Juan Luis Fernández-Martínez
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C/Federico García Lorca, 18, 33007 Oviedo, Spain
- DeepBioInsights, C/Federico García Lorca, 18, 33007 Oviedo, Spain
| | - Óscar Álvarez-Machancoses
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C/Federico García Lorca, 18, 33007 Oviedo, Spain
- DeepBioInsights, C/Federico García Lorca, 18, 33007 Oviedo, Spain
| | - Enrique J. deAndrés-Galiana
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C/Federico García Lorca, 18, 33007 Oviedo, Spain
- Department of Informatics and Computer Science, University of Oviedo, C/Federico García Lorca, 18, 33007 Oviedo, Spain
| | - Guillermina Bea
- Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C/Federico García Lorca, 18, 33007 Oviedo, Spain
| | - Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Department of Pediatrics, The Ohio State University, Columbus, OH 43205, USA
|
27
|
Bokhari Y, Alhareeri A, Arodz T. QuaDMutNetEx: a method for detecting cancer driver genes with low mutation frequency. BMC Bioinformatics 2020; 21:122. [PMID: 32293263 PMCID: PMC7092414 DOI: 10.1186/s12859-020-3449-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/15/2019] [Accepted: 03/10/2020] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND Cancer is caused by genetic mutations, but not all somatic mutations in human DNA drive the emergence or growth of cancers. While many frequently-mutated cancer driver genes have already been identified and are being utilized for diagnostic, prognostic, or therapeutic purposes, identifying driver genes that harbor mutations occurring with low frequency in human cancers is an ongoing endeavor. Typically, mutations that do not confer growth advantage to tumors - passenger mutations - dominate the mutation landscape of tumor cell genome, making identification of low-frequency driver mutations a challenge. The leading approach for discovering new putative driver genes involves analyzing patterns of mutations in large cohorts of patients and using statistical methods to discriminate driver from passenger mutations. RESULTS We propose a novel cancer driver gene detection method, QuaDMutNetEx. QuaDMutNetEx discovers cancer drivers with low mutation frequency by giving preference to genes encoding proteins that are connected in human protein-protein interaction networks, and that at the same time show low deviation from the mutual exclusivity pattern that characterizes driver mutations occurring in the same pathway or functional gene group across a cohort of cancer samples. CONCLUSIONS Evaluation of QuaDMutNetEx on four different tumor sample datasets shows that the proposed method finds biologically-connected sets of low-frequency driver genes, including many genes that are not found if the network connectivity information is not considered. Improved quality and interpretability of the discovered putative driver gene sets compared to existing methods shows that QuaDMutNetEx is a valuable new tool for detecting driver genes. QuaDMutNetEx is available for download from https://github.com/bokhariy/QuaDMutNetEx under the GNU GPLv3 license.
Affiliation(s)
- Yahya Bokhari
- Department of Computer Science, College of Engineering, Virginia Commonwealth University, 401 W. Main St., Richmond, VA 23284, USA
- Department of Biostatistics and Bioinformatics, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia
- King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
| | - Areej Alhareeri
- College of Applied Medical Sciences, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
- King Abdullah International Medical Research Center, Riyadh, Saudi Arabia
| | - Tomasz Arodz
- Department of Computer Science, College of Engineering, Virginia Commonwealth University, 401 W. Main St., Richmond, VA 23284, USA.
|
28
|
Li F, Gao L, Wang B. Detection of Driver Modules with Rarely Mutated Genes in Cancers. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:390-401. [PMID: 29994261 DOI: 10.1109/tcbb.2018.2846262] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
Identifying driver modules or pathways is a key challenge to interpret the molecular mechanisms and pathogenesis underlying cancer. An increasing number of studies suggest that rarely mutated genes are important for the development of cancer. However, the driver modules consisting of mutated genes with low-frequency driver mutations are not well characterized. To identify driver modules with rarely mutated genes, we propose a functional similarity index to quantify the functional relationship between rarely mutated genes and other ones in the same module. Then, we develop a method to detect Driver Modules with Rarely mutated Genes (DMRG) by incorporating the functional similarity, coverage and mutual exclusivity. By applying DMRG on TCGA cancer dataset on three networks: HINT+HI2012, iRefIndex and MultiNet, we detect driver modules intersecting with the well-known signalling pathways and protein complexes, such as the cell cycle pathway and the mediator complex. DMRG can also detect driver modules effectively with 20, 40, 60 and 80 percent of samples by random selection. When compared with HotNet2, DMRG detects more rarely mutated cancer genes and has higher pathway enrichment. Overall, DMRG provides an effective method for the identification of driver modules with rarely mutated genes.
|
29
|
Pham VVH, Liu L, Bracken CP, Goodall GJ, Long Q, Li J, Le TD. CBNA: A control theory based method for identifying coding and non-coding cancer drivers. PLoS Comput Biol 2019; 15:e1007538. [PMID: 31790386 PMCID: PMC6907873 DOI: 10.1371/journal.pcbi.1007538] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/25/2019] [Revised: 12/12/2019] [Accepted: 11/12/2019] [Indexed: 02/06/2023] Open
Abstract
A key task in cancer genomics research is to identify cancer driver genes. As these genes initialise and progress cancer, understanding them is critical in designing effective cancer interventions. Although there are several methods developed to discover cancer drivers, most of them only identify coding drivers. However, non-coding RNAs can regulate driver mutations to develop cancer. Hence, novel methods are required to reveal both coding and non-coding cancer drivers. In this paper, we develop a novel framework named Controllability based Biological Network Analysis (CBNA) to uncover coding and non-coding cancer drivers (i.e. miRNA cancer drivers). CBNA integrates different genomic data types, including gene expression, gene network, mutation data, and contains a two-stage process: (1) Building a network for a condition (e.g. cancer condition) and (2) Identifying drivers. The application of CBNA to the BRCA dataset demonstrates that it is more effective than the existing methods in detecting coding cancer drivers. In addition, CBNA also predicts 17 miRNA drivers for breast cancer. Some of these predicted miRNA drivers have been validated by literature and the rest can be good candidates for wet-lab validation. We further use CBNA to detect subtype-specific cancer drivers and several predicted drivers have been confirmed to be related to breast cancer subtypes. Another application of CBNA is to discover epithelial-mesenchymal transition (EMT) drivers. Of the predicted EMT drivers, 7 coding and 6 miRNA drivers are in the known EMT gene lists. Cancer is a disease of cells in human body and it causes a high rate of deaths worldwide. There has been evidence that coding and non-coding RNAs are key players in the initialisation and progression of cancer. These coding and non-coding RNAs are considered as cancer drivers. 
To design better diagnostic and therapeutic plans for cancer patients, we need to know the roles of cancer drivers in cancer development as well as their regulatory mechanisms in the human body. In this study, we propose a novel framework to identify coding and non-coding cancer drivers (i.e. miRNA cancer drivers). The proposed framework is applied to the breast cancer dataset for identifying drivers of breast cancer. Comparing our method with existing methods in predicting coding cancer drivers, our method shows a better performance. Several miRNA cancer drivers predicted by our method have already been validated by literature. The predicted cancer drivers by our method could be a potential source for further wet-lab experiments to discover the causes of cancer. In addition, the proposed method can be used to detect drivers of cancer subtypes and drivers of the epithelial-mesenchymal transition in cancer.
Affiliation(s)
- Vu V. H. Pham
- School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia
| | - Lin Liu
- School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia
| | - Cameron P. Bracken
- Centre for Cancer Biology, an alliance of SA Pathology and University of South Australia, Adelaide, Australia
- Department of Medicine, The University of Adelaide, Adelaide, Australia
| | - Gregory J. Goodall
- Centre for Cancer Biology, an alliance of SA Pathology and University of South Australia, Adelaide, Australia
- Department of Medicine, The University of Adelaide, Adelaide, Australia
| | - Qi Long
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Jiuyong Li
- School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia
- * E-mail: (JL); (TL)
| | - Thuc D. Le
- School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia
- * E-mail: (JL); (TL)
|
30
|
Melloy PG. The anaphase-promoting complex: A key mitotic regulator associated with somatic mutations occurring in cancer. Genes Chromosomes Cancer 2019; 59:189-202. [PMID: 31652364 DOI: 10.1002/gcc.22820] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/24/2019] [Revised: 10/18/2019] [Accepted: 10/22/2019] [Indexed: 12/14/2022] Open
Abstract
The anaphase-promoting complex/cyclosome (APC/C) is an E3 ubiquitin ligase that helps control chromosome separation and exit from mitosis in many different kinds of organisms, including yeast, flies, worms, and humans. This review represents a new perspective on the connection between APC/C subunit mutations and cancer. The complex nature of APC/C and limited mutation analysis of its subunits has made it difficult to determine the relationship of each subunit to cancer. In this work, cancer genomic data were examined to identify APC/C subunits with a greater than 5% alteration frequency in 11 representative cancers using the cBioPortal database. Using the Genetic Determinants of Cancer Patient Survival database, APC/C subunits were also studied and found to be significantly associated with poor patient prognosis in several cases. In comparing these two kinds of cancer genomics data to published large-scale genomic analyses looking for cancer driver genes, ANAPC1 and ANAPC3/CDC27 stood out as being represented in all three types of analyses. Seven other subunits were found to be associated both with >5% alteration frequency in certain cancers and being associated with an effect on cancer patient prognosis. The aim of this review is to provide new approaches for investigators conducting in vivo studies of APC/C subunits and cancer progression. In turn, a better understanding of these APC/C subunits and their role in different cancers will help scientists design drugs that are more precisely targeted to certain cancers, using APC/C mutation status as a biomarker.
Affiliation(s)
- Patricia G Melloy
- Department of Biological and Allied Health Sciences, Fairleigh Dickinson University, Madison, New Jersey
|
31
|
Nussinov R, Tsai CJ, Jang H. Why Are Some Driver Mutations Rare? Trends Pharmacol Sci 2019; 40:919-929. [PMID: 31699406 DOI: 10.1016/j.tips.2019.10.003] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/16/2019] [Revised: 10/09/2019] [Accepted: 10/10/2019] [Indexed: 12/13/2022]
Abstract
Understanding why driver mutations that promote cancer are sometimes rare is important for precision medicine since it would help in their identification. Driver mutations are largely discovered through their frequencies. Thus, rare mutations often escape detection. Unlike high-frequency drivers, low-frequency drivers can be tissue specific; rare drivers have extremely low frequencies. Here, we discuss rare drivers and strategies to discover them. We suggest that allosteric driver mutations shift the protein ensemble from the inactive to the active state. Rare allosteric drivers are statistically rare since, to switch the protein functional state, they cooperate with additional mutations, and these are not considered in the patient cancer-specific protein sequence analysis. A complete landscape of mutations that drive cancer will reveal tumor-specific therapeutic vulnerabilities.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel.
| | - Chung-Jung Tsai
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Hyunbum Jang
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
|
32
|
Dimitrakopoulos C, Hindupur SK, Häfliger L, Behr J, Montazeri H, Hall MN, Beerenwinkel N. Network-based integration of multi-omics data for prioritizing cancer genes. Bioinformatics 2019; 34:2441-2448. [PMID: 29547932 PMCID: PMC6041755 DOI: 10.1093/bioinformatics/bty148] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Received: 11/28/2017] [Accepted: 03/13/2018] [Indexed: 12/21/2022] Open
Abstract
Motivation Several molecular events are known to be cancer-related, including genomic aberrations, hypermethylation of gene promoter regions and differential expression of microRNAs. These aberration events are very heterogeneous across tumors and it is poorly understood how they affect the molecular makeup of the cell, including the transcriptome and proteome. Protein interaction networks can help decode the functional relationship between aberration events and changes in gene and protein expression. Results We developed NetICS (Network-based Integration of Multi-omics Data), a new graph diffusion-based method for prioritizing cancer genes by integrating diverse molecular data types on a directed functional interaction network. NetICS prioritizes genes by their mediator effect, defined as the proximity of the gene to upstream aberration events and to downstream differentially expressed genes and proteins in an interaction network. Genes are prioritized for individual samples separately and integrated using a robust rank aggregation technique. NetICS provides a comprehensive computational framework that can aid in explaining the heterogeneity of aberration events by their functional convergence to common differentially expressed genes and proteins. We demonstrate NetICS’ competitive performance in predicting known cancer genes and in generating robust gene lists using TCGA data from five cancer types. Availability and implementation NetICS is available at https://github.com/cbg-ethz/netics. Supplementary information Supplementary data are available at Bioinformatics online.
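To make the graph-diffusion idea in this abstract concrete, here is a minimal sketch of a random walk with restart on a small directed network. It is not the NetICS implementation: the restart probability, the handling of nodes without outgoing edges, the function name `diffuse`, and the toy node names are all assumptions chosen for illustration, and NetICS itself combines diffusion from upstream aberrations and downstream expression changes, which this one-directional sketch does not do.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (source, target)


def diffuse(edges: Iterable[Edge], seeds: Set[str],
            restart: float = 0.4, iters: int = 100) -> Dict[str, float]:
    """Random walk with restart from `seeds` (hypothetical helper).

    At each step a fraction `restart` of the score returns to the seed
    nodes and the remainder flows along outgoing edges. Score held by
    nodes without outgoing edges is sent back to the seeds, so the
    total score always sums to 1.
    """
    edge_list = list(edges)
    nodes: List[str] = sorted({n for e in edge_list for n in e} | set(seeds))
    out: Dict[str, List[str]] = {n: [] for n in nodes}
    for u, v in edge_list:
        out[u].append(v)

    # Uniform starting distribution over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            flow = (1.0 - restart) * p[u]
            if out[u]:
                share = flow / len(out[u])
                for v in out[u]:
                    nxt[v] += share
            else:
                # Dangling node: return its flow to the seeds.
                for n in nodes:
                    nxt[n] += flow * p0[n]
        p = nxt
    return p


if __name__ == "__main__":
    toy_edges = [("A", "B"), ("B", "C"), ("A", "D")]
    scores = diffuse(toy_edges, seeds={"A"})
    for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(gene, round(score, 3))
```

In this toy network the seed keeps the highest score and nodes farther downstream receive less, which is the proximity notion that diffusion-based prioritization relies on.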
Affiliation(s)
- Christos Dimitrakopoulos
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | | | - Luca Häfliger
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
| | - Jonas Behr
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Hesam Montazeri
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | | | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel, Switzerland
|
33
|
Perez-Romero CA, Weytjens B, Decap D, Swings T, Michiels J, De Maeyer D, Marchal K. IAMBEE: a web-service for the identification of adaptive pathways from parallel evolved clonal populations. Nucleic Acids Res 2019; 47:W151-W157. [PMID: 31127271 PMCID: PMC6602435 DOI: 10.1093/nar/gkz451] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/11/2019] [Revised: 05/02/2019] [Accepted: 05/10/2019] [Indexed: 11/18/2022] Open
Abstract
IAMBEE is a web server designed for the Identification of Adaptive Mutations in Bacterial Evolution Experiments (IAMBEE). Input data consist of genotype information obtained from independently evolved clonal populations or strains that show the same adapted behavior (phenotype). To distinguish adaptive from passenger mutations, IAMBEE searches for neighborhoods in an organism-specific interaction network that are recurrently mutated in the adapted populations. This search for recurrently mutated network neighborhoods, as proxies for pathways is driven by additional information on the functional impact of the observed genetic changes and their dynamics during adaptive evolution. In addition, the search explicitly accounts for the differences in mutation rate between the independently evolved populations. Using this approach, IAMBEE allows exploiting parallel evolution to identify adaptive pathways. The web-server is freely available at http://bioinformatics.intec.ugent.be/iambee/ with no login requirement.
Affiliation(s)
- Camilo Andres Perez-Romero
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.,Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
| | - Bram Weytjens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.,Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
| | - Dries Decap
- Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
| | - Toon Swings
- VIB Center for Microbiology, Flanders Institute for Biotechnology, Leuven, Belgium.,Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium.,VIB Technology Watch, Flanders Institute for Biotechnology, Ghent, Belgium
| | - Jan Michiels
- VIB Center for Microbiology, Flanders Institute for Biotechnology, Leuven, Belgium.,Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium
| | - Dries De Maeyer
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.,Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
| | - Kathleen Marchal
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.,Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
|
34
|
Larmuseau M, Verbeke LPC, Marchal K. Associating expression and genomic data using co-occurrence measures. Biol Direct 2019; 14:10. [PMID: 31072345 PMCID: PMC6507230 DOI: 10.1186/s13062-019-0240-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/16/2018] [Accepted: 04/10/2019] [Indexed: 12/11/2022] Open
Abstract
Recent technological evolutions have led to an exponential increase in data in all the omics fields. It is expected that integration of these different data sources will drastically enhance our knowledge of the biological mechanisms behind genomic diseases such as cancer. However, the integration of different omics data still remains a challenge. In this work we propose an intuitive workflow for the integrative analysis of expression, mutation and copy number data taken from the METABRIC study on breast cancer. First, we present evidence that the expression profile of many important breast cancer genes consists of two modes or ‘regimes’, which contain important clinical information. Then, we show how the co-occurrence of these expression regimes can be used as an association measure between genes and validate our findings on the TCGA-BRCA study. Finally, we demonstrate how these co-occurrence measures can also be applied to link expression regimes to genomic aberrations, providing a more complete, integrative view on breast cancer. As a case study, an integrative analysis of the identified MLPH-FOXA1 association is performed, illustrating that the obtained expression associations are intimately linked to the underlying genomic changes. Reviewers This article was reviewed by Dirk Walther, Francisco Garcia and Isabel Nepomuceno. Electronic supplementary material The online version of this article (10.1186/s13062-019-0240-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Maarten Larmuseau
- Department of Information Technology, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium
| | - Lieven P C Verbeke
- Department of Plant Biotechnology and Bioinformatics, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium
| | - Kathleen Marchal
- Department of Plant Biotechnology and Bioinformatics, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium.
|
35
|
Review: Precision medicine and driver mutations: Computational methods, functional assays and conformational principles for interpreting cancer drivers. PLoS Comput Biol 2019; 15:e1006658. [PMID: 30921324 PMCID: PMC6438456 DOI: 10.1371/journal.pcbi.1006658] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Indexed: 12/19/2022] Open
Abstract
At the root of the so-called precision medicine or precision oncology, which is our focus here, is the hypothesis that cancer treatment would be considerably better if therapies were guided by a tumor’s genomic alterations. This hypothesis has sparked major initiatives focusing on whole-genome and/or exome sequencing, creation of large databases, and developing tools for their statistical analyses—all aspiring to identify actionable alterations, and thus molecular targets, in a patient. At the center of the massive amount of collected sequence data is their interpretations that largely rest on statistical analysis and phenotypic observations. Statistics is vital, because it guides identification of cancer-driving alterations. However, statistics of mutations do not identify a change in protein conformation; therefore, it may not define sufficiently accurate actionable mutations, neglecting those that are rare. Among the many thematic overviews of precision oncology, this review innovates by further comprehensively including precision pharmacology, and within this framework, articulating its protein structural landscape and consequences to cellular signaling pathways. It provides the underlying physicochemical basis, thereby also opening the door to a broader community.
|
36
|
Identifying Cancer Specific Driver Modules Using a Network-Based Method. Molecules 2018; 23:molecules23051114. [PMID: 29738475 PMCID: PMC6100049 DOI: 10.3390/molecules23051114] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/12/2018] [Revised: 04/26/2018] [Accepted: 05/07/2018] [Indexed: 02/01/2023] Open
Abstract
Detecting driver modules is a key challenge for understanding the mechanisms of carcinogenesis at the pathway level. Identifying cancer specific driver modules is helpful for interpreting the different principles of different cancer types. However, most methods are proposed to identify driver modules in one cancer, and few methods have been introduced to detect cancer specific driver modules. We propose a network-based method to detect cancer specific driver modules (CSDM) in a certain cancer type relative to other cancer types. We construct the specific network of a cancer by combining specific coverage and mutual exclusivity in all cancer types, to capture the specificity of the cancer at the pathway level. To illustrate the performance of the method, we apply CSDM on 12 TCGA cancer types. When we compare CSDM with SpeMDP and HotNet2 with regard to specific coverage and the enrichment of GO terms and KEGG pathways, CSDM is more accurate. We find that the specific driver modules of two different cancers have little overlap, which indicates that the driver modules detected by CSDM are specific. Finally, we also analyze three specific driver modules of BRCA, BLCA, and LAML intersecting with well-known pathways. The source code of CSDM is freely accessible at https://github.com/fengli28/CSDM.git.
|
37
|
Jean-Quartier C, Jeanquartier F, Jurisica I, Holzinger A. In silico cancer research towards 3R. BMC Cancer 2018; 18:408. [PMID: 29649981 PMCID: PMC5897933 DOI: 10.1186/s12885-018-4302-0] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 03/29/2017] [Accepted: 03/26/2018] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND Improving our understanding of cancer and other complex diseases requires integrating diverse data sets and algorithms. Intertwining in vivo and in vitro data and in silico models is paramount to overcome intrinsic difficulties given by data complexity. Importantly, this approach also helps to uncover underlying molecular mechanisms. Over the years, research has introduced multiple biochemical and computational methods to study the disease, many of which require animal experiments. However, modeling systems and the comparison of cellular processes in both eukaryotes and prokaryotes help to understand specific aspects of uncontrolled cell growth, eventually leading to improved planning of future experiments. According to the principles for humane techniques, milestones in alternative animal testing involve in vitro methods such as cell-based models and microfluidic chips, as well as clinical tests of microdosing and imaging. To date, the range of alternative methods has expanded towards computational approaches, based on the use of information from past in vitro and in vivo experiments. In fact, in silico techniques are often underrated but can be vital to understanding fundamental processes in cancer. They can rival the accuracy of biological assays, and they can provide essential focus and direction to reduce experimental cost. MAIN BODY We give an overview of in vivo, in vitro and in silico methods used in cancer research. Common models such as cell lines, xenografts, or genetically modified rodents reflect relevant pathological processes to a different degree, but cannot replicate the full spectrum of human disease. There is an increasing importance of computational biology, advancing from the task of assisting biological analysis with network biology approaches as the basis for understanding a cell's functional organization up to model building for predictive systems. 
CONCLUSION Underlining and extending the in silico approach with respect to the 3Rs for replacement, reduction and refinement will lead cancer research towards efficient and effective precision medicine. Therefore, we suggest refined translational models and testing methods based on integrative analyses and the incorporation of computational biology within cancer research.
Affiliation(s)
- Claire Jean-Quartier
- Holzinger Group, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, Austria
| | - Fleur Jeanquartier
- Holzinger Group, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, Austria
- Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, Austria
| | - Igor Jurisica
- Krembil Research Institute, University Health Network; Depts. of Medical Bioph. and Comp. Sci., University of Toronto; Institute of Neuroimmunology, Slovak Academy of Sciences, Toronto, Canada
| | - Andreas Holzinger
- Holzinger Group, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, Austria
- Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, Austria
|
38
|
Al-Obaide MAI, Ibrahim BA, Al-Humaish S, Abdel-Salam ASG. Genomic and Bioinformatics Approaches for Analysis of Genes Associated With Cancer Risks Following Exposure to Tobacco Smoking. Front Public Health 2018; 6:84. [PMID: 29616208 PMCID: PMC5869936 DOI: 10.3389/fpubh.2018.00084] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/03/2017] [Accepted: 03/05/2018] [Indexed: 01/03/2023] Open
Abstract
Cancer is a significant health problem in the Middle East and the global population. It is well established that there is a direct link between tobacco smoking and cancer, which will continue to pose a significant threat to human health. The impact of long-term exposure to tobacco smoke on the risk of cancer encouraged the study of biomarkers for individuals vulnerable to tobacco smoking, especially children, who are more susceptible than adults to the action of environmental carcinogens. The carcinogens in tobacco smoke condensate induce DNA damage and play a significant role in determining the health and well-being of smokers, non-smokers, and primarily children. Cancer is a result of genomic and epigenomic malfunctions that lead to an initial premalignant condition. Although the premalignancy genetic cascade is a much-delayed process, it will end with adverse health consequences. In addition to the DNA damage and mutations, tobacco smoke can cause changes in the DNA methylation and gene expression associated with cancer. The genetic events hint at the possible use of genomic–epigenomic changes in genes related to cancer in predicting cancer risks associated with exposure to tobacco smoking. Bioinformatics provides indispensable tools to identify the cascade of expressed genes in active smokers and non-smokers and could assist the development of a framework to manage this cascade of events linked with the evolvement of disease including cancer. The aim of this mini review is to cognize the essential genomic processes and health risks associated with tobacco smoking and the implications of bioinformatics in cancer prediction, prevention, and intervention.
Affiliation(s)
- Mohammed A I Al-Obaide
- Department of Biomedical Science, School of Pharmacy, Texas Tech University Health Science Center, Amarillo, TX, United States
| | | | | | - Abdel-Salam G Abdel-Salam
- Department of Mathematics, Statistics and Physics, College of Arts and Sciences, Qatar University, Doha, Qatar
|
39
|
Dopazo J, Erten C. Graph-theoretical comparison of normal and tumor networks in identifying BRCA genes. BMC Systems Biology 2017; 11:110. [PMID: 29166896 PMCID: PMC5700672 DOI: 10.1186/s12918-017-0495-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/01/2017] [Accepted: 11/13/2017] [Indexed: 12/18/2022]
Abstract
BACKGROUND Identification of driver genes related to certain types of cancer is an important research topic. Several systems biology approaches have been suggested, in particular for the identification of breast cancer (BRCA) related genes. Such approaches usually rely on differential gene expression and/or mutational landscape data. In some cases interaction network data is also integrated to identify cancer-related modules computationally. RESULTS We provide a framework for the comparative graph-theoretical analysis of networks integrating the relevant gene expression, mutations, and protein-protein interaction network data. The comparisons involve a graph-theoretical analysis of normal and tumor network pairs across all instances of a given set of breast cancer samples. The network measures under consideration are based on appropriate formulations of various centrality measures: betweenness, clustering coefficients, degree centrality, random walk distances, graph-theoretical distances, and Jaccard index centrality. CONCLUSIONS Among all the studied centrality-based graph-theoretical properties, we show that a betweenness-based measure differentiates BRCA genes across all normal versus tumor network pairs better than the rest of the popular centrality-based measures. The AUROC and AUPR values of the gene lists ordered with respect to the measures under study as compared to NCBI BioSystems pathway and the COSMIC database of cancer genes are the largest with the betweenness-based differentiation, followed by the measure based on degree centrality. In order to test the robustness of the suggested measures in prioritizing cancer genes, we further tested the two most promising measures, those based on betweenness and degree centralities, on randomly rewired networks. We show that both measures are quite resilient to noise in the input interaction network. We also compared the same measures against a state-of-the-art alternative disease gene prioritization method, MUFFINN. 
We show that both our graph-theoretical measures outperform MUFFINN prioritizations in terms of ROC and precions/recall analysis. Finally, we filter the ordered list of the best measure, the betweenness-based differentiation, via a maximum-weight independent set formulation and investigate the top 50 genes in regards to literature verification. We show that almost all genes in the list are verified by the breast cancer literature and three genes are presented as novel genes that may potentialy be BRCA-related but missing in literature.
Affiliation(s)
- Joaquin Dopazo
- Clinical Bioinformatics Research Area, Fundación Progreso y Salud, Hospital Virgen del Rocío, Sevilla, Spain
| | - Cesim Erten
- Computer Engineering, Antalya Bilim University, Antalya, Turkey.
| |
|
40
|
Bokhari Y, Arodz T. QuaDMutEx: quadratic driver mutation explorer. BMC Bioinformatics 2017; 18:458. [PMID: 29065872 PMCID: PMC5655866 DOI: 10.1186/s12859-017-1869-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/21/2017] [Accepted: 10/16/2017] [Indexed: 12/22/2022]
Abstract
BACKGROUND Somatic mutations accumulate in human cells throughout life. Some may have no adverse consequences, but some of them may lead to cancer. A cancer genome is typically unstable, and thus more mutations can accumulate in the DNA of cancer cells. An ongoing problem is to figure out which mutations are drivers - play a role in oncogenesis, and which are passengers - do not play a role. One way of addressing this question is through inspection of somatic mutations in DNA of cancer samples from a cohort of patients and detection of patterns that differentiate driver from passenger mutations. RESULTS We propose QuaDMutEx, a method that incorporates three novel elements: a new gene set penalty that includes non-linear penalization of multiple mutations in putative sets of driver genes, an ability to adjust the method to handle slow- and fast-evolving tumors, and a computationally efficient method for finding gene sets that minimize the penalty, through a combination of heuristic Monte Carlo optimization and exact binary quadratic programming. Compared to existing methods, the proposed algorithm finds sets of putative driver genes that show higher coverage and lower excess coverage in eight sets of cancer samples coming from brain, ovarian, lung, and breast tumors. CONCLUSIONS Superior ability to improve on both coverage and excess coverage on different types of cancer shows that QuaDMutEx is a tool that should be part of a state-of-the-art toolbox in the driver gene discovery pipeline. It can detect genes harboring rare driver mutations that may be missed by existing methods. QuaDMutEx is available for download from https://github.com/bokhariy/QuaDMutEx under the GNU GPLv3 license.
Affiliation(s)
- Yahya Bokhari
- Department of Computer Science, School of Engineering, Virginia Commonwealth University, 401 W. Main St., Richmond, 23284, VA, USA
| | - Tomasz Arodz
- Department of Computer Science, School of Engineering, Virginia Commonwealth University, 401 W. Main St., Richmond, 23284, VA, USA; Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, 23284, VA, USA.
| |
|
41
|
Zhang W, Chien J, Yong J, Kuang R. Network-based machine learning and graph theory algorithms for precision oncology. NPJ Precis Oncol 2017; 1:25. [PMID: 29872707 PMCID: PMC5871915 DOI: 10.1038/s41698-017-0029-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 02/02/2017] [Revised: 06/28/2017] [Accepted: 06/29/2017] [Indexed: 01/07/2023]
Abstract
Network-based analytics plays an increasingly important role in precision oncology. Growing evidence in recent studies suggests that cancer can be better understood through mutated or dysregulated pathways or networks rather than individual mutations and that the efficacy of repositioned drugs can be inferred from disease modules in molecular networks. This article reviews network-based machine learning and graph theory algorithms for integrative analysis of personal genomic data and biomedical knowledge bases to identify tumor-specific molecular mechanisms, candidate targets and repositioned drugs for personalized treatment. The review focuses on the algorithmic design and mathematical formulation of these methods to facilitate applications and implementations of network-based analysis in the practice of precision oncology. We review the methods applied in three scenarios to integrate genomic data and network models in different analysis pipelines, and we examine three categories of network-based approaches for repositioning drugs in drug-disease-gene networks. In addition, we perform a comprehensive subnetwork/pathway analysis of mutations in 31 cancer genome projects in the Cancer Genome Atlas and present a detailed case study on ovarian cancer. Finally, we discuss interesting observations, potential pitfalls and future directions in network-based precision oncology.
Affiliation(s)
- Wei Zhang
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA
| | - Jeremy Chien
- Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS, USA
| | - Jeongsik Yong
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Twin Cities, Minneapolis, MN, USA
| | - Rui Kuang
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA
| |
|