1
Deng C, Li HD, Zhang LS, Liu Y, Li Y, Wang J. Identifying new cancer genes based on the integration of annotated gene sets via hypergraph neural networks. Bioinformatics 2024; 40:i511-i520. [PMID: 38940121 PMCID: PMC11211849 DOI: 10.1093/bioinformatics/btae257] [Indexed: 06/29/2024]
Abstract
MOTIVATION Identifying cancer genes remains a significant challenge in cancer genomics research. Annotated gene sets encode functional associations among multiple genes, and cancer genes have been shown to cluster in hallmark signaling pathways and biological processes. The knowledge of annotated gene sets is critical for discovering cancer genes but remains to be fully exploited. RESULTS Here, we present the DIsease-Specific Hypergraph neural network (DISHyper), a hypergraph-based computational method that integrates the knowledge from multiple types of annotated gene sets to predict cancer genes. First, our benchmark results demonstrate that DISHyper outperforms the existing state-of-the-art methods and highlight the advantages of employing hypergraphs for representing annotated gene sets. Second, we validate the accuracy of DISHyper-predicted cancer genes using functional validation results and multiple independent functional genomics data. Third, our model predicts 44 novel cancer genes, and subsequent analysis shows their significant associations with multiple types of cancers. Overall, our study provides a new perspective for discovering cancer genes and reveals previously undiscovered cancer genes. AVAILABILITY AND IMPLEMENTATION DISHyper is freely available for download at https://github.com/genemine/DISHyper.
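The central idea of representing each annotated gene set as a hyperedge over its member genes can be illustrated with a generic HGNN-style hypergraph convolution, X' = D_v^(-1/2) H D_e^(-1) H^T D_v^(-1/2) X Theta, using unit hyperedge weights. This is a minimal sketch under those assumptions, not DISHyper's published architecture. The function names, toy genes and gene sets are hypothetical, and the nonlinearity and training loop are omitted.

```python
import math


def incidence_matrix(genes, gene_sets):
    """Build the gene-by-gene-set incidence matrix H (H[v][e] = 1 if gene v is in set e)."""
    index = {g: i for i, g in enumerate(genes)}
    H = [[0] * len(gene_sets) for _ in genes]
    for e, members in enumerate(gene_sets):
        for g in members:
            H[index[g]][e] = 1
    return H


def hypergraph_conv(H, X, theta):
    """One step D_v^-1/2 H D_e^-1 H^T D_v^-1/2 X theta (unit hyperedge weights, no nonlinearity)."""
    n_nodes, n_edges, dim = len(H), len(H[0]), len(X[0])
    dv = [sum(row) for row in H]                                         # gene degree
    de = [sum(H[v][e] for v in range(n_nodes)) for e in range(n_edges)]  # gene-set size
    inv_sqrt_dv = [1 / math.sqrt(d) if d else 0.0 for d in dv]

    # Gene -> gene set: average the degree-normalised features of the members.
    edge_msg = []
    for e in range(n_edges):
        acc = [0.0] * dim
        for v in range(n_nodes):
            if H[v][e]:
                for k in range(dim):
                    acc[k] += X[v][k] * inv_sqrt_dv[v]
        edge_msg.append([a / de[e] if de[e] else 0.0 for a in acc])

    # Gene set -> gene: sum the messages of the sets a gene belongs to, normalise, project by theta.
    out = []
    for v in range(n_nodes):
        acc = [0.0] * dim
        for e in range(n_edges):
            if H[v][e]:
                for k in range(dim):
                    acc[k] += edge_msg[e][k]
        h = [a * inv_sqrt_dv[v] for a in acc]
        out.append([sum(h[k] * theta[k][j] for k in range(dim)) for j in range(len(theta[0]))])
    return out


genes = ["GENE_A", "GENE_B", "GENE_C"]                    # hypothetical genes
gene_sets = [{"GENE_A", "GENE_B"}, {"GENE_B", "GENE_C"}]  # hypothetical annotated gene sets
H = incidence_matrix(genes, gene_sets)
X = [[1.0], [1.0], [1.0]]                                 # one toy input feature per gene
print(hypergraph_conv(H, X, theta=[[1.0]]))
```

In this toy input GENE_B, which belongs to both gene sets, receives the largest output value, showing how shared set membership shapes the learned representation.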
Affiliation(s)
- Chao Deng
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Hong-Dong Li
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Li-Shen Zhang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Yiwei Liu
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529-0001, United States
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China

2

Giudice G, Chen H, Koutsandreas T, Petsalaki E. phuEGO: A Network-Based Method to Reconstruct Active Signaling Pathways From Phosphoproteomics Datasets. Mol Cell Proteomics 2024; 23:100771. [PMID: 38642805 PMCID: PMC11134849 DOI: 10.1016/j.mcpro.2024.100771] [Received: 10/17/2023] [Revised: 04/08/2024] [Accepted: 04/17/2024] [Indexed: 04/22/2024]
Abstract
Signaling networks are critical for virtually all cell functions. Our current knowledge of cell signaling has been summarized in signaling pathway databases, which, while useful, are highly biased toward well-studied processes and do not capture context-specific network wiring or pathway cross-talk. Mass spectrometry-based phosphoproteomics data can provide a more unbiased view of active cell signaling processes in a given context; however, such data suffer from a low signal-to-noise ratio and poor reproducibility across experiments. While progress has been made in methods to extract active signaling signatures from such data, limitations remain in balancing bias and interpretability. Here we present phuEGO, which combines up-to-three-layer network propagation with ego network decomposition to provide small networks comprising active functional signaling modules. PhuEGO boosts the signal-to-noise ratio of global phosphoproteomics datasets, enriches the resulting networks for functional phosphosites, and allows improved comparison and integration across datasets. We applied phuEGO to five phosphoproteomics datasets from cell lines collected upon infection with SARS-CoV-2. PhuEGO was better able to identify common active functions across datasets and to point to a subnetwork enriched for known COVID-19 targets. Overall, phuEGO provides the community with a flexible tool for the improved functional interpretation of global phosphoproteomics datasets.
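The ego network decomposition step can be pictured as extracting, around a protein of interest, the subnetwork of nodes within a few hops of it. The sketch below is a generic breadth-first ego-network extraction, similar in spirit to networkx's `ego_graph`; it does not reproduce phuEGO's propagation scores or module-selection criteria, and the protein names and radius are hypothetical.

```python
from collections import deque


def ego_network(adjacency, center, radius=1):
    """Return the nodes within `radius` hops of `center` and the edges among them."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the requested radius
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    nodes = set(dist)
    # Induced subgraph: keep only edges whose two endpoints both lie in the ego network.
    edges = {tuple(sorted((u, v))) for u in nodes for v in adjacency.get(u, ()) if v in nodes}
    return nodes, edges


# Hypothetical undirected signaling network stored as an adjacency list.
network = {
    "KINASE1": ["KINASE2"],
    "KINASE2": ["KINASE1", "TF1"],
    "TF1": ["KINASE2", "TARGET1"],
    "TARGET1": ["TF1"],
}
print(ego_network(network, "KINASE2", radius=1))
```

Increasing `radius` trades a more complete neighbourhood for a larger, less specific subnetwork.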
Affiliation(s)
- Girolamo Giudice
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Haoqi Chen
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Thodoris Koutsandreas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Evangelia Petsalaki
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom

3

Kim Y, Han Y, Hopper C, Lee J, Joo JI, Gong JR, Lee CK, Jang SH, Kang J, Kim T, Cho KH. A gray box framework that optimizes a white box logical model using a black box optimizer for simulating cellular responses to perturbations. Cell Rep Methods 2024; 4:100773. [PMID: 38744288 PMCID: PMC11133856 DOI: 10.1016/j.crmeth.2024.100773] [Received: 05/13/2023] [Revised: 03/19/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024]
Abstract
Predicting cellular responses to perturbations requires interpretable insights into molecular regulatory dynamics to perform reliable cell fate control, despite the confounding non-linearity of the underlying interactions. There is a growing interest in developing machine learning-based perturbation response prediction models to handle the non-linearity of perturbation data, but their interpretation in terms of molecular regulatory dynamics remains a challenge. Alternatively, for meaningful biological interpretation, logical network models such as Boolean networks are widely used in systems biology to represent intracellular molecular regulation. However, determining the appropriate regulatory logic of large-scale networks remains an obstacle due to the high-dimensional and discontinuous search space. To tackle these challenges, we present a scalable derivative-free optimizer trained by meta-reinforcement learning for Boolean network models. The logical network model optimized by the trained optimizer successfully predicts anti-cancer drug responses of cancer cell lines, while simultaneously providing insight into their underlying molecular regulatory mechanisms.
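For readers unfamiliar with logical models, the sketch below simulates a small Boolean network with synchronous updates until it reaches an attractor, the state usually read out as the cell's response to a perturbation. It is a generic illustration only: the framework's actual contribution, optimizing the regulatory logic with a meta-reinforcement-learning-trained optimizer, is not shown, and the node names and rules are hypothetical.

```python
def step(state, rules):
    """Synchronous update: every node applies its rule to the current state at once."""
    return {node: rule(state) for node, rule in rules.items()}


def find_attractor(initial, rules, max_steps=1000):
    """Iterate until a state repeats; return the states of the attractor cycle."""
    seen = {}
    trajectory = []
    state = dict(initial)
    for t in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            return trajectory[seen[key]:]
        seen[key] = t
        trajectory.append(state)
        state = step(state, rules)
    raise RuntimeError("no attractor reached within max_steps")


# Hypothetical toy network: a drug input blocks signaling from a receptor to proliferation.
rules = {
    "EGFR": lambda s: s["EGFR"],                    # input node held constant
    "DRUG": lambda s: s["DRUG"],                    # perturbation held constant
    "RAS": lambda s: s["EGFR"] and not s["DRUG"],
    "ERK": lambda s: s["RAS"],
    "PROLIF": lambda s: s["ERK"],
}
start = {"EGFR": True, "DRUG": False, "RAS": False, "ERK": False, "PROLIF": False}
print(find_attractor(start, rules))
print(find_attractor({**start, "DRUG": True}, rules))
```

With the drug node switched on, this toy network settles into a fixed point with PROLIF off; choosing rules so that such readouts match measured drug responses is the search problem the optimizer addresses.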
Affiliation(s)
- Yunseong Kim
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Younghyun Han
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Corbin Hopper
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jonghoon Lee
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jae Il Joo
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jeong-Ryeol Gong
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Chun-Kyung Lee
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Seong-Hoon Jang
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Junsoo Kang
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Taeyoung Kim
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Kwang-Hyun Cho
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea

4

Visonà G, Bouzigon E, Demenais F, Schweikert G. Network propagation for GWAS analysis: a practical guide to leveraging molecular networks for disease gene discovery. Brief Bioinform 2024; 25:bbae014. [PMID: 38340090 PMCID: PMC10858647 DOI: 10.1093/bib/bbae014] [Received: 10/16/2023] [Revised: 12/28/2023] [Accepted: 01/08/2024] [Indexed: 02/12/2024]
Abstract
MOTIVATION Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remain challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights into disease genes. RESULTS We present an overview of the design choices and pitfalls that prove crucial in the application of network propagation methods to GWAS summary statistics. We highlight general trends from the literature and present benchmark experiments that expand on these insights, selecting three diseases and five molecular networks as case studies. We verify that the use of gene-level scores based on GWAS P-values offers advantages over the selection of a set of 'seed' disease genes not weighted by the associated P-values, provided the GWAS summary statistics are of sufficient quality. Beyond that, the size and the density of the networks prove to be important factors for consideration. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.
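The contrast between P-value-weighted gene scores and unweighted seed genes can be made concrete with a random walk with restart, one common network propagation scheme: p(t+1) = (1 - alpha) W p(t) + alpha p(0), where W is the degree-normalized adjacency matrix and p(0) the normalized seed vector. In this minimal sketch the restart probability alpha, the toy network and the gene-level P-values are hypothetical, and the -log10(P) transform is one simple choice of gene score, not necessarily the one benchmarked in the review.

```python
import math


def propagate(adjacency, seeds, alpha=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph given as an adjacency list.

    `seeds` maps genes to non-negative prior scores; they are normalised to sum to 1.
    """
    nodes = sorted(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Each node collects its neighbours' scores divided by their degree (column-normalised W).
        new = {n: (1 - alpha) * sum(p[m] / degree[m] for m in adjacency[n]) + alpha * p0[n]
               for n in nodes}
        converged = sum(abs(new[n] - p[n]) for n in nodes) < tol
        p = new
        if converged:
            break
    return p


network = {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2"]}  # hypothetical path graph
gwas_p = {"G1": 1e-8, "G3": 1e-2}                            # hypothetical gene-level P-values

weighted = propagate(network, {g: -math.log10(pv) for g, pv in gwas_p.items()})
binary = propagate(network, {g: 1.0 for g in gwas_p})
print(weighted)
print(binary)
```

With weighted seeds, G1 (the stronger association) ends with a higher score than G3, whereas the binary seed set treats the two symmetrically; this is the distinction the benchmark examines.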
Affiliation(s)
- Giovanni Visonà
- Empirical Inference, Max-Planck Institute for Intelligent Systems, Tübingen 72076, Germany

5

Chatzianastasis M, Vazirgiannis M, Zhang Z. Explainable Multilayer Graph Neural Network for cancer gene prediction. Bioinformatics 2023; 39:btad643. [PMID: 37862225 PMCID: PMC10636280 DOI: 10.1093/bioinformatics/btad643] [Received: 08/09/2023] [Revised: 10/04/2023] [Accepted: 10/19/2023] [Indexed: 10/22/2023]
Abstract
MOTIVATION The identification of cancer genes is a critical yet challenging problem in cancer genomics research. Existing computational methods, including deep graph neural networks, fail to exploit the multilayered gene-gene interactions or provide limited explanations for their predictions. These methods are restricted to a single biological network, which cannot capture the full complexity of tumorigenesis. Models trained on different biological networks often yield different and even opposite cancer gene predictions, hindering their trustworthy adoption. Here, we introduce an Explainable Multilayer Graph Neural Network (EMGNN) approach to identify cancer genes by leveraging multiple gene-gene interaction networks and pan-cancer multi-omics data. Unlike conventional graph learning on a single biological network, EMGNN uses a multilayered graph neural network to learn from multiple biological networks for accurate cancer gene prediction. RESULTS Our method consistently outperforms all existing methods, with an average 7.15% improvement in area under the precision-recall curve over the current state-of-the-art method. Importantly, EMGNN integrated multiple graphs to prioritize newly predicted cancer genes with conflicting predictions from single biological networks. For each prediction, EMGNN provided valuable biological insights via both model-level feature importance explanations and molecular-level gene set enrichment analysis. Overall, EMGNN offers a powerful new paradigm of graph learning through modeling the multilayered topological gene relationships and provides a valuable tool for cancer genomics research. AVAILABILITY AND IMPLEMENTATION Our code is publicly available at https://github.com/zhanglab-aim/EMGNN.
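The headline metric, area under the precision-recall curve (AUPRC), is commonly estimated by average precision: the mean of the precision values at the rank of each true cancer gene. The sketch below is a plain implementation that assumes untied scores (ties are broken by sort order); it is not EMGNN's evaluation code, and the labels and scores are toy values. Libraries such as scikit-learn (`average_precision_score`) provide an equivalent, tie-aware estimator.

```python
def average_precision(labels, scores):
    """Average precision: mean precision at the rank of each positive (label 1) item."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


labels = [1, 0, 1, 0]          # 1 = known cancer gene (toy data)
scores = [0.9, 0.8, 0.7, 0.1]  # model scores (toy data)
print(average_precision(labels, scores))
```

Because it only rewards ranking positives above negatives, this metric is better suited than ROC AUC to cancer gene prediction, where known cancer genes are a small minority.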
Affiliation(s)
- Zijun Zhang
- Division of Artificial Intelligence in Medicine, Cedars-Sinai Medical Center, 116 N. Robertson Boulevard, Los Angeles, CA 90048, United States

6

Pasquier C, Guerlais V, Pallez D, Rapetti-Mauss R, Soriani O. A network embedding approach to identify active modules in biological interaction networks. Life Sci Alliance 2023; 6:e202201550. [PMID: 37339804 PMCID: PMC10282331 DOI: 10.26508/lsa.202201550] [Received: 06/07/2022] [Revised: 06/06/2023] [Accepted: 06/06/2023] [Indexed: 06/22/2023]
Abstract
The identification of condition-specific gene sets from transcriptomic experiments is important to reveal regulatory and signaling mechanisms associated with a given cellular response. Statistical methods of differential expression analysis, designed to assess individual gene variations, have trouble highlighting modules of genes with small variations whose interaction is essential to characterize phenotypic changes. To identify these highly informative gene modules, several methods have been proposed in recent years, but they have many limitations that make them of little use to biologists. Here, we propose an efficient method for identifying these active modules that operates on a data embedding combining gene expression and interaction data. Applications carried out on real datasets show that our method can identify new groups of genes of high interest corresponding to functions not revealed by traditional approaches. Software is available at https://github.com/claudepasquier/amine.
Affiliation(s)
- Claude Pasquier
- Laboratoire d'Informatique, Signaux et Systèmes de Sophia-Antipolis, I3S - UMR7271 - UNS CNRS, Les Algorithmes - bât. Euclide B, Sophia Antipolis, France
- Vincent Guerlais
- Laboratoire d'Informatique, Signaux et Systèmes de Sophia-Antipolis, I3S - UMR7271 - UNS CNRS, Les Algorithmes - bât. Euclide B, Sophia Antipolis, France
- Denis Pallez
- Laboratoire d'Informatique, Signaux et Systèmes de Sophia-Antipolis, I3S - UMR7271 - UNS CNRS, Les Algorithmes - bât. Euclide B, Sophia Antipolis, France
- Raphaël Rapetti-Mauss
- iBV - Institut de Biologie Valrose, Université Nice Sophia Antipolis, Faculté des Sciences, Parc Valrose, Nice cedex 2, France
- Olivier Soriani
- iBV - Institut de Biologie Valrose, Université Nice Sophia Antipolis, Faculté des Sciences, Parc Valrose, Nice cedex 2, France

7

Yang L, Chen R, Melendy T, Goodison S, Sun Y. Identifying Significantly Perturbed Subnetworks in Cancer Using Multiple Protein-Protein Interaction Networks. Cancers (Basel) 2023; 15:4090. [PMID: 37627118 PMCID: PMC10452419 DOI: 10.3390/cancers15164090] [Received: 06/21/2023] [Revised: 08/03/2023] [Accepted: 08/12/2023] [Indexed: 08/27/2023]
Abstract
BACKGROUND The identification of cancer driver genes and key molecular pathways has been the focus of large-scale cancer genome studies. Network-based methods detect significantly perturbed subnetworks as putative cancer pathways by incorporating genomics data with the topological information of PPI networks. However, commonly used PPI networks have distinct topological structures, making the results of the same method vary widely when applied to different networks. Furthermore, emerging context-specific PPI networks often have incomplete topological structures, which pose serious challenges for existing subnetwork detection algorithms. METHODS In this paper, we propose a novel method, referred to as MultiFDRnet, to address the above issues. The basic idea is to model a set of PPI networks as a multiplex network to preserve the topological structure of individual networks, while introducing dependencies among them, and, then, to detect significantly perturbed subnetworks on the modeled multiplex network using all the structural information simultaneously. RESULTS To illustrate the effectiveness of the proposed approach, an extensive benchmark analysis was conducted on both simulated and real cancer data. The experimental results showed that the proposed method is able to detect significantly perturbed subnetworks jointly supported by multiple PPI networks and to identify novel modular structures in context-specific PPI networks.
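A standard way to represent a set of PPI networks as a multiplex network is a supra-adjacency matrix: each layer keeps its own intra-layer edges in a diagonal block, and copies of the same gene in different layers are linked by inter-layer coupling edges. The sketch below builds that matrix for a toy example. Whether MultiFDRnet uses exactly this coupling scheme is an assumption, the `coupling` weight, gene names and edges are hypothetical, and the subnetwork-detection and FDR-control steps are not shown.

```python
def supra_adjacency(layers, nodes, coupling=1.0):
    """Supra-adjacency matrix of a multiplex network.

    layers: one edge list per PPI network; nodes: the shared gene list.
    Gene i of layer l sits at row/column l * len(nodes) + i.
    """
    n, n_layers = len(nodes), len(layers)
    index = {g: i for i, g in enumerate(nodes)}
    size = n * n_layers
    A = [[0.0] * size for _ in range(size)]
    # Intra-layer edges: each PPI network fills its own diagonal block.
    for layer, edges in enumerate(layers):
        offset = layer * n
        for u, v in edges:
            i, j = offset + index[u], offset + index[v]
            A[i][j] = A[j][i] = 1.0
    # Inter-layer coupling: link the copies of the same gene in every pair of layers.
    for l1 in range(n_layers):
        for l2 in range(l1 + 1, n_layers):
            for i in range(n):
                a, b = l1 * n + i, l2 * n + i
                A[a][b] = A[b][a] = coupling
    return A


genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical genes
ppi_1 = [("GENE_A", "GENE_B")]          # hypothetical network 1
ppi_2 = [("GENE_B", "GENE_C")]          # hypothetical network 2
for row in supra_adjacency([ppi_1, ppi_2], genes, coupling=0.5):
    print(row)
```

Keeping layers in separate blocks preserves each network's own topology, while the coupling weight controls how strongly evidence in one network supports the same gene in another.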
Affiliation(s)
- Le Yang
- Department of Microbiology and Immunology, The State University of New York at Buffalo, Buffalo, NY 14203, USA
- Runpu Chen
- Department of Microbiology and Immunology, The State University of New York at Buffalo, Buffalo, NY 14203, USA
- Thomas Melendy
- Department of Microbiology and Immunology, The State University of New York at Buffalo, Buffalo, NY 14203, USA
- Steve Goodison
- Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, FL 32224, USA
- Yijun Sun
- Department of Microbiology and Immunology, The State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Computer Science and Engineering, The State University of New York at Buffalo, Buffalo, NY 14203, USA

8

Xu X, Qi Z, Zhang D, Zhang M, Ren Y, Geng Z. DriverGenePathway: Identifying driver genes and driver pathways in cancer based on MutSigCV and statistical methods. Comput Struct Biotechnol J 2023; 21:3124-3135. [PMID: 37293242 PMCID: PMC10244682 DOI: 10.1016/j.csbj.2023.05.019] [Received: 02/18/2023] [Revised: 05/18/2023] [Accepted: 05/18/2023] [Indexed: 06/10/2023]
Abstract
Although computational methods for driver gene identification have progressed rapidly, the field is still far from the goal of obtaining widely recognized driver genes for all cancer types. The driver gene lists predicted by these methods often lack consistency and stability across different studies or datasets. In addition to analytical performance, some tools may require further improvement regarding operability and system compatibility. Here, we developed a user-friendly R package (DriverGenePathway) integrating MutSigCV and statistical methods to identify cancer driver genes and pathways. The theoretical basis of the MutSigCV program, such as the discovery of mutation categories based on information entropy, is elaborated and integrated into DriverGenePathway. Five hypothesis-testing methods, namely the beta-binomial test, Fisher combined p-value test, likelihood ratio test, convolution test, and projection test, are used to identify the minimal core driver genes. Moreover, de novo methods, which can effectively overcome mutational heterogeneity, are introduced to identify driver pathways. Herein, we describe the computational structure and statistical fundamentals of the DriverGenePathway pipeline and demonstrate its performance using eight types of cancer from TCGA. DriverGenePathway correctly confirms many expected driver genes, with high overlap with the Cancer Gene Census list, as well as driver pathways associated with cancer development. The DriverGenePathway R package is freely available on GitHub: https://github.com/bioinformatics-xu/DriverGenePathway.
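Of the five tests listed, Fisher's combined p-value test has a compact closed form: for k independent p-values, X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom under the null hypothesis, and for even degrees of freedom the survival function reduces to exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!. The sketch below implements that textbook formula in plain Python; how DriverGenePathway selects the p-values combined for each gene is not reproduced, and the input values are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method: return (statistic, combined p-value) for independent p-values in (0, 1]."""
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = statistic / 2.0
    term, series = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        series += term
    return statistic, math.exp(-half) * series


print(fisher_combined_pvalue([0.05, 0.05]))  # hypothetical per-test p-values
```

With a single p-value the method returns that p-value unchanged; in practice, `scipy.stats.combine_pvalues(..., method="fisher")` is the usual library route.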
Affiliation(s)
- Xiaolu Xu
- School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China
- Zitong Qi
- Department of Statistics, University of Washington, Seattle, WA 98195, USA
- Dawei Zhang
- School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China
- Meiwei Zhang
- Center for Reproductive and Genetic Medicine, Dalian Women and Children’s Medical Group, Dalian 116037, China
- Yonggong Ren
- School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China
- Zhaohong Geng
- Department of Cardiology, Second Affiliated Hospital of Dalian Medical University, Dalian 116023, China

9

Zhang W, Xiang X, Zhao B, Huang J, Yang L, Zeng Y. Identifying Cancer Driver Pathways Based on the Mouth Brooding Fish Algorithm. Entropy (Basel) 2023; 25:841. [PMID: 37372185 DOI: 10.3390/e25060841] [Received: 11/07/2022] [Revised: 05/05/2023] [Accepted: 05/23/2023] [Indexed: 06/29/2023]
Abstract
Identifying the driver genes of cancer progression is of great significance in improving our understanding of the causes of cancer and promoting the development of personalized treatment. In this paper, we identify the driver genes at the pathway level via an existing intelligent optimization algorithm, named the Mouth Brooding Fish (MBF) algorithm. Many methods based on the maximum weight submatrix model to identify driver pathways attach equal importance to coverage and exclusivity and assign them equal weight, but those methods ignore the impact of mutational heterogeneity. Here, we use principal component analysis (PCA) to incorporate covariate data to reduce the complexity of the algorithm and construct a maximum weight submatrix model considering different weights of coverage and exclusivity. Using this strategy, the unfavorable effect of mutational heterogeneity is overcome to some extent. Data involving lung adenocarcinoma and glioblastoma multiforme were tested with this method and the results compared with the MDPFinder, Dendrix, and Mutex methods. When the driver pathway size was 10, the recognition accuracy of the MBF method reached 80% in both datasets, and the weight values of the submatrix were 1.7 and 1.89, respectively, which are better than those of the compared methods. At the same time, in the signal pathway enrichment analysis, the important role of the driver genes identified by our MBF method in the cancer signaling pathway is revealed, and the validity of these driver genes is demonstrated from the perspective of their biological effects.
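The maximum weight submatrix model scores a candidate gene set M from mutation data by rewarding coverage (samples with at least one mutation in M) and penalizing coverage overlap (a proxy for lack of mutual exclusivity). The classic Dendrix weight is W(M) = |Gamma(M)| - omega(M) = 2|Gamma(M)| - sum over g in M of |Gamma(g)|. The sketch below exposes separate coverage and exclusivity weights, with the equal-weight case reproducing Dendrix. The weighted form, its parameter names and the toy data are hypothetical illustrations of "different weights of coverage and exclusivity", not the MBF paper's exact objective, and neither the PCA covariate step nor the MBF search itself is shown.

```python
def coverage(mutations, genes):
    """Samples carrying a mutation in at least one gene of the set (Gamma(M))."""
    covered = set()
    for g in genes:
        covered |= mutations.get(g, set())
    return covered


def submatrix_weight(mutations, genes, w_cov=1.0, w_excl=1.0):
    """w_cov * |Gamma(M)| - w_excl * overlap; w_cov = w_excl = 1 gives the Dendrix weight."""
    cov = len(coverage(mutations, genes))
    overlap = sum(len(mutations.get(g, set())) for g in genes) - cov
    return w_cov * cov - w_excl * overlap


# Hypothetical mutation data: gene -> set of mutated sample IDs.
mutations = {"G1": {1, 2, 3}, "G2": {4, 5}, "G3": {5, 6}}
print(submatrix_weight(mutations, ["G1", "G2"]))              # mutually exclusive pair
print(submatrix_weight(mutations, ["G2", "G3"]))              # overlapping pair
print(submatrix_weight(mutations, ["G2", "G3"], w_excl=2.0))  # heavier exclusivity penalty
```

The exclusive pair G1/G2 scores higher than the overlapping pair G2/G3, and raising `w_excl` widens that gap, which is the lever an unequal weighting provides.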
Affiliation(s)
- Wei Zhang
- College of Computer Science and Engineering, Changsha University, Changsha 410022, China
- Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha 410022, China
- Xiaowen Xiang
- College of Computer Science and Engineering, Changsha University, Changsha 410022, China
- Bihai Zhao
- College of Computer Science and Engineering, Changsha University, Changsha 410022, China
- Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha 410022, China
- Jianlin Huang
- College of Computer Science and Engineering, Changsha University, Changsha 410022, China
- Lan Yang
- College of Computer Science and Engineering, Changsha University, Changsha 410022, China
- Yifu Zeng
- College of Computer Science and Engineering, Changsha University, Changsha 410022, China
- Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha 410022, China

10

Bridel C, van Gils JHM, Miedema SSM, Hoozemans JJM, Pijnenburg YAL, Smit AB, Rozemuller AJM, Abeln S, Teunissen CE. Clusters of co-abundant proteins in the brain cortex associated with fronto-temporal lobar degeneration. Alzheimers Res Ther 2023; 15:59. [PMID: 36949537 PMCID: PMC10035199 DOI: 10.1186/s13195-023-01200-1] [Received: 11/07/2022] [Accepted: 02/28/2023] [Indexed: 03/24/2023]
Abstract
BACKGROUND Frontotemporal lobar degeneration (FTLD) is characterized pathologically by neuronal and glial inclusions of hyperphosphorylated tau or by neuronal cytoplasmic inclusions of TDP43. This study aimed at deciphering the molecular mechanisms leading to these distinct pathological subtypes. METHODS To this end, we performed an unbiased mass spectrometry-based proteomic and systems-level analysis of the middle frontal gyrus cortices of FTLD-tau (n = 6), FTLD-TDP (n = 15), and control patients (n = 5). We validated these results in an independent patient cohort (total n = 24). RESULTS The middle frontal gyrus cortex proteome was most significantly altered in FTLD-tau compared to controls (294 differentially expressed proteins at FDR = 0.05). The proteomic modifications in FTLD-TDP were more heterogeneous (49 differentially expressed proteins at FDR = 0.1). Weighted co-expression network analysis revealed 17 modules of co-regulated proteins, 13 of which were dysregulated in FTLD-tau. These modules included proteins associated with oxidative phosphorylation, scavenger mechanisms, chromatin regulation, and clathrin-mediated transport in both the frontal and temporal cortex of FTLD-tau. The most strongly dysregulated subnetworks identified cyclin-dependent kinase 5 (CDK5) and polypyrimidine tract-binding protein 1 (PTBP1) as key players in the disease process. Dysregulation of 9 of these modules was confirmed in independent validation data sets of FTLD-tau and control temporal and frontal cortex (total n = 24). Dysregulated modules were primarily associated with changes in astrocyte and endothelial cell protein abundance levels, indicating that pathological changes in FTD are not limited to neurons. CONCLUSIONS Using this innovative workflow and zooming in on the most strongly dysregulated proteins of the identified modules, we were able to identify disease-associated mechanisms in FTLD-tau with high potential as biomarkers and/or therapeutic targets.
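Weighted co-expression network analysis (WGCNA) typically starts from an unsigned soft-thresholded adjacency a_ij = |cor(x_i, x_j)|^beta between abundance profiles, which keeps co-regulation continuous while suppressing weak correlations. The sketch below computes that adjacency in plain Python. The value beta = 6 is a commonly used default rather than the value chosen in this study, the protein profiles are invented, and the later steps (topological overlap, hierarchical clustering into modules) are omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned WGCNA-style adjacency |cor|^beta for every pair of proteins."""
    names = list(profiles)
    adjacency = {}
    for a in names:
        for b in names:
            adjacency[(a, b)] = 1.0 if a == b else abs(pearson(profiles[a], profiles[b])) ** beta
    return adjacency


# Hypothetical abundance profiles across four samples.
profiles = {
    "PROT1": [1.0, 2.0, 3.0, 4.0],
    "PROT2": [2.0, 4.0, 6.0, 8.0],
    "PROT3": [1.0, 3.0, 2.0, 4.0],
}
adj = soft_threshold_adjacency(profiles)
print(adj[("PROT1", "PROT2")], adj[("PROT1", "PROT3")])
```

A correlation of 0.8 between PROT1 and PROT3 shrinks to roughly 0.26 after thresholding, while perfectly correlated PROT1 and PROT2 keep an adjacency near 1; this contrast is what lets modules emerge from noisy data.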
Affiliation(s)
- Claire Bridel
- Neurochemistry Laboratory and Biobank, Department of Clinical Chemistry, Amsterdam Neuroscience, Neurodegeneration, Amsterdam UMC, Amsterdam, The Netherlands
- Department of Clinical Neurosciences, Division of Neurology, Geneva University Hospital, Geneva, Switzerland
- Juami H. M. van Gils
- Department of Computer Science, Bioinformatics group, VU University, Amsterdam, The Netherlands
- Suzanne S. M. Miedema
- Department of Molecular and Cellular Neurobiology, Center for Neurogenomics and Cognitive Research, Neuroscience Campus Amsterdam, VU University, Amsterdam, The Netherlands
- Jeroen J. M. Hoozemans
- Department of Pathology, Amsterdam Neuroscience, Amsterdam UMC, Amsterdam, The Netherlands
- Yolande A. L. Pijnenburg
- Alzheimer Center, Department of Neurology, Amsterdam Neuroscience, Amsterdam UMC, Amsterdam, The Netherlands
- August B. Smit
- Department of Molecular and Cellular Neurobiology, Center for Neurogenomics and Cognitive Research, Neuroscience Campus Amsterdam, VU University, Amsterdam, The Netherlands
- Sanne Abeln
- Department of Computer Science, Bioinformatics group, VU University, Amsterdam, The Netherlands
- Charlotte E. Teunissen
- Neurochemistry Laboratory and Biobank, Department of Clinical Chemistry, Amsterdam Neuroscience, Neurodegeneration, Amsterdam UMC, Amsterdam, The Netherlands

11

Quan C, Liu F, Qi L, Tie Y. LRT-CLUSTER: A New Clustering Algorithm Based on Likelihood Ratio Test to Identify Driving Genes. Interdiscip Sci 2023; 15:217-230. [PMID: 36848004 DOI: 10.1007/s12539-023-00554-2] [Received: 07/01/2022] [Revised: 01/31/2023] [Accepted: 02/01/2023] [Indexed: 03/01/2023]
Abstract
Somatic mutations often occur at highly recurrent sites in protein sequences, which indicates that the positional clustering of somatic missense mutations can be used to identify driver genes. However, traditional clustering algorithms have several problems: they overfit the background signal, they are not well suited to mutation data, and their performance in identifying genes with low-frequency mutations needs to be improved. In this paper, we propose a linear clustering algorithm based on the likelihood ratio test to identify driver genes. First, the polynucleotide mutation rate is calculated based on prior knowledge from the likelihood ratio test. Then, a simulated data set is obtained from the background mutation rate model. Finally, an unsupervised peak clustering algorithm is applied separately to the somatic mutation data and the simulated data to identify driver genes. The experimental results show that our method achieves a better balance of precision and sensitivity. It can also identify driver genes missed by other methods, making it an effective complement to them. We also discover some potential links between genes, and between genes and mutation sites, which are of great value for targeted drug therapy research. Method framework: our proposed model framework is as follows.
a. Count the mutation sites and the number of mutations in each tumor gene element.
b. Count the nucleotide-context mutation frequencies based on the likelihood ratio test, and obtain the background mutation rate model.
c. Using Monte Carlo simulation, randomly sample data sets with the same number of mutations as each gene element to obtain simulated mutation data; the sampling frequency of each mutation site depends on its polynucleotide mutation rate.
d. Cluster the original mutation data and the randomly reconstructed simulated mutation data separately by peak density, and obtain the corresponding clustering scores.
e. Obtain the clustering statistics and the score of each gene segment from the original single-nucleotide mutation data through step d.
f. Calculate the p-value of each gene fragment from the observed score and the simulated clustering scores.
g. Obtain the clustering statistics and the score of each gene segment from the simulated single-nucleotide mutation data through step d.
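The simulation-based test in steps c to f can be illustrated with a short Python sketch. It is not the LRT-CLUSTER implementation: the per-site background rates are assumed to be already derived from a polynucleotide-context model, sampling is assumed to be proportional to those rates, and the hypothetical `window_score` (the largest number of mutations inside any window of `window` sites) stands in for the authors' peak-density clustering score. The p-value is the usual add-one Monte Carlo estimate.

```python
import random
from bisect import bisect_left
from typing import Sequence


def window_score(positions: Sequence[int], window: int) -> int:
    """Hypothetical stand-in clustering score: the largest number of
    mutations that fall inside any half-open window [p, p + window)."""
    pos = sorted(positions)
    best = 0
    for i, start in enumerate(pos):
        # index of the first mutation at or beyond the window's right edge
        end = bisect_left(pos, start + window)
        best = max(best, end - i)
    return best


def empirical_pvalue(
    observed: Sequence[int],
    site_rates: Sequence[float],
    window: int = 10,
    n_sim: int = 1000,
    seed: int = 0,
) -> float:
    """Monte Carlo p-value for positional clustering within one gene.

    observed   -- positions (0-based sites) of the observed mutations
    site_rates -- assumed relative background mutation rate of each site
    """
    rng = random.Random(seed)
    sites = range(len(site_rates))
    obs_score = window_score(observed, window)
    exceed = 0
    for _ in range(n_sim):
        # redistribute the same number of mutations under the background model
        simulated = rng.choices(sites, weights=site_rates, k=len(observed))
        if window_score(simulated, window) >= obs_score:
            exceed += 1
    # add-one correction keeps the estimate strictly positive
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    rates = [1.0] * 200  # toy uniform background over 200 sites
    clustered = [50, 51, 51, 52, 53, 53, 54, 55, 55, 56]
    spread = list(range(0, 200, 20))
    print(empirical_pvalue(clustered, rates))
    print(empirical_pvalue(spread, rates))
```

Comparing the observed score against scores from data redrawn under the background model is what lets a gene with few but tightly packed mutations stand out, which is the low-frequency case the abstract emphasizes.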
Affiliation(s)
- Chenxu Quan
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China
- Department of Respiratory and Sleep Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Fenghui Liu
- Department of Respiratory and Sleep Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Lin Qi
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China
- Yun Tie
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China.
12
Sharma A, Lysenko A, Boroevich KA, Tsunoda T. DeepInsight-3D architecture for anti-cancer drug response prediction with deep-learning on multi-omics. Sci Rep 2023; 13:2483. [PMID: 36774402 PMCID: PMC9922304 DOI: 10.1038/s41598-023-29644-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/21/2022] [Accepted: 02/08/2023] [Indexed: 02/13/2023]
Abstract
Modern oncology offers a wide range of treatments, so choosing the best option for a particular patient is very important for an optimal outcome. Multi-omics profiling in combination with AI-based predictive models has great potential for streamlining these treatment decisions. However, these encouraging developments continue to be hampered by the very high dimensionality of the datasets combined with insufficient numbers of annotated samples. Here we propose a novel deep learning-based method to predict patient-specific anticancer drug response from three types of multi-omics data. The proposed DeepInsight-3D approach relies on structured data-to-image conversion, which then allows the use of convolutional neural networks; these are particularly robust to high-dimensional inputs while retaining the capability to model highly complex relationships between variables. Of particular note, we demonstrate that in this formalism additional channels of an image can be used effectively to accommodate data from different omics layers while implicitly encoding the connections between them. DeepInsight-3D outperformed other state-of-the-art methods applied to this task. The proposed improvements can facilitate the development of better personalized treatment strategies for different cancers in the future.
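The data-to-image idea can be sketched in Python with NumPy. This is an illustrative assumption, not the authors' code: genes are assumed to be aligned across the three omics layers, a plain PCA projection of the genes is swapped in for whatever embedding the published method uses, genes that land on the same pixel are averaged, and the hypothetical `omics_to_images` returns one image channel per omics layer so that every layer shares one pixel layout.

```python
import numpy as np


def gene_layout(stacked: np.ndarray) -> np.ndarray:
    """2-D coordinates per gene (column) via PCA; a stand-in embedding.
    Needs at least 2 genes and at least 2 rows in `stacked`."""
    feats = stacked.T                      # genes x (layers * samples)
    centered = feats - feats.mean(axis=0)  # center each column across genes
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :2] * s[:2]                # genes x 2


def to_pixels(coords: np.ndarray, size: int):
    """Rescale each axis to [0, 1] and round onto a size x size grid."""
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    scaled = (coords - lo) / span
    idx = np.floor(scaled * (size - 1) + 0.5).astype(int)
    return idx[:, 0], idx[:, 1]


def omics_to_images(layers, size: int = 32) -> np.ndarray:
    """layers: list of (samples x genes) arrays with identical gene order.
    Returns an array of shape (samples, size, size, n_layers)."""
    n_samples = layers[0].shape[0]
    rows, cols = to_pixels(gene_layout(np.vstack(layers)), size)
    counts = np.zeros((size, size))
    np.add.at(counts, (rows, cols), 1)     # genes per pixel
    images = np.zeros((n_samples, size, size, len(layers)))
    for c, layer in enumerate(layers):
        for s in range(n_samples):
            # accumulate values of genes sharing a pixel; averaged below
            np.add.at(images[s, :, :, c], (rows, cols), layer[s])
    return images / np.maximum(counts, 1)[None, :, :, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_layers = [rng.normal(size=(6, 50)) for _ in range(3)]  # toy data
    print(omics_to_images(toy_layers, size=16).shape)  # (6, 16, 16, 3)
```

Because every layer uses the same gene-to-pixel map, a convolutional network sees the values of one gene from all three layers at the same spatial location, which is one way to read the abstract's point that extra channels implicitly encode connections between omics layers.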
Affiliation(s)
- Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia.
- Artem Lysenko
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan.
- Keith A Boroevich
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan.
- Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan.
13
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
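Of the techniques this review lists, differential network extraction is the simplest to illustrate. A common textbook approach, assumed here rather than taken from the review, compares gene-gene Pearson correlations between two conditions with a Fisher z-test and keeps the edges whose correlation changes significantly. The hypothetical `differential_edges` below uses only NumPy and the standard library and applies no multiple-testing correction.

```python
import math

import numpy as np


def differential_edges(x_a: np.ndarray, x_b: np.ndarray, alpha: float = 0.01):
    """Gene pairs whose correlation differs between conditions A and B.

    x_a, x_b -- samples x genes matrices with the same gene order; each needs
                more than 3 samples and no constant columns.
    Returns (i, j, r_a, r_b, p) tuples with p < alpha (uncorrected).
    """
    n_a, n_b = x_a.shape[0], x_b.shape[0]
    r_a = np.corrcoef(x_a, rowvar=False)
    r_b = np.corrcoef(x_b, rowvar=False)
    # Fisher transform; clipping keeps arctanh finite for |r| close to 1
    z_a = np.arctanh(np.clip(r_a, -0.999999, 0.999999))
    z_b = np.arctanh(np.clip(r_b, -0.999999, 0.999999))
    se = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    edges = []
    n_genes = x_a.shape[1]
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            z = (z_a[i, j] - z_b[i, j]) / se
            p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
            if p < alpha:
                edges.append((i, j, float(r_a[i, j]), float(r_b[i, j]), p))
    return edges
```

Edges that survive such a test form a differential network; the more integrative and dynamic approaches the review calls for would go further, for example by adding time or multiple omics layers.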
Affiliation(s)
- Gihanna Galindez
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Sepideh Sadegh
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
14
Agamah FE, Bayjanov JR, Niehues A, Njoku KF, Skelton M, Mazandu GK, Ederveen THA, Mulder N, Chimusa ER, 't Hoen PAC. Computational approaches for network-based integrative multi-omics analysis. Front Mol Biosci 2022; 9:967205. [PMID: 36452456 PMCID: PMC9703081 DOI: 10.3389/fmolb.2022.967205] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 06/12/2022] [Accepted: 10/20/2022] [Indexed: 08/27/2023]
Abstract
Advances in omics technologies allow for holistic studies of biological systems. These studies rely on integrative data analysis techniques to obtain a comprehensive view of the dynamics of cellular processes and molecular mechanisms. Network-based integrative approaches have revolutionized multi-omics analysis by providing a framework to represent interactions between multiple omics layers in a graph, which may faithfully reflect the molecular wiring of a cell. Here we review network-based multi-omics/multi-modal integrative analytical approaches. We classify these approaches according to the type of omics data supported, the methods and/or algorithms implemented, their node and/or edge weighting components, and their ability to identify key nodes and subnetworks. We show how these approaches can be used to identify biomarkers, disease subtypes, crosstalk, causality, and molecular drivers of physiological and pathological mechanisms. Using the aetiology and treatment of COVID-19 as a showcase, we provide insight into the most appropriate methods and tools for research questions that can be informed by multi-omics data integration. We conclude with an overview of challenges associated with multi-omics network-based analysis, such as reproducibility, heterogeneity and the (biological) interpretability of results, and we highlight some future directions for network-based integration.
Affiliation(s)
- Francis E. Agamah
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Jumamurat R. Bayjanov
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Anna Niehues
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Kelechi F. Njoku
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Michelle Skelton
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Gaston K. Mazandu
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- African Institute for Mathematical Sciences, Cape Town, South Africa
- Thomas H. A. Ederveen
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Emile R. Chimusa
- Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle, United Kingdom
- Peter A. C. 't Hoen
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
15
Bendl J, Hauberg ME, Girdhar K, Im E, Vicari JM, Rahman S, Fernando MB, Townsley KG, Dong P, Misir R, Kleopoulos SP, Reach SM, Apontes P, Zeng B, Zhang W, Voloudakis G, Brennand KJ, Nixon RA, Haroutunian V, Hoffman GE, Fullard JF, Roussos P. The three-dimensional landscape of cortical chromatin accessibility in Alzheimer's disease. Nat Neurosci 2022; 25:1366-1378. [PMID: 36171428 PMCID: PMC9581463 DOI: 10.1038/s41593-022-01166-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 03/16/2021] [Accepted: 08/16/2022] [Indexed: 02/06/2023]
Abstract
To characterize the dysregulation of chromatin accessibility in Alzheimer's disease (AD), we generated 636 ATAC-seq libraries from neuronal and nonneuronal nuclei isolated from the superior temporal gyrus and entorhinal cortex of 153 AD cases and 56 controls. By analyzing a total of ~20 billion read pairs, we expanded the repertoire of known open chromatin regions (OCRs) in the human brain and identified cell-type-specific enhancer-promoter interactions. We show that interindividual variability in OCRs can be leveraged to identify cis-regulatory domains (CRDs) that capture the three-dimensional structure of the genome (3D genome). We identified AD-associated effects on chromatin accessibility, the 3D genome and transcription factor (TF) regulatory networks. For one of the most AD-perturbed TFs, USF2, we validated its regulatory effect on lysosomal genes. Overall, we applied a systematic approach to understanding the role of the 3D genome in AD. We provide all data as an online resource for widespread community-based analysis.
Affiliation(s)
- Jaroslav Bendl
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mads E Hauberg
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- The Lundbeck Foundation Initiative of Integrative Psychiatric Research (iPSYCH), Aarhus University, Aarhus, Denmark
- Centre for Integrative Sequencing (iSEQ), Aarhus University, Aarhus, Denmark
- Kiran Girdhar
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Eunju Im
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- Department of Psychiatry, New York University Langone Health, New York, NY, USA
- James M Vicari
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Graduate School of Biomedical Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Samir Rahman
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Michael B Fernando
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Graduate School of Biomedical Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kayla G Townsley
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Graduate School of Biomedical Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Pengfei Dong
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ruth Misir
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Steven P Kleopoulos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sarah M Reach
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Pasha Apontes
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Biao Zeng
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Wen Zhang
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Georgios Voloudakis
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kristen J Brennand
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Yale University, New Haven, CT, USA
- Ralph A Nixon
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- Department of Psychiatry, New York University Langone Health, New York, NY, USA
- Department of Cell Biology, New York University Langone Health, New York, NY, USA
- New York University Neuroscience Institute, New York, NY, USA
- Vahram Haroutunian
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mental Illness Research Education and Clinical Center (MIRECC), James J. Peters VA Medical Center, Bronx, NY, USA
- Gabriel E Hoffman
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- John F Fullard
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Panos Roussos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA.
- Mental Illness Research Education and Clinical Center (MIRECC), James J. Peters VA Medical Center, Bronx, NY, USA.
16
Belikov AV, Vyatkin AD, Leonov SV. Novel Driver Strength Index highlights important cancer genes in TCGA PanCanAtlas patients. PeerJ 2022; 10:e13860. [PMID: 35975235 PMCID: PMC9375969 DOI: 10.7717/peerj.13860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 07/18/2022] [Indexed: 01/18/2023]
Abstract
Background Cancer driver genes are usually ranked by mutation frequency, which does not necessarily reflect their driver strength. We hypothesize that driver strength is higher for genes preferentially mutated in patients with few driver mutations overall, because these few mutations should be strong enough to initiate cancer. Methods We propose formulas for the Driver Strength Index (DSI) and the Normalized Driver Strength Index (NDSI), the latter independent of gene mutation frequency. We validate them using TCGA PanCanAtlas datasets, established driver prediction algorithms and custom computational pipelines integrating SNA, CNA and aneuploidy driver contributions at the patient-level resolution. Results DSI and especially NDSI provide substantially different gene rankings compared to the frequency approach. E.g., NDSI prioritized members of specific protein families, including G proteins GNAQ, GNA11 and GNAS, isocitrate dehydrogenases IDH1 and IDH2, and fibroblast growth factor receptors FGFR2 and FGFR3. KEGG analysis shows that top NDSI-ranked genes comprise EGFR/FGFR2/GNAQ/GNA11-NRAS/HRAS/KRAS-BRAF pathway, AKT1-MTOR pathway, and TCEB1-VHL-HIF1A pathway. Conclusion Our indices are able to select for driver gene attributes not selected by frequency sorting, potentially for driver strength. Genes and pathways prioritized are likely the strongest contributors to cancer initiation and progression and should become future therapeutic targets.
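The abstract states the hypothesis behind DSI and NDSI but does not reproduce their formulas, so the Python sketch below only illustrates that hypothesis and should not be read as the published indices. In this hypothetical scheme, a patient carrying k driver alterations gives each of them a weight of 1/k; summing the weights per gene gives a frequency-dependent score, and dividing by the number of carriers gives a frequency-independent one. The function name, gene names, and cohort are toy placeholders.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def strength_scores(
    patient_drivers: Dict[str, Set[str]],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Hypothetical driver-strength scores (not the published DSI/NDSI).

    Returns (weighted, normalized):
      weighted[g]   -- sum over carriers of g of 1 / (drivers in that patient)
      normalized[g] -- weighted[g] divided by the number of carriers of g
    """
    weighted: Dict[str, float] = defaultdict(float)
    carriers: Dict[str, int] = defaultdict(int)
    for genes in patient_drivers.values():
        if not genes:
            continue  # patients without called drivers contribute nothing
        share = 1.0 / len(genes)  # fewer drivers -> larger share per driver
        for gene in genes:
            weighted[gene] += share
            carriers[gene] += 1
    normalized = {g: weighted[g] / carriers[g] for g in weighted}
    return dict(weighted), normalized


if __name__ == "__main__":
    toy = {  # toy cohort; gene names are placeholders
        "P1": {"GENE_A"},
        "P2": {"GENE_A", "GENE_B", "GENE_C"},
        "P3": {"GENE_D"},
        "P4": {"GENE_B", "GENE_C"},
    }
    weighted, normalized = strength_scores(toy)
    for gene in sorted(normalized, key=normalized.get, reverse=True):
        print(gene, round(weighted[gene], 3), round(normalized[gene], 3))
```

In the toy cohort, GENE_A, GENE_B and GENE_C each occur in two patients and GENE_D in only one, yet GENE_D ranks first by the normalized score because its single carrier has no other driver. This shows how a carrier-normalized score can reorder genes relative to frequency sorting.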
17
Kosoy R, Fullard JF, Zeng B, Bendl J, Dong P, Rahman S, Kleopoulos SP, Shao Z, Girdhar K, Humphrey J, de Paiva Lopes K, Charney AW, Kopell BH, Raj T, Bennett D, Kellner CP, Haroutunian V, Hoffman GE, Roussos P. Genetics of the human microglia regulome refines Alzheimer's disease risk loci. Nat Genet 2022; 54:1145-1154. [PMID: 35931864 PMCID: PMC9388367 DOI: 10.1038/s41588-022-01149-1] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Received: 10/01/2021] [Accepted: 06/08/2022] [Indexed: 02/07/2023]
Abstract
Microglia are brain myeloid cells that play a critical role in neuroimmunity and the etiology of Alzheimer's disease (AD), yet our understanding of how the genetic regulatory landscape controls microglial function and contributes to AD is limited. Here, we performed transcriptome and chromatin accessibility profiling in primary human microglia from 150 donors to identify genetically driven variation and cell-specific enhancer-promoter (E-P) interactions. Integrative fine-mapping analysis identified putative regulatory mechanisms for 21 AD risk loci, of which 18 were refined to a single gene, including 3 new candidate risk genes (KCNN4, FIBP and LRRC25). Transcription factor regulatory networks captured AD risk variation and identified SPI1 as a key putative regulator of microglia expression and AD risk. This comprehensive resource capturing variation in the human microglia regulome provides insights into the etiology of neurodegenerative disease.
Affiliation(s)
- Roman Kosoy
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA.
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA.
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA.
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA.
- John F Fullard
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Biao Zeng
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Jaroslav Bendl
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Pengfei Dong
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Samir Rahman
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Steven P Kleopoulos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Zhiping Shao
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Kiran Girdhar
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Jack Humphrey
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Katia de Paiva Lopes
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois, USA
- Alexander W Charney
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Brian H Kopell
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, USA
- Towfique Raj
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- David Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois, USA
- Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
- Vahram Haroutunian
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, USA
- Mental Illness Research Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA
- Gabriel E Hoffman
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA.
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA.
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA.
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA.
- Panos Roussos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA.
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, USA.
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA.
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA.
- Mental Illness Research Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA.
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA.
18
Petrov I, Alexeyenko A. Individualized discovery of rare cancer drivers in global network context. eLife 2022; 11:74010. [PMID: 35593700 PMCID: PMC9159755 DOI: 10.7554/elife.74010] [Received: 09/17/2021] [Accepted: 05/20/2022] [Indexed: 11/13/2022]
Abstract
Recent advances in genome sequencing have expanded the space of known cancer driver genes several-fold. However, most of this surge was based on computational analysis of somatic mutation frequencies and/or their impact on protein function. In contrast, experimental research necessarily accounted for the functional context of mutations interacting with other genes and conferring cancer phenotypes. Ultimately, it is such results that become the ‘hard currency’ of cancer biology. The new method, NEAdriver, employs knowledge accumulated thus far in the form of a global interaction network and functionally annotated pathways in order to recover known driver genes and predict novel ones. Driver discovery was individualized by accounting for the co-occurrence of mutations in each tumour genome, as an alternative to summarizing information over whole cancer patient cohorts. For each somatic genome change, probabilistic estimates from two lanes of network analysis were combined into joint likelihoods of being a driver. Thus, the ability to detect previously unnoticed candidate driver events emerged from combining individual genomic context with a network perspective. The procedure was applied to the 10 largest cancer cohorts, followed by evaluation of error rates against previous cancer gene sets. The discovered driver combinations were shown to be informative of cancer outcome. This revealed driver genes with individually sparse mutation patterns that would not be detectable by other computational methods and that relate to cancer biology domains poorly covered by previous analyses. In particular, recurrent mutations of collagen, laminin, and integrin genes were observed in adenocarcinomas and glioblastoma. Considering constellation patterns of candidate drivers in individual cancer genomes opens a novel avenue for personalized cancer medicine.
Affiliation(s)
- Iurii Petrov
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Science for Life Laboratory, Solna, Sweden
- Andrey Alexeyenko
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Science for Life Laboratory, Solna, Sweden
- Evi-networks, independent consulting firm, Huddinge, Sweden
19
Smith SP, Shahamatdar S, Cheng W, Zhang S, Paik J, Graff M, Haiman C, Matise TC, North KE, Peters U, Kenny E, Gignoux C, Wojcik G, Crawford L, Ramachandran S. Enrichment analyses identify shared associations for 25 quantitative traits in over 600,000 individuals from seven diverse ancestries. Am J Hum Genet 2022; 109:871-884. [PMID: 35349783 PMCID: PMC9118115 DOI: 10.1016/j.ajhg.2022.03.005] [Received: 11/01/2021] [Accepted: 03/02/2022] [Indexed: 12/12/2022]
Abstract
Since 2005, genome-wide association (GWA) datasets have been largely biased toward sampling European ancestry individuals, and recent studies have shown that GWA results estimated from self-identified European individuals are not transferable to non-European individuals because of various confounding challenges. Here, we demonstrate that enrichment analyses that aggregate SNP-level association statistics at multiple genomic scales (from genes to genomic regions and pathways) have been underutilized in the GWA era and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the robust associations generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven diverse self-identified human ancestries in the UK Biobank and the Biobank Japan as well as 44,348 admixed individuals from the PAGE consortium including cohorts of African American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals. We identify 1,000 gene-level associations that are genome-wide significant in at least two ancestry cohorts across these 25 traits as well as highly conserved pathway associations with triglyceride levels in European, East Asian, and Native Hawaiian cohorts.
Affiliation(s)
- Samuel Pattillo Smith
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Sahar Shahamatdar
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Wei Cheng
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Selena Zhang
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Joseph Paik
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Misa Graff
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Christopher Haiman
- Department of Preventative Medicine, University of Southern California, Los Angeles, CA 90089, USA
- T C Matise
- Department of Genetics, Rutgers University, Piscataway, NJ 08854, USA
- Kari E North
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Eimear Kenny
- The Center for Genomic Health, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
- Chris Gignoux
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO 80204, USA
- Genevieve Wojcik
- Department of Epidemiology, Johns Hopkins University, Baltimore, MD 21287, USA
- Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Biostatistics, Brown University, Providence, RI 02906, USA; Microsoft Research New England, Cambridge, MA 02142, USA
- Sohini Ramachandran
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA; Data Science Initiative, Brown University, Providence, RI 02912, USA.
20
Bernett J, Krupke D, Sadegh S, Baumbach J, Fekete SP, Kacprowski T, List M, Blumenthal DB. Robust disease module mining via enumeration of diverse prize-collecting Steiner trees. Bioinformatics 2022; 38:1600-1606. [PMID: 34984440 DOI: 10.1093/bioinformatics/btab876] [Received: 10/06/2021] [Revised: 11/29/2021] [Accepted: 12/31/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Disease module mining methods (DMMMs) extract subgraphs that constitute candidate disease mechanisms from molecular interaction networks such as protein-protein interaction (PPI) networks. Irrespective of the employed models, DMMMs typically include non-robust steps in their workflows, i.e. the computed subnetworks vary when running the DMMMs multiple times on equivalent input. This lack of robustness has a negative effect on the trustworthiness of the obtained subnetworks and is hence detrimental for the widespread adoption of DMMMs in the biomedical sciences. RESULTS To overcome this problem, we present a new DMMM called ROBUST (robust disease module mining via enumeration of diverse prize-collecting Steiner trees). In a large-scale empirical evaluation, we show that ROBUST outperforms competing methods in terms of robustness, scalability and, in most settings, functional relevance of the produced modules, measured via KEGG (Kyoto Encyclopedia of Genes and Genomes) gene set enrichment scores and overlap with DisGeNET disease genes. AVAILABILITY AND IMPLEMENTATION A Python 3 implementation and scripts to reproduce the results reported in this article are available on GitHub: https://github.com/bionetslab/robust, https://github.com/bionetslab/robust-eval. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Judith Bernett
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Dominik Krupke
- Department of Computer Science, TU Braunschweig, 38106 Braunschweig, Germany
- Sepideh Sadegh
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, 22607 Hamburg, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, 22607 Hamburg, Germany
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark
- Sándor P Fekete
- Department of Computer Science, TU Braunschweig, 38106 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), 38106 Braunschweig, Germany
- Tim Kacprowski
- Braunschweig Integrated Centre of Systems Biology (BRICS), 38106 Braunschweig, Germany
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technical University of Braunschweig and Hannover Medical School, 38106 Braunschweig, Germany
- Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- David B Blumenthal
- Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), 91052 Erlangen, Germany
21
Park TY, Leiserson MD, Klau GW, Raphael BJ. SuperDendrix algorithm integrates genetic dependencies and genomic alterations across pathways and cancer types. Cell Genomics 2022; 2. [PMID: 35382456 PMCID: PMC8979493 DOI: 10.1016/j.xgen.2022.100099] [Indexed: 11/17/2022]
Abstract
Recent genome-wide CRISPR-Cas9 loss-of-function screens have identified genetic dependencies across many cancer cell lines. Associations between these dependencies and genomic alterations in the same cell lines reveal phenomena such as oncogene addiction and synthetic lethality. However, comprehensive identification of such associations is complicated by complex interactions between genes across genetically heterogeneous cancer types. We introduce and apply the algorithm SuperDendrix to CRISPR-Cas9 loss-of-function screens from 769 cancer cell lines, to identify differential dependencies across cell lines and to find associations between differential dependencies and combinations of genomic alterations and cell-type-specific markers. These associations respect the position and type of interactions within pathways: for example, we observe increased dependencies on downstream activators of pathways, such as NFE2L2, and decreased dependencies on upstream activators of pathways, such as CDK6. SuperDendrix also reveals dozens of dependencies on lineage-specific transcription factors, identifies cancer-type-specific correlations between dependencies, and enables annotation of individual mutated residues. Using SuperDendrix, Park et al. examine associations between genetic dependencies in 769 cancer cell lines. They report 127 genetic dependencies explained by combinations of mutually exclusive somatic mutations congregating into a few oncogenic pathways across cancer subtypes. These present a small number of prominent and highly specific genetic vulnerabilities in cancer.
22
Windels SFL, Malod-Dognin N, Pržulj N. Graphlet eigencentralities capture novel central roles of genes in pathways. PLoS One 2022; 17:e0261676. [PMID: 35077468 PMCID: PMC8789115 DOI: 10.1371/journal.pone.0261676] [Received: 05/26/2021] [Accepted: 12/07/2021] [Indexed: 01/16/2023]
Abstract
MOTIVATION Graphlet adjacency extends regular node adjacency in a network by considering two nodes adjacent if they participate in a given graphlet (small, connected, induced subgraph). Graphlet adjacencies captured by different graphlets were shown to contain complementary biological functions and cancer mechanisms. To further investigate the relationships between the topological features of genes participating in molecular networks, as captured by graphlet adjacencies, and their biological functions, we build more descriptive pathway-based approaches. CONTRIBUTION We introduce a new graphlet-based definition of eigencentrality of genes in a pathway, graphlet eigencentrality, to identify pathways and cancer mechanisms described by a given graphlet adjacency. We compute the centrality of genes in a pathway either from the local perspective of the pathway or from the global perspective of the entire network. RESULTS We show that in molecular networks of human and yeast, different local graphlet adjacencies describe different pathways (i.e., all the genes that are functionally important in a pathway are also considered topologically important by their local graphlet eigencentrality). Pathways described by the same graphlet adjacency are functionally similar, suggesting that each graphlet adjacency captures different pathway topology and function relationships. Additionally, we show that different graphlet eigencentralities describe different cancer driver genes that play central roles in pathways, or in the crosstalk between them (i.e., we can predict cancer driver genes participating in a pathway by their local or global graphlet eigencentrality). This result suggests that by considering different graphlet eigencentralities, we can capture different functional roles of genes in and between pathways.
Affiliation(s)
- Sam F. L. Windels
- Department of Computer Science, University College London, London, United Kingdom
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Noël Malod-Dognin
- Department of Computer Science, University College London, London, United Kingdom
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Nataša Pržulj
- Department of Computer Science, University College London, London, United Kingdom
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- ICREA, Barcelona, Spain
23
OUP accepted manuscript. Brief Funct Genomics 2022; 21:310-324. [DOI: 10.1093/bfgp/elac010] [Received: 02/08/2022] [Revised: 04/03/2022] [Accepted: 04/22/2022] [Indexed: 11/14/2022]
24
Shen JP. Artificial intelligence, molecular subtyping, biomarkers, and precision oncology. Emerg Top Life Sci 2021; 5:747-756. [PMID: 34881776 PMCID: PMC8786277 DOI: 10.1042/etls20210212] [Received: 08/19/2021] [Revised: 11/23/2021] [Accepted: 11/24/2021] [Indexed: 11/17/2022]
Abstract
A targeted cancer therapy is only useful if there is a way to accurately identify the tumors that are susceptible to that therapy. Thus, the rapid expansion in the number of available targeted cancer treatments has been accompanied by a robust effort to subdivide the traditional histological and anatomical tumor classifications into molecularly defined subtypes. This review highlights the history of the paired evolution of targeted therapies and biomarkers, reviews currently used methods for subtype identification, and discusses challenges to the implementation of precision oncology as well as possible solutions.
Affiliation(s)
- John Paul Shen
- Department of Gastrointestinal Medical Oncology, University of Texas MD Anderson Cancer Center, Houston, U.S.A
25
Weiskittel TM, Ung CY, Correia C, Zhang C, Li H. De novo individualized disease modules reveal the synthetic penetrance of genes and inform personalized treatment regimens. Genome Res 2021; 32:124-134. [PMID: 34876496 PMCID: PMC8744682 DOI: 10.1101/gr.275889.121] [Received: 06/14/2021] [Accepted: 11/30/2021] [Indexed: 12/04/2022]
Abstract
Current understanding of individual disease etiology and therapeutics is limited despite great need. To fill the gap, we propose a novel computational pipeline that collects potent disease gene cooperative pathways to envision individualized disease etiology and therapies. Our algorithm constructs individualized disease modules de novo, which enables us to elucidate the importance of mutated genes in specific patients and to understand the synthetic penetrance of these genes across patients. We reveal that the importance of the notorious cancer drivers TP53 and PIK3CA fluctuates widely across breast cancers and peaks in tumors with distinct numbers of mutations, and that rarely mutated genes such as XPO1 and PLEKHA1 have high disease module importance in specific individuals. Furthermore, individualized module disruption enables us to devise customized singular and combinatorial target therapies that were highly varied across patients, showing the need for precision therapeutics pipelines. As the first analysis of de novo individualized disease modules, we illustrate the power of individualized disease modules for precision medicine by providing deep novel insights on the activity of diseased genes in individuals.
Affiliation(s)
- Taylor M Weiskittel
- Center for Individualized Medicine, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA
- Choong Y Ung
- Center for Individualized Medicine, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA
- Cristina Correia
- Center for Individualized Medicine, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA
- Cheng Zhang
- Center for Individualized Medicine, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA
- Hu Li
- Center for Individualized Medicine, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA
26
Cutigi JF, Evangelista AF, Reis RM, Simao A. A computational approach for the discovery of significant cancer genes by weighted mutation and asymmetric spreading strength in networks. Sci Rep 2021; 11:23551. [PMID: 34876593 PMCID: PMC8651746 DOI: 10.1038/s41598-021-02671-8] [Received: 05/15/2021] [Accepted: 10/26/2021] [Indexed: 11/25/2022]
Abstract
Identifying significantly mutated genes in cancer is essential for understanding the mechanisms of tumor initiation and progression. This task is a key challenge since large-scale genomic studies have reported an endless number of genes mutated at low frequency. Towards uncovering infrequently mutated genes, gene interaction networks combined with mutation data have been explored. This work proposes Discovering Significant Cancer Genes (DiSCaGe), a computational method for discovering significant genes for cancer. DiSCaGe computes a mutation score for each gene based on the types of mutations it carries. The influence a gene receives from its neighbors in the network is also considered and is obtained through an asymmetric spreading strength applied to a consensus gene network. DiSCaGe produces a ranking of prioritized possible cancer genes. An experimental evaluation with six types of cancer revealed the potential of DiSCaGe for discovering known and possible novel significant cancer genes.
Affiliation(s)
- Jorge Francisco Cutigi
- Federal Institute of Sao Paulo, Sao Carlos, SP, Brazil.
- University of Sao Paulo, Sao Carlos, SP, Brazil.
- Rui Manuel Reis
- Molecular Oncology Research Center, Barretos Cancer Hospital, Barretos, SP, Brazil
27
Demirel HC, Arici MK, Tuncbag N. Computational approaches leveraging integrated connections of multi-omic data toward clinical applications. Mol Omics 2021; 18:7-18. [PMID: 34734935 DOI: 10.1039/d1mo00158b] [Indexed: 01/18/2023]
Abstract
In line with the advances in high-throughput technologies, multiple omic datasets have accumulated to study biological systems and diseases coherently. No single omics data type is capable of fully representing cellular activity. The complexity of the biological processes arises from the interactions between omic entities such as genes, proteins, and metabolites. Therefore, multi-omic data integration is crucial but challenging. The impact of the molecular alterations in multi-omic data is not local in the neighborhood of the altered gene or protein; rather, the impact diffuses in the network and changes the functionality of multiple signaling pathways and regulation of the gene expression. Additionally, multi-omic data is high-dimensional and has background noise. Several integrative approaches have been developed to accurately interpret the multi-omic datasets, including machine learning, network-based methods, and their combination. In this review, we overview the most recent integrative approaches and tools with a focus on network-based methods. We then discuss these approaches according to their specific applications, from disease-network and biomarker identification to patient stratification, drug discovery, and repurposing.
Affiliation(s)
- Habibe Cansu Demirel
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Muslum Kaan Arici
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Foot and Mouth Diseases Institute, Ministry of Agriculture and Forestry, Ankara, 06044, Turkey
- Nurcan Tuncbag
- Chemical and Biological Engineering, College of Engineering, Koc University, Istanbul, 34450, Turkey
- School of Medicine, Koc University, Istanbul, 34450, Turkey
- Koc University Research Center for Translational Medicine (KUTTAM), Istanbul, Turkey
28
de Schaetzen van Brienen L, Miclotte G, Larmuseau M, Van den Eynden J, Marchal K. Network-Based Analysis to Identify Drivers of Metastatic Prostate Cancer Using GoNetic. Cancers (Basel) 2021; 13:5291. [PMID: 34771455 PMCID: PMC8582433 DOI: 10.3390/cancers13215291] [Received: 08/27/2021] [Revised: 10/19/2021] [Accepted: 10/19/2021] [Indexed: 11/16/2022]
Abstract
Most known driver genes of metastatic prostate cancer are frequently mutated. To dig into the long tail of rarely mutated drivers, we performed network-based driver identification on the Hartwig Medical Foundation metastatic prostate cancer data set (HMF cohort). To this end, we developed GoNetic, a method based on probabilistic pathfinding, to identify recurrently mutated subnetworks. In contrast to most state-of-the-art network-based methods, GoNetic can leverage sample-specific mutational information and the weights of the underlying prior network. When applied to the HMF cohort, GoNetic successfully recovered known primary and metastatic drivers of prostate cancer that are frequently mutated in the HMF cohort (TP53, RB1, and CTNNB1). In addition, the identified subnetworks contain frequently mutated genes, reflect processes related to metastatic prostate cancer, and contain rarely mutated driver candidates. To further validate these rarely mutated genes, we assessed whether the identified genes were more mutated in metastatic than in primary samples using an independent cohort. Then we evaluated their association with tumor evolution and with the lymph node status of the patients. This analysis put forward several novel putative driver genes for metastatic prostate cancer, some of which might be prognostic for disease evolution.
Affiliation(s)
- Louise de Schaetzen van Brienen
- Department of Plant Biotechnology and Bioinformatics, Faculty of Sciences, Ghent University, 9052 Ghent, Belgium
- Department of Information Technology, Faculty of Engineering and Architecture, Ghent University-IMEC, 9052 Ghent, Belgium
- Giles Miclotte
- Department of Plant Biotechnology and Bioinformatics, Faculty of Sciences, Ghent University, 9052 Ghent, Belgium
- Department of Information Technology, Faculty of Engineering and Architecture, Ghent University-IMEC, 9052 Ghent, Belgium
- Maarten Larmuseau
- Department of Plant Biotechnology and Bioinformatics, Faculty of Sciences, Ghent University, 9052 Ghent, Belgium
- Department of Information Technology, Faculty of Engineering and Architecture, Ghent University-IMEC, 9052 Ghent, Belgium
- Jimmy Van den Eynden
- Department of Human Structure and Repair, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- Kathleen Marchal
- Department of Plant Biotechnology and Bioinformatics, Faculty of Sciences, Ghent University, 9052 Ghent, Belgium
- Department of Information Technology, Faculty of Engineering and Architecture, Ghent University-IMEC, 9052 Ghent, Belgium
29
Mohsen H, Gunasekharan V, Qing T, Seay M, Surovtseva Y, Negahban S, Szallasi Z, Pusztai L, Gerstein MB. Network propagation-based prioritization of long tail genes in 17 cancer types. Genome Biol 2021; 22:287. [PMID: 34620211 PMCID: PMC8496153 DOI: 10.1186/s13059-021-02504-x] [Received: 04/01/2021] [Accepted: 09/17/2021] [Indexed: 12/15/2022]
Abstract
BACKGROUND The diversity of genomic alterations in cancer poses challenges to fully understanding the etiologies of the disease. Recent interest in infrequent mutations, in genes that reside in the "long tail" of the mutational distribution, has uncovered new genes with significant implications in cancer development. The study of cancer-relevant genes often requires integrative approaches pooling together multiple types of biological data. Network propagation methods demonstrate high efficacy in achieving this integration. Yet, the majority of these methods focus their assessment on detecting known cancer genes or identifying altered subnetworks. In this paper, we introduce a network propagation approach that focuses entirely on prioritizing long tail genes with potential functional impact on cancer development. RESULTS We identify sets of often overlooked, rarely to moderately mutated genes whose biological interactions significantly propel their mutation-frequency-based rank upwards during propagation in 17 cancer types. We call these sets "upward mobility genes" and hypothesize that their significant rank improvement indicates functional importance. We report new cancer-pathway associations based on upward mobility genes that were not previously identified using driver genes alone, validate their role in cancer cell survival in vitro using extensive genome-wide RNAi and CRISPR data repositories, and further conduct in vitro functional screenings resulting in the validation of 18 previously unreported genes. CONCLUSION Our analysis extends the spectrum of cancer-relevant genes and identifies novel potential therapeutic targets.
Affiliation(s)
- Hussein Mohsen
- Computational Biology & Bioinformatics Program, Yale University, New Haven, CT, 06511, USA.
- Tao Qing
- Breast Medical Oncology, Yale School of Medicine, New Haven, CT, 06511, USA
- Montrell Seay
- Yale Center for Molecular Discovery, Yale University, West Haven, CT, 06516, USA
- Yulia Surovtseva
- Yale Center for Molecular Discovery, Yale University, West Haven, CT, 06516, USA
- Sahand Negahban
- Department of Statistics & Data Science, Yale University, New Haven, CT, 06511, USA
- Zoltan Szallasi
- Children's Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, 02115, USA
- Lajos Pusztai
- Breast Medical Oncology, Yale School of Medicine, New Haven, CT, 06511, USA.
- Mark B Gerstein
- Computational Biology & Bioinformatics Program, Yale University, New Haven, CT, 06511, USA.
- Department of Statistics & Data Science, Yale University, New Haven, CT, 06511, USA.
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06511, USA.
- Department of Computer Science, Yale University, New Haven, CT, 06511, USA.
30
Zheng F, Kelly MR, Ramms DJ, Heintschel ML, Tao K, Tutuncuoglu B, Lee JJ, Ono K, Foussard H, Chen M, Herrington KA, Silva E, Liu S, Chen J, Churas C, Wilson N, Kratz A, Pillich RT, Patel DN, Park J, Kuenzi B, Yu MK, Licon K, Pratt D, Kreisberg JF, Kim M, Swaney DL, Nan X, Fraley SI, Gutkind JS, Krogan NJ, Ideker T. Interpretation of cancer mutations using a multiscale map of protein systems. Science 2021; 374:eabf3067. [PMID: 34591613 PMCID: PMC9126298 DOI: 10.1126/science.abf3067] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/14/2022]
Abstract
A major goal of cancer research is to understand how mutations distributed across diverse genes affect common cellular systems, including multiprotein complexes and assemblies. Two challenges—how to comprehensively map such systems and how to identify which are under mutational selection—have hindered this understanding. Accordingly, we created a comprehensive map of cancer protein systems integrating both new and published multi-omic interaction data at multiple scales of analysis. We then developed a unified statistical model that pinpoints 395 specific systems under mutational selection across 13 cancer types. This map, called NeST (Nested Systems in Tumors), incorporates canonical processes and notable discoveries, including a PIK3CA-actomyosin complex that inhibits phosphatidylinositol 3-kinase signaling and recurrent mutations in collagen complexes that promote tumor proliferation. These systems can be used as clinical biomarkers and implicate a total of 548 genes in cancer evolution and progression. This work shows how disparate tumor mutations converge on protein assemblies at different scales.
Affiliation(s)
- Fan Zheng
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Marcus R. Kelly
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Dana J. Ramms
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA
- Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
- Marissa L. Heintschel
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- Kai Tao
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, 97239, USA
- Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, 97201, USA
- Beril Tutuncuoglu
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- John J. Lee
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Keiichiro Ono
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Helene Foussard
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Michael Chen
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Kari A. Herrington
- Department of Biochemistry and Biophysics Center for Advanced Light Microscopy at UCSF, University of California San Francisco, San Francisco, CA, 94158, USA
- Erica Silva
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Sophie Liu
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Jing Chen
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Christopher Churas
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Nicholas Wilson
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Anton Kratz
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Rudolf T. Pillich
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Devin N. Patel
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Jisoo Park
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Brent Kuenzi
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Michael K. Yu
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Katherine Licon
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Dexter Pratt
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Jason F. Kreisberg
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Minkyu Kim
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Danielle L. Swaney
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Xiaolin Nan
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, 97239, USA
- Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, 97201, USA
- Knight Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, 97201, USA
- Stephanie I. Fraley
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- J. Silvio Gutkind
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA
- Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
- Nevan J. Krogan
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
31
Zhang W, Wang SL, Liu Y. Identification of Cancer Driver Modules Based on Graph Clustering from Multiomics Data. J Comput Biol 2021; 28:1007-1020. [PMID: 34529511 DOI: 10.1089/cmb.2021.0052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
A major challenge in cancer genomics is to identify cancer driver genes and modules. Most existing methods for identifying cancer driver modules (iCDM) find groups of genes whose somatic mutational patterns exhibit either mutual exclusivity or high coverage of patient samples, without considering other biological information from multiomics data sets. Here we integrate mutual exclusivity, coverage, and protein-protein interaction information to construct an edge-weighted network, and present a graph clustering approach based on symmetric non-negative matrix factorization for iCDM. iCDM was tested on pan-cancer data, and the results were compared with those from several advanced computational methods. Our approach outperformed other methods in recovering known cancer driver modules, and the identified driver modules showed high accuracy in classifying normal and tumor samples.
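A minimal sketch of the network-construction step described above: combining mutual exclusivity, coverage and PPI information into an edge-weighted gene network. The abstract does not give iCDM's exact weighting, so the score below is a hypothetical choice: coverage is the fraction of patients mutated in either gene, exclusivity is the fraction of those covered patients mutated in exactly one, and the edge weight is their average, kept only for PPI-connected pairs. A matrix like this is the kind of input a symmetric NMF clustering step would factorize.

```python
import numpy as np

def hypothetical_edge_weights(mut, ppi):
    """mut: patients x genes 0/1 matrix; ppi: symmetric genes x genes 0/1 adjacency.

    Returns a symmetric genes x genes weight matrix that is nonzero only for
    PPI-connected pairs mutated in at least one patient.
    """
    m = mut.astype(bool)
    n_patients, n_genes = m.shape
    w = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if not ppi[i, j]:
                continue                                 # keep only interacting pairs
            covered = np.sum(m[:, i] | m[:, j])          # patients hit in either gene
            if covered == 0:
                continue
            exactly_one = np.sum(m[:, i] ^ m[:, j])      # patients hit in only one gene
            coverage = covered / n_patients
            exclusivity = exactly_one / covered
            w[i, j] = w[j, i] = 0.5 * (coverage + exclusivity)  # hypothetical combination
    return w

mut = np.array([[1, 0, 1],
                [1, 0, 0],
                [0, 1, 0],
                [0, 1, 1]])        # 4 hypothetical patients x 3 genes
ppi = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]])        # genes 1 and 2 do not interact
w = hypothetical_edge_weights(mut, ppi)
print(w.round(3))
```

Genes 0 and 1 are never co-mutated and together cover every patient, so their edge receives the maximum weight of 1.0; genes 1 and 2 get no edge because the toy PPI matrix does not connect them.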
Affiliation(s)
- Wei Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
- Shu-Lin Wang
- College of Computer Science and Electronics Engineering, Hunan University, Changsha, China
- Yue Liu
- College of Computer Science and Electronics Engineering, Hunan University, Changsha, China
32
Integrating Protein-Protein Interaction Networks and Somatic Mutation Data to Detect Driver Modules in Pan-Cancer. Interdiscip Sci 2021; 14:151-167. [PMID: 34491536 DOI: 10.1007/s12539-021-00475-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/11/2021] [Revised: 08/20/2021] [Accepted: 08/22/2021] [Indexed: 10/20/2022]
Abstract
With the continual growth of large-scale sequencing data and cancer genomics resources such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), detecting the functional, frequently mutated gene sets that cause cancer has become increasingly important in medicine. In this study, we propose a new method for recognizing driver modules, named ECSWalk, which is based on human protein-protein interaction networks and pan-cancer somatic mutation data and aims to address the heterogeneity of mutated genes and improve the accuracy of driver module detection. First, the method uses high mutual exclusivity and high coverage between mutated genes, together with the topological similarity of nodes in the network, to calculate interaction weights between genes. Second, random walk with restart is used to construct a weighted directed network, and the strong connectivity of this directed graph is used to create initial candidate modules containing a certain number of genes. Finally, large candidate modules are split using an induced subgraph method, and small modules are expanded using a greedy strategy to obtain the optimal driver modules. Applied to TCGA pan-cancer data, ECSWalk detects driver modules more effectively and accurately, and identifies new candidate gene sets with higher biological relevance and statistical significance, than MEXCOWalk and HotNet2. ECSWalk therefore has theoretical and practical value for cancer diagnosis, treatment and drug-target discovery.
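The second step (random walk with restart, then strong connectivity) can be illustrated with a HotNet2-style sketch under several assumptions: the diffusion matrix is F = alpha (I - (1 - alpha) W)^-1 with column-normalized W, a directed edge j -> i is kept whenever F[i, j] exceeds a threshold delta, and the strongly connected components of the resulting graph (found here with Kosaraju's algorithm) serve as candidate modules. ECSWalk's actual edge weights, alpha, delta and its split/expand steps are not reproduced; the function names are hypothetical.

```python
import numpy as np

def rwr_matrix(adj, alpha=0.5):
    """Column j of F is the RWR stationary distribution restarting at gene j."""
    n = adj.shape[0]
    w = adj / adj.sum(axis=0)            # column-normalized; assumes no isolated genes
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * w)

def strongly_connected_components(succ):
    """Kosaraju's algorithm; succ maps each node to a list of successors."""
    order, seen = [], set()

    def visit(u):                        # first pass: record finishing order
        seen.add(u)
        for v in succ[u]:
            if v not in seen:
                visit(v)
        order.append(u)

    for u in succ:
        if u not in seen:
            visit(u)

    pred = {u: [] for u in succ}         # reversed graph
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    components, assigned = [], set()

    def collect(u, comp):                # second pass on the reversed graph
        assigned.add(u)
        comp.append(u)
        for v in pred[u]:
            if v not in assigned:
                collect(v, comp)

    for u in reversed(order):
        if u not in assigned:
            comp = []
            collect(u, comp)
            components.append(sorted(comp))
    return components

def candidate_modules(adj, alpha=0.5, delta=0.1):
    f = rwr_matrix(adj, alpha)
    n = adj.shape[0]
    # directed edge j -> i when the walk restarting at j leaves more than delta on i
    succ = {j: [i for i in range(n) if i != j and f[i, j] > delta] for j in range(n)}
    return strongly_connected_components(succ)

adj = np.zeros((5, 5))
for i, j, wt in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0), (2, 3, 0.01)]:
    adj[i, j] = adj[j, i] = wt           # hypothetical weighted gene network
print(sorted(candidate_modules(adj)))    # [[0, 1, 2], [3, 4]]
```

The weak 2-3 link carries too little diffusion mass to pass delta in either direction, so the two densely connected groups stay separate. For genome-scale networks, an iterative SCC routine such as `scipy.sparse.csgraph.connected_components(..., connection="strong")` would avoid Python's recursion limit.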
33
Karimi MR, Karimi AH, Abolmaali S, Sadeghi M, Schmitz U. Prospects and challenges of cancer systems medicine: from genes to disease networks. Brief Bioinform 2021; 23:6361045. [PMID: 34471925 PMCID: PMC8769701 DOI: 10.1093/bib/bbab343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 12/20/2022]
Abstract
It is becoming evident that holistic perspectives on cancer are crucial for deciphering the overwhelming complexity of tumors. Single-layer analysis of genome-wide data has greatly contributed to our understanding of cellular systems and their perturbations. However, fundamental gaps in our knowledge persist and hamper the design of effective interventions. It is more apparent than ever that cancer should be viewed not only as a disease of the genome but as a disease of the cellular system. Integrative multilayer approaches are emerging as powerful assets in our efforts to achieve systemic views of cancer biology. Herein, we provide a comprehensive review of the approaches, methods and technologies that can serve to achieve systemic perspectives of cancer. We start with genome-wide single-layer omics analyses of cellular systems and move on to multilayer integrative approaches, providing in-depth descriptions of proteogenomics and network-based data analysis. Proteogenomics is a remarkable example of how integrating multiple levels of information can reduce our blind spots and increase the accuracy and reliability of our interpretations. Network-based data analysis, in turn, is a major approach for data interpretation and a robust scaffold for data integration and modeling. Overall, this review aims to increase cross-field awareness of the approaches and challenges regarding the omics-based study of cancer and to facilitate the necessary shift toward holistic approaches.
Affiliation(s)
- Mehdi Sadeghi
- Department of Cell & Molecular Biology, Semnan University, Semnan, Iran
- Ulf Schmitz
- Department of Molecular & Cell Biology, James Cook University, Townsville, QLD 4811, Australia
34
Tan K, Huang W, Liu X, Hu J, Dong S. A Hierarchical Graph Convolution Network for Representation Learning of Gene Expression Data. IEEE J Biomed Health Inform 2021; 25:3219-3229. [PMID: 33449889 DOI: 10.1109/jbhi.2021.3052008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
The curse of dimensionality, caused by high dimensionality and low sample size, is a major challenge in gene expression data analysis. The real situation is even worse: labelling data is laborious and time-consuming, so only a small part of the limited samples is labelled. Having so few labelled samples further increases the difficulty of training deep learning models. Interpretability is also an important requirement in biomedicine. Many existing deep learning methods try to provide interpretability but are rarely applied to gene expression data. Recent semi-supervised graph convolution network methods try to address these problems by smoothing label information over a graph. However, to the best of our knowledge, these methods only utilize graphs in either the feature space or the sample space, which restricts their performance. We propose a transductive semi-supervised representation learning method called a hierarchical graph convolution network (HiGCN) to aggregate the information of gene expression data in both the feature and sample spaces. HiGCN first uses external knowledge to construct a feature graph and a similarity kernel to construct a sample graph. Then, two spatial-based GCNs are used to aggregate information on these graphs. To validate the model's performance, experiments on synthetic and real datasets are provided as empirical support. Compared with two recent models and three traditional models, HiGCN learns better representations of gene expression data, and these representations improve the performance of downstream tasks, especially when the model is trained on only a few labelled samples. Important features can be extracted from the model to provide reliable interpretability.
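A simplified, hypothetical sketch of aggregating expression data in both spaces: the expression matrix (samples x genes) is smoothed over a sample graph from the left and over a feature graph from the right. It assumes the standard symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (Kipf and Welling) and a Gaussian-kernel sample graph with an arbitrary cutoff; it shows one combined smoothing layer, not HiGCN's hierarchical architecture or training procedure.

```python
import numpy as np

def normalized_adjacency(a):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = a + np.eye(a.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

def gaussian_sample_graph(x, gamma=1.0, threshold=0.5):
    """Hypothetical sample graph: Gaussian-kernel similarities above a cutoff."""
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-gamma * sq_dist)
    np.fill_diagonal(k, 0.0)                        # self-loops are added during normalization
    return np.where(k > threshold, k, 0.0)

def two_space_layer(x, a_feat, a_samp, weight):
    """Smooth X (samples x genes) over the sample graph (left) and the
    feature graph (right), then apply a linear map and ReLU."""
    h = normalized_adjacency(a_samp) @ x @ normalized_adjacency(a_feat)
    return np.maximum(h @ weight, 0.0)

x = np.array([[1.0, 0.0],
              [3.0, 2.0]])                          # 2 samples x 2 genes
a_samp = np.array([[0.0, 1.0], [1.0, 0.0]])         # the two samples are similar
a_feat = np.zeros((2, 2))                           # no prior links between genes
out = two_space_layer(x, a_feat, a_samp, np.eye(2))
print(out)
```

With no gene-gene links the feature graph reduces to the identity, so each sample's profile is simply averaged with its neighbour's, giving [2, 1] for both samples; adding prior-knowledge edges to `a_feat` would additionally mix linked genes.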
35
Abstract
Interpreting the effects of genetic variants is key to understanding individual susceptibility to disease and designing personalized therapeutic approaches. Modern experimental technologies are enabling the generation of massive compendia of human genome sequence data and associated molecular and phenotypic traits, together with genome-scale expression, epigenomics and other functional genomic data. Integrative computational models can leverage these data to understand variant impact, elucidate the effect of dysregulated genes on biological pathways in specific disease and tissue contexts, and interpret disease risk beyond what is feasible with experiments alone. In this Review, we discuss recent developments in machine learning algorithms for genome interpretation and for integrative molecular-level modelling of cells, tissues and organs relevant to disease. More specifically, we highlight existing methods and key challenges and opportunities in identifying specific disease-causing genetic variants and linking them to molecular pathways and, ultimately, to disease phenotypes.
36
37
Zhou Y, Wang S, Yan H, Pang B, Zhang X, Pang L, Wang Y, Xu J, Hu J, Lan Y, Ping Y. Identifying Key Somatic Copy Number Alterations Driving Dysregulation of Cancer Hallmarks in Lower-Grade Glioma. Front Genet 2021; 12:654736. [PMID: 34163522 PMCID: PMC8215700 DOI: 10.3389/fgene.2021.654736] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/17/2021] [Accepted: 04/26/2021] [Indexed: 01/17/2023]
Abstract
Somatic copy-number alterations (SCNAs) are major contributors to cancer development and are pervasive and highly heterogeneous in human cancers. However, the driver roles of SCNAs in cancer are insufficiently characterized. We combined network propagation and linear regression models to design an integrative strategy that identifies driver SCNAs and dissects their functional roles by integrating copy number and gene expression profiles in lower-grade glioma (LGG). We applied our strategy to 511 LGG patients and identified 98 driver genes that dysregulated 29 cancer hallmark signatures, forming 143 active gene-hallmark pairs. These active gene-hallmark pairs stratified LGG patients into four subtypes with significantly different survival times. The two new subtypes, which had similarly poor prognoses (the poorest of the four), were driven by two different gene sets: one including EGFR, CDKN2A, CDKN2B, IFNA8, and IFNA5, and the other including CDK4, AVIL, and DTX3. The SCNAs of the two gene sets could dysregulate the same cancer hallmark signatures (including E2F_TARGETS and G2M_CHECKPOINT) in a mutually exclusive manner. Compared with previous methods, our strategy not only captures known cancer genes and directly dissects the functional roles of their SCNAs in LGG, but also reveals the functions of new driver genes in LGG, such as IFNA5, IFNA8, and DTX3. Additionally, our method can be applied to a variety of cancer types to explore the pathogenesis of driver SCNAs and improve the treatment and diagnosis of cancer.
Affiliation(s)
- Yao Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shuai Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Haoteng Yan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Bo Pang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xinxin Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Lin Pang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yihan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jinyuan Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jing Hu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yujia Lan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yanyan Ping
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
38
Jia P, Manuel AM, Fernandes BS, Dai Y, Zhao Z. Distinct effect of prenatal and postnatal brain expression across 20 brain disorders and anthropometric social traits: a systematic study of spatiotemporal modularity. Brief Bioinform 2021; 22:6291943. [PMID: 34086851 DOI: 10.1093/bib/bbab214] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/18/2021] [Revised: 04/30/2021] [Accepted: 05/15/2021] [Indexed: 02/06/2023]
Abstract
Different spatiotemporal abnormalities have been implicated in different neuropsychiatric disorders and anthropometric social traits, yet an investigation of temporal network modularity using brain tissue transcriptomics has been lacking. We developed a supervised network approach to investigate genome-wide association study (GWAS) results in spatial and temporal contexts and demonstrated it in 20 brain disorders and anthropometric social traits. BrainSpan transcriptome profiles were used to discover significant modules enriched with trait susceptibility genes in a developmental stage-stratified manner. We investigated whether, and in which developmental stages, GWAS-implicated genes are coordinately expressed in the brain transcriptome. We identified significant network modules for each disorder and trait at different developmental stages, providing a systematic view of network modularity at specific developmental stages for a myriad of brain disorders and traits. Specifically, we observed a strong pattern of fetal origin for most psychiatric disorders and traits [such as schizophrenia (SCZ), bipolar disorder, obsessive-compulsive disorder and neuroticism], whereas increased co-expression activity of genes in postnatal brains was more strongly associated with neurological diseases [such as Alzheimer's disease (AD) and amyotrophic lateral sclerosis] and anthropometric traits (such as college completion, education and subjective well-being). Further analyses revealed enriched cell types and functional features that corroborated prior knowledge of specific brain disorders, such as clathrin-mediated endocytosis in AD, myelin sheath in multiple sclerosis and regulation of synaptic plasticity in both college completion and education. Our study provides a landscape view of the spatiotemporal features of a myriad of brain-related disorders and traits.
Affiliation(s)
- Peilin Jia
- Center for Precision Health, School of Biomedical Informatics, the University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 600, Houston, TX 77030, USA
- Astrid M Manuel
- Center for Precision Health, School of Biomedical Informatics, the University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 600, Houston, TX 77030, USA
- Brisa S Fernandes
- Center for Precision Health, School of Biomedical Informatics, the University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 600, Houston, TX 77030, USA
- Yulin Dai
- Center for Precision Health, School of Biomedical Informatics, the University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 600, Houston, TX 77030, USA
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, the University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 600, Houston, TX 77030, USA
39
Multilevel proteomics reveals host perturbations by SARS-CoV-2 and SARS-CoV. Nature 2021; 594:246-252. [PMID: 33845483 DOI: 10.1038/s41586-021-03493-4] [Citation(s) in RCA: 371] [Impact Index Per Article: 123.7] [Received: 06/12/2020] [Accepted: 03/25/2021] [Indexed: 02/06/2023]
Abstract
The emergence and global spread of SARS-CoV-2 has created an urgent need for an in-depth understanding of the molecular functions of viral proteins and their interactions with the host proteome. Several individual omics studies have extended our knowledge of COVID-19 pathophysiology [1-10]. Integration of such datasets to obtain a holistic view of virus-host interactions and to define the pathogenic properties of SARS-CoV-2 is limited by the heterogeneity of the experimental systems. Here we report a concurrent multi-omics study of SARS-CoV-2 and SARS-CoV. Using state-of-the-art proteomics, we profiled the interactomes of both viruses, as well as their influence on the transcriptome, proteome, ubiquitinome and phosphoproteome of a lung-derived human cell line. Projecting these data onto the global network of cellular interactions revealed crosstalk between the perturbations taking place upon infection with SARS-CoV-2 and SARS-CoV at different levels and enabled identification of distinct and common molecular mechanisms of these closely related coronaviruses. The TGF-β pathway, known for its involvement in tissue fibrosis, was specifically dysregulated by SARS-CoV-2 ORF8, and autophagy was specifically dysregulated by SARS-CoV-2 ORF3. The extensive dataset (available at https://covinet.innatelab.org) highlights many hotspots that could be targeted by existing drugs and may be used to guide the rational design of virus- and host-directed therapies, which we exemplify by identifying inhibitors of kinases and matrix metalloproteases with potent antiviral effects against SARS-CoV-2.
40
Ülgen E, Sezerman OU. driveR: a novel method for prioritizing cancer driver genes using somatic genomics data. BMC Bioinformatics 2021; 22:263. [PMID: 34030627 PMCID: PMC8142487 DOI: 10.1186/s12859-021-04203-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/28/2020] [Accepted: 05/17/2021] [Indexed: 12/15/2022]
Abstract
Background Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR. Results Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Tested on 28 different datasets, driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than each of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets. Conclusions This study demonstrates that the proposed method is an accurate and easy-to-use approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04203-7.
Affiliation(s)
- Ege Ülgen
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey.
- O Uğur Sezerman
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
41
Mosquera Orgueira A, Ferreiro Ferro R, Díaz Arias JÁ, Aliste Santos C, Antelo Rodríguez B, Bao Pérez L, Alonso Vence N, Bendaña López Á, Abuin Blanco A, Melero Valentín P, Peleteiro Raindo A, Cid López M, Pérez Encinas MM, González Pérez MS, Fraga Rodríguez MF, Bello López JL. Detection of new drivers of frequent B-cell lymphoid neoplasms using an integrated analysis of whole genomes. PLoS One 2021; 16:e0248886. [PMID: 33945543 PMCID: PMC8096002 DOI: 10.1371/journal.pone.0248886] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/22/2020] [Accepted: 01/19/2021] [Indexed: 12/21/2022]
Abstract
B-cell lymphoproliferative disorders comprise a diverse spectrum of diagnostic entities with heterogeneous behaviour. Multiple efforts have focused on determining the genomic drivers of B-cell lymphoma subtypes. Meanwhile, aggregating diverse tumors in pan-cancer genomic studies has become a useful tool for detecting new driver genes while enabling the comparison of mutational patterns across tumors. Here we present an integrated analysis of 354 B-cell lymphoid disorders. We discovered 112 recurrently mutated genes, of which KMT2D, CREBBP, IGLL5 and BCL2 were the most frequent, and 31 genes were putative new drivers. Mutations in CREBBP, TNFRSF14 and KMT2D predominated in follicular lymphoma, whereas those in BTG2, HLA-A and PIM1 were more frequent in diffuse large B-cell lymphoma. Additionally, we discovered 31 significantly mutated protein networks, reinforcing the role of genes such as CREBBP, EEF1A1, STAT6, GNA13 and TP53, but also pointing towards a myriad of infrequent players in lymphomagenesis. Finally, we report aberrant expression of oncogenes and tumor suppressors associated with novel noncoding mutations (DTX1 and S1PR2), and new recurrent copy number aberrations affecting immune checkpoint regulators (CD83, PVR) and B-cell-specific genes (TNFRSF13C). Our analysis expands the number of mutational drivers of B-cell lymphoid neoplasms and identifies several differential somatic events between disease subtypes.
Affiliation(s)
- Adrián Mosquera Orgueira
- Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Galicia, Spain
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Roi Ferreiro Ferro
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- José Ángel Díaz Arias
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- Carlos Aliste Santos
- Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Galicia, Spain
- Department of Pathology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- Beatriz Antelo Rodríguez
- Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Galicia, Spain
- Department of Pathology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- Laura Bao Pérez
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- Natalia Alonso Vence
- Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Galicia, Spain
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- Ángeles Bendaña López
- Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Galicia, Spain
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Aitor Abuin Blanco
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- Paula Melero Valentín
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- Andrés Peleteiro Raindo
- Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Galicia, Spain
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Miguel Cid López
- Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Galicia, Spain
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Manuel Mateo Pérez Encinas
- Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Galicia, Spain
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Marta Sonia González Pérez
- Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Galicia, Spain
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- Máximo Francisco Fraga Rodríguez
- Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Galicia, Spain
- University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Department of Pathology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- José Luis Bello López
- Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Galicia, Spain
- Department of Hematology, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), SERGAS, Santiago de Compostela, Galicia, Spain
- University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain
42
Gumpinger AC, Rieck B, Grimm DG, Borgwardt K. Network-guided search for genetic heterogeneity between gene pairs. Bioinformatics 2021; 37:57-65. [PMID: 32573681 PMCID: PMC8034561 DOI: 10.1093/bioinformatics/btaa581] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/31/2019] [Revised: 05/19/2020] [Accepted: 06/15/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION Correlating genetic loci with a disease phenotype is a common approach to improving our understanding of the genetics underlying complex diseases. Standard analyses mostly ignore two aspects, namely genetic heterogeneity and interactions between loci. Genetic heterogeneity, the phenomenon that genetic variants at different loci lead to the same phenotype, promises to increase statistical power by aggregating low-signal variants. Incorporating interactions between loci creates a computational and statistical bottleneck because of the vast number of candidate interactions. RESULTS We propose a novel method, SiNIMin, that addresses both aspects by finding pairs of interacting genes that, when combined, are associated with a phenotype of interest under a model of genetic heterogeneity. We guide the interaction search using biological prior knowledge in the form of protein-protein interaction networks. Our method controls type I error and outperforms state-of-the-art methods in statistical power. Additionally, we find novel associations for multiple Arabidopsis thaliana phenotypes and, with an adapted variant of SiNIMin, in a study of rare variants in migraine patients. AVAILABILITY AND IMPLEMENTATION Code is available at https://github.com/BorgwardtLab/SiNIMin. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anja C Gumpinger
- Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Bastian Rieck
- Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Dominik G Grimm
- Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing 94315, Germany; Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing 94315, Germany
- Karsten Borgwardt
- Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
43
Pham VVH, Liu L, Bracken CP, Nguyen T, Goodall GJ, Li J, Le TD. pDriver: A novel method for unravelling personalised coding and miRNA cancer drivers. Bioinformatics 2021; 37:3285-3292. [PMID: 33904576 DOI: 10.1093/bioinformatics/btab262] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/13/2020] [Revised: 03/19/2021] [Accepted: 04/22/2021] [Indexed: 02/07/2023]
Abstract
MOTIVATION Unravelling cancer driver genes is important in cancer research. Although computational methods have been developed to identify cancer drivers, most of them detect drivers at the population level. However, two patients who have the same cancer type and receive the same treatment may have different outcomes, because each patient has a different genome and their disease might be driven by different driver genes. New methods are therefore being developed to discover cancer drivers at the individual level, but existing personalised methods focus only on coding drivers, even though microRNAs (miRNAs) have also been shown to drive cancer progression. Thus, novel methods are required to discover both coding and miRNA cancer drivers at the individual level. RESULTS We propose a novel method, pDriver, to discover personalised cancer drivers. pDriver has two stages: (1) constructing a gene network for each cancer patient and (2) discovering cancer drivers for each patient based on the constructed gene networks. To demonstrate the effectiveness of pDriver, we applied it to five TCGA cancer datasets and compared it with state-of-the-art methods. The results indicate that pDriver is more effective than the other methods. Furthermore, pDriver can also detect miRNA cancer drivers, most of which have been confirmed in the literature to be associated with cancer. We further analysed the predicted personalised drivers for breast cancer patients and found that they are significantly enriched in many GO processes and KEGG pathways involved in breast cancer. AVAILABILITY AND IMPLEMENTATION pDriver is available at https://github.com/pvvhoang/pDriver. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vu V H Pham
- UniSA STEM, University of South Australia, Mawson Lakes, SA 5095, Australia
- Lin Liu
- UniSA STEM, University of South Australia, Mawson Lakes, SA 5095, Australia
- Cameron P Bracken
- Centre for Cancer Biology, an alliance of SA Pathology and University of South Australia, Adelaide, SA 5000, Australia; Department of Medicine, The University of Adelaide, Adelaide, SA 5005, Australia
- Thin Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Australia
- Gregory J Goodall
- Centre for Cancer Biology, an alliance of SA Pathology and University of South Australia, Adelaide, SA 5000, Australia; Department of Medicine, The University of Adelaide, Adelaide, SA 5005, Australia
- Jiuyong Li
- UniSA STEM, University of South Australia, Mawson Lakes, SA 5095, Australia
- Thuc D Le
- UniSA STEM, University of South Australia, Mawson Lakes, SA 5095, Australia
44
Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00325-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 12/15/2022]
45
Lazareva O, Baumbach J, List M, Blumenthal DB. On the limits of active module identification. Brief Bioinform 2021; 22:6189770. [PMID: 33782690 DOI: 10.1093/bib/bbab066] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/11/2021] [Revised: 01/29/2021] [Indexed: 12/12/2022]
Abstract
In network and systems medicine, active module identification methods (AMIMs) are widely used for discovering candidate molecular disease mechanisms. To this end, AMIMs combine network analysis algorithms with molecular profiling data, most commonly, by projecting gene expression data onto generic protein-protein interaction (PPI) networks. Although active module identification has led to various novel insights into complex diseases, there is increasing awareness in the field that the combination of gene expression data and PPI network is problematic because up-to-date PPI networks have a very small diameter and are subject to both technical and literature bias. In this paper, we report the results of an extensive study where we analyzed for the first time whether widely used AMIMs really benefit from using PPI networks. Our results clearly show that, except for the recently proposed AMIM DOMINO, the tested AMIMs do not produce biologically more meaningful candidate disease modules on widely used PPI networks than on random networks with the same node degrees. AMIMs hence mainly learn from the node degrees and mostly fail to exploit the biological knowledge encoded in the edges of the PPI networks. This has far-reaching consequences for the field of active module identification. In particular, we suggest that novel algorithms are needed which overcome the degree bias of most existing AMIMs and/or work with customized, context-specific networks instead of generic PPI networks.
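The study compares AMIM output on real PPI networks with output on random networks that keep every node's degree. One standard way to build such a network is repeated double edge swaps: pick edges (a, b) and (c, d), rewire them to (a, d) and (c, b), and reject any swap that would create a self-loop or a duplicate edge. The pure-Python sketch below assumes a simple undirected graph with sortable node labels; the function names, the attempt cap and the toy ring network are hypothetical, and the randomization procedure used in the paper may differ.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps, seed=0, max_tries_per_swap=100):
    """Randomise a simple undirected graph by double edge swaps; degrees are unchanged."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done = attempts = 0
    while done < n_swaps and attempts < n_swaps * max_tries_per_swap:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # randomise orientation so both rewirings are reachable
            c, d = d, c
        if len({a, b, c, d}) < 4:  # shared endpoint: swap is a no-op or makes a self-loop
            continue
        e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if e1 in edge_set or e2 in edge_set:  # would create a duplicate edge
            continue
        edge_set.difference_update((edge_list[i], edge_list[j]))
        edge_set.update((e1, e2))
        edge_list[i], edge_list[j] = e1, e2
        done += 1
    return edge_list


def degrees(edges):
    """Degree of every node in an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Toy network (hypothetical): a 10-node ring plus two chords.
edges = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
shuffled = degree_preserving_shuffle(edges, n_swaps=20)
```

Each accepted swap removes one edge from and adds one edge to each of a, b, c and d, so every degree is preserved. Running the same module-identification pipeline on `shuffled` and on the real network separates what a method learns from node degrees alone from what it learns from the edges themselves.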
Affiliation(s)
- Olga Lazareva
- Chair of Experimental Bioinformatics, Technical University of Munich, Freising, Germany
- Jan Baumbach
- Chair of Experimental Bioinformatics, Technical University of Munich, Freising, Germany; Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany; Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Markus List
- Chair of Experimental Bioinformatics, Technical University of Munich, Freising, Germany
- David B Blumenthal
- Chair of Experimental Bioinformatics, Technical University of Munich, Freising, Germany
46
Spierer AN, Mossman JA, Smith SP, Crawford L, Ramachandran S, Rand DM. Natural variation in the regulation of neurodevelopmental genes modifies flight performance in Drosophila. PLoS Genet 2021; 17:e1008887. [PMID: 33735180 PMCID: PMC7971549 DOI: 10.1371/journal.pgen.1008887] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/20/2020] [Accepted: 01/26/2021] [Indexed: 12/28/2022]
Abstract
The winged insects of the order Diptera are colloquially named for their most recognizable phenotype: flight. These insects rely on flight for a number of important life history traits, such as dispersal, foraging, and courtship. Despite the importance of flight, relatively little is known about the genetic architecture of flight performance. Accordingly, we sought to uncover the genetic modifiers of flight using a measure of flies’ reaction and response to an abrupt drop in a vertical flight column. We conducted a genome-wide association study (GWAS) using 197 of the Drosophila Genetic Reference Panel (DGRP) lines, and identified a combination of additive and marginal variants, epistatic interactions, whole genes, and enrichment across interaction networks. Egfr, a highly pleiotropic developmental gene, was among the most significant additive variants identified. We functionally validated 13 of the additive candidate genes (Adgf-A/Adgf-A2/CG32181, bru1, CadN, flapper (CG11073), CG15236, flippy (CG9766), CREG, Dscam4, form3, fry, Lasp/CG9692, Pde6, Snoo), and introduce a novel approach to whole-gene significance screens: PEGASUS_flies. Additionally, we identified ppk23, an Acid Sensing Ion Channel (ASIC) homolog, as an important hub for epistatic interactions. We propose a model in which genetic modifiers of wing and muscle morphology, nervous system development and function, BMP signaling, sexually dimorphic neural wiring, and gene regulation all contribute to the observed differences in flight performance in a natural population. These results represent a snapshot of the genetic modifiers affecting drop-response flight performance in Drosophila, with implications for other insects.
Insect flight is a widely recognizable phenotype of many winged insects, hence the name: flies. While fruit flies (Drosophila melanogaster) are a genetically tractable model, flight performance is a highly integrative phenotype, so it is challenging to identify comprehensively which genetic modifiers contribute to its genetic architecture. Accordingly, we screened 197 Drosophila Genetic Reference Panel lines for their ability to react and respond to an abrupt drop. Using several computational approaches, we identified additive, marginal, and epistatic variants, as well as whole genes and altered sub-networks of gene-gene and protein-protein interaction networks that contribute to variation in flight performance. More generally, we demonstrate the benefits of employing multiple methodologies to elucidate the genetic architecture of complex traits. Many variants and genes mapped to regions of the genome that affect neurodevelopment, wing and muscle development, and regulation of gene expression. We also introduce PEGASUS_flies, a Drosophila-adapted version of the PEGASUS platform first used in human studies, to infer gene-level significance of association based on the gene’s distribution of individual variant P-values. Our results contribute to the debate over the relative importance of individual, additive factors and epistatic, or higher-order, interactions in the mapping of genotype to phenotype.
Affiliation(s)
- Adam N Spierer
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
- Jim A Mossman
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Samuel Pattillo Smith
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Microsoft Research New England, Cambridge, Massachusetts, United States of America
- Sohini Ramachandran
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- David M Rand
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
47
Erten C, Houdjedj A, Kazan H. Ranking cancer drivers via betweenness-based outlier detection and random walks. BMC Bioinformatics 2021; 22:62. [PMID: 33568049 PMCID: PMC7877041 DOI: 10.1186/s12859-021-03989-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/10/2020] [Accepted: 01/31/2021] [Indexed: 12/04/2022]
Abstract
Background Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes. Results We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet uses a measure based on betweenness centrality in patient-specific networks to identify so-called outlier genes, which correspond to dysregulated genes in each patient. It then links the mutated genes to the outliers through a bipartite graph and runs a random-walk process on this graph to produce the final prioritization of the mutated genes. We compare BetweenNet against state-of-the-art cancer gene prioritization methods on lung, breast, and pan-cancer datasets. Conclusions Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and reference pathways enriched in BetweenNet-ranked genes overlap significantly with those enriched in known cancer genes, and that this overlap is larger than the overlaps achieved by the rankings of the alternative methods.
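BetweenNet flags per-patient outlier genes with a betweenness-centrality-based measure. The sketch below computes exact betweenness with Brandes' algorithm on an unweighted, undirected network and, as an illustrative stand-in for the paper's measure, flags genes whose betweenness changes by more than a fixed `threshold` between a reference network and a patient-specific network. The threshold rule, the helper names and the toy networks are hypothetical; how BetweenNet builds patient-specific networks, and its bipartite random-walk step, are not shown.

```python
from collections import deque


def from_edges(edges):
    """Build an undirected adjacency dict {gene: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Brandes' algorithm: exact unnormalised betweenness, unweighted undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate pair dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def betweenness_outliers(reference, patient, threshold):
    """Genes whose betweenness shifts by more than `threshold` (hypothetical rule)."""
    ref_bc, pat_bc = betweenness(reference), betweenness(patient)
    genes = set(ref_bc) | set(pat_bc)
    return sorted(g for g in genes
                  if abs(pat_bc.get(g, 0.0) - ref_bc.get(g, 0.0)) > threshold)


reference = from_edges([("a", "b"), ("b", "c")])            # path: b bridges a and c
patient = from_edges([("a", "b"), ("b", "c"), ("a", "c")])  # triangle: no bridge
outliers = betweenness_outliers(reference, patient, threshold=0.5)
```

In this toy case gene b loses its bridging role in the patient network (betweenness 1 in the path, 0 in the triangle), so it is the only gene flagged.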
Affiliation(s)
- Cesim Erten
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
- Aissa Houdjedj
- Electrical and Computer Engineering Graduate Program, Antalya Bilim University, Antalya, Turkey
- Hilal Kazan
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
48
Reyna MA, Chitra U, Elyanow R, Raphael BJ. NetMix: A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks. J Comput Biol 2021; 28:469-484. [PMID: 33400606 DOI: 10.1089/cmb.2020.0435] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Abstract
A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared with other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions that we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, explaining the large subnetworks output by jActiveModules. Based on these insights, we introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.
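NetMix uses Gaussian mixture models to estimate the parameters of the Altered Subset Distribution. A minimal sketch of the mixture-fitting step follows, assuming for illustration that background gene scores follow N(0, 1) and altered gene scores follow N(μ, 1), so expectation-maximization only has to estimate the altered fraction π and the altered mean μ. The function name, the initialization and the simulated scores are hypothetical, and NetMix's subsequent search for a connected altered subnetwork is omitted.

```python
import math
import random


def fit_altered_fraction(scores, n_iter=500, tol=1e-10):
    """EM for a mixture of background N(0, 1) and altered N(mu, 1) gene scores.

    Returns (pi, mu, post): the altered fraction, the altered mean and each
    gene's posterior probability of being altered (from the last E-step).
    """
    pi, mu = 0.5, max(scores) / 2  # crude starting values (assumption)
    for _ in range(n_iter):
        # E-step: responsibility of the altered component for each score.
        # The shared 1/sqrt(2*pi) factor of both unit-variance densities cancels.
        post = []
        for x in scores:
            alt = pi * math.exp(-0.5 * (x - mu) ** 2)
            bg = (1 - pi) * math.exp(-0.5 * x ** 2)
            post.append(alt / (alt + bg))
        # M-step: closed-form updates of the mixing weight and the altered mean.
        total = sum(post)
        new_pi = total / len(scores)
        new_mu = sum(r * x for r, x in zip(post, scores)) / total
        converged = abs(new_pi - pi) < tol and abs(new_mu - mu) < tol
        pi, mu = new_pi, new_mu
        if converged:
            break
    return pi, mu, post


# Simulated scores (hypothetical): 900 background genes and 100 altered genes.
rng = random.Random(1)
scores = [rng.gauss(0, 1) for _ in range(900)] + [rng.gauss(3, 1) for _ in range(100)]
pi_hat, mu_hat, post = fit_altered_fraction(scores)
```

On these simulated scores the estimates should land near π = 0.1 and μ = 3; the per-gene posteriors in `post` are what a network-aware step could then use to pick genes.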
Affiliation(s)
- Matthew A Reyna
- Department of Biomedical Informatics, Emory University, Atlanta, Georgia, USA
- Uthsav Chitra
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Rebecca Elyanow
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Department of Computer Science, Brown University, Providence, Rhode Island, USA
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
49
Yang L, Chen R, Goodison S, Sun Y. An efficient and effective method to identify significantly perturbed subnetworks in cancer. Nat Comput Sci 2021; 1:79-88. [PMID: 37346964 PMCID: PMC10284573 DOI: 10.1038/s43588-020-00009-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/21/2020] [Accepted: 12/02/2020] [Indexed: 06/23/2023]
Abstract
The identification of key functional biological networks from high-dimensional genomics data is pivotal for cancer research. Here, we introduce FDRnet, a method for detecting molecular subnetworks in cancer that addresses several challenges in pathway analysis. FDRnet detects key subnetworks by solving a mixed-integer linear programming problem that uses a given upper bound on the false discovery rate (FDR) as a budget constraint and minimizes a conductance score to find dense subgraphs around seed genes. A large-scale benchmark study was performed on both simulated and cancer genomics data. FDRnet outperformed other methods in its ability to detect functionally homogeneous subnetworks in a scale-free biological network, control the FDRs of the genes in detected subnetworks, improve computational efficiency, and integrate multi-omics data. By overcoming the limitations of existing approaches, FDRnet can facilitate the detection of key functional pathways in cancer and other genetic diseases.
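FDRnet minimizes a conductance score to find dense subgraphs around seed genes. For reference, the textbook conductance of a gene set S in a graph with vertex set V is φ(S) = cut(S, V∖S) / min(vol(S), vol(V∖S)), where vol(·) sums node degrees; lower values mean a denser, better-separated set. The sketch below computes this standard quantity on a hypothetical toy network; FDRnet may use a different (for example, local) variant, and its mixed-integer program and FDR budget constraint are not reproduced.

```python
def conductance(adj, subset):
    """Standard graph conductance of `subset` in an undirected adjacency dict."""
    subset = set(subset)
    cut = sum(1 for v in subset for w in adj[v] if w not in subset)
    vol_in = sum(len(adj[v]) for v in subset)
    vol_out = sum(len(adj[v]) for v in adj if v not in subset)
    denom = min(vol_in, vol_out)
    # Undefined for the empty set or the whole graph; report infinity there.
    return cut / denom if denom else float("inf")


# Toy network (hypothetical): two triangles joined by the single edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
tight = conductance(adj, {"a", "b", "c"})  # 1 cut edge / volume 7
loose = conductance(adj, {"a", "b"})       # 2 cut edges / volume 4
```

The whole triangle {a, b, c} scores 1/7 while the partial set {a, b} scores 0.5, which is why minimizing conductance favours tightly knit modules that are weakly connected to the rest of the network.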
Affiliation(s)
- Le Yang
- Department of Computer Science and Engineering, The State University of New York at Buffalo, Buffalo, NY, USA
- Runpu Chen
- Department of Computer Science and Engineering, The State University of New York at Buffalo, Buffalo, NY, USA
- Steve Goodison
- Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL, USA
- Yijun Sun
- Department of Computer Science and Engineering, The State University of New York at Buffalo, Buffalo, NY, USA
- Department of Microbiology and Immunology, The State University of New York at Buffalo, Buffalo, NY, USA
- Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, USA
50
Barel G, Herwig R. NetCore: a network propagation approach using node coreness. Nucleic Acids Res 2020; 48:e98. [PMID: 32735660 PMCID: PMC7515737 DOI: 10.1093/nar/gkaa639] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/16/2020] [Revised: 06/22/2020] [Accepted: 07/21/2020] [Indexed: 02/07/2023]
Abstract
We present NetCore, a novel network propagation approach based on node coreness, for phenotype–genotype associations and module identification. NetCore addresses the node degree bias in PPI networks by using node coreness in the random walk with restart procedure, and achieves improved re-ranking of genes after propagation. Furthermore, NetCore implements a semi-supervised approach to identify phenotype-associated network modules, which anchors the identification of novel candidate genes at known genes associated with the phenotype. We evaluated NetCore on gene sets from 11 different GWAS traits and, using cross-validation, showed improved performance compared with standard degree-based network propagation. We also applied NetCore to identify disease genes and modules in schizophrenia GWAS data and pan-cancer mutation data. Comparisons with existing network propagation approaches showed the benefits of NetCore. We provide an easy-to-use implementation, together with a high-confidence PPI network extracted from ConsensusPathDB, which can be applied to various types of genomics data to obtain a re-ranking of genes and functionally relevant network modules.
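NetCore's key idea is to use node coreness (the k-core number) rather than degree inside the random walk with restart. The sketch below illustrates that idea in plain Python under an explicit assumption: a walker at gene i moves to neighbour j with probability proportional to j's core number, and jumps back to the seed genes with probability `restart`. This weighting rule, the function names `core_numbers` and `coreness_rwr`, and the toy graph are hypothetical; NetCore's actual transition matrix and its module-identification step are not reproduced.

```python
def core_numbers(adj):
    """k-core number of every node, by repeatedly peeling a minimum-degree node."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # the peeling level never decreases
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def coreness_rwr(adj, seeds, restart=0.5, tol=1e-12, max_iter=1000):
    """Random walk with restart whose steps favour high-coreness neighbours (assumed rule)."""
    core = core_numbers(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in adj}
        for v in adj:
            total = sum(core[w] for w in adj[v])
            if total == 0:  # isolated node: keep its mass in place
                nxt[v] += (1 - restart) * p[v]
                continue
            for w in adj[v]:
                nxt[w] += (1 - restart) * p[v] * core[w] / total
        change = sum(abs(nxt[v] - p[v]) for v in adj)
        p = nxt
        if change < tol:
            break
    return p


# Toy network (hypothetical): triangle a-b-c plus a pendant gene d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
scores = coreness_rwr(adj, seeds={"a"})
```

With seed a and `restart=0.5`, the scores come out to roughly a ≈ 0.61, b = c ≈ 0.16 and d ≈ 0.06: the low-coreness pendant gene d receives less propagated signal than the triangle members, even though all three are direct neighbours of the seed.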
Affiliation(s)
- Gal Barel
- Department of Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Ralf Herwig
- Department of Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany