1
Trimbour R, Deutschmann IM, Cantini L. Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS. Bioinformatics 2024; 40:btae143. [PMID: 38460192 PMCID: PMC11065476 DOI: 10.1093/bioinformatics/btae143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 12/20/2023] [Accepted: 03/07/2024] [Indexed: 03/11/2024] Open
Abstract
MOTIVATION: The molecular identity of a cell results from a complex interplay between heterogeneous molecular layers. Recent advances in single-cell sequencing technologies have made it possible to measure these molecular layers of regulation. RESULTS: Here, we present HuMMuS, a new method for inferring regulatory mechanisms from single-cell multi-omics data. Unlike state-of-the-art methods, HuMMuS captures cooperation between biological macromolecules and can easily include additional layers of molecular regulation. We benchmarked HuMMuS against the state of the art on both paired and unpaired multi-omics datasets. Our results demonstrate the improvements provided by HuMMuS in predicting transcription factor (TF) targets, TF binding motifs and regulatory regions. Finally, applied to snmC-seq, scATAC-seq and scRNA-seq data from mouse brain cortex, HuMMuS accurately clustered scRNA profiles and identified potential driver TFs. AVAILABILITY AND IMPLEMENTATION: HuMMuS is available at https://github.com/cantinilab/HuMMuS.
Affiliation(s)
- Remi Trimbour
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, F-75015 Paris, France
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
- Ina Maria Deutschmann
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
- Laura Cantini
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, F-75015 Paris, France
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
2
Núñez-Carpintero I, Rigau M, Bosio M, O'Connor E, Spendiff S, Azuma Y, Topf A, Thompson R, 't Hoen PAC, Chamova T, Tournev I, Guergueltcheva V, Laurie S, Beltran S, Capella-Gutiérrez S, Cirillo D, Lochmüller H, Valencia A. Rare disease research workflow using multilayer networks elucidates the molecular determinants of severity in Congenital Myasthenic Syndromes. Nat Commun 2024; 15:1227. [PMID: 38418480 PMCID: PMC10902324 DOI: 10.1038/s41467-024-45099-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 01/15/2024] [Indexed: 03/01/2024] Open
Abstract
Exploring the molecular basis of disease severity in rare disease scenarios is a challenging task given the limitations on data availability. Causative genes have been described for Congenital Myasthenic Syndromes (CMS), a group of diverse minority neuromuscular junction (NMJ) disorders; yet a molecular explanation for the phenotypic severity differences remains unclear. Here, we present a workflow to explore the functional relationships between CMS causal genes and altered genes from each patient, based on multilayer network community detection analysis of complementary biomedical information provided by relevant data sources, namely protein-protein interactions, pathways and metabolomics. Our results show that CMS severity can be ascribed to the personalized impairment of extracellular matrix components and postsynaptic modulators of acetylcholine receptor (AChR) clustering. This work showcases how coupling multilayer network analysis with personalized -omics information provides molecular explanations for the varying severity of rare diseases, paving the way for resolving similar cases in other rare diseases.
Affiliation(s)
- Iker Núñez-Carpintero
- Barcelona Supercomputing Center (BSC), Plaça Eusebi Güell, 1-3, 08034, Barcelona, Spain
- Maria Rigau
- Barcelona Supercomputing Center (BSC), Plaça Eusebi Güell, 1-3, 08034, Barcelona, Spain
- MRC London Institute of Medical Sciences, Du Cane Road, London, W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London, W12 0NN, UK
- Mattia Bosio
- Barcelona Supercomputing Center (BSC), Plaça Eusebi Güell, 1-3, 08034, Barcelona, Spain
- Coordination Unit Spanish National Bioinformatics Institute (INB/ELIXIR-ES), Barcelona Supercomputing Center, Barcelona, Spain
- Emily O'Connor
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada
- Brain and Mind Research Institute, University of Ottawa, Ottawa, ON, Canada
- Sally Spendiff
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada
- Yoshiteru Azuma
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Department of Pediatrics, Aichi Medical University, Nagakute, Japan
- Ana Topf
- John Walton Muscular Dystrophy Research Centre, Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- Newcastle Hospitals NHS Foundation Trust, Newcastle upon Tyne, United Kingdom
- Rachel Thompson
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada
- Peter A C 't Hoen
- Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud university medical center, Nijmegen, The Netherlands
- Teodora Chamova
- Department of Neurology, Expert Centre for Hereditary Neurologic and Metabolic Disorders, Alexandrovska University Hospital, Medical University-Sofia, Sofia, Bulgaria
- Ivailo Tournev
- Department of Neurology, Expert Centre for Hereditary Neurologic and Metabolic Disorders, Alexandrovska University Hospital, Medical University-Sofia, Sofia, Bulgaria
- Department of Cognitive Science and Psychology, New Bulgarian University, Sofia, 1618, Bulgaria
- Velina Guergueltcheva
- Clinic of Neurology, University Hospital Sofiamed, Sofia University St. Kliment Ohridski, Sofia, Bulgaria
- Steven Laurie
- Centro Nacional de Análisis Genómico (CNAG-CRG), Center for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Barcelona, Catalonia, Spain
- Sergi Beltran
- Centro Nacional de Análisis Genómico (CNAG-CRG), Center for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona (UB), Barcelona, Spain
- Salvador Capella-Gutiérrez
- Barcelona Supercomputing Center (BSC), Plaça Eusebi Güell, 1-3, 08034, Barcelona, Spain
- Coordination Unit Spanish National Bioinformatics Institute (INB/ELIXIR-ES), Barcelona Supercomputing Center, Barcelona, Spain
- Davide Cirillo
- Barcelona Supercomputing Center (BSC), Plaça Eusebi Güell, 1-3, 08034, Barcelona, Spain
- Hanns Lochmüller
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada
- Brain and Mind Research Institute, University of Ottawa, Ottawa, ON, Canada
- Centro Nacional de Análisis Genómico (CNAG-CRG), Center for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Barcelona, Catalonia, Spain
- Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, ON, Canada
- Department of Neuropediatrics and Muscle Disorders, Medical Center - University of Freiburg, Faculty of Medicine, Freiburg, Germany
- Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Plaça Eusebi Güell, 1-3, 08034, Barcelona, Spain
- Coordination Unit Spanish National Bioinformatics Institute (INB/ELIXIR-ES), Barcelona Supercomputing Center, Barcelona, Spain
- ICREA, Pg. Lluís Companys 23, 08010, Barcelona, Spain
3
Lu K, Gong H, Yang D, Ye M, Fang Q, Zhang XY, Wu R. Genome-Wide Network Analysis of Above- and Below-Ground Co-growth in Populus euphratica. Plant Phenomics (Washington, D.C.) 2024; 6:0131. [PMID: 38188223 PMCID: PMC10769449 DOI: 10.34133/plantphenomics.0131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 12/12/2023] [Indexed: 01/09/2024]
Abstract
Tree growth is the consequence of developmental interactions between above- and below-ground compartments. However, a comprehensive view of the genetic architecture of growth as a cohesive whole is still lacking. We propose a systems biology approach for mapping growth trajectories in genome-wide association studies, viewing growth as a complex (phenotypic) system in which above- and below-ground components (or traits) interact with each other to mediate system behavior. We further assume that trait-trait interactions are controlled by a genetic system composed of many different interacting genes, and we integrate the Lotka-Volterra predator-prey model to dissect phenotypic and genetic systems into pleiotropic and epistatic interaction components by which the detailed genetic mechanism of above- and below-ground co-growth can be charted. We apply the approach to analyze linkage mapping data of Populus euphratica, which is the only tree species that can grow in the desert, and characterize several loci that govern how above- and below-ground growth cooperate or compete over development. We reconstruct multilayer and multiplex genetic interactome networks for the developmental trajectories of each trait and their developmental covariation. Many of the significant loci and epistatic effects detected can be annotated to candidate genes for growth and developmental processes. The results from our model may potentially be useful for marker-assisted selection and genetic editing in applied tree breeding programs. The model provides a general tool to characterize a complete picture of pleiotropic and epistatic genetic architecture in growth traits in forest trees and any other organisms.
Affiliation(s)
- Kaiyan Lu
- College of Science, Beijing Forestry University, Beijing 100083, P. R. China
- Huiying Gong
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, P. R. China
- Dengcheng Yang
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, P. R. China
- Meixia Ye
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, P. R. China
- Qing Fang
- Faculty of Science, Yamagata University, Yamagata 990, Japan
- Xiao-Yu Zhang
- College of Science, Beijing Forestry University, Beijing 100083, P. R. China
- Rongling Wu
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, P. R. China
4
Russell M, Aqil A, Saitou M, Gokcumen O, Masuda N. Gene communities in co-expression networks across different tissues. arXiv 2023:arXiv:2305.12963v2. [PMID: 37292479 PMCID: PMC10246089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
With the recent availability of tissue-specific gene expression data, e.g., provided by the GTEx Consortium, there is interest in comparing gene co-expression patterns across tissues. One promising approach to this problem is to use a multilayer network analysis framework and perform multilayer community detection. Communities in gene co-expression networks reveal groups of genes similarly expressed across individuals, potentially involved in related biological processes responding to specific environmental stimuli or sharing common regulatory variations. We construct a multilayer network in which each of the four layers is an exocrine gland tissue-specific gene co-expression network. We develop methods for multilayer community detection with correlation matrix input and an appropriate null model. Our correlation matrix input method identifies five groups of genes that are similarly co-expressed in multiple tissues (a community that spans multiple layers, which we call a generalist community) and two groups of genes that are co-expressed in just one tissue (a community that lies primarily within just one layer, which we call a specialist community). We further found gene co-expression communities where the genes physically cluster across the genome significantly more than expected by chance (on chromosomes 1 and 11). This clustering hints at underlying regulatory elements determining similar expression patterns across individuals and cell types. We suggest that KRTAP3-1, KRTAP3-3, and KRTAP3-5 share regulatory elements in skin and pancreas. Furthermore, we find that CELA3A and CELA3B share associated expression quantitative trait loci in the pancreas. The results indicate that our multilayer community detection method for correlation matrix input extracts biologically interesting communities of genes.
Affiliation(s)
- Alber Aqil
- Department of Biological Sciences, University at Buffalo
- Marie Saitou
- Faculty of Biosciences, Norwegian University of Life Sciences
- Omer Gokcumen
- Department of Biological Sciences, University at Buffalo
- Naoki Masuda
- Department of Mathematics, University at Buffalo
- Institute for Artificial Intelligence and Data Science, University at Buffalo
5
Russell M, Aqil A, Saitou M, Gokcumen O, Masuda N. Gene communities in co-expression networks across different tissues. PLoS Comput Biol 2023; 19:e1011616. [PMID: 37976327 PMCID: PMC10691702 DOI: 10.1371/journal.pcbi.1011616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 12/01/2023] [Accepted: 10/19/2023] [Indexed: 11/19/2023] Open
Abstract
With the recent availability of tissue-specific gene expression data, e.g., provided by the GTEx Consortium, there is interest in comparing gene co-expression patterns across tissues. One promising approach to this problem is to use a multilayer network analysis framework and perform multilayer community detection. Communities in gene co-expression networks reveal groups of genes similarly expressed across individuals, potentially involved in related biological processes responding to specific environmental stimuli or sharing common regulatory variations. We construct a multilayer network in which each of the four layers is an exocrine gland tissue-specific gene co-expression network. We develop methods for multilayer community detection with correlation matrix input and an appropriate null model. Our correlation matrix input method identifies five groups of genes that are similarly co-expressed in multiple tissues (a community that spans multiple layers, which we call a generalist community) and two groups of genes that are co-expressed in just one tissue (a community that lies primarily within just one layer, which we call a specialist community). We further found gene co-expression communities where the genes physically cluster across the genome significantly more than expected by chance (on chromosomes 1 and 11). This clustering hints at underlying regulatory elements determining similar expression patterns across individuals and cell types. We suggest that KRTAP3-1, KRTAP3-3, and KRTAP3-5 share regulatory elements in skin and pancreas. Furthermore, we find that CELA3A and CELA3B share associated expression quantitative trait loci in the pancreas. The results indicate that our multilayer community detection method for correlation matrix input extracts biologically interesting communities of genes.
Affiliation(s)
- Madison Russell
- Department of Mathematics, State University of New York at Buffalo, Buffalo, New York, United States of America
- Alber Aqil
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, New York, United States of America
- Marie Saitou
- Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Omer Gokcumen
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, New York, United States of America
- Naoki Masuda
- Department of Mathematics, State University of New York at Buffalo, Buffalo, New York, United States of America
- Institute for Artificial Intelligence and Data Science, State University of New York at Buffalo, Buffalo, New York, United States of America
6
Yassin A, Haidar A, Cherifi H, Seba H, Togni O. An evaluation tool for backbone extraction techniques in weighted complex networks. Sci Rep 2023; 13:17000. [PMID: 37813946 PMCID: PMC10562457 DOI: 10.1038/s41598-023-42076-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 09/05/2023] [Indexed: 10/11/2023] Open
Abstract
Networks are essential for analyzing complex systems. However, their growing size necessitates backbone extraction techniques aimed at reducing their size while retaining critical features. In practice, selecting, implementing, and evaluating the most suitable backbone extraction method may be challenging. This paper introduces netbone, a Python package designed for assessing the performance of backbone extraction techniques in weighted networks. Its comparison framework is the standout feature of netbone. Indeed, the tool incorporates state-of-the-art backbone extraction techniques. Furthermore, it provides a comprehensive suite of evaluation metrics allowing users to evaluate different backbone techniques. We illustrate the flexibility and effectiveness of netbone through the US air transportation network analysis. We compare the performance of different backbone extraction techniques using the evaluation metrics. We also show how users can integrate a new backbone extraction method into the comparison framework. netbone is publicly available as an open-source tool, ensuring its accessibility to researchers and practitioners. Promoting standardized evaluation practices contributes to the advancement of backbone extraction techniques and fosters reproducibility and comparability in research efforts. We anticipate that netbone will serve as a valuable resource for researchers and practitioners, enabling them to make informed decisions when selecting backbone extraction techniques to gain insights into the structural and functional properties of complex systems.
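To make "backbone extraction" concrete, the following is a minimal sketch of one widely used technique, the disparity filter, written in plain Python. It does not use netbone and makes no claim about netbone's API or about which methods the package ships; the function name `disparity_filter`, the `alpha` threshold, and the toy hub-and-spoke network are assumptions chosen for illustration. Edge weights are assumed to be positive.

```python
def disparity_filter(edges, alpha=0.05):
    """Keep edges whose weight is unexpectedly large for at least one endpoint.

    edges: list of (u, v, weight) tuples describing an undirected weighted graph.
    alpha: significance threshold (an illustrative default, not a recommendation).
    """
    strength, degree = {}, {}
    for u, v, w in edges:
        for node in (u, v):
            strength[node] = strength.get(node, 0.0) + w
            degree[node] = degree.get(node, 0) + 1

    def p_value(node, w):
        # Null model: a node's strength is split uniformly at random over its
        # k edges, so P(share >= w / s) = (1 - w / s) ** (k - 1).
        k = degree[node]
        if k < 2:
            return 1.0  # a degree-1 node gives no evidence either way
        return (1.0 - w / strength[node]) ** (k - 1)

    return [(u, v, w) for u, v, w in edges
            if min(p_value(u, w), p_value(v, w)) < alpha]


# Hypothetical hub-and-spoke network: one heavy route and four light ones.
toy_edges = [("HUB", "A", 100.0), ("HUB", "B", 1.0), ("HUB", "C", 1.0),
             ("HUB", "D", 1.0), ("HUB", "E", 1.0)]
backbone = disparity_filter(toy_edges)
print(backbone)
```

On this toy input only the heavy HUB-A edge survives, because its share of the hub's total weight is far larger than a uniform split would predict.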
Affiliation(s)
- Ali Yassin
- Laboratoire d'Informatique de Bourgogne, University of Burgundy, Dijon, France
- Abbas Haidar
- Computer Science Department, Lebanese University, Beirut, Lebanon
- Hocine Cherifi
- ICB UMR 6303 CNRS, Univ. Bourgogne - Franche-Comté, Dijon, France
- Hamida Seba
- UCBL, CNRS, INSA Lyon, LIRIS, UMR5205, Univ Lyon, 69622, Villeurbanne, France
- Olivier Togni
- Laboratoire d'Informatique de Bourgogne, University of Burgundy, Dijon, France
7
Demir Karaman E, Işık Z. Multi-Omics Data Analysis Identifies Prognostic Biomarkers across Cancers. Med Sci (Basel) 2023; 11:44. [PMID: 37489460 PMCID: PMC10366886 DOI: 10.3390/medsci11030044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 06/18/2023] [Accepted: 06/20/2023] [Indexed: 07/26/2023] Open
Abstract
Combining omics data from different layers using integrative methods provides a better understanding of the biology of a complex disease such as cancer. The discovery of biomarkers related to cancer development or prognosis helps to find more effective treatment options. This study integrates multi-omics data of different cancer types with a network-based approach to explore common gene modules among different tumors by running community detection methods on the integrated network. The common modules were evaluated by several biological metrics adapted to cancer. Then, a new prognostic scoring method was developed by weighting mRNA expression, methylation, and mutation status of genes. The survival analysis yielded statistically significant results for the GNG11, CBX2, CDKN3, ARHGEF10, CLN8, SEC61G and PTDSS1 genes. The literature search reveals that the identified biomarkers are associated with the same or different types of cancers. Our method not only identifies known cancer-specific biomarker genes, but also proposes new potential biomarkers. Thus, this study provides a rationale for identifying new gene targets and expanding treatment options across cancer types.
Affiliation(s)
- Ezgi Demir Karaman
- Department of Computer Engineering, Institute of Natural and Applied Sciences, Dokuz Eylul University, Izmir 35390, Turkey
- Zerrin Işık
- Department of Computer Engineering, Faculty of Engineering, Dokuz Eylul University, Izmir 35390, Turkey
8
Wu X, Li Z, Chen G, Yin Y, Chen CYC. Hybrid neural network approaches to predict drug-target binding affinity for drug repurposing: screening for potential leads for Alzheimer's disease. Front Mol Biosci 2023; 10:1227371. [PMID: 37441162 PMCID: PMC10334190 DOI: 10.3389/fmolb.2023.1227371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 06/13/2023] [Indexed: 07/15/2023] Open
Abstract
Alzheimer's disease (AD) is a neurodegenerative disease that primarily affects elderly individuals. Recent studies have found that sigma-1 receptor (S1R) agonists can maintain endoplasmic reticulum stress homeostasis, reduce neuronal apoptosis, and enhance mitochondrial function and autophagy, making S1R a target for AD therapy. Traditional experimental methods are costly and inefficient, and rapid and accurate prediction methods need to be developed, while drug repurposing provides new ways and options for AD treatment. In this paper, we propose HNNDTA, a hybrid neural network for drug-target affinity (DTA) prediction, to facilitate drug repurposing for AD treatment. The study combines protein-protein interaction (PPI) network analysis, the HNNDTA model, and molecular docking to identify potential leads for AD. The HNNDTA model was constructed using 13 drug encoding networks and 9 target encoding networks with 2506 FDA-approved drugs as the candidate drug library for S1R and related proteins. Seven potential drugs were identified using network pharmacology and DTA prediction results of the HNNDTA model. Molecular docking simulations were further performed using the AutoDock Vina tool to screen haloperidol and bromperidol as lead compounds for AD treatment. Absorption, distribution, metabolism, excretion, and toxicity (ADMET) evaluation results indicated that both compounds had good pharmacokinetic properties and were virtually non-toxic. The study proposes a new approach to computer-aided drug design that is faster and more economical, and can improve hit rates for new drug compounds. The results of this study provide new lead compounds for AD treatment, which may be effective due to their multi-target action. HNNDTA is freely available at https://github.com/lizhj39/HNNDTA.
Affiliation(s)
- Xialin Wu
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
- Guangzhou University of Chinese Medicine, Guangzhou, China
- Zhuojian Li
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-Sen University, Shenzhen, China
- Guanxing Chen
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-Sen University, Shenzhen, China
- Yiyang Yin
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-Sen University, Shenzhen, China
- Calvin Yu-Chian Chen
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-Sen University, Shenzhen, China
- Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung, Taiwan
9
Discovering Entities Similarities in Biological Networks Using a Hybrid Immune Algorithm. Informatics 2023. [DOI: 10.3390/informatics10010018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023] Open
Abstract
Disease phenotypes are generally caused by the failure of gene modules which often have similar biological roles. Through the study of biological networks, it is possible to identify the intrinsic structure of molecular interactions in order to identify the so-called “disease modules”. Community detection is an interesting and valuable approach to discovering the structure of the community in a complex network, revealing the internal organization of the nodes, and has become a leading research topic in the analysis of complex networks. This work investigates the link between biological modules and network communities in test-case biological networks that are commonly used as a reference point and which include Protein–Protein Interaction Networks, Metabolic Networks and Transcriptional Regulation Networks. In order to identify small and structurally well-defined communities in the biological context, a hybrid immune metaheuristic algorithm Hybrid-IA is proposed and compared with several metaheuristics, hyper-heuristics, and the well-known greedy algorithm Louvain, with respect to modularity maximization. Considering the limitation of modularity optimization, which can fail to identify smaller communities, the reliability of Hybrid-IA was also analyzed with respect to three well-known sensitivity analysis measures (NMI, ARI and NVI) that assess how similar the detected communities are to real ones. By inspecting all outcomes and the performed comparisons, we will see that on one hand Hybrid-IA finds slightly lower modularity values than Louvain, but outperforms all other metaheuristics, while on the other hand, it can detect communities more similar to the real ones when compared to those detected by Louvain.
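The objective that Hybrid-IA and Louvain are compared on, modularity, can be computed directly for a given partition. The sketch below is a plain-Python version of the standard Newman-Girvan modularity for an unweighted, undirected graph without self-loops; it is not the Hybrid-IA algorithm, and the function name `modularity` and the two-triangle toy graph are assumptions made for illustration.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    edges: list of (u, v) pairs, one per undirected edge (no self-loops).
    community: dict mapping each node to its community label.
    Q = sum over communities c of [ L_c / m - (d_c / (2 m)) ** 2 ],
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    intra, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(toy_edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
q_single = modularity(toy_edges, {n: "all" for n in range(6)})
print(q_split, q_single)
```

Splitting the toy graph into its two triangles gives Q = 5/14, while placing every node in one community gives Q = 0, which is why modularity maximization prefers the split.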
10
de la Fuente L, Del Pozo-Valero M, Perea-Romero I, Blanco-Kelly F, Fernández-Caballero L, Cortón M, Ayuso C, Mínguez P. Prioritization of New Candidate Genes for Rare Genetic Diseases by a Disease-Aware Evaluation of Heterogeneous Molecular Networks. Int J Mol Sci 2023; 24:ijms24021661. [PMID: 36675175 PMCID: PMC9864172 DOI: 10.3390/ijms24021661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2022] [Revised: 01/10/2023] [Accepted: 01/11/2023] [Indexed: 01/18/2023] Open
Abstract
Screening for pathogenic variants in the diagnosis of rare genetic diseases can now be performed on all genes thanks to the application of whole exome and genome sequencing (WES, WGS). Yet the repertoire of gene-disease associations is not complete. Several computer-based algorithms and databases integrate distinct gene-gene functional networks to accelerate the discovery of gene-disease associations. We hypothesize that the ability of every type of information to extract relevant insights is disease-dependent. We compiled 33 functional networks classified into 13 knowledge categories (KCs) and observed large variability in their ability to recover genes associated with 91 genetic diseases, as measured using efficiency and exclusivity. We developed GLOWgenes, a network-based algorithm that applies random walk with restart to evaluate KCs' ability to recover genes from a given list associated with a phenotype and modulates the prediction of new candidates accordingly. Comparison with other integration strategies and tools shows that our disease-aware approach can boost the discovery of new gene-disease associations, especially for the less obvious ones. KC contribution also varies if obtained using recently discovered genes. Applied to 15 unsolved WES, GLOWgenes proposed three new genes to be involved in the phenotypes of patients with syndromic inherited retinal dystrophies.
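The random walk with restart mentioned in the abstract can be sketched generically as follows. This is not the GLOWgenes implementation: the function name `random_walk_with_restart`, the restart probability of 0.3, and the five-node toy network with single-letter gene names are all assumptions for illustration. The network is given as a symmetric adjacency dictionary in which every neighbor also appears as a key.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    adj: dict node -> dict of neighbor -> positive edge weight (symmetric).
    seeds: nodes the walker restarts from, e.g. genes already linked to a disease.
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    nodes = list(adj)
    seed_set = {s for s in seeds if s in adj}
    if not seed_set:
        raise ValueError("no seed is present in the network")
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    strength = {n: sum(adj[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0:
                # An isolated node has nowhere to go; send its mass to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * p0[n]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / strength[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical network: a triangle A-B-C with a tail C-D-E; A is the seed.
toy_net = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "D": 1.0},
    "D": {"C": 1.0, "E": 1.0},
    "E": {"D": 1.0},
}
scores = random_walk_with_restart(toy_net, seeds=["A"])
ranking = sorted((n for n in scores if n != "A"), key=scores.get, reverse=True)
print(ranking)
```

Non-seed nodes are then ranked by score, so nodes close to the seed in the network (here C and B) rank above distant ones (D and E). How GLOWgenes weights and combines the scores from its different knowledge categories is not described in enough detail in the abstract to reproduce here.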
Affiliation(s)
- Lorena de la Fuente
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
- Bioinformatics Unit, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Marta Del Pozo-Valero
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
- Irene Perea-Romero
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
- Fiona Blanco-Kelly
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
- Lidia Fernández-Caballero
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
- Marta Cortón
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
- Carmen Ayuso
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
- Pablo Mínguez
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
- Bioinformatics Unit, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
| |
|
11
|
Identifying Tumor-Associated Genes from Bilayer Networks of DNA Methylation Sites and RNAs. LIFE (BASEL, SWITZERLAND) 2022; 13:life13010076. [PMID: 36676027 PMCID: PMC9861397 DOI: 10.3390/life13010076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/10/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/29/2022]
Abstract
Network theory has attracted much attention from the biological community because of its high efficacy in identifying tumor-associated genes. However, most researchers have focused on single networks of single omics, which have less predictive power. With the available multiomics data, multilayer networks can now be used in molecular research. In this study, we achieved this with the construction of a bilayer network of DNA methylation sites and RNAs. We applied the network model to five types of tumor data to identify key genes associated with tumors. Compared with the single network, the proposed bilayer network resulted in more tumor-associated DNA methylation sites and genes, which we verified with prognostic and KEGG enrichment analyses.
|
12
|
Sussman L, Garcia-Robledo JE, Ordóñez-Reyes C, Forero Y, Mosquera AF, Ruíz-Patiño A, Chamorro DF, Cardona AF. Integration of artificial intelligence and precision oncology in Latin America. FRONTIERS IN MEDICAL TECHNOLOGY 2022; 4:1007822. [PMID: 36311461 PMCID: PMC9608820 DOI: 10.3389/fmedt.2022.1007822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2022] [Accepted: 09/21/2022] [Indexed: 11/07/2022] Open
Abstract
Next-generation medicine encompasses different concepts related to healthcare models and technological developments. In Latin America and the Caribbean, healthcare systems are quite different between countries, and cancer control is known to be insufficient and inefficient considering socioeconomic discrepancies. Despite advancements in knowledge about the biology of different oncological diseases, cancer remains a challenge in terms of diagnosis, treatment, and prognosis for clinicians and researchers. With the development of molecular biology, better diagnosis methods, and therapeutic tools in recent years, artificial intelligence (AI) has become important, because it could improve different clinical scenarios: predicting clinically relevant parameters, cancer diagnosis, cancer research, and accelerating the growth of personalized medicine. The incorporation of AI represents an important challenge in terms of diagnosis, treatment, and prognosis for clinicians and researchers in cancer care. Therefore, some studies about AI in Latin America and the Caribbean are being conducted with the aim of improving the performance of AI in those countries. This review introduces AI in cancer care in Latin America and the Caribbean, and the advantages and promising results that it has shown in this socio-demographic context.
Affiliation(s)
- Liliana Sussman
- Department of Neurology, Fundación Universitaria de Ciencias de la Salud, Bogotá, Colombia; Foundation for Clinical and Applied Cancer Research – FICMAC, Bogotá, Colombia
| | - Juan Esteban Garcia-Robledo
- Foundation for Clinical and Applied Cancer Research – FICMAC, Bogotá, Colombia; Division of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ, United States
| | - Camila Ordóñez-Reyes
- Foundation for Clinical and Applied Cancer Research – FICMAC, Bogotá, Colombia; Molecular Oncology and Biology Systems Research Group (Fox-G), Universidad el Bosque, Bogotá, Colombia
| | - Yency Forero
- Foundation for Clinical and Applied Cancer Research – FICMAC, Bogotá, Colombia; Molecular Oncology and Biology Systems Research Group (Fox-G), Universidad el Bosque, Bogotá, Colombia
| | - Andrés F. Mosquera
- Foundation for Clinical and Applied Cancer Research – FICMAC, Bogotá, Colombia; Molecular Oncology and Biology Systems Research Group (Fox-G), Universidad el Bosque, Bogotá, Colombia
| | - Alejandro Ruíz-Patiño
- Foundation for Clinical and Applied Cancer Research – FICMAC, Bogotá, Colombia; Molecular Oncology and Biology Systems Research Group (Fox-G), Universidad el Bosque, Bogotá, Colombia
| | - Diego F. Chamorro
- Foundation for Clinical and Applied Cancer Research – FICMAC, Bogotá, Colombia; Molecular Oncology and Biology Systems Research Group (Fox-G), Universidad el Bosque, Bogotá, Colombia
| | - Andrés F. Cardona
- Foundation for Clinical and Applied Cancer Research – FICMAC, Bogotá, Colombia; Molecular Oncology and Biology Systems Research Group (Fox-G), Universidad el Bosque, Bogotá, Colombia; Direction of Research, Science and Education, Luis Carlos Sarmiento Angulo Cancer Treatment and Research Center (CTIC), Bogotá, Colombia; Correspondence: Andrés F. Cardona
| |
|
13
|
Artificial intelligence in cancer target identification and drug discovery. Signal Transduct Target Ther 2022; 7:156. [PMID: 35538061 PMCID: PMC9090746 DOI: 10.1038/s41392-022-00994-0] [Citation(s) in RCA: 60] [Impact Index Per Article: 30.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2021] [Revised: 03/14/2022] [Accepted: 04/05/2022] [Indexed: 02/08/2023] Open
Abstract
Artificial intelligence is an advanced method to identify novel anticancer targets and discover novel drugs from biology networks because the networks can effectively preserve and quantify the interaction between components of cell systems underlying human diseases such as cancer. Here, we review and discuss how to employ artificial intelligence approaches to identify novel anticancer targets and discover drugs. First, we describe the scope of artificial intelligence biology analysis for novel anticancer target investigations. Second, we review and discuss the basic principles and theory of commonly used network-based and machine learning-based artificial intelligence algorithms. Finally, we showcase the applications of artificial intelligence approaches in cancer target identification and drug discovery. Taken together, the artificial intelligence models have provided us with a quantitative framework to study the relationship between network characteristics and cancer, thereby leading to the identification of potential anticancer targets and the discovery of novel drug candidates.
|
14
|
Multiomics Topic Modeling for Breast Cancer Classification. Cancers (Basel) 2022; 14:cancers14051150. [PMID: 35267458 PMCID: PMC8909787 DOI: 10.3390/cancers14051150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2022] [Accepted: 02/18/2022] [Indexed: 12/04/2022] Open
Abstract
The integration of transcriptional data with other layers of information, such as the post-transcriptional regulation mediated by microRNAs, can be crucial to identify the driver genes and the subtypes of complex and heterogeneous diseases such as cancer. This paper presents an approach based on topic modeling to accomplish this integration task. More specifically, we show how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of 'omics data. We test this approach on breast cancer samples from the TCGA database, integrating data on messenger RNA, microRNAs, and copy number variations. We show that the inclusion of the microRNA layer significantly improves the accuracy of subtype classification. Moreover, some of the hidden structures or "topics" that the algorithm extracts actually correspond to genes and microRNAs involved in breast cancer development and are associated with the survival probability.
|
15
|
Gong H, Zhu S, Zhu X, Fang Q, Zhang XY, Wu R. A Multilayer Interactome Network Constructed in a Forest Poplar Population Mediates the Pleiotropic Control of Complex Traits. Front Genet 2021; 12:769688. [PMID: 34868256 PMCID: PMC8633413 DOI: 10.3389/fgene.2021.769688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2021] [Accepted: 10/19/2021] [Indexed: 11/13/2022] Open
Abstract
The effects of genes on physiological and biochemical processes are interrelated and interdependent; it is common for genes to express pleiotropic control of complex traits. However, the study of gene expression and participating pathways in vivo at the whole-genome level is challenging. Here, we develop a coupled regulatory interaction differential equation to assess overall and independent genetic effects on trait growth. Based on evolutionary game theory and developmental modularity theory, we constructed multilayer, omnigenic networks of bidirectional, weighted, and positive or negative epistatic interactions using a forest poplar tree mapping population, which were organized into metagalactic, intergalactic, and local interstellar networks that describe layers of structure between modules, submodules, and individual single nucleotide polymorphisms, respectively. These multilayer interactomes enable the exploration of complex interactions between genes, and the analysis of not only differential expression of quantitative trait loci but also previously uncharacterized determinant SNPs, which are negatively regulated by other SNPs, based on the deconstruction of genetic effects to their component parts. Our research framework provides a tool to comprehend the pleiotropic control of complex traits and explores the inherent directional connections between genes in the structure of omnigenic networks.
Affiliation(s)
- Huiying Gong
- College of Science, Beijing Forestry University, Beijing, China
| | - Sheng Zhu
- College of Biology and the Environment, Nanjing Forestry University, Nanjing, China
| | - Xuli Zhu
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
| | - Qing Fang
- Faculty of Science, Yamagata University, Yamagata, Japan
| | - Xiao-Yu Zhang
- College of Science, Beijing Forestry University, Beijing, China
| | - Rongling Wu
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Center for Statistical Genetics, The Pennsylvania State University, Hershey, PA, United States
| |
|
16
|
Alcalá-Corona SA, Sandoval-Motta S, Espinal-Enríquez J, Hernández-Lemus E. Modularity in Biological Networks. Front Genet 2021; 12:701331. [PMID: 34594357 PMCID: PMC8477004 DOI: 10.3389/fgene.2021.701331] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2021] [Accepted: 08/23/2021] [Indexed: 01/13/2023] Open
Abstract
Network modeling, from the ecological to the molecular scale, has become an essential tool for studying the structure, dynamics and complex behavior of living systems. Graph representations of the relationships between biological components open up a wide variety of methods for discovering the mechanistic and functional properties of biological systems. Many biological networks are organized into a modular structure, so methods to discover such modules are essential if we are to understand the biological system as a whole. However, most of the methods used in biology to this end have limited applicability, as they are very specific to the system they were developed for. Conversely, from the statistical physics and network science perspective, graph modularity has been theoretically studied and several methods of a very general nature have been developed. It is our perspective that, in particular for the modularity detection problem, biology and theoretical physics/network science are less connected than they should be. The central goal of this review is to provide the necessary background and present the most applicable and pertinent methods for community detection in a way that motivates their further usage in biological research.
Affiliation(s)
- Sergio Antonio Alcalá-Corona
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Santiago Sandoval-Motta
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico; National Council on Science and Technology, Mexico City, Mexico
| | - Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
| |
|
17
|
Moingeon P, Kuenemann M, Guedj M. Artificial intelligence-enhanced drug design and development: Toward a computational precision medicine. Drug Discov Today 2021; 27:215-222. [PMID: 34555509 DOI: 10.1016/j.drudis.2021.09.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2021] [Revised: 07/13/2021] [Accepted: 09/14/2021] [Indexed: 12/29/2022]
Abstract
Artificial Intelligence (AI) relies upon a convergence of technologies with further synergies with life science technologies to capture the value of massive multi-modal data in the form of predictive models supporting decision-making. AI and machine learning (ML) enhance drug design and development by improving our understanding of disease heterogeneity, identifying dysregulated molecular pathways and therapeutic targets, designing and optimizing drug candidates, as well as evaluating in silico clinical efficacy. By providing an unprecedented level of knowledge on both patient specificities and drug candidate properties, AI is fostering the emergence of a computational precision medicine allowing the design of therapies or preventive measures tailored to the singularities of individual patients in terms of their physiology, disease features, and exposure to environmental risks.
Affiliation(s)
- Philippe Moingeon
- Servier, Research and Development, 50 rue Carnot, 92284 Suresnes Cedex, France
| | - Mélaine Kuenemann
- Servier, Research and Development, 50 rue Carnot, 92284 Suresnes Cedex, France
| | - Mickaël Guedj
- Servier, Research and Development, 50 rue Carnot, 92284 Suresnes Cedex, France
| |
|
18
|
Dou Z, Ma X. Inferring Functional Epigenetic Modules by Integrative Analysis of Multiple Heterogeneous Networks. Front Genet 2021; 12:706952. [PMID: 34504516 PMCID: PMC8421682 DOI: 10.3389/fgene.2021.706952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2021] [Accepted: 06/29/2021] [Indexed: 02/02/2023] Open
Abstract
Gene expression and methylation are critical biological processes for cells, and how to integrate these heterogeneous data has been extensively investigated, which is the foundation for revealing the underlying patterns of cancers. The vast majority of the current algorithms fuse gene methylation and expression into a single network, failing to fully explore their relations and heterogeneity. To resolve these problems, in this study we define an epigenetic module as a gene set whose members are co-methylated and co-expressed. To address the heterogeneity of data, we construct gene co-expression and co-methylation networks separately. In this case, the epigenetic module is characterized as a common module in multiple networks. Then, a non-negative matrix factorization-based algorithm that jointly clusters the co-expression and co-methylation networks is proposed for discovering the epigenetic modules (called Ep-jNMF). Ep-jNMF is more accurate than the baselines on the artificial data. Moreover, Ep-jNMF identifies more biologically meaningful modules, and these modules can predict the subtypes of cancers. These results indicate that Ep-jNMF is efficient for the integration of expression and methylation data.
Affiliation(s)
- Zengfa Dou
- The 20-th Research Institute, China Electronics Technology Group Corporation, Xi'an, China
| | - Xiaoke Ma
- School of Computer Science and Technology, Xidian University, Xi'an, China
| |
|
19
|
Lin CH, Lichtarge O. Using Interpretable Deep Learning to Model Cancer Dependencies. Bioinformatics 2021; 37:2675-2681. [PMID: 34042953 PMCID: PMC8428607 DOI: 10.1093/bioinformatics/btab137] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2020] [Revised: 02/03/2021] [Accepted: 05/25/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Cancer dependencies provide potential drug targets. Unfortunately, dependencies differ among cancers and even individuals. To this end, visible neural networks (VNNs) are promising due to robust performance and the interpretability required for the biomedical field. RESULTS We design Biological visible neural network (BioVNN) using pathway knowledge to predict cancer dependencies. Despite having fewer parameters, BioVNN marginally outperforms traditional neural networks (NNs) and converges faster. BioVNN also outperforms an NN based on randomized pathways. More importantly, dependency predictions can be explained by correlating with the neuron output states of relevant pathways, which suggest dependency mechanisms. In feature importance analysis, BioVNN recapitulates known reaction partners and proposes new ones. Such robust and interpretable VNNs may facilitate the understanding of cancer dependency and the development of targeted therapies. AVAILABILITY AND IMPLEMENTATION Code and data are available at https://github.com/LichtargeLab/BioVNN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics
- Department of Biochemistry and Molecular Biology
- Department of Pharmacology
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- To whom correspondence should be addressed.
| |
|
20
|
García-Cortés D, Hernández-Lemus E, Espinal-Enríquez J. Luminal A Breast Cancer Co-expression Network: Structural and Functional Alterations. Front Genet 2021; 12:629475. [PMID: 33959148 PMCID: PMC8096206 DOI: 10.3389/fgene.2021.629475] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2020] [Accepted: 03/17/2021] [Indexed: 12/20/2022] Open
Abstract
Luminal A is the most common breast cancer molecular subtype in women worldwide. These tumors have characteristic yet heterogeneous alterations at the genomic and transcriptomic level. Gene co-expression networks (GCNs) have contributed to a better characterization of the cancerous phenotype. We have previously shown an imbalance in the proportion of intra-chromosomal (cis-) over inter-chromosomal (trans-) interactions when comparing cancer and healthy tissue GCNs. In particular, for breast cancer molecular subtypes (Luminal A included), the majority of high co-expression interactions connect gene-pairs in the same chromosome, a phenomenon that we have called loss of trans- co-expression. Although this phenomenon has been described, the functional implication of this specific network topology has not been studied yet. To understand the biological role that communities of co-expressed genes may have, we constructed GCNs for healthy and Luminal A phenotypes. Network modules were obtained based on their connectivity patterns and they were classified according to their chromosomal homophily (proportion of cis-/trans- interactions). A functional overrepresentation analysis was performed on communities in both networks to observe the significantly enriched processes for each community. We also investigated possible mechanisms by which the loss of trans- co-expression emerges in the cancer GCN. To this end we evaluated transcription factor binding sites, CTCF binding sites, differential gene expression and copy number alterations (CNAs) in the cancer GCN. We found that trans- communities in Luminal A present more significantly enriched categories than cis- ones. Processes such as angiogenesis, cell proliferation, or cell adhesion were found in trans- modules. The differential expression analysis showed that the FOXM1, CENPA, and CIITA transcription factors exert a major regulatory role on their communities by regulating expression of their target genes in other chromosomes. Finally, identification of CNAs displayed a high enrichment of deletion peaks in cis- communities. With this approach, we demonstrate that network topology determines, to a certain extent, the function in the Luminal A breast cancer network. Furthermore, several mechanisms seem to be acting together to avoid trans- co-expression. Since this phenomenon has been observed in other cancer tissues, a remaining question is whether the loss of long distance co-expression is a novel hallmark of cancer.
Affiliation(s)
- Diana García-Cortés
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
| |
|
21
|
Abstract
INTRODUCTION Knowledge graphs have proven to be promising systems of information storage and retrieval. Due to the recent explosion of heterogeneous multimodal data sources generated in the biomedical domain, and an industry shift toward a systems biology approach, knowledge graphs have emerged as attractive methods of data storage and hypothesis generation. AREAS COVERED In this review, the author summarizes the applications of knowledge graphs in drug discovery. They evaluate their utility; differentiating between academic exercises in graph theory, and useful tools to derive novel insights, highlighting target identification and drug repurposing as two areas showing particular promise. They provide a case study on COVID-19, summarizing the research that used knowledge graphs to identify repurposable drug candidates. They describe the dangers of degree and literature bias, and discuss mitigation strategies. EXPERT OPINION Whilst knowledge graphs and graph-based machine learning have certainly shown promise, they remain relatively immature technologies. Many popular link prediction algorithms fail to address strong biases in biomedical data, and only highlight biological associations, failing to model causal relationships in complex dynamic biological systems. These problems need to be addressed before knowledge graphs reach their true potential in drug discovery.
Affiliation(s)
- Finlay MacLean
- Target Identification, BenevolentAI, United Kingdom of Great Britain and Northern Ireland
| |
|
22
|
Valle F, Osella M, Caselle M. A Topic Modeling Analysis of TCGA Breast and Lung Cancer Transcriptomic Data. Cancers (Basel) 2020; 12:E3799. [PMID: 33339347 PMCID: PMC7766023 DOI: 10.3390/cancers12123799] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2020] [Revised: 12/07/2020] [Accepted: 12/11/2020] [Indexed: 01/18/2023] Open
Abstract
Topic modeling is a widely used technique to extract relevant information from large arrays of data. The problem of finding a topic structure in a dataset was recently recognized to be analogous to the community detection problem in network theory. Leveraging this analogy, a new class of topic modeling strategies has been introduced to overcome some of the limitations of classical methods. This paper applies these recent ideas to TCGA transcriptomic data on breast and lung cancer. The established cancer subtype organization is well reconstructed in the inferred latent topic structure. Moreover, we identify specific topics that are enriched in genes known to play a role in the corresponding disease and are strongly related to the survival probability of patients. Finally, we show that a simple neural network classifier operating in the low-dimensional topic space is able to predict with high accuracy the cancer subtype of a test expression sample.
Affiliation(s)
- Filippo Valle
- Physics Department, University of Turin and INFN, via P. Giuria 1, 10125 Turin, Italy
| |
|
23
|
Singhal A, Cao S, Churas C, Pratt D, Fortunato S, Zheng F, Ideker T. Multiscale community detection in Cytoscape. PLoS Comput Biol 2020; 16:e1008239. [PMID: 33095781 PMCID: PMC7584444 DOI: 10.1371/journal.pcbi.1008239] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2020] [Accepted: 08/12/2020] [Indexed: 02/08/2023] Open
Abstract
Detection of community structure has become a fundamental step in the analysis of biological networks with application to protein function annotation, disease gene prediction, and drug discovery. This recent impact creates a need to make these techniques and their accompanying visualization schemes available to a broad range of biologists. Here we present a service-oriented, end-to-end software framework, CDAPS (Community Detection APplication and Service), that integrates the identification, annotation, visualization, and interrogation of multiscale network communities, accessible within the popular Cytoscape network analysis platform. With novel design principles, CDAPS addresses unmet new challenges, such as identifying hierarchical community structures, comparison of outputs generated from diverse network resources, and easy deployment of new algorithms, to facilitate community-sourced science. We demonstrate that the CDAPS framework can be applied to high-throughput protein-protein interaction networks to gain novel insights, such as the identification of putative new members of known protein complexes.
Affiliation(s)
- Akshat Singhal
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, United States of America
| | - Song Cao
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
| | - Christopher Churas
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
| | - Dexter Pratt
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
| | - Santo Fortunato
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, United States of America
| | - Fan Zheng
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
- * E-mail: (FZ); (TI)
| | - Trey Ideker
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, United States of America
- Department of Medicine, University of California, San Diego, La Jolla, California, United States of America
- * E-mail: (FZ); (TI)
| |
|
24
|
Abstract
Community detection is one of the most popular research topics in a variety of complex systems, ranging from biology to sociology. In recent years, there has been an increasing focus on the rapid development of more complicated networks, namely multilayer networks. Communities in a single-layer network are groups of nodes that are more strongly connected among themselves than to the others, while in multilayer networks, a group of well-connected nodes is shared across multiple layers. Most traditional algorithms rarely perform well on a multilayer network without modifications. Thus, in this paper, we offer overall comparisons of existing works and analyze several representative algorithms, providing a comprehensive understanding of community detection methods in multilayer networks. The comparison results indicate that improvements in algorithm efficiency and extensions to general multilayer networks are expected in forthcoming studies.
|
25
|
Ma X, Sun P, Gong M. An integrative framework of heterogeneous genomic data for cancer dynamic modules based on matrix decomposition. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 19:305-316. [PMID: 32750874 DOI: 10.1109/tcbb.2020.3004808] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
Abstract
Cancer progression is dynamic, and tracking dynamic modules is promising for cancer diagnosis and therapy. Accumulated genomic data provide us with an opportunity to investigate the underlying mechanisms of cancers. However, as far as we know, no algorithm has been designed for detecting dynamic modules by integrating heterogeneous omics data. To address this issue, we propose an integrative framework for dynamic module detection based on a regularized nonnegative matrix factorization method (DrNMF) that integrates gene expression and the protein interaction network. To remove the heterogeneity of genomic data, we divide the samples of expression profiles into groups to construct gene co-expression networks. To characterize the dynamics of modules, the temporal smoothness framework is adopted, in which the gene co-expression network at the previous stage and the protein interaction network are incorporated into the objective function of DrNMF via regularization. The experimental results demonstrate that DrNMF is superior to state-of-the-art methods in terms of accuracy. For breast cancer data, the obtained dynamic modules are more enriched in the known pathways, and can be used to predict the stages of cancers and survival time of patients. The proposed model and algorithm provide an effective integrative analysis of heterogeneous genomic data for cancer progression.
|
26
|
Mallik S, Zhao Z. Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data. Brief Bioinform 2020; 21:368-394. [PMID: 30649169 PMCID: PMC7373185 DOI: 10.1093/bib/bby120] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/27/2018] [Revised: 10/26/2018] [Accepted: 11/21/2018] [Indexed: 12/20/2022] Open
Abstract
Cancer is well recognized as a complex disease with dysregulated molecular networks or modules. Graph- and rule-based analytics have been applied extensively for cancer classification as well as prognosis using large genomic and other data over the past decade. This article provides a comprehensive review of various graph- and rule-based machine learning algorithms that have been applied to numerous genomics data to determine the cancer-specific gene modules, identify gene signature-based classifiers and carry out other related objectives of potential therapeutic value. This review focuses mainly on the methodological design and features of these algorithms to facilitate the application of these graph- and rule-based analytical approaches for cancer classification and prognosis. Based on the type of data integration, we divided all the algorithms into three categories: model-based integration, pre-processing integration and post-processing integration. Each category is further divided into four sub-categories (supervised, unsupervised, semi-supervised and survival-driven learning analyses) based on learning style. Therefore, a total of 11 categories of methods are summarized with their inputs, objectives and description, advantages and potential limitations. Next, we briefly demonstrate well-known and most recently developed algorithms for each sub-category along with salient information, such as data profiles, statistical or feature selection methods and outputs. Finally, we summarize the appropriate use and efficiency of all categories of graph- and rule mining-based learning methods when input data and specific objective are given. This review aims to help readers to select and use the appropriate algorithms for cancer classification and prognosis study.
Affiliation(s)
- Saurav Mallik
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
|
27
|
Pournoor E, Mousavian Z, Dalini AN, Masoudi-Nejad A. Identification of Key Components in Colon Adenocarcinoma Using Transcriptome to Interactome Multilayer Framework. Sci Rep 2020; 10:4991. [PMID: 32193399 PMCID: PMC7081269 DOI: 10.1038/s41598-020-59605-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/06/2019] [Accepted: 01/31/2020] [Indexed: 12/21/2022] Open
Abstract
The complexity of cascading interrelations between molecular cell components at different levels, from genome to metabolome, poses a massive difficulty in comprehending biological processes. However, considering these complications in systematic modeling will result in realistic and reliable outputs. The multilayer networks approach is a relatively innovative concept that could be applied to multiple omics datasets as an integrative methodology to overcome heterogeneity difficulties. Herein, we employed the multilayer framework to reconstruct the colon adenocarcinoma network by observing co-expression correlations, regulatory relations, and physical binding interactions. Hub nodes in this three-layer network were selected using a heterogeneous random walk with random jump procedure. We exploited local composite modules around the hub nodes having high overlap with cancer-specific pathways, and investigated their genes showing a different expression pattern during tumor progression. These genes were examined for survival effects on the patient's lifespan, and those with significant impacts were selected as potential candidate biomarkers. Results suggest that the identified genes are of noteworthy importance in the carcinogenesis of the colon.
Affiliation(s)
- Ehsan Pournoor
  - Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Zaynab Mousavian
  - School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Abbas Nowzari Dalini
  - School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Ali Masoudi-Nejad
  - Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
|
28
|
Diversity Analysis Exposes Unexpected Key Roles in Multiplex Crime Networks. Complex Networks XI 2020. [DOI: 10.1007/978-3-030-40943-2_31] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
|
29
|
Jovanovski P, Kocarev L. Bayesian consensus clustering in multiplex networks. Chaos (Woodbury, N.Y.) 2019; 29:103142. [PMID: 31675792 DOI: 10.1063/1.5120503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2019] [Accepted: 10/07/2019] [Indexed: 06/10/2023]
Abstract
Multiplex networks are inherently characterized by heterogeneous relations among vertices. In this paper, we develop Bayesian consensus stochastic block modeling for multiplex networks. The posterior distribution of the model is approximated via Markov chain Monte Carlo, and a Gibbs sampler is derived in detail. The model allows both integrated analysis of heterogeneous relations, thus providing more accurate block assignments, and simultaneous handling of uncertainty in the model parameters. Motivated by the fact that symmetry plays a crucial role in physics, we also discuss symmetry in statistics, which is nowadays commonly known as exchangeability, the concept that has recently transformed the field of statistical network analysis.
Affiliation(s)
- Petar Jovanovski
  - Research Center for Computer Science and Information Technologies, Macedonian Academy of Sciences and Arts, Bul Krste Misirkov 2, 1000 Skopje, Republic of North Macedonia
- Ljupco Kocarev
  - Research Center for Computer Science and Information Technologies, Macedonian Academy of Sciences and Arts, Bul Krste Misirkov 2, 1000 Skopje, Republic of North Macedonia
|
30
|
Fan X, Wang Y, Tang XQ. Extracting predictors for lung adenocarcinoma based on Granger causality test and stepwise character selection. BMC Bioinformatics 2019; 20:197. [PMID: 31074380 PMCID: PMC6509866 DOI: 10.1186/s12859-019-2739-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Lung adenocarcinoma is the most common type of lung cancer, with high mortality worldwide. Its occurrence and development have been thoroughly studied by high-throughput expression microarray, which produced abundant data on gene expression, DNA methylation, and miRNA quantification. However, the hub genes, which can serve as biomarkers for discriminating cancer and healthy individuals, are not well screened. RESULT Here we present a new method for extracting gene predictors, aiming to obtain the fewest predictors without losing efficiency. We first analyzed three different expression microarrays and constructed a multi-interaction network, since an individual expression dataset is not enough for describing biological behaviors dynamically and systematically. Then, we transformed the undirected interaction network into a directed network by employing the Granger causality test, followed by screening the predictors with the stepwise character selection algorithm. Six predictors, including TOP2A, GRK5, SIRT7, MCM7, EGFR, and COL1A2, were ultimately identified. All the predictors are cancer-related, and their number is very small, facilitating diagnosis. Finally, the approach was validated by robustness analyses applied to six independent datasets; the precision ranges from 95.3% to 100%. CONCLUSION Although there are complicated differences between cancer and normal cells in gene functions, cancer cells could be differentiated when a group of special genes is expressed abnormally. Here we presented a new, robust, and effective method for extracting gene predictors. We identified as few as 6 genes that can be taken as predictors for diagnosing lung adenocarcinoma.
Affiliation(s)
- Xuemeng Fan
  - School of Science, Jiangnan University, Wuxi, 214122, China
- Yaolai Wang
  - School of Science, Jiangnan University, Wuxi, 214122, China
- Xu-Qing Tang
  - School of Science, Jiangnan University, Wuxi, 214122, China
  - Wuxi Engineering Research Center for Biocomputing, Wuxi, 214122, China
|
31
|
Nelson W, Zitnik M, Wang B, Leskovec J, Goldenberg A, Sharan R. To Embed or Not: Network Embedding as a Paradigm in Computational Biology. Front Genet 2019; 10:381. [PMID: 31118945 PMCID: PMC6504708 DOI: 10.3389/fgene.2019.00381] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 02/05/2019] [Accepted: 04/09/2019] [Indexed: 12/20/2022] Open
Abstract
Current technology is producing high throughput biomedical data at an ever-growing rate. A common approach to interpreting such data is through network-based analyses. Since biological networks are notoriously complex and hard to decipher, a growing body of work applies graph embedding techniques to simplify, visualize, and facilitate the analysis of the resulting networks. In this review, we survey traditional and new approaches for graph embedding and compare their application to fundamental problems in network biology with using the networks directly. We consider a broad variety of applications including protein network alignment, community detection, and protein function prediction. We find that in all of these domains both types of approaches are of value and their performance depends on the evaluation measures being used and the goal of the project. In particular, network embedding methods outshine direct methods according to some of those measures and are, thus, an essential tool in bioinformatics research.
Affiliation(s)
- Walter Nelson
  - Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
  - Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
- Marinka Zitnik
  - Department of Computer Science, Stanford University, Stanford, CA, United States
- Bo Wang
  - Department of Computer Science, Stanford University, Stanford, CA, United States
  - Peter Munk Cardiac Center, University Health Network, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
- Jure Leskovec
  - Department of Computer Science, Stanford University, Stanford, CA, United States
  - Chan Zuckerberg Biohub, San Francisco, CA, United States
- Anna Goldenberg
  - Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Roded Sharan
  - School of Computer Science, Tel Aviv University, Tel Aviv, Israel
|
32
|
Zhang Y, Chen J, Wang Y, Wang D, Cong W, Lai BS, Zhao Y. Multilayer network analysis of miRNA and protein expression profiles in breast cancer patients. PLoS One 2019; 14:e0202311. [PMID: 30946749 PMCID: PMC6448837 DOI: 10.1371/journal.pone.0202311] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/26/2018] [Accepted: 03/19/2019] [Indexed: 12/21/2022] Open
Abstract
MiRNAs and proteins play important roles in different stages of breast tumor development and serve as biomarkers for the early diagnosis of breast cancer. A new algorithm that combines machine learning algorithms and multilayer complex network analysis is hereby proposed to explore the potential diagnostic values of miRNAs and proteins. XGBoost and random forest algorithms were employed to screen the most important miRNAs and proteins. Maximal information coefficient was applied to assess intralayer and interlayer connection. A multilayer complex network was constructed to identify miRNAs and proteins that could serve as biomarkers for breast cancer. Proteins and miRNAs that are nodes in the network were subsequently categorized into two network layers considering their distinct functions. The betweenness centrality was used as the first measurement of the importance of the nodes within each single layer. The degree of the nodes was chosen as the second measurement to map their signalling pathways. By combining these two measurements into one score and comparing the difference of the same candidate between normal tissue and cancer tissue, this novel multilayer network analysis could be applied to successfully identify molecules associated with breast cancer.
Affiliation(s)
- Yang Zhang
  - Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
- Jiannan Chen
  - Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
- Yu Wang
  - Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
- Dehua Wang
  - Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
- Weihui Cong
  - Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
- Bo Shiun Lai
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, United States
- Yi Zhao
  - Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
|
33
|
Almasi SM, Hu T. Measuring the importance of vertices in the weighted human disease network. PLoS One 2019; 14:e0205936. [PMID: 30901770 PMCID: PMC6430629 DOI: 10.1371/journal.pone.0205936] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/30/2018] [Accepted: 02/26/2019] [Indexed: 12/11/2022] Open
Abstract
Many human genetic disorders and diseases are known to be related to each other through frequently observed co-occurrences. Studying the correlations among multiple diseases provides an important avenue to better understand the common genetic background of diseases and to help develop new drugs that can treat multiple diseases. Meanwhile, network science has seen increasing applications on modeling complex biological systems, and can be a powerful tool to elucidate the correlations of multiple human diseases. In this article, known disease-gene associations were represented using a weighted bipartite network. We extracted a weighted human diseases network from such a bipartite network to show the correlations of diseases. Subsequently, we proposed a new centrality measurement for the weighted human disease network (WHDN) in order to quantify the importance of diseases. Using our centrality measurement to quantify the importance of vertices in WHDN, we were able to find a set of most central diseases. By investigating the 30 top diseases and their most correlated neighbors in the network, we identified disease linkages including known disease pairs and novel findings. Our research helps better understand the common genetic origin of human diseases and suggests top diseases that likely induce other related diseases.
Affiliation(s)
- Ting Hu
  - Department of Computer Science, Memorial University, St. John’s, NL, Canada
|
34
|
Carpi LC, Schieber TA, Pardalos PM, Marfany G, Masoller C, Díaz-Guilera A, Ravetti MG. Assessing diversity in multiplex networks. Sci Rep 2019; 9:4511. [PMID: 30872604 PMCID: PMC6418208 DOI: 10.1038/s41598-019-38869-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/13/2018] [Accepted: 01/03/2019] [Indexed: 02/06/2023] Open
Abstract
Diversity, understood as the variety of different elements or configurations that an extensive system has, is a crucial property that allows maintaining the system's functionality in a changing environment, where failures, random events or malicious attacks are often unavoidable. Despite the relevance of preserving diversity in the context of ecology, biology, transport, finances, etc., the elements or configurations that contribute most to the diversity are often unknown, and thus, they cannot be protected against failures or environmental crises. This is due to the fact that there is no generic framework that allows identifying which elements or configurations have crucial roles in preserving the diversity of the system. Existing methods treat the level of heterogeneity of a system as a measure of its diversity, being unsuitable when systems are composed of a large number of elements with different attributes and types of interactions. Besides, with limited resources, one needs to find the best preservation policy, i.e., one needs to solve an optimization problem. Here we aim to bridge this gap by developing a metric between labeled graphs to compute the diversity of the system, which allows identifying the most relevant components, based on their contribution to a global diversity value. The proposed framework is suitable for large multiplex structures, which are constituted by a set of elements represented as nodes, which have different types of interactions, represented as layers. The proposed method allows us to find, in a genetic network (HIV-1), the elements with the highest diversity values, while in a European airline network, we systematically identify the companies that maximize (and those that least compromise) the variety of options for routes connecting different airports.
Affiliation(s)
- Laura C Carpi
  - Programa de Pós-Graduação em Modelagem Matemática e Computacional, PPGMMC, Centro Federal de Educação Tecnológica de Minas Gerais, CEFET-MG, Av. Amazonas, 7675, 30510-000, Belo Horizonte, MG, Brazil
- Tiago A Schieber
  - Departamento de Ciências Administrativas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Panos M Pardalos
  - Industrial and Systems Engineering, University of Florida, Gainesville, FL, USA
- Gemma Marfany
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
  - Institut de Biomedicina de la Universitat de Barcelona (IBUB-IRSJD), Barcelona, Spain
- Cristina Masoller
  - Departament de Física, Universitat Politècnica de Catalunya, Rambla St. Nebridi 22, Terrassa, 08222, Barcelona, Spain
- Albert Díaz-Guilera
  - Departament de Física de la Matèria Condensada, Universitat de Barcelona, Marti i Franques 1, Barcelona, 08028, Spain
  - Universitat de Barcelona Institute of Complex Systems (UBICS), 08028, Barcelona, Spain
- Martín G Ravetti
  - Departamento de Engenharia de Produção, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
|
35
|
Cova TFGG, Bento DJ, Nunes SCC. Computational Approaches in Theranostics: Mining and Predicting Cancer Data. Pharmaceutics 2019; 11:E119. [PMID: 30871264 PMCID: PMC6471740 DOI: 10.3390/pharmaceutics11030119] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/12/2019] [Revised: 02/26/2019] [Accepted: 03/07/2019] [Indexed: 02/02/2023] Open
Abstract
The ability to understand the complexity of cancer-related data has been prompted by the applications of (1) computer and data sciences, including data mining, predictive analytics, machine learning, and artificial intelligence, and (2) advances in imaging technology and probe development. Computational modelling and simulation are systematic and cost-effective tools able to identify important temporal/spatial patterns (and relationships), characterize distinct molecular features of cancer states, and address other relevant aspects, including tumor detection and heterogeneity, progression and metastasis, and drug resistance. These approaches have provided invaluable insights for improving the experimental design of therapeutic delivery systems and for increasing the translational value of the results obtained from early and preclinical studies. The big question is: Could cancer theranostics be determined and controlled in silico? This review describes the recent progress in the development of computational models and methods used to facilitate research on the molecular basis of cancer and on the respective diagnosis and optimized treatment, with particular emphasis on the design and optimization of theranostic systems. The current role of computational approaches is providing innovative, incremental, and complementary data-driven solutions for the prediction, simplification, and characterization of cancer and intrinsic mechanisms, and to promote new data-intensive, accurate diagnostics and therapeutics.
Affiliation(s)
- Tânia F G G Cova
  - Coimbra Chemistry Centre, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, 3004-535 Coimbra, Portugal
- Daniel J Bento
  - Coimbra Chemistry Centre, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, 3004-535 Coimbra, Portugal
- Sandra C C Nunes
  - Coimbra Chemistry Centre, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, 3004-535 Coimbra, Portugal
|
36
|
Bathelt J, Johnson A, Zhang M, Astle DE. The cingulum as a marker of individual differences in neurocognitive development. Sci Rep 2019; 9:2281. [PMID: 30783161 PMCID: PMC6381161 DOI: 10.1038/s41598-019-38894-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 05/15/2018] [Accepted: 01/11/2019] [Indexed: 01/21/2023] Open
Abstract
The canonical approach to exploring brain-behaviour relationships is to group individuals according to a phenotype of interest, and then explore the neural correlates of this grouping. A limitation of this approach is that multiple aetiological pathways could result in a similar phenotype, so the role of any one brain mechanism may be substantially underestimated. Building on advances in network analysis, we used a data-driven community-clustering algorithm to identify robust subgroups based on white-matter microstructure in childhood and adolescence (total N = 313, mean age: 11.24 years). The algorithm indicated the presence of two equal-size groups that show a critical difference in fractional anisotropy (FA) of the left and right cingulum. Applying the brain-based grouping in independent samples, we find that these different 'brain types' had profoundly different cognitive abilities with higher performance in the higher FA group. Further, a connectomics analysis indicated reduced structural connectivity in the low FA subgroup that was strongly related to reduced functional activation of the default mode network. These results provide a proof-of-concept that bottom-up brain-based groupings can be identified that relate to cognitive performance. This provides a first demonstration of a complementary approach for investigating individual differences in brain structure and function, particularly for neurodevelopmental disorders where researchers are often faced with phenotypes that are difficult to define at the cognitive or behavioural level.
Affiliation(s)
- Joe Bathelt
  - MRC Cognition & Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Amy Johnson
  - MRC Cognition & Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Mengya Zhang
  - MRC Cognition & Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Duncan E Astle
  - MRC Cognition & Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
|
37
|
Luecken MD, Page MJT, Crosby AJ, Mason S, Reinert G, Deane CM. CommWalker: correctly evaluating modules in molecular networks in light of annotation bias. Bioinformatics 2019; 34:994-1000. [PMID: 29112702 PMCID: PMC5860269 DOI: 10.1093/bioinformatics/btx706] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/14/2016] [Accepted: 11/02/2017] [Indexed: 11/24/2022] Open
Abstract
MOTIVATION Detecting novel functional modules in molecular networks is an important step in biological research. In the absence of gold standard functional modules, functional annotations are often used to verify whether detected modules/communities have biological meaning. However, as we show, the uneven distribution of functional annotations means that such evaluation methods favor communities of well-studied proteins. RESULTS We propose a novel framework for the evaluation of communities as functional modules. Our proposed framework, CommWalker, takes communities as inputs and evaluates them in their local network environment by performing short random walks. We test CommWalker’s ability to overcome annotation bias using input communities from four community detection methods on two protein interaction networks. We find that modules accepted by CommWalker are similarly co-expressed as those accepted by current methods. Crucially, CommWalker performs well not only in well-annotated regions, but also in regions otherwise obscured by poor annotation. CommWalker community prioritization both faithfully captures well-validated communities and identifies functional modules that may correspond to more novel biology. AVAILABILITY AND IMPLEMENTATION The CommWalker algorithm is freely available at opig.stats.ox.ac.uk/resources or as a docker image on the Docker Hub at hub.docker.com/r/lueckenmd/commwalker/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- M D Luecken
  - Department of Statistics, University of Oxford, Oxford, UK
  - Doctoral Training Centre, University of Oxford, Oxford, UK
- M J T Page
  - Department of Informatics, UCB Pharma, Slough, UK
- A J Crosby
  - Immunology Therapeutic Area, UCB Pharma, Slough, UK
- S Mason
  - Immunology Therapeutic Area, UCB Pharma, Slough, UK
- G Reinert
  - Department of Statistics, University of Oxford, Oxford, UK
- C M Deane
  - Department of Statistics, University of Oxford, Oxford, UK
  - Doctoral Training Centre, University of Oxford, Oxford, UK
  - To whom correspondence should be addressed.
|
38
|
Abstract
BACKGROUND Inflammation is a core element of many different, systemic and chronic diseases that usually involve an important autoimmune component. The clinical phase of inflammatory diseases is often the culmination of a long series of pathologic events that started years before. The systemic characteristics and related mechanisms could be investigated through the multi-omic comparative analysis of many inflammatory diseases. Therefore, it is important to use molecular data to study the genesis of the diseases. Here we propose a new methodology to study the relationships between inflammatory diseases and signalling molecules whose dysregulation at molecular levels could lead to systemic pathological events observed in inflammatory diseases. RESULTS We first perform an exploratory analysis of gene expression data of a number of diseases that involve a strong inflammatory component. The comparison of gene expression between disease and healthy samples reveals the importance of members of gene families coding for signalling factors. Next, we focus on signalling gene families of interest and a subset of inflammation related diseases with multi-omic features including both gene expression and DNA methylation. We introduce a phylogenetic-based multi-omic method to study the relationships between multi-omic features of inflammation related diseases by integrating gene expression and DNA methylation through sequence based phylogeny of the signalling gene families. The models of adaptations between gene expression and DNA methylation can be inferred from the pre-estimated evolutionary relationships of a gene family. Members of the gene family whose expression or methylation levels significantly deviate from the model are considered as the potential disease associated genes. CONCLUSIONS Applying the methodology to four gene families (the chemokine receptor family, the TNF receptor family, the TGF-β gene family, the IL-17 gene family) in nine inflammation related diseases, we identify disease associated genes which exhibit significant dysregulation in gene expression or DNA methylation in the inflammation related diseases, which provides clues for functional associations between the diseases.
Affiliation(s)
- Hui Xiao
  - Computer Laboratory, University of Cambridge, Cambridge, UK
- Krzysztof Bartoszek
  - Department of Computer and Information Science, Linköping University, Linköping, Sweden
- Pietro Lio’
  - Computer Laboratory, University of Cambridge, Cambridge, UK
|
39
|
Alcalá-Corona SA, Espinal-Enríquez J, de Anda-Jáuregui G, Hernández-Lemus E. The Hierarchical Modular Structure of HER2+ Breast Cancer Network. Front Physiol 2018; 9:1423. [PMID: 30364267 PMCID: PMC6193406 DOI: 10.3389/fphys.2018.01423] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 03/22/2018] [Accepted: 09/19/2018] [Indexed: 11/13/2022] Open
Abstract
HER2-enriched breast cancer is a complex disease characterized by the overexpression of the ERBB2 amplicon. While the effects of this genomic aberration on the pathology have been studied, genome-wide deregulation patterns in this subtype of cancer are also observed. A novel approach to the study of this malignant neoplasy is the use of transcriptional networks. These networks generally exhibit modular structures, which in turn may be associated to biological processes. This modular regulation of biological functions may also exhibit a hierarchical structure, with deeper levels of modular organization accounting for more specific functional regulation. In this work, we identified the most probable (maximum likelihood) model of the hierarchical modular structure of the HER2-enriched transcriptional network as reconstructed from gene expression data, and analyzed the statistical associations of modules and submodules to biological functions. We found modular structures, independent from direct ERBB2 amplicon regulation, involved in different biological functions such as signaling, immunity, and cellular morphology. Higher resolution submodules were identified in more specific functions, such as micro-RNA regulation and the activation of viral-like immune response. We propose the approach presented here as one that may help to unveil mechanisms involved in the development of the pathology.
Affiliation(s)
- Sergio Antonio Alcalá-Corona
- Computational Genomics, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de Mexico, Ciudad de Mexico, Mexico
| | - Jesús Espinal-Enríquez
- Computational Genomics, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de Mexico, Ciudad de Mexico, Mexico
| | | | - Enrique Hernández-Lemus
- Computational Genomics, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de Mexico, Ciudad de Mexico, Mexico
| |
|
40
|
Using multiplex networks to capture the multidimensional nature of social structure. Primates 2018; 60:277-295. [DOI: 10.1007/s10329-018-0686-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 04/21/2018] [Accepted: 09/03/2018] [Indexed: 01/02/2023]
|
41
|
Ozturk K, Dow M, Carlin DE, Bejar R, Carter H. The Emerging Potential for Network Analysis to Inform Precision Cancer Medicine. J Mol Biol 2018; 430:2875-2899. [PMID: 29908887 PMCID: PMC6097914 DOI: 10.1016/j.jmb.2018.06.016] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 03/12/2018] [Revised: 05/30/2018] [Accepted: 06/06/2018] [Indexed: 12/19/2022]
Abstract
Precision cancer medicine promises to tailor clinical decisions to patients using genomic information. Indeed, successes of drugs targeting genetic alterations in tumors, such as imatinib that targets BCR-ABL in chronic myelogenous leukemia, have demonstrated the power of this approach. However, biological systems are complex, and patients may differ not only by the specific genetic alterations in their tumor, but also by more subtle interactions among such alterations. Systems biology and more specifically, network analysis, provides a framework for advancing precision medicine beyond clinical actionability of individual mutations. Here we discuss applications of network analysis to study tumor biology, early methods for N-of-1 tumor genome analysis, and the path for such tools to the clinic.
Affiliation(s)
- Kivilcim Ozturk
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
| | - Michelle Dow
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
| | - Daniel E Carlin
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
| | - Rafael Bejar
- Moores Cancer Center, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
| | - Hannah Carter
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center and Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA; CIFAR, MaRS Centre, West Tower, 661 University Ave., Suite 505, Toronto, ON M5G 1M1, Canada.
| |
|
42
|
Xu M, Zhao Z, Zhang X, Gao A, Wu S, Wang J. Synstable Fusion: A Network-Based Algorithm for Estimating Driver Genes in Fusion Structures. Molecules 2018; 23:2055. [PMID: 30115851 PMCID: PMC6222865 DOI: 10.3390/molecules23082055] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/25/2018] [Revised: 08/02/2018] [Accepted: 08/07/2018] [Indexed: 12/22/2022] Open
Abstract
Gene fusion structures are a class of common somatic mutational events in cancer genomes and are often formed by chromosomal mutations. Identifying the driver gene(s) in a fusion structure is important for many downstream analyses and contributes to clinical practice. Existing computational approaches have prioritized the importance of oncogenes by incorporating prior knowledge from gene networks. However, different methods sometimes suffer from different weaknesses when handling gene fusion data due to multiple issues such as fusion gene representation, network integration, and the effectiveness of the evaluation algorithms. In this paper, Synstable Fusion (SYN), an algorithm for computationally evaluating the fusion genes, is proposed. This algorithm uses a network-based strategy by incorporating gene networks as prior information, but estimates the driver genes according to the destructiveness hypothesis. This hypothesis balances the two popular evaluation strategies in the existing studies, thereby providing more comprehensive results. A machine learning framework is introduced to integrate multiple networks and further resolve the conflicting results from different networks. In addition, a synchronous stability model is established to reduce the computational complexity of the evaluation algorithm. To evaluate the proposed algorithm, we conduct a series of experiments on both artificial and real datasets. The results demonstrate that the proposed algorithm performs well on different configurations and is robust when altering the internal parameter settings.
Affiliation(s)
- Mingzhe Xu
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
- Department of Automation, College of Intelligent Manufacturing and Automation, Henan University of Animal Husbandry and Economy, Zhengzhou 450011, China.
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
| | - Zhongmeng Zhao
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
| | - Xuanping Zhang
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
| | - Aiqing Gao
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
| | - Shuyan Wu
- Department of Network Technology, College of Intelligent Manufacturing and Automation, Henan University of Animal Husbandry and Economy, Zhengzhou 450011, China.
| | - Jiayin Wang
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
| |
|
43
|
ClueNet: Clustering a temporal network based on topological similarity rather than denseness. PLoS One 2018; 13:e0195993. [PMID: 29738568 PMCID: PMC5940177 DOI: 10.1371/journal.pone.0195993] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/04/2017] [Accepted: 04/04/2018] [Indexed: 11/19/2022] Open
Abstract
Network clustering is a very popular topic in the network science field. Its goal is to divide (partition) the network into groups (clusters or communities) of "topologically related" nodes, where the resulting topology-based clusters are expected to "correlate" well with node label information, i.e., metadata, such as cellular functions of genes/proteins in biological networks, or age or gender of people in social networks. Even for static data, the problem of network clustering is complex. For dynamic data, the problem is even more complex, due to an additional dimension of the data: their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches need to be sought. Existing approaches for dynamic network clustering (DNC) have drawbacks. First, they assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster nodes that are topologically similar to each other instead of or in addition to requiring the nodes to be densely interconnected. Second, they ignore temporal information in their early steps, and when they do consider this information later on, they do so implicitly. We hypothesize that capturing temporal information earlier in the clustering process and doing so explicitly will improve results. We test these two hypotheses via our new approach called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior in over 83% of all evaluation tests. As more real-world dynamic data are becoming available, DNC and thus ClueNet will only continue to gain importance.
|
44
|
Ma X, Sun P, Zhang ZY. An Integrative Framework for Protein Interaction Network and Methylation Data to Discover Epigenetic Modules. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 16:1855-1866. [PMID: 29994031 DOI: 10.1109/tcbb.2018.2831666] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 02/02/2023]
Abstract
DNA methylation is a critical epigenetic modification that plays an important role in cancers. The available algorithms fail to fully characterize epigenetic modules. To address this issue, we first characterize the epigenetic module as a group of genes that are well connected in the protein interaction network and also co-methylated according to gene methylation profiles. Then, the epigenetic module discovery problem is transformed into an optimization problem. Next, a regularized nonnegative matrix factorization algorithm for methylation modules (RNMF-MM) is presented, where the co-methylation constraint is treated as a regularizer. Using artificial networks with known module structure, we demonstrate that the proposed algorithm outperforms state-of-the-art approaches in terms of accuracy. On the basis of breast cancer methylation data and a protein interaction network, the RNMF-MM algorithm discovers methylation modules that are significantly more enriched for known pathways than those obtained by other algorithms. These modules serve as biomarkers for predicting cancer stages and estimating survival time of patients. The proposed model and algorithm provide an effective way for the integrative analysis of protein interaction network and methylation data.
|
45
|
Abstract
BACKGROUND Omics profiling is now a routine component of biomedical studies. In the analysis of omics data, clustering is an essential step and serves multiple purposes including, for example, revealing the unknown functionalities of omics units, assisting dimension reduction in outcome model building, and others. In the most recent omics studies, a prominent trend is to conduct multilayer profiling, which collects multiple types of genetic, genomic, epigenetic and other measurements on the same subjects. In the literature, clustering methods tailored to multilayer omics data are still limited. Directly applying the existing clustering methods to multilayer omics data and clustering each layer first and then combining across layers are both "suboptimal" in that they do not accommodate the interconnections within layers and across layers in an informative way. METHODS In this study, we develop the MuNCut (Multilayer NCut) clustering approach. It is tailored to multilayer omics data and sufficiently accounts for both across- and within-layer connections. It is based on the novel NCut technique and also takes advantage of regularized sparse estimation. It has an intuitive formulation and is computationally very feasible. To facilitate implementation, we develop the function muncut in the R package NcutYX. RESULTS Under a wide spectrum of simulation settings, it outperforms competitors. The analysis of TCGA (The Cancer Genome Atlas) data on breast cancer and cervical cancer shows that MuNCut generates biologically meaningful results which differ from those using the alternatives. CONCLUSIONS We propose a more effective clustering analysis of multiple omics data. It provides a new avenue for jointly analyzing genetic, genomic, epigenetic and other measurements.
|
46
|
Cava C, Bertoli G, Colaprico A, Olsen C, Bontempi G, Castiglioni I. Integration of multiple networks and pathways identifies cancer driver genes in pan-cancer analysis. BMC Genomics 2018; 19:25. [PMID: 29304754 PMCID: PMC5756345 DOI: 10.1186/s12864-017-4423-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 05/18/2017] [Accepted: 12/27/2017] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Modern high-throughput genomic technologies represent a comprehensive hallmark of molecular changes in pan-cancer studies. Although different cancer gene signatures have been revealed, the mechanism of tumourigenesis has yet to be completely understood. Pathways and networks are important tools to explain the role of genes in functional genomic studies. However, few methods consider the functional non-equal roles of genes in pathways and the complex gene-gene interactions in a network. RESULTS We present a novel method in pan-cancer analysis that identifies de-regulated genes with a functional role by integrating pathway and network data. A pan-cancer analysis of 7158 tumour/normal samples from 16 cancer types identified 895 genes with a central role in pathways and de-regulated in cancer. Comparing our approach with 15 current tools that identify cancer driver genes, we found that 35.6% of the 895 genes identified by our method have been found as cancer driver genes with at least 2/15 tools. Finally, we applied a machine learning algorithm on 16 independent GEO cancer datasets to validate the diagnostic role of cancer driver genes for each cancer. We obtained a list of the top-ten cancer driver genes for each cancer considered in this study. CONCLUSIONS Our analysis 1) confirmed that there are several known cancer driver genes in common among different types of cancer, 2) highlighted that cancer driver genes are able to regulate crucial pathways.
Affiliation(s)
- Claudia Cava
- Institute of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Via F.Cervi 93, 20090 Milan, Segrate-Milan Italy
| | - Gloria Bertoli
- Institute of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Via F.Cervi 93, 20090 Milan, Segrate-Milan Italy
| | - Antonio Colaprico
- Interuniversity Institute of Bioinformatics in Brussels (IB)2, 1050 Brussels, Belgium
- Machine Learning Group (MLG), Department d’Informatique, Universite libre de Bruxelles (ULB), 1050 Brussels, Belgium
| | - Catharina Olsen
- Interuniversity Institute of Bioinformatics in Brussels (IB)2, 1050 Brussels, Belgium
- Machine Learning Group (MLG), Department d’Informatique, Universite libre de Bruxelles (ULB), 1050 Brussels, Belgium
| | - Gianluca Bontempi
- Interuniversity Institute of Bioinformatics in Brussels (IB)2, 1050 Brussels, Belgium
- Machine Learning Group (MLG), Department d’Informatique, Universite libre de Bruxelles (ULB), 1050 Brussels, Belgium
| | - Isabella Castiglioni
- Institute of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Via F.Cervi 93, 20090 Milan, Segrate-Milan Italy
| |
|
47
|
Abstract
Most biological mechanisms involve more than one type of biomolecule, and hence do not operate solely at the level of the genome, transcriptome, proteome, metabolome or ionome. Datasets resulting from single-omic analysis are rapidly increasing in throughput and quality, rendering multi-omic studies feasible. These should offer a comprehensive, structured and interactive overview of a biological mechanism. However, combining single-omic datasets in a meaningful manner has so far proved challenging, and the discovery of new biological information lags behind expectation. One reason is that experiments conducted in different laboratories typically cannot be combined without restriction. Second, the interpretation of multi-omic datasets represents a significant challenge by nature, as the biological datasets are heterogeneous not only for technical, but also for biological, chemical, and physical reasons. Here, multi-layer network theory and methods of artificial intelligence might contribute to solving these problems. For the efficient application of machine learning, however, biological datasets need to become more systematic, more precise, and much larger. We conclude our review with basic guidelines for the successful set-up of a multi-omic experiment.
|
48
|
Alcalá-Corona SA, de Anda-Jáuregui G, Espinal-Enríquez J, Hernández-Lemus E. Network Modularity in Breast Cancer Molecular Subtypes. Front Physiol 2017; 8:915. [PMID: 29204123 PMCID: PMC5699328 DOI: 10.3389/fphys.2017.00915] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 06/30/2017] [Accepted: 10/30/2017] [Indexed: 01/20/2023] Open
Abstract
Breast cancer is a heterogeneous and complex disease; a clear manifestation of this is its classification into different molecular subtypes. On the other hand, gene transcriptional networks may exhibit different modular structures that can be related to known biological processes. Thus, modular structures in transcriptional networks may be seen as manifestations of regulatory structures that tightly control biological processes. In this work, we identify modular structures in gene transcriptional networks previously inferred from microarray data of molecular subtypes of breast cancer: luminal A, luminal B, basal, and HER2-enriched. We analyzed the modules (communities) found in each network to identify particular biological functions (described in the Gene Ontology database) associated with them. We further explored these modules and their associated functions to identify common and unique features that could allow a better level of description of breast cancer, particularly in the basal-like subtype, the most aggressive manifestation and the one with the poorest prognosis. Our findings related to the immune system and a decrease in cell death-related processes in the basal subtype could help to understand it and design strategies for its treatment.
Affiliation(s)
- Sergio Antonio Alcalá-Corona
- Computational Genomics, National Institute of Genomic Medicine, México City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, México City, Mexico
| | - Guillermo de Anda-Jáuregui
- Computational Genomics, National Institute of Genomic Medicine, México City, Mexico; School of Medicine and Health Sciences, University of North Dakota, Grand Forks, ND, United States
| | - Jesús Espinal-Enríquez
- Computational Genomics, National Institute of Genomic Medicine, México City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, México City, Mexico
| | - Enrique Hernández-Lemus
- Computational Genomics, National Institute of Genomic Medicine, México City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, México City, Mexico
| |
|
49
|
Wang D, Wang H, Zou X. Identifying key nodes in multilayer networks based on tensor decomposition. Chaos (Woodbury, N.Y.) 2017; 27:063108. [PMID: 28679235 DOI: 10.1063/1.4985185] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 06/07/2023]
Abstract
The identification of essential agents in multilayer networks characterized by different types of interactions is a crucial and challenging topic, one that is essential for understanding the topological structure and dynamic processes of multilayer networks. In this paper, we use a fourth-order tensor to represent multilayer networks and propose a novel method to identify essential nodes based on CANDECOMP/PARAFAC (CP) tensor decomposition, referred to as the EDCPTD centrality. This method is based on the perspective of multilayer networked structures, which integrate the information of edges among nodes and links between different layers to quantify the importance of nodes in multilayer networks. Three real-world multilayer biological networks are used to evaluate the performance of the EDCPTD centrality. The bar chart and ROC curves of these multilayer networks indicate that the proposed approach is a good alternative index to identify truly important nodes. Meanwhile, by comparing the behavior of both the proposed method and the aggregated single-layer methods, we demonstrate that neglecting the multiple relationships between nodes may lead to incorrect identification of the most versatile nodes. Furthermore, the Gene Ontology functional annotation demonstrates that the identified top nodes based on the proposed approach play a significant role in many vital biological processes. Finally, we have implemented many centrality methods of multilayer networks (including our method and the published methods) and created a visual software tool based on the MATLAB GUI, called ENMNFinder, which can be used by other researchers.
Affiliation(s)
- Dingjie Wang
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
| | - Haitao Wang
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
| | - Xiufen Zou
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
| |
|
50
|
|