1. Choi Y, Cha J, Choi S. Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES). BMC Bioinformatics 2024; 25:56. [PMID: 38308205 PMCID: PMC10837879 DOI: 10.1186/s12859-024-05677-x] [Received: 06/09/2022] [Accepted: 01/26/2024] [Indexed: 02/04/2024]
Abstract
BACKGROUND: Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performance of several such methods for predicting asthma using the Korean Chip (KORV1.1) from the Korean Genome and Epidemiology Study (KoGES). RESULTS: First, single-nucleotide polymorphisms were selected via single-variant tests using logistic regression adjusted for several epidemiological factors. Next, we evaluated the following methods for disease prediction: ridge, least absolute shrinkage and selection operator (LASSO), elastic net, smoothly clipped absolute deviation (SCAD), support vector machine, random forest, boosting, bagging, naïve Bayes, and k-nearest neighbor. Finally, we compared their predictive performance based on the area under the receiver operating characteristic curve, precision, recall, F1-score, Cohen's kappa, balanced accuracy, error rate, Matthews correlation coefficient, and area under the precision-recall curve. Additionally, three oversampling algorithms were used to deal with class imbalance. CONCLUSIONS: Our results show that penalized methods exhibit better predictive performance for asthma than machine learning methods. In the oversampling study, however, random forest and boosting methods showed better overall prediction performance than penalized methods.
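As an illustration of the threshold-based metrics named in this abstract, the following minimal Python sketch computes them from a binary confusion matrix. It is not the authors' code: the function name `binary_metrics` and the toy labels are assumptions made for this example, and the two area-under-curve metrics are omitted because they require continuous scores rather than hard predictions.

```python
import math

def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
        "balanced_accuracy": (recall + specificity) / 2,
        "error_rate": 1 - accuracy,
        "mcc": mcc,
    }

# Toy, imbalanced example (hypothetical data): 4 cases, 6 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

Reporting balanced accuracy, kappa, and the Matthews correlation coefficient alongside plain accuracy is a common choice when, as in the oversampling part of the study, cases are much rarer than controls.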
Affiliation(s)
- Yongjun Choi
- Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea
- Junho Cha
- Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea
- Sungkyoung Choi
- Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea.
- Department of Mathematical Data Science, College of Science and Convergence Technology, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea.
2. Balachandran S, Prada-Medina CA, Mensah MA, Kakar N, Nagel I, Pozojevic J, Audain E, Hitz MP, Kircher M, Sreenivasan VKA, Spielmann M. STIGMA: Single-cell tissue-specific gene prioritization using machine learning. Am J Hum Genet 2024; 111:338-349. [PMID: 38228144 PMCID: PMC10870135 DOI: 10.1016/j.ajhg.2023.12.011] [Received: 08/04/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 01/18/2024]
Abstract
Clinical exome and genome sequencing have revolutionized the understanding of human disease genetics. Yet many genes remain functionally uncharacterized, complicating the establishment of causal disease links for genetic variants. While several scoring methods have been devised to prioritize these candidate genes, these methods fall short of capturing the expression heterogeneity across cell subpopulations within tissues. Here, we introduce single-cell tissue-specific gene prioritization using machine learning (STIGMA), an approach that leverages single-cell RNA-seq (scRNA-seq) data to prioritize candidate genes associated with rare congenital diseases. STIGMA prioritizes genes by learning the temporal dynamics of gene expression across cell types during healthy organogenesis. To assess the efficacy of our framework, we applied STIGMA to mouse limb and human fetal heart scRNA-seq datasets. In a cohort of individuals with congenital limb malformation, STIGMA prioritized 469 variants in 345 genes, with UBA2 as a notable example. For congenital heart defects, we detected 34 genes harboring nonsynonymous de novo variants (nsDNVs) in two or more individuals from a set of 7,958 individuals, including the ortholog of Prdm1, which is associated with hypoplastic left ventricle and hypoplastic aortic arch. Overall, our findings demonstrate that STIGMA effectively prioritizes tissue-specific candidate genes by utilizing single-cell transcriptome data. The ability to capture the heterogeneity of gene expression across cell populations makes STIGMA a powerful tool for the discovery of disease-associated genes and facilitates the identification of causal variants underlying human genetic disorders.
Affiliation(s)
- Saranya Balachandran
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Cesar A Prada-Medina
- Human Molecular Genetics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Martin A Mensah
- Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; BIH Charité Digital Clinician Scientist Program, BIH Biomedical Innovation Academy, Anna-Louisa-Karsch-Strasse 2, 10178 Berlin, Germany; RG Development & Disease, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Naseebullah Kakar
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany; Department of Biotechnology, BUITEMS, Quetta, Pakistan
- Inga Nagel
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Jelena Pozojevic
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Enrique Audain
- Institute of Medical Genetics, Carl von Ossietzky University, 26129 Oldenburg, Germany; DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck; Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital of Schleswig-Holstein, 24105 Kiel, Germany
- Marc-Phillip Hitz
- Institute of Medical Genetics, Carl von Ossietzky University, 26129 Oldenburg, Germany; DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck; Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital of Schleswig-Holstein, 24105 Kiel, Germany
- Martin Kircher
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Varun K A Sreenivasan
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany.
- Malte Spielmann
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany; Human Molecular Genetics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany; DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck.
3. Dmitrzak-Węglarz M, Rybakowski J, Szczepankiewicz A, Kapelski P, Lesicka M, Jabłońska E, Reszka E, Pawlak J. Identification of shared disease marker genes and underlying mechanisms between major depression and rheumatoid arthritis. J Psychiatr Res 2023; 168:22-29. [PMID: 37871462 DOI: 10.1016/j.jpsychires.2023.10.036] [Received: 05/16/2023] [Revised: 07/28/2023] [Accepted: 10/16/2023] [Indexed: 10/25/2023]
Abstract
Depression and rheumatoid arthritis (RA) have a very high comorbidity rate. The association is thought to be bidirectional, with each disease increasing the risk of the other, and the common denominator is the inflammation observed in both diseases. Previous studies have mainly focused on assessing the levels of inflammatory and pro-inflammatory cytokines in peripheral blood. We aimed to extend insights into the molecular mechanisms of depression based on hub RA genes. To do so, we prioritized RA-related genes using in-silico tools. We then investigated whether RA-related genes show altered expression in patients with unipolar and bipolar depression without a concurrent RA diagnosis or any signs of active inflammation. In addition, we selected a homogeneous group of patients treated with lithium (Li), which has immunomodulatory properties. The study was performed on patients with bipolar depression (BD, n = 45; Li, n = 20), unipolar depression (UD, n = 27), and healthy controls (HC, n = 22) of both sexes. To identify differentially expressed genes (DEGs) in peripheral blood mononuclear cells (PBMCs), we used the SurePrint G3 Microarray and GeneSpring software. We selected a list of 180 hub genes whose altered expression we analyzed using the expression microarray results. In the entire study group, we identified altered expression of 93 of the 180 genes, including 35 down-regulated (OPRM1 gene with highest FC > 3) and 58 up-regulated (TLR4 gene with highest FC > 3). In UD patients, the most strongly up-regulated gene was TEK (FC > 3), and in BD patients it was CXCL8 (FC > 5). In lithium-treated patients, by contrast, the gene with the most reduced expression was TRPV1. The study proved that depression and RA are produced by a partially shared "inflammatory interactome" in which the opioid and angiogenesis pathways are important.
Affiliation(s)
- Janusz Rybakowski
- Department of Adult Psychiatry, Poznan University of Medical Sciences, Poland.
- Aleksandra Szczepankiewicz
- Laboratory of Molecular and Cell Biology, Department of Pediatric Pulmonology, Allergy and Clinical Immunology, Poznan University of Medical Sciences, Poland.
- Paweł Kapelski
- Department of Psychiatric Genetics, Poznan University of Medical Sciences, Poland.
- Monika Lesicka
- Department of Translational Research, Nofer Institute of Occupational Medicine, Lodz, Poland.
- Ewa Jabłońska
- Department of Translational Research, Nofer Institute of Occupational Medicine, Lodz, Poland.
- Edyta Reszka
- Department of Translational Research, Nofer Institute of Occupational Medicine, Lodz, Poland.
- Joanna Pawlak
- Department of Psychiatric Genetics, Poznan University of Medical Sciences, Poland.
4. Raimondi D, Chizari H, Verplaetse N, Löscher BS, Franke A, Moreau Y. Genome interpretation in a federated learning context allows the multi-center exome-based risk prediction of Crohn's disease patients. Sci Rep 2023; 13:19449. [PMID: 37945674 PMCID: PMC10636050 DOI: 10.1038/s41598-023-46887-2] [Received: 06/12/2023] [Accepted: 11/06/2023] [Indexed: 11/12/2023]
Abstract
High-throughput sequencing allowed the discovery of many disease variants, but it is now becoming clear that the abundance of genomics data has mostly just moved the bottleneck in Genetics and Precision Medicine from a data availability issue to a data interpretation issue. To solve this impasse, it would be beneficial to apply the latest Deep Learning (DL) methods to the Genome Interpretation (GI) problem, similarly to what AlphaFold did for Structural Biology. Unfortunately, DL requires large datasets to be viable, and aggregating genomics datasets poses several legal, ethical and infrastructural complications. Federated Learning (FL) is a Machine Learning (ML) paradigm designed to tackle these issues. It allows ML methods to be collaboratively trained and tested on collections of physically separate datasets, without requiring the actual centralization of sensitive data. FL could thus be key to enabling DL applications to GI on sufficiently large genomics data. We propose FedCrohn, an FL GI Neural Network model for exome-based Crohn's Disease risk prediction, providing a proof-of-concept that FL is a viable paradigm to build novel ML GI approaches. We benchmark it in several realistic scenarios, showing that FL can indeed provide performance similar to conventional ML on centralized data, and that collaborating in FL initiatives is likely beneficial for most of the medical centers participating in them.
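The idea of training on physically separate datasets without centralizing them can be sketched with the aggregation step of federated averaging, in which a server combines locally trained parameter vectors weighted by each center's sample count. The abstract does not state which aggregation rule FedCrohn uses, so federated averaging is an assumption here, and `federated_average` and the toy numbers are hypothetical names and values chosen for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-client parameter vectors into one global vector.

    client_weights: list of equal-length lists of floats, one per center,
                    holding that center's locally trained parameters.
    client_sizes:   number of training samples held by each center.
    Only parameters and counts are exchanged; raw data never leaves a center.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two hypothetical medical centers with 1 and 3 samples respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a full training loop, the server would send `global_weights` back to every center, each center would run a few local optimization steps on its own data, and the aggregation would repeat for several rounds.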
Affiliation(s)
- Britt-Sabina Löscher
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
- University Medical Center Schleswig-Holstein, Kiel, Germany
- Andre Franke
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
- University Medical Center Schleswig-Holstein, Kiel, Germany
- Yves Moreau
- ESAT-STADIUS, KU Leuven, 3001, Leuven, Belgium
5. Molotkov I, Artomov M. Detecting biased validation of predictive models in the positive-unlabeled setting: disease gene prioritization case study. Bioinformatics Advances 2023; 3:vbad128. [PMID: 37745001 PMCID: PMC10517638 DOI: 10.1093/bioadv/vbad128] [Received: 06/14/2023] [Revised: 08/13/2023] [Accepted: 09/12/2023] [Indexed: 09/26/2023]
Abstract
Motivation: Positive-unlabeled data consists of points with either positive or unknown labels. It is widespread in medical, genetic, and biological settings, creating a high demand for predictive positive-unlabeled models. The performance of such models is usually estimated using validation sets, assumed to be selected completely at random (SCAR) from known positive examples. For certain metrics, this assumption enables unbiased performance estimation when treating positive-unlabeled data as positive/negative. However, the SCAR assumption is often adopted without proper justification, simply for the sake of convenience. Results: We provide an algorithm that, under the weak assumption of a lower bound on the number of positive examples, can test for violation of the SCAR assumption. Applying it to the problem of gene prioritization for complex genetic traits, we illustrate that the SCAR assumption is often violated there, causing the inflation of performance estimates, which we refer to as validation bias. We estimate the potential impact of validation bias on performance estimation. Our analysis reveals that validation bias is widespread in gene prioritization data and can lead to significant overestimation of model performance. This finding elucidates the discrepancy between the reported good performance of models and their limited practical applications. Availability and implementation: Python code with examples of application of the validation bias detection algorithm is available at github.com/ArtomovLab/ValidationBias.
Affiliation(s)
- Ivan Molotkov
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH, United States
- Department of Pediatrics, The Ohio State University, Columbus, OH, United States
- ITMO University, Saint Petersburg, Russia
- Mykyta Artomov
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH, United States
- Department of Pediatrics, The Ohio State University, Columbus, OH, United States
6. Rahaie Z, Rabiee HR, Alinejad-Rokny H. DeepGenePrior: A deep learning model for prioritizing genes affected by copy number variants. PLoS Comput Biol 2023; 19:e1011249. [PMID: 37486921 PMCID: PMC10399873 DOI: 10.1371/journal.pcbi.1011249] [Received: 12/07/2022] [Revised: 08/03/2023] [Accepted: 06/06/2023] [Indexed: 07/26/2023]
Abstract
The genetic etiology of brain disorders is highly heterogeneous, characterized by abnormalities in the development of the central nervous system that lead to diminished physical or intellectual capabilities. The process of determining which gene drives disease, known as "gene prioritization," is not entirely understood. Genome-wide searches for gene-disease associations are still underdeveloped due to reliance on previous discoveries and evidence sources with false positive or negative relations. This paper introduces DeepGenePrior, a model based on deep neural networks that prioritizes candidate genes in genetic diseases. Using the well-studied Variational AutoEncoder (VAE), we developed a score to measure the impact of genes on target diseases. Unlike other methods that use prior data to select candidate genes, based on the "guilt by association" principle and auxiliary data sources like protein networks, our study exclusively employs copy number variants (CNVs) for gene prioritization. By analyzing CNVs from 74,811 individuals with autism, schizophrenia, and developmental delay, we identified genes that best distinguish cases from controls. Our findings indicate a 12% increase in fold enrichment in brain-expressed genes compared to previous studies and a 15% increase in genes associated with mouse nervous system phenotypes. Furthermore, we identified common deletions in ZDHHC8, DGCR5, and CATG00000022283 among the top genes related to all three disorders, suggesting a common etiology among these clinically distinct conditions. DeepGenePrior is publicly available online at http://git.dml.ir/z_rahaie/DGP to address obstacles in existing gene prioritization studies identifying candidate genes.
Affiliation(s)
- Zahra Rahaie
- BCB Group, DML, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Hamid R. Rabiee
- BCB Group, DML, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Hamid Alinejad-Rokny
- UNSW Biomedical Machine Learning Lab (BML), the Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, Australia
7. Licata L, Via A, Turina P, Babbi G, Benevenuta S, Carta C, Casadio R, Cicconardi A, Facchiano A, Fariselli P, Giordano D, Isidori F, Marabotti A, Martelli PL, Pascarella S, Pinelli M, Pippucci T, Russo R, Savojardo C, Scafuri B, Valeriani L, Capriotti E. Resources and tools for rare disease variant interpretation. Front Mol Biosci 2023; 10:1169109. [PMID: 37234922 PMCID: PMC10206239 DOI: 10.3389/fmolb.2023.1169109] [Received: 02/18/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023]
Abstract
Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, those affected face difficulties in receiving a clinical diagnosis and genetic characterization. The understanding of the molecular mechanisms of these diseases and the development of therapeutic treatments for patients are also challenging. However, the application of recent advancements in genome sequencing/analysis technologies and computer-aided tools for predicting phenotype-genotype associations can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. Such resources and tools can be utilized to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.
Affiliation(s)
- Luana Licata
- Department of Biology, University of Rome Tor Vergata, Roma, Italy
- Allegra Via
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Paola Turina
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Giulia Babbi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Claudio Carta
- National Centre for Rare Diseases, Istituto Superiore di Sanità, Roma, Italy
- Rita Casadio
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Andrea Cicconardi
- Department of Physics, University of Genova, Genova, Italy
- Istituto Italiano di Tecnologia (IIT), Genova, Italy
- Angelo Facchiano
- National Research Council, Institute of Food Science, Avellino, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
- Deborah Giordano
- National Research Council, Institute of Food Science, Avellino, Italy
- Federica Isidori
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Anna Marabotti
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Stefano Pascarella
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Michele Pinelli
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- Tommaso Pippucci
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Roberta Russo
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- CEINGE Biotecnologie Avanzate Franco Salvatore, Napoli, Italy
- Castrense Savojardo
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Bernardina Scafuri
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
8. Meng P, Wang G, Guo H, Jiang T. Identifying cancer driver genes using a two-stage random walk with restart on a gene interaction network. Comput Biol Med 2023; 158:106810. [PMID: 37011433 DOI: 10.1016/j.compbiomed.2023.106810] [Received: 11/19/2022] [Revised: 03/08/2023] [Accepted: 03/20/2023] [Indexed: 04/03/2023]
Abstract
Cancer development and progression are significantly influenced by cancer driver genes. Understanding cancer driver genes and their mechanisms of action is essential for developing effective cancer treatments. As a result, identifying driver genes is important for drug development, cancer diagnosis, and treatment. Here, we present an algorithm to discover driver genes based on a two-stage random walk with restart (RWR) and a modified method for calculating the transition probability matrix in the random walk algorithm. First, we performed the first stage of RWR on the whole gene interaction network, employing the new method for calculating the transition probability matrix, and extracted a subnetwork based on nodes that had a high correlation with the seed nodes. The subnetwork was then used in the second stage of RWR, in which its nodes were re-ranked. Our approach outperformed existing methods in identifying driver genes. We also compared the effects of three gene interaction networks, of using two rounds of random walk, and of the choice of seed nodes. In addition, we identified several potential driver genes, some of which are involved in driving cancer development. Overall, our method is efficient in various cancer types, significantly outperforms existing methods, and can identify possible driver genes.
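The two-stage procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the standard degree-normalized transition probabilities rather than the paper's modified matrix (which the abstract does not specify), selects the subnetwork by simply keeping the `top_k` highest-scoring nodes plus the seeds, and all function names, parameters, and the toy graph are assumptions.

```python
def rwr(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adj:   dict mapping each node to the set of its neighbours.
    seeds: set of seed nodes; the walker restarts uniformly among them.
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = list(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            if not adj[u]:
                continue  # isolated node: its walking mass is dropped
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p

def two_stage_rwr(adj, seeds, top_k, restart=0.5):
    """Stage 1: score the whole network. Stage 2: re-rank a subnetwork."""
    stage1 = rwr(adj, seeds, restart)
    keep = set(sorted(adj, key=stage1.get, reverse=True)[:top_k]) | set(seeds)
    # Induced subnetwork on the retained nodes.
    sub = {u: {v for v in adj[u] if v in keep} for u in keep}
    stage2 = rwr(sub, seeds, restart)
    return sorted(stage2, key=stage2.get, reverse=True)

# Toy interaction network (hypothetical gene labels).
network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
ranking = two_stage_rwr(network, {"A"}, top_k=3)
```

Restricting the second walk to the subnetwork means that scores are no longer diluted by distant parts of the graph, which is one plausible reading of why a re-ranking stage can help.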
9. Kumar T, Sethuraman R, Mitra S, Ravindran B, Narayanan M. MultiCens: Multilayer network centrality measures to uncover molecular mediators of tissue-tissue communication. PLoS Comput Biol 2023; 19:e1011022. [PMID: 37093889 PMCID: PMC10159362 DOI: 10.1371/journal.pcbi.1011022] [Received: 09/27/2022] [Revised: 05/04/2023] [Accepted: 03/12/2023] [Indexed: 04/25/2023]
Abstract
With the evolution of multicellularity, communication among cells in different tissues and organs became pivotal to life. The molecular basis of such communication has long been studied, but genome-wide screens for genes and other biomolecules mediating tissue-tissue signaling are lacking. To systematically identify inter-tissue mediators, we present a novel computational approach, MultiCens (Multilayer/Multi-tissue network Centrality measures). Unlike single-layer network methods, MultiCens can distinguish within- vs. across-layer connectivity to quantify the "influence" of any gene in a tissue on a query set of genes of interest in another tissue. MultiCens enjoys theoretical guarantees on convergence and decomposability, and performs well on synthetic benchmarks. On human multi-tissue datasets, MultiCens predicts known and novel genes linked to hormones. MultiCens further reveals shifts in gene network architecture among four brain regions in Alzheimer's disease. MultiCens-prioritized hypotheses from these two diverse applications, and potential future ones like "Multi-tissue-expanded Gene Ontology" analysis, can enable whole-body yet molecular-level systems investigations in humans.
Affiliation(s)
- Tarun Kumar
- Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Madras, Chennai, India
- The Centre for Integrative Biology and Systems medicinE (IBSE), IIT Madras, Chennai, India
- Robert Bosch Center for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
- Sanga Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Madras, Chennai, India
- Balaraman Ravindran
- Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Madras, Chennai, India
- The Centre for Integrative Biology and Systems medicinE (IBSE), IIT Madras, Chennai, India
- Robert Bosch Center for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
- Manikandan Narayanan
- Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Madras, Chennai, India
- The Centre for Integrative Biology and Systems medicinE (IBSE), IIT Madras, Chennai, India
- Robert Bosch Center for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
- Multiscale Digital Neuroanatomy (MDN), IIT Madras, Chennai, India
10. Kumar R, Mahmoud MM, Tashkandi HM, Haque S, Harakeh S, Ponnusamy K, Haider S. Combinatorial Network of Transcriptional and miRNA Regulation in Colorectal Cancer. Int J Mol Sci 2023; 24:ijms24065356. [PMID: 36982429 PMCID: PMC10048903 DOI: 10.3390/ijms24065356] [Received: 01/30/2023] [Revised: 03/02/2023] [Accepted: 03/06/2023] [Indexed: 03/16/2023]
Abstract
Colorectal cancer is one of the leading causes of cancer-associated mortality worldwide. One of the major challenges in colorectal cancer is understanding the regulatory mechanisms of biological molecules. In this study, we aimed to identify novel key molecules in colorectal cancer by using a computational systems biology approach. We constructed the colorectal protein–protein interaction network, which followed a hierarchical scale-free nature. We identified TP53, CTNNB1, AKT1, EGFR, HRAS, JUN, RHOA, and EGF as bottleneck-hubs. HRAS showed the largest interacting strength with functional subnetworks, having a strong correlation with protein phosphorylation, kinase activity, signal transduction, and apoptotic processes. Furthermore, we constructed the bottleneck-hubs’ regulatory networks with their transcriptional (transcription factor) and post-transcriptional (miRNA) regulators, which revealed the important key regulators. We observed that miR-429, miR-622, and miR-133b and the transcription factors EZH2, HDAC1, HDAC4, AR, NFKB1, and KLF4 regulate four bottleneck-hubs (TP53, JUN, AKT1, and EGFR) at the motif level. In the future, biochemical investigation of the observed key regulators could provide further understanding of their role in the pathophysiology of colorectal cancer.
Affiliation(s)
- Rupesh Kumar
- Department of Biotechnology, Jaypee Institute of Information Technology, A-10, Sector 62, Noida 201309, India
- Maged Mostafa Mahmoud
- King Fahd Medical Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Molecular Genetics and Enzymology Department, Human Genetics and Genome Research Institute, National Research Centre, Cairo 12622, Egypt
- Hanaa M. Tashkandi
- Department of General Surgery, Faculty of Medicine, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Shafiul Haque
- Research and Scientific Studies Unit, College of Nursing and Allied Health Sciences, Jazan University, Jazan 45142, Saudi Arabia
- Gilbert and Rose-Marie Chagoury School of Medicine, Lebanese American University, Beirut 13-5053, Lebanon
- Centre of Medical and Bio-Allied Health Sciences Research, Ajman University, Ajman P.O. Box 346, United Arab Emirates
- Steve Harakeh
- King Fahd Medical Research Center, and Yousef Abdullatif Jameel Chair of Prophetic Medicine Application, Faculty of Medicine, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Kalaiarasan Ponnusamy
- Biotechnology Division, National Centre for Disease Control, New Delhi 110054, India
- Correspondence: (K.P.); (S.H.)
- Shazia Haider
- Department of Biotechnology, Jaypee Institute of Information Technology, A-10, Sector 62, Noida 201309, India
- Correspondence: (K.P.); (S.H.)
11. Chen X, Huang L. Computational model for disease research. Brief Bioinform 2023; 24:6987819. [PMID: 36642407 DOI: 10.1093/bib/bbac615] [Indexed: 01/17/2023]
Affiliation(s)
- Xing Chen
- Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou 221116, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Li Huang
- The Future Laboratory, Tsinghua University, Beijing 100084, China
|
12
|
Draetta EL, Lazarević D, Provero P, Cittaro D. The frequency of somatic mutations in cancer predicts the phenotypic relevance of germline mutations. Front Genet 2023; 13:1045301. [PMID: 36699457 PMCID: PMC9868957 DOI: 10.3389/fgene.2022.1045301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 12/28/2022] [Indexed: 01/12/2023] Open
Abstract
Genomic sequence mutations can be pathogenic in both germline and somatic cells. Several authors have observed that often the same genes are involved in cancer when mutated in somatic cells and in genetic diseases when mutated in the germline. Recent advances in high-throughput sequencing techniques have provided us with large databases of both types of mutations, allowing us to investigate this issue in a systematic way. Hence, we applied a machine learning based framework to this problem, comparing multiple models. The models achieved significant predictive power as shown by both cross-validation and their application to recently discovered gene/phenotype associations not used for training. We found that genes characterized by high frequency of somatic mutations in the most common cancers and ancient evolutionary age are most likely to be involved in abnormal phenotypes and diseases. These results suggest that the combination of tolerance for mutations at the cell viability level (measured by the frequency of somatic mutations in cancer) and functional relevance (demonstrated by evolutionary conservation) are the main predictors of disease genes. Our results thus confirm the deep relationship between pathogenic mutations in somatic and germline cells, provide new insight into the common origin of cancer and genetic diseases, and can be used to improve the identification of new disease genes.
Affiliation(s)
- Edoardo Luigi Draetta
- University of Milan, Milan, Italy; Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Dejan Lazarević
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Paolo Provero
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, Italy; Department of Neurosciences “Rita Levi Montalcini”, University of Turin, Turin, Italy
| | - Davide Cittaro
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, Italy; *Correspondence: Davide Cittaro
|
13
|
Okamoto J, Wang L, Yin X, Luca F, Pique-Regi R, Helms A, Im HK, Morrison J, Wen X. Probabilistic integration of transcriptome-wide association studies and colocalization analysis identifies key molecular pathways of complex traits. Am J Hum Genet 2023; 110:44-57. [PMID: 36608684 PMCID: PMC9892769 DOI: 10.1016/j.ajhg.2022.12.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 12/06/2022] [Indexed: 01/07/2023] Open
Abstract
Integrative genetic association methods have shown great promise in post-GWAS (genome-wide association study) analyses, in which one of the most challenging tasks is identifying putative causal genes and uncovering molecular mechanisms of complex traits. Recent studies suggest that prevailing computational approaches, including transcriptome-wide association studies (TWASs) and colocalization analysis, are individually imperfect, but their joint usage can yield robust and powerful inference results. This paper presents INTACT, a computational framework to integrate probabilistic evidence from these distinct types of analyses and implicate putative causal genes. This procedure is flexible and can work with a wide range of existing integrative analysis approaches. It has the unique ability to quantify the uncertainty of implicated genes, enabling rigorous control of false-positive discoveries. Taking advantage of this highly desirable feature, we further propose an efficient algorithm, INTACT-GSE, for gene set enrichment analysis based on the integrated probabilistic evidence. We examine the proposed computational methods and illustrate their improved performance over the existing approaches through simulation studies. We apply the proposed methods to analyze the multi-tissue eQTL data from the GTEx project and eight large-scale complex- and molecular-trait GWAS datasets from multiple consortia and the UK Biobank. Overall, we find that the proposed methods markedly improve the existing putative gene implication methods and are particularly advantageous in evaluating and identifying key gene sets and biological pathways underlying complex traits.
Affiliation(s)
- Jeffrey Okamoto
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
| | - Lijia Wang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Xianyong Yin
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Francesca Luca
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48201, USA
| | - Roger Pique-Regi
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48201, USA
| | - Adam Helms
- University of Michigan School of Medicine, Ann Arbor, MI 48109, USA
| | - Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Jean Morrison
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Xiaoquan Wen
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
|
14
|
Li MM, Huang K, Zitnik M. Graph representation learning in biomedicine and healthcare. Nat Biomed Eng 2022; 6:1353-1369. [PMID: 36316368 PMCID: PMC10699434 DOI: 10.1038/s41551-022-00942-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Received: 10/30/2021] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Networks, or graphs, are universal descriptors of systems of interacting elements. In biomedicine and healthcare, they can represent, for example, molecular interactions, signalling pathways, disease co-morbidities or healthcare systems. In this Perspective, we posit that representation learning can realize principles of network medicine, discuss successes and current limitations of the use of representation learning on graphs in biomedicine and healthcare, and outline algorithmic strategies that leverage the topology of graphs to embed them into compact vectorial spaces. We argue that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
Affiliation(s)
- Michelle M Li
- Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Kexin Huang
- Health Data Science Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Harvard Data Science Initiative, Cambridge, MA, USA.
|
15
|
Raimondi D, Orlando G, Verplaetse N, Fariselli P, Moreau Y. Editorial: Towards genome interpretation: Computational methods to model the genotype-phenotype relationship. Front Bioinform 2022; 2:1098941. [PMID: 36530385 PMCID: PMC9749061 DOI: 10.3389/fbinf.2022.1098941] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/15/2022] [Accepted: 11/17/2022] [Indexed: 11/12/2023] Open
Affiliation(s)
| | | | | | - Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
|
16
|
Nguyen T, Yue Z, Slominski R, Welner R, Zhang J, Chen JY. WINNER: A network biology tool for biomolecular characterization and prioritization. Front Big Data 2022; 5:1016606. [DOI: 10.3389/fdata.2022.1016606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 10/14/2022] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND AND CONTRIBUTION In network biology, molecular functions can be characterized by network-based inference, or “guilt-by-association.” PageRank-like tools have been applied in the study of biomolecular interaction networks to further obtain the relative significance of all molecules in the network. However, there is a great deal of inherent noise in widely accessible data sets for gene-to-gene associations or protein-protein interactions. How to develop robust tests to expand, filter, and rank molecular entities in disease-specific networks remains an ad hoc data analysis process. RESULTS We describe a new biomolecular characterization and prioritization tool called Weighted In-Network Node Expansion and Ranking (WINNER). It takes any molecular interaction network data as input and generates an optionally expanded network with all the nodes ranked according to their relevance to one another in the network. To help users assess the robustness of results, WINNER provides two different types of statistics. The first type is a node-expansion p-value, which helps evaluate the statistical significance of adding “non-seed” molecules to the original biomolecular interaction network consisting of “seed” molecules and molecular interactions. The second type is a node-ranking p-value, which helps evaluate the relative statistical significance of the contribution of each node to the overall network architecture. We validated the robustness of WINNER in ranking top molecules by spiking in noise in several network permutation experiments. We found that node degree-preserving randomization of the gene network produced normally distributed ranking scores, which outperform those made with other gene network randomization techniques. Furthermore, we validated that a larger proportion of the WINNER-ranked genes was associated with disease biology than with existing methods such as PageRank. We demonstrated the performance of WINNER with a few case studies, including Alzheimer's disease, breast cancer, myocardial infarction, and triple-negative breast cancer (TNBC). In all these case studies, the expanded and top-ranked genes identified by WINNER reveal disease biology more significantly than those identified by other gene prioritization software tools, including Ingenuity Pathway Analysis (IPA) and DiAMOND. CONCLUSION WINNER ranking strongly correlates with other ranking methods when the network covers sufficient node and edge information, indicating a high network quality. WINNER users can use this new tool to robustly evaluate a list of candidate genes, proteins, or metabolites produced from high-throughput biology experiments, as long as gene/protein/metabolic network information is available.
|
17
|
Buvall L, Menzies RI, Williams J, Woollard KJ, Kumar C, Granqvist AB, Fritsch M, Feliers D, Reznichenko A, Gianni D, Petrovski S, Bendtsen C, Bohlooly-Y M, Haefliger C, Danielson RF, Hansen PBL. Selecting the right therapeutic target for kidney disease. Front Pharmacol 2022; 13:971065. [DOI: 10.3389/fphar.2022.971065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 10/17/2022] [Indexed: 11/06/2022] Open
Abstract
Kidney disease is a complex disease with several different etiologies and underlying associated pathophysiology. This is reflected by the lack of effective treatment therapies in chronic kidney disease (CKD) that stop disease progression. However, novel strategies, recent scientific breakthroughs, and technological advances have revealed new possibilities for finding novel disease drivers in CKD. This review describes some of the latest advances in the field and brings them together in a more holistic framework as applied to identification and validation of disease drivers in CKD. It uses high-resolution ‘patient-centric’ omics data sets, advanced in silico tools (systems biology, connectivity mapping, and machine learning) and ‘state-of-the-art’ experimental systems (complex 3D systems in vitro, CRISPR gene editing, and various model biological systems in vivo). Application of such a framework is expected to increase the likelihood of successful identification of novel drug candidates based on strong human target validation and a better scientific understanding of underlying mechanisms.
|
18
|
Gnilopyat S, DePietro PJ, Parry TK, McLaughlin WA. The Pharmacorank Search Tool for the Retrieval of Prioritized Protein Drug Targets and Drug Repositioning Candidates According to Selected Diseases. Biomolecules 2022; 12:1559. [PMID: 36358909 PMCID: PMC9687941 DOI: 10.3390/biom12111559] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/27/2022] [Revised: 10/19/2022] [Accepted: 10/22/2022] [Indexed: 08/13/2023] Open
Abstract
We present the Pharmacorank search tool as an objective means to obtain prioritized protein drug targets and their associated medications according to user-selected diseases. This tool could be used to obtain prioritized protein targets for the creation of novel medications or to predict novel indications for medications that already exist. To prioritize the proteins associated with each disease, a gene similarity profiling method based on protein functions is implemented. The priority scores of the proteins are found to correlate well with the likelihoods that the associated medications are clinically relevant in the disease's treatment. When the protein priority scores are plotted against the percentage of protein targets that are known to bind medications currently indicated to treat the disease, which we termed the pertinency score, a strong correlation was observed. The correlation coefficient was found to be 0.9978 when using a weighted second-order polynomial fit. As the highly predictive fit was made using a broad range of diseases, we were able to identify a general threshold for the pertinency score as a starting point for considering drug repositioning candidates. Several repositioning candidates are described for proteins that have high predicted pertinency scores, and these provide illustrative examples of the applications of the tool. We also describe focused reviews of repositioning candidates for Alzheimer's disease. Via the tool's URL, https://protein.som.geisinger.edu/Pharmacorank/, an open online interface is provided for interactive use; and there is a site for programmatic access.
Affiliation(s)
| | | | | | - William A. McLaughlin
- Department of Medical Education, Geisinger Commonwealth School of Medicine, 525 Pine Street, Scranton, PA 18509, USA
|
19
|
Network-Based Methods for Approaching Human Pathologies from a Phenotypic Point of View. Genes (Basel) 2022; 13:genes13061081. [PMID: 35741843 PMCID: PMC9222217 DOI: 10.3390/genes13061081] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/13/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 01/27/2023] Open
Abstract
Network and systemic approaches to studying human pathologies are helping us to gain insight into the molecular mechanisms of and potential therapeutic interventions for human diseases, especially for complex diseases where large numbers of genes are involved. The complex human pathological landscape is traditionally partitioned into discrete “diseases”; however, that partition is sometimes problematic, as diseases are highly heterogeneous and can differ greatly from one patient to another. Moreover, for many pathological states, the set of symptoms (phenotypes) manifested by the patient is not enough to diagnose a particular disease. On the contrary, phenotypes, by definition, are directly observable and can be closer to the molecular basis of the pathology. These clinical phenotypes are also important for personalised medicine, as they can help stratify patients and design personalised interventions. For these reasons, network and systemic approaches to pathologies are gradually incorporating phenotypic information. This review covers the current landscape of phenotype-centred network approaches to study different aspects of human diseases.
|
20
|
Network assisted analysis of de novo variants using protein-protein interaction information identified 46 candidate genes for congenital heart disease. PLoS Genet 2022; 18:e1010252. [PMID: 35671298 PMCID: PMC9205499 DOI: 10.1371/journal.pgen.1010252] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/13/2021] [Revised: 06/17/2022] [Accepted: 05/12/2022] [Indexed: 11/19/2022] Open
Abstract
De novo variants (DNVs) with deleterious effects have proved informative in identifying risk genes for early-onset diseases such as congenital heart disease (CHD). A number of statistical methods have been proposed for family-based studies or case/control studies to identify risk genes by screening genes with more DNVs than expected by chance in Whole Exome Sequencing (WES) studies. However, the statistical power is still limited for cohorts with thousands of subjects. Under the hypothesis that connected genes in protein-protein interaction (PPI) networks are more likely to share similar disease association status, we developed a Markov Random Field model that can leverage information from publicly available PPI databases to increase power in identifying risk genes. We identified 46 candidate genes with at least 1 DNV in the CHD study cohort, including 18 known human CHD genes and 35 highly expressed genes in mouse developing heart. Our results may shed new insight on the shared protein functionality among risk genes for CHD. The topologic information in a pathway may be informative to identify functionally interrelated genes and help improve statistical power in DNV studies. Under the hypothesis that connected genes in PPI networks are more likely to share similar disease association status, we developed a novel statistical model that can leverage information from publicly available PPI databases. Through simulation studies under multiple settings, we proved our method can increase statistical power in identifying additional risk genes compared to methods without using the PPI network information. We then applied our method to a real example for CHD DNV data, and then visualized the subnetwork of candidate genes to find potential functional gene clusters for CHD.
|
21
|
Ji Y, Chen R, Wang Q, Wei Q, Tao R, Li B. A Bayesian framework to integrate multi-level genome-scale data for Autism risk gene prioritization. BMC Bioinformatics 2022; 23:146. [PMID: 35459094 PMCID: PMC9034518 DOI: 10.1186/s12859-022-04616-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2021] [Accepted: 02/15/2022] [Indexed: 12/03/2022] Open
Abstract
Background Autism spectrum disorder (ASD) is a group of complex neurodevelopment disorders with a strong genetic basis. Large scale sequencing studies have identified over one hundred ASD risk genes. Nevertheless, the vast majority of ASD risk genes remain to be discovered, as it is estimated that more than 1000 genes are likely to be involved in ASD risk. Prioritization of risk genes is an effective strategy to increase the power of identifying novel risk genes in genetics studies of ASD. As ASD risk genes are likely to exhibit distinct properties from multiple angles, we reason that integrating multiple levels of genomic data is a powerful approach to pinpoint genuine ASD risk genes. Results We present BNScore, a Bayesian model selection framework to probabilistically prioritize ASD risk genes through explicitly integrating evidence from sequencing-identified ASD genes, biological annotations, and gene functional network. We demonstrate the validity of our approach and its improved performance over existing methods by examining the resulting top candidate ASD risk genes against sets of high-confidence benchmark genes and large-scale ASD genome-wide association studies. We assess the tissue-, cell type- and development stage-specific expression properties of top prioritized genes, and find strong expression specificity in brain tissues, striatal medium spiny neurons, and fetal developmental stages. Conclusions In summary, we show that by integrating sequencing findings, functional annotation profiles, and gene-gene functional network, our proposed BNScore provides competitive performance compared to current state-of-the-art methods in prioritizing ASD genes. Our method offers a general and flexible strategy to risk gene prioritization that can potentially be applied to other complex traits as well. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04616-y.
Affiliation(s)
- Ying Ji
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
| | - Rui Chen
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
| | - Quan Wang
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
| | - Qiang Wei
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
| | - Ran Tao
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA; Department of Biostatistics, Vanderbilt University, Nashville, TN, 37212, USA
| | - Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
|
22
|
Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Identifying disease-related genes is an important issue in computational biology. Module structure widely exists in biomolecule networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in the networks, which can provide useful insights for the study of disease-related genes. However, mining and effectively utilizing the module structure is still challenging in tasks such as disease-gene prediction. RESULTS We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to more effectively predict disease-related genes. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for integration of gene rankings is designed in order to integrate multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. Through a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods, as well as the further performance improvement derived from the parameter estimation. CONCLUSIONS The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of the multiscale module structure and its application in tasks such as disease-gene prediction.
Affiliation(s)
- Ju Xiang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
| | - Xiangmao Meng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yichao Zhao
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
| | - Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
23
|
Vergara A, Haas JC, Aro T, Stachula P, Street NR, Hurry V. Norway spruce deploys tissue-specific responses during acclimation to cold. Plant Cell Environ 2022; 45:427-445. [PMID: 34873720 DOI: 10.1111/pce.14241] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/24/2021] [Revised: 11/12/2021] [Accepted: 11/23/2021] [Indexed: 06/13/2023]
Abstract
Climate change in the conifer-dominated boreal forest is expected to lead to warmer but more dynamic winter air temperatures, reducing the depth and duration of snow cover and lowering winter soil temperatures. To gain insight into the mechanisms that have enabled conifers to dominate extreme cold environments, we performed genome-wide RNA-Seq analysis from needles and roots of non-dormant two-year Norway spruce (Picea abies (L.) H. Karst), and contrasted these responses with those of the herbaceous model Arabidopsis. We show that the main transcriptional response of Norway spruce needles exposed to cold was delayed relative to Arabidopsis, and this delay was associated with slower development of freezing tolerance. Despite this difference in timing, Norway spruce principally utilizes early response transcription factors (TFs) belonging to the same gene families as Arabidopsis, indicating broad evolutionary conservation of cold response networks. In keeping with their different metabolic and developmental states, needles and roots of Norway spruce showed contrasting results. Regulatory network analysis identified both conserved TFs with known roles in cold acclimation (e.g. homologs of ICE1, AKS3, and of the NAC and AP2/ERF superfamilies) and a root-specific bHLH101 homolog, providing functional insights into cold stress response strategies in Norway spruce.
Affiliation(s)
- Alexander Vergara
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Sciences, Umeå, Sweden
| | - Julia C Haas
- Department of Plant Physiology, Umeå Plant Science Centre, Umeå University, Umeå, Sweden
| | - Tuuli Aro
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Sciences, Umeå, Sweden
| | - Paulina Stachula
- Department of Plant Physiology, Umeå Plant Science Centre, Umeå University, Umeå, Sweden
| | - Nathaniel R Street
- Department of Plant Physiology, Umeå Plant Science Centre, Umeå University, Umeå, Sweden
| | - Vaughan Hurry
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Sciences, Umeå, Sweden
|
24
|
Wang L, Shang M, Dai Q, He PA. Prediction of lncRNA-disease association based on a Laplace normalized random walk with restart algorithm on heterogeneous networks. BMC Bioinformatics 2022; 23:5. [PMID: 34983367 PMCID: PMC8729064 DOI: 10.1186/s12859-021-04538-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/08/2021] [Accepted: 12/15/2021] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Growing evidence shows that long non-coding RNAs (lncRNAs) play important roles in the development and progression of complex human diseases. Predicting human lncRNA-disease associations is therefore a challenging and urgent task in bioinformatics research on complex human diseases. RESULTS In this work, we propose LRWRHLDA, a global network-based computational framework. First, four isomorphic networks were constructed: an lncRNA similarity network, a disease similarity network, a gene similarity network and a miRNA similarity network. Then, six heterogeneous networks (the known lncRNA-disease, lncRNA-gene, lncRNA-miRNA, disease-gene, disease-miRNA and gene-miRNA association networks) were used to build a multi-layer network. Finally, the Laplace normalized random walk with restart algorithm is applied to this global network to predict the relationships between lncRNAs and diseases. CONCLUSIONS Ten-fold cross-validation is used to evaluate the performance of LRWRHLDA. LRWRHLDA achieves an AUC of 0.98402, which is higher than that of the other compared methods. Furthermore, LRWRHLDA can predict lncRNAs related to isolated diseases (and diseases related to isolated lncRNAs). The results for colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer have been verified by other studies. The case studies indicate that our method is effective.
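The random walk with restart step named in this abstract can be illustrated on a toy graph. The following is a minimal sketch only, not the LRWRHLDA implementation: it assumes a single undirected network given as a dense adjacency matrix, uses the symmetric normalization D^{-1/2} A D^{-1/2} as the "Laplace normalized" transition matrix, and the function names, the restart probability of 0.5 and the convergence tolerance are all hypothetical choices made for illustration.

```python
import numpy as np


def laplacian_normalize(adj):
    """Return D^{-1/2} A D^{-1/2}; isolated nodes keep all-zero rows and columns."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change falls below tol."""
    w = laplacian_normalize(np.asarray(adj, dtype=float))
    p0 = np.zeros(w.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # restart mass spread evenly over the seeds
    p = p0.copy()
    for _ in range(max_iter):
        nxt = (1.0 - restart) * w @ p + restart * p0
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p


# Toy example (hypothetical data): a 4-node path graph 0-1-2-3, seeded at node 0.
toy_adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
scores = random_walk_with_restart(toy_adj, seeds=[0])
ranking = np.argsort(-scores)  # nodes ordered from most to least related to the seed
```

With symmetric normalization the scores are relevance weights rather than probabilities, so they need not sum to one; only their ranking is used here.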
Affiliation(s)
- Liugen Wang
- School of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
| | - Min Shang
- School of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
| | - Qi Dai
- College of Life Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
| | - Ping-An He
- School of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China.
| |
Collapse
|
25
|
Milano M. Using Gene Ontology to Annotate and Prioritize Microarray Data. Methods Mol Biol 2022; 2401:273-287. [PMID: 34902135 DOI: 10.1007/978-1-0716-1839-4_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The results of high-throughput experiments consist of numerous candidate genes, proteins, or other molecules potentially associated with diseases. A challenge for omics science is extracting knowledge from these results and filtering promising gene or protein candidates. In particular, a hot topic in clinical scenarios is highlighting the behavior of a few molecules related to a specific disease. In this context, different computational approaches, also referred to as gene prioritization methods, aim to identify the genes most related to a disease among a larger set of candidate genes. The identification requires the use of domain-specific knowledge that is often encoded into ontologies.
Affiliation(s)
- Marianna Milano
  - Department of Medical and Surgical Sciences, University of Catanzaro, Catanzaro, Italy
|
26
|
Tziastoudi M, Cholevas C, Theoharides TC, Stefanidis I. Meta-Analysis and Bioinformatics Detection of Susceptibility Genes in Diabetic Nephropathy. Int J Mol Sci 2021; 23:ijms23010020. [PMID: 35008447 PMCID: PMC8744540 DOI: 10.3390/ijms23010020] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/14/2021] [Revised: 12/15/2021] [Accepted: 12/18/2021] [Indexed: 11/16/2022] Open
Abstract
The latest meta-analysis of genome-wide linkage studies (GWLS) identified nine cytogenetic locations suggestive of linkage with diabetic nephropathy (DN) due to type 1 diabetes mellitus (T1DM) and seven locations due to type 2 diabetes mellitus (T2DM). To gain biological insight into the functional role of the genes located in these regions and to prioritize the most significant genetic loci for further research, we conducted a gene ontology analysis with an over-representation test for the functional annotation of the protein-coding genes. Protein analysis through evolutionary relationships (PANTHER) version 16.0 software and Cytoscape with the relevant plugins were used for the gene ontology analysis and the over-representation test, and the STRING database was used for the construction of the protein network. The findings of the over-representation test highlight the contribution of immune-related molecules such as immunoglobulins, cytokines, and chemokines among the most over-represented protein classes, whereas the most enriched signaling pathways include the VEGF signaling pathway, the Cadherin pathway, the Wnt pathway, the angiogenesis pathway, the p38 MAPK pathway, and the EGF receptor signaling pathway. The results common to T1DM and T2DM include the significant over-representation of immune-related molecules and of the Cadherin and Wnt signaling pathways, which could constitute potential therapeutic targets for the treatment of DN, irrespective of the type of diabetes.
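As a rough illustration of what an over-representation test computes, the sketch below evaluates a one-sided hypergeometric tail probability. This is a generic textbook formulation and is only an assumption about the kind of test involved; it is not a reproduction of the statistics that PANTHER applies. The function name and all counts are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(population, annotated, sample, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: number of genes in the background set
    annotated:  background genes carrying the annotation of interest
    sample:     number of genes in the candidate list
    hits:       candidate genes carrying the annotation
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 200 of 20000 background genes carry the annotation,
# and 10 of 100 candidate genes do, against an expectation of about 1.
p_value = overrepresentation_pvalue(20000, 200, 100, 10)
```

A small p-value indicates that the annotation appears in the candidate list more often than expected by chance; in practice it would also need correction for the number of annotations tested.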
Affiliation(s)
- Maria Tziastoudi
  - Department of Nephrology, Faculty of Medicine, School of Health Sciences, University of Thessaly, 41500 Larisa, Greece
  - Correspondence: Tel.: +30-2413501667; Fax: +30-2413501015
- Christos Cholevas
  - First Department of Ophthalmology, Faculty of Health Sciences, Aristotle University of Thessaloniki School of Medicine, AHEPA Hospital, 54636 Thessaloniki, Greece
- Ioannis Stefanidis
  - Department of Nephrology, Faculty of Medicine, School of Health Sciences, University of Thessaly, 41500 Larisa, Greece
|
27
|
Neshan M, Malakouti SK, Kamalzadeh L, Makvand M, Campbell A, Ahangari G. Alterations in T-Cell Transcription Factors and Cytokine Gene Expression in Late-Onset Alzheimer's Disease. J Alzheimers Dis 2021; 85:645-665. [PMID: 34864659 DOI: 10.3233/jad-210480] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/30/2023]
Abstract
BACKGROUND Late-onset Alzheimer's disease (LOAD) is associated with many environmental and genetic factors. The effect of systemic inflammation on the pathogenesis of neurodegenerative diseases such as AD has been strongly suggested. T helper (Th) cells are an important component of the immune system and can easily infiltrate the brain in pathological conditions. The development of each Th subset depends on the production of unique cytokines and their main regulator. OBJECTIVE This study aimed to compare the mRNA levels of Th-related genes in peripheral blood mononuclear cells of LOAD patients with those of controls. In addition, computational approaches were used to identify the most important Th1/Th2 genes and downstream pathways that may be involved in the pathogenesis of AD. METHODS This study involved 30 patients with LOAD and 30 non-demented controls. The relative expression of T-cell cytokines (IFN-γ, TNF-α, IL-4, and IL-5) and transcription factors (T-bet and GATA-3) was assessed using real-time PCR. Additionally, protein-protein interaction (PPI) was investigated by gene network construction. RESULTS A significant decrease in T-bet, IFN-γ, TNF-α, and GATA-3 mRNA levels was detected in the LOAD group compared to the controls. However, there was no significant difference in IL-4 or IL-5 mRNA levels. Network analysis revealed a list of highly connected proteins (hubs) related to mitogen-activated protein kinase (MAPK) signaling and Th17 cell differentiation pathways. CONCLUSION The findings point to a molecular dysregulation in Th-related genes, which may be promising for the early diagnosis of AD or for targeted interventions. Furthermore, the PPI analysis showed that upstream off-target stimulation may involve MAPK cascade activation and Th17 axis induction.
Affiliation(s)
- Masoud Neshan
  - Department of Medical Genetics, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- Seyed Kazem Malakouti
  - Mental Health Research Center, Tehran Institute of Psychiatry-School of Behavioral Sciences and Mental Health, Iran University of Medical Sciences, Tehran, Iran
- Leila Kamalzadeh
  - Mental Health Research Center, Tehran Institute of Psychiatry-School of Behavioral Sciences and Mental Health, Iran University of Medical Sciences, Tehran, Iran
- Mina Makvand
  - Department of Medical Genetics, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- Arezoo Campbell
  - Department of Pharmaceutical Sciences, Western University of Health Sciences, Pomona, CA, USA
- Ghasem Ahangari
  - Department of Medical Genetics, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
|
28
|
Raimondi D, Corso M, Fariselli P, Moreau Y. From genotype to phenotype in Arabidopsis thaliana: in-silico genome interpretation predicts 288 phenotypes from sequencing data. Nucleic Acids Res 2021; 50:e16. [PMID: 34792168 PMCID: PMC8860592 DOI: 10.1093/nar/gkab1099] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/31/2021] [Revised: 10/06/2021] [Accepted: 10/22/2021] [Indexed: 01/09/2023] Open
Abstract
In many cases, the unprecedented availability of data provided by high-throughput sequencing has shifted the bottleneck from data availability to data interpretation. This has delayed the promised breakthroughs in genetics and precision medicine in human genetics and, in plant sciences, in phenotype prediction to improve plant adaptation to climate change and resistance to bioaggressors. In this paper, we propose a novel Genome Interpretation paradigm, which aims at directly modeling the genotype-to-phenotype relationship, and we focus on A. thaliana since it is the best-studied model organism in plant genetics. Our model, called Galiana, is the first end-to-end Neural Network (NN) approach following the genomes in/phenotypes out paradigm, and it is trained to predict 288 real-valued Arabidopsis thaliana phenotypes from whole-genome sequencing data. We show that 75 of these phenotypes are predicted with a Pearson correlation ≥0.4 and are mostly related to flowering traits. We show that our end-to-end NN approach achieves better performance and larger phenotype coverage than models predicting single phenotypes from known associated genes derived from GWAS. Galiana is also fully interpretable, thanks to gradient-based Saliency Maps approaches. We followed this interpretation approach to identify 36 novel genes that are likely to be associated with flowering traits, finding evidence for 6 of them in the existing literature.
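The evaluation criterion mentioned in this abstract, counting phenotypes predicted with a Pearson correlation of at least 0.4, can be sketched as follows. The helper names and the two toy phenotypes are hypothetical; only the threshold of 0.4 comes from the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def well_predicted(observed, predicted, threshold=0.4):
    """Names of phenotypes whose predictions reach the correlation threshold."""
    return [
        name
        for name in observed
        if pearson(observed[name], predicted[name]) >= threshold
    ]


# Hypothetical measurements for two phenotypes across four accessions.
observed = {"trait_a": [1.0, 2.0, 3.0, 4.0], "trait_b": [1.0, 2.0, 3.0, 4.0]}
predicted = {"trait_a": [1.1, 1.9, 3.2, 3.8], "trait_b": [4.0, 1.0, 3.0, 2.0]}
passing = well_predicted(observed, predicted)
```

Here only `trait_a` passes, because the predictions for `trait_b` are negatively correlated with the observations.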
Affiliation(s)
- Massimiliano Corso
  - Institut Jean-Pierre Bourgin, Université Paris-Saclay, INRAE, AgroParisTech, 78000 Versailles, France
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, 10123 Torino, Italy
- Yves Moreau
  - ESAT-STADIUS, KU Leuven, 3001 Leuven, Belgium
|
29
|
Wang W, Han R, Zhang M, Wang Y, Wang T, Wang Y, Shang X, Peng J. A network-based method for brain disease gene prediction by integrating brain connectome and molecular network. Brief Bioinform 2021; 23:6415315. [PMID: 34727570 DOI: 10.1093/bib/bbab459] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/30/2021] [Revised: 09/18/2021] [Accepted: 10/07/2021] [Indexed: 12/27/2022] Open
Abstract
Brain disease gene identification is critical for revealing the biological mechanisms of brain diseases and for developing drugs to treat them. To enhance the identification of brain disease genes, similarity-based computational methods, especially network-based methods, have been adopted to narrow down the search space. However, these network-based methods only use molecular networks, ignoring brain connectome data, which have been widely used in many brain-related studies. In our study, we propose a novel framework, named brainMI, for integrating brain connectome data and molecular-based gene association networks to predict brain disease genes. For a consistent representation of molecular-based network data and brain connectome data, brainMI first constructs a novel gene network, called the brain functional connectivity (BFC)-based gene network, based on resting-state functional magnetic resonance imaging data and brain region-specific gene expression data. Then, a multiple network integration method is proposed to learn low-dimensional features of genes by integrating the BFC-based gene network and existing protein-protein interaction networks. Finally, these features are used to predict brain disease genes with a support vector machine-based model. We evaluate brainMI on four brain diseases: Alzheimer's disease, Parkinson's disease, major depressive disorder and autism. brainMI achieves scores of 0.761, 0.729, 0.728 and 0.744 using the BFC-based gene network alone and enhances the molecular network-based performance by 6.3% on average. In addition, the results show that brainMI achieves higher performance in predicting brain disease genes than three existing state-of-the-art methods.
Affiliation(s)
- Wei Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Ruijiang Han
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Menghan Zhang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yuxian Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Tao Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yongtian Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
|
30
|
Petti M, Farina L, Francone F, Lucidi S, Macali A, Palagi L, De Santis M. MOSES: A New Approach to Integrate Interactome Topology and Functional Features for Disease Gene Prediction. Genes (Basel) 2021; 12:1713. [PMID: 34828319 PMCID: PMC8624742 DOI: 10.3390/genes12111713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Revised: 10/16/2021] [Accepted: 10/25/2021] [Indexed: 11/17/2022] Open
Abstract
Disease gene prediction remains one of the main computational challenges of precision medicine. It is still uncertain whether disease genes have unique functional properties that distinguish them from non-disease genes or, from a network perspective, whether they are located randomly in the interactome or show specific patterns in the network topology. In this study, we propose a new method for disease gene prediction based on biological knowledge bases (gene-disease associations, gene functional annotations, etc.) and interactome network topology. The proposed algorithm, called MOSES, is based on the definition of two somewhat opposing sets of genes, both disease-specific from different perspectives: warm seeds (i.e., disease genes obtained from databases) and cold seeds (genes far from the disease genes on the interactome and not involved in their biological functions). The application of MOSES to a set of 40 diseases showed that the suggested putative disease genes are significantly enriched in their reference disease. Reassuringly, known and predicted disease genes together tend to form a connected network module on the human interactome, mitigating the scattered distribution of disease genes, which is probably due to both the paucity of disease-gene associations and the incompleteness of the interactome.
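The warm-seed and cold-seed definitions in this abstract can be illustrated with a minimal sketch. It is not the MOSES algorithm: it only assumes that "far on the interactome" can be measured as shortest-path distance from the nearest warm seed and that "not involved in their biological functions" means sharing no annotation term with any warm seed. The function names, the distance cutoff of 3, the gene labels and the annotation terms are hypothetical.

```python
from collections import deque


def distance_to_seeds(graph, warm_seeds):
    """Multi-source BFS: shortest-path distance from each node to its nearest warm seed."""
    dist = {seed: 0 for seed in warm_seeds if seed in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def cold_seed_candidates(graph, warm_seeds, annotations, min_distance=3):
    """Genes at least min_distance from every warm seed that share no annotation with them.

    Genes unreachable from the warm seeds are left out of this sketch.
    """
    dist = distance_to_seeds(graph, warm_seeds)
    warm_terms = set().union(*(annotations.get(g, set()) for g in warm_seeds))
    return sorted(
        gene
        for gene, d in dist.items()
        if d >= min_distance and not (annotations.get(gene, set()) & warm_terms)
    )


# Toy interactome A-B-C-D-E (hypothetical); A is the only warm seed.
interactome = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
gene_terms = {"A": {"term1"}, "D": {"term2"}, "E": {"term1"}}
cold = cold_seed_candidates(interactome, {"A"}, gene_terms)
```

In this toy case `D` qualifies as a cold seed, while `E` is far enough from `A` but is excluded because it shares `term1` with the warm seed.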
Affiliation(s)
- Manuela Petti
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, 00185 Rome, Italy; (L.F.); (F.F.); (S.L.); (A.M.); (L.P.); (M.D.S.)
|
31
|
Common and Unique Genetic Background between Attention-Deficit/Hyperactivity Disorder and Excessive Body Weight. Genes (Basel) 2021; 12:genes12091407. [PMID: 34573389 PMCID: PMC8464917 DOI: 10.3390/genes12091407] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/24/2021] [Revised: 09/08/2021] [Accepted: 09/09/2021] [Indexed: 02/07/2023] Open
Abstract
Comorbidity studies show that children with ADHD have a higher risk of being overweight and obese than healthy children. This study aimed to assess the genetic alterations that differ between and are shared by ADHD and excessive body weight (EBW). The sample consisted of 743 Polish children aged between 6 and 17 years. We analyzed a unique set of genes and polymorphisms selected for ADHD and/or obesity based on gene prioritization tools. Polymorphisms in the KCNIP1, SLC1A3, MTHFR, ADRA2A, and SLC6A2 genes proved to be associated with the risk of ADHD in the studied population. The COMT gene polymorphism was one that specifically increased the risk of EBW in the ADHD group. Using whole-exome sequencing, we have shown that the ADHD group contains rare and protein-truncating variants in the FBXL17, DBH, MTHFR, PCDH7, RSPH3, SPTBN1, and TNRC6C genes. In turn, variants in the ADRA2A, DYNC1H1, MAP1A, SEMA6D, and ZNF536 genes were specific for ADHD with EBW. In this way, we confirmed, at the molecular level, the existence of genes specifically predisposing to EBW in ADHD patients, which are associated with the biological pathways involved in the regulation of the reward system, intestinal microbiome, and muscle metabolism.
|
32
|
Wang Y, Xia Z, Deng J, Xie X, Gong M, Ma X. TLGP: a flexible transfer learning algorithm for gene prioritization based on heterogeneous source domain. BMC Bioinformatics 2021; 22:274. [PMID: 34433414 PMCID: PMC8386056 DOI: 10.1186/s12859-021-04190-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/15/2021] [Accepted: 05/12/2021] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Gene prioritization (gene ranking) aims to obtain the centrality of genes, which is critical for cancer diagnosis and therapy since key genes correspond to biomarkers or drug targets. Great efforts have been devoted to the gene ranking problem by exploring the similarity between candidate and known disease-causing genes. However, when the number of disease-causing genes is limited, these methods are largely inapplicable because of their low accuracy. In fact, the number of known disease-causing genes for cancers, particularly rare cancers, is very limited. Therefore, there is a critical need to design effective and efficient algorithms for gene ranking with limited prior disease-causing genes. RESULTS In this study, we propose a transfer learning-based algorithm for gene prioritization (called TLGP) in a cancer without known disease-causing genes (target domain) by transferring knowledge from other cancers (source domain). The underlying assumption is that knowledge shared by similar cancers improves the accuracy of gene prioritization. Specifically, TLGP first quantifies the similarity between the target and source domains by calculating the affinity matrix for genes. Then, TLGP automatically learns a fusion network for the target cancer by fusing the affinity matrix, pathogenic genes and genomic data of source cancers. Finally, genes in the target cancer are prioritized. The experimental results indicate that the learnt fusion network is more reliable than the gene co-expression network, implying that transferring knowledge from other cancers improves the accuracy of network construction. Moreover, TLGP outperforms state-of-the-art approaches in terms of accuracy, with an improvement of at least 5%. CONCLUSION The proposed model and method provide an effective and efficient strategy for gene ranking by integrating genomic data from various cancers.
Affiliation(s)
- Yan Wang
  - School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
  - Department of Library, Xidian University, South TaiBai Road, Xi’an, China
- Zuheng Xia
  - School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
- Jingjing Deng
  - Department of Computer Science, Swansea University, Bay, UK
- Xianghua Xie
  - Department of Computer Science, Swansea University, Bay, UK
- Maoguo Gong
  - School of Electronic Engineering, Xidian University, South TaiBai Road, Xi’an, China
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
|
33
|
Rintala TJ, Federico A, Latonen L, Greco D, Fortino V. A systematic comparison of data- and knowledge-driven approaches to disease subtype discovery. Brief Bioinform 2021; 22:6350885. [PMID: 34396389 PMCID: PMC8575038 DOI: 10.1093/bib/bbab314] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/29/2021] [Revised: 07/05/2021] [Accepted: 07/20/2021] [Indexed: 12/14/2022] Open
Abstract
Typical clustering analysis for large-scale genomics data combines two unsupervised learning techniques: dimensionality reduction and clustering (DR-CL) methods. It has been demonstrated that transforming gene expression to pathway-level information can improve the robustness and interpretability of disease grouping results. This approach, referred to as the biological knowledge-driven clustering (BK-CL) approach, is often neglected due to a lack of tools enabling systematic comparisons with more established DR-based methods. Moreover, classic clustering metrics based on group separability tend to favor the DR-CL paradigm, which may increase the risk of identifying less actionable disease subtypes that have ambiguous biological and clinical explanations. Hence, there is a need to develop metrics that assess biological and clinical relevance. To facilitate the systematic analysis of BK-CL methods, we propose a computational protocol for quantitative analysis of clustering results derived from both DR-CL and BK-CL methods. Moreover, we propose a new BK-CL method that combines prior knowledge of disease-relevant genes, network diffusion algorithms and gene set enrichment analysis to generate robust pathway-level information. Benchmarking studies were conducted to compare the grouping results from different DR-CL and BK-CL approaches with respect to standard clustering evaluation metrics, concordance with known subtypes, association with clinical outcomes and disease modules in gene co-expression networks. No single approach dominated every metric, showing the importance of multi-objective evaluation in clustering analysis. However, we demonstrated that, on gene expression data sets derived from TCGA samples, the BK-CL approach can find groupings that provide significant prognostic value in both breast and prostate cancers.
Affiliation(s)
- Teemu J Rintala
  - Institute of Biomedicine, University of Eastern Finland, Yliopistonranta 1 E, 70210 Kuopio, Finland
- Antonio Federico
  - Faculty of Medicine and Health Technology, Tampere University, Kalevantie 4, 33100 Tampere, Finland
  - BioMediTech Institute, Tampere University, Kalevantie 4, 33100 Tampere, Finland
- Leena Latonen
  - Institute of Biomedicine, University of Eastern Finland, Yliopistonranta 1 E, 70210 Kuopio, Finland
- Dario Greco
  - Faculty of Medicine and Health Technology, Tampere University, Kalevantie 4, 33100 Tampere, Finland
  - BioMediTech Institute, Tampere University, Kalevantie 4, 33100 Tampere, Finland
  - Institute of Biotechnology, University of Helsinki, Viikinkaari 5d, 00014 Helsinki, Finland
- Vittorio Fortino
  - Institute of Biomedicine, University of Eastern Finland, Yliopistonranta 1 E, 70210 Kuopio, Finland
|
34
|
Gunning M, Pavlidis P. "Guilt by association" is not competitive with genetic association for identifying autism risk genes. Sci Rep 2021; 11:15950. [PMID: 34354131 PMCID: PMC8342445 DOI: 10.1038/s41598-021-95321-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2021] [Accepted: 07/16/2021] [Indexed: 12/25/2022] Open
Abstract
Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can be used to supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, questions have been raised about the utility of ML methods for this type of task due to biases within the data, and poor real-world performance. Using autism spectrum disorder (ASD) as a test case, we sought to investigate the question: can machine learning aid in the discovery of disease genes? We collected 13 published ASD gene prioritization studies and evaluated their performance using known and novel high-confidence ASD genes. We also investigated their biases towards generic gene annotations, like number of association publications. We found that ML methods which do not incorporate genetics information have limited utility for prioritization of ASD risk genes. These studies perform at a comparable level to generic measures of likelihood for the involvement of genes in any condition, and do not out-perform genetic association studies. Future efforts to discover disease genes should be focused on developing and validating statistical models for genetic association, specifically for association between rare variants and disease, rather than developing complex machine learning methods using complex heterogeneous biological data with unknown reliability.
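For readers unfamiliar with the term, the simplest form of "guilt by association" on a gene network is neighbor voting: a candidate gene is scored by the fraction of its network neighbors that are already known disease genes. The sketch below is a generic illustration of that idea and does not correspond to any of the 13 studies evaluated in the paper; the function name, gene labels and toy network are hypothetical.

```python
def neighbor_voting_scores(graph, known_genes):
    """Score each candidate gene by the fraction of its neighbors that are known disease genes."""
    scores = {}
    for gene, neighbors in graph.items():
        # Known genes are the labels, and isolated genes cannot be scored.
        if gene in known_genes or not neighbors:
            continue
        scores[gene] = sum(nb in known_genes for nb in neighbors) / len(neighbors)
    return scores


# Toy gene network (hypothetical); g1 and g2 are the known disease genes.
network = {
    "g1": ["g2", "g3"],
    "g2": ["g1", "g3", "g4"],
    "g3": ["g1", "g2"],
    "g4": ["g2", "g5"],
    "g5": ["g4"],
}
gba_scores = neighbor_voting_scores(network, {"g1", "g2"})
prioritized = sorted(gba_scores, key=gba_scores.get, reverse=True)
```

Scores of this kind depend only on network position, which is one reason they can track generic properties such as how well studied a gene is rather than disease-specific genetic evidence.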
Affiliation(s)
- Margot Gunning
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Paul Pavlidis
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
|
35
|
Anchang CG, Xu C, Raimondo MG, Atreya R, Maier A, Schett G, Zaburdaev V, Rauber S, Ramming A. The Potential of OMICs Technologies for the Treatment of Immune-Mediated Inflammatory Diseases. Int J Mol Sci 2021; 22:ijms22147506. [PMID: 34299122 PMCID: PMC8306614 DOI: 10.3390/ijms22147506] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/20/2021] [Revised: 07/02/2021] [Accepted: 07/09/2021] [Indexed: 01/08/2023] Open
Abstract
Immune-mediated inflammatory diseases (IMIDs), such as inflammatory bowel diseases and inflammatory arthritis (e.g., rheumatoid arthritis, psoriatic arthritis), are marked by increasing worldwide incidence rates. Apart from irreversible damage of the affected tissue, the systemic nature of these diseases heightens the incidence of cardiovascular insults and colitis-associated neoplasia. Only 40–60% of patients respond to currently used standard-of-care immunotherapies. In addition to this limited long-term effectiveness, all current therapies have to be given on a lifelong basis as they are unable to specifically reprogram the inflammatory process and thus achieve a true cure of the disease. On the other hand, the development of various OMICs technologies is considered as “the great hope” for improving the treatment of IMIDs. This review sheds light on the progressive development and the numerous approaches from basic science that gradually lead to the transfer from “bench to bedside” and the implementation into general patient care procedures.
Affiliation(s)
- Charles Gwellem Anchang
  - Department of Internal Medicine 3—Rheumatology and Immunology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum, 91054 Erlangen, Germany; (C.G.A.); (C.X.); (M.G.R.); (G.S.); (S.R.)
- Cong Xu
  - Department of Internal Medicine 3—Rheumatology and Immunology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum, 91054 Erlangen, Germany; (C.G.A.); (C.X.); (M.G.R.); (G.S.); (S.R.)
- Maria Gabriella Raimondo
  - Department of Internal Medicine 3—Rheumatology and Immunology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum, 91054 Erlangen, Germany; (C.G.A.); (C.X.); (M.G.R.); (G.S.); (S.R.)
- Raja Atreya
  - Department of Internal Medicine 1, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum, 91054 Erlangen, Germany
- Andreas Maier
  - Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany
- Georg Schett
  - Department of Internal Medicine 3—Rheumatology and Immunology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum, 91054 Erlangen, Germany; (C.G.A.); (C.X.); (M.G.R.); (G.S.); (S.R.)
- Vasily Zaburdaev
  - Max-Planck-Zentrum für Physik und Medizin, 91054 Erlangen, Germany
  - Department of Biology, Mathematics in Life Sciences, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany
- Simon Rauber
  - Department of Internal Medicine 3—Rheumatology and Immunology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum, 91054 Erlangen, Germany; (C.G.A.); (C.X.); (M.G.R.); (G.S.); (S.R.)
- Andreas Ramming
  - Department of Internal Medicine 3—Rheumatology and Immunology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum, 91054 Erlangen, Germany; (C.G.A.); (C.X.); (M.G.R.); (G.S.); (S.R.)
  - Correspondence: Tel.: +49-9131-8543048; Fax: +49-9131-8536448
|
36
|
Sefik E, Purcell RH, Walker EF, Bassell GJ, Mulle JG. Convergent and distributed effects of the 3q29 deletion on the human neural transcriptome. Transl Psychiatry 2021; 11:357. [PMID: 34131099 PMCID: PMC8206125 DOI: 10.1038/s41398-021-01435-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/07/2020] [Revised: 04/29/2021] [Accepted: 05/07/2021] [Indexed: 12/13/2022] Open
Abstract
The 3q29 deletion (3q29Del) confers high risk for schizophrenia and other neurodevelopmental and psychiatric disorders. However, no single gene in this interval is definitively associated with disease, prompting the hypothesis that neuropsychiatric sequelae emerge upon loss of multiple functionally-connected genes. 3q29 genes are unevenly annotated and the impact of 3q29Del on the human neural transcriptome is unknown. To systematically formulate unbiased hypotheses about molecular mechanisms linking 3q29Del to neuropsychiatric illness, we conducted a systems-level network analysis of the non-pathological adult human cortical transcriptome and generated evidence-based predictions that relate 3q29 genes to novel functions and disease associations. The 21 protein-coding genes located in the interval segregated into seven clusters of highly co-expressed genes, demonstrating both convergent and distributed effects of 3q29Del across the interrogated transcriptomic landscape. Pathway analysis of these clusters indicated involvement in nervous-system functions, including synaptic signaling and organization, as well as core cellular functions, including transcriptional regulation, posttranslational modifications, chromatin remodeling, and mitochondrial metabolism. Top network-neighbors of 3q29 genes showed significant overlap with known schizophrenia, autism, and intellectual disability-risk genes, suggesting that 3q29Del biology is relevant to idiopathic disease. Leveraging "guilt by association", we propose nine 3q29 genes, including one hub gene, as prioritized drivers of neuropsychiatric risk. 
These results provide testable hypotheses for experimental analysis on causal drivers and mechanisms of the largest known genetic risk factor for schizophrenia and highlight the study of normal function in non-pathological postmortem tissue to further our understanding of psychiatric genetics, especially for rare syndromes like 3q29Del, where access to neural tissue from carriers is unavailable or limited.
Affiliation(s)
- Esra Sefik
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
  - Department of Psychology, Emory University, Atlanta, GA, USA
- Ryan H. Purcell
  - Department of Cell Biology, Emory University School of Medicine, Atlanta, GA, USA
  - Laboratory of Translational Cell Biology, Emory University School of Medicine, Atlanta, GA, USA
- Elaine F. Walker
  - Department of Psychology, Emory University, Atlanta, GA, USA
- Gary J. Bassell
  - Department of Cell Biology, Emory University School of Medicine, Atlanta, GA, USA
  - Laboratory of Translational Cell Biology, Emory University School of Medicine, Atlanta, GA, USA
- Jennifer G. Mulle
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
  - Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
|
37
|
Xiang J, Zhang J, Zheng R, Li X, Li M. NIDM: network impulsive dynamics on multiplex biological network for disease-gene prediction. Brief Bioinform 2021; 22:6236070. [PMID: 33866352 DOI: 10.1093/bib/bbab080] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/12/2021] [Revised: 02/11/2021] [Accepted: 02/21/2021] [Indexed: 12/12/2022] Open
Abstract
The prediction of disease-related genes is important to the study of diseases because biological experiments are costly and time-consuming. Network propagation is a popular strategy for disease-gene prediction. However, existing methods focus on the stable solution of dynamics while ignoring the useful information hidden in the dynamical process, and it is still a challenge to make use of multiple types of physical/functional relationships between proteins/genes to effectively predict disease-related genes. Therefore, we proposed a framework of network impulsive dynamics on multiplex biological network (NIDM) to predict disease-related genes, along with four variants of NIDM models and four kinds of impulsive dynamical signatures (IDSs). NIDM identifies disease-related genes by mining the dynamical responses of nodes to impulsive signals exerted at specific nodes. Through a series of experimental evaluations in various types of biological networks, we confirmed the advantage of the multiplex network and the important roles of functional associations in disease-gene prediction, demonstrated superior performance of NIDM compared with four types of network-based algorithms, and then gave recommendations for NIDM models and IDS signatures. To facilitate the prioritization and analysis of (candidate) genes associated with specific diseases, we developed a user-friendly web server, which provides three kinds of filtering patterns for genes, network visualization, enrichment analysis and a wealth of external links (http://bioinformatics.csu.edu.cn/DGP/NID.jsp). NIDM is a protocol for disease-gene prediction integrating different types of biological networks, which may become a very useful computational tool for the study of disease-related genes.
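For orientation, the "stable solution" style of network propagation that this abstract contrasts NIDM against can be sketched as a random walk with restart. This is a generic illustration and not the NIDM method itself; the toy network, the restart probability of 0.5 and the function name are all assumptions made for the example.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adj:   dict mapping node -> list of neighbours (undirected, no isolated nodes)
    seeds: set of known disease genes at which the walker restarts
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            # Each node passes its remaining mass equally to its neighbours.
            share = (1.0 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: gene "A" is the only known disease gene.
toy = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
scores = random_walk_with_restart(toy, seeds={"A"})
ranking = sorted((n for n in toy if n != "A"), key=lambda n: -scores[n])
print(ranking)
```

Candidates are ranked by their steady-state score only, which is the information the abstract says existing methods rely on; the dynamical trajectory toward that steady state is discarded.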
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, Hunan, China
- Jiashuai Zhang
  - School of Computer Science and Engineering, Central South University, Hunan, China
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, China
- Xingyi Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
|
38
|
Mahlich Y, Miller M, Zeng Z, Bromberg Y. Low Diversity of Human Variation Despite Mostly Mild Functional Impact of De Novo Variants. Front Mol Biosci 2021; 8:635382. [PMID: 33816556 PMCID: PMC8012514 DOI: 10.3389/fmolb.2021.635382] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2020] [Accepted: 02/01/2021] [Indexed: 01/07/2023] Open
Abstract
Non-synonymous Single Nucleotide Variants (nsSNVs), resulting in single amino acid variants (SAVs), are important drivers of evolutionary adaptation across the tree of life. Humans carry on average over 10,000 SAVs per individual genome, many of which likely have little to no impact on the function of the protein they affect. Experimental evidence for protein function changes as a result of SAVs remains sparse – a situation that can be somewhat alleviated by predicting their impact using computational methods. Here, we used SNAP to examine both observed and in silico generated human variation in a set of 1,265 proteins that are consistently found across a number of diverse species. The number of SAVs that are predicted to have any functional effect on these proteins is smaller than expected, suggesting sequence/function optimization over evolutionary timescales. Additionally, we find that only a few of the yet-unobserved SAVs could drastically change the function of these proteins, while nearly a quarter would have only a mild functional effect. We observed that variants common in the human population localized to less conserved protein positions and carried mild to moderate functional effects more frequently than rare variants. As expected, rare variants carried severe effects more frequently than common variants. In line with current assumptions, we demonstrated that the change of the human reference sequence amino acid to the reference of another species (a cross-species variant) is unlikely to significantly impact protein function. However, we also observed that many cross-species variants may be weakly non-neutral for the purposes of quick adaptation to environmental changes, but may not be identified as such by current state-of-the-art methodology.
Affiliation(s)
- Yannick Mahlich
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
- Maximillian Miller
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
- Zishuo Zeng
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
  - Department of Genetics, Rutgers University, Piscataway, NJ, United States
|
39
|
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
  - School of Mathematics and Statistics, University of Sydney, Australia
- Lexin Li
  - Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
  - Department of Statistics, University of California, Berkeley
|
40
|
Wu L, Han L, Li Q, Wang G, Zhang H, Li L. Using Interactome Big Data to Crack Genetic Mysteries and Enhance Future Crop Breeding. MOLECULAR PLANT 2021; 14:77-94. [PMID: 33340690 DOI: 10.1016/j.molp.2020.12.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 09/14/2020] [Revised: 12/11/2020] [Accepted: 12/14/2020] [Indexed: 05/27/2023]
Abstract
The functional genes underlying phenotypic variation and their interactions represent "genetic mysteries". Understanding and utilizing these genetic mysteries are key solutions for mitigating the current threats to agriculture posed by population growth and individual food preferences. Due to advances in high-throughput multi-omics technologies, we are stepping into an Interactome Big Data era that is certain to revolutionize genetic research. In this article, we provide a brief overview of current strategies to explore genetic mysteries. We then introduce the methods for constructing and analyzing the Interactome Big Data and summarize currently available interactome resources. Next, we discuss how Interactome Big Data can be used as a versatile tool to dissect genetic mysteries. We propose an integrated strategy that could revolutionize genetic research by combining Interactome Big Data with machine learning, which involves mining information hidden in Big Data to identify the genetic models or networks that control various traits, and also provide a detailed procedure for systematic dissection of genetic mysteries. Finally, we discuss three promising future breeding strategies utilizing the Interactome Big Data to improve crop yields and quality.
Affiliation(s)
- Leiming Wu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Linqian Han
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Qing Li
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Guoying Wang
  - Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hongwei Zhang
  - Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Lin Li
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
|
41
|
Lai X, Dreyer FS, Cantone M, Eberhardt M, Gerer KF, Jaitly T, Uebe S, Lischer C, Ekici A, Wittmann J, Jäck HM, Schaft N, Dörrie J, Vera J. Network- and systems-based re-engineering of dendritic cells with non-coding RNAs for cancer immunotherapy. Theranostics 2021; 11:1412-1428. [PMID: 33391542 PMCID: PMC7738891 DOI: 10.7150/thno.53092] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/10/2020] [Accepted: 10/15/2020] [Indexed: 12/12/2022] Open
Abstract
Dendritic cells (DCs) are professional antigen-presenting cells that induce and regulate adaptive immunity by presenting antigens to T cells. Due to their coordinative role in adaptive immune responses, DCs have been used as cell-based therapeutic vaccination against cancer. The capacity of DCs to induce a therapeutic immune response can be enhanced by re-wiring of cellular signalling pathways with microRNAs (miRNAs). Methods: Since the activation and maturation of DCs is controlled by an interconnected signalling network, we deploy an approach that combines RNA sequencing data and systems biology methods to delineate miRNA-based strategies that enhance DC-elicited immune responses. Results: Through RNA sequencing of IKKβ-matured DCs that are currently being tested in a clinical trial on therapeutic anti-cancer vaccination, we identified 44 differentially expressed miRNAs. According to a network analysis, most of these miRNAs regulate targets that are linked to immune pathways, such as cytokine and interleukin signalling. We employed a network topology-oriented scoring model to rank the miRNAs, analysed their impact on immunogenic potency of DCs, and identified dozens of promising miRNA candidates, with miR-15a and miR-16 as the top ones. The results of our analysis are presented in a database that constitutes a tool to identify DC-relevant miRNA-gene interactions with therapeutic potential (https://www.synmirapy.net/dc-optimization). Conclusions: Our approach enables the systematic analysis and identification of functional miRNA-gene interactions that can be experimentally tested for improving DC immunogenic potency.
Affiliation(s)
- Xin Lai
  - Laboratory of Systems Tumor Immunology, Department of Dermatology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie (DZI), Erlangen, Germany
  - Comprehensive Cancer Center (CCC) Erlangen, Erlangen, Germany
- Florian S. Dreyer
  - Laboratory of Systems Tumor Immunology, Department of Dermatology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie (DZI), Erlangen, Germany
  - Comprehensive Cancer Center (CCC) Erlangen, Erlangen, Germany
- Martina Cantone
  - Laboratory of Systems Tumor Immunology, Department of Dermatology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie (DZI), Erlangen, Germany
  - Comprehensive Cancer Center (CCC) Erlangen, Erlangen, Germany
- Martin Eberhardt
  - Laboratory of Systems Tumor Immunology, Department of Dermatology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie (DZI), Erlangen, Germany
  - Comprehensive Cancer Center (CCC) Erlangen, Erlangen, Germany
- Kerstin F. Gerer
  - RNA Group, Department of Dermatology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie (DZI), Erlangen, Germany
  - Comprehensive Cancer Center (CCC) Erlangen, Erlangen, Germany
- Tanushree Jaitly
  - Laboratory of Systems Tumor Immunology, Department of Dermatology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie (DZI), Erlangen, Germany
  - Comprehensive Cancer Center (CCC) Erlangen, Erlangen, Germany
- Steffen Uebe
  - Department of Human Genetics, Universitätsklinikum Erlangen, Erlangen, Germany
- Christopher Lischer
  - Laboratory of Systems Tumor Immunology, Department of Dermatology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie (DZI), Erlangen, Germany
  - Comprehensive Cancer Center (CCC) Erlangen, Erlangen, Germany
- Arif Ekici
  - Department of Human Genetics, Universitätsklinikum Erlangen, Erlangen, Germany
- Jürgen Wittmann
  - Division of Molecular Immunology, Department of Medicine 3, Universitätsklinikum Erlangen, Erlangen, Germany
- Hans-Martin Jäck
  - Division of Molecular Immunology, Department of Medicine 3, Universitätsklinikum Erlangen, Erlangen, Germany
- Niels Schaft
  - RNA Group, Department of Dermatology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie (DZI), Erlangen, Germany
  - Comprehensive Cancer Center (CCC) Erlangen, Erlangen, Germany
- Jan Dörrie
  - RNA Group, Department of Dermatology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie (DZI), Erlangen, Germany
  - Comprehensive Cancer Center (CCC) Erlangen, Erlangen, Germany
- Julio Vera
  - Laboratory of Systems Tumor Immunology, Department of Dermatology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie (DZI), Erlangen, Germany
  - Comprehensive Cancer Center (CCC) Erlangen, Erlangen, Germany
|
42
|
Yue Z, Yan D, Guo G, Chen JY. Biological Network Mining. Methods Mol Biol 2021; 2328:139-151. [PMID: 34251623 DOI: 10.1007/978-1-0716-1534-8_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
In this book chapter, we introduce a pipeline to mine significant biomedical entities (or bioentities) in biological networks. Our focus is on prioritizing both bioentities themselves and the associations between bioentities in order to reveal their biological functions. We will introduce three tools BEERE, WIPER, and PAGER 2.0 that can be used together for network analysis and function interpretation: (1) BEERE is a network analysis tool for "Biomedical Entity Expansion, Ranking and Explorations," (2) WIPER is an entity-to-entity association ranking tool, and (3) PAGER 2.0 is a service for gene enrichment analysis.
Affiliation(s)
- Zongliang Yue
  - The University of Alabama at Birmingham, Birmingham, AL, USA
- Da Yan
  - The University of Alabama at Birmingham, Birmingham, AL, USA
- Guimu Guo
  - The University of Alabama at Birmingham, Birmingham, AL, USA
- Jake Y Chen
  - The University of Alabama at Birmingham, Birmingham, AL, USA
|
43
|
Petti M, Bizzarri D, Verrienti A, Falcone R, Farina L. Connectivity Significance for Disease Gene Prioritization in an Expanding Universe. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:2155-2161. [PMID: 31484130 DOI: 10.1109/tcbb.2019.2938512] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/10/2023]
Abstract
A fundamental topic in network medicine is disease genes prioritization. The underlying hypothesis is that disease genes are organized as modules confined within the interactome. Here, we propose a novel algorithm called DiaBLE (DIAMOnD Background Local Expansion) which is a modified version of DIAMOnD, a successful algorithm based on the concept of connectivity significance. Instead of taking the whole interactome as the background model, DiaBLE considers as gene universe the smallest local expansion of the current seeds set at each iteration step. We show that DiaBLE significantly increases the overall DIAMOnD ranking quality of genes prioritization both in terms of cross-validation and biological consistency. Here, we focus on the two algorithms only since a comparative analysis among gene prioritization methods is beyond the scope of this study. Finally, we briefly discuss the improvement of biological insight provided by DiaBLE for two cancers (head and neck squamous cell carcinoma and kidney renal clear cell carcinoma).
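The connectivity-significance idea mentioned in this abstract can be sketched as a hypergeometric tail probability: how surprising is it that a candidate gene with k links has at least ks of them pointing at seed genes? The sketch below is an illustration under that assumption. The function name and the numbers are hypothetical, and the exact way DiaBLE builds its local gene universe is described in the paper, not here, so the universe size is simply a parameter.

```python
from math import comb


def connectivity_pvalue(k, ks, n_seeds, universe_size):
    """P(at least ks of a gene's k links hit seed genes) under a hypergeometric null.

    k:             degree of the candidate gene
    ks:            number of its links that go to seed genes
    n_seeds:       number of seed genes in the universe
    universe_size: number of genes in the background model
    """
    total = comb(universe_size, k)
    upper = min(k, n_seeds)
    hits = sum(
        comb(n_seeds, i) * comb(universe_size - n_seeds, k - i)
        for i in range(ks, upper + 1)
    )
    return hits / total


# Hypothetical candidate: 5 links, 3 of them to seeds, 10 seeds in total.
p_whole = connectivity_pvalue(k=5, ks=3, n_seeds=10, universe_size=1000)
p_local = connectivity_pvalue(k=5, ks=3, n_seeds=10, universe_size=100)
print(p_whole, p_local)
```

For these made-up numbers the same candidate looks far more significant against the larger background than against the smaller one, which shows why the choice of gene universe matters for the resulting ranking.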
|
44
|
Guerra C, Joshi S, Lu Y, Palini F, Ferraro Petrillo U, Rossignac J. Rank-Similarity Measures for Comparing Gene Prioritizations: A Case Study in Autism. J Comput Biol 2020; 28:283-295. [PMID: 33103913 DOI: 10.1089/cmb.2020.0244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
We discuss the challenge of comparing three gene prioritization methods: network propagation, integer linear programming rank aggregation (RA), and statistical RA. These methods are based on different biological categories and estimate disease-gene association. Previously proposed comparison schemes are based on three measures of performance: receiver operating curve, area under the curve, and median rank ratio. Although they may capture important aspects of gene prioritization performance, they may fail to capture important differences in the rankings of individual genes. We suggest that comparison schemes could be improved by also considering recently proposed measures of similarity between gene rankings. We tested this suggestion on comparison schemes for prioritizations of genes associated with autism that were obtained using brain- and tissue-specific data. Our results show the effectiveness of our measures of similarity in clustering brain regions based on their relevance to autism.
Affiliation(s)
- Concettina Guerra
  - Georgia Institute of Technology College of Computing, School of Interactive Computing, Atlanta, Georgia, USA
- Sarang Joshi
  - Georgia Institute of Technology College of Computing, School of Interactive Computing, Atlanta, Georgia, USA
- Yinquan Lu
  - Georgia Institute of Technology College of Computing, School of Interactive Computing, Atlanta, Georgia, USA
- Francesco Palini
  - Dipartimento di Scienze Statistiche, Università di Roma-La Sapienza, Rome, Italy
- Jarek Rossignac
  - Georgia Institute of Technology College of Computing, School of Interactive Computing, Atlanta, Georgia, USA
|
45
|
Paliwal S, de Giorgio A, Neil D, Michel JB, Lacoste AM. Preclinical validation of therapeutic targets predicted by tensor factorization on heterogeneous graphs. Sci Rep 2020; 10:18250. [PMID: 33106501 PMCID: PMC7589557 DOI: 10.1038/s41598-020-74922-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/22/2020] [Accepted: 09/30/2020] [Indexed: 12/04/2022] Open
Abstract
Incorrect drug target identification is a major obstacle in drug discovery. Only 15% of drugs advance from Phase II to approval, with ineffective targets accounting for over 50% of these failures [1-3]. Advances in data fusion and computational modeling have independently progressed towards addressing this issue. Here, we capitalize on both these approaches with Rosalind, a comprehensive gene prioritization method that combines heterogeneous knowledge graph construction with relational inference via tensor factorization to accurately predict disease-gene links. Rosalind demonstrates an increase in performance of 18%-50% over five comparable state-of-the-art algorithms. On historical data, Rosalind prospectively identifies 1 in 4 therapeutic relationships eventually proven true. Beyond efficacy, Rosalind is able to accurately predict clinical trial successes (75% recall at rank 200) and distinguish likely failures (74% recall at rank 200). Lastly, Rosalind predictions were experimentally tested in a patient-derived in vitro assay for rheumatoid arthritis (RA), which yielded 5 promising genes, one of which is unexplored in RA.
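The "recall at rank 200" figures quoted in this abstract follow a common ranking metric: the fraction of known positives that appear among the top k predictions. A minimal sketch of that usual definition is given below; the gene identifiers are invented, and the paper's exact evaluation protocol is not stated in the abstract, so this should be read as an illustration only.

```python
def recall_at_rank(ranked_genes, true_genes, k):
    """Fraction of the true genes that appear in the top-k of a ranked list."""
    top_k = set(ranked_genes[:k])
    return len(top_k & set(true_genes)) / len(true_genes)


# Hypothetical ranking and ground truth (G9 is never ranked at all).
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
truth = {"G2", "G4", "G5", "G9"}

print(recall_at_rank(ranked, truth, k=3))  # 0.25
print(recall_at_rank(ranked, truth, k=5))  # 0.75
```

Recall can only stay level or rise as k grows, so a reported value is meaningful only together with its rank cutoff, as in the "at rank 200" wording above.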
Affiliation(s)
- Saee Paliwal
  - BenevolentAI, 1 Dock72 Way, 7th Floor, Brooklyn, NY, 11205, USA
- Alex de Giorgio
  - BenevolentAI, 4-6 Maple Street, Bloomsbury, London, W1T5HD, UK
- Daniel Neil
  - BenevolentAI, 1 Dock72 Way, 7th Floor, Brooklyn, NY, 11205, USA
- Alix Mb Lacoste
  - BenevolentAI, 1 Dock72 Way, 7th Floor, Brooklyn, NY, 11205, USA
|
46
|
Ruan P, Wang S. DiSNEP: a Disease-Specific gene Network Enhancement to improve Prioritizing candidate disease genes. Brief Bioinform 2020; 22:5925270. [PMID: 33064143 DOI: 10.1093/bib/bbaa241] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/16/2020] [Revised: 07/25/2020] [Accepted: 08/29/2020] [Indexed: 12/27/2022] Open
Abstract
Biological network-based strategies are useful in prioritizing genes associated with diseases. Several comprehensive human gene networks such as STRING, GIANT and HumanNet were developed and used in network-assisted algorithms to identify disease-associated genes. However, none of these networks are disease-specific and may not accurately reflect gene interactions for a specific disease. Aiming to improve disease gene prioritization using networks, we propose a Disease-Specific Network Enhancement Prioritization (DiSNEP) framework. DiSNEP first enhances a comprehensive gene network specifically for a disease through a diffusion process on a gene-gene similarity matrix derived from disease omics data. The enhanced disease-specific gene network thus better reflects true gene interactions for the disease and may improve prioritizing disease-associated genes subsequently. In simulations, DiSNEP that uses an enhanced disease-specific network prioritizes more true signal genes than comparison methods using a general gene network or without prioritization. Applications to prioritize cancer-associated gene expression and DNA methylation signal genes for five cancer types from The Cancer Genome Atlas (TCGA) project suggest that more prioritized candidate genes by DiSNEP are cancer-related according to the DisGeNET database than those prioritized by the comparison methods, consistently across all five cancer types considered, and for both gene expression and DNA methylation signal genes.
|
47
|
Jiang S, Zhang CY, Tang L, Zhao LX, Chen HZ, Qiu Y. Integrated Genomic Analysis Revealed Associated Genes for Alzheimer's Disease in APOE4 Non-Carriers. Curr Alzheimer Res 2020; 16:753-763. [PMID: 31441725 DOI: 10.2174/1567205016666190823124724] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/22/2019] [Revised: 07/14/2019] [Accepted: 08/08/2019] [Indexed: 12/31/2022]
Abstract
BACKGROUND APOE4 is the strongest genetic risk factor for late-onset Alzheimer's disease (LOAD). LOAD patients carrying or not carrying APOE4 manifest distinct clinico-pathological characteristics. APOE4 has been shown to play a critical role in the pathogenesis of AD by affecting various aspects of pathological processes. However, the pathogenesis involved in LOAD not carrying APOE4 remains elusive. OBJECTIVE We aimed to identify the associated genes involved in LOAD not carrying APOE4. METHODS An integrated genomic analysis of datasets of genome-wide association study, genome-wide expression profiling and genome-wide linkage scan and protein-protein interaction network construction were applied to identify associated gene clusters in APOE4 non-carriers. The role of one hub gene of an APOE4 non-carrier-associated gene cluster in tau phosphorylation was studied by knockdown and western blot. RESULTS We identified 12 gene clusters associated with AD APOE4 non-carriers. The hub genes associated with AD in these clusters were MAPK8, POU2F1, XRCC1, PRKCG, EXOC6, VAMP4, SIRT1, MME, NOS1, ABCA1 and LDLR. The associated genes for APOE4 non-carriers were enriched in hereditary disorder, neurological disease and psychological disorders. Moreover, knockdown of PRKCG to reduce the expression of protein kinase Cγ isoform enhanced tau phosphorylation at Thr181 and Thr231 and the expression of glycogen synthase kinase 3β and cyclin-dependent kinase 5 in the presence of APOE3 but not APOE4. CONCLUSION The study provides new insight into the mechanism of distinct pathogenesis of LOAD not carrying APOE4 and prompts the functional exploration of identified genes based on APOE genotypes.
Affiliation(s)
- Shan Jiang
  - Department of Pharmacology and Chemical Biology, Institute of Medical Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Chun-Yun Zhang
  - Department of Pharmacology and Chemical Biology, Institute of Medical Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Ling Tang
  - Department of Pharmacology and Chemical Biology, Institute of Medical Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Lan-Xue Zhao
  - Department of Pharmacology and Chemical Biology, Institute of Medical Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Hong-Zhuan Chen
  - Institute of Interdisciplinary Integrative Biomedical Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201210, China
- Yu Qiu
  - Department of Pharmacology and Chemical Biology, Institute of Medical Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
|
48
|
Yoon KH, Fox SC, Dicipulo R, Lehmann OJ, Waskiewicz AJ. Ocular coloboma: Genetic variants reveal a dynamic model of eye development. AMERICAN JOURNAL OF MEDICAL GENETICS PART C-SEMINARS IN MEDICAL GENETICS 2020; 184:590-610. [PMID: 32852110 DOI: 10.1002/ajmg.c.31831] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/17/2020] [Revised: 07/27/2020] [Accepted: 07/28/2020] [Indexed: 12/21/2022]
Abstract
Ocular coloboma is a congenital disorder of the eye where a gap exists in the inferior retina, lens, iris, or optic nerve tissue. With a prevalence of 2-19 per 100,000 live births, coloboma and microphthalmia, an associated ocular disorder, represent up to 10% of childhood blindness. It manifests due to the failure of choroid fissure closure during eye development, and it is a part of a spectrum of ocular disorders that include microphthalmia and anophthalmia. Use of genetic approaches from classical pedigree analyses to next generation sequencing has identified more than 40 loci that are associated with the causality of ocular coloboma. As we have expanded studies to include singleton cases, heritability has been very challenging to prove. As such, researchers over the past 20 years have unraveled the complex interrelationship amongst these 40 genes using vertebrate model organisms. Such research has greatly increased our understanding of eye development. These genes function to regulate initial specification of the eye field, migration of retinal precursors, patterning of the retina, neural crest cell biology, and activity of head mesoderm. This review will discuss the discovery of loci using patient data, their investigations in animal models, and the recent advances stemming from animal models that shed new light on patient diagnosis.
Collapse
Affiliation(s)
- Kevin H Yoon
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada; Women & Children's Health Research Institute, University of Alberta, Edmonton, Canada
- Sabrina C Fox
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada; Women & Children's Health Research Institute, University of Alberta, Edmonton, Canada
- Renée Dicipulo
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada; Women & Children's Health Research Institute, University of Alberta, Edmonton, Canada
- Ordan J Lehmann
- Women & Children's Health Research Institute, University of Alberta, Edmonton, Canada; Department of Medical Genetics, University of Alberta, Edmonton, Alberta, Canada; Department of Ophthalmology, University of Alberta, Edmonton, Alberta, Canada
- Andrew J Waskiewicz
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada; Women & Children's Health Research Institute, University of Alberta, Edmonton, Canada
49
|
Romdhane L, Bouhamed H, Ghedira K, Ben Hamda C, Louhichi A, Jmel H, Romdhane S, Charfeddine C, Mokni M, Abdelhak S, Rebai A. The morbid cutaneous anatomy of the human genome revealed by a bioinformatic approach. Genomics 2020; 112:4232-4241. [PMID: 32650097 DOI: 10.1016/j.ygeno.2020.07.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/13/2019] [Revised: 03/28/2020] [Accepted: 07/02/2020] [Indexed: 01/05/2023]
Abstract
Computational approaches have been developed to prioritize candidate genes in disease gene identification. They are based on different pieces of evidence associating each gene with the given disease. In this study, 648 genes underlying genodermatoses were compared to 1808 genes involved in other genetic diseases using a bioinformatic approach. These genes were studied at the structural, evolutionary, and functional levels. Results show that genes underlying genodermatoses have longer coding sequences (CDS) and more exons. Significant differences were observed in nucleotide motif and amino acid compositions. Evolutionary conservation analysis revealed that genodermatosis genes have fewer paralogs, have more orthologs in mouse and dog, and are less conserved. Functional analysis revealed that genodermatosis genes seem to be involved in the immune system and skin layers. The Bayesian network model achieved a correct classification rate of around 80%. This computational approach could help investigators working in the field of dermatology by prioritizing positional candidate genes for mutation screening.
Affiliation(s)
- Lilia Romdhane
- Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia; Department of Biology, Faculty of Sciences of Bizerte, Jarzouna, Université Tunis Carthage, Tunis, Tunisia
- Heni Bouhamed
- Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
- Kais Ghedira
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Cherif Ben Hamda
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Amel Louhichi
- Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
- Haifa Jmel
- Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Safa Romdhane
- Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Chérine Charfeddine
- Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia; High Institut of Biotechnology of Sidi Thabet, University of Manouba, BiotechPole of Sidi Thabet, Ariana, Tunisia
- Mourad Mokni
- Department of Dermatology, CHU La Rabta Tunis, Tunis, Tunisia; Public health and infection Research Laboratory, La Rabta Hospital, Tunis, Tunisia
- Sonia Abdelhak
- Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Ahmed Rebai
- Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
50
|
Renaux A, Papadimitriou S, Versbraegen N, Nachtegael C, Boutry S, Nowé A, Smits G, Lenaerts T. ORVAL: a novel platform for the prediction and exploration of disease-causing oligogenic variant combinations. Nucleic Acids Res 2019; 47:W93-W98. [PMID: 31147699 PMCID: PMC6602484 DOI: 10.1093/nar/gkz437] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 02/21/2019] [Revised: 05/01/2019] [Accepted: 05/09/2019] [Indexed: 12/16/2022] Open
Abstract
A tremendous amount of DNA sequencing data is being produced around the world with the ambition to capture in more detail the mechanisms underlying human diseases. While numerous bioinformatics tools exist that allow the discovery of causal variants in Mendelian diseases, little to no support is provided to do the same for variant combinations, an essential task for the discovery of the causes of oligogenic diseases. ORVAL (the Oligogenic Resource for Variant AnaLysis), which is presented here, provides an answer to this problem by focusing on generating networks of candidate pathogenic variant combinations in gene pairs, as opposed to isolated variants in unique genes. This online platform integrates innovative machine learning methods for combinatorial variant pathogenicity prediction with visualization techniques, offering several interactive and exploratory tools, such as pathogenic gene and protein interaction networks, a ranking of pathogenic gene pairs, as well as visual mappings of the cellular location and pathway information. ORVAL is the first web-based exploration platform dedicated to identifying networks of candidate pathogenic variant combinations with the sole ambition to help in uncovering oligogenic causes for patients that cannot rely on the classical disease analysis tools. ORVAL is available at https://orval.ibsquare.be.
Affiliation(s)
- Alexandre Renaux
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium; Artificial Intelligence lab, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Sofia Papadimitriou
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium; Artificial Intelligence lab, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Nassim Versbraegen
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Charlotte Nachtegael
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Simon Boutry
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Laboratory of Human Molecular Genetics, de Duve Institute, UCLouvain, 1200 Brussels, Belgium
- Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Artificial Intelligence lab, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Guillaume Smits
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Hôpital Universitaire des Enfants Reine Fabiola, 1020 Brussels, Belgium; Center of Human Genetics, Hôpital Erasme, 1070 Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium; Artificial Intelligence lab, Vrije Universiteit Brussel, 1050 Brussels, Belgium