1. Wang A, Tian P, Zhang YD. TWAS-GKF: a novel method for causal gene identification in transcriptome-wide association studies with knockoff inference. Bioinformatics 2024; 40:btae502. [PMID: 39189955 PMCID: PMC11361808 DOI: 10.1093/bioinformatics/btae502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 07/02/2024] [Accepted: 08/24/2024] [Indexed: 08/28/2024] Open
Abstract
MOTIVATION: Transcriptome-wide association study (TWAS) aims to identify trait-associated genes regulated by significant variants to explore the underlying biological mechanisms at a tissue-specific level. Despite the advancement of current TWAS methods to cover diverse traits, traditional approaches still face two main challenges: (i) the lack of methods that can guarantee finite-sample false discovery rate (FDR) control in identifying trait-associated genes; and (ii) the requirement for individual-level data, which is often inaccessible.
RESULTS: To address these challenges, we propose a powerful knockoff inference method termed TWAS-GKF to identify candidate trait-associated genes with guaranteed finite-sample FDR control. TWAS-GKF adopts the main idea of GhostKnockoff inference to generate knockoff variables using only summary statistics instead of individual-level data. In extensive studies, we demonstrate that TWAS-GKF successfully controls the finite-sample FDR under a pre-specified FDR level across all settings. We further apply TWAS-GKF to identify genes in brain cerebellum tissue from the Genotype-Tissue Expression (GTEx) v8 project associated with schizophrenia (SCZ) from the Psychiatric Genomics Consortium (PGC), and genes in liver tissue related to low-density lipoprotein cholesterol (LDL-C) from the UK Biobank. The results reveal that the majority of the identified genes are validated by the Open Targets Validation Platform.
AVAILABILITY AND IMPLEMENTATION: The R package TWAS.GKF is publicly available at https://github.com/AnqiWang2021/TWAS.GKF.
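As a rough illustration of how a knockoff filter turns per-gene statistics into a selection at a target FDR level, the sketch below applies the generic knockoff+ threshold rule. It is not taken from the TWAS.GKF package: the function names are hypothetical, and it assumes the feature statistics W (large positive values favouring the original gene over its knockoff) have already been computed, for example from summary statistics.

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    W: knockoff feature statistics (assumed precomputed; hypothetical input).
    q: target FDR level.
    Returns np.inf when no threshold satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of W, in increasing order.
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        est_fdp = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if est_fdp <= q:
            return float(t)
    return np.inf

def knockoff_select(W, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))

# Toy statistics for ten hypothetical genes.
W_demo = [5, 4, 3.5, 3, -0.5, 2, 1, -1.5, 6, 2.5]
selected = knockoff_select(W_demo, q=0.2)
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff rule and is the ingredient usually associated with finite-sample FDR control; how TWAS-GKF constructs W from GWAS and eQTL summary statistics is not shown here.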
Affiliation(s)
- Anqi Wang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, 999077, China
- Peixin Tian
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, 999077, China
- Yan Dora Zhang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, 999077, China
2. Cao X, Zhang S, Sha Q. A novel method for multiple phenotype association studies based on genotype and phenotype network. PLoS Genet 2024; 20:e1011245. [PMID: 38728360 PMCID: PMC11111089 DOI: 10.1371/journal.pgen.1011245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 05/22/2024] [Accepted: 03/29/2024] [Indexed: 05/12/2024] Open
Abstract
Joint analysis of multiple correlated phenotypes for genome-wide association studies (GWAS) can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits. Meanwhile, constructing a network based on associations between phenotypes and genotypes provides a new insight to analyze multiple phenotypes, which can explore whether phenotypes and genotypes might be related to each other at a higher level of cellular and organismal organization. In this paper, we first develop a bipartite signed network by linking phenotypes and genotypes into a Genotype and Phenotype Network (GPN). The GPN can be constructed by a mixture of quantitative and qualitative phenotypes and is applicable to binary phenotypes with extremely unbalanced case-control ratios in large-scale biobank datasets. We then apply a powerful community detection method to partition phenotypes into disjoint network modules based on GPN. Finally, we jointly test the association between multiple phenotypes in a network module and a single nucleotide polymorphism (SNP). Simulations and analyses of 72 complex traits in the UK Biobank show that multiple phenotype association tests based on network modules detected by GPN are much more powerful than those without considering network modules. The newly proposed GPN provides a new insight to investigate the genetic architecture among different types of phenotypes. Multiple phenotypes association studies based on GPN are improved by incorporating the genetic information into the phenotype clustering. Notably, it might broaden the understanding of genetic architecture that exists between diagnoses, genes, and pleiotropy.
Affiliation(s)
- Xuewei Cao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
3. Yao X, Ouyang S, Lian Y, Peng Q, Zhou X, Huang F, Hu X, Shi F, Xia J. PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies. Genome Med 2024; 16:56. [PMID: 38627848 PMCID: PMC11020195 DOI: 10.1186/s13073-024-01330-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 04/02/2024] [Indexed: 04/19/2024] Open
Abstract
Despite the abundance of genotype-phenotype association studies, the resulting association outcomes often lack robustness and interpretations. To address these challenges, we introduce PheSeq, a Bayesian deep learning model that enhances and interprets association studies through the integration and perception of phenotype descriptions. By implementing the PheSeq model in three case studies on Alzheimer's disease, breast cancer, and lung cancer, we identify 1024 priority genes for Alzheimer's disease and 818 and 566 genes for breast cancer and lung cancer, respectively. Benefiting from data fusion, these findings represent moderate positive rates, high recall rates, and interpretation in gene-disease association studies.
Affiliation(s)
- Xinzhi Yao
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Sizhuo Ouyang
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Yulong Lian
  - College of Science, Huazhong Agricultural University, Wuhan, China
- Qianqian Peng
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Xionghui Zhou
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feier Huang
  - College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xuehai Hu
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feng Shi
  - College of Science, Huazhong Agricultural University, Wuhan, China
- Jingbo Xia
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
4. Weeks EM, Ulirsch JC, Cheng NY, Trippe BL, Fine RS, Miao J, Patwardhan TA, Kanai M, Nasser J, Fulco CP, Tashman KC, Aguet F, Li T, Ordovas-Montanes J, Smillie CS, Biton M, Shalek AK, Ananthakrishnan AN, Xavier RJ, Regev A, Gupta RM, Lage K, Ardlie KG, Hirschhorn JN, Lander ES, Engreitz JM, Finucane HK. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Nat Genet 2023; 55:1267-1276. [PMID: 37443254 PMCID: PMC10836580 DOI: 10.1038/s41588-023-01443-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 08/14/2020] [Accepted: 06/09/2023] [Indexed: 07/15/2023]
Abstract
Genome-wide association studies (GWASs) are a valuable tool for understanding the biology of complex human traits and diseases, but associated variants rarely point directly to causal genes. In the present study, we introduce a new method, polygenic priority score (PoPS), that learns trait-relevant gene features, such as cell-type-specific expression, to prioritize genes at GWAS loci. Using a large evaluation set of genes with fine-mapped coding variants, we show that PoPS and the closest gene individually outperform other gene prioritization methods, but observe the best overall performance by combining PoPS with orthogonal methods. Using this combined approach, we prioritize 10,642 unique gene-trait pairs across 113 complex traits and diseases with high precision, finding not only well-established gene-trait relationships but nominating new genes at unresolved loci, such as LGR4 for estimated glomerular filtration rate and CCR7 for deep vein thrombosis. Overall, we demonstrate that PoPS provides a powerful addition to the gene prioritization toolbox.
Affiliation(s)
- Elle M Weeks
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jacob C Ulirsch
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
  - Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA, USA
- Brian L Trippe
  - Program in Computational & Systems Biology, MIT, Cambridge, MA, USA
  - Computer Science & Artificial Intelligence Lab, MIT, Cambridge, MA, USA
- Rebecca S Fine
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Vertex Pharmaceuticals Incorporated, Boston, MA, USA
- Jenkai Miao
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA, USA
- Tejal A Patwardhan
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Statistics, Harvard University, Cambridge, MA, USA
- Masahiro Kanai
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, MGH, Boston, MA, USA
  - Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Joseph Nasser
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Charles P Fulco
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Bristol Myers Squibb, Cambridge, MA, USA
- Taibo Li
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - MD-PhD Program, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jose Ordovas-Montanes
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Gastroenterology, Hepatology, and Nutrition, Boston Children's Hospital, Boston, MA, USA
  - Program in Immunology, Harvard Medical School, Boston, MA, USA
  - Harvard Stem Cell Institute, Cambridge, MA, USA
- Christopher S Smillie
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Program in Computational & Systems Biology, MIT, Cambridge, MA, USA
- Moshe Biton
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Molecular Biology, MGH, Boston, MA, USA
  - Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Alex K Shalek
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Institute for Medical Engineering and Science, MIT, Cambridge, MA, USA
  - Department of Chemistry, MIT, Cambridge, MA, USA
  - Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
- Ashwin N Ananthakrishnan
  - Gastrointestinal Unit and Center for the Study of Inflammatory Bowel Disease, MGH, Boston, MA, USA
- Ramnik J Xavier
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Molecular Biology, MGH, Boston, MA, USA
  - Gastrointestinal Unit and Center for the Study of Inflammatory Bowel Disease, MGH, Boston, MA, USA
  - Center for Computational and Integrative Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Aviv Regev
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, MIT, Cambridge, MA, USA
  - Genentech, San Francisco, CA, USA
- Rajat M Gupta
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Cardiovascular Medicine and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Kasper Lage
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Surgery, MGH, Boston, MA, USA
- Kristin G Ardlie
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Joel N Hirschhorn
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Eric S Lander
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biology, MIT, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Jesse M Engreitz
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - BASE Initiative, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Stanford University School of Medicine, Stanford, CA, USA
- Hilary K Finucane
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, MGH, Boston, MA, USA
5. Baronas JM, Bartell E, Eliasen A, Doench JG, Yengo L, Vedantam S, Marouli E, Kronenberg HM, Hirschhorn JN, Renthal NE. Genome-wide CRISPR screening of chondrocyte maturation newly implicates genes in skeletal growth and height-associated GWAS loci. Cell Genomics 2023; 3:100299. [PMID: 37228756 PMCID: PMC10203046 DOI: 10.1016/j.xgen.2023.100299] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/26/2022] [Revised: 12/14/2022] [Accepted: 03/17/2023] [Indexed: 05/27/2023]
Abstract
Alterations in the growth and maturation of chondrocytes can lead to variation in human height, including monogenic disorders of skeletal growth. We aimed to identify genes and pathways relevant to human growth by pairing human height genome-wide association studies (GWASs) with genome-wide knockout (KO) screens of growth-plate chondrocyte proliferation and maturation in vitro. We identified 145 genes that alter chondrocyte proliferation and maturation at early and/or late time points in culture, with 90% of genes validating in secondary screening. These genes are enriched in monogenic growth disorder genes and in KEGG pathways critical for skeletal growth and endochondral ossification. Further, common variants near these genes capture height heritability independent of genes computationally prioritized from GWASs. Our study emphasizes the value of functional studies in biologically relevant tissues as orthogonal datasets to refine likely causal genes from GWASs and implicates new genetic regulators of chondrocyte proliferation and maturation.
Affiliation(s)
- John M. Baronas
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
- Eric Bartell
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Anders Eliasen
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
- John G. Doench
  - Genetic Perturbation Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Loic Yengo
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Sailaja Vedantam
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eirini Marouli
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- GIANT Consortium
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
  - Genetic Perturbation Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
  - Endocrine Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Henry M. Kronenberg
  - Endocrine Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Joel N. Hirschhorn
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nora E. Renthal
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
6. Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph. PLoS One 2022; 17:e0271395. [PMID: 35830458 PMCID: PMC9278741 DOI: 10.1371/journal.pone.0271395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/14/2022] [Accepted: 06/29/2022] [Indexed: 12/24/2022] Open
Abstract
Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) that play important roles in the genetic heritability of traits and diseases. With most of these SNPs located on the non-coding part of the genome, it is currently assumed that these SNPs influence the expression of nearby genes on the genome. However, identifying which genes are targeted by these disease-associated SNPs remains challenging. In the past, protein knowledge graphs have often been used to identify genes that are associated with disease, also referred to as “disease genes”. Here, we explore whether protein knowledge graphs can be used to identify genes that are targeted by disease-associated non-coding SNPs by testing and comparing the performance of six existing methods for a protein knowledge graph, four of which were developed for disease gene identification. We compare our performance against two baselines: (1) an existing state-of-the-art method that is based on guilt-by-association, and (2) the leading assumption that SNPs target the nearest gene on the genome. We test these methods with four reference sets, three of which were obtained by different means. Furthermore, we combine methods to investigate whether their combination improves performance. We find that protein knowledge graphs that include predicate information perform comparably to the current state of the art, achieving an area under the receiver operating characteristic curve (AUC) of 79.6% on average across all four reference sets. Protein knowledge graphs that lack predicate information perform comparably to our other baseline (genetic distance), which achieved an AUC of 75.7% across all four reference sets. Combining multiple methods improved performance to 84.9% AUC. We conclude that methods for a protein knowledge graph can be used to identify which genes are targeted by disease-associated non-coding SNPs.
7. Wainberg M, Merico D, Keller MC, Fauman EB, Tripathy SJ. Predicting causal genes from psychiatric genome-wide association studies using high-level etiological knowledge. Mol Psychiatry 2022; 27:3095-3106. [PMID: 35411039 DOI: 10.1038/s41380-022-01542-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Revised: 03/08/2022] [Accepted: 03/21/2022] [Indexed: 12/24/2022]
Abstract
Genome-wide association studies have discovered hundreds of genomic loci associated with psychiatric traits, but the causal genes underlying these associations are often unclear, a research gap that has hindered clinical translation. Here, we present a Psychiatric Omnilocus Prioritization Score (PsyOPS) derived from just three binary features encapsulating high-level assumptions about psychiatric disease etiology - namely, that causal psychiatric disease genes are likely to be mutationally constrained, be specifically expressed in the brain, and overlap with known neurodevelopmental disease genes. To our knowledge, PsyOPS is the first method specifically tailored to prioritizing causal genes at psychiatric GWAS loci. We show that, despite its extreme simplicity, PsyOPS achieves state-of-the-art performance at this task, comparable to a prior domain-agnostic approach relying on tens of thousands of features. Genes prioritized by PsyOPS are substantially more likely than other genes at the same loci to have convergent evidence of direct regulation by the GWAS variant according to both DNA looping assays and expression or splicing quantitative trait locus (QTL) maps. We provide examples of genes hundreds of kilobases away from the lead variant, like GABBR1 for schizophrenia, that are prioritized by all three of PsyOPS, DNA looping and QTLs. Our results underscore the power of incorporating high-level knowledge of trait etiology into causal gene prediction at GWAS loci, and comprise a resource for researchers interested in experimentally characterizing psychiatric gene candidates.
Affiliation(s)
- Michael Wainberg
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Daniele Merico
  - Deep Genomics Inc, Toronto, ON, Canada
  - The Centre for Applied Genomics (TCAG), The Hospital for Sick Children, Toronto, ON, Canada
- Matthew C Keller
  - Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, USA
  - Institute for Behavioral Genetics, University of Colorado, Boulder, CO, USA
- Eric B Fauman
  - Internal Medicine Research Unit, Pfizer Worldwide Research, Development and Medical, Cambridge, MA, USA
- Shreejoy J Tripathy
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health, Toronto, ON, Canada
  - Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
  - Department of Psychiatry, University of Toronto, Toronto, ON, Canada
  - Department of Physiology, University of Toronto, Toronto, ON, Canada
8. Kolobkov DS, Sviridova DA, Abilev SK, Kuzovlev AN, Salnikova LE. Genes and Diseases: Insights from Transcriptomics Studies. Genes (Basel) 2022; 13:genes13071168. [PMID: 35885950 PMCID: PMC9317567 DOI: 10.3390/genes13071168] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/06/2022] [Revised: 06/13/2022] [Accepted: 06/23/2022] [Indexed: 01/25/2023] Open
Abstract
Results of expression studies can be useful to clarify the genotype-phenotype relationship. However, according to data from recent literature, there is a large group of genes that are revealed as differentially expressed (DE) in many studies, regardless of the biological context. Additional analyses could shed more light on the relationships between genes, their differential expression, and diseases. We generated a set of 9972 disease genes from five gene-phenotype databases (OMIM, ORPHANET, DDG2P, DisGeNet and MalaCards) and a report of the International Union of Immunological Societies. To study transcriptomics of disease and non-disease genes in healthy tissues, we obtained data from the Human Protein Atlas (HPA) website. We analyzed the dependency between expression in healthy tissues and gene occurrence in Gene Expression Omnibus series using tools within the Enrichr libraries. The results of expression studies were annotated with Gene Ontology (GO) and Human Phenotype Ontology (HPO) terms. Using transcriptomics analysis of healthy tissues, we validated the previous findings of higher expression levels of disease genes in pathologically linked tissues compared to other tissues. Preferentially DE genes were generally highly expressed in one or multiple tissues and were enriched for disease genes. According to the results of GO enrichment analyses, both down- and up-regulated DE genes most often took part in immune response, translation and tissue-specific processes. A connection between DE-related pathology and the diversity of HPO terms was found. Investigating a link between expression and phenotype contributes to understanding the mode of development and progression of human diseases.
Affiliation(s)
- Dmitry S. Kolobkov
  - The Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
- Darya A. Sviridova
  - The Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
- Serikbai K. Abilev
  - The Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
- Artem N. Kuzovlev
  - The Laboratory of Clinical Pathophysiology of Critical Conditions, Federal Research and Clinical Center of Intensive Care Medicine and Rehabilitology, Moscow 107031, Russia
- Lyubov E. Salnikova
  - The Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
  - The Laboratory of Clinical Pathophysiology of Critical Conditions, Federal Research and Clinical Center of Intensive Care Medicine and Rehabilitology, Moscow 107031, Russia
  - The Laboratory of Molecular Immunology, Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology, Moscow 117997, Russia
9. Cao X, Wang X, Zhang S, Sha Q. Gene-based association tests using GWAS summary statistics and incorporating eQTL. Sci Rep 2022; 12:3553. [PMID: 35241742 PMCID: PMC8894384 DOI: 10.1038/s41598-022-07465-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/10/2021] [Accepted: 02/11/2022] [Indexed: 01/29/2023] Open
Abstract
Although genome-wide association studies (GWAS) have been successfully applied to a variety of complex diseases and have identified many underlying genetic variants via single-marker tests, a considerable share of the heritability of complex diseases remains unexplained by GWAS. One alternative approach to overcome the missing heritability caused by genetic heterogeneity is gene-based analysis, which considers the aggregate effects of multiple genetic variants in a single test. Another alternative approach is transcriptome-wide association study (TWAS). TWAS aggregates genomic information into functionally relevant units that map to genes and their expression. TWAS is not only powerful, but can also increase the biological interpretability of identified trait-associated genes. In this study, we propose a powerful and computationally efficient gene-based association test, called Overall. Using an extended Simes procedure, Overall aggregates information from three types of traditional gene-based association tests and also incorporates expression quantitative trait locus (eQTL) information into a gene-based association test using GWAS summary statistics. We show that after a small number of replications to estimate the correlation among the integrated gene-based tests, the p values of Overall can be calculated analytically. Simulation studies show that Overall controls type I error rates very well and has higher power than the tests we compared it with. We also apply Overall to two schizophrenia GWAS summary datasets and two lipids GWAS summary datasets. The results show that this newly developed method can identify more significant genes than the other methods we compared it with.
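For orientation, the sketch below shows the standard Simes combination of several p values into one, which is the building block that an extended Simes procedure starts from. It is not the Overall test itself: the correlation-aware extension described in the abstract is not reproduced, and the function name and example p values are hypothetical.

```python
def simes_pvalue(pvals):
    """Standard Simes combined p value: min over i of m * p_(i) / i.

    pvals: iterable of p values from the individual tests (hypothetical input).
    The ordered p values p_(1) <= ... <= p_(m) are each scaled by m / rank,
    and the smallest scaled value is returned.
    """
    p_sorted = sorted(pvals)
    m = len(p_sorted)
    if m == 0:
        raise ValueError("at least one p value is required")
    return min(m * p / rank for rank, p in enumerate(p_sorted, start=1))

# Three hypothetical gene-based test p values for one gene.
combined = simes_pvalue([0.01, 0.04, 0.03])
```

With a single p value the function returns it unchanged, and the combined value never exceeds the largest input p value, since the last term in the minimum is m * p_(m) / m.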
Affiliation(s)
- Xuewei Cao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
- Xuexia Wang
  - Department of Mathematics, University of North Texas, Denton, TX, USA
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
10. Weiner DJ, Gazal S, Robinson EB, O'Connor LJ. Partitioning gene-mediated disease heritability without eQTLs. Am J Hum Genet 2022; 109:405-416. [PMID: 35143757 PMCID: PMC8948166 DOI: 10.1016/j.ajhg.2022.01.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/20/2021] [Accepted: 01/13/2022] [Indexed: 12/30/2022] Open
Abstract
Unknown SNP-to-gene regulatory architecture complicates efforts to link noncoding GWAS associations with genes implicated by sequencing or functional studies. eQTLs are often used to link SNPs to genes, but expression in bulk tissue explains a small fraction of disease heritability. A simple but successful approach has been to link SNPs with nearby genes via base pair windows, but genes may often be regulated by SNPs outside their window. We propose the abstract mediation model (AMM) to estimate (1) the fraction of heritability mediated by the closest or kth-closest gene to each SNP and (2) the mediated heritability enrichment of a gene set (e.g., genes with rare-variant associations). AMM jointly estimates these quantities by matching the decay in SNP enrichment with distance from genes in the gene set. Across 47 complex traits and diseases, we estimate that the closest gene to each SNP mediates 27% (SE: 6%) of heritability and that a substantial fraction is mediated by genes outside the ten closest. Mendelian disease genes are strongly enriched for common-variant heritability; for example, just 21 dyslipidemia genes mediate 25% of LDL heritability (211× enrichment, p = 0.01). Among brain-related traits, genes involved in neurodevelopmental disorders are only about 4× enriched, but gene expression patterns are highly informative, as they have detectable differences in per-gene heritability even among weakly brain-expressed genes.
Affiliation(s)
- Daniel J Weiner
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Steven Gazal
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA; Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Elise B Robinson
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
- Luke J O'Connor
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
|
11
|
Badkas A, De Landtsheer S, Sauter T. Construction and contextualization approaches for protein-protein interaction networks. Comput Struct Biotechnol J 2022; 20:3280-3290. [PMID: 35832626 PMCID: PMC9251778 DOI: 10.1016/j.csbj.2022.06.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Revised: 06/15/2022] [Accepted: 06/15/2022] [Indexed: 11/17/2022] Open
Abstract
Protein-protein interaction network (PPIN) analysis is a widely used method to study the contextual role of proteins of interest, to predict novel disease genes, disease or functional modules, and to identify novel drug targets. PPIN-based analysis uses both generic and context-specific networks. Multiple contextualization methodologies have been described, such as shortest-path algorithms, neighborhood-based methods, and diffusion/propagation algorithms. This review discusses these methods, provides intuitive representations of PPIN contextualization, and also examines how the quality of such context-specific networks could be improved by considering additional sources of evidence. As a heuristic, we observe that tasks such as identifying disease genes, drug targets, and protein complexes should consider local neighborhoods, while uncovering disease mechanisms and discovering disease-pathways would gain from diffusion-based construction.
|
12
|
Kolosov N, Daly MJ, Artomov M. Prioritization of disease genes from GWAS using ensemble-based positive-unlabeled learning. Eur J Hum Genet 2021; 29:1527-1535. [PMID: 34276057 PMCID: PMC8484264 DOI: 10.1038/s41431-021-00930-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/19/2020] [Revised: 05/23/2021] [Accepted: 06/21/2021] [Indexed: 02/07/2023] Open
Abstract
A primary challenge in understanding disease biology from genome-wide association studies (GWAS) arises from the inability to directly implicate causal genes from association data. Integration of multiple-omics data sources potentially provides important functional links between associated variants and candidate genes. Machine learning is well positioned to take advantage of a variety of such data and provide a solution for the prioritization of disease genes. Yet, classical positive-negative classifiers impose strong limitations on the gene prioritization procedure, such as a lack of reliable non-causal genes for training. Here, we developed a novel gene prioritization tool, Gene Prioritizer (GPrior). It is an ensemble of five positive-unlabeled bagging classifiers (Logistic Regression, Support Vector Machine, Random Forest, Decision Tree, Adaptive Boosting) that treats all genes of unknown relevance as an unlabeled set. GPrior selects an optimal composition of algorithms to tune the model for each specific phenotype. Altogether, GPrior fills an important niche of methods for GWAS data post-processing, significantly improving the ability to pinpoint disease genes compared to existing solutions.
Affiliation(s)
- Nikita Kolosov
  - ITMO University, St. Petersburg, Russia
  - Almazov National Medical Research Center, St. Petersburg, Russia
  - Broad Institute, Cambridge, MA, USA
- Mark J Daly
  - Broad Institute, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
- Mykyta Artomov
  - ITMO University, St. Petersburg, Russia
  - Almazov National Medical Research Center, St. Petersburg, Russia
  - Broad Institute, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
|
13
|
Nabirotchkin S, Peluffo AE, Rinaudo P, Yu J, Hajj R, Cohen D. Next-generation drug repurposing using human genetics and network biology. Curr Opin Pharmacol 2020; 51:78-92. [PMID: 31982325 DOI: 10.1016/j.coph.2019.12.004] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 11/28/2019] [Revised: 12/16/2019] [Accepted: 12/19/2019] [Indexed: 12/26/2022]
Abstract
Drug repurposing has attracted increased attention, especially in the context of drug discovery rates that remain too low despite a recent wave of approvals for biological therapeutics (e.g. gene therapy). These new biological entities-based treatments have high costs that are difficult to justify for small markets that include rare diseases. Drug repurposing, involving the identification of single or combinations of existing drugs based on human genetics data and network biology approaches represents a next-generation approach that has the potential to increase the speed of drug discovery at a lower cost. This Pharmacological Perspective reviews progress and perspectives in combining human genetics, especially genome-wide association studies, with network biology to drive drug repurposing for rare and common diseases with monogenic or polygenic etiologies. Also, highlighted here are important features of this next generation approach to drug repurposing, which can be combined with machine learning methods to meet the challenges of personalized medicine.
Affiliation(s)
- Serguei Nabirotchkin
  - Network Biology & Drug Discovery Department, Pharnext, 11 rue René Jacques, 92130 Issy-les-Moulineaux, France
- Alex E Peluffo
  - Data Science Department, Pharnext, 11 rue René Jacques, 92130 Issy-les-Moulineaux, France
- Philippe Rinaudo
  - Data Science Department, Pharnext, 11 rue René Jacques, 92130 Issy-les-Moulineaux, France
- Jinchao Yu
  - Data Science Department, Pharnext, 11 rue René Jacques, 92130 Issy-les-Moulineaux, France
- Rodolphe Hajj
  - Preclinical Research and Pharmacology Department, Pharnext, 11 rue René Jacques, 92130 Issy-les-Moulineaux, France
- Daniel Cohen
  - Chief Executive Officer, Pharnext, 11 rue René Jacques, 92130 Issy-les-Moulineaux, France
|