1. Michoel T, Zhang JD. Causal inference in drug discovery and development. Drug Discov Today 2023; 28:103737. [PMID: 37591410 DOI: 10.1016/j.drudis.2023.103737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Revised: 07/31/2023] [Accepted: 08/10/2023] [Indexed: 08/19/2023]
Abstract
To discover new drugs is to seek and to prove causality. As an emerging approach leveraging human knowledge and creativity, data, and machine intelligence, causal inference holds the promise of reducing cognitive bias and improving decision-making in drug discovery. Although it has been applied across the value chain, the concepts and practice of causal inference remain obscure to many practitioners. This article offers a nontechnical introduction to causal inference, reviews its recent applications, and discusses opportunities and challenges of adopting the causal language in drug discovery and development.
Affiliation(s)
- Tom Michoel
- Computational Biology Unit, Department of Informatics, University of Bergen, Postboks 7803, 5020 Bergen, Norway
- Jitao David Zhang
- Pharma Early Research and Development, Roche Innovation Centre Basel, F. Hoffmann-La Roche, Grenzacherstrasse 124, 4070 Basel, Switzerland; Department of Mathematics and Computer Science, University of Basel, Spiegelgasse 1, 4051 Basel, Switzerland.

2. Bankier S, Wang L, Crawford A, Morgan RA, Ruusalepp A, Andrew R, Björkegren JLM, Walker BR, Michoel T. Plasma cortisol-linked gene networks in hepatic and adipose tissues implicate corticosteroid-binding globulin in modulating tissue glucocorticoid action and cardiovascular risk. Front Endocrinol (Lausanne) 2023; 14:1186252. [PMID: 37745713 PMCID: PMC10513085 DOI: 10.3389/fendo.2023.1186252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 07/14/2023] [Indexed: 09/26/2023]
Abstract
Genome-wide association meta-analysis (GWAMA) by the Cortisol Network (CORNET) consortium identified genetic variants spanning the SERPINA6/SERPINA1 locus on chromosome 14 that are associated with morning plasma cortisol, cardiovascular disease (CVD), and hepatic expression of SERPINA6 mRNA, which encodes corticosteroid-binding globulin (CBG). These and other findings indicate that higher plasma cortisol levels are causally associated with CVD; however, the mechanisms by which variations in CBG lead to CVD are undetermined. Using genomic and transcriptomic data from the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task (STARNET) study, we identified plasma cortisol-linked single-nucleotide polymorphisms (SNPs) that are trans-associated with genes from seven different vascular and metabolic tissues, finding the highest representation of trans-genes in the liver, subcutaneous fat, and visceral abdominal fat [false discovery rate (FDR) = 15%]. We identified a subset of cortisol-associated trans-genes that are putatively regulated by the glucocorticoid receptor (GR), the primary transcription factor activated by cortisol. Using causal inference, we identified GR-regulated trans-genes that are responsible for the regulation of tissue-specific gene networks. Cis-expression quantitative trait loci (cis-eQTLs) were used as genetic instruments to identify pairwise causal relationships, from which gene networks could be reconstructed. Gene networks were identified in the liver, subcutaneous fat, and visceral abdominal fat, including a high-confidence gene network specific to subcutaneous adipose tissue (FDR = 10%) under the regulation of the interferon regulatory transcription factor IRF2. These data identify a plausible pathway through which variation in liver CBG production perturbs cortisol-regulated gene networks in peripheral tissues and thereby promotes CVD.
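The idea of using a cis-eQTL as a genetic instrument can be illustrated with a toy simulation. The sketch below is a generic Wald-ratio (instrumental-variable) estimate, not the pairwise causal inference method used in the study: a hypothetical cis-eQTL genotype drives gene A, A drives gene B, and an unmeasured confounder affects both genes. The effect sizes, allele frequency and the names `simulate` and `wald_ratio` are illustrative assumptions.

```python
import numpy as np


def wald_ratio(genotype, exposure, outcome):
    """Instrumental-variable (Wald ratio) estimate of the causal effect of
    exposure on outcome, using genotype as the instrument:
    cov(G, outcome) / cov(G, exposure)."""
    g = genotype - genotype.mean()
    return np.dot(g, outcome - outcome.mean()) / np.dot(g, exposure - exposure.mean())


def simulate(n=20000, effect=0.5, seed=0):
    """Hypothetical toy data: a cis-eQTL genotype G (0/1/2 minor-allele copies)
    drives gene A, A drives gene B, and a hidden confounder U affects both."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)
    u = rng.normal(size=n)
    a = 0.8 * g + u + rng.normal(size=n)
    b = effect * a + 1.5 * u + rng.normal(size=n)
    return g, a, b


g, a, b = simulate()
naive = np.polyfit(a, b, 1)[0]  # OLS slope of B on A, inflated by U
iv = wald_ratio(g, a, b)        # instrument-based estimate of the A -> B effect
print(f"naive OLS slope: {naive:.2f}, Wald ratio: {iv:.2f}, simulated effect: 0.5")
```

Because the genotype is assigned independently of the confounder, the ratio recovers the simulated A→B effect, whereas the naive regression slope of B on A absorbs the confounding.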
Affiliation(s)
- Sean Bankier
- University/BHF Centre for Cardiovascular Science, Queen’s Medical Research Institute, University of Edinburgh, Edinburgh, United Kingdom
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Edinburgh, United Kingdom
- Lingfei Wang
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Edinburgh, United Kingdom
- Andrew Crawford
- University/BHF Centre for Cardiovascular Science, Queen’s Medical Research Institute, University of Edinburgh, Edinburgh, United Kingdom
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom
- Ruth A. Morgan
- University/BHF Centre for Cardiovascular Science, Queen’s Medical Research Institute, University of Edinburgh, Edinburgh, United Kingdom
- SRUC, The Roslin Institute, Edinburgh, United Kingdom
- Arno Ruusalepp
- Department of Cardiac Surgery, Tartu University Hospital, Tartu, Estonia
- Department of Cardiology, Institute of Clinical Medicine, Tartu University, Tartu, Estonia
- Clinical Gene Networks AB, Stockholm, Sweden
- Ruth Andrew
- University/BHF Centre for Cardiovascular Science, Queen’s Medical Research Institute, University of Edinburgh, Edinburgh, United Kingdom
- Johan L. M. Björkegren
- Clinical Gene Networks AB, Stockholm, Sweden
- Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
- Department of Genetics & Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Brian R. Walker
- University/BHF Centre for Cardiovascular Science, Queen’s Medical Research Institute, University of Edinburgh, Edinburgh, United Kingdom
- Clinical and Translational Research Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- Tom Michoel
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Edinburgh, United Kingdom

3. Wang L, Trasanidis N, Wu T, Dong G, Hu M, Bauer DE, Pinello L. Dictys: dynamic gene regulatory network dissects developmental continuum with single-cell multiomics. Nat Methods 2023; 20:1368-1378. [PMID: 37537351 DOI: 10.1038/s41592-023-01971-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 09/14/2022] [Accepted: 07/05/2023] [Indexed: 08/05/2023]
Abstract
Gene regulatory networks (GRNs) are key determinants of cell function and identity and are dynamically rewired during development and disease. Despite decades of advancement, challenges remain in GRN inference, including dynamic rewiring, causal inference, feedback loop modeling and context specificity. To address these challenges, we develop Dictys, a dynamic GRN inference and analysis method that leverages multiomic single-cell assays of chromatin accessibility and gene expression, context-specific transcription factor footprinting, stochastic process network and efficient probabilistic modeling of single-cell RNA-sequencing read counts. Dictys improves GRN reconstruction accuracy and reproducibility and enables the inference and comparative analysis of context-specific and dynamic GRNs across developmental contexts. Dictys' network analyses recover unique insights in human blood and mouse skin development with cell-type-specific and dynamic GRNs. Its dynamic network visualizations enable time-resolved discovery and investigation of developmental driver transcription factors and their regulated targets. Dictys is available as a free, open-source and user-friendly Python package.
Affiliation(s)
- Lingfei Wang
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Department of Pathology, Harvard Medical School, Boston, MA, USA
- Gene Regulation Observatory, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nikolaos Trasanidis
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Department of Pathology, Harvard Medical School, Boston, MA, USA
- Hugh and Josseline Langmuir Centre for Myeloma Research, Centre for Haematology, Department of Immunology and Inflammation, Imperial College London, London, UK
- Ting Wu
- Division of Hematology/Oncology, Boston Children's Hospital, Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Stem Cell Institute, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Guanlan Dong
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Department of Pathology, Harvard Medical School, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Bioinformatics and Integrative Genomics PhD Program, Harvard Medical School, Boston, MA, USA
- Michael Hu
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Department of Pathology, Harvard Medical School, Boston, MA, USA
- Daniel E Bauer
- Gene Regulation Observatory, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Stem Cell Institute, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Luca Pinello
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Department of Pathology, Harvard Medical School, Boston, MA, USA.
- Gene Regulation Observatory, The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

4. Pomiès L, Brouard C, Duruflé H, Maigné É, Carré C, Gody L, Trösser F, Katsirelos G, Mangin B, Langlade NB, de Givry S. Gene regulatory network inference methodology for genomic and transcriptomic data acquired in genetically related heterozygote individuals. Bioinformatics 2022; 38:4127-4134. [PMID: 35792837 DOI: 10.1093/bioinformatics/btac445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 06/17/2022] [Accepted: 07/05/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Inferring gene regulatory networks in non-independent, genetically related panels is a methodological challenge. This hampers evolutionary and biological studies of heterozygous individuals, such as wild sunflower populations or cultivated hybrids. RESULTS First, we simulated 100 datasets of gene expression and polymorphisms with the same gene expression distributions, heterozygosities and heritabilities as our dataset of 173 genes and 353 genotypes measured in sunflower hybrids. Second, we performed a meta-analysis based on six inference methods [least absolute shrinkage and selection operator (Lasso), Random Forests, Bayesian Networks, Markov Random Fields, Ordinary Least Squares and fast inference of networks from directed regulation (Findr)] and, for better accuracy, selected the minimal-density networks, which had on average 64 edges connecting 79 genes and an area under the precision-recall curve (AUPR) score of 0.35. We found that triangles and mutual edges are prone to errors in the inferred networks. Applied to classical datasets without heterozygotes, our strategy achieved an AUPR score of 0.65 on one dataset of the DREAM5 Systems Genetics Challenge. Finally, we applied our method to an experimental dataset from sunflower hybrids and successfully inferred a network of 105 genes connected by 106 putative regulations, with a major connected component. AVAILABILITY AND IMPLEMENTATION Our inference methodology dedicated to genomic and transcriptomic data is available at https://forgemia.inra.fr/sunrise/inference_methods. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
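The AUPR scores quoted above evaluate a ranked list of predicted edges against a gold-standard network. A minimal sketch, assuming edges are (regulator, target) tuples with a confidence score, estimates the area as average precision, i.e. the mean precision at the rank of each gold-standard edge. The authors' evaluation code and the DREAM5 scorer may interpolate the curve and handle ties differently; the function name `aupr` and the toy network are hypothetical.

```python
def aupr(scored_edges, true_edges):
    """Area under the precision-recall curve, estimated as average precision.

    scored_edges: list of ((regulator, target), score) pairs (hypothetical format)
    true_edges:   set of (regulator, target) gold-standard edges
    Gold-standard edges that are never predicted contribute zero precision.
    Tied scores keep their input order (a simplifying assumption).
    """
    if not true_edges:
        return 0.0
    ranked = sorted(scored_edges, key=lambda item: item[1], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (edge, _score) in enumerate(ranked, start=1):
        if edge in true_edges:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / len(true_edges)


# Toy example: three gold-standard edges, four ranked predictions.
gold = {("A", "B"), ("B", "C"), ("C", "D")}
predicted = [(("A", "B"), 0.9), (("A", "C"), 0.8), (("B", "C"), 0.7), (("D", "A"), 0.2)]
print(aupr(predicted, gold))
```

Here the two correct edges are found at ranks 1 and 3 (precisions 1 and 2/3), and the unpredicted edge C→D adds zero, giving an average precision of 5/9.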
Affiliation(s)
- Lise Pomiès
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France
- Céline Brouard
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France
- Harold Duruflé
- LIPME, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan 31326, France
- Élise Maigné
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France
- Clément Carré
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France
- Louise Gody
- LIPME, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan 31326, France
- Fulya Trösser
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France
- George Katsirelos
- MIA-Paris, AgroParisTech, Université Paris-Saclay, INRAE, Paris 75231, France
- Brigitte Mangin
- LIPME, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan 31326, France
- Nicolas B Langlade
- LIPME, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan 31326, France
- Simon de Givry
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France

5. Tan JY, Marques AC. The activity of human enhancers is modulated by the splicing of their associated lncRNAs. PLoS Comput Biol 2022; 18:e1009722. [PMID: 35015755 PMCID: PMC8803168 DOI: 10.1371/journal.pcbi.1009722] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/14/2020] [Revised: 01/31/2022] [Accepted: 12/05/2021] [Indexed: 11/19/2022]
Abstract
Pervasive enhancer transcription is at the origin of more than half of all long noncoding RNAs in humans. Transcription of enhancer-associated long noncoding RNAs (elncRNAs) contributes to their cognate enhancer activity and to gene expression regulation in cis. Recently, splicing of elncRNAs was shown to be associated with elevated enhancer activity. However, whether elncRNA splicing is a mere consequence of accessibility at highly active enhancers or directly impacts enhancer function remains unanswered. We analysed genetically driven changes in elncRNA splicing in humans to address this outstanding question. We showed that splicing-related motifs within multi-exonic elncRNAs evolved under selective constraints during human evolution, suggesting that the processing of these transcripts is unlikely to result from transcription across spurious splice sites. In a genome-wide and unbiased approach, we used nucleotide variants as independent genetic factors to directly assess the causal relationships that underpin elncRNA splicing and cognate enhancer activity. We found that the splicing of most elncRNAs is associated with changes in chromatin signatures at cognate enhancers and in target mRNA expression. We provide evidence that efficient and conserved processing of elncRNAs contributes to enhancer activity.

Most, if not all, active enhancers are transcribed, giving rise to a plethora of transcripts, including enhancer-associated long noncoding RNAs (elncRNAs). Changes in elncRNA levels impact cognate enhancer activity. Recently, splicing of elncRNAs has also been found to be associated with enhancer activity. Whether this association reflects a contribution of elncRNA splicing to increased enhancer activity, or is simply the consequence of increased chromatin accessibility that promotes transcriptional elongation and allows for spurious splicing events, remains unknown. We show that natural selection has acted, at the species and population level, to preserve DNA elements required for frequent and efficient elncRNA splicing. Importantly, using a genome-wide and unbiased statistical population genomics approach, we demonstrate that elncRNA splicing is associated with cognate enhancer function, contributing to chromatin status and enhancer activity. Our results provide strong evidence that efficient elncRNA splicing contributes to enhancer activity genome-wide.
Affiliation(s)
- Jennifer Yihong Tan
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Ana Claudia Marques
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland

6. Koplev S, Seldin M, Sukhavasi K, Ermel R, Pang S, Zeng L, Bankier S, Di Narzo A, Cheng H, Meda V, Ma A, Talukdar H, Cohain A, Amadori L, Argmann C, Houten SM, Franzén O, Mocci G, Meelu OA, Ishikawa K, Whatling C, Jain A, Jain RK, Gan LM, Giannarelli C, Roussos P, Hao K, Schunkert H, Michoel T, Ruusalepp A, Schadt EE, Kovacic JC, Lusis AJ, Björkegren JLM. A mechanistic framework for cardiometabolic and coronary artery diseases. Nat Cardiovasc Res 2022; 1:85-100. [PMID: 36276926 PMCID: PMC9583458 DOI: 10.1038/s44161-021-00009-1] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Indexed: 04/19/2023]
Abstract
Coronary atherosclerosis results from the delicate interplay of genetic and exogenous risk factors, principally taking place in metabolic organs and the arterial wall. Here we show that 224 gene-regulatory coexpression networks (GRNs) identified by integrating genetic and clinical data from patients with (n = 600) and without (n = 250) coronary artery disease (CAD) with RNA-seq data from seven disease-relevant tissues in the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task (STARNET) study largely capture this delicate interplay, explaining >54% of CAD heritability. Within 89 cross-tissue GRNs associated with clinical severity of CAD, 374 endocrine factors facilitated inter-organ interactions, primarily along an axis from adipose tissue to the liver (n = 152). This axis was independently replicated in genetically diverse mouse strains and by injection of recombinant forms of adipose endocrine factors (EPDR1, FCN2, FSTL3 and LBP) that markedly altered blood lipid and glucose levels in mice. Altogether, the STARNET database and the associated GRN browser (http://starnet.mssm.edu) provide a multiorgan framework for exploration of the molecular interplay between cardiometabolic disorders and CAD.
Affiliation(s)
- Simon Koplev
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Marcus Seldin
- Departments of Medicine, Human Genetics and Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Biological Chemistry and Center for Epigenetics and Metabolism, University of California, Irvine, CA, USA
- Katyayani Sukhavasi
- Department of Cardiac Surgery and the Heart Clinic, Tartu University Hospital and Department of Cardiology, Institute of Clinical Medicine, Tartu University, Tartu, Estonia
- Raili Ermel
- Department of Cardiac Surgery and the Heart Clinic, Tartu University Hospital and Department of Cardiology, Institute of Clinical Medicine, Tartu University, Tartu, Estonia
- Shichao Pang
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, DZHK (German Centre for Cardiovascular Research), Munich Heart Alliance, Munich, Germany
- Lingyao Zeng
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, DZHK (German Centre for Cardiovascular Research), Munich Heart Alliance, Munich, Germany
- Sean Bankier
- BHF Centre for Cardiovascular Science, Queen’s Medical Research Institute, University of Edinburgh, Edinburgh, UK
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Antonio Di Narzo
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Haoxiang Cheng
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Vamsidhar Meda
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Angela Ma
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Husain Talukdar
- Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
- Ariella Cohain
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Letizia Amadori
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- New York University Cardiovascular Research Center, Department of Medicine, Leon H. Charney Division of Cardiology, New York University Grossman School of Medicine, New York University Langone Health, New York, NY, USA
- Carmen Argmann
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sander M. Houten
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Oscar Franzén
- Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
- Giuseppe Mocci
- Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
- Omar A. Meelu
- Cardiovascular Research Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kiyotake Ishikawa
- Cardiovascular Research Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Carl Whatling
- Translational Science and Experimental Medicine, Research and Early Development, Cardiovascular, Renal and Metabolism, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Anamika Jain
- Department of Cardiac Surgery and the Heart Clinic, Tartu University Hospital and Department of Cardiology, Institute of Clinical Medicine, Tartu University, Tartu, Estonia
- Rajeev Kumar Jain
- Department of Cardiac Surgery and the Heart Clinic, Tartu University Hospital and Department of Cardiology, Institute of Clinical Medicine, Tartu University, Tartu, Estonia
- Li-Ming Gan
- Early Clinical Development, Research and Early Development, Cardiovascular, Renal and Metabolism, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Chiara Giannarelli
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- New York University Cardiovascular Research Center, Department of Medicine, Leon H. Charney Division of Cardiology, New York University Grossman School of Medicine, New York University Langone Health, New York, NY, USA
- Panos Roussos
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- Mental Illness Research Education and Clinical Center (MIRECC), James J. Peters VA Medical Center, Bronx, NY, USA
- Ke Hao
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sema4, Stamford, CT, USA
- Heribert Schunkert
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, DZHK (German Centre for Cardiovascular Research), Munich Heart Alliance, Munich, Germany
- Tom Michoel
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Arno Ruusalepp
- Department of Cardiac Surgery and the Heart Clinic, Tartu University Hospital and Department of Cardiology, Institute of Clinical Medicine, Tartu University, Tartu, Estonia
- Clinical Gene Networks AB, Stockholm, Sweden
- Eric E. Schadt
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sema4, Stamford, CT, USA
- Jason C. Kovacic
- Cardiovascular Research Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, Australia
- St Vincent’s Clinical School, University of NSW, Sydney, New South Wales, Australia
- Aldon J. Lusis
- Departments of Medicine, Human Genetics and Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Johan L. M. Björkegren
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
- Clinical Gene Networks AB, Stockholm, Sweden
- Correspondence and requests for materials should be addressed to Johan L. M. Björkegren.

7. Bankier S, Michoel T. eQTLs as causal instruments for the reconstruction of hormone linked gene networks. Front Endocrinol (Lausanne) 2022; 13:949061. [PMID: 36060942 PMCID: PMC9428692 DOI: 10.3389/fendo.2022.949061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 07/25/2022] [Indexed: 11/17/2022]
Abstract
Hormones act within highly dynamic systems, and much of the phenotypic response to variation in hormone levels is mediated by changes in gene expression. The increasing number and power of large genetic association studies have led to the identification of hormone-linked genetic variants. However, the biological mechanisms underpinning the majority of these loci are poorly understood. The advent of affordable, high-throughput next-generation sequencing and readily available transcriptomic databases has shown that many of these genetic variants also associate with variation in gene expression levels as expression quantitative trait loci (eQTLs). In addition to further dissecting complex genetic variation, eQTLs have been applied as tools for causal inference. Many hormone networks are driven by transcription factors, and many of these genes can be linked to eQTLs. In this mini-review, we demonstrate how causal inference and gene networks can be used to describe the impact of hormone-linked genetic variation on the transcriptome within an endocrinology context.

8. Wang L. Single-cell normalization and association testing unifying CRISPR screen and gene co-expression analyses with Normalisr. Nat Commun 2021; 12:6395. [PMID: 34737291 PMCID: PMC8568964 DOI: 10.1038/s41467-021-26682-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/08/2021] [Accepted: 10/19/2021] [Indexed: 12/13/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) provides unprecedented technical and statistical potential to study gene regulation but is subject to technical variations and sparsity. Furthermore, statistical association testing remains difficult for scRNA-seq. Here we present Normalisr, a normalization and statistical association testing framework that unifies single-cell differential expression, co-expression, and CRISPR screen analyses with linear models. By systematically detecting and removing nonlinear confounders arising from library size at mean and variance levels, Normalisr achieves high sensitivity, specificity, speed, and generalizability across multiple scRNA-seq protocols and experimental conditions with unbiased p-value estimation. Its superior scalability allows us to reconstruct robust gene regulatory networks from trans-effects of guide RNAs in large-scale single-cell CRISPRi screens. On conventional scRNA-seq, Normalisr recovers gene-level co-expression networks that recapitulate known gene functions.
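The abstract frames CRISPR screen analysis as a linear-model association test. As a generic illustration only (not Normalisr's implementation or API), the sketch below regresses already-normalized expression of one gene on a guide-RNA indicator plus a log library size covariate and tests the guide coefficient with an ordinary least squares t statistic. The simulated data, the single covariate standing in for the confounders Normalisr models, the function name `linear_association`, and the normal approximation to the t distribution (reasonable with thousands of cells) are all assumptions.

```python
import math

import numpy as np


def linear_association(expression, treatment, covariates):
    """OLS test of expression ~ intercept + treatment + covariates.

    Returns (coefficient, t statistic, two-sided p value) for the treatment term.
    The p value uses a normal approximation to the t distribution, which is
    adequate when the residual degrees of freedom are large (many cells).
    """
    n = len(expression)
    X = np.column_stack([np.ones(n), treatment, covariates])
    beta, _, rank, _ = np.linalg.lstsq(X, expression, rcond=None)
    resid = expression - X @ beta
    sigma2 = resid @ resid / (n - rank)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # covariance of the estimates
    t_stat = beta[1] / math.sqrt(cov[1, 1])
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return beta[1], t_stat, p_value


# Hypothetical simulated screen: the guide lowers the target gene by 0.4 units.
rng = np.random.default_rng(0)
n_cells = 2000
log_lib_size = rng.normal(8.0, 0.5, size=n_cells)        # covariate (assumed)
guide = rng.binomial(1, 0.5, size=n_cells).astype(float)  # 1 = cell carries the guide
expr = 0.3 * log_lib_size - 0.4 * guide + rng.normal(size=n_cells)

coef, t_stat, p = linear_association(expr, guide, log_lib_size)
print(f"guide effect {coef:.2f}, t = {t_stat:.1f}, p = {p:.1e}")
```

Adjusting for the covariate in the same model keeps library-size variation out of the guide coefficient; real scRNA-seq counts would first need normalization and confounder handling of the kind the abstract describes.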
Affiliation(s)
- Lingfei Wang
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA.
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, MA, USA.

9. Genetic program activity delineates risk, relapse, and therapy responsiveness in multiple myeloma. NPJ Precis Oncol 2021; 5:60. [PMID: 34183722 PMCID: PMC8239045 DOI: 10.1038/s41698-021-00185-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/01/2020] [Accepted: 05/13/2021] [Indexed: 01/19/2023]
Abstract
Despite recent advancements in the treatment of multiple myeloma (MM), nearly all patients ultimately relapse and many become refractory to multiple lines of therapies. Therefore, we not only need the ability to predict which patients are at high risk for disease progression but also a means to understand the mechanisms underlying their risk. Here, we report a transcriptional regulatory network (TRN) for MM inferred from cross-sectional multi-omics data from 881 patients that predicts how 124 chromosomal abnormalities and somatic mutations causally perturb 392 transcription regulators of 8549 genes to manifest in distinct clinical phenotypes and outcomes. We identified 141 genetic programs whose activity profiles stratify patients into 25 distinct transcriptional states and proved to be more predictive of outcomes than did mutations. The coherence of these programs and accuracy of our network-based risk prediction was validated in two independent datasets. We observed subtype-specific vulnerabilities to interventions with existing drugs and revealed plausible mechanisms for relapse, including the establishment of an immunosuppressive microenvironment. Investigation of the t(4;14) clinical subtype using the TRN revealed that 16% of these patients exhibit an extreme-risk combination of genetic programs (median progression-free survival of 5 months) that create a distinct phenotype with targetable genes and pathways.

10. Badsha MB, Martin EA, Fu AQ. MRPC: An R Package for Inference of Causal Graphs. Front Genet 2021; 12:651812. [PMID: 33995486 PMCID: PMC8120292 DOI: 10.3389/fgene.2021.651812] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/11/2021] [Accepted: 04/06/2021] [Indexed: 11/24/2022]
Abstract
Understanding the causal relationships between variables is a central goal of many scientific inquiries. Causal relationships may be represented by directed edges in a graph (or equivalently, a network). In biology, for example, gene regulatory networks may be viewed as a type of causal network, where X→Y represents gene X regulating (i.e., being causal to) gene Y. However, existing general-purpose graph inference methods often produce a high number of false edges, whereas current causal inference methods developed for observational data in genomics can handle only limited types of causal relationships. We present MRPC (a PC algorithm with the principle of Mendelian Randomization), an R package that learns causal graphs with improved accuracy over existing methods. Our algorithm builds on the powerful PC algorithm (named after its developers Peter Spirtes and Clark Glymour), a canonical algorithm in computer science for learning directed acyclic graphs. The improvements in MRPC increase accuracy in identifying v-structures (i.e., X→Y←Z) and robustness to how the nodes are arranged in the input data. In the special case of genomic data that contain genotypes and phenotypes (e.g., gene expression) at the individual level, MRPC incorporates the principle of Mendelian randomization as constraints on edge direction to help orient the edges. MRPC allows for inference of causal graphs not only for general purposes, but also for biomedical data where multiple types of data may be input to provide evidence for causality. The R package is available on CRAN and is a free open-source software package under a GPL (≥2) license.
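The v-structure rule that MRPC inherits from the PC algorithm can be made concrete for one unshielded triple X - Y - Z: if the conditioning set that renders X and Z independent does not contain Y, the triple is oriented as the collider X→Y←Z. The Python sketch below is an illustration under stated assumptions, not MRPC itself: it assumes linear Gaussian data, uses Fisher z tests on (partial) correlations at a hypothetical level `alpha = 0.01`, considers only the empty set and {Y} as separating sets, and omits the skeleton search and the Mendelian randomization constraints. The helper `walsh` is hypothetical and only builds exactly orthogonal ±1 patterns so the toy data are reproducible.

```python
import math

def walsh(n, k):
    """Hypothetical helper: a +/-1 pattern whose sign flips every 2**k samples.
    Patterns for k in {0, 1, 2} are exactly orthogonal when n is a multiple of 8."""
    return [1.0 if (i >> k) & 1 == 0 else -1.0 for i in range(n)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y conditional on a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def independent(r, n, k, alpha=0.01):
    """Fisher z test: True if r (with k conditioning variables) is not
    significantly different from zero at two-sided level alpha."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - k - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value > alpha

def classify_triple(x, y, z, alpha=0.01):
    """PC collider rule for x - y - z, assuming x-y and y-z are already adjacent."""
    n = len(x)
    if independent(pearson(x, z), n, 0, alpha):
        # Separating set is {} and does not contain y: orient x -> y <- z
        return "collider"
    if independent(partial_corr(x, z, y), n, 1, alpha):
        # Separating set is {y}: y is a non-collider (chain or fork)
        return "non-collider"
    return "x-z adjacent"

n = 200
p0, p1, p2 = walsh(n, 0), walsh(n, 1), walsh(n, 2)
# Collider: x = p0 and z = p1 independently drive y
y_col = [a + b + 0.5 * c for a, b, c in zip(p0, p1, p2)]
print(classify_triple(p0, y_col, p1))  # collider
# Fork: y = p0 drives both x and z
x_fork = [a + b for a, b in zip(p0, p1)]
z_fork = [a + c for a, c in zip(p0, p2)]
print(classify_triple(x_fork, p0, z_fork))  # non-collider
```

In the fork, X and Z are marginally dependent but become independent once Y is conditioned on, so Y lands in the separating set and the triple is left unoriented as a collider.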
Affiliation(s)
- Md. Bahadur Badsha
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, ID, United States
- Evan A. Martin
- The Graduate Program in Bioinformatics and Computational Biology, University of Idaho, Moscow, ID, United States
- Audrey Qiuyan Fu
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, ID, United States
- Department of Mathematics and Statistical Science, Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, United States
11
Ludl AA, Michoel T. Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast. Mol Omics 2021; 17:241-251. [PMID: 33438713 DOI: 10.1039/d0mo00140f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/16/2022]
Abstract
Causal gene networks model the flow of information within a cell. Reconstructing causal networks from omics data is challenging because correlation does not imply causation. When genomics and transcriptomics data from a segregating population are combined, genomic variants can be used to orient the direction of causality between gene expression traits. Instrumental variable methods use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based methods additionally require that distal eQTL associations are mediated by the source gene. A detailed comparison between these methods has not yet been conducted, due to the lack of a standardized implementation of different methods, the limited sample size of most multi-omics datasets, and the absence of ground-truth networks for most organisms. Here we used Findr, a software package providing uniform implementations of instrumental variable, mediation, and coexpression-based methods, a recent dataset of 1012 segregants from a cross between two budding yeast strains, and the Yeastract database of known transcriptional interactions to compare causal gene network inference methods. We found that causal inference methods show a significant overlap with the ground truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of mediation saturates at large sample sizes, due to a loss of sensitivity when residual correlations become significant. Instrumental variable methods, on the other hand, produce false-positive predictions due to genomic linkage between eQTL instruments. Instrumental variable and mediation-based methods also have complementary roles for identifying causal genes underlying transcriptional hotspots. Instrumental variable methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas mediation failed because Stb5p auto-regulates its own expression. Mediation suggests a new candidate gene, DNM1, for a hotspot on Chr XII, whereas instrumental variable methods could not distinguish between multiple genes located within the hotspot. In conclusion, causal inference from genomics and transcriptomics data is a powerful approach for reconstructing causal gene networks, which could be further improved by the development of methods to control for residual correlations in mediation analyses, and for genomic linkage and pleiotropic effects from transcriptional hotspots in instrumental variable analyses.
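The contrast between the two families of tests in this abstract can be sketched on a single trio: a local-eQTL genotype E for a source gene A, and a candidate target gene B. In this minimal Python sketch, which assumes linear Gaussian relationships and uses Fisher z tests rather than Findr's own likelihood-ratio tests and posterior probabilities, the instrumental-variable call only asks that E be associated with both A and B, whereas the mediation call additionally requires E to be independent of B given A. The toy trios (built with the hypothetical `walsh` helper) compare a target mediated by A with one that E affects directly, as genomic linkage or pleiotropy would produce.

```python
import math

def walsh(n, k):
    """Hypothetical helper: exactly orthogonal +/-1 patterns for reproducible toy data."""
    return [1.0 if (i >> k) & 1 == 0 else -1.0 for i in range(n)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y conditional on a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def significant(r, n, k, alpha=0.01):
    """Fisher z test on a (partial) correlation with k conditioning variables."""
    r = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2)) < alpha

def iv_call(e, a, b, alpha=0.01):
    """Instrumental-variable style: E is linked to A (local) and to B (distal)."""
    n = len(e)
    return significant(pearson(e, a), n, 0, alpha) and significant(pearson(e, b), n, 0, alpha)

def mediation_call(e, a, b, alpha=0.01):
    """Mediation style: additionally require E independent of B given A."""
    return iv_call(e, a, b, alpha) and not significant(partial_corr(e, b, a), len(e), 1, alpha)

n = 200
e, noise_a, noise_b = walsh(n, 0), walsh(n, 1), walsh(n, 2)
a = [x + u for x, u in zip(e, noise_a)]           # E -> A
b_mediated = [x + v for x, v in zip(a, noise_b)]  # E -> A -> B
b_direct = [x + v for x, v in zip(e, noise_b)]    # E -> B, A has no effect on B

print(iv_call(e, a, b_mediated), mediation_call(e, a, b_mediated))  # True True
print(iv_call(e, a, b_direct), mediation_call(e, a, b_direct))      # True False
```

In the second trio the instrumental-variable call reports B as a target of A even though A has no effect on B, which is the kind of false positive the abstract attributes to linkage between eQTL instruments.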
Affiliation(s)
- Adriaan-Alexander Ludl
- Computational Biology Unit, Department of Informatics, University of Bergen, PO Box 7803, 5020 Bergen, Norway.
12
Blum MGB, Valeri L, François O, Cadiou S, Siroux V, Lepeule J, Slama R. Challenges Raised by Mediation Analysis in a High-Dimension Setting. Environ Health Perspect 2020; 128:55001. [PMID: 32379489 PMCID: PMC7263455 DOI: 10.1289/ehp6240] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/17/2019] [Revised: 04/14/2020] [Accepted: 04/15/2020] [Indexed: 05/19/2023]
Abstract
BACKGROUND: Mediation analysis is used in epidemiology to identify pathways through which exposures influence health. The advent of high-throughput (omics) technologies gives opportunities to perform mediation analysis with a high-dimension pool of covariates. OBJECTIVE: We aimed to highlight some biostatistical issues of this expanding field of high-dimension mediation. DISCUSSION: The mediation techniques used for a single mediator cannot be generalized in a straightforward manner to high-dimension mediation. Causal knowledge of the relations between covariates is required for mediation analysis, and it is expected to be more limited as dimension and system complexity increase. The methods developed in high dimension can be distinguished according to whether mediators are considered separately or as a whole. Methods considering each potential mediator separately do not allow efficient identification of the indirect effects when mutual influences exist among the mediators, which is expected for many biological (e.g., epigenetic) parameters. In this context, methods considering all potential mediators simultaneously, based, for example, on data reduction techniques, are better suited to the causal inference framework. Their cost is a possible inability to single out the causal mediators. Moreover, the ability of the mediators to predict the outcome can be overestimated, in particular because many machine-learning algorithms are optimized for predictive ability rather than for causal inference. Given the lack of an overarching validated framework and the generally complex causal structure of high-dimension data, analysis of high-dimension mediation currently requires great caution and effort to incorporate a priori biological knowledge. https://doi.org/10.1289/EHP6240.
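A common single-mediator technique that, per the abstract, does not generalize directly to high dimensions is the product-of-coefficients decomposition. Assuming linear models, no exposure-mediator interaction, and no unmeasured confounding, the indirect effect of exposure X through mediator M on outcome Y is a·b, where a is the coefficient of X in the regression of M on X and b is the coefficient of M in the regression of Y on X and M; the direct effect c' is the coefficient of X in that second regression. A minimal Python sketch with hypothetical toy data (the `walsh` helper is also hypothetical and only makes the noise exactly orthogonal):

```python
def walsh(n, k):
    """Hypothetical helper: exactly orthogonal +/-1 patterns for reproducible toy data."""
    return [1.0 if (i >> k) & 1 == 0 else -1.0 for i in range(n)]

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def mediation_effects(x, m, y):
    """Single-mediator decomposition from three linear OLS fits.
    Returns (total effect c, direct effect c', indirect effect a*b)."""
    x, m, y = center(x), center(m), center(y)
    c_total = dot(x, y) / dot(x, x)  # Y ~ X
    a = dot(x, m) / dot(x, x)        # M ~ X
    # Y ~ X + M: solve the 2x2 normal equations by Cramer's rule
    sxx, sxm, smm = dot(x, x), dot(x, m), dot(m, m)
    sxy, smy = dot(x, y), dot(m, y)
    det = sxx * smm - sxm ** 2
    c_direct = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return c_total, c_direct, a * b

n = 200
p0, p1, p2 = walsh(n, 0), walsh(n, 1), walsh(n, 2)
exposure = p0
mediator = [0.5 * xi + ei for xi, ei in zip(exposure, p1)]  # true a = 0.5
outcome = [0.3 * xi + 0.8 * mi + ei
           for xi, mi, ei in zip(exposure, mediator, p2)]   # true c' = 0.3, b = 0.8
total, direct, indirect = mediation_effects(exposure, mediator, outcome)
print(round(total, 3), round(direct, 3), round(indirect, 3))  # 0.7 0.3 0.4
```

Here the total effect splits as c = c' + a·b; running this calculation one mediator at a time when many mediators influence each other is the approach the abstract cautions against.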
Affiliation(s)
- Michaël G B Blum
- Laboratoire Techniques de l'Imagerie Médicale et de la Complexité (TIMC-IMAG; UMR 5525), French National Centre for Scientific Research (CNRS), University Grenoble Alpes, La Tronche, France
- OWKIN, Paris, France
- Linda Valeri
- Department of Biostatistics, Columbia University Mailman School of Public Health, New York, New York, USA
- Olivier François
- Laboratoire Techniques de l'Imagerie Médicale et de la Complexité (TIMC-IMAG; UMR 5525), French National Centre for Scientific Research (CNRS), University Grenoble Alpes, La Tronche, France
- Solène Cadiou
- Team of Environmental Epidemiology applied to Reproduction and Respiratory Health, Institute for Advanced Biosciences (IAB) joint research center, Institut national de la santé et de la recherche médicale (Inserm), CNRS, University Grenoble-Alpes, Grenoble, France
- Valérie Siroux
- Team of Environmental Epidemiology applied to Reproduction and Respiratory Health, Institute for Advanced Biosciences (IAB) joint research center, Institut national de la santé et de la recherche médicale (Inserm), CNRS, University Grenoble-Alpes, Grenoble, France
- Johanna Lepeule
- Team of Environmental Epidemiology applied to Reproduction and Respiratory Health, Institute for Advanced Biosciences (IAB) joint research center, Institut national de la santé et de la recherche médicale (Inserm), CNRS, University Grenoble-Alpes, Grenoble, France
- Rémy Slama
- Team of Environmental Epidemiology applied to Reproduction and Respiratory Health, Institute for Advanced Biosciences (IAB) joint research center, Institut national de la santé et de la recherche médicale (Inserm), CNRS, University Grenoble-Alpes, Grenoble, France
13
Wang L, Audenaert P, Michoel T. High-Dimensional Bayesian Network Inference From Systems Genetics Data Using Genetic Node Ordering. Front Genet 2019; 10:1196. [PMID: 31921278 PMCID: PMC6933017 DOI: 10.3389/fgene.2019.01196] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/03/2019] [Accepted: 10/29/2019] [Indexed: 11/23/2022]
Abstract
Studying the impact of genetic variation on gene regulatory networks is essential to understand the biological mechanisms by which genetic variation causes variation in phenotypes. Bayesian networks provide an elegant statistical approach for multi-trait genetic mapping and modelling causal trait relationships. However, inferring Bayesian gene networks from high-dimensional genetics and genomics data is challenging, because the number of possible networks scales super-exponentially with the number of nodes, and the computational cost of conventional Bayesian network inference methods quickly becomes prohibitive. We propose an alternative method to infer high-quality Bayesian gene networks that easily scales to thousands of genes. Our method first reconstructs a node ordering by conducting pairwise causal inference tests between genes, which then allows a Bayesian network to be inferred via a series of independent variable selection problems, one for each gene. We demonstrate using simulated and real systems genetics data that this results in a Bayesian network with equal, and sometimes better, likelihood than the conventional methods, while having a significantly higher overlap with ground-truth networks and being orders of magnitude faster. Moreover, our method allows for a unified false discovery rate control across genes and individual edges, and thus a rigorous and easily interpretable way to tune the sparsity level of the inferred network. Bayesian network inference using pairwise node ordering is a highly efficient approach for reconstructing gene regulatory networks when prior information for the inclusion of edges exists or can be inferred from the available data.
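Once a node ordering is fixed, a directed acyclic graph can be learned one gene at a time by choosing each gene's parents only among genes that precede it in the ordering; acyclicity is then guaranteed and the per-gene problems are independent. The Python sketch below illustrates only that decomposition, assuming the ordering is already given. It substitutes a simple greedy forward selection on partial correlations with a fixed, hypothetical `threshold` for the paper's own variable selection and false discovery rate control, which are not reproduced here; `walsh` is a hypothetical helper for exactly orthogonal toy noise.

```python
import math

def walsh(n, k):
    """Hypothetical helper: exactly orthogonal +/-1 patterns for reproducible toy data."""
    return [1.0 if (i >> k) & 1 == 0 else -1.0 for i in range(n)]

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(v, basis):
    """Remove from v its components along each orthonormal vector in basis."""
    out = list(v)
    for q in basis:
        c = dot(out, q)
        out = [a - c * b for a, b in zip(out, q)]
    return out

def abs_corr(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu < 1e-12 or nv < 1e-12:
        return 0.0
    return abs(dot(u, v)) / (nu * nv)

def select_parents(data, order, threshold=0.3):
    """For each node, greedily add predecessors (in `order`) whose partial
    correlation with the node, given parents chosen so far, exceeds threshold."""
    parents = {}
    for pos, node in enumerate(order):
        y = center(data[node])
        basis, chosen = [], []          # orthonormal span of chosen parents
        candidates = list(order[:pos])  # only earlier nodes: no cycles possible
        while candidates:
            resid = project_out(y, basis)
            scored = []
            for c in candidates:
                xc = project_out(center(data[c]), basis)
                scored.append((abs_corr(resid, xc), c, xc))
            best_r, best, best_x = max(scored, key=lambda t: t[0])
            if best_r < threshold:
                break
            norm = math.sqrt(dot(best_x, best_x))
            basis.append([a / norm for a in best_x])
            chosen.append(best)
            candidates.remove(best)
        parents[node] = chosen
    return parents

n = 256
p = [walsh(n, k) for k in range(4)]
g0 = p[0]
g1 = [a + b for a, b in zip(g0, p[1])]               # g0 -> g1
g2 = [a + b for a, b in zip(g1, p[2])]               # g1 -> g2
g3 = [a + b + c for a, b, c in zip(g0, g2, p[3])]    # g0 -> g3 <- g2
data = {"g0": g0, "g1": g1, "g2": g2, "g3": g3}
parents = select_parents(data, ["g0", "g1", "g2", "g3"])
print(parents)  # {'g0': [], 'g1': ['g0'], 'g2': ['g1'], 'g3': ['g2', 'g0']}
```

Because each node's selection only looks at its own predecessors, the loop body could run in parallel across genes, which is the source of the scalability the abstract describes.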
Affiliation(s)
- Lingfei Wang
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, United Kingdom
- Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, United States
- Pieter Audenaert
- IDLab, Ghent University—imec, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Tom Michoel
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, United Kingdom
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
14
Bucur IG, Claassen T, Heskes T. Inferring the direction of a causal link and estimating its effect via a Bayesian Mendelian randomization approach. Stat Methods Med Res 2019; 29:1081-1111. [PMID: 31146640 PMCID: PMC7221461 DOI: 10.1177/0962280219851817] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/17/2022]
Abstract
The use of genetic variants as instrumental variables, an approach known as Mendelian randomization, is a popular epidemiological method for estimating the causal effect of an exposure (phenotype, biomarker, risk factor) on a disease or health-related outcome from observational data. Instrumental variables must satisfy strong, often untestable assumptions, which means that finding good genetic instruments among a large list of potential candidates is challenging. This difficulty is compounded by the fact that many genetic variants influence more than one phenotype through different causal pathways, a phenomenon called horizontal pleiotropy. This leads to errors not only in estimating the magnitude of the causal effect but also in inferring the direction of the putative causal link. In this paper, we propose a Bayesian approach called BayesMR that is a generalization of the Mendelian randomization technique in which we allow for pleiotropic effects and, crucially, for the possibility of reverse causation. The output of the method is a posterior distribution over the target causal effect, which provides an immediate and easily interpretable measure of the uncertainty in the estimation. More importantly, we use Bayesian model averaging to determine how much more likely the inferred direction is relative to the reverse direction.
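For contrast, the conventional Mendelian randomization estimate that BayesMR generalizes can be computed from summary statistics: each variant's Wald ratio divides its association with the outcome (beta_GY) by its association with the exposure (beta_GX), and the fixed-effect inverse-variance-weighted (IVW) estimate combines these ratios across variants. A minimal Python sketch, assuming independent variants, no pleiotropy, and first-order standard errors; it models neither reverse causation nor pleiotropy as BayesMR does, and the summary statistics are made up for illustration.

```python
import math

def wald_ratio(beta_gx, beta_gy):
    """Causal effect estimate from a single genetic instrument."""
    return beta_gy / beta_gx

def ivw(beta_gx, beta_gy, se_gy):
    """Fixed-effect IVW estimate and its first-order standard error.
    Equivalent to averaging Wald ratios with weights beta_gx**2 / se_gy**2."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_gx, beta_gy, se_gy))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_gx, se_gy))
    return num / den, 1 / math.sqrt(den)

# Toy (made-up) summary statistics for three variants
beta_gx = [0.10, 0.20, 0.30]
beta_gy = [0.05, 0.10, 0.15]
se_y = [0.01, 0.02, 0.01]
estimate, std_err = ivw(beta_gx, beta_gy, se_y)
print(round(estimate, 3), round(std_err, 4))  # 0.5 0.0302
```

If a variant acts on the outcome through a pathway other than the exposure, its Wald ratio is biased and IVW inherits that bias, which is why methods such as BayesMR model pleiotropy explicitly.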
Affiliation(s)
- Ioan Gabriel Bucur
- Data Science Department, Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
- Tom Claassen
- Data Science Department, Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
- Tom Heskes
- Data Science Department, Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
15
Badsha MB, Fu AQ. Learning Causal Biological Networks With the Principle of Mendelian Randomization. Front Genet 2019; 10:460. [PMID: 31164902 PMCID: PMC6536645 DOI: 10.3389/fgene.2019.00460] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 02/13/2019] [Accepted: 04/30/2019] [Indexed: 01/09/2023]
Abstract
Although large amounts of genomic data are available, it remains a challenge to reliably infer causal (i.e., regulatory) relationships among molecular phenotypes (such as gene expression), especially when multiple phenotypes are involved. We extend the interpretation of the principle of Mendelian randomization (PMR) and present MRPC, a novel machine learning algorithm that incorporates the PMR in the PC algorithm, a classical algorithm for learning causal graphs in computer science. MRPC learns a causal biological network efficiently and robustly by integrating individual-level genotype and molecular phenotype data, in which directed edges indicate causal directions. We demonstrate through simulation that MRPC outperforms several popular general-purpose network inference methods and PMR-based methods. We apply MRPC to distinguish direct and indirect targets among multiple genes associated with expression quantitative trait loci. Our method is implemented in the R package MRPC, available on CRAN (https://cran.r-project.org/web/packages/MRPC/index.html).
Affiliation(s)
- Md. Bahadur Badsha
- Department of Statistical Science, Center for Modeling Complex Interactions, Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, United States
- Audrey Qiuyan Fu
- Department of Statistical Science, Center for Modeling Complex Interactions, Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, United States
16
Whole-Transcriptome Causal Network Inference with Genomic and Transcriptomic Data. Methods Mol Biol 2018. [PMID: 30547397 DOI: 10.1007/978-1-4939-8882-2_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Reconstruction of causal gene networks can distinguish regulators from targets and reduce false positives by integrating genetic variations. Recent developments in its speed and accuracy have enabled whole-transcriptome causal network inference on a personal computer. Here, we demonstrate this technique with the program Findr on 3000 genes from the Geuvadis dataset. Subsequent analysis reveals major hub genes in the reconstructed network.
17
Vipin D, Wang L, Devailly G, Michoel T, Joshi A. Causal Transcription Regulatory Network Inference Using Enhancer Activity as a Causal Anchor. Int J Mol Sci 2018; 19:ijms19113609. [PMID: 30445760 PMCID: PMC6274755 DOI: 10.3390/ijms19113609] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/18/2018] [Revised: 11/05/2018] [Accepted: 11/08/2018] [Indexed: 02/08/2023]
Abstract
Transcription control plays a crucial role in establishing a unique gene expression signature for each of the hundreds of mammalian cell types. Though gene expression data have been widely used to infer cellular regulatory networks, existing methods mainly infer correlations rather than causality. We developed statistical models and likelihood-ratio tests to infer causal gene regulatory networks using enhancer RNA (eRNA) expression information as a causal anchor and applied the framework to eRNA and transcript expression data from the FANTOM Consortium. Predicted causal targets of transcription factors (TFs) in mouse embryonic stem cells, macrophages and erythroblastic leukaemia overlapped significantly with experimentally validated targets from ChIP-seq and perturbation data. We further improved the model by taking into account that some TFs might act in a quantitative, dosage-dependent manner, whereas others might act predominantly in a binary on/off fashion. We predicted TF targets from concerted variation of eRNA and TF and target promoter expression levels within a single cell type, as well as across multiple cell types. Importantly, TFs with high-confidence predictions were largely different between these two analyses, demonstrating that variability within a cell type is highly relevant for target prediction of cell type-specific factors. Finally, we generated a compendium of high-confidence TF targets across diverse human cell and tissue types.
Affiliation(s)
- Deepti Vipin
- Division of Developmental Biology, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG Scotland, UK.
- Lingfei Wang
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG Scotland, UK.
- Guillaume Devailly
- Division of Developmental Biology, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG Scotland, UK.
- Tom Michoel
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG Scotland, UK.
- Computational Biology Unit, Department of Informatics, University of Bergen, DataBlokk, 5th Floor, Thormohlensgt 55, N-5008 Bergen, Norway.
- Anagha Joshi
- Division of Developmental Biology, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG Scotland, UK.
- Computational Biology Unit, Department of Clinical Science, University of Bergen, DataBlokk, 5th Floor, Thormohlensgt 55, N-5008 Bergen, Norway.
18
Hemani G, Bowden J, Davey Smith G. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Hum Mol Genet 2018; 27:R195-R208. [PMID: 29771313 PMCID: PMC6061876 DOI: 10.1093/hmg/ddy163] [Citation(s) in RCA: 736] [Impact Index Per Article: 122.7] [Received: 04/26/2018] [Revised: 04/26/2018] [Accepted: 04/30/2018] [Indexed: 02/06/2023]
Abstract
Pleiotropy, the phenomenon of a single genetic variant influencing multiple traits, is likely widespread in the human genome. If pleiotropy arises because the single nucleotide polymorphism (SNP) influences one trait, which in turn influences another ('vertical pleiotropy'), then Mendelian randomization (MR) can be used to estimate the causal influence between the traits. Of prime focus among the many limitations to MR is the unprovable assumption that apparent pleiotropic associations are mediated by the exposure (i.e. reflect vertical pleiotropy), and do not arise due to SNPs influencing the two traits through independent pathways ('horizontal pleiotropy'). The burgeoning treasure trove of genetic associations yielded through genome wide association studies makes for a tantalizing prospect of phenome-wide causal inference. Recent years have seen substantial attention devoted to the problem of horizontal pleiotropy, and in this review we outline how newly developed methods can be used together to improve the reliability of MR.
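One widely used pleiotropy-robust approach in this family is MR-Egger regression: after orienting each variant so that its exposure association is positive, the variant-outcome associations are regressed on the variant-exposure associations with an intercept, weighting by inverse outcome variance. Under the InSIDE assumption (pleiotropic effects independent of instrument strength), a non-zero intercept signals directional horizontal pleiotropy and the slope estimates the causal effect. A minimal Python sketch of the point estimates only (no standard errors or tests), with made-up summary statistics:

```python
def mr_egger(beta_gx, beta_gy, se_gy):
    """Weighted MR-Egger point estimates: returns (intercept, slope)."""
    # Orient variants so every exposure association is positive
    x = [abs(bx) for bx in beta_gx]
    y = [by if bx >= 0 else -by for bx, by in zip(beta_gx, beta_gy)]
    w = [1.0 / s ** 2 for s in se_gy]  # inverse-variance weights
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Made-up data: oriented beta_gy = 0.02 + 0.5 * |beta_gx| (directional pleiotropy 0.02)
beta_gx = [0.10, -0.20, 0.30, 0.15]
beta_gy = [0.07, -0.12, 0.17, 0.095]
se_gy = [0.01, 0.02, 0.01, 0.015]
intercept, slope = mr_egger(beta_gx, beta_gy, se_gy)
print(round(intercept, 3), round(slope, 3))  # 0.02 0.5
```

The orientation step matters because the intercept is only interpretable once every variant's exposure association points in the same direction.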
Affiliation(s)
- Gibran Hemani
- MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol
- Jack Bowden
- MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol
- George Davey Smith
- MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol