1
|
Peng J, Bao Z, Li J, Han R, Wang Y, Han L, Peng J, Wang T, Hao J, Wei Z, Shang X. DeepRisk: A deep learning approach for genome-wide assessment of common disease risk. Fundamental Research 2024; 4:752-760. [PMID: 39156563 PMCID: PMC11330112 DOI: 10.1016/j.fmre.2024.02.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 02/02/2024] [Accepted: 02/25/2024] [Indexed: 08/20/2024] Open
Abstract
The potential to identify individuals at high disease risk based solely on genotype data has garnered significant interest. Although widely applied, traditional polygenic risk scoring (PRS) methods fall short, as they are built on additive models that fail to capture the intricate associations among single nucleotide polymorphisms (SNPs). This is a limitation, as genetic diseases often arise from complex interactions between multiple SNPs. To address this challenge, we developed DeepRisk, a biological knowledge-driven deep learning method that models these complex, nonlinear associations among SNPs to score the risk of common diseases more effectively from genome-wide genotype data. Evaluations demonstrated that DeepRisk outperforms existing PRS-based methods in identifying individuals at high risk for four common diseases: Alzheimer's disease, inflammatory bowel disease, type 2 diabetes, and breast cancer.
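As an illustration of the additive model that this abstract contrasts with DeepRisk, the sketch below scores one individual as a weighted sum of allele counts, then adds a single pairwise interaction term to show what a purely additive score leaves out. It is not the DeepRisk method; the function names, weights, and genotypes are hypothetical toy values chosen for the example.

```python
def additive_prs(genotype, weights):
    """Additive polygenic score: sum of (allele count x per-SNP weight)."""
    return sum(g * w for g, w in zip(genotype, weights))


def prs_with_interaction(genotype, weights, pair, interaction_weight):
    """Additive score plus one SNP-SNP product term (a toy non-additive effect)."""
    i, j = pair
    return additive_prs(genotype, weights) + interaction_weight * genotype[i] * genotype[j]


# Hypothetical per-SNP weights and allele counts (0, 1, or 2 copies of the risk allele).
toy_weights = [0.2, -0.1, 0.5]
toy_genotype = [1, 1, 1]

additive_score = additive_prs(toy_genotype, toy_weights)
interaction_score = prs_with_interaction(toy_genotype, toy_weights, (0, 2), 0.3)

# The two scores differ only by the interaction term, which an additive PRS cannot represent.
print(additive_score, interaction_score)
```

A deep learning model replaces the hand-written product term with learned nonlinear combinations of many SNPs; the sketch only shows why such terms fall outside an additive score.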
Affiliation(s)
- Jiajie Peng
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710129, China
- Research and Development Institute of Northwestern Polytechnical University in Shenzhen, Shenzhen 518000, China
| | - Zhijie Bao
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710129, China
| | - Jingyi Li
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710129, China
| | - Ruijiang Han
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710129, China
| | - Yuxian Wang
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710129, China
| | - Lu Han
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710129, China
| | - Jinghao Peng
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710129, China
| | - Tao Wang
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710129, China
| | - Jianye Hao
- College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
| | - Zhongyu Wei
- School of Data Science, Fudan University, Shanghai 200433, China
| | - Xuequn Shang
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710129, China
| |
|
2
|
von Maydell D, Wright S, Bonner JM, Staab C, Spitaleri A, Liu L, Pao PC, Yu CJ, Scannail AN, Li M, Boix CA, Mathys H, Leclerc G, Menchaca GS, Welch G, Graziosi A, Leary N, Samaan G, Kellis M, Tsai LH. Single-cell atlas of ABCA7 loss-of-function reveals impaired neuronal respiration via choline-dependent lipid imbalances. bioRxiv 2024:2023.09.05.556135. [PMID: 38979214 PMCID: PMC11230156 DOI: 10.1101/2023.09.05.556135] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/10/2024]
Abstract
Loss-of-function (LoF) variants in the lipid transporter ABCA7 significantly increase the risk of Alzheimer's disease (odds ratio ∼2), yet the pathogenic mechanisms and the neural cell types affected by these variants remain largely unknown. Here, we performed single-nuclear RNA sequencing of 36 human post-mortem samples from the prefrontal cortex of 12 ABCA7 LoF carriers and 24 matched non-carrier control individuals. ABCA7 LoF was associated with gene expression changes in all major cell types. Excitatory neurons, which expressed the highest levels of ABCA7, showed transcriptional changes related to lipid metabolism, mitochondrial function, cell cycle-related pathways, and synaptic signaling. ABCA7 LoF-associated transcriptional changes in neurons were similarly perturbed in carriers of the common AD missense variant ABCA7 p.Ala1527Gly (n = 240 controls, 135 carriers), indicating that findings from our study may extend to large portions of the at-risk population. Consistent with ABCA7's function as a lipid exporter, lipidomic analysis of isogenic iPSC-derived neurons (iNs) revealed profound intracellular triglyceride accumulation in ABCA7 LoF, which was accompanied by a relative decrease in phosphatidylcholine abundance. Metabolomic and biochemical analyses of iNs further indicated that ABCA7 LoF was associated with disrupted mitochondrial bioenergetics that suggested impaired lipid breakdown by uncoupled respiration. Treatment of ABCA7 LoF iNs with CDP-choline (a rate-limiting precursor of phosphatidylcholine synthesis) reduced triglyceride accumulation and restored mitochondrial function, indicating that ABCA7 LoF-induced phosphatidylcholine dyshomeostasis may directly disrupt mitochondrial metabolism of lipids. Treatment with CDP-choline also rescued intracellular amyloid β-42 levels in ABCA7 LoF iNs, further suggesting a link between ABCA7 LoF metabolic disruptions in neurons and AD pathology.
This study provides a detailed transcriptomic atlas of ABCA7 LoF in the human brain and mechanistically links ABCA7 LoF-induced lipid perturbations to neuronal energy dyshomeostasis. In line with a growing body of evidence, our study highlights the central role of lipid metabolism in the etiology of Alzheimer's disease.
|
3
|
Bass AJ, Bian S, Wingo AP, Wingo TS, Cutler DJ, Epstein MP. Identifying latent genetic interactions in genome-wide association studies using multiple traits. Genome Med 2024; 16:62. [PMID: 38664839 PMCID: PMC11044415 DOI: 10.1186/s13073-024-01329-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 04/02/2024] [Indexed: 04/28/2024] Open
Abstract
The "missing" heritability of complex traits may be partly explained by genetic variants interacting with other genes or environments that are difficult to specify, observe, and detect. We propose a new kernel-based method called Latent Interaction Testing (LIT) to screen for genetic interactions that leverages pleiotropy from multiple related traits without requiring the interacting variable to be specified or observed. Using simulated data, we demonstrate that LIT increases power to detect latent genetic interactions compared to univariate methods. We then apply LIT to obesity-related traits in the UK Biobank and detect variants with interactive effects near known obesity-related genes. LIT is available at https://CRAN.R-project.org/package=lit.
Affiliation(s)
- Andrew J Bass
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA.
| | - Shijia Bian
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
| | - Aliza P Wingo
- Department of Psychiatry, Emory University, Atlanta, GA, 30322, USA
| | - Thomas S Wingo
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA
- Department of Neurology, Emory University, Atlanta, GA, 30322, USA
| | - David J Cutler
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA
| | - Michael P Epstein
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA.
| |
|
4
|
Yuan W, Beitel F, Srikant T, Bezrukov I, Schäfer S, Kraft R, Weigel D. Pervasive under-dominance in gene expression underlying emergent growth trajectories in Arabidopsis thaliana hybrids. Genome Biol 2023; 24:200. [PMID: 37667232 PMCID: PMC10478501 DOI: 10.1186/s13059-023-03043-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 08/21/2023] [Indexed: 09/06/2023] Open
Abstract
BACKGROUND: Complex traits, such as growth and fitness, are typically controlled by a very large number of variants, which can interact in both additive and non-additive fashion. In an attempt to gauge the relative importance of both types of genetic interactions, we turn to hybrids, which provide a facile means for creating many novel allele combinations.
RESULTS: We focus on the interaction between alleles of the same locus, i.e., dominance, and perform a transcriptomic study involving 141 random crosses between different accessions of the plant model species Arabidopsis thaliana. Additivity is rare, consistently observed for only about 300 genes enriched for roles in stress response and cell death. Regulatory rare-allele burden affects the expression level of these genes but does not correlate with F1 rosette size. Non-additive, dominant gene expression in F1 hybrids is much more common, with the vast majority of genes (over 90%) being expressed below the parental average. Unlike in the additive genes, regulatory rare-allele burden in the dominant gene set is strongly correlated with F1 rosette size, even though it only mildly covaries with the expression level of these genes.
CONCLUSIONS: Our study underscores under-dominance as the predominant gene action associated with emergence of rosette growth trajectories in the A. thaliana hybrid model. Our work lays the foundation for understanding molecular mechanisms and evolutionary forces that lead to dominance complementation of rare regulatory alleles.
Affiliation(s)
- Wei Yuan
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076, Tübingen, Germany
| | - Fiona Beitel
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076, Tübingen, Germany
| | - Thanvi Srikant
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076, Tübingen, Germany
| | - Ilja Bezrukov
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076, Tübingen, Germany
| | - Sabine Schäfer
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076, Tübingen, Germany
| | - Robin Kraft
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076, Tübingen, Germany
| | - Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076, Tübingen, Germany.
| |
|
5
|
Ang RML, Chen SAA, Kern AF, Xie Y, Fraser HB. Widespread epistasis among beneficial genetic variants revealed by high-throughput genome editing. Cell Genomics 2023; 3:100260. [PMID: 37082144 PMCID: PMC10112194 DOI: 10.1016/j.xgen.2023.100260] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 06/06/2022] [Revised: 09/27/2022] [Accepted: 01/06/2023] [Indexed: 04/22/2023]
Abstract
The phenotypic effect of any genetic variant can be altered by variation at other genomic loci. Known as epistasis, these genetic interactions shape the genotype-phenotype map of every species, yet their origins remain poorly understood. To investigate this, we employed high-throughput genome editing to measure the fitness effects of 1,826 naturally polymorphic variants in four strains of Saccharomyces cerevisiae. About 31% of variants affect fitness, of which 24% have strain-specific fitness effects indicative of epistasis. We found that beneficial variants are more likely to exhibit genetic interactions and that these interactions can be mediated by specific traits such as flocculation ability. This work suggests that adaptive evolution will often involve trade-offs where a variant is only beneficial in some genetic backgrounds, potentially explaining why many beneficial variants remain polymorphic. In sum, we provide a framework to understand the factors influencing epistasis with single-nucleotide resolution, revealing widespread epistasis among beneficial variants.
Affiliation(s)
- Roy Moh Lik Ang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Shi-An A. Chen
- Department of Biology, Stanford University, Stanford, CA 94305, USA
| | - Alexander F. Kern
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Yihua Xie
- Department of Biology, Stanford University, Stanford, CA 94305, USA
| | - Hunter B. Fraser
- Department of Biology, Stanford University, Stanford, CA 94305, USA
| |
|
6
|
Evidence for Epistatic Interaction between HLA-G and LILRB1 in the Pathogenesis of Nonsegmental Vitiligo. Cells 2023; 12:cells12040630. [PMID: 36831297 PMCID: PMC9954564 DOI: 10.3390/cells12040630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 12/31/2022] [Accepted: 01/29/2023] [Indexed: 02/18/2023] Open
Abstract
Vitiligo is the most frequent cause of depigmentation worldwide. Genetic association studies have discovered about 50 loci associated with disease, many with immunological functions. Among them is HLA-G, which modulates immunity by interacting with specific inhibitory receptors, mainly LILRB1 and LILRB2. Here we investigated the LILRB1 and LILRB2 association with vitiligo risk and evaluated the possible role of interactions between HLA-G and its receptors in this pathogenesis. We tested the association of the polymorphisms of HLA-G, LILRB1, and LILRB2 with vitiligo using logistic regression along with adjustment by ancestry. Further, methods based on the multifactor dimensionality reduction (MDR) approach (MDR v.3.0.2, GMDR v.0.9, and MB-MDR) were used to detect potential epistatic interactions between polymorphisms from the three genes. An interaction involving rs9380142 and rs2114511 polymorphisms was identified by all methods used. The polymorphism rs9380142 is an HLA-G 3'UTR variant (+3187) with a well-established role in mRNA stability. The polymorphism rs2114511 is located in the exonic region of LILRB1. Although no association involving this SNP has been reported, ChIP-Seq experiments have identified this position as an EBF1 binding site. These results highlight the role of an epistatic interaction between HLA-G and LILRB1 in vitiligo pathogenesis.
|
7
|
Graph pangenome captures missing heritability and empowers tomato breeding. Nature 2022; 606:527-534. [PMID: 35676474 PMCID: PMC9200638 DOI: 10.1038/s41586-022-04808-9] [Citation(s) in RCA: 135] [Impact Index Per Article: 67.5] [Received: 10/04/2021] [Accepted: 04/27/2022] [Indexed: 12/20/2022]
Abstract
Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits [1,2]. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions [3,4]. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
A precise catalogue of more than 19 million variants from 838 tomato genomes, including 32 new reference-level genome assemblies, advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
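The 24% figure quoted in this abstract is the relative increase between the two average heritability estimates, not the absolute difference. A minimal check of that arithmetic, using only the two values stated in the abstract (the variable names are illustrative):

```python
# Average estimated trait heritability reported in the abstract.
h2_linear_reference = 0.33  # single linear reference genome
h2_graph_pangenome = 0.41   # graph pangenome

# Absolute gain in heritability and the gain relative to the linear-reference baseline.
absolute_gain = h2_graph_pangenome - h2_linear_reference
relative_gain = absolute_gain / h2_linear_reference

print(f"absolute gain: {absolute_gain:.2f}")
print(f"relative gain: {relative_gain:.1%}")
```

The relative gain rounds to the 24% stated in the abstract, while the absolute gain is 0.08 on the heritability scale.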
|
8
|
Hartmann K, Seweryn M, Sadee W. Interpreting coronary artery disease GWAS results: A functional genomics approach assessing biological significance. PLoS One 2022; 17:e0244904. [PMID: 35192625 PMCID: PMC8863290 DOI: 10.1371/journal.pone.0244904] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/16/2020] [Accepted: 01/01/2022] [Indexed: 01/09/2023] Open
Abstract
Genome-wide association studies (GWAS) have implicated 58 loci in coronary artery disease (CAD). However, the biological basis for these associations, the relevant genes, and causative variants often remain uncertain. Since the vast majority of GWAS loci reside outside coding regions, most exert regulatory functions. Here we explore the complexity of each of these loci, using tissue-specific RNA sequencing data from GTEx to identify genes that exhibit altered expression patterns in the context of GWAS-significant loci, expanding the list of candidate genes from the 75 currently annotated by GWAS to 245, with almost half of these transcripts being non-coding. Tissue-specific allelic expression imbalance data, also from GTEx, allow us to uncover GWAS variants that mark functional variation in a locus, e.g., rs7528419 residing in the SORT1 locus, in liver specifically, and rs72689147 in the GUCY1A1 locus, across a variety of tissues. We consider the GWAS variant rs1412444 in the LIPA locus in more detail as an example, probing tissue- and transcript-specific effects of genetic variation in the region. By evaluating linkage disequilibrium (LD) between tissue-specific eQTLs, we reveal evidence for multiple functional variants within loci. We identify 3 variants (rs1412444, rs1051338, rs2250781) that, when considered together, each improve the ability to account for LIPA gene expression, suggesting multiple interacting factors. These results refine the assignment of 58 GWAS loci to likely causative variants in a handful of cases and, for the remainder, help to re-prioritize associated genes and RNA isoforms, suggesting that ncRNAs may be relevant transcripts in almost half of CAD GWAS results. Our findings support a multi-factorial system where a single variant can influence multiple genes and each gene is regulated by multiple variants.
Affiliation(s)
- Katherine Hartmann
- Department of Cancer Biology and Genetics, Center for Pharmacogenomics, College of Medicine, The Ohio State University, Columbus, OH, United States of America
| | - Michał Seweryn
- Biobank Lab, Department of Molecular Biophysics, University of Lodz, Lodz, Poland
| | - Wolfgang Sadee
- Department of Cancer Biology and Genetics, Center for Pharmacogenomics, College of Medicine, The Ohio State University, Columbus, OH, United States of America
| |
|
9
|
Ogbunugafor CB. The mutation effect reaction norm (mu-rn) highlights environmentally dependent mutation effects and epistatic interactions. Evolution 2022; 76:37-48. [PMID: 34989399 DOI: 10.1111/evo.14428] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/23/2021] [Accepted: 12/23/2021] [Indexed: 11/27/2022]
Abstract
Since the modern synthesis, the fitness effects of mutations and epistasis have been central yet provocative concepts in evolutionary and population genetics. Studies of how the interactions between parcels of genetic information can change as a function of environmental context have added a layer of complexity to these discussions. Here I introduce the "mutation effect reaction norm" (Mu-RN), a new instrument through which one can analyze the phenotypic consequences of mutations and interactions across environmental contexts. It embodies the fusion of measurements of genetic interactions with the reaction norm, a classic depiction of the performance of genotypes across environments. I demonstrate the utility of the Mu-RN through the signature of a "compensatory ratchet" mutation that undermines reverse evolution of antimicrobial resistance. More broadly, I argue that the mutation effect reaction norm may help us resolve the dynamism and unpredictability of evolution, with implications for theoretical biology, genetic modification technology, and public health.
Affiliation(s)
- C Brandon Ogbunugafor
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, 06520, USA
| |
|
10
|
Ahmed H, Alarabi L, El-Sappagh S, Soliman H, Elmogy M. Genetic variations analysis for complex brain disease diagnosis using machine learning techniques: opportunities and hurdles. PeerJ Comput Sci 2021; 7:e697. [PMID: 34616886 PMCID: PMC8459785 DOI: 10.7717/peerj-cs.697] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Accepted: 08/05/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVES: This paper presents an in-depth review of state-of-the-art genetic variation analysis for discovering genes associated with complex genetic disorders of the brain. We first introduce the genetic analysis of complex brain diseases, genetic variation, and DNA microarrays. Then, the review focuses on available machine learning methods used for complex brain disease classification. Therein, we discuss the various datasets, preprocessing, feature selection and extraction, and classification strategies. In particular, we concentrate on single nucleotide polymorphisms (SNPs), which offer the highest resolution for genomic fingerprinting when tracking disease genes. Subsequently, the study provides an overview of the applications for some specific diseases, including autism spectrum disorder, brain cancer, and Alzheimer's disease (AD). The study argues that despite the significant recent developments in the analysis and treatment of genetic disorders, there are considerable challenges in elucidating causative mutations, especially from the viewpoint of implementing genetic analysis in clinical practice. The review finally provides a critical discussion on the applicability of genetic variation analysis for complex brain disease identification, highlighting the future challenges.
METHODS: We followed a literature-survey methodology to obtain data from academic databases, with defined inclusion and exclusion criteria. Articles were selected in three stages, and the principal machine learning methods for disease classification are presented in more detail at each stage.
RESULTS: Machine learning based on SNPs was found to be widely used to address problems of genetic variation in complex, gene-related diseases.
CONCLUSIONS: Despite significant developments in the diagnosis and treatment of genetic diseases over the past two decades, the causative mutation still cannot be determined in a large percentage of cases, and a final genetic diagnosis remains elusive. Variations in genes related to brain disorders therefore need to be detected in the early stages of disease.
Affiliation(s)
- Hala Ahmed
- Information Technology Department, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
| | - Louai Alarabi
- Department of Computer Science, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Shaker El-Sappagh
- Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Information Systems Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
| | - Hassan Soliman
- Information Technology Department, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
| | - Mohammed Elmogy
- Information Technology Department, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
| |
|
11
|
Slavskii SA, Kuznetsov IA, Shashkova TI, Bazykin GA, Axenovich TI, Kondrashov FA, Aulchenko YS. The limits of normal approximation for adult height. Eur J Hum Genet 2021; 29:1082-1091. [PMID: 33664501 PMCID: PMC8298501 DOI: 10.1038/s41431-021-00836-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2020] [Revised: 01/05/2021] [Accepted: 02/11/2021] [Indexed: 11/14/2022] Open
Abstract
Adult height inspired the first biometrical and quantitative genetic studies and is a test-case trait for understanding heritability. Studies of height led to the formulation of the classical polygenic model, which has had a profound influence on the way we view and analyse complex traits. An essential part of the classical model is the assumption that effects are additive and that the residuals are normally distributed. However, it may be expected that the normal approximation will become insufficient in bigger studies. Here, we demonstrate that when the height of hundreds of thousands of individuals is analysed, the model complexity needs to be increased to include non-additive interactions between sex, environment and genes. Alternatively, the use of a log-normal approximation allowed us to still use the additive effects model. These findings are important for future genetic and methodological studies that make use of adult height as an exemplar trait.
Affiliation(s)
- Sergei A Slavskii
- Skolkovo Institute of Science and Technology, Moscow, Russia
- Novosibirsk State University, Novosibirsk, Russia
- Moscow Institute of Physics and Technology, Moscow, Russia
| | | | - Tatiana I Shashkova
- Novosibirsk State University, Novosibirsk, Russia
- Moscow Institute of Physics and Technology, Moscow, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Moscow, Russia
| | - Georgii A Bazykin
- Skolkovo Institute of Science and Technology, Moscow, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Moscow, Russia
| | - Tatiana I Axenovich
- Novosibirsk State University, Novosibirsk, Russia
- Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
| | | | - Yurii S Aulchenko
- Novosibirsk State University, Novosibirsk, Russia.
- Moscow Institute of Physics and Technology, Moscow, Russia.
- Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia.
- Kurchatov Genomics Center, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia.
- PolyOmica, 's-Hertogenbosch, PA, The Netherlands.
| |
|
12
|
Goldstein I, Ehrenreich IM. The complex role of genetic background in shaping the effects of spontaneous and induced mutations. Yeast 2020; 38:187-196. [PMID: 33125810 PMCID: PMC7984271 DOI: 10.1002/yea.3530] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/27/2020] [Revised: 10/09/2020] [Accepted: 10/24/2020] [Indexed: 12/27/2022] Open
Abstract
Spontaneous and induced mutations frequently show different phenotypic effects across genetically distinct individuals. It is generally appreciated that these background effects mainly result from genetic interactions between the mutations and segregating loci. However, the architectures and molecular bases of these genetic interactions are not well understood. Recent work in a number of model organisms has tried to advance knowledge of background effects both by using large-scale screens to find mutations that exhibit this phenomenon and by identifying the specific loci that are involved. Here, we review this body of research, emphasizing in particular the insights it provides into both the prevalence of background effects across different mutations and the mechanisms that cause these background effects.
A large fraction of mutations show different effects in distinct individuals. These background effects are mainly caused by epistasis with segregating loci. Mapping studies show a diversity of genetic architectures can be involved. Genetically complex changes in gene expression are often, but not always, causative.
Affiliation(s)
- Ilan Goldstein
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, California, 90089-2910, USA
| | - Ian M Ehrenreich
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, California, 90089-2910, USA
| |
|
13
|
Vasilopoulou C, Morris AP, Giannakopoulos G, Duguez S, Duddy W. What Can Machine Learning Approaches in Genomics Tell Us about the Molecular Basis of Amyotrophic Lateral Sclerosis? J Pers Med 2020; 10:E247. [PMID: 33256133 PMCID: PMC7712791 DOI: 10.3390/jpm10040247] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/30/2020] [Revised: 11/21/2020] [Accepted: 11/23/2020] [Indexed: 02/07/2023] Open
Abstract
Amyotrophic Lateral Sclerosis (ALS) is the most common late-onset motor neuron disorder, but our current knowledge of the molecular mechanisms and pathways underlying this disease remains limited. This review (1) systematically identifies machine learning studies aimed at the understanding of the genetic architecture of ALS, (2) outlines the main challenges faced and compares the different approaches that have been used to confront them, and (3) compares the experimental designs and results produced by those approaches and describes their reproducibility in terms of biological results and the performances of the machine learning models. The majority of the collected studies incorporated prior knowledge of ALS into their feature selection approaches, and trained their machine learning models using genomic data combined with other types of mined knowledge including functional associations, protein-protein interactions, disease/tissue-specific information, epigenetic data, and known ALS phenotype-genotype associations. The importance of incorporating gene-gene interactions and cis-regulatory elements into the experimental design of future ALS machine learning studies is highlighted. Lastly, it is suggested that future advances in the genomic and machine learning fields will bring about a better understanding of ALS genetic architecture, and enable improved personalized approaches to this and other devastating and complex diseases.
Affiliation(s)
- Christina Vasilopoulou
- Northern Ireland Centre for Stratified Medicine, Altnagelvin Hospital Campus, Ulster University, Londonderry BT47 6SB, UK
| | - Andrew P. Morris
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, UK
| | - George Giannakopoulos
- Institute of Informatics and Telecommunications, NCSR Demokritos, 153 10 Aghia Paraskevi, Greece
- Science For You (SciFY) PNPC, TEPA Lefkippos-NCSR Demokritos, 27, Neapoleos, 153 41 Ag. Paraskevi, Greece
| | - Stephanie Duguez
- Northern Ireland Centre for Stratified Medicine, Altnagelvin Hospital Campus, Ulster University, Londonderry BT47 6SB, UK
| | - William Duddy
- Northern Ireland Centre for Stratified Medicine, Altnagelvin Hospital Campus, Ulster University, Londonderry BT47 6SB, UK
| |
|
14
|
Besnier F, Solberg MF, Harvey AC, Carvalho GR, Bekkevold D, Taylor MI, Creer S, Nielsen EE, Skaala Ø, Ayllon F, Dahle G, Glover KA. Epistatic regulation of growth in Atlantic salmon revealed: a QTL study performed on the domesticated-wild interface. BMC Genet 2020; 21:13. [PMID: 32033538 PMCID: PMC7006396 DOI: 10.1186/s12863-020-0816-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/10/2019] [Accepted: 01/28/2020] [Indexed: 12/23/2022] Open
Abstract
Background: Quantitative traits are typically considered to be under additive genetic control. Although there are indications that non-additive factors have the potential to contribute to trait variation, experimental demonstration remains scarce. Here, we investigated the genetic basis of growth in Atlantic salmon by exploiting the high level of genetic diversity and trait expression among domesticated, hybrid and wild populations. Results: After rearing fish in common-garden experiments under aquaculture conditions, we performed a variance component analysis in four mapping populations totaling ~7000 individuals from six wild, two domesticated and three F1 wild/domesticated hybrid strains. Across the four independent datasets, genome-wide significant quantitative trait loci (QTLs) associated with weight and length were detected on a total of 18 chromosomes, reflecting the polygenic nature of growth. Significant QTLs correlated with both length and weight were detected on chromosomes 2, 6 and 9 in multiple datasets. Significantly, epistatic QTLs were detected in all datasets. Discussion: The observed interactions demonstrated that the phenotypic effect of inheriting an allele deviated between half-sib families. Gene-by-gene interactions were also suggested, where the combined effect of two loci resulted in a genetic effect upon phenotypic variance, while no genetic effect was detected when the two loci were considered separately. To our knowledge, this is the first documentation of epistasis in a quantitative trait in Atlantic salmon. These novel results are of relevance for breeding programs, and for predicting the evolutionary consequences of domestication-introgression in wild populations.
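The gene-by-gene pattern described in this abstract, where two loci affect the phenotype jointly but show no effect when considered separately, can be illustrated with a toy example. The Python sketch below uses hypothetical data (two biallelic haploid loci and an XOR-style phenotype with made-up values); it is not the variance component analysis used in the study.

```python
from itertools import product
from statistics import mean

# Hypothetical toy model: the phenotype is raised by 1 unit only when the
# alleles at the two loci differ (a pure interaction, XOR-style).
def phenotype(a: int, b: int) -> float:
    return 10.0 + (1.0 if a != b else 0.0)

# One individual per two-locus genotype class (balanced design).
individuals = [(a, b, phenotype(a, b)) for a, b in product((0, 1), repeat=2)]

def marginal_effect(locus: int) -> float:
    """Difference in mean phenotype between alleles 1 and 0 at one locus."""
    with_1 = mean(p for *g, p in individuals if g[locus] == 1)
    with_0 = mean(p for *g, p in individuals if g[locus] == 0)
    return with_1 - with_0

def interaction_effect() -> float:
    """Contrast between discordant and concordant two-locus genotypes."""
    discordant = mean(p for a, b, p in individuals if a != b)
    concordant = mean(p for a, b, p in individuals if a == b)
    return discordant - concordant

print(marginal_effect(0), marginal_effect(1), interaction_effect())
```

In this toy design each locus has a marginal effect of zero, so a single-locus scan would miss both loci, while the two-locus contrast recovers the full one-unit effect.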
Affiliation(s)
- Francois Besnier
- Population Genetics Research group, Institute of Marine Research, P.O. Box 1870, Nordnes, NO-5817, Bergen, Norway.
| | - Monica F Solberg
- Population Genetics Research group, Institute of Marine Research, P.O. Box 1870, Nordnes, NO-5817, Bergen, Norway
| | - Alison C Harvey
- Population Genetics Research group, Institute of Marine Research, P.O. Box 1870, Nordnes, NO-5817, Bergen, Norway; Molecular Ecology and Fisheries Genetics Laboratory, School of Biological Sciences, Bangor University, Deiniol Road, Bangor, LL57 2UW, UK
| | - Gary R Carvalho
- Molecular Ecology and Fisheries Genetics Laboratory, School of Biological Sciences, Bangor University, Deiniol Road, Bangor, LL57 2UW, UK
| | - Dorte Bekkevold
- Section for Marine Living Resources, National Institute of Aquatic Resources, Technical University of Denmark, Vejlsøvej 39, 8600, Silkeborg, Denmark
| | - Martin I Taylor
- School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
| | - Simon Creer
- Molecular Ecology and Fisheries Genetics Laboratory, School of Biological Sciences, Bangor University, Deiniol Road, Bangor, LL57 2UW, UK
| | - Einar E Nielsen
- Section for Marine Living Resources, National Institute of Aquatic Resources, Technical University of Denmark, Vejlsøvej 39, 8600, Silkeborg, Denmark
| | - Øystein Skaala
- Population Genetics Research group, Institute of Marine Research, P.O. Box 1870, Nordnes, NO-5817, Bergen, Norway
| | - Fernando Ayllon
- Population Genetics Research group, Institute of Marine Research, P.O. Box 1870, Nordnes, NO-5817, Bergen, Norway
| | - Geir Dahle
- Population Genetics Research group, Institute of Marine Research, P.O. Box 1870, Nordnes, NO-5817, Bergen, Norway; Sea Lice Research Centre, Department of Biology, University of Bergen, Bergen, Norway
| | - Kevin A Glover
- Population Genetics Research group, Institute of Marine Research, P.O. Box 1870, Nordnes, NO-5817, Bergen, Norway; Sea Lice Research Centre, Department of Biology, University of Bergen, Bergen, Norway
| |
|
15
|
Domínguez-García S, García C, Quesada H, Caballero A. Accelerated inbreeding depression suggests synergistic epistasis for deleterious mutations in Drosophila melanogaster. Heredity (Edinb) 2019; 123:709-722. [PMID: 31477803 PMCID: PMC6834575 DOI: 10.1038/s41437-019-0263-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/08/2019] [Revised: 08/15/2019] [Accepted: 08/18/2019] [Indexed: 01/21/2023] Open
Abstract
Epistasis may have important consequences for a number of issues in quantitative genetics and evolutionary biology. In particular, synergistic epistasis for deleterious alleles is relevant to the mutation load paradox and the evolution of sex and recombination. Some studies have shown evidence of synergistic epistasis for spontaneous or induced deleterious mutations appearing in mutation-accumulation experiments. However, many newly arising mutations may not actually be segregating in natural populations because of the erasing action of natural selection. A demonstration of synergistic epistasis for naturally segregating alleles can be achieved by means of inbreeding depression studies, as deleterious recessive allelic effects are exposed in inbred lines. Nevertheless, evidence of epistasis from these studies is scarce and controversial. In this paper, we report the results of two independent inbreeding experiments carried out with two different populations of Drosophila melanogaster. The results show a consistent accelerated inbreeding depression for fitness, suggesting synergistic epistasis among deleterious alleles. We also performed computer simulations assuming different possible models of epistasis and mutational parameters for fitness, finding some of them to be compatible with the results observed. Our results suggest that synergistic epistasis for deleterious mutations not only occurs among newly arisen spontaneous or induced mutations, but also among segregating alleles in natural populations.
Affiliation(s)
- Sara Domínguez-García
- Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310, Vigo, Spain; Centro de Investigación Marina (CIM-UVIGO), Universidade de Vigo, 36310, Vigo, Spain
| | - Carlos García
- CIBUS, Universidade de Santiago de Compostela, 15782, Santiago de Compostela, Galicia, Spain
| | - Humberto Quesada
- Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310, Vigo, Spain; Centro de Investigación Marina (CIM-UVIGO), Universidade de Vigo, 36310, Vigo, Spain
| | - Armando Caballero
- Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310, Vigo, Spain; Centro de Investigación Marina (CIM-UVIGO), Universidade de Vigo, 36310, Vigo, Spain
| |
|
16
|
Yan J, Risacher SL, Shen L, Saykin AJ. Network approaches to systems biology analysis of complex disease: integrative methods for multi-omics data. Brief Bioinform 2019; 19:1370-1381. [PMID: 28679163 DOI: 10.1093/bib/bbx066] [Citation(s) in RCA: 107] [Impact Index Per Article: 21.4] [Received: 02/23/2017] [Indexed: 11/14/2022] Open
Abstract
In the past decade, significant progress has been made in complex disease research across multiple omics layers from genome, transcriptome and proteome to metabolome. There is an increasing awareness of the importance of biological interconnections, and much success has been achieved using systems biology approaches. However, because of the typical focus on one single omics layer at a time, existing systems biology findings explain only a modest portion of complex disease. Recent advances in multi-omics data collection and sharing present us new opportunities for studying complex diseases in a more comprehensive fashion, and yet simultaneously create new challenges considering the unprecedented data dimensionality and diversity. Here, our goal is to review extant and emerging network approaches that can be applied across multiple biological layers to facilitate a more comprehensive and integrative multilayered omics analysis of complex diseases.
Affiliation(s)
- Jingwen Yan
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis, USA
| | - Shannon L Risacher
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
| | - Li Shen
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
| | - Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
| |
|
17
|
López-Cortegano E, Caballero A. Inferring the Nature of Missing Heritability in Human Traits Using Data from the GWAS Catalog. Genetics 2019; 212:891-904. [PMID: 31123044 PMCID: PMC6614893 DOI: 10.1534/genetics.119.302077] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/05/2019] [Accepted: 05/11/2019] [Indexed: 02/07/2023] Open
Abstract
Thousands of genes responsible for many diseases and other common traits in humans have been detected by Genome Wide Association Studies (GWAS) in the last decade. However, candidate causal variants found so far usually explain only a small fraction of the heritability estimated by family data. The most common explanation for this observation is that the missing heritability corresponds to variants, either rare or common, with very small effect, which pass undetected due to a lack of statistical power. We carried out a meta-analysis using data from the NHGRI-EBI GWAS Catalog in order to explore the observed distribution of locus effects for a set of 42 complex traits and to quantify their contribution to narrow-sense heritability. With the data at hand, we were able to predict the expected distribution of locus effects for 16 traits and diseases, their expected contribution to heritability, and the missing number of loci yet to be discovered to fully explain the familial heritability estimates. Our results indicate that, for 6 out of the 16 traits, the additive contribution of a great number of loci is unable to explain the familial (broad-sense) heritability, suggesting that the gap between GWAS and familial estimates of heritability may not ever be closed for these traits. In contrast, for the other 10 traits, the additive contribution of hundreds or thousands of loci yet to be found could potentially explain the familial heritability estimates, if this were the case. Computer simulations are used to illustrate the possible contribution from nonadditive genetic effects to the gap between GWAS and familial estimates of heritability.
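As a rough illustration of how a locus's contribution to narrow-sense heritability can be quantified, the standard additive-variance expression under Hardy-Weinberg equilibrium, 2p(1-p)β², can be summed across loci and divided by the phenotypic variance. The Python sketch below is not the authors' meta-analysis pipeline; the allele frequencies, effect sizes, and function names are hypothetical, and the loci are assumed to be independent.

```python
def locus_additive_variance(p: float, beta: float) -> float:
    """Additive variance of one biallelic locus under Hardy-Weinberg
    equilibrium: 2 * p * (1 - p) * beta**2, where p is the effect-allele
    frequency and beta the per-allele effect on the trait."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * p * (1.0 - p) * beta ** 2

def explained_heritability(loci, phenotypic_variance: float) -> float:
    """Fraction of phenotypic variance explained additively by the loci,
    assuming the loci are independent (no linkage disequilibrium)."""
    total = sum(locus_additive_variance(p, beta) for p, beta in loci)
    return total / phenotypic_variance

# Hypothetical loci: (effect-allele frequency, per-allele effect in
# phenotypic standard-deviation units, so the phenotypic variance is 1).
example_loci = [(0.5, 0.1), (0.2, 0.05), (0.1, 0.2)]
h2_gwas = explained_heritability(example_loci, phenotypic_variance=1.0)
print(f"explained h2: {h2_gwas:.4f}")
```

In this simplified framing, the difference between such a sum over detected loci and a family-based heritability estimate is the gap that the abstract refers to as missing heritability.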
Affiliation(s)
| | - Armando Caballero
- Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310, Spain
| |
|
18
|
Crawford L, Flaxman SR, Runcie DE, West M. Variable Prioritization in Nonlinear Black Box Methods: A Genetic Association Case Study. Ann Appl Stat 2019; 13:958-989. [PMID: 32542104 PMCID: PMC7295151 DOI: 10.1214/18-aoas1222] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/13/2022]
Abstract
The central aim in this paper is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel and interpretable way to summarize the relative importance of predictor variables. Methodologically, we develop the "RelATive cEntrality" (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Bayesian Gaussian process regression, but the methodological innovations apply to other "black box" methods. It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and two real data association mapping studies, we show that applying RATE enables an explanation for this improved performance.
|
19
|
Lee SH, Ahn WY, Seweryn M, Sadee W. Combined genetic influence of the nicotinic receptor gene cluster CHRNA5/A3/B4 on nicotine dependence. BMC Genomics 2018; 19:826. [PMID: 30453884 PMCID: PMC6245894 DOI: 10.1186/s12864-018-5219-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/24/2018] [Accepted: 11/01/2018] [Indexed: 12/17/2022] Open
Abstract
Background: The CHRNA5/A3/B4 gene locus is associated with nicotine dependence and other smoking related disorders. While the non-synonymous CHRNA5 variant rs16969968 appears to be the main risk factor, linkage disequilibrium (LD) bins in the gene cluster carry frequent variants that regulate expression. Pairwise LD and haplotype analyses had identified at least three haplotype tagging SNPs including rs16969968 as main genetic risk factors. Searching for variants with evidence of regulatory functions, we have reported interactions between CHRNA5 and CHRNA3 enhancer variants (tagged by rs880395 and rs1948, respectively) and rs16969968, forming 3-SNP haplotypes and diplotypes that may more accurately reflect the cluster’s combined effects on nicotine dependence (Barrie et al., Hum Mutat 38:112–9, 2017). Here we address further contributions by variants affecting CHRNB4, a possibly limiting component of nicotinic receptors. Results: We identify an LD bin (tagged by rs4887074) associated with expression of CHRNB4. Additive logistic regression models indicate that rs4887074 is associated with nicotine dependence and modulates the effect of rs16969968 in GWAS datasets (COGEND, UW-TTURC, SAGE). 4-SNP haplotype and diplotype analyses (rs880395-rs16969968-rs1948-rs4887074) yield nicotine dependence risk values that further differentiate those obtained with the 3-SNP model. Moreover, both the main G allele of rs16969968 and the minor G allele of rs4887074 (associated with reduced expression of CHRNB4), residing predominantly on common haplotypes that are protective, represent significant allele-specific variance QTLs, indicating that they interact with each other. Conclusions: These results indicate rs4887074 is associated with CHRNB4 expression, and along with two regulatory variants of CHRNA3 and CHRNA5, modulates the effect of rs16969968 on nicotine dependence risk. Assignable to individuals because of strong LD structures, 4-SNP haplotypes and diplotypes serve to assess the combined genetic influence of this multi-gene cluster on complex traits, accounting for complex LD relationships and tissue-specific genetic effects (CHRNA5/3) relevant to the traits analyzed. The 4-SNP haplotypes account at least in part for previous tagging SNPs, including the highly GWAS-significant rs6495308, located in a distinct pair-wise LD bin but included in protective 4-SNP haplotypes. Our approach refines and integrates the cluster’s overall genetic influence, an important variable when integrating the genetics of multiple genomic loci.
Affiliation(s)
- Sung-Ha Lee
- Center for Pharmacogenomics, Department of Cancer Biology and Genetics, College of Medicine, The Ohio State University, 1004 Biomedical Research Tower, 460 W 12th Avenue, Columbus, OH, USA; Center for Happiness Studies, Seoul National University, Gwanak-gu, Gwanak-ro 1, Bldg. 220, Seoul, 151-746, South Korea
| | - Woo-Young Ahn
- Center for Pharmacogenomics, Department of Cancer Biology and Genetics, College of Medicine, The Ohio State University, 1004 Biomedical Research Tower, 460 W 12th Avenue, Columbus, OH, USA; Department of Psychology, The Ohio State University, 1835 Neil Avenue, Columbus, OH, USA; Department of Psychology, Seoul National University, Gwanak-gu, Gwanak-ro 1, Bldg. 16, Seoul, 151-746, South Korea
| | - Michał Seweryn
- Center for Medical Genomics OMICRON, Jagiellonian University, Medical College, Krakow, Poland
| | - Wolfgang Sadee
- Center for Pharmacogenomics, Department of Cancer Biology and Genetics, College of Medicine, The Ohio State University, 1004 Biomedical Research Tower, 460 W 12th Avenue, Columbus, OH, USA
| |
|
20
|
Zhao Y, Yu J, Zhao J, Chen X, Xiong N, Wang T, Qing H, Lin Z. Intragenic Transcriptional cis-Antagonism Across SLC6A3. Mol Neurobiol 2018; 56:4051-4060. [PMID: 30259411 DOI: 10.1007/s12035-018-1357-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/06/2018] [Accepted: 09/18/2018] [Indexed: 12/29/2022]
Abstract
A promoter can be regulated by various cis-acting elements, so delineating the regulatory modes among them may help in understanding developmental, environmental and genetic mechanisms in gene activity. Here we report that the human dopamine transporter gene SLC6A3 carries a 5' distal 5-kb super enhancer (5KSE) which upregulated the promoter by 5-fold. Interestingly, 5KSE is able to prevent 3' downstream variable number tandem repeats (3'VNTRs) from silencing the promoter. This new enhancer consists of a 5'VNTR and three repetitive sub-elements that are conserved in primates. Two of 5KSE's sub-elements, E-9.7 and E-8.7, upregulate the promoter, but only the latter could continue doing so in the presence of 3'VNTRs. Finally, E-8.7 is activated by novel dopaminergic transcription factors including SRP54 and Nfe2l1. Together, these results reveal a multimodal regulatory mechanism in SLC6A3.
Affiliation(s)
- Ying Zhao
- Laboratory of Psychiatric Neurogenomics, Basic Neuroscience Division, McLean Hospital, Belmont, MA, 02478, USA; School of Pharmacy, Xinxiang Medical University, Xinxiang, 453003, China
| | - Jinlong Yu
- Laboratory of Psychiatric Neurogenomics, Basic Neuroscience Division, McLean Hospital, Belmont, MA, 02478, USA
| | - Juan Zhao
- Laboratory of Psychiatric Neurogenomics, Basic Neuroscience Division, McLean Hospital, Belmont, MA, 02478, USA; College of Life Science, Beijing Institute of Technology, Beijing, 100081, China
| | - Xiaowu Chen
- Laboratory of Psychiatric Neurogenomics, Basic Neuroscience Division, McLean Hospital, Belmont, MA, 02478, USA; Department of Neurology, Shenzhen University General Hospital, Shenzhen, Guangzhou, 518060, China
| | - Nian Xiong
- Laboratory of Psychiatric Neurogenomics, Basic Neuroscience Division, McLean Hospital, Belmont, MA, 02478, USA; Department of Neurology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Tao Wang
- Department of Neurology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Hong Qing
- College of Life Science, Beijing Institute of Technology, Beijing, 100081, China
| | - Zhicheng Lin
- Laboratory of Psychiatric Neurogenomics, Basic Neuroscience Division, McLean Hospital, Belmont, MA, 02478, USA.
| |
|
21
|
Mullis MN, Matsui T, Schell R, Foree R, Ehrenreich IM. The complex underpinnings of genetic background effects. Nat Commun 2018; 9:3548. [PMID: 30224702 PMCID: PMC6141565 DOI: 10.1038/s41467-018-06023-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 12/05/2017] [Accepted: 08/09/2018] [Indexed: 12/01/2022] Open
Abstract
Genetic interactions between mutations and standing polymorphisms can cause mutations to show distinct phenotypic effects in different individuals. To characterize the genetic architecture of these so-called background effects, we genotype 1411 wild-type and mutant yeast cross progeny and measure their growth in 10 environments. Using these data, we map 1086 interactions between segregating loci and 7 different gene knockouts. Each knockout exhibits between 73 and 543 interactions, with 89% of all interactions involving higher-order epistasis between a knockout and multiple loci. Identified loci interact with as few as one knockout and as many as all seven knockouts. In mutants, loci interacting with fewer and more knockouts tend to show enhanced and reduced phenotypic effects, respectively. Cross–environment analysis reveals that most interactions between the knockouts and segregating loci also involve the environment. These results illustrate the complicated interactions between mutations, standing polymorphisms, and the environment that cause background effects. Mutations often show distinct phenotypic effects across different genetic backgrounds. Here the authors describe the genetic basis of these so-called background effects using data on genotype and growth in 10 environments from 1411 segregants from a cross of two strains of budding yeast.
Affiliation(s)
- Martin N Mullis
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089-2910, USA.
| | - Takeshi Matsui
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089-2910, USA.
| | - Rachel Schell
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089-2910, USA
| | - Ryan Foree
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089-2910, USA
| | - Ian M Ehrenreich
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089-2910, USA.
| |
|
22
|
Banta JA, Richards CL. Quantitative epigenetics and evolution. Heredity (Edinb) 2018; 121:210-224. [PMID: 29980793 PMCID: PMC6082842 DOI: 10.1038/s41437-018-0114-x] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 12/03/2017] [Revised: 06/07/2018] [Accepted: 06/15/2018] [Indexed: 01/05/2023] Open
Abstract
Epigenetics refers to chemical modifications of chromatin or transcribed DNA that can influence gene activity and expression without changes in DNA sequence. The last 20 years have yielded breakthroughs in our understanding of epigenetic processes that impact many fields of biology. In this review, we discuss how epigenetics relates to quantitative genetics and evolution. We argue that epigenetics is important for quantitative genetics because: (1) quantitative genetics is increasingly being combined with genomics, and therefore we should expand our thinking to include cellular-level mechanisms that can account for phenotypic variance and heritability besides just those that are hard-coded in the DNA sequence; and (2) epigenetic mechanisms change how phenotypic variance is partitioned, and can thereby change the heritability of traits and how those traits are inherited. To explicate these points, we show that epigenetics can influence all aspects of the phenotypic variance formula: $V_P$ (total phenotypic variance) = $V_G$ (genetic variance) + $V_E$ (environmental variance) + $V_{G \times E}$ (genotype-by-environment interaction) + $2\,\mathrm{Cov}_{GE}$ (the genotype-environment covariance) + $V_{\varepsilon}$ (residual variance), requiring new strategies to account for different potential sources of epigenetic effects on phenotypic variance. We also demonstrate how each of the components of phenotypic variance not only can be influenced by epigenetics, but can also have evolutionary consequences. We argue that no sources of epigenetic effects on phenotypic variance can be easily cast aside in a quantitative genetic research program that seeks to understand evolutionary processes.
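The covariance term in the phenotypic variance formula quoted above (total variance = genetic + environmental + genotype-by-environment + twice the genotype-environment covariance + residual) follows from the variance of a sum. The Python sketch below checks this on a toy case with only genetic and environmental components; the five data points are hypothetical, and the genotype-by-environment and residual terms are deliberately left out.

```python
def variance(xs):
    """Population variance (divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    """Population covariance (divides by n)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical genetic (G) and environmental (E) values for five
# individuals; higher genetic values tend to occur in better environments.
G = [1.0, 2.0, 3.0, 4.0, 5.0]
E = [0.5, 0.4, 1.5, 1.2, 2.4]
P = [g + e for g, e in zip(G, E)]  # no GxE or residual term in this toy case

v_p = variance(P)
v_sum = variance(G) + variance(E) + 2.0 * covariance(G, E)
print(round(v_p, 6), round(v_sum, 6))
```

The two printed values agree, which shows that ignoring a positive genotype-environment covariance would understate the total phenotypic variance in this example.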
Affiliation(s)
- Joshua A Banta
- Department of Biology, University of Texas at Tyler, Tyler, TX, 75799, USA.
| | - Christina L Richards
- Department of Integrative Biology, University of South Florida, Tampa, FL, 33620, USA
| |
|
23
|
Crawford L, Wood KC, Zhou X, Mukherjee S. Bayesian Approximate Kernel Regression with Variable Selection. J Am Stat Assoc 2018; 113:1710-1721. [PMID: 30799887 PMCID: PMC6383716 DOI: 10.1080/01621459.2017.1361830] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 04/23/2016] [Revised: 06/07/2017] [Indexed: 01/17/2023]
Abstract
Nonlinear kernel regression models are often used in statistics and machine learning because they are more accurate than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this paper, we propose a novel framework that provides an effect size analog for each explanatory variable in Bayesian kernel regression models when the kernel is shift-invariant - for example, the Gaussian kernel. We use function analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that: (i) captures nonlinear structure, and (ii) can be projected onto the original explanatory variables. This projection onto the original explanatory variables serves as an analog of effect sizes. The specific function analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion, we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. We illustrate the utility of BAKR by examining two important problems in statistical genetics: genomic selection (i.e. phenotypic prediction) and association mapping (i.e. inference of significant variants or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings.
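The property this abstract relies on, that a shift-invariant kernel such as the Gaussian kernel can be approximated via random Fourier bases, can be sketched in a few lines. The Python example below shows only that approximation step, using the standard random Fourier feature construction; it is not the BAKR model itself, and the function names, bandwidth, feature count, and input vectors are hypothetical choices for illustration.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def make_rff_map(dim, n_features, sigma=1.0, seed=0):
    """Return z(x) such that dot(z(x), z(y)) approximates the Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density, N(0, 1/sigma^2).
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(n_features)]
    # Phases are uniform on [0, 2*pi).
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, phases)]

    return feature_map

z = make_rff_map(dim=3, n_features=5000, sigma=1.0, seed=0)
x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.1]
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = gaussian_kernel(x, y)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

Because the feature map is an explicit finite-dimensional vector, a linear model fitted on these features behaves like an approximate kernel model, which is what makes a projection back onto the original explanatory variables possible.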
Affiliation(s)
- Lorin Crawford
- Department of Biostatistics, Brown University, Providence, RI, USA
- Center for Statistical Sciences, Brown University, Providence, RI, USA
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA
| | - Kris C. Wood
- Department of Pharmacology & Cancer Biology, Duke University, Durham, NC, USA
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Sayan Mukherjee
- Department of Statistical Science, Duke University, Durham, NC, USA
- Department of Computer Science, Duke University, Durham, NC, 27708
- Department of Mathematics, Duke University, Durham, NC, 27708
- Department of Bioinformatics & Biostatistics, Duke University, Durham, NC, 27708
| |
|
24
|
Pecanka J, Jonker MA, Bochdanovits Z, Van Der Vaart AW. A powerful and efficient two-stage method for detecting gene-to-gene interactions in GWAS. Biostatistics 2018; 18:477-494. [PMID: 28334077 DOI: 10.1093/biostatistics/kxw060] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/13/2016] [Accepted: 11/05/2016] [Indexed: 11/13/2022] Open
Abstract
For over a decade functional gene-to-gene interaction (epistasis) has been suspected to be a determinant in the "missing heritability" of complex traits. However, searching for epistasis on the genome-wide scale has been challenging due to the prohibitively large number of tests which result in a serious loss of statistical power as well as computational challenges. In this article, we propose a two-stage method applicable to existing case-control data sets, which aims to lessen both of these problems by pre-assessing whether a candidate pair of genetic loci is involved in epistasis before it is actually tested for interaction with respect to a complex phenotype. The pre-assessment is based on a two-locus genotype independence test performed in the sample of cases. Only the pairs of loci that exhibit non-equilibrium frequencies are analyzed via a logistic regression score test, thereby reducing the multiple testing burden. Since only the computationally simple independence tests are performed for all pairs of loci while the more demanding score tests are restricted to the most promising pairs, genome-wide association study (GWAS) for epistasis becomes feasible. By design our method provides strong control of the type I error. Its favourable power properties especially under the practically relevant misspecification of the interaction model are illustrated. Ready-to-use software is available. Using the method we analyzed Parkinson's disease in four cohorts and identified possible interactions within several SNP pairs in multiple cohorts.
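The first stage described in this abstract, a two-locus genotype independence test among cases that decides which SNP pairs go on to the more expensive interaction test, can be sketched as follows. This is only an illustration: the abstract does not specify the test statistic, so a Pearson chi-square on the 3 x 3 genotype table is assumed here, the threshold and genotype data are hypothetical, and the second-stage logistic regression score test is not implemented.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def genotype_table(geno_a, geno_b):
    """3 x 3 table of joint genotype counts (0, 1 or 2 minor alleles) in cases."""
    table = [[0] * 3 for _ in range(3)]
    for a, b in zip(geno_a, geno_b):
        table[a][b] += 1
    return table

# 95th percentile of the chi-square distribution with (3-1)*(3-1) = 4 degrees
# of freedom; an assumed screening threshold, not the one used in the paper.
CRITICAL_4DF = 9.488

def passes_stage_one(geno_a, geno_b, threshold=CRITICAL_4DF):
    return chi_square_independence(genotype_table(geno_a, geno_b)) > threshold

# Hypothetical case genotypes at three SNPs.
pairs = [(a, b) for a in range(3) for b in range(3)] * 6
snp_a = [a for a, _ in pairs]
snp_b = [b for _, b in pairs]   # independent of snp_a by construction
snp_c = list(snp_a)             # fully dependent on snp_a

candidates = {"a-b": (snp_a, snp_b), "a-c": (snp_a, snp_c)}
stage_two_pairs = [name for name, (g1, g2) in candidates.items()
                   if passes_stage_one(g1, g2)]
print(stage_two_pairs)
```

Only the pairs returned by the screen would be passed to the second-stage test, which is how the two-stage design reduces the multiple testing burden. In real data, pairs of nearby SNPs in linkage disequilibrium would also fail the independence test and would need separate handling.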
Affiliation(s)
- Jakub Pecanka
- Leiden University Medical Center, Department of Medical Statistics and Bioinformatics, Leiden, The Netherlands and VU University, Department of Mathematics, Amsterdam, the Netherlands
| | - Marianne A Jonker
- VU University Medical Center, Department of Epidemiology and Biostatistics, Amsterdam, The Netherlands and Radboud University medical center, Radboud Institute for Health Sciences, Nijmegen, The Netherlands
| | | | - Zoltan Bochdanovits
- VU University Medical Center, Department of Clinical Genetics, Amsterdam, The Netherlands
|
25
|
Yadav A, Sinha H. Gene-gene and gene-environment interactions in complex traits in yeast. Yeast 2018; 35:403-416. [PMID: 29322552 DOI: 10.1002/yea.3304] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/04/2017] [Revised: 12/11/2017] [Accepted: 12/23/2017] [Indexed: 01/05/2023] Open
Abstract
One of the fundamental questions in biology is how the genotype regulates the phenotype. An increasing number of studies indicate that, in most cases, the effect of a genetic locus on the phenotype is context-dependent, i.e. it is influenced by the genetic background and the environment in which the phenotype is measured. Still, the majority of the studies, in both model organisms and humans, that map the genetic regulation of phenotypic variation in complex traits primarily identify additive loci with independent effects. This does not reflect an absence of the contribution of genetic interactions to phenotypic variation, but instead is a consequence of the technical limitations in mapping gene-gene interactions (GGI) and gene-environment interactions (GEI). Yeast, with its detailed molecular understanding, diverse population genomics and ease of genetic manipulation, is a unique and powerful resource to study the contributions of GGI and GEI in the regulation of phenotypic variation. Here we review studies in yeast that have identified GGI and GEI that regulate phenotypic variation, and discuss the contribution of these findings in explaining missing heritability of complex traits, and how observations from these GGI and GEI studies enhance our understanding of the mechanisms underlying genetic robustness and adaptability that shape the architecture of the genotype-phenotype map.
Affiliation(s)
- Anupama Yadav
- Center for Cancer Systems Biology, and Cancer Biology, Dana Farber Cancer Institute, Boston, MA, 02215, USA.,Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
| | - Himanshu Sinha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India.,Initiative for Biological Systems Engineering, Indian Institute of Technology Madras, Chennai, 600036, India.,Robert Bosch Centre for Data Sciences and Artificial Intelligence, Indian Institute of Technology Madras, Chennai, 600036, India
|
26
|
Seed CE, Tomkins JL. Positive size-speed relationships in gametes and vegetative cells of Chlamydomonas reinhardtii; implications for the evolution of sperm. Evolution 2018; 72:440-452. [PMID: 29345308 DOI: 10.1111/evo.13427] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/01/2017] [Revised: 12/19/2017] [Accepted: 12/27/2017] [Indexed: 11/26/2022]
Abstract
It is commonly held that differences in gametes of the two sexes (anisogamy) evolved from ancestors whose gametes were similar in size and behavior (isogamy). Underlying many hypotheses explaining anisogamy are assumed relationships between cell size and speed in the ancestral isogamous population. Using the isogamous alga Chlamydomonas reinhardtii, we explored size-speed distributions in vegetative and gamete cells of 10 cell lines, and clonal data from within two cell lines. We applied an independent speed selection approach to gamete populations of C. reinhardtii, monitoring correlated responses in size following selection for high speed. We demonstrate positive size-speed relationships in clones, cell lines, and artificially selected speed selection lines. We found different size-speed relationships in the two cell types of C. reinhardtii even though they overlap in size, suggesting that cell composition and/or programs of gene expression are capable of altering this relationship, and that the relationship is evolvable. The positive genetic size-speed correlation means that the division of parent vegetative cells into numerous gametes trades off against not only size, but also speed, a trade-off that has not received previous attention. Our results support reevaluating the role of speed selection in the evolution of anisogamy.
Affiliation(s)
- Catherine E Seed
- Centre for Evolutionary Biology, School of Biological Sciences (M092), The University of Western Australia, Crawley, Australia
| | - Joseph L Tomkins
- Centre for Evolutionary Biology, School of Biological Sciences (M092), The University of Western Australia, Crawley, Australia
|
27
|
A novel linkage-disequilibrium corrected genomic relationship matrix for SNP-heritability estimation and genomic prediction. Heredity (Edinb) 2017; 120:356-368. [PMID: 29238077 PMCID: PMC5842222 DOI: 10.1038/s41437-017-0023-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/18/2017] [Revised: 10/13/2017] [Accepted: 10/23/2017] [Indexed: 12/15/2022] Open
Abstract
Single nucleotide polymorphism (SNP)-heritability estimation is an important topic in several research fields, including animal, plant and human genetics, as well as in ecology. Linear mixed model estimation of SNP-heritability uses the structure of genomic relationships between individuals, which is constructed from genome-wide sets of SNP markers that are generally weighted equally in their contributions. Proposed methods to handle dependence between SNPs include “thinning” the marker set by linkage disequilibrium (LD)-pruning, the use of haplotype-tagging of SNPs, and LD-weighting of the SNP contributions. For improved estimation, we propose a new conceptual framework for the genomic relationship matrix, in which Mahalanobis distance-based LD-correction is used in a linear mixed model estimation of SNP-heritability. The superiority of the presented method is illustrated and compared to mixed-model analyses using a VanRaden genomic relationship matrix, a matrix used by GCTA and a matrix employing LD-weighting (as implemented in the LDAK software) in simulated (using real human, rice and cattle genotypes) and real (maize, rice and mice) datasets. Despite the computational difficulties, our results suggest that by using the proposed method one can improve the accuracy of SNP-heritability estimates in datasets with high LD.
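For orientation, the equally weighted baseline that this abstract compares against, the VanRaden genomic relationship matrix, can be sketched as below. This is an illustrative sketch of the standard VanRaden form G = ZZ'/(2 Σ p(1 − p)) with allele frequencies estimated from the sample; it is not the LD-corrected matrix proposed in the paper, and the function name `vanraden_grm` is hypothetical. It assumes complete genotypes coded 0/1/2 and at least one polymorphic SNP (otherwise the denominator is zero).

```python
def vanraden_grm(genotypes):
    """Equally weighted genomic relationship matrix (VanRaden form).

    genotypes: list of individuals, each a list of 0/1/2 allele counts per SNP.
    Returns an n x n nested list.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Sample allele frequency of the counted allele at each SNP.
    p = [sum(ind[k] for ind in genotypes) / (2 * n) for k in range(m)]
    # Centre each genotype by its expected value 2p.
    z = [[ind[k] - 2 * p[k] for k in range(m)] for ind in genotypes]
    # Scaling term: every SNP contributes with equal weight.
    denom = 2 * sum(pk * (1 - pk) for pk in p)
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / denom for j in range(n)]
        for i in range(n)
    ]


# Toy example: three individuals, two SNPs.
G = vanraden_grm([[0, 1], [2, 1], [1, 1]])
print(G[0][0], G[0][1], G[2][2])  # 1.0 -1.0 0.0
```

Because every SNP enters with the same weight, markers in high LD are effectively counted several times, which is the dependence that the pruning, tagging, and LD-weighting approaches named in the abstract try to address.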
|
28
|
Noble LM, Chelo I, Guzella T, Afonso B, Riccardi DD, Ammerman P, Dayarian A, Carvalho S, Crist A, Pino-Querido A, Shraiman B, Rockman MV, Teotónio H. Polygenicity and Epistasis Underlie Fitness-Proximal Traits in the Caenorhabditis elegans Multiparental Experimental Evolution (CeMEE) Panel. Genetics 2017; 207:1663-1685. [PMID: 29066469 PMCID: PMC5714472 DOI: 10.1534/genetics.117.300406] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 04/04/2017] [Accepted: 10/10/2017] [Indexed: 01/27/2023] Open
Abstract
Understanding the genetic basis of complex traits remains a major challenge in biology. Polygenicity, phenotypic plasticity, and epistasis contribute to phenotypic variance in ways that are rarely clear. This uncertainty can be problematic for estimating heritability, for predicting individual phenotypes from genomic data, and for parameterizing models of phenotypic evolution. Here, we report an advanced recombinant inbred line (RIL) quantitative trait locus mapping panel for the hermaphroditic nematode Caenorhabditis elegans, the C. elegans multiparental experimental evolution (CeMEE) panel. The CeMEE panel, comprising 507 RILs at present, was created by hybridization of 16 wild isolates, experimental evolution for 140-190 generations, and inbreeding by selfing for 13-16 generations. The panel contains 22% of single-nucleotide polymorphisms known to segregate in natural populations, and complements existing C. elegans mapping resources by providing fine resolution and high nucleotide diversity across > 95% of the genome. We apply it to study the genetic basis of two fitness components, fertility and hermaphrodite body size at time of reproduction, with high broad-sense heritability in the CeMEE. While simulations show that we should detect common alleles with additive effects as small as 5%, at gene-level resolution, the genetic architectures of these traits do not feature such alleles. We instead find that a significant fraction of trait variance, approaching 40% for fertility, can be explained by sign epistasis with main effects below the detection limit. In congruence, phenotype prediction from genomic similarity, while generally poor ([Formula: see text]), requires modeling epistasis for optimal accuracy, with most variance attributed to the rapidly evolving chromosome arms.
Affiliation(s)
- Luke M Noble
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York 10003
| | - Ivo Chelo
- Instituto Gulbenkian de Ciência, P-2781-901 Oeiras, Portugal
| | - Thiago Guzella
- Institut de Biologie, École Normale Supérieure, Centre National de la Recherche Scientifique (CNRS) UMR 8197, Institut National de la Santé et de la Recherche Médicale (INSERM) U1024, F-75005 Paris, France
| | - Bruno Afonso
- Instituto Gulbenkian de Ciência, P-2781-901 Oeiras, Portugal
- Institut de Biologie, École Normale Supérieure, Centre National de la Recherche Scientifique (CNRS) UMR 8197, Institut National de la Santé et de la Recherche Médicale (INSERM) U1024, F-75005 Paris, France
| | - David D Riccardi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York 10003
| | - Patrick Ammerman
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York 10003
| | - Adel Dayarian
- Kavli Institute for Theoretical Physics, University of California, Santa Barbara, California 93106
| | - Sara Carvalho
- Instituto Gulbenkian de Ciência, P-2781-901 Oeiras, Portugal
| | - Anna Crist
- Institut de Biologie, École Normale Supérieure, Centre National de la Recherche Scientifique (CNRS) UMR 8197, Institut National de la Santé et de la Recherche Médicale (INSERM) U1024, F-75005 Paris, France
| | | | - Boris Shraiman
- Kavli Institute for Theoretical Physics, University of California, Santa Barbara, California 93106
- Department of Physics, University of California, Santa Barbara, California 93106
| | - Matthew V Rockman
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York 10003
| | - Henrique Teotónio
- Institut de Biologie, École Normale Supérieure, Centre National de la Recherche Scientifique (CNRS) UMR 8197, Institut National de la Santé et de la Recherche Médicale (INSERM) U1024, F-75005 Paris, France
|
29
|
Egli T, Vukojevic V, Sengstag T, Jacquot M, Cabezón R, Coynel D, Freytag V, Heck A, Vogler C, de Quervain DJF, Papassotiropoulos A, Milnik A. Exhaustive search for epistatic effects on the human methylome. Sci Rep 2017; 7:13669. [PMID: 29057891 PMCID: PMC5651902 DOI: 10.1038/s41598-017-13256-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/09/2017] [Accepted: 09/22/2017] [Indexed: 11/16/2022] Open
Abstract
Studies assessing the existence and magnitude of epistatic effects on complex human traits provide inconclusive results. The study of such effects is complicated by considerable increase in computational burden, model complexity, and model uncertainty, which in concert decrease model stability. An additional source introducing significant uncertainty with regard to the detection of robust epistasis is the biological distance between the genetic variation and the trait under study. Here we studied CpG methylation, a genetically complex molecular trait that is particularly close to genomic variation, and performed an exhaustive search for two-locus epistatic effects on the CpG-methylation signal in two cohorts of healthy young subjects. We detected robust epistatic effects for a small number of CpGs (N = 404). Our results indicate that epistatic effects explain only a minor part of variation in DNA-CpG methylation. Interestingly, these CpGs were more likely to be associated with gene-expression of nearby genes, as also shown by their overrepresentation in DNase I hypersensitivity sites and underrepresentation in CpG islands. Finally, gene ontology analysis showed a significant enrichment of these CpGs in pathways related to HPV-infection and cancer.
Affiliation(s)
- Tobias Egli
- Division of Molecular Neuroscience, Department of Psychology, University of Basel, CH-4055, Basel, Switzerland.,Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055, Basel, Switzerland
| | - Vanja Vukojevic
- Division of Molecular Neuroscience, Department of Psychology, University of Basel, CH-4055, Basel, Switzerland.,Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055, Basel, Switzerland.,Department Biozentrum, Life Sciences Training Facility, University of Basel, CH-4056, Basel, Switzerland
| | - Thierry Sengstag
- sciCORE, Scientific Computing Center, University of Basel, CH-4056, Basel, Switzerland.,SIB - Swiss Institute of Bioinformatics, CH-1015, Lausanne, Switzerland
| | - Martin Jacquot
- sciCORE, Scientific Computing Center, University of Basel, CH-4056, Basel, Switzerland
| | - Rubén Cabezón
- sciCORE, Scientific Computing Center, University of Basel, CH-4056, Basel, Switzerland
| | - David Coynel
- Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055, Basel, Switzerland.,Division of Cognitive Neuroscience, Department of Psychology, University of Basel, CH-4055, Basel, Switzerland
| | - Virginie Freytag
- Division of Molecular Neuroscience, Department of Psychology, University of Basel, CH-4055, Basel, Switzerland.,Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055, Basel, Switzerland
| | - Angela Heck
- Division of Molecular Neuroscience, Department of Psychology, University of Basel, CH-4055, Basel, Switzerland.,Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055, Basel, Switzerland.,Psychiatric University Clinics, University of Basel, CH-4055, Basel, Switzerland
| | - Christian Vogler
- Division of Molecular Neuroscience, Department of Psychology, University of Basel, CH-4055, Basel, Switzerland.,Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055, Basel, Switzerland.,Psychiatric University Clinics, University of Basel, CH-4055, Basel, Switzerland
| | - Dominique J-F de Quervain
- Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055, Basel, Switzerland.,Psychiatric University Clinics, University of Basel, CH-4055, Basel, Switzerland.,Division of Cognitive Neuroscience, Department of Psychology, University of Basel, CH-4055, Basel, Switzerland
| | - Andreas Papassotiropoulos
- Division of Molecular Neuroscience, Department of Psychology, University of Basel, CH-4055, Basel, Switzerland.,Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055, Basel, Switzerland.,Psychiatric University Clinics, University of Basel, CH-4055, Basel, Switzerland.,Department Biozentrum, Life Sciences Training Facility, University of Basel, CH-4056, Basel, Switzerland
| | - Annette Milnik
- Division of Molecular Neuroscience, Department of Psychology, University of Basel, CH-4055, Basel, Switzerland.,Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055, Basel, Switzerland.,Psychiatric University Clinics, University of Basel, CH-4055, Basel, Switzerland
|
30
|
Mezlini AM, Goldenberg A. Incorporating networks in a probabilistic graphical model to find drivers for complex human diseases. PLoS Comput Biol 2017; 13:e1005580. [PMID: 29023450 PMCID: PMC5638204 DOI: 10.1371/journal.pcbi.1005580] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 01/14/2017] [Accepted: 05/09/2017] [Indexed: 12/12/2022] Open
Abstract
Discovering genetic mechanisms driving complex diseases is a hard problem. Existing methods often lack power to identify the set of responsible genes. Protein-protein interaction networks have been shown to boost power when detecting gene-disease associations. We introduce a Bayesian framework, Conflux, to find disease associated genes from exome sequencing data using networks as a prior. There are two main advantages to using networks within a probabilistic graphical model. First, networks are noisy and incomplete, a substantial impediment to gene discovery. Incorporating networks into the structure of a probabilistic model for gene inference has less impact on the solution than relying on the noisy network structure directly. Second, using a Bayesian framework we can keep track of the uncertainty of each gene being associated with the phenotype rather than returning a fixed list of genes. We first show that using networks clearly improves gene detection compared to individual gene testing. We then show consistently improved performance of Conflux compared to the state-of-the-art diffusion network-based method Hotnet2 and a variety of other network and variant aggregation methods, using randomly generated and literature-reported gene sets. We test Hotnet2 and Conflux on several network configurations to reveal biases and patterns of false positives and false negatives in each case. Our experiments show that our novel Bayesian framework Conflux incorporates many of the advantages of the current state-of-the-art methods, while offering more flexibility and improved power in many gene-disease association scenarios.
Networks and pathway-based methods are commonly used to improve the power of gene detection in associations with complex human diseases. Network diffusion approaches have shown their effectiveness and superior performance in cancer studies. Still, there are many problems such as noise and missingness with currently available human networks that bias the results of gene detection. We propose a novel graphical model-based method Conflux that overcomes several of the pitfalls of the existing state-of-the-art approaches while building on their successes. Conflux integrates genotype data with networks directly, using diffusion-like methods, but only as part of a structure in a probabilistic model to reduce the negative effect of the noise in the networks. This Bayesian framework allows Conflux to keep track of the uncertainty in the gene list that is being associated with the disease and consequently rank the genes with respect to our confidence in the association. It also allows for the discovery of gene sets that are not fully supported by the network if they have enough support in the data. These improvements result in a flexible approach that improves the power in many gene-disease association scenarios while reducing the number of false positives reported.
Affiliation(s)
- Aziz M Mezlini
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.,Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Anna Goldenberg
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.,Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
|
31
|
Cheng SJ, Shi FY, Liu H, Ding Y, Jiang S, Liang N, Gao G. Accurately annotate compound effects of genetic variants using a context-sensitive framework. Nucleic Acids Res 2017; 45:e82. [PMID: 28158838 PMCID: PMC5449550 DOI: 10.1093/nar/gkx041] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/12/2016] [Accepted: 01/24/2017] [Indexed: 02/07/2023] Open
Abstract
In genomics, effectively identifying the biological effects of genetic variants is crucial. Current methods handle each variant independently, assuming that each variant acts in a context-free manner. However, variants within the same gene may interfere with each other, producing combinational (compound) rather than individual effects. In this work, we introduce COPE, a gene-centric variant annotation tool that integrates the entire sequential context in evaluating the functional effects of intra-genic variants. Applying COPE to the 1000 Genomes dataset, we identified numerous cases of multiple-variant compound effects that frequently led to false-positive and false-negative loss-of-function calls by conventional variant-centric tools. Specifically, 64 disease-causing mutations were identified to be rescued in a specific genomic context, thus potentially contributing to the buffering effects for highly penetrant deleterious mutations. COPE is freely available for academic use at http://cope.cbi.pku.edu.cn.
Affiliation(s)
- Si-Jin Cheng
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
| | - Fang-Yuan Shi
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
| | - Huan Liu
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
| | - Yang Ding
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
| | - Shuai Jiang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
| | - Nan Liang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
| | - Ge Gao
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
|
32
|
Crawford L, Zeng P, Mukherjee S, Zhou X. Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits. PLoS Genet 2017; 13:e1006869. [PMID: 28746338 PMCID: PMC5550000 DOI: 10.1371/journal.pgen.1006869] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 08/01/2016] [Revised: 08/09/2017] [Accepted: 06/15/2017] [Indexed: 12/13/2022] Open
Abstract
Epistasis, commonly defined as the interaction between multiple genes, is an important genetic component underlying phenotypic variation. Many statistical methods have been developed to model and identify epistatic interactions between genetic variants. However, because of the large combinatorial search space of interactions, most epistasis mapping methods face enormous computational challenges and often suffer from low statistical power due to multiple test correction. Here, we present a novel, alternative strategy for mapping epistasis: instead of directly identifying individual pairwise or higher-order interactions, we focus on mapping variants that have non-zero marginal epistatic effects, that is, the combined pairwise interaction effects between a given variant and all other variants. By testing marginal epistatic effects, we can identify candidate variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with standard epistatic mapping procedures. Our method is based on a variance component model, and relies on a recently developed variance component estimation method for efficient parameter inference and p-value computation. We refer to our method as the "MArginal ePIstasis Test", or MAPIT. With simulations, we show how MAPIT can be used to estimate and test marginal epistatic effects, produce calibrated test statistics under the null, and facilitate the detection of pairwise epistatic interactions. We further illustrate the benefits of MAPIT in a QTL mapping study by analyzing the gene expression data of over 400 individuals from the GEUVADIS consortium.
Affiliation(s)
- Lorin Crawford
- Department of Biostatistics, Brown University, Providence, Rhode Island, United States of America
- Center for Statistical Sciences, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
| | - Ping Zeng
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Sayan Mukherjee
- Department of Statistical Science, Duke University, Durham, North Carolina, United States of America
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Department of Mathematics, Duke University, Durham, North Carolina, United States of America
- Department of Bioinformatics & Biostatistics, Duke University, Durham, North Carolina, United States of America
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
|
33
|
van der Meer D, Hoekstra PJ, van Donkelaar M, Bralten J, Oosterlaan J, Heslenfeld D, Faraone SV, Franke B, Buitelaar JK, Hartman CA. Predicting attention-deficit/hyperactivity disorder severity from psychosocial stress and stress-response genes: a random forest regression approach. Transl Psychiatry 2017; 7:e1145. [PMID: 28585928 PMCID: PMC5537639 DOI: 10.1038/tp.2017.114] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 11/28/2016] [Revised: 04/24/2017] [Accepted: 04/28/2017] [Indexed: 12/20/2022] Open
Abstract
Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression is well suited to explore this complexity, as it allows for the analysis of many predictors simultaneously, taking into account any higher-order interactions among them. Using random forest regression, we predicted ADHD severity, measured by Conners' Parent Rating Scales, from 686 adolescents and young adults (of which 281 were diagnosed with ADHD). The analysis included 17 374 single-nucleotide polymorphisms (SNPs) across 29 genes previously linked to hypothalamic-pituitary-adrenal (HPA) axis activity, together with information on exposure to 24 individual long-term difficulties or stressful life events. The model explained 12.5% of variance in ADHD severity. The most important SNP, which also showed the strongest interaction with stress exposure, was located in a region regulating the expression of telomerase reverse transcriptase (TERT). Other high-ranking SNPs were found in or near NPSR1, ESR1, GABRA6, PER3, NR3C2 and DRD4. Chronic stressors were more influential than single, severe, life events. Top hits were partly shared with conduct problems. We conclude that random forest regression may be used to investigate how multiple genetic and environmental factors jointly contribute to ADHD. It is able to implicate novel SNPs of interest, interacting with stress exposure, and may explain inconsistent findings in ADHD genetics. This exploratory approach may be best combined with more hypothesis-driven research; top predictors and their interactions with one another should be replicated in independent samples.
Affiliation(s)
- D van der Meer
- Department of Psychiatry, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- K.G. Jebsen Centre for Psychosis Research/Norwegian Centre for Mental Disorder Research (NORMENT), Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - P J Hoekstra
- Department of Psychiatry, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - M van Donkelaar
- Department of Human Genetics and Psychiatry, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Centre, Nijmegen, The Netherlands
| | - J Bralten
- Department of Human Genetics and Psychiatry, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Centre, Nijmegen, The Netherlands
| | - J Oosterlaan
- Department of Clinical Neuropsychology, VU University Amsterdam, Amsterdam, The Netherlands
| | - D Heslenfeld
- Department of Clinical Neuropsychology, VU University Amsterdam, Amsterdam, The Netherlands
| | - S V Faraone
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- Department of Neuroscience and Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
- K.G. Jebsen Centre for Psychiatric Disorders, Department of Biomedicine, University of Bergen, Bergen, Norway
| | - B Franke
- Department of Human Genetics and Psychiatry, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Centre, Nijmegen, The Netherlands
| | - J K Buitelaar
- Karakter Child and Adolescent Psychiatry University Centre, Nijmegen, The Netherlands
- Department of Cognitive Neuroscience, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Centre, Nijmegen, The Netherlands
| | - C A Hartman
- Department of Psychiatry, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
|
34
|
Genetic dissection of yield traits in super hybrid rice Xieyou9308 using both unconditional and conditional genome-wide association mapping. Sci Rep 2017; 7:824. [PMID: 28400567 PMCID: PMC5429764 DOI: 10.1038/s41598-017-00938-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/18/2016] [Accepted: 03/20/2017] [Indexed: 01/03/2023] Open
Abstract
With the development and application of super rice breeding, elite rice hybrids with super high-yielding potential have been widely developed in recent decades in China. Xieyou9308 is one of the most famous super hybrid rice varieties. To uncover the genetic mechanism of Xieyou9308’s high yield potential, a recombinant inbred line (RIL) population derived from a cross of XieqingzaoB and Zhonghui9308 was re-sequenced and investigated for grain yield (GYD) and its three component traits: number of panicles per plant (NP), number of filled grains per panicle (NFGP), and grain weight (GW). Unconditional and conditional genome-wide association analyses, based on a linear mixed model with epistasis and gene-environment interaction effects, were conducted using ~0.7 million identified SNPs. Six, four, seven, and seven QTSs were identified for GYD, NP, NFGP, and GW, respectively, with accumulated explanatory heritability varying from 43.06% to 48.36%; additive-by-environment interactions were detected for GYD, and some minor epistases were detected for NP and NFGP. Further, conditional genetic mapping analysis for GYD given its three components revealed several novel QTSs associated with yield that were suppressed in our unconditional mapping analysis.
|
35
|
Goudey B, Abraham G, Kikianty E, Wang Q, Rawlinson D, Shi F, Haviv I, Stern L, Kowalczyk A, Inouye M. Interactions within the MHC contribute to the genetic architecture of celiac disease. PLoS One 2017; 12:e0172826. [PMID: 28282431 PMCID: PMC5345796 DOI: 10.1371/journal.pone.0172826] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 11/20/2016] [Accepted: 02/10/2017] [Indexed: 01/04/2023] Open
Abstract
Interaction analysis of GWAS can detect signal that would be ignored by single variant analysis, yet few robust interactions in humans have been detected. Recent work has highlighted interactions in the MHC region between known HLA risk haplotypes for various autoimmune diseases. To better understand the genetic interactions underlying celiac disease (CD), we have conducted exhaustive genome-wide scans for pairwise interactions in five independent CD case-control studies, using a rapid model-free approach to examine over 500 billion SNP pairs in total. We found 14 independent interaction signals within the MHC region that achieved stringent replication criteria across multiple studies and were independent of known CD risk HLA haplotypes. The strongest independent CD interaction signal corresponded to genes in the HLA class III region, in particular PRRC2A and GPANK1/C6orf47, which are known to contain variants for non-Hodgkin's lymphoma and early menopause, co-morbidities of celiac disease. Replicable evidence for statistical interaction outside the MHC was not observed. Both within and between European populations, we observed striking consistency of two-locus models and model distribution. Within the UK population, models of CD based on both interactions and additive single-SNP effects increased explained CD variance by approximately 1% over those of single SNPs. The interactions signal detected across the five cohorts indicates the presence of novel associations in the MHC region that cannot be detected using additive models. Our findings have implications for the determination of genetic architecture and, by extension, the use of human genetics for validation of therapeutic targets.
Affiliation(s)
- Benjamin Goudey
- NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Department of Computing and Information Systems, The University of Melbourne, Parkville, Victoria, Australia
- IBM Research, Australia, Level 5, Carlton, Victoria, Australia
| | - Gad Abraham
- Centre for Systems Genomics, The University of Melbourne, Parkville, Victoria, Australia
- School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia
- Department of Pathology, The University of Melbourne, Parkville, Victoria, Australia
| | - Eder Kikianty
- Department of Mathematics, University of Johannesburg, Auckland Park, South Africa
| | - Qiao Wang
- NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
| | - Dave Rawlinson
- NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
| | - Fan Shi
- NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
| | - Izhak Haviv
- Faculty of Medicine, Bar Ilan University, Safed, Israel
| | - Linda Stern
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
| | - Adam Kowalczyk
- NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Center for Neural Engineering, The University of Melbourne, Parkville, Victoria, Australia
| | - Michael Inouye
- Centre for Systems Genomics, The University of Melbourne, Parkville, Victoria, Australia
- School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia
- Department of Pathology, The University of Melbourne, Parkville, Victoria, Australia
|
36
|
Luo X, Ding Y, Zhang L, Yue Y, Snyder JH, Ma C, Zhu J. Genomic Prediction of Genotypic Effects with Epistasis and Environment Interactions for Yield-Related Traits of Rapeseed (Brassica napus L.). Front Genet 2017; 8:15. [PMID: 28270831 PMCID: PMC5318398 DOI: 10.3389/fgene.2017.00015] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 06/21/2016] [Accepted: 02/03/2017] [Indexed: 11/16/2022] Open
Abstract
Oilseed rape (Brassica napus) is an economically important oil crop, yet the genetic architecture of its complex traits remains largely unknown. Here, a genome-wide association study was conducted for eight yield-related traits to dissect the genetic architecture of additive, dominance, and epistasis effects and their environment interactions. Additionally, the optimal genotype combination and the breeding values of the superior line, superior hybrid, and existing best line in the mapping population were predicted for each trait in two environments based on the predicted genotypic effects. As a result, 17 significant quantitative trait SNPs (QTSs) were identified for the target traits, with total heritability varying from 58.47 to 87.98%, most of which was contributed by dominance, epistasis, and environment-specific effects. The results indicated that non-additive effects contributed substantially to heritability, and that epistasis and environment interactions were important sources of variation for oilseed breeding. Our study facilitates the understanding of the genetic basis of rapeseed yield traits, helps to accelerate rapeseed breeding, and also offers a roadmap for precision plant breeding via marker-assisted selection.
Affiliation(s)
- Xiang Luo
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement in Wuhan, Huazhong Agricultural University Wuhan, China
| | - Yi Ding
- Institute of Bioinformatics, Zhejiang University Hangzhou, China
| | - Linzhong Zhang
- Economic and Technical College, Anhui Agricultural University Hefei, China
| | - Yao Yue
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement in Wuhan, Huazhong Agricultural University Wuhan, China
| | - John H Snyder
- Institute of Bioinformatics, Zhejiang University Hangzhou, China
| | - Chaozhi Ma
- National Key Laboratory of Crop Genetic Improvement, National Center of Rapeseed Improvement in Wuhan, Huazhong Agricultural University Wuhan, China
| | - Jun Zhu
- Institute of Bioinformatics, Zhejiang University Hangzhou, China
|
37
|
Levine ME, Langfelder P, Horvath S. A Weighted SNP Correlation Network Method for Estimating Polygenic Risk Scores. Methods Mol Biol 2017; 1613:277-290. [PMID: 28849564 PMCID: PMC5998804 DOI: 10.1007/978-1-4939-7027-8_10] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 04/27/2023]
Abstract
Polygenic scores are useful for examining the joint associations of genetic markers. However, because traditional methods involve summing weighted allele counts, they may fail to capture the complex nature of biology. Here we describe a network-based method, which we call weighted SNP correlation network analysis (WSCNA), and demonstrate how it could be used to generate meaningful polygenic scores. Using data on human height in a US population of non-Hispanic whites, we illustrate how this method can be used to identify SNP networks from GWAS data, create network-specific polygenic scores, examine network topology to identify hub SNPs, and gain biological insights into complex traits. In our example, we show that this method explains a larger proportion of the variance in human height than traditional polygenic score methods. We also identify hub genes and pathways that have previously been identified as influencing human height. In moving forward, this method may be useful for generating genetic susceptibility measures for other health related traits, examining genetic pleiotropy, identifying at-risk individuals, examining gene score by environmental effects, and gaining a deeper understanding of the underlying biology of complex traits.
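The "summing weighted allele counts" baseline that this abstract contrasts with WSCNA can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the 0/1/2 allele-count coding, and the example weights are all assumptions made for the example.

```python
from typing import Sequence


def additive_polygenic_score(allele_counts: Sequence[int],
                             weights: Sequence[float]) -> float:
    """Traditional additive score: sum over SNPs of weight * allele count.

    allele_counts: copies of the effect allele per SNP (assumed coded 0, 1, 2).
    weights: per-SNP effect sizes, e.g. GWAS betas (hypothetical values below).
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    return sum(w * c for w, c in zip(weights, allele_counts))


# Hypothetical individual genotyped at three SNPs (illustrative values only).
example_score = additive_polygenic_score([0, 1, 2], [0.5, -0.25, 0.125])
```

Each SNP enters the score only through its own term, so no product of allele counts appears and interactions between SNPs cannot contribute to the result.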
Affiliation(s)
- Morgan E Levine
- Department of Human Genetics, University of California, Box 708822, 695 Charles E. Young Drive South, Los Angeles, CA, 90095, USA.
- Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, CA, 90095, USA.
| | - Peter Langfelder
- Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, CA, 90095, USA
| | - Steve Horvath
- Department of Human Genetics, University of California, Box 708822, 695 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Department of Biostatistics, University of California, Los Angeles, CA, 90095, USA
|
38
|
Le Rouzic A, Álvarez-Castro JM. Epistasis-Induced Evolutionary Plateaus in Selection Responses. Am Nat 2016; 188:E134-E150. [DOI: 10.1086/688893] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/03/2022]
|
39
|
Abstract
Genes encode components of coevolved and interconnected networks. The effect of genotype on phenotype therefore depends on genotypic context through gene interactions known as epistasis. Epistasis is important in predicting phenotype from genotype for an individual. It is also examined in population studies to identify genetic risk factors in complex traits and to predict evolution under selection. Paradoxically, the effects of genotypic context in individuals and populations are distinct and sometimes contradictory. We argue that predicting phenotype from genotype for individuals based on population studies is difficult and, especially in human genetics, likely to result in underestimating the effects of genotypic context.
Affiliation(s)
- Timothy B Sackton
- Informatics Group, 38 Oxford Street, Harvard University, Cambridge, MA 02138, USA
| | - Daniel L Hartl
- Department of Organismic and Evolutionary Biology, 16 Divinity Avenue, Harvard University, Cambridge, MA 02138, USA.
|
40
|
Exhaustive Genome-Wide Search for SNP-SNP Interactions Across 10 Human Diseases. G3-GENES GENOMES GENETICS 2016; 6:2043-50. [PMID: 27185397 PMCID: PMC4938657 DOI: 10.1534/g3.116.028563] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022]
Abstract
The identification of statistical SNP-SNP interactions may help explain the genetic etiology of many human diseases, but exhaustive genome-wide searches for these interactions have been difficult, due to a lack of power in most datasets. We aimed to use data from the Resource for Genetic Epidemiology Research on Adult Health and Aging (GERA) study to search for SNP-SNP interactions associated with 10 common diseases. FastEpistasis and BOOST were used to evaluate all pairwise interactions among approximately N = 300,000 single nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) ≥ 0.15, for the dichotomous outcomes of allergic rhinitis, asthma, cardiac disease, depression, dermatophytosis, type 2 diabetes, dyslipidemia, hemorrhoids, hypertensive disease, and osteoarthritis. A total of N = 45,171 subjects were included after quality control steps were applied. These data were divided into discovery and replication subsets; the discovery subset had > 80% power, under selected models, to detect genome-wide significant interactions (P < 10^-12). Interactions were also evaluated for enrichment in particular SNP features, including functionality, prior disease relevancy, and marginal effects. No interaction in any disease was significant in both the discovery and replication subsets. Enrichment analysis suggested that, for some outcomes, interactions involving SNPs with marginal effects were more likely to be nominally replicated, compared to interactions without marginal effects. If SNP-SNP interactions play a role in the etiology of the studied conditions, they likely have weak effect sizes, involve lower-frequency variants, and/or involve complex models of interaction that are not captured well by the methods that were utilized.
|
41
|
Murk W, DeWan AT. Genome-wide search identifies a gene-gene interaction between 20p13 and 2q14 in asthma. BMC Genet 2016; 17:102. [PMID: 27387956 PMCID: PMC4936310 DOI: 10.1186/s12863-016-0376-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/04/2016] [Accepted: 05/20/2016] [Indexed: 12/11/2022] Open
Abstract
Background: Many studies have attempted to identify gene-gene interactions affecting asthma susceptibility. However, these studies have typically used candidate gene approaches in limiting the genetic search space, and there have been few searches for gene-gene interactions on a genome-wide scale. We aimed to conduct a genome-wide gene-gene interaction study for asthma, using data from the GABRIEL Consortium. Results: A two-stage study design was used, including a screening analysis (N = 1625 subjects) and a follow-up analysis (N = 5264 subjects). In the screening analysis, all pairwise interactions among 301,547 SNPs were evaluated, encompassing a total of 4.55 × 10^10 interactions. Those with a screening interaction p-value < 10^-5 were evaluated in the follow-up analysis. No interaction selected from the screening analysis met strict statistical significance in the follow-up (p-value < 1.45 × 10^-7). However, the top-ranked interaction (rs910652 [20p13] × rs11684871 [2q14]) in the follow-up (p-value = 1.58 × 10^-6) was significant in one component of a replication analysis. This interaction was notable in that rs910652 is located within 78 kilobases of ADAM33, which is one of the most well studied asthma susceptibility genes. In addition, rs11684871 is located in or near GLI2, which may have biologically relevant roles in asthma. Conclusions: Using a genome-wide approach, we identified and found suggestive evidence of replication for a gene-gene interaction in asthma involving loci that are potentially highly relevant in asthma pathogenesis.
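The interaction count quoted in this abstract follows from counting unordered SNP pairs, n(n − 1)/2. The sketch below reproduces that arithmetic; the function name is hypothetical, and the Bonferroni-style threshold at a family-wise level of 0.05 is an assumption added for illustration only, since the abstract does not say how its significance thresholds were derived.

```python
def pairwise_interaction_count(n_snps: int) -> int:
    """Number of unordered SNP pairs: n * (n - 1) / 2."""
    return n_snps * (n_snps - 1) // 2


# 301,547 SNPs is the figure reported in the abstract.
n_pairs = pairwise_interaction_count(301_547)

# Illustrative assumption: Bonferroni correction at a family-wise level of 0.05.
bonferroni_threshold = 0.05 / n_pairs

print(f"{n_pairs:,} pairs (about {n_pairs:.2e})")
print(f"Bonferroni threshold at 0.05: {bonferroni_threshold:.2e}")
```

The pair count comes to 45,465,145,831, which rounds to the 4.55 × 10^10 reported above.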
Affiliation(s)
- William Murk
- Department of Chronic Disease Epidemiology, Yale School of Public Health, 60 College St., New Haven, CT, 06510, USA
| | - Andrew T DeWan
- Department of Chronic Disease Epidemiology, Yale School of Public Health, 60 College St., New Haven, CT, 06510, USA.
|
42
|
Local Joint Testing Improves Power and Identifies Hidden Heritability in Association Studies. Genetics 2016; 203:1105-16. [PMID: 27182951 DOI: 10.1534/genetics.116.188292] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/17/2016] [Accepted: 04/27/2016] [Indexed: 12/19/2022] Open
Abstract
There is mounting evidence that complex human phenotypes are highly polygenic, with many loci harboring multiple causal variants, yet most genetic association studies examine each SNP in isolation. While this has led to the discovery of thousands of disease associations, discovered variants account for only a small fraction of disease heritability. Alternative multi-SNP methods have been proposed, but issues such as multiple-testing correction, sensitivity to genotyping error, and optimization for the underlying genetic architectures remain. Here we describe a local joint-testing procedure, complete with multiple-testing correction, that leverages a genetic phenomenon we call linkage masking wherein linkage disequilibrium between SNPs hides their signal under standard association methods. We show that local joint testing on the original Wellcome Trust Case Control Consortium (WTCCC) data set leads to the discovery of 22 associated loci, 5 more than the marginal approach. These loci were later found in follow-up studies containing thousands of additional individuals. We find that these loci significantly increase the heritability explained by genome-wide significant associations in the WTCCC data set. Furthermore, we show that local joint testing in a cis-expression QTL (eQTL) study of the gEUVADIS data set increases the number of genes containing significant eQTL by 10.7% over marginal analyses. Our multiple-hypothesis correction and joint-testing framework are implemented in a Python software package called Jester, available at github.com/brielin/Jester.
|
43
|
Percival CJ, Liberton DK, Pardo‐Manuel de Villena F, Spritz R, Marcucio R, Hallgrímsson B. Genetics of murine craniofacial morphology: diallel analysis of the eight founders of the Collaborative Cross. J Anat 2016; 228:96-112. [PMID: 26426826 PMCID: PMC4694168 DOI: 10.1111/joa.12382] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Accepted: 08/20/2015] [Indexed: 11/28/2022] Open
Abstract
Using eight inbred founder strains of the mouse Collaborative Cross (CC) project and their reciprocal F1 hybrids, we quantified variation in craniofacial morphology across mouse strains, explored genetic contributions to craniofacial variation that distinguish the founder strains, and tested whether specific or summary measures of craniofacial shape display stronger additive genetic contributions. This study thus provides critical information about phenotypic diversity among CC founder strains and about the genetic contributions to this phenotypic diversity, which is relevant to understanding the basis of variation in standard laboratory strains and natural populations. Craniofacial shape was quantified as a series of size-adjusted linear dimensions (RDs) and by principal components (PC) analysis of morphological landmarks captured from computed tomography images from 62 of the 64 reciprocal crosses of the CC founder strains. We first identified aspects of skull morphology that vary between these phenotypically 'normal' founder strains and that are defining characteristics of these strains. We estimated the contributions of additive and various non-additive genetic factors to phenotypic variation using diallel analyses of a subset of these strongly differing RDs and the first eight PCs of skull shape variation. We find little difference in the genetic contributions to RD measures and PC scores, suggesting fundamental similarities in the magnitude of genetic contributions to both specific and summary measures of craniofacial phenotypes. Our results indicate that there are stronger additive genetic effects associated with defining phenotypic characteristics of specific founder strains, suggesting these distinguishing measures are good candidates for use in genotype-phenotype association studies of CC mice. 
Our results add significantly to understanding of genotype-phenotype associations in the skull, which serve as a foundation for modeling the origins of medically and evolutionarily relevant variation.
Affiliation(s)
- Christopher J. Percival
- Alberta Children's Hospital Institute for Child and Maternal Health, University of Calgary, Calgary, AB, Canada
- The McCaig Bone and Joint Institute, University of Calgary, Calgary, AB, Canada
- Department of Cell Biology and Anatomy, University of Calgary, Calgary, AB, Canada
| | - Denise K. Liberton
- The McCaig Bone and Joint Institute, University of Calgary, Calgary, AB, Canada
- Department of Cell Biology and Anatomy, University of Calgary, Calgary, AB, Canada
- Present address: National Institute of Dental and Craniofacial Research, Bethesda, MD, USA
| | | | - Richard Spritz
- Human Medical Genetics and Genomics Program, University of Colorado School of Medicine, Aurora, CO, USA
| | - Ralph Marcucio
- The Orthopaedic Trauma Institute, Department of Orthopaedic Surgery, UCSF School of Medicine, San Francisco, CA, USA
| | - Benedikt Hallgrímsson
- Alberta Children's Hospital Institute for Child and Maternal Health, University of Calgary, Calgary, AB, Canada
- The McCaig Bone and Joint Institute, University of Calgary, Calgary, AB, Canada
- Department of Cell Biology and Anatomy, University of Calgary, Calgary, AB, Canada
|
44
|
Nazarian A, Gezan SA. Integrating Nonadditive Genomic Relationship Matrices into the Study of Genetic Architecture of Complex Traits. J Hered 2015; 107:153-62. [PMID: 26712858 DOI: 10.1093/jhered/esv096] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/08/2015] [Accepted: 11/05/2015] [Indexed: 01/22/2023] Open
Abstract
The study of genetic architecture of complex traits has been dramatically influenced by implementing genome-wide analytical approaches during recent years. Of particular interest are genomic prediction strategies which make use of genomic information for predicting phenotypic responses instead of detecting trait-associated loci. In this work, we present the results of a simulation study to improve our understanding of the statistical properties of estimation of genetic variance components of complex traits, and of additive, dominance, and genetic effects through best linear unbiased prediction methodology. Simulated dense marker information was used to construct genomic additive and dominance matrices, and multiple alternative pedigree- and marker-based models were compared to determine if including a dominance term into the analysis may improve the genetic analysis of complex traits. Our results showed that a model containing a pedigree- or marker-based additive relationship matrix along with a pedigree-based dominance matrix provided the best partitioning of genetic variance into its components, especially when some degree of true dominance effects was expected to exist. Also, we noted that the use of a marker-based additive relationship matrix along with a pedigree-based dominance matrix had the best performance in terms of accuracy of correlations between true and estimated additive, dominance, and genetic effects.
Affiliation(s)
- Alireza Nazarian
- School of Forest Resources & Conservation, University of Florida, 363 Newins-Ziegler Hall, P.O. Box 110410, Gainesville, FL 32611-0410
| | - Salvador A Gezan
- School of Forest Resources & Conservation, University of Florida, 363 Newins-Ziegler Hall, P.O. Box 110410, Gainesville, FL 32611-0410
|
45
|
Wang J, Joshi T, Valliyodan B, Shi H, Liang Y, Nguyen HT, Zhang J, Xu D. A Bayesian model for detection of high-order interactions among genetic variants in genome-wide association studies. BMC Genomics 2015; 16:1011. [PMID: 26607428 PMCID: PMC4660815 DOI: 10.1186/s12864-015-2217-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 07/31/2015] [Accepted: 11/16/2015] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND: A central question for disease studies and crop improvement is how genetic variants drive phenotypes. Genome Wide Association Study (GWAS) provides a powerful tool for characterizing the genotype-phenotype relationships in complex traits and diseases. Epistasis (gene-gene interaction), including high-order interaction among more than two genes, often plays important roles in complex traits and diseases, but current GWAS analysis usually just focuses on additive effects of single nucleotide polymorphisms (SNPs). The lack of effective computational modelling of high-order functional interactions often leads to significant under-utilization of GWAS data. RESULTS: We have developed a novel Bayesian computational method with a Markov Chain Monte Carlo (MCMC) search, and implemented the method as a Bayesian High-order Interaction Toolkit (BHIT) for detecting epistatic interactions among SNPs. BHIT first builds a Bayesian model on both continuous data and discrete data, which is capable of detecting high-order interactions in SNPs related to case-control or quantitative phenotypes. We also developed a pipeline that enables users to apply BHIT to different species in different use cases. CONCLUSIONS: Using both simulation data and soybean nutritional seed composition studies on oil content and protein content, BHIT effectively detected some high-order interactions associated with phenotypes, and it outperformed a number of other available tools. BHIT is freely available for academic users at http://digbio.missouri.edu/BHIT/.
Affiliation(s)
- Juexin Wang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China.
- Department of Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA.
| | - Trupti Joshi
- Department of Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA.
| | - Babu Valliyodan
- Division of Plant Sciences and National Center for Soybean Biotechnology (NCSB), University of Missouri, Columbia, MO, USA.
| | - Haiying Shi
- Division of Plant Sciences and National Center for Soybean Biotechnology (NCSB), University of Missouri, Columbia, MO, USA.
| | - Yanchun Liang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China.
- Department of Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA.
| | - Henry T Nguyen
- Division of Plant Sciences and National Center for Soybean Biotechnology (NCSB), University of Missouri, Columbia, MO, USA.
| | - Jing Zhang
- Department of Statistics, Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA.
- Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, USA.
| | - Dong Xu
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China.
- Department of Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA.
|
46
|
Porth I, Klápště J, McKown AD, La Mantia J, Guy RD, Ingvarsson PK, Hamelin R, Mansfield SD, Ehlting J, Douglas CJ, El-Kassaby YA. Evolutionary Quantitative Genomics of Populus trichocarpa. PLoS One 2015; 10:e0142864. [PMID: 26599762 PMCID: PMC4658102 DOI: 10.1371/journal.pone.0142864] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 07/23/2015] [Accepted: 10/27/2015] [Indexed: 11/23/2022] Open
Abstract
Forest trees generally show high levels of local adaptation and efforts focusing on understanding adaptation to climate will be crucial for species survival and management. Here, we address fundamental questions regarding the molecular basis of adaptation in undomesticated forest tree populations to past climatic environments by employing an integrative quantitative genetics and landscape genomics approach. Using this comprehensive approach, we studied the molecular basis of climate adaptation in 433 Populus trichocarpa (black cottonwood) genotypes originating across western North America. Variation in 74 field-assessed traits (growth, ecophysiology, phenology, leaf stomata, wood, and disease resistance) was investigated for signatures of selection (comparing QST-FST) using clustering of individuals by climate of origin (temperature and precipitation). 29,354 SNPs were investigated employing three different outlier detection methods and marker-inferred relatedness was estimated to obtain the narrow-sense estimate of population differentiation in wild populations. In addition, we compared our results with previously assessed selection of candidate SNPs using the 25 topographical units (drainages) across the P. trichocarpa sampling range as population groupings. Narrow-sense QST for 53% of distinct field traits was significantly divergent from expectations of neutrality (indicating adaptive trait variation); 2,855 SNPs showed signals of diversifying selection and of these, 118 SNPs (within 81 genes) were associated with adaptive traits (based on significant QST). Many SNPs were putatively pleiotropic for functionally uncorrelated adaptive traits, such as autumn phenology, height, and disease resistance. Evolutionary quantitative genomics in P. trichocarpa provides an enhanced understanding regarding the molecular basis of climate-driven selection in forest trees and we highlight that important loci underlying adaptive trait variation also show relationship to climate of origin. We consider our approach the most comprehensive, as it uncovers the molecular mechanisms of adaptation using multiple methods and tests. We also provide a detailed outline of the required analyses for studying adaptation to the environment in a population genomics context to better understand the species’ potential adaptive capacity to future climatic scenarios.
Affiliation(s)
- Ilga Porth
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Département des Sciences du Bois et de la Forêt, Faculté de Foresterie, de Géographie et de Géomatique, Université Laval, Québec, QC, G1V 0A6 Canada
| | - Jaroslav Klápště
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Department of Genetics and Physiology of Forest Trees, Czech University of Life Sciences, Prague, 165 21, Czech Republic
| | - Athena D. McKown
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Jonathan La Mantia
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Corn, Soybean and Wheat Quality Research Unit, United States Department of Agriculture, Wooster, Ohio, 44691 United States of America
| | - Robert D. Guy
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Pär K. Ingvarsson
- Department of Ecology and Environmental Science, Umeå University, Umeå, SE-901 87, Sweden
| | - Richard Hamelin
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Shawn D. Mansfield
- Department of Wood Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Jürgen Ehlting
- Department of Biology and Centre for Forest Biology, University of Victoria, Victoria, BC V8W 3N5, Canada
| | - Carl J. Douglas
- Department of Botany, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Yousry A. El-Kassaby
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
|
47
|
De Kort H, Vander Mijnsbrugge K, Vandepitte K, Mergeay J, Ovaskainen O, Honnay O. Evolution, plasticity and evolving plasticity of phenology in the tree species Alnus glutinosa. J Evol Biol 2015; 29:253-64. [DOI: 10.1111/jeb.12777] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 06/16/2015] [Revised: 10/13/2015] [Accepted: 10/16/2015] [Indexed: 12/12/2022]
Affiliation(s)
- H. De Kort
- Plant Conservation and Population Biology; Biology Department; University of Leuven; Heverlee Belgium
| | - K. Vander Mijnsbrugge
- Research Institute for Nature and Forest; Geraardsbergen Belgium
- Agency for Nature and Forest; Brussels Belgium
| | - K. Vandepitte
- Plant Conservation and Population Biology; Biology Department; University of Leuven; Heverlee Belgium
| | - J. Mergeay
- Research Institute for Nature and Forest; Geraardsbergen Belgium
| | - O. Ovaskainen
- Department of Biosciences; University of Helsinki; Helsinki Finland
| | - O. Honnay
- Plant Conservation and Population Biology; Biology Department; University of Leuven; Heverlee Belgium
|
48
|
Closing the translational gap between mutant mouse models and the clinical reality of psychotic illness. Neurosci Biobehav Rev 2015; 58:19-35. [DOI: 10.1016/j.neubiorev.2015.01.016] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 11/16/2014] [Revised: 01/07/2015] [Accepted: 01/12/2015] [Indexed: 02/03/2023]
|
49
|
The Nature of Genetic Variation for Complex Traits Revealed by GWAS and Regional Heritability Mapping Analyses. Genetics 2015; 201:1601-13. [PMID: 26482794 DOI: 10.1534/genetics.115.177220] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 04/10/2015] [Accepted: 10/09/2015] [Indexed: 02/08/2023] Open
Abstract
We use computer simulations to investigate the amount of genetic variation for complex traits that can be revealed by single-SNP genome-wide association studies (GWAS) or regional heritability mapping (RHM) analyses based on full genome sequence data or SNP chips. We model a large population subject to mutation, recombination, selection, and drift, assuming a pleiotropic model of mutations sampled from a bivariate distribution of effects of mutations on a quantitative trait and fitness. The pleiotropic model investigated, in contrast to previous models, implies that common mutations of large effect are responsible for most of the genetic variation for quantitative traits, except when the trait is fitness itself. We show that GWAS applied to the full sequence increases the number of QTL detected by as much as 50% compared to the number found with SNP chips but only modestly increases the amount of additive genetic variance explained. Even with full sequence data, the total amount of additive variance explained is generally below 50%. Using RHM on the full sequence data, a slightly larger number of QTL are detected than by GWAS if the same probability threshold is assumed, but these QTL explain a slightly smaller amount of genetic variance. Our results also suggest that most of the missing heritability is due to the inability to detect variants of moderate effect (∼0.03-0.3 phenotypic SDs) segregating at substantial frequencies. Very rare variants, which are more difficult to detect by GWAS, are expected to contribute little genetic variation, so their eventual detection is less relevant for resolving the missing heritability problem.
|
50
|
Remington DL. Alleles versus mutations: Understanding the evolution of genetic architecture requires a molecular perspective on allelic origins. Evolution 2015; 69:3025-38. [DOI: 10.1111/evo.12775] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 10/13/2014] [Revised: 07/06/2015] [Accepted: 09/08/2015] [Indexed: 01/02/2023]
Affiliation(s)
- David L. Remington
- Department of Biology; University of North Carolina at Greensboro; Greensboro North Carolina 27402
|