1
Chu BB, Gu J, Chen Z, Morrison T, Candès E, He Z, Sabatti C. Second-order group knockoffs with applications to genome-wide association studies. Bioinformatics (Oxford, England) 2024; 40:btae580. [PMID: 39340798 DOI: 10.1093/bioinformatics/btae580] [Received: 03/08/2024] [Revised: 08/15/2024] [Accepted: 09/24/2024] [Indexed: 09/30/2024]
Abstract
MOTIVATION: Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and it also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants that influence traits of medical relevance.
RESULTS: While conditional testing can be both more powerful and more precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors. This impasse can be overcome by shifting the object of inference from single variables to groups of correlated variables. To achieve this, it is necessary to construct "group knockoffs." While successful examples are already documented in the literature, this paper substantially expands the set of algorithms and software for group knockoffs. We focus in particular on second-order knockoffs, for which we describe correlation matrix approximations that are appropriate for GWAS data and that result in considerable computational savings. We illustrate the effectiveness of the proposed methods with simulations and with the analysis of albuminuria data from the UK Biobank.
AVAILABILITY AND IMPLEMENTATION: The described algorithms are implemented in the open-source Julia package Knockoffs.jl. R and Python wrappers are available as the knockoffsr and knockoffspy packages.
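As a rough illustration of how a knockoff-based selection can carry a false discovery rate guarantee, the sketch below implements the standard knockoff+ thresholding rule from the general knockoffs literature. It is not taken from this paper or from the Knockoffs.jl, knockoffsr, or knockoffspy APIs; the function names and the example statistics are hypothetical. It assumes each variable (or group) already has a feature statistic `w[j]`, where large positive values favor the original variable over its knockoff.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t such that (1 + #{w_j <= -t}) / max(1, #{w_j >= t}) <= q."""
    # Candidate thresholds are the distinct nonzero magnitudes of the statistics.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Negative statistics act as a proxy count for false positives.
        false_proxy = 1 + sum(1 for x in w if x <= -t)
        selected = sum(1 for x in w if x >= t)
        if false_proxy / max(1, selected) <= q:
            return t
    return float("inf")  # no threshold meets the target level


def knockoff_select(w, q):
    """Indices of variables (or groups) selected at target FDR level q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical feature statistics for ten variables (or groups).
w_example = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.2, 3.5]
selected_example = knockoff_select(w_example, q=0.25)
```

With group knockoffs, the same rule would be applied to one statistic per group rather than per variable, so that a selection refers to a set of correlated variants.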
Affiliation(s)
- Benjamin B Chu
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Jiaqi Gu
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94035, USA
- Zhaomeng Chen
  - Department of Statistics, Stanford University, Stanford, CA, 94035, USA
- Tim Morrison
  - Department of Statistics, Stanford University, Stanford, CA, 94035, USA
- Emmanuel Candès
  - Department of Statistics, Stanford University, Stanford, CA, 94035, USA
  - Department of Mathematics, Stanford University, Stanford, CA, 94035, USA
- Zihuai He
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94035, USA
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94035, USA
- Chiara Sabatti
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
  - Department of Statistics, Stanford University, Stanford, CA, 94035, USA
2
Gardiner K, Zhang X, Xing L. BLESS: bagged logistic regression for biomarker identification. Front Genet 2024; 15:1336891. [PMID: 39319317 PMCID: PMC11419974 DOI: 10.3389/fgene.2024.1336891] [Received: 11/11/2023] [Accepted: 07/31/2024] [Indexed: 09/26/2024] [Open Access]
Abstract
The traditional single nucleotide polymorphism (SNP)-wise approach in genome-wide association studies examines the marginal association between each SNP and the outcome separately and applies multiple testing adjustments to the resulting p-values to reduce false positives. However, this approach suffers from a lack of power in identifying biomarkers. We design an ensemble machine learning approach that aggregates results from logistic regression models fitted on multiple subsamples, which helps to identify biomarkers from high-dimensional genomic data. We use different methods to analyze a genome-wide association study from the Alzheimer's Disease Neuroimaging Initiative. The SNP-wise approach does not identify any significant signal, while our novel approach provides a list of ranked SNPs associated with the cognitive functions of interest.
Affiliation(s)
- Kyle Gardiner
  - Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, Canada
- Xuekui Zhang
  - Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
- Li Xing
  - Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, Canada
3
Neamatzadeh H, Dastgheib SA, Mazaheri M, Masoudi A, Shiri A, Omidi A, Rahmani A, Golshan-Tafti A, Aghasipour M, Yeganegi M, Bahrami M, Aghili K, Khajehnoori S, Mosavi Jarrahi A. Hardy-Weinberg Equilibrium in Meta-Analysis Studies and Large-Scale Genomic Sequencing Era. Asian Pac J Cancer Prev 2024; 25:2229-2235. [PMID: 39068553 PMCID: PMC11480592 DOI: 10.31557/apjcp.2024.25.7.2229] [Received: 01/23/2024] [Indexed: 07/30/2024] [Open Access]
Abstract
The Hardy-Weinberg Equilibrium (HWE) is a fundamental principle employed in the analysis of genetic data, encompassing studies of meta-analysis and genomic sequencing. It has been demonstrated that HWE possesses the property of transitivity, wherein a multi-allelic polymorphism in equilibrium will persist in its equilibrium state even when alleles are deleted or combined. Nonetheless, the practice of filtering loci that do not adhere to HWE has been observed to impact the inference of population genetics within RADseq datasets. In response to this concern, the Robust Unified Test for HWE (RUTH) has been devised to consider population structure and genotype uncertainty, thereby offering a more precise evaluation of the quality of genotype data. Furthermore, deviations from HWE, such as extreme heterozygote excess, can be effectively utilized to identify genotyping errors or to pinpoint the presence of rare recessive disease-causing variants. In summary, it is evident that HWE holds immense significance in the field of genetic analysis, and its application in meta-analysis studies and genomic sequencing can yield invaluable insights into the intricacies of population structure and the genetics of diseases.
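For readers unfamiliar with how a departure from HWE, such as heterozygote excess, is typically quantified, the following is a minimal sketch of the classical one-degree-of-freedom chi-square goodness-of-fit test for a biallelic locus. It is a textbook calculation, not the RUTH method mentioned in the abstract, and it assumes a single unstructured population with hard genotype calls. The function name and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of HWE from genotype counts AA, AB, BB (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    # Expected counts under HWE: p^2, 2pq, q^2 times the sample size.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return expected, chi2, p_value


# Hypothetical locus with a strong excess of heterozygotes.
expected_excess, chi2_excess, p_excess = hwe_chi_square(10, 80, 10)
# Hypothetical locus exactly at equilibrium.
expected_eq, chi2_eq, p_eq = hwe_chi_square(25, 50, 25)
```

The asymptotic test is unreliable for rare alleles, where exact tests are usually preferred; the sketch is only meant to make the expected-versus-observed comparison concrete.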
Affiliation(s)
- Hossein Neamatzadeh
  - Mother and Newborn Health Research Center, Shahid Sadoughi University of Medical Sciences, Yazd, Iran.
- Seyed Alireza Dastgheib
  - Department of Medical Genetics, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran.
- Mahta Mazaheri
  - Mother and Newborn Health Research Center, Shahid Sadoughi University of Medical Sciences, Yazd, Iran.
- Ali Masoudi
  - General Practitioner, Shahid Sadoughi University of Medical Sciences, Yazd, Iran.
- Amirmasoud Shiri
  - General Practitioner, Shiraz University of Medical Sciences, Shiraz, Iran.
- Amirhossein Omidi
  - General Practitioner, Shahid Sadoughi University of Medical Sciences, Yazd, Iran.
- Amirhossein Rahmani
  - Department of Plastic Surgery, Iranshahr University of Medical Sciences, Iranshahr, Iran.
- Ahmadreza Golshan-Tafti
  - Student Research Committee, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
- Maryam Aghasipour
  - Department of Cancer Biology, College of Medicine, University of Cincinnati, Ohio, USA.
- Maryam Yeganegi
  - Department of Obstetrics and Gynecology, Iranshahr University of Medical Sciences, Iranshahr, Iran.
- Mohammad Bahrami
  - General Practitioner, Shiraz University of Medical Sciences, Shiraz, Iran.
- Kazem Aghili
  - Department of Radiology, Shahid Rahnamoun Hospital, School of Medicine, Shahid Sadoughi University of Medical Sciences, Yazd, Iran.
- Sahel Khajehnoori
  - Hematology and Oncology Research Center, Shahid Sadoughi University of Medical Sciences, Yazd, Iran.
- Alireza Mosavi Jarrahi
  - Department of Social Medicine, Medical School, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
4
Li Y, Lei H, Wen X, Cao H. A powerful approach to identify replicable variants in genome-wide association studies. Am J Hum Genet 2024; 111:966-978. [PMID: 38701746 PMCID: PMC11080610 DOI: 10.1016/j.ajhg.2024.04.004] [Received: 08/19/2023] [Revised: 04/04/2024] [Accepted: 04/04/2024] [Indexed: 05/05/2024] [Open Access]
Abstract
Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.
Affiliation(s)
- Yan Li
  - School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, Jilin 130022, China; School of Mathematics, Jilin University, Changchun, Jilin 130012, China
- Haochen Lei
  - Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
- Xiaoquan Wen
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Hongyuan Cao
  - Department of Statistics, Florida State University, Tallahassee, FL 32306, USA.
5
He Z, Chu B, Yang J, Gu J, Chen Z, Liu L, Morrison T, Belloy ME, Qi X, Hejazi N, Mathur M, Le Guen Y, Tang H, Hastie T, Ionita-Laza I, Sabatti C, Candès E. Beyond guilty by association at scale: searching for causal variants on the basis of genome-wide summary statistics. bioRxiv: the preprint server for biology 2024:2024.02.28.582621. [PMID: 38464202 PMCID: PMC10925326 DOI: 10.1101/2024.02.28.582621] [Indexed: 03/12/2024]
Abstract
Understanding the causal genetic architecture of complex phenotypes is essential for future research into disease mechanisms and potential therapies. Here, we present a novel framework for genome-wide detection of sets of variants that carry non-redundant information on the phenotypes and are therefore more likely to be causal in a biological sense. Crucially, our framework requires only summary statistics obtained from standard genome-wide marginal association testing. The described approach, implemented in open-source software, is also computationally efficient, requiring less than 15 minutes on a single CPU to perform genome-wide analysis. Through extensive genome-wide simulation studies, we show that the method can substantially outperform the usual two-stage marginal association testing and fine-mapping procedures in precision and recall. In applications to a meta-analysis of ten large-scale genetic studies of Alzheimer's disease (AD), we identified 82 loci associated with AD, including 37 additional loci missed by the conventional GWAS pipeline. The identified putative causal variants achieve state-of-the-art agreement with massively parallel reporter assays and CRISPR-Cas9 experiments. Additionally, we applied the method to a retrospective analysis of 67 large-scale GWAS summary statistics since 2013 for a variety of phenotypes. Results reveal the method's capacity to robustly discover additional loci for polygenic traits and to pinpoint potential causal variants underpinning each locus beyond the conventional GWAS pipeline, contributing to a deeper understanding of complex genetic architectures in post-GWAS analyses.
Affiliation(s)
- Zihuai He
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Benjamin Chu
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- James Yang
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Jiaqi Gu
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Zhaomeng Chen
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Linxi Liu
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Tim Morrison
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Michael E. Belloy
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
- Xinran Qi
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Nima Hejazi
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Maya Mathur
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Pediatrics, Stanford University, Stanford, CA 94305, USA
- Yann Le Guen
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Hua Tang
  - Department of Pediatrics, Stanford University, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Trevor Hastie
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Iuliana Ionita-Laza
  - Department of Biostatistics, Columbia University Mailman School of Public Health, New York, NY 10032, USA
- Chiara Sabatti
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Emmanuel Candès
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
  - Department of Mathematics, Stanford University, Stanford, CA 94305, USA
6
Chen Z, He Z, Chu BB, Gu J, Morrison T, Sabatti C, Candès E. Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression. arXiv 2024:arXiv:2402.12724v1. [PMID: 38463500 PMCID: PMC10925382] [Indexed: 03/12/2024]
Abstract
Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid the release of sensitive genetic information. We extend GhostKnockoffs (He et al., 2022) and introduce variable selection methods based on penalized regression that achieve false discovery rate (FDR) control. We report empirical results in extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease, and evidence a significant improvement in power.
Affiliation(s)
- Zihuai He
  - Department of Neurology and Neurological Sciences, Stanford University
  - Department of Medicine (Biomedical Informatics Research), Stanford University
- Benjamin B Chu
  - Department of Biomedical Data Science, Stanford University
- Jiaqi Gu
  - Department of Neurology and Neurological Sciences, Stanford University
- Chiara Sabatti
  - Department of Statistics, Stanford University
  - Department of Biomedical Data Science, Stanford University
- Emmanuel Candès
  - Department of Statistics, Stanford University
  - Department of Mathematics, Stanford University
7
Zhai W, Zhao A, Wei C, Xu Y, Cui X, Zhang Y, Meng L, Sun L. Undetected Association Between Fatty Acids and Dementia with Lewy Bodies: A Bidirectional Two-Sample Mendelian Randomization Study. J Alzheimers Dis 2024; 100:1083-1097. [PMID: 38995791 DOI: 10.3233/jad-240267] [Indexed: 07/14/2024]
Abstract
Background: Although observational studies indicated connections between fatty acids (FAs) and Alzheimer's disease and dementia, uncertainty persists regarding how these relationships extend to dementia with Lewy bodies (DLB).
Objective: To explore the potential causal relationships between FAs and the development of DLB, thus clarifying these associations using genetic instruments to infer causality.
Methods: We applied a two-sample Mendelian randomization (MR) and multivariable Mendelian randomization (MVMR) approach. Genetic data were obtained from a DLB cohort, comprising 2,591 cases and 4,027 controls of European descent. Eight FAs, including linoleic acid, docosahexaenoic acid, monounsaturated fatty acid, omega-3 fatty acid, omega-6 fatty acid, polyunsaturated fatty acid, saturated fatty acid, and total fatty acid, were procured from a comprehensive GWAS of metabolic biomarkers of UK Biobank, conducted by Nightingale Health in 2020 (met-d), involving 114,999 individuals. Our analysis included inverse-variance weighted, MR-Egger, weighted-median, simple mode, and weighted-mode MR estimates. Cochran's Q-statistics, MR-PRESSO, and MR-Egger intercept test were used to quantify the heterogeneity and horizontal pleiotropy of instrumental variables.
Results: Only linoleic acid showed a significant genetic association with the risk of developing DLB in the univariate MR. The odds ratio for linoleic acid was 1.337 with a 95% confidence interval of 1.019-1.756 (p_IVW = 0.036). Results from the MVMR showed that no FAs were associated with the incidence of DLB.
Conclusions: The results did not support the hypothesis that FAs could reduce the risk of developing DLB. However, elucidating the relationship between FAs and DLB risk holds potential implications for informing dietary recommendations and therapeutic approaches in DLB.
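The reported odds ratio, confidence interval, and p-value can be checked against each other with a short calculation. The sketch below assumes the interval is a standard Wald interval on the log-odds scale and that the p-value is two-sided and normal-based; neither assumption is stated in the abstract, and the function name is hypothetical. It is a consistency check on the published numbers, not a reproduction of the authors' MR analysis.

```python
import math


def p_value_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-odds estimate, standard error, z-score and two-sided p-value."""
    beta = math.log(odds_ratio)
    # A 95% Wald interval spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Values reported for linoleic acid in the univariate MR.
beta_la, se_la, z_la, p_la = p_value_from_or_ci(1.337, 1.019, 1.756)
```

Under these assumptions the recovered p-value rounds to the reported 0.036, so the three published figures are mutually consistent.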
Affiliation(s)
- Weijie Zhai
  - Department of Neurology and Neuroscience Center, The First Hospital of Jilin University, Jilin University, Changchun, China
- Anguo Zhao
  - Department of Urology, The Fourth Affiliated Hospital of Soochow University Medical Center of Soochow University, Suzhou Dushu Lake Hospital, Suzhou, China
- Chunxiao Wei
  - Department of Neurology and Neuroscience Center, The First Hospital of Jilin University, Jilin University, Changchun, China
- Yanjiao Xu
  - Department of Neurology and Neuroscience Center, The First Hospital of Jilin University, Jilin University, Changchun, China
- Xinran Cui
  - Department of Neurology and Neuroscience Center, The First Hospital of Jilin University, Jilin University, Changchun, China
- Yan Zhang
  - Department of Neurology and Neuroscience Center, The First Hospital of Jilin University, Jilin University, Changchun, China
- Lingjie Meng
  - Department of Neurology and Neuroscience Center, The First Hospital of Jilin University, Jilin University, Changchun, China
- Li Sun
  - Department of Neurology and Neuroscience Center, The First Hospital of Jilin University, Jilin University, Changchun, China
8
Cui R, Elzur RA, Kanai M, Ulirsch JC, Weissbrod O, Daly MJ, Neale BM, Fan Z, Finucane HK. Improving fine-mapping by modeling infinitesimal effects. Nat Genet 2024; 56:162-169. [PMID: 38036779 PMCID: PMC11056999 DOI: 10.1038/s41588-023-01597-3] [Received: 10/24/2022] [Accepted: 10/26/2023] [Indexed: 12/02/2023]
Abstract
Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods' posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.
Affiliation(s)
- Ran Cui
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Roy A Elzur
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Masahiro Kanai
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Jacob C Ulirsch
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Omer Weissbrod
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Mark J Daly
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Benjamin M Neale
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Zhou Fan
  - Department of Statistics and Data Science, Yale University, New Haven, CT, USA.
- Hilary K Finucane
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
9
Ma R, Liu Q, Liu Z, Sun X, Jiang X, Hou J, Zhang Y, Wu Y, Cheng M, Dong Z. H19/Mir-130b-3p/Cyp4a14 potentiate the effect of praziquantel on liver in the treatment of Schistosoma japonicum infection. Acta Trop 2023; 247:107012. [PMID: 37659685 DOI: 10.1016/j.actatropica.2023.107012] [Received: 04/29/2023] [Revised: 08/09/2023] [Accepted: 08/30/2023] [Indexed: 09/04/2023]
Abstract
BACKGROUND: Schistosomiasis is a prevalent infectious disease caused by the parasitic trematodes of the genus Schistosoma. Praziquantel (PZQ), a safe and affordable drug, is the recommended oral treatment for schistosomiasis. The main pathologic manifestation of schistosomiasis is liver injury. However, the role and interactions of various RNA molecules in the effect of PZQ on the liver after S. japonicum infection have not been elucidated.
RESULTS: In this study, C57BL/6 mice were randomly divided into the control group, infection group, and PZQ treatment group. Total RNA was extracted from the livers of the mice. High-throughput whole transcriptome sequencing was performed to detect the RNA expression profiles in the three groups. A co-expression gene-interaction network was established based on the significant differentially expressed genes in the PZQ treatment group; messenger RNA (mRNA) Cyp4a14 was identified as a critical hub gene. Furthermore, competitive endogenous RNA networks were constructed by predicting the specific binding relations between mRNA and long noncoding (lnc) RNA and between lncRNA and microRNA (miRNA) of Cyp4a14, suggesting the involvement of the H19/miR-130b-3p/Cyp4a14 regulatory axis. A dual luciferase reporter assay confirmed the specific binding of miR-130b-3p to the Cyp4a14 3'UTR.
CONCLUSIONS: Our findings indicate the involvement of the H19/miR-130b-3p/Cyp4a14 axis in the effect of PZQ on the liver after S. japonicum infection. Moreover, the expression of Cyp4a14 mRNA could be regulated by the binding of miR-130b-3p to the 3'UTR of Cyp4a14. The findings of this study could provide a novel perspective for understanding the host response to PZQ against S. japonicum in the future.
Affiliation(s)
- Rui Ma
  - Department of Health and Disease Management, School of Nursing, Binzhou Medical University, Guanhai Road 346, Yantai, Shandong, 264000, China
- Qiang Liu
  - Department of Anesthesia, Binzhou Medical University Hospital, Binzhou, Shandong, 256600, China
- Zimo Liu
  - Electrocardiogram Room, Yantai Yuhuangding Hospital, Yantai, Shandong, 264000, China
- Xu Sun
  - Department of Health and Disease Management, School of Nursing, Binzhou Medical University, Guanhai Road 346, Yantai, Shandong, 264000, China
- Xinze Jiang
  - Department of Pathogenic Biology, School of Basic Medical Sciences, Binzhou Medical University, Guanhai Road 346, Yantai, Shandong, 264000, China
- Jiangshan Hou
  - Department of Pathogenic Biology, School of Basic Medical Sciences, Binzhou Medical University, Guanhai Road 346, Yantai, Shandong, 264000, China
- Yumei Zhang
  - Department of Pathogenic Biology, School of Basic Medical Sciences, Binzhou Medical University, Guanhai Road 346, Yantai, Shandong, 264000, China
- Yulong Wu
  - Department of Pathogenic Biology, School of Basic Medical Sciences, Binzhou Medical University, Guanhai Road 346, Yantai, Shandong, 264000, China.
- Mei Cheng
  - Department of Health and Disease Management, School of Nursing, Binzhou Medical University, Guanhai Road 346, Yantai, Shandong, 264000, China.
- Zhouyan Dong
  - Department of Pathogenic Biology, School of Basic Medical Sciences, Binzhou Medical University, Guanhai Road 346, Yantai, Shandong, 264000, China.
10
Urbut SM, Koyama S, Hornsby W, Bhukar R, Kheterpal S, Truong B, Selvaraj MS, Neale B, O’Donnell CJ, Peloso GM, Natarajan P. Bayesian multivariate genetic analysis improves translational insights. iScience 2023; 26:107854. [PMID: 37766997 PMCID: PMC10520309 DOI: 10.1016/j.isci.2023.107854] [Received: 01/10/2023] [Revised: 05/15/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023] [Open Access]
Abstract
While lipid traits are known essential mediators of cardiovascular disease, few approaches have taken advantage of their shared genetic effects. We apply a Bayesian multivariate size estimator, mash, to GWAS of four lipid traits in the Million Veterans Program (MVP) and provide posterior mean and local false sign rates for all effects. These estimates borrow information across traits to improve effect size accuracy. We show that controlling local false sign rates accurately and powerfully identifies replicable genetic associations and that multivariate control furthers the ability to explain complex diseases. Our application yields high concordance between independent datasets, more accurately prioritizes causal genes, and significantly improves polygenic prediction beyond state-of-the-art methods by up to 59% for lipid traits. The use of Bayesian multivariate genetic shrinkage has yet to be applied to human quantitative trait GWAS results, and we present a staged approach to prediction on a polygenic scale.
Affiliation(s)
- Sarah M. Urbut
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Satoshi Koyama
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
  - Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- Whitney Hornsby
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
  - Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- Rohan Bhukar
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
  - Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- Sumeet Kheterpal
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Buu Truong
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
  - Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- Margaret S. Selvaraj
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
  - Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- Benjamin Neale
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
  - Department of Medicine Harvard Medical School, Boston, MA 02115, USA
  - Analytic Translational and Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Christopher J. O’Donnell
  - Department of Medicine Harvard Medical School, Boston, MA 02115, USA
  - VA Boston Department of Veterans Affairs, Boston, MA 02130, USA
- Gina M. Peloso
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA 02218, USA
- Pradeep Natarajan
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
  - Department of Medicine Harvard Medical School, Boston, MA 02115, USA
11
Lopez-Ortiz C, Reddy UK, Zhang C, Natarajan P, Nimmakayala P, Benedito VA, Fabian M, Stommel J. QTL and PACE analyses identify candidate genes for anthracnose resistance in tomato. FRONTIERS IN PLANT SCIENCE 2023; 14:1200999. [PMID: 37615029 PMCID: PMC10443646 DOI: 10.3389/fpls.2023.1200999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 07/17/2023] [Indexed: 08/25/2023]
Abstract
Anthracnose, caused by the fungal pathogen Colletotrichum spp., is one of the most significant tomato diseases in the United States and worldwide. No commercial cultivars with anthracnose resistance are available, limiting resistant breeding. Cultivars with genetic resistance would significantly reduce crop losses, reduce the use of fungicides, and lessen the risks associated with chemical application. A recombinant inbred line (RIL) mapping population (N=243) has been made from a cross between the susceptible US28 cultivar and the resistant but semiwild and small-fruited 95L368 to identify quantitative trait loci (QTLs) associated with anthracnose resistance. The RIL population was phenotyped for resistance by inoculating ripe field-harvested tomato fruits with Colletotrichum coccodes for two seasons. In this study, we identified twenty QTLs underlying resistance, with a range of phenotypic variance of 4.5 to 17.2% using a skeletal linkage map and a GWAS. In addition, a QTLseq analysis was performed using deep sequencing of extreme bulks that validated QTL positions identified using traditional mapping and resolved candidate genes underlying various QTLs. We further validated AP2-like ethylene-responsive transcription factor, N-alpha-acetyltransferase (NatA), cytochrome P450, amidase family protein, tetratricopeptide repeat, bHLH transcription factor, and disease resistance protein RGA2-like using PCR allelic competitive extension (PACE) genotyping. PACE assays developed in this study will enable high-throughput screening for use in anthracnose resistance breeding in tomato.
Affiliation(s)
- Carlos Lopez-Ortiz
  - Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, WV, United States
- Umesh K. Reddy
  - Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, WV, United States
- Chong Zhang
  - The Genetic Improvement for Fruits & Vegetables Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD, United States
- Purushothaman Natarajan
  - Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, WV, United States
- Padma Nimmakayala
  - Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, WV, United States
- Matthew Fabian
  - The Genetic Improvement for Fruits & Vegetables Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD, United States
- John Stommel
  - The Genetic Improvement for Fruits & Vegetables Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD, United States
12
Sahana G, Cai Z, Sanchez MP, Bouwman AC, Boichard D. Invited review: Good practices in genome-wide association studies to identify candidate sequence variants in dairy cattle. J Dairy Sci 2023:S0022-0302(23)00357-0. [PMID: 37349208 DOI: 10.3168/jds.2022-22694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 02/01/2023] [Indexed: 06/24/2023]
Abstract
Genotype data from dairy cattle selection programs have greatly facilitated GWAS to identify variants related to economic traits. Results can enhance the accuracy of genomic prediction, analyze more complex models that go beyond additive effects, elucidate the genetic architecture of a trait, and finally, decipher the underlying biology of traits. The entire process, comprising data generation, quality control, statistical analyses, interpretation of association results, and linking results to biology should be designed and executed to minimize the generation of false-positive and false-negative associations and misleading links to biological processes. This review aims to provide general guidelines for data analysis that address data quality control, association tests, adjustment for population stratification, and significance evaluation to improve the reliability of conclusions. We also provide guidance on post-GWAS strategy and the interpretation of results. These guidelines are tailored to dairy cattle, which are characterized by long-range linkage disequilibrium, large half-sib families, and routinely collected phenotypes, requiring different approaches than those applied in human GWAS. We discuss common limitations and challenges that have been overlooked in the analysis and interpretation of GWAS to identify candidate sequence variants in dairy cattle.
Affiliation(s)
- G Sahana
  - Aarhus University, Center for Quantitative Genetic and Genomics, 8830 Tjele, Denmark.
- Z Cai
  - Aarhus University, Center for Quantitative Genetic and Genomics, 8830 Tjele, Denmark
- M P Sanchez
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
- A C Bouwman
  - Wageningen University & Research, Animal Breeding and Genomics, 6700 AH Wageningen, the Netherlands
- D Boichard
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
13
Chu BB, Ko S, Zhou JJ, Jensen A, Zhou H, Sinsheimer JS, Lange K. Multivariate genome-wide association analysis by iterative hard thresholding. Bioinformatics 2023; 39:btad193. [PMID: 37067496 PMCID: PMC10133532 DOI: 10.1093/bioinformatics/btad193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 04/07/2023] [Accepted: 04/13/2023] [Indexed: 04/18/2023]
Abstract
MOTIVATION In a genome-wide association study, analyzing multiple correlated traits simultaneously is potentially superior to analyzing the traits one by one. Standard methods for multivariate genome-wide association study operate marker-by-marker and are computationally intensive. RESULTS We present a sparsity constrained regression algorithm for multivariate genome-wide association study based on iterative hard thresholding and implement it in a convenient Julia package MendelIHT.jl. In simulation studies with up to 100 quantitative traits, iterative hard thresholding exhibits similar true positive rates, smaller false positive rates, and faster execution times than GEMMA's linear mixed models and mv-PLINK's canonical correlation analysis. On UK Biobank data with 470 228 variants, MendelIHT completed a three-trait joint analysis (n=185 656) in 20 h and an 18-trait joint analysis (n=104 264) in 53 h with an 80 GB memory footprint. In short, MendelIHT enables geneticists to fit a single regression model that simultaneously considers the effect of all SNPs and dozens of traits. AVAILABILITY AND IMPLEMENTATION Software, documentation, and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelIHT.jl.
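As a rough illustration of the iterative hard thresholding idea named in this abstract, the following Python sketch runs projected gradient descent for a single quantitative trait under an explicit sparsity constraint. It is not the MendelIHT.jl implementation (which is written in Julia and handles multiple traits); the function names `hard_threshold` and `iht`, the fixed step size, and the iteration count are assumptions made for this example.

```python
import numpy as np

def hard_threshold(b, k):
    """Keep the k largest-magnitude entries of b and set the rest to zero."""
    out = np.zeros_like(b)
    if k > 0:
        keep = np.argpartition(np.abs(b), -k)[-k:]
        out[keep] = b[keep]
    return out

def iht(X, y, k, n_iter=200):
    """Approximately minimize ||y - X b||^2 subject to at most k nonzeros in b.

    Hypothetical single-trait sketch: a gradient step followed by projection
    onto the set of k-sparse vectors.
    """
    # Conservative step size: 1 / (largest singular value of X)^2.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)                 # gradient of 0.5 * ||y - X b||^2
        b = hard_threshold(b - step * grad, k)   # project back to k-sparse vectors
    return b
```

For example, with an identity design matrix the procedure simply keeps the `k` largest responses: `iht(np.eye(5), np.array([0.0, 5.0, 0.0, -4.0, 1.0]), k=2)` retains the entries 5 and -4 and zeroes the rest.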
Affiliation(s)
- Benjamin B Chu
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
- Seyoon Ko
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
  - Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA 90095-1554, United States
- Jin J Zhou
  - Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA 90095-1554, United States
  - Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
- Aubrey Jensen
  - Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA 90095-1554, United States
- Hua Zhou
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
  - Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA 90095-1554, United States
- Janet S Sinsheimer
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
  - Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA 90095-1554, United States
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
- Kenneth Lange
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
  - Department of Statistics at UCLA, Los Angeles, CA 90095-1554, United States
14
Singh V. Current challenges and future implications of exploiting the omics data into nutrigenetics and nutrigenomics for personalized diagnosis and nutrition-based care. Nutrition 2023; 110:112002. [PMID: 36940623 DOI: 10.1016/j.nut.2023.112002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Revised: 01/18/2023] [Accepted: 02/04/2023] [Indexed: 02/12/2023]
Abstract
Nutrigenetics and nutrigenomics, combined with the omics technologies, are a demanding and an increasingly important field in personalizing nutrition-based care to understand an individual's response to nutrition-guided therapy. Omics is defined as the analysis of the large data sets of the biological system featuring transcriptomics, proteomics, and metabolomics and providing new insights into cell regulation. The effect of combining nutrigenetics and nutrigenomics with omics will give insight into molecular analysis, as human nutrition requirements vary per individual. Omics measures modest intraindividual variability and is critical to exploit these data for use in the development of precision nutrition. Omics, combined with nutrigenetics and nutrigenomics, is instrumental in the creation of goals for improving the accuracy of nutrition evaluations. Although dietary-based therapies are provided for various clinical conditions such as inborn errors of metabolism, limited advancement has been done to expand the omics data for a more mechanistic understanding of cellular networks dependent on nutrition-based expression and overall regulation of genes. The greatest challenge remains in the clinical sector to integrate the current data available, overcome the well-established limits of self-reported methods in research, and provide omics data, combined with nutrigenetics and nutrigenomics research, for each individual. Hence, the future seems promising if a design for personalized, nutrition-based diagnosis and care can be implemented practically in the health care sector.
Affiliation(s)
- Varsha Singh
  - Centre for Life Sciences, Chitkara School of Health Sciences, Chitkara University, Punjab, India.
15
Ogrodowicz P, Mikołajczak K, Kempa M, Mokrzycka M, Krajewski P, Kuczyńska A. Genome-wide association study of agronomical and root-related traits in spring barley collection grown under field conditions. FRONTIERS IN PLANT SCIENCE 2023; 14:1077631. [PMID: 36760640 PMCID: PMC9902773 DOI: 10.3389/fpls.2023.1077631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 01/06/2023] [Indexed: 06/18/2023]
Abstract
The root system is a key component for plant survival and productivity. In particular, under stress conditions, developing plants with a better root architecture can ensure productivity. The objectives of this study were to investigate the phenotypic variation of selected root- and yield-related traits in a diverse panel of spring barley genotypes. By performing a genome-wide association study (GWAS), we identified several associations underlying the variations occurring in root- and yield-related traits in response to natural variations in soil moisture. Here, we report the results of the GWAS based on both individual single-nucleotide polymorphism markers and linkage disequilibrium (LD) blocks of markers for 11 phenotypic traits related to plant morphology, grain quality, and root system in a group of spring barley accessions grown under field conditions. We also evaluated the root structure of these accessions by using a nondestructive method based on electrical capacitance. The results showed the importance of two LD-based blocks on chromosomes 2H and 7H in the expression of root architecture and yield-related traits. Our results revealed the importance of the region on the short arm of chromosome 2H in the expression of root- and yield-related traits. This study emphasized the pleiotropic effect of this region with respect to heading time and other important agronomic traits, including root architecture. Furthermore, this investigation provides new insights into the roles played by root traits in the yield performance of barley plants grown under natural conditions with daily variations in soil moisture content.
16
Zhu X, Huang S, Kang W, Chen P, Liu J. Associations between polyunsaturated fatty acid concentrations and Parkinson's disease: A two-sample Mendelian randomization study. Front Aging Neurosci 2023; 15:1123239. [PMID: 36909950 PMCID: PMC9992541 DOI: 10.3389/fnagi.2023.1123239] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/13/2022] [Accepted: 02/02/2023] [Indexed: 02/24/2023]
Abstract
Introduction Observational studies demonstrated controversial effect of polyunsaturated fatty acids (PUFAs) on Parkinson's disease (PD) with limited causality evidence. Randomized control trials showed possible improvement in PD symptoms with PUFA supplement but had small study population and limited intervention time. Methods A two-sample Mendelian randomization was designed to evaluate the causal relevance between PUFAs and PD, using genetic variants of PUFAs as instrumental variables and PD data from the largest genome-wide association study as outcome. Inverse variance weighted (IVW) method was applied to obtain the primary outcome. Mendelian randomization Egger regression, weighted median and weighted mode methods were exploited to assist result analyses. Strict Mendelian randomization and multivariable Mendelian randomization (MVMR) were used to estimate direct effects of PUFAs on PD, eliminating pleiotropic effect. Debiased inverse variance weighted estimator was implemented when weak instrument bias was introduced into the analysis. A variety of sensitivity analyses were utilized to assess validity of the results. Results Our study included 33,674 PD cases and 449,056 controls. Higher plasma level of arachidonic acid (AA) was associated with a 3% increase of PD risk per 1-standard deviation (SD) increase of AA (IVW; Odds ratio (OR)=1.03 [95% confidence interval (CI) 1.01-1.04], P = 2.24E-04). After MVMR (IVW; OR=1.03 [95% CI 1.02-1.04], P =6.15E-08) and deletion of pleiotropic single-nucleotide polymorphisms overlapping with other lipids (IVW; OR=1.03 [95% CI 1.01-1.05], P =5.88E-04), result was still significant. Increased level of eicosapentaenoic acid (EPA) showed possible relevance with increased PD risk after adjustment of pleiotropy (MVMR; OR=1.05 [95% CI 1.01-1.08], P =5.40E-03). Linoleic acid (LA), docosahexaenoic acid (DHA), docosapentaenoic acid (DPA) and alpha-linolenic acid (ALA) were found not causally relevant to PD risk. Various sensitivity analyses verified the validity of our results. In conclusion, our findings from Mendelian randomization suggested that elevated levels of AA and possibly EPA might be linked to a higher risk of PD. No association between PD risk and LA, DHA, DPA, or ALA was found. Discussion The odds ratio for plasma AA and PD risk was weak. It is important to approach our results with caution in clinical practice and to conduct additional studies on the relationship between PUFAs and PD risk.
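For readers unfamiliar with the IVW estimator mentioned in the Methods, a minimal fixed-effect sketch in Python follows. It assumes per-variant summary statistics (the effect of each variant on the exposure and on the outcome, plus the outcome standard error) and independent variants. The function name `ivw_estimate` and its arguments are hypothetical, and the example numbers are made up for illustration, not taken from the study.

```python
import numpy as np

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted (IVW) causal estimate.

    Each variant contributes a ratio estimate beta_outcome / beta_exposure,
    weighted by beta_exposure^2 / se_outcome^2.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)
    weights = bx ** 2 / se ** 2
    ratios = by / bx
    estimate = np.sum(weights * ratios) / np.sum(weights)
    std_err = np.sqrt(1.0 / np.sum(weights))
    return estimate, std_err

# Made-up summary statistics for two variants (illustration only).
est, se = ivw_estimate([0.1, 0.2], [0.02, 0.04], [0.01, 0.01])
# If the outcome effects are log odds ratios, exp(est) is the odds ratio
# per unit increase in the exposure.
odds_ratio = np.exp(est)
```

In this toy input both variants have the same ratio estimate, so the pooled estimate equals that common ratio; with real data the weights determine how much each variant pulls the pooled value.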
Affiliation(s)
- Xue Zhu
  - Department of Neurology and Institute of Neurology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Sijia Huang
  - Department of Neurology and Institute of Neurology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Wenyan Kang
  - Department of Neurology and Institute of Neurology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Peizhan Chen
  - Department of General Surgery, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jun Liu
  - Department of Neurology and Institute of Neurology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Co-innovation Center of Neuroregeneration, Nantong University, Nantong, China
17
Gupta K, Kaur G, Pathak T, Banerjee I. Systematic review and meta-analysis of human genetic variants contributing to COVID-19 susceptibility and severity. Gene 2022; 844:146790. [PMID: 35987511 PMCID: PMC9384365 DOI: 10.1016/j.gene.2022.146790] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 05/18/2022] [Revised: 07/25/2022] [Accepted: 08/05/2022] [Indexed: 12/12/2022]
Abstract
The COVID-19 pandemic has spawned a global health crisis of unprecedented magnitude, claiming millions of lives and pushing healthcare systems in many countries to the brink. Among several factors that contribute to an increased risk of COVID-19 and progression to exacerbated manifestations, the host genetic landscape is increasingly being recognized as a critical determinant of susceptibility/resistance to infection and a prognosticator of clinical outcomes in infected individuals. Recently, several case-control association studies investigated the influence of human gene variants on COVID-19 susceptibility and severity to identify the culpable mutations. However, a comprehensive synthesis of the recent advances in COVID-19 host genetics research was lacking, and the inconsistent findings of the association studies required reliable evaluation of the strength of association with greater statistical power. In this study, we embarked on a systematic search of all possible reports of genetic association with COVID-19 till April 07, 2022, and performed meta-analyses of all the genetic polymorphisms that were examined in at least three studies. After identifying a total of 84 studies that investigated the association of 130 polymorphisms in 61 genes, we performed meta-analyses of all the eligible studies. Seven genetic polymorphisms involving 15,550 cases and 444,007 controls were explored for association with COVID-19 susceptibility, of which ACE1 I/D rs4646994/rs1799752, APOE rs429358, CCR5 rs333, and IFITM3 rs12252 showed increased risk of infection. Meta-analyses of 11 gene variants involving 6702 patients with severe COVID-19 and 8640 infected individuals with non-severe manifestations revealed statistically significant association of ACE2 rs2285666, ACE2 rs2106809, ACE2 rs2074192, AGTR1 rs5186, and TNFA rs1800629 with COVID-19 severity. Overall, our study presents a synthesis of evidence on all the genetic determinants implicated in COVID-19 to date, and provides evidence of correlation between the above polymorphisms and COVID-19 susceptibility and severity.
Affiliation(s)
- Indranil Banerjee
  - Cellular Virology Lab, Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali (IISER Mohali), Sector 81, S.A.S Nagar, Mohali 140306, India.
18
Singh D, Roy J. A large-scale benchmark study of tools for the classification of protein-coding and non-coding RNAs. Nucleic Acids Res 2022; 50:12094-12111. [PMID: 36420898 PMCID: PMC9757047 DOI: 10.1093/nar/gkac1092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 10/22/2022] [Accepted: 10/28/2022] [Indexed: 11/27/2022]
Abstract
Identification of protein-coding and non-coding transcripts is paramount for understanding their biological roles. Computational approaches have been addressing this task for over a decade; however, generalized and high-performance models are still unreliable. This benchmark study assessed the performance of 24 tools producing >55 models on the datasets covering a wide range of species. We have collected 135 small and large transcriptomic datasets from existing studies for comparison and identified the potential bottlenecks hampering the performance of current tools. The key insights of this study include lack of standardized training sets, reliance on homogeneous training data, gradual changes in annotated data, lack of augmentation with homology searches, the presence of false positives and negatives in datasets and the lower performance of end-to-end deep learning models. We also derived a new dataset, RNAChallenge, from the benchmark considering hard instances that may include potential false alarms. The best and least well performing models under- and overfit the dataset, respectively, thereby serving a dual purpose. For computational approaches, it will be valuable to develop accurate and unbiased models. The identification of false alarms will be of interest for genome annotators, and experimental study of hard RNAs will help to untangle the complexity of the RNA world.
Affiliation(s)
- Dalwinder Singh
  - To whom correspondence should be addressed. Tel: +91 172 5221206;
- Joy Roy
  - Correspondence may also be addressed to Joy Roy.
19
He Z, Liu L, Belloy ME, Le Guen Y, Sossin A, Liu X, Qi X, Ma S, Gyawali PK, Wyss-Coray T, Tang H, Sabatti C, Candès E, Greicius MD, Ionita-Laza I. GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies. Nat Commun 2022; 13:7209. [PMID: 36418338 PMCID: PMC9684164 DOI: 10.1038/s41467-022-34932-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Accepted: 11/09/2022] [Indexed: 11/27/2022]
Abstract
Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer's disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
Affiliation(s)
- Zihuai He
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA.
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA.
- Linxi Liu
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Michael E Belloy
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Yann Le Guen
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
  - Institut du Cerveau - Paris Brain Institute - ICM, Paris, 75013, France
- Aaron Sossin
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Xiaoxia Liu
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Xinran Qi
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Shiyang Ma
  - Department of Biostatistics, Columbia University, New York, NY, 10032, USA
- Prashnna K Gyawali
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Tony Wyss-Coray
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
- Hua Tang
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Chiara Sabatti
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Emmanuel Candès
  - Department of Statistics, Stanford University, Stanford, CA, 94305, USA
  - Department of Mathematics, Stanford University, Stanford, CA, 94305, USA
- Michael D Greicius
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
20
Yang Y, Wang C, Liu L, Buxbaum J, He Z, Ionita-Laza I. KnockoffTrio: A knockoff framework for the identification of putative causal variants in genome-wide association studies with trio design. Am J Hum Genet 2022; 109:1761-1776. [PMID: 36150388 PMCID: PMC9606389 DOI: 10.1016/j.ajhg.2022.08.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Accepted: 08/24/2022] [Indexed: 01/25/2023]
Abstract
Family-based designs can eliminate confounding due to population substructure and can distinguish direct from indirect genetic effects, but these designs are underpowered due to limited sample sizes. Here, we propose KnockoffTrio, a statistical method to identify putative causal genetic variants for father-mother-child trio design built upon a recently developed knockoff framework in statistics. KnockoffTrio controls the false discovery rate (FDR) in the presence of arbitrary correlations among tests and is less conservative and thus more powerful than the conventional methods that control the family-wise error rate via Bonferroni correction. Furthermore, KnockoffTrio is not restricted to family-based association tests and can be used in conjunction with more powerful, potentially nonlinear models to improve the power of standard family-based tests. We show, using empirical simulations, that KnockoffTrio can prioritize causal variants over associations due to linkage disequilibrium and can provide protection against confounding due to population stratification. In applications to 14,200 trios from three study cohorts for autism spectrum disorders (ASDs), including AGP, SPARK, and SSC, we show that KnockoffTrio can identify multiple significant associations that are missed by conventional tests applied to the same data. In particular, we replicate known ASD association signals with variants in several genes such as MACROD2, NRXN1, PRKAR1B, CADM2, PCDH9, and DOCK4 and identify additional associations with variants in other genes including ARHGEF10, SLC28A1, ZNF589, and HINT1 at FDR 10%.
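To make the FDR-control step of the knockoff framework concrete, here is a minimal Python sketch of the generic knockoff+ selection rule, assuming one feature statistic `W[j]` per variant (large positive values favour the original variable over its knockoff). This is the textbook single-knockoff filter, not necessarily the exact procedure used by KnockoffTrio; the function names and the example statistics are hypothetical.

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Smallest t > 0 whose estimated false discovery proportion is <= q.

    The estimate is (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}).
    Returns infinity when no threshold qualifies.
    """
    W = np.asarray(W, dtype=float)
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf

def knockoff_select(W, q):
    """Indices of variables selected at target FDR level q."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))

# Made-up feature statistics for ten variables (illustration only).
W_example = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, 6, 7]
selected = knockoff_select(W_example, q=0.2)
```

Negative statistics act as a built-in estimate of false positives, which is why a single strongly negative `W[j]` can raise the threshold and shrink the selected set.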
Affiliation(s)
- Yi Yang
  - Department of Biostatistics, Columbia University, New York, NY 10032, USA; Department of Biostatistics, City University of Hong Kong, Hong Kong SAR, China; School of Data Science, City University of Hong Kong, Hong Kong SAR, China
- Chen Wang
  - Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Linxi Liu
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Joseph Buxbaum
  - Departments of Psychiatry, Neuroscience, and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Zihuai He
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94305, USA; Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
21
Mezzavilla M, Cocca M, Maisano Delser P, Badii R, Abbaszadeh F, Hadi KA, Giorgia G, Gasparini P. Ancestry-related distribution of Runs of homozygosity and functional variants in Qatari population. BMC Genom Data 2022; 23:73. [PMID: 36131251 PMCID: PMC9490902 DOI: 10.1186/s12863-022-01087-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 08/29/2022] [Indexed: 11/16/2022]
Abstract
Background Describing how genetic history shapes the pattern of medically relevant variants could improve the understanding of how specific loci interact with each other and affect disease and trait prevalence. The Qatari population is characterized by a complex history of admixture and substructure, and the study of its population genomic features would provide valuable insights into the genetic landscape of functional variants. Here, we analyzed the genomic variation of 186 newly genotyped healthy individuals from the Qatari peninsula.
Results We discovered an intricate genetic structure using ancestry-related analyses. In particular, the presence of three different clusters, Cluster 1, Cluster 2 and Cluster 3 (with Near Eastern, South Asian and African ancestry, respectively), was detected with an additional fourth one (Cluster 4) with East Asian ancestry. These subpopulations show differences in the distribution of runs of homozygosity (ROH) and admixture events in the past, ranging from 40 to 5 generations ago. This complex genetic history led to a peculiar pattern of functional markers under positive selection, differentiated into shared signals and private signals. Interestingly, we found several signatures of shared selection on SNPs in the FADS2 gene, hinting at a possible common evolutionary link to dietary intake. Among the private signals, we found enrichment for markers associated with HDL and LDL for Cluster 1 (Near Eastern ancestry) and Cluster 3 (South Asian ancestry) and height and blood traits for Cluster 2 (African ancestry). The differences in genetic history among these populations also resulted in different frequency distributions of putative loss-of-function variants. For example, homozygous carriers for rs2884737, a variant linked to an anticoagulant drug (warfarin) response, are mainly represented by individuals with predominant Bedouin ancestry (risk allele G at frequency 0.48).
Conclusions We provided a detailed catalogue of the different ancestral patterns in the Qatari population, highlighting differences and similarities in the distribution of selected variants and putative loss-of-function variants. Finally, these results would provide useful guidance for assessing genetic risk factors linked to consanguinity and genetic ancestry.
Supplementary Information The online version contains supplementary material available at 10.1186/s12863-022-01087-1.
|
22
|
Kassani PH, Lu F, Guen YL, Belloy ME, He Z. Deep neural networks with controlled variable selection for the identification of putative causal genetic variants. NAT MACH INTELL 2022; 4:761-771. [PMID: 37859729 PMCID: PMC10586424 DOI: 10.1038/s42256-022-00525-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Accepted: 07/26/2022] [Indexed: 11/09/2022]
Abstract
Deep neural networks (DNNs) have been successfully utilized in many scientific problems for their high prediction accuracy, but their application to genetic studies remains challenging due to their poor interpretability. Here we consider the problem of scalable, robust variable selection in DNNs for the identification of putative causal genetic variants in genome sequencing studies. We identified a pronounced randomness in feature selection in DNNs due to their stochastic nature, which may hinder interpretability and give rise to misleading results. We propose an interpretable neural network model, stabilized using ensembling, with controlled variable selection for genetic studies. The merits of the proposed method include: flexible modelling of the nonlinear effect of genetic variants to improve statistical power; multiple knockoffs in the input layer to rigorously control the false discovery rate; hierarchical layers to substantially reduce the number of weight parameters and activations, and improve computational efficiency; and stabilized feature selection to reduce the randomness in identified signals. We evaluate the proposed method in extensive simulation studies and apply it to the analysis of Alzheimer's disease genetics. We show that the proposed method, when compared with conventional linear and nonlinear methods, can lead to substantially more discoveries.
Affiliation(s)
- Peyman H. Kassani
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Fred Lu
  - Department of Statistics, Stanford University, Stanford, CA, USA
- Yann Le Guen
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Michael E. Belloy
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Zihuai He
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
  - Quantitative Sciences Unit, Department of Medicine (Biomedical Informatics Research), Stanford University, Stanford, CA, USA
|
23
|
Li S, Sesia M, Romano Y, Candès E, Sabatti C. Searching for robust associations with a multi-environment knockoff filter. Biometrika 2022; 109:611-629. [PMID: 38633763 PMCID: PMC11022501 DOI: 10.1093/biomet/asab055] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 04/19/2024]
Abstract
This paper develops a method based on model-X knockoffs to find conditional associations that are consistent across environments, controlling the false discovery rate. The motivation for this problem is that large data sets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associations replicated under different conditions may be more interesting. In fact, consistency sometimes provably leads to valid causal inferences even if conditional associations do not. While the proposed method is widely applicable, this paper highlights its relevance to genome-wide association studies, in which robustness across populations with diverse ancestries mitigates confounding due to unmeasured variants. The effectiveness of this approach is demonstrated by simulations and applications to the UK Biobank data.
Affiliation(s)
- S Li
  - Department of Statistics, Stanford University, Stanford, California 94305, USA
- M Sesia
  - Department of Data Sciences and Operations, University of Southern California, Los Angeles, California 90089, USA
- Y Romano
  - Departments of Electrical Engineering and of Computer Science, Technion, Haifa, Israel
- E Candès
  - Department of Statistics, Stanford University, Stanford, California 94305, USA
- C Sabatti
  - Department of Statistics, Stanford University, Stanford, California 94305, USA
|
24
|
Yin X, Bi Y, Jiang F, Guo R, Zhang Y, Fan J, Kang MS, Fan X. Fine mapping of candidate quantitative trait loci for plant and ear height in a maize nested-association mapping population. FRONTIERS IN PLANT SCIENCE 2022; 13:963985. [PMID: 35991429 PMCID: PMC9386523 DOI: 10.3389/fpls.2022.963985] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/09/2022] [Accepted: 07/05/2022] [Indexed: 05/31/2023]
Abstract
Plant height (PH) and ear height (EH) are two important traits in maize (Zea mays L.), as they are closely related to lodging resistance and planting density. Our objectives were to (1) investigate single-nucleotide polymorphisms (SNPs) that are associated with PH and EH for detecting quantitative trait loci (QTL) and new genes that determine PH and EH, (2) explore the value of the QTL in maize breeding, and (3) investigate whether the "triangle heterotic group" theory is applicable for lowering PH and EH to increase yield. Seven inbred female parents were crossed with a common founder male parent Ye 107 to create a nested association mapping (NAM) population. The analysis of phenotypic data on PH and EH revealed wide variation among the parents of the NAM population. Genome-wide association study (GWAS) and high-resolution linkage mapping were conducted using the NAM population, which generated 264,694 SNPs by genotyping-by-sequencing. A total of 105 SNPs and 22 QTL were identified by GWAS and found to be significantly associated with PH and EH. A high-confidence QTL for PH, Qtl-chr1-EP, was identified on chromosome 1 via GWAS and confirmed by linkage analysis in two recombinant inbred line (RIL) populations. Results revealed that the SNP variation in the promoter region of the candidate gene Zm00001d031938, located at Qtl-chr1-EP, which encodes UDP-N-acetylglucosamine-peptide N-acetyl-glucosaminyl-transferase, might decrease PH and EH. Furthermore, the triangle heterotic pattern adopted in maize breeding programs by our team is practicable in selecting high-yield crosses based on the low ratio of EH/PH (EP).
Affiliation(s)
- Xingfu Yin
  - College of Agronomy and Biotechnology, Yunnan Agricultural University, Kunming, China
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Yaqi Bi
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Fuyan Jiang
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Ruijia Guo
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Yudong Zhang
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Jun Fan
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
- Manjit S. Kang
  - Department of Plant Pathology, Kansas State University, Manhattan, KS, United States
- Xingming Fan
  - Institute of Food Crops, Yunnan Academy of Agricultural Sciences, Kunming, China
|
25
|
Euclide PT, Jasonowicz A, Sitar S, Fischer G, Goetz FW. Further evidence from common garden rearing experiments of heritable traits separating lean and siscowet lake charr (Salvelinus namaycush) ecotypes. Mol Ecol 2022; 31:3432-3450. [PMID: 35510796 PMCID: PMC9323484 DOI: 10.1111/mec.16492] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/15/2021] [Revised: 03/07/2022] [Accepted: 04/12/2022] [Indexed: 11/30/2022]
Abstract
Genetic evidence of selection for complex and polygenically regulated phenotypes can easily become masked by neutral population genetic structure and phenotypic plasticity. Without direct evidence of genotype‐phenotype associations, it can be difficult to conclude to what degree a phenotype is heritable or a product of environment. Common garden laboratory studies control for environmental stochasticity and help to determine the mechanisms that regulate traits. Here we assess lipid content, growth, weight, and length variation in full and hybrid F1 crosses of deep and shallow water sympatric lake charr ecotypes reared for nine years in a common garden experiment. Redundancy analysis (RDA) and quantitative‐trait‐loci (QTL) genomic scans are used to identify associations between genotypes at 19,714 single nucleotide polymorphisms (SNPs) aligned to the lake charr genome and individual phenotypes to determine the role that genetic inheritance plays in ecotype phenotypic diversity. Lipid content, growth, length, and weight differed significantly among lake charr crosses throughout the experiment, suggesting that pedigree plays a large role in lake charr development. Polygenic scores of 15 SNPs putatively associated with lipid content and/or condition factor indicated that ecotype-distinguishing traits are polygenically regulated and additive. A QTL identified on chromosome 38 contained >200 genes, some of which were associated with lipid metabolism and growth, demonstrating the complex nature of ecotype diversity. The results of our common garden study further indicate that lake charr ecotypes observed in nature are predetermined at birth and that ecotypes differ fundamentally in lipid metabolism and growth.
Affiliation(s)
- P T Euclide
  - Purdue University, Department of Forestry and Natural Resources, West Lafayette, IN, 47907, USA
- A Jasonowicz
  - The International Halibut Commission, 2320 West Commodore Way, Suite 300, Seattle, WA, 98199-1287, USA
- S Sitar
  - Michigan Department of Natural Resources, Marquette Fisheries Research Station, 484 Cherry Creek Rd., Marquette, MI, 49855, USA
- G Fischer
  - University of Wisconsin-Stevens Point, Northern Aquaculture Demonstration Facility, 36445 State Hwy 13, Bayfield, WI, 54814, USA
- F W Goetz
  - University of Wisconsin - Milwaukee, School of Freshwater Sciences, 600 East Greenfield Ave., Milwaukee, WI, 53204, USA
|
26
|
Škrabišová M, Dietz N, Zeng S, Chan YO, Wang J, Liu Y, Biová J, Joshi T, Bilyeu KD. A novel Synthetic phenotype association study approach reveals the landscape of association for genomic variants and phenotypes. J Adv Res 2022; 42:117-133. [PMID: 36513408 PMCID: PMC9788956 DOI: 10.1016/j.jare.2022.04.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/04/2021] [Revised: 02/14/2022] [Accepted: 04/08/2022] [Indexed: 12/27/2022]
Abstract
INTRODUCTION Genome-Wide Association Studies (GWAS) identify tagging variants in the genome that are statistically associated with the phenotype because of their linkage disequilibrium (LD) relationship with the causative mutation (CM). When both low-density genotyped accession panels with phenotypes and resequenced data accession panels are available, tagging variants can assist with post-GWAS challenges in CM discovery.
OBJECTIVES Our objective was to identify additional GWAS evaluation criteria to assess correspondence between genomic variants and phenotypes, as well as enable deeper analysis of the localized landscape of association.
METHODS We used genomic variant positions as Synthetic phenotypes in GWAS that we named "Synthetic phenotype association study" (SPAS). The extreme case of SPAS is what we call an "Inverse GWAS" where we used CM positions of cloned soybean genes. We developed and validated the Accuracy concept as a measure of the correspondence between variant positions and phenotypes.
RESULTS The SPAS approach demonstrated that the genotype status of an associated variant used as a Synthetic phenotype enabled us to explore the relationships between tagging variants and CMs, and further, that utilizing CMs as Synthetic phenotypes in Inverse GWAS illuminated the landscape of association. We implemented the Accuracy calculation for a curated accession panel to an online Accuracy calculation tool (AccuTool) as a resource for gene identification in soybean. We demonstrated our concepts on three examples of soybean cloned genes. As a result of our findings, we devised an enhanced "GWAS to Genes" analysis (Synthetic phenotype to CM strategy, SP2CM). Using SP2CM, we identified a CM for a novel gene.
CONCLUSION The SP2CM strategy utilizing Synthetic phenotypes and the Accuracy calculation of correspondence provides crucial information to assist researchers in CM discovery. The impact of this work is a more effective evaluation of landscapes of GWAS associations.
Affiliation(s)
- Mária Škrabišová
  - Department of Biochemistry, Faculty of Science, Palacky University Olomouc, Olomouc 78371, Czech Republic
- Nicholas Dietz
  - Division of Plant Sciences, University of Missouri, Columbia, MO 65201, USA
- Shuai Zeng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65212, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
- Yen On Chan
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
  - MU Data Science and Informatics Institute, University of Missouri, Columbia, MO 65212, USA
- Juexin Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65212, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
- Yang Liu
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
  - MU Data Science and Informatics Institute, University of Missouri, Columbia, MO 65212, USA
- Jana Biová
  - Department of Biochemistry, Faculty of Science, Palacky University Olomouc, Olomouc 78371, Czech Republic
- Trupti Joshi
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65212, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
  - MU Data Science and Informatics Institute, University of Missouri, Columbia, MO 65212, USA
  - Department of Health Management and Informatics, School of Medicine, University of Missouri, Columbia, MO 65212, USA
  - Corresponding authors at: Department of Health Management and Informatics, School of Medicine, 1201 E Rollins St, 271B Life Science Center, Columbia, MO 65201, USA (T. Joshi). Plant Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, 110 Waters Hall, University of Missouri, Columbia, MO 65211, USA (K.D. Bilyeu).
- Kristin D. Bilyeu
  - Plant Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, University of Missouri, Columbia, MO 65211, USA
|
27
|
Kenny D, Carthy TR, Murphy CP, Sleator RD, Evans RD, Berry DP. The Association Between Genomic Heterozygosity and Carcass Merit in Cattle. Front Genet 2022; 13:789270. [PMID: 35281838 PMCID: PMC8908906 DOI: 10.3389/fgene.2022.789270] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/04/2021] [Accepted: 01/25/2022] [Indexed: 12/16/2022]
Abstract
The objective of the present study was to quantify the association between both pedigree and genome-based measures of global heterozygosity and carcass traits, and to identify single nucleotide polymorphisms (SNPs) exhibiting non-additive associations with these traits. The carcass traits of interest were carcass weight (CW), carcass conformation (CC) and carcass fat (CF). To define the genome-based measures of heterozygosity, and to quantify the non-additive associations between SNPs and the carcass traits, imputed, high-density genotype data, comprising 619,158 SNPs, from 27,213 cattle were used. The correlations between the pedigree-based heterosis coefficient and the three defined genomic measures of heterozygosity ranged from 0.18 to 0.76. The associations between the different measures of heterozygosity and the carcass traits were biologically small, with positive associations for CW and CC, and negative associations for CF. Furthermore, even after accounting for the pedigree-based heterosis coefficient of an animal, part of the remaining variability in some of the carcass traits could be captured by a genomic heterozygosity measure. This signifies that the inclusion of both a heterosis coefficient based on pedigree information and a genome-based measure of heterozygosity could be beneficial to limiting bias in predicting additive genetic merit. Finally, one SNP located on Bos taurus (BTA) chromosome number 5 demonstrated a non-additive association with CW. Furthermore, 182 SNPs (180 SNPs on BTA 2 and two SNPs on BTA 21) demonstrated a non-additive association with CC, while 231 SNPs located on BTA 2, 5, 11, 13, 14, 18, 19 and 21 demonstrated a non-additive association with CF. Results demonstrate that heterozygosity, both at a global level and at the level of individual loci, contributes little to the variability in carcass merit.
Affiliation(s)
- David Kenny
  - Animal and Grassland Research and Innovation Centre, Teagasc, Fermoy, Ireland
  - Department of Biological Sciences, Munster Technological University, Cork, Ireland
- Tara R. Carthy
  - Animal and Grassland Research and Innovation Centre, Teagasc Grange, Dunsany, Ireland
- Craig P. Murphy
  - Department of Biological Sciences, Munster Technological University, Cork, Ireland
- Roy D. Sleator
  - Department of Biological Sciences, Munster Technological University, Cork, Ireland
- Donagh P. Berry
  - Animal and Grassland Research and Innovation Centre, Teagasc, Fermoy, Ireland
  - *Correspondence: Donagh P. Berry,
|
28
|
Wang W, Janson L. A High-Dimensional Power Analysis of the Conditional Randomization Test and Knockoffs. Biometrika 2021. [DOI: 10.1093/biomet/asab052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
Summary
In many scientific problems, researchers try to relate a response variable Y to a set of potential explanatory variables X = (X1,…,Xp), and start by trying to identify variables that contribute to this relationship. In statistical terms, this goal can be posed as trying to identify the Xj’s upon which Y is conditionally dependent. Sometimes it is of value to simultaneously test for each j, which is more commonly known as variable selection. The conditional randomization test, CRT, and model-X knockoffs are two recently proposed methods that respectively perform conditional independence testing and variable selection by, for each Xj, computing any test statistic on the data and assessing that test statistic’s significance by comparing it to test statistics computed on synthetic variables generated using knowledge of X’s distribution. Our main contribution is to analyse their power in a high-dimensional linear model where the ratio of the dimension p to the sample size n converges to a positive constant. We give explicit expressions for the asymptotic power of the CRT, variable selection with CRT p-values, and model-X knockoffs, each with a test statistic based on either the marginal covariance, the least squares coefficient, or the lasso. One useful application of our analysis is the direct theoretical comparison of the asymptotic powers of variable selection with CRT p-values and model-X knockoffs; in the instances with independent covariates that we consider, the CRT provably dominates knockoffs. We also analyse the power gain from using unlabelled data in the CRT when limited knowledge of X’s distribution is available, and the power of the CRT when samples are collected retrospectively.
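The resampling idea behind the CRT described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' code. It assumes independent standard normal covariates, so that the conditional law of Xj given the other covariates is simply N(0, 1), and it uses the absolute marginal covariance as the test statistic. The function name `crt_pvalue` and its parameters are hypothetical.

```python
import numpy as np

def crt_pvalue(X, y, j, n_draws=500, seed=0):
    """Conditional randomization test p-value for covariate j.

    Assumption (for illustration): covariates are independent N(0, 1),
    so resampling X_j given the rest is a fresh standard normal draw.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    def stat(xj):
        # Test statistic: absolute marginal covariance with the response.
        return abs(xj @ y) / n

    t_obs = stat(X[:, j])
    # Statistics recomputed on synthetic copies of X_j drawn from its
    # (assumed known) conditional distribution.
    t_null = np.array([stat(rng.standard_normal(n)) for _ in range(n_draws)])
    # Finite-sample valid p-value: rank of the observed statistic.
    return (1 + np.sum(t_null >= t_obs)) / (n_draws + 1)

# Toy data: only the first covariate affects the response.
data_rng = np.random.default_rng(1)
X_demo = data_rng.standard_normal((200, 5))
y_demo = 2.0 * X_demo[:, 0] + data_rng.standard_normal(200)
p_signal = crt_pvalue(X_demo, y_demo, j=0)
p_other = crt_pvalue(X_demo, y_demo, j=1)
```

With correlated covariates, the resampling step would instead draw from the conditional distribution of Xj given the remaining columns, which is where knowledge of X's distribution enters.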
Affiliation(s)
- Wenshuo Wang
  - Department of Statistics, Harvard University, One Oxford Street, Cambridge, Massachusetts 02138, U.S.A
- Lucas Janson
  - Department of Statistics, Harvard University, One Oxford Street, Cambridge, Massachusetts 02138, U.S.A
|