1
Alamin M, Sultana MH, Lou X, Jin W, Xu H. Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. Plants (Basel, Switzerland) 2022; 11:3277. [PMID: 36501317 PMCID: PMC9739826 DOI: 10.3390/plants11233277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
Genome-wide association study (GWAS) is the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed for discovering causal variants in GWAS data. Among them, linear mixed models (LMMs) are widely used statistical methods for controlling confounding factors, including population structure, resulting in increased computational efficiency and statistical power in GWAS. With the growing availability of large-scale GWAS data and relevant phenotype samples, more attention has recently been paid to pleiotropy, multi-trait, gene-gene interaction, gene-environment interaction, and multi-locus methods. In this review, we survey the LMM-based methods available in the literature for GWAS. We briefly discuss the different LMM methods, software packages, and available open-source applications in GWAS. We then outline the advantages and weaknesses of LMMs in GWAS. Finally, we discuss future perspectives and conclude. This review should help researchers quickly select appropriate LMM models and methods for GWAS data analysis and should benefit the scientific community.
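For orientation, the single-marker LMM that this kind of review is built around is usually written as below. This is the standard textbook form, not a formula quoted from the review; the symbols ($y$, $X$, $\beta$, $u$, $K$, $\sigma_g^2$, $\sigma_e^2$) are notation assumed here.

```latex
% y: n-vector of phenotypes; X: fixed-effect design matrix (covariates and the tested SNP)
% u: random polygenic effect; K: n-by-n kinship (genetic relatedness) matrix
\begin{aligned}
  y &= X\beta + u + \varepsilon, \\
  u &\sim \mathcal{N}\!\left(0,\ \sigma_g^2 K\right), \qquad
  \varepsilon \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I_n\right), \\
  \operatorname{Var}(y) &= \sigma_g^2 K + \sigma_e^2 I_n .
\end{aligned}
```

Population structure and relatedness enter through $K$, which is why the random effect $u$ absorbs confounding that a plain linear regression would attribute to the tested marker.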
Affiliation(s)
- Md. Alamin
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Xiangyang Lou
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Wenfei Jin
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Haiming Xu
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
2
Pralle RS, Schultz NE, White HM, Weigel KA. Hyperketonemia GWAS and parity-dependent SNP associations in Holstein dairy cows intensively sampled for blood β-hydroxybutyrate concentration. Physiol Genomics 2020; 52:347-357. [PMID: 32628084 DOI: 10.1152/physiolgenomics.00016.2020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/17/2022]
Abstract
Hyperketonemia (HYK) is a metabolic disorder that affects early postpartum dairy cows; however, there has been limited success in identifying genomic variants contributing to HYK susceptibility. We conducted a genome-wide association study (GWAS) using HYK phenotypes based on an intensive screening protocol, interrogated genotype interactions with parity group (GWIS), and evaluated the enrichment of annotated metabolic pathways. Holstein cows were enrolled into the experiment after parturition, and blood samples were collected at four timepoints between 5 and 18 days postpartum. Concentration of blood β-hydroxybutyrate (BHB) was quantified cow-side via a handheld BHB meter. Cows were labeled as a HYK case when at least one blood sample had BHB ≥ 1.2 mmol/L, and all other cows were considered non-HYK controls. After quality control procedures, 1,710 cows and 58,699 genotypes were available for further analysis. The GWAS and GWIS were performed using the forward feature select linear mixed model method. There was evidence for an association between ARS-BFGL-NGS-91238 and HYK susceptibility, as well as parity-dependent associations with HYK for BovineHD0600024247 and BovineHD1400023753. Candidate genes annotated to these single nucleotide polymorphism associations have been previously associated with obesity, diabetes, insulin resistance, and fatty liver in humans and rodent models. Enrichment analysis revealed focal adhesion and axon guidance as metabolic pathways contributing to HYK etiology, while genetic variation in pathways related to insulin secretion and sensitivity may affect HYK susceptibility in a parity-dependent manner. In conclusion, the present work proposes several novel marker associations and metabolic pathways contributing to genetic risk for HYK susceptibility.
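The case definition in this abstract (a cow is a HYK case if any sampled BHB value reaches 1.2 mmol/L) can be sketched in a few lines. This is only an illustration of the stated rule; the function name `is_hyk_case` and the sample values are hypothetical and do not come from the study.

```python
from typing import Iterable

# Case threshold stated in the abstract, in mmol/L.
BHB_THRESHOLD_MMOL_L = 1.2


def is_hyk_case(bhb_samples: Iterable[float],
                threshold: float = BHB_THRESHOLD_MMOL_L) -> bool:
    """Return True if at least one blood sample has BHB >= threshold."""
    return any(value >= threshold for value in bhb_samples)


# Hypothetical BHB readings (mmol/L) at four postpartum timepoints.
case_cow = [0.6, 0.9, 1.2, 0.8]      # one reading reaches the threshold
control_cow = [0.6, 0.9, 1.1, 0.8]   # no reading reaches the threshold

labels = {"case_cow": is_hyk_case(case_cow),
          "control_cow": is_hyk_case(control_cow)}
```

Because a single elevated reading is enough, sampling each cow at several timepoints raises the chance of catching a transient episode compared with a single test.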
Affiliation(s)
- Ryan S Pralle
  - Department of Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin
- Nichol E Schultz
  - Department of Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin
- Heather M White
  - Department of Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin
- Kent A Weigel
  - Department of Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin
3
Shafquat A, Crystal RG, Mezey JG. Identifying novel associations in GWAS by hierarchical Bayesian latent variable detection of differentially misclassified phenotypes. BMC Bioinformatics 2020; 21:178. [PMID: 32381021 PMCID: PMC7204256 DOI: 10.1186/s12859-020-3387-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/06/2019] [Accepted: 01/24/2020] [Indexed: 12/22/2022]
Abstract
BACKGROUND Heterogeneity in the definition and measurement of complex diseases in Genome-Wide Association Studies (GWAS) may lead to misdiagnoses and misclassification errors that can significantly impact discovery of disease loci. While well appreciated, almost all analyses of GWAS data consider reported disease phenotype values as is without accounting for potential misclassification. RESULTS Here, we introduce Phenotype Latent variable Extraction of disease misdiagnosis (PheLEx), a GWAS analysis framework that learns and corrects misclassified phenotypes using structured genotype associations within a dataset. PheLEx consists of a hierarchical Bayesian latent variable model, where inference of differential misclassification is accomplished using filtered genotypes while implementing a full mixed model to account for population structure and genetic relatedness in study populations. Through simulations, we show that the PheLEx framework dramatically improves recovery of the correct disease state when considering realistic allele effect sizes compared to existing methodologies designed for Bayesian recovery of disease phenotypes. We also demonstrate the potential of PheLEx for extracting new potential loci from existing GWAS data by analyzing bipolar disorder and epilepsy phenotypes available from the UK Biobank. From the PheLEx analysis of these data, we identified new candidate disease loci not previously reported for these datasets that have value for supplemental hypothesis generation. CONCLUSION PheLEx shows promise in reanalyzing GWAS datasets to provide supplemental candidate loci that are ignored by traditional GWAS analysis methodologies.
Affiliation(s)
- Afrah Shafquat
  - Department of Computational Biology, Cornell University, Ithaca, NY USA
- Ronald G. Crystal
  - Department of Genetic Medicine, Weill Cornell Medicine, New York, NY USA
  - Department of Medicine, Weill Cornell Medicine, New York, NY USA
- Jason G. Mezey
  - Department of Computational Biology, Cornell University, Ithaca, NY USA
  - Department of Genetic Medicine, Weill Cornell Medicine, New York, NY USA
4
Guo Y, Wu C, Guo M, Zou Q, Liu X, Keinan A. Combining Sparse Group Lasso and Linear Mixed Model Improves Power to Detect Genetic Variants Underlying Quantitative Traits. Front Genet 2019; 10:271. [PMID: 31024614 PMCID: PMC6469383 DOI: 10.3389/fgene.2019.00271] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/17/2018] [Accepted: 03/12/2019] [Indexed: 11/13/2022]
Abstract
Genome-wide association studies (GWAS), based on testing one single nucleotide polymorphism (SNP) at a time, have revolutionized our understanding of the genetics of complex traits. In GWAS, there is a need to account for confounding effects, such as those due to population structure, and to consider groups of SNPs simultaneously because of the “polygenic” nature of complex quantitative traits. In this paper, we propose a new approach, SGL-LMM, that combines sparse group lasso (SGL) and linear mixed model (LMM) for multivariate associations of quantitative traits. LMM, as often used in GWAS, controls for confounders, while SGL maintains sparsity of the underlying multivariate regression model. SGL-LMM first fixes the fixed effects at zero to learn the parameters of the random effects using LMM, and then estimates the fixed effects using SGL regularization. We present efficient algorithms for hyperparameter tuning and feature selection using stability selection. While controlling for confounders and constraining for sparse solutions, SGL-LMM also provides a natural framework for incorporating prior biological information into the group structure underlying the model. Results based on both simulated and real data show that SGL-LMM outperforms previous approaches in terms of power to detect associations and accuracy of quantitative trait prediction.
Affiliation(s)
- Yingjie Guo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  - Department of Computational Biology, Cornell University, Ithaca, NY, United States
- Chenxi Wu
  - Department of Mathematics, Rutgers University, Piscataway, NJ, United States
- Maozu Guo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Xiaoyan Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Alon Keinan
  - Department of Computational Biology, Cornell University, Ithaca, NY, United States
  - Cornell Center for Comparative and Population Genomics, Center for Vertebrate Genomics, and Center for Enervating Neuroimmune Disease, Cornell University, Ithaca, NY, United States
5
Ramstein GP, Evans J, Nandety A, Saha MC, Brummer EC, Kaeppler SM, Buell CR, Casler MD. Candidate Variants for Additive and Interactive Effects on Bioenergy Traits in Switchgrass (Panicum virgatum L.) Identified by Genome-Wide Association Analyses. The Plant Genome 2018; 11:180002. [PMID: 30512032 DOI: 10.3835/plantgenome2018.01.0002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 05/20/2023]
Abstract
Switchgrass (Panicum virgatum L.) is a promising herbaceous energy crop, but further gains in biomass yield and quality must be achieved to enable a viable bioenergy industry. Developing DNA markers can contribute to such progress, but depiction of genetic bases should be reliable, involving simple additive marker effects and also interactions with genetic backgrounds (e.g., ecotypes) or synergies with other markers. We analyzed plant height, C content, N content, and mineral concentration in a diverse panel consisting of 512 genotypes of upland and lowland ecotypes. We performed association analyses based on exome capture sequencing and tested 439,170 markers for marginal effects, 83,290 markers for marker × ecotype interactions, and up to 311,445 marker pairs for pairwise interactions. Analyses of pairwise interactions focused on subsets of marker pairs preselected on the basis of marginal marker effects, gene ontology annotation, and pairwise marker associations. Our tests identified 12 significant effects. Homology and gene expression information corroborated seven effects and indicated plausible causal pathways: flowering time and lignin synthesis for plant height; plant growth and senescence for C content and mineral concentration. Four pairwise interactions were detected, including three interactions preselected on the basis of pairwise marker correlations. Furthermore, a marker × ecotype interaction and a pairwise interaction were confirmed in an independent switchgrass panel. Our analyses identified reliable candidate variants for important bioenergy traits. Moreover, they exemplified the importance of interactive effects for depicting genetic bases and illustrated the usefulness of preselecting marker pairs for identifying pairwise marker interactions in association studies.
6
Ju JH, Shenoy SA, Crystal RG, Mezey JG. An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci. PLoS Comput Biol 2017; 13:e1005537. [PMID: 28505156 PMCID: PMC5448815 DOI: 10.1371/journal.pcbi.1005537] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/22/2016] [Revised: 05/30/2017] [Accepted: 04/28/2017] [Indexed: 11/19/2022]
Abstract
Genome-wide expression Quantitative Trait Loci (eQTL) studies in humans have provided numerous insights into the genetics of both gene expression and complex diseases. While the majority of eQTL identified in genome-wide analyses impact a single gene, eQTL that impact many genes are particularly valuable for network modeling and disease analysis. To enable the identification of such broad impact eQTL, we introduce CONFETI: Confounding Factor Estimation Through Independent component analysis. CONFETI is designed to address two conflicting issues when searching for broad impact eQTL: the need to account for non-genetic confounding factors that can lower the power of the analysis or produce broad impact eQTL false positives, and the tendency of methods that account for confounding factors to model broad impact eQTL as non-genetic variation. The key advance of the CONFETI framework is the use of Independent Component Analysis (ICA) to identify variation likely caused by broad impact eQTL when constructing the sample covariance matrix used for the random effect in a mixed model. We show that CONFETI has better performance than other mixed model confounding factor methods when considering broad impact eQTL recovery from synthetic data. We also used the CONFETI framework and these same confounding factor methods to identify eQTL that replicate between matched twin pair datasets in the Multiple Tissue Human Expression Resource (MuTHER), the Depression Genes Networks study (DGN), the Netherlands Study of Depression and Anxiety (NESDA), and multiple tissue types in the Genotype-Tissue Expression (GTEx) consortium. These analyses identified both cis-eQTL and trans-eQTL impacting individual genes, and CONFETI had better or comparable performance to other mixed model confounding factor analysis methods when identifying such eQTL. In these analyses, we were able to identify and replicate a few broad impact eQTL although the overall number was small even when applying CONFETI. In light of these results, we discuss the broad impact eQTL that have been previously reported from the analysis of human data and suggest that considerable caution should be exercised when making biological inferences based on these reported eQTL.
Affiliation(s)
- Jin Hyun Ju
  - Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
  - Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, United States of America
- Sushila A. Shenoy
  - Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
- Ronald G. Crystal
  - Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
- Jason G. Mezey
  - Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
  - Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, United States of America
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, United States of America
7
Burghardt LT, Young ND, Tiffin P. A Guide to Genome-Wide Association Mapping in Plants. Curr Protoc Plant Biol 2017; 2:22-38. [PMID: 31725973 DOI: 10.1002/cppb.20041] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Indexed: 12/15/2022]
Abstract
Genome-wide association studies (GWAS) have developed into a valuable approach for identifying the genetic basis of phenotypic variation. In this article, we provide an overview of the design, analysis, and interpretation of GWAS. First, we present results from simulations that explore key elements of experimental design as well as considerations for collecting the relevant genomic and phenotypic data. Next, we outline current statistical methods and tools used for GWA analyses and discuss the inclusion of covariates to account for population structure and the interpretation of results. Given that many false positive associations will occur in any GWA analysis, we highlight strategies for prioritizing GWA candidates for further statistical and empirical validation. While focused on plants, the material we cover is also applicable to other systems. © 2017 by John Wiley & Sons, Inc.
Affiliation(s)
- Liana T Burghardt
  - Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota
- Nevin D Young
  - Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota
- Peter Tiffin
  - Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota
8
Otto JM, Gizer IR, Bizon C, Wilhelmsen KC, Ehlers CL. Polygenic risk scores for cigarettes smoked per day do not generalize to a Native American population. Drug Alcohol Depend 2016; 167:95-102. [PMID: 27530288 PMCID: PMC5037040 DOI: 10.1016/j.drugalcdep.2016.07.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/29/2016] [Revised: 07/19/2016] [Accepted: 07/29/2016] [Indexed: 11/18/2022]
Abstract
BACKGROUND Recent studies have demonstrated the utility of polygenic risk scores (PRSs) for exploring the genetic etiology of psychiatric phenotypes and the genetic correlations between them. To date, these studies have been conducted almost exclusively using participants of European ancestry, and thus, there is a need for similar studies conducted in other ancestral populations. However, given that the predictive ability of PRSs is sensitive to differences in linkage disequilibrium (LD) patterns and minor allele frequencies across discovery and target samples, the applicability of PRSs developed in European ancestry samples to other ancestral populations has yet to be determined. Therefore, the current study derived PRSs for cigarettes per day (CPD) from predominantly European-ancestry samples and examined their ability to predict nicotine dependence (ND) in a Native American (NA) population sample. METHOD Results from the Tobacco and Genetics Consortium's meta-analysis of genome-wide association studies of CPD were used to compute PRSs in a NA community sample (N=288). These scores were then used to predict ND diagnostic status. RESULTS The PRS was not significantly associated with liability for ND in the full sample. However, a significant interaction between PRS and percent NA ancestry was observed. Risk scores were positively associated with liability for ND at higher levels of European ancestry, but no association was observed at higher levels of NA ancestry. CONCLUSION These findings illustrate how differences in patterns of LD across discovery and target samples can reduce the predictive ability of PRSs for complex traits.
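As a rough illustration of what "computing a PRS" means, a score is commonly formed as a weighted sum of an individual's effect-allele counts, with weights taken from a discovery GWAS. The sketch below assumes that generic definition; the function name, weights, and dosages are hypothetical, and the study's actual SNP selection and thresholding steps are not reproduced here.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-allele effect sizes from a discovery sample (illustrative only).
weights = [0.10, -0.05, 0.20]
# Hypothetical effect-allele counts for one individual in the target sample.
dosages = [2, 1, 0]

score = polygenic_risk_score(dosages, weights)
```

The weights are estimated on SNPs that tag causal variants through LD in the discovery population, so when LD patterns or allele frequencies differ in the target population, the same weights can carry less predictive signal.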
Affiliation(s)
- Jacqueline M Otto
  - Department of Psychological Sciences, University of Missouri, 210 McAlester Hall, Columbia, MO 65211, United States
- Ian R Gizer
  - Department of Psychological Sciences, University of Missouri, 210 McAlester Hall, Columbia, MO 65211, United States
- Chris Bizon
  - Renaissance Computing Institute (RENCI), 100 Europa Drive, Suite 540, Chapel Hill, NC 27517, United States
- Kirk C Wilhelmsen
  - Renaissance Computing Institute (RENCI), 100 Europa Drive, Suite 540, Chapel Hill, NC 27517, United States
  - Departments of Genetics and Neurology, University of North Carolina at Chapel Hill, 120 Mason Farm Road, 5093 Genetic Medicine Building, CB#7264, Chapel Hill, NC 27599, United States
- Cindy L Ehlers
  - Department of Molecular and Cellular Neurosciences, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, United States