1. Shikha K, Madhumal Thayil V, Shahi JP, Zaidi PH, Seetharam K, Nair SK, Singh R, Tosh G, Singamsetti A, Singh S, Sinha B. Genomic-regions associated with cold stress tolerance in Asia-adapted tropical maize germplasm. Sci Rep 2023; 13:6297. [PMID: 37072497 PMCID: PMC10113201 DOI: 10.1038/s41598-023-33250-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 04/10/2023] [Indexed: 05/03/2023]
Abstract
Maize is gaining impetus in non-traditional and non-conventional seasons such as the off-season, primarily due to higher demand and economic returns. Maize varieties intended for the winter season of South Asia must have cold resilience as an important trait because of the low prevailing temperatures and frequent cold snaps observed during this season in most parts of the lowland tropics of Asia. The current study screened a panel of advanced tropically adapted maize lines for response to cold stress during the vegetative and flowering stages under field conditions. A suite of significant genomic loci was identified for grain yield (28) and for agronomic traits such as flowering (15) and plant height (6) under cold stress environments. Haplotype regression revealed 6 significant haplotype blocks for grain yield under cold stress across the test environments. Haplotype blocks, particularly on chromosomes 5 (bin5.07), 6 (bin6.02), and 9 (bin9.03), co-located with regions/bins that have been identified to contain candidate genes involved in the membrane transport system that would provide essential tolerance to the plant. The regions on chromosomes 1 (bin1.04), 2 (bin2.07), 3 (bin3.05-3.06), 5 (bin5.03), and 8 (bin8.05-8.06) also harboured significant SNPs for the other agronomic traits. In addition, the study examined the feasibility of identifying tropically adapted maize lines with cold resilience across growth stages from the working germplasm and identified four lines that could be used as breeding starts in tropical maize breeding pipelines.
Affiliation(s)
- Kumari Shikha
  - Department of Genetics and Plant Breeding, Banaras Hindu University (BHU), Varanasi, India
- Vinayan Madhumal Thayil
  - International Maize and Wheat Improvement Centre (CIMMYT), ICRISAT Campus, Patancheru, Telangana, India
- J P Shahi
  - Department of Genetics and Plant Breeding, Banaras Hindu University (BHU), Varanasi, India
- P H Zaidi
  - International Maize and Wheat Improvement Centre (CIMMYT), ICRISAT Campus, Patancheru, Telangana, India
- Kaliyamoorthy Seetharam
  - International Maize and Wheat Improvement Centre (CIMMYT), ICRISAT Campus, Patancheru, Telangana, India
- Sudha K Nair
  - International Maize and Wheat Improvement Centre (CIMMYT), ICRISAT Campus, Patancheru, Telangana, India
- Raju Singh
  - Borlaug Institute for South Asia (BISA), Ludhiana, Punjab, India
- Garg Tosh
  - Punjab Agricultural University (PAU), Ludhiana, India
- Ashok Singamsetti
  - Department of Genetics and Plant Breeding, Banaras Hindu University (BHU), Varanasi, India
- Saurabh Singh
  - Department of Genetics and Plant Breeding, Banaras Hindu University (BHU), Varanasi, India
- B Sinha
  - Department of Genetics and Plant Breeding, Banaras Hindu University (BHU), Varanasi, India

2. Li H, Wang Z, Xu L, Li Q, Gao H, Ma H, Cai W, Chen Y, Gao X, Zhang L, Gao H, Zhu B, Xu L, Li J. Genomic prediction of carcass traits using different haplotype block partitioning methods in beef cattle. Evol Appl 2022; 15:2028-2042. [PMID: 36540636 PMCID: PMC9753827 DOI: 10.1111/eva.13491] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/20/2022] [Accepted: 09/18/2022] [Indexed: 09/22/2023]
Abstract
Genomic prediction (GP) based on haplotype alleles can capture quantitative trait loci (QTL) effects and increase predictive ability because the haplotypes are expected to be in linkage disequilibrium (LD) with QTL. In this study, we constructed haploblocks using LD-based and fixed-number-of-SNP (fixed-SNP) methods with the Illumina BovineHD chip in beef cattle. To evaluate the performance of different haplotype block partitioning methods, we constructed haploblocks based on LD thresholds (from r² > 0.2 to r² > 0.8) and on fixed numbers of SNPs (5, 10, 20). The performance of predictive methods for three carcass traits, liveweight (LW), dressing percentage (DP), and longissimus dorsi muscle weight (LDMW), was evaluated using three approaches: GBLUP and BayesB models based on SNPs; GHBLUP and BayesBH models based on haploblocks; and GHBLUP+GBLUP and BayesBH+BayesB models based on the combined haploblocks and the nonblocked SNPs located between blocks. We found that the LD-based and fixed-SNP haplotype Bayesian methods were more accurate than the SNP-based Bayesian models (by up to 8.54 ± 7.44% and 5.74 ± 2.95%, respectively). GHBLUP showed a large improvement (up to 11.29 ± 9.87%) compared with GBLUP. The Bayesian models had higher accuracies than the BLUP models in most scenarios. The average computing time of the BayesBH+BayesB model can be reduced by 29.3% compared with the BayesB model. Prediction accuracies using the LD-based haplotype method showed larger improvements than those using the fixed-SNP haplotype method. In addition, to avoid the influence of rare haplotypes generated during haplotype construction, we compared the performance of GP after filtering at four minor haplotype allele frequency (MHAF) thresholds (0.01, 0.025, 0.05, and 0.1) under different conditions (LD level set at r² > 0.3, and the fixed number of SNPs set at 5). We found that the optimal MHAF threshold for LW was 0.01, and the optimal MHAF threshold for DP and LDMW was 0.025.
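The fixed-SNP partitioning and MHAF filtering described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy phased haplotypes, and the choice to leave a trailing short remainder unblocked are all assumptions made here for illustration, and the toy threshold of 0.2 is much larger than the 0.01 to 0.1 range in the abstract only because the toy sample has eight haplotypes.

```python
from collections import Counter


def fixed_snp_blocks(n_snps, block_size):
    """Split SNP indices 0..n_snps-1 into consecutive blocks of block_size SNPs.

    Assumption: a trailing remainder shorter than block_size stays unblocked.
    """
    n_full = n_snps // block_size
    return [range(i * block_size, (i + 1) * block_size) for i in range(n_full)]


def haplotype_alleles(haplotypes, block):
    """Return, for each phased haplotype, its allele string within one block."""
    return ["".join(str(h[j]) for j in block) for h in haplotypes]


def filter_by_mhaf(alleles, threshold):
    """Map each haplotype allele to its frequency, keeping those >= threshold."""
    counts = Counter(alleles)
    total = len(alleles)
    return {a: c / total for a, c in counts.items() if c / total >= threshold}


# Hypothetical phased data: 8 haplotypes x 10 biallelic SNPs coded 0/1.
haplotypes = [
    [0, 0, 1, 0, 1, 1, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 1, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 1, 0, 0, 1],
]

blocks = fixed_snp_blocks(n_snps=10, block_size=5)
block0_kept = filter_by_mhaf(haplotype_alleles(haplotypes, blocks[0]), 0.2)
block1_kept = filter_by_mhaf(haplotype_alleles(haplotypes, blocks[1]), 0.2)
print(block0_kept)
print(block1_kept)
```

In the first block, the allele carried by a single haplotype falls below the toy threshold and is dropped, which is the effect MHAF filtering is meant to have on rare haplotypes.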
Affiliation(s)
- Hongwei Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Zezhao Wang
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lei Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Qian Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Han Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Haoran Ma
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Wentao Cai
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Yan Chen
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xue Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lupei Zhang
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Huijiang Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Bo Zhu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lingyang Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Junya Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China

3. Zhao Y, Gao J, Guo X, Su B, Wang H, Yang R, Jiang L. Gene-Based Genome-Wide Association Study Identified Genes for Agronomic Traits in Maize. Biology 2022; 11:1649. [PMID: 36421363 PMCID: PMC9687540 DOI: 10.3390/biology11111649] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/28/2022] [Revised: 11/05/2022] [Accepted: 11/08/2022] [Indexed: 07/05/2024]
Abstract
A gene integrates the effects of all SNPs in its sequence span, which benefits genome-wide association studies. To explore gene-level variations affecting economic traits in maize, we extended Single-RunKing, the SNP-based GWAS software developed by our team, to gene-based GWAS; it uses the FaST-LMM algorithm to convert the linear mixed model into a simple linear model association analysis. An F-test statistic was formulated to test and identify candidate genes. We compared the statistical efficiency of using principal components explaining 80% of the variance (EPC), the first principal component (FPC), and all SNP markers (ALLSNP) as independent variables, approaches that previous studies commonly used to integrate SNPs and represent genes. With a Huazhong Agricultural University (HAU) genomic dataset of 2.65M SNPs from 540 maize plants, 34,774 genes were annotated across the whole genome. Genome-wide association studies of 20 agronomic traits were performed using the software developed here. Another maize dataset from the Ames panel (AP) was also analyzed. The EPC method fits the model well and has good statistical efficiency. It not only overcomes the false negative problem that arises when using all SNP markers for analysis (ALLSNP) but also solves the false positive problem of its corresponding simple linear model method, EPCLM. Compared with FPC, the EPC method has higher statistical efficiency. A total of 132 quantitative trait genes (QTG) were identified for the 20 traits from the HAU maize dataset and one trait from the AP maize dataset.
Affiliation(s)
- Yunfeng Zhao
  - Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, Beijing 100141, China
  - General Education College, Weifang University of Science and Technology, Weifang 262700, China
- Jin Gao
  - Hainan Academy of Ocean and Fisheries Sciences, Haikou 571126, China
- Xiugang Guo
  - General Education College, Weifang University of Science and Technology, Weifang 262700, China
- Baofeng Su
  - School of Fisheries, Aquaculture and Aquatic Sciences, Auburn University, Auburn, AL 36849, USA
- Haijie Wang
  - General Education College, Weifang University of Science and Technology, Weifang 262700, China
- Runqing Yang
  - Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, Beijing 100141, China
- Li Jiang
  - Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, Beijing 100141, China

4. Ballard JL, O'Connor LJ. Shared components of heritability across genetically correlated traits. Am J Hum Genet 2022; 109:989-1006. [PMID: 35477001 PMCID: PMC9247834 DOI: 10.1016/j.ajhg.2022.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 04/01/2022] [Indexed: 11/01/2022]
Abstract
Most disease-associated genetic variants are pleiotropic, affecting multiple genetically correlated traits. Their pleiotropic associations can be mechanistically informative: if many variants have similar patterns of association, they may act via similar pleiotropic mechanisms, forming a shared component of heritability. We developed pleiotropic decomposition regression (PDR) to identify shared components and their underlying genetic variants. We validated PDR on simulated data and identified limitations of existing methods in recovering the true components. We applied PDR to three clusters of five to six traits genetically correlated with coronary artery disease (CAD), asthma, and type II diabetes (T2D), producing biologically interpretable components. For CAD, PDR identified components related to BMI, hypertension, and cholesterol, and it clarified the relationship among these highly correlated risk factors. We assigned variants to components, calculated their posterior-mean effect sizes, and performed out-of-sample validation. Our posterior-mean effect sizes pool statistical power across traits and substantially boost the correlation (r²) between true and estimated effect sizes (compared with the original summary statistics) by 94% and 70% for asthma and T2D out of sample, respectively, and by a predicted 300% for CAD.
Affiliation(s)
- Jenna Lee Ballard
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Luke Jen O'Connor
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA

5. Zhou YH, Li G, Zhang YM. A compressed variance component mixed model framework for detecting small and linked QTL-by-environment interactions. Brief Bioinform 2022; 23:6527275. [DOI: 10.1093/bib/bbab596] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/06/2021] [Revised: 12/07/2021] [Accepted: 12/23/2021] [Indexed: 12/22/2022]
Abstract
Detecting small and linked quantitative trait loci (QTLs) and QTL-by-environment interactions (QEIs) for complex traits is a difficult issue in immortalized F2 and F2:3 designs, especially in the era of global climate change and environmental plasticity research. Here we proposed a compressed variance component mixed model. In this model, a parametric vector of QTL genotype and environment combination effects replaced QTL effects, environmental effects and their interaction effects, whereas the combination effect polygenic background replaced the QTL and QEI polygenic backgrounds. Thus, the number of variance components in the mixed model was greatly reduced. The model was incorporated into our genome-wide composite interval mapping (GCIM) to propose GCIM-QEI-random and GCIM-QEI-fixed under random and fixed models of genetic effects, respectively. First, potentially associated QTLs and QEIs were selected from genome-wide scanning. Then, significant QTLs and QEIs were identified using empirical Bayes and a likelihood ratio test. Finally, known and candidate genes around these significant loci were mined. The new methods were validated by a series of simulation studies and real data analyses. Compared with ICIM, GCIM-QEI-random had 29.77 ± 18.20% and 24.33 ± 10.15% higher average power, respectively, in 0.5–3.0% QTL and QEI detection, had 43.44 ± 9.53% and 51.47 ± 15.70% higher average power, respectively, in linked QTL and QEI detection, and identified 30 more known genes for four rice yield traits, because GCIM-QEI-random identified more small-effect genes/loci (2.69 ± 2.37% for the additional genes). GCIM-QEI-random was slightly better than GCIM-QEI-fixed. In addition, the new methods may be extended to backcross designs and genome-wide association studies. This study provides effective methods for detecting small-effect and linked QTLs and QEIs.
Affiliation(s)
- Ya-Hui Zhou
  - College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Guo Li
  - College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
  - State Key Laboratory of Cotton Biology, Anyang 455000, China
- Yuan-Ming Zhang
  - College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China

6. Bedhane M, van der Werf J, de las Heras-Saldana S, Lim D, Park B, Na Park M, Seung Hee R, Clark S. The accuracy of genomic prediction for meat quality traits in Hanwoo cattle when using genotypes from different SNP densities and preselected variants from imputed whole genome sequence. Animal Production Science 2022. [DOI: 10.1071/an20659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Context
Genomic prediction is the use of genomic data in the estimation of genomic breeding values (GEBV) in animal breeding. In beef cattle breeding programs, genomic prediction increases the rates of genetic gain by increasing the accuracy of selection at earlier ages.
Aims
The objectives of the study were to examine the effect of single-nucleotide polymorphism (SNP) density and to evaluate the effect of using SNPs preselected from imputed whole-genome sequence for genomic prediction.
Methods
Genomic and phenotypic data from 2110 Hanwoo steers were used to predict GEBV for marbling score (MS), meat texture (MT), and meat colour (MC) traits. Three types of SNP densities including 50k, high-density (HD), and whole-genome sequence data and preselected SNPs from genome-wide association study (GWAS) were used for genomic prediction analyses. Two scenarios (independent and dependent discovery populations) were used to select top significant SNPs. The accuracy of GEBV was assessed using random cross-validation. Genomic best linear unbiased prediction (GBLUP) was used to predict the breeding values for each trait.
Key results
Prediction accuracies were very similar across all SNP densities used in the study. The prediction accuracy among traits ranged from 0.29±0.05 for MC to 0.46±0.04 for MS. Depending on the trait, prediction accuracy improved by up to 5% when the preselected SNPs from GWAS analysis were included in the prediction analysis.
Conclusions
High SNP density such as HD and the whole-genome sequence data yielded a similar prediction accuracy in Hanwoo beef cattle. Therefore, the 50K SNP chip panel is sufficient to capture the relationships in a breed with a small effective population size such as the Hanwoo cattle population. Preselected variants improved prediction accuracy when they were included in the genomic prediction model.
Implications
The estimated genomic prediction accuracies are moderate in Hanwoo cattle, and searching for SNPs that are more productive could increase the accuracy of estimated breeding values for the studied traits.

7. Ghosh Dasgupta M, Abdul Bari MP, Shanmugavel S, Dharanishanthi V, Muthupandi M, Kumar N, Chauhan SS, Kalaivanan J, Mohan H, Krutovsky KV, Rajasugunasekar D. Targeted re-sequencing and genome-wide association analysis for wood property traits in breeding population of Eucalyptus tereticornis × E. grandis. Genomics 2021; 113:4276-4292. [PMID: 34785351 DOI: 10.1016/j.ygeno.2021.11.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/01/2020] [Revised: 06/20/2021] [Accepted: 11/10/2021] [Indexed: 11/16/2022]
Abstract
Globally, Eucalyptus plantations occupy 22 million ha, and eucalypts are among the preferred hardwood species owing to their short rotation, rapid growth, adaptability and wood properties. In this study, we present results of GWAS in parents and 100 hybrids of Eucalyptus tereticornis × E. grandis using 762 genes presumably involved in wood formation. Comparative analysis between parents predicted 32,202 polymorphic SNPs with a high average read depth of 269-562× per individual per nucleotide. Seventeen wood-related traits were phenotyped across three diverse environments and GWAS was conducted using 13,610 SNPs. A total of 45 SNP-trait associations were predicted across two locations. Seven large-effect markers were identified which explained more than 80% of phenotypic variation for fibre area. This study has provided an array of candidate genes which may govern fibre morphology in this genus and has predicted potential SNPs which can guide future breeding programs in tropical Eucalyptus.
Affiliation(s)
- Muthusamy Muthupandi
  - Institute of Forest Genetics and Tree Breeding, R.S. Puram, Coimbatore 641002, India
- Naveen Kumar
  - Institute of Wood Science and Technology, 18th Cross Malleshwaram, Bangalore 560 003, India
- Shakti Singh Chauhan
  - Institute of Wood Science and Technology, 18th Cross Malleshwaram, Bangalore 560 003, India
- Haritha Mohan
  - Institute of Forest Genetics and Tree Breeding, R.S. Puram, Coimbatore 641002, India
- Konstantin V Krutovsky
  - Department of Forest Genetics and Forest Tree Breeding, Georg-August University of Göttingen, 37077 Göttingen, Germany
  - Center for Integrated Breeding Research, Georg-August University of Göttingen, 37075 Göttingen, Germany
  - Laboratory of Forest Genomics, Genome Research and Education Center, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, 660036 Krasnoyarsk, Russia
  - Laboratory of Population Genetics, N.I. Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Department of Ecosystem Science and Management, Texas A&M University, College Station, TX 77843-2138, USA

8. Ubbens J, Parkin I, Eynck C, Stavness I, Sharpe AG. Deep neural networks for genomic prediction do not estimate marker effects. The Plant Genome 2021; 14:e20147. [PMID: 34596363 DOI: 10.1002/tpg2.20147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/26/2021] [Accepted: 07/09/2021] [Indexed: 06/13/2023]
Abstract
Genomic prediction is a promising technology for advancing both plant and animal breeding, with many different prediction models evaluated in the literature. It has been suggested that the ability of powerful nonlinear models, such as deep neural networks, to capture complex epistatic effects between markers offers advantages for genomic prediction. However, these methods tend not to outperform classical linear methods, leaving it an open question why this capacity to model nonlinear effects does not seem to result in better predictive capability. In this work, we propose the theory that, because of a previously described principle called shortcut learning, deep neural networks tend to base their predictions on overall genetic relatedness rather than on the effects of particular markers such as epistatic effects. Using several datasets of crop plants [lentil (Lens culinaris Medik.), wheat (Triticum aestivum L.), and Brassica carinata A. Braun], we demonstrate the network's indifference to the values of the markers by showing that the same network, provided with only the locations of matches between markers for two individuals, is able to perform prediction to the same level of accuracy.
Affiliation(s)
- Jordan Ubbens
  - Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada
- Isobel Parkin
  - Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2, Canada
- Christina Eynck
  - Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2, Canada
- Ian Stavness
  - Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada
- Andrew G Sharpe
  - Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada

9. Anwar MY, Raffield LM, Lange LA, Correa A, Taylor KC. Genetic underpinnings of regional adiposity distribution in African Americans: Assessments from the Jackson Heart Study. PLoS One 2021; 16:e0255609. [PMID: 34347846 PMCID: PMC8336790 DOI: 10.1371/journal.pone.0255609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2021] [Accepted: 07/19/2021] [Indexed: 11/18/2022]
Abstract
BACKGROUND: African ancestry individuals with comparable overall anthropometric measures to Europeans have lower abdominal adiposity. To explore the genetic underpinning of different adiposity patterns, we investigated whether genetic risk scores for well-studied adiposity phenotypes like body mass index (BMI) and waist circumference (WC) also predict other, less commonly measured adiposity measures in 2420 African American individuals from the Jackson Heart Study. METHODS: Polygenic risk scores (PRS) were calculated using GWAS-significant variants extracted from published studies mostly representing European ancestry populations for BMI, waist-hip ratio (WHR) adjusted for BMI (WHRBMIadj), waist circumference adjusted for BMI (WCBMIadj), and body fat percentage (BF%). Associations between each PRS and adiposity measures including BF%, subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT) and VAT:SAT ratio (VSR) were examined using multivariable linear regression, with or without BMI adjustment. RESULTS: In non-BMI adjusted models, all phenotype-PRS were found to be positive predictors of BF%, SAT and VAT. WHR-PRS was a positive predictor of VSR, but BF%-PRS and BMI-PRS were negative predictors of VSR. After adjusting for BMI, WHR-PRS remained a positive predictor of BF%, VAT and VSR but not SAT. WC-PRS was a positive predictor of SAT and VAT; BF%-PRS was a positive predictor of BF% and SAT only. CONCLUSION: These analyses suggest that genetically driven increases in BF% strongly associate with subcutaneous rather than visceral adiposity and that BF% is strongly associated with BMI but not central adiposity-associated genetic variants. How common genetic variants may contribute to observed differences in adiposity patterns between African and European ancestry individuals requires further study.
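As a rough illustration of how a polygenic risk score of this kind is commonly formed, the sketch below sums risk-allele dosages weighted by per-variant effect sizes. The abstract does not state whether the scores were weighted or unweighted, so the weighting is an assumption, and the variant identifiers, weights, and function name are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies) over variants.

    Raises KeyError if a weighted variant is missing from the genotype data.
    """
    return sum(weights[v] * dosages[v] for v in weights)


# Hypothetical per-variant effect sizes, standing in for published GWAS results.
weights = {"variant_1": 0.10, "variant_2": 0.05, "variant_3": 0.20}

# Hypothetical risk-allele counts for one individual.
person = {"variant_1": 2, "variant_2": 0, "variant_3": 1}

score = polygenic_risk_score(person, weights)
print(round(score, 3))
```

Setting every weight to 1 turns the same function into an unweighted allele count, which is the other common convention.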
Affiliation(s)
- Mohammad Y. Anwar
  - School of Public Health & Information Sciences, The University of Louisville, Louisville, KY, United States of America
- Laura M. Raffield
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, United States of America
- Leslie A. Lange
  - Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Adolfo Correa
  - Jackson Heart Study, Department of Medicine, University of Mississippi Medical Center, Jackson, Mississippi, United States of America
- Kira C. Taylor
  - School of Public Health & Information Sciences, The University of Louisville, Louisville, KY, United States of America

10. Cuomo ASE, Alvari G, Azodi CB, McCarthy DJ, Bonder MJ. Optimizing expression quantitative trait locus mapping workflows for single-cell studies. Genome Biol 2021; 22:188. [PMID: 34167583 PMCID: PMC8223300 DOI: 10.1186/s13059-021-02407-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 01/20/2021] [Accepted: 06/09/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) has enabled the unbiased, high-throughput quantification of gene expression specific to cell types and states. With the cost of scRNA-seq decreasing and techniques for sample multiplexing improving, population-scale scRNA-seq, and thus single-cell expression quantitative trait locus (sc-eQTL) mapping, is increasingly feasible. Mapping of sc-eQTL provides additional resolution to study the regulatory role of common genetic variants on gene expression across a plethora of cell types and states and promises to improve our understanding of genetic regulation across tissues in both health and disease. RESULTS: While previously established methods for bulk eQTL mapping can, in principle, be applied to sc-eQTL mapping, there are a number of open questions about how best to process scRNA-seq data and adapt bulk methods to optimize sc-eQTL mapping. Here, we evaluate the role of different normalization and aggregation strategies, covariate adjustment techniques, and multiple testing correction methods to establish best practice guidelines. We use both real and simulated datasets across single-cell technologies to systematically assess the impact of these different statistical approaches. CONCLUSION: We provide recommendations for future single-cell eQTL studies that can yield up to twice as many eQTL discoveries as default approaches ported from bulk studies.
Affiliation(s)
- Anna S E Cuomo
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Giordano Alvari
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Christina B Azodi
  - St. Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
  - University of Melbourne, Parkville, Victoria, Australia
- Davis J McCarthy
  - St. Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
  - University of Melbourne, Parkville, Victoria, Australia
- Marc Jan Bonder
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany

11. Watts EL, Fensom GK, Smith Byrne K, Perez‐Cornago A, Allen NE, Knuppel A, Gunter MJ, Holmes MV, Martin RM, Murphy N, Tsilidis KK, Yeap BB, Key TJ, Travis RC. Circulating insulin-like growth factor-I, total and free testosterone concentrations and prostate cancer risk in 200 000 men in UK Biobank. Int J Cancer 2021; 148:2274-2288. [PMID: 33252839 PMCID: PMC8048461 DOI: 10.1002/ijc.33416] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 06/30/2020] [Revised: 10/09/2020] [Accepted: 11/16/2020] [Indexed: 12/12/2022]
Abstract
Insulin-like growth factor-I (IGF-I) and testosterone have been implicated in prostate cancer aetiology. Using data from a large prospective full-cohort with standardised assays and repeat blood measurements, and genetic data from an international consortium, we investigated the associations of circulating IGF-I, sex hormone-binding globulin (SHBG), and total and calculated free testosterone concentrations with prostate cancer incidence and mortality. For prospective analyses, risk was estimated using multivariable-adjusted Cox regression in 199 698 male UK Biobank participants. Hazard ratios (HRs) were corrected for regression dilution bias using repeat hormone measurements from a subsample. Two-sample Mendelian randomisation (MR) analysis of IGF-I and risk used genetic instruments identified from UK Biobank men and genetic outcome data from the PRACTICAL consortium (79 148 cases and 61 106 controls). We used cis- and all (cis and trans) SNP MR approaches. A total of 5402 men were diagnosed with prostate cancer and 295 died from it (mean follow-up 6.9 years). Higher circulating IGF-I was associated with elevated risk of prostate cancer diagnosis (HR per 5 nmol/L increment = 1.09, 95% CI 1.05-1.12) and mortality (HR per 5 nmol/L increment = 1.15, 1.02-1.29). MR analyses also supported the role of IGF-I in prostate cancer diagnosis (cis-MR odds ratio per 5 nmol/L increment = 1.34, 1.07-1.68). In observational analyses, higher free testosterone was associated with a higher risk of prostate cancer (HR per 50 pmol/L increment = 1.10, 1.05-1.15). Higher SHBG was associated with a lower risk (HR per 10 nmol/L increment = 0.95, 0.94-0.97); neither was associated with prostate cancer mortality. Total testosterone was not associated with prostate cancer. These findings implicate IGF-I and free testosterone in prostate cancer development and/or progression.
Affiliation(s)
- Eleanor L. Watts
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Georgina K. Fensom
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Karl Smith Byrne
- Genetic Epidemiology Group, International Agency for Research on Cancer, Lyon, France
| | - Aurora Perez‐Cornago
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Naomi E. Allen
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- UK Biobank Ltd, Stockport, UK
| | - Anika Knuppel
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Marc J. Gunter
- Section of Nutrition and Metabolism, International Agency for Research on Cancer, Lyon, France
| | - Michael V. Holmes
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Medical Research Council Population Health Research Unit, University of Oxford, Oxford, UK
| | - Richard M. Martin
- MRC Integrative Epidemiology Unit (IEU), Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Bristol Medical School, Department of Population Health Sciences, University of Bristol, Bristol, UK
- National Institute for Health Research (NIHR) Bristol Biomedical Research Centre, University Hospitals Bristol NHS Foundation Trust and the University of Bristol, Bristol, UK
| | - Neil Murphy
- Section of Nutrition and Metabolism, International Agency for Research on Cancer, Lyon, France
| | - Konstantinos K. Tsilidis
- Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
| | - Bu B. Yeap
- Medical School, University of Western Australia, Perth, Australia
- Department of Endocrinology and Diabetes, Fiona Stanley Hospital, Perth, Australia
| | - Timothy J. Key
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Ruth C. Travis
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| |
|
12
|
Rohde PD, Kristensen TN, Sarup P, Muñoz J, Malmendal A. Prediction of complex phenotypes using the Drosophila melanogaster metabolome. Heredity (Edinb) 2021; 126:717-732. [PMID: 33510469 PMCID: PMC8102504 DOI: 10.1038/s41437-021-00404-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2020] [Revised: 01/04/2021] [Accepted: 01/04/2021] [Indexed: 01/30/2023] Open
Abstract
Understanding the genotype-phenotype map and how variation at different levels of biological organization is associated are central topics in modern biology. Fast developments in sequencing technologies and other molecular omic tools enable researchers to obtain detailed information on variation at DNA level and on intermediate endophenotypes, such as RNA, proteins and metabolites. This can facilitate our understanding of the link between genotypes and molecular and functional organismal phenotypes. Here, we use the Drosophila melanogaster Genetic Reference Panel and nuclear magnetic resonance (NMR) metabolomics to investigate the ability of the metabolome to predict organismal phenotypes. We performed NMR metabolomics on four replicate pools of male flies from each of 170 different isogenic lines. First, our results show that metabolite profiles are variable among the investigated lines and that this variation is highly heritable. Second, we identify genes associated with metabolome variation. Third, using the metabolome gave better prediction accuracies than genomic information for four of five quantitative traits analyzed. Our comprehensive characterization of population-scale diversity of metabolomes and its genetic basis illustrates that metabolites have large potential as predictors of organismal phenotypes. This finding is of great importance, e.g., in human medicine, evolutionary biology and animal and plant breeding.
Affiliation(s)
- Palle Duun Rohde
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark.
| | - Torsten Nygaard Kristensen
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Department of Animal Science, Aarhus University, Tjele, Denmark
| | - Pernille Sarup
- Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark
- Nordic Seed A/S, Odder, Denmark
| | - Joaquin Muñoz
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
| | - Anders Malmendal
- Department of Science and Environment, Roskilde University, Roskilde, Denmark.
| |
|
13
|
Zhang J, Chen M, Wen Y, Zhang Y, Lu Y, Wang S, Chen J. A Fast Multi-Locus Ridge Regression Algorithm for High-Dimensional Genome-Wide Association Studies. Front Genet 2021; 12:649196. [PMID: 33854527 PMCID: PMC8041068 DOI: 10.3389/fgene.2021.649196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2021] [Accepted: 03/01/2021] [Indexed: 11/13/2022] Open
Abstract
The mixed linear model (MLM) has been widely used in genome-wide association study (GWAS) to dissect quantitative traits in human, animal, and plant genetics. Most methodologies consider all single nucleotide polymorphism (SNP) effects as random effects under the MLM framework, which fails to detect the joint minor effect of multiple genetic markers on a trait. Therefore, polygenes with minor effects remain largely unexplored in today’s big data era. In this study, we developed a new algorithm under the MLM framework, which is called the fast multi-locus ridge regression (FastRR) algorithm. The FastRR algorithm first whitens the covariance matrix of the polygenic matrix K and environmental noise, then selects potentially related SNPs among large scale markers, which have a high correlation with the target trait, and finally analyzes the subset variables using a multi-locus deshrinking ridge regression for true quantitative trait nucleotide (QTN) detection. Results from the analyses of both simulated and real data show that the FastRR algorithm is more powerful for both large and small QTN detection, more accurate in QTN effect estimation, and has more stable results under various polygenic backgrounds. Moreover, compared with existing methods, the FastRR algorithm has the advantage of high computing speed. In conclusion, the FastRR algorithm provides an alternative algorithm for multi-locus GWAS in high dimensional genomic datasets.
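The screen-then-shrink idea in this abstract can be illustrated with a small, self-contained sketch: rank markers by absolute correlation with the trait, keep the top few, and fit a ridge regression on that subset. This is not the FastRR implementation. It omits the whitening step (the data are assumed to be already decorrelated), it uses plain ridge rather than the deshrinking variant, and the function names, toy genotypes and penalty values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def screen_markers(marker_columns, y, top_k):
    """Indices of the top_k markers ranked by absolute correlation with the trait."""
    ranked = sorted(
        range(len(marker_columns)),
        key=lambda j: abs(pearson(marker_columns[j], y)),
        reverse=True,
    )
    return sorted(ranked[:top_k])


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def ridge(marker_columns, y, lam):
    """Ridge effects on centered data: solve (X'X + lam * I) b = X'y."""
    n = len(y)
    my = sum(y) / n
    yc = [v - my for v in y]
    xc = [[v - sum(col) / n for v in col] for col in marker_columns]
    p = len(xc)
    xtx = [
        [sum(u * v for u, v in zip(xc[i], xc[j])) + (lam if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    xty = [sum(u * v for u, v in zip(xc[i], yc)) for i in range(p)]
    return solve(xtx, xty)


# Toy data (hypothetical): 8 individuals, 3 markers coded 0/1/2; marker 1 drives the trait.
markers = [
    [0, 1, 2, 0, 1, 2, 0, 1],
    [0, 0, 1, 1, 2, 2, 0, 2],
    [2, 2, 2, 0, 0, 0, 1, 1],
]
trait = [0.1, -0.1, 2.05, 1.95, 4.1, 3.9, 0.0, 4.0]

kept = screen_markers(markers, trait, top_k=1)
effects = ridge([markers[j] for j in kept], trait, lam=6.0)
print(kept, effects)
```

Raising `lam` shrinks the estimated effects toward zero, which is the bias that a deshrinking step is meant to counteract.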
Affiliation(s)
- Jin Zhang
- College of Science, Nanjing Agricultural University, Nanjing, China.,Postdoctoral Research Station of Crop Science, Nanjing Agricultural University, Nanjing, China
| | - Min Chen
- College of Science, Nanjing Agricultural University, Nanjing, China
| | - Yangjun Wen
- College of Science, Nanjing Agricultural University, Nanjing, China
| | - Yin Zhang
- College of Science, Nanjing Agricultural University, Nanjing, China
| | - Yunan Lu
- College of Science, Nanjing Agricultural University, Nanjing, China
| | - Shengmeng Wang
- College of Science, Nanjing Agricultural University, Nanjing, China
| | - Juncong Chen
- College of Finance, Nanjing Agricultural University, Nanjing, China
| |
|
14
|
Zhang YW, Tamba CL, Wen YJ, Li P, Ren WL, Ni YL, Gao J, Zhang YM. mrMLM v4.0.2: An R Platform for Multi-locus Genome-wide Association Studies. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 18:481-487. [PMID: 33346083 PMCID: PMC8242264 DOI: 10.1016/j.gpb.2020.06.006] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/22/2019] [Revised: 03/27/2020] [Accepted: 09/08/2020] [Indexed: 12/01/2022]
Abstract
Previous studies have reported that some important loci are missed in single-locus genome-wide association studies (GWAS), especially because of the large phenotypic error in field experiments. To solve this issue, multi-locus GWAS methods have been recommended. However, only a few software packages for multi-locus GWAS are available. Therefore, we developed an R software named mrMLM v4.0.2. This software integrates mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO methods developed by our lab. There are four components in mrMLM v4.0.2, including dataset input, parameter setting, software running, and result output. The fread function in data.table is used to quickly read datasets, especially big datasets, and the doParallel package is used to conduct parallel computation using multiple CPUs. In addition, the graphical user interface software mrMLM.GUI v4.0.2, built upon Shiny, is also available. To confirm the correctness of the aforementioned programs, all the methods in mrMLM v4.0.2 and three widely-used methods were used to analyze real and simulated datasets. The results confirm the superior performance of mrMLM v4.0.2 to other methods currently available. False positive rates are effectively controlled, albeit with a less stringent significance threshold. mrMLM v4.0.2 is publicly available at BioCode (https://bigd.big.ac.cn/biocode/tools/BT007077) or R (https://cran.r-project.org/web/packages/mrMLM.GUI/index.html) as an open-source software.
Affiliation(s)
- Ya-Wen Zhang
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Cox Lwaka Tamba
- Department of Mathematics, Egerton University, Egerton 536-20115, Kenya
| | - Yang-Jun Wen
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095, China
| | - Pei Li
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Wen-Long Ren
- Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong 226019, China
| | - Yuan-Li Ni
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095, China
| | - Jun Gao
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Yuan-Ming Zhang
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China.
| |
|
15
|
Genomic Prediction Informed by Biological Processes Expands Our Understanding of the Genetic Architecture Underlying Free Amino Acid Traits in Dry Arabidopsis Seeds. G3-GENES GENOMES GENETICS 2020; 10:4227-4239. [PMID: 32978264 PMCID: PMC7642941 DOI: 10.1534/g3.120.401240] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]
Abstract
Plant growth, development, and nutritional quality depend upon amino acid homeostasis, especially in seeds. However, our understanding of the underlying genetics influencing amino acid content and composition remains limited, with only a few candidate genes and quantitative trait loci identified to date. Improved knowledge of the genetics and biological processes that determine amino acid levels will enable researchers to use this information for plant breeding and biological discovery. Toward this goal, we used genomic prediction to identify biological processes that are associated with, and therefore potentially influence, free amino acid (FAA) composition in seeds of the model plant Arabidopsis thaliana. Markers were split into categories based on metabolic pathway annotations and fit using a genomic partitioning model to evaluate the influence of each pathway on heritability explained, model fit, and predictive ability. Selected pathways included processes known to influence FAA composition, albeit to an unknown degree, and spanned four categories: amino acid, core, specialized, and protein metabolism. Using this approach, we identified associations for pathways containing known variants for FAA traits, in addition to finding new trait-pathway associations. Markers related to amino acid metabolism, which are directly involved in FAA regulation, improved predictive ability for branched chain amino acids and histidine. The use of genomic partitioning also revealed patterns across biochemical families, in which serine-derived FAAs were associated with protein related annotations and aromatic FAAs were associated with specialized metabolic pathways. Taken together, these findings provide evidence that genomic partitioning is a viable strategy to uncover the relative contributions of biological processes to FAA traits in seeds, offering a promising framework to guide hypothesis testing and narrow the search space for candidate genes.
|
16
|
Couvy-Duchesne B, Strike LT, Zhang F, Holtz Y, Zheng Z, Kemper KE, Yengo L, Colliot O, Wright MJ, Wray NR, Yang J, Visscher PM. A unified framework for association and prediction from vertex-wise grey-matter structure. Hum Brain Mapp 2020; 41:4062-4076. [PMID: 32687259 PMCID: PMC7469763 DOI: 10.1002/hbm.25109] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2020] [Revised: 05/11/2020] [Accepted: 06/14/2020] [Indexed: 01/29/2023] Open
Abstract
The recent availability of large‐scale neuroimaging cohorts facilitates deeper characterisation of the relationship between phenotypic and brain architecture variation in humans. Here, we investigate the association (previously coined morphometricity) of a phenotype with all 652,283 vertex‐wise measures of cortical and subcortical morphology in a large data set from the UK Biobank (UKB; N = 9,497 for discovery, N = 4,323 for replication) and the Human Connectome Project (N = 1,110). We used a linear mixed model with the brain measures of individuals fitted as random effects with covariance relationships estimated from the imaging data. We tested 167 behavioural, cognitive, psychiatric or lifestyle phenotypes and found significant morphometricity for 58 phenotypes (spanning substance use, blood assay results, education or income level, diet, depression, and cognition domains), 23 of which replicated in the UKB replication set or the HCP. We then extended the model for a bivariate analysis to estimate grey‐matter correlation between phenotypes, which revealed that body size (i.e., height, weight, BMI, waist and hip circumference, body fat percentage) could account for a substantial proportion of the morphometricity (confirmed using a conditional analysis), providing possible insight into previous MRI case–control results for psychiatric disorders where case status is associated with body mass index. Our LMM framework also allowed us to predict some of the associated phenotypes from the vertex‐wise measures in two independent samples. Finally, we demonstrated additional new applications of our approach: (a) region of interest (ROI) analysis that retains the vertex‐wise complexity; (b) comparison of the information retained by different MRI processing methods.
Affiliation(s)
- Baptiste Couvy-Duchesne
- Institute for Molecular Bioscience, the University of Queensland, St Lucia, Queensland, Australia
| | - Lachlan T Strike
- Queensland Brain Institute, the University of Queensland, St Lucia, Queensland, Australia
| | - Futao Zhang
- Institute for Molecular Bioscience, the University of Queensland, St Lucia, Queensland, Australia
| | - Yan Holtz
- Institute for Molecular Bioscience, the University of Queensland, St Lucia, Queensland, Australia.,Queensland Brain Institute, the University of Queensland, St Lucia, Queensland, Australia
| | - Zhili Zheng
- Institute for Molecular Bioscience, the University of Queensland, St Lucia, Queensland, Australia.,Institute for Advanced Research, Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Kathryn E Kemper
- Institute for Molecular Bioscience, the University of Queensland, St Lucia, Queensland, Australia
| | - Loic Yengo
- Institute for Molecular Bioscience, the University of Queensland, St Lucia, Queensland, Australia
| | - Olivier Colliot
- ARAMIS, Inria, Paris, France.,ARAMIS, Paris Brain Institute, Paris, France.,ARAMIS, Inserm, Paris, France.,ARAMIS, CNRS, Paris, France.,ARAMIS, Sorbonne University, Paris, France
| | - Margaret J Wright
- Queensland Brain Institute, the University of Queensland, St Lucia, Queensland, Australia.,Centre for Advanced Imaging, the University of Queensland, St Lucia, Queensland, Australia
| | - Naomi R Wray
- Institute for Molecular Bioscience, the University of Queensland, St Lucia, Queensland, Australia.,Queensland Brain Institute, the University of Queensland, St Lucia, Queensland, Australia
| | - Jian Yang
- Institute for Molecular Bioscience, the University of Queensland, St Lucia, Queensland, Australia.,Institute for Advanced Research, Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Peter M Visscher
- Institute for Molecular Bioscience, the University of Queensland, St Lucia, Queensland, Australia.,Queensland Brain Institute, the University of Queensland, St Lucia, Queensland, Australia
| |
|
17
|
Teng J, Huang S, Chen Z, Gao N, Ye S, Diao S, Ding X, Yuan X, Zhang H, Li J, Zhang Z. Optimizing genomic prediction model given causal genes in a dairy cattle population. J Dairy Sci 2020; 103:10299-10310. [PMID: 32952023 DOI: 10.3168/jds.2020-18233] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2020] [Accepted: 07/07/2020] [Indexed: 01/15/2023]
Abstract
As genotypic data are moving from SNP chip toward whole-genome sequence, the accuracy of genomic prediction (GP) exhibits a marginal gain, although all genetic variation, including causal genes, is contained in whole-genome sequence data. Meanwhile, genetic analyses on complex traits, such as genome-wide association studies, have identified an increasing number of genomic regions, including potential causal genes, which would be reliable prior knowledge for GP. Many studies have tried to improve the performance of GP by modifying the prediction model to incorporate prior knowledge. Although several plausible results have been obtained from model modification or strategy optimization, most of them were validated in a specific empirical population with a limited variety of genetic architecture for complex traits. An alternative approach is to use simulated genetic architecture with known causal genes (e.g., simulated causative SNP) to evaluate different GP models with given causal genes. Our objectives were to (1) evaluate the performance of GP under a variety of genetic architectures with a subset of known causal genes and (2) compare different GP models modified by highlighting causal genes and different strategies to weight causal genes. In this study, we simulated pseudo-phenotypes under a variety of genetic architectures based on the real genotypes and phenotypes of a dairy cattle population. Besides classical genomic best linear unbiased prediction, we evaluated 3 modified GP models that highlight causal genes as follows: (1) by treating them as fixed effects, (2) by treating them as a separate random component, and (3) by combining them into the genomic relationship matrix as random effects. Our results showed that highlighting the known causal genes, which explained a considerable proportion of genetic variance in the GP models, increased the predictive accuracy. Combining all given causal genes into the genomic relationship matrix was the optimal strategy under all the scenarios validated, and treating causal genes as a separate random component is also recommended when more than 20% of genetic variance was explained by known causal genes. Moreover, assigning differential weights to each causal gene further improved the predictive accuracy.
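One way to picture "combining causal genes into the genomic relationship matrix" with differential weights is a marker-weighted relationship matrix, G = Z D Z' / sum_j(2 p_j (1 - p_j) d_j), where Z holds genotypes centered by twice the allele frequency and D is a diagonal matrix of marker weights. The sketch below builds such a matrix in plain Python. It is an assumption-laden illustration, not the authors' code: the abstract does not specify the exact construction, and the function name, toy genotypes and weights are hypothetical.

```python
def weighted_grm(genotypes, weights):
    """Marker-weighted genomic relationship matrix.

    genotypes: one row per individual, markers coded 0/1/2.
    weights:   one non-negative weight per marker (larger for assumed causal markers).
    """
    n, m = len(genotypes), len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Center each genotype by its expected value 2p.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    # Weighted sum of expected marker variances; zero if every weighted marker is monomorphic.
    scale = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))
    if scale == 0:
        raise ValueError("no polymorphic marker with non-zero weight")
    return [
        [sum(weights[j] * z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


# Toy data (hypothetical): 4 individuals, 3 markers; marker 0 is treated as causal.
genos = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
g_equal = weighted_grm(genos, [1.0, 1.0, 1.0])
g_causal = weighted_grm(genos, [5.0, 1.0, 1.0])  # up-weight the assumed causal marker
print(g_equal[0][0], g_causal[0][0])
```

With equal weights this reduces to an unweighted relationship matrix, and multiplying all weights by the same constant leaves the matrix unchanged, so only the relative weights matter.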
Affiliation(s)
- Jinyan Teng
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Shuwen Huang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Zitao Chen
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Ning Gao
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou 510006, China
| | - Shaopan Ye
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Shuqi Diao
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Xiangdong Ding
- National Engineering Laboratory for Animal Breeding, Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
| | - Xiaolong Yuan
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Hao Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Jiaqi Li
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Zhe Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| |
|
18
|
Li Z, Simianer H. Pan-genomic open reading frames: A potential supplement of single nucleotide polymorphisms in estimation of heritability and genomic prediction. PLoS Genet 2020; 16:e1008995. [PMID: 32833967 PMCID: PMC7470747 DOI: 10.1371/journal.pgen.1008995] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2019] [Revised: 09/03/2020] [Accepted: 07/15/2020] [Indexed: 11/19/2022] Open
Abstract
Pan-genomic open reading frames (ORFs) potentially carry protein-coding gene or coding variant information in a population. In this study, we suggest that pan-genomic ORFs are promising for use in estimation of heritability and genomic prediction. A Saccharomyces cerevisiae dataset with whole-genome SNPs, pan-genomic ORFs, and the copy numbers of those ORFs is used to test the effectiveness of ORF data as a predictor in three prediction models for 35 traits. Our results show that the ORF-based heritability can capture more genetic effects than SNP-based heritability for all traits. Compared to SNP-based genomic prediction (GBLUP), pan-genomic ORF-based genomic prediction (OBLUP) is distinctly more accurate for all traits, and the predictive abilities on average are more than doubled across all traits. For four traits, prediction based on ORF copy number (CBLUP) is more accurate than OBLUP. When using different numbers of isolates in training sets in ORF-based prediction, the predictive abilities for all traits increased as more isolates were added to the training sets, suggesting that with very large training sets the prediction accuracy will be in the range of the square root of the heritability. We conclude that pan-genomic ORFs have the potential to be a supplement of single nucleotide polymorphisms in estimation of heritability and genomic prediction.
Affiliation(s)
- Zhengcao Li
- Animal Breeding and Genetics Group, Center for Integrated Breeding Research, Department of Animal Sciences, University of Goettingen, Goettingen, Germany
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
| | - Henner Simianer
- Animal Breeding and Genetics Group, Center for Integrated Breeding Research, Department of Animal Sciences, University of Goettingen, Goettingen, Germany
| |
|
19
|
Zhang YW, Tamba CL, Wen YJ, Li P, Ren WL, Ni YL, Gao J, Zhang YM. mrMLM v4.0.2: An R Platform for Multi-locus Genome-wide Association Studies. GENOMICS, PROTEOMICS & BIOINFORMATICS 2020; 18:481-487. [PMID: 33346083 DOI: 10.1101/2020.03.04.976464] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/22/2019] [Revised: 03/27/2020] [Accepted: 09/08/2020] [Indexed: 05/22/2023]
|
20
|
Wen YJ, Zhang YW, Zhang J, Feng JY, Dunwell JM, Zhang YM. An efficient multi-locus mixed model framework for the detection of small and linked QTLs in F2. Brief Bioinform 2020; 20:1913-1924. [PMID: 30032279 PMCID: PMC6917223 DOI: 10.1093/bib/bby058] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Revised: 06/05/2018] [Indexed: 01/03/2023] Open
Abstract
In the genetic system that regulates complex traits, metabolites, gene expression levels, RNA editing levels and DNA methylation, a series of small and linked genes exist. To date, however, little is known about how to design an efficient framework for the detection of these kinds of genes. In this article, we propose a genome-wide composite interval mapping (GCIM) in F2. First, controlling polygenic background via selecting markers in the genome scanning of linkage analysis was replaced by estimating polygenic variance in a genome-wide association study. This can control large, middle and minor polygenic backgrounds in genome scanning. Then, additive and dominant effects for each putative quantitative trait locus (QTL) were separately scanned so that a negative logarithm P-value curve against genome position could be separately obtained for each kind of effect. In each curve, all the peaks were identified as potential QTLs. Thus, almost all the small-effect and linked QTLs are included in a multi-locus model. Finally, adaptive least absolute shrinkage and selection operator (adaptive lasso) was used to estimate all the effects in the multi-locus model, and all the nonzero effects were further identified by likelihood ratio test for true QTL identification. This method was used to reanalyze four rice traits. Among 25 known genes detected in this study, 16 small-effect genes were identified only by GCIM. To further demonstrate GCIM, a series of Monte Carlo simulation experiments was performed. As a result, GCIM is demonstrated to be more powerful than the widely used methods for the detection of closely linked and small-effect QTLs.
Affiliation(s)
- Yang-Jun Wen
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
| | - Ya-Wen Zhang
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Jin Zhang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
| | - Jian-Ying Feng
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
| | - Jim M Dunwell
- School of Agriculture, Policy and Development, University of Reading, Reading RG6 6AR, United Kingdom
| | - Yuan-Ming Zhang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China.,Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
| |
|
21
|
Chun S, Imakaev M, Hui D, Patsopoulos NA, Neale BM, Kathiresan S, Stitziel NO, Sunyaev SR. Non-parametric Polygenic Risk Prediction via Partitioned GWAS Summary Statistics. Am J Hum Genet 2020; 107:46-59. [PMID: 32470373 PMCID: PMC7332650 DOI: 10.1016/j.ajhg.2020.05.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2019] [Accepted: 05/01/2020] [Indexed: 02/07/2023] Open
Abstract
In complex trait genetics, the ability to predict phenotype from genotype is the ultimate measure of our understanding of genetic architecture underlying the heritability of a trait. A complete understanding of the genetic basis of a trait should allow for predictive methods with accuracies approaching the trait's heritability. The highly polygenic nature of quantitative traits and most common phenotypes has motivated the development of statistical strategies focused on combining myriad individually non-significant genetic effects. Now that predictive accuracies are improving, there is a growing interest in the practical utility of such methods for predicting risk of common diseases responsive to early therapeutic intervention. However, existing methods require individual-level genotypes or depend on accurately specifying the genetic architecture underlying each disease to be predicted. Here, we propose a polygenic risk prediction method that does not require explicitly modeling any underlying genetic architecture. We start with summary statistics in the form of SNP effect sizes from a large GWAS cohort. We then remove the correlation structure across summary statistics arising due to linkage disequilibrium and apply a piecewise linear interpolation on conditional mean effects. In both simulated and real datasets, this new non-parametric shrinkage (NPS) method can reliably allow for linkage disequilibrium in summary statistics of 5 million dense genome-wide markers and consistently improves prediction accuracy. We show that NPS improves the identification of groups at high risk for breast cancer, type 2 diabetes, inflammatory bowel disease, and coronary heart disease, all of which have available early intervention or prevention treatments.
Affiliation(s)
- Sung Chun
  - Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Maxim Imakaev
  - Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Daniel Hui
  - Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Systems Biology and Computer Science Program, Ann Romney Center for Neurological Diseases, Department of Neurology, Brigham & Women's Hospital, Boston, MA 02115, USA
- Nikolaos A Patsopoulos
  - Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Systems Biology and Computer Science Program, Ann Romney Center for Neurological Diseases, Department of Neurology, Brigham & Women's Hospital, Boston, MA 02115, USA
- Benjamin M Neale
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA
- Sekar Kathiresan
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Nathan O Stitziel
  - Cardiovascular Division, Department of Medicine, Washington University School of Medicine, Saint Louis, MO 63110, USA; Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA; McDonnell Genome Institute, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Shamil R Sunyaev
  - Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
22
|
Chen H, Hao Z, Zhao Y, Yang R. A fast-linear mixed model for genome-wide haplotype association analysis: application to agronomic traits in maize. BMC Genomics 2020; 21:151. [PMID: 32046650 PMCID: PMC7014697 DOI: 10.1186/s12864-020-6552-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/25/2019] [Accepted: 02/04/2020] [Indexed: 11/25/2022] Open
Abstract
Background: Haplotypes combine the effects of several single nucleotide polymorphisms (SNPs) with high linkage disequilibrium, which benefit the genome-wide association analysis (GWAS). In the haplotype association analysis, both haplotype alleles and blocks are tested. Haplotype alleles can be inferred with the same statistics as SNPs in the linear mixed model, while blocks require the formulation of unified statistics to fit different genetic units, such as SNPs, haplotypes, and copy number variations. Results: Based on the FaST-LMM, the fastLmPure function in the R/RcppArmadillo package has been introduced to speed up genome-wide regression scans by a re-weighted least square estimation. When large or highly significant blocks are tested based on EMMAX, the genome-wide haplotype association analysis takes only one to two rounds of genome-wide regression scans. With a genomic dataset of 541,595 SNPs from 513 maize inbred lines, 90,770 haplotype blocks were constructed across the whole genome, and three types of markers (SNPs, haplotype alleles, and haplotype blocks) were genome-widely associated with 17 agronomic traits in maize using the software developed here. Conclusions: Two SNPs were identified for LNAE, four haplotype alleles for TMAL, LNAE, CD, and DTH, and only three blocks reached the significant level for TMAL, CD, and KNPR. Compared to the R/lm function, the computational time was reduced by ~10–15 times.
Affiliation(s)
- Heli Chen
  - Research Center for Aquatic Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, People's Republic of China
- Zhiyu Hao
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, 150030, China
- Yunfeng Zhao
  - Research Center for Aquatic Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, People's Republic of China
- Runqing Yang
  - Research Center for Aquatic Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, People's Republic of China; College of Animal Science and Technology, Northeast Agricultural University, Harbin, 150030, China
23
|
Couvy-Duchesne B, Faouzi J, Martin B, Thibeau-Sutre E, Wild A, Ansart M, Durrleman S, Dormont D, Burgos N, Colliot O. Ensemble Learning of Convolutional Neural Network, Support Vector Machine, and Best Linear Unbiased Predictor for Brain Age Prediction: ARAMIS Contribution to the Predictive Analytics Competition 2019 Challenge. Front Psychiatry 2020; 11:593336. [PMID: 33384629 PMCID: PMC7770104 DOI: 10.3389/fpsyt.2020.593336] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/10/2020] [Accepted: 11/20/2020] [Indexed: 12/14/2022] Open
Abstract
We ranked third in the Predictive Analytics Competition (PAC) 2019 challenge by achieving a mean absolute error (MAE) of 3.33 years in predicting age from T1-weighted MRI brain images. Our approach combined seven algorithms that allow generating predictions when the number of features exceeds the number of observations, in particular, two versions of best linear unbiased predictor (BLUP), support vector machine (SVM), two shallow convolutional neural networks (CNNs), and the famous ResNet and Inception V1. Ensemble learning was derived from estimating weights via linear regression in a hold-out subset of the training sample. We further evaluated and identified factors that could influence prediction accuracy: choice of algorithm, ensemble learning, and features used as input/MRI image processing. Our prediction error was correlated with age, and absolute error was greater for older participants, suggesting to increase the training sample for this subgroup. Our results may be used to guide researchers to build age predictors on healthy individuals, which can be used in research and in the clinics as non-specific predictors of disease status.
Affiliation(s)
- Baptiste Couvy-Duchesne
  - Paris Brain Institute, ICM, Paris, France; Inserm, U 1127, Paris, France; CNRS, UMR 7225, Paris, France; Sorbonne Université, Paris, France; Inria Paris, Aramis project-team, Paris, France; Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, Australia
- Johann Faouzi
  - Paris Brain Institute, ICM, Paris, France; Inserm, U 1127, Paris, France; CNRS, UMR 7225, Paris, France; Sorbonne Université, Paris, France; Inria Paris, Aramis project-team, Paris, France
- Benoît Martin
  - Paris Brain Institute, ICM, Paris, France; Inserm, U 1127, Paris, France; CNRS, UMR 7225, Paris, France; Sorbonne Université, Paris, France; Inria Paris, Aramis project-team, Paris, France
- Elina Thibeau-Sutre
  - Paris Brain Institute, ICM, Paris, France; Inserm, U 1127, Paris, France; CNRS, UMR 7225, Paris, France; Sorbonne Université, Paris, France; Inria Paris, Aramis project-team, Paris, France
- Adam Wild
  - Paris Brain Institute, ICM, Paris, France; Inserm, U 1127, Paris, France; CNRS, UMR 7225, Paris, France; Sorbonne Université, Paris, France; Inria Paris, Aramis project-team, Paris, France
- Manon Ansart
  - Paris Brain Institute, ICM, Paris, France; Inserm, U 1127, Paris, France; CNRS, UMR 7225, Paris, France; Sorbonne Université, Paris, France; Inria Paris, Aramis project-team, Paris, France
- Stanley Durrleman
  - Paris Brain Institute, ICM, Paris, France; Inserm, U 1127, Paris, France; CNRS, UMR 7225, Paris, France; Sorbonne Université, Paris, France; Inria Paris, Aramis project-team, Paris, France
- Didier Dormont
  - Paris Brain Institute, ICM, Paris, France; Inserm, U 1127, Paris, France; CNRS, UMR 7225, Paris, France; Sorbonne Université, Paris, France; Inria Paris, Aramis project-team, Paris, France; AP-HP, Hôpital de la Pitié-Salpêtrière, Department of Neuroradiology, Paris, France
- Ninon Burgos
  - Paris Brain Institute, ICM, Paris, France; Inserm, U 1127, Paris, France; CNRS, UMR 7225, Paris, France; Sorbonne Université, Paris, France; Inria Paris, Aramis project-team, Paris, France
- Olivier Colliot
  - Paris Brain Institute, ICM, Paris, France; Inserm, U 1127, Paris, France; CNRS, UMR 7225, Paris, France; Sorbonne Université, Paris, France; Inria Paris, Aramis project-team, Paris, France
24
|
Gao J, Zhou X, Hao Z, Jiang L, Yang R. Genome-wide barebones regression scan for mixed-model association analysis. Theor Appl Genet 2020; 133:51-58. [PMID: 31552442 DOI: 10.1007/s00122-019-03439-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/15/2019] [Accepted: 09/17/2019] [Indexed: 06/10/2023]
Abstract
Based on the simplified FaST-LMM, wherein genomic variance is replaced with heritability, we have significantly improved computational efficiency by implementing rapid R/fastLmPure to statistically infer the genetic effects of tested SNPs and focus on large or highly significant SNPs obtained using the EMMAX algorithm. For a genome-wide mixed-model association analysis, we introduce a barebones linear model fitting function called fastLmPure from the R/RcppArmadillo package for the rapid estimation of single nucleotide polymorphism (SNP) effects and the maximum likelihood values of factored spectrally transformed linear mixed models (FaST-LMM). Starting from the estimated genomic heritability of quantitative traits under a null model without quantitative trait nucleotides, maximum likelihood estimations of the polygenic heritabilities of candidate markers consume the same time as approximately four rounds of genome-wide regression scans. When focusing only on SNPs with large effects or high significance levels, as estimated by the efficient mixed-model association expedited algorithm, the run time of genome-wide mixed-model association analysis is reduced to at most two rounds of genome-wide regression scans. We have developed a novel software application called Single-RunKing to transform nonlinear mixed-model association analyses into barebones linear regression scans. Based on a realised relationship matrix calculated using genome-wide markers, Single-RunKing saves significant computation time compared with the FaST-LMM that optimises the variance ratios of polygenic variances to residual variances using the R/lm function.
Affiliation(s)
- Jin Gao
  - Wuxi Fisheries College, Nanjing Agricultural University, Wuxi, 214081, China
- Xuefei Zhou
  - Zhongbo International Business School, Guangzhou Zhongbo Education Corporation Limited, Guangzhou, 511458, China
- Zhiyu Hao
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, 150030, China
- Li Jiang
  - Research Centre for Aquatic Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, China
- Runqing Yang
  - Wuxi Fisheries College, Nanjing Agricultural University, Wuxi, 214081, China
  - Research Centre for Aquatic Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, China
25
|
Yengo L, Sidari M, Verweij KJH, Visscher PM, Keller MC, Zietsch BP. No Evidence for Social Genetic Effects or Genetic Similarity Among Friends Beyond that Due to Population Stratification: A Reappraisal of Domingue et al (2018). Behav Genet 2019; 50:67-71. [PMID: 31713005 DOI: 10.1007/s10519-019-09979-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/21/2019] [Accepted: 10/24/2019] [Indexed: 11/30/2022]
Abstract
Using data from 5500 adolescents from the National Longitudinal Study of Adolescent to Adult Health, Domingue et al. (Proc Natl Acad Sci 25:256., 2018) claimed to show that friends are genetically more similar to one another than randomly selected peers, beyond the confounding effects of population stratification by ancestry. The authors also claimed to show 'social-genetic' effects, whereby individuals' educational attainment (EA) is influenced by their friends' genes. We argue that neither claim is justified by the data. Mathematically we show that (1) the genetic similarity reported between friends is far larger than theoretically possible if it was caused by phenotypic assortment as the authors claim; uncontrolled population stratification is a likely reason for the genetic similarity they observed, and (2) significant association between individuals' EA and their friends' polygenic scores for EA is a necessary consequence of EA similarity among friends, and does not provide evidence for social-genetic effects. Going forward, we urge caution in the analysis and interpretation of data at the intersection of human genetics and the social sciences.
Affiliation(s)
- Loic Yengo
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072, Australia
- Morgan Sidari
  - School of Psychology, The University of Queensland, Brisbane, 4072, Australia
- Karin J H Verweij
  - Department of Psychiatry, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Peter M Visscher
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072, Australia
- Matthew C Keller
  - Institute for Behavioral Genetics, University of Colorado, Boulder, CO, USA
  - Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, USA
- Brendan P Zietsch
  - School of Psychology, The University of Queensland, Brisbane, 4072, Australia
26
|
Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat Commun 2019; 10:5086. [PMID: 31704910 PMCID: PMC6841727 DOI: 10.1038/s41467-019-12653-0] [Citation(s) in RCA: 224] [Impact Index Per Article: 44.8] [Received: 01/30/2019] [Accepted: 08/30/2019] [Indexed: 01/21/2023] Open
Abstract
Accurate prediction of an individual’s phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding. Various approaches are being used for polygenic prediction including Bayesian multiple regression methods that require access to individual-level genotype data. Here, the authors extend BayesR to utilise GWAS summary statistics (SBayesR) and show that it outperforms other summary statistic-based methods.
27
|
Yáñez JM, Yoshida GM, Parra Á, Correa K, Barría A, Bassini LN, Christensen KA, López ME, Carvalheiro R, Lhorente JP, Pulgar R. Comparative Genomic Analysis of Three Salmonid Species Identifies Functional Candidate Genes Involved in Resistance to the Intracellular Bacterium Piscirickettsia salmonis. Front Genet 2019; 10:665. [PMID: 31428125 PMCID: PMC6690157 DOI: 10.3389/fgene.2019.00665] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/24/2019] [Accepted: 06/25/2019] [Indexed: 12/23/2022] Open
Abstract
Piscirickettsia salmonis is the etiologic agent of salmon rickettsial syndrome (SRS) and is responsible for considerable economic losses in salmon aquaculture. The bacterium affects coho salmon (CS; Oncorhynchus kisutch), Atlantic salmon (AS; Salmo salar), and rainbow trout (RT; Oncorhynchus mykiss) in several countries, including Norway, Canada, Scotland, Ireland, and Chile. We used Bayesian genome-wide association study analyses to investigate the genetic architecture of resistance to P. salmonis in farmed populations of these species. Resistance to SRS was defined as the number of days to death and as binary survival (BS). A total of 828 CS, 2130 RT, and 2601 AS individuals were phenotyped and then genotyped using double-digest restriction site-associated DNA sequencing and 57K and 50K Affymetrix® Axiom® single nucleotide polymorphism (SNP) panels, respectively. Both traits of SRS resistance in CS and RT appeared to be under oligogenic control. In AS, there was evidence of polygenic control of SRS resistance. To identify candidate genes associated with resistance, we applied a comparative genomics approach in which we systematically explored the complete set of genes adjacent to SNPs, which explained more than 1% of the genetic variance of resistance in each salmonid species (533 genes in total). Thus, genes were classified based on the following criteria: i) shared function of their protein domains among species, ii) shared orthology among species, iii) proximity to the SNP explaining the highest proportion of the genetic variance, and iv) presence in more than one genomic region explaining more than 1% of the genetic variance within species. Our results allowed us to identify 120 candidate genes belonging to at least one of the four criteria described above. Of these, 21 of them were part of at least two of the criteria defined above and are suggested to be strong functional candidates influencing P. salmonis resistance. 
These genes are related to diverse biological processes, such as kinase activity, GTP hydrolysis, helicase activity, lipid metabolism, cytoskeletal dynamics, inflammation, and innate immune response, which seem essential in the host response against P. salmonis infection. These results provide fundamental knowledge on the potential functional genes underpinning resistance against P. salmonis in three salmonid species.
Affiliation(s)
- José M. Yáñez
  - Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile
  - Núcleo Milenio INVASAL, Concepción, Chile
- Grazyella M. Yoshida
  - Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile
- Ángel Parra
  - Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile
  - Instituto de Nutrición y Tecnología de los Alimentos, Universidad de Chile, Santiago, Chile
  - Doctorado en Acuicultura. Programa Cooperativo Universidad de Chile, Universidad Católica del Norte, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
  - Facultad de Ciencias del Mar, Universidad Católica del Norte, Coquimbo, Chile
- Agustín Barría
  - Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh Easter Bush, Midlothian, United Kingdom
- Liane N. Bassini
  - Escuela de Medicina Veterinaria, Facultad de Ciencias de la Vida, Universidad Andres Bello, Santiago, Chile
- Maria E. López
  - Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Roberto Carvalheiro
  - School of Agricultural and Veterinarian Sciences, São Paulo State University (Unesp), Jaboticabal, Brazil
  - National Council for Scientific and Technological Development (CNPq), Brasília, Brazil
- Rodrigo Pulgar
  - Instituto de Nutrición y Tecnología de los Alimentos, Universidad de Chile, Santiago, Chile
28
|
Peace CP, Bianco L, Troggio M, van de Weg E, Howard NP, Cornille A, Durel CE, Myles S, Migicovsky Z, Schaffer RJ, Costes E, Fazio G, Yamane H, van Nocker S, Gottschalk C, Costa F, Chagné D, Zhang X, Patocchi A, Gardiner SE, Hardner C, Kumar S, Laurens F, Bucher E, Main D, Jung S, Vanderzande S. Apple whole genome sequences: recent advances and new prospects. Hortic Res 2019; 6:59. [PMID: 30962944 PMCID: PMC6450873 DOI: 10.1038/s41438-019-0141-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/19/2019] [Revised: 03/15/2019] [Accepted: 03/15/2019] [Indexed: 05/19/2023]
Abstract
In 2010, a major scientific milestone was achieved for tree fruit crops: publication of the first draft whole genome sequence (WGS) for apple (Malus domestica). This WGS, v1.0, was valuable as the initial reference for sequence information, fine mapping, gene discovery, variant discovery, and tool development. A new, high quality apple WGS, GDDH13 v1.1, was released in 2017 and now serves as the reference genome for apple. Over the past decade, these apple WGSs have had an enormous impact on our understanding of apple biological functioning, trait physiology and inheritance, leading to practical applications for improving this highly valued crop. Causal gene identities for phenotypes of fundamental and practical interest can today be discovered much more rapidly. Genome-wide polymorphisms at high genetic resolution are screened efficiently over hundreds to thousands of individuals with new insights into genetic relationships and pedigrees. High-density genetic maps are constructed efficiently and quantitative trait loci for valuable traits are readily associated with positional candidate genes and/or converted into diagnostic tests for breeders. We understand the species, geographical, and genomic origins of domesticated apple more precisely, as well as its relationship to wild relatives. The WGS has turbo-charged application of these classical research steps to crop improvement and drives innovative methods to achieve more durable, environmentally sound, productive, and consumer-desirable apple production. This review includes examples of basic and practical breakthroughs and challenges in using the apple WGSs. Recommendations for "what's next" focus on necessary upgrades to the genome sequence data pool, as well as for use of the data, to reach new frontiers in genomics-based scientific understanding of apple.
Affiliation(s)
- Cameron P. Peace
  - Department of Horticulture, Washington State University, Pullman, WA 99164 USA
- Luca Bianco
  - Computational Biology, Fondazione Edmund Mach, San Michele all’Adige, TN 38010 Italy
- Michela Troggio
  - Department of Genomics and Biology of Fruit Crops, Fondazione Edmund Mach, San Michele all’Adige, TN 38010 Italy
- Eric van de Weg
  - Plant Breeding, Wageningen University and Research, Wageningen, 6708PB The Netherlands
- Nicholas P. Howard
  - Department of Horticultural Science, University of Minnesota, St. Paul, MN 55108 USA
  - Institut für Biologie und Umweltwissenschaften, Carl von Ossietzky Universität, 26129 Oldenburg, Germany
- Amandine Cornille
  - GQE – Le Moulon, Institut National de la Recherche Agronomique, University of Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Charles-Eric Durel
  - Institut National de la Recherche Agronomique, Institut de Recherche en Horticulture et Semences, UMR 1345, 49071 Beaucouzé, France
- Sean Myles
  - Department of Plant, Food and Environmental Sciences, Faculty of Agriculture, Dalhousie University, Truro, NS B2N 5E3 Canada
- Zoë Migicovsky
  - Department of Plant, Food and Environmental Sciences, Faculty of Agriculture, Dalhousie University, Truro, NS B2N 5E3 Canada
- Robert J. Schaffer
  - The New Zealand Institute for Plant and Food Research Ltd, Motueka, 7198 New Zealand
  - School of Biological Sciences, University of Auckland, Auckland, 1142 New Zealand
- Evelyne Costes
  - AGAP, INRA, CIRAD, Montpellier SupAgro, University of Montpellier, Montpellier, France
- Gennaro Fazio
  - Plant Genetic Resources Unit, USDA ARS, Geneva, NY 14456 USA
- Hisayo Yamane
  - Laboratory of Pomology, Graduate School of Agriculture, Kyoto University, Kyoto, 606-8502 Japan
- Steve van Nocker
  - Department of Horticulture, Michigan State University, East Lansing, MI 48824 USA
- Chris Gottschalk
  - Department of Horticulture, Michigan State University, East Lansing, MI 48824 USA
- Fabrizio Costa
  - Department of Genomics and Biology of Fruit Crops, Fondazione Edmund Mach, San Michele all’Adige, TN 38010 Italy
- David Chagné
  - The New Zealand Institute for Plant and Food Research Ltd (Plant & Food Research), Palmerston North Research Centre, Palmerston North, 4474 New Zealand
- Xinzhong Zhang
  - College of Horticulture, China Agricultural University, 100193 Beijing, China
- Susan E. Gardiner
  - The New Zealand Institute for Plant and Food Research Ltd (Plant & Food Research), Palmerston North Research Centre, Palmerston North, 4474 New Zealand
- Craig Hardner
  - Queensland Alliance of Agriculture and Food Innovation, University of Queensland, St Lucia, 4072 Australia
- Satish Kumar
  - New Cultivar Innovation, Plant and Food Research, Havelock North, 4130 New Zealand
- Francois Laurens
  - Institut National de la Recherche Agronomique, Institut de Recherche en Horticulture et Semences, UMR 1345, 49071 Beaucouzé, France
- Etienne Bucher
  - Institut National de la Recherche Agronomique, Institut de Recherche en Horticulture et Semences, UMR 1345, 49071 Beaucouzé, France
  - Agroscope, 1260 Changins, Switzerland
- Dorrie Main
  - Department of Horticulture, Washington State University, Pullman, WA 99164 USA
- Sook Jung
  - Department of Horticulture, Washington State University, Pullman, WA 99164 USA
- Stijn Vanderzande
  - Department of Horticulture, Washington State University, Pullman, WA 99164 USA
29
|
Wen YJ, Zhang H, Ni YL, Huang B, Zhang J, Feng JY, Wang SB, Dunwell JM, Zhang YM, Wu R. Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief Bioinform 2019; 19:700-712. [PMID: 28158525 PMCID: PMC6054291 DOI: 10.1093/bib/bbw145] [Citation(s) in RCA: 187] [Impact Index Per Article: 37.4] [Received: 10/24/2016] [Indexed: 12/01/2022] Open
Abstract
The mixed linear model has been widely used in genome-wide association studies (GWAS), but its application to multi-locus GWAS analysis has not been explored and assessed. Here, we implemented a fast multi-locus random-SNP-effect EMMA (FASTmrEMMA) model for GWAS. The model is built on random single nucleotide polymorphism (SNP) effects and a new algorithm. This algorithm whitens the covariance matrix of the polygenic matrix K and environmental noise, and specifies the number of nonzero eigenvalues as one. The model first chooses all putative quantitative trait nucleotides (QTNs) with ≤ 0.005 P-values and then includes them in a multi-locus model for true QTN detection. Owing to the multi-locus feature, the Bonferroni correction is replaced by a less stringent selection criterion. Results from analyses of both simulated and real data showed that FASTmrEMMA is more powerful in QTN detection and model fit, has less bias in QTN effect estimation and requires less running time than existing single- and multi-locus methods, such as empirical Bayes, settlement of mixed linear model under progressively exclusive relationship (SUPER), efficient mixed model association (EMMA), compressed MLM (CMLM) and enriched CMLM (ECMLM). FASTmrEMMA provides an alternative for multi-locus GWAS.
Affiliation(s)
- Yang-Jun Wen
  - State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Hanwen Zhang
  - Applied Science, University of British Columbia, Columbia, Canada
- Yuan-Li Ni
  - State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Bo Huang
  - State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Jin Zhang
  - State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Jian-Ying Feng
  - State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Shi-Bo Wang
  - College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
- Jim M Dunwell
  - School of Agriculture, Policy and Development, University of Reading, Berkshire, UK
- Yuan-Ming Zhang
  - State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China; College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
- Rongling Wu
  - Public Health Sciences and Statistics and Center for Statistical Genetics, Pennsylvania State University, Hershey, PA, USA; Center for Computational Biology, Beijing Forestry University, Beijing, China
30
|
Abstract
Genomic prediction has the potential to contribute to precision medicine. However, to date, the utility of such predictors is limited due to low accuracy for most traits. Here theory and simulation study are used to demonstrate that widespread pleiotropy among phenotypes can be utilised to improve genomic risk prediction. We show how a genetic predictor can be created as a weighted index that combines published genome-wide association study (GWAS) summary statistics across many different traits. We apply this framework to predict risk of schizophrenia and bipolar disorder in the Psychiatric Genomics consortium data, finding substantial heterogeneity in prediction accuracy increases across cohorts. For six additional phenotypes in the UK Biobank data, we find increases in prediction accuracy ranging from 0.7% for height to 47% for type 2 diabetes, when using a multi-trait predictor that combines published summary statistics from multiple traits, as compared to a predictor based only on one trait.
31
|
Goddard ME, Kemper KE, MacLeod IM, Chamberlain AJ, Hayes BJ. Genetics of complex traits: prediction of phenotype, identification of causal polymorphisms and genetic architecture. Proc Biol Sci 2017; 283:rspb.2016.0569. [PMID: 27440663 DOI: 10.1098/rspb.2016.0569] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 03/10/2016] [Accepted: 06/23/2016] [Indexed: 01/01/2023] Open
Abstract
Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.
Affiliation(s)
- M E Goddard
- Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Victoria 3010, Australia; Department of Economic Development, Jobs, Transport and Resources, AgriBio, La Trobe University, Bundoora, Victoria 3083, Australia
| | - K E Kemper
- Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Victoria 3010, Australia
| | - I M MacLeod
- Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Victoria 3010, Australia; Department of Economic Development, Jobs, Transport and Resources, AgriBio, La Trobe University, Bundoora, Victoria 3083, Australia; Dairy Futures Cooperative Research Centre, AgriBio, La Trobe University, Bundoora, Victoria 3083, Australia
| | - A J Chamberlain
- Department of Economic Development, Jobs, Transport and Resources, AgriBio, La Trobe University, Bundoora, Victoria 3083, Australia
| | - B J Hayes
- Department of Economic Development, Jobs, Transport and Resources, AgriBio, La Trobe University, Bundoora, Victoria 3083, Australia; School of Applied System Biology, La Trobe University, Agribiosciences Building, Bundoora, Australia
|
32
|
Yoshida GM, Lhorente JP, Carvalheiro R, Yáñez JM. Bayesian genome-wide association analysis for body weight in farmed Atlantic salmon (Salmo salar L.). Anim Genet 2017; 48:698-703. [PMID: 29044715 DOI: 10.1111/age.12621] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Accepted: 09/15/2017] [Indexed: 12/15/2022]
Abstract
We performed a genome-wide association study to detect markers associated with growth traits in Atlantic salmon. The analyzed traits included body weight at tagging (BWT) and body weight at 25 months (BW25M). Genotypes of 4662 animals were imputed from the 50K SNP chip to the 200K SNP chip using fimpute software. The markers were simultaneously modeled using Bayes C to identify genomic regions associated with the traits. We identified windows explaining a maximum of 3.71% and 3.61% of the genetic variance for BWT and BW25M respectively. We found potential candidate genes located within the top ten 1-Mb windows for BWT and BW25M. For instance, the vitronectin (VTN) gene, which has been previously reported to be associated with cell growth, was found within one of the top ten 1-Mb windows for BWT. In addition, the WNT1-inducible-signaling pathway protein 3, melanocortin 2 receptor accessory protein 2, myosin light chain kinase, transforming growth factor beta receptor type 3 and myosin light chain 1 genes, which have been reported to be associated with skeletal growth in humans, growth stimulation during the larval stage in zebrafish, body weight in pigs, feed conversion in chickens and growth rate of sheep skeletal muscle respectively, were found within some of the top ten 1-Mb windows for BW25M. These results indicate that growth traits are most likely controlled by many variants with relatively small effects in Atlantic salmon. The genomic regions associated with the traits studied here may provide further insight into the functional regions underlying growth traits in this species.
Affiliation(s)
- G M Yoshida
- Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Av Santa Rosa 11735, La Pintana, Santiago, 8820808, Chile; Animal Science Department, Faculdade de Ciências Agrárias e Veterinárias (FCAV), Universidade Estadual Paulista "Júlio de Mesquita Filho" (UNESP), Campus Jaboticabal, Via de Acesso Prof. Paulo Donato Castellane, 14884-900, Jaboticabal, Brazil
| | | | - R Carvalheiro
- Animal Science Department, Faculdade de Ciências Agrárias e Veterinárias (FCAV), Universidade Estadual Paulista "Júlio de Mesquita Filho" (UNESP), Campus Jaboticabal, Via de Acesso Prof. Paulo Donato Castellane, 14884-900, Jaboticabal, Brazil
| | - J M Yáñez
- Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Av Santa Rosa 11735, La Pintana, Santiago, 8820808, Chile; Aquainnovo, Cardonal S/N, Puerto Montt, Chile
|
33
|
Resende RT, Resende MDV, Silva FF, Azevedo CF, Takahashi EK, Silva-Junior OB, Grattapaglia D. Regional heritability mapping and genome-wide association identify loci for complex growth, wood and disease resistance traits in Eucalyptus. New Phytologist 2017; 213:1287-1300. [PMID: 28079935 DOI: 10.1111/nph.14266] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 08/15/2016] [Accepted: 09/08/2016] [Indexed: 05/18/2023]
Abstract
Although genome-wide association studies (GWAS) have provided valuable insights into the decoding of the relationships between sequence variation and complex phenotypes, they have explained little heritability. Regional heritability mapping (RHM) provides heritability estimates for genomic segments containing both common and rare allelic effects that individually contribute too little variance to be detected by GWAS. We carried out GWAS and RHM for seven growth, wood and disease resistance traits in a breeding population of 768 Eucalyptus hybrid trees using EuCHIP60K. Total genomic heritabilities accounted for large proportions (64-89%) of pedigree-based trait heritabilities, providing additional evidence that complex traits in eucalypts are controlled by many sequence variants across the frequency spectrum, each with small contributions to the phenotypic variance. RHM detected 26 quantitative trait loci (QTLs) encompassing 2191 single nucleotide polymorphisms (SNPs), whereas GWAS detected 13 single SNP-trait associations. RHM and GWAS QTLs individually explained 5-15% and 4-6% of the genomic heritability, respectively. RHM was superior to GWAS in capturing larger proportions of genomic heritability. Equated to previously mapped QTLs, our results highlighted genomic regions for further examination towards gene discovery. RHM-QTLs bearing a combination of common and rare variants could be useful enhancements to incorporate prior knowledge of the underlying genetic architecture in genomic prediction models.
Affiliation(s)
| | - Marcos Deon Vilela Resende
- Department of Statistics, Universidade Federal de Viçosa, Viçosa, MG, 36570-000, Brazil
- EMBRAPA Forestry Research, Colombo, PR, 83411-000, Brazil
| | - Fabyano Fonseca Silva
- Department of Animal Science, Universidade Federal de Viçosa, Viçosa, MG, 36570-000, Brazil
| | | | | | - Orzenil Bonfim Silva-Junior
- EMBRAPA Genetic Resources and Biotechnology - EPqB, 70770-910, Brasilia, DF, Brazil
- Universidade Católica de Brasília - SGAN, 916 modulo B, Brasilia, DF, 70790-160, Brazil
| | - Dario Grattapaglia
- EMBRAPA Genetic Resources and Biotechnology - EPqB, 70770-910, Brasilia, DF, Brazil
- Universidade Católica de Brasília - SGAN, 916 modulo B, Brasilia, DF, 70790-160, Brazil
|
34
|
|
35
|
Li H, Zhang L, Hu J, Zhang F, Chen B, Xu K, Gao G, Li H, Zhang T, Li Z, Wu X. Genome-Wide Association Mapping Reveals the Genetic Control Underlying Branch Angle in Rapeseed (Brassica napus L.). Frontiers in Plant Science 2017; 8:1054. [PMID: 28674549 PMCID: PMC5474488 DOI: 10.3389/fpls.2017.01054] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 03/20/2017] [Accepted: 05/31/2017] [Indexed: 05/20/2023]
Abstract
Plant architecture is vital not only for crop yield, but also for field management, such as mechanical harvesting. The branch angle is one of the key factors determining plant architecture. With the aim of revealing the genetic control underlying branch angle in rapeseed (Brassica napus L.), the positional variation of branch angles on individual plants was evaluated, and the branch angle increased with the elevation of branch position. Furthermore, three middle branches of individual plants were selected to measure the branch angle because they exhibited the most representative phenotypic values. An association panel of 472 diverse accessions was evaluated for the branch angle trait in six environments and genotyped with a 60K Brassica Infinium® SNP array. As a result of association mapping, 46 and 38 significantly-associated loci were detected using a mixed linear model (MLM) and a multi-locus random-SNP-effect mixed linear model (MRMLM), which explained up to 62.2 and 66.2% of the cumulative phenotypic variation, respectively. Numerous highly-promising candidate genes were identified by annotation against Arabidopsis thaliana homologues, including some first found in rapeseed, such as TAC1, SGR1, SGR3, and SGR5. These findings reveal the genetic control underlying branch angle and provide insight into genetic improvements that are possible in the plant architecture of rapeseed.
Affiliation(s)
- Hongge Li
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan, China
- National Key Lab of Crop Genetic Improvement, National Center of Crop Molecular Breeding, National Center of Oil Crop Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Liping Zhang
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan, China
| | - Jihong Hu
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan, China
| | - Fugui Zhang
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan, China
| | - Biyun Chen
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan, China
| | - Kun Xu
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan, China
| | - Guizhen Gao
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan, China
| | - Hao Li
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan, China
| | - Tianyao Zhang
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan, China
| | - Zaiyun Li
- National Key Lab of Crop Genetic Improvement, National Center of Crop Molecular Breeding, National Center of Oil Crop Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
- *Correspondence: Zaiyun Li
| | - Xiaoming Wu
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan, China
|
36
|
Veerkamp RF, Bouwman AC, Schrooten C, Calus MPL. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle. Genet Sel Evol 2016; 48:95. [PMID: 27905878 PMCID: PMC5134274 DOI: 10.1186/s12711-016-0274-1] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 07/14/2016] [Accepted: 11/24/2016] [Indexed: 11/10/2022]
Abstract
Background: Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data.
Methods: Phenotypes were available for 5503 Holstein–Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants.
Results: The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased.
Conclusions: Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.
Affiliation(s)
- Roel F Veerkamp
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands; Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, P.O. Box 5003, 1432, Ås, Norway
| | - Aniek C Bouwman
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
| | | | - Mario P L Calus
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
|
37
|
Zhang M, Baird PN. A decade of age-related macular degeneration risk models: What have we learned from them and where are we going? Ophthalmic Genet 2016; 38:301-307. [PMID: 27901647 DOI: 10.1080/13816810.2016.1227451] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/15/2022]
Abstract
The genomic revolution has revealed the complexity of multifactorial diseases, making the development of effective diagnostics extremely challenging. In turn, the prospect of precision medicine as applied through targeted therapeutic treatments continues to remain largely elusive. Age-related macular degeneration (AMD) as a complex disease falls under this category, despite it being one of the most well characterized multifactorial diseases. This reflects both the extent of identified genetic components and known environmental risk factors. Additional considerations in dissecting out the roles played by genetic and non-genetic risk factors arise through the rapid increase in prevalence of AMD with age and the varying time periods over which disease progression can occur, complicating efforts to discriminate between "progressors" and non-"progressors." As a consequence, extensive research into the aetiology of AMD is yet to realize a clinically acceptable predictive test. This review covers the current climate of risk models in late AMD but will focus mainly on genetic risk factors as well as the types of models that have currently been employed in the AMD modelling literature.
Affiliation(s)
- Michael Zhang
- Centre for Eye Research Australia, University of Melbourne, East Melbourne, Victoria, Australia
| | - Paul N Baird
- Centre for Eye Research Australia, University of Melbourne, East Melbourne, Victoria, Australia
|
38
|
Urbinati I, Stafuzza NB, Oliveira MT, Chud TCS, Higa RH, Regitano LCDA, de Alencar MM, Buzanskas ME, Munari DP. Selection signatures in Canchim beef cattle. J Anim Sci Biotechnol 2016; 7:29. [PMID: 27158491 PMCID: PMC4858954 DOI: 10.1186/s40104-016-0089-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 11/25/2015] [Accepted: 04/24/2016] [Indexed: 11/10/2022]
Abstract
Background: Recent technological advances in genomics have allowed the genotyping of cattle through single nucleotide polymorphism (SNP) panels. High-density SNP panels possess greater genome coverage and are useful for the identification of conserved regions of the genome due to selection, known as selection signatures (SS). The SS are detectable by different methods, such as the extended haplotype homozygosity (EHH); and the integrated haplotype score (iHS), which is derived from the EHH. The aim of this study was to identify SS regions in Canchim cattle (composite breed), genotyped with high-density SNP panel.
Results: A total of 687,655 SNP markers and 396 samples remained for SS analysis after the genotype quality control. The iHS statistic for each marker was transformed into piHS for better interpretation of the results. Chromosomes BTA5 and BTA14 showed piHS > 5, with 39 and nine statistically significant SNPs (P < 0.00001), respectively. For the candidate selection regions, iHS values were computed across the genome and averaged within non-overlapping windows of 500 Kb. We have identified genes that play an important role in metabolism, melanin biosynthesis (pigmentation), and embryonic and bone development.
Conclusions: The observation of SS indicates that the selection processes performed in Canchim, as well as in the founder breeds (i.e. Charolais), are maintaining specific genomic regions, particularly on BTA5 and BTA14. These selection signatures regions could be associated with Canchim characterization.
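The windowing step mentioned in the abstract above (averaging per-marker scores within non-overlapping 500 Kb windows) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `window_means`, the toy positions and the toy scores are all hypothetical, and the sketch assumes base-pair positions on a single chromosome.

```python
from collections import defaultdict


def window_means(positions, scores, window_bp=500_000):
    """Average per-marker scores within non-overlapping windows.

    positions: base-pair coordinates on one chromosome (assumed input).
    scores: one score per marker, e.g. an iHS-like statistic.
    Returns {window_start_bp: mean_score} for windows that contain markers.
    """
    if len(positions) != len(scores):
        raise ValueError("positions and scores must have the same length")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, score in zip(positions, scores):
        idx = pos // window_bp  # index of the window this marker falls in
        sums[idx] += score
        counts[idx] += 1
    return {idx * window_bp: sums[idx] / counts[idx] for idx in sorted(sums)}


# Hypothetical toy data: four markers spread over three windows.
positions = [100_000, 450_000, 600_000, 1_200_000]
scores = [1.0, 3.0, 2.0, 4.0]
means = window_means(positions, scores)
```

Windows without any marker are simply absent from the result; a real analysis would also need to handle each chromosome separately.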
Affiliation(s)
- Ismael Urbinati
- Departamento de Ciências Exatas, UNESP - Univ Estadual Paulista, Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, São Paulo 14884-900 Brazil
| | - Nedenia Bonvino Stafuzza
- Departamento de Ciências Exatas, UNESP - Univ Estadual Paulista, Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, São Paulo 14884-900 Brazil
| | - Marcos Túlio Oliveira
- Departamento de Tecnologia, UNESP - Univ Estadual Paulista, Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, São Paulo 14884-900 Brazil
| | - Tatiane Cristina Seleguim Chud
- Departamento de Ciências Exatas, UNESP - Univ Estadual Paulista, Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, São Paulo 14884-900 Brazil
| | | | | | | | - Marcos Eli Buzanskas
- Departamento de Ciências Exatas, UNESP - Univ Estadual Paulista, Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, São Paulo 14884-900 Brazil
| | - Danísio Prado Munari
- Departamento de Ciências Exatas, UNESP - Univ Estadual Paulista, Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, São Paulo 14884-900 Brazil
|
39
|
Chen GB, Lee SH, Zhu ZX, Benyamin B, Robinson MR. EigenGWAS: finding loci under selection through genome-wide association studies of eigenvectors in structured populations. Heredity (Edinb) 2016; 117:51-61. [PMID: 27142779 DOI: 10.1038/hdy.2016.25] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 11/17/2015] [Revised: 03/02/2016] [Accepted: 03/07/2016] [Indexed: 12/13/2022]
Abstract
We develop a novel approach to identify regions of the genome underlying population genetic differentiation in any genetic data where the underlying population structure is unknown, or where the interest is assessing divergence along a gradient. By combining the statistical framework for genome-wide association studies (GWASs) with eigenvector decomposition (EigenGWAS), which is commonly used in population genetics to characterize the structure of genetic data, loci under selection can be identified without a requirement for discrete populations. We show through theory and simulation that our approach can identify regions under selection along gradients of ancestry, and in real data we confirm this by demonstrating LCT to be under selection between HapMap CEU-TSI cohorts, and we then validate this selection signal across European countries in the POPRES samples. HERC2 was also found to be differentiated between both the CEU-TSI cohort and within the POPRES sample, reflecting the likely anthropological differences in skin and hair colour between northern and southern European populations. Controlling for population stratification is of great importance in any quantitative genetic study and our approach also provides a simple, fast and accurate way of predicting principal components in independent samples. With ever increasing sample sizes across many fields, this approach is likely to be greatly utilized to gain individual-level eigenvectors avoiding the computational challenges associated with conducting singular value decomposition in large data sets. We have developed freely available software, Genetic Analysis Repository (GEAR), to facilitate the application of the methods.
Affiliation(s)
- G-B Chen
- Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
| | - S H Lee
- Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia; School of Environmental and Rural Science, The University of New England, Armidale, New South Wales, Australia
| | - Z-X Zhu
- SPLUS Game, Guangzhou, Guangdong, China
| | - B Benyamin
- Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
| | - M R Robinson
- Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
|
40
|
Evaluation of random forest regression for prediction of breeding value from genomewide SNPs. J Genet 2016; 94:187-92. [PMID: 26174666 DOI: 10.1007/s12041-015-0501-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
Abstract
Genomic prediction is meant for estimating the breeding value using molecular marker data which has turned out to be a powerful tool for efficient utilization of germplasm resources and rapid improvement of cultivars. Model-based techniques have been widely used for prediction of breeding values of genotypes from genomewide association studies. However, application of the random forest (RF), a model-free ensemble learning method, is not widely used for prediction. In this study, the optimum values of tuning parameters of RF have been identified and applied to predict the breeding value of genotypes based on genomewide single-nucleotide polymorphisms (SNPs), where the number of SNPs (P variables) is much higher than the number of genotypes (n observations) (P » n). Further, a comparison was made with the model-based genomic prediction methods, namely, least absolute shrinkage and selection operator (LASSO), ridge regression (RR) and elastic net (EN) under P » n. It was found that the correlations between the predicted and observed trait response were 0.591, 0.539, 0.431 and 0.587 for RF, LASSO, RR and EN, respectively, which implies superiority of the RF over the model-based techniques in genomic prediction. Hence, we suggest that the RF methodology can be used as an alternative to the model-based techniques for the prediction of breeding value at genome level with higher accuracy.
|
41
|
Ueki M, Tamiya G. Smooth-Threshold Multivariate Genetic Prediction with Unbiased Model Selection. Genet Epidemiol 2016; 40:233-43. [PMID: 26947266 PMCID: PMC5849235 DOI: 10.1002/gepi.21958] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/06/2015] [Revised: 12/07/2015] [Accepted: 12/14/2015] [Indexed: 01/14/2023]
Abstract
We develop a new genetic prediction method, smooth-threshold multivariate genetic prediction, using single nucleotide polymorphisms (SNPs) data in genome-wide association studies (GWASs). Our method consists of two stages. At the first stage, unlike the usual discontinuous SNP screening as used in the gene score method, our method continuously screens SNPs based on the output from standard univariate analysis for marginal association of each SNP. At the second stage, the predictive model is built by a generalized ridge regression simultaneously using the screened SNPs with SNP weight determined by the strength of marginal association. Continuous SNP screening by the smooth thresholding not only makes prediction stable but also leads to a closed form expression of generalized degrees of freedom (GDF). The GDF leads to the Stein's unbiased risk estimation (SURE), which enables data-dependent choice of optimal SNP screening cutoff without using cross-validation. Our method is very rapid because computationally expensive genome-wide scan is required only once in contrast to the penalized regression methods including lasso and elastic net. Simulation studies that mimic real GWAS data with quantitative and binary traits demonstrate that the proposed method outperforms the gene score method and genomic best linear unbiased prediction (GBLUP), and also shows comparable or sometimes improved performance with the lasso and elastic net being known to have good predictive ability but with heavy computational cost. Application to whole-genome sequencing (WGS) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) exhibits that the proposed method shows higher predictive power than the gene score and GBLUP methods.
Affiliation(s)
- Masao Ueki
- Biostatistics Center, Kurume University, Kurume, Fukuoka, Japan
| | - Gen Tamiya
- Tohoku Medical Megabank Organization, Tohoku University, Aoba-Ku, Sendai, Miyagi, Japan
|
42
|
Holland D, Wang Y, Thompson WK, Schork A, Chen CH, Lo MT, Witoelar A, Werge T, O'Donovan M, Andreassen OA, Dale AM. Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics. Front Genet 2016; 7:15. [PMID: 26909100 PMCID: PMC4754432 DOI: 10.3389/fgene.2016.00015] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 12/16/2015] [Accepted: 01/28/2016] [Indexed: 12/19/2022]
Abstract
Genome-wide Association Studies (GWAS) result in millions of summary statistics (“z-scores”) for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for better understanding the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype and predicting the proportion of chip heritability explainable by genome-wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N = 82,315) and putamen volume (N = 12,596), with approximately 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We assess the degree to which effect sizes are over-estimated when based on linear-regression association coefficients. 
We estimate the polygenicity of schizophrenia to be 0.037 and the putamen to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are 10^6 and 10^5. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.
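The idea of modelling z-scores as a mixture of Gaussians can be illustrated with a generic two-component sketch. This is not the four-parameter model of the paper: the mixing weight `pi1`, the standard deviations `sd_null` and `sd_effect`, and the function names are hypothetical choices made only to show how a posterior probability of a non-null effect follows from such a mixture.

```python
import math


def normal_pdf(x, sd):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_non_null(z, pi1, sd_null=1.0, sd_effect=3.0):
    """Posterior probability that a z-score comes from the non-null component.

    Assumed mixture: z ~ (1 - pi1) * N(0, sd_null^2) + pi1 * N(0, sd_effect^2).
    All parameter values here are illustrative, not estimates from the paper.
    """
    null_part = (1.0 - pi1) * normal_pdf(z, sd_null)
    effect_part = pi1 * normal_pdf(z, sd_effect)
    return effect_part / (null_part + effect_part)


# With a hypothetical 5% of SNPs non-null, larger |z| shifts weight
# towards the wider (non-null) component.
p_small = posterior_non_null(0.0, pi1=0.05)
p_large = posterior_non_null(5.0, pi1=0.05)
```

Under these assumptions a z-score near zero is attributed almost entirely to the null component, while a large z-score is attributed mostly to the non-null one.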
Affiliation(s)
- Dominic Holland
- Multimodal Imaging Laboratory, University of California San Diego, La Jolla, CA, USA; Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
| | - Yunpeng Wang
- Multimodal Imaging Laboratory, University of California San Diego, La Jolla, CA, USA; Department of Neurosciences, University of California San Diego, La Jolla, CA, USA; NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
| | - Wesley K Thompson
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
| | - Andrew Schork
- Multimodal Imaging Laboratory, University of California San Diego, La Jolla, CA, USA; Department of Cognitive Sciences, University of California San Diego, La Jolla, CA, USA
| | - Chi-Hua Chen
- Multimodal Imaging Laboratory, University of California San Diego, La Jolla, CA, USA; Department of Radiology, University of California San Diego, La Jolla, CA, USA
| | - Min-Tzu Lo
- Multimodal Imaging Laboratory, University of California San Diego, La Jolla, CA, USA; Department of Radiology, University of California San Diego, La Jolla, CA, USA
| | - Aree Witoelar
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
| | | | | | - Thomas Werge
- Institute of Biological Psychiatry, MHC, Sct. Hans Hospital and University of Copenhagen, Copenhagen, Denmark
| | - Michael O'Donovan
- MRC Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Cardiff University, Cardiff, UK
| | - Ole A Andreassen
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
| | - Anders M Dale
- Multimodal Imaging Laboratory, University of California San Diego, La Jolla, CA, USA; Department of Neurosciences, University of California San Diego, La Jolla, CA, USA; Department of Psychiatry, University of California San Diego, La Jolla, CA, USA; Department of Radiology, University of California San Diego, La Jolla, CA, USA
|
43
|
Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci Rep 2016; 6:19444. [PMID: 26787347 PMCID: PMC4726296 DOI: 10.1038/srep19444] [Citation(s) in RCA: 257] [Impact Index Per Article: 32.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2015] [Accepted: 12/14/2015] [Indexed: 02/05/2023] Open
Abstract
Genome-wide association studies (GWAS) have been widely used in genetic dissection of complex traits. However, common methods are all based on a fixed-SNP-effect mixed linear model (MLM) and single marker analysis, such as efficient mixed model analysis (EMMA). These methods require Bonferroni correction for multiple tests, which often is too conservative when the number of markers is extremely large. To address this concern, we proposed a random-SNP-effect MLM (RMLM) and a multi-locus RMLM (MRMLM) for GWAS. The RMLM simply treats the SNP-effect as random, but it allows a modified Bonferroni correction to be used to calculate the threshold p value for significance tests. The MRMLM is a multi-locus model including markers selected from the RMLM method with a less stringent selection criterion. Due to the multi-locus nature, no multiple test correction is needed. Simulation studies show that the MRMLM is more powerful in QTN detection and more accurate in QTN effect estimation than the RMLM, which in turn is more powerful and accurate than the EMMA. To demonstrate the new methods, we analyzed six flowering-time-related traits in Arabidopsis thaliana and detected more genes than previously reported using the EMMA. Therefore, the MRMLM provides an alternative for multi-locus GWAS.
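To illustrate why the abstract calls Bonferroni correction conservative for very large marker sets, the following minimal sketch computes the standard Bonferroni per-test threshold. It is not the "modified Bonferroni correction" of the RMLM, which the abstract does not specify; the function name and the example marker count are assumptions made for illustration only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Standard Bonferroni per-test p-value threshold (illustrative only).

    alpha   : desired family-wise error rate, e.g. 0.05
    n_tests : number of single-marker tests performed
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Each test must pass alpha / n_tests so that the family-wise
    # error rate stays at or below alpha.
    return alpha / n_tests


# Hypothetical example: one million markers tested one at a time.
threshold = bonferroni_threshold(0.05, 1_000_000)
```

With one million single-marker tests and a family-wise error rate of 0.05, each test has to reach roughly 5e-8, which is why a multi-locus model that avoids per-marker correction can be attractive.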
|
44
|
Thompson WK, Wang Y, Schork AJ, Witoelar A, Zuber V, Xu S, Werge T, Holland D, Andreassen OA, Dale AM. An Empirical Bayes Mixture Model for Effect Size Distributions in Genome-Wide Association Studies. PLoS Genet 2015; 11:e1005717. [PMID: 26714184 PMCID: PMC5456456 DOI: 10.1371/journal.pgen.1005717] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2014] [Accepted: 11/10/2015] [Indexed: 12/01/2022] Open
Abstract
Characterizing the distribution of effects from genome-wide genotyping data is crucial for understanding important aspects of the genetic architecture of complex traits, such as number or proportion of non-null loci, average proportion of phenotypic variance explained per non-null effect, power for discovery, and polygenic risk prediction. To this end, previous work has used effect-size models based on various distributions, including the normal and normal mixture distributions, among others. In this paper we propose a scale mixture of two normals model for effect size distributions of genome-wide association study (GWAS) test statistics. Test statistics corresponding to null associations are modeled as random draws from a normal distribution with zero mean; test statistics corresponding to non-null associations are also modeled as normal with zero mean, but with larger variance. The model is fit via minimizing discrepancies between the parametric mixture model and resampling-based nonparametric estimates of replication effect sizes and variances. We describe in detail the implications of this model for estimation of the non-null proportion, the probability of replication in de novo samples, the local false discovery rate, and power for discovery of a specified proportion of phenotypic variance explained from additive effects of loci surpassing a given significance threshold. We also examine the crucial issue of the impact of linkage disequilibrium (LD) on effect sizes and parameter estimates, both analytically and in simulations. We apply this approach to meta-analysis test statistics from two large GWAS, one for Crohn’s disease (CD) and the other for schizophrenia (SZ). A scale mixture of two normals distribution provides an excellent fit to the SZ nonparametric replication effect size estimates. While capturing the general behavior of the data, this mixture model underestimates the tails of the CD effect size distribution. 
We discuss the implications of pervasive small but replicating effects in CD and SZ on genomic control and power. Finally, we conclude that, despite having very similar estimates of variance explained by genotyped SNPs, CD and SZ have a broadly dissimilar genetic architecture, due to differing mean effect size and proportion of non-null loci. We describe in detail the implications of a particular mixture model (a scale mixture of two normals) for effect size distributions from genome-wide genotyping data. Parameters from this model can be used for estimation of the non-null proportion, the probability of replication in de novo samples, the local false discovery rate, power for detecting non-null loci, and proportion of variance explained from additive effects. Here, we fit this model by minimizing discrepancies with nonparametric estimates from a resampling-based algorithm. We examine the effects of linkage disequilibrium (LD) on effect sizes and parameter estimates, both analytically and in simulations. We validate this approach using meta-analysis test statistics (“z-scores”) from two large GWAS, one for Crohn’s disease and the other for schizophrenia. We demonstrate that for these studies a scale mixture of two normal distributions generally fits empirical replication effect sizes well, providing an excellent fit for the schizophrenia effect sizes but underestimating the tails of the distribution for Crohn’s disease.
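As a rough illustration of the scale mixture of two normals described above, the sketch below computes the local false discovery rate for a single z-score, that is, the posterior probability that it comes from the null component. It assumes both components are zero-mean normals and that the non-null proportion and the two standard deviations are already known. The function name and all parameter values are hypothetical, and the resampling-based fitting procedure of the paper is not reproduced here.

```python
import math


def local_fdr(z: float, pi1: float, sd_null: float = 1.0,
              sd_nonnull: float = 2.0) -> float:
    """Local FDR of a z-score under a two-component zero-mean normal mixture.

    pi1        : assumed proportion of non-null test statistics (0 < pi1 < 1)
    sd_null    : standard deviation of the null component
    sd_nonnull : standard deviation of the (wider) non-null component
    """
    if not 0.0 < pi1 < 1.0:
        raise ValueError("pi1 must lie strictly between 0 and 1")
    if sd_null <= 0.0 or sd_nonnull <= 0.0:
        raise ValueError("standard deviations must be positive")
    # Log of [pi1 * f1(z)] / [(1 - pi1) * f0(z)], computed in log space
    # so that extreme z-scores do not underflow both densities to zero.
    log_odds = (math.log(pi1 / (1.0 - pi1))
                + math.log(sd_null / sd_nonnull)
                + 0.5 * z * z * (1.0 / sd_null ** 2 - 1.0 / sd_nonnull ** 2))
    if log_odds > 700.0:  # exp() would overflow; the null posterior is ~0
        return 0.0
    return 1.0 / (1.0 + math.exp(log_odds))
```

Under these assumptions the local false discovery rate is highest at z = 0 and falls as |z| grows, because the wider non-null component dominates in the tails.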
Affiliation(s)
- Wesley K. Thompson
- Institute of Biological Psychiatry, Mental Health Centre Sct. Hans, Mental Health Services, Copenhagen, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Copenhagen, Denmark
- Department of Psychiatry, University of California, San Diego, La Jolla, California, United States of America
- Yunpeng Wang
- Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- Andrew J. Schork
- Department of Cognitive Science, University of California, San Diego, La Jolla, California, United States of America
- Aree Witoelar
- Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- Verena Zuber
- Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- Shujing Xu
- Department of Psychiatry, University of California, San Diego, La Jolla, California, United States of America
- Thomas Werge
- Institute of Biological Psychiatry, Mental Health Centre Sct. Hans, Mental Health Services, Copenhagen, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Dominic Holland
- Multimodal Imaging Laboratory, University of California at San Diego, La Jolla, California, United States of America
- Ole A. Andreassen
- Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- Anders M. Dale
- Multimodal Imaging Laboratory, University of California at San Diego, La Jolla, California, United States of America
|
45
|
Abstract
Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit.
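The predictive ability defined above (correlation between predicted and observed phenotypes in the validation sets of a k-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes that each prediction was already produced by a model trained without that record's fold, it uses a simple interleaved fold assignment, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def mean_fold_correlation(observed, predicted, k=5):
    """Average validation-set correlation over k interleaved folds.

    Assumes predicted[i] came from a model that did not see the fold
    containing record i during training.
    """
    n = len(observed)
    if len(predicted) != n:
        raise ValueError("observed and predicted must have equal length")
    # Fold i holds records i, i + k, i + 2k, ...
    folds = [range(i, n, k) for i in range(k)]
    per_fold = [
        pearson([observed[j] for j in fold], [predicted[j] for j in fold])
        for fold in folds
    ]
    return sum(per_fold) / k
```

Averaging per-fold correlations is one common convention; pooling all validation predictions before computing a single correlation is another, and the abstract does not say which was used.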
|
46
|
Vilhjálmsson B, Yang J, Finucane H, Gusev A, Lindström S, Ripke S, Genovese G, Loh PR, Bhatia G, Do R, Hayeck T, Won HH, Kathiresan S, Pato M, Pato C, Tamimi R, Stahl E, Zaitlen N, Pasaniuc B, Belbin G, Kenny EE, Schierup MH, De Jager P, Patsopoulos NA, McCarroll S, Daly M, Purcell S, Chasman D, Neale B, Goddard M, Visscher PM, Kraft P, Patterson N, Price AL, Ripke S, Neale B, Corvin A, Walters J, Farh KH, Holmans P, Lee P, Bulik-Sullivan B, Collier D, Huang H, Pers T, Agartz I, Agerbo E, Albus M, Alexander M, Amin F, Bacanu S, Begemann M, Belliveau R, Bene J, Bergen S, Bevilacqua E, Bigdeli T, Black D, Bruggeman R, Buccola N, Buckner R, Byerley W, Cahn W, Cai G, Campion D, Cantor R, Carr V, Carrera N, Catts S, Chambert K, Chan R, Chen R, Chen E, Cheng W, Cheung E, Chong S, Cloninger C, Cohen D, Cohen N, Cormican P, Craddock N, Crowley J, Curtis D, Davidson M, Davis K, Degenhardt F, Del Favero J, DeLisi L, Demontis D, Dikeos D, Dinan T, Djurovic S, Donohoe G, Drapeau E, Duan J, Dudbridge F, Durmishi N, Eichhammer P, Eriksson J, Escott-Price V, Essioux L, Fanous A, Farrell M, Frank J, Franke L, Freedman R, Freimer N, Friedl M, Friedman J, Fromer M, Genovese G, Georgieva L, Gershon E, Giegling I, Giusti-Rodrguez P, Godard S, Goldstein J, Golimbet V, Gopal S, Gratten J, Grove J, de Haan L, Hammer C, Hamshere M, Hansen M, Hansen T, Haroutunian V, Hartmann A, Henskens F, Herms S, Hirschhorn J, Hoffmann P, Hofman A, Hollegaard M, Hougaard D, Ikeda M, Joa I, Julia A, Kahn R, Kalaydjieva L, Karachanak-Yankova S, Karjalainen J, Kavanagh D, Keller M, Kelly B, Kennedy J, Khrunin A, Kim Y, Klovins J, Knowles J, Konte B, Kucinskas V, Kucinskiene Z, Kuzelova-Ptackova H, Kahler A, Laurent C, Keong J, Lee S, Legge S, Lerer B, Li M, Li T, Liang KY, Lieberman J, Limborska S, Loughland C, Lubinski J, Lnnqvist J, Macek M, Magnusson P, Maher B, Maier W, Mallet J, Marsal S, Mattheisen M, Mattingsdal M, McCarley R, McDonald C, McIntosh A, Meier S, Meijer C, Melegh B, Melle I, 
Mesholam-Gately R, Metspalu A, Michie P, Milani L, Milanova V, Mokrab Y, Morris D, Mors O, Mortensen P, Murphy K, Murray R, Myin-Germeys I, Mller-Myhsok B, Nelis M, Nenadic I, Nertney D, Nestadt G, Nicodemus K, Nikitina-Zake L, Nisenbaum L, Nordin A, O’Callaghan E, O’Dushlaine C, O’Neill F, Oh SY, Olincy A, Olsen L, Van Os J, Pantelis C, Papadimitriou G, Papiol S, Parkhomenko E, Pato M, Paunio T, Pejovic-Milovancevic M, Perkins D, Pietilinen O, Pimm J, Pocklington A, Powell J, Price A, Pulver A, Purcell S, Quested D, Rasmussen H, Reichenberg A, Reimers M, Richards A, Roffman J, Roussos P, Ruderfer D, Salomaa V, Sanders A, Schall U, Schubert C, Schulze T, Schwab S, Scolnick E, Scott R, Seidman L, Shi J, Sigurdsson E, Silagadze T, Silverman J, Sim K, Slominsky P, Smoller J, So HC, Spencer C, Stahl E, Stefansson H, Steinberg S, Stogmann E, Straub R, Strengman E, Strohmaier J, Stroup T, Subramaniam M, Suvisaari J, Svrakic D, Szatkiewicz J, Sderman E, Thirumalai S, Toncheva D, Tooney P, Tosato S, Veijola J, Waddington J, Walsh D, Wang D, Wang Q, Webb B, Weiser M, Wildenauer D, Williams N, Williams S, Witt S, Wolen A, Wong E, Wormley B, Wu J, Xi H, Zai C, Zheng X, Zimprich F, Wray N, Stefansson K, Visscher P, Adolfsson R, Andreassen O, Blackwood D, Bramon E, Buxbaum J, Børglum A, Cichon S, Darvasi A, Domenici E, Ehrenreich H, Esko T, Gejman P, Gill M, Gurling H, Hultman C, Iwata N, Jablensky A, Jonsson E, Kendler K, Kirov G, Knight J, Lencz T, Levinson D, Li Q, Liu J, Malhotra A, McCarroll S, McQuillin A, Moran J, Mortensen P, Mowry B, Nthen M, Ophoff R, Owen M, Palotie A, Pato C, Petryshen T, Posthuma D, Rietschel M, Riley B, Rujescu D, Sham P, Sklar P, St. 
Clair D, Weinberger D, Wendland J, Werge T, Daly M, Sullivan P, O’Donovan M, Kraft P, Hunter DJ, Adank M, Ahsan H, Aittomäki K, Baglietto L, Berndt S, Blomquist C, Canzian F, Chang-Claude J, Chanock SJ, Crisponi L, Czene K, Dahmen N, Silva IDS, Easton D, Eliassen AH, Figueroa J, Fletcher O, Garcia-Closas M, Gaudet MM, Gibson L, Haiman CA, Hall P, Hazra A, Hein R, Henderson BE, Hofman A, Hopper JL, Irwanto A, Johansson M, Kaaks R, Kibriya MG, Lichtner P, Lindström S, Liu J, Lund E, Makalic E, Meindl A, Meijers-Heijboer H, Müller-Myhsok B, Muranen TA, Nevanlinna H, Peeters PH, Peto J, Prentice RL, Rahman N, Sánchez MJ, Schmidt DF, Schmutzler RK, Southey MC, Tamimi R, Travis R, Turnbull C, Uitterlinden AG, van der Luijt RB, Waisfisz Q, Wang Z, Whittemore AS, Yang R, Zheng W. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. Am J Hum Genet 2015; 97:576-92. [PMID: 26430803 DOI: 10.1016/j.ajhg.2015.09.001] [Citation(s) in RCA: 794] [Impact Index Per Article: 88.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2015] [Accepted: 09/01/2015] [Indexed: 11/24/2022] Open
Abstract
Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
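The thresholding half of the standard "pruning followed by thresholding" approach mentioned above can be sketched as a weighted sum of allele counts over markers that pass a p value cutoff. This is an illustrative simplification, not LDpred: it assumes LD-based pruning has already been applied to the marker list, and the function name and inputs are hypothetical.

```python
def thresholded_score(allele_counts, effect_sizes, p_values, p_cutoff):
    """Polygenic score for one individual using p-value thresholding.

    allele_counts : effect-allele counts per marker (0, 1 or 2)
    effect_sizes  : per-marker effect estimates from association statistics
    p_values      : per-marker association p values
    p_cutoff      : markers with p value above this cutoff are discarded

    Assumes the markers were already LD-pruned upstream.
    """
    if not (len(allele_counts) == len(effect_sizes) == len(p_values)):
        raise ValueError("all inputs must have the same length")
    # Markers failing the threshold contribute nothing, which is the
    # information loss the abstract refers to.
    return sum(
        count * beta
        for count, beta, p in zip(allele_counts, effect_sizes, p_values)
        if p <= p_cutoff
    )
```

Raising the cutoff admits more markers with noisier effect estimates, while lowering it discards true small effects, which is the trade-off that motivates shrinkage-based alternatives.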
|
47
|
Guo G, Liu H, Wang L, Shen H, Hu W. The Genome-Wide Influence on Human BMI Depends on Physical Activity, Life Course, and Historical Period. Demography 2015; 52:1651-70. [PMID: 26319003 PMCID: PMC6642062 DOI: 10.1007/s13524-015-0421-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
Abstract
In this analysis, guided by an evolutionary framework, we investigate how the human genome as a whole interacts with historical period, age, and physical activity to influence body mass index (BMI). The genomic influence is estimated by (1) heritability or the proportion of variance in BMI explained by genome-wide genotype data, and (2) the random effects or the best linear unbiased predictors (BLUPs) of genome-wide association studies (GWAS) data on BMI. Data were used from the Framingham Heart Study (FHS) in the United States. The study was initiated in 1948, and the obesity data were collected repeatedly over the subsequent decades. The analysis samples are drawn from a pool of >8,000 individuals in the FHS. Hypothesis testing based on the Pitman test, permutation Pitman test, F test, and permutation F test produces three sets of significant findings. First, the genomic influence on BMI is substantially larger after the mid-1980s than in the few decades before the mid-1980s within each age group of 21-40, 41-50, 51-60, and >60. Second, the genomic influence on BMI weakens as one ages across the life course, or the genomic influence on BMI tends to be more important during reproductive ages than after reproductive ages within each of the two historical periods. Third, within the age group of 21-50 and not in the age group of >50, the genomic influence on BMI among physically active individuals is substantially smaller than the influence on those who are not physically active. In summary, this study provides evidence that the influence of the human genome as a whole on obesity depends on historical period, age, and level of physical activity.
Affiliation(s)
- Guang Guo
- Department of Sociology, University of North Carolina, Chapel Hill, NC, 27599, USA.
- Carolina Population Center, University of North Carolina, Chapel Hill, NC, 27599, USA.
- Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC, 27599, USA.
- Hexuan Liu
- Department of Sociology, University of North Carolina, Chapel Hill, NC, 27599, USA.
- Carolina Population Center, University of North Carolina, Chapel Hill, NC, 27599, USA.
- Ling Wang
- Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, 27599, USA.
- Haipeng Shen
- Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, 27599, USA.
- Wen Hu
- Department of Sociology, Nankai University, Tianjin, China.
|
48
|
Grenier C, Cao TV, Ospina Y, Quintero C, Châtel MH, Tohme J, Courtois B, Ahmadi N. Accuracy of Genomic Selection in a Rice Synthetic Population Developed for Recurrent Selection Breeding. PLoS One 2015; 10:e0136594. [PMID: 26313446 PMCID: PMC4551487 DOI: 10.1371/journal.pone.0136594] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2015] [Accepted: 08/05/2015] [Indexed: 01/06/2023] Open
Abstract
Genomic selection (GS) is a promising strategy for enhancing genetic gain. We investigated the accuracy of genomic estimated breeding values (GEBV) in four inter-related synthetic populations that underwent several cycles of recurrent selection in an upland rice-breeding program. A total of 343 S2:4 lines extracted from those populations were phenotyped for flowering time, plant height, grain yield and panicle weight, and genotyped with an average density of one marker per 44.8 kb. The relative effect of the linkage disequilibrium (LD) and minor allele frequency (MAF) thresholds for selecting markers, the relative size of the training population (TP) and of the validation population (VP), the selected trait and the genomic prediction models (frequentist and Bayesian) on the accuracy of GEBVs was investigated in 540 cross validation experiments with 100 replicates. The effect of kinship between the training and validation populations was tested in an additional set of 840 cross validation experiments with a single genomic prediction model. LD was high (average r² = 0.59 at 25 kb) and decreased slowly, distribution of allele frequencies at individual loci was markedly skewed toward unbalanced frequencies (MAF average value 15.2% and median 9.6%), and differentiation between the four synthetic populations was low (FST ≤ 0.06). The accuracy of GEBV across all cross validation experiments ranged from 0.12 to 0.54 with an average of 0.30. Significant differences in accuracy were observed among the different levels of each factor investigated. Phenotypic traits had the biggest effect, and the size of the incidence matrix had the smallest. Significant first degree interaction was observed for GEBV accuracy between traits and all the other factors studied, and between prediction models and LD, MAF and composition of the TP. The potential of GS to accelerate genetic gain and breeding options to increase the accuracy of predictions are discussed.
Affiliation(s)
- Cécile Grenier
- CIAT, A.A. 6713, Cali, Colombia
- CIRAD, UMR AGAP, F-34398, Montpellier, France
|
49
|
Azevedo CF, de Resende MDV, E Silva FF, Viana JMS, Valente MSF, Resende MFR, Muñoz P. Ridge, Lasso and Bayesian additive-dominance genomic models. BMC Genet 2015; 16:105. [PMID: 26303864 PMCID: PMC4549024 DOI: 10.1186/s12863-015-0264-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2015] [Accepted: 08/13/2015] [Indexed: 11/27/2022] Open
Abstract
Background: A complete approach for genome-wide selection (GWS) involves reliable statistical genetics models and methods. Reports on this topic are common for additive genetic models but not for additive-dominance models. The objective of this paper was (i) to compare the performance of 10 additive-dominance predictive models (including current models and proposed modifications), fitted using Bayesian, Lasso and Ridge regression approaches; and (ii) to decompose genomic heritability and accuracy in terms of three quantitative genetic information sources, namely, linkage disequilibrium (LD), co-segregation (CS) and pedigree relationships or family structure (PR). The simulation study considered two broad sense heritability levels (0.30 and 0.50, associated with narrow sense heritabilities of 0.20 and 0.35, respectively) and two genetic architectures for traits (the first consisting of small gene effects and the second consisting of a mixed inheritance model with five major genes). Results: G-REML/G-BLUP and a modified Bayesian/Lasso (called BayesA*B* or t-BLASSO) method performed best in the prediction of genomic breeding as well as the total genotypic values of individuals in all four scenarios (two heritabilities × two genetic architectures). The BayesA*B*-type method showed a better ability to recover the dominance variance/additive variance ratio. Decomposition of genomic heritability and accuracy revealed the following descending importance order of information: LD, CS and PR not captured by markers, the last two being very close. Conclusions: Amongst the 10 models/methods evaluated, the G-BLUP, BAYESA*B* (−2,8) and BAYESA*B* (4,6) methods presented the best results and were found to be adequate for accurately predicting genomic breeding and total genotypic values as well as for estimating additive and dominance in additive-dominance genomic models.
Affiliation(s)
- Marcos Deon Vilela de Resende
- Department of Statistics, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil; Embrapa Forestry, Colombo, Paraná, Brazil.
- Patricio Muñoz
- Agronomy Department, University of Florida, Gainesville, Florida, USA.
|
50
|
Abstract
Since Tysk et al.'s pioneering analysis of the Swedish twin registry, twin and family studies have continued to support a strong genetic basis of the inflammatory bowel diseases. The coefficient of heritability for siblings of inflammatory bowel disease probands is 25 to 42 for Crohn's disease and 4 to 15 for ulcerative colitis. Heritability estimates for Crohn's disease and ulcerative colitis from pooled twin studies are 0.75 and 0.67, respectively. However, this is at odds with the much lower heritability estimates from Genome-Wide Association Studies (GWAS). This "missing heritability" is likely due to shortfalls in both family studies and GWAS. The coefficient of heritability fails to account for familial shared environment. Heritability calculations from twin data are based on Falconer's method, with premises that are increasingly understood to be flawed. GWAS-based heritability estimates may underestimate heritability due to incomplete linkage disequilibrium, and because some single nucleotide polymorphisms (SNPs) do not reach a level of significance to allow detection. SNPs missed by GWAS include common SNPs with low penetrance and rare SNPs with high penetrance. All methods of heritability estimation regard genetic and environmental variance as separate entities, although it is now understood that there is a complex multidirectional interplay between genetic and environmental factors mediated by the microbiota, the epigenome, and the innate and acquired immune systems. Due to the limitations of heritability estimates, it is unlikely that a true value for heritability will be reached. Further work aimed at quantifying the variance explained across GWAS, epigenome-wide, and microbiota-wide association studies will help to define factors leading to inflammatory bowel disease.
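For readers unfamiliar with Falconer's method mentioned above, the sketch below applies the standard Falconer formulas to twin correlations: heritability is twice the difference between the monozygotic and dizygotic correlations, and the shared-environment component is twice the dizygotic correlation minus the monozygotic one. The function names and the example correlations are hypothetical and are not taken from the studies reviewed here.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer heritability estimate: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


def falconer_c2(r_mz: float, r_dz: float) -> float:
    """Falconer shared-environment estimate: c^2 = 2 * r_DZ - r_MZ."""
    return 2.0 * r_dz - r_mz


# Hypothetical twin correlations, chosen only to show the arithmetic.
h2 = falconer_h2(r_mz=0.8, r_dz=0.5)
c2 = falconer_c2(r_mz=0.8, r_dz=0.5)
```

The formulas rest on premises such as equal shared environments for both twin types and purely additive genetic effects, which are among the assumptions the abstract describes as increasingly questioned.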
|