1
Feng X, Zan Y, Li T, Yao Y, Ning Z, Li J, Charati H, Xu W, Wan Q, Zeng D, Zeng Z, Liu Y, Shen X. Dual-trait genomic analysis in highly stratified Arabidopsis thaliana populations using genome-wide association summary statistics. Heredity (Edinb) 2024; 133:11-20. [PMID: 38822132 PMCID: PMC11222461 DOI: 10.1038/s41437-024-00688-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 05/07/2024] [Indexed: 06/02/2024] Open
Abstract
Genome-wide association study (GWAS) is a powerful tool to identify genomic loci underlying complex traits. However, the application in natural populations comes with challenges, especially power loss due to population stratification. Here, we introduce a bivariate analysis approach to a GWAS dataset of Arabidopsis thaliana. We demonstrate the efficiency of dual-phenotype analysis to uncover hidden genetic loci masked by population structure via a series of simulations. In real data analysis, a common allele, strongly confounded with population structure, is discovered to be associated with late flowering and slow maturation of the plant. The discovered genetic effect on flowering time is further replicated in independent datasets. Using Mendelian randomization analysis based on summary statistics from our GWAS and expression QTL scans, we predicted and replicated a candidate gene AT1G11560 that potentially causes this association. Further analysis indicates that this locus is co-selected with flowering-time-related genes. The discovered pleiotropic genotype-phenotype map provides new insights into understanding the genetic correlation of complex traits.
Affiliation(s)
- Xiao Feng
  - Center for Intelligent Medicine Research, Greater Bay Area Institute of Precision Medicine (Guangzhou), State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Guangzhou, China
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Yanjun Zan
  - Key Laboratory of Tobacco Improvement and Biotechnology, Tobacco Research Institute, Chinese Academy of Agricultural Sciences, Qingdao, China
- Ting Li
  - Center for Intelligent Medicine Research, Greater Bay Area Institute of Precision Medicine (Guangzhou), State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Guangzhou, China
- Yue Yao
  - Center for Intelligent Medicine Research, Greater Bay Area Institute of Precision Medicine (Guangzhou), State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Guangzhou, China
- Zheng Ning
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Jiabei Li
  - Center for Intelligent Medicine Research, Greater Bay Area Institute of Precision Medicine (Guangzhou), State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Guangzhou, China
- Hadi Charati
  - Center for Intelligent Medicine Research, Greater Bay Area Institute of Precision Medicine (Guangzhou), State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Guangzhou, China
- Weilin Xu
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Qianhui Wan
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  - Department of Mathematics, University of California, Davis, CA, USA
- Dongyu Zeng
  - State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, Shenzhen, China
- Ziyi Zeng
  - School of Engineering, Sun Yat-sen University, Guangzhou, China
- Yang Liu
  - State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, Shenzhen, China
- Xia Shen
  - Center for Intelligent Medicine Research, Greater Bay Area Institute of Precision Medicine (Guangzhou), State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Guangzhou, China
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  - Center for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, Scotland, UK
2
Rennberger G, Branham SE, Wechter WP. Genome-Wide Association Study of Resistance to Pseudomonas syringae in the USDA Collection of Citrullus amarus. Plant Disease 2023; 107:3464-3474. [PMID: 37129351 DOI: 10.1094/pdis-04-23-0795-re] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Pseudomonas leaf spot (PLS), caused by Pseudomonas syringae pv. syringae, is an emerging disease of watermelon in the United States with the potential to severely reduce yield under humid conditions. The genetic basis of resistance to this disease is not known and no resistant germplasm is available. Because Citrullus amarus is an important reservoir of resistance genes for the cultivated watermelon, C. lanatus, we screened the United States Department of Agriculture plant introduction collection of C. amarus for resistance to PLS. Accessions (n = 117) were phenotyped for their level of resistance to PLS in two separate tests. Accession means of percent leaf area affected ranged from 1.5 to 99.4%. The broad-sense heritability for the trait was 0.51. Whole-genome resequencing generated 2,126,759 single-nucleotide polymorphisms (SNPs), which were used to perform a genome-wide association study (GWAS) aimed at discovering molecular markers for resistance. Three different models (BLINK, FarmCPU, and MLM) were included in the GWAS analyses. BLINK and FarmCPU, which are multilocus models, found eight SNPs, located on chromosomes Ca01, Ca05, Ca06, Ca08, and Ca10, that were significantly associated with resistance to PLS. Two of these SNPs were found by both BLINK and FarmCPU. The MLM model did not detect any significant associations. BLINK and FarmCPU estimated an explained phenotypic variance of 43.6 and 28.5%, respectively, for SNP S6_19327000 and 25.0 and 26.0%, respectively, for SNP S1_33362258, the two most significant SNPs found. In total, 43 candidate genes with known involvement in disease resistance were discovered within the genomic intervals of seven of the eight peak SNPs. Eleven of the candidate genes that were found have been reported to be involved in resistance to P. syringae in other plant species. Two significant SNPs were within resistance genes previously documented to play important roles in plant resistance specific to P. syringae in other pathosystems. The SNPs identified in this study will be instrumental in finding causal genes involved in PLS resistance in watermelon and developing resistant germplasm through breeding.
Affiliation(s)
- Gabriel Rennberger
  - United States Department of Agriculture-Agricultural Research Service (USDA-ARS), U.S. Vegetable Laboratory, Charleston, SC 29414
- Sandra E Branham
  - Clemson University, Department of Plant and Environmental Sciences, Coastal Research and Education Center, Charleston, SC 29414
- William P Wechter
  - United States Department of Agriculture-Agricultural Research Service (USDA-ARS), U.S. Vegetable Laboratory, Charleston, SC 29414
3
Jarquin D, Roy A, Clarke B, Ghosal S. Combining phenotypic and genomic data to improve prediction of binary traits. J Appl Stat 2023; 51:1497-1523. [PMID: 38863802 PMCID: PMC11164039 DOI: 10.1080/02664763.2023.2208773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Accepted: 04/22/2023] [Indexed: 06/13/2024]
Abstract
Plant breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here 'main traits') of these cultivars are categorical and difficult to measure directly. It is important to predict the main trait of newly developed genotypes accurately. In addition to marker data, breeding programs often have information on secondary traits (or 'phenotypes') that are easy to measure. Our goal is to improve prediction of main traits with interpretable relations by combining the two data types using variable selection techniques. However, the genomic characteristics can overwhelm the set of secondary traits, so a standard technique may fail to select any phenotypic variables. We develop a new statistical technique that ensures appropriate representation from both the secondary traits and the genotypic variables for optimal prediction. When two data types (markers and secondary traits) are available, we achieve improved prediction of a binary trait by two steps that are designed to ensure that a significant intrinsic effect of a phenotype is incorporated in the relation before accounting for extra effects of genotypes. First, we sparsely regress the secondary traits on the markers and replace the secondary traits by their residuals to obtain the effects of phenotypic variables as adjusted by the genotypic variables. Then, we develop a sparse logistic classifier using the markers and residuals so that the adjusted phenotypes may be selected first to avoid being overwhelmed by the genotypic variables due to their numerical advantage. This classifier uses forward selection aided by a penalty term and can be computed effectively by a technique called the one-pass method. It compares favorably with other classifiers on simulated and real data.
Affiliation(s)
- D. Jarquin
  - Agronomy, University of Florida, Gainesville, FL, USA
- A. Roy
  - Biostatistics Department, University of Florida, Gainesville, FL, USA
- B. Clarke
  - Statistics, University of Nebraska-Lincoln, Lincoln, NE, USA
- S. Ghosal
  - Statistics, North Carolina State University, Raleigh, NC, USA
4
Boatwright JL, Sapkota S, Kresovich S. Functional genomic effects of indels using Bayesian genome-phenome wide association studies in sorghum. Front Genet 2023; 14:1143395. [PMID: 37065477 PMCID: PMC10102435 DOI: 10.3389/fgene.2023.1143395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 03/20/2023] [Indexed: 04/03/2023] Open
Abstract
High-throughput genomic and phenomic data have enhanced the ability to detect genotype-to-phenotype associations that can resolve broad pleiotropic effects of mutations on plant phenotypes. As the scale of genotyping and phenotyping has advanced, rigorous methodologies have been developed to accommodate larger datasets and maintain statistical precision. However, determining the functional effects of associated genes/loci is expensive and limited due to the complexity associated with cloning and subsequent characterization. Here, we utilized phenomic imputation of a multi-year, multi-environment dataset using PHENIX, which imputes missing data using kinship and correlated traits, and we screened insertions and deletions (InDels) from the recently whole-genome sequenced Sorghum Association Panel for putative loss-of-function effects. Candidate loci from genome-wide association results were screened for potential loss of function using a Bayesian Genome-Phenome Wide Association Study (BGPWAS) model across both functionally characterized and uncharacterized loci. Our approach is designed to facilitate in silico validation of associations beyond traditional candidate gene and literature-search approaches, to facilitate the identification of putative variants for functional analysis, and to reduce the incidence of false-positive candidates in current functional validation methods. Using this Bayesian GPWAS model, we identified associations for previously characterized genes with known loss-of-function alleles, specific genes falling within known quantitative trait loci, and genes without any previous genome-wide associations while additionally detecting putative pleiotropic effects. In particular, we were able to identify the major tannin haplotypes at the Tan1 locus and the effects of InDels on protein folding. Depending on the haplotype present, heterodimer formation with Tan2 was significantly affected. We also identified major effect InDels in Dw2 and Ma1, where proteins were truncated due to frameshift mutations that resulted in early stop codons. These truncated proteins also lost most of their functional domains, suggesting that these InDels likely result in loss of function. Here, we show that the Bayesian GPWAS model is able to identify loss-of-function alleles that can have significant effects upon protein structure and folding as well as multimer formation. Our approach to characterize loss-of-function mutations and their functional repercussions will facilitate precision genomics and breeding by identifying key targets for gene editing and trait integration.
Affiliation(s)
- J. Lucas Boatwright
  - Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, United States
  - Advanced Plant Technology, Clemson University, Clemson, SC, United States
  - *Correspondence: J. Lucas Boatwright,
- Sirjan Sapkota
  - Advanced Plant Technology, Clemson University, Clemson, SC, United States
- Stephen Kresovich
  - Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, United States
  - Advanced Plant Technology, Clemson University, Clemson, SC, United States
  - Feed the Future Innovation Lab for Crop Improvement, Cornell University, Ithaca, NY, United States
5
Saldivar EV, Ding Y, Poretsky E, Bird S, Block AK, Huffaker A, Schmelz EA. Maize Terpene Synthase 8 (ZmTPS8) Contributes to a Complex Blend of Fungal-Elicited Antibiotics. Plants (Basel, Switzerland) 2023; 12:1111. [PMID: 36903970 PMCID: PMC10005556 DOI: 10.3390/plants12051111] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/10/2023] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 06/18/2023]
Abstract
In maize (Zea mays), fungal-elicited immune responses include the accumulation of terpene synthase (TPS) and cytochrome P450 monooxygenases (CYP) enzymes resulting in complex antibiotic arrays of sesquiterpenoids and diterpenoids, including α/β-selinene derivatives, zealexins, kauralexins and dolabralexins. To uncover additional antibiotic families, we conducted metabolic profiling of elicited stem tissues in mapping populations, which included B73 × M162W recombinant inbred lines and the Goodman diversity panel. Five candidate sesquiterpenoids associated with a chromosome 1 locus spanning the location of ZmTPS27 and ZmTPS8. Heterologous enzyme co-expression studies of ZmTPS27 in Nicotiana benthamiana resulted in geraniol production while ZmTPS8 yielded α-copaene, δ-cadinene and sesquiterpene alcohols consistent with epi-cubebol, cubebol, copan-3-ol and copaborneol matching the association mapping efforts. ZmTPS8 is an established multiproduct α-copaene synthase; however, ZmTPS8-derived sesquiterpene alcohols are rarely encountered in maize tissues. A genome wide association study further linked an unknown sesquiterpene acid to ZmTPS8 and combined ZmTPS8-ZmCYP71Z19 heterologous enzyme co-expression studies yielded the same product. To consider defensive roles for ZmTPS8, in vitro bioassays with cubebol demonstrated significant antifungal activity against both Fusarium graminearum and Aspergillus parasiticus. As a genetically variable biochemical trait, ZmTPS8 contributes to the cocktail of terpenoid antibiotics present following complex interactions between wounding and fungal elicitation.
Affiliation(s)
- Evan V. Saldivar
  - Department of Cell and Developmental Biology, University of California at San Diego, San Diego, CA 92093, USA
  - Department of Plant Biology, Carnegie Institution for Science, Stanford University, Palo Alto, CA 94305, USA
- Yezhang Ding
  - Department of Cell and Developmental Biology, University of California at San Diego, San Diego, CA 92093, USA
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Elly Poretsky
  - Department of Cell and Developmental Biology, University of California at San Diego, San Diego, CA 92093, USA
- Skylar Bird
  - Department of Cell and Developmental Biology, University of California at San Diego, San Diego, CA 92093, USA
- Anna K. Block
  - Chemistry Research Unit, U.S. Department of Agriculture-Agricultural Research Service, Center for Medical, Agricultural and Veterinary Entomology, Gainesville, FL 32608, USA
- Alisa Huffaker
  - Department of Cell and Developmental Biology, University of California at San Diego, San Diego, CA 92093, USA
- Eric A. Schmelz
  - Department of Cell and Developmental Biology, University of California at San Diego, San Diego, CA 92093, USA
6
Exome-wide variation in a diverse barley panel reveals genetic associations with ten agronomic traits in Eastern landraces. J Genet Genomics 2022; 50:241-252. [PMID: 36566016 DOI: 10.1016/j.jgg.2022.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Revised: 12/08/2022] [Accepted: 12/09/2022] [Indexed: 12/24/2022]
Abstract
Barley (Hordeum vulgare ssp. vulgare) was one of the first crops to be domesticated and is adapted to a wide range of environments. Worldwide barley germplasm collections possess valuable allelic variations that could further improve barley productivity. Although barley genomics has offered a global picture of allelic variation among varieties and its association with various agronomic traits, polymorphisms from East Asian varieties remain scarce. In this study, we analyzed exome polymorphisms in a panel of 274 barley varieties collected worldwide, including 137 varieties from East Asian countries and Ethiopia. We revealed the underlying population structure and conducted genome-wide association studies for ten agronomic traits. Moreover, we examined genome-wide associations for traits related to grain size such as awn length and glume length. Our results demonstrate the value of diverse barley germplasm panels containing Eastern varieties, highlighting their distinct genomic signatures relative to Western subpopulations.
7
Bandyopadhyay T, Swarbreck SM, Jaiswal V, Maurya J, Gupta R, Bentley AR, Griffiths H, Prasad M. GWAS identifies genetic loci underlying nitrogen responsiveness in the climate resilient C4 model Setaria italica (L.). J Adv Res 2022; 42:249-261. [PMID: 36513416 PMCID: PMC9788950 DOI: 10.1016/j.jare.2022.01.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/08/2021] [Revised: 01/21/2022] [Accepted: 01/23/2022] [Indexed: 12/27/2022] Open
Abstract
INTRODUCTION: N responsiveness is the capacity to perceive external and internal nitrogen (N) and to induce morpho-physiological adaptation to it. Crop productivity is propelled by N fertilizer and requires the breeding/selection of cultivars with intrinsically high N responsiveness. Compared with processes regulating N use efficiency (NUE), this trait has several advantages: it is more meaningful in commercial and environmental contexts, it facilitates in-season N management, and it is not inversely correlated with N availability. The current lack of understanding of its physio-genetic basis impedes selection of cultivars with a predictably high N response.
OBJECTIVES: To dissect the physio-genetic basis of N responsiveness in a diverse population of 142 foxtail millet (Setaria italica (L.)) accessions by employing contrasting N fertilizer regimes.
METHODS: We phenotyped S. italica accessions for major yield-related traits under low (N10, N25) and optimal (N100) growth conditions and genotyped them, then performed a genome-wide association study to identify genetic loci associated with the N responsiveness trait. Groups of accessions showing contrasting trait performance and allelic forms of specific linked genetic loci (showing haplotypes) were further assessed for N-dependent transcript abundances of their proximal genes.
RESULTS: Our study shows that the N-dependent yield rise in S. italica is driven by grain number, whose responsiveness to N availability is genetically underpinned. We identify 22 unique SNP loci strongly associated with this trait, six of which exhibit haplotypes and consistent allelic variation between lines with contrasting N-dependent grain number response and panicle architectures. Furthermore, differential transcript abundances of specific genes proximally linked to these SNPs in the same lines indicate that their N dependence is genotype specific.
CONCLUSION: The study demonstrates the value and potential of N responsiveness as a selection trait and identifies key genetic components underlying the trait in S. italica. This has major implications for improving crop N sustainability and food security.
Affiliation(s)
- Stéphanie M Swarbreck
  - The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Rd, Cambridge CB3 0LE, United Kingdom
  - Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, United Kingdom
- Vandana Jaiswal
  - CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh 176061, India
- Jyoti Maurya
  - National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi 110067, India
- Rajeev Gupta
  - Cereal Crops Research Unit, US Department of Agriculture (USDA) Agricultural Research Service (ARS), Fargo, ND, United States
  - International Crop Research Institute for the Semi-arid Tropics, Patancheru, Hyderabad, Telangana 502324, India
- Alison R. Bentley
  - The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Rd, Cambridge CB3 0LE, United Kingdom
  - International Maize and Wheat Improvement Center, Texcoco, México
- Howard Griffiths
  - Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, United Kingdom
- Manoj Prasad
  - National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi 110067, India
  - Corresponding author at: National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi 110067, India
8
Abstract
The same gene is often regulated differently in response to stress in even closely related plant species. Directly measuring stress-responsive gene expression can be financially and logistically challenging in nonmodel species. Here, we show that models trained using data on which genes respond to cold in one species can predict which genes will respond to cold in related species, even when the training and target species vary in their degree of tolerance to cold. The prediction models we used require only genomic sequence and gene models. As a result, data from well-studied model species may be used to predict which genes will respond to stress in less-studied species with sequenced genomes.
Although genome-sequence assemblies are available for a growing number of plant species, gene-expression responses to stimuli have been cataloged for only a subset of these species. Many genes show altered transcription patterns in response to abiotic stresses. However, orthologous genes in related species often exhibit different responses to a given stress. Accordingly, data on the regulation of gene expression in one species are not reliable predictors of orthologous gene responses in a related species. Here, we trained a supervised classification model to identify genes that transcriptionally respond to cold stress. A model trained with only features calculated directly from genome assemblies exhibited only modest decreases in performance relative to models trained by using genomic, chromatin, and evolution/diversity features. Models trained with data from one species successfully predicted which genes would respond to cold stress in other related species. Cross-species predictions remained accurate when training was performed in cold-sensitive species and predictions were performed in cold-tolerant species and vice versa. Models trained with data on gene expression in multiple species provided at least equivalent performance to models trained and tested in a single species and outperformed single-species models in cross-species prediction. These results suggest that classifiers trained on stress data from well-studied species may suffice for predicting gene-expression patterns in related, less-studied species with sequenced genomes.
9
Liang Y, Liu HJ, Yan J, Tian F. Natural Variation in Crops: Realized Understanding, Continuing Promise. Annual Review of Plant Biology 2021; 72:357-385. [PMID: 33481630 DOI: 10.1146/annurev-arplant-080720-090632] [Citation(s) in RCA: 68] [Impact Index Per Article: 22.7] [Indexed: 05/05/2023]
Abstract
Crops feed the world's population and shape human civilization. The improvement of crop productivity has been ongoing for almost 10,000 years and has evolved from an experience-based to a knowledge-driven practice over the past three decades. Natural alleles and their reshuffling are long-standing genetic changes that affect how crops respond to various environmental conditions and agricultural practices. Decoding the genetic basis of natural variation is central to understanding crop evolution and, in turn, improving crop breeding. Here, we review current advances in the approaches used to map the causal alleles of natural variation, provide refined insights into the genetics and evolution of natural variation, and outline how this knowledge promises to drive the development of sustainable agriculture under the dome of emerging technologies.
Affiliation(s)
- Yameng Liang
  - State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Biology and Genetic Improvement of Maize (MOA), Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Hai-Jun Liu
  - Gregor Mendel Institute, Austrian Academy of Sciences, Vienna BioCenter, 1030 Vienna, Austria
- Jianbing Yan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Feng Tian
  - State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Biology and Genetic Improvement of Maize (MOA), Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
10
Bi W, Lee S. Scalable and Robust Regression Methods for Phenome-Wide Association Analysis on Large-Scale Biobank Data. Front Genet 2021; 12:682638. [PMID: 34211504 PMCID: PMC8239389 DOI: 10.3389/fgene.2021.682638] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/18/2021] [Accepted: 05/17/2021] [Indexed: 02/05/2023] Open
Abstract
With the advances in genotyping technologies and electronic health records (EHRs), large biobanks have been great resources to identify novel genetic associations and gene-environment interactions on a genome-wide and even a phenome-wide scale. To date, several phenome-wide association studies (PheWAS) have been performed on biobank data, which provides comprehensive insights into many aspects of human genetics and biology. Although inspiring, PheWAS on large-scale biobank data encounters new challenges including computational burden, unbalanced phenotypic distribution, and genetic relationship. In this paper, we first discuss these new challenges and their potential impact on data analysis. Then, we summarize approaches that are scalable and robust in GWAS and PheWAS. This review can serve as a practical guide for geneticists, epidemiologists, and other medical researchers to identify genetic variations associated with health-related phenotypes in large-scale biobank data analysis. Meanwhile, it can also help statisticians to gain a comprehensive and up-to-date understanding of the current technical tool development.
Affiliation(s)
- Wenjian Bi
  - Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States
- Seunggeun Lee
  - Graduate School of Data Science, Seoul National University, Seoul, South Korea
11
Zhong H, Liu S, Meng X, Sun T, Deng Y, Kong W, Peng Z, Li Y. Uncovering the genetic mechanisms regulating panicle architecture in rice with GPWAS and GWAS. BMC Genomics 2021; 22:86. [PMID: 33509071 PMCID: PMC7842007 DOI: 10.1186/s12864-021-07391-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/21/2020] [Accepted: 01/13/2021] [Indexed: 02/08/2023] Open
Abstract
Background: The number of panicles per plant, number of grains per panicle, and 1000-grain weight are important factors contributing to the grain yield per plant in rice. The Rice Diversity Panel 1 (RDP1) contains a total of 421 purified, homozygous rice accessions representing diverse genetic variations within O. sativa. The release of the High-Density Rice Array (HDRA, 700 k SNPs) dataset provides a new opportunity to discover the genetic variants of panicle architectures in rice.
Results: In this report, a new method, the genome-phenome wide association study (GPWAS), was applied to 391 individuals and 27 traits derived from RDP1 to scan the relationships between genes and multiple traits. A total of 1985 gene models were linked to phenomic variation with a p-value cutoff of 4.49E-18. In addition, 406 accessions derived from RDP1 with 411,066 SNPs were used to identify QTLs associated with the total spikelet number per panicle (TSNP), grain number per panicle (GNP), empty grain number per panicle (EGNP), primary branch number (PBN), panicle length (PL), and panicle number per plant (PN) by GLM, MLM, FarmCPU, and BLINK models for genome-wide association study (GWAS) analyses. A total of 18, 21, 18, 17, 15, and 17 QTLs were identified as tightly linked with TSNP, GNP, EGNP, PBN, PL, and PN, respectively. Then, a total of 23 candidate genes were mapped simultaneously using both GWAS and GPWAS methods, comprising 6, 4, 5, 4, and 4 for TSNP, GNP, EGNP, PBN, and PL, respectively. Notably, one overlapping gene (Os01g0140100) was further investigated based on its haplotypes and gene expression profile, indicating that this gene might regulate TSNP or panicle architecture in rice.
Conclusions: Nearly 30% (30/106) of the QTLs co-located with previously published genes or QTLs, indicating the power of GWAS. In addition, GPWAS is a new method to discover the relationships between genes and traits, especially pleiotropic genes. By comparing the results from GWAS and GPWAS, we identified 23 candidate genes related to panicle architectures in rice. This comprehensive study provides new insights into the genetic basis controlling panicle architectures in rice, laying a foundation for rice improvement.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07391-x.
Affiliation(s)
- Hua Zhong
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan, China
- Shuai Liu
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Starkville, MS, 39762, USA
- Xiaoxi Meng
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Starkville, MS, 39762, USA
- Tong Sun
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan, China
- Yujuan Deng
- Department of Computer Science and Engineering, Experimental Teaching Center, Shijiazhuang University, Shijiazhuang, Hebei, China
- Weilong Kong
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan, China
- Zhaohua Peng
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Starkville, MS, 39762, USA
- Yangsheng Li
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan, China
|
12
|
Liu S, Zhong H, Meng X, Sun T, Li Y, Pinson SRM, Chang SKC, Peng Z. Genome-wide association studies of ionomic and agronomic traits in USDA mini core collection of rice and comparative analyses of different mapping methods. BMC Plant Biology 2020; 20:441. [PMID: 32972357 PMCID: PMC7513512 DOI: 10.1186/s12870-020-02603-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/16/2020] [Accepted: 08/16/2020] [Indexed: 05/08/2023]
Abstract
BACKGROUND: Rice is an important human staple food that is vulnerable to heavy metal contamination, which raises serious concerns. High yield with low heavy metal contamination is a common but highly challenging goal for rice breeders worldwide because of a lack of genetic knowledge and markers. RESULTS: To identify candidate QTLs and develop molecular markers for rice yield and heavy metal content, a total of 191 accessions from the USDA Rice mini-core collection with over 3.2 million SNPs were used. Sixteen ionomic and thirteen agronomic traits were analyzed using two univariate (GLM and MLM) and two multivariate (MLMM and FarmCPU) GWAS methods. In total, 106, 47, and 97 QTLs were identified for flooded ionomic, unflooded ionomic, and agronomic traits, respectively, at the criterion of p-value < 1.53 × 10⁻⁸, which was determined by Bonferroni correction at a p-value of 0.05. While 49 (~20%) of the 250 QTLs coincided with previously reported QTLs/genes, the remaining 201 (~80%) were new. In addition, several new candidate genes involved in the control of ionomic and agronomic traits were identified by analyzing the DNA sequence, gene expression, and homologs of the QTL regions. Our results further showed that each of the four GWAS methods can identify unique as well as common QTLs, suggesting that multiple GWAS methods can complement each other in QTL identification, especially when univariate and multivariate methods are combined. CONCLUSIONS: While 49 previously reported QTLs/genes were rediscovered, over 200 new QTLs for ionomic and agronomic traits were found in the rice genome. Moreover, multiple new candidate genes for agronomic and ionomic traits were identified. This research provides novel insights into the genetic basis of both ionomic and agronomic variation in rice, establishing a foundation for marker development in breeding and for further investigation into reducing heavy-metal contamination and improving crop yields. Finally, the comparative analysis of the GWAS methods showed that each method has unique features and that different methods can complement each other.
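The significance threshold quoted in this abstract follows from simple Bonferroni arithmetic: the family-wise level (0.05) divided by the number of tests. The sketch below is only an illustration of that arithmetic, not the authors' pipeline. The abstract gives the marker count only as "over 3.2 million", so the value 3,200,000 is an assumption, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff under Bonferroni correction: alpha / n_tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def implied_n_tests(alpha: float, threshold: float) -> float:
    """Number of tests implied by a reported Bonferroni cutoff: alpha / threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


# Assumption: exactly 3,200,000 SNPs (the abstract says "over 3.2 million").
threshold = bonferroni_threshold(0.05, 3_200_000)

# Working backwards from the reported cutoff of 1.53e-8 gives the
# approximate number of markers the authors would have tested.
implied = implied_n_tests(0.05, 1.53e-8)

print(f"cutoff for 3.2 million tests: {threshold:.4e}")
print(f"tests implied by 1.53e-8:     {implied:,.0f}")
```

Under this assumption the cutoff comes out slightly above the reported 1.53 × 10⁻⁸, which is consistent with the actual marker count being somewhat larger than 3.2 million.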
Affiliation(s)
- Shuai Liu
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Starkville, MS, 39762, USA
- Hua Zhong
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan, China
- Xiaoxi Meng
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Starkville, MS, 39762, USA
- Tong Sun
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan, China
- Yangsheng Li
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan, China
- Shannon R M Pinson
- Dale Bumpers National Rice Research Center, USDA ARS, Stuttgart, AR, 72160, USA
- Sam K C Chang
- Experimental Seafood Processing Laboratory, Coastal and Research Extension Center, Mississippi State University, Pascagoula, MS, 39567, USA
- Zhaohua Peng
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Starkville, MS, 39762, USA
|