1
|
Martins FB, Aono AH, Moraes ADCL, Ferreira RCU, Vilela MDM, Pessoa-Filho M, Rodrigues-Motta M, Simeão RM, de Souza AP. Genome-wide family prediction unveils molecular mechanisms underlying the regulation of agronomic traits in Urochloa ruziziensis. FRONTIERS IN PLANT SCIENCE 2023; 14:1303417. [PMID: 38148869 PMCID: PMC10749977 DOI: 10.3389/fpls.2023.1303417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/28/2023] [Accepted: 11/15/2023] [Indexed: 12/28/2023]
Abstract
Tropical forage grasses, particularly those belonging to the Urochloa genus, play a crucial role in cattle production and serve as the main food source for animals in tropical and subtropical regions. The majority of these species are apomictic and tetraploid, highlighting the significance of U. ruziziensis, a sexual diploid species that can be tetraploidized for use in interspecific crosses with apomictic species. As a means to support breeding programs, our study investigates the feasibility of genome-wide family prediction in U. ruziziensis families to predict agronomic traits. Fifty half-sibling families were assessed for green matter yield, dry matter yield, regrowth capacity, leaf dry matter, and stem dry matter across different clippings established in contrasting seasons with varying available water capacity. Genotyping was performed using a genotyping-by-sequencing approach based on DNA samples from family pools. In addition to conventional genomic prediction methods, machine learning and feature selection algorithms were employed to reduce the necessary number of markers for prediction and enhance predictive accuracy across phenotypes. To explore the regulation of agronomic traits, our study evaluated the significance of selected markers for prediction using a tree-based approach, potentially linking these regions to quantitative trait loci (QTLs). In a multiomic approach, genes from the species transcriptome were mapped and correlated to those markers. A gene coexpression network was modeled with gene expression estimates from a diverse set of U. ruziziensis genotypes, enabling a comprehensive investigation of molecular mechanisms associated with these regions. The heritabilities of the evaluated traits ranged from 0.44 to 0.92. A total of 28,106 filtered SNPs were used to predict phenotypic measurements, achieving a mean predictive ability of 0.762. By employing feature selection techniques, we could reduce the dimensionality of SNP datasets, revealing potential genotype-phenotype associations. The functional annotation of genes near these markers revealed associations with auxin transport and biosynthesis of lignin, flavonol, and folic acid. Further exploration with the gene coexpression network uncovered associations with DNA metabolism, stress response, and circadian rhythm. These genes and regions represent important targets for expanding our understanding of the metabolic regulation of agronomic traits and offer valuable insights applicable to species breeding. Our work represents an innovative contribution to molecular breeding techniques for tropical forages, presenting a viable marker-assisted breeding approach and identifying target regions for future molecular studies on these agronomic traits.
Collapse
Affiliation(s)
- Felipe Bitencourt Martins
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
| | - Alexandre Hild Aono
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
| | - Aline da Costa Lima Moraes
- Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
| | | | | | - Marco Pessoa-Filho
- Embrapa Cerrados, Brazilian Agricultural Research Corporation, Brasília, Brazil
| | | | - Rosangela Maria Simeão
- Embrapa Gado de Corte, Brazilian Agricultural Research Corporation, Campo Grande, Mato Grosso, Brazil
| | - Anete Pereira de Souza
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
| |
Collapse
|
2
|
Qu J, Runcie D, Cheng H. Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits. Genetics 2023; 223:6931802. [PMID: 36529897 PMCID: PMC9991502 DOI: 10.1093/genetics/iyac183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2022] [Revised: 05/06/2022] [Accepted: 11/17/2022] [Indexed: 12/23/2022] Open
Abstract
Large-scale phenotype data are expected to increase the accuracy of genome-wide prediction and the power of genome-wide association analyses. However, genomic analyses of high-dimensional, highly correlated traits are challenging. We developed a method for implementing high-dimensional Bayesian multivariate regression to simultaneously analyze genetic variants underlying thousands of traits. As a demonstration, we implemented the BayesC prior in the R package MegaLMM. Applied to Genomic Prediction, MegaBayesC effectively integrated hyperspectral reflectance data from 620 hyperspectral wavelengths to improve the accuracy of genetic value prediction on grain yield in a wheat dataset. Applied to Genome-Wide Association Studies, we used simulations to show that MegaBayesC can accurately estimate the effect sizes of QTL across a range of genetic architectures and causes of correlations among traits. To apply MegaBayesC to a realistic scenario involving whole-genome marker data, we developed a 2-stage procedure involving a preliminary step of candidate marker selection prior to multivariate regression. We then used MegaBayesC to identify genetic associations with flowering time in Arabidopsis thaliana, leveraging expression data from 20,843 genes. MegaBayesC selected 15 single nucleotide polymorphisms as important for flowering time, with 13 located within 100 kb of known flowering-time related genes, a higher validation rate than achieved by a single-stage analysis using only the flowering time data itself. These results demonstrate that MegaBayesC can efficiently and effectively leverage high-dimensional phenotypes in genetic analyses.
Collapse
Affiliation(s)
- Jiayi Qu
- Department of Animal Science, University of California Davis, Davis, CA 95616, USA
| | - Daniel Runcie
- Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
| | - Hao Cheng
- Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
| |
Collapse
|
3
|
Varona L, González-Recio O. Invited review: Recursive models in animal breeding: Interpretation, limitations, and extensions. J Dairy Sci 2023; 106:2198-2212. [PMID: 36870846 DOI: 10.3168/jds.2022-22578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2022] [Accepted: 10/30/2022] [Indexed: 03/05/2023]
Abstract
Structural equation models allow causal effects between 2 or more variables to be considered and can postulate unidirectional (recursive models; RM) or bidirectional (simultaneous models) causality between variables. This review evaluated the properties of RM in animal breeding and how to interpret the genetic parameters and the corresponding estimated breeding values. In many cases, RM and mixed multitrait models (MTM) are statistically equivalent, although subject to the assumption of variance-covariance matrices and restrictions imposed for achieving model identification. Inference under RM requires imposing some restrictions on the (co)variance matrix or on the location parameters. The estimates of the variance components and the breeding values can be transformed from RM to MTM, although the biological interpretation differs. In the MTM, the breeding values predict the full influence of the additive genetic effects on the traits and should be used for breeding purposes. In contrast, the RM breeding values express the additive genetic effect while holding the causal traits constant. The differences between the additive genetic effect in RM and MTM can be used to identify the genomic regions that affect the additive genetic variation of traits directly or causally mediated for another trait or traits. Furthermore, we presented some extensions of the RM that are useful for modeling quantitative traits with alternative assumptions. The equivalence of RM and MTM can be used to infer causal effects on sequentially expressed traits by manipulating the residual (co)variance matrix under the MTM. Further, RM can be implemented to analyze causality between traits that might differ among subgroups or within the parametric space of the independent traits. In addition, RM can be expanded to create models that introduce some degree of regularization in the recursive structure that aims to estimate a large number of recursive parameters. Finally, RM can be used in some cases for operational reasons, although there is no causality between traits.
Collapse
Affiliation(s)
- L Varona
- Instituto Agroalimentario de Aragón (IA2), Facultad de Veterinaria, Universidad de Zaragoza, C/ Miguel Servet 177, 50013 Zaragoza, Spain.
| | - O González-Recio
- Departamento de mejora genética animal, INIA-CSIC, Crta, de la Coruña km 7.5, 28040 Madrid, Spain
| |
Collapse
|
4
|
Wolc A, Dekkers JCM. Application of Bayesian genomic prediction methods to genome-wide association analyses. Genet Sel Evol 2022; 54:31. [PMID: 35562659 PMCID: PMC9103490 DOI: 10.1186/s12711-022-00724-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022] Open
Abstract
Background Bayesian genomic prediction methods were developed to simultaneously fit all genotyped markers to a set of available phenotypes for prediction of breeding values for quantitative traits, allowing for differences in the genetic architecture (distribution of marker effects) of traits. These methods also provide a flexible and reliable framework for genome-wide association (GWA) studies. The objective here was to review developments in Bayesian hierarchical and variable selection models for GWA analyses. Results By fitting all genotyped markers simultaneously, Bayesian GWA methods implicitly account for population structure and the multiple-testing problem of classical single-marker GWA. Implemented using Markov chain Monte Carlo methods, Bayesian GWA methods allow for control of error rates using probabilities obtained from posterior distributions. Power of GWA studies using Bayesian methods can be enhanced by using informative priors based on previous association studies, gene expression analyses, or functional annotation information. Applied to multiple traits, Bayesian GWA analyses can give insight into pleiotropic effects by multi-trait, structural equation, or graphical models. Bayesian methods can also be used to combine genomic, transcriptomic, proteomic, and other -omics data to infer causal genotype to phenotype relationships and to suggest external interventions that can improve performance. Conclusions Bayesian hierarchical and variable selection methods provide a unified and powerful framework for genomic prediction, GWA, integration of prior information, and integration of information from other -omics platforms to identify causal mutations for complex quantitative traits.
Collapse
Affiliation(s)
- Anna Wolc
- Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA.,Hy-Line International, 2583 240th Street, Dallas Center, IA, 50063, USA
| | - Jack C M Dekkers
- Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA.
| |
Collapse
|
5
|
Naserkheil M, Mehrban H, Lee D, Park MN. Genome-wide Association Study for Carcass Primal Cut Yields Using Single-step Bayesian Approach in Hanwoo Cattle. Front Genet 2021; 12:752424. [PMID: 34899840 PMCID: PMC8662546 DOI: 10.3389/fgene.2021.752424] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Accepted: 11/02/2021] [Indexed: 12/30/2022] Open
Abstract
The importance of meat and carcass quality is growing in beef cattle production to meet both producer and consumer demands. Primal cut yields, which reflect the body compositions of carcass, could determine the carcass grade and, consequently, command premium prices. Despite its importance, there have been few genome-wide association studies on these traits. This study aimed to identify genomic regions and putative candidate genes related to 10 primal cut traits, including tenderloin, sirloin, striploin, chuck, brisket, top round, bottom round, shank, flank, and rib in Hanwoo cattle using a single-step Bayesian regression (ssBR) approach. After genomic data quality control, 43,987 SNPs from 3,745 genotyped animals were available, of which 3,467 had phenotypic records for the analyzed traits. A total of 16 significant genomic regions (1-Mb window) were identified, of which five large-effect quantitative trait loci (QTLs) located on chromosomes 6 at 38–39 Mb, 11 at 21–22 Mb, 14 at 6–7 Mb and 26–27 Mb, and 19 at 26–27 Mb were associated with more than one trait, while the remaining 11 QTLs were trait-specific. These significant regions were harbored by 154 genes, among which TOX, FAM184B, SPP1, IBSP, PKD2, SDCBP, PIGY, LCORL, NCAPG, and ABCG2 were noteworthy. Enrichment analysis revealed biological processes and functional terms involved in growth and lipid metabolism, such as growth (GO:0040007), muscle structure development (GO:0061061), skeletal system development (GO:0001501), animal organ development (GO:0048513), lipid metabolic process (GO:0006629), response to lipid (GO:0033993), metabolic pathways (bta01100), focal adhesion (bta04510), ECM–receptor interaction (bta04512), fat digestion and absorption (bta04975), and Rap1 signaling pathway (bta04015) being the most significant for the carcass primal cut traits. Thus, identification of quantitative trait loci regions and plausible candidate genes will aid in a better understanding of the genetic and biological mechanisms regulating carcass primal cut yields.
Collapse
Affiliation(s)
- Masoumeh Naserkheil
- Animal Breeding and Genetics Division, National Institute of Animal Science, Cheonan-si, South Korea
| | - Hossein Mehrban
- Department of Animal Science, Shahrekord University, Shahrekord, Iran
| | - Deukmin Lee
- Department of Animal Life and Environment Sciences, Hankyong National University, Anseong-si, South Korea
| | - Mi Na Park
- Animal Breeding and Genetics Division, National Institute of Animal Science, Cheonan-si, South Korea
| |
Collapse
|
6
|
Zhou X, Cai X. Joint eQTL mapping and inference of gene regulatory network improves power of detecting both cis- and trans-eQTLs. Bioinformatics 2021; 38:149-156. [PMID: 34487140 PMCID: PMC8696109 DOI: 10.1093/bioinformatics/btab609] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2020] [Revised: 07/19/2021] [Accepted: 08/25/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Genetic variations of expression quantitative trait loci (eQTLs) play a critical role in influencing complex traits and diseases development. Two main factors that affect the statistical power of detecting eQTLs are: (i) relatively small size of samples available, and (ii) heavy burden of multiple testing due to a very large number of variants to be tested. The later issue is particularly severe when one tries to identify trans-eQTLs that are far away from the genes they influence. If one can exploit co-expressed genes jointly in eQTL-mapping, effective sample size can be increased. Furthermore, using the structure of the gene regulatory network (GRN) may help to identify trans-eQTLs without increasing multiple testing burden. RESULTS In this article, we use the structure equation model (SEM) to model both GRN and effect of eQTLs on gene expression, and then develop a novel algorithm, named sparse SEM for eQTL mapping (SSEMQ), to conduct joint eQTL mapping and GRN inference. The SEM can exploit co-expressed genes jointly in eQTL mapping and also use GRN to determine trans-eQTLs. Computer simulations demonstrate that our SSEMQ significantly outperforms nine existing eQTL mapping methods. SSEMQ is further used to analyze two real datasets of human breast and whole blood tissues, yielding a number of cis- and trans-eQTLs. AVAILABILITY AND IMPLEMENTATION R package ssemQr is available at https://github.com/Ivis4ml/ssemQr.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Xin Zhou
- Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL 33146, USA
| | | |
Collapse
|
7
|
Li J, Wang Z, Fernando R, Cheng H. Tests of association based on genomic windows can lead to spurious associations when using genotype panels with heterogeneous SNP densities. Genet Sel Evol 2021; 53:45. [PMID: 34039266 PMCID: PMC8157676 DOI: 10.1186/s12711-021-00638-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Accepted: 05/11/2021] [Indexed: 11/10/2022] Open
Abstract
Dense single nucleotide polymorphism (SNP) panels are widely used for genome-wide association studies (GWAS). In these panels, SNPs within a genomic segment tend to be highly correlated. Thus, association studies based on testing the significance of single SNPs are not very effective, and genomic-window based tests have been proposed to address this problem. However, when the SNP density on the genotype panel is not homogeneous, genomic-window based tests can lead to the detection of spurious associations by declaring effects of genomic windows that explain a large proportion of genetic variance as significant. We propose two methods to solve this problem.
Collapse
Affiliation(s)
- Jinghui Li
- Department of Animal Science, University of California, Davis, USA
| | - Zigui Wang
- Department of Animal Science, University of California, Davis, USA
| | - Rohan Fernando
- Department of Animal Science, Iowa State Univeristy, Ames, USA
| | - Hao Cheng
- Department of Animal Science, University of California, Davis, USA.
| |
Collapse
|
8
|
Momen M, Bhatta M, Hussain W, Yu H, Morota G. Modeling multiple phenotypes in wheat using data-driven genomic exploratory factor analysis and Bayesian network learning. PLANT DIRECT 2021; 5:e00304. [PMID: 33532691 PMCID: PMC7833463 DOI: 10.1002/pld3.304] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/04/2020] [Revised: 12/03/2020] [Accepted: 12/16/2020] [Indexed: 06/12/2023]
Abstract
Inferring trait networks from a large volume of genetically correlated diverse phenotypes such as yield, architecture, and disease resistance can provide information on the manner in which complex phenotypes are interrelated. However, studies on statistical methods tailored to multidimensional phenotypes are limited, whereas numerous methods are available for evaluating the massive number of genetic markers. Factor analysis operates at the level of latent variables predicted to generate observed responses. The objectives of this study were to illustrate the manner in which data-driven exploratory factor analysis can map observed phenotypes into a smaller number of latent variables and infer a genomic latent factor network using 45 agro-morphological, disease, and grain mineral phenotypes measured in synthetic hexaploid wheat lines (Triticum aestivum L.). In total, eight latent factors including grain yield, architecture, flag leaf-related traits, grain minerals, yellow rust, two types of stem rust, and leaf rust were identified as common sources of the observed phenotypes. The genetic component of the factor scores for each latent variable was fed into a Bayesian network to obtain a trait structure reflecting the genetic interdependency among traits. Three directed paths were consistently identified by two Bayesian network algorithms. Flag leaf-related traits influenced leaf rust, and yellow rust and stem rust influenced grain yield. Additional paths that were identified included flag leaf-related traits to minerals and minerals to architecture. This study shows that data-driven exploratory factor analysis can reveal smaller dimensional common latent phenotypes that are likely to give rise to numerous observed field phenotypes without relying on prior biological knowledge. The inferred genomic latent factor structure from the Bayesian network provides insights for plant breeding to simultaneously improve multiple traits, as an intervention on one trait will affect the values of focal phenotypes in an interrelated complex trait system.
Collapse
Affiliation(s)
- Mehdi Momen
- Department of Animal and Poultry SciencesVirginia Polytechnic Institute and State UniversityBlacksburgVAUSA
| | - Madhav Bhatta
- Department of AgronomyUniversity of Wisconsin‐MadisonMadisonWIUSA
| | - Waseem Hussain
- International Rice Research InstituteLos BanosPhilippines
| | - Haipeng Yu
- Department of Animal and Poultry SciencesVirginia Polytechnic Institute and State UniversityBlacksburgVAUSA
| | - Gota Morota
- Department of Animal and Poultry SciencesVirginia Polytechnic Institute and State UniversityBlacksburgVAUSA
| |
Collapse
|