1
|
Momen M, Campbell MT, Walia H, Morota G. Utilizing trait networks and structural equation models as tools to interpret multi-trait genome-wide association studies. PLANT METHODS 2019; 15:107. [PMID: 31548847 PMCID: PMC6749677 DOI: 10.1186/s13007-019-0493-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/11/2019] [Accepted: 09/06/2019] [Indexed: 05/13/2023]
Abstract
BACKGROUND Plant breeders seek to develop cultivars with maximal agronomic value, which is often assessed using numerous, often genetically correlated traits. As intervention on one trait will affect the value of another, breeding decisions should consider the relationships among traits in the context of putative causal structures (i.e., trait networks). While multi-trait genome-wide association studies (MTM-GWAS) can infer putative genetic signals at the multivariate scale, standard MTM-GWAS does not accommodate the network structure of phenotypes, and therefore does not address how the traits are interrelated. We extended the scope of MTM-GWAS by incorporating trait network structures into GWAS using structural equation models (SEM-GWAS). Here, we illustrate the utility of SEM-GWAS using a digital metric for shoot biomass, root biomass, water use, and water use efficiency in rice. RESULTS A salient feature of SEM-GWAS is that it can partition the total single nucleotide polymorphism (SNP) effects acting on a trait into direct and indirect effects. Using this novel approach, we show that for most QTL associated with water use, total SNP effects were driven by genetic effects acting directly on water use rather that genetic effects originating from upstream traits. Conversely, total SNP effects for water use efficiency were largely due to indirect effects originating from the upstream trait, projected shoot area. CONCLUSIONS We describe a robust framework that can be applied to multivariate phenotypes to understand the interrelationships between complex traits. This framework provides novel insights into how QTL act within a phenotypic network that would otherwise not be possible with conventional multi-trait GWAS approaches. Collectively, these results suggest that the use of SEM may enhance our understanding of complex relationships among agronomic traits.
Collapse
Affiliation(s)
- Mehdi Momen
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, 175 West Campus Drive, Blacksburg, VA 24061 USA
| | - Malachy T. Campbell
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, 175 West Campus Drive, Blacksburg, VA 24061 USA
| | - Harkamal Walia
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583 USA
| | - Gota Morota
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, 175 West Campus Drive, Blacksburg, VA 24061 USA
| |
Collapse
|
2
|
Genomic Bayesian Confirmatory Factor Analysis and Bayesian Network To Characterize a Wide Spectrum of Rice Phenotypes. G3-GENES GENOMES GENETICS 2019; 9:1975-1986. [PMID: 30992319 PMCID: PMC6553530 DOI: 10.1534/g3.119.400154] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
With the advent of high-throughput phenotyping platforms, plant breeders have a means to assess many traits for large breeding populations. However, understanding the genetic interdependencies among high-dimensional traits in a statistically robust manner remains a major challenge. Since multiple phenotypes likely share mutual relationships, elucidating the interdependencies among economically important traits can better inform breeding decisions and accelerate the genetic improvement of plants. The objective of this study was to leverage confirmatory factor analysis and graphical modeling to elucidate the genetic interdependencies among a diverse agronomic traits in rice. We used a Bayesian network to depict conditional dependencies among phenotypes, which can not be obtained by standard multi-trait analysis. We utilized Bayesian confirmatory factor analysis which hypothesized that 48 observed phenotypes resulted from six latent variables including grain morphology, morphology, flowering time, physiology, yield, and morphological salt response. This was followed by studying the genetics of each latent variable, which is also known as factor, using single nucleotide polymorphisms. Bayesian network structures involving the genomic component of six latent variables were established by fitting four algorithms (i.e., Hill Climbing, Tabu, Max-Min Hill Climbing, and General 2-Phase Restricted Maximization algorithms). Physiological components influenced the flowering time and grain morphology, and morphology and grain morphology influenced yield. In summary, we show the Bayesian network coupled with factor analysis can provide an effective approach to understand the interdependence patterns among phenotypes and to predict the potential influence of external interventions or selection related to target traits in the interrelated complex traits systems.
Collapse
|
3
|
Martínez CA, Khare K, Rahman S, Elzo MA. Gaussian covariance graph models accounting for correlated marker effects in genome-wide prediction. J Anim Breed Genet 2017; 134:412-421. [PMID: 28804930 DOI: 10.1111/jbg.12286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2016] [Accepted: 06/30/2017] [Indexed: 11/26/2022]
Abstract
Several statistical models used in genome-wide prediction assume uncorrelated marker allele substitution effects, but it is known that these effects may be correlated. In statistics, graphical models have been identified as a useful tool for covariance estimation in high-dimensional problems and it is an area that has recently experienced a great expansion. In Gaussian covariance graph models (GCovGM), the joint distribution of a set of random variables is assumed to be Gaussian and the pattern of zeros of the covariance matrix is encoded in terms of an undirected graph G. In this study, methods adapting the theory of GCovGM to genome-wide prediction were developed (Bayes GCov, Bayes GCov-KR and Bayes GCov-H). In simulated data sets, improvements in correlation between phenotypes and predicted breeding values and accuracies of predicted breeding values were found. Our models account for correlation of marker effects and permit to accommodate general structures as opposed to models proposed in previous studies, which consider spatial correlation only. In addition, they allow incorporation of biological information in the prediction process through its use when constructing graph G, and their extension to the multi-allelic loci case is straightforward.
Collapse
Affiliation(s)
- C A Martínez
- Department of Animal Sciences, University of Florida, Gainesville, FL, USA
| | - K Khare
- Department of Statistics, University of Florida, Gainesville, FL, USA
| | - S Rahman
- Department of Statistics, University of Florida, Gainesville, FL, USA
| | - M A Elzo
- Department of Animal Sciences, University of Florida, Gainesville, FL, USA
| |
Collapse
|
4
|
Töpner K, Rosa GJM, Gianola D, Schön CC. Bayesian Networks Illustrate Genomic and Residual Trait Connections in Maize ( Zea mays L.). G3 (BETHESDA, MD.) 2017; 7:2779-2789. [PMID: 28637811 PMCID: PMC5555481 DOI: 10.1534/g3.117.044263] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/17/2017] [Accepted: 06/15/2017] [Indexed: 12/12/2022]
Abstract
Relationships among traits were investigated on the genomic and residual levels using novel methodology. This included inference on these relationships via Bayesian networks and an assessment of the networks with structural equation models. The methodology employed three steps. First, a Bayesian multiple-trait Gaussian model was fitted to the data to decompose phenotypic values into their genomic and residual components. Second, genomic and residual network structures among traits were learned from estimates of these two components. Network learning was performed using six different algorithmic settings for comparison, of which two were score-based and four were constraint-based approaches. Third, structural equation model analyses ranked the networks in terms of goodness of fit and predictive ability, and compared them with the standard multiple-trait fully recursive network. The methodology was applied to experimental data representing the European heterotic maize pools Dent and Flint (Zea mays L.). Inferences on genomic and residual trait connections were depicted separately as directed acyclic graphs. These graphs provide information beyond mere pairwise genetic or residual associations between traits, illustrating for example conditional independencies and hinting at potential causal links among traits. Network analysis suggested some genetic correlations as potentially spurious. Genomic and residual networks were compared between Dent and Flint.
Collapse
Affiliation(s)
- Katrin Töpner
- Plant Breeding, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
- Institute for Advanced Study, Technical University of Munich, 85748 Garching, Germany
| | - Guilherme J M Rosa
- Institute for Advanced Study, Technical University of Munich, 85748 Garching, Germany
- Department of Animal Sciences, University of Wisconsin-Madison, Wisconsin 53706
| | - Daniel Gianola
- Institute for Advanced Study, Technical University of Munich, 85748 Garching, Germany
- Department of Animal Sciences, University of Wisconsin-Madison, Wisconsin 53706
| | - Chris-Carolin Schön
- Plant Breeding, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
- Institute for Advanced Study, Technical University of Munich, 85748 Garching, Germany
| |
Collapse
|
5
|
Glória LS, Cruz CD, Vieira RAM, de Resende MDV, Lopes PS, de Siqueira OHD, Fonseca e Silva F. Accessing marker effects and heritability estimates from genome prediction by Bayesian regularized neural networks. Livest Sci 2016. [DOI: 10.1016/j.livsci.2016.07.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
6
|
Abstract
Models for genome-wide prediction and association studies usually target a single phenotypic trait. However, in animal and plant genetics it is common to record information on multiple phenotypes for each individual that will be genotyped. Modeling traits individually disregards the fact that they are most likely associated due to pleiotropy and shared biological basis, thus providing only a partial, confounded view of genetic effects and phenotypic interactions. In this article we use data from a Multiparent Advanced Generation Inter-Cross (MAGIC) winter wheat population to explore Bayesian networks as a convenient and interpretable framework for the simultaneous modeling of multiple quantitative traits. We show that they are equivalent to multivariate genetic best linear unbiased prediction (GBLUP) and that they are competitive with single-trait elastic net and single-trait GBLUP in predictive performance. Finally, we discuss their relationship with other additive-effects models and their advantages in inference and interpretation. MAGIC populations provide an ideal setting for this kind of investigation because the very low population structure and large sample size result in predictive models with good power and limited confounding due to relatedness.
Collapse
|
7
|
Morota G, Gianola D. Evaluation of linkage disequilibrium in wheat with an L1-regularized sparse Markov network. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2013; 126:1991-2002. [PMID: 23661079 DOI: 10.1007/s00122-013-2112-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/08/2013] [Accepted: 04/26/2013] [Indexed: 06/02/2023]
Abstract
Linkage disequilibrium (LD) is defined as a stochastic dependence between alleles at two or more loci. Although understanding LD is important in the study of the genetics of many species, little attention has been paid on how a covariance structure between many loci distributed across the genome should be represented. Given that biological systems at the cellular level often involve gene networks, it is appealing to evaluate LD from a network perspective, i.e., as a set of associated loci involved in a complex system. We applied a Markov network (MN) to study LD using data on 1,279 markers derived from 599 wheat inbred lines. The MN attempts to account for association between two markers, conditionally on the remaining markers in the network model. In this study, the recovery of the structure of a LD network was done through two variants of pseudo-likelihoods subject to an L1 penalty on the MN parameters. It is shown that, while the L1-regularized Markov network preserves features of a Bayesian network (BN), the nodes in the resulting networks have fewer links. The resulting sparse network, encoding conditional independencies, provides a clearer picture of association than marginal LD metrics, and a sparse graph eases interpretation markedly, since it includes a smaller number of edges than a BN. Thus, an L1-regularized sparse Markov network seems appealing for representing conditional LD with high-dimensional genomic data, where variables, e.g., single nucleotide polymorphism markers, are expected to be sparsely connected.
Collapse
Affiliation(s)
- Gota Morota
- Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA.
| | | |
Collapse
|
8
|
Morota G, Koyama M, Rosa GJM, Weigel KA, Gianola D. Predicting complex traits using a diffusion kernel on genetic markers with an application to dairy cattle and wheat data. Genet Sel Evol 2013; 45:17. [PMID: 23763755 PMCID: PMC3706293 DOI: 10.1186/1297-9686-45-17] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2012] [Accepted: 05/31/2013] [Indexed: 01/09/2023] Open
Abstract
Background Arguably, genotypes and phenotypes may be linked in functional forms that are not well addressed by the linear additive models that are standard in quantitative genetics. Therefore, developing statistical learning models for predicting phenotypic values from all available molecular information that are capable of capturing complex genetic network architectures is of great importance. Bayesian kernel ridge regression is a non-parametric prediction model proposed for this purpose. Its essence is to create a spatial distance-based relationship matrix called a kernel. Although the set of all single nucleotide polymorphism genotype configurations on which a model is built is finite, past research has mainly used a Gaussian kernel. Results We sought to investigate the performance of a diffusion kernel, which was specifically developed to model discrete marker inputs, using Holstein cattle and wheat data. This kernel can be viewed as a discretization of the Gaussian kernel. The predictive ability of the diffusion kernel was similar to that of non-spatial distance-based additive genomic relationship kernels in the Holstein data, but outperformed the latter in the wheat data. However, the difference in performance between the diffusion and Gaussian kernels was negligible. Conclusions It is concluded that the ability of a diffusion kernel to capture the total genetic variance is not better than that of a Gaussian kernel, at least for these data. Although the diffusion kernel as a choice of basis function may have potential for use in whole-genome prediction, our results imply that embedding genetic markers into a non-Euclidean metric space has very small impact on prediction. Our results suggest that use of the black box Gaussian kernel is justified, given its connection to the diffusion kernel and its similar predictive performance.
Collapse
Affiliation(s)
- Gota Morota
- Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, USA.
| | | | | | | | | |
Collapse
|
9
|
Edwards D. Modelling and visualizing fine-scale linkage disequilibrium structure. BMC Bioinformatics 2013; 14:179. [PMID: 23742095 PMCID: PMC3683336 DOI: 10.1186/1471-2105-14-179] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2013] [Accepted: 05/29/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Detailed study of genetic variation at the population level in humans and other species is now possible due to the availability of large sets of single nucleotide polymorphism data. Alleles at two or more loci are said to be in linkage disequilibrium (LD) when they are correlated or statistically dependent. Current efforts to understand the genetic basis of complex phenotypes are based on the existence of such associations, making study of the extent and distribution of linkage disequilibrium central to this endeavour. The objective of this paper is to develop methods to study fine-scale patterns of allelic association using probabilistic graphical models. RESULTS An efficient, linear-time forward-backward algorithm is developed to estimate chromosome-wide LD models by optimizing a penalized likelihood criterion, and a convenient way to display these models is described. To illustrate the methods they are applied to data obtained by genotyping 8341 pigs. It is found that roughly 20% of the porcine genome exhibits complex LD patterns, forming islands of relatively high genetic diversity. CONCLUSIONS The proposed algorithm is efficient and makes it feasible to estimate and visualize chromosome-wide LD models on a routine basis.
Collapse
Affiliation(s)
- David Edwards
- Department of Molecular Biology and Genetics, Centre for Quantitative Genetics and Genomics, Blichers Allé 20, Tjele 8830, Denmark.
| |
Collapse
|